Modal Parameters Prediction and Damage Detection of Space Grid Structure under Environmental Effects Using Stacked Ensemble Learning



Introduction
Space grid steel structures (e.g., lattice grids and lattice shells) are often used in large public buildings, such as airports, train stations, and stadiums. These buildings are of very high significance, and a collapse can cause severe harm to people and property. Structural health monitoring (SHM) helps ensure the safety of these structures by analyzing the dynamic and static structural responses recorded by sensors. In recent years, with the development of artificial intelligence technology, especially the advancement of machine learning methods [1], a variety of SHM systems have been successfully used to secure these types of on-site structures [2][3][4].
Vibration-based diagnostic analysis [5,6] is a valuable tool for identifying and locating structural damage. By conducting a modal analysis of the dynamic data, parameters such as the frequencies and mode shapes of the structure can be obtained. The theoretical relationship between these modal parameters and structural properties (e.g., structural stiffness and mass) is well established; therefore, changes in the modal parameters indicate structural changes. Moreover, the analysis of numerous SHM datasets from various sites shows that modal parameters, such as frequencies, are related not only to the structure itself but also to outside factors [7]. Examples include variations in loads resulting from human operation, such as heap load and traffic load. In addition, more periodic and regular factors, such as temperature, humidity, wind speed, and other environmental changes, also affect the modal data [8,9]. The fluctuation of modal parameters with environmental and operational effects can significantly interfere with damage diagnosis, resulting in false negative or false positive diagnoses [10,11]. In recent years, issues such as the correlation between modal parameters and environmental or operational factors, the prediction of modal features, the quantification and separation of the variability of modal parameters, and techniques for removing environmental or operational effects have received considerable scholarly attention [7,12]. In terms of correlation analysis, for example, Peeters and De Roeck [13] conducted a long-term systematic monitoring study of the Z24 concrete bridge in Switzerland and revealed a nonlinear correlation between the natural frequencies and the surface temperature. Further comparative analysis showed that temperature changes would most likely mask the effects of damage on the structure in most prediction models. Zhou et al. [14,15] studied the correlation of natural frequencies with environmental factors, such as temperature, humidity, and wind speed, for the Ting Kau Bridge in Hong Kong. Results showed that natural frequencies were most strongly correlated with temperature, with the first natural frequency changing by up to 6.7% within one year; however, the measured data were more scattered than those in this study owing to the large size of the structure. The correlation of modal parameters with environmental factors has been studied less for space grid steel structures than for bridge structures because testing is more difficult. Zhang et al. [16] performed two months of dynamic tests on the Chinese National Aquatics Center to obtain its modal parameters (e.g., natural frequencies and damping ratios), which were analyzed for environmental correlations. The results indicated that the second- and third-order natural frequencies increase with temperature; this differs from the results obtained in the Ting Kau Bridge study. However, this relationship between natural frequencies and temperature may not be universal because of the scattered nature of the data.
Furthermore, based on the correlation analysis, efforts have been made to predict modal parameters or eliminate environmental effects in order to correctly identify the damage of interest. These methods are divided into output-only methods and input-output methods according to whether environmental measurement data are used [7,12]. The output-only approach assumes that the environmental effects are embedded variables in the responses, and the assessment is performed by analyzing only the responses when environmental measurements (inputs) are not available. Among these, environmental effect normalization methods, such as linear or nonlinear principal component analysis [17,18] and cointegration analysis [19], have been used to separate environmental variables. In addition, some feature matching methods have been used, such as outlier discrimination [20], supervised classification [21], and unsupervised clustering [22,23]. Notably, these feature matching methods are generally used to identify structural damage directly, rather than to separate environmental and operational effects from the features.
On the other hand, when environmental measurement data are obtained simultaneously from the SHM system, the problem becomes easier than in the output-only case because an input-output regression model can be developed, which is the focus of this study. Moser and Moaveni [24] and Moaveni and Behmanesh [25] applied a series of multiple linear regression (MLR) models to the prediction of natural frequencies under temperature variation and established confidence intervals for the diagnosis of future damage. The results indicated that the quadratic polynomial exhibits a better fit than the autoregressive model with an exogenous input (ARX). Ni et al. [26] developed a support vector regression (SVR) model to predict natural frequencies for long-term data and subsequently combined SVR with the extraction of temperature principal components. This led the authors to propose the artificial neural network (ANN) method for frequency prediction [15]. Peeters and De Roeck [13] predicted the natural frequencies using an ARX model with temperature as an exogenous input and evaluated the long-term performance of the Z24 bridge, showing that ARX predicts better than simple linear regression. Based on finite element simulation data, Jang and Smyth [27] used four models to predict natural frequencies, namely, MLR, random forest, ANN, and SVR. The results show that ANN and SVR outperformed the other two models. Although different models have been developed for frequency prediction under temperature variation, several drawbacks, such as overfitting and unstable performance, are usually identified with each single model. This suggests that no model is universally applicable because of the algorithm's preferences and the variability in monitoring data. Compared to single-class models, combining multiple estimators has been shown to be effective in reducing generalization error for classification and regression tasks [28,29]. Ensemble learning is a class of machine learning algorithms that combines multiple models into a more accurate and more general model. In recent years, ensemble learning has been applied to the field of SHM to further improve the accuracy of anomaly and damage detection under environmental changes [10,30]. To improve the damage localization accuracy under environmental changes, Fallahian et al. [21] used a weighted majority voting ensemble learning method to combine two classifiers. However, this supervised learning approach requires prior knowledge of the damage class labels, which limits its use in practical damage detection scenarios. Sarmadi et al. [10] improved the accuracy of damage detection by combining multiple Mahalanobis distance metrics in a sequential manner and introducing nongenerative ensemble learning into an unsupervised learning model.

Structural Control and Health Monitoring
To summarize the above studies, although the previously mentioned single-class input-output models (e.g., MLR, ANN, and SVR) are limited in modal parameter prediction performance by their generalization problems, each model has its unique advantages. Therefore, combining multiple models through ensemble learning is a good way to address the generalization problem. The main contribution of this paper is a modal parameter prediction method based on ensemble learning, which combines multiple heterogeneous regression estimators to achieve higher accuracy and better generalization ability, together with damage detection based on the improved prediction accuracy. Specifically, the proposed stacked ensemble prediction method for modal parameters (natural frequencies) aggregates five standalone base models, namely, MLR, Gaussian process regression (GPR), SVR, regression tree (RT), and ANN, to obtain the best model for predicting modal parameters under environmental effects. On this basis, the natural frequencies of the future (unknown) state are predicted, and statistical hypothesis testing is performed on the prediction residuals to accurately detect the damage state of the structure. To verify the effectiveness and applicability of the proposed method, the dynamic responses of a grid structure were recorded for a period. Environmental data, such as temperature and humidity, were collected simultaneously. Based on these monitoring data, stacked ensemble learning (SEL) was used to predict the natural frequencies, and damage detection was then performed by statistical analysis of the prediction residuals. Note that the data used for prediction model building and damage diagnosis were collected over a short period of time. The purpose is to avoid the interference of the randomness and uncertainty of human operation in the damage detection method and to reduce the burden of data storage. The proposed method can effectively capture the changes in dynamic features of such structures, considering different environmental factors such as temperature and humidity. Thus, it is expected that the presented approach can further contribute to continuous assessment under environmental change, providing technical support to ensure the safe operation of such structures.

Methodology
As mentioned earlier, the adoption of stacked ensemble learning methods for modal parameter prediction can overcome the problems of overfitting and poor generalization that exist in traditional prediction methods based on standalone models. The proposed method can more accurately establish the input-output mapping between environmental factors and modal parameters, making more accurate predictions of modal parameters under environmental changes and enabling timely detection of early-stage structural damage. As illustrated in Figure 1, when modal parameters and environmental data from the SHM system over a long period of time are acquired simultaneously, the proposed method is divided into two stages, i.e., modal parameter prediction and damage detection. In the modal parameter prediction stage, a stacked ensemble learning model for modal parameter prediction is developed from the data obtained in the baseline state, which is usually a known healthy state. Specifically, the environmental factors and modal parameters of the baseline state are first used as inputs and outputs, respectively, to develop five standalone models, namely, MLR, GPR, SVR, RT, and ANN. Before the standalone models are trained, 5-fold cross-validation is applied to the baseline state data: 4 folds are used to train the model, and the 5th fold is used for blind testing. This process is repeated 5 times using the same folds to achieve the best training results for the standalone models. Once the standalone models are trained, a stacked ensemble learning model is built, using the outputs of the five standalone models as its inputs and the actual modal parameters as its outputs. Finally, the performances of the various models are compared using a composite performance index (CPI). In the damage detection stage, the data of both the baseline state and the future (unknown) state are first predicted with the trained stacked ensemble learning model. Then, the prediction residuals of the modal parameters of each order are normalized, and their mean values are calculated. Finally, statistical hypothesis testing is used to determine whether damage is present. Details of each stage of the methodology are given in the following sections.

Principal Profile of the Standalone Methods.
This section gives a concise overview of the five standalone models implemented in this study, namely, MLR, GPR, SVR, RT, and ANN. Each of these models is used to solve a regression problem for modal parameter (i.e., natural frequency) prediction. These regressors take environmental factors as inputs and the dynamic modal parameters of the structure as outputs. Some of the methods (i.e., MLR, SVR, and ANN) have already been applied by researchers to modal parameter prediction problems under the influence of the environment [15,25,26]. Owing to space constraints, only some details of their modeling are presented together with the basic formulas of the methods; more discussion of the basic principles of these three methods and their applications to this problem can be found in references [7,12].
(1) Multiple Linear Regression. As the simplest machine learning prediction method, linear regression (LR) is widely used to establish correlations between modal parameters and environmental effects because its model formulation is easy to understand and display. A linear regression model uses independent predictor variables to predict the target response variable. The vector form of the MLR model is

$$\hat{Y}_{MLR} = \beta_0 I + X\beta + \varepsilon$$

where $\hat{Y}_{MLR} \in R^{n \times 1}$ is the estimate of the response variable using MLR, that is, the $i$th modal parameter in this paper, and $n$ is the number of samples; $X \in R^{n \times p}$ is the predictor variable, i.e., the environmental factors, and $p$ is the number of independent variables; $\beta \in R^{p \times 1}$ is the coefficient vector; $\beta_0$ is the constant term in the model; $\varepsilon$ is the noise term, that is, the random error; and $I \in R^{n \times 1}$ is the unit vector. In this study, the inevitable outliers in the monitoring process have a significant impact on the regression model, so robust regression with the bisquare weight function is used instead of traditional least-squares regression to improve robustness. The robust MLR uses M-estimation to formulate the estimating equations and solves them by iteratively reweighted least squares [31].
(2) Gaussian Process Regression. In linear regression models, input and output values are assumed to exhibit linear dependency, and in classical Bayesian regression models, a probabilistic approach is used to find the distribution of data in the vicinity of the expected value. GPR is a nonparametric probabilistic model in which any finite subset of the data follows a multivariate Gaussian distribution. In vector form, the GPR model can be denoted as [32]

$$P(Y \mid F, X) \sim N(Y \mid H\theta + F, \sigma^2 I_n)$$

where $X_j^T \in R^{1 \times p}$ ($j = 1, \ldots, n$) is the predictor variable vector of the $j$th sample point; $F = [F(X_1), \ldots, F(X_n)]^T$ with $F(X_j) \sim GP[0, k(X_j, X_{j'})]$, that is, the $F(X_j)$ are drawn from a zero-mean Gaussian process (GP) with covariance function (i.e., kernel function) $k(X_j, X_{j'})$; the rows of $H$ are $h(X_j^T)$, where $h(\cdot)$ is a set of basis functions that transform the original feature vector $X_j^T \in R^{1 \times p}$ into a new feature vector $h(X_j^T) \in R^{1 \times q}$; $\theta \in R^{q \times 1}$ is a vector of coefficients; $\sigma^2$ is the noise variance; and $I_n$ is the $n \times n$ identity matrix. The accuracy of the GPR model prediction therefore depends heavily on the selection of the kernel function and basis functions.
(3) Support Vector Regression. The support vector machine is a machine learning method that is known as support vector regression when applied to determine the nonlinear relationship between inputs and outputs. Geometrically, SVR can be understood as mapping the input data into a higher-dimensional feature space through nonlinear kernel functions, in which the data are distributed more sparsely than in the original space. Then, the largest margin in the feature space is defined. The decision function of SVR can be expressed as [26]

$$\hat{Y}_{SVR} = F_{SVR}(X, \nu) = \sum_{i=1}^{m} \nu_i K(X, X_i) + b$$

where $\hat{Y}_{SVR}$ is the estimate of the response variable using SVR, $K(X, X_i)$ is the set of $m$ nonlinear kernel functions, $b$ is a bias term, and $\nu$ represents the weight vector consisting of $m$ coefficients. Deriving the optimal decision function $F_{SVR}(X, \nu)$ and the associated parameters (i.e., $b$ and $\nu$) is a constrained global minimization problem:

$$\text{minimize:} \quad \frac{1}{2}\|\nu\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$$

$$\text{subject to:} \quad Y_i - F_{SVR}(X_i, \nu) \le \varepsilon + \xi_i, \quad F_{SVR}(X_i, \nu) - Y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0$$

where $C$ is a constant called the regularization term, which represents the degree of penalty for samples with error exceeding $\varepsilon$, and $\xi_i$ and $\xi_i^*$ are positive slack variables that represent the Euclidean distance of the predicted value from the corresponding boundary of the $\varepsilon$-tube. Therefore, based on the formulation described here, the parameters that need to be optimized are $\varepsilon$ and $C$. In addition, any parameter associated with the kernel function also needs to be optimized; in this paper, the radial basis function (RBF) kernel is chosen as

$$K(X, X_i) = \exp\left(-\frac{\|X - X_i\|^2}{\gamma^2}\right)$$

where $\gamma$ is the kernel scale to be optimized.
(4) Regression Tree. Tree algorithms are an important branch of machine learning, of which decision trees (DTs) are the most basic type. Depending on the type of data being processed, DTs can be divided into classification trees and regression trees, where the former process discrete data and the latter process continuous data. A DT model consists of nodes and directed edges. There are two types of nodes within a tree structure: internal nodes and leaf nodes. An internal node represents a feature or attribute, while a leaf node represents a category or a value. When DTs are used for classification and regression tasks, a feature of the sample is tested starting from the root node, and the sample is assigned to a child node based on the test result; each child node corresponds to a value taken by that feature. The samples are tested and assigned recursively in this way until they reach a leaf node. The RT, in particular, predicts a dependent variable that takes continuous or ordered discrete values. The prediction error is usually measured by the squared difference between the observed and predicted values, and the predicted value $\hat{Y}$ is obtained by fitting a regression model to each node. The specific theory of the RT algorithm can be found in reference [33].
(5) Artificial Neural Network. ANN is currently one of the most commonly used machine learning models. It is widely used in classification and regression problems for its powerful ability to solve nonlinear problems. For the regression task of modal parameter prediction under environmental variations, the robustness of ANN models has been confirmed [15]. In simple terms, a neural network is composed of a number of interconnected neurons, each of which computes the inner product of the input vector and the weight vector and passes it through a nonlinear transfer function to obtain a scalar:

$$\hat{Y}_{ANN} = f(W^T X + b)$$

where $W$ is the weight, $b$ is the bias, $f$ is the activation function (common choices include ReLU, Tanh, and Sigmoid), and $X$ and $\hat{Y}_{ANN}$ are the input and output of the neuron, respectively. In addition, the neural network structure improves the nonlinear computational capability of the model by adding hidden layers between the input and output layers.
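To make the robust MLR of item (1) more concrete, the following is a minimal sketch of iteratively reweighted least squares with bisquare weights for a single predictor. It is an illustration under stated assumptions, not the implementation used in the study: the tuning constant 4.685, the MAD-based scale estimate, the fixed number of iterations, the function names, and the synthetic "temperature" and "response" data are all assumptions introduced here.

```python
from statistics import median

TUNE = 4.685  # commonly used bisquare tuning constant (assumption, not from the paper)


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def bisquare_weights(residuals):
    """Tukey bisquare weights; residuals are scaled by a MAD-based sigma."""
    sigma = median(abs(r) for r in residuals) / 0.6745
    sigma = max(sigma, 1e-12)  # guard against a zero scale estimate
    weights = []
    for r in residuals:
        u = r / (TUNE * sigma)
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return weights


def robust_line_fit(x, y, n_iter=20):
    """Iteratively reweighted least squares, starting from ordinary least squares."""
    w = [1.0] * len(x)
    for _ in range(n_iter):
        b0, b1 = weighted_line_fit(x, y, w)
        residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        w = bisquare_weights(residuals)
    return b0, b1


# Hypothetical data: a linear trend with small noise and one gross outlier.
temperature = [float(t) for t in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
response = [1.0 + 2.0 * t + e for t, e in zip(temperature, noise)]
response[-1] += 30.0  # simulated monitoring outlier

ols_b0, ols_b1 = weighted_line_fit(temperature, response, [1.0] * 10)
rob_b0, rob_b1 = robust_line_fit(temperature, response)
```

In this toy case the ordinary least-squares slope is pulled strongly toward the outlier, while the reweighted fit stays close to the underlying trend, which is the motivation given above for preferring robust regression.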

Theoretical Background of Stacked Ensemble Model.
Ensemble learning is a machine learning method that combines multiple independent models according to a certain strategy to obtain better generalization performance. The outputs of the models in an ensemble can be combined in many ways. According to the way of integration, current ensemble learning can be roughly divided into three categories: bagging, boosting, and stacking. Unlike bagging and boosting, stacking is a heterogeneous model combination technique, which is an effective tool for achieving diversity among the basic learners in the set and thus improving the accuracy of the combined model. The stacking strategy adopts a two-layer framework, which mainly focuses on how to use a meta-learner to combine the results of all basic models [29]. As shown in Figure 2, the first layer is composed of several standalone regression learners, and the training results obtained by each learner are calculated. The output of the meta-learner in the second layer is the final output [34]. For stacking, there are two choices of basic learners: one is to select learners of the same type but with different parameters, and the other is to select different types of basic learners, i.e., heterogeneous ones. In this paper, the second selection method is adopted to combine several different regression learners within the two-layer framework of stacking ensemble learning. Five standalone models, i.e., MLR, GPR, SVR, RT, and ANN, are used as the first layer of basic learners. Since these five common standalone learners perform well and differ considerably in their training mechanisms, a combination of the five methods is proposed to further improve the regression performance of the estimators. Then, their outputs are used as inputs to train the second-layer meta-learner. Unlike static ensemble methods such as the voting committee approach, stacking is a trainable combiner, i.e., meta-learners such as tree-type or neural-network-type models can be trained to optimize the performance of the combined model. Theoretically, the meta-learner in the second layer can be practically any kind of regression estimator, such as SVR, ANN, or RT. In this paper, random forest is chosen as the meta-learner to combine the single heterogeneous models, given its excellent ensemble performance [35]. Random forest is an extended version of bagging ensemble learning, whose elementary unit is a binary tree [36]. It is worth noting that, although random forests are commonly used to solve classification tasks, they are used in this paper to solve regression problems. Owing to space limitations, the principles of random forests are not detailed here and can be found in reference [37].
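The two-layer stacking idea described above can be sketched in a few lines. The example below is a deliberately simplified illustration and not the model of this paper: it uses only two toy base learners (a straight-line fit and a 3-nearest-neighbour average) instead of the five standalone models, and a least-squares linear combiner instead of the random forest meta-learner. All function names and the synthetic data are hypothetical. What it does share with the described procedure is the key step: the meta-learner is trained on out-of-fold predictions of the base learners obtained by 5-fold cross-validation.

```python
def fit_line(xs, ys):
    """Base learner 1: ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=3):
    """Base learner 2: average of the k nearest training responses."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(p[1] for p in nearest) / len(nearest)

    return predict


BASE_FITTERS = [fit_line, fit_knn]


def out_of_fold_predictions(xs, ys, n_folds=5):
    """Each sample is predicted by base models that never saw it in training."""
    n = len(xs)
    meta = [[0.0] * len(BASE_FITTERS) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        for j, fitter in enumerate(BASE_FITTERS):
            model = fitter([xs[i] for i in train], [ys[i] for i in train])
            for i in held_out:
                meta[i][j] = model(xs[i])
    return meta


def fit_meta(meta, ys):
    """Stand-in meta-learner: least-squares weights for the two base outputs."""
    a11 = sum(m[0] * m[0] for m in meta)
    a12 = sum(m[0] * m[1] for m in meta)
    a22 = sum(m[1] * m[1] for m in meta)
    b1 = sum(m[0] * y for m, y in zip(meta, ys))
    b2 = sum(m[1] * y for m, y in zip(meta, ys))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


def fit_stack(xs, ys, n_folds=5):
    """Train the meta-learner on out-of-fold outputs, then refit base models on all data."""
    weights = fit_meta(out_of_fold_predictions(xs, ys, n_folds), ys)
    models = [fitter(xs, ys) for fitter in BASE_FITTERS]
    return lambda x: sum(w * m(x) for w, m in zip(weights, models))


# Hypothetical baseline data: a response that falls linearly with temperature.
temps = [float(t) for t in range(20)]
freqs = [10.0 - 0.05 * t for t in temps]
stack = fit_stack(temps, freqs)
prediction = stack(7.5)
```

Because the synthetic data are exactly linear, the combiner learns to rely almost entirely on the line fit, which shows how a trainable meta-learner can favour whichever base model generalizes best on held-out folds.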

Performance Comparison of Different Models.
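In order to quantify the predictive performance of the different models, five statistical parameters are used, namely, mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), correlation coefficient (R), and Nash-Sutcliffe efficiency (NSE). Essentially, these parameters are all designed to estimate the cumulative error between the predicted and actual measured values, and their formulas are, respectively,

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left|\hat{Y}_i - Y_i\right|$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n} \left|\frac{\hat{Y}_i - Y_i}{Y_i}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(\hat{Y}_i - Y_i\right)^2}$$

$$R = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}})}{\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 \sum_{i=1}^{n} (\hat{Y}_i - \bar{\hat{Y}})^2}}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

where $\hat{Y}$ and $Y$ are the predicted and actual values of the modal parameters, $n$ is the number of samples, and an overbar denotes the sample mean. In order to obtain a comprehensive measure of the predictive performance of the different models for comparison, the above five statistical parameters are unified into a composite performance index [37] as

$$\mathrm{CPI} = \frac{1}{N}\sum_{j=1}^{N} \frac{P_j - P_{worst,j}}{P_{best,j} - P_{worst,j}}$$

where $N = 5$ is the total number of statistical parameters, $P_j$ is the value of the $j$th statistical parameter, and $P_{worst,j}$ and $P_{best,j}$ are the worst and best values of the $j$th statistical parameter among the five values generated by the same number of models. The CPI takes values ranging from 0 to 1, where 0 (or the lowest value) represents the worst model and 1 (or the maximum value) represents the best model. In this paper, the different models are ranked from worst to best according to their CPI values.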
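The five statistical parameters and the CPI described above can be computed as in the following sketch. It assumes the standard textbook definitions of MAE, MAPE, RMSE, R, and NSE, and a CPI in which each metric is rescaled between the worst and best value among the compared models; the function names, the tie-handling rule, and the three sets of example predictions are hypothetical.

```python
from math import sqrt


def mae(y, p):
    return sum(abs(pi - yi) for yi, pi in zip(y, p)) / len(y)


def mape(y, p):
    return 100.0 * sum(abs((pi - yi) / yi) for yi, pi in zip(y, p)) / len(y)


def rmse(y, p):
    return sqrt(sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y))


def corr(y, p):
    my, mp = sum(y) / len(y), sum(p) / len(p)
    sxy = sum((yi - my) * (pi - mp) for yi, pi in zip(y, p))
    syy = sum((yi - my) ** 2 for yi in y)
    spp = sum((pi - mp) ** 2 for pi in p)
    return sxy / sqrt(syy * spp)


def nse(y, p):
    my = sum(y) / len(y)
    return 1.0 - sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / sum((yi - my) ** 2 for yi in y)


# (metric function, True if larger values are better)
METRICS = [(mae, False), (mape, False), (rmse, False), (corr, True), (nse, True)]


def composite_performance_index(y_true, predictions):
    """predictions maps a model name to its predicted values; returns name -> CPI."""
    table = {name: [m(y_true, p) for m, _ in METRICS] for name, p in predictions.items()}
    cpi = {name: 0.0 for name in predictions}
    for j, (_, higher_is_better) in enumerate(METRICS):
        column = [row[j] for row in table.values()]
        best = max(column) if higher_is_better else min(column)
        worst = min(column) if higher_is_better else max(column)
        for name, row in table.items():
            # If all models tie on a metric, count it as a full score (assumption).
            term = 1.0 if best == worst else (row[j] - worst) / (best - worst)
            cpi[name] += term / len(METRICS)
    return cpi


# Hypothetical measured frequencies and three sets of model predictions.
measured = [3.0, 3.1, 3.2, 3.3]
scores = composite_performance_index(measured, {
    "A": [3.0, 3.1, 3.2, 3.3],
    "B": [3.02, 3.08, 3.22, 3.28],
    "C": [3.1, 3.0, 3.3, 3.2],
})
```

A model that is best on every metric receives a CPI of 1, one that is worst on every metric receives 0, and all others fall in between, which is what allows the models to be ranked on a single scale.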

Damage Detection.
After an SEL model is constructed using the data from the baseline state, the trained model is used to predict the modal parameters for the future unknown state, and then the prediction residuals for the current baseline state and the future unknown state are calculated. The residuals between the predicted output $\hat{Y}$ of the model and the actual output $Y$ are expressed as

$$e = \hat{Y} - Y$$

where $e \in R^{n \times 1}$ is the prediction residual of the $i$th order modal parameter of the current baseline condition or the future unknown condition.

Theoretically, when damage occurs in the structure during a monitoring period, the constructed models will no longer achieve similar prediction performance. In other words, the deviation between the predicted and measured values increases, so damage can be detected by using the prediction residuals of the SEL model. For on-site monitoring, owing to various subjective and objective uncertainties, the modal identification results of some orders may be inaccurate or may even be false modes. In addition, when damage occurs in different parts of the structure, the changes of the modal parameters of each order may be different. Therefore, damage detection using a single mode of a certain order may lead to false negative or false positive damage detection results [13]. In order to eliminate the influence of the uncertainty of a single modal order on damage detection, the sum of the prediction residuals of each order of natural frequency is further constructed and used in the subsequent analysis. To identify damage more accurately, hypothesis testing of the prediction residuals was performed in this study to avoid false negative or false positive diagnostic results. The prediction residuals for each state may deviate from the normal distribution with some skewness. Before the hypothesis test, the Box-Cox method [38] is used to transform the skewed residual series toward a normal distribution, and the Kolmogorov-Smirnov test [39] is used to confirm that the transformed residuals follow the normal distribution. The mean and variance of the prediction residuals $e_0$ of the baseline state and $e_d$ of the future unknown state are denoted $\mu_{e_0}, \sigma^2_{e_0}$ and $\mu_{e_d}, \sigma^2_{e_d}$, respectively. The data collected during the baseline stage need to cover a sufficiently long period to allow better removal of temperature effects, so they can be regarded as a large sample, and their mean and variance can be approximated as the mean and variance of the population in the healthy condition.
According to a priori knowledge, the stiffness of the structure decreases when damage occurs, and the natural frequency decreases, so in theory the predicted value of the model in the damaged condition is always greater than the measured value. Therefore, whether an unknown future condition is damaged can be determined by a one-sided test of the mean $\mu_{e_d}$ of its prediction residuals $e_d$, specifically,
(1) $H_0: \mu_{e_d} \le \mu_{e_0}$ (null hypothesis: the structure is healthy);
(2) $H_1: \mu_{e_d} > \mu_{e_0}$ (alternative hypothesis: the structure is damaged).
The above hypothesis test can be carried out with a t-test: at the significance level $\alpha$, the null hypothesis $H_0$ is rejected and the alternative hypothesis $H_1$ is accepted if

$$t = \frac{\mu_{e_d} - \mu_{e_0}}{\sigma_{e_d} / \sqrt{n}} > t_{\alpha,(n-1)}$$

where $t_{\alpha,(n-1)}$ is the upper $\alpha$ percentile of the Student's t distribution with $n - 1$ degrees of freedom and $n$ is the number of sampling points of $e_d$.
The choice of the threshold value $t_{\alpha,(n-1)}$ directly determines the occurrence of false negative ("missing" alarm) or false positive ("false" alarm) damage detection results and should be made with caution in practical use. A specific discussion of this parameter can be found in ref. [40]. In this paper, we choose $\alpha = 0.05$.
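The decision rule can be illustrated with a short sketch. It assumes a one-sample, one-sided t-test in which the baseline residual mean is treated as the known healthy-state mean, consistent with the large-sample argument above; whether this matches the exact statistic used in the study is an assumption. The function names and residual values are hypothetical, and the critical value 1.833 is the tabulated upper 5% point of the Student's t distribution with 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(residuals_unknown, baseline_mean):
    """One-sample t statistic of the unknown-state residuals against the baseline mean."""
    n = len(residuals_unknown)
    standard_error = stdev(residuals_unknown) / sqrt(n)  # sample std (n - 1 denominator)
    return (mean(residuals_unknown) - baseline_mean) / standard_error


def is_damaged(residuals_unknown, baseline_mean, t_critical):
    """Reject H0 (healthy) when the statistic exceeds the one-sided critical value."""
    return t_statistic(residuals_unknown, baseline_mean) > t_critical


T_CRITICAL = 1.833  # upper 5% point of Student's t, 9 degrees of freedom (n = 10)

# Hypothetical residuals (predicted minus measured frequency), n = 10 each.
healthy_like = [0.01, -0.02, 0.0, 0.015, -0.01, 0.005, -0.005, 0.02, -0.015, 0.0]
damaged_like = [r + 0.05 for r in healthy_like]  # measured frequencies dropped

healthy_flag = is_damaged(healthy_like, baseline_mean=0.0, t_critical=T_CRITICAL)
damaged_flag = is_damaged(damaged_like, baseline_mean=0.0, t_critical=T_CRITICAL)
```

A smaller significance level raises the critical value and reduces false alarms at the cost of more missed detections, which is the trade-off referred to above.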

Structure Information.
To study a series of SHM problems for spatial grid structures in the natural environment, an experimental grid model was constructed near Tianjin University and sited in a naturally exposed environment [30]. The specifications of all the members of the structure (as shown in Table 1) were taken from the dimensions commonly used in real projects and were not scaled down, so that the model is closer to actual structures. The size of the structure is shown in Figure 3; its overall dimensions are 5.4 m × 5.4 m, and it contains a total of 9 orthogonal square pyramid units, each measuring 1.8 m × 1.8 m and 0.5 m high. All the components of the grid structure are made of Q235 steel, and the nodes are bolt-ball joints. Each end of the circular steel tubes is made up of bolts, casings, and closing boards and can be fixed to the joints by rotating the bolts with a spanner. The bolted balls on two symmetric sides of the structure are welded onto the supports, which are restrained on top of eight H-section steel support columns measuring 250 × 125 × 5 mm. C30 concrete foundations were installed below the columns to fix them to the ground, and the lower chord nodes of the grid are 1.0 m above the ground.

Monitoring Hardware and Continuous Monitoring Process.
The grid structure was equipped with a full set of dynamic and environmental monitoring systems after its construction in April 2021 to explore the various responses of the structure under natural environmental effects. Theoretically, ambient modal identification techniques can identify the modal parameters from the dynamic response of the structure under ambient ground excitation, but in order to increase the signal-to-noise (SN) ratio of the dynamic response, a vibration exciter was installed right beneath the bottom bolt-sphere joint of the central pyramid unit, using a metal rod to pass the excitation force to the structure. The input signal to the vibration exciter was a series of Gaussian white noise generated by a signal generator, which was then amplified by a power amplifier to guarantee that an accurate and comprehensive structural response could be captured through the dynamic tests. Before the experiment, the optimal sensor placement plan was first determined. An initial finite element model (FEM) was constructed based on the design data of the grid structure, and the 1st to 20th mode shapes (z-direction) of the FEM were obtained through dynamic simulations. These calculated mode shapes were used to compute the Fisher information matrix, and the optimal sensor placement was then determined by the effective independence method based on this matrix [41]. Accordingly, 9 accelerometers (z-direction) were installed on the structure as presented in Figure 4 and connected to the indoor dynamic acquisition instrument through waterproof wires. Additionally, 6 fiber Bragg grating (FBG) temperature and strain sensors were bonded to the surfaces of 6 different steel rods to record structural strain and temperature data. Lastly, an integrated meteorological monitoring system was installed next to the structure to acquire environmental data on temperature, humidity, wind speed, and wind direction.
The acceleration dataset used in this paper was obtained through dynamic tests of the grid structure. Specifically, three 2-min tests were carried out in each hour of the day at a sampling frequency of 500 Hz, and the data length of each test was 60,416 samples. It is worth noting that the monitoring scheme in this paper is short-term, with data collection periods of only a few days rather than months or years. In some cases, for example, the Z24 bridge, a long-term monitoring scheme is used to assess the safety of the structure under environmental and operational effects [11]. However, for vibration-based SHM methods, short-term monitoring data are also feasible given some a priori knowledge of the structural health state [42]. In addition, short-term monitoring requires less continuous data storage, which greatly reduces the storage burden of continuously sampling structural dynamics at high frequencies. It also avoids the complications that partial data anomalies in long-term monitoring introduce into damage identification. To store the acquired data conveniently and clearly, the data structure of the experiment follows the format "Date-Time-Test No.-Sensor No.", as shown in Figure 5.
The environmental data and the structural surface temperature during this period were recorded synchronously, as shown in Figure 6. Figure 7 shows one set of acceleration time history and the corresponding power spectral density (PSD) curve computed by the Welch method. Here, the raw data series with a total length of 60,416 is split into segments of length 8,192 with 50% overlap between segments. Figure 7(b) gives an indication of the first three orders of natural frequency. The natural frequencies of the structure, however, cannot be determined directly from the PSD of the data acquired by a single sensor, because the energy proportions of the first few frequencies are relatively low and the variation between different sensors also affects the calculation. Therefore, the calculation of the modal parameters (frequency, damping ratio, mode shape, etc.) requires specific processing and analysis.
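The Welch settings quoted above (500 Hz sampling, 60,416 samples per record, 8,192-sample segments, 50% overlap) can be illustrated with a minimal NumPy sketch. The function name `welch_psd`, the Hann window, and the synthetic 12 Hz test signal are assumptions made for illustration; they are not the authors' code or measured data.

```python
import numpy as np

def welch_psd(x, fs, nperseg=8192, overlap=0.5):
    """Average Hann-windowed periodograms of overlapping segments (Welch's method)."""
    x = np.asarray(x, dtype=float)
    step = int(nperseg * (1.0 - overlap))      # hop between segment starts
    window = np.hanning(nperseg)               # assumed window type
    scale = fs * np.sum(window ** 2)           # normalisation to a power density
    n_seg = (len(x) - nperseg) // step + 1
    psd = np.zeros(nperseg // 2 + 1)
    for k in range(n_seg):
        seg = x[k * step:k * step + nperseg]
        seg = (seg - seg.mean()) * window      # remove the mean, then taper
        psd += np.abs(np.fft.rfft(seg)) ** 2 / scale
    psd /= n_seg
    psd[1:-1] *= 2.0                           # fold into a one-sided spectrum
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd

# Synthetic stand-in for one record: a 12 Hz component buried in white noise
fs = 500.0
t = np.arange(60416) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 12.0 * t) + 0.5 * rng.standard_normal(t.size)

freqs, psd = welch_psd(signal, fs)
peak_hz = freqs[np.argmax(psd)]                # dominant frequency of the synthetic record
```

With these settings each record yields 13 overlapping segments and a frequency resolution of 500/8192 ≈ 0.061 Hz, which is one reason a single-sensor PSD only gives a rough indication of closely spaced modes.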
After obtaining the raw dynamic data (acceleration), the covariance-driven stochastic subspace identification method (SSI-COV) [43], which has the advantage of being fast and requiring little memory, is used to identify the modal parameters. In SSI-COV, the only parameter to be determined is the system order. A stabilization diagram is used to determine the system order and thus to separate the true modes from the spurious ones among the identified modes. Finally, the modal parameters for the tested period are obtained. A set of the first 4 mode shapes of the grid structure, as determined from the stabilization diagram, is shown in Figure 8. Figure 9 shows the variation over time of the natural frequencies f and the damping ratios ζ of the first 4 orders during the measurement period (May 18th to June 1st, healthy state). Visible periodic fluctuation patterns can be observed in Figure 9 for both modal parameters. Their fluctuation amplitudes (change rates) were then calculated from η_imax, η_imin, and η̄_i, which represent the maximum, minimum, and average values of the ith order natural frequency f_i or the ith order damping ratio ζ_i. The results in Figure 10 show that the change rates of the natural frequencies are within the range of 3.13%-8.72% during the 2-week monitoring period. In stark contrast, the change rates of the damping ratios are roughly an order of magnitude larger, within the range of 56.14%-181.93%. Because no sudden structural changes of any kind (damage, abnormal loads, etc.) were assumed to have occurred in the target structure during the monitoring period, the observed periodic fluctuations of the modal parameters can be attributed to environmental effects. The correlation between modal parameters and environmental parameters is analyzed in detail in Section 4.1.
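The equation for the fluctuation amplitude did not survive in this copy of the text. A plausible form, assuming the usual range-over-mean definition built from the three quantities the paragraph names (maximum, minimum, and average), is sketched below. The symbol δ_i is introduced here for illustration only, and the authors' exact expression may differ.

```latex
\delta_i = \frac{\eta_{i\max} - \eta_{i\min}}{\bar{\eta}_i} \times 100\%
```

Under this assumed definition a value above 100%, as reported for the damping ratios, simply means that the range over the monitoring period exceeds the mean value.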

Environmental Correlation Analysis of Modal Parameters.
In this section, the correlation between the modal parameters (natural frequencies, damping ratios, and mode shapes) and the environmental effects is studied. Taking the ith order mode shape at the initial (reference) time, φ_Ri, as the reference, the modal assurance criterion (MAC) [44] is calculated against the mode shape at each other time instant, φ_oi, so that its correlation with the environmental parameters can be analyzed. The Spearman correlation coefficient is used in this paper to quantify the correlation between parameters. It is defined as the Pearson linear correlation coefficient of the ranks of two variables and has been widely used to investigate the correlation of non-normally distributed or small-sample variables. For two parameters with n observations each, $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_n)$, with ranks $R(x_i)$ and $R(y_i)$, the coefficient is

$$\rho = \frac{\sum_{i=1}^{n} \left(R(x_i) - \bar{R}_x\right)\left(R(y_i) - \bar{R}_y\right)}{\sqrt{\sum_{i=1}^{n} \left(R(x_i) - \bar{R}_x\right)^2 \sum_{i=1}^{n} \left(R(y_i) - \bar{R}_y\right)^2}}$$

If the ranks within X and within Y are all distinct (no ties), the above equation can be simplified as

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where $d_i = R(x_i) - R(y_i)$ is the difference between the ranks of the two columns for the ith observation. The obtained values of ρ vary from −1 to +1, where ρ = −1 indicates a complete negative correlation, ρ = 1 indicates a complete positive correlation, and ρ = 0 indicates no correlation between the columns.
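The two quantities used in this section, the MAC between mode shapes and the Spearman coefficient for tie-free data, can be sketched in a few lines of NumPy. The function names and the small temperature/frequency arrays are hypothetical illustration values, not measurements from the experiment, and the MAC shown is the standard definition for real-valued mode shapes.

```python
import numpy as np

def ranks(values):
    """Ranks 1..n of a 1-D array, assuming all values are distinct (no ties)."""
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(1, len(values) + 1)
    return r

def spearman_rho(x, y):
    """Simplified Spearman formula; valid only when there are no tied ranks."""
    d = ranks(np.asarray(x)) - ranks(np.asarray(y))
    n = len(d)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

def mac(phi_ref, phi_other):
    """Modal assurance criterion between two real mode-shape vectors."""
    phi_ref = np.asarray(phi_ref, dtype=float)
    phi_other = np.asarray(phi_other, dtype=float)
    return (phi_ref @ phi_other) ** 2 / ((phi_ref @ phi_ref) * (phi_other @ phi_other))

# Hypothetical data: frequency falls monotonically as temperature rises
temperature = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
frequency = np.array([12.40, 12.31, 12.25, 12.12, 12.03])
rho = spearman_rho(temperature, frequency)

# A scaled (and sign-flipped) copy of a mode shape has MAC = 1
phi = np.array([0.2, 0.7, 1.0, 0.7, 0.2])
mac_same = mac(phi, -2.0 * phi)
mac_orth = mac([1.0, 0.0], [0.0, 1.0])
```

Because the Spearman coefficient depends only on ranks, any strictly monotonic relationship, linear or not, gives |ρ| = 1, which suits the nonlinear temperature dependence reported for natural frequencies.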
Taking the 1st order modal parameters as an example, Figure 11 shows the correlation scatter diagrams and correlation coefficient heatmaps of the 1st order modal parameters versus environmental temperature (Envir.T), environmental relative humidity (Envir.RH), wind speed, wind direction, and structural surface temperature (Struc.T). Only the structural temperature data acquired from the No. 1 temperature sensor are shown, since the other sensors yield a similar relationship with the modal parameters. It can be concluded from Figure 11 that there is a strong correlation between temperature, humidity, and the modal parameters. Compared with temperature and humidity, the correlations of the natural frequency, damping ratio, and mode shape (i.e., MAC) of the grid structure with wind speed and direction are not significant (the absolute values of the correlation coefficients are below 0.5). The main reasons may be that, on the one hand, the overall stiffness of the structure is large, so the influence of wind speed is limited; on the other hand, the wind speed in the actual field environment during testing was low and random.
To conclude, there is an obvious negative correlation between the natural frequencies and the temperature factors (both environmental temperature and structural surface temperature). A positive correlation between damping ratio and temperature can also be observed, although the data are more scattered than those for the natural frequency. This scatter is mainly caused by the uncertainty in estimating damping ratios during modal analysis. For this reason, many researchers have stated that a strong correlation between damping ratio and temperature can hardly be found [12]. Additionally, the correlation between natural frequencies and humidity is found to be positive, while a negative correlation between damping ratio and humidity is detected. Nevertheless, because the stiffness of a steel structure such as the studied space grid is hardly affected by humidity, it is reasonable to attribute these results mainly to the natural correlation between temperature and humidity. Finally, the relationship between mode shapes and environmental factors is less clear than that for natural frequencies and damping ratios. The MAC value of the 1st order mode shape remained greater than 0.98 throughout the monitoring period (except for individual outliers), with only slight fluctuation.
It can be seen from the above section that modal parameters such as the natural frequencies fluctuate significantly due to environmental changes. Such periodic fluctuations directly affect the damage monitoring process and can result in a false negative or false positive safety assessment. Therefore, predicting the fluctuations of these modal parameters from environmental factors, and thus removing the environmental effects from future monitoring data, can lead to more accurate damage assessment results.

Principal Components Extraction of Environmental Factors.
Although many environmental factors were measured, principal component analysis (PCA) [15] is used here to extract features from the raw data, so as to make the best use of the environmental monitoring data in modelling and predicting the modal parameters. The motivation is as follows: (1) as shown in Figure 12, the correlation between all environmental factors and the natural frequencies can hardly be illustrated comprehensively, so modelling directly on the raw data is less rigorous because of this uncertainty; (2) strong correlations exist among the environmental parameters themselves, for instance, an obvious negative correlation between environmental humidity and environmental temperature (Figure 13); (3) the structural surface temperature is clearly affected by the environmental temperature, especially at night, when there is no solar radiation and the structural surface temperature approaches the environmental temperature. For these reasons, using all environmental factors to model the structure would be redundant. Therefore, in this section, PCA is used to extract features from the 10 environmental factors obtained in the experiment: environmental temperature, environmental humidity, wind speed, wind direction, and six structural surface temperatures from the 6 measuring points on the structural surface shown in Figure 4. The top three principal components (PCs) account for 89.96%, 9.87%, and 0.08% of the total variance, respectively, i.e., more than 99% in total. The first three PCs (Figure 13) are therefore taken as the inputs of the prediction models.
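The feature-extraction step can be sketched as PCA via singular value decomposition. This is a minimal illustration on synthetic data: the function name `pca`, the choice to standardize each factor before decomposition, and the ten synthetic columns are assumptions, since the text does not state how the raw environmental data were scaled.

```python
import numpy as np

def pca(data, n_components):
    """PCA by SVD of the standardized data; returns PC scores and variance ratios."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # assumed scaling
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)         # share of total variance per PC
    scores = z @ vt[:n_components].T            # projection onto the leading PCs
    return scores, explained

# Synthetic stand-in for 10 correlated environmental factors
rng = np.random.default_rng(1)
n = 300
daily = np.sin(np.linspace(0.0, 14 * 2 * np.pi, n))   # temperature-like daily cycle
wind = rng.standard_normal(n)                          # independent, wind-like factor
columns = [daily * (1.0 + 0.1 * j) for j in range(7)] + [-daily, wind, 0.5 * wind]
data = np.column_stack(columns) + 0.05 * rng.standard_normal((n, 10))

scores, explained = pca(data, n_components=3)
cumulative_top3 = explained[:3].sum()
```

When the factors are as strongly inter-correlated as temperature, humidity, and surface temperature are in the measurements, a few PCs carry nearly all the variance, which is the situation the reported 89.96%, 9.87%, and 0.08% describe.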

Prediction Models Development.
In this section, six machine learning models, i.e., five standalone models and one ensemble model, are developed to estimate the modal parameters. The natural frequency is the main focus because its physical meaning is clearer, and it is therefore more widely used than the damping ratio. In addition, the natural frequency is affected more significantly by the environment than the mode shape. The regression inputs, i.e., the predictors or independent variables, are the principal components of the environmental factors, and the response variable is the rate of change of the ith order natural frequency. The modelling process is described in detail below.
Specifically, MLR uses linear terms as variables and robust regression to limit the impact of noisy data on the regression results. Since the GPR, SVR, RT, and ANN models all involve a large number of hyperparameters whose values strongly affect model accuracy, this paper uses a Bayesian optimization method [45] to select the hyperparameters of these four models. The selection was performed with 50 iterations, and the optimized parameter types and their ranges are shown in Table 2. If the predicted responses of the base models were simply obtained on the training data, the stacked ensemble could suffer from overfitting; to reduce this risk, k-fold cross-validated predicted responses are used (k = 5 in this paper). To ensure that every standalone model is trained on the same k-fold split, a random partition of the dataset is created before training. This partition defines the training and validation sets passed to each standalone model, and the root mean square error (RMSE) from 5-fold cross-validation is used to evaluate the models.
The SEL model is then trained on the cross-validated predicted responses of the standalone models, with a random forest as the meta-learner for stacking. The random forest (RF) is a meta-estimator that fits a number of decision trees (DTs) on various subsamples of the dataset and uses a bagging-based ensemble strategy to combine their predictions, thus improving accuracy while reducing the risk of overfitting. Specifically, each DT of a random forest regressor (RFR) is trained individually on samples drawn at random from the dataset, a procedure known as bootstrapping. The final prediction is then obtained by aggregating the outputs of the individual DTs (voting, which for regression amounts to averaging), which combines the learning advantages of each DT.
Researchers have indicated that the RFR is particularly good at controlling overfitting in several regression problems. To obtain the best ensemble result, the hyperparameters of the random forest are optimized using the same Bayesian optimization method with 100 iterations, and the random seeds are fixed so that the tree model is reproducible.
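The out-of-fold stacking mechanism described above (one shared 5-fold partition, base-model predictions made only on held-out folds, then a meta-learner trained on those predictions) can be sketched with NumPy alone. To stay dependency-free, this sketch deliberately substitutes simpler models: two base learners (linear least squares and k-nearest neighbours) stand in for the paper's five, and a linear meta-learner stands in for the random forest. All names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def fit_knn(X, y, k=5):
    """k-nearest-neighbour regressor; returns a predict function."""
    def predict(Xn):
        dist = np.linalg.norm(Xn[:, None, :] - X[None, :, :], axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def out_of_fold_predictions(fitters, X, y, folds):
    """Each sample is predicted by models that never saw it during training."""
    oof = np.zeros((len(X), len(fitters)))
    for val_idx in folds:
        train_mask = np.ones(len(X), dtype=bool)
        train_mask[val_idx] = False
        for j, fit in enumerate(fitters):
            model = fit(X[train_mask], y[train_mask])
            oof[val_idx, j] = model(X[val_idx])
    return oof

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(200, 3))    # stand-in for three environmental PCs
y = -0.8 * X[:, 0] + 0.3 * np.sin(3.0 * X[:, 1]) + 0.05 * rng.standard_normal(200)

folds = np.array_split(rng.permutation(len(X)), 5)   # one partition shared by all models
fitters = [fit_linear, fit_knn]
oof = out_of_fold_predictions(fitters, X, y, folds)

meta = fit_linear(oof, y)                             # meta-learner trained on OOF predictions
base_models = [fit(X, y) for fit in fitters]          # base models refit on all data

def predict_stack(X_new):
    """Stacked prediction: base-model outputs fed to the meta-learner."""
    return meta(np.column_stack([m(X_new) for m in base_models]))

base_rmse = [rmse(oof[:, j], y) for j in range(oof.shape[1])]
stack_rmse_fit = rmse(meta(oof), y)   # in-sample for the meta-learner, so optimistic
```

Sharing one partition across all base models keeps their held-out predictions comparable row by row, which is what allows them to be stacked column-wise as meta-learner inputs.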

Evaluation and Comparison of Prediction Results.
Following the above process to establish the SEL model, the monitoring data of May 18th to June 1st were divided in time sequence into a baseline state set (data before May 28th) and a future state set (data from May 29th to May 31st). Since the future state set is known a priori to be healthy, it can be used to test the performance of the model. The model trained on the baseline state set was used to predict the natural frequencies of the first 8 orders. The prediction results for the training (baseline state) and test (future state) sets are shown in Figure 14, taking the 1st order natural frequency as an example; the thick dashed line is the ideal line and the thin dashed lines are the ±20% bounds. To assess the predictive performance of these models, five statistical parameters and the composite performance index (CPI) of the test set were calculated according to equation (9), and the results are shown in Table 3. As can be seen from Figure 14 and Table 3, after Bayesian hyperparameter optimization, all six models used in this paper predict the natural frequencies with reasonable accuracy, as indicated by the higher R values and lower RMSE of the training set; the R value of the ensemble model in particular is as high as 0.80. For the test set, the ensemble model also predicts better than the standalone models. On the one hand, its test-set predictions are more concentrated within the ±20% bounds than those of the standalone models, with no sign of overfitting; on the other hand, its CPI is significantly higher than that of the standalone models.
The 72 bars were simulated using BEAM188 elements. The semi-rigidity of the bolted joints was considered by adding short rods at both ends of each bar, with a length equal to the radius of the bolt-sphere joints. In addition, regarding the connection uncertainty of the supports, a hinged connection was used for the z-direction translational restraint, and translational restraints were added in the x- and y-directions using COMBIN14 elements. The modal calculations were performed using the stochastic subspace method to obtain the first 15 orders of modal frequencies and mode shapes of the structure.

Damage Detection Application
Due to the errors between the initial FEM parameters and the physical parameters of the actual structure, the modal parameters calculated by the FEM differ somewhat from the measured modal parameters. The initial FEM is therefore updated to minimize the differences between the two. Model updating is an optimization problem, which was solved here using an improved cuckoo search algorithm. The algorithm adjusts its hyperparameters dynamically, which helps it avoid local optima and slow convergence and thus yields more efficient and accurate global search results. Specific details of the method can be found in reference [46]. The objective function of the optimization problem is constructed from the natural frequencies and the MAC, where f_C and φ_C are the natural frequencies and mode shapes calculated by the finite element model, and f_M and φ_M are the measured natural frequencies and mode shapes of the actual structure. The updated parameters p include the joint stiffness (elastic modulus of the short beams), the elastic modulus of the bars, the density of the steel material, the x- and y-direction translational constraints, and other FEM parameters. A comparison of the natural frequencies of the structure before and after updating is shown in Figure 15. The relative errors between the first 8 orders of natural frequencies obtained from the finite element calculation and the measured frequencies are within 6%; at the same time, the MAC values between the first 4 orders of calculated and measured mode shapes are greater than 0.8. The updated model has dynamic characteristics similar to those of the real structure, which indicates that the updated FEM is suitable for the following analysis.
Based on the updated FEM, the effect of environmental parameters on the natural frequency of the grid structure is simulated. From the analysis in Section 4.1, the main environmental parameter affecting the steel structure is temperature. The coefficient of linear expansion of steel is set to 1.2 × 10⁻⁵/°C to simulate the thermal deformation of the material, and, following the related studies of Xia et al. [47,48], Young's modulus is made temperature-dependent with a temperature coefficient of θ_E = −3.6 × 10⁻⁴/°C.
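The expression for the temperature-dependent Young's modulus is missing from this copy of the text. A commonly used linear form, consistent with a single coefficient θ_E in units of 1/°C, is sketched below. The reference modulus E_0 and reference temperature T_0 are symbols assumed here for illustration, and the authors' exact expression may differ.

```latex
E(T) = E_0 \left[ 1 + \theta_E \, (T - T_0) \right], \qquad \theta_E = -3.6 \times 10^{-4} \,/\,{}^{\circ}\mathrm{C}

% Worked example under this assumed form, for a temperature rise of 20 degrees C:
\frac{E(T_0 + 20)}{E_0} = 1 - 3.6 \times 10^{-4} \times 20 = 0.9928

% If only the modulus changed (mass and geometry fixed), f \propto \sqrt{E}, so
\frac{f(T_0 + 20)}{f(T_0)} = \sqrt{0.9928} \approx 0.9964
```

Under these assumptions a 20 °C rise would lower each natural frequency by roughly 0.36% through the modulus alone, which has the same sign as the negative frequency-temperature correlation reported in Section 4.1. The larger measured change rates suggest that other mechanisms, such as boundary and joint effects, also contribute.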

Setting of Environment and Damage Conditions.
In this section, the settings for the environmental inputs and the damage simulation are described. First, the input environmental parameters for the FEM are based on 30 days of in-situ environmental monitoring (from June 1st to June 30th). During this period, 6 temperature sensors distributed on the surface of the structure recorded the temperature of 6 steel bars. For elements without measuring points, the temperature data were generated by interpolation.
A gradual process of structural damage evolution was simulated to validate the proposed damage detection approach. It is assumed that, during the 30-day monitoring period, there are 2 types of working conditions, namely, the baseline condition (BC, from 1st June to 15th June) and the unknown conditions (HC, DC1, and DC2, from 16th June to 30th June, each lasting 5 days). Notably, only the data collected in the BC condition are known a priori to be structural responses in a healthy state, and only these are used to train the regression model. The other conditions are therefore treated as blind datasets for validation. The target structure in the BC and HC conditions is completely intact, without any damaged elements. From 21st June, however, the bolt-sphere joint element at which accelerometer No. 6 is located (Figure 4) is assumed to be slightly damaged. For convenience, different damage degrees were simulated by modifying the elastic modulus of the damaged members. For the DC1 condition (from 21st June to 25th June), the joint element was damaged by 10%; for the DC2 condition (from 26th June to 30th June), it was damaged by 30%, a more severe level than in DC1.
Ideally, the goal of this part of the research is to obtain a robust regressor that detects any abnormality in the data acquired from future monitoring. Consequently, validation tests are essential to demonstrate the robustness of the proposed model, in particular with respect to false positive outputs. In real engineering monitoring projects, it is almost always the case that only data from the intact structure are available. Therefore, in this case study, the model was trained using the BC data and validated using the HC, DC1, and DC2 datasets, which contain both healthy and damaged structural conditions. In theory, damaged elements within the structure cause the measured responses to deviate from the predictions of the trained model, and this deviation can serve as the basis of damage diagnosis. The conditions considered here therefore cover both the healthy state and different damage degrees. Meanwhile, noise is added to the simulated natural frequency data to account for other uncertainties in the actual monitoring process:

$$Y = Y_o + \beta \cdot \mathrm{RMS}(Y_o) \cdot N(0, 1)$$

where Y_o and Y represent the change value of the ith order natural frequency of the structure before and after the addition of noise, respectively, N(0, 1) is a random number obeying the standard normal distribution, RMS(·) is the root mean square, and β is the noise level factor, which is taken as 0.1%, 1%, and 10%, equivalent to signal-to-noise ratios of 60 dB, 40 dB, and 20 dB, respectively.
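The noise model can be checked numerically. The sketch below assumes the additive form Y = Y_o + β·RMS(Y_o)·N(0, 1), which is the form implied by the symbol definitions and by the stated SNR equivalences. The function names and the sinusoidal test series are illustrative assumptions.

```python
import numpy as np

def add_noise(y_clean, beta, rng):
    """Add zero-mean Gaussian noise scaled to beta times the RMS of the clean series."""
    y_clean = np.asarray(y_clean, dtype=float)
    rms = np.sqrt(np.mean(y_clean ** 2))
    return y_clean + beta * rms * rng.standard_normal(y_clean.shape)

def snr_db(beta):
    """Amplitude-ratio SNR in decibels for a noise level factor beta."""
    return 20.0 * np.log10(1.0 / beta)

# The three noise levels in the text map onto 60, 40, and 20 dB
levels_db = [snr_db(b) for b in (0.001, 0.01, 0.1)]

# Empirical check on a synthetic frequency-change series
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 20.0 * np.pi, 10000))
noisy = add_noise(clean, 0.10, rng)
achieved = np.sqrt(np.mean((noisy - clean) ** 2)) / np.sqrt(np.mean(clean ** 2))
```

Here `achieved` estimates the ratio of noise RMS to signal RMS, which should be close to the requested β = 0.10.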

Residual-Based Damage Detection.
As in Section 4.3 and following the flow in Figure 1, the BC data are used to develop the SEL models. Predictions are then made with the model for data from the different unknown conditions. Taking the 1st order natural frequency as an example, a comparison of the predicted and test values (10% noise) is shown in Figure 16. As can be seen from the figure, most of the tested values fall within the 95% confidence interval of the prediction if the future state is healthy (HC). However, once the structure is damaged (e.g., DC1), the model predictions deviate significantly from the measured values. The comparison of DC1 and DC2 shows that this deviation increases as the damage grows. The above analysis is still a qualitative judgment of the possibility of structural damage; further detection requires statistical hypothesis testing. The damage was therefore detected under different noise conditions according to the method introduced in Section 2.2. The prediction residuals of the different orders of natural frequencies were averaged using equation (11), which reduces the negative effect of orders that are insensitive to damage. This approach also effectively reduces the uncertainty that modal analysis introduces into some orders. Then a t-test was performed on the results of the unknown conditions against the normalized residual mean data of the healthy condition. The obtained results are shown in Table 5. It can be observed from the table that all damaged cases are correctly detected, even under different noise levels, which shows that the proposed method is robust against noise. Figure 17 compares the histograms and probability density functions (pdf) of the normalized prediction residuals e for the baseline condition and for the different unknown conditions at the highest noise level (10%). As can be seen, the mean value of the normalized prediction residual e increases gradually with the damage degree. In addition, a significance level of α = 0.05 proved to be reasonable, as all states were correctly identified.
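The residual-based test can be sketched as follows. Section 2.2 and equation (11) are not reproduced in this part of the text, so the details here are assumptions: residuals are normalized per order with baseline statistics and averaged over orders, Welch's two-sample t statistic compares an unknown condition with the baseline, and the critical value 1.96 is the large-sample normal approximation for a two-sided test at α = 0.05. The residual arrays are synthetic.

```python
import numpy as np

def normalized_mean_residual(residuals, base_mean, base_std):
    """Normalize each frequency order with baseline statistics, then average over orders."""
    return ((residuals - base_mean) / base_std).mean(axis=1)

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances) for mean(b) - mean(a)."""
    var_a = a.var(ddof=1) / len(a)
    var_b = b.var(ddof=1) / len(b)
    return (b.mean() - a.mean()) / np.sqrt(var_a + var_b)

rng = np.random.default_rng(7)
n_orders = 8
# Synthetic prediction residuals (measured minus predicted frequency, in Hz)
baseline = 0.02 * rng.standard_normal((360, n_orders))         # known healthy state
healthy = 0.02 * rng.standard_normal((120, n_orders))          # unknown but intact
damaged = 0.02 * rng.standard_normal((120, n_orders)) - 0.03   # shifted by simulated damage

mu = baseline.mean(axis=0)
sigma = baseline.std(axis=0, ddof=1)
e_base = normalized_mean_residual(baseline, mu, sigma)

T_CRIT = 1.96   # assumed large-sample critical value, two-sided alpha = 0.05
t_stats = {}
flags = {}
for name, data in [("healthy", healthy), ("damaged", damaged)]:
    e = normalized_mean_residual(data, mu, sigma)
    t_stats[name] = float(welch_t(e_base, e))
    flags[name] = abs(t_stats[name]) > T_CRIT   # True means the condition is flagged
```

Averaging the normalized residuals over the orders reduces the scatter of e by roughly the square root of the number of orders when the per-order errors are independent, which is why a modest shift in every order becomes easy to detect.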

On-Site Monitoring Dataset Validation.
To further validate the feasibility and generalization ability of the proposed methodology, on-site monitoring was carried out from 19th June to 3rd July, 2022. The monitoring period was split into three stages, corresponding to three structural health states, namely, BC/HC (19th June to 25th June), DC1 (25th June to 29th June), and DC2 (29th June to 3rd July). BC is the baseline condition of the current grid structure, which is considered intact. HC is also an intact condition but, unlike BC, its state is not known in advance; like DC1 and DC2, it belongs to the blind test data. In the DC1 and DC2 conditions, one of the steel rod elements near accelerometer No. 6 was manually damaged, with half and then all of its cross-section cut, respectively, as shown in Figure 18. Reducing the cross-sectional area of the circular steel pipe compromises the stiffness of the structure and thus simulates damage scenarios that can occur in real structures. The experiment proceeded in order from BC to HC to DC1 to DC2. The data were acquired and preliminarily processed, and the 1st and 2nd order results are shown in Figure 19 as an example. It can be seen from the figure that the natural frequency fluctuates significantly during this period due to environmental factors, and under this fluctuation it is difficult to detect damage from changes in the natural frequencies by direct observation. In addition, the measured natural frequency of the grid structure sometimes fluctuates severely because of unexpected rainy weather, even in healthy states, which would most likely lead to a false negative or false positive result. Further, the correlation between the natural frequencies (taking the 1st order as an example) and the first 3 principal components (PCs) of the environmental factors is shown in Figure 20. The aim of this analysis is to delineate the damage using the correlation, or some trend, between the frequency change and the environmental PCs. However, Figure 20 shows that the data from the different conditions overlap to a great extent. Therefore, a more powerful method is needed to reveal the damage.
After the SEL model was trained, damage detection was performed using the same procedure as in Section 5.1.3. The histograms and pdfs of the normalized prediction residuals for the different conditions are shown in Figure 21. When damage occurs, the mean of the normalized prediction residual e decreases and its variance increases significantly, which hypothesis testing identifies as a clear indication of damage.

Conclusions
In this paper, a stacked ensemble learning method for natural frequency prediction is proposed, and the future state of the structure is detected based on a statistical analysis of the prediction residuals. The dynamic data of a space steel grid structure model in the field environment were measured over a period of time. The correlation of various modal parameters with environmental factors was subsequently investigated. The influence of the environment on the modal parameters can then be isolated and removed from the measurements, allowing for improved detection of early structural damage. The conclusions of this study are summarized as follows:
(1) The daily natural frequencies and damping ratios of space grid structures fluctuate significantly due to environmental influences, while the mode shapes are not significantly affected by environmental factors. Among the various environmental factors, temperature is the primary cause of natural frequency and damping ratio variations. The damping ratio fluctuates more than the natural frequency, and the data dispersion increases with the modal order. In comparison, the natural frequency of each order is influenced more regularly by the environment and can be used as a critical feature for detecting early damage in space grid structures.
(2) The raw environmental monitoring data contained redundant information as well as noise and other uncertainties; to provide reliable input data for the modal parameter prediction model, principal component analysis was performed to extract the principal components from them. Multiple heterogeneous models (MLR, GPR, SVR, RT, and ANN), with hyperparameters optimized by Bayesian methods, predict the natural frequencies with reasonable accuracy. Still, no single model is universally applicable to all orders of frequency, owing to algorithmic preferences and the variability of the monitoring data. Using stacked ensemble learning, the prediction outputs of the five heterogeneous models listed above were integrated to achieve superior prediction results.
(3) The measured natural frequencies deviate significantly from the predicted values when the structure is damaged, and the statistical analysis of prediction residuals can improve damage detection. The accuracy of modal identification results and the sensitivity of modal parameters to damage vary with the natural frequency order. Therefore, using the mean value of the prediction residuals over all orders effectively avoids the prediction error incurred when using a single modal parameter. The t-test of the normalized prediction residuals of the future states with various degrees of damage and noise shows that the stacked ensemble learning model presented in this paper achieves high damage detection accuracy.
Notwithstanding the successful application of the stacked ensemble learning method to modal parameter prediction and damage detection under environmental changes in this paper, the proposed method still has limitations and open challenges that call for further in-depth study. In the selection of base models, other combinations of heterogeneous models can be tried to obtain better prediction results. In the choice of meta-learner for stacked ensemble learning, any nonlinear regression estimator is feasible in theory, and more options can be tested on different monitoring data to obtain the best final output. In addition, as discussed in the paper, the present method was applied successfully to short-term monitoring, and its effectiveness in long-term monitoring remains to be demonstrated. Finally, the proposed method can be extended in future research by combining it with statistical methods to solve more complex damage detection problems, such as damage localization and damage severity quantification.

Figure 1: Modal parameter prediction and damage detection process.

Figure 2: Diagram of stacked ensemble learning method.

Figure 4: Photograph of the test field and hardware equipment: (a) site photo of the structure, (b) structural sensor placement, (c) environmental parameter monitoring sensors, and (d) structural monitoring hardware equipment.

Figure 5: Data storage structure of continuous monitoring process.

Figure 6: Environmental and structural surface temperature monitoring data. (a) Environmental temperature. (b) Environmental relative humidity. (c) Wind speed and direction. (d) Structural temperature.

Figure 7: A set of dynamic monitoring time domain and frequency domain data. (a) Time history of the acceleration. (b) Power spectral density.

Figure 10: The rates of change of the natural frequencies and damping ratios.

Figure 11: Correlation of modal parameters with environmental factors (1st order).

Figure 12: Comparison of environmental parameters such as temperature and humidity. (a) Profiles of environmental temperature and environmental humidity. (b) Profiles of environmental temperature and structural surface temperature.

5.1. Simulation Dataset Validation

5.1.1. Structural Simulation. Based on the original design information of the structure, ANSYS was used to construct the initial finite element model (FEM) of the structure. Material properties of Q235 steel, such as Young's modulus, mass density, and Poisson's ratio, were used to initially define the physical parameters of the structural components of the FEM. The details are shown in Table 4.

Figure 15: Comparison between the measured and calculated natural frequencies.

Table 1: Specification and quantity of main components of the structure.

Table 2: Type and range of model parameters to be optimized.

Table 3: Prediction performance of different models (test set).

Table 4: Material parameter setting for numerical model.

Table 5: Damage detection results for different conditions.