Automated Modal Identification Based on Improved Clustering Method

Automated modal identification plays an important role in online structural damage detection and condition assessment. This paper proposes an improved hierarchical clustering method that identifies precise modal parameters by automatically interpreting the stabilization diagram. Two major improvements are made to the clustering process. First, the modal uncertainty is introduced in the first stage to eliminate as many mathematical modal data as possible, which yields a more precise clustering threshold and, in turn, more precise clustering results. Second, the boxplot is introduced in the last stage to assess the precision of the clustering results from a statistical perspective. Through an iterative boxplot analysis, the outliers of the clustering results are detected and eliminated, and the precise modal results are finally produced. The Z24 benchmark experiment data are used to validate the feasibility of the proposed method, and a comparison between the previous method and the improved method is provided. The results show that the modal uncertainty is more effective than the other modal criteria in distinguishing the mathematical modal data. The modal results produced by the clustering process alone are not statistically precise, and the boxplot can detect the outliers of the clustering results and produce more precise modal results. The improved automated modal identification method can automatically extract the physical modal data and produce more precise modal parameters.


Introduction
During the last couple of decades, structural health monitoring has developed rapidly in civil engineering [1][2][3]. As basic parameters of a structure, the modal parameters reflect its damage condition and serviceability [4][5][6][7][8][9][10]. Long-term modal parameters can also reveal the evolution of the structural service condition and are obtained by analyzing continuous monitoring data. However, continuous analysis requires a massive amount of manual work.
Thus, to obtain a massive amount of modal data, automated modal identification techniques have gradually attracted interest [11][12][13][14][15]. Many methods, whether in the time domain, such as the stochastic subspace identification (SSI) technique [11,12,16], or in the frequency domain [13], have been proposed to automatically identify the modal parameters. Of all the proposed methods, automated modal identification based on automated stabilization diagram interpretation plays an important role because the stabilization diagram gives a more specific indication of the true physical modes. The main task in automated stabilization diagram interpretation is to automatically eliminate the spurious modal data, which originate from the overestimation of the system order, and to pick out the truly physical modal results. The automated stabilization diagram interpretation is accomplished by clustering methods [11,14,16-21], especially hierarchical clustering [11,16,19,20]. For the clustering process, two critical problems, namely, the proper clustering threshold and the precision assessment of the clustering results, need to be carefully dealt with. Two kinds of thresholds, i.e., the static one and the automatically calculated one, are used in the clustering process. The static threshold is normally determined from engineering experience, but, from a fully automated perspective, such static values are not universally suitable.
Reynders et al. [16] proposed a method to automatically calculate the clustering threshold to make the modal identification process fully automated, and this method was later validated in many cases [19,20]. However, the precision of the automatically calculated threshold cannot be guaranteed. As a development, Neu et al. [19] later improved the whole clustering process, argued that the parameters used to discriminate the mathematical modal data should be carefully selected, and pointed out that the precision of the automatically calculated clustering threshold in [16] is questionable. Sun et al. [20] also proposed an approach to determine the clustering threshold and applied it to a cable-stayed bridge to automatically identify the modal parameters, but the approach lacks a physical explanation. Another limitation of all the automated modal identification methods is that the precision of the clustering results cannot be assessed. Even though an outlier detection method was proposed in [19] to detect the outliers of the clustering results, it relies on a prior distribution assumption, which does not match the reality of automated modal identification.
In this paper, an improved hierarchical clustering process is proposed to automatically identify the modal parameters and estimate the precision of the clustering results. The modal uncertainty is first introduced to better eliminate the mathematical modal data, which improves the precision of the automatically calculated clustering threshold. The boxplot is then introduced to detect the outliers of the clustering results and provide precise modal parameters.
This paper is organized as follows: first, the background of the covariance-driven SSI (SSI-Cov) method and the stabilization diagram is provided. Then, the proposed improved clustering process is explained and the key problems of the clustering process are discussed. Finally, the Z24 benchmark data are used to demonstrate the applicability of the proposed method, and the results of the improved method are compared with those of the previous method.

Background of Basic Theory in Automated Modal Identification

Review of Stochastic Subspace Identification and Stabilization Diagram.
The SSI methods are based on the classical state-space form of the discrete-time equation of motion of a linear, time-invariant N-DOF system under white noise excitation:

$$x_{k+1} = A x_k + w_k, \qquad y_k = C x_k + v_k,$$

where $k$ is the sampling instant; $A$ is the $n \times n$ state matrix; $C$ is the $l \times n$ output matrix, selecting the $l$ measured signals from the corresponding internal states collected in the discrete-time state vector $x_k$; $y_k$ is the measurement vector; and $w_k$ and $v_k$ represent the effect of unknown inputs, modelling inaccuracies, and measurement noise. These last vectors are assumed to be zero-mean realizations of stationary stochastic processes and independent of the actual state. Under this basic assumption, the modal parameters can be extracted from the output response alone.
Two types of SSI method, namely, the data-driven SSI (SSI-Data) and the covariance-driven SSI (SSI-Cov) [22], are commonly used to identify modal parameters. In this study, the SSI-Cov algorithm is used to extract modal parameters from the output data of structures. The SSI-Cov algorithm consists of the following steps [20]: (1) computation of the output covariances; (2) construction of the block Toeplitz matrix; (3) decomposition of the Toeplitz matrix; (4) estimation of the controllability and observability matrices; and (5) extraction of the modal parameters. The ambient vibration measurements are first arranged in a measurement matrix, where L is the total number of sensors and Q is the number of time steps in each set of sensor measurements. A Hankel matrix is then established from the measurements, and the output correlations $R_i$ are calculated from it. The output correlations at different time lags are combined to form a block Toeplitz matrix $T_{1|i}$. The Toeplitz matrix is then decomposed via singular value decomposition as $T_{1|i} = U S V^{T}$, where $U$ and $V$ denote orthonormal matrices and $S$ denotes the diagonal matrix that contains the positive singular values in descending order. The number of nonzero singular values of $T_{1|i}$ indicates the rank of the Toeplitz matrix. Based on the decomposition result, the observability matrix $O_i$ and the controllability matrix $\Gamma_i$ are formed from $U_1$, $S_1$, and $V_1$, the parts of $U$, $S$, and $V$ associated with the nonzero singular values. The system matrices $A$ and $C$ are then obtained from $O_i$, $\Gamma_i$, and $T_{2|i}$, where $T_{2|i}$ is composed of covariances from lag 2 to $2i$. In the end, the modal parameters of the system are extracted from the identified system matrices $A$ and $C$, where $\Delta t$ denotes the time step; $Z_r$ denotes the component of matrix $Z$; $\lambda_r^R$ and $\lambda_r^I$ denote the real and imaginary components of $\lambda_r$, respectively; and $\xi_r$ and $\Phi$ denote the damping ratio and modal shape for the rth mode, respectively.
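For orientation, the relations behind steps (1) to (5) can be sketched in their textbook SSI-Cov form. This is a hedged summary of the standard formulation in the SSI-Cov literature, not a transcription of the authors' equations: the symbols $\Psi$ (eigenvector matrix of $A$) and $\mu_r$ (discrete-time eigenvalue) are introduced here for illustration, and the scaling and notation used by the authors may differ.

```latex
\begin{align}
R_i &= \mathrm{E}\!\left[\, y_{k+i}\, y_k^{T} \right], \\
T_{1|i} &=
\begin{bmatrix}
R_i & R_{i-1} & \cdots & R_1 \\
R_{i+1} & R_i & \cdots & R_2 \\
\vdots & \vdots & \ddots & \vdots \\
R_{2i-1} & R_{2i-2} & \cdots & R_i
\end{bmatrix}
=
\begin{bmatrix} U_1 & U_2 \end{bmatrix}
\begin{bmatrix} S_1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} V_1^{T} \\ V_2^{T} \end{bmatrix}
= U_1 S_1 V_1^{T}, \\
O_i &= U_1 S_1^{1/2}, \qquad \Gamma_i = S_1^{1/2} V_1^{T}, \\
T_{2|i} &=
\begin{bmatrix}
R_{i+1} & R_i & \cdots & R_2 \\
\vdots & \vdots & \ddots & \vdots \\
R_{2i} & R_{2i-1} & \cdots & R_{i+1}
\end{bmatrix}
= O_i\, A\, \Gamma_i
\;\;\Rightarrow\;\;
A = O_i^{\dagger}\, T_{2|i}\, \Gamma_i^{\dagger}, \\
C &= \text{first } l \text{ rows of } O_i, \\
A &= \Psi\, \mathrm{diag}(\mu_r)\, \Psi^{-1}, \qquad
\lambda_r = \frac{\ln \mu_r}{\Delta t}, \\
f_r &= \frac{|\lambda_r|}{2\pi}, \qquad
\xi_r = -\frac{\lambda_r^{R}}{|\lambda_r|}, \qquad
\Phi = C\, \Psi .
\end{align}
```

Here $(\cdot)^{\dagger}$ denotes the pseudo-inverse, and the shifted Toeplitz matrix $T_{2|i}$ differs from $T_{1|i}$ only by one additional factor of $A$ in every block, which is what allows $A$ to be isolated.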
To reduce the interference of noise with the modal results, the reference-based SSI-Cov method (SSI-Cov/ref) was later proposed [22]. The main difference between SSI-Cov/ref and SSI-Cov is the Hankel matrix. For the SSI-Cov/ref method, the Hankel data matrix is modified to use $Y_i^{\mathrm{ref}}$, the ambient vibration measurement matrix of specifically selected channels, which are called reference channels. Monitoring data from reference channels usually have a high signal-to-noise ratio, which reduces the interference of noise with the modal results.
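Whichever variant supplies the state matrix, its discrete-time eigenvalues are converted to frequency and damping in the same way. The following is a minimal sketch of that last step using the standard relations; the function name `modal_from_discrete_pole` and its arguments are illustrative and do not come from the paper.

```python
import cmath
import math


def modal_from_discrete_pole(mu: complex, dt: float) -> tuple[float, float]:
    """Convert one discrete-time eigenvalue of A into (frequency in Hz, damping ratio).

    Assumes the standard relation lambda = ln(mu) / dt and that the mode lies
    below the Nyquist frequency, so the principal branch of the logarithm applies.
    """
    lam = cmath.log(mu) / dt              # continuous-time eigenvalue
    freq_hz = abs(lam) / (2.0 * math.pi)  # undamped natural frequency
    damping = -lam.real / abs(lam)        # damping ratio
    return freq_hz, damping


# Usage: build a pole from a known mode and recover its parameters.
f_true, xi_true, dt = 3.87, 0.01, 0.01
omega = 2.0 * math.pi * f_true
lam_true = complex(-xi_true * omega, omega * math.sqrt(1.0 - xi_true**2))
f_est, xi_est = modal_from_discrete_pole(cmath.exp(lam_true * dt), dt)
```

Applying this to every eigenvalue at every model order yields the candidate modes from which a stabilization diagram is drawn.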

Stabilization Diagram.
For the SSI method, one input, the system order, has to be set before the identification process is conducted. However, for a real structure, the system order is not known and cannot be estimated precisely beforehand. When the system order is set too low, true modes may fail to be identified, and when it is set too high, spurious modal parameters are calculated. The traditional way of choosing the system order, looking for a gap in the singular value plot, fails because no obvious gap can be found for real structures, especially for those under operational conditions. The stabilization diagram is a graph of the modal frequencies identified at different system orders. The calculated frequencies are plotted with frequency as the abscissa and system order as the ordinate. In a stabilization diagram, the points representing physical modal frequencies at different system orders form several vertical lines because physical modes stabilize across system orders, while the spurious modal frequencies appear scattered.
The stabilization diagram thus gives a specific indication that the modal data forming those vertical lines are true modal results and should be picked out. The modal identification process is therefore transformed into the process of extracting the vertical lines in a stabilization diagram.
Mathematical Problems in Engineering 3

Overall Process of Automated Modal Identification.
The aim of stabilization diagram based automated modal identification is to pick out the stabilization axes automatically and precisely. Considering the limitations of the existing methods, an improved automated modal identification process is proposed. The proposed process includes four major steps, namely, the automated elimination of mathematical data, the clustering-based automated stabilization diagram interpretation, the automated selection of physical modal clusters, and the boxplot-based outlier detection of the clustering results. The modal uncertainty is introduced in the first step to better distinguish the mathematical modal data and produce a clearer stabilization diagram for later clustering. In the fourth step, the boxplot is introduced to estimate the precision of the clustering results; the outliers are eliminated and precise modal results are provided. The whole process is shown in detail in the flowchart in Figure 1.

Modal Validation Criteria.
To obtain a precise clustering threshold, as many of the identified mathematical modal results as possible must be eliminated in the first stage. Reynders et al. [16] proposed a k-means clustering method, with k equal to 2, to separate the calculated modal results into certainly mathematical and probably physical ones by using a vector of modal validation criteria. The authors summarized the validation criteria and classified them into two categories, namely, the hard validation criteria and the soft validation criteria. The hard validation criteria are that (1) the identified damping ratios must lie between 0 and 0.1 and (2) the identified mode must appear as a complex conjugate pair. The soft validation criteria are the other criteria, which cannot be applied through static values or specific physical principles. The soft validation criteria are summarized in Table 1; detailed information can be found in [16] and is not repeated here.
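$d(\lambda_i, \lambda_j)$ is a distance between the continuous-time eigenvalues $\lambda_i$ and $\lambda_j$ of modes i and j. The eigenvalue combines the eigenfrequency and the damping ratio.
$d(f_{ui}, f_{uj})$, $d(\zeta_i, \zeta_j)$, and $d(\mathrm{MTN}_{s\infty i}, \mathrm{MTN}_{s\infty j})$ are dimensionless distance measures of modal frequency, damping ratio, and modal transfer norm between modes i and j. All these relative difference criteria are calculated in the same way from a pair of values $\alpha_i$ and $\alpha_j$, where $\alpha_i$ denotes $\lambda_i$, $f_{ui}$, $\zeta_i$, and $\mathrm{MTN}_{s\infty i}$, respectively. $\mathrm{MAC}(\phi_i, \phi_j)$ and $\mathrm{MPC}(\Phi_i)$ are two criteria related to the modal shape. $\mathrm{MAC}(\phi_i, \phi_j)$ is the correlation coefficient between modal shapes i and j, and $\mathrm{MPC}(\Phi_i)$ measures the complexity of modal shape i. $\mathrm{MAC}(\phi_i, \phi_j)$ is calculated as

$$\mathrm{MAC}(\phi_i, \phi_j) = \frac{\left|\Phi_i^{T} \Phi_j\right|^{2}}{\left(\Phi_i^{T} \Phi_i\right)\left(\Phi_j^{T} \Phi_j\right)},$$

where $\Phi_i$ and $\Phi_j$ denote the modal shapes of two different modes. All these criteria can be used to estimate the similarity between two modes. $d(\lambda_i, \lambda_j)$, $d(f_{ui}, f_{uj})$, $d(\zeta_i, \zeta_j)$, and $d(\mathrm{MTN}_{s\infty i}, \mathrm{MTN}_{s\infty j})$ are relative criteria, and their values equal 0 for ideal physical modes and 1 for ideal mathematical ones. $\mathrm{MAC}(\phi_i, \phi_j)$ and $\mathrm{MPC}(\Phi_i)$ estimate similarities of modal shapes, and their values go to 1 when the associated modal shapes are ideally physical.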
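The two families of criteria above can be sketched as follows. This is an illustrative implementation, not the authors' code: the normalization by the larger magnitude in `relative_difference` is an assumption based on common practice in the literature around [16], and the MAC uses the conjugate inner product so that it also covers complex mode shapes.

```python
def relative_difference(a: complex, b: complex) -> float:
    """Dimensionless distance between two modal quantities (0 means identical).

    Assumed form: |a - b| / max(|a|, |b|); the exact normalization is an assumption.
    """
    denom = max(abs(a), abs(b))
    return abs(a - b) / denom if denom else 0.0


def mac(phi_i, phi_j) -> float:
    """Modal assurance criterion between two mode shapes (1 means collinear)."""
    cross = sum(complex(x).conjugate() * y for x, y in zip(phi_i, phi_j))
    norm_i = sum(abs(x) ** 2 for x in phi_i)
    norm_j = sum(abs(y) ** 2 for y in phi_j)
    return abs(cross) ** 2 / (norm_i * norm_j)


# Usage: two nearly identical modes give a small distance and a MAC close to 1.
d_freq = relative_difference(3.87, 3.90)
mac_same = mac([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
mac_orth = mac([1.0, 0.0], [0.0, 1.0])
```

Scaling a mode shape by any constant leaves the MAC unchanged, which is why it is suitable for comparing shapes identified at different model orders.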

k-Means Clustering.
The k-means clustering method is used in the first stage to separate the calculated modal data into two parts, i.e., the certainly mathematical modal data and the probably physical ones. The k-means algorithm is a typical partition-based clustering algorithm, and its main idea is to separate the sample data into several groups by iteratively minimizing an index. The clustering procedure can be explained as follows. Suppose $X = \{x_1, \ldots, x_n\}$ is a given data set of n samples. The k-means algorithm partitions $X$ into $C$ clusters by minimizing an objective function, equation (14), where $K$ denotes the number of modes in each cluster; $C_X$ denotes the sample vector at sample data $X$; and $C_R$ and $C_S$ denote the centroids of the physical and spurious mode clusters, respectively. The process works iteratively, updating the centroid of each cluster until the objective function in equation (14) is minimized. The modal results are finally categorized into two clusters, and the cluster with centroid $C_S$ is discarded.
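A minimal sketch of this two-cluster partition is given below, assuming the usual k-means objective (sum of squared Euclidean distances to the cluster centroids) and caller-supplied initial centroids. The function name `k_means` and the two-component feature vectors in the usage example are hypothetical.

```python
import math


def k_means(samples, init_centroids, max_iter=100):
    """Lloyd's algorithm: returns (labels, centroids) for fixed initial centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Usage with hypothetical criteria vectors [relative distance, MAC]:
# cluster 0 starts at the ideal physical corner, cluster 1 at the ideal spurious one.
samples = [(0.01, 0.98), (0.02, 0.95), (0.60, 0.30), (0.90, 0.10)]
labels, centroids = k_means(samples, init_centroids=[(0.0, 1.0), (1.0, 0.0)])
```

Starting the two centroids at the ideal physical and ideal mathematical values makes the cluster labels meaningful, so the cluster that started at the spurious corner can be discarded directly.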

Modal Uncertainty.
Although the soft validation criteria and the k-means clustering process help to clear the stabilization diagram, our investigation, presented later, shows that the soft validation criteria do not always work well in distinguishing the mathematical modal data. Further, if the mathematical modal data cannot be eliminated completely, the clustering threshold calculated later will not be precise, which may result in imprecise clustering results. In this study, a more effective validation criterion, the modal uncertainty, is introduced in the first stage to automatically distinguish the mathematical modal data. The modal uncertainty originates from the non-white noise input signal of the SSI method. The authors in [23][24][25] derived the calculation process, which was later made more practical by an efficient calculation method [26]. The modal uncertainty is a good indicator for distinguishing mathematical and physical modes. However, the modal uncertainty does not improve the precision of the identified modal parameters; the points in the stabilization diagram are the same whether or not the modal uncertainty is computed. Döhler and Mevel [26] validated the fact that the modal uncertainties of mathematical modes are much larger than those of physical ones and suggested using 1.5% of the frequency value as the threshold for distinguishing mathematical modal results, which is almost the same value as the static frequency threshold in the traditional clustering process. The general procedure for calculating the modal uncertainty is summarized in Figure 2.
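The 1.5% rule can be sketched as a simple filter. The record layout (keys `f` for the frequency and `sigma_f` for its estimated uncertainty) and the function name are hypothetical; computing the uncertainty itself follows [23]-[26] and is outside this sketch.

```python
def drop_uncertain_modes(modes, rel_threshold=0.015):
    """Keep only modes whose frequency uncertainty is at most rel_threshold * frequency."""
    return [m for m in modes if m["sigma_f"] <= rel_threshold * m["f"]]


# Usage with made-up values: the second mode has a 5% uncertainty and is dropped.
candidates = [
    {"f": 3.87, "sigma_f": 0.010},
    {"f": 7.10, "sigma_f": 0.355},
    {"f": 9.80, "sigma_f": 0.050},
]
kept = drop_uncertain_modes(candidates)
```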

Hierarchical Clustering-Based Automated Stabilization Diagram Interpretation.
The main purpose of automated stabilization diagram interpretation is to automatically extract the true modal data, and this is usually accomplished with a clustering method. The automated modal identification process aims to find the stabilization axes formed by modal data that are similar in frequency, modal shape, and damping ratio. The clustering method groups the modal data based on the distances between different modes, and the modes with distances smaller than a threshold are grouped together. The hierarchical clustering method [16,19,20] is used in the second step to automatically interpret the stabilization diagram. The hierarchical clustering method was first introduced in automated modal identification in [27], using the eigenfrequency difference and the MAC value as distance measures to estimate the similarities between different modes.
Then, the authors in [16] proposed a method to automatically calculate the threshold in the clustering process to automate the whole process, and it was later used in [19,20].
However, examples indicate that the precision of this threshold is questionable [19]. The identified modal parameters include three main modal indexes, namely, the modal frequency, modal shape, and damping ratio. The clustering process groups similar modes together based on the similarities between different modes. The similarities can be estimated through $d_f$, $1 - \mathrm{MAC}$, and $d_\xi$, which measure the similarity between modes in frequency, modal shape, and damping ratio, respectively. For modes similar to each other, all three criteria are small, while for dissimilar modes they are large. $d_f$ is the relative difference between the frequencies $f_i$ and $f_j$ of two modes, and MAC is calculated as

$$\mathrm{MAC}(\Phi_i, \Phi_j) = \frac{\left|\Phi_i^{T} \Phi_j\right|^{2}}{\left(\Phi_i^{T} \Phi_i\right)\left(\Phi_j^{T} \Phi_j\right)},$$

where $f_i$ denotes the frequency; $\Phi_i$ denotes the modal shape; and $T$ denotes the matrix transposition. However, because the identified damping ratios are highly scattered, in both operational modal analysis and experimental modal analysis, the $d_\xi$ values are much more scattered than those of $d_f$ and $1 - \mathrm{MAC}$, which may severely distort the similarity estimation between different modal results. Thus, $d_f$ and $1 - \mathrm{MAC}$ are commonly used in the clustering process to estimate the similarities of different modal results [16,19,20]. Another problem is the non-uniform values of the similarities. The similarities between physical modes are not identical and vary within a limited range. The clustering process groups similar modes together by comparing the similarities with a threshold, also called the cutoff distance, and the modes with distances smaller than the threshold are grouped together.

Figure 1: Flowchart of the proposed automated modal identification process. Step 1: mathematical results elimination; Step 2: clustering-based automatic stabilization diagram interpretation; Step 3: automatic physical modal results selection; Step 4: boxplot-based outlier detection.

Table 1: Soft validation criteria, including the distance between the eigenvalues of modes i and j, the relative difference in modal transfer norm between modes, and the modal phase collinearity.
In this study, the only input, the cutoff distance, is automatically calculated from the $d_f$ and MAC values of all the remaining probably physical modal results. It is calculated from $\mu_p$ and $\sigma_p$, the mean value and standard deviation of $d_f + 1 - \mathrm{MAC}$ of all the remaining probably physical modal data in the cleared stabilization diagram. The similarities between modes vary within a limited range, but there is no specific method prescribing how the threshold should be selected. Considering the scatter of the similarities, the threshold is calculated automatically from a statistical perspective. The mean value and standard deviation of $d$ reflect the characteristics of the similarities and are used in many studies [16,20]. Hierarchical clustering is a clustering method that groups the data by building a nested hierarchical clustering tree. The cutoff distance determines whether small data sets should be merged into a new larger one, and it directly determines the final clustering results. The automatically calculated threshold is influenced by the result of the mathematical modal data elimination. If the spurious modal data are not removed completely or physical modal data are eliminated mistakenly, which can be called under-curing or over-curing, the remaining probably physical data lead to an inaccurate clustering threshold. Therefore, unless as many mathematical modal data as possible are removed, the threshold will not be precise, leading to inaccurate clustering results.
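The threshold and cut described above can be sketched as follows. Several choices here are assumptions made for illustration and are not confirmed by the text: the threshold is taken as the mean plus `n_sigma` standard deviations with `n_sigma = 2` (a common choice in the literature around [16]), the relative frequency difference is normalized by the larger frequency, and the tree is cut using single linkage, which is equivalent to finding connected components of the graph whose edges are pairs closer than the cutoff.

```python
import statistics


def mode_distance(f_i: float, f_j: float, mac_ij: float) -> float:
    """Distance d_f + 1 - MAC between two modes (normalization of d_f is assumed)."""
    return abs(f_i - f_j) / max(f_i, f_j) + 1.0 - mac_ij


def cutoff_distance(distances, n_sigma: float = 2.0) -> float:
    """Threshold from the sample mean and sample standard deviation of the distances."""
    return statistics.mean(distances) + n_sigma * statistics.stdev(distances)


def cluster_by_cutoff(dist, cutoff: float):
    """Single-linkage clusters for a symmetric distance matrix, cut at `cutoff`."""
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                parent[find(i)] = find(j)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Usage with made-up modes: three near 3.87 Hz, two near 4.90 Hz.
freqs = [3.860, 3.870, 3.865, 4.900, 4.910]
group = [0, 0, 0, 1, 1]  # used only to fabricate MAC values for the example
dist = [
    [mode_distance(fi, fj, 1.0 if gi == gj else 0.1) for fj, gj in zip(freqs, group)]
    for fi, gi in zip(freqs, group)
]
labels = cluster_by_cutoff(dist, cutoff=0.07)
```

Other linkage rules (for example, average linkage) would merge clusters differently near the cutoff, so the linkage is a design choice that should follow the reference implementation being reproduced.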

Selection of Physical Modal Results.
After the clustering process, many clusters containing similar modal data are formed. Because the elimination of mathematical modal data is incomplete, some clusters consisting of mathematical modal data are formed as well. The main difference between the physical and the mathematical clusters is the number of data: physical clusters contain more data, while mathematical clusters contain fewer. To make the modal identification process fully automated, the k-means clustering method [16], with k equal to 2, is used here again to automatically separate the clusters into two groups based on the number of data in each cluster. The group containing the clusters with more data is considered physical and is provided as the final clustering result.
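This selection step amounts to a one-dimensional 2-means split of the cluster sizes. A minimal sketch follows, assuming the two centroids are initialized at the smallest and largest cluster size; the function name and the sizes in the usage example are hypothetical.

```python
def select_physical_clusters(sizes, max_iter=100):
    """Return indices of clusters assigned to the 'large' group by 1-D 2-means."""
    lo, hi = float(min(sizes)), float(max(sizes))
    if lo == hi:  # all clusters equally populated: nothing to separate
        return list(range(len(sizes)))
    for _ in range(max_iter):
        large = [s for s in sizes if abs(s - hi) < abs(s - lo)]
        small = [s for s in sizes if abs(s - hi) >= abs(s - lo)]
        new_lo, new_hi = sum(small) / len(small), sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):  # centroids stable: converged
            break
        lo, hi = new_lo, new_hi
    return [i for i, s in enumerate(sizes) if abs(s - hi) < abs(s - lo)]


# Usage: three well-populated clusters and four sparse ones.
cluster_sizes = [60, 55, 3, 2, 58, 5, 1]
physical = select_physical_clusters(cluster_sizes)
```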

Boxplot-Based Outlier Detection.
The precision of the clustering results mainly depends on the clustering threshold. The threshold determines how fat the extracted stabilization axes will be; in other words, it determines how scattered the data forming a stabilization axis will be. The precision of the extracted modal parameters, especially the damping ratios, is important for model updating, condition assessment, and similar tasks. However, until now, no specific index has been provided to estimate the precision of the clustering threshold, which leaves the precision of the clustering results questionable.
Neu et al. [19] proposed a method to detect the outliers of clustering results under the assumption that the modal data in the clusters follow the t-distribution. However, no solid evidence has been provided to indicate what distribution the modal data follow, and the distribution characteristics are not known before the true modal results are produced. Therefore, a method that can assess the precision of the clustering results without using distribution information must be introduced.
To meet the above requirement, the boxplot is introduced in this paper to detect the outliers of the clustering results and provide precise modal parameters at the same time. The boxplot has been widely used for outlier detection in statistics because it identifies outliers based on the characteristics of the data themselves and needs no prior distribution information. The boxplot was first proposed by John Tukey in 1977. It is a statistical chart used to analyze the distribution characteristics of the data. Boxplot-based outlier detection is often used to detect outliers in intelligent algorithms. It calculates the maximum, minimum, median, and upper and lower quartiles of the sample data and uses the upper and lower bounds as the indices to determine whether a datum should be treated as an outlier. The bounds are calculated as

$$U = Q_3 + 1.5\,\mathrm{IQR}, \qquad L = Q_1 - 1.5\,\mathrm{IQR},$$

where $U$ and $L$ denote the upper and lower bounds, respectively, and $Q_1$ and $Q_3$ denote the first and third quartiles of the sample data. IQR denotes the interquartile range and is calculated as

$$\mathrm{IQR} = Q_3 - Q_1.$$

The calculation of the quartiles of the boxplot depends on no hypothesis about the distribution characteristics. In other words, the boxplot needs no distribution assumption for the sample data and detects the outliers based on the characteristics of the data being analyzed. Even if there are outliers in the data sample, they have little influence on the quartiles, which ensures that the data remaining after outlier detection reveal the true characteristics of the data.
In this step, an iterative outlier detection process is established. The boxplot is applied to detect the outliers of the clustered frequency and modal damping. Once an outlier is found in the cluster, it is eliminated, and the boxplot is used again to check whether there are still outliers in the remaining data. This process continues until no outlier is found. The boxplot-based outlier detection process is applied to every cluster from the automated clustering process, and it consists of the following steps:
(1) Start from the frequency values.
(2) Calculate the upper and the lower bound of the data sample.
(3) Look for the outliers of the data and eliminate them if any are found.
(4) Repeat steps (1) to (3) until no outlier is detected.
(5) Process the identified modal damping of the cluster with steps (2)∼(4) until no outlier is found.
(6) Check the processed data set again. If no outliers are found, the remaining data are provided as the final precise modal results; otherwise, repeat steps (1) to (5).
The boxplot-based outlier detection process identifies outliers with no distribution assumption for the data, so it reveals the true characteristics of the identified results. The iterative outlier detection and elimination process works as a data refining method and finally produces precise modal results automatically.
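The six steps above can be sketched as a short refinement loop over one cluster. This is an illustrative sketch: the 1.5 fence factor is Tukey's conventional value, the quartile interpolation rule (`method="inclusive"`) is an assumption because the text does not specify one, and representing a mode as a `(frequency, damping)` tuple is a hypothetical layout.

```python
import statistics


def tukey_bounds(values, k: float = 1.5):
    """Lower and upper boxplot bounds: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def refine_cluster(modes, k: float = 1.5):
    """Iteratively drop boxplot outliers in frequency, then damping, until none remain.

    `modes` is a list of (frequency, damping) tuples belonging to one cluster.
    """
    modes = list(modes)
    changed = True
    while changed and len(modes) >= 2:
        changed = False
        for idx in (0, 1):  # 0: frequency, 1: damping ratio
            while len(modes) >= 2:
                low, high = tukey_bounds([m[idx] for m in modes], k)
                kept = [m for m in modes if low <= m[idx] <= high]
                if len(kept) == len(modes):  # no outlier left for this quantity
                    break
                modes = kept
                changed = True
    return modes


# Usage with made-up values: the 4.20 Hz entry lies far outside the cluster.
cluster = [
    (3.86, 0.010), (3.86, 0.011), (3.87, 0.010), (3.87, 0.012),
    (3.87, 0.011), (3.88, 0.011), (4.20, 0.011),
]
refined = refine_cluster(cluster)
```

Because values between the first and third quartiles always lie within the bounds, the loop can never empty a cluster, and it terminates because every pass either removes at least one entry or stops.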

Structure Description.
The Z24 bridge was part of the road connection between the villages of Koppigen and Utzenstorf, Switzerland, overpassing the A1 highway between Bern and Zurich. It was a post-tensioned concrete two-cell box-girder bridge with a main span of 30 m and two side spans of 14 m. A forced and an ambient operational vibration test were performed before the bridge was demolished, and 291 degrees of freedom were measured in total, with three acceleration components on the pillars and mainly vertical and lateral accelerations on the bridge deck. The data were collected in 9 different setups, with 5 channels, called reference channels, common to all setups. The acceleration data of all nine experiment scenarios were presented as benchmark data for assessing the performance of modal identification and damage detection methods. A detailed description of the Z24 bridge and the experiment scenarios is provided in [28]. The collected acceleration data of the 9 scenarios are used here to validate the proposed method. The raw acceleration data are first processed by the SSI-Cov/ref method to calculate the modal candidates and establish the stabilization diagrams. Then the proposed method is applied to the stabilization diagrams to automatically extract the physical modal parameters. The identified modal parameters can then be used to assess the service condition of the bridge or to update the FEM model; this falls outside the scope of the paper and is not discussed.

Modal Identification and Uncertainty Calculation.
The acceleration data of the nine scenarios were processed with the SSI-Cov/ref algorithm. The five reference channels were chosen for the calculation of the modal uncertainty, l = 50 was chosen as half the number of block rows in the data matrix, and the model order range was set from 2 to 160 in steps of 2, the same as in [16]. The original experiment data were downloaded from the website of Leuven University, which ensures that all the factors influencing the identification results stay the same.
Nine stabilization diagrams are created for the nine experimental scenarios; because of space limitations, only the results of the first, fifth, and ninth setups are provided in Figure 3. Figures 3(b), 3(d), and 3(f) show the identified modal frequencies and the associated modal uncertainties, which are plotted as horizontal bars. Physical modes stabilize across model orders, and several vertical lines formed by physical modes can be seen; these vertical lines are the potential stabilization axes that need to be extracted automatically. Meanwhile, many spurious modal data can also be seen in the stabilization diagrams, making the true stabilization axes unclear.

Mathematical Results Elimination.
The soft validation criteria, the hard validation criteria, and the modal uncertainty threshold were used in sequence to eliminate the mathematical results. The soft validation criteria vector consisted of eight indexes (equation (19)). The initial values of the cluster centers of the physical and mathematical modal data were set as

$$V_P = [0, 0, 0, 1, 1, 0, 1, 0]; \qquad V_M = [1, 1, 1, 0, 0, 1, 0, 1], \tag{20}$$

where $V_P$ and $V_M$ denote the initial cluster centers of the definitely physical and the certainly mathematical modal data, respectively. The 2-means clustering method was used to separate the modal data into mathematical and probably physical ones, and the mathematical data were discarded. Then the probably physical results that did not meet the hard validation criteria were eliminated directly. Finally, the modal uncertainty was applied to the remaining probably physical data, and the modal data with uncertainties larger than 1.5% of the associated frequency values were considered mathematical and eliminated. Figures 4(a), 4(c), and 4(e) show the stabilization diagrams cleared by the proposed improved method. Some obvious vertical lines formed by physical frequency points show up clearly, and most of the visually scattered modal data outside the stabilization axes, which are considered mathematical modal results, are eliminated. The improved method eliminates almost all of the mathematical modal data and retains the probably physical ones in the stabilization axes.
To better illustrate the advantage of the improved method in mathematical data elimination, the cleared stabilization diagrams without using modal uncertainty are also provided in Figures 4(b), 4(d), and 4(f ). Compared with the clearing results of the improved process, many scattered spurious modal data are retained, and these modal data are mistakenly treated as probably physical results for later calculation of clustering threshold.
Compared with Figures 4(a), 4(c), and 4(e), Figures 4(b), 4(d), and 4(f) contain more scattered frequency points outside the vertical lines, and many single points, which are definitely mathematical modal results, are retained in Figures 4(b), 4(d), and 4(f). The main purpose of the mathematical data elimination process is to eliminate as many as possible of the scattered data points considered mathematical modal data and to retain the data forming the vertical lines, which are considered physical modal data. The remaining modal data determine the precision of the clustering threshold calculated later. The more mathematical modal data are retained, the larger the threshold will be, because the distances between mathematical modes are much larger than those between physical ones, as is verified in Section 4.3, leading to imprecise clustering results. The elimination result indicates that the modal uncertainty is more effective than the soft validation criteria in distinguishing mathematical modal data.

Automated Calculation of Clustering Threshold.
Based on the cleared stabilization diagrams, the clustering threshold was automatically calculated. The sample mean and standard deviation of $d_f + 1 - \mathrm{MAC}$ of the remaining probably physical modal data, which are shown in Figure 4, are used to calculate the clustering thresholds. Figure 5 provides the automatically calculated thresholds based on the cleared stabilization diagrams processed with the proposed method and with the previous method [16], respectively. In many scenarios, the thresholds calculated from the stabilization diagrams cleared by the previous method [16] are almost twice as large as those calculated by the proposed method, because more mathematical data are retained in the stabilization diagrams processed by the previous method. Most of the clustering thresholds calculated by the proposed method are near 0.07, which is almost the same as the static threshold based on experience, indicating the feasibility of the proposed method. The scatter of the distances between mathematical modal data is much larger than that between physical ones, which results in a larger sample mean and sample standard deviation of $d_f$ and $1 - \mathrm{MAC}$. The maximum threshold, obtained in setup 3 by the previous method, exceeds 0.2, as is also mentioned in [19], which is too large for civil structures. By introducing the modal uncertainty, more mathematical modal results can be eliminated and more precise clustering thresholds can be calculated.

Automated Clustering.
The automatically calculated thresholds were directly used as cutoff distances in the hierarchical clustering process. Figures 6(a), 6(c), and 6(e) show the clustering results of scenarios 1, 5, and 9 by the proposed improved method. Different data clusters are plotted with different markers. The remaining physical stabilization axes are marked with solid vertical lines. The first five stabilization axes are found in all nine scenarios. The final clustering results produced by the previous method [16] are also provided for comparison in Figures 6(b), 6(d), and 6(f), where more stabilization axes are identified. These stabilization axes, except the first two, are all visibly wider than the ones identified by the proposed improved method. Some data points lying clearly outside the stabilization axes, which are regarded as mathematical results, are grouped into the clusters, which is a limitation of the previous method [16]. Because the truly physical clusters are selected in step 3 according to the number of data in each cluster, this limitation may cause clusters that contain many spurious modal data to be regarded as physical results.
On the contrary, the stabilization axes identified by the proposed method are much thinner and clearer, and no data points visibly outside the stabilization axes are included in any true modal cluster, indicating that the proposed method yields more precise clustering results. As discussed before, the precision of the clustering results is influenced by the cutoff distance. The cutoff distance automatically calculated by the proposed method is more reasonable than the one calculated by the previous method [16], which is why the clustering results of the proposed method are more precise.
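The role of the cutoff distance can be illustrated with a small sketch of agglomerative clustering that stops merging once the closest pair of clusters is farther apart than the cutoff. The linkage rule is not restated in this section, so average linkage is assumed here; for brevity the example distance uses only the relative frequency difference (the 1 − MAC term is omitted), and all values are made up.

```python
def agglomerative_clusters(items, distance, cutoff):
    """Average-linkage agglomerative clustering stopped at a cutoff distance."""
    clusters = [[i] for i in range(len(items))]

    def linkage(c_a, c_b):
        # Average pairwise distance between the members of two clusters
        total = sum(distance(items[i], items[j]) for i in c_a for j in c_b)
        return total / (len(c_a) * len(c_b))

    while len(clusters) > 1:
        # Find the closest pair of clusters
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cutoff:
            break  # every remaining pair is farther apart than the cutoff
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [[items[i] for i in c] for c in clusters]


def relative_frequency_distance(f_a, f_b):
    """Simplified distance: relative frequency difference only (assumption)."""
    return abs(f_a - f_b) / max(f_a, f_b)


# Made-up frequencies (Hz) from a cleared stabilization diagram
frequencies = [3.86, 3.87, 3.85, 4.90, 4.91, 9.77]
clusters = agglomerative_clusters(frequencies, relative_frequency_distance, cutoff=0.07)
```

A larger cutoff merges more neighbouring points into each cluster, which mirrors the wider stabilization axes obtained with the larger thresholds of the previous method. For real data sets a library routine such as `scipy.cluster.hierarchy.linkage` combined with `fcluster(..., criterion="distance")` would normally replace this quadratic-time sketch.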
Different clustering results can be seen in the fifth and ninth scenarios. The first two physical clusters identified in all setups are almost the same, because the mathematical data near these physical modes are eliminated almost completely. The other physical clusters identified by the previous method are wider than the ones identified by the proposed improved method, because of the incomplete elimination of mathematical data and the associated larger threshold. The remaining mathematical modal data near the physical data are grouped with them because of the larger threshold, which increases the number of data in the associated clusters.
The misclustering of mathematical modal data increases the risk that clusters which would otherwise contain few modal data, and would therefore be regarded as mathematical results, are treated as physical results because of their inflated number of data. The stabilization axes identified near 20 Hz differ considerably. The previous method produced 5 physical clusters in scenario 5, while the proposed improved method provides only 2. However, as is shown, the data points in the 3 extra stabilization axes are visibly scattered, indicating that the reliability of these extra stabilization axes is questionable. Figure 7 shows the first five mode shapes identified by the proposed method. The nine partial mode shapes identified in the different scenarios are assembled using the least squares method. The assembled mode shapes are the same as those in [16], indicating the correctness of the identified results.
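The least squares assembly of partial mode shapes can be sketched as follows. The details of the assembly are not given in this section, so the sketch assumes that every scenario shares a set of reference sensors with the first scenario and that each partial shape is rescaled by the factor that best fits its reference part to that of the first scenario in the least squares sense; the function names and data layout are hypothetical.

```python
def least_squares_scale(ref_setup, ref_base):
    """Scale factor alpha minimizing ||alpha * ref_setup - ref_base||^2."""
    num = sum(a.conjugate() * b for a, b in zip(ref_setup, ref_base))
    den = sum(abs(a) ** 2 for a in ref_setup)
    return num / den


def assemble_mode_shape(setups):
    """Assemble a global mode shape from (reference_part, roving_part) pairs.

    The reference part of the first setup is used as the common scale.
    """
    base_ref = setups[0][0]
    assembled = list(base_ref)
    for ref_part, roving_part in setups:
        alpha = least_squares_scale(ref_part, base_ref)
        assembled.extend(alpha * x for x in roving_part)
    return assembled


# Made-up example: the second setup is scaled by a factor of 2 relative to the first
setups = [
    ([1.0, 0.5], [0.2, 0.4]),
    ([2.0, 1.0], [1.2, 1.6]),
]
global_shape = assemble_mode_shape(setups)
```

The same expression handles complex-valued mode shapes, since the conjugate in the numerator makes the scale factor the complex least squares solution.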

Boxplot-Based Outlier Detection.
The boxplot was then applied to each cluster. The frequency results were processed first and then the damping ratios. For each modal parameter, if a value was identified as an outlier, the frequency and its associated damping ratio were eliminated together, which ensures that the produced modal results are precise in both frequency and damping ratio. Figures 8 and 9 show the boxplots of the first five frequencies of scenarios 1 and 9 before the outlier detection process; many outliers were found at each order because they do not conform to the distribution of the data in their own cluster. The boxplots of the refined modal frequencies are provided in Figures 10 and 11, and no outliers can be found. The refined results are provided as the final clustering results. The boxplot can also reveal the distribution characteristics of the sample data. For normally distributed data, the median of the sample, represented by the red middle line in the box, coincides with the mean, which is represented by the blue line in the box. The boxplots of the frequencies and damping ratios show that not all of the distributions are normal and that most of them are skewed, which means that the former method [19], which identifies outliers based on the t-distribution, is not strictly justified theoretically. The outlier detection result of the method in [19] is also provided as a comparison. Figure 12(a) shows the relationship between damping ratio and frequency for the initial clustering result, and some obvious outliers can be found in each order. Figures 12(b) and 12(c) show the relationship between damping ratio and frequency after eliminating the outliers by the proposed method and the previous method [19], respectively. Obvious outliers can still be found in Figure 12(c) in the fourth- and fifth-order damping ratios, indicating that the previous method fails to find all the outliers.
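The iterative boxplot screening described above can be sketched as follows, using the fences Q1 − 1.5IQR and Q3 + 1.5IQR. Frequencies are screened first and damping ratios second, each frequency is removed together with its damping ratio, and the passes are repeated until no outlier remains. The quartile convention (linear interpolation, `method="inclusive"`) and the function names are assumptions of this sketch, since boxplot implementations differ in how they compute quartiles.

```python
from statistics import quantiles


def boxplot_fences(values, k=1.5):
    """Return the lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(freqs, damps, k=1.5):
    """Iteratively drop (frequency, damping) pairs flagged in either parameter."""
    pairs = list(zip(freqs, damps))
    changed = True
    while changed:
        changed = False
        for column in (0, 1):  # 0: frequency first, 1: damping ratio second
            if len(pairs) < 4:
                return [p[0] for p in pairs], [p[1] for p in pairs]
            low, high = boxplot_fences([p[column] for p in pairs], k)
            kept = [p for p in pairs if low <= p[column] <= high]
            if len(kept) < len(pairs):
                pairs, changed = kept, True
    return [p[0] for p in pairs], [p[1] for p in pairs]


# Made-up cluster: one frequency outlier (4.20 Hz) and one damping outlier (3.0 %)
freqs = [3.85, 3.86, 3.86, 3.87, 3.87, 3.88, 4.20]
damps = [0.80, 0.90, 0.85, 0.90, 0.95, 3.00, 0.90]
kept_freqs, kept_damps = remove_outliers(freqs, damps)
```

Because the fences are built from quartiles rather than from the mean and standard deviation, they are not pulled towards the outliers and require no assumption about the shape of the distribution.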
As discussed before, the method in [19] identifies the outliers based on the assumption that the sample data are normally distributed, which does not agree with the real distribution of the sample data. Another reason is that the mean value, which is used for outlier detection in the previous method [19], is easily influenced by outliers. The mean value is normally larger than the true one because it is corrupted by the outliers, so fewer outliers are detected. The clustering results calculated by the proposed method, the previous method, and manual analysis are provided in Table 2. The mean values of the frequencies and damping ratios are similar to each other [16]. The standard deviation of the modal results calculated by the improved automated method is smaller, indicating that the proposed method can successfully extract the modal parameters and produce more precise results. Table 2 lists the mean values and standard deviations of the Z24 bridge modal results computed by the improved method (I), the previous method (P), and manual analysis (M).

Sensitivity Analysis.
For boxplot analysis, the criteria Q1 − 1.5IQR and Q3 + 1.5IQR are proposed from a statistical perspective. To investigate the influence of the criteria on the outlier detection result, two more criteria, namely, twice and one time the IQR, are utilized to detect the outliers of the clustering results. Figure 13 provides the relationship between damping ratio and frequency after outlier detection with the different criteria. When the criteria are set larger, more data are retained and the retained modal data are more scattered. With twice the IQR, some obvious outliers, visible in the fourth-order damping ratio, are retained, indicating that this criterion is too loose, whereas with one time the IQR most of the outliers are eliminated. Theoretically, the smaller the criteria are set, the more precise the remaining data will be. However, setting the criteria too small is not recommended because it increases the risk that true modal data are eliminated. The criteria Q1 − 1.5IQR and Q3 + 1.5IQR are therefore still recommended: the outlier detection result is acceptable because all the obvious outliers are eliminated, which on the one hand guarantees the precision and on the other hand avoids eliminating true modal data.
Figure 9: Boxplots of the first five frequencies before outlier detection for scenario 9.
Figure 10: Boxplots of the first five frequencies after outlier detection for scenario 1.
Figure 11: Boxplots of the first five frequencies after outlier detection for scenario 9.
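The effect of the IQR multiplier can be illustrated with a single-pass sketch that counts how many values survive for multipliers of 1, 1.5, and 2. The damping values are made up, and the quartile convention (`method="inclusive"`) is an assumption of the sketch rather than a detail taken from the paper.

```python
from statistics import quantiles


def retained_count(values, k):
    """Number of values inside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if low <= v <= high)


# Made-up damping ratios (%) of one cluster, with two moderately high values
damping = [0.80, 0.85, 0.88, 0.90, 0.90, 0.92, 0.95, 1.00, 1.12, 1.17]

# Sweep the multiplier: a larger multiplier widens the fences and retains more data
retained_by_k = {k: retained_count(damping, k) for k in (1.0, 1.5, 2.0)}
```

In this example the number of retained values grows with the multiplier, which matches the trend described for Figure 13: a wider fence keeps more scattered data, while a narrower fence removes more points at the risk of discarding true modal data.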

Conclusions
This paper presents an improved clustering process to identify the modal parameters by automatically interpreting the stabilization diagram. The modal uncertainty is introduced to eliminate the mathematical modal data because of its high effectiveness. The boxplot is introduced to detect the outliers of the clustering results in order to produce precise modal results. The Z24 benchmark data are analyzed to validate the feasibility of the proposed method. The following conclusions can be drawn:
(1) The modal uncertainty is highly effective in distinguishing mathematical modal data compared with the other soft validation criteria. By introducing the modal uncertainty in the first stage, the proposed method identifies and eliminates more mathematical modal data and produces a clearer stabilization diagram.
(2) The thresholds automatically calculated from the modal data cleared by the improved method are smaller and more reasonable than the previous ones, because the improved method eliminates more mathematical modal data, which reduces the scatter of d_f and 1 − MAC of the remaining modal data. With precise clustering thresholds, more reasonable clustering results can be provided.
(3) The modal results produced by the clustering process are not precise from a statistical perspective. With the help of the boxplot, the outliers of the clustering results are found and eliminated. The boxplot identifies the outliers of the modal results without any prior assumption about the distribution, relying only on the characteristics of the sample data themselves, which makes it suitable for automated modal identification.

Data Availability
The data used to support the findings of this study were downloaded from https://bwk.kuleuven.be/bwm/z24.

Conflicts of Interest
The authors declare that they have no conflicts of interest.
Mathematical Problems in Engineering