The Method for Determining Optimal Analysis Length of Vibration Data Based on Improved Multiscale Permutation Entropy

Research on damage diagnosis or safety monitoring based on structural vibration response is one of the hot issues in the engineering field. The characteristic information of the structure is obtained by analyzing the structure response data. In the process of data analysis, the choice of data length is very important, because it affects the validity of the structure monitoring results. At present, the selection of data length is usually subjective, which reduces the rigor of the structure monitoring process. Therefore, a method based on improved multiscale permutation entropy (IMPE) is proposed to determine the optimal data analytical length (ODAL) of vibration data. This method creatively applies multiscale permutation entropy (MPE) to the field of data length analysis when processing nonlinear and nonstationary signals and optimizes MPE with the help of the improved coarse-grained method to obtain IMPE. IMPE is sensitive to different data lengths: the entropy changes as the data length increases and eventually stabilizes. Here, the stable value is defined as the standard entropy. The entropy reaching 97% of the standard entropy is used as the effective entropy, and the data length corresponding to the effective entropy is selected as the ODAL of the vibration data. This method is suitable for many fields, provides a reliable data analytical length for data analysis, and has good engineering practicability.


Introduction
The structural vibration data of large-scale water conservancy projects is the main carrier that reflects the structural vibration characteristics [1][2][3]. With the continuous upgrading of monitoring automation, and in the face of massive monitoring data, selecting an effective vibration data analytical length and extracting data feature information with efficient and reliable data processing methods is an important basis for realizing real-time safety monitoring of hydraulic structures. The analysis and application of long time series data involve many fields [4][5][6]. In the field of data-driven feature information extraction from monitoring data, Dong et al. used the ensemble empirical mode decomposition (EEMD) method to analyze the relationship between different vibration sources and the structural operation safety of offshore wind turbines and realized the safety monitoring of offshore wind turbines [7]. Lian et al. used the FM-CEEMD decomposition method to process the time series of modal vibration information of high double arch dams, which can effectively avoid the loss of modal information and improve the accuracy of modal parameter identification [8]. Deng et al. used a variety of evolutionary algorithms to identify the characteristics of the complex multiobjective optimization problem of airport gate allocation and obtain the optimal solution [9,10]. Chen et al. applied a wavelet method to model vibration response data to get the amplitude-dependent nonlinear damping and stiffness of the HAPB system, which provides the basis for vibration response analysis and prediction [11]. These scholars extracted the required characteristic information from the monitoring data with the help of various methods and obtained many achievements.
However, the data length, as another important factor influencing the analysis results, has seldom received attention. Usually, the length of the vibration data analyzed in damage diagnosis and safety monitoring of structures is chosen subjectively and rather arbitrarily. As a result, the influence of different data lengths on the subsequent results cannot be excluded, which can be misleading. The data length determines the richness of the signal information it contains. If the data length is too long, calculations become time-consuming and cumbersome, which increases the burden of safety monitoring of the hydraulic structure. If the data length is too short, feature information is lost or incomplete, which causes misjudgment in the safety monitoring process. Therefore, choosing the optimal data analytical length (ODAL), one that retains complete feature information and has high calculation efficiency, is the key step to ensure the accuracy and real-time performance of hydraulic structure safety monitoring. In view of this, finding a method that can effectively evaluate the richness of data features is a prerequisite for determining the ODAL.
The multiscale permutation entropy (MPE) method, proposed by Aziz and Arif on the basis of permutation entropy, is a method for detecting dynamic mutation and time series permutation [12]. It is characterized by simple calculation, high sensitivity, and strong antinoise ability and can effectively reflect small changes in the time series of nonlinear and nonstationary signals [13]. Because of its sensitivity in detecting dynamic mutations, this method has become popular in data analysis and has been widely used in biomedicine, mechanical damage diagnosis, stock analysis, and other fields [14][15][16][17][18][19]. With the development and extension of MPE, the improved multiscale permutation entropy (IMPE) has been proposed and applied in many fields. Humeau-Heurtier et al. put forward an improved coarse-graining method to improve the MPE, which reduces the dependence of the analysis results on the data length and greatly improves the accuracy of data analysis [20]. Azami and Escudero applied the IMPE in the field of biomedical signal analysis and found that IMPE improves the reliability of entropy estimation compared with the MPE [21]. Zhang et al. found that the IMPE can more accurately extract the deep information of the time series, and the entropy value obtained is more consistent and stable [22]. Minhas and Singh analyzed the potential features of bearing nonlinear vibration signals and implemented bearing fault diagnosis with the help of IMPE and found that IMPE was more accurate than other state-of-the-art permutation-based feature extraction methods [23]. In view of this, we believe that IMPE can effectively represent different characteristics of the analyzed data in the form of entropy and quantify the connection between characteristic information and the analyzed data.
However, few researchers have analyzed the variation patterns of IMPE in feature identification at different data analytical lengths and the selection principles for applying IMPE to determine the ODAL for feature analysis of data.
In this study, we mainly focus on the methods and applications for obtaining the ODAL in vibration data analysis of hydraulic structures. Based on previous research, we believe that IMPE, as an advanced and effective method in the field of feature recognition, can be applied to the evaluation of feature information richness of data. In this paper, based on the identification of vibration data feature information by IMPE, we explore the entropy change rules under different data lengths to find the ODAL with complete feature information and high computing efficiency. The research results show that this method can select the ODAL for safety monitoring of hydraulic structures, improve the efficiency of vibration data analysis, and provide a reference basis for setting the length of data analysis for similar projects. The rest of this paper is organized as follows. Section 2 introduces the principles of the IMPE and the method for selecting the ODAL. In Section 3, the method is applied to the simulation. Then, in Section 4, the method of determining the ODAL is applied to the vibration data of practical engineering and the experimental results are presented. Finally, some conclusions are drawn in Section 5.

Principle of IMPE and Method of Selecting ODAL
In this section, the principles of IMPE and the method for selecting the ODAL are presented.

Multiscale Permutation Entropy.
Based on the permutation entropy algorithm, the multiscale permutation entropy algorithm coarse-grains a one-dimensional time series at multiple scales, calculates the permutation entropy of the coarse-grained time series at each scale, and finally forms the multiscale permutation entropy. First, consider a one-dimensional time series $\{X(i);\ i = 1, 2, \ldots, n\}$. The coarse-grained sequence is obtained as

$$y_j^{(s)} = \frac{1}{s} \sum_{i=(j-1)s+1}^{js} X(i), \qquad j = 1, 2, \ldots, [n/s],$$

where $y_j^{(s)}$ is the coarse-grained sequence and the scale factor $s$ determines the degree of coarse-graining. The length of the coarse-grained time series decreases as the scale factor $s$ increases. When $s = 1$, the coarse-grained sequence is the original sequence; $[n/s]$ denotes $n/s$ rounded down to an integer, since only complete windows of $s$ points are averaged.
Reconstruct the coarse-grained sequence $y_j^{(s)}$ to obtain the reconstructed coarse sequence $y_l^{(s)}$ and the reconstructed components $Y_l^{(s)}$:

$$Y_l^{(s)} = \left\{ y_l^{(s)},\ y_{l+\tau}^{(s)},\ \ldots,\ y_{l+(m-1)\tau}^{(s)} \right\},$$

where $m$ and $\tau$ denote the embedding dimension and time delay, respectively. $Y_l^{(s)}$ represents the $l$-th reconstructed component.

Shock and Vibration
Arrange the elements of $Y_l^{(s)}$ in ascending order; if equal values occur in a reconstructed component, they keep the order of their indices:

$$y_{l+(l_1-1)\tau}^{(s)} \le y_{l+(l_2-1)\tau}^{(s)} \le \cdots \le y_{l+(l_m-1)\tau}^{(s)},$$

where $l_1, l_2, \ldots, l_m$ denote the index of the column of each element in the reconstructed component $Y_l^{(s)}$. For any coarse-grained sequence $y_l^{(s)}$, a set of symbol sequences $s(r) = (l_1, l_2, \ldots, l_m)$ can be obtained for $r = 1, 2, \ldots, R$, with $R \le m!$. Let the probability of occurrence of each symbol sequence be $P_r$ $(r = 1, 2, \ldots, R)$. Permutation entropy is defined as follows:

$$H_P(m) = -\sum_{r=1}^{R} P_r \ln P_r.$$

Normalize the above permutation entropy:

$$H_P = \frac{H_P(m)}{\ln(m!)}, \qquad 0 \le H_P \le 1.$$

The $H_P$ value represents the degree of randomness of the time series $y_l^{(s)}$. The smaller the value is, the lower the complexity of the time series is. Conversely, the greater the value is, the more random and the more complex the time series is.
In summary, after coarse-graining the original time series $\{X(i);\ i = 1, 2, \ldots, n\}$, the reconstructed coarse sequences $y_l^{(s)}$ are obtained, and the permutation entropy of each coarse-grained sequence is calculated to obtain the MPE, that is, $H_{mp}(X) = \{H_p(1), H_p(2), \ldots, H_p(s)\}$.
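The MPE procedure summarized above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `coarse_grain`, `permutation_entropy`, and `mpe`, the default parameter values, and the use of NumPy are all assumptions made for the example.

```python
import math
import numpy as np

def coarse_grain(x, s):
    """Traditional coarse-graining: means of non-overlapping windows of length s."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) // s                     # only complete windows are used
    return x[:n_windows * s].reshape(n_windows, s).mean(axis=1)

def permutation_entropy(y, m=4, tau=1):
    """Normalized permutation entropy (0..1) for embedding dimension m, delay tau."""
    y = np.asarray(y, dtype=float)
    n_vectors = len(y) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")
    # Row l is the reconstructed component (y[l], y[l+tau], ..., y[l+(m-1)tau]).
    vectors = np.stack([y[k * tau:k * tau + n_vectors] for k in range(m)], axis=1)
    # A stable sort keeps tied values in index order.
    patterns = np.argsort(vectors, axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()                   # relative frequency of each pattern
    entropy = -np.sum(p * np.log(p))
    return float(entropy / math.log(math.factorial(m)))

def mpe(x, m=4, tau=1, max_scale=5):
    """Permutation entropy of the coarse-grained series at scales 1..max_scale."""
    return [permutation_entropy(coarse_grain(x, s), m, tau)
            for s in range(1, max_scale + 1)]

# Illustrative run on synthetic white noise (not data from the paper).
rng = np.random.default_rng(0)
print(mpe(rng.standard_normal(3000), m=4, tau=1, max_scale=5))
```

Because each scale shortens the series to roughly n/s points, the estimates at large scales rest on few ordinal patterns, which is the weakness the improved coarse-graining method addresses.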

Improved Coarse-Graining Method.
Coarse-graining is an indispensable part of the calculation of MPE. The essence of the traditional coarse-graining method is as follows: for scale factor $s$, the original time series is divided into nonoverlapping windows of length $s$, the mean value of the data points in each window is calculated, and a new time series is formed from the obtained mean values. The specific process diagram is shown in Figure 1(a). The disadvantage is that when the scale factor $s$ is too large, the coarse-grained time series is too short and contains too few data points, leading to an inaccurate estimate of MPE. In order to solve this problem, Humeau-Heurtier et al. [20] proposed a coarse-graining process based on the moving average and applied it to the calculation of sample entropy. Here, this optimized coarse-graining method is applied to the calculation of MPE to improve the accuracy of the results. The specific process diagram is shown in Figure 1(b). For a given scale factor $s$, the corresponding coarse-grained sequence is obtained by the moving average method. The formula is as follows:

$$z_j^{(s)} = \frac{1}{s} \sum_{i=j}^{j+s-1} X(i), \qquad 1 \le j \le n - s + 1,$$

where $z_j^{(s)}$ is the coarse-grained sequence. The sequence length after coarse-graining is $L$:

$$L = n - s + 1.$$

If the original time series has length $n = 600$ and $s = 15$, the length of the shortest coarse-grained sequence obtained by the improved coarse-graining method is 586, whereas the length of the shortest coarse-grained sequence obtained by the traditional coarse-graining method is 40. Therefore, the improved coarse-graining method greatly improves the validity of the result.
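The length comparison above can be checked with a short Python sketch. The function names are hypothetical, and implementing the moving average with `np.convolve(..., mode="valid")` is an implementation choice for this example, not something specified in the paper.

```python
import numpy as np

def coarse_grain_traditional(x, s):
    """Means of non-overlapping windows of length s (output length n // s)."""
    x = np.asarray(x, dtype=float)
    k = len(x) // s
    return x[:k * s].reshape(k, s).mean(axis=1)

def coarse_grain_moving_average(x, s):
    """Means of overlapping windows of length s, slid one sample at a time
    (output length n - s + 1)."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(s) / s, mode="valid")

x = np.arange(600, dtype=float)                   # any series with n = 600
print(len(coarse_grain_traditional(x, 15)))       # 40
print(len(coarse_grain_moving_average(x, 15)))    # 586
```

The moving-average series keeps almost the full length of the original, so the permutation entropy at scale 15 is estimated from 586 points instead of 40.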

Parameter Selection of the Phase Space Reconstruction.
In order to ensure the validity of the entropy values at each scale, the embedding dimension $m$ and the delay time $\tau$ need to be determined before calculating the entropy value of each coarse-grained sequence. As an important step before entropy calculation, the selection of phase space reconstruction parameters is closely related to the accuracy of the signal analysis results. The selection methods can be roughly divided into independent determination and joint determination. Both are feasible, but for the detection of abnormal conditions, the independent determination method is more accurate [24]. Here, the embedding dimension $m$ and delay time $\tau$ are obtained by the false nearest neighbor (FNN) and mutual information (MI) methods, respectively. The selection criteria are as follows: the appropriate dimension $m$ is the dimension at which the percentage of false nearest neighbors in the phase space tends to zero; the optimal delay time $\tau$ is the delay at which the mutual information reaches its first minimum. For the measured data, $m \ge 2$ and $\tau \ge 1$.
An inaccurate choice of $m$ leads to failure to detect dynamic mutations in the time series, and an inaccurate choice of $\tau$ makes the degree of correlation between data points too large or too small. Determining the phase space reconstruction parameters $m$ and $\tau$ with the above two methods can effectively reduce the error of the permutation entropy calculation and improve the accuracy of the analysis results.
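As an illustration of the delay-selection criterion, the following Python sketch estimates the mutual information between a series and its lagged copy with a two-dimensional histogram and returns the first local minimum. The histogram estimator, the bin count, the maximum lag, and the function names are assumptions of this sketch; the paper does not state how MI was estimated, and the FNN step for $m$ is not shown.

```python
import numpy as np

def mutual_information(x, lag, bins=16):
    """Histogram (plug-in) estimate of MI, in nats, between x(t) and x(t + lag)."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-lag], x[lag:]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)          # marginal of x(t)
    py = pxy.sum(axis=0, keepdims=True)          # marginal of x(t + lag)
    mask = pxy > 0                               # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def first_minimum_delay(x, max_lag=50, bins=16):
    """Smallest lag at which the MI curve has a local minimum."""
    mi = [mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] <= mi[i + 1]:
            return i + 1                         # lags start at 1
    return int(np.argmin(mi)) + 1                # fallback: global minimum

# Illustrative signal: a noisy sinusoid (hypothetical, not from the paper).
rng = np.random.default_rng(0)
t = np.arange(5000)
signal = np.sin(2 * np.pi * t / 40) + 0.1 * rng.standard_normal(t.size)
print(first_minimum_delay(signal))
```

Histogram estimates of MI are biased for short series, so in practice the bin count and maximum lag would need to be checked against the data at hand.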

Optimal Length Selection of Data Analysis.
The determination of the ODAL based on IMPE uses the sensitivity of the entropy value to sudden changes in a dynamical system to find a suitable data length, so as to solve the problem of data length selection in data analysis. Generally, it can be divided into the following four parts: coarse-graining, parameter selection and phase space reconstruction, calculation of the entropy value, and selection of the sequence length. The specific process diagram is shown in Figure 2, and the detailed steps are as follows:
(1) Lay out sensor devices at the key positions of the measured structures to obtain the vibration data of the structures $\{X(i);\ i = 1, 2, \ldots, n\}$.
(2) Extract data segments of different lengths $N$, select an appropriate scale factor $s$, and coarse-grain the vibration data. For the given scale factor $s$, the corresponding coarse-grained sequence $z_j^{(s)}$ is obtained by moving average.

(3) Determine the phase space reconstruction parameters $m$ and $\tau$ of the coarse-grained data by the false nearest neighbor (FNN) and mutual information (MI) methods, respectively, and carry out the phase space reconstruction.
(4) Calculate the entropy values $PE_1, PE_2, \ldots, PE_s$ of each time series after coarse-graining to obtain the multiscale permutation entropy $\mathrm{MPE}_s = \{PE_1, PE_2, \ldots, PE_s\}$. The mean of the multiscale permutation entropy is used as the basis for measuring the complexity of the vibration data, where $\mathrm{MPE} = (PE_1 + PE_2 + \cdots + PE_s)/s$.
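Steps (2) to (4) can be sketched as follows: for each candidate length $N$, the first $N$ samples are coarse-grained by moving average at scales $1, \ldots, s$, and the normalized permutation entropies are averaged. This is a sketch under stated assumptions: $m$ and $\tau$ are passed in as fixed values rather than re-estimated by FNN and MI, the segment is taken from the start of the record, and all function names are hypothetical.

```python
import math
import numpy as np

def permutation_entropy(y, m, tau):
    """Normalized permutation entropy (0..1) of series y."""
    y = np.asarray(y, dtype=float)
    n_vectors = len(y) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")
    vectors = np.stack([y[k * tau:k * tau + n_vectors] for k in range(m)], axis=1)
    patterns = np.argsort(vectors, axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)) / math.log(math.factorial(m)))

def impe_mean(x, m=4, tau=4, max_scale=15):
    """Mean of the permutation entropies PE_1..PE_s of the moving-average
    coarse-grained series, plus the per-scale values."""
    x = np.asarray(x, dtype=float)
    per_scale = []
    for s in range(1, max_scale + 1):
        z = np.convolve(x, np.ones(s) / s, mode="valid")   # length n - s + 1
        per_scale.append(permutation_entropy(z, m, tau))
    return float(np.mean(per_scale)), per_scale

def entropy_vs_length(x, lengths, m=4, tau=4, max_scale=15):
    """Mean IMPE of the first N samples of x for each candidate length N."""
    x = np.asarray(x, dtype=float)
    if max(lengths) > len(x):
        raise ValueError("requested length exceeds the available data")
    return {n: impe_mean(x[:n], m, tau, max_scale)[0] for n in lengths}

# Illustrative run on synthetic white noise (not data from the paper).
rng = np.random.default_rng(0)
record = rng.standard_normal(5000)
for n, h in entropy_vs_length(record, [200, 500, 1000, 2000, 3000, 4000, 5000]).items():
    print(n, round(h, 3))
```

The resulting length-entropy table is the input to the final part of the procedure, the selection of the sequence length.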

Verify the Improved Coarse-Graining Method.
In order to test the superiority of the improved coarse-graining method in MPE, white noise with a short data length is taken as an example, and the entropy values of the time series at different scales are calculated with the original and the improved coarse-graining methods. The specific experimental data and results are shown in Figure 3. White noise is a time series of a purely random process with a theoretical entropy value of 1. Since the accuracy of the entropy value in an actual test depends to some extent on the data length, the actual permutation entropy of white noise ranges from 0.90 to 0.97 for the two shorter data lengths N = 200 and N = 500. It can be seen from the graph that, with both the original and the improved coarse-graining methods, the measured entropy value of the white noise signal decreases as the scale factor increases; that is, the shorter the data length is, the smaller the entropy value is and the greater the difference between the actual value and the theoretical value is. The difference is that, with the improved coarse-graining method, the measured entropy value is less affected by the scale factor. As the scale factor increases, the entropy value decreases more slowly and the accuracy increases. Compared with the original method, the entropy value of the improved coarse-graining method is closer to the theoretical value for the same data length and scale factor.
In addition, according to the experimental results, whether before or after the improved coarse-graining method, the entropy value of white noise signal with a data length of 500 is more accurate than the entropy value of white noise signal with a length of 200. It also shows that, besides the coarse-graining method, the sequence length is also an important factor affecting the accuracy of the entropy value.

Test the Influence of Data Length on IMPE.
It can be known from the above simulation experiments that the sequence length will affect the accuracy of the entropy value.
If the sequence length is too short, the credibility of the entropy value is low. If the sequence length is too long, there are various disadvantages, such as cumbersome and time-consuming calculation and blurring of mutations. Therefore, finding a sequence length that gives accurate results, is moderate in size, and is convenient to compute is a key part of ensuring the validity of the analysis results. At the same time, it also provides a reference standard for the selection of data length in subsequent signal analysis. Here, white noise, which has a standard entropy value, is taken as an example for analysis. The specific test data and results are shown in Figure 4, which presents the entropy curves of white noise with length N = 200, 500, 1000, 2000, 3000, 4000, and 5000 on scales s ∈ [1, 15]. As the sequence length increases, the measured entropy value gradually approaches the true value 1; as the scale factor increases, the longer the sequence is, the less it is affected by the scale factor and the more accurate the entropy value is. The results of IMPE are very sensitive to the data length, and the effect of the scale is associated with the data length. In addition, it can be seen that when the data length of white noise reaches 4000-5000, the entropy changes very slightly as the data length increases and finally stabilizes at 0.998. This shows that the accuracy of the actual entropy value does not keep increasing as the data length grows without bound; its accuracy has a certain limit. Therefore, in signal analysis, a longer data length gives a more accurate analysis only within a certain length range. The purposes of entropy calculation are signal analysis and evaluation, which place high requirements on the accuracy of the results. However, a higher accuracy requirement calls for richer data information, that is, a longer data length, which makes the calculation more demanding.
Therefore, when the data length N reaches a certain level, the change of the IMPE entropy value with further increases of N can be ignored, and the N and IMPE at this point are defined as the standard data length and the standard entropy value. In actual calculation, using the standard data length as the analysis length gives the highest accuracy, but this length is usually large, so the entropy calculation and analysis take a long time and the efficiency is low. Therefore, provided that the analysis effect is not affected, the entropy value reaching 97% of the standard value is taken as the effective entropy value, and the corresponding data length is taken as the ODAL. For example, in the above experiment, when the white noise data length reaches 5000 or more, the entropy value no longer increases with the data length and reaches 0.998, which differs only slightly from the theoretical value. Therefore, the entropy value 0.998 corresponding to the data length N = 5000 is taken as the standard entropy value, and 97% of it, 0.968, is the effective entropy value used to select the reasonable analysis length. The results show that white noise with data length N = 3000 has an entropy value of 0.971, which satisfies the accuracy requirement, so the ODAL of white noise is 3000.
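The 97% selection rule can be written as a small Python helper. This sketch assumes that the longest tested length is the standard data length (that is, the entropy has already stabilized there) and returns the shortest length whose entropy reaches the effective entropy. The function name `select_odal` is hypothetical, and in the example only the entropies for N = 3000 and N = 5000 are taken from the text; the values for shorter lengths are illustrative placeholders.

```python
def select_odal(entropy_by_length, ratio=0.97):
    """Return (ODAL, effective entropy) from a {length: entropy} mapping.

    Assumes the longest length in the mapping is the standard data length.
    """
    lengths = sorted(entropy_by_length)
    standard_entropy = entropy_by_length[lengths[-1]]
    effective_entropy = ratio * standard_entropy
    for n in lengths:                              # shortest length first
        if entropy_by_length[n] >= effective_entropy:
            return n, effective_entropy
    return lengths[-1], effective_entropy          # not reached when ratio <= 1

white_noise = {
    200: 0.900,    # placeholder, not from the paper
    500: 0.930,    # placeholder, not from the paper
    1000: 0.950,   # placeholder, not from the paper
    2000: 0.962,   # placeholder, not from the paper
    3000: 0.971,   # reported in the text
    5000: 0.998,   # reported in the text (standard entropy)
}
odal, threshold = select_odal(white_noise)
print(odal, round(threshold, 3))                   # 3000 0.968
```

With the reported values, the threshold is 0.97 × 0.998 ≈ 0.968, and N = 3000 (entropy 0.971) is the first length to reach it, matching the result stated above.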

Construct Simulation Signal.
In order to verify the reliability of the method for determining the ODAL, the simulated pure signal $f_1(t)$ and white noise $f_2(t)$ are constructed. Their expressions are as follows: where $t$ is time, $m$ is the number of samples, randn($m$) is white noise following the standard normal distribution, and $f = 100$ Hz is the sampling frequency.
According to Figure 5, the entropy value of the noisy signal increases as the data length increases. When the data length reaches 4000-5000, the entropy value is basically stable at 0.998, which follows the same pattern as the white noise signal. Therefore, the data length N = 2000, whose entropy value reaches 97% of the standard entropy value, can be selected as the optimal analysis length of the vibration measurement data of the noisy signal. When the sequence length reaches the optimal data length, the entropy values of the noisy signals with different SNR tend to the same stable entropy value. Therefore, it can be shown that the multiscale permutation entropy has a strong antinoise ability and good robustness and can effectively judge the state of the signal.
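Test signals with a prescribed SNR, like those used in this experiment, can be generated along the following lines. This is only a sketch: the pure signal below (two sinusoids) is a hypothetical stand-in because the expression of $f_1(t)$ is not reproduced here, the SNR levels are illustrative, and the helper names are assumptions. Only the 100 Hz sampling frequency and the use of standard normal white noise come from the text.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Add Gaussian white noise scaled so that the expected SNR equals snr_db."""
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    noise = rng.standard_normal(signal.size) * np.sqrt(noise_power)
    return signal + noise

def measured_snr_db(clean, noisy):
    """SNR in dB computed from the clean signal and the realized noise."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noisy, dtype=float) - clean
    return float(10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2)))

fs = 100.0                                   # sampling frequency from the text, Hz
n_samples = 5000
t = np.arange(n_samples) / fs
# Hypothetical pure signal; the paper's f1(t) may differ.
pure = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

rng = np.random.default_rng(0)
for snr in (5, 10, 20):                      # illustrative SNR levels
    noisy = add_white_noise(pure, snr, rng)
    print(snr, round(measured_snr_db(pure, noisy), 2))
```

Each noisy record can then be truncated to different lengths N and passed through the IMPE calculation to compare how the entropy stabilizes at each SNR.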

Example 1.

Engineering Situation.
The Jingtaichuan project in Gansu Province is a multistage electric water-lifting irrigation project with high head and large flow. Over many years of pipeline operation, material aging, the fluctuating load of water flow, traffic, and other effects [25] have led to a certain degree of structural damage and aging of the pipeline, which directly affects the safety of pipeline operation. The seventh pump pipeline of Jingtaichuan Project Phase II is taken as the research object; the site layout and the pipeline sensor placement are shown in Figure 6.

Collection of Vibration Information.
The vibration information of the pipe is collected by sensors mounted on the surface of the pipe. In order to ensure the collection effect and obtain the system characteristic information, 22 sensors are placed on the pipeline of the pumping station [26]. At the instant a unit is switched, the water flow excitation and self-excitation force at the connection of the branch and main pipelines are particularly complex. Therefore, three velocity sensors are placed at the connection, in the horizontal X, horizontal Y, and vertical Z directions, to accurately capture the dynamic characteristics of the system, while different numbers of sensors are arranged at other positions. The excitation for the pipeline vibration test is obtained by switching pumps on and off. The two general types of vibration state, the moment of unit switching and the stable operation of the unit, are divided into 11 states (A-K). The 11 test states are shown in Table 1. In order to fully reflect the vibration characteristics of the pipeline under various working conditions, the vibration signals of sensor No. 15 (corresponding to the X direction), located at the intersection where the effect of the switching moments of turbines 1-3 is more pronounced, are selected as the research object. The specific experimental conditions are as follows.

Determination of m and τ.
Phase space reconstruction is an indispensable part of drawing recursive graphs and calculating quantitative indicators, and its key step is to select reasonable values of $m$ and $\tau$. The phase space reconstruction parameters $m$ and $\tau$ are selected by the FNN and MI methods, respectively. The appropriate dimension $m$ is the dimension at which the percentage of false nearest neighbors in the phase space tends to zero (at least less than 5%); the optimal delay time $\tau$ is the delay at which the mutual information reaches its first minimum.
The phase space reconstruction parameters of the pipeline in the seventh pumping station of Jingtaichuan Project II differ little among the above 11 states (A-K) and are stable at m = 4 and τ = 4. Therefore, the calculation parameters of IMPE selected in this paper are m = 4, τ = 4, and s = 15. Here, the parameter selection diagram of condition B is shown in Figure 7.

Analysis of Examples.
At the moments when a pump is switched on or off, the pipeline is subject to a larger excitation force from the system itself; that is, the pump vibration has a significant impact on the operation of the pressure pipeline. Then, as the machine settles into smooth operation, the impact of water excitation decreases and the vibration amplitude of the pipeline is reduced. To obtain the ODAL of the pipeline vibration data under different operating conditions, the vibration information obtained from sensor No. 15 is analyzed. Here, one working condition is selected from each of the four states of startup, shutdown, stable operation, and total shutdown to show the change of the signal entropy value with the sequence length.
The entropy value curves of the pipeline in the Jingtaichuan Project under working conditions A, B, C, and J are shown in Figure 8. For each working condition, 12 different sequence lengths were selected, such as N = 200, 500, 1000, 2000, and 3000.

To test the influence of data length on IMPE, the lengths N = 1000 and N = 2000 are selected to analyze the same vibration signal, which shows that when the data length differs, the effectiveness of signal monitoring also differs. Figure 9(a) shows the vibration signal characteristics from the opening of the pump to stable operation. Figure 9(b) shows the variation curve of the entropy value under the two sequence lengths.
It can be seen from Figure 9 that when the sequence length is 2000 and 1000, the maximum-minimum entropy difference is 0.09 and 0.055, respectively. When the sequence length is 2000, the vibration signal shows an obvious mutation. Through comparative analysis, it can be concluded that with the shorter sequence length the sensitivity of IMPE to sudden changes in the signal is reduced, which is not conducive to state detection. The reason is that the data length determines the signal richness. The results show that the shorter the data length is, the greater the impact on the accuracy of signal analysis will be. To avoid misjudgment in state detection, choosing a reasonable data length is an important part of ensuring the correctness of the analysis results.

Example 2.
In order to verify the practicability of the method in actual projects, the vibration of spillway section 5 of the Three Gorges dam is studied. The arrangement of measuring points is shown in Figure 10. The entropy value change curves of the vibration signals of the six channels in the no. 5 overflow dam section of the Three Gorges Project are shown in Figure 11. It can be seen from the figure that when the sequence length N = 200, 500, 1000, 2000, 3000, and 4000, the entropy value of the vibration signal in each channel is different.
The entropy values of the horizontal dynamic displacement (channels 1-4) are smaller than those of the vertical dynamic displacement (channels 5 and 6). The entropy value gradually increases as the data length increases and finally reaches a stable state. This shows that it is feasible to select the signal analysis length by the IMPE method. When the data length N = 2000, the entropy value of each channel reaches a stable state and meets the 97% accuracy requirement, and the optimal data analytical length under each working condition is calculated as N = 1000. This shows that the IMPE method can be effectively applied in practical engineering and has great value in selecting the appropriate length of vibration data.

Conclusion
Signals of different lengths contain different amounts of characteristic information. Drawing on the strength of multiscale permutation entropy in detecting mutations in nonlinear signals, a method to determine the ODAL of vibration data based on improved multiscale permutation entropy (IMPE) is proposed. The method is applied to the selection of the optimal data analytical length for white noise simulation signals and for vibration monitoring data of actual engineering structures. The main conclusions are as follows:
(1) The analysis of simulated signals with different SNR shows that the improved multiscale permutation entropy has strong antinoise ability and good robustness, which can effectively avoid the influence of mixed noise on the accuracy of calculation results.
(2) The simulation and the analysis results of specific projects show that the length of the data is closely related to the accuracy and stability of the improved multiscale permutation entropy. The entropy value corresponding to the optimal data analytical length reaches 97% of the standard entropy value, which can meet the requirements of engineering accuracy.
(3) The proposed method provides a reliable data length for signal analysis, eliminates the randomness and subjectivity caused by manual selection of data length, and improves the accuracy and efficiency of hydraulic structural vibration safety monitoring.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points
Highlights. A method for determining the optimal analysis length of vibration data based on improved multiscale permutation entropy is proposed. A white noise signal is used to investigate the superiority of IMPE compared to MPE. The analysis of the simulated signal with different SNR shows that the IMPE has strong antinoise ability and good robustness. Simulation analysis and real engineering cases are used to determine the effectiveness of obtaining the optimal data analytical length based on IMPE. The entropy value corresponding to the optimal data analytical length, which reaches 97% of the standard entropy value, can satisfy the requirements of engineering accuracy.