Vibration Tendency Prediction Approach for Hydropower Generator Fused with Multiscale Dominant Ingredient Chaotic Analysis, Adaptive Mutation Grey Wolf Optimizer, and KELM

Accurate vibration tendency forecasting for hydropower generator units (HGUs) is of great significance for the safe and economic operation of hydropower stations. For this purpose, a novel hybrid approach combining multiscale dominant ingredient chaotic analysis, a kernel extreme learning machine (KELM), and an adaptive mutation grey wolf optimizer (AMGWO) is proposed. Within this approach, variational mode decomposition (VMD), phase space reconstruction (PSR), and singular spectrum analysis (SSA) are integrated into the proposed analysis strategy. First, VMD is employed to decompose the monitored vibration signal into several subseries with different frequency scales. Then, SSA is applied to divide each decomposed subseries into dominant and residuary ingredients, after which an additional forecasting component is formed by adding the residual of VMD to all the residuary ingredients. Subsequently, the proposed AMGWO is used to tune the intrinsic parameters of PSR and KELM simultaneously for all the forecasting components. Finally, the prediction of the raw vibration signal is obtained by summing the predictions of all the components. Furthermore, six contrastive models are adopted to verify the feasibility and effectiveness of the modified strategies employed in the proposed model. The experimental results illustrate that (1) VMD contributes positively to prediction accuracy; (2) the proposed dominant ingredient chaotic analysis, built on time-frequency decomposition, further enhances the capability of the forecasting model; and (3) the proposed AMGWO effectively finds appropriate parameters for each forecasting component, which distinctly improves forecasting performance.


Introduction
The hydropower generator unit (HGU) is the key equipment of a hydropower station and plays an important role in emergency reserve as well as in peak load and frequency regulation. Moreover, the operational stability and safety of hydropower stations and the power grid depend heavily on the health condition of the HGU [1][2][3], which can be revealed by vibration signals monitored by dedicated sensors [4]. Since HGU faults generally manifest in the vibration status of its various components in engineering practice, accurate vibration tendency forecasting for HGUs is important [5]. With such forecasts, abnormal HGU status can be detected before an accident occurs, and scientific and reasonable maintenance plans can be drawn up. Nevertheless, the dynamic behavior of an HGU is extremely complicated owing to frequently changing working conditions and the inherent coupling of hydraulic, mechanical, and electromagnetic factors [6,7].
As mentioned above, fault information and equipment status can be extracted from the corresponding vibration signal. Hence, potential fault information can be effectively uncovered by accurately predicting vibration trends, and forecasting the vibration tendency is equivalent to a time series prediction problem that makes full use of the historical status.
For this purpose, various state-of-the-art prediction techniques have been developed in engineering practice, which can be classified into statistical models and artificial intelligence (AI) models [8]. Statistical models, such as the autoregressive (AR) [9], autoregressive moving average (ARMA) [10], autoregressive integrated moving average (ARIMA) [11], autoregressive fractionally integrated moving average (ARFIMA) [12], and generalized autoregressive conditional heteroscedasticity (GARCH) [13] models, forecast time series by extracting the implicit information within historical datasets. Nevertheless, the prediction capability of such models is restricted by the nonlinearity and nonstationarity of the data. By contrast, AI models adapt better to various data and generalize better. Among AI models, neural networks (NNs) [14,15] can in theory approximate any function, but their structure becomes complex and difficult to determine as the number of hidden layers and the dataset size increase. In contrast, support vector regression (SVR) [16], which forecasts through an appropriate kernel mapping, has fewer parameters to determine, but its computational cost increases with the data scale. In addition, the extreme learning machine (ELM) [17,18] has been widely applied in various fields owing to its low computational cost and few parameters. However, since ELM applies only the empirical risk minimization principle, Huang et al. [19] introduced a regularization coefficient and kernel functions into ELM, which further enhances its generalization performance and reduces the uncertainty caused by its randomly generated weights and biases. Therefore, the kernel extreme learning machine (KELM) is investigated for vibration tendency forecasting in this paper.
Because HGU operation is usually accompanied by background noise and electromagnetic interference, the corresponding vibration signal usually possesses strong nonlinearity and nonstationarity, which can greatly affect predictive performance [6,20]. To this end, various time-frequency decomposition approaches have been developed to reduce the nonstationarity and nonlinearity of such data. For instance, Li et al. [21] applied the wavelet transform (WT) to engine cylinder head vibration signals for knock detection. Fei [22] predicted bearing vibration signals by combining empirical mode decomposition (EMD) with a relevance vector machine optimized by the artificial bee colony algorithm. Fu et al. [6] built a vibration trend measuring system combining variational mode decomposition (VMD) and the least squares support vector machine (LSSVM). Among these decomposition methods, VMD has been widely investigated in various fields because of its good adaptability and solid mathematical foundation compared with WT and EMD [23,24]. Hence, VMD is selected as the data preprocessing approach in this paper, with the aim of reducing the nonstationarity of the collected vibration signal. Additionally, considering that redundant ingredients may still be present in the decomposed subseries, the dominant and residuary ingredients of all the subseries are separated by singular spectrum analysis (SSA), which has been reported to improve forecasting performance significantly [25,26]. Furthermore, phase space reconstruction (PSR) is adopted to exploit the inherent chaotic dynamics of the processed subseries and to derive the input and output matrices for the predictor. According to previous studies [27,28], the prediction accuracy of PSR-based models can be enhanced to a certain extent by choosing appropriate PSR parameters.
In addition, the parameters of the approaches mentioned above greatly affect the performance of the combined model. In view of this, various optimization algorithms based on different strategies have been applied to parameter optimization, such as the genetic algorithm (GA) [29], region search evolutionary algorithm (RSEA) [30], artificial sheep algorithm (ASA) [31,32], and grey wolf optimizer (GWO) [33]. To achieve a better balance between convergence speed and accuracy, an adaptive mutation grey wolf optimizer (AMGWO) is proposed in this paper. Based on the above discussion, to achieve accurate vibration tendency forecasting for HGUs, multiscale dominant ingredient chaotic analysis based on VMD, SSA, and PSR is integrated with KELM and AMGWO. First, the collected vibration signal is decomposed into several subseries by VMD, after which SSA separates the dominant and residuary ingredients of every subseries. The dominant ingredients are treated as forecasting components, while the VMD residual is added to the residuary ingredients to form one further component. Afterwards, the PSR and KELM parameters for each component are optimized by the proposed AMGWO, with which the predicted values of each component are generated. Finally, the forecast of the raw signal is obtained by summing the predicted values of all the components. The effectiveness and superiority of the proposed approach are verified by a practical engineering application and detailed contrastive analysis. The remaining parts of this paper are organized as follows. Section 2 presents the background of VMD, SSA, PSR, and KELM. Section 3 details the multiscale dominant ingredient chaotic analysis, the proposed AMGWO algorithm, the optimization strategy, and the specific procedure of the proposed approach.
Section 4 demonstrates the effectiveness of the proposed approach through experimental results and analysis. Section 5 summarizes the conclusions drawn from Section 4. In addition, the abbreviations of technical terms used in this paper are listed in the Abbreviations section.

Variational Mode Decomposition.
Variational mode decomposition (VMD), proposed by Dragomiretskiy et al. [34,35], achieves adaptive signal processing by determining the center frequency and bandwidth of each decomposed component while solving a variational problem. Compared with well-established decomposition techniques such as the wavelet transform (WT) and empirical mode decomposition (EMD), VMD adapts better to various datasets and has a more complete mathematical foundation. Additionally, previous studies [6,36] have verified that VMD improves forecasting performance. With a mode number K set in advance, a given original signal f is decomposed into K band-limited intrinsic mode functions [37] by solving the constrained variational problem

$$\min_{\{m_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * m_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} m_k(t) = f(t), \tag{1}$$

where $m_k$ and $\omega_k$ denote the decomposed components and their center frequencies, $\delta(t)$ is the Dirac distribution, and $*$ denotes convolution. To solve this constrained problem, a quadratic penalty term and a Lagrangian multiplier $\lambda$ are introduced, which transform it into an unconstrained problem:

$$L(\{m_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * m_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} m_k(t) \right\|_2^2 + \left\langle \lambda(t), f(t) - \sum_{k=1}^{K} m_k(t) \right\rangle, \tag{2}$$

where $\alpha$ is the balancing parameter of the data-fidelity constraint. The solution is then obtained by searching for the saddle point of the augmented Lagrangian $L$, updating $m_k$, $\omega_k$, and $\lambda$ alternately; this is the alternating direction method of multipliers (ADMM) [38]. The iterative formulas are

$$\hat{m}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{m}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2}, \tag{3}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{m}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{m}_k^{n+1}(\omega) \right|^2 d\omega}, \tag{4}$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + c\left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{m}_k^{n+1}(\omega) \right), \tag{5}$$

where $c$ denotes the time step of the dual ascent, and $\hat{m}_k^{n+1}(\omega)$, $\hat{m}_i(\omega)$, $\hat{f}(\omega)$, and $\hat{\lambda}(\omega)$ denote the Fourier transforms of $m_k^{n+1}(t)$, $m_i(t)$, $f(t)$, and $\lambda(t)$, respectively.
The main procedure of VMD is as follows:
Step 1: initialize $\hat{m}_k^1$, $\omega_k^1$, $\hat{\lambda}^1$, and n = 0.
Step 2: execute the loop, n = n + 1.
Step 3: update $\hat{m}_k$ and $\omega_k$ by equations (3) and (4).
Step 4: update $\hat{\lambda}$ by equation (5).
Step 5: if $\sum_k \|\hat{m}_k^{n+1} - \hat{m}_k^{n}\|_2^2 / \|\hat{m}_k^{n}\|_2^2 < \varepsilon$, end the loop; otherwise, return to Step 2.
Note that the decomposition effectiveness and efficiency of VMD are affected by its intrinsic parameters, namely, the mode number K and the dual-ascent time step c [34]. To achieve better decomposition and more accurate vibration tendency prediction, these parameters are predetermined by grid search in this paper.
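The ADMM loop in Steps 1 to 5 can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it omits the mirror extension used in Dragomiretskiy and Zosso's reference code, works on the one-sided FFT spectrum of the signal, initializes the center frequencies uniformly, and expresses frequencies in cycles per sample. The function name `vmd` and its default values are hypothetical.

```python
import numpy as np


def vmd(f, K, alpha=2000.0, c=0.0, eps=1e-7, max_iter=500):
    """Minimal VMD sketch (no mirror extension). Returns (modes, center_freqs).

    f: real 1-D signal; K: mode number; alpha: data-fidelity balancing parameter;
    c: dual-ascent time step. Frequencies are in cycles per sample.
    """
    f = np.asarray(f, dtype=float)
    N = f.size
    nu = np.fft.fftfreq(N)                     # normalized frequency of each FFT bin
    pos = nu >= 0                              # the sketch works on the one-sided spectrum
    f_hat = np.fft.fft(f)
    f_hat_plus = np.where(pos, f_hat, 0.0)     # keep DC and positive frequencies
    f_hat_plus[nu == 0] *= 0.5                 # so that 2*Re(ifft(.)) restores the mean exactly
    m_hat = np.zeros((K, N), dtype=complex)    # spectra of the modes m_k
    omega = 0.5 / K * np.arange(K)             # uniform initial center frequencies (assumption)
    lam = np.zeros(N, dtype=complex)           # Lagrangian multiplier in the frequency domain

    for _ in range(max_iter):
        m_prev = m_hat.copy()
        for k in range(K):
            others = m_hat.sum(axis=0) - m_hat[k]
            # mode update, cf. equation (3)
            m_hat[k] = (f_hat_plus - others + lam / 2) / (1 + 2 * alpha * (nu - omega[k]) ** 2)
            m_hat[k, ~pos] = 0
            # center frequency as the power-weighted mean frequency, cf. equation (4)
            power = np.abs(m_hat[k, pos]) ** 2
            omega[k] = np.sum(nu[pos] * power) / (np.sum(power) + 1e-30)
        # dual ascent on the multiplier, cf. equation (5)
        lam = lam + c * (f_hat_plus - m_hat.sum(axis=0))
        # relative-change stopping rule of Step 5
        change = sum(np.sum(np.abs(m_hat[k] - m_prev[k]) ** 2)
                     / (np.sum(np.abs(m_prev[k]) ** 2) + 1e-30) for k in range(K))
        if change < eps:
            break

    modes = 2 * np.real(np.fft.ifft(m_hat, axis=1))   # back to real time-domain modes
    order = np.argsort(omega)
    return modes[order], omega[order]
```

With c = 0 the multiplier stays at zero, a setting often used for noisy signals. The VMD residual used later in the paper can be obtained from this sketch as `f - modes.sum(axis=0)`.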

Singular Spectrum Analysis.
Singular spectrum analysis (SSA) is a time series analysis technique that has been widely used to identify and extract periodic, quasiperiodic, and oscillating components from raw data [39]. To separate the characteristic trend terms from the residuary ingredients of the decomposed subseries quickly and easily, SSA is employed here; its core step, singular value decomposition (SVD), has a small computational cost and performs the separation effectively. SSA consists of four main procedures, namely, embedding, SVD, grouping, and diagonal averaging [26], which are described as follows [24]: (1) Embedding. Given a time series of length N, x = {x_i | i = 1, 2, ..., N}, the series is first embedded into a Hankel matrix [40]:

$$H = \begin{bmatrix} x_1 & x_2 & \cdots & x_t \\ x_2 & x_3 & \cdots & x_{t+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_l & x_{l+1} & \cdots & x_N \end{bmatrix},$$

where l denotes the window length and t = N − l + 1.
All the elements along each antidiagonal i + j = const of H are equal. (2) SVD. Performing SVD on the trajectory matrix H yields the i-th singular value σ_i together with the singular vectors U_i and V_i, which are eigenvectors of HH^T and H^TH, respectively. Thus, H can be expressed as

$$H = H_1 + H_2 + \cdots + H_l, \qquad H_i = \sigma_i U_i V_i^{T}.$$

(3) Grouping. In this phase, the matrices H_1, ..., H_l are partitioned into r disjoint groups with index sets Z = {Z_1, ..., Z_r}, and the matrix of a group Z is

$$H_Z = \sum_{i \in Z} H_i.$$

In this paper, the index set {1, ..., l} is partitioned into two subsets corresponding to the dominant and residuary ingredients, that is, Z_1 = {1, 2, ..., s} and Z_2 = {s + 1, s + 2, ..., l}, so that

$$H = H_{Z_1} + H_{Z_2}.$$

(4) Diagonal Averaging. With diagonal averaging, each grouped matrix is converted back into a series of length N. Assume that X is an L × S matrix with elements x_{ij} (1 ≤ i ≤ L, 1 ≤ j ≤ S), let L* = min(L, S) and S* = max(L, S), and let x*_{ij} = x_{ij} if L < S and x*_{ij} = x_{ji} otherwise. The reconstructed series y_1, ..., y_N (N = L + S − 1) is

$$y_k = \begin{cases} \dfrac{1}{k}\sum_{m=1}^{k} x^{*}_{m,\,k-m+1}, & 1 \le k < L^{*}, \\[2mm] \dfrac{1}{L^{*}}\sum_{m=1}^{L^{*}} x^{*}_{m,\,k-m+1}, & L^{*} \le k \le S^{*}, \\[2mm] \dfrac{1}{N-k+1}\sum_{m=k-S^{*}+1}^{N-S^{*}+1} x^{*}_{m,\,k-m+1}, & S^{*} < k \le N. \end{cases}$$

A visual representation of diagonal averaging is given in Figure 1.
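A compact NumPy sketch of the four SSA steps, splitting a series into its dominant ingredient (the first s eigentriples) and its residuary ingredient, is given below. It is an illustrative assumption rather than the authors' code, and the helper names are hypothetical. Diagonal averaging is implemented by averaging every antidiagonal of the grouped matrix, which is exactly what the diagonal averaging step computes.

```python
import numpy as np


def hankel_embed(x, l):
    """Embedding: build the l x t trajectory (Hankel) matrix, t = N - l + 1."""
    x = np.asarray(x, dtype=float)
    t = x.size - l + 1
    return np.array([x[i:i + t] for i in range(l)])   # H[i, j] = x[i + j]


def diagonal_average(X):
    """Diagonal averaging: average each antidiagonal i + j = const of X."""
    L, S = X.shape
    total = np.zeros(L + S - 1)
    count = np.zeros(L + S - 1)
    for i in range(L):
        total[i:i + S] += X[i]      # element X[i, j] belongs to antidiagonal i + j
        count[i:i + S] += 1
    return total / count


def ssa_split(x, l, s):
    """Return (dominant, residuary) ingredients of x using the leading s eigentriples."""
    H = hankel_embed(x, l)
    U, sigma, Vt = np.linalg.svd(H, full_matrices=False)   # H = sum_i sigma_i U_i V_i^T
    H_dom = (U[:, :s] * sigma[:s]) @ Vt[:s]               # group Z1 = {1, ..., s}
    H_res = H - H_dom                                     # group Z2 = {s + 1, ..., l}
    return diagonal_average(H_dom), diagonal_average(H_res)
```

Because diagonal averaging is linear and H = H_{Z_1} + H_{Z_2}, the two outputs always sum back to the input series, which is what allows component predictions to be recombined later.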

Phase Space Reconstruction.
Phase space reconstruction (PSR) is the basis of chaotic time series prediction: it explores the underlying dynamics of a series by reconstructing a time series with chaotic characteristics into a low-order nonlinear dynamic system. The delay coordinate method proposed by Packard et al. [41] is one of the most popular approaches for effectively restoring the original dynamic system. Here, PSR with a tunable time delay is adopted to construct appropriate input matrices for the predictors, which exploits the associated information in historical data more fully than a fixed time delay of 1. The essence of the approach is to reconstruct a one-dimensional time series into d-dimensional phase space vectors with time delay τ. Therefore, the PSR of the monitored vibration series x = {x_i | i = 1, 2, ..., N} can be expressed as

$$X_i = \left[ x_i, x_{i+\tau}, \ldots, x_{i+(d-1)\tau} \right], \quad i = 1, 2, \ldots, J,$$

where J = N − τ(d − 1), N is the total number of vibration samples, X_i denotes the i-th vector in the reconstructed phase space, and d and τ denote the embedding dimension and time delay, respectively. The i-th output point O_i corresponding to vector X_i is the sample that immediately follows its last element, that is, $O_i = x_{i+(d-1)\tau+1}$. The key to accurate prediction with PSR is to preset appropriate parameters. For this purpose, the embedding dimension and time delay of PSR are optimized simultaneously with the parameters of the predictor by the newly modified optimization algorithm, which is detailed in later sections.
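The delay-vector construction can be sketched as follows, assuming one-step-ahead targets O_i = x_{i+(d−1)τ+1}, so that only J − 1 of the J delay vectors have a target available. The function name is hypothetical, and indices in the code are zero-based.

```python
import numpy as np


def phase_space(x, tau, d):
    """Delay vectors [x_i, x_{i+tau}, ..., x_{i+(d-1)tau}] and one-step targets (zero-based)."""
    x = np.asarray(x, dtype=float)
    J = x.size - tau * (d - 1)                 # number of delay vectors
    if J < 2:
        raise ValueError("series too short for the chosen tau and d")
    span = (d - 1) * tau
    # only the first J - 1 vectors are followed by a sample that can serve as target
    X = np.array([x[i:i + span + 1:tau] for i in range(J - 1)])
    O = x[span + 1:span + J]                   # O_i = x_{i + (d-1)tau + 1}
    return X, O
```

Under this assumption, a series of N = 216 samples with τ = 1 and d = 10 would yield 206 input-output pairs.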

Kernel Extreme Learning Machine.
In the standard extreme learning machine (ELM), the input weights and biases are generated randomly and then kept fixed in subsequent calculations. ELM uses the minimal-norm least-squares solution, in which the output weights β are obtained by solving the linear system Hβ = T:

$$\beta = H^{\dagger} T,$$

where H^† denotes the Moore-Penrose generalized inverse of the hidden-layer output matrix H. Because the input weights and biases are randomly generated, the outputs of ELM generally exhibit randomness, which negatively affects the final results. To enhance the generalization and universality of ELM [42], Huang et al. [19] proposed a modified version of ELM incorporating kernel functions. By minimizing both the training errors and the norm of the output weights, the network achieves better generalization performance. Hence, the regularization coefficient C is introduced in the optimization stage, and the output weights β become [19]

$$\beta = H^{T} \left( \frac{I}{C} + H H^{T} \right)^{-1} T,$$

where I is the identity matrix of dimension N.
Additionally, a kernel matrix is adopted to handle the case in which the hidden-layer feature mapping h(·) is unknown. The kernel matrix of the kernel extreme learning machine (KELM) is calculated as [19]

$$\Omega_{\mathrm{ELM}} = H H^{T}, \qquad \Omega_{\mathrm{ELM}}(i,j) = h(x_i) \cdot h(x_j) = K(x_i, x_j),$$

where K(·, ·) denotes the kernel function. Combining equations (13)-(15), the output function of KELM can be formulated as

$$f(x) = h(x) H^{T} \left( \frac{I}{C} + H H^{T} \right)^{-1} T = \left[ K(x, x_1), \ldots, K(x, x_N) \right] \left( \frac{I}{C} + \Omega_{\mathrm{ELM}} \right)^{-1} T.$$

According to previous studies [43,44], the radial basis function (RBF) is one of the most popular and effective kernel functions; it is defined as

$$K(x_i, x_j) = \exp\left( -\frac{\| x_i - x_j \|^2}{\sigma^2} \right),$$

where σ² denotes the kernel parameter. As investigated in [19], the forecasting capability of KELM is significantly influenced by the regularization coefficient C and the kernel parameter σ². Hence, these two parameters must be determined appropriately to achieve good generalization; in this paper, they are optimized simultaneously with the parameters of PSR.
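A minimal KELM regressor with the RBF kernel K(x_i, x_j) = exp(−‖x_i − x_j‖²/σ²) is sketched below. It follows the closed-form output function [K(x, x_1), ..., K(x, x_N)](I/C + Ω_ELM)^{-1}T; the class and method names are hypothetical, and `np.linalg.solve` is used instead of forming the matrix inverse explicitly, which is numerically more stable.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """K(a, b) = exp(-||a - b||^2 / sigma2) for every row a of A and row b of B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma2)   # clip tiny negative round-off


class KELM:
    """Kernel ELM regressor: f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + Omega)^{-1} T."""

    def __init__(self, C=100.0, sigma2=1.0):
        self.C = C
        self.sigma2 = sigma2

    def fit(self, X, T):
        self.X_train = np.asarray(X, dtype=float)
        omega = rbf_kernel(self.X_train, self.X_train, self.sigma2)   # Omega_ELM
        n = omega.shape[0]
        # solve (I/C + Omega) coef = T rather than forming the inverse explicitly
        self.coef = np.linalg.solve(np.eye(n) / self.C + omega, np.asarray(T, dtype=float))
        return self

    def predict(self, X_new):
        K_new = rbf_kernel(np.asarray(X_new, dtype=float), self.X_train, self.sigma2)
        return K_new @ self.coef
```

In the proposed framework, the rows of the PSR input matrix would play the role of X and the PSR outputs the role of T, with C and σ² supplied by the optimizer.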

Multiscale Dominant Ingredient Chaotic Analysis.
Owing to the chaotic nature and intrinsic complexity of the original vibration signal, the prediction performance of single models, or of models without signal preprocessing, is severely restricted. To this end, VMD is employed to decompose the monitored vibration signal into several components at different frequency scales.
According to references [45,46], the decomposition efficiency and effectiveness of VMD are greatly affected by the parameters K and c introduced in Section 2.1. In addition, because noise may still be contained in the decomposed series, SSA is applied to each decomposed series separately to extract its dominant and residuary ingredients, thus capturing the characteristic trends of the nonstationary subseries. As described in Section 2.2, the index set Z in the grouping phase of SSA is divided into two disjoint subsets, {1, 2, ..., s} and {s + 1, s + 2, ..., l}; notably, the forecasting accuracy is affected by the parameter s to a certain extent [47]. Additionally, the VMD residual, calculated as $m_r = f - \sum_{k=1}^{K} m_k$, is combined with the residuary ingredients separated from all the subseries, which further improves the capability of the forecasting model. Subsequently, PSR, an effective chaotic time series analysis tool, is employed to generate the inputs and outputs for the prediction models [47,48]. Note that the time delay τ and the embedding dimension d affect how well PSR recovers the dynamic system, which can significantly restrict prediction performance. Consequently, the proposed multiscale dominant ingredient chaotic analysis works well only when the parameters of each module are set appropriately, which is a pressing problem. To this end, a novel parameter optimization strategy based on an adaptive mutation grey wolf optimizer (AMGWO) is proposed to significantly improve the forecasting model, as detailed below.
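The way the K + 1 forecasting components are formed can be sketched as below. The splitting routine is passed in as a parameter (any SSA function returning dominant and residuary parts would do), and all names are hypothetical. The sketch makes one property explicit: because each mode equals its dominant plus residuary ingredient and m_r = f − Σ m_k, the K + 1 components always sum exactly to the original signal f.

```python
from typing import Callable, List, Tuple

import numpy as np


def build_components(
    f, modes, split: Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]
) -> List[np.ndarray]:
    """Return K dominant ingredients plus one component = VMD residual + all residuary parts."""
    f = np.asarray(f, dtype=float)
    modes = np.asarray(modes, dtype=float)
    extra = f - modes.sum(axis=0)          # VMD residual m_r = f - sum_k m_k
    components = []
    for m in modes:
        dominant, residuary = split(m)
        components.append(dominant)        # one forecasting component per VMD mode
        extra = extra + residuary          # accumulate the residuary ingredients
    components.append(extra)               # the additional (K+1)-th component
    return components
```

The final forecast is then the sum of the K + 1 component forecasts, mirroring the identity above.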

Adaptive Mutation Grey Wolf Optimizer.
Grey wolf optimizer (GWO), originally proposed by Mirjalili et al. and described by Faris et al. [49], divides the wolf pack into four categories, α, β, δ, and ω, to simulate the leadership hierarchy; the categories are determined by the wolves' positions (fitness values).
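Then, the encircling behavior is modeled as

$$\vec{D} = \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right|, \qquad \vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D},$$

$$\vec{A} = 2a \cdot \vec{r}_1 - a, \qquad \vec{C} = 2 \cdot \vec{r}_2,$$

where t indicates the t-th iteration, $\vec{X}_p$ and $\vec{X}$ are the position vectors of the prey and of a grey wolf, $\vec{A}$ and $\vec{C}$ are coefficient vectors, a is the convergence factor, which decreases from 2 to 0 over the iterations, and $\vec{r}_1$ and $\vec{r}_2$ are random vectors in [0, 1].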
In standard GWO the convergence factor a decreases linearly, which can cause the solutions to fall into local optima; therefore, a nonlinear quadratic convex function is employed to guide the decrease of a [50]. In the corresponding updating equation (21), t and T denote the current and maximum iterations, respectively. Figure 2 illustrates how the convergence factor varies over the iterations under the two functions. As can be observed from equation (21) and Figure 2, the convergence factor changes adaptively and nonlinearly over the iterations, which effectively balances the global and local search capabilities of the algorithm. The other stage, hunting, is usually led by the α wolf, while the β and δ wolves occasionally participate [51]. Hence, the α, β, and δ wolves are assumed to have better knowledge of the potential location of the prey, and the corresponding updating equations are defined as

$$\vec{D}_\alpha = \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \right|, \quad \vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \right|, \quad \vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \right|, \tag{22}$$

$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta, \tag{23}$$

where $\vec{X}_1$, $\vec{X}_2$, and $\vec{X}_3$ indicate the positional information provided by the α, β, and δ wolves so far. In standard GWO, an individual's new position is simply the average of $\vec{X}_1$, $\vec{X}_2$, and $\vec{X}_3$, which cannot reflect the different importance of the three leading wolves [26]. To this end, a weighted averaging strategy is proposed, in which the α, β, and δ wolves are each assigned a weight derived from the inverse of their fitness values:

$$w_j = \frac{1/\mathrm{fitness}_j}{1/\mathrm{fitness}_\alpha + 1/\mathrm{fitness}_\beta + 1/\mathrm{fitness}_\delta}, \quad j \in \{\alpha, \beta, \delta\}, \qquad \vec{X}(t+1) = w_\alpha \vec{X}_1 + w_\beta \vec{X}_2 + w_\delta \vec{X}_3,$$

where w and fitness denote the weight and fitness value of the corresponding individual. To enrich population diversity in the late iterations, a mutation operation is employed to effectively avoid falling into local optima [52].
Note that the mutation strategy is applied only when updating the α wolf; in the mutation formula, n = 1, 2, ..., rand() is a random number drawn from the uniform distribution U(0, 1), and M and T_p denote the mutation coefficient and period, respectively. With this operation, the position of the α wolf is mutated periodically over the iterations, which improves the algorithm's ability to escape local optima. To verify the validity and performance of the proposed AMGWO based on the adaptive strategy and mutation operator, a set of benchmark functions, detailed in Table 1, is used to compare various algorithms. Seven popular swarm intelligence optimization algorithms, namely, particle swarm optimization (PSO), the sine cosine algorithm (SCA), ant lion optimization (ALO), the whale optimization algorithm (WOA), the bat algorithm (BA), moth-flame optimization (MFO), and GWO, are compared with the proposed AMGWO. For all the algorithms, the number of search agents is 50 and the maximum number of iterations is 200. The final values obtained by each algorithm are averaged over 30 runs with different random seeds, and the convergence curve of each algorithm is depicted in Figure 3. Figure 3 shows that the proposed AMGWO converges faster and reaches better solutions than the remaining algorithms. In particular, the comparison between GWO and AMGWO shows that the modified version based on the adaptive strategy and mutation operator has better global search capability, which is attributed to the greater population diversity produced by the mutation operation in the later iterations.
Furthermore, to describe the proposed AMGWO more clearly and intuitively, the pseudocode of the proposed AMGWO algorithm is exhibited in Algorithm 1.
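The following NumPy sketch puts the components of AMGWO together for a minimization problem. Several details are assumptions, because the corresponding formulas are not reproduced here: the convergence factor uses the convex quadratic a(t) = 2(1 − t/T)², one function matching the description; the leader weights are normalized inverse fitness values, which requires positive fitness such as RMSE; and the periodic mutation of the α wolf adds a uniform perturbation scaled by M and the search range every T_p iterations. The function name and defaults are hypothetical.

```python
import numpy as np


def amgwo(obj, lb, ub, n_agents=30, max_iter=200, M=0.1, Tp=20, seed=0):
    """Sketch of an adaptive-mutation GWO minimizing obj over the box [lb, ub]."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    X = rng.uniform(lb, ub, size=(n_agents, dim))
    fit = np.array([obj(x) for x in X])
    best_x, best_f = X[fit.argmin()].copy(), float(fit.min())

    for t in range(max_iter):
        order = np.argsort(fit)
        leaders = X[order[:3]].copy()               # alpha, beta, delta wolves
        lead_fit = fit[order[:3]].copy()
        if t > 0 and t % Tp == 0:
            # assumed mutation form: uniform perturbation of the alpha wolf only
            leaders[0] = np.clip(leaders[0] + M * (rng.random(dim) - 0.5) * (ub - lb), lb, ub)
            lead_fit[0] = obj(leaders[0])
            if lead_fit[0] < best_f:
                best_x, best_f = leaders[0].copy(), float(lead_fit[0])
        # assumed weights: normalized inverse fitness (fitness must be positive)
        w = 1.0 / (lead_fit + 1e-12)
        w /= w.sum()
        a = 2.0 * (1.0 - t / max_iter) ** 2         # assumed convex quadratic decay from 2 to 0
        new_X = np.zeros_like(X)
        for leader, wj in zip(leaders, w):
            A = 2.0 * a * rng.random((n_agents, dim)) - a
            C = 2.0 * rng.random((n_agents, dim))
            D = np.abs(C * leader - X)              # distance to this leader
            new_X += wj * (leader - A * D)          # weighted average of X1, X2, X3
        X = np.clip(new_X, lb, ub)
        fit = np.array([obj(x) for x in X])
        if fit.min() < best_f:
            best_x, best_f = X[fit.argmin()].copy(), float(fit.min())
    return best_x, best_f
```

Because a mutated α wolf with a poor fitness receives a small inverse-fitness weight, a bad mutation has little pull on the pack, while a good one can lead it out of a local optimum.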

Optimization Strategy.
To enhance the accuracy and effectiveness of the proposed hybrid structure, the proposed AMGWO is applied to optimize the parameters of PSR and KELM for each subseries. First, the key VMD parameters K and c are predetermined by grid search (GS), with the least-squares error index (LSEI) [6] as the objective function.
Then, the SSA parameters are set according to the conclusions in [47]. Since the KELM parameters do not depend on the input dimension determined by PSR, the PSR and KELM parameters of each subseries are optimized simultaneously. Because the residuary ingredients are combined with the VMD residual, the total number of forecasting components is K + 1; the coding strategy of the agents is shown in Figure 4. Furthermore, the root-mean-square error (RMSE), defined in the first row of Table 2, is employed as the objective function.
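Figure 4 is not reproduced here, so the agent layout below is an assumption: each agent is taken to concatenate one (τ, d, log2 C, log2 σ²) block per forecasting component, giving 4(K + 1) dimensions. τ and d are rounded to positive integers, and C and σ² are searched on a base-2 exponent scale, as in the grid search of the contrastive models. The function name is hypothetical.

```python
import numpy as np


def decode_agent(position, n_components):
    """Split a 4*(K+1)-dimensional agent into (tau, d, C, sigma2) per component (assumed layout)."""
    blocks = np.asarray(position, dtype=float).reshape(n_components, 4)
    params = []
    for tau_raw, d_raw, log2_C, log2_s2 in blocks:
        tau = max(1, int(round(tau_raw)))   # time delay must be a positive integer
        d = max(1, int(round(d_raw)))       # embedding dimension must be a positive integer
        params.append((tau, d, 2.0 ** log2_C, 2.0 ** log2_s2))
    return params
```

Each decoded tuple would configure PSR and KELM for one component, and the RMSE of that component's prediction would serve as the fitness value.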

Specific Procedures.
The major procedures of the proposed vibration tendency prediction approach, which combines VMD, SSA, PSR, KELM, and AMGWO-based parameter optimization, are as follows:
Step 1: collect the monitored vibration sequence.
Step 2: predetermine the mode number K and the dual-ascent time step c of VMD by minimizing the LSEI with GS.
Step 3: decompose the raw vibration signal into K subseries and calculate the VMD residual m_r.
Step 4: apply SSA to the i-th subseries, extracting its dominant ingredient and adding its residuary ingredient to m_r.
Step 5: obtain the optimal parameter set (τ_i, d_i, C_i, σ_i²) for the i-th forecasting component by AMGWO, where i = 1, 2, ..., K + 1.
Step 6: predict all the components with the well-trained models.
Step 7: sum the predicted values of all the components to obtain the final forecast of the vibration signal.
The overall process of the proposed hybrid vibration tendency prediction approach is shown in Figure 5.

Data Description.
In this section, a vibration signal collected from the Ertan Hydropower Station in China is employed to verify the validity of the proposed hybrid forecasting approach. The general structure of a mixed-flow (Francis) hydropower unit is depicted in Figure 6 [53]. Owing to the complex structure and frequently changing operating conditions of the HGU, the monitored signals cannot be guaranteed to have uniform time intervals. Therefore, samples of the monitored vibration signal with an average time interval are selected for the experimental analysis, in line with engineering practice [6]. A total of 216 samples of the upper guide swing in the Y direction are used in this study, covering 24 February 2011 to 4 March 2011 with a mean time interval of one hour. The collected vibration signal is shown in Figure 7, in which the maximum and minimum of the raw data are highlighted with red points. The statistics of the monitored signal, including skewness (Skew.), kurtosis (Kurt.), standard deviation (Std.), minimum (Min.), maximum (Max.), and mean values, are given in the lower right corner of Figure 7. The raw vibration data exhibit strong nonlinearity and nonstationarity, which may significantly limit forecasting accuracy. Note that PSR is applied to generate the phase space matrices of the forecasting components before the predictor is executed; the last 70 rows of these matrices are taken as the testing set, and the remaining rows form the training set.

Contrastive Models and Evaluation Metrics.
To comprehensively verify the effectiveness and superiority of the proposed hybrid forecasting approach, two benchmark methods and several hybrid models fusing different methods are adopted for contrastive experiments. Among the contrastive models, SVR and KELM perform prediction on the raw vibration signal without preprocessing, with their parameters optimized by GS. The models that only add time-frequency decomposition, namely, EMD-KELM and VMD-KELM, are adopted to compare EMD with VMD. Moreover, the combined models EMD-SSA-PSR-KELM and VMD-SSA-PSR-KELM perform prediction by applying the proposed dominant ingredient chaotic analysis on top of a decomposition technique. The KELM parameters of all the models mentioned above are optimized by GS. For quantitative assessment of the forecasting models, three common metrics, namely, RMSE, mean absolute error (MAE), and mean absolute percentage error (MAPE), are employed, which effectively represent the deviation between the predicted and collected values [54,55]. To further quantify the improvement of the proposed model over the contrastive models, the descent ratios of these metrics, denoted P_RMSE, P_MAE, and P_MAPE, are also adopted. The definitions of these six metrics are given in Table 2, where N denotes the size of the testing set, Ŷ and Y denote the predicted and monitored data, respectively, subscript a indicates a contrastive model, and subscript b indicates the proposed approach.
[Algorithm 1, partially recovered: ... i = 1, 2, ..., K + 1 ... by equations (21), (19), and (20), respectively; (11) calculate the fitness of all grey wolves; (12) update the positions of the α, β, and δ wolves by equations (22) and (23), mutating the α wolf every T iterations.]
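Table 2 is not reproduced in this section, so the sketch below uses the standard definitions of RMSE, MAE, and MAPE and assumes the descent ratio is P = (metric_a − metric_b)/metric_a × 100%, with a the contrastive model and b the proposed one. It also computes the correlation coefficient R reported for the scatter plots, assumed here to be the Pearson correlation between monitored and predicted values. Function names are hypothetical.

```python
import numpy as np


def evaluate(y_obs, y_pred):
    """RMSE, MAE, MAPE (%) and Pearson R between monitored y_obs and predicted y_pred."""
    y_obs, y_pred = np.asarray(y_obs, dtype=float), np.asarray(y_pred, dtype=float)
    err = y_obs - y_pred
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(err / y_obs))),   # y_obs must be nonzero
        "R": float(np.corrcoef(y_obs, y_pred)[0, 1]),
    }


def descent_ratios(metrics_a, metrics_b, keys=("RMSE", "MAE", "MAPE")):
    """P_metric = (metric_a - metric_b) / metric_a * 100 (a: contrastive, b: proposed)."""
    return {"P_" + k: 100.0 * (metrics_a[k] - metrics_b[k]) / metrics_a[k] for k in keys}
```

A positive descent ratio means the proposed model reduces that error metric relative to the contrastive model.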

Parameters Setting for All Experimental Models.
For the contrastive models that forecast with SVR and KELM, the regularization coefficient C and the kernel parameter σ² of these predictors are optimized by GS. These two parameters are searched in $[2^{-8}, 2^{8}]$ and $[2^{-5}, 2^{5}]$, respectively, with a search step of 0.5 on the exponent. For the contrastive models based on SSA and PSR, the Hankel window length l and the grouping parameter s of SSA are set to 100 and 21, respectively [47], while the time delay τ and embedding dimension d of PSR are set to 1 and 10, respectively. For all the experimental models fused with VMD, the parameters K and ...
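The exponent grid can be sketched as below, assuming the intended σ² range is [2^−5, 2^5] and that both endpoints are included with a step of 0.5 on the exponent, which gives 33 × 21 = 693 candidate pairs. The scoring callable (for instance, a validation RMSE) and the function names are hypothetical.

```python
import itertools

import numpy as np


def exponent_grid(lo, hi, step=0.5):
    """Values 2**e for e = lo, lo + step, ..., hi (both endpoints included)."""
    return 2.0 ** np.arange(lo, hi + step / 2, step)


def grid_search(score, c_exp=(-8, 8), s_exp=(-5, 5), step=0.5):
    """Return (best_score, C, sigma2) minimizing score(C, sigma2) over the exponent grid."""
    best = (np.inf, None, None)
    for C, s2 in itertools.product(exponent_grid(*c_exp, step), exponent_grid(*s_exp, step)):
        val = score(C, s2)
        if val < best[0]:
            best = (val, C, s2)
    return best
```

Searching on the exponent rather than on the raw value spreads the candidates evenly across several orders of magnitude, which suits parameters such as C and σ² whose useful values span a wide range.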
Meanwhile, the subseries decomposed by EMD and by VMD (with optimal parameters) are illustrated in Figure 8, which shows that the strongly nonstationary raw signal is decomposed into several subseries with distinct characteristic tendencies.

Contrastive Analysis.
In this section, the experimental results of the forecasting models are discussed in detail. The RMSE, MAE, and MAPE obtained by all the contrastive models and the proposed model are listed in Table 4. In addition, the descent ratios of the metrics obtained by the proposed model relative to each contrastive model are listed in Table 5. Analysis of the results in these two tables leads to the following conclusions: (1) Compared with SVR, KELM generally obtains lower RMSE, MAE, and MAPE, which preliminarily indicates that KELM yields better forecasting performance. The reduction ratios of RMSE, MAE, and MAPE for KELM are 9.51%, 11.14%, and 11.83%, respectively. The performance gap between these two models is not large, which can be attributed to the strong nonstationarity of the raw vibration signal that severely restricts both models. Hence, time-frequency signal preprocessing is the key to improving prediction performance. ... computational cost of each model that the computational complexity of the single models is almost the same, with that of KELM slightly smaller. The time consumption of the combined models using decomposition techniques and GS increases with the number of decomposed subseries. The proposed model has the greatest time consumption, but its improvement over the remaining models is significant: its average metric descent ratios relative to all the contrastive models are 79.74%, 79.18%, and 79.35%.
Additionally, the predicted and monitored values, as well as the prediction errors, of each experimental model are compared one by one in Figure 9, which allows the prediction results to be inspected visually. The predicted curve of the proposed approach approximates the actual values much more closely, and its error curve is distributed around zero with smaller fluctuations. Similarly, comparing Figures 9(d) and 9(f), and Figures 9(c) and 9(e), shows that the error curves of the models based on dominant ingredient chaotic analysis stay closer to zero with lower undulation, which further demonstrates the effectiveness and superiority of the proposed dominant ingredient chaotic analysis fused with SSA and PSR.
Furthermore, scatter plots illustrating the fitting degree of all the experimental models are shown in Figure 10. The points of the proposed approach are distributed most uniformly along the regression line, and its R value is 0.99862, the largest among all the models. Meanwhile, the combined models generally achieve significant improvements over the single models; among them, the proposed structure, namely, VMD-SSA-PSR-KELM, outperforms the remaining combined models and achieves the second-best performance among all the models.
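Figure 10 reports an R value for each model's scatter plot, but this section does not define it. Assuming R is the Pearson correlation coefficient between the monitored and predicted values (a common convention for regression scatter plots), it and the least-squares regression line can be computed with NumPy as sketched below; the helper names are hypothetical.

```python
import numpy as np


def regression_r(y_obs, y_pred):
    """Pearson correlation coefficient R between observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is R
    return float(np.corrcoef(y_obs, y_pred)[0, 1])


def fit_line(y_obs, y_pred):
    """Least-squares line y_pred ~ slope * y_obs + intercept, as drawn on a scatter plot."""
    slope, intercept = np.polyfit(np.asarray(y_obs, float), np.asarray(y_pred, float), 1)
    return float(slope), float(intercept)


# Toy usage: a near-perfect predictor gives R close to 1
obs = np.array([0.2, 0.5, 0.9, 1.4, 1.8])
pred = obs + np.array([0.01, -0.02, 0.015, -0.01, 0.005])
print(regression_r(obs, pred), fit_line(obs, pred))
```

An R close to 1 together with a slope close to 1 and an intercept close to 0 corresponds to points lying tightly and uniformly along the diagonal, which is how the scatter plots are read in the text.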
Furthermore, histograms of the evaluation metrics of all the experimental models are shown in Figure 11, which makes the conclusions summarized above directly visible. In particular, Figure 11(c) presents the MAPE of the various models both as a histogram and as dotted lines, so that the MAPE fluctuation of each model can be observed directly. The proposed approach attains the minimum values of all the assessment metrics, while the model sharing the same framework as the proposed one achieves the second-best performance. Furthermore, VMD-based models, such as VMD-KELM, VMD-SSA-PSR-KELM, and the proposed model, generally yield smaller metric values.

Conclusions
To achieve accurate forecasting of the vibration tendency, this paper proposes a novel hybrid approach that combines VMD, SSA, PSR, KELM, and an AMGWO-based parameter optimization strategy. Concretely, VMD was employed to decompose the vibration signal into several subseries, preliminarily weakening the nonstationarity and nonlinearity of the raw signal, and the optimal parameters of VMD were determined by minimizing LSEI with GS. Afterwards, SSA was implemented to separate the dominant and residuary ingredients of each subseries, after which all the residuary ingredients were accumulated with the residual of VMD to form an additional component for forecasting. To maximize the capability of the proposed prediction structure, i.e., VMD-SSA-PSR-KELM, adaptive updating strategies and a mutation operator were introduced into the standard GWO to enhance its parameter optimization performance, so that the parameters of PSR and KELM for each forecasting component can be optimized effectively. Ultimately, the forecasting values of the vibration tendency are obtained by accumulating the predicted values of all the components. In the experimental phase, two single models and four combined models were compared with the proposed one. The corresponding in-depth analysis at various levels demonstrated that (1) in this study, the forecasting models based on VMD achieve much better performance than the models based on EMD and the models without time-frequency decomposition; (2) the forecasting accuracy can be significantly enhanced by the proposed dominant ingredient chaotic analysis; and (3) the capability of the proposed hybrid forecasting framework can be maximized by using the proposed AMGWO algorithm to determine appropriate parameters for each component. Compared with the relevant comparison models, the average decreasing ratios achieved by the proposed approach in terms of RMSE, MAE, and MAPE are 79.73%, 79.18%, and 79.13%, respectively.
Therefore, the proposed hybrid approach can be considered a credible tool for vibration tendency forecasting.
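The forecasting chain summarized above (SSA separation, PSR, KELM, and summation of component forecasts) can be sketched for a single subseries as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: VMD and AMGWO are omitted, so a synthetic noisy sine stands in for one VMD subseries; the SSA window, the number r of retained eigentriples, the PSR delay τ (`tau`) and embedding dimension m, and the KELM regularization C (`c`) and RBF width γ (`gamma`) are fixed hypothetical values rather than optimized ones; treating the leading r eigentriples as the dominant ingredient is an assumed separation criterion; and the RBF kernel is an assumed kernel choice. With only one subseries, its residuary part plays the role of the accumulated residuary component.

```python
import numpy as np


def ssa_split(x, window, r):
    """Split a series into dominant (leading r eigentriples) and residuary parts via SSA."""
    x = np.asarray(x, dtype=float)
    n, k = len(x), len(x) - window + 1
    traj = np.column_stack([x[j:j + window] for j in range(k)])  # Hankel trajectory matrix
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    approx = (u[:, :r] * s[:r]) @ vt[:r, :]  # rank-r reconstruction
    dominant, counts = np.zeros(n), np.zeros(n)
    for i in range(window):  # diagonal averaging back to a 1-D series
        for j in range(k):
            dominant[i + j] += approx[i, j]
            counts[i + j] += 1
    dominant /= counts
    return dominant, x - dominant


def psr(x, m, tau):
    """Delay embedding: row t = [x[t], x[t+tau], ..., x[t+(m-1)tau]], target x[t+(m-1)tau+1]."""
    x = np.asarray(x, dtype=float)
    span = (m - 1) * tau
    n_samples = len(x) - span - 1
    X = np.array([x[t:t + span + 1:tau] for t in range(n_samples)])
    return X, x[span + 1:span + 1 + n_samples]


def rbf_kernel(a, b, gamma):
    sq = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KELM:
    """Kernel ELM: beta = (I / C + Omega)^(-1) y, prediction = K(x, X_train) @ beta."""

    def __init__(self, c, gamma):
        self.c, self.gamma = c, gamma

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        omega = rbf_kernel(self.X, self.X, self.gamma)
        self.beta = np.linalg.solve(np.eye(len(self.X)) / self.c + omega, y)
        return self

    def predict(self, X_new):
        return rbf_kernel(np.asarray(X_new, dtype=float), self.X, self.gamma) @ self.beta


def forecast_component(series, m, tau, c, gamma, n_train):
    """One-step-ahead KELM forecasts of one component over its test segment."""
    X, y = psr(series, m, tau)
    model = KELM(c, gamma).fit(X[:n_train], y[:n_train])
    return model.predict(X[n_train:]), y[n_train:]


rng = np.random.default_rng(0)
t = np.arange(400)
signal = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(t.size)
dominant, residuary = ssa_split(signal, window=40, r=2)
# Same m and tau for both components so the forecast time indices line up
pred_dom, true_dom = forecast_component(dominant, m=4, tau=2, c=100.0, gamma=0.5, n_train=300)
pred_res, true_res = forecast_component(residuary, m=4, tau=2, c=1.0, gamma=0.5, n_train=300)
forecast = pred_dom + pred_res  # assemble component forecasts
actual = true_dom + true_res
test_rmse = float(np.sqrt(np.mean((forecast - actual) ** 2)))
print(f"test RMSE: {test_rmse:.4f}")
```

In the full method, each component would receive its own (m, τ, C, γ) from AMGWO; here sharing m and τ across the two components is a simplification that keeps their one-step-ahead targets aligned for summation.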