Safety Monitoring of a Super-High Dam Using Optimal Kernel Partial Least Squares

Considering the complex nonlinear behavior and the multiple response variables of a super-high dam, the kernel partial least squares (KPLS) method, a strongly nonlinear multivariate analysis method, is introduced into the field of dam safety monitoring for the first time. A universal unified optimization algorithm is designed to select the key parameters of KPLS, yielding the optimal kernel partial least squares (OKPLS) method. OKPLS is then used to establish a strongly nonlinear multivariate safety monitoring model that identifies abnormal behavior of a super-high dam via model multivariate fusion diagnosis. Deformation monitoring data of a super-high arch dam were analyzed as a case study. Compared with the multiple linear regression (MLR), partial least squares (PLS), and KPLS models, the OKPLS model achieved the best fitting accuracy and forecast precision, and the model multivariate fusion diagnosis produced fewer false alarms than the traditional univariate diagnosis. OKPLS is therefore a promising method for super-high dam safety monitoring.


Introduction
Currently, 51 super-high dams higher than 200 m exist worldwide, and another 31 are under construction or proposed. These projects provide substantial comprehensive benefits, including power generation, flood control, and irrigation. At the same time, dam safety strongly affects the safety of people and property and the ecological environment in the area around the dam. Besides direct damage detection methods [1–4], dam safety monitoring models based on prototype monitoring data can effectively track dam behavior and help ensure dam safety [5]. These models first forecast future dam response values; the predicted values are then compared with the observed values to determine whether the observations are abnormal.
Super-high dams have special structural forms, such as huge structural dimensions and numerous structural joints and openings, and they operate in complex environments, such as high hydrostatic pressure and complex geological conditions. Strong structural nonlinearity, such as contact nonlinearity, is present, and both the effect of material nonlinearity and the coupling effect of environmental factors are amplified. As a result, the behavior of a super-high dam shows significant nonlinear characteristics; that is, there is a complex nonlinear relationship between the environmental variables and the response variables. The common physical hypotheses, namely, elastic and linear material behavior and the principle of superposition of effects, do not hold for super-high dams. Multiple linear regression (MLR) models [6,7], which rest on these hypotheses, cannot accurately model this complex nonlinear relationship and may perform poorly for super-high dams. In addition, traditional dam safety monitoring models are almost always based on a single response variable. Such models are time-consuming to implement and easily produce false alarms [8]. These drawbacks are more prominent for super-high dams because more monitoring instruments, including advanced distributed optical fiber, are installed to monitor their real-time behavior fully. Every instrument outputs one or more response variables that reflect the dam behavior. Therefore, strongly nonlinear multivariate safety monitoring models suited to super-high dams are urgently needed.
In recent years, a number of dam safety monitoring models with strong nonlinear mapping ability have been proposed, such as neural network (NN) models [6, 9–12] and support vector regression (SVR) models [13–15]. With appropriately selected parameters (NN layers and nodes; SVR kernel functions and regularization parameters), these nonlinear models can achieve better fitting accuracy and forecast precision than MLR models. However, these key parameters are difficult to select. Furthermore, training an NN requires considerable computing time, and NNs easily fall into local minima. Alternatively, principal component analysis (PCA), a multivariate statistical data analysis method, has been applied to multivariate dam safety monitoring [8, 16–18]. The environmental effect component is first extracted from the multiple response variables by PCA, and abnormalities are then identified by analyzing the extracted component. PCA can eliminate data noise and redundancy and reduce false alarms. However, PCA does not consider the environmental variables that influence the dam response variables when extracting the environmental effect component, so the extracted component may not be the true environmental effect. The partial least squares (PLS) method is a better multivariate statistical data analysis method than PCA for process monitoring and output prediction [19]. PLS combines features of MLR, canonical correlation analysis (CCA), and PCA, and it can establish a regression model between multiple dependent variables and multiple independent variables. However, PLS is currently used mainly to establish univariate safety monitoring models that overcome the multicollinearity of environmental variables [13,20]. Furthermore, PLS is essentially a linear regression and cannot accurately capture the complex nonlinear relationship between the environmental variables and the response variables of a super-high dam.
The kernel partial least squares (KPLS) method [21] is a nonlinear extension of PLS for nonlinear problems. In KPLS, the original input data are nonlinearly mapped into a high-dimensional feature space via a kernel function, and a linear PLS model is then built in that space, so that the linear relationship obtained by PLS in the high-dimensional space corresponds to a nonlinear relationship in the original input space. KPLS retains all of the advantages of PLS while adding strong nonlinear mapping ability. Compared with other nonlinear PLS approaches, such as spline PLS [22], quadratic PLS [23,24], and neural network PLS [25,26], KPLS requires essentially only linear algebra in the high-dimensional space, which makes it as simple as linear PLS. Moreover, KPLS can handle a wide range of nonlinearities through different types of kernel functions. In the last decade, KPLS has been applied to nonlinear multivariate quality prediction [27,28] and process monitoring [29,30].
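The claim that KPLS needs only linear algebra rests on the kernel trick: a kernel evaluated in the input space equals a dot product in the feature space, so the mapping never has to be formed. The check below is only an illustration; it uses a 2-D quadratic kernel $k(x, z) = \langle x, z \rangle^2$, chosen because its feature map is small enough to write out (the radial basis kernel used later in this paper has an infinite-dimensional feature map and cannot be checked this way). All function names are hypothetical.

```python
import math


def explicit_map(x):
    """Explicit feature map of the quadratic kernel for 2-D inputs:
    phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2.0) * x1 * x2)


def quadratic_kernel(x, z):
    """k(x, z) = <x, z>^2, computed in the input space without forming phi."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


x, z = (1.0, 2.0), (3.0, -1.0)
print(quadratic_kernel(x, z))                 # 1.0
print(dot(explicit_map(x), explicit_map(z)))  # same value up to rounding
```

The kernel costs one 2-D dot product, while the explicit route builds 3-D vectors; for the radial basis kernel the explicit route is impossible, which is why KPLS works only with kernel values.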
As with NN and SVR, the kernel function and the number of latent variables in KPLS strongly influence its generalization performance [30–32]. Parameter selection is simpler for KPLS than for SVR because, apart from the choice of kernel, the only parameter is the number of latent variables, and only a few discrete values of it need to be considered, as opposed to the continuous parameters of SVR. The number of latent variables is usually selected by the adjusted Wold's R criterion [30,33]. However, selecting the kernel function is still an open problem in KPLS [33,34]. Currently, the radial basis function, which has strong local approximation ability, is the most commonly used kernel, and its kernel parameter is selected by empirical formulas [30,33] or by cross-validation [31,32]. The formulas are not generally applicable, and the values they give may not be optimal. Cross-validation of the kernel parameter depends on the number of latent variables, and, conversely, the criterion for the number of latent variables depends on the kernel parameters. A practical, unified method for selecting both the kernel function and the number of latent variables is still lacking.
In this paper, a universal unified optimization algorithm is designed to select the KPLS parameters and obtain the optimal kernel partial least squares (OKPLS). OKPLS is then used to establish a strongly nonlinear multivariate safety monitoring model that helps ensure the safety of a super-high dam. The paper is organized as follows. Section 2.1 introduces the basic principle of KPLS. Section 2.2 presents a universal unified optimization algorithm for selecting the KPLS parameters. Section 3.1 describes how to establish the strongly nonlinear multivariate safety monitoring model of a super-high dam using OKPLS, and Section 3.2 presents a multivariate fusion diagnosis method for the model. Section 4 analyzes, as a case study, radial deformation monitoring data obtained from the pendulums of a super-high arch dam. Conclusions are given in Section 5.

Optimal Kernel Partial Least Squares (OKPLS)
The basic principle of PLS is to maximize the covariance between the input and output variables. PLS eliminates data noise and extracts the comprehensive (latent) variables that best explain the system. A least squares regression is then performed on the latent variables, which gives the relationship between the input and output variables of the system. However, when a system has strongly nonlinear characteristics, that is, when there is a complex nonlinear relationship between its input and output variables, linear PLS is not appropriate for modeling the system. According to Cover's theorem, a nonlinear data structure in the input space is more likely to become linear after a nonlinear mapping into a high-dimensional space. KPLS formulates PLS in this high-dimensional space, extending linear PLS to its nonlinear kernel form. Hence, KPLS can model a complex nonlinear system, as shown in Figure 1.
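Assume that $n$ measurements of each variable are collected while the system operates under normal conditions. The input and output data are denoted by the matrices $X \in \mathbb{R}^{n \times m}$ and $Y \in \mathbb{R}^{n \times p}$, respectively. Consider a nonlinear mapping $\phi$ that maps each input vector $x_i \in \mathbb{R}^m$ to $\phi(x_i)$ in a high-dimensional feature space, and let $\Phi = [\phi(x_1), \ldots, \phi(x_n)]^T$. The KPLS regression model is

$$Y = \Phi B + F, \tag{1}$$

where $B$ is the matrix of regression coefficients (one column per output variable) and $F$ is the $(n \times p)$ matrix of residuals.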
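In KPLS, the kernel function $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$ avoids both performing the nonlinear mapping explicitly and computing dot products in the high-dimensional space. $\Phi\Phi^T$ is the $(n \times n)$ kernel Gram matrix $K$ of the dot products between all mapped vectors $\{\phi(x_i)\}_{i=1}^{n}$, with entries $K_{ij} = k(x_i, x_j)$. From the matrices $K$ and $Y$, the modified KPLS algorithm [30] proceeds as follows:
(1) Set $i = 1$, $K_1 = K$, and $Y_1 = Y$.
(2) Initialize the $(n \times 1)$ score vector $u_i$ of the $i$th latent variable of $Y_i$ as the maximum-variance column of $Y_i$.
(3) Compute $t_i = K_i u_i$ and normalize it, $t_i \leftarrow t_i / \lVert t_i \rVert$.
(4) Compute $c_i = Y_i^T t_i$.
(5) Compute $u_i = Y_i c_i$ and normalize it, $u_i \leftarrow u_i / \lVert u_i \rVert$.
(6) If $t_i$ has converged, go to step (7); otherwise, return to step (3).
(7) Deflate: $K_{i+1} = (I - t_i t_i^T) K_i (I - t_i t_i^T)$ and $Y_{i+1} = Y_i - t_i t_i^T Y_i$.
(8) Set $i = i + 1$. If $i > A$, where $A$ is the number of latent variables, stop; otherwise, return to step (2).

With $T = [t_1, \ldots, t_A]$ and $U = [u_1, \ldots, u_A]$, the regression coefficients are

$$B = \Phi^T U (T^T K U)^{-1} T^T Y. \tag{2}$$

The prediction of the output variables is given by

$$\hat{Y} = \Phi B = K U (T^T K U)^{-1} T^T Y. \tag{3}$$

For a new observation $x$ of the input variables, the output (a $1 \times p$ row vector) is estimated by

$$\hat{y} = \phi(x)^T B = k(x)^T U (T^T K U)^{-1} T^T Y, \tag{4}$$

where $\phi(x)$ is the mapped vector of the new observation $x$ in the high-dimensional space and $k(x) = [k(x_1, x), \ldots, k(x_n, x)]^T$ is the vector of kernel functions evaluated at the pairs $(x_i, x)$ for $i = 1, \ldots, n$.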
Before KPLS is applied, the data should be mean-centered in the high-dimensional space. This centering is performed by replacing the kernel matrix $K$ and the kernel vector $k(x)$ with $\tilde{K}$ and $\tilde{k}(x)$:

$$\tilde{K} = (I - E) K (I - E), \qquad \tilde{k}(x) = (I - E)\left(k(x) - K e\right), \tag{5}$$

where $I$ is the $n$-dimensional identity matrix, $E$ is an $n \times n$ matrix with all entries equal to $1/n$, and $e$ is a column vector with all $n$ entries equal to $1/n$.
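To make the modified KPLS algorithm, the predictions (3) and (4), and the kernel centering concrete, here is a minimal pure-Python sketch. It assumes the radial basis kernel $k(a, b) = \exp(-\lVert a - b \rVert^2 / c)$, centers $Y$ by its column means (a common convention the text does not state explicitly), and stores the dual coefficients $U(T^T\tilde{K}U)^{-1}T^TY$ so that predictions need only kernel evaluations. It is not the implementation of [30]; the names `kpls_fit` and `kpls_predict` are hypothetical, and the explicit $O(n^3)$ deflation is meant for small illustrative data only.

```python
import math


def rbf_kernel(a, b, c):
    """Radial basis kernel k(a, b) = exp(-||a - b||^2 / c)."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / c)


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kpls_fit(X, Y, c, A, tol=1e-10, max_iter=500):
    """Fit KPLS with A latent variables; returns what kpls_predict needs."""
    n, p = len(X), len(Y[0])
    K = [[rbf_kernel(xi, xj, c) for xj in X] for xi in X]
    # Kernel centering: Kt = (I - E) K (I - E); Ke = K e holds the row means
    Ke = [sum(row) / n for row in K]
    K_mean = sum(Ke) / n
    Kt = [[K[i][j] - Ke[i] - Ke[j] + K_mean for j in range(n)] for i in range(n)]
    y_mean = [sum(Y[i][j] for i in range(n)) / n for j in range(p)]
    Yc = [[Y[i][j] - y_mean[j] for j in range(p)] for i in range(n)]
    Ki, Yi, T, U = [r[:] for r in Kt], [r[:] for r in Yc], [], []
    for _ in range(A):
        # Start u from the maximum-variance column of Y_i, as in step (2)
        var = [sum(Yi[r][j] ** 2 for r in range(n)) for j in range(p)]
        jmax = var.index(max(var))
        u = [Yi[r][jmax] for r in range(n)]
        t_old = None
        for _ in range(max_iter):
            # NIPALS-style update: t = K_i u, c_w = Y_i^T t, u = Y_i c_w (t, u normalized)
            t = [sum(Ki[r][s] * u[s] for s in range(n)) for r in range(n)]
            norm_t = math.sqrt(sum(v * v for v in t))
            t = [v / norm_t for v in t]
            c_w = [sum(Yi[r][j] * t[r] for r in range(n)) for j in range(p)]
            u = [sum(Yi[r][j] * c_w[j] for j in range(p)) for r in range(n)]
            norm_u = math.sqrt(sum(v * v for v in u))
            u = [v / norm_u for v in u]
            if t_old is not None and max(abs(a - b) for a, b in zip(t, t_old)) < tol:
                break
            t_old = t
        T.append(t)
        U.append(u)
        # Deflation with P = I - t t^T: K_i <- P K_i P, Y_i <- P Y_i
        P = [[(1.0 if r == s else 0.0) - t[r] * t[s] for s in range(n)] for r in range(n)]
        Ki = matmul(matmul(P, Ki), P)
        Yi = matmul(P, Yi)
    # T and U hold one score vector per row, i.e. they store T^T and U^T
    U_cols = [list(col) for col in zip(*U)]             # n x A
    TKU = matmul(matmul(T, Kt), U_cols)                 # T^T Kt U (A x A)
    alpha = matmul(U_cols, solve(TKU, matmul(T, Yc)))   # U (T^T Kt U)^-1 T^T Y
    return {"X": X, "c": c, "Ke": Ke, "alpha": alpha, "y_mean": y_mean}


def kpls_predict(model, x):
    """Prediction (4) using the centered kernel vector (I - E)(k(x) - K e)."""
    X, Ke, alpha = model["X"], model["Ke"], model["alpha"]
    n = len(X)
    d = [rbf_kernel(xi, x, model["c"]) - Ke[i] for i, xi in enumerate(X)]
    d_mean = sum(d) / n
    kt = [v - d_mean for v in d]
    return [sum(kt[i] * alpha[i][j] for i in range(n)) + model["y_mean"][j]
            for j in range(len(model["y_mean"]))]


# Usage: fit one period of a sine curve (synthetic data, not dam data)
X = [[2 * math.pi * i / 19] for i in range(20)]
Y = [[math.sin(x[0])] for x in X]
model = kpls_fit(X, Y, c=1.0, A=6)
fitted = [kpls_predict(model, x)[0] for x in X]
```

Because the centered kernel vector of a training point equals the corresponding column of $\tilde{K}$, calling `kpls_predict` on the training inputs reproduces the fitted values of (3). A production version would use a linear algebra library instead of list-based matrices.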

Optimization Selection of the KPLS Parameters.
There are two main issues in KPLS: (1) the selection of the kernel function and its parameters and (2) the selection of the number of latent variables. Both strongly influence the generalization performance of KPLS.
Any symmetric function satisfying Mercer's theorem, such as the polynomial kernel, the radial basis kernel, or the sigmoid kernel, can be used as a kernel function. For a specific application and a given set of samples, constructing an appropriate kernel function is the key to applying kernel methods, yet no effective method exists for constructing such a kernel function. Most of the progress has been made in selecting the kernel parameters. Cross-validation [35] is a universal method for selecting model parameters, and the parameters it yields are regarded as optimal.
In $k$-fold cross-validation, the samples are randomly split into $k$ blocks containing approximately the same number of samples. For a given set of parameters, a model is established from $k - 1$ blocks, the excluded block is used for testing, and an individual predicted error sum of squares (PRESS) is calculated. This procedure is repeated so that each block is excluded once and only once, and the total PRESS for those parameters is the sum of the individual PRESS values. The total PRESS estimates the generalization performance of the parameters. Cross-validation is repeated for different parameter values, and the parameters with the minimum PRESS are considered optimal. Note that cross-validation only provides an index of the expected risk of one model, such as the PRESS described above; an optimization algorithm, such as grid search, a genetic algorithm, or particle swarm optimization, is still needed to find the optimal kernel parameters.
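The total PRESS of one parameter setting can be computed generically, as in the following sketch. `fit` and `predict` are hypothetical placeholders for whatever model is being tuned (for example KPLS with a fixed kernel parameter and number of latent variables); the random split uses a seeded generator so the blocks are reproducible.

```python
import random


def kfold_press(X, Y, fit, predict, k=10, seed=0):
    """Total PRESS of one parameter setting by k-fold cross-validation.

    fit(X_train, Y_train) -> model and predict(model, x) -> list of outputs
    are placeholders for the model being tuned.
    """
    n = len(X)
    order = list(range(n))
    random.Random(seed).shuffle(order)          # random split into k blocks
    blocks = [order[f::k] for f in range(k)]    # block sizes differ by at most 1
    press = 0.0
    for block in blocks:
        held = set(block)
        train = [i for i in range(n) if i not in held]
        model = fit([X[i] for i in train], [Y[i] for i in train])
        for i in block:                         # individual PRESS of this block
            y_hat = predict(model, X[i])
            press += sum((a - b) ** 2 for a, b in zip(Y[i], y_hat))
    return press                                # total PRESS


# Usage with a trivial "predict the training mean" model
def fit_mean(X_train, Y_train):
    p = len(Y_train[0])
    return [sum(y[j] for y in Y_train) / len(Y_train) for j in range(p)]


def predict_mean(model, x):
    return model


X = [[0.0], [1.0], [2.0], [3.0]]
Y = [[0.0], [0.0], [0.0], [4.0]]
print(kfold_press(X, Y, fit_mean, predict_mean, k=4))  # leave-one-out: 64/3, about 21.33
```

With $k = n$ the procedure reduces to leave-one-out cross-validation; the case study below uses $k = 10$.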
The number of latent variables is usually selected by the adjusted Wold's $R$ criterion, which is also based on cross-validation. It differs from cross-validation for kernel parameter selection in that the evaluation index is not the PRESS itself but the ratio $R(h) = \mathrm{PRESS}(h)/\mathrm{PRESS}(h - 1)$, where $\mathrm{PRESS}(h)$ is the total PRESS calculated with $h$ latent variables. When $R(h)$ exceeds a predefined threshold (e.g., 0.9), the optimal number of latent variables is $h - 1$. This rule avoids an overfitted model with poor prediction ability caused by including too many latent variables [36].
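The ratio rule can be sketched in a few lines. The function name is hypothetical, and treating `press[0]` as the PRESS of the mean-only model (zero latent variables) is an assumption of this sketch; if the ratio never exceeds the threshold, the largest tested number is returned.

```python
def wold_r_select(press, threshold=0.9):
    """Adjusted Wold's R criterion.

    press[h] is the total cross-validated PRESS with h latent variables,
    h = 0, 1, ..., h_max (press[0] assumed to be the mean-only model).
    Returns the selected number of latent variables.
    """
    for h in range(1, len(press)):
        if press[h] / press[h - 1] > threshold:   # R(h) exceeds the threshold
            return h - 1
    return len(press) - 1                          # never exceeded: use h_max


print(wold_r_select([100.0, 40.0, 20.0, 19.0, 18.5]))  # R(3) = 0.95 > 0.9, so 2
```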
Methods for selecting the kernel parameters and the number of latent variables have thus been given separately. Because selecting either one involves the other, a unified optimization, similar to that used for SVR, is the best approach. However, the two use different evaluation indices, which complicates a unified solution. Hence, a universal unified optimization algorithm for selecting the KPLS parameters is designed, as shown in Figure 2. The algorithm contains two loops. The outer loop optimizes the kernel parameters with a genetic algorithm. The inner loop selects the number $A$ of latent variables by the adjusted Wold's $R$ criterion, and the resulting $\mathrm{PRESS}(A)$ is used as the objective of the outer optimization. The optimal kernel parameters and number of latent variables are obtained after a predefined number of optimization iterations, which serves as the termination condition. Finally, OKPLS is obtained by setting the KPLS parameters to these optimal values.
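The two-loop structure can be sketched as follows, with fitness $f = 1/\mathrm{PRESS}(A)$. The paper's genetic algorithm settings (Table 1) are not reproduced here; this sketch substitutes a minimal real-coded GA (binary tournament selection, arithmetic crossover, Gaussian mutation, elitism) with arbitrary settings. In practice the inner loop would compute $\mathrm{PRESS}(h)$ by $k$-fold cross-validation of KPLS for each candidate kernel parameter $c$ (named as in the assumed kernel $\exp(-\lVert x_i - x_j \rVert^2 / c)$); to keep the example small and runnable, `toy_press_curve` is a hypothetical stand-in whose minimum is placed at $c = 3.5$ purely for illustration.

```python
import math
import random


def wold_r_select(press, threshold=0.9):
    for h in range(1, len(press)):
        if press[h] / press[h - 1] > threshold:
            return h - 1
    return len(press) - 1


def toy_press_curve(c, h_max=10):
    """HYPOTHETICAL stand-in for PRESS(0..h_max) at kernel parameter c.
    In practice each entry comes from k-fold cross-validation of KPLS."""
    base = 1.0 + math.log(c / 3.5) ** 2
    return [base * (10.0 * 0.5 ** min(h, 5) + 1.0) for h in range(h_max + 1)]


def inner_loop(c, press_curve):
    """Select A by the adjusted Wold's R criterion; fitness f = 1/PRESS(A)."""
    press = press_curve(c)
    A = wold_r_select(press)
    return A, 1.0 / press[A]


def optimise(press_curve, lo, hi, pop_size=20, generations=30, seed=0):
    """Outer loop: a minimal real-coded genetic algorithm over c in [lo, hi]."""
    rng = random.Random(seed)

    def evaluate(c):
        A, f = inner_loop(c, press_curve)
        return (f, c, A)

    def pick():                                       # binary tournament selection
        return max(rng.sample(scored, 2))[1]

    scored = [evaluate(rng.uniform(lo, hi)) for _ in range(pop_size)]
    best = max(scored)
    for _ in range(generations):
        children = [best[1]]                          # elitism: keep the best c
        while len(children) < pop_size:
            w = rng.random()
            child = w * pick() + (1.0 - w) * pick()   # arithmetic crossover
            if rng.random() < 0.2:                    # Gaussian mutation
                child += rng.gauss(0.0, 0.1 * (hi - lo))
            children.append(min(max(child, lo), hi))
        scored = [evaluate(ch) for ch in children]
        best = max(scored + [best])
    f, c, A = best
    return c, A, f


c_opt, A_opt, f_opt = optimise(toy_press_curve, lo=0.1, hi=10.0)
print(A_opt)  # 5 for this toy curve
```

Only the outer loop changes the kernel parameter, so a real implementation would build the Gram matrix once per candidate $c$ and reuse it for every number of latent variables tested in the inner loop.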
The designed algorithm has the following characteristics:
(1) The algorithm is applicable to any kernel function.
(2) The kernel parameters and the number of latent variables are selected together, and both are validated by cross-validation in the inner loop.
(3) The two different evaluation indices of the KPLS parameters are used in the inner and outer loops, respectively, so that the kernel parameters and the number of latent variables are both optimal.
(4) The kernel parameters may take any continuous value, whereas the number of latent variables takes only a few discrete values. Moreover, the kernel Gram matrix $K$, whose computation is the largest cost in KPLS, must be recalculated for every new set of kernel parameters but not for a new number of latent variables. Placing the kernel parameters in the outer loop and the number of latent variables in the inner loop therefore reduces the computational cost of selecting the optimal KPLS parameters.
(5) The algorithm is easy to implement and execute on a computer.
[Figure 2: flow chart of the optimization algorithm; recoverable box labels: "Calculate PRESS(j) by cross-validation"; "Obtain the number of latent variables"; "Output the optimal values of kernel parameters and the number of latent variables and fitness value f = 1/PRESS(A)".]

Super-High Dam Safety Monitoring Using OKPLS
For the dam system shown in Figure 3, the environmental variables that influence the dam can be regarded as system inputs, generally including hydrostatic pressure ($H$), seasonal temperature ($T$), and time effect ($\Theta$). The dam response variables measured by monitoring instruments installed on the dam, for example, deformation, seepage, and stress, can be regarded as system outputs. Dam safety monitoring is the process of identifying abnormal dam behavior from the inputs and outputs of the dam system. When no abnormality occurs in the dam structure or the monitoring instruments, a deterministic relationship exists between the inputs and outputs. Mathematical methods can therefore be used to obtain this relationship from input and output data free of abnormalities and to establish dam safety monitoring models, which are then used for model diagnosis of dam behavior: the models forecast future dam response values from new values of the environmental variables and identify abnormal dam behavior by comparing the predicted values with the observed values.
Given that a super-high dam is a complex nonlinear system with multiple inputs and multiple outputs, OKPLS, a strongly nonlinear multivariate statistical data analysis method, is used for super-high dam safety monitoring.

Modeling Based on OKPLS.
Establishing a dam safety monitoring model involves obtaining the deterministic relationship between the environmental variables and the dam response variables from monitoring data free of abnormalities. For a super-high dam, this relationship is complex and nonlinear. As shown in Figure 1, KPLS captures it through one nonlinear mapping, two linear mappings, and one linear regression: the original environmental variables are first transformed into a high-dimensional space by the nonlinear mapping $\phi$, and the complex nonlinear relationship is then obtained in that space through the two linear mappings and the linear regression of PLS.
The original environmental variables contain noise, and the nonlinear mapping introduces components that are redundant or irrelevant to the response variables. The useful environmental component $t_i$ is extracted by the first linear mapping. Similarly, the useful response component $u_i$, which describes the behavior of the super-high dam, is extracted by the second linear mapping. Finally, the linear regression establishes the relationship between $u_i$ and $t_i$. The useful components $t_i$ and $u_i$ are the score vectors of the $i$th pair of latent variables, which PLS extracts jointly from the environmental variables and the response variables. Hence, the resulting environmental and response components best explain the behavior of the super-high dam. Furthermore, because the dam response variables are highly correlated (that is, highly multicollinear), multivariate KPLS may be advantageous in eliminating data noise and redundancy [21].
OKPLS is KPLS with its parameters optimized for the problem at hand, so it is more likely to capture the correct relationship between the environmental variables and the response variables of a super-high dam. The main steps in establishing the strongly nonlinear multivariate safety monitoring model of a super-high dam based on OKPLS are as follows.
Step 1. Select as the OKPLS outputs $y = \{y_1, y_2, \ldots, y_p\}$ multiple response variables that belong to the same monitoring item and together reflect a particular behavior of the super-high dam, such as deformation, seepage, or stress. Their measuring points are adjacent, and the response variables are strongly correlated.
Step 2. As in MLR safety monitoring models [6,7], select, on the basis of physical and mechanical analysis, the environmental variables that influence the behavior chosen in Step 1 as the OKPLS inputs $x = \{x_1, x_2, \ldots, x_m\}$, for example, hydrostatic pressure ($H$), seasonal temperature ($T$), and time effect ($\Theta$) terms.
Step 3. Select a period of normal historical monitoring data $\{X, Y\}$ of the environmental variables and the response variables that covers the widest possible range of variation; that is, the data should include extreme environmental conditions, for example, the highest and lowest possible reservoir water levels and temperatures.
Step 4. Select an appropriate type of kernel function $k(x_i, x_j)$ to generate the kernel Gram matrix $K$, and then select the optimal kernel parameters and number of latent variables using the universal unified optimization algorithm shown in Figure 2.
Step 5. Calculate the matrix B of the regression coefficients using the modified KPLS algorithm.
Step 6. Calculate the predicted values of the OKPLS model according to (3) and (4).

Model Multivariate Fusion Diagnosis.
Model diagnosis of dam behavior identifies abnormal dam behavior by comparing the observed values with the values predicted by the safety monitoring model. When abnormalities occur in the dam structure or the monitoring instruments, the residual, that is, the deviation between the observed value and the predicted value, increases significantly and exceeds certain control limits. The prediction accuracy of the safety monitoring model is therefore important for model diagnosis: if it is poor, the residuals contain a large model error, which leads to incorrect diagnosis conclusions. According to the small-probability-event principle, the $i$th observation is classified as normal, almost normal, or abnormal as follows [5]:

$$\begin{cases} |y_i - \hat{y}_i| \le 2\sigma & \text{normal},\\ 2\sigma < |y_i - \hat{y}_i| \le 3\sigma \text{ and no tendency variation in the next 2 or 3 observations} & \text{almost normal},\\ |y_i - \hat{y}_i| > 3\sigma & \text{abnormal}, \end{cases} \tag{6}$$

where $y_i$ is the $i$th observed value, $\hat{y}_i$ is the $i$th predicted value, and $\sigma$ is the standard deviation of the residuals, estimated by [37]

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - m - 1}}, \tag{7}$$

where $n$ is the number of observations used to establish the model and $m$ is the number of environmental variables used to establish the model. This model diagnosis method is the traditional univariate diagnosis; its monitoring control charts are shown in Figure 4.
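The univariate rule can be sketched as below, assuming that (6) is the usual 2σ/3σ band rule and that σ is estimated with $n - m - 1$ degrees of freedom as stated above. The function names are hypothetical, and the "no tendency variation in the next 2 or 3 observations" check is left out because it needs later observations.

```python
import math


def residual_std(y_obs, y_pred, m):
    """sigma = sqrt(SSE / (n - m - 1)) with m environmental variables."""
    n = len(y_obs)
    sse = sum((a - b) ** 2 for a, b in zip(y_obs, y_pred))
    return math.sqrt(sse / (n - m - 1))


def univariate_status(y, y_hat, sigma):
    """Classify one observation by |y - y_hat| against the 2-sigma and 3-sigma bands.
    The trend check for the middle band is not implemented in this sketch."""
    r = abs(y - y_hat)
    if r <= 2.0 * sigma:
        return "normal"
    if r <= 3.0 * sigma:
        return "almost normal"
    return "abnormal"


sigma = residual_std([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0, 7.0], m=1)
print(round(sigma, 4))                        # sqrt(4 / 3) -> 1.1547
print(univariate_status(12.5, 10.0, 1.0))     # almost normal
```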
The OKPLS model is a multivariate model that outputs multiple response variables, which makes it well suited to multivariate fusion diagnosis that integrates the information of multiple response variables. Multivariate fusion diagnosis not only reduces the number of variables to be diagnosed but also reduces incorrect diagnosis conclusions caused by the incomplete information of individual response variables. The key tasks are to construct a fusion diagnosis index and to determine its control limits. Drawing on the squared prediction error (SPE) used in multivariate statistical process monitoring [29,30], a fusion diagnosis index FDI is constructed as

$$\mathrm{FDI}_i = \frac{1}{p} \sum_{j=1}^{p} \left( \frac{y_{ij} - \hat{y}_{ij}}{\sigma_j} \right)^2, \tag{8}$$

where $\mathrm{FDI}_i$ is the value of FDI for the $i$th observation and represents the combined prediction error of the $p$ response variables; $p$ is the number of response variables to be diagnosed; $y_{ij}$ and $\hat{y}_{ij}$ are the $i$th observed and predicted values of the $j$th response variable; and $\sigma_j$ is the standard deviation of the residuals of the $j$th response variable. Each residual $y_{ij} - \hat{y}_{ij}$ is divided by $\sigma_j$ to remove the differences in scale between the residuals of the response variables, so that all response variables are treated equally in the fusion diagnosis.
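A sketch of the index for one observation follows. It assumes that (8) averages the $p$ standardized squared residuals; if the index is instead a plain sum, drop the division by `p`. The function name and the numbers in the usage example are hypothetical.

```python
def fusion_index(y_obs, y_pred, sigmas):
    """FDI for one observation, assuming the averaged form:
    (1/p) * sum_j ((y_j - y_hat_j) / sigma_j) ** 2 over p response variables."""
    p = len(sigmas)
    return sum(((y - yh) / s) ** 2 for y, yh, s in zip(y_obs, y_pred, sigmas)) / p


# Six displacements: one residual of 3.5 sigma (abnormal on its own under the
# 3-sigma rule) and five residuals of 0.5 sigma.
obs = [3.5, 0.5, 0.5, 0.5, 0.5, 0.5]
pred = [0.0] * 6
print(fusion_index(obs, pred, [1.0] * 6))   # (12.25 + 5 * 0.25) / 6 = 2.25
```

Under this assumed form, a single reading far outside its own 3σ band raises the index only moderately, which matches the behavior described next for single-instrument malfunctions.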
In dam safety monitoring, the focus is on global structural damage and serious local structural damage. According to (8), the FDI increases significantly and exceeds its control limits only when such damage occurs. A malfunction of a single monitoring instrument or slight local structural damage usually does not greatly increase the FDI, so these are not identified as abnormal. In the traditional univariate diagnosis, by contrast, all of the abnormalities described above may be flagged, which greatly increases the workload of subsequent abnormality analysis.
According to (8), the fusion diagnosis index FDI is nonnegative and does not follow a normal distribution, so a diagnosis rule such as (6) is not directly applicable. Instead, following the small-probability-event principle, the probability distribution of FDI is first estimated, control limits are then set from this distribution, and the dam behavior is finally diagnosed against these limits. Because the type of the FDI distribution is unknown, the kernel density estimation method [38] is used to estimate it in this paper. Similar to (6), two control limits, UCL1 and UCL2, are set such that the probabilities of FDI lying below them are 95.44% and 99.74%, respectively; these correspond to two and three standard deviations of a normal distribution. The FDI is then classified as normal, almost normal, or abnormal as follows:

$$\begin{cases} \mathrm{FDI}_i \le \mathrm{UCL1} & \text{normal},\\ \mathrm{UCL1} < \mathrm{FDI}_i \le \mathrm{UCL2} \text{ and no tendency variation in the next 2 or 3 observations} & \text{almost normal},\\ \mathrm{FDI}_i > \mathrm{UCL2} & \text{abnormal}. \end{cases} \tag{9}$$

The corresponding monitoring control charts are shown in Figure 5. When the FDI is identified as abnormal, close attention is required: each dam response variable should be analyzed further using the model outputs to find the cause of the abnormality before an alarm is issued.
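The control limits can be sketched with a Gaussian kernel density estimate whose CDF is inverted by bisection. The case study uses MATLAB's `ksdensity`; this pure-Python sketch instead assumes Silverman's rule-of-thumb bandwidth, which may differ from the `ksdensity` default, and it lets the Gaussian kernels place a little mass below zero even though FDI is nonnegative (harmless for upper-tail limits). All names are hypothetical, the training FDI values are synthetic, and the trend check of (9) is omitted.

```python
import math
import random


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth 1.06 * s * n^(-1/5) for a Gaussian kernel."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((d - mean) ** 2 for d in data) / (n - 1))
    return 1.06 * s * n ** (-0.2)


def kde_cdf(data, h, x):
    """CDF at x of a Gaussian kernel density estimate with bandwidth h."""
    return sum(0.5 * (1.0 + math.erf((x - d) / (h * math.sqrt(2.0)))) for d in data) / len(data)


def kde_quantile(data, prob, iters=100):
    """Value below which the estimated distribution puts probability prob (bisection)."""
    h = silverman_bandwidth(data)
    lo, hi = min(data) - 10.0 * h, max(data) + 10.0 * h
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kde_cdf(data, h, mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fdi_status(fdi, ucl1, ucl2):
    """Threshold part of rule (9), without the trend check on later observations."""
    if fdi <= ucl1:
        return "normal"
    if fdi <= ucl2:
        return "almost normal"
    return "abnormal"


# Synthetic training FDI values (not the dam data), purely for illustration
rng = random.Random(1)
fdi_train = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(6)) / 6 for _ in range(750)]
ucl1 = kde_quantile(fdi_train, 0.9544)
ucl2 = kde_quantile(fdi_train, 0.9974)
print(ucl1, ucl2, fdi_status(1.0, ucl1, ucl2))
```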
In summary, super-high dam safety monitoring using OKPLS first establishes a strongly nonlinear multivariate safety monitoring model based on OKPLS and then performs model multivariate fusion diagnosis to identify abnormal behavior of the super-high dam. The overall flow chart is shown in Figure 6.

Case Study
The studied super-high arch dam is a double-curvature arch dam with a maximum height of 294.5 m. As shown in Figure 7, the crest elevation is 1245 m and the crest length is 901.771 m. The dam consists of 43 dam sections and a thrust pier. The crest width and bottom width of its crown cantilever (the 22nd dam section) are 12 m and 72.912 m, respectively. The dam body contains 5 crest overflow surface holes, 6 flood discharge middle holes, 2 emptying bottom holes, 4 diversion middle holes, and 2 diversion bottom holes. The normal water level is 1240 m, with a corresponding storage capacity of 14.914 billion m³. As shown in Figure 8, the dam foundation contains one grade II fault (more than 1000 m long and 18–37 m wide), 19 grade III faults (100–1000 m long and 0.5–4 m wide), many grade IV faults (10–100 m long and 0.1–0.5 m wide), and 5 large alteration zones. In total, 9406 monitoring points are arranged in the dam to monitor its behavior, such as deformation, seepage, and stress, comprehensively. Among them, 52 pendulum monitoring points monitor the horizontal deformation of the dam body and foundation, as shown in Figure 7. The radial displacements measured by the pendulums in the central block of the dam were used in this paper to validate super-high dam safety monitoring using OKPLS. These pendulums comprise one inverted and five hanging pendulum monitoring points, located at elevations 963.00 m (denoted $\delta_0$), 1010.00 m ($\delta_1$), 1065.00 m ($\delta_2$), 1100.00 m ($\delta_3$), 1173.70 m ($\delta_4$), and 1245.00 m ($\delta_5$).
In total, 764 monitoring samples were obtained from the pendulums between July 1, 2010, and December 31, 2012, as shown in Figure 9. The first 750 samples are used as training samples to establish the model and to estimate the probability distribution of the fusion diagnosis index FDI. The remaining 14 samples are used as test samples to verify the forecast performance of the model and the multivariate fusion diagnosis method. All samples were obtained during normal operation, and no abnormality occurred. A negative sign (−) denotes upstream radial displacement, and a positive sign (+) denotes downstream radial displacement. The observed reservoir water levels over the same period are also shown in Figure 9.

Modeling.
Based on physical and mechanical analysis, the environmental factors influencing the radial deformation of the arch dam comprise hydrostatic pressure terms $H$, seasonal temperature terms $T$, and time effect terms $\Theta$ [5]. The hydrostatic pressure terms represent the effect of hydrostatic thrust on the dam: $H = \{h, h^2, h^3, h^4\}$, where $h = H_u - 950.50$, $H_u$ is the reservoir water level, and 950.50 m is the elevation of the bottom of the arch dam. The seasonal temperature terms represent the effect of seasonal variations in concrete temperature, which are mainly driven by air temperature during the operation stage. This effect can be represented by a periodic harmonic: $T = \{\sin(2\pi t/365), \cos(2\pi t/365)\}$, where $t$ is the number of days since the beginning of the analysis. The time effect terms represent the effect of concrete creep in the dam body, rock creep in the foundation, and other irreversible deformations, such as plastic and creep deformation of concrete and rock, autogenous volume deformation of concrete, and cracking deformation. This time-effect deformation develops rapidly during the first or initial impoundment and tends to stabilize over time, so it can be represented by a combination of a polynomial function and a logarithmic function: $\Theta = \{\theta, \ln(\theta + 1)\}$, where $\theta = t/100$. In this paper, the radial basis function $k(x_i, x_j) = \exp(-\lVert x_i - x_j \rVert^2 / c)$ was selected as the kernel function of OKPLS. The kernel parameter $c$ and the number $A$ of latent variables were selected by the universal unified optimization algorithm shown in Figure 2. To save computing time, 10-fold cross-validation was used in the inner loop. The parameter settings of the genetic algorithm in the outer loop are given in Table 1; among them, the search range of the kernel parameter $c$ was a symmetric expansion around the value given by the formula in [30]. The evolution of the optimal fitness value is shown in Figure 10; the optimal fitness value converges after four iterations. The optimal kernel parameter and number of latent variables were finally selected as 3.5 and 22, respectively.
[Figure 10: Evolutionary process of the optimal fitness value.]
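The eight environmental variables defined above can be assembled per observation as in this sketch. The function and constant names are hypothetical. Note that the terms differ enormously in magnitude ($h^4$ is of order $10^9$ to $10^{10}$ while the harmonic terms lie in $[-1, 1]$); the text does not say how the inputs were scaled before the radial basis kernel was applied, and standardizing each input is a common choice but only an assumption here.

```python
import math

DAM_BOTTOM_ELEVATION = 950.50   # m, elevation of the bottom of the arch dam


def environmental_variables(water_level, t_days):
    """Return the 8 inputs [H, T, Theta] for one observation.

    water_level: reservoir water level H_u (m)
    t_days: days since the beginning of the analysis
    """
    h = water_level - DAM_BOTTOM_ELEVATION
    H = [h, h ** 2, h ** 3, h ** 4]                     # hydrostatic pressure terms
    w = 2.0 * math.pi * t_days / 365.0
    T = [math.sin(w), math.cos(w)]                      # seasonal temperature terms
    theta = t_days / 100.0
    Theta = [theta, math.log(theta + 1.0)]              # time effect terms
    return H + T + Theta


x0 = environmental_variables(1240.0, 0)
print(len(x0), x0[0])   # 8 289.5
```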
To verify the fitting and forecast performance of the OKPLS model, six independent MLR models, a PLS model, and a KPLS model for the six radial displacements were established using the same environmental variables. The KPLS model also used the radial basis function as its kernel, with the kernel parameter calculated directly by the formula in [30] and the number of latent variables selected by the adjusted Wold's $R$ criterion. The mean square error (MSE) was used to compare the fitting and forecast performance of the four models:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \tag{10}$$

where $n$ is the number of training or test samples, $y_i$ is the $i$th observed value, and $\hat{y}_i$ is the $i$th fitted or forecast value. The comparative results are presented in Figure 11 and Table 2, and the following observations can be made.
(1) Compared with MLR, PLS shows better forecast performance for all radial displacements, with a lower average forecast MSE of 1.430; however, PLS shows worse fitting performance for all radial displacements, with a higher average fitting MSE of 0.913. This behavior has two likely explanations. First, MLR produces spurious regression because of the multicollinearity of the four hydrostatic pressure terms (the minimum of their pairwise correlation coefficients is 0.9926). Second, the six radial displacements are highly correlated (the ratio of the first to the sixth eigenvalue of their covariance matrix is 89804), and PLS exploits this shared information to eliminate noise and capture the overall behavior of the dam accurately.
(2) KPLS shows better fitting performance than MLR and PLS for all radial displacements, with a lower average fitting MSE of 0.244; however, except for displacements 3 and 5, its forecast performance is worse than that of PLS. This may be because the KPLS parameters were not appropriately selected, so that KPLS, like MLR, fails to capture the correct relationship between the environmental variables and the radial displacements.
(3) Among the four models, OKPLS exhibits the best fitting and forecast performance for all radial displacements, and its average fitting and forecast MSEs are both the lowest, at 0.074 and 0.090, respectively. This good performance may be attributed to two aspects. First, OKPLS inherits the advantages of PLS: it resolves the multicollinearity of the environmental variables and exploits the shared information of the six radial displacements. Second, through the kernel function, OKPLS can capture the complex nonlinear relationship between the environmental variables and the radial displacements, and the optimized selection of the kernel parameter and the number of latent variables helps ensure that this relationship is captured correctly.
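The MSE criterion and the two collinearity statistics quoted in observation (1) (the minimum pairwise correlation among the hydrostatic pressure terms, and the ratio of the largest to the smallest eigenvalue of the covariance matrix of the responses) can be computed with a few lines of NumPy. This is a minimal sketch; the function names are hypothetical, and the hydrostatic terms in the usage example assume polynomial terms H, H², H³, H⁴ of a made-up water head H, which may differ from the terms actually used in the model.

```python
import numpy as np


def mse_per_response(Y_obs, Y_hat):
    """MSE = (1/n) * sum_i (y_i - yhat_i)^2, computed separately for each response column."""
    Y_obs = np.asarray(Y_obs, dtype=float)
    Y_hat = np.asarray(Y_hat, dtype=float)
    return ((Y_obs - Y_hat) ** 2).mean(axis=0)


def min_pairwise_correlation(X):
    """Smallest pairwise correlation coefficient among the columns of X."""
    R = np.corrcoef(X, rowvar=False)
    upper = np.triu_indices_from(R, k=1)  # off-diagonal pairs only
    return float(R[upper].min())


def eigenvalue_ratio(Y):
    """Largest over smallest eigenvalue of the sample covariance matrix of Y."""
    eig = np.linalg.eigvalsh(np.cov(Y, rowvar=False))  # eigenvalues in ascending order
    return float(eig[-1] / eig[0])


# Per-response MSE of two responses: 0.5 and 2.0, so the average MSE is 1.25
per_response = mse_per_response([[1, 2], [3, 4]], [[1, 0], [2, 4]])
average_mse = per_response.mean()

# Hypothetical polynomial hydrostatic terms of a water head H (made-up values)
H = np.linspace(10.0, 20.0, 50)
r_min = min_pairwise_correlation(np.column_stack([H, H**2, H**3, H**4]))
```

Averaging the per-response MSE over the six radial displacements gives the average fitting or forecast MSE reported for each model. Powers of the same variable over a narrow range are almost perfectly correlated, which is the multicollinearity that destabilizes MLR.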

Fusion Diagnosis.
To diagnose the radial displacements of the super-high arch dam with the multivariate fusion diagnosis method, the fusion diagnosis index FDI was first calculated according to (8). The probability distribution of the FDI was then estimated from the FDI values of the training samples with the MATLAB kernel density estimation function ksdensity, as shown in Figure 12. The two control limits, UCL1 and UCL2, were obtained from Figure 12 as 5.0 and 15.5, respectively. On this basis, the radial displacements in the test period were diagnosed according to (9). The multivariate fusion diagnosis results, shown in Figure 13, contain no abnormality, which is consistent with the actual condition of the dam. For comparison, the radial displacement at each elevation was also diagnosed individually according to (6); the results are shown in Figure 14. Most observations are identified as normal. However, some observations late in the test period for radial displacements 0 and 3, at elevations 963.00 m and 1100.00 m, respectively, are misdiagnosed as abnormal. These false abnormalities may be due to the declining forecast precision of the model, which is also reflected in the increase of the FDI.
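This paper used MATLAB's ksdensity; a rough Python analogue of the control-limit step uses `scipy.stats.gaussian_kde`. How UCL1 and UCL2 were read from Figure 12 is not stated here, so this sketch assumes they are upper quantiles of the estimated FDI distribution at confidence levels that are placeholders (95% and 99%). The FDI formula (8) and the diagnosis rule (9) are not reproduced, the function name is hypothetical, and the default bandwidth rules of ksdensity and gaussian_kde may differ, so numerical results need not match.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_control_limits(fdi_train, levels=(0.95, 0.99), grid_size=4096):
    """Control limits as upper quantiles of a Gaussian-KDE estimate of the FDI distribution."""
    fdi = np.asarray(fdi_train, dtype=float)
    kde = gaussian_kde(fdi)  # SciPy default bandwidth (Scott's rule)
    span = fdi.max() - fdi.min()
    # Evaluate the density on a grid extended beyond the data so the tails are included
    grid = np.linspace(fdi.min() - span, fdi.max() + span, grid_size)
    cdf = np.cumsum(kde(grid))
    cdf /= cdf[-1]  # numerical CDF on the grid
    # First grid point whose cumulative probability reaches each level
    return [float(grid[np.searchsorted(cdf, p)]) for p in levels]
```

Test-period FDI values would then be compared with the returned limits; an FDI above the upper limit signals a possible abnormality under whatever grading rule is adopted.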
The super-high arch dam considered here is in the early stage of operation, and its behavior is still adjusting continuously; as a result, the forecast precision of the model declines over time. Figure 14 suggests that behavior adjustments may be occurring in the dam foundation (displacement 0) and at middle elevations (displacement 3). These adjustments are local and slight, and the dam remains in normal operation. Hence, the multivariate fusion diagnosis method gives the correct diagnosis and reduces false alarms. This may be attributed to the integration of information from multiple response variables, which makes the diagnosis less sensitive to local false abnormalities in individual variables.

Conclusions
With the ongoing construction of super-high dams, their safety is becoming an increasingly prominent issue. A super-high dam is characterized by complex nonlinearity and multiple response variables. Traditional linear and univariate safety monitoring models have become inadequate for monitoring such dams. Recently proposed safety monitoring models with nonlinear or multivariate analysis capabilities still have some deficiencies, and none of them simultaneously meets the requirements of complex nonlinear and multivariate analysis of a super-high dam. Therefore, in this paper, KPLS, a strongly nonlinear multivariate analysis method, was introduced into the field of dam safety monitoring for the first time. Because the kernel function and the number of latent variables strongly influence the generalization performance of KPLS, a universal unified optimization algorithm was designed to select the KPLS parameters and obtain the optimal kernel partial least squares (OKPLS). OKPLS was then used to establish a strongly nonlinear multivariate safety monitoring model that identifies abnormal behavior of a super-high dam through the proposed multivariate fusion diagnosis. OKPLS not only resolves the multicollinearity of the environmental variables but also exploits the shared information of multiple response variables to suppress noise and capture the overall behavior of the dam. It can also capture the complex nonlinear relationship between the environmental variables and the response variables of a super-high dam, and the designed optimization algorithm helps ensure that this relationship is captured correctly. In addition, the proposed multivariate fusion diagnosis method makes full use of the multiple outputs of the OKPLS model: by integrating the information of multiple response variables into the fusion diagnosis index FDI, it improves diagnostic efficiency and accuracy.
The case study shows that, compared with the MLR, PLS, and KPLS models, the OKPLS model achieves the highest average fitting and forecast precision, and that the multivariate fusion diagnosis based on the OKPLS model reduces the number of false alarms relative to traditional univariate diagnosis. Only deformation monitoring data of a super-high arch dam were analyzed in this paper; however, the proposed methodology, comprising the safety monitoring (prediction) model and the fusion diagnosis method, can be applied to all types of physical monitoring data of super-high dams, such as deformation, seepage, and stress. Thus, OKPLS is an appropriate method for the safety monitoring of super-high dams.

Figure 2: Flow chart of the universal unified optimization algorithm for selecting the KPLS parameters.

Figure 3: Flow chart of super-high dam safety monitoring using OKPLS.

Figure 7: Super-high arch dam structure and the layout of the pendulum monitoring points.

Figure 9: Time histories of six radial displacements in the central block of the dam and the reservoir water level.

(2) Compute the score vector $\mathbf{t}_i$ ($n \times 1$) of the $i$th latent variable of $\boldsymbol{\Phi}_i$ as $\mathbf{t}_i = \mathbf{K}_i\mathbf{u}_i / \lVert \mathbf{K}_i\mathbf{u}_i \rVert$, so that $\lVert \mathbf{t}_i \rVert = 1$. Save the score vectors in the matrices: $\mathbf{T} \leftarrow \mathbf{t}_i$, $\mathbf{U} \leftarrow \mathbf{u}_i$. (9) Set $i = i + 1$ and return to step (2); stop when $i$ exceeds the selected number of latent variables. Finally, the score matrices $\mathbf{T}$ and $\mathbf{U}$ are obtained, whose columns are the score vectors $\mathbf{t}_i$ and $\mathbf{u}_i$ of all selected latent variables; the columns of $\mathbf{T}$ are mutually orthogonal. On this basis, the regression coefficient matrix $\mathbf{B}$ in (1) can be obtained from these score matrices.
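The kernel PLS iteration in steps (2) to (9) can be illustrated with a minimal NumPy sketch. It assumes a Gaussian kernel exp(-‖x − z‖²/(2σ²)), a centered kernel matrix, an initial u taken from the most variable response column, deflation of K and Y after each component, and prediction through the dual form U(TᵀKU)⁻¹TᵀY commonly used for kernel PLS (Rosipal and Trejo); we assume, without confirmation from the text, that this corresponds to the coefficient matrix B mentioned above. The class `KernelPLS` and the function `rbf_kernel` are hypothetical names, not the implementation used in this paper.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel exp(-||a - b||^2 / (2 sigma^2)); this parameterisation is assumed."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


class KernelPLS:
    """Minimal multi-response kernel PLS using the t = K u / ||K u|| iteration with deflation."""

    def __init__(self, sigma, n_components, max_iter=500, tol=1e-10):
        self.sigma, self.n_components = sigma, n_components
        self.max_iter, self.tol = max_iter, tol

    def fit(self, X, Y):
        self.X_ = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float).reshape(len(self.X_), -1)
        K = rbf_kernel(self.X_, self.X_, self.sigma)
        self.col_mean_, self.all_mean_ = K.mean(axis=0), K.mean()
        # Centre the kernel matrix (i.e. centre the mapped data Phi in feature space)
        Kc = K - self.col_mean_[None, :] - K.mean(axis=1)[:, None] + self.all_mean_
        self.y_mean_ = Y.mean(axis=0)
        Yc = Y - self.y_mean_
        n = len(Kc)
        Kd, Yd = Kc.copy(), Yc.copy()  # matrices deflated after each component
        T = np.zeros((n, self.n_components))
        U = np.zeros((n, self.n_components))
        for i in range(self.n_components):
            u = Yd[:, [np.argmax((Yd**2).sum(axis=0))]]  # start from the most variable response
            t_prev = np.zeros((n, 1))
            for _ in range(self.max_iter):
                t = Kd @ u
                t /= np.linalg.norm(t)  # step (2): t = K u / ||K u||
                u = Yd @ (Yd.T @ t)
                u /= np.linalg.norm(u)
                if np.linalg.norm(t - t_prev) < self.tol:
                    break
                t_prev = t
            T[:, [i]], U[:, [i]] = t, u
            P = np.eye(n) - t @ t.T
            Kd = P @ Kd @ P  # deflate the kernel matrix
            Yd = P @ Yd      # deflate the responses
        # Dual regression coefficients U (T^T K U)^{-1} T^T Y
        self.dual_coef_ = U @ np.linalg.solve(T.T @ Kc @ U, T.T @ Yc)
        self.T_, self.U_ = T, U
        return self

    def predict(self, X_new):
        Kt = rbf_kernel(np.asarray(X_new, dtype=float), self.X_, self.sigma)
        # Centre the test kernel with the training statistics
        Ktc = Kt - self.col_mean_[None, :] - Kt.mean(axis=1)[:, None] + self.all_mean_
        return Ktc @ self.dual_coef_ + self.y_mean_
```

Usage: `KernelPLS(sigma=1.0, n_components=8).fit(X, Y).predict(X_new)` returns one column per response, so a single model covers all monitored displacements at once. Because each deflation projects K onto the complement of the scores already extracted, the columns of `T_` come out orthonormal.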

Table 1: Parameter settings of the genetic algorithm.

Table 2: Mean square errors of the MLR, PLS, KPLS, and OKPLS models.