Priori Information Based Support Vector Regression and Its Applications

To extract the priori information (PI) provided by real monitored values of peak particle velocity (PPV) and to increase the prediction accuracy of PPV, a PI-based support vector regression (SVR) method is established. Firstly, to extract the PI provided by monitored data mathematically, the probability density of PPV is estimated with ε-SVR. Secondly, to make full use of the PI about the fluctuation of PPV between its maximal and minimal values in a certain period of time, the probability density estimated with ε-SVR is incorporated into the training data, which increases the dimensionality of the training data. Thirdly, using the higher-dimensional training data, a method of predicting PPV called PI-ε-SVR is proposed. Finally, with the collected values of PPV induced by underwater blasting at Dajin Island in Taishan nuclear power station in China, contrastive experiments are conducted to show the effectiveness of the proposed method.


Introduction
Underwater blasting is a construction method often used in engineering, such as hydraulic engineering, port engineering, bridge building, and dam excavation. In engineering practice, peak particle velocity (PPV) is the main basis for measuring the influence of the seismic wave caused by blasting on nearby buildings. However, PPV is influenced by many factors, which leads to a relatively low prediction accuracy.
Recently, Xie et al. [1] analyzed the characteristic parameters of blasting vibration with the Sadovsky formula and predicted the PPV of blasting vibration. Yang et al. [2] predicted the PPV induced by underwater blasting at Dajin Island in the first phase of Taishan nuclear power station based on the Sadovsky formula. However, the Sadovsky formula [1,2] relies on a large amount of monitored data and considers only two parameters in regression analysis, so it cannot reflect the influence of the many complex factors in underwater blasting. Thus, T. N. Singh and V. Singh [3] made an attempt to predict ground vibration using an artificial neural network (ANN) incorporating a large number of parameters. Khandelwal and Singh [4] proposed a method to evaluate and predict blast-induced ground vibration and frequency by incorporating rock properties, blast design, and explosive parameters using the ANN technique, and found that the ANN was more accurate and could predict the value of blast vibration without the error growing as the number of inputs and the nonlinearity among them increased. Furthermore, Liu et al. [5] introduced grey relational analysis to the prediction of PPV and proposed a genetic neural network model based on grey relational analysis.
However, many complex factors influence PPV and the data are high-dimensional, so ANNs [6][7][8] have to face the curse of dimensionality. In addition, when the data are limited, ANNs easily fall into local minima. Thus, ANNs also cannot predict PPV accurately. Therefore, how to improve the prediction accuracy of PPV caused by underwater blasting vibration with limited monitored data is still a problem worthy of study.
Support vector machines (SVMs), including support vector classifications (SVCs) and support vector regressions (SVRs), were proposed by Vapnik et al. [9,10] in the 1990s. SVMs address statistical learning problems for small samples by solving a convex quadratic optimization problem, and thereby avoid the local minima that ANNs cannot avoid. SVMs use a kernel function to map the data in the original space to a high-dimensional feature space and then solve the nonlinear decision problem in that space. Thus, SVMs can cope with the curse of dimensionality that ANNs cannot handle and have good generalization ability. However, standard SVMs focus on monitored data and cannot incorporate prior information into the learning process, which may decrease their generalization ability. Therefore, Guan et al. [11] proposed a modified method that incorporated prior information into cancer classification based on gene expression data to improve accuracy. Zhang et al. [12] proposed a fully Bayesian methodology for generalized kernel mixed models, which are extensions of generalized linear mixed models in the feature space induced by a reproducing kernel. Liu and Xue [13] focused on designing a new class of kernels to incorporate fuzzy prior information into the training process of SVRs. Currently, SVMs have received extensive attention and are being studied by more and more scholars from different points of view [14][15][16][17][18][19][20][21][22][23][24][25][26][27][28][29].
However, for the problem of PPV prediction in practice, the unknown probability density of PPV provides much PI about PPV. If we can propose a method to estimate the probability density with monitored data and incorporate it into PPV prediction, the prediction accuracy may be greatly improved. In practice, the measured PPV $V$ at a fixed time $t$ is the mean value $V = (1/n)\sum_{i=0}^{n} V_i$ of many monitored values $V_i$ ($i = 0, 1, \ldots, n$) during a certain period of time $\Delta t$. Let $V_{\max} = \max_i \{V_i\}$ and $V_{\min} = \min_i \{V_i\}$; then the larger $V_{\max} - V_{\min}$ is, the larger the fluctuation of PPV during the period of time $\Delta t$. Conversely, the smaller $V_{\max} - V_{\min}$ is, the smaller the fluctuation of PPV. Nevertheless, the mean value $V$ provides no information about the fluctuation of PPV at all. Therefore, in order to increase the accuracy of PPV prediction, it is necessary to find a way to incorporate this PI about fluctuation into PPV prediction. To this end, this paper proposes a new method of PPV prediction that incorporates PI.
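The point that the mean hides the fluctuation can be illustrated with a small sketch. The readings below are hypothetical values chosen for illustration, not monitored data from the paper, and `summarize` is a helper name introduced here.

```python
from statistics import mean

# Hypothetical PPV readings (cm/s) from two monitoring windows.
# These numbers are illustrative only and do not come from the paper.
calm_window = [5.4, 5.5, 5.6, 5.5]
wild_window = [3.0, 8.0, 4.5, 6.5]

def summarize(readings):
    """Return the mean and the range V_max - V_min of one window."""
    return mean(readings), max(readings) - min(readings)

calm_mean, calm_range = summarize(calm_window)
wild_mean, wild_range = summarize(wild_window)

print(f"calm: mean={calm_mean:.2f}, range={calm_range:.2f}")
print(f"wild: mean={wild_mean:.2f}, range={wild_range:.2f}")
```

Both windows report the same mean, yet their ranges differ widely, which is the information the mean alone cannot carry.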
This paper is structured as follows. Section 2 estimates the probability density of PPV with monitored data and ε-SVR, incorporates PI about the fluctuation of PPV into the training data, and then establishes a prediction method for PPV based on priori information and ε-support vector regression (PI-ε-SVR). Section 3 presents the contrastive experiments with real monitored data of PPV from Dajin Island in Taishan nuclear power station. Section 4 draws the conclusions and outlines future directions.

Prediction Method of PPV Based on PI and SVR
To extract as much information as possible from the monitored data of PPV, we first estimate the probability density $f$ of PPV from the monitored data with ε-SVR. Then, we establish a prediction method of PPV based on PI and ε-SVR.

Increasing the Dimensionality of Training Data with PI about Fluctuation of PPV.
Suppose that $f$ is the probability density estimated with the ε-SVR, $t_0$ is the initial time, and $V_0$ is the PPV at $t_0$ (also denoted by $(t_0, V_0)$). In practice, PPV is often monitored many times $t_{0j}$ ($j = 1, 2, \ldots, n$) within every period of time $\Delta t$ (where $t_0 < t_{01} < \cdots < t_{0n} \le t_0 + \Delta t$), and the mean value $V = (1/n)\sum_{j=1}^{n} V_{0j}$ is output as the predicted PPV. In other words, PPV [...] For example, if the monitored PPVs $V_{0j}$ ($j = 1, 2, \ldots, n$) all have the same value during a certain period of time $\Delta t$, then the fact that "the mean value $V$ is the PPV $V_1$" holds with probability 1. Conversely, if the monitored PPVs $V_{0j}$ ($j = 1, 2, \ldots, n$) fluctuate wildly during a certain period of time $\Delta t$, then the fact that "the mean value $V$ is the PPV $V_1$" holds with a very low probability.
Hence, in order to incorporate this PI into the prediction of PPV, the training datum [...]
Remark 1. In fact, from (1) and (2) we can find that the larger $V_{\max} - V_{\min}$ is, the smaller $1 - p_1$ becomes. That is to say, the possibility that "PPV at $t_1$ is $V_1$" is very small. On the other hand, a large $V_{\max} - V_{\min}$ indicates that the PPVs from $t_0$ to $t_1$ fluctuate wildly and that "the mean value $V$ is the PPV $V_1$" holds with a low probability (namely, the possibility that "PPV at $t_1$ is $V_1$" is very low), which is in accordance with the information provided by $1 - p_1$. Thus, $1 - p_1$ provides PI about the fluctuation of PPV during a certain period of time $\Delta t$. Therefore, the training datum [...] contains PI provided by monitored data.
Then, the problem of predicting PPV is as follows.
In the next subsection, a method of predicting the PPVs from the aspect of mathematics is established.

Method of Predicting PPV Based on PI and Epsilon-SVR.
In order to solve the above problem of predicting PPV, PI-ε-SVR is constructed on the basis of the standard SVR as follows.
Step 3. Construct and solve the convex quadratic programming problem: min [...] We can obtain an optimal solution [...]
Step 4. Choose a component $\alpha_j$ or $\alpha_k^*$ of the vector $\alpha^{(*)}$ in the interval $(0, C)$. If $\alpha_j$ is chosen, then [...] If $\alpha_k^*$ is chosen, then [...]
Step 5. Construct the decision function with [...]
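For orientation, the textbook dual of standard ε-SVR has the shape that Steps 3 to 5 describe. The block below is that standard form, written with targets $V_i$, inputs $x_i$, a kernel $K$, and parameters $C$ and $\varepsilon$; it is an assumption that PI-ε-SVR applies this form to the higher-dimensional inputs, and the index names $j$ and $k$ are chosen here for illustration.

```latex
% Standard epsilon-SVR dual (textbook form; assumed, not copied from the paper)
\begin{aligned}
\min_{\alpha^{(*)}} \quad
  & \frac{1}{2}\sum_{i=1}^{l}\sum_{m=1}^{l}
      (\alpha_i^* - \alpha_i)(\alpha_m^* - \alpha_m)\,K(x_i, x_m)
    + \varepsilon\sum_{i=1}^{l}(\alpha_i^* + \alpha_i)
    - \sum_{i=1}^{l} V_i\,(\alpha_i^* - \alpha_i) \\
\text{s.t.} \quad
  & \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0, \qquad
    0 \le \alpha_i,\ \alpha_i^* \le C, \quad i = 1, \ldots, l.
\end{aligned}

% Threshold from a component strictly inside (0, C)
b = V_j - \sum_{i=1}^{l}(\alpha_i^* - \alpha_i)\,K(x_i, x_j) + \varepsilon
    \quad \text{if } \alpha_j \in (0, C), \qquad
b = V_k - \sum_{i=1}^{l}(\alpha_i^* - \alpha_i)\,K(x_i, x_k) - \varepsilon
    \quad \text{if } \alpha_k^* \in (0, C).

% Decision function
g(x) = \sum_{i=1}^{l}(\alpha_i^* - \alpha_i)\,K(x_i, x) + b.
```

A component strictly inside $(0, C)$ corresponds to a training point lying exactly on the boundary of the ε-tube, which is why either choice determines the threshold $b$.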

Contrastive Experiments
In order to predict the PPV $V(t)$ (cm/s) at time $t$ (minutes), we recorded the PPV induced by underwater blasting 100 times with an IDTS-3850 blast vibration recorder at Dajin Island in Taishan nuclear power station in China, which gave 100 monitored datasets (samples). Then, the probability density $f$ of PPV is estimated by the ε-SVR with the first 80 samples (the last 20 samples are used for prediction) in Section 3.1.
As the probability density $f$ provides much PI about PPV, we incorporate it into the training data in the next subsection.

Increasing the Dimensionality of Training
[...] are converted into [...] where $x_i = (t_i, 1 - p_i)$ ($i = 1, \ldots, 80$). Data (12) are then used to establish a model to predict PPV $V_{81}$ for the given $x_{81}$.
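The conversion to higher-dimensional inputs can be sketched as follows: each input pairs the time with one minus the probability that PPV lies between the window's minimum and maximum. This is a minimal sketch under stated assumptions. The paper estimates the density with ε-SVR; here a plain Gaussian kernel density estimate is swapped in as a stand-in, and the function names, the bandwidth, and all data values are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interval_probability(samples, low, high, bandwidth):
    """P{low <= V <= high} under a Gaussian kernel density estimate.

    Stand-in for the epsilon-SVR density estimate used in the paper.
    """
    total = 0.0
    for s in samples:
        total += normal_cdf((high - s) / bandwidth) - normal_cdf((low - s) / bandwidth)
    return total / len(samples)

def augment(times, windows, samples, bandwidth=0.5):
    """Build inputs (t, 1 - p) from each window's minimum and maximum."""
    features = []
    for t, window in zip(times, windows):
        p = interval_probability(samples, min(window), max(window), bandwidth)
        features.append((t, 1.0 - p))
    return features

# Hypothetical values (cm/s and minutes); not the monitored data of the paper.
all_ppv = [4.8, 5.1, 5.3, 5.5, 5.6, 5.9, 6.2]
times = [1.0, 2.0]
windows = [[5.4, 5.5, 5.6], [4.0, 7.0, 5.5]]

x = augment(times, windows, all_ppv)
for t, one_minus_p in x:
    print(f"t={t:.1f}, 1-p={one_minus_p:.3f}")
```

The second window is much wider than the first, so its probability is larger and its second feature is smaller, matching the behaviour described in Remark 1.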

Method of Predicting PPV Based on PI and Epsilon-SVR.
In order to predict PPV $V_{81}$ for the given $x_{81}$ with training data (12), we make experiments with PI-ε-SVR and standard ε-SVR, respectively. Here, a grid search method based on 5-fold cross-validation is chosen to determine the model parameters, with the two searched parameters ranging over $\{2^{-10}, 2^{-9}, \ldots, 2^{1}\}$ and $\{2^{-6}, 2^{-5}, \ldots, 2^{6}\}$, respectively, and the kernel function is a radial basis function (RBF). The experiment results are shown in Table 1. Predicted PPVs of training data with PI-ε-SVR and standard ε-SVR are shown in Figures 2 and 3, respectively. Predicted PPVs of testing data with the two methods are shown in Figures 4 and 5, respectively.
Similar to the steps of predicting PPV $V_{81}$ for the given $x_{81}$, we make the experiment 20 times to predict the PPV $V_i$ for the given $x_i$ ($i = 81, 82, \ldots, 100$) (namely, the last 20 monitored datasets); the monitored values and predicted values are shown in Table 2, and the average mean squared errors are shown in Table 3 (the numbers after ± are the standard deviations).
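The mechanics of the grid search with 5-fold cross-validation can be sketched in plain Python. This is only an outline of the search loop, not the authors' code: the names `grid_a` and `grid_b` are hypothetical labels for the two parameter ranges, the folds are contiguous rather than shuffled, and `cv_error` is a user-supplied placeholder for fitting the SVR model on one fold and returning its validation error.

```python
from itertools import product

# The two parameter ranges 2^-10..2^1 and 2^-6..2^6.
# grid_a and grid_b are hypothetical names for the searched parameters.
grid_a = [2.0 ** k for k in range(-10, 2)]
grid_b = [2.0 ** k for k in range(-6, 7)]
candidates = list(product(grid_a, grid_b))

def k_fold_indices(n_samples, n_folds=5):
    """Split indices 0..n_samples-1 into contiguous (train, validation) folds."""
    fold_size = n_samples // n_folds
    folds = []
    for f in range(n_folds):
        start = f * fold_size
        stop = n_samples if f == n_folds - 1 else start + fold_size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, validation))
    return folds

def grid_search(candidates, folds, cv_error):
    """Return the candidate with the lowest mean validation error.

    cv_error(params, train, validation) is a user-supplied callable that
    fits a model on `train` and returns its error on `validation`.
    """
    best, best_err = None, float("inf")
    for params in candidates:
        err = sum(cv_error(params, tr, va) for tr, va in folds) / len(folds)
        if err < best_err:
            best, best_err = params, err
    return best, best_err

folds = k_fold_indices(80)  # 80 training samples, as in the experiments
print(len(grid_a), len(grid_b), len(candidates))
print([len(va) for _, va in folds])
```

With 80 training samples, each fold trains on 64 samples and validates on 16, and the two ranges give 12 × 13 = 156 parameter pairs to evaluate.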

Results Analysis.
In the experiments, the optimal parameters are chosen via a grid search method based on 5-fold cross-validation. From Table 1, we find that when the numbers of training datasets with PI-ε-SVR and standard ε-SVR are the same, the mean squared error of the training data with PI-ε-SVR is 0.0064, which is smaller than the corresponding one (0.0241) with standard ε-SVR. This illustrates that PI-ε-SVR is more accurate than standard ε-SVR in predicting PPV $V_{81}$. Figures 2 and 3 show the predicted PPVs of the 80 training datasets $(x_i, V_i)$ ($i = 1, 2, \ldots, 80$) with PI-ε-SVR and standard ε-SVR, respectively. Comparing the two figures, we find that the predicted PPVs with PI-ε-SVR are closer to the monitored PPVs than those with standard ε-SVR, which shows that the proposed PI-ε-SVR method is more accurate than standard ε-SVR in predicting PPVs of the training data.
In Figures 4 and 5, we can find that the monitored PPV $V_{81}$ is 5.49 (cm/s), and the predicted PPVs $\hat{V}_{81}$ with PI-ε-SVR and [...] That is to say, the predicted PPV $\hat{V}_{81}$ with PI-ε-SVR is closer to the monitored PPV $V_{81}$ than that with standard ε-SVR. Table 1 (the last column) shows that the mean squared errors of the testing data with the two methods are 0.0438 and 0.0648, respectively. These results illustrate that the proposed PI-ε-SVR method is more accurate and effective than the standard ε-SVR in predicting PPV $V_{81}$ for the given $x_{81}$.
In order to reduce the influence caused by randomness, we made the experiment 20 times to predict the PPV $V_i$ for the given $x_i$ ($i = 81, 82, \ldots, 100$) (namely, the last 20 monitored datasets); the real monitored values and predicted values are shown in Table 2, and the average mean squared errors and standard deviations are shown in Table 3.
From Table 2, we can see that most of the 20 predicted PPVs (numbers in the third column) $\hat{V}_i$ ($i = 81, 82, \ldots, 100$) with PI-ε-SVR are closer to the monitored PPVs (numbers in the second column) than those (numbers in the fourth column) obtained with standard ε-SVR, showing that the proposed PI-ε-SVR method is more accurate than the standard ε-SVR in predicting the last 20 PPVs. We also find that most of the 20 mean squared errors (numbers in the fifth column) obtained with PI-ε-SVR are smaller than the corresponding ones (numbers in the last column) with standard ε-SVR, showing that the PI-ε-SVR method is more stable than the standard ε-SVR.
From Table 3, we find that the average mean squared error obtained with PI-ε-SVR is 0.0117, which is smaller than that (0.0156) obtained with standard ε-SVR. This illustrates that the PI-ε-SVR method is more stable than the standard ε-SVR method in predicting the last 20 PPVs $V_i$ ($i = 81, 82, \ldots, 100$). In addition, the running time of PI-ε-SVR is less than one minute, which shows that the model's running time can meet the needs of PPV prediction in application.
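The summary statistics of the kind reported as "average mean squared error ± standard deviation" can be computed as in the sketch below. It shows one possible reading, in which a mean squared error is computed per run and then averaged across runs; the paper does not spell out the exact procedure. All numbers are placeholders rather than values from Tables 2 or 3, and the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def mean_squared_error(monitored, predicted):
    """Average of squared differences between monitored and predicted PPVs."""
    return mean((m - p) ** 2 for m, p in zip(monitored, predicted))

# Placeholder values (cm/s); not taken from the tables of the paper.
monitored = [5.49, 5.10, 4.80]
runs = [
    [5.40, 5.20, 4.70],  # predictions from a first hypothetical run
    [5.55, 5.00, 4.95],  # predictions from a second hypothetical run
]

errors = [mean_squared_error(monitored, r) for r in runs]
average_mse = mean(errors)
spread = stdev(errors)  # sample standard deviation across runs (assumed)

print(f"{average_mse:.4f} ± {spread:.4f}")
```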
Through the experiments, we can find that the PPV prediction method incorporating priori information can achieve both high prediction accuracy and good stability compared to the prediction method without priori information.That is to say, incorporating PI into the prediction of PPV may be a good way of increasing the prediction accuracy.

Conclusions and Future Directions
In this paper, a method of estimating the probability density of PPV with real monitored data is proposed, and we find that the estimated probability density provides PI about the fluctuation of PPV between its maximal and minimal values in a certain period of time. Then, the PI provided by the estimated probability density is incorporated into the training data. After that, the PI-ε-SVR method for predicting PPV is proposed. In Table 2, the experiment results, including 20 predicted values and 20 mean squared errors, show that the proposed PI-ε-SVR is more accurate in the prediction of PPV than the standard ε-SVR. In Table 3, the average mean squared errors of PI-ε-SVR and standard ε-SVR are 0.0117 and 0.0156, respectively, and the standard deviations are 0.0177 and 0.0199, respectively, which shows that the PI-ε-SVR is more stable than the standard ε-SVR. Therefore, incorporating PI into the prediction of PPV may be a good way of increasing the prediction accuracy.
Some other factors, such as water pressure, blast design, geotechnical properties, and explosive parameters, also affect the prediction of PPVs. If the PI about these factors can be incorporated into the prediction of PPVs, the prediction accuracy may be further improved. Therefore, establishing a method that includes PI from the aspects of both monitored data and engineering practice is one of our future research directions.

Figure 1: Estimated probability density of PPV with monitored data.
Data with PI about Fluctuation of PPV. Set $V_i^{\max} = \max_j \{V(t_{ij})\}$ and $V_i^{\min} = \min_j \{V(t_{ij})\}$. By the estimated probability density $f$ and (1), the probability $p_i = P\{V_i^{\min} \le V_i \le V_i^{\max}\}$ can be calculated. Then, according to training set (4), monitored data [...]

Figure 2: The blue solid curve (-) shows the 80 monitored PPVs of the training data; the red dashed curve (---) shows the PPVs of the training data predicted with PI-ε-SVR.

Figure 3: The blue solid curve (-) shows the 80 monitored PPVs of the training data; the red dashed curve (---) shows the PPVs of the training data predicted with standard ε-SVR.

Figure 4: The blue star (*) is the monitored PPV $V_{81}$; the red five-pointed star is the predicted PPV $\hat{V}_{81}$ with PI-ε-SVR.

Figure 5: The blue star (*) is the monitored PPV $V_{81}$; the red five-pointed star is the predicted PPV $\hat{V}_{81}$ with standard ε-SVR.

Table 1: Experiment results with PI-ε-SVR and standard ε-SVR.

Table 3: Average mean squared errors.