An Adaptive Support Vector Regression Machine for the State Prognosis of Mechanical Systems

Due to the unsteady state evolution of mechanical systems, the time series of state indicators exhibits volatile behavior and staged characteristics. To model hidden trends and predict deterioration failure utilizing volatile state indicators, an adaptive support vector regression (ASVR) machine is proposed. In ASVR, the width of an error-insensitive tube, which is a constant in the traditional support vector regression, is set as a variable determined by the transient distribution boundary of local regions in the training time series. Thus, the localized regions are obtained using a sliding time window, and their boundaries are defined by a robust measure known as the truncated range. Utilizing an adaptive error-insensitive tube, a stabilized tolerance level for noise is achieved, whether the time series occurs in low-volatility regions or in high-volatility regions. The proposed method is evaluated by vibrational data measured on descaling pumps. The results show that ASVR is capable of capturing the local trends of the volatile time series of state indicators and is superior to the standard support vector regression for state prediction.


Introduction
The state prognosis of mechanical systems is of critical importance in modern industry to prevent unexpected breakdowns, to improve machine availability, and to reduce maintenance costs. Generally, the working state of mechanical systems is represented by certain indicators, which are either acquired from monitoring devices or calculated from raw monitoring signals. The primary task of state prognosis is to estimate the actual development of the state by modeling the trend of state indicators. Then, the trend model can be extrapolated to predict the upcoming failure or estimate the remaining useful life. After an accurate prognosis is achieved, timely maintenance actions can be planned to avoid catastrophic failure.
Due to the unstable operating conditions and accidental disturbances, the time series of state indicators always exhibits random fluctuations, whether the monitored system works normally or not. Therefore, many intelligent methods have been proposed to extract hidden trends from the observed state indicators. The artificial neural network (ANN) is one of the widely used methods in the prognostics literature [1]. Gebraeel et al. [2] developed a set of feed-forward backpropagation networks to model the degradation process of rolling element bearings and to estimate the failure time of partially degraded bearings. Tse and Atherton [3] used a recurrent neural network to determine the trend in the monitoring values and to predict the value at the next time step. Because the trend is learned and memorized by neurons and network weights, ANN provides a nontransparent solution to state prognosis; that is, the way in which forecast results are inferred by a trained network cannot be observed. Random coefficient models are another category of prognosis method for mechanical systems. In these models, the trend in the state indicators is predefined as a linear, polynomial, exponential, or any other functional form [4,5]. Then, the coefficients, which include the deterministic functional coefficients and the stochastic noise coefficients, are jointly estimated with historical state indicators. Due to the requirements for system-specific trend knowledge, the applications of random coefficient models are greatly restricted. Nonparametric regression models, in which the trend need not take a predetermined form, overcome the barriers of prior knowledge and are also commonly used for state prognosis [6]. Among this category of models, support vector regression (SVR) [7], which has good generalization ability even if training samples are not abundant, is the most widely accepted method. By training an SVR machine, the trend of the state indicators is represented as an explicit regression function and is easily extrapolated to obtain future values. The extrapolated values are used to prognose the state evolution. Generally, when the extrapolated values reach a failure threshold predefined by theoretical or experimental analysis, a prospective failure is deduced and the failure time is estimated [8]. Therefore, SVR has been extensively studied to tackle the prognosis problem of mechanical systems or components, such as bearings, gears, and pumps [9-12].
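The threshold-based failure estimate described above can be sketched in a few lines. This is only an illustration under stated assumptions: `first_threshold_crossing`, `toy_trend`, the one-hour step, and the threshold value are hypothetical names and numbers chosen here, and the linear toy trend merely stands in for a trained regression function.

```python
from typing import Callable, Optional


def first_threshold_crossing(
    trend: Callable[[float], float],
    start_time: float,
    threshold: float,
    step: float = 1.0,
    max_steps: int = 1000,
) -> Optional[float]:
    """Return the first extrapolated time at which trend(t) >= threshold.

    Returns None if the threshold is not reached within max_steps steps.
    """
    for k in range(1, max_steps + 1):
        t = start_time + k * step
        if trend(t) >= threshold:
            return t
    return None


def toy_trend(t: float) -> float:
    """Hypothetical rising trend; a stand-in for a trained regression function."""
    return 2.0 + 0.25 * (t - 100.0)


current_time = 100.0  # time of the last training point (assumed)
failure_time = first_threshold_crossing(toy_trend, current_time, threshold=8.0)
# Remaining useful life is the gap between the predicted failure and now.
remaining_life = None if failure_time is None else failure_time - current_time
print(failure_time, remaining_life)
```

The search is a plain forward scan, which is adequate when the extrapolation horizon is short; a root finder would be an alternative if the trend function is smooth.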
The life of mechanical systems can be divided into a normal working stage and a deterioration stage [13]. In the first stage, the state indicators are generally shown as a stationary time series. As initial defects emerge, the system steps into the deterioration stage, which includes unsteady evolution and abrupt changes. These nonstationary and transient phenomena are reflected in the time series of the state indicators. To trace the evolution of states, a regression model that is capable of adapting to staged development and volatile state indicators is required. However, the standard SVR seeks a globally optimized regression, in which the tolerance level for noise is fixed over the entire training time series. Therefore, it lacks the flexibility to capture the local trend of a data series with time-varying variance or staged characteristics. To improve adaptability to volatile time series, several modified SVR machines, such as localized support vector regression (LSVR) [14] and piecewise support vector regression [15], have been developed and applied in the field of financial analysis. In this paper, a novel SVR machine, called adaptive support vector regression (ASVR), is proposed to model the trend of state indicators measured from mechanical systems. We will show that, by utilizing an adaptable error-insensitive tube, ASVR can provide satisfactory performance for regression and prediction when the system is in a deterioration stage.
The rest of the paper is organized as follows. We briefly introduce related studies of standard SVR and LSVR in Section 2. The methodologies of ASVR are described in Section 3. In Section 4, standard SVR, LSVR, and ASVR are applied to the time series of vibration acquired from actual pumps. The regression and prediction results are compared and analyzed. Finally, conclusions and future work are discussed in Section 5.

Related Studies
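2.1. Standard Support Vector Regression. Given a time series $D = \{(t_1, y_1), (t_2, y_2), \ldots, (t_N, y_N)\}$, where $t_i$ is the time tag, $y_i$ is the corresponding value, and $N$ is the number of data points, the goal of SVR is to find a function $f(t)$ that has at most $\varepsilon$ deviation from the actually obtained values $y_i$ for all time tags while being as flat as possible [7]. By mapping the time series into a high-dimensionality feature space, the regression function $f(t)$ has the linear form

$$f(t) = w \cdot \varphi(t) + b, \tag{1}$$

where $w$ denotes a weight coefficient, $\varphi(\cdot)$ is the mapping function, and $b$ is the bias. To obtain the optimal function $f(t)$, a convex optimization problem is constructed as follows:

$$\begin{aligned} \min_{w,\,b}\quad & \frac{1}{2}\|w\|^2 \\ \text{s.t.}\quad & y_i - w \cdot \varphi(t_i) - b \le \varepsilon, \\ & w \cdot \varphi(t_i) + b - y_i \le \varepsilon, \quad i = 1, \ldots, N. \end{aligned} \tag{2}$$

In the constrained minimization problem, it is assumed that a function $f(t)$ exists for which all data points in the time series lie in a tube determined by $f(t) \pm \varepsilon$. $\varepsilon$ defines the width of the error-insensitive tube, or in other words, the precision of regression is $\varepsilon$. However, in many applications, it is preferred to accept a number of errors, which are caused by the data points outside the error-insensitive tube, to improve the generalization ability. Therefore, the concept of a soft margin is introduced, and the original optimization problem (2) is reformulated as

$$\begin{aligned} \min_{w,\,b,\,\xi,\,\xi^*}\quad & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \\ \text{s.t.}\quad & y_i - w \cdot \varphi(t_i) - b \le \varepsilon + \xi_i, \\ & w \cdot \varphi(t_i) + b - y_i \le \varepsilon + \xi_i^*, \\ & \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \ldots, N, \end{aligned} \tag{3}$$

where $C$ is a positive constant, known as the regularization parameter, which determines the trade-off between the flatness of $f(t)$ and the amount up to which deviations larger than $\varepsilon$ are tolerated. $\xi_i$ and $\xi_i^*$ are called slack variables and measure the deviation of $|y_i - f(t_i)|$ from the boundaries of the error-insensitive tube. Figure 1 provides a depiction of SVR with a soft margin.

The minimization problem denoted as (3) is called standard SVR. By the Lagrange multiplier method, this problem is transformed into its dual quadratic programming problem:

$$\begin{aligned} \max_{\alpha,\,\alpha^*}\quad & -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(t_i, t_j) - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^*) \\ \text{s.t.}\quad & \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i,\ \alpha_i^* \le C, \end{aligned} \tag{4}$$

where $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers and $K(t_i, t_j) = \varphi(t_i) \cdot \varphi(t_j)$ is a kernel function. After solving the dual quadratic programming problem, the regression function is formulated as

$$f(t) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) K(t_i, t) + b. \tag{5}$$

According to the Karush-Kuhn-Tucker conditions, $(\alpha_i - \alpha_i^*)$ equals zero when the data point $(t_i, y_i)$ lies inside the error-insensitive tube. Therefore, the regression function is simply determined by the data points that are located on the boundary or outside of the error-insensitive tube. These data points, which support the definition of the regression function, are called support vectors (SVs). Any function that satisfies Mercer's condition can be treated as a kernel function. In this study, the Gaussian radial basis function kernel is chosen as

$$K(t_i, t_j) = \exp\left(-\frac{\|t_i - t_j\|^2}{2\sigma^2}\right), \tag{6}$$

where $\sigma$ is the kernel parameter.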
In standard SVR, $\varepsilon$ is a predetermined constant. A large $\varepsilon$ value provides a high noise-tolerance capability but may lose the local details of a trend, whereas a small $\varepsilon$ increases the precision of regression but results in a complex learning machine. Generally, $\varepsilon$ is chosen based on experience. However, for a volatile time series, in particular for the state indicators of a mechanical system with staged characteristics, it is almost impossible to find a globally optimal value. It is more reasonable to set a narrow error-insensitive tube in the low-volatility regions and a wide error-insensitive tube in the high-volatility regions.
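The effect of a fixed tube width can be illustrated with the error-insensitive loss itself. The sketch below is an assumption-laden toy, not the paper's implementation: the function names and the sample values are hypothetical, and a constant fitted value is used only to keep the arithmetic easy to follow.

```python
from typing import Sequence


def eps_insensitive_loss(actual: float, fitted: float, eps: float) -> float:
    """Deviation beyond the tube boundary; zero for points inside the tube."""
    return max(0.0, abs(actual - fitted) - eps)


def count_outside_tube(
    values: Sequence[float], fitted: Sequence[float], eps: float
) -> int:
    """Number of points lying strictly outside the tube fitted +/- eps."""
    return sum(1 for y, f in zip(values, fitted) if abs(y - f) > eps)


# Hypothetical series: calm at first, then two large excursions.
values = [1.0, 1.5, 0.5, 4.0, -2.0, 1.25]
fitted = [1.0] * len(values)  # constant fit, for illustration only

narrow = count_outside_tube(values, fitted, eps=0.25)
wide = count_outside_tube(values, fitted, eps=1.0)
print(narrow, wide)
```

With the narrow tube, even the small deviations in the calm part count as errors; with the wide tube, only the large excursions do. A single constant cannot suit both regimes, which is the motivation for a locally varying width.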

2.2. Localized Support Vector Regression. Many attempts have been made to adjust error-insensitive tubes based on the local characteristics of a time series. LSVR [14], which has explicit theoretical justifications, is a representative method of these attempts. In this modified SVR, a localized region centered at the $i$th data point and with length $2k + 1$, that is, $(t_{i-k}, y_{i-k}), \ldots, (t_{i+k}, y_{i+k})$, is considered. The standard deviation of the data points in the region is mapped into the high-dimensionality feature space as in (7), where $\bar{\varphi}_i = \frac{1}{2k+1} \sum_{j=-k}^{k} \varphi(t_{i+j})$ is the local mean of the mapped points. Then, the constrained minimization problem of LSVR is defined as in (8), where an auxiliary variable for each data point is determined by the upper bound of the local deviation term. The goal of LSVR can be interpreted as finding a regression function $f(t)$ by making the localized regions of the function as low in volatility as possible while keeping the error as small as possible. By introducing the auxiliary variable, LSVR can automatically adjust the error-insensitive tube. If the $i$th data point lies in a localized region with a larger standard deviation of noise, it will contribute to a larger local deviation term, that is, a larger tube width. The wider error-insensitive tube reduces the impact of the noise around the data point. Conversely, if the $i$th data point is in a region with a smaller standard deviation of noise, it will play a greater role in the learning process of regression. In this way, the volatile noise of the time series is flexibly tolerated, and the local trend of the time series is captured.
To avoid the explicit mapping operation, $w$ is written as a linear combination of all training data points,

$$w = \sum_{i=1}^{N} \beta_i \varphi(t_i), \tag{9}$$

where $\beta_i$ are the combination coefficients, and it is substituted into problem (8). The computation of the kernel function $K(t_i, t_j)$ is performed to substitute for the inner products of $\varphi(t_i)$ and $\varphi(t_j)$ in the high-dimensionality feature space. Finally, the kernelized LSVR is transformed into a second-order cone programming (SOCP) problem and solved. The regression function is obtained in the following form:

$$f(t) = \sum_{i=1}^{N} \beta_i K(t_i, t) + b. \tag{10}$$

Compared with standard SVR, LSVR has two disadvantages: its high computational complexity and its inadequate ability for multistep extrapolation. Because the time complexity for solving the SOCP problem is an open issue, it largely restricts the computational efficiency of LSVR. Additionally, the kernel operations within each localized region also increase the consumption of computational time.

Figure 2: Schematic diagram of ASVR (blocks: sliding time window; compute the truncated range; adaptive support vector regression).
On the other hand, the value of the regression function at a certain time $t$ is largely determined by the training data points that lie in the region neighboring $t$. When the regression function is extrapolated in a multistep process, the further the extrapolated time is from the current time, the less the training data points support the region neighboring the extrapolated time. As a result, the extrapolated value will soon approach the bias $b$ as the number of extrapolated steps increases. Therefore, LSVR is mainly applied to the regression and one-step prediction of volatile time series.
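The drift toward the bias can be reproduced with any regression function of the form "weighted Gaussian kernels plus bias". The following sketch assumes such a form; the coefficients, bias, kernel width, and the $2\sigma^2$ scaling inside the kernel are hypothetical values chosen for illustration, not results of a trained LSVR.

```python
import math
from typing import Sequence


def gaussian_kernel(t_i: float, t_j: float, sigma: float) -> float:
    """Gaussian RBF kernel; the 2*sigma**2 scaling is an assumed convention."""
    return math.exp(-((t_i - t_j) ** 2) / (2.0 * sigma ** 2))


def kernel_expansion(
    t: float,
    train_times: Sequence[float],
    coeffs: Sequence[float],
    bias: float,
    sigma: float,
) -> float:
    """Evaluate bias + sum_i coeffs[i] * K(train_times[i], t)."""
    return bias + sum(
        c * gaussian_kernel(t_i, t, sigma) for t_i, c in zip(train_times, coeffs)
    )


# Hypothetical trained quantities (not taken from the paper).
train_times = [96.0, 97.0, 98.0, 99.0, 100.0]
coeffs = [0.4, -0.2, 0.5, 0.3, 0.6]
bias = 3.0
sigma = 2.0


def gap_to_bias(steps_ahead: int) -> float:
    """Distance between the extrapolated value and the bias."""
    t = train_times[-1] + steps_ahead
    return abs(kernel_expansion(t, train_times, coeffs, bias, sigma) - bias)


for steps in (1, 5, 10, 20):
    print(steps, gap_to_bias(steps))
```

Each kernel term decays with the squared distance to its training point, so the printed gap shrinks rapidly with the number of extrapolated steps, and beyond a few kernel widths the output is indistinguishable from the bias.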

Adaptive Support Vector Regression
To capture the local trend of a volatile time series while retaining the advantages of standard SVR, we made the following assumptions to build an adaptive support vector regression. Based on these assumptions, a strategy for adaptively adjusting the error-insensitive tube is proposed. In this strategy, the constant $\varepsilon$ is replaced by the variable $\varepsilon_i$ determined by the distribution characteristics of the localized time series. Firstly, a sliding time window, which slides along the time axis, is adopted to continuously obtain the local regions of the training time series. For the $i$th sliding step, the data points within the selected local region are denoted as $Y_i = \{y_{i-m+1}, y_{i-m+2}, \ldots, y_i\}$, where $m$ is the length of the time window and $i = m, m + 1, \ldots, N$. Because $Y_i$ does not always follow the normal distribution, the conventional statistics, such as the mean and variance, are not suitable for describing the statistical distribution of $Y_i$. Thus, a robust measure, known as the truncated range, is utilized to define the distribution scope of $Y_i$. It is the range calculated after discarding given parts of the samples at the high and low ends, typically an equal amount at both. The discarded part can be given as a percentage, but it is usually given as a fixed number of points to facilitate calculation. Suppose $Y'_i = \{y'_{i-m+1}, y'_{i-m+2}, \ldots, y'_i\}$ is the series of $Y_i$ sorted in ascending order; the upper bound of the truncated range is $y'_{i-r}$ and the lower bound of the truncated range is $y'_{i-m+1+r}$, where $r$ is the number of truncated data points at each end. It has been verified that the truncated range is a robust estimator for mixed distributions and heavy-tailed distributions [16]. Finally, to obtain a symmetric error-insensitive tube, $\varepsilon_i$ is calculated as follows:

$$\varepsilon_i = \frac{1}{2}\left(y'_{i-r} - y'_{i-m+1+r}\right). \tag{11}$$

It is easy to see that $\varepsilon_i \ge 0$. In particular, when $i < m$, $\varepsilon_i$ can be set to $\varepsilon_m$. Utilizing this strategy, the error-insensitive tube is adaptive to the variation of local margins, and a stabilized tolerance level for noise is achieved, whether the time series is in a low-volatility region or a high-volatility region.
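The sliding-window computation of the adaptive tube width can be sketched as follows. The sketch makes two assumptions that should be read as such: the tube half-width is taken as half of the truncated range, so that the symmetric tube spans the truncated range, and points before the first full window reuse the value of the first full window. The function names and the sample series are hypothetical.

```python
from typing import List, Sequence


def truncated_range(window: Sequence[float], r: int) -> float:
    """Range of the window after discarding the r largest and r smallest values."""
    if r < 0 or 2 * r >= len(window):
        raise ValueError("r must leave at least one point in the window")
    ordered = sorted(window)  # ascending order
    return ordered[len(ordered) - 1 - r] - ordered[r]


def adaptive_epsilon(values: Sequence[float], m: int, r: int) -> List[float]:
    """Tube half-width for every point, using a trailing window of length m.

    Assumption: half-width = half of the truncated range of the window.
    """
    if m < 1 or m > len(values):
        raise ValueError("window length m must be between 1 and len(values)")
    eps = [0.0] * len(values)
    for i in range(m - 1, len(values)):  # 0-based index of the window's last point
        eps[i] = 0.5 * truncated_range(values[i - m + 1 : i + 1], r)
    for i in range(m - 1):  # not enough history yet: reuse first full window
        eps[i] = eps[m - 1]
    return eps


# Hypothetical indicator series: a calm stretch with one spike, then volatility.
values = [1.0, 1.0, 1.25, 1.0, 9.0, 1.0, 3.0, 5.0, 2.0, 6.0]
eps = adaptive_epsilon(values, m=5, r=1)
print(eps)
```

In the calm stretch, the isolated spike is discarded by the truncation and the tube stays narrow; once several large values enter the window, the tube widens. This is the behavior that a variance-based width would not give, because a single outlier would inflate it.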
With the introduction of $\varepsilon_i$, the constrained minimization problem of ASVR is defined as

$$\begin{aligned} \min_{w,\,b,\,\xi,\,\xi^*}\quad & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \\ \text{s.t.}\quad & y_i - w \cdot \varphi(t_i) - b \le \varepsilon_i + \xi_i, \\ & w \cdot \varphi(t_i) + b - y_i \le \varepsilon_i + \xi_i^*, \\ & \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \ldots, N. \end{aligned} \tag{12}$$

This problem has the same form as that of standard SVR except for the adaptive width of the error-insensitive tube. Because $\varepsilon_i$ is precomputed, the dual problem of (12) has the same solving algorithm and computational complexity as the quadratic programming problem (4). In each local region, a fixed number of points are excluded from the error-insensitive tube. Therefore, the SVs supporting the regression function of ASVR would be more or less evenly distributed over the entire time series. This ensures that the local trend features of the volatile time series will not be omitted. The schematic diagram of ASVR is shown in Figure 2. When the regression function of ASVR is extrapolated, the current local region will exert a greater influence than the distant regions by adjusting the error-insensitive tube. This helps to improve the prediction accuracy for a volatile time series.
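For reference, the dual of the ASVR problem can be written out under the assumption that the usual Lagrangian derivation of standard SVR carries over unchanged. The notation here ($\alpha_i$, $\alpha_i^*$ for the multipliers, $K$ for the kernel, $t_i$ and $y_i$ for time tags and values, $\varepsilon_i$ for the precomputed tube widths) is assumed for this sketch. The only difference from the standard dual is that the tube width moves inside the sum:

```latex
\begin{aligned}
\max_{\alpha,\,\alpha^*}\quad
  & -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}
      (\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\,K(t_i,t_j)
    -\sum_{i=1}^{N}\varepsilon_i\,(\alpha_i+\alpha_i^*)
    +\sum_{i=1}^{N} y_i\,(\alpha_i-\alpha_i^*) \\
\text{s.t.}\quad
  & \sum_{i=1}^{N}(\alpha_i-\alpha_i^*) = 0,
    \qquad 0 \le \alpha_i,\ \alpha_i^* \le C .
\end{aligned}
```

Because the $\varepsilon_i$ enter only the linear term of the objective, the quadratic part and the constraints are untouched, which is consistent with the statement that the problem size and solver are the same as for standard SVR.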

Experimental Verification
To demonstrate the effects of the proposed method on the state prognosis of mechanical systems, vibration signals are collected from centrifugal water-descaling pumps and processed. These descaling pumps, the function of which is to generate high-pressure and high-rate water flows to wipe away the oxide scale on a steel surface, are employed in a stainless steel plant and have an important influence on the surface quality of production. Due to working continuously under heavy loads, the bearings in water-descaling pumps are frequently damaged [17]. To monitor the working state of the bearings, high-precision velocity sensors are mounted on the input end and output end of the descaling pumps to measure the vibration velocity signals. According to the intensity of the vibrational response, the measurement ranges of the input-end sensor and the output-end sensor are set to 0-20 mm/s and 0-50 mm/s, respectively. In this experimental research, the root mean square (RMS) is calculated from the vibrational signals and recorded at intervals of one hour to form the time series of state indicators. Because the behavior of the output-end vibration is more volatile than that of the input-end vibration, the RMS series monitored from the output end of the descaling pumps are chosen for the case studies.
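As a minimal sketch of how an RMS indicator series could be formed from a raw velocity signal, the code below computes one RMS value per consecutive block of samples. The block length, the function names, and the sample values are hypothetical; the text does not specify how many raw samples feed each hourly value.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root mean square of a block of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_series(signal: Sequence[float], block_len: int) -> List[float]:
    """One RMS value per consecutive, non-overlapping block of block_len samples.

    A trailing block shorter than block_len is ignored.
    """
    if block_len < 1:
        raise ValueError("block_len must be positive")
    return [
        rms(signal[k : k + block_len])
        for k in range(0, len(signal) - block_len + 1, block_len)
    ]


# Hypothetical velocity samples in mm/s: two full blocks and one leftover sample.
signal = [3.0, -3.0, 3.0, -3.0, 4.0, -4.0, 4.0, -4.0, 1.0]
indicators = rms_series(signal, block_len=4)
print(indicators)
```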

4.1. Performance Analysis of the Regression. Figure 3(a) depicts a time series of vibration RMS values acquired from the output end of a descaling pump. It is composed of 265 data points, which indicate the evolution of the working state from normal to failure. Before approximately 80 h, the descaling pump runs in the later period of the normal stage, and the distribution region of the RMS is narrow and stationary. Thereafter, the working state continues to deteriorate until the descaling pump breaks down due to bearing damage. In this stage, the intensity of the vibration rapidly increases to a high level, and the RMS changes drastically over a wide range. Standard SVR, LSVR, and ASVR are used to model the trend of the volatile RMS series. For comparison, similar parameters are chosen and listed in Table 1. Regression curves solved by these three learning machines and the margins of each error-insensitive tube are also drawn. Figure 3(b) shows the regression results of standard SVR. To obtain global optimization, a compromise is reached between the data points in the normal stage and those in the deterioration stage. As a result, most of the regression values in the normal stage are greater than the actual values, and the drastic fluctuations in the deterioration stage are excessively smoothed. For LSVR, the regression curve, as shown in Figure 3(c), exhibits a rough trend. Although the regression precision of LSVR may be superior to that of the other regression machines, many unnecessary local details are contained in the regression function. Therefore, it is too complicated for modeling the state evolution of the mechanical system. Figure 3(d) depicts the adaptive margins of the error-insensitive tube, that is, the truncated range, and the regression curve obtained by ASVR. In the normal stage, high precision of the regression is achieved with the help of the narrow error-insensitive tube. When the state enters the deterioration stage, the error-insensitive tube is expanded to adapt to the high-volatility RMS. ASVR captures the local trend features of the volatile RMS series well and provides a more practical solution than LSVR for trending the state indicators of mechanical systems.
In our research, the algorithm of standard SVR is performed by a Matlab toolbox, SVM-KM Toolbox [18]. The ASVR algorithm is written based on the toolbox as well. According to [14], the SOCP problem in LSVR is solved using the software package CVX [19]. The regression results shown in Figures 3(b)-3(d) are obtained by running these algorithms on a PC with a 3 GHz Intel Core processor and 2 GB of RAM. The average computational times are 0.188, 9.360, and 0.189 s for standard SVR, LSVR, and ASVR, respectively. The algorithm of ASVR has approximately the same computational speed as standard SVR and is suitable for dealing with the monitored state indicators.

4.2. Performance Analysis of the Prediction. The RMS series indicating two other state evolution processes of the descaling pumps are applied to evaluate the prediction performance of our method. Due to the disadvantage of LSVR for multistep extrapolation, only standard SVR is used for comparison. The parameters listed in Table 1 are still chosen for the algorithms of standard SVR and ASVR. In general, the longer the prediction step is, the greater the prediction error is. In this case study, the prediction step is set as 5. This means that the time series of RMS is separated into two parts at the time point five hours before breakdown. The earlier part is used to train the regression model, and the later part is used to examine the prediction results. To quantify the prediction accuracy, two criteria, the root mean square error (RMSE) and the mean absolute percentage error (MAPE), are introduced:

$$\mathrm{RMSE} = \sqrt{\frac{1}{p} \sum_{i=1}^{p} (y_i - \hat{y}_i)^2}, \qquad \mathrm{MAPE} = \frac{1}{p} \sum_{i=1}^{p} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%,$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted values and $p$ is the number of predicted points. The prediction results for these two descaling pumps are shown in Figures 4 and 5, respectively. It can be observed that the predicted values of ASVR conform better to the actual values than those of standard SVR, even though the RMS series in the deterioration stage have dissimilar volatile behaviors. The error criteria are calculated and listed in Table 2. ASVR obtains better prediction performance than standard SVR. For example, the prediction MAPE on descaling pump number 2 utilizing standard SVR is 10.44%, whereas the prediction MAPE utilizing ASVR is only 8.75%.
In conclusion, ASVR provides a capability for predicting the state trend of mechanical systems with volatile time series of state indicators.The results of the proposed regression machine are superior to those of standard SVR.

Conclusion and Discussion
It is common for a deteriorating mechanical system to generate volatile time series of state indicators. Due to the fixed error-insensitive tube, the traditional support vector regression is ill-suited to modelling a nonstationary state trend. In this paper, an adaptive support vector regression machine is proposed to capture the local trend of volatile state indicators and to predict the deterioration behavior of mechanical systems. Compared with traditional SVR, ASVR has the significant characteristic that the error-insensitive tube is adaptively adjusted according to the transient distribution boundary of local regions in the training time series. Considering the nonnormality of the transient distribution, a truncated range is used to define the distribution scope of localized regions and calculate the width of the error-insensitive tube. The experimental results demonstrate that ASVR has approximately the same computational efficiency as standard SVR and provides a more practical solution than LSVR for trending state indicators. Moreover, the prediction accuracy of ASVR for volatile state indicators is higher than that of standard SVR.
However, the ASVR algorithm must be rerun on the entire training time series whenever new state indicators are obtained. For the data flow acquired from mechanical systems in long-term operation, the computational cost is too high for online trend analysis. Thus, an incremental algorithm for ASVR is required to meet the further demand of state prognosis, and our future research will focus on this topic.
(a) The width of the error-insensitive tube $\varepsilon$ determines the proportion of training data points excluded from the tube.
(b) To maintain a stabilized exclusion proportion, $\varepsilon$ should keep pace with the variation of the margin of the volatile time series.
(c) When the margin changes, the transient distribution of a local time series is not a normal distribution but a mixed distribution or a heavy-tailed distribution.

Figure 3: Regression results of the vibration RMS of a descaling pump: (a) time series of the vibration RMS; (b) regression curve of standard SVR; (c) regression curve of LSVR; (d) regression curve of ASVR.

Figure 4: Prediction for the state of descaling pump number 1: (a) prediction results of standard SVR; (b) prediction results of ASVR.

Figure 5: Prediction for the state of descaling pump number 2: (a) prediction results of standard SVR; (b) prediction results of ASVR.

Table 2: Prediction errors of standard SVR and ASVR.