Data Amplification for Bearing Remaining Useful Life Prediction Based on Generative Adversarial Network

To deal with the difficulty in bearing remaining useful life prediction caused by the lack of historical data, this paper proposes a data amplification method based on the generative adversarial network (GAN), in which the parameters of the generator and discriminator are determined by a grid search algorithm. The proposed method is verified on the XJTU-SY bearing data sets from Xi'an Jiaotong University. First, 15 time-domain features related to bearing life are extracted as the training data of the GAN, which generates virtual data that can be used to build bearing life prediction models. Then, support vector regression and a radial basis function neural network are used to construct bearing prognostic models based on real data, generated data, and mixed data. The results show that the proposed method can make up for the deficiency of data and improve the accuracy of bearing remaining useful life prediction.


Introduction
Bearings are extremely important components in rotating machinery, and precisely predicting their remaining life is of vital significance for improving the reliability and safety of mechanical systems. Such predictions help engineers take appropriate measures and reduce economic losses, and have therefore attracted growing attention from researchers. Acquiring bearing vibration signals requires considerable economic and time costs, so full life cycle data for bearing life prediction are limited, which greatly restricts the development and application of bearing life prediction methods. Generative adversarial networks (GAN), which are trained in an unsupervised manner, have powerful data generation capabilities. They can be used in both semisupervised and unsupervised learning and do not require complex Markov chains. Compared with many other generative models, GAN can produce sharper and more realistic samples, and they have been successfully applied in many fields. Ledig et al. used a GAN for image super-resolution and implemented the first framework capable of inferring photo-realistic natural images from low-resolution inputs at an upscaling factor of 4 [1]. Bai et al. used a GAN to directly generate high-resolution faces from small, blurred ones, addressing the insufficient information and ambiguous features that small face sizes cause in face detection [2]. GAN were originally created for image problems. Training image models requires large data sets, which are costly if they must be collected and labeled entirely by hand, whereas GAN can generate data sets themselves and thus provide low-cost training data. According to the literature, GAN have been applied to challenging problems in stock market forecasting, order processing, image generation, semantic segmentation, health care, privacy protection, and other areas. Zhang et al. proposed a novel adversarial network architecture for stock market forecasting that uses a multilayer perceptron as the discriminator and a long short-term memory (LSTM) network as the generator to predict stock closing prices [3]. Kumar et al. proposed a GAN for e-commerce orders to explore and process ambiguous orders [4]. Tirupattur et al. exploited the flexibility of adversarial learning by designing a conditional GAN that takes an encoded EEG signal as input and generates the corresponding image [5]. Gecer et al. reconstructed facial texture and shape from a single image using a GAN and deep convolutional neural networks (CNN), in which the GAN is used to train a very powerful generator of facial textures in UV space [6]. Souly et al. used GAN for semisupervised semantic segmentation and proposed a framework that adds large amounts of generated (fake) visual data to force real samples to be close in the feature space [7]. Goel et al. realized automatic screening for COVID-19 using an optimized GAN that generates additional CT images [8]. Liu et al. applied GAN to privacy protection, adding designed noise during model learning to achieve differential privacy, and improved model stability and compatibility by controlling the privacy loss [9]. Pascual et al. used GAN to learn complex functions from large data sets for speech enhancement [10]. This paper extracts 15 time-domain features from a large amount of bearing vibration data and uses GAN, which is considered superior to most other approaches in many cases, to expand the data and thereby improve life prediction.
Researchers in China and abroad have carried out extensive research on bearing life prediction and achieved notable results. Lu et al. studied the relationship between bearing clearance and load distribution under interference fit; by modeling the low-speed spindle drive system of the fan, they analyzed how the dimensional accuracy of bearing installation and the surface machining accuracy of surrounding structural components affect fatigue life and bearing load-carrying characteristics [11]. Shen et al. proposed a new method for predicting remaining life based on relative features and multivariable support vector machines (SVMs) [12]. This method evaluates the degradation of bearing performance in a way that is not affected by differences between individual bearings. Correlation analysis is used to select sensitive features as input, and the resulting model combines the dual advantages of multivariable regression and small-sample prediction to predict the remaining life of bearings. Although its features and methods differ, its general idea is very similar to that of this paper. To overcome a shortcoming of traditional life prediction methods, namely their inability to predict the life of space rolling bearings, Dong used the support vector machine method: phase space reconstruction is used to select the input parameters of the support vector machine, particle swarm optimization is used to optimize its internal parameters, and a degradation trend prediction model based on the optimized parameters is established to predict the degradation trend and remaining life of space rolling bearings [13]. Our research has the same purpose but establishes the prediction model on data whose dimension and quantity have been expanded.
Neural networks have been widely used to predict the remaining life of bearings; a typical method is long short-term memory (LSTM) [14][15][16]. However, because this method is computationally complex and time-consuming, this paper uses two other methods: support vector regression (SVR) and the radial basis function neural network (RBFNN).
To address the difficulty of establishing a bearing life prediction model and the low accuracy caused by insufficient historical data for its input, this paper proposes a bearing life prediction method based on condition monitoring data. First, multiple time-domain features of the vibration signals related to bearing life are extracted and used as training data for the GAN; next, the GAN is trained on these data through adversarial optimization; then, the generated virtual time-domain feature data are used to build the life prediction model; finally, the remaining life is predicted by two methods, SVR and RBFNN.

Generative Adversarial Network
2.1. Generative Adversarial Network Theory. The generative adversarial network is a typical generative model. The idea of GAN is inspired by the two-player zero-sum game in game theory. A GAN has two modules, the generator and the discriminator, which learn from each other to produce increasingly better output [17]. The basic framework is shown in Figure 1.
The generator receives random noise and generates new samples from it. The discriminator is a binary classifier: it takes training samples and generated samples as input and determines whether the current input comes from the training samples or from the generator, thereby judging the generation quality of the current generator. For training samples, the expected output of the discriminator is true; for generated samples, it is false. The generator tries to make the discriminator output true for its samples as often as possible, i.e., to make them indistinguishable from the training samples, which creates an adversarial relationship between the two networks. The alternating training of the two models can be regarded as a minimax game. Through this adversarial learning mechanism, the performance of both the discriminator and the generator is continuously improved, and after sufficient training they reach a balance known as the Nash equilibrium. Once training is completed, the generator estimates the distribution of the training samples well and can generate new data consistent with it, which achieves the purpose of expanding the data [18]. The objective function is
$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x\sim P_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z\sim P_{\mathrm{noise}}(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$
where $\mathbb{E}[\cdot]$ denotes the expected value, $P_{\mathrm{data}}(x)$ is the distribution of the real samples, and $P_{\mathrm{noise}}(z)$ is the low-dimensional noise distribution. The generator $G(z;\theta_g)$ with parameters $\theta_g$ maps the noise into the high-dimensional data space, and the distribution of its outputs is denoted $P_g$.
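As an illustration of how the GAN objective function is evaluated in practice, the following Python/NumPy sketch replaces each expectation with a minibatch mean of discriminator outputs. The function name `gan_value` and the clipping constant `eps` are our own assumptions, not part of the original method.

```python
import numpy as np


def gan_value(d_real: np.ndarray, d_fake: np.ndarray, eps: float = 1e-12) -> float:
    """Minibatch estimate of V(D, G).

    d_real: discriminator outputs D(x) on real samples, values in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples.
    """
    d_real = np.clip(d_real, eps, 1.0 - eps)  # avoid log(0)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))


# A discriminator that cannot tell real from generated data outputs 0.5
# everywhere, giving V = log(0.5) + log(0.5) = -log 4, the value at the
# global optimum of the game (P_g = P_data).
print(gan_value(np.full(8, 0.5), np.full(8, 0.5)))  # ≈ -1.3863
```

The discriminator ascends this value while the generator descends it, which is the minimax structure described above.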

Optimization of Generative Adversarial Networks.
To make the generated samples as consistent as possible with the training sample distribution, both networks must be optimized. Optimizing the generator and discriminator simultaneously is complex to implement and is unlikely to achieve the desired effect. Therefore, the discriminator and generator are optimized alternately: first, the generator is fixed and the discriminator is updated to maximize its discrimination accuracy; then, the discriminator is fixed and the generator is updated to minimize the discriminator's accuracy, that is, to make the generated samples as realistic as possible. During training, both networks are continually optimized, improving the discriminator's ability to identify generated samples and the generator's ability to produce realistic ones, until they reach the Nash equilibrium and the GAN training is complete.
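The alternating scheme can be sketched with a deliberately tiny NumPy toy: one-dimensional Gaussian data, a generator $G(z) = z + b$ that only learns a shift, and a logistic discriminator $D(x) = \mathrm{sigmoid}(wx + c)$, with gradients derived by hand. The model, learning rate, batch size, and the use of the common non-saturating generator objective (maximizing $\log D(G(z))$) are assumptions for illustration only; this is not the network used in this paper.

```python
from typing import List, Tuple

import numpy as np


def sigmoid(s: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-s))


def train_toy_gan(real_mean: float = 3.0, steps: int = 500, batch: int = 64,
                  lr: float = 0.05, seed: int = 0) -> Tuple[float, List[float]]:
    rng = np.random.default_rng(seed)
    b = 0.0          # generator parameter: G(z) = z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w * x + c)
    history: List[float] = []
    for _ in range(steps):
        # (1) Fix G; one gradient-ascent step on D for
        #     mean log D(x_real) + mean log(1 - D(G(z))).
        x_real = rng.normal(real_mean, 1.0, batch)
        x_fake = rng.normal(size=batch) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        # d/ds log sigmoid(s) = 1 - sigmoid(s); d/ds log(1 - sigmoid(s)) = -sigmoid(s)
        w += lr * (np.mean((1.0 - d_real) * x_real) - np.mean(d_fake * x_fake))
        c += lr * (np.mean(1.0 - d_real) - np.mean(d_fake))
        # (2) Fix D; one gradient-ascent step on G for mean log D(G(z)).
        x_fake = rng.normal(size=batch) + b
        d_fake = sigmoid(w * x_fake + c)
        b += lr * np.mean((1.0 - d_fake) * w)  # d/db log D(z + b) = (1 - D) * w
        history.append(float(b))
    return float(b), history


b_final, history = train_toy_gan()
```

With such a simple discriminator, $b$ may oscillate around the real mean rather than converge exactly, so in practice the training history is worth inspecting.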
The grid search method is an exhaustive search over specified parameter values. By evaluating each candidate setting with cross-validation, it identifies the best-performing parameters of the learning algorithm. Grid search is used here to find good numbers of nodes for the generator and discriminator, which optimizes the network parameters as far as possible while keeping the number of training iterations and the training time small. It thus improves both the generation quality and the training efficiency of the GAN.
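A grid search over node counts amounts to iterating over the Cartesian product of candidate values and keeping the best-scoring combination. In the sketch below, the parameter names `g_nodes` and `d_nodes`, the candidate values, and `toy_score` are hypothetical; in a real run, `score` would train the GAN with the given node counts and return, for example, a cross-validated loss.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple


def grid_search(grid: Dict[str, Sequence[int]],
                score: Callable[..., float]) -> Tuple[Optional[Dict[str, int]], float]:
    """Evaluate every combination of values in `grid`; keep the lowest score."""
    names = list(grid)
    best_params: Optional[Dict[str, int]] = None
    best_score = float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(**params)  # in practice: e.g. a cross-validated GAN loss
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score


# Hypothetical search space for the node counts of the two networks, with a
# toy score standing in for an actual training run.
grid = {"g_nodes": [16, 32, 64], "d_nodes": [16, 32, 64]}
toy_score = lambda g_nodes, d_nodes: abs(g_nodes - 32) + abs(d_nodes - 16)
print(grid_search(grid, toy_score))  # ({'g_nodes': 32, 'd_nodes': 16}, 0)
```

scikit-learn's `GridSearchCV` implements the same idea with built-in cross-validation for estimators that follow its API; a hand-written loop is simpler when the model, here a GAN, does not.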

Feature Extraction.
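We used the XJTU-SY rolling bearing accelerated life test data set from Xi'an Jiaotong University (data set 1: 35 Hz, 12 kN, bearing 1_1; data set 2: 37.5 Hz, 11 kN, bearing 2_1). Every minute, a record of 32,768 data points is collected; over the full life cycle, data set 1 was collected for 123 min and data set 2 for 491 min. The vertical vibration signal is used. Because each bearing is trained separately, the accuracy of the life prediction results does not depend directly on the rotating frequency or load of the bearing, nor on whether the same bearing is used. For a segment of the vibration signal $x(n)$, $n = 1, 2, \dots, N$, the time-domain feature vector $[x_{\min}, x_{\max}, \dots, L]$ is calculated using the following formulas:
$$
\begin{aligned}
x_{\min} &= \min_{n} x(n), \qquad x_{\max} = \max_{n} x(n),\\
\bar{x} &= \frac{1}{N}\sum_{n=1}^{N} x(n), \qquad \overline{|x|} = \frac{1}{N}\sum_{n=1}^{N} \left|x(n)\right|,\\
\delta &= \frac{1}{N}\sum_{n=1}^{N}\left(x(n)-\bar{x}\right)^{2}, \qquad \sigma_x = \sqrt{\delta},\\
S &= \frac{1}{N\sigma_x^{3}}\sum_{n=1}^{N}\left(x(n)-\bar{x}\right)^{3}, \qquad K = \frac{1}{N\sigma_x^{4}}\sum_{n=1}^{N}\left(x(n)-\bar{x}\right)^{4},\\
x_r &= \left(\frac{1}{N}\sum_{n=1}^{N}\sqrt{\left|x(n)\right|}\right)^{2}, \qquad x_{\mathrm{rms}} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} x(n)^{2}},\\
W &= \frac{x_{\mathrm{rms}}}{\overline{|x|}}, \qquad C = \frac{x_{\max}}{x_{\mathrm{rms}}}, \qquad I = \frac{x_{\max}}{\overline{|x|}}, \qquad L = \frac{x_{\max}}{x_r}.
\end{aligned}
\tag{2}
$$
In the formulas, $x_{\min}$ is the minimum, $x_{\max}$ is the maximum, $\bar{x}$ is the mean, $\overline{|x|}$ is the absolute mean, $\delta$ is the variance, $\sigma_x$ is the standard deviation, $S$ is the skewness, $K$ is the kurtosis, $W$ is the waveform index, $x_r$ is the root amplitude, $x_{\mathrm{rms}}$ is the root mean square, $C$ is the peak indicator, $I$ is the pulse index, and $L$ is the margin index.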
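The time-domain indicators named above can be computed for one signal segment with NumPy. This sketch assumes the standard textbook definitions, with the population (1/N) variance and the signed maximum $x_{\max}$ used as the peak value in the peak, pulse, and margin indices; other conventions (sample variance, peak = max |x|) are also common. The function name and dictionary keys are hypothetical, and the segment is assumed to be non-constant and not identically zero.

```python
import numpy as np


def time_domain_features(x) -> dict:
    """Time-domain indicators of one vibration segment (assumed definitions)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    abs_mean = np.abs(x).mean()
    var = np.mean((x - mean) ** 2)                 # population variance (1/N)
    std = np.sqrt(var)
    rms = np.sqrt(np.mean(x ** 2))
    root_amp = np.mean(np.sqrt(np.abs(x))) ** 2    # root amplitude x_r
    x_max, x_min = x.max(), x.min()
    return {
        "min": x_min,
        "max": x_max,
        "mean": mean,
        "abs_mean": abs_mean,
        "variance": var,
        "std": std,
        "skewness": np.mean((x - mean) ** 3) / std ** 3,
        "kurtosis": np.mean((x - mean) ** 4) / std ** 4,
        "waveform_index": rms / abs_mean,
        "root_amplitude": root_amp,
        "rms": rms,
        "peak_indicator": x_max / rms,
        "pulse_index": x_max / abs_mean,
        "margin_index": x_max / root_amp,
    }


features = time_domain_features(np.sin(np.linspace(0.0, 20.0, 32768)))
```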

Bearing Life Prediction Method
3.1. Support Vector Regression. In traditional regression models, such as simple linear regression, the loss is calculated from the difference between the model output $f(x)$ and the true value $y$. Support vector regression (SVR) [19] assumes that the model can tolerate a deviation of up to $\varepsilon$ between the output $f(x)$ and the true value $y$. In other words, a band is formed that extends $\varepsilon$ on each side of $f(x)$ along the $y$-axis, and a prediction is considered correct as long as the sample falls inside this band, i.e., $|f(x) - y| \le \varepsilon$. Samples inside the band incur no loss, so only the support vectors, which lie on or outside the band, influence the model. The optimized model is obtained by minimizing the total loss while maximizing the margin, as shown in Figure 2.
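The tolerance band described above corresponds to the ε-insensitive loss $\max(0, |y - f(x)| - \varepsilon)$. A short NumPy sketch, with an arbitrary ε and hypothetical function name, shows that samples inside the band contribute nothing while those outside are penalized linearly:

```python
import numpy as np


def eps_insensitive_loss(y, f, eps: float = 0.1) -> np.ndarray:
    """Per-sample SVR loss: zero inside the band |y - f| <= eps, linear outside."""
    return np.maximum(0.0, np.abs(np.asarray(y, dtype=float) - np.asarray(f, dtype=float)) - eps)


y = np.array([1.00, 1.00, 1.00])
f = np.array([1.05, 0.92, 1.30])   # the first two predictions lie inside the band
loss = eps_insensitive_loss(y, f, eps=0.1)
```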
Compared with traditional regression, SVR has low generalization error, low computational complexity, and easily interpretable results, and it can solve high-dimensional problems. Therefore, this paper uses SVR for life prediction.
In the SVR algorithm, the kernel function implicitly maps the input into a higher-dimensional feature space, transforming a problem that is not linearly separable in the low-dimensional space into one that is linearly separable in the high-dimensional space. Thus, the choice of kernel function has a very large impact on the regression performance of the SVR and on the final results [20].

Life Span Prediction Method Based on Support Vector Regression.
First, the sample data are divided into training data and test data. Then, the SVR model is trained on the training data to determine its parameters. Finally, the test data are fed into the trained SVR model to obtain the predicted life.
In the implementation, most of the sample data were used as training data. The SVR model is first trained on these data. After training, the test data are imported and a sliding window is used to predict the value at the next moment, which yields the life prediction.
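A sliding window turns a feature series into supervised pairs: each input holds `width` consecutive values and the target is the value that follows. The window width and the function name `make_windows` are illustrative assumptions; the paper does not state its window size.

```python
import numpy as np


def make_windows(series, width: int):
    """Build (X, y) for one-step-ahead regression from a 1-D series.

    X[i] = series[i : i + width], y[i] = series[i + width].
    """
    s = np.asarray(series, dtype=float)
    X = np.stack([s[i:i + width] for i in range(len(s) - width)])
    y = s[width:]
    return X, y


X_demo, y_demo = make_windows(np.arange(6), 3)
# X_demo = [[0, 1, 2], [1, 2, 3], [2, 3, 4]], y_demo = [3, 4, 5]
```

The resulting training pairs can be passed to a regressor such as scikit-learn's `SVR(kernel="rbf", C=..., epsilon=...)` via `fit`, and windows built from the test portion are then passed to `predict` to obtain one-step-ahead values.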

Radial Basis Function Neural Network.
The radial basis function neural network (RBFNN) [21] has three layers: the input layer, the hidden layer, and the output layer. The weights between the input layer and the hidden layer are all 1. The hidden-layer neurons use radial basis functions as activation functions. A radial basis function is a real-valued function whose value depends only on the distance from a center point; the Gaussian radial basis function is the most commonly used [22]. The hidden layer and the output layer are connected as in ordinary neural networks, and the weights between them can be adjusted through training. The linear output layer computes a weighted sum of the outputs of the hidden nodes, and the number of output neurons equals the dimension of the output vector [21]. The RBFNN is an efficient feedforward neural network with global approximation capability and the best approximation property. It trains quickly and has a simple structure, so it is also used here to predict bearing life. The hidden layer applies a nonlinear transformation to the input vector, mapping the sample points from the input space into a high-dimensional feature space, so that the training samples can be modeled by a linear model in that feature space or become linearly separable there. Figure 3 shows the topology of the radial basis function neural network. Researchers in related fields have demonstrated that radial basis functions have the best approximation property: with enough hidden-layer nodes, an RBFNN can approximate any multivariate continuous nonlinear function to arbitrary precision. RBFNN has been widely used in information processing, fault diagnosis, physical modeling, judgment and recognition, and image processing.
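The three-layer structure described above can be sketched in NumPy: the input is passed unchanged (weights of 1) to Gaussian hidden units, and only the hidden-to-output weights are fitted, here by linear least squares. Using every training point as a center, adding a bias term, and fixing the width are simplifying assumptions for illustration; the paper instead selects centers by orthogonal least squares, which this sketch does not implement.

```python
import numpy as np


def rbf_hidden(X: np.ndarray, centers: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian hidden-layer outputs: phi_ij = exp(-||x_i - c_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def design(X: np.ndarray, centers: np.ndarray, sigma: float) -> np.ndarray:
    """Hidden outputs plus a constant bias column for the linear output layer."""
    return np.hstack([rbf_hidden(X, centers, sigma), np.ones((len(X), 1))])


def fit_output_weights(X, y, centers, sigma):
    """Only the hidden-to-output weights are learned (linear least squares)."""
    w, *_ = np.linalg.lstsq(design(X, centers, sigma), y, rcond=None)
    return w


def rbf_predict(X, centers, sigma, w):
    return design(X, centers, sigma) @ w


# Toy 1-D example: with one center per training point the network interpolates.
X = np.linspace(0.0, 1.0, 10)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])
w = fit_output_weights(X, y, X, 0.1)
y_hat = rbf_predict(X, X, 0.1, w)
```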

Life Span Prediction Methods Based on Radial Basis Function Neural Networks.
The key to an RBFNN lies in determining the radial basis functions. The value of each function at a point depends only on the distance of that point from the center, so the positions of the centers, the choice of the widths, and the number of radial basis functions all affect the performance of the network. This paper determines the centers of the radial basis functions by the orthogonal least squares method. The network is trained in the same way as ordinary neural networks, except that an ordinary neural network trains only the weights, whereas the RBFNN also trains the activation functions. The hidden layer uses radial basis functions as activation functions, and the hidden layer is connected to the output layer as in ordinary neural networks; that is, the weights between them can be changed by training. The width determines the range of inputs to which a neuron responds: the smaller the width, the narrower the activation function of the corresponding hidden neuron. The width is set empirically as $\sigma = d_{\max}/\sqrt{2m}$, where $d_{\max}$ is the maximum distance between the centers and $m$ is the number of hidden-layer nodes, that is, the number of basis functions.
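The width rule $\sigma = d_{\max}/\sqrt{2m}$ is straightforward to compute from the chosen centers. With this $\sigma$, the Gaussian $\exp(-\|x - c\|^2/(2\sigma^2))$ becomes $\exp(-m\|x - c\|^2/d_{\max}^2)$. The helper name `rbf_width` is hypothetical.

```python
import numpy as np


def rbf_width(centers) -> float:
    """sigma = d_max / sqrt(2 m): d_max is the largest distance between any
    two centers and m is the number of centers (hidden nodes)."""
    c = np.asarray(centers, dtype=float)
    if c.ndim == 1:                      # allow 1-D centers
        c = c[:, None]
    d = np.sqrt(((c[:, None, :] - c[None, :, :]) ** 2).sum(axis=2))
    return float(d.max() / np.sqrt(2 * len(c)))


sigma = rbf_width([[0.0], [1.0], [2.0]])  # d_max = 2, m = 3 -> 2 / sqrt(6)
```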

Wireless Communications and Mobile Computing
nodes. The input is 15-dimensional raw data, and the output is 5 × 15-dimensional generated data, which is used to expand the data dimension and improve life prediction. First, the generator and discriminator are initialized. In each iteration, the generator is fixed first and only the parameters of the discriminator are updated. 90% of the data from the original data set and 90% of the generator's output are selected, so the discriminator receives two kinds of input. The discriminator learns to output 1 if the input comes from the real data set and 0 if it was produced by the generator, which can be regarded as a regression problem. Next, the parameters of the discriminator are fixed and the generator is updated: the original data are fed into the generator, its output is passed to the discriminator, and the discriminator returns a score between 0 and 1. The generator adjusts its parameters to make this score as large as possible, because a larger score means the generated data are judged to be more realistic.
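The discriminator step described above can be sketched as assembling one labeled batch. The 90% fraction and the 1/0 targets come from the text; random sampling without replacement, shuffling, and the name `discriminator_batch` are our assumptions.

```python
import numpy as np


def discriminator_batch(real: np.ndarray, generated: np.ndarray,
                        frac: float = 0.9, rng=None):
    """Sample `frac` of the real rows and `frac` of the generated rows;
    label real rows 1 and generated rows 0, then shuffle."""
    rng = np.random.default_rng() if rng is None else rng
    n_r = int(round(frac * len(real)))
    n_g = int(round(frac * len(generated)))
    r = real[rng.choice(len(real), n_r, replace=False)]
    g = generated[rng.choice(len(generated), n_g, replace=False)]
    X = np.vstack([r, g])
    y = np.concatenate([np.ones(n_r), np.zeros(n_g)])
    perm = rng.permutation(len(X))  # mix both classes
    return X[perm], y[perm]


X_d, y_d = discriminator_batch(np.ones((100, 15)), np.zeros((50, 15)),
                               rng=np.random.default_rng(0))
```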

Life Prediction Using SVR.
Three groups of prediction experiments were carried out: life prediction using real raw data, using generated data, and using mixed data. In experiment 1, 80% of the real raw data (1 × 15 dimensions), extracted at even intervals, were used as training data. In experiment 2, 80% of the generated data (5 × 15 dimensions), extracted at even intervals, were used as training data. In experiment 3, 80% of the combined real raw data and generated data (6 × 15 dimensions), extracted at even intervals, were used as training data [23]. The SVR model was trained with each of these three training sets. After training, the test data were imported and a sliding window was used to predict the value at the next moment, so that life prediction could be carried out.
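Evenly spaced 80% subsampling, and the pooling of real and generated rows used for the mixed-data experiment, can be sketched as follows. Picking indices with `np.linspace`, stacking the real rows above the generated ones, and the placeholder arrays are assumptions; the paper does not specify exactly how the rows were ordered or selected.

```python
import numpy as np


def evenly_spaced_subset(data: np.ndarray, frac: float = 0.8) -> np.ndarray:
    """Return about `frac` of the rows, taken at evenly spaced indices (order kept)."""
    n = len(data)
    k = int(round(frac * n))
    idx = np.unique(np.linspace(0, n - 1, k).round().astype(int))
    return data[idx]


real = np.zeros((10, 15))       # placeholder: 10 real feature vectors
generated = np.ones((50, 15))   # placeholder: 5x as many generated vectors
train_real = evenly_spaced_subset(real)                           # experiment 1
train_gen = evenly_spaced_subset(generated)                       # experiment 2
train_mixed = evenly_spaced_subset(np.vstack([real, generated]))  # experiment 3
```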
To quantitatively evaluate the proposed method, the mean absolute error (MAE) and root mean square error (RMSE) of the three predictions were calculated and compared. MAE is the mean of the absolute errors between the predicted values and the observed values, with all individual differences weighted equally. RMSE measures the deviation between the predicted values and the observed values. MAE is the simplest and most easily interpreted evaluation index and reflects the actual error. RMSE has the same scale as MAE but is never smaller than it, because the errors are squared before averaging and the square root is taken afterwards, which amplifies larger errors. Therefore, a small RMSE indicates that the largest errors are also small. The MAE and RMSE are calculated by (3) and (4), respectively:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{3}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}} \tag{4}$$
where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples. The smaller the RMSE, the better the fit; similarly, the smaller the MAE, the better the fit. For data set 1 (Table 1), little data was collected, so the life prediction based on real data alone lacks accuracy. Using the generative adversarial network to amplify the data according to the distribution of the raw data greatly reduces experimental and time costs. Moreover, the predictions based on generated data and mixed data are significantly better than those based on real data, which verifies the effectiveness of this method.
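The MAE and RMSE definitions translate directly into NumPy. The small example, with made-up values, shows why RMSE is never smaller than MAE: one large error (here 3) dominates the squared mean.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def rmse(y_true, y_pred) -> float:
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


# Errors of 1, -1 and 3: MAE = 5/3 ≈ 1.667, RMSE = sqrt(11/3) ≈ 1.915.
y_true = np.array([2.0, 2.0, 2.0])
y_pred = np.array([1.0, 3.0, -1.0])
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```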
Life prediction on data set 2 using SVR is summarized in Table 2, and the results are shown in Figures 7-9.
Data set 2 contains much more data, and the SVR results show that the prediction quality obtained with the generated data and the mixed data is close to that obtained with the real data, which also indicates that the generative adversarial network has a strong data generation capability.

Radial Basis Function Neural Network Lifetime Prediction.
In the experiment, the raw sample data were normalized, and the 15 time-domain characteristic parameters of the vibration signal were divided into two groups: the first 80% of the data served as training samples for the RBFNN prediction model, used to determine the model structure and train the network parameters, while real data served as test samples to check the prediction accuracy of the model. Three groups of experiments were performed. Experiment 1: 80% of the real raw data (1 × 15 dimensions), extracted at even intervals, were used as training data. Experiment 2: 80% of the generated data (5 × 15 dimensions), extracted at even intervals, were used as training data. Experiment 3: 80% of the combined real raw data and generated data (6 × 15 dimensions), extracted at even intervals, were used as training data [23]. The input dimension of the RBFNN prediction model is set to M = 15 and the output dimension to N = 1. The RMSE (root mean square error) and MAE (mean absolute error) of the three predictions are calculated for quantitative comparison. Life prediction is performed on data set 1 using RBFNN, and the results are shown in Figures 10-12. Data set 1 (Table 3) has little data, so expanding the data with the generative adversarial network greatly improves the accuracy of life prediction.
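The text does not state which normalization was applied; the sketch below assumes min-max scaling to [0, 1], with the minimum and range estimated on the training portion only and then reused for the test data, so that no test information leaks into training. The function names are hypothetical.

```python
import numpy as np


def minmax_fit(train):
    """Per-feature minimum and range from the training data only."""
    train = np.asarray(train, dtype=float)
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant features: avoid division by zero
    return lo, span


def minmax_apply(x, lo, span):
    return (np.asarray(x, dtype=float) - lo) / span


train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
lo, span = minmax_fit(train)
train_scaled = minmax_apply(train, lo, span)   # each column mapped to [0, 1]
```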
The results of life prediction on data set 2 using RBFNN are listed in Table 4. Data set 2 contains much more data, and the prediction quality obtained with the generated data and the mixed data is close to that obtained with the real data. This indicates that the generative adversarial network can generate data close to the original distribution and has a strong data generation capability. In summary, SVR yields better life predictions than RBFNN.

Conclusion
(1) When a lack of raw data leads to inaccurate life prediction results, using a generative adversarial network for data amplification clearly improves the model's life prediction: the predictions based on mixed data and generated data are more accurate than those based on real data. This shows that the proposed method can compensate for the data deficiency and improve the accuracy of bearing remaining useful life prediction.
(2) Using the vertical vibration signals of the two data sets, 15 time-domain features are extracted as the GAN training data, and support vector regression and a radial basis function neural network are used to predict the bearing life; the support vector regression method performs better than the radial basis function neural network.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.