Multifeatures Fusion and Nonlinear Dimension Reduction for Intelligent Bearing Condition Monitoring

Condition-based maintenance is critical for reducing maintenance costs and improving production efficiency. Data-driven methods based on neural networks (NN) are among the most widely used models for recognizing the condition of mechanical components. In this paper, we introduce a new bearing condition recognition method based on multifeature extraction and a deep neural network (DNN). First, the method calculates time domain, frequency domain, and time-frequency domain features to represent the characteristics of the vibration signals. Then a nonlinear dimension reduction algorithm based on deep learning is proposed to reduce redundant information. Finally, the top-layer classifier of the deep neural network outputs the bearing condition. The proposed method is validated using bearing vibration data from an experimental test bed. Comparative studies are also performed; the results show the advantage of the proposed method in adaptive feature selection and its superior accuracy in bearing condition recognition.


Introduction
Bearing degradation is one of the most common fault sources in rotating machinery systems. Unexpected bearing failure can lead to high maintenance costs and loss of revenue. Traditionally, the maintenance activity is selected from preventive and corrective maintenance. However, preventive maintenance typically involves high maintenance costs, and corrective maintenance may reduce productive efficiency [1,2]. Condition-based maintenance based on condition monitoring is therefore critical to ensure safe and efficient operation of rotating machinery systems [3].
Generally, the various approaches to bearing condition monitoring can be classified into two categories: signal processing-based approaches and pattern recognition-based approaches. In signal processing-based approaches, mathematical or statistical operations are performed on the measured signal, and the bearing condition is then judged using prior human knowledge. The kurtosis coefficient is a major indicator for detecting bearing performance degradation [4]. Other statistical parameters such as the first-order moment (e.g., the mean $\mu$), the second-order moment (e.g., the variance $\sigma^2$), and the crest factor are also commonly used in bearing condition monitoring. Statistical analysis is widely used for its simplicity and low computational cost [5]. Spectrum analysis is also effective in tracking bearing faults [6]. Spectral kurtosis in the frequency domain is an effective tool for enhancing machinery fault features [7]. For bearing condition monitoring and fault diagnosis, envelope spectrum analysis is used where the fault signal has an amplitude-modulating effect on the characteristic frequencies [8]. However, spectral analysis carries no time-scale information. To overcome that shortcoming, approaches based on time-frequency analysis have been proposed for detecting and localizing bearing faults. Shi et al. [9] proposed a wavelet-based envelope spectrum method to detect defects in rolling element bearings.
Meanwhile, pattern recognition-based condition monitoring is composed of two parts: feature extraction and pattern recognition. Signal processing-based feature extraction is a basic step in condition monitoring. Pattern recognition is the other critical issue. Classification-based machine learning algorithms are commonly used in industry and academia for condition recognition [10]. Support vector machines (SVMs) map the input data into a high-dimensional space with different kernel functions to efficiently perform nonlinear classification. Abbasion et al. [11] present a method based on wavelet analysis (WP) and a support vector machine (SVM) to diagnose multiple bearing faults. Artificial neural networks (ANNs) are able to learn expert knowledge from a representative set of data, so they are commonly adopted for automated detection of machine conditions [12]. Bin et al. [13] extract fault characteristics by combining empirical mode decomposition and wavelet packet decomposition; a BP neural network is then used to identify the state of the bearing. In addition to the BP neural network, the fuzzy neural network [14], conditional random field [15], recurrent neural network [16], and radial basis function neural network [17] have also been applied to intelligent bearing condition monitoring. The deep neural network (DNN) is a newer kind of neural network architecture that attempts to abstract high-level features from the raw signal through multilayer nonlinear transformations [18]. Hinton and Salakhutdinov [19] point out that deep learning can nonlinearly convert high-dimensional data to a low-dimensional representation by training a multilayer neural network. Since then, deep neural networks have been widely used in speech recognition [20], image recognition [21], natural language processing [22], and other classification and recognition applications. However, bearing condition monitoring based on deep neural networks has so far rarely been reported in academia or industry.
In this paper, we propose a new method based on multifeature fusion and DNN nonlinear dimension reduction to recognize the bearing condition. Time domain, frequency domain, and time-frequency domain features are fused to integrate the bearing condition features. Those features are fed into the deep neural network (DNN) as the input vector. The DNN model then extracts a high-level abstraction of the input data and recognizes the bearing condition. The remainder of this paper is organized as follows. The proposed method is presented in Section 2. Section 3 discusses the case study, where the proposed method is validated on real-world bearing vibration signals. Section 4 presents the conclusion and future work.

Condition Recognition Using the Proposed Method
The proposed method for bearing condition monitoring mainly includes three steps: feature extraction, feature dimension reduction, and condition recognition. The detailed process is as follows.
2.1. Feature Extraction. The presence of a defect in machinery components can barely be determined from the raw acceleration signal. To better understand the raw vibration signal, we extract time domain, frequency domain, and time-frequency domain features.
where $x$ is the realization of the process that contains $N$ points, and $X_i$ is the data from the frequency transform. In practice, the categories of failures are not known in advance. For a rolling bearing, inner race, outer race, and cage faults may all appear. The frequency signatures are therefore difficult to obtain, because the fault may involve all components of the test bearing at the same time.
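We propose a new condition indicator, obtained by calculating the correlation coefficient between the two vibration signals:
$$\rho = \frac{\sum_{i=1}^{N}(K_{s,i} - \bar{K}_s)(K_{c,i} - \bar{K}_c)}{\sqrt{\sum_{i=1}^{N}(K_{s,i} - \bar{K}_s)^2 \, \sum_{i=1}^{N}(K_{c,i} - \bar{K}_c)^2}}, \qquad (2)$$
where $K_s$ and $K_c$ are two spectral kurtosis vectors of size $N$: $K_s$ is the standard spectral kurtosis frequency distribution, and $K_c$ is the spectral kurtosis of the current signal. $\bar{K}_s$ and $\bar{K}_c$ are their corresponding means.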
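The indicator can be illustrated with a short sketch. The function name `correlation_indicator` and the toy vectors are hypothetical; the code simply evaluates the ordinary Pearson correlation between a template spectral kurtosis vector and the current one, which is the reading of the indicator assumed here.

```python
import math


def correlation_indicator(template, current):
    """Pearson correlation between a template SK vector and the current SK vector."""
    if len(template) != len(current):
        raise ValueError("both spectral kurtosis vectors must have the same size")
    n = len(template)
    mean_t = sum(template) / n
    mean_c = sum(current) / n
    # Numerator: co-variation of the two vectors around their means.
    cov = sum((t - mean_t) * (c - mean_c) for t, c in zip(template, current))
    # Denominator: product of the two spreads.
    var_t = sum((t - mean_t) ** 2 for t in template)
    var_c = sum((c - mean_c) ** 2 for c in current)
    if var_t == 0.0 or var_c == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_t * var_c)


# Toy vectors (not measured data): a scaled copy of the template correlates
# perfectly, a reversed copy correlates negatively.
template = [1.0, 2.0, 3.0, 4.0]
same_shape = correlation_indicator(template, [2.0, 4.0, 6.0, 8.0])
reversed_shape = correlation_indicator(template, [4.0, 3.0, 2.0, 1.0])
```

A value near 1 means the current spectral kurtosis still resembles the template; the value drops as the frequency distribution drifts away from it.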

Time-Frequency Domain Features.
The wavelet transform (WT) decomposes a signal into the time-frequency space and represents signal characteristics very well. However, the WT only further subdivides the low-frequency bands. To obtain a more detailed frequency representation of the signal, the wavelet packet transform (WPT) is performed. Through the WPT, a signal is divided into high-frequency and low-frequency bands in binary tree form. At each decomposition level, the signal is divided into two mutually orthogonal subspaces:
$$U_j^n = U_{j+1}^{2n} \oplus U_{j+1}^{2n+1}, \qquad (3)$$
where $j$ indicates the tree level and $n$ is the node index in level $j$. The division continues until decomposition level $J$; the process then produces $2^J$ mutually orthogonal subspaces. The WP function $\psi_{j,k}^n(t)$ is mathematically expressed as below [24]:
$$\psi_{j,k}^n(t) = 2^{j/2}\,\psi^n(2^j t - k), \qquad (4)$$
where $j$ and $k$ are the scale and translation parameters, respectively, and $n$ is the oscillation parameter. The WP coefficients $d_{j,k}^n$ are obtained by the inner product between the signal $x(t)$ and the WP functions $\psi_{j,k}^n(t)$, as below:
$$d_{j,k}^n = \langle x(t), \psi_{j,k}^n(t) \rangle = \int_{-\infty}^{\infty} x(t)\,\psi_{j,k}^n(t)\,dt. \qquad (5)$$
From Figure 1, the main processing steps include layer-wise pretraining and fine-tuning. In this paper, the Autoencoder DNN structure is applied to process the multidimensional feature data.

Pretraining.
During the pretraining stage, an Autoencoder network consists of three layers: the input layer, the hidden layer, and the output layer. First, the input layer $x$ is mapped into the hidden layer $h$ with the equation
$$h = f(Wx + b), \qquad (6)$$
where $W$ is the weight matrix, $b$ is the bias value, and $f$ is a transfer function, usually nonlinear, such as the sigmoid function. This step is called the encoder. Second, the decoder maps the hidden layer $h$ into the output layer $y$ with the equation
$$y = f(W'h + b'), \qquad (7)$$
where the weight matrix $W'$ is constrained to be the transpose of the encoder mapping: $W' = W^{T}$. The output value $y$ is seen as the prediction of the input $x$ given the code value $h$. The network is therefore optimized by minimizing the error between $x$ and $y$, such as $L(x, y) = \lVert x - y \rVert^2$, so the loss function can be described as
$$J(W, b, b') = \frac{1}{n}\sum_{i=1}^{n} L\bigl(x^{(i)}, y^{(i)}\bigr) = \frac{1}{n}\sum_{i=1}^{n} \bigl\lVert x^{(i)} - y^{(i)} \bigr\rVert^2, \qquad (8)$$
where $n$ is the number of training samples. The code layer (hidden layer) $h$ has fewer nodes than the input and output layers, so it can be seen as a compressed representation of the input layer. This is similar to dimensionality reduction with PCA, except that the Autoencoder is a nonlinear method.
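The encoder, the tied-weight decoder, and the squared reconstruction error can be sketched in a few lines. This is a minimal forward pass only, assuming a sigmoid transfer function in both directions; the helper names, the random initialization range, and the seed are hypothetical, and no training loop is shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, W, b):
    """Map the input x to the hidden code; W has one row per hidden unit."""
    return [sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]


def decode(h, W, b_out):
    """Map the code h back to input space using the transpose of W (tied weights)."""
    n_inputs = len(W[0])
    return [sigmoid(sum(W[i][j] * h[i] for i in range(len(h))) + b_out[j])
            for j in range(n_inputs)]


def reconstruction_error(x, y):
    """Squared Euclidean distance between the input and its reconstruction."""
    return sum((x_j - y_j) ** 2 for x_j, y_j in zip(x, y))


# Hypothetical sizes: 28 inputs compressed to 20 hidden units.
rng = random.Random(0)
n_inputs, n_hidden = 28, 20
W = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b = [0.0] * n_hidden
b_out = [0.0] * n_inputs
x = [rng.random() for _ in range(n_inputs)]

h = encode(x, W, b)       # compressed code
y = decode(h, W, b_out)   # reconstruction of x
loss = reconstruction_error(x, y)
```

Because the decoder reuses `W`, only one weight matrix and two bias vectors need to be learned per Autoencoder layer.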

Fine-Tuning.
The multiple hidden layers of the Autoencoder are pretrained without supervision. Those hidden layers are then combined with the Softmax classifier to construct the deep neural network. Fine-tuning trains the whole neural network with a supervised learning method to improve performance. During fine-tuning, all layers are stacked into a single model, so all of the model parameters can be optimized. The backpropagation algorithm is used to fine-tune the Autoencoder; the detailed equations and derivation can be found in [25].
and recognizes the bearing condition. As shown in Figure 2, the proposed method is described as follows.

Structure of the
Step 1. Preprocess the raw signal with normalization.
Step 2. Extract features of the input signal with time domain, frequency domain, and time-frequency domain.
Step 3. Smooth the features by moving average filter, and then normalize the filtered features.
Step 4. Pretrain the Autoencoder neural network with the features of training and testing set by unsupervised learning.
Step 5. Fine-tune the stacked Autoencoder neural network with the labeled training set.
Step 6. Estimate the bearing operating condition using the trained model.
inner race is fixed through the shaft. The load is produced by a force actuator (), which exerts a force exceeding the bearing's maximum dynamic load on the test bearing to shorten its life cycle. Two accelerometers (3035) are mounted horizontally and vertically on the housing of the test roller bearing to pick up the horizontal and vertical accelerations, as shown in Figure 3. The sampling frequency of the data acquisition is 25600 Hz, and the vibration data provided by the two accelerometers are collected every 1 second [26,27].

Experiments Analysis and Discussion
The bearing degradation data consist of three categories: the first operating condition (1800 rpm and 4000 N), the second operating condition (1650 rpm and 4200 N), and the third operating condition (1500 rpm and 5000 N). Empirically, the bearing full life cycle can be divided into four [...] 5(a)-5(h). From those figures, the features mostly follow a trend but also contain some impulse noise. A moving average filter with a window size of 15, determined empirically, is applied to smooth those features. The red curve is the filtered feature, which is smoother and, in particular, free of the influence of impulses. The features are then normalized to bring them to the same scale.
For the spectrum kurtosis (SK) in the frequency domain, Welch's estimate of the SK is employed to indicate the frequency locations of transients in the raw signal [28,29]. Set the window length $N_w = 2^8$, the overlap length $N_o = 3N_w/4$, and the FFT length $N_{fft} = 2N_w$. The length of the raw signal is 2560; through (1), we obtain the spectrum kurtosis. The mean of the first five spectrum kurtosis vectors is selected as the standard template of each vibration dataset, and the correlation coefficient is calculated by (2). Hence, Figure 5(i) is regarded as the bearing monitoring index from the frequency domain feature. By visual inspection, the correlation coefficient curves of the spectrum kurtosis show a clear trend and little noise. Time-frequency domain features based on wavelet packet decomposition provide arbitrary time-frequency resolution of a specified signal. From Figure 6(a), at the initial operating stage of the bearing, the frequency content is mainly concentrated in 400∼600 Hz. With increasing running time, however, the concentrated frequency band changes. At the end of the bearing's service life, it moves down to 0∼200 Hz, as shown in Figure 6(b).
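The segmentation settings above can be checked with a small sketch. Several things here are assumptions rather than details given in the text: the Hann window, the naive DFT (a real implementation would use an FFT library), the function names, and the estimator form SK(f) = ⟨|X(f)|⁴⟩ / ⟨|X(f)|²⟩² − 2 averaged over segments, which is a common Welch-style estimate and may differ from the exact expression in (1).

```python
import cmath
import math


def segment_count(n_samples, win_len, overlap):
    """Number of full overlapping windows that fit in the signal."""
    hop = win_len - overlap
    return (n_samples - win_len) // hop + 1


def spectral_kurtosis(signal, win_len, overlap, nfft):
    """Welch-style SK per frequency bin (bins 0 .. nfft/2), naive DFT."""
    hop = win_len - overlap
    # Periodic Hann window (assumed; the text does not name the window).
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / win_len)
              for n in range(win_len)]
    n_bins = nfft // 2 + 1
    sum_p2 = [0.0] * n_bins   # running sum of |X|^2 per bin
    sum_p4 = [0.0] * n_bins   # running sum of |X|^4 per bin
    n_seg = 0
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(win_len)]
        for k in range(n_bins):
            # Zero-padded DFT: only the win_len nonzero samples contribute.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / nfft)
                        for n in range(win_len))
            power = abs(coeff) ** 2
            sum_p2[k] += power
            sum_p4[k] += power * power
        n_seg += 1
    sk = []
    for k in range(n_bins):
        mean_p2 = sum_p2[k] / n_seg
        mean_p4 = sum_p4[k] / n_seg
        # Bins with essentially no energy are reported as 0 to avoid 0/0.
        sk.append(mean_p4 / (mean_p2 * mean_p2) - 2.0 if mean_p2 > 1e-12 else 0.0)
    return sk


# Settings from the text: 2560 samples, window 2**8, overlap 3/4 of the window.
n_segments = segment_count(2560, 2 ** 8, 3 * 2 ** 8 // 4)
```

With these settings the hop is 64 samples, so each 2560-sample record is averaged over 37 segments. Under the assumed estimator, a stationary sinusoid gives a value of -1 at its own frequency bin, while transients push the value up.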
Based on these observations, we quantify the spread of energy in the time-frequency plane using the sum of absolute amplitudes. Over time, the different time-frequency distributions show different trends. From Figure 6(c), we separate the five frequency bands 0∼150 Hz, 151∼300 Hz, 301∼450 Hz, 451∼600 Hz, and 601∼800 Hz. Because the frequency band curves oscillate severely, the length of the moving average filter is set to 21. Figure 6(d) shows the filtered and normalized curves. Thus, eight time domain features, one frequency domain feature, and five time-frequency domain features are extracted for each of the horizontal and vertical vibration signals. They are smoothed by the moving average filter. To improve the convergence speed of the regression algorithm, all the features are normalized.
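The smoothing and normalization steps, together with the feature count, can be sketched as follows. The centered window that shrinks at the edges and the min-max scaling to [0, 1] are assumptions; the text gives only the window lengths (15 and 21) and says the features are normalized.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks at both ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def min_max_normalize(values):
    """Scale a feature curve to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Feature count from the text: 8 time, 1 frequency, and 5 time-frequency
# features per channel, for the horizontal and vertical channels.
FEATURES_PER_CHANNEL = 8 + 1 + 5
TOTAL_FEATURES = 2 * FEATURES_PER_CHANNEL

# Toy curve with a single impulse (not measured data).
raw = [0.0, 0.0, 0.0, 30.0, 0.0, 0.0, 0.0]
smoothed = moving_average(raw, 3)
scaled = min_max_normalize(smoothed)
```

The total of 28 features matches the 28 input neurons used in the experiments. In the toy curve, the impulse of height 30 is spread over three samples of height 10 before scaling.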

Results and Discussion
Judging from the feature figures above, not all features are equally important for identifying the bearing condition. So before the features are fed into the classifier, we reduce their dimension through a deep Autoencoder neural network. This includes two stages: a pretraining stage and a fine-tuning stage. The pretraining is an unsupervised learning process, so in this case we use the feature data from both the training and testing sets to train the Autoencoder networks. Thus, there are 7534 samples from the six run-to-fail sets under different operating conditions. In our experiment, the input layer has 28 neurons, the first hidden layer 20, the second hidden layer 5, and the output layer 4. During the training of the first hidden layer, the input layer has 28 neurons, the hidden layer 20, and the output layer 28. Through (6)∼(8), the BP algorithm is used to train the Autoencoder network. When the training reaches the stop condition, the first hidden layer $h_1$ is obtained. Then the first hidden layer $h_1$ is used as the input layer for training the second hidden layer, which has 5 neurons. Each layer is trained as a denoising Autoencoder by minimizing the reconstruction error of its input. Once all layers are pretrained, the network goes through the second stage of training (fine-tuning). Fine-tuning is a supervised learning process in which we minimize the prediction error on a supervised task. A Softmax classifier layer is added to the top of the network. We then train the whole neural network to obtain the optimal parameters. The transfer function of the last layer is tanh. The results are shown in Table 2. The degradation stage achieves the highest accuracy rate, 98.2%. For comparison, PCA dimension reduction, SVM classification, and a method without pretraining are applied to the dataset to test the classification rate.
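The layer-wise schedule described above can be written down explicitly. This sketch only derives the shapes of the Autoencoders trained in each pretraining stage and shows a Softmax output; the helper names are hypothetical, and the four class labels follow the four life-cycle levels named in the text.

```python
import math

LAYER_SIZES = [28, 20, 5, 4]   # input, hidden 1, hidden 2, output (from the text)
CONDITIONS = ["normal", "early fault", "degradation", "failure"]


def pretraining_stages(sizes):
    """One (input, hidden, output) Autoencoder per hidden layer; output mirrors input."""
    return [(sizes[i], sizes[i + 1], sizes[i]) for i in range(len(sizes) - 2)]


def softmax(scores):
    """Convert raw class scores to probabilities that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_condition(scores):
    """Pick the condition with the largest Softmax probability."""
    probs = softmax(scores)
    return CONDITIONS[probs.index(max(probs))]


stages = pretraining_stages(LAYER_SIZES)
```

For the sizes above, `stages` lists a 28-20-28 Autoencoder followed by a 20-5-20 Autoencoder; the 5-to-4 Softmax layer is added only for fine-tuning.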

PCA Dimension Reduction Method.
Principal component analysis (PCA) is an effective linear dimension reduction method [30] and is often used for bearing fault diagnosis and classification. PCA is able to extract the most significant representative features. In the experiment, 28 features are extracted from every sample. PCA is applied to the training feature sets, and only the five most significant principal components are retained. Then the transformation matrix based on the training set is applied to the test signal. Finally, the compressed features are classified by a three-layer BP neural network. The results can be found in Table 2.
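The key point of this baseline, fitting the transformation on the training set only and reusing it for the test data, can be sketched as below. Power iteration with deflation is used here purely to keep the example dependency-free; it is a stand-in for a library eigendecomposition, and the function names and toy 2-D data are hypothetical (the experiment reduces 28 features to 5).

```python
import math


def _matvec(matrix, vector):
    return [sum(m_ab * v_b for m_ab, v_b in zip(row, vector)) for row in matrix]


def fit_pca(rows, n_components, n_iter=500):
    """Estimate the mean and leading principal directions from training rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    components = []
    for _ in range(n_components):
        v = [1.0 / (j + 1) for j in range(d)]          # deterministic start vector
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _ in range(n_iter):                        # power iteration
            w = _matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                           # no variance left
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v_a * w_a for v_a, w_a in zip(v, _matvec(cov, v)))
        if eigenvalue < 1e-12:
            break
        components.append(v)
        # Deflation: remove the direction just found from the covariance.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return mean, components


def project(row, mean, components):
    """Apply the training-set transformation to any sample (train or test)."""
    return [sum((x - m) * c for x, m, c in zip(row, mean, comp))
            for comp in components]


train = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]   # toy training rows
mean, components = fit_pca(train, 1)
test_scores = project([4.0, 4.0], mean, components)        # unseen test sample
```

The test sample is centered with the training mean and projected onto the training components, so no information from the test set enters the transformation.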

SVM Classification Method.
The support vector machine (SVM) is a classical supervised learning model grounded in rigorous statistical learning theory. The core of the SVM model is the construction of a kernel function that implicitly maps the input data into a high-dimensional space. SVMs have been found to be remarkably effective in many machinery fault diagnosis tasks [31]. In many bearing fault diagnosis applications [32], the RBF kernel achieves a high classification accuracy rate. In the experiment, we mainly consider the RBF kernel. The raw features are input directly into the SVM classifier; the result is shown in Table 2.
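For reference, the RBF kernel itself is a one-line similarity measure, k(x, z) = exp(−γ‖x − z‖²). The sketch below evaluates it on toy vectors; the value of `gamma` is a hypothetical choice, since the text does not report the kernel parameters used.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF (Gaussian) kernel: similarity decays with squared Euclidean distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


gamma = 0.5                                            # hypothetical kernel width
k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma)     # identical samples
k_far = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma)      # squared distance of 2
```

Identical samples give a kernel value of 1, and the value falls toward 0 as samples move apart, which is what lets the SVM separate classes that are not linearly separable in the raw feature space.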

The Influence of Unsupervised Prelearning.
An important improvement of deep learning is unsupervised learning, which does not need labeled data to train the neural network [33]. Unsupervised learning can find appropriate initial weight values, which help in optimizing the weights in nonlinear deep learning. To validate the significance of pretraining, the deep learning model is also built without unsupervised prelearning; its result is shown in Table 2. The result indicates that unsupervised prelearning can improve the recognition accuracy rate of the bearing condition.

The Influence of Different Features.
To investigate the influence of different features, different feature combinations are analyzed. First, the features are divided into three categories: traditional features (Feature 1), traditional features and

Figure 1: The framework for deep learning.

Figure 2: The framework for bearing condition estimation.
Smooth and normalization of frequency energy

Table 1: Time domain features extraction.
Frequency Domain Features.
The signal spectrum contains rich condition information. The spectral kurtosis, which indicates the presence of short transients and their frequency locations, is obtained by calculating the kurtosis of each frequency band. For a number $M$ of $N$-point realizations at the frequency index $m$, the spectral kurtosis is given as [23]
levels: normal, early fault, degradation, and failure, as shown in Figure 4. There are six run-to-fail datasets. The numbers of samples from the different test bearings are 2803, 871, 911, 797, 515, and 1637, so the total number of samples is 7534. Half of them (3767) are chosen randomly as the training samples. The other half are used as the testing samples.
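The sample counts and the random half split can be verified with a short sketch. The six per-run counts are taken as 2803, 871, 911, 797, 515, and 1637, which is what the stated total of 7534 requires; the seed and the variable names are hypothetical.

```python
import random

# Samples per run-to-fail dataset (first entry chosen so the total is 7534).
run_lengths = [2803, 871, 911, 797, 515, 1637]
total_samples = sum(run_lengths)

# Random split into two halves, using a fixed (hypothetical) seed.
rng = random.Random(0)
indices = list(range(total_samples))
rng.shuffle(indices)
half = total_samples // 2
train_indices = sorted(indices[:half])
test_indices = sorted(indices[half:])
```

Each half contains 3767 samples, and the two index sets do not overlap.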

Table 2: The results of different algorithms.
Feature 2), traditional features, spectrum kurtosis, and WPT (Feature 3). Each feature combination is then reduced in dimension and classified by the deep learning model. The result can be found in Table 3. As can be seen, the proposed feature extraction method improves the bearing condition recognition accuracy, and the spectrum kurtosis and WPT features are also helpful.

Table 3: The results of different feature combinations.
Condition-based maintenance based on condition monitoring is critical for reducing maintenance costs and improving industrial production efficiency. In this paper, a novel method based on multifeature extraction, deep neural network feature dimension reduction, and condition recognition is proposed. Time, frequency, and time-frequency domain features are fused to represent the characteristics of the bearing operating condition. Then a nonlinear dimension reduction method based on deep learning is proposed to highlight the hidden patterns and compress the information. Finally, a classifier that identifies the different bearing operating stages is added to the top level of the deep neural network. The condition monitoring system is validated with real-world vibration monitoring data collected from bearings. A comparative study is performed between the proposed method and the PCA dimension reduction method, the SVM classification method, the method without pretraining, and three different feature combinations. The results demonstrate the advantage of the proposed method in achieving more accurate condition estimation.