Motor Fault Diagnosis Algorithm Based on Wavelet and Attention Mechanism

To improve motor maintenance efficiency and realize real-time motor fault diagnosis, a motor fault diagnosis algorithm based on wavelet and attention mechanism is proposed. Firstly, the motor vibration signal is decomposed by wavelet transform, and the high-frequency signal is denoised to improve the signal-to-noise ratio. Secondly, the frequency band and time dimensions after wavelet decomposition are taken as input data, a convolutional neural network (CNN) is used to fuse the frequency band features of the data, and a bidirectional gated recurrent unit (GRU) is used to fuse the time series features. Then, the attention mechanism is used to adaptively integrate the features of different time points. Finally, motor fault diagnosis and prediction are realized by classifier recognition. Experimental results show that, compared with existing deep learning fault diagnosis models, this method has higher diagnosis accuracy and can accurately diagnose the running state of the motor.


Introduction
Motors are the most commonly used devices for driving all kinds of machinery and industrial equipment. To ensure the safe and stable operation of the system, staff often need to carry out regular overhaul and maintenance [1]. However, this manual method not only requires a lot of manpower, material, and financial resources but also cannot ensure that faults are eliminated in time. With the development of computer technology, large- and medium-sized motors are equipped with computer-centered condition monitoring and fault diagnosis systems, in order to prevent or repair functional or local failures before they occur, minimize losses, and prevent catastrophic accidents [2].
The motor system is composed of multiple functional modules, and its faults are often uncertain and nonlinear. In the motor system, any small deviation of a functional module may lead to the collapse of the whole system. Therefore, it is particularly important to find a reasonable and effective fault diagnosis method. The common faults of large- and medium-sized motors include mass imbalance of moving parts, bearing wear, improper matching of moving parts, oil film oscillation, shaft misalignment, and cracks. The characteristics of these faults are most obvious in the frequency content of the vibration signal. Therefore, correlation analysis and spectrum analysis are usually used to monitor and diagnose the state of the motor from its vibration signal. These methods usually preprocess the data first, extract the characteristic parameters, and then refer to fault diagnosis specifications [3].
At present, motor fault diagnosis methods mainly include quantitative analysis and qualitative analysis. Quantitative analysis methods include analytical model-based methods and data-driven methods. Fault characteristic signal processing for the rotor and bearings in the motor system usually relies on data-driven quantitative analysis. In reference [4], the Prony method is combined with band selection and thinning technology to amplify and analyze the small frequency band near the characteristic frequency of broken rotor bars, which improves the calculation accuracy. However, in the low noise background, the diagnosis effect is not ideal. In reference [5], a rotor fault diagnosis method based on improved matrix pencil filtering and detection is proposed. The improved matrix pencil algorithm can effectively overcome the influence of noise and has better resolution for short-term data. In reference [6], the Park vector mode transformation technique is combined with the estimation of signal parameters via rotational invariance techniques (ESPRIT). The Park vector mode transformation is used to eliminate the influence of power frequency components and reduce the amount of calculation while preserving accuracy. In reference [7], a motor bearing fault diagnosis technology based on wavelet transform and Hilbert transform is proposed. The Hilbert transform is used to demodulate the modulated signal, which enables effective analysis of the fault characteristic signal. At the same time, this method introduces the thselect function into the wavelet transform and uses the adaptive ability of this function to adjust the threshold under different noise conditions. It can eliminate the high-frequency noise and improve the adaptive ability of the algorithm. However, this method ignores the noise interference in the low-frequency signal, which can still lead to misjudgment and omission.
Although the quantitative analysis method based on data can carry out fault analysis according to characteristic components, the accuracy of fault detection is often affected by interference signals. Therefore, the quantitative analysis method is difficult to meet the requirements of motor system fault diagnosis.
Qualitative analysis is a method of building a system model based on internal knowledge of the system [8], mainly including graph theory methods, expert systems, and qualitative simulation. The Petri net is a graph theory method that uses the relationships between the components of the target system to establish a directed graph model and accurately handles the sequence, concurrency, and conflict of discrete events [9]. In recent years, the application of Petri nets in motor system fault detection has gradually increased and has achieved good results. Aiming at the uncertainty in the process of fault propagation, a probabilistic transition method for fault Petri nets is defined in reference [10], which overcomes the shortcoming that the fault diagnosis process of Petri nets only focuses on the state of the places. However, the way the probability changes is not adaptive. In reference [11], a colored Petri net model is established by using colored Petri nets and the modelling tool CPN Tools, which improves the efficiency of fault diagnosis and the intuitiveness of the model. In reference [12], Petri nets and fuzzy reasoning are combined to solve some stochastic and uncertain fault problems effectively, which greatly improves the ability to locate the fault source. In reference [13], a fuzzy Petri net (FPN) carries out strict mathematical reasoning and gives the reasoning process in matrix form. However, the acquisition of its weights still depends on the experience of experts, and its adaptability is poor.
In the practical use of the motor system, the components of the motor vibration signal are complex. At the same time, due to the fluctuation of working conditions, the vibration signal is strongly time-varying, and the signal characteristics at different times are very different. Therefore, aiming at the complex frequency information and strong time variation of the motor vibration signal, a motor fault diagnosis algorithm based on wavelet and attention mechanism is proposed. The wavelet denoising method is used to remove the noise from the original signal and improve the signal-to-noise ratio. A convolutional neural network and a bidirectional gated recurrent unit are used to fuse time-frequency information. The attention mechanism module is used to dynamically weight the features of different time points in the time dimension, so as to more accurately predict and diagnose motor faults.

Method
2.1. Principle of Motor Bearing Vibration. Fault frequency detection for motor bearings is divided into high-frequency detection and low-frequency detection. When a part of a motor bearing fails, a pulse signal is generated by the periodic motion. Most of the low-frequency signals are generated by the rotation of the faulty parts, while the high-frequency signals are generated by the natural frequencies of other parts excited when the bearing fails. The natural frequency range of bearing components is 1 kHz to 10 kHz, and its level is mainly affected by material, structure, processing technology, assembly, and other factors.
The natural frequency of the bearing rolling element and the natural frequency of the inner and outer rings of the bearing are expressed in terms of the following quantities: $H$ is the modulus of elasticity, $g$ is the acceleration of gravity, $r$ is the radius of the steel ball, $\gamma$ is the material density, $I$ is the moment of inertia of the cross section of the bearing ring, $m$ is the vibration order, $A$ is the cross-sectional area of the bearing ring, and $d_a$ is the neutral shaft diameter of the bearing cross section.
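For orientation, the textbook expressions that match the symbols listed above can be written as follows. This is a reconstruction under assumptions, not necessarily the exact form used here: the labels $f_b$ (rolling element) and $f_r$ (ring) are introduced for illustration, and $\gamma$ is taken as a weight density so that $\gamma / g$ is the mass density.

```latex
% Assumed standard forms; f_b and f_r are illustrative labels.
f_b = \frac{0.424}{r} \sqrt{\frac{H g}{2 \gamma}}
\qquad
f_r = \frac{m (m^2 - 1)}{2 \pi \sqrt{m^2 + 1}} \cdot \frac{4}{d_a^{2}}
      \sqrt{\frac{H I g}{\gamma A}}
```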
The characteristic frequencies of an outer ring failure and of an inner ring failure are expressed in terms of the following quantities: $s$ is the inner ring speed, $d$ is the diameter of the roller, $N$ is the number of rollers, $D$ is the diameter of the bearing pitch circle, and $\alpha$ is the contact angle.
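As a hedged illustration, the commonly used kinematic expressions for these two characteristic frequencies are $\frac{N}{2} s \left(1 - \frac{d}{D}\cos\alpha\right)$ for the outer ring and $\frac{N}{2} s \left(1 + \frac{d}{D}\cos\alpha\right)$ for the inner ring. The sketch below assumes these standard forms, takes $s$ in revolutions per second so the result is in Hz, and uses a hypothetical function name.

```python
import math


def bearing_fault_frequencies(s, n_rollers, d, pitch_d, alpha=0.0):
    """Return (outer_ring_hz, inner_ring_hz) for a rolling bearing.

    Assumed standard kinematic formulas (not confirmed by the text):
    s is the inner ring speed in rev/s, d the roller diameter, pitch_d the
    pitch circle diameter (same unit as d), alpha the contact angle in rad.
    """
    ratio = (d / pitch_d) * math.cos(alpha)
    outer = 0.5 * n_rollers * s * (1.0 - ratio)
    inner = 0.5 * n_rollers * s * (1.0 + ratio)
    return outer, inner


# Illustrative geometry only; these dimensions are placeholders.
outer_hz, inner_hz = bearing_fault_frequencies(
    s=1797 / 60.0, n_rollers=9, d=7.94, pitch_d=39.04
)
```

Under these formulas the two frequencies always sum to $N s$, which gives a quick sanity check on any computed pair.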

Journal of Sensors

Wavelet Transform and Noise Reduction. The wavelet transform is a relatively new mathematical tool with outstanding time-frequency localization characteristics. It is an integral transform: the continuous wavelet transform of a signal $f(t)$ is the inner product of $f(t)$ with translated and stretched copies of the wavelet function,
$$W_f(x, y) = \langle f, p_{x,y} \rangle = \int_{-\infty}^{+\infty} f(t)\, p_{x,y}^{*}(t)\, dt,$$
where
$$p_{x,y}(t) = |x|^{-1/2}\, p\!\left(\frac{t - y}{x}\right),$$
$p(t)$ is the parent wavelet, $|x|^{-1/2}$ is the normalization coefficient, and $y$ and $x$ are the translation and stretching factors of the wavelet function, respectively. The resolution of the local structure can be changed by adjusting the translation factor and the stretching factor. The excitation function of the hidden layer of the wavelet neural network [14] is the wavelet basis function, which is obtained by translating and stretching the parent wavelet function.
The principle of wavelet transform denoising is to decompose the collected signal to obtain low-frequency signal and high-frequency signal. The noise is generally contained in the obtained high-frequency signal so that the threshold value can be set to filter out the noise and the remaining signal can be reconstructed to achieve noise reduction.
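The decompose, threshold, and reconstruct procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the configuration used in this paper: it uses the Haar wavelet, soft thresholding, and a universal threshold estimated from the finest detail coefficients, and all function names are hypothetical.

```python
import numpy as np


def haar_dwt(signal):
    """One-level Haar transform: returns (low-frequency, high-frequency) parts."""
    x = np.asarray(signal, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero; values below the threshold become 0."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)


def wavelet_denoise(signal, levels=3):
    """Threshold the high-frequency coefficients and rebuild the signal."""
    x = np.asarray(signal, dtype=float)
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = x, []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    # Assumption: noise level estimated from the finest detail coefficients.
    sigma = np.median(np.abs(details[0])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(len(x)))
    for detail in reversed(details):
        approx = haar_idwt(approx, soft_threshold(detail, threshold))
    return approx
```

The threshold rule is the main design choice: a larger threshold removes more noise but also more of the genuine high-frequency fault content.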

Fusion of Time-Frequency Features.
Because the frequency information of the motor vibration signal is complex and time-varying, a convolutional neural network and a bidirectional gated recurrent unit are used to fuse the time-frequency information. Then, the attention mechanism is used to dynamically weight the time-varying characteristics, and the fault diagnosis of planetary gearbox is realized.
Wavelet packet is another important development of wavelet theory in signal processing. It has the characteristics of multidimensional and multiresolution analysis and can provide more complex signal analysis methods [15]. The signal can be decomposed into a series of frequency bands layer by layer. It further decomposes the undivided high-frequency part of the wavelet analysis and selects the corresponding frequency band according to the characteristics of the analyzed signal to match the signal spectrum. Therefore, the wavelet packet decomposition method is used to decompose the original vibration signal into two-dimensional signals in time domain and frequency domain. For nonstationary signals such as motor vibration signals, wavelet packet decomposition can effectively represent the energy information in different frequency bands and different times and can reflect the rapid change of information in each frequency band in a short time. The process of wavelet packet decomposition is shown in Figure 1. The two-dimensional wavelet coefficient matrix obtained from the original signal decomposition is used as the input of the deep learning model.
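A minimal sketch of how a one-dimensional sample becomes a two-dimensional coefficient matrix is given below. It assumes the Haar wavelet and a hypothetical function name; the wavelet and decomposition depth actually used are not stated here. Unlike the plain wavelet transform, both the low-frequency and the high-frequency branch are split again at every level.

```python
import numpy as np


def haar_split(x):
    """Split a sequence into its Haar low-pass and high-pass halves."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def wavelet_packet_matrix(signal, levels=3):
    """Return a (2**levels, len(signal) // 2**levels) coefficient matrix.

    Each row is one frequency band and each column one time position.
    Rows are in natural (tree) order, not sorted by increasing frequency.
    """
    x = np.asarray(signal, dtype=float)
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    nodes = [x]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            low, high = haar_split(node)  # every node is decomposed again
            next_nodes.extend([low, high])
        nodes = next_nodes
    return np.stack(nodes)
```

Because the Haar filters are orthonormal, the total energy of the matrix equals the energy of the input sample, so the rows can be read as an energy distribution over frequency bands and time.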

Frequency Band Feature Fusion.
The vibration information of the motor contains multiple vibration frequencies, which are expressed in the wavelet packet decomposition coefficient matrix of the vibration signal. The amplitude in each frequency band reflects the working state of different structures, and the related information between different frequency bands can reveal the deep health status. Therefore, it is necessary to extract and fuse the band features in the signal and use these features to solve the problem of complex frequency components.
CNN is a neural network structure with local sensing ability. It scans the input data with convolution kernels formed by locally connected network nodes and extracts the detailed features of the data. CNN has been widely used in image recognition and other fields because of its powerful feature learning ability. The calculation process of CNN can be expressed as follows:
$$x_j^{l} = p\left(\sum_{i} x_i^{l-1} * d_{ij}^{l} + v_j^{l}\right),$$
where $x_j^{l}$ is the $j$th feature map output at layer $l$ and $x_i^{l-1}$ is the $i$th feature map of layer $(l-1)$. $d_{ij}^{l}$ is the convolution kernel connecting the $i$th input feature map with the $j$th output feature map. $v_j^{l}$ is the offset term. $*$ is the convolution operation. $p(\cdot)$ is the activation function. In this paper, the Rectified Linear Unit (ReLU) is adopted, and its function expression is as follows:
$$p(x) = \max(0, x).$$
In this paper, a one-dimensional CNN is used to fuse the wavelet packet characteristic matrix of the planetary gearbox vibration signal in the frequency domain. Through multiple one-dimensional convolution kernels, the feature information of signals in different frequency bands is fused into new features. Therefore, the model can adaptively learn the complex frequency information in the vibration signal of the planetary gearbox and reorganize the original frequency information into new features. Because the one-dimensional convolution kernel scans along the time axis of the wavelet packet characteristic matrix, the output data still contains the characteristics of the time dimension.
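The band fusion performed by a one-dimensional convolution layer can be illustrated with the sketch below. The function name, array layout, and kernel width are assumptions for illustration; as in most deep learning frameworks, the kernel is applied without flipping (cross-correlation).

```python
import numpy as np


def conv1d_band_fusion(features, kernels, bias):
    """Fuse frequency bands with 1-D kernels sliding along the time axis.

    features: (bands, time) wavelet packet coefficient matrix
    kernels:  (filters, bands, width), each kernel spans all bands
    bias:     (filters,)
    Returns an array of shape (filters, time - width + 1) after ReLU.
    """
    n_filters, n_bands, width = kernels.shape
    if features.shape[0] != n_bands:
        raise ValueError("kernel band count must match the input")
    steps = features.shape[1] - width + 1
    out = np.empty((n_filters, steps))
    for t in range(steps):
        window = features[:, t:t + width]  # all bands, `width` time steps
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU activation
```

Each output row mixes every input band into one new feature, while the column index still tracks time, which is why the output can be passed on to a recurrent layer.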

Time Series Feature Fusion.
In the vibration signal, in addition to the correlation characteristics between different frequency bands, the amplitude variation characteristics of each frequency band with time also reflect the health status of the planetary gearbox. However, CNN is mainly used to extract the local features of signals, and it is difficult to perceive the overall change features in long sequences. For the vibration signal of planetary gearbox, it is necessary to integrate the time dimension features to improve the recognition ability of its health state.
The gated recurrent unit (GRU) is an improved recurrent neural network designed for processing time series data. It can retain the change characteristics before and after each point of the series, so it can be used to integrate the characteristics of the data in the time dimension. The internal structure of GRU is shown in Figure 2.
Its calculation process is expressed in terms of the following quantities: $W \in \mathbb{R}^{d \times k}$, $V \in \mathbb{R}^{d \times d}$, and $b \in \mathbb{R}^{d}$ are trainable parameters. $\otimes$ is the element-wise product operation. $x_t$ and $h_t$ are the input and output vectors, respectively. $z_t$ and $r_t$ are the update and reset gate vectors, respectively. $\sigma_g$ and $\sigma_h$ are activation functions.
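For reference, one widely used formulation of the GRU that is consistent with these symbols is given below. It is an assumed standard form rather than a confirmed transcription: the subscripts $z$, $r$, and $h$ on the parameters are introduced for illustration, and some references swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
z_t = \sigma_g\left(W_z x_t + V_z h_{t-1} + b_z\right) \\
r_t = \sigma_g\left(W_r x_t + V_r h_{t-1} + b_r\right) \\
h_t = z_t \otimes h_{t-1} + (1 - z_t) \otimes
      \sigma_h\left(W_h x_t + V_h (r_t \otimes h_{t-1}) + b_h\right)
```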
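In this paper, Hard Sigmoid and Tanh are used as the activation functions $\sigma_g$ and $\sigma_h$, respectively. The Hard Sigmoid function is expressed as follows:
$$\sigma_g(x) = \begin{cases} 0, & x < -2.5, \\ 0.2x + 0.5, & -2.5 \le x \le 2.5, \\ 1, & x > 2.5. \end{cases}$$
The Tanh function is expressed as follows:
$$\sigma_h(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.$$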
The GRU unit receives a group of data at each time step, feeds back the state of the previous time step, and merges it with the current data as input. After the output is calculated, it is carried into the next calculation, and the cycle continues until the last time step, so as to obtain the overall change characteristics of the input data in the time dimension. Compared with the standard RNN structure, GRU can selectively learn and forget the features in time series by controlling the state of the update gate and the reset gate and has the ability of long-term memory. In order to further enhance the time feature fusion ability of the model, this paper adopts the bidirectional GRU structure, and its time series expansion is shown in Figure 3. As can be seen from Figure 3, bidirectional GRU can obtain the time series features before the current time through forward scanning and obtain the time series features after the current time through backward scanning. Compared with one-way GRU, the efficiency of feature fusion is greatly improved.
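The forward and backward scans can be sketched as follows. This is a minimal illustration, not the trained model: it assumes the common GRU update in which the new state is `z * h_prev + (1 - z) * candidate`, uses Hard Sigmoid for the gates and Tanh for the candidate state, and initializes parameters randomly; all names are hypothetical.

```python
import numpy as np


def hard_sigmoid(x):
    """Piecewise-linear sigmoid: 0 below -2.5, 1 above 2.5, 0.2x + 0.5 between."""
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)


def init_gru(input_dim, hidden_dim, rng):
    """Random (W, V, b) triples for the update, reset, and candidate paths."""
    def block():
        return (rng.normal(0.0, 0.1, (hidden_dim, input_dim)),
                rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)),
                np.zeros(hidden_dim))
    return {"z": block(), "r": block(), "h": block()}


def gru_scan(inputs, params):
    """Run a GRU over inputs of shape (time, input_dim); returns all states."""
    (Wz, Vz, bz), (Wr, Vr, br), (Wh, Vh, bh) = (
        params["z"], params["r"], params["h"])
    h = np.zeros(bz.shape[0])
    states = []
    for x in inputs:
        z = hard_sigmoid(Wz @ x + Vz @ h + bz)   # update gate
        r = hard_sigmoid(Wr @ x + Vr @ h + br)   # reset gate
        candidate = np.tanh(Wh @ x + Vh @ (r * h) + bh)
        h = z * h + (1.0 - z) * candidate        # assumed update convention
        states.append(h)
    return np.stack(states)


def bidirectional_gru(inputs, forward_params, backward_params):
    """Concatenate forward-scan and backward-scan states at each time step."""
    forward = gru_scan(inputs, forward_params)
    backward = gru_scan(inputs[::-1], backward_params)[::-1]
    return np.concatenate([forward, backward], axis=1)
```

At every time step the output row holds a summary of the past (first half) and of the future (second half), which is what the subsequent fusion stage operates on.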

Dynamic Weighted Fusion Based on Attention Mechanism. Under the influence of alternating working conditions, the vibration signal of the planetary gearbox has strong time-varying characteristics. Therefore, the signals collected at different times have different characteristics, and there are great differences between different samples, which will greatly reduce the generalization ability of the deep learning model.
The attention mechanism in deep learning is a bionic concept. It simulates the way humans focus on a specific part of an observed object and improves information acquisition by attending to different parts of the object information. Based on the deep learning model, an attention mechanism is added to dynamically fuse the output features of the bidirectional GRU layer at different times. In the attention structure of this paper, $p(\cdot)$ is the activation function. The hidden state sequence of the bidirectional GRU is fed into a fully connected layer, which produces a dynamic weight $\alpha_t$. $s_t$ is the feature vector after dynamic weighted fusion.
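One common way to realize this kind of dynamic weighting is sketched below. The details are assumptions, since the exact expressions are not reproduced here: a single fully connected unit with a Tanh activation scores each time step, a Softmax over time turns the scores into weights $\alpha_t$, and each hidden state is scaled by its weight. The function and parameter names are hypothetical.

```python
import numpy as np


def attention_fusion(hidden_states, weight, bias):
    """Weight the hidden states of each time step by a learned score.

    hidden_states: (time, features) output of the bidirectional GRU
    weight:        (features,) parameters of the scoring layer (assumed form)
    bias:          scalar
    Returns (alpha, weighted_states, fused_vector).
    """
    scores = np.tanh(hidden_states @ weight + bias)   # one score per time step
    scores = scores - scores.max()                    # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()     # weights sum to 1
    weighted = alpha[:, None] * hidden_states         # per-time weighted features
    return alpha, weighted, weighted.sum(axis=0)
```

With all scoring parameters at zero the weights are uniform and the fused vector reduces to a plain average over time; training moves the weights toward the time steps that are most informative for the fault class.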
The neural network outputs a weight $\alpha_t$ for the feature at every moment by learning from the input data itself; this weight represents the "attention" of the network, and its size represents the degree of focus. Since the vibration signals of the planetary gearbox have different characteristics at different times, the attention mechanism can help the neural network dynamically assign weights to the characteristics at different times. Thus, it can more effectively improve the utilization of useful information and reduce the sensitivity of the model to the input data. The fault diagnosis procedure consists of the following steps:
(1) Signal acquisition and truncation: the vibration sensor is used to collect the vibration signal of the planetary gearbox, and the collected signal is truncated to the same length to get the samples
(2) Sample wavelet packet decomposition: each one-dimensional vibration sample signal is decomposed by wavelet packet to obtain two-dimensional time-frequency diagram samples, which are divided into the training set and the test set
(3) Design and training of deep learning model: the training set data is used to train the model parameters, and the hyperparameters are adjusted during training to achieve better performance
(4) Fault diagnosis test: after the model is trained, the data of the test set is used for the fault diagnosis test to verify the diagnosis performance of the model
According to the above methods, the deep learning model is constructed. In this model, batch normalization normalizes the output data of the convolution layer. It makes the activations of neurons tend toward a normal distribution and avoids oversaturation. Therefore, it can alleviate the vanishing gradient problem to a certain extent, accelerate convergence, and improve the accuracy of the model.
Global average pooling is used to compress the dimension of the fused feature data while preserving the significance of the features. It can effectively reduce the number of network parameters, reduce the complexity of the network, and prevent overfitting. In this paper, the global average pooling operation is applied after structure fusion. It summarizes and reduces the dimension of the fused features to facilitate the subsequent classification and recognition.
Dropout is an effective way to reduce overfitting. In this process, some weights are discarded randomly when the network parameters are updated in each training step, and only the retained weights are updated during backpropagation. This can reduce the excessive dependence on some neurons, reduce the complex co-adaptation between neurons, and improve the generalization ability of the network. In this paper, dropout is applied between the fully connected layers; it only takes effect during training, not during testing.
Figure 4: The bearing test system (motor, torque sensor, bearing, and power meter).
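The train-only behaviour of this regularizer can be illustrated with the sketch below. It assumes the widely used "inverted dropout" formulation, which randomly zeroes layer outputs rather than individual weights and rescales the survivors so that no correction is needed at test time; the function name and drop rate are hypothetical.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Randomly zero a fraction `rate` of the values during training only."""
    if not training or rate == 0.0:
        return x  # testing: the layer is an identity
    keep = rng.random(x.shape) >= rate
    # Rescale the retained values so the expected output is unchanged.
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
features = np.ones((4, 8))
train_out = dropout(features, rate=0.5, training=True, rng=rng)
test_out = dropout(features, rate=0.5, training=False, rng=rng)
```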
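The final output of the deep learning model is passed through a Softmax classifier, which generates a probability distribution used to judge the fault type of the input data. Its mathematical representation is as follows:
$$y_j = \frac{e^{a_j}}{\sum_{k=1}^{M} e^{a_k}},$$
where $a_j$ is the $j$th input to the Softmax layer.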
In this paper, cross entropy is selected as the loss function to calculate the error between the calculated value and the real value of the deep learning model and to drive backpropagation to update the parameters. The loss function is expressed as follows:
$$L = -\sum_{j=1}^{M} t_j \log y_j,$$
where $M$ is the number of fault types, $t_j$ is the true sample label, and $y_j$ is the estimate output by Softmax.
The RMSProp optimization algorithm [16] was used as the optimizer for model training, and the parameters were iterated and updated in the process of backpropagation, which brings the model's predictions closer to the real values. RMSProp limits the oscillations of gradient descent during backpropagation, allowing the network to obtain more accurate results.
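The classifier, the loss, and one optimizer update can be put together in a short sketch. It is illustrative only: the function names, the learning rate, the decay factor, and the four-class example are assumptions, and the update is applied directly to the classifier inputs rather than to real network weights.

```python
import numpy as np


def softmax(logits):
    """Turn M raw scores into a probability distribution over fault types."""
    exp = np.exp(logits - np.max(logits))  # shift for numerical stability
    return exp / exp.sum()


def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -np.sum(one_hot * np.log(probs + eps))


def rmsprop_step(param, grad, cache, lr=0.1, decay=0.9, eps=1e-8):
    """One RMSProp update: scale the step by a running RMS of the gradient."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    return param - lr * grad / (np.sqrt(cache) + eps), cache


logits = np.zeros(4)                          # four fault types, no preference
label = np.array([0.0, 0.0, 1.0, 0.0])        # true class is the third type
loss_before = cross_entropy(label, softmax(logits))
grad = softmax(logits) - label                # gradient of the loss w.r.t. logits
logits, cache = rmsprop_step(logits, grad, np.zeros(4))
loss_after = cross_entropy(label, softmax(logits))
```

Dividing by the running root mean square gives every parameter a step of similar size regardless of the raw gradient magnitude, which is how the method damps oscillations.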

Experimental Results and Analysis
3.1. Feasibility Verification of the Proposed Algorithm. In this paper, experimental data from the Bearing Data Center of Case Western Reserve University [17] were used to test the effectiveness of the proposed method. Vibration signals were measured by the bearing test system, and the bearing test system is shown in Figure 4.
The bearing model is SKF6203, the sampling frequency is 1.2 kHz, and the rotating speed is 1797 r/min. There are four kinds of bearings: normal bearing, inner ring fault bearing with a damage diameter of 0.53 mm, rolling element fault bearing with a damage diameter of 0.53 mm, and outer ring fault bearing with a damage diameter of 0.53 mm (including center @ 6:00, orthogonal @ 3:00, and opposite @ 12:00). For each type of data, 120000 sample points are obtained, which means that the length of the time series data is at least 120000. If it were used directly as the input of a deep neural network, the training cost would be too high. Here, each vibration signal is divided into segments of 300 sampling points, and a total of 2400 samples are obtained. Figure 5 shows examples of the six types of vibration data.
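The sample count follows directly from the segmentation: 120000 points divided into segments of 300 points gives 400 samples per signal, and six signal classes (normal, inner ring, rolling element, and the three outer ring positions) give 2400 samples. A minimal sketch of the segmentation, with a hypothetical function name and non-overlapping segments assumed:

```python
import numpy as np


def segment_signal(signal, segment_length=300):
    """Cut a 1-D signal into non-overlapping segments of equal length."""
    signal = np.asarray(signal)
    n_segments = len(signal) // segment_length
    # Any leftover points that do not fill a whole segment are dropped.
    return signal[: n_segments * segment_length].reshape(n_segments, segment_length)


one_class = segment_signal(np.zeros(120000))   # placeholder signal
samples_per_class = one_class.shape[0]
total_samples = 6 * samples_per_class          # assuming six signal classes
```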
In order to verify the performance of the model proposed in this paper, based on the same dataset, it is compared with the AlexNet [18], GoogleNet [19], and ResNet [20] deep learning methods that have been proposed at present.
A total of 400 groups were selected as the test data, and 500, 1000, 1500, and 2000 samples were selected for training. The algorithm proposed in this paper was used to carry out diagnostic tests, and the comparison of diagnosis rate is shown in Figure 6.
As can be seen from Figure 6, with the increase of the number of training samples, the accuracy rate of the proposed algorithm is gradually increased.

Actual Experimental Results and Analysis. Field diagnosis tests are carried out on motor bearings, in order to further verify the effectiveness of the improved random forest algorithm and verify whether the diagnosis method proposed in this paper is accurate when the bearing fault signal is weak. The motor model is SEW-DV132M4, the speed is 1430 r/min, the sampling frequency is 1 kHz, the bearing model is SKF6209-2Z, and the vibration data collector is COMMTEST-VB7. There are 4 kinds of bearings: normal, inner ring fault (damage diameter of 10 μm), rolling body fault (damage diameter of 18 μm), and outer ring fault (damage diameter of 9 μm). The number of diagnostic samples was 1000, and the number of collected samples was 2000.
In order to reduce the influence of unexpected factors, the training set and the test set are divided randomly 10 times. Each split was used for independent training and testing, and the experimental results were recorded separately. The results of the 10 experiments are used as performance evaluation indexes for comparative verification. The specific results are shown in Figure 7. The average accuracy and standard deviation are shown in Table 1. The accuracy of the AlexNet model is poor. Because the model simply stacks multiple convolution layers, it is difficult for it to effectively extract complex data features. The GoogleNet model adds convolution kernels of different sizes in each convolution layer, which gives it a certain frequency feature extraction ability, and its accuracy is 1.21% higher than that of AlexNet. ResNet adds a residual structure on the basis of GoogleNet and realizes cross-layer connection of convolution layers, and its accuracy is 1.27% higher than that of GoogleNet. However, because they rely on a single network structure for feature extraction, the above models cannot make full use of the information contained in the signal, so the overall accuracy is low. The method proposed in this paper fully combines the time-frequency characteristics of the signal and introduces the attention mechanism of dynamic weighting. Therefore, in each test, its accuracy is higher than that of the other deep learning models. The above results show that, compared with other deep learning fault diagnosis methods, the proposed fault diagnosis method has obvious advantages [21][22][23][24].
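The repeated random split protocol can be sketched as follows. This is an illustration of the evaluation procedure only: `evaluate` is a hypothetical callback that would train a model on the training indices and return its accuracy on the test indices, and no results are implied.

```python
import numpy as np


def repeated_random_split_accuracy(n_samples, n_test, evaluate,
                                   repeats=10, seed=0):
    """Mean and standard deviation of accuracy over random train/test splits."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(repeats):
        order = rng.permutation(n_samples)           # new random division
        test_idx, train_idx = order[:n_test], order[n_test:]
        accuracies.append(evaluate(train_idx, test_idx))
    accuracies = np.asarray(accuracies, dtype=float)
    return accuracies.mean(), accuracies.std()
```

Reporting the standard deviation alongside the mean shows whether a difference between two models is larger than the variation caused by the split itself.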

Conclusion
In order to improve the accuracy of motor fault diagnosis, a motor fault diagnosis algorithm based on wavelet and attention mechanism is proposed. The wavelet denoising method is used to remove the noise from the original signal and improve the signal-to-noise ratio. A convolutional neural network and a bidirectional gated recurrent unit are used to fuse time-frequency information. The attention mechanism module is used to dynamically weight the features of different time points in the time dimension. In the experimental part, the feasibility of the algorithm is verified on the dataset of the Case Western Reserve University Bearing Data Center, and a field diagnostic test is used for comparison. Experimental results show that, compared with other deep learning methods, the proposed fault diagnosis method has obvious advantages.

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare no competing interests.