A Data Fusion Fault Diagnosis Method Based on LSTM and DWT for Satellite Reaction Flywheel

This paper presents a novel fault diagnosis method based on data fusion for a reaction flywheel of the satellite attitude system. Different from most traditional fault diagnosis techniques, the proposed solution accomplishes fault detection and identification simultaneously within parallel fusion blocks. The core of this method is the independent fusion block, which uses a generalized ordered weighted average (GOWA) operator to combine the complementary characteristics of the outputs of a long short-term memory (LSTM) neural network and the discrete wavelet transform (DWT), so as to enhance the reliability and rapidity of decision-making. Moreover, minibatch normalization is adopted to address the problem of covariate shift, realize adaptive processing of the dynamic information in the original data, and improve the convergence speed of the network. With a high-fidelity model of the reaction flywheel, three common faults are injected separately to collect experimental data. Extensive experimental results show the efficacy of the proposed method and the excellent performance achieved by combining LSTM and DWT.


Introduction
In recent years, the aerospace industry has become interested in preventive maintenance systems. A technological shift is ongoing in system monitoring from traditional fixed interval maintenance (FIM) to condition-based maintenance (CBM) [1,2]. Compared with the former, CBM eliminates unnecessary scheduled preventive maintenance, reduces maintenance cost, and improves the safety and reliability of the system. CBM includes a variety of technologies, such as performance monitoring and fault diagnosis, which ultimately enable the system to detect faulty components and take appropriate measures [3].
Fault diagnosis, whose main task is to detect the fault occurrence time and identify the type of fault, has received wide attention in CBM. Then, the diagnosis information is used to upgrade the maintenance operations from FIM to CBM [4,5]. Nowadays, intelligent technology represented by neural networks further improves CBM, and thus many intelligent fault diagnosis methods have been studied. Generally, neural network-based diagnosis methods can be divided into two categories. The first designs an observer or estimator, or fits the system function, by exploiting the nonlinear fitting ability of the network. Xin et al. [6] used a single hidden layer feedforward wavelet neural network to design an adaptive observer that estimates the state of the system, whose residual serves as the fault detection signal. Inspired by the concept of fault parameters, some researchers proposed single fault detection methods based on a neural network fault parameter estimator [7,8]. On this basis, Sobhani-Tehrani et al. [9] designed a new fault diagnosis structure, which solved the problem of detection and isolation of multiple faults in the system by paralleling multiple parameter estimators. Witczak et al. [10] proposed an H∞ observer based on the state-space recurrent neural network (RNN). The approach guarantees a predefined disturbance attenuation level and convergence of the observer, as well as unknown input decoupling and state and actuator fault estimation. The other category uses the feature extraction ability of neural networks to detect and classify faults. The diagnosis procedure generally includes three steps, i.e., data collection, artificial feature extraction, and health-state recognition [11,12]. A novel health monitoring system for a variable air volume unit is developed in [13]. After fault features for various fault types are generated via fuzzy logic, an artificial neural network classification technique is applied to fault classification. Sobie et al. [14] used training data obtained from high-resolution simulations of roller bearing dynamics to train machine learning algorithms. Although simulated data cannot replace experimental data, they provide a stronger starting point for novel applications to further improve fault classifier performance. Barakat et al. [15] introduced the growing neural network to construct a diagnosis model for motor bearings, which obtained higher diagnosis accuracy on a large amount of data than the conventional RBFN and the probabilistic neural network. A new approach named hybrid gradient boosting is proposed in [16] for fault detection in robotic arms. The method is based on the logistic regression model, and random forests, neural networks with machine learning algorithms, and xgboost models are used to boost the baseline model.
At present, there is an end-to-end trend in the field of deep learning, which combines feature extraction and classifier design into a single neural network. The scheme overcomes the shortcoming of traditional intelligent diagnosis methods that need manual intervention. The motivation of this idea is that a neural network can automatically learn both the features of the original data and the classifier so as to better adapt to fault diagnosis and improve performance [17][18][19]. To deal with the dynamic information of time series, recurrent neural networks, such as the long short-term memory (LSTM) neural network and its variants, have shown excellent performance, especially in complex time series problems including signal processing, fault classification, and price prediction [20,21]. By adjusting the inputs and outputs of the LSTM cell with nonlinear gate units, LSTM can learn the dynamic information of the time series adaptively. However, it should be noted that the accuracy of this kind of method depends on the types of data available and on prior data of the system.
With the development of sensor technology, the types of system data collected are more abundant, which promotes the development of data fusion technology; data fusion has been widely applied in different fields [22][23][24]. It can use the complementarity between data sets to improve the accuracy of the decision process. In addition, the cooperative use of overlapping data reduces the uncertainty in the system and leads to more reliable results [25,26]. Ordered weighted average (OWA) and its variants, as a robust tool to aggregate data from various sources, have become a representative data fusion method [27]. The technology is now also gradually being applied in the field of fault diagnosis and prognosis to improve the ability of feature extraction and fault decision [28]. Rezamand et al. [29] proposed a novel real-time failure prognosis method for wind turbine bearings in which the OWA operator is applied to combine information obtained from various single features to provide relatively accurate predictions. A new type of fuzzy Petri nets (FPNs) was introduced in [30], which overcomes the deficiencies of traditional FPNs by using intuitionistic fuzzy sets (IFSs) and intuitionistic fuzzy ordered weighted average (IFOWA) operators. This makes the model include a wide range of particular cases so that it can effectively handle the various uncertainties in knowledge acquisition and representation. Sánchez-Fernández et al. [31] achieved fault identification based on a scored ranking at two time points: early fault and steady fault. In each case, the OWA linguistic operator based on the regular increasing monotone (RIM) function can find the variables that are responsible for the fault. However, the specific application value of OWA still needs further study.
The reaction flywheel (RW), as the actuator of the satellite attitude control system (ACS), is an important guarantee for the successful implementation of a space mission. The RW consists of several highly nonlinear internal and external circuits. Generally, only the feedback control signal and attitude sensor measurements can be used for monitoring. Therefore, the fault diagnosis and health monitoring of the RW is very challenging.
In this paper, parallel fusion blocks are used to detect and identify three common faults of the RW: motor current reduction, bus voltage insufficient, and work temperature over-high. In each block, a data fusion method based on LSTM and DWT is designed. The main motivation for integrating LSTM and DWT is to improve the performance of fault diagnosis. The main contributions of this paper are as follows:
(1) The parallel structure of fusion blocks reduces the computational complexity, which allows the network to achieve higher accuracy and speed in identifying the initial fault.
(2) The LSTM can adaptively learn the dynamic information of sequential data. The minibatch normalization method is applied to solve the problem of covariate shift and improve the convergence of LSTM.
(3) The output information of LSTM and DWT is complemented by the GOWA operator, which overcomes the shortage of data types and improves the real-time detection ability and identification accuracy of the fault diagnosis system.

Minibatch Normalization-Based Vanilla LSTM
Vanilla LSTM, an improved form of LSTM, is most commonly used in sequential data prediction and classification [32]. The internal structure of the vanilla LSTM is illustrated in Figure 1.
In Figure 1, $c_{t-1}$ and $y_{t-1}$ represent the prior cell state and output, while $c_t$ and $y_t$ represent the current cell state and output. $z_t$, $z^i_t$, $z^f_t$, and $z^o_t$ are the cell input, input gate, forget gate, and output gate, respectively. σ and h denote the sigmoid function and tanh function. It can be seen from the figure that the vanilla LSTM has an architecture similar to the RNN, but the former is based on a set of connected cells. Unlike the RNN cell, which overwrites the information directly, each cell of the vanilla LSTM contains a cell input, three gates, and a cell output. The cell output is recurrently connected back to the cell input and all the gates. This special design allows the vanilla LSTM to robustly remove or add information over long periods of time.

Minibatch Normalization.
In a deep neural network, a change of parameters in one layer usually has a serious impact on the input distribution of subsequent layers. We refer to this phenomenon as covariate shift, which can be addressed by normalizing layer inputs [33].
Since the full whitening of each layer's input is costly and not differentiable everywhere, we make two necessary simplifications in this paper. The first is that we normalize each scalar feature independently instead of jointly whitening the features in layer inputs and outputs. For a layer with n-dimensional input $x = (x_1, \ldots, x_n)$, each dimension is normalized as

$$\hat{x}_k = \frac{x_k - \mathrm{E}[x_k]}{\sqrt{\mathrm{Var}[x_k]}},$$

where the expectation and variance are calculated over the training data set. Note that simply normalizing each input to a layer may change what the layer can represent. To solve this problem, we introduce a pair of parameters $\rho_k$ and $\eta_k$ for each activation $x_k$, which determine the mean and standard deviation of the normalized features:

$$y_k = \eta_k \hat{x}_k + \rho_k.$$

When using stochastic optimization, normalizing activations over the entire training set is impractical. Thus, the second simplification is that each minibatch produces estimates of the mean and variance of each activation. In this way, the statistics used for normalization can fully participate in the gradient backpropagation.
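Consider a minibatch ϑ of size m, $\vartheta = \{x_{1 \ldots m}\}$. Let the normalized values be $\hat{x}_{1 \ldots m}$ and their linear transformations be $y_{1 \ldots m}$; we refer to the transform $x_{1 \ldots m} \to y_{1 \ldots m}$ as the minibatch normalization transform. The minibatch normalization transform is presented in Algorithm 1.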
In Algorithm 1, ε is a regularization parameter added to the minibatch variance for numerical stability.
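As a minimal sketch of the transform described above (minibatch mean and variance, normalization with ε, then scaling by η and shifting by ρ), the following pure-Python function handles one scalar activation over a minibatch. The function name and the default value of ε are illustrative assumptions, not values taken from the paper.

```python
import math

def minibatch_normalize(batch, eta=1.0, rho=0.0, eps=1e-5):
    """Normalize one scalar activation over a minibatch, then scale and shift.

    batch: list of floats x_1..x_m.
    eta, rho: learned scale and shift parameters.
    eps: assumed small constant added to the variance for numerical stability.
    """
    m = len(batch)
    mean = sum(batch) / m
    var = sum((x - mean) ** 2 for x in batch) / m  # biased minibatch variance
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in batch]
    return [eta * xh + rho for xh in x_hat]

# Example: a minibatch of four activations with default eta and rho.
y = minibatch_normalize([1.0, 2.0, 3.0, 4.0])
```

With η = 1 and ρ = 0 the output has approximately zero mean and unit variance; other values of η and ρ let the layer recover a different scale and offset.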

Forward Pass.
Let $x_t$ be the input at time t, and let M and N be the number of inputs and LSTM cells, respectively. According to the architecture of the vanilla LSTM, the equations for the layer forward pass are given in (5), where $W_{(\cdot)} \in \mathbb{R}^{N \times M}$, $R_{(\cdot)} \in \mathbb{R}^{N \times N}$, $P_{(\cdot)} \in \mathbb{R}^{N}$, and $b_{(\cdot)} \in \mathbb{R}^{N}$ represent the input weights, recurrent weights, peephole weights, and bias weights, respectively. σ and h represent the sigmoid function and tanh function, ⊙ denotes elementwise multiplication, and the initial states $c_0$ and $y_0$ are parameters of the network. Minibatch normalization is then applied to the vanilla LSTM. To avoid unnecessary redundancy and overfitting, ρ = 0 is set in minibatch normalization. The forward pass of the minibatch normalization-based vanilla LSTM (hereinafter referred to as LSTM) follows from Algorithm 1 and (5). Because the training data are standardized before training, there is no need to normalize the input term $W_{(\cdot)} x_t$.
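For reference, the forward pass of a vanilla LSTM with peephole connections is commonly written as below. This is a sketch in the notation of this section, assuming the standard peephole formulation; the barred pre-activation symbols are introduced here only for illustration, and the minibatch-normalized variant is not shown.

```latex
\begin{aligned}
\bar{z}_t   &= W_z x_t + R_z y_{t-1} + b_z,                       & z_t   &= h(\bar{z}_t), \\
\bar{z}^i_t &= W_i x_t + R_i y_{t-1} + P_i \odot c_{t-1} + b_i,   & z^i_t &= \sigma(\bar{z}^i_t), \\
\bar{z}^f_t &= W_f x_t + R_f y_{t-1} + P_f \odot c_{t-1} + b_f,   & z^f_t &= \sigma(\bar{z}^f_t), \\
c_t         &= z_t \odot z^i_t + c_{t-1} \odot z^f_t, \\
\bar{z}^o_t &= W_o x_t + R_o y_{t-1} + P_o \odot c_t + b_o,       & z^o_t &= \sigma(\bar{z}^o_t), \\
y_t         &= h(c_t) \odot z^o_t.
\end{aligned}
```

Note that the input and forget gates look at the previous cell state $c_{t-1}$ through the peephole weights, while the output gate looks at the updated state $c_t$.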

Backpropagation through Time.
The deltas inside the LSTM cell are calculated as

Mathematical Problems in Engineering
where Δ denotes the delta passed down from the upper layer.
The input deltas are needed only when there is a layer below that is to be trained. The gradients for the weights are then computed from these deltas; in the weight gradients, (·) denotes any of $z$, $z^i$, $z^f$, $z^o$, and ⟨·, ·⟩ denotes the outer product of two vectors.

The Proposed Method for Fault Diagnosis
In this section, a data fusion fault diagnosis method for the actuator of the satellite attitude system is proposed. The main purpose is to design a fault diagnosis system to detect and identify faults in the RW. Timely fault diagnosis not only gives the satellite enough time to take appropriate measures before the fault develops further but also provides fault information for predicting the remaining service life of the system. The following content defines the fault set of the actuator of the satellite attitude system and introduces the detailed design process.

Fault Set of the Reaction Flywheel.
The RW considered in this paper is the ITHACO "type A" reaction flywheel manufactured by Goodrich Corporation. A high-fidelity nonlinear model of the RW can be obtained from [34] and has been integrated into the ACS dynamics.
Three identical RWs are used in a three-axis stabilized satellite (regardless of the often redundant fourth RW), and each RW has a dedicated fault diagnosis block. Since the three RWs are identical, only the results of the fault diagnosis block for the pitch axis are studied. The training data set can be obtained from the closed-loop ACS simulation of the three-axis stabilized satellite. The moments of inertia of the three axes are ..., and $I_z = 440\ \mathrm{kg \cdot m^2}$. The initial angle of the satellite is 5, and the running time is 100 seconds.
In this paper, three types of common faults in the RW are considered: motor current reduction, bus voltage insufficient, and work temperature over-high. Generally, current and voltage faults are transient, while the temperature fault accumulates slowly. Thus, the motor current reduction fault is modeled and injected as variations in the motor torque gain $k_t$. The bus voltage insufficient fault is modeled and injected as drops in the voltage of the power bus $V_{bus}$. The work temperature over-high fault is modeled and injected as a slope function added to the standard temperature T. In other words, $k_t + \Delta I$, $V_{bus} + \Delta V$, and $T + \Delta T$ are used to replace $k_t$, $V_{bus}$, and $T$, where ΔI, ΔV, and ΔT are defined as the fault parameters, collected in Table 1. Different fault parameters are injected into the RW separately to obtain the satellite pitch angle and feedback control signal. These measurements are very precise and are little affected by noise. Therefore, we add only 1% white noise to these measurements to mimic more realistic conditions.

The Data Fusion Fault Diagnosis Method.
As Figure 2 illustrates, a parallel fault diagnosis framework is designed to detect and identify the fault. Figure 2 indicates that three parallel fusion blocks are responsible for the diagnosis of motor current reduction, bus voltage insufficient, and work temperature over-high, respectively. In each block, the fusion method based on LSTM and DWT is used for fault detection and identification.

The Neural Network.
Neural network is an effective method to estimate the complex nonlinear function, which is widely used in the field of fault diagnosis. It constructs a mapping between input and output of the system by the available data set [35]. In this paper, the satellite pitch angle Y(t) and feedback control signal U(t) are selected as inputs of neural network.
Algorithm 1: Minibatch normalization transform. Input: $\vartheta = \{x_{1 \ldots m}\}$; parameters to be learned ρ, η.
With the proposed LSTM, the diagnosis of fault data is straightforward. The offline modeling and online monitoring flowcharts are shown in Figure 3. The procedures of offline modeling and online monitoring are as follows:
Offline modeling:
(1) Collect data of different fault types as training data.
(2) Normalize each feature of the training data.
(3) Train the three parallel LSTMs with Adam.
(4) Calculate the loss function J. If J > e and the number of iterations l < I_max, go to (3), where e is a small positive number and I_max is the maximum number of iterations.
(5) Output the parameters of the neural network.
Online monitoring:
(1) Sample new raw data as testing data.
(2) Normalize the testing data.
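The stopping rule in steps (3) and (4) of the offline modeling procedure can be sketched as a simple loop. The callable `step` is a hypothetical stand-in for one training iteration of the network (for example, one round of Adam updates) that returns the loss J; the toy step used here only halves a number and is not a real training routine.

```python
def train_until_converged(step, e=1e-3, i_max=50):
    """Repeat training until the loss J <= e or i_max iterations are reached.

    step: hypothetical callable that runs one training iteration and
          returns the current loss J.
    Returns (final loss, number of iterations run).
    """
    l = 0
    loss = float("inf")
    while loss > e and l < i_max:
        loss = step()  # step (3): train, then step (4): evaluate J
        l += 1
    return loss, l

# Toy stand-in for a real training step: the loss halves on every call.
state = {"J": 1.0}

def toy_step():
    state["J"] *= 0.5
    return state["J"]

final_loss, iterations = train_until_converged(toy_step, e=1e-3, i_max=50)
```

The loop ends either because the loss has fallen to the tolerance e or because the iteration budget I_max is exhausted, whichever comes first.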

The DWT Method.
Wavelet transform can explore the local characteristics of signals and analyze signals with different time and frequency resolutions [36]. DWT discretizes the scale and translation of the wavelet transform, and it can be used for adaptive time-frequency analysis of nonstationary signals. Moreover, DWT can capture both frequency and location information of the signal. Therefore, DWT is an excellent tool for fault diagnosis.
Consider a signal $f(t) \in L^2(\mathbb{R})$, which can be constructed by linear combinations of scaling functions and orthogonal wavelets as follows:

$$f(t) = \sum_{i} a_{0,i}\,\theta_{0,i}(t) + \sum_{m}\sum_{i} d_{m,i}\,\varphi_{m,i}(t),$$

where θ(t) and φ(t) represent the scaling functions and orthogonal wavelets, respectively, and m and i represent the dilation and translation factors, respectively. The approximation coefficients $a_{0,i}$ and detail coefficients $d_{m,i}$ can be computed by

$$a_{0,i} = \int f(t)\,\theta_{0,i}(t)\,dt, \qquad d_{m,i} = \int f(t)\,\varphi_{m,i}(t)\,dt.$$

As shown in Figures 4 and 5, the value of the detail coefficient jumps when a fault occurs in the system, and the amplitudes differ for different faults. Therefore, the fault type can be identified by analyzing these coefficients.
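To illustrate why an abrupt change shows up in the detail coefficients, the following sketch computes one level of a Haar DWT in pure Python. Haar is used here only to keep the example self-contained; it is a stand-in, not the db4 wavelet selected in the paper, and the toy signal is hypothetical.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients.

    Assumes len(signal) is even. Each output coefficient is computed from
    one non-overlapping pair of neighbouring samples.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[k] + signal[k + 1]) for k in range(0, len(signal), 2)]
    detail = [s * (signal[k] - signal[k + 1]) for k in range(0, len(signal), 2)]
    return approx, detail

# Toy signal: constant at 1.0, then an abrupt drop to 0.2 from sample 5 on.
signal = [1.0] * 5 + [0.2] * 5
approx, detail = haar_dwt(signal)

# Index of the largest detail coefficient, i.e. where the jump is located.
jump_index = max(range(len(detail)), key=lambda k: abs(detail[k]))
```

The detail coefficients are zero wherever the signal is flat and nonzero only for the pair of samples that straddles the drop, which is the behaviour the fault detection relies on.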

The Fusion Decision Based on GOWA.
The GOWA operator can aggregate data more effectively and sensitively by adding an additional parameter to OWA, so it is considered a tool for multi-information decision-making [37][38][39]. Because the attitude data are preprocessed by LSTM and DWT, the fusion method based on GOWA, unlike other OWA diagnostic methods, does not require the design of a complex operator such as one using intuitionistic fuzzy sets, a linguistic operator, or induced continuous OWA, and a simple decision process suffices to achieve accurate diagnosis. Figure 2 shows the internal diagram of the fault diagnosis system based on the data fusion method. Three parallel fusion blocks are used to identify different fault types. In each block, the GOWA operator is used to integrate the decision outputs of LSTM and DWT into a unique framework. The fusion outputs of the blocks are denoted $O^I_{GOWA}(t)$, $O^V_{GOWA}(t)$, and $O^T_{GOWA}(t)$, which are the outputs of the GOWA operators for motor current reduction, bus voltage insufficient, and work temperature over-high, respectively. $O_{DWT}(t)$ is the output of the DWT.
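A minimal sketch of a GOWA aggregation follows, assuming the standard definition in which the arguments are sorted in descending order and the weights attach to the ordered positions. Whether the weights in the paper attach to positions or directly to the LSTM and DWT outputs is an assumption here, and the input values, weights, and λ below are hypothetical.

```python
def gowa(values, weights, lam=1.0):
    """Generalized OWA: (sum_j w_j * b_j**lam) ** (1/lam).

    b_j is the j-th largest element of `values`.
    Assumes non-negative values, weights in [0, 1] that sum to 1, and lam > 0.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0 or w > 1 for w in weights):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    ordered = sorted(values, reverse=True)  # reorder arguments, largest first
    return sum(w * b ** lam for w, b in zip(weights, ordered)) ** (1.0 / lam)

# Hypothetical block outputs at one time step: LSTM gives 0.9, DWT gives 1.0.
fused = gowa([0.9, 1.0], weights=[0.6, 0.4], lam=2.0)
```

With λ = 1 the operator reduces to the ordinary OWA, and putting all weight on the first position returns the maximum of the inputs.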

Simulation Analysis
In this section, the effectiveness of the proposed fault diagnosis method for RW of the satellite attitude system is verified by simulation experiments.

Fault Scenarios.
This paper considers three common faults in the ACS: motor current reduction, bus voltage insufficient, and work temperature over-high. Table 1 presents the health value range of these faults, so we can obtain fault data by injecting different faults into the simulation model.
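The injection scheme (an additive fault parameter on $k_t$ and $V_{bus}$, a slope on T, and 1% white noise on the measurements) can be sketched as follows. The nominal values, the onset time of the temperature fault, and the reading of "1% white noise" as Gaussian noise with a standard deviation of 1% of the signal magnitude are all assumptions made for illustration; only the fault sizes 0.008, −2, and 0.4 appear in the paper.

```python
import random

def inject_step(nominal, delta, t, t_fault):
    """Abrupt fault: the parameter becomes nominal + delta from t_fault on."""
    return nominal + delta if t >= t_fault else nominal

def inject_ramp(nominal, slope, t, t_fault):
    """Slowly accumulating fault: a ramp of the given slope starts at t_fault."""
    return nominal + slope * max(0.0, t - t_fault)

def add_white_noise(value, level=0.01, rng=random):
    """Add zero-mean Gaussian noise with std = level * |value| (assumed model)."""
    return value + rng.gauss(0.0, level * abs(value))

# Hypothetical nominal values; only the fault sizes come from the text.
K_T, V_BUS, T_NOM = 0.029, 8.0, 20.0

k_t = inject_step(K_T, 0.008, t=60.0, t_fault=50.0)    # motor current fault
v_bus = inject_step(V_BUS, -2.0, t=30.0, t_fault=20.0)  # bus voltage fault
temp = inject_ramp(T_NOM, 0.4, t=30.0, t_fault=20.0)    # onset time assumed
```

The step functions model the transient current and voltage faults, while the ramp models the slow accumulation of the temperature fault.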

Motor Current Fault.
The motor has a torque constant $k_t$, which delivers a torque proportional to the driving current, i.e., $k_t = \tau_m / I_m$. Therefore, $k_t$ can be used to reflect changes in motor current. When $k_t$ drops outside a certain range, the RW cannot provide sufficient control torque, which leads to a fault of the ACS.

Bus Voltage Fault.
The bus voltage $V_{bus}$ needs to be set high enough to avoid an insufficient voltage margin. When the bus voltage is insufficient, the EMF of the motor will rise so that the maximum torque that the motor can provide decreases. Eventually, it affects the stability of the satellite attitude.

Temperature Fault.
Viscous friction $\tau_v$ is generated by the bearing lubricant, and it is strongly sensitive to the temperature T:

$$\tau_v = (0.0049 - 0.00002(T + 30))\,\omega. \quad (15)$$

Note that it is the main friction in the RW. Therefore, when the temperature exceeds the threshold, the generated friction will reduce the control torque and result in failure.

Implementation of LSTM.
As shown in Figure 2, the pitch attitude Y(t) and the feedback control signal U(t) of the satellite are regarded as the inputs of the network with 5 time steps. The LSTM contains 12 memory cells, and the output of the memory cells is read through a fully connected layer with tanh activation to generate the network output.
In training, the minibatch size and the number of epochs are set to 30 and 50, respectively. The result in Figure 6 illustrates that the BN-based LSTM converges significantly faster than the baseline LSTM.
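The network input, the pitch attitude Y(t) and the control signal U(t) over 5 time steps, can be assembled as sliding windows. The sketch below assumes overlapping windows with a stride of one sample, which is not stated in the paper; the function name and sample values are hypothetical.

```python
def make_windows(pitch, control, steps=5):
    """Build overlapping input windows of `steps` consecutive (Y, U) pairs.

    pitch, control: equal-length lists of samples Y(t) and U(t).
    Returns a list of windows; window k covers samples k .. k + steps - 1.
    """
    if len(pitch) != len(control):
        raise ValueError("pitch and control must have the same length")
    pairs = list(zip(pitch, control))
    return [pairs[k:k + steps] for k in range(len(pairs) - steps + 1)]

# Seven hypothetical samples give three windows of five time steps each.
windows = make_windows([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                       [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
```

Each window is one input sequence for the LSTM, with two features per time step.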
Under operating conditions, any output value in [−0.5, 0.8] is regarded as health, and any other value in (0.8, 1.5] is regarded as fault. A value outside [−0.5, 1.5] is considered an unknown fault, and the output of the network is set to −2, indicating that an unknown fault is detected in the system.
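The decision rule above can be written as a small mapping from the raw network output to a label. The handling of the boundary value 0.8 (treated as health here) is an assumption, and the function name is hypothetical.

```python
def interpret_output(value):
    """Map a raw network output to a decision for one fusion block.

    Boundary handling is an assumption: 0.8 itself is treated as health.
    Returns a (label, reported_value) pair; unknown faults are reported as -2.
    """
    if -0.5 <= value <= 0.8:
        return "health", value
    if 0.8 < value <= 1.5:
        return "fault", value
    return "unknown", -2.0

decisions = [interpret_output(v) for v in (0.1, 1.0, 2.3)]
```

Reporting a fixed value of −2 for anything outside the two known ranges keeps unknown faults clearly separated from both the health and fault outputs.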
To illustrate the online working procedure of the blocks, it is assumed that a motor current fault has occurred. Table 2 illustrates the results, where the labels I, V, T, H, and U denote motor current reduction, bus voltage insufficient, work temperature over-high, health, and unknown types, respectively. Table 2 indicates that the error rate between the voltage fault and the temperature fault is higher than the others. This problem can be addressed by combining the outputs of LSTM and DWT.

Implementation of DWT.
As mentioned above, the rapidly changing detail coefficients in the event of a fault can be used to detect the fault. For example, consider a motor current fault with the parameter ΔI = 0.008 that occurs in the system at t = 50 s. The wavelet and level are selected to be db4 and 4, and Figures 4 and 5 show the behavior of the approximation and detail coefficients of the fault for the satellite attitude and control signal, respectively. A temperature fault changes very slowly in the process of temperature accumulation, and it is usually difficult to observe at the beginning of the fault. For motor current reduction and bus voltage insufficient, the sudden change of fault parameters leads to higher detail coefficients. This feature distinguishes them from work temperature over-high. So, the DWT identification mechanism is straightforward. The fault identification logic with DWT consists of one condition for the pitch angle and one for the control signal. If (15) and (16) are both satisfied, DWT outputs "1," indicating that the system identifies the fault as motor current reduction or bus voltage insufficient. Otherwise, the output is "0". However, DWT cannot detect the temperature fault.
Thus, this paper combines the DWT method with LSTM as an auxiliary fault diagnosis tool to improve the accuracy of the fault diagnosis system.
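The two-condition DWT logic can be sketched as below. The exact conditions are not reproduced in this section, so the sketch assumes that each one is a magnitude threshold on the detail coefficients of the corresponding signal; the threshold values and the function name are hypothetical.

```python
def dwt_decision(detail_pitch, detail_control, thr_pitch, thr_control):
    """Return 1 if both detail-coefficient conditions hold, else 0.

    Assumption: each condition is "largest absolute detail coefficient
    exceeds a threshold"; the thresholds are illustrative only.
    """
    pitch_hit = max(abs(d) for d in detail_pitch) > thr_pitch
    control_hit = max(abs(d) for d in detail_control) > thr_control
    return 1 if (pitch_hit and control_hit) else 0

# Hypothetical detail coefficients for the pitch angle and control signal.
abrupt = dwt_decision([0.0, 0.6, 0.1], [0.0, -0.3, 0.0], 0.5, 0.2)
slow = dwt_decision([0.0, 0.01, 0.02], [0.0, 0.01, 0.0], 0.5, 0.2)
```

An abrupt current or voltage fault produces large detail coefficients in both signals and yields 1, while a slowly drifting temperature fault stays below the thresholds and yields 0.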

Weights Determination of GOWA Operator.
The function of the GOWA operator is to fuse the outputs of LSTM and DWT to make the fault decision. The weight factors can be determined by optimizing a cost function in which W is a vector of weighting factors, $o^i_n$ is the output of the n-th block at the i-th iteration, and $T_i$ denotes the ideal output of the i-th iteration, where "0" represents health and "1" represents fault.
Since DWT is unable to identify the temperature fault, only LSTM is used to identify the temperature fault. Table 3 presents the weighting factors of the fusion method. The design parameters of LSTM and DWT are summarized in Table 4.
Figure 6: Comparison of convergence between baseline LSTM and BN-based vanilla LSTM for motor current reduction during training.
The confusion matrix of the fusion method is shown in Figure 7. In the confusion matrix, the rows represent the actual labels and the columns represent the predicted labels. The diagonal cells show where the actual and predicted labels match. The off-diagonal cells show the classification errors. Note that there are no test data for the unknown fault, so its default accuracy rate is 1. It can be seen from Figure 7 that the fusion method correctly separates 97% of the motor current reduction, 96% of the bus voltage insufficient, 94% of the work temperature over-high, and 96% of the health condition cases. Compared with traditional "feature extraction + classifier" approaches such as DPCA + SVM and DLDA + SVM, the proposed method has higher diagnostic accuracy for the different fault types. In addition, owing to its adaptive processing of dynamic information in the raw data, it also performs better than the fusion of DWT with a conventional single hidden layer feedforward network such as an MLP, which treats each sample independently during training and ignores the correlation between samples. At the same time, compared with the result of LSTM in Table 2, the proposed fusion method has a significantly lower error rate because it fuses the diagnostic information of DWT through GOWA. In summary, the discriminant power of the fusion method is larger than that of DPCA + SVM, DLDA + SVM, and DWT + MLP. Thus, the proposed fusion method should be more suitable and effective for fault identification.
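The way per-class accuracy is read off a confusion matrix (rows as actual labels, columns as predicted labels, and a default of 1 for a class without test data) can be sketched as follows. The counts below are hypothetical and are not the values behind Figure 7.

```python
def per_class_accuracy(matrix, labels):
    """Fraction of each actual class (row) that lands on the diagonal."""
    result = {}
    for r, label in enumerate(labels):
        total = sum(matrix[r])
        # A class with no test samples is assigned accuracy 1 by convention.
        result[label] = matrix[r][r] / total if total else 1.0
    return result

# Hypothetical counts: rows are actual labels, columns are predicted labels.
labels = ["I", "V", "T", "H", "U"]
matrix = [
    [48, 1, 0, 1, 0],
    [1, 47, 2, 0, 0],
    [0, 3, 46, 1, 0],
    [1, 0, 1, 48, 0],
    [0, 0, 0, 0, 0],
]
acc = per_class_accuracy(matrix, labels)
```

In this toy matrix the largest off-diagonal counts sit between V and T, the same kind of confusion between voltage and temperature faults that the fusion with DWT is meant to reduce.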
Furthermore, in order to evaluate the real-time performance of the fault diagnosis system, various faults consisting of motor current reduction, bus voltage insufficient, and work temperature over-high are considered. For the motor current fault, even though the fault parameter is a small value, the fault has a severe effect on the system. Figure 9 presents the fault diagnosis signals in the LSTM, DWT, and GOWA for motor current faults.
Due to the severity of the fault, all fault diagnosis blocks can detect the fault quickly. However, compared with the LSTM method, DWT and GOWA have faster response speed.

Bus Voltage Insufficient.
Consider a bus voltage fault with ΔV = −2 at t = 20 s. Figure 10 shows the satellite pitch attitude output and feedback control signal. Figure 11 presents the LSTM, DWT, and GOWA output signals under the bus voltage fault. Similarly, the detection speed of GOWA is faster than that of the LSTM method.

Work Temperature Over-High.
Consider a temperature fault with a slope of 0.4 in the system. Figure 12 illustrates the change of the satellite pitch attitude and feedback control signal in the case of the temperature fault. Figure 13 illustrates the fault signals in the LSTM, DWT, and GOWA as the operating temperature continues to rise.
It is worth noting that, because the fault changes the system state slowly, DWT alone cannot detect the occurrence of the temperature fault. Therefore, only the output of LSTM plays a decisive role in fault detection in this condition.

Conclusions
Due to the inherent nonlinearities of the RW and satellite attitude dynamics, as well as the impact of disturbances on the satellite, it is challenging to diagnose the RW effectively and accurately. In this paper, a fusion method based on LSTM and DWT is proposed to solve the problem of fault detection and identification of the actuator. Three common faults of the RW, namely motor current reduction, bus voltage insufficient, and work temperature over-high, are studied. Then, three parallel fusion blocks are developed to detect and identify these faults.
In each block, fault information from LSTM and DWT is fused by the GOWA operator, so fault types can be determined comprehensively. In addition, owing to the use of LSTM, the dynamic information of process data can be used adaptively to improve the reliability of the system. Compared with the DPCA + SVM, DLDA + SVM, and DWT + MLP algorithms, the fusion method achieves better fault identification accuracy. Moreover, compared with the single LSTM method, this method has better real-time fault detection ability and a more sensitive and faster response to faults.
Finally, many directions remain to be explored in fault diagnosis and prognosis. Some of them are as follows:
(1) In the process of fault detection, transfer learning can be used to learn the identified unknown fault to enhance network intelligence.
(2) After fault diagnosis, the prediction of the remaining useful life of the system can be further studied by using the time series prediction ability of LSTM.