Multidomain Feature Fusion Network for Fault Diagnosis of Rolling Machinery

Mechanical vibration constitutes a valuable cue for performing fault diagnosis as it is directly related to the transient regime of rolling machinery. This study establishes a multidomain feature fusion network (MFFN) to extract and fuse multidomain features through a novel multistream architecture. Three primary features are simultaneously extracted from the time, frequency, and time-frequency domains. Then, highly representative features are extracted via three convolutional branches in one- or two-dimensional spaces. A novel squeeze-connection-excitation (SCE) module is proposed to adaptively fuse features in the three domains. The advantage offered by the proposed method is that it can leverage cues from the raw vibration signal, resulting in accurate fault diagnosis. Experimental results comprehensively demonstrate and analyze the high accuracy and generalization achieved by this MFFN-based fault diagnosis method.


Introduction
Rolling machinery is a foundational element in industrial infrastructures. Machinery faults are the main factors that significantly affect equipment and production safety. Intelligent fault diagnosis of rolling machinery has been a topic of interest in studies concerning vibration-based health monitoring of mechanical systems [1]. Previously, fault diagnosis was realized through a combination of traditional signal processing methods, such as Fourier and wavelet transforms (WTs), and shallow learning techniques, such as support vector machine (SVM) [2] and Bayes classifiers [3]. In general, these methods are physically analyzable; however, they provide an inadequate representation of faults, which may result in a low diagnosis accuracy. This problem has motivated the development of deep learning-based methods, such as deep belief networks (DBNs) [4], stacked autoencoders (SAEs) [5], convolutional neural networks (CNNs) [6], and long short-term memory (LSTM) [7]. The high representability offered by deep learning methods significantly improves fault diagnosis accuracy.
Recently, multistream architectures have been used for fault diagnosis. In contrast to single-stream architectures, multistream architectures can represent faults in terms of multiple aspects; thus, they can achieve further enhancements in the representability of intrinsic characteristics of machinery faults. This property may further improve the performance of fault diagnosis methods. However, current multistream architectures primarily focus on the multiscale characteristic of raw vibration signals [8] and ignore the various physical properties observed in multiple domains. A novel multistream architecture that can extract and fuse multidomain features is desirable to facilitate accurate fault diagnosis.
This study proposes a novel multidomain feature fusion network (MFFN) for fault diagnosis. To this end, three one-dimensional (1D) and two-dimensional (2D) convolutional streams are designed and combined to construct the multistream architecture. Two 1D streams manage the data in the time and frequency domains, while a 2D stream extracts the time-frequency feature. At the backend joint, three representative features are fused by the squeeze-connection-excitation (SCE) module. Finally, the fused features are used for fault classification. The contributions of this study to fault diagnosis include the following:
(i) A novel multistream architecture that can process multidomain features in an organized and comprehensive manner
(ii) A novel SCE module that can adaptively fuse multidomain features
(iii) A novel feature type that offers high representability of fault patterns and improves the decision-making capabilities of fault diagnosis
The remainder of this paper is organized as follows. Section 2 reviews related works. The overall framework of MFFN is presented in Section 3. Section 4 describes the proposed MFFN-based fault recognition method. Experimental comparisons and analysis are discussed in Section 5. Section 6 provides the summary and conclusions.

Related Works
Various fault diagnosis methods have been proposed to classify faults in mechanical systems. These methods generally collect vibration signals as the source data because vibrations directly relate to the transient state of running elements. Various existing shallow learning models, such as the hidden Markov model [9], k-nearest neighbors [10], and SVMs [11], have been applied in fault classification. Recently, deep learning-based methods have demonstrated excellent performance in fault diagnosis. The advantage afforded by deep learning methods is the high representability of faults. For example, the DBN model [4], which is a typical probabilistic generative model, has been introduced to solve the problems of nonlinear dynamics and discrete failure patterns. However, experimental results have revealed that DBN architectures are susceptible to overfitting. The SAE method, which is a popular deep learning-based fault diagnosis method, can incrementally learn new samples without a retraining process [5].
Another key issue in fault diagnosis is feature extraction. Previously, temporal and frequency analyses were the two main approaches toward fault feature extraction [12]. However, they cannot represent the temporal variation of a vibration signal accurately [13]. This problem has been solved via methods including short-time Fourier transforms [13], Wigner-Ville distributions [14], and WTs [15]. Among these, WT is the most practical because its relaxed structure can decompose signals with varying temporal resolutions. Moreover, WT can produce 2D feature maps such that successful image classification methods can be transformed into fault diagnosis methods.
Deep learning-based fault diagnosis has attracted considerable attention recently [16]. The advantage of deep learning lies in its excellent ability to abstract signals by performing layer-wise nonlinear calculations, thereby enabling the deeper layer to generate more representative features.
This encourages the utilization of various deep learning methods in fault recognition. The DBN is one of the most widely used deep learning methods because it can adapt to a wide range of problems, including those of nonlinear dynamics and discrete failure patterns [17]. To leverage valuable cues for fault diagnosis, an adaptive spatiotemporal feature learning architecture with multiple measurements was proposed [18]. Subsequently, the generalization of deep learning architecture was considered. For example, a domain generalization-based hybrid diagnosis network was established, which could be deployed in unseen working conditions instead of in real-world working conditions [19]. Moreover, a domain adversarial transfer network has been evaluated for application in fault diagnosis, wherein a transfer learning mechanism can be implemented to enhance the generalization of deep learning-based fault diagnosis [20]. Recently, a novel convolutional neural network was established to diagnose faults from small samples. Based on domain adaptation, this method succeeded when vibration data were not available in abundance [21]. Unlike these previous strategies, our study aims to solve another problem in fault diagnosis: feature representation and fusion.
Other types of signals have also been introduced in fault diagnosis. For example, thermographic information has been utilized in fault diagnosis of ventilation in BLDC motors [22,23]. In contrast to the vibration signal, thermographic signals provide additional informative clues that help to increase the accuracy of fault diagnosis. Moreover, the thermographic signal is relatively simple compared with the vibration signal, such that it can better identify the fault types. However, the main drawback of the thermographic signal-based strategy is that it lags behind the actual machinery state. In practice, the temperature increases significantly only some time after the fault occurs. Alternatively, the acoustic signal has been investigated in the field of fault diagnosis [24]. The main advantage of the acoustic signal-based strategy lies in its noncontact measurement, which allows acoustic sensors to be deployed efficiently to diagnose faults. However, the acoustic signal is likely to be affected by environmental noise. As a result, noise removal is the main issue in acoustic signal processing.
This study aims to leverage valuable cues from multiple domains for fault diagnosis. To this end, a novel MFFN is proposed. This network comprises three streams that can comprehensively extract highly representative features in multiple domains, such as the temporal, frequency, and time-frequency domains.
The MFFN can obtain more valuable cues for fault diagnosis than those of current single-stream and multistream architectures. Moreover, the novelty of the proposed architecture lies in its ability to adaptively fuse 1D and 2D features using the SCE block.

MFFN.
To achieve high fault representability, this study proposes a novel multistream architecture for extracting and classifying three types of features in the temporal, time-frequency, and frequency domains. The block diagram of MFFN is shown in Figure 1. The sliding window block is applied to segment the vibration signal into sequence vectors at the first stream. Time-frequency features are extracted using WT at the second stream. The third stream extracts frequency features via fast Fourier transform (FFT). These primary features are subsequently enhanced in terms of their representability through layer-wise convolutional calculations. Finally, these highly representative features are adaptively fused via the SCE module. The backend classifier is established using two fully connected layers and a Softmax calculation block. The data used in this study were obtained from public datasets. The fault diagnosis platform consists of a motor, torque transducer/encoder, dynamometer, and control electronics [21]. These datasets were selected because they provide a baseline for fairly evaluating and comparing different methods.

Primary Feature Extraction.
The primary feature extraction process is shown in Figure 2. A sliding window is used for extracting temporal features, where $L$ denotes the window length and $M$ is the sliding step. The frequency spectrum is extracted via the following FFT:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1,$$
where $x(n)$ is the signal segment and $N$ denotes the length of the signal segmentation. A limitation of FFT is that it analyzes the frequency spectrum pattern of the vibration signal exclusively from a global perspective; therefore, it is not suitable for an amplitude-modulated or nonstationary signal.
This drawback can be addressed by the wavelet packet transform (WPT), which is a time-frequency analysis method that can analyze vibration signals with flexible temporal resolutions at both high and low frequencies [15]. Therefore, the WT is operated with a wavelet packet tree that decomposes a signal into several levels of wavelet packets. A three-layer wavelet packet tree is used in our method. As a result, eight sub-bands are obtained, and the energy value of each sub-band signal can be calculated through the following equation:
$$E_j^n = \int \left| C_j^n(t) \right|^2 dt = \sum_{k=1}^{r} \left| x_j^k \right|^2,$$
where $C_j^n(t)$ ($n = 3$, $j = 0, 1, \ldots, 7$) is the reconstructed sub-band signal, $N_i$ is the length of the reconstructed signal, and $x_j^k$ ($k = 1, 2, \ldots, r$) is the amplitude of the $j$th reconstructed signal. The energy spectrum feature of a sub-band signal can be presented through a normalized value, as shown in the following:
$$\bar{E}_j^n = \frac{E_j^n}{E_r},$$
where $E_r$ is the square root of the summed square values of the sub-band signal energy and is expressed as follows:
$$E_r = \sqrt{\sum_{j=0}^{7} \left| E_j^n \right|^2}.$$
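The three primary features described above can be illustrated with a minimal sketch that uses only the Python standard library. Several choices here are assumptions made for illustration and are not taken from the paper: the Haar wavelet (the paper does not name its wavelet basis), the direct DFT sum in place of an optimized FFT routine, the synthetic two-tone test signal, and all function names. The sub-band energies are computed from the packet coefficients; for an orthonormal wavelet such as Haar, this equals the energy of the corresponding reconstructed sub-band signal.

```python
import cmath
import math


def sliding_windows(signal, L, M):
    """Temporal stream: segments of length L taken every M samples."""
    return [signal[s:s + L] for s in range(0, len(signal) - L + 1, M)]


def dft_magnitude(segment):
    """Frequency stream: |X(k)| from the direct DFT sum (an FFT yields the same values)."""
    N = len(segment)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / N)
                    for n, x in enumerate(segment)))
            for k in range(N)]


def haar_split(x):
    """One Haar analysis step: (approximation, detail), each half the input length."""
    s = math.sqrt(2.0)
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return [(a + b) / s for a, b in pairs], [(a - b) / s for a, b in pairs]


def wavelet_packet_bands(x, levels=3):
    """Full packet tree: 2**levels coefficient lists, in natural (not frequency) order."""
    bands = [list(x)]
    for _ in range(levels):
        bands = [half for band in bands for half in haar_split(band)]
    return bands


def normalized_band_energy(x, levels=3):
    """Sub-band energies divided by E_r, the square root of their summed squares."""
    energies = [sum(c * c for c in band) for band in wavelet_packet_bands(x, levels)]
    e_r = math.sqrt(sum(e * e for e in energies))
    return [e / e_r for e in energies]


# Synthetic two-tone signal (illustrative values, not data from the paper).
signal = [math.sin(2 * math.pi * 4 * n / 32) + 0.5 * math.sin(2 * math.pi * 10 * n / 32)
          for n in range(128)]
segments = sliding_windows(signal, L=32, M=16)          # overlapping temporal segments
spectrum = dft_magnitude(segments[0])                   # magnitude spectrum of one segment
energy_feature = normalized_band_energy(segments[0])    # eight normalized sub-band energies
```

With a window length of 32 and a step of 16, consecutive segments overlap by half. The normalized energy vector has unit Euclidean norm by construction, which keeps the time-frequency feature comparable across segments of different amplitude.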

Extraction and Fusion of Highly Representative Features.
In the first and third streams, two 1D-CNNs are connected to the primary feature extractor to manage features in the temporal and frequency domains, while a 2D-CNN is connected to the WT module to process the feature in the time-frequency domain. These highly representative features are then fused at the backend joint by the SCE module. In general, there are four successive phases in the SCE module, namely, squeeze, connection, excitation, and reweight, as shown in Figure 3. The feature matrices of the three domains are the input to the SCE module. The temporal, frequency, and time-frequency feature matrices have sizes $L_1 \times C_1$, $L_2 \times C_2$, and $L_3 \times C_3$, respectively, where $L_1$, $L_2$, and $L_3$ are the feature dimensions and $C_1$, $C_2$, and $C_3$ are the numbers of feature channels.
Shock and Vibration

Squeeze.
A pooling operation is applied to squeeze the feature matrix. As a result, three 1D feature vectors, $l$, $m$, and $n$, are generated to represent the feature matrices of the three domains, with one element per feature channel.

Connection.
The feature vectors are fused by the following concatenation operation:
$$p = F_c(l, m, n) = [l_1, l_2, \ldots, l_{C_1}, m_1, m_2, \ldots, m_{C_2}, n_1, n_2, \ldots, n_{C_3}] = [p_1, p_2, \ldots, p_C].$$

Excitation.
Multilayer mapping is performed to achieve excitation, as follows:
$$q = F_e(p, W) = \sigma\left(W_2\, \delta(W_1 p)\right).$$
In the above equation, $\sigma$ is the sigmoid function, $\delta$ is the ReLU activation function, and $W$, $W_1$, and $W_2$ are the full-connection weights.

Reweight.
The learned weight is applied to the feature channels to generate the weighted feature for final classification; each feature channel is multiplied by the corresponding element of the channel descriptor $q$. The advantage of the reweight calculation is similar to that of the global average pooling operation in the squeeze process, which generates channel-wise statistics. Subsequently, this global information is embedded by the excitation process to generate the channel descriptor $q$, which comprehensively captures channel-wise dependencies. As a result, the most important feature can be emphasized by multiplying the feature channels with the channel descriptor. In this regard, SCE blocks intrinsically introduce dynamics conditioned on the input, thereby helping boost feature discriminability of specific fault patterns [5].
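The four SCE phases can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' implementation: each feature is stored as a list of channels, 2D time-frequency maps are flattened per channel, the squeeze step uses global average pooling, and the hidden width, the random weights, and all names (`squeeze`, `excite`, `sce`, `rand`) are hypothetical.

```python
import math
import random


def squeeze(feature):
    """Global average pooling: one scalar per channel (a channel is a list of values)."""
    return [sum(channel) / len(channel) for channel in feature]


def matvec(W, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def excite(p, W1, W2):
    """q = sigmoid(W2 * relu(W1 * p)): one weight in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in matvec(W1, p)]
    return [1.0 / (1.0 + math.exp(-z)) for z in matvec(W2, hidden)]


def sce(temporal, frequency, time_frequency, W1, W2):
    """Squeeze, connect, excite, and reweight three multichannel features."""
    streams = [temporal, frequency, time_frequency]
    p = [v for f in streams for v in squeeze(f)]       # squeeze + connection
    q = excite(p, W1, W2)                              # excitation
    channels = [ch for f in streams for ch in f]
    fused = [[q_c * x for x in ch] for q_c, ch in zip(q, channels)]  # reweight
    return q, fused


def rand(n):
    """Hypothetical helper: n random values in [-1, 1]."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


random.seed(0)
temporal = [rand(8) for _ in range(2)]          # C1 = 2 channels of length 8
frequency = [rand(6) for _ in range(2)]         # C2 = 2 channels of length 6
time_frequency = [rand(4) for _ in range(3)]    # C3 = 3 flattened 2 x 2 maps
C, hidden_width = 7, 3                          # C = C1 + C2 + C3; hidden width assumed
W1 = [rand(C) for _ in range(hidden_width)]     # untrained placeholder weights
W2 = [rand(hidden_width) for _ in range(C)]
q, fused = sce(temporal, frequency, time_frequency, W1, W2)
```

Because the sigmoid output lies strictly between 0 and 1, each channel is attenuated rather than amplified, and the channel shapes are unchanged by the reweight step. In a trained network, `W1` and `W2` would be learned jointly with the convolutional streams.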

MFFN-Based Fault Diagnosis
The MFFN-based fault diagnosis method comprises four modules. The first module performs primary data processing, wherein the physical significance of the vibration signal is presented with respect to the temporal, frequency, and time-frequency features. In the second module, the high-representation features are extracted through layer-wise mapping. Adaptive feature fusion is realized in the third module, wherein the most credible factor is enhanced, while feature redundancy is reduced considerably. Finally, the fourth module is designed for fault classification, wherein a shallow architecture is established with two fully connected layers and a Softmax calculation block. The details of our proposed MFFN are presented in Table 1.

Setup.
To evaluate the performance of the proposed MFFN, experiments were conducted on defective bearing datasets provided by the Case Western Reserve University Bearing Data Center (CWRU dataset) [25], Jiangnan University (Jiangnan dataset) [26], and Paderborn University (Paderborn dataset) [27].
The bearing system platform in the Case Western Reserve University Bearing Data Center includes a 2 HP motor, torque transducer, dynamometer, and load motor. The vibration signal was collected via an accelerometer at a sampling frequency of 12 kHz. In addition to the normal state, nine categories of fault state data were included in this database: single-point faults with diameters of 0.007, 0.014, and 0.021 in. were seeded separately on the inner race (IR), outer race (OR), and rolling elements (REs). For each state, 120,000 samples were collected in 10 s. The data from Jiangnan University include four categories of running states: the normal state and fault states separately seeded on the bearing at IR, OR, and RE. All data were collected at a 50 kHz sampling frequency at rolling speeds of 600, 800, and 1000 rpm. For the normal state, 1800 samples were randomly collected, while 600 samples were collected for each fault state. The data from Paderborn University were provided via measurements concerning six healthy and 26 damaged bearings at IR and OR. All data were collected at a 64 kHz sampling frequency at rolling speeds of 900 and 1500 rpm. For each state, 256,000 samples were collected in 4 s. The training and testing samples for experimental evaluations are shown in Tables 2 to 4.

Model Pretraining and Fine-Tuning.
An adequate number of epochs in the training period is important for model training. Excessive epochs may result in overfitting, while the learning outcome may be poor in the case of insufficient epochs. Figure 4 illustrates the training times of the three datasets and reveals that the MFFN can converge rapidly on all three datasets. Thirty iterations are sufficient for model learning with the CWRU and Jiangnan datasets, while 25 iterations are required for model learning with the Paderborn dataset.

MFFN-Based Fault Diagnosis.
Confusion matrices were utilized for evaluating the performance of the proposed MFFN. Figure 5(a) presents the confusion matrix for the CWRU dataset. This result demonstrates that our fault diagnosis method is highly accurate. Only two samples were erroneously classified; the rest were identified correctly. The classification results regarding the Jiangnan dataset are satisfactory (Figure 5(b)); only one sample was misclassified. A similar outcome was observed in the results on the Paderborn dataset (Figure 5(c)) with one error.
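For readers unfamiliar with this evaluation, a confusion matrix counts, for every true class, how many samples were assigned to each predicted class; the diagonal holds the correct decisions. The sketch below shows the computation with made-up labels. The labels, class count, and function names are hypothetical and do not reproduce any result reported in this paper.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels for four classes; not results from the paper.
y_true = [0, 0, 1, 1, 2, 2, 2, 3]
y_pred = [0, 0, 1, 2, 2, 2, 2, 3]
cm = confusion_matrix(y_true, y_pred, num_classes=4)
acc = accuracy(cm)
misclassified = sum(sum(row) for row in cm) - sum(cm[i][i] for i in range(4))
```

In this toy example, the single off-diagonal entry `cm[1][2]` records one sample of class 1 that was predicted as class 2, which is the kind of isolated error the confusion matrices in Figure 5 make visible.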

Comparison against Existing Deep Learning Methods.
We evaluated the proposed MFFN by comparing it to state-of-the-art methods. The temporal-, frequency-, and time-frequency feature-based methods are comprehensively catalogued for experimental comparison. For example, the 1D-CNN was used to classify 1D temporal features [28] and denoted as "TF + 1D-CNN." Furthermore, temporal feature + WDCNN (TF + WDCNN) [29], frequency feature + 1D-CNN (FF + 1D-CNN) [30], frequency feature + SDAE (FF + SDAE), time-frequency feature + 2D-CNN (TFF + 2D-CNN), and time-frequency feature + VGG16 (TFF + VGG16) [31] were included in the experimental comparison. Each dataset was divided into 10 subsets for experimental evaluation with respect to working conditions. Figure 6 reveals that no salient performance variation was observed among the 10 evaluations for MFFN; the maximum differences among evaluations were 0.08% on the CWRU dataset, 0.12% on the Jiangnan dataset, and 0.13% on the Paderborn dataset. Thus, experimental analysis demonstrates the stability of our MFFN compared with that of the other methods, which exhibit lower stability owing to significant performance variations across tested subsets. The average accuracies of the compared fault diagnosis methods are listed in Table 5. Two observations can be made based on this table. First, the feature in the time-frequency domain outperforms the temporal- and frequency-domain features. This is because the time-frequency domain feature can identify details of the frequency spectrum of the vibration signal, which facilitates an improved fault diagnosis. Second, fusing features in multiple domains is preferable for fault diagnosis. This is because the feature fusion results can represent machinery faults from multiple aspects and allow more valuable cues to be leveraged for fault diagnosis. As a result, interclass fault differences are enlarged, while intraclass clustering is enhanced, which theoretically explains the better performance of the proposed MFFN.

Visualization.
To comprehensively understand the benefits of our proposed MFFN, the t-SNE technique was applied to reduce the dimensionality of the learned features. The resulting 2D feature maps are shown in Figure 7, wherein different colors represent various fault or normal categories. After MFFN feature learning, a fault-category clustering effect is observed in contrast to the raw distribution, along with linear margins between fault categories. This result is desirable and enables simpler classification.
This further demonstrates that using the MFFN architecture can significantly improve the accuracy of fault diagnosis.

Conclusions
A novel MFFN-based fault diagnosis method is proposed. The proposed MFFN can fuse features in different domains, such as the temporal, frequency, and time-frequency domains. Sufficient cues are comprehensively leveraged through the deep learning process of MFFN. The main contribution of MFFN is that it can improve the representability of faults, leading to a significant improvement in the accuracy of fault diagnosis. Theoretically, features in multiple domains depict faults from multiple perspectives, which are complementary in physical significance. Moreover, the importance of features in multiple domains varies with respect to the task at hand. Intrinsically, our proposed MFFN adopts a feature fusion strategy using adaptive weights. Features extracted in multiple domains are weighted and fused, leading to a comprehensive utilization of their advantages. Consequently, MFFN achieves higher accuracy compared with existing architectures.
Using our proposed MFFN, exceptional accuracy can be achieved, enabling its utilization in many practical applications. In our future work, we will train the MFFN model to handle more signal types, such as thermal imaging and acoustic data, which contain many more valuable features for diagnosing faults. Moreover, we will evaluate MFFN in real-world applications, especially in online fault diagnosis.

Data Availability
The experiment data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.