Multisensor-Based Heavy Machine Faulty Identification Using Sparse Autoencoder-Based Feature Fusion and Deep Belief Network-Based Ensemble Learning

Faulty identification plays a vital role in the prognostics and health management (PHM) of industrial equipment and offers great support to maintenance strategy decisions. Owing to the complexity of the internal component-system structure of the machine, precise prediction for heavy machines is hard to obtain and is full of uncertainty. Moreover, even for a single component, the feature representation of the acquired condition monitoring signals can differ because of different sensor locations and environmental interference, causing difficulty in feature selection and uncertainty in faulty identification. In order to improve the reliability of model identification, a novel hybrid machine faulty identification approach based on sparse autoencoder- (SAE-) and deep belief network- (DBN-) based ensemble learning is proposed in this paper. First, six kinds of statistical features are extracted and normalized from multiple sensors monitoring the same target component. Second, the six extracted features are fused by the proposed two-stage SAE along the sensor dimension and the feature dimension, respectively. The composite feature fused along the feature dimension is regarded as the comprehensive representation of the corresponding component. Finally, the fused features of the different components are used to predict the machine health condition with an ensemble of multiple deep belief network classifiers. The effectiveness of the proposed method is validated by two case studies: a wind turbine gearbox and an industrial port crane. The experimental results show that the proposed ensemble learning approach outperforms other traditional deep learning approaches in terms of prediction accuracy and prediction stability when dealing with multisensor feature fusion and precise faulty identification of industrial heavy machines.


Introduction
With the growing scale of modern industrial manufacturing, accurate faulty identification of heavy industrial machines has become increasingly important. A contemporary heavy machine typically has a highly complex internal component-system structure. Moreover, even for a single component, multiple sensors are mounted at different locations to acquire complementary information. How to effectively utilize this multisensor information and improve the reliability of machine faulty identification remains a great challenge.
Conventional faulty identification approaches are usually based on historical condition monitoring data, which can be very useful for making appropriate maintenance strategies that avoid catastrophic failures and excessive maintenance costs. The traditional intelligent faulty identification procedure usually consists of three steps: feature extraction, feature fusion, and faulty identification.
1.1. Feature Extraction. During feature extraction, suitable statistical features are extracted from the acquired sensor monitoring signals in the data space, based on expert knowledge of signal processing, to construct a health indicator (HI) that represents the machine health status well and provides useful information for faulty identification. HI construction approaches fall into two categories: single sensor-based approaches and multisensor-based approaches. A single sensor-based approach relies entirely on the analysis of a single-source monitoring signal, using appropriately selected statistical and signal processing methods. Gebraeel et al. [1] extracted the average amplitude and harmonics of the vibration signal as the health indicator of bearing condition. Malhi et al. [2] extracted the root mean square and the peak value of the vibration signal by using the continuous wavelet transform to construct a bearing health indicator. Kilundu et al. [3] constructed a bearing HI by using singular spectrum analysis of the vibration signal. The single sensor-based approach can capture the faulty symptoms of the target component with specific physical meaning; however, it captures these symptoms from only a single data dimension, with a lower confidence level.
To improve the reliability of faulty identification results, multisignal-based approaches have been proposed, which reflect the potential faulty symptoms of the target component from multiple data dimensions [4][5][6]. In a multisignal-based approach, signals such as forces, vibrations, temperatures, and acoustic signals are fused for the faulty identification task [7]. Compared with single signal-based approaches, multisignal-based approaches make the identification result more reliable [8][9]. Hao et al. [10] proposed a multisensor-based approach for identifying the degradation of mechanical components by evaluating a composite index that combines multiple sensor signals collected under multiple operating conditions. Banerjee and Das [11] proposed a motor faulty identification model based on sensor data fusion using a support vector machine and the short-time Fourier transform.
1.2. Feature Fusion. Feature fusion is usually performed in the feature space, where the different statistical features extracted during the feature extraction phase are integrated into a composite feature by techniques such as principal component analysis (PCA), kernel-based methods, and manifold learning. Feature fusion aims to achieve considerable information compression and to produce more effective features. Xu et al. [12] fused the time-domain and frequency-domain features extracted from flame oscillation signals by using PCA. Wang et al. [13] proposed a feature fusion and feature selection method for force and vibration signals based on kernel PCA. Sun et al. [14] proposed a feature fusion method for handwriting recognition based on locally linear embedding.
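To make the PCA-based fusion mentioned above concrete, the following Python sketch centers a matrix of per-sample statistical features and projects it onto its leading principal components, computed by singular value decomposition. It is a generic illustration under stated assumptions (features stacked column-wise in a NumPy array, two retained components, synthetic data), not the procedure used in [12] or [13]; the function name `pca_fuse` is hypothetical.

```python
import numpy as np

def pca_fuse(features: np.ndarray, n_components: int) -> np.ndarray:
    """Fuse a (samples x features) matrix into n_components composite features via PCA."""
    # Center each feature column so PCA captures variance, not offsets
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Project every sample onto the leading principal directions
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
# Hypothetical data: 100 samples, 6 statistical features, the first two strongly correlated
base = rng.normal(size=(100, 1))
features = np.hstack([base, 2 * base + 0.01 * rng.normal(size=(100, 1)), rng.normal(size=(100, 4))])
fused = pca_fuse(features, n_components=2)
print(fused.shape)  # (100, 2)
```

Because the principal directions are orthogonal eigenvectors of the feature covariance, the fused columns are uncorrelated, which is why PCA is useful for compressing redundant features such as the two correlated columns in this example.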

1.3. Faulty Identification.
During the faulty identification phase, the composite feature obtained from the feature fusion step is fed to a classifier [15][16]. Various machine learning classifiers have been widely explored for faulty classification; however, it is extremely hard for shallow classifiers to achieve precise faulty prediction for industrial heavy machines with complex internal component-system structures [17][18][19].
Recently, deep learning has been widely used in the feature extraction, feature fusion, and faulty identification phases, owing to the powerful automatic feature learning and input-output mapping ability of its multilayer structure. Deep learning-based faulty identification approaches reduce the dependence on expert knowledge of faulty identification and have been shown to improve identification results [20][21][22][23]. However, three issues need further consideration.
(1) During feature extraction, most existing methods directly extract statistical features from multiple sensor signals. However, these sensors are mounted at different locations and acquire condition monitoring signals whose characteristics differ because of sensor arrangement and environmental interference. Therefore, the features extracted from these sensors are disordered and correlated.

To deal with the three issues listed above, this paper proposes a multisensor-based hierarchical machine faulty identification approach using two-stage sparse autoencoder-based feature fusion and deep belief network-based ensemble learning. The innovations of this paper are as follows. Considering the first issue, six kinds of statistical features are extracted and normalized from the sensors monitoring the same component, and these features are sent to six different three-layer SAEs for feature extraction; the middle layers of the six SAEs are extracted as the complete representation of the multisensor input signals. Considering the second issue, the six extracted middle-layer features are concatenated and sent to a specially designed three-layer SAE for feature fusion to obtain a more robust feature; they are merged into a six-node compressed feature, which is regarded as the composite health indicator of the target component. Considering the third issue, the composite health indicators obtained during feature fusion, which reflect the comprehensive health statuses of different components, are sent to multiple DBN classifiers that independently classify the machine faulty status, and the outputs of these classifiers are aggregated with a Bayesian weighting strategy whose weights represent the degree of affiliation between each target component and each machine condition.
The rest of this paper is organized as follows: Section 2 briefly reviews the related literature, including the sparse autoencoder, the deep belief network, and ensemble learning; Section 3 presents the framework of the proposed hybrid faulty prognostic approach; Section 4 describes the case studies and the comparison with other deep learning models; and finally, Section 5 summarizes the main contributions and future work of this paper.

Methodology
2.1. Autoencoder and Sparse Autoencoder. The autoencoder is an unsupervised feature learning approach that learns a high-level representation of the raw input signal. Its output layer has the same dimension as its input layer, and the network is trained to minimize the reconstruction error between the input and the output so that the hidden layer captures high-level features. The learning process of the autoencoder consists of two procedures: encoding and decoding. The encoding process acts as a feature extractor that transforms the raw input into high-level features, and the decoding process reconstructs the input in the output layer from the obtained high-level features. The detailed structure of the autoencoder is illustrated in Figure 1.
Assume that the raw input vectors are $\{X_1, X_2, \cdots, X_N\}$, where each $X_i \in \mathbb{R}^n$ is n-dimensional. During the encoding process, an input vector $x = (x_1, \ldots, x_n)$ is transformed into the high-level feature as follows:
$$z_j = \sum_{i=1}^{n} w^{(1)}_{ij} x_i + b^{(1)}_j \tag{1}$$
$$h_j = f(z_j) \tag{2}$$
where $w^{(1)}_{ij}$ denotes the weight between the ith dimension of the input layer and the jth node of the hidden layer, $b^{(1)}_j$ denotes the bias of the jth hidden node, and $f(\cdot)$ denotes the activation function that transforms the raw input vector into the high-level feature.
During the decoding process, the decoder transforms the hidden layer $h = (h_1, \ldots, h_m)$ into the output layer y:
$$y_k = f\Big(\sum_{j=1}^{m} w^{(2)}_{jk} h_j + b^{(2)}_k\Big) \tag{3}$$
where m is the number of hidden nodes, $w^{(2)}_{jk}$ denotes the weight between the jth node of the hidden layer and the kth node of the output layer, and $b^{(2)}_k$ denotes the bias of the kth output node. To obtain compressed high-level information from the raw input signal, a "sparse" restriction is applied to the hidden layer, which keeps the majority of hidden nodes inactive, with outputs close to zero. The restriction is defined on the average activation of each hidden node:
$$\hat{p}_j = \frac{1}{N}\sum_{i=1}^{N} \alpha_j(X_i) \tag{4}$$
Here, $\hat{p}_j$ denotes the average activation of the jth hidden node, N denotes the number of training samples, and $\alpha_j(X_i)$ denotes the activation of the jth hidden node for the ith sample.
During training, the average activation of each hidden node is constrained to stay close to a predefined sparsity value through the following equation:
$$\mathrm{KL}(p \,\|\, \hat{p}_j) = p \log\frac{p}{\hat{p}_j} + (1-p)\log\frac{1-p}{1-\hat{p}_j} \tag{5}$$
Here, p denotes the "sparse" parameter, and the Kullback-Leibler (KL) divergence evaluates the similarity between p and the actual average activation $\hat{p}_j$ of the jth hidden node. When $\hat{p}_j = p$, the KL divergence equals zero. The loss function of the SAE can then be written as
$$J(W,b) = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{2}\big\|\hat{X}_i - X_i\big\|^2 \tag{6}$$
$$J_{\text{sparse}}(W,b) = J(W,b) + \lambda\sum_{j=1}^{m}\mathrm{KL}(p \,\|\, \hat{p}_j) \tag{7}$$
where $\hat{X}_i$ is the output reconstructed from the ith input $X_i$, m is the number of hidden nodes, and λ weights the sparsity penalty. Equation (6) is the loss function of the autoencoder, and Equation (7) adds the "sparse" restriction. The sparse autoencoder aims to make the output as close as possible to the input while obtaining a high-quality internal representation of the input vector. In this paper, the sparse autoencoder is used to construct the health indicator of each individual component, which contains information from different sensor locations and feature characteristics.
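The loss in Equations (6) and (7) can be sketched in NumPy as follows. This is a minimal forward-pass computation, not a training routine, and it assumes a sigmoid activation and a sparsity weight `lam`, neither of which is specified in the text; the hidden size follows the 2N + 1 rule of Section 3.1.3 for N = 3 sensors, and the weights are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_divergence(p, p_hat):
    """KL(p || p_hat) between Bernoulli distributions with means p and p_hat (elementwise)."""
    return p * np.log(p / p_hat) + (1 - p) * np.log((1 - p) / (1 - p_hat))

def sae_loss(x, w1, b1, w2, b2, p=0.05, lam=3.0):
    """Reconstruction loss plus KL sparsity penalty for a three-layer sparse autoencoder.

    x: (n_samples, n_inputs); w1: (n_inputs, n_hidden); w2: (n_hidden, n_inputs).
    The sigmoid activation and the default p and lam values are assumptions.
    """
    h = sigmoid(x @ w1 + b1)          # encoding: hidden activations
    y = sigmoid(h @ w2 + b2)          # decoding: reconstruction of x
    n_samples = x.shape[0]
    recon = 0.5 * np.sum((y - x) ** 2) / n_samples
    p_hat = h.mean(axis=0)            # average activation of each hidden node
    penalty = lam * np.sum(kl_divergence(p, p_hat))
    return recon + penalty, p_hat

rng = np.random.default_rng(0)
n_sensors = 3
n_hidden = 2 * n_sensors + 1          # overcomplete hidden layer, 2N + 1 = 7
x = rng.uniform(size=(50, n_sensors)) # min-max normalized features lie in [0, 1]
w1 = rng.normal(scale=0.1, size=(n_sensors, n_hidden))
w2 = rng.normal(scale=0.1, size=(n_hidden, n_sensors))
loss, p_hat = sae_loss(x, w1, np.zeros(n_hidden), w2, np.zeros(n_sensors))
print(f"loss={loss:.3f}, mean activation={p_hat.mean():.3f}")
```

With untrained weights every hidden node fires at about 0.5, far from the target p = 0.05, so the sparsity term dominates the loss; training on this loss pushes the average activations toward p.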

2.2. Deep Belief Network. The deep belief network (DBN), proposed by Hinton in 2006, is a deep model trained with an unsupervised, greedy layer-by-layer algorithm, which addresses the difficulty of optimizing deep network structures. A DBN is constructed by stacking multiple restricted Boltzmann machines (RBMs), as illustrated in Figure 2.
Each RBM consists of two layers, a visible layer and a hidden layer; there are no connections within a layer, and connections exist only between the visible and hidden layers. DBN training consists of two stages: greedy pretraining of the stacked RBMs one by one, and fine-tuning of the whole network to obtain the desired performance. Since the DBN is composed of stacked RBMs, each RBM can be pretrained in an unsupervised manner with contrastive divergence [24].
In recent years, DBNs have been successfully used in faulty identification [25][26][27]. In this paper, multiple DBN classifiers are used independently for machine faulty classification, each from the perspective of a different component.

2.3. Ensemble Learning.
Ensemble learning is based on the notion that "two heads are better than one." Aggregating multiple classifiers has been shown to outperform a single classifier in many fields [28][29][30][31]. With an appropriate combination strategy, ensemble learning can take full advantage of each individual classifier to improve generalization [32]. Ensemble learning methods can usually be categorized into two types: sequential and parallel. In sequential ensemble learning, also called boosting, models are combined in sequence: the first algorithm generates a model, the second algorithm corrects the first model, and so on. In parallel ensemble learning, the individual models are trained independently; bagging is a typical example. There are two kinds of parallel ensemble learning, namely, homogeneous ensemble learning and heterogeneous ensemble learning [33].

Homogeneous Ensemble Learning.
In the homogeneous ensemble learning scheme, classifiers of the same type are trained on different training datasets. These datasets are collected from multiple data sources, achieving a federation of multisource data, as illustrated in Figure 3. This type of ensemble is also called a data variation ensemble, and it is mainly used for managing industrial big data in large-scale scenarios.

Algorithm 1: General procedure of the proposed methodology. Final step: validate the proposed ensemble learning methodology on the testing datasets by using the mean value of the correlation weights obtained during the training process.

Heterogeneous Ensemble Learning.
In the heterogeneous ensemble learning scheme, several different types of classifiers are applied to the same training dataset, as shown in Figure 4, which achieves model diversity. This type of ensemble is also called a function variation ensemble and is mainly used to enhance the generalization of model outputs.
In this paper, a homogeneous ensemble of several DBN classifiers is used to construct the hierarchical data processing framework of the component-system network.

3.1. Data Preprocessing and Feature Fusion with SAE
3.1.1. Data Segmentation. The condition monitoring data are obtained from multiple sensors that monitor the same target component from different installation locations. The datasets are split into two subsets: one for training and the other for testing.
3.1.2. Feature Extraction and Normalization. Six time-domain features are used in this paper: the impulse factor $X_{\text{Impulse}}$, kurtosis $X_{\text{Kurtosis}}$, skewness $X_{\text{Skewness}}$, shape factor $X_{\text{Shape factor}}$, clearance factor $X_{\text{Clearance factor}}$, and crest factor $X_{\text{Crest}}$. Their formulas are as follows:
$$X_{\text{Impulse}} = \frac{\max_i |x(i)|}{\frac{1}{N}\sum_{i=1}^{N}|x(i)|} \tag{8}$$
$$X_{\text{Kurtosis}} = \frac{\frac{1}{N}\sum_{i=1}^{N}\big(x(i)-\bar{x}\big)^4}{\Big(\frac{1}{N}\sum_{i=1}^{N}\big(x(i)-\bar{x}\big)^2\Big)^2} \tag{9}$$
$$X_{\text{Skewness}} = \frac{\frac{1}{N}\sum_{i=1}^{N}\big(x(i)-\bar{x}\big)^3}{\Big(\frac{1}{N}\sum_{i=1}^{N}\big(x(i)-\bar{x}\big)^2\Big)^{3/2}} \tag{10}$$
$$X_{\text{Shape factor}} = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}x(i)^2}}{\frac{1}{N}\sum_{i=1}^{N}|x(i)|} \tag{11}$$
$$X_{\text{Clearance factor}} = \frac{\max_i |x(i)|}{\Big(\frac{1}{N}\sum_{i=1}^{N}\sqrt{|x(i)|}\Big)^2} \tag{12}$$
$$X_{\text{Crest}} = \frac{\max_i |x(i)|}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}x(i)^2}} \tag{13}$$
where x(i) denotes a sensor signal series, $\bar{x}$ is its mean, and N refers to the window length. The six extracted features are normalized by Equation (14):
$$x^{*}_{i,j}(t) = \frac{x_{i,j}(t) - \min(x_{i,j})}{\max(x_{i,j}) - \min(x_{i,j})} \tag{14}$$
where $x_{i,j}(t)$ denotes the extracted feature of the jth sensor within the ith signal segment, and $\max(x_{i,j})$ and $\min(x_{i,j})$ represent the maximum and minimum feature values within the ith signal segment of the jth sensor.

3.1.3. Feature Fusion and Feature Construction Based on SAE. To construct the comprehensive health indicator of the target component, the proposed two-stage SAE-based feature fusion method is executed. First, the six kinds of normalized features extracted from a group of sensors monitoring the same component are sent to six three-layer SAEs, respectively, with the structure shown in Figure 5. Assuming that N sensors monitor the same target component, the number of hidden-layer nodes is set to 2N + 1 so that the layer is forced to learn an overcomplete representation of each kind of feature obtained from the multisensor signals [34]. Since six kinds of features are extracted from the sensor signals, as described in Section 3.1.2, the six resulting comprehensive features are concatenated and sent to the second SAE, as shown in Figure 6. The number of hidden-layer nodes of the second SAE is set to 6, following the parameter settings in [34], so that the network is forced to learn a highly compressed representation of the six extracted features.
The six-node compressed feature is extracted and regarded as the composite health indicator of the target component.
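The data flow of the two-stage fusion described above can be sketched as follows. The sketch only traces shapes through the encoders: the weights are random placeholders standing in for trained SAE parameters, the sigmoid activation is an assumption, and N = 3 sensors per component matches Case Study I. The input layout `(samples, features, sensors)` and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N_SENSORS = 3                  # e.g., three vibration sensors per component
N_FEATURES = 6                 # impulse, kurtosis, skewness, shape, clearance, crest
HIDDEN_1 = 2 * N_SENSORS + 1   # overcomplete first-stage hidden layer
HIDDEN_2 = 6                   # compressed second-stage hidden layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encoder(n_in, n_out):
    """Return an untrained encoder h = sigmoid(x W + b); W and b are random placeholders."""
    w = rng.normal(scale=0.1, size=(n_in, n_out))
    b = np.zeros(n_out)
    return lambda x: sigmoid(x @ w + b)

# Stage 1: one SAE encoder per feature type, each fed that feature from all N sensors
stage1 = [encoder(N_SENSORS, HIDDEN_1) for _ in range(N_FEATURES)]
# Stage 2: a single SAE encoder over the concatenated stage-1 codes
stage2 = encoder(N_FEATURES * HIDDEN_1, HIDDEN_2)

def composite_health_indicator(x):
    """x: (samples, N_FEATURES, N_SENSORS) array of normalized features."""
    codes = [enc(x[:, f, :]) for f, enc in enumerate(stage1)]
    return stage2(np.concatenate(codes, axis=1))

x = rng.uniform(size=(4, N_FEATURES, N_SENSORS))
hi = composite_health_indicator(x)
print(hi.shape)  # (4, 6)
```

With N = 3, each first-stage SAE maps 3 inputs to 7 hidden nodes, the concatenated code has 6 × 7 = 42 dimensions, and the second stage compresses it to the six-node composite health indicator.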

Individual Faulty Classification Based on Single DBN.
The extracted six-node compressed features are fed to multiple DBN models, which serve as the subclassifiers of the ensemble learning network. In this paper, each DBN is constructed by stacking two RBM layers and one softmax layer for faulty classification, with the structure shown in Figure 7, where "V" denotes the visible vector, "h" denotes the hidden vector, and "W" denotes the connection weights between the visible layer and the hidden layer. The input dimension of the DBN is set to six, and the output dimension of the softmax layer equals the number of faulty types in the corresponding classification task. The contrastive divergence algorithm proposed by Hinton et al. [24] is used for fast training of the DBN model. First, the conditional probability of the hidden units in RBM1 is obtained by Equation (15). Gibbs sampling is then employed to determine the states of the hidden units. Finally, the states of the visible units are obtained through Equation (16), which can be regarded as a reconstruction of the visible layer from the hidden layer:
$$p(h_j = 1 \mid \mathbf{v}) = \operatorname{sigm}\Big(\beta_j + \sum_{i} w_{i,j} v_i\Big) \tag{15}$$
$$p(v_i = 1 \mid \mathbf{h}) = \operatorname{sigm}\Big(\alpha_i + \sum_{j} w_{i,j} h_j\Big) \tag{16}$$
where $\operatorname{sigm}(z) = 1/(1+e^{-z})$ is the logistic function, $w_{i,j}$ denotes the connection weight between the ith visible node and the jth hidden node, and α and β denote the biases of the visible and hidden layers, respectively. The parameter gradients are obtained by maximizing the log-likelihood with stochastic gradient ascent, and the parameters θ = {w, α, β} are updated as follows:
$$\Delta w_{i,j} = \sigma\big(\langle v_i h_j\rangle_{p(h\mid v)} - \langle v_i h_j\rangle_{\text{rec}}\big)$$
$$\Delta \alpha_i = \sigma\big(\langle v_i\rangle_{p(h\mid v)} - \langle v_i\rangle_{\text{rec}}\big)$$
$$\Delta \beta_j = \sigma\big(\langle h_j\rangle_{p(h\mid v)} - \langle h_j\rangle_{\text{rec}}\big)$$
Here, σ denotes the learning rate, $\langle\cdot\rangle_{p(h\mid v)}$ denotes the expectation under the conditional distribution p(h|v), and $\langle\cdot\rangle_{\text{rec}}$ denotes the expectation under the reconstructed model. The stacked RBMs are first pretrained in an unsupervised manner. After pretraining, the DBN model is fine-tuned, starting from the last layer, with labeled data.
A softmax layer representing the probabilities of the different faulty types is then added for classification, and all model parameters are optimized by the backpropagation algorithm.
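The contrastive divergence procedure around Equations (15) and (16) can be sketched in NumPy as one-step CD (CD-1) for a Bernoulli RBM, followed by greedy layer-wise stacking. This is a generic sketch under assumptions: logistic units, hidden probabilities rather than samples in the update statistics (a common practical choice), full-batch updates, and hypothetical layer sizes of 10 and 8 hidden units. The supervised fine-tuning with the softmax layer is not shown.

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, w, alpha, beta, lr, rng):
    """One CD-1 step for a Bernoulli RBM; updates w, alpha (visible), beta (hidden) in place."""
    ph0 = sigm(v0 @ w + beta)                               # p(h = 1 | v), Eq. (15)
    h0 = (rng.uniform(size=ph0.shape) < ph0).astype(float)  # Gibbs sample of hidden states
    pv1 = sigm(h0 @ w.T + alpha)                            # reconstruction p(v = 1 | h), Eq. (16)
    ph1 = sigm(pv1 @ w + beta)                              # hidden probabilities for the reconstruction
    n = v0.shape[0]
    # <.>_data - <.>_rec, averaged over the batch and scaled by the learning rate
    w += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    alpha += lr * (v0 - pv1).mean(axis=0)
    beta += lr * (ph0 - ph1).mean(axis=0)
    return float(np.mean((v0 - pv1) ** 2))                  # reconstruction error, for monitoring

def pretrain_stack(x, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: each RBM is trained on the hidden probabilities of the previous one."""
    rng = np.random.default_rng(seed)
    layers, errors, inputs = [], [], x
    for n_hidden in layer_sizes:
        w = rng.normal(scale=0.01, size=(inputs.shape[1], n_hidden))
        alpha, beta = np.zeros(inputs.shape[1]), np.zeros(n_hidden)
        for _ in range(epochs):
            err = cd1_update(inputs, w, alpha, beta, lr, rng)
        layers.append((w, alpha, beta))
        errors.append(err)
        inputs = sigm(inputs @ w + beta)                    # feed upward to the next RBM
    return layers, inputs, errors

rng = np.random.default_rng(3)
x = rng.uniform(size=(200, 6))       # e.g., six-node composite features scaled to [0, 1]
layers, top, errors = pretrain_stack(x, layer_sizes=[10, 8])
print(top.shape, [round(e, 4) for e in errors])
```

The pretrained weights would then initialize the hidden layers of the DBN before the softmax layer is added and the whole network is fine-tuned with backpropagation.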

Ensemble Learning Based on Bayesian Weighting.
Since each individual DBN classifier represents the machine health status only from the perspective of one component, an ensemble learning approach based on a Bayesian weighting strategy is applied to integrate the outputs of the multiple DBN classifiers and to construct the correlation network between individual components and the different machine faulty conditions. Assuming that there are N individual DBN classifiers corresponding to the health statuses of N individual components of the equipment, the machine health status can be represented as follows.
Here, $\mathrm{Corr}(\text{component}_j)$ denotes the correlation weight between the machine and the target component j. If the true machine state is exactly $S_i$, the posterior correlation weight is updated as shown in Equation (22); otherwise, the weight is not updated. Note that the updated weight assignment applies only to the classification of state $S_i$, so different machine faulty states have different component-system weight assignments.
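Since the text does not give the explicit form of the aggregation or of the update in Equation (22), the following Python sketch only illustrates one plausible reading of the Bayesian weighting strategy, next to the "winner takes all" majority vote used for comparison in the case studies. All of the following are hypothetical assumptions: each DBN outputs a softmax probability vector over machine states; the score of state S_i is a Corr-weighted sum of the DBNs' probabilities for S_i; during training, only the weight row of the true state is multiplied by each DBN's probability for that state and renormalized; and at test time the mean of the weights recorded during training is used, following the final step of Algorithm 1.

```python
import numpy as np

class BayesianWeightedEnsemble:
    """Hypothetical sketch of per-state component weighting for N DBN classifiers."""

    def __init__(self, n_components, n_states):
        # corr[i, j]: weight of component j for machine state S_i, starting from a uniform prior
        self.corr = np.full((n_states, n_components), 1.0 / n_components)
        self.history = []

    def update(self, probs, true_state):
        """probs: (n_components, n_states) softmax outputs of the DBNs for one training sample."""
        i = true_state
        # Only the row of the true state S_i changes; the other rows are left untouched
        posterior = self.corr[i] * probs[:, i]
        self.corr[i] = posterior / posterior.sum()
        self.history.append(self.corr.copy())

    def predict(self, probs):
        """Score each state as sum_j corr[i, j] * P_j(S_i), using the mean training-time weights."""
        corr = np.mean(self.history, axis=0) if self.history else self.corr
        scores = np.sum(corr * probs.T, axis=1)
        return int(np.argmax(scores))

def winner_takes_all(probs):
    """Majority vote over each DBN's arg-max state (ties go to the lowest state index)."""
    votes = np.bincount(np.argmax(probs, axis=1), minlength=probs.shape[1])
    return int(np.argmax(votes))

# Toy example: 3 component DBNs, 2 machine states; only component 0 is informative for state 0
ens = BayesianWeightedEnsemble(n_components=3, n_states=2)
train_probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]])
for _ in range(5):
    ens.update(train_probs, true_state=0)
test_probs = np.array([[0.8, 0.2], [0.4, 0.6], [0.35, 0.65]])
print(winner_takes_all(test_probs))  # 1: two of the three DBNs vote for state 1
print(ens.predict(test_probs))       # 0: the weighted score follows component 0
```

The toy case shows how the two strategies can disagree: majority voting ignores which component is informative for a given state, whereas the learned per-state weights do not. A practical version would also need safeguards the sketch omits, such as smoothing to keep the weights from collapsing onto a single component.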

The Combination of the SAE and the DBN-Based Ensemble Learning. The overall framework of the proposed approach is illustrated in Figure 8. First, the sensors monitoring the same component are grouped, and six classical time-domain features are extracted and normalized from these sensor data. Second, SAE-based feature fusion is used to obtain a compressed representation of the six normalized features, which is also regarded as the

multisensor datasets of the wind turbine gearbox provided by Shanghai Electric Group. As shown in Figure 9, the wind turbine gearbox consists of four main components, namely, the large wheel, small wheel, drive shaft, and pedestal. Each component is monitored by three vibration sensors in the vertical, horizontal, and lateral directions. The sampling frequency is 36 kHz, and the sampling time is 1 s, with the load speed ranging from 800 rpm to 1600 rpm. The time window is set to 10, so each sample contains 10 sampling points. Therefore, there are 3600 samples in total, half for training and half for testing. There are five gearbox conditions, namely, healthy, rotor unbalance, rotor misalignment, rotor friction, and bearing looseness. The details of the datasets of Case Study I are given in Table 1.
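The sample count above follows from simple arithmetic: 36 kHz × 1 s gives 36,000 points per sensor channel, and windows of 10 points give 36,000 / 10 = 3600 samples, split 1800/1800 for training and testing. The Python sketch below reproduces this; it assumes non-overlapping windows and a contiguous first-half/second-half split, neither of which the text specifies, and uses a zero-filled placeholder signal.

```python
import numpy as np

def segment(signal: np.ndarray, window: int) -> np.ndarray:
    """Split a 1-D signal into non-overlapping windows of `window` points, dropping any remainder."""
    n_windows = len(signal) // window
    return signal[: n_windows * window].reshape(n_windows, window)

fs = 36_000        # sampling frequency in Hz (Case Study I)
duration = 1.0     # sampling time in seconds
window = 10        # sampling points per sample
signal = np.zeros(int(fs * duration))   # placeholder for one sensor channel
samples = segment(signal, window)
half = len(samples) // 2
train_set, test_set = samples[:half], samples[half:]   # assumed contiguous split
print(samples.shape, train_set.shape, test_set.shape)  # (3600, 10) (1800, 10) (1800, 10)
```

With `fs = 24_000`, the same function yields the 2400 samples reported for the crane trolley case study.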

Feature Extraction and Normalization.
In this paper, six kinds of features, namely, the impulse factor, kurtosis, skewness, shape factor, clearance factor, and crest factor, are extracted and normalized from each sample collected by the sensors monitoring the same subsystem. Taking the drive shaft as an example, the raw signals and the six normalized features from the three vibration sensors in the vertical, horizontal, and lateral directions are shown in Figures 10(a)-10(g). Figure 10 shows that the amplitudes of the raw signals and of all six normalized features vary noticeably across the sensors because of their different monitoring locations, which increases both the difficulty of feature selection and the uncertainty in faulty identification. Therefore, an effective method is needed to merge the information collected by the three vibration sensors into a comprehensive representation of the status of the target component; in this paper, the proposed two-stage sparse autoencoder is introduced for this purpose owing to its powerful feature compression and feature reconstruction ability.
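The six features and the min-max normalization of Equation (14) can be sketched in NumPy as below. The feature definitions are common textbook forms (kurtosis and skewness as standardized central moments; the impulse, shape, clearance, and crest factors as ratios of peak, mean absolute value, RMS, and mean square-root amplitude); the exact variants used by the authors may differ. The random signal and the per-sequence normalization are illustrative assumptions.

```python
import numpy as np

def time_domain_features(x: np.ndarray) -> dict:
    """Six time-domain features of one signal window x(1..N), using common textbook definitions."""
    abs_x = np.abs(x)
    mean_abs = abs_x.mean()
    rms = np.sqrt(np.mean(x ** 2))
    peak = abs_x.max()
    centered = x - x.mean()
    std = np.sqrt(np.mean(centered ** 2))
    return {
        "impulse": peak / mean_abs,
        "kurtosis": np.mean(centered ** 4) / std ** 4,
        "skewness": np.mean(centered ** 3) / std ** 3,
        "shape_factor": rms / mean_abs,
        "clearance_factor": peak / np.mean(np.sqrt(abs_x)) ** 2,
        "crest": peak / rms,
    }

def min_max_normalize(values: np.ndarray) -> np.ndarray:
    """Scale a feature sequence to [0, 1] using its own minimum and maximum."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)

# Hypothetical signal: 100 windows of N = 10 points each from one sensor
rng = np.random.default_rng(1)
signal = rng.normal(size=1000)
windows = signal.reshape(-1, 10)
kurt = np.array([time_domain_features(w)["kurtosis"] for w in windows])
kurt_norm = min_max_normalize(kurt)
print(kurt_norm.min(), kurt_norm.max())  # 0.0 1.0
```

Applied per sensor and per feature, this yields the normalized feature sequences that enter the first-stage SAEs.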

The Feature Fusion.
To demonstrate the robustness of the fused feature and visualize its representation ability, the raw sensor data without SAE extraction, the six statistical features further extracted by the SAE, and the six-node compressed composite feature fused by the proposed two-stage SAE model are projected onto 2-D feature maps using t-distributed stochastic neighbor embedding (t-SNE), as shown in Figures 11(a)-11(h). As shown in Figure 11(a), t-SNE fails to separate the five gearbox conditions using the raw multisensor data: most samples are mixed with each other, and only the healthy and unhealthy states are distinguishable, which may greatly degrade the accuracy of the faulty identification model. In Figures 11(b)-11(g), which show each of the six normalized features after one-stage SAE fusion, most samples are well separated, and only a few marginal samples of rotor unbalance, rotor friction, and rotor misalignment are mixed with each other. In Figure 11(h), all five conditions are clearly separated, in contrast to the raw sensor data in Figure 11(a) and the one-stage SAE features in Figures 11(b)-11(g). Therefore, the proposed six-node compressed feature can greatly improve the clustering ability of the data, because it contains compressed information from all six features rather than from a single feature, which is more effective for handling the nonstationary characteristics of faulty signals. The configuration of the SAEs used for feature extraction and feature fusion is listed in Table 2.

The configuration of the individual DBN classifiers is listed in Table 3. Each DBN has five layers in total; the input layer has 6 nodes, matching the dimension of the six-node compressed feature.
The output layer has 5 nodes, corresponding to the number of turbine gearbox health conditions. The overall testing accuracy curves and overall loss curves of each individual DBN and of the proposed ensemble learning network are shown in Figures 12(a) and 12(b). The proposed ensemble learning approach outperforms the other five individual DBN classifiers in terms of both overall testing accuracy and overall testing loss.
The experiment is repeated 10 times, and the average accuracy for each condition and the total accuracy are listed in Table 4. The proposed DBN-based ensemble learning approach achieves the highest average prediction accuracy for each health condition as well as the highest total accuracy. The variance is used as the metric to evaluate the stability of the faulty prediction model. From the variance of each classifier, the performance of the proposed ensemble learning approach remains stable in predicting each kind of

The comparison results of the different classifiers are listed in Table 5. The proposed DBN-based ensemble learning approach with the Bayesian weighting strategy achieves the highest total accuracy of 96.78%, while ensemble learning with the "winner takes all" strategy achieves a total accuracy of 95.32%. The total accuracies of the remaining four single classifiers are above 92%. Therefore, ensemble learning classifiers can be very effective for the faulty prediction of heavy machines with complex component-system structures.
In terms of stability, the proposed ensemble learning approach with the Bayesian weighting strategy achieves the lowest variance of 0.66, indicating stable accuracy across all machine conditions. Moreover, the "winner takes all" ensemble used for comparison, with a variance of 0.87, is also more stable than the other four single classifiers. A likely reason is that, although the "winner takes all" ensemble relies purely on majority voting and therefore cannot reflect the component-system affiliation relationship of the heavy equipment, it still aggregates the decisions of classifiers built on different components and thus reflects the component-system structure of the machine, which a single classifier cannot.
(2) Proving the Effectiveness of the Proposed Two-Stage SAE-Based Feature Fusion. To evaluate the proposed two-stage SAE-based feature fusion, we fix the classifier as the proposed DBN-based ensemble learning classifier with the Bayesian weighting strategy and compare three inputs: the raw sensor data, the six statistical features with SAE extraction, and the six-node compressed feature. The results are listed in Table 6. The proposed ensemble learning approach achieves the highest total accuracy of 96.78% with the six-node compressed feature as input. The six statistical features with SAE extraction achieve accuracies ranging from 82% to 87%, which are inferior to those of the six-node compressed feature, and the raw sensor data achieve the lowest total accuracy of 55.24%. This conclusion is consistent with the t-SNE visualization in Figure 11.

As shown in Figure 13, the crane trolley consists of four main subsystems, namely, the pinion bearing, drum shaft, retarder input shaft, and retarder output shaft. Ten vibration sensors, with serial numbers from PB-01 to ROS-03, are installed in the crane trolley; the sensor deployment is listed in Table 7. The sampling frequency in Case Study II is 24 kHz, and the sampling time is 1 s. The time window is set to 10, giving 2400 samples in total, each containing 10 sampling points.

Figure 14: Raw signal and six extracted normalized features of the pinion bearing: (a) raw signal, (b) impulse factor, (c) kurtosis, (d) skewness, (e) shape factor, (f) clearance factor, and (g) crest factor.
The details of the datasets of Case Study II are given in Table 8. There are four crane trolley conditions, namely, wheel biting, wire rope overwinding, brake failure, and healthy.

The configuration of the individual DBN classifiers for Case Study II is listed in Table 11. Since there are four crane trolley conditions, the number of output-layer units is set to 4. The overall accuracy curves and overall loss curves of each individual DBN and of the proposed ensemble learning network are shown in Figures 15(a) and 15(b). In Case Study II as well, the proposed ensemble learning approach outperforms the other four individual DBN classifiers in terms of overall testing accuracy and overall testing loss.

Journal of Sensors
The experiment is repeated 10 times, and the average accuracy for each condition and the total accuracy are listed in Table 12. The proposed ensemble learning approach outperforms the other four individual DBN classifiers in average prediction accuracy for each condition. Moreover, based on the variance evaluation in Case Study II, it is also more stable than the other four DBN classifiers in predicting the four crane trolley conditions.

Comparison with Other Methods.
As in Case Study I, the comparison experiments are repeated 10 times and evaluated from the input aspect and the prediction aspect, respectively.
(1) Proving the Effectiveness of the Proposed DBN-Based Ensemble Learning Classifier. In the prediction-aspect comparison of Case Study II, we fix the model input as the six-node composite feature, and the results are listed in Table 13. The proposed DBN-based ensemble learning approach with Bayesian weighting again outperforms all four single classifiers and the ensemble learning approach based on the winner-takes-all strategy, achieving the highest total accuracy of 97.02%.
Moreover, both ensemble learning models outperform the four single models in terms of stability, as indicated by their lower variance, which is consistent with the conclusion of Case Study I.
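To make the two aggregation rules concrete, the sketch below contrasts a winner-takes-all rule with a Bayesian weighting of classifier outputs. Both are assumptions: the paper's own Bayesian weighting formula is not reproduced in this section, so the sketch substitutes Bayesian-model-averaging-style weights, where each DBN's weight is proportional to its likelihood on a validation set under a uniform prior, and it reads "winner takes all" as adopting, for each sample, the prediction of the single most confident classifier. The probabilities are toy numbers, and the function names are hypothetical.

```python
import numpy as np

def winner_takes_all(probs: np.ndarray) -> np.ndarray:
    """Assumed interpretation. probs: [K, N, C] class probabilities from K classifiers.
    For each sample, adopt the prediction of the classifier with the highest top-class probability."""
    k_best = probs.max(axis=2).argmax(axis=0)          # [N] most confident classifier per sample
    rows = np.arange(probs.shape[1])
    return probs[k_best, rows].argmax(axis=1)          # [N]

def bma_weights(val_probs: np.ndarray, y_val: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Substituted BMA-style weighting (not the paper's formula): under a uniform prior,
    w_k is proportional to prod_i p_k(y_i | x_i), computed in log space for stability."""
    rows = np.arange(len(y_val))
    loglik = np.log(val_probs[:, rows, y_val] + eps).sum(axis=1)   # [K]
    w = np.exp(loglik - loglik.max())
    return w / w.sum()

def weighted_ensemble(probs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Fuse class probabilities as sum_k w_k * p_k and take the arg max."""
    fused = np.tensordot(weights, probs, axes=1)       # [N, C]
    return fused.argmax(axis=1)

# Two toy classifiers, three samples, two classes (illustrative numbers only).
probs = np.array([[[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]],
                  [[0.4, 0.6], [0.3, 0.7], [0.45, 0.55]]])
y_val = np.array([0, 0, 1])
w = bma_weights(probs, y_val)   # reusing the toy set as "validation" for brevity
print(winner_takes_all(probs), w.round(3), weighted_ensemble(probs, w))
```

On the second sample, winner-takes-all follows the second classifier's confident but wrong vote, whereas the weighted rule lets the better-calibrated first classifier dominate. In practice the weights would be estimated on data held out from both training and testing.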
(2) Proving the Effectiveness of the Proposed Two-Stage SAE-Based Feature Fusion. In the input-aspect comparison of Case Study II, we fix the classifier as the proposed DBN-based ensemble learning classifier with the Bayesian weighting strategy, and the results are listed in Table 14. As in Case Study I, the proposed ensemble learning approach achieves the highest total accuracy of 97.02% when the six-node compressed feature is used as the model input. The six fused features (impulse factor, kurtosis, skewness, shape factor, clearance factor, and crest factor) achieve lower average accuracies ranging from 81% to 86%, and the raw sensor data achieve the lowest total accuracy of 62.33%.
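The two-stage fusion evaluated above can be sketched as two layers of sparse autoencoders: stage 1 fuses the values of one statistical feature across the sensors of a component, and stage 2 compresses the fused features into a composite feature. The sketch below uses the standard sparse-autoencoder objective (reconstruction error, L2 weight decay, and a KL-divergence sparsity penalty with target activation `rho`), which may differ from the paper's exact formulation. The sensor count `S`, the composite size `N_COMPOSITE`, the hyperparameters `rho`, `beta`, and `lam`, and all function names are hypothetical. The weights are randomly initialized rather than trained, so the code only illustrates the data flow and the loss; training (for example, gradient descent on `sae_loss` for each autoencoder) is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_sae(n_in, n_hidden, rng):
    """Randomly initialized (untrained) single-hidden-layer autoencoder."""
    return {"W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.1, (n_hidden, n_in)), "b2": np.zeros(n_in)}

def encode(p, x):
    """Hidden-layer activations, used as the fused representation."""
    return sigmoid(x @ p["W1"] + p["b1"])

def sae_loss(p, x, rho=0.05, beta=3.0, lam=1e-4):
    """Standard SAE objective: reconstruction + L2 weight decay + beta * sum_j KL(rho || rho_hat_j)."""
    h = encode(p, x)
    x_hat = sigmoid(h @ p["W2"] + p["b2"])             # inputs assumed normalized to [0, 1]
    recon = 0.5 * np.mean(np.sum((x - x_hat) ** 2, axis=1))
    decay = 0.5 * lam * (np.sum(p["W1"] ** 2) + np.sum(p["W2"] ** 2))
    rho_hat = np.clip(h.mean(axis=0), 1e-8, 1 - 1e-8)  # mean activation of each hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + decay + beta * kl

# Hypothetical sizes: N samples, S sensors on one component, F = 6 statistical features.
rng = np.random.default_rng(0)
N, S, F, N_COMPOSITE = 200, 3, 6, 3
X = rng.random((N, S, F))                              # synthetic normalized features

# Stage 1 (sensor dimension): one SAE per feature fuses the S sensor values into one node.
stage1 = [init_sae(S, 1, rng) for _ in range(F)]
fused = np.hstack([encode(stage1[f], X[:, :, f]) for f in range(F)])   # [N, F]

# Stage 2 (feature dimension): one SAE compresses the F fused features into a composite feature.
stage2 = init_sae(F, N_COMPOSITE, rng)
composite = encode(stage2, fused)                      # [N, N_COMPOSITE]
print(fused.shape, composite.shape, sae_loss(stage2, fused) > 0)
```

Splitting the fusion this way keeps each stage-1 autoencoder small (one input per sensor), so the sensor-deployment variability is absorbed before features of different kinds are mixed in stage 2.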
The comparison result of Case Study II is consistent with that of Case Study I, indicating that the proposed faulty identification approach is also applicable in practice to the industrial crane trolley of Case Study II.

Conclusion
In this paper, a novel hybrid faulty prognostic approach based on sparse autoencoder- and deep belief network-based ensemble learning is proposed. The main contributions are summarized as follows:
(1) Introducing the SAE-based feature extraction method so that the features extracted from different sensors are merged into one stream, reducing the influence of different sensor deployments.
(2) Proposing the SAE-based feature fusion method so that the six kinds of extracted features are compressed into a composite feature, which is regarded as the health indicator of a target component. The constructed composite feature is more robust when dealing with nonstationary faulty vibration signals.
(3) Proposing the DBN-based ensemble learning so that the internal component-system relationship of the machine can be well represented. The outputs of the DBN classifiers are aggregated using the Bayesian weighting strategy, which represents the degree of affiliation between the health status of the target component and the probability of a given machine fault status.
The proposed hybrid faulty prognostic approach is evaluated on two case studies, a wind turbine gearbox and an industrial port crane trolley. Both case studies demonstrate that the proposed approach outperforms other traditional faulty identification methods in terms of prediction accuracy and prediction stability for industrial heavy machines, indicating the necessity of applying the two-stage SAE and the DBN-based ensemble learning.

Future Work.
Although the proposed hybrid faulty identification method has been evaluated on the two case studies presented here, some issues still need to be considered. First, the computational complexity of the proposed Bayesian weighting strategy used in the ensemble learning process should be taken into account. In future work, more suitable weighting strategies should be designed that not only reduce the computational burden but also reflect the component-system relationship. Moreover, the proposed hybrid faulty identification approach should be evaluated on other heavy machines, such as vehicle systems and aircraft engine systems, which also have highly complex internal component-system structures.

Data Availability
The raw/processed sensor data of the two case studies were provided by "Shanghai Electric Group" and "China Shipping Industrial Company," respectively. They cannot be shared due to corporate confidentiality.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.