Fault Diagnosis Using Data Fusion with Ensemble Deep Learning Technique in IIoT

Department of Computer Science Engineering, Saranathan College of Engineering, Trichy 620012, Tamilnadu, India
School of Mechanical Engineering, Vellore Institute of Technology, Vellore 632004, Tamilnadu, India
Department of Computer Science Engineering, Presidency University, Bengaluru 560064, Karnataka, India
Department of Electronics & Communication Engineering, Lingayas Vidyapeeth, Faridabad 121002, India
Department of Computer Science Engineering, Saranathan College of Engineering, Trichy 620012, Tamilnadu, India
Department of Computer Science, Asutosh College, West Bengal 700026, India
Department of Computer Science Engineering, Ambo University, Ambo, Ethiopia


Introduction
Connected devices are commonplace in the Internet of Things (IoT), a computing paradigm that relies on ubiquitous Internet connectivity. These smart things can sense their surroundings, transmit and analyze the data they collect from the environment, and then return relevant information in a form that humans can understand. Machine-to-machine (M2M) technologies with applications in industrial automation make up the Industrial IoT (IIoT), which is a subset of the IoT. Improved production efficiency and long-term viability are two key benefits of introducing the IIoT to industry [1,2]. For Industry 4.0, which is enabled by the integration of cloud technologies and cyber systems, a wide range of sensors is being installed throughout the industrial operating environment. Proactive maintenance and a reduction in unplanned downtime can be achieved through data analysis technologies [3][4][5][6][7].
If some measurements are missing owing to network or hardware issues in the IIoT, a working mechanism must be in place to handle them. The problem of value imputation becomes crucial when sensor data contain many missing values. With high-frequency data collection, a network outage leaves large gaps between data points, because all measurements taken during that period are lost. Data may be missing [8] because of a sensor failure or a network failure, or because attackers have maliciously removed data while it was being collected, processed, stored, or sent. Filling in the missing values is a related research challenge: the imputed values must be as close as feasible to the genuine values for the data analysis to be reliable. Because the collected data are so diverse, the methodologies created to deal with missing data in IoT systems must provide a high level of confidence for various applications and scale with the expanding deployments in the IoT (and IIoT) space. Additionally, real-time IoT applications require lightweight solutions [9].
In the literature so far, data imputation, anomaly detection, and fault classification have only been documented separately. Because each technique can be optimized individually, this study integrates all three to increase the monitoring system's overall performance. Sensor networks in the IIoT have three primary objectives, one being the extraction of relevant information for decision-making [10,11].
The raw data provided by IIoT sensors contain a lot of unnecessary and unclean data. Consequently, the raw sensor data must be cleaned before any meaningful information can be extracted from them [12,13]. In a constrained IoT sensor network, the vast amount of unwanted and worthless data can also lead to high computational expenses and overuse of resources [14][15][16].

Related Works
For chiller fault detection systems, Srinivasan et al. [15] showed the importance of explainable AI (XAI). Li et al. [16] developed one-dimensional convolutional neural networks (CNNs) for fault identification in HVAC systems. In addition to the interpretability research for HVAC systems, data-driven fault detection and diagnosis (FDD) approaches have been proposed for other mechanical and electrical (M&E) service systems. Kumar et al. [17] pioneered defect detection in sewerage systems. For object detection in images, the deep learning community relies on the CNN. Compared with other machine learning (ML) techniques, CNN-based image processing is more understandable to experts. Gonzalez-Jimenez et al. [18] evaluated the existing fault diagnosis methods for electric drives and re-examined the general process of using ML practices for electric drive fault detection. The lack of specific events for each electric drive is a major shortcoming of the data-driven FDD technique [18].
Marquez et al. [19] studied a microgrid's energy management system using a fault detection and reconfiguration process, in which a reconfiguration block received fault information derived from residuals. Microgrid fault detection is also discussed by Morato et al. [20]. For early detection of wind turbine breakdowns, Ruiming et al. [21] proposed combining SCADA data with a dynamic network marker. Hussain et al. [22] used radial basis functions with two input parameters to detect faults in solar systems. Most solar energy applications are concerned with fault detection in solar systems rather than fault diagnosis in solar power facilities. To better predict solar hot water system performance under various weather conditions [23,24], multiple deep learning models have been constructed that look for deviations between predictions and observations. A significant advantage over prior approaches is the ability to isolate problems with the collectors' optical efficiency, flow rate, and thermal losses. Optical efficiency problems are caused by collectors that are dirty or subject to external influences, or that are deteriorating, breaking down, corroding, or otherwise degrading, whereas flow rate problems are caused by a loop that is out of balance relative to the rest of the plant [25]. It is assumed, as is the case in most real plants, that the only flow meter for the whole system is situated at the pump. Thermal losses are caused by dirt, wear, insulation failures, and pipe breakage. However, a temperature drop could be caused either by an incorrect reading of the loop flow rate or by a broken pipe. Since the remedy for each case is different, it is important to locate the problem in order to fix it quickly. Since a flow meter replacement is costly, it should only be done when there is confidence that the flow meter is the faulty component.

Proposed System
IIoT sensors generate data either continuously or in response to an external event. The next phase involves the collection, aggregation, analysis, and visualization of the data generated by sensor nodes. This information is subsequently translated into a form that can be communicated as a response to an external stimulus. Data from IIoT sensors have the following notable properties:
Technical constraints: the sensor's small size limits its computing power, storage capacity, and memory. Consequently, sensor data may be lost or incorrect information may be obtained if these devices are attacked or fail to operate.
Real-time processing: in the future, the sensor network will be able to execute increasingly complicated networking activities and perform real-time transformations of raw sensor data.
Scalability: a real-world sensor network is made up of a variety of sensors and actuators. As the number of sensors and actuators continues to grow, so does the need for scalable sensor networks that can handle the increasing volume of data.
Data representation: sensor data are often stored as a small tuple containing structured information. Sensor data can be represented in a variety of ways, including Boolean, binary, featured, continuous, and numeric.
Heterogeneity: data from IIoT sensors range widely, from rigorously formatted datasets to real-time information systems.

Denoising.
IIoT networks create a lot of sensor data, which need to be analyzed and used for real-time decision-making. Sensor data have a wide range of characteristics, including high velocity, large volume, and a wide range of dynamic values and value types. These characteristics pollute the data and complicate real-time decision-making while the sensor data are being collected and analyzed. Noise, an uncorrelated signal component, causes unwanted changes to the signal's original vectors. Because of the noise, resources are wasted on processing and using unusable data. Wavelet transform methods make it possible to characterize the signal accurately and to solve the problem of signal estimation. The wavelet transformation reduces the signal's noise while keeping the original signal coefficients intact. This is accomplished through a thresholding approach that is optimized for low-coefficient noise signals. Wavelet transformation is widely used to analyze and synthesize continuous-time signal energy.
To express the signal energy, we can use a signal e(t), t ∈ R, whose energy is finite:
$$\int_{-\infty}^{\infty} |e(t)|^{2} \, dt < \infty. \tag{1}$$
In (1), the signal e(t) must lie within the space of square-integrable functions L^2(R) for its energy to satisfy this requirement. Analysis of discrete-time signal energy can also be done using the wavelet transformation.
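The thresholding idea described above can be sketched for a discrete-time signal as follows. This is a minimal illustration, not the authors' implementation: it assumes a single-level orthonormal Haar wavelet, soft thresholding of the detail coefficients, and a universal threshold with a median-based noise estimate, none of which is specified in the text. The function names are hypothetical.

```python
import math
import statistics


def energy(signal):
    """Discrete signal energy: the sum of squared samples."""
    return sum(x * x for x in signal)


def haar_forward(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward and return the reconstructed samples."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / s, (a - d) / s))
    return out


def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr; small ones become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def denoise(signal, thr=None):
    """Threshold the detail coefficients and reconstruct the signal."""
    approx, detail = haar_forward(signal)
    if thr is None:
        # Assumption: universal threshold with a median-based noise estimate.
        sigma = statistics.median([abs(d) for d in detail]) / 0.6745
        thr = sigma * math.sqrt(2.0 * math.log(len(signal)))
    return haar_inverse(approx, soft_threshold(detail, thr))
```

Because the Haar transform used here is orthonormal, the energy of the approximation and detail coefficients together equals the energy of the input, so removing small detail coefficients removes only a small share of the signal energy. A production system would more likely use a multi-level transform from a wavelet library.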

Missing Data Imputation.
When dealing with missing data, imputation is a necessary preprocessing step [26]. Various sectors and fields such as smart cities, healthcare, GPS, and smart transportation rely heavily on data generated by the Internet of Things (IoT). Algorithms for analyzing IIoT data typically presume that the data are complete before the analysis begins. Analytics conducted on partial or missing IoT data can give unreliable results, so an estimate of each missing value is required in the IIoT. Three steps must be carried out. The first step is to identify the cause of the missing data, which can result from poor network connectivity, defective sensor systems, environmental conditions, and synchronization problems. Data that are missing completely at random (MCAR) are the most common type of missing data, whereas data that are not missing at random (NMAR) are the most difficult to identify. The next step is to look at the patterns of the missing data: a random missing pattern is denoted RMP and a monotone missing pattern is denoted AMP. The last step is to impute the missing values in the IIoT datasets.
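As an illustration of the final imputation step, the sketch below fills gaps in a single sensor series. It assumes linear interpolation between the nearest observed neighbors, with the nearest observed value copied at the edges; the text does not name the imputation method actually used, so this is only one simple option. Missing readings are represented by `None`, and the function names are hypothetical.

```python
def missing_fraction(series):
    """Share of readings that are missing (None)."""
    return sum(1 for v in series if v is None) / len(series)


def impute_linear(series):
    """Fill None gaps by linear interpolation between observed neighbors."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            # Gap at the start: copy the first observed value.
            filled[i] = series[right]
        elif right is None:
            # Gap at the end: copy the last observed value.
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled
```

For example, `impute_linear([1.0, None, None, 4.0])` fills the two gaps with values on the straight line between 1.0 and 4.0. Interpolation of this kind suits short gaps in slowly varying signals; long outages may need a model-based approach.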

Data Outlier Detection.
Sensor nodes in the IIoT sensor network are widely dispersed and diverse. In a real-world physical setting, such a design exposes sensor nodes to significant failure and risk owing to a variety of outside influences. As a result, the IIoT sensor network's original data are vulnerable to manipulation, leading to data outliers [27]. Data outliers must be identified before data analysis or decision-making can take place.
Voting mechanism: an aberrant sensor node can be recognized by comparing its readings with those of neighboring sensors. According to Shahraki et al. [28], it is the norm in sensor network applications to model data generation with a Poisson distribution. Short-term, nonperiodic, and insignificant variations in data patterns produce outliers in the IoT sensor network's data sets. Outliers in Poisson-distributed IoT sensor network data can be easily identified using the standard deviation and the boxplot. In a distributed setting, the data generated by the suspected faulty sensor node and its nearby neighbors are also compared using Euclidean distances. The data from this node are considered an outlier if the estimated deviation exceeds a predetermined threshold. This approach is simple and convenient to use, but it relies heavily on the proximity of the sensor nodes to each other, and its precision is low in sparse networks.
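The two checks mentioned above, the boxplot rule and the neighbor comparison by Euclidean distance, can be sketched as follows. The whisker factor of 1.5, the majority rule, and the function names are assumptions made for illustration; the text specifies only that a predetermined threshold is used.

```python
import math
import statistics


def boxplot_outliers(values, k=1.5):
    """Indices of readings outside the boxplot whiskers (k times the IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def neighbor_vote(node_reading, neighbor_readings, threshold):
    """Flag a node when most neighbors lie farther away than the threshold.

    Each reading is a tuple of measurements; distance is Euclidean.
    """
    votes = sum(
        1 for nb in neighbor_readings if math.dist(node_reading, nb) > threshold
    )
    return votes > len(neighbor_readings) / 2
```

A node reporting `(35.0, 1.0)` while its three neighbors report values near `(20.0, 1.0)` would be flagged with a threshold of 2.0, whereas a node reporting `(20.0, 1.0)` would not. As the text notes, the vote is only as reliable as the assumption that nearby nodes observe similar conditions.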

Data Fusion.
Data from several sensors must be integrated, or fused, to increase the accuracy of various applications. Combining data from multiple sensors into an accurate, reliable, and trustworthy representation of the dynamic system's state is known as sensor fusion. This estimate is more accurate than one obtained by using the sensors one at a time. Sensor fusion aims to lower the system's cost, complexity, and number of parts while also improving the system's sensing precision and confidence. It is a multifaceted approach.
System states, such as acceleration and distance, can be obtained from sensors or from mathematical models. In addition to increasing the quality of data, sensor fusion can also boost dependability, estimate unmeasured states, and expand the coverage area.
In this method, all sensors are connected and their data are combined directly, so fusion occurs at the data level. Once all of the sensors' data have been combined, data features can be extracted, and the objects monitored by the sensors can be identified using these features. Because the identities associated with numerous sensors are declared jointly, this technique of direct fusion is also known as joint identity declaration. Equations (2)-(5) show the formal design of the direct fusion process.
The feature extraction result is subjected to the identity declaration function (g) in order to identify the specific sensor data (P). Finally, the joint identity declaration function JID(.) produces the result Q, which is the outcome of joint identification in the direct fusion strategy.
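The flow of direct fusion described above (combine raw data, extract features, declare an identity) can be sketched as a small pipeline. Since Equations (2)-(5) are not reproduced here, the mapping to g, P, Q, and JID(.) is an assumption: the features chosen (mean, RMS, peak), the RMS-based decision rule, and all function names are hypothetical placeholders for whatever the formal design specifies.

```python
import math


def fuse_raw(sensor_windows):
    """Data-level fusion: join the raw samples of every sensor into one vector."""
    fused = []
    for name in sorted(sensor_windows):  # fixed order keeps the result reproducible
        fused.extend(sensor_windows[name])
    return fused


def extract_features(fused):
    """Features computed on the fused vector (assumed: mean, RMS, peak)."""
    n = len(fused)
    return {
        "mean": sum(fused) / n,
        "rms": math.sqrt(sum(x * x for x in fused) / n),
        "peak": max(abs(x) for x in fused),
    }


def declare_identity(features, rms_limit):
    """Stand-in for the identity declaration g: a simple RMS rule."""
    return "fault" if features["rms"] > rms_limit else "normal"


def joint_identity_declaration(sensor_windows, rms_limit):
    """Stand-in for JID(.): fuse, extract features, then declare the identity."""
    return declare_identity(extract_features(fuse_raw(sensor_windows)), rms_limit)
```

The point of the sketch is the ordering: fusion happens on raw samples before any feature is computed, which is what distinguishes data-level fusion from feature-level or decision-level fusion.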

Fault Classification Using Proposed Model.
An ensemble deep learning model for fault diagnosis uses these fused datasets as input.
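The text does not state how the base models' outputs are combined. One common option, shown here purely as an assumption, is soft voting: the fault probabilities from each base model are averaged (optionally with weights) and the average is thresholded. The function names and the 0.5 threshold are hypothetical.

```python
def soft_vote(model_probs, weights=None):
    """Weighted average of per-sample fault probabilities across models.

    model_probs holds one list of probabilities per base model,
    all of the same length (one entry per sample).
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_samples = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_samples)
    ]


def to_labels(probs, threshold=0.5):
    """Convert probabilities to 1 (fault) or 0 (normal)."""
    return [1 if p >= threshold else 0 for p in probs]
```

With three base models predicting `[0.9, 0.2]`, `[0.8, 0.4]`, and `[0.7, 0.6]` for two samples, the averaged probabilities are about 0.8 and 0.4, so the first sample is labeled a fault and the second normal even though one model disagreed on the second sample.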

Deep Neural Network (DNN).
Deep neural networks (DNNs) are a class of artificial neural networks (ANNs) built from stacks of layers. They can be used for supervised as well as unsupervised learning [29]. The weights of a DNN model are stored in its hidden layers and are constantly recalibrated during training as the network processes new information. The weights are adjusted in order to find more accurate patterns. The researcher does not need to indicate any patterns in advance for a DNN to learn. A subfield of machine learning known as "representation learning" (sometimes called "feature learning") underpins deep learning techniques [30]. In contrast to conventional machine learning algorithms, which need the researcher to select features manually before they can be employed, these approaches select features automatically.
There are four fully connected layers in the DNN architecture depicted in Table 1. According to [31], deep learning models are constructed by combining layers that are compatible and allow for effective data manipulations. Each layer of a deep learning model accepts input tensors and produces output tensors of one specific shape. According to [31], there is no need to be concerned about the suitability of the connecting layers because they are created to match the shape of the incoming layer. The tensor dimensions returned by a layer are known as its output shape. Table 1 shows that the first hidden layer of the DNN returns a tensor with dimensions of n × 1 (None, 64). This first hidden layer has 64 neurons/units, and its output shape is (None, 64). The output shape of the first layer is automatically inferred as the input shape of the second layer. To support a variable batch size, the batch ("mini-batch") dimension is dynamic and shown as None, allowing the user to specify any batch size for the deep neural network. Except in the most extreme circumstances, it is unnecessary to fix the first dimension, None, at this stage. The batch size is determined during the fit or prediction phase.
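The shape propagation described above can be illustrated without a deep learning framework. In the sketch below, only the 64 units of the first hidden layer come from the text; the remaining layer widths and the input dimension in the usage note are hypothetical, since Table 1 is not reproduced here.

```python
def dense_output_shapes(units_per_layer):
    """Output shape of each dense layer; None stands for the batch size."""
    return [(None, units) for units in units_per_layer]


def dense_param_counts(input_dim, units_per_layer):
    """Trainable parameters per dense layer: weights plus biases."""
    counts = []
    previous = input_dim
    for units in units_per_layer:
        counts.append(previous * units + units)
        previous = units  # this layer's output width feeds the next layer
    return counts
```

For hypothetical widths `[64, 32, 16, 1]`, the output shapes are `(None, 64)`, `(None, 32)`, `(None, 16)`, and `(None, 1)`. Each layer's input width is simply the previous layer's output width, which is why only the first layer needs an explicit input dimension.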
To prevent the models from overfitting, dropout layers are used after the dense layers [32]. To stabilize the learning process and substantially reduce the number of training epochs, the design uses batch normalization. The DNN learning process includes two important stages. In the first stage, forward propagation, the input layer delivers the raw training data to the network. Neurons in the hidden layers process the data and pass them on toward the output layer, which generates the output data. In the second stage, the error signal is propagated back through the network. Nonlinear functions, known as activation functions, are used to transfer each layer's output to the next layer. The logistic function, the hyperbolic tangent, and the rectified linear unit (ReLU) are all examples of activation functions. These functions transform the input signal of a DNN node into an output signal. This study employs the ReLU activation function, which significantly decreases training time and provides faster computation and convergence [33]. In deep learning, ReLU generally outperforms the sigmoid and tanh activation functions in terms of performance and generalization. The final layer of the DNN for multiclass models is a softmax layer, which outputs the probability associated with each class. The softmax layer used for K-class classification is defined as follows [34]:
$$f(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}, \quad i = 1, \ldots, K.$$
The softmax function f(x_i) generates outputs ranging from 0 to 1 that sum to 1. For binary classification, a sigmoid function is used as the final layer to generate probabilities ranging from zero to one. Each new/test unit's propensity score is calculated using these probabilities.
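The activation functions named above can be written in a few lines. This is a plain-Python sketch for illustration rather than the framework implementation used in the study; the max-subtraction in `softmax` and the two-branch form of `sigmoid` are standard numerical-stability measures added here.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(scores):
    """Softmax over K class scores; outputs lie in (0, 1) and sum to 1."""
    m = max(scores)  # subtracting the maximum does not change the result
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted class for a multiclass model is the index of the largest softmax output, while for the binary case the sigmoid output is read directly as the probability of the positive class.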

PropensityNet.
According to [35], PropensityNet (PN) is a deep neural network capable of predicting propensity scores. Table 2 shows that PN has five dense layers, all of which are fully connected. To tackle a binary classification problem, PropensityNet uses Adadelta [36] as the optimizer and binary cross-entropy as the loss function. As the final layer, a sigmoid function is employed to provide probabilities ranging from 0 to 1. For each new/test unit, these probabilities serve as a measure of its propensity score. PropensityNet was built using Keras with a TensorFlow backend in R [37].
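For reference, the binary cross-entropy loss mentioned above can be computed as in the following sketch. It is a plain-Python illustration of the standard definition, not the Keras implementation; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

The loss is small when the sigmoid output is close to the true label and grows quickly for confident wrong predictions, which is what drives the optimizer toward well-calibrated propensity scores.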

CNN-LSTM.
Long short-term memory networks (LSTMs) [38], a type of recurrent neural network (RNN), have typically been used to learn long-term dependencies. For nontemporal/sequential data, we employ a hybrid model that blends the CNN and LSTM models. We then check whether the hybrid model can be adapted to predict the probability of class membership (propensity scores). To obtain probabilities (propensity scores) between zero and one, a sigmoid function is used as the last layer. Li et al. [39] provide a thorough explanation of CNN-LSTM. Table 3 shows the hybrid CNN-LSTM model's design.
In the CWRU dataset, electric sparks were used to damage the motor bearings, replicating actual bearing failures. The fault locations were the inner raceway, the rolling element, and the outer raceway of the bearing at the drive end or the fan end. Various forms of fault could be represented by the four available fault diameters. Each of the four load types and four speeds of the bearing motor represented a different condition. For the bearing motor depicted in Figure 1, see [39]. Experimental data were drawn from a subset of the CWRU dataset. Normal-state data were gathered at a motor load of 1 horsepower and a speed of 1772 rpm. The sampling frequency was 12,000 samples per second and the fault diameter was 0.007 inches. The drive end had faults in three places: the rolling element, the outer raceway, and the inner raceway.

Evaluation Metrics.
There are many ways to evaluate the performance of a proposed approach for fault diagnosis.
The confusion matrix, a two-dimensional matrix that provides information about the actual and predicted classes, serves as the basis for all evaluation criteria. Correct predictions are represented by the diagonal elements of the confusion matrix, and incorrect predictions by its off-diagonal elements. The confusion matrix attributes are shown in Table 4.
Beyond the confusion matrix itself, recent research has employed a variety of evaluation metrics.
Precision: the number of samples correctly predicted as faults divided by the total number of samples predicted as faults.
Recall: the number of samples correctly classified as faults divided by the total number of samples that are in fact faults. It is also called the detection rate.
False alarm rate: also known as the false positive rate, it is the proportion of normal samples that are incorrectly classified as faults.
True negative rate: the proportion of normal samples that are correctly classified as normal.
Accuracy: the number of correctly classified samples divided by the total number of samples. When a dataset is balanced, detection accuracy is a helpful performance measure.
F-measure: the harmonic mean of precision and recall. It evaluates the system by taking both its precision and its recall into account.
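These metrics follow directly from the four counts of a binary confusion matrix. The sketch below uses the standard definitions with "fault" treated as the positive class; the function name and the example counts in the usage note are hypothetical, and the code assumes every denominator is nonzero.

```python
def fault_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts (fault = positive class).

    tp: faults predicted as faults     fp: normal predicted as faults
    fn: faults predicted as normal     tn: normal predicted as normal
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "false_alarm_rate": fp / (fp + tn),
        "true_negative_rate": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f_measure": 2 * precision * recall / (precision + recall),
    }
```

For example, with 90 true positives, 10 false positives, 5 false negatives, and 95 true negatives, precision is 0.9 and accuracy is 0.925. Because the F-measure is a harmonic mean, it always lies between precision and recall, which gives a quick consistency check on reported results.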

Performance Evaluation of the Proposed Model.
Here, the performance of the proposed ensemble model is tested with different training and testing splits. Initially, 50% of the data are used for training and 50% for testing; the results are shown in Table 5. Figure 2 presents the comparative analysis of the proposed model with different techniques in terms of accuracy for different data ratios.
In terms of accuracy, CNN, DNN, and CNN-LSTM achieved 92%, LSTM and RNN achieved nearly 85%, PropensityNet achieved 93%, and the proposed model achieved 94.32%. The single classifiers performed worse individually than when combined in the ensemble. The reason is that the training data are accurate, each technique predicts faults accurately, and the predictions are combined effectively. However, recall is low even for the proposed ensemble model (93.24%), because the model is trained on only 50% of the data and tested on the remaining 50%. In addition, some data are missed while training the model. In terms of precision, all techniques achieved nearly 88% to 90%, and they achieved nearly 92% to 96% F-score. The next set of experiments uses 75% of the data for training and 25% for testing; the results are shown in Table 6. When the training data are increased, the performance of the models increases; for instance, the proposed ensemble model achieved 95.62% accuracy, 98.32% precision, 94.62% recall, and 94.53% F-score. Among all techniques, RNN achieved the lowest performance, i.e., 85.71% accuracy, 84.32% precision, 85.93% recall, and 83.45% F-score. The reason for its poor performance is that the RNN takes a long time to train and handles missing data inefficiently. Moreover, the raw data are fused by direct fusion techniques and then used to identify faults in machines. All proposed single classifiers, namely DNN, CNN-LSTM, and PropensityNet, achieved nearly 94% of accuracy, 96% of precision, 92% of recall, and an F-score. The LSTM and CNN models achieved 92% accuracy, 93% precision, 92% recall, and 91% F-score. Finally, the training set is increased to 80% and the testing set is reduced to 20%; the results are shown in Table 7.
Most of the techniques' recall and F-score are nearly the same; for instance, RNN

Conclusion
As artificial intelligence technology continues to advance, it is now possible to foresee mechanical failures based on the IIoT. Sensor data fusion relies on big data processing and analysis. This research examined models and methods for sensor data fusion in fault detection and prediction. Among fusion models, the direct fusion model is presented here. Ensemble methods based on deep learning can be applied directly to the original data for training. Data preprocessing is not always necessary, but without it the learning curve was steep and the machine performance requirements were high. Because of this, the preprocessing stage includes missing data imputation and outlier detection. Results from the experiments demonstrate that the proposed ensemble model achieved 94% accuracy with a 50%-50% split, 95.6% accuracy with a 75%-25% split, and 98% accuracy with an 80%-20% split, whereas the single DL models achieved approximately 96% accuracy with the 80%-20% split.

Limitation and Future Scope
The following obstacles and difficulties are encountered when fusing sensor data, based on the current development state of fusion models:
(1) Fusion models are not all the same: there is no one-size-fits-all model for mechanical fault diagnosis and prediction in the field. A large number of current fusion models are based on a certain type of device. Developing a common framework for identifying mechanical equipment failures would be advantageous in the future.
(2) Uncertainty in the original data: because environmental factors cannot be controlled during data gathering, the data actually obtained contain a lot of noise. Data fusion and feature extraction are often incorrect if the raw data are used directly. It is therefore vital to select a suitable data preprocessing approach, rather than the techniques used in this study, when raw data are given. A set of preprocessing methods for the diverse sensors used in fault analysis and prediction for mechanical equipment will be beneficial in future development.
(3) Long running time: finding appropriate hyperparameters requires a lot of running time when using fusion methods based on deep learning. Overfitting can also occur. Fusion techniques typically necessitate manual feature extraction, which adds to the computation time. Most research on fusion algorithms focuses on feature-level and decision-level fusion, and there are very few data-level fusion algorithms. As a result, it will be necessary to continue working on data fusion algorithms in the future.

Data Availability
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy concerns.

Conflicts of Interest
The authors declare that they have no conflicts of interest.