The Importance of Feature Processing in Deep-Learning-Based Condition Monitoring of Motors

National Centre of Robotics and Automation, HHCMS Lab, Mehran University of Engineering & Technology, Jamshoro 76020, Pakistan
Department of Mechatronic Engineering, Mehran University of Engineering & Technology, Jamshoro 76020, Pakistan
School of Electronics and Computer Science, Southampton University, SO32 1PH, Southampton, UK
Department of Electrical Engineering, DHA Suffa University, Karachi, Pakistan
Department of Electronic Engineering, Mehran University of Engineering & Technology, Jamshoro 76020, Pakistan
School of Information Technology and Engineering (SITE), Melbourne Institute of Technology, Melbourne, Australia


Introduction
Condition monitoring is described as a continuous process of diagnosis that allows prevention of unintended failure of a system. The basic principle of condition monitoring is to indicate the occurrence of deterioration by taking physical measurements at regular intervals. Subsequently, diagnosis procedures allow the planning of rectification strategies [1][2][3][4]. The primary reasons for the application of condition monitoring are increased availability, prevention of damage, increased reliability, process optimization, increased time between outages, reduced production loss, and better operator information or insights. Its application leads to the development of prognostics, which allows for the estimation of the system's future health and the prediction of the remaining useful life of the system or its components [5][6][7][8]. Motors are the backbone of industry; they degrade for reasons such as long periods of operation, variations in power supply, or harsh environments, which gradually lead to permanent damage [9][10][11]. Consequently, it becomes crucial to monitor their operation continuously.
In the past, there has been extensive research relating to anomaly detection, anomaly severity level detection, and detecting failing elements of the motor. Subsequently, efforts have been made to integrate these diagnosis and prognosis methods, which in turn increases the amount of data. Although condition monitoring system integration improves performance and increases the data volume (providing richer information), it introduces shortcomings such as increased complexity in correlating information and an increased level of uncertainty [12]. Consequently, novel approaches are required that can address these shortcomings and improve performance. AI-based approaches have been extensively used in the field of condition monitoring for many years [13][14][15][16]. With continuous progress in various models of AI, the field progressed into machine learning (ML) and subsequently into deep learning (DL), which has had a significant impact on the development of modern industry, transportation, medicine, and other domains. The DL-fuelled condition monitoring of motors has opened new horizons in industry 4.0 and has been paving the way for industry 5.0. DL algorithms have impacted almost every area including business [17], medical sciences [18], natural language processing (NLP) [19], robotics [20], transportation [21], the power sector [22], and many other sectors of the modern world. The concept of DL was first introduced as "deep belief networks" (DBN) in 2006 and was considered one of the major breakthroughs in the world of technology [23]. The ability to learn data representations becomes significant with the application of DL models and makes them very attractive in the arena of intelligent diagnosis and prognosis [24,25]. In comparison to conventional ML models, which can require significant effort in manual feature design and optimization, DL models can automatically extract representations from the data.
In recent years, DL-based diagnosis and prognosis methods have outperformed conventional machine learning algorithms owing to their generalized nature and other advantages such as end-to-end implementation, model upgradability, and representation learning from raw data. DL does not require human knowledge or intervention in feature design. Its performance improves with the volume of data, but it requires high-performance computing hardware such as graphics processing units (GPUs) to perform the large number of necessary matrix multiplication operations. It advocates end-to-end problem-solving rather than dividing the problem into steps. DL-based diagnosis reduces cross-domain discrepancy by learning data representations with multiple levels of abstraction [26][27][28][29]. Compared to ML models, DL models can achieve superior performance, and their reported classification accuracies have been approaching 100%. These benefits of DL models have attracted the attention of researchers, who have been extensively applying these models in their domains. Various DL-based condition monitoring methods for motors have been reported by researchers [30,31]. Meanwhile, extensive adoption of DL methods has enabled industry to progress with better efficiency and sustainability [32]. DL algorithms have efficiently predicted the state of motors in industrial condition monitoring systems with a wide scope. In fact, some models such as GANs can generate data by learning a model of the input distribution, which is useful when the dataset available for diagnosis is small. These algorithms have been in the limelight owing to their merits relating to industry 4.0 and industry 5.0 [27,33]. DL is reshaping itself through variations in model architecture, which in turn is reshaping condition monitoring systems by adding capabilities such as high recognition accuracy, noise-handling capability, and end-to-end system implementation.
DL-based intelligent diagnosis of various industrial subsystems has witnessed remarkable improvement in performance [28,34]. However, the disadvantage of DL models lies in feature processing and the selection of parameters such as the learning rate, momentum, and number of neurons in layers. Parameter selection is always a time-consuming and challenging task, which is often achieved through hit-and-trial methods [29,35].
There have been different reviews and surveys conducted by researchers relating to condition monitoring of motors and the application of DL in this field [11][36][37][38][39][40]. However, each review has been conducted in a different context. For example, in [40] the authors have surveyed applications of ML and DL in condition monitoring of various machines, with vibration data as the key factor for the surveyed studies. Meanwhile, Choudhary et al. [11] have conducted a review of various faults that occur in induction motors based on various input data types. The authors have also briefly reviewed the techniques used in condition monitoring of induction motors such as fuzzy logic, artificial neural networks, neuro-fuzzy inference systems, and support vector machines. They found that noninvasive techniques such as thermal imaging are overtaking the conventional condition monitoring methods. On the other hand, in [41] the authors have briefly reviewed various condition monitoring techniques, particularly those based on current, vibration, and acoustic signals. There is therefore still a need for a comprehensive review of the applications of DL models in condition monitoring of motors. Furthermore, the article presents open challenges and future research directions to promote the application of DL models in engineering scenarios. Compared to the existing reviews, this review focuses on input data and feature-processing techniques used for effective fault diagnosis in the field of DL-based condition monitoring of motors. It surveys and summarizes the recent developments in actual applications of various feature-processing techniques in DL-based condition monitoring of motors.
The contribution of this article can be summarized as follows:
(i) Provides an integrated overview of current trends and consolidates the recent work of various researchers related to feature processing in DL-based condition monitoring of motors
(ii) Endeavours to provide an in-depth analysis approach and a valuable road map to engineers and researchers working in this field, which may assist them in signposting the direction of future research
(iii) Presents the merits and limitations of DL for motor condition monitoring methods based on state-of-the-art contributions reported in the literature
The remainder of this article is organized as follows. Section 2 reviews feature processing methods used with DL models in condition monitoring of motors, and the techniques used to resolve problems posed by such methods; Section 3 summarizes and discusses performance aspects of models, highlights challenges related to DL, and presents future directions of this field; and Section 4 provides concluding remarks on this review article.
Mathematical Problems in Engineering

Feature Processing for DL-Based Condition Monitoring of Motors
In DL-based motor condition monitoring, data are acquired using various sensors and stored in a database. Typically, vibration, acoustic emission, current and temperature signals are used to monitor the condition of a machine [42].
Each diagnostic method has different capabilities for detecting various types of faults in motors. In the next step, data are preprocessed, and a model is built. Subsequently, the model is trained on data. The end goals or targets are typically defined as fault detection, failure prediction, and remaining useful life (RUL) estimation but are not limited to these cases. Figure 1 illustrates the generalized concept of the DL model pipeline and its comparison with the machine learning (ML) model pipeline. Feature processing is applied to extract buried features in noisy input data through feature extraction and feature selection techniques. After performing feature processing, data are fed to various ML models such as support vector machine (SVM), decision trees (DT), random forest (RF), and K-nearest neighbour (KNN). Moreover, feature processing assists in identifying and removing outliers and redundant information from the dataset and converting raw data to more manageable groups for processing. In addition, it reduces the dimensionality of large datasets, which in turn speeds up the learning process [43][44][45]. Feature processing, which includes both feature extraction and selection techniques, is very important for deploying reliable ML-based solutions for industry. It not only simplifies the entire process of tackling a domain-specific problem but also makes the process understandable for human experts. Reducing the number of features, or enabling the selection of useful ones, greatly reduces the hardware dependence and the need for highly nonlinear function mapping by the ML models. Feature extraction techniques include:
(i) Time-domain features, such as statistical parameters, including mean, standard deviation, root mean square (RMS), covariance, kurtosis, and skewness.
(ii) Frequency-domain features, including fast Fourier transform (FFT), spectral kurtosis, root variance frequency, and mean frequency.
(iii) Time-frequency domain features, including wavelet transform (WT), short-time Fourier transform (STFT), Hilbert-Huang transform (HHT), Hilbert transform (HT), and empirical mode decomposition (EMD).
These techniques provide DL models with a meaningful representation of raw data and simplify the decision boundaries that classification models must identify. In the case of regression-based models, these techniques help lower the order of the function to be mapped by DL models, which results in computational efficiency and robust real-time deployment. Popular feature selection techniques are principal component analysis (PCA), linear discriminant analysis (LDA), and independent component analysis (ICA) [41,46]. These techniques are particularly known for their ability to reduce the dimensionality of the dataset. PCA produces a concise approximate representation of a large set of features and reduces the feature space. This helps not only in optimization but also in enhancing the reliability and explainability of DL models by making visualization of the feature space possible.
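As an illustration of the time-domain and frequency-domain features listed above, the following is a minimal sketch using only the Python standard library. The function names, the 64 Hz sampling rate, and the synthetic 8 Hz test signal are assumptions made for this example and do not come from any of the surveyed studies; the naive discrete Fourier transform stands in for an FFT and is only practical for short windows.

```python
import cmath
import math


def time_domain_features(x):
    """Return common statistical descriptors of a 1-D signal."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # population variance
    std = math.sqrt(var)
    rms = math.sqrt(sum(v * v for v in x) / n)
    # Standardised third and fourth moments (kurtosis here is not excess kurtosis).
    skewness = sum((v - mean) ** 3 for v in x) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * std ** 4)
    return {"mean": mean, "std": std, "rms": rms,
            "skewness": skewness, "kurtosis": kurtosis}


def magnitude_spectrum(x):
    """One-sided DFT magnitude (naive O(n^2) transform, adequate for short windows)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def mean_frequency(x, fs):
    """Magnitude-weighted average frequency in Hz."""
    mag = magnitude_spectrum(x)
    freqs = [k * fs / len(x) for k in range(len(mag))]
    return sum(f * m for f, m in zip(freqs, mag)) / sum(mag)


# Hypothetical example: one second of an 8 Hz sine sampled at 64 Hz.
fs = 64
signal = [math.sin(2 * math.pi * 8 * t / fs) for t in range(fs)]
features = time_domain_features(signal)
features["mean_frequency"] = mean_frequency(signal, fs)
```

A feature vector of this kind is what would be passed to a classifier in the ML pipeline; feature selection steps such as PCA would then operate on a matrix of such vectors, which this sketch does not cover.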
DL models can learn representations from raw input data and can process them accordingly for supervised or unsupervised tasks as shown in Figure 1. Generally, this does not involve steps such as feature extraction and feature selection. Hence, DL-based algorithms have been extensively used in the field of condition monitoring of motors. The various architectures of DL that have been used in motor condition monitoring include deep belief networks (DBN), multilayer perceptron (MLP), autoencoders (AE), deep Boltzmann machine (DBM), convolutional neural networks (CNNs), recurrent neural networks (RNN), and generative adversarial networks (GANs). Each of these architectures was developed keeping in mind the ways data could be presented to them. For example, an MLP is designed to learn from tabular data, a CNN is traditionally used to map a 2D field (which in most cases is an image) to an output variable, and an RNN was similarly designed to handle sequences of data, making it an attractive paradigm for time-series forecasting or speech-processing tasks. However, these architectures do allow a certain level of flexibility and have been used for alternative tasks. For example, an image could be broken into a sequence of pixels and presented as tabular data to an MLP, but this comes at the cost of explainability. Visualization of a row of image pixels in feature space would be incomprehensible.
This places more emphasis on the feature-processing techniques to be used along with a given architecture, because proper feature formation is necessary for even considering the trade-off between accuracy and explainability of the models. Researchers have made efforts to optimize the performance of these methods through different feature-processing techniques, which play a key role in DL-based condition monitoring of motors. In this section, each of these techniques is reviewed with respect to the problems it addresses and how successful it was at the task.

Multilayer Perceptron (MLP)
MLP is one of the most utilised DL model topologies, and also one of the oldest. It is a fully connected neural network consisting of one or more hidden layers. It can easily perform classification tasks with its simple structure. However, it becomes difficult to train the model as tasks become complex owing to the increase in datasets. It is trained through a supervised learning approach called back-propagation (BP). Figure 2 shows the basic structure of the MLP model. The MLP has been used in motor condition monitoring applications for a long time owing to its easy training process. Various authors have employed it for motor fault classification tasks. Vieira et al. [47] have employed multiple single-layer MLPs to classify stator winding short-circuit faults of the induction motor using current data. The dataset was collected by varying the operational conditions of the motor/load/converter. The authors have investigated multiple hypotheses related to the conditions and their effects on the classifier's accuracy. The MLP model was able to achieve 92.6% accuracy at zero load and 76.9% accuracy at full load. The results showed that variations in the operating frequency did not affect the classifier's accuracy. However, it was harder to correctly classify faults with low severity, which led to the observation that the accuracy of the classifier increases as the fault severity level increases.
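To make the fully connected structure and back-propagation training described above concrete, here is a minimal sketch of a one-hidden-layer MLP in plain Python. The class name `TinyMLP`, the two-feature toy data (loosely standing in for "healthy" and "faulty" feature vectors), and the hyperparameters are assumptions chosen for illustration; none of them are taken from the cited studies.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units and a single sigmoid output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, output

    def train_step(self, x, target, lr):
        """One stochastic gradient step on the cross-entropy loss."""
        hidden, output = self.forward(x)
        clipped = min(max(output, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        loss = -(target * math.log(clipped) + (1 - target) * math.log(1 - clipped))
        delta_out = output - target  # gradient w.r.t. the output pre-activation
        for j, h in enumerate(hidden):
            # Propagate the error back through the old output weight first.
            delta_hidden = delta_out * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * delta_out * h
            for i, v in enumerate(x):
                self.w1[j][i] -= lr * delta_hidden * v
            self.b1[j] -= lr * delta_hidden
        self.b2 -= lr * delta_out
        return loss


# Hypothetical, linearly separable feature vectors.
healthy = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30]]
faulty = [[0.90, 0.80], [0.80, 0.90], [0.70, 0.95]]
samples = [(x, 0.0) for x in healthy] + [(x, 1.0) for x in faulty]

model = TinyMLP(n_in=2, n_hidden=4)
epoch_losses = []
for _ in range(500):
    epoch_losses.append(
        sum(model.train_step(x, t, lr=0.5) for x, t in samples) / len(samples))
```

Calling `model.forward(x)[1]` returns the predicted probability of the "faulty" class for a feature vector `x`. Real fault classifiers would use more classes, more data, and a held-out test set.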
Palacios et al. [48] have used MLP to carry out motor fault classification using current and voltage amplitudes in induction motors. The authors have used discretized time domain signals for classifying faults such as rotor broken-bar fault (1/2/3/4 broken bars), bearing faults (inner race grooves, outer race grooves, and rolling element defect), and stator short-circuit fault. Results confirmed a better performance of the method, which achieved 94.6% accuracy compared to other conventional methods including SVM and KNN. It was also observed that the proposed method was computationally more efficient than the comparative methods. In [49], the authors have used MLP with mutual information (MI) for fault classification of the induction motor. MI was employed for feature extraction; it describes the similarity between time-series data simultaneously acquired from the setup. The method was applied to detect the stator winding short-circuit fault using the features extracted from the current signature data. Experimental results confirmed that the MLP performed better than the radial basis function (RBF) in terms of classifying healthy and faulty conditions of the motor, with accuracy over 99%, whereas RBF achieved 96-97% accuracy.
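The mutual information between two simultaneously acquired time series can be estimated in several ways. The sketch below uses a simple equal-width histogram estimator, which is an assumption made for illustration and not necessarily the estimator used in [49]; the function names and the number of bins are likewise hypothetical.

```python
import math
from collections import Counter


def discretize(x, bins):
    """Map each sample to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in x]


def mutual_information(x, y, bins=8):
    """Histogram estimate of I(X; Y) in bits for two equally long series."""
    xd, yd = discretize(x, bins), discretize(y, bins)
    n = len(xd)
    count_x, count_y = Counter(xd), Counter(yd)
    count_xy = Counter(zip(xd, yd))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Hypothetical signals: a sine wave compared with itself and with a constant.
phase_current = [math.sin(2 * math.pi * t / 50) for t in range(200)]
constant = [1.0] * 200
mi_self = mutual_information(phase_current, phase_current)
mi_constant = mutual_information(phase_current, constant)
```

A series shares maximal information with itself and none with a constant, which gives two quick sanity checks for any estimator of this kind.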
Zolfaghari et al. [50] have used an MLP-based classifier for broken-bar fault detection. They have input motor current data after performing FFT and wavelet packet (WP) transform for superior feature extraction, which in turn reduced the burden on the classifier. Furthermore, they have extracted WP-based statistical parameters for intelligent broken-bar severity level detection (one/two/three broken bars). The investigation results confirmed that the method yields promising results in terms of rotor broken-bar severity-level detection, even in the no-load condition, with an accuracy of 98.8%. In [51], the authors have used an MLP model for real-time fault classification of the induction motor. The authors have employed a nonlinear manifold technique called curvilinear component-based analysis (CCA) for feature extraction. The method was employed to classify two major faults: stator interturn fault and rotor broken-bar fault. Results revealed that the MLP model effectively classified the faults in the induction motor with an accuracy of 95.2%.
Bazan et al. [52] used an MLP model with the MI feature extraction method to classify induction motor bearing faults. MI shows the reduction in uncertainty associated with one random variable when combined with information from another variable in simultaneously acquired time-series data. Features were extracted from multiple current signatures under different operating conditions. The results confirmed the robustness of the method in bearing fault classification compared to the conventional methods, including SVM and KNN. The MLP network achieved 90.5% accuracy with a 10% voltage unbalance condition compared to SVM and KNN, which achieved 84% and 83.3%, respectively. The authors have suggested future research work in terms of investigating the real-time implementation of the approach and the potential accuracy trade-off with the speed of the motor. Recent applications of MLP for condition monitoring of motors are summarized in Table 1.

Autoencoders (AE)
AEs are frequently employed as an unsupervised feature extraction technique. They have the capability of reducing the dimensions of input data whilst retaining most of the information. Figure 3 illustrates a deep AE structure and its operation. It has two blocks: the first block is the encoder and the second block is the decoder. The encoder network generates low-dimension representations while the decoder network reconstructs the input data from these low-dimension representations. In addition, it uses the reconstruction error as a loss function. Deep AE is initially trained using an unsupervised method that is known as pretraining. The pretraining process allows better convergence as it reduces overfitting and optimises the layers. The process of pretraining a deep AE model is explained as follows:
(i) A single layer extracts an initial parameter for the following hidden layer and predicts itself using the input vector. Through this technique, it learns about the data without any feedback or labeled data. Subsequently, it stores the learned features as weights for the hidden layers of the network.
(ii) Similarly, the following layer learns features for succeeding hidden layers, and the process continues for all the remaining layers.
(iii) Eventually, learned information reaches the output layer typically through a softmax activation function to give a probability distribution.
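The encoder/decoder structure and the reconstruction-error loss described above can be sketched with a deliberately small example. The code below trains a single-layer linear autoencoder by gradient descent in plain Python; it is an illustrative simplification (no nonlinearity, no stacking, no pretraining, no softmax layer), and the class name, data, and hyperparameters are assumptions rather than details from the reviewed work.

```python
import random


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class LinearAutoencoder:
    """Encoder maps n_in -> n_code, decoder maps n_code -> n_in."""

    def __init__(self, n_in, n_code, seed=0):
        rng = random.Random(seed)
        self.enc = [[rng.uniform(-0.3, 0.3) for _ in range(n_in)]
                    for _ in range(n_code)]
        self.dec = [[rng.uniform(-0.3, 0.3) for _ in range(n_code)]
                    for _ in range(n_in)]

    def encode(self, x):
        return matvec(self.enc, x)

    def decode(self, code):
        return matvec(self.dec, code)

    def reconstruction_error(self, x):
        x_hat = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)

    def train_step(self, x, lr):
        """One gradient step on 0.5 * ||decode(encode(x)) - x||^2."""
        code = self.encode(x)
        x_hat = self.decode(code)
        err = [a - b for a, b in zip(x_hat, x)]
        # Back-propagate the error through the decoder before updating it.
        code_grad = [sum(err[i] * self.dec[i][j] for i in range(len(x)))
                     for j in range(len(code))]
        for i in range(len(x)):
            for j in range(len(code)):
                self.dec[i][j] -= lr * err[i] * code[j]
        for j in range(len(code)):
            for i in range(len(x)):
                self.enc[j][i] -= lr * code_grad[j] * x[i]


# Hypothetical 4-D data lying on a 1-D subspace, so a 2-unit code can represent it.
direction = [1.0, 0.5, -0.5, 0.25]
data = [[(-1.0 + 2.0 * k / 19) * d for d in direction] for k in range(20)]

autoencoder = LinearAutoencoder(n_in=4, n_code=2)
error_before = sum(autoencoder.reconstruction_error(x) for x in data) / len(data)
for _ in range(300):
    for x in data:
        autoencoder.train_step(x, lr=0.05)
error_after = sum(autoencoder.reconstruction_error(x) for x in data) / len(data)
```

After training, `autoencoder.encode(x)` plays the role of the extracted low-dimensional feature vector that a downstream classifier would consume.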
After this pretraining process, both the networks can be trained simultaneously for classification. Many researchers have argued that AE can be trained without pretraining, but this requires some additional effort in the training process or initialization [53]. Recent developments in AE have produced various derivatives for 1-D and 2-D data such as denoising autoencoders (DAE) [54], variational autoencoders (VAE) [55], and sparse autoencoders (SAE) [56]. Although DAE and SAE have the same structure as a simple AE, they differ in loss function and inputs. Research has been carried out to improve the performance of AEs using different techniques, and the resulting models have been employed for diagnosis and prognosis of motors. AE and its derivatives have been used in motor condition monitoring with different goals. Sun et al. [57] have employed SAE with DNN for unsupervised feature extraction. The model was fed with vibration data to classify different faults of the induction motor. They employed the "dropout" regularization method to avoid overfitting during the training process. The SAE remained inactive during the testing process. The results confirmed that the approach provided better performance with 97.6% accuracy compared to conventional models such as SVM and linear regression (LR), which achieved 96.4% and 92.7% accuracy, respectively. Liu et al. [58] have used STFT and SAE to extract features from the sound/acoustic emission signals of a rolling bearing. Considering barriers to feature extraction in the DNN, they have used STFT for fast and effective feature extraction. This enabled fault classification with fewer training data samples and resulted in increased classifier accuracy of the DNN compared to the model without STFT features. The SAE model with the STFT-based input achieved an average accuracy of 96.2%.
In [59], the authors have performed bearing fault classification using AE and extreme learning machines (ELM). ELM is employed with AE owing to its advantage of high training speed. No explicit feature extraction technique was involved with the employed method. The AE-ELM model input data were frequency features extracted from vibration data. This approach yielded better classification performance than SAE, and results also revealed that it was faster than SAEs. Both the AE-ELM and SAE had an accuracy of 100%. However, the SAE model had a slower response time of 20 s in online learning compared to AE-ELM, which took 0.6 s. Thus, it was concluded that AE-ELM can be applied for real-time fault diagnosis owing to its faster response and higher accuracy. Chen et al. [60] have used deep SAE with noise-added vibration data for rolling bearing fault severity level classification and life stage prediction. The input data along with added noise were fed to the deep SAE, constraining the overfitting problem caused by limited training data. They have used two different vibration datasets for classifying different bearing severity levels and life stages. The investigation was also carried out with and without noise data. The comparative analysis between the deep SAE with and without noise revealed that the model with noisy data effectively overcame the overfitting problem and achieved 98.3% accuracy, whereas deep SAE without noisy data achieved 93.7% accuracy. The authors in [61] employed stacked SAE with hybrid features for feature extraction and DNN for classification of bearing fault severity level. This approach allowed extracting more discriminative information, which in turn raised the classification accuracy. In addition, the input to the DAE was a combination of time domain features, wavelet energy features, and power spectrum features extracted from the vibration data. This hybrid feature pool was used to overcome nonlinearity in the data.
The results of the investigation revealed that the proposed method outperformed the vibration spectral-imaging-based DNN method with an accuracy of 99.1%. Lu et al. [62] have used a stacked denoising autoencoder (SDA) for rolling bearing fault classification using raw vibration data. They have compared the performance of SDA with SAE by inputting data with varying signal-to-noise ratio (SNR) values. From the comparative investigation results, it was observed that the SDA outperformed the SAE, SVM, RF, and AE with a maximum accuracy of 99.8% at an SNR of 20 dB owing to its learning capacity for complex and nonlinear mapping relationships. Sun et al. [63] have employed SAE and DNN for rolling bearing fault classification. They have input vibration data to the model after compressing it. Data compression was used for handling the significant amount of data more efficiently. The investigation of the method confirmed a higher classification accuracy of 97.4% with data compression compared to the DNN model without data compression, which achieved 96.7% accuracy.
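Comparisons at different SNR values, such as the one described above, require corrupting a clean signal with noise of a controlled power. The following is a minimal sketch of one common way to do this with additive Gaussian noise; the function names and the synthetic sine signal are assumptions for illustration, and the cited study may have generated its noisy data differently.

```python
import math
import random


def signal_power(x):
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(x, snr_db, seed=0):
    """Add Gaussian noise scaled so that the realised SNR equals snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    target_noise_power = signal_power(x) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / signal_power(noise))
    return [v + scale * n for v, n in zip(x, noise)]


def measured_snr_db(clean, noisy):
    noise = [b - a for a, b in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(noise))


# Hypothetical clean vibration-like signal corrupted at 20 dB SNR.
clean = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
noisy = add_noise_at_snr(clean, snr_db=20)
realised = measured_snr_db(clean, noisy)
```

Because the noise is rescaled using its empirical power, the realised SNR matches the requested value up to floating-point error rather than only on average.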
Shao et al. [64] have used ensemble deep AEs for the classification of various roller bearing faults. They used 15 standard AEs, each with different activation functions. This approach avoided manual feature extraction and overcame the limitations of individual models. These AEs were used for extracting features from vibration data. The diagnosis results revealed that the proposed method performed more robustly and produced more effective classification results than the individual AE models with different activation functions, DBN, and CNN, with an accuracy of 97.1%. In [65], the authors have used an unsupervised deep AE to predict the bearing decay state in an inverter-fed induction motor. They input stator current data as a compact representation of the bearing state. The acquired data of the artificially introduced bearing faults with varying loads were segmented using a sliding window of 24 s. Subsequently, they have extracted time and frequency domain features and then fed them to the models. The investigation results demonstrated that the deep AE produced clearer clusters of different bearing conditions with higher classification accuracy than a shallow neural network (SNN).
Figure 3: Deep AE structure.
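Segmenting a long recording with a sliding window, as done with the 24 s window mentioned above, can be sketched as follows. The sampling rate, the step size (window overlap), and the function name are assumptions for this example; the source states only the window length.

```python
def sliding_windows(samples, fs, window_s, step_s):
    """Split a 1-D recording into fixed-length windows.

    fs is the sampling rate in Hz; window_s and step_s are in seconds.
    Trailing samples that do not fill a whole window are dropped.
    """
    width = int(round(window_s * fs))
    step = int(round(step_s * fs))
    if width <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")
    return [samples[start:start + width]
            for start in range(0, len(samples) - width + 1, step)]


# Hypothetical 60 s recording at 10 Hz, 24 s windows with 50% overlap.
recording = list(range(600))
windows = sliding_windows(recording, fs=10, window_s=24, step_s=12)
```

Each window would then be passed to a feature extractor before being fed to the model.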
In [66], the authors have used ensemble stacked autoencoders (ESAE) for bearing fault classification. They have applied FFT to raw vibration data and fed the result to the model. The proposed method was compared with time-domain feature extraction methods. The comparative results confirmed the superior performance of the ESAE with the FFT technique compared to the ESAE with other feature extraction techniques such as average, kurtosis, and variance, with a minimum mean absolute error (MAE) of 0.0672. Skylvik et al. [67] have employed deep AE to classify different faults in induction motors such as interturn faults, bearing faults, and broken rotor bar faults. For preprocessing, they employed Welch's method to estimate spectral density. Subsequently, they have applied FFT to current data and then fed that to the model. The performance comparison with SVM and KNN models confirmed that the proposed scheme performed much better than the traditional schemes with an accuracy of 96.1%. Zhao et al. [68] have constructed an optimal hybrid DL model, which consists of SAE and gated recurrent units (GRU), to classify rolling bearing faults more accurately. In addition, they have used the grey wolf optimization algorithm to enhance the performance of the model. The proposed model was capable of extracting features from raw vibration data. The results confirmed that the model achieved robust and effective results with an accuracy of 97.1%, which is higher than other traditional models, including ANN, SVM, SAE, and GRU.
Zhu et al. [69] have presented a novel DL model called stacked pruning denoising autoencoder (SPDAE) for rolling bearing fault diagnosis. The model reduces information loss by introducing new channels to interconnect the layers. The pruning operation was carried out on nonsuperior units in the model to reduce the number of training parameters, which in turn enhanced training efficiency and precision. To ensure the uniqueness of the dimensions extracted from SPDAE, the features from the same layer were fused. Raw vibration data with noise were fed to the model for bearing fault classification. Comparative results confirmed the superiority of the proposed model compared with conventional models such as ANN, SAE, and DAE. The SPDAE model achieved 100% accuracy on the bearing dataset. The authors have suggested future studies investigating noise addition, pruning operations, and feature fusion operations, which may improve the performance of the model. State-of-the-art research related to condition monitoring of motors using AEs is summarized in Table 2.

Restricted Boltzmann Machines (RBM).
RBMs are Markov-based special-type models consisting of two-layer neural networks [70]. They employ an unsupervised training method based on a greedy layer-wise process. They have the capability to learn missing data patterns. RBM-based networks pose difficulty in the training process and also in tracking the loss function. There are two types of RBM-based DL models: the Deep Boltzmann Machine (DBM) and the Deep Belief Network (DBN). DBN is a semidirected model, while DBM is a completely undirected graph model, as shown in Figure 4(b). The RBM-based DL models are reported as follows:

Deep Boltzmann Machine (DBM).
The DBM can be considered as a stacked RBM, which comprises multiple hidden and visible layers rather than only RBM layers as shown in Figure 4(a), where the blue dotted line shows the separation of the layers [71]. Each layer is composed of symmetrically coupled stochastic units. DBM has the capacity to learn features from complex data and capture higher-order correlations among the hidden features. The DBM model is trained as a combined model, in contrast to the other type of RBM-based model, the deep belief network (DBN) (Section 2.3.2), which is trained layer by layer [72]. Hence, the training process of the DBM requires more computational power than that of DBN models.

Deep Belief Networks (DBN).
Like DBM, DBN is formed by stacking RBM layers in such a way that the output of the n-th layer becomes the input to the (n + 1)-th layer. Figure 4(b) shows the structure of the DBN. It is a mixed directed and undirected graphical model. It can process large amounts of nonlinear data [73]. Like an RBM, it is also trained in a greedy layer-wise unsupervised fashion [74]. Fine-tuning is required after the pretraining process, which is performed either on training data labels or on a proxy for the DBN log-likelihood. Targets are achieved by adding a softmax layer on top of the DBN architecture.
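The greedy layer-wise stacking described above, in which the output of one RBM becomes the input to the next, can be sketched as follows. This is a minimal illustration that trains each RBM with one-step contrastive divergence (CD-1), a common choice assumed here rather than taken from the text; it omits the supervised fine-tuning and softmax layer, and the class name, toy binary data, and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible
        self.b_h = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def cd1_update(self, v0, lr):
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)   # one-step reconstruction
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                # positive phase minus negative phase
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_h[j] += lr * (h0[j] - h1[j])


def pretrain_stack(data, layer_sizes, epochs=20, lr=0.1, seed=0):
    """Train RBMs one at a time; each layer's output feeds the next layer."""
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms, layer_input


# Hypothetical binary patterns with six visible units.
patterns = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
            [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0]]
rbms, top_features = pretrain_stack(patterns, layer_sizes=[4, 2])
```

The returned `top_features` are the activations of the last hidden layer; in a complete DBN these would feed a softmax layer that is fine-tuned with labels.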
Both the DBN and DBM have been used in various condition monitoring systems for motors. In [75], the authors have employed DBM for roller bearing fault classification. Three types of features (time-domain features, frequency-domain features, hybrid features) were extracted from vibration data and fed to the model. Among these features, hybrid features yielded the best classification performance with an accuracy of 99.5%. Tao et al. [76] have used DBN and a multisensor information fusion technique for bearing fault detection. They have input time-domain features from multiple vibration sensors to the model. It was observed in the results that DBN not only adaptively fused the multisensory data but also increased the accuracy up to 97.5%, that is, 10% greater than that of the model with single sensor data. The DBM model was able to achieve the highest accuracy among the comparative models, including SVM, KNN, and ANN. In [77], the authors have classified bearing degradation states using DBN and the Weibull distribution. Bearing degradation states were classified based on the RMS of the vibration data fitted by the Weibull distribution, which was employed to avoid fluctuation in statistical parameters. The results confirmed the effectiveness of the technique through a run-to-failure experiment. The model effectively classified the bearing degradation states with an accuracy of 89.9%.
Shao et al. [21] have used a DBN model for the fault classification of induction motors in manufacturing. The classes included normal motor, unbalanced rotor, stator winding defect, defective bearing, bowed rotor, and broken bar. They utilised frequency-domain features extracted from vibration data and investigated the effect of the depth of the model on the classification accuracy. The investigation results confirmed the effectiveness of the method for automatic fault classification in manufacturing. In [78], the authors have investigated an improved version of the DBM obtained by varying its energy function. The proposed method addressed the limitation that the standard DBM can only process binary data: replacing the binary visible units with Gaussian units allowed the DBM to process real-valued data. This model was employed for bearing fault classification using raw vibration data. Greedy unsupervised pretraining was used to initialize the model parameters, followed by supervised training.
Results confirmed the effectiveness of the method in fault classification using real-valued data without manual feature extraction, with a classification error of 1.9%. In [79], researchers have presented a method to classify rolling bearing faults using a DBM, principal component analysis (PCA), and a least squares support vector machine (LS-SVM). The DBM was used for feature extraction, PCA was used to reduce the dimensionality of the data, and the LS-SVM was applied for classifying the bearing faults. Acoustic emission signals were used as input to the model owing to their higher sensitivity compared to vibration signals. The combination of DBM and PCA identified better features, which in turn increased the classification accuracy of the model. Experimental results confirmed the effectiveness of the model with an accuracy of 95.4%. Zhao et al. [80] presented a variational mode decomposition- (VMD-) and Hilbert transform- (HT-) based DBN (VHDBN) for rolling bearing fault classification. Bearing vibration signals were decomposed into intrinsic mode functions (IMFs) through VMD. Subsequently, the HT extracted the instantaneous frequency and amplitude of the IMFs and constructed a feature matrix that was fed to the DBN model. The combination of VMD and HT allowed the extraction of improved features and achieved a diagnosis accuracy of 98%. The investigation results confirmed that the VHDBN algorithm has a clear advantage over conventional algorithms such as SVM and the DBN with time-domain signals.
In [81], the authors have presented a novel architecture, the multiscale cascaded DBN (MCDBN), for automatic fault classification in motors.
This variation of the DBN added parallel learning capability by introducing a multiscale coarse-grained method, which in turn improved the feature extraction performance. The vibration signal was split into subsignals of equal window size using a sliding window with data overlap. The technique allowed the DBN to learn features from the data at multiple scales rather than from the inherent information alone. Subsequently, coarse-grained time-series data at different scales were obtained through the coarse-graining process. Experimental results confirmed the superiority of the method compared to the standard DBM, with an accuracy of 99.8%. The authors have suggested that the investigation has provided a promising tool for condition monitoring of industrial motors. Yu et al. [82] have used a two-stage approach combining a DBN and the Dempster-Shafer (D-S) theory for classifying bearing faults and their severity levels. D-S theory was employed for fusing vibration data from multiple sensors (horizontal and vertical vibration). A hybrid algorithm based on a genetic algorithm (GA) and particle swarm optimization (PSO) was used to optimize the parameters of the DBN during training. The results revealed that the hybrid GA-PSO algorithm not only improved the capabilities of the DBN classifier but also enhanced computational efficiency, with an overall accuracy of 99.6%. The authors have also used wavelet packet decomposition (WPD) for extracting energy features from the vibration data of bearings. Table 3 summarizes applications of DBN and DBM in condition monitoring of motors.
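The windowing and coarse-graining steps described for the multiscale approach in [81] can be sketched as follows. Coarse-graining is implemented here as the mean of consecutive, non-overlapping groups of samples, which is the usual definition in multiscale analysis but is an assumption about the cited work; the function names and parameter values are hypothetical.

```python
import numpy as np


def sliding_windows(signal, window_size, step):
    """Split a 1D signal into equal-length windows; step < window_size gives overlap."""
    signal = np.asarray(signal, dtype=float)
    n_windows = 1 + (len(signal) - window_size) // step
    return np.stack(
        [signal[i * step : i * step + window_size] for i in range(n_windows)]
    )


def coarse_grain(window, scale):
    """Average consecutive, non-overlapping groups of `scale` samples."""
    window = np.asarray(window, dtype=float)
    usable = (len(window) // scale) * scale  # drop an incomplete tail group
    return window[:usable].reshape(-1, scale).mean(axis=1)


signal = np.arange(12, dtype=float)                       # toy signal 0, 1, ..., 11
windows = sliding_windows(signal, window_size=8, step=4)  # 50% overlap
multiscale = {s: coarse_grain(windows[0], s) for s in (1, 2, 4)}
```

Each entry of `multiscale` is a progressively shorter and smoother view of the same window, so a model fed with all of them sees the signal at several time scales.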

Convolutional Neural Networks (CNN).
A CNN is a neural network that includes convolution layers, that is, layers that employ a convolution operation, in its structure. Figure 5 illustrates the structure of a CNN. It consists of two blocks; the first block comprises convolution layers and pooling layers that extract features from the data. Multiple stacks of convolution and pooling layers are employed to extract rich features from the data. The second block comprises fully connected layers that predict target variables by learning representations from the training data. Discrete convolutions are employed to extract representations from 1D or 2D data through learnable filters. The convolution operation produces the output C as given in equation (1), and the output of the convolutional layer is known as the feature map.
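For reference, a common two-dimensional form of the discrete convolution is sketched here. The indices i, j, m, and n are assumptions introduced for this illustration, and the expression may differ in notation from the authors' equation (1):

```latex
C(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m,\; j - n)
```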
where I and K are the input and filter, respectively. The pooling layer reduces the size of the feature map and assists in avoiding overfitting. Following the pooling operation, the feature map is flattened into a 1D array and fed to the classification block [83]. The classification block generates the output based on the extracted features. CNN and its derivatives have been widely used in condition monitoring systems. In [84], the authors have used a CNN with the DFT to classify motor faults such as bearing faults and lubricant degradation levels. Compared to traditional methods that rely on manual feature extraction, this method allowed the automatic extraction of features from scaled vibration data. The scaled data were then split into nonoverlapping windows, each containing one second of measurement samples.
Subsequently, the DFT of each window is calculated. Finally, these frequency decompositions are fed to the deep CNN model. Overall, this method yielded better performance than classical feature engineering based on statistics such as kurtosis, skewness, mean, and standard deviation. Liu et al. [85] have proposed a novel method called the dislocated time series CNN (DTS-CNN) for fault classification of electric motors. The dislocation layer in the model can extract the relationship between periodic vibration signals with different intervals. This modification of the CNN performs more effectively under nonstationary conditions and extracts features from the raw data. The model mitigates the overfitting problem through weight sharing and sparse connectivity. The model was used to predict nine different motor faults, and the results confirmed that it performed better than the standard CNN, with an accuracy of 96.3%.
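The preprocessing described for [84], in which the signal is cut into nonoverlapping one-second windows and the DFT of each window is computed, can be sketched as below. The sampling rate, the use of the magnitude spectrum, and the function name `windowed_spectra` are assumptions made for this example rather than details taken from the cited work.

```python
import numpy as np


def windowed_spectra(signal, sampling_rate_hz):
    """Split a signal into nonoverlapping one-second windows and take the DFT of each.

    `sampling_rate_hz` must be an integer number of samples per second.
    """
    signal = np.asarray(signal, dtype=float)
    n_windows = len(signal) // sampling_rate_hz
    windows = signal[: n_windows * sampling_rate_hz].reshape(n_windows, sampling_rate_hz)
    spectra = np.abs(np.fft.rfft(windows, axis=1))  # magnitude of the one-sided DFT
    freqs = np.fft.rfftfreq(sampling_rate_hz, d=1.0 / sampling_rate_hz)
    return freqs, spectra


fs = 1000                                  # assumed sampling rate in Hz
t = np.arange(3 * fs) / fs                 # three seconds of data
signal = np.sin(2 * np.pi * 50 * t)        # synthetic 50 Hz component
freqs, spectra = windowed_spectra(signal, fs)
```

Each row of `spectra` is one frequency decomposition that could be passed to a CNN; with one-second windows the frequency resolution is 1 Hz.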
In [86], the authors have employed a 1D-CNN for real-time classification of bearing faults. The method did not require any feature extraction technique, which in turn made it fast and computationally efficient. Motor current signals were fed to the model under a constant speed condition. Experimental results confirmed the effectiveness of the method compared to conventional models including MLP, SVM, and RBF. Guo et al. [87] have investigated a novel adaptive deep CNN (ADCNN) for classifying bearing faults and their severity levels. By adding an adaptive learning rate and momentum, it avoided training failures caused by an unsuitable learning rate. It also enabled the automatic extraction of features from raw vibration signals. The investigation results confirmed the superiority of the method compared to existing methods such as the support vector regression machine (SVRM).
Ding et al. [88] have proposed a novel method called energy-fluctuated multiscale feature (EFMF) learning with a deep CNN for spindle bearing fault classification. A multiscale deep CNN was constructed using convolution and pooling layers with the sigmoid function. It combined the skipping layer with the last convolution layer, which provided input to the multiscale layer. Meanwhile, wavelet packet energy images (WPI) were constructed using the wavelet packet transform (WPT) and phase space reconstruction. This 2D image was able to reveal energy fluctuations of the vibration signals and reconstruct local relationships among the WP nodes. Furthermore, representations were learned by the deep CNN architecture through brightness (frequency energy) variations of the energy-fluctuated images. Taking advantage of local and global features, the model was able to effectively classify ten different spindle bearing faults with a maximum accuracy of 98.8%. The model outperformed traditional methods, including standard CNN, BPNN, PCA, and LDA. In [89], the authors have used an ensemble deep CNN model with an improved Dempster-Shafer (D-S) evidence theory (IDSCNN) for bearing fault classification under varying load conditions. The improved D-S evidence theory was employed to fuse data from the two vibration sensors. It was implemented using a similarity matrix and a modified Gini index. It addressed two problems of traditional D-S evidence theory: objective evaluation of the basic function of the evidence body, and resolution of conflicting evidence from different sensors. The deep CNN was fed with the RMS values from the FFT of the vibration signals. The experimental results confirmed higher performance of the model compared to other existing models such as MLP, SVM, deep convolutional neural networks with wide first-layer kernels (WDCNN), and DSCNN.
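To make the role of D-S evidence theory in sensor fusion more concrete, the sketch below implements the classical Dempster rule of combination for two sources. It is not the improved variant of [89] (which additionally uses a similarity matrix and a modified Gini index), and the fault labels and mass values are hypothetical numbers chosen only for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its belief mass,
    and the masses of each source are assumed to sum to one.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; the rule is undefined.")
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


inner, outer = frozenset({"inner_race"}), frozenset({"outer_race"})
either = inner | outer                      # "cannot tell which of the two"
sensor_1 = {inner: 0.6, outer: 0.1, either: 0.3}
sensor_2 = {inner: 0.5, outer: 0.2, either: 0.3}
fused = dempster_combine(sensor_1, sensor_2)
```

In this toy case both sensors lean toward the same fault, so the fused mass for that fault is higher than either sensor reports alone. The normalization by the conflict term is also the step where the classical rule is known to behave poorly under strongly conflicting evidence, which motivates improved variants.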
Zhang et al. [90] have employed a novel method called CNN with training interference (TICNN), which can detect bearing faults with noisy data and under varying load conditions. The authors have used two techniques to introduce antinoise and domain adaptation abilities, which allowed the model to extract features from raw vibration signals: adding dropout layers and training with very small batches. In addition, they have used wide convolution kernels in the first layer to suppress noise, followed by small convolution kernels that extracted rich representations from the data. The results confirmed that the model performed stable and accurate classification due to ensemble learning, with an average accuracy of 95.5%, whereas the comparative methods of SVM, MLP, and DNN achieved average accuracies of only 65%, 80%, and 80%, respectively. The authors have suggested that this model would be useful in industrial environments. In [91], the authors have employed a CNN as a machinery health indicator. The method automatically learned features from the vibration data and constructed health indicators (HIs). The HIs were constructed using a specific degradation process. However, the HIs faced a problem of outlier region deviation, referred to as the trend burr, which negatively affects the performance of the HIs. The method was able to detect and correct the outlier regions effectively. The investigation results confirmed the effectiveness of the method in feature extraction.
Jia et al. [92] have employed a model called deep normalized CNN (DNCNN) to classify bearing faults using vibration data. The study addressed two problems of CNNs. First, the imbalanced fault classification problem was addressed using a normalized CNN with weighted softmax, which allowed better features to be learned and avoided misclassification. Second, it was not clear what the CNN had actually learned, as DL methods are treated as black box models. This problem was addressed using a neuron activation maximization (NAM) technique, which suppressed useless information by analysing the kernels of the normalized CNN. The model produced effective results by overcoming these two problems and had a maximum accuracy of 99.2%.

Hoang et al. [93] have employed a vibration image CNN (VI-CNN) to classify rolling bearing faults. The authors have converted the vibration signals into gray-scale images by normalizing the amplitudes into the range [−1, 1]. Consequently, the amplitude of each sample becomes the intensity of the corresponding pixel of the vibration image. It was confirmed that the model effectively classified the faults even under varying load conditions, with an accuracy of 97.7%. The authors have suggested that the method can perform robustly in industrial environments. Li et al. [94] have employed a novel method combining a CNN and the S-transform to classify bearing faults. The S-transform was integrated into the CNN as an S-layer, which allowed the extraction of features from the data of two accelerometers. The S-layer automatically converted the vibration data into a 2D time-frequency matrix. The investigation results confirmed superior performance compared to existing methods, namely, SVM, KNN, linear discriminant (LD), and bagged trees (BT).
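The signal-to-image conversion described for [93] can be illustrated with a short sketch: the amplitudes are min-max normalized into [−1, 1] and the samples are arranged row by row into a square array. The image size, the min-max scaling, and the row-major filling order are assumptions made for this example and are not taken from the cited work.

```python
import numpy as np


def signal_to_image(signal, side):
    """Map the first side*side samples of a 1D signal to a (side, side) image in [-1, 1]."""
    segment = np.asarray(signal[: side * side], dtype=float)
    lo, hi = segment.min(), segment.max()
    if hi == lo:
        return np.zeros((side, side))  # constant segment carries no contrast
    normalized = 2.0 * (segment - lo) / (hi - lo) - 1.0  # min-max scale to [-1, 1]
    return normalized.reshape(side, side)                # row-major pixel filling


rng = np.random.default_rng(0)
vibration = rng.normal(size=5000)          # synthetic stand-in for a vibration signal
image = signal_to_image(vibration, side=64)
```

The resulting array can be treated as a single-channel image by a 2D CNN, so that each pixel intensity corresponds to one sample amplitude.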
In [95], the authors have used a CNN based on a capsule network (ICN) for bearing fault classification. The method was intended for strong generalization and used a dynamic routing capsule net with an inception block. The inception block removed the nonlinearity of the capsule. Before applying these steps, the vibration data were converted to a time-frequency graph using the STFT and fed to the model. Subsequently, the model classified the faults through varying lengths of the capsule. The investigation results confirmed the higher generalization power of the model compared to the standard CNN, with an accuracy of 82%. Hoang et al. [96] have employed a CNN with an information fusion technique to classify bearing faults of a motor. They have employed two phase current signatures, which were further split into equal samples using a sliding window. Then, the 1D signals were converted into a 2D matrix by rearranging the array. The transformed 2D current signal is effectively a gray-scale image, which was fed into the model. Experimental results confirmed the effectiveness of the technique in detecting the faults, with a maximum accuracy of 99.4%.
Li et al. [97] have used a CNN with the WPT for rolling bearing fault classification. The WPT extracted features from raw vibration signals, which were then transformed into gray-scale images. The investigation results showed that the model achieved superior fault detection, with individual fault detection accuracy of up to 100%, because of the richness of the input features. In [98], the authors have used an enhanced LeNet-5 CNN for bearing fault classification. The STFT was employed to convert vibration data into 2D images, and hierarchical regularization was used to speed up the training process. In addition, the scaled exponential linear unit (SELU) function was employed to avoid excessive dead nodes during training, which in turn enhanced the robustness of the model learning. The results showed that the model accelerated the training process by selecting sensitive features even under varying load conditions. Moreover, the model classified the faults with accuracy of up to 100%, higher than that of models with individual time or frequency features. In [99], the authors have used a CNN to diagnose stator winding faults of induction motors. They have normalized the raw current data and then converted it into a three-dimensional matrix. The method was able to detect the number of shorted turns in the faulty stator winding without extensive preprocessing. The investigation results confirmed the applicability of the method in real time, regardless of motor operating conditions. Furthermore, the results showed a high accuracy of up to 100% in individual fault detection. Research studies on motor diagnosis and prognosis using CNN are summarized in Table 4.

Recurrent Neural Networks (RNN).
RNNs yield better performance with sequential or time-series data, which makes them a highly suitable candidate for condition monitoring of motors. According to [100], RNNs are the deepest among all neural networks; their architecture is shown in Figure 6. They differ from the MLP, which only maps input data to target vectors, in that RNNs map the entire history of past inputs to target vectors owing to their memory capability. For supervised tasks, RNNs can be trained by employing back-propagation through time. The transition function of a basic RNN can be defined as $h_t = H(x_t, h_{t-1})$, where $H$ is a nonlinear and differentiable transfer function, $x_t$ is the current time information, and $h_{t-1}$ is the previous time information.
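The recurrence described above, in which each hidden state depends on the current input and the previous hidden state, can be sketched in a few lines. The choice of tanh as the transfer function, the weight names `w_x`, `w_h`, and `b`, and the layer sizes are assumptions made for this illustration.

```python
import numpy as np


def rnn_forward(inputs, w_x, w_h, b, h0=None):
    """Run a basic RNN over a sequence and return the hidden state at every step."""
    hidden_size = w_h.shape[0]
    h = np.zeros(hidden_size) if h0 is None else h0
    states = []
    for x_t in inputs:                         # iterate over time steps
        h = np.tanh(w_x @ x_t + w_h @ h + b)   # new state from input and previous state
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 3))               # 5 time steps, 3 features each
w_x = rng.normal(scale=0.1, size=(4, 3))       # input-to-hidden weights
w_h = rng.normal(scale=0.1, size=(4, 4))       # hidden-to-hidden weights
b = np.zeros(4)
states = rnn_forward(inputs, w_x, w_h, b)
```

Because the same `w_h` is applied at every time step, gradients propagated back through long sequences are repeatedly multiplied by related factors, which is the origin of the vanishing and exploding gradient behaviour of basic RNNs.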
Although RNNs perform better on time-series data, they face the gradient vanishing problem.
This problem was addressed through the advent of long short-term memory (LSTM) in 1997 [101], which has performed well in various fields. LSTMs have the capability to memorize and forget representations of data. Moreover, gated recurrent units (GRU) and bidirectional LSTMs can enhance model flexibility and capacity. As shown in Figure 7, a deep LSTM can increase the representation learning capability by propagating the output of one layer as the input to the next layer.
RNNs and their derivatives have been extensively used in motor condition monitoring owing to their memorizing power and robust performance. Zhao et al. [102] have employed a deep LSTM with time-series data to classify motor faults. Dropout layers were used for model regularization, and raw vibration data were used as input to the model. Comparative results confirmed the effectiveness of the LSTM model, with a minimum root mean square error (RMSE) of 10.2% compared to conventional techniques such as MLP and basic RNN. In [103], the authors have employed a local feature-based GRU (LFGRU) for motor fault classification, again using raw vibration data. Local features were extracted from synchronized windows of the multisensor data and then fed to the model. Experimental results confirmed the robustness of the model in classifying the motor faults, with a maximum accuracy of 99.6%. In [104], the authors have used an RNN for rolling bearing fault prediction. They have investigated the model with two types of features: time domain and frequency domain. The FFT was used to convert the vibration data into frequency-domain features. In [105], the authors have used a deep LSTM to classify various motor faults using vibration data. The authors have not used any separate feature learning technique; motor faults such as broken bar, bowed bar, bowed rotor, faulty bearing, and voltage imbalance were classified directly from raw vibration data. Results confirmed the superiority of the model, with an accuracy of 98.6% compared to conventional models including SVM, MLP, and standard RNN. In [106], the authors have predicted bearing performance degradation using bottleneck features based on LSTM. A wavelet threshold denoising (WTD) technique eliminated noise from the vibration data. Then, statistical features were extracted and fed to a stacked autoencoder (SAE) to obtain bottleneck features. The bottleneck features take the depth and nonlinearity of the signals into account.
Experimental results showed the effectiveness and superiority of the method in predicting the bearing's degradation, with a minimum RMSE of 0.0891, which compares well to existing methods including SVM and MLP.
Xiao et al. [107] have employed an LSTM with weighted batch normalization (BN) for detecting different faults in the induction motor. They have employed two manual feature extraction methods, namely, empirical statistical parameters and recurrence quantification analysis (RQA), to add antinoise capability to the model. The weighted BN allowed the contribution of each feature learning technique to be evaluated and validated the noise reduction. The results confirmed the robustness and effectiveness of the model in fault classification of the induction motor, with an accuracy of 99.3% compared to other models such as SVM, MLP, CNN, and standard LSTM. In [108], the authors have classified bearing faults using a hierarchical stacked LSTM. The input layer received the vibration data, and two stacked LSTMs then learned representations from the data without any preprocessing. The experimental results confirmed that the model produced promising results owing to its deep structure, with an accuracy of 99%, and outperformed state-of-the-art models such as SVM, MLP, 1-layer LSTM, and CNN.
Zhang et al. [109] have employed an LSTM to assess bearing performance degradation and predict the RUL using vibration and temperature data. The authors have used waveform entropy (WE) to identify the running condition of the bearing by computing the local mean of the logarithmic vibration energy. In addition, a particle swarm optimization (PSO) method was used to optimize the parameters of the LSTM, which in turn improved its learning performance. The authors have divided the degradation states into different stages by time. WE performed effectively under various degradation states, although it lagged in some conditions owing to the window length. However, this did not have any negative impact on the performance of the model. Experimental results showed the effectiveness of the method in assessing the degradation states, with an accuracy of 93.1%. In [110], the authors have used deep gated recurrent units (DGRU) with ELM to classify the faults of an adaptive rolling bearing. In addition, the authors have used an artificial fish swarm algorithm (AFSA) for GRU parameter optimization. The model consists of two stacked GRU layers, which learned features from the raw vibration data. Lastly, ELM was applied for accurate classification of the faults based on the learned features. The model achieved 94.5% testing accuracy, and the results demonstrated robust performance compared to conventional models such as CNN, DBN, and SAE.
Enshaei et al. [111] have used a bidirectional deep LSTM (BiD-LSTM) to classify bearing faults. The BiD-LSTM takes sequential data into account in both the forward and backward directions. Experimental results confirmed that the deeper BiD-LSTM performed better than the single-layered network and achieved 100% testing accuracy. The model classified faults with high accuracy from raw vibration data. In [112], the authors have employed a hierarchical GRU network (HGRUN) for predicting the future health index (HI) and RUL of the bearing. Kernel principal component analysis (KPCA) and the exponentially weighted moving average (EWMA) were used to design a modified HI. First, statistical features were extracted (time domain, frequency domain, and hybrid domain); then, KPCA fused these features and transformed them into a unified HI. Subsequently, EWMA further modified the HI so that it can depict the bearing degradation process. Lastly, the HGRUN was developed by stacking multiple GRU layers and was fed with the modified HI. Experimental results confirmed that the method effectively depicted the degradation process of the bearing and can predict the future HI and RUL of the bearing. Comparison showed the superiority of the technique to existing techniques with the minimum of 13.8 ± 2.8%. In [113], the authors have used an enhanced deep GRU and complex wavelet packet energy moment entropy for early bearing fault classification. Complex wavelet packet energy moment entropy as a monitoring index allows a reduction in aliasing and the detection of dynamic changes in the vibration data. Subsequently, the deep GRU allows the model to learn the complex mapping relationships from the vibration data. Lastly, the learning capability of the model was enhanced using a modified training algorithm based on a learning rate decay strategy. Experimental results confirmed the effectiveness of the method in early fault detection compared to other prognosis methods.
These recent studies related to motor condition monitoring using RNN and its variants are summarized in Table 5.
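The EWMA smoothing step used in [112] to modify the health index can be illustrated with the standard recursive form. The smoothing factor, the initialization with the first sample, and the toy values are assumptions made for this sketch and are not taken from the cited work.

```python
import numpy as np


def ewma(values, alpha):
    """Exponentially weighted moving average with smoothing factor alpha in (0, 1]."""
    values = np.asarray(values, dtype=float)
    smoothed = np.empty_like(values)
    smoothed[0] = values[0]                    # initialize with the first observation
    for t in range(1, len(values)):
        smoothed[t] = alpha * values[t] + (1.0 - alpha) * smoothed[t - 1]
    return smoothed


raw_hi = np.array([0.0, 1.0, 0.0, 1.0, 1.0])   # hypothetical noisy health index
smooth_hi = ewma(raw_hi, alpha=0.5)
```

A smaller `alpha` gives a smoother indicator at the cost of a slower response to genuine degradation, so the value would need to be tuned for the application.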

Generative Adversarial Networks (GAN).
A generative adversarial network (GAN) is a learning model based on a two-player zero-sum game. It comprises two models: a generator model (G) and a discriminator model (D). The structure of a GAN is shown in Figure 8(a). Each of the two models can use a different type of neural network architecture, such as RNN, AE, or CNN. The D-model tries to increase the probability it assigns to collected true data (x) and decrease the probability it assigns to samples generated by the G-model. The G-model tries to fool the D-model by generating samples from a noise input (z), gradually improving its performance until the D-model can no longer discriminate between the true data and the generated data.
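Thus, by employing an adversarial training process, the capacity of both models is improved simultaneously [114,115]. The optimization of this two-player game is given by

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))],$$

where $p_{\mathrm{data}}$ is the distribution of the true data and $p_{z}$ is the distribution of the noise input. Considering a supervised learning approach, GAN models can generate fake labeled samples that resemble real data. Figure 8(b) gives a visual illustration of a GAN-based classifier. The classifier receives samples from the G-model, and the classification error back-propagates through both the G-model and the classifier.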
In recent years, researchers have employed GANs and their derivatives in motor condition monitoring. They are often employed to address the data imbalance problem through data augmentation. Shao et al. [116] have developed an auxiliary classifier GAN (ACGAN)-based framework to learn and generate realistic one-dimensional vibration data. Both the generator and the discriminator consisted of 1D-CNNs, which allowed the model to learn representations and generate high-quality artificial data samples. In addition, batch normalization was employed to address the gradient vanishing problem during the training of the GAN, which in turn assisted in avoiding overfitting. They have employed statistical methods to evaluate the quality of the generated data. Experimental results demonstrated the effectiveness of the model in data augmentation. The model robustly classified the faults by addressing the data imbalance problem, with an accuracy of 99.1%. In [117], the authors have employed a deep GAN for bearing fault diagnosis using an imbalanced dataset. The authors have used a two-stage approach: the first stage augmented the data through the GAN model, and the second stage classified the faults using a deep CNN model. They have introduced a GAN with multiple generators, in which each generator is dedicated to a specific bearing condition. Investigation results confirmed the robustness of the approach in data augmentation and fault classification, with a maximum accuracy of 99.9%. The authors have verified the approach by applying it to two different datasets. Applications of GANs in motor condition monitoring are summarized in Table 6.

Summary, Challenges, and Prospects of DL Models
The effective application of DL models in condition monitoring systems relies heavily on data acquisition, data labeling, feature processing, and model parameter optimization. However, these processes are challenging and time-consuming, and may require domain expert knowledge. In the previous section, we have comprehensively reviewed each DL model and its application in condition monitoring of motors. Although the reviewed research demonstrated promising leads in condition monitoring of motors, various open challenges are yet to be completely solved. Figure 9 shows a heat map of the DL models that have been used along with a variety of feature-processing techniques. From the heat map, it is apparent that the DL models have been performing effectively on raw input data owing to their deep hierarchical architectures. However, time-domain features have also been used in various studies with less complex models, which shows the importance of feature-processing techniques in simplifying the training and testing process of DL models. Figure 10 shows a 3D map of the number of publications by type of input data and DL model for motor fault diagnosis. The majority of the current literature has focused on vibration analysis for motor condition monitoring tasks. The following sections present some challenges in the application of DL models and future directions to improve the performance of these models, and also summarize the strengths and drawbacks of these models.

3.1. Class Data Imbalance.
During the data extraction process, healthy samples usually outnumber those representing fault conditions. A prebiased model with class weights is one way to deal with imbalanced data: individual class weights are calculated so that under-represented classes receive higher weights, which lowers the dominance of the majority class in the model prediction.
Most DL models have performed effectively in different applications, but these models have some drawbacks that require more investigation to produce optimal results. Generative models such as AEs and GANs, although harder to train, can provide a way to synthesize realistic data.
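As a concrete example of class weighting for imbalanced data, the sketch below computes inverse-frequency weights, so that rare fault classes contribute more to a weighted loss than the abundant healthy class. The specific formula (number of samples divided by the product of the number of classes and the class count) is a widely used heuristic and is an assumption here; it is not taken from the reviewed studies, and the label counts are hypothetical.

```python
import numpy as np


def balanced_class_weights(labels):
    """Return a weight per class that is inversely proportional to its frequency."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# Hypothetical dataset: 90 healthy samples (label 0) and 10 faulty samples (label 1).
labels = np.array([0] * 90 + [1] * 10)
weights = balanced_class_weights(labels)
```

The resulting dictionary can be passed to a weighted loss function so that each misclassified fault sample is penalized more heavily than a misclassified healthy sample.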

Feature Processing.
Feature processing that reduces the number of features greatly reduces the hardware dependence and the need for a highly nonlinear function mapping by DL models. Nevertheless, the bulk of the research is conducted on raw input data with DL models, as highlighted in Figure 9, which truly exploits the potential of these models. However, performance evaluation of these models with current evaluation metrics is vague. Either the trade-off between raw and formulated features versus explainability of the models needs to be taken into account, or new evaluation metrics need to be developed.

Model Selection.
Model selection for addressing domain-specific problems depends heavily on the way the problem is formulated. Anomaly detection problems are mostly formulated as classification problems. Another factor that plays a role in selecting a proper model for the problem is the type of data available. The right choice of model also depends on the way the data is formulated.

Hyperparameter Tuning.
This is one of the most important parts of employing a DL algorithm and includes model tuning in terms of (a) number of neurons per layer, (b) number of layers, and (c) choices regarding initialization, activations, optimizers, learning rate, loss calculation, etc. Although most of the time this is done through multiple trial-and-error experiments, vanishing and exploding gradients, idle points, and nonconvex optimization are currently generating research interest. The development of frameworks such as Keras, TensorFlow, Theano, and PyTorch has accelerated experimentation, and the community of researchers addressing these issues continues to grow.

Generalization Power.
The generalization power of a DL model is its ability to identify unseen samples correctly and is often described in terms of overfitting or underfitting.
The complexity of a model should be increased only when needed. There are, however, multiple approaches by which the overfitting problem of DL models can be addressed. L1 and L2 regularization can smooth the training process; a dropout layer between fully connected layers works in a similar way to reduce the complexity of the model in a controlled fashion; and batch normalization techniques can reduce overfitting by reducing the effect of some dominant neurons in the network.
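Two of the regularization techniques mentioned above can be sketched compactly. The example shows inverted dropout, in which the kept activations are rescaled during training so that no change is needed at inference time, and an L2 penalty on the weights. The dropout rate, penalty strength, and function names are hypothetical choices for illustration.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations                       # dropout is disabled at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob        # rescale so the expected value is unchanged


def l2_penalty(weights, strength):
    """L2 regularization term: strength times the sum of squared weights."""
    return strength * sum(float(np.sum(w ** 2)) for w in weights)


rng = np.random.default_rng(0)
activations = np.ones((1000, 100))
dropped = dropout(activations, rate=0.5, rng=rng)
penalty = l2_penalty([np.array([[1.0, 2.0], [3.0, 4.0]])], strength=0.01)
```

The penalty would be added to the training loss, which discourages large weights, whereas dropout randomly removes units at each training step so that the network cannot rely on a few dominant neurons.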
3.6. Interpretability of DL Models.
DL models, which are inherently black box models, do not provide much insight into their inner workings. For humans to trust these models, they need to be interpretable and explainable. Multiple publications [26,118] have reported that a neural network can easily be fooled into choosing a wrong category by minor changes to pixels, and neither discriminative nor generative models are an exception. Interpretability alone might not be enough for humans to trust these black box models; they will need explainability. Explainable artificial intelligence (XAI) is one of the frontiers of DL research.

Figure 9: Heatmap of DL models versus type of input data features.

Hardware Challenges.
DL models, regardless of their efficiency at solving the task at hand, pose hardware challenges, especially when their deployment in a standalone or embedded system is considered. Owing to their complex topology, deep models require more computing power, energy, and memory. Training and optimizing these models is an iterative process during which many high-dimensional matrix multiplications are performed. Therefore, much of the research has focused on reducing the cost of multiply-and-accumulate (MAC) operations [119]. Key factors taken into consideration while integrating these models with hardware are accuracy, energy consumption, throughput/latency, and cost [120]. An efficient DL architecture that accounts for all these factors is designed either by optimizing and compressing the DL model through algorithmic techniques or by designing application-specific hardware. Algorithmic techniques usually focus on retaining the accuracy of a DL architecture after pruning/compression. Hardware design, on the other hand, aims at energy efficiency and reduced latency. Codesign of algorithm and hardware has also been explored [121].
Hardware solutions for deploying DL architectures range from general-purpose solutions (GPUs) to application-specific solutions (FPGAs, ASICs). The optimum choice of hardware depends on the application. Because CPUs offer limited throughput and are therefore the least used, GPUs are currently the preferred option for training and inference of DL models. Nvidia has produced solutions such as CUDA and cuDNN for easy and fast implementation of, and inference from, DL models. FPGA-based solutions focus particularly on efficient data routing and benefit from sparsity and the reuse of data points already fetched in the computation network [122]. Moreover, hardware accelerators using variable bandwidth and reconfigurable data flow paths have also been developed [123][124][125][126]. Efficient deployment of DL architectures on the edge is an active area of research, and many paradigms are yet to be explored for effective performance. The strengths and drawbacks of these models are summarized in Table 7.

Table 7 (fragment; only the following entries survive in this text):
[83,113,129]
Strengths: (i) Performs well with time series or sequential data. (ii) Better forecasting ability in time series and sequential data.
Drawbacks: (i) Without proper constraints on weights and gradient clipping, might suffer from the gradient either vanishing or becoming unbounded.
GAN [115,116,130]
Strengths: (i) Learns underlying representation of data well. (ii) Seems to achieve a discriminator with less generalization error where its generator output (fake data) provides a regularization effect. (iii) Only algorithm that can work in semisupervised or even unsupervised setting to identify under observation clusters. (iv) Model with high fidelity.
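As a rough illustration of the algorithmic route discussed in this section, the sketch below counts the multiply-and-accumulate operations of a small fully connected network and applies magnitude-based pruning to a weight matrix. It is a simplified example under stated assumptions: the layer sizes, the weight values, and the function names `dense_macs` and `magnitude_prune` are hypothetical, and magnitude pruning is only one of several compression techniques.

```python
def dense_macs(layer_sizes):
    """MACs for one forward pass of a fully connected network on one sample.

    A dense layer with n_in inputs and n_out outputs needs n_in * n_out
    multiply-and-accumulate operations (biases and activations are ignored).
    """
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of a weight matrix.

    `weights` is a list of rows. Weights whose magnitude ties with the
    threshold are pruned as well, so slightly more than the requested
    fraction may be removed.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    magnitudes = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        return [list(row) for row in weights]
    threshold = magnitudes[n_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def nonzero_count(weights):
    """Number of weights that still require a multiplication."""
    return sum(1 for row in weights for w in row if w != 0.0)


if __name__ == "__main__":
    print("MACs for a 64-32-4 network:", dense_macs([64, 32, 4]))

    layer = [[0.1, -0.9], [0.5, -0.05]]  # hypothetical 2x2 weight matrix
    pruned = magnitude_prune(layer, sparsity=0.5)
    print("pruned weights:", pruned)
    print("remaining multiplications:", nonzero_count(pruned))
```

Pruned weights only save energy and latency when the target hardware can skip the zeros, which is why the text pairs compression with sparsity-aware accelerators. Accuracy after pruning still has to be checked, typically followed by a short fine-tuning phase.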

Concluding Remarks
In this paper, a comprehensive review is presented of the use of various feature-processing techniques with DL models for condition monitoring of motors. This paper has reviewed the literature in terms of the input data and feature types used in effective fault diagnosis of motors using DL models. More specifically, this work summarizes the applications of DL models in terms of various feature-processing techniques and how these techniques have solved problems that stood in the way of high generalization. It was observed that the use of feature-processing techniques has raised the efficiency of DL models and reduced hardware dependency. Moreover, using feature-processing techniques with DL models allowed faster execution owing to feature extraction and the removal of redundant information. The literature review showed that AE, DBM, DBN, and MLP architectures have been the focus of researchers for fault diagnosis and prognosis. CNNs and RNNs have also received attention for condition monitoring of motors; however, their complex architectures demand expert human knowledge and feature processing for successful implementation. Furthermore, GANs have solved the class imbalance problem in data to some extent, but more effort is still required for acceptable results.
Moreover, it has been observed that most of the available work focuses on mechanical fault diagnosis using DL models, specifically bearing faults. However, very limited work has been conducted on electrical fault diagnosis and prediction using DL models. Researchers may face difficulties in introducing electrical faults owing to the danger associated with them. Nevertheless, it is crucial to explore this area because motors are also damaged by various electrical faults. A large portion of the DL-based condition monitoring of motors has been conducted in relation to fault classification, whereas only a few researchers have addressed health index (HI) prediction and RUL estimation of the motor and its components. Meanwhile, data fusion techniques have been used successfully with various models, improving classification accuracy. More work is required in this area; fusing various diagnostic measurements, such as current and vibration, could improve model performance further. Consequently, this would advance condition-based maintenance of motors employing DL models.
Meanwhile, DL models have also exhibited some deficiencies, which can be viewed as prospective future opportunities for researchers and engineers in this domain.
This review has demonstrated that DL-based diagnosis mostly involves a supervised learning approach; however, applying such an approach is highly challenging and time-consuming in real engineering scenarios. From a future perspective, DL models need to be employed for automatic end-to-end diagnosis, which includes feature learning from data acquisition to motor fault classification or prediction. Moreover, acquiring more data does not necessarily mean that DL models will produce better results. Thus, feature-processing techniques are essential for better generalization. It was noticed that the majority of research in this domain focused on fault diagnosis in certain components such as the bearing. Very little research has been conducted on root cause analysis, degradation, and RUL prediction. Considering future opportunities, there is an urgent need for advanced feature-processing techniques that can assist in analysing huge amounts of data and yield effective diagnosis and prognosis results. Furthermore, there is great demand for research on explainable AI (XAI), which will address the opaque operation of DL algorithms. The authors of this review believe that practitioners working in this domain will find this article useful in solving their problems and evaluating the methods. Meanwhile, this review gives valuable pointers to a road map for future research in this domain.
Data Availability
The references and data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare no conflicts of interest.
S005463/1 (better FITT early detection of contact distress for enhanced performance monitoring and predictive inspection of machines).