Intelligent Roller Bearing Fault Diagnosis in Industrial Internet of Things

Advanced research on the industrial Internet of things requires effective feature extraction and accurate machinery health state evaluation. The roller bearing is a well-known mechanical component used extensively in industry, and its running status directly affects the operation of the entire machine. For intelligent fault diagnosis of roller bearings, selecting the intrinsic mode function (IMF) modes in ensemble empirical mode decomposition (EEMD) and variational mode decomposition (VMD) approaches is a tricky problem. To solve this problem, this study proposes an efficient roller bearing fault diagnosis scheme that combines the refined composite multivariate multiscale sample entropy (RCMMSE) with different classifiers. Firstly, synthetic noise signals are introduced to compare the effectiveness of the multiscale sample entropy (MSE) and RCMMSE models. Secondly, random noise signals are used to compare the performance of the EEMD and VMD methods, where the envelope spectrum characteristics of fault signals are well described. Moreover, the EEMD/VMD methods are utilized to decompose the roller bearing vibration signals into various modes to obtain the entropy values. Finally, the obtained RCMMSE is adopted as a feature vector and employed as the input of support vector machine (SVM), random forest (RF), and probabilistic neural network (PNN) models to conduct roller bearing fault identification. Extensive experimental results show that the proposed scheme performs well and that the classification accuracy of VMD-RCMMSE is higher than that of EEMD-RCMMSE.


Introduction
With the rapid development of the Internet of things (IoT) [1,2] and Industry 4.0 [3], increasingly massive real-time data are being collected from various types of mechanical equipment [4]. The availability of these data, which contain abundant information about machine health, has attracted the attention of more and more enterprises. Large volume, high velocity, and diversity have been identified as the major properties of mechanical big data [5,6]. Effective feature extraction from these data and accurate machinery health state evaluation, with ever faster updating of schemes, have become hot research issues for prognostic and health management systems in the era of industrial IoT [7].
The roller bearing is a well-known mechanical component used extensively in industry, and its operating status directly affects the operation of the entire machine. Roller bearing failure is an important cause of mechanical equipment failure, so timely detection and fault diagnosis are of great significance, and analysing the vibration signal collected by sensors to determine the failure is a commonly adopted scheme [8]. As roller bearings usually work in an environment with vibration sources and are damaged in complex ways, the vibration signal is typically nonlinear and nonstationary. How to extract fault features from the vibration signals has been a research hotspot [9,10]. The widely used signal decomposition schemes include wavelet analysis [11,12] and empirical mode decomposition (EMD) [13]. However, the wavelet transform is essentially nonadaptive because the wavelet basis and the number of decomposition layers must be configured in advance. In EMD, a complicated signal is self-adaptively decomposed into several IMFs and a residual component. Nevertheless, mode mixing is a stumbling block in EMD. To deal with this challenge, EEMD was proposed [14], in which a complex signal is decomposed into IMFs according to the local time-scale characteristics of the signal. Recently, EEMD has been extensively employed in fault detection [15].
The use of EMD or EEMD for fault feature extraction has received wide attention. However, these two schemes have some disadvantages. The recursive mode decomposition in EMD/EEMD continuously propagates the envelope estimation error. When the signal contains noise or intermittent components, the decomposition suffers from mode mixing. Although adding white noise suppresses the mode mixing, the scheme requires several hundred EMD runs and may decompose the signal into more components than it really contains. Moreover, EMD and EEMD cannot correctly separate components with close frequencies, and it is quite challenging to select a suitable number of IMFs when applying them.
To address the abovementioned problems, a novel signal decomposition scheme called VMD was proposed [16][17][18]. It uses the framework of the alternating direction method of multipliers to iteratively update the modes and their centre frequencies and to demodulate each mode into its corresponding baseband. Ultimately, each mode and its corresponding centre frequency are extracted together. In comparison with the recursive filtering pattern of EMD and EEMD, VMD converts the signal decomposition into a variational, nonrecursive problem, and it is in nature a group of adaptive Wiener filters. VMD can separate two pure harmonic signals with similar frequencies [16][17][18]. The IMFs obtained by EEMD represent the natural oscillatory patterns embedded in the signals, and the entropy value of every IMF is often extracted as a feature to characterize the vibration signals [19,20].
Entropy is one of the schemes used to evaluate time series complexity. S. M. Pincus proposed the approximate entropy (AE) [21]. However, the length of the time series has a critical influence on the performance of AE: the value of AE is uniformly lower than expected and lacks relative consistency, particularly when the data length is short. Based on AE, sample entropy (SE) was proposed to deal with this challenge [22]. It requires only short data, is robust to noise and interference, and shows good consistency over a large range of parameters. It should be noticed that AE and SE reflect the irregularity of a time series on a single scale only. When a roller bearing fails, not only the frequencies but also the corresponding complexity of the vibration signals deviate markedly. Accordingly, multiscale entropy (ME) can be viewed as a characteristic index for fault diagnosis [23]. With MSE, the features of the vibration signals can be extracted under all kinds of conditions, and the resulting eigenvectors have been used as the input of an adaptive neuro-fuzzy inference system (ANFIS) for roller bearing fault recognition [22]. However, SE can be undefined when no template vectors match one another, and an undefined or imprecise SE degrades the reliability of the MSE algorithm. To overcome this drawback, the RCMMSE was developed by Zhang et al. [24]. It has been demonstrated that RCMMSE not only increases the precision of entropy estimation but also decreases the probability of inducing undefined entropy.
However, after EEMD or VMD, a necessary step is to select the number of modes. For example, the number of modes is usually determined by the correlation coefficient or the mutual information between each component and the original signal [18,25]. In addition, MSE and RCMMSE in their basic form can only process a single-channel signal. Therefore, they cannot precisely reflect the overall signal information. The multivariate multiscale sample entropy (MMSE) handles several time series together and rigorously accounts for the different embedding dimensions, time delays, and amplitude ranges of the data channels. Hence, it can directly analyse multichannel data [26]. Therefore, a scheme based on VMD and RCMMSE is proposed in this study to extract the vibration features of roller bearings.
With the advance of computer techniques, many fault recognition algorithms, such as SVM, RF, and PNN [27,28], have been broadly applied to fault diagnosis. Therefore, SVM, RF, and PNN are used in this study to perform the fault recognition.
As mentioned above, a scheme based on EEMD/VMD and RCMMSE is designed. At first, the vibration signals are decomposed into a series of IMF/BL-IMF modes by EEMD/VMD. Secondly, the RCMMSE is utilized to compute the entropy values. Finally, the SVM/RF/PNN models are utilized to achieve the fault recognition. The rest of the study is organized as follows. The VMD and RCMMSE schemes are reviewed in Section 2. The comparisons of MSE/RCMMSE and EEMD/VMD are presented in Section 3. Section 4 gives the procedure of the proposed scheme, the experimental data sources, and the parameter selection. Section 5 provides the experimental validation. Conclusions are given in Section 6.

Basic Principle of Variational Mode Decomposition.
The VMD process is divided into the establishment and the solution of a constrained variational problem. Assuming that each mode has a limited bandwidth, the variational problem seeks $k$ mode functions $u_k(t)$ that minimize the sum of the estimated bandwidths, subject to the constraint that the sum of the modes equals the input signal $f$. In particular, the problem is constructed in the following steps.

Variational Problem Formulation
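① Hilbert transform: the analytic signal of every mode function $u_k(t)$ is obtained in order to get its one-sided spectrum:

$$\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t). \tag{1}$$

② The centre frequency $\omega_k$ of each modal signal is estimated, and the spectrum of each mode is shifted to its own baseband by mixing with the exponential $e^{-j\omega_k t}$:

$$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t}. \tag{2}$$

③ The bandwidth of each mode is estimated from the squared $L^2$ norm of the gradient of the demodulated signal, which gives the constrained variational problem:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_k \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_k u_k(t) = f(t). \tag{3}$$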
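The two formulation steps above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a uniformly sampled real mode, builds the analytic signal with an FFT-based discrete Hilbert transform, shifts it to baseband with $e^{-j\omega_k t}$, and uses the squared norm of a finite-difference derivative as the bandwidth estimate. The function names, the 50 Hz test tone, and the sampling rate are illustrative assumptions.

```python
import numpy as np

def analytic_signal(u):
    """Analytic signal of a real sequence via the FFT (discrete Hilbert transform)."""
    n = len(u)
    spectrum = np.fft.fft(u)
    weights = np.zeros(n)
    weights[0] = 1.0                       # keep the DC term unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0              # keep the Nyquist term unchanged
        weights[1:n // 2] = 2.0            # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)  # negative frequencies are zeroed

def bandwidth_estimate(u, omega_k, fs):
    """Squared L2 norm of the derivative of the demodulated analytic signal."""
    t = np.arange(len(u)) / fs
    baseband = analytic_signal(u) * np.exp(-1j * omega_k * t)
    derivative = np.diff(baseband) * fs     # finite-difference time derivative
    return np.sum(np.abs(derivative) ** 2) / fs

# Hypothetical test mode: a pure 50 Hz tone sampled at 1 kHz for one second.
fs = 1000.0
t = np.arange(1000) / fs
mode = np.cos(2 * np.pi * 50.0 * t)

matched = bandwidth_estimate(mode, 2 * np.pi * 50.0, fs)     # correct centre frequency
mismatched = bandwidth_estimate(mode, 2 * np.pi * 80.0, fs)  # wrong centre frequency
print(matched, mismatched)
```

With the correct centre frequency the demodulated signal is nearly constant, so the estimated bandwidth is close to zero; a wrong centre frequency leaves a residual oscillation and a much larger value. This is the quantity that the variational problem minimizes over all modes.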

The Solution of the Problem
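① The above constrained problem is converted into an unconstrained one by introducing a quadratic penalty factor $\alpha$ and a Lagrangian multiplier $\lambda(t)$. The quadratic penalty term guarantees the reconstruction accuracy of the signal in the presence of Gaussian noise, and the Lagrangian multiplier ensures that the constraint is satisfied strictly. The augmented Lagrangian is as follows:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_k \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_k u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_k u_k(t) \right\rangle. \tag{4}$$

② VMD uses the alternating direction method of multipliers (ADMM) to solve the above variational problem, alternately updating $u_k^{n+1}$, $\omega_k^{n+1}$, and $\lambda^{n+1}$ to seek the saddle point of the augmented Lagrangian:

$$u_k^{n+1} = \arg\min_{u_k} \left\{ \alpha \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_i u_i(t) + \frac{\lambda(t)}{2} \right\|_2^2 \right\}, \tag{5}$$

where $\omega_k$ is equal to $\omega_k^{n+1}$ and $\sum_i u_i(t)$ is equal to $\sum_{i \neq k} u_i(t)^{n+1}$. Equation (5) is transformed to the frequency domain by the Parseval/Plancherel Fourier isometry:

$$\hat{u}_k^{n+1} = \arg\min_{\hat{u}_k} \left\{ \alpha \left\| j\omega \left[ (1 + \operatorname{sgn}(\omega + \omega_k))\, \hat{u}_k(\omega + \omega_k) \right] \right\|_2^2 + \left\| \hat{f}(\omega) - \sum_i \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2} \right\|_2^2 \right\}. \tag{6}$$

Replacing $\omega$ in the first term with $\omega - \omega_k$ gives

$$\hat{u}_k^{n+1} = \arg\min_{\hat{u}_k} \left\{ \alpha \left\| j(\omega - \omega_k) \left[ (1 + \operatorname{sgn}(\omega))\, \hat{u}_k(\omega) \right] \right\|_2^2 + \left\| \hat{f}(\omega) - \sum_i \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2} \right\|_2^2 \right\}. \tag{7}$$

Equation (7) is converted into an integral over the nonnegative frequency interval:

$$\hat{u}_k^{n+1} = \arg\min_{\hat{u}_k} \left\{ \int_0^{\infty} 4\alpha (\omega - \omega_k)^2 \left| \hat{u}_k(\omega) \right|^2 + 2 \left| \hat{f}(\omega) - \sum_i \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2} \right|^2 d\omega \right\}. \tag{8}$$

In this case, the solution of the quadratic optimization problem is as follows:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha (\omega - \omega_k)^2}. \tag{9}$$

Following the same process, the centre frequency update problem is converted to the frequency domain:

$$\omega_k^{n+1} = \arg\min_{\omega_k} \left\{ \int_0^{\infty} (\omega - \omega_k)^2 \left| \hat{u}_k(\omega) \right|^2 d\omega \right\}. \tag{10}$$

The centre frequency update scheme is as follows:

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 d\omega}, \tag{11}$$

where $\hat{u}_k^{n+1}(\omega)$ is equivalent to Wiener filtering of the current residual $\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega)$, and $\omega_k^{n+1}$ is the centre of gravity of the power spectrum of the current mode function. The inverse Fourier transform of $\hat{u}_k(\omega)$ is taken, and its real part is $u_k(t)$.

The procedure of the VMD algorithm is as follows:
(1) Initialize $\{\hat{u}_k^1\}$, $\{\omega_k^1\}$, $\hat{\lambda}^1$, and $n$.
(2) Update $u_k$ and $\omega_k$ according to (9) and (11).
(3) Update $\lambda$:

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_k \hat{u}_k^{n+1}(\omega) \right). \tag{12}$$

(4) For a given accuracy $e > 0$, if $\sum_k \left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2 / \left\| \hat{u}_k^{n} \right\|_2^2 < e$, stop the iteration; otherwise, return to step (2).

Sample Entropy.
SE measures, by means of an exponential similarity function, the likelihood that two sequences that are within a tolerance $r$ of each other for $m$ points remain within $r$ of each other at the next point. For a time series with $N$ sample points, the procedure of the SE calculation is as follows:
(1) The vectors $X_m(i)$ of length $m$ are formed:

$$X_m(i) = \{x(i), x(i+1), \ldots, x(i+m-1)\} - x_0(i),$$

where $m$ represents the embedding dimension and $X_m(i)$ contains $m$ consecutive values commencing with the $i$th point, generalized by removing the baseline

$$x_0(i) = \frac{1}{m} \sum_{j=0}^{m-1} x(i+j).$$

(2) For each $X_m(i)$, the similarity degree between $X_m(i)$ and its neighbouring vector $X_m(j)$ is determined from the distance $d_{i,j}^m$, where $d_{i,j}^m$ denotes the maximum absolute difference between the corresponding scalar components of $X_m(i)$ and $X_m(j)$.
(3) For each $X_m(i)$ and the fixed tolerance $r$, let $A_i$ be the number of vectors $X_m(j)$, $j \neq i$, that satisfy $d_{i,j}^m \leq r$; $B_i^m(r)$ is then the proportion of candidate vectors that match $X_m(i)$.
(4) $B^m(r)$ is the average of $B_i^m(r)$ over all $i$, where $r$ represents the boundary width of the exponential function.
(5) Increasing the dimension from $m$ to $m+1$, steps (1)-(4) are repeated to find $B^{m+1}(r)$, and SE is defined as follows:

$$SE(m, r) = \lim_{N \to \infty} \left[ -\ln \frac{B^{m+1}(r)}{B^{m}(r)} \right].$$

When $N$ is finite, SE can be estimated as follows:

$$SE(m, r, N) = -\ln \frac{B^{m+1}(r)}{B^{m}(r)}.$$

SE determines the irregularity of a time series on a single scale. The smaller the value of SE is, the higher the self-similarity of the time series. Conversely, the greater the SE value is, the more complicated and irregular the time series.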
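The finite-length estimate above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it uses the conventional hard-threshold form of sample entropy (a pair of templates either matches within $r$ or does not) and skips the baseline removal and exponential similarity function mentioned in the text. The function name, the default `r_factor=0.15`, and the test signals are assumptions made for the example.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.15):
    """Conventional sample entropy -ln(A/B) with tolerance r = r_factor * SD."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n = len(x) - m  # same number of templates for lengths m and m + 1

    def matches(length):
        # All templates of the given length, one per row.
        templates = np.array([x[i:i + length] for i in range(n)])
        count = 0
        for i in range(n - 1):
            # Chebyshev distance between template i and every later template.
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = matches(m)       # matching pairs of length m
    a = matches(m + 1)   # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("nan")  # entropy is undefined when no pairs match
    return float(-np.log(a / b))

# Illustrative signals: a regular sine wave and white Gaussian noise.
rng = np.random.default_rng(0)
sine = np.sin(2 * np.pi * np.arange(500) / 50)
noise = rng.standard_normal(500)
print(sample_entropy(sine), sample_entropy(noise))
```

The regular sine wave yields a much lower value than the noise, in line with the interpretation that low SE indicates high self-similarity. The `nan` branch corresponds to the undefined-entropy case that motivates the refined composite variants.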

Multiscale Entropy and Multiscale Sample Entropy.
ME is computed on versions of a time series at different scales, which are obtained through a coarse-graining process. The irregularity of the time series is then described by the ME, which reflects the self-similarity at the various scales. When one sequence has a higher entropy than another across the scales, its complexity is also higher. Conversely, if increasing the scale of a time series results in a decreasing entropy value, the time series most likely has a relatively simple structure. The MSE is obtained as follows:
(1) Consider a discrete time series $\{x(i)\}$, $i = 1, 2, \ldots, N$. Firstly, the embedding dimension $m$ and the similarity tolerance $r$ of the SE are set. An auxiliary time sequence, named the coarse-grained vector, is then constructed as $y_k^{\tau} = \{y_{k,1}^{\tau}, y_{k,2}^{\tau}, \ldots, y_{k,p}^{\tau}\}$ with

$$y_{k,j}^{\tau} = \frac{1}{\tau} \sum_{i=(j-1)\tau + k}^{j\tau + k - 1} x(i), \quad 1 \leq j \leq \frac{N}{\tau}, \quad 1 \leq k \leq \tau, \tag{13}$$

where $\tau$ is the scale factor. When $\tau = 1$, the coarse-grained time series is equal to the original one. In other words, by averaging over windows of length $\tau$, the original time series is decomposed into the coarse-grained vector series $y_k^{\tau}$. It is worth mentioning that the number of coarse-grained time series $y_k^{\tau}$ is $\tau$ and the length of each is $N/\tau$.
(2) The SE values of the coarse-grained time series are obtained at the different scales $\tau$. Generally, the tolerance $r$ is based on the standard deviation of the original time series when calculating the SE value. This procedure is called MSE analysis.
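The coarse-graining step can be sketched as follows. This is a minimal illustration under the assumption that each coarse-grained point is the mean of $\tau$ consecutive samples starting at offset $k$ (1-based), and that incomplete trailing windows are discarded. The function name and the toy series are illustrative only.

```python
import numpy as np

def coarse_grain(x, tau, k=1):
    """k-th coarse-grained series at scale tau (k is 1-based, 1 <= k <= tau)."""
    x = np.asarray(x, dtype=float)
    start = k - 1                          # convert the 1-based offset to 0-based
    p = (len(x) - start) // tau            # number of complete windows
    windows = x[start:start + p * tau].reshape(p, tau)
    return windows.mean(axis=1)            # average each window of tau samples

# Toy series 1, 2, ..., 12 coarse-grained at scale 3.
series = np.arange(1, 13)
print(coarse_grain(series, tau=3, k=1))    # windows (1,2,3), (4,5,6), ...
print(coarse_grain(series, tau=3, k=2))    # windows (2,3,4), (5,6,7), ...
```

At scale $\tau = 1$ the function returns the original series, and each coarse-grained series is roughly $N/\tau$ points long, which is why entropy estimates become less reliable at large scales.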

Refined Composite Multiscale Sample Entropy.
In the coarse-graining process, the length of the original time series is reduced by a factor of $\tau$, as shown in equation (13). The RCMMSE algorithm was developed with the aim of improving the accuracy of the MSE. In particular, the SE quantities of all $\tau$ coarse-grained time series are obtained at a given scale factor $\tau$, and the RCMMSE value at that scale is defined from them.
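A single-channel refined composite estimate can be sketched as below. This is a hedged illustration that follows the commonly used refined composite definition, in which the matching-pair counts of all $\tau$ coarse-grained series are summed before the logarithm is taken; it is an assumption that the scheme in this study combines them in the same way. All function names and parameter defaults are illustrative.

```python
import numpy as np

def match_counts(y, m, r):
    """Matching template pairs of lengths m and m + 1 within Chebyshev distance r."""
    n = len(y) - m
    counts = []
    for length in (m, m + 1):
        templates = np.array([y[i:i + length] for i in range(n)])
        total = 0
        for i in range(n - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            total += int(np.sum(dist <= r))
        counts.append(total)
    return counts[0], counts[1]

def rcmse(x, tau, m=2, r_factor=0.15):
    """Refined composite multiscale sample entropy at scale factor tau."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)               # tolerance from the original series
    total_b = total_a = 0
    for k in range(tau):                   # all tau coarse-grained series
        p = (len(x) - k) // tau
        y = x[k:k + p * tau].reshape(p, tau).mean(axis=1)
        b, a = match_counts(y, m, r)
        total_b += b                       # accumulate counts before the logarithm
        total_a += a
    if total_a == 0 or total_b == 0:
        return float("nan")
    return float(-np.log(total_a / total_b))

rng = np.random.default_rng(1)
noise = rng.standard_normal(1000)
print([round(rcmse(noise, tau), 3) for tau in (1, 5)])
```

Because the counts are pooled over all $\tau$ series, the result is undefined only if every coarse-grained series has zero matches, which is far less likely than a single series having none.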

Refined Composite Multivariate Multiscale Sample Entropy.
Given a time series $X_m^p(i)$ with $p$ variables, the RCMMSE is calculated according to the following procedure:
(1) To compute the RCMMSE of each coarse-grained time series $y_k^{\tau} = \{y_{k,1}^{\tau}, y_{k,2}^{\tau}, \ldots, y_{k,p}^{\tau}\}$, the multiple embedding vector $M$ is specified, and the original $X_m^p(i)$ is reconstructed using $M$.
(3) For a composite delay vector $X_m^p(i)$ and threshold $r$, the parameter $P_i$ is the number of vector pairs that satisfy $D[X_m^p(i), X_m^p(j)] \leq r$, $j \neq i$, from which the probability $B_i^m(r)$ is computed. Increasing the dimension from $m$ to $m+1$, steps (1)-(4) are repeated to find $B^{m+1}(r)$. The RCMMSE is then defined from the ratio of $B^{m+1}(r)$ to $B^{m}(r)$, analogously to SE.

Comparison of EEMD/VMD.
To compare the EEMD and VMD models, a simulation signal is used to show that the VMD model performs better than EEMD. The simulation signal $X(t)$ is the superposition of a multifrequency signal and a random noise signal with a standard deviation of 1.
The time-domain waveform and the envelope spectrum of the original signal are given in Figure 1. As shown in Figure 1(b), the main frequency content of the original signal lies in the range 0-200 Hz, especially at 30.27 Hz. Then, EEMD and VMD were used to decompose this signal.
To perform the VMD, the number of modes $k$ should be determined first. If the value of $k$ is chosen too small, the decomposition cannot fully capture the time-frequency content of the original signal. Larger $k$ values produce BL-IMF components with similar frequencies, which indicates over-decomposition. Therefore, the applicable $k$ is selected in this study by observing the centre frequencies of the modes [19]. The centre frequencies corresponding to different $k$ values are given in Table 1.
Wireless Communications and Mobile Computing
As shown in Table 1, when $k = 5$, the centre frequencies of adjacent modes from BL-IMF3 to BL-IMF5 differ by less than 0.008 kHz (0.0636 kHz for BL-IMF3, 0.0582 kHz for BL-IMF4, and 0.051 kHz for BL-IMF5). These three modes have similar frequencies. Therefore, $k$ is selected as 4 in this study to decompose the above simulation signal.
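The selection rule described above can be expressed as a simple check. The sketch below is illustrative only: it flags a candidate $k$ as over-decomposed when any two adjacent centre frequencies are closer than a chosen gap. The 0.008 kHz gap and the three centre frequencies are the values quoted in the text for $k = 5$; the function name and the second, well-separated list are hypothetical.

```python
def has_similar_centres(centre_freqs_khz, min_gap_khz):
    """Return True if any two adjacent centre frequencies are closer than min_gap_khz."""
    ordered = sorted(centre_freqs_khz)
    return any(b - a < min_gap_khz for a, b in zip(ordered, ordered[1:]))

# Centre frequencies of BL-IMF3 to BL-IMF5 quoted in the text for k = 5.
quoted_k5 = [0.0636, 0.0582, 0.051]
# Hypothetical well-separated centre frequencies, for contrast only.
separated = [0.03, 0.06, 0.12]

print(has_similar_centres(quoted_k5, 0.008))   # adjacent gaps are 0.0054 and 0.0072 kHz
print(has_similar_centres(separated, 0.008))
```

In practice the check would be repeated for increasing $k$, and the largest $k$ that does not trigger it would be kept.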
The EEMD/VMD models were used to decompose the original signal, and the resulting IMFs (EEMD) and BL-IMFs (VMD) are given in Figures 1(a) and 2(a). Figures 1(b) and 2(b) show the corresponding spectrum analysis results. As shown in Figure 2, each BL-IMF component is mainly concentrated around a single frequency (30.27 Hz). In contrast to the VMD result in Figure 1(b), some IMF components in Figure 2(b) contain several frequencies from the IMF4 mode onwards. This result shows that the decomposition effect of the EEMD algorithm is not ideal for the multicomponent synthetic simulation signal and that the mode mixing is serious. Because some weak signals are drowned in the signal to be decomposed, the cubic spline envelope fitting in EEMD leads to decomposition bias. When a weak signal is embedded in most of the strong signal, EEMD can filter and extract it. However, when the weak signal appears only in the maximum slope range of the strong signal, it takes the form of a frequency modulation of the wave and does not produce additional local extreme points. Therefore, it is difficult for EEMD to extract the useful components, and it easily produces components with mode mixing.
It can be seen from Figure 1(b) that VMD not only effectively removes the dummy components, but each BL-IMF also exhibits a mode within a certain scale range, and there is no mode mixing between the modes. Therefore, the scale characterization and decomposition effect of VMD are better than those of EEMD, which indicates that VMD has good robustness (Figure 3).

Comparison of Validity of MSE/RCMMSE.
In this section, white noise and 1/f noise signals are employed to demonstrate the superior performance of the RCMMSE compared with MSE. Using MSE and RCMMSE, the probabilities of inducing undefined entropy were computed for 200 samples each of white noise and 1/f noise. The length of each sample is 1000. The corresponding probabilities are shown in Table 2. From Table 2, we can see that the probability of undefined entropy shows the same tendency to increase as the time scale increases, that is, as the length of the coarse-grained time series decreases. As given in Table 2, the probability of undefined entropy is about zero when the MSE model is used to analyse white noise. On the contrary, the probability of inducing undefined entropy is 0.01 at $\tau = 4$ for the 200 1/f noise samples. In the MSE algorithm, the entropy is undefined when $B^{m}(r)$ or $B^{m+1}(r)$ is zero. However, the RCMMSE successfully yields entropy values at all scales from 1 to 20 for both the white and the 1/f noise. Hence, the RCMMSE model is superior to MSE.

Comparison of Accuracy in MSE/RCMMSE.
In this section, the accuracy of the MSE/RCMMSE models is verified through case studies. In each simulation study, we employ white and 1/f noise samples with 1000 data points. The MSE/RCMMSE results are presented in Figure 4. Figures 5 and 6 present the means and standard deviations of the entropies, respectively.
As seen in Figure 4(a), the RCMMSE model shows higher overall performance than the MSE model for white noise. This results from the tolerance $r$ in SE, which is used to evaluate the similarity between any two template vectors of the time series. Note that $r$ is often chosen as 0.15 × SD of the original time series.
It can be seen from Figures 5(a) and 5(b) that the means of the entropy values obtained using MSE and RCMMSE are nearly equal for white noise. Nevertheless, the means of the entropy values of the MSE are higher than those of the RCMMSE (see Figure 5(b)). Figure 6 shows that the standard deviations of the RCMMSE are all lower than those of the MSE, which indicates that the entropies obtained using the RCMMSE algorithm are more consistent than those obtained using the MSE algorithm.
To study the relationship between the data length and the effectiveness of the MSE/RCMMSE models, several different data lengths ($N$ = 600, 1200, 1800, and 2400) are used to compute the entropies. The means and standard deviations of the entropies are presented in Table 3. The mean entropies obtained by MSE and RCMMSE are nearly equivalent for the long original time series ($N$ = 2400). However, for short original time series, the means show a significant difference. On the other hand, the RCMMSE algorithm has the lowest error when the difference between the entropy of 1/f noise with 500 data points and the entropy of 1/f noise with 2000 data points is considered. That is to say, the RCMMSE performs better than the other algorithm at any length of the time series. In conclusion, the RCMMSE is more reliable than MSE according to the above discussion.

The Dataset Source.
The performance of the proposed scheme is verified through experiments in this part. The Case Western Reserve University bearing data are used, which were obtained from an induction motor. It should be noted that, to make the data acquisition system suitable for the vibration signals, the amplifier is specifically designed with a high bandwidth. To improve the accuracy, a sampling frequency of 12000 Hz was set for each channel of the data recorder. Table 4 lists the working conditions considered in this experiment. Note that 0.1778 mm is the fault diameter, and "normal" and "slight" denote the fault severity. In addition, the data are taken from the drive end of the motor, and 1785 rpm is chosen as the motor speed. In total, 200 data samples are adopted, with each fault condition consisting of 51 samples, and each data sample contains 2048 data points.
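The sampling figures above imply how much of the shaft motion each sample covers. The sketch below is a simple worked calculation using the sampling frequency, sample length, and motor speed stated in the text; the non-overlapping segmentation helper and the random stand-in record are assumptions for illustration, since the text does not state how the samples were cut from the recordings.

```python
import numpy as np

FS_HZ = 12000          # sampling frequency stated in the text
POINTS = 2048          # data points per sample stated in the text
SPEED_RPM = 1785       # motor speed stated in the text

duration_s = POINTS / FS_HZ                    # time span of one sample
revolutions = SPEED_RPM / 60.0 * duration_s    # shaft revolutions per sample

def segment(signal, points_per_sample):
    """Cut a 1-D record into non-overlapping samples (assumed segmentation)."""
    signal = np.asarray(signal, dtype=float)
    count = len(signal) // points_per_sample
    return signal[:count * points_per_sample].reshape(count, points_per_sample)

# Stand-in record of 10000 points; real data would come from the bearing dataset.
record = np.random.default_rng(0).standard_normal(10000)
samples = segment(record, POINTS)
print(round(duration_s, 4), round(revolutions, 2), samples.shape)
```

Each 2048-point sample therefore spans about 0.17 s, or roughly five shaft revolutions, so every sample contains several repetitions of any fault-related impact.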

Procedure of Our Proposed Scheme.
The procedure of the proposed scheme is as follows:
(1) Using the EEMD and VMD models, the vibration signals under the different conditions are decomposed into a sequence of IMF and BL-IMF modes.
(2) The RCMMSE model is used to obtain the entropy values, which are taken as the input of the SVM/RF/PNN models for training and testing.
The procedure is illustrated in Figure 7.

Parameter Selection.
There are some parameters in the different models that should be preset before calculation:
(1) EEMD: two parameters need to be determined, namely, the ensemble number and the amplitude $n_i(t)$ of the added white noise. The result depends on the ensemble number, which is generally chosen as a few hundred. In addition, when the added noise has an amplitude that is a fraction of the standard deviation of the input signal, the remaining noise causes an error of only a fraction of one percent. In this study, the ensemble number is set as 100.
(2) SE: the dynamic process is reconstructed in more detail with a larger embedding dimension $m$. However, if $m$ is chosen too large, the required data length $N = 10^m$ to $30^m$ is hard to satisfy, which causes the loss of important information. In the literature, $m$ is often set as 2. The other two parameters, $n$ and the similarity tolerance $r$, determine the gradient of the exponential function and its bound, respectively. Experimentally, $r = (0.1$-$0.25)\,$SD, and $r$ is chosen as 0.15SD in SE.
(3) RCMMSE: the multiple embedding vector in the RCMMSE scheme is set as $M = [2_1, 2_2, 2_3, \ldots, 2_k]$, that is, an embedding dimension of 2 for each component, where $k$ is the number of BL-IMF components, and the time delay is set as 1.
(4) SVM: the kernel function in SVM is selected as the radial basis function (RBF).
(5) PNN: the distribution density (spread) of PNN is set as 1.5.
(6) RF: two parameters need to be set before using the RF model, namely, the number of input variables mtry selected randomly from the $M$ input variables and the number of decision trees (DTs). With the scale factor $\tau = 20$ and EEMD/VMD-RCMMSE as the feature, the number of input variables is $M = \tau = 20$, and the parameters often satisfy mtry $\leq \sqrt{M}$ [15]. Therefore, mtry and the number of DTs are set as 4 and 1000, respectively, in this simulation.

The roller bearing vibration signals are shown in Figure 8. The vertical axis is the vibration acceleration amplitude. Because of the influence of noise, it is difficult to find significant differences between the states. As shown in Figure 8, it is hard to distinguish the four signals; in particular, there is no obvious regularity in the normal (NR) and ball fault (BF) signals. EEMD and VMD were used to decompose the vibration signals into a series of modes (IMFs and BL-IMFs). Here, we use the inner race fault (IRF) vibration signals to observe the frequency content, and the waveform of one IRF vibration signal and its envelope spectrum are shown in Figure 9. As shown in Figure 9(b), the IRF signal fault frequency is 164.1 Hz. The centre frequencies under different $k$ values with VMD are given in Table 5. When $k = 5$ in Table 5, some modes, such as BL-IMF4 (0.2787 kHz) and BL-IMF5 (0.3071 kHz), have similar centre frequencies, which is considered over-decomposition. Therefore, $k$ is selected as 4 to decompose the vibration signals.

Experimental Results and Analysis
Using EEMD/VMD, the original roller bearing signals in Table 4 are decomposed into IMF1-10 and BL-IMF1-4. Figures 10 and 11 show the VMD modes and the corresponding spectrum of each mode for an IRF signal.
It can be seen from Figure 6(b) that VMD not only effectively removes the dummy components, but each BL-IMF also exhibits a mode within a certain scale range, and there is no mode mixing between the modes. Therefore, the scale characterization and decomposition effect of VMD are better than those of EEMD, which shows that VMD has good robustness. To further verify the effectiveness of the VMD scheme, the correlation coefficient is used to measure the degree of correlation between each mode and the original signal. In Tables 6 and 7, the overall average correlation between the BL-IMF components and the original IRF signals is higher than that of the IMFs. Therefore, the decomposition effect of VMD is superior to that of EEMD.
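The correlation check described above can be sketched with the Pearson correlation coefficient between each mode and the original signal. The example below is illustrative only: the two-component synthetic signal stands in for real modes, which in practice would be the IMFs or BL-IMFs returned by EEMD or VMD, and the function name is hypothetical.

```python
import numpy as np

def mode_correlations(modes, signal):
    """Pearson correlation coefficient between each mode and the original signal."""
    return np.array([np.corrcoef(mode, signal)[0, 1] for mode in modes])

# Synthetic stand-in: a strong 30 Hz component plus a weaker 120 Hz component.
t = np.arange(1000) / 1000.0
strong = np.sin(2 * np.pi * 30 * t)
weak = 0.5 * np.sin(2 * np.pi * 120 * t)
signal = strong + weak

corr = mode_correlations([strong, weak], signal)
print(np.round(corr, 3))
```

Modes that carry most of the signal energy correlate strongly with the original, while spurious or noise-dominated modes give values near zero, which is why the average correlation is used here as an indicator of decomposition quality.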
After feature extraction from the vibration signals using VMD and EEMD, the RCMMSE is employed to obtain the entropy values. The VMD-RCMMSE values under different $k$ values and scale factors are shown in Figure 12. Figure 12 shows that the RCMMSE values of the effective VMD modes decrease as the scale factor and the number of modes $k$ increase, and the overall RCMMSE values approach a steady state when $k$ is greater than 3, because the VMD algorithm decomposes the original signals into several BL-IMFs that carry information from different frequency bands. This is also supported by Table 5. The correlation coefficient between most EEMD modes and the original signal is much smaller than that of the VMD modes. This indicates that the VMD algorithm can overcome the mode mixing shortcoming of EEMD. The entropy values of EEMD-RCMMSE and VMD-RCMMSE ($k = 4$) are shown in Figure 13.
As shown in Figure 13(a), it is difficult to distinguish the four types of roller bearing signals with EEMD-RCMMSE, especially the NR and IRF, BF, and ORF signals. On the one hand, the complexity of the signals under the normal state and under the ball fault condition is similar. In other words, the features of the two kinds of signals are alike. On the other hand, the considerable noise hidden in the signals may affect their identification. However, the RCMMSE curves of the BL-IMF components separate the four states clearly in Figure 13(b). Part of the experimental classification results is shown in Figure 14 and Table 8.

Conclusion
A scheme based on the VMD analysis model and RCMMSE is proposed in this study. The VMD model is utilized to analyse the complex vibration signals, so that BL-IMFs are obtained from the decomposition of the roller bearing vibration signals. The RCMMSE can deal with multichannel data, and the EEMD/VMD-RCMMSE values are regarded as eigenvectors. Because VMD achieves an adaptive subdivision of each component in the frequency domain, the roller bearing vibration signals can be distinguished. Finally, the VMD-RCMMSE-SVM/RF/PNN combination models can distinguish roller bearing fault types effectively.

Future Ideas
Most related roller bearing fault diagnosis methods depend on manual aids and expert knowledge. To make the diagnosis smarter, the convolutional neural network (CNN) and long short-term memory (LSTM) will be considered for an end-to-end scenario. To ensure high recognition accuracy, a CNN-based fault classifier may be used to extract the effective signal from a severely distorted signal, and the LSTM could cope with time-varying bearing failures. Additionally, the aforementioned SVM fault classifier performs well when the data samples are insufficient. In the future, we may consider generative adversarial networks (GANs) and transfer learning to address the lack of data. By simulating the distribution of fault samples, a GAN could generate more data. Besides, a diagnosis model based on transfer learning can reuse previous information in a new task. Consequently, accurate fault identification could be achieved with small-sized samples.

Data Availability
The prior studies and data are cited at relevant places within the text as references [20].

Conflicts of Interest
The authors declare that they have no conflicts of interest.