Research on Novel Bearing Fault Diagnosis Method Based on Improved Krill Herd Algorithm and Kernel Extreme Learning Machine



Introduction
The rolling bearing is an indispensable part of most rotating mechanical equipment. The loss of life and property and the damage to the environment caused by rolling bearing failure can be very serious. In order to avoid bearing failure as much as possible, we can improve the bearing design, study and apply better materials and new technologies in the design and manufacturing stage, strengthen quality control in the production process, and improve the assembly level of the bearing. In addition, another way to prevent catastrophic failures of rotating machinery is to monitor and diagnose the working state of the equipment in real time so as to achieve effective control of the equipment. Therefore, it is of great significance to monitor the working state of the bearing in real time.
In recent years, various novel methods have been widely used to solve practical engineering problems [1][2][3][4][5], and studying these methods [6][7][8][9][10] provides many ideas for fault diagnosis. In the field of bearing fault diagnosis, novel intelligent fault diagnosis methods have emerged one after another. Statistics-based methods include Pearson's correlation coefficient (PCC) [11]. Signal-processing-based methods include modified variable modal decomposition (MVMD) [12], improved ensemble local mean decomposition (IELMD) [13], maximum kurtosis spectral entropy deconvolution (MKSED) [14], the regression residual signal based on improved intrinsic timescale decomposition [15], enhanced singular spectrum decomposition (ESSD) [16], the weighted cyclic harmonic-to-noise ratio [17], time-frequency analysis [18], multipoint optimal minimum entropy deconvolution adjusted (MOMEDA) [19], and others [20]. With the development of big data, machine learning and deep learning methods have also been widely used to solve practical engineering problems [21][22][23][24][25][26]. Those applied to bearing fault diagnosis include the support vector machine (SVM) [27,28], BP neural network (BP) [29], deep convolutional transfer learning network (CNN) [30], and extreme learning machine (ELM) [31,32]. These methods perform well in most cases, but some rely on subjective parameter choices. For example, the method used in reference [31] is highly subjective in selecting input weights and biases. Therefore, it is necessary to introduce a parameter optimization algorithm into fault diagnosis.
In the era of big data, many engineering problems need feature extraction. For example, Yin et al. used multiobjective feature extraction optimization to detect M/OD HVI damages [33]. In bearing fault diagnosis, the first step is to collect useful information from the faulty bearing. At present, many practical engineering problems take vibration information as the useful information, and there are many methods for collecting it [34][35][36][37][38]. Most scholars also collect bearing vibration information for bearing fault diagnosis [39]. Then, how to extract characteristic information from the vibration signal is the primary problem of intelligent fault diagnosis based on machine learning. As a tool to describe the uncertainty of signals, entropy has been widely used in recent years to extract fault features of rolling bearings [40,41], such as sample entropy (SampEn) [42] and permutation entropy (PerEn) [43]. However, SampEn needs to calculate the distances between embedded vectors for two embedding dimensions, and PerEn needs to sort the amplitudes of each embedded vector, which greatly increases the time for extracting features. In addition, most useful information cannot be extracted from a single time scale when analyzing a time series. In engineering practice, the optimal time scale of the original signal is often unknown. In order to solve this problem, the multiscale operation is introduced into feature extraction. Multiscale analysis allows entropy to be extended to multiple time scales to provide an additional perspective when the time scales are uncertain. Like other entropy measures, multiscale entropy aims to evaluate the complexity of the time series. One of the main reasons for using multiscale entropy is that the relevant time scales in the time series are not known. Therefore, analyzing the problem over multiple time scales yields more information. Moreover, bearing fault diagnosis based on multiscale entropy has been widely used in the field of intelligent fault diagnosis, for example, multiscale fuzzy entropy (MFE) [44] and multiscale permutation entropy (MPE) [45]. Rostaghi and Azami have proposed dispersion entropy (DE) [46] and multiscale dispersion entropy (MDE) [47]. MDE does not need to sort the amplitudes of each embedded vector, nor does it need to calculate the distance between any two compound delay vectors with embedding dimensions m and m + 1, which makes DE and MDE significantly faster than SampEn, PerEn, and MPE. Moreover, MDE has obvious advantages in distinguishing different types of dynamic signals, so MDE is more suitable than other methods for extracting features from the bearing vibration signal.
The result of feature extraction of each sample forms a feature vector. How to classify the feature vector accurately is the most critical problem of intelligent fault diagnosis. Three common classification methods based on machine learning are the BP neural network (BP), support vector machine (SVM), and extreme learning machine (ELM). However, in the traditional BP neural network, a large part of the network training parameters need to be set manually, which introduces considerable arbitrariness. In addition, BP is adjusted by the gradient descent method, which not only easily falls into a local optimum but also suffers from slow convergence [48]. More importantly, this operation is likely to lead to training failure. Moreover, because the network structure is not fixed, the parameters of the neural network need to be constantly adjusted in each iteration, so the diagnosis efficiency is not high, and the network easily underfits or overfits [49]. The SVM has great advantages in solving binary classification problems with small sample sizes [50], but it is difficult to apply to large-scale sample training and multiclass classification [51]. However, the fault diagnosis of the rolling bearing is often a multiclass problem, which means that it is difficult for an SVM to implement rolling bearing fault diagnosis and to reach a high accuracy rate. The ELM is a novel algorithm based on the single-hidden-layer feedforward neural network. It has the advantages of a simple mathematical model, a global optimal solution, fast learning speed, few parameters to select, and high generalization, and it has been applied in many fields [52]. However, the ELM randomly generates input weights and hidden-layer thresholds, resulting in instability of the algorithm. In order to solve this problem, the kernel function is introduced into the ELM to obtain the kernel extreme learning machine (KELM). The KELM solves the problem of random initialization in the ELM algorithm, and the number of nodes in the hidden layer no longer needs to be specified manually. In addition, in solving some practical engineering problems, the KELM shows high classification accuracy, good generalization ability, and high robustness [53].
In intelligent fault diagnosis, parameter optimization of the diagnosis method is also a key point. For example, particle swarm optimization (PSO) has been used to optimize the filter coefficients of deconvolution [54], and the ant colony algorithm has also been applied [55]. In 2012, Gandomi proposed the krill herd (KH) optimization algorithm [56], which is based on how krill in nature constantly update their positions in response to predation.
This algorithm can dynamically adjust the balance between exploration and exploitation by observing the progress of the solution over successive steps. In addition, Gandomi also compared the KH with eight common optimization methods [56], and the final result showed that the KH converges faster than the other optimization algorithms and does not easily fall into a local optimum. So, the KH can effectively solve various optimization problems and is superior to other optimization algorithms. In addition, online optimization methods can also be used to solve such problems, and research on them is extensive. For example, Yin et al. proposed the multivariate extremum seeking approach with the Newton method [57] and the multivariate fractional-order gradient-based extremum seeking approach [58], and the authors also verified their feasibility and advantages. Therefore, online optimization will be an important direction of parameter optimization in the future.
The optimization algorithms discussed above are population-based, and such algorithms are generally prone to the local optimum problem [59]. That is to say, when one individual reaches an optimal position, the other individuals cannot know whether this position is a local or the global optimum; they all move towards it, and the whole population eventually falls into the local optimum. At present, one remedy is to increase the inertia weight of some individual movements so that the population can jump out of the local optimum through high-weight motion. However, increasing the weight of a motion only works for one individual or one position adjustment; it does not help at other positions and may even be the reason for falling into a local optimum. When an individual falls into an optimal position, the gradient of its position function changes greatly. If the individual moves farther than it normally would at the next step, it may still be near the optimal position after the move. In other words, each individual has an impulse associated with its previous step: the impulse is large when the previous step is large and small when the previous step is small. If an individual trapped in an optimum still does not rush out of it after a certain number of moves, this position can be considered the global optimum. In addition, improving the initialization of the population is also an effective way to improve an optimization algorithm.
The opposition-based learning (OBL) method proposed by Tizhoosh is one of the most effective methods to improve the initialization of the population [60].
This algorithm increases the diversity of the search range and the global search ability by adding the opposite population to the initial population, which keeps the whole population from falling into a local optimum from the outset. When the whole population is searching for the global optimum, if the current best individual falls into a local optimum, it may mislead other individuals into the same local optimum. However, the opposite of that individual usually does not follow it but moves away from it. Therefore, the OBL method can prevent the whole population from falling into a local optimum to some extent.
Under the same working conditions and the same bearing type, the vibration signals differ between fault types and fault diameters [61]. In addition, because the gearbox load often changes according to the actual work requirements, the axial and radial loads on the bearing are constantly changing in practice. Axial and radial loads that are too large or too small may aggravate the vibration of the bearing [62], increase the noise, or even produce squealing [63], which may lead to large fluctuations of the vibration signal and further affect the classification performance of the algorithm. In other words, an algorithm may only work properly under one load and not under other loads. In actual gearbox fault diagnosis, the load is often deliberately set to a certain value in order to detect the health condition of a gearbox bearing; that is to say, the normal operation of the gearbox is stopped for the inspection, which greatly reduces the economic benefit of the gearbox. Therefore, it is of great significance to consider various load conditions when studying intelligent fault diagnosis of bearings.
Based on the above, this paper proposes a novel bearing intelligent fault diagnosis method based on the kernel extreme learning machine (KELM). A novel krill herd algorithm (NKH) is used to optimize the kernel function parameter σ and the error penalty factor C in the KELM. Opposition-based learning (OBL) and an impulse operator are introduced into the optimization algorithm to improve the global search ability of the individual krill, prevent the krill herd from falling into a local optimum, and increase the robustness of the algorithm. Firstly, in order to verify the correctness of the proposed algorithm, the bearing data set of Case Western Reserve University is used for the experiment, and the influence of different loads on the experimental results is considered.
Then, in order to test the performance of the proposed intelligent fault diagnosis method, it is compared with other methods based on machine learning.

Feature Extraction
2.1. Multiscale Operation. In the analysis of fault vibration signals, a more appropriate choice of the time scale usually means that more useful information can be obtained from the original signal. However, in engineering practice, the best time scale of the original signal is often unknown. In order to solve this problem, the signal should be considered at multiple time scales.
Multiscale analysis allows entropy to be extended to multiple time scales to provide an additional perspective when the time scales are uncertain. Like other entropy measures, multiscale entropy aims to evaluate the complexity of the time series.
Therefore, analyzing the problem through multiple time scales yields more information. For a given time series $x = \{x_1, x_2, x_3, \ldots, x_N\}$ of length $N$, the conversion equation of the multiscale operation is as follows:
$$u_j^{(\tau)} = \frac{1}{\tau}\sum_{b=(j-1)\tau+1}^{j\tau} x_b, \quad 1 \le j \le \lfloor N/\tau \rfloor,$$
where $u_j^{(\tau)}$ represents the $j$th element of the coarse-grained sequence $u$ with the time scale $\tau$, $N$ represents the length of the original sequence $x$, and $x_b$ represents the $b$th element of the sequence $x$.
The multiscale coarse-graining operation with time scales of 2 and 3 is shown in Figures 1 and 2, respectively.
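The coarse-graining step can be illustrated with a minimal Python sketch. It assumes non-overlapping windows of length τ and discards trailing points that do not fill a complete window; the function name `coarse_grain` is hypothetical and is not taken from the authors' MATLAB code.

```python
def coarse_grain(x, tau):
    """Average consecutive, non-overlapping windows of length tau.

    Assumption: trailing samples that do not fill a whole window are dropped.
    """
    if tau < 1:
        raise ValueError("tau must be a positive integer")
    n_out = len(x) // tau
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_out)]


# Illustrative series (not from the paper), coarse-grained at scales 2 and 3.
series = [1, 2, 3, 4, 5, 6, 7]
scale_2 = coarse_grain(series, 2)
scale_3 = coarse_grain(series, 3)
```

At scale 1 the function returns the original series (as floats), so a multiscale feature vector can be built by looping τ from 1 to the chosen maximum scale.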

Dispersion Entropy (DE).
Dispersion entropy can be used to express the degree of dispersion of a time series. For a given time series $x = \{x_1, x_2, x_3, \ldots, x_N\}$ of length $N$, the DE is computed in the following steps:
(1) Firstly, $x$ is mapped to $c$ classes labeled with the positive integers from 1 to $c$. A purely linear mapping is unsuitable for irregular signals: when the maximum or minimum value of a signal is much larger or much smaller than its mean or median, most samples are concentrated in only a few classes. Therefore, $x$ is first mapped to a sequence $y = \{y_1, y_2, y_3, \ldots, y_N\}$ with values from 0 to 1 through the normal cumulative distribution function (NCDF):
$$y_j = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x_j} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt,$$
where $\sigma$ is the standard deviation of the sequence and $\mu$ is the mean of the sequence. Secondly, after obtaining the sequence $y$, it is linearly mapped to $z$:
$$z_j^c = \mathrm{round}(c \cdot y_j + 0.5),$$
where $c$ represents the number of classes, $z_j^c$ represents the $j$th element of the classified sequence, and $\mathrm{round}(\cdot)$ represents the rounding operation.
(2) The embedding vector $z_i^{m,c}$ is defined as
$$z_i^{m,c} = \{z_i^c, z_{i+d}^c, \ldots, z_{i+(m-1)d}^c\},$$
where $i = 1, 2, 3, \ldots, N-(m-1)d$, $m$ is the embedding dimension, and $d$ is the time delay. Then, each embedded vector is mapped to a dispersion pattern $\pi_{v_0 v_1 \ldots v_{m-1}}$ according to
$$z_i^c = v_0,\quad z_{i+d}^c = v_1,\quad \ldots,\quad z_{i+(m-1)d}^c = v_{m-1}.$$
So there are $c^m$ different dispersion patterns.
(3) For each possible dispersion pattern, the relative probability is
$$p(\pi_{v_0 v_1 \ldots v_{m-1}}) = \frac{\mathrm{Number}\{i \mid i \le N-(m-1)d,\ z_i^{m,c}\ \text{has type}\ \pi_{v_0 v_1 \ldots v_{m-1}}\}}{N-(m-1)d}.$$
(4) According to the definition of entropy proposed by Shannon, the dispersion entropy is obtained as follows:
$$DE(x, m, c, d) = -\sum_{\pi=1}^{c^m} p(\pi_{v_0 v_1 \ldots v_{m-1}}) \ln p(\pi_{v_0 v_1 \ldots v_{m-1}}).$$
(5) Finally, the normalized dispersion entropy is obtained by dividing by the maximum possible value:
$$NDE(x, m, c, d) = \frac{DE(x, m, c, d)}{\ln(c^m)}.$$
Here is a simple example. Suppose there is a time series $x = \{9, 8, 1, 12, 5, -3, 1.5, 8.01, 2.99, 4, -1, 10\}$ of length 12, the embedding dimension $m$ is 2, the time delay $d$ is 1, and the number of classes $c$ is 3. The sequence $y = \{0.82, 0.75, 0.21, 0.94, 0.52, 0.05, 0.241, 0.75, 0.35, 0.43, 0.11, 0.87\}$ is obtained by the normal cumulative distribution function, and the classification sequence is $z = \{3, 3, 1, 3, 2, 1, 1, 3, 2, 2, 1, 3\}$. In this example, there are $3^2 = 9$ possible dispersion patterns, that is, $\{(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)\}$. With the 11 embedding vectors obtained from $z$, the detailed equation is as follows:
$$DE = -\left[4 \cdot \tfrac{1}{11}\ln\tfrac{1}{11} + 2 \cdot \tfrac{2}{11}\ln\tfrac{2}{11} + \tfrac{3}{11}\ln\tfrac{3}{11}\right] \approx 1.8462.$$
Figure 1: Multiscale coarse granulation of the time series with scale 2.
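The worked example above can be checked with a short Python sketch of the DE steps. It is an illustration under stated assumptions, not the authors' implementation: the sample standard deviation (n − 1 denominator) is assumed because it reproduces the listed y values, rounding is taken as round-half-up, and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, stdev


def normal_cdf(value, mu, sigma):
    """Normal cumulative distribution function evaluated at value."""
    return 0.5 * (1.0 + math.erf((value - mu) / (sigma * math.sqrt(2.0))))


def dispersion_classes(x, c):
    """Map a series to integer classes 1..c via the NCDF (step 1)."""
    mu, sigma = mean(x), stdev(x)  # assumption: sample standard deviation
    classes = []
    for value in x:
        y = normal_cdf(value, mu, sigma)
        z = math.floor(c * y + 0.5 + 0.5)  # round(c*y + 0.5), half-up
        classes.append(min(c, max(1, z)))  # guard the edges y -> 0 or 1
    return classes


def dispersion_entropy(x, m=2, c=3, d=1):
    """Return (DE, normalized DE) following steps 2 to 5."""
    z = dispersion_classes(x, c)
    n_vectors = len(z) - (m - 1) * d
    patterns = Counter(
        tuple(z[i + k * d] for k in range(m)) for i in range(n_vectors)
    )
    de = -sum(
        (count / n_vectors) * math.log(count / n_vectors)
        for count in patterns.values()
    )
    return de, de / math.log(c ** m)


x_example = [9, 8, 1, 12, 5, -3, 1.5, 8.01, 2.99, 4, -1, 10]
z_example = dispersion_classes(x_example, c=3)
de_example, nde_example = dispersion_entropy(x_example, m=2, c=3, d=1)
# de_example is approximately 1.846 for this series.
```

Only patterns that actually occur contribute to the sum, which matches the convention that a pattern with zero probability adds nothing to the entropy.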

Multiscale Dispersion Entropy (MDE).
When dealing with bearing fault vibration signals, it is often difficult to know the most suitable time scales, and the signals tend to be strongly nonstationary and irregular. In order to solve this problem, Azami et al. proposed multiscale dispersion entropy (MDE) in 2017 [47] and showed that MDE has a strong feature extraction ability for nonstationary signals.
The flow of MDE is shown in Figure 3.

Novel Krill Herd Algorithm (NKH).
Gandomi proposed the krill herd algorithm (KH) in 2012 by studying the activity rules of krill herds [56]. In this algorithm, a solution of the optimization problem is represented by the position of a krill, and the optimal solution is sought through the changes of individual positions during foraging. A Lagrangian model is used to describe the $k$th position change of the $i$th krill in a herd of $L$ krill during the foraging process:
$$\frac{dX_i^{(k)}}{dt} = N_i^{(k)} + F_i^{(k)} + D_i^{(k)},$$
where $N_i$ is the movement induced by other krill, $F_i$ is the foraging movement, and $D_i$ is the random disturbance of the krill. The induced motion velocity $N_i^{(k)}$ of the $i$th krill is expressed as follows:
$$N_i^{(k)} = N^{\max}\alpha_i^{(k)} + \omega_n N_i^{(k-1)},$$
$$\alpha_i^{(k)} = \alpha_{i\,\mathrm{local}}^{(k)} + \alpha_{i\,\mathrm{target}}^{(k)},$$
$$\alpha_{i\,\mathrm{local}}^{(k)} = \sum_{j} K_{ij} X_{ij},\quad X_{ij} = \frac{X_j - X_i}{\|X_j - X_i\| + \varepsilon},\quad K_{ij} = \frac{K_i - K_j}{K^{\mathrm{worst}} - K^{\mathrm{best}}},$$
where the sum runs over the neighbors $j$ of the $i$th krill; $N^{\max}$ is the maximum induced velocity; $\omega_n$ is the inertia weight ($\omega_n \in [0, 1]$); $\alpha_i^{(k)}$ is the direction of the induced movement; $\alpha_{i\,\mathrm{local}}^{(k)}$ is the local effect of the neighboring krill; $\alpha_{i\,\mathrm{target}}^{(k)}$ is the effect of the current best krill; $K_i$ and $K_j$ are the fitness function values of krill $i$ and $j$, respectively; $K^{\mathrm{worst}}$ and $K^{\mathrm{best}}$ are the worst and the best fitness function values of the current herd, respectively; $K_{ij}$ is the fitness function value of the $i$th krill relative to the $j$th krill; $X_{ij}$ is the position of the $i$th krill relative to the $j$th krill; and $\varepsilon$ is a small positive constant that avoids the singularity of equation (18). The foraging movement $F_i^{(k)}$ of the $i$th krill is represented by the following equation:
$$F_i^{(k)} = v_f \beta_i^{(k)} + \omega_f F_i^{(k-1)},\quad \beta_i^{(k)} = \beta_{i\,\mathrm{food}}^{(k)} + \beta_{i\,\mathrm{best}}^{(k)},$$
where $v_f$ is the foraging speed; $F_i^{(k-1)}$ is the foraging movement of the previous step; $\beta_{i\,\mathrm{food}}^{(k)}$ is the attraction of food to the $i$th krill; $\beta_{i\,\mathrm{best}}^{(k)}$ is the influence of the best position found by the $i$th krill up to the current moment; and $\omega_f$ is the inertia weight ($\omega_f \in [0, 1]$).
The random disturbance $D_i^{(k)}$ of the $i$th krill is expressed by the following equation:
$$D_i^{(k)} = D_{\max}^{(k)}\,\delta,$$
where $D_{\max}^{(k)}$ is the maximum velocity of random diffusion and $\delta$ is the direction vector of the random diffusion disturbance. In order to improve the global search ability and convergence speed of the krill herd algorithm, the opposite population is added in the initialization of the krill herd. Assuming that the position of a krill is $X_i$, its opposite position is generated within the same search range. Through opposition-based learning (OBL), the diversity of the krill herd is increased so as to increase the exploration scope and robustness of the herd.
There may be more than one optimal solution in the D-dimensional space searched by the krill herd algorithm; that is to say, there are many local solutions. The impulse operator is introduced so that a krill whose previous movement was very large has a high velocity in the next movement and can rush out of a local optimal solution. So equation (8) can be improved to get the following equation:
$$\frac{dX_i^{(k)}}{dt} = N_i^{(k)} + F_i^{(k)} + D_i^{(k)} + p\,\frac{d^2 X_i^{\mathrm{old}}}{dt^2},$$
where $p$ is the impulse coefficient and $(d^2 X_i^{\mathrm{old}})/(dt^2)$ is the acceleration of the $i$th krill in the last movement. The flow chart of the NKH is shown in Figure 4.
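The opposition-based initialization can be sketched in Python as follows. This is a minimal illustration that assumes the standard OBL definition, in which the opposite of a coordinate x in the range [a, b] is a + b − x, and that keeps both the random and the opposite individuals. The function name and the search bounds for (C, σ) are hypothetical.

```python
import random


def obl_initialize(n_krill, lower, upper, rng):
    """Create n_krill random positions plus their opposite positions.

    Assumption: the opposite of x in [a, b] is a + b - x (standard OBL).
    """
    herd = [
        [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        for _ in range(n_krill)
    ]
    opposite = [
        [lo + hi - value for value, lo, hi in zip(position, lower, upper)]
        for position in herd
    ]
    return herd + opposite


# Hypothetical search ranges for the penalty factor C and the kernel width sigma.
lower_bounds = [0.1, 0.01]
upper_bounds = [100.0, 10.0]
population = obl_initialize(25, lower_bounds, upper_bounds, random.Random(0))
```

With 25 random krill this yields a herd of 50 individuals, each opposite krill mirroring its partner about the center of the search range.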

The Kernel Extreme Learning Machine Optimized by Novel Krill Herd Algorithm (NKH-KELM).
Consider a sample data set $(x_i, t_i)$ of size $N$, where $x_i = [x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iN}]^T$ is the sample input data, $N$ is the length of each vector $x_i$, and $t_i = [t_1, t_2, t_3, \ldots, t_i]^T$ is the sample output value. For a single-hidden-layer feedforward neural network (SLFN) with $L$ hidden-layer nodes, its flow is shown in Figure 5.
In Figure 5, $h(a_i x_i + b_i)$ is the activation function, $a_i$ is the weight vector between the $i$th hidden-layer node and the input-layer nodes, $b_i$ is the bias of the $i$th hidden-layer node, $\beta_i$ is the weight vector between the $i$th hidden-layer node and the output layer, and $y_i$ is the output value. So the output equation is
$$y_j = \sum_{i=1}^{L} \beta_i\, h(a_i x_j + b_i), \quad j = 1, 2, \ldots, N.$$
When the actual output of the extreme learning machine approximates the expected output with zero error, in other words, $\sum_{i=1}^{N} \|y_i - t_i\| = 0$, the network can be written as
$$H\beta = T.$$
The output of the hidden layer is represented by the matrix
$$H = \begin{bmatrix} h(a_1 x_1 + b_1) & \cdots & h(a_L x_1 + b_L) \\ \vdots & \ddots & \vdots \\ h(a_1 x_N + b_1) & \cdots & h(a_L x_N + b_L) \end{bmatrix},$$
and $T$ is the matrix of expected output vectors. The least-squares method is used to determine the output weight vector $\beta$ of the ELM. Its equation is as follows:
$$\beta = H^{+} T = H^T\left(\frac{I}{C} + HH^T\right)^{-1} T,$$
where $H^{+}$ is the Moore-Penrose generalized inverse of the hidden-layer output matrix, $C$ is the penalty factor, and $I$ is the identity matrix. In order to keep the eigenvalues of $HH^T$ away from zero, the constant matrix $I/C$ is added to $HH^T$, which improves the stability and generalization ability of the results. In terms of classification, for any linearly inseparable data set $X$ in a low-dimensional space, there is always a mapping $k$ that maps the samples to a high-dimensional feature space, $X \to k(x)$, in which the data set becomes linearly separable. However, the high-dimensional feature space greatly increases the amount of calculation, and more importantly, the generalization performance of the algorithm declines as the dimension increases, which is very unfavorable to the intelligent fault diagnosis of bearings. In other words, an algorithm may be applicable to intelligent fault diagnosis of bearings under one working condition but often not under another. The kernel extreme learning machine (KELM) borrows the idea of the support vector machine (SVM) and adopts a kernel function to replace the feature mapping of the ELM hidden-layer nodes, which avoids the curse of dimensionality to a certain extent. Compared with the ELM, the KELM does not need the number of hidden-layer nodes to be determined manually and also avoids the random generation of input weights and biases, thus improving the generalization ability and classification accuracy of the model. In the KELM, only appropriate kernel function parameters need to be selected to obtain the output weights.
For the KELM, the output vector is expressed as follows:
$$f(x) = h(x)\beta = h(x)H^T\left(\frac{I}{C} + HH^T\right)^{-1} T.$$
The kernel function is expressed as follows:
$$\Omega_{ELM} = HH^T,\quad \Omega_{ELM\,i,j} = h(x_i) \cdot h(x_j) = K(x_i, x_j).$$
In the selection of the kernel function, since there is no prior knowledge about the classification of faulty bearings, and the KELM is expected to have a strong local exploration ability, the radial basis function (RBF) is selected as the kernel function of the KELM [64]. The equation of the Gaussian kernel function is as follows:
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right),$$
where $x_i$ is the input vector during training, $x_j$ is the input vector during testing, and $\sigma$ is the width parameter of the kernel function.
So the output equation of the KELM is
$$f(x) = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_L) \end{bmatrix}^T \left(\frac{I}{C} + \Omega_{ELM}\right)^{-1} T,$$
where $L$ is the dimension of the input vector.
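A minimal, self-contained Python sketch of KELM training and prediction with a Gaussian kernel may make the role of C and σ concrete. It is an illustration under assumptions, not the authors' MATLAB code: the kernel is scaled by 2σ², targets are one-hot encoded, the predicted class is the arg-max of the output vector, and a small Gaussian elimination routine stands in for a linear algebra library. All function names and the toy data are hypothetical.

```python
import math


def rbf_kernel(u, v, sigma):
    """Gaussian kernel; the 2*sigma^2 scaling is an assumption."""
    squared = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-squared / (2.0 * sigma ** 2))


def solve(a_matrix, b_matrix):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n = len(a_matrix)
    m = [list(a_matrix[i]) + list(b_matrix[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, len(m[row])):
                m[row][k] -= factor * m[col][k]
    n_rhs = len(b_matrix[0])
    x = [[0.0] * n_rhs for _ in range(n)]
    for i in reversed(range(n)):
        for k in range(n_rhs):
            partial = sum(m[i][j] * x[j][k] for j in range(i + 1, n))
            x[i][k] = (m[i][n + k] - partial) / m[i][i]
    return x


def kelm_train(samples, labels, n_classes, c_penalty, sigma):
    """Return the coefficient matrix (I/C + Omega)^-1 T."""
    n = len(samples)
    targets = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
               for i in range(n)]
    system = [[rbf_kernel(samples[i], samples[j], sigma)
               + (1.0 / c_penalty if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    return solve(system, targets)


def kelm_predict(x, samples, coefficients, sigma):
    """Evaluate the kernel row against all training samples and take arg-max."""
    row = [rbf_kernel(x, s, sigma) for s in samples]
    scores = [sum(row[i] * coefficients[i][c] for i in range(len(row)))
              for c in range(len(coefficients[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy two-class data (illustrative only, not bearing features).
train_x = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
train_y = [0, 0, 1, 1]
coef = kelm_train(train_x, train_y, n_classes=2, c_penalty=15.0, sigma=0.53)
```

No hidden-layer size appears anywhere in the sketch; the only tunable quantities are the penalty factor and the kernel width, which is why these two are the natural targets for the optimizer.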
Considering that the kernel function parameters σ and the error penalty factor C in the KELM have a great influence on the results, the novel krill herd algorithm (NKH) is used to optimize the two parameters to improve the classification accuracy and robustness of the KELM.
The generalized mean-square error (GMSE) can directly reflect the regression performance of the KELM [65]. The equation of the GMSE is
$$GMSE = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2,$$
where $n$ is the number of test samples and $\hat{y}_i$ and $y_i$ are the estimated and actual values of the test samples, respectively. In this paper, the KELM algorithm is used to solve a classification problem, and the classification accuracy of the algorithm can be verified by 10-fold cross-validation. In other words, the data set is divided into ten parts, and each part in turn serves as the test data while the remaining nine serve as the training data. This procedure evaluates the performance of the parameters, reduces overfitting to a certain extent, and obtains as much effective information as possible from limited data to improve model performance [66].
Then, the ten GMSEs obtained in this way are averaged. Finally, the result is used as the fitness function $K$:
$$K = \frac{1}{k}\sum_{i=1}^{k} GMSE_i,$$
where $k$ is the number of experiments and is taken as 10.
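The fitness evaluation can be sketched in Python as below. The sketch assumes that the GMSE is the plain mean of squared errors and that folds are formed by taking every k-th sample; `fit_predict` is a hypothetical callback standing in for training a KELM with candidate values of C and σ and predicting the held-out fold.

```python
def gmse(predicted, actual):
    """Mean of squared errors over one test fold (assumed GMSE form)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def kfold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx); fold f holds samples with index % k == f."""
    for fold in range(k):
        test_idx = [i for i in range(n_samples) if i % k == fold]
        train_idx = [i for i in range(n_samples) if i % k != fold]
        yield train_idx, test_idx


def cv_fitness(samples, targets, fit_predict, k=10):
    """Average the per-fold GMSE; lower values mean better parameters."""
    errors = []
    for train_idx, test_idx in kfold_indices(len(samples), k):
        predictions = fit_predict(
            [samples[i] for i in train_idx],
            [targets[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        errors.append(gmse(predictions, [targets[i] for i in test_idx]))
    return sum(errors) / len(errors)


def mean_predictor(train_x, train_y, test_x):
    """Placeholder model: predicts the mean training target for every sample."""
    average = sum(train_y) / len(train_y)
    return [average for _ in test_x]


toy_x = list(range(20))
toy_y = [1.0] * 20
toy_fitness = cv_fitness(toy_x, toy_y, mean_predictor, k=10)
```

In an optimizer loop, each krill position (C, σ) would be scored by one call to `cv_fitness`, so the cost of a generation is the herd size times ten model fits.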
Based on the above, the flow chart of NKH-KELM is shown in Figure 6. As can be seen from Figure 6, the fault diagnosis method proposed in this paper consists of three parts, namely, fault feature extraction, NKH-KELM training, and NKH-KELM testing. In feature extraction, vibration signals are collected by accelerometers, samples are constructed, and MDE feature extraction is finally carried out so as to extract useful information from the original vibration signals. In NKH-KELM training, the NKH with OBL and the impulse operator is used to optimize the width parameter σ and the penalty factor C of the kernel function, and then the feature space extracted from the training set is imported into the KELM for training. In NKH-KELM testing, the feature space extracted from the test set is imported into the classifier trained in the previous step so as to conduct fault diagnosis and verify the feasibility of the proposed method.


Fault Diagnosis Experiment Based on Case Western Reserve University Bearing Data Set
4.1.1. Description of Data. In order to verify the feasibility of the proposed method, this paper adopts the test bearing data set provided by Case Western Reserve University to carry out the experiment. The program code of all algorithms was written in MATLAB R2016a and run on a computer with an i7-5500U CPU @ 2.40 GHz, 12.00 GB of RAM, and a 64-bit Windows 10 operating system. As shown in Figure 7, the test bench consists of a 2 horsepower motor (left), a torque sensor/encoder (center), a dynamometer (right), and other control equipment (not shown in Figure 7). The motor shaft is supported by the test bearings.
The single-point faults of all test bearings were introduced by electrical discharge machining (EDM), and the fault diameters were 0.007 inch, 0.014 inch, and 0.021 inch, respectively.
The drive-end bearing (this paper only considers the drive-end bearing data) is an SKF 6205-2RS bearing. All vibration data were collected by an accelerometer placed at the drive end of the motor housing, with a sampling frequency of 12 kHz.
The larger the fault size, the more severe the bearing fault. Fault diameters of 0.007 inch, 0.014 inch, and 0.021 inch are considered in this paper. In addition, for the motor load, the working conditions with no load (0 hp), small load (1 hp), and large load (3 hp) are considered. For convenience in the following description, a notation is used to represent the tested data. For example, IR014_3 represents a bearing with an inner ring fault, a load of 3 hp, and a fault diameter of 0.014 inch. Table 1 shows the specifications of the test bearings.
As can be seen from Table 1, there are 30 test bearing data sets. Since each of the three fault types corresponds to 3 fault diameters, and there is also a healthy bearing, there are 10 kinds of bearings, each tested under 3 loads. For each test bearing data set, every 5000 sampling points are selected as one sample, so each data set has 24 samples, and the whole data set is divided into 30 × 24 = 720 samples. The time-domain graphs of one sample from each test bearing data set under the three loading conditions are shown in Figures 8-10, respectively.
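The sample construction can be sketched in Python as follows. The sketch assumes non-overlapping windows and a record of 120,000 points, a length inferred from 24 samples of 5000 points rather than stated in the text; any trailing points beyond the last full window are discarded. The function name is hypothetical.

```python
def segment(signal, window=5000):
    """Split a record into non-overlapping windows, dropping the remainder."""
    n_windows = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(n_windows)]


# Placeholder record standing in for one vibration signal (assumed length).
record = [0.0] * 120000
samples = segment(record, window=5000)
total_samples = 30 * len(samples)  # 30 data sets in Table 1
```

Each resulting window would then be passed through MDE feature extraction to produce one row of the feature matrix.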
From the above time-domain diagram, it can be clearly seen that, under the same load, the same fault type, and different fault diameters, the peak size and peak occurrence time of the time-domain diagram are significantly different.For example, when the load is 3 hp, there are three different time-domain diagrams corresponding to the ball fault.In addition, in the case of the same fault type and fault diameter under different loads, there are obvious differences in the peak size and peak occurrence time of time-domain graphs.Moreover, as can be seen from Table 2, the maximum value of the signal may increase or decrease with the increase of load.For example, in the rolling body fault with the fault diameter of 0.007 inch, with the increase of load, the maximum value of the signal is 3.4254, 3.0960, and 3.1301 successively.Moreover, with the increase of load, the minimum value, peak value, and root-mean-square value all change to some extent.In addition, with the same load and fault type, but with different fault diameters, the peak values of time-domain graphs also have obvious differences.
Therefore, this further supports the necessity of considering load and fault diameter in intelligent fault diagnosis.

Fault Bearing Classification in Case Western Reserve University Bearing Data Set Based on NKH-KELM.
According to the intelligent fault diagnosis method proposed in this paper, MDE is first used to extract multiscale features from the original vibration signals. Regarding the choice of the time scale τ, if τ is too small, that is, if the signal is observed over too narrow a range of scales, too little useful information is extracted, which affects the subsequent operations. Conversely, if τ is too large, it increases the memory requirements and computing time of the computer. Therefore, in order to extract more useful information within limited computer memory, the time scale τ is set to 20 and the embedding dimension m to 2 [67]. In addition, when the time delay d is greater than 1, aliasing may occur. Therefore, the time delay d in this paper is taken as 1. Figure 11 shows the MDE extraction results for the bearing health conditions in the test bearing data set under three different loads. In this experiment, the extracted feature space is a matrix of size 720 × 20, in which 720 is the number of samples and 20 is the dimension of each feature vector. Before NKH-KELM is used for classification, 540 samples are selected for training and 180 for testing, and the number of class labels is 4 (N, IR, B, and OR). First, the training sample set and the novel krill herd algorithm are used to optimize the kernel function parameter σ and the error penalty factor C of the KELM. In the initialization of the krill herd, the number of krill is set to 25, and the maximum allowable number of krill position updates is 200 [68]. After OBL, 25 opposition krill are produced, so the whole herd contains 50 individuals. After the impulse operator is introduced, the value of the impulse coefficient has a great influence on the robustness of the krill herd algorithm. If the impulse coefficient is too small, a krill may fall into a local optimum and have difficulty escaping it because its impulse is too small. If the impulse coefficient is too large, the impulse will be too large and the krill will keep overshooting the global optimum; that is to say, even after a krill enters the region of the global optimum, its next move carries it out again, so the krill herd algorithm never converges to the global optimum. When the impulse coefficient is greater than or equal to 1, the next move of the krill is likely to end up in a situation where the global optimum is never reached, which is the worst case. Therefore, when the selection of the impulse coefficient is discussed in this paper, it is first restricted to be less than 1; its minimum value is then set to 0.1, and the interval between adjacent candidate values is set to 0.1. The experiment is repeated five times, and the results are shown in Figure 12.
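The opposition-based learning (OBL) step mentioned above, in which 25 krill yield 25 opposition krill and a herd of 50, can be illustrated with a minimal sketch. It assumes the commonly used opposite-point rule x' = lower + upper − x; the search bounds for (C, σ) and the function name `opposition_population` are hypothetical, since the paper does not state them in this section.

```python
import numpy as np

def opposition_population(pop, lower, upper):
    """Return the opposite point lower + upper - x for every individual."""
    return lower + upper - pop

rng = np.random.default_rng(0)

# Hypothetical search bounds for (C, sigma); not taken from the paper.
lower = np.array([0.01, 0.01])
upper = np.array([100.0, 10.0])

n_krill = 25  # initial number of krill stated in the paper
krill = rng.uniform(lower, upper, size=(n_krill, 2))
opposite = opposition_population(krill, lower, upper)

# Combined herd: 25 original krill + 25 opposition krill = 50 individuals.
herd = np.vstack([krill, opposite])
print(herd.shape)
```

Because each opposite point is the reflection of a krill about the center of the search box, the opposition krill always remain inside the same bounds as the original krill.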
It can be seen from Figure 12 that when the impulse coefficient p is 1, the optimal fitness hardly changes after reaching $10^{-5}$.
That is to say, when the impulse is too large, a krill that enters the optimal solution area may rush out of it again on its next movement, which indirectly supports the previous hypothesis. As can be seen from Figure 12, when the impulse coefficient p is 0.6 and the krill move 200 times, the average fitness of the optimal krill individual is $2.165 \times 10^{-9}$, the optimized error penalty factor C is 15, and the width parameter σ of the kernel function is 0.53. Second, the optimized KELM is trained and tested on the feature space extracted by MDE. The input of the KELM consists of the 720 MDE samples, of which 540 are used for training and the remaining 180 for testing. Then, training proceeds according to the kernel mapping rules of the KELM. Finally, the output of the KELM is obtained, that is, the prediction labels of the input samples.
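The KELM training and prediction steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel of the form exp(−‖a − b‖² / (2σ²)), one-hot target vectors, and the standard KELM solution β = (K + I/C)⁻¹T. The feature matrix is synthetic and only mimics the 720 × 20 shape and the 540/180 split; the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel between the rows of A and B (assumed kernel form)."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))

def kelm_fit(X, y, n_classes, C, sigma):
    """Solve (K + I/C) beta = T for the output weights beta."""
    T = np.eye(n_classes)[y]                  # one-hot targets
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + np.eye(len(X)) / C, T)

def kelm_predict(X_train, beta, X_new, sigma):
    """Predicted class is the index of the largest KELM output."""
    return np.argmax(rbf_kernel(X_new, X_train, sigma) @ beta, axis=1)

# Synthetic stand-in for the 720 x 20 MDE feature matrix (not the paper's data).
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 180)              # 4 classes: N, IR, B, OR
X = 0.5 * y[:, None] + 0.05 * rng.standard_normal((720, 20))

idx = rng.permutation(720)
train, test = idx[:540], idx[540:]            # 540 training / 180 test samples

C, sigma = 15.0, 0.53                         # optimized values reported in the paper
beta = kelm_fit(X[train], y[train], 4, C, sigma)
y_pred = kelm_predict(X[train], beta, X[test], sigma)
accuracy = np.mean(y_pred == y[test])
print(f"test accuracy on synthetic data: {accuracy:.4f}")
```

The penalty factor C enters only through the ridge term I/C, so a larger C fits the training labels more closely, while σ controls how quickly the kernel similarity decays with distance between feature vectors.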
The prediction labels were compared with the actual labels of the samples, and the KELM was trained on this basis. The experiment is repeated five times. The average fault diagnosis accuracy is shown in Table 3, and the training and test diagnosis results of one of the experiments are shown in Figure 13. In Figure 13, the abscissa is the sample number, the ordinate is the state of the sample, the circle represents the actual label of the sample, and the asterisk represents the prediction label of the sample. As can be seen from Figure 13, the classification accuracy of the training process and the test process reached 99.2593% and 98.3333%, respectively. In addition, the accuracy reached 100% in the diagnosis of the normal bearing set, whereas each of the other three bearing types had a few misdiagnosed cases. This also demonstrates the feasibility of the proposed method. Moreover, the misdiagnosed samples are concentrated mainly on the rolling element, which may be related to the working environment of the rolling element and other factors.
In order to objectively test the performance of the proposed method, the same data set is fed into other algorithms, namely the K-nearest neighbor algorithm (KNN) [69], naive Bayes classification algorithm (NB) [70], symbol dynamic entropy + support vector machine (SVM) [27], radial basis network (RBF) [71], extreme learning machine (ELM) [31], and multilayer extreme learning machine [32].
The experiment is repeated five times, and the average of the five runs is taken as the evaluation index. The results are shown in Table 4 and Figures 14 and 15.
It can be seen from Table 4 and Figure 14 that the kernel extreme learning machine optimized by the krill herd algorithm improved with OBL and the impulse operator performs better than the KELM optimized by the standard krill herd algorithm, which also demonstrates the superiority of the novel krill herd algorithm proposed in this paper and the soundness of introducing OBL and the impulse operator. Figure 15 shows the average confusion matrices of the six algorithms over the five experiments, where the horizontal coordinate is the prediction label, the vertical coordinate is the actual label of the sample, and the number in each cell is the proportion of samples with the given actual label that received the given prediction label. It can be seen from Figure 15 and Table 4 that the seven methods used for comparison and the method proposed in this paper all reach 100% in the classification of healthy bearings.
This may be because the vibration signals of the healthy bearing differ greatly from those of the faulty bearings, because the quality of the vibration signals acquired at Case Western Reserve University is high, or because the feature extraction method (MDE) is highly effective, any of which would lead all of the algorithms to reach a classification accuracy of 100% on the healthy bearing. Among the three fault types, NKH-KELM is superior to the other seven algorithms, which shows that NKH-KELM has certain advantages among machine-learning-based fault diagnosis methods. In addition, in the fault diagnosis of the ball, none of the six algorithms used for comparison exceeds 95%, and NB reaches only 49%. It can also be seen from Figure 15 that the six algorithms confuse ball faults and outer ring faults to a large extent. In other words, the ability of these six algorithms to distinguish ball faults from outer ring faults is not high. However, the ball fault diagnosis accuracy of NKH-KELM reaches 97.1%, and the outer ring fault diagnosis accuracy reaches 98.4%, which indicates that NKH-KELM has certain advantages over the other algorithms in distinguishing ball faults from outer ring faults.
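To make the reading of such a confusion matrix concrete, the following sketch builds a row-normalized matrix (rows are actual labels, columns are prediction labels) and reads the per-class accuracy from its diagonal. It assumes row normalization, which the description of Figure 15 suggests but does not state explicitly; the toy labels are invented for illustration and are not results from the paper.

```python
import numpy as np

def normalized_confusion(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix: rows = actual label, columns = predicted label."""
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (y_true, y_pred), 1)    # accumulate one count per sample
    return counts / counts.sum(axis=1, keepdims=True)

labels = ["N", "IR", "B", "OR"]

# Toy labels for illustration only (not results from the paper):
# two of the four ball-fault samples are predicted as outer ring faults.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 2, 3, 3])
y_pred = np.array([0, 0, 1, 1, 2, 2, 3, 3, 3, 3])

cm = normalized_confusion(y_true, y_pred, len(labels))
per_class_accuracy = np.diag(cm)
for name, acc in zip(labels, per_class_accuracy):
    print(f"{name}: {acc:.2f}")
```

In this toy case the off-diagonal entry in row B, column OR is 0.5, which is the kind of overlap between ball faults and outer ring faults discussed above.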

Fault Diagnosis Experiment Based on XJTU-SY Bearing Data Set
4.2.1. Description of Data. As shown in Figure 16, the bearing test bench is composed of an AC motor, a motor speed controller, a supporting shaft, a hydraulic loading system, etc. [72]. The radial force is generated by the hydraulic loading system and applied to the housing of the bearing under test. The speed is set and maintained by the AC induction motor speed controller. In this experiment, the bearing model is LDK UER204, and its specifications are shown in Table 5.

Fault Bearing Classification in XJTU-SY Bearing Data Set Based on NKH-KELM.
In order to test the generalization ability of the method proposed in this paper, the XJTU-SY bearing data set with the sampling frequency of 37.5 Hz was selected.
The last 5000 points in the first 108 files of Bearing2_1 (inner race fault), Bearing2_2 (outer race fault), and Bearing2_3 (cage fault) are selected as failure samples, and the first 5000 points in the first 36 samples of Bearing3_4 are taken as healthy samples. In other words, the input of this experiment  17.
In Figure 17, the abscissa is the sample number, the ordinate is the state of the sample, the circle represents the actual label of the sample, and the asterisk represents the prediction label of the sample. N stands for the healthy bearing, IR for the inner race fault, C for the cage fault, and OR for the outer race fault. In this experiment, the fault classification accuracy reaches 95.5556%. The classification accuracy for the healthy bearing reaches 100%. The misdiagnoses among the three kinds of faulty bearings mainly involve the cage. For example, an inner race or outer race fault is misdiagnosed as a cage fault, or a cage fault is misdiagnosed as an inner race or outer race fault. This indicates that the characteristics of the three faults are similar to some extent. However, on the whole, the method proposed in this paper still has certain advantages in fault diagnosis on the
Then, the bearing data from Case Western Reserve University are imported into NKH-KELM for training and testing. Finally, the method proposed in this paper is compared with the other seven methods. It can be seen from the experimental results that NKH-KELM is superior to the other seven methods in the diagnosis of the three fault types, which also demonstrates the superiority of NKH-KELM in intelligent fault diagnosis of bearings. By comparing NKH-KELM with the standard KH + KELM, it can be seen that the optimization ability of the novel krill herd algorithm is better than that of the unimproved krill herd algorithm. However, in the diagnosis of ball faults, although NKH-KELM is better than the other seven algorithms, the diagnostic accuracy for ball faults is significantly lower than that for the other two fault types. This phenomenon may be related to the low signal-to-noise ratio in the acquisition of the vibration signal of the ball, the small proportion of useful information on the ball extracted by the feature extraction method (MDE), or the poor performance of machine learning in the fault diagnosis of the ball. Therefore, this question will be an important research direction in the future. In addition, in the era of big data, deep learning methods are becoming increasingly mature, so there is a great demand not only for intelligent fault diagnosis based on machine learning but also for intelligent fault diagnosis based on deep learning, which will also be a key research direction.

Figure 7: Rolling bearing failure simulation test bench of Case Western Reserve University.

Figure 8: Time-domain graph of the original signal when the load is 0 hp.

Figure 9: Time-domain graph of the original signal when the load is 1 hp.

Figure 10: Time-domain graph of the original signal when the load is 3 hp.

Figure 11: Dispersion entropy of each scale of the vibration signal of the test bearing data set under three kinds of loads. (a) Load = 0 hp. (b) Load = 1 hp. (c) Load = 3 hp.

Figure 12: Optimal fitness value corresponding to the krill group movement under different impulse coefficients.
Multiscale coarse granulation of the time series with scale 3.
dynamic behavior of signals.Moreover, as can be seen from the time-domain graphs in the previous section, the tiny dynamic behavior of signals may bring about changes in the fault type or fault diameter.On the contrary, if the embedding dimension m is too large, although it can obtain more reliable entropy results, it will greatly increase the memory requirements and computing time of the computer.

Table 2: Numerical analysis of the healthy bearing and faulty bearing with a fault diameter of 0.007 inch.

Table 3: Training accuracy and test accuracy of each type of bearings.

Table 4: Average classification accuracy of different algorithms.

Figure 14: Bearing diagnosis of the KELM optimized by the standard krill herd algorithm: results of NKH-KELM with classification accuracy = 88.3333%.
Finally, the test sample set was imported into the trained model to test the performance of the model. The test results are shown in Figure

Table 5: Parameters of testing bearing.