Improved Butterfly Optimizer-Configured Extreme Learning Machine for Fault Diagnosis

This paper proposes an efficient intelligent fault diagnosis model that provides a timely, accurate, and dependable basis for identifying the condition of rolling bearings in actual production. The model is based on a kernel extreme learning machine (KELM) optimized by an improved butterfly optimization algorithm (BOA). First, the vibration signals of the rolling bearing in four states (normal state, outer race failure, inner race failure, and rolling ball failure) are decomposed into several intrinsic mode functions (IMFs) using complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Then, the amplitude energy entropies of the IMFs are taken as the features of the rolling bearing. To eliminate redundant features, a random forest was used to estimate the contribution of each feature to the accuracy of the results, and feature subsets were built by removing one feature at a time in descending order, with the classification accuracy of the SBOA-KELM model used as the criterion for choosing the optimal feature subset. The salp swarm algorithm (SSA) was introduced into BOA to improve its optimization ability, obtain optimal KELM parameters, and keep BOA from being trapped in local optima. Finally, an optimal SBOA-KELM model was constructed for identifying the condition of rolling bearings. In the experiments, SBOA was validated against ten other competitive optimization algorithms on 30 IEEE CEC2017 benchmark functions. The results showed that SBOA outperformed the existing algorithms on most functions. SBOA-KELM applied to the fault diagnosis of rolling bearings obtained improved classification performance and higher stability. Therefore, the proposed SBOA-KELM model can be effectively used to diagnose faults of rolling bearings.


Introduction
As core mechanical components, rolling bearings are widely used in rotating machinery such as wind turbines, aeroengines, ships, and automobiles. However, a bearing failure carries a considerable probability of causing a mechanical failure: bearing faults account for more than 30% of failures in rotating machinery. Rolling bearings suffer from various faults, including outer race, inner race, and ball faults, generally as a result of long-term operation in a complex environment. When those faults become serious, they may cause a sudden breakdown of the machine or even the entire system, leading to substantial financial losses and even casualties among workers. Overmaintenance of rolling bearings increases corporate maintenance costs, whereas insufficient maintenance can easily lead to unexpected production accidents. Intelligent fault diagnosis technology, formed by combining fault diagnosis with computer technology, provides a useful reference for fault detection and equipment maintenance. Therefore, it is vital to apply intelligent fault diagnosis methods to rolling bearings.
Intelligent fault diagnosis methods have been widely used in bearing fault diagnosis. Xu et al. [1] proposed a new expert system based on belief rules (BRB), built from multiple simultaneously activated BRB subsystems, for diagnosing whether marine diesel engines were faulty. Pang et al. [2] proposed a novel fault pattern classification method based on an ensemble kernel extreme learning machine (KELM) that fuses time-domain and frequency-domain features into low-dimensional intrinsic features using local and global principal component analysis. Li et al. [3] proposed a novel machine fault diagnosis method that can efficiently learn discriminative representations from the local and global geometry of the input data. Kaplan et al. [4] proposed a new approach based on texture analysis that converts vibration signals to grayscale images to fuse non-whole binary patterns and texture features for bearing fault diagnosis. Deng et al. [5] proposed a modified classification and regression tree (CART) algorithm that improves fault diagnosis speed by decreasing the number of iterations in the computation while guaranteeing accuracy. Zhao et al. [6] proposed a new deep residual network based on the fusion of multiple wavelet coefficients for fault diagnosis. Ma et al. [7] proposed a new fault detection and diagnosis (FDD) method that builds an overcomplete dictionary pair, using a dictionary pair learning strategy, from features extracted by the wavelet transform for motor fault diagnosis. Li et al. [8] suggested a novel feature extraction method that combines the learnable modules of multiple LS-SVMs in a deep stacking structure based on representation learning (S-RL) to extract features for fault diagnosis. Zheng et al. [9] proposed an improved MPE-based feature extraction method to extract fault features from the vibration signal of a rolling bearing. They applied a PSO-based SVM to fault diagnosis. Deng et al. [10] proposed an optimized deep belief network model based on an improved quantum-inspired differential evolution algorithm for the fault diagnosis of rolling bearings. Zhao et al. [11] proposed a new high-order differential mathematical morphology gradient spectrum entropy method to extract features from the vibration signal of a rolling bearing. Zhao et al. [12] proposed a novel method that applies principal component analysis and a broad learning system to fault diagnosis. Deng et al. [13] proposed a novel intelligent diagnosis method based on LS-SVM with an enhanced PSO algorithm for rolling bearing faults.
The KELM is an ELM variant constructed using kernel tricks. The capability of KELM is mainly affected by two critical parameters: the penalty coefficient and the kernel width. Researchers have proposed many effective methods to determine these two parameters. Lu et al. [45] proposed using PSO to optimize the parameters of KELM to obtain the optimal model. Luo et al. [46] developed a multistrategy improved GOA-based KELM for bankruptcy prediction. Wang et al. [47] used a chaotic FOA-optimized KELM to diagnose sepsis. Tian et al. [48] utilized a quantum-based PSO-optimized KELM for activity recognition. Baliarsingh et al. [49] offered a weighted-chaotic SSA for simultaneously optimizing KELM parameters and features in genomic data. Hu et al. [50] developed a cross-validated PSO for training an optimal KELM for fault diagnosis of wind turbine gearboxes. Pani and Nayak [51] suggested employing KELM based on the chaotic GSA to forecast solar irradiance. Luo et al. [52] recommended using GWO-MFO to obtain the optimal KELM model for diagnosing somatization disorder. Li et al. [53] offered a novel method that uses the improved binary GWO wrapped with KELM for disease diagnosis. Bisoi et al. [54] proposed using DE to train an optimal KELM to predict stock price and movement. Wang et al. [55] proposed a chaotic MFO to optimize the critical parameters of KELM and obtain an optimal KELM model for medical diagnosis. Wang et al. [56] proposed using GWO to obtain an optimal KELM model for predicting enterprise bankruptcy. Heidari et al. [57] proposed a multistrategy improved GWO enhanced with effective exploratory and exploitative mechanisms. Chen et al. [58] proposed using chaotic and mutative BFO to seek the optimal parameters of KELM for classification tasks.
In this study, a KELM model optimized by an improved butterfly optimization algorithm (BOA), named SBOA-KELM, was proposed and applied to bearing fault diagnosis. First, energy entropy features were extracted from the raw vibration signals by CEEMDAN: the original vibration signals were decomposed into multiple IMF components, and the energy entropy of the IMFs was calculated to construct an energy feature vector. Second, to avoid the data redundancy and extra computation caused by low-energy features, a random forest was used to evaluate the importance of each feature and to select informative features as the new feature vector. Third, the proposed SBOA-KELM method was used for fault feature classification. Finally, the proposed SBOA-KELM was verified and compared with several representative approaches. The experimental results showed that the proposed technique effectively diagnosed the bearing faults and that the average classification accuracy was much improved. Table 1 lists the nomenclature used in the paper. The rest of the study is structured as follows. Section 2 describes the data collection and briefly introduces CEEMDAN, random forest, SBOA, and the proposed SBOA-KELM model. The experimental setup is described in Section 3. Section 4 presents the results of SBOA on benchmark functions and of SBOA-KELM on the bearing dataset. The conclusions and future work are given in Section 5.

Data Collection.
The rolling bearing data from the website of the Bearing Data Center of Case Western Reserve University were employed in this study to check the feasibility and utility of the proposed method. The URL of the website of the Bearing Data Center is https://csegroups.case.edu/bearingdatacenter/home. The rolling bearing model is 6205-SKF. The structural parameters of the rolling bearing are shown in Table 2. First, damage points were artificially set on the inner race, outer race, and ball of the rolling bearings. With an input shaft speed of n = 1797 r/min and a sampling frequency of 12 kHz, the vibration signals of the bearing were gathered in four states: normal state, inner race fault, outer race fault, and ball fault. For each state, 40 samples were taken from the vibration signal, each with a length of 1200 points.
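The sampling scheme above (40 samples of 1200 points per state) can be illustrated with a minimal sketch. The text does not say whether samples overlap; consecutive non-overlapping windows, the function name `segment_signal`, and the synthetic signal are all assumptions made for illustration.

```python
def segment_signal(signal, n_samples=40, sample_length=1200):
    """Split a 1-D signal into n_samples consecutive, non-overlapping windows."""
    needed = n_samples * sample_length
    if len(signal) < needed:
        raise ValueError(f"need at least {needed} points, got {len(signal)}")
    return [signal[i * sample_length:(i + 1) * sample_length]
            for i in range(n_samples)]


# Synthetic stand-in for one recorded vibration channel (not real bearing data).
fake_signal = [float(i % 100) for i in range(50_000)]
samples = segment_signal(fake_signal)
```

At 12 kHz, a 1200-point sample lasts 0.1 s; at 1797 r/min (about 29.95 revolutions per second) this covers roughly three shaft revolutions.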

SBOA-KELM Method.
The flowchart of the proposed SBOA-KELM is shown in Figure 1. The whole flow includes feature extraction based on CEEMDAN energy entropy, feature selection based on random forest, and classification based on BOA-KELM. The first step is feature extraction: the CEEMDAN method decomposes the raw vibration signals of the bearing into multiple IMFs, and the energy entropy of each IMF is computed and normalized. The second step is to select features from the CEEMDAN energy entropy to reduce data redundancy. The third step is to optimize the two critical parameters of the KELM using SBOA. Then, the optimal parameters and feature combination are used to train an optimal KELM. Finally, the optimal KELM classifier is used to diagnose the rolling bearing and determine its working condition. The standard 10-fold cross-validation scheme, which is widely adopted by researchers, is used to divide the data to obtain a more accurate and unbiased experimental result.

Feature Extraction.
Fault feature extraction is a critical step in the fault diagnosis of rolling bearings [59]. When rolling bearings are faulty, the vibration signals are mostly nonstationary and nonlinear, and they are contaminated by intense noise [60].
Compared with EEMD, CEEMDAN performs better in feature extraction: it preserves the original signal, eliminates noise, and extracts bearing fault features more accurately and quickly. CEEMDAN is an adaptive time-frequency signal analysis method developed from EEMD, and it can effectively extract fault frequency characteristics. Previous research has shown that CEEMDAN decomposes signals better than EEMD [61].
To make the fault characteristic information obtained by CEEMDAN decomposition more apparent, it needs to be quantified [62]. When different faults occur in rolling bearings, the amplitude energy within the frequency range of the vibration signal changes to varying degrees. Therefore, the feature matrices of bearing faults consist of the energy entropy of the IMFs. The method of feature extraction based on CEEMDAN energy is as follows:
Step 1: decompose the vibration fault signal of the rolling bearing with CEEMDAN to obtain multiple IMFs.
Step 2: calculate the amplitude energy E_1, E_2, ..., E_n of each IMF component (equation (2)), where N is the number of sampling points of the j-th IMF component.
Step 3: assuming that the residue r(t) can be ignored, obtain the total energy of the signal (equation (3)).
Step 4: to prevent the IMF components in which the amplitude energy is concentrated from dominating the relatively weak IMFs, normalize the amplitude energy of each IMF. The corresponding CEEMDAN energy entropy (EN) is then given by equation (5), where p_j is the proportion of the amplitude energy of the j-th IMF component in the total energy.
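The equations belonging to these steps are not reproduced in the text above. A commonly used formulation that is consistent with the stated definitions (N sampling points per IMF, the residue r(t) ignored, p_j as the share of the total energy) is sketched below. The symbol c_j for the j-th IMF, the squared-amplitude form of the energy, and the natural logarithm are assumptions, so the authors' exact expressions may differ.

```latex
\begin{align}
x(t) &= \sum_{j=1}^{n} c_j(t) + r(t) \tag{1} \\
E_j  &= \sum_{i=1}^{N} \lvert c_j(i) \rvert^{2}, \qquad j = 1, 2, \ldots, n \tag{2} \\
E    &= \sum_{j=1}^{n} E_j \tag{3} \\
p_j  &= \frac{E_j}{E} \tag{4} \\
\mathrm{EN} &= -\sum_{j=1}^{n} p_j \ln p_j \tag{5}
\end{align}
```

Under this formulation EN is largest when the energy is spread evenly over the IMFs (all p_j equal) and approaches zero when a single IMF carries almost all of the energy.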

Feature Selection Based on Random Forest.
Random forest is an ensemble algorithm of multiple classifiers that can obtain high classification accuracy in a short time with few training samples [63]. For feature sets containing multiple time-domain features, random forest classifiers can reduce the dimensionality of the features and reduce overfitting. The algorithm flow is as follows:
Step 1: the energy entropy features of the sample data are input into the random forest to calculate feature importance, and the features are sorted in descending order of importance.
Step 2: according to a given deletion ratio, the least important features are deleted from the feature set to construct a new feature set.
Step 3: the new feature set is input into a new random forest, the importance of each feature is calculated, and the features are again sorted in descending order; Steps 2 and 3 are repeated until a specified number of features are left.
Step 4: each feature set corresponds to a random forest; the corresponding out-of-bag error rate is calculated, and the feature set with the lowest out-of-bag error rate is taken as the final selected feature set.
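The four steps above can be sketched as a generic elimination loop. This is only a skeleton under stated assumptions: `importance_fn` and `error_fn` are hypothetical callbacks standing in for a trained random forest's feature importances and its out-of-bag error, the feature names `f1` to `f7` and their scores are toy values, and the deletion ratio of 0.2 is arbitrary.

```python
def eliminate_features(features, importance_fn, error_fn,
                       drop_ratio=0.2, min_features=1):
    """Iteratively drop the least important features; return (error, subset).

    importance_fn(subset) -> {feature: importance}  (stand-in for a random forest)
    error_fn(subset)      -> float                  (stand-in for out-of-bag error)
    """
    current = list(features)
    candidates = []
    while True:
        importance = importance_fn(current)
        # Steps 1 and 3: sort in descending order of importance.
        current = sorted(current, key=lambda f: importance[f], reverse=True)
        # Record the error of this feature set for the final choice in Step 4.
        candidates.append((error_fn(current), list(current)))
        if len(current) <= min_features:
            break
        # Step 2: delete a fixed proportion (at least one feature) from the tail.
        n_drop = max(1, int(len(current) * drop_ratio))
        current = current[:max(min_features, len(current) - n_drop)]
    # Step 4: keep the feature set with the lowest error.
    return min(candidates, key=lambda c: c[0])


# Toy stand-ins (hypothetical values, not results from the paper).
toy_importance = {"f1": 0.30, "f2": 0.05, "f3": 0.25, "f4": 0.10,
                  "f5": 0.20, "f6": 0.02, "f7": 0.08}
best_error, best_subset = eliminate_features(
    list(toy_importance),
    importance_fn=lambda subset: {f: toy_importance[f] for f in subset},
    error_fn=lambda subset: abs(len(subset) - 4) / 10,
)
```

In a real pipeline both callbacks would retrain a forest on the current subset, which is what makes the importance ranking change from round to round.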

Classification Based on BOA-KELM.
Optimization problems can be formulated in many ways, including multiobjective, fuzzy, large-scale, and robust optimization. One way to deal with a problem is to use a single objective and hybrid methods for solving it [64][65][66][67][68][69][70][71][72]. In this study, SBOA-KELM was constructed by improving BOA with the SSA. The resulting SBOA was used to optimize the parameters of KELM for the fault classification task. BOA [73] is a nature-inspired optimization algorithm based on the food foraging behavior of butterflies. Butterflies act as the search agents of BOA and cooperate to search the solution space efficiently for the best solution [74].
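For orientation, the sketch below shows only a basic BOA loop as it is commonly described for the original algorithm [73]: each butterfly emits a fragrance f = c * I^a and moves either toward the best solution or along a random local walk, chosen with switch probability p. It is not the paper's SBOA, and the SSA component is not included. The default values c = 0.01, a = 0.1, and p = 0.8, the greedy acceptance step, the clamping to bounds, and the use of the absolute objective value as stimulus intensity are assumptions of this sketch.

```python
import random


def boa_minimize(objective, dim, bounds, n_butterflies=20, n_iter=50,
                 c=0.01, a=0.1, p=0.8, seed=0):
    """Basic butterfly optimization algorithm (not the paper's SBOA)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_butterflies)]
    fit = [objective(x) for x in pop]
    best_i = min(range(n_butterflies), key=fit.__getitem__)
    best_x, best_f = list(pop[best_i]), fit[best_i]

    for _ in range(n_iter):
        for i in range(n_butterflies):
            fragrance = c * abs(fit[i]) ** a           # f = c * I^a
            r = rng.random()
            if r < p:                                  # global search toward the best
                step = [(r * r * g - x) * fragrance
                        for g, x in zip(best_x, pop[i])]
            else:                                      # local random walk
                j, k = rng.sample(range(n_butterflies), 2)
                step = [(r * r * xj - xk) * fragrance
                        for xj, xk in zip(pop[j], pop[k])]
            cand = [min(hi, max(lo, x + s)) for x, s in zip(pop[i], step)]
            cand_f = objective(cand)
            if cand_f < fit[i]:                        # greedy acceptance (assumption)
                pop[i], fit[i] = cand, cand_f
                if cand_f < best_f:
                    best_x, best_f = list(cand), cand_f
        c += 0.025 / (c * n_iter)                      # sensory modality update
    return best_x, best_f


def sphere(x):
    """Simple convex test objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_f = boa_minimize(sphere, dim=5, bounds=(-10.0, 10.0))
```

The swarm size of 20 and the 50 iterations mirror the values given in the experimental setup; for KELM tuning the objective would be a cross-validated error over the two parameters instead of the sphere function.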
In this study, KELM was used to identify the fault types of rolling bearings. In KELM, the output is determined by evaluating the kernel function, without computing the hidden-layer output matrix. Compared with SVM, KELM learns faster and has better generalization performance [75]. Therefore, KELM was chosen to diagnose the faults of rolling bearings in this study.
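The idea of predicting from kernel evaluations alone can be made concrete with a small sketch. It uses the commonly cited KELM closed form beta = (Omega + I/C)^(-1) T, where Omega is the kernel matrix of the training data and C is the penalty coefficient. The Gaussian kernel written as exp(-||u - v||^2 / gamma), the one-hot target encoding, the helper names, and the toy data are assumptions for illustration and are not taken from the paper.

```python
import math


def rbf(u, v, gamma):
    """Gaussian kernel; gamma plays the role of the kernel width."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / gamma)


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def kelm_fit(X, y, n_classes, C, gamma):
    """Return output weights beta = (Omega + I/C)^-1 T for one-hot targets T."""
    n = len(X)
    omega = [[rbf(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    T = [[1.0 if y[i] == k else 0.0 for k in range(n_classes)] for i in range(n)]
    return solve(omega, T)


def kelm_predict(X_train, beta, x, gamma):
    """Predict the class with the largest kernel-weighted output score."""
    k = [rbf(x, xi, gamma) for xi in X_train]
    scores = [sum(ki * b[c] for ki, b in zip(k, beta))
              for c in range(len(beta[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy two-class data (hypothetical, not bearing features).
X = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
y = [0, 0, 1, 1]
beta = kelm_fit(X, y, n_classes=2, C=100.0, gamma=0.5)
pred_a = kelm_predict(X, beta, [0.05, 0.05], gamma=0.5)
pred_b = kelm_predict(X, beta, [0.95, 0.95], gamma=0.5)
```

Training reduces to one linear solve, so the only quantities left to tune are C and gamma, which is why a metaheuristic search over these two values is a natural fit.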
In the past few years, many researchers have explored the application of KELM to the fault diagnosis of rotating machinery. Hu et al. [50] proposed a fault diagnosis method that extracted time-domain features from the vibration signal using a wavelet packet transform (WPT) filter and diagnosed gearbox-related faults using a cross-validated particle swarm optimization (CPSO)-based KELM. Lei et al. [76] proposed a new fault classification method that combined KELM with the intrinsic timescale decomposition (ITD) technique to identify tool wear conditions. Long et al. [77] proposed a novel fault pattern recognition methodology for wind turbine gearboxes that combines a cloud bat algorithm (CBA) with KELM. Wang et al. [78] proposed a novel intelligent bearing fault diagnosis method that optimized KELM parameters through the krill herd algorithm (NKH). In this study, the two critical parameters of KELM are optimized by SBOA to improve the classification accuracy of fault diagnosis. The flowchart of SBOA is shown in Figure 2.

Proposed SBOA-KELM.
After the CEEMDAN energy entropy features are normalized, a random forest is built to obtain the optimal feature subset, which reduces data redundancy. To improve the classification accuracy of fault identification and the generalization ability of the model, SBOA is used to obtain the optimal values of the key parameters of KELM. The steps of feature selection based on random forest and parameter optimization of the SBOA-KELM model are as follows:
Step 1: normalize the energy entropy feature data so that the normalized data lie in the range [0, 1].
Step 2: evaluate the importance of the normalized features with a random forest through out-of-bag errors and, after setting the threshold, select the optimal feature subset.
Step 3: separate the optimal feature subset into a training set and a test set using the 10-fold cross-validation (CV) scheme.
Step 4: use SBOA to optimize the two critical parameters of KELM, which is trained on the training set with the optimal feature subset as input, through an inner 5-fold CV scheme.
Step 5: evaluate the accuracy of KELM on the test data. If the fold counter K is less than 10, go to Step 4.
Step 6: average the prediction results over the ten independent tests as the output result.
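Steps 3 to 6 describe a nested cross-validation: the outer 10 folds estimate accuracy, and the inner 5 folds are used only for parameter tuning. A minimal sketch of that control flow follows. The callbacks `tune_fn` and `score_fn` are hypothetical placeholders for the SBOA search and the KELM evaluation, stratified fold assignment is assumed for both loops, and the stub values passed in below are not real results.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Return k lists of sample indices with each class spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def nested_cv(labels, tune_fn, score_fn, outer_k=10, inner_k=5):
    """Outer loop estimates accuracy; inner folds are used only for tuning."""
    scores = []
    for test_idx in stratified_folds(labels, outer_k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        inner = stratified_folds([labels[i] for i in train_idx], inner_k, seed=1)
        # Map inner positions back to the original sample indices.
        inner = [[train_idx[pos] for pos in fold] for fold in inner]
        params = tune_fn(train_idx, inner)        # e.g. a search over (C, gamma)
        scores.append(score_fn(train_idx, test_idx, params))
    return sum(scores) / len(scores)


# 4 bearing states x 40 samples each, as in the data collection section.
labels = [state for state in range(4) for _ in range(40)]
folds = stratified_folds(labels, 10)
mean_score = nested_cv(
    labels,
    tune_fn=lambda train, inner: {"C": 1.0, "gamma": 1.0},   # placeholder values
    score_fn=lambda train, test, params: float(not set(train) & set(test)),
)
```

With 160 samples, each outer test fold holds 16 samples (4 per state), and the test samples never take part in the inner tuning folds.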

Experimental Setup
The vibration signal was decomposed by CEEMDAN using the PyEMD toolkit in Python, with PyCharm as the development tool. After the energy entropy features were extracted from the IMF components of the decomposed vibration signal, they were saved as a CSV file to prepare for the subsequent feature selection and state recognition. The methods mentioned in this article, including SBOA and KELM, were implemented in MATLAB.
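A minimal sketch of the feature-export step described above, assuming the IMFs are already available as lists of samples (the CEEMDAN call itself is not shown). The squared-amplitude energy, the natural logarithm, the choice to store the normalized energies as the per-IMF features, the CSV layout, and the function names are all assumptions; the toy IMFs are not CEEMDAN output.

```python
import csv
import io
import math


def energy_features(imfs):
    """Return (normalized IMF energies p_j, energy entropy) for a list of IMFs."""
    energies = [sum(v * v for v in imf) for imf in imfs]      # E_j
    total = sum(energies)                                     # E
    if total == 0:
        raise ValueError("all IMFs have zero energy")
    p = [e / total for e in energies]                         # p_j = E_j / E
    entropy = -sum(pj * math.log(pj) for pj in p if pj > 0)   # -sum p_j ln p_j
    return p, entropy


def write_feature_rows(rows, handle):
    """Write (label, features) pairs as CSV rows: label, feature 1, feature 2, ..."""
    writer = csv.writer(handle)
    for label, feats in rows:
        writer.writerow([label] + [f"{v:.6f}" for v in feats])


# Toy IMFs (hypothetical values).
toy_imfs = [[1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5]]
p, entropy = energy_features(toy_imfs)

buffer = io.StringIO()          # an in-memory file; a real run would open a .csv path
write_feature_rows([("normal", p)], buffer)
```

Writing to a file-like handle keeps the sketch testable; the same function works unchanged with a file opened via `open(path, "w", newline="")`.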
Data were scaled between 0 and 1 before extracting features. To ensure fair results, classification accuracy was evaluated by stratified 10-fold CV: the data were segmented into ten parts, of which nine were used as the training dataset of the SBOA-KELM model and the remaining one as the test dataset. The entire procedure was repeated ten times so that each part served once as the test dataset, and the average over the 10 test results was taken as the final result. The maximum number of iterations and the swarm size were set to 50 and 20, respectively. The search range for the two critical parameters in KELM is set as follows:

Complexity 5
To evaluate the validity of the SBOA-KELM model, classification accuracy was analyzed and verified by the 10-fold CV procedure.
This ensures that no method is favored because of an advantage in its testing conditions [92][93][94][95][96]. We used the IEEE CEC2017 benchmark functions as test functions. In the experiment, the number of particles was set to 30, the size was set to 30, and the maximum number of evaluations was set to 300,000. Each algorithm was run independently 30 times, and the results were averaged.
To verify and test the capability of BOA, 30 different benchmark test functions were simulated. These benchmark functions probe various properties of an algorithm, such as convergence speed, the ability to escape local optima, and the ability to avoid premature convergence [97,98]. Table 4 details the test results of SBOA and the comparison algorithms on the benchmark functions, presenting the average fitness value and the standard deviation of each algorithm over 30 independent runs on each function. The overall performance of the proposed SBOA is better than that of its counterparts. The Friedman test [99] was used to compare the performance of the algorithms; this test ranks the algorithms by their relative performance on each function. The table shows that SBOA has an ARV of 1.6, superior to all the other competitive algorithms. The statistical results of each optimization task indicate that SBOA has a faster convergence rate. Figure 3 shows the convergence trends of the algorithms on nine benchmark functions. The figure shows that the convergence trend of the proposed algorithm on these nine benchmark functions is superior to that of the other comparative algorithms. In the convergence curves, the proposed algorithm is markedly better in the middle stage of convergence. The experimental results demonstrate the effectiveness of introducing SSA into BOA. In each test case, the final output of the SBOA best meets the [...] Table 5. It shows that CEEMDAN is better than EEMD in orthogonality and completeness; its running time is slightly longer, but the difference is negligible for modern computers.
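The ARV reported with the Friedman test above can be read as an average rank across the benchmark functions; that reading, the tie handling (tied algorithms share the mean of their ranks), and the toy numbers below are assumptions for illustration. The algorithm names `A`, `B`, and `C` are hypothetical.

```python
def average_ranks(results):
    """results[algorithm] = list of mean errors, one per benchmark (lower is better).

    Returns {algorithm: average rank}; tied values share the mean of their ranks.
    """
    names = list(results)
    n_problems = len(next(iter(results.values())))
    totals = dict.fromkeys(names, 0.0)
    for j in range(n_problems):
        ordered = sorted(names, key=lambda a: results[a][j])
        pos = 0
        while pos < len(ordered):
            end = pos
            while (end + 1 < len(ordered)
                   and results[ordered[end + 1]][j] == results[ordered[pos]][j]):
                end += 1
            shared = (pos + end) / 2 + 1     # mean of 1-based ranks pos+1 .. end+1
            for name in ordered[pos:end + 1]:
                totals[name] += shared
            pos = end + 1
    return {name: totals[name] / n_problems for name in names}


# Toy mean errors on three benchmark functions (not values from Table 4).
toy_results = {
    "A": [1.0, 2.0, 0.5],
    "B": [2.0, 2.0, 0.7],
    "C": [3.0, 1.0, 0.9],
}
ranks = average_ranks(toy_results)
```

Here `A` ranks first on two functions and ties for second on the other, giving an average rank of 1.5; a lower average rank means better overall performance.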
To illustrate that the CEEMDAN energy entropy can reflect the four states of the roller bearing, the CEEMDAN energy entropy values were calculated for the different states of the roller bearing. As shown in Table 6, the CEEMDAN energy entropy values of bearings in different states are different. When the bearings are in the normal state, the energy is distributed relatively evenly across the timescales of the IMF components. Hence, the CEEMDAN energy entropy value of bearings in the normal state is the largest. The energy entropy values of the three failure states differ from one another because the amplitude energy changes to a different degree depending on where the failure occurs.
A random forest is constructed to perform feature selection on the energy entropy feature vectors and remove redundant features. The random forest evaluates the contribution of each of the seven energy entropy features to discriminating the fault status. The result is shown in Figure 5. The features were arranged in ascending order of contribution: E 7 , E 6 , E 3 , E 5 , E 2 , E 1 , and E 4 . One feature at a time was removed from the feature set to form a feature subset; thus, seven feature subsets were constructed. Each feature subset was input to the SBOA-KELM model for fault identification, and its classification accuracy was calculated.
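The construction of the seven feature subsets can be sketched as follows. The ranking is the one reported in the text; starting the removal from the lowest-contribution feature and the function name `nested_subsets` are assumptions, since the removal direction is not stated unambiguously.

```python
def nested_subsets(ranked_low_to_high):
    """Drop the least-contributing remaining feature at each step."""
    return [ranked_low_to_high[i:] for i in range(len(ranked_low_to_high))]


# Ascending contribution order reported in the text.
ranking = ["E7", "E6", "E3", "E5", "E2", "E1", "E4"]
subsets = nested_subsets(ranking)
```

Each subset would then be scored by the classifier, and the subset with the highest classification accuracy kept as the optimal feature subset.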
In this experiment, the validity of the SBOA-KELM model was evaluated, and detailed results are shown in Table 7. The average classification accuracy obtained by SBOA-KELM in the table is 100%. Moreover, we observed that SBOA found the optimal parameters of KELM on the optimal feature space obtained from the random forest, which indicates that SBOA has a good capability for searching for optimal values. To verify the validity of the proposed technique, we compared SBOA-KELM with the KELM and BOA-KELM models. The results are shown in Figure 6. As shown, the classification accuracy of SBOA-KELM is better than that of both KELM and BOA-KELM, whose average classification accuracies are 98.12% and 99.38%, respectively. The classification accuracy of SBOA-KELM reached 100%, which is 1.88 percentage points higher than the average accuracy of KELM and 0.62 percentage points higher than that of BOA-KELM. SBOA-KELM thus overcomes the underlearning problem on a small sample and shows more robust generalization performance.

Conclusions and Future Works
In this study, an intelligent fault diagnosis model based on SBOA-KELM was established to identify the running state of rolling bearings. The innovation of this approach is that, for the first time, SSA is introduced into BOA to strike the right balance between the exploration and exploitation of BOA. Compared with ten other optimization algorithms, the presented method achieved better solution quality and smaller standard deviation on 30 IEEE CEC2017 benchmark problems. SBOA also showed a better capability than the original BOA to find a good combination of KELM parameters. The experimental results showed that the proposed SBOA-KELM model performed more accurately and stably than its counterparts in recognizing the state of rolling bearings. For future work, several aspects need to be further explored. The proposed SBOA-KELM method is intended to be used for other aspects of rolling bearings, such as fault warning, real-time fault diagnosis, and live monitoring. SBOA-KELM can also be combined with other feature extraction methods, such as high-order spectral analysis, inverse spectral analysis, wavelet transformation, and variational mode decomposition, to further enrich the fault diagnosis methods for rolling bearings.
Moreover, the proposed method can be further applied to other scenarios, including differentiation of malignant and benign thyroid nodules [100], diagnosis of Parkinson's disease [26-28, 36, 101], diagnosis or prognosis of paraquat-poisoned patients [22,23,102,103], identification of poisoning status [104,105], RNA secondary structure prediction [106], prediction optimization of cervical hyperextension injury [107], diagnosis of erythematosquamous diseases [108], other medical diagnosis problems [15,25,53,55,109-111], decision-making methods [112][113][114], parameter optimization [115], deep learning [116][117][118], image segmentation [119,120], image marbleization [121], image colorization [122,123], image editing [124], bankruptcy prediction [40,46,56,125], face recognition [126], neural network configuration [127], information fusion [128], social evolution modelling [129], text clustering [130], recognition of facial microexpressions [131], unsupervised band selection [132], and other problems [18,30,34,35,52,133,134].

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of the article.

Authors' Contributions
Weibin Chen, Changcheng Huang, and Huiling Chen contributed equally to this work.