An Improved Bearing Fault Diagnosis Model of Variational Mode Decomposition Based on Linked Extension Neural Network

In bearing fault diagnosis, insufficient supervised data and the noise inevitably contained in vibration signals give rise to the problem of clustering-based fault diagnosis on imbalanced, noisy data. Thanks to its ability to quickly and fully learn boundary information from small samples, the extension neural network-type 2 (ENN-2) algorithm has potential for imbalanced data clustering and has gradually been applied in fault diagnosis. Therefore, in order to improve the unstable clustering performance of ENN-2 caused by its heavy dependence on the input order of samples, a novel algorithm called linked extension neural network (LENN) is developed by redesigning the correlation function and its iterative method, which greatly reduces the clustering iteration epochs of the algorithm. In addition, extension density, an evaluation index of clustering quality for this novel algorithm, is also proposed. After that, a bearing fault diagnosis model combining variational mode decomposition (VMD)-based denoising and LENN is proposed. Firstly, VMD is used to obtain intrinsic mode functions (IMFs), and the correlation coefficients of the IMFs are calculated for signal denoising. Secondly, features are extracted from the denoised signals and selected by the PCA algorithm, and fault diagnosis is finally completed by LENN. Compared with models based on ENN-2, K-means, FCM, and DBSCAN, the proposed model identifies faults of different severities more accurately and achieves superior diagnostic ability on datasets with different imbalance degrees, which lays a foundation for clustering fault diagnosis based on vibration signals.


Introduction
Bearings are among the most common connecting parts in rotating machinery and are prone to failure because of wear, fatigue, corrosion, or overload. Therefore, timely and accurate diagnosis of bearing conditions is of great significance to ensure steady and reliable mechanical operation. Much recent research has focused on bearing fault diagnosis based on vibration signals, covering signal acquisition and noise reduction, feature extraction and selection, and fault recognition. However, in industry, diagnostic data is often derived from monitoring signals, and recording machinery conditions by frequent downtime checking or manual labeling is time-consuming and laborious, resulting in insufficient labeled data for fault diagnosis [1]. Moreover, the number of obtained fault samples is always far less than the number of normal samples from monitoring signals, generating the diagnostic problem of imbalanced data.
Clustering analysis is especially suitable for fault recognition when sufficient labeled data is unavailable. Because of the nonlinear and unstable characteristics of bearing vibration signals, scholars persevere in their attempts to construct clustering diagnosis models with stronger identification ability. For example, after processing the data by ensemble empirical mode decomposition (EEMD) and linear discriminant analysis (LDA), Hou et al. [2] used the Gath-Geva (GG) clustering algorithm to identify rolling bearing faults and obtained a satisfactory clustering result with good intraclass compactness. Chang et al. [3] achieved 96% accuracy in demagnetization fault diagnosis of permanent magnet synchronous motors using an auto-encoder and the K-means algorithm. In addition, Li et al. [4] integrated K-means into a neural network architecture for unsupervised learning and proposed a deep representation clustering-based diagnosis model to address the data sparsity issue in data-driven machinery fault diagnosis. K-means was also utilized together with the K-nearest neighbor (KNN) algorithm to identify a transformer's fault category by cumulative votes [5]. On the other hand, for algorithms that do not require the number of fault categories to be set before clustering, the study of Li et al. [6] describes a method to generate a clustering template of rolling bearings so as to reduce the effect of noise on diagnostic accuracy using density-based spatial clustering of applications with noise (DBSCAN). This algorithm has also been widely used in wind turbine condition monitoring [7] and diagnosis [8], photovoltaic power station fault detection [9], detection of bolts with missing pins on transmission lines [10], and thermal runaway diagnosis of battery systems [11]. Moreover, Wei et al. 
[12] adopted the affinity propagation (AP) clustering algorithm and a novel adaptive feature selection technique to identify different fault categories and severities of bearings successfully, and another bearing fault diagnosis model based on the expectation maximization (EM) algorithm and wavelet packets was proposed by Zhang et al. [13] for coal cutters. Other clustering algorithms, including spectral clustering [14], fuzzy C-means (FCM) [15], clustering by fast search and find of density peaks (CFSFDP) [16], and the extension neural network-type 2 (ENN-2) algorithm [17][18][19], have also been applied to diagnosis. In conclusion, the clustering algorithms represented by K-means need to know the number of fault categories before clustering, which is contrary to the premise that clustering analysis requires no prior knowledge; the clustering algorithms represented by DBSCAN do not need the number of fault categories but suffer from a complex parameter adjustment process during training. Therefore, there remains a need for an efficient clustering method with less prior knowledge, a simple parameter tuning process, and stable performance.
At the same time, considering that the vibration signals used for fault diagnosis contain not only the running-state signals of bearings but also many aliasing signals with noise, signal denoising methods have attracted considerable recent research interest. The commonly used denoising methods mainly include wavelet threshold denoising, empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD), and local mean decomposition (LMD). For example, Komaty et al. [20] introduced a signal-filtering method based on EMD and a similarity measure: white Gaussian and colored noises were almost entirely removed from the signals by selecting the decomposed modes according to the similarity between the estimated probability density function (pdf) of the input signal and that of each mode. Combining EEMD with grey theory, Jia et al. [21] removed signal noise by evaluating the noise levels of the decomposed components through grey relational analysis and selecting the noise-dominant components with a grey model. Yang et al. [22] proposed an adaptive signal denoising method based on LMD. However, these decomposition methods suffer from end effects and modal aliasing and are sensitive to the sampling frequency, resulting in considerable decomposition errors. To overcome these defects, Dragomiretskiy and Zosso [23] proposed variational mode decomposition (VMD) in 2014, a new adaptive time-frequency analysis method. 
Based on VMD, some research combined this method with other algorithms for signal denoising, such as singular value decomposition (SVD) [24], data-driven time-frequency analysis (DDTFA) [25], and wavelet threshold noise reduction [26]. Many other studies selected the decomposed modes by some evaluation criterion and reconstructed the signal after VMD for noise reduction, such as the kurtosis criterion [27], the Bhattacharyya distance [28], and a novel parameter called signal clarity proposed by Li et al. [29]. In addition, Wang et al. [30] used VMD innovatively to eliminate outliers and noise points in features extracted from signals so as to achieve signal filtering and denoising.
However, few researchers have addressed the problem of bearing clustering fault diagnosis on imbalanced data and noise at the same time. Thus, this paper focuses on developing an effective clustering algorithm for imbalanced data and constructing a bearing fault diagnosis model that deals with insufficient data containing noise. In our study, an improved clustering algorithm based on ENN-2, called linked extension neural network (LENN), is first proposed, and based on this algorithm and a VMD-based denoising method, a novel bearing fault diagnosis model is presented and applied to analyze the fault conditions and severities of bearings. To validate the effectiveness of the proposed algorithm and model, three comparative experiments are designed and conducted on commonly used artificial clustering datasets and real bearing fault signals. The results show that the proposed model yields higher identification accuracy for minority fault clusters on imbalanced data with noise compared with models based on ENN-2, K-means, fuzzy C-means (FCM), and DBSCAN. Our study provides a promising method for machinery fault diagnosis based on insufficient labeled signals with imbalance, permitting an easier parameter adjustment process with less prior knowledge. The rest of this paper starts with the novel LENN algorithm in Section 2. Section 3 provides a brief description of the proposed model, and the proposed algorithm and model are experimentally verified in Section 4. In Section 5, the concluding remarks are drawn.

Linked Extension Neural Network
Extension neural network-type 2 (ENN-2) is a clustering algorithm based on extension theory [31]. With no need to set the number of clusters manually in advance, ENN-2 shows good clustering ability and fast convergence with a simple structure. In fact, however, the performance of ENN-2 relies heavily on the initial points and on a correct input order of the samples. To overcome these deficiencies, we develop a novel clustering algorithm called linked extension neural network (LENN).

Network Structure.
Following the form of ENN-2, the structure of LENN contains only two layers, as shown in Figure 1. The number of input layer nodes depends on the feature dimension of the data, and the number of output layer nodes is determined by the number of clusters. Between the two layers, the upper and lower bounds of the clusters connect the neurons as connection weights, and output neurons are constructed successively during iteration (represented by color shades in Figure 1), with only one node activated at a time to indicate the clustering result.

Correlation Function in ENN-2.
In ENN-2, the correlation function ED based on the extension distance is used to measure the distance between a sample and a target cluster. The extension distance in extension theory quantitatively describes the distance between a point x and an interval V = 〈a, b〉, which is defined as

$$\rho(x,V)=\left|x-\frac{a+b}{2}\right|-\frac{b-a}{2},$$

where a and b are the lower and upper bounds of V, respectively. Given that the center of the kth cluster is Z_k = [z_k1, z_k2, ..., z_kn], the boundary of the kth cluster can be represented by introducing a hyperparameter λ that measures the distance between the center and the ideal boundary:

$$W_k=\left\{\left\langle a_{kj},\,b_{kj}\right\rangle\right\}_{j=1}^{n}=\left\{\left\langle z_{kj}-\lambda,\;z_{kj}+\lambda\right\rangle\right\}_{j=1}^{n}.$$

Based on the definition of the extension distance, the correlation function ED between a sample X = [x_1, x_2, ..., x_n] and the boundary W_k of the kth cluster is defined as

$$ED_k=\sum_{j=1}^{n}ED_{kj}=\sum_{j=1}^{n}\left(\frac{\left|x_j-z_{kj}\right|-\left(b_{kj}-a_{kj}\right)/2}{\left|\left(b_{kj}-a_{kj}\right)/2\right|}+1\right).$$

As shown in Figure 2, ED measures the extension relationship between a feature and its boundary. From Figure 2, it can be seen that, for the jth dimension of the kth cluster, when x_j ∈ 〈a_kj, b_kj〉, ED_kj ≤ 1.
For a sample X = [x_1, x_2, ..., x_n] with n-dimensional features, the sample X is classified into the kth cluster if ED_k ≤ n, k = 1, 2, ..., K. Thanks to this property of ED, the algorithm can decide which cluster a sample belongs to and update the boundary and the center of the corresponding cluster to revise its information during iteration. Therefore, ENN-2 does not require the number of clusters K before learning and can obtain good clustering results by adjusting only the unique hyperparameter λ.
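As an illustration only (not the authors' code), the correlation function and the ED_k ≤ n membership test can be sketched as follows, assuming the per-feature boundary 〈z_kj − λ, z_kj + λ〉 described above:

```python
import numpy as np

def extension_distance(x, z, lam):
    """Correlation function ED between sample x and the cluster with centre z,
    whose per-feature boundary is <z_j - lam, z_j + lam>; each term simplifies
    to |x_j - z_j| / lam and is <= 1 exactly when x_j lies inside the interval."""
    x, z = np.asarray(x, float), np.asarray(z, float)
    return float(np.sum((np.abs(x - z) - lam) / lam + 1.0))

# A sample is assigned to cluster k when ED_k <= n (the feature dimension):
ed = extension_distance([1.2, 0.9], [1.0, 1.0], lam=0.5)
# ED = 0.4 + 0.2 = 0.6 <= n = 2, so both features fall inside the boundary
```

Here each feature deviates from the centre by less than λ, so the per-feature terms are both below 1 and the sample would join the cluster.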
However, since the input order of the samples determines the updating direction of the clusters' boundaries and centers during iteration, ENN-2 is greatly affected by the initial point selection and the input order of the samples and shows unstable clustering performance. Therefore, it is necessary to improve this algorithm.

Improved Correlation Function in LENN.
Different from ENN-2, each sample can serve as a center during iteration in LENN. Taking X_1 = [x_11, x_12, ..., x_1n] as an example, its boundary W_X1 can be represented with the hyperparameter λ as

$$W_{X_1}=\left\{\left\langle x_{1j}-\lambda,\;x_{1j}+\lambda\right\rangle\right\}_{j=1}^{n}.$$

In order to measure the correlation distance between a sample X_2 = [x_21, x_22, ..., x_2n] and W_X1, the new correlation function is defined as

$$ED_{X_1,X_2}=\sum_{j=1}^{n}\left(\frac{\left|x_{2j}-x_{1j}\right|-\lambda}{\lambda}+1\right)=\sum_{j=1}^{n}\frac{\left|x_{2j}-x_{1j}\right|}{\lambda}.$$

The new correlation function is plotted in Figure 3. For the jth feature of the two samples, when x_2j ∈ 〈a_X1j, b_X1j〉, ED_X1j,X2j ≤ 1, and taking all features into account, the sample X_2 can be considered to belong to the same cluster as sample X_1 if ED_X1,X2 ≤ n.

Learning Algorithm of LENN.
The learning process of LENN is unsupervised. It takes a dataset X = [x_1, x_2, ..., x_m] with m samples as input, and after calculating the correlation distances between samples level by level, the clustering results of the dataset are output in just one epoch. The specific learning steps are as follows: (1) Set an optimal hyperparameter λ (its selection is discussed in the Parameter Selection Method section). (2) Input a sample X_i = [x_i1, x_i2, ..., x_in] randomly and mark the cluster to which it belongs as k = 0. Calculate the improved correlation function ED between X_i and all other unmarked samples according to equation (5), and mark the samples that satisfy ED ≤ n as k = 0.
(3) For all the qualified samples in step (2), take each sample in turn as the center and calculate the EDs between it and all remaining unmarked samples. Similarly, mark the samples that satisfy ED ≤ n as k = 0, until no qualified samples are left. (4) Randomly select a sample from the remaining unmarked samples as input to create a new cluster, k = k + 1. Repeat steps (2) and (3). (5) The learning process is finished when all samples are marked.
At the end of the iteration, if a cluster contains too few samples, it can be regarded as noise.
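The learning steps above can be sketched as follows; this is a minimal illustration (not the authors' implementation), using the simplified form ED = Σ_j |x_2j − x_1j|/λ and the ED ≤ n rule:

```python
import numpy as np

def lenn_cluster(X, lam):
    """One-epoch linkage clustering: each marked sample in turn becomes a
    centre, and every unmarked sample with ED <= n joins its cluster."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    labels = np.full(m, -1)          # -1 means "unmarked"
    k = 0
    for start in range(m):
        if labels[start] != -1:
            continue                  # already assigned to some cluster
        labels[start] = k
        queue = [start]
        while queue:
            c = queue.pop()
            ed = np.abs(X - X[c]).sum(axis=1) / lam   # ED of all samples to centre c
            qualified = np.where((labels == -1) & (ed <= n))[0]
            labels[qualified] = k
            queue.extend(qualified.tolist())          # linked samples become centres
        k += 1                        # open a new cluster for the next seed
    return labels

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
print(lenn_cluster(X, lam=0.5))  # -> [0 0 0 1 1]: two clusters found
```

Because every cluster member eventually acts as a centre, the result does not depend on which sample is fed in first, which is the stability property claimed above.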
The iterative approaches of ENN-2 and LENN are both graphically presented in Figure 4. During iteration, ENN-2 needs to update the central coordinates each time (represented by the orange dots), and the updating direction is significantly affected by the input order of the samples. Ideally, the samples should be input in order of increasing distance between samples, which is difficult to ensure for unpretreated, messy datasets. As can be seen clearly from Figure 4(a), sample 6 is closer than sample 5 to the center corresponding to sample 4. If sample 6 is input first, the result of (3) meets the requirement of (4), indicating that sample 6 and samples 1∼4 belong to the same cluster, and the final clustering result for the whole dataset then contains 2 clusters. But if sample 5 is input first, the result of (3) does not satisfy (4) because sample 5 is distant from the current center, a new cluster is created for iteration, and the final result contains 3 clusters. In contrast, in the LENN scheme of Figure 4(b), one specific sample is regarded as the center each time, and all qualified samples that satisfy (7) with this sample are found and marked in the iteration. Taking a two-dimensional eigenspace as an example, the learning essence of LENN is to consecutively find the samples that fall in the square constructed around the current center sample with 2λ as the side length. All qualified samples are classified into the same cluster as their center, and the next iteration begins with a subsample of the cluster. Finally, all qualified samples of the same cluster are found by this iterative linkage method. Therefore, in Figure 4(b), samples 1, 4, 6, 5, and 10 are successively taken as centers for iteration, and the final clustering result contains only 2 clusters, with samples 1∼9 belonging to the same cluster.
According to its learning process, LENN has the following remarkable advantages: (1) based on extension distance, LENN defines a new approach to categorization by distance calculation; (2) there is no need to preset the number of clusters, as in ENN-2, and in addition it is not necessary to initialize the clustering centers; (3) the improved algorithm needs only one epoch to complete the clustering process and converges faster; (4) LENN is not sensitive to the initial center or the input order of the samples and preserves more stable clustering ability than ENN-2.
Nevertheless, LENN is very sensitive to the hyperparameter λ, so it is necessary to select the optimal λ before learning.

Parameter Selection Method.
In LENN, the selection of the hyperparameter λ seriously affects the final number of clusters and the accuracy of the clustering results. As shown in Figure 5, for a smaller λ (such as λ = 7.2 in Figure 5(a)), the constructed squares are smaller and contain fewer qualified samples, resulting in more clusters; for a bigger λ (such as λ = 8.7 in Figure 5(b)), the constructed squares are correspondingly bigger and contain more qualified samples, so fewer clusters are produced.
For clustering algorithms, the silhouette coefficient is often used for evaluation when no true labels are available. However, this index is better suited to analyzing the clustering effectiveness of balanced data [32]. Considering that this paper is primarily concerned with the imbalanced data clustering problem, a novel evaluation index, extension density (EDe), is developed based on extension distance to tune the hyperparameter λ, which is defined as

Computational Intelligence and Neuroscience
where m_k is the number of samples in the kth cluster, z_kj is the center of the jth feature, and a_kj and b_kj represent the lower and upper bounds of the jth feature of the kth cluster, respectively, which are computed from the samples assigned to the cluster. Typically, EDe declines as λ increases, and the optimal λ lies at the turning point of the curve.

Signal Denoising.
Bearing vibration signals tend to present nonlinear and nonstationary characteristics and inevitably contain noise. Without signal denoising, outliers in the raw data are carried into the feature space through feature extraction and affect the diagnostic results of the model. Compared with empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD), variational mode decomposition (VMD) can effectively extract each frequency component of the signal and alleviates the problems of mode mixing and white noise. Therefore, a VMD-based method is used for noise reduction in this paper.

The Principle of VMD.
VMD is a variational problem-solving process based on the classical Wiener filter, Hilbert transform, and frequency mixing, which can be written in the following constrained optimization form:

$$\min_{\{u_k\},\{\omega_k\}}\;\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2\quad\text{s.t.}\quad\sum_{k=1}^{K}u_k=f,$$

where K is the number of modes to be decomposed, δ(t) represents the Dirac function, * denotes the convolution operator, and u_k and ω_k stand for the kth intrinsic mode function (IMF) and its center frequency after decomposition. The solution process is as follows: (1) Transform the constrained variational problem into an unconstrained one by introducing the quadratic penalty factor α and the Lagrange multiplier λ(t):

$$L\left(\{u_k\},\{\omega_k\},\lambda\right)=\alpha\sum_{k}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2+\left\|f(t)-\sum_{k}u_k(t)\right\|_2^2+\left\langle\lambda(t),\,f(t)-\sum_{k}u_k(t)\right\rangle.$$

(2) To solve the minimization problem of the augmented Lagrangian, the alternating direction method of multipliers (ADMM) is adopted to seek its saddle point by alternately updating u_k^{n+1}, ω_k^{n+1}, and λ^{n+1}. Transforming the mode subproblem into the frequency domain by the Parseval/Plancherel Fourier isometry, replacing ω with ω − ω_k, and restricting the integral to the nonnegative frequency interval yields a quadratic optimization problem whose solution is the mode update

$$\hat{u}_k^{n+1}(\omega)=\frac{\hat{f}(\omega)-\sum_{i\neq k}\hat{u}_i(\omega)+\hat{\lambda}(\omega)/2}{1+2\alpha\left(\omega-\omega_k\right)^2}.$$

Similarly, converting the center-frequency updating problem to the frequency domain gives

$$\omega_k^{n+1}=\frac{\int_0^{\infty}\omega\left|\hat{u}_k^{n+1}(\omega)\right|^2\,d\omega}{\int_0^{\infty}\left|\hat{u}_k^{n+1}(\omega)\right|^2\,d\omega},$$

i.e., the center of gravity of the mode's power spectrum.
The complete learning process of VMD initializes the modes, the center frequencies, the Lagrange multiplier, and the iteration counter n, and then alternates the above updates until convergence.
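As a numerical illustration of these alternating updates, the following is a simplified sketch in NumPy (one-sided spectrum, no signal mirroring, and the Lagrange-multiplier update omitted for brevity), not a faithful reimplementation of the original VMD code:

```python
import numpy as np

def vmd(f, alpha=2000.0, K=3, n_iter=500, tol=1e-7):
    """Alternate the mode and centre-frequency updates in the frequency
    domain until the relative change of the modes falls below tol."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    F = np.fft.rfft(f)                       # one-sided spectrum of the input
    freqs = np.arange(len(F)) / N            # normalised frequencies in [0, 0.5]
    u_hat = np.zeros((K, len(F)), dtype=complex)
    omega = np.linspace(0.0, 0.25, K)        # uniform initial centre frequencies
    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter update of mode k around its current centre frequency
            u_hat[k] = (F - others) / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / np.sum(power)  # centre of gravity
        change = np.sum(np.abs(u_hat - u_prev) ** 2)
        if change / max(np.sum(np.abs(u_prev) ** 2), 1e-12) < tol:
            break
    u = np.array([np.fft.irfft(u_hat[k], n=N) for k in range(K)])  # time domain
    return u, omega

# Two-tone example: the recovered centre frequencies should settle near the
# normalised values 10/2048 and 200/2048.
t = np.arange(2048) / 2048.0
f = np.cos(2 * np.pi * 10 * t) + 0.5 * np.cos(2 * np.pi * 200 * t)
u, omega = vmd(f, alpha=2000.0, K=2)
```

The penalty α plays the bandwidth-limiting role described above: a larger α narrows each mode's Wiener filter around its centre frequency.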

VMD-Based Denoising Method.
The VMD-based denoising method comprises two steps: (a) decompose the vibration signal by VMD; (b) calculate the correlation coefficients of the IMFs obtained by VMD for noise filtering and reconstruct the denoised signal. The parameters α, λ(t), and K of VMD should be set before decomposition, which may noticeably affect the results.
The correlation coefficient describes the degree of correlation between the original signal Y and its IMFs X_i, which is defined as

$$\rho_i=\frac{\sum_{n=1}^{N}\left(Y(n)-\bar{Y}\right)\left(X_i(n)-\bar{X}_i\right)}{\sqrt{\sum_{n=1}^{N}\left(Y(n)-\bar{Y}\right)^2}\sqrt{\sum_{n=1}^{N}\left(X_i(n)-\bar{X}_i\right)^2}}.$$

The correlation coefficient also offers a means of measuring the degree of noise contained in each IMF.
Thus, by selecting the sensitive IMFs according to the threshold rule in (20) below [33], we are able to obtain the reconstructed denoised signal, where μ_i is the threshold of the ith IMF: retain the IMFs with ρ_i ≥ μ_i as sensitive IMFs for signal reconstruction, and remove the unqualified IMFs directly. In the feature definitions used later, x(n) is a signal series in the time domain, n = 1, 2, ..., N, where N is the number of data points; s(k) is the frequency spectrum of signal x(n), k = 1, 2, ..., K, where K is the number of spectrum lines and f_k is the frequency value of the kth spectrum line.
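The screening rule can be illustrated with a toy example; note that mu below is a fixed placeholder threshold, whereas the paper computes a per-IMF threshold μ_i following [33]:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2048, endpoint=False)
tone = np.sin(2 * np.pi * 5 * t)             # stands in for a signal-dominant IMF
noise = 0.05 * rng.standard_normal(2048)     # stands in for a noise-dominant IMF
signal = tone + noise

def select_sensitive_imfs(signal, imfs, mu=0.3):
    """Correlate each IMF with the raw signal (Pearson coefficient) and keep
    only the IMFs reaching the threshold mu, summing them back into the
    reconstructed (denoised) signal."""
    rho = np.array([np.corrcoef(signal, imf)[0, 1] for imf in imfs])
    keep = rho >= mu
    return imfs[keep].sum(axis=0), rho

denoised, rho = select_sensitive_imfs(signal, np.stack([tone, noise]))
```

The signal-dominant IMF correlates strongly with the raw record, the noise-dominant IMF only weakly, so the reconstruction discards the latter.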

Feature Selection Based on PCA.
Although the multidomain features obtained above describe the signals better than time-domain or frequency-domain features alone, there may be feature redundancy, which affects the diagnostic performance of the model. Considering the lack of sample labels in clustering fault diagnosis, a commonly used unsupervised feature reduction algorithm, principal component analysis (PCA), is adopted in the diagnosis model. The specific steps of PCA are as follows. Let the multidomain characteristic matrix obtained above be X_{n×m} = [X_1, X_2, ..., X_m], where m is the number of samples and n is the dimension of the features; then its covariance matrix C can be obtained by

$$C=\frac{1}{m}\bar{X}\bar{X}^{\mathsf{T}},$$

where X̄ is the n × m matrix obtained by centering each row of X with its mean x̄_i = (1/m)Σ_{j=1}^{m} x_{ij}, i = 1, 2, ..., n. After that, the eigenvalues and eigenvectors of the covariance matrix C are calculated from

$$CP_i=\lambda_i P_i,\quad i=1,2,\ldots,n,$$

where λ_1, λ_2, ..., λ_n are the eigenvalues of C and P_1, P_2, ..., P_n are the corresponding eigenvectors.
Arrange the eigenvectors according to the magnitude of their corresponding eigenvalues, take the first k eigenvectors as rows to form the matrix P, and obtain the k-dimensional data Y by

$$Y=P\bar{X}.$$
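The PCA steps above can be sketched as follows (a minimal NumPy illustration, not tied to the paper's data):

```python
import numpy as np

def pca_reduce(X, k):
    """PCA following the steps above: X is n features x m samples; rows are
    centred, eigenvectors of the covariance matrix are sorted by eigenvalue,
    and the top-k eigenvectors (as rows of P) project the data to k dims."""
    Xc = X - X.mean(axis=1, keepdims=True)   # centre each feature (row)
    C = Xc @ Xc.T / X.shape[1]               # n x n covariance matrix
    eigval, eigvec = np.linalg.eigh(C)       # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]         # re-sort descending
    P = eigvec[:, order[:k]].T               # top-k eigenvectors as rows
    return P @ Xc                            # k x m reduced data Y = P X

rng = np.random.default_rng(0)
X = rng.normal(size=(23, 100))               # e.g. 23 features, 100 samples
Y = pca_reduce(X, 3)
print(Y.shape)  # (3, 100)
```

The variance of each output row equals the corresponding eigenvalue, so the components come out in order of decreasing explained variance.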

Bearing Fault Diagnosis Model of VMD-Based Denoising and LENN.
Based on the above techniques, a bearing fault diagnosis model of VMD-based denoising and LENN is proposed in this paper, including three stages: (a) signal denoising based on VMD and correlation coefficient calculation of the IMFs; (b) feature extraction and selection by PCA; (c) clustering fault diagnosis by LENN. The specific diagnostic steps, shown in Figure 6, are as follows.
(1) After signal acquisition, divide the raw signals into segments of 2048 data points each.
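This segmentation step can be sketched as follows (an illustrative helper, using the 2048-point segment length from the text):

```python
import numpy as np

def segment_signal(x, seg_len=2048):
    """Split a raw vibration record into non-overlapping segments of
    seg_len points; any trailing remainder shorter than seg_len is dropped."""
    x = np.asarray(x)
    n_seg = len(x) // seg_len
    return x[:n_seg * seg_len].reshape(n_seg, seg_len)

segs = segment_signal(np.arange(5000))
print(segs.shape)  # (2, 2048): 5000 points yield two full segments
```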

Experiment of LENN Algorithm on Artificial Datasets.
In order to verify the clustering effectiveness and stability of the proposed LENN algorithm on imbalanced data, LENN, ENN-2, and three commonly used clustering algorithms, namely, K-means, fuzzy C-means (FCM), and DBSCAN, were tested on three artificial datasets commonly used for clustering benchmarks: Flame, Jain, and Aggregation. These three datasets are all two-dimensional with different degrees of imbalance, as summarized in Table 2. From the real distributions of the datasets in Figure 7, it can be seen that Flame consists of a circle and a semiring, Jain contains two semirings, and Aggregation is composed of many rounded or crescent clusters with two clusters connected, which increases the difficulty of clustering.
The Rand Index (RI) is adopted here to evaluate the clustering results of the algorithms, which is defined as

$$RI=\frac{a+b}{a+b+h+g},$$

where a denotes the number of data pairs (x_i, x_j) that the clustering result and the real labels both place in the same category; b represents the number of data pairs that both place in different categories; h denotes the number of data pairs whose clustering results are of the same category while their real labels are of different categories; and g represents the number of data pairs whose clustering results are of different categories while their real labels are of the same category.
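The pair-counting definition of RI can be sketched directly (an illustrative helper, assuming RI = (a + b)/(a + b + h + g)):

```python
from itertools import combinations

def rand_index(pred, true):
    """Pair-counting Rand Index: a = same/same, b = different/different,
    h = same cluster but different labels, g = different clusters, same label."""
    a = b = h = g = 0
    for (p1, t1), (p2, t2) in combinations(zip(pred, true), 2):
        same_p, same_t = p1 == p2, t1 == t2
        if same_p and same_t:
            a += 1
        elif not same_p and not same_t:
            b += 1
        elif same_p:
            h += 1
        else:
            g += 1
    return (a + b) / (a + b + h + g)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: names differ, partition agrees
```

Note that RI compares partitions, so the cluster identifiers themselves are irrelevant, which is exactly what an unsupervised evaluation needs.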
In the experimental process of LENN, the optimal value range of λ was narrowed in each round of the experiment based on the initial range $(0, \sqrt{\max(x_{ij})}\,]$ to obtain the optimal λ, and the optimal λ selection processes of LENN on the three datasets are shown in Figure 8. As can be seen from the figure, RI reaches a peak at the inflection point of the EDe drop-down curve, and the corresponding value of λ is the desired one; for ENN-2, the optimal λ was determined in the same way based on (0, 1]. The parameter K of K-means was set according to the real number of categories of each experimental dataset, and the parameter setting of FCM was the same. In the training of DBSCAN, the optimal combination of the radius parameter ε and the density threshold MinPts was searched repeatedly, with the initial range of ε set to (0, 2] and that of MinPts set to [2, 10]. In order to eliminate the influence of sample input order on the experimental results, each experiment was conducted ten times with randomly shuffled input order to obtain the RI scores of the five algorithms, which are depicted as boxplots in Figure 9 and summarized in Table 3. Figure 10 presents the clustering performance of LENN graphically on the three datasets.
In general, LENN achieved better performance, with higher accuracy and stability, on the three different datasets. Comparing Figure 9 and Table 3, it can be observed that, in terms of clustering accuracy, LENN and DBSCAN were not affected by the shape of the data distribution and showed higher RI scores overall, while ENN-2, K-means, and FCM were highly affected by it: all three performed worse on semiorbicular clusters than on rounded clusters, with ENN-2 scoring lowest on Flame and FCM scoring lowest on Jain. In terms of clustering stability, visible in Figure 9, LENN showed extremely stable clustering performance under different input orders of samples, and there were small fluctuations in the stability of DBSCAN, K-means, and FCM, while ENN-2 showed the worst stability, with the range of RI reaching 0.2436 on Flame. In particular, LENN scored highest on both Flame and Jain, with nearly no impact of the imbalanced data distribution on clustering, while it scored slightly lower than DBSCAN on Aggregation. That was because, for Flame and Jain in Figure 10, all misclassified points of LENN, which were considered noise, were individuals far away from their surrounding points, whereas in Aggregation two clusters were closely connected by several data points, so LENN's iterative linkage mechanism classified the two connected clusters as one; in spite of this, LENN could still fully identify the other minority clusters in Aggregation. As for ENN-2, only a small number of cases showed higher clustering accuracy on Aggregation, indicating that ENN-2 is better at processing rounded clusters but still relies heavily on an appropriate sample input order. 
It can be concluded that the proposed LENN algorithm deals with the clustering problem on imbalanced data with higher accuracy and stability, overcoming the dependency on the input order of samples.

Bearing Fault Diagnosis Process of Proposed Model.
The main purpose of this work was to establish an effective bearing fault diagnosis model on imbalanced data with noise, so in this part, experiments were carried out on data collected from the bearing test rig shown in Figure 11, which consists of a motor driving a shaft, a force meter, a torque transducer, and an electrical control device. We selected the data of a 6205-2RS deep-groove ball bearing from SKF, grouped into five categories: normal, minor inner race fault (fault size 0.1778 mm), serious inner race fault (0.5334 mm), minor ball fault (0.1778 mm), and serious ball fault (0.5334 mm). After dividing the raw signals into segments of 2048 data points, the five categories above comprised the whole dataset of this experiment, with an imbalance degree of 2 : 1 : 1 : 1 : 1, as listed in Table 4.

Signal Denoising and Feature Extraction and Selection.
Firstly, VMD was carried out on the segmented fault samples, with α = 2000, λ(t) = 1.5, and a uniform initial distribution of center frequencies. Since improper selection of the number of modes K leads to excessive or insufficient decomposition, K was determined according to the change of the center frequency of each mode. Taking the VMD results of the minor inner race fault (MIF) as an example, as shown in Table 5, similar center frequencies appeared when K = 5, which may be attributed to mode mixing and excessive decomposition; thus, K was set to four. Similarly, K was determined to be four for the other datasets, and the VMD results of one sample of each of the five conditions are shown in Figure 12. Then, the correlation coefficients and thresholds corresponding to the IMFs and the original signal were calculated according to equations (19)∼(20); the obtained results for all samples are graphically presented in Figure 13. The IMFs whose correlation coefficients lie above the red line were retained to obtain the reconstructed signals.
Next, after signal denoising, the 23-dimensional features of each sample were extracted from both the time and frequency domains according to Table 1, and PCA was used for feature selection and dimension reduction. Comparing the reduced feature distributions in Figure 14, it is visible that, in two dimensions, the normal cluster is far away from the others, while the other four clusters are much closer; moreover, the clusters of the same fault parts with different severities are very close to each other, with connected and overlapped points appearing in the clusters of minor and serious inner race fault, which is not conducive to fault diagnosis. In three dimensions, all clusters can be well separated, and for the clusters of the same parts with different severities, there appear to be no overlaps. Therefore, the three-dimensional feature matrix was selected and finally input into the proposed LENN and ENN-2 algorithms.

Fault Diagnosis Results and Analysis.
Finally, the obtained matrix was input into LENN and ENN-2, using RI in (24) for comparison. To eliminate the influence of sample input order on the experimental results, each diagnosis experiment was conducted 10 times with randomly shuffled input order. Two of the diagnosis results of the ENN-2- and LENN-based models are depicted in Figure 15, with the detailed results summarized in Table 6. It is evident from the results that, overall, the LENN-based model markedly outperformed the ENN-2-based model in clustering accuracy and stability. As can be seen from Figure 15, both models performed well on the data of MIF, SIF, MBF, and SBF because of the compact data distribution and good differentiation, with only three SIF points misclassified as MIF by the ENN-2-based model. However, the distribution of the normal-condition cluster was relatively loose, so the ENN-2-based model could not identify the whole cluster correctly and generated multiple clusters and noise points in the results, while the LENN-based model only marked a few points at the edge of the cluster, far from their neighbors, as noise, as the detailed scores in Table 6 confirm.

Impact of Imbalance Degree of Datasets on Diagnostic Models.
As outlined in the introduction, the problem of imbalanced-data fault diagnosis in real industrial production increases the difficulty of clustering fault diagnosis. Thus, the impact of different imbalance degrees of datasets on the diagnostic models was investigated experimentally. The bearing fault data used in Section 4.2 was again adopted, and, based on the signal denoising and feature extraction and selection method proposed in this paper, four diagnostic models commonly used in clustering diagnosis, based on LENN, K-means, FCM, and DBSCAN, were constructed and compared on datasets of different imbalance degrees selected randomly by fixed proportions (shown in Table 7). The specific parameter settings of the models were in line with those in Section 4.1. To evaluate the performance of the models on the clustering problem with imbalanced data, RI, macro-recall, and macro-F score were adopted, and each experiment was conducted randomly 10 times to obtain the average score of each model. Macro-recall measures the clustering performance on each class, especially the minority classes, and can be computed by

$$\text{macro-}R = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i + FN_i},$$

where $TP_i$ and $FN_i$ represent the number of correctly and incorrectly predicted samples of the $i$th class, respectively, and $N$ is the number of classes. Macro-F score is a comprehensive evaluation of the precision and recall of the clustering results, which is defined as

$$\text{macro-}F = \frac{(1+\beta^{2})\,\text{macro-}P\cdot\text{macro-}R}{\beta^{2}\,\text{macro-}P + \text{macro-}R},$$

where $\beta$ denotes the relative importance of macro-R and macro-P and is usually set to one, and macro-P can be obtained by

$$\text{macro-}P = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i + FP_i},$$

where $FP_i$ represents the number of samples misclassified into the $i$th class that do not actually belong to it. Table 8 collects the scores of the models on four datasets of different imbalance degrees. Two extreme cases, with data imbalance degrees of 2 : 1 : 1 : 1 : 1 and 10 : 1 : 1 : 1 : 1, are taken to draw confusion matrices of the clustering results, which are shown in Figure 16.
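The three macro metrics above can be computed directly from a confusion matrix whose clusters have already been matched to classes. A minimal sketch (the function name and toy confusion matrix are illustrative):

```python
import numpy as np

def macro_scores(cm, beta=1.0):
    """Macro-recall, macro-precision, and macro-F from a confusion matrix.

    cm[i, j]: number of samples of true class i assigned to class j.
    beta weights macro-recall against macro-precision (usually 1).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp             # samples of class i missed
    fp = cm.sum(axis=0) - tp             # samples wrongly assigned to class i
    macro_r = np.mean(tp / (tp + fn))
    macro_p = np.mean(tp / (tp + fp))
    macro_f = (1 + beta**2) * macro_p * macro_r / (beta**2 * macro_p + macro_r)
    return macro_r, macro_p, macro_f

# Imbalanced toy case: a large normal class and a small fault class
cm = [[90, 10],
      [ 5,  5]]
r, p, f = macro_scores(cm)
print(r)  # 0.7  (mean of per-class recalls 0.9 and 0.5)
```

Averaging the per-class recalls unweighted is what makes macro-recall sensitive to minority classes: the small fault class contributes as much to the score as the large normal class.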
It is apparent that, as the data imbalance degree increased, the clustering scores of all models showed an overall downward trend, among which the LENN-based model performed noticeably best in all cases, while the K-means- and FCM-based models were far from competitive. Comparing Table 8 and Figure 16, it can be observed that the proposed model achieved considerably higher scores than the other three models on all the datasets. Even on dataset 4, with the most extreme imbalance degree of 10 : 1 : 1 : 1 : 1, the LENN-based model still recognized the four minority clusters precisely, correctly identifying ten samples of minor inner race fault, nine samples of serious inner race fault, eight samples of minor ball fault, and nine samples of serious ball fault. The DBSCAN-based model identified relatively few samples of these minority clusters; the K-means-based model regarded the samples of the same position but different severities as one cluster, failing to further subdivide the severities; and the FCM-based model directly merged all four minority clusters into a single cluster, scoring lowest among the models. In addition, in the training process of the LENN-based model on datasets with different imbalance degrees, the value of the optimal λ grew gradually with the imbalance degree (shown in Figure 17). This was because, as the samples became sparser, the distances between them increased, requiring a larger λ to construct an extension correlation relationship between the samples; this also indicates that the value of the optimal λ is related to the final clustering result: the smaller λ is, the more precise the constructed extension relationship between samples, and the higher the accuracy the proposed model can attain.
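The datasets of different imbalance degrees discussed above were drawn by subsampling each class at a fixed proportion such as 10 : 1 : 1 : 1 : 1. A minimal sketch of that construction (class labels, per-class counts, and the helper name are hypothetical, not taken from Table 7):

```python
import random

def make_imbalanced(samples_by_class, ratio, base=10, seed=0):
    """Randomly subsample each class to a fixed proportion, e.g. (10, 1, 1, 1, 1).

    samples_by_class: dict {class_label: list of samples}, one entry per ratio term.
    base: hypothetical number of samples kept for each minority class.
    """
    rng = random.Random(seed)
    subset = {}
    for (label, samples), r in zip(samples_by_class.items(), ratio):
        k = base * r // min(ratio)       # samples to keep for this class
        subset[label] = rng.sample(samples, k)
    return subset

# Toy usage: 200 samples per condition, reduced to a 10:1:1:1:1 split
data = {c: list(range(200)) for c in ["N", "MIF", "SIF", "MBF", "SBF"]}
sub = make_imbalanced(data, (10, 1, 1, 1, 1))
print({c: len(s) for c, s in sub.items()})
```

Fixing the random seed per experiment while varying it across the 10 repetitions would reproduce the "selected randomly by certain proportions" protocol described above.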
In conclusion, the imbalance degree of a dataset increases the difficulty of fault diagnosis, especially for clustering algorithms that rely on the selection of initial center points, since the presence of minority clusters makes it difficult for such algorithms to identify the fault categories correctly by distance calculation. At the same time, the experiments show that, by expressing the information of minority clusters more precisely through the constructed extension correlation function, the proposed LENN-based model is effective at identifying minority clusters in imbalanced data.

Conclusions
(1) A novel clustering algorithm called linked extension neural network (LENN) is developed based on extension neural network-type 2 (ENN-2); through an improved correlation function and a new iterative method, it is far less sensitive to initial point selection and sample input order. Furthermore, extension density is proposed as an index for evaluating the clustering quality of this algorithm.
(2) To improve bearing fault diagnosis on imbalanced data with noise, a clustering fault diagnosis model combining VMD-based denoising and LENN is constructed. The experimental results provide compelling evidence that the proposed model retains powerful identification ability for minority fault clusters and achieves better diagnosis performance on imbalanced data with noise.

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.