Bearing Fault Diagnosis Method Based on Multidomain Heterogeneous Information Entropy Fusion and Model Self-Optimisation

Introduction
The development of intelligent manufacturing technology is making rotating machinery systems increasingly complicated. As core components of rotating machinery, rolling bearings often fail because of alternating impact loads, mechanical wear, and thermal fatigue [1]. Improper handling of these faults disrupts the normal operation of rotating machinery and results in serious cost losses. Therefore, effective, timely, accurate, and early diagnosis of the status of rolling bearings is important.
Vibration signal analysis is a commonly used method in mechanical fault diagnosis to extract valuable features that provide internal machine information from collected chaotic signals [2]. For example, Ahmed et al. [3] used discrete wavelet and wavelet packet transforms to filter the acquired signals, processed wavelet coefficients in the time and frequency domains, extracted various features, and classified the surface roughness of mechanical faults in cutting tests. Cheng et al. [4] proposed a fully set empirical mode decomposition method of adaptive noise complementarity, which improved decomposition performance by reducing the reconstruction error and the influence of mode aliasing and enabled bearing fault detection. However, during rolling bearing operation, fault characteristic information is affected by the coupled vibration signals of multiple components and is submerged in noise or other signal components. The above methods analyse the signal of a single measuring-point sensor, which may lose part of the fault information and impair the extraction and expression of bearing fault features.
In recent years, scholars have introduced the idea of information fusion into the field of rotating machinery fault diagnosis to make use of the fault information of different sensitivities at multiple measuring points and to improve diagnostic accuracy. Yuan et al. [5] used the multivariate empirical mode decomposition method to extract dynamic characteristics from multivariate information collected by multiple sensors, analysed the multiscale sample entropy, and combined this information with a back-propagation neural network to classify faults in rolling bearings. Gao et al. [6] proposed a rolling bearing fault diagnosis method based on a complementary ensemble empirical mode decomposition entropy fusion feature. Kernel principal component analysis was used for feature fusion and dimension reduction, and fault classification was carried out by an optimised least-squares support vector machine. The information fusion method improves the reliability and accuracy of fault diagnosis to a certain extent, but it has the following limitations in feature extraction: (1) During the processing of multisource information, the above algorithms rely largely on the quality of features extracted by hand, and they have poor nonlinear fitting ability, which makes it difficult to deal with current complex, massive, heterogeneous, and real-time bearing fault information. (2) In the fusion of information collected by multiple sensors, the above algorithms only splice the data features of the measured points directly. Because of differences in the original data, mechanical data superposition leads to a heavier algorithm load and lower model diagnostic performance.
With the continuous expansion of the concept of "intelligent manufacturing," the convolutional neural network (CNN) has found a wide range of applications to solve the above problems. Chen et al. [7] proposed a fault diagnosis method for rotating machinery based on parameter transmission, using a designed wide-core one-dimensional (1D) CNN to automatically learn the transferable features of the fault signals and to eliminate the influence of manual features. Li et al. [8] proposed a fault diagnosis method based on multiscale permutation entropy and a multichannel fusion convolutional neural network. Multiscale permutation entropy was used to analyse vibration signals of rotating machinery at different scales, and a multichannel fusion convolutional neural network was used to fuse multichannel features and identify faults. Intelligent fault diagnosis methods based on CNN retain rich and effective fault feature information to a large extent and improve diagnostic accuracy, but some challenges remain: (1) Most of the existing studies determine the fault type from information analysed in the time, frequency, and time-frequency domains. Few studies exist on other mapping methods for time sequence signals or on the complementary and simultaneous correlation of spatial components such as hue, lightness, and purity in the postmapping spatial domain. (2) The establishment of a CNN model requires designers to have rich professional and domain knowledge, to weigh model size against computational efficiency, and to conduct repeated experiments during model construction to obtain the best model parameters. With new practical problems, it is often necessary to reconstruct the model. Models built in this way may be difficult to interpret and generalise.
In summary, this work investigates a bearing fault diagnosis method based on multidomain heterogeneous information entropy fusion and model self-optimisation. The spatiotemporal multiscene domain fusion strategy based on heterogeneous sensors (HSMSF) was used to analyse and assess the characteristics of bearing fault signals, and the optimal feature extractor obtained by adaptive optimisation with the chaos elitist modified sparrow search algorithm (CEI-SSA) was used to reconstruct the multidomain fusion space and decompose features in the space-time dimension. The multisensor predictive value fusion strategy was combined with information entropy theory to identify and diagnose bearing faults. The main contributions of this paper are summarised as follows: (1) We propose a spatiotemporal multiscene domain feature fusion strategy based on heterogeneous sensors. From the perspective of sequence periodicity, correlation, and spatiotemporal transformation, a variety of spatial domain mapping methods were explored. Based on channel cascading and the application of long short-term memory (LSTM) networks, the characteristic information of multidomain samples in the time and space dimensions was enriched, and decision reasoning regarding sensor information from multiple measurement points was carried out by adaptive entropy weighting. This approach can improve the accuracy and numerical stability of bearing fault diagnosis and provide ideas for information fusion in heterogeneous multiscene domains. (2) We propose a feature extraction model and an adaptive optimisation method based on CEI-SSA. To improve the global optimisation efficiency and exploration ability of the original SSA, chaotic initialisation and a reverse elite strategy were introduced, and the improved CEI-SSA was applied to the HSMSF model to achieve adaptive optimisation and reconstruction of 19 structural parameters. The interpretability and generalisation of the model design were enhanced, and the time and labour costs of model design were reduced, while construction of a model by subjective experience was avoided.
The remainder of this article is arranged as follows. The theoretical background is described in Section 2. The method proposed in this paper is described in detail in Section 3. In Section 4, benchmark functions are used to test the performance of the improved intelligent optimisation algorithm, and the validity of the proposed method is verified against other diagnostic methods on two experimental datasets of rolling bearings. Section 5 provides the conclusions and future research directions of this study.
An LSTM unit contains an input gate, an output gate, and a forgetting gate [9]. The functions of the input and output gates are to read data and to transfer the processed data to the next point in time. The main task of the forgetting gate is to retain the useful information in the output of the previous unit and to avoid passing on useless information from the previous moment. The internal updates of the three gates are expressed by six equations, equations (1)-(6), in which W and V represent the input and hidden-state weights, respectively, b represents the bias, and σ represents the sigmoid function. The input variables at each moment are the input x_t at the current moment, the unit state c_{t−1} at the previous moment, and the intermediate state h_{t−1}. Intermediate variables include the outputs i_t and o_t of the input and output gates, the output f_t of the forgetting gate, and the output g_t of the input node. The output variables are the current unit state c_t and the intermediate state h_t.
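For reference, the gate updates described above are commonly written in the following standard form. This is a sketch of the textbook LSTM formulation using the symbols defined in the text (W, V, b, σ); the subscripts on the weights and biases, the use of tanh for the input node, and the ordering of the six equations are assumptions rather than the paper's own typeset equations.

```latex
\[
\begin{aligned}
i_t &= \sigma\left(W_i x_t + V_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + V_f h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_o x_t + V_o h_{t-1} + b_o\right) \\
g_t &= \tanh\left(W_g x_t + V_g h_{t-1} + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
\]
```

Here ⊙ denotes element-wise multiplication. The previous unit state c_{t−1} enters only through the fifth line, which matches the list of input variables given in the text.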

Sparrow Search Algorithm.
The SSA is a swarm intelligence optimisation algorithm that was proposed by Xue and Shen [10] in 2020. The algorithm has few parameters that require adjustment and good stability, and it has been applied to battery stack parameter optimisation [11], CT image location detection [12], and other practical problems. The algorithm model includes producers, scroungers, and guards. The producer looks for food and provides foraging directions for the population, the scrounger follows the producer to obtain food, and the guard provides antipredation behaviour when the sparrow population becomes aware of danger. In the iterative equations of the three roles, X_{i,d}^t represents the position of the i-th sparrow in dimension d, t represents the current iteration number, T is the maximum number of iterations, Xw_d^t represents the worst position of the sparrows in dimension d at iteration t, Xp_d^{t+1} represents the optimal position of the sparrows in dimension d at iteration t + 1 of the population, α is a random number in (0, 1], Γ is a random number that follows a standard normal distribution, R_2 ∈ [0, 1] and ST ∈ [0.5, 1] are the warning value and safety value, respectively, and f_g and f_w are the best and worst fitness values of the current sparrow population, respectively.
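To make the producer role concrete, the following minimal Python sketch shows the producer position update as it is commonly stated for the SSA of Xue and Shen [10]: when the warning value R_2 is below the safety value ST, the position is contracted by exp(−i/(αT)); otherwise the sparrow takes a normally distributed step Γ in every dimension. The function name `update_producer` and its argument names are hypothetical, and the sketch assumes this commonly cited form of the update rather than reproducing the paper's typeset equation.

```python
import math
import random


def update_producer(position, i, max_iter, r2, st, rng):
    """Return the next position of producer number i (1-based index).

    position : list of floats, the current position X_i
    max_iter : T, the maximum number of iterations
    r2, st   : warning value R_2 and safety value ST
    rng      : a random.Random instance (assumed source of randomness)
    """
    if r2 < st:
        # No danger detected: contract the position for a wide local search.
        alpha = 1.0 - rng.random()  # uniform in (0, 1]
        factor = math.exp(-i / (alpha * max_iter))
        return [x * factor for x in position]
    # Danger detected: shift every dimension by the same normal random step.
    gamma = rng.gauss(0.0, 1.0)
    return [x + gamma for x in position]


if __name__ == "__main__":
    rng = random.Random(0)
    print(update_producer([1.0, -2.0], i=1, max_iter=100, r2=0.3, st=0.8, rng=rng))
```

The contraction branch always shrinks each coordinate towards zero, which is why the original SSA tends to favour optima near the origin; this is one motivation for improving the initial population.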

Model Building
The overall framework of this paper is shown in Figure 1 and can be divided into two parts. In the first part, the HSMSF was constructed: sensor information from multiple measurement points was mapped to multiple high-dimensional domains, and several domains with a strong ability to characterise the information were selected with the basic CNN model as input data. By using the multichannel characteristics of the CNN combined with the complexity of each domain, the corresponding domain information was sent to the corresponding CNN channel in chronological order and fused into a feature map with time characteristics; the LSTM layer then disassembled its space-time characteristics, and fault predictions were made based on information entropy theory. In the second part, the CEI-SSA algorithm acted on the feature extraction component of the HSMSF and used its population iteration to conduct adaptive optimisation of the model components and establish an optimal feature model for fault diagnosis.

Bearing Fault Signal Classifier Based on HSMSF.
Conventional bearing fault diagnosis typically analyses only the vibration signals collected by a single sensor. The collected data samples are 1D time-domain sequences, which contain no obvious fault characteristic information and express the correlation between features insufficiently.
Shock and Vibration
This paper focuses on a study of the HSMSF. From multiple perspectives of bearing components, heterogeneous sensors were used to collect sequence signals in multiple time scales, and the collected data were mapped into high-dimensional domains. The multidomain fusion space of the initial signal features was reconstructed in a multichannel cascade mode, and LSTM was introduced into the fusion space for feature disassembly.
The complementarity of the features in the spatial and temporal dimensions of the data samples was explored. The probability of the sample labels was solved from the effective features obtained by disassembly, and adaptive entropy-weighted fusion of the predicted values of the different source sensors was carried out using information entropy theory to achieve efficient diagnosis of bearing faults. The overall HSMSF framework is shown in Figure 2.
The 1D signal sample was mapped into a high-dimensional space and presented as a two-dimensional (2D) image by a special coding method. The purpose was to mitigate the problem of feature singularity in the original 1D domain by describing details such as points, lines, image colour, or cross boundaries. In this HSMSF study, the high-dimensional mapping of data samples has five conversion modes: time domain (TD), continuous wavelet transform (CWT) time-frequency domain [13], Gramian angular field [14], Markov transition field (MTF) [15], and recurrence plot field (RPF) [16].
The CWT transforms the signal to the time-frequency domain; it does not require the width of the time-frequency window to be set in advance and has good local analysis ability in time and frequency. For a signal x(t) ∈ L^2(R), the transformation is defined in terms of a wavelet basis function.
The Gramian angular field encodes the time series in the polar coordinate system to show different information granularity of the samples. Each element of the Gramian matrix is the value of a trigonometric function of angles. Depending on whether the trigonometric function is applied to the sum or to the difference of the angles, two implementations are derived, namely, the Gramian angular summation field (GASF) and the Gramian angular difference field (GADF).
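The Gramian angular field encoding can be illustrated with a short Python sketch. It assumes the commonly used construction: the series is rescaled to [−1, 1], each value is converted to an angle φ = arccos(x), and the two images are GASF_{ij} = cos(φ_i + φ_j) and GADF_{ij} = sin(φ_i − φ_j). The function name `gramian_angular_fields` is hypothetical, and the rescaling step is an assumption of this sketch.

```python
import math


def gramian_angular_fields(series):
    """Return (GASF, GADF) as nested lists for a 1D sequence of floats."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    # Min-max rescaling to [-1, 1] so that arccos is defined for every sample.
    scaled = [((x - hi) + (x - lo)) / (hi - lo) for x in series]
    # Polar encoding: each sample becomes an angle (clamped against rounding).
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf


if __name__ == "__main__":
    gasf, gadf = gramian_angular_fields([0.0, 0.5, 1.0])
    for row in gasf:
        print([round(v, 3) for v in row])
```

A window of n samples yields two n × n images, so a 1024-point sample would normally be downsampled or aggregated before being resized to the CNN input resolution.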
In the transformation expressions, X is the transformed polar coordinate sequence and I is the unit row vector [1, 1, ..., 1].
The Markov transition field is a variant based on a first-order Markov chain. It addresses the insensitivity of the Markov transition matrix to the time characteristics of sequential samples by giving priority to the time and position relationships of the samples during transformation. The signal sequence x(t) of length n is mapped to the corresponding ranges q_j, and the corresponding transition matrix is constructed by combining q_j with the frequency w_ij of immediately adjacent samples. The Markov transition field image is then obtained from this matrix.
The recurrence plot field is an effective method for analysing the periodicity, chaos, and nonstationary nature of signal sequences and can reveal the internal structure, similarity, and predictability of the data samples. In its transformation function, Θ is the Heaviside function and ε is the recurrence threshold.
The HSMSF considers the correlation and complementarity of different domains in the feature space. Two domains with a strong ability to represent the fault information of the corresponding dataset were selected based on the diagnostic performance of the basic CNN in a single domain. According to the complexity of the different mapping domains, the two types of domain data have different resolutions and are input at different times and at different CNN nodes. During convolution with the multikernel CNN, the multichannel cascade mode superimposes the multidomain fault characteristic information, so that space-time characteristics are included in the multidomain fault feature fusion. We assume that signal i maps to data vectors s_k(i) and c_k(i) of the above two optimal domains.
Equation (15) can be obtained for the M preset data channels in the basic CNN structure, where m is an integer in the interval (1, M); that is, the channel proportion of the different mapping domains is adjusted dynamically according to the spatial complexity of each mapping domain. To make full use of the information represented by a domain with more complex features, it is necessary to input that information at the front end of the network with a higher resolution. Because only the two domains with better performance were selected in this study, the domain with the higher degree of spatial complexity was input one convolution layer ahead of the other, and its resolution was set to twice that of the other. Based on these settings, the channel proportion of the domain with higher spatial complexity was two-thirds (if the channel value is not an integer, it is rounded down). Convolution and pooling operations were carried out, and a fully connected layer was added after the last convolution block for data dimension reduction to eliminate redundant information between the different domains. The stack form was used to splice the spatial features of the different mapping domains. Unlike the traditional concat and add methods, stack can retain the time step information of the feature tensor. The dimension-reduced data of the two types of domains were spliced into a 2D matrix, which was combined over the number of samples into a three-dimensional (3D) block and then transmitted to the LSTM layer as an input.
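The channel allocation and the stack operation described above can be sketched in a few lines of Python. The sketch assumes that the two-thirds share is computed by integer floor division and that each domain has already been reduced to a feature vector of the same length per sample; the function names `split_channels` and `stack_domain_features` are hypothetical.

```python
def split_channels(total_channels):
    """Give the more complex domain two-thirds of the channels, rounded down."""
    complex_channels = (2 * total_channels) // 3
    return complex_channels, total_channels - complex_channels


def stack_domain_features(features_a, features_b):
    """Stack two per-sample feature lists into a samples x 2 x features block.

    Each domain becomes one time step, so the LSTM sees a sequence of
    length 2 for every sample instead of one concatenated vector.
    """
    if len(features_a) != len(features_b):
        raise ValueError("both domains must contain the same number of samples")
    block = []
    for vec_a, vec_b in zip(features_a, features_b):
        if len(vec_a) != len(vec_b):
            raise ValueError("feature vectors must have equal length")
        block.append([list(vec_a), list(vec_b)])
    return block


if __name__ == "__main__":
    print(split_channels(32))  # share for the complex domain, then the other
    block = stack_domain_features([[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [0, 1, 2]])
    print(len(block), len(block[0]), len(block[0][0]))
```

Concatenation would instead produce one vector of twice the length per sample, which removes the step axis that the LSTM layer iterates over.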
The LSTM layer is equipped with a special gating mechanism, which prevents the content of feature units from being tampered with across multiple time steps, enables learning of long-term dependent features between multiple domains, and enables processing of global features that have not been explored in the CNN layer. Therefore, the LSTM is well suited to solving the above spliced data blocks. The solution process is shown in equations (1)-(6). During this process, the cross-entropy loss was used to back-propagate the training errors and gradually update the model parameters layer by layer. When the model parameters were updated to an optimum, the fault features with a strong spatiotemporal characterisation ability could be extracted and input into the SoftMax function to solve for the probability values. A corresponding output probability exists for each of the u groups of heterogeneous sensors. To ensure the stability and classification accuracy of the diagnostic model, decision-level fusion of the u groups of values was carried out by the adaptive entropy-weighted fusion method, as follows.
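A minimal Python sketch of this decision-level fusion is given here for orientation. It computes the Shannon entropy of each sensor group's SoftMax output, converts the entropies into fusion weights, and sums the weighted probabilities by column. The weighting rule used here (one minus the normalised entropy, rescaled so that the weights sum to one) is an assumption chosen for illustration and is not taken from the paper's weight formula; the function names are likewise hypothetical.

```python
import math


def shannon_entropy(probs):
    """Entropy of one sensor group's class probabilities (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fuse_predictions(prob_matrix):
    """Fuse u rows of class probabilities into one label.

    Returns (weights, fused_probabilities, predicted_label).
    """
    n_classes = len(prob_matrix[0])
    if n_classes < 2:
        raise ValueError("at least two fault categories are required")
    max_entropy = math.log(n_classes)
    # Assumed rule: a low-entropy (confident) sensor group gets a larger weight.
    confidence = [max(0.0, 1.0 - shannon_entropy(row) / max_entropy)
                  for row in prob_matrix]
    total = sum(confidence)
    if total <= 0.0:
        # Every group is maximally uncertain: fall back to equal weights.
        weights = [1.0 / len(prob_matrix)] * len(prob_matrix)
    else:
        weights = [c / total for c in confidence]
    # Column-wise sum of the weighted probability rows.
    fused = [sum(w * row[j] for w, row in zip(weights, prob_matrix))
             for j in range(n_classes)]
    label = max(range(n_classes), key=lambda j: fused[j])
    return weights, fused, label


if __name__ == "__main__":
    weights, fused, label = fuse_predictions([[0.1, 0.8, 0.1], [0.4, 0.35, 0.25]])
    print([round(w, 3) for w in weights], label)
```

In this example the first sensor group is far more confident than the second, so it dominates the fused prediction even though the second group weakly prefers a different class.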
The probability output matrix P(x) of the heterogeneous sensor groups, equation (16), can be constructed from the output probability values of the u groups. Each row of P(x) represents the final diagnostic prediction probabilities of one group of sensor data x, and n represents the number of fault categories. If the probabilities output for a group of sensors differ little from one another, the classification uncertainty is large; if the difference between the maximum probability and the other probabilities is large, the classification result is more reliable. Therefore, the information entropy H_i(x), as shown in (17), was used to represent the uncertainty of the classification, where p_ij(x) represents the probability that the classification predicted by the i-th sensor group belongs to category j. According to the information entropy value of each group of sensors, the adaptive fusion weight is given by (18). Combined with the probability output matrix P(x) of (16), the final probability output matrix P′(x) is obtained as shown in (19). The weighted matrix is then summed by column, and the label with the maximum summed value is taken as the result.

3.2. Adaptive Optimisation of Feature Extractor Model Based on CEI-SSA.
In the HSMSF feature extractor, the basic CNN structure was inspired by the CNN applied to bearing fault diagnosis in the literature [17]. In addition, 32 3 × 3 convolution blocks and 32 2 × 2 max-pooling blocks were added to the front end as a preconvolution layer for the complex feature domain, and the LSTM used for the spatiotemporal feature decomposition adopted a single-layer network structure. However, such feature extractors are mostly selected based on human experience, and the determination of their internal parameters depends on extensive expert experience for debugging.
Because of differences in the form of bearing data sampling, it is likely that the model will need to be reconstructed for different diagnostic problems. Therefore, this work proposes an adaptive optimisation method for feature extractor models based on CEI-SSA; the algorithm's flow chart is shown in Figure 3.
Initial sparrow individuals in the original SSA were generated randomly within the search space. It was difficult to obtain good population diversity, which led to poor convergence and weak local exploitation. Chaotic sequence initialisation and the elite reverse learning strategy provide two ways to improve the quality of the initial SSA solutions and to enhance the global and local search capabilities of the algorithm.
The chaotic sequence has the advantages of uniform ergodicity and fast convergence. It can improve the global search ability of sparrows in the SSA and avoid a decrease in the diversity of the parameters to be optimised by the population in later iterations. To balance the randomness and regularity of the chaotic operator, a chaotic sequence is generated by Tent mapping to initialise the population. After the chaos-initialised population is generated, the chaotic individuals need to be transformed into the corresponding search space; in the transformation formula, X_ub,d and X_lb,d are the upper and lower boundaries of the individuals.
The main idea of the reverse learning strategy is to take a feasible solution to a problem, find its reverse solution, evaluate both the original solution and the reverse solution, and then select the better of the two as the next-generation individual. This strategy can improve the individual quality of sparrows in the SSA and improve the local search performance and exploration ability of the algorithm. Let the feasible solution of individual No. 1 in the above D-di-
The reverse solution generated by the reverse learning strategy does not necessarily make it easier to find the global optimal solution than the original solution in the current search space. To solve this problem, an elitist reverse solution was used to improve the algorithm. It has been proven that elite reverse solution individuals carry more effective information than ordinary individuals [18]. The excellent individuals selected from the current population and the reverse population formed by the elite individuals are of great search value; they can avoid algorithm prematurity and search out the required structural parameters more effectively.
The elite individual is the extreme point that corresponds to the general individual. In its reverse solution, δ is a random value in the interval [0, 1], and lb_d and ub_d are the lower and upper bounds of the dynamic boundary, respectively. The dynamic boundary overcomes the difficulty of preserving search experience with a fixed boundary, so that the elite reverse solution lies in a narrowed search range, which is conducive to algorithm convergence. The reset equation (equation (26)) is used to reset X^E_{i,d} when it goes out of bounds.
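The two improvements can be sketched together in Python. The sketch makes several assumptions that should be checked against the paper's equations: the classic Tent map z → 2z for z < 0.5 and z → 2(1 − z) otherwise, a linear mapping of chaotic values into [X_lb, X_ub], the common elite reverse form δ(lb_d + ub_d) − X^E_{i,d} with the dynamic bounds taken as the per-dimension minimum and maximum over the elite individuals, and a uniform random reset for out-of-bounds values. All function names are hypothetical.

```python
import random


def tent_sequence(z0, length):
    """Iterate the classic Tent map starting from z0 in (0, 1)."""
    values, z = [], z0
    for _ in range(length):
        z = 2.0 * z if z < 0.5 else 2.0 * (1.0 - z)
        values.append(z)
    return values


def chaotic_population(n, lower, upper, rng):
    """Create n individuals whose coordinates follow Tent-map sequences."""
    population = []
    for _ in range(n):
        chaos = tent_sequence(rng.uniform(0.01, 0.99), len(lower))
        # Map each chaotic value from [0, 1] into its search interval.
        population.append([lb + z * (ub - lb)
                           for z, lb, ub in zip(chaos, lower, upper)])
    return population


def elite_reverse(elites, rng):
    """Return the reverse solutions of the elite individuals."""
    dims = len(elites[0])
    # Dynamic boundary: the range currently spanned by the elites.
    lb = [min(e[d] for e in elites) for d in range(dims)]
    ub = [max(e[d] for e in elites) for d in range(dims)]
    reversed_population = []
    for elite in elites:
        delta = rng.random()
        candidate = []
        for d in range(dims):
            x = delta * (lb[d] + ub[d]) - elite[d]
            if x < lb[d] or x > ub[d]:
                # Assumed reset rule: resample inside the dynamic boundary.
                x = rng.uniform(lb[d], ub[d])
            candidate.append(x)
        reversed_population.append(candidate)
    return reversed_population


if __name__ == "__main__":
    rng = random.Random(42)
    pop = chaotic_population(5, [1.0, 16.0], [7.0, 128.0], rng)
    print(elite_reverse(pop[:3], rng))
```

In a full optimiser the elites and their reverse solutions would be pooled and ranked by fitness, and the best individuals kept for the next iteration. Note that the classic Tent map degenerates in floating point after several dozen iterations, so the sketch restarts it from a fresh seed value for every individual.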
Constructing a feature extractor model usually requires human professional experience and repeated experiments to determine the optimal parameter structure, resulting in insufficient model generalisation and the need to redesign the structure in the face of new diagnostic problems. Moreover, some parameters of the feature extraction model are crucial to its performance. For example, if the batch size is too small, the randomness of the model becomes greater, making it difficult to achieve convergence. If the batch size is too large, the training solution tends to fall into local optima. When the learning rate is too low, the model is prone to overapproximation and slow convergence, and when the learning rate is too high, the training process oscillates strongly. For different training tasks, the choice of optimiser also affects diagnostic model performance. For example, although the well-known Adam algorithm does not need manual intervention to adjust the learning rate, the learning rate can rise suddenly in a certain parameter space, resulting in nonconvergence. The size of the convolution kernel, the number of channels, the number of hidden layer units, and the pooling mode are also crucial to the model. An improper combination leads to misidentification or loss of the interlayer receptive field and degrades the diagnostic performance of the model.
In summary, this paper uses CEI-SSA modified by the above mechanism to conduct structural optimisation operations on the feature extractor model, with the aim of reducing the difficulty of model design and improving the adaptive recombination and adaptation ability of the model. The 19 dimensions of each sparrow corresponded to 18 undetermined parameters of the CNN layer and one undetermined parameter of the LSTM layer in the feature extractor.
The undetermined parameters were the convolution kernel sizes of the four convolution layers (Ck_s_1, Ck_s_2, Ck_s_3, and Ck_s_4), the numbers of convolution kernels in the four layers (channel_1, channel_2, channel_3, and channel_4), the activation functions of the four convolution layers (Act_1, Act_2, Act_3, and Act_4), the pooling modes of the three layers other than the third (pool_1, pool_2, and pool_4), the number of hidden units in the LSTM layer (Hid_s), the learning rate (L_rate), the optimiser (Op), and the batch size (Batch). Except for the activation functions, pooling modes, and learning rate, the 11 undetermined parameters were rounded at the first decimal place because they must take integer values. The feature extractor model was established according to the position information of each sparrow after chaotic initialisation. The initial feature extractor model was trained, and its prediction error f was calculated on the test set using equation (27), where y(i) is the predicted label, t(i) is the real label, and batch is the batch size. f was used as the fitness function in CEI-SSA for population updating, and the information of the optimal individual was used to build the final feature extractor model. The specific optimisation steps are shown in Algorithm 1.
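The mapping from a 19-dimensional sparrow position to a model configuration can be sketched as follows. The ordering of the 19 coordinates, the candidate sets for the categorical parameters, the way a continuous coordinate selects a categorical option, and the use of the misclassification rate as the prediction error are all assumptions made for illustration; only the parameter groups and the rounding of the 11 integer-valued parameters come from the text.

```python
import math

# Hypothetical candidate sets; the paper's actual options are not given here.
ACTIVATIONS = ("relu", "tanh", "sigmoid")
POOLINGS = ("max", "avg")
OPTIMISERS = ("sgd", "adam", "rmsprop")


def round_half_up(value):
    """Round at the first decimal place to the nearest integer."""
    return int(math.floor(value + 0.5))


def pick(options, value):
    """Select a categorical option from a continuous coordinate (clamped)."""
    index = min(max(int(value), 0), len(options) - 1)
    return options[index]


def decode_sparrow(position):
    """Turn one 19-dimensional position into a feature extractor config."""
    if len(position) != 19:
        raise ValueError("expected 19 structural parameters")
    p = list(position)
    return {
        "kernel_sizes": [round_half_up(v) for v in p[0:4]],    # Ck_s_1..4
        "channels": [round_half_up(v) for v in p[4:8]],        # channel_1..4
        "activations": [pick(ACTIVATIONS, v) for v in p[8:12]],  # Act_1..4
        "pooling": [pick(POOLINGS, v) for v in p[12:15]],      # pool_1, 2, 4
        "lstm_hidden": round_half_up(p[15]),                   # Hid_s
        "learning_rate": p[16],                                # L_rate
        "optimiser": pick(OPTIMISERS, round_half_up(p[17])),   # Op
        "batch_size": round_half_up(p[18]),                    # Batch
    }


def prediction_error(predicted, actual):
    """Assumed fitness: the fraction of samples whose labels disagree."""
    wrong = sum(1 for y, t in zip(predicted, actual) if y != t)
    return wrong / len(actual)


if __name__ == "__main__":
    position = [3.4, 5.6, 3.0, 2.5, 32.2, 64.7, 64.0, 128.0,
                0.2, 1.7, 0.9, 2.3, 0.4, 1.6, 0.1, 99.5, 0.001, 1.2, 31.6]
    print(decode_sparrow(position))
```

Because the optimiser minimises this fitness value, a lower prediction error on the test set corresponds directly to a better individual.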

Experimental Validation
In  [19], which are five high-dimensional multipeak test functions.

Performance Verification of CEI-SSA Algorithm.
To verify the optimisation performance of the CEI-SSA algorithm, CEI-SSA was compared with the SSA [10,20], Dragonfly Algorithm (DA) [21], Grey Wolf Optimiser (GWO), Artificial Bee Colony (ABC), and Particle Swarm Optimisation (PSO) algorithms on the 12 test functions. The population size of each algorithm was set to 100, the maximum number of iterations was set to 1000, and the numbers of producers and guards were set to 20% of the population size. To avoid chance effects in the optimisation results and to demonstrate the stability of CEI-SSA, each algorithm was run independently 30 times on each benchmark function, and the mean (Mean) and standard deviation (Std) of the results were taken as the final evaluation indices. The comparative results are shown in Tables 1 and 2.
According to the analysis in Table 1, for the same test constraints, the statistical results of CEI-SSA on the high-dimensional single-peak test functions were significantly better than those of the other five comparison algorithms. For the test functions F1, F2, and F3, CEI-SSA yielded the theoretical optimal solution stably. When solving F3, F5, F6, and F7, CEI-SSA failed to find the theoretically optimal solution, but it was several orders of magnitude better than the other algorithms on the two evaluation indices of mean and standard deviation. Compared with the other algorithms, the improved CEI-SSA had a stronger exploitation ability and stability on single-peak functions.
Table 2 shows that, on the high-dimensional multipeak test functions F9 and F11, CEI-SSA and SSA converged stably to the global optimum. On the F10 test function, CEI-SSA and SSA reached the values closest to the optimal fitness and had strong stability compared with the other algorithms. When solving the F8 function, there was no obvious difference between the results of the algorithms; CEI-SSA performed slightly better than the other algorithms in optimisation and was slightly less stable than SSA. When solving F12, CEI-SSA was slightly worse than SSA on the two evaluation indices but was greatly superior to the other comparison algorithms. Therefore, CEI-SSA showed good performance over repeated optimisation runs on both single-peak and multipeak test functions. It has efficient global optimisation and local exploration abilities and can explore the search space fully and efficiently with strong stability and robustness; hence, it can be applied to the adaptive optimisation problem of the HSMSF model in this paper.

Case 1: CWRU-Bearing Dataset.
CWRU datasets were provided by the Case Western Reserve University Bearing Data Center [22] and have been used extensively in rolling bearing fault diagnosis. The experimental device is shown in Figure 4. It is mainly composed of a 2-HP motor, a bearing accelerometer, a torque sensor, and a power tester. The driving-end and fan-end test bearings were 6205-2RS JEM SKF and 6203, respectively. Single-point faults with damage levels of 7, 14, and 21 mils (0.007, 0.014, and 0.021 in) were introduced by electrical discharge machining (EDM) on the rolling element, inner raceway, and outer raceway of the bearings. Acceleration sensors were placed on the driving end of the motor and on the bearing seat of the fan end of the motor, respectively, to collect vibration acceleration signals at motor speeds from 1730 RPM to 1797 RPM and motor loads from 0 HP to 3 HP. Vibration signals were collected by a 16-channel data recorder with a sampling frequency of 12 kHz.

ALGORITHM 1: The framework of optimising the feature extractor.
Input: the number of sparrows N
Output: optimal fitness value F_g; optimal value of the feature extractor structure y_best
(1) Initialisation: initialise the sparrow population via (21) and (22);
(2) repeat
(3)   while k ∈ population size do
(4)     Calculate the fitness value of each individual via (25);
(5)     Sort the individuals according to their fitness values;
(6)     Draw a random R_2 value in the interval [0, 1];
(7)     for each producer i = 1 to (N * PD) do
(8)       Update the location of the producer via (7);
(9)     end for
(10)    for each scrounger j = (N * PD) + 1 to N do
(11)      Update the location of the scrounger via (8);
(12)    end for
(13)    Use the elite reverse strategy to generate reverse solutions and update the outstanding individuals via (23) and (24);
(14)    Update F_g and y_best;
(15)    k = k + 1;
(16)  end while
(17) until a fixed number of iterations
(18) return F_g, y_best
The main purpose of this example was to verify the accuracy of the HSMSF model based on CEI-SSA adaptive optimisation, within the proposed bearing fault diagnosis method, in diagnosing the location and degree of bearing damage. Therefore, during type label assignment, signal data of the same fault type at different motor speeds and loads were given the same type label. According to the standard, ten faults with different severities in the rolling element, inner raceway, and outer raceway of the driving-end rolling bearing were studied for sensors with different sources on the driving-end and fan-end bearing seats. Figure 5 shows the time-domain waveform of a 10 s vibration signal that was collected from the sensors at the driving end and the fan end when the inner raceway fault damage was 14 mils. The vibration signals of each fault state monitored by the sensor were divided into samples of equal length. The sample set of the 10 fault states was generated with a window size of 1024 time points and a window moving step of 1000 time points. For the same fault type, 300 samples were collected under each of four operating conditions (3 HP/1730 RPM, 2 HP/1750 RPM, 1 HP/1772 RPM, and 0 HP/1797 RPM). Twelve hundred samples in each state were obtained, for a total of 12,000 samples. The two sensors at the driving end and the fan end were used to construct datasets DE and FE, and the ratio of training set to test set in each dataset was 8 : 2. A detailed description of the experimental data is given in Table 3. To adapt to the data input format of the feature extractor and carry out high-quality domain information fusion, multispace domain transformation was carried out for the various fault samples in the DE and FE datasets. Figure 6 lists the 2D images of the five corresponding domains after transformation of some samples.
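The sample segmentation described above amounts to a sliding window over each recorded signal. The following Python sketch uses the window size of 1024 points and the step of 1000 points stated in the text; the function name `segment_signal` is hypothetical, and discarding an incomplete trailing window is an assumption.

```python
def segment_signal(signal, window=1024, step=1000):
    """Split a 1D signal into equal-length, slightly overlapping samples.

    Consecutive windows start `step` points apart, so with the default
    values neighbouring samples share 24 points. A trailing remainder
    shorter than `window` is discarded.
    """
    last_start = len(signal) - window
    return [signal[start:start + window]
            for start in range(0, last_start + 1, step)]


if __name__ == "__main__":
    samples = segment_signal(list(range(3024)))
    print(len(samples), len(samples[0]), samples[1][0])
```

With these settings, k windows need at least (k − 1) × 1000 + 1024 points, so the 300 samples per operating condition require a record of at least 300,024 points.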
To verify the validity of the spatial domain transformation of the fault samples, the basic CNN structure described in Section 3.2 was used to compare diagnostic accuracy on the DE dataset. The detailed configuration is shown in Table 4. To explore the influence of the input-layer resolution and the number of network layers on the ability of each spatial domain to characterise fault features, the input layer of each domain was set to resolutions of 32 × 32, 112 × 112, and 224 × 224, and eight 3 × 3 convolution blocks and eight 2 × 2 max-pooling blocks were added to the front end of the basic CNN structure; each configuration was tested five times. The same parameter configuration was used for all experiments, with a learning rate of 0.001, a batch size of 32, and 50 iterations. The comparison results are shown in Figures 7(a)-7(d).
Compared with the time-domain method, the average accuracy of all domain mapping methods improved, which supports the benefit of the proposed domain transformation. A comparison of Figures 7(a) and 7(b) shows that, when the CNN had three convolution layers, the average diagnostic accuracy of samples in each domain with an input resolution of 112 × 112 was ∼2%-9% higher than with the original 32 × 32 input size. This means that resolution loss cannot be ignored in diagnosis. However, a higher resolution does not always imply a more accurate diagnosis. Figures 7(b) and 7(c) show that a further increase in the input resolution decreases the average diagnostic accuracy for CWT time-frequency domain samples. This result may originate from differences between the domain mapping methods: the pixel intensity of the noise in such image samples becomes more pronounced as the resolution increases, which affects diagnostic accuracy. Shallow networks may not fully characterise the features of domain samples with complex structures, and therefore the CNN structure was changed to four layers while the input resolution was kept at 224 × 224. Figures 7(c) and 7(d) show that the diagnostic accuracy of each domain improved to some extent after the network was deepened by one layer. Except for the CWT time-frequency domain, the results obtained in the other domains with this combination were the best among the four verification combinations.
These results indicate that the network model performs optimally only when it matches the corresponding data. Too deep or too shallow a network leads to a decrease in diagnostic performance.
To combine the advantages of the different domains and to apply them to the method studied in this work, the CWT (3-layer structure, 112 × 112 resolution), which had the highest average accuracy in the above verification experiment, was selected as the back-end input of the improved structure described in Section 3.1, and the second-best domain, GADF (4-layer structure, 224 × 224 resolution), was selected as the front-end input. A multichannel cascade combined with LSTM-based spatiotemporal feature decomposition was used in the diagnostic verification experiment. The diagnostic accuracy curve on the test data is shown in Figure 8.
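For readers unfamiliar with the GADF input, a minimal sketch of the Gramian angular difference field follows. It rescales the sample to [-1, 1], maps each value to an angle by arccos, and fills the image with the sine of the pairwise angle differences. Reducing the 1024-point sample to 224 points by piecewise averaging is an assumption made here to reach the 224 × 224 resolution; the text does not say how the authors resized their images, and the function names are hypothetical.

```python
import numpy as np

def paa(x, size):
    """Piecewise aggregate approximation: average x over `size` segments (needs len(x) >= size)."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(0, len(x), size + 1).astype(int)
    return np.array([x[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])

def gadf(x, size=224):
    """Gramian angular difference field of a 1-D sample, returned as a size x size image."""
    s = paa(x, size)
    span = s.max() - s.min()
    # Rescale to [-1, 1] so that arccos is defined; a constant signal maps to zeros.
    s = np.zeros_like(s) if span == 0 else 2.0 * (s - s.min()) / span - 1.0
    phi = np.arccos(np.clip(s, -1.0, 1.0))
    return np.sin(phi[:, None] - phi[None, :])
```

The resulting image is antisymmetric with a zero diagonal, so temporal order is encoded in the sign of each pixel.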
In Figure 8, CWT-CNN refers to the basic network with CWT input, GADF-CNN refers to the basic network with GADF input, and P-Net refers to the network structure improved by the multichannel cascade and by LSTM. To analyse the improvement in feature representation ability of the improved structure, t-distributed stochastic neighbour embedding (t-SNE) [23] was used to visualise the feature distributions of the three models, as shown in Figures 9(a)-9(c). Each point in the diagram represents a sample, and different fault types are shown in different colours.
The results show that, compared with the other two methods, P-Net has better feature separation ability and classification performance, and the extracted features form clusters. The verification accuracy was 96.2%, which was ∼2.7% higher than that of the previous best model, CWT-CNN. The multidomain feature fusion method developed in this research makes good use of the advantages of each domain and helps the model to separate different fault types to a greater extent after fusion of the multidomain feature information. Figure 9(c) shows that, although P-Net improves diagnostic accuracy, overlap remains among categories 1, 3, and 8. This occurred largely because the basic components of the model were set manually from experience, which proved insufficient, and the original model parameter settings affected the quality of the feature extraction.
The preset convolution channels in the multichannel model are prone to channel redundancy during cascading, which increases the required calculations and reduces fault classification accuracy. Therefore, it is necessary to study adaptive optimisation of the structure of diagnostic models.
CEI-SSA, which had been verified experimentally, was used to conduct model adaptive optimisation for P-Net. To reduce the time and cost required, when the loss function on the test set did not decrease for 10 consecutive cycles, the program stopped and saved the model that corresponded to the minimum test-set loss. The initial parameter settings of CEI-SSA are shown in Table 5, and the 19 parameters carried by the individual that reached the global optimal solution are shown in Table 6. The 3D diagram of the iterative process of each sparrow in the population is shown in Figure 10, with horizontal and vertical coordinates that correspond to the channel_1 and channel_2 parameters. All individuals converged to the optimal fitness of the population according to the optimisation mechanism, which indicates that the CEI-SSA used in this work can perform good model adaptive optimisation and deal adaptively with the corresponding fault diagnosis problems to reduce the cost of manpower and design time. Figure 11 shows the classification accuracy curve of AP-Net, the model obtained after CEI-SSA adaptive optimisation. The diagnostic accuracy increased to 97.7%. Furthermore, because the original partially redundant structure was abandoned during model adaptive optimisation, the accuracy curve converged faster and more stably, which verified the effectiveness of the method used in this paper to optimise the components of the feature extractor model. The signals detected by sensors at different measuring points were complementary, and therefore AP-Net was also used to extract fault features from the FE dataset collected at the fan end, and category probability predictions were obtained from the two datasets. The adaptive entropy-weighted fusion method described in Section 3.1 was used for verification and analysis.
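The stopping rule used during optimisation (stop after 10 consecutive cycles without a lower loss and keep the model with the minimum loss) can be sketched as a small helper. The class name and the idea of passing the model parameters as a generic `state` object are assumptions made for illustration.

```python
import copy

class EarlyStopping:
    """Stop when the monitored loss has not decreased for `patience` consecutive cycles."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_state = None   # snapshot of the model at the minimum loss
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, loss, state):
        """Record one cycle; return True when training should stop."""
        if loss < self.best_loss:
            self.best_loss = loss
            self.best_state = copy.deepcopy(state)
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Inside a training loop, `step` would be called once per cycle with the current loss and model parameters, and `best_state` would be restored or saved when it returns `True`.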
Figure 12 shows the confusion matrix of the model's predictions for each health state in the test set after fusing information from the different sensors. The average diagnostic accuracy increased to 99.8%, the number of misclassified samples decreased to 6, and the accuracy of each type of sample exceeded 99%. Therefore, the HSMSF adaptive optimisation based on CEI-SSA in this work can exploit different sources of information to improve diagnostic accuracy. Figure 13 shows a visual analysis of the feature dimension reduction of the final model. Compared with Figure 9(c), the model showed good same-type cohesion and a strong ability to separate different types of states.
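One plausible form of the entropy-weighted fusion step is sketched below. The exact weighting formula is given in Section 3.1 and is not reproduced in this section, so the rule used here is an assumption: each sensor's class-probability vector is weighted by one minus its normalised Shannon entropy, so that the more confident sensor contributes more to the fused prediction.

```python
import numpy as np

def entropy_weighted_fusion(prob_list, eps=1e-12):
    """Fuse per-sensor class probabilities with confidence weights derived from entropy.

    prob_list: sequence of arrays with shape (n_samples, n_classes), one per sensor.
    Returns (fused probabilities, per-sensor weights of shape (n_sensors, n_samples)).
    """
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_list])
    n_classes = probs.shape[-1]
    # Shannon entropy of each prediction, normalised to [0, 1] by log(n_classes).
    entropy = -np.sum(probs * np.log(probs + eps), axis=-1) / np.log(n_classes)
    confidence = 1.0 - entropy + eps
    weights = confidence / confidence.sum(axis=0, keepdims=True)
    fused = np.sum(weights[..., None] * probs, axis=0)
    return fused, weights
```

For example, if the driving-end model predicts (0.9, 0.05, 0.05) and the fan-end model predicts (0.3, 0.4, 0.3) for the same sample, the fused prediction follows the more confident driving-end model.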

Case 2: IMS-Bearing
Dataset. It is difficult to verify the effectiveness and practical applicability of the proposed method with the CWRU dataset alone, because its bearings were damaged artificially. Therefore, the method described in this paper was also tested on the IMS dataset, which was collected during the natural evolution of bearing faults and provided by the Intelligent Maintenance Systems Center of the University of Cincinnati [24]. The bearing fault experimental platform is shown in Figure 14 and consists of the main body part, the transmission part, the loading system, the lubrication system, and the control circuit. The speed of the AC motor is 2000 RPM. The test shaft carried four Rexnord ZA-2115 double-row bearings with a pitch diameter of 2.815 in., a roller diameter of 0.331 in., and a pressure angle of 15.17°. Thermocouple sensors placed on the bearings, together with the lubrication system, monitored and regulated the bearing temperature and lubrication. The shaft and bearings carried a radial load of 26.67 kN applied by the spring mechanism. Each bearing was connected to two high-precision accelerometers (PCB 253B33 High-Sensitivity Quartz ICP®), which measured in the X- and Y-directions. The sampling frequency was 20 kHz, and 20480 data points were sampled each time.
The IMS dataset records three groups of data according to different situations. The first group of experimental data was used to construct the dataset. A total of 2156 data records from four sets of bearings were recorded. When the test bench stopped, bearings 1 and 2 were normal, bearing 3 had experienced inner ring failure, and bearing 4 had experienced rolling body failure. Three data categories therefore existed. For each type of fault sample, the sample set was constructed by moving a window of 1024 time points in steps of 600 points. Finally, 2,155 samples were generated for each state, for a total of 6,465 samples. The signals measured in the X- and Y-directions were divided into two datasets, IMS_X and IMS_Y. Each dataset was divided into a training set and a test set in the ratio of 8 : 2. A detailed description of the experimental data is provided in Table 7. For the IMS_X dataset, the [...]

[Table: parameter search ranges]
Optimiser: "Adam", "SGD", "Adamx", "Adadelta", "Adagrad", "AdamW", "ASGD", "RMSprop"
Learning rate: [1e-5, 1e-1]
Batch size: [16, 64]
Activation function: "Relu", "Sigmoid", "Relu6", "Tanh", "LRelu", "Softsign"
Channel number: [3, 15], [16, 30]

According to the comparison results, compared with the time-domain transformation, the other five domain transformation methods improved the diagnostic accuracy on the IMS_X dataset by 1 to 30 percentage points, and the optimal average accuracy was 93.5%. The conclusions from analysing the influence of resolution and network layers on diagnostic performance are consistent with those from Experiment 1.
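The optimiser, learning-rate, batch-size, activation, and channel ranges listed above mix categorical and numeric choices, whereas a sparrow's position is a continuous vector. One way to bridge the two is to decode a position in [0, 1] per dimension into a concrete configuration, as sketched below. The encoding is an assumption and not the authors' scheme: the log-uniform learning rate, the assignment of the two channel ranges to channel_1 and channel_2, and the function names are all hypothetical, and only six of the 19 parameters are covered.

```python
# Labels copied verbatim from the search ranges in the text ("Adamx" presumably denotes Adamax).
OPTIMISERS = ["Adam", "SGD", "Adamx", "Adadelta", "Adagrad", "AdamW", "ASGD", "RMSprop"]
ACTIVATIONS = ["Relu", "Sigmoid", "Relu6", "Tanh", "LRelu", "Softsign"]

def _pick(options, u):
    """Map u in [0, 1] to one of the categorical options."""
    return options[min(int(u * len(options)), len(options) - 1)]

def _int_range(lo, hi, u):
    """Map u in [0, 1] to an integer in the closed range [lo, hi]."""
    return lo + min(int(u * (hi - lo + 1)), hi - lo)

def decode(position):
    """Turn a six-dimensional position into a training configuration (illustrative only)."""
    u = [min(max(float(v), 0.0), 1.0) for v in position]
    return {
        "optimiser": _pick(OPTIMISERS, u[0]),
        "learning_rate": 10 ** (-5 + 4 * u[1]),   # log-uniform over [1e-5, 1e-1]
        "batch_size": _int_range(16, 64, u[2]),
        "activation": _pick(ACTIVATIONS, u[3]),
        "channel_1": _int_range(3, 15, u[4]),
        "channel_2": _int_range(16, 30, u[5]),
    }
```

A fitness function for the search would then call `decode`, build and train the feature extractor with the resulting configuration, and return its loss.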
The average accuracies of the CWT (3-layer structure, 112 × 112 resolution) and GADF (4-layer structure, 224 × 224 resolution) models were the top two. When they were input into the P-Net model described above, the diagnostic accuracy curve in Figure 16 was obtained. The results confirm the advantage of the P-Net processing method over single-domain processing. The fault features of the samples were separated, their inherent information was explored, and the fault identification accuracy improved to 95.6%. CEI-SSA was used to conduct model adaptive optimisation for P-Net on the IMS dataset with the same parameter settings as in Experiment 1. Table 8 shows the 19 optimal component parameters that were obtained after adaptive optimisation on the IMS dataset. The classification accuracy curve of the optimised AP-Net model is shown in Figure 17. The diagnostic accuracy improved to 96.3%, and the model converged rapidly and stably, with a reduction in the manpower and time consumed in the design. These results verify the good generalisation ability of CEI-SSA in this work and the effectiveness of the model self-optimisation method.
The IMS_Y dataset, collected by the sensors in the other direction, was then combined with IMS_X to verify the validity of the method. AP-Net was used to extract the features of the two datasets, and the diagnostic performance was verified by using the adaptive entropy-weighted fusion method. The final confusion matrix of the three health states of the bearings on the IMS dataset is shown in Figure 18. The average diagnostic accuracy improved to 99.8%, a significant improvement in performance. Among all test samples, only three samples of the inner ring fault class were misclassified as belonging to the normal class. A visualisation of the feature dimension reduction of the model is shown in Figure 19, which indicates that the model can discard redundant information and has reliable feature separation ability during feature learning. The CEI-SSA optimisation with the HSMSF method improved the accuracy of fault diagnosis and had a strong generalisation ability.
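The reported accuracies can be cross-checked against the reported error counts with a short calculation. The sketch assumes that each test set is exactly 20% of the stated sample total (12,000 for CWRU and 6,465 for IMS); the exact test-set sizes are given in Tables 3 and 7, which are not reproduced here.

```python
def accuracy_from_errors(n_samples, test_ratio, n_errors):
    """Return (test-set size, accuracy in percent) for a given number of misclassified samples."""
    n_test = round(n_samples * test_ratio)
    return n_test, 100.0 * (n_test - n_errors) / n_test

# CWRU: 12,000 samples, 8 : 2 split, 6 misclassified test samples.
cwru_test, cwru_acc = accuracy_from_errors(12_000, 0.2, 6)
# IMS: 6,465 samples, 8 : 2 split, 3 misclassified test samples.
ims_test, ims_acc = accuracy_from_errors(6_465, 0.2, 3)
```

Under this assumption the test sets contain 2,400 and 1,293 samples, giving accuracies of 99.75% and about 99.77%, both of which round to the reported 99.8%.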
To explore the influence of the selection and number of training samples on the algorithm's performance, experiments were carried out on the CWRU and IMS datasets with five ratios of training set to test set (5 : 5, 6 : 4, 7 : 3, 8 : 2, and 9 : 1). The CWT-CNN, GADF-CNN, and AP-Net algorithms were selected for comparison and verification. Both CWT-CNN and GADF-CNN used the same fixed structural parameters, whereas the parameters of AP-Net were selected adaptively. The diagnostic accuracy on the two datasets is shown in Figure 20. It is apparent that, when the ratio was 5 : 5, all algorithms performed poorly. The main reason for this was that, at a ratio of 5 : 5, the training set did not provide enough data for the model to learn reasonable parameters, which resulted in poor fitting. In contrast, the other four ratios could basically meet the training and diagnosis requirements of the algorithms. Among them, the diagnostic accuracy of the AP-Net model was always better than those of the other two methods and showed high robustness.
To illustrate the superiority of the proposed method, the performances of other deep and shallow learning algorithms were compared with that of the proposed method under the same data partitioning. The deep learning algorithms included ST-CNN [17], the Deep Belief Network, and CNNEPDNN [25]. ST-CNN is composed of three convolution layers with filter sizes of 3 × 3, the Deep Belief Network structure is 1024-512-256-128/10, and CNNEPDNN is composed of two convolution layers with filter sizes of 5 × 1 and a DNN with four hidden layers of 20, 40, 80, and 160 nodes. The shallow learning methods included the support vector machine (SVM), the PSO least-squares support vector machine (PSO-LSSVM) [6], and the back-propagation (BP) neural network. In comparing the algorithms' performances, the influence of feature set selection on diagnostic accuracy was considered. For the deep learning algorithms, three feature sets, CWT, GADF, and FD, were selected for the experiments. FD was the frequency-domain image converted from time-series signals at constant speed, and data samples are shown in Figure 21. Because the shallow learning methods cannot extract sequence features automatically, the mean, variance, root mean square, peak factor, kurtosis coefficient, and waveform factor of the signal were extracted manually as six time-domain features, and the centre frequency, root-mean-square frequency, frequency standard deviation, correlation factor, spectral moment, and total power spectrum were extracted manually as six frequency-domain features. All models were verified five times, and the initial learning rate and number of iterations were set to 0.001 and 50, respectively. A diagnostic performance comparison of the various models is shown in Table 9. For the two large CWRU and IMS datasets, the comparative convergence curves of the deep learning
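The six hand-crafted time-domain features named above for the shallow learning methods can be computed as sketched below. The definitions are the common textbook ones (peak factor as peak over RMS, waveform factor as RMS over mean absolute value, kurtosis as the normalised fourth central moment) and are assumed here, because the text does not give the exact formulas used.

```python
import numpy as np

def time_domain_features(x):
    """Six time-domain statistics of a 1-D vibration sample (common definitions, assumed)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    var = x.var()
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    centred = x - mean
    return {
        "mean": mean,
        "variance": var,
        "rms": rms,
        "peak_factor": peak / rms,                      # also called crest factor
        "kurtosis": np.mean(centred ** 4) / var ** 2,   # equals 3 for a Gaussian signal
        "waveform_factor": rms / np.mean(np.abs(x)),    # also called shape factor
    }
```

For a pure sine wave sampled over whole periods, these definitions give a peak factor of about 1.414 and a kurtosis of 1.5, which is a convenient sanity check before applying them to bearing signals.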

Conclusions
We proposed a bearing fault diagnosis method based on multidomain heterogeneous information entropy fusion and model self-optimisation. A multiscene domain fusion strategy for spatiotemporal features based on heterogeneous sensors was designed. After the information collected by multiple sensors had been mapped into multiple high-dimensional domains, the complementary multidomain information was combined by channel cascade and spatiotemporal feature decomposition, and the predictions from the multiple sensors were fused by an adaptive entropy-weighted method. The CEI-SSA, improved by chaotic initialisation and the elite reverse solution, was used to carry out adaptive optimisation of the components of the feature extractor model, and end-to-end bearing fault diagnosis was achieved without manual design of the model. The effectiveness and superiority of the proposed method were verified experimentally and by comparative analysis. Compared with five other optimisation algorithms, CEI-SSA had better search performance on the 12 benchmark test functions, and the feature extraction model after CEI-SSA adaptive optimisation improved the diagnostic accuracy. Compared with other deep and shallow learning algorithms on the CWRU and IMS bearing datasets, the identification accuracy of the proposed method for bearing fault categories exceeded 99%, and the influence of manual parameter design on diagnostic performance was reduced. Advantages of the approach included learning ability, reduced cost burden, and generalisation performance. The method in this work is a supervised learning method: the model was trained with labelled data, and the evaluation data were samples of classes already seen in the training set. In the next step, unsupervised data processing will be considered for fault diagnosis in zero-shot and few-shot settings, to diagnose and recognise complex samples that were not present in the training set.
Data Availability
The datasets supporting the conclusions of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.