Compound Fault Diagnosis for Gearbox Based on Euclidean Matrix Sample Entropy and One-Dimensional Convolutional Neural Network

Vibration signals of a gearbox under different loads are sensitive to the existence of faults, and composite fault vibration signals are complex. Traditional fault diagnosis methods mostly rely on signal processing, which has difficulty separating effective information from such fault signals. Therefore, traditional fault diagnosis methods struggle to identify those faults accurately. In this paper, a one-dimensional convolutional neural network (1-D CNN) intelligent diagnosis method with an improved SoftMax function is proposed. Local mean decomposition (LMD) decomposes the signals into different product functions (PFs). The PFs are input into the matrix sample entropy based on Euclidean distance (MESE), and the PFs that best reflect the fault characteristics are selected. Finally, the PFs selected by MESE are used to train the CNN to identify the faults of a parallel-shaft gearbox. Experiments show that MESE can quickly and accurately select the PFs with the most significant fault features. The 1-D CNN achieves a recognition rate of nearly 100% in less time, and the CNN with the improved SoftMax can effectively eliminate the LMD endpoint effect. This method can successfully identify single faults, combination faults, and faults under different loads of the gearbox. Compared with other methods, it is characterized by high efficiency, accuracy, and strong anti-interference. Therefore, it can effectively solve the problem of complex fault signal decomposition of the gearbox and can diagnose gearbox faults under different load operations. It has great significance for gearbox fault diagnosis in actual production.


Introduction
The gearbox, as one of the most important parts of mechanical equipment, has a wide range of applications and a high utilization rate. Therefore, a fault brings serious safety problems and high maintenance costs, so the fault diagnosis of gearbox gears is very important. However, the structure of the gearbox is complex. The gears must bear heavy loads under complex and changeable operating conditions. The large gears, small gears, and bearings in the gearbox are the parts most prone to failure.
Gearbox diagnosis can be approached from many aspects, such as vibration signals, sound signals, current signals, and oil-based signals. These signals can reflect the status of the gearbox. The analysis of vibration signals is the most commonly used and the most effective. The traditional fault diagnosis methods apply signal processing experience and manual feature extraction [1,2]. The signal analysis method and the feature extraction method are the main factors affecting the accuracy of fault diagnosis. Time-domain, frequency-domain, and time-frequency domain analysis, wavelet basis functions [3], etc. are the most commonly used feature analysis methods in fault diagnosis [4,5]. Rui et al. [6] designed a dynamic model of gear pairs with variable meshing stiffness to study the fault vibration characteristics of a spur gear pair with a local spalling defect. This method has great limitations because gear spalling is likely to cause other faults and the fault is obvious in actual production. Brethee et al. [7] used a comprehensive dynamic model to analyze the effect of surface wear on the dynamic response of gears. It is based on the observation that the amplitude of the meshing vibration and the sideband components increase significantly as the degree of wear increases. Sanz et al. [8] proposed a multistage algorithm for gear dynamic state monitoring. The meshing dynamics of the gears are monitored by means of the gear state information. However, gear state information is changeable, complex, and difficult to determine.
Due to the complexity of the noise and of the gearbox system, the vibration signal collected from the gearbox through the sensor is usually mixed with a large number of noise interference signals. If the fault is complex, the signal mixed with noise becomes even more complex. So, some recursive mode decompositions are very suitable for gearbox fault diagnosis [9,10]. Wu et al. [11] proposed a fault diagnosis method for planetary gearboxes based on two-dimensional variational mode decomposition (2-D VMD) and full vector spectrum technology and verified its correctness and effectiveness. Li et al. [12] proposed a VMD method that is complementary to the DDTFA method. A comparison of this method with the current VMD and the ensemble empirical mode decomposition shows that VMD-DDTFA is very effective in fault diagnosis of gear cracks and broken teeth under variable working conditions. Xiao et al. [13] proposed a gear fault diagnosis method based on the kurtosis criterion, variational mode decomposition (VMD), and a self-organizing map (SOM) neural network. This method uses the VMD algorithm to decompose the gear vibration signal, extracts the kurtosis value of each IMF to form a feature vector, inputs it into the SOM for diagnosis, and obtains a good result. However, the SOM network is an unsupervised topological neural network whose final output size is difficult to determine, so it is not universal. Most of these methods decompose vibration signals or other signals into appropriate signal components through traditional wavelet, EMD [14], VMD, and other decomposition methods [15,16]. The type of fault is obtained from the parameters of the signal components. These methods can detect faults well, but they rely heavily on mechanical expertise and are inefficient because actual diagnosis requires manual operation. Therefore, the development of traditional fault diagnosis towards artificial intelligence is an inevitable trend.
With the development of artificial intelligence, machine learning is increasingly used in the field of fault diagnosis. The CNN has great advantages in feature classification. The CNN [17] is currently used mostly for two-dimensional image recognition. Chen et al. [18] used two stacked CNNs to build a novel deep image saliency computing framework. The proposed framework highlights the objects of interest against a complex background while preserving details. Sergey et al. [19] developed a method based on a partially observed guided policy search method and a CNN, which was used to learn policies that map raw image observations directly to torques on the robot's motors. A CNN can directly process large data or multidimensional data samples, which is beneficial for more detailed local feature extraction and for retaining the relative relationships of multidimensional data, leading to improved identification results. Deng et al. [20] proposed an improved quantum-inspired differential evolution, which uses the MSIQDE with global optimization ability to optimize the parameters of the DBN and construct an optimal DBN model. Experimental results show that MSIQDE-DBN has higher classification accuracy.
This is an improvement and refinement of the deep belief network. It shows that the deep neural network is an algorithm suitable for fault classification. Chen et al. [21] proposed a convolutional neural network (CNN) for gearbox fault identification and classification, which is suitable for fault diagnosis of industrial reciprocating machinery. It also shows good performance in gearbox fault diagnosis, but the selection of parameters and hyperparameters is very difficult. John et al. [22] proposed a deep convolutional neural network (DCNN) based on the layer-wise relevance propagation (LRP) method for gearbox fault diagnosis. It converts the time-series vibration signals into time-frequency images by wavelet transform and then classifies them with the DCNN. Yao et al. [23] input the time-domain and frequency-domain signals as the original signals into an end-to-end convolutional neural network (CNN) for gear fault diagnosis. This method analyzes the sound signal to identify gear pitting faults. Li et al. [24] proposed a method that combines a convolutional neural network (CNN) with a gated recurrent unit (GRU) network and achieves an accuracy of more than 98% with fewer training samples. Li et al. [25] developed an improved deep neural network based on a domain-adaptive diagnosis model and used the original vibration signal for transfer learning. It used the particle swarm optimization algorithm and the L2 regularization algorithm to optimize the improved deep neural network. However, this method has higher requirements for data collection and classification. Chen et al. [26] proposed a gearbox fault diagnosis method based on feature learning with a one-dimensional residual convolutional autoencoder.
This unsupervised learning method applies a 1-D convolutional autoencoder for feature extraction and deconvolution for filtered signal reconstruction. It performs well in signal denoising and feature extraction. However, the repeated convolution and deconvolution lower the learning efficiency of the network. The deeper neural network also makes the parameters more complex and less adaptable.
The survey and analysis of this literature show that most intelligent diagnostics rely heavily on traditional signal processing methods [27,28], so intelligent diagnostics are still limited by traditional methods. Many methods based on deep learning use a two-dimensional CNN to analyze an image of the vibration signal.
This makes the operating time and storage cost grow as the amount of data increases. As the neural network becomes deeper, repeated feature extraction and optimization steps such as feature imaging and convolution pooling inevitably affect the accuracy of fault diagnosis, and the influence of irrelevant signals such as noise on the diagnostic accuracy also gradually expands. Therefore, to solve these problems, local mean decomposition (LMD) and a one-dimensional CNN (1-D CNN) are combined [29]. Compared with EMD, LMD is more adaptive and has a lower error rate, and the endpoint effect, which is the drawback of decomposing a finite vibration signal, is greatly reduced. Compared with a 2-D CNN, the 1-D CNN extracts features directly from the vibration data, which reduces the time and improves the accuracy. Compared with the traditional method, it is no longer limited by fault type and complexity, and different faults can be recognized by training on different fault signals. The method of calculating the distance between the features of SE is improved so that it is more suitable for the correlation analysis of PFs. After the analysis, the selected components are more suitable for CNN training. The data input is changed from a 1-D vector to a 2-D matrix to speed up the extraction and calculation of SE. The SoftMax activation function of the CNN is improved to reduce the impact of the LMD endpoint effect on the CNN. LMD is used for adaptive local decomposition of fault signals. The feature normalization and feature extraction abilities of the CNN enable this method to identify single faults, faults under different loads, and composite faults. This method achieves higher accuracy; its recognition accuracy is nearly 98.8%.
The rest of this paper is organized as follows. Section 2 introduces the basic principle of LMD and the method for reducing the endpoint effect. Section 3 introduces the improved distance algorithm of SE and its fast matrix solution.
In Section 4, the principles of the 1-D CNN and the improved SoftMax activation function are introduced. In Section 5, experiments and results of gearbox fault diagnosis are provided to verify the effectiveness of the method, and the results are compared with other methods. Finally, the conclusion and future work are given in Section 6.

LMD Principle Analysis and Endpoint Effect Compensation Method
LMD separates pure frequency modulation signals and envelope signals from the original signal. Each PF component is the product of a frequency modulation signal and an envelope signal. The signal is separated repeatedly until the remaining signal becomes monotonic or reaches a given threshold. Each PF signal has its own physical meaning [30]. The decomposition of any one-dimensional signal $x(t)$ proceeds as follows. Find all the local extreme points of $x(t)$ and call them $a_i$. The local mean $l_i$ is the average of every two adjacent extreme points $a_i$ and $a_{i+1}$: $l_i = (a_i + a_{i+1})/2$. Connect all adjacent $l_i$ and $l_{i+1}$ with straight lines and then smooth the lines with the moving average method. The resulting local mean function is called $l_{11}(t)$.
Then, use the local extreme points $a_i$ to obtain the envelope estimates $b_i$. Connect all adjacent $b_i$ and $b_{i+1}$. The envelope estimation function $b_{11}(t)$ is obtained by smoothing with the moving average method.
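The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the envelope estimate $b_i = |a_i - a_{i+1}|/2$ is taken from the standard LMD formulation because the paper's own equation did not survive extraction, and the window width of the moving average is an arbitrary choice.

```python
def local_extrema(x):
    """Return the indices of the strict local maxima and minima of x."""
    return [i for i in range(1, len(x) - 1)
            if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0]


def local_means_and_envelopes(x):
    """Compute l_i and b_i from every pair of adjacent extreme points a_i."""
    idx = local_extrema(x)
    a = [x[i] for i in idx]
    # l_i: average of two adjacent extreme points (stated in the text).
    l = [(a[i] + a[i + 1]) / 2 for i in range(len(a) - 1)]
    # b_i: half the absolute difference (ASSUMPTION: standard LMD definition).
    b = [abs(a[i] - a[i + 1]) / 2 for i in range(len(a) - 1)]
    return idx, l, b


def moving_average(y, width=3):
    """Smooth y with a centered moving average; the window shrinks at the ends."""
    half = width // 2
    out = []
    for i in range(len(y)):
        window = y[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


signal = [0, 2, 0, -2, 0, 2, 0]
idx, l, b = local_means_and_envelopes(signal)
print(idx, l, b)
```

For this symmetric test signal the local means are all zero and the envelope estimates equal the amplitude, which is the behaviour expected of an undistorted oscillation.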
The LMD decomposition is a process of continuously separating high-frequency signals, but the actual signal length is limited [31]. LMD therefore has to infer values near the endpoints when processing a finite signal, which causes severe errors in the local mean function and the envelope estimation function.
A section of the original signal and the curve of the local mean function are compared to analyze the influence of the endpoint effect and the method of attenuating it. Figure 1 shows the envelope and local mean function of two actual vibration signals. LMD finds the envelope from the maximum and minimum points of the signal, and the signal beyond the endpoint is unknown, so the envelope in this short end segment is unknown as well. In the ideal case, where the endpoint value is exactly an extreme point, the local mean function and envelope function obtained are accurate and the endpoint effect is almost eliminated. The cosine function is used to verify the validity of the method. Figure 2 shows the component signals after the decomposition of the two vibration signals. Figure 3 shows the first PF component and the original vibration signal. The decomposition results show that the LMD component of the vibration signal whose endpoint passes through the origin is almost unaffected by the endpoint effect and coincides with the original signal, while the other vibration signal differs considerably after decomposition. The vibration amplitudes of the PF component and the residual component differ by 4 orders of magnitude, so the residual component can be ignored. This shows that when the endpoint of the original signal is an extreme point, the end effect can be mostly eliminated. So, truncating or shifting the original signal to a qualifying endpoint is an effective measure. The time shift is defined as $t_0$ and the original vibration signal as $x(t)$, and $t_0$ is substituted into the LMD decomposition formula. Since $t_0$ is very small, it hardly affects the values of $b$ and $l$. It can be assumed that $b$ and $l$ do not change as a result of shifting the signal.
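The idea of moving the signal to a qualifying endpoint can be illustrated with a short sketch. It is a simplification under stated assumptions, not the compensation derived in this section: instead of shifting by $t_0$, the hypothetical helper `trim_to_extrema` simply discards the samples before the first and after the last local extreme point, so that both endpoints of the retained segment are extreme points.

```python
def trim_to_extrema(x):
    """Keep only the segment of x that starts and ends at a local extreme point.

    ASSUMPTION: truncation is used here as the simplest way to obtain
    extreme-point endpoints; the paper itself works with a small shift t_0.
    """
    idx = [i for i in range(1, len(x) - 1)
           if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0]
    if len(idx) < 2:
        # Not enough extreme points to define a segment; return x unchanged.
        return list(x)
    return list(x[idx[0]: idx[-1] + 1])


segment = trim_to_extrema([1, 3, 1, -1, 1, 2.5, 2])
print(segment)
```

The cost of this choice is a slightly shorter record, which is usually negligible when the sampling frequency is high relative to the oscillation period.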
Separating the local mean function $l_{11}(t)$ from the shifted signal $x(t - t_0)$ gives $m'_{11}(t)$, which can be expressed as $m'_{11}(t) = x(t - t_0) - l_{11}(t)$. The vibration signal is similar to a locally periodic function. According to the basic principle of periodic functions, shifting the function introduces an offset $\tau$, where $\tau$ is the deviation of the function from the coordinate axis and is a fixed value for a given function. Dividing $m'_{11}(t)$ by the envelope estimation function $b_{11}(t)$ demodulates it to give $s'_{11}(t) = m'_{11}(t)/b_{11}(t)$. The iteration then continues according to the LMD rule until the end.

Shock and Vibration
The iteration stops when the termination condition is met. Multiply the envelope signal $b_1(t)$ by the frequency modulation signal $s'_{1n}(t)$ to get the first PF component with reduced end effect: $PF_1(t) = b_1(t)\,s'_{1n}(t)$. The above algorithm is then repeated on the remaining signal. In the resulting expression, $x(t_0)$ and $PF(t)$ differ greatly in order of magnitude, $x(t_0)$ can be processed by taking an extreme value, and $\tau/x(t_0)$ is the ratio of the vibration offset to the endpoint offset. The endpoint effect of LMD can therefore be reduced by a small shift value.

Matrix Sample Entropy Based on Euclidean Distance
SE measures the complexity of a sequence by measuring the probability that a new pattern is generated in the signal. The greater the probability of a new pattern, the greater the complexity of the sequence. The PF decomposed by LMD is actually a high-frequency signal with physical significance. Since sample entropy has high self-adaptability and consistency, it makes it easier to determine whether the regularity of a PF component is suitable for the CNN [32]. The basic principle and improvement of sample entropy are as follows. For a time series $x(n)$, the vectors in this sequence are $X_m(i) = \{x(i), x(i+1), \ldots, x(i+m-1)\}$. These vectors represent $m$ consecutive $x$ values starting from the $i$-th point. The PF components of length $m$ are arranged and calculated as a matrix in which each $X$ is a vector sequence of PF components. In the original SE, the distance $d[X_m(i), X_m(j)]$ between the vectors $X_m(i)$ and $X_m(j)$ is defined as the maximum absolute difference of their corresponding elements. More generally, the feature space is an $n$-dimensional vector space with $x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})^T$ and $x_j = (x_j^{(1)}, x_j^{(2)}, \ldots, x_j^{(n)})^T$, and the distance between $x_i$ and $x_j$ is defined as $L_p(x_i, x_j) = \left(\sum_{l=1}^{n} |x_i^{(l)} - x_j^{(l)}|^p\right)^{1/p}$. This is the Euclidean distance when $p$ is equal to 2: $L_2(x_i, x_j) = \left(\sum_{l=1}^{n} |x_i^{(l)} - x_j^{(l)}|^2\right)^{1/2}$. If $p$ is equal to infinity, the distance is the maximum coordinate difference, which is the original distance of SE: $L_\infty(x_i, x_j) = \max_l |x_i^{(l)} - x_j^{(l)}|$. Because the fault data are all displacement, velocity, and acceleration signals in the same coordinates, this method converts the one-dimensional distance characteristic into a high-dimensional distance. The algorithm makes the features of all directions equally important and reduces the differences among the feature points of the same component. The method can normalize the features of each direction and better identify the PF component that best represents the gearbox fault. Calculate the matrix distance between $X_m(i)$ and $X_m(j)$ for each $i$, and count the number of vectors $X_m(j)$ whose distance from $X_m(i)$ is less than or equal to $r$.

Increase the dimension of the matrix from $m$ to $m + 1$ and then count the entries of the matrix that do not exceed the threshold $r$, that is, the pairs for which the distance between $X_{m+1}(i)$ and $X_{m+1}(j)$ is less than or equal to $r$.
$B^{(m)}(r)$ is the probability that two sequences match for $m$ points under the similarity tolerance $r$, and $B^{(m+1)}(r)$ is the probability that two sequences match for $m + 1$ points. So, the sample entropy is defined as $SE(m, r) = -\ln\left[B^{(m+1)}(r)/B^{(m)}(r)\right]$. The value of sample entropy depends on the values of $m$ and $r$. For mechanical vibration signals, Euclidean distance [...]
Figure 4 shows the working mode of a convolutional layer of the 1-D CNN. Each row represents one element; the convolution kernel has the same width as the convolutional layer and moves with a step length of 1, and the convolution is carried out 8 times. The lower part of Figure 4 shows the convolution method of a two-dimensional CNN. The convolution kernel is a freely defined 2 × 2 kernel with three layers corresponding to the RGB channels, and the step size is 1. Here, the convolution proceeds back and forth along a given direction until the kernel has passed over the whole image.
To reduce the risk of overfitting, the convolutional layer usually has fewer parameters than the fully connected layer. Figure 5 shows a training simulation of the 1-D CNN model. The original data pass through three convolution layers, three pooling layers, and three fully connected layers to obtain the specific feature values.
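The sliding operation of a 1-D convolutional layer can be made concrete with a small pure-Python sketch. It is an illustration only: the function name `conv1d_valid` and the sample values are hypothetical, a single channel is assumed, and, as in most CNN libraries, the "convolution" is computed as a cross-correlation without flipping the kernel.

```python
def conv1d_valid(signal, kernel, stride=1):
    """Slide the kernel along the signal without padding.

    Each output value is the dot product of the kernel with the window
    of the signal that it currently covers.
    """
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]


# A length-10 signal and a length-3 kernel with stride 1 give 10 - 3 + 1 = 8
# kernel positions, so the convolution is carried out 8 times.
outputs = conv1d_valid(list(range(10)), [1, 0, -1])
print(len(outputs), outputs)
```

Because the kernel moves in one direction only, the number of kernel positions grows linearly with the signal length, whereas a 2-D kernel has to sweep both image axes.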

Weight Compensation SoftMax Algorithm.
If the original data are processed directly by vibration shifting, each group of data must be processed separately, and the offset point of each group cannot be determined. To make the method more general, SoftMax is improved. Adding an adaptive offset correction to the function cannot, in theory, completely eliminate the impact of the end effect, but the effect will no longer affect the recognition of the neural network. SoftMax is a numerical treatment of the output units for multi-class classification problems. The basic definition of the SoftMax function is $\mathrm{SoftMax}(V)_i = e^{V_i} / \sum_j e^{V_j}$, where $V$ is the output value. Because of the exponential operation, the computed values often overflow, so the elements are reduced by a fixed value before the operation. For the linear classifier output, the input $x$ is multiplied by the weight matrix, and the linear score corresponding to the correct category is denoted $S_{y_i}$, which is the input of the corresponding SoftMax output. Taking the logarithm does not affect the characteristics of the function. After these manipulations, the result reduces to the simple SoftMax loss function $L_i = -\log\left(e^{S_{y_i}} / \sum_j e^{S_j}\right)$, where $S_{y_i}$ is usually the output of the fully connected layer in the CNN. The weight of the LMD endpoint compensation, $(1/W_j^T)x_i + b_j$, is put into the equation, where $b_j$ is $\tau/W_j^T$. In the iterative process, the centralization weight is $w_j = 1/\tau$. Metric learning changes the distance calculation by converting $(1/W_{y_i}^T)x_i + b_j$ into $\tan(\theta_{y_i}, i) + b_{y_i}$, which reduces the intra-class distance and increases the inter-class distance. In this way, the centering weight compensates for the endpoint offset in the distance. The range that can be preset is [1, 5] when a model network is trained. The parameters are tuned until the CNN is optimal.
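For orientation, the unmodified SoftMax and its loss can be sketched as follows. This shows only the standard baseline, not the weight-compensated variant proposed here; the function names are hypothetical, and subtracting the maximum score before exponentiation is a common stabilization choice that is assumed rather than taken from the paper.

```python
import math


def softmax(v):
    """Standard SoftMax of a list of scores.

    ASSUMPTION: the maximum score is subtracted first so that math.exp
    never receives a large positive argument; the result is unchanged.
    """
    shift = max(v)
    exps = [math.exp(t - shift) for t in v]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(scores, correct_index):
    """Negative log-probability of the correct class (log-sum-exp form)."""
    shift = max(scores)
    log_sum = math.log(sum(math.exp(s - shift) for s in scores))
    return log_sum - (scores[correct_index] - shift)


probs = softmax([1.0, 2.0, 3.0])
print(probs, softmax_loss([1.0, 2.0, 3.0], 2))
```

The shifted form matters in practice: scores such as 1000 would overflow a direct call to `math.exp`, while the shifted version returns the correct probabilities.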

Data Collection.
The experimental data are collected using the experimental rig shown in Figure 6. Figure 7 shows the sensor distribution and the internal structure of the gearbox. The test system is composed of a first-stage planetary gearbox and a second-stage parallel-shaft spur gearbox. To diagnose the gear faults of the parallel-shaft gearbox accurately, the planetary gearbox is fitted with intact gears and is not sampled. The experiment is performed on the large gear and the small gear of the parallel-shaft gearbox, which are on the input shaft. The driving equipment includes a servo motor, a magnetic powder brake and controller, 4 magnetic accelerometers, and displacement sensors.
The sensors are placed in the axial and radial directions of the gear housing on the input and output sides to pick up the radial and axial vibration signals of the bearing housing. The large and small gears of the first input shaft are replaced. The large gear used in the experiment is an S45C gear with a module of 2 and 75 teeth, and the pinion has 55 teeth; the gears are lubricated by immersion.
Fault descriptions:
2  Main gear teeth breakage about 50%
3  Main gear with pitting on tooth: depth 0.05 mm, width 0.5 mm, and length 0.05 mm
4  Pinion with chafing on tooth about 50%
5  Main gear teeth breakage about 50% and pinion with chafing on tooth about 50%
6  Main gear with pitting on tooth (depth 0.05 mm, width 0.5 mm, and length 0.05 mm) and pinion with chafing on tooth about 50%

There are six different kinds of data, covering gear broken teeth, gear pitting, pinion wear, gear broken teeth with pinion wear, large gear pitting with small gear wear, and normal conditions. The sampling frequency is 5120 Hz. Compound fault signals are collected by the system. Because damage to one gear is likely to cause damage to other parts, complex multigear faults are added to the fault classification. Each dataset includes 6656 vibration signals collected by 9 sensors, at 2 speeds and 3 loads under each working condition, and each state interval is sampled 8 times. There are 2,160 sets of data, of which 1296 groups were used as training samples and the other 864 groups as test samples. The gears' fault patterns are given in Figure 8. Table 1 gives the specific experimental sampling situation. Table 2 shows the 6 specific fault descriptions of the large and small gears of the input shaft of the gearbox and the load and speed during sampling. There are 21,600 groups of data, which were collected by 8 sensors. 60% of the data were used for CNN training and 40% of the samples were used for testing.

A comparison of these time-domain signals shows that the vibration interval changes constantly. The waveforms of the last two compound faults are very similar, both having small amplitude and large random vibrations. Their average amplitudes are also very similar, but the entropy of large gear pitting with pinion wear is larger than that of large gear broken tooth with pinion wear. So, it is impossible to distinguish different compound faults by time-domain and frequency-domain decomposition alone.
Time-domain signal decomposition is performed for different loads and speeds of the same fault. Four waveforms in Figure 10 are the time-domain waveforms of large gear pitting with pinion wear under loads of 0.05 A, 0.1 A, and 0.2 A at 880 rpm. A comparison of the time-domain waveforms under different loads shows that, under a compound fault, increasing the load raises the frequency of amplitude jumps and produces larger amplitude vibration. However, the average amplitude tends to decrease. This indicates that the load has an amplifying effect on the fault amplitude of the gear, but it also generates more classes for CNN training.

Comparison of LMD and EMD.
There are many similarities between LMD and EMD as signal decomposition methods. Both of them have end effects [34,35]. To compare the advantages and disadvantages of the two, the fault signal is decomposed by EMD and LMD in turn. Figures 11-13 show the waveform and envelope diagrams generated after decomposition.
Both methods use small-frequency-ratio modal aliasing for amplitude correction. The vibration signals are decomposed at the same speed of 1447 rpm without load. LMD decomposes the signals into PF0, PF1, PF2, PF3, and the remaining components. High-frequency signals better represent the characteristics of the signal. After decomposition, it is easy to see that PF0 and PF1 differ greatly in value; PF2, PF3, and the remaining components also show different patterns of change. EMD separates the trends of different feature scales in the signal layer by layer and generates a series of signal components, IMF1 to IMF6, with different feature scales. Comparing the IMFs of the two compound faults shows that the endpoint effect of EMD is very serious [36]. Due to the end effect, the decomposition of the same signal can be completely useless. Because false end components gradually interfere with the entire signal sequence from the endpoint inward, EMD is not suitable for dealing with fault signals, and fault diagnosis with EMD has great disadvantages.

Selecting the Most Useful PFs by MESE.
The original signal standard deviation is set to 0.7. The entropy template vector length is set to 5 × 5, and m = 2 for sample entropy extraction. Figure 14 shows the extraction results for each state.
The sample entropy curve of PF0, which is supposed to fall steadily, shows jumps and reverse growth. PF2 and PF3, by comparison, are very similar signals without obvious pulsation. Therefore, sample entropy with a standard deviation of 0.3 is also computed for further comparison, and the original sample entropy of the same data is computed as a reference. Table 3 lists the SE and MESE values of PF2 and PF3 with standard deviations of 0.3 and 0.7.
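The difference between the original SE and a Euclidean-distance variant can be explored with the following sketch. It is a plain-loop illustration under stated assumptions and not the authors' matrix implementation: the function name, the `metric` switch, and the test series are hypothetical, the tolerance `r` is passed as an absolute value rather than as a multiple of the standard deviation, and the same number of templates is used for both dimensions.

```python
import math


def sample_entropy(x, m=2, r=0.2, metric="chebyshev"):
    """Sample entropy -ln(A / B) with a selectable template distance.

    metric="chebyshev" uses the maximum element difference (original SE);
    metric="euclidean" uses the L2 distance between templates.
    """
    n = len(x)

    def count_matches(dim):
        # n - m templates for both dimensions so the counts are comparable.
        templates = [x[i:i + dim] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # skip self-matches
                diffs = [abs(u - v)
                         for u, v in zip(templates[i], templates[j])]
                if metric == "euclidean":
                    d = math.sqrt(sum(t * t for t in diffs))
                else:
                    d = max(diffs)
                if d <= r:
                    count += 1
        return count

    b = count_matches(m)       # matches of length m
    a = count_matches(m + 1)   # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")    # entropy is undefined without matches
    return -math.log(a / b)


series = [math.sin(0.5 * i) + 0.3 * math.sin(1.7 * i) for i in range(60)]
print(sample_entropy(series, 2, 0.3, "chebyshev"))
print(sample_entropy(series, 2, 0.3, "euclidean"))
```

Since the Euclidean distance between two templates is never smaller than their maximum element difference, the same tolerance admits fewer matches under the Euclidean metric, which is one reason the two measures can rank components differently.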
LMD separates signals successively from high frequency to low frequency, so PF2 should have higher sample entropy than PF3. The entropy of PF3 is higher than that of PF2, but it is lower than that of a second-order sample. This is caused by the endpoint effect of LMD in the vicinity of PF2, so the entire PF3 data may be false. After analysis, the PF2 of this set of data is more suitable for CNN training than PF3. The comparison shows that the PF values obtained by MESE differ more than the values obtained by the original SE. In other words, MESE increases the difference between the entropy values more than the original SE does. The larger component of [...]
In the training process, the Adam algorithm is used to optimize the model, and a generator, which greatly reduces the time needed to read the data, is used to load the data. The model is built using Keras in a Python environment. First, a filter with a height of 10 is defined. Because a single filter would allow the CNN to learn only a single feature, 100 filters are defined in the first layer. The output of the first layer is a 71 × 100 matrix, and each column of the output matrix contains the weights of one filter. A maximum pooling layer of size 4 is selected as the pooling layer, which makes the pooled matrix only 1/4 of the original matrix [37] and reduces the complexity of the next layer and the probability of overfitting. Repeated convolution and pooling bring out the state and fault characteristics of the multisensor vibration data as fully as possible.
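The layer sizes quoted above follow from simple arithmetic, sketched below. The helper names are hypothetical, and the input length of 80 samples is an assumption chosen because it is the length for which a filter of height 10 with stride 1 and no padding yields the stated 71 outputs; the paper does not state the input length in this passage.

```python
def conv1d_output_length(input_length, kernel_size, stride=1):
    """Number of kernel positions for an unpadded 1-D convolution."""
    return (input_length - kernel_size) // stride + 1


def pool_output_length(input_length, pool_size):
    """Output length of non-overlapping pooling (incomplete windows dropped)."""
    return input_length // pool_size


input_length = 80  # ASSUMPTION: not given in the text
after_conv = conv1d_output_length(input_length, kernel_size=10)
after_pool = pool_output_length(after_conv, pool_size=4)
print(after_conv, after_pool)
```

With 100 filters, the first layer therefore produces one column of 71 values per filter, and pooling with size 4 leaves roughly a quarter of the rows.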
An average pooling layer is added after the repeated convolution and pooling to further reduce the probability of overfitting during training. The average pooling layer takes the average value of the two weights, and the output size is 512 × 1. After this layer, each feature detector has only one weight.
Due to the diversity of sensor sampling and the change of load, the ReLU function is used for activation in the first three layers, with loss rates ranging from 0.1 to 0.6 during training, and the last activation function is changed to sigmoid. This improves the generalization ability of the network. During training, dropout can be used to reduce the loss rate. The dropout layer randomly sets neurons to 0. Because of the many convolutions and the multiple sensors of different types, the rate is set to 0.8, which means that 80% of the neurons are given zero weight. The network then does not respond sensitively to small changes in the data. This layer does not change the size of the output matrix. A value of 1.0 was initially assigned to the $\tau$ of the improved SoftMax function, which forms the classification layer. After a series of training runs, $\tau = 2.6$ was found to be the most appropriate. To control the number of training epochs and improve the accuracy, an early stopping strategy is adopted, which halts training in time based on the validation error. Table 4 shows the details of CNN training. Five kinds of data are input into the one-dimensional convolutional neural network for training and testing, and the parameters are adjusted repeatedly, with dropout adjusted to 1.0 and the learning rate to 0.05, and SoftMax is selected as the activation function. The arithmetic mean of each per-category index is used as the evaluation standard, where $P_i$ is the precision of the $i$-th category and $R_i$ is the recall of the $i$-th category. The flow chart of the final program is shown in Figure 15.
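Averaging the per-category precision $P_i$ and recall $R_i$ can be sketched from a confusion matrix as follows. This is an illustrative reading of "arithmetic mean of each category index" as a macro average, since the paper's own equation was lost in extraction; the function name and the example matrix are hypothetical, and rows are assumed to hold the true classes and columns the predicted classes.

```python
def macro_precision_recall(confusion):
    """Macro-averaged precision and recall from a square confusion matrix.

    ASSUMPTION: confusion[t][p] counts samples of true class t that were
    predicted as class p.
    """
    n = len(confusion)
    precisions, recalls = [], []
    for i in range(n):
        true_positive = confusion[i][i]
        predicted_i = sum(confusion[row][i] for row in range(n))  # column sum
        actual_i = sum(confusion[i])                               # row sum
        precisions.append(true_positive / predicted_i if predicted_i else 0.0)
        recalls.append(true_positive / actual_i if actual_i else 0.0)
    return sum(precisions) / n, sum(recalls) / n


matrix = [[8, 2],
          [1, 9]]
precision, recall = macro_precision_recall(matrix)
print(precision, recall)
```

A macro average weights every fault category equally, so a rarely occurring compound fault influences the score as much as the normal condition does.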

Comparison of Fault Recognition with LMD-MESE-CNN and Other Methods.
The raw data are input into the CNN for fault diagnosis; the result is shown in Figure 16(a). The accuracy is only 0.71. A preliminary conclusion can be drawn: a CNN trained directly on the raw data has a low recognition rate, and the network cannot be used for fault diagnosis. All PF components are then input directly into the CNN for fault diagnosis; the result is shown in Figure 16(b). After LMD decomposition, the fault data are clearly distributed across the PF components. If all PF components are input into the CNN for training, the accuracy of the CNN drops to only 10%. Therefore, neither data input method is suitable for fault diagnosis. The accuracy and loss curves are shown on the left of Figures 17-19. The corresponding confusion matrices [38,39], whose schematic is given in Figure 20, are shown on the right of those figures. The accuracy of the EMD-CNN training model keeps increasing, but there are some random declines and a satisfactory accuracy cannot be reached. When the selected PF signals are input into the CNN for training, the network obtains high accuracy and a low loss value. However, the feature curve does not increase linearly as the amount of training data increases. The confusion matrix also shows that most conditions can be separated, but a small number of confused features cannot. Because some false data are generated by the LMD endpoint effect, errors are produced in the training model. The location of these errors is difficult to estimate, so it is essential to find a way to reduce the loss caused by the endpoint effect [40]. The last result shows that when the PFs selected by MESE are input into the CNN with weight-SoftMax, the jumps in the curve disappear. The results show that the one-dimensional CNN with selected PFs can effectively reduce the impact of the endpoint effect. This network can accurately identify combined failures. Table 5 shows the accuracy and loss of all these methods.

Comparing the Proposed Method with Other Methods.
Compared with the traditional 1-D CNN, the signal analysis method based on LMD and MESE proposed in this paper not only reduces the network parameters and speeds up training but also obtains good results for the diagnosis of mixed faults. The network structure of the proposed method is compared with other published methods in terms of accuracy and running time. As shown in Table 6, the test accuracy of the DCNN method based on LRP is 99.90%, which is very close to that of the proposed method. It can be seen from Table 6 that deep learning methods achieve a higher accuracy for the compound fault diagnosis of the gearbox. The method proposed in this paper is slightly better than the other deep learning methods on the testing set. The number of training parameters and the training time of the four best methods are shown in Table 6. It can be seen from Figure 21 that the training parameters of the 1-D CNN fault diagnosis method based on LMD and MESE proposed in this paper are about 10% fewer than those of the traditional 1-D CNN. The training parameters of this method are also far fewer than those of the LRP-DCNN and GRU-CNN methods. Although LRP-DCNN has a high accuracy, its training parameters and training time are much higher than those of the proposed method. The training time of the traditional CNN method is shorter, but its accuracy is much lower than that of the proposed method. The method proposed in this paper can reduce the number of training parameters and the training time while maintaining excellent diagnostic performance.
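As a rough illustration of how the number of trainable parameters of a 1-D CNN can be counted and compared, the following sketch uses the standard formulas for convolutional and fully connected layers with bias. All layer sizes are hypothetical and are not the architecture of Table 4; the sketch only shows that the number of input channels fed to the first layer changes the total count.

```python
def conv1d_params(kernel_size, in_channels, out_channels, bias=True):
    """Trainable parameters of one 1-D convolution layer."""
    return (kernel_size * in_channels + (1 if bias else 0)) * out_channels

def dense_params(in_features, out_features, bias=True):
    """Trainable parameters of one fully connected layer."""
    return (in_features + (1 if bias else 0)) * out_features

def count_params(input_channels, conv_layers, flat_features, n_classes):
    """Sum parameters over a stack of Conv1D layers plus one dense classifier.

    conv_layers is a list of (kernel_size, out_channels) pairs.
    flat_features is the flattened feature size entering the classifier.
    """
    total, c_in = 0, input_channels
    for kernel_size, c_out in conv_layers:
        total += conv1d_params(kernel_size, c_in, c_out)
        c_in = c_out
    return total + dense_params(flat_features, n_classes)

# Hypothetical network: two conv layers and a five-class output.
layers = [(3, 16), (3, 32)]
params_one_channel = count_params(1, layers, flat_features=64, n_classes=5)
params_four_channels = count_params(4, layers, flat_features=64, n_classes=5)
```

In this toy setting, feeding one selected component instead of four channels removes only first-layer weights; in a real network the flattened feature size usually changes as well, so the actual saving depends on the full architecture.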
In the future, transfer learning will be applied to reduce the time of data tabulation and classification and to achieve better universality and higher efficiency.

Figure 19: The optimized CNN curves of accuracy rate and loss rate trained by the PFs of LMD, and the confusion matrix.

Conclusions
Because vibration signals extracted from a working gearbox contain complex noise, nonlinearity, and nonstationarity, a one-dimensional convolutional neural network (1-D CNN) algorithm with a weighted activation function is designed, taking advantage of the depth of the CNN to extract signal features. Then, an improved matrix sample entropy based on Euclidean distance (MESE) is proposed to improve the selection accuracy of components, speed up classification, and avoid the identification errors caused by excessively large differences between PF components. The LMD signal decomposition method is used to decompose the fault signals. The end effect of LMD is solved, and the appropriate PF component is obtained. Gearbox fault data with 24 different loads were used to verify the effectiveness of the proposed method. From the experimental results, the recognition speed of the PF component by MESE is several times faster than that of SE, and PF ... This study provides a good choice for gearbox fault diagnosis under nonideal conditions. The fault diagnosis based on vibration signals applies not only to single faults but also to more complex and uncertain gearbox faults.
As the number of CNN layers increases, more data are needed to recognize multiple types of faults, so the method takes more time to train new models for new faults. In future work, it is important to give the trained one-dimensional CNN model better versatility and transferability.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request. Data were collected from the parallel-shaft gearbox in the Fault Diagnosis Laboratory of Inner Mongolia University of Science and Technology. The data are precious, so the supervisor prefers that the data and code not be uploaded to a public database. If necessary, readers can contact the corresponding author via email: 2397595377@qq.com.

Conflicts of Interest
The authors declare that they have no conflicts of interest.