Mass Laplacian Discriminant Analysis and Its Application in Gear Fault Diagnosis

Fault diagnosis is essentially the identification of multiple fault modes. How to extract sensitive features and improve diagnostic accuracy is the key to fault diagnosis. In this paper, a new manifold learning method (Mass Laplacian Discriminant Analysis, MLDA) is proposed. Firstly, it is assumed that each data point in the space is a point with mass, and the mass is defined as the number of data points in a certain area. ,en, the idea of universal gravitation is introduced to calculate the virtual universal gravitation between data points. Based on the Laplace eigenmaps algorithm, the gravitational Laplacian matrix between the same kind of data and the heterogeneous data is obtained, and the discriminant function is constructed by the ratio of the virtual gravitation between the heterogeneous data and the virtual gravitation between the similar data; the projection function with the largest discriminant function value is the optimal feature mapping function. Finally, based on the mapping function, the eigenvalues of the training data and the test data are calculated, and the softmax algorithm is used to classify the test data. Experiments on gear fault diagnosis show that this method has higher diagnostic accuracy than other manifold learning algorithms.


Introduction
e integration of mechanical equipment such as wind turbines, aeroengines, and rail trains is getting higher and higher, the service time is getting longer and longer, and the system scale is getting bigger and bigger. When the state is monitored and diagnosed, the signals detected by the sensors have the characteristics of large amount of data, high nonlinearity, and strong noise interference. How to excavate the essential characteristics and internal laws of signals from massive and complex raw data has become the key link of mechanical fault diagnosis [1,2]. For the state monitoring and fault diagnosis of mechanical equipment and its key components, domestic and foreign scholars have proposed many practical and novel algorithms, and good diagnostic results have been obtained [3][4][5]. As an effective tool for dimension reduction, data mining, and machine learning, manifold learning seeks to cover the complex data processing mechanism behind the identification behavior, which can achieve the dimension reduction and feature extraction of massive fault data. erefore, it can be used as a technical means of mechanical fault diagnosis, and the fault category is better identified on the basis of maintaining the essential characteristics of fault data [6,7]. e feature extraction and fault diagnosis of mechanical equipment based on manifold learning have been a research hot spot in recent years. Many scholars have proposed a variety of new learning methods in combination with the actual mechanical fault diagnosis problems, such as multiscale manifold learning [8,9], sparse manifold learning [10,11], timefrequency manifold learning [12][13][14], fast manifold learning [15], mixed-domain manifold learning [16], and other improved manifold learning methods [17][18][19][20]. ese studies based on manifold learning have achieved certain effects, but at present it has not been found that the fault features contained in complex data are extracted to improve the accuracy of fault diagnosis from the perspective of data mass.
In this paper, a new manifold learning method is proposed: Mass Laplacian Discriminant Analysis (MLDA) algorithm. Firstly, the data mass and virtual gravitation study are carried out. en, based on the Laplace map algorithm, the virtual gravity Laplacian matrix between the same data and the heterogeneous data is calculated, the discriminant function is constructed and the optimal feature mapping function is obtained, and the sensitive feature of the data is extracted. Finally, the characteristic value of the test data is obtained based on the mapping function, and the type of fault is discriminated based on the softmax algorithm. Experiments on gear fault diagnosis show that this method has higher diagnostic accuracy than other manifold learning algorithms.

Mass of Data and the Construction of Virtual Gravity
Due to the complicated operating state of the mechanical equipment, the harsh working environment, and long running time, its state signal is high-dimensional, nonlinear, and nonstationary. e data analysis and state decision are extended from Euclidean space to high-dimensional manifold, and the essential characteristics of the data are found through the manifold learning. It has become the main means of signal feature extraction and pattern recognition. e manifold is a high-dimensional space. If it is regarded as a vast space of the universe; then the data distributed on the manifold is the stars in the sky. Based on this idea, assuming that the data on the manifold is a mass point, there is physical gravitation between the pairs of data points.
For u-dimensional data point x i (i � 1, 2, . . . , n) ∈ R n×u distributed on the manifold, the mass m i of the point x i is defined as the number q i of points in the neighborhood of x i . Assuming the neighborhood distance d i , the mass m i is Define the centroid C i of x i as the sum of the product of m i,p (p � 1, 2, . . . , q) of each point x i,p (p � 1, 2, . . . , q) in the neighborhood distance d i of that point and its coordinate vector, that is, It can be seen that the mass of x i is related to the uniformity of the distribution of the mass point on the manifold and the selection of the neighborhood distance. In terms of material theory, the greater the density of an object, the greater the relative mass. Similarly, the greater the density of the neighborhood of the data, the greater the mass of the data. Conversely, when the mass m i of the data point increases with the increase of the neighborhood distance d i , if it changes linearly in the forward direction, it can be considered that the data distribution is uniform within the range of the neighborhood distance d i ; if the mass m i of the data point decreases with the increase of the neighborhood distance d i , the data point is considered to be at the boundary of the data concentration area. If the mass m i of the data point is 0 at a certain neighborhood distance, then the data point can be considered as outlier, and it can be discarded during analysis to reduce the impact of outliers on data operations and analysis and improve the robustness of the analysis algorithm.
When data is given mass and becomes a particle distributed in a high-dimensional space, the relationship between pairs of data points can be expressed by the gravitational attraction in Newtonian mechanics. Because this gravitation is based on the assumption that the data has mass, it is defined here as virtual gravitational F i,j : where G is the virtual gravitational coefficient and r i,j is the Euclidean distance between x i and x j . Substituting (1) into (3), you can get In the data analysis, in order to make the calculated data mass comparable, the same neighborhood distance is generally used for the same spatial data; that is to say, d � d i � d j , and (4) becomes It can be seen from (5) that the virtual gravitational force F i,j is inversely proportional to the distance r i,j between the centroid point pairs and the neighborhood distance d for determining the data mass and within the range d from the neighborhood of the mass point x i . e amount of data q i is proportional to the amount of data q j in the range of the distance d of the particle x j . When the neighborhood distance d is selected, q i and q j are also determined; then, the virtual gravitational force F i,j is inversely proportional to the distance r i,j between the point pairs.

Mass Laplacian Discriminant Analysis (MLDA)
Laplace eigenmap algorithm is a classical nonlinear manifold learning algorithm. It assumes that the data points are distributed on the compact Riemannian manifold. e main idea is to describe a manifold with an undirected weight graph and construct the corresponding space embedded objective function through the Beltrami operator to obtain its low-dimensional embedding representation [21]. In short, the graph is mapped from a high-dimensional space to a low-dimensional space while maintaining the local adjacency of the graph. Suppose the Laplacian Beltrami operator is as follows: 2 Shock and Vibration en, the minimization problem can be transformed into an eigenfunction problem that is combined with the Lagrange auxiliary function. According to Graph theory, if the data is uniformly sampled from the low-dimensional manifold in the high-dimensional space, the Laplacian Beltrami operator on the manifold is defined as the negative divergence function of the gradient vector on the manifold-cut space, and the discrete approximation [22] can be performed by the previous several feature vectors of the Laplacian matrix on the graph.
Laplace eigenmap algorithm is mainly to keep the adjacent data points in the high-dimensional space still adjacent when embedded in the low-dimensional space.
e core is to construct the neighboring weight matrix W i,j , that is, the neighboring weight matrix W i,j to describe the relationship between pairs of data points. Here, if the virtual gravitational force F i,j is normalized, if the two particles are close to each other, or if the particle x i and the particle x j are adjacent, then r i,j gets smaller and F i,j gets bigger until it equals 1. If the two particles are further apart or if the particle x i and the particle x j are not adjacent, then r i,j gets bigger and F i,j gets smaller until it equals 0.
at is, the virtual gravitational force F i,j is inversely proportional to the distance r i,j between point pairs, which is consistent with the value of W i,j using thermonuclear method. erefore, the virtual gravity matrix F between point pairs can be used to replace the adjacency weight matrix W.
However, since the center of mass is also related to the selection of neighborhood distance d, that is to say, the virtual gravity moment F can also adjust the neighborhood distance d to realize the adjustment of the adjacency weight matrix, the virtual gravity matrix can describe the mutual relationship between data point pairs more accurately. For the normalized virtual gravitation matrix, its degree matrix D is en, we can obtain the Laplace matrix L of mass transformation, namely, L � D − F: In mechanical fault diagnosis, each data set contains information on multiple fault categories, and multimanifold theory thinks that each category corresponds to one manifold, there are many manifolds in the data set.
e purpose of multi-manifold learning is to search for a suitable mapping function during the process of mapping high-dimensional data to a low-dimensional space, so that the low-dimensional data set can not only faithfully retain the internal structure of the data but also effectively distinguish the types of data. Using Laplacian Eigenmap theoretical framework, suppose that L b and L w ares used to represent the manifold structure of heterogeneous and homogeneous data and are defined as a qualitative Laplacian matrix of heterogeneous and homogeneous data, which can be expressed by where C b and L b are the center of mass and Laplace matrix of the heterogeneous data and C w and L w are the center of mass and Laplace matrix of the homogeneous data. en, in order to achieve the classification of data in the process of dimensionality reduction, it is necessary to obtain the low-dimensional embedded data of sample data after projection.
e ratio of the mass Laplace structure L b of heterogeneous data and the mass Laplace structure L w of similar data is the minimum. e discriminant function E(C) is expressed in So the purpose of mass Laplacian discriminant algorithm is to find the optimal projection vector and minimize the discriminant function E(C). e problem of solving the optimal projection vector is equivalent to the problem of solving the eigenvector corresponding to the minimum eigenvalue of the generalized eigenequation in Assuming α 0 , α 1 , . . . , α v−1 are the eigenvectors corresponding to the first v minimum eigenvalues λ 0 ≤ λ 1 ≤ · · · ≤ λ v−1 of (11), then the dimensionality reduction mapping obtained by MLDA algorithm can be expressed as e mapping matrix A � [α 0 , α 1 , . . . , α v−1 ] ∈ R u×v is the linear transformation matrix of the MLDA algorithm to achieve the optimal classification while maintaining the local structure of the data.

Softmax Classifier
Softmax regression is the promotion of logistic regression model in multiclassification problem, which is mainly used to solve the problem of multiclassification. After mass Laplacian discriminant for sampling data, data sets (y (1) , z (1) ), (y (2) , z (2) ), . . . , (y (n) , z (n) ) are obtained, where y is the fault eigenvector and z is the category label z ∈ 1, 2, . . . , k { }, where k is the number of categories.
For a given test input y (i) ∈ R n×v , softmax function estimates the probability p(z � j | y) for each category, that is, estimating the probability of each category of y. Suppose the probability function h θ (y (i) ) is as follows:

Shock and Vibration
In (13), h θ (y (i) ) is the vector, any element p(z (i) � k | y (i) ; θ) in the vector is the probability that sample y (i) belongs to category k, and the sum of each element in the vector is equal to 1. θ 1 , θ 2 , . . . , θ k are the classifier parameters corresponding to each category, en, the cost function of softmax regression algorithm can be defined as where 1 is an indicative function, 1{true} � 1, and 1{false} � 0. e value of θ can be obtained by minimizing the cost function J(θ) of the softmax regression, thereby predicting the category of a new sample. Generally gradient descent algorithm is used to solve the minimization of J(θ). e gradient formula of J(θ) is as follows: where the value of ∇ θ j J(θ) is a vector, which means J(θ) is a partial derivative of θ j for the j category.

MLDA Fault Diagnosis Model
Based on the above analysis, aiming at the problems in mechanical equipment fault diagnosis, the mass Laplacian discriminant algorithm was used to extract the fault features. en softmax classifier was built to identify and classify the faults. us, the mass Laplacian fault diagnosis algorithm was established. And the algorithm model is shown in Figure 1. e specific steps are as follows.
Step 1. After extracting the monitoring data of the running state of mechanical equipment and preprocessing, the training sample x i (i � 1, 2, . . . , n) ∈ R n×u and the category label vector z ∈ 1, 2, . . . , k { } and k are obtained. Here, n is the number of data points, u is the dimension of data, and k is the number of categories of running state.
e neighborhood distance d i of x i is selected. Suppose d ij is the average of the neighborhood distance of the sample point; then, where ε is the distance adjustment coefficient. When the data is evenly distributed, ε � 1, this is d i � d ij ; if the data distribution is uneven, then ε takes value around 1, and the data quality error caused by this unevenness can be adjusted.
Step 3. According to (1), the mass m i of each point x i is calculated, and the Euclidean distance r i,j between the sample point pairs is calculated. Equation (3) is used to calculate the virtual universal gravitation between x i and x i at each point. In this case, the gravitational coefficient G � 1.
Step 4. According to the category label of the training sample, classify heterogeneous data sets and homogeneous data sets, according to (2) and (9). e centroid C b and C w of each point in the heterogeneous data set and homogeneous data set and mass Laplace matrices L b and L w were calculated, respectively.

Data collection and preprocessing
Training samples Step 5. According to (12), the discriminant function E(C) is constructed and transformed into the minimization problem of the characteristic equation ∈ R u×v is constructed by taking the eigenvectors corresponding to the first v minimum eigenvalues Step 6. Low-dimensional embedded y i of training sample x i was calculated according to (12), and softmax classifier H(y) was constructed by combining category label vector z ∈ 1, 2, . . . , k { }. Calculate probability function h θ (y (i) ) and softmax regression cost function J(θ) according to formulas (15) and (16), and obtain classification accuracy R(y) by minimizing the predicted sample category of J(θ).
Step 7. Choose the mapping matrix A with the highest recognition rate R h as the optimal mapping matrix Step 8.
e test data s i (i � 1, 2, . . . , l) ∈ R l×u of the running state was obtained by the preprocessed mechanical equipment. Its low-dimensional embedded y s i ∈ R l×v was obtained through y s i � B T s i . Softmax classifier was used to classify y s i , and the failure type of test data was obtained to realize the classification and recognition of mechanical fault.

Simulation and Analysis
According to the vibration theory of the gear, the normal gear mainly represents the shaft speed frequency and frequency multiplication and meshing frequency and its frequency multiplication. Wear characteristic frequency mainly on the tooth surface shows the meshing frequency and its harmonic components, and the fractional harmonic occurs when the wear is intensified. e characteristic frequency of the broken teeth shows a sideband near the meshing frequency and its higher harmonics, and the sideband is the rotational frequency and its harmonic frequency. Based on these, the vibration simulation signals x 1 , x 2 , x 3 of normal gear, worn gear, and broken gear can be constructed as shown in equations (17)- (19). Here, shaft speed is f 0 � 10 Hz, the number of gear teeth is z � 23, the meshing frequency is f 1 � 460 Hz, the sampling frequency is F S � 10 kHz, a, b, c is constant coefficient, and s is interference and noise: x 1 �a 11 cos 2πf 0 t + a 12 cos 4πf 0 t + b 11 cos 2πf 1 t + b 12 cos 4πf 0 t + · · · + b 14 cos 8πf 0 t + c 1 s, x 2 � a 21 cos 2πf 0 t + a 22 cos 4πf 0 t + b 21 cos 2πf 1 t + b 22 cos 4πf 0 t + · · · + b 24 cos 8πf 0 t + b 25 cos πf 1 t x 3 � a 31 cos 2πf 0 t + a 32 cos 4πf 0 t e time domain and frequency domain diagrams of three kinds of fault gear simulation signals are constructed as shown in Figures 2-4.
During signal simulation, the different fault statuses are simulated by changing the magnitude of the amplitude without changing the characteristic frequency. Each group is 0.5 seconds (120 data points) and there are a total of 240 groups data (80 groups data for each fault). In order to study the impact of noise on the algorithm, adding different degrees of noise to each fault (none noise, signal-tonoise ratio (SNR) is 40, 20, 0, −20, −30, and −40), 180 groups (60 groups for each fault) were taken as training data, and the remaining 60 groups (20 groups for each fault) were used as test data to analyze the MLDA algorithm. When the distance coefficient ε is 0.7 to 1.32, the training data can correctly identify the three fault states, and the correct recognition rate can reach 100%. At this time, the mapping matrix is used to identify the fault state of the test data. Figure 5 and Table 1 show the recognition results of fault data under different noise levels and distance coefficients. It can be seen that as SNR decreases, the correct recognition rate also decreases, but from noise-free to signal-to-noise ratio (40, 20, 0, −20), the correct rate is stable more than 90%. When the signal-to-noise ratio is −30, the correct recognition rate is between 83.33% and 91.67%. When the SNR is −40, the recognition rate suddenly decreases to 50%. In the actual engineering condition, the SNR of the vibration signal collected by sensor is generally not less than 10 dB.
at is to say, as long as the appropriate distance coefficient is selected and the optimal mapping matrix is obtained through the training data, the sensitivity of the MLDA algorithm to noise is very low, and the robustness of the algorithm is very good. Figures 6 and 7 show two extreme noise conditions (no noise and SNR � −40). e three-dimensional manifold of the three fault states obtained through training data shows that three types of faults can be well identified. Figures 8-13 show the classification results of the test data obtained from the training data with six different noise levels (no noise, signal-to-noise ratio of 40, 20, −20, −30, and −40). When the distance coefficient is 0.96 and SNR is above −20, the fault state can be better recognized and the recognition rate is above 91.67%. When SNR is −30, the fault can be identified, but the recognition rate is reduced to 83.33%. When SNR is −40, the recognition rate is lower than 50%, and the fault state is completely unrecognizable.
Shock and Vibration 5

Experiment Platform and Conditions.
e gear fault diagnosis experiment was carried out in the fault simulation bench of Spectra Quest, as shown in Figure 14. Figure 15 shows internal structure of the gearbox. e experiment bench is powered by the motor, and power is transmitted to the input shaft of the gearbox through the coupling, the drive shaft, and the pulley. e gear on the input shaft drives the intermediate shaft to mesh the gear, and the other gear on the intermediate shaft transmits the power through the output shaft, at the same time connecting the brake at the end of the gearbox output shaft. e experiment uses the vibration acceleration sensor of B&K Company to collect the signal of the multipoint on the gearbox. Figure 16 shows sensor distribution: six sensors are arranged in the horizontal direction and two sensors are arranged in vertical directions. e sensor vibration signal was the most obvious in vertical direction at the input shaft end, so these three vibration signals are taken as the analysis signals. In the experiment, the current of the electromagnetic brake is set to 55 mA, the braking torque is 1790 N·m, the sampling frequency of the acquisition system is 16 kHz, and the transmission ratio of the pulley is 3.56.
In the experiment, four degrees of pitting fault (upper part of Figure 17) and five degrees of broken fault gears (lower part of Figure 17) were made on the pair of meshing gears. e sensor signal was the most obvious to vertical direction in the input shaft, so the three vibration acceleration signals here are taken as the analysis signals. e experiment simulates four kinds of working conditions under 30 Hz motor speed (normal gear, pinion pitting fault, large gear, and broken tooth fault). Collect 240 sets of data for each condition, 120 sets of which were used as training samples and the others were used as test samples. e data dimension is the data collected by the acceleration sensor within 0.25 seconds; it is 16 K × 0.25 � 4 K. e specific working condition data combination is shown in Table 2.
For a pair of meshing gears at the input shaft end, the number of teeth of the small gear is 36, the number of teeth of the large gear is 101, and the meshing frequency is 303.48 Hz when the motor speed is 30 Hz. Since the gear is driven by the motor through the pulley drive input shaft, there is a certain fluctuation in the transmission ratio, so the meshing frequency changes around 303 Hz. Figures 18-21 are the time domain and frequency domain diagram of the vibration signal in four fault states.
It can be seen that the characteristic frequencies of the normal gears are the meshing frequency and its multiplication, the characteristic frequencies of pitting gears are the meshing frequency and its division frequency, and the characteristic frequency of the broken tooth gear is the meshing frequency and its multiplication frequency, and there are the side frequency bands near these frequencies.
e characteristic frequency component of the composite fault gear is more complicated, including division frequency, multiplication frequency of the meshing frequency, and the side frequency bands near these frequencies.  vibration signals are identified and analyzed. Because the MLDA algorithm is relatively sensitive to noise, the digital filter is used to reduce noise before performing data analysis.

Analysis and
Compared with FIR filters, IIR filters have higher accuracy in amplitude-frequency characteristics. Compared with Butterworth filters, Chebyshev filters, and other IIR filters,  elliptic filters have the smallest passband and stopband fluctuations, and the filtering effect is more stable, so the elliptical IIR filter is used in this experiment. It can be seen from Figures 20 and 21 that the characteristic frequency of the gear vibration signal is around the meshing frequency of 303.48 Hz and 2 times the frequency, plus the shaft speed side frequency of 30 Hz and its loworder frequency multiplication (generally lower than 5 times the shaft speed frequency); the filter cut-off frequency to 800 Hz can contain the fault characteristic frequency. Here take 900 Hz as the passband cut-off frequency, the stopband start frequency is 1200 Hz, the passband ripple is 0.3, and the stopband minimum attenuation is 40. e parameters of the             Shock and Vibration softmax classifier are as follows: the weight attenuation coefficient is 0.05, the learning rate is 0.01, and the number of iterations is 5000. Figure 22 and Table 3 show the recognition rates of the four fault status signals before and after noise reduction when the distance coefficient is 0.77 to 1.15. It can be seen that the noise has a great influence on the effect of fault recognition. After the noise reduction, the recognition rate of the signal is increased by about 20%, and the highest recognition rate is over 90%, and the original sampling signal has a maximum recognition rate of only 70%. At the same time, the distance coefficient also has a great influence on the recognition. When the distance coefficient is 0.87 to 1.07, the recognition rate is steadily rising, and when the distance is over 1.07, the recognition rate starts to decrease        Shock and Vibration greatly. When the distance is lower than 0.87, the recognition rate fluctuates obviously. Figure 23 shows the recognition result of the original signals in the four fault states, and the distance coefficient is 1.03. e recognition rate is 76.04%. Figure 24 shows the recognition effect of the four fault status signals after noise reduction, and the distance coefficient is 0.97. e recognition rate is 91.94%. In the figure, red ◇ is the normal signal, purple O is the pinion pitting fault, blue × is the big gear broken tooth fault, and cyan * is the small gear pitting   and large gear broken tooth composite fault. It can be seen that the four fault status signals can be well identified after noise reduction, and only some faults can be identified before noise reduction. In this example, the normal signals are clearly identified, but the composite faults are mixed in two faults; thus, the recognition rate is greatly reduced. In order to compare the effect of the algorithm, the LLE (local linear embedding), LPP (locally preserved projection), and LE (Laplace eigenmaps) are used for the vibration signals after noise reduction in the normal fault, pitting gear, worn gear, and composite fault gear to identify faults in this experiment. After taking the appropriate parameter (number of neighborhoods k and distance coefficients), the highest recognition rate is shown in Table 4. Figures 25-27 show recognition result for test signal after noise reduction based on three common manifold learning algorithms (LLE, LPP, and LE). It is visible for the sensor vibration acceleration signal, even after noise reduction and selection of appropriate parameters. e common manifold learning does not completely distinguish the four states, and the accuracy of fault recognition is also low.

Conclusions
e manifold learning algorithm is good for mechanical fault diagnosis based on data eigenvalues, but if the original vibration signal is directly used for fault diagnosis, the correct recognition rate is greatly reduced. To overcome this shortcoming, MLDA algorithm is proposed. In the paper, related theories and models are proposed, algorithm steps are given, and algorithm verification is carried out through simulation and experiment [23]: (1) e paper has theoretical innovation, puts forward the concept that the data points have mass, and uses the idea of gravitation between the particles to construct the virtual gravitation Laplacian matrix of the data point set. is concept provides a feasible way for conducting data-driven fault diagnosis analysis (2) e paper gives the Laplacian discriminant projection analysis method to find the optimal mapping function that can be classified; that is, the projection function is the optimal mapping function when the virtual gravity between different data sets is the largest and the virtual gravity between the similar data is the smallest. Feature extraction of data and fault classification using softmax algorithm. From the gear failure simulation and experiment, it can be seen that the extracted faults can distinguish the fault types. (3) From the simulation analysis results, the distance coefficient and noise level have a certain influence on the classification effect of the algorithm. When the appropriate distance coefficient is taken and the signal-to-noise ratio is good, the effect of the algorithm on the classification effect is not very large, but when the noise increases to a certain extent, even if the appropriate distance coefficient is selected, the classification effect will drop sharply. erefore, it is necessary to further consider how to improve the antinoise performance of the algorithm in the future (4) From the experimental results, for the nonfeature value of the vibration fault signal, compared with the commonly used manifold learning algorithms such as LLE, LE, and LPP, the classification of the Mass Laplacian Discriminant Analysis algorithm is the best Data Availability e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare that they have no conflicts of interest regarding the publication of this paper.