Sparse Representation of Transients Based on Wavelet Basis and Majorization-Minimization Algorithm for Machinery Fault Diagnosis

Vibration signals captured from faulty mechanical components are often associated with transients which are significant for machinery fault diagnosis. However, the existence of strong background noise makes the detection of transients a basis pursuit denoising (BPD) problem, which is hard to be solved in explicit form. With sparse representation theory, this paper proposes a novel method for machinery fault diagnosis by combining the wavelet basis and majorization-minimization (MM) algorithm.This method converts transients hidden in the noisy signal into sparse coefficients; thus the transients can be detected sparsely. Simulated study concerning cyclic transient signals with different signal-to-noise ratio (SNR) shows that the effectiveness of this method.The comparison in the simulated study shows that the proposedmethod outperforms themethod based on split augmented Lagrangian shrinkage algorithm (SALSA) in convergence and detection effect. Application in defective gearbox fault diagnosis shows the fault feature of gearbox can be sparsely and effectively detected. A further comparison between this method and the method based on SALSA shows the superiority of the proposed method in machinery fault diagnosis.


Introduction
Detection of transients has always been an important task in image and signal processing, and it has also been increasingly significant in the field of machinery fault diagnosis in recent years [1][2][3].Most of the machinery is operated by means of gears and other rotating parts.Fault in these parts may lead to the failure of the whole machine and even loss of lives.Thus, proper analysis for transient detection from the machinery vibration signals has attracted sustained attention during the past decades [4][5][6][7][8].
Vibration signals derived from the defective gearboxes are generally observed as periodic transient impulses due to the rotating status [9].Researches have shown that the transients in the vibration signals generated from a machinery component usually correspond to the localized fault or mechanical defect, such as flaking, crack, breakage, and fracture [10].Therefore, it is very important to extract the useful information, such as the feature of transient impulse and the cycle of multiple transient impulses, by analyzing the transients in the vibration signal [11,12].However, the strong background noise in practical vibration signals will corrupt the fault-induced transient impulses.Hence, detection of transients hidden in the noisy vibration signals is of great importance for machinery fault diagnosis.
Different methods have been proposed to extract the fault features from the vibration signals, such as wavelet analysis [13], time-frequency analysis [14,15] and empirical mode decomposition (EMD) [16], and so forth.Meanwhile, another feature extraction method based on sparse representation is widely used in image processing, which has a close analogy with the effect of those fault feature extraction methods for 1-D signal.
Generally, transients to be detected can be described as a series of sparse coefficients multiplied by a certain wavelet basis as long as the signal being processed has 2 Mathematical Problems in Engineering a sparse representation with respect to the known wavelet basis [17].Hence these signal processing problems can be viewed as a series of restoration and reconstruction problems, which are classical linear inverse problems.However, most inverse problems are ill-posed [18,19]; thus, these inverse problems can only be solved satisfactorily by adopting some sort of regularization.Based on sparse representation theory, a standard formulation for coping with these problems called basis pursuit denoising (BPD), consisting of a data fidelity term and a penalty term, has been proposed in [20].One approach to derive suitable algorithms is based on the majorization-minimization (MM) method from the optimization theory.To apply the MM algorithm, either the data fidelity term (a quadratic function) or the penalty term (usually nonquadratic) can be majorized to handle this class of BPD problems.Typical approach for solving the former term is the so-called split augmented Lagrangian shrinkage algorithm (SALSA), which has been proposed recently to handle the image recovery problems [21]; while, for the latter term, an algorithm associated with an alternative quadratic function employing majorization-minimization also provides the solution to the corresponding inverse problems [22], called MM algorithm based on nonquadratic majorization.
SALSA, a recent algorithm to solve the BPD problem by using the Hessian of the data fidelity term, has better convergence properties than the earlier algorithms, such as iterative shrinkage/thresholding algorithm (ISTA) and fast ISTA (FISTA) [23,24].Based on variable splitting, this algorithm obtains an equivalent constrained optimization formulation, which is then addressed with an augmented Lagrangian method, and then, the convergence is guaranteed.The transient feature detection method based on SALSA has been increasingly popular at present.Sparsity-enabled signal decomposition method based on SALSA is used to decompose the vibration signal of faulty gearbox into high-oscillatory component and low-oscillatory component, which is employed by Cai et al. to extract the fault feature of gearbox [25].Application of the sparse algorithm SALSA is employed by Selesnick to extract the frequency component of a sinusoidal signal [26].However, some problems still remain.For instance, it is hard to select the optimal basis for signal representation to avoid the loss of useful components and the computation cost is large because two quadratic functions associated with nonconvex regularization functions should be minimized during each iteration of this algorithm.
Instead of utilization of variable splitting, the MM algorithm based on nonquadratic majorization utilizes a sequence of simpler convex optimization problems to replace the original ill-posed inverse problems yet is an effective and widely applicable method [27].The algorithm has wide applications in image deconvolution or restoration under certain regularization due to its flexibility in the design of the sequence of simpler optimization problems so does in the field of speech denoising [22,[28][29][30].However, there are still some remaining issues to be studied when applying the MM algorithm.One of the most important issues is how to select the optimal wavelet basis to satisfy the characteristics of the machinery vibration signals, which is very close to the important issue of SALSA, and this is the vital point to improve the detection performance.
Considering that the wavelet basis of nonquadratic majorization-based MM algorithm in recent image processing is not suitable for feature extraction of gearbox fault vibration signal, and the selection of appropriate wavelet basis is the premise of the detection effect, a new method must be proposed to accommodate to the actual signal.Mass data show that the waveform of Morlet wavelet is in shape similar to signal transients caused by gearbox localized defects [3,31]; thus, this paper proposes a new technique by synthesizing MM algorithm and Morlet wavelet basis.To implement the sparsity of the results, optimal wavelet basis is selected by applying correlation filtering, which has been widely applied in mechanical vibration signal processing [32].The key issue to minimize the objective function of the ill-posed inverse problem is solved by nonquadratic majorization-based MM algorithm; thus a meaningful sparse coefficient would be got.Finally, the transients can be detected from the sparse coefficients, which are the key features for machinery fault diagnosis.
The remainder of the paper is organized as follows.In Section 2, the basic theoretical background concerning the proposed method is introduced.Section 3 gives a simulated study and analysis to verify the proposed method.Section 4 applies the presented method to the fault diagnosis of a gearbox in an automobile transmission gearbox, with a comparison to the extraction results of SALSA method.Finally, conclusions are drawn in Section 5.

Theory and Method
In transient detection problems, the goal is to detect the transients buried in noise from an observed signal.The transients can be represented as a series of sparse coefficients as long as the signal being analyzed has a sparse representation with respect to a known transformation.In this section, MM algorithm, solving ill-posed inverse problems, is introduced to model the transient detection problem.The following part provides a brief description on main steps of the MM-based transient detection method.

Mathematical Model of Transient Detection Problem.
Considering that the signal sampled from the transducer always has much noise, the observed signal () can be modeled as where () is the sampled signal from the transducer, () is the true signal without noise, and () is the noise.The true signal () can be represented as a sparse linear combination of certain atoms.The representation of  can be expressed as  = , where  is the vector of representation coefficients and also represents the transients; and the set of columns of  is a wavelet basis or dictionary; to be convenient, we also call  as wavelet basis.With this sparse representation, the estimation model (1) becomes where  is an × matrix,  =  is a length- vector, and  is a length- vector, with  > .The more the similarity between the basis  and the signal  is, the sparser the vector  will be.Here assume that  * (where (⋅) * denotes complex conjugate transpose) is invertible; therefore, the system of (2) has infinitely many solutions.As in many recent publications [21,22,26], we adopt the  1 -norm regularizer to handle the ill-posed nature of the problem of inferring  and thus leads to the BPD problem: where ‖‖ 1 is the  1 -norm of -length vector , defined as , and  is the regularization parameter.
The function () cannot be easily minimized unless it is quadratic.To minimize (), an iterative algorithm must be used.According to the MM algorithm from the optimization theory, instead of minimizing (), the MM algorithm solves a sequence of simpler minimization problems: where  is the iteration counter,  = 1, 2, 3, . ... The MM algorithm asks that each function   () should be a majorizer (upper bound) of () and that it coincides with () at  =   .That is

Majorization Iterative Algorithm Based on MM Approach.
Traditional MM approaches to this problem consider three possible majorization strategies and thus lead to three different classes of algorithms as shown in [16].As mentioned above, we can solve this problem easier if the function is quadratic.
Note that the data fidelity term ‖ − ‖ 2 2 , presented in the function (), is already a quadratic function.Then we just need to focus our attention on the penalty term ‖‖ 1 .
We mark penalty term ‖‖ 1 as . () is an absolute value function and thus is nondifferentiable and nonstrictly convex, which makes (3) difficult to solve.According to the MM algorithm as shown in (5), we should find a quadratic function () to majorize () of the general form: where the parameters ,  and  are constants, the majorizer () should be the upper bound for () that coincides with () at a specified point   as shown in Figure 1.For this quadratic majorizer, conditions in ( 5) are equivalent to Solving for  and  gives  = (  (  )/2  ) − (/2  ),  = (  ) − (  /2)  (  ) − (/2)  , thus leading to the majorizer () in ( 6) given by Considering a special form of function (), we set the unknown parameter  = 0; then the parameters  and  become  = (  (  )/2  ),  = (  ) − (  /2)  (  ), thus leading to the majorizer () in (8) given by Considering the concrete form of (), the function   () can be written in a matrix format as where Λ  denotes the diagonal matrix with vector |  | along its diagonal, Then, the MM update (4) for   is The last term of ( 12) can be omitted because it does not depend on ; thus a new update equation also called cost function is got as Equation ( 13) is quadratic in  so the solution to this problem can be written explicitly using linear algebra as Although ( 14) is mathematically valid, there are still some remaining issues with this update.Due to the sparsity of , components of  will go to zero as the iteration progresses.Thus the first issue is that as the elements of  go to zero, elements of Λ −1  would go to infinity, which will make the update numerically inaccurate.The second important issue is the selection of the optimal wavelet basis to ensure the sparsity of  when applying MM algorithm to transient detection problem.
The first issue can be avoided, as described in [22], by using the matrix inverse lemma, so the update equation can be written as where  can be chosen according to the regularization in [30].
The second important issue, the selection of the optimal wavelet basis, can be avoided by employing correlation filtering, which will be described in the next section.

Selection of Optimal Basis by Correlation Filtering.
Morlet wavelet is one of the most popular nonorthogonal wavelets, defined in the time domain as a harmonic wave multiplied by a Gaussian time domain window: It is a cosine signal that decays exponentially on both the left and the right sides.This feature makes it very similar to an impulse.It has been used for impulse isolation and mechanical fault diagnosis [3,[32][33][34].
Considering that the waveform of Morlet wavelet is in shape similar to the signal transients caused by gearbox localized defects at a constant speed, we incorporate Morlet wavelet into the above algorithm.The Morlet wavelet is firstly chosen as the basis.
The parametric formulation of Morlet wavelet is where parameter vector  = (, , ) determines the wavelet properties, these parameters (, , ) are denoted frequency  ∈  + , damping ratio  ∈ [0, 1) ⊂  + , and time index  ∈ , respectively.The discrete parameters , , and  belong to the subsets of , , and   , respectively: With different parameters, dictionary of Morlet wavelet can be constructed as follows: and each item in the dictionary is called an atom.
With the constructed dictionary, correlation filtering is exploited to identify the optimal set of parameters (, , ), which is the most similar to the original signal [32,34].Correlation, which shows the similarity between the basis and the original signal, can be measured by an inner product operation.Then, a correlation function   is defined to quantify the correlation degree between the basis   () and the original signal (); where  is the angle between   () and (), and the smaller the angle, the nearer the   () and () in space and the more resemble the   () and ().Thus, the single similar wavelet function (, , , ) with optimal parameters (, , ) can be found by maximizing the correlation function   at each time value from the constructed Morlet wavelet dictionary.Peaks of   for a given time value  can be represented as which relates the wavelets with the strongest correlation to the signal, and the time index parameter  can be found by maximizing the coefficient   (); thus the most similar wavelet function is (, , , ).
The optimal wavelet basis Mor(, ) with optimal parameters ,  based on (, , , ) replacing the above basis  in (15) can be constructed; thus the second important issue of the MM algorithm has been solved.

Sparse Representation of Transients Based on Wavelet
Basis and MM Algorithm.Motivated by the merits of MM algorithm, a new sparse representation of transients method based on MM algorithm incorporated wavelet basis is proposed in this study.As the signal being analyzed can be represented sparsely on certain wavelet basis, the transients detection problem can be viewed as a basis pursuit denoisng problem; thus the MM algorithm can be employed to handle it.
Considering that the faulty gearbox vibration signal always performs as periodic double-side attenuation functions due to the rotating nature, which is in shape similar to the Morlet wavelet, this paper inserts Morlet wavelet basis Mor(, ) selected by correlation filtering into the MM algorithm and thus yields the iterative algorithm as follows: The updating of ( 22) can be implemented as follows: (1) set  = 0, choose , and initialize ; (2) set iterative number Nit;  ←  + 1  +1 =   − ( +1 − V +1 ) until stopping criterion is satisfied.
←  + 1 until stopping criterion is satisfied.The matrix Mor has two parameters ,  as defined above.Then the reconstructed signal x can be expressed as x = Mor⋅ ĉ, where ĉ is the final vector with sparse coefficients.
To illustrate the superiority of the computation cost of the proposed method, the procedures of this method and another sparse algorithm SALSA also based on Morlet wavelet basis are shown in Table 1.As shown in Table 1, wavelet-assisted SALSA algorithm has two convex functions to be minimized, and one of them includes a nonstrictly convex function, which is very hard to minimize.However, there is only one function to be minimized in the proposed method, and the function is quadratic, which makes the proposed method less computation cost.
In summary, the procedure of the proposed transient sparse representation method is illustrated in Figure 2.After the accomplishment of vibration signal measurement, the procedures of transient detection can be described as the following major parts.
(1) Given a signal () with  data points, find the most similar function (, , , ) by using correlation filtering based on correlation value maximization rule according to ( 20)-( 21); (2) construct optimal wavelet basis Mor(, ) of size  ×  by using the function (, , , ) above as the basis atom; (3) construct new transient sparse representation method based on MM algorithm through replacing  with the optimal basis Mor(, ) in (15); update the new method in (23), and a new data vector ĉ of size  × 1 representing the transients in signal is generated.Finally, the transients hidden in signal can be represented as a sparse vector ĉ, and the denoised signal x is finally reconstructed by x = Mor ⋅ ĉ.

Simulation and Evaluation of the Proposed Method
To verify the effectiveness of the proposed method, a simulated vibration signal () is performed for transient feature extraction.Considering the characteristics of vibration signals of faulty gearbox, the simulated signal is constructed as where frequency is  0 = 5 Hz,  0 = 0.01 is the damping ratio, time index is  0 = 1 s, and cyclic period is  0 = 2 s.Obviously, () is a real periodic cyclic impulse response signal simulating the gear fault signal, and the signal () is white noise.The sampling frequency is 200 Hz in time ranges [0, 10] s.Waveform of the simulated signal is shown in Figure 3, whose noise is weighted by   = 0.5.Figures 3(a) and 3(b) give the waveform of the simulated signal including noise or not, respectively.The proposed transient sparse representation method is applied to analyze the simulated signal and extract the transients from the signal.According to the procedure in Figure 2, the first step is to calculate the optimal wavelet function by correlation filtering based on correlation value maximization rule.With this principle, parameters  = 5 Hz,  = 0.01 are obtained for the optimal wavelet function.The parameters are exactly equal to the simulated values.
The second step is to cope with the objective function as described in (3) (with  = 8.457).The optimal wavelet basis Mor(, ) with parameters ,  in the first step should be founded so as to realize the feature detection sparsely.With the optimal basis Mor(, ), the transients in the noisy signal are converted into sparse vector ĉMM (represents the sparse vector by using MM method) as illustrated in Figure 4(a) through the iterative algorithm shown in (23).The reconstructed signal is shown in Figure 4(c).For better comparison of the sparse vector and the transients in the reconstructed signal, the first  elements of vector ĉ are taken as shown in Figure 4.The following analysis will refer to this one, and not explain in the next section.
In Figure 4(a), cyclic period  = 2 s is identified, which is equal to the simulated value ( 0 = 2 s).The impulse times are 1 s, 3 s, 5 s, 7 s, and 9 s, respectively, which can also be identified from Figure 4(a).Compared to the original signal shown in Figure 3(a), it can be found that the reconstructed signal shows an excellent effect on transient detection as the transients are captured clearly, while the noise is well discarded.To confirm the superiority of this result, the same simulated signal is analyzed by SALSA method for comparison.Optimal Morlet wavelet basis is firstly constructed in the same way described in Section 2.3, and then SALSA algorithm is applied to deal with this detection problem.The detection results of SALSA method are illustrated in Figure 5. Figures 5(a) and 5(c) give the sparse vector ĉSALSA (represents the sparse vector by using SALSA) and the reconstructed signal, respectively.The period of the impulses can also be identified as marked in Figure 5(a) by SALSA method.It seems that SALSA method has a good performance for transient detection.However, some details about the analysis results are ignored.We amplify both the sparse coefficients ĉMM and ĉSALSA with their -labels range [0 0.02] as shown in Figures 4(b) and 5(b), respectively.The sparsity of the vector can be measured by the  0 -norm (‖ ⋅ ‖ 0 ) of it.We set the threshold value as 1 −10 , then we get ‖ĉ MM ‖ 0 = 71, while we get ‖ĉ SALSA ‖ 0 = 1031.Thus, conclusions can be drawn that vector ĉMM is much sparser than vector ĉSALSA , which implies the proposed method has a better performance on transient sparse representation.Furthermore, their histories of the cost functions are illustrated in Figure 6, simultaneously.History of the cost function shown in (13) of the proposed method is illustrated in green dash-dot line, while the cost function history of the SALSA method [21] for this simulation is illustrated in blue dash line.Figure 6 shows the proposed method has a faster convergence than SALSA method.
Another simulated signal with larger noise (  = 0.7) as shown in Figure 7(a) is analyzed by these two methods, and the detection results of the proposed method (with  = 11.840) and SALSA method are shown in Figures 7(b) and 7(c), respectively.Conclusions can be drawn that, with the increasing of noise level, detection ability of SALSA method fades away and even leads to faulty detection as shown in red circles in Figure 7(c), while the proposed method still has a good performance on extraction effect.
In order to test the noise tolerance of the proposed method, the simulation tests with different noise amplitudes   from (24) are investigated as shown in Table 2, in which noise amplitudes   , SNR, the correlation coefficients   between the reconstructed signal and the original signal, the impulse time, and the period parameters are listed.
Correlation coefficient   , used to measure the effect of the construction, also can be written as: where  and x are the original component and the corresponding estimate result, respectively.SNR, the signal-to-noise ratio, used to weigh the noise level is defined as where   is the energy of the useful information () and   is the energy of the noise ().As shown in Table 2, it can be seen that with the increasing of the noise amplitude, the proposed method still has an excellent detection effect.

Experimental Verification
To verify the effectiveness of the proposed method in practical engineering applications, defective gearbox data is analyzed.The experimental data were acquired from an automobile transmission gearbox, which has five forward speeds and one backward speed, as shown in Figure 8.At constant rotating speed, localized faults in gearbox tend to result in periodic shocks and thereby arouse periodic transients in vibration signals, so the transients can be represented as certain wavelet basis sparsely; thus the fault features can be extracted sparsely by the proposed method.To further demonstrate the effectiveness of the proposed method, the results of fault feature detection by the proposed method are compared with the detection results of SALSA method.
During the test, a broken-tooth fault occurred on the driving gear of the third speed.The vibration signal was acquired by an accelerometer mounted on the outer case of the gearbox when it is loaded on the third gearbox.For a gear transmission, the meshing frequency   is calculated by where  is the number of the gear teeth,  is the rotating speed of the input shaft, and  is the transmission ratio.The working parameters are shown in Table 3, the meshing frequency here is 500 Hz, and thus the sampling frequency is set at 3000 Hz.
A measured vibration signal caused by one driving gear teeth broken with a length of 900 and its frequency spectrum are shown in Figures 9(a) and 9(b).From Figure 9(a), the impulse period cannot be identified as the noise corruption; from Figure 9(b), the main frequency component can be identified as 500 Hz.
The proposed transient sparse representation method is then applied to the signal.The optimal wavelet function is first calculated by the correlation filtering rule, and the associated parameters are  = 272 Hz, = 0.0074, and  = 0.0633 s.
Then the optimal wavelet basis Mor(, ) with parameters  and  is founded so as to realize the feature detection sparsely.Figure 10(a), obtained by the proposed method, gives the sparse coefficients, which represents a series of periodic impulses.The average time period of these impulses is around 0.0505 s, which is very close to the theoretical value of 0.050 s. Figure 10(b) gives the reconstructed signal, whose periodical features represent the localized fault existing in the driving gear of the third speed.The analysis results confirm that the proposed transient sparse representation method can detect the transients clearly and reduce the noise effectively; thus the fault features can be identified.For the purpose of comparison, the same signal is also processed by using SALSA method, and the results are in Figure 11.The sparse coefficients obtained by SALSA method are illustrated in Figure 11(a), which theoretically represent a series of periodic impulses.However, from Figure 11(a), the fault period cannot be identified clearly.The reconstructed signal is shown in Figure 11(b).The analysis result in Figure 11(b) has almost no periodical features, thus it is difficult for gearbox fault diagnosis.By comparing with the analysis results in Figure 10, it can be found that the proposed transient sparse representation method is obviously superior to SALSA method.

Figure 1 :
Figure 1: The penalty function and its quadratic majorizer.

Figure 2 :
Figure 2: Procedure of the proposed transient sparse representation method.

Figure 3 :
Figure 3: The simulated signal: (a) the original signal and (b) the noisy signal.

Figure 4 :
Figure 4: Detection results of the simulated signal by using the proposed method: (a) the sparse coefficients; (b) the amplified sparse coefficients, and (c) the reconstructed signal.

Figure 5 :
Figure 5: Detection results of the simulated signal by using SALSA method: (a) the sparse coefficients; (b) the amplified sparse coefficients, and (c) the reconstructed signal.

Figure 6 :
Figure 6: The cost function histories of the proposed method and SALSA method.

Figure 7 :
Figure 7: Detection results of these two methods: (a) the simulated signal with much noise; (b) the proposed method, and (c) the SALSA method.

Figure 10 :
Figure 10: The analysis results of vibration signal by using the proposed method: (a) sparse coefficients, and (b) the reconstructed signal.

Table 1 :
Procedures of these two methods: the proposed algorithm and wavelet-assisted SALSA algorithm.

Table 2 :
The analysis results of the proposed method at different noise amplitudes.