Gearbox Fault Diagnosis in a Wind Turbine Using Single Sensor Based Blind Source Separation

This paper presents a single sensor based blind source separation approach, namely, thewavelet-assisted stationary subspace analysis (WSSA), for gearbox fault diagnosis in a wind turbine. Continuous wavelet transform (CWT) is used as a preprocessing tool to decompose a single sensor measurement data into a set of wavelet coefficients to meet the multidimensional requirement of the stationary subspace analysis (SSA). The SSA is a blind source separation technique that can separate the multidimensional signals into stationary and nonstationary source components without the need for independency and prior information of the source signals. After that, the separated nonstationary source component with the maximum kurtosis value is analyzed by the enveloping spectral analysis to identify potential fault-related characteristic frequencies. Case studies performed on a wind turbine gearbox test system verify the effectiveness of theWSSA approach and indicate that it outperforms independent component analysis (ICA) and empirical mode decomposition (EMD), as well as the spectral-kurtosis-based enveloping, for wind turbine gearbox fault diagnosis.


Introduction
Wind energy is one of the renewable energy sources, which have received considerable attention around the world.As critical equipment for wind energy development, the wind turbine is being widely used due to its technological maturity and good infrastructure construction [1,2].However, with the high demand for energy efficiency, the size of the wind turbine is increasing over the years, leading to high operation costs and risk of failures.To avoid expensive maintenance costs and economic losses caused by wind turbine failures, researches on condition monitoring and fault diagnosis of the wind turbine have been carried out.For example, Kotzalas and Doll [3] investigated the failure principle of various wind turbines.Ciang et al. [1], García Márquez et al. [4], and Liu et al. [5] reviewed the existing condition-monitoring strategies and relative fault diagnosis methods for wind turbines.Chen et al. [6] summarized main failure modes of the wind turbines and corresponding signal characteristics.It is known that the wind turbine failures often occur in the generator, blade, gearbox, and electrical system.As a key component for wind turbines, the gearbox is vulnerable to damage and the gearbox failure has a severe influence on working status of the whole system.As a result, effective diagnosis of gearbox failures has become a focused research trend.Generally, the gear mesh signal is strong while the gearbox fault-related signal is often weak and transient, causing difficulty to separate such faultrelated signatures from gear mesh signals.
Over the past, some filtering methods [7,8], for example, spectral-kurtosis-based and autoregressive model-based methods, were used to extract the fault-related signal components, but the filtering results are highly related to the filter parameter selection and may be sensitive to noise.Li et al. [9] developed a noise-controlled technique based on stochastic resonance method to enhance the fault signal by adjusting the input signal and the noise level.Combet and Gelman [10] and Heyns et al. [11] both utilized the time synchronous averaging technique to remove the gear meshing frequency with the help of an independent encoder signal to resample the measured vibration signals.Different from the above methods, independent component analysis (ICA) is a blind source separation approach, which can separate the raw signal into several independent sources without any prior information or reference signal.For example, both He et al. [12] and Wang et al. [13] used the ICA to extract the faultrelated features from gearbox vibration signals.However, the ICA approach needs to assume that each component to be separated is independent and does not take distribution changes of the signals into account.Compared with ICA, the recently developed stationary subspace analysis (SSA) [14] is also a blind source separation approach.The SSA can not only take the signal distribution into consideration [15], but also decompose a multidimensional time series into stationary and nonstationary source components without the independency assumption of these source signals.Since the gearbox fault-related signal often exhibits nonstationary behavior due to complex working condition of the wind turbines, it can be naturally characterized by the nonstationary components resulting from the SSA.Therefore, the SSA presents a good alternative for gearbox fault diagnosis.The applications of the SSA have been seen in change-point detection of the multidimensional time series [16], EEG signal processing [15,17], geophysical data analysis [18], and image processing [19].It should be noted that there is a multidimensional requirement when implementing the SSA algorithm.For realtime gearbox fault diagnosis of the wind turbine, although there are several sensors attached on the system, we often have only one sensor near the fault component to collect a single-dimensional signal that is highly related to the fault feature.To address such a problem, this paper presents a single sensor based blind source separation approach for gearbox fault diagnosis by integrating continuous wavelet transform with stationary subspace analysis, which is called the wavelet-assisted stationary subspace analysis (WSSA).The proposed approach first uses CWT to decompose a onedimensional signal into multiscale wavelet coefficients, which can be considered as a multidimensional signal.Then, the SSA is applied to separating it into a set of stationary and nonstationary source components.It should be emphasized that the key of the proposed approach is to utilize the inherent multiscale analysis capability and the redundancy resulting from the CWT decomposition for dimension extension of a single sensor signal [20] and then select the SSA as a blind source separation approach for both the stationary and nonstationary signal components separation.Besides, this paper also proposes to use modified principal component analysis and runs test methods to select the number of wavelet scales and the number of nonstationary source components, respectively, for further improving the effectiveness of the WSSA approach.
The organization of the rest of the paper is as follows.Section 2 provides the theoretical background of the CWT and the SSA.The parameter selection methods for the SSA and the framework of the WSSA-based gearbox fault diagnosis approach are introduced in Section 3. Section 4 gives the experimental results, together with some discussions.Conclusions are finally stated in Section 5.

Wavelet-Assisted Stationary Subspace Analysis
2.1.Continuous Wavelet Transform.The CWT of a signal () is defined as the inner product of the signal and a selected wavelet function, which is expressed as where  , () = || −1/2 (( − )/) is the scaled and translated wavelet function,  is the scale factor, and  denotes the time location.From this equation, the CWT at a certain scale  can be considered as a continuous correlation operation between the original signal and the wavelet function by changing the time location .The result of the CWT is a series of wavelet coefficients with the same length as the original signal, which can express the similarity between the signal and the wavelet function at a given scale.It should be noted that the selection of the wavelet function significantly affects the performance of the CWT.Previous study for wavelet function selection using a quantitative measure (i.e., energyto-Shannon entropy ratio) [22] has shown that the Morlet wavelet is effective for mechanical fault feature extraction and it has also been successfully applied in many cases [23][24][25].
Hence, the Morlet wavelet is chosen as the wavelet function in this study.

Stationary Subspace Analysis (SSA).
The SSA [14] is a blind source separation technique that can separate the stationary source components from the nonstationary source components in a multidimensional signal.Neither independency nor prior information is required for these source components.In the SSA algorithm, the observed signal x() with -dimension is assumed to be generated by a linear mixture of  stationary sources (-sources) s  () = [ 1 (), . . .,   ()]  and   (  =  − ) nonstationary (nsources) sources s  () = [ +1 (), . . .,   ()]  , and x() can be expressed as where A is an unknown invertible mixing matrix.The spaces spanned by the columns of A  and A  are called -space and -space, respectively.Particularly, different from the ICA algorithm, there is no independency assumption on the sources s  () and s  ().The aim of the SSA algorithm is to find a linear transformation that can separate the stationary sources s  () from nonstationary sources s  ().This can be expressed as  where ŝ () and ŝ () are the estimated stationary and nonstationary sources and P and P are called -projection and -projection.The SSA algorithm uses weak stationarity condition as an optimization criterion to recover the sources as stationary as possible [16].The weak stationarity condition can be expressed as the mean and covariance of a time series are constant over time.Since the SSA employs an optimization criterion to recover the sources as stationary as possible, it can identify the true s  (), not the true s  ().The s  () is often identified by maximizing the nonstationarity of the estimated nonstationary sources.The flowchart of the SSA is shown in Figure 1 and the specific procedures are illustrated as follows.
(I) The observed signal is divided into  ( ≥ ( −   )/2 + 2) equal time epochs and the mean value μ and covariance Σ for the ith epoch are calculated.For any selected P , the mean value μ, = P   and covariance matrix Σ, = P Σ ( P )  of the estimated stationary source in the ith epoch can be obtained.Then the probability distribution of the estimated stationary source in the ith epoch can be obtained using normal distribution as Norm( μ, , Σ, ).
(II) Kullback-Leibler (KL) divergence  KL [26] is utilized to measure the difference between the distribution of the estimated stationary source and the standard normal distribution in each epoch.Given that  , () and  , () are probability distribution functions of two distributions Norm( μ, , Σ, ) and Norm(0, ), the KL divergence between the two distributions in the ith epoch can be defined as where  is unit matrix and   is the point number in the ith epoch.Then the objective function is set as the sum of the KL divergence in each epoch as shown in the following equation: It should be noted that P is the only variable in (6).Therefore, the optimal -projection P can be obtained by minimizing the objective function shown in (7).Then, the optimal estimated stationary source ŝ () can be obtained using (4): (III) Similar to the procedure described in (I) and (II), the objective function for nonstationary source components is formulated in (8).By maximizing the objective function [16] as shown in (9), the -projection P is obtained and the estimated nonstationary source ŝ () can be calculated by (4): From the procedures presented above, it can be seen that, as compared to the ICA, the SSA takes the distribution change of the raw signal into account and allows for dependency between the source components, which may better satisfy real-world situations.Combing the CWT with the SSA, a single sensor based blind source separation approach (i.e., WSSA) can now be developed.

Simulation Evaluation.
To evaluate the performance of the WSSA for gearbox fault diagnosis, a simulation study is first conducted, where a test signal simulating the gearbox signal is formulated as follows [13]: There are four source signals in the simulated signal as shown in Figure 2.  1 () represents the outer raceway defect-related signal of a bearing modulated by the resonance signal of the system, where  1 is the resonance frequency and   is the bearing defect characteristic frequency.The 3-stage gear meshing signals are simulated by  2 (),  3 (), and  4 (), respectively.In addition, () represents the noise.The values of these frequency components are set as   = 33 Hz,  1 = 3600 Hz,  2 = 420 Hz,  3 = 160 Hz, and  4 = 50 Hz, respectively.The waveform of the simulated signal and its frequency spectrum are shown in Figure 3, where the threestage gear meshing frequencies  2 ,  3 , and  4 can be seen clearly, but the bearing defect characteristic frequency cannot be identified.The simulated signal is then decomposed using the CWT, where the Morlet wavelet is chosen as the wavelet function.Since there are four main source components in the raw signal, the wavelet coefficients at four scales are chosen as input for the SSA decomposition and they are separated into 3 stationary sources and 1 nonstationary source as shown in Figure 4.It can be seen that the shape of the nonstationary source is similar to the wave shape of the  1 ().At last, the Hilbert transform is used to extract the envelope of the nonstationary source and spectral analysis is subsequently applied to obtaining the envelope spectrum as shown in Figure 5(b), where the bearing defect characteristic frequency   is clearly extracted.Although the simulated signal is not very complex when comparing with the signals in real cases, x 1 (t) x 2 (t) n(t) x 4 (t) x 3 (t) Amplitude (V) Time (s) the simulation results verify the effectiveness of the WSSA for gearbox failure signal analysis.
It should be noted that when the WSSA is applied to real gearbox vibration signals, the number of source components contained in the vibration signals is not known a priori; thus it is necessary to determine how many wavelet scales ( value) should be chosen and which scales need to be selected for the WSSA.In addition, how to choose a proper number of stationary or nonstationary sources is another key issue.These two issues will be addressed in the following section.

Framework of Gearbox Fault Diagnosis
Based on WSSA where   (,  = 1 to ) represents the principal component coefficient and each principal component corresponds to a variance contribution   .Equation ( 11) can be rewritten using matrix style as where Traditionally, the principal components with high cumulative variance contribution can be selected as low-dimensional features of the raw high-dimensional series.For example, if <  th and ∑  =1   ≥  th ( th represents cumulative variance contribution), the  principal components  1 () to   () can be chosen as low-dimensional features.However, there will be a problem when the traditional PCA is used to choose the outstanding wavelet scales.The wavelet coefficient series in each scale represents the feature of the raw signal under a certain frequency-band window, but the chosen principal components by the PCA are a linear superposition of these wavelet coefficients in each scale, leading to multifrequency information mixture in each principal component, which may increase separation difficulty of the SSA.On the other hand, for each principal component, the value of component coefficient   can reflect the contribution rate of each   ().The larger the coefficient   is, the more important the wavelet series   () will be.Taking all chosen principal components into account, a measure called variable score can be calculated using the following method: based on cumulative variance contribution  th , the number  ( < ) of outstanding principal components is obtained.Then the coefficients of the first  principal components are processed by (14) to calculate the variable score F = [ 1  2 ⋅ ⋅ ⋅   ]: where . This equation can be further expanded as where   stands for the ith variable score.It should be noted that each   represents the contribution rate of wavelet coefficients in each scale   ().As a result, the  series of wavelet coefficients with larger score , corresponding to  wavelet scales, are chosen from the N-dimensional wavelet series as a -dimension feature for the WSSA algorithm.

Dimension 𝑑 Selection Using Runs Test.
Before the SSA algorithm is executed, we need to determine the number of stationary sources d from the -dimensional features.If  is chosen,   can be calculated as  − .In [16], this parameter selection issue is discussed and a likelihood ratio test statistic is constructed to choose  using the distribution test.In fact, the essence of this method is to identify the stationarity of the source signals obtained by the SSA.Hence, this  selection problem can be transferred to a stationarity identification problem.The -dimensional signals can be decomposed by the SSA using different  from 1 to  − 1, and the dimension of the most stationary sources is the most suitable d value.Therefore, a novel  selection method, called runs test, has been utilized for the SSA in this study.The runs test method [28] is a stationarity identification method which has been used in various fields, such as wireless networks [29], financial time series analysis [30], and simulation run length determination [31].In this study, the d selection method using runs test is designed as follows.
(a) Based on the D-dimensional wavelet coefficient series,  =  ( = 1 to ) is chosen to implement the SSA and obtain the d stationary sources ŝ () and − nonstationary sources ŝ ().
(b) For each stationary source involved in a d-dimensional stationary source, the points larger than mean value will be marked as 1 and the rest will be marked as 0. The number of 1 and 0 can be marked as  1 and  0 , respectively.The runs r can be defined as the total number of consecutive 1's and 0's in one series.For example, given two series  1 = [1 0 0 1 1] and  2 = [1 1 0 0 0], the runs  1 = 3 and  2 = 2, respectively.Based on the definition, the number of runs  can be calculated.
(c) The total number of runs in the signal can provide information regarding whether it is stationary or not.A signal that contains very few runs is probably stationary, while a signal that contains many runs is more likely to be nonstationary.Considering that different sizes of different signals may affect runs calculation, the runs value r cannot be compared directly.Since the runs value r can be treated as an approximate Gaussian distribution, the mean value and variance of r can be acquired as ( 16) [30] and a statistic  [32], standardization of , can be obtained to identify the stationarity of the series as (17).The larger the || is, the more nonstationary the series will be.As a result, the mean value   of || for  stationary sources is obtained from (18).Consider (d)  from 1 to  − 1 is chosen for the SSA algorithm, and then ( − 1)   values are obtained based on procedures (b) and (c).The d value, corresponding to the smallest   , will be selected as the final number of stationary sources.

Framework of Gearbox Fault Diagnosis Approach.
By integrating the proposed methods, the framework of gearbox fault diagnosis approach is presented in Figure 6.The gearbox vibration signal is firstly decomposed by the CWT to obtain the multiscale wavelet coefficients.Secondly, the modified PCA is used to select the coefficients in certain scales and obtain a low-dimensional coefficient series.Thirdly, the runs test method is utilized to decide how many stationary sources should be separated from the wavelet coefficient series.Fourthly, the SSA algorithm is applied to separating the wavelet coefficient series into stationary sources and nonstationary sources.At last, the nonstationary source with the maximum kurtosis value is selected and processed using enveloping spectral analysis.The obtained envelope spectrum is used to detect the defect frequency of the gearbox.

Experimental Study
In order to verify the effectiveness of the WSSA-based gearbox fault diagnosis approach, vibration signals collected from a real wind turbine gearbox are analyzed.The experimental data is contributed by a project called Gearbox Reliability Collaborative (GRC) hosted by National Renewable Energy  Laboratory (NREL) [33].The test turbine (Figure 7) is a stallcontrolled, three-bladed, upwind turbine with a rated power of 750 kW.The gearbox under test is composed of one low speed (LS) planetary stage and two parallel stages and has an overall ratio of 1 : 81.491.The structure of this gearbox is shown in Figure 8. Accelerometers are mounted on several locations outside of the gearbox to collect vibration signals with 40 kHz sampling rate.

Case Study I.
The first case study analyzes a vibration signal measured by the accelerometer AN6 (model: IMI622B01), which is close to bearing D (model: SKF NU-2220-ECM, near the high speed shaft).In this study, the rotation speed of the high speed shaft (denoted as HS shaft) is 1200 rpm and the electric power is 25%.The waveform and frequency spectrum of the vibration signal are shown in Figure 9.It can be seen that the rotation frequency of HS shaft ( HS = 20 Hz) is visible in Figure 9(b), while no defect-related frequency for bearing D is identified.
The WSSA algorithm is then used to process the same signal.After the CWT decomposition, wavelet coefficients of scale 1-32 are obtained and mPCA is used to find four outstanding scales, which are identified as scales 15, 18, 19, and 20.Then the runs test method is utilized to compute the statistic  1 = 192.89, 2 = 265.10,and  3 = 217.06,among which  1 is the smallest.This means one stationary source and three nonstationary sources can be determined through the SSA.The results for extracted stationary and nonstationary source components are shown in Figure 10, among which the nonstationary source 3 with the maximum kurtosis value is selected and processed using enveloping spectral analysis.From the envelope spectrum shown in Figure 11, the inner raceway defect frequency of the bearing D ( BPFI = 50 Hz) can be clearly identified.In fact, the bearing D indeed suffered from the inner raceway defect as shown in Figure 12.To make a comparison, the wavelet coefficients at the scale with the maximum kurtosis are directly used for enveloping spectral analysis, and the defect frequency 50 Hz cannot be identified from the results shown in Figure 13.In addition, the ICA method is used to process the same vibration signal and conduct the same diagnostic procedure as the SSA does.As shown in Figure 14, although the defect frequency can also be identified, it is not so prominent as compared to the result obtained using the SSA.Empirical mode decomposition (EMD) technique is also used to decompose the signal and the intrinsic mode function (IMF) with maximum kurtosis is chosen for enveloping spectral analysis.From the result shown in Figure 15, the defect frequency is nearly submerged in the surrounding frequency components.Furthermore, the spectral kurtosis is applied to the vibration signals to find the proper central frequency 2500 Hz and bandwidth 312 Hz (shown in Figure 16), and then a band-pass filter with the chosen central frequency and bandwidth is used to filter the vibration signal.From the envelope spectrum of the filtered signal shown in Figure 17, the magnitude of the defect frequency 50 Hz is not large enough as compared to other adjacent frequency components.This case study verifies the effectiveness of the proposed WSSA for wind turbine gearbox fault diagnosis.

Case Study II.
The second case study analyzes a vibration signal measured by the accelerometer AN3 (model: IMI 626B02), which is close to bearing H (model: FAG SL181856E).In this study, the rotation speed of the high speed shaft (HS shaft) is 1800 rpm and the electric power is 25%.The bearing H is used to support the main shaft (rotation speed: 22.09 rpm).The waveform and frequency spectrum of the vibration signal are shown in Figure 18, where no defectrelated frequency for bearing H is identified.
The vibration signal is then processed by the proposed WSSA approach.Following the same procedure as shown in case study I, wavelet coefficients of the vibration signal at scales 1-32 are obtained by the CWT.Then the coefficients at the three scales 23, 24, and 25 are selected by the mPCA, and the runs test is performed to calculate the statistic  1 = 409.68 and  2 = 350.80,among which  2 is smaller than  1 .This means two stationary sources and one nonstationary source can be extracted from the vibration signal through the SSA, and the result is shown in Figure 19.The enveloping spectral analysis of the nonstationary source is then conducted and the resulting envelope spectrum is shown in Figure 20.In addition to the intermediate speed shaft (denoted as IS shaft) rotation frequency  IS = 7.5 Hz, the outer raceway defect frequency of the bearing H ( BPFO = 8.6 Hz) can also be clearly identified.This is consistent with the physical examination for the bearing H as shown in Figure 21.For comparison study, the methods using the CWT with maximum kurtosis-based scale selection, ICA, EMD, and the band-pass filter based on spectral kurtosis are also used to process the same vibration signal and the results are shown in Figures 22-26, respectively.From these figures, it can be seen that the rotation frequency of the IS shaft is highlighted, while the outer raceway defect frequency of the bearing H is almost submerged in the spectrum and cannot be identified.The comparison results further confirm the effectiveness of the WSSA for wind turbine gearbox fault diagnosis.0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0 5 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0 5 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2

Conclusions
This paper presents a single sensor based blind source separation approach (i.e., the WSSA) for wind turbine gearbox fault diagnosis by integrating the CWT with the SSA.The modified PCA (mPCA) and runs test methods are brought in for the presented approach to quantitatively choose the suitable wavelet scales and the number of stationary sources, respectively.The proposed method can use a single sensor signal for SSA separation based on the signal distribution without any independency assumption.Furthermore, the mPCA can effectively reduce the computation load of the SSA by wavelet coefficient dimension reduction and the runs test can ensure an optimal separation mode for the SSA.In the experimental study performed on a real wind turbine gearbox, the proposed WSSA approach can well separate the stationary signal components from the nonstationary signal components mixed in the vibration signal and accurately identify the defect frequency of the bearing in the wind turbine gearbox.In addition, comparison study using CWT with maximum kurtosis, ICA, EMD, and spectralkurtosis-based approaches indicates that the proposed WSSA approach is more beneficial than these existing methods for wind turbine gearbox diagnosis.However, the influence of the time epoch number  proposed in Section 2.2 on the SSA result is not considered in this paper which needs to be further investigated in the future study.In addition, it should be noted that the proposed diagnosis approach is tested using the signal collected from constant speed gearbox.By combining this approach with order tracking method, it is 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.  able to deal with the signals under variable-speed conditions and can also be applied to monitoring variable-speed wind turbines.

Figure 1 :
Figure 1: Flowchart of the SSA algorithm.

Figure 2 :
Figure 2: Source components of simulation signal.