Machine-Learning-Based Intelligent Mechanical Fault Detection and Diagnosis of Wind Turbines

Wind power has gained wide popularity due to the increasingly serious energy and environmental crisis. However, severe operational conditions often cause faults and failures in wind turbines, which may significantly degrade the security and reliability of large-scale wind farms. In practice, accurate and efficient fault detection and diagnosis are crucial for safe and reliable system operation. This work develops an effective deep learning solution using a convolutional neural network to address this problem. In addition, a linear discriminant criterion-based metric learning technique is adopted in the model training process of the proposed solution to improve the algorithmic robustness under noisy conditions. The proposed solution can efficiently extract the features of the mechanical faults. The proposed algorithmic solution is implemented and assessed through a range of experiments for different fault scenarios. The numerical results demonstrate that the proposed solution can reliably detect and diagnose multiple coexisting faults of an operating wind turbine gearbox.


Introduction
In recent decades, wind power generation has become one of the dominant sustainable and renewable energy sources and has achieved high popularity and significant expansion [1]. Due to increasing CO2 emissions and the ever-increasing demand for energy, renewable power generation sources are considered one of the most important means of meeting the urgent power demand in a low-carbon fashion. In practice, wind turbines are generally deployed in rural areas, at remote sites, or even offshore.
This brings about direct challenges for the inspection and maintenance of wind power generation infrastructures. Due to the harsh operating environment and extreme working conditions, wind turbines are prone to various faults, resulting in high maintenance efforts [2]. The gearbox, being a critical transmission component in wind turbines, shows a high fault rate in practice. The statistics provided in [3] indicated that 76% of the failures are observed at the bearings, and the faults in gears and other components are 17.1% and 6.9%, respectively. Therefore, efficient health condition monitoring is of paramount importance to reduce maintenance costs as well as downtimes for reliable system operation and low economic losses [4,5].
In practice, the mechanical fault detection and diagnosis of wind turbines in large-scale wind farms is a nontrivial task due to the operating conditions and complex operational characteristics. In general, manual condition monitoring methods were based on signal processing and analysis, which required expert knowledge and consequently led to time-consuming remedies [6]. To effectively address the technical challenge of fault detection and analysis, solutions for operational condition monitoring and online fault detection and diagnosis have been investigated in the literature (e.g., [7][8][9][10]).
It can be observed from the literature that the intelligent fault diagnosis process generally consists of three key steps: (1) signal acquisition by field sensors; (2) feature extraction through signal preprocessing; and (3) fault classification using machine-learning algorithms, regarded collectively as a typical pattern recognition problem. During signal acquisition, various signals, including vibration, acoustic emission, current, and infrared thermal measurements, can be acquired from the field [11]. In particular, condition monitoring and diagnosis through the analysis of vibration signals has been considered useful for rotating machines, as the vibration signal contains the most intrinsic characteristics related to health conditions [12]. During feature extraction, it is necessary to understand the characteristics of the signals collected from various field sensors and to generate suitably sensitive features to address specific diagnosis issues. As suggested by existing studies (e.g., [13]), the extraction of intrinsic features from the available vibration signals can generally be carried out through analysis in the time domain, frequency domain, and time-frequency domain. Time-domain features include statistical parameters, for example, the root mean square, variance, kurtosis, skewness, impulse factor, and crest factor. For analysis in the frequency domain, the fast Fourier transform (FFT) is a widely used technique for signal processing, while the short-time Fourier transform (STFT) and wavelet packet transform (WPT) are often adopted for the analysis of nonstationary signals [14]. Other advanced techniques have also been adopted for time-frequency analysis, for example, local mean decomposition (LMD), ensemble empirical mode decomposition (EEMD), Hilbert-Huang transform (HHT), and sparse representation [15][16][17][18].
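As an illustration of the time-domain statistics listed above, the following minimal sketch computes them for one signal segment in plain Python. The function name `time_domain_features`, the use of the population variance, and the unit sine test input are assumptions made for this example; the cited studies may use slightly different normalizations.

```python
import math


def time_domain_features(x):
    """Return the six time-domain statistics named in the text for one segment."""
    n = len(x)
    mean = sum(x) / n
    # Population variance (dividing by n) is an assumed convention.
    variance = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(variance)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    return {
        "rms": rms,
        "variance": variance,
        # Non-excess kurtosis: fourth central moment over std^4.
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
        "skewness": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "impulse_factor": peak / mean_abs,
        "crest_factor": peak / rms,
    }


# Illustrative input: one period of a unit sine wave (not data from the paper).
segment = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
features = time_domain_features(segment)
```

For a pure sine wave, the crest factor is the square root of 2 and the kurtosis is 1.5; impulsive bearing faults typically push both values higher, which is why these statistics are used as fault indicators.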
For fault classification, many machine-learning-based solutions have been developed in previous studies, such as support vector machine (SVM) [19], random forest (RF) [20], k-nearest neighbor (k-NN) [21], and artificial neural networks (ANN) [22]. The existing methods generally combine advanced signal processing techniques and machine-learning-based approaches to obtain sensitive features from the collected field signals and make diagnostic decisions efficiently, achieving remarkable results in various fault diagnosis applications. In [23], statistical parameters were extracted and characterized from the measured vibration signals, and the roller bearing operating conditions were analyzed and recognized with an SVM-based approach. The authors in [24] adopted the energy entropy of empirical mode decomposition (EMD) as the input feature and used the ANN for fault classification. The authors in [25] presented an algorithmic solution to calculate the multiscale permutation entropy of vibration signals, which is then used as the input to an SVM classifier for bearing fault diagnosis. In [26], the authors proposed integrating the ensemble empirical mode decomposition with independent component analysis (ICA) to classify component faults for the diagnosis of bearing failures in wind turbines. The conventional machine-learning-based approaches have achieved great success in fault diagnosis. However, there is still a set of limitations that can be summarized as follows: (1) feature extraction is generally needed for every specific fault diagnosis task.
This demands expert knowledge of signal processing and is generally time-consuming; (2) some features may be redundant, and it is very difficult to select the most sensitive features without prior knowledge; and (3) the shallow architectures of learning models, for example, ANN, may limit the capability of the classifiers to learn sophisticated nonlinear relationships in complicated fault scenarios. To overcome the existing limitations, more advanced deep learning-based techniques need to be further investigated and assessed.
In recent years, deep learning-based techniques that can automatically and efficiently learn high-level feature representations from raw input data have been developed rapidly (e.g., [27,28]). A collection of deep learning methods, such as deep belief network (DBN) [29], sparse autoencoder (SAE) [30], denoising autoencoder (DAE) [31], sparse filtering [32], convolutional neural network (CNN) [33], and recurrent neural network (RNN) [34], have shown extraordinary feature learning capacity and significant diagnosis results in mechanical fault diagnosis. In [33], the study calculated 256 statistical features, reshaped them into a square matrix, and then adopted a 2D CNN for gearbox fault diagnosis. In [35], the authors applied various time-frequency methods, including short-time Fourier transform, discrete wavelet transform, and S transform, to generate time-frequency images and fed them into a CNN to evaluate the performance. The authors in [36] converted a 1D raw vibration signal directly into a 2D grey-scale vibration image, utilized a LeNet-5-based CNN for data-driven intelligent fault diagnosis of mechanical equipment, and achieved remarkable accuracy. Besides, the one-dimensional CNN structure has also attracted significant attention because it is naturally suited to time-series analysis. The solutions proposed in [37,38] adopted the 1D CNN for the fault detection and analysis of real-time signals by capturing low-frequency features.
Based on the observation and analysis of the existing literature, there are still some challenges, which are summarized as follows: (1) The early fault signal is very faint and is easily corrupted, or even submerged, by environmental noise. Thus, the fault signal may have a low signal-to-noise ratio, which is a challenge for fault prognostics.
(2) Various faults may occur simultaneously, which is referred to as a multifault condition and must be taken into consideration.
This work attempts to address the aforementioned problems and proposes a deep learning-based intelligent fault detection and analysis solution for wind turbine gearboxes using a linear discriminant convolutional neural network (LDCNN). A linear discriminant criterion-based linear discriminating loss is introduced into the loss function to enhance the discriminative power of the learned features during the training process. The enhanced discriminating power is a promising factor for strengthening the generalization capability and the robustness against noise.
The experimental results demonstrate that the proposed LDCNN solution can not only efficiently extract the features from vibration measurements but also provide accurate fault detection and diagnosis in comparison with the existing solutions.
Compared with the aforementioned existing solutions, the main technical contributions of this work are as follows: (1) A deep learning-based solution is proposed and implemented to extract and learn the various forms of fault features and characteristics from the vibration measurements. (2) The linear discriminant loss is introduced and integrated into the loss function to address the technical challenge of diagnosing slight mechanical faults in a noisy environment. This effectively reduces the negative impact of noise in the process of fault detection and classification. The rest of this work is organized as follows: the convolutional neural network is first overviewed in Section 2. Section 3 provides a detailed description of the proposed fault diagnosis scheme. Section 4 presents case studies and the analysis of experimental results. Finally, future research directions and concluding remarks are given in Section 5.

Overview of CNN.
The convolutional neural network (CNN) is a multilayer neural network (composed of convolutional, pooling, and fully connected layers) with a deep supervised learning architecture, and it has demonstrated prominent performance in many pattern recognition tasks. The convolutional layer convolves local regions of the input with filter kernels and then generates the output features through the activation unit. Every kernel has the same size. We use $W_i^l$ and $b_i^l$ to denote the weights and bias of the $i$-th filter kernel in layer $l$, respectively, and use $a_j^l$ to denote the $j$-th local region in layer $l$. The convolutional process is described by equation (1), in which $a_{ij}^{l+1}$ represents the input of the $j$-th neuron in frame $i$ of layer $l + 1$ and $f$ denotes the activation function, which may be the sigmoid function, the rectified linear unit (ReLU), or others. The max-pooling layer is one of the most commonly used layers in CNNs. With $M_j$ denoting the $j$-th pooling window, the max-pooling transformation, formulated in equation (2), outputs the maximum activation within each window. For classification tasks, the deep neural network is followed by a classifier, for example, a multilayer perceptron (MLP), a softmax classifier, or a support vector machine (SVM). Here, the softmax function-based classifier is adopted in the proposed solution. The output of the softmax classifier for a dataset of $k$ classes is the probability distribution over the classes, as given in equation (3).
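For reference, the pooling and softmax steps described above are commonly written in the following textbook form. The symbols $p_{ij}^{l+1}$ (pooled output), $a_{ik}^{l}$ (the $k$-th activation in frame $i$), and $z_j$ (the score of class $j$) are notation assumed here and may differ from the paper's own equations (2) and (3).

```latex
% Max pooling: keep the largest activation inside the j-th window M_j
% (p_{ij}^{l+1} and a_{ik}^{l} are assumed symbols).
p_{ij}^{l+1} = \max_{k \in M_j} a_{ik}^{l}

% Softmax over k classes: z_j is the score of class j produced by the
% last fully connected layer (assumed symbol).
q(y = j \mid x) = \frac{\exp(z_j)}{\sum_{m=1}^{k} \exp(z_m)}, \qquad j = 1, \dots, k
```

The softmax outputs are nonnegative and sum to one, so they can be compared directly with a one-hot target distribution when forming a cross-entropy loss.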

LDA-Based Distance Metric Learning.
A reasonable distance metric between classes is usually helpful to improve the efficiency and accuracy in pattern recognition. Metric learning is widely used in machine-learning tasks due to its capability to learn suitable metrics from training data automatically. In terms of feature learning, metric learning can learn a new discriminative feature space by feature transformation [39].
Using $D_i$ to represent the set of training samples in the $i$-th class, the distance between two samples under projection is formulated in equation (4). Here, $W$ is the projection matrix and the symmetric matrix $W^T W$ is called a Mahalanobis distance metric. Distance metric learning aims to learn the matrix $W$ automatically.
Linear discriminant analysis (LDA) has been widely used in dimension reduction and pattern recognition applications [40]. It identifies the linear transformation matrix $W$ by optimizing the Fisher criterion. Let $S_b$ denote the interclass scatter matrix and $S_w$ denote the intraclass scatter matrix, as described by equations (5)-(7). Then, the Fisher criterion function can be expressed as equation (8).
Mathematical Problems in Engineering
The linear transformation matrix $W$ obtained by the Fisher criterion can be used as a distance metric. For deep learning tasks, tighter clustering within each class and better separability between classes of the extracted features promise to strengthen the diagnostic capability. Motivated by metric learning and the Fisher criterion, this work introduces the linear discriminating loss for fault diagnosis tasks, which is described in detail in the following sections.
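The projected distance and the Fisher criterion discussed above are usually written as follows. This is a sketch in standard textbook notation, assuming the projection $y = Wx$, class means $\mu_i$, overall mean $\mu$, class sizes $n_i$, and $C$ classes; the squared distance and the trace-ratio form of the criterion are assumed conventions, so the paper's own equations (4)-(8) may differ in normalization or use a determinant ratio instead.

```latex
% Squared distance between two samples after projection by W;
% W^T W plays the role of the Mahalanobis metric.
d_W(x_i, x_j) = \| W x_i - W x_j \|_2^2 = (x_i - x_j)^T W^T W (x_i - x_j)

% Interclass (between-class) and intraclass (within-class) scatter matrices.
S_b = \sum_{i=1}^{C} n_i (\mu_i - \mu)(\mu_i - \mu)^T
S_w = \sum_{i=1}^{C} \sum_{x \in D_i} (x - \mu_i)(x - \mu_i)^T

% Fisher criterion in trace-ratio form: large interclass scatter and
% small intraclass scatter in the projected space.
J(W) = \frac{\operatorname{tr}(W S_b W^T)}{\operatorname{tr}(W S_w W^T)}
```

Maximizing $J(W)$ pulls samples of the same class together and pushes class centers apart in the projected space, which is the behavior the linear discriminating loss is meant to encourage in the learned features.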

Proposed Method for Fault Diagnosis
This section develops a deep learning-based fault diagnosis solution for wind turbines, as illustrated in Figure 1. The overall procedure of the proposed linear discriminant convolutional neural network (LDCNN)-based solution can be described as follows: first, data augmentation is carried out to increase the number of samples by sampling the raw signals with overlap.
Through such data augmentation, a sufficient number of data samples can be obtained for the training process. Then, the LDCNN is designed based on a one-dimensional CNN combined with an improved loss function to learn highly abstract features from the input data automatically. Finally, the fault classification is carried out through a softmax classifier for different operating conditions with a range of fault scenarios. The implementation details of the proposed solution, including the data sample augmentation, feature extraction, and fault classification, are further explained in the following subsections.

Sample Acquisition.
The raw signal measurements can be collected by the field accelerometers, and overlap sampling is used to produce more samples for the training and testing processes, as illustrated in Figure 1. In this work, data augmentation through slicing the original data with overlap is used to produce a sufficient number of training samples, as suggested in [33]. Here, the segment length and overlapping rate are set as 1024 and 0.3, respectively.
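The overlap slicing described above can be sketched as follows. The function name `slice_with_overlap` is hypothetical, and the sketch assumes that the overlapping rate is the fraction of a segment shared with its predecessor, so that the stride is the segment length times one minus the rate, rounded down; the source does not state the rounding rule.

```python
def slice_with_overlap(signal, segment_length=1024, overlap_rate=0.3):
    """Cut a 1D signal into fixed-length segments that overlap their neighbors."""
    # Assumed convention: stride = floor(segment_length * (1 - overlap_rate)).
    stride = int(segment_length * (1 - overlap_rate))
    if stride < 1:
        raise ValueError("overlap_rate leaves no room to advance the window")
    last_start = len(signal) - segment_length
    # Incomplete trailing segments are dropped.
    return [signal[s:s + segment_length] for s in range(0, last_start + 1, stride)]


# Illustrative input: a ramp of 10000 points standing in for a raw recording.
raw = list(range(10000))
segments = slice_with_overlap(raw)
```

With a segment length of 1024 and a rate of 0.3, the stride is 716 points, so consecutive segments share 308 points. A recording of 10000 points then yields 13 segments, compared with 9 when slicing without overlap.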

The Architecture of the Proposed Model.
There are no deterministic criteria for the CNN structure and parameters for fault diagnosis in the existing literature. To reduce the network complexity and enhance the efficiency of the proposed model, this work simplifies the network structure through rigorous experiments. The proposed model consists of three pairs of one-dimensional convolution and pooling layers and two fully connected layers. Wide kernels [33] are adopted in the first convolutional layer. The numbers of nodes of the fully connected layers are set as 512 and 128, respectively.

Loss Function Design.
The loss function adopted in the proposed solution combines the softmax loss with the proposed linear discriminant loss in order to improve the classification performance. The target distribution and the estimated distribution are denoted as $p(x)$ and $q(x)$, respectively. The cross-entropy between them is defined as
$$H(p, q) = -\sum_{x} p(x) \log q(x).$$
Combining this definition with equation (3) gives the softmax loss $L_s$. To simultaneously minimize the intraclass variations and maximize the interclass variations of the extracted features, a linear discriminant loss $L_{ld}$ is proposed and implemented based on the distance metric introduced above. Here, $L_{ld}$ is defined as the ratio of the intraclass variations to the interclass variations. Both variations are computed from $x_i$, the features generated from the top layer; $c$, the center of the feature space for all samples; and $c_{y_i}$, the center of the samples that belong to class $y_i$ in the feature space. In this way, the weights of the top layer can be regarded as a linear transformation matrix and optimized in training. The overall loss function combines the two terms, where $L_s$ is the softmax loss, $L_{ld}$ represents the linear discriminant loss for each epoch, and $\alpha$ is a hyperparameter used for balancing the two parts of the loss function.
Figure 1 shows that the sliced one-dimensional vibration signal segment is fed directly into the one-dimensional CNN for feature learning and classification. In the forward propagation, features are extracted by successive convolutional and pooling layers followed by the fully connected layers, and the feature classification is carried out by a softmax classifier. In the backpropagation, the model is optimized by minimizing the improved loss function with the stochastic gradient descent (SGD) algorithm.
After the training process, the testing dataset is used to evaluate the proposed solution, and the accuracy of fault detection and diagnosis is used as the main metric in the evaluation.
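The combined loss described in this subsection can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the intraclass variation is taken as the sum of squared distances from each feature to its class center, the interclass variation as the sum of squared distances from each sample's class center to the global center, and the total loss as the softmax loss plus $\alpha$ times the linear discriminant loss. All function names and the small constant `eps` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(logits_batch, labels):
    """Mean cross-entropy between one-hot targets and softmax outputs."""
    losses = [-math.log(softmax(z)[y]) for z, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def linear_discriminant_loss(features, labels, eps=1e-8):
    """Ratio of intraclass to interclass variation (assumed definitions)."""
    global_center = centroid(features)
    centers = {
        y: centroid([f for f, lab in zip(features, labels) if lab == y])
        for y in set(labels)
    }
    intra = sum(sq_dist(f, centers[y]) for f, y in zip(features, labels))
    inter = sum(sq_dist(centers[y], global_center) for y in labels)
    return intra / (inter + eps)  # eps guards against a zero denominator


def total_loss(logits_batch, features, labels, alpha=0.2):
    """Assumed combination: softmax loss plus alpha times the LD loss."""
    return softmax_loss(logits_batch, labels) + alpha * linear_discriminant_loss(
        features, labels
    )


# Toy 2D features for two classes (illustrative values only).
labels = [0, 0, 1, 1]
tight = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
loose = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0], [7.0, 5.0]]
```

In this toy example the class centers are equally far apart in both feature sets, so the tighter clusters give the smaller linear discriminant loss. In a real training loop, the features would be the outputs of the top layer for a minibatch, and the gradient of the total loss would be propagated by the deep learning framework.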

Case 1: Validation with Simulation Signals.
This work first carries out the performance evaluation based on the simulated signals adopted in [41]. The simulated signal model is composed of defect-excited impulses, harmonics, and noise, where $A_i$ is the amplitude of the $i$-th impulse excited by the defect and $T_i$ is the time of its occurrence; $B_k$ and $\phi_k$ are the amplitude and initial phase of the $k$-th harmonic caused by bearing imbalance or gear meshing; and $n(t)$ is the white noise in the measurement. The vibration signals under both normal and fault scenarios are considered. Two vibration harmonics are generated under normal conditions. The simplified mathematical expressions take the form $s_1 = 0.7 \sin(2\pi f_1 t)$, where $f_1 = 200$ Hz and $f_2 = 400$ Hz represent the two gear vibration harmonic frequencies, $t$ represents the time, and $t = n/f_s$, where $n$ is the sample index and $f_s$ is the sampling frequency, which is kept at 10 kHz in this experiment. In addition, the signals for both the outer-race and inner-race fault scenarios are simulated following [41,42], with the inner-race signal given by $s_4 = [1.5 \cos(2\pi f_0 t) + 1] \cdot e^{-500\pi t_2} \sin(2\pi f_{ri} t)$, where $f_{ro} = 2000$ Hz and $f_{ri} = 3000$ Hz are the resonance frequencies and $f_o = 30$ Hz and $f_i = 150$ Hz are the fault characteristic frequencies of the outer-race and inner-race faults, respectively. Here, $f_0 = 20$ Hz is the bearing shaft rotating frequency. The simulated fault signals of the outer-race and inner-race faults are presented in Figure 2.
In simulations, the vibration signals can be generated using the aforementioned source signals based on (14), and four kinds of vibration signals are shown in Figure 3.
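The source signals described in this case can be sketched as follows. Only the first gear harmonic and the inner-race expression are recoverable from the text, so this example is limited to those two. It further assumes that the time variable in the exponential decay is the time elapsed since the most recent impact, that is, the remainder of $t$ divided by $1/f_i$; the text does not define it, so this reading is a guess. The function names are hypothetical.

```python
import math

FS = 10_000      # sampling frequency in Hz, as stated in the text
N_POINTS = 400   # data points per sample, as stated in the text


def time_axis(n_points=N_POINTS, fs=FS):
    """Sample times t = n / fs."""
    return [n / fs for n in range(n_points)]


def gear_harmonic(t, amplitude=0.7, freq=200.0):
    """First gear vibration harmonic: 0.7 * sin(2*pi*f1*t)."""
    return amplitude * math.sin(2 * math.pi * freq * t)


def inner_race_fault(t, f_shaft=20.0, f_fault=150.0, f_res=3000.0,
                     decay=500 * math.pi):
    """Shaft-modulated, exponentially decaying resonance bursts."""
    # ASSUMPTION: the decay restarts at every impact, 1 / f_fault apart.
    t_since_impact = math.fmod(t, 1.0 / f_fault)
    envelope = 1.5 * math.cos(2 * math.pi * f_shaft * t) + 1.0
    return envelope * math.exp(-decay * t_since_impact) * math.sin(
        2 * math.pi * f_res * t
    )


t = time_axis()
s1 = [gear_harmonic(v) for v in t]
s4 = [inner_race_fault(v) for v in t]
```

Each list holds one 400-point sample spanning 40 ms. Under the stated assumption, the inner-race signal contains one burst every 1/150 s whose amplitude is modulated at the 20 Hz shaft frequency.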
In this case, 500 samples are used for each class and each sample consists of 400 data points; hence, the dataset contains 2000 (500 × 4) samples, of which 70% are used for training and the remainder is used for the performance test. In the experiments, different solutions are implemented as comparison benchmarks: the 1D CNN [37], the traditional 2D CNN [35], the WDCNN with the wide kernel in the first layer [38], and two machine-learning models with 15 time-domain statistical features [43]. Figure 4 presents the numerical results of the evaluated solutions in terms of the detection accuracy on the test datasets.
In Figure 4, the numerical results demonstrate that the proposed fault detection and diagnosis method can efficiently detect and diagnose both single and multiple coexisting faults. The superiority of the proposed LDCNN-based solution is confirmed compared with other existing solutions.

Case 2: Validation Using the Bearing Testbed.
The performance of the proposed algorithmic solution is further evaluated using the testbed at the Case Western Reserve University (CWRU) Bearing Data Center [44]. In detail, the CWRU testbed consists of a 2 hp motor, a torque transducer, and a dynamometer, as illustrated in Figure 5. This work considers three types of single mechanical faults, that is, inner race fault (IRF), outer race fault (ORF), and ball fault (BF), which are introduced to the test bearings using electrical discharge machining with different fault diameters. In this study, the bearing data with fault diameters of 0.007 inches and 0.014 inches are selected and used for performance evaluation. Here, the bearing signals are collected from the drive end of the motor by accelerometers with a sampling rate of 12 kHz under four different load conditions, that is, 0, 1, 2, and 3 hp. The data of the aforementioned three fault categories with two fault diameters, as well as the normal operating condition data, are directly adopted to evaluate our proposed solution. There are seven categories in total, and for each health condition, 150 samples are generated in each load condition, where each sample consists of 1024 data points. Therefore, the constructed dataset contains 4200 samples in total, where 70 percent of them are used for training and 30 percent for testing. As in case 1, four deep learning models and two machine-learning models with manual feature extraction are applied and compared. Here, the adopted deep learning models are optimized by the stochastic gradient descent (SGD) algorithm with a minibatch size of 64 samples. As a result, the 2D CNN-based method achieved a high accuracy of 99.52%, while the other deep learning models reached 100%. In contrast, the machine-learning-based methods resulted in accuracies of 94.76% and 95.36%.
Hence, the effectiveness and advantages of deep learning are confirmed by its higher accuracy despite the absence of manual feature extraction. In addition, the effectiveness and robustness of the proposed solution are evaluated by adding Gaussian white noise to the collected data samples. Noisy conditions make fault detection and diagnosis more difficult, particularly in the presence of multiple simultaneously coexisting mechanical faults. Therefore, robustness against noise disturbance is essential and needs to be evaluated. In practice, noise-added data are generated for different signal-to-noise ratios (SNRs), defined in decibels as
$$\mathrm{SNR} = 10 \log_{10}\left(\frac{P_{\text{signal}}}{P_{\text{noise}}}\right).$$
Here, $P_{\text{signal}}$ and $P_{\text{noise}}$ represent the powers of the original signal and the additive Gaussian white noise, respectively.
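A minimal sketch of how noise-added samples at a prescribed SNR could be produced is given below. The function names are hypothetical, and rescaling the drawn noise so that the requested SNR is met exactly (rather than only in expectation) is a choice made for this example, not something stated in the source.

```python
import math
import random


def signal_power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)


def add_white_noise(x, target_snr_db, rng):
    """Return (noisy signal, noise) with the noise scaled to the target SNR."""
    # Invert SNR = 10 * log10(P_signal / P_noise) for the noise power.
    target_noise_power = signal_power(x) / (10 ** (target_snr_db / 10))
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    # Rescale so the empirical noise power matches the target exactly.
    scale = math.sqrt(target_noise_power / signal_power(noise))
    noise = [scale * n for n in noise]
    return [s + n for s, n in zip(x, noise)], noise


def measured_snr_db(x, noise):
    return 10 * math.log10(signal_power(x) / signal_power(noise))


# Illustrative clean segment: a sine wave of 1024 points (not CWRU data).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * k / 64) for k in range(1024)]
noisy, noise = add_white_noise(clean, -4.0, rng)
```

At −4 dB the noise power is about 2.5 times the signal power, which illustrates why the low-SNR test cases are considerably harder than the 8 dB case.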
In this work, the original data measurements are used in the training phase to train the intelligent models, and the noise-added samples with various SNRs are used in the testing phase. In the training stage, the hyperparameter of the loss function for the LDCNN is set as $\alpha = 0.2$. Figure 5 shows the experimental results of the deep learning methods. Low SNR values represent greater noise power, which hinders efficient fault diagnosis. It is evident from Figure 5 that the proposed LDCNN outperforms the other intelligent methods, with 100% and 93.07% testing accuracy for SNR values of 8 dB and −4 dB, respectively.
This work adopted the t-distributed Stochastic Neighbor Embedding (t-SNE) to visualize the extracted features. Figure 6 illustrates the extracted high-dimensional features for 7 different conditions under two noisy conditions. The extracted features can be differentiated for SNR = 4 dB, whereas for SNR = −2 dB, the features extracted by the traditional 1D CNN are heavily overlapped, while the features extracted by the proposed LDCNN remain fairly distinguishable. This confirms the performance of the proposed solution in terms of robustness under different noisy conditions.
It should be noted that although the effectiveness of the developed solution is confirmed by the numerical experiments, the proposed solution needs to be further evaluated for multifault analysis. To assess its performance in multiple fault scenarios, this work utilizes the single fault signals to construct multiple fault vibration signals by applying a linear mixing matrix $A$ [45] followed by the nonlinear function $\tanh(\cdot)$ [46] to the set of single fault signals $S$, that is, $\tanh(AS)$. The nonlinear mixture is employed to mimic the vibration signals of realistic multiple faults. This work considers multifault signals based on only two types of faults to limit the combinatorial complexity. Specifically, any two of the three types of single faults with any fault size are selected to form the various multiple faults. The obtained measurement datasets are given in detail in Table 1. There are in total 13 categories and 7800 samples used in this study. Here, 5460 randomly selected samples and the remaining 2340 samples are used as the training set and testing set, respectively. Figure 7 provides the detection accuracy on the training dataset and testing dataset for the deep learning-based solutions. It is clearly seen from the training and testing results that the proposed LDCNN and the 1D CNN converge faster than the 2D CNN. Moreover, the testing accuracy of the LDCNN is higher than that of the 1D CNN, which demonstrates the effectiveness of the improved loss function. Because the multifault data are manually constructed and have already introduced noise and errors, evaluating noise immunity by adding Gaussian white noise may not reflect realistic situations and would be of little value; it is therefore omitted here.
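The construction of multiple-fault signals by a linear mixture followed by a tanh nonlinearity, as described above, can be sketched as follows. The function name `mix_faults`, the mixing weights, and the toy source signals are all illustrative assumptions; the mixing matrices actually used in [45] are not given in the text.

```python
import math


def mix_faults(sources, mixing_matrix):
    """Apply tanh to each linear combination of the source signals.

    sources: list of equally long single-fault signals (rows of S).
    mixing_matrix: list of weight rows (rows of A), one per mixed signal.
    """
    n_samples = len(sources[0])
    mixed = []
    for row in mixing_matrix:
        linear = [
            sum(weight * src[t] for weight, src in zip(row, sources))
            for t in range(n_samples)
        ]
        mixed.append([math.tanh(v) for v in linear])
    return mixed


# Toy single-fault signals and hypothetical mixing weights.
inner = [math.sin(2 * math.pi * 3 * k / 100) for k in range(100)]
ball = [math.sin(2 * math.pi * 7 * k / 100) for k in range(100)]
mixed = mix_faults([inner, ball], [[0.6, 0.4]])
```

Because tanh saturates, the mixed signal is bounded between −1 and 1 and is not a simple weighted sum of the sources, which is the property used to mimic the nonlinear coupling of realistic coexisting faults.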

Case 3: Validation with Realistic Data.
The proposed method is further evaluated using realistic wind farm measurements. In this case, the realistic vibration signals are measured and collected from operating wind turbines with a sampling rate of 25.6 kHz, and the rotating speed of each selected wind turbine is about 1100 rpm. In addition to the bearing fault, the gearbox is also prone to gear failure caused by harsh operating states and extreme environmental conditions. Here, five different conditions are measured from a group of fault-diagnosed wind turbine gearboxes, including the bearing rolling ball fault (B), the inner raceway fault (I), the gear fault (G), and the multiple faults compounded by the ball and gear fault (B&G) and by the inner race and gear fault (I&G). The measurements of the accelerometer installed at the high-speed shaft bearing on the motor side are recorded. As a result, 300 measurement samples of each operating condition are obtained to set up the measurement dataset containing 1500 samples in total. The performance of the proposed LDCNN is compared with the existing solutions, and the performance in terms of accuracy is provided in Table 2. It is evident from the numerical results that the developed LDCNN can achieve [...] Figure 8. It is noteworthy from the confusion matrix that multiple faults may be diagnosed as a single fault due to the variability in the size of each single fault. However, such a detection cannot simply be dismissed as a misclassification, because a multiple fault condition is itself one of the fault conditions.

Conclusion and Remarks
In this paper, a linear discriminant CNN-based diagnostic solution is proposed for efficient detection and diagnosis of multiple coexisting mechanical faults in operational wind turbines. The proposed solution is extensively assessed through simulation and experiments. In addition to the accurate diagnosis of multiple faults, the noise immunity of the proposed algorithmic model is enhanced to provide excellent performance under conditions with a low signal-to-noise ratio. For future work, the following two research directions will be examined in particular. To further facilitate the proposed solution, scalable data sample augmentation and accurate classification are required. Thus, the effectiveness and efficiency of the proposed solution need to be further validated and extensively assessed with massive data measurements of various other kinds of faults and operating conditions. A physical testbed that can simulate various kinds of bearing and gear faults should be established and used for further research. Also, more advanced data-driven optimization and machine-learning-based techniques can be developed and incorporated into the fault feature characterization and recognition.

Conflicts of Interest
The authors declare that they have no conflicts of interest.