Health Status Assessment for Wind Turbine with Recurrent Neural Networks

To improve the safety, efficiency, and reliability of large-scale wind turbines, many statistical and machine-learning models for wind turbine health monitoring systems (WTHMS) have been proposed based on SCADA variables. Data-driven WTHMS have been widely studied, with attention focused on predicting failures of the wind turbine or its primary components. However, the health status of a wind turbine often degrades gradually rather than suddenly, so the SCADA variables change continuously up to the occurrence of certain faults. Inspired by the ability of recurrent neural networks (RNN) to learn representations from raw sensory data, we introduce a hybrid methodology that combines the analysis of variance of each sequential SCADA variable with an RNN to assess the health status of a wind turbine. First, each original sequence is split by variance range into several categories to improve the generalization ability of the RNN. Then, a long short-term memory (LSTM) network is trained on the normal running sequences to learn the gradually changing situations. Finally, a weighted assessment method incorporating the health of the primary components is applied to judge the health level of the wind turbine. Experiments on real-world datasets from two wind turbines demonstrate the effectiveness and generalization of the proposed model.


Introduction
With the development of wind energy, sensing technologies, and wireless communications, a wide variety of SCADA data from wind turbines has been collected in recent years. According to [1,2], research on wind turbine health monitoring systems (WTHMS) is motivated by the need to detect faults and predict working conditions. A WTHMS builds its model from historically collected data and makes decisions on the online SCADA data of the monitored components, which enables it to update in real time.
The existing methods for WTHMS can be roughly classified into three categories: physical-based approaches [3], machine-learning approaches [4], and hybrid approaches [5]. The accuracy of physical-based approaches, and of hybrid approaches that include them, is limited by insufficient knowledge of the specific domain. As a result, physical-based approaches may fail when handling unknown operating mechanisms. In response to these problems, machine-learning methods based on monitoring data are used to characterize the degradation of a system without domain knowledge or specialized expertise [6]. Hence, machine-learning methods are more suitable for assessing the health degree of wind turbines, which are complex mechanical systems composed of many different components. Although many methods have been proposed for health monitoring in different applications, several disadvantages remain, as follows.
(1) Traditional machine-learning methods, such as support vector machines (SVM) [7], artificial neural networks (ANN) [8], and fuzzy methods [9], tend to yield a binary classification of the health status of machines (i.e., good or bad). Moreover, the current approaches may not be applicable because they neglect the nonlinearity and complex structure of the wind turbine when monitoring its health status [10].
(2) The cost of searching for the optimal parameter space of a chosen model is typically high, especially for complex and nonlinear systems.

Mathematical Problems in Engineering
Furthermore, the performance of traditional machine-learning methods can be influenced by many factors, such as environmental volatility, varying degradation modes, and operating discrepancies. Therefore, a single model trained on data collected under certain conditions may not generalize well to other conditions. As a potential solution, the analysis of variance is performed to differentiate the underlying trends in the data. Selecting training samples with a similar fluctuation level is crucial for the corresponding individual learning model.
As a powerful branch of machine learning, deep learning is characterized by its capability of learning hierarchical representations from raw input data by performing nonlinear transformations with multiple hidden layers [1]. Deep learning techniques have been successfully used in various applications, including natural language models [11-13], phoneme recognition [14], and acoustic classification [15]. Deep learning models are also actively applied in the field of health status assessment. A stacked denoising autoencoder (SAE) fed with frequency features extracted from time series data has been used to diagnose machinery faults [16]. A deep belief network (DBN) has adopted statistical and frequency-domain features for degradation assessment in accelerated bearing life tests [17] and for gearbox fault diagnosis [18]. Most existing models provide a binary value as the assessment result, in which one represents the normal status and zero represents the fault status. They cannot produce a quantitative analysis of the degree of degradation of individual components. In addition, the mentioned models all focus on a single component or a linear system. As a complex and nonlinear system, the wind turbine poses great challenges to evaluating the health status with the existing models. Therefore, to relieve the abovementioned drawbacks, we apply a novel model for machinery health status assessment from a numerical perspective.
In this paper, a novel architecture combining the analysis of variance with long short-term memory (LSTM) is proposed for WTHMS. The analysis of variance of each SCADA variable is first performed to categorize the training samples into different subspaces. Unlike traditional neural networks, LSTM, as a variant of the recurrent neural network, exploits an internal memory of the input data to analyze the intrinsic dependencies. Generally speaking, the dependencies in a time series can be classified into two types: short-term dependency and long-term dependency. Because the health status of a wind turbine has a long-term dependency, we leverage the LSTM to monitor the health status of the wind turbines via the sequences of SCADA variables. During the training process, the dependencies among the samples are embedded into the LSTM by feeding the SCADA value at each time point, together with the previous hidden state, to the hidden layer. In addition, we apply a weighted assessment method that jointly considers the discrepancies between the predicted and actual values of the eleven SCADA variables to define the health degree of wind turbines. Benefiting from the analysis of variance and the LSTM network, accurate predictions can be obtained, while the weighted assessment method provides a quantitative analysis of the health degree of the wind turbine by considering the major components. The experiments on two wind turbines show that the proposed model achieves reasonable accuracy and efficiency. This paper is organized as follows. In Section 2, a brief introduction to the proposed model is presented. Section 3 describes the analysis of variance and LSTM for monitoring the health status of wind turbines. Numerical results and the conclusion are presented in Sections 4 and 5, respectively.

Methods/Experimental
In this section, the proposed method is briefly introduced; its process is shown in Figure 1. The SCADA system, as a monitoring system, collects and reports various factors affecting the reliability of the wind turbine and is widely used for health status assessment. The analysis of variance splits the training samples into multiple subspaces according to the volatility level of each SCADA variable. The features selected from the synchronized sequences of SCADA data can be concatenated to describe multiple sensors at a given time. LSTM, as a variant of RNN, can model the long-term dependency of time series by storing the historical input and mapping the entire input to the target vector. Considering the discrepancies between the predicted and actual values, a weighted assessment method that assigns a smaller weight to a component with a larger error is applied for health monitoring. To demonstrate the generalization and effectiveness of the proposed model, comparisons with several state-of-the-art models are conducted on two tasks of assessing the health status of wind turbines. The main contributions of our work are as follows.
(1) The analysis of variance introduces the idea of classifying samples in the offline procedure and mapping them to a category in the online procedure. Based on the variance level of each SCADA variable, the sequence is divided into different categories to increase the diversity among the individual learners and to avoid overfitting.
(2) LSTM is proposed to capture the future and past context of each selected SCADA variable. The concatenated matrix can be viewed as the final representation generated in our proposed framework.
(3) The weighted assessment method is suited to multidimensional scenarios because it assigns different weights to the corresponding features, as verified in Section 4.
(4) Comparative experiments on two cases are conducted to explore the generalization ability of the proposed model.

The Proposed Model
Due to the harsh environment and complex operating structures, it is challenging to perform the monitoring task. Thus, this section introduces a novel hybrid method to assess the health degree of wind turbines.

Figure 1: The process of the proposed method.

The Analysis of Variance.
The selected SCADA data are denoted as $X = \{x_1, x_2, \ldots, x_m\}$, where $m$ is the dimension of the selected SCADA variables, $x_i$ is the sequence of the corresponding feature, and $n$ is the length of the sequence. The health of wind turbines or primary components degrades gradually rather than suddenly, so capturing the underlying trends of the SCADA variables helps to gauge the health degree of the machine. In the proposed methodology, the variance of each SCADA variable is first computed to categorize the training samples. Then, the variance series of each SCADA variable is sorted in ascending order, as illustrated in Figure 2.
As shown in Figure 2, $k$ efficacious intervals are defined between the minimum and maximum variance, which can be denoted as intervals $\{[v_0, v_1], [v_1, v_2], \ldots, [v_{k-1}, v_k]\}$. We can derive the following rule from the value of $v_j$.
If the variance of one training sample meets the condition $v_{j-1} \le \text{variance} \le v_j$, the training sample is added into the $j$th category.
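The grouping rule can be illustrated with a minimal sketch. It assumes equal-width intervals between the minimum and maximum variance, which is only one possible choice, since the grouping used in the experiments is selected by experience. The function names, the number of intervals, and the sample windows are hypothetical.

```python
from bisect import bisect_left
from statistics import pvariance

def variance_edges(variances, k):
    """Split [min, max] of the observed variances into k equal-width intervals."""
    lo, hi = min(variances), max(variances)
    step = (hi - lo) / k
    return [lo + j * step for j in range(k + 1)]

def category_of(variance, edges):
    """Return j in 1..k such that edges[j-1] <= variance <= edges[j]."""
    j = bisect_left(edges, variance)
    # Clamp so that the minimum, and any out-of-range value, maps to a valid category.
    return min(max(j, 1), len(edges) - 1)

# Hypothetical training samples: short windows of one normalized SCADA variable.
samples = [
    [0.50, 0.51, 0.49, 0.50],   # nearly flat
    [0.40, 0.60, 0.45, 0.55],   # moderate fluctuation
    [0.10, 0.90, 0.20, 0.80],   # strong fluctuation
]
variances = [pvariance(s) for s in samples]
edges = variance_edges(variances, k=3)

groups = {}
for sample, v in zip(samples, variances):
    groups.setdefault(category_of(v, edges), []).append(sample)
```

With these windows, the first two samples fall into category 1 and the third into category 3, so category 2 stays empty. Empty categories are one reason the interval boundaries may need tuning in practice.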
Gathering samples with a similar variance level into one group efficiently alleviates the influence of volatility in the sequence of each SCADA variable. Then, each LSTM whose parameter space is built in the corresponding category can have a better approximation capacity.

Recurrent Neural Network.
An RNN is a neural network in which the connections between neurons can form a directed cycle. Therefore, it can process arbitrary-length sequences by memorizing the dynamic information of input patterns. The structure of the traditional RNN is illustrated in Figure 3. It consists of an input layer, an output layer, and a hidden layer with recurrent connections. $h_t$ is used as the output of the hidden layer, which can hold the information from the input vector. The current hidden output $h_t$ is updated by receiving the input $x_t$ from the sample at the same time point and the previous hidden output $h_{t-1}$:

$$h_t = \varphi(W_h h_{t-1} + W_x x_t + b), \quad (3)$$

where $W_h \in \mathbb{R}^{d_h \times d_h}$, $W_x \in \mathbb{R}^{d_h \times d_x}$, and $b \in \mathbb{R}^{d_h}$ are viewed as the parameters of the RNN, with $d_h$ and $d_x$ the sizes of the hidden and input vectors, and $\varphi$ denotes the nonlinear activation function tanh. In (3), $h_{t-1}$ can be viewed as the memory of the previous sample. Thus, after modeling all the sequential data, the whole sequence is mapped into the hidden output at the last time step. The activation values of the output layer are computed from the hidden output through $\sigma$, the sigmoid activation function. As we can see from Figure 4, an RNN can be regarded as multiple copies of the same network.
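As a minimal sketch of the recurrent update, the code below applies $h_t = \tanh(W_h h_{t-1} + W_x x_t + b)$ step by step over a short sequence and returns the hidden output at the last time step. The parameter values and the input sequence are hypothetical, and plain Python lists stand in for matrices.

```python
import math

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One recurrent update: h_t = tanh(W_h h_{t-1} + W_x x_t + b)."""
    h_t = []
    for i in range(len(b)):
        pre = b[i]
        pre += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        pre += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        h_t.append(math.tanh(pre))
    return h_t

def rnn_encode(sequence, W_h, W_x, b):
    """Map a whole sequence to the hidden output at the last time step."""
    h = [0.0] * len(b)          # initial memory is empty
    for x_t in sequence:
        h = rnn_step(h, x_t, W_h, W_x, b)
    return h

# Hypothetical parameters: 2 hidden units, 1 input feature.
W_h = [[0.5, 0.0], [0.0, 0.5]]
W_x = [[1.0], [-1.0]]
b = [0.0, 0.0]
h_last = rnn_encode([[0.1], [0.2], [0.3]], W_h, W_x, b)
```

Because the same `rnn_step` is reused at every time point, the loop makes the "multiple copies of the same network" view explicit.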
Owing to vanishing or exploding gradients during backpropagation, the RNN may fail to capture the long-term dependencies of the sequence. To alleviate this drawback, the long short-term memory network introduces gate functions in the design of the nonlinear transformation.

Long Short-Term Memory Network.
LSTM has totally different submodules compared with the RNN. As shown in Figure 5, the standard LSTM network has three gates, namely the forget gate, the input gate, and the output gate, which control how much of the previous information is preserved. The LSTM model is built in the following steps.
The first step is to decide the degree to which the information in the cell state is forgotten. The decision is made by the forget gate as

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f). \quad (5)$$

According to (5), the LSTM model views $h_{t-1}$ and $x_t$ as the input vector and produces a number between 0 and 1 as the output. If the value is 1, the information is completely kept. If the value is 0, the information is totally forgotten.
The next step is to decide the degree to which new information is stored in the cell state. The updating values are controlled by the input gate as

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i), \qquad \tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C).$$

Then, the state value is calculated by combining the old state and the new candidate value as

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t.$$

In the final step, the cell state is first filtered by the output gate as

$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o).$$

Then, the output value is produced as

$$h_t = o_t \odot \tanh(C_t).$$

Unlike in the traditional RNN, the memory cell is updated in a linear way, which reduces the influence of vanishing gradients. Furthermore, the cell of the LSTM is regulated by three gates:
(1) The input gate receives the information of the current sample $x_t$.
(2) The forget gate determines the degree to which information is extracted from the historical data.
(3) The output gate controls the context output from the cell.
This design allows the LSTM model to remove invalid information over a long period.
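The three gates can be illustrated with a minimal sketch of a single LSTM step. It follows the standard LSTM formulation with the gates acting on the concatenation $[h_{t-1}, x_t]$. The parameter dictionary, the gate names `f`, `i`, `g`, `o`, and all numeric values are hypothetical and chosen only to make the effect of the forget gate visible.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]

def lstm_step(h_prev, c_prev, x_t, p):
    """One LSTM update; p maps gate names to (W, b) acting on [h_prev, x_t]."""
    z = h_prev + x_t                                   # concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(*p["f"], z)]       # forget gate
    i = [sigmoid(a) for a in affine(*p["i"], z)]       # input gate
    g = [math.tanh(a) for a in affine(*p["g"], z)]     # candidate values
    o = [sigmoid(a) for a in affine(*p["o"], z)]       # output gate
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

# Hypothetical parameters for 1 hidden unit and 1 input feature.
# A large forget bias keeps the old cell state; a large negative input bias blocks new input.
keep = {
    "f": ([[0.0, 0.0]], [100.0]),
    "i": ([[0.0, 0.0]], [-100.0]),
    "g": ([[0.0, 1.0]], [0.0]),
    "o": ([[0.0, 0.0]], [0.0]),
}
forget = dict(keep, f=([[0.0, 0.0]], [-100.0]))        # forget gate driven towards 0

h_keep, c_keep = lstm_step([0.0], [0.7], [0.5], keep)
h_forget, c_forget = lstm_step([0.0], [0.7], [0.5], forget)
```

With the `keep` parameters the cell state stays at its previous value of 0.7, whereas with the `forget` parameters it collapses to approximately zero, matching the interpretation of the forget gate output as a number between 0 and 1.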

The Weighted Assessment Method.
The wind turbine, as a complex and nonlinear system, is composed of several subsystems with dependent interactions, such as the gearbox, generator, and rotor. Assessing the health degree of the wind turbine should therefore use the SCADA data to monitor each subsystem, whose health state differs because of its distinct characteristics. Accordingly, the analysis of the degradation patterns of the wind turbine should capture the gradual process of deterioration in each significant component. The weighted assessment is adopted in this paper to reinforce the ability to gauge the different health statuses of wind turbines.
Compared with the existing binary fault diagnosis methods, the weighted assessment method allocates a fuzzy-type membership function as the weight for the corresponding component, where $|\Delta_i|$ is the Euclidean distance between the actual measured data and the predicted data from the LSTM model for the $i$th component. Consequently, a larger distance is assigned a smaller weight. The assessment of the wind turbine then combines the weighted components, where $h_j$ denotes the health of the wind turbine at the $j$th time point and $n$ is the number of components. In multiple-fault situations, the level of the health degree can be decided according to the class to which the wind turbine belongs. As shown in Figure 6, the health degree is divided into four classes. Level 4 indicates that the wind turbine works in normal status. Level 3 represents the fair state of the wind turbine. Level 2 shows that the wind turbine is going to fail. Finally, Level 1 raises red alerts, which mean that the wind turbine has failed and the alerts must be dealt with immediately.
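The following sketch illustrates the idea of assigning smaller weights to larger discrepancies. It is not the membership function of this paper: the linear ramp in `membership_weight`, the averaging in `health_degree`, and the thresholds passed to `health_level` are all hypothetical stand-ins chosen for illustration, and the actual class boundaries are those of Figure 6.

```python
def membership_weight(delta, scale=1.0):
    """Hypothetical decreasing membership: 1 at zero error, 0 at or beyond `scale`."""
    return max(0.0, 1.0 - abs(delta) / scale)

def health_degree(actual, predicted, scale=1.0):
    """Average the component weights at one time point (assumed aggregation)."""
    weights = [membership_weight(a - p, scale) for a, p in zip(actual, predicted)]
    return sum(weights) / len(weights)

def health_level(h, thresholds):
    """Map a health degree to Level 1..4 given three ascending thresholds."""
    return 1 + sum(h >= t for t in thresholds)

# Two hypothetical components: one predicted exactly, one off by 0.5.
h_j = health_degree(actual=[0.5, 0.5], predicted=[0.5, 0.0])
level = health_level(h_j, thresholds=(0.4, 0.6, 0.8))   # thresholds are placeholders
```

Here the perfectly predicted component receives weight 1.0 and the other 0.5, giving a health degree of 0.75, which the placeholder thresholds map to Level 3.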

Health Degree Assessment Based on the Proposed Model.
With the proposed model incorporating the analysis of variance and LSTM, the health degree assessment of the wind turbine is straightforward. Figure 7 illustrates the flowchart of the offline training and online evaluation. The two procedures are as follows:

(i) Offline Training
(1) Select the SCADA data for training the proposed model as the training sample.
(2) Normalize the training sample of each selected feature.
(3) Classify the training sample of each feature by the analysis of variance.
(4) Train the corresponding LSTM of the selected category.
(5) If there are remaining SCADA data which have not been trained to generate the LSTM, go to (4).
(6) Output all the parameters for the LSTM in each category.

(ii) Online Assessing
(1) Normalize the testing data and match the corresponding category according to the variance value.
(2) Obtain the output from the LSTM of each feature.
(3) Compute the discrepancies between the actual value from SCADA system and the predicted value from the proposed model.
(4) Perform the health degree assessment by the weighted method.
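The control flow of the two procedures can be sketched as follows. This is only an outline under strong assumptions: `PersistenceModel` is a trivial placeholder for the per-category LSTM, the interval boundaries in `edges` are hypothetical, and min-max scaling is assumed as the normalization.

```python
from bisect import bisect_left
from statistics import mean, pvariance

class PersistenceModel:
    """Placeholder for the per-category LSTM: predicts last value plus mean step."""
    def fit(self, windows, targets):
        self.step = mean(t - w[-1] for w, t in zip(windows, targets))
        return self

    def predict(self, window):
        return window[-1] + self.step

def min_max(values, lo, hi):
    """Normalize raw readings to [0, 1] given the known range of the variable."""
    return [(v - lo) / (hi - lo) for v in values]

def category(window, edges):
    """Match a window to a variance category, as in the offline classification."""
    j = bisect_left(edges, pvariance(window))
    return min(max(j, 1), len(edges) - 1)

def train_offline(windows, targets, edges):
    """Offline steps (3) to (6): group samples by category and fit one model each."""
    grouped = {}
    for w, t in zip(windows, targets):
        ws, ts = grouped.setdefault(category(w, edges), ([], []))
        ws.append(w)
        ts.append(t)
    return {c: PersistenceModel().fit(ws, ts) for c, (ws, ts) in grouped.items()}

def assess_online(window, actual, models, edges):
    """Online steps (1) to (3): match the category, predict, return the discrepancy."""
    model = models[category(window, edges)]   # raises KeyError for an untrained category
    return abs(actual - model.predict(window))

# Hypothetical data: raw readings on a 0..100 scale, normalized before use.
edges = [0.0, 0.01, 1.0]
windows = [min_max([50, 51, 52], 0, 100), min_max([10, 50, 20], 0, 100)]
targets = [0.53, 0.60]
models = train_offline(windows, targets, edges)
discrepancy = assess_online(min_max([60, 61, 62], 0, 100), 0.63, models, edges)
```

The discrepancy returned for each variable is what the weighted method in online step (4) would consume.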

Data Description.
This section evaluates the performance of the proposed model by concatenating the regression values of three significant subsystems of two wind turbines from a wind farm in Hebei province and then computing the discrepancies between the predicted and actual values. The wind turbine monitoring system can be roughly divided into three subsystems: the electrical subsystem, the temperature subsystem, and the control subsystem. Each subsystem, with its different components, has a different effect on the health status of the wind turbine. To assess the performance of the proposed model, eleven SCADA variables belonging to the three subsystems of the two wind turbines are used in this paper. The two experiments, which relate to different health degrees, each have 14690 training samples and 295 testing samples, where each sample records the mean value of the SCADA variables over the last 15 min. A total of 11 variables, including four continuous generator measurements, three gearbox features, and four other items, are selected as the assessing items, which are listed in Table 1. The basic parameters of the wind turbines are as follows: the cut-in wind speed is 3 m/s, the cut-out wind speed is 27 m/s, the rated wind speed is 15 m/s, and the rated power is 1.5 MW.

Performance Criterions.
Table 1: SCADA variables selected as the assessing items (number, variable, subsystem).
3   Generator temperature               temperature subsystem
4   Generator bearing temperature       temperature subsystem
5   Gearbox temperature                 temperature subsystem
6   Gearbox bearing temperature         temperature subsystem
7   Shaft bearing temperature           temperature subsystem
8   Battery box temperature             temperature subsystem
9   Top box temperature                 temperature subsystem
10  Hydraulic pressure                  electrical subsystem
11  Generator cooling air temperature   temperature subsystem

After the analysis of variance and the training of the LSTM are finished, performance criteria in terms of the root mean square error (RMSE), the standard deviation of the error (SDE), the mean absolute error (MAE), and the bias (BIAS) are estimated on the testing data, where $\hat{y}_i$ is the predicted value from the proposed model, $y_i$ is the actual value of the SCADA variable, and $N$ is the size of the testing data.
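The four criteria can be computed as in the sketch below. It assumes the common textbook definitions with the error taken as predicted minus actual and a divisor of $N$. The sign convention of BIAS and the divisor of SDE are assumptions, not conventions confirmed by this paper.

```python
import math

def error_metrics(predicted, actual):
    """RMSE, MAE, BIAS and SDE with errors defined as predicted - actual."""
    errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)
    bias = sum(errors) / n                                      # mean error
    rmse = math.sqrt(sum(e * e for e in errors) / n)            # root mean square error
    mae = sum(abs(e) for e in errors) / n                       # mean absolute error
    sde = math.sqrt(sum((e - bias) ** 2 for e in errors) / n)   # spread of errors around the bias
    return {"RMSE": rmse, "MAE": mae, "BIAS": bias, "SDE": sde}

# Hypothetical predictions and measurements for one SCADA variable.
m = error_metrics(predicted=[1.0, 2.0, 3.0, 4.0], actual=[1.0, 1.0, 4.0, 4.0])
```

Under these definitions the criteria are linked by RMSE² = BIAS² + SDE² for a single variable, so a model can have a small bias and still a large RMSE when its errors fluctuate strongly.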

Experimental Setting.
To further demonstrate the improvements derived from the proposed model, the traditional LSTM neural network, the ELMAN neural network, and the ELM neural network are selected as the benchmark models.
To verify the improvements gained from the analysis of variance, the traditional LSTM neural network, which has the same parameter space as the LSTM in each category of the proposed model, is fed with 10 inputs so that its generalization ability can be compared with that of the proposed model. It should be noted that we fix the configurations rather than fine-tuning them on different datasets in our experiment. The learning rate is set to 0.2, the number of training epochs is 500, and the size of the hidden layer is 10.
The ELMAN neural network, as a recurrent neural network, can memorize the dependencies in the SCADA variable series by feeding the outputs of the hidden layer back into the hidden layer. Therefore, the ELMAN neural network, which is composed of four layers (input layer, hidden layer, recurrent layer, and output layer), is suitable for modeling and predicting the fluctuation of SCADA variables. In this paper, the hidden layer is designed with 14 neurons and uses the sig function as the activation function. The prediction procedure of the ELMAN neural network can be expressed in terms of $h_t$, the output of the hidden layer, $x_t$, the input time series at time $t$, and $w_1$, $w_2$, $w_3$, the connecting weights between the input layer and hidden layer, between the recurrent layer and hidden layer, and between the hidden layer and output layer, respectively.
As a Single-Layer Feedforward Network (SLFN), the ELM neural network is gaining more and more attention due to its fast learning speed. Composed of one input layer, one hidden layer, and one output layer, the ELM neural network calculates the output weights via the Moore-Penrose inverse to avoid local optima and time-consuming training. To fairly assess the performance of the benchmark models, the ELM neural network also has 10 inputs and 14 hidden neurons, identical to the ELMAN neural network.
values of two candidate wind turbines from the proposed model, the traditional LSTM neural network, the ELMAN neural network, and the ELM neural network.
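The ELM training idea, random fixed hidden weights followed by a closed-form solve for the output weights, can be sketched with the standard library alone. Note the substitution: instead of the Moore-Penrose inverse, this sketch solves the ridge-regularized normal equations, which approach the pseudoinverse solution as the ridge term goes to zero. The function names, the ridge value, the seed, and the toy data with a single input are hypothetical.

```python
import math
import random

def solve(A, B):
    """Solve A x = B by Gaussian elimination with partial pivoting (B is a vector)."""
    n = len(A)
    M = [row[:] + [B[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def hidden(X, W, b):
    """Sigmoid hidden-layer outputs, one row per sample."""
    rows = []
    for sample in X:
        pre = [sum(w * x for w, x in zip(W[j], sample)) + b[j] for j in range(len(b))]
        rows.append([1.0 / (1.0 + math.exp(-a)) for a in pre])
    return rows

def elm_fit(X, t, n_hidden=14, ridge=1e-6, seed=0):
    """Draw random hidden weights, then solve (H^T H + ridge I) beta = H^T t."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = hidden(X, W, b)
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(h[i] * t_k for h, t_k in zip(H, t)) for i in range(n_hidden)]
    return W, b, solve(A, rhs)

def elm_predict(X, model):
    W, b, beta = model
    return [sum(bj * hj for bj, hj in zip(beta, h)) for h in hidden(X, W, b)]

# Toy regression with one input feature (the benchmark in the text uses 10 inputs).
X = [[k / 20] for k in range(20)]
t = [math.sin(2 * math.pi * x[0]) for x in X]
model = elm_fit(X, t)
pred = elm_predict(X, model)
sse = sum((p - y) ** 2 for p, y in zip(pred, t))
```

Only the output weights `beta` are learned, and no iterative gradient descent is involved, which is why training is fast compared with the recurrent benchmarks.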

Multiple-Variables
As shown in Figures 8-18, the selected variables listed in Table 1 characterize the deterioration of the components of the wind turbine and thus reflect the health degree of the wind turbine. The curves of generator speed, rotor speed, and hydraulic pressure show an obvious error in the front part, in which the generator speed and rotor speed drop gradually to zero and the hydraulic pressure falls by about 15 bar at the same time. Meanwhile, the other variables present similar fluctuations, which indicates that the health degree of the wind turbine may degrade during this interval. To further verify the performance of the forecasting model, the four criteria mentioned in Section 4.1 have been computed and are listed in Table 2. As we can see from Table 2, the criteria for generator speed, rotor speed, and hydraulic pressure show larger errors than those for the other variables. Combined with Figures 8-18, the curves predicted by the proposed model for the remaining features nearly overlap the actual ones. Therefore, the health state of the wind turbine may be influenced by the generator or the hydraulic pressure. Furthermore, comparative results of the traditional LSTM neural network, the ELMAN neural network, and the ELM neural network are presented in Tables 3, 4, and 5 to verify the efficiency of the proposed model.
As shown in Tables 3-5, compared with the traditional LSTM, the proposed model improves the RMSE, MAE, BIAS, and SDE by 0.582, 0.541, 0.442, and 0.364 in total over the eleven SCADA variables. This is mainly due to the analysis of variance, which categorizes the time series of SCADA variables to improve predictability. As for the ELMAN network and the ELM network, limited by their shallow architectures, their prediction performance falls far behind that of the proposed model.
To demonstrate the generalization of the forecasting model, we adopted another wind turbine as the experimental subject. By taking the training and testing samples having the same size with wind turbine 1, Figures 19-29 and
ELMAN network and ELM network produced the worst predicted results on the eleven SCADA variables. As for the performance criteria, compared with the proposed model, the ELMAN network increases the RMSE from 0.899 to 2.752, the BIAS from -0.26 to -1.204, and the SDE from 0.802 to 2.203.
As drawn in Figures 19-29, unlike for wind turbine 1, the middle and latter parts of the generator speed and rotor speed show a volatile trend. However, the predicted values of the hydraulic pressure are nearly identical to the actual values, which is totally different from the first case. From another angle, this illustrates the complex structure of the wind turbine and the need for an accurate health state assessment method.
power curves. The health degrees of wind turbine 1 in the testing time are mostly above 0.6, except for the front part, in which they gradually drop to zero. Meanwhile, in the same interval, the wind power falls from 1600 to 0. Combined with Figures 8-18, this suggests that the wind turbine may be in a fault state in this part. As for wind turbine 2, Figures 32-33 give a rough estimate of the fluctuating health trend under the operating environment. As defined in Section 3.4, in the middle and latter parts of the health degree curve of wind turbine 2, the values are around 0.3, which means the wind turbine would raise "red alerts." Generally speaking, wind turbine 1 tends to work more steadily than wind turbine 2 in the testing time.

Conclusion
In this paper, the analysis of variance is embedded into the LSTM network to predict SCADA data, which are treated as time-series data. Then, the weighted assessment method is introduced to evaluate several significant components and obtain the health degree of the wind turbine. The experimental results show that the performance of the proposed model is acceptable. However, to achieve the best assessment performance, normal running data must be collected first, which is a time-consuming task. In addition, in this paper, the optimal grouping by the analysis of variance is chosen by experience. Further research can focus on finding the optimal classification with intelligent algorithms to produce the best prediction performance.

Data Availability
The datasets generated and/or analysed during the current study are not publicly available due to security requirements but are available from the corresponding author on reasonable request.

Conflicts of Interest
The authors declare no conflicts of interest.