Fault Diagnosis Strategy for Wind Turbine Generator Based on the Gaussian Process Metamodel

Maintaining technological innovation and reducing the cost per kilowatt hour of the electricity generated by the wind turbine generator system (WTGS) are effective measures to facilitate the continuous development of the wind power industry. Improving the system availability of wind farms is therefore an important issue, since it can significantly reduce the operational cost. To improve the system availability, it is necessary to diagnose system faults of the wind turbine generator so as to find the key factors that influence the system performance and further reduce the maintenance cost. In this paper, a wind farm with 200 MW installed capacity in the eastern coastal plain of China is chosen as the research object. A prediction model of the wind farm's faults is constructed based on the Gaussian process metamodel. Comparison with actual observations shows that the constructed model can predict failure events of the wind turbine generator accurately. The developed model is further used to analyze the key factors that influence the system failure. These results help increase the operation and maintenance efficiency of wind farms, shorten downtime caused by failure, and increase the earnings of wind farms.


Introduction
Global warming has caused frequent climatic anomalies, extreme weather, and major natural disasters, bringing serious challenges to sustainable development. With energy demand and greenhouse gas emissions increasing significantly, energy transformation and development has become a major problem of great concern to the international community. As a representative renewable energy source, wind power is the fastest-growing renewable power generation technology in the world. Large-scale commercialization of wind power has gradually taken shape, and wind power generation is becoming a mature, cost-competitive technology [1][2][3][4][5][6]. The Renewables 2019 Global Status Report (GSR), released by the Renewable Energy Policy Network for the 21st Century (REN21), states that in 2018 the newly installed wind power capacity worldwide reached 51 GW and the cumulative installed capacity was 591 GW. Over the past decade, China has played the leading role in global wind power development. China accounted for 41% of the global newly installed wind power capacity in 2018, with 21.1 GW of newly installed capacity and 209.5 GW of cumulative installed capacity. China has become the country with the largest wind power scale in the world [7].
However, the rapid development of wind power generation in China depends to some extent on the Feed-in-Tariff (FIT) mechanism. Currently, the fixed electricity tariff policy is the main support mechanism for renewable energy sources in China, but the FIT mechanism cannot offer sustainable capital support to renewable energy projects of increasing scale. Wind power development in China is expected to largely shed its dependence on FIT between 2020 and 2022 [8]. In May 2018, the National Energy Administration stipulated that new centralized onshore wind power projects and offshore wind power projects shall both configure and determine the on-grid price [9].
To help the wind power industry cope with the challenge of FIT reduction, optimizing the management of wind farms is another effective strategy besides reducing the cost per kilowatt hour of the electricity generated by the WTGS through technological innovation. Facing the annual increase in installed capacity and generator running time, wind farm owners place higher requirements on maintenance decision management of wind turbine generators, control of operation and maintenance costs, and safe and sound operation of the generators [10,11]. Since the natural environment of a wind farm is generally harsh, the operating health state of the wind turbine generator fluctuates strongly. As the wind turbine generator is an important part of the power production system in the wind farm, its operation faults may significantly affect the whole production chain. Therefore, enhancing the operational availability of the wind turbine generator to increase the grid generation of the wind farm becomes extremely important.
To this end, the major factors that influence operational availability have to be identified first. There are various influencing factors such as wind, blade, rotor, bearing, generator, gearbox [12], pitch, and so on. They may bring about different types of faults, which include blade faults, generator faults, gearbox faults, pitch mechanism faults, and yaw system faults [13]. When a gear or bearing goes wrong, it can lead to gearbox faults and reduce the generator accuracy. If the yaw counter times out, the wind turbine generator is unable to find the best wind direction in time, which reduces power generation efficiency. The pitch angle can also be analyzed to predict faults of the pitch mechanism and so prevent blade damage and unstable power generation. These faults have to be predicted and diagnosed accurately by scientific methods so that targeted strategies can be formulated. However, actual maintenance practice for wind turbine generators shows that there is a highly nonlinear relationship between the turbine fault and the relevant factors. A traditional linear model has difficulty capturing this relationship. Hence, it is very important to construct a nonlinear model for turbine fault analysis.
There are various nonlinear models which are extensively used, including the artificial neural network (ANN) [14][15][16], support vector machine (SVM) [17,18], Gaussian process (GP) metamodel [19][20][21], and adaptive network-based fuzzy inference system (ANFIS) [22]. As a nonlinear dynamic system, the neural network can map nonlinear functions and has been widely utilized in pattern recognition and classification, such as mapping a variety of factors to different types of faults. The back propagation neural network is the most commonly used network, and it is a flexible tool for solving complex identification problems in fault diagnosis of the wind turbine. It has good abilities of parallel distributed processing, self-learning, associative memory, and precise nonlinear pattern recognition [16]. However, there are many types of faults in the operation of a wind farm, which makes the network structure difficult to determine or leads to too many nodes; it is therefore inconvenient to train this model. SVM is a universal learning method that evolved from statistical learning theory. Nonlinear SVM has been used to appraise the deviation of prediction to reflect the potential risk factors in short-term wind power. The research shows that wavelet transform-support vector machine (WT-SVM) models have better accuracy than the traditional radial basis function (RBF) SVM models due to their multiresolution features [12]. However, the choice of the optimal wavelet basis is not straightforward. The Gaussian process metamodel has been widely used in many different fields due to its good mathematical characteristics and flexibility [23,24]. Although the Gaussian process metamodel has also been applied in system fault analysis, research on its applications in wind turbine generator fault analysis is still at the initial stage. In this paper, the Gaussian process metamodel is constructed for prediction of the wind turbine generator fault.
This model is used to analyze relationships between different factors and turbine fault and to predict the probability of turbine fault occurrence under different states. The main contributions of this paper can be summarized as follows: The rest of this paper is organized as follows: the description of the wind turbine system is given in Section 2. The development of the Gaussian process metamodel and its application to fault diagnosis of the wind turbine generator are given in Section 3. A case study is given in Section 4, and the conclusion is provided in Section 5.

Wind Turbine System
The wind turbine system is composed of several different subsystems, which makes the system quite complex. The five main subsystems are the driving system, electrical control system, pitch system, yawing system, and cooling system. The failure of any subsystem may shut down the whole system. These subsystems are described below based on the double-fed induction generator.
The driving system, which is used to raise the rotation speed through the main gearbox, usually includes the main shaft, gearbox, and coupling. As the driving ratio is quite high, the output speed of the gearbox is raised by a factor of more than 100. If an error happens in any component of this system, it may lead to vibration or gear wheel damage.
The electrical control system is used to monitor all the signals of the wind turbine and adjust the grid frequency. This system, especially the converter system, is the key part of the wind turbine. This is because the electric power can be transported stably from the wind turbine to the main grid only when the frequency adjusted by the converter system meets the grid's requirement. Any error of this system may make the wind turbine fail to cut in.
The pitch system is used to capture the wind energy in an optimal way. Through intelligent control according to the different wind speeds and directions, the pitch system can be optimized to capture the maximal wind energy. To support the optimization of control, there are many protection sensors in the pitch system. If some important sensors are broken, it may lead to shutdown or overspeed of the wind turbine. Therefore, the maintenance of sensors in the pitch system is quite important during operation.
The yawing system is used to detect the wind direction. The yawing system usually contains four or six yawing motors and some sensors. When the wind direction changes, the motors work and adjust the wind turbine to the optimal direction where the pitch system can capture the highest wind energy. To trigger this motion, the signal has to be successfully sent from the wind vane. If the wind vane or one of these motors has an error, the wind turbine cannot find the correct wind direction, and the wind turbine will stop.
The cooling system has two types: water cooling and air cooling. Water cooling is usually more efficient than air cooling. The function of this system is to decrease the temperature of the electrical control system and the generator. Once the running temperature is higher than the system setting, the wind turbine stops to avoid a persistent increase of temperature and to protect the internal devices.

Fault Diagnosis with a Gaussian Process Metamodel
The Gaussian process metamodel is a statistical approximation of the original system based on the input and output data. When using the metamodel, the original system can be viewed as a black-box model. The metamodel is used to analyze relationships in the system without requiring an understanding of the internal structure of the system. To apply the Gaussian process metamodel for fault diagnosis, the input includes the fault factors (the factors that cause the fault) such as the wind power mismatch failure, pitch charger failure, and cable switch failure. Specifically, each fault factor has its standard setting value, and the input values to the model can be represented as the percentage of deviation from the standard setting value. The output of interest in the study is the occurrence of different fault levels under different fault systems. As the fault level depends on the failure time, the essential output of interest is the failure time of each fault. After the input and output of the system are identified, the data for these inputs and outputs are collected from real system observations. Next, the details of Gaussian process metamodel development and the application for fault diagnosis are provided.
Let y(x) be the system output for the input x. In our case study, x represents the deviation from the standard setting value for the considered fault factors, and y(x) represents the failure time for the wind turbine system. Specifically, four fault systems are considered for the wind turbine system, including the pitch system, cooling system, converter, and others. For each fault system, x represents the corresponding fault factors, and y(x) represents the failure time for this fault system. For instance, in the cooling system, x denotes the deviation values for low inlet valve temperature and generator temperature comparison failure, and y(x) denotes the failure time for this cooling system. A Gaussian process metamodel can be developed for each fault system. Therefore, four Gaussian process metamodels can be developed for the four fault systems. As the model development procedure is the same for different fault systems, a generic output y(x) and its corresponding input x are taken as an example to show the model development.
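As a small illustration of how the model input x could be formed, the sketch below converts raw readings into percentage deviations from their standard setting values. The factor names, standard settings, and readings are hypothetical and are not taken from the wind farm data.

```python
def percent_deviation(measured, standard):
    """Deviation of a measured value from its standard setting, in percent."""
    if standard == 0:
        raise ValueError("standard setting value must be non-zero")
    return 100.0 * (measured - standard) / standard


# Hypothetical standard settings and readings for two cooling-system factors.
standard_settings = {"inlet_valve_temperature": 40.0, "generator_temperature": 80.0}
readings = {"inlet_valve_temperature": 37.0, "generator_temperature": 86.0}

# Input vector x for the cooling-system metamodel (one entry per fault factor).
x = [percent_deviation(readings[name], standard_settings[name])
     for name in standard_settings]
print(x)
```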
With the Gaussian process model, y(x) is assumed to be a Gaussian process, which is specified by its mean function m(x) and its variance (covariance) function Σ(x, x′). The mean function m(x) can be taken as a constant in most real applications. The covariance function Σ(x, x′) describes the spatial correlation among different input points and can be expressed in different forms [25]. A comparative study of different covariance functions is performed by Pandit and Infield [26]. Here, the commonly used Gaussian covariance function is adopted due to its flexibility. Specifically, the Gaussian covariance can be expressed as follows:

$$\Sigma(x, x') = \tau^{2} R(x, x'), \qquad (1)$$

where τ² is an unknown variance and R(x, x′) is the correlation function. The correlation function can be further expressed as

$$R(x, x') = \exp\left(-\theta \lVert d \rVert^{2}\right), \qquad (2)$$

where θ is an unknown parameter and ‖d‖ is the Euclidean distance between the two input points x and x′. Under this Gaussian process assumption, the metamodel can be constructed from the system input and output data, and the predictive function of the model output can be deduced. If y* is the output value under the unobserved input setting x*, the predictive function of y* can be expressed as

$$y^{*} \mid y \sim N\left(m(x) + \Sigma_{*}^{T} \Sigma^{-1} \left(y - m(x)\right),\; \Sigma_{**} - \Sigma_{*}^{T} \Sigma^{-1} \Sigma_{*}\right), \qquad (3)$$

Mathematical Problems in Engineering
where y is the vector of actually observed data, m(x) is the mean function, Σ is the covariance between the observed points, Σ* is the covariance between the unobserved point and the observed points, and Σ** is the variance of the unobserved point. The form of the variance function is given by equation (1). It can be seen from this predictive function that the prediction at the unobserved point follows the normal distribution with the given mean and variance. The predictive mean is $m(x) + \Sigma_{*}^{T} \Sigma^{-1} (y - m(x))$, which can be used to represent the expected failure time at input setting x*. The predictive variance is $\Sigma_{**} - \Sigma_{*}^{T} \Sigma^{-1} \Sigma_{*}$, which can be used to measure the predictive error of the failure time. One advantage of the Gaussian process metamodel is that it can easily carry out uncertainty assessment so as to provide more robust results [27]. In this paper, the observed failure time is treated as deterministic, and there are not enough data to measure the noise of the observations. Therefore, the predictive variance only accounts for the spatial uncertainty due to the model development. An extension that considers the stochastic model can be found in the work of Pandit and Infield [28].
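The predictive mean and variance stated above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes a constant mean, the Gaussian correlation exp(−θ‖d‖²), fixed values of τ² and θ, and a tiny hypothetical data set of deviations (in percent) and failure times (in days). The function names and the small `jitter` term added for numerical stability are also assumptions.

```python
import numpy as np


def gaussian_cov(A, B, tau2, theta):
    """tau2 * exp(-theta * ||a - b||^2) for every pair of rows in A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return tau2 * np.exp(-theta * sq_dist)


def gp_predict(X, y, X_star, mean, tau2, theta, jitter=1e-10):
    """Predictive mean and variance at X_star for a constant-mean GP."""
    Sigma = gaussian_cov(X, X, tau2, theta) + jitter * np.eye(len(X))
    Sigma_star = gaussian_cov(X, X_star, tau2, theta)  # shape (n, n_star)
    L = np.linalg.cholesky(Sigma)
    # alpha = Sigma^{-1} (y - m), computed through the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean))
    V = np.linalg.solve(L, Sigma_star)
    pred_mean = mean + Sigma_star.T @ alpha
    # Sigma_** equals tau2 on the diagonal because R(x, x) = 1.
    pred_var = tau2 - (V ** 2).sum(axis=0)
    return pred_mean, np.maximum(pred_var, 0.0)


# Hypothetical observations: input deviation (%) and failure time (days).
X = np.array([[-8.0], [-3.0], [0.0], [4.0], [9.0]])
y = np.array([5.0, 1.5, 1.0, 2.0, 6.0])
X_star = np.array([[-3.0], [6.0]])  # one observed and one unobserved setting

pred_mean, pred_var = gp_predict(X, y, X_star, mean=y.mean(),
                                 tau2=y.var(), theta=0.05)
print(pred_mean, pred_var)
```

Because the observations are treated as deterministic, the prediction at an already observed setting reproduces the observed failure time with essentially zero variance, while the variance grows at settings far from the data.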
There are several unknown parameters in the developed Gaussian process metamodel and the derived predictive function, such as τ² and θ. These parameters are often treated as constants using values obtained from estimation methods such as maximum likelihood estimation [29]. However, there are uncertainties in these estimates, and these uncertainties may significantly influence the model accuracy and predictive performance [30]. To account for these uncertainties, a Bayesian estimation method is proposed, which lends itself to uncertainty analysis. Let ξ denote the unknown parameters that need to be estimated and g(ξ) denote the prior for these parameters. Different priors can be assigned, such as a noninformative prior or a conjugate prior [31]. In this paper, a conjugate prior is assumed for the unknown parameters due to its mathematical convenience. That is, the Gaussian process mean is assumed to be conditionally normal, the variance τ² is assumed to follow an inverse Gamma distribution, and the parameter θ is assumed to follow a Gamma distribution. Given the likelihood that the observed data are normally distributed conditional on the unknown parameters, the posterior distribution of the unknown parameters can be derived as the following equation based on the Bayesian theory:

$$p(\xi \mid y) = \frac{L(y \mid \xi)\, g(\xi)}{\int L(y \mid \xi)\, g(\xi)\, d\xi}, \qquad (4)$$

where L(y | ξ) denotes the likelihood function. The posterior distribution can then be used to make inference about the unknown parameters. For instance, the unknown parameters can be estimated as the posterior mean or mode. The uncertainties of these parameters can also be taken into account by integrating out these parameters with respect to the posterior distribution using numerical integration methods (e.g., the Markov chain Monte Carlo method). Given the estimated parameters, the developed model can be used for further investigation. For instance, the developed metamodel can be used to analyze and predict the occurrence of different fault levels under different fault systems. The details of applying the proposed method in wind power system fault diagnosis are illustrated in the case study.
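One possible way to sample from such a posterior is a random-walk Metropolis scheme, sketched below for τ² and θ only. Several simplifications are assumed here and are not taken from the paper: the mean is fixed at the sample mean instead of receiving its own prior, the prior hyperparameters `a_tau`, `b_tau`, `a_theta`, `b_theta` are arbitrary, the sampler works on the log scale, the correlation is taken as exp(−θ‖d‖²), and the data are hypothetical.

```python
import numpy as np


def log_posterior(log_params, X, y, mean,
                  a_tau=2.0, b_tau=2.0, a_theta=2.0, b_theta=2.0):
    """Unnormalised log posterior of (log tau2, log theta)."""
    tau2, theta = np.exp(log_params)
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    Sigma = tau2 * np.exp(-theta * sq_dist) + 1e-8 * np.eye(len(X))
    try:
        L = np.linalg.cholesky(Sigma)
    except np.linalg.LinAlgError:
        return -np.inf  # reject numerically singular covariance matrices
    z = np.linalg.solve(L, y - mean)
    # Gaussian log-likelihood; log|Sigma| = 2 * sum(log(diag(L))).
    log_lik = (-0.5 * z @ z - np.log(np.diag(L)).sum()
               - 0.5 * len(y) * np.log(2.0 * np.pi))
    log_prior_tau2 = -(a_tau + 1.0) * np.log(tau2) - b_tau / tau2   # inverse Gamma
    log_prior_theta = (a_theta - 1.0) * np.log(theta) - b_theta * theta  # Gamma
    log_jacobian = log_params.sum()  # change of variables to the log scale
    return log_lik + log_prior_tau2 + log_prior_theta + log_jacobian


def metropolis(X, y, mean, n_iter=2000, step=0.3, seed=0):
    """Random-walk Metropolis sampler; returns draws of (tau2, theta)."""
    rng = np.random.default_rng(seed)
    current = np.zeros(2)  # start at tau2 = theta = 1
    current_lp = log_posterior(current, X, y, mean)
    samples = np.empty((n_iter, 2))
    for i in range(n_iter):
        proposal = current + step * rng.normal(size=2)
        proposal_lp = log_posterior(proposal, X, y, mean)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = np.exp(current)
    return samples


# Hypothetical observations: input deviation (%) and failure time (days).
X = np.array([[-8.0], [-3.0], [0.0], [4.0], [9.0]])
y = np.array([5.0, 1.5, 1.0, 2.0, 6.0])

samples = metropolis(X, y, mean=y.mean())
posterior_mean = samples[500:].mean(axis=0)  # discard the first 500 draws as burn-in
print(posterior_mean)
```

The posterior mean computed from the retained draws is one point estimate; averaging predictions over the draws would instead integrate the parameters out, as described in the text.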

Case Study
In this case, the fault occurrence of the wind turbine generator is studied based on the Gaussian process metamodel. Four Gaussian process metamodels are developed for four fault systems, respectively, including the pitch system, cooling system, converter, and others. With the developed metamodels, the relationship between the identified fault factors and the occurrence of different fault levels for different fault systems is analyzed. Furthermore, fault occurrences are predicted in the operation process of the wind farm using the developed metamodels. The fault types considered are summarized in Table 1, mainly including the pitch system, cooling system, converter, and others (e.g., cable switch and torque comparison). Under different fault types, the system output considered in the prediction model is the failure time. Then, the fault levels can be identified based on the predicted failure time. Four fault levels are defined according to failure time: level I (continuous failure time of unit generator ≥ 7 days), level II (failure time = 4-6 days), level III (failure time = 2-3 days), and level IV (failure time ≤ 1 day).
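The mapping from a failure time to a fault level can be written as a short function. The level boundaries follow the definitions above; how fractional predicted values that fall between the integer day ranges are treated (here they are assigned to the less severe level) is an assumption of this sketch.

```python
def fault_level(failure_days):
    """Map a failure time in days to fault level I-IV.

    Level I: >= 7 days, level II: 4-6 days, level III: 2-3 days,
    level IV: <= 1 day. Fractional values between the integer ranges
    (e.g. 6.5 days) fall into the less severe level; this is an assumption.
    """
    if failure_days < 0:
        raise ValueError("failure time must be non-negative")
    if failure_days >= 7:
        return "I"
    if failure_days >= 4:
        return "II"
    if failure_days >= 2:
        return "III"
    return "IV"


# Hypothetical predicted failure times in days.
predicted_days = [8, 5, 2, 1, 0.5]
levels = [fault_level(d) for d in predicted_days]
print(levels)
```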

Fault Analysis.
In this case, 347 fault events for the wind turbine generator took place in the wind farm from 2015 to 2017. The failure events of the wind turbine generator are sent to the server as analog signals, which are then converted to digital signals and shown as error codes on the interface screen. According to the error code, engineers first analyze these errors in the internal system and then carry out on-site checks. In this way, the input values of the fault factors, the failure time, and the fault levels for the corresponding fault system are recorded, and these data can be used to develop the Gaussian process metamodel for the different fault systems. Statistics on the occurrence frequency of the four fault types are shown in Figure 1. Clearly, faults in the pitch system occur significantly more often than the other three fault types.
According to the fault classification standards in this case, statistics on the occurrence frequency of different fault levels for the whole system (including the four fault systems) in this wind farm are compiled. Results are shown in Figure 2. The occurrence frequency of level III and level IV is more than 3 times higher than that of level I and level II.
[Figure 2: Occurrence frequency of fault levels: fault level I, 10%; fault level II, 12%; fault level III, 39%; fault level IV, 39%.]

Based on the data obtained from the wind farm from April 2015 to March 2017, four Gaussian process metamodels are developed for the four fault systems. The developed metamodels are further used for prediction and compared with the observed data. Different fault levels for different fault systems of the wind farm are predicted by the developed Gaussian process metamodels. The occurrence frequency predicted by the model and the actual observed occurrence frequency are shown in Figures 3-6. All four fault types have their own distribution characteristics over the fault levels. In the pitch system, cooling system, and converter, the occurrence frequency of level III and level IV is more than 3 times higher than that of level I and level II. However, for the "others" fault type, no significant differences in occurrence frequency are observed among the fault levels. Although the occurrences of the four fault levels predicted by the model do not exactly match the actual observation values, the difference is very small. Based on further analysis of the root-mean-square error (RMSE) of the model prediction of the four fault levels (Table 2), no significant difference exists between model prediction and actual observation. In addition, the RMSEs of the four fault systems also reflect the prediction accuracy of the proposed model. The proposed model has the highest prediction accuracy in the pitch system, followed by the cooling system, converter, and others. This order is consistent with the difference in data size among the four fault systems in this case. Hence, it could be speculated that increasing the fault data size of the research objects could increase the prediction accuracy of the model effectively. Note that the Gaussian process metamodel may not perform well if the data size is too small or too large. If the data size is too small, the developed model may not be accurate. If the data size is too large, the model can be computationally expensive. In this case study, the data size is appropriate for the model development.
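For reference, the RMSE used here, together with the mean absolute error (MAE) and R-square that are also reported in Table 2, can be computed as in the following sketch. The observed and predicted occurrence counts are hypothetical and do not reproduce the values in Table 2.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_square(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical occurrence counts for fault levels I-IV of one fault system.
observed_counts = [4, 5, 16, 15]
predicted_counts = [5, 5, 15, 16]

rmse_value = rmse(observed_counts, predicted_counts)
mae_value = mae(observed_counts, predicted_counts)
r2_value = r_square(observed_counts, predicted_counts)
print(rmse_value, mae_value, r2_value)
```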
The performance of the developed Gaussian process metamodels is further compared with the neural network, where a two-layer feedforward neural network with adaptive parameters and bias units is used. The RMSEs of the four fault systems using the neural network are given in Table 2. The results indicate that the RMSEs using the neural network are similar to the values using the Gaussian process metamodels. The t-test indicates that there is no significant difference between the two methods. Therefore, the two models have similar predictive performance using the obtained training data. The mean absolute error (MAE) and the R-square are also computed to assess the performance of the models. The results shown in Table 2 indicate that the Gaussian process metamodels are slightly better than the neural networks. Furthermore, the data obtained from July 2017 to September 2017 are used to validate the developed models.
The RMSEs for different fault systems using different models are given in Table 2. The results indicate that the Gaussian process metamodels have smaller RMSE values, and the test results show that the difference is significant. Therefore, the Gaussian process metamodels have better predictive performance than the neural network using the validation data in this case study. The MAE and R-square are also computed, and the results in Table 2 show that the Gaussian process metamodels have better performance. As a result, the constructed Gaussian process metamodel can predict the fault type and fault level of the wind turbine generator accurately. In addition, compared with the neural network, the Gaussian process metamodels are easier to implement, and they can account for the uncertainties in the prediction.
The developed Gaussian process metamodel is further used to analyze the effects of the deviation of different fault factors on the system failure time. The results are shown in Figure 7. It can be seen that the failure time is longer with larger input deviation for most factors, except factor 6 (pitch safety chain failure). The reason is that the observed failure time is quite long for a single event with acceptable deviation, whereas the average failure time for events with larger input deviation is short. Factors 3, 6, and 12 have relatively longer failure times compared with other factors. Therefore, the fault level is more sensitive to these factors, and they have a larger probability of causing a high-level fault. The probabilities of different fault levels are further calculated when the deviation of each factor is below −5%, within −5% to 5%, and above 5%. The results are given in Table 3.
The results show that when the input deviation is within −5% to 5%, there is usually a smaller probability of a high-level fault (longer failure time). When the input deviation becomes larger (below −5% or above 5%), the probability of a high-level fault becomes larger. Therefore, it is important to keep the input deviation under control in order to improve the turbine generator system's performance.
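A simple way to tabulate fault-level probabilities by deviation band is to count empirical frequencies, as sketched below. This is only an illustration under stated assumptions: the paper may derive the probabilities in Table 3 from the metamodel's predictive distribution instead, the event data are hypothetical, and the helper names `deviation_band`, `fault_level`, and `level_probabilities` are introduced here for the example.

```python
from collections import Counter, defaultdict


def deviation_band(deviation):
    """Assign an input deviation (in percent) to one of three bands."""
    if deviation < -5:
        return "below -5%"
    if deviation > 5:
        return "above 5%"
    return "within -5% to 5%"


def fault_level(failure_days):
    """Fault level from failure time in days (boundary handling is assumed)."""
    if failure_days >= 7:
        return "I"
    if failure_days >= 4:
        return "II"
    if failure_days >= 2:
        return "III"
    return "IV"


def level_probabilities(events):
    """Empirical probability of each fault level within each deviation band."""
    counts = defaultdict(Counter)
    for deviation, failure_days in events:
        counts[deviation_band(deviation)][fault_level(failure_days)] += 1
    return {band: {level: n / sum(c.values()) for level, n in c.items()}
            for band, c in counts.items()}


# Hypothetical events for one fault factor: (deviation in %, failure time in days).
events = [(-8, 7), (-6, 3), (-2, 1), (0, 1), (3, 2), (4, 1), (7, 5), (9, 8)]
probabilities = level_probabilities(events)
for band, dist in probabilities.items():
    print(band, dist)
```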

Conclusions
This paper studies wind turbine generator fault events using the Gaussian process metamodel. The Gaussian process metamodels are developed to predict the failure time and the different fault levels of different fault systems given the fault factors. In a specific case study, relationships among different relevant factors in the fault event are analyzed, and the fault prediction model for the wind farm is constructed. According to the comparison between model prediction results and actual observation results, the proposed model can predict the fault level for different fault types in the wind turbine generator accurately. However, model prediction accuracy is related to the data size in the case study. Usually, the model is more accurate with a larger data size. However, it also depends on the accuracy of the data. If the data obtained are not accurate, the model may perform even worse with more data. The developed model is further used to analyze the effects of different factors on the fault occurrence. The results indicate that some factors have a higher impact on the fault level and a larger probability of leading to a high-level fault, including pitch position comparing fault, pitch safety chain failure, and torque comparison failure. These factors should be maintained more carefully in order to reduce the maintenance cost. The fault factor safety chain failure would cause a high-level fault with smaller input deviation. This is due to the data obtained from the wind farm. A small deviation of some input factors may cause a longer failure time. It should be noted that the observed failure time may also be influenced by other factors such as the reaction time of the engineers. Therefore, it is important to identify the failure time that is [...] The developed Gaussian process metamodel provides an efficient way to analyze the impacts of fault factors without conducting experiments on real systems.
In addition, using the Gaussian process metamodel in the operation management of the wind farm can accurately predict fault events that are going to happen, so that maintenance staff can adopt effective protection measures or formulate maintenance schemes in advance to ensure safe and sound operation of wind turbines. This helps reduce the failure time caused by faults and increase the power generation of the wind turbine generator, as well as the earnings of wind farms.
In this paper, the Gaussian process metamodel is developed to analyze the system fault of the wind turbine generator for one wind farm. A potential future work is to apply this method to other wind farms to further validate the applicability of the proposed method. Another possible future work is to compare the proposed method with other metamodel-based methods such as the artificial neural network. A comprehensive comparison is needed to study the advantages of different metamodel-based methods for fault diagnosis of the wind turbine generator.

Data Availability
The "Fault data of 100 wind turbine generators with 2 MW unit capacity in this wind farm from 2015 to 2017" used to support the findings of this study are included within the supplementary information file.

Conflicts of Interest
The authors declare that they have no conflicts of interest.