A New Wind Power Forecasting Approach Based on Conjugated Gradient Neural Network

Prediction of the output power of wind farms is of great significance for operating a power system that contains a large number of wind generators. Based on the prediction results, it is possible to set power-generation quotas for generators and to allocate resources scientifically and reasonably. In the past, the Grey Neural Network was widely applied to wind power prediction, but it could hardly meet engineering requirements because of the structure of the ANN. Slow convergence and a large number of iterations, especially on large-scale data, pose challenges to power prediction and to the responsiveness of automatic control. This paper optimizes the ANN model by applying conjugate gradient descent in the weight-updating process, creating the Conjugated Gradient Neural Network (CGNN). Experiments on datasets of different scales show that CGNN performs substantially better: the average number of iterations decreases by almost 90% without sacrificing prediction accuracy.


Introduction
In the context of the global energy and ecological crisis, developing clean energy has become an unavoidable path for China and for the whole world. With technological progress and falling costs, wind power has become one of the most rapidly developing clean energy sources. China is a leading nation in wind power development, and the Chinese wind market accounts for approximately 26.3% of the global total [1]. To distribute the workload of wind generators, predicting wind power is essential for connecting wind farms to power grids. Currently, there are two ways to predict the output power of wind generators. One is to predict wind power output from forecasted environmental variables, such as wind speed and air humidity. The other is to predict the power of the next few hours or days directly from historical power data. Chinese researchers have carried out a considerable amount of research on wind power generation, and several methods have been proposed to forecast the power of a wind power station, including the Persistence Method [2], Kalman Filtering Method [3], ARMA [4][5][6], ANN [7][8][9][10][11], Fuzzy Logic [12], Grey Theory [13,14], Expert Systems, Wavelet Decomposition [15,16], SVM [17,18], Nearest Neighbor Analysis, and Spatial Correlation Models [13,17,19].
The Kalman Filtering Method is a common prediction method. It constructs a state-space model with wind speed as a state variable, and once the model is established, power can be predicted with the Kalman filtering algorithm. However, the result relies on the assumption that the statistical characteristics of the noise are known, whereas in reality these characteristics are hard to estimate. The stochastic time series method uses a large amount of historical data to build a prediction model, and its theory is mature and simple. It is suitable for short-term prediction but does not work well for long-term prediction [16,17,20]. Grey Theory can resolve some uncertain problems by studying a small amount of data and information. In particular, GM(1, 1) needs only a little data, so it can be used to predict wind speed over a short time period. However, it does not work well at catastrophe points [16,20,21]. The Spatial Correlation Method must take many factors into account, so it needs large-scale datasets. Its accuracy is acceptable, but the method is still under development [16,17,20]. The fluctuation of wind power caused by the uncertainty and intermittency of wind has an increasing impact on power grids, challenging power quality and stability as well as the control of power generation [16,17,22-24]. It is therefore necessary to develop a precise and fast wind power prediction approach.
ANN has been used in forecasting because of its ability to approximate nonlinear mappings [8][9][10][11]. Recently, several models combining ANN with other methods have been developed to increase prediction accuracy. Reference [25] hybridizes the Fifth-Generation Mesoscale Model (MM5) with ANN to forecast wind speed over the next 48 hours. Reference [26] uses a Moving Window and ANN to forecast the linear and nonlinear parts of financial market data, respectively. Reference [27] proposes a Neural Network-Markov Chain (MK) model to forecast second-scale and hour-scale wind speed, in which MK forecasts the long-term wind speed and ANN forecasts the short-term wind speed. Reference [28] combines coral reefs optimization (CRO) with the Extreme Learning Machine (ELM) to perform short-term wind speed forecasting. The results show that the ELM approach greatly reduces the computation cost, and the CRO-ELM model converges after only 35 iterations. Reference [29] improves on [28] by combining CRO with the Harmony Search (HS) operator; the error of the CRO-HS-ELM model is 10% lower than that of the CRO-ELM model. Decomposition-based hybrid methods have also been proposed for wind speed prediction. For instance, [30] recommends combining empirical mode decomposition (EMD) with ANN: EMD divides the input data into smaller sub-series, which are used as inputs to ANNs that forecast partial results, and these partial results are finally integrated into the wind speed forecast. Reference [31] puts forward a more complex decomposition-forecasting model. It uses fast ensemble empirical mode decomposition to divide the data into sub-series. It then applies a Multilayer Perceptron neural network (optimized by Genetic Algorithms and the Mind Evolutionary Algorithm) to forecast the partial results, which are integrated to form the final forecast.
Some studies have also compared the performance of ANNs used in forecasting. Reference [32] compares three types of ANNs: Feed Forward Back Propagation (FFBP), Radial Basis Function (RBF), and Adaptive Linear Element (ADALINE) neural networks. In hourly wind speed forecasting, the RBF network achieves the lowest forecasting error and the fastest convergence, while the BP network performs the worst. The Steepest Descent Method is usually applied to update weights in a traditional BP neural network. However, it makes network training too long to satisfy the needs of automatic control.
This paper incorporates the Conjugated Gradient Descent Method into a BP neural network to obtain the Conjugated Gradient Neural Network (CGNN). In this way, the number of iterations needed for convergence can be reduced by almost 90%. In addition, because it can avoid falling into local minima, it improves the accuracy and reliability of the neural network. Part I of this paper introduces the structure and algorithm procedure of the wind power prediction system.
Part II analyzes ANN and the Conjugated Gradient Method and then presents the modeling procedure for wind power prediction. Finally, the test data are discussed, together with a comparison of the results of several ANNs, including CGNN, SGNN, RBF, and ELM.

Wind Power Prediction System Construction
In reality, many factors affect the output power of wind generators, and each has a different effect. Under these conditions, predicting wind power with a linear regression algorithm can hardly achieve a good result. One current research trend is to apply nonlinear algorithms to wind power prediction. A neural network can approximate nonlinear functions by updating the weight matrices between layers. This paper exploits the strong nonlinear mapping ability of ANN to predict wind power. The ANN can be optimized by improving its internal structure, which can greatly improve prediction accuracy. The specific ANN used is a Back Propagation Neural Network whose cost function is the sum of the squared errors.
The optimization algorithm used in this paper is the Conjugated Gradient Descent Method, which can substantially improve the convergence speed and accuracy of the ANN. Figure 1 shows the algorithm procedure.

Artificial Neural Network.
A neural network can approximate a nonlinear mapping to any required precision and uncover unknown information in data. Structurally, a neural network distributes both data storage and computation, so systems built on neural networks are considerably robust and able to solve difficult problems. The Steepest Gradient Method is a common optimization method for updating weights in present artificial neural network systems. It makes fast progress toward a local minimum. However, it often performs poorly when the goal is the global minimum, so the global minimum is hard to reach with this method. This paper introduces a BP neural network based on the Conjugate Gradient Method and explains why this method is used instead of the Steepest Gradient Method for weight updating. It resolves the slow convergence caused by the Steepest Gradient Method and converges well toward the global minimum in the ANN system.

Algorithm 1: Conjugate Gradient Method.
(1) Take the initial point $x^{(0)} \in \mathbb{R}^{n}$.
(2) If $\nabla f(x^{(0)}) = 0$, take $x^{(0)}$ as the solution; otherwise, calculate the Hesse matrix $A$ of $f(x)$.
(3) Calculate the initial conjugate direction $d^{(0)} = -\nabla f(x^{(0)})$.
(4) Find a proper $\lambda$ that minimizes $f(x^{(0)} + \lambda d^{(0)})$.
(5) Calculate the best step by $\lambda_0 = -\dfrac{\nabla f(x^{(0)})^{T} d^{(0)}}{d^{(0)T} A d^{(0)}}$ and set $x^{(1)} = x^{(0)} + \lambda_0 d^{(0)}$.
(6) If $\nabla f(x^{(1)}) = 0$, take $x^{(1)}$ as the solution; else, if $\nabla f(x^{(1)}) \neq 0$ and $\nabla f(x^{(1)})^{T} d^{(0)} = 0$, search for a vector $d^{(1)}$ in the subspace spanned by $\nabla f(x^{(1)})$ and $d^{(0)}$ such that $d^{(1)T} A d^{(0)} = 0$.
(7) Set $d^{(1)} = -\nabla f(x^{(1)}) + \beta_0 d^{(0)}$,
(8) where $\beta_0 = \dfrac{\nabla f(x^{(1)})^{T} A d^{(0)}}{d^{(0)T} A d^{(0)}}$.
…
(11) Loop to step (3).

Conjugated Gradient Descent Method. The Conjugated Gradient Descent (CGD) Method is intermediate between the Steepest Gradient Descent (SGD) Method and Newton's Method. Using only first-order derivatives, it converges quickly and also avoids the main drawback of Newton's Method: computing the Hesse matrix and its inverse. CGD is one of the most effective methods for solving large systems of linear equations and one of the most valuable methods for large-scale optimization. It requires relatively little storage. It also has the property of finite-step convergence: for a quadratic objective in $n$ variables, it terminates in at most $n$ iterations in exact arithmetic, which makes it stable. Furthermore, CGD does not require any externally set parameters. It is shown in Algorithm 1.
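Algorithm 1 can be made concrete on a quadratic $f(x) = \tfrac{1}{2}x^{T}Ax - b^{T}x$, whose Hesse matrix is the constant matrix $A$. The sketch below runs the algorithm on such a function and compares it with steepest descent using the same exact line search. It is a minimal illustration under that quadratic assumption, not the network-training code used in this paper. The function names and the test matrix are hypothetical.

```python
import numpy as np


def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """Minimize f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite).

    Follows Algorithm 1: exact step length, then a new direction that is
    A-conjugate to the previous one. Returns (x, iterations).
    """
    x = np.asarray(x0, dtype=float).copy()
    max_iter = 10 * len(b) if max_iter is None else max_iter
    g = A @ x - b                    # gradient of f at x
    d = -g                           # step (3): start along steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:  # steps (2)/(6): stationary point found
            return x, k
        Ad = A @ d
        lam = -(g @ d) / (d @ Ad)    # step (5): exact minimizer along d
        x = x + lam * d
        g = A @ x - b
        beta = (g @ Ad) / (d @ Ad)   # step (8): coefficient beta
        d = -g + beta * d            # step (7): new A-conjugate direction
    return x, max_iter


def steepest_descent(A, b, x0, tol=1e-10, max_iter=100_000):
    """Same quadratic, but always move along -gradient (exact step length)."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(max_iter):
        g = A @ x - b
        if np.linalg.norm(g) < tol:
            return x, k
        lam = (g @ g) / (g @ (A @ g))
        x = x - lam * g
    return x, max_iter


A = np.diag([1.0, 10.0, 100.0])      # hypothetical ill-conditioned example
b = np.ones(3)
x_cg, it_cg = conjugate_gradient(A, b, np.zeros(3))
x_sd, it_sd = steepest_descent(A, b, np.zeros(3))
print(it_cg, it_sd)
```

On this example, CG stops after at most three iterations (the dimension of the problem), while steepest descent needs many more. This is the convergence gap that motivates CGNN.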

Theoretical Model
4.1. Factor Selection. In reality, many factors affect wind power. According to [8], wind power can be expressed as
$$P = \frac{1}{2} C_p \rho A V^{3}, \qquad (1)$$
where $P$ is the output power of the wind turbine, $C_p$ is the rotor power coefficient, $\rho$ is the atmospheric density, $A$ is the swept area of the rotor, and $V$ is the wind speed. Among these, wind speed is the most influential factor [9]. Also, wind farms usually adopt an array layout, so the front turbines extract energy from the wind and create a wake for the turbines behind them. Because of this wake effect, the output power of the wind farm decreases for specific wind directions [8]. Therefore, wind direction is also an influential factor.
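The sketch below works through (1) by evaluating the power available to a single turbine. The rotor diameter, power coefficient, and air density are hypothetical illustrative values, not data from the wind farms studied here.

```python
import math


def wind_power(cp, rho, rotor_diameter, v):
    """Equation (1): P = 0.5 * Cp * rho * A * V**3, in watts for SI inputs."""
    swept_area = math.pi * (rotor_diameter / 2.0) ** 2   # A, in m^2
    return 0.5 * cp * rho * swept_area * v ** 3


# Hypothetical turbine: 80 m rotor, Cp = 0.4, sea-level air density 1.225 kg/m^3
p = wind_power(cp=0.4, rho=1.225, rotor_diameter=80.0, v=10.0)
print(f"{p / 1e6:.2f} MW")   # 1.23 MW
```

Power grows with $V^{3}$, so doubling the wind speed multiplies $P$ by eight. This is consistent with wind speed being the dominant factor.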
According to (1), $\rho$ is also an important factor, and it is positively correlated with the output power. Atmospheric density is in turn influenced by temperature, humidity, and atmospheric pressure, so these factors also need to be considered.
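The dependence of $\rho$ on pressure, temperature, and humidity can be illustrated with the ideal-gas law for moist air. This sketch is an assumption for illustration only: the paper feeds the raw measurements to the network rather than computing $\rho$. The Tetens approximation for saturation vapour pressure is a standard formula swapped in here, and the function names are hypothetical.

```python
import math

R_DRY = 287.05    # J/(kg K), specific gas constant of dry air
R_VAPOR = 461.5   # J/(kg K), specific gas constant of water vapour


def saturation_vapour_pressure(temp_c):
    """Tetens approximation over water, in Pa (temperature in deg C)."""
    return 610.78 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def air_density(pressure_pa, temp_c, rel_humidity):
    """Moist-air density (kg/m^3) from pressure, temperature, RH in [0, 1]."""
    t_k = temp_c + 273.15
    p_v = rel_humidity * saturation_vapour_pressure(temp_c)  # vapour partial pressure
    p_d = pressure_pa - p_v                                  # dry-air partial pressure
    return p_d / (R_DRY * t_k) + p_v / (R_VAPOR * t_k)


print(round(air_density(101325.0, 15.0, 0.0), 3))   # 1.225
print(air_density(101325.0, -20.0, 0.5) > air_density(101325.0, 15.0, 0.5))  # True
```

Cold, dry air is denser. By (1), a turbine therefore produces more power at the same wind speed under such conditions.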
To sum up, the seven factors affecting wind power can be represented by the vector $X_t$, defined as
$$X_t = \left(V_t, \cos\theta_t, \sin\theta_t, p_t, h_t, T_{\min,t}, T_{\max,t}\right), \qquad (2)$$
where $V_t$ is the average measured wind speed during time slot $t$; $\cos\theta_t$ and $\sin\theta_t$ are the cosine and sine of the wind direction $\theta_t$; and $p_t$, $h_t$, $T_{\min,t}$, and $T_{\max,t}$ are the average measured atmospheric pressure, the average measured humidity, and the minimum and maximum temperature during time slot $t$.
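Vector (2) encodes wind direction by its cosine and sine rather than by the raw angle. As a result, directions of 359° and 1°, which are physically adjacent, map to nearby inputs. Below is a minimal sketch of building $X_t$ for one time slot. The function name, the degree convention for wind direction, and the sample readings are assumptions.

```python
import math


def feature_vector(v_mean, direction_deg, pressure, humidity, t_min, t_max):
    """Build X_t = (V_t, cos theta_t, sin theta_t, p_t, h_t, T_min, T_max)."""
    theta = math.radians(direction_deg)
    return (v_mean, math.cos(theta), math.sin(theta),
            pressure, humidity, t_min, t_max)


# Hypothetical readings for two slots that differ only in wind direction
a = feature_vector(8.0, 359.0, 101.3, 0.45, -12.0, -4.0)
b = feature_vector(8.0, 1.0, 101.3, 0.45, -12.0, -4.0)
gap = math.hypot(a[1] - b[1], a[2] - b[2])   # distance in (cos, sin) space
print(len(a), round(gap, 4))   # 7 0.0349
```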

Data Collection.
As explained in the previous parts, the factors that affect the output power of wind turbines include wind speed, wind direction, atmospheric pressure, humidity, and temperature, and data on all of them need to be collected. This paper collects these data from a wind farm over two months, from 6 am to 12 am each day, at 15-minute intervals. The collected data contain the seven influential factors together with the corresponding wind power. With an additional timestamp serving as an index, an 8-tuple $D_t$ is formed and saved in the database:
$$D_t = \left(t, V_t, \cos\theta_t, \sin\theta_t, p_t, h_t, T_{\min,t}, T_{\max,t}\right), \qquad (3)$$
where $t$ is the timestamp. Collected values may be null because of the instability of the data acquisition equipment and the intermittency of the wind farm, so these "dirty" records need to be removed.

Artificial Neural Network Model Construction. First, wind power data are characterized by nonlinearity and randomness. This paper therefore adopts BPNN as the basic structure because of its excellent performance in nonlinear mapping, generalization, and fault tolerance. Second, the input and output parameters must be clarified before the number of nodes in the input and output layers is determined. The number of input nodes is determined by the influential factors; since this paper selects seven factors, there are seven input nodes. There is one output node, because the output power is the only quantity to predict. The data of the influential factors are used as input data, and the target output is the corresponding wind power. The cost function is the sum of the squared errors between the output value and the target output value. Finally, the empirical equation (4) can be used to determine the number of hidden-layer nodes:

$$m = \sqrt{n + l} + \alpha, \qquad (4)$$
where $m$ is the number of hidden-layer nodes, $n$ and $l$ are the numbers of input and output nodes, and $\alpha$ is an adjustment constant that is usually determined by training the neural network on the sample data. Based on (4), and weighing the trade-off between accuracy and convergence speed, this paper sets $m$ to four. The resulting CGNN model is shown in Figure 2.
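Equation (4) and the resulting 7-4-1 layout can be sketched as follows. With $n = 7$ and $l = 1$, $\sqrt{8} \approx 2.83$, so $\alpha = 1$ rounds to the choice $m = 4$ made above. The sigmoid hidden layer and linear output are assumptions, since the activation functions are not stated here. The random weights are placeholders, and the function names are hypothetical.

```python
import math
import numpy as np


def hidden_nodes(n_in, n_out, alpha):
    """Empirical rule (4): m = sqrt(n + l) + alpha, rounded to an integer."""
    return round(math.sqrt(n_in + n_out) + alpha)


def forward(x, w1, b1, w2, b2):
    """One pass through an n-m-l network (sigmoid hidden layer, linear output)."""
    h = 1.0 / (1.0 + np.exp(-(x @ w1 + b1)))
    return h @ w2 + b2


n, l = 7, 1
m = hidden_nodes(n, l, alpha=1)
rng = np.random.default_rng(0)
w1, b1 = rng.normal(0.0, 0.1, (n, m)), np.zeros(m)   # placeholder weights
w2, b2 = rng.normal(0.0, 0.1, (m, l)), np.zeros(l)
x = rng.random((5, n))                               # 5 normalized tuples X_t
print(m, forward(x, w1, b1, w2, b2).shape)           # 4 (5, 1)
```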
In the feature-scaling equation (5), $x_{ij}$ is the value of feature $j$ in the $i$th tuple, $x'_{ij}$ is its scaled value, $x_j^{\max}$ is the maximum of feature $j$, and $x_j^{\min}$ is its minimum.
After normalization, the training procedure can start. The cost function is defined as
$$E = \sum_{i=1}^{N} \sum_{k=1}^{l} \left(t_k(i) - y_k(i)\right)^{2}, \qquad (6)$$
where $N$ is the sample size, $l$ is the number of output nodes, $t_k(i)$ is the target output value of node $k$ for sample $i$, and $y_k(i)$ is the actual output value. The Conjugate Gradient Method is then applied to obtain acceptable weights for the model.
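The sketch below shows one way to apply conjugate gradients to cost (6) for the 7-4-1 network. Cost (6) is not quadratic in the weights, so the sketch uses the nonlinear Polak-Ribiere (PR+) variant with a backtracking line search. This is a standard generalization swapped in for the exact step of Algorithm 1. The activations, data, and all names are assumptions, not the paper's implementation.

```python
import numpy as np

N_IN, N_HID, N_OUT = 7, 4, 1
SHAPES = [(N_IN, N_HID), (N_HID,), (N_HID, N_OUT), (N_OUT,)]
N_PARAMS = sum(int(np.prod(s)) for s in SHAPES)     # 37 weights and biases


def unpack(theta):
    """Split the flat parameter vector into W1, b1, W2, b2."""
    parts, i = [], 0
    for shape in SHAPES:
        size = int(np.prod(shape))
        parts.append(theta[i:i + size].reshape(shape))
        i += size
    return parts


def sse_and_grad(theta, X, Y):
    """Cost (6) and its gradient by back-propagation."""
    w1, b1, w2, b2 = unpack(theta)
    h = 1.0 / (1.0 + np.exp(-(X @ w1 + b1)))        # sigmoid hidden layer
    err = (h @ w2 + b2) - Y                          # linear output minus target
    d_out = 2.0 * err                                # dE/d(output)
    d_h = (d_out @ w2.T) * h * (1.0 - h)             # back through the sigmoid
    grad = np.concatenate([(X.T @ d_h).ravel(), d_h.sum(axis=0),
                           (h.T @ d_out).ravel(), d_out.sum(axis=0)])
    return float(np.sum(err ** 2)), grad


def train_cg(X, Y, theta, iters=200, tol=1e-10):
    """Nonlinear conjugate gradient (PR+) with Armijo backtracking."""
    loss, g = sse_and_grad(theta, X, Y)
    d = -g
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                               # lost descent: restart
            d = -g
        step, slope = 1.0, g @ d
        while True:
            new_theta = theta + step * d
            new_loss, new_g = sse_and_grad(new_theta, X, Y)
            if new_loss <= loss + 1e-4 * step * slope:
                break
            step *= 0.5
            if step < 1e-12:                         # no progress along d
                return theta, loss
        beta = max(0.0, new_g @ (new_g - g) / (g @ g))   # PR+ coefficient
        theta, loss, g = new_theta, new_loss, new_g
        d = -g + beta * d                            # next conjugate direction
    return theta, loss


rng = np.random.default_rng(1)
X = rng.random((60, N_IN))                           # hypothetical normalized inputs
Y = 0.3 + 0.4 * X[:, [0]] ** 2                       # hypothetical normalized power
theta0 = rng.normal(0.0, 0.1, N_PARAMS)
loss0, _ = sse_and_grad(theta0, X, Y)
theta, loss = train_cg(X, Y, theta0)
print(f"SSE {loss0:.3f} -> {loss:.4f}")
```

The restart to $-\nabla E$ whenever the direction stops being a descent direction, together with the PR+ clipping of $\beta$ at zero, are common safeguards for nonlinear CG.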
The prediction steps are as follows:
(1) Standardize the data according to (5).
(2) Update the weights of the ANN.
(3) Compare the target fault rate with the actual fault rate. If the actual rate is larger than the target, return to step (2).
(4) Forecast the relative (normalized) value of wind power.
(5) De-standardize the predicted values.
(6) Calculate the fault and analyze it.
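Step (6) can be sketched as below. The per-sample faults are summarized with the Max, Min, Aver, and Sum statistics reported in Table 2. Treating the fault as the absolute difference in MW between predicted and actual power is an assumption, consistent with the MW units reported in the conclusions. The sample values are made up.

```python
import numpy as np


def fault_statistics(actual_mw, predicted_mw):
    """Absolute fault per sample, summarized as Max, Min, Aver, and Sum (MW)."""
    faults = np.abs(np.asarray(predicted_mw, dtype=float)
                    - np.asarray(actual_mw, dtype=float))
    return {"Max": float(faults.max()), "Min": float(faults.min()),
            "Aver": float(faults.mean()), "Sum": float(faults.sum())}


stats = fault_statistics([120.0, 95.5, 40.0], [118.0, 96.5, 40.0])
print(stats)   # {'Max': 2.0, 'Min': 0.0, 'Aver': 1.0, 'Sum': 3.0}
```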

Samples and Analysis
The data for this paper come from two wind farms in Inner Mongolia, China, collected in January 2014 and March 2014. These wind farms have a total installed capacity of 1,500 MW. As mentioned above, data are collected from 6 am to 12 am at 15-minute intervals. The January data are used to train the network, and the trained model is used to predict the March data.
To make the advantages of the Conjugate Gradient Method clearer and more intuitive, we compared the Conjugated Gradient Neural Network (CGNN), Steepest Gradient Neural Network (SGNN), Radial Basis Function Neural Network (RBFNN), and Extreme Learning Machine (ELM) in terms of time and accuracy. To demonstrate the robustness of the model, this paper also uses data from both wind farm I and wind farm II, and the test is performed on the two datasets separately. Because of the performance limits of the data acquisition equipment and the instability of the wind farms, the test data contain some empty records.
To eliminate these disturbances, we removed the empty records. This yielded 131 groups of mid-term data from wind farm I and 548 groups of long-term data from wind farm II. The details of the neural networks used in forecasting are shown in Table 1.

Accuracy Analysis.
To improve accuracy, we should first identify the causes of errors. In wind power prediction, one controllable error source is the model structure. Another is measurement error, which may strongly affect prediction but depends on the devices, so it is not discussed in this paper. The last is the instability of the wind farm, which means the output power may differ even under the same conditions; this is reflected in the robustness comparison among the different networks.
In this paper, the long-term power forecasting results of the different networks are compared in Figures 3, 4, and 5, and the mid-term forecasting results in Figures 6, 7, and 8. In Figures 3 and 4, the green curve represents the actual output power, the blue curve the prediction of CGNN, the red curve that of SGNN, the yellow curve that of RBF, and the black curve that of ELM.
According to Figure 3, the blue curve (CGNN) is close to the green curve (actual output) when the power is between 50 MW and 1200 MW. In contrast, the red curve (SGNN) approaches the green curve only when the power is below 50 MW, and it has a rather large error when the power is between 700 MW and 900 MW.
Similarly, Figure 4 shows that the yellow curve (RBF) is near the green curve only when the power is below 50 MW, and it has a rather large error when the power is between 700 MW and 1100 MW. Finally, Figure 5 shows that the black curve (ELM) comes near the green curve only when the power is below 40 MW, and it has a rather large error when the power is between 40 MW and 1100 MW.
Figure 6 shows that, with mid-term data, the blue curve almost overlaps the green curve except for some minor disturbances. These are probably due to wind generators shutting down, an event the model cannot identify. Figure 6 also shows that CGNN performs better than SGNN, because its predictions depart less from the actual data. Similarly, Figure 7 shows that the yellow curve (RBF) departs more from the actual output when it is 40 MW-60 MW and 70 MW-90 MW, while the CGNN curve overlaps the actual output curve. Figure 8 shows that the black curve (ELM) departs more in the ranges 0 MW-20 MW and 100 MW-120 MW. Table 2 lists the numerical long-term and mid-term prediction errors of the different networks: the maximum (Max), minimum (Min), average (Aver), and sum (Sum) of the errors. The results indicate that the fault rate of CGNN is much lower than that of SGNN.

Iteration and Convergence Analysis.
In an automatic control system, the time spent on prediction should be minimized as long as the accuracy is acceptable. In engineering practice, the ANN usually works dynamically, which means it must be retrained every time a prediction is performed. Training runs for many epochs until the model converges, and because the networks have different structures, the work done in each epoch differs from network to network. To measure the time consumption of each network, we predefined an error limit on the mean squared error (MSE). The total training time is measured from the start of the algorithm (input of the first row of data) to the end of the algorithm (reaching the predefined limit). Table 3 shows the training time of the four networks in long-term and mid-term prediction when the MSE limit is 0.004. Table 4 shows the same when the MSE limit is 0.002.
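The timing protocol above trains until the MSE falls to a preset limit and measures wall-clock time from the first epoch. It can be sketched as a small harness. The `run_epoch` callback is hypothetical: it stands for one training epoch of any of the four networks and returns the current MSE. The fake trainer is for demonstration only.

```python
import time


def time_to_threshold(run_epoch, mse_limit, max_epochs=100_000):
    """Run epochs until the training MSE reaches mse_limit.

    Returns (epochs_run, elapsed_seconds, reached_limit).
    """
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        if run_epoch() <= mse_limit:
            return epoch, time.perf_counter() - start, True
    return max_epochs, time.perf_counter() - start, False


def make_fake_epoch(initial_mse=0.1):
    """Stand-in trainer whose MSE halves every epoch (demonstration only)."""
    state = {"mse": initial_mse}

    def run_epoch():
        state["mse"] /= 2.0
        return state["mse"]

    return run_epoch


for limit in (0.004, 0.002):            # the two limits used in Tables 3 and 4
    epochs, seconds, ok = time_to_threshold(make_fake_epoch(), limit)
    print(limit, epochs, ok)            # 0.004 -> 5 epochs, 0.002 -> 6 epochs
```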
Table 3 shows that, when the MSE limit is 0.004, CGNN has the shortest training time and SGNN the longest, in both mid-term and long-term forecasting. The training time of CGNN is even less than that of ELM (29 s versus 49 s and 12 s versus 27 s). Table 4 indicates that when the MSE limit is changed to 0.002, CGNN still has the shortest training time and SGNN the longest. With an MSE limit of 0.002, CGNN converges after 158 s and 68 s in long-term and mid-term prediction, respectively. In long-term prediction, the training time of CGNN is 76.97%, 73.31%, and 56.23% less than that of SGNN, RBF, and ELM, respectively. In mid-term prediction, it is 82.42%, 76.87%, and 61.36% less. With an MSE limit of 0.004, CGNN converges after 29 s and 12 s in long-term and mid-term prediction, respectively. In long-term prediction, the training time of CGNN is 58.57%, 49.12%, and 40.81% less than that of SGNN, RBF, and ELM. In mid-term prediction, it is 72.09%, 66.67%, and 55.56% less.
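The percentages above are relative reductions, $(t_{\text{other}} - t_{\text{CGNN}})/t_{\text{other}}$. As a quick check, the sketch below uses the ELM comparison quoted in the text (29 s versus 49 s, 12 s versus 27 s). It reproduces the 55.56% figure. For the long-term case it gives 40.82%, which the text reports as 40.81%, apparently truncated rather than rounded. The function name is hypothetical.

```python
def reduction_percent(t_cgnn, t_other):
    """Percentage by which CGNN's training time is below another network's."""
    return 100.0 * (t_other - t_cgnn) / t_other


# ELM comparison at MSE = 0.004, using the times quoted in the text
print(round(reduction_percent(29, 49), 2))   # long-term: 40.82
print(round(reduction_percent(12, 27), 2))   # mid-term: 55.56
```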
The results show that CGNN performs well in both long-term and mid-term wind power prediction. Its training time is considerably shorter than those of SGNN, RBF, and ELM. Overall, this work lays a foundation for future research on the automatic control of grid-connected wind power.

Figure 1: Flow chart of the forecasting process.

4.4. Training Strategy. To remove the influence of physical units and to speed up convergence, the data should be normalized before the NN model is trained. In this paper, the feature scaling method is adopted to bring all values into the range [0, 1]:
$$x'_{ij} = \frac{x_{ij} - x_j^{\min}}{x_j^{\max} - x_j^{\min}}. \qquad (5)$$
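Equation (5) and the de-standardization in step (5) of the prediction procedure can be sketched as follows. The guard for constant features and the choice to fit the minimum and maximum on the training month only are assumptions, as are the function names and sample values.

```python
import numpy as np


def fit_minmax(train):
    """Per-feature minimum and maximum, computed on the training data only."""
    train = np.asarray(train, dtype=float)
    return train.min(axis=0), train.max(axis=0)


def _span(x_min, x_max):
    # Guard against constant features, whose range would otherwise be zero.
    return np.where(x_max > x_min, x_max - x_min, 1.0)


def scale(data, x_min, x_max):
    """Equation (5): x' = (x - x_min) / (x_max - x_min)."""
    return (np.asarray(data, dtype=float) - x_min) / _span(x_min, x_max)


def unscale(scaled, x_min, x_max):
    """Inverse of (5), used to de-standardize predicted values."""
    return np.asarray(scaled, dtype=float) * _span(x_min, x_max) + x_min


train = np.array([[2.0, 990.0], [6.0, 1000.0], [10.0, 1010.0]])  # e.g. V, p
lo, hi = fit_minmax(train)
print(scale(train, lo, hi))                              # columns mapped onto [0, 1]
print(unscale(scale([[12.0, 1005.0]], lo, hi), lo, hi))  # round trip
```

Reusing the training-month minimum and maximum for the test month keeps the two scalings consistent. Test values outside the training range then fall slightly outside [0, 1], as with the 12 m/s wind speed above, which scales to 1.25.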

Figure 3: The prediction results of CGNN and SGNN with long-term data.

Figure 4: The prediction results of CGNN and RBF with long-term data.

Figure 5: The prediction results of CGNN and ELM with long-term data.

Figure 6: The prediction results of CGNN and SGNN with mid-term data.

Figure 7: The prediction results of CGNN and RBF with mid-term data.

Figure 8: The prediction faults of CGNN and ELM with mid-term data.
To evaluate the accuracy and convergence speed of the Conjugate Gradient Neural Network precisely, this paper constructs two different neural network models: one based on the Steepest Gradient Method and the other based on the Conjugate Gradient Method.

Table 1: Models of the different networks.

Table 2: The faults of the different networks.

Table 3: The training time consumption of four models when MSE equals 0.004.

Table 4: The training time consumption of four models when MSE equals 0.002.

Prediction of wind power is of great significance for connecting wind farms to power grids. This paper optimizes the neural network model with the Conjugate Gradient Method, making run-time control of wind power possible. According to the experimental results, the Conjugate Gradient Method dramatically improves both the convergence speed and the prediction accuracy of neural networks. In mid-term prediction, CGNN performs best, with Min, Max, Aver, and Sum faults of 0.23 MW, 27.62 MW, 2.54 MW, and 283.95 MW, respectively. In long-term prediction, CGNN also performs best, with Min, Max, Aver, and Sum faults of 0.40 MW, 66.18 MW, 19.03 MW, and 10714.14 MW, respectively. The results show that the model deviation decreases after Conjugated Gradient Descent (CGD) is adopted in the ANN model.