Knowledge Mining Based on Environmental Simulation Applied to Wind Farm Power Forecasting

1 Research Institute of Technology Economics Forecasting and Assessment, School of Economics and Management, North China Electric Power University, Beijing 102206, China
2 School of Management, Zhejiang University, Hangzhou 310058, China
3 MOE Key Laboratory of Regional Energy Systems Optimization, S&C Academy of Energy and Environmental Research, North China Electric Power University, Beijing 102206, China


Introduction
With increasing resource constraints and environmental pressure, wind power generation, as an important application of renewable energy, has gained more and more attention. China has enormous potential for wind energy utilization, with rich wind resources mainly in the grasslands and gobi of northwest, north, and northeast China, as well as the coastal areas and islands of east and southeast China. Owing to fast-developing wind power technology and the distribution characteristics of the wind resource, wind power development tends to be concentrated and large-scale. According to statistics, by the end of 2012 the capacity of grid-connected wind generation had reached 62660 MW and generated 100.8 billion kWh of clean electricity, about 2% of the total electricity supply in China during 2012 [1].
Moreover, in wind resource systems, various uncertainties exist in many system components and their interrelationships, such as the random character of natural processes (e.g., climate change) and weather conditions, errors in estimated modeling parameters, and the complexity of system operation. As large amounts of wind power are integrated into the power system, the intrinsic intermittency and uncertainty of wind resources have a great impact on the stability of the whole power system. Thus, precise short-term forecasting of wind farm power is necessary and brings many benefits, including cutting the large spinning reserve, reducing the cost of wind power generation, and improving the safety and reliability of power system operation.
Previously, the physical method and the statistical approach were the main ways of wind power forecasting, in which physical conditions such as geography, topography, temperature, and pressure are employed to calculate the power generation capacity of a wind farm [2,3]. These methods require accurate meteorological data and involve difficult modeling processes. In addition, they usually need the relationship between abundant historical wind power data and other variables and are used for long-term wind power forecasting based on a large database. The emphasis of these methods is therefore on building time-series or dynamic models from the experience gained in historical data.
Considering the many uncertain factors and disturbances of wind power generating systems, the variation of wind power is largely affected by complex random factors (e.g., wind speed); significant errors are easily generated by time-series models and regression algorithms, and expert systems need a mass of knowledge and experience and are hard to maintain. As a popular simulation method, the neural network is an alternative for handling random nonlinear complex mappings without identifying the transformation rules and has been applied in many fields (e.g., environmental modeling and system analysis) [4][5][6]. In this approach, the historical data of wind power and impact factors (e.g., uncertain parameters) are taken as the input variables. The forecasting model is built after intelligent learning and training, which avoids or at least minimizes the impact of random factors on the simulation process. These advantages make the neural network a suitable method for this complex modeling problem.
In addition, knowledge mining can extract connotative, unknown, and valuable knowledge or rules from large-scale databases and is an area of extensive application value [7]. Knowledge mining has been widely applied, for example, to fuzzy time series [8], customer relationship management [9], and agricultural production prediction [10]. Climate factors, for example, wind speed, wind direction, temperature, and humidity, have a great impact on wind farm power changes [11]. Therefore, considering the characteristics of wind power forecasting, knowledge mining based on environmental simulation is proposed to build a data and knowledge base including historical wind power and relevant climate factors and to extract similar historical situations through knowledge mining. A new sample set generated from these historical situations is then used to forecast wind power with similar characteristics in the future. Moreover, classification is an important topic in data mining research: given a set of data records, the classification problem is concerned with the discovery of classification rules that allow records to be correctly classified.
Based on the above, the objective of this paper is to propose an improved self-organization mapping neural network to classify the historical data and extract useful rules for precise wind power forecasting. A subset of the samples with similar weather conditions is used for training the neural network. The remainder of this paper is organized as follows. Section 2 describes the self-organization mapping based on rough set theory after a brief introduction of the relevant theory. The detailed structure of the echo state network is presented in Section 3. In Section 4, a case study is carried out to demonstrate the effectiveness of the proposed approach, and the performance is compared with a traditional neural network; the results analysis and discussion are also presented. Finally, some remarks and conclusions of this study are given in Section 5.

Figure 1: The schematic diagram of the SOM neural network.

Knowledge Mining Based on Environmental Simulation
2.1. Self-Organization Mapping Theory. Self-organization mapping (SOM), first proposed by Kohonen in 1981, is a neural network with unsupervised learning [12]. It has a certain topological structure which is adjusted through the input information, and pattern recognition is completed by the synergy among multiple neurons [13,14]. The special idea of SOM is that there is no need to initialize cluster centers or provide guidance information; the weight information of the neurons is self-adaptively adjusted by the input data. Figure 1 illustrates the schematic diagram of the SOM neural network. Generally, there are two layers in the SOM neural network, the input layer and the competition layer. The number of neurons in the input layer is equal to the dimension of the samples. Every neuron in the input layer connects to the neurons in the competitive layer with variable weight values. The neurons in the competitive layer compete for the opportunity to respond to the input pattern, and the neuron whose weight most closely matches the presented input pattern is the winner neuron, or best matching unit (BMU). There also exist partial connections between the neurons in the competitive layer. The two-dimensional form is the most common arrangement of the neurons in competition. The basic principle of SOM is described as follows.
(1) Competition Process. Let X be an input sample with n dimensions, X = [x_1, x_2, ..., x_n]^T, where n is the number of input neurons. The number of neurons in the two-dimensional competition layer is M (M = m × m). The connection weight between the input layer and neuron j of the competition layer is denoted as W_j = [w_{j1}, w_{j2}, ..., w_{jn}]^T, j = 1, 2, ..., M. Calculate the inner product S_j of the input vector and the connection weight:

S_j = W_j^T X,  j = 1, 2, ..., M.  (1)

Then select the winner neuron of the competition process, the best matching unit (BMU). The basic rule is that the larger the inner product, the closer the neuron is to the input vector, which indicates that the neuron matches the presented input pattern. The winning condition is

S_{j*} ≥ S_j  for all j,  (2)

or, equivalently for normalized weights, ‖X − W_{j*}‖ ≤ ‖X − W_j‖ for all j ≠ j*. Finally, only one neuron wins; the output is thus of the form Y = [0, ..., 1, ..., 0], and j* is the winner neuron.

Figure 2: The effect of a neuron on its neighbors.
(2) Learning Process. The SOM neural network is arranged in a two-dimensional structure; each neuron has a promoting effect on its neighboring neurons and, on the contrary, an inhibiting effect on those far away, as shown in Figure 2.

In Figure 2, d denotes the distance between a neuron and its neighbor, and ΔW is the change of the connection weight. Neurons within a certain scope are promoted, with their weights increased, while neurons outside the scope are inhibited and their weights reduced.

In the SOM neural network, the weights are adjusted according to the Kohonen learning rule. The main idea is to move the winner neuron and its neighbors closer to the input sample by modifying their weights:

W_j(t) = W_j(t − 1) + η(t)[X(t) − W_j(t − 1)],  j ∈ N(t),  (3)

where W_j(t) and W_j(t − 1) represent the weight vector of neuron j at times t and t − 1, respectively; X(t) is the input sample at time t; and η(t) is the learning rate at time t, ranging in [0, 1]. At the beginning the learning rate is largest, and the value of η(t) decreases with training. Since weight oscillation may occur during the training process, the learning rate needs to be reduced gradually. The training neighborhood N(t) comprises the neurons around the winner neuron whose weights will be adjusted. At the start of training the neighborhood is largest, which gives more neurons the opportunity to learn; as training proceeds, each neuron comes to represent its own category, and its neighborhood shrinks.
After training the SOM neural network with all the input samples several times via the procedure above, the neurons in the competitive layer represent the cluster centers, which achieves the clustering effect. The trained network can then be used for pattern recognition. The algorithm itself can be summarized as follows.
Step 1. Initialize the SOM neural network. All the weight values W_j are initialized to random values in [0, 1].
Step 2. A training vector X is picked randomly from the training set.
Step 3. Calculate the Euclidean distance between the input vector X and each weight vector W_j: d_j = ‖X − W_j‖. The node with min{d_j} is the winning node and becomes the BMU.
Step 4. According to (3), the weight vectors of the winning neuron and its neighbors are updated.
Step 5. Choose a new input vector and repeat Steps 3 and 4 until the stopping criterion is satisfied, for example, reaching a sufficiently large number of iterations.
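The five steps above can be sketched in Python (a minimal illustration only, not the authors' MATLAB implementation; the grid size, iteration count, and the linear decay schedules for the learning rate and neighborhood radius are assumptions):

```python
import numpy as np

def som_train(samples, m=3, n_iter=1000, lr0=0.9, rng=None):
    """Minimal SOM on an m x m grid: competition by Euclidean distance,
    Kohonen update of the BMU and its grid neighbours (Steps 1-5)."""
    rng = np.random.default_rng(rng)
    dim = samples.shape[1]
    weights = rng.random((m * m, dim))                 # Step 1: random weights in [0, 1]
    grid = np.array([(i // m, i % m) for i in range(m * m)])  # 2-D neuron positions
    for t in range(n_iter):
        x = samples[rng.integers(len(samples))]        # Step 2: pick a random sample
        d = np.linalg.norm(weights - x, axis=1)        # Step 3: distances to all neurons
        bmu = int(np.argmin(d))                        # winning node (BMU)
        lr = lr0 * (1.0 - t / n_iter)                  # decaying learning rate
        radius = max(1.0, m * (1.0 - t / n_iter))      # shrinking neighbourhood N(t)
        in_hood = np.linalg.norm(grid - grid[bmu], axis=1) <= radius
        # Step 4: move the BMU and its neighbours toward the input
        weights[in_hood] += lr * (x - weights[in_hood])
    return weights
```

After training, each competitive-layer weight vector approximates one cluster center and can be used for pattern recognition.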
In fact, during the competitive procedure there may be several neurons that closely match the input vector, so judging only one of them as the winner seems improper. Considering this issue, the concept of rough set theory is introduced to solve this problem, and a self-organizing neural network based on rough sets is proposed in this paper.

2.2. Rough Set Theory.
Rough set theory (RST) is a useful tool for data mining and decision support. In particular, it is popular for dealing with incomplete information, vague concepts, and uncertain data [15]. Besides, combined with other data mining algorithms, it can produce hybrid data mining algorithms [16,17].
Suppose U is a nonempty universe with finitely many members and R is a family of equivalence relations on U; then the knowledge base can be expressed as the relation system K = (U, R).
For a subset P ⊆ R with P ≠ ∅, the intersection of all the equivalence relations in P is called the P-indiscernibility relation, defined by

IND(P) = ⋂_{Q ∈ P} Q,

where [x]_P represents the equivalence class containing x ∈ U under the relation IND(P).
Lower and upper approximations are important concepts of RST; they help to measure the description of uncertain knowledge. Suppose X is a subset of U; then the lower approximation R_*(X) and the upper approximation R^*(X) are

R_*(X) = {x ∈ U : [x]_R ⊆ X},
R^*(X) = {x ∈ U : [x]_R ∩ X ≠ ∅}.

Meanwhile, the boundary region BN_R(X) = R^*(X) − R_*(X) consists of those objects that cannot be classified with certainty as members of X with the knowledge in R.
If BN_R(X) ≠ ∅, then R^*(X) ≠ R_*(X), and X cannot be expressed precisely by the equivalence classes of R; the set X is then called "rough" (or "roughly definable"). Otherwise, X is crisp.
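As a concrete illustration, the lower and upper approximations and the boundary region can be computed directly from a partition of the universe into equivalence classes (a minimal sketch; representing the relation as an explicit partition is an assumption for illustration):

```python
def approximations(partition, X):
    """partition: list of equivalence classes (iterables) covering the universe.
    Returns (lower, upper, boundary) approximations of the target set X."""
    X = set(X)
    lower, upper = set(), set()
    for cls in partition:
        cls = set(cls)
        if cls <= X:          # [x]_R entirely inside X -> lower approximation
            lower |= cls
        if cls & X:           # [x]_R intersects X -> upper approximation
            upper |= cls
    boundary = upper - lower  # BN_R(X); nonempty means X is "rough"
    return lower, upper, boundary
```

For example, with partition {1,2}, {3,4}, {5} and target X = {1,2,3}, the lower approximation is {1,2}, the upper approximation is {1,2,3,4}, and the boundary {3,4} is nonempty, so X is rough.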

2.3. SOM Neural Network Combined with RST. RST can handle uncertain or imprecise knowledge expression; thus it can be employed to deal with the imprecision in the learning process of the SOM neural network. The novel network still has a two-layer structure, while the difference from the traditional SOM lies in the competitive layer. In the competitive layer, each neuron contains an upper approximation and a lower approximation. In order to judge which neuron wins, we determine whether the input vector belongs exactly to the lower approximation of one neuron or imprecisely to the upper approximation of several neurons. Through this process, the imprecision in judging the winner is resolved properly.
Besides, we can set different learning rates for these two matching results. If the input vector belongs to the lower approximation, it gets the greater learning rate η_low; otherwise, it gets the lower learning rate η_up. The idea is that when the input vector belongs to a pattern exactly, learning is accelerated; when it belongs to a pattern imprecisely, its learning effect is reduced. The key issue of the novel SOM neural network is how to determine whether the input vector belongs to a certain neuron or to a set of neurons.
After selecting the best matching neuron j*, some suboptimal neurons are chosen using a suboptimal match degree, calculated as

r_j = S_j / S_{j*},  j ≠ j*,

where r_j is the key factor in determining the lower or upper approximation. The set of suboptimal neurons is then defined as

T = {j : r_j ≥ θ, j ≠ j*},

where W_j is the weight of the jth neuron and θ is the threshold. Set T is the collection of suboptimal neurons with match degree higher than θ. If T is empty, there are no other close matching neurons except the best match, and the input vector belongs to its lower approximation. Otherwise, the input vector belongs to the upper approximation of the best matching and suboptimal neurons.
The two learning processes of the SOM neural network are expressed, respectively, as follows.
(1) The neuron belongs to the lower approximation exactly:

W_j(t) = W_j(t − 1) + η_low(t)[X(t) − W_j(t − 1)].  (4)

(2) The neuron belongs to the upper approximation imprecisely:

W_j(t) = W_j(t − 1) + η_up(t)[X(t) − W_j(t − 1)],  (5)

where the learning rates decay over the G training iterations, G being the total training times; η_low(t) and η_up(t) are the learning rates of the lower approximation and upper approximation.
The detailed procedure of the proposed SOM neural network is presented as follows.
Step 1. Initialize the network: set t = 0; X(0) denotes the input sample vector; the initial weights W_j(0) are small random numbers; η_low(0) and η_up(0) are set to 0.9 and 0.5, respectively.
Step 2. According to (1), calculate the inner products of the input sample and the neurons in the competitive layer.
Step 3. Select the best match output neuron.
Step 4. Select the suboptimal matching neurons as the collection T.
Step 5. Update the weight vectors: if the collection T is empty, adjust the best matching neuron with the learning rate η_low (lower approximation); otherwise, adjust the best matching and suboptimal neurons with η_up (upper approximation).
Step 6. Return to Step 2 and repeat the process until all the samples have been processed or the learning rates have decayed to 0.
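A single update step of the described RS-SOM can be sketched as follows (a hedged illustration: the suboptimal match degree is assumed here to be the ratio of inner products, and the threshold θ is a free parameter not specified by the text):

```python
import numpy as np

def rs_som_update(weights, x, theta=0.9, lr_low=0.9, lr_up=0.5):
    """One RS-SOM step: pick the BMU by inner product, collect suboptimal
    neurons whose match exceeds theta * best match, then apply the
    lower- or upper-approximation learning rate."""
    s = weights @ x                         # inner products S_j, as in eq. (1)
    bmu = int(np.argmax(s))                 # best matching unit j*
    # suboptimal set T: close matches other than the BMU (ratio rule assumed)
    T = [j for j in range(len(s)) if j != bmu and s[j] >= theta * s[bmu]]
    if not T:
        # x belongs to the BMU's lower approximation: fast learning
        weights[bmu] += lr_low * (x - weights[bmu])
    else:
        # x belongs to the upper approximation of several neurons: slow learning
        for j in [bmu] + T:
            weights[j] += lr_up * (x - weights[j])
    return weights, bmu, T
```

Running this step over all samples, with both learning rates decayed toward zero, realizes Steps 2 through 6 of the procedure.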

Echo State Network
Recurrent neural networks (RNNs) are powerful tools for solving complex temporal machine learning tasks [18]. In 2001, a new approach to RNN design and training was independently proposed under the names of liquid state machines and echo state networks. Reservoir computing (RC) is the resulting RNN technique, offering a solution to many of the problems associated with typical RNN architectures that have prevented their widespread use [19][20][21].
The classic echo state network (ESN) contains three layers: input layer, hidden layer, and output layer, as shown in the schematic view in Figure 3. The hidden layer is also called the dynamic reservoir. In traditional recurrent networks the number of neurons is usually kept small, while the reservoir of an ESN contains an abundance of neurons, about 20 to 500, with good short-term memory. Suppose there are K units in the input layer, L units in the output layer, and N units in the hidden layer. W^in denotes the connection weight matrix of the input layer; W is the connection weight matrix within the reservoir, which is kept 1%-5% sparsely connected, and its spectral radius is usually less than 1. These properties ensure that the reservoir has dynamic memory and certain stability. W^out and W^back denote the connection weight matrices of the output layer and the feedback, respectively. It should be noted that W^in, W, and W^back are decided randomly before the network is established and, once determined, do not change; W^out is finally gained by training. Therefore, the main goal of network training is to determine the value of W^out.

The primary algorithm of the echo state network is to excite the reservoir with the input information and generate the state variables in the reservoir; through linear regression between the state variables and the desired output, the connection weight of the output layer can be determined. The state variables and the output are updated as follows:

x(t + 1) = f(W^in u(t + 1) + W x(t) + W^back y(t)),
y(t + 1) = f^out(W^out [u(t + 1); x(t + 1)]),

where f and f^out are the activation functions of the reservoir and the output; the most commonly used is the hyperbolic tangent function.
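A minimal reservoir sketch of these update equations follows (output feedback W^back is omitted for simplicity, and the weight scales, sparsity, and plain least-squares readout are common reservoir-computing choices, not necessarily those of the paper):

```python
import numpy as np

def esn_fit(u, y, n_res=100, rho=0.9, density=0.05, seed=0):
    """Minimal ESN: sparse random reservoir scaled to spectral radius rho,
    tanh state update, readout W_out fitted by linear least squares."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (n_res, u.shape[1]))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= rng.random((n_res, n_res)) < density        # keep ~5% of connections
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius < 1
    states = np.zeros((len(u), n_res))
    x = np.zeros(n_res)
    for t in range(len(u)):
        x = np.tanh(W_in @ u[t] + W @ x)             # x(t+1) = f(W_in u + W x)
        states[t] = x
    W_out, *_ = np.linalg.lstsq(states, y, rcond=None)  # only W_out is trained
    return W_out, states @ W_out                     # readout on training data
```

Only the linear readout is trained, which is what makes ESN training far cheaper and more stable than gradient training of a full RNN.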

Case Study and Results Analysis
A wind farm in northwest China is considered as a case study to demonstrate the effectiveness of the proposed approach. In the study area there are 66 wind turbines on the wind farm, with a total capacity of 49.5 MW. The historical meteorological data and wind power data from April 1, 2008, to May 6, 2009, are taken as the database. The forecasting model is solved through MATLAB on a 32-bit Lenovo workstation running Windows 7 with two dual-core 2.60 GHz CPUs and 4.0 GB of RAM. We extract rules from past information to forecast the wind power load. The main factors considered here are wind scale, temperature, and humidity. The feature vector V for knowledge mining comprises the temperature, wind scale, and wind speed attributes listed in Table 1. In order to eliminate the dimension influence among different variables, data preprocessing is the first job that needs to be done.
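The preprocessing step is not specified in detail; a common choice for removing dimension effects is min-max normalization of each feature to [0, 1], sketched here as an assumption:

```python
import numpy as np

def min_max_normalize(data):
    """Scale each feature column to [0, 1] to remove unit/dimension effects."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (data - lo) / span
```

After normalization, temperatures, wind scales, and wind speeds contribute on comparable scales to the distance and inner-product computations of the RS-SOM.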
The proposed RS-SOM neural network is employed to cluster the historical information. To evaluate the compactness of the clustering results, the sum of squared errors (SSE) is adopted in this study; the smaller the SSE, the better the effect. It is calculated as

SSE = Σ_{i=1}^{k} Σ_{x ∈ C_i} ‖x − μ_i‖²,

where C_i is the set of samples in the ith cluster, μ_i is the mean of the ith cluster, and n_i = |C_i| is the number of samples belonging to the ith cluster.
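The SSE computation can be sketched directly from the formula (a straightforward illustration using cluster labels and cluster means):

```python
import numpy as np

def sse(samples, labels, centers):
    """Sum of squared errors of a clustering: squared distance of each
    sample to its cluster mean, summed over all clusters."""
    total = 0.0
    for k, mu in enumerate(centers):
        members = samples[labels == k]       # C_k: samples in cluster k
        total += np.sum((members - mu) ** 2)  # sum of ||x - mu_k||^2
    return total
```

A smaller SSE indicates that samples sit closer to their cluster means, that is, a more compact clustering.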
The cluster results are shown in Table 1. The 400 samples in the database are divided into 7 classes. The SSE of class 3 is the smallest, while that of class 6 is the largest. This indicates that the samples in class 3 are the most closely similar, and the shape of the curve can almost reflect the fluctuation of the wind output power on those days. Besides, the average wind power output curve of each class is illustrated in Figure 4. Each curve has significant features and is obviously different from the others. For example, in class 1, the valley of most samples' power output is at about 10:00, while the peak is at nearly 23:00. This is the most common situation, with the largest class membership of 118 samples. The class with the fewest samples is class 5, whose shape is extremely irregular.
Since the weather report may not be accurate, and the longer the time lag the worse the forecasting result, this study only forecasts the wind power output for the next two days. A new input pattern can be discriminated using the trained RS-SOM neural network, and the samples in the same class as the forecasting day are selected to train the ESN network. In order to test the performance of the ESN, a BP neural network, which has been widely used in load forecasting, is also applied to the same task.

Mathematical Problems in Engineering
The mean absolute percentage error (MAPE) and the mean absolute error (MAE) are used to evaluate the forecasts:

MAPE = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i| / y_i,
MAE = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i|,

where ŷ_i is the forecast value, y_i is the actual value, and N is the number of samples. Figure 5 shows the forecasting results of the different methods together with the actual wind power output. The overall trend of the forecast power is in accordance with the actual situation. However, for the forecasting of peaks and valleys, the performance of the ESN is obviously better than the BP model; the deviation of the latter is larger at the extreme values. During the period from 1:00 to 8:00 both methods perform well, while at night, from 18:00 to 24:00, the deviation is larger. The error evaluation indexes for both methods are presented in Table 2, which shows that all of the indexes of the ESN are lower than those of the BP model. The MAPE and MAE of the ESN are 0.1366 and 1.7771 MW, respectively.

Comparing the results obtained from the two methods, the BP model shows a higher degree of error than the ESN method because (1) the wind farm is located in northwest China, where the management system of meteorological measurements is imperfect; meteorological data at the wind plant are often of poor quality and contribute to inaccuracy in traditional forecasting approaches; and (2) wind direction readings from standard met towers might not even be applicable, since the surrounding terrain can affect air movement, and wind can ramp up or down quickly, which can lead to misleading information. All of the above problems, together with a very complex prediction process, can directly affect the accuracy of traditional forecasts. Data management is also vital to the accuracy of the forecasting method. By integrating the SOM and RST methods to cluster the historical data into several classes, the approach can find similar days and excavate the hidden rules, increasing the forecast accuracy. In addition, it can reflect the uncertainties of the input parameters, avoid data errors, and provide valid data for ESN forecasting.
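The MAPE and MAE indexes reported in Table 2 can be computed as follows (a sketch using the standard definitions; the paper's exact normalization is assumed):

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error (actual values must be nonzero)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs((actual - forecast) / actual))

def mae(actual, forecast):
    """Mean absolute error, in the units of the data (here MW)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs(actual - forecast))
```

Both indexes are lower-is-better; MAPE is scale-free, while MAE is reported in megawatts.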
However, compared with other approaches, there is still much room for improvement of the proposed method. For example, there is no uniform way to determine the relevant parameters of the ESN network (such as the spectral radius of the connection weight matrix within the reservoir), which are mainly obtained through massive experiments. Besides, other neural networks with universal approximation capability, for example the radial basis function (RBF) network, will be studied and reformed further to improve the forecasting accuracy and calculation speed.

Conclusion
Wind power forecasting is an important tool for managing the inherent variability and uncertainty of wind power generation. Increasing forecasting accuracy can reduce the likelihood of an unexpected gap between scheduled and actual wind power generation, which is extremely helpful for operators of power systems and wind power plants. In this study, we developed a database of historical meteorological and power output data. A self-organizing map combined with rough set theory is employed as a knowledge mining technology to discover and extract rules. The classified samples are taken as the input of the echo state network to train the structure of the network. By integrating the SOM and RST methods to cluster the historical data into several classes, the approach provides valid data for ESN forecasting. The developed methods are applied to a case of power forecasting for a wind farm located in northwest China, with wind power data from April 1, 2008, to May 6, 2009. The results demonstrate the successful use of the proposed method, which performs better than BP; the accuracy of prediction has been improved. However, the database is static in this study, which means the information of new samples will not be added into the knowledge pool automatically. Moreover, if the database is small, it may not cover a comprehensive range of situations. Thus, the method is most suitable for wind farms with a long operation period.

Figure 3: The schematic view of the echo state network.

Figure 4: The wind farm output curves of the different classes.

Figure 5: Forecasting results of the different models and the actual wind power output.

Table 1: The cluster results and effectiveness analysis. HT_t and LT_t are the highest and lowest temperatures on the tth day; HT_{t−1} and LT_{t−1} are the highest and lowest temperatures on the (t−1)th day; F_t and F_{t−1} are the wind scales on the tth and (t−1)th days; MAX_{t−1}, MIN_{t−1}, and AVER_{t−1} are the maximum, minimum, and average wind speeds on the (t−1)th day.

Table 2: Error evaluation of the different forecasting models.
These values are lower than those typically obtained from the BP model. This extensive comparison reflects forecasts for all available 66 wind turbines. The accuracy of the ESN forecasts was consistently better for the year and for each wind plant. It should be noted that the accuracy would be even better if the data were adjusted for curtailments.