Dynamic Equivalent Modeling of Wind Farm Based on Dominant Variable Hierarchical Clustering Algorithm

The actual operating state of a wind turbine group is influenced by the wake effect and the control mode; however, current models cannot describe this operating state very well. A dynamic equivalent modeling method for doubly fed wind power generators is proposed that preserves an accurate description of the wind turbine group. Dominant variables, extracted by principal component analysis, are used as the clustering index in a hierarchical clustering algorithm. Three dynamic equivalent models of 24 wind turbines are established on the PSCAD software platform, using 13 state variables, wind speed, and dominant variables as clustering indexes, respectively. Furthermore, the active power and reactive power output curves of the wind farm are simulated for a three-phase short-circuit fault on the system side and for a wind speed fluctuation. The simulation results demonstrate that it is reasonable and effective to extract slip ratio and wind turbine torque as the clustering index: the maximal relative error between the dominant variable equivalent model and the 13-state-variable model is only 9.9%, which is much lower than that of the wind speed model, K-means clustering model, neural network model, and support vector machine model. This model is easy to implement and has wide application prospects, especially for characteristics analysis of large-scale wind farms connected to the power grid.


Introduction
Wind power is one of the most important renewable energy sources and has attracted growing attention in most countries in recent years because of its mature technology and low cost [1]. However, wind energy is random, intermittent, and unstable, which makes it very difficult to study the operational characteristics of a power grid with large-scale wind farm interconnection [2][3][4]. Therefore, it is necessary to establish an effective wind farm model that is as close as possible to the actual operating state of the wind farm. Literature [5] proposed a reduced-order model computation method for planning the grid connection of a large-scale wind farm; its effectiveness was demonstrated and evaluated on an illustrative simple wind farm and on a large-scale wind farm with 200 WTGs (wind turbine generators). Literature [6] used a method that integrates field measurements to characterize the monthly wind speed and wind direction distributions and to investigate the wind characteristics in turbine wakes. That work shows good agreement for both mean wind speed and turbulence intensity, which also verifies the possibility of combining field measurements and high-fidelity simulations to characterize utility-scale wind farms. Literature [7] studied large-eddy simulations of coherent structures within and above different wind farm configurations in a neutral atmospheric boundary layer (ABL), using proper orthogonal decomposition (POD) to improve understanding of the flow structures in both physical and spectral space. That work indicates that wind farm dynamics in the ABL are very complex. Literature [8] proposed a simplified floating offshore wind turbine model that is applicable to the real-time simulation of large-scale floating offshore wind farms. The real-time results show the feasibility of the proposed turbine models for real-time modeling of large-scale offshore wind farms.
Generally speaking, wind farm models can be divided into detailed models and equivalent models. The detailed model is difficult to solve, adapts poorly to parameter changes, takes a lot of work to modify, and requires high computer performance [9,10]. Therefore, most current research chooses the equivalent model to describe the wind farm. The equivalent model can be further divided into single-machine equivalence and multi-machine equivalence. The single-machine equivalent model uses one large-capacity wind turbine to replace the whole wind farm, which reduces the complexity of the simulation system and speeds up the calculation. The multi-machine equivalent model divides the whole wind farm into groups according to one or more indexes and then uses one wind turbine to replace all the wind turbines within each group.
Literatures [11][12][13][14] proposed single-machine equivalent models of wind farms, but the characteristics of wind turbines vary greatly in actual applications. Literature [15] proposed a systematic and simple method to model large-scale induction machine-based wind farms by a single WTG that contains an equivalent mechanical wind turbine and an equivalent electrical generator. The results show that the proposed method is adequately accurate in both transient and steady-state responses and can be readily used for modeling large-scale wind farms to reduce the overall computational burden of the system. Nevertheless, because turbine characteristics differ across a farm, the single-machine equivalent model is in a sense not accurate enough.
Literatures [16][17][18] reported earlier research results on multi-machine equivalent models of wind farms, in which the wind farm is grouped mainly according to geographic location. This classification method requires a regular arrangement of wind turbines and does not adapt well to other layouts. To overcome this shortcoming, literatures [19][20][21][22][23][24][25][26][27][28][29] proposed novel classification methods that are closer to the actual operating state of wind turbines. Yan et al. [19] sorted the wind farm using the support vector machine (SVM), taking into account similar wind speed and the wake effect of the wind farm. This method cannot accurately characterize the actual operating state of the wind farm, which leads to large algorithm errors. Literatures [20][21][22] sorted the wind farm by a clustering algorithm based on the measured data of all the wind turbines, the wind speed model, and the steady-state model of the wind farm. However, this method does not consider the dynamic characteristics of wind turbines. The clustering index of [23][24][25] was the dominant variable during operation, and the dominant variables were extracted by the feature analysis method. However, the characteristic roots do not account for the influence of disturbance location and type on the operating characteristics of wind turbines. Literatures [3,5,26,27] sorted the wind farm considering 13 state variables of wind turbines during operation. Literature [28] presented a method to develop a computationally efficient dynamic model of a wind farm suitable for large disturbance simulation. The method, based on a trajectory piecewise linear (TPWL) approximation, uses single and multiple training trajectories to develop a nonlinear reduced-order model (ROM). Simulation results demonstrated the effectiveness of the model in capturing the dynamic behavior of a wind farm following large disturbances. However, this kind of classification method suffers from redundant data, high computational complexity, and low applicability.
There are still some shortcomings in the above classification methods, such as large error, incomplete consideration of the actual operating state, complicated calculation, and low applicability. More importantly, a large-scale wind farm exhibits multiple time scales and strong nonlinearity on the time axis. As the scale of the wind farm increases, the shortcomings of existing modeling methods become more obvious. To overcome these shortcomings, this paper proposes multi-machine dynamic equivalent modeling of a wind farm based on a dominant variable hierarchical clustering algorithm.

Principal component analysis is essentially a mathematical transformation that converts a given set of correlated variables into another set of uncorrelated variables by a linear transformation. These new variables are arranged in descending order of variance. During the transformation, the total variance of the variables remains unchanged. The first variable has the largest variance and is called the first principal component. The second variable has the second largest variance, is uncorrelated with the first variable, and is called the second principal component. The third and fourth variables are then obtained in the same way. Compared with other parameter analysis methods (correlation analysis, feature analysis, factor analysis, and so on), principal component analysis retains as much information as possible about the original variables while also reducing dimensionality, thus reducing the computational complexity and the cost of the algorithm. Meanwhile, compared with other classification methods (support vector machine, neural network, density clustering, grid clustering, and so on), the hierarchical clustering method does not require the number of classes to be determined in advance and has high classification efficiency, which makes it especially suitable for the classification of large data.

The specific research steps in this paper are as follows. Firstly, the 13 state variables of the wind turbines during actual operation are calculated. Secondly, the dominant variables are extracted from the 13 state variables by principal component analysis. Then, the wind farm is clustered by the hierarchical clustering algorithm, with the dominant variables used as the clustering index. Furthermore, three dynamic equivalent models of 24 wind turbines are established using PSCAD software, which use 13 state variables, wind speed, and dominant variables as clustering indexes, respectively. Finally, the active power and reactive power output curves of the wind farm are simulated for a three-phase short-circuit fault on the system side and for a wind speed fluctuation. The simulation results verify the effectiveness of the dominant variables and the advantages of the proposed model.
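The variance properties of principal component analysis described in this section can be illustrated for the two-variable case, where the eigenvalues of the covariance matrix have a closed form. The following is a minimal sketch, not the computation used in this paper: the sample data and the function names `covariance_2d` and `principal_variances` are hypothetical.

```python
import math

def covariance_2d(xs, ys):
    """Return the sample variances and covariance (sxx, syy, sxy) of two variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, syy, sxy

def principal_variances(sxx, syy, sxy):
    """Eigenvalues of [[sxx, sxy], [sxy, syy]], largest first.

    These are the variances of the first and second principal components.
    """
    centre = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    return centre + radius, centre - radius

# Hypothetical, strongly correlated sample data (not taken from the paper).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

sxx, syy, sxy = covariance_2d(xs, ys)
first, second = principal_variances(sxx, syy, sxy)
print("total variance before:", sxx + syy)
print("total variance after: ", first + second)
print("share explained by first component:", first / (first + second))
```

The two totals agree up to rounding, and the first component carries the larger share of the variance, which is the property that makes dimensionality reduction possible.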

Mathematical Model of Doubly Fed Induction Generator
At present, the doubly fed induction generator (DFIG) is most widely used in wind farms all over the world. Its mathematical model mainly includes wind turbine module, induction motor module, transmission system module, back-to-back converter system control module, boost transformer module, R-L series components, etc. Some important modules will be described below [29][30][31][32].

Wind Turbine Module.
The relationship between the active power of the doubly fed induction generator and the wind energy captured by the wind turbine can be expressed as follows:

$$p_e = \frac{1}{2} C_p \rho A \upsilon^{3} \tag{1}$$

where $p_e$ is the active power of the generator, $C_p$ is the utilization coefficient of wind energy, $\rho$ is the air density, $A$ is the swept area of the blades, and $\upsilon$ is the wind speed acting on the wind turbine.
For a given wind speed, the wind power captured by the wind turbine depends mainly on $C_p$, which is a function of the pitch angle ($\beta$) and the tip speed ratio ($\lambda$).
The tip speed ratio depends on the rotating speed of the wind turbine ($\omega_r$), which can be obtained from the power characteristic curve when the active power of the generator ($p_e$) is known. The characteristic curve of the wind energy utilization coefficient is shown in Figure 1.
For variable-pitch wind turbines, the wind power of the impeller can be controlled by changing the pitch angle ($\beta$). Usually, the wind energy utilization coefficient ($C_p$) of the wind turbine reaches its maximum value when $\beta = 0$.
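As a worked illustration of the captured-power relationship in this module, the sketch below assumes the standard form p_e = 0.5 · C_p · ρ · A · υ³ with a circular swept area. The function name `captured_power` and all numerical values are hypothetical and are not taken from Table 1 or from the turbine nameplate data.

```python
import math

def captured_power(c_p, air_density, rotor_radius, wind_speed):
    """Power captured by the rotor in watts.

    Assumes p_e = 0.5 * C_p * rho * A * v**3 with swept area A = pi * r**2.
    """
    swept_area = math.pi * rotor_radius ** 2
    return 0.5 * c_p * air_density * swept_area * wind_speed ** 3

# Hypothetical operating point (illustrative values only).
power_w = captured_power(c_p=0.4, air_density=1.225, rotor_radius=40.0, wind_speed=10.0)
print(f"captured power: {power_w / 1e6:.2f} MW")

# Because of the cubic dependence, doubling the wind speed gives eight times the power.
ratio = captured_power(0.4, 1.225, 40.0, 20.0) / power_w
print(f"power ratio when wind speed doubles: {ratio:.1f}")
```

The cubic dependence on wind speed is one reason why small differences in the initial wind speed of individual turbines lead to noticeably different operating points.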

Induction Motor Module.
In the DQ coordinate system, the flux-linkage equation and the voltage equation of the doubly fed induction generator can be obtained from the characteristics of the induction motor [23][24][25], which can be expressed as follows:

$$\psi_{ds} = -L_{ss} I_{ds} + L_m I_{dr}, \qquad \psi_{qs} = -L_{ss} I_{qs} + L_m I_{qr},$$

where $\psi$ and $U$ represent the flux linkage and the voltage, respectively; the subscripts $d$ and $q$ represent the direct-axis component and the quadrature-axis component, respectively; the subscripts $s$ and $r$ represent the stator and the rotor, respectively; $L_{ss}$ and $L_{rr}$ represent the self-inductance of the stator winding and the rotor winding, respectively; $L_m$ represents the mutual inductance between stator and rotor; $R$ represents the resistance; and $S$ represents the slip ratio.

Transmission System Module.
By incorporating the inertia of the gearbox into the inertia of the generator rotor, a two-mass model of the transmission device can be obtained [33]. In this model, $T_m$ represents the wind turbine torque, $T_t$ represents the rotor mechanical torque, $T_e$ represents the electromagnetic torque, and $T_d$ and $T_j$ represent the inertia time constants of the transmission system and the generator, respectively. The electromagnetic power is the output power of the stator, and the expression of the electromagnetic torque ($T_e$) can be deduced from the electromagnetic power, which can be expressed as follows:

$$T_e = L_m \left( I_{dr} I_{qs} - I_{ds} I_{qr} \right) \tag{6}$$
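The electromagnetic torque expression in equation (6) can be evaluated directly from the mutual inductance and the stator and rotor current components. The sketch below assumes the grouping T_e = L_m (I_dr I_qs − I_ds I_qr); the function name and the numerical values are hypothetical and serve only to show the arithmetic.

```python
def electromagnetic_torque(l_m, i_dr, i_qs, i_ds, i_qr):
    """Electromagnetic torque, assuming T_e = L_m * (I_dr * I_qs - I_ds * I_qr)."""
    return l_m * (i_dr * i_qs - i_ds * i_qr)

# Hypothetical values chosen for easy arithmetic (not data from the paper).
t_e = electromagnetic_torque(l_m=2.0, i_dr=3.0, i_qs=4.0, i_ds=1.0, i_qr=5.0)
print("T_e =", t_e)

# When the stator direct-axis current is neglected (I_ds close to 0),
# only the I_dr * I_qs product contributes.
print("T_e with I_ds = 0:", electromagnetic_torque(2.0, 3.0, 4.0, 0.0, 5.0))
```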

Converter Control System Module.
The rotor of the doubly fed induction motor is connected to the power grid through a back-to-back converter. The active power and reactive power of the generator consist of a stator-side part and a rotor-side part, where $P_s$ and $Q_s$ represent the active power and reactive power on the stator side, while $P_r$ and $Q_r$ represent the active power and reactive power on the rotor side [29][30][31][32]. In the steady state, the stator copper loss is neglected, the differential component is 0, and the direct-axis component of the stator voltage is 0. Combining these conditions with the stator-side active power expression, the active power and reactive power expressions of the stator side can be obtained. From these expressions, it can be seen that in the DQ coordinate system, assuming that the direction of the stator flux linkage is consistent with that of the generator, the active power and reactive power of the stator of the doubly fed induction generator can be decoupled by adjusting the direct-axis component and the quadrature-axis component, respectively.
When stator flux-linkage orientation is used, the stator flux linkage coincides with the direct axis of the synchronous rotating coordinate system, and the quadrature-axis flux-linkage component is zero. Combining this with the stator flux-linkage equation, the relationship between the stator current and the rotor current can be obtained. The relationship between the rotor voltage and the rotor current can then be obtained by solving the flux-linkage equation and the stator-rotor voltage equation simultaneously.
In these relationships, $R$ represents the resistance and $S$ represents the slip ratio. The relationship between the rotor voltage and the stator current is established based on formulas (8) and (9). The active power and reactive power on the stator side can therefore be controlled by having the control system regulate the rotor voltage.

Transient Model.
When a pulse width modulation (PWM) converter is used, its power factor can be controlled to be close to 1.0. That is to say, the converter exchanges almost no reactive power, so the reactive power of the doubly fed wind turbine is approximately equal to the reactive power on the stator side. If the wind turbine works at a constant power factor, a corresponding relationship between its active power and reactive power holds. The relationship between the power of the doubly fed wind turbine and the active power on the stator side can be obtained from equations (3)-(11). If the flux linkage of the stator is neglected, the rotor transient model can be obtained. From the above deduction and analysis, one can see that 13 state variables can describe the actual operation process of DFIGs: slip ratio $S$ (dimensionless), pitch angle $\beta$ (unit: °), wind turbine torque $T_m$ (unit: N·m), rotor mechanical torque $T_t$ (unit: N·m), electromagnetic torque $T_e$ (unit: N·m), stator direct-axis current $I_{ds}$ (unit: A), stator quadrature-axis current $I_{qs}$ (unit: A), rotor direct-axis current $I_{dr}$ (unit: A), rotor quadrature-axis current $I_{qr}$ (unit: A), rotor direct-axis voltage $U_{dr}$ (unit: V), rotor quadrature-axis voltage $U_{qr}$ (unit: V), direct-axis transient potential $E_d$ (unit: V), and quadrature-axis transient potential $E_q$ (unit: V). When special faults or wind speed disturbances occur, the control system of the DFIGs can act according to the initial values and the variations of these 13 state variables, whose initial values describe the initial operating state of the DFIGs.

Dominant Variable Hierarchical Clustering Algorithm
Based on the theoretical analysis and formula deduction in Section 2, the 13 state variables of wind turbines during operation can be calculated from only a few parameters, such as wind speed, power factor, and the nameplate parameters and power characteristic curves of the wind turbines.
In practical applications, analyzing and calculating all 13 state variables is computationally expensive, especially for a large-scale wind farm. Therefore, this paper extracts the dominant variables using principal component analysis and uses them as clustering indexes, which effectively reduces the computational complexity. This method not only reduces the data dimension but also preserves the completeness of the information, so it performs better than analyzing the original data of the wind farm [34][35][36][37][38].
The detailed modeling process can be expressed as follows:
Step 3.1: each sample point is defined as a class.
Step 3.2: calculate the distance between sample points in different classes using the Euclidean distance formula.
Step 3.3: according to these distances, merge the two classes with the shortest distance into one class.
Step 3.4: recalculate the distance between the merged class and the other classes using the Euclidean distance formula.
Step 3.6: the algorithm ends when all samples have been merged into one class.
(iv) Step 4: according to the clustering results of the dominant variables, the wind farm is divided into several groups.
(v) Step 5: according to the partition results, the equivalent model of the wind farm is constructed.
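The agglomerative procedure in Steps 3.1 to 3.6 can be sketched in plain Python as follows. Two choices here are assumptions rather than details stated in the text: the distance between two classes is taken as the smallest pairwise Euclidean distance (single linkage), and merging stops once a requested number of groups remains instead of building the full tree and cutting it afterwards. The sample points stand in for hypothetical, already normalized (slip ratio, wind turbine torque) pairs.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equally long tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def class_distance(points, class_a, class_b):
    """Distance between two classes; single linkage is an assumption."""
    return min(euclidean(points[i], points[j]) for i in class_a for j in class_b)

def hierarchical_clustering(points, n_groups):
    """Merge the closest classes until only n_groups classes remain."""
    classes = [[i] for i in range(len(points))]  # Step 3.1: one class per sample
    while len(classes) > n_groups:
        # Steps 3.2 and 3.4: distances are recomputed on every pass.
        a, b = min(
            combinations(range(len(classes)), 2),
            key=lambda pair: class_distance(points, classes[pair[0]], classes[pair[1]]),
        )
        # Step 3.3: merge the two closest classes (a < b, so deleting b is safe).
        classes[a] = classes[a] + classes[b]
        del classes[b]
    return [sorted(c) for c in classes]

# Hypothetical normalized (slip ratio, torque) samples for five turbines.
samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
groups = hierarchical_clustering(samples, n_groups=3)
print(groups)
```

Calling the function with `n_groups=1` reproduces the stopping rule of Step 3.6, in which all samples end up in a single class. Because every distance is recomputed on each pass, this sketch favors readability over speed.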

Simulation Tool.
PSCAD is an electromagnetic simulation software platform that has been widely used in many research fields. It uses time-domain analysis to solve the complete power system, and its results are very accurate. It also has an abundant component library and complex control modules, such as inverter, transformer, rectifier, wind source, wind governor, wind turbine, synchronous generator, and speed control components, which allow users to flexibly build circuit models for simulation analysis. In addition, it provides a large number of simulation cases for learning. Therefore, the PSCAD software platform is especially suitable for model establishment and performance analysis of large-scale wind farms.

Initial Parameters.
A wind farm model with 24 DFIGs is established using the PSCAD software platform. The terminal voltage of each wind turbine is 690 V, which is boosted to 33 kV through a one-machine, one-transformer connection. Every 4 transformers are connected by overhead lines to a point of common connection, and then one boost transformer is connected to the infinite bus system. The initial wind speed data of the 24 wind turbines have a great influence on the calculation cost of the algorithm. Literature [26] uses a K-means clustering algorithm, one of the state-of-the-art clustering methods, to sort the wind farm. That work also considers the wake effect and topographic differences, and it sets the initial wind speed of each wind turbine to a different value. To compare hierarchical clustering and K-means clustering quantitatively, we use the same initial wind speed data as literature [26], as shown in Table 1, where all data are rounded to one decimal place. The 13 state variables of the 24 DFIGs are calculated from the initial wind speed data in Table 1 and equations (1)-(13) described in Section 2. The calculation results are shown in Table 2, where all data are rounded to two decimal places.
Because the transient time constant of the motor stator is very small, $I_{ds}$ is close to 0 and is therefore ignored. Meanwhile, to obtain the maximal value of the wind energy utilization coefficient ($C_p$) of the wind turbines, the pitch angle ($\beta$) should be equal to 0. Moreover, assuming that the wind speed is a fixed value, principal component analysis is carried out on the 13 state variables of the 24 wind turbines. In general, according to the rules of principal component analysis, variables can be considered dominant variables as long as they explain more than 85% of the variance. The analysis results are shown in Table 3.
As shown in Table 3, 94.331% of the variance can be explained by extracting only two factors, namely slip ratio ($S$) and wind turbine torque ($T_m$). This demonstrates that these two variables can represent the actual operating state of the wind turbines, so slip ratio ($S$) and wind turbine torque ($T_m$) are regarded as the dominant variables.
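The 85% rule can be written as a small selection routine over the eigenvalues of the correlation (or covariance) matrix. The sketch below is illustrative only: the function name `select_components` and the eigenvalues are hypothetical and are not the values behind Table 3. It returns the number of leading components needed, whereas the paper goes one step further and associates the retained factors with physical variables.

```python
def select_components(eigenvalues, threshold=0.85):
    """Return (k, cumulative_share) for the smallest k whose leading
    eigenvalues explain at least `threshold` of the total variance."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    ordered = sorted(eigenvalues, reverse=True)  # descending order of variance
    for k, value in enumerate(ordered, start=1):
        cumulative += value / total
        if cumulative >= threshold:
            return k, cumulative
    return len(ordered), cumulative

# Hypothetical eigenvalues (illustrative only, not from Table 3).
hypothetical_eigenvalues = [8.0, 4.0, 0.5, 0.3, 0.2]
k, share = select_components(hypothetical_eigenvalues)
print(f"components kept: {k}, variance explained: {share:.1%}")
```

With these made-up eigenvalues the first component alone falls short of the threshold, and two components together exceed it, mirroring the situation reported in Table 3 where two factors are sufficient.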

Numerical Simulations.
To verify the effectiveness of dominant variables, three clustering trees are obtained using hierarchical clustering algorithm, where 13 state variables, dominant variables, and wind speed are regarded as clustering indexes, respectively. Figure 2 shows three clustering trees.
References [3,26,27] indicate that it is most reasonable to divide the 24 wind turbines into four groups. According to Figure 2, the clustering results can be obtained, as shown in Table 4. Figures 2(a), 2(b), and 2(c) correspond to (a), (b), and (c) in Table 4, respectively. From (a) and (b) in Table 4, one can see that the clustering results are consistent when the 13 state variables and the dominant variables are used as clustering indexes, respectively. From (a) and (c) in Table 4, one can see that the clustering results are quite different when the 13 state variables and wind speed are used as clustering indexes, respectively. Therefore, extracting dominant variables as clustering indexes by principal component analysis is shown to be effective and feasible.
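Whether two clustering results are consistent can be checked independently of the order in which groups and turbines are listed. The sketch below is one hedged way to do this; the function name `same_partition` and the turbine numbers are hypothetical and do not reproduce Table 4.

```python
def canonical(groups):
    """Represent a partition as a set of frozensets so that ordering is irrelevant."""
    return {frozenset(group) for group in groups}

def same_partition(groups_a, groups_b):
    """True when both groupings contain exactly the same groups."""
    return canonical(groups_a) == canonical(groups_b)

# Hypothetical groupings of six turbines (not the groupings in Table 4).
by_state_variables = [[1, 2], [3, 4], [5, 6]]
by_dominant_variables = [[4, 3], [6, 5], [2, 1]]
by_wind_speed = [[1, 2, 3], [4], [5, 6]]

print(same_partition(by_state_variables, by_dominant_variables))
print(same_partition(by_state_variables, by_wind_speed))
```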
Based on the above results, three models are established using the PSCAD software platform, which use 13 state variables, dominant variables, and wind speed as clustering indexes, respectively. To compare the advantages and disadvantages of the three models, the active power and reactive power output curves of the wind farm are simulated for a three-phase short-circuit fault on the system side and for a wind speed fluctuation. Because exact data of a large-scale wind farm are not readily available in real time, the wind speed of each wind turbine is set to a non-zero constant value during the simulation; the values differ from turbine to turbine and are listed in Table 1. Therefore, the active power and reactive power output curves of the wind farm are continuous but relatively flat, where the X-axis is the running time of the wind turbine (unit: s). Meanwhile, the faults or wind speed fluctuations are superimposed while the wind turbine is running, not while it is stationary. To apply this method to real measurements, one only needs to reset the initial wind speed data of each wind turbine and change the wind speed from a constant value to a variable value during operation. The method remains valid, but the computational cost increases considerably, and the output curve is no longer flat but fluctuates substantially. It is worth mentioning that only the part of the curves containing the fault or wind speed fluctuation is shown, in order to highlight the differences among the three models; the curves in other periods are not shown.

Three-Phase Short-Circuit Fault on System Side.
The simulation curve from the beginning stage to the stable stage is not shown. After the simulation curve has reached a stable state and has run for 14 seconds, an artificial three-phase short-circuit fault is imposed on the system side, which lasts for 0.2 seconds. After 14.2 seconds, the output curve is relatively stable and there is no abnormal change. Hence, we only show the curves for a period of time around 14 seconds. Within this period, the active power and reactive power output curves of the wind farm are shown in Figure 3. From Figure 3, one can see that the active and reactive power curves of the dominant variable model almost coincide with those of the 13-state-variable model, which demonstrates that it is reasonable to extract the dominant variables using the dominant variable hierarchical clustering algorithm proposed in this paper. The dominant variables can represent the actual operating state of the wind turbines in the wind farm in the case of a three-phase short-circuit fault on the system side. In contrast, there is a large error between the wind speed curve and the 13-state-variable curve, which demonstrates that it is inappropriate to take only the wind speed as the clustering index.

Wind Speed Fluctuation.
The simulation curve from the beginning stage to the stable stage is not shown. After the simulation curve has reached a stable state and has run for 14 seconds, an artificial gust fluctuation is imposed, which lasts for 2 seconds with a maximum wind speed of 6 m/s. Because of the short lag and persistence effect of the wind speed fluctuation, the curve is relatively stable after 20 seconds. Hence, we only show the curves from 11 seconds to 20 seconds. Within this period, the active power and reactive power output curves of the wind farm are shown in Figure 4.
From Figure 4, one can see that the active and reactive power curves of the dominant variable model almost coincide with those of the 13-state-variable model, which demonstrates that it is reasonable to extract the dominant variables using the dominant variable hierarchical clustering algorithm proposed in this paper. The dominant variables can represent the actual operating state of the wind turbines in the wind farm even in the case of wind speed fluctuation. In contrast, there is a large error between the wind speed curve and the 13-state-variable curve, which demonstrates that it is inappropriate to take only the wind speed as the clustering index.

Results and Discussion.
From Figures 3 and 4, the following conclusions can be drawn:
(1) In the case of a three-phase short-circuit fault on the system side or a wind speed fluctuation, the maximal relative errors of the output power curve of the wind farm between the wind speed model and the 13-state-variable model and between the dominant variable equivalent model and the 13-state-variable model are 50.14% and 9.9%, respectively.
(2) Reference [19] indicates that the maximal relative error of the output curve of the wind farm is 12.1% between the neural network model and the 13-state-variable model and 11.4% between the support vector machine model and the 13-state-variable model. Reference [33] indicates that the maximal relative error of the output power curve of the wind farm between the K-means clustering model and the 13-state-variable model is 18.06%.
(3) Compared with the wind speed model, K-means clustering model, neural network model, and support vector machine model, the dominant variable equivalent model has the smallest maximal relative error, which demonstrates that this model is more accurate and much closer to the actual operating state of the wind farm.
(4) The dominant variables can also represent the actual operating state of the wind turbines in the wind farm, even in the case of a three-phase short-circuit fault on the system side or a wind speed fluctuation. Therefore, it is reasonable to extract slip ratio ($S$) and wind turbine torque ($T_m$) as dominant variables from the 13 state variables, and it is also reasonable to classify the 24 wind turbines using the hierarchical clustering algorithm.
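The maximal relative error quoted in these conclusions can be computed from two sampled output curves. The exact definition used in the simulations is not spelled out in the text, so the sketch below assumes a pointwise definition, |candidate − reference| / |reference|, maximized over all samples and skipping samples where the reference is zero. The function name and the sample curves are hypothetical and are not data from Figures 3 and 4.

```python
def max_relative_error(reference, candidate):
    """Largest pointwise relative deviation of `candidate` from `reference`.

    Assumed definition: max over samples of |c - r| / |r|, ignoring r == 0.
    """
    if len(reference) != len(candidate):
        raise ValueError("curves must have the same number of samples")
    errors = [abs(c - r) / abs(r) for r, c in zip(reference, candidate) if r != 0]
    if not errors:
        raise ValueError("reference curve has no non-zero samples")
    return max(errors)

# Hypothetical active power samples in MW (not data from the paper).
reference_curve = [20.0, 18.0, 10.0, 19.0]   # stands in for the 13-state-variable model
candidate_curve = [20.0, 17.1, 10.9, 19.0]   # stands in for an equivalent model

error = max_relative_error(reference_curve, candidate_curve)
print(f"maximal relative error: {error:.1%}")
```

Taking the 13-state-variable model as the reference curve matches the way the errors are reported above, where each equivalent model is compared against it.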

Conclusions
In this paper, the mathematical model of the doubly fed induction generator is established with the actual industrial applications of large-scale wind farms in mind, and 13 state variables that represent the dynamic operation process of wind turbines are calculated. Principal component analysis is used to extract dominant variables from the 13 state variables, and these dominant variables serve as the clustering index. Then, three equivalent models are established, which use 13 state variables, dominant variables, and wind speed as clustering indexes, respectively. To compare the advantages and disadvantages of the three models, the active power and reactive power output curves of the wind farm are simulated for a three-phase short-circuit fault on the system side and for a wind speed fluctuation. The simulation results support the following conclusions. (1) It is reasonable and effective to extract dominant variables by principal component analysis. (2) The dominant variable equivalent model has higher accuracy than the wind speed model, K-means clustering model, neural network model, and support vector machine model. (3) When the scale of the wind farm increases, the advantage of this model will be more obvious. (4) The research result is especially suitable for characteristics analysis of large-scale wind farms connected to the power grid. Future research will focus on how to conduct systematic experimental research on large-scale wind farms, how to establish a model that is as close as possible to the actual operating state of a large-scale wind farm, and how to quantify and evaluate the errors between the various equivalent models, complete models, and actual large-scale wind farms.

Data Availability
The data supporting the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.