Haze has become a serious problem in our society. One significant solution is to introduce renewable energy on a large scale. How to ensure that the power system can adapt well to the integration and consumption of new energy has become a scientific issue. This study explores a smart generation control scheme, called hierarchical and distributed control based on virtual wolf pack strategy. The proposed method is based on the multiagent system stochastic consensus game principle and integrates a new win-lose judgment criterion and an eligibility trace. Simulations conducted on a modified power system model based on the IEEE two-area load frequency control and on the Hubei power grid model in China demonstrate that the proposed method can obtain the optimal collaborative control of AGC units in a given regional power grid. Compared with some smart methods, the proposed one improves the closed-loop system performance and reduces carbon emissions. It also achieves faster convergence and stronger robustness.

Recently, thermal power generation has made environmental pollution, especially air pollution, more serious. Therefore, more and more clean energy sources such as wind and photovoltaics are continuously being integrated into the strongly coupled interconnected power grid [

In recent years, many scholars have devoted themselves to the optimal control strategy of decentralized AGC [

The above literature is limited in that it focuses only on the control strategy of the total power in the AGC; the dynamic optimal allocation of the total power is not addressed. In fact, the modern power grid has gradually developed into a hierarchical and distributed control (HDC) structure that integrates large-scale new energy. For this reason, a single control strategy can hardly meet the requirements of the control performance standards (CPS). Therefore, a hierarchical and distributed control based on virtual wolf pack strategy (HDC-VWPS) is proposed in order to attenuate the stochastic disturbance caused by massive integration of new energy into the power grid. The proposed strategy is based on the multiagent system stochastic consensus game (MAS-SCG) and is divided into two parts. The first part is an AGC optimal control method which combines a new win-lose judgment criterion, the policy hill-climbing algorithm (PHC) [

The rest of the paper is organized as follows. The SGC framework based on the HDC structure is proposed in Section

Hierarchical reinforcement learning (HRL) [

The SGC framework based on HDC structure.

An HDC-VWPS is designed to coordinate and optimize the operation of GSGs in the SGC system with HDC structure through the integration of MAS-SG and MAS-CC.

Based on the MAS-SG framework, a PDWoLF-PHC(

The WoLF principle can meet the convergence requirement by changing the learning rate without sacrificing rationality, namely, learn quickly when losing and cautiously when winning [
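The "learn quickly when losing and cautiously when winning" idea can be illustrated with a minimal sketch of one policy hill-climbing step for a single state. It assumes the classic WoLF test (compare the expected value of the current policy with that of the average policy), not the new win-lose criterion proposed in this paper; the function name `wolf_phc_step` and the step sizes `delta_win` and `delta_lose` are hypothetical.

```python
def wolf_phc_step(policy, avg_policy, q_values, delta_win=0.05, delta_lose=0.2):
    """One policy hill-climbing step for a single state.

    Assumption: the classic WoLF test is used, i.e. the agent is "winning"
    when its current policy has a higher expected Q value than its average policy.
    """
    expected = sum(p * q for p, q in zip(policy, q_values))
    expected_avg = sum(p * q for p, q in zip(avg_policy, q_values))
    winning = expected > expected_avg

    # Small step when winning (cautious), large step when losing (fast).
    delta = delta_win if winning else delta_lose

    best = max(range(len(q_values)), key=lambda a: q_values[a])
    new_policy = list(policy)
    share = delta / (len(policy) - 1)
    for a in range(len(policy)):
        if a != best:
            # Move probability mass toward the greedy action,
            # never pushing any probability below zero.
            step = min(new_policy[a], share)
            new_policy[a] -= step
            new_policy[best] += step
    return new_policy, winning
```

Because every unit of probability removed from a non-greedy action is added to the greedy one, the returned policy still sums to one.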

It indicates that PHC algorithm can meet the requirement of the rationality in [

The eligibility trace is updated by

The

The win-lose criterion of PDWoLF-PHC(_{.} Strategy

In (

The MAS-CC framework is introduced into the HDC-VWPS to dynamically allocate the total power command to each unit.

The topology of MAS can be expressed as a directed graph

In a MAS, it is usually called collaborative consensus (CC) [

The CC algorithm can be achieved if and only if the directed graph is strongly connected on the condition of the continuous communication and constant gain
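As an illustration of consensus over a strongly connected directed graph, the following sketch iterates a first-order discrete-time update in which each agent replaces its value with a weighted average of its own value and those of the agents it receives from. The row-stochastic weighting with unit self-weights and the names `row_stochastic` and `run_consensus` are assumptions made for this example and are not taken from the paper.

```python
def row_stochastic(adjacency):
    """Build a row-stochastic weight matrix from a 0/1 adjacency matrix.

    Convention (assumed): adjacency[i][j] == 1 means agent i receives
    information from agent j. A self-weight of 1 is added to every agent.
    """
    n = len(adjacency)
    rows = []
    for i in range(n):
        weights = [float(adjacency[i][j]) for j in range(n)]
        weights[i] += 1.0
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def run_consensus(values, adjacency, steps=200):
    """Iterate x[k+1] = D x[k] for a fixed number of steps."""
    d = row_stochastic(adjacency)
    x = [float(v) for v in values]
    n = len(x)
    for _ in range(steps):
        x = [sum(d[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


# Directed ring of three agents: 0 <- 1 <- 2 <- 0 (strongly connected).
ring = [[0, 1, 0],
        [0, 0, 1],
        [1, 0, 0]]
final = run_consensus([1.0, 2.0, 6.0], ring)
```

For this particular ring the weight matrix is also column-stochastic, so the agents agree on the average of the initial values; on a general strongly connected graph they still agree, but on a weighted average.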

The ramp time is chosen as the consensus variable among all units in a GSG. A unit which has a higher ramp rate will be distributed with more disturbances. The ramp time of the

The ramp time of the

Then the ramp time of the GSG

In the condition of the total power command

As a ramp time CC algorithm among units is adopted, the power of some units may exceed their maximum power. At the same time, the smaller the unit maximum ramp time
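When all units in a GSG share the same ramp time, each unit's share of the total command is proportional to its ramp rate. The sketch below computes that agreed state in closed form and, as a simple assumption for illustration, pins any unit that would exceed its maximum power at the limit and redistributes the remainder among the other units. It is not the iterative CC algorithm of the paper; the function name and arguments are hypothetical, and a positive command with positive ramp rates is assumed.

```python
def equal_ramp_time_dispatch(total, ramp_rates, p_max):
    """Split a positive power command so that unclamped units share one ramp time.

    total      -- total power command (MW), assumed positive
    ramp_rates -- ramp rate of each unit (MW/min), assumed positive
    p_max      -- maximum power of each unit (MW)
    """
    n = len(ramp_rates)
    power = [0.0] * n
    free = set(range(n))
    remaining = float(total)
    while free:
        rate_sum = sum(ramp_rates[i] for i in free)
        ramp_time = remaining / rate_sum  # common ramp time in minutes
        over = [i for i in free if ramp_rates[i] * ramp_time > p_max[i]]
        if not over:
            for i in free:
                power[i] = ramp_rates[i] * ramp_time
            break
        # Pin units that would exceed their limit and share out the rest.
        for i in over:
            power[i] = p_max[i]
            remaining -= p_max[i]
            free.remove(i)
    return power
```

With ramp rates of 5, 7, and 8.2 MW/min and a 100 MW command, no limit is hit and the units receive shares in the ratio 5 : 7 : 8.2, so the fastest unit takes the largest part of the disturbance.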

The impact of the energy management system (EMS) on the environment is considered, and carbon emission (CE) is introduced as part of the reward function. Meanwhile, in load frequency control (LFC), each regional power grid controls the generator sets in its area according to its own area control error (ACE), with the main goal of driving the ACE to zero at steady state. Therefore, the weighted sum of CE and ACE is taken as the objective function in the reward function. The reward function in GSG
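A weighted-sum reward of this kind might look like the following sketch. The exact form used in the paper is not reproduced here: the squared ACE term, the per-unit emission estimate (coefficient times absolute power), the weight `weight`, and the function name `reward` are all assumptions chosen only to show how the two objectives can be combined into one penalty.

```python
def reward(ace_mw, unit_power_mw, ce_coeff, weight=0.5):
    """Negative weighted sum of an ACE penalty and a carbon emission penalty.

    Assumptions: ACE is penalized quadratically, and the emission of each
    unit is its (hypothetical) CE coefficient times its absolute power.
    """
    carbon = sum(c * abs(p) for c, p in zip(ce_coeff, unit_power_mw))
    return -(weight * ace_mw ** 2 + (1.0 - weight) * carbon)
```

The reward is zero only when the ACE is zero and no emitting unit is dispatched, and it decreases as either term grows, so maximizing it trades off frequency regulation against emissions according to `weight`.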

A reasonable set of six parameters

The trace-attenuation factor

The discount factor

The Q-learning rate

The variable learning rate

The value of power error adjustment factor

The overall HDC-VWPS procedure is described in Algorithm

Initialize

Set parameters

Give the initial state

Repeat

(1) Choose an exploration action

(2) Execute the exploration action

(3) Observe a new state

(4) Obtain a short-term reward

(5) Update eligibility trace according to Eq. (

(6) Update Q function using Eq. (

(7) Select variable learning rate

(8) Compute

(9) Calculate

(10) Update the mixed strategy

(11) Obtain the total power

(12) Determine the ramp rate according to Eq. (

(13) Execute CC algorithm according to Eq. (

(14) Calculate the

(15) If the power limit is not exceeded, then execute step 17;

(16) Calculate

(17) Calculate the power error

(18) If

(19) Output the

(20) Set
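Steps (5) and (6) of the procedure above, the eligibility trace update and the Q-function update, can be sketched for a tabular case as follows. The sketch assumes the textbook accumulating-trace form with a greedy target, which may differ from the paper's equations; the names `trace_q_update`, `alpha`, `gamma`, and `lam` are hypothetical stand-ins for the Q-learning rate, the discount factor, and the trace-attenuation factor.

```python
def trace_q_update(q, trace, s, a, reward, s_next, alpha=0.1, gamma=0.9, lam=0.9):
    """Update a tabular Q function in place using an accumulating eligibility trace.

    q and trace are lists of lists indexed as [state][action].
    Assumption: textbook accumulating trace with a greedy one-step target.
    """
    td_error = reward + gamma * max(q[s_next]) - q[s][a]

    # Decay every trace, then mark the state-action pair just visited.
    for i in range(len(trace)):
        for j in range(len(trace[i])):
            trace[i][j] *= gamma * lam
    trace[s][a] += 1.0

    # Every recently visited pair receives part of the current TD error.
    for i in range(len(q)):
        for j in range(len(q[i])):
            q[i][j] += alpha * td_error * trace[i][j]
    return td_error


q_table = [[0.0, 0.0], [0.0, 0.0]]
e_table = [[0.0, 0.0], [0.0, 0.0]]
trace_q_update(q_table, e_table, s=0, a=1, reward=1.0, s_next=1)
trace_q_update(q_table, e_table, s=1, a=0, reward=0.0, s_next=0)
```

After the second call, the pair visited first is updated again through its decayed trace, which is how the trace spreads credit backward over earlier decisions.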

In order to test the control performance of the proposed strategy, an IEEE-modified model with two-area LFC power system [

Modified power system model based on IEEE two-area LFC.

Model parameters of GSG units in the Hubei power grid.

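| GSG number | Plant types | Unit number | GRC (MW/min) | | | |
|---|---|---|---|---|---|---|
| GSG1 | Coal-fired power plants | G1 | 200 | −200 | 5 | 0.99 |
| GSG1 | Coal-fired power plants | G2 | 200 | −200 | 5 | 0.99 |
| GSG1 | Coal-fired power plants | G3 | 200 | −200 | 5 | 0.99 |
| GSG1 | Coal-fired power plants | G4 | 176.5 | 176.5 | 4.4 | 0.89 |
| GSG1 | Coal-fired power plants | G5 | 300 | −300 | 7 | 0.99 |
| GSG1 | Coal-fired power plants | G6 | 300 | −300 | 7 | 0.99 |
| GSG1 | Coal-fired power plants | G7 | 300 | −300 | 7 | 0.99 |
| GSG1 | Coal-fired power plants | G8 | 350 | −350 | 8.2 | 0.89 |
| GSG1 | Coal-fired power plants | G9 | 185 | −185 | 4.6 | 0.99 |
| GSG1 | Pumped storage power plant | G10 | 300 | 0 | 300 | 0 |
| GSG1 | Pumped storage power plant | G11 | 300 | 0 | 300 | 0 |
| GSG1 | Pumped storage power plant | G12 | 300 | 0 | 300 | 0 |
| GSG1 | Pumped storage power plant | G13 | 300 | 0 | 300 | 0 |
| GSG2 | Coal-fired power plants | G14 | 220 | −220 | 5.5 | 0.89 |
| GSG2 | Coal-fired power plants | G15 | 220 | −220 | 5.5 | 0.89 |
| GSG2 | Coal-fired power plants | G16 | 600 | −600 | 12 | 0.89 |
| GSG2 | Coal-fired power plants | G17 | 600 | −600 | 12 | 0.89 |
| GSG2 | Coal-fired power plants | G18 | 300 | −300 | 6 | 0.99 |
| GSG2 | Coal-fired power plants | G19 | 300 | −300 | 6 | 0.99 |
| GSG3 | Coal-fired power plants | G20 | 300 | −300 | 6 | 0.99 |
| GSG3 | Coal-fired power plants | G21 | 300 | −300 | 6 | 0.99 |
| GSG3 | Coal-fired power plants | G22 | 300 | −300 | 6 | 0.99 |
| GSG3 | Coal-fired power plants | G23 | 300 | −300 | 6 | 0.99 |
| GSG3 | Coal-fired power plants | G24 | 600 | −600 | 12 | 0.89 |
| GSG3 | Coal-fired power plants | G25 | 600 | −600 | 12 | 0.89 |
| GSG3 | Hydropower plants | G26 | 150 | 0 | 150 | 0 |
| GSG3 | Hydropower plants | G27 | 150 | 0 | 150 | 0 |
| GSG3 | Hydropower plants | G28 | 150 | 0 | 150 | 0 |
| GSG3 | Hydropower plants | G29 | 150 | 0 | 150 | 0 |
| GSG3 | Hydropower plants | G30 | 150 | 0 | 150 | 0 |
| GSG3 | Hydropower plants | G31 | 150 | 0 | 150 | 0 |
| GSG3 | Hydropower plants | G32 | 170 | 0 | 170 | 0 |
| GSG3 | Hydropower plants | G33 | 170 | 0 | 170 | 0 |
| GSG4 | Coal-fired power plants | G34 | 300 | −300 | 6 | 0.99 |
| GSG4 | Coal-fired power plants | G35 | 300 | −300 | 6 | 0.99 |
| GSG4 | Coal-fired power plants | G36 | 300 | −300 | 6 | 0.99 |
| GSG4 | Coal-fired power plants | G37 | 300 | −300 | 6 | 0.99 |
| GSG4 | Coal-fired power plants | G38 | 1000 | −1000 | 20 | 0.87 |
| GSG4 | Hydropower plants | G39 | 1000 | −1000 | 20 | 0.87 |
| GSG4 | Hydropower plants | G40 | 300 | 0 | 300 | 0 |
| GSG4 | Hydropower plants | G41 | 300 | 0 | 300 | 0 |
| GSG4 | Hydropower plants | G42 | 300 | 0 | 300 | 0 |
| GSG4 | Hydropower plants | G43 | 300 | 0 | 300 | 0 |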

The work cycle of the AGC is set to be 4 s. Note that HDC-VWPS has to undergo a sufficient prelearning through off-line trial and error before the final online implementation. It includes extensive explorations in CPS state space for the optimization of Q-functions and state-value functions [

The prelearning of HDC-VWPS obtained in each GSG: the average of 10-min CPS1, the average of 10-min ACE, and the HDC-VWPS controller output.

Furthermore, a

The convergence result of Q-function differences obtained in each GSG during prelearning.

The convergence result of Q(

The convergence result of HDC-VWPS

In order to evaluate the robustness of each algorithm, the control performances of DWoLF-PHC(

Control performance of four AGC controllers under a step load disturbance: controller output, ACE, and CPS1.

Control performance of four AGC controllers under a stochastic load disturbance: controller output, ACE, and CPS1.

The stochastic white noise is used as the load disturbance after the prelearning process, in which the control performance of each algorithm obtained in each GSG is summarized in Figure ^{−4}~7.5851 × 10^{−4} Hz, and

Statistic performance of each GSG in the two-area LFC modified power system model.

Four-area model of Hubei power grid is shown in Figure

The interconnected network of Hubei power grid model in China.

Hubei power grid model.

The system includes coal-fired power plants, hydropower plants, and pumped storage power plants. The output of each plant is relative to its own governor, and the setting point of AGC is obtained according to the optimal dispatch. The long-term AGC control performance based on MA is evaluated by a statistic experiment with 30-day stochastic load disturbance. Four types of controllers are simulated, that is, Q-learning, Q(

Statistic experiment results obtained under the impulsive perturbation in the Hubei power grid model.

Statistic experiment results obtained under the white noise load fluctuation in the Hubei power grid model.

Figure

Figure

It can be seen from the simulation results that the HDC-VWPS has stronger adaptability and better control performance than the other three methods. In each GSG area, the win-lose criteria of the unit depend on the sign of the product of

Based on the MAS-SCG theory, a novel HDC-VWPS method with new win-lose judgment criterion and eligibility trace is proposed to dynamically obtain the optimal total power and its optimal dispatch. Also, it can attenuate the stochastic disturbance caused by massive integration of new energy to the power grid.

Based on MAS-SG, a PDWoLF-PHC(

The simulation results on the modified IEEE two-area LFC power system model and the Hubei power grid model in China verify the effectiveness of the proposed strategy. Compared with the other smart methods, the proposed one can satisfy the CPS requirements and improve the performance of the closed-loop system. It can also reduce the CE and maximize the utilization rate of energy.

The data used to support the findings of this study are included within the article.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

This work was supported by the National Natural Science Foundation of China (51707102 and 61603212).