A Novel Automatic Generation Control Method Based on the Ecological Population Cooperative Control for the Islanded Smart Grid

To achieve automatic generation control coordination in the islanded smart grid environment resulting from the increasing penetration of renewable energy, a novel ecological population cooperative control (EPCC) strategy is proposed in this paper. The proposed EPCC, based on a new win-loss criterion and the time tunnel idea, can compute the win-loss criterion accurately and converge to Nash equilibrium rapidly. Moreover, based on a multiagent system stochastic consensus game (MAS-SCG) framework, frequent information exchange between agents (AGC units) is implemented to rapidly calculate the optimal power command, which achieves the optimal cooperative control of the islanded smart grid. PDWoLF-PHC(λ), the wolf pack hunting (WPH) strategy, DWoLF-PHC(λ), Q(λ)-learning, and Q-learning are implemented in the islanded smart grid model for the control performance analysis. Two case studies are carried out: the modified IEEE standard two-area load frequency control power system model and the islanded smart grid model with distributed energy and microgrids. The effectiveness, stronger robustness, and better adaptability of the proposed method in the islanded smart grid are verified. Compared with five other smart methods, EPCC improves the convergence speed by nearly 33.9%–50.1%, raises the qualification rate of frequency assessment by 2%–64%, and reduces the power generation cost.


Introduction
Microgrids are an effective way to improve the utilization and penetration of distributed energy and have attracted widespread attention from many researchers [1][2][3]. However, the lack of support from the main power grid, together with the uncertainty of the environment and load fluctuations, makes the control strategy of the microgrid a research focus [4,5]. The concept of a virtual synchronous generator in the microgrid was proposed in [6], and the feasibility of applying the centralized frequency control of a traditional power system to the microgrid was analyzed in detail. In [7], a centralized automatic generation control (AGC) controller based on reinforcement learning in island operation mode was proposed, which realized AGC and frequency regulation in the microgrid. However, considering the utilization of distributed energy, it is difficult to realize cooperative control between provincial dispatching and regional dispatching in the centralized AGC control mode [8]. As to the control method, reinforcement learning has been applied to traditional centralized AGC for the interconnected power grid in our previous studies [9][10][11][12][13][14] to handle the stochastic disturbance caused by distributed energy access to the power grid.
However, all of the above are centralized control methods, which require a large amount of remote information and suffer from slow dynamic response and unsatisfactory control performance. Consequently, the study of distributed control methods becomes particularly necessary. In [15], a distributed multiagent consensus control method for the energy storage system in the microgrid was proposed, which addressed the active power fluctuation at the point of common coupling. A distributed stable modular control method for the independent microgrid was then put forward in [16], which addressed the stability and convergence of complex microgrids. Moreover, it was demonstrated in [17] that a distributed control method for daily voltage and daily power could handle the nonlinear integer programming in the distribution grid.
The authors have also completed some previous studies on distributed control. In [18], based on the heterogeneous multiagent system stochastic game (MAS-SG) principle, the decentralized win or learn fast policy hill-climbing(λ) (DWoLF-PHC(λ)) algorithm was proposed to obtain dynamic optimization control of the AGC total command, in which the average mixed strategy is used instead of the equilibrium strategy. However, the total power command of the provincial dispatch center was distributed according to a fixed proportion of the adjustable capacity rather than by dynamic optimization, and a multisolution problem may emerge when the number of agents explodes, which may lead to a severe collapse of system stability. A method should therefore be sought to solve these problems.
Homogeneous multiagent system collaborative consensus (MAS-CC) is used not only in military, shipping, and robotics applications but also in the power system control field [19]. In addition, the incremental consensus acceleration algorithm was proposed in [20] to obtain the optimal operation of microgrids. In [21], the problem of decentralized autonomy for economic dispatch was effectively resolved through a collaborative dynamic agent framework. It was demonstrated in [22] that the PI controller was widely used to obtain the total power command, while the homogeneous MAS-CC theory was used to dynamically dispatch the total power command. However, the distributed dynamic optimal control of the total power command is usually ignored. Few available studies address the dynamic optimal control and the dispatch of the AGC total power command simultaneously, that is, true intelligence in both the whole and the parts.
Therefore, this paper attempts to explore an AGC method with a hierarchical and distributed control (HDC) structure to solve the above problem. Based on the authors' previous work [23][24][25], a novel multiagent system stochastic consensus game (MAS-SCG) framework is designed by combining the MAS-SG and MAS-CC frameworks to solve the basic problem of the "homogeneous/heterogeneous multiagent mixed stochastic game." Based on this framework, an ecological population cooperative control (EPCC) strategy is proposed, which realizes the overall cooperative control and optimization of a distributed HDC system and resolves the multisolution problem and the stochastic disturbance problems arising from distributed energy access. Two case studies are carried out: the modified IEEE standard two-area load frequency control power system model and the islanded smart grid model with distributed energy and microgrids. The effectiveness, stronger robustness, and better adaptability of the proposed algorithm in the islanded smart grid are verified. Compared with five other smart methods, EPCC improves the convergence speed and the qualification rate of frequency assessment and reduces the power generation cost.

AGC Control Framework Based on HDC Structure
Taking the high-voltage DC separatrix as the boundary, the large power grid can be virtually divided into multiple small regional power grids through a graph cut method. Figure 1 describes the islanded smart grid with an HDC structure [26], which obtains the total power command through the game among the areas, along with each unit's own optimal power command through communication between each unit and its adjacent units. The distributed energy group is regarded as a "virtual generation ecosystem (VGE)" in Figure 1. Here, the ecosystem indicates that the various distributed energies are treated as a natural biological population, whose characteristics can be used to solve the control problem. Cyber connection means that each VGE area is automatically disconnected from the main power grid and enters island operation when a serious failure occurs in the system; physical connection means that each VGE area maintains steady-state system operation through a physical and network information connection.

Ecological Population Cooperative Control
The ecological population cooperative control (EPCC) proposed in this paper is developed by the combination of MAS-SG and MAS-CC to get the distributed equilibrium solution for the islanded smart grid, which can obtain global control and optimization of this grid.

MAS-SG Theory.
Based on MAS-SG, the novel PDWoLF-PHC(λ) with the time tunnel idea is put forward to obtain the dynamic optimal power command, so that optimal control of the islanded smart grid is achieved. The win or learn fast (WoLF) principle has been studied thoroughly by many scholars. The learning rate is increased when the player loses and decreased when the player wins, so as to maintain the original strategic advantage [27]. The player's win or loss is determined by comparing the current strategy with the average strategy. In the 2 × 2 game, however, the player cannot accurately compute the WoLF win-loss criterion, and the decision of extended algorithms such as WoLF-PHC can only be made from a valuation of the equilibrium reward. Therefore, an improved version of WoLF, the policy dynamics-based WoLF (PDWoLF) principle, was put forward in [28], in which the decision change rate and the decision space slope value are adopted as the assessment factors. If their product is less than 0, the player wins.
In [28], the extended PDWoLF-PHC algorithm is proposed by combining policy hill-climbing (PHC) with PDWoLF. With a variable learning rate, the algorithm converges to the optimal solution by reacting to environmental changes and adaptively adjusting its own strategy in the multiagent system. The algorithm is both rational and convergent. In general, WoLF-PHC estimates the value directly and acquires decisions based on a valuation of the equilibrium reward. However, the comparison between the average strategy and the current strategy cannot be used as a win-loss criterion in games larger than 2 × 2. In contrast, PDWoLF-PHC obtains the decision directly from the dynamic development of the joint trajectory in its phase space. This principle gives PDWoLF-PHC faster convergence, a lower decision-making error rate, and better stability of the global learning process.

For a given agent, the win-loss criterion of PDWoLF-PHC is determined by two parameters, $\varphi_{\text{win}}$ and $\varphi_{\text{lose}}$. Let the agent be in state $s_k$ with reward function $R$; based on a mixed strategy table $\pi(s_k, a_k)$, and after an exploration action $a_k$, it transits to state $s_{k+1}$. In the updating rules of $\pi(s_k, a_k)$, $\Delta_{s_k a_k}$ is the variation applied during the strategy update, $|A_i|$ is the number of actions under state $s_k$, $\varphi$ is the variable learning rate, and $\varphi_{\text{lose}} > \varphi_{\text{win}}$.

The PDWoLF-PHC(λ) algorithm, which integrates PDWoLF-PHC [28] with the time tunnel idea, is put forward in this paper on the basis of the Q-learning framework. Q-learning [29], presented by Watkins in 1989, is a reinforcement learning algorithm with a strong self-learning ability that obtains the optimal solution through continuous trial-and-error and interaction with the environment. The optimal target value function $V^{\pi^*}(s)$ and strategy $\pi^*(s)$ are as follows:

$$V^{\pi^*}(s) = \max_{a \in A} Q(s, a), \qquad \pi^*(s) = \arg\max_{a \in A} Q(s, a),$$

where $A$ is the set of actions.
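The win-loss test and the hill-climbing step described above can be illustrated with a short sketch. It assumes the commonly used PDWoLF-PHC form: the learning rate is chosen from the sign of the product of the decision change rate and the decision space slope for the executed action, probability mass moves from non-greedy actions to the greedy one, and both quantities are then refreshed. The function name, the table layout, and the values of `phi_win` and `phi_lose` are hypothetical and are not the settings of Table 1.

```python
def pdwolf_phc_step(pi, q, delta, delta2, s, a_k, phi_win=0.01, phi_lose=0.04):
    """Apply one policy hill-climbing step to state s; return the learning rate used.

    pi, q, delta, delta2 are lists of per-state lists indexed [state][action].
    Assumes at least two actions per state.
    """
    n = len(pi[s])
    # Win-loss test on the executed action: a negative product of the decision
    # change rate and the decision space slope means the agent is winning.
    winning = delta[s][a_k] * delta2[s][a_k] < 0
    phi = phi_win if winning else phi_lose

    greedy = max(range(n), key=lambda a: q[s][a])
    change = [0.0] * n
    for a in range(n):
        if a != greedy:
            # Never remove more probability than the action currently holds.
            dec = min(pi[s][a], phi / (n - 1))
            change[a] = -dec
            change[greedy] += dec

    for a in range(n):
        delta2[s][a] = change[a] - delta[s][a]  # slope: change of the change
        delta[s][a] = change[a]                 # latest decision change
        pi[s][a] += change[a]
    return phi
```

Because every decrement taken from a non-greedy action is added to the greedy action, the mixed strategy stays a valid probability distribution after each step.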
The time-varying multistep backtrack eligibility trace [30] can be regarded as a time tunnel. The frequency of each joint action strategy is recorded in the time tunnel to update the iterative Q value of each action strategy. Furthermore, in each iteration, the joint state and action are recorded in the time tunnel, which assigns reward and punishment to the multistep historical decisions made during learning. The Q function and the time tunnel are both stored as two-dimensional state-action lookup tables. The frequency and the recency information of the historical decision-making process are combined in the time tunnel to obtain the optimal Q function of the AGC controller. The multistep information updating mechanism of the Q function is obtained by the backward valuation of the time tunnel.

In this paper, the SARSA(λ) algorithm [31] based on the time tunnel idea is chosen. In the SARSA(λ)-based time tunnel, $e_k(s, a)$ is the time tunnel at the kth iteration under state $s$ and action $a$, γ is the discount factor, and λ is the time tunnel attenuation factor. The agent evaluates the current value function errors from the reward value $R$ obtained in the current exploration, where $R(s_k, s_{k+1}, a_k)$ is the agent's reward function for the transition from state $s_k$ to $s_{k+1}$ under the selected action $a_k$, $a_g$ is the greedy action strategy, $\rho_k$ is the Q function error of the agent at the kth iteration, and $M_k$ is the evaluation of the Q function error.
The Q(λ) algorithm [32] updates the Q function iteratively, where α is the learning rate.
With sufficient trial-and-error iterations, the state value function $Q_k(s, a)$ converges to the $Q^*$ matrix with probability 1, and finally an optimal control strategy represented by the $Q^*$ matrix is acquired.
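One iteration of the time tunnel mechanism can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact update of the paper: the trace is taken as an accumulating eligibility trace decayed by γλ, $\rho_k$ and $M_k$ are both formed with the greedy value of the next state, and the ordering of the trace and Q updates is one common choice. The function name, table layout, and parameter values are hypothetical.

```python
def time_tunnel_step(q, e, s, a, reward, s_next, alpha=0.1, gamma=0.9, lam=0.9):
    """One Q(lambda)-style update with an eligibility trace ("time tunnel").

    q and e are lists of per-state lists indexed [state][action]; both are
    modified in place. Returns the two error terms (rho, m).
    """
    v_next = max(q[s_next])                  # value of the greedy action a_g in s_{k+1}
    rho = reward + gamma * v_next - q[s][a]  # single-step Q function error
    m = reward + gamma * v_next - max(q[s])  # error evaluated at the greedy action in s_k

    # Decay every trace and propagate the error back along the time tunnel.
    for i in range(len(q)):
        for j in range(len(q[i])):
            e[i][j] *= gamma * lam
            q[i][j] += alpha * m * e[i][j]

    q[s][a] += alpha * rho  # correct the pair that was just visited
    e[s][a] += 1.0          # record the visit (frequency and recency)
    return rho, m
```

Pairs visited recently or often hold larger traces, so they receive a larger share of each new error, which is the multistep backward valuation described in the text.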

MAS-CC Theory.
Based on the MAS-CC, the consensus algorithm based on the equal incremental principle is adopted to achieve dynamic optimal AGC unit dispatch, so that the optimization for the islanded smart grid system is realized.

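Graph Theory. The topology of the MAS can be expressed as a digraph $G = (V, E, B)$ with the vertex set $V = \{v_1, v_2, \ldots, v_n\}$, the edge set $E \subseteq V \times V$, and the weighted adjacency matrix $B = [b_{ij}] \in \mathbb{R}^{n \times n}$, where $v_i$ represents the ith agent and an edge stands for the relationship between two agents; the constant $b_{ij}$ ($b_{ij} \ge 0$) is the weight factor between agents. If there is a directed path between any two vertices, the graph $G$ is called a strongly connected digraph. The Laplace matrix $L = [l_{ij}] \in \mathbb{R}^{n \times n}$ of digraph $G$ is given as

$$l_{ii} = \sum_{j=1,\, j \ne i}^{n} b_{ij}, \qquad l_{ij} = -b_{ij} \quad (i \ne j),$$

where $L$ determines the topology of the MAS.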
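As a small illustration of these definitions, the sketch below builds the Laplace matrix from a weighted adjacency matrix using the standard definition (off-diagonal entries $-b_{ij}$, diagonal entries equal to the row sum of the weights) and checks strong connectivity by reachability in the graph and in its reverse. The function names and the example topologies are hypothetical.

```python
def laplacian(b):
    """Laplace matrix of a weighted digraph given its adjacency matrix b."""
    n = len(b)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                lap[i][j] = -b[i][j]
                lap[i][i] += b[i][j]
    return lap


def is_strongly_connected(b):
    """True if every vertex can reach every other vertex along directed edges."""
    n = len(b)

    def reaches_all(weight):
        seen, stack = {0}, [0]
        while stack:
            i = stack.pop()
            for j in range(n):
                if weight(i, j) > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        return len(seen) == n

    # Vertex 0 must reach all vertices in the graph and in the reversed graph.
    return reaches_all(lambda i, j: b[i][j]) and reaches_all(lambda i, j: b[j][i])
```

Each row of the resulting Laplace matrix sums to zero, which is a quick sanity check on any topology supplied to the consensus algorithm.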

The First-Order Consensus Algorithm of a Discrete System. For the digraph $G$, each of the n autonomous agents of the MAS is treated as a node. The purpose of the consensus algorithm is for the agents to reach a consensus, with each agent updating its state in real time after communicating with adjacent agents. Taking the communication delay among agents into account, the first-order consensus algorithm of a discrete system can be written as

$$x_i[k+1] = \sum_{j=1}^{n} d_{ij}[k]\, x_j[k], \tag{9}$$

where $x_i$ is the state of the ith agent, k represents the discrete time series, and $d_{ij}[k]$ denotes the (i, j) entry of the row stochastic matrix $D = [d_{ij}[k]] \in \mathbb{R}^{n \times n}$ at discrete time k. The collaborative consensus can be achieved if and only if the digraph is strongly connected, under the condition of continuous communication and constant gain $b_{ij}$.
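The first-order consensus iteration can be sketched in a few lines. The construction of the row stochastic matrix used here, $d_{ij} = |l_{ij}| / \sum_j |l_{ij}|$, is an assumption made for illustration (one common choice); the function names and the three-agent chain topology are likewise hypothetical.

```python
def row_stochastic(lap):
    """Row stochastic matrix built from a Laplace matrix (assumed construction).

    Every agent must have at least one neighbor, otherwise a row sum is zero.
    """
    d = []
    for row in lap:
        total = sum(abs(v) for v in row)
        d.append([abs(v) / total for v in row])
    return d


def run_consensus(x0, d, steps):
    """Iterate x_i[k+1] = sum_j d_ij * x_j[k] for a fixed matrix d."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        x = [sum(d[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


# Chain topology 1 - 2 - 3 with unit weights.
lap = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
d = row_stochastic(lap)
states = run_consensus([1.0, 2.0, 6.0], d, 100)
```

For this topology the agents agree on a weighted average of the initial states in which the middle agent, having two neighbors, carries twice the weight of the end agents.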

Consensus Algorithm Based on Equal Incremental Principle. According to (9), the consensus algorithm based on the equal incremental principle is expressed as

$$\omega_i[k+1] = \sum_{j=1}^{n} d_{ij}[k]\, \omega_j[k].$$

For the chief unit, which communicates with the other units and sends the optimal power command to them [23], the power deviation is introduced into the consensus update so that the power constraints are met. Here $\Delta P_{\text{error}}$ is the difference between the total power and the total regulation power of all units. Considering the generation constraints, the consensus update is further modified, where $\omega_{i,\text{lower}}$ and $\omega_{i,\text{upper}}$ are the consensus variables of the ith agent at its lower and upper regulation limits.
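The role of the chief unit can be illustrated with a simplified sketch. It assumes that every unit has a quadratic regulation cost, so that the regulation power follows from the consensus variable as $\Delta P_{Gi} = (\omega_i - \beta_i)/(2\sigma_i)$, and that the chief unit adds a feedback term `eps * error` to its own consensus update. The gain `eps`, the function name, and the example data are hypothetical, and generation limits are deliberately omitted.

```python
def leader_consensus_dispatch(sigma, beta, d, dp_total, leader=0, eps=0.1, steps=400):
    """Consensus on the incremental rate with power-mismatch feedback at the leader.

    sigma, beta: quadratic and linear regulation-cost coefficients per unit.
    d: row stochastic communication matrix. Returns (omega, dp) after `steps`.
    """
    n = len(sigma)
    omega = [0.0] * n
    for _ in range(steps):
        dp = [(omega[i] - beta[i]) / (2.0 * sigma[i]) for i in range(n)]
        error = dp_total - sum(dp)  # total command minus total regulation power
        nxt = [sum(d[i][j] * omega[j] for j in range(n)) for i in range(n)]
        nxt[leader] += eps * error  # only the chief unit sees the mismatch
        omega = nxt
    dp = [(omega[i] - beta[i]) / (2.0 * sigma[i]) for i in range(n)]
    return omega, dp


# Three fully connected units with equal communication weights.
d = [[1.0 / 3.0] * 3 for _ in range(3)]
omega, dp = leader_consensus_dispatch([0.5, 0.4, 0.25], [1.0, 2.0, 1.5], d, 10.0)
```

In this example all units settle on a common incremental rate and the regulation powers sum to the total command. The gain `eps` must be small enough for the iteration to remain stable, and its suitable range depends on the topology and the cost coefficients.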
Virtual Consensus Variable.
From (14), it can be seen that the update of the consensus variable is restricted by the maximum and minimum regulation capacity of the unit. Basically, if the unit output exceeds its active power limit, the limit is selected as the consensus variable and is no longer updated. This jump in the update rules means that the dimensions of the topology matrix $D$ and its elements $d_{ij}$ change frequently. In addition, an effective method is needed to handle the real-time varying topology and meet the plug-and-play demand of the islanded smart grid.
Hence, the virtual consensus variable is proposed in this paper to deal with the above issue.As shown in (14), it is not necessary to consider the unit power constraint under a self-update condition, so that the amount of calculation can be greatly reduced.
Moreover, the real consensus variable ω i can be achieved by the virtual connection between one and reserve unit through virtual consensus variables ω i,virtual along with correction on power constraints.So plug-and-play function can be accomplished without any further modification of the system topology.
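The real consensus variable $\omega_i$ is obtained from the virtual consensus variable $\omega_{i,\text{virtual}}$ after correction for the power constraints.

AGC Power Dispatch Model for the Islanded Smart Grid.
In the islanded smart grid, the generation cost is chosen as the consensus variable for all units, which is usually expressed by the following equation:

$$C_i(P_{Gi}) = a_i P_{Gi}^2 + b_i P_{Gi} + c_i,$$

where $P_{Gi}$ represents the active power of the ith unit, $C_i$ is the power generation cost of the ith unit, and the positive constants $a_i$, $b_i$, and $c_i$ are the coefficients of the power generation cost. Therefore, the power generation cost for the power dispatch of the specified AGC is given as follows:

$$C_i(P_{Gi,\text{actual}}) = C_i(P_{Gi,\text{plan}} + \Delta P_{Gi}) = \sigma_i \Delta P_{Gi}^2 + \beta_i \Delta P_{Gi} + \psi_i,$$

where $P_{Gi,\text{actual}}$ is the actual active power of the ith unit, $P_{Gi,\text{plan}}$ represents the planned generation power of the ith unit, $\Delta P_{Gi}$ is the AGC regulation power of the ith unit, and the positive constants $\sigma_i$, $\beta_i$, and $\psi_i$ are the dynamic coefficients under power disturbances, with $\sigma_i = a_i$, $\beta_i = 2 a_i P_{Gi,\text{plan}} + b_i$, and $\psi_i = a_i P_{Gi,\text{plan}}^2 + b_i P_{Gi,\text{plan}} + c_i$. For a system consisting of n AGC units, the objective function of AGC can be written as

$$\min C_{\text{total}} = \sum_{i=1}^{n} \left( \sigma_i \Delta P_{Gi}^2 + \beta_i \Delta P_{Gi} + \psi_i \right), \quad \text{s.t. } \Delta P_{\Sigma} - \sum_{i=1}^{n} \Delta P_{Gi} = 0, \quad \Delta P_{Gi}^{\min} \le \Delta P_{Gi} \le \Delta P_{Gi}^{\max},$$

where $C_{\text{total}}$ is the total actual generation cost, $\Delta P_{\Sigma}$ is the total power command, and $\Delta P_{Gi}^{\min}$ and $\Delta P_{Gi}^{\max}$ are the minimum and maximum regulation power, respectively.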
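The coefficient mapping and the resulting dispatch can be checked numerically. The sketch below takes the constant term as $\psi_i = a_i P_{Gi,\text{plan}}^2 + b_i P_{Gi,\text{plan}} + c_i$, which is what substituting $P_{Gi,\text{plan}} + \Delta P_{Gi}$ into the quadratic cost yields, and then solves the dispatch in closed form under the assumption that no regulation limit is active, in which case all marginal costs are equal at the optimum. Function names and numerical values are hypothetical.

```python
def regulation_coefficients(a, b, c, p_plan):
    """Coefficients of the cost expressed in the regulation power dP."""
    sigma = a
    beta = 2.0 * a * p_plan + b
    psi = a * p_plan ** 2 + b * p_plan + c
    return sigma, beta, psi


def generation_cost(a, b, c, p):
    """Quadratic generation cost of one unit at active power p."""
    return a * p ** 2 + b * p + c


def equal_incremental_dispatch(sigma, beta, dp_total):
    """Closed-form dispatch with equal marginal cost; limits are ignored."""
    inv = [1.0 / (2.0 * s) for s in sigma]
    omega = (dp_total + sum(b * g for b, g in zip(beta, inv))) / sum(inv)
    dp = [(omega - b) * g for b, g in zip(beta, inv)]
    return omega, dp


sigma, beta, psi = regulation_coefficients(0.002, 3.0, 10.0, 100.0)
direct = generation_cost(0.002, 3.0, 10.0, 120.0)
expanded = sigma * 20.0 ** 2 + beta * 20.0 + psi
omega, dp = equal_incremental_dispatch([0.5, 0.4, 0.25], [1.0, 2.0, 1.5], 10.0)
```

When a unit reaches a regulation limit, the closed form no longer applies directly: that unit is fixed at its limit and the remaining command is shared among the other units.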
According to the equal incremental principle, $C_{\text{total}}$ reaches its minimum when the derivatives of the generation cost with respect to the AGC regulation power are equal for all units. The constraint equation is established as

$$\frac{dC_1(P_{G1,\text{actual}})}{d\Delta P_{G1}} = \frac{dC_2(P_{G2,\text{actual}})}{d\Delta P_{G2}} = \cdots = \frac{dC_n(P_{Gn,\text{actual}})}{d\Delta P_{Gn}} = \omega,$$

where ω denotes the equal incremental rate of generation costs. ω is selected as the MAS consensus variable, which is calculated as follows:

$$\omega_i = 2 \sigma_i \Delta P_{Gi} + \beta_i,$$

where $\omega_i$ is the equal incremental rate of the ith unit generation cost.

Reward Function Selection.
In general, the absolute value of the frequency deviation Δf affects the long-term benefit of the control and restrains power fluctuation, while the economic effect of the energy management system is captured by the generation costs. As a result, the weighted sum of $|\Delta f|$ and $C_{\text{instantaneous}}$ is chosen as the reward function, in which a greater weighted sum leads to a smaller reward. In the reward function, Δf and $C_{\text{instantaneous}}$ represent the instantaneous absolute value of the frequency deviation and the actual generation costs of all units in the kth iteration, respectively; μ and 1 − μ are the reward weights of Δf and $C_{\text{instantaneous}}$, with μ = 0.5.
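A reward of this kind can be sketched as a negated weighted sum. The plain form below, without any scaling of the two terms, is an assumption made for illustration; in practice the frequency and cost terms would normally be scaled to comparable magnitudes. The function name is hypothetical.

```python
def reward(delta_f, cost_instantaneous, mu=0.5):
    """Negated weighted sum of |delta_f| and the instantaneous generation cost.

    A larger weighted sum gives a smaller (more negative) reward.
    """
    return -(mu * abs(delta_f) + (1.0 - mu) * cost_instantaneous)
```

With μ = 0.5, a frequency deviation of −0.02 Hz and an instantaneous cost of 10 give a reward of −5.01, and any larger deviation at the same cost gives a lower reward.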

Parameter Setting.
In this paper, the design of the control system requires a reasonable setting of four parameters: λ, γ, α, and φ. Repeated simulations and trial-and-error show that good control performance is obtained with the parameter values listed in Table 1.

Example Analysis
When participating in primary frequency modulation (FM) of the load, the EVs adopt a linear frequency droop. When a frequency fluctuation is detected, the droop control changes the charge and discharge power of the EVs in proportion to the frequency deviation, where $\Delta P_{\text{droop}}$ denotes the change in the charging power of the EVs due to the droop control, $k_p$ is the characteristic coefficient of the EVs, and Δf is the system frequency deviation.
If the system frequency is to be restored to the rated frequency, secondary regulation is needed, which is similar to the secondary FM of the traditional power system. According to the integral frequency method, an integral controller is added to the EV model to meet the demand for transient response, and a dead zone is applied to it to avoid frequent charging and discharging. The EV frequency control model is shown in Figure 3, in which the frequency deviation integral signal is obtained by passing the deviation signal through the integrator, and the regulation is distributed between the traditional units and the EVs in a certain proportion; $\alpha_{EV}$ is the integral coefficient of the EVs. Considering the delay in communication and control, a first-order inertia link is adopted to simulate the delay in the control process of the EVs, where $T_{EV}$ indicates the charge and discharge time.
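The structure of this EV model (proportional droop, integral action with a dead zone, and a first-order inertia link) can be sketched in discrete time. The sketch is an illustration only: the sign convention (a positive frequency deviation increases the charging power), the forward-Euler discretization, the function name, and all numerical values are assumptions rather than the parameters of Figure 3.

```python
def ev_frequency_response(delta_f_seq, k_p, alpha_ev, t_ev, dt, dead_zone):
    """Simulate the EV power change for a sequence of frequency deviations.

    k_p: droop coefficient, alpha_ev: integral coefficient,
    t_ev: time constant of the first-order inertia link, dt: time step.
    """
    integral = 0.0
    output = 0.0
    trajectory = []
    for delta_f in delta_f_seq:
        droop = k_p * delta_f  # primary FM: proportional droop
        if abs(delta_f) > dead_zone:
            # Secondary FM: integrate only outside the dead zone
            # to avoid frequent charging and discharging.
            integral += alpha_ev * delta_f * dt
        command = droop + integral
        # First-order inertia link (forward Euler) models the control delay.
        output += dt / t_ev * (command - output)
        trajectory.append(output)
    return trajectory


inside = ev_frequency_response([0.01] * 300, 100.0, 50.0, 1.0, 0.1, 0.02)
outside = ev_frequency_response([0.1] * 300, 100.0, 50.0, 1.0, 0.1, 0.02)
```

For a small deviation inside the dead zone only the droop term acts and the output settles at $k_p \Delta f$, whereas a deviation outside the dead zone keeps driving the integral term.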
In order to test the control performance of the proposed PDWoLF-PHC(λ), the improved IEEE standard two-area LFC power system model is chosen, whose parameter settings are from the literature [33].After adding the static frequency characteristic model of EVs, the frame structure of the improved model is shown in Figure 4.
The work cycle of the AGC is 4 sec, with a 20 sec time delay $T_s$ in the secondary FM. Note that PDWoLF-PHC(λ) has to undergo sufficient prelearning through offline trial-and-error before the final online operation, which involves a large amount of exploration in the CPS state spaces to optimize the Q function and the state value function. In the prelearning stage, a continuous sinusoidal load disturbance with a period of 1200 sec and an amplitude of 1000 MW is applied to the improved model. The simulation results of the typical prelearning procedure of the PDWoLF-PHC(λ) controller are given in Figure 5. In Figure 5(a), the output of the PDWoLF-PHC(λ) controller completely tracks the load disturbance after about 2530 sec of trial-and-error. In addition, Figures 5(b …

The Islanded Smart Grid Model.
As shown in Figure 6, the islanded smart grid model, which contains both distributed energy (small hydropower, wind farms, biomass, etc.) and several typical microgrids (hybrid diesel generator-wind, hybrid micro gas-photovoltaic, etc.), has been built, where the inertia constant H and the load damping coefficient D are equal to 20 sec and 1 Hz, respectively.
Since the PDWoLF-PHC(λ) controller is used to obtain the total power command in the first stage of AGC, the output of the controller is determined by the frequency deviation value and the regulation cost. The islanded smart grid contains 5 hydropower units, 2 biomass units, 6 micro gas turbine units, 2 fuel cell units, 4 diesel generator units, and other units with uncontrollable generation power; the total regulation power is 2760 kW. Note that photovoltaics, wind farms, and EVs are considered disturbance loads and are not included in FM, so the model is simplified to a certain degree. The models for small hydropower units, biomass generators, micro gas turbines, fuel cells, diesel generators, flywheel energy storage, and so on follow the typical models in [34][35][36][37]. The photovoltaic model is built by imitating the full-day light intensity changes in [38]; the wind farm model adopts a stochastic wind of finite-bandwidth white noise with a 3 m/sec cut-in wind speed and a 20 m/sec cut-off wind speed; the model of EV access to the power grid is taken from [33]. In addition, the relevant parameters, listed in Table 2, are taken from the literature [23,24,39]. Moreover, each regulation unit has a corresponding agent, and the connection weight $b_{ij}$ between agents is set to 1.

Prelearning of Model.
Note that EPCC has to undertake sufficient prelearning through offline trial-and-error before the final online operation; a given continuous sinusoidal load disturbance (the light blue line in Figure 7(a)) is applied.The prelearning result of the islanded smart grid model is demonstrated in Figure 7, and it is obvious that the EPCC can converge to the optimal strategy.
Besides, the condition that the 2-norm of the change in the Q matrix between successive iterations falls below 0.0001 (a specified positive constant) is chosen as the termination criterion for the prelearning. Both the Q values and the look-up table are saved after prelearning so that EPCC can be applied to a real power system. The deviation convergence of the Q function during the prelearning process is shown in Figure 8, in which EPCC converges faster than the other algorithms by nearly 33.9%–50.1%.
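A termination test of this kind can be sketched as follows. The sketch assumes that the criterion compares the Q table of the current iteration with that of the previous one and that the norm is taken entrywise (the Frobenius norm); the function name and the example tables are hypothetical.

```python
import math


def q_converged(q_new, q_old, tol=1e-4):
    """True when the entrywise 2-norm of (q_new - q_old) is at most tol."""
    squared = sum(
        (x - y) ** 2
        for row_new, row_old in zip(q_new, q_old)
        for x, y in zip(row_new, row_old)
    )
    return math.sqrt(squared) <= tol
```

In a prelearning loop this check would be evaluated once per iteration on a copy of the previous Q table, and learning would stop the first time it returns true.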

Step, Impulsive, and White Noise Load Disturbance.
For the online operation mode, a step load disturbance (the light blue line in Figure 9(a)) is introduced into the model to simulate a frequently occurring sudden load increase in the islanded smart grid, taking the 24-hour load disturbance as the assessment period to evaluate the control performance of the EPCC strategy.
Six types of controllers are tested: the EPCC strategy, WPH [25], PDWoLF-PHC(λ), DWoLF-PHC(λ) [18], Q(λ)-learning [32], and Q-learning [29]. Figure 9 shows the control performance of the different methods under step load disturbance. Figure 9(a) indicates that the EPCC strategy can quickly track a given power curve. Figure 9(b) shows that the overshoots of the six controllers are around 2.6%, 8.3%, 2.8%, 3.3%, 4.8%, and 4.9%, respectively, while the average values of Δf are 0.0013 Hz, 0.0017 Hz, 0.0033 Hz, 0.0065 Hz, 0.0413 Hz, and 0.0452 Hz, respectively. Compared with the other smart methods, EPCC reduces the overshoot by 0.2%–5.7% and Δf by 0.0004–0.04 Hz. It can be seen that the EPCC controller has a significant control effect on Δf with less output fluctuation, which provides better control performance for the AGC units while reducing control costs and unit wear.
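A more practical operation is considered in this study, which can further verify the control performance of the proposed strategy. An impulsive load disturbance (the light blue line in Figure 10(a)) is introduced into the islanded smart grid model to simulate a series of sudden regular load increases and decreases, and a white noise load disturbance (the light blue line in Figure 11(a)) is also introduced. Figures 10 and 11 show the controller output curves and frequency curves of the different methods under impulsive load disturbance and white noise load disturbance, respectively. As shown in these two figures, compared with other methods, the EPCC strategy tracks impulsive load disturbances and white noise load disturbances more quickly and accurately, with small output fluctuations and good stability.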
In addition, considering Δf, if 50 ± 0.2 Hz is selected as the operating frequency range of the islanded smart grid, the frequency indices of the different algorithms under impulsive load disturbance and white noise load disturbance are as shown in Table 3. The data in the table show that, compared with other methods, EPCC significantly reduces the average value of Δf under impulsive load disturbance, and the qualification rate increases by 4.52% to 11.09%. Under white noise load disturbance, the average value of Δf with EPCC is decreased by 0.0005–1.6484 Hz, and the maximum value is decreased by 0.0404–90.8175 Hz. The standard deviation is decreased by 0.0023–1.2359 Hz, and the frequency qualification rate is increased by 2%–64%. This further shows that EPCC has the best control performance under load disturbance conditions, as well as faster dynamic optimization speed and stronger robustness.
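The frequency indices discussed above can be computed from a sampled frequency trace as in the sketch below. It assumes that the qualification rate is the percentage of samples inside the 50 ± 0.2 Hz band and that the average is taken over the absolute deviation; the function name and the sample values are hypothetical.

```python
def frequency_indices(freqs, nominal=50.0, band=0.2):
    """Qualification rate (%) and mean absolute deviation of a frequency trace."""
    deviations = [abs(f - nominal) for f in freqs]
    qualified = sum(1 for dev in deviations if dev <= band)
    rate = 100.0 * qualified / len(freqs)
    mean_abs_dev = sum(deviations) / len(deviations)
    return rate, mean_abs_dev


rate, mean_abs_dev = frequency_indices([50.0, 50.1, 49.7, 50.3])
```

Here two of the four samples lie inside the band, so the qualification rate is 50%.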

Stochastic Load Disturbance.
A real-time simulation of 24-hour stochastic load is conducted in the islanded smart grid model, in which the stochastic load disturbance consisting of square wave, wind farms, photovoltaics, and electric vehicles can be regarded as a square stochastic load with a cycle of 3600 sec and a disturbance amplitude smaller than 2000 kW.
The active power produced by wind farms, photovoltaics, and EVs during 24 hours is shown in Figure 12(a). Figure 12(b) illustrates that the total active power tracks the load disturbance accurately and quickly. Note that the peak of AGC active power is used to balance the stochastic power disturbance of wind farms, photovoltaics, and EVs. The 24-hour power regulation for each type of AGC unit is given in Figure 12(c). It can be seen from the figure that, for a positive disturbance, small hydropower plants and micro gas turbines with low regulation cost are regulated upward first; otherwise, biomass units and diesel generators with high regulation cost are regulated downward first. Because the output power of the AGC units satisfies the equal incremental principle, each unit achieves economic dispatch.
For further verification of the application of EPCC, simulation comparisons with WPH [25], the gray wolf optimizer (GWO) [40], the PROP method [41], quadratic programming (QP) [42], and the genetic algorithm (GA) [43] are made here. The generation costs of the different algorithms and the 24-hour total power generation cost are shown in Figure 13. As shown in Figure 13(a), the generation costs of PROP are the highest among the six algorithms, while those of EPCC are the lowest. In Figure 13(b), EPCC saves about $8878 compared with PROP.
Consequently, the EPCC has better adaptability and selflearning capability than other algorithms in various operation conditions, especially when the system is affected by the stochastic load disturbance.Based on the application of both joint decision actions and historical state action, EPCC uses the condition that the product of the decision change rate and the decision space slope value is less than 0 to design variable learning rate, so that win-loss judgment criterion can be calculated easily by EPCC without knowing the equilibrium strategy.This rate can adapt to the learners' learning rate of the instantaneous location in the joint strategic space, so cooperative control for the islanded smart grid model can be obtained.
Moreover, it is easy to obtain a related weight of each unit, which can dynamically update its Q function look-up table by experience sharing, so that the controller can properly and timely regulate its mixed strategy table to obtain the total optimal control performance.The real-time information interaction among multiagents ensures the convergence speed and robustness of the algorithm.The experimental results verify that the utilization rate of distributed energy has been effectively increased with reduced generation costs.

Conclusion
The contribution of this paper can be summarized as follows.
Considering the basic theory, mixed homogeneous and heterogeneous MAS-SCG problem is solved by the proposed EPCC strategy and the multisolution problem due to explosion in the number of agents is solved too.From an engineering application, the strategy can acquire the total optimal power command and dynamic optimal dispatch, so the disturbance caused by the access of large-scale distributed energy into power grid can be handled.
Based on the MAS-SG principle, a novel PDWoLF-PHC(λ) algorithm with a new win-loss criterion and the time tunnel idea is proposed. Compared with the traditional MAS-SG system, it solves the agent problem without requiring a strict knowledge system, and it also overcomes the inaccurate calculation and slow convergence to Nash equilibrium of the traditional MAS-SG win-loss criterion in the 2 × 2 game. Its effectiveness is verified by simulation on the improved IEEE standard two-area LFC power system model.
When multimode disturbances such as step, impulsive, and white noise disturbances are introduced to the islanded smart grid model, compared with other smart methods, the proposed EPCC strategy converges faster and significantly improves the robustness and adaptability of the islanded smart grid, increasing the qualification rate of frequency assessment and decreasing the cost of power generation.

Figure 1: The islanded smart grid with an HDC structure.

Figure 2: The execution steps of the EPCC. Flowchart labels: calculate the single-step Q function error (6); evaluate the SARSA(0) value function error $M_k$ (6); choose the variable learning rate φ (2); output the total power command ΔP (MAS-SG); apply a consensus algorithm (11) or (12); calculate the consensus variable $\omega_i$ and the unit regulation power $\Delta P_{Gi}$ (14); calculate the regulation power $\Delta P_{Gi}$ (20); output the power command of each unit.

EPCC Procedure.
The execution steps of the EPCC are shown in Figure 2.

AGC Based on EPCC.
This section aims to design the AGC based on the EPCC strategy. During each iteration, the PDWoLF-PHC(λ) controls the current operation state online to update the value function and Q function and then executes an action based on the mixed strategy.

Figure 3: Static frequency control model for electric vehicles.

(a) The controller output of different methods and given sinusoidal load disturbance. (b) The system frequency of different methods.

(a) The controller output of different methods and given step load disturbance. (b) The system frequency of different methods during step load disturbance.
Figure 9: The control performance of different methods during step load disturbance.

Figure 10: The control performance of different methods during impulsive load disturbance.

(a) The controller output of different methods and given white noise load disturbance. (b) The system frequency of different methods during white noise load disturbance.
Figure 11: The control performance of different methods during white noise load disturbance.
Figure 12(a): Active power of photovoltaics, wind farms, and EVs within 24 hours.

(a) Hourly power generation costs for different algorithms within 24 hours. (b) Comparison of generation costs.
Figure 13: Comparison of the results of different algorithms.
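The variable learning rate is selected as

$$\varphi = \begin{cases} \varphi_{\text{win}}, & \text{if } \Delta(s_k, a_k)\, \Delta^2(s_k, a_k) < 0, \\ \varphi_{\text{lose}}, & \text{otherwise}, \end{cases} \tag{2}$$

where $\Delta(s_k, a_k)$ is the decision change rate and $\Delta^2(s_k, a_k)$ is the decision space slope value. If the product of the decision change rate and the decision space slope is less than 0, the agent wins and selects $\varphi_{\text{win}}$; otherwise it selects $\varphi_{\text{lose}}$. $\Delta(s_k, a_k)$ and $\Delta^2(s_k, a_k)$ are each updated at every iteration.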

Table 1: Parameter values of EPCC.

Table 2: System parameters of units used in the islanded smart grid model.

Table 3: Frequency index assessment under impulsive load disturbance and white noise load disturbance.