Microgrid Group Control Method Based on Deep Learning under Cloud Edge Collaboration

To address the economic benefits, load fluctuations, and carbon emissions of microgrid (MG) group control, a deep learning-based method for controlling the MG group of the power distribution Internet of Things (IoT) is proposed. First, based on the cloud-edge collaborative power distribution IoT architecture and combined with the characteristics of distributed generation, electric vehicles (EV), and loads, the MG system model in the power distribution IoT is established. Then, a deep learning algorithm is used to train on the features of the data model at the edge side. Finally, the group control strategy is applied in the power distribution cloud platform to regulate the coordinated output of multiple energy sources, adjust the load state, and realize the economic operation of the power grid. A group model of MGs is built and simulated on the MATLAB platform. The results show the effectiveness of the proposed control method: compared with other methods, it achieves higher income and the lowest carbon emissions, realizing economical and environmentally friendly system operation.


Introduction
With the continuous advancement of new energy power generation, communication, Internet, and other power industry technologies, together with new-generation information and communication technologies, IoT technology and the distribution network have become deeply integrated, forming the power distribution IoT and microgrids (MG). Networked supply and intelligent management technology can serve as a link between the user side and distributed energy [1,2]. A MG is a relatively independent system that can not only operate on its own but also join with others to form a multienergy complementary intelligent MG group. Within such a group, the intermittent output of distributed power sources can be compensated by reasonable regulation, so as to ensure the quality and reliability of the power supply [3].
At present, MGs still face considerable challenges in regulating distributed generation (DG), battery energy storage system (BESS) equipment, and loads [4,5]. Ref. [6] proposes a "source-storage-load" coordination balance algorithm based on deep learning, which enables the system and user load to reach Nash equilibrium without prior information and improves the MG's intelligent control capabilities. With growing research on the mobile BESS characteristics of EVs, the EV has entered the MG as a special DG [7]. Ref. [8] constructs a real-time optimal MG energy management system that uses the random forest method to predict the EV driving mode and schedule the charging and discharging of the EV battery, improving both the consumption of distributed energy and its utilization efficiency. Ref. [9] studies the energy management framework of intelligent MGs and analyzes energy optimization among the household load, EV, BESS, and distribution network. Most of the control strategies proposed in the above literature take the perspective of the MG and use demand response to guide users to optimize the MG economy [10]. However, the uncertainty of EVs and the collaborative optimization of distributed energy in the MG group of the power distribution IoT still need further study.
Therefore, under the cloud-edge collaborative framework of the power distribution IoT, a MG regulation method based on deep learning is proposed. Based on the established MG system model and the system optimization objectives and constraints, edge-side training of the deep learning algorithm is used to regulate and control the MG group.

MG System Model in Power Distribution IoT
Combining edge computing with cloud computing, the cloud edge collaborative computing framework is constructed, and the power distribution IoT architecture based on cloud edge collaboration is established, as shown in Figure 1. Taking the edge computing group as the basic unit, according to the logic structure of cooperative autonomy between groups and cloud edge collaborative control, the mathematical model and training learning model of MG group computing are established; finally, the control optimization calculation of the MG group is carried out on the power distribution cloud platform.
Among them, the end devices mainly collect the data of each MG for modeling. The edge node has edge computing capability: it collects the data of the end devices, determines the optimization objectives and constraints, and trains the data model based on the deep learning algorithm. The power distribution cloud platform uses the data of each edge node and the target optimization model to achieve the optimal energy distribution of the MG group at a larger scale.
The topology structure of MG is shown in Figure 2, which mainly consists of wind turbine (WT), photovoltaic (PV) energy, BESS, gas generator, EV, fuel cell, energy conversion device, and users. In the MG system, MG is connected with the main network, and vehicle to grid is introduced. The role of vehicle to grid is to stimulate the charging of the vacant EV, so that it does not need to be charged during the peak load, which reduces the power supply pressure of the main network, and the electric energy stored in the EV can be sent to the main network, increasing the power supply in the system. As the power supply of MG, DG will change under the influence of weather and other factors, and the system will be adjusted accordingly. The power in the system eventually flows to the user.
2.1. Generating Unit Side. Renewable energy such as WT and PV power generation is increasingly widely used in MGs. At the same time, a BESS can effectively mitigate the intermittent output of distributed energy [11]. Therefore, the MG group adopts the wind/PV/storage/grid collaborative power generation mode.
The output of the WT is closely related to the ambient wind speed, the cut-in and cut-out wind speeds, and the rated wind speed. The PV output power $P_{PV}$ is determined by the output power of the PV modules under standard conditions, the solar irradiance, and the ambient temperature. The next state of charge of the battery, $SOC(i+1)$, is related to the current state of charge, $SOC(i)$.
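The three relationships above can be illustrated with a minimal sketch. It assumes a piecewise-linear WT power curve between the cut-in and rated speeds, a PV model that scales the standard-condition output by irradiance with a linear temperature correction, and a simple energy-counting SOC update. These are common textbook forms, not necessarily the formulas used in this work, and every function name, parameter, and default value below is an illustrative assumption.

```python
def wt_power(v, v_in=3.0, v_rated=12.0, v_out=25.0, p_rated=600.0):
    """WT output (kW) for wind speed v (m/s); assumed linear ramp to rated power."""
    if v < v_in or v >= v_out:
        return 0.0                      # below cut-in or at/above cut-out: no output
    if v >= v_rated:
        return p_rated                  # between rated and cut-out: rated output
    return p_rated * (v - v_in) / (v_rated - v_in)


def pv_power(g, t_amb, p_stc=400.0, g_stc=1000.0, k_t=-0.004, t_stc=25.0):
    """PV output (kW) from irradiance g (W/m^2) and ambient temperature (deg C).

    Using ambient temperature directly in the correction term is a simplification.
    """
    return p_stc * (g / g_stc) * (1.0 + k_t * (t_amb - t_stc))


def next_soc(soc, p_kw, dt_h=0.5, capacity_kwh=400.0, eta=0.95):
    """SOC(i+1) from SOC(i); p_kw > 0 charges, p_kw < 0 discharges."""
    if p_kw >= 0:
        delta = eta * p_kw * dt_h / capacity_kwh       # charging losses reduce stored energy
    else:
        delta = p_kw * dt_h / (eta * capacity_kwh)     # discharging losses drain extra energy
    return min(1.0, max(0.0, soc + delta))             # keep SOC inside [0, 1]
```

The default ratings reuse the 600 kW and 400 kW figures from the simulation section only as placeholders; the battery energy capacity in kWh, the efficiency, and the wind speed thresholds are hypothetical.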

End-Side EV Model.
The randomness of EVs is mainly reflected in the uncertainty of the time at which they access or leave the MG and in the randomness of the initial SOC, which depends on the driving distance. The end time and mileage of EVs generally follow a normal distribution [12]. Therefore, based on the Monte Carlo algorithm, a probability model of the end time and mileage of EVs is established, in which $t$ is the end time of driving with $\sigma_t = 3.41$ and $\mu_t = 17.47$, and $d$ is the mileage with $\sigma_d = 3.24$ and $\mu_d = 8.92$. EVs in the MG can be divided into nonschedulable vehicles and schedulable vehicles according to whether the owners agree to participate in centralized control. Disorderly charging is adopted for nonschedulable vehicles; that is, the owners charge according to the return time and the driving demand of the EV in the next period. Orderly charging is adopted for schedulable vehicles; that is, under the time-of-use price mechanism, the owners charge uniformly within the specified time [13,14].
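A minimal Monte Carlo sketch of this EV model is given below. It assumes plain normal draws for the end time and mileage with the stated parameters, wraps the end time onto a 24 h clock, and truncates negative mileage at zero. The conversion from mileage to initial SOC (energy use per unit distance, battery size) is entirely hypothetical, as are the function and parameter names.

```python
import random

MU_T, SIGMA_T = 17.47, 3.41   # end-of-driving time parameters (h), from the text
MU_D, SIGMA_D = 8.92, 3.24    # mileage parameters, from the text (unit not stated)


def sample_ev_fleet(n, seed=0, soc_full=0.95, soc_min=0.2,
                    kwh_per_unit=0.15, battery_kwh=30.0):
    """Draw n samples of (end_time_h, mileage, initial_soc).

    kwh_per_unit and battery_kwh are assumed values used only to turn
    mileage into an initial SOC for illustration.
    """
    rng = random.Random(seed)                       # seeded for repeatable runs
    fleet = []
    for _ in range(n):
        t_end = rng.gauss(MU_T, SIGMA_T) % 24.0     # wrap onto a 24 h clock
        dist = max(0.0, rng.gauss(MU_D, SIGMA_D))   # mileage cannot be negative
        soc = soc_full - dist * kwh_per_unit / battery_kwh
        fleet.append((t_end, dist, max(soc_min, soc)))
    return fleet
```

Each sampled tuple could then be assigned to the schedulable or nonschedulable class before the charging strategy is applied.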

End User Load Unit.
Based on a comparative analysis of users' usage habits and load types, user loads can be divided into base load, reducible load, translatable load, and interruptible load [15,16]. The base load is a necessary load and cannot be adjusted. The latter three are adjustable loads, which can be adjusted according to the electricity price or other incentive policies. Electric water heaters (EWH) and air conditioning (AC) are widely used and have BESS-like characteristics. As representative translatable loads, their output power during peak consumption can be controlled through optimization strategies, and their working range can be shifted to the low consumption period [17]. The user side of the MG system can adjust load utilization through the electricity price mechanism.

Optimization Model and Control Strategy of MG Group in Power Distribution IoT
Under the time-of-use price mechanism, the overall load demand of users in the MG group of the power distribution IoT will inevitably change [18,19]. Therefore, under the cloud-edge collaborative architecture, deep learning is used to control the MG group, rationally regulate the coordinated output of multiple energy sources, adjust the load status, and realize the economic operation of the power grid.
Wireless Communications and Mobile Computing

The Optimization Goal of Cloud-Edge Collaboration
3.1.1. Daily Operating Cost. The operating cost of a MG group over one cycle is an important factor in improving the economic benefits of users. It includes the initial investment cost, daily operation and maintenance costs, and load transfer compensation after users participate in the time-of-use electricity price mechanism [20,21]. In the optimization objective function, $C_{WT}$, $C_{PV}$, and $C_{BESS}$ are the total operating costs of the WT, PV array, and BESS, respectively, and $C_G$ is the cost of the power exchanged between the MG and the large grid.
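The objective function itself does not appear in the text above. As a hedged reconstruction, assuming the daily operating cost is the plain sum of the four cost terms that are defined (load transfer compensation is omitted because no symbol is given for it), it could take the form:

$$
\min C = C_{WT} + C_{PV} + C_{BESS} + C_{G}
$$

Here $C$ is a symbol introduced only for illustration; the original formula may weight these terms or include additional ones.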

Heterogeneous Energy Synergy and Charging Power Optimization.
In the low-voltage distribution network, the sources of charging energy usually include the BESS, WT, and PV. However, since PV and WT are strongly affected by environmental factors, using the BESS to balance their impact can reduce the fluctuation of charging power in the MG [22,23]. In the calculation of the charging power, $t$ is the time, $P^t_{BESS}$ is the charging power of the BESS, $P^t_{WT}$ is the charging power of the WT, $P^t_{PV}$ is the charging power of the PV, and $P^t_{charge}$ is the total charging power of the MG, which has upper and lower limits. $p^t_e$ is the cost price, and $p^t_{WT}$ and $p^t_{PV}$ are the costs of WT and PV power generation, respectively.
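The charging power formula is not reproduced above. Assuming the total charging power is simply the sum of the three source contributions and is bounded as the text states, one plausible form is:

$$
P^t_{charge} = P^t_{BESS} + P^t_{WT} + P^t_{PV}, \qquad P^{\min}_{charge} \le P^t_{charge} \le P^{\max}_{charge}
$$

The bound symbols $P^{\min}_{charge}$ and $P^{\max}_{charge}$ are introduced here for illustration. How the prices $p^t_e$, $p^t_{WT}$, and $p^t_{PV}$ enter the cost term is not recoverable from the text and is left out.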

Constraint
Here, $SOC_{EV,\max}$ is 0.95, $SOC_{EV,\min}$ is 0.2, $P_{EV,dis}$ is the maximum discharge power of the EV, and $P_{EV,cha}$ is its maximum charging power.
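A minimal feasibility check for these EV limits might look as follows. It assumes the constraint bounds the SOC between the two stated limits and bounds the EV power between the maximum discharge and maximum charging power, with charging counted as positive. The sign convention and all function names are assumptions.

```python
SOC_EV_MAX, SOC_EV_MIN = 0.95, 0.2   # SOC limits quoted in the text


def ev_feasible(soc, p_kw, p_dis_max, p_cha_max):
    """True if SOC and power (p_kw > 0 charging, < 0 discharging) respect the limits."""
    soc_ok = SOC_EV_MIN <= soc <= SOC_EV_MAX
    power_ok = -p_dis_max <= p_kw <= p_cha_max
    return soc_ok and power_ok


def clip_ev_power(p_kw, p_dis_max, p_cha_max):
    """Clamp a requested EV power to the allowed discharge/charge range."""
    return max(-p_dis_max, min(p_cha_max, p_kw))
```

A scheduler could call `clip_ev_power` on each candidate action before evaluating its cost, so that infeasible requests never reach the optimizer.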

Supply and Demand Balance Constraints of MG.
To meet users' normal electricity needs, the power provided by the MG should balance the power required by users, where $P_{load}(i)$ is the load demand at time $i$ after the MG participates in the control strategy.
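The balance equation is not reproduced above. Assuming supply is counted as positive, storage and EV terms are negative while charging, and only the sources named in the generating unit and EV models are included, a sketch of the constraint is:

$$
P_{WT}(i) + P_{PV}(i) + P_{BESS}(i) + P_{EV}(i) + P_{G}(i) = P_{load}(i)
$$

Only $P_{load}(i)$ is defined in the text. The other symbols, including the grid exchange power $P_{G}(i)$, are illustrative, and further terms (for example the gas generator or fuel cell) may appear in the original formula.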

Edge-Side Training Learning Based on Deep Learning Algorithm.
By combining the edge computing capabilities of edge nodes with the perception ability of deep learning and the decision-making ability of reinforcement learning, the deep reinforcement learning algorithm can perform output control based on the analysis of input data, bringing it closer to the way people think [24,25]. Reinforcement learning is based on the Markov decision process (MDP), under which the state of the system at the next moment depends only on the current state and action, not on earlier history [26,27]. The deep learning algorithm uses a function value update to approximate the Q function, in which $\alpha$ is the learning rate, $\varphi$ is the neural network weight, $\gamma$ is the discount factor, $s$ is the system state, and $a$ is the action strategy; here $\alpha = 1$ and $\gamma = 0.85$. When training the neural network, the mean square error is used to define the error function. The gradient of the error function with respect to $\varphi$ is computed, the parameters are updated by stochastic gradient descent, and the optimal strategy is obtained from the optimal Q value.
In the deep learning training process, if the action selection and the action evaluation use the same Q value from the same network, the final result may have a large error due to overestimation. The double deep Q-learning approach (referred to here as dual deep learning) selects the action with the maximum Q value in the main network and computes the target Q value in the target network. To alleviate overestimation, the difference between the target Q value and the actual Q value usually needs to be kept within a small range, which also helps to improve the convergence speed of the algorithm.
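The update rules described in this subsection can be sketched with plain Python lists standing in for network outputs. The sketch assumes the standard temporal-difference update and the standard double Q-learning target (action chosen by the main network, value read from the target network); the exact formulas of this work are not reproduced in the text, and the function names are illustrative.

```python
GAMMA = 0.85   # discount factor from the text
ALPHA = 1.0    # learning rate from the text


def double_q_target(reward, next_q_main, next_q_target, gamma=GAMMA, done=False):
    """Select the next action with the main network, evaluate it with the target network."""
    if done:
        return reward                                # no future value after a terminal step
    best_action = max(range(len(next_q_main)), key=next_q_main.__getitem__)
    return reward + gamma * next_q_target[best_action]


def td_update(q_sa, target, alpha=ALPHA):
    """Move Q(s, a) toward the target by step size alpha."""
    return q_sa + alpha * (target - q_sa)


def mse_loss(predictions, targets):
    """Mean square error between predicted and target Q values."""
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)
```

For example, with reward 1.0, main-network values `[0.2, 0.9, 0.5]`, and target-network values `[1.0, 2.0, 3.0]`, the double target is 1.0 + 0.85 × 2.0 = 2.7, whereas taking the maximum of the target network alone would give 3.55. The gap illustrates the overestimation that the decoupled selection is meant to reduce.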

Control Strategy of MG Group Based on Power Distribution Cloud Platform.
The MG group control strategy takes the BESS, EV, gas storage, and time as system states, discretizes the originally continuous MG operation process, and treats charging, discharging, and other forms of electrical energy exchange as the action strategy [28,29]. Assume that the current state is $s_t$, the next state is $s_{t+1}$, and the allowed action strategy is $a$; the action process includes changes in equivalent parameters.

MDP Tuple Description.
The state space $s$ consists of three parts: the controllable battery $s_b$, the uncontrollable PV and load $s_{PV,l}$, and the time series $s_m$. The reward function is a real-time reward: it evaluates the information at a single point in time and cannot describe the quality of the overall strategy. Therefore, a state-action value function must be defined to represent the long-term effect of the strategy on the state.
3.4.2. Control Strategy of MG Cluster. In the power distribution cloud platform system, the state input includes the BESS storage capacity $E$, the natural gas storage capacity $G$, the EV storage capacity $V$, and the time $t$. There are 48 discrete time states. Different state variables determine the action strategy $a$ in the MG in different ways. Once the input state and action strategy are determined, online learning can proceed in step [30]. Multiple iterations of the Q algorithm make the Q value table converge, which determines the optimal scheduling route. The overall flow of the algorithm is shown in Figure 3.
The income of each state transition is calculated from variables such as the selected environment, the current electricity price, and the natural gas price, and is then filled into the R matrix entry for the corresponding action in that state [30]. If R contains no action corresponding to the state, the R value table is generated. In state $s_t$, the action $A_t$ to be taken is determined from the BESS storage, natural gas storage, EV power storage, and current time of the MG group; the system then enters the next state.
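The Q table procedure above can be illustrated on a deliberately reduced toy problem. The sketch keeps only the time index (48 steps) and one discretized storage level as the state, uses a known reward table instead of online sampled updates, and sweeps time backwards so that a single pass fills the table. The storage size, constant load, price split by hour, and all names are assumptions made for illustration; gas and EV storage are left out.

```python
GAMMA = 0.85                      # discount factor from the text
N_T, E_MAX = 48, 4                # 48 time states (text); 4 storage levels (assumed)
ACTIONS = (-1, 0, 1)              # discharge, idle, charge one storage unit
LOAD = 1.0                        # constant load per step (assumed)


def price(t):
    """Assumed purchase price for half-hour step t; the hour split is illustrative."""
    hour = t // 2
    if hour >= 22 or hour <= 5:
        return 0.17
    if hour in (6, 7, 11, 12, 17, 18):
        return 0.49
    return 0.83


def feasible(e, a):
    """An action is allowed only if the storage level stays within bounds."""
    return 0 <= e + a <= E_MAX


def reward(t, a):
    """R table entry: negative cost of the energy bought from the grid in step t."""
    return -price(t) * (LOAD + a)


def q_iteration(sweeps=1):
    """Fill the Q table from the reward table, sweeping time backwards."""
    q = {(t, e): {a: 0.0 for a in ACTIONS if feasible(e, a)}
         for t in range(N_T) for e in range(E_MAX + 1)}
    for _ in range(sweeps):
        for t in reversed(range(N_T)):
            for e in range(E_MAX + 1):
                for a in q[(t, e)]:
                    future = 0.0                      # last step has no successor
                    if t + 1 < N_T:
                        future = max(q[(t + 1, e + a)].values())
                    q[(t, e)][a] = reward(t, a) + GAMMA * future
    return q


def greedy_schedule(q, e0=0):
    """Follow the highest-Q action from t = 0 and return the list of actions."""
    e, plan = e0, []
    for t in range(N_T):
        a = max(q[(t, e)], key=q[(t, e)].get)
        plan.append(a)
        e += a
    return plan
```

Calling `greedy_schedule(q_iteration())` yields one action per half-hour step. Replacing the backward sweep with sampled, epsilon-greedy updates would bring the sketch closer to online learning, at the cost of needing many more iterations to converge.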

Simulation Results and Analysis
To verify the effectiveness of the proposed MG group control method, a group model containing four autonomous MGs was built on the real-time simulation platform MATLAB; its topology is shown in Figure 4.
The rated voltage/frequency of the MG group is 380 V/50 Hz. The MG group includes a PV module unit, a WT power generation unit, and a BESS, with capacities of 400 kW, 600 kW, and 400 kW, respectively. The time-of-use price mechanism is adopted for electricity sales and purchases in the MG. According to the actual load demand of a city in China, the valley load period is from 22:00 to 5:00 the next day; 6:00 to 7:00, 11:00 to 12:00, and 17:00 to 18:00 are flat load periods; the rest are peak load periods. The electricity purchase prices in the peak, flat, and valley periods are 0.83 yuan/kWh, 0.49 yuan/kWh, and 0.17 yuan/kWh, respectively, and the electricity selling prices are 0.65 yuan/kWh, 0.38 yuan/kWh, and 0.13 yuan/kWh, respectively.
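The tariff described above can be encoded as a small lookup. The prices are taken from the text, while the mapping of clock times to whole hours is an assumption: each stated range is read as inclusive of its end hour, so hours 22 to 23 and 0 to 5 count as valley, and hours 6, 7, 11, 12, 17, and 18 count as the middle-price period. Function and constant names are illustrative.

```python
BUY = {"peak": 0.83, "flat": 0.49, "valley": 0.17}    # purchase price, yuan/kWh (text)
SELL = {"peak": 0.65, "flat": 0.38, "valley": 0.13}   # selling price, yuan/kWh (text)
FLAT_HOURS = {6, 7, 11, 12, 17, 18}                   # assumed hour-inclusive reading


def period(hour):
    """Classify an hour of day (0-23) as 'valley', 'flat', or 'peak'."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if hour >= 22 or hour <= 5:
        return "valley"
    if hour in FLAT_HOURS:
        return "flat"
    return "peak"


def energy_cost(hour, kwh):
    """Net cost in yuan: positive when buying (kwh > 0), negative when selling (kwh < 0)."""
    p = period(hour)
    return kwh * (BUY[p] if kwh >= 0 else SELL[p])
```

Under these assumptions, buying 10 kWh at 09:00 costs 8.3 yuan, while selling 10 kWh in the same hour returns 6.5 yuan.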

Regulation Results of Single MG.
The time-of-use electricity price mechanism is used to guide users to adjust the usage habits of adjustable loads in order to achieve the purpose of "peak cutting and valley filling." The overall load curve before and after optimization of the MG is shown in Figure 5.
It can be seen from Figure 5 that before optimization, the overall load curve of consumers fluctuates greatly, and the peak of electricity consumption is concentrated in the high electricity price period. After the user load participates in the control strategy, the overall load curve changes: daytime demand decreases, nighttime demand increases, and the load during the peak period falls, which reduces the peak-valley difference and smooths the load curve. After the energy regulation of the MG, the total load energy consumption is reduced and the economy of the system is improved.
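The peak-valley difference mentioned above is easy to compute for any load profile. The sketch below also shows the effect of moving translatable load from a peak slot to a valley slot. It covers only load shifting, which leaves total energy unchanged; any reduction in total consumption would come from reducible or interruptible loads, which are not modeled here. The function names are illustrative.

```python
def peak_valley_difference(load):
    """Difference between the largest and smallest value in a load profile."""
    return max(load) - min(load)


def shift_load(load, from_idx, to_idx, amount):
    """Move up to `amount` of translatable load between two time slots.

    Returns a new profile; the total energy is unchanged.
    """
    moved = min(amount, load[from_idx])    # cannot move more than the slot holds
    shifted = list(load)
    shifted[from_idx] -= moved
    shifted[to_idx] += moved
    return shifted
```

With a hypothetical six-slot profile `[40, 30, 80, 100, 90, 50]`, shifting 20 units from slot 3 to slot 1 lowers the peak-valley difference from 70 to 50 while the total stays at 390.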

Optimization Results of MG Group.
By adjusting the coordination mode of distributed energy, BESS, and load, both the capacity utilization rate and the economic benefit of the MG group can be improved. The results of heterogeneous energy optimization control are shown in Figure 6.
It can be seen from Figure 6 that from 22:00 to 06:00 the next day, the WT output of distributed energy is large. While the normal load demand is met, the BESS charges. Since the power generated by distributed energy exceeds the load demand, the MG sells electricity to other loads in the MG group, increasing the economic benefits of users. During 11:00-16:00, the peak period of MG power consumption, the WT output is reduced and the PV output is large. Because this is also the peak electricity price period, the BESS discharges together with the WT and PV array to reduce consumers' purchases of electricity from the large grid. The BESS stores energy during the low electricity price period and whenever the distributed energy output has a surplus. In the high electricity price period, this not only ensures the stability of the MG power supply but also improves the consumption capacity of distributed energy.

Comparative Analysis of Different Methods.
To demonstrate the economic and environmental performance of the proposed method, it is compared with the methods in Refs. [6,8,9]. Economy and environmental protection are quantified by the electricity purchase cost and the carbon emission of the MG, respectively, and the product of the two is used as the evaluation index. The smaller the value, the stronger the regulation ability. The experimental results for the growth trend of the economic and environmental performance of the MG group are shown in Figure 7.
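The evaluation index described above is a simple product, which can be sketched as follows. The unit of the index (yuan multiplied by tonnes of carbon emission) is inferred from the reported results, and the helper names are hypothetical.

```python
def evaluation_index(purchase_cost_yuan, carbon_emission_t):
    """Product of electricity purchase cost (yuan) and carbon emission (t)."""
    return purchase_cost_yuan * carbon_emission_t


def rank_methods(results):
    """Sort method names by ascending index, so the best-regulating method comes first.

    `results` maps a method name to a (purchase_cost_yuan, carbon_emission_t) pair.
    """
    return sorted(results, key=lambda name: evaluation_index(*results[name]))
```

Any numbers passed to `rank_methods` in a quick check are placeholders and should not be read as results from this study.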
As can be seen from Figure 7, the performance of each algorithm is low at the beginning of the iteration, but as the number of iterations increases, each algorithm steadily approaches its optimal solution and finally converges. The economic and environmental performance of the proposed method is the best, at about 24,000 yuan·t, exceeding the benefit of the empirical learning algorithm. The algorithms used in Refs. [6,8] require relatively little computation and therefore converge quickly, but their electricity purchase cost is very high, about 26,500 yuan, and their overall regulation performance is poor. The method of Ref. [9] improves rapidly in the initial iteration stage, but because it lacks prediction of future strategies, it improves slowly in the later stage and performs poorly. This demonstrates that the proposed method has a good ability for energy coordination and optimization. Through the improved deep reinforcement learning control strategy, the economic and environmental performance of the system is greatly improved.

Conclusions
With the promotion of distributed energy, the number of MGs has increased dramatically, forming MG groups. To improve the coordination and optimization of MG group energy, a control strategy based on deep reinforcement learning is proposed. Based on the cloud-edge collaborative power distribution IoT architecture, the system model of the MG is proposed and interconnected to construct the system architecture of the MG group. In addition, edge-side training of the deep reinforcement learning algorithm is used to control the MG group, rationally regulate the coordinated optimization of multiple energy sources, and realize the economic and environmentally friendly operation of the MG group. A MG group model was built on the MATLAB platform to conduct simulation experiments. The results show that the proposed method introduces the time-of-use electricity price mechanism to regulate load operation and achieve peak shaving and valley filling, with low overall energy consumption and better economic performance. Compared with other methods, the system has the smallest carbon emissions, maximizes the consumption of renewable energy, and realizes economical and environmentally friendly system operation.
Figure 7: Growth trend of total economic income in MG group.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.