Optimization Method of Power Equipment Maintenance Plan Decision-Making Based on Deep Reinforcement Learning



Introduction
To ensure the continuous and safe operation of power equipment, preventive planned maintenance must be carried out so that potential faults are found and eliminated in time. The arrangement of the power equipment maintenance plan is an important measure to ensure safe power grid operation and reliable power supply and to improve the health level of equipment. With the continuous development of the power system, equipment maintenance faces new challenges [1]. How to optimize the maintenance plan, prevent over-maintenance of equipment, save maintenance cost, and improve the power supply reliability under planned maintenance is a problem that remains to be solved. On the one hand, the growing number of power equipment makes the maintenance workload heavier and further increases the maintenance cost, so optimizing the maintenance plan and preventing over-maintenance is one of the goals of economic power grid operation. On the other hand, economic development and people's daily life place higher requirements on power supply reliability, and planned maintenance is one of the causes of reduced reliability. Therefore, optimizing the equipment maintenance plan and arranging maintenance reasonably can ensure the safe and reliable operation of power equipment and improve power supply reliability. At present, the maintenance plan of power equipment depends on manual preparation, which is limited by the professional ability and working experience of the personnel.
According to the requirements of the power grid operation mode, the dispatching plan, and the equipment maintenance regulations, maintenance planners prepare a plan that meets the safe and reliable operation of the power grid. However, with the development of the power grid, manual maintenance planning gradually exposes several problems: reliability and economy cannot be guaranteed; many factors must be considered, so the workload is heavy and the efficiency is low; statistical analysis of the data is difficult under manual planning; and the quality of the plan depends on the technical ability and work experience of the personnel.
Maintenance plan optimization is a multistage dynamic planning process [2]. Solution algorithms for the optimization models include mathematical optimization methods and intelligent optimization methods.
There are two kinds of optimization models: single-objective models and multiobjective models. Among economic single-objective models, for example, literature [3] maximizes demand reliability by minimizing the total sum of squares of reserves (SSR) and the power production cost (mainly fuel cost) and uses a dominance-based multiobjective simulated annealing method to determine a compromise solution. In [4], the optimization model is established based on the network loss index, aiming at the minimum monthly network loss. Among reliability single-objective models, for example, literature [5] puts forward an optimization model of distribution equipment maintenance based on risk assessment, which takes the minimum risk of power grid operation as the objective and solves it with particle swarm optimization. Based on risk assessment, literature [6] proposes a coordinated maintenance strategy for UHV receiving-end equipment, providing a reference for ensuring normal UHV operation. In [7], an optimization model of the distribution equipment maintenance plan is established based on equipment status and the risk of power grid loss. In [8], a maintenance rate optimization model for power equipment asset management is proposed, which uses a Markov process and explicitly considers equipment aging. In these works, however, the solution algorithms remain single-objective: they optimize one objective only and do not coordinate the economy and reliability of the maintenance plan.
In [9], an improved multiobjective differential evolution algorithm with adaptive control-parameter optimization is proposed to solve high-speed rail line planning, together with a heuristic algorithm to obtain a better initial solution. In [10], multiagent reinforcement learning is applied to guide multiworkflow scheduling on a service cloud: the numbers of workflow applications and heterogeneous virtual machines serve as the state input, and the makespan and cost serve as the rewards. In [11], a fast multiobjective optimization algorithm for contactor characteristics based on a radial basis function neural network approximation model is proposed, which speeds up convergence and obtains the optimal solution. In [12], a sensitivity-based multiobjective optimization method is proposed for the optimal configuration of thyristor-controlled series capacitors in the transmission network, which reduces the total reactive power loss and the loading of the transmission lines. In [13], a coordinated stochastic scheduling model based on multiobjective optimization is proposed to improve the absorption capacity of wind power on the premise of energy saving and emission reduction. The application of deep learning and reinforcement learning to multiobjective optimization provides a new idea for decision-making optimization of the power equipment maintenance plan. The development and application of AI [9], big data [10], parallel computing, and optimization algorithms and theory [14, 15] create favorable conditions for this optimization.
Deep learning (DL) [16] is composed of multilayer nonlinear units and can automatically learn abstract features from training data; reinforcement learning (RL) [17] has strong decision-making ability and high adaptability and is well suited to decision optimization; deep reinforcement learning (DRL) [18] combines the high-dimensional feature extraction ability of DL with the decision-making ability of RL to solve decision problems in high-dimensional state and action spaces [19–21]. In power systems, DRL has mainly been applied to intelligent power generation control, intelligent grid control, and related fields [1, 22, 23].
Deep reinforcement learning is an end-to-end perception and decision-making framework with broad applicability. Its perceptual learning and decision optimization proceed as follows: the agent interacts with the environment to obtain a high-dimensional observation state s and uses deep learning to extract a specific feature representation; based on the reward-driven evaluation of the value function, the current state is mapped to a corresponding action; the environment reacts to the action and produces the next observation. By cycling this process continuously, the optimal decision for the target can be obtained.
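The perceive-decide-act cycle described above can be sketched with a toy example; `ToyEnv`, `run_episode`, and the trivial policy are illustrative stand-ins, not part of the paper's model.

```python
class ToyEnv:
    """Illustrative stand-in for the maintenance environment: the state
    counts how many of 3 tasks have been scheduled; the reward is 1.0
    once all tasks are placed."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # The environment reacts to the action and returns the next
        # observation, the reward, and a termination flag.
        self.state += 1
        reward = 1.0 if self.state == 3 else 0.0
        done = self.state == 3
        return self.state, reward, done

def run_episode(env, policy):
    """One cycle: observe state s -> choose action a -> receive next
    state and reward -> repeat until the episode terminates."""
    s, total, done = env.state, 0.0, False
    while not done:
        a = policy(s)
        s, r, done = env.step(a)
        total += r
    return total
```

Repeating such episodes while improving the policy from the accumulated rewards is what yields the optimal decision.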
In this paper, a multiobjective mathematical model of power equipment maintenance optimization considering reliability and economy is established, and a distributed deep recurrent Q-network (DDRQN) is adopted to realize a multiagent deep reinforcement learning solution, making full use of DRL's optimization and decision-making abilities to arrange the power equipment maintenance plan intelligently. Comparison experiments are carried out between the new method, the single-agent deep recurrent Q-network (DRQN) algorithm, and the particle swarm optimization (PSO) algorithm.
The results show that the proposed algorithm achieves higher reliability, lower maintenance cost, and a more reasonable maintenance plan.

Objective Function.
Power equipment maintenance plan optimization is a multiobjective, multiconstraint optimization problem. The objective functions fall into three categories: reliability, economy, and practicability [24]. Reliability objectives include the loss of load probability (LOLP), minimum load loss due to power failure, maximum system reliability index, and minimum expected energy not supplied (EENS). Economic objectives include the maintenance cost and the outage loss. Practical objectives aim, from a practical point of view, to carry out maintenance when due, reduce maintenance outages, and distribute the maintenance workload evenly.
EENS is defined as the expected power loss caused by equipment outages; it evaluates the reduction of power supply reliability caused by taking equipment out of service. The objective function is

min f_1 = Σ_{t=1}^{T} Σ_{x∈S_t} C_x · ( Π_{i=1}^{M} P_i^{x_i} (1 − P_i)^{1−x_i} ) · T_t,

where T is the number of maintenance periods, S_t is the set of system states in period t, x = (x_1, x_2, . . ., x_M) is the state vector of the equipment, C_x is the load shedding in state x, M is the number of equipment, x_i = 1 denotes the shutdown state, x_i = 0 denotes the running state, P_i is the outage probability of equipment i, and T_t is the number of hours in period t; the unit is MW·h. The maintenance cost of the equipment refers to the expenses incurred by maintaining the equipment during the maintenance period.
The objective function is

min f_2 = Σ_{i=1}^{N} Σ_{t=1}^{T} p_i^t · z_i^t · u_i^t,

where N is the total number of equipment to be overhauled, p_i^t is the unit-time cost, z_i^t is the maintenance team arranged for equipment i in period t, u_i^t = 1 denotes outage maintenance of equipment i in period t, and u_i^t = 0 denotes normal operation; the unit is 10,000 yuan.
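The two objectives above can be evaluated directly once the state probabilities and schedules are known. The following is a minimal sketch, assuming equipment outages are independent (so the probability of a state is the product of each device's P_i or 1 − P_i); the function names are illustrative, not from the paper.

```python
def state_probability(x, p_out):
    """P(x) = prod_i P_i^{x_i} * (1 - P_i)^{1 - x_i}, assuming
    independent outages; x[i] = 1 means device i is down."""
    prob = 1.0
    for xi, pi in zip(x, p_out):
        prob *= pi if xi == 1 else (1.0 - pi)
    return prob

def eens(states, load_shed, p_out, hours):
    """EENS = sum_t sum_{x in S_t} C_x * P(x) * T_t  (MW*h).
    states[t]: state vectors considered in period t;
    load_shed[x]: load curtailment C_x (MW); hours[t]: hours T_t."""
    return sum(load_shed[x] * state_probability(x, p_out) * hours[t]
               for t, state_set in enumerate(states) for x in state_set)

def maintenance_cost(p, z, u):
    """Cost = sum_i sum_t p_i^t * z_i^t * u_i^t  (10,000 yuan).
    p[i][t]: unit-time cost; z[i][t]: team size; u[i][t]: outage flag."""
    return sum(p[i][t] * z[i][t] * u[i][t]
               for i in range(len(u)) for t in range(len(u[i])))
```

For example, with two devices (P = 0.1 and 0.2), a single 1-hour period, and one state (1, 0) that sheds 5 MW, the EENS is 5 × 0.1 × 0.8 × 1 = 0.4 MW·h.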

Time Constraints.
Any maintenance work shall be completed on time:

u_i^t = 1 for s_i ≤ t ≤ s_i + m_i − 1, u_i^t = 0 otherwise, with e_i ≤ s_i ≤ l_i,

where u_i^t is the maintenance status of equipment i (u_i^t = 1 means equipment i is powered off for maintenance in period t, u_i^t = 0 means normal operation), s_i is the period in which equipment i starts maintenance, m_i is the maintenance duration, e_i is the earliest time to start maintenance, and l_i is the latest time to start maintenance.
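The time constraint can be checked programmatically; the following sketch (helper names are illustrative) builds the status sequence u_i^t from a start period and verifies the start window.

```python
def maintenance_window(s_i, m_i, n_periods):
    """Build u_i^t over all periods: 1 inside the maintenance window
    [s_i, s_i + m_i - 1], 0 otherwise."""
    return [1 if s_i <= t <= s_i + m_i - 1 else 0 for t in range(n_periods)]

def start_time_ok(s_i, e_i, l_i):
    """The start period must lie inside [e_i, l_i]."""
    return e_i <= s_i <= l_i
```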

Maintenance Resource Constraints.
In a maintenance cycle, the number of equipment that can be maintained at the same time is limited:

Σ_{i=1}^{N} z_i^t · u_i^t ≤ Z_max,

where Z_max is the upper limit of the maintenance workload in period t.
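A sketch of this workload check, using the same u and z arrays as the cost objective (the function name is illustrative):

```python
def resource_ok(u, z, z_max):
    """Check sum_i z_i^t * u_i^t <= Z_max for every period t.
    u[i][t]: outage flag of device i; z[i][t]: team size assigned."""
    n_periods = len(u[0])
    return all(sum(z[i][t] * u[i][t] for i in range(len(u))) <= z_max
               for t in range(n_periods))
```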

Simultaneous Maintenance Constraints.
Equipment maintenance shall avoid repeated power failures at the same load point; all problems shall be solved during a single power-failure maintenance:

s_k = s_i,

where s_k and s_i are the periods in which equipment k and equipment i start maintenance, respectively.

Mutually Exclusive Maintenance Constraints.
To prevent the expansion of a power outage, some power equipment cannot be arranged for maintenance at the same time:

s_j ≥ s_i + m_i or s_j + m_j ≤ s_i,

where s_j is the period in which equipment j starts maintenance, s_i is the period in which equipment i starts maintenance, and m_i (m_j) is the maintenance duration of equipment i (j).
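The simultaneous and mutually exclusive constraints can be sketched as feasibility checks over the start-period vector s (function and parameter names are illustrative):

```python
def simultaneous_ok(s, same_load_pairs):
    """Devices (k, i) feeding the same load point must start
    maintenance together: s_k == s_i."""
    return all(s[k] == s[i] for k, i in same_load_pairs)

def mutually_exclusive_ok(s, m, exclusive_pairs):
    """Mutually exclusive devices (j, i) may not overlap: j starts
    after i's window [s_i, s_i + m_i) ends, or finishes before it."""
    return all(s[j] >= s[i] + m[i] or s[j] + m[j] <= s[i]
               for j, i in exclusive_pairs)
```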

Security Constraints.
To ensure the safe and stable operation of the power grid, the maintenance plan must be checked by power flow calculation:

I_i ≤ I_{i,max}, V_{i,min} ≤ V_i ≤ V_{i,max}, P_i ≤ P_{i,max},

where I_{i,max} is the upper limit of the current, V_{i,max} and V_{i,min} are the upper and lower limits of the node voltage, and P_{i,max} is the allowable power limit; I_i is the current flowing through equipment i, V_i is the node voltage amplitude, and P_i is the active power through the equipment.

Optimization Model.
In this paper, the expected energy not supplied (EENS) is taken as the reliability evaluation index, and the maintenance cost is taken as the economic evaluation index of the maintenance plan. The optimization model takes the general multiobjective form

min F(X) = [f_1(X), f_2(X), . . ., f_m(X)]
s.t. g_i(X) = 0, i = 1, 2, . . ., p; h_j(X) ≤ 0, j = 1, 2, . . ., q,

where f_i(X) is the i-th objective function of the power equipment maintenance plan, X is the n-dimensional decision vector, m is the number of objective functions, g_i(X) and h_j(X) are the equality and inequality constraint functions, respectively, p is the number of equality constraints, and q is the number of inequality constraints.
In most cases, the economic and reliability objectives conflict: improving one may degrade the other, and both cannot reach their optima simultaneously. A trade-off must be struck between them so that each subobjective is optimized as far as possible. The solution of a multiobjective optimization problem is therefore usually a set of solutions that satisfy the constraints; from this set, a group of solutions as close to optimal as possible can be found.
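Such a set of trade-off solutions is conventionally characterized by Pareto dominance; a minimal sketch for the two-objective case (EENS, cost), with minimization of both:

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly
    better in at least one (both objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep only the non-dominated (EENS, cost) pairs."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]
```

For example, among (1, 5), (2, 4), (3, 3), and (2, 6), the first three are mutually non-dominated, while (2, 6) is dominated by (2, 4) and is discarded.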

DDRQN Algorithm.
For complex decision-making problems, the decision-making ability of a single-agent system is far from sufficient. The multiobjective power equipment maintenance planning model established in this paper is competitive or cooperative between economy and reliability. Therefore, under certain conditions, the DRL algorithm needs to be extended to a multiagent system in which multiple agents cooperate or compete with each other. DDRQN allocates a deep recurrent Q-network (DRQN) training module to each agent to build the multiagent system. Its loss function is

L_i^m(θ_i) = E[(y_t^m − Q(o_t^m, h_{t−1}^m, m, a_{t−1}^m, a_t^m; θ_i))^2],

where y_t^m is the target Q-value function

y_t^m = r_t + γ max_{a_{t+1}^m} Q(o_{t+1}^m, h_t^m, m, a_t^m, a_{t+1}^m; θ_i^−),

where Q(o_t^m, h_{t−1}^m, m, a_{t−1}^m, a_t^m; θ_i) is the Q function, o_t^m is the observation of agent m at time t, h_{t−1}^m is the hidden state of the long short-term memory layer of agent m at time t − 1, a_t^m is the action corresponding to the current Q-value of agent m, θ_i is the network weight of agent m in iteration i, θ_i^− is the target network weight of agent m in iteration i, m is the index of the agent currently being processed, a_{t−1}^m is part of the state-action history sequence, γ is the discount factor, s_t is the state at time t, and R_t is the reward function. θ_i and θ_i^− are updated as

θ_{i+1} = θ_i − α∇θ L_i^m(θ_i), θ_{i+1}^− = θ_i^− + α^−(θ_i − θ_i^−),

where ∇θ is the gradient, α is the learning rate, and α^− is the target learning rate.
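The per-agent target and loss can be sketched numerically as follows; `q_next` is assumed to hold the target network's Q-values for every candidate next action, and the bootstrap term is dropped at episode end (a standard convention, not stated explicitly in the paper).

```python
import numpy as np

def ddrqn_target(r_t, gamma, q_next, done):
    """Target y_t^m = r_t + gamma * max_a Q(o_{t+1}^m, h_t^m, m, a_t^m, a; theta^-).
    q_next: target-network Q-values over next actions; done: episode end."""
    return r_t if done else r_t + gamma * np.max(q_next)

def td_loss(y, q_taken):
    """Squared TD error between the target y_t^m and the online network's
    Q-value of the action actually taken."""
    return (y - q_taken) ** 2
```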

Transforming Optimization Objectives into Deep Reinforcement Learning Tasks.
A deep reinforcement learning problem requires two components: an agent and an environment. Its learning and decision process is as follows: according to its strategy and the environment information, the agent takes actions that make the reward optimal [25, 26]. Deep reinforcement learning has been successful in other areas, but the decision-making optimization of the power equipment maintenance plan differs from those environments. Therefore, the key to a successful application is to transform the maintenance plan decision problem into a deep reinforcement learning task. The decision-making and optimization of the power equipment maintenance plan serves as the dynamic environment, and scheduling a piece of equipment to be overhauled serves as an action a_t of an agent. For any time t, the power generation and load of the system form the state, defined as s_t = [P_G, P_Load]. Under the action a_t, the reward fed back by the environment is determined by optimization objective (1). The decision-making optimization problem of the maintenance plan is thus transformed into solving the optimal solution of the equation set. An action set A is built from the power equipment to be overhauled; after performing action a_t, the resulting state s_t and reward r_t are obtained from the environment feedback. The memory unit is d_t = (a_t, s_t, r_t). By computing the loss error, the agent learns the strategy corresponding to the power equipment maintenance problem, namely, the maintenance plan. The agent learning process is shown in Figure 1.
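The memory unit d_t described above can be sketched as a replay buffer from which the agent samples transitions to compute the loss error; the `next_state` field and the class interface are illustrative additions, not from the paper.

```python
import random
from collections import deque, namedtuple

# Transition d_t = (a_t, s_t, r_t), extended here with the successor
# state (an assumption) so that targets can be bootstrapped.
Transition = namedtuple("Transition", "action state reward next_state")

class ReplayMemory:
    """Fixed-size buffer of transitions; the agent samples mini-batches
    from it when computing the loss error and updating its strategy."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```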

Solution Flow.
The decision-making and optimization process of the power equipment maintenance plan is shown in Figure 2 and is mainly divided into three parts: data preparation, algorithm solution, and safety check.
Data preparation: acquire the initial data, including the list of equipment to be overhauled and the overhaul dates, the overhaul plan arranged by the superior authority, the remaining overhaul plan, and the power grid operation mode data; establish the optimization model of the power equipment maintenance plan from the acquired data according to expression (15); adjust the power grid operation mode based on the scheduled overhaul plan and subject the adjusted mode to a safety check to ensure that grid operation remains safe and stable; if the safety and stability conditions are not met, alarm information is output.
Algorithm solution: the DDRQN algorithm obtains the power generation P_G and system load P_Load from the current power grid operation state to form the initial state s_t, at which point the grid is in a safe and stable operating state; the action set A is built from the power equipment to be overhauled; in the process of exploring an action a_t, the state s_{t+1} and reward r_{t+1} are obtained, from which the target Q-value function y_t^m, the Q-value function, and the loss error L_t^m are calculated. Through the gradient descent algorithm, all parameters of the DDRQN network are updated; through one solution pass, the EENS and maintenance cost of the power grid under maintenance of equipment i are obtained; based on a certain strategy, the next exploration action a_{t+1} is taken from state s_{t+1} to start the next solution pass, and the process does not terminate until all equipment to be overhauled has been scheduled and the arrangement satisfies the constraints. The resulting strategy is the power equipment maintenance plan.
Safety check: take the maintenance strategy as the operating section, calculate the power flow, and carry out the safety check. If the safe-operation constraints (7), (8), and (9) are not met, update the parameters a_t, s_t, and r_t of the DDRQN algorithm and solve again. Carry out the N − 1 check for the maintenance plan and output alarm information if the condition for stable grid operation is not met. After a series of iterative solutions, the optimal maintenance strategy is obtained.
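The outer solve-and-check loop of the three parts above can be sketched as follows; `StubAgent`, the toy `safety_check`, and `plan_cost` are illustrative stand-ins for the DDRQN solver, the power-flow/N − 1 check, and the combined objective.

```python
import random

class StubAgent:
    """Illustrative stand-in for the DDRQN solver: proposes a start
    period for each device at random."""
    def solve(self, n_devices, n_periods):
        return [random.randrange(n_periods) for _ in range(n_devices)]

def safety_check(plan, z_max=2):
    """Toy check: at most z_max devices may be down in any period."""
    return all(plan.count(t) <= z_max for t in set(plan))

def plan_cost(plan):
    """Toy objective: later start periods cost more."""
    return sum(plan)

def plan_maintenance(agent, n_devices=4, n_periods=3, episodes=200):
    """Outer loop of the flow above: solve, run the safety check, keep
    the best feasible plan, and re-solve when the check fails."""
    random.seed(0)  # deterministic for illustration
    best = None
    for _ in range(episodes):
        plan = agent.solve(n_devices, n_periods)
        if safety_check(plan) and (best is None or plan_cost(plan) < plan_cost(best)):
            best = plan
    return best
```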

Case Analysis
To verify that the distributed deep recurrent Q-network achieves decision-making optimization of the power equipment maintenance plan, this paper uses the IEEE 118-bus system, whose network topology is shown in Figure 3, including 54 generators and 132 lines. The verification environment is MATLAB R2016 on a computer with an Intel Core i7-8550U CPU and 8 GB RAM. Maintenance costs are given in units of 10,000 yuan. Table 1 shows the solutions obtained by 100 iterations of each optimization algorithm. As can be seen from Table 1, the results of the proposed algorithm are better than those of the other two methods: both the EENS and the maintenance cost are lower.
As the curves in Figure 4 show, the proposed DDRQN algorithm is superior to the PSO and DRQN algorithms in convergence performance, and the maintenance scheme obtained by the DDRQN algorithm is superior to those of the other two algorithms in both reliability and economy. The proposed algorithm effectively obtains optimal cooperative control and high decision quality in the multiagent system by exploiting its strong comprehensive decision-making and coordination abilities. Therefore, the overall performance is greatly improved.

Conclusion
In this paper, a maintenance plan optimization model of power equipment is established: the expected energy not supplied (EENS) is taken as the reliability evaluation index, the maintenance cost is taken as the economic evaluation index, and the reliability and economy of power grid operation are both fully considered. A novel distributed deep recurrent Q-network (DDRQN) multiagent deep reinforcement learning algorithm is proposed to obtain the optimal decision for the power equipment maintenance plan and to improve the accuracy and intelligence of maintenance planning. The DDRQN algorithm uses its comprehensive decision-making and coordination abilities to reach the optimal decision among multiple agents. Case analysis shows that the DDRQN algorithm has strong learning ability, high adaptability, strong comprehensive decision-making ability, and better optimization performance; it plans maintenance reasonably, and the resulting plan fully accounts for both economy and reliability. The practical significance of this algorithm lies in making the scheduling of power grid maintenance plans accurate and automated.
Data Availability

The processed data required to reproduce these findings cannot be shared at this time, as the data also form part of an ongoing study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.