Fault Diagnosis of Data-Driven Photovoltaic Power Generation System Based on Deep Reinforcement Learning

Aiming at the problem of fault diagnosis of the photovoltaic power generation system, this paper proposes a fault diagnosis method based on deep reinforcement learning. This method takes a data-driven approach as its starting point. Firstly, the compressed sensing algorithm is used to fill in the missing photovoltaic data. Then, based on the state, action, strategy, and return function of the environment, the interaction rules, and other factors, the fault diagnosis model of the photovoltaic power generation system is established, and a deep neural network is used to approximate the decision network to find the optimal strategy, so as to realize the fault diagnosis of the photovoltaic power generation system. Finally, the effectiveness and accuracy of the method are verified by simulation. The simulation results show that this method can accurately diagnose the fault types of the photovoltaic power generation system, which is of great significance for enhancing the security of the system and improving its level of intelligent operation and maintenance.


Introduction
With the continuous advancement of the energy transformation, the proportion of clean energy in the energy supply is increasing year by year. Photovoltaic power generation technology is now relatively mature and is increasingly widely used at home and abroad. Statistics show that, by the end of 2020, the cumulative installed capacity of photovoltaic power generation in China had reached 204.3 million kW, and the total annual photovoltaic power generation had reached 224.3 billion kWh [1]. Because solar energy is an intermittent energy source, research on accurate and fast photovoltaic fault diagnosis methods is of great significance for ensuring the normal operation of the photovoltaic system and reducing the loss of service life and power caused by faults.
With the development of artificial intelligence technology, various fault diagnosis methods based on intelligent algorithms have emerged. The neural network method proposed in [2] can judge the existence of a short-circuit fault after learning, by establishing several neural network structures. The fuzzy algorithm proposed in [3] estimates the output power under normal conditions and then compares it with the real-time measured value. If the difference between the two is greater than the set threshold, a fault is considered to exist. References [4,5] proposed a photovoltaic array fault detection method based on pattern recognition. This method obtains appropriate fault characteristic parameters through signal decomposition technology and then uses a fuzzy inference system to judge whether the photovoltaic array has a fault. It requires fuzzy rules to be formulated in advance, and because their formulation often depends on experience or on experts in the field, the rules are difficult to obtain. Reference [6] proposed a fault diagnosis method of photovoltaic power generation system based on BP neural network, which has strong adaptive nonlinear pattern recognition ability and is suitable for multifault complex systems. Reference [7] proposes a multiclassification supported axis to diagnose the faults between neutral lines and equipment faults of photovoltaic cells. Reference [8] proposed a graph-based semisupervised detection method for fault diagnosis of short circuits, open circuits, and line-to-line faults. Reference [9] proposes a method of applying investigation to solve photovoltaic fault, but this method needs to obtain the data of the fault data set in advance. Reference [10] proposes a photovoltaic array fault diagnosis method based on a long short-term memory (LSTM) neural network. This method establishes the LSTM neural network fault diagnosis model and trains it with the characteristic parameters of the photovoltaic array collected under different fault conditions as training samples.
The above literature provides a good reference for fault diagnosis research on the photovoltaic power generation system, but most of these studies depend on specific algorithm models, have low monitoring accuracy, and lack self-learning in the diagnosis method [11].
In order to accurately diagnose the fault types of the photovoltaic power generation system, a fault diagnosis method based on deep reinforcement learning is proposed in this paper. Firstly, for the photovoltaic power generation system data summarized by the operation and maintenance platform, the compressed sensing algorithm is used to fill in the missing data. Then the reinforcement learning algorithm is used to establish the fault diagnosis model of the photovoltaic power generation system, and a deep neural network is used to approximate the decision network to find the optimal strategy, so as to realize the fault diagnosis of the photovoltaic power generation system. Finally, the feasibility and accuracy of the proposed method are verified by simulation experiments.

Reinforcement Learning.
Reinforcement learning is a branch of machine learning, which is mainly used to learn control strategies. Its learning process is similar to the process of living organisms getting along with the external environment, which is in line with human behavioral psychology. The model of reinforcement learning is shown in Figure 1. The brain represents the agent, and the Earth represents the environment. The agent continuously interacts with the environment for learning; this is the process of reinforcement learning [12].
As can be seen from Figure 1, the interaction between the agent and the environment produces a time series $s_1, a_1, r_1, s_2, a_2, r_2, \ldots, s_t, a_t, r_t$ composed of state, action, and return. Based on the premise of time series and certainty, reinforcement learning can be regarded as a Markov decision process, which is usually defined by a five-tuple $\langle S, A, P_a(s_t, s_{t+1}), r(s_t, a_t), \gamma \rangle$.
(1) $S$ represents the state space, which is the external environment that the agent can perceive.
(2) $A$ represents the action space that the agent can choose from. In each state, the agent selects an action to feed back to the environment according to the strategy.
(3) $P_a(s_t, s_{t+1})$ represents the state transition probability of the environment; see formula (1). $P(s_{t+1} \mid s_t, a)$ represents the probability that the environment reaches state $s_{t+1}$ after the agent decides to take action $a$ in state $s_t$. State $s_{t+1}$ depends only on $s_t$ and action $a$ and is independent of all states before time $t$:
$$P_a(s_t, s_{t+1}) = P(s_{t+1} \mid s_t, a) = P(s_{t+1} \mid s_t, s_{t-1}, \ldots, s_0, a). \tag{1}$$
(4) $r(s_t, a_t)$ refers to the return the agent obtains by taking action $a_t$ through decision-making when it is in state $s_t$.
(5) $\gamma$ represents the discount factor, $\gamma \in [0, 1]$; the discount factor is the important parameter that determines each return.
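To illustrate the role of the discount factor, the following minimal sketch computes the discounted cumulative return of a short return sequence. The function name `discounted_return` and the sample returns are illustrative assumptions and do not come from the paper.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite sequence of returns r_0, r_1, ..."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Hypothetical return sequence: three steps, each with return 1.
rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, 1.0))  # no discounting: all steps count equally
print(discounted_return(rewards, 0.5))  # later returns count for less
print(discounted_return(rewards, 0.0))  # only the immediate return counts
```

A discount factor close to 1 makes the agent value long-term returns, whereas a value close to 0 makes it short-sighted.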
In reinforcement learning, two value functions are defined to describe the value of a state and the value of an action taken in a state, respectively: the state value function and the action value function. Through the interactive process of reinforcement learning, the goal is to find the optimal strategy $\pi^{*}(s)$ that maximizes the benefit of the agent.
The state value function is an iterative expression that satisfies the Bellman equation, so it can be solved by iteration. When the transition probability between states is known, the value iteration method is adopted; that is, the state value function is updated iteratively, the adopted strategy is changed according to its value, and the final convergence result is the optimal state value function. The main idea of the Q-learning algorithm is to calculate the maximum value function of state and action, update it with a weighted average of the past estimate and the most recent one, and then use the resulting optimal action value function to obtain the optimal state value function and hence the optimal learning strategy [13,14].
The deep neural network is introduced into the Q-learning algorithm, which is then called the DQN (Deep Q-learning Net) algorithm [15]. Neural network training samples need labels, whereas reinforcement learning is a type of learning without direct labels. Therefore, the target Q value is used as the training label, and the purpose of training is to bring the Q value close to the target Q value. The target Q value is calculated by the formula in step 11 of Algorithm 1, and its difference from the output $Q(\phi_j, a_j; \theta)$ of the current network is then taken. The parameters of the neural network are updated by backpropagation and gradient descent until the Q network converges [16]. The implementation of the DQN algorithm involves the experience replay mechanism; that is, the information of each interaction is stored. During training, samples are randomly selected from the experience pool, which keeps the samples approximately independent and identically distributed and eliminates the correlation between them [17].
In the DQN algorithm, the Q-learning algorithm and the deep learning network are trained at the same time. A large number of training samples are obtained through Q-learning, and then the neural network is trained. The key lies in the label (i.e., the target Q value).
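The weighted-average update described for Q-learning can be sketched in its standard tabular textbook form. This is an illustration under stated assumptions rather than the paper's implementation: the function name `q_learning_update`, the learning rate `alpha`, and the toy states are hypothetical, and only the discount value 0.99 is taken from the parameter settings reported later in the paper.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, terminal=False):
    """Apply one tabular Q-learning update in place and return the new value.

    The new estimate is a weighted average of the old estimate (weight
    1 - alpha) and the most recent target (weight alpha).
    """
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * target
    return q[(state, action)]


# Toy usage with hypothetical states "s0" and "s1" and two actions.
q = defaultdict(float)
actions = [0, 1]
print(q_learning_update(q, "s0", 1, 1.0, "s1", actions, alpha=0.5, gamma=0.9))
```

In DQN the table `q` is replaced by a neural network, and the same target serves as the training label.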

Compressed Sensing Algorithm.
A compressed sensing algorithm is an algorithm that compresses the signal at a very high compression rate and reconstructs and recovers the compressed signal after transmission [18]. This algorithm can change the asymmetry of the signal in the process of acquisition, transmission, and processing. Signal acquisition is generally carried out by sensor devices, which typically have limited storage and endurance and do not support complex processing such as collecting and compressing a large amount of data. After transmission to computers and other devices with strong computing power, the computer only needs to perform simple decompression; this asymmetry puts great pressure on the sensor acquisition equipment [19]. Compressed sensing completes the compression during sampling, so only a small amount of data needs to be collected, and the computer carries out the heavy reconstruction calculation. Therefore, the compressed sensing algorithm is widely used in signal processing and related fields.
The compressed sensing algorithm consists of three parts. The first is the sparse representation of the signal. For a signal $X \in \mathbb{R}^{N \times 1}$, a group of orthogonal transformation bases $\Psi$ is selected to sparsely decompose the signal and obtain a sparse signal $S$. The second is the design of the observation matrix used to observe the signal. The observation matrix is required to be uncorrelated with the sparse orthogonal transformation basis [20,21], and an observation matrix $\Phi$ of size $M \times N$ is selected. The sparse representation $S$ of the original signal is projected onto an $M$-dimensional vector $Y = \Phi S$, where $\Phi \in \mathbb{R}^{M \times N}$. The third is signal reconstruction, which is the process of finding the optimal solution under constraints. The reconstruction algorithm is equivalent to the following mathematical programming problem.
Objective function: $\min \|S\|$. Constraint: $\Phi \Psi^{T} X = Y$. The flow chart of the compressed sensing algorithm is shown in Figure 2.
In compressed sensing, the data are sampled incompletely, below the rate required by the Nyquist sampling theorem, and the original signal is then reconstructed, which is very similar to the partial loss of photovoltaic monitoring data. Therefore, the compressed sensing algorithm is used for the reconstruction of missing photovoltaic data. In the photovoltaic monitoring signal, the same physical quantity is sampled in adjacent periods, and the change between two consecutive sampling values is very small and smooth, so after sparse transformation the signal has the characteristics of a sparse signal. The compressed sensing algorithm is used for reconstruction, and the reconstructed signal is used for filling. The observation matrix is designed according to the location of the missing data, so that it has little correlation with the sparse representation basis. The signal reconstruction process uses the orthogonal matching pursuit algorithm, which can reconstruct the signal with high quality.

Mathematical Problems in Engineering

Photovoltaic Data Filling.
The photovoltaic monitoring system is an important part of the photovoltaic power generation system, which can collect a large amount of data. Through the extraction and analysis of the collected massive data, much valuable information is obtained, which plays a positive role in improving the power generation efficiency of the photovoltaic power generation system and the operation and maintenance of the power station. However, in practice, collected data go missing for various reasons (such as transmission faults and sensor faults), and these missing data may have a great impact on the later analysis and mining of photovoltaic data and on the fault diagnosis of the photovoltaic power generation system. In serious cases, they may lead to the direct failure of the fault diagnosis model. In contrast to statistical and intelligent algorithms, this paper uses a compressed sensing algorithm to fill in the missing photovoltaic data. The process is as follows:
(1) Suppose a monitoring signal sampled at a certain time is $X \in \mathbb{R}^{M \times 1}$, in which some data are missing. After the missing data are filled with zeros, the signal $X' \in \mathbb{R}^{N \times 1}$ is obtained; that is, $N - M$ data are missing.
(2) The photovoltaic monitoring data $X' \in \mathbb{R}^{N \times 1}$ are multiplied by a matrix $\Omega_1$ to obtain $\Omega_1 X'$. The obtained signal $X'' \in \mathbb{R}^{N \times 1}$ is sparsely represented by the discrete cosine transform; that is, $S = \Psi^{T} X''$.
(3) Design the observation matrix. By deleting from the $N \times N$ identity matrix $I$ the rows corresponding to the missing data of the signal $X'$, an observation matrix $\Omega_2$ of size $M \times N$ is obtained. The sparsely represented signal $S$ is observed with $\Omega_2$ to obtain $Y = \Omega_2 S$.
(4) Reconstruct the signal. The reconstruction algorithm is equivalent to the following mathematical programming problem. Objective function: $\min \|S\|$; constraint: $\Phi \Psi^{T} X = Y$. The problem is solved by the orthogonal matching pursuit algorithm.
(5) The filling values of the missing data are obtained by applying the inverse discrete cosine transform to the reconstructed signal $X''$.
(6) Calculate the mean square error.
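The filling procedure above can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the authors' code: the function names (`dct_basis`, `omp`, `fill_missing`) are hypothetical, the test signal is synthetic and exactly 2-sparse in the discrete cosine domain, the sparsity level is passed in as known, and the intermediate matrix $\Omega_1$ is not modelled. The observation step keeps only the rows of the DCT basis at the positions where data were actually recorded.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT-II synthesis basis Psi (n x n), so that x = Psi @ s."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    t = np.arange(n).reshape(1, -1)  # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c.T


def omp(a, y, sparsity):
    """Orthogonal matching pursuit: greedy sparse solution of a @ s = y."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(a.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(a[:, support], y, rcond=None)
        residual = y - a[:, support] @ coef
    s = np.zeros(a.shape[1])
    s[support] = coef
    return s


def fill_missing(x_observed, observed_idx, n, sparsity):
    """Reconstruct a length-n signal from its observed samples."""
    psi = dct_basis(n)
    a = psi[observed_idx, :]        # rows kept by the observation matrix
    s_hat = omp(a, x_observed, sparsity)
    x_hat = psi @ s_hat             # inverse DCT of the sparse estimate
    x_hat[observed_idx] = x_observed  # keep the measured values unchanged
    return x_hat


# Synthetic example: a smooth signal with 8 of 64 samples missing.
n = 64
s_true = np.zeros(n)
s_true[1], s_true[3] = 5.0, 2.0
x_true = dct_basis(n) @ s_true
missing = [5, 6, 17, 30, 31, 42, 50, 58]
observed = [i for i in range(n) if i not in missing]
x_filled = fill_missing(x_true[observed], observed, n, sparsity=2)
print("mean square error:", float(np.mean((x_filled - x_true) ** 2)))
```

Real monitoring data are only approximately sparse, so in practice the sparsity level (or a residual threshold) has to be chosen, and the reconstruction error will not be negligible as it is for this synthetic signal.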

Diagnostic Model.
The fault diagnosis model of the photovoltaic power generation system is established based on the DQN algorithm. Figure 3 is the schematic diagram of fault diagnosis of the photovoltaic power generation system based on the DQN algorithm. The modeling process is as follows.

Diagnostic Tasks and Interaction Rules.
The diagnosis task is constructed as a sequential decision-making process of the agent: the agent successively diagnoses the fault of each training sample in the environment, and the environment gives a corresponding reward according to a certain reward principle, which guides the agent's training and learning. The training goal is to maximize the cumulative return of the agent in the diagnostic task.
In the fault diagnosis task of the photovoltaic power generation system, to guide the agent to learn the fault diagnosis strategy, the interaction rules between the agent and the environment are formulated: a corresponding return is determined according to the distribution of each category. The principle is that if the agent correctly diagnoses the fault type of a sample, it receives a positive return, and if the agent misdiagnoses the fault, it receives a negative return; that is, the amount is deducted from the reward.
In reinforcement learning, the agent is allowed to interact with the environment continuously, and each interaction is recorded completely and stored in the experience pool. Subsequent learning consists of repeatedly sampling from the experience pool and training. Each training process starts from the first sample and ends when the most common fault type in the sample set is diagnosed incorrectly. This process is called an episode.

Simulation Environment.
The environmental state is an important element in the reinforcement learning model. Because the fault diagnosis of the photovoltaic power generation system mainly depends on the data at a certain time, the set of photovoltaic monitoring data collected at a certain time is regarded as a state, and the data at each time represent one state.

Action Space.
The action space of the agent corresponds to the label of the sample (i.e., the fault type). There are as many actions as there are fault types for the agent to select during fault diagnosis. Here, the fault types are numbered with Arabic numerals.

Return Function.
In the training process, the value of an agent action is evaluated by the return function. If the fault distribution is balanced, all samples shall be treated equally. However, the distribution of photovoltaic power generation system faults is unbalanced, and the fault distribution of equipment from different regions and manufacturers also differs. To better guide learning and training, the return after each fault diagnosis shall be given according to the actual distribution of various faults in the power plant. If the agent correctly diagnoses a fault type with many samples, it receives a relatively small positive return. If the agent correctly diagnoses a fault type with few samples, it receives a relatively large positive return. Conversely, if the agent misdiagnoses a fault type with few samples, it receives a correspondingly negative return. If the agent misdiagnoses a common fault type with many samples, this indicates that the agent has not learned any experience or knowledge, so there is no need to continue, and the current round of diagnosis is terminated immediately.
In the fault diagnosis of the photovoltaic power generation system, it is assumed that there are $n$ fault types. The label is denoted $k$, the training sample set of fault type $k$ is $D_k$, $|D_k|$ is the number of training samples with label $k$, and the imbalance proportion of category $k$ is defined as $\rho(k)$, as shown in formula (7). All the training samples of the most unbalanced $n$ categories are taken as $D_S$, and the return function is given by formula (8). When the agent classifies a sample in $D_S$ incorrectly, the current classification task is terminated.
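Formulas (7) and (8) are not reproduced in this text, so the following sketch shows only one plausible way to turn the stated principle (rare fault types earn larger returns than common ones) into code. The definition used here, $\rho(k) = \min_j |D_j| / |D_k|$, and the symmetric return of $\pm\rho(k)$ are assumptions made for illustration and may differ from the paper's definitions. The class sizes are those reported for the original data set later in the paper.

```python
def imbalance_ratios(class_counts):
    """Hypothetical rho(k): size of the rarest class divided by |D_k|."""
    smallest = min(class_counts.values())
    return {label: smallest / count for label, count in class_counts.items()}


def reward(true_label, predicted_label, rho):
    """Assumed return: +rho(k) for a correct diagnosis of class k, -rho(k) otherwise."""
    magnitude = rho[true_label]
    return magnitude if predicted_label == true_label else -magnitude


# Class sizes of the original data set: label 0 is normal operation.
counts = {0: 30000, 1: 4000, 2: 4000, 3: 4000, 4: 4000, 5: 4000}
rho = imbalance_ratios(counts)
print(rho[0], rho[1])         # the common class gets a smaller magnitude
print(reward(1, 1, rho))      # correct diagnosis of a rare fault
print(reward(0, 3, rho))      # wrong diagnosis of the common class
```

With this choice, a balanced data set gives every class the same return magnitude, which matches the statement that balanced samples are treated equally.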

Classification Task Termination Condition.
For the problem of fault diagnosis of the photovoltaic power generation system, when the agent misdiagnoses a sample of the fault type with the largest number of samples, the episode ends, and the score of the agent in this episode is cleared. If this does not occur and the agent completes the fault diagnosis of all samples, the agent's cumulative return is reset and a new round of tasks starts.
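The termination rule can be sketched as a single pass over the samples. The function `run_episode`, its arguments, and the toy policies are hypothetical names introduced for illustration, and the return function is passed in as a parameter because its exact form is not reproduced here.

```python
def run_episode(samples, policy, reward_fn, majority_label):
    """Run one diagnosis episode.

    Returns (score, steps, terminated_early). The episode stops early,
    with the score cleared, as soon as a sample of the most common class
    is misdiagnosed; otherwise it ends after the last sample.
    """
    total = 0.0
    for step, (state, label) in enumerate(samples, start=1):
        action = policy(state)
        total += reward_fn(label, action)
        if action != label and label == majority_label:
            return 0.0, step, True
    return total, len(samples), False


# Toy data: one feature per sample, label 0 is the most common class.
samples = [([0.1], 0), ([0.9], 1), ([0.2], 0)]
unit_reward = lambda label, action: 1.0 if action == label else -1.0
threshold_policy = lambda state: 1 if state[0] > 0.5 else 0
always_fault = lambda state: 1

print(run_episode(samples, threshold_policy, unit_reward, majority_label=0))
print(run_episode(samples, always_fault, unit_reward, majority_label=0))
```

The first policy diagnoses every sample correctly and completes the episode; the second misdiagnoses the first majority-class sample and is stopped at once.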

Tactics.
In the training stage, to enable the agent to fully learn knowledge and experience, it focuses on exploration at first and on exploitation later, so a linearly annealed ε-greedy strategy is used [22]. The purpose of the test phase is mainly to check what the agent has learned, so it focuses on exploitation. Therefore, the greedy strategy is used; that is, the action with the largest Q value is selected every time: $\pi(s) = \arg\max_{a} Q(s, a)$.
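A minimal sketch of the two strategies follows. The starting value of ε (1.0) is the one reported in the parameter settings; the final value `eps_end`, the annealing length `anneal_steps`, and the function names are assumptions chosen for illustration.

```python
import random


def linear_epsilon(step, eps_start=1.0, eps_end=0.01, anneal_steps=10000):
    """Linearly anneal epsilon from eps_start to eps_end, then hold it."""
    fraction = min(max(step, 0), anneal_steps) / anneal_steps
    return eps_start + fraction * (eps_end - eps_start)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.1, 0.7, 0.2]  # hypothetical Q values for three fault types
for step in (0, 5000, 10000, 20000):
    print(step, round(linear_epsilon(step), 3))
# Test phase: epsilon = 0 reduces to the purely greedy strategy.
print(select_action(q_values, epsilon=0.0))
```

Early in training almost every action is random, which favors exploration; as ε shrinks, the agent increasingly exploits the Q values it has learned.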

Training Objectives.
Deep reinforcement learning is applied to the fault diagnosis of photovoltaic power generation systems. A large number of training samples are learned through a data-driven method, and the ultimate goal is to correctly diagnose the fault types. Because the DQN algorithm uses an experience replay mechanism, this mechanism must be used to train on samples when designing a photovoltaic power generation system fault diagnosis model based on deep reinforcement learning. The information $\langle s, a, r, s', \mathit{terminal} \rangle$ of each interaction is stored in the experience pool and then randomly sampled to train the Q network. The specific process is shown in Figure 3. When the deep neural network is used to fit the Q function, the actual Q value of the state $s$ is the output value of the current Q network, and the target Q value is recorded as $y$, which is determined by the progress of the classification task, as shown in equation (8).
Taking the target Q value as the label for deep neural network training, the loss function of Q network training is $L(\theta_k)$, as shown in equation (11). According to equation (11), the parameters of the neural network are updated by gradient descent through backpropagation until convergence, and the Q function is obtained.
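The target and loss equations are not reproduced in this text. The sketch below uses the standard DQN rule, in which the target is the immediate return for a terminal transition and the return plus the discounted maximum next Q value otherwise, together with a mean square error loss. That the paper's equations take exactly this form is an assumption; the function names and the sample batch are hypothetical, and the discount value 0.99 is the one given in the parameter settings.

```python
def td_target(reward, next_q_values, terminal, gamma=0.99):
    """Standard DQN target: r if terminal, else r + gamma * max_a' Q(s', a')."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


def mse_loss(targets, predictions):
    """Mean square error between target Q values and current network outputs."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


# Hypothetical mini-batch of two transitions sampled from the experience pool:
# (return, Q values of the next state, terminal flag, current Q(s, a)).
batch = [
    (1.0, [0.2, 0.5], False, 1.2),
    (-1.0, [0.3, 0.1], True, -0.4),
]
targets = [td_target(r, nq, done) for r, nq, done, _ in batch]
predictions = [q for _, _, _, q in batch]
print(targets)
print(mse_loss(targets, predictions))
```

In a full implementation the loss would be minimized with respect to the network parameters by backpropagation, for example with the Adam optimizer mentioned in the parameter settings.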

Evaluation Index.
In this experiment, the fault distribution of the photovoltaic power generation system is first counted from the historical photovoltaic monitoring data. The monitoring data are then sampled in certain proportions to simulate the fault distributions of other photovoltaic power generation systems, in order to verify the effectiveness of the model and to explore the influence of fault distribution on the fault diagnosis model based on deep reinforcement learning. In this paper, $G\text{-}mean_{total}$ is taken as the evaluation index of fault diagnosis [23]. Two categories $c_i$ and $c_j$ are selected from the fault types of the photovoltaic power generation system to calculate the $G\text{-}mean(c_i, c_j)$ index of the two fault diagnosis results, and then all $G\text{-}mean$ values are weighted and summed [24,25]. The calculation formulas are shown in equations (13) and (14), respectively.
In equation (13), TP refers to the number of correct diagnoses of majority-class samples, TN refers to the number of correct diagnoses of minority-class samples, FP refers to the number of diagnostic errors on majority-class samples, and FN refers to the number of diagnostic errors on minority-class samples.
At the same time, accuracy and the $G\text{-}mean_{total}$ index are used as evaluation indexes.
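Equations (13) and (14) are not reproduced in this text, so the sketch below follows the usual definition of the G-mean as the geometric mean of the per-class correct-diagnosis rates of the two classes being compared. Two points are assumptions: the per-class rates are computed over all predictions rather than over a two-class subproblem, and the pairwise values are combined with equal weights, since the weights used in the paper are not given here. The function names are hypothetical.

```python
import math
from itertools import combinations


def per_class_recall(y_true, y_pred):
    """Fraction of samples of each class that were diagnosed correctly."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    return {c: correct[c] / totals[c] for c in totals}


def g_mean_total(y_true, y_pred):
    """Equal-weight average of pairwise G-means over all class pairs."""
    recall = per_class_recall(y_true, y_pred)
    pairs = list(combinations(sorted(recall), 2))
    if not pairs:
        raise ValueError("at least two classes are required")
    return sum(math.sqrt(recall[i] * recall[j]) for i, j in pairs) / len(pairs)


# Hypothetical labels for eight samples and three classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
print(per_class_recall(y_true, y_pred))
print(round(g_mean_total(y_true, y_pred), 4))
```

Unlike plain accuracy, this index drops sharply when any single class is diagnosed poorly, which is why it suits unbalanced fault distributions.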

Data and Parameter Design.
Based on the historical data of a photovoltaic power station, fifty thousand groups of daytime operation monitoring data are selected and recorded as the PV monitoring data set. The data set is shown in Table 1. The collected monitoring information mainly includes meteorological environment information, photovoltaic array information, combiner box information, photovoltaic inverter DC and AC side information, and grid connection information. The quantities related to fault diagnosis are selected for the fault diagnosis of the photovoltaic power generation system.
This paper mainly focuses on the five common fault types in Table 2. There are 6 operating states, including 5 fault states and one normal operating state. Each group of monitoring data has only one operating state.
There are thirty thousand groups of normal operation state data and four thousand groups of data for each of fault 1, fault 2, fault 3, fault 4, and fault 5.
The six operating states are labeled as follows: normal operation (label 0), fault 1 (label 1), fault 2 (label 2), fault 3 (label 3), fault 4 (label 4), and fault 5 (label 5). The simulation experiment in this paper is based on the photovoltaic power station data set. To simulate the fault distributions of different photovoltaic power generation systems and study the impact of the distribution of fault samples on the experimental results, samples labeled 0-5 are selected from the photovoltaic monitoring data set in different ways, as follows:
(1) The distribution of various faults in the original sample is shown in Figure 4, and this data set is recorded as DS0.
(2) 4000 samples are taken from label 0, and all samples of the other labels are taken. The fault distribution data of the photovoltaic power generation system obtained after sampling is shown in Figure 5. At this time, the number of samples of each label is equal and the distribution is balanced. This data set is recorded as DS1.
(3) Labels 1, 3, and 5 are sampled at 50%. The fault distribution data of the photovoltaic power generation system obtained after sampling is shown in Figure 6. This data set is recorded as DS2.
(4) Labels 0, 2, and 4 are sampled at 50%. The fault distribution data of the photovoltaic power generation system obtained after sampling is shown in Figure 7. This data set is recorded as DS3.

ALGORITHM 1: DQN algorithm.
(1) Randomly initialize all parameters $w$ of the Q network and the corresponding value $Q$
(2) Clear the experience replay set $D$
(3) for episode = 1, M do
(4) Initialize the state $s_1 = x_1$ and get the feature vector $\phi_1 = \phi(s_1)$
(5) for t = 1, T do
(6) Use the $\varepsilon$-greedy strategy to select the action $a_t = \pi_{\varepsilon}(\phi(s_t))$
(7) Execute action $a_t$ to get the return value $r_t$ and the next state $x_{t+1}$
(8) Set $s_{t+1} = x_{t+1}$ and get $\phi_{t+1} = \phi(s_{t+1})$
(9) Store $(\phi_t, a_t, r_t, \phi_{t+1})$ in the experience pool
(10) Randomly collect a sample $(\phi_j, a_j, r_j, \phi_{j+1})$ from the experience pool
(11) Update the target Q value $y_j$
(12) Perform a gradient descent step on $(y_j - Q(\phi_j, a_j; \theta))^2$
(13) End
Due to the complexity of the photovoltaic power generation system, many related physical quantities must be monitored, and the units of these quantities differ. During data analysis, the differences in numerical scale caused by the different units may affect the results. Therefore, the data need to be made dimensionless (normalized) before analysis.
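The paper does not state which normalization is used. As one common choice, the sketch below applies min-max scaling to map each monitored quantity to the range [0, 1]; the choice of method, the function name, and the column names and values are assumptions made for illustration.

```python
def min_max_normalize(columns):
    """Scale each named column of measurements to the range [0, 1]."""
    normalized = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant column carries no information; map it to zeros.
        normalized[name] = [0.0 if span == 0 else (v - lo) / span for v in values]
    return normalized


# Hypothetical monitoring quantities with very different units and scales.
raw = {
    "irradiance_w_m2": [200.0, 600.0, 1000.0],
    "dc_voltage_v": [580.0, 600.0, 620.0],
}
print(min_max_normalize(raw))
```

After scaling, both quantities span the same range, so no single quantity dominates the analysis merely because of its unit.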
For the PV monitoring data set, a convolutional neural network (CNN) is first used to extract the features of the normalized signals. Four convolution layers with a convolution kernel size of three are used, and the fault of the photovoltaic power generation system is then diagnosed through a fully connected neural network. Because deep reinforcement learning is a fitting regression model, the output layer cannot use an activation function, so the fully connected result is directly used as the output of the neural network.
In training, the number of input neurons of the neural network is the number of system state variables, and the number of output neurons is the number of fault types. The activation function used by the neurons is ReLU (except in the output layer, as noted above), and the loss function is the mean square error. The Adam optimizer is used for model training.
The network learning rate is 0.00025, and the discount rate of immediate return is 0.99. When using the DQN algorithm and the linear annealing strategy, the initial value of ε is set to 1. The imbalance rate and return function of each data set are calculated for use in the experiment. The details are as follows:
(1) The imbalance rate and return function of the original sample data set DS0 are shown in Table 3.
(2) For the DS1 data set, with four thousand samples taken from label 0 and all samples of the other labels, the imbalance rate and return function are shown in Table 4.
(3) For the DS2 data set, with labels 1, 3, and 5 sampled at 50%, the imbalance rate and return function are shown in Table 5.
(4) For the DS3 data set, with labels 0, 2, and 4 sampled at 50%, the imbalance rate and return function are shown in Table 6.
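The class sizes of the four data sets follow from the sampling rules stated in the text and can be tabulated with a short sketch. It assumes that DS1, DS2, and DS3 are all drawn from the original data set DS0; the helper `subsample` is a hypothetical name, and the printed largest-to-smallest ratio is only a generic indicator of imbalance, not the imbalance rate of Tables 3 to 6, which is not reproduced here.

```python
# Class sizes of the original data set DS0 (label 0 is normal operation).
BASE_COUNTS = {0: 30000, 1: 4000, 2: 4000, 3: 4000, 4: 4000, 5: 4000}


def subsample(counts, fractions=None, fixed=None):
    """Class counts after keeping a fraction or a fixed number per label."""
    fractions = fractions or {}
    fixed = fixed or {}
    result = {}
    for label, n in counts.items():
        if label in fixed:
            result[label] = min(n, fixed[label])
        else:
            result[label] = int(n * fractions.get(label, 1.0))
    return result


datasets = {
    "DS0": dict(BASE_COUNTS),
    "DS1": subsample(BASE_COUNTS, fixed={0: 4000}),
    "DS2": subsample(BASE_COUNTS, fractions={1: 0.5, 3: 0.5, 5: 0.5}),
    "DS3": subsample(BASE_COUNTS, fractions={0: 0.5, 2: 0.5, 4: 0.5}),
}

for name, counts in datasets.items():
    ratio = max(counts.values()) / min(counts.values())
    print(name, counts, "largest/smallest =", ratio)
```

DS1 is the only balanced set, while DS2 is the most skewed, with the normal class fifteen times larger than the rarest fault classes.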

Result Analysis.
The imbalance rates of the four processed experimental data sets DS0, DS1, DS2, and DS3 are different, and so are the return functions used in training. Among them, DS1 is a balanced data set in which the number of samples of each type is equal, and the other data sets can be compared with it. Table 7 shows the fault diagnosis accuracy of the various data sets under the DQN algorithm. Table 8 shows the $G\text{-}mean_{total}$ evaluation indexes of fault diagnosis under the DQN algorithm for the different data sets.
It can be seen from Tables 7 and 8 that the fault diagnosis of a photovoltaic power generation system based on the deep reinforcement learning algorithm performs well on four differently distributed data sets, which shows that it is feasible to introduce a deep reinforcement learning algorithm into the fault diagnosis of the photovoltaic power generation system and indicates that this method can be applied to different photovoltaic power generation systems in different regions. In addition, it can be seen from Tables 7 and 8 that the number of samples of each fault type in data set DS1 is the same, the distribution between samples is balanced, and the return function of each fault type is the same during training. Therefore, the fault diagnosis accuracy on this data set reaches 96.6% with the DQN algorithm. Through the comparative experiment on differently distributed data sets under the same algorithm, it can be seen that the actual effect of photovoltaic power generation system fault diagnosis is related to the distribution of the various faults and the balance rate between faults. For the fault diagnosis of different photovoltaic power generation systems, the return function should be designed according to the distribution of the various faults of the system.
[Table 2, partially recovered rows: inverter open circuit; 4, inverter short circuit; 5, inverter DC protection.]
In order to verify that the fault diagnosis method proposed in this paper has high accuracy, the model is simulated and compared with cascade random forest [26] and BP neural network [27]. One thousand groups of labeled data are used as training samples, and seven hundred and fifty groups of data are randomly selected as test samples. Fault samples account for 20% of the test samples, with 4% for each of fault 1 to fault 5; that is, 30 samples are randomly selected for each fault type for the simulation experiments.
It can be seen from Table 9 that the accuracy of the DQN algorithm is 96.6%, that of the cascade random forest model is 89.63%, and that of the BP neural network model is 88.00%. Therefore, under the same sample size, the DQN algorithm has higher accuracy than cascade random forest and BP neural network.

Conclusion
Based on the operation and maintenance data of a photovoltaic power station, a data-driven photovoltaic power generation system fault diagnosis method based on deep reinforcement learning is proposed in order to realize accurate fault diagnosis. It is verified and analyzed by simulation, and the following conclusions are drawn:
(1) Through the simulation of differently distributed data sets under the DQN algorithm, it is concluded that the actual effect of fault diagnosis of the photovoltaic power generation system is related to the distribution of the various faults and the balance rate between faults.
(2) The accuracy of the photovoltaic power generation system fault diagnosis model based on deep reinforcement learning is 96.60%. Under the same sample size, the proposed method can effectively judge the fault type of the photovoltaic power generation system and has higher accuracy than the other diagnostic methods.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.