Application of a Hybrid Model of Big Data and BP Network on Fault Diagnosis Strategy for Microgrid

Microgrid data are transmitted in real time, updated rapidly, and large in volume. To address these characteristics, this paper proposes a big-data-supported fault diagnosis and analysis method for microgrid systems, built on the large data samples generated during microgrid operation. Multisource joint feature vectors that combine short-circuit current and voltage are extracted using the wavelet transform, Rayleigh entropy, and big data technology. The extracted feature dataset is clustered and segmented to enable deep data mining. Fault diagnosis of the microgrid is then realized by combining a BP neural network with big data. The simulation results show that the big-data-supported BP neural network algorithm can accurately identify the type and phase of internal faults in the microgrid. The method is well suited to extracting the temporal characteristics of the information and the spatiotemporal correlation of the data. This enables prediction from big data and addresses the core problems in analyzing microgrid fault big data, with an accuracy as high as 96.8%.


Introduction
Big data refers to datasets that cannot be captured, managed, and processed by conventional software tools within a reasonable period of time. It is a massive, diversified, and fast-growing information asset that requires new processing modes to deliver stronger decision-making power, insight, and process optimization ability. "Volume, Variety, Value, Velocity, and Veracity" are the "5V" characteristics of big data. Relevant research and surveys have pointed out that the annual growth of global data is roughly twice, or even more than twice, that of the previous year. In the next 10 years, unstructured data will account for about 90% of all data, and data patterns will change, so analysis based on previous experience will no longer be possible. Therefore, it is necessary to study data mining technologies and to understand and master the basic "5V" characteristics of big data, which is particularly important for big-data-based analysis. BP neural networks can be applied to model prediction and to the study of relationships between different models. Therefore, combining big data technology with a BP neural network makes it possible to handle large, complex nonlinear structural problems from a statistical point of view, with high stability and accuracy. Big data fault analysis collects large amounts of data through real-time analysis and mining of microgrid fault information. The purpose is to master the operating characteristics of the microgrid, accurately predict the behavior of its topology, and improve service and risk control capabilities. The key to big data is the ability to obtain useful information quickly from a large amount of data, or to quickly realize the value of big data assets. Therefore, big data processing is often based on cloud computing, which is the product of a new era.
With cloud computing as the development strategy, big data is applied on cloud computing platforms to obtain data that can be used for fault analysis, forming a big data fault diagnosis model. The energy crisis, environmental pollution, and the slow recovery of large power grids have become increasingly prominent. As a result, the rational development and utilization of green, clean energy has become an important topic. Microgrids are widely used because of their flexible installation locations, low pollution, and high energy efficiency, and their technical problems have attracted extensive attention from researchers worldwide. One important research field is fault diagnosis technology, which guarantees safe and reliable operation. The common fault types of internal lines in a microgrid include single-phase-to-ground short circuits, two-phase short circuits, and three-phase short circuits. Microgrids differ from conventional grids in their current and voltage characteristics and have bidirectional power flow. For these reasons, conventional fault diagnosis methods cannot be applied directly to microgrids with good results. Because the fault voltage and current characteristics vary, finding new fault diagnosis and identification methods for microgrids is an urgent problem.
Much research on microgrid fault diagnosis technology has been carried out and applied in practice. It mainly focuses on two aspects. First, fault diagnosis is performed according to changes in the status of circuit breakers, the flexible topology, protection elements, and other equipment in the microgrid. For example, the voltage and bus current signals of low-voltage circuit breakers have been used for fault diagnosis with an SOM neural network and multiagent systems. Simulation results show that good diagnosis results can be achieved for a single fault, but simultaneous multiple faults require further study [1,2]. In [3], a Petri net analysis model using line fault protection information is established. When the topology changes, only the corresponding protection set information needs to be updated, without remodeling, so the model adapts to the variable topology of the microgrid. However, the identification accuracy decreases when protection devices or circuit breakers fail to operate or misoperate.
Second, abnormal changes in voltage or current are used to diagnose faults in the microgrid. Studies have examined the change in current amplitude and the duration of transient overvoltage when a short-circuit fault occurs in the microgrid. On this basis, they compared and analyzed the transient recovery performance of microgrids under different control modes [4,5]. Based on impedance and impedance angle, the symmetrical component method has been used to analyze the voltage and current in a microgrid during a fault. However, the fault component is difficult to extract accurately when the system is unstable or oscillating, or when there is interference or a continuous fault. Moreover, these studies are limited to theory and methodology and lack experimental verification.
Fault voltage and current signals have been used to realize fault diagnosis and identification by combining wavelet packets with a neural network, but the fault phase cannot be traced to a specific line [6,7]. In [8], the voltage signal of photovoltaic-storage microgrid lines is analyzed with a genetic algorithm to ensure safe and reliable operation of the photovoltaic-storage microgrid. A fault detection scheme for microgrids based on deep neural networks and the wavelet transform was proposed in [9]. The authors of [10] identified and evaluated the faulted line section using data mining and the wavelet packet transform. At present, the scale of microgrid data is increasing exponentially, and traditional data processing methods cannot cope with such large-scale data. Moreover, the big data of a microgrid system run through all nodes of the smart grid, far beyond the scope of traditional power system monitoring. Current data processing platforms can hardly meet the smart grid's requirements for processing power system data. In particular, real-time processing is difficult to achieve, which results in loss and duplication of power grid data. In the era of big data, processing microgrid data to analyze and diagnose the working condition of power equipment is an urgent task in the development of intelligent microgrids. Existing microgrid fault diagnosis methods suffer from incomplete fault information, a single information structure, and imperfect diagnosis results. To address these problems, this paper uses a BP neural network based on big data technology to extract and reconstruct features of multisource data. It then obtains the spatiotemporal correlation characteristics between different data, thereby achieving better fault prediction and diagnosis.
The main contributions of this paper are as follows. First, most existing studies address the relationship between faults and the amplitudes of voltage and current. Few examine the time-frequency characteristics of voltage and current and their impact on faults, even though these characteristics have a greater bearing on faults. Second, this paper not only analyzes the relationship between voltage and current but also studies the regulatory relationship among voltage, current, and fault. Third, this paper combines a BP neural network with big data to predict the spatiotemporal correlation of data, realizing fault diagnosis through big data analysis. When an internal line of the microgrid fails, the three-phase voltages and currents contain rich transient abrupt-change signals that effectively reflect the fault characteristics. This paper therefore obtains the high-frequency and low-frequency details of these signals using wavelet analysis, Rayleigh entropy, and other theoretical methods. From these details, it extracts characteristic vectors of three-phase line fault voltage and current from massive microgrid data. Based on this fault information, a BP neural network is applied to fault diagnosis and identification. Trained on historical data samples, the big-data-supported BP neural network model accurately diagnoses the fault type and phase. The rest of this paper is arranged as follows: Section 2 describes the fault characteristics; Section 3 presents wavelet packet decomposition and energy entropy; Section 4 constructs the BP neural network fault diagnosis model; Section 5 describes the fault diagnosis algorithm and process; Section 6 gives the simulation analysis and experimental verification; and Section 7 concludes the study.
2 Computational Intelligence and Neuroscience

Microgrid Prototype.
A microgrid prototype is shown in Figure 1. The bus voltage is 220 V, and the microgrid is connected to the distribution network through the point of common coupling (PCC). DG1 is a photovoltaic power generation unit with a capacity of 60 kVA, and DG2 is a wind power source with a capacity of 20 kVA. There are three power transmission lines in the microgrid: WL1 is 1 km long with an 8 kW load, WL2 is 5 km long with a 50 kW load, and WL3 is 2 km long with a 15 kW load. For single-phase faults, the fault resistance and grounding resistance are both set to 0.001 Ω. For two-phase or three-phase short circuits, the grounding resistance is set to 10 mΩ.
The three-phase output voltages and currents are symmetrical when the microgrid operates normally. The current waveform is shown in Figure 2.
The three-phase output currents have equal amplitudes, with a phase difference of 120° between phases.
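The balanced condition can be checked numerically. The sketch below generates three currents of equal amplitude, each displaced by 120°, and verifies that their instantaneous sum is zero. The 50 Hz frequency, the 10 A peak amplitude, and the function name are hypothetical choices for illustration and are not stated in the paper.

```python
import math

def balanced_currents(amplitude, freq_hz, t):
    """Return instantaneous phase currents (i_a, i_b, i_c) of a balanced set.

    Each phase has the same amplitude and lags the previous one by 120 degrees.
    """
    w = 2.0 * math.pi * freq_hz
    return tuple(amplitude * math.cos(w * t - k * 2.0 * math.pi / 3.0) for k in range(3))

# Hypothetical operating point: 10 A peak at 50 Hz.
for t in (0.0, 0.0013, 0.0071):
    i_a, i_b, i_c = balanced_currents(10.0, 50.0, t)
    # For a balanced set the sum is zero up to floating-point rounding,
    # i.e. there is no zero-sequence current in normal operation.
    print(f"t = {t:.4f} s: i_a + i_b + i_c = {i_a + i_b + i_c:+.1e} A")
```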

Fault Analysis.
The Matlab simulation platform is used to analyze the variation of line voltage and current under different fault types and fault positions. Matlab/Simulink 2018b is mainly used for modeling and simulation. Simulink builds the whole system and performs offline simulation verification. Combined with the StarSim HIL and StarSim RCP software from Yuankuan, it also enables hardware-in-the-loop testing. Simulations are carried out for faults located at 10%, 50%, 70%, and 85% of the line from the microgrid side. The fault types covered are single-phase-to-ground, phase-to-phase short-circuit, two-phase-to-ground, and three-phase short-circuit faults. The simulation time is set to 0.5 s; the fault occurs at 0.1 s and is cleared at 0.4 s. The voltage and current at the fault point are shown in Figure 3. After the short circuit, the voltage at the fault point drops significantly, the amplitude of the fault current increases, and harmonic and aperiodic components appear. These components are very likely to cause microgrid instability and protection misoperation even after the fault is cleared. A fault diagnosis method based on a single signal therefore has limitations. Both voltage and current signals should be considered to build a new feature vector for fault diagnosis.
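The simulation matrix above consists of four fault types at four fault positions, in 0.5 s runs. The timing is read here as fault inception at 0.1 s and clearing at 0.4 s. The matrix can be enumerated as in the sketch below. The waveform function is a hypothetical toy model, not the Simulink model used in this paper. During the fault window, it raises the current amplitude and adds a decaying aperiodic (DC) component, mimicking the qualitative behavior described for Figure 3. The following are all assumptions: the amplitudes, the 50 Hz frequency, the 20 ms time constant, the 10 kHz sampling rate, and the function names.

```python
import math
from itertools import product

FAULT_TYPES = ("single-phase-to-ground", "phase-to-phase", "two-phase-to-ground", "three-phase")
FAULT_POSITIONS = (0.10, 0.50, 0.70, 0.85)  # fraction of line length from the microgrid side
T_SIM, T_FAULT, T_CLEAR = 0.5, 0.1, 0.4     # seconds

def scenarios():
    """All (fault type, fault position) combinations to be simulated."""
    return list(product(FAULT_TYPES, FAULT_POSITIONS))

def toy_phase_current(t, i_normal=10.0, i_fault=80.0, f=50.0, tau=0.02):
    """Hypothetical faulted-phase current in amperes (toy model only)."""
    w = 2.0 * math.pi * f
    if T_FAULT <= t < T_CLEAR:
        # Larger periodic component plus an exponentially decaying DC offset.
        return i_fault * math.sin(w * t) + i_fault * math.exp(-(t - T_FAULT) / tau)
    return i_normal * math.sin(w * t)

fs = 10_000  # hypothetical sampling rate in Hz
samples = [toy_phase_current(n / fs) for n in range(int(T_SIM * fs))]
pre = max(abs(v) for v in samples[: int(T_FAULT * fs)])
during = max(abs(v) for v in samples[int(T_FAULT * fs): int(T_CLEAR * fs)])
print(len(scenarios()), "scenarios; peak before fault", round(pre, 1),
      "A, peak during fault", round(during, 1), "A")
```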
The fault data collected during microgrid operation include steady-state data, parameter data, alarm events, and so on. The circuit information includes the following indicators: frequency, voltage, current, harmonic voltage, harmonic current, voltage imbalance, current imbalance, flicker, power and power factor, power grid clutter interference, vibration, temperature and humidity, harmonic interference, abnormal events, and others.
Through the analysis and processing of these data, accurate fault characteristics can be extracted to realize fault identification and diagnosis.

Wavelet Packet Decomposition.
The wavelet packet selects the optimal basis to decompose the original signal in the frequency domain. Ordinary wavelet decomposition splits only the low-frequency part further at each level, so its time-frequency resolution is fixed. The wavelet packet also decomposes the high-frequency part. This improves the ability of signal analysis and reflects the nature and characteristics of the signal more accurately. It has good localization in both the time and frequency domains and excellent adaptability to signals.
Let ϕ(x) be an orthogonal scaling function and ψ(x) be a wavelet function. The two-scale equations are as follows:
$$\phi(x) = \sqrt{2}\sum_{k} h_k\, \phi(2x - k), \qquad \psi(x) = \sqrt{2}\sum_{k} g_k\, \phi(2x - k)$$
where $h_k$ is the scale coefficient and $g_k$ is the wavelet coefficient.
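Defining the basis functions $u_0(x) = \phi(x)$ and $u_1(x) = \psi(x)$, the two-scale equations are generalized as follows:
$$u_{2n}(x) = \sqrt{2}\sum_{k} h_k\, u_n(2x - k), \qquad u_{2n+1}(x) = \sqrt{2}\sum_{k} g_k\, u_n(2x - k)$$
where the constructed sequence $\{u_n(x)\}$ is the wavelet packet of the basis function $u_0(x) = \phi(x)$. The parameter j is the scale index, k is the position index, and n is the oscillation parameter. Let $d_k^{j,n}$ denote the wavelet packet coefficients of node n at scale j. The wavelet packet decomposition algorithm is
$$d_l^{j,2n} = \sum_{k} h_{k-2l}\, d_k^{j+1,n}, \qquad d_l^{j,2n+1} = \sum_{k} g_{k-2l}\, d_k^{j+1,n}$$
and the wavelet packet reconstruction algorithm is
$$d_l^{j+1,n} = \sum_{k} \left( h_{l-2k}\, d_k^{j,2n} + g_{l-2k}\, d_k^{j,2n+1} \right)$$
Taking a 3-level wavelet packet decomposition as an example, the process of decomposition and reconstruction is shown in Figure 4. Node (i, j) represents the jth node in layer i (i = 0, 1, 2, 3; j = 0, 1, 2, …, 7), and each node represents a signal with certain characteristics. For example, node (0,0) represents the original signal S. Node (1,0) represents the low-frequency coefficients of the first layer of the decomposition, and node (1,1) represents the high-frequency coefficients of the first layer. The original signal S is related to the signals reconstructed from its level-3 decomposition coefficients as follows:
$$S = S_{30} + S_{31} + S_{32} + S_{33} + S_{34} + S_{35} + S_{36} + S_{37}$$
where $S_{3j}$ is the signal reconstructed from node (3, j).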
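The decomposition tree and the identity $S = S_{30} + \cdots + S_{37}$ can be checked numerically with a short sketch. To stay dependency-free, it uses the Haar filters $h = [1/\sqrt{2}, 1/\sqrt{2}]$ and $g = [1/\sqrt{2}, -1/\sqrt{2}]$ instead of the db6 wavelet used later in this paper. With these filters, the analysis and synthesis steps are exactly the decomposition and reconstruction sums above. Nodes are stored in natural (Paley) order, so node (3, j) is not necessarily the jth band in increasing frequency. The function names and the test signal are hypothetical. In practice, a library such as PyWavelets (`pywt.WaveletPacket`) would typically be used with the db6 basis.

```python
import math

R2 = math.sqrt(2.0)

def haar_split(x):
    """Analysis step: low-pass (approximation) and high-pass (detail) halves."""
    lo = [(x[2 * i] + x[2 * i + 1]) / R2 for i in range(len(x) // 2)]
    hi = [(x[2 * i] - x[2 * i + 1]) / R2 for i in range(len(x) // 2)]
    return lo, hi

def haar_merge(lo, hi):
    """Synthesis step: exact inverse of haar_split."""
    x = []
    for a, d in zip(lo, hi):
        x.extend(((a + d) / R2, (a - d) / R2))
    return x

def wavelet_packet(signal, levels=3):
    """Full wavelet packet tree {(i, j): coefficients}; node (0, 0) is S."""
    if len(signal) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    tree = {(0, 0): list(signal)}
    for i in range(levels):
        for j in range(2 ** i):
            tree[(i + 1, 2 * j)], tree[(i + 1, 2 * j + 1)] = haar_split(tree[(i, j)])
    return tree

def reconstruct(leaves, levels=3):
    """Rebuild the signal from the 2**levels terminal nodes {j: coefficients}."""
    current = dict(leaves)
    for i in range(levels, 0, -1):
        current = {j: haar_merge(current[2 * j], current[2 * j + 1]) for j in range(2 ** (i - 1))}
    return current[0]

def node_signal(tree, j, levels=3):
    """S_{levels,j}: reconstruction from node (levels, j) alone (others zeroed)."""
    n = len(tree[(levels, j)])
    leaves = {k: (tree[(levels, k)] if k == j else [0.0] * n) for k in range(2 ** levels)}
    return reconstruct(leaves, levels)

# Hypothetical test signal: a low-frequency tone plus a weaker high-frequency tone.
S = [math.sin(2 * math.pi * 3 * n / 64) + 0.3 * math.sin(2 * math.pi * 20 * n / 64) for n in range(64)]
tree = wavelet_packet(S)
parts = [node_signal(tree, j) for j in range(8)]
total = [sum(p[n] for p in parts) for n in range(len(S))]
print("max |S - (S30 + ... + S37)| =", max(abs(a - b) for a, b in zip(S, total)))
```

Because the Haar filters are orthonormal, the level-3 coefficients also preserve the signal energy, which is what makes per-band energy measures meaningful.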

Feature Entropy Extraction and Multisource Joint Eigenvector Construction.
In information theory, entropy measures the degree of uncertainty of information; examples include information entropy and relative entropy [11,12]. The Shannon entropy extracted from wavelet packet decomposition and reconstruction reflects the high-frequency and low-frequency characteristics of a signal more accurately and has stronger anti-interference performance [6]. The Rayleigh wavelet packet singular entropy can accurately express the time complexity of the transient frequency components of a signal, which makes it better suited to identifying and diagnosing fault signals.
Let the random signal be $X = \{x_0, x_1, \ldots, x_{N-1}, x_N\}$, and let $p_i$ denote the probability of occurrence of $x_i$. The Rayleigh entropy of X is defined on these probabilities. Fusing this entropy with wavelet packet decomposition gives the energy entropy of each frequency band. The vector T is composed of the energy entropies of the frequency bands:
$$T = [E_{30}, E_{31}, \ldots, E_{37}]$$
To facilitate signal analysis, when the energy entropies of the frequency bands are relatively large, the vector T can be normalized and denoted by $T'$.
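The per-band energy entropy can be illustrated with a small sketch. It assumes one common definition of wavelet packet energy entropy. For each terminal band, the coefficient energies $|d_k|^2$ are normalized into a distribution $p_k$, and the Shannon entropy $-\sum_k p_k \ln p_k$ is taken as $E_{3j}$. It also assumes that $T'$ is obtained by dividing T by its sum. Both choices, and the function names, are illustrative assumptions rather than the exact formulas of this paper.

```python
import math

def band_entropy(coeffs):
    """Shannon entropy of the normalized coefficient energies of one band."""
    energies = [c * c for c in coeffs]
    total = sum(energies)
    if total == 0.0:
        return 0.0  # an all-zero band carries no information
    return -sum((e / total) * math.log(e / total) for e in energies if e > 0.0)

def entropy_vector(bands):
    """T = [E_30, ..., E_37] from the eight terminal-node coefficient lists."""
    return [band_entropy(b) for b in bands]

def normalize(t):
    """T' = T / sum(T) (assumed normalization)."""
    s = sum(t)
    return [v / s for v in t] if s > 0.0 else list(t)

# A band whose energy is spread evenly has high entropy; a band dominated
# by a single transient spike has low entropy (a more ordered signal).
spread = [1.0, -1.0, 1.0, -1.0]
spike = [0.0, 5.0, 0.0, 0.0]
print(band_entropy(spread), band_entropy(spike))
```

Under this definition, a sudden fault transient concentrates energy in a few coefficients and lowers the band entropy. This behavior matches the statement that smaller entropy corresponds to a more ordered signal.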
After feature entropy extraction, the voltage and current features are fused by the interval crossing method into a new multisource fault feature vector for fault diagnosis. This gives the final signal eigenvector.

Construction of the BP Neural Network Fault Diagnosis Model
In 1986, the BP neural network was proposed by Rumelhart and McClelland. Owing to its strong self-learning and adaptive capabilities, it has become one of the most popular neural network models. The BP network can solve the fault diagnosis problems of some complex systems and provides theoretical and technical foundations for more intelligent diagnosis methods.
With the support of big data technology, the BP neural network has gradually matured. Its training adjusts the weights and thresholds layer by layer, and the adjustment is repeated until the error meets the required standard. The BP neural network is mainly composed of an input layer, a hidden layer, and an output layer. The topological structure of a typical three-layer BP neural network is shown in Figure 5.
Notation: J is the number of nodes in the input layer, with node index j. The input vector is $X = [x_1, x_2, \ldots, x_j, \ldots, x_J]$, $j = 1, 2, \ldots, J$. K is the number of nodes in the output layer, with node index k, and the output vector is $O = [o_1, o_2, \ldots, o_k, \ldots, o_K]$, $k = 1, 2, \ldots, K$. L is the number of nodes in the hidden layer, with node index ι. $W_{j\iota}$ is the connection weight between the jth neuron in the input layer and the ιth neuron in the hidden layer. $T_{\iota k}$ is the connection weight between the ιth neuron in the hidden layer and the kth neuron in the output layer. The input of the ιth hidden node is $I_\iota$, its output is $y_\iota$, and the activation function is $f(\cdot)$. The input and output of the ιth hidden neuron are then
$$I_\iota = \sum_{j=1}^{J} W_{j\iota}\, x_j, \qquad y_\iota = f(I_\iota)$$
and the outputs of the network are $o_k = f\left(\sum_{\iota=1}^{L} T_{\iota k}\, y_\iota\right)$. Given N training samples $X_n$ $(n = 1, 2, \ldots, N)$, the expected output vector for the input vector $X_n$ is $d_n = [d_{n1}, \ldots, d_{nK}]$. The objective function is defined as the sum of squared errors between the expected and actual outputs during backpropagation:
$$E_n = \frac{1}{2}\sum_{k=1}^{K} \left(d_{nk} - o_{nk}\right)^2$$
The total error of the N samples is defined as
$$E = \sum_{n=1}^{N} E_n = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K} \left(d_{nk} - o_{nk}\right)^2$$
The total error E is minimized by adjusting the connection weights and thresholds. Each weight changes along the negative gradient direction of the error function:
$$W(t+1) = W(t) - \eta\,\frac{\partial E}{\partial W(t)}$$
where t is the number of iterations and η is the step size.
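The forward pass and the gradient-descent update $W(t+1) = W(t) - \eta\,\partial E/\partial W$ can be sketched in plain Python. This is a minimal illustration, not this paper's implementation. It assumes sigmoid activations for $f(\cdot)$, stores the thresholds as additive biases, and uses full-batch updates. It trains on a hypothetical toy dataset (targets x1 OR x2 and x1 AND x2) rather than microgrid features. The class and method names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class BPNetwork:
    """Three-layer BP network: J inputs -> L hidden -> K outputs (sigmoid f)."""

    def __init__(self, J, L, K, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(L)] for _ in range(J)]  # W[j][l]
        self.T = [[rng.uniform(-0.5, 0.5) for _ in range(K)] for _ in range(L)]  # T[l][k]
        self.bh, self.bo = [0.0] * L, [0.0] * K  # thresholds stored as biases

    def forward(self, x):
        """Return hidden outputs y_l = f(I_l) and network outputs o_k."""
        y = [sigmoid(sum(w[l] * xj for w, xj in zip(self.W, x)) + self.bh[l]) for l in range(len(self.bh))]
        o = [sigmoid(sum(t[k] * yl for t, yl in zip(self.T, y)) + self.bo[k]) for k in range(len(self.bo))]
        return y, o

    def total_error(self, X, D):
        """E = sum_n 1/2 * sum_k (d_nk - o_nk)^2."""
        return sum(0.5 * sum((d - o) ** 2 for d, o in zip(dn, self.forward(xn)[1]))
                   for xn, dn in zip(X, D))

    def train_epoch(self, X, D, eta):
        """One full-batch step w <- w - eta * dE/dw (all gradients accumulated first)."""
        L, K = len(self.bh), len(self.bo)
        gW = [[0.0] * L for _ in self.W]
        gT = [[0.0] * K for _ in self.T]
        gbh, gbo = [0.0] * L, [0.0] * K
        for x, d in zip(X, D):
            y, o = self.forward(x)
            # dE/d(net input) at the output layer, using f'(z) = f(z)(1 - f(z)).
            delta_o = [-(d[k] - o[k]) * o[k] * (1.0 - o[k]) for k in range(K)]
            # Error propagated back through T to the hidden layer.
            delta_h = [sum(delta_o[k] * self.T[l][k] for k in range(K)) * y[l] * (1.0 - y[l])
                       for l in range(L)]
            for l in range(L):
                gbh[l] += delta_h[l]
                for k in range(K):
                    gT[l][k] += delta_o[k] * y[l]
            for k in range(K):
                gbo[k] += delta_o[k]
            for j, xj in enumerate(x):
                for l in range(L):
                    gW[j][l] += delta_h[l] * xj
        for j in range(len(self.W)):
            for l in range(L):
                self.W[j][l] -= eta * gW[j][l]
        for l in range(L):
            self.bh[l] -= eta * gbh[l]
            for k in range(K):
                self.T[l][k] -= eta * gT[l][k]
        for k in range(K):
            self.bo[k] -= eta * gbo[k]

# Hypothetical toy data: targets are (x1 OR x2, x1 AND x2).
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
D = [[0, 0], [1, 0], [1, 0], [1, 1]]
net = BPNetwork(J=2, L=4, K=2, seed=1)
e_start = net.total_error(X, D)
for _ in range(2000):
    net.train_epoch(X, D, eta=0.5)
print(f"total error: {e_start:.4f} -> {net.total_error(X, D):.4f}")
```

All gradients are accumulated before any weight is changed, so the hidden-layer deltas use the same $T_{\iota k}$ values as the forward pass. This matches the update rule above applied to the total error E.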

Fault Diagnosis Algorithm and Process
Based on the algorithms of the preceding two sections, this paper proposes a new short-circuit fault diagnosis method for microgrids. Fault features are extracted by wavelet packet decomposition, and the energy entropy is calculated. A multisource joint eigenvector composed of the voltage and current characteristic entropies is used as the input of the BP network to realize fault diagnosis. First, time-frequency analysis of the three-phase current and voltage signals is carried out by wavelet packet decomposition. Then, the Shannon energy entropy is calculated as the signal feature vector. The multisource joint eigenvector formed by cross fusion is used as the input of the BP neural network for training and learning, so that the fault type and phase of the microgrid can be accurately identified and diagnosed. The algorithm flow is shown in Figure 6. A total of 500 groups of line fault sample data are randomly selected and processed in this way. Of these samples, 80% are used for training, 10% for validation, and the remaining 10% for testing.
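The 80/10/10 division of the 500 fault samples can be reproduced with a seeded shuffle, as in the sketch below. The function name, the seed, and the use of placeholder sample IDs are hypothetical; the paper does not describe how its random selection was implemented.

```python
import random

def split_samples(samples, train=0.8, val=0.1, seed=0):
    """Shuffle once, then cut into disjoint training, validation, and test sets."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    n_train = round(train * len(samples))
    n_val = round(val * len(samples))

    def pick(ids):
        return [samples[i] for i in ids]

    return (pick(order[:n_train]),
            pick(order[n_train:n_train + n_val]),
            pick(order[n_train + n_val:]))

samples = list(range(500))  # placeholder IDs for the 500 line-fault samples
train_set, val_set, test_set = split_samples(samples)
print(len(train_set), len(val_set), len(test_set))  # 400 50 50
```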
In the BP neural network model, the multilayer feedforward network is trained on microgrid fault output data with the error backpropagation algorithm. By learning from a large number of samples, the network stores the input-output mapping of faults. This enables real-time, online mapping of fault information and makes the complex nonlinear relationships in the data samples explicit. As a result, the accuracy of fault diagnosis is greatly improved and the data error rate is reduced.

Wavelet Packet Decomposition of Fault Current in Microgrid.
Experimental analysis and comparison show that the frequency bands of a 2-level wavelet packet decomposition are too wide and the resolution is low. A 4-level decomposition gives the same effect as a 3-level decomposition but requires significantly more calculation. Therefore, this paper applies a 3-level wavelet packet decomposition to the signal.
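The resolution trade-off can be made concrete. Each extra decomposition level doubles the number of terminal bands and halves their width. The sketch assumes a hypothetical 10 kHz sampling rate (the paper does not state one here) and lists idealized bands in increasing frequency.

```python
def band_edges(fs_hz, level):
    """Idealized frequency bands (Hz) of the 2**level terminal nodes, low to high."""
    width = (fs_hz / 2.0) / 2 ** level  # the Nyquist range split into equal parts
    return [(k * width, (k + 1) * width) for k in range(2 ** level)]

fs = 10_000.0  # hypothetical sampling rate
for level in (2, 3, 4):
    bands = band_edges(fs, level)
    print(f"level {level}: {len(bands)} bands, each {bands[0][1] - bands[0][0]:.1f} Hz wide")
```

Wavelet packet trees are often stored in natural (Paley) order, in which terminal nodes are not sorted by frequency. A Gray-code permutation relates that order to the frequency order used here.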
The extraction of the wavelet packet energy entropy of the A-phase current signal is illustrated with an example. The example uses the A-phase single-phase-to-ground short-circuit current on line WL3, away from the microgrid side. The waveform of the single-phase-to-ground short-circuit current is shown in Figure 7.
Comparing Figures 7 and 2 shows that when the system is short-circuited, the current changes abruptly and carries transient fault information. The db6 wavelet basis is selected to further extract the effective fault information. The A-phase current signal is decomposed by a 3-level wavelet packet multiresolution analysis using formula (2). This yields the wavelet packet decomposition coefficients and the wavelet reconstruction signals. The decomposed current signals are shown in Figure 8.
The detail plots show that the first impulse current wave arrives at sampling point 800 and the second at sampling point 3500. Taking the sampling period into account, this interval can serve as a basis for locating faults in the microgrid. The wavelet packet reconstruction of the short-circuit current and voltage signals shows that the fault signal contains rich nonstationary components. The reconstructed signals therefore fluctuate markedly at the moment of the fault. This fluctuation is an important criterion for judging whether a fault has occurred on an internal line of the microgrid and for calculating the wavelet energy entropy. From this entropy, the BP neural network can identify the fault types and fault phases of the internal lines.

Extraction and Construction of Multisource Joint Eigenvector.
The energy entropies of the 8 wavelet reconstruction signals of the A-phase short-circuit current are calculated using the Shannon entropy formula (8). They form an eigenvector E as follows:
$$E = [E_{30}, E_{31}, E_{32}, E_{33}, E_{34}, E_{35}, E_{36}, E_{37}]$$
where $E_{30}, \ldots, E_{37}$ are the entropies of the wavelet reconstruction signals. The wavelet packet Shannon entropy can detect small abnormal changes in a signal. It can therefore extract weak effective signals and suppress noise well even when the signal-to-noise ratio is low. The smaller the entropy value, the more ordered the signal, and vice versa. The high-frequency components are more than one order of magnitude smaller than the low-frequency components. Including them would multiply the computational load of BP neural network diagnosis while having little influence on its accuracy. Therefore, $E_{30}$, $E_{31}$, $E_{32}$, and $E_{33}$ are taken as the input of the BP neural network.
The current and voltage signals of the other phases are processed in the same way, giving 16-dimensional wavelet energy entropies. The multisource fault information is fused in the interval-cross mode to form the feature vector for fault diagnosis. The fusion mode is shown in Figure 9.
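Figure 9 defines the actual fusion pattern. As a hypothetical reading of "interval crossing," the sketch below alternates current and voltage entropies element by element (i1, u1, i2, u2, ...). The function name and the example values are illustrative only; they are not data from Table 1.

```python
def interval_cross(current_feats, voltage_feats):
    """Interleave two equal-length feature lists: i1, u1, i2, u2, ..."""
    if len(current_feats) != len(voltage_feats):
        raise ValueError("current and voltage feature lists must have equal length")
    fused = []
    for i_feat, u_feat in zip(current_feats, voltage_feats):
        fused.extend((i_feat, u_feat))
    return fused

# Illustrative placeholder entropies (not values from Table 1).
current_e = [0.82, 0.41, 0.37, 0.29]  # e.g. E_30..E_33 of a phase current
voltage_e = [0.75, 0.12, 0.10, 0.08]  # e.g. E_30..E_33 of a phase voltage
print(interval_cross(current_e, voltage_e))
```

Interleaving keeps each current feature next to the voltage feature from the same frequency band, so a fixed input position always maps to the same band and signal type.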
Some multisource feature eigenvectors are shown in Table 1.
Some of the results, compared with the normal-state energy entropy, are shown in Figure 10. When phase A has a ground fault, the energy entropies $E_{31}$, $E_{32}$, and $E_{33}$ of phase A clearly increase, while those of phases B and C remain almost unchanged. The energy entropies $E_{31}$, $E_{32}$, and $E_{33}$ of phases A and B increase significantly when a two-phase short circuit or two-phase grounding short circuit occurs.

Training Results of BP Neural Network.
The three-phase current and voltage eigenvalues constitute a multisource fault eigenvector, which is taken as the input vector of the BP neural network. The fault signal of the microgrid is decomposed and reconstructed with a wavelet, and the number of input neurons is set to 16. The states of the three phases and the neutral line are taken as the output vector, so the number of output neurons is set to 4. An output value of 1 indicates that the corresponding line is faulted or that the fault phase is grounded, and 0 indicates no fault on the corresponding line. The number of hidden layer neurons affects the training results. With too few nodes, the training accuracy is poor. With too many nodes, training takes longer and overfitting occurs easily. Based on experiments and the empirical formula $L = \sqrt{J + K} + \alpha$, where α is an adjustment constant, this paper chooses 10 hidden layer nodes. The BP network stops training when the training error meets the given requirement. The training curve is shown in Figure 11, and the fitting degree curve of neural network training is shown in Figure 12. The fitting degree of the network is high: it exceeds 0.9 for training and testing and reaches 0.88 for validation. Thus, the BP neural network can accurately diagnose and identify faults.
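The empirical formula and the four-output coding can be checked with a short sketch. It assumes the commonly quoted range 1 ≤ α ≤ 10 for the constant α and an output order of (A, B, C, ground). The fault-code table lists only a few hypothetical examples consistent with the description above; it is not this paper's full coding table. The function names are also hypothetical.

```python
import math

def hidden_node_range(J, K, alpha_min=1, alpha_max=10):
    """Integer range of L = sqrt(J + K) + alpha for alpha in [alpha_min, alpha_max]."""
    base = math.sqrt(J + K)
    return math.ceil(base + alpha_min), math.floor(base + alpha_max)

# Output bits ordered as (phase A, phase B, phase C, ground); 1 = involved in the fault.
FAULT_CODES = {
    (0, 0, 0, 0): "no fault",
    (1, 0, 0, 1): "A-phase-to-ground",
    (1, 1, 0, 0): "A-B phase-to-phase",
    (1, 1, 0, 1): "A-B-to-ground",
}

def decode(outputs, threshold=0.5):
    """Threshold the four network outputs and look up the fault label."""
    bits = tuple(1 if v >= threshold else 0 for v in outputs)
    return FAULT_CODES.get(bits, "unrecognized pattern")

print(hidden_node_range(16, 4))          # (6, 14): the chosen L = 10 lies in this range
print(decode([0.97, 0.03, 0.05, 0.91]))  # A-phase-to-ground
```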

Hardware in the Loop Simulation Test.
Actual microgrid systems are built outdoors, where natural conditions are harsh. Their structure is complex, including a large number of power electronic load units, and their voltage levels are high and dangerous. A traditional electrical laboratory, on the other hand, involves high construction costs and demanding site requirements. In such a laboratory, special conditions such as faults are difficult to simulate and study because of safety and equipment maintenance concerns. Therefore, hardware-in-the-loop simulation is applied in the experimental environment, based on the laboratory's existing equipment and conditions. The hardware-in-the-loop simulation platform consists of PXI hardware together with the StarSim HIL and StarSim RCP software developed by Yuankuan Energy. It is used to realize microgrid fault identification and diagnosis. The hardware-in-the-loop platform is shown in Figure 13, the system framework in Figure 14, and the microgrid interface in Figure 15.

Fault Diagnosis and Identification.
The multisource feature vectors of the microgrid test samples are fed into the trained BP neural network to diagnose and identify the faults; the results are shown in Table 2. The BP neural network method based on big data analysis extracts and abstracts the features of fault data and strengthens the spatiotemporal correlation of heterogeneous data to predict the fault type and phase. It obtains accurate information by constructing many implicit models and performing extensive data analysis and training. Table 2 shows that the trained BP neural network model identifies the fault type and fault phase accurately and effectively. The error between the actual and expected outputs meets the requirements of fault diagnosis. The test results show that 2 of the 50 test samples are misdiagnosed, giving an accuracy of 96%, which meets the requirements for intelligent fault diagnosis of microgrid lines.
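The reported test accuracy follows directly from the count of misdiagnosed samples. The sketch below uses hypothetical label lists with two mismatches among 50 test samples to reproduce 48/50 = 96%. The function name and the labels are illustrative only.

```python
def accuracy(predicted, expected):
    """Fraction of samples whose predicted fault label matches the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("label lists must have equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

expected = ["AG"] * 50                   # hypothetical ground-truth labels
predicted = ["AG"] * 48 + ["AB", "ABG"]  # two misdiagnosed samples
print(f"{accuracy(predicted, expected):.0%}")  # 96%
```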

Conclusions
Microgrid fault information data are transmitted in real time, updated quickly, and large in scale. To address these characteristics, this paper proposes a big-data-supported microgrid fault diagnosis and analysis technology.
This technology combines Rayleigh entropy, wavelet packet decomposition, and a BP neural network to extract the fault feature vector of the microgrid. The big-data-based BP neural network method strengthens the spatiotemporal correlation of heterogeneous data to predict the fault type and phase. It obtains accurate results by constructing many implicit models and training on large amounts of data. The experimental results show that the proposed big-data-supported fault diagnosis and analysis technology achieves an accuracy of 96%, which meets the needs of practical engineering applications. However, because microgrids have complex topologies and many fault types, only five line fault types are considered in this paper. Therefore, in future work, the proposed method needs to be applied to other types of fault diagnosis to verify its universality.

Data Availability
The datasets used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.