A Photovoltaic Array Fault Diagnosis Method Considering the Photovoltaic Output Deviation Characteristics

Large-scale PV power plants and regional distributed PV power plants contain a large number of photovoltaic (PV) arrays, and the output of different arrays fluctuates with the external conditions. The deviation and evolution information of the array output is easily masked by the random fluctuations of the PV output, which makes fault diagnosis of PV arrays difficult. In this paper, a fault diagnosis method based on the deviation characteristics of the PV array output is proposed. Based on the DC (direct current) side current of the PV array, the deviation characteristics of the PV array output across different arrays and time series are analyzed. Then, a deviation function is constructed to evaluate the output deviation of the PV array. Finally, fault diagnosis of the PV array is realized using a probabilistic neural network (PNN), and the effectiveness of the proposed method is verified. The main contributions of this paper are a deviation function that extracts the fault characteristics of a PV array and a fault diagnosis method that uses only the array current and can therefore be applied easily in a PV plant.


Introduction
In recent years, the PV industry has developed rapidly as the cost of PV modules has fallen sharply. The installed capacity of PV power plants is increasing rapidly [1]; by the end of 2018, the cumulative installed PV capacity of China had reached 174.63 GW, with an additional installed capacity of 44.1 GW. A PV power plant has a large number of modules that work in the natural environment, so module and array failures occur frequently [2]. A module failure can degrade the operating efficiency of the PV array and even seriously endanger the safe operation of the PV plant [3]. Therefore, real-time monitoring of the operating status and timely detection of PV array faults are very important for the effective operation of the plant.
There are two main types of fault diagnosis strategies for the DC side of a PV plant [4,5]. The first type relies on test equipment for PV modules or arrays. In references [6,7], infrared cameras are used to detect the temperature differences among modules and then identify the faulty modules. Madeti et al. [8] diagnosed faults directly by placing sensors on the PV array. Yihua et al. [9] collected voltage data by installing voltage sensors in the arrays and then used these data to realize PV fault diagnosis. Livera et al. [10] summarized the disadvantages of infrared-based fault diagnosis methods and the advantages of methods based on PV electrical parameters. Equipment-based methods require a large amount of test equipment, which greatly increases the cost of diagnosis, so they are difficult to apply in actual PV power plants. More and more scholars are therefore trying to use operational data to develop fault diagnosis methods for PV plants.
The second type of method is based on the operational data of the PV array and can be divided into three categories. The first category is based on a reference model. Chine et al. [11] used an ANN (artificial neural network) to build the reference model of the PV module. Fouzi et al. [12] and Yang et al. [13] developed the reference model of the PV module from historical data and used the deviation between the actual and theoretical output for fault diagnosis. Chaibi et al. [14] used an artificial colony optimization algorithm to build the PV model and diagnosed faults from the deviation between the measured and reference values. Using the PV model, Fu et al. [15] and Liu et al. [16] introduced the array current dispersion rate of a combiner box as an indicator. Such methods can effectively determine the fault type through deviation analysis, but because of the complex modelling process, the performance differences between modules, and the nonlinear distortion of the PV module output parameters caused by the aging of the PV power plant, it is difficult for the model accuracy to meet the requirements of fault diagnosis.
The second category is based on the statistical analysis of running data from the PV plant. Mahmoud et al. [17] introduced PV output indicators and obtained the thresholds of the indicators with a statistical t-test; the thresholds are then used for fault diagnosis. Based on the statistical analysis of measured and reference values, Majdi et al. [18] proposed a multiscale weighted generalized likelihood ratio test chart for PV fault diagnosis. Garoudjaa et al. [19] combined the residual error between the actual and reference values with the exponentially weighted moving average control chart for fault diagnosis. In reference [20], based on the variation between measured and estimated power, a statistical approach was introduced to set thresholds that can be used for locating defects in the PV system. This kind of fault diagnosis method requires prior knowledge of the distribution characteristics of the analyzed objects, and such prior knowledge is difficult to obtain in advance.
The third category consists of intelligent classification-based methods. Chen et al. [21] used principal component analysis and a support vector machine to classify the faults in PV systems. Some scholars used the extreme learning machine [22] and the fuzzy clustering method [23] to classify the obtained data and then identified the various faults of the PV array. Chen et al. [24] used the random forest ensemble learning algorithm for fault detection of PV arrays. In references [25,26], a new deep residual network model trained by the adaptive moment estimation deep learning algorithm is built for fault diagnosis of PV arrays. Intelligent classification methods avoid the complex process of modelling, and the classification process is easy to implement, but these methods require a large amount of fault sample data to train the model. Akram and Lotfifard [27] selected the PNN algorithm for PV fault diagnosis after comparing various fault diagnosis methods for PV systems. The PNN algorithm has good nonlinear learning ability and is suitable for training with small sample sizes, which are the main reasons for choosing the PNN to classify the samples in this paper.
The above analysis shows that methods using the operational data of the PV array are the most promising fault diagnosis methods. However, the operational data of the PV array change with the external environment [28,29], and the output characteristics of the array are easily masked by the large amount of data. Previous work by the authors studied the spatial-temporal distribution characteristics of PV arrays under different faults [30] and the statistical characteristics of the PV array output under different conditions [31]. Based on that research, designing a classification method for PV fault characteristics is the key to improving the quality of PV fault diagnosis. This paper focuses on the fault feature extraction of the PV array output and combines the fault feature extraction method with the PNN classification algorithm for fault diagnosis. In a PV power plant, the PV arrays are connected to the combiner box in parallel, so the voltage of all arrays in one combiner box is the same and is difficult to use for fault diagnosis of an individual array. The main contribution of the paper is a new fault diagnosis method for PV arrays built by analyzing the deviation characteristics of different arrays.
In this work, the distribution characteristics of the PV array output deviation are studied, and a deviation function that can effectively extract the deviation information of the PV array current is constructed. The rest of the paper is organized as follows. Section 2 studies the deviation characteristics of the output currents of PV arrays in a PV power plant. In Section 3, the deviation function is established to describe the output deviation of the PV array. Section 4 proposes a fault diagnosis method for PV arrays. In Section 5, the experimental verification of the proposed method is carried out. Finally, Section 6 summarizes the main innovations of this work.

Deviation Characteristics of PV Arrays
2.1. Output Characteristics of PV Arrays. This paper uses a large-scale PV power plant in China as the object of analysis. The plant consists of 553 intelligent PV combiner boxes and 74 inverters. It has approximately 130,000 PV modules and more than 8000 arrays; each array consists of 16 modules. Sixteen arrays are connected in parallel in each combiner box, and 7 combiner boxes are connected to one inverter. The analyzed data in this paper are all from this plant, and the time resolution of the data is 10 minutes. For actual PV power plants, the array current is the main available data for fault diagnosis. Therefore, this paper takes the array current as the analyzed variable.
To analyze the output characteristics of different arrays, five arrays are selected: arrays 1-3 are connected in parallel in the same combiner box, and the other two arrays are connected to different combiner boxes. Figure 1 shows the current distributions of the five arrays over 7 consecutive days. The output of each array is similar under normal operation, and it fluctuates strongly as the weather changes. Large-scale PV power plants have a large number of arrays, and the data collected are very complicated. Therefore, to diagnose faults of PV arrays, the fault characteristics of the PV array must be extracted under complex operating conditions.

Deviation Characteristics of PV Array Current.
To show the deviation characteristics of the PV array output directly, a reference current is introduced for comparison with the actual value. The reference current is the theoretical maximum power point current of the PV array, I_m, which is calculated by Equation (1). In Equation (1), I_m-ref is the maximum power point current of the PV modules under standard test conditions and α is the compensation coefficient. The PV module temperature T is calculated from the ambient temperature [32], T_ref = 25°C is the reference temperature, and G is the measured solar radiation intensity, which is sampled by the solar pyranometer installed in the power plant.

The currents are taken from nine arrays in the power plant. Figure 2 shows the distribution of the difference between the array current I_T,S and the reference current I_m over 9 consecutive days, from July 2nd to July 10th. The sampling interval is ten minutes, and 138 data points were collected from 0 o'clock to 23 o'clock every day. The current difference is obtained by subtracting the reference current calculated by Equation (1) from the measured current. Figure 2(a) compares different arrays.

The analysis of Figure 2 shows that the outputs of different arrays differ, but the deviations are small. The array output varies significantly between different days, and the output of PV arrays shows strong volatility. Therefore, the deviation data of the array output are difficult to use directly for PV fault diagnosis. Extracting the fault characteristics from the output deviations across different arrays and time series can be an effective way to diagnose faults. This paper focuses on the extraction of the deviation characteristics of the PV output and its application in array fault diagnosis.

As shown in Figure 3, the three-dimensional (3D) data are composed of multiple sets of two-dimensional (2D) data. The 2D data are essentially the cross-sectional data of different time series. The 3D data sets can show the differences between different samples and the changes in the sample evolution process. The output deviation between different arrays and different time series can be used to construct 3D data that show the output deviation characteristics of PV arrays. Suppose there are N data samples, x_1, x_2, ..., x_N, and each data sample consists of a time series deviation component (TSD) and a cross-sectional deviation component (CSD). For the array current, TSD describes the deviation between different time series and CSD describes the deviation between different arrays. Therefore, the two components can be used to describe the PV output deviation.
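The comparison between the measured array current and the reference current described in this section can be sketched as follows. This is a minimal illustration, not the paper's Equation (1): the linear irradiance and temperature correction `I_m = I_m_ref * (G / G_REF) * (1 + alpha * (T - T_REF))`, the reference irradiance `G_REF = 1000 W/m^2`, the function names, and all numerical values are assumptions made for the example.

```python
import numpy as np

G_REF = 1000.0  # W/m^2, standard-test-condition irradiance (assumed reference)
T_REF = 25.0    # deg C, reference temperature stated in the text


def reference_mpp_current(g, t_module, i_m_ref, alpha):
    """Assumed linear model of the maximum power point current.

    I_m = I_m_ref * (G / G_REF) * (1 + alpha * (T - T_REF))
    """
    g = np.asarray(g, dtype=float)
    t_module = np.asarray(t_module, dtype=float)
    return i_m_ref * (g / G_REF) * (1.0 + alpha * (t_module - T_REF))


def current_difference(i_measured, g, t_module, i_m_ref, alpha):
    """Measured array current minus the reference current, sample by sample."""
    i_measured = np.asarray(i_measured, dtype=float)
    return i_measured - reference_mpp_current(g, t_module, i_m_ref, alpha)


# Illustrative values only (not plant data).
g = np.array([1000.0, 500.0, 200.0])      # irradiance, W/m^2
t_module = np.array([25.0, 35.0, 20.0])   # module temperature, deg C
i_meas = np.array([8.0, 4.1, 1.5])        # measured array current, A
diff = current_difference(i_meas, g, t_module, i_m_ref=8.0, alpha=0.0005)
```

The module temperature is passed in directly because the conversion from ambient temperature used in the paper is given only by reference [32].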
The k-th data sample, x_k, is represented by its TSD and CSD components.

Through the analysis of Figure 4, the following conclusions can be drawn. Under normal operating conditions, the two indicators fluctuate within a certain range; TSD effectively reflects the output deviation between different time series, and CSD effectively reflects the output deviation between different arrays. Therefore, these two indicators can be used for fault diagnosis of PV arrays.

The Probabilistic Neural Network (PNN).
The probabilistic neural network is a feedforward neural network developed from the radial basis function network. Building on the radial basis function neural network, the PNN integrates density function estimation and Bayesian decision theory, and it is suitable for pattern classification [33]. Moreover, the PNN has the advantages of a simple learning process, fast learning speed, accurate classification, and high tolerance to errors and noise. Using the strong nonlinear classification ability of the PNN model, the failure modes in the sample space are mapped into the fault space, which yields a fault diagnosis network with strong fault tolerance and adaptive ability and improves the accuracy of diagnosis [34]. Because the PNN is based on kernel estimation of the probability density function, each training sample determines one neuron of the mode layer, and the neuron weights are taken directly from the input sample values. The PNN is also easy to extend: the learning process of the network is simple, and increasing or decreasing the number of pattern classes does not require a long training time [35]. The PNN has been widely used in the field of fault diagnosis [33-36], and it is suitable for fault diagnosis of PV arrays [27]. The PNN has been chosen over other algorithms for fault detection and classification for the following reasons.
(1) The output of PV systems depends on environmental conditions. The PNN training system can develop its own decision boundaries based on the sampled data.
(2) Simple classifiers such as fuzzy C-means clustering and K-means clustering are likely to become stuck in a local optimum rather than reaching the global optimum. The intelligent PNN method, which uses a heuristic method, is able to reach the global optimum more efficiently.
(3) The output power of the PV array may vary drastically when there is momentary shading due to clouds, rain, etc. Unlike other simple classifiers and multilayer perceptron neural networks, the PNN has the advantage of being relatively insensitive to these outliers [27].
Therefore, this paper uses the PNN as the tool for classification.

The PNN is generally divided into four layers: the input layer, mode layer, summation layer, and output layer.
(1) The input layer is responsible for receiving the feature vectors and transferring the data to the hidden layer. The number of neurons in this layer is equal to the length of the input vector.
(2) The mode layer connects with the input layer to calculate the matching degree between the input feature vector and each mode in the training set. The number of neurons in the mode layer is equal to the number of input sample vectors.
(3) The summation layer obtains the estimated probability density function of each failure mode by accumulating the probabilities belonging to the same class. The number of neurons in this layer is equal to the number of sample categories.
(4) The output layer selects the neuron with the maximum probability density from the estimated probability densities of the fault modes as the output of the whole system.
The input and mode layers are connected by the Gaussian function (Equation (4)), which sets the matching degree between each neuron in the mode layer and each neuron in the input layer. By summing the matching degrees of each class and taking the average, the category of the input sample can be obtained.
In Equation (4), y_g(x, σ) is the classification result of input vector x under the smoothing parameter σ; l_g is the number of samples in class g; m is the sample dimension; σ is the smoothing parameter, which is generally between 0 and 1; and x_i,j is the j-th component of the i-th neuron in class g. Suppose there is a recognition task with two types of samples, a variable number of samples for each type, and a 3D feature for each sample. Then, the network structure can be drawn as in Figure 5.

Monitoring equipment of the PV power plant:
Model number PYQX-02: collects ambient temperature T, horizontal irradiance G, and other meteorological parameters of the PV power plant.
Data collection system: collects the current of the PV array, the voltage of the combiner box, and the data collected by the weather station; the sampling interval is 10 minutes.
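The four-layer PNN described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the class name `PNN`, its methods, and the toy data are hypothetical, and the kernel is the textbook Gaussian (Parzen window) form. The normalizing constant of the Gaussian is omitted because, with a single shared σ, it is identical for every class and does not change the selected class.

```python
import numpy as np


class PNN:
    """Minimal probabilistic neural network with a Gaussian kernel (sketch)."""

    def __init__(self, sigma=0.5):
        self.sigma = float(sigma)  # smoothing parameter, illustrative default

    def fit(self, X, y):
        # "Training" only stores the samples: one mode-layer neuron per sample.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        self.classes_ = np.unique(self.y_)
        return self

    def class_scores(self, X):
        # Input layer: accept one feature vector or a batch of them.
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # Mode layer: Gaussian matching degree between each input and each
        # stored training sample.
        d2 = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        k = np.exp(-d2 / (2.0 * self.sigma ** 2))
        # Summation layer: average matching degree over the samples of a class.
        return np.stack(
            [k[:, self.y_ == c].mean(axis=1) for c in self.classes_], axis=1
        )

    def predict(self, X):
        # Output layer: choose the class with the largest estimated density.
        return self.classes_[np.argmax(self.class_scores(X), axis=1)]


# Toy task: two classes, each sample with a 3D feature (values are made up).
X_train = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.1],
                    [1.0, 1.0, 1.0], [0.9, 1.0, 1.1]])
y_train = np.array([0, 0, 1, 1])
model = PNN(sigma=0.5).fit(X_train, y_train)
X_new = np.array([[0.05, 0.0, 0.0], [1.0, 1.0, 0.9]])
pred = model.predict(X_new)
```

Because fitting only stores the training samples, adding or removing a class amounts to adding or removing rows, which is the property that makes the PNN attractive for small fault data sets.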

Fault Diagnosis Method.
The operation state of each array is basically the same under normal operation. When a fault occurs, the output deviation distribution of the fault array will be different from that of the normal array. Figure 6 presents the flow chart of the fault detection method proposed in this paper.
(1) Data Preprocessing. The historical and real-time data of the PV power plant are preprocessed. Night-time data are removed, and only data with irradiance greater than 0 W/m² are used.
(2) Calculation of the Deviation Components. The reference current of TSD is calculated from the historical irradiance and array current data; its value under different irradiances is calculated according to Equation (2). The reference current of CSD is calculated from the real-time currents of the arrays in the same combiner box. After the reference currents are obtained, the deviation components of each array are calculated with the deviation function from the difference between the actual array current and the reference current.

As shown in Figure 7, the distributions of the CSD and TSD of the PV array are obviously different under different fault conditions, which indicates that fault diagnosis of PV arrays based on the deviation characteristics is feasible.
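The preprocessing and deviation-component steps above can be sketched as follows. This is an illustration under explicit assumptions and does not reproduce the paper's deviation function or its Equation (2): the TSD reference is assumed to be the median historical current within an irradiance bin, the CSD reference is assumed to be the median current of the arrays in the same combiner box at the same instant, both components are assumed to be relative deviations, and all function names, the bin width, and the data are hypothetical.

```python
import numpy as np


def preprocess(irradiance, currents):
    """Keep only samples with irradiance above 0 W/m^2 (drops night-time data)."""
    irradiance = np.asarray(irradiance, dtype=float)
    currents = np.asarray(currents, dtype=float)  # shape (n_samples, n_arrays)
    mask = irradiance > 0.0
    return irradiance[mask], currents[mask]


def tsd_reference(hist_irradiance, hist_current, irradiance, bin_width=50.0):
    """Reference current per irradiance bin: historical median (assumption)."""
    hist_irradiance = np.asarray(hist_irradiance, dtype=float)
    hist_current = np.asarray(hist_current, dtype=float)
    hist_bins = np.floor(hist_irradiance / bin_width).astype(int)
    bins = np.floor(np.asarray(irradiance, dtype=float) / bin_width).astype(int)
    ref = np.full(len(bins), np.nan)  # stays NaN where no history exists
    for b in np.unique(bins):
        sel = hist_bins == b
        if sel.any():
            ref[bins == b] = np.median(hist_current[sel])
    return ref


def deviation_components(irradiance, currents, hist_irradiance, hist_current):
    """Return (TSD, CSD), each of shape (n_daytime_samples, n_arrays)."""
    irradiance, currents = preprocess(irradiance, currents)
    ref_t = tsd_reference(hist_irradiance, hist_current, irradiance)[:, None]
    # CSD reference: arrays of one combiner box at the same time step.
    ref_c = np.median(currents, axis=1, keepdims=True)
    tsd = (currents - ref_t) / ref_t  # assumes nonzero daytime references
    csd = (currents - ref_c) / ref_c
    return tsd, csd


# Made-up data: three arrays in one combiner box, three time steps.
hist_g = [100.0, 120.0, 510.0, 520.0]
hist_i = [0.8, 0.8, 4.0, 4.2]
g = [0.0, 110.0, 515.0]
i_arrays = [[0.0, 0.0, 0.0], [0.8, 0.8, 0.4], [4.1, 4.1, 4.1]]
tsd, csd = deviation_components(g, i_arrays, hist_g, hist_i)
```

In this toy case the third array delivers half the expected current at the second time step, so both of its components equal -0.5 there, while the healthy arrays stay at zero.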

5.2. Verification. The PNN model is trained using data collected every 10 minutes over 16 days. The 16 days were divided into four groups, with each group containing 4 days with different operating conditions of the PV array. The radial basis function distribution density of the PNN is set to 0.5. The statistical analysis showed that the training accuracy of this model reached 0.9921. Because the training accuracy of the PNN algorithm is high, the PNN model can effectively classify faults through CSD and TSD.
The performance of the proposed method is analyzed using 4 days of experimental data. The data were collected from 7 am to 6 pm every 10 minutes, and the fault settings are shown in Table 2. Figure 8 shows the fault diagnosis results for these 4 days. As shown in Table 3, the accuracy of the fault diagnosis is over 97%. Therefore, the proposed method can detect different faults effectively.
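An accuracy figure of this kind can be computed from the true and predicted labels as sketched below. The function name and the class labels (`normal`, `fault_a`, `fault_b`) are hypothetical placeholders, and the label sequences are made up; they are not the fault settings of Table 2 or the results of Table 3.

```python
import numpy as np


def diagnosis_accuracy(y_true, y_pred):
    """Overall accuracy and per-class accuracy of predicted fault labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    overall = float(np.mean(y_true == y_pred))
    per_class = {
        # Fraction of samples of each true class that were labelled correctly.
        str(label): float(np.mean(y_pred[y_true == label] == label))
        for label in np.unique(y_true)
    }
    return overall, per_class


# Made-up labels for five samples.
y_true = ["normal", "normal", "fault_a", "fault_b", "fault_b"]
y_pred = ["normal", "fault_a", "fault_a", "fault_b", "fault_b"]
overall, per_class = diagnosis_accuracy(y_true, y_pred)
```

Reporting the per-class values alongside the overall figure shows whether a high overall accuracy hides a poorly recognized fault type.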

Comparison.
To demonstrate the superiority of the PNN algorithm, an ANN (artificial neural network) and a GRNN (generalized regression neural network) are selected for a comparison of training speed and training accuracy. It can be seen from Figure 9 and Table 4 that the training accuracy of all three methods is above 95%. The training results of the ANN are the worst, while the training accuracy of the PNN and the GRNN is above 99%. Compared with the PNN, the GRNN has a significantly longer training time, so the PNN shows the best performance.

Conclusion
This paper studies the deviation characteristics of the PV array output and quantifies the deviation with the proposed deviation function. The quantitative analysis of the PV output deviation shows that the deviation can be used effectively to identify PV array faults. The PV array fault diagnosis method combines the PNN algorithm with the deviation function. The main conclusions are as follows:
(1) In a large-scale PV power plant, because of the parallel structure of the PV arrays, the data actually available are the PV array currents. Therefore, based on the monitoring status of the PV power plant, this paper proposes an effective method for fault diagnosis of PV arrays.
(2) The deviation function quantifies the PV output deviation, effectively describes the output deviation of the PV array between different time series and different arrays, and extracts the deviation characteristics of the PV output under different operating conditions.
(3) The fault diagnosis of the PV array is carried out by combining the PNN algorithm with the PV array output deviation function. The proposed method is simple and effective and is applicable to fault diagnosis of PV power plants.
(4) The configuration and structure of different power plants differ, and so do the output characteristics of their PV arrays. The proposed method therefore needs to be optimized before it is applied to a different PV power plant.

Data Availability
The research data (in Excel format) used to support the findings of this study are available from the corresponding author upon request.