Multivariate Statistics and Supervised Learning for Predictive Detection of Unintentional Islanding in Grid-Tied Solar PV Systems

Integration of solar photovoltaic (PV) generation with power distribution networks leads to many operational challenges and complexities. Unintentional islanding is one of them and is of rising concern given the steady increase in grid-connected PV power. This paper builds on an exploratory study of unintentional islanding on a modeled radial feeder having large PV penetration. Dynamic simulations, also run in real time, were used to explore unique potential causes of the creation of accidental islands. The resulting voltage and current data underwent dimensionality reduction using principal component analysis (PCA), which formed the basis for the application of Q statistic control charts for detecting the anomalous currents that could island the system. To reduce the false alarm rate of anomaly detection, Kullback-Leibler (K-L) divergence was applied to the principal component projections, which showed that the Q statistic based approach alone is not reliable for detecting the symptoms liable to cause unintentional islanding. The obtained data was labeled and a K-nearest neighbor (K-NN) binomial classifier was then trained to distinguish potential islanding precursors from other power system transients. The three-phase short-circuit fault case was successfully identified as statistically different from islanding symptoms.


Introduction
The distribution segment of the electricity supply network is always under stress given variable consumption patterns spread over a complex geography. Since the power grid was not designed for the inclusion of sources in its distribution pathway, the integration of distributed energy resources is a technically complex issue. When the sources are renewable-energy based, their stochastic nature and the variability in availability of generation induce extra randomness in the system operation. The distributed generators (DGs) challenge the conventional functioning of the distribution feeders and lead to operational issues affecting the power quality, stability, and protection aspects [1].
Since most of the dispersed solar PV integration is taking place on the distribution side [2], especially in the form of rooftop systems, the vulnerability of such a system to supply interruptions and shutdowns becomes an important concern. Interconnecting a solar PV system thus becomes challenging; in particular, the continued feeding of loads in the vicinity of the PV when the mains suddenly go off is a situation that must be avoided.
Most of the DGs, including PV inverters, operate in constant PQ control mode, that is, constant power (active and reactive) control mode. This means they are commanded to give output power in synchronism with the grid based on the provided power set points. PV is generally operated at unity power factor [3] because this maximizes the energy yield from the array through maximum power point tracking (MPPT). The inverter thus cannot adjust its active or reactive power output to regulate the grid frequency and voltage.
Islanding is said to occur when the DG continues providing power (with utility level voltage and frequency) to a segment containing certain loads even after that portion of the network, including the point of common coupling (PCC), gets disconnected from the main power system. This is clear from the schematic diagram given in Figure 1. Based on the reasons for disconnection, islanding can be categorized as intentional or unintentional. When the distribution system operator knowingly separates certain sections from the main grid with the intention of securing critical loads, the practice is called intentional islanding. The move is preplanned and is generally required in situations of network congestion or a large power system blackout. On the other hand, accidental or unplanned disconnection of a loads-DG portion from the mains, with the DG continuing in grid-connected mode of operation and maintaining grid level voltage and frequency, is known as unintentional islanding. The creation and maintenance of an unplanned island are harmful to system health. Basically, such an islanded network operates as an independent autonomous entity without regulation from the grid. Since the DG continues to operate in constant PQ control mode, it cannot adjust its supply to maintain voltage and frequency according to the loads, which leads to poor power quality. Sudden resumption of the grid due to the action of automatic reclosers can cause circulating currents to flow if the utility and the island are out of phase. Loss of effective grounding in the islanded portion and transient overvoltages are other issues. There is also a constant safety threat to utility repair personnel due to a live portion existing on a dead power network.
Although internationally documented experiences of unintentional islanding events are limited, some real events noted in [5,6] indicate the potential threat expected to intensify in the scenario of rising DG penetration. A recently published survey in [7] reaffirms the same concern among distribution utilities that feel that unintentional islanding will impact their network the most after DG interconnection.
Prevalent anti-islanding methods include classical methods involving passive, active, and hybrid techniques. These local techniques either monitor the parameters around the DG and sense any threshold-exceeding changes in them to detect islanding (passive) or force the parameters out of the safe range (active) or combine both strategies (hybrid). The problems of threshold selection causing large nondetection zones (NDZs) in passive techniques and power quality disturbance from active techniques impact their use in high DG penetration. Many computational intelligence (CI) based techniques have also been reported in the literature [8], but they are fundamentally based on classical techniques and seem to reinforce the reactive strategy of detecting the island formation and then disconnecting the DG. This practice is not expected to remain in the future smart grid, which will accommodate a large share of renewable-energy based DG power that cannot be wasted even for a few cycles. Hence, a predictive approach to islanding detection can prove to be a robust solution.
Early works like [9] contemplated predicting PV inverter unintentional islanding in distribution grids considering the predictable utility supply and the load and PV generation profiles and utilizing analytical modeling of the early self-commutated inverters [10,11]. More recently, data mining techniques were used on real [12] and simulated [13] phasor measurement unit (PMU) data to predict islanding in bulk transmission networks. Other works predicted the parameters at which a possible island could form. Traditional concepts of limit cycle behavior and small signal stability and describing function methods [14] and analysis of various inverter control functions [15] were used.
This paper describes the application of anomaly detection techniques, multivariate statistical and supervised learning based, for predictive detection of an unintentional islanding event. The discovered anomalous currents occur before the islanding event on a modeled distribution feeder with a solar PV interconnection. This is a highlight of the paper, which begins with a description of the system model in Section 2. Section 3 describes the anomalous precursors obtained from the dynamic simulations as part of the exploratory study. Section 4 describes data reduction using PCA and the application of Q statistics for distinguishing the islanding precursors from other signals. An improved detection accuracy obtained using K-L divergence is detailed in Section 5. Section 6 describes labeling of the data points for training a K-NN classifier and reports its performance for the same test data sets as used in the previous two sections. Section 7 concludes the paper.

Power System Model
The distribution network modeled in this study is based on the benchmark IEEE 13 node test radial feeder [16] and was modeled in MATLAB-Simulink. Some modifications were made to the original feeder model in order to carry out the islanding studies as required. First of all, the substation automatic voltage regulator (AVR) at node 650 was not included in the model. This was done so as to avoid any possible interaction of PV with the tap-changing controls of the AVR [17] that could mask any signature related to islanding on the system. A 100.7 kWp solar PV array was integrated at node 692 through a three-phase inverter, thus making this node the PCC. These changes are visible in the modified feeder shown in Figure 2.
Apart from the changes mentioned previously, the constant current load at node 675 was removed and the active and reactive power demands P and Q, respectively, of the constant current load at 692 were modified so as to make section 671-692 the islandable feeder section. The PV inverter operates at unity power factor and thus the loads on lateral 671-675 had to be scaled according to the fixed PV penetration and feeder capacitor bank size to attain the P-Q balance required for the exploratory islanding study described in the next section.
The details of the system component modeling are given in our earlier work [18], from which the same model is taken here. This section presents only the results that describe the system model functioning and verification.
To verify that the system performs according to theory, it was required to test the voltage and frequency at any point on section 671-692 when it is islanded with the PV inverter. The voltage and frequency in an unintentional island depend on the mismatch of P and Q between the loads and the source(s) in that network. Accordingly, an island was forced to form by opening the islanding switch (circuit breaker in Figure 2) from t = 0.45 s to t = 0.48 s. The solar irradiance in this case and for all cases discussed in this paper was kept fixed at 1000 W/m² in order to operate the PV array at standard test conditions (STC) for a fixed, rated output. The MPPT is switched on at t = 0.40 s from the start of the simulation and reaches the MPP at 0.42 s after a few transients.
The situation of P-Q mismatch between the island loads and sources was created with the following values: $P_{load}$ = 90 kW, $Q_{load}$ = 151 kVAr, $P_{PV}$ = 100 kW (effective three-phase AC inverter output), and $Q_{PV}$ = 0. Q was supplied by the feeder capacitor bank and the inverter's filter circuit, both lying in the islanded network; thus, $Q_{supply} = Q_{cap.bank} + Q_{inv.filter\,ckt.}$ = 600 kVAr + 10 kVAr = 610 kVAr. The island load values are for a single-phase load while the supply values of P and Q come from three-phase sources. The voltage magnitude and frequency in the resulting island, measured at the PCC, are plotted along with many other quantities in Figure 3. The PV array continues operating in the preprogrammed constant P-Q control mode and hence the undervoltage and underfrequency are evident in the figure. The low value of voltage magnitude and frequency is consistent with the given amount of P-Q mismatch inside the island [19].
The harmonics observable in the displayed voltage and current are expected when a PV system is integrated. However, the current harmonics shown are those in the grid-side current and not the inverter output. The simulation runs the model using a discrete solver which samples the voltages and currents at the rate of 1 MHz. Similar kinds of wave shapes were obtained from field tests carried out on a medium voltage distribution feeder section in Spain [4]. The resulting island three-phase voltages and currents are reproduced with permission in Figure 4. Here, overvoltages are observed in the island. The time scale of observations in the field-test results is 10 ms/division. The similarities between these two figures provide some confidence in the modeling approach, although comparing simulation results with field-test results might not be completely appropriate.

Exploration of Anomalous Precursors to Unintentional Islanding
The level of P-Q mismatch between the loads and the PV inverter has a close association with island formation and sustenance, as was seen in the previous section. Many practical studies documented in [20,21] have characterized inverters' islanding behavior for different levels of P-Q mismatch. However, in the expected scenario of rising PV penetration levels on distribution feeders, complete P-Q match is not a remote possibility. Internationally accepted reports like [22] have acknowledged that P-Q balance between PV inverter and loads is a quantifiable possibility. Field tests in [5] and laboratory tests in [23] have studied the inverters' anti-islanding capabilities for complete match. Also, a case study done for India [24] has estimated the risk of unintentional islanding in a spot distribution network based on the number of hours for which such a condition occurs. These documented practices have highlighted the significance of power mismatch to islanding, but only after the occurrence of the event.
This study explores the impact of the complete P-Q match case on the possibility of an imminent islanding condition building up. Such dynamic load-PV interactions on a high PV penetration radial feeder for different grid conditions can yield interesting results related to system islanding. In an attempt to explore such patterns, two types of disturbances were programmed to occur from the substation, undervoltage and overvoltage, in concurrence with exact P-Q balance. These two types of disturbances are commonly used in practice for islanding related studies [25]. In either case, the following values were set to implement P-Q match between the 1-phase load and the PV and Q sources on the islandable feeder section: $P_{load}$ = 23.33 kW and $Q_{load}$ = 203.33 kVAr. A 10 kW resistor in parallel with this load model forces the exact P match, as the P and Q demands of the inverter filter circuit are negligible. The values of $P_{PV}$ and $Q_{supply}$ remain the same as before.
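The power balance behind the mismatch and match cases can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that the quoted single-phase load values are replicated on all three phases and that the 10 kW parallel resistor is also a per-phase element, an interpretation that is consistent with the quoted totals but is not stated explicitly in the text. The function name `island_mismatch` is hypothetical.

```python
def island_mismatch(p_load_phase, q_load_phase, p_pv, q_supply, p_extra_phase=0.0):
    """Return (delta_P, delta_Q) in kW / kVAr: three-phase supply minus three-phase demand.

    Assumption: per-phase load values are identical on all three phases.
    """
    p_demand = 3 * (p_load_phase + p_extra_phase)
    q_demand = 3 * q_load_phase
    return p_pv - p_demand, q_supply - q_demand

# Mismatch case of the previous section: 90 kW and 151 kVAr per phase.
dp_mis, dq_mis = island_mismatch(90.0, 151.0, p_pv=100.0, q_supply=610.0)

# Match case: 23.33 kW and 203.33 kVAr per phase plus the 10 kW parallel resistor.
dp_match, dq_match = island_mismatch(23.33, 203.33, p_pv=100.0, q_supply=610.0,
                                     p_extra_phase=10.0)

print(f"mismatch case: dP = {dp_mis:.2f} kW, dQ = {dq_mis:.2f} kVAr")
print(f"match case:    dP = {dp_match:.2f} kW, dQ = {dq_match:.2f} kVAr")
```

Under these assumptions the mismatch case leaves a large active power deficit in the island, while the match case balances both P and Q to within rounding of the quoted values.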
An undervoltage disturbance of 0.7 per unit (pu) of the nominal voltage amplitude was forced from the grid side for a period of 30 ms from t = 0.45 s to t = 0.48 s. The grid-side current flowing in phase C of section 671-692 has many anomalous peaks during the voltage disturbance and after the disturbance ends, as shown in Figure 5. The 30 ms window during and after the disturbance is important for this study from the perspective of data extraction. The circled peak is an anomaly whose severity to cause islanding is explained and verified in [26], which also verifies the occurrence of a similar peak for the same set of conditions on a single-phase, single bus system implemented in emulators and related hardware. An overvoltage disturbance, symmetrically 1.3 pu of the nominal voltage amplitude, was programmed to occur from the grid side for 30 ms from t = 0.45 s to 0.48 s. The resulting grid-side anomalous current shown in Figure 6 did not prove severe enough to lead to section islanding. The complete system was remodeled and simulated for 0.5 s in real time on a real-time digital simulator (RTDS) for both disturbance cases. This was done to validate the model robustness, and the real-time results, which are similar to the Simulink results, are shown in Figures 7 and 8. The data captured in the 30 ms windows are used in the different statistical and CI techniques discussed in the next three sections.
A 30 ms window after each disturbance ends is also captured during the simulation run. For the case of Figure 7 previously, this window contains data points corresponding to the anomalous current liable to cause islanding and hence is used as a part of the composite training data set used for the K-NN classifier training. More details about data sets and applications follow in the next section.

Data Handling and Anomaly Detection Using PCA
The simulation for each of the two grid disturbances was run for 0.5 seconds at a sampling rate of 1 MHz as discussed above. The voltage and current samples mentioned above were collected and divided into different data sets according to three conditions found in each of the two cases, namely, normal, during disturbance, and after disturbance. The normal condition was common to both disturbance case simulations and contained 30484 samples. The normal condition is the one in which no extra disturbances, apart from the PV induced harmonics, are present in the system. PCA was used to reduce the dimensionality of the data set corresponding to the normal system operation case, as voltage and current are correlated quantities. The standard singular value decomposition (SVD) approach was used in MATLAB to find the principal component matrix. The two-dimensional voltage-current data resulted in two principal components (PCs). Based on the variance of the projections onto the two PCs, the 1st PC was retained for all analyses. The latent matrix containing the variances of the projections (called the scores) onto the 2 PCs is shown in Table 1 and makes the choice of PC selection clear. The PCA model created was used for detecting any abnormal occurrence using a statistical process control strategy for anomaly detection. The data belonging to the different simulated cases was preprocessed and projected onto the 1st PC of the reference PCA model.
The aim of this strategy is to differentiate a condition that can cause unintentional islanding on the modeled feeder from conditions like faults and other transients which appear close to islanding and thus are tricky to detect and identify correctly. The existing literature describes techniques that detect an islanding condition among other transients like surges, load and capacitor switching, and faults after the island has been formed. This study initiates efforts towards exploring possible practical causes of the event and distinguishing such conditions from the ones that appear close enough to fool the inverter. Hence, only the four cases resulting from the two grid-side disturbances as described previously and a 3-phase short-circuit fault case have been simulated. Case 1 is the normal system operation case which has been described previously. Each of the grid-side undervoltage and overvoltage disturbance conditions gives rise to two cases. A three-phase line-to-line-to-line-to-ground (L-L-L-G) fault at the PCC is designated as case 4. All the event based cases simulated are summarized in Table 2.
For $n$ data sample vectors $x \in \mathbb{R}^m$ stacked above one another to form a data matrix $X_{n \times m}$, application of PCA on $X$ leads to an $m \times m$ coefficient matrix $P$. If $k \le m$ PCs are retained based on latent values, then $X$ can be resolved into a PCA model and a residual model as $X = X_{pca} + X_{res}$. The projection onto the PC or loading matrix leads to formation of a score matrix $T_{n \times k}$. The original data matrix $X$ can be reconstructed using score matrices $T_{pca}$ and $T_{res}$ and loadings $P_{pca}$ and $P_{res}$ as $X = T_{pca} P_{pca}^{T} + T_{res} P_{res}^{T}$, where $T_{res}$ and $P_{res}$ have $m - k$ columns, that is, dimensions $n \times (m - k)$ and $m \times (m - k)$, respectively [27].
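The decomposition above can be illustrated with a short NumPy sketch. The paper reports using SVD in MATLAB; the Python code below is only an equivalent illustration on synthetic, correlated two-column data standing in for the voltage and current samples, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for correlated voltage/current samples (n x m with m = 2).
n = 500
v = rng.normal(size=n)
X = np.column_stack([v, 0.8 * v + 0.1 * rng.normal(size=n)])

Xc = X - X.mean(axis=0)                       # mean-centre each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt.T                                      # m x m loading (coefficient) matrix
latent = s**2 / (n - 1)                       # variance of the scores along each PC

k = 1                                         # number of retained PCs
P_pca, P_res = P[:, :k], P[:, k:]             # m x k and m x (m - k) loadings
T_pca, T_res = Xc @ P_pca, Xc @ P_res         # n x k and n x (m - k) scores

X_pca = T_pca @ P_pca.T                       # part explained by the PCA model
X_res = T_res @ P_res.T                       # residual part

print("latent (variances):", latent)
print("reconstruction exact:", np.allclose(X_pca + X_res, Xc))
```

Because the two columns are strongly correlated, nearly all of the variance falls on the first PC, which mirrors the reasoning used to retain a single component.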
The statistical process control method is widely used in industrial engineering for quality control purposes. It has found other applications in many domains for outlier detection by checking whether the process variables are in control. Hotelling's $T^2$ statistic is a multivariate distance of a set of data points from a target value, indicating variance inside the PCA model. If $x$ is a mean-centered (scaled) sample data vector, then $t_{pca} = P_{pca}^{T} x$ is a score vector. The $T^2$ statistic for $x$ is defined as $T^2 = t_{pca}^{T} \Lambda^{-1} t_{pca}$, where $\Lambda$ is a diagonal matrix having the $k$ eigenvalues of data matrix $X_{n \times m}$ for $k \le m$ retained PCs. The UCL for the statistic is denoted $T^2_{\alpha}$. If all data points are linear and normally distributed, $T^2_{\alpha}$ follows an $F$ distribution and is given as $T^2_{\alpha} = \frac{k(n^2 - 1)}{n(n - k)} F_{k,\,n-k}$ at a given level of confidence $\alpha$.
The Q statistic is a measure of deviation of the original data points from the projection onto the PC axes. Hence, it measures variance among data points inside the residual subspace. The Q statistic is calculated using residuals and, for a residual vector $r$ of a scaled sample vector $x$, the Q statistic is given as $Q = r^{T} r = x^{T} (I - P_{pca} P_{pca}^{T}) x$, where $I$ is an identity matrix. For normally distributed linear data points, the Q statistic follows a central $\chi^2$ distribution and its UCL is given as $Q_{\alpha} = (\sigma^2 / 2\mu) \times \chi^2_{\alpha}(2\mu^2 / \sigma^2)$ at a given level of confidence $\alpha$.
Here, $\mu$ and $\sigma^2$ are the mean and variance of the Q statistic.
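The two statistics and their upper control limits can be sketched as follows. This is a minimal illustration on synthetic reference data, assuming the standard $F$-based limit for $T^2$ and the scaled $\chi^2$ limit for Q as written above; the function name `pca_control_limits` and the use of SciPy quantile functions are choices made here, not details taken from the paper.

```python
import numpy as np
from scipy import stats

def pca_control_limits(X_ref, k=1, alpha=0.98):
    """Fit a PCA model on reference data; return T^2 and Q values with their UCLs."""
    n, m = X_ref.shape
    Xc = X_ref - X_ref.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P_pca = Vt.T[:, :k]                       # retained loadings
    lam = s[:k] ** 2 / (n - 1)                # eigenvalues of the retained PCs

    T = Xc @ P_pca                            # scores
    t2 = np.sum(T**2 / lam, axis=1)           # Hotelling's T^2 per sample
    resid = Xc - T @ P_pca.T                  # residual vectors
    q = np.sum(resid**2, axis=1)              # Q statistic per sample

    # T^2 limit from the F distribution with (k, n - k) degrees of freedom.
    t2_ucl = k * (n**2 - 1) / (n * (n - k)) * stats.f.ppf(alpha, k, n - k)
    # Q limit from a scaled chi-square matched to the mean and variance of Q.
    mu, var = q.mean(), q.var(ddof=1)
    q_ucl = var / (2 * mu) * stats.chi2.ppf(alpha, 2 * mu**2 / var)
    return t2, q, t2_ucl, q_ucl

rng = np.random.default_rng(1)
v = rng.normal(size=2000)
X_ref = np.column_stack([v, 0.8 * v + 0.1 * rng.normal(size=2000)])

t2, q, t2_ucl, q_ucl = pca_control_limits(X_ref)
frac_t2 = float(np.mean(t2 > t2_ucl))
frac_q = float(np.mean(q > q_ucl))
print(f"fraction above T2 UCL: {frac_t2:.3f}, above Q UCL: {frac_q:.3f}")
```

For well-behaved Gaussian reference data, only a small fraction of samples should exceed either limit at the chosen confidence level.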
Recently, PCA based process control strategy has been applied for detecting the occurrence of islanding and distinguishing it from several nonislanding events. PMU recordings of frequency measurements on 6 different sites in the UK power grid were used as reference data for implementing $T^2$ and Q statistic based islanding detection in [28]. The occurrence of an islanding situation was evident only when $Q_{\alpha}$ was crossed in addition to the crossing of $T^2_{\alpha}$ by the corresponding multivariate statistics for a test event data set. Since the power system is a dynamically changing system, the system variables used for creating the reference PCA model change dynamically, causing the model to change with time also. To tackle this issue, a recursive PCA algorithm was developed in [29] for the same UK power system case. The reference PCA model was updated in every iteration and the detection results for abnormal transients verified its effectiveness over the simple PCA approach. This study has made use of the usual SVD for creating the reference PCA model since the reference data does not change from one event to another, as the simulation has been performed for fixed settings to observe some unique changes that occur in fixed windows as described previously. Each new test data set $X_{samples \times 2}$ underwent scaling to make the mean along the columns zero. The mean-centered data set $X_{mc}$ was projected onto the 1st PC of the reference PCA model by $X_{mc} \times P_{pca}$. Correspondingly, the $T^2$ and Q statistics were calculated. Each of the remaining 5 cases was used as the test case. Since crossing of the $T^2_{\alpha}$ limit for the reference case by the $T^2$ statistic of any test case indicates only a faulty or out-of-control event, the Q statistic was used as the only parameter for detection. The Q statistic measures deviation inside the residual subspace and hence is a strong indicator of any abnormal or anomalous condition.
Following the same, the 5 test cases were subjected to mean-centering as before and were projected onto the 1st PC of the reference PCA model. The Q statistics for each case's projected data matrix were found and compared with the UCL $Q_{\alpha}$ of the reference case score. $Q_{\alpha}$ at the 98% confidence level was calculated to be $3.846 \times 10^{7}$. The results of this multivariate statistics based detection are given in Table 3.
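The test procedure described above (mean-center a test set, project it onto the 1st PC of the reference model, compute Q, compare with the limit) can be sketched as below. The data are synthetic, the helper names are hypothetical, and the control limit here is simply the empirical 98th percentile of the reference Q values, used as a plainly labeled stand-in for the $\chi^2$-based $Q_{\alpha}$. How the per-sample Q values were aggregated per case in the study is not stated, so the sketch reports the fraction of samples above the limit.

```python
import numpy as np

def fit_first_pc(X_ref):
    """Unit loading vector of the 1st PC of the mean-centred reference data."""
    Xc = X_ref - X_ref.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def q_per_sample(X, pc1):
    """Q statistic of each sample after mean-centring and projecting onto pc1."""
    Xmc = X - X.mean(axis=0)
    scores = Xmc @ pc1
    resid = Xmc - np.outer(scores, pc1)
    return np.sum(resid**2, axis=1)

def exceed_fraction(X_test, pc1, q_ucl):
    """Fraction of test samples whose Q exceeds the control limit."""
    return float(np.mean(q_per_sample(X_test, pc1) > q_ucl))

def make_data(rng, n, noise):
    v = rng.normal(size=n)
    return np.column_stack([v, 0.8 * v + noise * rng.normal(size=n)])

rng = np.random.default_rng(2)
X_ref = make_data(rng, 2000, noise=0.05)      # reference (normal operation)
X_normal = make_data(rng, 2000, noise=0.05)   # test set resembling the reference
X_anom = make_data(rng, 2000, noise=1.0)      # test set with a broken V-I relation

pc1 = fit_first_pc(X_ref)
q_ucl = np.quantile(q_per_sample(X_ref, pc1), 0.98)   # empirical stand-in for Q_alpha

frac_normal = exceed_fraction(X_normal, pc1, q_ucl)
frac_anom = exceed_fraction(X_anom, pc1, q_ucl)
print(f"normal-like test: {frac_normal:.3f}, anomalous test: {frac_anom:.3f}")
```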
As seen in Table 3, this approach identifies the anomalous case correctly. It also identifies the disturbance event in case 2 correctly as not an anomaly that can island the system. However, cases 4, 5, and 6 are incorrectly identified. This shows that the Q statistic based statistical process control approach is not completely reliable for detecting the anomalous currents that can lead to islanding on the system. To improve upon the false detection rate, the Kullback-Leibler (K-L) divergence based approach using the PCA model is presented in the next section.

K-L Divergence Based Detection
K-L divergence, also known as relative entropy, is an important statistical measure coming from information theory. It has shown great potential for application in fault detection and diagnosis (FDD). It has been aptly used for incipient fault detection in mechanical and electrical systems in [30] and has also been widely used in multimedia security and neuroscience. However, the application of K-L divergence in islanding detection related studies could not be confirmed in the literature. This section details the use of K-L divergence involving the PCA model for improved accuracy of anomaly detection.
K-L divergence is basically a measure of dissimilarity between two probability distributions. If two data samples are drawn from two populations having the same distribution, their K-L divergence will be zero. For two continuous probability density functions (PDFs) $f(x)$ and $g(x)$ of a random variable $x$, the K-L Information (KLI) is defined as $I(f \,\|\, g) = \int f(x) \log\big(f(x)/g(x)\big)\, dx$. The K-L divergence is then given as $\text{K-LD}(f, g) = I(f \,\|\, g) + I(g \,\|\, f)$, a symmetric operation of KLI. For discrete distributions, K-LD is defined as the mean value of the log-likelihood ratio of the two distributions.
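For the discrete case, the definitions above reduce to sums over the probability masses. The following sketch is illustrative only; it assumes both inputs are strictly positive probability vectors over the same support, and the function names are hypothetical.

```python
import numpy as np

def kl_information(p, q):
    """K-L Information I(p || q) for discrete distributions with positive entries."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def kl_divergence(p, q):
    """Symmetric K-L divergence: I(p || q) + I(q || p)."""
    return kl_information(p, q) + kl_information(q, p)

p = [0.5, 0.5]
q = [0.9, 0.1]
d_same = kl_divergence(p, p)   # identical distributions
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
print(d_same, d_pq, d_qp)
```

The divergence is zero for identical distributions and takes the same value regardless of the order of the arguments, which is the symmetry property used in the text.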
For an anomalous behavior or a sudden change in a process, the PDF of the corresponding data set changes from the reference case and, if the change goes beyond a safe threshold, it can be statistically detected. For two normal (Gaussian) probability densities $f$ and $g$ having means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$, respectively, the K-L divergence between them can be given by a simple expression:

$$\text{K-LD}(f, g) = \frac{1}{2}\left[\frac{\sigma_1^2}{\sigma_2^2} + \frac{\sigma_2^2}{\sigma_1^2} + (\mu_1 - \mu_2)^2\left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right) - 2\right]. \qquad (1)$$

In our case, we find the divergence between two distributions: the projection of different event (test) cases onto the 1st PC, and the reference PCA score, that is, the projection of case 1 onto the 1st PC. Nonparametric kernel-density estimation has been used to approximate each of these two distributions as normal distributions graphically. Since mean-centering of data samples is a part of PCA, the means of the projections for both test cases and the reference case are zero. Since the PC scores are linear combinations of the original data samples, they are assumed to be fairly normally distributed [31].
Taking this assumption into consideration, we have used the following formula to calculate K-LD between a test case and the reference case:

$$\text{K-LD} = \frac{1}{2}\left[\frac{\sigma_{ref}^2}{\sigma_{test}^2} + \frac{\sigma_{test}^2}{\sigma_{ref}^2} - 2\right]. \qquad (2)$$

Here, $(\mu_{ref} - \mu_{test})^2 = 0$ and $\sigma_{ref}^2$ is nothing but the variance of the projection on the 1st PC, which is in the first column of Table 1. The other variances are those of the projections of the different test cases onto the 1st PC.
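The Gaussian case can be checked numerically. The sketch below assumes the symmetric divergence defined earlier, K-LD(f, g) = I(f ‖ g) + I(g ‖ f), for which two Gaussians give the closed form $\frac{1}{2}\left[\sigma_{ref}^2/\sigma_{test}^2 + \sigma_{test}^2/\sigma_{ref}^2 + (\mu_{ref}-\mu_{test})^2(1/\sigma_{ref}^2 + 1/\sigma_{test}^2) - 2\right]$, and compares it against direct numerical integration for zero-mean densities. The variances used are arbitrary illustrative numbers, not values from Table 1 or Table 4.

```python
import numpy as np

def kld_gaussian(var_ref, var_test, mean_ref=0.0, mean_test=0.0):
    """Closed-form symmetric K-L divergence between two Gaussian densities."""
    d2 = (mean_ref - mean_test) ** 2
    return 0.5 * (var_ref / var_test + var_test / var_ref
                  + d2 * (1.0 / var_ref + 1.0 / var_test) - 2.0)

def kld_numeric(var_ref, var_test, half_width=12.0, n=200001):
    """Numerical check for zero-mean Gaussians: integrate (f - g) * log(f / g)."""
    s = np.sqrt(max(var_ref, var_test))
    x = np.linspace(-half_width * s, half_width * s, n)
    dx = x[1] - x[0]
    # Work with log-densities so that tiny tail values do not produce log(0).
    log_f = -x**2 / (2 * var_ref) - 0.5 * np.log(2 * np.pi * var_ref)
    log_g = -x**2 / (2 * var_test) - 0.5 * np.log(2 * np.pi * var_test)
    integrand = (np.exp(log_f) - np.exp(log_g)) * (log_f - log_g)
    return float(np.sum(integrand) * dx)

closed = kld_gaussian(1.0, 4.0)
numeric = kld_numeric(1.0, 4.0)
print(f"closed form: {closed:.6f}, numerical: {numeric:.6f}")
```

Note that the divergence grows when the test variance is either much larger or much smaller than the reference variance, so a test case with very low variance can still show a very large divergence.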
Using (2), test cases 2 to 6 were used as the second distribution and case 1 was taken as the reference distribution. The values of K-L divergence calculated for different cases are given in Table 4. The results in Table 4 reveal an important picture. All those cases which had $Q > Q_{\alpha}$ and were wrongly detected are now differentiated by their K-L divergence values. It can be seen clearly that cases 4, 5, and 6 do not fall in the same category in which they had been clubbed by the Q statistic approach. The extremely large and small values of cases 4 and 6, respectively, segregate them into different categories of events; however, the similar orders of values for case 3 and case 6 do not give a clear boundary.
The kernel-density estimated normal PDFs for cases 3, 4, and 6 and their divergence from that of case 1 are shown in Figures 9, 10, and 11, respectively. The larger the K-LD, the larger the gap between the densities. The L-L-L-G fault case has the least variance and hence it has the largest K-L divergence among all cases. Physically, this event creates such low voltages for a given short-circuit capacity of the feeder that the PV inverter itself trips, thus avoiding islanding, and this fact is brought out by its large divergence from the reference case PDF. However, after looking at the divergence values of case 3 and case 6, setting the correct threshold for an event to be identified as the anomalous islanding precursor seems to be the problem with this approach, although the false alarm detection rate has reduced to 1/5 from 3/5 in the previous section. To tackle this issue of threshold selection, a machine-learning based approach to detect anomalous events correctly is presented in the next section.

K-NN Classifier Based Approach
For this study, the K-NN algorithm was selected because the anomalous instances liable to cause islanding in case 3 were all consecutively located in the data set. Two class labels were used for binary classification of the test data points. Class label 0 was assigned to all data points belonging to cases that cannot cause islanding, while class label 1 was assigned to all data points belonging to cases that can cause islanding. Physically, class 1 corresponds to all currents greater than 0.1 kA, as decided from the static fault study explained in [26]. Three training data sets were used as a composite training data set. The average cross-validated classification error, or 10-fold loss, on the training data was 0.0070, indicating high training accuracy. The classifier performance was tested for each of the test cases using the majority vote with nearest point tie-break rule. For test set II, the classifier identified all data points as belonging to class 0. This corresponds to 100% accuracy in this case. The variance of data points in the case of the 3-phase fault is the least among all cases. The fault also causes very low voltage, which itself can trip a PV inverter; thus it is detected naturally and cannot be labeled as an anomalous precursor. This confirms the correctness of the classifier in assigning label 0 to this case. The classifier accuracy for test sets I and III was found to be 97.42% and 90.12%, respectively, after multiple runs. The confusion matrices for test sets I and II are shown in Tables 5 and 6, respectively. Case 6 comes very close to the case of actual islanding precursors discovered in case 3 and hence a large number of data points were assigned label 1. The average classifier accuracy can be reported as 95.75%. The classifier takes an average time of 294 ms in classifying a new test data point. As K was reduced to 1, the time taken remained the same but the accuracy improved, even for testing on the third data set.
Clearly, in this approach also, the three-phase short-circuit fault case is identified to be different from all other cases with 100% accuracy.
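The classification rule described in this section (K nearest neighbors under Euclidean distance, majority vote, ties broken by the label of the nearest point) can be sketched in a few lines of NumPy. The data below are synthetic and purely illustrative: the feature layout (voltage, current), the class separation around 0.1 kA, and the function name `knn_predict` are assumptions made here and do not reproduce the simulation data sets or the original MATLAB implementation.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Binary K-NN with Euclidean distance, majority vote, nearest-point tie-break."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=int)
    preds = []
    for x in np.asarray(X_test, dtype=float):
        dist = np.linalg.norm(X_train - x, axis=1)
        idx = np.argsort(dist, kind="stable")[:k]        # k nearest, closest first
        votes = np.bincount(y_train[idx], minlength=2)
        if votes[0] == votes[1]:
            preds.append(int(y_train[idx[0]]))           # tie: label of the nearest point
        else:
            preds.append(int(np.argmax(votes)))
    return np.array(preds)

def make_points(rng, n, i_low, i_high):
    """Synthetic (voltage in pu, current in kA) samples."""
    voltage = 1.0 + 0.01 * rng.normal(size=n)
    current = rng.uniform(i_low, i_high, size=n)
    return np.column_stack([voltage, current])

rng = np.random.default_rng(3)
# Label 0: currents below 0.1 kA; label 1: currents above 0.1 kA (hypothetical ranges).
X_train = np.vstack([make_points(rng, 200, 0.00, 0.08), make_points(rng, 200, 0.12, 0.30)])
y_train = np.array([0] * 200 + [1] * 200)
X_test = np.vstack([make_points(rng, 50, 0.00, 0.08), make_points(rng, 50, 0.12, 0.30)])
y_test = np.array([0] * 50 + [1] * 50)

y_pred = knn_predict(X_train, y_train, X_test, k=5)
accuracy = float(np.mean(y_pred == y_test))
print(f"accuracy on synthetic test set: {accuracy:.3f}")
```

With an odd K and two classes the tie-break branch is never reached; it matters only for even K, and is included to show the rule named in the text.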
A comparison of the performances of the three methods discussed in this paper is presented in Table 7.

Conclusions
This paper has contributed an exploratory study towards understanding, discovering, and analyzing the possible reasons that can unintentionally island a modified IEEE 13 bus system with large PV penetration on a segment. The application of multivariate statistical techniques and a supervised learning technique to distinguish anomalous signatures liable to cause islanding from other transients has been detailed. The three-phase short-circuit fault is clearly identified as not an islanding precursor case by both the K-L divergence and K-NN classification methods. The paper also shows that using PCA based process control strategy alone is not sufficient for predictive islanding detection. The classifier gives the best accuracy among all and it can be concluded that its implementation should detect the precursor and trip the PV inverter before the utility PCC relay. The classification time can be equal to the relay delay if not less than it.

Figure 1: Creation of a power island.
Figures 7 and 8: Real-time results for both disturbance cases, respectively. The large peaks are not noteworthy since they are due to initial PV integration transients. The 30 ms window of each disturbance case contains data points sampled at 1 MHz. The voltage and current samples of phase C taken around the PCC and collected in this window are used as one of the data sets in each technique.
[Figure panels: PCC region voltage (V); PCC region current (A); PV array DC power output (kW); grid-side current in phase C of node 692 (A). Solar irradiance fixed at 1000 W/m²; time offset 0.266 s (graph origin at 0.266 s).]

Composite training data set: the first 20,000 points from case 1, all data points from case 2, and 555 anomalous data points from case 3. The number of neighbors K was set to 5 and the Euclidean distance measure was used. The classifier was trained and then tested with three test data sets:
Test set I: last 10,484 points of case 1.
Test set II: data points of case 4.
Test set III: data points of case 6.

Table 2: Cases simulated.

The $T^2$ statistic and the Q statistic both have an upper control limit (UCL) defined, and when both limits are crossed by the corresponding statistics of a data point or data set, this indicates anomalous and abnormal behavior.

Table 3: Event detection results using Q statistic.

Table 4: Event detection results using K-LD.

Table 5: Confusion Matrix I.

Table 7: Performance comparison of the three methods.