Fault Location Method Based on SVM and Similarity Model Matching

Locating faults accurately and clearing them quickly is key to improving the power supply capability of the power grid. This paper presents a fault location method based on an SVM fault branch selection algorithm and similarity matching. First, an SVM-based fault branch selection classifier is constructed from the positive sequence component feature matrix of each monitoring point; it selects the branch on which the current fault is located. Then, based on the positive sequence voltage distribution characteristics, the Euclidean distance and the Pearson correlation coefficient (PCC) are used to establish a similarity objective function for fault location, and the fault is located by optimizing this objective function. Finally, the proposed method is validated on an IEEE-14 node network. The results show that the proposed method is effective and accurate.


Introduction
Fast and accurate location of distribution network faults can effectively shorten troubleshooting and blackout time, reduce economic losses, and improve power supply reliability [1][2][3][4][5]. However, the distribution network has many branches, lines of widely varying length, and few measuring points, so it is difficult for protection devices to locate fault points accurately. Therefore, in practice the fault area is usually located first and the fault point is then found by manual patrol. With the improvement of automation and intelligence of the distribution network in the Jiangsu power grid, a large number of intelligent monitoring devices, such as the FTU (feeder terminal unit) and the PMU (phasor measurement unit), have been put into use. These devices can measure the voltage and current of the distribution network in real time and provide the data basis for accurate fault location [6,7]. Therefore, the research in this paper assumes that each node is equipped with monitoring instruments.
At present, scholars have done a lot of work on accurate fault location in the distribution network. The main fault location methods are divided into the traveling wave method and nontraveling wave methods (including the impedance method, the node matrix method, and the fitting optimization method). The traveling wave method [8][9][10] determines the fault distance by measuring the propagation time of the voltage and current traveling waves to the fault point. However, due to the uneven distribution of line impedance and the large number of branches in the distribution network, the accuracy of the traveling wave method is greatly affected. Additionally, the monitoring devices required for traveling wave location are expensive, which makes practical engineering application difficult. The impedance method [11] calculates the impedance of the fault branch by measuring the voltage and current at the fault point and then calculates the fault distance. As with the traveling wave method, the multibranch structure of the distribution network degrades its accuracy. This method is widely used in the high voltage transmission network. Reference [12] constructed a fault distance function based on the node impedance matrix by using the bus voltage drop during the fault and located the fault from the degree of matching between the calculated and measured voltage drops, but the method had poor robustness to load changes and was susceptible to transition resistance. In reference [13], the least square method was used to fit the fault distance distribution function, but the error between the fault distance calculated from the fitted distribution function and the actual fault distance was large, so the accuracy of the algorithm needs to be improved.
In reference [14], a distribution function reflecting the change of fault distance was established from the node impedance matrix after the fault, and the fault distance was determined by solving the corresponding location equation, but the location results are easily affected by false fault points and measurement error. Reference [15] improved on reference [14]: considering the influence of measurement error, the probability distribution curve of the possible fault section was obtained by Monte Carlo simulation, and the fault location was determined from the peak of the probability distribution. However, this method requires the fault type and phase to be identified accurately beforehand, and the computational burden of the simulation is too large for application to large-scale power grids.
In summary, most existing location methods have at least one of the following shortcomings: they fail to overcome the influence of transition resistance on the location results; they need to know the fault branch and fault type in advance; their results are greatly affected by measurement errors and false fault points; or they need to traverse all locations of the network to search for fault points, which results in a huge amount of calculation when the system is large.
To solve these problems, a fault location method based on SVM fault branch selection and similarity model matching is proposed in this paper. The method uses the distribution characteristics and interrelationships of the positive sequence voltage variations at the monitoring points to construct fault feature modes that are not affected by fault type or transition resistance. A fault branch selection database is established from simulation data and used to train an SVM-based fault branch selection classifier, which determines the branch on which the current fault is located. Then, the fault location model of each branch is established with the fault distance parameter λ as the only variable. A similarity index is defined based on the Euclidean distance and the Pearson correlation coefficient (PCC) to measure the similarity between the current fault and the fault location model. The value of λ at which the similarity index reaches its optimum gives the fault location. The validity and accuracy of the method are verified by an example.

Fault Characteristics Based on Positive Sequence Voltage Variation
The location, type, and transition resistance of a fault are the three variables that determine the voltage at each monitoring point. A schematic diagram of a fault in the system is shown in Figure 1.
In this paper, the symmetrical component method is used to extract the positive sequence component, and the positive sequence component is used to analyze the electrical characteristics of the fault. A set of three-phase components with equal amplitude in which phase A leads phase B by 120 degrees, phase B leads phase C by 120 degrees, and phase C leads phase A by 120 degrees is called the positive sequence component. The positive sequence driving point impedance $Z^{(1)}_{MF}$ between monitoring point M and fault point F can be calculated from the positive sequence node impedance matrix, as given by formula (1). In formula (1), $Z^{(1)}_{MA}$ is the positive sequence driving point impedance between monitoring point M and node A, and $Z^{(1)}_{MB}$ is the positive sequence driving point impedance between monitoring point M and node B. The superscript (1) indicates positive sequence.
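As a small illustration of the symmetrical component decomposition mentioned above, the following sketch computes the positive sequence phasor from three phase phasors using the standard rotation operator a = e^{j120°}. The function name and the sample phasors are illustrative and do not come from the paper.

```python
import cmath
import math

# Rotation operator a = exp(j*120 deg) used by the symmetrical component method.
A = cmath.exp(2j * math.pi / 3)

def positive_sequence(va, vb, vc):
    """Return the positive sequence phasor of three phase phasors."""
    return (va + A * vb + A * A * vc) / 3

# Illustrative balanced set: phase A leads B by 120 deg, B leads C by 120 deg.
va = cmath.rect(1.0, 0.0)
vb = cmath.rect(1.0, -2 * math.pi / 3)
vc = cmath.rect(1.0, 2 * math.pi / 3)
v1 = positive_sequence(va, vb, vc)
```

For a balanced positive sequence set the result equals the phase A phasor, while three identical phasors (a pure zero sequence set) give a positive sequence component of zero.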
Assume that the positive sequence voltage of monitoring point M before the fault is $\dot{V}^{std(1)}_{M}$ and that the positive sequence component of the short-circuit current at fault point F is $\dot{I}^{(1)}_{F}$. According to the superposition theorem, the positive sequence voltage of monitoring point M after the fault is expressed by formula (2), and the positive sequence voltage variation at the monitoring point is defined by formula (3). Formula (3) shows that the change of positive sequence voltage at the monitoring point depends only on the positive sequence impedance between the monitoring point and the fault point and on the positive sequence fault current. The positive sequence impedance between the monitoring point and the fault point carries the relative position information between them. For any fault position on the branch, the positive sequence impedance between the monitoring point and the fault position corresponds to that position one to one and is not affected by the fault type or transition resistance.
Assume that there are two monitoring points in the system and that two faults occur successively at point F, as shown in Figure 2. The fault type and transition resistance of fault 1 and fault 2 are different.
According to formula (3), the relationship in formula (4) can be obtained.
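The argument above can be checked numerically with a minimal sketch. It assumes, following the text, that the voltage variation at a monitoring point is the product of the impedance between that point and the fault point and the positive sequence fault current; the impedance and current values are hypothetical. The fault current changes with fault type and transition resistance, but it is common to both monitoring points and cancels in the ratio.

```python
def voltage_variation(z_mf, i_f):
    """Voltage variation at a monitoring point (assumed form: impedance times fault current)."""
    return z_mf * i_f

# Hypothetical impedances from monitoring points 1 and 2 to the fault point F.
z_1f = complex(0.02, 0.11)
z_2f = complex(0.05, 0.23)

# Hypothetical fault currents for two faults at F with different type and resistance.
i_fault_1 = complex(3.0, -9.0)
i_fault_2 = complex(0.8, -1.5)

ratio_1 = voltage_variation(z_1f, i_fault_1) / voltage_variation(z_2f, i_fault_1)
ratio_2 = voltage_variation(z_1f, i_fault_2) / voltage_variation(z_2f, i_fault_2)
```

Both ratios equal the impedance ratio `z_1f / z_2f`, which depends only on the fault position.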

Fault Location Method Based on SVM Fault Branch Selection and Similarity Matching
[...] (1) and (3). The sequence of positive sequence voltage variations of the monitoring points is given by formula (5), in which $Z^{(1)}_{l}$ and $Z^{(1)}_{r}$ represent the positive sequence impedances from each monitoring point to the head node and the end node of the branch, respectively. The positive sequence short-circuit current is the standard positive sequence voltage before the fault divided by the positive sequence driving point impedance at the fault position:

$$\dot{I}^{(1)}_{F} = \frac{\dot{V}^{std(1)}_{F}}{Z^{(1)}_{FF}} \tag{6}$$

In formula (6), $Z^{(1)}_{FF}$, the positive sequence driving point impedance at fault position F, is given by formula (7), where $Z^{(1)}_{ll}$ and $Z^{(1)}_{rr}$ are the positive sequence driving point impedances at nodes l and r, respectively, $Z^{(1)}_{lr}$ is the positive sequence driving point impedance between nodes l and r, and $Z^{(1)}_{L}$ is the positive sequence line impedance between nodes l and r.
Given the data uploaded from the monitoring points and the network topology parameters, the fault location λ is the only variable in formulas (5)-(7). To avoid the influence of data amplitude, the sequence of positive sequence voltage variations is standardized in this paper. The standardization formula is

$$\Delta V^{(1)N} = \frac{\Delta V^{(1)} - E(\Delta V^{(1)})}{\sqrt{D(\Delta V^{(1)})}} \tag{8}$$

where $\Delta V^{(1)N}$ is the standardized result, $E(\Delta V^{(1)})$ is the expectation of the $\Delta V^{(1)}$ array, and $D(\Delta V^{(1)})$ is the variance of the $\Delta V^{(1)}$ array.
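The standardization step can be sketched as follows. This is a minimal illustration that assumes the sequence holds real-valued magnitudes and that the variance is the population variance (divided by the number of monitoring points); neither choice is stated in the paper, and the sample values are hypothetical.

```python
import math

def standardize(values):
    """Subtract the mean and divide by the standard deviation (assumed population form)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        raise ValueError("all values are equal; standardization is undefined")
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]

# Hypothetical voltage variation magnitudes at four monitoring points.
raw = [0.12, 0.30, 0.45, 0.27]
standardized = standardize(raw)
scaled = standardize([5 * v for v in raw])
```

Multiplying every entry by the same factor leaves the standardized sequence unchanged, which is why standardization removes the influence of data amplitude.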
After standardizing the positive sequence voltage variation sequence of the monitoring points, the fault location model (LM) of the branch, with the fault distance λ as the only variable, is obtained as given by formula (9). In formula (9), b is the branch number, λ is the fault distance, and $LM_{b,\lambda}$ is the standardized sequence of positive sequence voltage variations at the n monitoring points when a fault occurs at distance λ on branch b.

SVM Fault Branch Selection Method.
According to the characteristics of the positive sequence voltage distribution, the proportional relationship between the positive sequence voltage variations at the monitoring points is fixed for faults occurring at the same location. Therefore, each fault location on a branch corresponds to a unique $LM_{b,\lambda}$. The closer two fault locations are, the more similar their $LM_{b,\lambda}$ values are, and faults occurring on the same branch have the most similar $LM_{b,\lambda}$. Therefore, this paper introduces an SVM-based fault branch selection method for the preliminary selection of the branch on which the fault is located, in order to reduce the amount of calculation needed for fault location. By introducing an inner product kernel function, the support vector machine (SVM) projects samples that are linearly nonseparable in the input space into a high-dimensional space through a nonlinear mapping, where they become linearly separable [16].
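Define the category label $y_i$ of the sample $x_i$. In the two-class case, if $x_i \in \omega_1$, then $y_i = 1$; if $x_i \in \omega_2$, then $y_i = -1$. The distance from a sample point to the classification interface $\omega \cdot x + b = 0$ is

$$d = \frac{|\omega \cdot x_i + b|}{\|\omega\|}$$

The optimal classification interface should satisfy the idea of maximum separation, i.e., minimization of $\|\omega\|$:

$$\min_{\omega, b} \ \frac{1}{2}\|\omega\|^2 \quad \text{s.t.} \quad y_i(\omega \cdot x_i + b) \ge 1 \ \text{for all } i$$

Using the Lagrange multiplier method for inequality constraints, we get

$$L(\omega, b, \alpha) = \frac{1}{2}\|\omega\|^2 - \sum_i \alpha_i \left[ y_i(\omega \cdot x_i + b) - 1 \right], \quad \alpha_i \ge 0$$

The partial derivatives with respect to $\omega$ and $b$ are taken and set to 0. The dual principle is then used to transform the problem into maximizing the following function of $\alpha_i$:

$$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$

under the conditions

$$\sum_i \alpha_i y_i = 0, \quad \alpha_i \ge 0$$

If $\alpha_i^*$ is the optimal solution, then

$$\omega^* = \sum_i \alpha_i^* y_i x_i$$

The samples with $\alpha_i^* \ne 0$ are the support vectors. In conclusion, the optimal classification function is

$$f(x) = \operatorname{sgn}\left( \sum_i \alpha_i^* y_i (x_i \cdot x) + b^* \right)$$

where $b^*$ is the classification threshold. The inner product kernel function $K(x_i, x)$ is used to replace the inner product $(x_i \cdot x)$ of the training samples and the samples to be classified. The objective function then becomes

$$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

and the kernel classification function is expressed as

$$f(x) = \operatorname{sgn}\left( \sum_i \alpha_i^* y_i K(x_i, x) + b^* \right)$$

For the linearly nonseparable case, SVM introduces the relaxation variable $\zeta_i$ and the penalty factor $C$, and the objective function becomes

$$\min_{\omega, b, \zeta} \ \frac{1}{2}\|\omega\|^2 + C\sum_i \zeta_i \quad \text{s.t.} \quad y_i(\omega \cdot x_i + b) \ge 1 - \zeta_i, \ \zeta_i \ge 0$$

The kernel function used in this paper is the sigmoid function:

$$K(x_i, x) = \tanh\left( v (x_i \cdot x) + c \right)$$

where $v$ and $c$ are the kernel parameters.

Based on this, this paper treats all faults on the same branch as one class. Virtual fault points are set at 25%, 50%, and 75% of each branch, and fault simulations are carried out at these points. The fault data are processed according to formulas (5)-(9). A sufficient number of fault samples $LM_{b,\lambda}$ are generated as training samples for the SVM-based fault branch selection algorithm. The trained SVM classifier is used to identify the branch on which the short-circuit fault occurred, so as to narrow the fault location range and lay the foundation for accurate fault location.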
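To make the kernel classification function concrete, the sketch below evaluates a two-class decision with a sigmoid (tanh) kernel for a given set of support vectors. The support vectors, multipliers, threshold, and kernel parameters are hypothetical values chosen for illustration; in practice they come from training, and the paper does not state how the two-class classifier is extended to many branches (one-vs-rest and one-vs-one are common choices).

```python
import math

def sigmoid_kernel(x, z, v=0.5, c=0.0):
    """Sigmoid kernel tanh(v * <x, z> + c); v and c are illustrative parameters."""
    dot = sum(a * b for a, b in zip(x, z))
    return math.tanh(v * dot + c)

def classify(x, support_vectors, labels, alphas, bias, kernel=sigmoid_kernel):
    """Return +1 or -1 from the sign of sum(alpha_i * y_i * K(x_i, x)) + bias."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + bias
    return 1 if score >= 0 else -1

# Hypothetical trained quantities for a toy two-class problem.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [1.0, 1.0]
bias = 0.0

label_a = classify([0.8, 1.2], support_vectors, labels, alphas, bias)
label_b = classify([-1.0, -0.5], support_vectors, labels, alphas, bias)
```

A sample close to the first support vector is assigned to class +1 and a sample close to the second to class -1.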

Fault Location Method Based on Similarity Matching.
After determining the branch on which the fault is located, this paper uses the similarity matching method to solve for the fault location parameter λ. The Euclidean distance and the Pearson correlation coefficient are used to establish the target optimization model, and the enumeration method is used to find the value of λ that makes the objective function optimal, which determines the position of the fault.

Euclidean Distance.
The Euclidean distance transform is useful for a variety of applications including image processing, computer vision, pattern recognition, shape analysis, and computational geometry [17][18][19]. Given two vectors $X = [x_1, x_2, x_3, \cdots, x_n]$ and $Y = [y_1, y_2, y_3, \cdots, y_n]$, their Euclidean distance is

$$D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{22}$$

where $x_i$ is the i-th component of X, $y_i$ is the i-th component of Y, and n is the number of elements. In this paper, n denotes the number of monitoring points in the network. For two faults at the same position, the Euclidean distance between their $LM_{b,\lambda}$ is extremely small or even zero, and the similarity between them is high. Conversely, when two faults are far apart in the network, the Euclidean distance is large and the similarity is low.
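A minimal sketch of the Euclidean distance between two location model vectors follows; the function name and test vectors are illustrative.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

d_same = euclidean_distance([0.3, -1.2, 0.9], [0.3, -1.2, 0.9])
d_345 = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

Identical vectors give a distance of zero, matching the case of two faults at the same position.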

Pearson Correlation Coefficient (PCC).
PCC is a parameter used to measure the linear relationship between two variables [20,21]. The closer the PCC is to 1 or −1, the stronger the correlation; the closer it is to 0, the weaker the correlation. The PCC is positive for positive correlation and negative for negative correlation. The PCC between X and Y is defined as

$$\rho(X, Y) = \frac{N\sum_{i} x_i y_i - \sum_{i} x_i \sum_{i} y_i}{\sqrt{N\sum_{i} x_i^2 - \left(\sum_{i} x_i\right)^2}\,\sqrt{N\sum_{i} y_i^2 - \left(\sum_{i} y_i\right)^2}} \tag{23}$$

where N is the dimension of X, and X and Y have the same meanings as in formula (22).
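The PCC can be computed directly from sums over the two vectors. The sketch below uses the standard sum-based form of the coefficient; the function name and sample data are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return numerator / denominator

rho_pos = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
rho_neg = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A perfectly proportional pair gives a coefficient of 1 and a perfectly reversed pair gives −1, up to floating-point rounding.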
Combining the Euclidean distance and the PCC described above, the fault distance objective function established in this paper is given by formula (24), where $LM_F$ is the positioning information corresponding to the current fault, obtained by normalizing formula (5); $S_{LM_F, LM_{b,\lambda}}$ is the similarity index between the current fault information and the positioning model $LM_{b,\lambda}$ of the faulty branch b; $\rho_{LM_F, LM_{b,\lambda}}$ is the PCC between $LM_F$ and $LM_{b,\lambda}$; and $D_{LM_F, LM_{b,\lambda}}$ is the Euclidean distance between $LM_F$ and $LM_{b,\lambda}$. $S_{LM_F, LM_{b,\lambda}}$ takes values in [−1, 1]. The stronger the positive linear correlation between the two groups, the closer $S_{LM_F, LM_{b,\lambda}}$ is to 1; the stronger the negative linear correlation, the closer it is to −1. When the two are identical, $S_{LM_F, LM_{b,\lambda}} = 1$; the lower the similarity between the two sets of information, the closer $S_{LM_F, LM_{b,\lambda}}$ is to zero.

Fault Location Method Execution Steps.
In the fault location calculation process, short-circuit faults are first simulated at 25%, 50%, and 75% of each branch in the network to establish a fault branch selection database, which is used to train the SVM fault branch selection model. Then, based on the current fault information $LM_F$, the SVM fault branch selection model determines the branch on which the fault is located. For the selected branch, the goal is to maximize the indicator $S_{LM_F, LM_{b,\lambda}}$, i.e., to find the $LM_{b,\lambda}$ that best matches $LM_F$. The optimal value of the fault distance parameter λ obtained in this way determines the location of the fault on the branch. This paper uses the enumeration method to evaluate $S_{LM_F, LM_{b,\lambda}}$ under the constraint λ ∈ [0, 1], with the objective function given by formula (24). The execution flow of the fault location algorithm is shown in Figure 3.
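The enumeration step can be sketched as below. Two parts of this sketch are assumptions and not the paper's formulas: the similarity index is taken as the PCC divided by one plus the Euclidean distance, a form chosen only because it has the stated properties (range [−1, 1], value 1 for identical inputs), and the branch location model is a hypothetical linear interpolation between impedances seen from the head and end nodes. All numeric values are toy data.

```python
import math

def standardize(values):
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def similarity(lm_f, lm):
    # ASSUMED form of the similarity index, not taken from the paper.
    return pearson(lm_f, lm) / (1.0 + euclidean(lm_f, lm))

def location_model(lam, z_left, z_right):
    # ASSUMED toy model: interpolate per-monitor impedances, then standardize.
    return standardize([(1 - lam) * zl + lam * zr for zl, zr in zip(z_left, z_right)])

def locate(lm_f, z_left, z_right, step=0.001):
    """Enumerate lambda over [0, 1] and return the best (lambda, similarity)."""
    n_steps = round(1 / step)
    best_lam, best_s = 0.0, -math.inf
    for k in range(n_steps + 1):
        lam = k / n_steps
        s = similarity(lm_f, location_model(lam, z_left, z_right))
        if s > best_s:
            best_lam, best_s = lam, s
    return best_lam, best_s

# Hypothetical impedances from four monitoring points to the branch end nodes.
Z_LEFT = [0.10, 0.25, 0.40, 0.30]
Z_RIGHT = [0.20, 0.15, 0.55, 0.45]
lm_fault = location_model(0.8, Z_LEFT, Z_RIGHT)
best_lam, best_s = locate(lm_fault, Z_LEFT, Z_RIGHT)
```

With noise-free toy data the search recovers the λ used to generate the fault vector; with measured data the maximum would generally be below 1 and slightly offset.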

Case Analysis
The case analysis uses the IEEE-14 node typical distribution network as the simulation model. The model topology and the parameters of each branch are shown in Figure 4 and the annex. The analysis in this chapter consists of three parts. First, the effectiveness of the SVM fault branch selection algorithm is verified in Section 4.1. Then, in Section 4.2, a short-circuit fault is simulated on one branch to verify the effectiveness of the proposed algorithm. Finally, in Section 4.3, fault simulation experiments are carried out on every branch to verify the performance of the proposed fault location algorithm.

Case 1: SVM-Based Fault Branch Selection Algorithm.
Fault tests were carried out at 25%, 50%, and 75% of each branch. The fault types included single-phase short-circuit fault, two-phase short-circuit fault, and three-phase short-circuit fault. Table 1 shows the data of each monitoring point when a single-phase ground fault occurs at 50% of all 14 branches of the network. The gray area in Table 1 marks the monitoring points whose positive sequence voltage is not affected. The fault samples of the same branch are classified into one class.
To determine how many training samples are sufficient to reach the expected accuracy, the relationship between training set size and test set accuracy was tested before the fault location experiment. The experimental results are shown in Table 2. According to Table 2, when the number of training samples reaches 540 or above, the accuracy on the test set reaches 100%. Therefore, in preparing the fault location experiment, we simulated each type of fault three times at 25%, 50%, and 75% of each branch, generating 540 training samples in total (20 branches × 3 locations × 3 fault types × 3 repetitions).
Then, on each branch, three fault points (60 in total, with random position and random fault type) are randomly selected as the test data for SVM fault branch selection. The validation samples are generated as follows: (a) A random number from 0 to 1 is generated as the fault distance λ to determine the fault location. (b) The fault type is selected randomly from single-phase fault, two-phase fault, and three-phase fault. (c) On branch 1, the fault is simulated according to the random selection, yielding 1 verification sample. (d) The random fault simulation is repeated three times on branch 1 and then continued on the next branch. There are 20 branches in total, so 60 verification samples are obtained.
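Steps (a) to (d) above can be sketched as a simple sampling routine. The function name, the tuple layout, and the use of a seeded generator are illustrative choices; the fault simulation itself is outside the scope of this sketch.

```python
import random

FAULT_TYPES = ("single-phase", "two-phase", "three-phase")

def draw_validation_faults(n_branches=20, per_branch=3, seed=None):
    """Return (branch, fault distance, fault type) tuples for random test faults."""
    rng = random.Random(seed)
    faults = []
    for branch in range(1, n_branches + 1):
        for _ in range(per_branch):
            lam = rng.random()                 # (a) random fault distance in [0, 1)
            kind = rng.choice(FAULT_TYPES)     # (b) random fault type
            faults.append((branch, lam, kind)) # (c), (d) one sample per draw
    return faults

faults = draw_validation_faults(seed=0)
```

With 20 branches and three draws per branch the routine yields the 60 verification samples described above.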
The accuracy of the branch selection test is shown in Table 3. Table 3 shows that the SVM-based fault branch selection method can accurately determine the fault branch and lay a foundation for the subsequent fault location.

Case 2: Fault Location Method Based on SVM Fault Branch Selection and Similarity Model Matching.
Assume that a single-phase ground fault occurs on branch 10 at λ = 0.8 from [...] Table 4.
The fault distance location model of branch No. 10, $LM_{10,\lambda}$, is established, and the similarity index $S_{LM_{F1}, LM_{10,\lambda}}$ between $LM_{F1}$ and $LM_{10,\lambda}$ is calculated according to formula (24). The enumeration method is used to optimize the variable λ under the constraint λ ∈ [0, 1] with a step size of 0.001, taking $\max(S_{LM_{F1}, LM_{10,\lambda}})$ as the objective function to find the optimal value of the fault distance parameter λ. Figure 5 shows how $S_{LM_{F1}, LM_{10,\lambda}}$ changes with the fault distance parameter λ. It can be seen from the figure that $S_{LM_{F1}, LM_{10,\lambda}}$ reaches its maximum value of 0.981583 at λ = 0.788. The location result is consistent with the position of $F_1$ (λ = 0.8), a deviation of 0.012, or 1.2% of the line length, which verifies the effectiveness of the proposed algorithm.

Case 3: Performance Verification of Fault Locating Algorithm.
On each branch, 10 fault points (200 in total, including single-phase, two-phase, and three-phase short-circuit faults) are randomly selected for this verification. Fault location models are established for the 20 branches. The location error rate, defined in formula (25), is used to describe the performance of the algorithm:

$$e = \frac{|\lambda - \lambda_F|}{\text{line length}} \times 100\% \tag{25}$$

where λ is the actual fault distance and $\lambda_F$ is the simulated fault distance. In formula (25), the line length is 1. The smaller the error rate, the more accurate the location method is. The average location error rate of each branch is shown in Table 5.
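The error rate can be computed as in the sketch below, which assumes the absolute deviation is used so that the rate is non-negative. The example values are those of Case 2 (actual λ = 0.8, located λ = 0.788); the function name is illustrative.

```python
def location_error_rate(actual, estimated, line_length=1.0):
    """Location error as a percentage of the line length (absolute deviation assumed)."""
    return abs(actual - estimated) / line_length * 100.0

error_case_2 = location_error_rate(0.8, 0.788)
```

For the Case 2 values this gives an error rate of about 1.2%.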

Conclusion
The fault location algorithm proposed in this paper uses only the positive sequence voltage variation at each monitoring point when the fault occurs. The location result is not affected by the fault type or transition resistance. First, the faulty branch is determined by the SVM fault branch selection method, which avoids traversing every branch of the network, narrows the location range, and reduces the amount of calculation. Then, the fault location parameter λ is solved by using the fault location similarity model that combines the Euclidean distance and the PCC.
This paper builds an IEEE-14 node network to verify the effectiveness of the algorithm. The case results show that the method can locate various types of faults and that the average location error of each branch is kept within 4%. The algorithm lays a theoretical foundation for the rapid handling of grid faults.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.