An Extensible Gradient-Based Optimization Method for Parameter Identification in Power Distribution Network

Accurate parameter identification of the power distribution network (PDN) has attracted remarkable attention recently. However, power device parameters are often unstable owing to both the operating status and manual entry. Therefore, it is urgent to develop reliable algorithms for identifying PDN parameters with both high accuracy and high efficiency. Most existing algorithms are gradient-free and based on heuristic schemes, resulting in unstable numerical calculation. Herein, based on our previous work on the adaptive gradient-based optimization (AGBO) method, we propose an extended version, namely, the AGBO-Pro model. In this method, both the numerical and categorical features of experimental observations are utilized and combined via a weighted average. By comparing the proposed method with several heuristic algorithms, it is found that the errors in RMSE, MAE


Introduction
Acquiring accurate and reliable device parameters is crucial in the context of power distribution networks (PDNs) due to their multifaceted implications [1]. However, the lack of in situ measurement techniques poses challenges in directly obtaining certain PDN parameters, which are typically assumed to be static in real situations. These parameters include line resistance, line reactance, transformer resistance, transformer reactance, transformer conductance, and transformer electrical susceptance. This limitation often leads to poor estimation in parameter identification for PDN [2]. To address these challenges, numerous approaches have been developed to enhance numerical efficiency and reduce residuals in parameter estimation. These approaches include supervisory control and data acquisition (SCADA), phasor measurement units (PMUs), and advanced metering infrastructure. They can be classified into various methods, such as the full-scale approach [3], PSOSR [4], normalized Lagrange multiplier (NLM) test [5], finite-time algorithm (FTA) [6], residual method, sensitivity analysis method, Lagrange multiplier method [7], Heffron-Phillips method [8], and specialized Newton-Raphson iteration [9]. Additionally, recent advancements in machine learning and deep learning techniques have led to the proposal of smart methods, including artificial neural network [10], graph convolution network (GCN) [11], support vector machine (SVM) [12], multihead attention network [13], deep reinforcement learning [14], estimation using synchrophasor data [15], PSCAD simulation [16], multimodal long short-term memory deep learning [17], and edge computing [18]. While these methods show effectiveness with simulation data, they often require specialized measuring devices.
To overcome the challenges posed by the lack of required data and measuring devices, a mathematical approach called the power flow model can be employed. The power flow model establishes relationships between PDN parameters and easily obtainable data, such as active power, reactive power, and voltage. The challenging parameters mentioned earlier can be optimized using algorithms in combination with the static parameters, namely, active power, reactive power, and voltage, on the low-voltage side. By utilizing the power flow model, the voltage values on the high-voltage side can be calculated, and the residuals between the calculated and true values can be used to construct a loss function for optimization methods. The optimization methods for parameter identification can be generally classified into two categories, namely, gradient-free methods [19-25] and gradient-based methods [26, 27]. First, in gradient-free methods, heuristic or biomimetic optimization rules are designed, such as particle swarm optimization (PSO), genetic algorithm (GA), ant colony algorithm, Aquila optimizer (AO) [28], nuclear reaction optimization (NRO) [29], and Pareto-like sequential sampling heuristic method (PSS) [30]. The performance of these heuristic algorithms is largely dependent on the initialization. On the other hand, the gradient-based method is usually constructed by combining the physical model of PDN and the neural network with the backward propagation of the loss function. Benefiting from the chain rule and automatic differentiation of the loss function with respect to the parameters to be optimized, the gradient-based method is a deterministic approach. Therefore, many advanced gradient descent algorithms can be utilized to accelerate the optimization. In addition, researchers pay more attention to data preprocessing methods, including the utilization of clustering algorithms and hypothesis testing [31, 32].
In previous work, we proposed an adaptive gradient-based optimization (AGBO) algorithm for parameter identification in PDN. In AGBO, a physical model based on the power flow calculation is incorporated into the neural network, and the input data are the numerical features obtained from experimental measurement. However, besides the numerical features, the experimental observations of PDN also contain many categorical features such as recording duration and peak-valley electricity. It should be noted that such categorical features are usually hard to utilize directly in heuristic algorithms, whereas neural network-based methods have a great advantage in feature embedding and extraction. Therefore, in this work, based on the analysis of the physical model in PDN, we propose an extensible gradient-based optimization method for parameter identification in PDN, and we call this new model AGBO-Pro. The proposed model can utilize not only the numerical features as before but also the categorical information, which is rarely used in previous works. Based on the abovementioned points, this paper mainly has the following contributions:
(1) An extensible gradient-based optimization method is proposed, which is constructed with a customized neural network layer and loss function, and it achieves a higher and more robust performance in parameter identification problems of PDN.
(2) In the physical-informed multihead neural network, we separate the experimental measurements into the numerical features and categorical features. After several manipulations, the categorical features are transformed to a weight distribution and incorporated into the numerical features via a linear transformation. Such a treatment is rarely studied or is neglected in other investigations.
(3) The performance of this hybrid method in three evaluation functions during the numerical calculation is much better than that of several individual optimization algorithms.
This paper is organized as follows. Section 2 introduces the identification equations of the power flow model in PDN and proposes the AGBO-Pro optimization method. The experimental data and calculation details are given in Section 3. The results and discussion are given in Section 4. Finally, Section 5 gives a brief conclusion.

Power Flow Model Calculation.
The fundamental principles of PDN analysis can be found in references [20, 31-33]. To streamline the computational process, the assumption of a balanced three-phase condition is made as a prerequisite for power flow calculations in this work. The schematic diagram of the power flow calculation circuit model is shown in Figure 1.
In Figure 1, $P_d$, $Q_d$, and $U_d$ represent the active power, reactive power, and voltage on the high-voltage side of the transformers at bus D, respectively. These three parameters can be obtained directly by real-time measurements. Other parameters, such as transformer resistance $R_d$, transformer reactance $X_d$, transformer conductance $G_d$, transformer electrical susceptance $B_d$, line resistance $R_{cd}$, and line reactance $X_{cd}$, are in general hard to measure in PDN calculation and satisfy the following equations:

where $\Delta U_{Td}$ and $\delta U_{Td}$ in equation (3) are the longitudinal and transverse components of the transformer impedance voltage drop at bus D. $P_{Ld}$, $Q_{Ld}$, and $U_{Ld}$ represent the active power, reactive power, and voltage on the low-voltage side of the transformers at bus D, respectively. $\Delta U_{Td}$ and $\delta U_{Td}$ can be obtained by the following equations:

International Transactions on Electrical Energy Systems

The equation of bus C can be expressed as equations (6)-(8):

where $\Delta U_{Tcd}$ and $\delta U_{Tcd}$ in equation (6) are the longitudinal and transverse components of the transformer impedance voltage drop at bus C.
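As an illustration of the longitudinal and transverse voltage-drop components, the following minimal sketch uses the standard textbook expressions for a series impedance $r + jx$. It is not a reproduction of equations (1)-(8) of this paper: the function names are hypothetical, the shunt branch (conductance and susceptance) is ignored, and all quantities are assumed to be in consistent units.

```python
import math


def voltage_drop_components(p, q, u, r, x):
    """Textbook voltage-drop components across a series impedance r + jx.

    p, q, u are the active power, reactive power, and voltage magnitude
    at the receiving (downstream) end. Assumption: shunt elements are ignored.
    """
    longitudinal = (p * r + q * x) / u  # component in phase with the voltage
    transverse = (p * x - q * r) / u    # component in quadrature with the voltage
    return longitudinal, transverse


def upstream_voltage(p, q, u, r, x):
    """Voltage magnitude at the sending (upstream) end of the impedance."""
    longitudinal, transverse = voltage_drop_components(p, q, u, r, x)
    return math.hypot(u + longitudinal, transverse)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    print(voltage_drop_components(1.0, 0.5, 10.0, 0.2, 0.4))
    print(upstream_voltage(1.0, 0.5, 10.0, 0.2, 0.4))
```

Chaining two such steps (transformer impedance, then line impedance) gives a calculated upstream voltage that can be compared against the measured one.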

Theoretical Framework of AGBO-Pro for Parameter Identification in PDN.
The schematic diagram is shown in Figure 2, where a gradient-based neural network (NN) is combined with gradient-free optimization methods. The inputs of the framework are experimental measurements of the PDN, which can be divided into several blocks (or fields) including numerical features and categorical features. Each block can be processed in a customized way and then connected with customized layers. The loss function for gradient-based optimization is also flexible.

Physical-Informed Multihead Neural Network.
The experimental measurements of PDN can be classified into two categories, namely, the continuous (or numerical) features and discrete (or categorical) features such as measurement time, primary time type, and secondary time type. The numerical and categorical features are processed with different treatments by introducing a multihead neural network, which is shown in Figure 3. The input layer of the NN is separated into two blocks. The first includes the numerical features, which are defined as $X_1 = (P_i, Q_i, U_i)$, representing the set of the active power, reactive power, and voltage on the high-voltage side of the transformers, where the subscript $i$ indexes the sample points for $i = 1, 2, \cdots, N$. The other includes the categorical features $X_2 = (T_i, P_i, S_i, \cdots)$, where $T_i$ and $PV_i$ represent the measurement and peak-valley periods, respectively. It is noted that the recording duration has 24 categories, and the peak-valley electricity has 3 categories (i.e., high, medium, and low).
After the input layer of the NN, the numerical features $X_1$ are fed into the PDN model, while the categorical features are first encoded (i.e., one-hot encoding) and then passed to an embedding layer to transform the sparse feature matrix into a dense matrix. After the embedding layer, we utilize max-min normalization to scale the categorical features into $[0, 1]$, which can be seen as a probability distribution. Then, the numerical and categorical features are merged with a linear combination as follows:

where $\eta$ represents the noise term, which follows the standard normal distribution, i.e., $\eta \sim N(0, 1)$. With the help of maximum likelihood estimation, the loss function of this work is defined as

where $y_i$ and $\hat{y}_i$ represent the theoretical calculation value of PDN and the true value by experimental measurement, respectively. In addition, to avoid gradient vanishing during the backward propagation of the NN, a nonlinear transformation, namely, the sigmoid activation function rather than the rectified linear unit (ReLU), is utilized:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \quad (11)$$

Therefore, the loss function is further modified as

where $\hat{U}_{c,i}$ represents the true value of the voltage on the high-voltage side.
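The individual operations named in this paragraph can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the learned embedding layer is omitted, the weighted average stands in for the linear combination whose exact form is given by the paper's equation, and the mean squared error is used as the loss that a Gaussian noise term yields under maximum likelihood estimation.

```python
import math


def one_hot(index, num_categories):
    """Encode a category index as a sparse 0/1 vector."""
    return [1.0 if k == index else 0.0 for k in range(num_categories)]


def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_average(values, weights):
    """Combine values using non-negative weights (assumed merging rule)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def mse_loss(predicted, measured):
    """Mean squared error between calculated and measured values."""
    n = len(predicted)
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n


if __name__ == "__main__":
    weights = min_max_normalize([0.3, -1.2, 2.5])  # stand-in for embedding outputs
    print(weights, weighted_average([1.0, 2.0, 3.0], weights))
```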
The above loss function is also known as the Euclidean distance, measuring the difference between theoretical calculation and experimental observation. In this work, we also utilize a Pearson correlation loss function, which is defined as

In the previous work, we have derived the gradients of the loss function with respect to $R_d$, $X_d$, $G_d$, $B_d$, $R_{cd}$, and $X_{cd}$. According to the PDN model, the loss function can be calculated with forward calculation; then, the back propagation of the gradient of the loss function can be applied to update the connection weights in the NN.
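A correlation-based loss of this kind can be sketched as below. The form $1 - r$, where $r$ is the Pearson correlation coefficient, is an assumption made here for illustration; the exact definition used in this work is the one given by the paper's equation, and the function names are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pearson_loss(predicted, measured):
    """Assumed loss form 1 - r: zero for perfect positive correlation."""
    return 1.0 - pearson_correlation(predicted, measured)


if __name__ == "__main__":
    print(pearson_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because $r$ is unchanged by a positive rescaling or a shift of the predictions, a loss of this type constrains the shape of the predicted profile rather than its scale or offset.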

Gradient-Based Optimization Algorithm.
Once the above gradients of the loss function with respect to the parameters are available, the gradient-based optimization can be implemented. The pseudocode of the optimization method of this work is shown in Algorithm 1.
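For orientation, a generic bounded gradient-descent loop is sketched below. It is not Algorithm 1 of this paper: central finite differences replace the analytical gradients and backpropagation described above, the update is plain gradient descent with clipping to lower and upper bounds, and all names and default values are hypothetical.

```python
def numerical_gradient(loss, params, eps=1e-6):
    """Central finite-difference gradient (stand-in for backpropagation)."""
    grad = []
    for k in range(len(params)):
        up = list(params)
        down = list(params)
        up[k] += eps
        down[k] -= eps
        grad.append((loss(up) - loss(down)) / (2.0 * eps))
    return grad


def gradient_descent(loss, init, lower, upper, lr=0.1, steps=1000):
    """Minimize loss by gradient steps, clipping each parameter to its bounds."""
    params = list(init)
    for _ in range(steps):
        grad = numerical_gradient(loss, params)
        params = [
            min(max(p - lr * g, lo), hi)
            for p, g, lo, hi in zip(params, grad, lower, upper)
        ]
    return params


if __name__ == "__main__":
    # Toy quadratic loss with minimum at (1, -2); not a PDN model.
    toy_loss = lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2
    print(gradient_descent(toy_loss, [0.0, 0.0], [-5.0, -5.0], [5.0, 5.0], steps=200))
```

In an actual setting, the toy loss would be replaced by the loss computed through the power flow model, and the six identified parameters would take the place of the two toy variables.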

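Evaluation Functions of the Parameter Identification Algorithm.
The following three functions are employed to evaluate the performance of the proposed algorithm:
(1) Mean absolute error (MAE):
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \quad (14)$$
(2) Root mean square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \quad (15)$$
(3) Mean absolute percentage error (MAPE):
$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \quad (16)$$
where $y_i$ and $\hat{y}_i$ represent the ground-truth value and the predicted value, respectively.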
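The three evaluation functions can be computed as in the sketch below, which follows their standard definitions. The function names are hypothetical, and reporting MAPE in percent (rather than as a fraction) is an assumption.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; y_true must be nonzero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    truth = [100.0, 200.0]
    pred = [110.0, 180.0]
    print(mae(truth, pred), rmse(truth, pred), mape(truth, pred))
```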

Dataset and Calculation Details
3.1. Data Collection and Description.
In this work, a dataset of 1499 samples is collected via SCADA [33, 34] for training the proposed model. The voltage profiles on the high-voltage ($U_a$, $U_b$, and $U_c$) and low-voltage ($u_a$, $u_b$, and $u_c$) sides are presented in Figures 4 and 5, respectively.
From Figures 4 and 5, it is found that the high-voltage side voltages in the dataset are close to three-phase balance, satisfying the assumption behind the equations in Section 2.1. In addition, the active power ($P_a$, $P_b$, and $P_c$) and reactive power ($Q_a$, $Q_b$, and $Q_c$) profiles on the low-voltage side are given in Figures 6 and 7, respectively.
It is found from Figures 6 and 7 that the variations of active power and reactive power show a similar trend, indicating that the data collection is stable enough for parameter identification of PDN.
It is noted that all collected samples have four categorical features, namely, measurement time, date type, primary time type, and secondary time type. The measurement time represents the time at which the sample is measured, which ranges from 0 to 24 hours. The date type represents whether the measurement falls on a workday or a holiday. The primary time type and secondary time type are two different partitions of the day. The primary time type has three levels: peak hours refer to 09:00 to 12:00 and 18:00 to 21:00 daily, plateau hours refer to 13:00 to 17:00 and 22:00 to 23:00 daily, and valley hours refer to 00:00 to 08:00. The secondary time type has two levels: peak refers to 08:00 to 21:00, whereas valley refers to 22:00 to 07:00. The distribution of these four categorical features is shown in Figure 8.
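The two partitions of the day described above can be derived from the measurement hour as in the following sketch. It assumes that the measurement time is recorded as an integer hour label from 0 to 23 and that the stated intervals are inclusive of both end hours; the function names are hypothetical.

```python
def primary_time_type(hour):
    """Three-level label (peak, plateau, valley) for an integer hour 0..23."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 9 <= hour <= 12 or 18 <= hour <= 21:
        return "peak"
    if 13 <= hour <= 17 or 22 <= hour <= 23:
        return "plateau"
    return "valley"  # remaining hours 0..8


def secondary_time_type(hour):
    """Two-level label (peak, valley) for an integer hour 0..23."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    return "peak" if 8 <= hour <= 21 else "valley"


if __name__ == "__main__":
    for h in range(24):
        print(h, primary_time_type(h), secondary_time_type(h))
```

Under this reading, every hour of the day receives exactly one primary label and one secondary label.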

Evaluation and Calculation.
In this paper, 75% of the samples (1124) are randomly selected as the training set to identify the PDN's parameters. The best parameters are used to calculate the per-unit voltage at bus C (denoted as $U_{cal}$) by the power flow model. After that, the remaining 25% of the samples (375) are used as the test set to evaluate the performance of parameter identification through the three metrics shown in equations (14)-(16). Instead of directly calculating these metrics, linear regression is applied in this paper, with the values of $U_c$ and $U_{cal}$ regarded as the dependent variable and independent variable, respectively. The output values of the linear regression are denoted by $U^*_{cal}$, and the final evaluations of parameter identification are computed between $U_c$ and $U^*_{cal}$:

$$U^*_{cal} = a U_{cal} + b$$

where $a$ and $b$ denote the slope and bias of the linear regression. In the following discussion, the methods in which the parameters of linear regression are optimized by SMBO methods are denoted as RS-LR, TPE-LR, and SA-LR, respectively. The upper and lower bounds of the identified parameters should be determined first, and they are listed in Table 1.
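The linear calibration step can be illustrated with an ordinary least-squares fit, sketched below. The closed-form fit is an assumption made for illustration (the text notes that some variants obtain the regression parameters by SMBO instead), and the function names and sample values are hypothetical.

```python
def fit_linear(x, y):
    """Closed-form least-squares slope and bias for y ~ slope * x + bias."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    bias = mean_y - slope * mean_x
    return slope, bias


def apply_linear(x, slope, bias):
    """Map calculated values onto the calibrated scale."""
    return [slope * v + bias for v in x]


if __name__ == "__main__":
    u_cal = [1.0, 2.0, 3.0]  # stand-in for calculated voltages
    u_c = [3.0, 5.0, 7.0]    # stand-in for measured voltages
    a, b = fit_linear(u_cal, u_c)
    print(a, b, apply_linear(u_cal, a, b))
```

The evaluation metrics are then computed between the measured values and the calibrated outputs rather than the raw calculated values.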
To mitigate the impact of the randomness associated with AGBO and SMBO-based methods, the dataset was randomly partitioned 25 times to ensure the accuracy and stability of the results.
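A repeated random-split protocol of this kind can be sketched as follows. The helper names, the seed handling, and the `evaluate` callback are hypothetical; the callback stands in for a full identify-then-score run on one split, and the summary as mean and sample standard deviation is an assumption about how the repeated results are aggregated.

```python
import random
import statistics


def split_indices(n_samples, train_fraction, rng):
    """Shuffle sample indices and cut them into train and test parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def repeated_evaluation(n_samples, evaluate, repeats=25, train_fraction=0.75, seed=0):
    """Run evaluate(train_idx, test_idx) on several random splits.

    Returns the mean and sample standard deviation of the returned scores.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train_idx, test_idx = split_indices(n_samples, train_fraction, rng)
        scores.append(evaluate(train_idx, test_idx))
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    train, test = split_indices(1499, 0.75, random.Random(0))
    print(len(train), len(test))
```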

Results and Discussion
Before discussing the results, some hyperparameter settings of each method are described as follows. The prior weight and the number of startup jobs are set to 1 and 20 for TPE, and the rate of reduction in SA is 0.1 by default. The learning rate is 5e-4 in AGBO-based methods. The maximum number of iteration steps is 1000 for all the methods in this study. The parameter identification results of AGBO and SMBO-based methods with the mean square error between $U_c$ and $U_{cal}$ as the loss are shown in Table 2.
It can be found in Table 2 that AGBO-Pro has the best performance, with significantly lower values of MAE, RMSE, and MAPE than the other metaheuristic algorithms such as AO, NRO, and PSS. AGBO also has better results than the SMBO-based methods, but the prediction results do not differ remarkably since the statistical properties between $U_c$ and $U_{cal}$ are neglected. Other recent studies have also reported prediction results with the same metrics on the same dataset as this paper, such as the methods of MCMC and SMBO combined with clustering and hypothesis testing (denoted as MCMC$_C$ and SMBO$_C$). Li et al. [32] reported best MAE values for MCMC$_C$ and SMBO$_C$ of 62.467 ± 0.366 and 61.868 ± 0.322, respectively. In another paper [31], the MAE values computed by MCMC$_C$ and SMBO$_C$ are 62.136 ± 0.336 and 61.268 ± 0.311, respectively.
Based on the previous study [26], a linear transformation should be applied to $U_{cal}$ before calculating the loss function. The parameter identification results with linear transformation are listed in Table 3.
All methods perform better in Table 3 than in Table 2, which indicates that the linear transformation between $U_c$ and $U_{cal}$ makes an important contribution to identifying the PDN's parameters. Moreover, the difference between the results of AGBO-Pro and AGBO shows that the supplementary categorical information such as measurement time, date type, primary time type, and secondary time type plays an important role in PDN parameter identification and that the key categorical information can be merged by the AGBO-Pro method proposed in this work.
The learning rate, the size of the embedding layer dimension, and the number of hidden layers are three critical hyperparameters of AGBO-Pro; therefore, the PDN parameter identification performance under different hyperparameters is investigated in this section. The performances under various learning rates are displayed in Table 4.
It can be found that the learning rate has a remarkable influence on AGBO-Pro; when the learning rate is set to 5e-3, the identification performance is optimal. Since the dimension of the categorical features is small, the embedding dimension of the neural network is less than 128 in this work. According to the results in Table 5, the change of the embedding dimension has only a minor impact on the identification performance, and the optimal embedding dimensions are chosen as 64, 32, 32, and 32 for the four categorical features, respectively. AGBO-Pro includes hidden layers to learn the information of the categorical features after embedding, and the influence of the number of hidden layers is shown in Table 6.
Having more hidden layers in the network implies a larger number of parameters, slower computation, and a higher risk of overfitting. Combined with the results in Table 6, it can be found that a single hidden layer achieves better identification performance. The convergence plots of AGBO and SMBO-based methods are displayed in Figure 9.
The AGBO-based methods converge after 200 iterations; compared with the SMBO-based methods, the convergence plots of the AGBO-based methods are much smoother and more stable, since the search direction of the parameter update is deterministic in gradient-based optimization methods such as AGBO and AGBO-Pro. After 25 repeated dataset splits, the distribution plots of the identified PDN parameters from AGBO-Pro-LR are shown in Figure 10. It can be found that all the identified parameters are roughly distributed within a relatively fixed range, providing a data foundation for the subsequent parameter analysis in future research.

Conclusion
In this work, we propose an extensible gradient-based optimization method for parameter identification in PDN calculation and analysis. A physical-informed multihead neural network is adopted to treat the numerical features and categorical features separately. The two kinds of features are merged via a weighted average. After several forward-backward calculations, the similarity loss function with respect to the six parameters to be identified achieves a fast convergence.
We compare the proposed method (namely, the AGBO-Pro model) with the original AGBO model and several heuristic algorithms such as RS, TPE, SA, AO, NRO, and PSS. The numerical calculations show that the errors of AGBO-Pro are the lowest in all three evaluation functions, i.e., MAE, RMSE, and MAPE, with a faster and more stable convergence of the loss function. By further applying a linear transformation, the method of this work has a lower variance in 25 repeated experiments, showing a much more robust performance in parameter identification.
In addition, the variations in the hyperparameters of the optimization method, such as the number of hidden layers and embedding layers, learning rate, and weight decay, are also systematically investigated. It is found that the method proposed in this work achieves more stable and robust performance in identifying PDN parameters. This work presents an effective exploration of incorporating the numerical and categorical features of experimental measurement into a gradient-based optimization method.

Figure 2: Schematic diagram of extensible gradient-based optimization for parameter identification.


Figure 3: Schematic diagram of the proposed physical-informed multihead residual neural network.

Figure 4: $U_a$, $U_b$, and $U_c$ on the high-voltage side.

Figure 5: $u_a$, $u_b$, and $u_c$ on the low-voltage side.

Figure 6: $P_a$, $P_b$, and $P_c$ on the low-voltage side.

Figure 7: $Q_a$, $Q_b$, and $Q_c$ on the low-voltage side.

Figure 8: The distribution plot of measurement time, date type, primary time type, and secondary time type among samples.

Figure 10: The distribution plot of $R_d$ (a), $X_d$ (b), $G_d$ (c), $B_d$ (d), $R_{cd}$ (e), and $X_{cd}$ (f).

Table 1: The upper and lower bounds of the identified parameters.

Table 2: The results of parameter identification with the loss function of mean square error. AGBO-Pro uses the mean square loss to ensure fairness in comparison. The bold values indicate that the AGBO-Pro method attains the lowest values in all three evaluation functions, viz. MAE, RMSE, and MAPE, indicating its best performance.

Table 3: The results of parameter identification with linear transformation. The AGBO-Pro results use the Pearson correlation coefficient loss. After the linear transformation, labeled as "AGBO-Pro-LR," the optimization method proposed in this work still has the best performance.

Table 4: The performance of AGBO-Pro under different learning rates. The bold values indicate that with the learning rate 5e-3, the model has the lowest values in three evaluation functions.

Table 5: The performance of AGBO-Pro under different sizes of the embedding layer dimension.

Table 6: The performance of AGBO-Pro under different numbers of hidden layers. The bold values indicate that with one hidden layer, the model attains the best performance.