An Improved Hybrid Feature Selection Algorithm for Electric Charge Recovery Risk

In order to extract more information that affects customer arrears behavior, a feature extraction method is used to extend low-dimensional features to high-dimensional features for the risk warning problem of the electric charge recovery (ECR) model. However, there are many irrelevant or redundant features in the data, which affect prediction accuracy. In order to reduce the feature dimension and improve the prediction results, an improved hybrid feature selection algorithm is proposed, integrating nonlinear inertia weight binary particle swarm optimization with shrinking encircling and exploration mechanism (NBPSOSEE) with sequential backward selection (SBS), namely, NBPSOSEE-SBS, for selecting the optimal feature subset. NBPSOSEE-SBS can not only effectively reduce the redundant or irrelevant features in the feature subset selected by NBPSOSEE but also improve the accuracy of classification. The experimental results show that the proposed NBPSOSEE-SBS can effectively remove a large number of redundant features and stably improve the prediction results with low execution time, compared with one state-of-the-art optimization algorithm and seven well-known wrapper-based feature selection approaches, for the risk prediction of ECR for power customers.


Introduction
With the rapid development of the global energy market, smart grids [1] in the power industry have been built continuously, and the scale of information data accumulated by power systems is growing. As the main income of power companies, the electric tariff plays a decisive role in the development of power enterprises. However, throughout the power marketing process, the risk posed by users in arrears has always existed, which hinders the development of power enterprises. Electric charge recovery (ECR) has always been a difficult problem that power supply enterprises urgently need to solve. It is also the most important management part of power meter reading, verification, and checking. Once the processing of ECR is delayed, it may have a bad impact on the charging result. It also causes power customers to occupy the funds of power enterprises, which is not conducive to the fund management of power companies. The main operating profit of power grid enterprises comes from ECR. Through the analysis of electric power clients' payment behavior and customer characteristics, the risk factors and risk levels of ECR are predicted and evaluated, which contributes to collecting electric charges, formulating effective preventive measures in time, reducing management risks, and safeguarding the economic benefits of power enterprises. Therefore, accurate and reliable forecasting of arrears risk is an important reference for determining the management of ECR.
In the past, researchers adopted methods such as feature extraction [2][3][4], information entropy theory [5], support vector machines (SVM) combined with the VIKOR method [6], artificial immune algorithms [7], and trust evaluation clouds combined with decision algorithms [8] to establish power arrears risk models, but the results predicted by these methods were not very good. In recent years, some scholars have used quantitative analysis [9], classification [10], clustering [11], ensembles [12], and improved algorithms [13,14] to analyse the arrears behavior of electricity users and have improved the prediction results. However, with the growth of data volumes and the difficulty of capital operation, power companies urgently need faster and more accurate data processing methods to predict the future arrears risk of power users.
In order to extract more information that affects customer arrears behavior, low-dimensional features are extended to high-dimensional features via the feature extraction method. However, many irrelevant and redundant features reduce the accuracy of classification and raise the dimensional complexity. Therefore, feature selection is a very effective solution [15]. Feature selection and classification methods are widely used on high-dimensional and multiclass data sets [16,17] and can improve the accuracy of model prediction by removing irrelevant and redundant features. Generally, the feature selection process includes the following stages: selecting feature subsets, evaluating feature subsets, and verifying results. The purpose of this process is to remove unrelated or redundant features and generate a smaller optimal feature subset. Feature selection methods are generally divided into filter, wrapper, and embedded approaches [18]. Filter algorithms use the inherent distribution characteristics of the data as the basis for feature selection, so the process of selecting features is not related to the learner. Filter approaches can be further divided into univariate and multivariate algorithms [19]. Univariate algorithms evaluate each feature individually, which ignores feature interactions and can reduce classification accuracy, whereas multivariate algorithms have the advantage of evaluating the correlation between features. Filter algorithms are independent of the classifier and compute quickly, but there is no interaction between the classification algorithm and the features during feature selection. Wrapper approaches rely on classification algorithms to evaluate the selected feature subsets, which can achieve higher classification accuracy than filter methods [20]. However, wrapper algorithms have high computational complexity when selecting features from high-dimensional data sets.
In embedded approaches, feature selection is directly integrated into the training process of the learner [21], but embedded methods are conceptually more complex because it is not easy to obtain better classification results from the modified classification model. In contrast, wrapper approaches can use the performance of a machine learning algorithm as the evaluation criterion for selecting features, which makes the analysis of high-dimensional data more flexible and effective. In recent years, wrapper approaches have attracted much attention for solving feature selection problems and seeking global optimal solutions through heuristic algorithms.
In recent years, many researchers have successfully applied wrapper approaches to feature selection. Li and Yang [22] integrated the ensemble OS-extreme learning machine (EOS-ELM) with binary Jaya-based feature selection for real-time transient stability assessment using phasor measurement unit (PMU) data. Yang et al. [23] proposed a novel binary Jaya optimization algorithm, which is integrated with the lambda iteration method to transform the dual objectives of economy and emission commitment into a single-objective problem. Aslan et al. [24] proposed the JayaX binary optimization algorithm, replacing Jaya's solution update rules with the XOR operator, and compared the results with the latest algorithms, showing that it can produce better-quality results in binary optimization problems. The whale optimization algorithm (WOA) [25,26] was applied, using a wrapper-based method, to reach the optimal subset of features and effectively improve the accuracy of classification. Houssein et al. [27] proposed an S-shaped binary whale optimization algorithm for feature selection. Tawhid and Ibrahim [28] used feature selection based on the binary whale optimization algorithm (BWOA) to solve the feature selection problem. Rao et al. [29] used the artificial bee colony approach (ABC) based on a boosting decision tree model to improve the quality of the selected features; the method accelerates convergence and balances exploration and exploitation efficiently. Furthermore, the binary artificial bee colony algorithm (BABC) [30,31] has been used to select feature subsets. S. Oreski and G. Oreski [32] used a genetic algorithm-based heuristic for feature selection in credit risk assessment; in that paper, the genetic algorithm (GA) applies a neural network model to select the optimal subset of features and improves the classification results. Chen et al. [33] introduced a heuristic feature selection approach for text categorization using chaos optimization and GA. Shukla et al. [34] proposed a new hybrid feature subset selection framework based on a binary genetic algorithm and information theory, which accelerates the search for important feature subsets. Emary et al. [35] used binary grey wolf optimization (BGWO) approaches to select feature subsets. Al-Tashi et al. [36] introduced a feature selection method based on GWO for coronary artery disease classification. Several studies have used feature selection methods based on the particle swarm optimization (PSO) algorithm to search the feature space for optimal solutions [37][38][39]. PSO performs as well as or better than WOA, ABC, GA, and GWO in solving global optimization problems [40]. Therefore, PSO is a promising approach for many tasks, including feature selection.
Kennedy and Eberhart proposed the particle swarm optimization (PSO) algorithm in 1995 [41]. PSO has been widely applied in many fields due to its superior performance, such as neural network training [42], classifier design [43], clustering analysis [44], and network community detection [45]. Since many practical problems are discrete, Kennedy and Eberhart proposed a binary particle swarm optimization (BPSO) algorithm in 1997 [46], which uses a binary encoding form to solve discrete optimal combination problems. BPSO has also been widely used in many fields, such as knapsack problems [46][47][48], power systems [49,50], data mining [51][52][53][54][55], and image processing [56]. Because the search range of the particles cannot be adjusted dynamically, BPSO can easily fall into local optima and premature convergence as the population diversity declines. In view of these shortcomings, many scholars have proposed improved methods. First, Chuang and other scholars used chaotic binary particle swarm optimization (CBPSO) algorithms [57][58][59], which apply chaotic mapping to the inertia weight to improve the performance of BPSO; however, chaotic maps do not settle at fixed points or on periodic or quasiperiodic orbits. Liu et al. [60] proposed BPSO with a linearly adaptive inertia weight, which improved the search performance of the particles, but the linear transformation of the inertia weight often overlooks nearby optimal solutions. Wu et al. [61] proposed a feature selection algorithm based on hybrid improved binary quantum particle swarm optimization (HI-BQPSO).
The proposed HI-BQPSO method effectively and efficiently improves classification accuracy and introduces crossover and mutation strategies, compared with other feature selection approaches, on nine gene expression datasets and 36 UCI datasets. The sequential backward selection (SBS) [62] is a heuristic search algorithm that is simple to implement, but its computational cost is greatly affected by the initial feature set. The traditional BWOA, BABC, BGA, BGWO, BPSO, and CBPSO algorithms have simple structures and few parameters. These algorithms are proven effective at selecting feature subsets through a binary mechanism, but they have difficulty escaping local optima. Among state-of-the-art metaheuristic optimization algorithms, the HI-BQPSO algorithm enhances diversity in the search process and can effectively search for the optimal feature subset, but it lacks stability in balancing exploration and exploitation when searching for the global optimal solution. In order to effectively balance exploration and exploitation during the search, a hybrid nonlinear inertia weight binary particle swarm optimization with shrinking encircling and exploration mechanism is proposed, and the SBS method is then introduced, namely, NBPSOSEE-SBS, for solving feature selection tasks. NBPSOSEE-SBS can effectively reduce the number of features while maintaining the best classification effect. In order to prove the effectiveness and superiority of NBPSOSEE-SBS, two groups of comparative experiments are set up, using the logistic regression method, to realize the risk prediction of ECR for power customers in June, July, and August 2018. NBPSOSEE-SBS can significantly reduce the feature dimension, improve the accuracy of classification, and effectively enhance the global search capability. The main contributions of this study are summarized as follows.
Firstly, the proposed algorithm achieves a balance between local search and global search by nonlinearly updating the inertia weight and enhances the diversity of particles searching for optimal solutions. Secondly, two dynamic contraction factors are introduced into the velocity and position updates, which not only effectively enhance the inheritance ability, self-recognition ability, and social cognitive ability of the particles but also improve the quality of the particle positions. Furthermore, a novel position-updating approach is proposed to escape local optima, and the shrinking encircling and exploration mechanisms are introduced. Finally, the SBS algorithm is used to remove individual redundant features separately and help find potentially better solutions.

Binary Particle Swarm Optimization.
PSO is a random search algorithm based on group cooperation, developed by simulating the foraging behavior of birds [41]. Assume that the dimension of the target search space is D and the size of the population is m. X_i = (x_i1, x_i2, ..., x_iD) is a D-dimensional vector representing the position of the i-th particle in the target search space, i = 1, 2, ..., m, and V_i = (v_i1, v_i2, ..., v_iD) is the velocity determining the direction and distance of the particle's flight in each dimension. pbest_i and gbest denote the best position found so far by the i-th particle and by the whole population, respectively. The velocity and position of the particles are updated by the following equations, respectively:

v_id^(t+1) = w * v_id^t + c1 * r1 * (pbest_id − x_id^t) + c2 * r2 * (gbest_d − x_id^t), (1)

x_id^(t+1) = x_id^t + v_id^(t+1), (2)

where w is the inertia weight, c1 and c2 are learning factors, and r1 and r2 are random values in [0, 1]. Kennedy and Eberhart later proposed BPSO [46] based on a binary encoding form. In BPSO, the position of each particle is represented by a binary string, while the velocity vector remains real-valued. The positions of the particles are updated according to the following equation [63]:

x_id^(t+1) = 1 if rand() < S(v_id^(t+1)), and 0 otherwise, (3)

where S(v_id) = 1/(1 + e^(−v_id)).
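As a concrete illustration, the update rules above can be sketched in a few lines of Python; the function name and the velocity clamp `v_max` are illustrative choices, not part of the original formulation.

```python
import math
import random

def bpso_update(x, v, pbest, gbest, w, c1=2.0, c2=2.0, v_max=6.0):
    """One BPSO step for a single particle.

    x, pbest, gbest: binary lists (the particle's position, its personal
    best, and the global best); v: real-valued velocity list.
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = random.random(), random.random()
        # velocity update, equation (1)
        vd = (w * v[d]
              + c1 * r1 * (pbest[d] - x[d])
              + c2 * r2 * (gbest[d] - x[d]))
        vd = max(-v_max, min(v_max, vd))   # clamp to keep S(v) away from 0/1
        # binary position update via the sigmoid transfer, equation (3)
        s = 1.0 / (1.0 + math.exp(-vd))
        new_x.append(1 if random.random() < s else 0)
        new_v.append(vd)
    return new_x, new_v
```

In feature selection, each bit of the position string indicates whether the corresponding feature is included in the subset.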

Fitness Function.
The purpose of feature selection is to obtain the best classification results with the fewest features. The fitness function is shown in the following equation:

fitness = (F1_High + F1_Med + F1_Low) / 3. (4)

The fitness value is the average F1-measure over the predictions of the high-risk, medium-risk, and low-risk levels of customer ECR, and its range is [0, 1]. In equation (4), F1_High, F1_Med, and F1_Low represent the F1-measure values for predicting the high-risk, medium-risk, and low-risk levels of customer ECR, respectively. In order to evaluate the performance of the model objectively, this paper introduces four evaluation criteria: accuracy, precision, recall, and F1-measure. Their definitions are as follows:

accuracy = (TP + TN) / (TP + TN + FP + FN), (5)

precision = TP / (TP + FP), (6)

recall = TP / (TP + FN), (7)

F1-measure = 2 × precision × recall / (precision + recall), (8)

where TP, FP, FN, and TN represent true positives, false positives, false negatives, and true negatives, respectively, in equations (5)-(8). In theory, the higher the values of accuracy, precision, recall, and F1-measure, the higher the fitness value and the better the predictive performance of the model.
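A minimal sketch of equations (4)-(8), computing the four criteria from the confusion counts and the fitness as the average F1-measure over the three risk levels (the function names are illustrative):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-measure, equations (5)-(8)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

def fitness(f1_high, f1_med, f1_low):
    """Fitness: average F1-measure over the three ECR risk levels, equation (4)."""
    return (f1_high + f1_med + f1_low) / 3.0
```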

Improved Hybrid Feature Selection Algorithm Based on NBPSOSEE-SBS
The framework for the risk prediction of ECR includes three processes: data preprocessing, selecting the optimal feature subset with NBPSOSEE-SBS, and classification and evaluation using the logistic regression method. The framework for feature selection of ECR risk based on the NBPSOSEE-SBS algorithm is shown in Figure 1.

Data Preprocessing.
The characteristic dimension of the original power data is low, which cannot adequately express the arrears behavior of power users. In order to solve this problem, the data are expanded from low-dimensional features to high-dimensional features by feature extraction. From the electricity data covering 21 consecutive months, the features of the training set and the test set are extracted, respectively. The training set contains the power data of 20 months, and the test set contains the power data of one month. For the data of each month, the categorical features are first transformed with one-hot encoding; then the characteristics of the previous six months are added to the encoded data and the maximum, minimum, average, median, variance, and standard deviation are calculated; finally, the original 34-dimensional features are extended to 748 dimensions. The processing of feature extraction is shown in Algorithm 1, where m is the total number of months, data_month_k is the power data of the k-th month, and dataSet is the data set after the features have been extracted.
After all the features have been calculated, the min-max standardization method is used to transform the features and map the values to [0, 1]. The min-max standardization function is shown in the following equation:

x' = (x − min) / (max − min), (9)

where x represents the feature value to be transformed, and max and min are the maximum and minimum of each feature, respectively. The expansion of the features reflects the users' historical electricity consumption behavior, as shown in Table 1 (only one of the features is taken as an example because there are too many features). The 26 features of electricity consumption behavior in the current month are expanded by this method. Then, statistical features such as the maximum, minimum, mean, variance, and standard deviation of these historical electricity consumption features are calculated. Taking the expansion of "jfjsl" (payment timeliness rate) as an example, the specific statistical analysis is shown in Table 2 (only one of the features is taken as an example).
In addition, one-hot coding method is used to transform the category features into numerical features. Finally, the original 34-dimensional features are extended to 748 dimensions.
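The preprocessing steps above (expanding one numeric feature with six-month statistics, then min-max standardization) can be sketched as follows; the helper names are illustrative and the snippet uses only the Python standard library:

```python
from statistics import mean, median, pstdev, pvariance

def expand_feature(history):
    """Expand one numeric feature (e.g. jfjsl) with statistics over the
    six monthly values preceding the forecast month, as in Algorithm 1."""
    window = history[-6:]   # the six months before the forecast month
    return {
        "max": max(window), "min": min(window),
        "mean": mean(window), "median": median(window),
        "variance": pvariance(window), "std": pstdev(window),
    }

def min_max(values):
    """Min-max standardization mapping each value to [0, 1], equation (9)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]
```

Applying `expand_feature` to each of the 26 consumption features, together with one-hot encoding of the 8 categorical features, yields the 748-dimensional representation described above.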

Improved NBPSOSEE Algorithm
The inertia weight w is one of the most important adjustable parameters of the BPSO algorithm, and its value plays an important role in the performance of the algorithm. A small w favors local search of the current search area, so a more accurate solution can be obtained and convergence is facilitated, but it is difficult to jump out of local extremum points; a large w favors global search, but it is not easy to obtain an accurate solution. Literature [60] shows that linear optimization of the inertia weight can improve the performance of the algorithm, but this strategy cannot effectively match the optimization process. Therefore, in order to follow the actual evolutionary state of the algorithm more closely, this paper performs nonlinear incremental optimization of the inertia weight. In each iteration, the inertia weight w is calculated by equation (10), where t and T represent the current iteration number and the maximum number of iterations, respectively. As the number of iterations increases, the inertia weight grows nonlinearly. The improved algorithm thus has a smaller w in the early stage of the search, when the particles have stronger local search capability, and a larger w in the later stage, when the particles have stronger global search ability. Furthermore, in order to enhance the optimization performance of PSO, two new contraction factors are introduced into the position-updating equation of the NBPSOSEE algorithm. Clerc and Kennedy [64] proposed a particle swarm optimization algorithm with a contraction factor (CFPSO) in 2002, which is intended to improve the convergence speed while escaping local optima. The algorithm flow of CFPSO is similar to that of the original PSO, but the velocity-updating formula of the particles is different.
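Since the exact nonlinear update of equation (10) is not reproduced in this text, the sketch below assumes a simple quadratic schedule for illustration; it captures the stated behavior (a small w early for local search, growing nonlinearly toward w_max = 0.9 for global search), but the assumed formula is not the paper's:

```python
def inertia_weight(t, T, w_min=0.4, w_max=0.9):
    """Nonlinearly increasing inertia weight (assumed quadratic schedule;
    the paper's exact equation (10) is not shown in this text)."""
    return w_min + (w_max - w_min) * (t / T) ** 2
```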
CFPSO uses the contraction factor to compress the particle velocity after each update, which changes not only the influence of the particle's own historical velocity but also the influence of the historical optimal positions on the particle velocity, so as to improve the convergence speed of the population. However, CFPSO has some drawbacks. If the value of the contraction factor is too large, the convergence performance is poor and PSO approaches random search. If the value of the contraction factor is too small, PSO converges prematurely and the accuracy of classification decreases. In order to solve this problem, the two dynamic contraction factors ϖ1 and ϖ2 are used to compress the velocity and the position after each update, respectively, in equations (11) and (12). By adding a contraction factor to the velocity, NBPSOSEE improves the inheritance ability, self-recognition ability, and social cognitive ability of the particles.

Algorithm 1: Feature extraction
(1) Input: the original data set
(2) Output: the data set after feature extraction
(3) Get the training set;
(4) Get the test set;
(5) Create an empty data table dataSet;
(6) for k = 7 to m do // m is the total number of months
(7)   data_month_k ← get the power data of the k-th month;
(8)   one-hot encode the categorical features in data_month_k;
(9)   for j = 1 to 6 do // the (k−6)-th to (k−1)-th months
(10)    data_month_(k−j) ← get the power data of the (k−j)-th month;
(11)    data_month_k ← add the numeric features of data_month_(k−j) to data_month_k;
(12)  end for
(13)  data_month_k ← calculate the maximum, minimum, mean, median, variance, and standard deviation;
(14)  dataSet ← dataSet ∪ data_month_k; // merge the extended data sets
(15) end for
NBPSOSEE also introduces a contraction factor into the position update, which improves the quality of the particle position. In NBPSOSEE, the two dynamic contraction factors enhance the performance of exploration and exploitation and improve the convergence speed. These two dynamic contraction factors produce a nonlinear variation of the particle velocity and position. The parameters ϖ1 and ϖ2 are calculated by equations (13) and (14), where PV is the position value of the particle, t is the current iteration, and τ is a constant (τ = 2). Then, in NBPSOSEE, the two mechanisms of shrinking encircling and exploration improve the search ability of the population. Firstly, the moving position of the particle is determined between the current position and the target position via the shrinking encircling operation, which shortens the search range of the particles and thus enhances the local search ability of the population. In addition to the shrinking encircling strategy, NBPSOSEE adopts a random search mechanism to improve the diversity of particles. When updating the particle position, the update depends on the coefficient A: if A exceeds the range [−1, 1], the distance coefficient D is updated randomly. In order to find the target, the particles traverse away from the original target direction, giving the population global search capability. In NBPSOSEE, the particle position is updated by equation (18).
NBPSOSEE adopts the dynamic contraction strategy and applies the shrinking encircling operation and the exploration mechanism with some probability, which not only helps the search escape local optima but also accelerates convergence.

Table 1 (excerpt). jfjsl_2 to jfjsl_6 denote the jfjsl of the second to sixth months before the forecast month, respectively.

In equations (16) and (17), C represents a coefficient variable, and j is a randomly searched target (j ≠ i). In equation (14), t represents the current number of iterations, A represents a coefficient variable, gbest_d^t is defined as the optimal position of the current target d, and p represents a random value in [0, 1]. The coefficient variables A and C are calculated separately as follows:

A = 2a * r − a, (19)

C = 2 * r, (20)

where a is a variable that decreases linearly within [0, 2] and is updated in the form a = 2 − 2 * (t/T), t is the current iteration number, T represents the maximum iteration number, and r is a random value in [0, 1].
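Assuming the standard WOA-style coefficient updates that the text describes (a decreasing linearly from 2 to 0, with A and C from equations (19) and (20)), the per-iteration coefficients can be sketched as:

```python
import random

def shrink_coefficients(t, T):
    """Coefficient variables for the shrinking-encircling / exploration
    mechanism, following equations (19) and (20)."""
    a = 2 - 2 * (t / T)     # linearly decreasing within [0, 2]
    r = random.random()     # random value in [0, 1]
    A = 2 * a * r - a       # A outside [-1, 1] triggers random exploration
    C = 2 * r
    return a, A, C
```

Early in the search, a is close to 2, so |A| often exceeds 1 and the particles explore randomly; late in the search, a approaches 0, A stays within [−1, 1], and the shrinking-encircling (local) phase dominates.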

SBS Algorithm.
Since the feature subset selected by NBPSOSEE still contains redundant features, and the features differ in importance and are arranged in no meaningful order, this paper first uses the feature selection method of random forest to rank the importance of the features selected by NBPSOSEE and then uses the SBS algorithm to delete the least important features in turn.
For each node of a random forest decision tree, features are randomly drawn from the d-dimensional feature set.
Then, according to the Gini gain maximization principle [65], a feature is selected to divide the data on the node into left and right child nodes, which means that the data on the parent node n_p are divided into its child nodes n_l and n_r. Gini gain maximization maximizes the following equation:

ΔI_G(n_p) = I_G(n_p) − p_l * I_G(n_l) − p_r * I_G(n_r), (21)

where the Gini index of a node n is

I_G(n) = 1 − Σ_c p_c^2, (22)

p_c is the proportion of class-c samples in node n, and p_l and p_r are the fractions of the data divided into the left and right child nodes n_l and n_r by the parent node n_p, respectively. The importance of a feature at a node is its Gini gain:

imp(f_i, n) = ΔI_G(n). (23)

If N is the set of nodes in which the feature f_i appears as the node-partitioning attribute in the k-th decision tree, the importance of the feature in that tree is calculated as

imp_k(f_i) = Σ_(n∈N) ΔI_G(n). (24)

Assuming that there are K trees in the random forest, the importance of the feature f_i in the random forest can be calculated as

imp(f_i) = (1/K) Σ_(k=1..K) imp_k(f_i), (25)

where K is the number of decision trees in the random forest. After NBPSOSEE obtains the optimal feature subset, SBS starts searching: it calculates the fitness value obtained when each feature is deleted separately and then selects the feature subset with the best fitness value for the next iteration. The iterative steps of the algorithm in the SBS stage are as follows:

Step 1: determine the initial optimal feature subset bestSubFs of the SBS stage.
Step 2: delete the feature f in the current feature subset that satisfies the following equation:

f = argmax_(f ∈ bestSubFs_t) fitness(bestSubFs_t − f), (26)

where bestSubFs_t − f denotes the feature subset after bestSubFs_t removes feature f, t is the number of iterations, and fitness is the fitness value. In this paper, the larger the fitness value, the better the selected feature subset.
Step 3: update the optimal feature subset and the number of iterations:

bestSubFs_(t+1) = bestSubFs_t − f, t = t + 1. (27)

Step 4: repeat Steps 2 and 3 until the termination condition is met.
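Steps 1-4 can be sketched as a small Python routine; `evaluate` stands in for the fitness computed with the classifier, and the stopping rule used here (stop when every deletion hurts fitness) is one illustrative choice of termination condition:

```python
def sbs(features, evaluate, min_size=1):
    """Sequential backward selection over a starting feature subset.

    `evaluate(subset)` returns a fitness value (larger is better). At each
    iteration the single feature whose removal keeps fitness highest is
    deleted, per equations (26)-(27), until removal no longer helps.
    """
    best = list(features)
    best_fit = evaluate(best)
    while len(best) > min_size:
        # try deleting each feature separately; keep the best candidate
        candidates = [[g for g in best if g != f] for f in best]
        fits = [evaluate(c) for c in candidates]
        i = max(range(len(fits)), key=fits.__getitem__)
        if fits[i] < best_fit:   # every deletion hurts fitness: terminate
            break
        best, best_fit = candidates[i], fits[i]
    return best, best_fit
```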
The feature selection process of NBPSOSEE-SBS is shown in Algorithm 2, where maxIterations is the maximum number of iterations, swarmSize is the number of particles in the population, dimension is the dimension of each particle, and fitness is the fitness value.

Computational Complexity of NBPSOSEE-SBS
Here, T is the computing time of the SBS method. It can be seen that the computational complexity of the NBPSOSEE-SBS algorithm is noticeably higher than that of the BPSO and CBPSO algorithms. However, R_4, R_5, and R_6 involve only simple numerical operations according to equations (10), (13), (14), (19), and (20). Furthermore, a large number of redundant or irrelevant features are already deleted by the NBPSOSEE algorithm, so the SBS method spends little time deleting the remaining unimportant features. Therefore, the proposed NBPSOSEE-SBS algorithm does not significantly increase the computational complexity.

Data.
In this paper, the data set from January 2017 to September 2018 is provided by a power enterprise, including the electricity consumption data of 11,860 high-voltage users who have been in arrears. According to the users' past payment behavior, the power enterprise divides the users into high-risk, medium-risk, and low-risk arrears classes. A total of 34 features are used for data processing and model training.
There are 8 categorical features that represent basic information about these users, while 26 statistical features describe their electricity consumption.

Experimental Results.
In order to prove the effectiveness and superiority of the proposed algorithm, two groups of comparative experiments are set up, using the logistic regression model [66,67] to realize the risk prediction of ECR for power customers in June, July, and August 2018. The first group of experiments verifies the effectiveness of NBPSOSEE. The second group of experiments proves the superiority of the proposed hybrid feature selection algorithm based on NBPSOSEE with SBS, called NBPSOSEE-SBS. The relevant parameters selected for NBPSOSEE-SBS are listed in Table 3. A population size of 20 to 40 is optimal for optimization problems [68], and the population size is generally set to 20 [46,58,60]; hence, 20 particles form the particle swarm in this paper. The maximum number of iterations is 100; the maximum inertia weight w_max is 0.9 and the minimum w_min is 0.4; and the learning factors are c1 = c2 = 2.
Algorithm 2: NBPSOSEE-SBS feature selection
(1) Input: the data set after feature extraction
(2) Output: the best feature subset
(3) Get the training set and test set after the features have been extracted;
(4) Initialize the population: the initial positions, velocities, and fitness values of the particles, as well as the local optimum pbest and global optimum gbest;
(5) for k = 1 → maxIterations do
(6)   for i = 1 → swarmSize do
(7)     update the inertia weight using equation (10);
(8)     for j = 1 → dimension do
(9)       update a and p;
(10)      update A and C using equations (19) and (20), respectively;
(11)      update ϖ1^t and ϖ2^t using equations (13) and (14), respectively;
(12)      update the velocity of the particle using equation (11);
(13)      update the position of the particle using equation (18);
(14)    end for
(15)  end for
(16)  for i = 1 → swarmSize do
(17)    update the fitness value fitness_i;
(18)    update the local optimum pbest;
(19)  end for
(20)  update the global optimum gbest;
(21) end for
(22) Get the optimal feature subset selected by NBPSOSEE;
(23) Delete a feature f in the current feature subset using equation (26);
(24) Update the optimal feature subset using equation (27);
(25) Repeat steps 23 and 24 until the termination condition is met;

The logistic regression model is trained on the training set with the optimal feature subset selected by the NBPSOSEE-SBS algorithm. The logistic regression model outputs the default probability p ∈ [0, 1] of each user on the test set. Then, according to an appropriate threshold θ, users are divided into three levels: high risk, medium risk, and low risk. The specific division principles are shown in Table 4: users with a default probability p ≥ 70% are assigned to the high-risk level; users with an arrears probability of 40% ≤ p < 70% are assigned to the medium-risk level; and users with an arrears probability of p < 40% are assigned to the low-risk level.
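The threshold-based division in Table 4 can be sketched as a simple mapping (the threshold parameter names are illustrative):

```python
def risk_level(p, high=0.70, medium=0.40):
    """Map a predicted default probability p in [0, 1] to an ECR risk
    level using the Table 4 thresholds: high if p >= 0.70, medium if
    0.40 <= p < 0.70, low otherwise."""
    if p >= high:
        return "high"
    if p >= medium:
        return "medium"
    return "low"
```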
Figure 2 shows the test results of the improved NBPSOSEE for ECR risk in June 2018. Compared with the other eight feature selection algorithms, NBPSOSEE obtains the maximum fitness value with the fewest features. In terms of the number of features and the fitness value, HI-BQPSO, LBPSOSEE, and NBPSOSEE perform significantly better than the other six algorithms, with NBPSOSEE the best, LBPSOSEE second, and HI-BQPSO third. BABC obtains the lowest fitness value; the fitness values calculated by HI-BQPSO, LBPSOSEE, and NBPSOSEE are 5.77%, 7.34%, and 11.61% higher than that of BABC, respectively, and the fitness value of NBPSOSEE is 5.52% and 3.98% higher than those of HI-BQPSO and LBPSOSEE, respectively. Furthermore, in selecting the feature subset, BABC selects the most features, while HI-BQPSO, LBPSOSEE, and NBPSOSEE select 242, 216, and 205 features, respectively. NBPSOSEE selects the fewest features, deleting 190, 179, and 153 fewer features than HI-BQPSO, LBPSOSEE, and BABC, respectively. NBPSOSEE selects approximately one-third of the features from the original feature set, removing a total of 543 redundant or unrelated features. In addition, BWOA, BABC, BGA, BGWO, BPSO, CBPSO, HI-BQPSO, and LBPSOSEE find their optimal fitness values after 97, 37, 61, 84, 68, 35, 48, and 43 iterations, respectively. NBPSOSEE obtains an optimal fitness value within 15 iterations and continues to find better fitness values after 17, 20, and 98 iterations. In the initial search, NBPSOSEE quickly raises the initial fitness value to a very high level, which reflects the convergence performance of the proposed algorithm. The search results show that the proposed NBPSOSEE retains effective global search ability after falling into a local optimum and can continue to search for a better feature subset later in the search. Figure 3 shows the test results of the proposed NBPSOSEE and the other feature selection algorithms for ECR risk in July.
The fitness values calculated by the proposed NBPSOSEE and the comparison algorithms increase with the number of iterations, while the number of features selected by NBPSOSEE decreases. LBPSOSEE and NBPSOSEE obtain higher fitness values and select fewer features than the other seven algorithms.
The fitness values calculated by LBPSOSEE are 2.21%, 4.17%, 6.94%, 4.41%, 5.60%, 4.41%, and 1.09% higher than those of BWOA, BABC, BGA, BGWO, BPSO, CBPSO, and HI-BQPSO, respectively. Moreover, the number of features selected by LBPSOSEE is 122, 46, 106, 51, 80, 105, and 59 fewer than those of the seven algorithms. From these results, it can be seen that LBPSOSEE is better at avoiding local optimal solutions than these seven comparison algorithms. After 58 iterations, the fitness value of NBPSOSEE is 0.939, and the selected optimal feature subset contains 213 features. Furthermore, the fitness value calculated by NBPSOSEE is 1.62% higher than that of LBPSOSEE, and the number of features selected is 69 fewer than that of LBPSOSEE. This proves that the global search ability of NBPSOSEE is stronger than that of LBPSOSEE, and that the performance of NBPSOSEE is significantly better than that of the other feature selection algorithms. Figure 4 shows the test results of the improved NBPSOSEE and the other eight comparison algorithms for ECR risk in August. The search capabilities of LBPSOSEE and NBPSOSEE are significantly higher than those of the other algorithms: the optimal fitness values obtained by LBPSOSEE and NBPSOSEE keep increasing, while the numbers of selected features keep decreasing.
The optimal fitness value calculated by LBPSOSEE is 0.957, which is 2.46%, 3.12%, 1.27%, 1.70%, 7.71%, 2.79%, and 0.95% higher than those of BWOA, BABC, BGA, BGWO, BPSO, CBPSO, and HI-BQPSO, respectively. LBPSOSEE selects 172 features, which is 169, 144, 208, 137, 175, 184, and 111 fewer than the seven algorithms, respectively. Moreover, the optimal feature subset selected by NBPSOSEE contains only 102 features, with an optimal fitness value of 0.977. This verifies that the convergence speed and global search ability of NBPSOSEE are superior to those of the other algorithms and also demonstrates the effectiveness and stability of the proposed NBPSOSEE.

Experimental Results of NBPSOSEE-SBS.
Although the results obtained by NBPSOSEE are improved, the feature subset selected by NBPSOSEE still contains many redundant or irrelevant features. Therefore, a hybrid feature selection algorithm, namely, NBPSOSEE-SBS, is proposed in this paper to select the optimal feature subset; it can not only effectively reduce the number of features but also improve the accuracy. Figure 5 shows the test results of the improved NBPSOSEE-SBS for ECR risk in June 2018. As redundant or irrelevant features are successively deleted, the fitness values calculated by NBPSOSEE-SBS and the other eight feature selection algorithms show an increasing trend. The optimal fitness value calculated by BWOA-SBS is 3.01% higher than that of BWOA, and it deletes 9 more redundant features than BWOA. The best fitness value calculated by BABC-SBS is 0.24% higher than that of BABC, with 6 fewer selected features. The optimal fitness value obtained by BGWO-SBS is 1.29% higher than that of BGWO, with 21 fewer selected features. The optimal fitness value obtained by BPSO-SBS is 1.9% higher than that of BPSO, with 12 fewer selected features. The optimal fitness value obtained by CBPSO-SBS is 0.7% higher than that of CBPSO, and it deletes 9 more redundant features than CBPSO. The best fitness value calculated by HI-BQPSO-SBS is 3.54% higher than that of HI-BQPSO, and it deletes 35 more redundant features than HI-BQPSO. The best fitness value of LBPSOSEE-SBS is 0.79% higher than that of LBPSOSEE, with 14 fewer selected features. In addition, the optimal fitness value obtained by NBPSOSEE-SBS is 0.54% higher than that of NBPSOSEE, with 11 fewer selected features. The experimental results show that NBPSOSEE-SBS obtains the highest fitness value and removes the most redundant or irrelevant features.
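The SBS refinement stage (steps 23–25 of Algorithm 2) can be sketched as a greedy backward elimination. This is a hedged sketch only: the paper's equations (26) and (27) are not reproduced here, `evaluate` is a placeholder for the wrapper's classifier-based fitness, and the termination rule shown (stop when no single deletion keeps or raises the fitness) is one common choice, not necessarily the authors' exact condition.

```python
def sbs(features, evaluate):
    """Sequential backward selection: greedily delete one feature at a
    time while the wrapper fitness does not decrease.
    `evaluate(subset)` stands in for the fitness of eqs. (26)-(27)."""
    current = list(features)
    best_score = evaluate(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        # Score every single-feature deletion (step 23).
        scores = {f: evaluate([g for g in current if g != f]) for f in current}
        candidate = max(scores, key=scores.get)
        # Keep the deletion only if fitness does not drop (steps 24-25).
        if scores[candidate] >= best_score:
            current.remove(candidate)
            best_score = scores[candidate]
            improved = True
    return current, best_score

# Toy usage: a fitness that rewards keeping features 0 and 2 and
# penalizes subset size, so SBS should shrink the subset to [0, 2].
target = {0, 2}
fitness = lambda s: len(target & set(s)) - 0.1 * len(s)
subset, score = sbs(range(5), fitness)
print(subset)  # → [0, 2]
```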
Figure 6 shows the test results based on NBPSOSEE-SBS and the other feature selection algorithms for ECR risk in July. Figure 7 shows the test results of the improved NBPSOSEE-SBS for ECR risk in August.
The results obtained by NBPSOSEE-SBS are significantly better than those of the other algorithms: NBPSOSEE-SBS deletes the most redundant or irrelevant features and obtains the highest fitness value. The optimal fitness value obtained by NBPSOSEE-SBS is increased by 4.45%, 5.79%, 4.34%, 4.23%, 6.71%, 5.45%, 3.79%, and 2.60% compared with the other eight algorithms, respectively, and the number of features selected by NBPSOSEE-SBS is reduced by 256, 236, 299, 227, 259, 280, 198, and 85, respectively. In the result tables, the columns denote the selected number of features, accuracy, precision, recall, and F1-measure, respectively. Table 5 shows the test results of the improved NBPSOSEE-SBS and the comparison algorithms for ECR risk in June 2018. For high-risk arrears users, the accuracy, precision, recall, and F1-measure calculated by all comparison algorithms are 100%. This is because the number of high-risk arrears users is small, and the characteristics reflecting arrears behavior are obvious. Among the arrears users of the medium risk level, HI-BQPSO, NBPSOSEE, BWOA-SBS, BGA-SBS, BGWO-SBS, HI-BQPSO-SBS, and the proposed NBPSOSEE-SBS achieve the highest precision, reaching 100%. The proposed NBPSOSEE-SBS obtains the highest accuracy, precision, and F1-measure for the arrears users of the low risk level; the accuracy, precision, and F1-measure obtained by NBPSOSEE-SBS are 0.1%, 1.24%, and 0.67% higher than those of NBPSOSEE, respectively. Moreover, the proposed NBPSOSEE-SBS removes the most redundant or irrelevant features and achieves the highest average F1-measure. Table 6 shows the test results of the improved NBPSOSEE-SBS and the comparison algorithms for ECR risk in July 2018. The results calculated using all features are the worst, while the results of the proposed NBPSOSEE-SBS are significantly better than those of the other algorithms. Among the arrears users of the medium risk level, the accuracy and F1-measure of NBPSOSEE-SBS are higher than those of the other algorithms, except for CBPSO. Furthermore, NBPSOSEE-SBS achieves a precision of 100%.
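The per-level precision, recall, and F1-measure reported in Tables 5 and 6 follow the standard one-vs-rest definitions. The following is a minimal sketch of those definitions, not the authors' evaluation code; `prf1` is a hypothetical helper name.

```python
def prf1(y_true, y_pred, level):
    """Precision, recall, and F1-measure for one risk level,
    treating `level` as the positive class (one-vs-rest)."""
    tp = sum(t == level and p == level for t, p in zip(y_true, y_pred))
    fp = sum(t != level and p == level for t, p in zip(y_true, y_pred))
    fn = sum(t == level and p != level for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels for five users, scored on the "low" risk level.
y_true = ["high", "low", "medium", "low", "low"]
y_pred = ["high", "low", "low", "low", "medium"]
p, r, f = prf1(y_true, y_pred, "low")
```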
The execution time of an algorithm is also an important indicator of its performance: a long running time indicates high algorithmic complexity, and a short running time indicates low complexity. The running times of the proposed NBPSOSEE, NBPSOSEE-SBS, and the other algorithms are shown in Figures 8 and 9. In order to remove more redundant features and improve the classification results, the hybrid feature selection algorithms with SBS take slightly more execution time than those without SBS. The execution time of NBPSOSEE-SBS is not significantly different from that of the other algorithms, and its execution speed is faster than that of BWOA-SBS, BABC-SBS, BGA-SBS, BGWO-SBS, BPSO-SBS, CBPSO-SBS, HI-BQPSO-SBS, and LBPSOSEE-SBS in the tests for July and August. In summary, the proposed NBPSOSEE-SBS outperforms the other algorithms: it can effectively reduce a large number of redundant features and stably improve the prediction results while keeping the execution time low.

Conclusion
To accurately predict the risk of ECR for power customers, a nonlinear inertia weight binary particle swarm optimization with shrinking encircling and exploration mechanism (NBPSOSEE) is proposed for feature selection. In addition, an improved hybrid feature selection approach based on NBPSOSEE and SBS, namely, NBPSOSEE-SBS, is proposed to select the optimal feature subset. The experimental results prove that, compared with one state-of-the-art optimization algorithm and seven well-known wrapper-based feature selection approaches, the proposed NBPSOSEE-SBS can steadily remove more redundant or irrelevant features and obtain better prediction results of ECR for power customers while keeping the execution time low.

Data Availability
The experimental data contain users' private information. Therefore, in order to protect the security of users, we cannot upload the data set. The file NBPSOSEE_SBS_results.csv contains the calculation results of the NBPSOSEE-SBS algorithm. In the header of these CSV files, features_num denotes the number of selected features. Accuracy_high, Loss_high, Precission_high, Recall_high, and F1_high represent the accuracy, loss, precision, recall, and F1-measure for predicting the high-risk level of customer electric charge recovery, respectively; Accuracy_med, Loss_med, Precission_med, Recall_med, and F1_med represent the corresponding metrics for the medium-risk level; and Accuracy_low, Loss_low, Precission_low, Recall_low, and F1_low represent the corresponding metrics for the low-risk level. (Supplementary Materials)