Feature Selection Based on Adaptive Particle Swarm Optimization with Leadership Learning

With the rapid development of the Internet of Things (IoT), the curse of dimensionality becomes increasingly common. Feature selection (FS) eliminates irrelevant and redundant features from datasets. Particle swarm optimization (PSO) is an efficient metaheuristic algorithm that has been successfully applied to obtain the optimal feature subset with essential information in an acceptable time. However, it easily falls into local optima when dealing with high-dimensional datasets due to constant parameter values and insufficient population diversity. In this paper, an FS method is proposed by utilizing adaptive PSO with leadership learning (APSOLL). An adaptive updating strategy for parameters is used to replace the constant parameters, and the leadership learning strategy is utilized to provide valid population diversity. Experimental results on 10 UCI datasets show that APSOLL has better exploration and exploitation capabilities through comparison with PSO, grey wolf optimizer (GWO), Harris hawks optimization (HHO), flower pollination algorithm (FPA), salp swarm algorithm (SSA), linear PSO (LPSO), and hybrid PSO and differential evolution (HPSO-DE). Moreover, less than 8% of the features in the original datasets are selected on average, and the feature subsets are more effective in most cases compared to those generated by 6 traditional FS methods (analysis of variance (ANOVA), Chi-Squared (CHI2), Pearson, Spearman, Kendall, and Mutual Information (MI)).


Introduction
Large amounts of data have been generated in various fields such as social media, healthcare, cybersecurity, and education over the past decades, and edge computing provides an effective solution for data storage and transmission. However, as the dimensionality of the data increases, the curse of dimensionality becomes common, which has a negative impact on the stability, security, and computational efficiency of edge computing. Feature selection (FS) is a data preprocessing technique in machine learning and data mining that has been applied to improve the performance of edge computing by eliminating irrelevant and redundant features from datasets [1][2][3]. In general, it is a combinatorial optimization problem [4,5] that tries to find the optimal feature subsets with essential information from the original datasets. Given a dataset with N features, there are 2^N possible feature subsets, and the search space grows exponentially as the number of features increases [6,7]. Hence, some traditional FS methods have received considerable interest due to their ability to evaluate feature importance and select a certain number of top-ranked features. These methods include statistical tests (e.g., analysis of variance (ANOVA) [8,9] and Chi-Squared (CHI2) [10,11]), correlation criteria (e.g., Pearson [12], Spearman [13,14], and Kendall [15,16]), and information theory (e.g., symmetrical uncertainty (SU) [17], mutual information (MI) [18,19], and entropy [20]). However, the statistical test and correlation criteria techniques only consider the correlation between features and labels, so the resulting feature subsets are not appropriate because some highly correlated but redundant features are selected. As a result, information theory techniques are applied to FS problems owing to their additional consideration of redundancy between features.
Moreover, the redundancy calculation only focuses on the interaction between two features and fails to identify that among multiple features [21], which may cause some important features to be ignored. Therefore, how to find suitable feature subsets efficiently needs to be further investigated.
Metaheuristic algorithms such as monarch butterfly optimization (MBO) [22], slime mold algorithm (SMA) [23], moth search algorithm (MSA) [24], hunger games search (HGS) [25], hybrid rice optimization (HRO) [26], colony predation algorithm (CPA) [27], weighted mean of vectors (INFO) [28], grey wolf optimizer (GWO) [29], clonal flower pollination algorithm (FPA) [30], salp swarm algorithm (SSA) [31], Harris hawks optimization (HHO) [32], and particle swarm optimization (PSO) have been used to solve combinatorial optimization problems because of their dynamic exploration and exploitation capabilities in the search space, and some of them have been shown to be successful on FS problems [33,34]. For instance, Shen and Zhang [29] proposed a two-stage GWO for processing biomedical datasets, which showed better performance in terms of time consumption and classification accuracy by removing more than 95.7% of the redundant features. Hussain et al. [32] developed an FS method based on HHO, which removed 87% of the features and achieved 92% classification accuracy. Yan et al. [30] presented a binary clonal FPA for biomedical datasets, which enhanced population diversity and selected fewer features with strong robustness. Balakrishnan et al. [31] designed an FS method based on SSA, which increased the ability of particles to explore different regions by randomly updating their positions and improved the confidence level by 0.1033% on 6 datasets. However, a series of parameters need to be set by users in these metaheuristic algorithms, and unsuitable parameters may lead to slow convergence and local stagnation. Extensive experiments and experience are needed to find appropriate parameter settings.
Compared with the above metaheuristic algorithms, PSO is applied to solve FS problems because of its fast convergence and few parameters. However, its exploration and exploitation capabilities are influenced by the parameter settings and population diversity as the number of features increases. Therefore, several improved PSO variants based on parameter updating and population diversity updating strategies have been proposed for FS. For example, Song et al. [35] developed a three-phase hybrid FS algorithm, which reduced the computational cost by using correlation-guided clustering and an improved integer PSO. Tran et al. [36] used a bare-bones PSO for FS, which reduced the search space of the problem and improved the search efficiency. Song et al. [37] also introduced a variable-size cooperative coevolutionary PSO for high-dimensional datasets, which divided a high-dimensional FS problem into multiple low-dimensional subproblems with a low computational cost. Hu et al. [38] presented a multi-objective PSO for FS, which achieved superior performance in approximation, diversity, and feature cost by introducing a tolerance coefficient. Hosseini Bamakan et al. [39] proposed a time-varying PSO-based FS method to deal with the network intrusion detection problem, which obtained a higher detection rate and lower false alarm rate by introducing a chaotic concept and time-varying parameters. Mafarja et al. [40] proposed a binary PSO-based FS method, which adopted a time-varying inertia weighting strategy and showed a superior convergence rate on some datasets. Huang et al. [41] utilized cut-points and feature discretization to expand the search scope of PSO for gene expression datasets, which selected fewer features and maintained similar classification accuracy. Xue et al. [42] introduced adaptive parameters into PSO for high-dimensional datasets, which allowed particles to automatically adjust parameters during the search process and decreased time consumption.
Moradi and Gholampour [43] used a PSO with a local search strategy for high-dimensional datasets, which adjusted the search process by considering the correlation information between distinct features. Chen et al. [44] introduced an FS method based on hybrid PSO and differential evolution (HPSO-DE), which enhanced population diversity by adopting mutation, crossover, and selection operators. Although the optimization ability of PSO is improved to some extent by the above techniques, the randomness of the search process may be increased, and these methods lack mechanisms for jumping out of local optima.
In this paper, an FS method based on adaptive PSO with leadership learning (APSOLL) is proposed, which combines parameter updating and population diversity updating strategies to compensate for the shortcomings of PSO. The adaptive updating strategy for parameters is used to guide particles to search in a more reasonable scope, and the leadership learning strategy is utilized to enhance population diversity. Overall, the main contributions of our work are as follows: (1) Based on the population state, an adaptive updating strategy for parameters is proposed to replace the constant parameters, which guides particles to search in a more reasonable scope. (2) A leadership learning strategy is adopted to provide valid population diversity by learning from the first three leaders in the population, which enhances the exploration and exploitation capabilities of PSO.

Overview of PSO.
PSO is a population-based metaheuristic algorithm that simulates the predatory activities of bird and fish populations [45,46], and each particle in the population has two properties: a velocity vector v_i = (v_i1, v_i2, ..., v_id) and a position vector x_i = (x_i1, x_i2, ..., x_id), where d denotes the dimension. In the search process of PSO, the velocity vectors are dynamically adjusted by the personal best position (pbest_i) and the global best position (gbest) at the current stage, and the position vectors are the candidate solutions to the optimization problem, all of which are updated by equations (1)-(2).

Computational Intelligence and Neuroscience
where v_i and x_i represent the velocity and position vectors of the i-th (i = 1, 2, ..., N) particle, and the upper and lower limits of each dimension are set to 1 and 0, respectively. ω is the inertia parameter, a non-negative number. c_1 and c_2 are acceleration parameters: the former is the personal learning parameter and the latter the global learning parameter; they control the search scope of particles and are set by users. r_1 and r_2 are random numbers in [0, 1].
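For reference, the standard PSO update rules that equations (1)-(2) refer to can be written with the symbols defined above as:

```latex
v_i(t+1) = \omega\, v_i(t) + c_1 r_1 \left( pbest_i - x_i(t) \right) + c_2 r_2 \left( gbest - x_i(t) \right) \quad (1)
x_i(t+1) = x_i(t) + v_i(t+1) \quad (2)
```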

The Leadership Learning Strategy.
The leadership learning strategy is a management concept that describes the dynamic process of feed-forward and feedback in a living system. Hirst et al. [47] suggested that the learning activities of individuals affect the decisions of leaders, which is called the feed-forward learning flow. Moreover, effective leaders may quickly identify key information in group development and, through their decisions, have a lasting impact on individuals and group activities in turn, which is regarded as the feedback learning flow. In the model of the leadership learning strategy, the feed-forward and feedback learning flows among individuals, groups, and leaders together determine the scope of system development, and the framework is shown in Figure 1. Based on the leadership learning strategy, GWO was proposed with effective exploration capability and acceptable time consumption by learning from the first three best solutions (leaders) of each iteration [48][49][50][51]. In the search process, the population is divided into four levels, sequentially α, β, δ, and ω, where α, β, and δ are regarded as leaders, the remaining particles ω are considered individuals, and the population is considered the group. Moreover, the particles and leaders learning from each other constitute the leadership learning strategy, which is shown in Equation (3).
where D_α, D_β, and D_δ denote the distances between particles and leaders. C_1, C_2, and C_3 are random numbers from 0 to 2. The search scope of particles is controlled by the convergence factor A, which is computed as Equation (4).
where the variable a = 2(1 − t/T) is the control coefficient (T denotes the maximum number of iterations), and it decreases linearly from 2 to 0 during the search process.
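For reference, the standard GWO formulation behind Equations (3)-(4), consistent with the definitions above (distances to the three leaders, C_1, C_2, C_3 in [0, 2], and the convergence factor A driven by the control coefficient a), is:

```latex
\vec{D}_{\alpha} = \left| \vec{C}_1 \cdot \vec{X}_{\alpha} - \vec{X} \right|, \qquad
\vec{D}_{\beta}  = \left| \vec{C}_2 \cdot \vec{X}_{\beta}  - \vec{X} \right|, \qquad
\vec{D}_{\delta} = \left| \vec{C}_3 \cdot \vec{X}_{\delta} - \vec{X} \right|
\vec{X}_1 = \vec{X}_{\alpha} - \vec{A}_1 \cdot \vec{D}_{\alpha}, \qquad
\vec{X}_2 = \vec{X}_{\beta}  - \vec{A}_2 \cdot \vec{D}_{\beta}, \qquad
\vec{X}_3 = \vec{X}_{\delta} - \vec{A}_3 \cdot \vec{D}_{\delta}
\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3} \quad (3)
\vec{A} = 2a\,\vec{r} - a, \qquad \vec{C} = 2\,\vec{r}' \quad (4)
```

where r and r' are random vectors in [0, 1].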

The Proposed Method
In this section, an FS method based on APSOLL is presented to conduct classification on 10 UCI datasets. The corresponding techniques for the proposed method are described as follows:

Adaptive Updating Strategy for Parameters.
During the search process of PSO, the search scope of particles is affected by the acceleration parameters c_1 and c_2. In general, they are usually less than 2 and set to constant values by users [52][53][54]. However, since the population state changes dynamically according to the optimal fitness value, it is more appropriate to adaptively adjust c_1 and c_2 for better exploration and exploitation. Moreover, the change of the fitness value during the iterations reflects the state of the population, so the adaptive updating strategy is proposed based on this observation and used to replace the constant parameters, as shown in equations (5)-(6).
where m is a counter initially set to 0; it is increased by 1 if the fitness value is not improved in the next iteration, and reset to 0 otherwise. Thus, c changes dynamically between 1 and 2 during the search process, and it gradually increases if the algorithm falls into a local optimum.
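As a concrete illustration, here is a minimal sketch of the stagnation counter and the adaptive parameter. The exact form of equations (5)-(6) is not reproduced in this text, so the mapping from m to c below is a hypothetical one that merely satisfies the stated properties: c stays between 1 and 2 and grows while the best fitness stagnates.

```python
def update_counter(m, improved):
    # Reset the stagnation counter when the best fitness improves,
    # otherwise let it grow (mirrors the behaviour described above).
    return 0 if improved else m + 1

def adaptive_c(m):
    # Hypothetical mapping: c = 1 at m = 0 and approaches 2 as the
    # algorithm stagnates, keeping c between 1 and 2 as stated.
    return 2.0 - 1.0 / (1.0 + m)
```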

The Search Process of the Leadership Learning Strategy.
The population diversity of PSO may be inadequate due to the strategy of learning only from pbest_i and gbest. Smith [55] proposed that the more the leaders of individuals engage in feed-forward and feedback in a living system, the more possible it is for the group to change, innovate, and cooperate. However, the time consumption will increase as the number of leaders increases. Therefore, inspired by GWO, the leadership learning strategy with 3 leaders is used to reconstruct the velocity vectors of PSO, which increases population diversity and provides more accurate information for better exploration and exploitation. In addition, the adaptive parameter c is combined to guide the particles to search in a more reasonable scope, and the process is shown in Equation (7).
where X_1, X_2, and X_3 represent the leadership learning strategy. r_4 is a random number between 0 and 1. c is updated by Equation (6); it changes dynamically between 1 and 2 during the search process, and it gradually increases if the algorithm falls into a local optimum. The cooperation of c/2, c/3, and c/4 allows particles to search in a more reasonable scope with higher probability.
As for the leadership learning strategy, Hu et al. [50] proposed that a convergence factor |A| greater than 1 yields better exploration capability and less than 1 yields better exploitation capability. However, it can be seen from (4) that |A| decreases linearly and is always less than 1 in the last 50% of iterations, so the exploration capability is insufficient when the algorithm is trapped in a local optimum at this stage. Hence, the possibility that |A| is greater than 1 at this stage is increased, and it is modified as shown in Equation (8).
where r_5 is a random number in [0, 1], and |A| is adaptively changed during the search process. It will be greater than 1 with a higher possibility and thus enhance the exploration capability when the algorithm falls into a local optimum.

The Encoding Schema.
The core objective of the proposed method is to select a suitable expression form for FS and establish a reasonable mapping between the solutions and the feature subsets. Binarized candidate solutions are used to represent the features, where "1" denotes that the feature is selected and "0" that it is abandoned. For instance, given a dataset with 10 features, the candidate solution coded as 1010000011 means the 1st, 3rd, 9th, and 10th features are selected and the others are abandoned. The position vector of each particle is binarized according to Equation (9).
where Xb_i = (Xb_i1, Xb_i2, ..., Xb_id), and i and d denote the particle index and the number of features, respectively.
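A minimal sketch of this binarization step, assuming Equation (9) is a simple 0.5 threshold on positions bounded in [0, 1] (the paper's exact transfer function may differ, e.g. a sigmoid applied to the velocity):

```python
def binarize(position, threshold=0.5):
    # Map a continuous position vector in [0, 1] to a binary feature mask:
    # 1 = feature selected, 0 = feature abandoned.
    return [1 if x > threshold else 0 for x in position]

# Example: features 1, 3, 9, and 10 selected, matching the text's 1010000011.
mask = binarize([0.9, 0.1, 0.8, 0.2, 0.3, 0.1, 0.4, 0.2, 0.7, 0.6])
```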

The Definition of the Objective Function.
The feature subsets generated by FS methods for classification serve two main goals: maximizing the classification accuracy (minimizing the classification error) and minimizing the number of selected features. As a mainstream classifier, K nearest neighbor (KNN) [56][57][58] is utilized for FS due to its simplicity and insensitivity to noisy data. Furthermore, reducing the number of selected features is considered another core issue. The ultimate goal is to obtain the optimal feature subsets with essential information from the original datasets while achieving higher classification accuracy with fewer features. Hence, an objective function that combines the classification accuracy and the number of selected features is adopted, as defined in Equation (10).
where acc(X) denotes the classification accuracy of the feature subset, and #X and N represent the number of features in the feature subset and in the original dataset, respectively. θ is a weighting factor that balances the classification accuracy and the number of selected features, and it is set to 0.7.
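A minimal sketch of this objective, assuming the common wrapper form fitness = θ·error + (1 − θ)·(#X/N) to be minimized (the paper gives θ = 0.7, but Equation (10) itself is not reproduced here, so this exact form is an assumption):

```python
def objective(acc, n_selected, n_total, theta=0.7):
    # Weighted sum of the classification error and the fraction of
    # selected features; lower values are better.
    error = 1.0 - acc
    return theta * error + (1.0 - theta) * (n_selected / n_total)
```

For example, acc = 0.9 with 10 of 100 features gives 0.7·0.1 + 0.3·0.1 = 0.10.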

Implementation of the Proposed Method.
The main process of APSOLL is to search for the optimal feature subsets with essential information from the original datasets and apply them for classification; the pseudocode is shown in Algorithm 1. The particles are binarized to determine the corresponding feature subsets in each iteration, and the leaders are determined by computing the fitness function, which guides the search process. Figure 2 shows the flowchart of APSOLL. When the algorithm starts running, it randomly initializes the velocity vector v_i, position vector x_i, pbest_i, and gbest, and sets m = 0 and t = 0. In each iteration, the fitness value of each particle is calculated to find the three best solutions (leaders). Based on the information provided by the leaders, the velocities of the particles and the positions of the population are updated. In this process, if the optimal fitness value does not change, the adaptive counter m is incremented by 1. The run ends and the optimal solution is binarized when the maximum number of iterations is reached.
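The main loop described above can be sketched compactly on a toy objective. Everything in this sketch is illustrative: toy_fitness stands in for Equation (10), the 0.5 threshold for Equation (9), and the c-mapping and leader-pull velocity update are assumptions rather than the paper's exact equations.

```python
import random

def toy_fitness(mask):
    # Toy stand-in for Eq. (10): an accuracy bonus for selecting "feature 0"
    # plus a penalty proportional to the number of selected features.
    acc = 0.9 if mask[0] == 1 else 0.5
    return 0.7 * (1 - acc) + 0.3 * (sum(mask) / len(mask))

def apsoll_sketch(dim=8, pop=10, iters=30, seed=1):
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(dim)] for _ in range(pop)]
    V = [[0.0] * dim for _ in range(pop)]
    best_fit, best_mask = float("inf"), None
    m = 0  # stagnation counter driving the adaptive parameter c
    for t in range(iters):
        masks = [[1 if x > 0.5 else 0 for x in xi] for xi in X]
        fits = [toy_fitness(mk) for mk in masks]
        # Rank the population; the top three act as leaders alpha, beta, delta.
        order = sorted(range(pop), key=lambda i: fits[i])
        leaders = [X[order[k]][:] for k in range(3)]
        if fits[order[0]] < best_fit:
            best_fit, best_mask, m = fits[order[0]], masks[order[0]], 0
        else:
            m += 1
        c = 2.0 - 1.0 / (1.0 + m)  # hypothetical adaptive mapping in [1, 2)
        for i in range(pop):
            for d in range(dim):
                # Pull each particle toward the three leaders with the
                # c/2, c/3, c/4 weighting mentioned in the text.
                pull = sum(w * (L[d] - X[i][d])
                           for w, L in zip((c / 2, c / 3, c / 4), leaders))
                V[i][d] = 0.5 * V[i][d] + rng.random() * pull
                X[i][d] = min(1.0, max(0.0, X[i][d] + V[i][d]))
    return best_fit, best_mask
```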

Experimental Design
All experimental procedures are implemented in Python 3.8 on a PC with an Intel(R) Core(TM) i5-9400 @ 2.9 GHz CPU and 16 GB of DDR4 RAM under the Windows 10 operating system. 10 public datasets are used to assess the quality of the proposed method. APSOLL is compared with 7 metaheuristic algorithms to evaluate its optimization ability, and 6 traditional FS methods (ANOVA, CHI2, Pearson, Spearman, Kendall, and MI) are used to analyze the effectiveness of the feature subsets selected by the proposed method. The selected datasets contain instances ranging from 69 to 2600, and the details of the datasets are shown in Table 1. In the experiments, each dataset is randomly divided into two parts: 70% of the instances are chosen as the training data, and the remaining 30% are used as the testing data. Li et al. [54] described in detail why this dataset dividing approach was adopted.
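The 70/30 split described above can be sketched as a generic shuffle-and-split (the paper follows the procedure of Li et al. [54]; this is merely the shape of a random holdout split):

```python
import random

def split_dataset(instances, train_ratio=0.7, seed=42):
    # Shuffle instance indices and split 70/30 into training and testing data.
    rng = random.Random(seed)
    idx = list(range(len(instances)))
    rng.shuffle(idx)
    cut = int(len(instances) * train_ratio)
    train = [instances[i] for i in idx[:cut]]
    test = [instances[i] for i in idx[cut:]]
    return train, test
```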

[Figure 2: flowchart of APSOLL. Start; randomly initialize velocity v_i, position x_i, pbest_i, and gbest; set m = 0, t = 0; while t < max_iter, calculate the fitness value of each particle; if fitness(t) = fitness(t−1) then m = m + 1, else m = 0; update x_α, x_β, and x_δ; update c by Equation (6); update |A| by Equation (8); compute X_1, X_2, and X_3 by Equation (3); update the velocity of each particle by Equation (7) and the position by Equation (2); t = t + 1; output the best particle; End.]

Algorithm 1: Pseudocode of APSOLL.
Input: the number of iterations T, population size N
Output: the classification accuracy and the number of features in the feature subset
Initialization: x_i = (x_i1, x_i2, ..., x_id); set ub = 1, lb = 0, m = 0, initial iteration t = 0
while t < T do
    Binarize each particle by using Equation (9)
    Compute the fitness value of each particle by using Equation (10)
    Update x_α, x_β, and x_δ
    Update c by Equation (6)
    Update X_1, X_2, and X_3 by using Equation (3)
    Update the velocity of each particle by using Equation (7)
    Update the population position by using Equation (2)
    t = t + 1
end while
Binarize x_α by using Equation (9)

Parameters Setting for Metaheuristic Algorithms.
As for APSOLL, the search process requires only one inertia weight parameter ω to be set. In addition, some commonly used FS methods based on metaheuristic algorithms are adopted to evaluate the optimization ability, namely GWO, PSO, HHO, FPA, SSA, LPSO, and HPSO-DE. Among them, LPSO [40] and HPSO-DE [44] are classical benchmark PSO-based FS methods adopting parameter updating and population diversity updating strategies, respectively. The parameters of each metaheuristic algorithm are set based on the published literature, as shown in Table 2. Furthermore, the binary encoding scheme is utilized for each metaheuristic algorithm, and each is run independently 30 times, taking the average as the result in order to eliminate the influence of randomness.

Experimental Results of Different Metaheuristic Algorithms.
The optimization ability of APSOLL is evaluated in terms of the fitness value, classification accuracy, number of selected features, and CPU time. The average convergence curves of the fitness value are shown in Figures 3-4, and the number of selected features during the search process is shown in Figures 5-6. In the experiment, the t-test with a significance level of 0.05 is used to determine whether the results obtained by the proposed algorithm are statistically significantly different from those of the other metaheuristic algorithms, and the experimental results are presented in Tables 3-4, where Fit, Acc, and #F denote the fitness value, classification accuracy, and number of selected features after 30 independent runs, and Time presents the CPU time of the whole process (in seconds). S_fit, S_acc, and S_f display the t-test results, where "+" or "−" means the result is worse or better than that of the proposed method and "=" means they are similar in the t-test. In other words, the more "+" symbols, the better the proposed method performs.
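The per-dataset comparison over 30 independent runs can be sketched with Welch's t statistic (a stdlib-only sketch; |t| is then compared against the critical value at the 0.05 level for the Welch-Satterthwaite degrees of freedom, or a library routine such as one in scipy.stats can be used to obtain the p-value directly):

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    # Welch's t statistic for two independent samples of run results
    # (does not assume equal variances).
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a)
                                           + variance(b) / len(b))
```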
From the variation curves of the fitness value, it is shown that APSOLL achieves better fitness values on all datasets, which means the optimization ability of APSOLL is better than that of the other metaheuristic algorithms owing to the adaptive updating and leadership learning strategies. From Figures 3-4, it can be observed that HHO and HPSO-DE converge prematurely on most datasets, while PSO, SSA, FPA, and LPSO converge more slowly and have poor overall performance. In contrast, APSOLL achieves a balance between convergence speed and performance. In terms of classification accuracy, the APSOLL-based FS method exceeds 80% on average on 9 of the 10 datasets, and especially on MF it reaches 98.07%. As can be seen in Figures 5-6, PSO, SSA, FPA, and LPSO have limited ability to reduce the size of the feature subsets, while APSOLL performs better than the other methods on most datasets during the iterative process. In Tables 3-4, the number of features selected by APSOLL is less than those of the other metaheuristic algorithms in most cases. A total of 30%-50% of the features in the original datasets are selected by FPA and SSA, while less than 8% are selected by APSOLL. In particular, only 7.58 features are selected on average from the original 754 features on PD. As for CPU time, APSOLL consumes less time on MIC and madelon than the other metaheuristic algorithms. Moreover, although it consumes slightly more time on the other datasets, it performs better on the two main aims of classification accuracy and number of selected features.
In summary, the optimization ability of APSOLL is better than that of the other metaheuristic algorithms, and suitable feature subsets are selected with higher classification accuracy and fewer features in an acceptable time.

Experimental Results of Traditional Methods.
To demonstrate the effectiveness of the APSOLL-based FS method, its performance is compared with that of 6 traditional methods. Figures 7-8 show the classification accuracy of the 6 traditional FS methods for different numbers of selected features, and the optimal solutions of the proposed and traditional methods are presented in Table 5. It is observed from Figures 7-8 that it is difficult for the traditional methods to improve the classification accuracy by sequentially increasing the number of features once a certain level is reached. In comparison, more suitable feature subsets are obtained by the metaheuristic algorithm-based FS methods, among which APSOLL has the best performance. In addition, it is not the case that the more features are selected, the higher the classification accuracy is, which indicates that the redundancy among features affects the classification performance on most datasets.
As can be seen from Table 5, the classification accuracy is improved by at least 1.28% on average by the proposed method on 5 datasets, especially on arrhythmia and isolet5, with 11.77% and 4.26%, respectively. Although the classification accuracy of the proposed method is about 2% lower on average than that of the traditional methods on myocardial, MF, PD, and CNAE-9, the number of selected features is lower than that of these methods: only 2, 21, 9, and 64 features are selected, respectively. To further analyze the number of selected features, fewer features are selected by the proposed method on 6 datasets. Among them, it is noticed that more than 30% of the features are selected by the traditional methods on isolet5 and MF, while only 7.46% and 3.24% of the features are selected by the proposed method, respectively. In terms of time consumption, the traditional methods are affected by the number of features due to the sequential addition of features to the feature subsets, and their time consumption increases dramatically as the number of features increases, while APSOLL performs more stably on most datasets because of its dynamic exploration and exploitation capabilities, and the CPU time is still acceptable. In brief, the proposed method is dependable and effective for solving FS problems compared with traditional methods.

Conclusions and Future Work
In this paper, APSOLL is proposed for FS, which enhances exploration and exploitation capabilities by utilizing an adaptive updating strategy to guide the population to search in a more reasonable scope and the leadership learning strategy to increase population diversity. Experimental results in comparison with other FS methods based on metaheuristic algorithms reveal that APSOLL offers better optimization ability and selects suitable feature subsets within an acceptable time. Moreover, the APSOLL-based FS method achieves better or comparable classification accuracy by selecting less than 8% of the features from the original datasets compared to traditional methods. In conclusion, suitable feature subsets are selected by the proposed method while ensuring a proper balance between the classification accuracy and the number of selected features. In the future, it would be interesting to decrease the CPU time of APSOLL by combining it with feature ranking and to apply it to ultrahigh-dimensional datasets.

Data Availability
The data used to support the findings of this study are openly available in the UCI archive.

Conflicts of Interest
The authors declare that they have no conflicts of interest.