A Novel Feature Selection Method Based on Extreme Learning Machine and Fractional-Order Darwinian PSO

This paper presents a novel feature selection approach based on the extreme learning machine (ELM) and fractional-order Darwinian particle swarm optimization (FODPSO) for regression problems. The proposed method constructs a fitness function from the mean square error (MSE) obtained by ELM, and the optimal solution of the fitness function is searched for by FODPSO, an improved particle swarm optimization. To evaluate the performance of the proposed method, comparative experiments with related methods are conducted on seven public datasets. The proposed method achieves the lowest MSE among all comparative methods on six of the seven datasets. Experimental results demonstrate that the proposed method either attains lower MSE with a feature subset of the same size or requires a smaller feature subset for similar MSE.


Introduction
In the field of artificial intelligence, more and more variables or features are involved. An excessive set of features may lead to lower computational accuracy, slower speed, and additional memory occupation. Feature selection is used to choose smaller but sufficient feature subsets while improving, or at least not significantly harming, prediction accuracy. Many studies have been conducted to optimize feature selection [1][2][3][4]. As far as we know, there are two key components in a search-based feature selection process: the learning algorithm and the optimization algorithm. Many techniques can be involved in this process.
Various learning algorithms can be included in this process. Classical methods such as the k-nearest neighbors algorithm [5] and the generalized regression neural network [6] were adopted for their simplicity and generality. More sophisticated algorithms are needed to better predict complicated data. The support vector machine (SVM) is one of the most popular nonlinear learning algorithms and has been widely used in feature selection [7][8][9][10][11]. The extreme learning machine (ELM) is one of the most popular single hidden layer feedforward networks (SLFNs) [12]. It possesses faster calculation speed and better generalization ability than traditional learning methods [13,14], which highlights the advantages of employing ELM in feature selection, as reported in several studies [15][16][17].
In order to better locate optimal feature subsets, an efficient global search technique is needed. Particle swarm optimization (PSO) [18,19] is an extremely simple yet fundamentally effective optimization algorithm and has produced encouraging results in feature selection [7,20,21]. Xue et al. treated feature selection as a multiobjective optimization problem [5] and were the first to apply multiobjective PSO [22,23] to feature selection. Several improved variants of PSO, such as the hybridization of GA and PSO [9], micro-GA embedded PSO [24], and fractional-order Darwinian particle swarm optimization (FODPSO) [10], have been introduced and have achieved good performance in feature selection.
Training speed and optimization ability are two essential elements of feature selection. In this paper, we propose a novel feature selection method that employs ELM as the learning algorithm and FODPSO as the optimization algorithm. The proposed method is compared with an SVM-based feature selection method in terms of the training speed of the learning algorithm, and with a traditional PSO-based feature selection method in terms of the searching ability of the optimization algorithm. In addition, the proposed method is compared with several well-known feature selection methods. All comparisons are conducted on seven public regression datasets.
The remainder of the paper is organized as follows: Section 2 presents the technical details of the proposed method. Section 3 reports the comparative experiments on seven datasets. Section 4 concludes our work.

Learning Algorithm: Extreme Learning Machine (ELM).
The schematic of the ELM structure is depicted in Figure 1, where $\omega$ denotes the weights connecting the input layer and the hidden layer and $\beta$ denotes the weights connecting the hidden layer and the output layer. $b$ is the threshold of the hidden layer, and $g(\cdot)$ is the nonlinear piecewise continuous activation function, which could be sigmoid, RBF, Fourier, and so forth. $H$ represents the hidden layer output matrix, $X$ is the input, and $T$ is the expected output. Let $Y$ be the real output; the ELM network is used to choose appropriate parameters to make $Y$ and $T$ as close to each other as possible:
$$\min_{\beta}\,\|H\beta - T\|. \qquad (1)$$
$H$ is called the hidden layer output matrix, computed from $\omega$ and $b$ as in (2), in which $\tilde{N}$ denotes the number of hidden layer nodes and $N$ denotes the number of input samples $x_1, \ldots, x_N$:
$$H(\omega_1,\ldots,\omega_{\tilde{N}}, b_1,\ldots,b_{\tilde{N}}, x_1,\ldots,x_N) = \begin{bmatrix} g(\omega_1 \cdot x_1 + b_1) & \cdots & g(\omega_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(\omega_1 \cdot x_N + b_1) & \cdots & g(\omega_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}. \qquad (2)$$
As rigorously proven in [13], for any randomly chosen $\omega$ and $b$, $H$ can always be full-rank if the activation function is infinitely differentiable in any interval. As a general rule, one needs to find appropriate solutions of $\omega$, $b$, and $\beta$ to train a regular network. However, given an infinitely differentiable activation function, the continuous output can be approximated with randomly generated hidden layer neurons, provided that some tuned hidden layer neurons could successfully estimate the output, as proven by universal approximation theory [24,25]. Thus, in ELM, the only parameter that needs to be solved is $\beta$; $\omega$ and $b$ can be generated randomly.
By minimizing the norm in (1), ELM computes the analytical solution as follows:
$$\beta = H^{\dagger} T, \qquad (3)$$
where $H^{\dagger}$ is the Moore-Penrose pseudoinverse of the matrix $H$. The ELM network tends to reach not only the smallest training error but also the smallest norm of weights, which indicates that ELM possesses good generalization ability.
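As a concrete illustration of (2) and (3), ELM training can be sketched in a few lines of NumPy. The experiments in this paper were run in MATLAB; this Python sketch, including the function names and the choice of 50 hidden neurons, is only illustrative:

```python
import numpy as np

def elm_train(X, T, n_hidden=50, seed=0):
    """Train a basic ELM regressor: random input weights and thresholds,
    analytical output weights via the Moore-Penrose pseudoinverse (eq. (3))."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    omega = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))  # random input weights, never tuned
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                    # random hidden-layer thresholds
    H = 1.0 / (1.0 + np.exp(-(X @ omega + b)))                   # sigmoid hidden-layer output matrix (eq. (2))
    beta = np.linalg.pinv(H) @ T                                 # beta = H† T (eq. (3))
    return omega, b, beta

def elm_predict(X, omega, b, beta):
    """Forward pass: recompute H for new inputs and apply the output weights."""
    H = 1.0 / (1.0 + np.exp(-(X @ omega + b)))
    return H @ beta
```

Note that no iterative tuning occurs: the only learned parameter is `beta`, obtained in closed form, which is the source of ELM's training speed.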

Optimization Algorithm: Fractional-Order Darwinian Particle Swarm Optimization (FODPSO).
Particle swarm optimization (PSO) [18,19] is a population-inspired metaheuristic and an effective evolutionary algorithm that searches for the optimum using a population of individuals, where the population is called a "swarm" and the individuals are called "particles." During the evolutionary process, each particle updates its moving direction according to the best position it has visited (pbest) and the best position of the whole population (gbest), formulated as follows:
$$v_{t+1} = w v_t + c_1 r_1 (p_t - x_t) + c_2 r_2 (g_t - x_t), \qquad (4)$$
$$x_{t+1} = x_t + v_{t+1}, \qquad (5)$$
where $x_t = (x_t^1, x_t^2, \ldots, x_t^n)$ is the particle position at generation $t$ in the $n$-dimensional searching space and $v_t$ is the moving velocity. $p_t$ denotes the cognition part called pbest, and $g_t$ represents the social part called gbest [18]. $w$, $c_1$, $c_2$, and $r_1$, $r_2$ denote the inertia weight, the learning factors, and random numbers, respectively. The searching process terminates when the number of generations reaches a predefined value.

The procedure box of Figure 2 reads:
  Initialize parameters for FODPSO
  Select features whose corresponding $a_i > 0$
  Calculate the fitness value of each particle by ELM
  Record pbest and gbest
  Update the velocity and position of each particle as in equations (8) and (5)
  Decide whether to kill or spawn swarms in DPSO
  Select new feature subsets
  Repeat FODPSO until reaching the maximum generation
  Test the selected features on the testing set
Darwinian particle swarm optimization (DPSO) simulates natural selection over a collection of many swarms [25]. Each swarm individually behaves like an ordinary PSO, and all the swarms run simultaneously in case any one of them becomes trapped in a local optimum. The DPSO algorithm spawns particles or extends a swarm's life when the swarm finds a better optimum; otherwise, it deletes particles or reduces the swarm's life. DPSO has been proven superior to the original PSO in preventing premature convergence to local optima [25].
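The reward/punish rule described above can be sketched as follows. This is a simplified, hypothetical interpretation: the function name, the swarm representation, and thresholds such as `max_particles` are ours, not the exact DPSO bookkeeping of [25]:

```python
def darwinian_step(swarm, improved, max_particles=30, min_particles=3):
    """One simplified Darwinian bookkeeping step for a single swarm:
    reward improvement (extend life, spawn a particle), punish stagnation
    (reduce life, delete a particle). `swarm` is a dict with keys
    'particles' (a list) and 'life' (remaining swarm lifetime)."""
    if improved:
        swarm["life"] += 1                       # extend swarm life
        if len(swarm["particles"]) < max_particles:
            swarm["particles"].append(dict(swarm["particles"][-1]))  # spawn a particle
    else:
        swarm["life"] -= 1                       # reduce swarm life
        if len(swarm["particles"]) > min_particles:
            swarm["particles"].pop()             # delete a particle
    return swarm["life"] > 0                     # swarm survives while life remains
```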
Fractional-order particle swarm optimization (FOPSO) introduces fractional calculus to model particle trajectories, which offers a means of controlling the convergence of the algorithm [26]. The velocity function in (4) is rearranged with $w = 1$, namely,
$$v_{t+1} - v_t = c_1 r_1 (p_t - x_t) + c_2 r_2 (g_t - x_t). \qquad (6)$$
The left side of (6) can be seen as the discrete version of the derivative of velocity $D^{\alpha}[v_{t+1}]$ with order $\alpha = 1$. The discrete-time implementation of the Grünwald–Letnikov derivative is introduced and expressed as
$$D^{\alpha}[x(t)] = \frac{1}{T^{\alpha}} \sum_{k=0}^{r} \frac{(-1)^k \, \Gamma(\alpha+1) \, x(t - kT)}{\Gamma(k+1)\,\Gamma(\alpha - k + 1)}, \qquad (7)$$
where $T$ is the sample period and $r$ is the truncation order. Bringing (7) with $r = 4$ into (6) yields
$$v_{t+1} = \alpha v_t + \frac{1}{2}\alpha(1-\alpha) v_{t-1} + \frac{1}{6}\alpha(1-\alpha)(2-\alpha) v_{t-2} + \frac{1}{24}\alpha(1-\alpha)(2-\alpha)(3-\alpha) v_{t-3} + c_1 r_1 (p_t - x_t) + c_2 r_2 (g_t - x_t). \qquad (8)$$
Employing (8) to update each particle's velocity in DPSO generates a new algorithm named fractional-order Darwinian particle swarm optimization (FODPSO) [27,28]. Different values of $\alpha$ control the convergence speed of the optimization process. The literature [27] illustrates that FODPSO outperforms FOPSO and DPSO in searching for the global optimum.
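Equation (8) with truncation order $r = 4$ translates directly into code. In the sketch below (the function name and argument layout are ours), the caller supplies the last four velocities; with $\alpha = 1$ the history terms vanish and the update reduces to the standard PSO rule (4) with $w = 1$:

```python
def fodpso_velocity(alpha, v_hist, x, pbest, gbest, c1, c2, r1, r2):
    """Fractional-order velocity update of eq. (8).
    v_hist holds the last four velocities [v_t, v_{t-1}, v_{t-2}, v_{t-3}]
    (truncation order r = 4); scalars or NumPy arrays both work."""
    v_t, v_t1, v_t2, v_t3 = v_hist
    frac = (alpha * v_t
            + 0.5 * alpha * (1 - alpha) * v_t1
            + (alpha / 6.0) * (1 - alpha) * (2 - alpha) * v_t2
            + (alpha / 24.0) * (1 - alpha) * (2 - alpha) * (3 - alpha) * v_t3)
    return frac + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
```

The coefficients are exactly the first four Grünwald–Letnikov weights from (7), so tuning `alpha` trades off how strongly older velocities influence the current step.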

Procedure of ELM FODPSO.
Each feature is assigned a parameter $a_i$ within the interval $[-1, 1]$. The $i$th feature is selected when its corresponding $a_i$ is greater than 0; otherwise the feature is abandoned. Assuming the features lie in an $n$-dimensional space, $n$ variables are involved in the FODPSO optimization process. The procedure of ELM FODPSO is depicted in Figure 2.
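The encoding above can be sketched as follows, assuming NumPy and any trainer callable that returns an MSE. Treating an empty feature subset as worst-possible fitness is our own guard, not something stated in the paper:

```python
import numpy as np

def select_features(a):
    """A particle position a in [-1, 1]^n; feature i is kept when a_i > 0."""
    return np.asarray(a) > 0

def fitness(a, X, T, train_mse):
    """Fitness of a particle: the MSE returned by the learning algorithm on
    the selected feature subset. `train_mse(X_sub, T)` can be any trainer,
    e.g. an ELM fit followed by an MSE computation. An empty subset gets
    infinite (worst) fitness so FODPSO never favors selecting nothing."""
    mask = select_features(a)
    if not mask.any():
        return float("inf")
    return train_mse(X[:, mask], T)
```

FODPSO then minimizes this fitness over the $n$ coefficients, so the search space is continuous even though the final decision per feature is binary.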

Comparative Methods.
Four methods, ELM PSO [15], ELM FS [29], SVM FODPSO [10], and RReliefF [30], are used for comparison. All of the code used in this study is implemented in MATLAB 8.1.0 (The MathWorks, Natick, MA, USA) on a desktop computer with a Pentium eight-core CPU (4 GHz) and 32 GB of memory.

Datasets and Parameter Settings.
Seven public datasets for regression problems are adopted: four mentioned in [29], in which ELM FS is used as a comparative method, and an additional three from [31]. Information about the seven datasets and the methods involved in the comparisons is shown in Table 1. Only the datasets adopted in [29] can be tested with their feature selection paths; thus D5, D6, and D7 in Table 1 are tested by the four methods other than ELM FS. Each dataset is split into a training set and a testing set. Unless otherwise specified, 70% of the instances are used for training and the rest for testing. During the training process, each particle holds a series of feature coefficients $a_i \in [-1, 1]$. The number of hidden layer neurons is set to 150, and the activation function is sigmoid. 10-fold cross-validation is performed to obtain relatively stable MSE values.
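The evaluation protocol, averaging the MSE over 10 folds to stabilize the fitness estimate, can be sketched as below. The helper name `kfold_mse` and the use of NumPy are assumptions, since the original experiments were run in MATLAB:

```python
import numpy as np

def kfold_mse(X, T, fit_predict, k=10, seed=0):
    """Average MSE over k folds. `fit_predict(X_tr, T_tr, X_te)` trains a
    model (e.g. an ELM) on the training fold and returns predictions for
    the held-out fold; fold sizes differ by at most one."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))                 # shuffle before splitting
    folds = np.array_split(idx, k)
    mses = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = fit_predict(X[tr], T[tr], X[te])
        mses.append(np.mean((pred - T[te]) ** 2))
    return float(np.mean(mses))
```

Using the fold-averaged MSE as the particle fitness reduces the variance introduced by ELM's random hidden weights and by the particular train/test partition.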
For the FODPSO searching process, the parameters are set as formulated in (9). The convergence rate is analyzed to ensure that the algorithm converges within 200 generations. The median of the fitness evolution of the best global particle is taken for the convergence analysis, depicted in Figure 3. To observe the convergence on the seven datasets more clearly in a single figure, normalized fitness values are adopted in Figure 3.

Comparative Experiments.
On the testing set, the MSE acquired by ELM is utilized to evaluate the performance of the selected feature subsets. The training time of ELM and SVM on the seven datasets is recorded in Table 2. It is observed that ELM achieves faster training speed on six of the seven datasets. Compared with SVM, the single hidden layer and the analytical solution make ELM more efficient. The faster speed of ELM highlights its use in feature selection, since many iterative evaluations are involved in FODPSO.
ELM FODPSO, ELM PSO, and ELM FS adopt the same learning algorithm, yet employ FODPSO, PSO, and gradient descent search as optimization algorithms, respectively. For D1, D2, and D3, ELM FODPSO and ELM PSO perform better than ELM FS; the former two acquire lower MSE than ELM FS at similar feature scales. For D4, the three methods achieve comparable performance. Table 3 shows the minimum MSE values acquired by the five methods and the corresponding numbers of selected features.

Conclusions
Feature selection techniques have been widely studied and are commonly used in machine learning. The proposed method consists of two steps: constructing a fitness function with ELM and seeking the optimal solution of the fitness function with FODPSO. ELM is a simple yet effective single hidden layer neural network that is well suited to feature selection thanks to its computational efficiency. FODPSO is an intelligent optimization algorithm with good global search ability.
The proposed method is evaluated on seven regression datasets, and it achieves better performance than the other comparative methods on six of them. In the future, we plan to explore ELM FODPSO in various regression and classification applications.

Conflicts of Interest
The authors declare that they have no conflicts of interest.