A Novel Reformed Reduced Kernel Extreme Learning Machine with RELIEF-F for Classification

With the exponential growth of the Internet population, scientists and researchers face ever larger data sets to process. Traditional algorithms play a vital role in classification and regression, but their computational complexity makes them poorly suited to large-scale data. One variant of the Extreme Learning Machine, the Reduced Kernel Extreme Learning Machine (Reduced-KELM), is widely used for classification and has attracted attention from researchers because of its superior performance. However, it still has limitations: its predictions are unstable because training samples are selected at random, and large-scale input data contain redundant training samples and features. This study proposes a novel model called Reformed Reduced Kernel Extreme Learning Machine with RELIEF-F (R-RKELM) for human activity recognition. RELIEF-F is applied to discard the attributes whose weights are negative. A new sample selection approach, which further reduces the number of training samples and replaces the random selection step of Reduced-KELM, resolves both the unstable classification of the conventional Reduced-KELM and its computational complexity. Experimental results and statistical analysis show that the proposed model achieves better classification performance on human activity data sets than the baseline model, with accuracies of 92.87% on HAPT, 92.81% on HARUS, and 86.92% on Smartphone.


Introduction
In recent decades, rapid technological advances have increased computing capacity and ushered in the second spring of Artificial Intelligence (AI). As the backbone of AI, Machine Learning (ML) touches our daily lives, often without our noticing. For example, wearable devices provide functions such as sports detection, fall detection, and activity detection. These applications, which are based on classification algorithms, have been implemented successfully in real-world settings. Many classical algorithms, such as Artificial Neural Networks, Support Vector Machines, and the back-propagation algorithm, perform well on classification tasks [1][2][3]. However, their main limitation is heavy computation, especially for large-scale data. In the support vector machine, the kernel method, which connects the input layer of the model with the hidden layer, increases the computational complexity. Meanwhile, the heavy computation of back-propagation and artificial neural networks arises mainly from the need to compute suitable input and output weights for the network.
To reduce this computational burden, Huang et al. proposed a single-layer feed-forward network called the Extreme Learning Machine in 2004 [4]. Because the input weights between the input and hidden layers are selected randomly, it trains thousands of times faster and achieves better classification performance than traditional algorithms [5]. Subsequently, the Extreme Learning Machine with Kernel (KELM) was proposed [6]; it uses a Gaussian function to connect the input and hidden layers and then finds a least-squares solution. KELM obtained better classification performance than the conventional extreme learning machine [7]. However, the kernel method is computationally heavy, especially for large-scale data. In 2016, Deng et al. proposed a fast kernel algorithm called the Reduced Kernel Extreme Learning Machine (Reduced-KELM) [8], which randomly selects a certain percentage of the training samples. Although this strategy reduces the computational complexity and addresses the limitation of KELM, the random selection introduces an unstable element that leads to unstable forecasting performance.
To overcome the limitations mentioned above, this study has two main aims. The first is to filter out redundant features with a feature selection method, because large-scale data usually appear in the training process. RELIEF-F is one of the efficient algorithms used to select features in different models. Paper [9] applied RELIEF-F to select training features for a facial expression recognition classifier. Yahdin et al. employed RELIEF-F for feature selection when predicting the relevance of educational background [10]. In 2021, Cui et al. combined machine learning methods with RELIEF-F feature selection to classify wood materials [11]. These classification algorithms performed better with RELIEF-F than without it. In addition, paper [12] concluded that RELIEF-F performed much better than other feature selection methods. Therefore, RELIEF-F plays a significant role in feature selection and helps enhance classification performance. The second aim is to remove the random element of Reduced-KELM and enhance its classification performance. Reduced-KELM selects samples randomly in the hope that they represent all the features of the training data. However, random selection cannot guarantee that samples with all the different features are chosen, and it may miss important ones.
This leads to reduced and unstable forecasting performance. To remove the influence of randomness when selecting training samples, clustering methods have been applied to select suitable samples in the training phase or to reduce the computation of training. For example, Wu et al. combined the K-means clustering method with KELM and successfully reduced the computational complexity of the training process [13]. Huang et al. proposed a clustering method with extreme learning machine for classification, which improved its classification ability [14]. Sample selection methods also affect model performance. Liu et al. proposed a sample selection method based on correlation analysis and the Fisher criterion, which removes redundant features that are closely correlated with each other, and applied it to extreme learning machine [15]. This demonstrated the role of sample selection in their speech emotion recognition model, which discriminated the emotional states of different speakers from speech more quickly. These studies show that a good sample selection method plays a vital role in improving the efficiency and speed of training. Inspired by these findings, this study applies RELIEF-F to select reliable features. It discards insignificant features from the data set, which reduces the computational complexity of training. Moreover, a novel sample selection method called the Reformed Sample Selection Method (RSSM) is proposed. It combines the advantages of K-means and the Correlation Detection Selection (CDS) method, modifying both to seek the more important samples in the training data. This study uses RSSM to replace the random selection step of the conventional Reduced-KELM. The proposed model is called the Reformed Reduced Kernel Extreme Learning Machine.
It not only removes the limitation of random selection in Reduced-KELM but also improves classification performance. The main contributions of this study are summarized as follows: (i) The RELIEF-F algorithm is applied to select relevant features for the training phase. This directly reduces the computational complexity and requires less training time than the baseline model Reduced-KELM. (ii) We propose a novel reformed reduced kernel extreme learning machine, which uses an efficient sample selection method to replace the random part of Reduced-KELM and obtains better classification performance than the compared models. (iii) The proposed model performs better than the baseline models on both benchmark and real-world data, and it is particularly strong on the human activity recognition task.
This paper is organized as follows. Section 2 reviews the Extreme Learning Machine, the Kernel Extreme Learning Machine, and work using RELIEF-F. Section 3 presents the Reduced Kernel Extreme Learning Machine, RELIEF-F for feature reduction, three sample selection methods (K-means, Correlation Detection Selection, and the Reformed Sample Selection Method), and our proposed model. Section 4 describes the data, the experimental design, the experimental results, and a discussion of these results. Section 5 presents a statistical comparison. Finally, Section 6 concludes the paper.

Related Works
With the rapid development of machine learning algorithms, artificial intelligence technologies have been applied in various domains with good performance, such as face recognition [16,17], time series prediction [18,19], and classification [20,21]. These algorithms include several traditional and classical neural networks. The Backpropagation Neural Network (BPNN) [22] and the Support Vector Machine (SVM) [23], for example, have shown superior ability in classification and regression [24][25][26][27]. With the arrival of the 'big data era,' huge amounts of data are collected. However, because of their characteristics, these traditional algorithms cannot afford the heavy computation that large-scale data require. The computational cost is a barrier to deploying them in the real world.
In the recent decade, random projection algorithms have attracted a great deal of attention from researchers. Because their weights are selected randomly, these algorithms avoid the heavy computation problem. The Extreme Learning Machine (ELM), proposed by Huang et al. [4], is one such random projection algorithm. Paper [28] indicated that ELM trained thousands of times faster and achieved better classification and regression performance than traditional neural networks such as BPNN and SVM. In recent years, ELM and its variants have been widely used in many domains, such as stock market prediction [29], image classification [30], flight control [31], and speech emotion recognition [15]. Because ELM is a modified Single Layer Feed-forward Network (SLFN), the SLFN is introduced first. The structure of an SLFN is shown in Figure 1; it comprises three layers: the input, hidden, and output layers.
Computational Intelligence and Neuroscience
We assume that there are N arbitrary samples (X, T), where the input samples are $X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_L, \ldots, \mathbf{x}_N] \in \mathbb{R}^{N \times W}$ and the corresponding target values are $T = [\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_L, \ldots, \mathbf{t}_N] \in \mathbb{R}^{N \times D}$. W is the number of input features, L stands for the number of training samples, and D is the number of output nodes. The hidden neurons form the hidden matrix H, which is calculated by the activation function g(·). The input weights a connect the input layer with the hidden layer, and the output weights β connect the hidden layer with the output layer. Then, the output $\mathbf{o}_i$ of a feed-forward neural network with S hidden neurons can be expressed as follows:

$$\sum_{j=1}^{S} \boldsymbol{\beta}_j \, g(\mathbf{a}_j \cdot \mathbf{x}_i + b_j) = \mathbf{o}_i, \quad i = 1, \ldots, L, \tag{1}$$

where S is the number of hidden neurons, β represents the output weights with dimension (S × D), a is the input weights with dimension (S × W), and b is a bias matrix with dimension (L × S). If the single-layer feed-forward network with S hidden neurons and activation function g(x) approximates the actual target values with zero error, then

$$\sum_{j=1}^{S} \boldsymbol{\beta}_j \, g(\mathbf{a}_j \cdot \mathbf{x}_i + b_j) = \mathbf{t}_i, \quad i = 1, \ldots, L, \tag{2}$$

which can be written compactly as

$$H\boldsymbol{\beta} = T. \tag{3}$$

Traditionally, SLFN training minimizes a cost function to find the corresponding weights and biases. In this process, the BP learning algorithm propagates the error from the output layer back to the input layer. The cost function is

$$E = \sum_{i=1}^{L} \left\| \sum_{j=1}^{S} \boldsymbol{\beta}_j \, g(\mathbf{a}_j \cdot \mathbf{x}_i + b_j) - \mathbf{t}_i \right\|^2. \tag{4}$$

Unlike traditional SLFN training, which relies on gradient-based algorithms, ELM is an efficient learning algorithm for feed-forward neural networks designed to overcome the drawbacks of the BP learning algorithm. According to ELM theory, the input weights and hidden-layer biases do not need to be tuned as in traditional training; they can be selected randomly. The training process of ELM then reduces to finding a least-squares solution $\hat{\boldsymbol{\beta}}$ of (3):

$$\left\| H\hat{\boldsymbol{\beta}} - T \right\| = \min_{\boldsymbol{\beta}} \left\| H\boldsymbol{\beta} - T \right\|, \tag{5}$$

where H is the hidden matrix based on the activation function. It is a non-square matrix calculated by (6):

$$H = \begin{bmatrix} g(\mathbf{a}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{a}_S \cdot \mathbf{x}_1 + b_S) \\ \vdots & \ddots & \vdots \\ g(\mathbf{a}_1 \cdot \mathbf{x}_L + b_1) & \cdots & g(\mathbf{a}_S \cdot \mathbf{x}_L + b_S) \end{bmatrix}_{L \times S}. \tag{6}$$

The input weights a and hidden biases b are selected randomly.
Finally, Huang [4] showed that the smallest-norm least-squares solution is

$$\hat{\boldsymbol{\beta}} = H^{\dagger} T, \tag{7}$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of H; when $H^{\top} H$ is nonsingular it is given by $H^{\dagger} = (H^{\top} H)^{-1} H^{\top}$, where the superscript ⊤ denotes the transpose. The training process is summarized in Algorithm 1. Meanwhile, with the advent of big data, large-scale data are widely used to train models, which brings heavy computation and lowers training efficiency. Although ELM trains faster than conventional algorithms, it faces the same problem. Furthermore, the dimension of the training samples affects the computational complexity. RELIEF, an efficient filter method proposed by Kira [32], estimates attributes according to how well their values distinguish between samples that are near each other. Kononenko et al. later extended the RELIEF algorithm [33] into the RELIEF-F algorithm, which uses the Manhattan (L1) norm to compute the distance between near-hit and near-miss instances; they reported that RELIEF-F is efficient because it takes absolute differences rather than squared differences. To reduce computation and increase training efficiency, researchers have also paid attention to processing input features before the ELM training phase. For example, Tian et al. applied RELIEF-F as the feature selection method in ELM for gait recognition [34]. In paper [35], RELIEF-F is used to build a feature selection technique that eliminates redundancy; the resulting model showed significant improvements over other existing forecasting models in forecast accuracy and convergence rate.
Many studies [36][37][38][39] concluded that RELIEF-F, as a feature selection technique, is an efficient and common approach for eliminating redundant features.
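To make Algorithm 1 concrete, the following Python sketch trains a basic ELM with a sigmoid activation g(·). It is a minimal, standard-library-only illustration under our own assumptions rather than the authors' MATLAB implementation: the function names (`elm_train`, `elm_predict`), the uniform(-1, 1) range for a and b, and the tiny ridge term `eps` added to $H^\top H$ purely for numerical stability are illustrative choices. The output weights follow (7) by solving the normal equations $(H^\top H)\beta = H^\top T$ with Gauss-Jordan elimination.

```python
import math
import random


def solve(A, B):
    """Solve A X = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(c) for c in zip(*A)]


def hidden_matrix(X, a, b):
    """Eq. (6): H[i][j] = g(a_j . x_i + b_j) with a sigmoid g; shape L x S."""
    return [[1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(a_j, xi)) + b_j)))
             for a_j, b_j in zip(a, b)] for xi in X]


def elm_train(X, T, S, seed=0, eps=1e-8):
    rng = random.Random(seed)
    W = len(X[0])
    a = [[rng.uniform(-1.0, 1.0) for _ in range(W)] for _ in range(S)]  # S x W input weights
    b = [rng.uniform(-1.0, 1.0) for _ in range(S)]                      # one bias per hidden neuron
    H = hidden_matrix(X, a, b)
    Ht = transpose(H)
    HtH = matmul(Ht, H)
    for j in range(S):
        HtH[j][j] += eps  # tiny ridge (assumption) keeps the solve well-posed
    beta = solve(HtH, matmul(Ht, T))  # approximately (H^T H)^-1 H^T T = H^dagger T, S x D
    return a, b, beta


def elm_predict(X, a, b, beta):
    """Return the index of the largest output node for each sample."""
    O = matmul(hidden_matrix(X, a, b), beta)
    return [max(range(len(o)), key=o.__getitem__) for o in O]
```

Because a and b come from a seeded generator, the same seed reproduces the same β, while a different seed yields different weights; this run-to-run variability is what motivates the kernel variant discussed next.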
However, because the input weights of ELM are selected randomly, its forecasting results differ between runs with the same parameter settings, which makes the forecasting performance unstable; in addition, the number of hidden neurons must be defined by the user. To solve the unstable forecasting problem, Huang proposed the Kernel Extreme Learning Machine (KELM) in 2011 [5]. It uses a kernel method to connect the input and hidden layers, which avoids the instability that ELM suffers from the random selection of its input weights.
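In KELM, the hidden matrix K is calculated with the Gaussian kernel function k(·):

$$k(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{\sigma}\right), \qquad K = \begin{bmatrix} k(\mathbf{x}_1, \mathbf{x}_1) & \cdots & k(\mathbf{x}_1, \mathbf{x}_L) \\ \vdots & \ddots & \vdots \\ k(\mathbf{x}_L, \mathbf{x}_1) & \cdots & k(\mathbf{x}_L, \mathbf{x}_L) \end{bmatrix}_{L \times L},$$

where the training samples are $X_L = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_L\}$ and σ is the kernel parameter. The output weights β of KELM can be computed by

$$\boldsymbol{\beta} = \left(\frac{I}{C} + K\right)^{-1} T_L,$$

where I is an identity matrix, C is the regularization parameter, generally set to 1, and $T_L = [\mathbf{t}_1, \ldots, \mathbf{t}_L]^{\top}$ is the target matrix corresponding to $X_L$. Several papers indicate that kernel functions play a vital role in KELM's advantage over conventional ELM in regression and classification [6,7,40]. However, with large-scale data the kernel method generates a very large kernel matrix, which directly leads to heavy time consumption in the training process of KELM.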
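The KELM equations can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not a reference implementation: the helper names are hypothetical, the kernel is written as $\exp(-\|x_i - x_j\|^2/\sigma)$ (other parameterizations of the Gaussian kernel exist), and C = 1 follows the default mentioned above. Note that the full L × L kernel matrix is built and solved, which is exactly the cost that grows with large-scale data.

```python
import math


def solve(A, B):
    """Solve A X = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / sigma); this parameterization is an assumption."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(x, y)) / sigma)


def kernel_matrix(A, B, sigma=1.0):
    return [[gaussian_kernel(a, b, sigma) for b in B] for a in A]


def kelm_train(X, T, C=1.0, sigma=1.0):
    """beta = (I / C + K)^-1 T_L using the full L x L kernel matrix."""
    K = kernel_matrix(X, X, sigma)
    for i in range(len(K)):
        K[i][i] += 1.0 / C
    return solve(K, T)


def kelm_predict(X_new, X_train, beta, sigma=1.0):
    """f(x) = [k(x, x_1), ..., k(x, x_L)] beta, then argmax over output nodes."""
    Kx = kernel_matrix(X_new, X_train, sigma)
    O = [[sum(k * b[d] for k, b in zip(row, beta)) for d in range(len(beta[0]))] for row in Kx]
    return [max(range(len(o)), key=o.__getitem__) for o in O]
```

Unlike the ELM sketch idea of random input weights, nothing here is random, so repeated training on the same data returns the same β.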
To overcome the limitation of KELM, Deng et al. proposed an efficient and fast model called the Reduced Kernel Extreme Learning Machine (Reduced-KELM) [8]. It randomly selects a subset of the training samples to calculate the hidden kernel matrix, which reduces the computation to some extent. However, because the training samples are selected randomly, its forecasting performance is not stable. Based on the above review, Table 1 briefly summarizes the advantages and drawbacks of ELM, KELM, and Reduced-KELM.
This study is inspired by the idea of RELIEF-F. First, it applies RELIEF-F to discard useless features of the training data. Second, to overcome the limitation of Reduced-KELM, we propose a novel sample selection method to replace its random selection step. Finally, we propose a model named the Reformed Reduced Kernel Extreme Learning Machine with RELIEF-F. The following section describes the proposed techniques in detail.

Methodology
This section explains a novel framework for reducing training computation and improving classification performance. First, the RELIEF-F algorithm is applied for feature selection, which discards irrelevant features and reduces the training time of the classifier. Second, two sample selection methods, K-means and correlation detection selection, are introduced.
Then, a novel sample selection method named the Reformed Sample Selection Method, which combines K-means and the Correlation Detection Selection method, is proposed. Finally, this novel sample selection method replaces the random part of Reduced-KELM, which yields a model called the Reformed Reduced Kernel Extreme Learning Machine with RELIEF-F.

Algorithm 1: The training process of ELM.
Require: Input data matrix X, the corresponding target values T with D output nodes, the number of hidden neurons S, and the activation function g(·).
Ensure: The output weights β.
(1) Randomly select the input weights (a) and biases (b);
(2) Calculate the hidden matrix H by (6);
(3) Calculate the output weights β by (7).

Reduced Kernel Extreme Learning Machine.
KELM uses all training samples to generate the hidden matrix with the Gaussian activation function. The main idea of Reduced-KELM is to reduce the cost of computing the kernel matrix by randomly selecting a certain percentage of the training samples and using only them to compute the hidden kernel matrix. It is less time-consuming because it uses only 10 percent of the nodes. Paper [8] concluded that Reduced-KELM with ten percent of the nodes selected at random rapidly decreased the training time and achieved almost the same performance as KELM. In the following experiments, we therefore use ten percent as the random selection percentage in Reduced-KELM. Let $\tilde{X} = \{\tilde{\mathbf{x}}_i\}_{i=1}^{m}$ be the randomly selected subset of training samples, where m is the total number of selected samples. Then, the hidden matrix of Reduced-KELM is computed by

$$K = \begin{bmatrix} k(\mathbf{x}_1, \tilde{\mathbf{x}}_1) & \cdots & k(\mathbf{x}_1, \tilde{\mathbf{x}}_m) \\ \vdots & \ddots & \vdots \\ k(\mathbf{x}_L, \tilde{\mathbf{x}}_1) & \cdots & k(\mathbf{x}_L, \tilde{\mathbf{x}}_m) \end{bmatrix}_{L \times m}. \tag{11}$$

The dimension of the hidden matrix K in Reduced-KELM is thus reduced from (L × L) to (L × m), which directly decreases the computation of the training process. The output weights are computed by

$$\boldsymbol{\beta} = \left(\frac{I}{C} + K^{\top} K\right)^{-1} K^{\top} T_L. \tag{12}$$

The training process of Reduced-KELM is summarized in Algorithm 2.
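A minimal sketch of Algorithm 2 follows, assuming the Gaussian kernel and the regularized output-weight form of (12); the function names `select_support`, `reduced_kernel`, `rkelm_train`, and `rkelm_predict` are hypothetical. Compared with KELM, only an L × m kernel matrix is formed and the linear system to solve is m × m, which is where the savings come from.

```python
import math
import random


def solve(A, B):
    """Solve A X = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-sum((p - q) ** 2 for p, q in zip(x, y)) / sigma)


def select_support(L, ratio=0.1, seed=0):
    """Step (1): randomly pick m = round(ratio * L) distinct training indices."""
    m = max(1, int(round(ratio * L)))
    return random.Random(seed).sample(range(L), m)


def reduced_kernel(X, support, sigma=1.0):
    """Eq. (11): L x m matrix with K[i][j] = k(x_i, x~_j)."""
    return [[gaussian_kernel(x, s, sigma) for s in support] for x in X]


def rkelm_train(X, T, support, C=1.0, sigma=1.0):
    """Eq. (12): beta = (I / C + K^T K)^-1 K^T T_L, an m x D matrix."""
    K = reduced_kernel(X, support, sigma)
    m, L = len(support), len(X)
    KtK = [[sum(K[i][p] * K[i][q] for i in range(L)) + (1.0 / C if p == q else 0.0)
            for q in range(m)] for p in range(m)]
    KtT = [[sum(K[i][p] * T[i][d] for i in range(L)) for d in range(len(T[0]))]
           for p in range(m)]
    return solve(KtK, KtT)


def rkelm_predict(X_new, support, beta, sigma=1.0):
    K = reduced_kernel(X_new, support, sigma)
    O = [[sum(k * b[d] for k, b in zip(row, beta)) for d in range(len(beta[0]))] for row in K]
    return [max(range(len(o)), key=o.__getitem__) for o in O]
```

`select_support(L, 0.1, seed)` mirrors the ten-percent setting used in the experiments; changing the seed changes which support vectors are drawn, and hence the predictions, which is the instability discussed in the next paragraph.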
Reduced-KELM requires less training time than the conventional KELM because it randomly selects the support vectors used to compute the kernel matrix. However, the classification results of Reduced-KELM are unstable. To overcome this limitation and further reduce training time, this study applies the RELIEF-F algorithm to select the features of observations. It also presents a novel sample selection method that replaces the random selection of support vectors to enhance classification performance. The following subsections introduce the details.

RELIEF-F Algorithm for Features Reduction.
In this study, motivated by the characteristics of the RELIEF-F algorithm and its successful application in regression and classification models, RELIEF-F is applied to select features from the data sets. Feature selection with RELIEF-F proceeds as follows. First, an instance R is selected randomly. Then, its P nearest neighbors from the same class, called the nearest hits (B), are found. At the same time, P nearest neighbors are also found in each of the other classes C; these are the nearest misses M(C). The quality estimate $W_D$ of every feature is then updated from R, B, and M. The update formula is similar to that of RELIEF, except that the contribution of the misses from each class is weighted by the prior probability P(C) of that class. The contributions of hits and misses in each step lie in the range between zero and one. The final values of W rank the importance of the features: all features with weights below zero are discarded, and the remaining features proceed to the training part of the model. For n randomly selected instances R, the weight of each feature A is updated by

$$W_D[A] \leftarrow W_D[A] - \sum_{j=1}^{P} \frac{\mathrm{diff}(A, R, B_j)}{nP} + \sum_{C \neq \mathrm{class}(R)} \frac{P(C)}{1 - P(\mathrm{class}(R))} \sum_{j=1}^{P} \frac{\mathrm{diff}(A, R, M_j(C))}{nP}, \tag{13}$$
where the initial weight $W_D$ is set to zero, diff is a function that calculates the absolute difference, and P(C) is the prior probability of class C. The algorithm seeks the misses M for each different class and averages their contributions when updating the estimates $W_D$, which measure the ability of each feature to explain the target values. To remove useless features from the data set, this study uses RELIEF-F to calculate the weight of each attribute and then discards all features with negative weights, reducing the dimension of the feature vectors. RELIEF-F also requires the number P of nearest neighbors to be defined by the user; in this study, P is set to 10 following reference paper [33]. In our proposed model, RELIEF-F is applied before the model is trained. It is an efficient algorithm for reducing the feature dimension and the computation of later steps. As a filter method, RELIEF-F calculates a score (weight) for each feature to identify which features are most relevant to the set of instances. Each attribute is linked to a weight, and the most relevant attribute has the highest weight. If a difference in feature value is observed between neighboring instances of the same class, the weight decreases; if a difference is observed between neighboring instances of different classes, the weight increases. Compared with positive-weight features, negative-weight features are more likely to take different values within the same or similar classes [41]. Moreover, Kira and Rendell demonstrated that, statistically, the relevance level of a relevant feature is expected to be larger than zero and that of an irrelevant one is expected to be zero or negative [32]. Therefore, in general, the RELIEF-F threshold τ should be set such that τ > 0.
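The weight update (13) and the rule of discarding negative-weight features can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: numeric features are range-normalized so that diff lies in [0, 1], neighbors are found with the Manhattan (L1) distance, every instance is visited once (n = L) instead of being sampled at random so that the result is deterministic, and each sum is divided by the number of neighbors actually found, which equals P whenever enough neighbors exist. `relieff` and `keep_features` are hypothetical names.

```python
def relieff(X, y, P=10):
    """RELIEF-F feature weights in the spirit of (13); deterministic variant (n = L)."""
    n, W = len(X), len(X[0])
    lo = [min(col) for col in zip(*X)]
    hi = [max(col) for col in zip(*X)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]  # guard against constant features

    def diff(A, r, s):
        # Range-normalized absolute difference on attribute A, in [0, 1].
        return abs(X[r][A] - X[s][A]) / span[A]

    def nearest(r, c):
        # P nearest instances of class c to instance r under the Manhattan (L1) distance.
        cand = [s for s in range(n) if s != r and y[s] == c]
        cand.sort(key=lambda s: sum(diff(A, r, s) for A in range(W)))
        return cand[:P]

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    weights = [0.0] * W
    for r in range(n):
        hits = nearest(r, y[r])
        misses = {c: nearest(r, c) for c in classes if c != y[r]}
        for A in range(W):
            if hits:  # nearby same-class differences lower the weight
                weights[A] -= sum(diff(A, r, h) for h in hits) / (n * len(hits))
            for c, ms in misses.items():
                if ms:  # nearby other-class differences raise it, weighted by P(C)
                    scale = prior[c] / (1.0 - prior[y[r]])
                    weights[A] += scale * sum(diff(A, r, m) for m in ms) / (n * len(ms))
    return weights


def keep_features(weights, tau=0.0):
    """Indices of features whose weight is not below tau (negative weights are dropped)."""
    return [A for A, w in enumerate(weights) if w >= tau]
```

In the usage below, feature 0 separates the two classes while feature 1 has the same distribution in both, so feature 0 receives the clearly larger weight.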
The proposed Reformed Sample Selection Method applies modified K-means and correlation detection selection to select efficient samples. Before introducing it, we describe these two sample selection methods.

K-Means.
K-means [42] is a classical clustering method that learns the optimal cluster centers and the corresponding partition of the data. It has high learning efficiency and can process large-scale data [43]. In this paper, K-means clusters the data so that Reduced-KELM achieves stable prediction and higher accuracy than the conventional Reduced-KELM.
K-means is an unsupervised clustering algorithm and one of the most popular clustering algorithms at present. It uses the Euclidean distance as its similarity measure and partitions the data into a given number of clusters whose members are highly similar. It helps decrease the number of samples by using the cluster centroids to stand for the original samples. The main goal of the K-means algorithm is to minimize the sum of squared errors over all Z clusters:

$$J = \sum_{z=1}^{Z} \sum_{\mathbf{x}_i \in C_z} \left\| \mathbf{x}_i - \boldsymbol{\mu}_z \right\|^2, \tag{14}$$

where $C_z$ is the set of samples in cluster z and $\boldsymbol{\mu}_z$ is the mean of all samples that belong to cluster z ($z = 1, 2, \ldots, Z$). Assume that the data set contains N samples and that the number of clusters is set to Z. First, Z observations are selected from the data and used as the initial cluster centers. Second, the distance between each remaining sample and each cluster center is computed with the similarity measure, and each observation is assigned to the cluster of its closest center. Then, the sum of squared errors between each center and its assigned observations is calculated for all clusters. The cluster centers are moved and the observations reassigned until the sum of squared errors no longer changes. Finally, K-means returns the center of each cluster. Algorithm 3 summarizes the K-means sample selection method. The returned centers can replace the random part of Reduced-KELM to achieve stable forecasting performance.
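The K-means procedure described above (Lloyd's algorithm) can be sketched as follows; it returns the centroids that would stand in for the randomly selected support vectors, together with the value of J from (14). This is a hedged sketch: the seeded random choice of initial centers, the handling of empty clusters (the center is left in place), and the `max_iter` safeguard are our assumptions rather than details given in the text, and `kmeans` is a hypothetical name.

```python
import random


def kmeans(X, Z, seed=0, max_iter=100):
    """Lloyd's K-means: returns (centroids, J), with J the sum of squared errors of (14)."""
    rng = random.Random(seed)
    centroids = [list(x) for x in rng.sample(X, Z)]  # Z observations as initial centers
    prev_J = None
    for _ in range(max_iter):
        # Assign each observation to its nearest center (squared Euclidean distance).
        clusters = [[] for _ in range(Z)]
        for x in X:
            z = min(range(Z), key=lambda k: sum((p - q) ** 2 for p, q in zip(x, centroids[k])))
            clusters[z].append(x)
        # Move each center to the mean of its cluster; keep it in place if the cluster is empty.
        for z, members in enumerate(clusters):
            if members:
                centroids[z] = [sum(col) / len(members) for col in zip(*members)]
        J = sum(sum((p - q) ** 2 for p, q in zip(x, centroids[z]))
                for z, members in enumerate(clusters) for x in members)
        if J == prev_J:  # stop once the sum of squared errors no longer changes
            break
        prev_J = J
    return centroids, J
```

For two well-separated groups of four points each, the two returned centroids are simply the group means, so eight samples are summarized by two representatives.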

Correlation Detection Selection Method.
A classification task contains many samples of the different classes, but generally not all of them are useful for training the model. In addition to K-means clustering for sample selection, this study proposes an efficient technique named the Correlation Detection Selection method (CDS). It finds the correlation among samples and discards samples with high correlation values. Discarding samples that carry similar information not only benefits model training but also replaces the unstable random part of Reduced-KELM, which increases classification performance.
The main idea of CDS is to build a memory of training observations that are not highly correlated with one another. First, we set the CDS threshold δ and initialize the memory (mem) with the first observation in the training data ($\text{mem} = X_1$). Second, for each incoming sample $X_i$, the method calculates the average correlation between $X_i$ and the filtered memory. The incoming sample is added to the filtered memory when this average correlation is smaller than δ; otherwise, it is excluded. Algorithm 4 shows the pseudocode of CDS.
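A sketch of CDS (Algorithm 4) in Python follows. The text does not specify which correlation measure is used, so this sketch assumes the Pearson correlation between two samples' feature vectors and treats a constant vector as uncorrelated (correlation 0); `cds` and `correlation` are hypothetical names.

```python
import math


def correlation(u, v):
    """Pearson correlation between two feature vectors (assumed measure; 0.0 if one is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    suu = sum(a * a for a in du)
    svv = sum(b * b for b in dv)
    if suu == 0 or svv == 0:
        return 0.0
    return sum(a * b for a, b in zip(du, dv)) / math.sqrt(suu * svv)


def cds(X, y, delta):
    """Correlation Detection Selection following the steps of Algorithm 4."""
    order = sorted(range(len(X)), key=lambda i: y[i])  # (1) sort samples by class (stable)
    mem = [X[order[0]]]                                 # (3) mem = X_1
    for i in order[1:]:                                 # (4) remaining samples in order
        ac = sum(correlation(X[i], m) for m in mem) / len(mem)  # (5) average correlation
        if ac < delta:                                  # (6)-(7) keep weakly correlated samples
            mem.append(X[i])
    return mem
```

With `delta = 0.5`, a sample that is perfectly correlated with the memory (for example, [2, 4, 6] against [1, 2, 3]) is skipped, while an anti-correlated sample such as [3, 2, 1] is kept.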

Reformed Sample Selection Method.
In this section, a new sample selection method named the Reformed Sample Selection Method (RSSM) is proposed, which combines the advantages of K-means and CDS to seek more suitable samples for calculating the kernel matrix.
In RSSM, the initial centroids Cent are selected randomly from the input matrix $X_L$, and each sample is assigned to its nearest centroid by Euclidean distance. The centroids in Cent are then recalculated from their assigned samples and J is computed by (14); this is repeated until J no longer changes. Next, the memory is initialized as $\text{mem} = \text{Cent}_1$. Starting from the second centroid, the average correlation (AC) between each incoming centroid from Cent and mem is calculated, and mem is updated according to the CDS condition (AC < δ). Finally, the matrix mem is returned. Algorithm 5 gives the pseudocode of the Reformed Sample Selection Method.
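Putting the two pieces together, RSSM can be sketched as K-means followed by the CDS filter applied to the centroids rather than to the raw samples. This sketch rests on assumptions not fixed by the text (Pearson correlation for AC, seeded random initial centroids, a `max_iter` safeguard), and `rssm` is a hypothetical name; the returned rows would serve as the support vectors $\tilde{X}$ in (11).

```python
import math
import random


def correlation(u, v):
    """Pearson correlation (assumed measure for AC); 0.0 if either vector is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    suu = sum(a * a for a in du)
    svv = sum(b * b for b in dv)
    if suu == 0 or svv == 0:
        return 0.0
    return sum(a * b for a, b in zip(du, dv)) / math.sqrt(suu * svv)


def rssm(X, Z, delta, seed=0, max_iter=100):
    """Reformed Sample Selection sketch: K-means centroids, then CDS over the centroids."""
    rng = random.Random(seed)
    cent = [list(x) for x in rng.sample(X, Z)]  # random initial centroids Cent from X_L
    prev_J = None
    for _ in range(max_iter):
        groups = [[] for _ in range(Z)]
        for x in X:  # assign each sample to its nearest centroid (Euclidean)
            z = min(range(Z), key=lambda k: sum((p - q) ** 2 for p, q in zip(x, cent[k])))
            groups[z].append(x)
        for z, g in enumerate(groups):  # recompute centroid positions
            if g:
                cent[z] = [sum(col) / len(g) for col in zip(*g)]
        J = sum(sum((p - q) ** 2 for p, q in zip(x, cent[z]))
                for z, g in enumerate(groups) for x in g)
        if J == prev_J:  # stop when J of (14) no longer changes
            break
        prev_J = J
    mem = [cent[0]]  # mem = Cent_1
    for c in cent[1:]:  # CDS condition applied to the remaining centroids
        ac = sum(correlation(c, m) for m in mem) / len(mem)
        if ac < delta:
            mem.append(c)
    return mem
```

When two clusters have strongly correlated centroids, only one of them survives the CDS step, so the support set can be smaller than Z.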

Proposed Model: Reformed Reduced Kernel Extreme Learning Machine with RELIEF-F.
For a fair comparison and to further decrease computation, the data are first processed by the RELIEF-F algorithm. This is the first step in handling the input features, and the output of RELIEF-F serves as the new input data for the subsequent steps. Based on the sample selection

Algorithm 2: The training process of Reduced-KELM.
Require: Training input data matrix $X_L$, the corresponding target values $T_L$ with D output nodes, and the kernel function k(·).
Ensure: The output weights β.
(1) Randomly select m samples from all training observations as support vectors $\tilde{X}$;
(2) Calculate the reduced hidden matrix K by (11);
(3) Calculate the output weights by (12).

Algorithm 4: Correlation Detection Selection (CDS).
Require: Training input data matrix $X_L$ and the CDS threshold δ.
Ensure: The filtered memory mem.
Initial part:
(1) Sort the training samples by class;
(2) Set the CDS threshold δ;
(3) Set the initial filtered memory mem = $X_1$;
Selecting part:
(4) for i = 2 to L do
(5)   Calculate the average correlation (AC) between $X_i$ and mem;
(6)   if AC < δ then
(7)     mem = [mem; $X_i$];
(8)   else
(9)     mem = mem;
(10)  end if
(11) end for
(12) return mem.

Experimental Works
To enhance classification ability and overcome the limitation of Reduced-KELM, this section designs two experiments. They use eight data sets, including benchmarks and real-world human activity data, to evaluate the classification ability of the RELIEF-F algorithm and of Reduced-KELM with different sample selection methods, respectively. This section introduces the data description, experimental design, and parameter settings, and then presents and discusses the experimental results.

Data Description.
In the experiments, five benchmark data sets and three human activity data sets are used to evaluate classification ability.
The commonly used benchmarks German, Image, Ringnorm, Twonorm, and Waveform are available at the UCI Machine Learning Repository [44]. These data sets are binary classification tasks. Furthermore, with the explosion of data and the popularity of portable devices, researchers and developers pay increasing attention to human activity recognition, such as fall detection and sports detection on portable devices. This study therefore also uses three real-world data sets: the Human Activities and Postural Transitions Recognition using Smartphone Data (HAPT) [45], the Human Activity Recognition Using Smartphones Data Set (HARUS) [46], and the Smartphone Data set for Human Activity Recognition in Ambient Assisted Living Data Set (Smartphone).
Each benchmark data set is split into 70% training data and 30% testing data. For all real-world data sets, the training and testing data are divided according to their data sources, and we use the same division in our experiments. These real-world data sets involve multiclass classification tasks. Table 2 shows the details of each data set.

Experiment Design and Parameter Setting.
To evaluate the proposed methods and the compared models fairly, this study designs two experiments. All experiments are run in MATLAB R2014a on a laptop with Windows 10 and 16 GB of RAM. The first experiment compares the classification performance of Reduced-KELM with the RELIEF-F algorithm against that of the conventional Reduced-KELM, showing the role RELIEF-F plays in reducing the feature dimension in Reduced-KELM. The performance of Reduced-KELM on all benchmark and human activity data is compared with that of Reduced-KELM with RELIEF-F. The main aim of the RELIEF-F algorithm is to rank features by their importance for the classes and keep reliable attributes for the subsequent training phase. With this algorithm, the feature selection process not only improves classification performance but also reduces training time compared with the conventional Reduced-KELM. To compare models fairly, the parameter settings must ensure that every model performs at its best. In the first experiment, the number P of nearest neighbors, which is critical to the performance of RELIEF-F, must be defined; following the conclusion of paper [33], P is set to ten. The random selection percentage is likewise set to ten for all models in the first experiment, including the conventional Reduced-KELM and Reduced-KELM with RELIEF-F, because reference paper [8] concluded that randomly selecting ten percent of the nodes rapidly decreased the training time of Reduced-KELM while keeping its performance almost at the level of KELM. In addition, because a kernel method is used, the kernel parameter affects classification performance.
For a fair comparison among the models, the kernel parameter is set to one for all models in the first experiment.
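The baseline configuration above (a random ten percent of the training samples used as kernel centres, kernel parameter set to one) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the Gaussian kernel form, the regularisation constant `C`, and the ridge-style solution for β are the common Reduced-KELM formulation assumed here, not necessarily the exact equations used in this paper, and `train_reduced_kelm`, `predict`, and `one_hot` are hypothetical helper names.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel exp(-||a - b||^2 / sigma^2) between rows of A and B (assumed form)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / sigma**2)


def one_hot(y, n_classes):
    """Encode integer labels 0..n_classes-1 as a target matrix T."""
    T = np.zeros((y.size, n_classes))
    T[np.arange(y.size), y] = 1.0
    return T


def train_reduced_kelm(X, T, ratio=0.1, sigma=1.0, C=1.0, rng=None):
    """Reduced-KELM with a random subset of training samples as kernel centres.

    ratio=0.1 mirrors the ten percent random selection; C is a hypothetical
    regularisation constant that the text does not specify.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    m = max(1, int(round(ratio * n)))
    idx = rng.choice(n, size=m, replace=False)   # the random selection step
    centres = X[idx]
    K = gaussian_kernel(X, centres, sigma)       # reduced kernel matrix, N x m
    # Regularised least squares: beta = (I / C + K^T K)^(-1) K^T T
    beta = np.linalg.solve(np.eye(m) / C + K.T @ K, K.T @ T)
    return centres, beta


def predict(X_new, centres, beta, sigma=1.0):
    """Return the class with the largest output score for each row of X_new."""
    scores = gaussian_kernel(X_new, centres, sigma) @ beta
    return np.argmax(scores, axis=1)
```

Because `idx` is drawn at random, each call with a different `rng` seed selects different centres, which is why models with a random selection component are run repeatedly and summarised by a mean and a standard deviation.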
The second experiment mainly examines the role that three different sample selection methods play in classification with Reduced-KELM. Each of these three methods replaces the random selection part of Reduced-KELM. This experiment shows how well the different methods select samples and how much they reduce the computational complexity of training. It compares the proposed model R-RKELM with the conventional Reduced-KELM, K-RKELM, and C-RKELM. To connect the two experiments, the second experiment uses the data sets processed by the RELIEF-F algorithm, with the same RELIEF-F and kernel parameters as in the first experiment. To evaluate the models under different measurements, Sensitivity, Specificity, and Precision are employed in all experiments in addition to the common measurements of accuracy, its Standard Deviation (SD), and training Time. To observe the generalization ability of models that include a random selection method, each such model is run fifty times and the average values of the measurements are reported. A high standard deviation indicates that the accuracy values across the fifty runs are spread over a wide range, and a low one indicates that they are tightly clustered.
8 Computational Intelligence and Neuroscience
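The measurements listed above can all be computed from a confusion matrix. The text does not state how Sensitivity, Specificity, and Precision are aggregated over multiple classes, so the sketch below assumes macro-averaged one-vs-rest values and uses the sample standard deviation for the fifty-run summary; the function names are illustrative.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def _safe_div(num, den):
    """Element-wise num / den, with 0 wherever den == 0."""
    out = np.zeros(num.shape, dtype=float)
    np.divide(num, den, out=out, where=den > 0)
    return out


def evaluate(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged one-vs-rest Sensitivity, Specificity, Precision."""
    cm = confusion_matrix(y_true, y_pred, n_classes)
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp          # true class c, predicted as another class
    fp = cm.sum(axis=0) - tp          # predicted c, true class is another class
    tn = cm.sum() - tp - fn - fp
    return {
        "accuracy": tp.sum() / cm.sum(),
        "sensitivity": _safe_div(tp, tp + fn).mean(),
        "specificity": _safe_div(tn, tn + fp).mean(),
        "precision": _safe_div(tp, tp + fp).mean(),
    }


def summarise_runs(accuracies):
    """Mean and sample standard deviation of accuracies from repeated runs."""
    a = np.asarray(accuracies, dtype=float)
    return a.mean(), a.std(ddof=1)
```

Macro averaging weights every class equally regardless of its size; a weighted average by class frequency is an equally plausible choice if the paper's tables were produced that way.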

Experimental Results and Discussion.
The first experiment demonstrates the differences between the conventional Reduced-KELM and Reduced-KELM with the RELIEF-F algorithm (Relief-F). The Relief-F algorithm is applied to the benchmarks and the real-world data sets. The ranking of predictor weights, which represents the importance of the features, is shown in Figure 2.
In the bar charts of Figure 2, the vertical axis gives the importance weight of each feature. According to [32], compared with positive-weight features, negative-weight features differ more between neighboring samples of the same class than between neighboring samples of different classes, so these features are probably redundant. Moreover, [41] showed that features with positive weights yield much better performance than those with negative weights. Therefore, this study discards the features whose weights are below zero, which reduces the dimension of the data sets. The final dimension of each data set is shown in Table 3, where the column named 'Difference' gives the number of features removed by RELIEF-F. For example, RELIEF-F filters out half of the features of the German data, whereas for Ringnorm it removes only one attribute. Table 4 compares the performances of Reduced-KELM and Reduced-KELM with RELIEF-F on the eight data sets in terms of accuracy, Difference (accuracy of Reduced-KELM minus accuracy of Relief-F), Standard Deviation (SD), and training time (Time), as well as three further measurements: Sensitivity, Specificity, and Precision.
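The filtering rule described above can be sketched with the classic ReliefF weight update (Kononenko's formulation): for each sample, features that differ among its P nearest same-class neighbours lose weight, and features that differ among its P nearest neighbours from other classes gain weight. This is an illustrative NumPy sketch, not the implementation used in the paper; the L1 distance, the range-normalised feature differences, the use of every sample as a reference, and the helper names are assumptions.

```python
import numpy as np


def relieff_weights(X, y, k=10):
    """Classic ReliefF weights; k plays the role of P (nearest neighbours)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0                  # constant features get zero differences
    Xs = (X - lo) / span                   # per-feature differences fall in [0, 1]
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes.tolist(), (counts / n).tolist()))
    W = np.zeros(d)
    for i in range(n):
        diffs = np.abs(Xs - Xs[i])         # n x d feature differences to sample i
        dist = diffs.sum(axis=1)           # L1 distance to sample i
        yi = y[i].item()
        for c in classes.tolist():
            members = np.flatnonzero(y == c)
            members = members[members != i]  # never use sample i as its own neighbour
            if members.size == 0:
                continue
            nearest = members[np.argsort(dist[members])[:k]]
            contrib = diffs[nearest].mean(axis=0) / n
            if c == yi:
                W -= contrib               # nearest hits: differing features lose weight
            else:
                # nearest misses, weighted by the prior probability of class c
                W += prior[c] / (1.0 - prior[yi]) * contrib
    return W


def drop_negative_features(X, y, k=10):
    """Keep only the columns whose ReliefF weight is not below zero."""
    W = relieff_weights(X, y, k)
    keep = np.flatnonzero(W >= 0)
    return np.asarray(X)[:, keep], keep, W
```

The returned `keep` indices play the role of the reduced feature set whose size Table 3 reports; applying the same indices to the test data keeps training and testing consistent.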
In terms of accuracy, Reduced-KELM outperforms Relief-F on only one data set (Twonorm); on the remaining data sets, Relief-F achieves better classification than the conventional Reduced-KELM. On average, Relief-F improves accuracy on these data sets by 1.33%. A positive value of Difference indicates that Relief-F classifies better than the conventional Reduced-KELM, and a negative value indicates the opposite. The maximum difference in accuracy appears on the German data, and the minimum on the Image data. For three data sets (Twonorm, Waveform, and HAPT), the conventional Reduced-KELM matches Relief-F in SD; for the remaining data sets, Relief-F obtains the lowest standard deviation. In terms of training efficiency, the main achievement of Relief-F is saving training time. Especially for high-dimensional data such as HAPT, HARUS, and Smartphone, Relief-F reduces the training time by 0.0371, 0.1102, and 0.031 minutes, respectively. For the other measurements, the Relief-F algorithm reduces Sensitivity for the majority of the data sets, which indicates that Relief-F has a more stable prediction ability than the conventional Reduced-KELM. The same pattern appears in Specificity and Precision: except for Twonorm, HAPT, and HARUS, the Relief-F algorithm shows much better classification performance than the Reduced-KELM model. Therefore, the Relief-F method not only improves classification accuracy on the benchmarks and real-world data sets but also saves training time.
The second experiment compares the classification performance of the proposed model R-RKELM with that of K-RKELM and C-RKELM. Table 5 reports the accuracy, SD, time, sensitivity, specificity, and precision of K-RKELM, C-RKELM, and R-RKELM.
ALGORITHM 6: The training process of R-RKELM.
Require: training input data matrix X_L; the corresponding target values T_L with D output nodes; kernel function k(·); the number of clusters Z.
Ensure: the output weights β.

RELIEF-F Algorithm:
(1) Process the input features with RELIEF-F to obtain the output matrix Rel;
Reformed Sample Selection Method:
(2) Set Rel as the input data;
(3) Return mem based on Algorithm 5;
(4) X = mem;
Training Model:
(5) Calculate the reduced hidden matrix K by (11);
(6) Calculate the output weights β by (12).

In terms of accuracy, the proposed model R-RKELM successfully improves the classification performance of Reduced-KELM: on all data sets, R-RKELM achieves higher accuracy than the other two models. Because of the randomness of the K-means algorithm, the results of K-RKELM and R-RKELM vary slightly when they are run repeatedly; the standard deviation measures the degree of this variation across all predictions. Except for the HARUS and Smartphone data, R-RKELM obtains the lowest standard deviation. In terms of training time, C-RKELM takes longer than R-RKELM on the benchmark data sets. On the HAPT and HARUS data, R-RKELM takes a similar training time to C-RKELM. On the Smartphone data, the training time of R-RKELM is less than five times that of C-RKELM. In Sensitivity, C-RKELM performs best among the models, whereas R-RKELM shows the best Specificity for the majority of data sets. In Precision, R-RKELM achieves the best performance only on the real-world data sets, although the gap between R-RKELM and the other models is small. Therefore, the three sample selection methods all play a positive role in enhancing the classification performance of Reduced-KELM, and the proposed model R-RKELM achieves the best classification performance overall.
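Algorithm 5 (which combines K-means and CDS) is not reproduced in this section, so the sketch below only illustrates the general idea of step (3) under a clearly labelled substitution: plain K-means with Z clusters, keeping the real sample nearest to each centroid as `mem`. It omits CDS entirely and is a hypothetical stand-in, not the paper's selection method; `kmeans` and `select_representatives` are illustrative names. The random centroid initialisation also shows why K-means-based selection varies slightly across repeated runs.

```python
import numpy as np


def kmeans(X, Z, n_iter=100, rng=None):
    """Plain Lloyd's K-means with Z clusters; returns centroids and labels."""
    rng = np.random.default_rng(rng)
    centroids = X[rng.choice(X.shape[0], size=Z, replace=False)]  # random init
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        new = np.array([
            X[labels == z].mean(axis=0) if np.any(labels == z) else centroids[z]
            for z in range(Z)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels


def select_representatives(Rel, Z, rng=None):
    """Hypothetical stand-in for step (3): indices of one real sample per cluster.

    X = Rel[mem] then replaces the random subset used by Reduced-KELM.
    """
    centroids, _ = kmeans(Rel, Z, rng=rng)
    d2 = ((Rel[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.unique(d2.argmin(axis=0))   # nearest sample to each centroid
```

Choosing real samples rather than the centroids themselves keeps the reduced kernel matrix built from actual training points, as the random subset in Reduced-KELM is; `np.unique` removes duplicates when two centroids share the same nearest sample.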

Statistical Analysis
According to the comparison results in Tables 4 and 5, the best performances are achieved by Relief-F and R-RKELM, respectively. To compare the classification ability of R-RKELM and Relief-F, this study applies the Wilcoxon signed-rank test to determine whether R-RKELM is superior to Relief-F in classification. Table 6 reports the accuracy of R-RKELM and Relief-F on each data set, the difference in accuracy between the two models for all eight data sets, and the rank of each absolute difference.
Then, the values of R+ and R− are computed: R+ is the sum of the ranks of the positive differences, and R− is the sum of the ranks of the negative differences in Table 6. Here R+ is 36 and R− is 0; that is, all eight differences favor R-RKELM, so R+ = 1 + 2 + ... + 8 = 36. According to the table of critical values, at the significance level p = 0.05, the difference between the algorithms is significant if R− is less than or equal to 3. Based on this result, we conclude that R-RKELM has statistically significantly better classification ability than Relief-F.
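The rank sums R+ and R− described above can be computed as in the following sketch. It drops zero differences and assigns average ranks to tied absolute differences, which is one common convention; the text does not say how ties or zeros are handled. The default critical value of 3 is the tabulated value for eight data sets at p = 0.05 used in the text, and the function names are illustrative.

```python
import numpy as np


def wilcoxon_rank_sums(a, b):
    """Return (R_plus, R_minus) for paired scores a and b (e.g. accuracies)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    d = d[d != 0]                          # zero differences are discarded
    abs_d = np.abs(d)
    ranks = np.empty(d.size)
    ranks[np.argsort(abs_d, kind="mergesort")] = np.arange(1, d.size + 1)
    for v in np.unique(abs_d):             # tied |d| values share their average rank
        tied = abs_d == v
        ranks[tied] = ranks[tied].mean()
    return ranks[d > 0].sum(), ranks[d < 0].sum()


def is_significant(r_plus, r_minus, critical_value=3):
    """Significant when the smaller rank sum does not exceed the critical value."""
    return min(r_plus, r_minus) <= critical_value
```

Passing the eight R-RKELM accuracies as `a` and the eight Relief-F accuracies as `b` yields R+ = 36 and R− = 0 whenever every difference is positive, and `is_significant` then returns `True`.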

Conclusion
This study introduces a novel classifier called Reformed Reduced Kernel Extreme Learning Machine with RELIEF-F (R-RKELM) for human activity recognition. The proposed framework has two stages. In the first stage, it employs RELIEF-F to discard irrelevant features with negative weights. The second stage selects training samples to reduce computational complexity. Moreover, the proposed approach NSSM in R-RKELM takes advantage of K-means and CDS to replace the random reduction part of the conventional Reduced-KELM, which removes a source of instability in classification. Based on the experimental evaluation on eight data sets and the statistical analysis, R-RKELM performs much better in terms of classification and training time than the conventional Reduced-KELM and the other baselines. The accuracy of the proposed model reaches around 90%. In future work, we will focus on the parameter dependency of the proposed model, since the kernel parameter affects classification performance.

Data Availability
All data sets used in this paper are from the UCI Machine Learning Repository.

Conflicts of Interest
The authors declare that they have no conflicts of interest.