An ECoG-Based Binary Classification of BCI Using Optimized Extreme Learning Machine

To improve the accuracy of brain signal processing while accelerating its speed, we present an optimized and intelligent method for large-dataset classification in this paper. The Optimized Extreme Learning Machine (OELM) is introduced for ElectroCorticoGram (ECoG) feature classification in a motor imagery-based brain-computer interface (BCI) system, with the common spatial pattern (CSP) used to extract features. Comparing it with conventional classification methods such as SVM and ELM, we use several metrics to evaluate the performance of all the adopted methods objectively. The accuracy of the proposed BCI system reaches approximately 92.31% when classifying ECoG epochs into left pinky or tongue movement, whereas the highest accuracy obtained by the other methods is no more than 81%, which substantiates that OELM is more efficient than SVM, ELM, etc. Moreover, the simulation results demonstrate that OELM significantly improves performance, with a p value far less than 0.001. Hence, the proposed OELM is well suited to processing ECoG signals.


Introduction
The development of brain-computer interfaces (BCI) has undergone extensive growth in recent years, with the aim of providing an effective method for human-computer interaction without neuromuscular transmission [1]. The ultimate goal of BCI research is to establish a direct communication system that translates human intentions, as reflected by specific brain signals, into control commands for output devices [2].
According to the method by which users derive their neural signals, BCIs can be classified as noninvasive, invasive, or partially invasive. The ElectroCorticoGram (ECoG), which is obtained by placing electrodes directly on the cortex, has attracted substantial and increasing interest and has been the dominant signal used for invasive BCIs due to its high spatial and temporal resolution [3]. In 2004, the first online ECoG BCI study by Leuthardt et al. provided initial evidence that ECoG signals contain information about the direction of hand movements, one of the earliest demonstrations that specific details of motor function can be accurately inferred without measurements from individual neurons [4]. These properties of ECoG offer the possibility of a new nonmuscular communication and control channel: a practical BCI system. By converting ECoG into machine instructions that control peripheral devices, BCI enables users to interact with the outside world through their thoughts alone.
Nowadays, a considerable number of people cannot manually control machines. In addition, research on the human brain is in growing demand for military applications and everyday entertainment. Therefore, the research and development of an effective method is very promising. So far, scientists from many disciplines have achieved good results in this field. For instance, the American Wadsworth Center has designed a multiclass BCI based on P300 potentials, so that paralyzed, locked-in patients can input 36 characters, numbers, and spaces through the signals corresponding to specific brain activities rather than using their fingers [5]. A BCI research team from a university of science and technology in Austria has also designed an algorithm for classifying different motor imagery signals, so that patients with a paralyzed arm can achieve the simple action of drinking. In China, the BCI research team at Tsinghua University has designed an automatic telephone dialing system, connected to a computer, that dials in real time by interpreting brain activity as the corresponding numbers [5].
In recent years, many innovative methods have been applied to binary-class ECoG signals of motor imagery [6, 7]. For example, the support vector machine (SVM) is a supervised learning model used for pattern recognition, classification, and regression analysis. However, on large training sets it is difficult to implement and cannot produce satisfactory results in either precision or speed. Therefore, a different classification method named the extreme learning machine (ELM) was proposed by Guang-Bin Huang of Nanyang Technological University [8]. ELM offers better generalization performance and faster execution than SVM. On the basis of ELM, we adopt a newly developed method called the Optimized Extreme Learning Machine (OELM) to distinguish imagined movements of the left pinky from those of the tongue. It serves as a classification algorithm that separates the two kinds of ECoG signals quickly yet accurately. We expect OELM to show superior performance in both classification accuracy and speed, especially when processing large signals, where it should save considerable time.
This paper is organized as follows: Section 2 describes the data acquisition and description, including how the data are acquired and divided for training and testing. In Section 3, we focus on basic BCI algorithm research and implementation, where OELM serves as the classifier; prior to that, we use CSP to extract features. The principles and procedures of these algorithms are described in detail. Section 4 presents experimental results and analysis for ELM and OELM, respectively. Finally, the paper is concluded in the last section.

Data Acquisition and Description
Based on existing research, there is substantial theoretical and empirical evidence that ECoG could support a clinically and functionally reliable BCI with a high level of performance.
Thus, it is reasonable to envisage that an ECoG-based implant could substantially enhance the functional capability of a disabled patient by restoring the ability to modulate their environment, communicate, or control a prosthesis [9]. To evaluate the proposed classification algorithms, dataset I of BCI competition III is adopted in this study. It is provided by the University of Tübingen, Germany, Dept. of Computer Engineering (Prof. Rosenstiel) and the Institute of Medical Psychology and Behavioral Neurobiology (Niels Birbaumer), among others [10]. Compared with signals acquired from the scalp, such as electroencephalography (EEG), and with intraparenchymal single-neuron recordings, ECoG recordings show characteristics that make them especially suited for basic neuroscience research: high spatial resolution and signal fidelity, resistance to noise, and substantial robustness over long recording periods. Thus, we regard the ECoG data as the most suitable dataset for validating the OELM.
All ECoG data were collected during two imagined movements, of either the left pinky or the tongue. Recordings were performed at a sampling rate of 1000 Hz from 64 platinum electrodes arranged in a grid approximately 8 × 8 cm in size [10]. The electrode array covers a specific area of the cortex. Because brain potentials are weak and prone to interference, the incoming electrical signals must be characterized carefully: electronic sensors are placed inside the electrodes, all fitted with serial current-limiting resistors, to measure afferent signal intensity and guarantee high fidelity [11].
There are 378 trials in total. Each trial represents either an imagined tongue or an imagined finger movement and is recorded for a duration of 3 seconds. To avoid visually evoked potentials being reflected in the data, each recording interval starts 0.5 seconds after the visual cue has ended. After amplification and filtering, the recorded potentials are stored as microvolt values [10]. We measure minute differences in voltage between neurons within each trial.
Of the 378 trials, 278 labeled trials are used to train the classifiers, whereas the other 100 unlabeled testing trials are used to measure the generalization performance of the trained classifier. Because the training and testing data were recorded from the same subject performing the same task but on different dates about one week apart, the design of our classifier remains challenging. A more detailed description can be found in [10]. The task in this study is to correctly classify the 100 testing samples, each of which belongs to either the negative or the positive class. Our goal is to turn these tiny voltage measurements into two imagined robotic movements: left pinky or tongue. A detailed block scheme of the BCI system is presented in Figure 1 [12].
In Figure 1, the grid of 8 × 8 ECoG platinum electrodes is placed on the contralateral (right) motor cortex. Signals are obtained from the brain through the primary electrical afferents and transmitted to a computer for subsequent data processing. Peripheral devices can then be controlled by the generated motor commands [13].

Basic BCI Algorithm Research and Implementation
In this section, we first adopt CSP to extract features of the ECoG signal and then transfer them to the feature classification module, using ELM to train and test the corresponding data. Then an improved classification algorithm named the Optimized Extreme Learning Machine (OELM) is proposed and put into practice on the basis of ELM. The basic principles of these three algorithms are introduced here, and the detailed processes and procedures are presented.

Common Spatial Pattern Algorithm Principle.
Common spatial pattern (CSP) is an effective method for feature extraction when discriminating two kinds of data. In recent years, CSP has become very popular for extracting ECoG features. It is a signal processing method based on two or more different classes of brain potentials, and it filters the signal spatially [14]. The fundamental idea is that, after filtering, the spatial energies of the two kinds of signals differ the most. That is, CSP finds a projection direction that discriminates two classes of ECoG data by maximizing the variance of one class while minimizing the variance of the other. The basic principles are as follows.
Assume that $X_1$ and $X_2$ represent the two types of ECoG signals, each of dimension $N \times T$, where $N$ is the number of channels and $T$ the number of measurement samples per channel. The two normalized covariance matrices are

$$C_i = \frac{X_i X_i^T}{\mathrm{trace}(X_i X_i^T)}, \quad i = 1, 2,$$

where $X^T$ denotes the transpose of the matrix $X$ and $\mathrm{trace}(X)$ gives the sum of the diagonal elements of $X$. To obtain experimental data with higher accuracy and a lower error rate, multiple experiments are carried out, and $\bar{C}_1$, $\bar{C}_2$ are taken as the averages of $C_1$, $C_2$ over trials. The composite covariance matrix is then factorized by principal component analysis:

$$C = \bar{C}_1 + \bar{C}_2 = U \Lambda U^T,$$

where $U$ is the matrix composed of the eigenvectors of the mixed covariance matrix $C$ and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues. The whitening transformation matrix $P$ is

$$P = \Lambda^{-1/2} U^T.$$

Then the matrices $\bar{C}_1$ and $\bar{C}_2$ undergo the whitening transformation, giving

$$S_1 = P \bar{C}_1 P^T, \qquad S_2 = P \bar{C}_2 P^T,$$

where $S_1$ and $S_2$ share common eigenvectors: if $S_1 = B \Lambda_1 B^T$, then $S_2 = B \Lambda_2 B^T$. It can be proved that the two diagonal eigenvalue matrices sum to the identity matrix, that is,

$$\Lambda_1 + \Lambda_2 = I.$$

From this identity it is not difficult to see that the eigenvector with the maximum variance for one class has the minimum variance for the other, and the two spatial filters are designed according to this property. Sorting the eigenvalues in descending order, the $m$ largest eigenvalues of $\Lambda_1$ are taken, and the first spatial filter $F_1$ is constructed from the corresponding $m$ eigenvectors. In the same way, the $m$ largest eigenvalues of $\Lambda_2$ are taken, and their eigenvectors are used to construct the second spatial filter $F_2$ [15].
After the two filters are obtained, the original multichannel ECoG signal $X$ is projected into two groups of components:

$$Z_1 = F_1 X, \qquad Z_2 = F_2 X.$$

Finally, the features of the two kinds of signals are constructed from the variances of the filtered components:

$$f_p = \log_2 \left( \frac{\mathrm{var}(Z_p)}{\sum_i \mathrm{var}(Z_i)} \right).$$

The base of the logarithm is set to 2 so that the two types of features are closer to a normal distribution.
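The CSP procedure above can be condensed into a short NumPy sketch. This is our own minimal illustration, not the authors' code: the function names (`csp_filters`, `csp_features`) and the data layout (trials × channels × samples) are assumptions we make for the example.

```python
import numpy as np

def csp_filters(X1, X2, m=3):
    """Compute 2*m CSP spatial filters from two sets of trials.

    X1, X2: arrays of shape (trials, channels, samples), one per class.
    Returns W of shape (2*m, channels): m filters maximizing class-1
    variance followed by m filters maximizing class-2 variance.
    """
    def avg_cov(X):
        # Normalized covariance per trial, averaged over trials.
        covs = [x @ x.T / np.trace(x @ x.T) for x in X]
        return np.mean(covs, axis=0)

    C1, C2 = avg_cov(X1), avg_cov(X2)
    # Whitening transform P = Lambda^{-1/2} U^T of the composite covariance.
    lam, U = np.linalg.eigh(C1 + C2)
    P = np.diag(lam ** -0.5) @ U.T
    # S1 = P C1 P^T and S2 = P C2 P^T share eigenvectors, and their
    # eigenvalues sum to 1, so sorting by the eigenvalues of S1 orders
    # the filters for both classes simultaneously.
    S1 = P @ C1 @ P.T
    d, B = np.linalg.eigh(S1)
    order = np.argsort(d)[::-1]      # descending eigenvalues of S1
    W = B[:, order].T @ P            # full filter bank, rows are filters
    return np.vstack([W[:m], W[-m:]])

def csp_features(trial, W):
    """Base-2 log-variance features of one spatially filtered trial."""
    Z = W @ trial
    v = np.var(Z, axis=1)
    return np.log2(v / v.sum())
```

Each feature is the base-2 logarithm of a variance ratio, matching the normalization described above.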

Optimized Extreme Learning Machine.
With its fast speed and high precision, the Optimized Extreme Learning Machine (OELM) is more effective at discriminating two classes of ECoG data than conventional algorithms. As the basis of OELM, the Extreme Learning Machine (ELM) has attracted increasing attention from researchers around the world in recent years. It is essentially a neural network composed of an input layer, a hidden layer, and an output layer [16]. Different from other traditional learning algorithms for single-hidden layer feed-forward networks (SLFNs), ELM aims to reach not only the smallest training error but also the smallest norm of output weights [17].
Assume that a single-hidden layer feed-forward network (SLFN) with $\tilde{N}$ hidden layer nodes is given. Unlike previous algorithms in which all parameters of the feed-forward network must be tuned, ELM can exactly learn $N$ distinct observations with no need to adjust the weights between the input neurons and the hidden layer or the initial hidden layer biases in practical applications [18]. In fact, many simulation results also show that ELM is not only fast in classification but can also yield very high recognition accuracy thanks to its universal approximation capability [19]. The steps of ELM are presented as follows.
Firstly, $N$ sample pairs $(x_j, o_j)$ are given, where $x_j$ and $o_j$, respectively, denote the input and the output. With $\tilde{N}$ hidden layer nodes in the network, the corresponding SLFN is expressed as

$$\sum_{i=1}^{\tilde{N}} \beta_i \, g_i(x_j) = o_j, \quad j = 1, \ldots, N,$$

where $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T$ is the weight vector between the $i$th hidden neuron and the $m$ output neurons, and $g(x) = [g_1(x), \ldots, g_{\tilde{N}}(x)]$ is the output vector of the hidden layer with respect to the input $x$. It is worth noting that $g(x)$ maps the data from the input space to the hidden layer space, so $g(x)$ is indeed a feature mapping; $o_j$ is the output of the $j$th neuron. For additive nodes with activation function $g$, $g_i$ is defined as

$$g_i(x) = g(a_i \cdot x + b_i),$$

where $a_i = [a_{i1}, a_{i2}, \ldots, a_{in}]$ is the weight vector connecting the $i$th hidden neuron to the input neurons and $b_i$ is the bias of the $i$th hidden neuron. The choice of activation function is not unique. All the equations above can be written compactly as

$$H\beta = T,$$

where $\beta = [\beta_1, \beta_2, \ldots, \beta_{\tilde{N}}]^T$ is the matrix of output weights between the hidden layer of $\tilde{N}$ nodes and the output nodes, $T$ is the matrix of expected outputs, and $H$ is the hidden layer output matrix:

$$H = \begin{bmatrix} g(a_1 \cdot x_1 + b_1) & \cdots & g(a_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(a_1 \cdot x_N + b_1) & \cdots & g(a_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}.$$

The smaller the norm of the output weights, the better the generalization performance the network tends to have, so ELM minimizes the training error together with the norm of the output weights:

$$\text{Minimize: } \|H\beta - T\| \ \text{ and } \ \|\beta\|.$$

The minimal-norm least-squares solution is

$$\beta = H^{\dagger} T,$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$ [19]. Since ELM requires no iteration, its learning speed is much faster than that of traditional classification algorithms. By adjusting the number of hidden layer nodes, both the learning ability and the classification accuracy can reach an optimal value. A wide class of activation functions $g$ can be used in ELM, so that ELM can approximate any continuous target function $T$.
With this universal approximation capability, the bias in the optimization constraints of SVM can be removed [20], which explains the better generalization performance and lower computational complexity of ELM.
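As a concrete illustration of the ELM training rule $\beta = H^{\dagger} T$, here is a minimal NumPy sketch. It is our own simplification, not the authors' implementation: the names `elm_train` and `elm_predict` are hypothetical, and a sigmoid activation is assumed.

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=None):
    """Train a basic ELM: random input weights, least-squares output weights.

    X: (N, d) inputs; T: (N, c) targets. Returns (a, b, beta).
    """
    rng = np.random.default_rng(seed)
    a = rng.uniform(-1, 1, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1, 1, size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))               # sigmoid hidden output
    beta = np.linalg.pinv(H) @ T                         # beta = H^+ T
    return a, b, beta

def elm_predict(X, a, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    return H @ beta
```

Note that the only learned quantity is `beta`, obtained in a single least-squares step via the pseudoinverse, which is why ELM needs no iterative training.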
Although ELM outperforms SVM in both classification speed and accuracy, it is very unstable when processing high-dimensional but small samples, which results from the random assignment of input weights. Therefore, we propose an improved algorithm based on ELM, called the Optimized Extreme Learning Machine (OELM), in which a projection of the feature signal is introduced. OELM addresses existing disadvantages of ELM: ELM needs more hidden neurons than BP, and the generalization performance of ELM depends on the proper selection of constant parameters, especially for a small number of training samples. According to [21], singular value decomposition (SVD), as a linear dimensionality reduction, maps the original data to a lower-dimensional space using a projection matrix. To reduce the complexity of our network, we assign the result of the SVD to the input layer. The steps of OELM, built on ELM, are presented as follows.
Firstly, the characteristics of the input signal are represented by a matrix $X$ of size $N \times m$, where $N$ and $m$, respectively, denote the number of samples and the number of attributes of the signal.
Secondly, apply SVD to the input matrix. SVD is widely used in image processing, signal classification, pattern recognition, and so on. In this experiment it is written as

$$X = P S Q^T,$$

where $P$ and $Q$ are the left and right singular matrices of the input matrix $X$, and the singular value matrix $S$ contains the singular values arranged in descending order. Select the $d$ largest singular values in $S$ and their corresponding singular vectors, which are used to approximate the input matrix $X$. The optimal rank-$d$ approximation of $X$ is

$$X_d = P_d S_d Q_d^T.$$

Next, in the low-dimensional space spanned by $Q_d$, the high-dimensional data are represented as

$$Y = X Q_d,$$

where $Q_d$ is known as the projection matrix. Then, to overcome the poor performance on high-dimensional small samples, the input layer weights are set to the projection matrix instead of random values, that is, $W = Q_d$. This improvement reduces the complexity of the network. Once the input layer weights are determined, the output of the hidden layer is obtained as

$$H = g(X W),$$

where $g(x)$ is the single-hidden-layer transfer function. At last, the output layer weights are calculated by the linear least-squares expression

$$\beta = H^{\dagger} T.$$

The training module finishes after these five basic steps. In fact, the input weights need not be learned again in OELM; only the output weights are calculated by least squares. The deterministic assignment of the input layer overcomes the defect of ELM, whose classification accuracy fluctuates due to the random assignment of input weights [22]. Results also show that the classification performance of OELM is very stable and reproducible compared with that of ELM.
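The five OELM steps above can be condensed into a short sketch, again under our own naming and with `tanh` as one example transfer function; the key difference from plain ELM is that the input weights come deterministically from the SVD rather than from a random draw.

```python
import numpy as np

def oelm_train(X, T, d):
    """OELM sketch: input weights fixed to the top-d right singular vectors.

    X: (N, m) feature matrix; T: (N, c) targets; d: reduced dimension.
    Returns (W, beta).
    """
    # X = P S Q^T; the first d rows of Q^T span the best rank-d subspace
    # of the feature space.
    _, _, Qt = np.linalg.svd(X, full_matrices=False)
    W = Qt[:d].T                     # deterministic input weights W = Q_d
    H = np.tanh(X @ W)               # hidden layer output H = g(X W)
    beta = np.linalg.pinv(H) @ T     # output weights by least squares
    return W, beta

def oelm_predict(X, W, beta):
    return np.tanh(X @ W) @ beta
```

Because no randomness is involved, training twice on the same data produces identical weights, which mirrors the stability claim made for OELM above.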

Description of the Classification Procedure.
Based on the description given in Sections 3.1 and 3.2, features of the brain signals can be extracted by CSP at every sampling point. We use the extracted features to train the OELM classifier on the training dataset, and the trained classifier is then applied to classify the features extracted at the same sampling point in the testing dataset. The detailed procedure of the OELM classification strategy is described in Algorithm 1.
With the algorithms mentioned above, experiments can then be conducted. Figure 2 illustrates the detailed block diagram of the proposed ECoG processing scheme.
Firstly, we sample and collect signals from the brain. In this experiment, we increase the number of hidden layer nodes in steps of one from 10 to 60. Sampling points are initially set to 1500, 2000, and 2500, respectively; CSP is then applied to extract features after acquiring the ECoG signals from the 64 electrodes; finally, 16 different activation functions are supplied to OELM for classification. All experiments are performed over several sets of parameters, with 278 trials used to train OELM and the remaining 100 used to test the performance of the trained OELM.
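The parameter sweep just described (hidden nodes from 10 to 60, several activation functions) can be organized as a simple grid search. The sketch below is ours, not the authors' code: `train_and_score` is a hypothetical placeholder for training a classifier with a given activation and node count and returning its test accuracy, and only two of the activation functions are shown.

```python
import numpy as np

# Two of the activation functions used in the paper, as callables.
ACTIVATIONS = {"sin": np.sin, "sig": lambda x: 1.0 / (1.0 + np.exp(-x))}

def sweep(train_and_score, node_range=range(10, 61)):
    """Grid search over activation functions and hidden node counts.

    train_and_score(g, n) must return the test accuracy for activation g
    and n hidden nodes. Returns (best_accuracy, best_name, best_nodes).
    """
    best = (-1.0, None, None)
    for name, g in ACTIVATIONS.items():
        for n in node_range:
            acc = train_and_score(g, n)
            if acc > best[0]:
                best = (acc, name, n)
    return best
```

The same loop structure covers both ELM and OELM: only the scoring callback changes.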

Experiment Results and Discussion
We conduct all experiments under exactly the same development environment. Experiments are performed on the same computer, with a 2.4 GHz Intel Core i3 processor and 4 GB of memory, and implemented in MATLAB 7.12 (2011a, 64 bit). Section 4.1 lists several parameters that evaluate the performance of the system, while Section 4.2 shows the distribution of signal amplitudes obtained by CSP. Results of the mentioned methods are presented and discussed in Section 4.3. Bold values in the tables indicate the best result among those listed.

Performance Evaluation.
To evaluate the performance of a BCI, there are five main indicators, as follows: (1) Classification accuracy. This refers to the correct classification rate and is a fundamental indicator of whether the BCI system meets the requirements. By comparing the true label of each trial with the label obtained in the testing stage, we count the number of matching pairs and divide it by the total number of trials to obtain the classification accuracy.
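Indicator (1) amounts to a one-line computation; the helper name below is our own.

```python
import numpy as np

def classification_accuracy(y_true, y_pred):
    """Fraction of trials whose predicted label matches the true label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))
```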
Formula (20) performs a paired t-test of the hypothesis that two matched samples, held in vectors X and Y and drawn from the distributions of ELM and OELM results, have equal means, and returns the test result in H. H = 0 indicates that the hypothesis of equal means cannot be rejected at the 5% significance level; H = 1 indicates that it can be rejected at the 5% level.
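The decision rule in formula (20) corresponds to a standard paired t-test, available as `ttest_rel` in SciPy (or `ttest` in MATLAB). The wrapper below and its name are our own sketch of that check.

```python
import numpy as np
from scipy.stats import ttest_rel

def paired_test(x, y, alpha=0.05):
    """Paired t-test of equal means on matched samples x and y.

    Returns (H, p): H = 1 means the hypothesis of equal means is
    rejected at significance level alpha, H = 0 means it is not.
    """
    t_stat, p_value = ttest_rel(np.asarray(x), np.asarray(y))
    return int(p_value < alpha), float(p_value)
```

A paired test is appropriate here because each ELM accuracy is matched with an OELM accuracy obtained under the same parameter setting.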
However, high classification accuracy in a BCI may come at the expense of training or testing time, due to algorithmic complexity; conversely, rapid classification is generally obtained at the expense of accuracy [23]. Therefore, we aim for a compromise of high classification accuracy and low time consumption that meets the system requirements.

Preprocessing Stage.
After extracting features with CSP, we can calculate the amplitudes of all signals from the 64 electrodes. To show this intuitively, we randomly choose nine consecutive trials (trial 6 to trial 14) from the 278 training trials and then plot trials 9 and 10 as shown in Figure 3, which illustrates the relative contribution of the signal amplitude for these two trials. As shown in Figure 3, trials 9 and 10 have rather different contour line distributions: the red region of Figure 3(a) largely appears blue in Figure 3(b), representing a low voltage potential. To highlight the difference further, we select a relatively central channel (the 30th) and plot the amplitude of these two trials at every sampling point in Figure 4.
In Figure 4, the mean amplitude of trial 10 is higher than 20, while the mean of trial 9 is lower than 0, which strongly illustrates that CSP extracts features effectively. In Figure 3, the region of interest (ROI) is concentrated over the sensorimotor cortex area [24], covering channels 22, 29 to 32, and 37 to 40. We hand-select these more discriminative channels, pick out trials 9 and 10, whose level curves differ the most, and plot their amplitude against sampling points for those nine channels in Figure 5. In Figure 5, the red curves show the voltage of trial 9 and the blue curves that of trial 10 from the raw training data; the straight lines parallel to the horizontal axis mark the mean value of each trial. Channels are regarded as "good" if the two classes are visibly discriminable on average [25]. Over the whole 278 training trials, Table 1 lists the exact mean voltage of those channels, with the last column giving the label of each trial.
As presented in Table 1, the label of trial 9 is 1 while that of trial 10 is −1, which explains the difference between the curves of those nine channels in Figure 5. We feed the features extracted by CSP to SVM, ELM, and OELM, and the results are reported below.

Results on Extreme Learning Machine.
We first set the sampling points to 1500, 2000, and 2500. Experimental results are obtained under 5 different activation functions: "Sine" (sin), "Sigmoid" (sig), "Hard limit" (hardlim), "Triangular basis" (tribas), and "Radial basis" (radbas). We list the optimal results for specific numbers of hidden layer nodes in Table 2, which gives the performance of the 5 activation functions with ELM as the sampling point increases from 1500 to 2500. It is noteworthy that the maximum classification accuracy (80.77%) is obtained with the "Sine" (sin) and "Sigmoid" (sig) functions. We also obtain relatively short run times; in particular, when the activation function is "sig," the average training and testing times are both 0.0006 seconds. Compared with the experimental results under other conditions, this system performance is optimal.
The training and testing times are both relatively good; however, the classification accuracy is still not good enough. The reasons are as follows. On the one hand, CSP, as the feature extraction method, does not take frequency domain information into account, which generates unrepresentative ECoG features and hurts subsequent classification accuracy [25]. On the other hand, the random assignment of input weights in ELM makes it difficult to reach the global optimum when searching for the optimal weights [26]. These two points explain why CSP combined with ELM cannot reach satisfactory results. OELM is therefore proposed and put into practice, as shown below.

Results on Optimized Extreme Learning Machine.
The optimal results of OELM are listed in Table 3, excluding the "Radial basis" (radbas), "Triangular basis" (tribas), "Hard limit" (hardlim), "Cosine" (cos), "CosineH" (cosh), and "arcCosineH" (acosh) functions, whose classification accuracies are below 70%. Table 3 suggests that, in binary classification, a fixed parameter setting for each activation function works well across all experiments. Compared with the optimal accuracy (80.77%) in Table 2, the best result in Table 3 is 84.62%, an improvement of 3.85 percentage points. It is noteworthy that the average training and testing times obtained by OELM are very stable and slightly shorter than those obtained by ELM. When the activation function is "sin" or "sig," with the number of hidden layer nodes set to 53 and the sampling point set to 1500, the classification accuracy reaches 84.62% and 83.33%, respectively, with average training times of 0.0049 and 0.0052 seconds and average testing times below 0.0001 and 0.0012 seconds, respectively.
It should also be emphasized that the activation functions "sin," g(x) = sin(x), and "sig," g(x) = S(x) = 1/(1 + e^{−x}), are used in the ELM and OELM classifiers for their better performance. Since singular value decomposition (SVD) reduces the data dimension effectively, it combines well with these two activation functions.
This, together with a closer look at the results in Table 2, suggests that "sin" and "sig" are more valuable for our binary decision tasks than the others.
We still expect a further improvement in classification accuracy, and proper selection of the sampling points and hidden layer nodes may markedly help [27]. We also aim to reduce the number of hidden layer nodes, which effectively lowers the complexity of the network. Figure 6 compares performance at different sampling point settings; for each case, we calculate the smallest number of hidden layer nodes and plot it on the secondary axis. Figure 6 suggests that the result is relatively optimal when the sampling point is set to 2150. The classification accuracy improves to 92.31% when adopting OELM as the classifier with either "sin" or "sig" as the activation function, with 33 hidden layer nodes in both cases, fewer than the more than 38 nodes required by ELM. Figure 7 depicts the classification accuracy at every hidden layer node number.

Complexity
For each activation function, different numbers of hidden layer nodes, ranging from 10 to 60, are applied to ELM and OELM. Thereafter, the classification accuracy for each set of parameters is calculated on the testing data. Tables 4 and 5 compare the performance of ELM with that of OELM, with the activation functions being "sig" and "sin," respectively. In Tables 4 and 5, "Std" and "RMSE" denote standard deviation and root mean square error, respectively. Following formula (20), we take the testing accuracies obtained by ELM as the sample X and those of OELM as the sample Y in the t-test.
Both results show that H equals 1, meaning the two samples of testing accuracies differ significantly. The p values are far less than 0.0001, so the null hypothesis that the two samples have equal means can be rejected with high confidence.
This confirms that OELM obtains a significant improvement in classification accuracy in comparison with ELM.
According to the results in Table 4, the proposed OELM yields a maximum accuracy of 92.31%, an increase of 16.67 percentage points over the optimal result (75.64%) of ELM under the same parameters. Likewise, in Table 5, the proposed OELM outperforms ELM by 12.82 percentage points, so on average OELM achieves a clearly higher classification accuracy. As Tables 4 and 5 show, ELM and OELM obtain, generally speaking, similar classification speed; however, ELM requires more hidden nodes than OELM, meaning that the complexity of OELM is much lower [28]. Figure 8 shows the maximum value, mean value, standard deviation, and root mean square error of the testing accuracy obtained by ELM and OELM. It reveals that, for both the "sin" and the "sig" function, the maximum and mean classification accuracies obtained by OELM are considerably higher than those achieved by ELM.
The performance of the proposed algorithm is also compared with other systems in Table 6, which share the same dataset. Table 6 lists several experimental results, their feature extraction methods, and their classifiers; the last column shows the classification accuracy of each system. Table 6 indicates that our method obtains an accuracy of 92.31%, which is 1.31 percentage points higher than Qingguo Wei's method, the best of the others. It can be concluded that our BCI system, with CSP to extract the features and OELM to classify them afterwards, outperforms those listed. The performance of the proposed algorithm is further compared in Table 7 with several other classification methods listed in [29], which all share the same dataset and adopt CSP as the feature extraction algorithm. Table 7 reveals that our method obtains an accuracy of 92.31%, which is 6.31 percentage points higher than Liu Yang's method, the best of the others. These results prove that OELM achieves higher classification accuracy than SVM and LDA under the same feature extraction method (CSP). We also evaluate our method in terms of computation time, which includes training and testing time.
e optimal classification accuracy, training time, and testing time corresponding to each experiment that we carried out are shown in Table 8.
As seen from Table 8, CSP combined with OELM achieves the highest accuracy, 11.54 percentage points higher than ordinary ELM. On the whole, OELM outperforms SVM in both accuracy and speed; compared with the accuracies obtained by SVM and ELM, that of OELM is more competitive, and the defects in the application of SVM are successfully overcome by OELM through its high accuracy and fast speed. The BCI system generates better results with OELM than with other state-of-the-art methods when analyzing and processing ECoG signals [30]. As seen from Table 8, ELM is comparable with OELM in speed; however, OELM runs faster than SVM by a factor of up to thousands, in both the training and the testing module. Furthermore, OELM achieves a maximum testing accuracy of 92.31% with 33 nodes, which is significantly higher than all the results so far listed in the ranking of BCI competition III that use popular algorithms such as SVM [31]. It can thus be concluded from the results displayed in these figures and tables that OELM is much more suitable and competitive for binary-class motor imagery signals.

Conclusions
In this paper, a new intelligent and efficient learning algorithm called the Optimized Extreme Learning Machine (OELM) is presented and applied to motor imagery signal classification, with CSP used to extract features. The proposed method outperforms conventional popular learning algorithms thanks to its extremely fast learning speed and good generalization performance, which is demonstrated on BCI competition III dataset I. Different sampling points and activation functions are employed across experiments to analyze the properties of OELM.
The results show that OELM requires less computational time and obtains better accuracy than SVM and ELM. In conclusion, OELM is a novel and efficient classifier for biometric applications. Although only the binary classification strategy is discussed in our study, OELM can also be applied to multiclass problems. We believe this method has great potential for the design of real-time BCI systems.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.