Feature Selection in Classification of Eye Movements Using Electrooculography for Activity Recognition

Activity recognition is needed in many applications, for example, surveillance systems, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. To select a subset of features, Differential Evolution (DE), an efficient evolutionary optimizer, is used to find informative features in eye movements recorded by electrooculography (EOG). Many researchers use EOG signals in human-computer interaction, analyzing eye movements with various computational intelligence methods. The proposed system analyzes EOG signals using clearness based features, minimum redundancy maximum relevance features, and Differential Evolution based features. This work concentrates on the DE-based feature selection algorithm in order to improve classification for accurate activity recognition.


Introduction
Much research in activity recognition and computer vision places gesture recognition at the forefront [1]. Gesture recognition is an ideal example of multidisciplinary research: gestures convey meaningful information for interacting with the environment through body motions involving physical movements of the fingers, arms, head, face, or body. There are many tools for gesture recognition, such as statistical modelling, computer vision, pattern recognition, and image processing. Feature extraction and feature selection have been used successfully in many gesture recognition systems. The importance of gesture recognition lies in human-computer interaction applications ranging from medical rehabilitation to virtual reality. Mitra and Acharya provided a survey on gesture recognition, particularly on hand gestures and facial expressions [2].
The goal of activity recognition is to provide information that allows a system to best assist the user with his or her task. Activity recognition has become an important area for a broad range of applications such as patient monitoring, vigilance systems, and human-computer interaction. Bulling was the first to describe and apply eye-based activity recognition to the problem of recognizing everyday activities [3]. Eye movements carry plentiful information for activity recognition.
Bulling also described and evaluated algorithms for detecting three eye movement characteristics in EOG signals: saccades, fixations, and blinks [4]. Blinks in particular play a major role in activity recognition. Blinking serves as the first line of ocular protection; the wiping action of the lid can remove dust and pollen [5]. According to Ousler et al., blink rate can indeed serve as an indicator of the cognitive or emotional state of the blinker [6]. Cummins states that blinking might also contribute to social interaction by strengthening mutual synchronization in movements and gestures [7]. Leal and Vrij related blinks to increased cognitive demand during lying [8]. Ponder and Kennedy reported on blinks during emotional excitement [9], and Hirokawa observed more frequent blinks at the highest levels of nervousness [10]. Cognitive processes have a substantial impact on blink rates: mentally taxing activities such as memorization or mathematical computation are associated with an increased blink rate, whereas inattentiveness (daydreaming) and stimulus tracking are associated with low blink rates. Blinks, together with other eye movements, are therefore possible sources of useful information for making inferences about people's mental states.
One way to detect eye movements is EOG, a technique for measuring the resting potential of the retina [11,12]. The following are some of the wide range of EOG applications.
(i) Wijesoma et al. used EOG to guide and control a wheelchair for disabled people [13].
(iii) Deng et al. used EOG to operate a TV remote control and to play a game [15].
Feature selection is important and necessary when a real-world application has to deal with high-dimensional training data. Many feature selection approaches are available in the literature; some of the hybrid approaches are listed as follows.
(i) García-Nieto et al. presented a Differential Evolution based approach for efficient automated gene subset selection using the DLBCL lymphoma and colon tumor gene expression datasets. The selected subsets are evaluated by means of an SVM classifier [19].
(ii) Li employed a DE-SVM model that hybridizes DE and SVM to improve the classification accuracy of road icing forecasting using feature selection [20].
(iii) Xu and Suzuki proposed a feature selection method based on sequential forward floating selection to improve the performance of a classifier in the computerized detection of polyps in CT colonography (CTC). In this work, feature selection is coupled with an SVM classifier to maximize the area under the receiver operating characteristic curve [21].
(iv) Kuo et al. proposed a kernel-based feature selection method to improve the classification performance of SVM on hyperspectral image datasets [22].
(v) Güven and Kara employed artificial neural network analysis of EOG signals to distinguish between subnormal and normal eyes [23]. This enables the physician to make a quick judgment about the existence of eye disease with more confidence. Only binary classification of EOG signals was performed in that work.
The literature review shows that a number of EOG applications are being developed. However, suitable feature selection algorithms still need to be developed, evaluated, and used to produce substantial improvements in communication with disabled people through eye movements and to make inferences about a person's cognitive state.
This paper presents a hybrid feature selection technique based on DE for activity recognition from eye movements recorded with EOG; it identifies the most informative subset among all eye movement characteristics. The method is applied to the EOG signal features as an optimizer before the classifier, to recognize activities such as reading, browsing, writing, watching video, and copying. The benefits of this feature selection approach include more efficient activity recognition, since only a subset of eye movement features is used, and helping the SVM attain satisfactory accuracy.

Methodology
Here we explain the preprocessing, feature selection (with CBFS, mRMR, and DEFS), SVM classifier, and the model evaluation strategy used in this work.

Dataset Description: Recognition-of-Office-Activities.
The EOG data used in this study come from Andreas Bulling's "recognition-of-office-activities" dataset (https://www.andreas-bulling.de/datasets/recognitionof-office-activities/). Eight participants took part in the study. Each participant performed two continuous activity sequences of about 30 minutes each, and the total dataset comprises about eight hours of data. The recorded activities are office-based activities performed in random order: reading a paper, taking notes, watching a video, and browsing the web. Additionally, the dataset includes periods of rest (the NULL class). All of these activities are typically performed during a normal working day.
The experiments were carried out in a well-lit workplace during normal working hours. Participants were seated in front of two 17-inch flat screens with a resolution of 1280 × 1024 pixels, which displayed a video player, a web browser, and text for copying in a word processor. Sheets of paper and a pen were placed on the table close to the participants for the reading and writing tasks. No constraints were imposed on the type of website or the manner of interaction in the browsing task.
EOG signals were picked up using an array of five 24 mm Ag/AgCl wet electrodes from Tyco Healthcare placed around the right eye. The horizontal and vertical signals were each collected with two electrodes, and the fifth electrode was placed on the forehead as the signal reference. EOG data were captured with a commercial EOG device, the TMSI (Twente Medical Systems International) Mobi8, which integrates instrumentation amplifiers with 24-bit ADCs and tends to provide good signal quality. The Mobi8 was worn on a belt around each participant's waist and recorded four EOG channels at a sampling rate of 128 Hz. Changes between the activity phases (read, browse, write, video, copy, and rest) were annotated with a wireless remote control, while participants behaved as they naturally would in daily life during regular working hours. The Context Recognition Network Toolbox (CRNT) was used to handle data recording and synchronization.

Baseline Drifts Removal and Filtering EOG Signals.
For preprocessing, this work applies a median filter and removes baseline drift with a one-dimensional wavelet decomposition at level nine using Daubechies wavelets on the signal component [18]. Figure 1 shows the horizontal EOG signal before and after baseline drift removal and filtering.
Computational and Mathematical Methods in Medicine 3
Table 1 lists the corresponding mean square error (MSE) and peak signal-to-noise ratio (PSNR) for two participants, computed over thousands of samples.
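The preprocessing chain and the Table 1 quality measures can be illustrated with a short NumPy sketch. It is not the authors' code and rests on assumptions: the 5-sample median window is hypothetical, and because the Daubechies order is not stated, the baseline is taken as the level-9 approximation of db1 (Haar), the simplest Daubechies wavelet. For Haar, that approximation reconstructed in the signal domain is simply the mean of each block of 2^9 = 512 samples (4 s at 128 Hz). The PSNR peak value (maximum absolute amplitude of the reference signal) and the choice of which signal serves as reference are also assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(x, k=5):
    """Running median over an odd window of k samples (edges padded by repetition)."""
    if k % 2 == 0:
        raise ValueError("window length k must be odd")
    xp = np.pad(np.asarray(x, dtype=float), k // 2, mode="edge")
    return np.median(sliding_window_view(xp, k), axis=1)


def haar_baseline(x, level=9):
    """Level-`level` db1 (Haar) approximation used as the baseline estimate.

    For Haar, the approximation reconstructed in the signal domain equals the
    mean of each block of 2**level samples; the tail is edge-padded to fill a block.
    """
    block = 2 ** level
    n = len(x)
    xp = np.pad(np.asarray(x, dtype=float), (0, -n % block), mode="edge")
    means = xp.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)[:n]


def preprocess(eog, k=5, level=9):
    """Median filtering followed by baseline drift removal."""
    smoothed = median_filter(eog, k)
    return smoothed - haar_baseline(smoothed, level)


def mse(reference, processed):
    """Mean square error between two equally long signals."""
    r = np.asarray(reference, dtype=float)
    p = np.asarray(processed, dtype=float)
    return float(np.mean((r - p) ** 2))


def psnr(reference, processed, peak=None):
    """PSNR in dB; the peak defaults to max |reference| (an assumption)."""
    r = np.asarray(reference, dtype=float)
    err = mse(r, processed)
    if err == 0.0:
        return float("inf")
    if peak is None:
        peak = float(np.max(np.abs(r)))
    return float(10.0 * np.log10(peak ** 2 / err))
```

With PyWavelets installed, `pywt.wavedec` followed by `pywt.waverec` on zeroed detail coefficients would give a smoother, higher-order Daubechies baseline; the block-mean form is used here only to keep the sketch dependency-free.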

Basic Eye Movement Type Detection.
Many activities can be described in the EOG as a regular pattern, i.e., a specific sequence of saccades and short fixations of similar duration. The amplitude changes in the signals differ between activities, which can be used to identify regular office activities. In the EOG, reading is characterized by small saccades and fixations and involves no large amplitude changes, because the eyes make small movements between words and a fast movement from the end of one line to the beginning of the next. Writing is similar to reading but requires longer fixations and shows greater variance; it is best described by the average fixation duration. Copying involves regular back-and-forth eye movements, i.e., saccades between the screens. This is reflected in the selection of small and large horizontal saccade features, as well as the variance of horizontal EOG fixations. In contrast, watching a video and browsing are highly unstructured, because these activities depend on the video or website being viewed. These results suggest that, for tasks involving a known set of specific activity classes, recognition can be streamlined by choosing only the eye movement features known to best describe those classes.
The steps in basic eye movement type detection for EOG-based activity recognition are (1) noise and baseline drift removal, (2) basic eye movement detection, and (3) feature extraction. Basic eye movements are detected and interpreted using wavelet coefficients.
We used the same wavelet coefficients to detect blinks in the vertical EOG signal (EOGv). Eye movement features were calculated separately for the horizontal and vertical EOG signals of each participant. Statistical features such as the mean, variance, maximum, and minimum of saccade, fixation, and blink characteristics are extracted; different characteristics of the EOG signals produce different changes in the wavelet coefficients. In total, 210 statistical features are extracted.
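Detection and feature extraction can be sketched as follows, under assumptions that go beyond the text: a single-scale Haar wavelet coefficient (the mean of the next `scale // 2` samples minus the mean of the previous ones) is thresholded to mark saccades, and the same routine applied to EOGv would mark blink candidates. The scale of 20 samples, the threshold of 55 signal units, and property names such as `sacc_amp_h` are hypothetical. The sketch shows how per-property mean, variance, maximum, and minimum become features; it does not reproduce the paper's exact 210-feature set.

```python
import numpy as np


def haar_coefficients(x, scale=20):
    """Single-scale Haar wavelet coefficient at every sample (unnormalized):
    coef[i] = mean(x[i:i+half]) - mean(x[i-half:i]), with edge padding."""
    half = scale // 2
    kernel = np.r_[np.full(half, 1.0 / half), np.full(half, -1.0 / half)]
    xp = np.pad(np.asarray(x, dtype=float), half, mode="edge")
    return np.convolve(xp, kernel, mode="valid")[: len(x)]


def detect_events(x, scale=20, threshold=55.0):
    """Return (start, end) sample ranges, end exclusive, where |coef| exceeds
    the threshold. Scale and threshold are hypothetical values."""
    mask = np.abs(haar_coefficients(x, scale)) > threshold
    edges = np.diff(mask.astype(int), prepend=0, append=0)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return [(int(s), int(e)) for s, e in zip(starts, ends)]


def event_statistics(values, prefix):
    """Mean, variance, maximum, and minimum of one event property."""
    v = np.asarray(values, dtype=float)
    names = ("mean", "var", "max", "min")
    if v.size == 0:  # no events in this window: report zeros (an assumption)
        return {f"{prefix}_{n}": 0.0 for n in names}
    stats = (v.mean(), v.var(), v.max(), v.min())
    return {f"{prefix}_{n}": float(s) for n, s in zip(names, stats)}


def segment_features(events):
    """events maps a hypothetical property name (e.g. 'sacc_amp_h') to its values."""
    feats = {}
    for name, values in events.items():
        feats.update(event_statistics(values, name))
    return feats
```

For example, the amplitudes and durations of the events returned by `detect_events` on the horizontal and vertical channels would each be passed to `segment_features` under their own property name.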
Eye movement characteristics such as saccades and blink patterns of EOG (horizontal and vertical) signals are shown in Figure 2.

Minimum Redundancy Maximum Relevance (mRMR).
The minimum redundancy maximum relevance (mRMR) algorithm [24,25] is a sequential forward selection algorithm that uses mutual information to measure both relevance and redundancy. The mRMR scheme selects the features that correlate most strongly with the class label (maximum relevance) while being mutually dissimilar, i.e., weakly correlated with one another (minimum redundancy). This is expressed by the following criterion:
$$\max_{S}\left[\frac{1}{|S|}\sum_{x_i \in S} I(x_i; c) - \frac{1}{|S|^{2}}\sum_{x_i, x_j \in S} I(x_i; x_j)\right],$$
where $I(x_i; c)$ is the measure of dependence between feature $x_i$ and the objective (class) $c$, $S$ is the feature set, and $|S|$ is the number of features. The quantity $\Delta I = I(x_{1:m}; c) - I(x_{1:m-1}; c)$ is the difference in information with and without feature $x_m$.

Clearness Based Feature Selection (CBFS).
The clearness based feature selection (CBFS) algorithm [26] is a filter method. Clearness means the separation between classes in a feature. CBFS calculates the distance between each target sample and the center of every class, and then compares the class of the nearest center with the class of the target sample. The ratio of matching samples over all samples in a feature becomes the clearness value of that feature [26]. If the clearness of feature $f_2$ is greater than the clearness of feature $f_1$, then $f_2$ is more useful for classification than $f_1$.
Step 1. The center of each class (for example, read or write) in every feature is calculated by averaging. $\mathrm{Med}(k, j)$ denotes the center (median point) of class $j$ in feature $k$, and it is calculated as
$$\mathrm{Med}(k, j) = \frac{1}{n_j}\sum_{i:\, y_i = j} x_{ik},$$
where $n_j$ is the number of samples of class $j$.
Step 2. For each $x_{ik}$, the predicted class of the sample is calculated. After computing the distance between $x_{ik}$ and $\mathrm{Med}(k, j)$ for all classes $j$, the nearest center $\mathrm{Med}(k, j^{*})$ is taken, and $j^{*}$ is the predicted class label $\hat{y}_{ik}$ for $x_{ik}$. The distance is
$$d\big(x_{ik}, \mathrm{Med}(k, j)\big) = \left| x_{ik} - \mathrm{Med}(k, j) \right|.$$
Step 3. Calculate the $m \times n$ matrix $M2$ ($m$ samples, $n$ features), which records whether the predicted class label matches the correct class label:
$$M2(i, k) = \begin{cases} 1, & \hat{y}_{ik} = y_i, \\ 0, & \text{otherwise.} \end{cases}$$
Step 4. Finally, calculate the clearness score of feature $k$:
$$\mathrm{Score}(k) = \frac{1}{m}\sum_{i=1}^{m} M2(i, k).$$
The range of $\mathrm{Score}(k)$ is [0, 1]. A score close to 1 indicates that the classes in feature $k$ are well grouped and its elements can be clearly classified.
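Both baseline filter criteria can be sketched in NumPy. Assumptions: mRMR uses the greedy incremental form of the criterion (relevance minus the mean mutual information with already selected features) on discretized, integer-coded features, with mutual information estimated from counts in nats; CBFS follows Steps 1 to 4 with the class mean as center, absolute distance within one feature, and ties in Step 2 resolved to the first class. The function names are hypothetical.

```python
import numpy as np


def mutual_info(a, b):
    """Mutual information I(a; b) in nats for two discrete (integer-coded) 1-D arrays."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1.0)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))


def mrmr(X, y, k):
    """Greedy mRMR: relevance I(x_j; c) minus mean redundancy with selected features."""
    n_features = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, i]) for i in selected])
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected


def clearness_scores(X, y):
    """CBFS Steps 1-4: fraction of samples whose nearest class center matches their label."""
    classes = np.unique(y)
    centers = np.array([X[y == c].mean(axis=0) for c in classes])  # Step 1: (n_classes, n)
    dist = np.abs(X[:, None, :] - centers[None, :, :])              # Step 2: (m, n_classes, n)
    predicted = classes[np.argmin(dist, axis=1)]                     # nearest center, first on ties
    m2 = (predicted == y[:, None]).astype(float)                     # Step 3: M2 is (m, n)
    return m2.mean(axis=0)                                           # Step 4: Score(k)
```

Features would then be ranked by `clearness_scores` in descending order for CBFS, whereas `mrmr` returns the selection order directly.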

Proposed Method.
The proposed method first identifies essential features by applying a threshold (th) to the correlation values among features, which reduces the dimensionality of the feature space. Features whose correlation with the other features satisfies the threshold (correlation < th = 0.8) are kept for the next stage. After removing redundant features, approximately 184 features are retained for this work.
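The correlation stage can be sketched as a greedy filter. The text does not say whether the correlation is taken in absolute value or in which order features are visited, so absolute Pearson correlation and column order are assumptions here; `correlation_filter` is a hypothetical name.

```python
import numpy as np


def correlation_filter(X, th=0.8):
    """Greedy pass over features in column order: keep feature j only if its
    absolute Pearson correlation with every already-kept feature is below th."""
    X = np.asarray(X, dtype=float)
    constant = X.std(axis=0) == 0
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.atleast_2d(np.abs(np.corrcoef(X, rowvar=False)))
    kept = []
    for j in range(X.shape[1]):
        if constant[j]:
            continue  # a constant feature carries no information
        if all(corr[j, i] < th for i in kept):
            kept.append(j)
    return kept
```

`X[:, correlation_filter(X)]` keeps the surviving columns, which would then be passed on to the DE stage.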
The reduced feature space is given to the DEFS algorithm, which detects the most significant feature subset using a machine learning approach, namely a maximum a posteriori (MAP) classifier.
For projecting the features into the feature subspace, we use a kernelized Bayesian structure. The method is applied to the EOG signal features as an optimizer before the classifier, to recognize activities such as read, browse, write, video, and copy. Figure 3 illustrates the steps involved in the Differential Evolution process [27]. The first step of the DE optimization method is to generate a population (NP × D) of NP members, each a D-dimensional real-valued parameter vector. To improve the efficiency of DE subset selection, we define the fitness function in terms of the maximum a posteriori probability. A kernel distribution is appropriate for EOG features because they are continuous. The EOG features may be skewed and have multiple peaks, so we use a kernel, which does not require strong distributional assumptions. The kernel requires more computation time and memory than the normal distribution. For every feature, the Naive Bayes classifier models each class with a kernel estimated from the training data of that class. The default kernel is normal (Gaussian), and the classifier automatically selects a width for every class and feature. The stopping criterion was reaching the maximum number of iterations. The chosen fitness function was the classification error rate achieved by the Naive Bayes classifier with kernel distribution.
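The fitness evaluation can be sketched as a kernel Naive Bayes classifier with a Gaussian kernel per class and feature and a maximum a posteriori decision. The text says only that the kernel width is chosen automatically, so Silverman's rule of thumb is an assumption here, as are the class and function names; this is an illustration, not the authors' implementation.

```python
import numpy as np


class KernelNaiveBayes:
    """Naive Bayes with a Gaussian kernel density per class and feature (MAP rule)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_ = np.unique(y)
        self.data_ = [X[y == c] for c in self.classes_]
        self.log_prior_ = np.log([len(d) / len(X) for d in self.data_])
        # Bandwidth per class and feature: Silverman's rule of thumb (an assumption).
        # Needs at least 2 training samples per class.
        self.bw_ = [1.06 * np.maximum(d.std(axis=0, ddof=1), 1e-3) * len(d) ** -0.2
                    for d in self.data_]
        return self

    def _log_likelihood(self, X, d, bw):
        z = (X[:, None, :] - d[None, :, :]) / bw               # (n_test, n_train, n_features)
        dens = np.exp(-0.5 * z ** 2).mean(axis=1) / (bw * np.sqrt(2 * np.pi))
        return np.log(np.maximum(dens, 1e-300)).sum(axis=1)    # naive independence

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.column_stack([lp + self._log_likelihood(X, d, bw)
                                  for lp, d, bw in zip(self.log_prior_, self.data_, self.bw_)])
        return self.classes_[np.argmax(scores, axis=1)]        # maximum a posteriori class


def nb_error_rate(X_tr, y_tr, X_va, y_va):
    """Fitness of a candidate feature subset: validation error of the kernel NB."""
    model = KernelNaiveBayes().fit(X_tr, y_tr)
    return float(np.mean(model.predict(X_va) != y_va))
```

For a candidate subset `s` (a list of column indices), the fitness would be `nb_error_rate(X_tr[:, s], y_tr, X_va[:, s], y_va)`. Each density is evaluated against every training sample, which illustrates why the kernel needs more time and memory than a normal distribution.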
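In this work, the scaling factor $F$ is assigned dynamically, the crossover rate CR is 0.8, the number of dimensions of the problem (number of features) $D$ is 15, the population size is 50, and the number of generations is 10. The parent vector is mixed with the mutated vector to produce a trial vector as in (9):
$$u_{j,i,g} = \begin{cases} v_{j,i,g}, & \text{if } \mathrm{rand}_j \le CR \text{ or } j = j_{\mathrm{rand}}, \\ x_{j,i,g}, & \text{otherwise.} \end{cases} \qquad (9)$$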
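The DE search over feature subsets can be sketched as below, using the settings reported above as defaults (NP = 50, 10 generations, CR = 0.8). Several details are assumptions: the dynamic scaling factor F is drawn uniformly from [0.5, 1.0] once per generation; real-valued vectors are rounded to 0-based feature indices rather than the 1 to NF range used in the text; and duplicate indices are replaced by unused features sampled with probability proportional to their frequency in the current population, which is only one possible reading of the probability weighting described in this section. `fitness` can be any function returning a classification error rate for a subset; all names are hypothetical.

```python
import numpy as np


def decode(vec, n_features, weights, rng):
    """Round a real-valued DE vector to distinct 0-based feature indices.
    Duplicates are replaced by unused features drawn with probability ~ weights."""
    idx = [int(i) for i in np.clip(np.rint(vec), 0, n_features - 1)]
    subset = list(dict.fromkeys(idx))  # keep the first occurrence of each index
    while len(subset) < len(idx):
        free = np.setdiff1d(np.arange(n_features), subset)
        p = weights[free] / weights[free].sum()
        subset.append(int(rng.choice(free, p=p)))
    return subset


def defs(fitness, n_features, dnf, pop_size=50, generations=10, cr=0.8, seed=0):
    """DE feature subset search (DE/rand/1 mutation, binomial crossover, greedy selection).
    `fitness` maps a list of feature indices to an error rate (lower is better)."""
    if pop_size < 4 or dnf > n_features:
        raise ValueError("need pop_size >= 4 and dnf <= n_features")
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0, n_features - 1, size=(pop_size, dnf))
    weights = np.ones(n_features)
    subsets = [decode(v, n_features, weights, rng) for v in pop]
    fit = np.array([fitness(s) for s in subsets], dtype=float)
    for _ in range(generations):
        # Hypothetical feature weights: frequency in the current population (+1 smoothing).
        weights = np.bincount(np.concatenate(subsets), minlength=n_features) + 1.0
        f_scale = rng.uniform(0.5, 1.0)  # hypothetical "dynamic" scaling factor F
        for i in range(pop_size):
            r1, r2, r3 = rng.choice(np.delete(np.arange(pop_size), i), 3, replace=False)
            mutant = pop[r3] + f_scale * (pop[r1] - pop[r2])  # P3 + F (P1 - P2)
            cross = rng.random(dnf) <= cr
            cross[rng.integers(dnf)] = True                    # at least one mutant gene
            trial = np.clip(np.where(cross, mutant, pop[i]), 0, n_features - 1)
            s = decode(trial, n_features, weights, rng)
            f = fitness(s)
            if f <= fit[i]:                                    # keep the better of parent/trial
                pop[i], subsets[i], fit[i] = trial, s, f
    best = int(np.argmin(fit))
    return sorted(subsets[best]), float(fit[best])
```

A call such as `defs(fitness, n_features=X.shape[1], dnf=15)` would search for a subset of D = 15 features.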
DE employs uniform (binomial) crossover, and a newly generated trial vector is kept only if it results in a lower objective function value (fitness). First, a random initial population matrix of size NP × DNF containing NP randomly chosen vectors $X_i$ ($i = 0, 1, \ldots, NP-1$) is created, where DNF is the desired number of features to be selected. The search is limited by the total number of features (NF = 15): the lower boundary of the search space is 1 and the upper boundary is H = NF = 15. Each new vector of the initial population is indexed from 0 to NP − 1. The steps in the DE process are as follows: the difference between two population members (P1, P2) is added to a third population member (P3). The result (R4) undergoes crossover with the candidate for replacement (C5) to obtain a proposal (PR6). The proposal is evaluated and replaces the candidate if it has better fitness. In our scheme, the probability of each feature is calculated and used as a weight to replace duplicated features. Because the features are not always linearly related, a nonlinear kernel structure is used to represent them in feature space. The objective of this work is to reduce the complexity of the evolutionary algorithm and to improve fitness validation with a probabilistic kernel classifier that accounts for the nonlinearity of the inputs. Table 2 shows the number of samples taken for each activity, the features identified for each activity, and the performance summary for each activity with and without DEFS features.

SVM handles nonlinear data by using a kernel function [28]. The kernel maps the data into a feature space, where the nonlinear function is learned by a linear learning machine in a high-dimensional space. This is known as the kernel trick: the kernel function transforms the data into a higher-dimensional feature space in which linear separation becomes feasible [29]. The kernel function used in this work is the linear kernel, i.e., the dot product, shown in (10):
$$K(x_i, x_j) = x_i \cdot x_j. \qquad (10)$$
More generally, there is a class of functions $K(x, z)$ with a linear space $F$ and a mapping $\phi$ into $F$ such that $K(x, z) = \langle \phi(x), \phi(z) \rangle$; the dot product takes place in the space $F$. The coefficients $\alpha_i$ are obtained by maximizing the dual objective (11) under the constraints (12):
$$W(\alpha) = \sum_{i} \alpha_i - \frac{1}{2}\sum_{i}\sum_{j} \alpha_i \alpha_j y_i y_j K(x_i, x_j), \qquad (11)$$
$$\sum_{i} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C. \qquad (12)$$
The classifier is defined by (13):
$$f(x) = \operatorname{sign}\Big(\sum_{i} \alpha_i y_i K(x_i, x) + b\Big). \qquad (13)$$
We used 75% of the samples of each activity to train the classifier. The classifier performance is improved by our proposed hybrid feature selection method.
Figure 3: Steps of the Differential Evolution process (original population $P_{x,g}$ with vectors $X_{0,g}, X_{1,g}, X_{2,g}, \ldots$; mutant population $P_{v,g}$ with vectors $V_{0,g}, V_{1,g}, V_{2,g}, \ldots$; trial vector $U_{0,g}$; next-generation population $P_{x,g+1}$ with vectors $X_{0,g+1}, \ldots$).
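The linear SVM of (10) to (13) and the 75% per-activity training split can be sketched in NumPy. As an assumption, the sketch swaps in a primal solver (Pegasos stochastic sub-gradient steps) for the dual problem and handles the multi-class activities by one-vs-rest; with a linear kernel both formulations describe the same family of classifiers. The regularization `lam`, the epoch count, and all names are hypothetical.

```python
import numpy as np


def stratified_split(y, train_frac=0.75, seed=0):
    """Train/test indices using train_frac of the samples of each activity for training."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        cut = int(round(train_frac * len(idx)))
        train.extend(idx[:cut].tolist())
        test.extend(idx[cut:].tolist())
    return np.array(train), np.array(test)


def pegasos_binary(X, t, lam=0.01, epochs=100, seed=0):
    """Linear soft-margin SVM (hinge loss + L2) trained by Pegasos sub-gradient steps.
    t holds labels in {-1, +1}; the bias is a constant feature, so it is also regularized."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            step += 1
            eta = 1.0 / (lam * step)
            violated = t[i] * (Xb[i] @ w) < 1  # margin check with the current w
            w *= 1.0 - eta * lam
            if violated:
                w += eta * t[i] * Xb[i]
    return w


class LinearOvRSVM:
    """One-vs-rest linear SVM; predicts the class with the largest decision value."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.W_ = np.array([pegasos_binary(X, np.where(y == c, 1.0, -1.0))
                            for c in self.classes_])
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        return self.classes_[np.argmax(Xb @ self.W_.T, axis=1)]
```

In practice, a library solver such as scikit-learn's `SVC(kernel="linear")` or `LinearSVC` would normally replace this hand-written loop.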
The detailed performance of the proposed approach with DEFS features is listed in Table 3. The precision for the video activity is higher than for the other activities. The recall for the writing activity is slightly higher than with the other two feature selection methods. The detailed performance of clearness based features with SVM and of minimum redundancy maximum relevance features with SVM is also listed and compared with that of DEFS features. The results show that the proposed approach outperforms the other two methods. Figure 4 shows that the precision for each activity is generally higher for the proposed DEFS-based feature selection than for the other two feature selection methods, mRMR and CBFS. For the write activity, DEFS and mRMR give nearly the same results. For the copy activity, all three feature selection methods give almost the same precision. For the video activity, the proposed method clearly outperforms the others. For the NULL and read activities, CBFS features give higher precision than mRMR features.
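Per-activity precision and recall, as compared in Table 3 and Figure 4, can be computed from a confusion matrix. The following is a minimal sketch with a hypothetical function name and illustrative labels; it does not reproduce the paper's numbers.

```python
import numpy as np


def per_class_precision_recall(y_true, y_pred, labels):
    """Confusion matrix (rows = true, columns = predicted) and per-class precision/recall."""
    index = {c: k for k, c in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    tp = np.diag(cm).astype(float)
    pred_totals = cm.sum(axis=0)  # how often each class was predicted
    true_totals = cm.sum(axis=1)  # how often each class actually occurs
    precision = np.divide(tp, pred_totals, out=np.zeros_like(tp), where=pred_totals > 0)
    recall = np.divide(tp, true_totals, out=np.zeros_like(tp), where=true_totals > 0)
    return cm, precision, recall
```

A class that is never predicted gets a precision of 0 here rather than an undefined value; this is a convention of the sketch, not of the paper.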

Results and Discussion
Feature selection addresses the problem of finding a more informative set of features. The resulting feature subset shows that the most informative features in the EOG signals are saccades and blinks. The statistical measures for EOG signal analysis proved very useful when searching the feature space with the hybrid feature selection based on Differential Evolution. An SVM with a linear kernel is used for classification of the EOG data before and after feature selection. The results of 10-fold cross-validation are listed in Table 3, where performance is reported as accuracy. The dataset is divided into a training set containing three-fourths of the original instances and a test set containing the remaining one-fourth. We focus on the Differential Evolution based feature selection algorithm in order to improve classification. The per-class accuracy is shown in Table 2, with emphasis on the difference before and after feature selection. The results in Table 2 show that class accuracy increases with a minimal number of features. The selected features improve performance in terms of lower error rates, and evaluating a feature subset is simpler than evaluating the full set. Compared with CBFS and mRMR feature selection, the classifier (SVM) performance (ACC = 83.33%) is significantly improved with the proposed features.
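The 10-fold cross-validation mentioned above can be sketched with stratified folds, so that every activity is spread evenly across folds. The function names are hypothetical, and `fit_predict` stands for any classifier; the separate 75/25 train/test split reported in the text is not covered by this sketch.

```python
import numpy as np


def stratified_kfold(y, k=10, seed=0):
    """Yield (train_idx, test_idx) for k folds; each class is spread evenly over folds."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        for f, part in enumerate(np.array_split(idx, k)):
            folds[f].extend(part.tolist())
    all_idx = np.arange(len(y))
    for fold in folds:
        test = np.array(sorted(fold), dtype=int)
        yield np.setdiff1d(all_idx, test), test


def cross_val_accuracy(fit_predict, X, y, k=10, seed=0):
    """Mean accuracy over folds; fit_predict(X_tr, y_tr, X_te) returns predicted labels."""
    scores = [np.mean(fit_predict(X[tr], y[tr], X[te]) == y[te])
              for tr, te in stratified_kfold(y, k, seed)]
    return float(np.mean(scores))
```

Because every sample appears in exactly one test fold, the mean fold accuracy uses all instances once for testing.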

Conclusions
A hybrid feature selection method based on Differential Evolution was applied to EOG signals, and the proposed method was compared with CBFS and mRMR feature selection. The Differential Evolution algorithm proved effective in searching, through a supervised learning approach, for subsets of features that best interact together. An EOG dataset with high dimensionality and six target classes was used to test the performance of the proposed feature selection. The criterion that best fits our case is a low classification error rate, and an error rate of 0.16 was attained. Classification performance is significantly improved by the proposed feature selection algorithm, which uses Differential Evolution based on maximum a posteriori probability. Combined with the preceding wavelet-based feature extraction, this method yields strong results in terms of accuracy (ACC = 83.33%) and an optimal feature subset of size 15.