Identity Recognition Using Biological Electroencephalogram Sensors

Brain waves are a bioelectric phenomenon reflecting activity in the human brain. In this paper, we first introduce brain wave-based identity recognition techniques and state-of-the-art work. We then analyze important features of brain waves and present the challenges facing their applications. Further, we evaluate the security and practicality of using brain waves in identity recognition and anticounterfeiting authentication, and describe use cases of several machine learning methods in brain wave signal processing. Afterwards, we survey the critical issues of feature extraction, classification, and selection involved in brain wave signal processing. Finally, we propose several brain wave-based identity recognition techniques for further study and conclude the paper.


Introduction
Human biological features include intrinsic physiological features (e.g., fingerprints) and behavioral features (e.g., signatures). The emerging technology of mobile crowd sensing [1] and the recent rise of social bots [2] have added fuel to research on identity recognition based on human biological features, a technique that authenticates human identity by using biosensors, the fundamentals of biostatistics, and/or human biological features [3][4][5][6]. In general, any physiological or behavioral feature can be used in an identity recognition system as long as it meets the following conditions: (1) universality, that is, every human being has the feature; (2) uniqueness, that is, the feature differs between people; (3) stability, that is, it does not change over a period of time; and (4) collectability, that is, it can be quantitatively measured.
Traditional identity authentication techniques such as access codes, passwords, or IC (integrated circuit) cards are vulnerable to loss, forgery, theft, or compromise because they are separate from the person's biological features. These techniques are widely used in information systems and web environments [7]. Conventional biometric systems, however, also have limitations in resisting forgery: an attacker may use forged biological features, such as fingerprint, hand shape, palm print, face, iris, or ear, to crack such systems [8]. Therefore, more research is needed and new approaches should be studied to address this challenge, which slows the development of identity recognition systems based on human biological features.
Brain waves are a unique biological feature that is hard to forge, and researchers have recently attempted to apply brain wave techniques to identity recognition. As general bioelectrical phenomena, brain waves are generated by the ever-changing bioelectrical field in the human brain. An electroencephalogram (EEG) records brain waves along the scalp. Different external stimuli elicit different brain waves. Researchers can record changes in the bioelectric field by inserting electrodes into the brain or by using an electrode cap on the scalp to collect EEG data.
In an identity recognition system, as illustrated in Figure 1, the collected brain wave data pass through multiple processing steps, such as preprocessing, characteristic extraction, and validation. The classified characteristics are then matched against the data in a characteristic database. By calculating the posterior probability of each test sample under the established models of the enrolled subjects, the system can recognize and authenticate human identity from brain waves.
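As a minimal sketch of the posterior-probability step, the Python below assumes, hypothetically, that each enrolled subject is modeled by a diagonal Gaussian over a small feature vector and that all subjects have equal priors. The helper names `enroll`, `log_likelihood`, and `posterior`, the subject names, and the feature values are illustrative only and do not come from any cited system.

```python
import math

def enroll(samples):
    """Fit a diagonal Gaussian (per-feature mean and variance) to one subject's feature vectors."""
    n, dim = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(dim)]
    # A small floor keeps each variance positive even if a feature is constant.
    variances = [max(sum((s[j] - means[j]) ** 2 for s in samples) / n, 1e-6) for j in range(dim)]
    return means, variances

def log_likelihood(x, model):
    """Log density of feature vector x under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )

def posterior(x, models):
    """P(subject | x) for every enrolled subject under equal priors (log-sum-exp for stability)."""
    logs = {name: log_likelihood(x, model) for name, model in models.items()}
    top = max(logs.values())
    weights = {name: math.exp(ll - top) for name, ll in logs.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical two-feature vectors for two illustrative enrolled subjects.
models = {
    "alice": enroll([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]]),
    "bob": enroll([[3.0, 0.5], [3.1, 0.7], [2.8, 0.4]]),
}
probs = posterior([1.1, 1.9], models)
identity = max(probs, key=probs.get)  # "alice"
```

Working in log space and subtracting the largest log-likelihood before exponentiating avoids underflow when feature vectors are long.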
The rest of the paper is organized as follows. First, we introduce state-of-the-art brain wave-based identity recognition technologies by discussing the features and electrophysiological basics of human brain waves. Then, we describe the fundamentals and processes of collecting, extracting, selecting, and classifying brain wave features. Next, we discuss how machine learning methods are applied in brain wave-based identity recognition systems. Finally, we conclude the paper and outline future work.

Electrophysiological Basics of EEG
The human brain is an important part of the central nervous system and includes the cerebrum, cerebellum, and brain stem. The cerebrum is the most complex component, with the largest volume and the highest level of development. The surface of each cerebral hemisphere is uneven and full of sulci and gyri. Several deep sulci divide each hemisphere into four regions: the frontal, parietal, temporal, and occipital lobes. Many small sulci across each lobe extend the surface area of the cerebral cortex. Different cortical regions control different nerve centers and undertake different tasks, so each region of the cerebral cortex has its own function. Researchers have standardized the placement of electrodes for collecting and recording brain waves; the international 10-20 system of scalp electrode placement [9] is shown in Figure 2.
In different regions of the cerebral cortex, nerve cells are unevenly distributed and have diverse structures. The potential collected by scalp electrodes does not reflect the potential variations of a single nerve cell but the combined effect of many neural activities. Therefore, the recorded brain wave signal is regarded as a superposition of different types of brain waves generated in different brain regions. Features of the potential series such as frequency, amplitude, and phase are generally used to describe brain waves, and these features differ between states of brain activity. As shown in Table 1 (five frequency bands of EEG signal), researchers categorize brain waves by frequency and location into δ, θ, α, β, γ, and μ waves. In previous work, researchers have tested different acquisition protocols for human recognition tasks, such as relaxation with eyes closed, EEG recordings under visual stimuli, and the performance of mental tasks.
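To make the band decomposition concrete, the sketch below sums one-sided DFT power inside each band for a synthetic signal. The band edges are commonly used values assumed here for illustration (they are not taken from Table 1, and boundaries vary between studies); the μ rhythm is omitted because it overlaps the α band, and the naive O(n²) DFT is chosen only to keep the example dependency-free. `BANDS` and `band_powers` are hypothetical names.

```python
import cmath
import math

# Commonly used band edges in Hz (an assumption; exact boundaries vary across studies).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}

def band_powers(signal, fs):
    """Sum squared DFT magnitudes (one-sided, DC skipped) that fall in each band."""
    n = len(signal)
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2 / n
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                powers[name] += power
    return powers

fs = 128  # hypothetical sampling rate in Hz
times = [i / fs for i in range(fs)]  # one second of data
# Synthetic eyes-closed-like trace: a strong 10 Hz alpha component plus a weak 20 Hz beta one.
eeg = [2.0 * math.sin(2 * math.pi * 10 * t) + 0.5 * math.sin(2 * math.pi * 20 * t) for t in times]
powers = band_powers(eeg, fs)
dominant = max(powers, key=powers.get)  # "alpha"
```

In practice an FFT and a windowed spectral estimate (e.g., Welch averaging) would replace the naive DFT; the banding logic stays the same.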
Blocking of the μ rhythm is related to readiness for movement: body movements, imagined movements, or conscious behavior strongly affect the μ rhythm, and trained subjects can control its amplitude. The use of the μ rhythm for authentication control has received wide attention from researchers. The β wave, of relatively high frequency, is another commonly used wave in research; it is ubiquitous in the human brain, particularly in the central and frontal areas, and is regarded as closely associated with thinking activity. For instance, β waves can be observed in EEG signals when a person is nervous or excited. The γ wave is generated when many nerve cells carry out intensive cognitive activity or movement. It has the highest frequency and the lowest amplitude among all types of brain waves, and its appearance indicates that the brain is performing complex thinking or experiencing extreme excitement.

State of the Art
At present, research on individual identity recognition using EEG is still in its infancy. Existing studies can be roughly classified into two categories: recognition based on resting-state EEG and recognition based on evoked EEG. An overview of relevant research is illustrated in Figure 3.

Evaluation Criteria of EEG-Based Identity Recognition.
There are three major criteria for evaluating the performance of EEG-based identity recognition systems.
(1) Classification Accuracy. Classification accuracy is defined as the ratio of the number of correctly classified samples to the total number of samples. It is the most commonly used criterion for demonstrating the feasibility of a system.
(2) Kappa Value. The Kappa value is a quantitative measure of the statistical consistency of EEG biometric features. It accounts for both the observed accuracy on the samples and the accuracy expected from random classification. Kappa can also be used to evaluate classification accuracy and robustness and to compare performance across systems. For multiclass classification, Kappa is more informative than classification accuracy. Kappa testing is a consistency test: it measures the difference between the observed consistency rate and the consistency rate expected by chance, as given in formula (1), where $p_0$ denotes the observed (real) consistency rate and $p_e$ the theoretical (chance) consistency rate.

$$\kappa = \frac{p_0 - p_e}{1 - p_e} \qquad (1)$$

(3) Rate of Security Recognition. An EEG identity recognition system must recognize each individual from his or her brain waves, which requires high classification accuracy. High recognition accuracy is hard to achieve when brain waves have a low signal-to-noise ratio; in this case, the identity recognition algorithm can be improved by increasing the number of recognition tasks, at the cost of higher complexity. The recognition rate on EEG data is therefore an important index of the quality of an EEG identity recognition algorithm. In one reported evaluation, recognition is conducted using data collected by a single electrode (P4), and the security of the recognition system is evaluated on forty subjects using coefficients of different orders.

Bai et al. [11] proposed identifying a person by using the visual evoked potential (VEP) of EEG signals. In this work, techniques such as Fisher discriminant, recursive feature elimination, and genetic algorithms were introduced to reduce the number of electrodes used, for a less intrusive user experience. They used a self-collected database of twenty subjects and selected data from 32 electrodes in their experiments. The best identification rate, 97.25%, was obtained with a support vector machine classifier; a classification accuracy of 85% is also reported in their experiments. In [12], the authors use the variance and density of the power spectrum to construct feature vectors; data from 23 participants are used in the security evaluation, and the reported security recognition rate is 79%. Hema et al. [13] later proposed an improved scheme: a biometric authentication modality based on EEG signals recorded while six individuals performed three mental tasks. Using a three-layer feed-forward neural network, they classify the brain wave data of the six subjects into four states according to the activity performed (multiplication task, reading task, spelling task, and relaxation). The average security recognition rate reaches 94.4% to 97.5%, a significant improvement. In [14], an individual identity recognition algorithm based on brain waves is devised using a Gaussian mixture model. In the experiment, the subjects perform three rhythmic motor tasks repeated at a uniform pace: left-hand movement, right-hand movement, and movement of both hands. Their results indicate that the Gaussian mixture model can remove noise such as electromyographic and eye-movement artifacts when the subject is relaxed and thinking. The paper also shows that some mental tasks are more appropriate for person authentication than others.
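Both classification accuracy and the Kappa value can be computed from a confusion matrix. In the sketch below, an illustrative helper rather than code from any cited study, accuracy is the observed agreement p_0, and the chance agreement p_e is estimated from the row and column totals as in Cohen's kappa. The three-subject matrix is made-up data.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, kappa) for a square confusion matrix.

    confusion[i][j] counts test samples of true subject i classified as subject j.
    Kappa is undefined when the chance agreement p_e equals 1.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    p0 = sum(confusion[i][i] for i in range(k)) / total  # observed agreement = accuracy
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    pe = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2  # chance agreement
    return p0, (p0 - pe) / (1 - pe)

# Made-up results for three enrolled subjects, ten test trials each.
confusion = [[8, 1, 1], [2, 7, 1], [0, 1, 9]]
acc, kappa = accuracy_and_kappa(confusion)
```

For this matrix the accuracy is 0.8 and p_e is 1/3, so kappa is (0.8 - 1/3) / (1 - 1/3) = 0.7, noticeably lower than the raw accuracy because part of the agreement is expected by chance.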

Analysis
In research on evoked brain wave recognition techniques, the approach proposed in [15] recognizes identity using the visual evoked potential (VEP). The VEP is bioelectrical activity of the central nervous system induced by visual stimulation: an electrical signal of the cerebral cortex in response to visual stimuli, representing the change in bioelectrical behavior after the brain receives external information. In [16], the authors show that presenting the subject's own face and other people's faces triggers different time-domain VEP signals, which could be used for person recognition. The prototype system is simple and has a low recognition rate, leaving considerable room for improvement. In [17], the authors conduct experiments on 3,560 VEP signals from 102 persons, with a minimum of 10 and a maximum of 50 eye-blink-free VEP signals per subject. Feature classification uses k-nearest neighbors (kNN) and Elman neural network (ENN) classifiers with 10-fold cross-validation classification (CVC), and reaches a highest recognition accuracy of 98.12%. These experiments clearly indicate the significant potential of brain electrical activity as a biometric. In [18], the authors combine individual brain waves recorded during multiple tasks and extract features from them for identity recognition. This method matches brain waves against those of nine persons in a database, with a highest recognition rate of 95.6%.
In [19], the authors use the Fast Johnson-Lindenstrauss Transform for robust hashing of EEG multivariate autoregressive (mAR) coefficients. The promising results suggest that hashing may open new research directions and applications in the emerging area of EEG-based biometrics. The authors of [20] adopt VEP signals in the 30-50 Hz band and implement identity recognition by matching these signals against those of twenty persons in a database. The feature vector consists of AR model coefficients and the peak value of the power spectral density, and its dimensionality is reduced using Fisher linear discriminant analysis. Finally, kNN is employed to classify the data and leave-one-out cross-validation is used for accuracy assessment, yielding a correct classification rate of 100% [21].
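The kNN classification with leave-one-out cross-validation described for [20] and [21] can be sketched as follows. The two-dimensional feature vectors stand in, hypothetically, for features such as AR coefficients or PSD peaks; Euclidean distance and k = 3 are assumptions for the example, not settings reported by the authors.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training samples nearest to query (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common orders ties by first occurrence, i.e. in favour of the closer neighbour.
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Hold out each sample in turn, classify it with all the others, and return the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)

# Hypothetical 2-D feature vectors (e.g. two AR coefficients) labelled by subject.
data = [
    ([0.9, 1.1], "s1"), ([1.0, 0.9], "s1"), ([1.2, 1.0], "s1"),
    ([3.0, 3.1], "s2"), ([2.9, 3.2], "s2"), ([3.1, 2.8], "s2"),
]
loo = leave_one_out_accuracy(data, k=3)  # 1.0 on this well-separated toy data
```

Leave-one-out uses every sample for testing exactly once, which suits the small per-subject datasets typical of EEG studies, at the cost of n classifier runs.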

EEG Data Collection
Hardware Sensor System for EEG Data Collection. Generally, EEG signal collection methods can be classified into three categories: embedding, semiembedding, and nonembedding [22]. An embedding collection method inserts microelectrodes or a microelectrode array into the cerebral cortex and collects the aggregate electrical activity of nerve cells, called local field potentials (LFPs). This method can achieve high precision but may damage nerve cells. In semiembedding methods, electrodes are placed on the surface of the cerebral cortex; compared with the embedding method, this causes no damage to nerve cells. However, both embedding and semiembedding methods are intrusive to the subject's body. Consequently, both are usually applied in brain-computer interface (BCI) research with animals or with human subjects who have serious brain damage. Nonembedding methods use electrodes placed on the scalp surface and require no intrusive operation. They cause no damage to nerve cells and offer convenience and affordable devices, which makes them popular in BCI research. However, their drawbacks are obvious as well: for example, the electrical signals of nerve cells detected at the scalp are weak because of attenuation across the cerebral tissue, endocranium, skull, soft tissue, and so forth.

Antinoise Interference.
EEG signal collection is easily affected by artifacts, which include nonphysiological and physiological interference. Nonphysiological interference is caused by the environment or equipment, such as power-line interference, environmental interference, variation of the contact resistance between electrode and scalp, and relative slippage of the electrodes. Power-line interference can be eliminated by a 50 Hz notch (trap) filter or a low-pass filter. Environmental interference may be noise that distracts subjects or electromagnetic radiation in space. Moreover, when EEG signals are collected with wet electrodes, conductive paste is required between the electrodes and the scalp to reduce the contact resistance, which may change during the experiment; to address this issue, researchers have introduced anti-contact-resistance and anti-ocular-artifact approaches. Nonphysiological artifacts such as relative slippage caused by head movement can be controlled by improving experimental methods, while most of the remaining nonphysiological interference can be eliminated by filtering [27].
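A 50 Hz notch filter of the kind mentioned above can be written as a single second-order IIR section. This sketch uses the standard biquad notch coefficients from Robert Bristow-Johnson's Audio EQ Cookbook; the function name `notch_filter`, the quality factor q = 30, and the 250 Hz sampling rate are assumptions for illustration, and regions with 60 Hz mains would set f0 = 60.

```python
import math

def notch_filter(signal, fs, f0=50.0, q=30.0):
    """Second-order IIR notch centred on f0 Hz (RBJ Audio EQ Cookbook coefficients).

    Implemented in direct form I; q sets the notch width (bandwidth is roughly f0 / q).
    """
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    # Feed-forward (b) and feedback (a1, a2) coefficients, normalised by a0.
    b0, b1, b2 = 1 / a0, -2 * cos_w0 / a0, 1 / a0
    a1, a2 = -2 * cos_w0 / a0, (1 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

fs = 250  # hypothetical sampling rate in Hz
raw = [math.sin(2 * math.pi * 10 * i / fs) + 0.8 * math.sin(2 * math.pi * 50 * i / fs)
       for i in range(4 * fs)]
clean = notch_filter(raw, fs)  # 10 Hz rhythm kept, 50 Hz hum suppressed after a short transient
```

A causal filter like this has a start-up transient and a small phase shift near f0; offline analyses often run the filter forward and then backward to cancel the phase shift.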

Optimal Electrode Combination. EEG signals are collected by multiple electrodes at different scalp locations, and the data from each lead reflect different neural activity [28]. However, not every lead improves the performance of an identity recognition system. For example, in a visually evoked EEG experiment, lead data recorded far from the visual cortex contribute little to recognizing identity. In addition, EEG information may be redundant across leads. Consequently, researchers use electrode selection algorithms to find an optimal electrode combination that reduces the number of electrodes needed, the preparation time and cost of the experiment, the EEG data size, and the computational complexity of EEG signal processing [29]. Finally, EEG-based identity authentication is implemented by comparison with the EEG testing database.
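One simple electrode-selection criterion, sketched below, ranks each electrode by the Fisher score of a per-trial feature (between-subject scatter divided by within-subject scatter) and keeps the top m. This is an assumed, illustrative criterion related to the Fisher discriminant mentioned for [11], not the algorithm of [29]; the electrode labels, values, and function names are hypothetical.

```python
def fisher_score(values_by_subject):
    """Between-subject scatter divided by within-subject scatter of one electrode's feature."""
    all_values = [v for vals in values_by_subject.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    between = within = 0.0
    for vals in values_by_subject.values():
        mean = sum(vals) / len(vals)
        between += len(vals) * (mean - grand_mean) ** 2
        within += sum((v - mean) ** 2 for v in vals)
    return between / within if within > 0 else float("inf")

def select_electrodes(features, m):
    """features[electrode][subject] is a list of per-trial values; keep the m best electrodes."""
    return sorted(features, key=lambda e: fisher_score(features[e]), reverse=True)[:m]

# Hypothetical per-trial feature values (e.g. band power) for two subjects.
features = {
    "O1": {"s1": [5.0, 5.2, 4.9], "s2": [8.0, 8.1, 7.9]},
    "Oz": {"s1": [6.0, 6.3, 5.8], "s2": [7.5, 7.2, 7.7]},
    "Fp1": {"s1": [3.0, 4.5, 2.0], "s2": [3.2, 2.1, 4.4]},
}
best = select_electrodes(features, 2)  # ["O1", "Oz"]
```

Because each electrode is scored independently, this ranking does not detect redundancy between leads; wrapper methods such as recursive feature elimination or genetic search address that at higher computational cost.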

Applications of Machine Learning in EEG-Based Identity Recognition
In this section we focus on applications of machine learning methods to EEG-based identity recognition, covering signal preprocessing, feature extraction, feature selection, and classification.

EEG Signal Preprocessing.
Collected EEG signals contain large amounts of high-dimensional data, including noise, expression profiles, and waveform data. High-dimensional data place higher demands on hardware storage and classifier selection. Dimension reduction can be employed to deal with the curse of dimensionality and to improve the precision of classification algorithms; it also helps with visualization and data compression. Dimension reduction is generally required to preserve an effective representation of the original data. Existing dimension reduction algorithms can be roughly classified into feature selection algorithms and feature extraction (subspace learning) algorithms. The former directly remove irrelevant features from the original data, whereas the latter project high-dimensional data into a low-dimensional space.
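As an example of a linear feature-extraction (subspace learning) method, the sketch below implements principal component analysis (PCA) in plain Python using power iteration with deflation. PCA is a standard technique chosen here for illustration; it is not the sparse method of [30], and the function name `pca` and the three-dimensional toy data are hypothetical.

```python
import math
import random

def pca(data, k, iters=200, seed=0):
    """Project rows of data onto the top-k principal components.

    Uses power iteration with deflation on the sample covariance matrix. Assumes k does
    not exceed the number of directions with nonzero variance.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # A random start makes it very unlikely to begin orthogonal to the target eigenvector.
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflation: subtract the found component so the next pass converges to the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centred]
    return projected, components

# Hypothetical 3-D feature vectors that vary mainly along the (1, 1, 0) direction.
data = [[t, t + 0.2 * (t % 2), 0.1 * ((7 * t) % 3)] for t in range(-5, 6)]
projected, components = pca(data, k=1)
```

Here one component captures almost all of the variance, so each three-dimensional sample is reduced to a single coordinate along roughly (0.71, 0.71, 0).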
In existing EEG dimension reduction methods, data features of spatial samples in linear or nonlinear datasets are usually projected into low-dimensional spaces. Subspace learning algorithms for dimension reduction can be divided into linear and nonlinear algorithms. Linear subspace learning algorithms are based on traditional data optimization methods, and the resulting data capture only a single type of variation. EEG dimension reduction under a regularization framework, however, should consider the fusion of multiple features. For nonlinear dimension reduction, traditional algorithms mainly focus on manifold learning, while other common nonlinear techniques are based on kernel methods. In [30], the authors perform dimension reduction with a sparse subspace learning method in which orthogonal discriminant information is incorporated into sparse neighborhood preserving projections, so that the data after dimension reduction retain local reconstruction relationships similar to those of sparse neighborhood preserving projections. Meanwhile, margin maximization and local reconstruction relationships can be used to process EEG samples of the same class.

EEG Feature Extraction.
EEG is an electrical signal of the cerebral cortex in response to stimulation, reflecting the activity of the central nervous system after it receives external information. Because the delay between stimulation and the EEG response, as well as the amplitude and width of the response, differ from person to person, these time-domain characteristics can be used as features. Since evoked EEG responses are short, however, the signal length is insufficient for evaluating frequency-domain features.
EEG feature extraction is among the key issues in identity recognition systems. Measurements or computations usually yield many original features for the objects to be recognized or classified, so the samples require high-dimensional feature extraction; in other words, they are expressed in a low-dimensional space via projection, and the most effective features are extracted from the original EEG features. Since features of different categories usually differ from each other, noise elimination and dimensionality reduction techniques can be studied according to the EEG features in the time and frequency domains and their application conditions. After noise elimination, the EEG signal is only a record of amplitudes over a period of time, which cannot be used directly for identity recognition. It is therefore necessary to transform the time-domain signals to obtain stable and unique features such as autoregressive (AR) model coefficients and power spectral density (PSD), which are common features in EEG identity recognition systems.
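AR model coefficients can be estimated from the Yule-Walker equations; the sketch below solves them with the Levinson-Durbin recursion. The function names, the model order, and the simulated AR(2) test signal are assumptions for illustration; published systems choose the order per study.

```python
import random

def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of the mean-removed signal."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[t] * xc[t + lag] for t in range(n - lag)) / n for lag in range(max_lag + 1)]

def ar_coefficients(x, order):
    """Yule-Walker AR(order) estimate via the Levinson-Durbin recursion.

    Returns [a1, ..., ap] for the model x[t] = a1*x[t-1] + ... + ap*x[t-p] + e[t].
    """
    r = autocorrelation(x, order)
    a = []
    err = r[0]  # prediction error power of the order-0 model
    for m in range(1, order + 1):
        # Reflection coefficient for order m.
        k = (r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))) / err
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1 - k * k
    return a

# Simulated AR(2) signal with known coefficients 0.6 and -0.3 (a stand-in for one EEG channel).
rng = random.Random(1)
x = [0.0, 0.0]
for _ in range(20000):
    x.append(0.6 * x[-1] - 0.3 * x[-2] + rng.gauss(0.0, 1.0))
coeffs = ar_coefficients(x, order=2)  # approximately [0.6, -0.3]
```

The recursion solves the Toeplitz Yule-Walker system in O(p²) operations, and the coefficients of each channel can be concatenated into the feature vector used for matching.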
Another scenario is to mine high-throughput EEG data. Previous feature selection algorithms mainly employ statistical approaches and boundary information. Reference [31] proposes selecting features by using a sparse norm and demonstrates the rotational invariance of the norm. The authors of [32] propose a generalized auxiliary function method for solving structured sparsity problems, which greatly improves efficiency by exploiting parallelization. The authors of [33] exploit both the global and the local structure of data in feature selection and show that this improves recognition in both supervised and unsupervised pattern recognition. Reference [34] indicates that recognition based on both local and global structure is encouraging, although the parameters of the local structure must be set manually. Results in [35] demonstrate that the feature subset selected with the structured sparse ℓ2,1-norm has low redundancy.
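Assuming the structured sparse norm referred to above is the ℓ2,1-norm common in this literature, it is the sum of the Euclidean norms of the rows of a feature-weight matrix W (one row per feature). Minimizing it drives whole rows to zero, so features can be ranked by their row norms. The sketch below only evaluates the norm and the ranking for a hypothetical W; it does not implement the solvers of [31] to [35], and the function names are illustrative.

```python
import math

def row_norms(w):
    """Euclidean norm of each row of W (one row per feature)."""
    return [math.sqrt(sum(v * v for v in row)) for row in w]

def l21_norm(w):
    """l2,1-norm: the sum of the row norms, which penalises whole rows (features) at once."""
    return sum(row_norms(w))

def rank_features(w):
    """Feature indices ordered from largest to smallest row norm."""
    norms = row_norms(w)
    return sorted(range(len(w)), key=lambda i: norms[i], reverse=True)

# Hypothetical learned weights: 4 features (rows) by 2 outputs such as subject indicators.
W = [[0.9, -0.4], [0.0, 0.0], [0.3, 0.4], [0.01, 0.0]]
ranking = rank_features(W)  # [0, 2, 3, 1]
```

Because the penalty acts on entire rows, a feature is either kept for all outputs or discarded for all of them, which is what makes the selected subset compact and low in redundancy.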
In EEG feature selection, efficient feature selection algorithms can be considered. One option is to combine multiobjective regression and graph embedding in a unified optimization model. This method is suitable for EEG feature selection because structured sparse norm constraints are added. It differs from other methods and has two advantages: (1) it considers both the global boundary information and the local structure of the EEG feature data when selecting the EEG feature subset, so both structures are effectively preserved and feature matching in EEG identity recognition can be completed efficiently; (2) it considers the mutual interactions between data features when reducing the dimension of EEG features, thereby avoiding the loss of efficiency caused by greedy methods. A batching method can also be introduced for EEG features to improve processing speed and reduce time complexity.

EEG Feature Selection.
Feature selection chooses a discriminative subset of features from the original feature collection according to some optimization criterion. EEG data usually have many features, and the purpose of feature selection is to find the most effective ones in the original EEG feature set. Features in different classes may also be aggregated using clustering methods. For example, the authors of [36] consider both the correlation between each feature and the response and the redundancy among features, and propose a selection criterion based on correlation and redundancy that maximizes the correlation between the feature subset and the response.
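A relevance-minus-redundancy criterion in the spirit of the one described for [36] can be sketched greedily: at each step, pick the feature whose absolute Pearson correlation with the response, minus its mean absolute correlation with the features already chosen, is largest. The exact criterion in [36] may differ; the correlation measure, the greedy search, the function names, and the toy data are all assumptions.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

def relevance_minus_redundancy(i, chosen, columns, response):
    """|corr(feature i, response)| minus mean |corr| between feature i and chosen features."""
    relevance = abs(pearson(columns[i], response))
    if not chosen:
        return relevance
    redundancy = sum(abs(pearson(columns[i], columns[j])) for j in chosen) / len(chosen)
    return relevance - redundancy

def select_features(columns, response, m):
    """Greedily choose up to m feature indices."""
    chosen, remaining = [], list(range(len(columns)))
    while remaining and len(chosen) < m:
        best = max(remaining, key=lambda i: relevance_minus_redundancy(i, chosen, columns, response))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical feature columns (one value per trial) and a continuous response.
columns = [
    [2, 2, -2, -2, 0, 0],       # feature 0: strongly related to the response
    [2, 2, -2, -2, 0.5, -0.5],  # feature 1: nearly a copy of feature 0
    [1, -1, 1, -1, 0, 0],       # feature 2: weaker, but carries independent information
]
response = [3, 1, -1, -3, 0, 0]
selected = select_features(columns, response, 2)  # [0, 2]
```

Feature 1 is almost as relevant as feature 0, but once feature 0 is chosen its redundancy outweighs its relevance, so the independent feature 2 is selected instead.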

EEG Classification.
Classifier design is an active topic in EEG data processing and applications [37][38][39][40]. Traditional EEG classifier design is challenged by the high dimensionality of EEG datasets, which can cause large deviations in the classification results. The small sample size of EEG datasets can also lead to overfitting, which limits the application of traditional classifiers. To address this issue, [41] designed a parameter-free sparse representation classifier by using a regularization method. This classifier is robust and does not require sparsity parameters to be set manually.
In EEG collection for different individuals, researchers can design effective cross-validation methods for different sparsity parameters and select the optimal parameter model to address the small sample problem. The authors of [42] classified data features from a database of epilepsy patients and proposed a method to distinguish normal and abnormal (epileptic) EEG. Data mining of epileptic EEG and complex signal processing are performed using theories such as chaos, nearest neighbors, and statistical time-series analysis. Results indicate that the proposed method classifies normal and abnormal EEG data with a sensitivity of 81.29% and a specificity of 72.86%. As shown in Figure 5, the kNN method searches the training data around a test point; with k = 7, the unknown test point is labeled with the category of the red points. In this way, test points can be labeled quickly from the training data, which can also greatly improve EEG classification performance.
Discussion.
EEG is a biometric modality that may be affected by multiple factors, so a single metric such as accuracy is not enough to represent performance. To address these biological problems of EEG, an optimization model can be established for EEG identity recognition, and studies on this problem have developed several models. These models and their optimization can not only provide a theoretical basis for data processing in EEG identity recognition but also offer a practical reference for agencies engaged in identity recognition of living bodies. In research on EEG classification methods, high-throughput technology makes EEG samples easier to collect. In practical sample analysis and application, however, the sample data still suffer from low correlation and limited flexibility, for two reasons. On the one hand, deviations in techniques, hardware, and software reduce the reliability of disease data, greatly impairing the correlation of EEG data. On the other hand, biochemical experiments are usually expensive, so only a small number of samples can be labeled. This raises the requirements on EEG data selection, and selecting typical EEG samples for labeling becomes a key issue. A mathematical model for EEG sample selection is proposed as follows.
Let  denote the characteristic function model of EEG,   ∈  × denote a sample dataset, and  and  denote the number of samples and the characteristic dimension, respectively; a mathematical model for the original problem can be established as follows: where   ∈ ,  ≥ 0,  = 1, 2, . . ., , and  = 1, . . ., .  = [ ,1 , . . .,  , ]  is a linear combination coefficient vector. The optimization model (2) introduces a slack variable  ∈   . The ℓ1 norm makes the vectors  and  sparse; otherwise, the optimization process will fail.
As this model indicates, (2) is a convex optimization problem. An alternating iterative method can be employed to solve it; however, the method cannot be applied directly to EEG feature dimension reduction because of its low convergence rate. To improve the scalability of the original model, an auxiliary function should be found and further optimized to avoid excessive computation of gradient information.
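Model (2) cannot be fully reproduced here, but how an ℓ1 term yields sparse coefficient vectors, and why the per-iteration gradient computation dominates the cost, can be illustrated on the simpler lasso problem: minimize 0.5·||Ax - b||² + λ||x||₁ over x. The sketch uses the iterative soft-thresholding algorithm (ISTA), a standard proximal-gradient method swapped in for illustration; it is not the alternating method or the auxiliary-function approach discussed above, and A, b, and λ are toy values.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t, clipping at zero."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def ista(A, b, lam, iters=500):
    """Minimise 0.5*||A x - b||^2 + lam*||x||_1 by iterative soft-thresholding (ISTA)."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant, so 1/L is a safe step.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]  # A^T (A x - b)
        x = [soft_threshold(x[j] - grad[j] / L, lam / L) for j in range(n)]
    return x

# Toy problem with orthogonal columns, chosen so the exact solution is easy to check.
A = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
b = [3.0, 1.0, 0.1, 0.1]
x = ista(A, b, lam=0.5)  # converges to [1.75, 0.75, 0.0]; the third coefficient is exactly zero
```

Because the columns of this A are orthogonal with squared norm 2, the exact lasso solution is the soft-thresholded value of Aᵀb / 2 with threshold λ / 2, that is [1.75, 0.75, 0.0]. Each iteration needs two matrix-vector products, which is the gradient cost that auxiliary-function approaches aim to reduce on large EEG feature matrices.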

Conclusions and Prospects
As understanding of the uniqueness of EEG signals increases, EEG is being applied in more and more fields. It can not only support anticounterfeiting and bioassay but also overcome the security issues of traditional biometric-based identity recognition. In this paper, we have highlighted the following perspectives. (1) Concerning EEG processing, we have reviewed a number of EEG preprocessing techniques and identified the limitations of existing methods. We have also discussed how to maximize the preservation of EEG characteristics when removing noise such as EMG and EOG. (2) Concerning feature selection and extraction, various common EEG features are reported, and we have elaborated on the advantages of supervised and unsupervised selection methods in reducing the dimension of EEG data features. (3) We have emphasized the design of various classifiers for EEG and discussed machine learning for EEG identity recognition systems. In summary, identity recognition based on EEG is still in its infancy and much work remains to be done. Although various algorithms have been proposed, there is still significant room for improvement in both theory and practice to meet the performance and cost requirements of person authentication using EEG under different states or with different devices.

Figure 3: Overview of identity recognition based on EEG.

Figure 5: A kNN query starts at the test point and grows a spherical region until it encloses k training samples, and it labels the test point by a majority vote of these samples. In this case, where k = 7, the test point would be labeled with the category of the red points.

Table 1: Five frequency bands of EEG signal.
The signals in different encephalic regions are mixed with interference signals such as electromyography (EMG), electronystagmogram (ENG), and electrocardiogram (ECG) signals. This lowers the signal-to-noise ratio of the collected EEG signals and makes it difficult to extract the useful EEG signals.
The EEG data analysis and modeling center at the University of Freiburg, Germany, has provided many original EEG benchmarks (http://www.fdm.uni-freiburg.de/EpilepsyData) [25]. The benchmark suite contains data for dynamic EEG analysis and the prediction of epileptic seizures.