A Pervasive Approach to EEG-Based Depression Detection



Introduction
Depression is a common mood disorder that can cause persistent feelings of sadness, loss of interest, and impairment of memory and concentration. Depressed patients normally experience cognitive impairment and suffer long and severe emotional depression. In severe cases, some patients experience paranoia and illusions [1]. According to World Health Organization statistics, more than 300 million individuals suffer from depression worldwide, and approximately 800,000 people die due to it every year [2]. Depression is predicted to become the second most common disease after heart disease by the year 2020 [3]. Hence, diagnosing depression in the early, curable stages is critical and might save the life of a patient [4].
Presently, the human brain is under intensive study in order to understand the mechanism underlying persistent negative emotion and depression. In clinical practice, however, the most commonly used diagnosis of depression is a scale-based interview conducted by a psychologist or psychiatrist. The most widely used international standard is the "Diagnostic and Statistical Manual of Mental Disorders (Fourth Edition)" (DSM-IV) [5], and the clinical test Mini-Mental State Examination (MMSE) is commonly applied [6]. Other conventional psychometric questionnaires, such as the Beck Depression Inventory (BDI) [7] and the Hamilton Depression Rating Scale (HDRS) [8], are also used, but as screening tools rather than as instruments for the diagnosis of depression.
The current methods of depression detection are human-intensive, and the results depend on the doctor's experience. Furthermore, depressed individuals are less likely to seek help due to fear of stigma and the nature of the disorder. As a result, a large number of depressed patients are not diagnosed accurately and do not receive optimal treatment or an adequate recovery period. Therefore, finding convenient and effective methods for the detection of depression is an emerging topic for research. With the latest advances in sensor and mobile technology, the use of physiological data for the diagnosis of mental disorders opens a new avenue toward an objective and accurate tool for depression detection. Among all kinds of physiological data, the electroencephalogram (EEG) reflects emotional human brain activity in real time [9].
The EEG signal is a recording of the spontaneous, rhythmic, electrical activity of brain neurons from the scalp surface. Since the earliest discovery from the rabbit and monkey brain and the first recording of the human EEG signal by German psychiatrist Hans Berger in 1926, studies on the analytical method of EEG and the interpretation of the association between the brain function and mental disorders have been continued for over a century [10]. Neuroscience, psychology, and cognitive science research showed that a majority of the psychological activities and cognitive behavior could be indicated by EEG [11][12][13]. The EEG signal is closely related to the brain activities and emotional states, and it could reflect the emotional transformation in real time. Cole and Ray [14] found that the EEG signal collected from the parietal lobe of the brain is associated with cognitive tasks and emotional states. Klimesch et al. found that the alpha waves with low frequency could reflect some of the features of attention, such as vigilance and expectations [15]. Srinivasan et al. demonstrated that the frequency domain features of EEG could be used to predict the level of attention [16]. Therefore, the EEG signal is critical for understanding the processing of human brain information and emotional state transformation.
Studies on EEG can be used to understand the mechanism underlying brain activity, the human cognitive process, and the diagnosis of brain disease, as well as in the field of the Brain Computer Interface (BCI), which has attracted much attention in recent years [17]. Compared to Computed Tomography (CT) and functional Magnetic Resonance Imaging (fMRI), EEG has a higher time resolution, a lower maintenance cost, and a simpler operation method. Thus, as an objective physiological method to obtain data, EEG was proposed as a nonintrusive approach to study cognitive behavior [18][19][20] and other illness symptoms, such as insomnia [21][22][23], epilepsy [24][25][26], and sleep disorder [27]. EEG has also been used in the diagnosis of mental disorders, such as anxiety [28][29][30], psychosis [31][32][33][34], and depression [35][36][37][38]. In addition, depression, as a mental disorder with clinical manifestations such as significant depression and slow thinking, is always accompanied by abnormal brain activity and obvious emotional alteration. Therefore, as a method for tracking brain function, EEG can detect these abnormal activities.
The frequency of the EEG signal can be divided into 5 wave-bands: delta wave (<4 Hz), which normally appears in an adult's slow-wave sleep; theta wave (4-8 Hz), which is usually found when someone is sleepy; alpha wave (8-14 Hz), which is normally detected when someone is relaxed; beta wave (14-30 Hz), which commonly appears when someone is actively thinking; and gamma wave (>30 Hz), which could appear during meditation. The EEG signals undergo changes in amplitude as well as frequency while different mental tasks are performed [39][40][41][42].
Presently, the EEG systems most commonly used for research are 128-electrode and 256-electrode systems [43,44], which are specifically designed for that purpose. Operating these instruments is difficult, and technicians must apply conductive gel to each electrode on the participant's head before each use. The preparation process alone takes 30 minutes on average. In addition, these EEG systems are expensive. Overall, these systems are not practical for pervasive depression detection.
In the present study, the pervasive three-electrode EEG acquisition system, developed independently by the Ubiquitous Awareness and Intelligent Solutions Lab (UAIS) of Lanzhou University [45], was employed to construct a database containing both depressed patients and normal controls. The focus of the investigation is the use of the latest data processing techniques and machine learning to explore a pervasive EEG-based depression detection system. In order to support this research:
(1) A pervasive three-electrode EEG acquisition system has been introduced (Section 2.1).
(2) A psychophysiological experiment has been conducted, in which the EEG of 213 participants has been recorded. These physiological data provided a comprehensive database for further analysis, construction, and evaluation of a pervasive EEG-based depression detection system (Sections 2.2 and 2.3).
(3) Several EEG preprocessing steps and methods were applied on the raw EEG data (Section 3.1).
(4) 270 features were identified and extracted from the recorded database. By employing a feature selection technique, an optimum feature matrix was constructed for the depression classification process (Section 3.2).

Pervasive Three-Electrode EEG Database Construction
2.1. Pervasive Three-Electrode EEG Acquisition System. The 10-20 system, proposed by Jasper in 1958, defined the names of the electrodes and later became the international standard EEG placement system [46]. With the development of sensor technology, electrodes became smaller than those in previous systems and recorded a more detailed EEG. In 1985, Chatrian et al. added extra electrodes at intermediate sites halfway between those of the existing 10-20 system, thereby expanding it to a 64-electrode system [47].
Owing to their complexity, the full-brain 128-electrode and 256-electrode systems are poorly suited to mobile and pervasive applications. Thus, with the development of universal and pervasive electronic technology, compact 8-electrode and 16-electrode systems were also gradually developed.
As shown in Figure 1, F represents the frontal lobe, T represents the temporal lobe, C represents the center, P represents the parietal lobe, and O represents the occipital lobe. EEG reacts to the biological activity of the brain tissue, thereby indicating the functional status of the brain [48]. The EEG signal collected from different locations of the scalp reflects a variety of information. For example, EEG from the frontal lobe reflects human memory, computational power, attention, and responsiveness; EEG from the parietal lobe is associated with somatic responses; EEG from the occipital lobe can be used as a reference for visual reactions; EEG from the temporal lobe is related to auditory reactions. Therefore, for different research directions and purposes, choosing the appropriate EEG collection location is essential.
The prefrontal cortex is the center of consciousness; thus, the better the control of the forehead cortex, the better the emotional control. Jasper studied the resting-state EEG of severe depression patients, showing that when the body suffered from severe depression, the activity of the cerebral cortex was altered [49]. Nauta emphasized that the prefrontal cortex played a major role in different aspects of emotional processes [50]. Rolls put forward the importance of the prefrontal cortex for emotional and motivational processes [51]. Harmon-Jones suggested that specific forms of anger, or anger elicited in particular contexts, are associated with left-sided prefrontal activation [52]. In conclusion, the above studies have shown that the electrode sites located over the prefrontal cortex are associated with emotional processes and psychiatric disorders. Therefore, Fp1, Fp2, and Fpz are the ideal choices of scalp position in the current experiment. The forehead is free of hair, so dry contact electrodes should be sufficient without the need for applying conductive gel. The pervasive three-electrode EEG acquisition system (Figure 2), developed by UAIS from Lanzhou University [53], runs on a rechargeable battery and transmits all the EEG data wirelessly through Bluetooth 2.0. The system is extremely small in size and can be easily placed on the location. The sampling frequency is 250 Hz and, according to the EGI engineers, all electrodes have an impedance of <50 kΩ. Since the frequency of EEG is 0.5-50 Hz, the passband of the EEG acquisition is 0.5-50 Hz.

Experiment Method.
Compared to the normal controls, depressed patients responded differently to outside stimulus [54,55]. The feedback of the depressed patients to the positive and negative stimuli weakened. As the positive stimulus feedback weakened further, the overall performance was negative emotions and reflected as such in the emotional response of the different subsystems. In summary, no significant difference was observed in the positive stimulus between normal controls and depressed patients, and depressed patients would produce more negative emotions under negative stimulus as compared to normal controls. Beck's cognitive behavioral model of depression postulated that depressed patients are likely to support a negative view of themselves, the world, and even the future. In order to maintain this negative self-view, they even resist environmental feedback that is inconsistent with the view [56]. Epstein et al. suggested that, in comparison to normal controls, depressed patients responded with less bilateral ventral striatal activation to positive stimuli, which leads to decreased interest in the performance of activities [57]. Bylsma et al. showed that depressed patients exhibit less reactivity to all stimuli and events, irrespective of their positive or negative nature [58].
Therefore, recording and analysis of the EEG signal under different stimuli may help in the identification of patients with depression. This study was designed to record the participants' EEG in four different cases: in resting state, under negative stimulus, under neutral stimulus, and under positive stimulus. The source of stimulus is soundtracks from the International Affective Digitized Sounds (IADS-2) [59], which is a standardized database of 167 naturally occurring sounds, widely used in the study of emotions.
The experiment was performed in a quiet room. Firstly, the experiment objective and procedures were described to the participants. Then, the pervasive three-electrode EEG acquisition system was placed on the participants' foreheads and checked for reception. After one minute of relaxation, the experiment began. In the first stage, 90 s of resting-state EEG was recorded, during which the participants were asked to remain seated with eyes closed and with as little body movement as possible, followed by another minute of rest. In the second stage, stimulation soundtracks were played to the participants. Each soundtrack was 6 s long, with a 6 s break between soundtracks. The process continued until the experiment was completed. The process of EEG acquisition is shown in Figure 3. A total of 6 stimulation soundtracks (from IADS-2) were used, including 2 neutral stimulation soundtracks, 2 negative stimulation soundtracks, and 2 positive stimulation soundtracks. Table 1 describes each audio stimulation.

Psychophysiological Database.
Of the total 250 participants, 213 (92 depressed patients and 121 normal controls) completed the experiment successfully. The raw EEG data from all the electrodes were recorded. Depressed participants were selected by professional psychiatrists using MMSE [6], which is a 30-point questionnaire used by the psychiatrist during a face-to-face interview to assess the degree of cognitive dysfunction in patients with diffuse brain disorders. In addition, all participants were asked to fill in the following scales for cross-referencing:
(A) The Patient Health Questionnaire (PHQ-9) [60] is a 9-question multipurpose instrument for screening, diagnosing, monitoring, and measuring the severity of depression. We chose this questionnaire in order to find the relevance between the EEG characteristics and the severity of depression.
(B) The Life Event Scale (LES) [61] contains 48 questions covering events of family, work, and social support. The influence of each event is evaluated for severity, duration, and frequency. We chose this questionnaire for cross-referencing purposes.
(C) The Pittsburgh Sleep Quality Index (PSQI) [62] contains 19 self-reported items, creating 7 components to diagnose sleep disorders. We chose this index to explore the direct link between sleep quality and depression in EEG.
(D) The Generalized Anxiety Disorder Scale-7 (GAD-7) [63] contains only 7 self-report questions for screening and measuring the severity of generalized anxiety disorder. We chose this questionnaire for cross-referencing between depression and anxiety.

Data Processing
In this study, all preprocessing and data analyses were implemented using MATLAB software (version R2014a). Among the physiological electrical signals, ECG is a smooth signal with a large amplitude. As the heart is located far from the head, the ECG signal is greatly attenuated when it spreads to the scalp. EMG is produced by muscle contraction, with an amplitude of 10 μV to 15 mV. The frequency of EMG is concentrated primarily in the high band (>100 Hz). Power-line interference is concentrated at the fixed mains frequency. In order to remove these interference signals, we followed the results of several investigators. Yang proposed a cascade of three adaptive filters based on the least mean squares (LMS) algorithm and verified that the proposed filter reduced the interference in EEG signals [64]. Tong et al. validated the use of independent component analysis (ICA) for an efficient suppression of ECG interference in EEG [65]. The National Institute of Mental Health reported that an adaptive filter can be used to estimate the contaminants and subtract them from the EEG data [66].
No overlap occurs between the frequency band of the EEG signal and those of power-line interference, EMG, and ECG; thus, a Finite Impulse Response (FIR) filter based on the Blackman time window was used to remove these interference signals. FIR filters are widely used in modern electronic communication because of their linear-phase property: they can realize arbitrary amplitude-frequency characteristics while maintaining strictly linear phase-frequency characteristics. In addition, the unit sampling response is finite, so the filter is stable. In order to reduce the energy leakage of the spectrum, the signal can be truncated by different truncation functions. Such a truncation function is known as the window function. The time domain representation of the Blackman time window is
$$w(n) = \left[0.42 - 0.5\cos\left(\frac{2\pi n}{N-1}\right) + 0.08\cos\left(\frac{4\pi n}{N-1}\right)\right] R_N(n),$$
where $R_N(n)$ is the rectangular window function and $N$ is the length of truncated data.
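As an illustration of the filtering step, the following is a minimal sketch of a Blackman-windowed FIR band-pass for the 0.5-50 Hz passband at the 250 Hz sampling rate, written in plain Python. The windowed-sinc design, the tap count of 501, and all function names are assumptions made for this sketch; the text only states that a Blackman-window FIR filter was used, and the study itself was implemented in MATLAB.

```python
import cmath
import math

def blackman(N):
    """Blackman window of length N (N >= 2), as in the formula above."""
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / (N - 1))
            + 0.08 * math.cos(4 * math.pi * n / (N - 1)) for n in range(N)]

def lowpass_taps(cutoff_hz, fs, N):
    """Windowed-sinc low-pass taps, normalised to unit gain at 0 Hz."""
    fc = cutoff_hz / fs              # cutoff as a fraction of the sampling rate
    mid = (N - 1) / 2                # centre of symmetry (gives linear phase)
    w = blackman(N)
    taps = []
    for n in range(N):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        taps.append(ideal * w[n])
    total = sum(taps)
    return [t / total for t in taps]

def bandpass_taps(low_hz, high_hz, fs, N):
    """Band-pass as the difference of two low-pass filters."""
    lo = lowpass_taps(low_hz, fs, N)
    hi = lowpass_taps(high_hz, fs, N)
    return [h - l for h, l in zip(hi, lo)]

def gain_at(taps, freq_hz, fs):
    """Magnitude of the filter's frequency response at freq_hz."""
    return abs(sum(t * cmath.exp(-2j * math.pi * freq_hz * n / fs)
                   for n, t in enumerate(taps)))

# Assumed settings: 250 Hz sampling rate (from the text), 501 taps (illustrative).
h = bandpass_taps(0.5, 50.0, 250.0, 501)
```

Because both low-pass prototypes are normalised to unit gain at 0 Hz, their difference rejects the DC component exactly. With only 501 taps the lower band edge is not sharp at 0.5 Hz; a longer filter would narrow that transition.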
The resulting EEG signal retains only frequencies in the range of 0.5-50 Hz. However, the frequency of EOG overlaps with this range. Although all participants were asked to remain seated with eyes closed, their EOG was inevitably recorded when using the prefrontal-lobe EEG sites, such as Fp1, Fp2, and Fpz. A general model for EOG contamination can be described by
$$\mathrm{EEG}_{rec}(t) = \mathrm{EEG}_{true}(t) + k \cdot \mathrm{EOG}(t),$$
where $\mathrm{EEG}_{rec}(t)$ and $\mathrm{EEG}_{true}(t)$ are the samples of the recorded (including noise) and true EEG, respectively, $\mathrm{EOG}$ represents the source EOG, and $k$ is an unknown transfer function. The Kalman filter is an optimal recursive data processing algorithm, which has been widely utilized in several applications, such as industrial control systems, radar target tracking, communications and signal processing, aeroengine diagnosis, and intelligent robots. The Kalman filter uses the previous estimated value and the observed value at the current time to estimate the current value of the state variable. The frequency of the EOG artifact does not exceed 15 Hz, and both the approximate EOG signal and the amplitude of the brain signal in the low frequency band are small. As a result, the Kalman derivation formula combines the Discrete Wavelet Transformation (DWT) and an Adaptive Predictor Filter (APF) to estimate the pure EOG artifact.
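The predict-and-update idea behind the Kalman filter can be sketched for a single scalar state. This is a generic textbook form under an assumed random-walk state model, not the DWT/APF combination used in the study; the noise variances `q` and `r`, the initial values, and the function name are hypothetical.

```python
def kalman_1d(measurements, q=1e-4, r=0.1, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model.

    q: assumed process-noise variance, r: assumed measurement-noise variance,
    x0/p0: assumed initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                # predict: uncertainty grows by the process noise
        k = p / (p + r)          # Kalman gain: weight given to the new observation
        x = x + k * (z - x)      # update: previous estimate corrected by the observation
        p = (1.0 - k) * p        # updated uncertainty
        estimates.append(x)
    return estimates

# A constant observation is tracked after a short transient.
estimates = kalman_1d([5.0] * 200)
```

Each new estimate depends only on the previous estimate and the current observation, which is why the method is recursive and suited to sample-by-sample processing.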
The denoising model proposed in the present study involves the following steps: (1) signal decomposition, (2) ocular artifact (OA) zone detection, (3) signal prediction, and (4) signal reconstruction. Herein, DWT was used to decompose the EEG signals and detect the OA zones. The frequency range of the EEG signal was 0-64 Hz, while the OA occurred in 0-16 Hz. The multiscale DWT decomposition was used to extract the low frequency components and to divide the nonstationary time series into several approximately stationary time series. Thus, conventional forecasting methods, such as the Kalman filter, can accurately predict the shape of the true wave of the decomposed signals. Subsequently, the Adaptive Auto Regressive (AAR) models and an Adaptive Predictor Filter (APF) were applied to improve the prediction. The APF uses an adaptive filter to estimate the future values of signals based on their past values. Finally, the EOG artifacts were removed from the raw EEG signal, and the data were ready for further processing.
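The decomposition and reconstruction steps can be illustrated with a multilevel Haar DWT. The Haar wavelet is chosen here only for brevity (the text does not name the wavelet used), and the function names are hypothetical. For a signal occupying 0-64 Hz, two levels leave an approximation that covers roughly 0-16 Hz, the band in which the ocular artifacts occur.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One DWT level: split x (even length) into approximation and detail."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one level: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def haar_decompose(x, levels):
    """Multilevel decomposition; len(x) must be divisible by 2**levels."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)            # details[0] holds the highest-frequency band
    return approx, details

def haar_reconstruct(approx, details):
    """Rebuild the signal from the coarsest approximation and all details."""
    for d in reversed(details):
        approx = haar_inverse_step(approx, d)
    return approx

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
low_band, detail_bands = haar_decompose(signal, 2)
restored = haar_reconstruct(low_band, detail_bands)
```

In an artifact-removal setting, the low-frequency approximation would be corrected inside the detected OA zones before reconstruction; that correction is omitted here.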

Features Matrix Construction.
The features matrix consists of $m$ rows and $n$ columns, where $m$ represents the number of EEG recordings and $n$ represents the number of features extracted from each EEG recording. The present study constructed the effective features matrix for training in three steps as follows:
(1) Identify and extract all the efficient features for each set of EEG data, such that each row represents a feature vector.
(2) Apply feature selection to each row of the features matrix; that is, the most suitable features are selected from all the extracted features to form the final feature vector.
(3) Tag each row of the feature vectors as depression or nondepression.

Feature Extraction.
The EEG signal is weak, nonlinear, and time-sensitive, and it typically exhibits complex dynamics. EEG features change with transformations of the emotional state. Analyses of EEG data in the recent literature have used different linear features such as peak, variance, and skewness [67][68][69][70]. Efforts have also been made to determine nonlinear parameters such as Correlation Dimension for pathological signals, which have been shown to be useful indicators of pathologies [71]. In order to obtain the feature matrix, we must first perform feature extraction on the preprocessed EEG. EEG features are mainly divided into Time Domain Features and Frequency Domain Features. Owing to the nonlinearity and randomness of the EEG signal, this study extracts nonlinear features such as the Correlation Dimension and Shannon Entropy in addition to the above EEG features. Finally, the following features were selected for extraction: (1) Time Domain Features. The time domain provides the most intuitive EEG features. The EEG signals are collected at a certain time and frequency. The artifacts are removed directly from the time domain EEG signal, and useful information is extracted as time domain features that can be used for continuous prolonged EEG detection. The time domain features extracted in this study include peak, variance, skewness, kurtosis, and Hjorth parameters. Hjorth parameters are indicators of statistical properties used in time domain signal processing, introduced by Hjorth in 1970 [72]; the parameters include activity, mobility, and complexity. Among them, the activity parameter represents the signal power, that is, the variance of the time function. The mobility parameter represents the mean frequency or the proportion of standard deviation of the power spectrum. The complexity parameter represents the change in frequency. These parameters are commonly used to analyze EEG signals for feature extraction.
(2) Frequency Domain Features. Frequency domain analysis is a tool for characterizing and classifying the EEG signals. Herein, the frequency domain features are relative centroid frequency, absolute centroid frequency, relative power, and absolute power.
(3) Nonlinear Features. The EEG signals are nonstationary and random; they also exhibit some of the characteristics of a nonlinear dynamic system. With the increasing number of studies on EEG signals, their nonlinearity has come under intensive focus worldwide. Therefore, processing and analyzing the EEG signal based on nonlinear dynamics theory has become a new research direction. The nonlinear features extracted in this study include C0-complexity, Kolmogorov Entropy, Shannon Entropy, Correlation Dimension, and Power-Spectral Entropy.
(A) The C0-complexity was proposed by Shen et al. [73] to resolve the issue of over-coarse graining preprocessing in Lempel-Ziv complexity (LZC) [74]. The core of the algorithm is to decompose the sequence into regular and irregular components; the C0-complexity is the proportion of the irregular component in the sequence. The greater this proportion, the closer the time domain signal is to a random sequence and, thus, the greater the complexity. The method presumes that a signal can be divided into a regular part and a stochastic part. If $A_0$ is a measurement of the signal and $A_1$ is the measurement corresponding to the stochastic part, the C0-complexity is defined as the ratio of $A_1$ to $A_0$. Suppose the EEG signal to be analyzed is $\{x(n), n = 0, 1, \ldots, N-1\}$ with a length of $N$ samples; then the C0-complexity can be calculated from the power spectrum as follows. The fast Fourier transform (FFT) of the signal is
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1.$$
The mean amplitude of the power spectrum is
$$M = \frac{1}{N} \sum_{k=0}^{N-1} |X(k)|^2.$$
Components with $|X(k)|^2$ less than $M$ are replaced by 0 to obtain a new spectrum series $Y(k)$:
$$Y(k) = \begin{cases} X(k), & |X(k)|^2 \ge M, \\ 0, & |X(k)|^2 < M. \end{cases}$$
The inverse FFT (IFFT) of $Y(k)$ is
$$y(n) = \frac{1}{N} \sum_{k=0}^{N-1} Y(k)\, e^{j 2\pi k n / N}, \quad n = 0, 1, \ldots, N-1.$$
The power of the stochastic part $A_1$ is extracted, and the C0-complexity is estimated:
$$A_0 = \sum_{n=0}^{N-1} |x(n)|^2, \quad A_1 = \sum_{n=0}^{N-1} |x(n) - y(n)|^2, \quad C_0 = \frac{A_1}{A_0}.$$
(B) Kolmogorov Entropy measures the rate of loss of information per unit of time. Positive and finite entropy indicates that the time series and the underlying dynamic phenomenon are chaotic. Zero entropy indicates a regular phenomenon in phase space. Infinite entropy refers to a stochastic and nondeterministic phenomenon. Kolmogorov Entropy is defined as the average rate of loss of information as follows:
$$K = -\lim_{\tau \to 0} \lim_{\varepsilon \to 0} \lim_{d \to \infty} \frac{1}{d\tau} \sum_{i_1, \ldots, i_d} p(i_1, \ldots, i_d) \ln p(i_1, \ldots, i_d),$$
where $p(i_1, \ldots, i_d)$ is the joint probability that the trajectory lies in cell $i_1$ of size $\varepsilon$ at time $\tau$, in cell $i_2$ at time $2\tau$, and so on up to cell $i_d$ at time $d\tau$.
(C) Shannon Entropy was introduced by Shannon in 1948 in an article entitled "A Mathematical Theory of Communication" [75]. The size of the information of a message is directly related to its uncertainty. The amount of information is equal to the amount of uncertainty. Shannon Entropy is a measure of the uncertainty of a random variable or a random signal. The larger the entropy, the greater the uncertainty and randomness. In the present study, the entropy used to process EEG can be viewed as a measure of the order in the signal, which measures the skewness and uncertainty [76]. In the case of random variables with known probability distribution, the entropy is defined by
$$H(X) = -\sum_{x \in \chi} p(x) \log p(x),$$
where $X$ is a random variable with probability distribution $p(x)$ and alphabet set $\chi$ [77].
(D) Correlation Dimension indicates the dynamic features of the EEG signal. The greater the Correlation Dimension, the more complicated the EEG time series. The Correlation Dimension is a fractal dimension, often computed from a time series representation, that is, a simplified phase space diagram constructed from a single data vector. The fundamental Correlation Dimension algorithm was introduced by Grassberger and Procaccia in 1983 [5] and can be expressed as below:
$$D = \lim_{r \to 0} \frac{\ln C(r)}{\ln r},$$
where $C(r)$ is the correlation integral and $r$ is the radial distance around each reference point.
(E) Power-Spectral Entropy is computed from the power spectrum, that is, the power density as a function of frequency obtained by the Fourier transform. The entropy of the power spectrum (referred to as Power-Spectral Entropy) can be implemented easily. The Power-Spectral Entropy is used to analyze the timing signals in EEG data. The entropy can be used as a physical indicator to estimate the quality and intensity of brain activity. The larger the entropy, the more active the brain.
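Three of the nonlinear features above can be sketched in a few lines of plain Python: the C0-complexity (following the spectrum-thresholding steps in (A)), Shannon Entropy, and Power-Spectral Entropy. The naive DFT stands in for an FFT to keep the sketch dependency-free; the base-2 logarithm, the use of the positive-frequency half of the spectrum, and the function names are assumptions, since the text does not fix them.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(N^2)); stands in for an FFT."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Naive inverse DFT."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def c0_complexity(x):
    """Share of signal energy left after removing the dominant spectral lines."""
    X = dft(x)
    power = [abs(v) ** 2 for v in X]
    mean_power = sum(power) / len(power)
    # Keep components at or above the mean power: the "regular" part.
    Y = [v if p >= mean_power else 0 for v, p in zip(X, power)]
    y = idft(Y)
    a0 = sum(abs(v) ** 2 for v in x)                    # total energy
    a1 = sum(abs(v - w) ** 2 for v, w in zip(x, y))     # stochastic-part energy
    return a1 / a0

def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution (zero terms are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def power_spectral_entropy(x):
    """Shannon entropy of the normalised positive-frequency power spectrum."""
    X = dft(x)
    power = [abs(v) ** 2 for v in X[1:len(x) // 2 + 1]]
    total = sum(power)
    return shannon_entropy([p / total for p in power])

# A pure tone is fully "regular"; a weak second tone counts as irregular.
tone = [math.sin(2 * math.pi * 4 * n / 64) for n in range(64)]
mixed = [t + 0.1 * math.sin(2 * math.pi * 20 * n / 64) for n, t in enumerate(tone)]
```

For the mixed signal, the weak 20-cycle component falls below the mean spectral power and is therefore assigned to the stochastic part, so its energy share is what the C0-complexity reports.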
All linear and nonlinear features (Table 2) were extracted from the alpha wave, beta wave, delta wave, theta wave, gamma wave, and full-band EEG of each electrode (Fp1, Fp2, and Fpz). Therefore, a total of 270 features (15 basic features × 6 frequency bands × 3 electrodes) were extracted. All the involved linear and nonlinear features are commonly used descriptors of EEG.
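The linear features can be sketched in the same way. The definitions below are common textbook forms and are assumptions wherever the text gives no formula: peak is taken as the maximum absolute amplitude, moments use the population convention, Hjorth parameters use first differences, and band power is read from a one-sided periodogram. All function names are hypothetical.

```python
import cmath
import math

def _variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def time_domain_features(x):
    """Peak, variance, skewness, and kurtosis (population moments)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {"peak": max(abs(v) for v in x), "variance": m2,
            "skewness": m3 / m2 ** 1.5, "kurtosis": m4 / m2 ** 2}

def hjorth(x):
    """Hjorth activity, mobility, and complexity from first differences."""
    dx = [b - a for a, b in zip(x, x[1:])]
    ddx = [b - a for a, b in zip(dx, dx[1:])]
    activity = _variance(x)
    mobility = math.sqrt(_variance(dx) / activity)
    complexity = math.sqrt(_variance(ddx) / _variance(dx)) / mobility
    return activity, mobility, complexity

def power_spectrum(x, fs):
    """One-sided periodogram via a naive DFT: (frequencies in Hz, power)."""
    N = len(x)
    freqs, power = [], []
    for k in range(N // 2 + 1):
        X = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        freqs.append(k * fs / N)
        power.append(abs(X) ** 2 / N)
    return freqs, power

def band_features(freqs, power, low_hz, high_hz):
    """Absolute power, relative power, and centroid frequency of one band."""
    band = [(f, p) for f, p in zip(freqs, power) if low_hz <= f < high_hz]
    absolute = sum(p for _, p in band)
    relative = absolute / sum(power)
    centroid = sum(f * p for f, p in band) / absolute
    return absolute, relative, centroid

# One second of a 10 Hz tone at the 250 Hz sampling rate used by the system.
fs = 250.0
alpha_tone = [math.sin(2 * math.pi * 10 * n / fs) for n in range(250)]
freqs, power = power_spectrum(alpha_tone, fs)
alpha_abs, alpha_rel, alpha_centroid = band_features(freqs, power, 8.0, 14.0)
```

Applying such functions to each of the 6 bands on each of the 3 electrodes is what yields the 15 × 6 × 3 = 270 entries of one feature vector.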

Feature Selection.
Feature Selection is used to select a relevant subset of all available features, which not only reduces the dimensionality of the classification problem but also reduces the noise (irrelevant features). By inspecting the features selected by the applied algorithm, we further deduced which types of features are suitable for recognition based on the EEG signal.
A feature evaluation function that focuses only on the relation between the features and the target class tends to admit redundant features, which degrades learning accuracy and results. To address this issue, we applied the minimal-redundancy-maximal-relevance (MRMR) technique to perform the feature selection. The MRMR feature selection criterion was proposed by Peng et al. [78] in order to resolve the issue by evaluating both feature redundancy and relevance simultaneously; in particular, max-relevance, denoted as max $D(S, c)$, refers to maximizing the relevance of a feature subset $S$ to the class label $c$. In [1], the relevance of a feature subset is defined as
$$D(S, c) = \frac{1}{|S|} \sum_{f_i \in S} \Phi(f_i, c),$$
where $\Phi(f_i, c)$ denotes the relevance of a feature $f_i$ to $c$. $\Phi$ could be estimated using any correlation measure. Feature redundancy is defined based on the pairwise feature dependence. If two relevant features highly depend on each other, the class-discrimination power would not change dramatically if one of the features was removed. Min-redundancy, min $R(S)$, is used to select a feature subset of mutually exclusive features. The redundancy of a feature subset is defined as
$$R(S) = \frac{1}{|S|^2} \sum_{f_i, f_j \in S} \Phi(f_i, f_j).$$
MRMR is defined as the simple operator maximizing $D$ and minimizing $R$ consecutively. In [1], the incremental search method was used to find the near-optimal features. Given the feature subset $S_{m-1}$ of $m-1$ selected features, the $m$th feature is selected from the remaining features $F - S_{m-1}$, where $F$ is the full feature set, so as to optimize the following criterion:
$$\max_{f_j \in F - S_{m-1}} \left[ \Phi(f_j, c) - \frac{1}{m-1} \sum_{f_i \in S_{m-1}} \Phi(f_j, f_i) \right].$$
3.2.3. Effective Tagging. Each feature vector (each row of the feature matrix) has to be marked with a specific emotional tag. In this study, we divided the experimental population into two categories: depressed patients and normal controls. All feature vectors are tagged as depressed or nondepressed.
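The incremental MRMR search of the feature selection step can be sketched as follows. The absolute Pearson correlation is used for Φ purely as an assumption (the text allows any correlation measure, and Peng et al. use mutual information); the function names and the toy data are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def mrmr_select(features, labels, k):
    """Greedy MRMR: features is a list of columns; returns k column indices."""
    relevance = [abs(pearson(f, labels)) for f in features]
    selected, remaining = [], list(range(len(features)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]              # first pick: relevance only
            redundancy = sum(abs(pearson(features[j], features[i]))
                             for i in selected) / len(selected)
            return relevance[j] - redundancy     # relevance minus mean redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: column 1 is more relevant than column 2 but nearly duplicates column 0.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 1, 0, 1],
]
chosen = mrmr_select(columns, labels, 2)
```

In this toy case the second pick is column 2 rather than the more relevant column 1, because column 1 is strongly correlated with the feature already selected.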

Classification
SVM, KNN, and CT are the most widely used classification algorithms in the majority of EEG-related studies. In the present study, we evaluated the performance of these classifiers (SVM, KNN, and CT) plus the Artificial Neural Network (ANN) classifier in the depression detection process. All classifications and 10-fold cross-validations have been implemented using the MATLAB software (version R2014a).
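The evaluation protocol can be illustrated with a small k-nearest-neighbour classifier scored by 10-fold cross-validation. The neighbour count of 3, the Euclidean distance, the interleaved fold assignment, and the function names are assumptions for this sketch; the text does not state the settings used in MATLAB.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def kfold_accuracy(X, y, n_folds=10, k=3):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for fold in range(n_folds):
        # Interleaved split: sample i belongs to fold i % n_folds.
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    return correct / len(X)

# Two well-separated toy clusters standing in for the two classes.
X = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 0.0) for i in range(10)]
y = [0] * 10 + [1] * 10
accuracy = kfold_accuracy(X, y)
```

Every sample is used for testing exactly once and is never part of the training set that classifies it, which is what makes the reported accuracy an out-of-sample estimate.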

Classification Techniques
4.1.1. SVM. SVM, proposed by Cortes and Vapnik [79] in 1995, is a supervised learning model used for classification and regression. It exhibits several unique advantages in resolving the issues of small sample data, nonlinear data, and high-dimensional pattern recognition [80]. SVM constructs a hyperplane, or a set of hyperplanes in a high- or infinite-dimensional space, for classification and regression.
The kernel function allows SVM to deal with the nonlinear classification problem by attempting to cluster a feature space based on the known labels, with the maximum possible distance between the clusters' borders [79]. In addition, SVM has been widely used in many fields such as text classification [81], image classification [82], biological sequence analysis, biological data mining [83], and handwriting character recognition [84]. In recent years, SVM has also been applied in the field of depression discrimination [85][86][87]. In the present study, Gaussian Kernel functions have been implemented and evaluated in SVM classification. [89], stress [90], and depression [85,91].
4.1.3. CT. CT, also known as decision tree, is a tree structure-based supervised classification model [92], defined by separating and partitioning a feature space, using multiple rules and defining a local model, into which the feature spaces can be categorized as binary or multiclass clusters. Each of the internal nodes represents a property, each edge represents a result, and each leaf represents a class label. Compared to the other classification algorithms, the decision tree is the fastest classifier. CT has been used in classifying Alzheimer's disease [93] as well as depression [94].
4.1.4. ANN. ANN is a classification method that mimics the structure and function of the biological neural network and consists of an information processing network with wide parallel interconnection of simple units. This network exhibits learning and memory ability, knowledge generalization, and input information feature extraction ability similar to that of the human brain [95]. Neural networks have been used to resolve a variety of tasks that are difficult to solve using common rule-based programming, such as computer vision [91], speech recognition [96], and mental disorders [97,98]. ANN is the only unsupervised machine learning classifier used in the present study.

Classification Result.
The 10-fold cross-validation results of the best-performing feature combination sets of each classifier and their accuracy in the detection of depression are shown below: results of resting-state data, neutral audio stimulation data, positive audio stimulation data, and negative audio stimulation data are summarized in Tables 3, 4, 5, and 6, respectively.
For resting-state EEG data, KNN achieved the best accuracy of 76.83% using feature combination of absolute power of gamma wave on Fp1, absolute power of theta wave on Fp2, absolute power of beta wave on Fp2, and absolute center frequency of beta wave on Fp2 (Table 3).
For EEG data of participants under neutral audio stimulation, KNN achieved the best accuracy of 74.39% using the feature combination of absolute power of theta on Fp1, center frequency of full-band EEG on Fp2, and peak of full-band EEG on Fp2 (Table 4).
For EEG data of participants under positive audio stimulation, KNN achieved the best accuracy of 79.27% using the feature combination of absolute power of theta wave on Fp1 and absolute power of beta wave on Fp1 (Table 5).
For EEG data of participants under negative audio stimulation, KNN achieved the best accuracy of 77.44% using the feature combination of absolute power of theta wave on Fp1, correlation dimension of full-band EEG on Fp1, absolute center frequency of theta wave on Fp2, and absolute power of gamma wave on Fp2 (Table 6).
The results showed that, among the four classifiers (SVM, KNN, CT, and ANN), KNN performed the best, with an average classification accuracy of 76.98% (Figure 4). The absolute power of theta wave appeared in all of the best-performing feature combinations, thereby indicating a potential connection between theta wave and depression. The absolute power of theta wave might be a valid characteristic for pervasive depression discrimination.
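The average accuracy and the recurring theta-power feature can be checked directly from the four results quoted above. The short sketch below uses only the figures and feature names given in the text; the variable names and the tuple encoding (measure, band, electrode) are choices made here for illustration.

```python
# Best KNN accuracy (%) per dataset, as quoted in the text.
best_knn_accuracy = {
    "resting state": 76.83,
    "neutral audio": 74.39,
    "positive audio": 79.27,
    "negative audio": 77.44,
}
mean_accuracy = sum(best_knn_accuracy.values()) / len(best_knn_accuracy)

# Best feature combination per dataset as (measure, band, electrode).
best_features = {
    "resting state": {
        ("absolute power", "gamma", "Fp1"),
        ("absolute power", "theta", "Fp2"),
        ("absolute power", "beta", "Fp2"),
        ("absolute center frequency", "beta", "Fp2"),
    },
    "neutral audio": {
        ("absolute power", "theta", "Fp1"),
        ("center frequency", "full band", "Fp2"),
        ("peak", "full band", "Fp2"),
    },
    "positive audio": {
        ("absolute power", "theta", "Fp1"),
        ("absolute power", "beta", "Fp1"),
    },
    "negative audio": {
        ("absolute power", "theta", "Fp1"),
        ("correlation dimension", "full band", "Fp1"),
        ("absolute center frequency", "theta", "Fp2"),
        ("absolute power", "gamma", "Fp2"),
    },
}

# Ignore the electrode, then keep what is common to all four datasets.
shared = set.intersection(
    *({(measure, band) for measure, band, _ in features}
      for features in best_features.values())
)

print(f"mean accuracy: {mean_accuracy:.2f}%")
print("shared (measure, band):", shared)
```

Note that the shared feature is the absolute power of the theta band irrespective of electrode: it is measured on Fp2 in the resting-state combination and on Fp1 in the three audio stimulation combinations.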

Conclusion and Future Work
Depression is a major health concern in millions of individuals. Thus, diagnosing depression in the early curable stages is critical for the treatment in order to save the life of a patient. However, current methods of depression detection are human-intensive, and their results are dependent on the experience of the doctor. Therefore, a pervasive and objective method of diagnosing or even screening would be useful. The present study explores a novel method of depression detection using a pervasive prefrontal-lobe three-electrode EEG system, which uses Fp1, Fp2, and Fpz as electrode sites, according to the international 10-20 system. Several widely employed psychological scales were used to select the optimal experimental candidates, which encompassed 213 participants (92 depressed patients and 121 normal controls). Their EEG data in the resting state, as well as under sound stimulation, were recorded. The soundtracks were selected from the IADS-2 database, comprising positive, neutral, and negative stimuli.
The FIR filter combining the Kalman derivation formula, DWT, and an APF were applied on the raw EEG data to remove the interference from environment, ECG, EMG, and EOG. Subsequently, 270 linear and nonlinear features were extracted from the preprocessed EEG. Then, the MRMR technique was applied to perform the feature selection. Four classification algorithms, KNN, SVM, CT, and ANN, were evaluated and compared using 10-fold cross-validation. The results showed KNN to be the best-performing classification method in all datasets, with the highest accuracy of 79.27%. The results also showed that the feature "absolute power of theta wave" appeared in the best-performing feature combinations of all four datasets, thereby suggesting a robust connection between the power of theta wave and depression; this could be used as a valid characteristic feature in the detection of depression.
The current study postulated that a novel and pervasive system for screening depression is feasible. With a carefully designed model, the pervasive system could reach accuracy similar to the current scale-based screening method; for instance, the accuracy of BDI is reported to be 79-86% in different studies [99-101].
EEG and depression have been under intensive focus of research. In comparison to the study by Knott et al., who collected EEG recordings from 21 scalp sites, conducted univariate analyses for group comparisons, and correctly classified 91.3% of the patients and controls [102], the current pervasive three-electrode EEG acquisition system is easier to apply and is better suited to the personal use of patients. Moreover, we used fewer electrodes, thereby reducing the amount of data considerably. Healey and Picard attempted to extract feature patterns from physiological signals for emotional recognition, with an accuracy of 61.8-78.4% [103]. Kim et al. developed a short-term monitoring emotion recognition system based on multiuser physiological signals to extract features and identify three emotions using SVM. The final recognition accuracy was 75% [104]. Compared to these studies, our result has a higher accuracy with faster data processing efficiency. Henriques collected the resting-state EEG (with closed eyes) from 5 depressed patients and 13 normal individuals. The results showed that the activity in the left hemisphere in depressed patients is significantly lower than that in a normal person [105]. Omel'chenko and Zaika collected EEG data from 53 depressed patients and 86 normal individuals and demonstrated that patients with depression have a higher delta and theta energy than normal, but a lower alpha and beta energy [106]. Fingelkurts et al. researched the shock components of resting-state EEG from 12 depressed patients and 10 normal persons and found that brain activities were affected by depression throughout the cerebral cortex [107]. Compared to these studies, the current experiment presented more reliable data, supporting the reliability of the experimental results. Taken together, our data model based on feature extraction and feature selection reduced the amount of data to be processed, with a faster data processing efficiency. In addition, the pervasive three-electrode EEG acquisition system was easier to use for measurement while maintaining the accuracy of the data.

Table 2: Features used in the feature extraction process.
4.1.2. KNN. The KNN algorithm is a nonparametric supervised machine learning method for classification and regression. It was introduced by Dasarathy [88] in 1991 and is based on instance-based, or lazy, learning. The classifier based on KNN does not require a training phase, and its computational complexity is proportional to the number of documents in the training set. Thus, if the number of documents in the training set is $n$, then the time complexity of the KNN classifier is $O(n)$. KNN categorizes the feature space into binary or multiclass clusters by using a training dataset and classifying each new data point according to its $k$ closest data points in the training dataset. KNN has been used in medical informatics, such as the detection of epilepsy
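A minimal sketch of the KNN decision rule described in the KNN paragraph above is given here for illustration. It is not the authors' implementation: the Euclidean distance, the default `k=3`, and the function name `knn_predict` are assumptions made for this example.

```python
import heapq
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    No model is fitted beforehand ("lazy" learning): each query computes
    its distance to all n training points, so the cost per query grows
    linearly with n for a fixed k and feature dimension.
    """
    nearest = heapq.nsmallest(
        k,
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean (assumed)
    )
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-dimensional data, made up for illustration.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_x, train_y, (0.2, 0.2)))
```

With two classes, an odd `k` avoids tied votes; in practice `k` and the distance measure are tuned on the data.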

Table 3: Classification results in resting-state data.

Table 4: Classification results in neutral audio stimulation data.

Table 5: Classification results in positive audio stimulation data.

Table 6: Classification results in negative audio stimulation data.
Figure 4: Average accuracy of classification on selected features.