Towards Driver’s State Recognition on Real Driving Conditions

In this work a methodology for detecting drivers’ stress and fatigue and predicting driving performance is presented. The proposed methodology exploits a set of features obtained from three different sources: (i) physiological signals from the driver (ECG, EDA, and respiration), (ii) video recordings from the driver’s face, and (iii) environmental information. The extracted features are examined in terms of their contribution to the classification of the states under investigation. The most significant indicators are selected and used for classification using various classifiers. The approach has been validated on an annotated dataset collected during real-world driving. The results obtained from the combination of physiological signals, video features, and driving environment parameters indicate high classification accuracy (88% using three fatigue scales and 86% using two stress scales). A series of experiments on a simulation environment confirms the association of fatigue states with driving performance.


Introduction
Real-life car driving requires accurate and fast decisions by the driver, given only incomplete information in real time. A large number of fatalities occurring during car driving could be avoided if behaviors such as driver inattention, stress, fatigue, and drowsiness were detected and appropriate countermeasures were produced. The determination of the driver status in a vehicle is an active topic for the scientific community. However, the detection of stress and fatigue level in drivers is a complex task, which requires expertise in biosignal processing, computer vision, human factors, and so forth.
Stress can be defined as the awareness of not being able to cope with the demands of one's environment, when this realization is of concern to the person and is associated with a negative emotional response. Fatigue can be defined as the temporary inability, decrease in ability, or strong disinclination to respond to a situation because of previous overactivity, whether mental, emotional, or physical [1]. The estimation of fatigue is well studied in the literature [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18]. The majority of related works are based on in-lab experiments, mainly focusing on face monitoring and blink detection to calculate eye activation [2], while vehicular experiments serve for indirect fatigue recognition through its impact on driving measures (speed maintenance, steering control). These methods, however, are suited to recognizing rather late stages of fatigue (drowsiness), when the effects on the driver's face are quite noticeable and the change in performance has already become critical. In the road environment, even earlier fatigue stages can affect driving performance. This is because even lower fatigue levels cause declines in physiological vigilance/arousal, slower sensorimotor functions (i.e., slower perception and reaction times), and information processing impairments, which in turn diminish the driver's ability to respond to unexpected and emergency situations [3]. Therefore, the impact of fatigue on the driver's performance should not be estimated using driving measures alone; additional parameters associated with driving performance (such as perceptual, motor, and cognitive skills) are needed [4]. According to Crawford [5], physiological measures are the most appropriate indicators of driver fatigue. This has been confirmed by numerous studies, which followed similar approaches for driver fatigue estimation, making use of biosignals obtained from the driver [6][7][8][9].
Bittner et al. [6] presented an approach for the detection of fatigue based on biosignals acquired from the driver electroencephalogram (EEG), electrocardiogram (ECG), electrooculogram (EOG), and video monitoring. They examined different features that might be correlated with fatigue, such as the spectrum of the EEG, the percentage of eye closure (PERCLOS), and the fractal properties of heart rate variability (HRV). They concluded that the first two are more correlated with instant fatigue levels of the driver, while the last is most suitable for the detection of the permanent state of the driver. Li [7] addressed the estimation of driver's mental fatigue using HRV spectrum analysis using a simulator for data collection. The features obtained from HRV indicated high correlation with the mental fatigue of the driver. Yang et al. [8] used heterogeneous information sources to detect driver's fatigue. The information sources included fitness, sleep deprivation, environmental information (traffic, road condition, etc.), physiological signals (ECG, EEG), and video monitoring parameters (head movement, blink rate, and facial expressions). In order to combine all the above-mentioned information they used the Dempster-Shafer theory and rules for determining whether the driver is in fatigue state or not. Qiang et al. [9] proposed a probabilistic framework for modelling and realtime inferencing of human fatigue by integrating data from various sources and certain relevant contextual information. They used a Dynamic Bayesian Network which encapsulates the time-dependent development of fatigue symptoms. The estimation is based on visual cues and behavioural variables. As research in the field progresses, a variety of physiological signals has been used for fatigue detection. 
The most informative measures in terms of fatigue recognition are those extracted from the EEG signal, which have been used for the quantification of task-specific performance changes [10][11][12][13][14][15][16][17][18]. However, the idea of near future vehicles, capable of acquiring drivers' EEG, is quite optimistic. Indicators coming from measurements taken in a less obtrusive manner should be exploited in a real life system.
Physiological measurements are also good indicators of the driver's stress. Several works in the literature focus on driver stress recognition based on biosignal processing. ECG, electromyogram (EMG), respiration, skin conductivity, blood pressure, and body temperature are the most common signals collected from the driver in order to estimate the workload and the levels of stress he/she experiences. Healey [10] presented a real-time method for data collection and analysis in real driving conditions to detect the driver stress status. According to them, there is a strong correlation between driver status and selected physiological signals (EMG, ECG, skin conductivity, and respiration effort). In another study, Healey and Picard [11] specified an experimental protocol for data collection. Four stress level categories were created according to the results of the subjects self-report questionnaires. A linear discriminant function was used to rank each feature individually based on the recognition performance and a sequential forward floating selection (SFFS) algorithm was used to find an optimal set of features to recognize driver stress. Healey and Picard [12] proposed a slightly different protocol, while the results showed that for most drivers, the skin conductivity and the heart rate are most closely correlated to driver stress level. Zhai and Barreto [13] developed a system for stress detection using blood volume pressure, skin temperature variation, electrodermal activity, and pupil diameter. Rani et al. [14] presented a real-time method for driver's stress detection based on the heart rate variability using Fourier and Wavelet analysis. Liao et al. [15] presented a probabilistic model for driver's stress detection based on probabilistic inference using features extracted from multiple sensors.
The well-established literature on stress and fatigue detection has revealed a number of features that are highly correlated with one state or the other. However, to our knowledge, all studies focus on only one driver affective state (either fatigue or stress), although in practice both influence the physiology of the driver and hence his or her physiological responses. Putting such systems into practice could make the estimation of the driver's state less effective than in experimental settings, since in real driving fatigue and stress can be present simultaneously, which makes discrimination between the possible states more difficult.
Having this in mind, we developed a driver status recognition methodology for simultaneous stress and fatigue detection. Our methodology employs features coming from (i) a set of the driver's biosignal recordings (ECG, electrodermal activity, and respiration), (ii) video recordings of the driver's face, and (iii) environmental conditions (weather, visibility, and traffic). In our work we select the features with the highest contribution to the classification of the states under investigation. Furthermore, we evaluate the contribution of different groups of features (biosignals, video, and environmental features) in order to investigate which group is more strongly associated with a specific driver state (fatigue or stress). Using the selected features, we examine the performance of four different classifiers (namely, SVMs, Decision Trees, Naive Bayes, and a General Bayesian classifier) on the driver state recognition accuracy. The proposed methodology allows for simultaneous estimation of stress and fatigue levels using a minimal set of physiological signals acquired in the least obtrusive manner. Applying our methodology, changes in the driver's state are estimated at an early stage, before they critically affect driving performance.
Having developed a sound methodology for driver's state estimation we also study whether the estimated changes of driver's state affect driving performance. To perform this study, we have developed a driving simulation environment, which allows us to monitor a set of driving performance measures (steering, braking, lane keeping, and reaction time) and examine their association with the subject's physiological state. A series of laboratory experiments are conducted around the driver simulator. As drivers are not easily stressed when using a simulator, our study focuses only on the association of the estimated fatigue and the deterioration of driving performance.
In the following sections we first describe the proposed methodology (Section 2). The dataset obtained in real driving conditions is then presented (Section 3). In Section 4 the obtained results are presented. In Section 5 we briefly present our study of the impact of fatigue on driving performance. A discussion on the methodology and the results follows (Section 6). A concluding section summarizes our work (Section 7).
International Journal of Vehicular Technology

Methodology
The methodology consists of three main steps (depicted in Figure 1): (i) preprocessing and feature extraction, which is decomposed into three streams: (I-a) signal acquisition, preprocessing, and feature extraction, (I-b) video acquisition, processing, and feature extraction, and (I-c) environment information extraction; (ii) feature selection; and (iii) classification. These steps are described in detail in what follows.

2.1. Step I(a): Signal Acquisition/Preprocessing and Feature Extraction. The physiological signals reported in the literature as the most significant indicators of a subject's fatigue and stress are blood pressure, EEG, EOG, ECG, heart rate variability, skin conductivity, and respiration [13,16,17]. However, in order to set up a real-time system for driver stress and fatigue monitoring in real driving conditions, the sensors for physiological signal acquisition should be minimally obtrusive. Taking this into consideration, the physiological signals recorded in our work are limited to the following: (i) the electrocardiogram (ECG), through a g.ECG sensor placed on the subject's chest, (ii) the electrodermal activity (EDA), through two Ag/AgCl electrodermal activity sensors attached to the middle and index fingers of the subject's right hand, and (iii) the respiration rate, using a g.RESP piezoelectric respiration sensor placed around the subject's thorax. The Biopac MP-100 system is used for signal acquisition. The ECG signal is acquired at a sampling frequency of 300 Hz, while the EDA and respiration signals are acquired at 50 Hz. The resolution is set to 12 bits for all signals.

ECG Signal.
The Biopac system has an option of acquiring only the R-waves of the ECG signal, which are more robust to noise. The output signal is a positive peak only when an R-wave is detected. This function is useful for heart rate calculations when a well-defined peak is desired, as it tends to remove any components of the waveform that might be mistaken for peaks. This option is used in real driving conditions, since the subject's movement introduces considerable noise into the ECG signal. To obtain useful indicators of the subject's states under investigation (fatigue and stress), we first perform some necessary preprocessing steps on the raw signals. The features are extracted in time windows of 5 minutes, which is a reasonable compromise between the need for a sufficient sample size to obtain reliable statistical properties and the need for a small window to capture changes in the psychophysiology of the driver [18]. To extract the RRV signal from the ECG, an accurate estimation of the R peaks is needed. Initially, a lowpass Butterworth filter is applied to the ECG signal to remove the baseline wander. Then the R peaks are detected using the procedure described in [19]. Furthermore, since errors in the RR interval estimation and in the RRV extraction can have a serious impact on the spectrum estimation, and thus on the features calculated from the spectrum, we also visually correct the initial R estimation of the algorithm through a specifically built application. After ECG preprocessing and R peak detection, the R-R intervals are estimated as the time differences between successive R peaks. Those R-R intervals constitute the RR variability signal (RRV). The next step is the interpolation of the RRV series at 4 Hz and downsampling to 1 Hz. This is an important step if ordinary spectrum estimation methods (FFT, autoregressive methods) are to be applied.
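The RR interval and resampling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the R-peak times are hypothetical values, each interval is time-stamped at the R peak that closes it, linear interpolation is assumed (the text does not name the interpolation method), and the downsampling is plain decimation without an anti-aliasing filter. The function names `rr_intervals` and `resample_linear` are introduced here for illustration only.

```python
def rr_intervals(r_peak_times):
    """R-R intervals (s) as differences between successive R-peak times (s)."""
    return [t1 - t0 for t0, t1 in zip(r_peak_times, r_peak_times[1:])]


def resample_linear(times, values, fs):
    """Linearly interpolate irregular samples onto a uniform grid at fs Hz."""
    n_out = int((times[-1] - times[0]) * fs) + 1
    out, j = [], 0
    for k in range(n_out):
        t = times[0] + k / fs
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + frac * (values[j + 1] - values[j]))
    return out


# Hypothetical R-peak times in seconds (illustrative values, not recorded data).
r_peaks = [0.0, 0.8, 1.6, 2.5, 3.4, 4.2]
rr = rr_intervals(r_peaks)
# Assumption: each interval is time-stamped at the R peak that closes it.
rrv_4hz = resample_linear(r_peaks[1:], rr, fs=4.0)
# Plain decimation by 4; a real pipeline would low-pass filter first.
rrv_1hz = rrv_4hz[::4]
```

Resampling onto a uniform grid is what makes FFT-based spectrum estimation applicable, since the raw RR series is sampled at irregular beat times.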
After interpolation, the low-frequency (0.01 Hz) trend of the signal is removed using a Butterworth filter. The FFT of the signal, $H(f)$, is calculated at 1024 samples, and the spectrum of the signal is obtained as
$$P(f) = |H(f)|^2.$$
The following features are calculated from the spectrum: (i) the ratio of the very low frequency (VLF) (0.01-0.05 Hz) energy to the total signal energy, (ii) the ratio of the low frequency (LF) (0.05-0.2 Hz) energy to the total signal energy minus the VLF energy, (iii) the ratio of the high frequency (HF) (0.2-0.4 Hz) energy to the total signal energy minus the VLF energy, and (iv) the ratio of the LF to the HF components.
We also calculate the Spectrum Entropy (SE) of the signal,
$$\mathrm{SE} = -\sum_{f} p(f) \log p(f),$$
where $p(f)$ is the spectrum of the signal normalized to unit sum. The SE can be considered as a measure of the deterministic behavior of the RRV. Detrended Fluctuation Analysis (DFA) [20][21][22], Approximate Entropy [23,24], and Lyapunov exponent analysis [25] are also applied to the 5 min intervals of the RRV recordings.
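As an illustration of the spectral features and the spectrum entropy, the sketch below computes them for a synthetic RRV series. It rests on several assumptions that the text does not fix: the power spectrum is taken as the squared magnitude of a 1024-point zero-padded DFT, the "total signal energy" is the sum over all one-sided bins, and the entropy uses the natural logarithm of the spectrum normalized to unit sum. A direct DFT stands in for the FFT to keep the example dependency-free, and all function names are hypothetical.

```python
import cmath
import math


def power_spectrum(x, fs, nfft=1024):
    """One-sided power spectrum |H(f)|^2 of x, zero-padded to nfft points."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    freqs, power = [], []
    for k in range(nfft // 2 + 1):
        # Direct DFT; zero-padded samples contribute nothing, so skip them.
        h = sum(v * cmath.exp(-2j * math.pi * k * n / nfft)
                for n, v in enumerate(centered))
        freqs.append(k * fs / nfft)
        power.append(abs(h) ** 2)
    return freqs, power


def band_energy(freqs, power, lo, hi):
    """Energy of the spectrum in the band lo <= f < hi (Hz)."""
    return sum(p for f, p in zip(freqs, power) if lo <= f < hi)


def rrv_spectral_features(x, fs=1.0):
    freqs, power = power_spectrum(x, fs)
    total = sum(power)  # assumption: total energy over all one-sided bins
    vlf = band_energy(freqs, power, 0.01, 0.05)
    lf = band_energy(freqs, power, 0.05, 0.2)
    hf = band_energy(freqs, power, 0.2, 0.4)
    probs = [p / total for p in power if p > 0]
    return {
        "vlf_ratio": vlf / total,
        "lf_ratio": lf / (total - vlf),
        "hf_ratio": hf / (total - vlf),
        "lf_hf": lf / hf,
        "spectral_entropy": -sum(p * math.log(p) for p in probs),
    }


# Synthetic 5-minute series at 1 Hz: a strong 0.1 Hz tone and a weaker 0.3 Hz tone.
signal = [0.8 + 0.05 * math.sin(2 * math.pi * 0.1 * n)
          + 0.02 * math.sin(2 * math.pi * 0.3 * n) for n in range(300)]
features = rrv_spectral_features(signal)
```

For this synthetic signal the LF component dominates the HF component, so the LF/HF ratio is well above 1.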

EDA Signal.
The EDA signal is downsampled to 1 Hz. A smoothing filter is applied, since in many cases noise is evident in the signal; then the low-frequency component (0.01 Hz) of the signal, which is considered to be the skin conductance level (SCL), is removed. The first absolute difference (FAD) of the remaining signal is calculated, giving a measure of the skin conductance response (SCR):
$$\mathrm{FAD} = \sum_{n=1}^{M-1} |s(n+1) - s(n)|,$$
where $s(n)$ is the remaining signal and $M$ is the number of samples in the window.

Respiration Signal.
The respiration signals have a high signal-to-noise ratio, and noise exists only in cases of sudden movements of the subject. The signal is downsampled to 10 Hz and the baseline wander is removed. The power spectrum of the signal is extracted using the FFT. A smoothing of the power spectrum follows, and the maximum-energy frequency between 0.1 Hz and 1.5 Hz is selected as the dominant respiration frequency (DRF). Furthermore, we extract another feature, which is the ratio of the heart rate to the respiration rate. As respiration is a main modulator of the cardiac function, the hypothesis is that for normal/relaxed conditions the ratio of heart rate to respiration rate is constant, and changes are observed only in abnormal conditions, such as stress and fatigue. Given the mean RR interval $\overline{RR}$ (in seconds) and the dominant respiration frequency (in Hz), the ratio of the heart rate to the respiration rate is calculated as
$$\frac{60 / \overline{RR}}{60 \cdot \mathrm{DRF}} = \frac{1}{\overline{RR} \cdot \mathrm{DRF}}.$$
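The two features described here can be sketched in a few lines. This is an illustrative sketch only: the FAD is assumed to be the sum (rather than the mean) of absolute differences between successive samples, and the function names and input values are hypothetical.

```python
def first_absolute_difference(signal):
    """Sum of absolute differences between successive samples (assumed form)."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


def heart_to_respiration_ratio(mean_rr_s, drf_hz):
    """Ratio of heart rate to respiration rate, both in events per minute."""
    heart_rate_per_min = 60.0 / mean_rr_s       # beats per minute
    respiration_rate_per_min = 60.0 * drf_hz    # breaths per minute
    return heart_rate_per_min / respiration_rate_per_min


# Hypothetical values: a short detrended EDA segment, a mean RR interval of
# 0.8 s (75 beats/min), and a DRF of 0.25 Hz (15 breaths/min).
fad = first_absolute_difference([1.0, 1.5, 1.2, 1.2])
ratio = heart_to_respiration_ratio(0.8, 0.25)
```

With these inputs the heart beats five times per breath, which is the kind of quantity the hypothesis above expects to stay roughly constant in relaxed conditions.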

2.2. Step I(b): Video Acquisition/Processing and Feature Extraction. The video of the driver's face is processed following the approach described in [26,27]. The first step is the detection of the face and the second is the detection of the eyes. The information of interest is (i) the movement of the head, which could be an indicator of both stress and fatigue, and (ii) the mean level of eye opening, as an indicator of fatigue. We also calculate an estimate of PERCLOS, considering the eyes closed when the confidence of eye presence is less than zero. As a measure of head movement, the standard deviation of the face position in the video frame is used, and as a measure of eye opening we use the confidence of eye detection (provided in [26]). If the eyes are wide open this confidence is high, while for nearly closed eyes it is quite low. As video is not available for all sessions (e.g., due to low-quality recordings), the video features of the sessions without video recordings are treated as missing values. The K-NN algorithm is used to replace the missing data in the combined video and physiological feature set for all sessions. K is set to 3 and the weighted Euclidean distance is employed.
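A K-NN replacement of missing video features could look like the sketch below. The text specifies only K = 3 and a weighted Euclidean distance, so the remaining choices are assumptions: neighbours are searched among complete rows only, the distance is computed over the features observed in the incomplete row, the missing value is replaced by the mean of the neighbours' values, and the feature weights default to 1. The function name `knn_impute` and the data are hypothetical.

```python
import math


def knn_impute(rows, k=3, weights=None):
    """Fill None entries with the mean of the k nearest complete rows."""
    n_features = len(rows[0])
    w = weights or [1.0] * n_features  # assumption: uniform weights by default
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        observed = [j for j, v in enumerate(r) if v is not None]

        def distance(candidate):
            # Weighted Euclidean distance over the observed features only.
            return math.sqrt(sum(w[j] * (r[j] - candidate[j]) ** 2
                                 for j in observed))

        nearest = sorted(complete, key=distance)[:k]
        filled.append([v if v is not None
                       else sum(c[j] for c in nearest) / len(nearest)
                       for j, v in enumerate(r)])
    return filled


# Hypothetical rows: [physiological feature, video feature]; None = no video.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [10.0, 100.0], [2.0, None]]
imputed = knn_impute(rows, k=3)
```

In this toy example the three nearest complete rows have video values 20, 10, and 30, so the missing entry is filled with their mean.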

2.3. Step I(c): Environment Information Extraction. In our methodology we introduce driving environmental information. For this purpose, a forward-looking camera for road monitoring is employed. From the road monitoring video, useful information about the driving environment conditions during each session is manually extracted. This information concerns weather, road visibility, and traffic conditions. Bad weather and low visibility are reported as important stress factors [28]. Another important stress factor is traffic density [29,30]. Using the video recordings of the road scenery, we manually extracted a metric of the traffic load of the road during the 5 min interval. All environmental variables are categorized into two states (good/bad weather, low/good visibility, and low/high traffic density).

2.4. Step II: Feature Selection. The majority of the features extracted in Step I are the most common features used in similar studies. However, a classifier using all those features would lack robustness. For this reason we employ feature selection. Such an approach is a prerequisite in cases where the ratio of data to features is low. Furthermore, introducing redundant or highly correlated features can deteriorate the classification performance. Therefore, to build a robust classifier for stress and fatigue detection, we have to evaluate the contribution of each feature as an indicator of these states. For a two-class classification problem, we define the DAUC of a feature as the difference between the area under the curve (AUC) of a classifier based on that feature and the AUC of a random classifier. The DAUC is used as a metric of the discrimination power of a feature. The AUC of the optimal classifier is 1 and the AUC of a random classifier is 0.5; thus the DAUC of the optimal classifier is 0.5, and features with a DAUC near 0.5 are considered optimal. In order to select the optimal feature set for more than one classification problem, the average DAUC of each feature is calculated. The features are then sorted according to their average DAUC, yielding a feature ranking. Correlation analysis can be further applied to investigate relationships between features and exclude duplicate information. The number of features finally selected is derived experimentally.
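The DAUC ranking can be illustrated with the sketch below. It makes two assumptions not stated in the text: the AUC of a single-feature classifier is computed with the rank-based (Mann-Whitney) estimate, and the absolute value of the difference from 0.5 is used so that features that decrease with the state are ranked as highly as those that increase. The feature values, labels, and function names are hypothetical.

```python
def auc(pos, neg):
    """AUC of thresholding one feature, via the Mann-Whitney statistic."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dauc(values, labels, class_a, class_b):
    """Distance of the feature's AUC from a random classifier (AUC = 0.5)."""
    a = [v for v, lab in zip(values, labels) if lab == class_a]
    b = [v for v, lab in zip(values, labels) if lab == class_b]
    # abs() is an assumption: inverse relationships count as discriminative too.
    return abs(auc(a, b) - 0.5)


def rank_features(features, labels, problems):
    """Sort features by their DAUC averaged over the two-class problems."""
    avg = {name: sum(dauc(vals, labels, a, b) for a, b in problems) / len(problems)
           for name, vals in features.items()}
    return sorted(avg.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: six windows, two candidate features.
labels = ["normal", "normal", "low", "low", "high", "high"]
features = {
    "F_noise": [1, 2, 2, 1, 1, 2],
    "F_good": [1, 2, 3, 4, 5, 6],
}
problems = [("normal", "low"), ("normal", "high"), ("low", "high")]
ranking = rank_features(features, labels, problems)
```

Here `F_good` separates every pair of classes perfectly (average DAUC of 0.5), while `F_noise` behaves like a random classifier (average DAUC of 0).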

2.5. Step III: Classification. The third step of our methodology is classification. The performance of four different classifiers is examined. In this section we briefly describe the classifiers used for fatigue and stress classification.

Support Vector Machines (SVM).
Each instance in the training set contains one "target value" (class label) and several "attributes". The goal of the SVM is to produce a model which predicts the target value of data instances in the testing set, for which only the attributes are given. Let a training set of instance-label pairs be $(x_i, y_i)$, $i = 1, \ldots, l$, where $x_i \in R^N$ is the training vector, belonging to one of the classes generating the data, $N$ is the number of extracted features in the training set, and $y_i \in \{-1, +1\}$ indicates the class of $x_i$. The support vector machine requires the solution of the following optimization problem:
$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + c \sum_{i=1}^{l} \xi_i$$
subject to
$$y_i \left( w^T \phi(x_i) + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0,$$
where $b$ is the bias term, $w$ is a vector perpendicular to the hyperplane separating the classes, $\xi_i$ is the factor of classification error, and $c > 0$ is the penalty parameter of the error term. The training vectors $x_i$ are mapped into a higher-dimensional space $F$ by the function $\phi : R^N \to F$. The SVM finds a separating hyperplane with the maximal geometric margin and minimal empirical risk $R_{emp}$ in the higher-dimensional space. $R_{emp}$ is defined as
$$R_{emp} = \frac{1}{2l} \sum_{i=1}^{l} \left| y_i - f(x_i) \right|,$$
where $f$ is the decision function defined as
$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{l} a_i y_i K(x_i, x) + b \right),$$
where $K(x_i, x) = \phi(x_i)^T \phi(x)$ is the kernel function, $a_i$ are weighting factors, and $b$ is the bias term. In our case the kernel is a radial basis function (RBF), which is defined as
$$K(x_i, x_j) = \exp\left( -\gamma \| x_i - x_j \|^2 \right),$$
where $\gamma = 1/(2\sigma^2)$ and $\sigma$ is the standard deviation. The RBF kernel, which is used in our experiments, nonlinearly maps samples into a higher-dimensional space; thus, it can handle the case where the relation between class labels and attributes is nonlinear. In our case $\gamma = 1$ and $c = 10$. For classification with more than two classes, the one-against-all strategy is followed.
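To make the decision rule concrete, the sketch below evaluates an RBF-kernel decision function with gamma = 1 and combines three binary models in a one-against-all fashion. It does not train anything: the support vectors, coefficients, and biases are made-up placeholders standing in for a trained model, the coefficient passed for each support vector is assumed to be the product of its weighting factor and its label, and the function names are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_value(x, support_vectors, coeffs, bias, gamma=1.0):
    """sum_i coeff_i * K(x_i, x) + bias, with coeff_i = a_i * y_i (assumed)."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coeffs)) + bias


def predict_one_vs_all(x, models, gamma=1.0):
    """Pick the class whose class-versus-rest model gives the largest value."""
    return max(models,
               key=lambda label: decision_value(x, *models[label], gamma=gamma))


# Placeholder models (support vectors, coefficients, bias), NOT trained values.
models = {
    "normal": ([[0.0, 0.0]], [1.0], 0.0),
    "low fatigue": ([[2.0, 0.0]], [1.0], 0.0),
    "high fatigue": ([[4.0, 0.0]], [1.0], 0.0),
}
label_a = predict_one_vs_all([0.2, 0.1], models)
label_b = predict_one_vs_all([3.7, 0.0], models)
```

In practice the support vectors and coefficients would come from solving the optimization problem with c = 10 on the selected features; an established SVM library would normally be used for that step.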

Decision Trees.
To construct the decision tree we use the C4.5 inductive algorithm [31].
Attributes are evaluated by their information gain. Prior to the definition of the information gain, we specify a measure called entropy, defined as the degree of complexity of the input samples. In the case of $C$ classes in a set $S$, the entropy of $S$, $H(S)$, is defined as
$$H(S) = -\sum_{i=1}^{C} p_i \log_2 p_i,$$
where $p_i$ is the ratio of class $i$ in set $S$. Considering the previous equation, the information gain expresses the reduction of entropy. The information gain for an attribute $X$, $\mathrm{Gain}(S, X)$, is obtained as
$$\mathrm{Gain}(S, X) = H(S) - \sum_{u \in \mathrm{Values}(X)} \frac{|S_u|}{|S|} H(S_u),$$
where $\mathrm{Values}(X)$ represents the range of feature $X$ and $S_u$ is the subset of $S$ for which feature $X$ has value $u$. In our problem, the extracted features are continuous valued. Therefore, they can be incorporated into the decision tree by partitioning them into a set of discrete intervals. For each continuous feature $x$, a new Boolean feature is created:
$$x_t = \begin{cases} \text{true}, & x < t, \\ \text{false}, & \text{otherwise}. \end{cases}$$
The threshold $t$ is selected by generating a set of candidate thresholds which produce a high information gain. Those candidate thresholds are evaluated and the one that produces the maximum information gain is finally chosen. The algorithm of [31] has the advantage of addressing the overfitting problem by using a postpruning method.
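The entropy, information gain, and threshold selection for a continuous feature can be sketched as follows. This is a simplified illustration rather than the C4.5 implementation: candidate thresholds are assumed to be the midpoints between adjacent sorted values whose class labels differ, and the feature values and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i) over the classes present in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on the Boolean feature (value < t)."""
    left = [lab for v, lab in zip(values, labels) if v < threshold]
    right = [lab for v, lab in zip(values, labels) if v >= threshold]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Evaluate candidate thresholds and keep the one with maximum gain."""
    pairs = sorted(zip(values, labels))
    # Assumption: candidates are midpoints where the class label changes.
    candidates = [(a[0] + b[0]) / 2 for a, b in zip(pairs, pairs[1:])
                  if a[1] != b[1] and a[0] != b[0]]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Hypothetical feature values and fatigue labels.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["normal", "normal", "normal", "fatigue", "normal", "fatigue"]
threshold = best_threshold(values, labels)
```

Of the three candidate thresholds in this example (3.5, 4.5, and 5.5), the split at 3.5 leaves one branch pure and therefore yields the largest gain.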

Naive Bayes Classifier.
The Naive Bayes classifier is based on Bayes' theorem and the assumption of independence among variables. Although the independence assumption is generally considered poor, this classifier works well even in complex situations. Consider again a set of instance-label pairs, where each instance $x = (x_1, \ldots, x_N) \in R^N$ consists of $N$ features and $y \in Y$ is the class producing $x$. The probability model for a classifier is abstractly a conditional model:
$$p(y \mid x_1, \ldots, x_N).$$
Applying Bayes' theorem:
$$p(y \mid x_1, \ldots, x_N) = \frac{p(y)\, p(x_1, \ldots, x_N \mid y)}{p(x_1, \ldots, x_N)}.$$
The denominator of the fraction is effectively constant. Thus, in practice we are only interested in the numerator of that fraction, which is equivalent to the joint probability model:
$$p(y, x_1, \ldots, x_N).$$
Using the conditional independence assumptions we can write the joint probability as
$$p(y, x_1, \ldots, x_N) = p(y) \prod_{i=1}^{N} p(x_i \mid y).$$
Then, under the aforementioned independence assumptions, the conditional distribution can be expressed as
$$p(y \mid x_1, \ldots, x_N) = \frac{1}{Z}\, p(y) \prod_{i=1}^{N} p(x_i \mid y),$$
where $Z$ is a scaling factor. This is a more manageable form, requiring $(C - 1) + NRC$ parameters, where $R$ is the number of parameters for each $p(x_i \mid y)$ model and $C$ is the number of classes.
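A compact sketch of a Naive Bayes classifier is given below. It assumes Gaussian class-conditional densities for each feature (so R = 2 parameters per feature and class: a mean and a variance), which the text mentions only as an example; the training data, labels, and function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class priors and a per-feature (mean, variance) per class."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant avoids a zero variance for constant features.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model


def predict_gaussian_nb(model, x):
    """Return the class maximizing log p(y) + sum_i log p(x_i | y)."""
    def log_posterior(label):
        prior, stats = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v, (mean, var) in zip(x, stats))
    return max(model, key=log_posterior)


def n_parameters(n_classes, n_features, params_per_feature):
    """(C - 1) + N * R * C, the parameter count quoted in the text."""
    return (n_classes - 1) + n_features * params_per_feature * n_classes


# Hypothetical two-feature training set with two classes.
X = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [3.0, 3.0], [3.2, 2.8], [2.8, 3.2]]
y = ["normal", "normal", "normal", "stress", "stress", "stress"]
model = fit_gaussian_nb(X, y)
```

The scaling factor Z is never computed, because it is the same for every class and does not change which class attains the maximum.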

General Bayesian Classifier.
This classifier is based on the same philosophy as the Naive Bayes classifier, but without the hypothesis of feature independence. For example, for continuous features following a Gaussian distribution, the covariance matrix is diagonal in the Naive Bayes case, while in the General Bayesian classifier the covariance is a full positive definite matrix.
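For the Gaussian example mentioned above, the difference between the two classifiers can be written out explicitly. The following is a standard formulation given for illustration; the symbols $\mu_c$, $\Sigma_c$, and $\sigma_{c,i}$ are introduced here and do not appear in the original text.

```latex
% Class-conditional density for class c, with N features:
p(x \mid y = c) = \frac{1}{(2\pi)^{N/2} \, |\Sigma_c|^{1/2}}
  \exp\left( -\tfrac{1}{2} (x - \mu_c)^T \Sigma_c^{-1} (x - \mu_c) \right)

% Naive Bayes: independent features, so the covariance is diagonal
% (N variance parameters per class):
\Sigma_c = \operatorname{diag}\left( \sigma_{c,1}^2, \ldots, \sigma_{c,N}^2 \right)

% General Bayesian classifier: full symmetric positive definite covariance
% (N(N+1)/2 covariance parameters per class):
\Sigma_c = \Sigma_c^T \succ 0
```

The full covariance captures correlations between features at the cost of more parameters to estimate, which matters when the number of training samples per class is small.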

Dataset
The dataset collection was performed in real driving conditions, which helps to recognize and understand the true physiology of the driving task and to measure the subject's reactions to common driving conditions, such as bad weather and traffic congestion. The subject under investigation is a 28-year-old healthy male with two years of driving experience. Next, the experimental settings and protocols for the data collection are described.
The equipment used to acquire the needed information included (i) a Biopac MP-100 for the acquisition of the driver's signals (ECG, EDA, and respiration), which was installed on the back seat of the vehicle, with the sensors attached to the driver as depicted in Figure 2; (ii) a camera monitoring the road, used only for annotation purposes; and (iii) a camera monitoring the driver's face. Before the beginning of the annotated sessions, the subject conducted a number of long-lasting sessions in order to become familiar with the equipment. The duration of the data collection in real conditions was approximately 18 months, and a sufficient number of driving events under different conditions was encountered. The total number of tours (37 experiments), the average duration of each tour, and the conditions encountered in all tours are shown in Table 1. The sessions cover the whole day (07.00-24.00) so as to capture different fatigue levels (Figure 3).
The annotation was performed by the driver at the end of each session, by self-annotating his state. A scale of three fatigue levels (normal, low fatigue, high fatigue) and a scale of two stress levels (normal, stress) are used, following a human factors expert's suggestion. In our experiments the subject was instructed to annotate high fatigue as a state close to drowsiness symptoms and low fatigue as a lower level, between alertness and drowsiness.

Results
The first step of our methodology is the preprocessing and feature extraction described in Sections 2.1 and 2.2. The features extracted are summarized in Table 2. For stress classification, the feature selection method described in Section 2.4 is straightforward, since two classes exist. For fatigue classification, which is a three-class classification problem, the problem is decomposed into three two-class subproblems (normal versus low fatigue, normal versus high fatigue, and low fatigue versus high fatigue). The DAUC of each feature for all the abovementioned classification problems is given in Figure 4(a). To build more robust classifiers, we also investigate features discriminating fatigue and stress states. In Figure 4 ... Finally, from the environmental conditions, the weather conditions (S1) seem to be the most important. In Table 3, we present the correlation among physiological and video features. Correlation analysis shows that the indicators F1-F2, F1-F6, F1-F9, and F7-F8 are rather correlated. In the end, all these features were kept, as the removal of any of them did not increase the performance of the selected classifiers; instead it tended to decrease the accuracy.
The third step of our methodology is classification. The classifiers tested are described in Section 2.5. As indicated in Table 1 our dataset is not balanced. To address this problem, the following procedure is used.
(i) 50 balanced datasets were extracted from the original one. Let K be the number of samples for the ...
(ii) For each of the 50 datasets, we perform stratified 10-fold cross-validation using the classifiers described in Section 2.5 and obtain the confusion matrix.
(iii) The mean of each entry of the confusion matrix is calculated.
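The balancing and evaluation procedure in the steps above can be sketched as follows. Because the sentence defining K is truncated in the text, the sketch assumes K is the number of samples in the smallest class and that K samples are drawn at random from every class. A simple nearest-centroid rule stands in for the classifiers of Section 2.5, and the data and function names are hypothetical.

```python
import random
from collections import defaultdict


def balanced_subsample(X, y, rng):
    """Draw the same number of samples from every class."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    k = min(len(idx) for idx in by_class.values())  # assumed: smallest class size
    chosen = [i for idx in by_class.values() for i in rng.sample(idx, k)]
    return [X[i] for i in chosen], [y[i] for i in chosen]


def stratified_folds(y, n_folds, rng):
    """Split indices into folds that preserve the class proportions."""
    folds = [[] for _ in range(n_folds)]
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def nearest_centroid(X_train, y_train):
    """Stand-in classifier: assign the class with the closest mean vector."""
    groups = defaultdict(list)
    for x, label in zip(X_train, y_train):
        groups[label].append(x)
    centroids = {c: [sum(col) / len(col) for col in zip(*rows)]
                 for c, rows in groups.items()}
    return lambda x: min(centroids, key=lambda c: sum(
        (a - b) ** 2 for a, b in zip(x, centroids[c])))


def mean_confusion(X, y, n_datasets=50, n_folds=10, seed=0):
    """Average confusion matrix (true -> predicted) over balanced datasets."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    total = {t: {p: 0.0 for p in classes} for t in classes}
    for _ in range(n_datasets):
        Xb, yb = balanced_subsample(X, y, rng)
        for fold in stratified_folds(yb, n_folds, rng):
            held_out = set(fold)
            train = [i for i in range(len(yb)) if i not in held_out]
            predict = nearest_centroid([Xb[i] for i in train],
                                       [yb[i] for i in train])
            for i in fold:
                total[yb[i]][predict(Xb[i])] += 1
    return {t: {p: n / n_datasets for p, n in row.items()}
            for t, row in total.items()}


# Hypothetical imbalanced, well-separated one-feature dataset.
X = [[0.1 * i] for i in range(30)] + [[10 + 0.1 * i] for i in range(20)]
y = ["normal"] * 30 + ["stress"] * 20
cm = mean_confusion(X, y)
```

Averaging over many balanced subsamples uses all samples of the larger classes across repetitions while keeping each individual evaluation free of class imbalance.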
The measures used to evaluate the performance of the different classifiers are the following.
(ii) Sensitivity per class: the fraction of correctly classified instances of a class to the total number of instances belonging to that class.
(iii) Specificity per class: the fraction of the correctly classified instances for a class to the total number of instances classified as the specific class.
(iv) Overall accuracy: the fraction of the total number of correctly classified instances to the total number of instances.
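The measures listed above can be computed from a confusion matrix as in the sketch below. The definitions follow the text literally; note that the per-class specificity as defined here (correct instances over all instances classified as that class) coincides with what is commonly called precision. The confusion matrix contains made-up counts for illustration and is not taken from the reported results.

```python
def per_class_metrics(cm):
    """cm[t][p] = number of instances of true class t classified as class p."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    column_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    # Sensitivity: correct instances of a class / instances belonging to it.
    sensitivity = [cm[c][c] / sum(cm[c]) for c in range(n)]
    # Specificity as defined in the text: correct / instances classified as c.
    specificity = [cm[c][c] / column_sums[c] for c in range(n)]
    accuracy = sum(cm[c][c] for c in range(n)) / total
    return sensitivity, specificity, accuracy


# Hypothetical three-class confusion matrix (normal, low fatigue, high fatigue).
cm = [[8, 2, 0],
      [1, 6, 3],
      [0, 1, 9]]
sensitivity, specificity, accuracy = per_class_metrics(cm)
```

In this made-up example the middle class has the lowest sensitivity, which is the pattern one would look for when checking whether an intermediate state is the main source of misclassification.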
In Tables 4 and 5, we present the results for fatigue and stress classification using three sets of features: (i) only physiological features, (ii) physiological and video features, and (iii) physiological, video, and environmental features. In these tables the sensitivity and specificity per class, as well as the total accuracy for all classifiers and feature sets are given. For the two-class stress problem the information provided is sufficient to evaluate the performance of the classification. However for the three-class fatigue problem a better insight is given through the confusion matrix of the classification. From Tables 4 and 5, we observe that SVM had the best performance in all feature sets for classification of both states, whereas Naive Bayes classifier had the worst (up to 12% lower accuracy compared to SVM in some cases). In Table 4 we observe that the highest accuracy for fatigue classification was obtained using the full feature set (88% with SVM). When limited feature sets are employed the difference is rather small (85% with physiological features and 87% with physiological and video features, both obtained using SVM). In Table 6 detailed classification results (containing also the confusion matrix) are given using the full set of features. It can be noticed that the main source of misclassification is in the low fatigue class. From Table 5, we observe that for stress classification the incorporation of additional features, in contrast to fatigue detection, significantly increased the obtained accuracy. The 78% accuracy obtained by physiological features climbs to 86% using physiological, video, and environment features.
In our analysis we also study the contribution of the features to the classification results. As already described, the features used in our experiments come from physiological signals, video monitoring of the driver's face, and environmental information. As the features are extracted from signals obtained from different sensors, they can be grouped into five groups, each related to a specific sensor of the experimental setting. Such an analysis can give significant insight into the importance of each sensor when building a system for driver state monitoring. ... features, respectively. We then evaluate the contribution of each group of features to the classification performance with the following procedure: (i) we perform the classification with the whole feature set; (ii) we remove each group of features in turn and measure the decrease in accuracy. When a group of features is removed, a reduction in accuracy is expected, given that the initial feature set is considered the optimal one. In Figure 5 the percentage of accuracy reduction is given. In fatigue classification, a large decrease in accuracy is observed when the RRV features are removed, whereas in stress classification no group has such a significant impact on classification accuracy.
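The group-removal procedure can be sketched as below. The sketch assumes that the reduction is reported in percentage points of accuracy, and it takes the classifier evaluation as a caller-supplied function; `evaluate`, the group names, and the toy accuracy model are all hypothetical placeholders, not the groups or results of the study.

```python
def group_contributions(evaluate, groups):
    """Accuracy drop (percentage points) when each feature group is removed.

    evaluate: callable mapping a list of feature names to an accuracy in [0, 1].
    groups:   dict mapping a group name to the feature names it contains.
    """
    all_features = [f for feats in groups.values() for f in feats]
    full_accuracy = evaluate(all_features)
    drops = {}
    for name, feats in groups.items():
        kept = [f for f in all_features if f not in feats]
        drops[name] = 100.0 * (full_accuracy - evaluate(kept))
    return drops


# Toy stand-in for a cross-validated classifier: accuracy grows with the
# number of available features (purely illustrative).
def toy_evaluate(feature_names):
    return 0.5 + 0.1 * len(feature_names)


groups = {"group_A": ["a1", "a2"], "group_B": ["b1"]}
drops = group_contributions(toy_evaluate, groups)
```

In a real analysis, `evaluate` would wrap the full balanced cross-validation procedure so that each accuracy is estimated in the same way as for the complete feature set.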

Study on the Impact of Fatigue on Driving Performance
Our proposed methodology showed good performance even in the detection of early stages of fatigue (low fatigue state). In order to investigate whether these early fatigue stages are worth recognizing, we performed a study to examine the impact of the driver's fatigue levels on driving performance. The goal of our study was to verify that the detected fatigue levels are associated with the deterioration of driving performance. A simulation environment was developed to measure driving performance in terms of the subject's sensorimotor functions (i.e., perception and reaction times). The simulated driving world was based on the Microsoft XNA framework, as shown in Figure 6. The vehicle is controlled by Logitech's Momo racing wheel. The subject was asked to focus on the driving task, that is, to keep the vehicle within the road lane and to avoid crashes with pedestrians, who appeared unexpectedly on the road, by pressing the brake pedal and stopping the vehicle. From this primary task, measurements of steering control and reaction times are monitored. In addition to the primary driving task, a secondary task is used, following the well-established PDT (Peripheral Detection Task) technique [32]. During the experiments, apart from pedestrians, other objects (animals) randomly appeared outside the roadway. Once an object was perceived, the subject responded by pressing one of the control buttons of the steering wheel, and the respective reaction time was measured. The physiological signals monitored in the laboratory experiments are similar to the ones measured during the real-world experiments (ECG, EDA, and respiration). The same off-the-shelf equipment (Biopac MP-100) is used in this type of experiment. In the laboratory testing, a single camera is used to take video recordings of the driver's face. The same annotation method based on self-reporting, described in Section 3, is followed.
Furthermore, subjects were asked to report the time they woke up and their hours of sleep.
The total number of sessions gathered is 24 and each session lasted 12 minutes. From those sessions, 12 subjects were in normal state, 7 in low fatigue, and 6 in high fatigue. Each session is split into two 5-minute intervals (the first and the last minute are not taken into account).
The 12-minute duration of the experiments does not suffice to increase the fatigue level of a subject. Therefore, the experiments were performed at different hours of the day (or night), ensuring that subjects experienced different fatigue levels, depending on their previous work effort and hours of sleep. Furthermore, the environment was kept calm in order to reduce any potential work-related stress. Several measures of driving performance are extracted, based on the tasks involved in the experimental protocol. The first category of measures involves the reaction time of the driver on both the primary and the secondary task. Reaction time is a good measure of the subject's alertness. It is measured as the time elapsed from the moment the object appears on the screen until the subject presses the brake pedal (for the primary task) or the button (for the secondary task). The association of the fatigue levels with driving performance is evaluated using the following measures: mean and standard deviation of reaction time on the primary task, mean and standard deviation of reaction time on the secondary task, and standard deviation of the vehicle position from the center of the lane. Table 7 gives the mean ± standard deviation of the driving performance measures for the normal, low, and high fatigue states, as self-reported by the subject. The P value for the null hypothesis that driving performance is not better in the normal state is also given. When the subject is in the low fatigue state, a significant decrease in driving performance is observed, expressed in the average reaction times for both the primary and the secondary task. In the high fatigue state all performance measures are significantly worse, as expected. Our analysis verifies that the changes in the driver's state detected by our methodology do correspond to changes in driving performance.
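The driving performance measures listed above can be computed from logged event times and lane offsets along the following lines. This is a hedged sketch: the function names, the dictionary keys, and the use of the sample standard deviation are assumptions, and the numbers are invented for illustration rather than taken from Table 7.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def reaction_times(onsets: Sequence[float], responses: Sequence[float]) -> List[float]:
    """Time from object appearance to the subject's response, per event (seconds)."""
    return [response - onset for onset, response in zip(onsets, responses)]


def performance_measures(
    primary_rt: Sequence[float],
    secondary_rt: Sequence[float],
    lane_offsets: Sequence[float],
) -> Dict[str, float]:
    """Summarise one interval with the five measures named in the text."""
    return {
        "primary_rt_mean": mean(primary_rt),
        "primary_rt_std": stdev(primary_rt),      # sample std (assumption)
        "secondary_rt_mean": mean(secondary_rt),
        "secondary_rt_std": stdev(secondary_rt),
        "lane_position_std": stdev(lane_offsets),  # offset from lane center
    }


# Invented example: brake responses to pedestrians (primary task),
# button presses for roadside objects (secondary task), lane offsets in metres.
primary = reaction_times([10.0, 20.0, 30.0], [11.0, 21.5, 32.0])
secondary = reaction_times([5.0, 15.0, 25.0], [5.5, 16.0, 26.5])
measures = performance_measures(primary, secondary, [-0.2, 0.0, 0.2])
print(measures)
```

Computing these values per 5-minute interval and grouping them by self-reported fatigue state would give the kind of per-state summary that the comparison relies on.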

Discussion
In this work we presented a methodology for simultaneous fatigue and stress detection in realistic driving conditions. Performing real-time monitoring of drivers' physiological activity is still quite difficult, since it requires special sensor equipment attached to the driver, which in a real-car application would raise a number of safety-related issues concerning the obtrusive monitoring procedure. Some research projects [33] have addressed unobtrusive driver monitoring by collecting biosignals from sensors embedded in the steering wheel or fitted to the driver's seat. Although many approaches to affective state recognition (either stress or fatigue) have presented promising results in biomedical and other special applications, they are still not considered suitable for an automotive application. In our work, from the large group of biosignals used in similar studies, we chose to exploit only a limited set (ECG, EDA, and respiration) and achieved comparable results by incorporating additional information from video of the driver's face as well as from the driving environment. However, direct comparison with other methods is not feasible, mainly for two reasons: first, other methods focus on the estimation of a single psychophysiological state, and second, most relevant studies were performed in a simulation environment. In our approach we followed a quite different experimental protocol, allowing us to address (i) the simultaneous estimation of the driver's stress and fatigue levels and (ii) driver monitoring in real-life conditions. Furthermore, we demonstrated, using a simulation environment, that detection of even the earlier stages of fatigue is of high importance, since a significant deterioration in performance is observed.
Concerning the performance of the employed classifiers, SVM presents the best results in all classification problems, followed by decision trees and the Bayesian classifier. Naive Bayes had the worst accuracy. The reason, as depicted in Table 3, is that the assumption of feature independence does not hold, which makes the Naive Bayes classifier weak for all classification problems examined.
Classification using physiological features shows very good performance (the highest accuracy, 85%, is obtained using the SVM classifier). The incorporation of additional features improves the initial results only marginally. In contrast, when the RRV-related features are removed, a 15.5% decrease in accuracy is observed (Figure 5). Using the SVM classifier with physiological and video features, a 99% accuracy in high fatigue classification is achieved. Since this state is more closely related to driving performance and accident causation than the others, we consider its accurate detection crucial. We also notice that the main source of misclassification is between the low fatigue class and the other two classes (normal and high fatigue). This is expected, since the division of fatigue into discrete levels is quite abstract, given that fatigue is commonly considered a continuous variable. The division of fatigue into classes might also introduce annotation errors, since the subject could misjudge his state. This problem is compounded by the long duration of the experiments, which makes it likely that the fatigue criteria, as defined by the subjects, varied over time.
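Per-class figures such as those discussed above are typically derived from a multi-class confusion matrix in a one-vs-rest fashion. The sketch below illustrates that computation for the high fatigue class; the function name `one_vs_rest_rates` and the matrix counts are assumptions made up for the example and are not results from this study.

```python
from typing import List, Tuple


def one_vs_rest_rates(confusion: List[List[int]], cls: int) -> Tuple[float, float]:
    """Sensitivity and specificity of class `cls`.

    `confusion[i][j]` counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp                  # true cls, predicted otherwise
    fp = sum(row[cls] for row in confusion) - tp   # other classes predicted as cls
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Invented counts; class order: normal, low fatigue, high fatigue.
toy_confusion = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 1, 9],
]
high_sens, high_spec = one_vs_rest_rates(toy_confusion, cls=2)
print(high_sens, high_spec)
```

In this toy matrix most errors involve the low fatigue row and column, which mirrors the kind of confusion between the middle class and its neighbours noted in the text.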
Stress classification was expected to be more difficult, since no feature proved to discriminate stress levels sufficiently. Using only physiological signals, a 78% accuracy is obtained with the SVM classifier. The incorporation of additional information significantly increases the accuracy of all classifiers (the highest accuracy, 86%, is obtained using the SVM classifier). Furthermore, we observe in Figure 5 that no group of variables has very good discrimination power, and we therefore conclude that a reliable system for stress detection must be based on the fusion of several information sources.
As a great number of features can be extracted from physiological signals, an important step in our methodology is feature selection. In Figure 4, we observe that the mean RR (F1) and the std of RR (F2) are very good discriminators of fatigue levels, while more complex RRV features (DFA, approximate entropy, and Lyapunov exponents) lack discrimination power. Those features, however, are mostly used in medical applications, are extracted from long recordings, and are related to problematic heart function [34,35]. For within-individual variations, simple RRV characteristics have proved to be rather informative [11]. Respiration rate, which is highly correlated with heart rate, as well as EDA features are also good indicators of fatigue. Among the video features, the std of eye activation (V2) and PERCLOS (V4) are the best fatigue indicators, especially in discriminating high fatigue. The relation of PERCLOS to late stages of fatigue is well established in the literature. Regarding stress, mean RR (F1), LF/HF ratio (F6), and the ratio of heart rate to respiration rate (F15) are the best indicators among the physiological features, but their discrimination power is still not high. A possible explanation is the low impact of stress on the physiological signals compared to that of the circadian rhythm. Among the video features, the standard deviation of head movement (V3) was the best stress indicator. Still, since head movement is a behavioural parameter, the correlation of this feature with stress is expected to vary significantly between individuals. Environmental conditions were expected to be correlated with stress levels. Of the examined driving environment variables, however, only weather conditions contributed to stress classification.
It should be noted that in this work a single subject is monitored during ordinary work days, without any restrictions related to sleep hours or external stimuli. The driver experiences a number of different conditions, from both the physiological and the environmental point of view. We therefore consider that this study truly depicts the actual physiological status of the particular subject during driving. This work also indicated that the estimation of low fatigue, as a predecessor of higher fatigue levels (e.g., drowsiness), is plausible. Future work could focus on the identification of even earlier stages, which can hardly be self-recognized. In such a case, the presence of external human factors experts and/or accurate performance measures is a prerequisite for annotation purposes.

Conclusions
We presented a methodology for simultaneous detection of the driver's fatigue and stress levels. Our methodology employs three types of information: (i) physiological features, (ii) video features from monitoring of the driver's face, and (iii) driving environment information. The methodology achieved very good results (i.e., accuracy of 88% for fatigue and 86% for stress). Fatigue can be estimated with high accuracy (85%) using only physiological features. This is not the case for stress, where the incorporation of additional information increases the accuracy by 8 percentage points (from 78% to 86%). In high fatigue detection in particular, which is more closely related to driving impairment, the obtained sensitivity is 99% and the specificity 92%, indicating very good identification. A study on the impact of fatigue on driving performance confirms that the detection of driver state achieved with our methodology can contribute to early detection of driving impairment.