A Driver Face Monitoring System for Fatigue and Distraction Detection

A driver face monitoring system is a real-time system that can detect driver fatigue and distraction using machine vision approaches. In this paper


Introduction
Improving public safety and reducing accidents are among the important goals of Intelligent Transportation Systems (ITS). One of the most important factors in accidents, especially on rural roads, is driver fatigue and monotony. Fatigue reduces the driver's perception and decision-making capability to control the vehicle. Research shows that the driver is usually fatigued after 1 hour of driving. In the early afternoon hours, after eating lunch, and at midnight, driver fatigue and drowsiness are much greater than at other times. In addition, drinking alcohol, drug addiction, and using hypnotic medicines can lead to loss of consciousness [1,2].
Different countries report different statistics on accidents caused by driver fatigue and distraction. Generally, driver drowsiness and lack of concentration are the main cause of about 20% of crashes and 30% of fatal crashes. In single-vehicle crashes (accidents in which only one vehicle is damaged) or crashes involving heavy vehicles, up to 50% of accidents are related to driver hypovigilance [1,3,4,5]. According to current studies, driver face monitoring systems are expected to reduce the number of crashes by 10%-20% [6].
The driver face monitoring system is a real-time system that investigates the driver's physical and mental condition by processing images of the driver's face. The driver state can be estimated from eye closure, eyelid distance, blinking, gaze direction, yawning, and head rotation. The system raises an alarm in hypovigilance states, including fatigue and distraction. The major parts of the driver face monitoring system are (1) imaging, (2) the hardware platform, and (3) the intelligent software.
In driver face monitoring systems, two main challenges can be considered: (1) "how to measure the fatigue?" and (2) "how to measure the concentration?". The first challenge is how to define fatigue exactly and how to measure it. Despite the progress of science in physiology and psychology, there is still no precise definition of fatigue, and consequently there is no measurable criterion or tool for it [3]. Although fatigue has no precise definition yet, it is related to some symptoms including body temperature, electrical resistance of skin, eye movement, breathing rate, heart rate, and brain activity [2,3,7,8]. One of the first and most important symptoms of fatigue appears in the eye. There is a very close relationship between the Psychomotor Vigilance Task (PVT) and the percentage of eyelid closure over time (PERCLOS). PVT shows the response speed of a person to a visual stimulus. Therefore, in almost all driver face monitoring systems, eye closure detection is the first symptom used to measure fatigue. The second challenge is measuring the driver's attention to the road. The driver's attention can be partly estimated from the head and gaze direction. The main problem is that a driver whose head is forward and who is looking toward the road is not necessarily paying attention to it. In other words, looking toward the road is not the same as paying attention to it [3].
In this paper, a new driver face monitoring system is proposed which adaptively extracts the hypovigilance symptoms from the driver's face and eyes. Then, the symptoms are analyzed by a fuzzy expert system to determine the driver state. The remainder of the paper is organized as follows. In Section 2, some previous research is reviewed. The proposed system is described in detail in Section 3. In Section 4, the experimental results and discussions are presented. Section 5 presents the conclusions.

Previous Works
Driver face monitoring systems can be divided into two general categories. In the first category, driver fatigue and distraction are detected only by processing the eye region. Much research is based on this approach, mainly because the main symptoms of fatigue and distraction appear in the driver's eyes. Moreover, processing the eye region instead of the whole face region has lower computational complexity. In the second category, the symptoms of fatigue and distraction are detected not only from the eyes but also from other regions of the face and head. In this approach, in addition to processing the eye region, other symptoms including yawning and head nodding are also extracted. A driver face monitoring system includes several main parts: (1) face detection, (2) eye detection, (3) face tracking, (4) symptom extraction, and (5) driver state estimation. These main parts are reviewed for different systems in this section.
In most driver face monitoring systems, face detection is the first part of the image processing operations. Face detection methods can be divided into two general categories [9]: (1) feature-based and (2) learning-based methods.
Feature-based methods assume that the face in the image can be detected by applying heuristic rules to features. These methods are usually used for detecting one face in the image. Color-based face detection is one of the fast and common methods. In these methods, the face is detected based on the color of the skin and the shape of the face. Color-based face detection may be applied in different color spaces including RGB [10,11], YCbCr [12], or HSI [13]. In noisy images or in images with low illumination, these algorithms have low accuracy.
Learning-based face detection uses statistical learning methods and training samples to learn discriminative features. These methods benefit from statistical models and machine learning algorithms. Generally, learning-based methods have lower error rates for face detection, but they usually have higher computational complexity. Viola and Jones [14] presented an algorithm for object detection which is very fast and robust. This algorithm was used in [15,16,17] for face detection.
In almost all driver face monitoring systems, because of the importance of the symptoms related to the eye, the eye region is always processed for extracting the symptoms. Therefore, eye detection is required before the eye region is processed. Eye detection methods can be divided into three general categories: (1) methods based on imaging in the infrared spectrum, (2) feature-based methods, and (3) other methods.
One of the fast and relatively accurate methods for eye detection is based on imaging in the infrared (IR) spectrum. This method uses the physiological and optical properties of the eye in the IR spectrum. The eye pupil reflects IR beams, and it appears as a bright spot when the angle between the IR source and the imaging device is suitable. The pupil and eye are detected using this property. The systems proposed in [4,18,19,20] used this method for eye detection.
The feature-based eye detection approach includes various methods. Image binarization [5,21,22] and projection [23,24] are two feature-based eye detection methods which assume that the eye is darker than the facial skin. Because these methods are simple and have a high error rate, more complicated processing is usually needed to locate the eyes properly.
A few eye detection methods based on other approaches were used in driver face monitoring systems. In [10], a geometrical face model with some feature-based methods was used to detect the eyes. In addition, some systems such as [15] used hybrid methods for eye detection. In [15], elliptical gray-level template matching and an IR imaging system were used for eye detection in day and night, respectively.
Usually, the entire image is searched to detect the face/eyes. Searching the entire image increases the computational complexity of the system. Therefore, after the initial detection of the face/eyes, face/eye tracking is usually performed in the following frames. In most driver face monitoring systems, the Kalman filter [4,19,25] or extended versions of the Kalman filter such as the Unscented Kalman Filter (UKF) [23] were used. However, in some research, a search window [18] and a particle filter (PF) [26] were used for tracking.
In driver face monitoring systems, useful symptoms for fatigue and distraction detection can be divided into three general categories: (i) symptoms related to the eye region; (ii) symptoms related to the mouth region; (iii) symptoms related to the head.
The eye is the most important area of the face in which the symptoms of fatigue and distraction appear. Therefore, many driver face monitoring systems detect driver fatigue and distraction based only on the symptoms extracted from the eyes. The symptoms related to the eye region include PERCLOS [3,4,10,15], eyelid distance [25,27], eye blink speed [4,10], eye blink rate [4,19], and gaze direction [4].
Yawning is one of the hypovigilance symptoms related to the mouth region. This symptom was extracted by detecting the open mouth in [11,16]. These systems detect the mouth based on the color features of the lips in the image.
Some fatigue and distraction symptoms are related to the head. These symptoms include head nodding [5,19] and head orientation [4,10,19]. Head nodding can be used for fatigue detection, and head orientation can be used for both fatigue and distraction detection. Driver nodding and lack of driver attention to the road can be detected by estimating the angle of the head direction.
After symptom extraction, the driver state has to be determined. The determination of the driver state is considered a classification problem. The simplest method for detecting driver fatigue or distraction is to apply a threshold to an extracted symptom [22].
Another way to determine the driver state is the knowledge-based approach. In a knowledge-based approach, the decision about driver fatigue and distraction is based on the knowledge of an expert, which usually appears in the form of if-then rules. In [19,25], fuzzy expert systems were used as a knowledge-based approach for estimating the driver state.
More complicated approaches such as a Bayesian network [4] and a naive dynamic Bayesian network [26] were also used for driver state determination. These approaches are usually more accurate than threshold-based and knowledge-based approaches; however, they are more complicated.

The Proposed System
The proposed system is a driver face monitoring system that can detect driver hypovigilance (both fatigue and distraction) by processing the eye and face regions. The flowchart of our system is shown in Figure 1. After image acquisition, face detection is the first stage of processing. Then, the symptoms of hypovigilance are extracted from the face image. Although no explicit eye detection stage is used to locate the eyes in the face, some important symptoms related to the eye region (the top-half segment of the face) are extracted. Additionally, a template matching method is used for detecting head rotation. Finally, a fuzzy expert system is used to estimate driver hypovigilance. Performing the face detection algorithm on every frame is computationally expensive. Therefore, after face detection in the first frame, a face tracking algorithm is used to track the driver's face in the following frames unless the face is lost. In Figure 1, an auxiliary variable denoted by sw holds the face tracking status. If sw is 0, the face is lost, and the face detection algorithm must be performed to localize the driver's face. In contrast, if sw is 1, the face was tracked successfully by the face tracking method. At system initialization, sw is 0, which means that the system must perform the face detection algorithm on the first frame.
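The role of sw can be illustrated with a minimal sketch of the detect-or-track loop. The function names `run_monitor`, `detect_face`, and `track_face` are hypothetical, and the convention that a detector or tracker returns `None` when the face is not found is an assumption made for illustration; the flowchart in Figure 1 remains the reference.

```python
def run_monitor(frames, detect_face, track_face):
    """Return a list of (sw, box) pairs, one per frame.

    detect_face(frame) and track_face(frame, last_box) are hypothetical
    callables that return a face box, or None when the face is not found.
    """
    sw = 0        # 0: face lost, run detection; 1: face tracked successfully
    box = None
    history = []
    for frame in frames:
        if sw == 0:
            box = detect_face(frame)       # full (expensive) detection
        else:
            box = track_face(frame, box)   # cheaper local tracking
        sw = 0 if box is None else 1       # a lost face forces detection next frame
        history.append((sw, box))
    return history
```

With this structure the expensive detector runs only on the first frame and on the frame after each tracking failure.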
We used the Haar-like features and adaptive boosting method proposed by Viola and Jones [14] for face detection. The face detection algorithm was trained with about 3000 faces and about 300000 nonfaces. For face tracking, a full search method is used to find the driver's face image in the new frame. The search region is centered on the face image of the last frame, and its size changes with the size of the face image (1.5 times the size of the face image). The correlation coefficient between the face image and the subwindows of the search region is used as the matching criterion.
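The full-search tracking step can be sketched as follows. This is a pure-Python illustration on small gray-level images stored as lists of rows, not the authors' MATLAB implementation; the names `correlation`, `crop`, and `track_face` are hypothetical, and reading "1.5 times" as a margin of a quarter of the face size on each side is an assumption.

```python
from math import sqrt

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat patch carries no shape information
    return cov / sqrt(var_a * var_b)

def crop(image, top, left, height, width):
    """Flattened pixels of the subwindow at (top, left)."""
    return [p for row in image[top:top + height] for p in row[left:left + width]]

def track_face(frame, template, last_top, last_left, scale=1.5):
    """Full search around the last face position; returns (score, top, left)."""
    h, w = len(template), len(template[0])
    margin_y = int((scale - 1) * h / 2)   # assumed: region is scale x face size
    margin_x = int((scale - 1) * w / 2)
    flat_template = [p for row in template for p in row]
    best = (-2.0, last_top, last_left)
    for top in range(max(0, last_top - margin_y),
                     min(len(frame) - h, last_top + margin_y) + 1):
        for left in range(max(0, last_left - margin_x),
                          min(len(frame[0]) - w, last_left + margin_x) + 1):
            score = correlation(flat_template, crop(frame, top, left, h, w))
            if score > best[0]:
                best = (score, top, left)
    return best
```

Every subwindow in the search region is scored, which is why this step dominates the running time; a low best score could be used to decide that the face is lost.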
3.1. The Symptom Extraction. In the proposed system, two types of symptoms are extracted: (1) the symptoms related to the eye region and (2) the symptom related to the face region. The symptoms related to the eye region are PERCLOS, eyelid distance changes with respect to the normal eyelid distance (ELDC), and eye closure rate (CLOSNO). The symptom related to the face region is head rotation (ROT).

The Symptoms Related to Eye Region.
The proposed system uses the horizontal projection of the top-half segment of the face image to extract the symptoms of driver hypovigilance. The proposed method uses a spatiotemporal approach without explicit eye detection for feature extraction; because it is adaptive, it is not very sensitive to illumination, skin color, or glasses. The method is based on the change of the horizontal projection of the top-half segment of the face image over time. The horizontal projection of an image $I$ is computed by

$HP(i) = \sum_{j} I(i, j)$.    (1)

The length of HP is equal to the height of $I$. In the proposed system, only the horizontal projection of the top-half segment of the face image is used, so the length of the horizontal projection is equal to half the height of the driver's face image. Before extracting the symptoms related to the eye region, the system needs to be trained. Because eyelid behavior differs between individuals, estimating the driver vigilance level from absolute values does not give a robust driver face monitoring system. Therefore, to build a robust and adaptive system, the normal values of the vigilance symptoms must be estimated in a training phase. In the proposed method, "training" has a slightly different meaning than in general machine learning systems: it means extracting the normal values of the vigilance symptoms of the driver. The training phase is therefore a short period of time during which we assume that the driver is fully alert and looking forward. In the training phase, the normal values of PERCLOS, CLOSNO, and ELDC are calculated. The normal values of PERCLOS and CLOSNO are denoted by PERCLOS_N and CLOSNO_N, respectively. Because the eye is not detected explicitly, the eyelid distance and the normal eyelid distance are estimated implicitly. The eyelid distance is estimated from the horizontal projection of the top-half segment of the face; therefore, the average horizontal projection of the top-half segment of the face is computed during the training phase to estimate the normal eyelid distance.
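As a small illustration of the projection described above, the following sketch sums each row of the top half of a gray-level face image and averages several such projections, as the training phase does. Images are plain lists of rows here, and the function names are hypothetical.

```python
def horizontal_projection(image):
    """Sum the pixels of each row; the result has one entry per image row."""
    return [sum(row) for row in image]

def top_half_projection(face):
    """Horizontal projection of the top-half segment of a face image."""
    return horizontal_projection(face[: len(face) // 2])

def mean_projection(projections):
    """Element-wise average of equal-length projections (training frames)."""
    n = len(projections)
    return [sum(values) / n for values in zip(*projections)]
```

Dark rows such as those crossing open eyes give low sums, so the shape of the projection changes when the eyelids close. The sketch assumes that all face images have been resized to a common height so that projections are comparable.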
The training duration is about 1-2 minutes. In the first 100 frames of the training sequence, we suppose that the driver's eyes are usually open. So, the horizontal projection of open eyes can be estimated by averaging the horizontal projections of the first 100 frames. The horizontal projection of open eyes is named $HP_O$, and it is computed by (2), where $HP_t$ is the horizontal projection of frame $t$ and $N$ is 100:

$HP_O = \frac{1}{N} \sum_{t=1}^{N} HP_t$.    (2)

Eye closure can be detected by computing the correlation of the horizontal projection of the current frame ($HP_t$) and $HP_O$. This correlation is denoted by $CHP_t$. If $CHP_t$ is larger than $th_{CHP}$, the eye is open in frame $t$; otherwise, the eye is closed (3).
After computing $HP_O$ as the horizontal projection of open eyes, a copy of $HP_O$ is named $HP_N$. $HP_O$ is updated as new frames are acquired using the fuzzy running average method [28], while $HP_N$ is not updated. In the fuzzy running average method, the update of $HP_O$ depends on the matching degree (correlation coefficient) of $HP_t$ and $HP_O$. The fuzzy running average is shown in (4), where $\alpha$ is the weighting factor, calculated from $CHP_t$ as shown in (5):

$HP_O \leftarrow \alpha \, HP_O + (1 - \alpha) \, HP_t$.    (4)

In (5), $\alpha_{min}$ is a constant (0.8 in our system) and represents the minimum value of $\alpha$. According to (5), $\alpha$ varies in the range [0.8, 1]. A higher $\alpha$ updates $HP_O$ more slowly. Therefore, $HP_O$ is updated during driving based on the changes of $HP_t$.
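The eye-state test and the running-average update can be sketched as below. The blend of old and new projections follows the description above, but the mapping from correlation to weighting factor used in `weighting_factor` is only a placeholder assumption: it is a linear rule chosen so that the factor stays in [0.8, 1] and a poorly matching frame changes the open-eye model less. It is not the paper's equation (5). All function names are hypothetical, and the threshold is left as a required argument because its value is not stated.

```python
from math import sqrt

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def eye_is_open(hp_current, hp_open, threshold):
    """Eye is open when the current projection matches the open-eye model."""
    return correlation(hp_current, hp_open) > threshold

def weighting_factor(chp, alpha_min=0.8):
    """ASSUMED placeholder for equation (5): linear in the clipped correlation."""
    match = max(0.0, min(1.0, chp))
    return 1.0 - (1.0 - alpha_min) * match   # match=1 -> alpha_min, match=0 -> 1

def update_open_model(hp_open, hp_current, chp, alpha_min=0.8):
    """Running average: a higher alpha keeps more of the old model."""
    alpha = weighting_factor(chp, alpha_min)
    return [alpha * o + (1.0 - alpha) * c for o, c in zip(hp_open, hp_current)]
```

In use, the correlation would be computed once per frame, passed to `update_open_model`, and compared against the threshold to fill the eye-closure history.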
The eye closure state is saved in a circular list (eye_closure). If the eye is open, the current element of eye_closure is 1; otherwise, it is 0. When eye_closure is full, the oldest data is replaced by new data. The length of eye_closure ($L$) must be equal to the number of training frames (about 1500-3000). eye_closure is used for computing PERCLOS and CLOSNO, whereas ELDC is computed using the correlation of the current horizontal projection ($HP_t$) and $HP_N$. $HP_N$ implicitly shows the eyelid distance of the driver in the normal state.
PERCLOS shows the percentage of eye closure during the last $L$ frames and is computed by (6). CLOSNO shows the eye blink rate (frequency) in a given duration. If eye_closure′ is the first derivative of eye_closure, CLOSNO can be computed from eye_closure′. According to (7), eye_closure′ marks the start and stop frames of eye closure events by +1 and −1, respectively, and the other elements of eye_closure′ are zero. Therefore, CLOSNO is computed by (8). ELDC is computed by (9), where Sigm is the sigmoid function and $a$ and $b$ are its parameters, giving the slope and displacement of the sigmoid function, respectively. The general form of the sigmoid function is

$Sigm(x) = \frac{1}{1 + e^{-a(x - b)}}$.    (10)

In the proposed system, $a = 5$ and $b = 0.5$. Because the range of the sigmoid function is [0, 1], ELDC is always in the range [0, 1]. If ELDC is near zero, the eyelid distance is normal, but as ELDC approaches one, the eyelid distance approaches zero (the eye is closed).
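A minimal sketch of the windowed symptoms is given below. It uses `collections.deque` with `maxlen` as the circular list. Note the labeled assumption: this sketch stores 1 for a closed frame and 0 for an open frame, so that the sum counts closed frames and a +1 in the first difference marks the start of a closure, as the description of the derivative requires; the list described in the text encodes open eyes as 1. The sigmoid uses the common slope/displacement form with the stated values 5 and 0.5; the argument that equation (9) feeds into it is not reproduced here.

```python
from collections import deque
from math import exp

def perclos(closed_flags):
    """Fraction of frames in the window in which the eye is closed."""
    return sum(closed_flags) / len(closed_flags)

def closno(closed_flags):
    """Number of closure events that start inside the window."""
    flags = list(closed_flags)
    diff = [b - a for a, b in zip(flags, flags[1:])]  # +1 start, -1 end
    return sum(1 for d in diff if d == 1)

def sigm(x, a=5.0, b=0.5):
    """Sigmoid with slope a and displacement b."""
    return 1.0 / (1.0 + exp(-a * (x - b)))

# Usage: a short window for illustration (the system uses 1500-3000 frames).
window = deque(maxlen=5)          # the oldest entries are dropped automatically
for closed in [0, 1, 1, 0, 0, 1, 0]:
    window.append(closed)
print(list(window), perclos(window), closno(window))
```

A closure that began before the oldest frame in the window is not counted by `closno`, which is one reasonable reading of a per-window blink count.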

The Symptom Related to Face Region.
Head rotation is a symptom of distraction that is extracted from the face region in the proposed system. The head rotation is estimated from the changes of the face image with respect to a frontal face template. To compute the frontal face template, we assume that the driver's face is frontal during the first 100 frames. The average face image over these frames is taken as the frontal face template. Then, the absolute difference between the face image in the current frame and the frontal face template is named $Face_{diff}$, and the head rotation (ROT) is estimated from it. ROT varies in the range [0, 1]. When $Face_{diff}$ is near zero, ROT is near zero too, and when $Face_{diff}$ is near one, ROT is near 1. A greater ROT value indicates more head rotation. The proposed method for head rotation estimation cannot determine the angle of rotation.
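One way to turn the template difference into a value in [0, 1] is sketched below. The exact formula for ROT is not reproduced here, so taking the mean absolute difference divided by the maximum gray level is an assumption, as are the function names and the requirement that all face images have been resized to the same dimensions.

```python
def frontal_template(faces):
    """Pixel-wise average of equally sized gray-level face images."""
    n = len(faces)
    rows, cols = len(faces[0]), len(faces[0][0])
    return [[sum(face[r][c] for face in faces) / n for c in range(cols)]
            for r in range(rows)]

def rot(face, template, max_gray=255.0):
    """ASSUMED form of ROT: mean absolute difference scaled to [0, 1]."""
    total = sum(abs(p - t)
                for face_row, template_row in zip(face, template)
                for p, t in zip(face_row, template_row))
    return total / (max_gray * len(face) * len(face[0]))
```

A value near 0 means that the current face closely matches the frontal template, while larger values indicate a larger change in appearance. As the text notes, such a measure does not give the rotation angle.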

Fatigue and Distraction Detection.
In the proposed system, driver fatigue and distraction are estimated using a fuzzy expert system (Figure 2). A fuzzy expert system is an expert system that uses fuzzy logic instead of Boolean logic. In other words, a fuzzy expert system is a collection of membership functions, an inference engine, and rules that are used to reason about inputs and generate proper outputs. First, a fuzzy expert system fuzzifies the crisp inputs with predefined membership functions to generate fuzzy inputs. Then, the fuzzy inputs are processed by an inference engine. In the inference engine, the truth value of each rule in the rule base is computed using a fuzzy implication method (usually the Mamdani or Larsen method) and applied to the conclusion part of the rule. For each rule, the result is assigned to the output variable as a fuzzy subset. Then, all of the fuzzy subsets assigned to each output variable are combined to form a single fuzzy subset for that output variable. Finally, the fuzzy subset of each output variable is defuzzified to generate the crisp output.
The proposed fuzzy expert system processes four inputs and generates two outputs. The inputs are (1) PERCLOS, (2) ELDC, (3) CLOSNO, and (4) ROT, and the outputs are (1) fatigue estimation and (2) distraction estimation. To build the fuzzy expert system, the Mamdani fuzzy inference method (also called the min-max method) is applied to a set of fuzzy rules. The fuzzy rules are shown in Tables 1 and 2. These rules were extracted by an expert; they are not very complicated and are easy to understand.
The fuzzy membership functions of the inputs are depicted in Figures 3-6. According to Figures 3 and 4, the membership functions of PERCLOS and CLOSNO are defined based on PERCLOS_N and CLOSNO_N, respectively. Additionally, ELDC and ROT are two symptoms that are normalized during their computation, and they always vary in the range [0, 1] (Figures 5 and 6). Therefore, the membership functions defined for the inputs are fully adaptive and normalized. The membership functions of the outputs are singletons and are depicted in Figures 7 and 8. The number of fuzzy subsets for each membership function is 3. A larger number of fuzzy subsets requires more rules in the rule base, which makes the system more complicated. In contrast, a smaller number of fuzzy subsets decreases the accuracy of driver state estimation. The defuzzification method in the proposed system is Center Of Gravity (COG), which is the most familiar and widely used defuzzification method.
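The inference steps described above can be sketched for the fatigue output. The rule table below is the one given for fatigue in Table 1; everything else is assumed for illustration: both inputs are taken as already scaled to [0, 1], the three triangular membership functions stand in for those of Figures 3-6, and the singleton output positions 0, 0.5, and 1 stand in for those of Figure 7.

```python
def tri(x, left, peak, right):
    """Triangular membership function."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzify(x):
    """ASSUMED membership functions on a [0, 1] input (three fuzzy subsets)."""
    x = max(0.0, min(1.0, x))
    return {"normal": tri(x, -0.5, 0.0, 0.5),
            "high": tri(x, 0.0, 0.5, 1.0),
            "dangerous": tri(x, 0.5, 1.0, 1.5)}

# (PERCLOS label, ELDC label) -> fatigue label, as listed in Table 1.
FATIGUE_RULES = {
    ("normal", "normal"): "normal",
    ("normal", "high"): "semisleepy",
    ("normal", "dangerous"): "sleepy",
    ("high", "normal"): "semisleepy",
    ("high", "high"): "semisleepy",
    ("high", "dangerous"): "sleepy",
    ("dangerous", "normal"): "sleepy",
    ("dangerous", "high"): "sleepy",
    ("dangerous", "dangerous"): "sleepy",
}

# ASSUMED singleton positions of the output membership functions.
FATIGUE_SINGLETONS = {"normal": 0.0, "semisleepy": 0.5, "sleepy": 1.0}

def estimate_fatigue(perclos_level, eldc):
    p, e = fuzzify(perclos_level), fuzzify(eldc)
    strength = {label: 0.0 for label in FATIGUE_SINGLETONS}
    for (p_label, e_label), out in FATIGUE_RULES.items():
        fired = min(p[p_label], e[e_label])          # min for "and"
        strength[out] = max(strength[out], fired)    # max to combine rules
    total = sum(strength.values())
    if total == 0:
        return 0.0
    # Center of gravity of singleton outputs is a weighted average.
    return sum(FATIGUE_SINGLETONS[k] * v for k, v in strength.items()) / total
```

For example, `estimate_fatigue(0.25, 0.0)` fires the "normal" and "semisleepy" rules equally and returns 0.25. The distraction output could be handled the same way with the rules of Table 2.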

Experimental Results
The proposed system was tested on 27 sequences lasting about 76 minutes in total. The sequences were captured with a digital camera from 5 different individuals in both laboratory conditions (indoor) and real conditions (in a vehicle).
There is no tool for measuring fatigue and distraction; therefore, the proposed system cannot be evaluated objectively in a direct way. In this section, the proposed methods for extracting the symptoms are evaluated first, and then an example sequence is investigated to evaluate the system subjectively.

Experiments on Symptom Extraction.
The accuracy of computing PERCLOS and CLOSNO depends directly on the accuracy of the eye closure detection algorithm. Therefore, we evaluate the eye closure detection algorithm in this section. The evaluation of eye closure detection is based on two criteria: false positive rate (FPR) and false negative rate (FNR). A false positive error occurs when the eye is open but the system detects it as closed. A false negative error occurs when the eye is closed but the system detects it as open. Table 3 shows the FPR and FNR of the proposed algorithm for eye closure detection in different states. According to Table 3, the FNR of eye closure detection for the drowsy state without glasses is greater than for the normal state without glasses. In the drowsy state, the eyelid distance is reduced and blinking is slow, so the horizontal projection of consecutive frames changes slowly. Therefore, many eye closure events are not detected, and the FNR in the drowsy state is greater than in the normal state. However, the FPR of eye closure detection in the drowsy state is very low compared with the normal state.
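The two criteria can be computed from per-frame labels as in the following sketch. The text defines the two error types; taking the number of truly open frames as the denominator of FPR and the number of truly closed frames as the denominator of FNR is the usual convention and is assumed here. The function name is hypothetical.

```python
def closure_error_rates(truth_closed, detected_closed):
    """Return (FPR, FNR) for per-frame eye closure detection.

    Both arguments are equal-length sequences of booleans or 0/1 values,
    where a true value means "eye closed" (the positive class).
    """
    false_pos = false_neg = open_frames = closed_frames = 0
    for truth, detected in zip(truth_closed, detected_closed):
        if truth:
            closed_frames += 1
            if not detected:
                false_neg += 1      # closed eye reported as open
        else:
            open_frames += 1
            if detected:
                false_pos += 1      # open eye reported as closed
    fpr = false_pos / open_frames if open_frames else 0.0
    fnr = false_neg / closed_frames if closed_frames else 0.0
    return fpr, fnr
```

The same function applies to head rotation detection once ROT has been thresholded into per-frame decisions.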
According to Table 3, both the FPR and the FNR of eye closure detection for the normal state with glasses are greater than for the normal state without glasses. In the normal state with glasses, the reflection of the glasses may appear in the image as a bright spot near the eye. Therefore, detecting changes in the horizontal projection of the top-half segment of the face is difficult, and eye closure detection has a higher error rate.
To investigate the accuracy of ELDC, we tested our method on a 9-minute-long sequence. Figure 9 shows four sample frames of this sequence, in which the driver becomes drowsy after 7 minutes. Figure 10 shows the measured ELDC for this sequence. According to Figure 10, ELDC indicates the driver's drowsiness correctly.
The accuracy of the proposed method for head rotation detection is investigated by applying a threshold to ROT. If ROT is more than 0.3, head rotation is detected. In this experiment, an FPR of 9.2% and an FNR of 12.1% were achieved for head rotation detection. Figure 11 shows some sample frames of a 2-minute-long sequence in which the driver rotated his head in different directions. In this figure, image (a) shows the driver's face without any rotation, and the other images show the driver's head rotated in different directions. The result of head rotation detection by the proposed method for this video sequence is depicted in Figure 12.

Experiments on Driver State Estimation.
Evaluating driver state estimation is a difficult task because there is no criterion for measuring fatigue and distraction. Therefore, objective evaluation is not possible for driver state estimation.
In this section, the symptoms extracted from a sample sequence are plotted, and the fatigue and distraction levels in the sequence are estimated by the proposed system. In this experiment, a ten-minute-long sequence is used. The first minute of the sequence is used for training. According to the training phase, PERCLOS_N is 0.02 and CLOSNO_N is 13 times per minute. The curves of PERCLOS, CLOSNO, ELDC, and ROT for this sequence are plotted in Figures 13, 14, 15, and 16. The estimated levels of fatigue and distraction are shown in Figures 17 and 18. According to Figure 17, the driver was semidistracted at about the 3rd minute. The estimated level of distraction seems correct, because CLOSNO decreased with respect to CLOSNO_N during this time. In addition, the driver was drowsy after 7 minutes. The drowsiness state was estimated from two symptoms: (1) the increase of PERCLOS from the 7th to the 8th minute and (2) the increase of ELDC after 8 minutes. These symptoms are depicted in Figures 13 and 15.
4.3. The Processing Speed. The proposed method was implemented in MATLAB R2008a and was tested on a personal computer with an Intel Core 2 Duo 2.66 GHz processor and 2 GB of RAM. The processing speed of the proposed method is more than 5 frames per second. Over 85% of the computational cost of the system is related to face tracking.

Comparison with Other Methods.
In this section, we compare our system with previous systems. Unfortunately, we cannot compare the accuracy of different driver state estimation algorithms, because there is no scientific and precise criterion for measuring fatigue and distraction. Therefore, we only compare the accuracy of the different systems for symptom extraction.
For eye closure detection, the proposed algorithm is compared with the algorithms presented in [10,19,21]. The results of the comparison are shown in Table 4. This table shows that the proposed method performs very well in comparison with the other methods, while the experimental setup of our system is more realistic and longer video sequences were used in our experiments.
For head rotation detection, the proposed method is compared with the algorithm presented in [19]. Unfortunately, the accuracy of other methods for head rotation detection was not reported. For example, the accuracy of the methods presented in [4,10] was not reported; these papers report only the ability of the system to measure head rotation in different directions and in a specific interval. Table 5 shows the comparison of the proposed method and the method presented in [19]. The comparison shows that our method achieves a higher precision rate.

Conclusions
In this paper, a new adaptive method for symptom extraction and driver state estimation was proposed for driver hypovigilance detection. Two types of symptoms were considered: symptoms related to the eye region (including PERCLOS, ELDC, and CLOSNO) and a symptom related to the face region (ROT). The proposed method extracts the symptoms related to the eye region using the horizontal projection of the top-half segment of the face without explicit eye detection; the symptom related to the face region is extracted by face template matching. Then, the normal value of the extracted symptoms is calculated during a short training phase. According to the normal value of the extracted features, an adaptive fuzzy expert system estimates the level of fatigue and distraction. The short training phase makes the system robust and adaptive. In other words, the proposed system may be used efficiently for different individuals with different face and eyelid behaviors. Experiments show that the accuracy of the proposed method for extracting the symptoms of driver fatigue and distraction is very good. Additionally, the system can estimate driver fatigue and distraction very well according to subjective evaluation.
The proposed method was tested on video sequences captured in the visible spectrum, but the color information was not used in any part of the system. In other words, the proposed system operates on gray-level images in the visible spectrum. Therefore, the system may operate in the IR spectrum with a few changes. The main disadvantage of our system is the face tracking method, which is inaccurate and computationally very expensive. Adaptive filters such as the Kalman filter may reduce the complexity and increase the processing speed and accuracy of the system.

Table 1: Rules of the proposed fuzzy expert system for estimation of driver fatigue.
If PERCLOS is normal and ELDC is normal, then fatigue is normal.
If PERCLOS is normal and ELDC is high, then fatigue is semisleepy.
If PERCLOS is normal and ELDC is dangerous, then fatigue is sleepy.
If PERCLOS is high and ELDC is normal, then fatigue is semisleepy.
If PERCLOS is high and ELDC is high, then fatigue is semisleepy.
If PERCLOS is high and ELDC is dangerous, then fatigue is sleepy.
If PERCLOS is dangerous and ELDC is normal, then fatigue is sleepy.
If PERCLOS is dangerous and ELDC is high, then fatigue is sleepy.
If PERCLOS is dangerous and ELDC is dangerous, then fatigue is sleepy.

Table 2: Rules of the proposed fuzzy expert system for estimation of driver distraction.
If CLOSNO is normal and ROT is normal, then distraction is normal.
If CLOSNO is low and PERCLOS is normal and ELDC is normal, then distraction is semidistracted.
If CLOSNO is dangerous and PERCLOS is normal and ELDC is normal, then distraction is distracted.
If ROT is high and PERCLOS is normal and ELDC is normal, then distraction is semidistracted.
If ROT is dangerous and PERCLOS is normal and ELDC is normal, then distraction is distracted.