A Method for Recognizing Fatigue Driving Based on Dempster-Shafer Theory and Fuzzy Neural Network

This study proposes a method based on Dempster-Shafer theory (DST) and fuzzy neural network (FNN) to improve the reliability of recognizing fatigue driving. This method measures driving states using multifeature fusion. First, FNN is introduced to obtain the basic probability assignment (BPA) of each piece of evidence, given the lack of a general solution to the definition of the BPA function. Second, a modified algorithm that revises conflicting evidence is proposed to reduce unreasonable fusion results when unreliable information exists. Finally, the recognition result is given according to the combination of revised evidence based on Dempster's rule. Experimental results demonstrate that the proposed recognition method obtains reasonable results by combining the information given by multiple features and can effectively and accurately describe driving states.


Introduction
Fatigue is a common physiological phenomenon that reduces a driver's attention and ability to control the vehicle. Fatigue driving is one of the major causes of road accidents and poses a significant threat to the safety of drivers and passengers. Common methods for detecting fatigue driving include measurements of physiological features, facial features, and features of driving behavior [1].
Physiological signals related to fatigue include the electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG), and heart rate variability (HRV) [2][3][4][5]. Fatigue recognition based on physiological signals is highly accurate, but this approach has limitations. In particular, signal extraction is intrusive for drivers and makes them uncomfortable. Noncontact measurements, such as extracting facial features and features of driving behavior, are practical and do not affect normal driving. Existing studies have shown that fatigued people exhibit certain visual changes in facial features, such as eye closure, movements in gaze direction, yawning, and head movements [6][7][8][9][10][11]. Fatigue may also be reflected in features of driving behavior, such as lane departure and steering wheel movements [12][13][14]. However, using facial features to evaluate driving states is not always reliable because the extraction of facial features may be affected by variations in illumination and the driver's posture; thus, image processing algorithms cannot ensure the accuracy of recognition results. The reliability of methods based on driving behavior depends on road and climate conditions, vehicle types, and driving habits. Accurate evaluation of driving states is therefore difficult to achieve with a single feature in a complex environment. Models based on information fusion have been developed to increase the accuracy of fatigue recognition [15][16][17]. Lee and Chung [15] proposed a dynamic fatigue monitoring system based on a Bayesian network. Deng et al. [17] proposed a fatigue monitoring method based on Dempster-Shafer theory (DST). However, the prior probability needed in a Bayesian network is acquired according to experts' subjective experience, and the definition of the basic probability assignment (BPA) function in DST has no standard solution because of the nonlinear characteristics of fatigue features.
DST [18,19] is an improvement of Bayesian inference and an effective method for handling imprecise and uncertain information; it has been widely used in information fusion. However, Dempster's rule of combination often yields unreasonable fusion results when pieces of evidence are in conflict [20]. Generally speaking, there are two main causes of highly conflicting evidence [21]. One is questionable information reliability caused by environmental disturbance or instrument errors. To solve this problem, a series of improvement methods have been proposed, mostly focused on modifications of the combination rule and revisions of the original evidence. Modifications of the combination rule include Yager's rule [22], Qian et al.'s rule [23], Lefevre et al.'s rule [24], and Dezert-Smarandache theory (DSmT) [25]. Revisions of the original evidence include the weighted average strategy [26,27] and the discount strategy [28][29][30][31]. Jiang et al. [32] used -numbers [33] to evaluate the fuzziness and reliability of the uncertainty in sensor data fusion. The other cause is that the given environment is an open world, which means the frame of discernment is incomplete owing to a lack of knowledge [34]. The information fusion environment of traditional DST is a closed world, in which the frame of discernment contains all the elements. Deng [21] proposed the generalized evidence theory (GET) to deal with uncertain information fusion in the open world. Jiang et al. [35] measured the weight of evidence based on Deng entropy [36] to handle conflict.
This study proposes a method based on DST and fuzzy neural network (FNN) [37] to recognize fatigue driving. Facial features, extracted by cameras mounted in the vehicle, are adopted as the fatigue parameters of the proposed method. Given the self-learning and self-adaption abilities of FNN [38], it is used to obtain the BPA of each piece of evidence. In the driving environment, conflicting information is mainly caused by environmental interferences and measurement errors. To address this issue, a modified algorithm that takes the credibility of each piece of evidence into consideration is introduced into the method. This algorithm revises the original evidence based on the discount strategy to improve the reliability of the recognition result.
This study is organized as follows. Measurements of fatigue features are summarized in Section 2. Section 3 proposes a recognition method characterized by multifeature fusion. Section 4 discusses the experiments and results. Conclusions are given in Section 5.

Frames in the video that represent several expressions are shown in Figure 1. Kimura et al. [39] analyzed the feasibility of using facial expressions to judge the degree of fatigue and the consistency of results between facial feature analysis and ratings of facial expression videos. In this paper, we quantify the driver's states into three levels and give a corresponding score for each level: Awake (0 points), Fatigue (1 point), and Severe fatigue (2 points). The evaluation criteria of the driver's states are shown in Table 1.

Measurements of Fatigue Features
In the vehicle interior environment, extracting features accurately and rapidly is difficult because of limitations in image processing algorithms and environmental interferences. To address this issue, this study considers visual features that can be effectively extracted and measured, such as eye movement and mouth movement.

Facial Features Extraction.
To obtain the eye movement and mouth movement features, we first need to locate the eye and mouth areas. In this paper, the CLM (Constrained Local Model) algorithm [40] is adopted to localize facial landmarks. The facial feature points detected from several frames are shown in Figure 2.
The eye and mouth areas can then be extracted according to the coordinates of the feature points, as shown in Figure 3. By analyzing the relations between the feature points, we can acquire several state evaluation indexes.
To detect fatigue through eye features, we mainly focus on the analysis of eyelid and iris movement. Eyelid movement, reflected by the opening and closing of the eyes, is one of the most relevant and most visually significant fatigue features. We use changes in the distance between the upper and lower eyelids over a period of time to represent eyelid movement. Figure 4 shows the eyelid movement over a period of time.
As shown in Figure 4, T is the width of the time window and L is the average distance between the eyelids. According to PERCLOS (Percentage of Eyelid Closure) [41], we take approximately 0.7L as the threshold to determine whether the eye is open, which is denoted by the broken line.

Iris movement is mainly reflected in changes in the driver's gaze direction. To obtain iris movement features, we use the Canny algorithm [42] and the Hough transform [43] to locate the iris region and obtain the center of the iris, as shown in Figure 5.
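The PERCLOS index over a time window can be sketched as follows; this is a minimal illustration that assumes the per-frame eyelid distances have already been extracted, and uses the 0.7L open/closed threshold mentioned above (the choice of a simple window average for L is an assumption):

```python
def perclos(eyelid_distances):
    """Fraction of frames in the window whose eyelid distance falls below
    0.7 * L, where L is the average eyelid distance over the window
    (threshold per the text; the averaging choice is an assumption)."""
    L = sum(eyelid_distances) / len(eyelid_distances)
    threshold = 0.7 * L
    closed = sum(1 for d in eyelid_distances if d < threshold)
    return closed / len(eyelid_distances)
```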
In Figure 5, A_t and B_t are the left and right corners of the eye at time t, O_t is the center of the iris at time t, and O_{t+1} is the center of the iris at time t+1. The width of the time window is T, and the pupil rest time is t_r (1 < t_r < T). The total number of frames within the time window is N. The features extracted from iris movement are AAI (Average Asymmetry of Iris) and PRPT (Percentage of Pupil Rest Time), where |·| represents the distance between two points and d is the threshold of iris movement.
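The two iris indexes can be sketched as follows. The original formulas did not survive extraction, so the exact definitions below are assumptions consistent with the text: AAI is taken as the mean absolute difference between the iris center's distances to the two eye corners, and PRPT as the fraction of frame-to-frame iris displacements below the movement threshold d:

```python
import math

def dist(p, q):
    """Euclidean distance between two image points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def aai(left_corners, centers, right_corners):
    """Average Asymmetry of Iris (assumed form): mean absolute difference
    between the iris center's distances to the left and right eye corners."""
    n = len(centers)
    return sum(abs(dist(a, o) - dist(o, b))
               for a, o, b in zip(left_corners, centers, right_corners)) / n

def prpt(centers, d):
    """Percentage of Pupil Rest Time (assumed form): fraction of consecutive
    frames whose iris-center displacement stays below the threshold d."""
    moves = [dist(centers[t], centers[t + 1]) for t in range(len(centers) - 1)]
    return sum(1 for m in moves if m < d) / len(moves)
```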
Yawning is also one of the most visually significant fatigue features. The width and height of the mouth can be calculated according to the feature point coordinates in the mouth region, as shown in Figure 6. We use the aspect ratio r = h/w, where h and w are the height and width of the mouth, to measure the mouth opening.
Figure 7 shows the mouth movement reflected by changes in the mouth opening, where T is the width of the time window. When r > 0.7, the mouth is considered open. Yawning can be regarded as a process in which the mouth stays open for more than three seconds. The number of yawns is denoted as N_y, and each mouth-open duration is denoted as t_i (i = 1, 2, ..., N_y).
The fatigue indexes extracted from Figure 7 are YF (Yawning Frequency) and AOT (Average Opening Time), calculated as follows:

YF = N_y / T,  AOT = (1/N_y) Σ_{i=1}^{N_y} t_i.
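A sketch of the yawning indexes, assuming a sequence of per-frame mouth aspect ratios: a yawn is a run of ratios above the 0.7 threshold lasting more than three seconds, AOT averages the detected open durations as the text defines, and YF is taken here as yawns per second of window (the normalization is an assumption):

```python
def yawning_stats(mouth_ratios, fps=30, open_thresh=0.7, min_sec=3.0):
    """Return (YF, AOT) for a window of mouth aspect ratios.
    A yawn = a run of frames with ratio > open_thresh longer than min_sec."""
    durations, run = [], 0
    for r in mouth_ratios + [0.0]:      # sentinel closes a trailing run
        if r > open_thresh:
            run += 1
        else:
            if run / fps > min_sec:
                durations.append(run / fps)
            run = 0
    window_sec = len(mouth_ratios) / fps
    yf = len(durations) / window_sec
    aot = sum(durations) / len(durations) if durations else 0.0
    return yf, aot
```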

Framework of Multifeature Fusion Recognition
The recognition method proposed in this paper combines the advantages of DST and FNN. A single FNN is divided into several subnetworks. Each subnetwork, with nonlinear mapping capability, is trained to process information from a different feature and conduct a preliminary evaluation of driving states. The BPA function is defined by normalizing the network output. The problem of fusing highly conflicting evidence is solved efficiently by revising the conflicting evidence using the discount strategy. The degree of conflict is measured by the correlation coefficient of evidence, from which the credibility of evidence is calculated and used as the discount factor. Evidence fusion is based on DST. Figure 8 shows the framework of multifeature fusion fatigue recognition.
The value of m(A) represents the degree of evidential support for set A. If m(A) > 0, subset A is called a focal element.

Determination of BPA.
In multifeature fusion recognition, each feature is considered a piece of evidence. Pieces of evidence are combined according to Dempster's rule, and the new evidence obtained serves as the basis for recognition. In practical applications, the definition of the BPA function is based on the characteristics of the data.
where A_i stands for a driving state and m(A_i) corresponds to the i-th node of the output.

Structure of FNN.
To obtain driving states from the eyelid movement, iris movement, and mouth movement features, we design three FNNs, each with two inputs and three outputs, to represent signs of fatigue. As shown in Table 1, driving states are classified into three levels: Awake, Fatigue, and Severe fatigue, and the FNN outputs correspond to these three levels. For eyelid movement measurement, the inputs of the FNN are PERCLOS and MCD; for iris movement measurement, AAI and PRPT; and for mouth movement measurement, YF and AOT. The structure of the FNN is shown in Figure 9. When a state is identified, the corresponding output node is set to 1; otherwise, it is set to 0. By normalizing the FNN output data over a period of time, the BPA of each piece of evidence can be obtained by formula (6). The FNN is composed of five layers, namely, the input, fuzzy, fuzzy rule, normalized, and output layers.
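Turning accumulated FNN outputs into a BPA can be sketched as follows; simple sum-normalization over the three states is assumed for formula (6):

```python
def bpa_from_outputs(outputs, states=("Awake", "Fatigue", "Severe fatigue")):
    """Normalize accumulated FNN output activations into a basic probability
    assignment over the three driving states (sum-normalization assumed)."""
    total = sum(outputs)
    return {s: y / total for s, y in zip(states, outputs)}
```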
The first layer is the input layer. Each node in this layer represents an input variable x_i.
The second layer is the fuzzy layer. Each node represents the value of a linguistic variable. The fuzzy layer calculates the subjection function of each input variable belonging to the fuzzy set that corresponds to the value of a certain linguistic variable. For all the features defined in Section 2, the values increase along with the accumulation of fatigue. Therefore, we classify each input into three linguistic variable values, namely, small, medium, and large. The subjection functions of the different linguistic variables are assigned as follows. The subjection function of the linguistic variable medium is described by a Gaussian function

μ_ij(x_i) = exp(−(x_i − c_ij)^2 / σ_ij^2),

where c_ij and σ_ij represent the central value and width of the function belonging to the j-th fuzzy set of the i-th input variable. The subjection functions of the linguistic variables small and large are described by Sigmoid functions, where c_ij contributes to the shift of the subjection function along the horizontal axis and a_ij adjusts the shape of the function. n is the number of input variables and m_i is the number of fuzzy partitions of x_i. The number of nodes in this layer is n_2 = Σ_{i=1}^{n} m_i. The outputs of these functions are normalized to range from 0 to 1. The curves of the subjection functions based on formula (7) are shown in Figure 10.

The third layer is the fuzzy rule layer. Each node represents one fuzzy rule. By calculating the subjection degrees, the fitness of each rule can be defined as

α_j = μ_{1,i_1}(x_1) μ_{2,i_2}(x_2) ⋯ μ_{n,i_n}(x_n),

where i_1 ∈ {1, 2, ..., m_1}, i_2 ∈ {1, 2, ..., m_2}, ..., i_n ∈ {1, 2, ..., m_n}, j = 1, 2, ..., m, and m = ∏_{i=1}^{n} m_i is the number of nodes in this layer, which is equal to the number of fuzzy rules.
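The three subjection (membership) functions can be sketched as follows; the Gaussian form for medium follows the text, while the exact sign convention of the Sigmoid functions for small and large is an assumption:

```python
import math

def mu_medium(x, c, sigma):
    """Gaussian membership for 'medium' (c: central value, sigma: width)."""
    return math.exp(-((x - c) ** 2) / sigma ** 2)

def mu_small(x, c, a):
    """Sigmoid membership for 'small': close to 1 left of c
    (a > 0 adjusts steepness; sign convention assumed)."""
    return 1.0 / (1.0 + math.exp(a * (x - c)))

def mu_large(x, c, a):
    """Sigmoid membership for 'large': close to 1 right of c."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))
```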
According to the facial expression of fatigue, taking the eyelid features as an example, the fuzzy rules can be described as follows: if PERCLOS is small and MCD is small, then the output is Awake; and so on.
The fuzzy rules of eyelid movement, iris movement, and mouth movement are shown in Tables 2, 3 and 4.
The fourth layer is the normalized layer. The normalized calculation is defined as follows:

ᾱ_j = α_j / Σ_{k=1}^{m} α_k,  j = 1, 2, ..., m.

Table 4: Fuzzy rules of mouth movement.
The fifth layer is the output layer, which is also called the defuzzification layer. Each node in this layer represents an output variable y_l. The defuzzification is defined as follows:

y_l = Σ_{j=1}^{m} ω_{lj} ᾱ_j,  l = 1, 2, ..., r,

where ω_{lj} stands for a weight of the FNN, which can be adjusted through the learning algorithm, and r is the number of output variables.
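The forward pass through layers 3 to 5 can be sketched as follows, under the assumption that each rule pairs one linguistic value per input and that defuzzification is the weighted sum described above:

```python
import math
from itertools import product

def fnn_forward(memberships, weights):
    """Layers 3-5 of the FNN: rule fitness as the product of one membership
    degree per input (layer 3), normalization (layer 4), and weighted-sum
    defuzzification (layer 5). memberships[i] lists the membership degrees
    of input i for its linguistic values; weights[l][j] links rule j to
    output l. A sketch only."""
    # Layer 3: one rule per combination of linguistic values
    alphas = [math.prod(combo) for combo in product(*memberships)]
    # Layer 4: normalize the rule fitness values
    total = sum(alphas)
    alphas_bar = [a / total for a in alphas]
    # Layer 5: defuzzification as a weighted sum over rules
    return [sum(w * ab for w, ab in zip(row, alphas_bar)) for row in weights]
```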

Learning Algorithm.
The error of the FNN is defined as follows:

E = (1/2) Σ_{l=1}^{r} (y_l − y_dl)^2,

where y_l represents the actual output and y_dl represents the expected output.
The error back propagation (BP) algorithm is used for network parameter adjustment to ensure that the actual output is close to the expected output. Based on the BP algorithm, the weights ω_{lj} of the FNN can be adjusted as follows:

ω_{lj}(k+1) = ω_{lj}(k) − β ∂E/∂ω_{lj},

and the subjection function parameters c_ij and σ_ij can be adjusted as follows:

c_ij(k+1) = c_ij(k) − β ∂E/∂c_ij,  σ_ij(k+1) = σ_ij(k) − β ∂E/∂σ_ij,

where β stands for the learning rate, β > 0.

Revision of Evidence.
Given the limitations of DST, unreasonable results are often obtained when combining highly conflicting evidence. Errors in the feature parameters extracted by the camera are inevitable while driving because of environmental interferences. Therefore, Dempster's rule cannot be used directly when conflict exists.
Evidence with low reliability should not be negated completely because the cause of the conflict remains unknown. Thus, in the modified algorithm of the recognition method, the original pieces of evidence are revised through rational redistribution of unreliable evidence before information fusion based on DST is performed.
The discount strategy proposed by Shafer [19] is applied in this method. The credibility of evidence, regarded as the discount factor, is used to revise the BPAs of the original evidence. Part of the mass of unreliable evidence is redistributed to the set Θ according to the discount rule. The influence of conflicting evidence on the fusion result can thus be reduced by increasing the uncertainty of the evidence.
The discount rule is defined as follows:

m^α(A) = α m(A),  A ⊂ Θ, A ≠ Θ,
m^α(Θ) = α m(Θ) + 1 − α,

where α ∈ [0, 1] is the discount factor. The key to the discount strategy is measuring the credibility of evidence effectively. The conflicting factor k in DST is used to measure the degree of conflict: the larger the value of k, the higher the degree of conflict. However, when pieces of evidence are highly conflicting, k → 1, which means that the conflicting factor k cannot accurately represent the degree of conflict.
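Shafer's discount rule can be sketched as follows, representing focal elements as frozensets of driving states and the frame Θ as the frozenset of all three states:

```python
THETA = frozenset({"Awake", "Fatigue", "Severe fatigue"})

def discount(bpa, alpha, theta=THETA):
    """Shafer's discount rule: scale every proper focal element by the
    discount factor alpha and assign the remaining mass 1 - alpha to the
    whole frame theta."""
    out = {A: alpha * m for A, m in bpa.items() if A != theta}
    out[theta] = alpha * bpa.get(theta, 0.0) + (1.0 - alpha)
    return out
```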
The correlation coefficient of evidence is proposed in this paper to measure the credibility of evidence.
Let m_1 and m_2 be two pieces of evidence on the frame Θ with focal elements A_i and B_j. The correlation coefficient between m_1 and m_2 is denoted sim(m_1, m_2) ∈ [0, 1]. The larger the value of sim(m_1, m_2), the higher the degree of correlation between the two pieces of evidence.
For n pieces of evidence provided by multiple features, the correlation coefficients can be expressed as the correlation matrix S = [sim(m_i, m_j)]_{n×n}. Pieces of evidence with high correlation coefficients support each other. Therefore, the degree to which one piece of evidence is supported by the others can be defined based on the correlation matrix as follows:

sup(m_i) = Σ_{j=1, j≠i}^{n} sim(m_i, m_j),

where the support degree sup(m_i) represents the reliability of evidence m_i. The piece of evidence with the highest degree of support is used as the standard evidence m_s. The weight of each piece of evidence relative to the standard evidence is calculated and taken as its credibility:

α_i = sup(m_i) / sup(m_s).

The credibility α_i of evidence m_i is taken as the discount factor. The original pieces of evidence can then be revised according to the discount rule in formula (14). Thus, the reasonability of the result obtained by combining the revised pieces of evidence using Dempster's rule increases.
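The support and credibility computation can be sketched as follows. Because the paper's exact correlation-coefficient formula is not reproduced here, cosine similarity between BPA mass vectors is used as a stand-in assumption:

```python
import math

def sim(m1, m2):
    """Assumed correlation coefficient: cosine similarity between two BPA
    mass vectors (stand-in for the paper's sim formula)."""
    keys = set(m1) | set(m2)
    dot = sum(m1.get(k, 0.0) * m2.get(k, 0.0) for k in keys)
    n1 = math.sqrt(sum(v * v for v in m1.values()))
    n2 = math.sqrt(sum(v * v for v in m2.values()))
    return dot / (n1 * n2)

def credibilities(bpas):
    """Support of each piece of evidence = sum of its correlations with the
    others; credibility = support divided by that of the best-supported
    (standard) evidence, to be used as the discount factor."""
    n = len(bpas)
    sup = [sum(sim(bpas[i], bpas[j]) for j in range(n) if j != i)
           for i in range(n)]
    best = max(sup)
    return [s / best for s in sup]
```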

Recognition of Driving States.
The driving state is identified based on the BPA of the combined evidence. Assuming that A_1, A_2 ⊂ Θ, A_1 and A_2 are defined as follows:

m(A_1) = max{m(A_i), A_i ⊂ Θ},
m(A_2) = max{m(A_i), A_i ⊂ Θ, A_i ≠ A_1}.

If A_1 satisfies the following recognition rule:

m(A_1) − m(A_2) > ε_1,  m(A_1) > ε_2,

where ε_1 and ε_2 are two thresholds, then A_1 is regarded as the current driving state. In the method proposed in this study, ε_1 is set to 0.2 and ε_2 is set to 0.6.
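A sketch of a decision rule of this kind, under the assumption that the state with the largest mass is accepted only when it both leads the runner-up by more than ε_1 and carries sufficient mass ε_2 itself (the exact form of the paper's rule did not survive extraction):

```python
def recognize(bpa, eps1=0.2, eps2=0.6):
    """Accept the state with the largest mass if it exceeds the runner-up
    by more than eps1 and its own mass exceeds eps2; otherwise return None
    (no decision). Thresholds follow the text."""
    ranked = sorted(bpa.items(), key=lambda kv: kv[1], reverse=True)
    (a1, m1), (_, m2) = ranked[0], ranked[1]
    if m1 - m2 > eps1 and m1 > eps2:
        return a1
    return None
```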

Experiments
4.1. Data Collection. We carried out the experiment in real driving and captured facial expression video. The experiment lasted two hours. The video resolution is 640 × 480 and the frame rate is 30 fps. The video was divided into several segments of equal length. In this paper, we set the length of each segment, and also the time interval of driving state prediction by the FNNs, to one minute. To obtain a standard for state assessment, we asked three people to rate these video segments based on the features in Table 1; a driving state is confirmed if the scores given by the three raters are the same. If the raters have different opinions, the video is reassessed. We extract 30 frames at equal intervals from every 10 seconds as test samples, and the size of all frames is adjusted to 437 × 437. Fatigue features are extracted using the methods in Section 2, and the state evaluations by the different features are obtained using the FNNs. For each video segment, the BPAs are determined by normalizing the outputs of the FNNs, and the recognition result is obtained by the multifeature fusion framework of Section 3. The process of video and frame extraction is shown in Figure 11. Thus, we can compare driver state assessment performance among the recognition results of multifeature fusion, single features, and the assessment standard based on scores. The comparison of the two different methods is shown in Table 7. The correct rate of multifeature fusion recognition is higher than that of single-feature recognition. The accuracy of fatigue driving recognition is improved by using both eye and mouth features in the multifeature fusion framework.

Conclusions
This study proposed a method for recognizing fatigue driving based on FNN and DST to address the complexity of fatigue information. The BPAs of multiple pieces of evidence based on different visual features are obtained by FNNs, and DST is applied to fuse the evidence. A modified algorithm with a discount strategy is used to revise conflicting evidence and enhance the rationality of the fusion results. This algorithm adopts the correlation coefficient of evidence to measure the degree of conflict, and the credibility of evidence, measured by the correlation coefficient, serves as the discount factor for evidence revision. The experimental results indicate that this recognition method can overcome the interference of unreliable information originating from environmental interferences and measurement errors. Therefore, the proposed method can increase the accuracy and robustness of fatigue driving recognition.

Figure 1: Frames of facial expression video.

Figure 6: Measurement of the width and height of the mouth.

Table 1: The evaluation criteria of the driver's mental states.
3.1. DST. Let Θ = {θ_1, θ_2, ..., θ_N} be a finite set of mutually exclusive and exhaustive propositions, known as the frame of discernment. The power set 2^Θ is the set of all subsets of Θ. A BPA is a function m: 2^Θ → [0, 1] that satisfies the following conditions:

m(∅) = 0,  Σ_{A ⊆ Θ} m(A) = 1.
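Dempster's rule of combination over such BPAs can be sketched as follows, with focal elements represented as frozensets so that set intersection is direct:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination: accumulate mass products over
    intersecting focal elements and normalize by 1 - k, where k is the
    total mass assigned to conflicting (empty-intersection) pairs."""
    combined, k = {}, 0.0
    for A, mA in m1.items():
        for B, mB in m2.items():
            inter = A & B
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mA * mB
            else:
                k += mA * mB
    return {A: v / (1.0 - k) for A, v in combined.items()}
```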

Table 2: Fuzzy rules of eyelid movement.

Table 3: Fuzzy rules of iris movement.

Table 7: Comparison of different recognition results.