A Novel Fatigue Driving State Recognition and Warning Method Based on EEG and EOG Signals

Traffic accidents are easily caused by fatigued driving. If the driver's fatigue state can be identified in time and a corresponding early warning issued, many traffic accidents could be avoided. At present, research on fatigue driving state recognition focuses mostly on recognition accuracy. The fatigue state is typically recognized by combining different features, such as facial expressions, electroencephalogram (EEG) signals, yawning, and the percentage of eyelid closure over the pupil over time (PERCLOS). Combining these features increases the recognition time and compromises real-time performance. In addition, some features introduce errors into the recognition result, such as frequent yawning at the onset of a cold or frequent blinking with dry eyes. To improve the practical feasibility and real-time performance of fatigue driving state recognition while maintaining recognition accuracy, a fast support vector machine (FSVM) algorithm based on EEGs and electrooculograms (EOGs) is proposed. First, the collected EEG and EOG modal data are preprocessed. Second, multiple features are extracted from the preprocessed EEGs and EOGs. Finally, FSVM is used to classify the features and obtain the fatigue state recognition result. Based on the recognition results, this paper designs a fatigue driving early warning system based on Internet of Things (IoT) technology. When the driver shows symptoms of fatigue, the system not only sends a warning signal to the driver but also uses IoT technology to inform other nearby vehicles that use this system and the operation management back end.


Introduction
Fatigue is a very complex physical and psychological state that can be divided into mental fatigue and physical fatigue. In most cases, the two are intertwined and appear at the same time. Mental fatigue is often caused by prolonged cognitive activity. Under mental fatigue, cognitive function is limited and alertness is reduced. Drivers are prone to both mental and physical fatigue during long drives, but mental fatigue is the main problem. Fatigue driving is one of the major hidden dangers to road traffic safety, and research on fatigue driving recognition and early warning technology can reduce the frequency of traffic accidents [1]. Because recognizing the fatigue driving state is a prerequisite for early warning, it is very important. At present, research on fatigue driving identification methods mainly focuses on three aspects:
(1) Identification based on driver behavior characteristics: the driver's fatigue state is judged by recognizing the driver's behavior, such as the movement of the eyelids, the closed state of the eyes [2], and facial expressions [3]. This identification method is simple and easy to implement, but the scoring standard is easily affected by conditions such as personal behavior, light, and image acquisition angle. The collected data are inevitably noisy, which can cause the driver's fatigue state to be identified incorrectly.
(2) Detection based on vehicle parameters: vehicle parameters such as vehicle speed, vehicle position, and steering wheel rotation angle are measured during driving to obtain the driver's operating indicators, from which the degree of fatigue is judged. Since vehicle parameters are closely related to the driver's actual driving quality, this method is closer to the actual driving situation. However, vehicle parameters need to be measured during actual operation, which increases the cost of the vehicle.
(3) Recognition based on the physiological parameters of the driver: the driver's fatigue state is judged by identifying the driver's physiological characteristics, such as with electrocardiograms [4], electroencephalograms [5,6], electrooculograms [7], and electromyography [8,9].
EEG signals directly reflect the driver's brain activity, and EEG acquisition devices are becoming cheaper and more convenient to use. Therefore, identifying driving fatigue states from EEG signals is considered one of the most objective and accurate analysis methods. Reference [5] proposed a new real-time fatigue driving detection method based on EEG signals. The study combines two features, power spectral density (PSD) and sample entropy (SampEn), to judge mental fatigue. The results show that the method is effective for fatigue detection because the predicted fatigue is consistent with the phenomena recorded during simulated driving, which is considered an objective behavioral measure. Reference [10] proposed a recurrent network-based convolutional neural network (RN-CNN) method to detect fatigue driving. The data used in the experiment are EEG signals collected during driving simulation, and the method achieves an average recognition accuracy of 92.95%. Reference [11] proposed detecting the fatigue driving state from sample entropy, approximate entropy, and complexity features, which can distinguish four different mental fatigue states well. Reference [12] uses five different entropies, the relative energy of the alpha wave, and (θ + α)/β as indicators for judging fatigue. The experimental results show that this fusion method can accurately judge the driver's degree of fatigue. Reference [13] uses the fast Fourier transform to extract features of the four rhythms α, θ, β, and δ. By analyzing the trends and mutual relationships of these four features, it finds that (θ + α)/β is the most effective feature for assessing the mental state. Reference [14] used sample entropy, fuzzy entropy, approximate entropy, and spectral entropy as the inputs of a decision tree. Experiments showed that this method identifies fatigue driving with an accuracy of 94%.
Typical EEG signal feature analysis methods are mainly divided into time domain [15], frequency domain [16], and time-frequency domain analysis methods [17]. EEG signals have obvious and highly distinguishable characteristics in the frequency domain, which is of great significance to their analysis. EEG signals in different frequency bands can effectively reflect a person's mental state and level of excitement [18]. Reference [19] uses a convolutional neural network (CNN) to realize emotion recognition based on the time-frequency diagram of EEG signals obtained by a wavelet transform. However, the time-frequency diagram cannot effectively reflect the correlation of EEG signals between different electrodes. The above studies have shown that fatigue driving recognition based on EEGs is among the most objective and accurate fatigue recognition methods, and it is known as the "gold standard" of fatigue detection.
Fatigue driving state recognition based on EOGs mainly separates the horizontal and vertical EOG signals from the forehead electrode signals and extracts a series of features, such as gaze, blinking, and saccade, for driver fatigue state recognition. Reference [20] found that increasing driver fatigue is accompanied by long-duration blinking, which reflects the relationship between slow eye movement and driver fatigue. After the eye movement features are extracted from the EOG signal, machine learning algorithms are used to identify the driver's fatigue state. Reference [21] detected fatigue driving by extracting the fatigue characteristics of blinking, slow eye movement, amplitude, and periodicity in the EOG signal, and the experimental results showed that the detection was effective. In summary, driver fatigue detection based on EOG signal characteristics is also feasible.
At present, most studies mainly focus on the fusion of multiple features and the application of integrated classifiers. The purpose of these studies is to maximize the accuracy of fatigue recognition. However, most studies ignore the real-time performance of fatigue driving recognition and early warning. In a real-life environment, the timely identification and early warning of fatigue driving are more meaningful. With the rapid development of modern industry, collection methods for EEG and EOG signals are becoming more advanced. Collection equipment is becoming smaller and more portable, its collection accuracy is increasing, and its production cost is decreasing. With the development of Internet of Things (IoT) technology in recent years, it is no longer difficult to collect driver EEG signals without interfering with the driver. Based on the above background, this paper proposes a fast identification method for fatigue driving based on EEG and EOG. This method can collect the driver's EEG and EOG signals in a real environment and complete rapid identification and timely warning. The contributions of this study are as follows:
(1) More objective EEG and EOG signal data are used as the identification data of the fatigue driving state. For EEG data, the PSD and differential entropy are extracted as feature data. For EOG, EOG features extracted based on independent component analysis (features_table_ica), EOG features extracted based on subtraction rules (features_table_minus), and EOG features extracted using both subtraction rules and principal component analysis (features_table_icav_minh) are used as feature data. The multifeature data of the two modalities represent more comprehensive sample information.
(2) A fast SVM algorithm based on sample geometric features is proposed. For the case of nonlinear separability, the support vectors in the high-dimensional space should also lie on the edges of the positive and negative classes. Measured by distance, the support vectors are those sample points with larger distances to points of the same class and smaller distances to points of the other class. The key is to find such sample points. FSVM can greatly reduce the number of training samples and the number of support vectors, which reduces the training time of the model while having minimal impact on classification accuracy.
(3) Based on the results of fatigue driving status recognition and IoT technology, this paper designs an early warning system. The system realizes data collection, identification, and early warning. When a driver is detected to be fatigued, the system not only sends a warning signal to the driver but also uses IoT technology to inform other nearby vehicles that use this system and the operation management back end.

Differential Entropy Feature Extraction.
The differential entropy (DE) feature extends Shannon entropy to continuous random variables. In 2013, differential entropy was used for the first time to characterize EEG signals, and it showed better performance than the traditional PSD [22]. The original definition of differential entropy is as follows:

$$h(X) = -\int_{-\infty}^{\infty} f(x)\,\log f(x)\,dx, \tag{1}$$

where $f(x)$ is the probability density function of $X$. When a random variable follows the Gaussian distribution $N(\mu, \sigma^2)$, the differential entropy can be calculated simply by the following formula:

$$h(X) = \frac{1}{2}\log\left(2\pi e\sigma^2\right).$$
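For a Gaussian-distributed segment, differential entropy has the closed form $\frac{1}{2}\log(2\pi e\sigma^2)$, so it can be computed from the sample variance alone. The following is a minimal sketch under stated assumptions: the segment is taken to be already band-pass filtered and approximately Gaussian, the natural logarithm and the population variance are used, and the function name `differential_entropy` is hypothetical rather than taken from the paper.

```python
import math


def differential_entropy(segment):
    """Differential entropy of a segment assumed to be Gaussian.

    Uses the closed form 0.5 * ln(2 * pi * e * variance), where the
    variance is the population variance of the samples (an assumption).
    """
    n = len(segment)
    mean = sum(segment) / n
    variance = sum((x - mean) ** 2 for x in segment) / n
    return 0.5 * math.log(2 * math.pi * math.e * variance)


# Doubling the amplitude multiplies the variance by 4 and therefore
# raises the differential entropy by ln(2).
low = differential_entropy([-1.0, 1.0, -1.0, 1.0])
high = differential_entropy([-2.0, 2.0, -2.0, 2.0])
print(high - low)
```

Because the value depends only on the variance, the feature behaves like a logarithm of the band energy, which is one reason it is often compared directly with PSD features.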

PSD Feature Extraction.
PSD characterizes how signal power varies with frequency. In practical applications, the average value of the signal power in a certain frequency band is generally regarded as the PSD of that band, and the calculation formula is

$$P = \frac{1}{N}\sum_{n=1}^{N} X(x_n)\,X^{*}(x_n),$$

where $N$ is the number of segments, $X(x_n)$ is the discrete Fourier transform value of segment $n$, and $X^{*}(x_n)$ is the complex conjugate of $X(x_n)$.

Table 1 introduces each commonly used classification model. At present, the classification models for fatigue driving state recognition can be divided into machine learning [23,24] and deep learning [25,26] models. Machine learning algorithms have simple mathematical models and relatively low time complexity, but their recognition accuracy is not as good as that of deep learning algorithms. Deep learning models are complex, have many parameters that need to be tuned, and have high time complexity, but their recognition rate is high. In summary, the two types of classification models have their own characteristics and applicable scenarios. Since the scenario considered in this article does not involve large samples, and fatigue driving recognition and early warning have high real-time requirements, a machine learning algorithm can fully meet the requirements. Therefore, the classic SVM is used as the basic algorithm to classify the datasets.
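Returning to the band-averaged PSD feature defined at the start of this subsection, one common reading is to average $X X^{*}$ over the Fourier bins that fall inside a frequency band. The sketch below follows that reading and is not necessarily the authors' exact computation. The names `dft` and `band_power`, the normalization by the segment length, and the band edges used in the example (8 to 13 Hz and 14 to 30 Hz) are assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform, adequate for short segments."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def band_power(signal, fs, f_lo, f_hi):
    """Mean of X * conj(X) / n over the DFT bins with f_lo <= f <= f_hi."""
    n = len(signal)
    spectrum = dft(signal)
    powers = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            powers.append((spectrum[k] * spectrum[k].conjugate()).real / n)
    return sum(powers) / len(powers) if powers else 0.0


# One second of a 10 Hz sine sampled at 64 Hz: the power concentrates
# in the 8-13 Hz band and is negligible in the 14-30 Hz band.
fs = 64
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
print(band_power(signal, fs, 8, 13), band_power(signal, fs, 14, 30))
```

For long recordings a fast Fourier transform would replace the quadratic `dft` helper; the plain version is used here only to keep the example free of external dependencies.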

Labeling of Fatigue Signal Labels.
The key to fatigue identification based on EOG is calculating the PERCLOS value, which indicates the degree of closure of the eyelids per unit time. The calculation formula is as follows:

$$P = \frac{t_{\text{closed}}}{t_{\text{total}}},$$

where PERCLOS is denoted $P$, $t_{\text{closed}}$ is the time during which the eyes are closed within an interval, and $t_{\text{total}}$ is the duration of that interval. A threshold on $P$ determines whether the driver is fatigued. When $P < 0.35$, the driver is considered awake. When $P > 0.35$, the driver is considered tired. According to the value of $P$, two different states are obtained, so the task can be described as binary classification. The awake state is recorded as 0, and the fatigue state is recorded as 1. Table 2 shows the specific labeling method.
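The labeling rule above can be sketched in a few lines. This is a minimal illustration, assuming that eye closure is available as one boolean flag per sample in a window; the function names are hypothetical, and the handling of the boundary case P = 0.35, which the text leaves open, is an assumption (treated here as fatigue).

```python
def perclos(eye_closed_flags):
    """Fraction of samples in the window during which the eyes are closed."""
    closed = sum(1 for flag in eye_closed_flags if flag)
    return closed / len(eye_closed_flags)


def fatigue_label(p, threshold=0.35):
    """Return 0 for awake (P below the threshold) and 1 for fatigue.

    Assumption: P exactly equal to the threshold is labeled as fatigue.
    """
    return 1 if p >= threshold else 0


# A window in which the eyes are closed for 2 of 10 samples is labeled
# awake; 5 of 10 is labeled fatigue.
awake_window = [True] * 2 + [False] * 8
tired_window = [True] * 5 + [False] * 5
print(fatigue_label(perclos(awake_window)), fatigue_label(perclos(tired_window)))
```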

ZigBee Wireless Technology.
The ZigBee standard is a wireless ad hoc network standard suitable for wireless sensor networks, proposed by the ZigBee Alliance in 2004. ZigBee chips usually integrate a baseband, a microcontroller, and memory, and ZigBee can work in the 868 MHz, 915 MHz, and 2.4 GHz frequency bands. The data transmission rate of a ZigBee network ranges from 20 to 250 kbps. Each ZigBee network contains a coordinator, and routers allow the communication range of the network to be expanded. Because ZigBee nodes can wake up from sleep in 30 ms, ZigBee's response delay is far lower than that of other types of wireless technologies, which makes ZigBee very suitable for bursty transmission of small amounts of data.
The system designed in this paper not only alerts drivers of fatigue but also transmits alerts to other vehicles. This requires a suitable wireless network connection between the vehicles. Since the driver is not always in a state of fatigue, the exchange of data between vehicles is intermittent rather than continuous. Moreover, the amount of data carried by each fatigue alert is very small. As a typical protocol for wireless sensor networks, ZigBee is well suited to such bursty, small-volume transmission. Therefore, this article selects ZigBee as the wireless network standard between vehicles.
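To make the "very small" alert concrete, the sketch below packs one fatigue alert into a fixed binary layout. The paper does not specify a message format, so every field here (vehicle identifier, timestamp, fatigue flag, quantized PERCLOS) and the names `ALERT_FORMAT`, `encode_alert`, and `decode_alert` are hypothetical. The point is only that such an alert occupies about ten bytes, well within a single IEEE 802.15.4 frame, whose maximum size is 127 bytes.

```python
import struct

# Hypothetical layout, little-endian: uint32 vehicle id, uint32 UNIX time,
# uint8 fatigue flag, uint8 PERCLOS quantized to 0..255.
ALERT_FORMAT = "<IIBB"


def encode_alert(vehicle_id, timestamp, fatigued, perclos):
    """Pack one alert into a compact byte string."""
    level = max(0, min(255, round(perclos * 255)))
    return struct.pack(ALERT_FORMAT, vehicle_id, timestamp, 1 if fatigued else 0, level)


def decode_alert(payload):
    """Inverse of encode_alert; PERCLOS is recovered up to quantization."""
    vehicle_id, timestamp, flag, level = struct.unpack(ALERT_FORMAT, payload)
    return vehicle_id, timestamp, bool(flag), level / 255


payload = encode_alert(vehicle_id=42, timestamp=1_700_000_000, fatigued=True, perclos=0.4)
print(len(payload), decode_alert(payload))
```

Handing the payload to an actual ZigBee radio depends on the module and its driver, which is outside the scope of this sketch.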

Fatigue Driving Status Recognition and Early Warning System
In this study, a rapid fatigue state recognition and early warning system was designed. The architecture of the system is shown in Figure 1. The hardware of the system mainly includes a Bluetooth headset and a vehicle-mounted terminal. The Bluetooth headset is mainly responsible for collecting EEG and EOG signals. Through the Bluetooth communication protocol, the data are transmitted to the vehicle terminal for storage, processing, and analysis. The fatigue state recognition module in the vehicle terminal is responsible for classifying the received EEG and EOG signals to determine whether the driver is tired. When the result of the identification is fatigue, a warning message is issued to the driver, operation manager, and surrounding vehicles. A set of equipment can be installed on each vehicle, and the system includes a sending end and a receiving end.

Recognition Model.
Table 1: Introduction to commonly used classification models.

| Model | Description | Advantages | Disadvantages |
|---|---|---|---|
| ANN | There are three types of processing units in the network: input units, output units, and hidden units. The input unit receives signals and data from the outside world. The output unit outputs the system's processing results. A hidden unit lies between the input and output units and cannot be observed from outside the system. ANN is a nonprogrammed, adaptive, brain-style information processing mode whose essence is to obtain a parallel and distributed information processing function through network transformation and dynamic behavior. | ① It is simple to apply; ② it has more accurate classification results; ③ it can quickly search for an optimum. | ① It easily falls into a local optimum. |
| SVM | The algorithm finds a separating hyperplane that places the two classes of data on opposite sides to achieve data classification and prediction. This hyperplane is determined by the support vectors. | ① The "curse of dimensionality" can be avoided; ② a known effective algorithm can be used to find the global minimum of the objective function; ③ the generalization ability of the algorithm is good. | ① It is difficult to apply to large-scale training samples; ② it has difficulty solving the multicategory problem; ③ it is sensitive to parameter and kernel function selection. |
| Random Forest (RF) | The forest is composed of many trees, so the result of RF depends on the decisions of multiple trees. This is an ensemble learning idea. For example, when a new animal appears in the forest, the forest holds a meeting to determine what kind of animal it is. Every tree must express its opinion, and the result with the most votes is the final result. | ① It can handle very high-dimensional data (many features) without feature selection; ② training is fast and easy to parallelize; ③ the implementation is relatively simple. | ① It is prone to overfitting; ② for data whose attributes have different numbers of values, the attribute weights produced by RF are unreliable. |
| AdaBoost | The algorithm trains several individual learners and combines them with a certain combination strategy so that a strong learner is finally formed, following the idea of strength in numbers. | ① Under the AdaBoost framework, various classification models can be used to build weak learners, which is very flexible; ② it has high precision and can be applied to most classifiers without parameter tuning. | ① Unbalanced data leads to a decrease in classification accuracy; ② training is time-consuming. |
| CNN | A method with the following layered form. Input layer: data entry. Convolutional layer: feature extraction. Pooling layer: extracts features again. Hidden layer: the layer in the middle. Fully connected layer: classifies the features after vectorizing the extracted feature matrix. | It has a high classification accuracy. | ① Parameters need to be tuned; ② it needs a large amount of data; ③ it requires a large amount of computation. |

Journal of Healthcare Engineering

This study uses a fast SVM model to classify feature data. For nonlinearly separable data, the support vectors in the high-dimensional space should also lie on the edges of the positive and negative classes. Measured by distance, the support vectors are those sample points with a larger distance from the same class and a smaller distance from the other class. The key is to find such sample points. First, the distance between any two points is described. The following mapping is used:

$$\phi: \mathbb{R}^m \to H, \quad z \mapsto \phi(z).$$

The distance between any two points $z_i, z_j \in \mathbb{R}^m$ is defined through this mapping:

$$d(z_i, z_j) = \|\phi(z_i) - \phi(z_j)\| = \sqrt{K(z_i, z_i) - 2K(z_i, z_j) + K(z_j, z_j)},$$

where $K(z_i, z_j) = \langle \phi(z_i), \phi(z_j) \rangle$ is the kernel function. Assume sample point $z_k \in G^{+}$; for any $k \in I^{+}$, it corresponds to point $\phi(z_k)$ in the high-dimensional space.
A pair of distance values is defined for each such point:

$$d_k^{+} = \frac{1}{l^{+}} \sum_{i \in I^{+}} d(z_k, z_i), \qquad d_k^{-} = \frac{1}{l^{-}} \sum_{j \in I^{-}} d(z_k, z_j),$$

that is, the average distances from $\phi(z_k)$ to the positive points and to the negative points, where $l^{+}$ and $l^{-}$ are the numbers of positive and negative samples. The parameter $r$ represents the proportion of possible support vectors in the training sample set. There are critical values $c_1$ and $c_2$ such that $P\{d_k^{+} > c_1\} = r$ and $P\{d_k^{-} < c_2\} = r$. That is, we find those points $\phi(z_k)$ with a larger average distance from the positive points and a smaller average distance from the negative points. The support vectors are the points $z_k$ that satisfy the condition

$$T^{+} = \{\phi(z_k) \mid d_k^{+} > c_1,\ d_k^{-} < c_2,\ k \in I^{+}\}.$$

As shown in Figure 2, the points in set $T_1^{+} = \{\phi(z_k) \mid d_k^{+} > c_1,\ k \in I^{+}\}$ correspond to the points marked ① in the figure. The points in $T_2^{+} = \{\phi(z_k) \mid d_k^{-} < c_2,\ k \in I^{+}\}$ satisfy the second condition, so that $T^{+} = T_1^{+} \cap T_2^{+}$. In the same way, for $z_k \in G^{-}$ and any $k \in I^{-}$, set $T^{-}$ can be found. Then, the set formed by the support vectors is

$$T_{BD} = T^{+} \cup T^{-}.$$

The FSVM algorithm extracts candidate support vectors according to the distances between samples, uses them as the training samples for the SVM, and then trains the SVM. The algorithm has five steps:

Step 1. Set the scale parameter $r$ ($0 < r < 1$).

Step 2. In the high-dimensional space, calculate the distance matrix

$$D = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix},$$

where $D_{11}$ ($D_{22}$) represents the block matrix formed by the distances between any two points within the positive (negative) set, and $D_{12}$ ($D_{21}$) represents the block matrix formed by the distances between each point of the positive set and each point of the negative set.

Step 3. Calculate the average matrix

$$V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix},$$

where each $V_{ij}$ contains the row-wise averages of the corresponding block $D_{ij}$.

Step 4. Extract the support vector set according to the given ratio $r$. Sort the components of $V_{11}$ and $V_{22}$ in descending order, and sort the components of $V_{12}$ and $V_{21}$ in ascending order. Extract the top $l^{+} \cdot r$ and $l^{-} \cdot r$ samples after sorting to form a new training set $T_{BD}$ ($|T_{BD}| < l \cdot r$).

Step 5. Train the SVM algorithm on $T_{BD}$ to obtain the final model.
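Steps 1 to 4 above can be sketched as a sample-reduction routine that runs before any SVM is trained. The code below is an illustration under stated assumptions rather than the authors' implementation: it assumes an RBF kernel (for which K(z, z) = 1), interprets the kernel parameter as the coefficient `gamma` in exp(-gamma * ||a - b||^2), keeps a point only if it is in both the top fraction r by same-class distance and the top fraction r by other-class distance, and uses hypothetical function names throughout.

```python
import math


def rbf_kernel(a, b, gamma):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def feature_distance(a, b, gamma):
    """||phi(a) - phi(b)|| via the kernel trick; K(z, z) = 1 for RBF."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * rbf_kernel(a, b, gamma)))


def mean_distances(points, others, gamma):
    """Row means of one distance block: average distance to `others`."""
    return [
        sum(feature_distance(p, q, gamma) for q in others) / len(others)
        for p in points
    ]


def select_candidates(same, other, r, gamma):
    """Indices of `same` that are far from their own class (descending
    sort) and close to the other class (ascending sort)."""
    k = max(1, int(len(same) * r))
    d_same = mean_distances(same, same, gamma)
    d_other = mean_distances(same, other, gamma)
    order = range(len(same))
    far_from_own = set(sorted(order, key=lambda i: -d_same[i])[:k])
    near_other = set(sorted(order, key=lambda i: d_other[i])[:k])
    return sorted(far_from_own & near_other)


def fsvm_reduce(pos, neg, r=0.3, gamma=0.001):
    """Return the reduced training set (samples, labels) for Step 5."""
    keep_pos = select_candidates(pos, neg, r, gamma)
    keep_neg = select_candidates(neg, pos, r, gamma)
    samples = [pos[i] for i in keep_pos] + [neg[i] for i in keep_neg]
    labels = [1] * len(keep_pos) + [0] * len(keep_neg)
    return samples, labels


# Two 1-D clusters: only the points facing the other class survive.
pos = [[0.0], [1.0], [2.0], [3.0], [4.0]]
neg = [[6.0], [7.0], [8.0], [9.0], [10.0]]
print(fsvm_reduce(pos, neg, r=0.4, gamma=0.1))
```

Step 5 then trains any standard SVM on the reduced set, for example scikit-learn's `sklearn.svm.SVC` with an RBF kernel. Because the reduced set is the intersection of two rankings, it can be much smaller than l · r and can even be empty for a small r, so a practical implementation would need a fallback for that case.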

Introduction to Simulation Data.
The SEED-VIG [27] dataset, which consists mainly of EEG and EOG signals, was used for the experimental simulation. To approximate a real driving scene, the data collectors used the Neuroscan system to record the driver's signals in a simulated driving environment. The sampling frequency is 1 kHz, and 21 channels of data are collected in total. The electrode positions for EOG acquisition are shown in Figure 3(a), and there are 4 electrode channels in total. The electrodes for the EEG signal comprise 6 channels in the temporal brain area and 11 channels in the posterior brain area. The specific electrode positions are shown in Figure 3(b). The electrode placement follows the international 10-20 system. The raw EEG signal contains noise and other artifacts, so the signals from different brain regions require preprocessing, feature extraction, and smoothing. The brain areas mainly include the temporal lobe brain area (T area), the occipital brain area (P area), and the prefrontal area (F area); the signal leads, bandwidth, and feature dimensions of each brain area are shown in Table 3. First, the forehead EEG is separated from the forehead electrode signals, and the EEG signals of each brain area are divided into five frequency bands. The features of EEG and EOG data are then extracted, as shown in Table 4.

Experimental Setup.
During the experiment, the main comparison algorithms are BP [28], RF [29], SVM [30], and CNN [31]. Both SVM and FSVM use a radial basis kernel function, and the kernel function parameter is set to 0.001. The parameters of the other comparison algorithms are the same as those in the corresponding references. The evaluation index of the model is the recognition accuracy, calculated as follows:

$$\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}} \times 100\%,$$

where $N_{\text{correct}}$ is the number of correctly classified test samples and $N_{\text{total}}$ is the total number of test samples. The experiments were run on a computer with 32 GB of memory, an i7-11700F CPU, the Windows 10 operating system, and MATLAB 2020a.

Fatigue Recognition Accuracy Rate Experiment.
The dataset is randomly divided into a training set and a test set at a ratio of 7 : 3. First, the EEG features and EOG features are classified separately, and then the two kinds of features are combined for classification. The reported results are the mean values over 10 runs of each algorithm. The experimental results are shown in Tables 5-7. The data in Table 5 show that, for most classification algorithms, the recognition rate based on DE features is slightly better than that based on PSD features, which indicates that DE features are more effective than PSD features. Regardless of whether PSD or DE features are used, the recognition rate of the CNN deep learning algorithm is significantly ahead of that of the machine learning algorithms. This shows that, in terms of recognition accuracy, deep learning algorithms perform significantly better than machine learning algorithms. Among the machine learning algorithms, the SVM algorithm has the best recognition rate, which is why SVM is chosen as the basic algorithm. The recognition accuracy of FSVM is comparable to that of the classic SVM. The data in Table 6 show that, for the different classification algorithms, the recognition rate based on the features_table_icav_minh feature is the best. This shows that the EOG features extracted by both subtraction rules and principal component analysis carry more classification information. Among the different classification algorithms, CNN has the highest recognition rate, which again shows the strong recognition performance of deep learning algorithms. The recognition rates of SVM and FSVM are similar, with that of FSVM slightly higher.
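The evaluation protocol described above (random 7 : 3 split, 10 runs, mean accuracy) can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function names, the `fit` callback interface, the fixed random seed, and the threshold classifier in the example are all hypothetical placeholders for whichever model is being evaluated.

```python
import random


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def repeated_holdout(samples, labels, fit, runs=10, train_frac=0.7, seed=0):
    """Mean test accuracy over `runs` random train/test splits.

    `fit(train_x, train_y)` must return a function that maps one sample
    to a predicted label.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    cut = round(len(indices) * train_frac)
    scores = []
    for _ in range(runs):
        rng.shuffle(indices)
        train, test = indices[:cut], indices[cut:]
        predict = fit([samples[i] for i in train], [labels[i] for i in train])
        predicted = [predict(samples[i]) for i in test]
        scores.append(accuracy(predicted, [labels[i] for i in test]))
    return sum(scores) / len(scores)


# Placeholder model: a fixed threshold that ignores the training data.
samples = [[x] for x in range(20)]
labels = [0 if x < 10 else 1 for x in range(20)]
fit = lambda train_x, train_y: (lambda s: 1 if s[0] >= 10 else 0)
print(repeated_holdout(samples, labels, fit))
```

Averaging over several random splits reduces the dependence of the reported accuracy on any single partition of the data.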
To explore the influence of multimodal data on the recognition accuracy, EEG and EOG signals were fused for experimental analysis. EEG uses differential entropy feature data, using the average of the three brain regions P, T, and F as the final experimental data. EOG uses the features_table_icav_minh feature with the best recognition effect as the input feature data. The recognition results of the fusion features of each algorithm are shown in Table 7.
The experimental data in Table 7 show that, except for the BP algorithm, the recognition accuracy of every algorithm based on fusion features is better than its recognition accuracy based on a single feature. This shows that the fusion feature can effectively improve the fatigue recognition accuracy. The changes in the recognition accuracy of each algorithm based on different features are shown in Figure 4. It can be clearly seen from the figure that the recognition rates of CNN and FSVM based on fusion features have the largest increase, followed by SVM, and RF has the smallest increase. The recognition accuracy of the BP algorithm declines to a certain extent. In summary, the use of fusion features has advantages in the recognition rate.

Fatigue Recognition Model Training Time Consumption Experiment.
The recognition model is trained on the fusion features, and the training time is averaged over 10 training runs. The training time of each model is shown in Table 8. The data in Table 8 show that the machine learning algorithms take less time to train than the deep learning algorithm. For scenarios that require a quick response, machine learning algorithms are more suitable. Among the machine learning algorithms, the FSVM model proposed in this article requires much less training time. Compared with the classic SVM model, the training time is reduced by 33.95%. Compared with the CNN, the FSVM model needs only a quarter of the training time. In summary, the model proposed in this paper not only ensures a good recognition rate but also reduces the training time of the model. Therefore, it can meet the requirements of real-time fatigue identification and has good practical value.
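The timing comparison above rests on two simple computations: averaging wall-clock training time over repeated runs and expressing one time as a relative reduction of another. The sketch below shows both; the function names are hypothetical, and the dummy workload is only a stand-in for a real training call.

```python
import time


def mean_training_time(train, runs=10):
    """Average wall-clock seconds of calling `train()` `runs` times."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        train()
        total += time.perf_counter() - start
    return total / runs


def relative_reduction(baseline_seconds, new_seconds):
    """Fractional reduction of `new_seconds` relative to the baseline."""
    return (baseline_seconds - new_seconds) / baseline_seconds


# Stand-in workload; a real experiment would call the model's training routine.
print(mean_training_time(lambda: sum(range(100_000)), runs=3))
```

As an illustration with made-up numbers that are not taken from Table 8, a baseline of 100 s and a new training time of 66.05 s correspond to a relative reduction of 0.3395, that is, 33.95%.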

Conclusion
The rapid identification and early warning of the fatigue driving state are the key to reducing traffic accidents. Quick and accurate fatigue identification is a prerequisite for effective early warning. This study identifies the fatigue driving state from two modalities, EEG and EOG, and extracts multiple features from both for experimental analysis. The experimental data show that the fatigue state recognition accuracy is higher with multimodal data fusion. In the selection of classification models, deep learning algorithms have a leading advantage, and their recognition accuracy is higher than that of machine learning algorithms. However, considering the real-time requirements of fatigue state recognition tasks, this study proposes an FSVM algorithm that trains the model quickly.
The FSVM algorithm greatly improves the training speed of the model without reducing the recognition accuracy and achieves the expected effect. In addition, based on the fast and accurate recognition results, this article designed an early warning system based on IoT technology to extend the early warning information from a single vehicle to the Internet of Vehicles. When the driver is in a fatigue state, the system not only sends a warning signal to the driver but also uses IoT technology to notify other nearby vehicles that use this system and the operation management back end. Regarding the identification of fatigue status, the next step of this research will be to improve the identification accuracy, and more modal data can be introduced for comprehensive decision-making. In the early warning system, when the vehicle speed is too high or the distance between vehicles is too large, the signal between vehicles is likely to be weak, and the successful warning of other vehicles cannot be guaranteed. LoRa offers long communication distance, low power consumption, and low cost, which may help solve these problems. This topic is left for future research.

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.