Besides the reduction of energy consumption, which implies alternative actuation and lightweight construction, the main research domain in automobile development in the near future is dominated by driver assistance and natural driver-car communication. The ability of a car to understand natural speech and provide a human-like driver assistance system can be expected to be a decisive factor for market success, on par with automatic driving systems. Emotional factors and affective states are thereby crucial for enhanced safety and comfort. This paper gives an extensive literature overview of work related to the influence of emotions on driving safety and comfort, automatic recognition, control of emotions, and improvement of in-car interfaces by affect-sensitive technology. Various use-case scenarios are outlined as possible applications for emotion-oriented technology in the vehicle. The possible acceptance of such future technology by drivers is assessed in a Wizard-of-Oz user study, and the feasibility of automatically recognising various driver states is demonstrated by an example system for monitoring driver attentiveness. An accuracy of 91.3% is reported for classifying in real time whether the driver is attentive or distracted.
More than 100 years of automobile history are marked by milestones such as the combustion engine and mechanical components, followed by the integration of electrical and electronic devices and the increasing use of control engineering and software. Apart from the reduction of fuel consumption, and thus alternative actuation and lightweight construction, the main research interest in automobile development in the near future is dominated by driver assistance and natural, intuitive driver-car communication. This statement is supported by various EU-funded research projects such as PREVENT (
Emotional factors are decisive for enhanced safety and comfort while driving a car, as we will show in Section
Constantly increasing provision of speech technology as well as gaze detection and eye/head movement monitoring mark the beginning of more natural ways of human-machine interactions which are based on intuitive communication modalities. Recognition of emotion from vocal and facial expression, physiological measurement, and contextual knowledge will be the next key-factor driving improved naturalness in many fields of Human-Computer Interaction [
This paper will give an introduction to in-car affective computing with an extensive literature overview on studies and existing work in Section
This section gives an extensive literature overview on the topic of affective computing and the role of emotions and other driver states in the car. We investigate the influence of affective states on the driving performance in Section
The role that emotions and other mental states (such as fatigue) play while driving a car becomes evident when considering essential driver abilities and attributes that are affected by emotion: perception and organisation of memory [
Aggressiveness and anger are emotional states that strongly influence driving behaviour and increase the risk of causing an accident [
On the other hand, a too low level of activation (e.g., resulting from emotional states like sadness or fatigue) also leads to reduced attention as well as prolonged reaction time and therefore lowers driving performance. As stated by Yerkes and Dodson [
Another example of a dangerous driver state is sleepiness, which negatively affects all abilities that are important for driving. The fact that people, even when they recognise that they are tired, often force themselves to go on driving rather than take a rest makes sleepiness a severe problem in today's car traffic [
As automobile driving itself can often be a source of stress, it seems obvious that stress is an affective state which is very likely to occur in a car. Driving closely behind other vehicles, changing traffic lanes during rush hour, receiving a phone call, getting to one's destination on time, and paying attention to traffic rules are only some of the tasks which partly have to be fulfilled simultaneously by the driver and therefore cause mental overload. A frequently experienced event is rush hour traffic congestion, which is interpreted as stressful by almost every automobile driver and causes many people to use public transport and to forgo the private car in urban areas. Similar to anger and aggressiveness, stress usually implies a high level of arousal which in turn leads to a lack of focus and attention and therefore lowers driving performance [
Confusion or irritation is a further state which can lead to a loss of self-control and control over the vehicle, increasing the probability of committing a traffic violation or even being involved in an accident [
Nervousness is an affective state that implies a level of arousal which is above the degree of activation that is best suited for the driving task. Reasonable decision-making as well as strategic planning and concentration are affected when being nervous. Reasons for nervousness are variable and can be related directly to the driving task (e.g., for novice drivers) or to other personal or physical circumstances. In [
Also negative emotions with a rather low level of arousal, like sadness or frustration can have perturbing effects on driving ability [
Apart from safety aspects, when thinking of the car as a “virtual companion”, the automatic recognition of sadness as an emotional state may one day enable the system to cheer up the driver and thus deliver enhanced driving pleasure in addition to increased safety.
So far, we have pointed out the enormous effect that affective states and emotions have on driving abilities and listed the most dangerous affective states which prevent safe driving. However, the need for automatic in-car emotion recognition and driver state detection only becomes evident when examining adaptation or even “countersteering” strategies that can easily be implemented provided that the driver's emotion is determined accurately. The aim of affect recognition is to provide a kind of “state variable” which serves as input for subsequent processing in emotion-sensitive accessories, aiming to improve not only driving comfort but also safety [
To reduce the stress level of the driver, dialogue strategies can be adapted to the current workload [
A possible approach towards making, for example, an angry driver aware of the dangerous driving style, resulting from her or his increased level of arousal, would be to encourage better driving via voice response [
In [
To countersteer detected confusion, an emotion-sensitive system could provide help or more detailed explanations concerning the functionality which the driver is about to use. Complicated information or entertainment systems benefit from automatic guidance through menus, which could be triggered by the detection of irritation. As far as confusion or nervousness due to traffic situations is concerned, it was shown that particularly elderly people benefit from the recognition of irritation and subsequent driving support [
Apart from trying to influence the driver's emotion in a positive way, adapting user interfaces to the user's affective state can also reduce the risk of accidents and potentially lead to higher driving pleasure. Experiments indicate that matching the in-car voice with the driver's state not only encourages users to communicate with the system, but also improves driving performance [
As an important step towards enhanced and reliable speech recognition, adaptation of speech recognition engines to the driver's current emotion is a technique that has prevailed in increasing the robustness of speech recognition systems [
In this context the design of emotion dependent speech dialogues for natural interaction with in-car systems will be an upcoming challenge. Besides speech technology improvements, new concepts regarding the interaction design with other input and output modalities are also relevant. Flat menu hierarchy, “one click” solution, user interfaces with seamless multimodality and usage of handwriting recognition (for almost blind text input on a touch display without having to look at buttons) are some examples.
As in many pattern recognition disciplines, the best emotion recognition results are reported for multimodal recognisers [
For the recognition of driver states like anger, irritation, or nervousness however, the audio channel was proven to be valuable [
Recognisers exploiting visual information have been applied for the detection of emotions like anger, sadness, happiness, disgust, fear, irritation, and surprise [
More extensive approaches (at least from the sensory point of view) to measure emotion also include physiology exploiting data from electromyograms, electrocardiograms, respiration, and electrodermal activity [
The use of driving style as a modality for emotion recognition is quite self-evident and less costly although not investigated very intensely so far. The fact that driving style and affective state are highly correlated was outlined in Section
Motion of the driver in her or his seat is another method to measure nervousness or activity [
Besides improvements in driving safety by monitoring the driver's emotional state, the upper class car of tomorrow will also be “socially competent”, that is, more human-like with respect to verbal and nonverbal communication and interaction skills and, possibly somewhat limited, understanding of nonverbal meaning and contextual information. The car can be expected to be able to interact with driver and passengers in a way quite natural to us as humans. It could be able to serve as a virtual companion or secretary and assist the driver in difficult situations. The car will likely be more like a real human codriver than like the touchscreen based interfaces found in today's cars [
A major problem arising along with the growing number and complexity of in-car entertainment and communication systems is the increased distraction of the driver caused by these systems. When changing your route while driving, for example, the display of your route guidance system will capture your visual and cognitive attention for some time. The same is true for changing the radio station or your music selection in the on-board entertainment system. If these distractions remain few, the driving safety is not affected notably. However, if more tasks are added, especially reading e-mails, retrieving background information about points of interest or communication with other people over the phone, driving safety will certainly suffer, if these systems do not change their way of interfacing with the user [
In this section we list reasons and show various sources that indicate a demand for in-car Human-Machine Interfaces to become more human-like in the near future. As mentioned in the previous paragraph, there is primarily the issue of improving driving safety. Section
This is an important factor in competitive markets. Users' demands for more technology in the car are obvious. The literature indicates that people experience more driving pleasure in newer, safer, and more comfortable cars, as proven, for example, in [
In the following section the demand and feasibility of enhancing users' driving pleasure and enabling the user to be more productive, while still focussing on driving, is discussed in more detail. Section
Users of in-car Human-Machine Interaction nowadays most often get frustrated, if a system does not understand their intentions and interface design is complex and nonintuitive. Numerous publications exist that try to improve the interfaces by optimising the amount of time users spend on data input and by re-structuring menus in order to access items more quickly, for example, [
More human-like systems should use multiple modalities (especially adding the visual channel) and thus be able to detect communication problems as quickly as possible, as pointed out by [
A “socially competent” car in the role of a virtual companion can engage the driver into conversation and thus will give the driver the feeling of not being alone [
Another way to improve the driving experience is personalisation [
The car may also offer a good personalised music selection, for example. Music is known to improve mood by directly affecting physical brain processes [
Modern upper class vehicles fine-tune the engine sound perceived by the driver using a considerable number of hidden speakers throughout the passenger cabin. In some situations the driver might be in a bad mood and bothered by the sound of her or his engine. An emotionally sensitive car could sense the driver's mood and adjust the engine sound, especially as perceived inside the car, based on good guesses or learnt preferences.
Since time is precious, many drivers want to be able to use the time while driving to communicate with other people, access information like news and weather forecasts, or check reservations and bookings, for example. Today's in-car information systems in combination with mobile phones practically allow drivers to do all these tasks; however, most of these tasks cannot be done safely while driving. Interfaces are still designed in a traditional way like most other Human-Machine Interfaces, using a screen to display information combined with haptic input via buttons, knobs, and touch devices. Some systems use speech input in certain areas such as dialling of phone numbers. Yet, the driver has to spend some cognitive and visual effort on communicating with the system. He or she must learn to interact with the system; it is not the system that learns how to interact with the user. The latter, however, should be the case in a user-friendly, human-like system [
Most users, especially elderly people or people with less practice in interacting with computers, will experience problems properly using an in-car driver interface and thus will require more time to access and use various features [
This is exactly where it becomes obvious that a “socially competent” virtual companion would indeed be very helpful for single drivers, especially those unfamiliar with computer interfaces. As many cars are occupied only by one driver, as pointed out in Section
Besides being a helpful aid for single drivers, a “socially competent” virtual companion can also be of great help for drivers with passengers requiring special assistance. Children in the back seats can significantly distract the driver if they require too much of her or his attention. Such a virtual companion could detect such a situation and take some load off the driver by engaging the children in conversation or begin telling them stories or showing them cartoons, for example, via a rear seat entertainment system.
At this point it becomes most obvious that for a “socially competent” car it is also necessary to estimate the interest level of the conversation partner. If the entertainment system detects that the children are not interested in, for example, the film currently shown, it is probably time to change to something different in order to keep the children's attention. Also, the driver should not be bored by uninteresting information.
In Section
In order to make human-machine communication in future upper class cars more natural and intuitive, the incorporation of innovative applications of pattern recognition and machine learning into in-car dialogue interfaces becomes more and more important. As discussed in the previous sections, emotion recognition is an essential precondition for creating a socially competent car that can talk to the driver and provide a “virtual companion”. In this section we discuss specific use-cases for emotion-related technology in the car for both fields, namely the safety-related tasks of driver state monitoring and control of driver emotions, and the tasks related to enhancement of driving pleasure and productivity, such as multimodal and affect-sensitive interfaces. We start our use-case overview by giving a brief summary of the state of the art in in-car driver assistance and entertainment systems.
While affect-aware technology is missing in today's automobiles due to the lack of user-adaptable, autonomous, and reliable technology, speech recognition has started to mature in the automobile market. The most obvious example is navigation systems where the destination selection can be performed via speech input. This speech recognition is based on templates which are stored during a training phase when the user adds a new destination and pronounces its name several times. More advanced systems are based on subword modelling (phonemes) and include a universal acoustic model. They are thus able to recognise speech input without the need to record several templates. Some minor voice adaptation might need to be performed in the same way as in modern dictation systems. These systems allow for a voice-based command-like interface, where the user can change routes by command (“fast route”, “short route”), change the view, or have traffic information read out aloud. Entertainment systems can be controlled in a similar fashion by commands such as “change station”, “next song”, or even by pronouncing a song title or artist. Yet, these systems are restricted to a set of predefined commands and do not allow for flexible interaction. The user has to know the capabilities of the system, that is, he has to know “what” he can say. Future systems, as proposed in the use-cases in the following sections, must be able to accept all input, filter out information they understand, associate it with available car functions, and “tell” the user what his options are.
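The behaviour required of such future systems (accept any input, filter out what is understood, associate it with car functions, and otherwise tell the user the available options) can be illustrated by a minimal sketch. It assumes that the speech recogniser already delivers a text transcript; the phrase table reuses the example commands mentioned above, while the function identifiers (e.g., `navigation.select_fast_route`) and the substring matching are hypothetical simplifications and not part of any existing in-car system.

```python
# Hypothetical table linking known phrases to (hypothetical) car functions.
CAR_FUNCTIONS = {
    "fast route": "navigation.select_fast_route",
    "short route": "navigation.select_short_route",
    "change station": "radio.next_station",
    "next song": "media.next_track",
}


def interpret(utterance):
    """Filter a free-form utterance for phrases the system understands.

    Returns a pair (matched_functions, options). If nothing is understood,
    matched_functions is empty and options lists what the user can say.
    """
    # Normalise case and whitespace so matching is not overly brittle.
    text = " ".join(utterance.lower().split())
    matched = [func for phrase, func in CAR_FUNCTIONS.items() if phrase in text]
    options = [] if matched else sorted(CAR_FUNCTIONS)
    return matched, options


if __name__ == "__main__":
    print(interpret("Could you take the FAST route please"))
    print(interpret("make me a coffee"))
```

A real dialogue system would replace the substring test by robust language understanding; the sketch only shows the control flow of filtering unrestricted input and falling back to presenting the options.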
For the safety-related tasks we present three different categories of use-cases, which are countersteering strategies, adaptation strategies, and communicating the driver's emotional state (e.g., anger/rage, fatigue, and high workload, stress, or uncertainty) to other vehicles.
This category contains use-cases which aim to “countersteer” negative affective states in order to guide the driver into a happy or neutral state which is known to be best suited for safe driving [
Adapting the personality of an automated in-car assistant to the mood of the driver can also be important. A badly synthesised voice or an overly friendly voice that never changes is likely to annoy the driver, which will soon lead to distraction. Therefore, as an important adaptation strategy, matching the in-car voice with the driver's emotion is beneficial, as has been found in, for example, [
The third category consists of use-cases that describe how a driver's state can be communicated to others. Locating potentially dangerous drivers can help the driver assistance systems in other vehicles to warn their drivers in a more timely manner. Methods of car-to-car communication for preventing road rage are being developed by some automobile manufacturers, for example. Further applications include monitoring passengers (especially children) and other road users while driving to reduce the driver's cognitive workload, logging the driver's emotion to derive statistics for research purposes, and automatically triggering emergency calls in case of accidents, severe pain, or dangerous situations.
Similar to the safety-related applications of in-car emotion recognition, the use-cases related to driving pleasure can also be grouped into three different categories: enabling of a mood adequate human-machine dialogue, adaptation of surroundings, and increasing productivity.
Personalised and “socially competent” small-talk belongs to the first category and is a key feature of a “virtual companion”. Thereby emotion serves as contextual knowledge that indicates how the dialogue system has to interpret the output of the automatic speech recogniser (e.g., the use of irony may depend on the user's emotional state, also there seems to be a reduced vocabulary in highly emotional speech, such as short angry commands or comments). Such dialogues do not only depend on the current words uttered by the user, but depend also on contextual information like time of day or weather. Similar use-cases are adaptive topic suggestion and switching, dialogue grounding, and reactions to nonlinguistic vocalisations like moaning or sneezing. Further, multimedia content analysis methods enable the car to deliver information from the internet which suits the current interest and affective state of the driver (e.g., love poems if the driver is in love, or only allowing happy news if the driver is in a happy state). Observing the driver's workload also enables the car to adapt the level of entertainment to the current traffic situation. Incoming and outgoing calls can be managed by a “phone guide” who takes into account the affective state of both the driver and the conversational partner. The latter can be determined from speech while the system converses with the caller (i.e., asking for the caller's identification and purpose/importance of his call) before putting him through to the driver.
Depending on the driver's mood, the in-car ambience can be adjusted. This can be done by automatic selection of mood adequate music, for example. Moreover, engine sound, ambient light, and air conditioning can be adapted according to the driver's affect.
Finally, potential use-cases for a virtual codriver can be derived from the goal of increasing the driver's productivity. Calendar functions, handling of e-mails, internet access, and automatic translation are relevant aspects that are likely to be welcomed by car buyers. However, the role affective computing takes in such technological advances is not yet fully researched. On the other hand, increasing productivity also means a higher workload for the driver, and thus reduced focus on the road, leading to reduced safety. The aspect of increasing productivity should thus only be addressed if it can be ensured that these tasks do not in any major way keep the driver from his primary task of controlling the vehicle. This would be the case if the virtual codriver had a fully natural speech interface and the capability to robustly understand the driver's intentions from minimal input.
It is important to assess acceptance and success of any new technology as soon as possible to determine whether efforts in developing the technology are well spent. Since it is a well known issue that too much technology might irritate or confuse users or make them feel observed, we address these issues in a user study designed for in-car affective computing. The basic idea is to set up a car with a simulated virtual codriver in a Wizard-of-Oz experiment. Users are asked to perform several tasks in the simulation while being assisted by the virtual codriver. The users' experience with the system is determined via multiple questionnaires which are filled out after the experiment. The next section describes the setup and procedure of the Wizard-of-Oz (WoZ) experiment. Section
In order to create a realistic driving scenario in a safe and controllable environment, a driving simulator was used. It consists of half the body of a real BMW 5 series vehicle in front of a large screen (see Figure
Input of navigation destination. Switching between three alternative options of navigation routes. Number dialling on the car phone. Viewing and editing calendar entries.
Driving simulator and simulation software. (a) Setup of the driving simulator. (b) Lane-Change-Task (the signs indicate that a lane change is to be performed).
In order to simulate as much use-case customised support by the system as possible, the supervisor was able to fully remotely control the driver information system. The virtual in-car assistant's voice is simulated by the Wizard-of-Oz operator. To this end, the operator's voice is recorded by a microphone in the control room and, after applying on-line effects, simultaneously played back via the car's centre speaker. Instructions are given to the test persons on what task to perform next (it was made clear to the subjects that these instructions did not belong to the virtual driver assistance system). The instructions were prerecorded to ensure the same test conditions for all subjects. The following instructions were used for the tasks described in the following paragraphs. Drive straight with moderate velocity. Drive straight with high velocity. Enter Scan the calendar for today's appointments. Call your office to inform them of your late arrival.
The experiment was a first-contact situation for the test subjects, and they did not receive instructions on the capabilities of the driver-information system. Subjects were asked to imagine that it was an ordinary Monday morning and they were starting their usual drive to work.
After taking a seat in the car, the driver is greeted by the car with a short dialogue. Thereby the user is asked whether he or she is driving the usual way to work, or he or she requires navigational assistance, and whether he or she would like to listen to music.
The driver is now asked to start driving a simulated test track with low speed to get used to the primary driving task in the simulator environment. This test track includes the Lane Change Task (see above). Next, the participant is asked to drive the track at a higher velocity, which induces a higher load due to the primary driving task. During this situation the following use-cases are simulated.
The operator instructs the subject to enter a navigation destination in parallel. Now, the system (in our WoZ case simulated by the operator) will detect decreased attentiveness in the primary task, ask the driver to pay more attention to his primary driving task via speech output, deactivate the display elements, and offer the user the option of speech-based input of the destination.
Next, road congestion is simulated. The user is now instructed to inform his office of his delay via his cell phone. The dialling does not work, however, due to a simulated bad network. The reason is not immediately apparent to the user, who only realises that his call is not being connected. The system detects the induced confusion and offers to connect the call once the network is available again. The wizard was instructed to act once he recognised confusion because the user was hesitating or expressing his confusion verbally.
The subject is now instructed to scan his calendar for appointments. An appointment in a distant city is scheduled for the next day. A comment that a hotel room must be reserved is attached to the appointment. If the subject does not ask the system for available hotels by himself, after a defined timeout the system will ask the user if hotel information is to be obtained from the internet now. The system will guide the user through a hotel reservation process.
After finishing the hotel reservation, an incoming phone call is simulated. However, there is no way apparent to the user to answer the call. Again, the system will detect that the user is not answering the call and will ask for the reason while at the same time offering help, that is, to either ignore the call or to accept it.
Now the system initiates a dialogue, where it comments on the driver's busy day, and the bad weather, and asks the driver whether a different radio station would be preferred. Finally, an updated traffic report is received with the information that the congestion has not yet cleared. This report is automatically interpreted by the system, and the user is given the option to select three alternative routes from the system display, which will bring him directly to the location of his appointment, instead of his office.
All the use-cases described so far are fixed, and thus common to all subjects. In addition to these planned scenarios, the operator was trained to react individually to the subjects' responses and comments, adapt his output voice to the user's state (thereby changing his tone of voice to match the user's tone of voice in the current situation), and especially react to nonlinguistic behaviour such as laughing, sighing, or hesitation, where it seems appropriate.
After finishing the experiment, every test subject was asked to fill out a questionnaire, which consists of four parts: The System-Usability-Scale (SUS) [
Thirteen subjects (twelve male and one female) took part in the experiment. The average age is
The analysis of the System Usability Scale (SUS) was performed with the method proposed in [
Table
Results for the System Usability Scale (SUS) and Subjectively Experienced Effort (SEA) scale for four selected tasks. SUS: maximum score is 100 (best usability), SEA: maximum score 120 (highest workload: worst), except for results marked with
Scenario | SUS [0–100] | SEA [0–120] |
---|---|---|
Intelligent virtual agent support in stress situations | | |
Assisting confused drivers | | |
Smalltalk with virtual agent | | |
Adaptation of agent speech to the driver's emotional state | | |
The SEA scale describes the subjectively experienced workload for each particular scenario. A high score (maximum value 120) indicates a high perceived workload. Thus, lower values indicate better performance with respect to reducing the driver's workload and keeping his or her focus on the road. For the first two scenarios, “stress”, and “confusion”, however, a modified scale was used, where a high value (again maximum of 120) indicates the subjectively perceived
In conclusion, every scenario is evaluated positively on average. Both the SUS and the SEA scale show good results regarding the use of the system in spite of the prototypical system setup. The subjectively perceived workload decreased noticeably if the car gave support to the test person (“stress” and “confusion” scenarios on the SEA scale). This is a good basis for further development of such driver state-aware functionalities.
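For readers unfamiliar with the SUS, the following minimal sketch shows how a score in the range 0 to 100 is commonly obtained from one participant's ten item ratings. It assumes the widely used standard scoring rule (odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is multiplied by 2.5); whether exactly this rule was applied in the present study cannot be stated here. The ratings in the example are invented for illustration and are not data from the experiment.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one participant.

    responses: ten ratings on a 1-5 scale, in questionnaire order.
    Assumes the standard scoring rule; see the accompanying text.
    """
    if len(responses) != 10:
        raise ValueError("SUS expects exactly ten item ratings")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("ratings must lie between 1 and 5")

    total = 0
    for index, rating in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9: higher agreement is better.
            total += rating - 1
        else:
            # Items 2, 4, 6, 8, 10: lower agreement is better.
            total += 5 - rating
    return total * 2.5


if __name__ == "__main__":
    # Invented ratings of a single hypothetical participant.
    hypothetical = [4, 2, 5, 1, 4, 2, 4, 1, 5, 2]
    print(sus_score(hypothetical))
```

Per-scenario results such as those in the table would then be obtained by averaging the individual scores over all subjects.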
With AttrakDiff a product is evaluated with respect to the following four dimensions. Pragmatic Quality (PQ): usability of the product. Hedonic Quality-Stimulation (HQ-S): support of needs in terms of novel, interesting, and stimulating functions. Hedonic Quality-Identity (HQ-I): identification with the product. Attractiveness (ATT): global value of the product based on the quality perception.
The so-called portfolio representation (the result in the 2-D space spanned by PQ and HQ) determines in which character-zone the product can be classified. For this study, AttrakDiff was evaluated for the entire system with all its new ideas and concepts. Individual features were not evaluated separately. The resulting portfolio representation is shown in Figure
Results: AttrakDiff portfolio representation.
The system is rated as “rather desired”. However, the classification is neither clearly “pragmatic” nor “hedonic”, because the confidence interval overlaps into other character-zones. So there is room for improvement in terms of usability (PQ and HQ). The small confidence rectangle indicates a high agreement among the test subjects.
A questionnaire composed of eleven questions using a five point scale was used (“Strongly Agree” (value = 1) to “Strongly Disagree” (value = 5)) for the custom evaluation of the system as a whole. For each question, the mean value of the ratings and the standard deviation (
Results of system specific questionnaire composed of eleven questions. Mean (
Question | Mean | Standard deviation |
---|---|---|
A talking car is reasonable | 1.4 | 0.4 |
I feel observed by a talking car | 3.8 | 1.1 |
I feel disturbed by a talking car | 3.5 | 0.3 |
I would rely on suggestions given by the car | 2.7 | 0.8 |
I feel the car helps to handle difficult driving situations | 1.7 | 0.7 |
A car should react to my emotion | 3.2 | 1.1 |
Starting self-initiated dialogs is desired | 2.8 | 1.3 |
Automatic evaluation of my stress level and emotional state is desired | 2.8 | 1.6 |
It is helpful if the car can support me in confusing situations | 1.3 | 0.2 |
I like the ability to request information from the internet via natural speech input | 1.5 | 0.8 |
It should be possible to mute the car's voice | 1.3 | 0.4 |
Nearly every test subject thinks that a talking car is reasonable (mean 1.4,
The results are unclear for the questions whether a car should react to the driver's emotion (mean 3.2,
The last four questions show all positive results. The test subjects agreed on the fact that it would help, if the car was able to support the driver in confusing situations (mean 1.3,
Overall, the test persons stated that a talking car—as simulated via the Wizard-of-Oz—makes sense, and they do not feel observed or disturbed. This is an indicator for a good acceptance of such a product.
However, the driver wants to remain master of the situation and make her or his own decisions: not all test persons would rely on the car's suggestions, and high standard deviations were observed for the driver state monitoring questions. Virtually all subjects wish to have a function that allows the user to mute the car's voice (mean 1.3,
The unclear results regarding the recognition of emotions and stress-states may also relate to this as well as the fact that these functions could not be implemented consistently enough in this experiment (Wizard-of-Oz). Such a consistent evaluation would require ways of reliably and reproducibly inducing emotional and stress-states, which is a very difficult task that can never be performed perfectly. Thus a very large number of subjects is required for these evaluations.
Driver inattention is one of the major factors in traffic accidents. The US National Highway Traffic Safety Administration estimates that in 25% of all crashes some form of inattention is involved [
In this section we show how reliably driver distraction can be detected using adequate machine learning techniques. One motivation for detecting whether a driver is distracted is adaptive driver assistance, for example, in lane keeping assistance systems. These systems track the lane markings in front of the vehicle and compute the time until the vehicle will cross the marking. If the driver does not signal an intended lane change with the indicator, the system applies directed steering torques to the steering wheel to guide the car back to the middle of the lane.
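The intervention logic described above can be sketched as follows. This is only an illustration under simplifying assumptions that are not taken from the study: the time until the marking is crossed is approximated as lateral distance divided by a constant lateral speed, and the function names and the 1.0 s threshold are hypothetical.

```python
import math


def time_to_line_crossing(lateral_distance_m: float, lateral_speed_mps: float) -> float:
    """Seconds until the lane marking is reached, assuming constant lateral speed.

    lateral_distance_m: distance from the vehicle edge to the marking (>= 0).
    lateral_speed_mps:  speed towards that marking; <= 0 means drifting away.
    """
    if lateral_speed_mps <= 0.0:
        return math.inf  # the marking is never reached under this simple model
    return lateral_distance_m / lateral_speed_mps


def should_apply_steering_torque(
    lateral_distance_m: float,
    lateral_speed_mps: float,
    indicator_on: bool,
    tlc_threshold_s: float = 1.0,  # hypothetical threshold, not from the paper
) -> bool:
    """Intervene only if no lane change is signalled and the crossing is imminent."""
    if indicator_on:
        return False  # the driver has announced an intended lane change
    tlc = time_to_line_crossing(lateral_distance_m, lateral_speed_mps)
    return tlc < tlc_threshold_s
```

A distraction detector could, for instance, be used to adapt `tlc_threshold_s` to the driver's state, so that an attentive driver is corrected less eagerly than a distracted one.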
One problem with lane keeping assistance systems is that they can be annoying in some circumstances [
Our system for online driver distraction detection is based on modeling long-range contextual information in driving and head tracking data. It applies Long Short-Term Memory (LSTM) recurrent neural networks [
In order to train and evaluate our system we used data that was recorded during an experiment in which drivers had to fulfil certain “distracting” tasks while driving. The resulting database consists of 32 participants (13 female and 19 male). The car (an Audi A6) was equipped with the “Audi Multimedia System” and an interface to measure CAN-Bus data. Additionally, a head tracking system was installed, which was able to measure head position and head rotation. Head-tracking systems are not common in vehicles today, but the promising research on such cameras for driver state detection will lead to a higher installation rate in series-production cars in the near future. We therefore decided to use head-tracking information in our approach as well.
Eight typical tasks (performed haptically) on the Multimedia Interface were chosen as distraction conditions: adjusting the radio sound settings, skipping to a specific song, searching for a name in the phone book, searching for a nearby gas station, dialling a specific phone number, entering a city in the navigation device, switching the TV mode, adjusting the volume of navigation announcements.
The procedure for the experiment was as follows: after a training session to become familiar with the car, each participant drove down the same road eight times while performing secondary tasks on the in-vehicle information system. On another two runs the drivers had to drive down the road with full attention on the roadway. In order to account for sequential effects, the order in which the conditions were presented was randomised for each participant. Overall, 53 runs while driving attentively and 314 runs while the drivers were distracted could be measured. The “attentive” runs lasted 3134.6 seconds altogether, while 9145.8 seconds of “distracted” driving were logged (see Table
Experimental conditions for driving data collection.
Experimental conditions | |
---|---|
Num. participants | 32 (13 f, 19 m) |
Age of participants | 29 to 59 |
Driving experience | |
Car | Audi A6 quattro 2.6 TDI |
Road | Ayinger Str. |
Num. “attentive” runs | 53 |
Num. “distracted” runs | 314 |
An analysis of the influence on lane keeping of the different in-vehicle information system tasks [
Six signals were chosen for a first analysis: steering wheel angle, throttle position, speed, heading angle, lateral deviation, head rotation.
Steering wheel angle, throttle position, and speed are direct indicators of driver behavior. Many studies show that visually distracted drivers steer their car differently than attentive drivers do. The same applies to throttle use and speed (an overview can be found in [
The database collected as described above was split into a training, a validation, and a test set. For training we randomly chose 21 drivers. The validation set consists of three randomly chosen drivers, while the system was evaluated on the remaining eight drivers. Thus, our evaluations are completely driver independent, that is, the results indicate the performance of the system for a driver who is not known to the system (the system was not optimised for a specific driver's style). The training set consists of 35 baseline runs (i.e., runs during which the driver was attentive) and 146 runs during which the driver was distracted. The test set contains 13 baseline and 51 “distracted” runs.
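A driver-independent partition of this kind can be sketched as below. The sketch splits by driver rather than by run, so that no driver contributes data to more than one set. The driver IDs, the seed, and the helper names are hypothetical; the actual random assignment used in the study is not known.

```python
import random


def split_drivers(driver_ids, n_train=21, n_val=3, seed=0):
    """Randomly partition drivers into disjoint train/validation/test groups.

    All remaining drivers (8 of 32 with the default sizes) form the test group.
    """
    ids = sorted(set(driver_ids))
    random.Random(seed).shuffle(ids)  # seeded for reproducibility (assumption)
    train = set(ids[:n_train])
    val = set(ids[n_train:n_train + n_val])
    test = set(ids[n_train + n_val:])
    return train, val, test


def runs_of(runs, drivers):
    """Select the runs, given as (driver_id, label) pairs, of the given drivers."""
    return [run for run in runs if run[0] in drivers]


# Hypothetical driver IDs 1..32 standing in for the 32 participants.
train_drivers, val_drivers, test_drivers = split_drivers(range(1, 33))
```

Because the split is made at the driver level, every run of a given driver ends up in exactly one set, which is what makes results on the test set indicative of performance for unseen drivers.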
We evaluated the performance for different numbers of memory blocks (70 to 150) in the hidden layer of the LSTM neural network. The number of memory blocks is correlated to the complexity of the network, that is, the number of parameters which are used to describe the relation between inputs and outputs (see, e.g., [
Table
# Memory blocks | Accuracy (%) | Recall (%) | Precision (%) | F1-measure (%) |
---|---|---|---|---|
70 | 87.2 | 78.8 | 85.6 | 82.1 |
90 | 89.0 | 82.6 | 87.0 | 84.8 |
100 | 91.1 | 88.7 | 88.0 | 88.4 |
110 | ||||
130 | 89.0 | 83.3 | 86.4 | 84.9 |
150 | 86.2 | 77.2 | 84.2 | 80.6 |
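The last column of the table above is consistent with the F1-measure, that is, the harmonic mean of recall and precision. The following short check recomputes it from the tabulated recall and precision values. Small deviations in the last digit are to be expected, since the tabulated inputs are themselves rounded. The row for 110 memory blocks is omitted because its values are not available here.

```python
def f1_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (same unit as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)


# memory blocks -> (recall %, precision %, reported last column %)
table_rows = {
    70: (78.8, 85.6, 82.1),
    90: (82.6, 87.0, 84.8),
    100: (88.7, 88.0, 88.4),
    130: (83.3, 86.4, 84.9),
    150: (77.2, 84.2, 80.6),
}

# Absolute difference between recomputed and reported values, per row.
deviations = {
    blocks: abs(f1_measure(recall, precision) - reported)
    for blocks, (recall, precision, reported) in table_rows.items()
}

for blocks, deviation in deviations.items():
    print(f"{blocks} memory blocks: deviation {deviation:.2f} percentage points")
```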
Classification of driver distraction (attentive vs. distracted) for
# Memory blocks | Correctly classified runs (%) | Accuracy, baseline runs (%) | Accuracy, runs with task (%) |
---|---|---|---|
70 | 88.4 | 62.6 | 98.0 |
90 | 90.9 | 72.3 | 97.8 |
100 | 92.9 | 84.5 | 96.0 |
110 | |||
130 | 90.7 | 74.1 | 96.9 |
150 | 87.8 | 61.8 | 97.6 |
By analysing the obtainable classification performance when using only single signals, we can get an impression of the relevance of the individual data streams. The best “single stream” performance can be obtained when using exclusively head rotation, followed by exclusive usage of steering wheel angle, heading angle, throttle position, speed, and lateral deviation, respectively.
Trying to get an impression of the accuracy of distraction detection when driver distraction is not caused by the Multimedia Interface, we tested the system on data that was recorded while the driver had to fulfil tasks like eating a chocolate bar or reading a letter. We found that the obtained F1-measure is only slightly worse for this scenario (83.2%).
Tables
Summarising all aspects discussed in the previous sections, it becomes clear that emotions will be a key issue not only in upcoming human-computer interaction in general, but also in in-car communication.
As we have discussed, emotions affect many cognitive processes that are highly relevant to driving, such as categorisation, goal generation, evaluation and decision-making, focus and attention, motivation and performance, intention, communication, and learning. There is a need for controlling the driver's emotional state: a substantial body of literature documents the importance of positive valence (“happy drivers are the better drivers”). Such control of the emotional state can thus contribute to a safer and more pleasant driving experience. At the same time, too high arousal may lead to aggressive driving behaviour. For optimal driving performance, a compromise between too high and too low arousal must therefore be found.
Apart from externally induced states of intoxication (alcohol, drugs, medication) or pain, we identified anger, aggressiveness, fatigue, stress, confusion, nervousness, sadness, and boredom as the main negative emotions and mental driver states of interest, and happiness as a positive factor.
As basic strategies to control emotion, we identified countersteering emotions and adapting car functionalities to the driver's emotion. The in-car driver interface can thereby influence users' emotional states in several ways. To give only a few examples, angry drivers could be calmed down and made aware of their state, fatigued drivers could be kept from falling asleep by engaging them in a discussion that switches topics when boredom is detected, and confused drivers could be offered assistance regarding the current traffic situation.
The growing complexity of in-car electronics demands new interfaces that neither disturb the driver's focus on the road nor annoy the driver by being difficult to use. Natural, human-like interfaces that quickly and tolerantly comprehend drivers' intentions are the key. In Section
As an example of the feasibility of driver state recognition, we presented an automated system for detection of driver distraction, which can be implemented in a car with the technology available today. Using Long Short-Term Memory recurrent neural nets, it is possible to continuously predict the driver's state based on driving and head tracking data. The approach detects inattention with an accuracy of up to 91.3%, independently of the driver, and can be seen as a basis for adaptive lane-keeping assistance.
This paper shows the need for, the acceptance of, and the feasibility of intelligent and affective in-car interfaces. Yet, substantially more work is required to develop products which can be manufactured in series and which are robust enough for the end-user market. In this respect, more usability studies with a broader range of users in even more realistic driving situations (e.g., “out in the wild”) are required. Further, implementations of actual prototype systems, instead of the presented Wizard-of-Oz approach, must be built and evaluated by drivers under realistic conditions. Before implementing such prototypes, however, more evaluations of, for example, the vocal and visual modalities are required with respect to robustness in the in-car environment and user acceptance.
Naturally, people talk, and they talk differently from today's command-and-control-oriented in-car interaction and from the rudimentary natural-language-based interaction expected in the near future, and engineers will have to listen [