Classification of Physiology Indicators for the Automatic Detection of Potentially Hazardous Physiological States

In the EU-funded project HUMABIO, physiological signals are used as biometrics for security purposes. Data are collected via electrode sensors that are attached to the body of the subject and are obtrusive to some degree. To maximize the information obtained and the benefit of using obtrusive physiological sensors, the collected data are also processed to detect abnormal physiological states that may endanger the subjects and those around them during critical operations. Three abnormal states are studied: drug consumption, alcohol consumption, and sleep deprivation. For the classification of the physiological state, four state-of-the-art techniques were compared: support vector machines, fuzzy expert systems, neural networks, and Gaussian mixture models. The results reveal that there is significant potential for the automatic detection of potentially hazardous physiological states without the need for a human supervisor, and that such a system could be included at installations such as nuclear factories to enhance safety by reducing the possibility of human operator-related accidents.


Introduction
Sleep deprivation and drug and alcohol consumption prior to and during work are three of the major causes of human operator-related accidents. Employment status was highly correlated with rates of illicit drug use in 2002. In the USA, an estimated 17.4% of unemployed adults were illicit drug users, compared with 8.2% of those employed full time and 10.5% of those employed part time. Of the 16.6 million illicit drug users aged 18 or older in 2002, 12.4 million (74.6%) were employed either full or part time. Among the 51.1 million adult binge drinkers in 2002, 40.8 million (80%) were employed either full or part time. Similarly, 12 million (79%) of the 15.2 million adult heavy drinkers were employed [1]. Employees who abuse alcohol and other drugs are prone to abnormal behaviour at work, and studies have shown that substance-abusing employees function at about 67% of their capacity [1]. Furthermore, research indicates that between 10 and 20 percent of the nation's workers who die on the job test positive for alcohol or other drugs. Up to 40% of industrial fatalities and 47% of industrial injuries can be linked to alcohol use and alcoholism [2]. In fact, the industries with the highest rates of drug use are the same as those at high risk for occupational injuries, such as construction, mining, manufacturing, and wholesale [3].
Employees who use drugs are 3.6 times more likely to be involved in a workplace accident and five times more likely to file a workers' compensation claim [4], while an estimated 500 million workdays are lost annually due to alcoholism [4,5].
Urinalysis is the most common test type and is used by federally mandated drug testing programs, yet it is likely the least effective. The main disadvantages of urine-based drug test kits are the ease with which they can be "cheated" via simple adulteration or substitution, the inability to detect current/on-the-job drug abuse, and, with respect to SAMHSA-5 (or NIDA-5), the inability to test for drugs used in current society [6].
Saliva (oral) drug test kits are noninvasive, and the results are available in minutes; however, they still tend to take longer and cost more than urine screening tests. This method is the best for determining recent use and is a potential indicator of impairment. A disadvantage of saliva-based drug testing is that it is not approved by SAMHSA for use with DOT/Federal Mandated Drug Testing. In addition, while oral fluid is not considered a biohazard unless there is visible blood, it should be treated with care [6].
Spray (sweat) drug test kits are noninvasive. The main disadvantage of spray- or sweat-based drug testing is that the samples are open to contamination. Also, large interpersonal variations in sweat production rates render the results inconclusive in some cases [6].
A major shortcoming of current drug testing in relation to safety and accident prevention is that none of these techniques measures current intoxication; instead, they reveal information about drug use that may have no impact on safety, productivity, or performance. Someone may test positive after taking a drug days, weeks, or months before [7].
In addition to drug and alcohol consumption, the loss or disruption of sleep may result in operator-related accidents and reduced productivity [8]. The loss of even one night's sleep can lead to extreme short-term sleepiness. Sleeping less than four hours per night impairs performance, and the effects of sleep loss are cumulative and may lead to chronic sleepiness [9].
In HUMABIO [12], novel sensors developed in the Integrated Project SENSATION [13] for the collection of physiological signals were used to develop a biometric security system based on these signals. In their final form, these wireless sensors are to be integrated in clothing (e.g., hats) to reduce obtrusiveness. However, since they need to be attached to the body, some discomfort for the user is unavoidable. To extend the use of the sensors beyond their security purpose, the measured physiological signals were also exploited to validate the subject's nominal capacity to work. In particular, the features extracted from the sensors were fused using modern multimodal classification techniques, namely, support vector machines (SVMs), fuzzy expert systems (FESs), neural networks (NNs), and Gaussian mixture models (GMMs). It is shown that the efficient combination of the extracted signals enables a high classification accuracy, reduces the impact of noisy data, and improves the security and antispoofing properties of the system. Therefore, the novelty of the paper is twofold: firstly, the use of novel physiological signals with limited obtrusion and, secondly, the development of a multimodal framework for their combination and efficient integration to estimate the physiological state of the subjects.
This framework aims to detect possible deficiencies (deriving from drug or alcohol consumption and sleep deprivation) through the measurement of features that describe the person's internal physiology, and it is intended for critical operations that require the operator's full attention and readiness, such as driving or air traffic control. The use of physiological markers and signals to estimate the state of an individual has been researched extensively in the past; however, the main effort was directed towards the detection of sleep deprivation-related hypovigilance and the exploitation of the electroencephalogram (EEG) signal [14][15][16][17]. In this paper, we examine the detection of two additional dangerous conditions and the fusion of many physiological signals other than the EEG.

Objectives Specification and Data Measurement Protocol
The aim is to exploit physiological measurements such as EEG, electrocardiogram (ECG), and event-related potentials (ERP), together with secondary measurements such as body sway, to assess whether subjects are in a nominal physiological state and can be trusted to perform their tasks adequately. The first step in this study was to define a strict data collection protocol that provides measurements from "normal" and "abnormal" physiological conditions from several subjects.
The final protocol was a single-centre, randomized, open clinical trial following a four-way cross-over design. It consisted of a one-day screening visit and four four-day experimental sessions, each evaluating subjects in one of the following conditions according to a cross-over design:
(i) Normal condition.
(ii) Midazolam administration: subjects are administered a single oral tablet of 7.5 mg of Midazolam (Dormicum) on Day 2 at 11:00 a.m. The tablet is administered with 240 mL of tap water.
(iii) Alcohol intake: subjects receive a single oral dose of 0.5 g alcohol/kg on Day 2 at 10:30 a.m. The appropriate dose of alcohol (vodka 40%, adjusted for each subject's body weight) is mixed with orange juice to a volume of 300 mL.
(iv) Partial sleep deprivation: subjects are allowed to sleep only 4 hours (from 3:00 a.m. to 7:00 a.m.) during the night between Day 1 and Day 2.
Subjects are admitted at the centre on Day 1 at 12:00 under the conditions set by the protocol of each experimental session. On Day 1 and Day 2, they perform repeated recordings for authentication, validation, and monitoring at different time points (only the validation protocol is described in the present paper). Subjects remain at the study centre until noon on Day 3, after review of the safety assessments. They are allowed to leave the centre if their medical condition reveals no clinically relevant abnormality. A period of at least seven days separates two consecutive admissions at the centre (Day 1) (Figure 1).
EEG and EOG recordings were performed at 256 Hz using 32 electrodes placed on the scalp of the subject, following the 10-20 system. Of these, 28 were used for EEG and four for EOG (vertical and horizontal eye movement). One additional chest lead was used to record ECG, and another one fixed on the scruff was used to record artefacts. These recordings were performed simultaneously with the visual and auditory stimulations described below.
The subject is seated in front of a computer screen and keeps the eyes open during the first minute, closed during the following minute, and open again for the remaining minute. In the eyes-open period, the subject is asked to stare at a spot on the screen. The spot can remain still or move in any direction. The screen may flash at other moments. At the end of the first minute, the spot disappears, and written sentences, for example "I close my eyes", appear on the screen for a few seconds. The subject is asked to read them and then to close the eyes. A sound occurs 50 seconds later, and at the end of this second minute, the subject is asked to open the eyes. At the end of the third minute, the spot disappears and written sentences, for example "I close my eyes", again appear on the screen for a few seconds. The subject is asked to read them and then to close the eyes. The ERP test (which consists of brief sound emissions) is then applied for two minutes. At the end of the test, the subject is asked to stand up and to perform the body sway test (Figure 2).
As specified before, the fifth recording time can be 12:30 or 13:45 depending on the monitoring test. Before and after each monitoring test, a cortical brain state (CBS) test was performed for calibration purposes. This test consists of two minutes eyes open (EO), two minutes eyes closed (EC), and again two minutes EO. EEG, EOG, and ECG were recorded simultaneously during this test and the signals were analyzed; more specifically, a synchrony method was used for the analysis.
Table 1 reports the features that were extracted from the raw data measurements. These features serve as inputs to the classification algorithms described in the following section.

Classification Algorithms
A fundamental reason for decision making based on multiple classifiers is the inherent limitation of the information in a single modality. The purpose of a fusion algorithm is to combine the information of each modality efficiently, aiming at a more accurate and reliable classification decision. Furthermore, the use of multimodal classifiers partly addresses the problem of noisy sensor data and reduces the risk from corrupted data values. Additionally, the use of multiple modalities reduces the risk of spoofing or cheating, since the subject would have to spoof all the employed modalities.
The detection of an abnormal physiological state is a two-class problem. The physiology measurement set can be classified as "normal" or "abnormal" when compared to a previously stored measurement set that corresponds to the subject's nominal state.
Four state-of-the-art fusion techniques were utilized, namely, support vector machines (SVMs), fuzzy expert systems (FESs), Gaussian mixture models (GMMs), and artificial neural networks (NNs). Each of these techniques follows a different philosophy for the fusion of the unimodal inputs in order to produce an overall estimation of the person's physiological state.
(i) A typical SVM implementation was developed [18]. A radial basis kernel function was used to map the input data to a higher-dimensional space in which they were linearly separable [19]. The parameters of the SVM that were optimized during training were the parameter γ of the kernel, the penalty parameters for each of the two classes (normal and abnormal state), and the definition of the support vectors. The above-mentioned parameters were optimized via complete enumeration. Three different SVM models were trained, each one detecting a specific abnormal condition (drug consumption, alcohol consumption, and sleep deprivation).
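The radial basis kernel and the complete enumeration of parameter combinations can be illustrated with a minimal sketch. The grid values, the function names, and the `toy_score` stand-in for "train an SVM and measure its validation accuracy" are assumptions made for illustration only; the paper does not report the grid that was searched.

```python
import math
from itertools import product


def rbf_kernel(x, y, gamma):
    """Radial basis kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def enumerate_grid(gammas, penalties_normal, penalties_abnormal, score):
    """Try every (gamma, C_normal, C_abnormal) triple and keep the best one.

    `score` is a caller-supplied function (hypothetical here) that would
    train an SVM with the given parameters and return a validation score.
    """
    best_params, best_score = None, float("-inf")
    for params in product(gammas, penalties_normal, penalties_abnormal):
        current = score(*params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(gamma, c_normal, c_abnormal):
    """Toy stand-in for training and validation; peaks at (0.1, 1, 10)."""
    return -abs(math.log10(gamma) + 1) - abs(c_normal - 1) - abs(c_abnormal - 10)


best, _ = enumerate_grid([0.01, 0.1, 1.0], [1, 10], [1, 10], toy_score)
```

Separate penalty values per class, as in the text, let the search trade false acceptances against false rejections when the two classes are unbalanced.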
(ii) A TSK FES [20] was developed as described in [21]. The FES's premise space consisted of three inputs. Each premise input was segmented by three trapezoid membership functions, leading to the creation of 27 three-dimensional fuzzy rules. The premise inputs, which were selected via extensive experimentation from the available ones (shown in Table 1), varied depending on the abnormal state. For drug detection, inputs 4, 5, and 6 (see Table 1) were the premise inputs; for alcohol detection, inputs 17, 18, and 19 as premise inputs yielded the best classification rates; and finally, for sleep deprivation, the premise space of the FES was set by inputs 1, 2, and 3. The FES estimation of the person's physiological condition is a synthesis (weighted average) of the 27 fuzzy rule outputs. Each fuzzy rule output is a linear function of all the parameters shown in Table 1, so as to include all of the available information about the person's physiology. The parameters of the linear output functions, as well as the parameters that define the position of the fuzzy rules in the premise space (in other words, the segmentation of the premise inputs via the shape and positioning of the membership functions), are optimized by a real-coded [21] genetic algorithm [22].
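The structure of such a TSK system (three premise inputs, three trapezoid membership functions each, 27 rules with linear consequents, weighted-average output) can be sketched as follows. The membership breakpoints, the use of the product as the rule firing strength, and the consequent coefficients are illustrative assumptions; in the paper these quantities are tuned by the genetic algorithm and are not reported.

```python
from itertools import product


def trapezoid(x, a, b, c, d):
    """Trapezoid membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical breakpoints for premise inputs normalized to [0, 1].
LOW = (-1.0, -0.5, 0.2, 0.4)
MID = (0.2, 0.4, 0.6, 0.8)
HIGH = (0.6, 0.8, 1.5, 2.0)
MEMBERSHIPS = [LOW, MID, HIGH]


def tsk_output(premise, features, rule_coeffs):
    """Weighted average of the 27 linear rule outputs.

    premise     -- the three premise input values
    features    -- all measured features (19 in Table 1)
    rule_coeffs -- maps a rule index (i, j, k) to (bias, weights)
    """
    numerator, denominator = 0.0, 0.0
    for rule in product(range(3), repeat=3):
        strength = 1.0  # firing strength: product of memberships (assumed)
        for value, mf_index in zip(premise, rule):
            strength *= trapezoid(value, *MEMBERSHIPS[mf_index])
        bias, weights = rule_coeffs[rule]
        consequent = bias + sum(w * f for w, f in zip(weights, features))
        numerator += strength * consequent
        denominator += strength
    return numerator / denominator if denominator > 0 else 0.0


# Toy consequents: only the (HIGH, HIGH, HIGH) rule outputs 1, all others 0.
rule_coeffs = {
    rule: (1.0 if rule == (2, 2, 2) else 0.0, [0.0] * 19)
    for rule in product(range(3), repeat=3)
}
features = [0.0] * 19
high_case = tsk_output((0.9, 0.9, 0.9), features, rule_coeffs)
low_case = tsk_output((0.3, 0.3, 0.3), features, rule_coeffs)
```

With these toy consequents, `high_case` is driven entirely by the single rule whose three premises are all "high", while `low_case` receives no contribution from it.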
(iii) Bayesian classification and decision making are based on probability theory and the principle of choosing the most probable or the lowest-risk (expected cost) option. The Gaussian distribution is usually a quite good approximation for a class model shape in a suitably selected feature space. A Gaussian distribution carries the assumption that the class model is truly a model of one basic class. However, if the actual model is multimodal, a single Gaussian cannot capture the underlying distribution coherently. A Gaussian mixture model (GMM) is a mixture of several Gaussian distributions and can, therefore, represent different subclasses within a class [23]. The probability density function is defined as a weighted sum of Gaussians,

$$p(x; \theta) = \sum_{c=1}^{C} \alpha_c \, \mathcal{N}(x; \mu_c, \Sigma_c),$$

where $\alpha_c$ is the weight of component $c$, $0 \le \alpha_c \le 1$, and $\sum_{c=1}^{C} \alpha_c = 1$. The parameter list $\theta = \{\alpha_1, \mu_1, \Sigma_1, \ldots, \alpha_C, \mu_C, \Sigma_C\}$ defines a particular Gaussian mixture probability density function.
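As a minimal sketch of how such a mixture density is evaluated, the following code computes a weighted sum of Gaussian components. For brevity it assumes diagonal covariance matrices (one variance per feature), which is a simplification not stated in the paper; the function names and the example parameters are likewise illustrative.

```python
import math


def gaussian_diag(x, mean, var):
    """Density of a Gaussian with diagonal covariance (per-dimension variances)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def gmm_pdf(x, weights, means, variances):
    """Weighted sum of component densities; the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(
        w * gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )


# Two identical one-dimensional components: the mixture equals one Gaussian.
density_at_zero = gmm_pdf([0.0], [0.5, 0.5], [[0.0], [0.0]], [[1.0], [1.0]])
```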
Estimation of the Gaussian mixture parameters for one class can be considered as unsupervised learning of the case where the samples are generated by individual components of the mixture distribution, without knowledge of which sample was generated by which component. Three GMMs were developed, each corresponding to an abnormal state. Each of the three GMMs comprised four mixture components. The weights of the components were estimated after extensive experimentation: (a) Midazolam intake: $\alpha_1 = 0.2$, $\alpha_2 = 0.36$, $\alpha_3 = 0.17$
(iv) The fourth algorithm was a three-layer feed-forward neural network (NN) [24]. The layers consist of N input neurons, Y hidden neurons, and one output neuron, where N is equal to the number of measured physiology features (N = 19, from Table 1), and Y varies according to the abnormal state and is set through experimentation. The neurons are fully interconnected, and a bias is applied to each neuron. The transfer function is selected to be sigmoid so as to address nonlinearities of the input data set. For the training of the weights, the well-known back-propagation method was used. The optimum number of training iterations and the training parameters were set heuristically and vary depending on the diagnosis case (sleep deprivation and alcohol or drug intake): (a) Midazolam intake:
where Y is the number of hidden neurons, $t_f$ is the neuron transfer function factor, and b is the bias factor of each neuron. Convergence was achieved after 500 to 1000 iterations depending on the abnormal state (the neural network for Midazolam intake trained faster).
For Midazolam and alcohol intake, only the time immediately after consumption was tested (11:45). The reason is that the effects of drugs and alcohol diminish quite rapidly with time. For sleep deprivation, tests were made on the latest measurement (19:00), since the effects of this condition become more intense as time goes by.
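The three-layer feed-forward network of item (iv) can be sketched as a single forward pass with sigmoid units. The number of hidden neurons (`N_HIDDEN = 5`) and the random initial weights are placeholders chosen for illustration; the paper sets Y per abnormal state and learns the weights by back propagation, which is not shown here.

```python
import math
import random


def sigmoid(z):
    """Logistic transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(features, hidden_weights, hidden_bias, output_weights, output_bias):
    """Forward pass: N inputs -> Y sigmoid hidden neurons -> 1 sigmoid output."""
    hidden = [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(hidden_weights, hidden_bias)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


N_FEATURES = 19  # number of features listed in Table 1
N_HIDDEN = 5     # hypothetical value of Y

rng = random.Random(0)  # fixed seed so the sketch is reproducible
hidden_weights = [
    [rng.uniform(-0.5, 0.5) for _ in range(N_FEATURES)] for _ in range(N_HIDDEN)
]
hidden_bias = [0.0] * N_HIDDEN
output_weights = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]

# Output in (0, 1); a threshold on it would separate "normal" from "abnormal".
score = forward([0.1] * N_FEATURES, hidden_weights, hidden_bias, output_weights, 0.0)
```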
The baseline values corresponding to the nominal state were computed as the average of the measurements on the control day (Day 1) at the same time of day as the test measurements.
In other words, for the drug and alcohol states, the average of the measurements on Day 1 at 11:45 was regarded as the nominal value.
The relatively small number of subjects (20) in the available data set necessitated the use of cross-validation for the effective evaluation of the proposed methods. Thus, 15 subjects were used for training the machine-learning algorithms, and the remaining 5 were used for evaluating the trained algorithm. By moving each subject from the training set to the testing set, and vice versa, 20 training and evaluation sets were created. The reported results are the average of the classification rates over the 20 experiments.
Because the training datasets are small, consisting of only 15 subjects and four samples per subject per state (one normal and three abnormal for each abnormal state), which makes 60 training samples per state, training lasts only a few seconds on an Intel Pentium 4. The validation of the trained algorithms with the five subjects that were not included in the training dataset is practically instantaneous.
It must be noted that the most computationally complex part of the classification algorithms is the training phase. However, since training is performed offline and only once, it does not affect the performance of the system in the operational phase, in which the physiological state of an individual is classified in real time.
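One way to obtain 20 training/evaluation partitions of 15 and 5 subjects is a rotating window over the subject list, sketched below. The paper does not specify how the 20 partitions were formed, so the rotation scheme and the function name are assumptions.

```python
def rotating_splits(subjects, test_size=5):
    """Build one split per subject by sliding a circular test window.

    Each split holds out `test_size` consecutive subjects (wrapping around)
    for evaluation and uses the rest for training.
    """
    n = len(subjects)
    splits = []
    for start in range(n):
        test = [subjects[(start + k) % n] for k in range(test_size)]
        train = [s for s in subjects if s not in test]
        splits.append((train, test))
    return splits


subjects = list(range(1, 21))  # 20 subject identifiers
splits = rotating_splits(subjects)

# The reported figure would then be the mean of the per-split error rates.
```

Under this scheme every subject is held out in exactly five of the 20 splits, so each contributes equally to the averaged result.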

Training and Evaluation
Results. In the following tables, the comparative results of the different algorithms are presented. The false rejection ratio (FRR) denotes the ratio of wrong estimations of the abnormal state by the system, and the false acceptance ratio (FAR) denotes the ratio of wrong estimations of the nominal state, while the average of these two measures is reported as the half total error rate (HTER), in analogy with the typical accuracy measures used for biometric security systems.
Examining the results in Tables 2, 3, and 4, the first observation is that drug intake is detected more accurately than alcohol, while sleep deprivation is the condition that is detected with the least accuracy. This outcome could be expected, since people exhibit different sensitivity to alcohol; thus, there is stronger overlap of the normal/abnormal classes than in the drug consumption case, where the administered substance can cause a state that is more objectively characterized as "abnormal". The detection accuracy for drug intake can be considered very satisfactory, since the tests result in only a 5% error. The effects of Midazolam and alcohol intake are mainly detectable shortly after consumption. The results in this paper present only the performance of the system at 11:45 (just after consumption) for Midazolam and alcohol; the detection of these conditions at later times is not as accurate.
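The three error measures can be computed from true and predicted labels as in the sketch below. It reads FRR as the fraction of nominal samples wrongly flagged as abnormal and FAR as the fraction of abnormal samples wrongly accepted as nominal; this reading of the definitions, the label strings, and the example data are assumptions made for illustration.

```python
def error_rates(true_labels, predicted_labels):
    """Return (FAR, FRR, HTER) for labels 'normal' / 'abnormal'."""
    pairs = list(zip(true_labels, predicted_labels))
    normal_total = sum(1 for t, _ in pairs if t == "normal")
    abnormal_total = sum(1 for t, _ in pairs if t == "abnormal")
    if normal_total == 0 or abnormal_total == 0:
        raise ValueError("both classes must be present")
    # Nominal state wrongly estimated as abnormal.
    false_rejections = sum(1 for t, p in pairs if t == "normal" and p == "abnormal")
    # Abnormal state wrongly estimated as nominal.
    false_acceptances = sum(1 for t, p in pairs if t == "abnormal" and p == "normal")
    frr = false_rejections / normal_total
    far = false_acceptances / abnormal_total
    return far, frr, (far + frr) / 2.0


# Made-up example: 4 nominal and 4 abnormal samples.
truth = ["normal"] * 4 + ["abnormal"] * 4
predicted = ["normal", "normal", "normal", "abnormal",
             "abnormal", "abnormal", "normal", "normal"]
far, frr, hter = error_rates(truth, predicted)
```

Averaging FAR and FRR keeps the measure from being dominated by whichever class has more samples.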
Sleep deprivation, on the other hand, is considerably more difficult to detect (as shown in Figure 3(a)).The error reaches 40% (30% when using incomplete patterns with virtual feature values), which cannot be considered satisfactory.However, this could be due to the fact that the subjects were allowed to sleep for four hours during the night prior to the test (instead of the normal eight hours), so they did not reach the critical levels of sleepiness that could be detected more accurately by the analysis of their physiology indicators.
Another observation is that all algorithms, even though they follow different theoretical approaches to the classification problem, reached similar diagnosis rates (Figure 3(b)) for the three abnormal conditions.This makes us confident that the reported results are close to the highest accuracy levels that can be reached when utilizing the given features.

Conclusions
In this paper, we presented a study on the potential of the automatic classification of physiology measurements and derived features for the detection of specific hazardous physiological states. Such a procedure is very important when applied to workplaces that involve critical operations that may affect the health and safety not only of personnel but also of larger populations (e.g., professional drivers and operators in nuclear factories). The classification of physiology features was pursued with the development of state-of-the-art pattern recognition and classification algorithms, namely, SVM, FES, GMM, and NN. SVM yielded the best estimation accuracy; however, the accuracy varies significantly depending on the abnormal condition. Drug consumption, which is the condition with the greatest accident risk, is detected with 95% accuracy. The Midazolam treatment has a great effect on the physiology of almost all individuals. This "global" effect is demonstrated by the fact that the validation results using subjects outside the training datasets are not far worse than the training results. This means that after the offline training of the system with a number of subjects whose physiological measurements are acquired, the trained system can be applied to new, unknown subjects, and a similar detection accuracy is expected. This is not true, however, for alcohol, where the training results are significantly better than the validation results. Thus, we conclude that the effects of alcohol exhibit strong interpersonal variability. This was expected and leads us to think that the system should be trained specifically for every subject in order to extract a personalized intoxication "signature". This complicates system deployment compared with the drug detection case; however, even under nonpersonalized training, alcohol consumption was detected with an accuracy of almost 85%, which could be satisfactory depending on the safety requirements of the deployment facility.
Sleep deprivation, which is a major cause of hypovigilance, is detected with approximately 70% accuracy. The main reasons for the low accuracy in detecting hypovigilance due to sleep deprivation are the large interpersonal variations, the "soft" sleep deprivation protocol, which allowed four hours of sleep, and the nature of sleep deprivation-related hypovigilance: it does not manifest linearly or steadily; rather, sleepiness and alertness alternate from moment to moment. In addition, the process of attaching the electrodes and the protocol itself, which is based on tests rather than on measurements in usual conditions, make the subjects tense and alert. On the other hand, hypovigilance can be detected much more accurately using other sensors (cameras) and features (eyelid activity parameters) [21]. The system presented in this paper does not aim to compete with such systems, which also offer other advantages such as unobtrusiveness, but rather to operate synergistically with them to make the detection more robust.
This study shows that sensors already applied for the collection of physiology signals for several purposes (e.g., security [12] and health or stress monitoring [13]) can also be utilized for the validation of a person's nominal physiological state as an accident preventive measure, adding value and another purpose to physiology measuring sensorial setups that are usually obtrusive for the subjects.
Next steps include further testing to explore how performance deteriorates when features whose acquisition is difficult and requires special equipment are discarded. If the impact on the detection rate is minimal, discarding them can increase unobtrusiveness and simplify the protocol for the collection of physiology and other data. The prime candidates for exclusion are ERP and body sway, since these modalities contribute most to the complexity of the nominal-state validation system by requiring specific time-consuming protocols and special equipment for data acquisition.

Figure 3: HTER grouped by algorithm for each abnormal condition (a) and HTER grouped by abnormal condition for each algorithm (b).

Table 1 )
varied depending on the abnormal state.For drug detection inputs, 4, 5, and 6 (see Table1) were the premise inputs, for alcohol detection inputs, 17,

Table 1: Measurements and extracted features for the diagnosis of abnormal states.

Table 2: Training and evaluation results for the detection of drug intake (Midazolam) using all available features.

Table 3: Training and evaluation results for the detection of alcohol using all available features.

Table 4: Training and evaluation results for the detection of sleep deprivation (SD) using all available features.