Detection of the state of mind has become an increasingly popular area of study in recent years. With the advent of smart wearables, individuals now expect detailed, state-of-the-art reports about their bodies. The dominant wearables on the market focus on general metrics such as the number of steps, distance walked, heart rate, oximetry, sleep quality, and sleep stage. However, to accurately assess the well-being of an individual, another important metric needs to be analyzed: the state of mind. The state of mind reflects the combined activity of all the other related metrics. Its detection has been a major challenge for researchers because a single biosignal cannot provide a reliable decision threshold. Therefore, in this work, multiple biosignals from different parts of the body are used to determine the state of mind of an individual. Blood volume pulse (BVP) and accelerometer signals are acquired from a wrist-worn wearable, and electrocardiography (ECG), electromyography (EMG), and respiration signals are acquired from a chest-worn pod. To classify the biosignals into multiple state-of-mind categories, a multichannel convolutional neural network architecture was developed. The model achieved encouraging results, with an average recall and precision of 97.238% and 97.652%, respectively, across all classes.
Biosignals or physiological signals are those signals that can provide the details about the physiological states and their associated dynamics in the body of a human being [
In the past, acquiring biosignals was cumbersome: it typically required a clinical environment with a large number of sensors, and the process was expensive. With the advent of wearable technologies (smart wearables), which have become very popular, it is now much easier to collect and analyze the data [
In the proposed study, different physiological signals of the subjects are coupled together to detect each state of the mind more accurately and precisely. The complete study was performed by engineering state-of-the-art features and followed by applying a multichannel convolutional neural network for the prediction of the states of the mind. The major novelty of the work can be put forward in multiple ways. First, the data that have been used in the study have been fetched from multiple subjects over a long period of time [
The rest of the paper is structured as follows: The second section provides the details about the related work that has been performed in a particular segment of stress detection, wearable technology, machine learning, and deep learning. The third section presents a deep understanding of the data preprocessing and feature engineering. The fourth section discusses the development of the deep learning model and discusses the training procedure. The fifth section provides the results that were achieved in the work followed by the sixth section, which discusses the complete inflow of data to the prediction and also explains the societal impact of the work. Lastly, the paper is concluded in the seventh section.
In the past, many researchers have highlighted the importance of biosignals for detecting different positive or negative emotions and different mental states depending on the situations. In [
But, in a perceptive case, it can be widely assumed that the decision thresholds for identifying a particular state of mind may not be the same across all the times for a particular individual. Therefore, a normalization factor was devised in [
Moreover, the type of activities that are pursued by the people also has a different perspective towards maintaining the decision threshold. While pursuing some strenuous activities such as driving, the amount of mental stress threshold for a particular person increases drastically from the normal state to the mobile state. Therefore, to answer this particular subjective scenario, a stress detection model was developed in [
Now, as discussed previously, a wide range of stress detection or stress classification has been performed for the driving activity but the relevance of the sensors used for deriving a particular outcome is one of the concerns. Therefore, [
The development of a generic model for the state of mind detection of different individuals seemed to be quite important as each individual has a wide range of different grant roots or thresholds for a specific condition. Therefore, for solving this particular scenario, a study was performed by [
For the stress detection of individual using electroencephalography (EEG), [
Furthermore, in the study regarding emotional stress detection using EEG signals, [
As primarily, the studies performed for the detection of stress predominantly used the wearable devices and noninvasive sensors for the extraction of signals, therefore [
The above-highlighted work further motivated us to explore the proposed study by developing a multichannel deep learning architecture with regard to stress detection by leveraging multiple biosignals and also to perform a check upon the interindividuality of the subjects during the learning process.
For the implementation of the multichannel convolutional neural network, multiple prerequisite steps are to be followed. As the data are in the raw format, generalizing the data based on the international system of units remains one of the most primary concerns. Moreover, as the data have been derived from the biosensors, it contains a multitude of abstracted information, which in turn can be difficult for the deep learning algorithms to identify [
The data set used in the work was fetched from the UCI machine learning repository that was posted by [
The data generated from both the wearable devices were in raw format. Therefore, the primary task that had to be performed for getting ahead in the process was to perform the conversion and generalization of the data into the SI units.
The ECG data provided by [
The EMG data were extracted from the subject at the sampling rate of 700 Hz from the chest using the RespiBAN device. The raw data of the EMG were converted to its SI unit that is microvolts (
The respiration data were extracted from the subject’s chest using the RespiBAN device at 700 Hz of sampling frequency during the experimental procedure. The raw data was converted to a form of displacement percentage using the piezoelectric sensors. The formula for the conversion is as follows:
The triaxial accelerometer data were captured from the wrist using the Empatica E4, which sampled the data at 32 Hz, and the data were provided in units of 1/64 g. Therefore, the following formula converts the raw triaxial accelerometer data to SI units, that is, m/s²:
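Since the raw values are expressed in units of 1/64 g, the conversion amounts to dividing by 64 and multiplying by the gravitational acceleration. A minimal sketch, assuming standard gravity g = 9.80665 m/s² (the constant and the function name `acc_raw_to_si` are illustrative assumptions, not taken from the source):

```python
# Standard gravity in m/s^2 (assumed value; the source does not state the constant).
STANDARD_GRAVITY = 9.80665


def acc_raw_to_si(raw_value):
    """Convert one raw accelerometer reading (units of 1/64 g) to m/s^2."""
    return raw_value / 64.0 * STANDARD_GRAVITY


# A raw reading of 64 corresponds to exactly 1 g.
one_g = acc_raw_to_si(64)
```

The same function would be applied to each of the three axes independently.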
The BVP data, also known as photoplethysmograph (PPG) data, were extracted from the subjects’ wrist using the Empatica E4 at a sampling rate of 64 Hz. PPG measures the change in blood volume caused by the pressure pulse by illuminating the skin with a light-emitting diode and detecting the amount of light transmitted or reflected back using a photodiode.
The temperature data were recorded from the wrist using the Empatica E4 device at a sampling frequency of 4 Hz. The data were provided in degrees Celsius.
After the raw data were converted to SI units, the next step was data preprocessing. The data come from different regions of the subject’s body, and multiple devices were used for their extraction, so the sampling rates of the signals vary considerably. To bring all the signals to a common frequency, the signals with lower sampling rates, namely the triaxial accelerometer, blood volume pulse, and temperature data, were first upsampled to 700 Hz. At 700 Hz, however, the data for 15 subjects captured for 100 minutes turned out to be very large. The signals were therefore downsampled to 10 Hz by aggregating every 70 samples using statistical techniques, and the labels were downsampled to 10 Hz by taking the mode of the labels over every 70 samples. After all the aggregations and changes in sampling frequency, the total number of samples in the whole data set for 15 subjects was 573,480. The distribution of the state-of-mind categories is depicted in Table
State of the mind category distribution.
State of the mind class | Number of samples |
---|---|
Baseline | 274,790 |
Amusement | 117,150 |
Stressed | 65,450 |
Meditation | 37,090 |
Recovery | 79,000 |
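The downsampling step described before the table can be sketched as follows. The source only says that every 70 samples are aggregated "using statistical techniques", so the use of the mean here is an assumption; the function names are illustrative. The labels use the mode, as stated in the text.

```python
from statistics import mean, mode

WINDOW = 70  # 700 Hz / 10 Hz = 70 samples per output sample


def downsample_signal(samples, window=WINDOW):
    """Aggregate each full window of samples into its mean (assumed statistic)."""
    n_windows = len(samples) // window
    return [mean(samples[i * window:(i + 1) * window]) for i in range(n_windows)]


def downsample_labels(labels, window=WINDOW):
    """Replace each full window of labels by its most frequent label."""
    n_windows = len(labels) // window
    return [mode(labels[i * window:(i + 1) * window]) for i in range(n_windows)]


signal_10hz = downsample_signal(list(range(140)))
labels_10hz = downsample_labels([1] * 50 + [2] * 20 + [2] * 70)
```

Trailing samples that do not fill a complete window are dropped in this sketch; how the original pipeline handles them is not stated.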
The features that have been engineered from the raw biosignals data are primarily varied in three different forms. The first form is the one-to-one variance or continuous feature variable. In this type of feature, each and every sample of the data set gets an individual value and is continuous in nature. The second form is the subject-wise variance where all the samples of a particular subject are provided with the same value for a particular feature. The third form of feature is based on minute-based variance, where all the samples of a particular minute are provided with the same value. Therefore, using such methods usually provides the features with an optimum variance, which can lead to a better model in terms of generalizability and better classification performance.
The features derived from the ECG, EMG, respiration, and BVP are peak-based features, and the features derived from the accelerometer are purely statistical in nature. The peak-based features for the 1-dimensional biosignals are determined by calculating the local maxima of the cycle of the signal by leveraging the information of the threshold and the definite distance that is needed to be maintained between consecutive peaks.
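As an illustration of the peak-based features, the sketch below finds local maxima above a threshold, enforces a minimum distance between peaks by keeping the tallest ones first, and then derives a peak count, an average amplitude, and a mean spacing. It is a simplified pure-Python peak picker, not the authors' implementation; the function names are hypothetical, and interpreting the "difference between consecutive local maxima" as the spacing in samples is an assumption.

```python
def find_peaks_simple(signal, threshold, min_distance):
    """Indices of local maxima above `threshold`, at least `min_distance` samples apart."""
    candidates = [
        i for i in range(1, len(signal) - 1)
        if signal[i] > threshold and signal[i - 1] < signal[i] >= signal[i + 1]
    ]
    # Keep the tallest peaks first and discard candidates too close to a kept peak.
    kept = []
    for i in sorted(candidates, key=lambda idx: signal[idx], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


def peak_features(signal, threshold, min_distance):
    """Peak count, average peak amplitude, and mean spacing between peaks."""
    peaks = find_peaks_simple(signal, threshold, min_distance)
    amplitudes = [signal[i] for i in peaks]
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return {
        "peaks": len(peaks),
        "average_amplitude": sum(amplitudes) / len(amplitudes) if amplitudes else 0.0,
        "differ_mean": sum(gaps) / len(gaps) if gaps else 0.0,
    }


example = [0, 1, 0, 0, 3, 0, 0, 2, 0, 0.5, 0]
features = peak_features(example, threshold=0.8, min_distance=2)
```

For minute-based features, such a function would be applied to each one-minute window of a signal (600 samples at 10 Hz), and the resulting values repeated for every sample in that minute.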
The ECG features take the form of minute-based variance, where each feature gets a different value for each minute. Moreover, the ECG features are peak-based, as it is a common notion for biosignals that the peaks of a signal summarize an entire cycle.
Figure
ECG signal for 30 seconds of subject 2.
In Table
Electrocardiography (ECG) features.
Feature name | Description |
---|---|
ECG_Peaks | Number of local maxima in a minute |
ECG_Average_Amplitude | Average amplitude of the local maxima in a minute |
ECG_Differ_Mean | Average difference between consecutive local maxima in a minute |
ECG_Resting | Resting motion of a subject, that is, the number of local maxima within 10 samples (1 second) |
Electromyography signals measure and record the electrical activity of skeletal muscles, which indicates the activation level and helps identify medical abnormalities in a subject. The features calculated for the EMG signal vary minute-wise to offer an optimum variance across each feature.
Figure
EMG signal for 30 seconds of subject 2.
The features demonstrated in Table
Electromyography (EMG) features.
Feature name | Description |
---|---|
EMG_Peaks | Number of local maxima in a minute |
EMG_Average_Amplitude | Average amplitude of the local maxima in a minute |
EMG_Differ_Mean | Average difference between consecutive local maxima in a minute |
The respiration data were extracted from the chest and show the tone and rhythm of the breath as well as the ratio between multiple breath cycles. Respiration data have also been helpful in determining the state of mind and the level of arousal or rate of bio-intensity of a particular subject. The features derived from the respiration data are minute-based, as for ECG and EMG. Figure
Respiration signal for 100 seconds of subject 2.
Moreover, Table
Respiration features.
Feature name | Description |
---|---|
RESP_Peaks | Number of breath cycles in a minute |
RESP_Average_Amplitude | Average amplitude of the local maxima in a minute |
RESP_Differ_Mean | Average difference between consecutive local maxima in a minute |
The BVP signal is specifically derived from the photoplethysmogram that illuminates the skin to determine the changes in the light absorption. From the peaks of BVP, we can determine the heart rate of an individual as every time the heart pumps blood, there is a slight change in the volumetric quantity of blood in arteries, which can be detected using a BVP Signal. In Figure
Blood volume pulse for 30 seconds of subject 2.
The features for BVP signal are also varied on the terms of a minute where each minute gets a different value. Table
Blood volume pulse features.
Feature name | Description |
---|---|
BVP_Peaks | Number of local maxima in a minute |
BVP_Average_Amplitude | Average amplitude of the local maxima in a minute |
BVP_Differ_Mean | Average difference between consecutive local maxima in a minute |
The accelerometer signals are quite reliable in terms of analyzing the level of stress in an individual by seeking out the patterns in the movement [
Accelerometer signal features.
Feature | Equation | Description |
---|---|---|
Mean | | The mean of the signal for each subject |
Standard deviation | | The standard deviation of the signal is calculated for each value |
Correlation | | The correlation coefficient between the two accelerometer signals |
Kurtosis | | Kurtosis shows the peakedness of a signal |
Crest factor | | It shows the signal impulsiveness with the maximum accelerometer value |
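The statistical accelerometer features can be sketched with their common textbook definitions. These definitions are assumptions, since the exact equations used in the work are not available here: population standard deviation, Pearson correlation, kurtosis as the fourth standardized moment (not excess kurtosis), and crest factor as peak absolute value over the root mean square.

```python
import math


def mean(x):
    return sum(x) / len(x)


def std(x):
    """Population standard deviation."""
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def correlation(x, y):
    """Pearson correlation coefficient between two axes."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (std(x) * std(y))


def kurtosis(x):
    """Fourth standardized moment (a normal distribution gives 3)."""
    m, s = mean(x), std(x)
    return sum((v - m) ** 4 for v in x) / (len(x) * s ** 4)


def crest_factor(x):
    """Peak absolute value divided by the root mean square."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


axis_x = [1.0, 2.0, 3.0, 4.0, 5.0]
axis_y = [2.0, 4.0, 6.0, 8.0, 10.0]
```

Applying five such statistics to each of the three axes would be consistent with the 15 accelerometer features reported for the accelerometer channel, although that mapping is an inference rather than something the text states.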
In recent years, it has been observed how supervised learning techniques have evolved to create some most innovative architectures for solving a particular problem. More evidently, the rise in popularity can be observed for the deep learning algorithms too, which has undergone a major paradigm shift in terms of structure, optimizer functions, and the architecture [
In this work, biosignals from chest and wrist wearables have been used for the detection of the state of the mind while undergoing a stress interview. The major significance of this work stands with identifying the stress segment of an individual. For the identification and the predictions of the state of mind, a multichannel convolutional neural network has been used for guaranteeing the optimum generalizability and for identifying complex patterns in the biosignals.
The model architecture for the multichannel convolutional neural network has been depicted in Figure
Multichannel CNN architecture.
The most distinctive aspect of convolutional neural networks is the convolution layer, which is used for traversing along the matrix of the data to create a penultimate feature matrix of spatially oriented features using an adaptive kernel or a filter. The adaptive filters for the convolution layers in the multiple channels are to be adjusted on the basis of the input shape of the data matrix. Therefore, the following equation has been used to choose the optimum shape for the filter.
The feature maps from the first convolution layer are passed directly to the second convolution layer without any subsampling layer in between. Given the large spatial volume of the data on which the CNN architecture is trained, it can be argued that a subsampling layer, such as pooling, between consecutive CNN layers would make the solution less computationally expensive. However, using subsampling layers on data whose numerical significance is more important than its spatial arrangement causes information loss [
The feature matrix generated by the 2nd convolution layer is then passed to a flattening layer. The flattening layer converts the feature matrix from a 2-dimensional matrix to a 1-dimensional array because the subsequent stages of the network contain dense layers, and data passed to a dense layer must be in 1-dimensional format.
After the data pass through the flattening layer, they are subjected to a dropout layer. The dropout layer is used in the architecture for regularization and helps the model avoid overfitting. It allows the model to learn more complex and robust feature relationships by dropping a set of neurons from the visible and the hidden layers, which makes the feature learning more randomized.
The 6th layer in the architecture is a dense layer, which is the fully connected layer with 64 units. The dense layer allows the model to perform a linear operation on the feature matrix that has been generated by the convolution layer. Moreover, as the convolution layers work locally for the spatial set of defined filters that traverses along with the data matrix, the dense layer acts as a global layer where all the nodes of the layer participate and are connected to all the other nodes in the following layers. Therefore, the usage of dense layers in this work allows the model to establish a global relationship between the features and also accounts for the abstraction of more complex patterns in the data.
The 9th layer in the network is the concatenation layer that allows us to combine the feature matrices from all the channels. The reason behind the concatenation of the feature matrices lies in accordance with our problem statement, which is to detect the state of the mind based upon multiple signals. Therefore, for obtaining the decision threshold based upon all the biosignals, the concatenation of the feature matrices from all the channels is required.
The layer after the concatenation layer is a fully connected layer with 32 units. This fully connected layer is used to extract the composite relationships between the concatenated feature matrices from the multiple channels. It captures the complex features, relationships, and patterns among the combined feature matrices that support the generation of the decision threshold. The last layer or the output layer that is depicted in Table
Multichannel CNN architecture.
Layer | Layer type | Filters | Size | No. of parameters | Output dimension | Activation |
---|---|---|---|---|---|---|
1 | Input | — | — | — | ECG: (4, 1) | — |
2 | Conv1D (1st layer) | 128 | ECG: (2, 1) | ECG: 384 | ECG: (3, 128) | ReLU |
3 | Conv1D (2nd layer) | 64 | ECG: (2, 1) | ECG: 16448 | ECG: (2, 64) | ReLU |
4 | Flatten | — | — | — | ECG: 128 | — |
5 | Dropout | — | — | — | — | — |
6 | Dense (1st layer) | 64 | — | ECG: 8256 | ECG: 64 | ReLU |
7 | Dropout | — | — | — | — | — |
8 | Dense (2nd layer) | 32 | — | ECG: 2080 | ECG: 32 | ReLU |
9 | Concatenate | — | — | 0 | 160 | — |
10 | Dense (3rd layer) | 32 | 160 | 5152 | 32 | ReLU |
11 | Dense (output) | 5 | 32 | 165 | 5 | SoftMax |
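The parameter counts and output dimensions listed for the ECG channel can be checked with a few lines of arithmetic. The sketch assumes stride 1, no padding, and one bias per filter or unit, which is consistent with the numbers in the table; the helper names are illustrative.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


def conv1d_output_length(input_length, kernel_size):
    """Output length for stride 1 without padding (assumed)."""
    return input_length - kernel_size + 1


# ECG channel: 4 features, one input channel.
length = 4
conv1 = conv1d_params(2, 1, 128)
length = conv1d_output_length(length, 2)   # 3 positions, 128 filters
conv2 = conv1d_params(2, 128, 64)
length = conv1d_output_length(length, 2)   # 2 positions, 64 filters
flat = length * 64
dense1 = dense_params(flat, 64)
dense2 = dense_params(64, 32)
merged = dense_params(5 * 32, 32)          # five channels of 32 units concatenated
output = dense_params(32, 5)
```

The concatenated width of 160 corresponds to five channels contributing 32 units each.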
The model training in the work used two varied procedures namely Type I and Type II. The type I procedure predominantly was utilized for tuning the hyperparameters and choosing the most viable optimizers for increasing the model performance. Moreover, the type I model was also used to check an initial performance of the model for randomized sequence. For creating the model based on type I procedure, the complete data set was split as 70% of the data were allotted to the training set, 20% were allotted to the validation set and lastly, 10% were allotted to the testing set. The samples that were placed on different sets of data were chosen randomly to remove any correlation in terms of subjects. Table
Training, validation, and testing divisions for all the channels and number of features for Type I.
Channel | Training samples | Validation samples | Testing samples | No. of features |
---|---|---|---|---|
ECG channel | 616,413 | 176,118 | 88,059 | 4 |
EMG channel | 616,413 | 176,118 | 88,059 | 3 |
Respiration channel | 616,413 | 176,118 | 88,059 | 3 |
BVP channel | 616,413 | 176,118 | 88,059 | 3 |
Accelerometer channel | 616,413 | 176,118 | 88,059 | 15 |
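The Type I split can be sketched as a random shuffle of sample indices followed by a 70/20/10 partition. This is a minimal illustration with hypothetical function names and an arbitrary seed; integer arithmetic is used so that the sizes are exact. For 880,590 samples it yields 616,413, 176,118, and 88,059 samples, the counts that appear in the table.

```python
import random


def split_sizes(n_samples, train_pct=70, val_pct=20):
    """Sizes of the training, validation, and testing sets."""
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    return n_train, n_val, n_samples - n_train - n_val


def random_split(n_samples, seed=0):
    """Shuffle the sample indices and cut them into three disjoint sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train, n_val, _ = split_sizes(n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = random_split(100)
```

Because samples are drawn at random, data from the same subject can appear in all three sets, which is why this procedure cannot measure generalization to unseen subjects.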
On the other hand, another procedure for training the model was also undertaken by using a cross-validated approach using the data of individual subjects as the testing set. This particular approach was named as Type II procedure. More particularly, for creating the type II model, a 15-fold cross-validation was performed on the data of 15 subjects, where the data of a particular subject were always kept aside for creating the test set. The remaining data of 14 subjects were further allocated to the training and the validation set based on a randomized split with a ratio of 80 : 20. This particular model was developed only for the sake of understanding the capability of the model to generalize across different subjects. Table
Number of samples for each fold of training.
The subject in the test set | Training set | Validation set | Testing set |
---|---|---|---|
Subject 1 | 656,872 | 164,218 | 59,500 |
Subject 2 | 654,184 | 163,546 | 62,860 |
Subject 3 | 654,264 | 163,566 | 62,760 |
Subject 4 | 655,896 | 163,974 | 60,720 |
Subject 5 | 649,320 | 162,330 | 68,940 |
Subject 6 | 663,744 | 165,936 | 50,910 |
Subject 7 | 661,960 | 165,490 | 53,140 |
Subject 8 | 664,144 | 166,036 | 50,410 |
Subject 9 | 661,728 | 165,432 | 53,430 |
Subject 10 | 663,824 | 165,956 | 50,810 |
Subject 11 | 650,536 | 162,634 | 67,420 |
Subject 12 | 658,360 | 164,590 | 57,640 |
Subject 13 | 654,512 | 163,628 | 62,450 |
Subject 14 | 653,984 | 163,496 | 63,110 |
Subject 15 | 659,280 | 164,820 | 56,490 |
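The Type II procedure is a leave-one-subject-out scheme. The sketch below, with hypothetical names and an arbitrary seed, holds out each subject in turn as the test set and splits the remaining samples 80 : 20 at random into training and validation sets.

```python
import random


def leave_one_subject_out(subject_ids, train_pct=80, seed=0):
    """Yield (held_out, train_idx, val_idx, test_idx) for each subject in turn."""
    rng = random.Random(seed)
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        rest = [i for i, s in enumerate(subject_ids) if s != held_out]
        rng.shuffle(rest)
        n_train = len(rest) * train_pct // 100
        yield held_out, rest[:n_train], rest[n_train:], test_idx


# Toy example: three subjects with ten samples each.
toy_subjects = [1] * 10 + [2] * 10 + [3] * 10
folds = list(leave_one_subject_out(toy_subjects))
```

With the counts from the table, holding out Subject 1 (59,500 samples) leaves 821,090 samples, of which 80% is 656,872 and the remainder is 164,218, matching the first row.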
The model architecture is one of the prime components of the system developed in this work. Equally important for state-of-the-art architectures, however, is control over the training process in order to optimize the model’s performance and outcomes. Therefore, components such as the model hyperparameters, loss functions, and optimizer functions are discussed in the following sections.
The training process is generally controlled by the hyperparameters used to tune the model. Optimizing a model by minimizing the testing error is considered one of the toughest challenges, and tuning the elements that reside outside the model influences its overall performance and can be the most challenging part of solving the problem. The primary reason for the difficulty is that the chosen hyperparameters must be model-specific and not training-set-specific, because hyperparameters tuned on the basis of the training set often lead to poor model generalizability. Therefore, choosing the right set of hyperparameters is important to maintain the tradeoff between model generalizability and an optimum objective score.
To choose the right set of hyperparameters, Bayesian Sequential Model-Based Optimization (SMBO) is used. Bayesian SMBO is a type of hyperparameter optimization that minimizes a particular objective function by building a surrogate model (probability function) based on the previous evaluation results of the objective function. The basic objective function of the Bayesian SMBO is given by
The surrogate model is considered to be less expensive to be optimized than the main objective function [
The set of hyperparameters that were obtained by running Bayesian SMBO on the model are
The loss function is a very integral part of the deep learning and the machine learning models. The loss functions are basically used to measure the variability between the predicted output (
The loss function used in the work is the categorical cross-entropy loss, which is also known as the SoftMax loss. In the categorical cross-entropy loss function, each prediction is compared to the actual class value and a score is calculated. The score is then used to penalize the predicted probability based on its difference from the actual value. The penalty is logarithmic in nature: a small score is assigned to small differences and a large score is assigned to larger differences [
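A minimal sketch of the categorical cross-entropy for a single sample illustrates the logarithmic penalty. The target is assumed to be one-hot encoded and the prediction to be a vector of class probabilities; the small `eps` guard against log(0) is an implementation assumption.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


target = [0, 0, 1, 0, 0]  # five classes, the third one is the true class
confident = [0.025, 0.025, 0.9, 0.025, 0.025]
mistaken = [0.6, 0.1, 0.1, 0.1, 0.1]

loss_confident = categorical_cross_entropy(target, confident)  # -ln(0.9)
loss_mistaken = categorical_cross_entropy(target, mistaken)    # -ln(0.1)
```

A prediction that assigns probability 0.9 to the true class is penalized far less than one that assigns it 0.1.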
The optimizer functions are the ones that play an integral part in the optimization of the internal parameters of a model. The internal parameters of the type of model that is being dealt with in the work are the weights and biases. Now, in the previous segment, we have discussed the loss function of the model that needs to be minimized over the training iterations. But the loss function is more of a mathematical way of determining what is the error rate between the predictions and the actual labels. Therefore, optimizer functions are used to incorporate the loss function with the models’ internal parameters such as weight and biases for updating the same based on the response generated from the loss functions.
In this work, multiple optimizer algorithms were used and a comparative analysis was performed with regard to which optimizer function relates to the best minimization of the categorical cross-entropy loss and ties best with the hypothesis of the problem. The comparative analysis can be seen in Table
Comparative analysis of the model performance based on the optimizer algorithms for subject 1 in the testing set.
Metric | Adam | RMSprop | SGD |
---|---|---|---|
Accuracy | 97.62 | 90.45 | 92.51 |
Recall “baseline” | 0.9861 | 0.8945 | 0.9063 |
Precision “baseline” | 0.9703 | 0.9106 | 0.9542 |
F1 score “baseline” | 0.9716 | 0.9033 | 0.9311 |
Recall “amusement” | 0.9891 | 0.9322 | 0.9256 |
Precision “amusement” | 0.9956 | 0.9158 | 0.9428 |
F1 score “amusement” | 0.991 | 0.9288 | 0.9299 |
Recall “stress” | 0.9832 | 0.9647 | 0.9568 |
Precision “stress” | 0.9784 | 0.94 | 0.9487 |
F1 score “stress” | 0.9693 | 0.9561 | 0.9509 |
Recall “meditation” | 0.9583 | 0.9428 | 0.9467 |
Precision “meditation” | 0.9752 | 0.9022 | 0.9788 |
F1 score “meditation” | 0.9680 | 0.9312 | 0.9635 |
Recall “recovery” | 0.9456 | 0.9365 | 0.9387 |
Precision “recovery” | 0.9711 | 0.9174 | 0.9579 |
F1 score “recovery” | 0.9620 | 0.9258 | 0.9466 |
The multichannel convolutional neural network model developed in this work aimed to provide sound and effective classification of the different states of mind for a particular subject. The model achieved an average recall and precision of 97.238% and 97.652%, respectively, across all the classes. It also showed consistent precision and recall across the random data folds used for training and testing.
Moreover, the hypothesis developed in the initial phase required that the precision and recall of every class be above the same threshold, so that the classification rate is consistent across all classes. As in the previous work [
In Figure
Confusion matrix of the multichannel CNN model.
The metrics used for evaluating the model are the precision, recall, and F1 score of each class. In the scope of this work, the recall of a class is the fraction of the samples that actually belong to that class that the model has correctly predicted. The precision of a class, on the other hand, is the fraction of the samples predicted as that class that truly belong to it, and so reflects the confidence of a prediction. Lastly, the F1 score is the harmonic mean of precision and recall and therefore accounts for both kinds of wrongly predicted samples of a particular class.
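These per-class metrics can be computed directly from a confusion matrix. The sketch below assumes that rows are true classes and columns are predicted classes; the function name and the toy matrix are illustrative only.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        actual = sum(confusion[k])                          # all samples of class k
        predicted = sum(confusion[i][k] for i in range(n))  # all predictions of class k
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


toy_confusion = [[8, 2],
                 [1, 9]]
toy_metrics = per_class_metrics(toy_confusion)
```

For the first class of the toy matrix, the recall is 8/10, the precision is 8/9, and the F1 score is 16/19.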
Figure
Model training process using Adam optimizer for 100 epochs.
Table The overall performance of the Adam-optimized model is better than that of the other two. The model optimization is time-efficient and computationally efficient. For the type of data dealt with in this work, there is no fixed upper or lower bound for a particular type of biosignal. Therefore, when the model is reproduced in the future, the gradients may change for a particular type of subject. An optimization algorithm that is not affected by rescaling of the gradient is therefore useful [
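The robustness of Adam to gradient rescaling can be illustrated on a toy problem. The sketch below implements the standard Adam update in plain Python and minimizes f(x) = x² twice, once with the true gradient and once with the gradient multiplied by 1000. The hyperparameter values are commonly used defaults and an arbitrary learning rate, not the tuned values from this work.

```python
import math


def adam_minimize(grad_fn, x0, steps=20, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on a scalar parameter and return the visited points."""
    x, m, v = x0, 0.0, 0.0
    path = []
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
        path.append(x)
    return path


# Gradient of x^2, and the same gradient rescaled by a factor of 1000.
path_plain = adam_minimize(lambda x: 2 * x, x0=5.0)
path_scaled = adam_minimize(lambda x: 2000 * x, x0=5.0)
max_gap = max(abs(a - b) for a, b in zip(path_plain, path_scaled))
```

The two trajectories agree up to the small effect of `eps`, because the scale factor cancels in the ratio of the first-moment estimate to the square root of the second-moment estimate.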
Table
Comparative analysis of the model performance for multichannel CNN and single-channel CNN for subject 1 in the testing set.
Metric | Multichannel CNN | Single-channel CNN |
---|---|---|
Accuracy | 97.62 | 87.53 |
Recall “baseline” | 0.9861 | 0.9524 |
Precision “baseline” | 0.9703 | 0.9347 |
F1 score “baseline” | 0.9716 | 0.9435 |
Recall “amusement” | 0.9891 | 0.9311 |
Precision “amusement” | 0.9956 | 0.9006 |
F1 score “amusement” | 0.991 | 0.9132 |
Recall “stress” | 0.9832 | 0.8991 |
Precision “stress” | 0.9784 | 0.9157 |
F1 score “stress” | 0.9693 | 0.9036 |
Recall “meditation” | 0.9583 | 0.7658 |
Precision “meditation” | 0.9752 | 0.8631 |
F1 score “meditation” | 0.9680 | 0.8122 |
Recall “recovery” | 0.9456 | 0.9136 |
Precision “recovery” | 0.9711 | 0.9217 |
F1 score “recovery” | 0.9620 | 0.9178 |
Table
Comparative analysis of the model performance for multichannel CNN for Type-II model.
Metrics | Subject 1 | Subject 2 | Subject 3 | Subject 4 | Subject 5 | Subject 6 | Subject 7 | Subject 8 | Subject 9 | Subject 10 | Subject 11 | Subject 12 | Subject 13 | Subject 14 | Subject 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Accuracy | 75.34 | 78.66 | 74.56 | 76.44 | 75.84 | 76.54 | 77.8 | 79.89 | 78.26 | 76.20 | 76.79 | 77.72 | 77.66 | 78.08 | 76.14 |
Recall “baseline” | 0.724 | 0.79 | 0.741 | 0.748 | 0.739 | 0.873 | 0.793 | 0.874 | 0.814 | 0.813 | 0.791 | 0.831 | 0.884 | 0.83 | 0.719 |
Precision “baseline” | 0.752 | 0.795 | 0.771 | 0.715 | 0.769 | 0.718 | 0.798 | 0.715 | 0.79 | 0.727 | 0.742 | 0.809 | 0.8 | 0.775 | 0.713 |
F1 score “baseline” | 0.738 | 0.792 | 0.756 | 0.731 | 0.754 | 0.788 | 0.795 | 0.787 | 0.802 | 0.768 | 0.766 | 0.82 | 0.84 | 0.802 | 0.716 |
Recall “amusement” | 0.723 | 0.75 | 0.718 | 0.676 | 0.669 | 0.694 | 0.678 | 0.705 | 0.709 | 0.694 | 0.676 | 0.686 | 0.714 | 0.682 | 0.728 |
Precision “amusement” | 0.727 | 0.813 | 0.818 | 0.841 | 0.81 | 0.835 | 0.714 | 0.769 | 0.752 | 0.742 | 0.777 | 0.816 | 0.844 | 0.79 | 0.736 |
F1 score “amusement” | 0.725 | 0.78 | 0.765 | 0.75 | 0.733 | 0.758 | 0.696 | 0.736 | 0.73 | 0.717 | 0.723 | 0.745 | 0.774 | 0.732 | 0.732 |
Recall “stress” | 0.719 | 0.734 | 0.786 | 0.74 | 0.804 | 0.718 | 0.744 | 0.744 | 0.806 | 0.734 | 0.767 | 0.749 | 0.746 | 0.747 | 0.801 |
Precision “stress” | 0.785 | 0.796 | 0.769 | 0.788 | 0.843 | 0.781 | 0.836 | 0.778 | 0.813 | 0.785 | 0.815 | 0.844 | 0.805 | 0.76 | 0.838 |
F1 score “stress” | 0.751 | 0.764 | 0.777 | 0.763 | 0.823 | 0.748 | 0.787 | 0.761 | 0.809 | 0.759 | 0.79 | 0.794 | 0.774 | 0.753 | 0.819 |
Recall “meditation” | 0.734 | 0.798 | 0.775 | 0.873 | 0.779 | 0.756 | 0.835 | 0.849 | 0.759 | 0.788 | 0.757 | 0.819 | 0.766 | 0.791 | 0.773 |
Precision “meditation” | 0.846 | 0.808 | 0.822 | 0.811 | 0.843 | 0.822 | 0.851 | 0.776 | 0.778 | 0.841 | 0.785 | 0.834 | 0.862 | 0.871 | 0.887 |
F1 score “meditation” | 0.786 | 0.803 | 0.798 | 0.841 | 0.81 | 0.788 | 0.843 | 0.811 | 0.768 | 0.814 | 0.771 | 0.826 | 0.811 | 0.829 | 0.826 |
Recall “recovery” | 0.867 | 0.861 | 0.723 | 0.785 | 0.801 | 0.786 | 0.84 | 0.819 | 0.825 | 0.781 | 0.849 | 0.801 | 0.773 | 0.854 | 0.786 |
Precision “recovery” | 0.814 | 0.841 | 0.803 | 0.795 | 0.837 | 0.782 | 0.808 | 0.836 | 0.81 | 0.798 | 0.795 | 0.814 | 0.789 | 0.785 | 0.824 |
F1 score “recovery” | 0.84 | 0.851 | 0.761 | 0.79 | 0.819 | 0.784 | 0.824 | 0.827 | 0.817 | 0.789 | 0.821 | 0.807 | 0.781 | 0.818 | 0.805 |
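
The F1 rows in the table are consistent with the harmonic mean of the corresponding precision and recall rows. As a minimal sketch, the snippet below recomputes the Subject 1 F1 scores from the tabulated precision and recall. The helper name `f1_score` and the dictionary layout are illustrative choices made here and are not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Subject 1 values copied from the table: (precision, recall, reported F1).
subject_1 = {
    "baseline": (0.752, 0.724, 0.738),
    "amusement": (0.727, 0.723, 0.725),
    "stress": (0.785, 0.719, 0.751),
    "meditation": (0.846, 0.734, 0.786),
    "recovery": (0.814, 0.867, 0.840),
}

for label, (precision, recall, reported) in subject_1.items():
    computed = f1_score(precision, recall)
    print(f"{label:<10} computed={computed:.3f} reported={reported:.3f}")
```

Each computed value agrees with the reported one to three decimal places.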
Modern lifestyles have changed considerably, and customized routines often disrupt the biological clock, so maintaining a healthy state of mind and body has become of paramount importance. However, people are increasingly reluctant to spend time with therapists or doctors for regular health checks. With the emergence of smart healthcare, much of this monitoring can instead be carried out with wearable devices, which are now widely adopted. Current smart wearables record multiple biosignals of the user, such as movement, heart rate variability, pulse pressure, vascular respiration, and perfusion index. If these biosignals are properly monitored for a given subject, they can help identify health conditions and detect primary anomalies in health.
The data used in this work were collected from the wearable devices worn by each subject during the experimental protocol, with the aim of detecting states of mind that help characterize the subject's mental condition. The data comprise five classes: recovery, baseline, stress, amusement, and meditation. For classification, multiple biosignals were used, namely accelerometer, electrocardiography, electromyography, blood volume pulse, and body temperature. The signals were then processed for feature engineering, in which summary information about each complete signal is extracted from the maxima and minima of the signal at particular instants of time.
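
The minimum and maximum summary described above can be sketched as follows. This is only an illustration under stated assumptions: the non-overlapping windowing, the window length, the function name `min_max_features`, and the sample values are all hypothetical, since the text does not specify how the instants of time are chosen.

```python
from typing import List, Sequence, Tuple


def min_max_features(signal: Sequence[float], window: int) -> List[Tuple[float, float]]:
    """Return (minimum, maximum) for each non-overlapping window of the signal.

    Trailing samples that do not fill a complete window are dropped
    (an assumption made for this sketch).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    features = []
    for start in range(0, len(signal) - window + 1, window):
        segment = signal[start:start + window]
        features.append((min(segment), max(segment)))
    return features


# Hypothetical samples standing in for one biosignal channel.
samples = [0.1, 0.4, -0.2, 0.3, 0.9, 0.5, -0.6, 0.0, 0.2]
print(min_max_features(samples, window=4))
# [(-0.2, 0.4), (-0.6, 0.9)]
```

The ninth sample is discarded here because it does not fill a complete window of four.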
For classification, a multichannel convolutional neural network architecture was developed. The main motivation for a multichannel design is that the biosignals originate from different parts of the body, so we avoided mixing their features in the early layers. Later, at the penultimate stage, the feature matrices produced by the individual channels are concatenated to reach an integrated decision on the state of mind from all the biosignals. One may ask why deep learning was used for this problem. The answer is that the biosignals are abstract in nature and the data contain many complex interactions and patterns, so manually engineering the right features would be very difficult. Deep learning is therefore used in this work because it can learn highly complex feature representations and supports model reproducibility, which allows incremental learning when a new set of data arrives.
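
The data flow of such a multichannel design (separate per-signal branches whose features are concatenated only near the output) can be sketched without any deep learning framework. The code below is not the network used in the study: the channel names, the kernel sizes, the two kernels per channel, the global max pooling, and the random untrained weights are all assumptions chosen to keep the example short.

```python
import math
import random
from typing import Dict, List, Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D cross-correlation without padding (the usual CNN 'convolution')."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def branch(signal: Sequence[float], kernels: Sequence[Sequence[float]]) -> List[float]:
    """One channel: convolution, ReLU, then global max pooling per kernel."""
    features = []
    for kernel in kernels:
        activations = [max(0.0, v) for v in conv1d_valid(signal, kernel)]
        features.append(max(activations))
    return features


def softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multichannel_forward(inputs: Dict[str, Sequence[float]],
                         kernels: Dict[str, Sequence[Sequence[float]]],
                         channel_order: Sequence[str],
                         dense_weights: Sequence[Sequence[float]],
                         dense_bias: Sequence[float]) -> List[float]:
    """Run each branch separately, concatenate the features, then classify."""
    fused: List[float] = []
    for name in channel_order:  # no feature mixing happens before this point
        fused.extend(branch(inputs[name], kernels[name]))
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(dense_weights, dense_bias)]
    return softmax(logits)


# Hypothetical setup: three channels, random signals, random untrained weights.
rng = random.Random(0)
CHANNELS = ["ecg", "emg", "bvp"]
NUM_CLASSES = 5  # baseline, amusement, stress, meditation, recovery
KERNELS_PER_CHANNEL = 2

inputs = {name: [rng.uniform(-1, 1) for _ in range(32)] for name in CHANNELS}
kernels = {name: [[rng.uniform(-1, 1) for _ in range(3)]
                  for _ in range(KERNELS_PER_CHANNEL)] for name in CHANNELS}
fused_size = len(CHANNELS) * KERNELS_PER_CHANNEL
dense_weights = [[rng.uniform(-1, 1) for _ in range(fused_size)]
                 for _ in range(NUM_CLASSES)]
dense_bias = [0.0] * NUM_CLASSES

probs = multichannel_forward(inputs, kernels, CHANNELS, dense_weights, dense_bias)
print(len(probs), round(sum(probs), 6))
```

Because the weights are random, the output probabilities carry no meaning; the point is only that each biosignal keeps its own filters until the concatenation step.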
In the proposed study, a multichannel convolutional neural network architecture was developed to detect the state of mind by leveraging biosignals from wearable devices. The biosignals used in the work are electrocardiography, electromyography, respiration, blood volume pulse, and accelerometer. The model achieved an average recall and precision of 97.238% and 97.652%, respectively, across all the classes. A comparative analysis was also performed to choose the optimizer, taking into account computational cost, time efficiency, and model reproducibility. The model optimized with the Adam optimizer performed best among the optimizers compared.
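
An average recall and precision across classes can be obtained from a confusion matrix as sketched below. The sketch assumes an unweighted (macro) average, which the text does not state explicitly, and it uses a made-up three-class confusion matrix purely for illustration; the counts are not results from the study.

```python
from typing import List, Sequence, Tuple


def per_class_recall_precision(
        confusion: Sequence[Sequence[int]]) -> Tuple[List[float], List[float]]:
    """Rows are true classes, columns are predicted classes."""
    recalls, precisions = [], []
    for k in range(len(confusion)):
        true_positive = confusion[k][k]
        actual = sum(confusion[k])                    # row sum: samples of class k
        predicted = sum(row[k] for row in confusion)  # column sum: predictions of k
        recalls.append(true_positive / actual if actual else 0.0)
        precisions.append(true_positive / predicted if predicted else 0.0)
    return recalls, precisions


def macro_average(values: Sequence[float]) -> float:
    """Unweighted mean over classes."""
    return sum(values) / len(values)


# Toy confusion matrix with invented counts (not data from the paper).
toy_confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]
recalls, precisions = per_class_recall_precision(toy_confusion)
print(f"macro recall    = {macro_average(recalls):.4f}")
print(f"macro precision = {macro_average(precisions):.4f}")
```

A macro average gives every class the same weight regardless of how many samples it contains; a sample-weighted average would be a different, equally plausible reading of "average".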
To conclude, the outcome of the study is encouraging. However, there is still considerable scope for further research on classifying the state of mind and analyzing biosignals. We therefore recommend investigating alternative approaches to this type of problem and exploring the full capability of multichannel deep learning architectures, which could benefit society in novel and positive ways.
The data used to support the experiments and the findings of the study have been duly included in Section
The authors of the paper declare that there are no conflicts of interest regarding the publication of the paper.
This research was funded by the Basic Science Research Program through the National Research Foundation of Korea (NRF), supported by the Ministry of Science, ICT & Future Planning (NRF-2017R1D1A3B04032905).