In the current intranet environment, information is becoming more readily accessed and replicated across a wide range of interconnected systems. Anyone using an intranet computer may access content that they do not have permission to access. For an insider attacker, it is relatively easy to steal a colleague’s password or use an unattended computer to launch an attack. A common one-time user authentication method may not work in this situation. In this paper, we propose a user authentication method based on mouse biobehavioral characteristics and deep learning, which can accurately and efficiently perform continuous identity authentication of the current computer user and thereby address insider threats. We used an open-source dataset with ten users to carry out experiments, and the experimental results demonstrated the effectiveness of the approach. This approach can complete a user authentication task approximately every 7 seconds, with a false acceptance rate of 2.94% and a false rejection rate of 2.28%.
Insider threats have always been one of the most severe challenges for intranets with security requirements [
Internal personnel have access to use internal proprietary systems, and they know internal security policies and protection techniques and review regulations from safety facilities [
Organizations in physically isolated high-security network environments, such as confidential research institutes and military enterprises, are less likely to suffer external attacks. Thus, internal attacks are the main cause of sensitive information leakage [
There are two main causes of such attacks. First, negligent employees leave the workstation without logging out of the terminal, so malicious people can use the terminal to access and copy sensitive information without authorization. Second, attackers forge a privileged user’s identity, for example through password leakage or USB-key loss, and thus access the privileged user’s terminal for espionage activities.
Traditional authentication methods, such as passwords, USB keys, and fingerprints, determine the user identity only at login. Thus, they cannot effectively detect end users who are using a stolen identity [
This paper proposes a continuous identity authentication method based on mouse dynamic behavior and deep learning to detect insider threat attacks. We verified the effectiveness of the proposed method on an open-source mouse dynamics dataset that contains mouse dynamic data from ten users. The experimental results showed that the method can verify a user’s identity in seconds with a low false accept rate (FAR) and false reject rate (FRR). Specifically, the contributions of the work are as follows: (1) We propose a novel continuous identity authentication method using mouse dynamic behavior and deep learning. It achieves better accuracy and lower verification time than existing methods. (2) Instead of manually extracting features, such as movement speed curves, from raw operations to characterize a user’s unique mouse behavior, we map the mouse dynamic behavior into pictures. Hence, all details of the mouse behavior are preserved. (3) We construct a 7-layer CNN and train it on the mouse behavior picture datasets. The network converges with a small amount of data (about 18000 pictures). Moreover, the network can easily be trained on other mouse behavior datasets to implement identity authentication.
The remainder of this paper is organized as follows: Section
As early as 2006, the Computer Security Institute (CSI) reported that the insider threat caused by the malicious abuse of authority had exceeded traditional Trojan attacks and become the main threat to organizations [
The current research on insider threats is becoming more systematic and specific. Since 2001, the United States Secret Service (USSS) and Carnegie Mellon University have jointly established the CERT Insider Threats Center. The center collected more than 700 insider threat cases of fraud, theft, and destruction and thus solved the problem of insufficient data in insider threat research [
The most common insider attack is identity fraud. Since the user’s identity cannot be continuously verified while the terminal is in use, a malicious user has the opportunity to masquerade as a legitimate user. At present, research on using human-computer interaction data for insider threat detection has achieved good results in practical applications. This line of work studies the behavior patterns of a user operating computer input devices, with the mouse and the keyboard as the main data sources.
In [
Research using keyboard data includes static text analysis and dynamic text analysis. For example, in [
Among different input devices, the mouse is a suitable choice. In some existing methods based on mouse dynamics, papers [
The paper [
There are also some other approaches to detect insider threats. Reference [
From previous research on insider threat detection based on mouse behavior, researchers usually extract some mouse features based on the basic mouse movements (which we call raw data, including clicks, moves, and drags), such as direction, velocity, Angle of Curvature, Curvature Distance, and Pause-and-Click [
This kind of method has achieved outstanding results, but all such methods share a common shortcoming: researchers extract features based on their own experience and understanding. This has certain limitations. When extracting features, some researchers use only mouse click actions, some combine the moving distance with clicks, and others consider other basic mouse actions. However, much raw data that can reflect an individual’s unique behavior is still ignored, which reduces recognition accuracy.
We propose a method that retains all basic mouse operations and uses deep learning for user authentication. First, we map all actions generated by the mouse to images through a particular method. Then we train a CNN on the image datasets to create classification models. During authentication, the user’s mouse operations are mapped by the same method and then classified by the trained model to identify the user. Our approach makes full use of the advantages of deep learning. First, in mapping mouse behaviors into images, we retain all basic mouse operations. Training the CNN on these image datasets requires neither manual feature extraction nor the feature extraction algorithms of traditional machine learning, because the convolutional neural network automatically completes feature extraction and abstraction during training. Second, there are many successful and efficient solutions for classifying images with CNNs. To take advantage of this, we map the mouse actions to a behavior trajectory on a picture. This turns user authentication based on mouse behavior into a classic image classification problem.
Many previous studies collected raw mouse data themselves and used it for analysis and experimentation. Most of these datasets have not been disclosed. Even where researchers have open-sourced their datasets, they are currently unavailable for download. To prove the effectiveness of our method, we chose an open-source mouse dynamics dataset, which is published on GitHub [
The dataset stores the mouse dynamic data from ten users and consists of two parts. The first part, in the “training_files” folder, contains session files recorded while each of the ten users operated the mouse normally, stored in a separate folder named after the user. Each user folder holds 5-7 long session files. The second part, in the “test_files” folder, also contains ten folders named after the ten users, each holding many short session files. Some of these session files were not actually generated by the named user. The dataset also provides a “public_labels.csv” file that marks whether each session in the folder is legal or not.
Each session is stored in the following format:
The specific meanings are as follows [
record timestamp: elapsed time (in seconds) since the start of the session as recorded by the network monitoring device;
client timestamp: elapsed time (in seconds) since the start of the session as recorded by the RDP client;
button: the current condition of the mouse buttons;
state: additional information about the current state of the mouse;
x: the x coordinate of the cursor on the screen;
y: the y coordinate of the cursor on the screen;
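As an illustration of the six fields listed above, the following minimal sketch parses one session into typed records. It assumes that a session file is comma-separated text whose header row names the fields exactly as above; the sample rows and the values `NoButton` and `Left` are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from typing import List, NamedTuple


class MouseEvent(NamedTuple):
    record_ts: float   # seconds since session start (network monitoring device)
    client_ts: float   # seconds since session start (RDP client)
    button: str        # current condition of the mouse buttons
    state: str         # additional state, e.g. Move, Pressed, Released
    x: int             # cursor x coordinate on the screen
    y: int             # cursor y coordinate on the screen


def parse_session(text: str) -> List[MouseEvent]:
    """Parse comma-separated session text with a header row (assumed layout)."""
    reader = csv.DictReader(io.StringIO(text))
    events = []
    for row in reader:
        events.append(MouseEvent(
            float(row["record timestamp"]),
            float(row["client timestamp"]),
            row["button"],
            row["state"],
            int(row["x"]),
            int(row["y"]),
        ))
    return events


# Hypothetical sample rows, invented for this sketch.
SAMPLE = """record timestamp,client timestamp,button,state,x,y
0.000,0.000,NoButton,Move,399,962
0.110,0.109,NoButton,Move,402,958
0.531,0.530,Left,Pressed,402,958
0.640,0.639,Left,Released,402,958
"""

events = parse_session(SAMPLE)
```

Keeping both timestamps in the record makes it possible to choose either clock later when measuring intervals between operations.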
The data in “training_files” is quite complete and large enough compared to the data in “test_files”. We divided “training_files” into two parts, one to train our model and the other to verify whether our model is effective or not. In addition, we use “test_files” to further verify that our model is still accurate enough for insider threat detection.
The basic operations that the mouse can produce are Move, Click, Drag, Scroll, and Stay. In previous studies, researchers usually extracted features from these five basic actions based on their experience, such as moving distance, moving speed, and click frequency, which reflect the behavior of individuals using the mouse to a certain extent. However, there are two obvious shortcomings. First, a single feature (or a combination of multiple feature vectors) does not fully or accurately reflect an individual’s unique behavioral characteristics. Second, acquiring the features takes more time, because the feature vectors (some researchers call them effective operations) are extracted from the basic operations, and relatively more basic actions are needed to generate enough effective operations.
In order to completely preserve the features generated by people using the mouse, we map all the basic actions generated by users to the image. Because the data generated by the mouse is a one-dimensional dataset, we map these one-dimensional datasets to two-dimensional tensors according to the following mapping method. The specific mapping method is as follows:
Take m mouse operations according to the data sequence of the session;
Construct a coordinate system D based on xMax and yMax;
Extract all the “Move” operations in this m, record all the coordinates of “Move” (x, y)
The position of the mouse is represented in a two-dimensional coordinate system (x, y), and the distance between two mouse positions is expressed as a movement feature of the mouse. The movement features reflect personal characteristics of users such as moving speed, moving angle, moving range, and average moving distance.
Extract all the “Pressed” operations in m, the pressed operation (x, y)
A click is actually made up of two actions: Pressed and Released. Two consecutive clicks at the same coordinate (x, y) form a “double click”. In addition, the mapping method needs to distinguish the left and right mouse buttons, that is, “click” and “double-click” of the left button and “click” and “double-click” of the right button. These operations reflect features such as the number and frequency of clicks.
Extract all the “Drag” operations and (x, y)
If the mouse is not released immediately after “Pressed” but is dragged some distance before “Released,” the sequence is a drag operation. Drag operations are common in actual mouse use but are often overlooked as user features.
In the same way, all “Scrolls” are extracted. Scroll’s “Up” operations are recorded in [scroll_up_x, scroll_up_y] and “Down” operations are recorded in [scroll_down_x, scroll_down_y]. The “Scroll Up” is represented by a cyan upward triangle“△” and the “Scroll Down” is represented by “▽”, as shown in Figure
The mouse “Scroll” operation is also often ignored by researchers, but it too reflects a user’s habitual operating characteristics. The “Scroll” operation is divided into Up and Down scroll actions.
In this m, if there are two consecutive actions on the same coordinate and stay for a while, we believe that this period can also reflect the personal operating habits and call it a “Stay” operation. (x, y, s)
Usually, researchers think of mouse movements and clicks as mouse operations and do not pay attention to the interval between two mouse operations. In fact, this interval is also an “operation” of the mouse, and we call it the “Stay” operation. A period in which the mouse rests at one coordinate with no other operation, or the interval between two operations, is defined as a “Stay” operation and is represented by a semitransparent square on the two-dimensional image. The size of the square indicates the length of the stay.
Save the mapped D as a picture in JPG format, so we get a track diagram of the mouse’s behavior in units of m basic operations.
Repeat Steps 2-8 until the number of remaining operations in a session is less than m, and get n pictures of a user in a session, as shown in Figure
A picture of a user’s mouse behavior (when m=100).
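The windowing and mapping steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the mapping used in the experiments: it writes integer codes into a small grid instead of drawing coloured markers into a JPG, it covers only Move, Pressed, and Drag, and the names `windows`, `rasterize`, `size`, and `CODES` are hypothetical.

```python
from typing import Iterator, List, Tuple

Op = Tuple[str, int, int]  # (state, x, y), a reduced form of one mouse operation

# Hypothetical integer codes standing in for the coloured markers of the pictures.
CODES = {"Move": 1, "Pressed": 2, "Drag": 3}


def windows(ops: List[Op], m: int) -> Iterator[List[Op]]:
    """Yield consecutive, non-overlapping groups of m operations.

    A trailing group with fewer than m operations is dropped, matching the
    stopping condition of the last step above.
    """
    for start in range(0, len(ops) - m + 1, m):
        yield ops[start:start + m]


def rasterize(window: List[Op], x_max: int, y_max: int, size: int = 32) -> List[List[int]]:
    """Map one window onto a size-by-size grid spanning (0..x_max, 0..y_max)."""
    grid = [[0] * size for _ in range(size)]
    for state, x, y in window:
        # Scale screen coordinates to grid cells and clamp to the grid bounds.
        col = max(0, min(size - 1, x * size // (x_max + 1)))
        row = max(0, min(size - 1, y * size // (y_max + 1)))
        # When several operations share a cell, keep the highest code.
        grid[row][col] = max(grid[row][col], CODES.get(state, 0))
    return grid


session = [("Move", i, i) for i in range(250)]
pictures = [rasterize(w, 249, 249) for w in windows(session, 100)]
```

With 250 operations and m=100, the sketch produces two grids and discards the last 50 operations.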
According to the mapping method, all sessions of all users in “training_files” are converted to pictures and stored in folders named after each user, giving a dataset that can be used to train the CNN. Each username is the label of its sub-dataset. We generated image sets for training CNN models in units of m=25, 50, 100, 500, and 1000, respectively, as shown in Figure
Image dataset generated from “training_files” in a unit of m. As the value of m increases, the dataset of user pictures becomes smaller.
Although the open-source dataset was used to generate 10 sets of tagged datasets, as shown in table x, the number of generated images is insufficient. Therefore, we use three methods for data augmentation. The first is to flip the image, horizontally or vertically; the second is to rotate the image by 90, 180, or 270 degrees; the third is to randomly rotate the image by 25 degrees. For each picture, each of the above operations is applied with a probability of 50%. Augmentation stops once the dataset reaches our preset target. In this paper, the default augmentation target is 18,000, that is, each user’s dataset is augmented to 18,000 images.
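A minimal sketch of this augmentation loop is given below, with images held as nested lists of pixel values. It covers only the flips and the 90-degree-step rotations; the 25-degree rotation is omitted because it needs interpolation. The function names and the way a source image is picked are assumptions made for illustration.

```python
import random
from typing import List

Image = List[List[int]]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rot90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(images: List[Image], target: int, rng: random.Random) -> List[Image]:
    """Grow the dataset with randomly transformed copies until it reaches target."""
    if not images:
        raise ValueError("need at least one source image")
    out = [[row[:] for row in img] for img in images]
    while len(out) < target:
        img = rng.choice(images)
        changed = False
        for op in (hflip, vflip):
            if rng.random() < 0.5:      # each operation applied with probability 50%
                img = op(img)
                changed = True
        if rng.random() < 0.5:
            for _ in range(rng.choice((1, 2, 3))):   # 90, 180, or 270 degrees
                img = rot90(img)
            changed = True
        if changed:
            out.append(img)
    return out


augmented = augment([[[1, 2], [3, 4]]], 10, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible across runs.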
We refer to the networks of Alexnet [
The architecture of our CNN.
The first four layers are convolutional layers, and the remaining three are fully connected layers. A max-pooling layer follows each convolutional layer. The output of the first two fully connected layers is processed by Dropout. The output of the last layer is sent to Softmax to obtain the probability distribution over the class labels. The ReLU nonlinearity is applied to the output of every convolutional and fully connected layer except the last one.
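The layer ordering described above can be written down as a simple plan, together with the softmax applied to the final output. This sketch records only the sequence of operations; filter counts, kernel sizes, and dropout rates are not stated in this section and are deliberately left out. The names `build_layer_plan` and `softmax` are illustrative.

```python
import math
from typing import List


def build_layer_plan(num_classes: int = 2) -> List[str]:
    """List the operations of the 7-layer network in order (4 conv + 3 FC)."""
    plan: List[str] = []
    for i in range(1, 5):
        # Each convolutional layer is followed by ReLU and max-pooling.
        plan += [f"conv{i}", "relu", "maxpool"]
    for i in (1, 2):
        # The first two fully connected layers use ReLU and Dropout.
        plan += [f"fc{i}", "relu", "dropout"]
    # The last fully connected layer has no ReLU; its output goes to softmax.
    plan += [f"fc3(out={num_classes})", "softmax"]
    return plan


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


plan = build_layer_plan()
probs = softmax([2.0, 0.5])
```

For the binary legal/illegal task, `probs` holds two class probabilities that sum to one.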
Here, learning_rate is the step size, and beta1 and beta2 are the exponential decay rates of the first-order and second-order moment estimates, respectively. Momentum, which is a separate term in other optimization algorithms, is directly incorporated into the first-order moment estimate in Adam. The range of moment estimation is
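For reference, the standard Adam update for a single scalar parameter can be sketched as below. The default values shown (0.001, 0.9, 0.999, 1e-8) are the commonly used defaults and are an assumption here, not settings taken from the experiments; `adam_step` and `minimize` are illustrative names.

```python
import math
from typing import Callable, Tuple


def adam_step(theta: float, grad: float, m: float, v: float, t: int,
              learning_rate: float = 0.001, beta1: float = 0.9,
              beta2: float = 0.999, epsilon: float = 1e-8) -> Tuple[float, float, float]:
    """Apply one Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad            # first-order moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # second-order moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta, m, v


def minimize(grad_fn: Callable[[float], float], theta: float,
             steps: int, learning_rate: float) -> float:
    """Run Adam for a fixed number of steps on a one-dimensional problem."""
    m = v = 0.0
    for t in range(1, steps + 1):
        theta, m, v = adam_step(theta, grad_fn(theta), m, v, t,
                                learning_rate=learning_rate)
    return theta


# Toy objective f(x) = (x - 3)^2 with gradient 2 (x - 3).
first, _, _ = adam_step(0.0, -6.0, 0.0, 0.0, 1, learning_rate=0.1)
final = minimize(lambda x: 2 * (x - 3), 0.0, steps=200, learning_rate=0.1)
```

Because of the bias correction, the first step has a magnitude close to learning_rate regardless of the gradient scale.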
In this section, we conduct three experiments. We completed all model training and experiments in the following environment: Python 3.5.2, TensorFlow r1.4, CUDA Version 8.0.61, cudnn-8.0-windows10-x64-v6.0, Windows 10, and an NVIDIA GTX 1060 6GB GPU. Experiment A verifies the effectiveness of our method. It confirms that a CNN is effective for identity authentication based on mouse features and achieves good FAR and FRR values. Experiment B illustrates that our method requires very little time for authentication and can perform continuous user identity authentication after the user logs in to the terminal. Experiment C uses the “test_files” data provided by the dataset. It is designed to resemble a real insider threat attack scenario and takes measures to reduce FAR as much as possible. The experiments show that our method can be applied in practical situations.
We believe that, in the insider threat detection scenario, the problem to be solved by identity authentication is judging whether the person currently using the terminal is the currently logged-in user. Therefore, we designate one user as the legal user and the nine other users as illegal users (we have mouse dynamic data from ten users in total). This is a typical binary classification problem. To verify the effect of our method, we design the experiment as follows.
Use the image dataset that was generated in Section
Randomly extract the same numbers of pictures as in T0 and T0′ from the other nine users’ subsets to construct an illegal user training set T1 and an illegal user test set T1′, ensuring that there is no intersection between T1 and T1′.
Take T0 and T1 as input, and use the CNN network constructed in Section
Appoint one of the ten users as the legal user, and the remaining nine are considered as illegal users. Repeat the above experiment steps and calculate the average FAR and FRR.
In this experiment, we set T0 + T0′ = 18000 and T1 + T1′ = 18000. Hence, the size of the training set is T0 + T1 = 30600 (85%), and the size of the test set is T0′ + T1′ = 5400 (15%). The above experiment was conducted for the different mouse operation datasets (m=25, 50, 100, 500, and 1000), and the experimental results are shown in Table
The results of experiment A.
User | m=25 FAR(%) | m=25 FRR(%) | m=50 FAR(%) | m=50 FRR(%) | m=100 FAR(%) | m=100 FRR(%) | m=500 FAR(%) | m=500 FRR(%) | m=1000 FAR(%) | m=1000 FRR(%) |
---|---|---|---|---|---|---|---|---|---|---|
| 16.556 | 10.333 | 7.37 | 6.37 | 4.889 | 2.444 | 4.519 | 0.185 | 1.778 | 0.259 |
| 9.519 | 10.815 | 6.222 | 3.889 | 2.704 | 2.556 | 2.444 | 0.037 | 0.519 | 0.037 |
| 18.37 | 12.259 | 8.148 | 9.63 | 4.222 | 2.963 | 1.259 | 0.444 | 0.852 | 0.296 |
| 9.407 | 7.037 | 11.593 | 7.63 | 4.963 | 1 | 0.519 | 0.148 | 0 | 0.259 |
| 10.667 | 9.185 | 7.556 | 4.148 | 5.111 | 1.852 | 1.037 | 0.074 | 0.444 | 0 |
| 6.074 | 8.185 | 9.63 | 1.556 | 2.444 | 1.112 | 0.852 | 0.37 | 0.519 | 0.037 |
| 7.704 | 8.704 | 4.407 | 4 | 1 | 2.519 | 0.778 | 0.519 | 0.222 | 0.037 |
| 7.259 | 9.222 | 3 | 5.852 | 0.63 | 3.481 | 1.704 | 0.37 | 0.259 | 0 |
| 12.481 | 10.333 | 6.963 | 5.593 | 1.704 | 2.667 | 1.037 | 0.148 | 0.481 | 0.185 |
| 9.26 | 9.704 | 3.667 | 4.889 | 1.704 | 2.185 | 3.778 | 0.037 | 0.704 | 0.074 |
Average | 10.73 | 9.58 | 6.86 | 5.36 | 2.94 | 2.28 | 1.79 | 0.23 | 0.58 | 0.12 |
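The two error rates reported in experiment A can be computed as in the following sketch, which also reproduces the averages for m=100 from the per-user values in the table. The function name `far_frr` and the label convention (1 for the legal user, 0 for illegal users) are assumptions made for illustration.

```python
from typing import List, Tuple


def far_frr(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (FAR, FRR) as fractions; label 1 is legal, label 0 is illegal."""
    illegal = [p for t, p in zip(y_true, y_pred) if t == 0]
    legal = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not illegal or not legal:
        raise ValueError("need at least one legal and one illegal sample")
    far = sum(1 for p in illegal if p == 1) / len(illegal)   # illegal accepted
    frr = sum(1 for p in legal if p == 0) / len(legal)       # legal rejected
    return far, frr


# Per-user results for m=100, copied from the table of experiment A.
FAR_M100 = [4.889, 2.704, 4.222, 4.963, 5.111, 2.444, 1, 0.63, 1.704, 1.704]
FRR_M100 = [2.444, 2.556, 2.963, 1, 1.852, 1.112, 2.519, 3.481, 2.667, 2.185]

avg_far = sum(FAR_M100) / len(FAR_M100)
avg_frr = sum(FRR_M100) / len(FRR_M100)
example = far_frr([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0])
```

Rounded to two decimals, the averages are 2.94% FAR and 2.28% FRR, the values quoted for the m=100 model.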
Trend chart of average values of FAR and FRR.
In our opinion, the authentication time is composed of the time needed to collect the mouse features and the time required for classification. Compared with the time of collecting mouse features, the time of classifying mouse features using the trained model is almost negligible. Hence, our primary concern is the time required to obtain enough features. It can be seen from the data set analysis that the number of operations per second generated by the user when using the mouse normally has individual differences. The detailed data is shown in Figure
Number of mouse actions per second generated by users.
As can be seen from Figure
Average time and the minimum time required to acquire mouse actions.
m | Avg Time(s) | Min Time(s) |
---|---|---|
25 | 1.768 | 0.315 |
50 | 3.536 | 0.63 |
100 | 7.072 | 1.26 |
500 | 35.358 | 6.296 |
1000 | 70.716 | 12.592 |
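The times in this table are consistent with dividing m by a fixed number of mouse actions per second. The sketch below derives the two implied rates from the 1.768 s and 0.315 s entries and applies them to the other picture sizes; treating acquisition time as exactly linear in m is an assumption of the sketch, and small differences from the table come from rounding.

```python
def acquisition_time(m: int, actions_per_second: float) -> float:
    """Seconds needed to collect m mouse actions at a constant rate."""
    return m / actions_per_second


# Rates implied by the entries 1.768 s (average) and 0.315 s (minimum),
# which correspond to 25 mouse actions.
AVG_RATE = 25 / 1.768    # about 14.1 actions per second
MAX_RATE = 25 / 0.315    # about 79.4 actions per second

avg_times = {m: acquisition_time(m, AVG_RATE) for m in (25, 50, 100, 500, 1000)}
min_times = {m: acquisition_time(m, MAX_RATE) for m in (25, 50, 100, 500, 1000)}
```

At the average rate, a picture of 100 actions takes about 7.07 seconds to collect, which is where the roughly 7-second authentication interval comes from.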
In an insider threat attack scenario, the insider is familiar with the internal system and can copy sensitive information in a very short time. Therefore, we believe that the authentication time should be within 10 seconds, and the interval between two authentications should also be kept within 10 seconds. According to the above experimental results, we select the third model (m = 100) for the next experiment. The model completed authentication in about 7 seconds on average and reached 2.94% FAR and 2.28% FRR. We think this basically meets the needs of such insider threat attack detection.
We will use the test set (“test_files”) provided by the dataset for the experiment and then determine whether a session is legal data based on the labels (“public_labels.csv”) provided by the dataset. The user data in “test_files” is mapped according to the mapping method in Section
The test data set generated according to the mapping method (when m=100), in which each user folder contains legal sessions and illegal sessions.
test_files | legal | illegal | total_pictures |
---|---|---|---|
| 36 | 37 | 1659 |
| 23 | 43 | 1585 |
| 56 | 49 | 1306 |
| 45 | 70 | 1238 |
| 68 | 38 | 1173 |
| 30 | 20 | 1089 |
| 37 | 22 | 605 |
| 38 | 33 | 765 |
| 43 | 20 | 684 |
| 35 | 73 | 1076 |
Total | 411 | 405 | 11180 |
It would be fair to say that an insider threat detection system with a high false reject rate may be tolerated, but one with a high false accept rate cannot. If a false reject occurs, the alert can be verified with the help of other measures, such as on-site inspection, video surveillance, IDS, firewall, and SOC. But if malicious espionage goes undetected, the attacker has achieved the goal, and the organization incurs an incalculable loss. That is to say, in the actual application scenario the system should be designed to minimize FAR while tolerating a higher FRR.
Therefore, we design this experiment with the purpose of reducing the FAR as much as possible. Within a session, mouse actions are generated in chronological order. In the actual authentication scenario, authentication is performed each time sufficient actions have been generated (we generate pictures with m=100). We do not consider the current user to be legitimate until three consecutive authentications are legal. In other words, each authentication result is compared with the previous two authentication results, and a warning is issued if any one of the three is illegal. In this way, the actual authentication requires three pictures (m=100
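The three-consecutive-results rule can be sketched as a small sliding-window monitor. The decision labels (`pending`, `legitimate`, `warning`) and the function name `monitor` are illustrative and do not come from the experiment code.

```python
from collections import deque
from typing import Iterable, List


def monitor(results: Iterable[bool], window: int = 3) -> List[str]:
    """Turn per-picture results (True = legal) into per-step decisions.

    A warning is raised whenever any of the last `window` results is illegal.
    The user counts as legitimate only after `window` consecutive legal results.
    """
    recent = deque(maxlen=window)
    decisions = []
    for legal in results:
        recent.append(legal)
        if not all(recent):
            decisions.append("warning")
        elif len(recent) < window:
            decisions.append("pending")      # not enough evidence yet
        else:
            decisions.append("legitimate")
    return decisions


decisions = monitor([True, True, True, False, True, True, True])
```

A single illegal result keeps the warning active for the next two pictures as well, which trades a higher FRR for a lower FAR.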
The results of experiment C.
User | FAR(%) | FRR(%) |
---|---|---|
| 0 | 3.223 |
| 0 | 2.365 |
| 2.5 | 7.537 |
| 0 | 3.704 |
| 0 | 12.776 |
| 0 | 12.296 |
| 0 | 23.958 |
| 0 | 22.52 |
| 0 | 11.556 |
| 0 | 15.73 |
Average | 0.25 | 11.5665 |
In this section, we show a comparison of our experimental results against the results of previous works, which are shown in Table
Comparison with previous works.
Source | FAR | FRR | Data required | Authentication time |
---|---|---|---|---|
[ | 2.46% | 2.46% | 2000 mouse actions (click/move/drag/stay) | 1033 seconds |
[ | 1.30% | 1.30% | 20 mouse clicks (click or click/move) | 37.73 minutes (click) or 3.03 minutes (click/move) |
[ | 8.74% | 7.69% | 32 mouse operations (click/move) | 11.80 seconds |
| 4.69% | 4.46% | 160 mouse operations (click/move) | 59.49 seconds |
| 3.33% | 2.12% | 320 mouse operations (click/move) | 118.14 seconds |
ours | 10.73% | 9.58% | 25 mouse actions (click/move/drag/stay/scroll) | 1.768 seconds |
| 6.86% | 5.36% | 50 mouse actions (click/move/drag/stay/scroll) | 3.536 seconds |
| 2.94% | 2.28% | 100 mouse actions (click/move/drag/stay/scroll) | 7.072 seconds |
| 1.79% | 0.23% | 500 mouse actions (click/move/drag/stay/scroll) | 35.358 seconds |
| 0.58% | 0.12% | 1000 mouse actions (click/move/drag/stay/scroll) | 70.716 seconds |
Generally speaking, as the number of mouse actions increases, both FAR and FRR show a downward trend, but the verification time increases accordingly.
Many previous studies have shown that mouse biobehavioral features can authenticate users. In this paper, we focus on the challenges of using mouse behavioral features for insider threat detection and propose a method that combines deep learning with mouse biobehavioral features. This method can complete a user authentication task in a very short time while maintaining high accuracy. Previous studies selected one or several of the five basic mouse actions, extracted features from them to describe the user’s unique behavioral characteristics, and then classified those features with methods such as SVM to realize user authentication. We use all five basic mouse actions so that none of the user’s unique behavior characteristics is omitted. We then map the user’s mouse actions into pictures, and a CNN automatically extracts features from and models these behavior pictures. We use an open-source mouse behavior dataset that contains mouse action data from 10 users. The experiments have demonstrated the effectiveness of the proposed approach, with a false acceptance rate of 2.94%, a false rejection rate of 2.28%, and an authentication time of 7.072 seconds (when m = 100). These results show that this approach can be applied to detect insider threat attacks in specific scenarios.
The mouse dynamic data used to support the findings of this study were supplied by Balabit Mouse Dynamics Challenge dataset and available at
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work was supported in part by Defense Industrial Technology Development Program JCKY2018212C020 and JCKY2016212C005 and in part by the National Natural Science Foundation (NSFC) under Grant CNS 61572115.