Network intrusion detection is one of the most important components of cyber security for protecting computer systems against malicious attacks. However, with the emergence of numerous new and sophisticated attacks, network intrusion detection techniques face several significant challenges. The overall objective of this study is to learn useful feature representations automatically and efficiently from large amounts of unlabeled raw network traffic using deep learning approaches. We propose a novel network intrusion detection model that stacks dilated convolutional autoencoders, and we evaluate it on two new intrusion detection datasets. Several experiments were carried out to check the effectiveness of our approach. The comparative experimental results demonstrate that the proposed model achieves high performance, meeting the demands for high accuracy and adaptability placed on network intrusion detection systems (NIDSs). The model shows considerable promise for deployment in large-scale, real-world network environments.
Network intrusion detection techniques play an important role in cyber security, defending against malicious and suspicious activities [
Unfortunately, network intrusion detection techniques still face several major challenges in detecting anomalies effectively [
Meanwhile, as computer hardware such as GPUs gains ever greater computing capability, deep learning techniques have achieved impressive results in several research areas. In particular, convolutional neural networks (CNNs) have achieved remarkable performance in computer vision tasks such as object recognition and image classification. A key strength of deep learning is its ability to learn feature hierarchies from large amounts of unlabeled data. Deep learning techniques are therefore promising candidates for network intrusion detection.
Recently, various deep learning approaches have been applied to network intrusion detection, such as restricted Boltzmann machines (RBMs), deep belief networks (DBNs), stacked autoencoders (SAEs), and supervised learning with CNNs. Existing work in this area falls into two categories. In the first, deep learning techniques are used to learn or extract valuable features automatically from raw data (feature extraction), and the learned features are then fed into classifiers to complete the classification task. In the second, specific features are first extracted according to domain expert knowledge, and deep learning algorithms mainly act as classifiers that take these hand-crafted features as input.
However, these studies have several problems and limitations. First, obtaining large amounts of labeled network data and hand-crafted features is costly, not to mention the other problems of customized features mentioned previously. In practice, however, collecting large amounts of unlabeled raw network traffic, with only a small amount of labeled data, is relatively easy. In addition, the training process of some deep learning methods, such as DBNs and SAEs, consists of unsupervised pre-training [
This research aims to construct a novel network intrusion detection model that combines the strengths of unsupervised feature learning and CNNs to learn critical features automatically from large volumes of raw network packets. In this paper, we propose a network intrusion detection model built by stacking dilated convolutional autoencoders, which combines the concepts of self-taught learning [
The remainder of this paper is organized as follows. Section
In this section, we briefly review recent research relevant to our work. Deep learning methods used for unsupervised feature learning in network intrusion detection mainly include restricted Boltzmann machines (RBMs), autoencoders, deep belief networks (DBNs), stacked autoencoders, and their variants.
In most existing studies, unsupervised deep learning methods for intrusion detection act as feature extractors that learn abstract features from hand-crafted features. The abstract features are then used as input to a classifier, such as a softmax classifier. For example, Fiore et al. (2013) [
However, very few attempts have been made to use deep learning techniques to learn useful features or good representations directly from raw network traffic. Wang (2015) [
In this paper, we propose a deep learning approach for network intrusion detection, called dilated convolutional autoencoders (DCAEs), which combines the advantages of stacked autoencoders and CNNs. In essence, the model automatically learns essential features from large-scale, diverse, unlabeled raw network traffic, consisting of real-world traffic from botnets, web-based malware, exploits, advanced persistent threats (APTs), scans, and normal activity.
In this section, we first give an overview of our network intrusion detection model. We then describe the deep learning method used in the model in detail. Finally, we briefly describe how our datasets were constructed.
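For readers less familiar with the final stage of such pipelines, the Python sketch below shows a softmax classifier turning an abstract feature vector into class probabilities. The weights, biases, and feature values are hypothetical toy numbers, not parameters from any of the cited studies.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_classify(features: Sequence[float],
                     weights: Sequence[Sequence[float]],
                     biases: Sequence[float]) -> List[float]:
    """Map an abstract feature vector to class probabilities.

    weights[k] and biases[k] belong to class k (hypothetical, learned elsewhere).
    """
    logits = [sum(w * x for w, x in zip(w_k, features)) + b_k
              for w_k, b_k in zip(weights, biases)]
    return softmax(logits)


# Toy example: 3 abstract features, 2 classes (e.g. normal vs. attack).
probs = softmax_classify([0.2, 0.9, 0.1],
                         weights=[[1.0, -1.0, 0.5], [-1.0, 1.0, 0.0]],
                         biases=[0.0, 0.0])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

The predicted class is simply the index with the highest probability; in a deep model, the feature vector would come from the learned layers rather than being given by hand.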
An overview of the training process of the DCAEs-based model is illustrated in Figure
Overview of the training process based on the DCAEs method.
Neural network structure of the fine-tuning process.
The architecture of dilated convolutional autoencoders (DCAEs) is very similar to that of classical autoencoders [
Structure of a dilated convolutional autoencoder.
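The encode-decode cycle of a single DCAE can be sketched in Python as follows. This is a minimal forward pass under our own simplifying assumptions: a one-dimensional input (for example, a vector of scaled packet bytes), a single filter in the encoder and in the decoder, sigmoid activations, and a transposed dilated convolution as the decoder. These choices are illustrative rather than taken from the figure; the actual model uses multiple feature maps, and training by gradient descent on the reconstruction error is omitted.

```python
import math
from typing import List, Sequence, Tuple


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """'Valid' 1-D dilated convolution (cross-correlation, as in most DL libraries).
    Kernel taps are spaced `dilation` positions apart."""
    span = (len(kernel) - 1) * dilation + 1  # input positions seen by one output
    return [sum(k * x[i + j * dilation] for j, k in enumerate(kernel))
            for i in range(len(x) - span + 1)]


def dilated_conv1d_transpose(h: Sequence[float], kernel: Sequence[float],
                             dilation: int) -> List[float]:
    """Transposed dilated convolution: each code value is scattered back onto the
    input positions it was computed from, restoring the original length."""
    span = (len(kernel) - 1) * dilation + 1
    out = [0.0] * (len(h) + span - 1)
    for i, h_i in enumerate(h):
        for j, k in enumerate(kernel):
            out[i + j * dilation] += k * h_i
    return out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dcae_forward(x: Sequence[float], enc_kernel: Sequence[float],
                 dec_kernel: Sequence[float], dilation: int,
                 b_enc: float = 0.0, b_dec: float = 0.0
                 ) -> Tuple[List[float], List[float], float]:
    """Encode x, decode the code, and return (code, reconstruction, mean squared error)."""
    code = [sigmoid(v + b_enc) for v in dilated_conv1d(x, enc_kernel, dilation)]
    recon = [sigmoid(v + b_dec)
             for v in dilated_conv1d_transpose(code, dec_kernel, dilation)]
    mse = sum((r - xi) ** 2 for r, xi in zip(recon, x)) / len(x)
    return code, recon, mse


# Toy input of 10 scaled bytes; kernel size 3 with dilation 2 spans 5 input positions.
x = [i / 9 for i in range(10)]
code, recon, mse = dcae_forward(x, [0.5, -0.2, 0.3], [0.4, 0.1, -0.3], dilation=2)
```

Because the decoder mirrors the encoder's kernel size and dilation, the reconstruction has the same length as the input, so the reconstruction error can be computed element by element.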
Multiple DCAEs can be stacked to construct a deep neural network, in a manner similar to SAEs [
In sum, dilated convolutional autoencoders have the following advantages. First, dilated convolutions enlarge the layers' receptive fields, allowing more global features to be learned; unlike max-pooling, they do so without discarding information from the input. Second, pretraining DCAEs does not require labeled data, which makes them more practical in real applications. Finally, DCAEs have fewer parameters than fully connected networks such as SAEs. Therefore, DCAEs are more effective and less time-consuming than other unsupervised deep learning methods.
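The stacking procedure can be illustrated with a short Python sketch of greedy layer-wise pretraining, in which each autoencoder is trained without labels on the codes produced by the previously trained encoders. The trainer passed in is a hypothetical stand-in: in a real DCAE it would fit a dilated convolutional autoencoder by minimizing reconstruction error. After pretraining, a classifier would be placed on top of the stack and the whole network fine-tuned with labeled data.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[Vector], Vector]
# A trainer fits one autoencoder on unlabeled inputs and returns its encoder half.
Trainer = Callable[[List[Vector]], Encoder]


def greedy_pretrain(unlabeled: List[Vector], trainers: Sequence[Trainer]) -> List[Encoder]:
    """Greedy layer-wise pretraining: autoencoder k is trained on the codes
    produced by the already trained (and now frozen) encoders 1..k-1."""
    encoders: List[Encoder] = []
    inputs = unlabeled
    for train in trainers:
        encoder = train(inputs)                # unsupervised: no labels involved
        encoders.append(encoder)
        inputs = [encoder(x) for x in inputs]  # codes become the next layer's input
    return encoders


def stacked_encode(encoders: Sequence[Encoder], x: Vector) -> Vector:
    """Pass one sample through the whole encoder stack (the input to fine-tuning)."""
    for encoder in encoders:
        x = encoder(x)
    return x


# Hypothetical stand-in trainer: returns a fixed pairwise-averaging "encoder" and
# records the inputs it received, so the layer-wise schedule can be inspected.
seen_inputs: List[List[Vector]] = []


def toy_trainer(inputs: List[Vector]) -> Encoder:
    seen_inputs.append(inputs)
    return lambda v: [(v[i] + v[i + 1]) / 2 for i in range(len(v) - 1)]


data = [[0.0, 1.0, 2.0, 3.0], [4.0, 4.0, 0.0, 0.0]]
stack = greedy_pretrain(data, [toy_trainer, toy_trainer])
```

The second trainer only ever sees the first layer's codes, never the raw inputs, which is the defining property of layer-wise pretraining.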
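The first and third advantages can be checked with simple arithmetic. The sketch below assumes stride-1 one-dimensional convolutions and counts weights plus biases; the kernel size, dilation rates, number of feature maps, and 1000-byte input length in the example are illustrative choices, not the settings used in our experiments.

```python
from typing import Sequence, Tuple


def receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Receptive field of stacked stride-1 1-D convolutions.
    Each layer is (kernel_size, dilation) and widens the field by (k - 1) * d."""
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


def conv1d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights are shared across positions, so the count does not depend on input length."""
    return out_channels * in_channels * kernel_size + out_channels


def dense_params(n_in: int, n_out: int) -> int:
    """A fully connected layer has one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out


# Three layers with kernel size 3: standard vs. exponentially dilated.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
print(receptive_field([(3, 1), (3, 2), (3, 4)]))  # 15
# One input channel (a 1000-byte vector) mapped to 32 feature maps / 32 hidden units.
print(conv1d_params(1, 32, 3))   # 128
print(dense_params(1000, 32))    # 32032
```

With the same number of weights, the dilated stack covers more than twice as many input bytes. Note that a convolutional layer produces one output per position and feature map, so its output is larger than that of the dense layer; the comparison concerns parameters only.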
In this paper, we performed three kinds of classification tasks on two datasets. Table
Sample distribution of the CTU-UNB dataset and the Contagio-CTU-UNB dataset.

CTU-UNB dataset:

Traffic type | Family | Training set | Validation set | Test set | Total |
---|---|---|---|---|---|
Normal | | 41480 | 8213 | 8174 | 57867 |
Bot | Neris | 8039 | 1567 | 1565 | 11171 |
Bot | Rbot | 6073 | 1228 | 1221 | 8522 |
Bot | Virut | 18914 | 3680 | 3767 | 26361 |
Bot | Menti | 217 | 40 | 43 | 300 |
Bot | Sogou | 34 | 5 | 5 | 44 |
Bot | Murlo | 2013 | 364 | 358 | 2735 |
Bot | NSIS.ay | 4395 | 903 | 867 | 6165 |
Bot | Total | 39685 | 7787 | 7826 | 55298 |
Total | | 81165 | 16000 | 16000 | 113165 |

Contagio-CTU-UNB dataset:

Traffic type | Training set | Validation set | Test set | Total |
---|---|---|---|---|
Normal | 10812 | 2053 | 2061 | 14926 |
Botnet | 10681 | 2107 | 2047 | 14835 |
Web-based malware | 10327 | 1928 | 1929 | 14184 |
Exploit | 7325 | 1396 | 1418 | 10139 |
APT | 1439 | 242 | 276 | 1957 |
Scan | 6466 | 1274 | 1269 | 9009 |
Total | 47050 | 9000 | 9000 | 65050 |

Data preprocessing steps.
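The byte-level part of the preprocessing can be sketched as follows. This is an illustration under stated assumptions rather than our exact pipeline: it concatenates the bytes of the packets in one session, truncates or zero-pads them to a fixed input length, and scales each byte to [0, 1] by dividing by 255. Which header fields are kept and how sessions are split are not modelled here; the caller is assumed to pass in the already selected bytes, and `session_to_vector` is a hypothetical helper name.

```python
from typing import Iterable, List


def session_to_vector(packets: Iterable[bytes], length: int = 1000) -> List[float]:
    """Turn one session (a sequence of selected packet bytes) into a fixed-length input.

    Assumptions: packets are concatenated in order, truncated or zero-padded to
    `length` bytes, and each byte value is scaled to [0, 1] by dividing by 255.
    """
    raw = b"".join(packets)[:length]       # truncate long sessions
    raw = raw + bytes(length - len(raw))   # zero-pad short sessions
    return [b / 255.0 for b in raw]        # iterating bytes yields ints 0..255


vec = session_to_vector([b"\x00\xff", b"\x10"], length=5)
```

The `length` argument corresponds to the input-vector length that is varied in the input-length experiments later in the paper.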
In this section, we first briefly introduce classification metrics for performance analysis. Experimental setup and environments are then described. Finally, we present and analyze some important experimental results.
Six evaluation metrics were utilized for performance analysis of our experiments. The six metrics are accuracy (AC), precision (
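The list of six metrics is cut off in this copy of the text, so the following Python sketch covers only the quantities that appear in the result tables: overall accuracy and per-class precision, recall, and F1, computed from a confusion matrix whose rows are actual classes and whose columns are predicted classes. The 2-by-2 matrix in the example is a toy input.

```python
from typing import Dict, List, Sequence


def accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Fraction of all samples that lie on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def per_class_metrics(cm: Sequence[Sequence[int]]) -> Dict[str, List[float]]:
    """cm[i][j] counts samples of actual class i predicted as class j."""
    n = len(cm)
    out: Dict[str, List[float]] = {"precision": [], "recall": [], "f1": []}
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: everything labelled k
        actual_k = sum(cm[k])                          # row sum: everything truly k
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        out["precision"].append(p)
        out["recall"].append(r)
        out["f1"].append(2 * p * r / (p + r) if p + r else 0.0)
    return out


cm = [[90, 10],  # actual normal: 90 correct, 10 flagged as attack
      [5, 95]]   # actual attack: 5 missed, 95 detected
m = per_class_metrics(cm)
```

Precision divides by a column sum and recall by a row sum, so transposing the matrix swaps the two; the row/column convention must match the way the matrix was tabulated.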
The experimental environments are shown in Table
Experimental environments.
Name | Configuration |
---|---|
OS | Ubuntu 16.04.1 LTS 64-bit |
CPU | Intel Core i5-4200M 2.50 GHz |
RAM | 8 GB |
GPU | GeForce GT 740M |
CUDA | 7.5 |
We performed three types of classification tasks on the Contagio-CTU-UNB dataset and the CTU-UNB dataset to evaluate the performance of the proposed model. The classification tasks include 6-class classification using the Contagio-CTU-UNB dataset and 2-class and 8-class classification using the CTU-UNB dataset. Specifically, the 6-class classification involves normal data and five kinds of malware traffic data (i.e., botnet, web-based malware, exploit, APT, and scan). The 2-class classification contains normal data and botnet data from the CTU-UNB dataset. The 8-class classification consists of normal data and seven types of botnet data shown in Table
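For the two CTU-UNB tasks, the same samples are labeled at two levels of granularity. A minimal Python sketch of one possible label encoding is shown below; the integer indices and the names `LABELS_8`, `LABELS_2`, and `encode` are hypothetical and only illustrate that the 2-class task merges the seven botnet families into a single class.

```python
from typing import Dict, List

CTU_UNB_BOTNETS = ["Neris", "Rbot", "Virut", "Menti", "Sogou", "Murlo", "NSIS.ay"]

# 8-class task: normal traffic plus each botnet family as its own class.
LABELS_8: Dict[str, int] = {name: idx for idx, name in enumerate(["Normal"] + CTU_UNB_BOTNETS)}
# 2-class task: every botnet family collapses into one "Bot" class (index 1).
LABELS_2: Dict[str, int] = {"Normal": 0, **{name: 1 for name in CTU_UNB_BOTNETS}}


def encode(traffic_types: List[str], labels: Dict[str, int]) -> List[int]:
    """Map traffic-type names to integer class indices; unknown names raise KeyError."""
    return [labels[t] for t in traffic_types]


print(encode(["Normal", "Virut", "NSIS.ay"], LABELS_8))  # [0, 3, 7]
print(encode(["Normal", "Virut", "NSIS.ay"], LABELS_2))  # [0, 1, 1]
```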
First, we evaluated the proposed model on the three classification tasks. In the 6-class task, we also compared our method with other deep learning approaches that have a similar structure and training process. Furthermore, we evaluated the generalization ability of the proposed model by using the model trained for the 2-class task to detect unknown attacks that do not appear in the training set. We also analyzed the effect of several important parameters of our model.
Table
Accuracy of three kinds of classification tasks.
Metric | 6-class (SAE) | 6-class (DBN) | 6-class (DCAE) | 2-class (DCAE) | 8-class (DCAE) |
---|---|---|---|---|---|
Accuracy (%) | 96.96 | 96.94 | | | |
Precision, recall, and
Traffic type | SAE precision (%) | SAE recall (%) | SAE F1 (%) | DBN precision (%) | DBN recall (%) | DBN F1 (%) | DCAE precision (%) | DCAE recall (%) | DCAE F1 (%) |
---|---|---|---|---|---|---|---|---|---|
Normal | 96.09 | 96.60 | 96.35 | 96.12 | 96.12 | 96.12 | 98.59 | 98.64 | 98.62 |
Botnet | 96.38 | 97.41 | 96.90 | 96.38 | 97.61 | 96.99 | 98.69 | 99.66 | 99.17 |
Web-based malware | 97.87 | 97.56 | 97.72 | 97.72 | 97.56 | 97.64 | 99.12 | 98.96 | 99.04 |
Exploit | 96.71 | 95.28 | 95.99 | 96.38 | 95.84 | 96.11 | 98.94 | 98.73 | 98.84 |
APT | 95.56 | 93.48 | 94.51 | 97.33 | 92.39 | 94.80 | 99.24 | 94.93 | 97.04 |
Scan | 98.50 | 98.50 | 98.50 | 98.58 | 98.50 | 98.54 | 99.84 | 99.61 | 99.72 |
Total | 96.96 | 96.96 | 96.95 | 96.95 | 96.94 | 96.94 | | | |
Precision, recall, and
Traffic type | Precision (%) | Recall (%) | F1 (%) |
---|---|---|---|
Normal | 99.77 | 99.57 | 99.67 |
Neris | 91.63 | 97.19 | 94.33 |
Rbot | 97.66 | 95.74 | 96.69 |
Virut | 98.76 | 97.48 | 98.12 |
Menti | 97.50 | 90.70 | 93.98 |
Sogou | 80.00 | 80.00 | 80.00 |
Murlo | 96.10 | 96.37 | 96.23 |
NSIS.ay | 99.07 | 98.62 | 98.84 |
Average | 98.44 | 98.40 | 98.41 |
Confusion matrix of the 6-class classification task.
Actual / Predicted | Normal | Botnet | Web-based malware | Exploit | APT | Scan |
---|---|---|---|---|---|---|
Normal | 2033 | 10 | 10 | 8 | 0 | 0 |
Botnet | 3 | 2040 | 1 | 1 | 0 | 2 |
Web-based malware | 16 | 4 | 1909 | 0 | 0 | 0 |
Exploit | 8 | 9 | 0 | 1400 | 1 | 0 |
APT | 2 | 3 | 5 | 4 | 262 | 0 |
Scan | 0 | 1 | 1 | 2 | 1 | 1264 |
Confusion matrix of the 8-class classification task.
Actual / Predicted | Normal | Neris | Rbot | Virut | Menti | Sogou | Murlo | NSIS.ay |
---|---|---|---|---|---|---|---|---|
Normal | 8139 | 16 | 3 | 10 | 0 | 0 | 6 | 0 |
Neris | 11 | 1521 | 7 | 23 | 0 | 0 | 1 | 2 |
Rbot | 1 | 37 | 1169 | 8 | 1 | 1 | 2 | 2 |
Virut | 5 | 74 | 13 | 3672 | 0 | 0 | 2 | 1 |
Menti | 0 | 1 | 0 | 0 | 39 | 0 | 2 | 1 |
Sogou | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 0 |
Murlo | 0 | 3 | 5 | 3 | 0 | 0 | 345 | 2 |
NSIS.ay | 2 | 8 | 0 | 2 | 0 | 0 | 0 | 855 |
Table
ROC curves for various classification tasks.
Accuracy comparison of various parameter settings.
The AUC value of the binary classification task was 1.00, which suggests that our method performs extremely well on binary classification. Meanwhile, the AUC values of the 6-class and 8-class classification tasks were both 0.99. These values indicate that our method achieves high true positive rates with few false alarms.
Additionally, after completing the 2-class (botnet detection) task, we saved the trained model to evaluate the generalization ability of the proposed model. We then constructed a new test set containing the attack types of the Contagio-CTU-UNB dataset to evaluate how well the features learned from the CTU-UNB dataset generalize. As shown in Table
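For reference, the binary AUC can be computed without plotting the ROC curve, as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The Python sketch below uses this rank-based formulation. The text does not state how the multi-class AUC values were averaged, so the one-vs-rest macro average included here is only an assumption.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative
    (Mann-Whitney formulation); ties count as half. O(P*N), fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(labels: Sequence[int], prob_rows: Sequence[Sequence[float]],
                  n_classes: int) -> float:
    """One-vs-rest AUC per class, averaged without weighting (an assumed scheme)."""
    aucs = [binary_auc([1 if y == k else 0 for y in labels],
                       [row[k] for row in prob_rows])
            for k in range(n_classes)]
    return sum(aucs) / n_classes


print(binary_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
print(binary_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.5]))  # 0.5
```

An AUC of 1.00 therefore means that every attack sample was scored above every normal sample.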
Sample distribution of new test set for evaluating generalization ability.
Traffic type | Botnet (attack) | Web-based malware (attack) | Exploit (attack) | APT (attack) | Scan (attack) | Attack total | Normal |
---|---|---|---|---|---|---|---|
Data size | 2934 | 1929 | 1418 | 276 | 1269 | 7826 | 8174 |
Precision, recall, and
Traffic type | Precision (%) | Recall (%) | F1 (%) |
---|---|---|---|
Normal | 82.25 | 99.56 | 90.08 |
Attack | 99.41 | 77.56 | 87.14 |
Average | 90.65 | 88.80 | 88.64 |
Evaluation results on different numbers of convolutional layers and two types of activation functions.
| Number of layers | Activation function | Accuracy (%) | Run time (minutes) | Parameter setting |
|---|---|---|---|---|
| | Sigmoid | 98.83 | 102.83 | filter_dilation = …; filter_shape = …; feature_maps = …; full_units = 640 |
| | ReLU | | 57.09 | as above |
| | Sigmoid | 98.78 | 217.25 | filter_dilation = …; filter_shape = …; feature_maps = …; full_units = 960 |
| | ReLU | 98.61 | 60.64 | as above |
| | Sigmoid | 98.80 | 178.26 | filter_dilation = …; filter_shape = …; feature_maps = …; full_units = 690 |
| | ReLU | 98.51 | 48.65 | as above |
In the rest of this section, parameters are compared on the 6-class classification task using the controlled-variable method, in which the parameter under study takes different values while all other parameters are fixed at their optimal values. Figure
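The controlled-variable procedure amounts to a one-parameter-at-a-time sweep around the best configuration. The Python sketch below generates such configurations. The parameter names follow those in the tables of this section (filter_dilation, feature_maps, full_units), but all values shown, including the "optimal" baseline, are hypothetical placeholders, and training and evaluating each configuration is left out.

```python
from typing import Any, Dict, Iterator, List, Tuple


def controlled_variable_sweep(
    optimal: Dict[str, Any], candidates: Dict[str, List[Any]]
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (parameter_name, config) pairs in which only `parameter_name`
    deviates from the optimal configuration; every other parameter is held fixed."""
    for name, values in candidates.items():
        for value in values:
            config = dict(optimal)  # copy so the baseline is never mutated
            config[name] = value
            yield name, config


# Hypothetical baseline and candidate values.
optimal = {"filter_dilation": 2, "feature_maps": 32, "full_units": 640}
candidates = {"filter_dilation": [1, 2, 4], "feature_maps": [16, 32, 64]}
runs = list(controlled_variable_sweep(optimal, candidates))
```

Each generated configuration would then be trained and its accuracy and run time recorded. Unlike a full grid search, the number of runs grows with the sum, not the product, of the candidate-list lengths.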
Table
We also report evaluation results for adding a max-pooling layer and for various activation functions in the fully connected layer, as shown in Table
Evaluation results on adding a max-pooling layer and various activation functions of the fully connected layer.
Layer | Activation function | Accuracy (%) | Run time (minutes) |
---|---|---|---|
Max-pooling | Sigmoid | 98.42 | 59.91 |
Max-pooling | ReLU | 97.97 | 90.41 |
Fully connected | Tanh | 98.97 | 57.09 |
Fully connected | Sigmoid | 98.82 | 101.83 |
Fully connected | ReLU | 98.62 | |
We also found that increasing or reducing the length of the input vector had little effect on accuracy, as shown in Table
Evaluation results on the different lengths of input vectors.
Length (bytes) | Accuracy (%) | Run time (minutes) | Parameter setting |
---|---|---|---|
200 | 98.40 | 19.46 | filter_dilation = … |
400 | 98.43 | 37.90 | filter_dilation = … |
500 | 97.93 | 13.87 | filter_dilation = … |
600 | 97.99 | 19.08 | filter_dilation = … |
800 | 97.91 | 16.38 | filter_dilation = … |
1000 | | | filter_dilation = … |
1500 | 98.28 | 52.93 | filter_dilation = … |
As stated previously, the purpose of this study was to learn significant features automatically and efficiently from unlabeled raw network traffic using deep learning techniques. Overall, this study shows that the proposed model achieves high performance by learning feature representations from large volumes of unlabeled training samples. The session-based training samples are constructed from parts of the header and payload of network packets. The proposed deep learning method obtained good results on all classification tasks, which provides insight into the feature representations learned from raw traffic. These results indicate that the learned representations are effective at identifying various kinds of malicious traffic while generating few false alarms. The experimental results also show that, contrary to expectations, adding more convolutional autoencoder layers does not significantly improve performance. A possible reason is that the number of hidden units in the first convolutional layer is already sufficient to learn useful feature representations. In addition, the choice of activation function has a large effect on training time; the results suggest that ReLU is a good choice for reducing the run time of the proposed model. Moreover, unlike traditional convolutional autoencoders, our proposed model does not need an additional max-pooling operation.
A limitation of our proposed model is that training takes a comparatively long time. However, this could be addressed with cross-GPU parallelization techniques [
In this paper, we proposed a novel network intrusion detection model based on dilated convolutional autoencoders. The proposed deep learning method automatically learns significant feature representations from large volumes of unlabeled raw traffic data. The Contagio-CTU-UNB and CTU-UNB datasets were created from various kinds of malware traffic. Three kinds of classification tasks were performed to evaluate the proposed model, and we compared our method with other similar deep learning approaches. We further analyzed the effects of several important hyperparameters. The experimental results demonstrate the advantages of our model, which effectively detects complex attacks after learning from large amounts of unlabeled data. Combined with high-performance computing techniques, the model has the potential to meet the requirements of large-scale, real-world network environments.
The authors declare that they have no conflicts of interest.
This work is supported by the National Natural Science Foundation of China under Grants nos. 61379145 and 61105050.