Anomaly detection is a problem with roots dating back over 30 years, and the NSL-KDD dataset has become the conventional benchmark for testing and comparing new or improved models in this domain. In the field of network intrusion detection, the UNSW-NB15 dataset has recently gained significant attention over NSL-KDD because it contains more modern attacks. In the present paper, we outline two cutting-edge architectures that push the boundaries of model accuracy for these datasets, both framed in the context of anomaly detection and intrusion classification. We summarize the training methodologies, hyperparameters, regularization, and other aspects of the model architectures. Moreover, we use the standard deviation of the weight values to design a new regularization technique, embed it in both models, and report the resulting performance. Finally, we detail potential improvements aimed at increasing model accuracy.

The provision of an effective and robust network intrusion detection system (NIDS) remains one of the key challenges of network security. Irrespective of technological advances in the field of NIDS, many deployed solutions still rely on signature-based and less capable methods instead of anomaly detection techniques. Several factors explain this hesitancy to switch, including the high cost associated with high false-alarm rates, obstacles to obtaining valid training data, and the limited longevity of training data. However, the reliability of conventional techniques has proven to be limited, which leads to inaccurate and inefficient detection. The challenge, therefore, is to create a widely accepted anomaly detection technique capable of overcoming the limitations induced by the ongoing changes within modern networks. Efficient, rapid, and effective techniques are required to deal with these issues, and detection effectiveness and accuracy must be improved in depth. The analysis performed by an NIDS must be contextually aware and sufficiently detailed to move toward high-level observation rather than abstract representation. Behavioral attributes must be modeled so that they are easily comprehensible for a network's specific elements, for example, protocols, operating-system versions, individual users, and the diverse nature of the data and the different types of protocols available in modern advanced networks.

This introduces considerable difficulty and complexity, and thus represents the most crucial challenge: tracing the deviation between abnormal and normal behaviors. Due to such difficulties, it remains hard to establish an accurate baseline of normal behavior, which widens the scope for probable exploitation or zero-day attacks.

Recent works have highlighted the application of machine learning (ML) and other existing methods such as support vector machines (SVMs), decision trees, and Naïve Bayes for the detection of network intrusions [

In a broad sense, ML applications have brought efficiency and accuracy to the identification of anomalies in network traffic. However, some deficiencies remain in these methods: data preprocessing requires expert knowledge (e.g., finding important and relevant features in the data), and expert personnel are needed to carry out the task. As such, this not only requires human expertise but is also an error-prone task [

Due to these limitations, deep learning (DL) algorithms have received the highest priority in modern research. DL is an advanced subfield of ML that can address these limitations and resolve problems related to shallow learning. Initially, researchers demonstrated that the layer-wise learning features of DL algorithms perform at least as well as, and often better than, shallow learning [

One of the main aspects of building a deep learning model is regularization. Regularization is an essential component of supervised learning; the most widely used regularization techniques are L1 and L2. The form of the penalty term is the key difference between these techniques. L1 penalizes the loss function by adding the absolute values of the coefficients, which makes it suitable for feature selection or reduction, while L2 penalizes the loss function by adding the squared magnitudes of the coefficients, so that unimportant features receive smaller weights [
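The difference between the two penalty terms can be made concrete with a short sketch (pure Python; the weight values and regularization strength λ below are hypothetical examples, not values from our models):

```python
# Illustrative sketch of the L1 and L2 penalty terms added to a loss.
# The weight vector and lambda below are hypothetical.

def l1_penalty(weights, lam):
    # L1: lambda * sum of absolute weight values (drives weights to
    # exactly zero, hence its use for feature selection/reduction).
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # L2: lambda * sum of squared weight values (shrinks unimportant
    # weights toward zero without zeroing them exactly).
    return lam * sum(w * w for w in weights)

weights = [0.5, -1.0, 0.25]
print(l1_penalty(weights, 0.01))  # 0.01 * 1.75
print(l2_penalty(weights, 0.01))  # 0.01 * 1.3125
```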

The main drawback of these regularizers is their dependence on individual model parameters: the relationship among the entries of the weight matrix is ignored, and only single weight values are controlled.

To address this drawback, we design and implement a new regularization technique as a substitute for the L1 and L2 regularizers. The new regularizer considers the dispersion of the weight values, measured by their standard deviation, unlike the L1 and L2 regularizers, which control only individual weight values without considering the relationship among the entries of the weight matrix.

The merit of the proposed methods lies in the adoption of a new architecture for abnormal-behavior detection systems.

In this paper, we present two efficient models. The first model is based on a feedforward neural network (FNN) and the second on a deep variational autoencoder (VAE). To reduce the error on the given training set and avoid overfitting, we introduce a new regularization technique that takes the standard deviation of the weight matrix as the regularization term. The motivation is to create an adaptive form of weight decay. We then embed it in both models to study their performance. We also train our models in both semisupervised and supervised framings. We then conduct an in-depth analysis of the detection efficiency using different evaluation metrics. Finally, we compare our results with those of other well-known ML techniques.

Our major contributions to the existing literature are provided as follows:

We present the design and implementation of two models based on VAE and FNN using a new regularization algorithm. Furthermore, we present the performance of both models on different benchmark datasets.

We analyze and compare the performances of the proposed models using different evaluation measures such as accuracy, true positive rate (TPR), and F-measure with other ML methods. The experimental results show the effectiveness of the proposed models for anomaly detection.

The rest of this paper is organized as follows. In Section

FNNs are composed of various functions in a graph-like data structure that describes the connectivity among functions. The composition of functions can be denoted in the following manner.

Suppose that we have three different functions f_{1}, f_{2}, and f_{3}, and we let f(x) = f_{3}(f_{2}(f_{1}(x))), with f_{1} being the first layer, f_{2} being the second layer, and so on. The number of functions in this composition is the depth of the neural network model. The final, outermost function is known as the output layer in neural network terminology.

During the training phase of the neural network model, we estimate a function

The training data consists of approximate examples with target output variables
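The layer-as-function-composition view can be illustrated with a toy example (the layer functions below are arbitrary stand-ins, not the trained layers used later in the paper):

```python
# Toy illustration of a depth-3 composition f(x) = f3(f2(f1(x))).
# Each "layer" here is an arbitrary stand-in function.

def f1(x):  # first layer
    return 2 * x

def f2(x):  # second layer
    return x + 1

def f3(x):  # output layer (outermost function)
    return x ** 2

def f(x):
    # The depth of this model is 3: the number of composed functions.
    return f3(f2(f1(x)))

print(f(3))  # f3(f2(6)) = f3(7) = 49
```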

Variational autoencoder (VAE) [

VAEs represent a very promising method, as they integrate variational inference with the use of neural networks as function approximators, searching for the approximate posterior distribution in a way that can be carried out with stochastic gradient descent (SGD) [

As a result, once trained, a VAE can create new data by sampling from this distribution. This is achieved by learning a parametric description of the data that can be chosen to have lower dimensionality than the data itself; this description can therefore be interpreted as a compressed representation of the dataset. In the domain of anomaly detection, the VAE is a natural fit due to its inherent probabilistic nature.
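The probabilistic sampling step at the heart of a VAE is commonly written with the standard reparameterization trick: the encoder outputs a mean μ and a log-variance for each latent dimension, and a sample is drawn as z = μ + σ·ε with ε ~ N(0, 1), which keeps the draw differentiable with respect to μ and σ. A generic sketch (not the exact model trained later in the paper):

```python
import math
import random

def reparameterize(mu, log_var, eps=None):
    # z = mu + sigma * eps, with sigma = exp(log_var / 2).
    # Passing eps explicitly makes the draw reproducible.
    if eps is None:
        eps = random.gauss(0.0, 1.0)
    sigma = math.exp(log_var / 2.0)
    return mu + sigma * eps

# With eps = 0 the sample collapses to the mean of the latent Gaussian.
print(reparameterize(0.5, 0.0, eps=0.0))  # 0.5
```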

Recent research on NIDS has extensively focused on the implementation of shallow learning and ML techniques such as SVM [

A model proposed by [

An emerging branch of ML which has received significant attention is DL. Recently, several studies have extensively employed DL in the field of network intrusion detection, which subsequently brought promising prospects to this realm. In unsupervised framings, DL methods and approaches used in the field of network anomaly detection for feature learning include restricted Boltzmann machines (RBMs), deep neural networks (DNNs), deep belief networks (DBNs), and autoencoders. Erfani et al. [

A DL method based on a DBN of RBMs having four hidden layers to reduce the feature sizes was proposed by Alrwashdeh and Purdy [

A novel method based on the combination of hybrid feature selection and two-stage metaclassifier for intrusion detection was proposed in [

The NSL-KDD essentially shares an identical structure with the old version “KDD Cup’99 dataset” and has five categories that include

Attack classes based on different attack types.

Attack class | Attack types |
---|---|

DoS | back, land, neptune, pod, smurf, teardrop, mailbomb, processtable, udpstorm, apache2, worm |

Probe | ipsweep, nmap, portsweep, satan, mscan, saint |

U2R | buffer-overflow, loadmodule, perl, rootkit, sqlattack, xterm, ps |

R2L | ftp-write, guess-passwd, imap, multihop, phf, spy, warezmaster, xlock, xsnoop, snmpguess, snmpgetattack, httptunnel, sendmail, named |

UNSW-NB15 is a recent and complex dataset collected by the Cyber Security Research Group (CSRG) at the Australian Centre for Cyber Security (ACCS) [

Initially, the amount of data was large (approximately 100 GB); it was collected with the TCP dump and Ixia PerfectStorm tools and consists of normal traffic and various modern attack types. The data was gathered over two simulation periods of 15 and 16 hours, respectively. The dataset contains approximately 2.5 million instances with 42 attributes, extracted using Argus, Bro-IDS, and other advanced algorithms.

There are five feature categories in this dataset:

As is the case with the majority of ML problems, a significant amount of data preprocessing was needed to successfully learn the data representation for both the UNSW-NB15 and NSL-KDD datasets. Both datasets are quite large for the problem and are split into a training set and a test set. For columns containing string values, a label encoder was applied to transform the data into unique integer representations.
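The label-encoding step can be sketched as follows (a minimal pure-Python equivalent of what a library label encoder such as scikit-learn's LabelEncoder does for one column; the protocol values are hypothetical examples):

```python
# Minimal label encoder: maps each distinct string value in a column
# to a unique integer, as done for the string-valued dataset columns.
# The protocol values below are hypothetical examples.

def label_encode(column):
    # Sort the distinct values so the mapping is deterministic.
    mapping = {v: i for i, v in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping

protocols = ["tcp", "udp", "icmp", "tcp"]
encoded, mapping = label_encode(protocols)
print(encoded)  # [1, 2, 0, 1]  (icmp=0, tcp=1, udp=2)
```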

For the sake of exploration, we pursue both the semisupervised and supervised framings of these datasets. The two methods utilized are FNN and deep VAE.

The FNNs are applied in the supervised context and modelled as multiclass classification, with each type of attack being a different class. The hidden and input layers use a swish activation function, and training uses the Adam optimizer with β_{1} = 0.9 and β_{2} = 0.999. A large batch size is selected for training efficiency and to smooth out gradient updates, though recent publications have shown exceptional convergence (at least in certain problem domains) with online and local training scenarios [
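The swish activation used in the input and hidden layers is defined as swish(x) = x · sigmoid(x); a pure-Python illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish(x):
    # swish(x) = x * sigmoid(x): smooth and non-monotonic, close to
    # ReLU for large positive x while allowing small negative outputs.
    return x * sigmoid(x)

print(swish(0.0))            # 0.0
print(round(swish(1.0), 4))  # 0.7311
```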

On the other hand, when training in the semisupervised context, we use an autoencoder. The theory behind autoencoders is fairly straightforward given prior knowledge of DL algorithms. Autoencoders can also be applied to a variety of other interesting problems, among them denoising image data, dimensionality reduction, and even compression [

Regularization is a key component of supervised learning, so we embed our new regularization technique into both models as a substitute for the L1 and L2 regularizers to observe whether it improves the learned data representation. The new regularizer considers the dispersion of the weight values, measured by their standard deviation: instead of adding the absolute values or squared magnitudes of the weights to the loss function, it adds the standard deviation of the weights, thereby restraining the learning model from taking widespread values in the weight space. The mathematical formulation of the new regularizer is given in equations (, where the penalty is computed from the standard deviation of the i^{th} row of the weight matrix.
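Under the assumption that the penalty sums the standard deviations of the rows of the weight matrix, scaled by a coefficient λ (a sketch of the idea, not the paper's exact equations), the regularizer can be written as:

```python
import statistics

def std_regularizer(weight_matrix, lam):
    # Penalty: lam * sum over rows of the (population) standard
    # deviation of each row. Unlike L1/L2, which treat each weight
    # independently, this couples the weights within a row and
    # discourages them from spreading far apart.
    return lam * sum(statistics.pstdev(row) for row in weight_matrix)

W = [[1.0, 1.0, 1.0],   # zero dispersion -> contributes nothing
     [0.0, 2.0, 4.0]]   # pstdev = sqrt(8/3) ~ 1.633
print(round(std_regularizer(W, 0.1), 4))  # 0.1633
```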

The parameter

To evaluate our models, the performance of all classifiers is measured in terms of accuracy, false positive rate (FPR), true positive rate (TPR), precision, and F-score, calculated from the mathematical representations given in equations (
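For reference, these measures reduce to the usual confusion-matrix formulas (a generic sketch; the tp/fp/tn/fn counts below are hypothetical):

```python
# Standard confusion-matrix metrics used to evaluate the classifiers.
# The tp/fp/tn/fn counts below are hypothetical.

def metrics(tp, fp, tn, fn):
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    tpr       = tp / (tp + fn)   # true positive rate (recall)
    fpr       = fp / (fp + tn)   # false positive rate
    precision = tp / (tp + fp)
    f_score   = 2 * precision * tpr / (precision + tpr)
    return accuracy, tpr, fpr, precision, f_score

acc, tpr, fpr, prec, f1 = metrics(tp=90, fp=10, tn=95, fn=5)
print(round(acc, 3), round(f1, 3))  # 0.925 0.923
```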

To carry out the simulations, a machine with a Core i7 processor, 16 GB of RAM, and a 64-bit Linux operating system is used. The implementation is done in Python 2.7 with

Training and testing time for each model.

Model | Dataset | Training time (sec) | Testing time (sec) |
---|---|---|---|

Proposed FNN | NSL-KDD | 700.5 | 436.01 |

Proposed VAE | NSL-KDD | 1203.7 | 545.3 |

Proposed FNN | UNSW-NB15 | 423.8 | 264.2 |

Proposed VAE | UNSW-NB15 | 689.8 | 360.1 |

Due to various optimizations and cutting-edge methods, both models achieved results equal to, close to, or better than those of previous state-of-the-art methods for this problem. Each model is trained on both the NSL-KDD and UNSW-NB15 datasets using the train-test split method and explicitly tested on the test set provided by the dataset

High F-score values indicate that both models are also precise and effective at detecting anomalies in network traffic. For our first model, the classifier converges to above 95% accuracy on the validation data after approximately 50 epochs and starts to diverge after approximately 75. Even given this highly accurate model, there remains room for further improvement.

The autoencoder converged to the same accuracy on the validation set (after approximately 70 epochs) and diverged shortly thereafter. The autoencoders with the embedded regularizer oscillated slightly more chaotically during training, likely due to the difference in both the problem and the feature set compared to the classifier (along with different hyperparameters, regularizers, and activation functions).

Likewise, after embedding the new regularizer to both aforementioned models, we observed up to 1.7% improvement in average validation accuracy.

The feedforward model was trained for 100 epochs, and its average testing accuracy is 96.7% and 94.7% on the NSL-KDD and UNSW-NB15 datasets, respectively. The progressions of training and validation accuracy for the FNN models on both datasets are shown in Figures

Training and validation accuracy on the NSL-KDD dataset using the FNN model.

Training and validation accuracy on the UNSW-NB15 dataset using the FNN model.

Comparison of the proposed models’ results with existing approaches used on the UNSW-NB15 dataset.

Model | No. of features | Classifier | Accuracy | FPR | TPR | Precision | F-score |
---|---|---|---|---|---|---|---|

Proposed FNN | 42 | DNN | 94.7 | 1.04 | 94.24 | 86.76 | 89.75 |

Proposed VAE | 42 | Deep VAE | 93.3 | 0.93 | 95.21 | 87.9 | 90.2 |

Two-stage ensemble [ | — | Two-stage meta | 91.72 | 8.90 | 91.30 | 91.60 | — |

GALR-DT [ | 20 | DT | 81.42 | 6.39 | — | — | — |

NAWIR [ | 42 | AODE | 83.47 | 6.57 | 98.5 | — | — |

[ | 5 | RF | 81.61 | 4.40 | 81.6 | — | 79.5 |

Standard MLP [ | 42 | Softmax | 81.30 | 21.15 | — | — | — |

TSDL [ | 10 | Softmax | 89.13 | 0.74 | 63.27 | — | — |

— | — | NB | 82.07 | 18.56 | — | — | — |

— | — | DT | 85.56 | 15.78 | — | — | — |

— | — | ANN | 81.34 | 21.13 | — | — | — |

— | — | LR | 83.15 | 18.48 | — | — | — |

[ | 42 | EM | 78.47 | 23.79 | — | — | — |

Comparison of the proposed models’ results with existing methods on the NSL-KDD dataset.

Model | Accuracy | FPR | TPR | Precision | F-score |
---|---|---|---|---|---|

Proposed FNN | 96.7 | 0.64 | 95.86 | 88.2 | 90.96 |

Proposed VAE | 97.01 | 0.83 | 95.42 | 87.9 | 91.3 |

Two-stage ensemble [ | 85.79 | 11.7 | 86.8 | 88.0 | — |

DBN [ | 80.58 | 19.42 | 80.58 | — | 84.08 |

S-NDAE | 85.42 | 14.58 | 85.42 | — | 87.37 |

SVM [ | 86.22 | — | — | — | 89.30 |

Ensemble | 90.45 | — | — | — | 93.91 |

Multilayer | 91.98 | — | — | — | 94.36 |

DBN + SVM | — | — | 92.17 | — | 94.65 |

STL-IDS [ | 80.48 | — | 76.56 | — | 79.07 |

Similarly, for the VAE models with the embedded regularizer, we achieved average testing accuracies of 97.01% and 93.3% on the NSL-KDD and UNSW-NB15 datasets, respectively. The progressions of training and validation accuracy for the VAE models on both datasets are shown in Figure

Training and validation accuracy on NSL-KDD dataset using VAE model.

Training and validation accuracy on UNSW-NB15 dataset using VAE model.

From Figures

Considering the FPR and TPR values in Tables

We introduced the design and implementation of two models employing a new regularization technique that meets or exceeds previous bests on the NSL-KDD and UNSW-NB15 datasets in both the classification and anomaly detection domains. The new models are tested on several datasets available in the network security domain (i.e., NSL-KDD and UNSW-NB15). The simulation results show that the new models perform better than other methods. However, there are many ways in which one could alter model optimization to further increase test accuracy. Firstly, more could be done with data preprocessing and feature selection. For example, one could apply PCA to extract principal components, use a low-variance filter, or find a different method to select important features. Additionally, with domain knowledge, one could engineer more features that may increase model effectiveness. Regarding the models, a key area is hyperparameter tuning, either manually or via an algorithm. Furthermore, experimenting with other regularizers, optimizers, and activation functions could also be worth investigating. Overall, our proposed models performed well and achieved satisfactory performance measures compared to existing state-of-the-art methods. For comparison purposes, Tables

Datasets used to support the findings of this study are included within the article.

This article does not contain any studies with human participants performed by any of the authors.

The authors declare that there are no conflicts of interest.

This work was supported by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, under Grant no. (D-432-611-1441). The authors, therefore, gratefully acknowledge DSR’s technical and financial support.