A Universal High-Performance Correlation Analysis Detection Model and Algorithm for Network Intrusion Detection System

In the big data era, single detection techniques can no longer meet the demands posed by complex network attacks and advanced persistent threats, yet there is no uniform standard that lets different correlation analysis detection methods be performed efficiently and accurately. In this paper, we put forward a universal correlation analysis detection model and algorithm by introducing state transition diagrams. After analyzing and comparing the current correlation detection modes, we formalize the correlation patterns, propose a framework based on data packet timing and behavior qualities, and then design a new universal algorithm to implement the method. Finally, an experiment that sets up a lightweight intrusion detection system using the KDD1999 dataset shows that the correlation detection model and algorithm can improve performance while maintaining high detection rates.


Introduction
(A) Background. Intrusion detection is a technology that recognizes intrusions by collecting and analyzing information about the protected system [1]. Its crucial functions are monitoring the Internet and computer systems, discovering and distinguishing intrusion behaviors or attempts, and generating intrusion alarms in real time [2]. Intrusion detection can be thought of as a binary classification technique that distinguishes whether the system state is "normal" or "attack" [3]. The first requirement of an intrusion detection system is the detection rate, that is, detection accuracy, followed by real-time performance. Only with high detection speed can the system process the massive data transmitted over the Internet in time, avoid the information loss caused by low speed [4], which leads to false negatives and false positives, and minimize the losses brought by the intrusion. However, as the means of network attack diversify in kind and grow in number, a low detection rate has become a key issue for intrusion detection systems [5]. In addition, the traditional intrusion detection system detects slowly and consumes large amounts of resources. As network speeds increase quickly, it cannot process the massive data transmitted in real time, resulting in a large increase in the false positive rate and the false negative rate [6]. These problems are becoming more and more serious. Detection rate and detection speed have become important indicators of the real-time requirements of intrusion detection systems [7]. How to build an intrusion detection system with a high detection rate and a high detection speed has become the focus of current research. Figure 1 gives an overview of a universal network intrusion detection framework. A key point in this figure is the use of deep analysis modules to process the associated events.
The deep analysis module plays an important role in the intrusion detection system. The data of the deep analysis modules come from two parts: one part is the result of detection on the upper layer and the other part is the raw data. Various data packets or events are processed by correlation algorithms, such as correlation detection of event frequency, correlation detection for multiple parallel events, and so on. The performance of correlation detection directly influences the detection rate and detection speed of the intrusion detection system. However, the factors that influence the correlation detection result are various, so it is difficult to extract a unified correlation detection algorithm. Therefore, it is necessary to find a universal correlation detection algorithm to increase detection rate and speed.
(B) Related Works. In order to detect anomalies in a network, correlated parameters from different layers should be combined [8]. Some papers focus on building a new hierarchical framework for intrusion detection as well as on data processing based on feature classification and selection [9][10][11].
(C) Contribution. In this paper, we propose a novel method to increase the detection rate of the intrusion detection system and improve its detection speed. This method is a correlation analysis detection model based on data packet timing and behavior quality, aiming to solve the problems of versatility, consistency, and integrity in packet detection. This method enables us to overcome the disadvantages of the traditional intrusion detection system.
The rest of this paper is organized as follows. In Section 2, we briefly analyze and compare the current common data packet correlation detection modes. Section 3 presents the generating process of the algorithm in detail. In Section 4, we present the detection process for the intrusion detection system and report some experiments. Finally, we conclude the paper in Section 5.

System Overview
In an intrusion detection system, describing threatening events for a single session by only a single feature produces false positives, because a single feature cannot accurately capture the behavior features of attack events. Moreover, some papers have pointed out that there is relevance among different attack events [26]. If every session is analyzed separately, we cannot identify the attack behavior exactly, whereas when we consider the related sessions together, we can identify an attack event completely. Nowadays, the majority of correlation detection methods in intrusion detection systems are as follows: correlation detection of event frequency, correlation detection for multiple parallel events, correlation detection for multiple serial events, correlation detection based on the source IP of the event, correlation detection based on the destination IP of the event, correlation detection based on both the source and the destination IP of the event, and correlation of sessions [27,28]. Traditional correlation detection shows more and more weaknesses, such as low detection rate and poor accuracy. Thus, we put forward a unified correlation detection algorithm and build a data packet correlation detection model based on data packet timing and behavior quality, aiming to solve the problems of versatility, consistency, and integrity in intrusion detection. Figure 2 gives an overview of the data packet correlation detection modes.
According to the behavior features, there are two kinds of intrusion events: one is unconditional trigger and the other is conditional trigger.
(i) Unconditional trigger: correlation detection based on the order in which events occur, including correlation detection of event frequency, correlation detection for multiple parallel events, and correlation detection for multiple serial events.
(ii) Conditional trigger: correlation detection based on behavior features, including single behavior features and complex behavior features.

Correlation Analysis Detection Model
In this section, we present the correlation analysis detection model as follows.

Concept Definition
Definition 1 (distributed packet flow).The time series is given as  = ⟨ (i)  is the timestamp of the original event, that is, the time node of the event on the time series.
(ii)  is the behavioral characteristics of the original event.
Definition 3. The LAMBDA syntax is used to define the different relationships between primitive events $e_0, e_1, \ldots, e_n$.
(i) Existence: $e_i$ indicates whether the event $e_i$ exists or not.
(ii) Parallel: $e_1 \mid e_2$ indicates that the events $e_1$ and $e_2$ are in a parallel relation.
(iii) Serial: $e_1 ; e_2$ indicates that the events $e_1$ and $e_2$ are in a serial relation.
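The three relationships above can be illustrated with a minimal Python sketch. It assumes that a primitive event is a pair of a timestamp and a behavior label; the `Event` class, the function names, and the `tolerance` parameter are hypothetical and are not part of the LAMBDA syntax itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # position of the event on the time series
    behavior: str     # behavioral characteristic (hypothetical string label)


def exists(events, behavior):
    """Existence: True if at least one event carries the behavior."""
    return any(e.behavior == behavior for e in events)


def parallel(events, b1, b2, tolerance=0.0):
    """Parallel: some b1 event and some b2 event occur at the same time
    (timestamps differ by at most `tolerance`, an assumed parameter)."""
    t1 = [e.timestamp for e in events if e.behavior == b1]
    t2 = [e.timestamp for e in events if e.behavior == b2]
    return any(abs(a - b) <= tolerance for a in t1 for b in t2)


def serial(events, b1, b2):
    """Serial: some b1 event occurs strictly before some b2 event."""
    t1 = [e.timestamp for e in events if e.behavior == b1]
    t2 = [e.timestamp for e in events if e.behavior == b2]
    return bool(t1) and bool(t2) and min(t1) < max(t2)


# Illustrative flow; the behavior labels are made up for this example.
flow = [
    Event(0.0, "scan"),
    Event(1.0, "login_fail"),
    Event(1.0, "probe"),
    Event(2.0, "exploit"),
]
```

With this flow, `parallel(flow, "login_fail", "probe")` and `serial(flow, "scan", "exploit")` both hold, while `serial(flow, "exploit", "scan")` does not.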
Definition 4. The initial state of the system in the state diagram is $S_0$, the intermediate state is $S_i$, and $S_n$ is the termination state. Each node in the following graphs represents the current state of the event, and if there is only one transition condition, an arrow arc exists between the two nodes to indicate the transition of the event state. The mark on the arc represents the transition condition.

Unconditional Trigger Type
(i) Correlation Detection of Event Frequency. This method directly detects the original event that contains the threat behavior feature and then performs response processing.
The state transition relationship is The state diagram is shown in Figure 3.
(ii) Correlation Detection for Multiple Parallel Events. Some threat behaviors can be detected when multiple events occur at the same time. The state transition relationship is The state diagram is shown in Figure 4.
(iii) Correlation Detection for Multiple Serial Events. Some threat behaviors can be detected when multiple events occur in sequence.
The state transition diagram is shown in Figure 5.
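To make the unconditional trigger modes more concrete, the following Python sketch shows two small state machines: one for event frequency and one for serial events. It is an illustration under stated assumptions, not the implementation used in the paper. The class names, the sliding-window definition of frequency, and the choice to ignore unrelated events in the serial case are all assumptions.

```python
from collections import deque


class FrequencyDetector:
    """Fires when `threshold` events fall within a sliding time window.

    Assumes timestamps arrive in non-decreasing order.
    """

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.times = deque()

    def observe(self, timestamp):
        self.times.append(timestamp)
        # Drop events that are older than the window.
        while timestamp - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) >= self.threshold


class SerialDetector:
    """Advances one state per matching behavior and fires at the final state.

    `sequence` must be non-empty. Behaviors that do not match the next
    expected step are ignored (an assumed policy).
    """

    def __init__(self, sequence):
        self.sequence = list(sequence)
        self.state = 0  # index of the next expected behavior

    def observe(self, behavior):
        if behavior == self.sequence[self.state]:
            self.state += 1
        if self.state == len(self.sequence):
            self.state = 0  # return to the initial state after firing
            return True
        return False


freq = FrequencyDetector(threshold=3, window=10.0)
freq_alarms = [freq.observe(t) for t in [0, 4, 20, 21, 22]]

ser = SerialDetector(["scan", "login", "exploit"])
ser_alarms = [ser.observe(b) for b in ["scan", "noise", "login", "exploit", "scan"]]
```

In this example the frequency detector fires only on the last timestamp, when three events lie within ten time units, and the serial detector fires when "exploit" completes the expected sequence.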

Conditional Trigger Type
(i) According to the Single-Event Feature of the Event Correlation Detection. Some threat behaviors can be detected when multiple events simultaneously satisfy a certain behavioral characteristic.
The state transition relationship is The state transition diagram is shown in Figure 6.
(ii) According to the Feature of Composite Behavior of the Correlation Detection. Some threat behaviors can be detected when multiple events simultaneously satisfy the composite behavioral characteristics.
The state transition relationship is The state transition diagram is shown in Figure 7.
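The conditional trigger modes can be sketched in Python as a grouping step followed by a predicate check. The sketch assumes that events are dictionaries, that they are grouped by a key such as the source IP, and that a behavioral characteristic is a boolean predicate. The function name, field names, and threshold are hypothetical. A single-feature detection passes one predicate, and a composite detection passes several.

```python
from collections import defaultdict


def correlate_by_feature(events, key, predicates, min_events):
    """Group events by `key` and keep the groups in which at least
    `min_events` events satisfy every predicate."""
    groups = defaultdict(list)
    for event in events:
        if all(pred(event) for pred in predicates):
            groups[event[key]].append(event)
    return {k: v for k, v in groups.items() if len(v) >= min_events}


# Illustrative events; the field names are assumptions for this example.
events = [
    {"src_ip": "10.0.0.5", "dst_port": 22, "failed": True},
    {"src_ip": "10.0.0.5", "dst_port": 22, "failed": True},
    {"src_ip": "10.0.0.5", "dst_port": 80, "failed": False},
    {"src_ip": "10.0.0.9", "dst_port": 22, "failed": True},
]

# Single behavior feature: repeated failures from one source.
single = correlate_by_feature(events, "src_ip", [lambda e: e["failed"]], 2)

# Composite behavior feature: repeated failures that also target port 22.
composite = correlate_by_feature(
    events,
    "src_ip",
    [lambda e: e["failed"], lambda e: e["dst_port"] == 22],
    2,
)
```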

Detection Algorithm Generation.
According to the data packet correlation detection mode and state diagram analysis, this paper proposes the data packet correlation detection model, which can be used to detect anomaly or original data packet in intrusion detection system to improve system detection rate and reduce detection time.

Correlation Detection Formula
Definition 1. $S_0$ indicates the initial state of the detection system, $S_i$ indicates the system state detected at any time, $b_i$ represents the behavioral characteristics of the event, and $S_n$ indicates the termination state of the detection system.
Definition 2.   (  |   ) indicates that the input behavior attributes   resulting in changes in system status, and the detection system termination status is   .
According to the above formula definition and state diagram, packet correlation detection formula can be as follows: (6)

Formula Proof
(1)  0 |   , that is, indicates that the detection system starts from  0 and ends at final state   after a single behavior of indicates that the detection system is parallel to multiple events  0 ,  1 , . . .,   and ends at final state   after a single behavior of   .

Detection Algorithm Proposed.
Based on the detection algorithm and the formula proposed above, this paper proposes the data packet correlation detection model. Figure 8 gives a formal representation of the data packet correlation detection model by state diagram. A key point in this figure is the use of deep analysis modules to process the associated events.
When the anomaly detection results or original data packets flow into the deep analysis module of the intrusion detection system, the data packet correlation detection model based on timing and behavior features can detect the events in a targeted and thorough way with the existing detection modes. Compared with the traditional single detection model, this algorithm increases detection speed and precision. Figure 9 shows the detection framework of this algorithm.
The first layer of this model detects timing characteristics (e.g., sniffing attacks in sequence) and behavior characteristics (e.g., continuously attacking a certain IP address). Deep analysis takes place in the second layer, which performs correlation detection that combines timing and behavior characteristics and aims at detecting various persistent concealed attack behaviors. There is no need to examine data flows by behavior feature in a fixed order, and the model can detect attack behaviors comprehensively, simplifying the traditional detection modes.

Lightweight Intrusion Detection System Based on Correlation Analysis Detection Model
The flow diagram of deep analysis in the lightweight intrusion detection system based on the correlation analysis detection model is shown in Figure 10. The crucial part of the diagram is correlation detection. First, it verifies ports and looks up the flow correlation table. Second, it uses the correlation analysis detection model for detection and performs DPI and DFI identification. Finally, timing and behavior features are written into the correlation table and the results are returned.
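The three steps of the deep analysis flow can be sketched in Python as follows. This is a simplified illustration and not the system's actual code. It assumes that the correlation table is a dictionary keyed by a hypothetical flow key, that port verification means checking the destination port against a monitored set, and that detection is supplied as a callback. DPI and DFI identification are not modeled.

```python
def deep_analysis(packet, table, monitored_ports, detect):
    """One pass of the deep analysis flow for a single packet.

    Returns None if the port is not monitored, otherwise the boolean
    result of the `detect` callback.
    """
    # Step 1: verify the port and look up the flow in the correlation table.
    if packet["dst_port"] not in monitored_ports:
        return None
    flow_key = (packet["src_ip"], packet["dst_ip"], packet["dst_port"])
    history = table.setdefault(flow_key, [])

    # Step 2: run correlation detection on the flow history and the packet.
    alarm = detect(history, packet)

    # Step 3: write timing and behavior features back and return the result.
    history.append((packet["timestamp"], packet["behavior"]))
    return alarm


# Hypothetical detector: alarm once two earlier events exist for the flow.
def detect(history, packet):
    return len(history) >= 2


table = {}
base = {
    "src_ip": "10.0.0.5",
    "dst_ip": "10.0.0.1",
    "dst_port": 22,
    "behavior": "login_fail",
}
results = [
    deep_analysis(dict(base, timestamp=t), table, {22}, detect)
    for t in (0.0, 1.0, 2.0)
]
skipped = deep_analysis(dict(base, dst_port=8080, timestamp=3.0), table, {22}, detect)
```

Keeping the table keyed by flow lets each packet be judged against the timing and behavior history of its own flow, which is the role the correlation table plays in the description above.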
In this part of the paper, we analyze and compare the traditional intrusion detection system and the intrusion detection system based on the correlation analysis detection model on the KDD1999 data set, in which each record consists of 41 features. We then compare the resulting detection rate and detection time.
All verification work in this paper is based on the KDD1999 data set. Before the experiment, we preprocess the KDD1999 data set to meet the experiment requirements. The experiment environment is the Windows 7 operating system, and the hardware is a quad-core Intel Core i7 processor at 3.2 GHz with 4096 MB of RAM.

Experimental Program.
After the analysis and processing of the KDD1999 data set described above, the number of instances in the training set is still very large. To make the experiment more convenient, we select data to reduce the number of instances. We randomly sample the DOS, PROBE, R2L, U2R, and NORMAL parts of the training set separately and ensure that these samples are consistent with the original sample. Then we combine them to form 5 new training data sets: the combination of NORMAL and DOS, the combination of NORMAL and PROBE, the combination of NORMAL and R2L, the combination of NORMAL and U2R, and the combination of NORMAL, DOS, PROBE, R2L, and U2R. The number of instances in the 5 new training sets is 98630. These 5 training sets are fed into the traditional intrusion detection system and the intrusion detection system based on the data packet correlation detection model. We compare the performance of the two systems on the same data source in terms of detection rate, detection time, and so on.
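The per-category sampling step can be sketched in Python as a stratified random sample. The sketch assumes that "consistency with the original sample" means keeping the same fraction of every category; the paper does not state the exact procedure. The function name, the `fraction` and `seed` parameters, and the toy records are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, label_of, fraction, seed=0):
    """Sample the same fraction from every category, keeping at least one
    record per category."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_label = defaultdict(list)
    for rec in records:
        by_label[label_of(rec)].append(rec)
    sample = []
    for group in by_label.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Toy records of the form (category, id); the sizes are illustrative only.
records = (
    [("NORMAL", i) for i in range(100)]
    + [("DOS", i) for i in range(50)]
    + [("U2R", i) for i in range(3)]
)
sample = stratified_sample(records, lambda r: r[0], fraction=0.2)
counts = Counter(label for label, _ in sample)
```

Sampling each category separately keeps rare categories such as U2R from disappearing from the reduced training set.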

Experimental Results and Analysis.
The experiment result is shown in Table 2. The detection rate rises sharply in the intrusion detection system based on the data packet correlation detection model, and the detection time decreases, which improves the efficiency of the detection system. Therefore, the intrusion detection system based on the correlation analysis detection model in this paper achieves a high detection rate for both known and unknown attacks, improving the performance of the intrusion detection system.
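For reference, the detection rate used in such comparisons can be computed as in the Python sketch below. It assumes the common definitions, which the paper does not spell out: detection rate is the share of attack instances flagged as attacks, and false positive rate is the share of normal instances flagged as attacks. The function name and the labels are hypothetical, and the toy data are not the paper's results.

```python
def detection_metrics(y_true, y_pred, attack_label="attack"):
    """Return (detection_rate, false_positive_rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == attack_label and p == attack_label)
    fn = sum(1 for t, p in pairs if t == attack_label and p != attack_label)
    fp = sum(1 for t, p in pairs if t != attack_label and p == attack_label)
    tn = sum(1 for t, p in pairs if t != attack_label and p != attack_label)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_positive_rate


# Toy labels for illustration only.
y_true = ["attack", "attack", "normal", "normal", "attack"]
y_pred = ["attack", "normal", "normal", "attack", "attack"]
rate, fpr = detection_metrics(y_true, y_pred)
```

Here two of the three attacks are detected and one of the two normal instances is flagged, giving a detection rate of 2/3 and a false positive rate of 1/2.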

Conclusions
In this paper, we build a high-performance correlation analysis detection model, which aims to resolve the problems of low detection rate and slow detection speed.
For the intrusion detection system, we put forward a universal network intrusion detection framework. We also analyze and compare the current common correlation intrusion detection modes. Finally, we propose a data packet correlation detection model and algorithm based on data packet timing and behavior characteristics. In the experiments, this correlation detection model improves performance over the traditional approach. The model presented in this paper has good practical value for currently popular intrusion detection systems.

Figure 3: The state diagram of a correlation detection of event frequency.

Figure 4: The state diagram of correlation detection for multiple parallel events.

Figure 6: The state transition diagram of the single-event feature of the event correlation detection.

Data Set. KDD1999 is a standard data set used for intrusion detection tests. The KDD1999 dataset consists of a total of 5 million records, and it also provides a 10% training data set and a test data set. There are 494021 instances in the training data set and 41 features in each instance, while there are 311029 instances in the test data set. We divide the training data set into 5 parts: DOS attack, PROBE attack, R2L attack, U2R attack, and NORMAL, where NORMAL means normal data without attacks. There are 6 different kinds of attacks in DOS, 4 kinds of attacks in PROBE, 8 kinds of attacks in R2L, and 4 kinds of attacks in U2R. The number of NORMAL instances is 97278 in the whole training data set. According to Table 1, there are 22 attack types in total. As for the KDD1999 test data set, we divide every type of attack into 2 parts according to the same principle: known attack and unknown attack. A known attack is an attack type that has appeared in the training data set, while an unknown attack is an attack type that has not appeared in the training data set. There are 39 attack types in total in the test data set: 10 types in DOS, of which 4 are unknown attacks; 6 types in PROBE, of which 2 are unknown attacks; 15 types in R2L, of which 7 are unknown attacks; and 8 types in U2R, of which 4 are unknown attacks. These attack types are listed in Table 1. There are 4166 instances in PROBE attack, of which 2377 are known attack instances, accounting for 57.1% of the PROBE instances; 1789

Table 1: KDD1999 sample category distribution and partial attack type statistics.

Table 2: The experiment result.