A Model to Partly but Reliably Distinguish DDOS Flood Traffic from Aggregated One

Reliably distinguishing DDOS flood traffic from aggregated traffic is highly desirable for the reliable prevention of DDOS attacks. By reliable distinguishing, we mean that flood traffic can be distinguished from aggregated traffic with a predetermined probability. The basis for reliably distinguishing flood traffic from aggregated traffic is the reliable detection of signs of DDOS flood attacks. As is known, reliably distinguishing DDOS flood traffic from aggregated traffic is a tough task, mainly due to the effects of flash-crowd traffic. For this reason, this paper studies reliable detection in a DiffServ network that uses static-priority schedulers. In this network environment, we present a method for reliably detecting signs of DDOS flood attacks for a given class with a given priority. Two assumptions are introduced in this study. One is that flash-crowd traffic has some, but not all, priorities. The other is that attack traffic has all priorities in all classes; otherwise, an attacker cannot completely achieve its DDOS goal. Further, we suppose that the protected site is equipped with a sensor that has a signature library of the legitimate traffic with the priorities that flash-crowd traffic does not have. Based on these, we are able to reliably distinguish attack traffic from aggregated traffic with the priorities that flash-crowd traffic does not have, according to a given detection probability.


Introduction
Attackers may take advantage of the principles [1] of distributed systems (i.e., the Internet), such as openness, resource sharing, accessibility, and so on, to launch distributed denial of service (DDOS) attacks. The threats of DDOS attacks to individuals are severe. For instance, any denial of service of a bank server implies a loss of money and disgruntled or lost customers.
According to the classification of the CERT Coordination Center (CERT/CC), DDOS attacks are divided into three categories [2]: (1) flood (i.e., bandwidth) attacks, (2) protocol attacks, and (3) logical attacks. This paper considers flood attacks. DDOS flood attacks consume resources (e.g., bandwidth) by sending flood packets in order to shut down the target or significantly degrade its performance. The flood packets may be generated by hundreds or thousands of machines distributed all over the world.
A network-based intrusion detection system (IDS) monitors the traffic on its network as a data source [3]. In this regard, there are two main approaches: one is misuse detection and the other is anomaly detection. Solutions given by misuse detection are primarily based on a library of known signatures to match against network traffic. Hence, unknown signatures from new variants of an attack mean 100% false negatives (misses). As a matter of fact, the form in which an attack takes place is usually determined by a large number of details, many of which are unknown. This is particularly true for DDOS attacks [4]. Hence, anomaly detectors play a role in DDOS detection [2, 3, 5-12]. However, anomaly detectors cannot replace signature-based systems [2, 3]. From a practical view, therefore, the combination of a signature-based system and an anomaly detector is worth noting [2].
A traffic series is a packet flow. A packet consists of a number of fields, such as protocol, source IP, destination IP, ports, flag settings (in the case of TCP or UDP), message type (in the case of ICMP), timestamp, and length (packet size). Each may serve as a feature of a packet for statistical detection purposes; see, for example, [8, 13-15]. In addition, there are other available features of traffic, such as flow rate [16], the number of connections [17], and so on [6, 11, 12]. This paper takes the traffic series in packet size (traffic series for short) as the monitored object.
Usually, detection methods are expected to be adaptable to a wide range of network environments (e.g., [7, 8, 11-17]). Nevertheless, detection methods that are environment dependent are also worth studying. This paper studies detecting signs of DDOS flood attacks in a network that uses static-priority schedulers.
As is known, two tough issues in detecting DDOS flood attacks are (1) reliable detection, as can be seen from [2, 3, 5, 7, 9, 10], and (2) distinguishing attack traffic from aggregated traffic [7, 9, 16]. The solution to the first issue is crucial to practical applications because false positives can lead to inappropriate responses that cause denial of service to legitimate traffic. In addition, it is the basis for solving the second.
It is noted that flash-crowd traffic and DDOS flood traffic may have similar statistics from a network view. DDOS floods are malicious, whereas flash crowds are legitimate. Flash crowds happen when a huge number of users try to access the same server simultaneously for some specific event (e.g., the NASA Pathfinder mission [16]). Because an attacker aims at attacking the target so that it denies service to all legitimate traffic, we assume that DDOS flood traffic has all priorities in all classes. On the other hand, according to the nature of differentiated services, we assume that flash-crowd traffic does not have all priorities. Further, we suppose that the protected site is equipped with a sensor that has a signature library of the legitimate traffic with the priorities that flash crowds do not have. Under these assumptions, DDOS flood attack traffic can be distinguished, according to a given detection probability, from aggregated traffic with the priorities that flash crowds do not have.
The rest of the paper is organized as follows. Section 2 introduces the randomized traffic regulator for feature extraction of arrival traffic. Section 3 presents the detection principle. A case study is demonstrated in Section 4, discussions are given in Section 5, and conclusions are drawn in Section 6.

Traffic Regulator and Its Randomization
There are two major areas of traffic modeling. One is based on random processes; see, for example, [6, 8, 18-30]. The other is deterministic modeling, for example, the traffic regulator [18, 30-33]. We use the traffic regulator to characterize traffic in this research.

Definition 2.1 (see [31, 33]). Let $y(t)$ be the instantaneous rate of arrival traffic at time $t$. Then, the amount of traffic generated in the interval $[t_1, t_2]$ is upper bounded by

$$\int_{t_1}^{t_2} y(t)\,dt \le \sigma + \rho\,(t_2 - t_1),$$

where $\sigma$ and $\rho$ are constants and $t_2 > t_1$. This property is written as $y \sim (\sigma, \rho)$, which is called a traffic regulator. Practically, traffic is considered in the discrete case on an interval-by-interval basis; Definition 2.2 generalizes Definition 2.1 accordingly.
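To make Definition 2.1 concrete, the following Python sketch checks it on a sampled series. It assumes (this is not from the paper) that $y$ is given as the amount of traffic in each unit time slot, so the integral becomes a sum over slots $t_1, \ldots, t_2 - 1$; `min_sigma` and `conforms` are hypothetical helper names. For a fixed $\rho$, the smallest admissible $\sigma$ is the largest excess of traffic over $\rho\,(t_2 - t_1)$ across all intervals, which a single scan over prefix sums finds.

```python
from itertools import accumulate


def min_sigma(y, rho):
    """Smallest sigma >= 0 with sum(y[t1:t2]) <= sigma + rho * (t2 - t1)
    for all 0 <= t1 < t2 <= len(y): a discrete reading of Definition 2.1
    with unit-length time slots (an assumption)."""
    prefix = [0.0, *accumulate(y)]      # prefix[t] = y[0] + ... + y[t-1]
    best = 0.0
    lowest = prefix[0]                  # min over t1 < t2 of prefix[t1] - rho*t1
    for t2 in range(1, len(prefix)):
        g = prefix[t2] - rho * t2
        best = max(best, g - lowest)    # largest excess ending at t2
        lowest = min(lowest, g)
    return best


def conforms(y, sigma, rho):
    """True if the sampled series satisfies y ~ (sigma, rho)."""
    return min_sigma(y, rho) <= sigma
```

For example, with `y = [5, 1, 1, 5]` and $\rho = 2$, the whole series carries 12 units over 4 slots, so $\sigma$ must be at least $12 - 8 = 4$, and `min_sigma` reports exactly that.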
Definition 2.2. Let $y(t)$ be the instantaneous rate of arrival traffic at $t$. Then, the amount of traffic generated in the $n$th interval $[(n-1)I, nI]$ ($n = 1, 2, \ldots, N$) is upper bounded by

$$\int_{(n-1)I}^{nI} y(t)\,dt \le \sigma(I, n) + \rho(I, n)\,I,$$

where $(\sigma(I, n), \rho(I, n))$ represents the traffic regulator in the $n$th interval and $I$ is a positive real number.
For simplicity, denote $F(I, n) = \sigma(I, n) + \rho(I, n)\,I$.
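Definition 2.2 only bounds the traffic in each interval; it does not by itself fix how $F(I, n)$ is obtained from data. One simple choice, assumed here purely for illustration, is the tightest bound: $F(I, n)$ equals the traffic actually observed in the $n$th interval. The sketch below computes that estimate from a series sampled once per time unit; `regulator_values` is a hypothetical name.

```python
def regulator_values(y, I):
    """Hypothetical estimate of F(I, n), n = 1, ..., N, for a rate series y
    sampled once per time unit. Assumes the tightest bound, i.e. F(I, n) is
    the traffic actually generated in the nth interval of length I."""
    if I < 1:
        raise ValueError("I must be a positive integer")
    N = len(y) // I                     # only complete intervals are used
    return [sum(y[(n - 1) * I : n * I]) for n in range(1, N + 1)]
```

With `y = [1, 2, 3, 4, 5, 6, 7]` and $I = 2$, the three complete intervals give $F(2, n) = 3, 7, 11$; the trailing sample is ignored. Only the sum $\sigma(I, n) + \rho(I, n)\,I$ is identified this way, not its split into $\sigma$ and $\rho$.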
Definition 2.3. Let $y^{(i)}_{p,j,k}(t)$ be the instantaneous rate of all flows of class $i$ with priority $p$ going through server $k$ from input link $j$ at $t$. Then, the amount of $y^{(i)}_{p,j,k}(t)$ generated in the $n$th interval $[(n-1)I, nI]$ ($n = 1, 2, \ldots, N$) is upper bounded by $F^{(i)}_{p,j,k}(I, n)$. That is,

$$\int_{(n-1)I}^{nI} y^{(i)}_{p,j,k}(t)\,dt \le F^{(i)}_{p,j,k}(I, n).$$

Definition 2.3 provides a feature of arrival traffic $y^{(i)}_{p,j,k}(t)$ on an interval-by-interval basis. Theoretically, $I$ can be any positive real number. In practice, however, $I$ is selected as a finite positive integer.
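Definition 2.3 applies the same per-interval bound separately to each combination of class $i$, priority $p$, input link $j$, and server $k$. A minimal Python sketch of that bookkeeping follows, again taking $F^{(i)}_{p,j,k}(I, n)$ as the bytes observed in $[(n-1)I, nI)$ (an assumption); the `Packet` record and its field names are hypothetical, not a format defined in the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical packet record; field names are illustrative only.
    t: float      # arrival time, same unit as I
    size: int     # bytes
    cls: int      # class i
    prio: int     # priority p
    link: int     # input link j
    server: int   # server k


def per_key_regulators(packets, I, N):
    """Estimate F^{(i)}_{p,j,k}(I, n), n = 1..N, for every (i, p, j, k) key
    seen, taking F as the bytes observed in [(n-1)I, nI)."""
    F = defaultdict(lambda: [0] * N)
    for pkt in packets:
        n = int(pkt.t // I) + 1           # 1-based interval index
        if 1 <= n <= N:                   # ignore packets outside the window
            F[(pkt.cls, pkt.prio, pkt.link, pkt.server)][n - 1] += pkt.size
    return dict(F)
```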
Usually, $F^{(i)}_{p,j,k}(I, n) \ne F^{(i)}_{p,j,k}(I, q)$ for $n \ne q$. Therefore, $\{F^{(i)}_{p,j,k}(I, n)\}_{n = 1, 2, \ldots}$ is a random process. Computing the sample mean of $F^{(i)}_{p,j,k}(I, n)$ in terms of $I$ yields an expression involving $A$ and $z$, where $z$ follows the standard Gaussian distribution.

Detection Probability and Miss Probability
Normally, a server serves a number of connections (clients) concurrently. Figure 1 illustrates a server that serves $r$ connections of normal traffic and $s$ connections of attack traffic. The aggregated traffic $y(t)$ consists of normal traffic $x(t)$ and attack traffic $a(t)$. In the case of $I \ge 10$, a probability statement on $F^{(i)}_{p,j,k}(n)$ holds with probability $1 - \alpha$, where $1 - \alpha$ is called the confidence coefficient. Let $C^{(i)}_{p,j,k}(\alpha)$ be the confidence interval with confidence coefficient $1 - \alpha$.

The confidence interval $C^{(i)}_{p,j,k}(\alpha)$ exhibits that $B$ is a template of $F^{(i)}_{p,j,k}(n)$. Thus, we have $100(1 - \alpha)\%$ confidence to say that $F^{(i)}_{p,j,k}(n)$ normally takes the value $B$ as its approximation, with a variation no larger than the half-width of $C^{(i)}_{p,j,k}(\alpha)$. To facilitate the discussion, two terms are defined as follows. Correctly recognizing an abnormal sign is called detection, and failing to recognize it is called a miss. The detection probability and miss probability are explained by the following theorem.
The case of $P_{\det} = 1$ is considered with the computation precision being 4. The diagram of our detection is shown in Figure 2.
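The exact expressions for $C^{(i)}_{p,j,k}(\alpha)$ and $V_{\alpha}$ are not reproduced here, so the following Python sketch rests on stated assumptions: $F^{(i)}_{p,j,k}(n)$ for normal traffic is approximately Gaussian, the template $B$ is estimated by the sample mean of training values, the interval is $B \pm z_{\alpha/2}\,s$ with $s$ the sample standard deviation, and $V_{\alpha}$ is its upper limit, which matches $P_{\det} = 1 - \alpha/2$ as used in the false-alarm discussion. It also shows one possible reading of "computation precision being 4": the smallest $z_{\alpha/2}$ on a 0.01 grid for which $1 - \alpha/2$ rounds to 1 at four decimal places. All function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def confidence_interval(train, alpha):
    """Assumed form of C(alpha): B +/- z_{alpha/2} * s, where B (the template)
    is the sample mean and s the sample standard deviation of normal F(n)."""
    B = mean(train)
    s = stdev(train)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 quantile
    return B - z * s, B + z * s


def detect(F_n, train, alpha):
    """Fire an alert when F(n) exceeds V_alpha, the upper limit of C(alpha)."""
    _, V_alpha = confidence_interval(train, alpha)
    return F_n > V_alpha


def smallest_z_for_unit_pdet(digits=4, step=0.01, z_max=10.0):
    """Smallest grid value z with round(P_det, digits) == 1, where
    P_det = 1 - alpha/2 = Phi(z) (hypothetical reading of 'precision 4')."""
    phi = NormalDist().cdf
    k = 0
    while k * step <= z_max:
        z = round(k * step, 10)               # avoid accumulated float error
        if round(phi(z), digits) == 1.0:
            return z
        k += 1
    raise ValueError("no z found up to z_max")
```

Under these assumptions, $P_{\det} = 1$ at four-digit precision corresponds to placing $V_{\alpha}$ roughly four sample standard deviations above $B$.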

About False Alarm
False alarm means mistakenly recognizing a normal sign as abnormal. In this mechanism, the detection criterion is $F^{(i)}_{p,j,k}(n) > V_{\alpha}$ with $P_{\det} = 1 - \alpha/2$ and $P_{\text{miss}} = \alpha/2$. Therefore, if $F^{(i)}_{p,j,k}(n) > V_{\alpha}$ happens when $F^{(i)}_{p,j,k}(n)$ comes from normal traffic and an alert is fired, this alert is a false alarm, which has probability $\alpha/2$. Therefore,

$$P_{\text{false}} = P_{\text{miss}}. \qquad (3.9)$$

In the case of $P_{\det} = 1$, one has $P_{\text{false}} = P_{\text{miss}} = 0$.

6 Mathematical Problems in Engineering
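The relation $P_{\text{false}} = \alpha/2$ can be checked numerically under a simplifying assumption: if $F^{(i)}_{p,j,k}(n)$ for normal traffic is Gaussian with mean $B$ and standard deviation $s$, and $V_{\alpha} = B + z_{\alpha/2}\,s$, then the fraction of normal intervals with $F^{(i)}_{p,j,k}(n) > V_{\alpha}$ should approach $\alpha/2$. The Monte Carlo sketch below (hypothetical function name, Python standard library only) estimates that fraction.

```python
import random
from statistics import NormalDist


def false_alarm_rate(alpha, trials=200_000, B=0.0, s=1.0, seed=1):
    """Monte Carlo estimate of P_false for the rule F(n) > V_alpha when F(n)
    comes only from normal traffic, modelled (by assumption) as N(B, s^2)."""
    rng = random.Random(seed)
    V_alpha = B + NormalDist().inv_cdf(1 - alpha / 2) * s
    alarms = sum(rng.gauss(B, s) > V_alpha for _ in range(trials))
    return alarms / trials
```

For $\alpha = 0.1$, the estimate should be close to $0.05$; shrinking $\alpha$ toward 0 drives both the false-alarm and miss rates toward 0, as in (3.9).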

Partly Distinguishing Attack Traffic
For simplicity, suppose that traffic has two priorities, $p_1$ and $p_2$. We further suppose that flash-crowd traffic has priority $p_1$ but not $p_2$, while non-flash-crowd normal traffic and DDOS flood traffic both have $p_1$ and $p_2$. Then, $F^{(i)}_{p_2,j,k}(n) > V_{\alpha}$ implies a detection that the traffic $y^{(i)}_{p_2,j,k}(t)$ contains attack traffic of class $i$ at server $k$ from link $j$ in the $n$th interval. The detection probability is $1 - \alpha/2$. Denote $y^{(i)}_{p_2,j,k}(t) = x_{y^{(i)}_{p_2,j,k}}(t) + a_{y^{(i)}_{p_2,j,k}}(t)$, where $x_{y^{(i)}_{p_2,j,k}}(t)$ and $a_{y^{(i)}_{p_2,j,k}}(t)$ are the normal traffic and the attack traffic with $p_2$, respectively. Note that $x_{y^{(i)}_{p_2,j,k}}(t)$ has no components of flash-crowd traffic.
Usually, a signature-based sensor is designed with a library that contains signatures of attack traffic. In the present mechanism, however, we use a signature-based sensor whose library contains signatures of legitimate traffic with the priorities that flash-crowd traffic does not have. In this way, traffic whose signatures cannot be matched by this sensor may be taken as flood traffic, or at least as suspicious. Thus, if $F^{(i)}_{p_2,j,k}(n) > V_{\alpha}$ occurs, the flows that are in $y^{(i)}_{p_2,j,k}(t)$ and cannot be matched by the signature-based sensor are taken as flood traffic of class $i$ with $p_2$ at server $k$ from link $j$ in the $n$th interval. The reason for using a signature library of legitimate traffic instead of attack traffic is that attackers make efforts to create new variants of signatures, whereas legitimate users usually do not. Figure 3 shows the process of distinguishing attack traffic $a_{y^{(i)}_{p_2,j,k}}(t)$ from $y^{(i)}_{p_2,j,k}(t)$.
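The two-stage rule for priority $p_2$ can be sketched in a few lines of Python. This is an illustrative sketch only: flows are represented as a hypothetical mapping from a flow identifier to a signature string, and the legitimate-traffic library is a plain set; a real sensor would match far richer signatures. The anomaly stage gates the signature stage, so no flow is labelled as flood traffic unless $F^{(i)}_{p_2,j,k}(n) > V_{\alpha}$.

```python
def distinguish_flood(flows_p2, F_p2_n, V_alpha, legit_signatures):
    """Sketch of the rule for priority p2 (which flash crowds lack):
    1) anomaly stage: raise an alarm only if F_{p2}(n) > V_alpha;
    2) signature stage: flows whose signature is NOT in the library of
       legitimate p2 traffic are reported as flood traffic.
    `flows_p2` maps a flow id to a signature; both are hypothetical."""
    if F_p2_n <= V_alpha:
        return []                     # no sign of attack in interval n
    return [fid for fid, sig in flows_p2.items()
            if sig not in legit_signatures]
```

For instance, with flows `{"f1": "http-get", "f2": "syn-burst"}` and a library `{"http-get"}`, an interval with $F > V_{\alpha}$ reports `["f2"]`, while an interval below the threshold reports nothing.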

A Case Study
We consider fractional Gaussian noise (FGN), which is an approximation model of traffic time series [18, 19, 21, 22, 35, 36]. The autocorrelation function of discrete FGN is given by

$$r(l) = \frac{\sigma^2}{2}\left[(|l| + 1)^{2H} - 2|l|^{2H} + \big||l| - 1\big|^{2H}\right],$$

where $\sigma^2 = \Gamma(2 - 2H)\cos(\pi H)/[\pi H(1 - 2H)]$ is the strength of FGN [37], $l$ is an integer, $\Gamma(\cdot)$ is the Gamma function, and $H \in (0.5, 1)$ is the Hurst parameter.
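The autocorrelation function and strength above translate directly into code. The sketch below evaluates them with `math.gamma` and `math.cos` from the Python standard library; the function names are illustrative. A quick consistency check is that $r(0) = \sigma^2$, because the bracket equals 2 at $l = 0$, and that $\sigma^2 > 0$ throughout $H \in (0.5, 1)$, where $\cos(\pi H)$ and $1 - 2H$ are both negative.

```python
import math


def fgn_strength(H):
    """sigma^2 = Gamma(2 - 2H) cos(pi H) / (pi H (1 - 2H)), for 0.5 < H < 1."""
    if not 0.5 < H < 1:
        raise ValueError("H must lie in (0.5, 1)")
    return (math.gamma(2 - 2 * H) * math.cos(math.pi * H)
            / (math.pi * H * (1 - 2 * H)))


def fgn_acf(l, H):
    """Autocorrelation r(l) of discrete FGN with strength fgn_strength(H)."""
    l = abs(l)
    bracket = (l + 1) ** (2 * H) - 2 * l ** (2 * H) + abs(l - 1) ** (2 * H)
    return 0.5 * fgn_strength(H) * bracket
```

For the case-study value $H = 0.6$, $r(l)$ is positive and decays slowly in $l$, which is the long-range dependence that motivates using FGN as a traffic model.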
In Figures 4, 5, 6, and 7, subscripts and superscripts of $y$ and $F$ are omitted. Consider a TCP traffic series $y(t)$ ($40 \le y \le 1500$ bytes), indicating the number of bytes in a packet at $t$. By simulating FGN, we obtain a series with $H = 0.6$, as shown in Figure 4. According to Definition 2.2, we obtain $F(I, n)$ (bytes), as shown in Figure 5 ($n, I = 1, 2, \ldots, 16$). Figure 6 shows $\xi(n)$ (bytes). The histogram of $\xi$ is given in Figure 7.
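The case-study pipeline can be sketched as follows. The paper does not state how the FGN sample was generated or mapped to byte values, so both steps here are assumptions: an exact Cholesky-based FGN generator with unit variance (adequate for 256 points, though its cost grows as $O(Q^3)$), and a hypothetical affine mapping clipped to [40, 1500] bytes. $F(I, n)$ is again taken as the bytes observed in the $n$th interval of length $I$, for $n, I = 1, 2, \ldots, 16$, which uses exactly the first $16 \times 16 = 256$ samples. Clipping slightly alters the correlation structure, so the output is only qualitatively comparable to Figures 4 and 5; $\xi(n)$ is not computed because its definition is not reproduced here.

```python
import math
import random


def fgn_sample(n, H, seed=0):
    """Unit-variance FGN sample of length n via Cholesky factorisation of
    its autocovariance matrix (exact, O(n^3); fine for n = 256)."""
    rho = [0.5 * ((l + 1) ** (2 * H) - 2 * l ** (2 * H) + abs(l - 1) ** (2 * H))
           for l in range(n)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(rho[0] - s)
            else:
                L[i][j] = (rho[i - j] - s) / L[j][j]
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][k] * w[k] for k in range(i + 1)) for i in range(n)]


def to_packet_sizes(x, mean=770.0, scale=300.0):
    """Hypothetical mapping of FGN values to packet sizes in [40, 1500] bytes."""
    return [min(1500.0, max(40.0, mean + scale * v)) for v in x]


y = to_packet_sizes(fgn_sample(256, 0.6))
# F[I-1][n-1] = bytes in the nth interval of length I, for n, I = 1..16
F = [[sum(y[(n - 1) * I : n * I]) for n in range(1, 17)] for I in range(1, 17)]
```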

Figure 3 (flowchart steps): collecting traffic data; computing $x_{y^{(i)}_{p_2,j,k}}(t)$.

DiffServ Architecture: A Flexible Foundation
The above explanations consider only the simple case of two priorities. In fact, there may be several priorities in a DiffServ domain, where applications are differentiated by their classes and a certain portion of bandwidth is reserved for each traffic class [38]. Usually, all the flows in a class are assigned the same priority on each router. However, the flows in a class may also be assigned different priorities, and flows from different classes may have the same priority, as can be seen from [32, Paragraph 5, Section 1, page 327]. This paper considers a class that is assigned different priorities. On the other hand, the DiffServ architecture distinguishes two types of routers: edge routers and core routers [32, Paragraph 2, Section 3, page 327]. Thus, a detector can be installed with either edge routers or core routers. Consequently, the DiffServ architecture provides a flexible foundation for designing an effective IDS to distinguish flood traffic from aggregated traffic. This paper is only a first step in this direction.

Applicability
Mathematical properties of traditionally aggregated traffic time series have been studied in depth; see, for example, [18-22, 35]. However, mathematical properties of aggregated traffic time series on a class-by-class basis for different priorities in a DiffServ domain are rarely reported. That is a main reason we use the traffic regulator proposed in [33]: it is a tool particularly applicable in a flow-unaware environment. In addition, the traffic regulator is simple. Let $T_m$ and $T_c$ be the time for recording data and for data processing, respectively. Suppose that we record a packet every 10 microseconds. Then, $T_m = 10^{-5} Q$ seconds, where $Q$ is the length of the series involved in computations. In the above case study, $Q = 16 \times 16 = 256$. Thus, $T_m = 2.56$ ms. On the other hand, $T_c$ for a series of length 256 on an average Pentium IV PC is negligible in comparison with $T_m$. This shows that the detection time is short enough for real-time use in practice.
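The timing argument is simple arithmetic, and $T_c$ can be measured on any machine. The sketch below computes $T_m = 10^{-5} Q$ and times the $16 \times 16$ grid of $F(I, n)$ values on a synthetic series, using the same tightest-bound assumption as before ($F(I, n)$ taken as the bytes per interval). The function names are hypothetical, and the measured $T_c$ depends on the hardware and Python interpreter; it does not reproduce the Pentium IV figure.

```python
import random
import time


def recording_time(Q, per_packet=1e-5):
    """T_m in seconds when one packet is recorded every 10 microseconds."""
    return per_packet * Q


def processing_time(repeats=100, seed=0):
    """Average wall-clock seconds to compute the 16 x 16 grid of F(I, n)
    from a length-256 series of synthetic packet sizes."""
    rng = random.Random(seed)
    y = [rng.uniform(40, 1500) for _ in range(256)]
    start = time.perf_counter()
    for _ in range(repeats):
        _grid = [[sum(y[(n - 1) * I : n * I]) for n in range(1, 17)]
                 for I in range(1, 17)]
    return (time.perf_counter() - start) / repeats


T_m = recording_time(16 * 16)   # 256 packets at 10 us each
T_c = processing_time()
print(f"T_m = {T_m * 1e3:.2f} ms, T_c = {T_c * 1e3:.4f} ms")
```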
It is worth noting that $F^{(i)}_{p,j,k}(n)$ is a traffic pattern. In the present method, signs of DDOS flood attacks are identified by $F^{(i)}_{p,j,k}(n) > V$, meaning that the traffic pattern under attack must be significantly different from that of normal traffic. As a matter of fact, if an attacker were able to overwhelm a target by creating floods that closely mimic normal traffic, the target would be overwhelmed in its normal state even if there were no flood packets. This is obviously impossible, even if the attacker knows the normal traffic pattern exactly before attacking.

Future Work
The preceding presentation is largely academic in the following sense: the detection mechanism was discussed on the basis of postulated traffic models without analyzing real-traffic data. For this reason, we shall study the traffic models of this paper with real-traffic data for anomaly detection. In addition, we will derive a general mechanism to reliably identify and distinguish attack traffic from aggregated traffic for the flows of class $i$ with all priorities. Moreover, we shall explore statistical learning methods discussed in other fields; see, for example, [39-49].

Conclusions
This paper proposes a reliable method to detect signs of DDOS flood attacks in a DiffServ environment with static-priority schedulers. Combined with a signature-based sensor, the present method can partly but reliably distinguish attack traffic from aggregated traffic at a given server for a given link in a given time interval, according to a predetermined detection probability. Given that static-priority schedulers are widely supported in current routers, we believe that this approach may be practical and effective in engineering.

Figure 5: Illustrations of traffic regulators in different intervals.