Bound Maxima as a Traffic Feature under DDOS Flood Attacks

1 Jiangsu Electronic Information Products Quality Supervision & Inspection Research Institute, China National Center for Quality Supervision and Test for the Internet of Things Products & Systems, No. 100, Jin-Shui Road, Wuxi 214073, China
2 School of Information Science & Technology, East China Normal University, No. 500, Dong-Chuan Road, Shanghai 200241, China
3 Department of Computer and Information Science, University of Macau, Av. Padre Tomas Pereira, Taipa, Macau SAR, P.R. China
4 College of Computer Science, Zhejiang University of Technology, Hangzhou 310023, China


Introduction
People nowadays depend heavily on the Internet, which serves as an infrastructure of modern society. However, distributed denial-of-service (DDOS) flood attacks remain a great threat to it. By consuming the resources of an attacked site, such attacks may overwhelm the victim so that it denies services it should offer or its service performance is significantly degraded. Therefore, intrusion detection systems (IDSs) for detecting DDOS flood attacks are greatly desired.
There are two categories of IDSs. One is misuse detection and the other is anomaly detection. Alerts given by misuse detection are primarily based on a library of known signatures matched against network traffic, see, for example, [1-5]. Thus, attacks with unknown signatures, such as new variants of an attack, can escape detection by signature-based IDSs with probability one, see, for example, [6], making such IDSs ineffective at the protected site against these attacks. Anomaly detection, by contrast, identifies abnormal variations of traffic as potential intrusions, so this category of IDSs receives particular attention for identifying new attacks, see, for example, [7-13]. For simplicity, in what follows, the term IDS is used in the sense of anomaly detection. Note that detection accuracy is a key issue for an anomaly detector, see, for example, [14, 15]. To be effective, IDSs require appropriate features for accurately detecting an attack and distinguishing it from normal activity, as can be seen from [10, Section IV]. Hence, developing new traffic features for anomaly detection is essential.
The literature on traffic features for IDS use is rich. For example, 86 features for clustering normal activities are discussed in [9]. Note that the choice of a feature is methodology-dependent. In this regard, [16] uses packet header data. The paper [17] utilizes the autocorrelation function of long-range dependent (LRD) traffic time series in packet size, and [18] employs the Hurst parameter. Scherrer et al. adopt scaling properties of LRD traffic [19].
The traffic models used in [17-23] are fractal models. In general, fractal models may be somewhat complicated to apply in engineering practice in comparison with the traffic feature proposed in this paper. Recall that there are two categories of traffic modeling [24, Section XIV]. One is statistical modeling (e.g., LRD processes). The other is bounded modeling, which has particular applications to modeling traffic at connection level, see, for example, [25-30]. Bounded models, in conjunction with a class of service disciplines, are feasible and relatively efficient in applications such as connection admission control (CAC) in guaranteed quality-of-service (QoS). In addition, such models are mathematically simple and relatively easy to use in practice in comparison with fractal models. This paper aims at providing a new traffic feature for anomaly detection based on bounded modeling of traffic. The main contributions of this paper are as follows.
(i) We present the histogram of the maxima of bounded traffic rate on an interval-by-interval basis as a traffic feature for exhibiting abnormal variation of traffic under DDOS flood attacks.
(ii) The experimental results show that the maxima of the rate bound of attack-contained traffic are statistically much greater than those of attack-free traffic.
... in Section 3. Experimental results are presented in Section 4, followed by discussions and conclusions.

Experimental Data
While DDOS attacks continue to be a problem, there is currently not much quantitative data available for researchers to study the behavior of DDOS flood attacks. The 1998-1999 DARPA data (http://www.ll.mit.edu/IST/ideval) are valuable and rare among publicly available data, though some points about them are worth further discussion [31]. Those data were obtained under conditions of realistic background traffic and include examples of realistic attacks [32, 33]. The 1999 data sets contain more than 200 instances of 58 attack types; see [34] for details. Two data sets are explained below.

Set One: Attack-Free Traffic (1999 Training Data-Week 1)
The first set of data contains five traces. We name them OM-W1-i-1999AF (i = 1, 2, 3, 4, 5), meaning Outside-MIT-week1-i-1999-attack-free. Table 1 indicates the actual times at which the first and last packets were extracted for each trace.

Traffic Bounds
In this subsection, we briefly describe the deterministic bounds for accumulated traffic and traffic rate, with demonstrations using the traffic traces OM-W1-1-1999AF and OM-W2-1-1999CF. Let x(t_i) be the series indicating the number of bytes in the ith packet (i = 0, 1, ...) of arrival traffic at time t_i. Then, x(i) is a discrete series indicating the number of bytes in the ith packet of arrival traffic. Figure 1 shows a plot of x(i) for the first 1024 points of OM-W1-1-1999AF.
According to [27, 43], an upper bound of arrival traffic x(i) is given below.
Definition 2.1. Let x(i) be the arrival traffic function. Then, F(I) is called the traffic upper bound of x(i) over the duration of length I.

Histogram of Maxima of Traffic Rate Bound: A Feature for Identifying Abnormal Variation of Traffic under DDOS Attacks
In this section, we first introduce the time series of traffic rate bound. Then, we establish the maxima of traffic rate bound. Finally, we obtain the histogram of the maxima of traffic rate bound. Demonstrations with the experimental data are used to facilitate the discussion.

Traffic Bound Series
Theoretically, I can be any positive real number. In practice, however, I is selected as a finite positive integer. Fix the value of I and observe traffic bounds in the interval [(n − 1)I, nI], n = 1, 2, ..., N. Then, we express traffic bounds as a function of the interval index n.
Considering the index n, we express the traffic upper bound by F(I, n), which is a series. Note that x(i) is a stochastic series and so is F(I, n). That is, in general, F(I, m) ≠ F(I, n) for m ≠ n. We term F(I, n) the traffic upper bound series. Similarly, we use GAMA(I, n) to represent the traffic rate bound series. Figure 4 shows the traffic upper bound series. Figure 5 plots the rate bound series.
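As a minimal sketch of how the two series could be computed from a packet-size sequence, the Python fragment below splits x(i) into consecutive intervals of I points. It rests on an assumption that is not confirmed by the text here: F(I, n) is taken to be the number of bytes accumulated in the nth interval, and GAMA(I, n) = F(I, n)/I. The function name bound_series and the sample packet sizes are hypothetical.

```python
import numpy as np

def bound_series(x, I):
    """Return (F, GAMA) for consecutive, non-overlapping intervals of I points.

    Assumption of this sketch: F(I, n) is the total number of bytes in the
    nth interval and GAMA(I, n) = F(I, n) / I is the matching rate bound.
    """
    x = np.asarray(x, dtype=float)
    n_intervals = len(x) // I                 # drop an incomplete trailing interval
    blocks = x[:n_intervals * I].reshape(n_intervals, I)
    F = blocks.sum(axis=1)                    # accumulated bytes per interval
    GAMA = F / I                              # bytes per point of the interval
    return F, GAMA

# Hypothetical packet sizes in bytes (not taken from the DARPA traces).
x = [60, 1500, 40, 576, 1500, 1500, 60, 40]
F, GAMA = bound_series(x, I=4)
```

With I = 64 and the first 1024 points of a trace, the same function would yield 16 values of each series.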
Since GAMA(I, n) is random, identification in a single interval is not enough. We use Figure 6 to explain this point. From Figure 6, we see that the rate bound of attack-contained traffic is greater than that of attack-free traffic in some intervals, for example, in the second and third intervals. However, it is less than the rate bound of attack-free traffic in other intervals, for example, in the first and fourth intervals. Therefore, we study how the bound series of traffic rate varies statistically under DDOS flood attacks. For this reason, we study the maxima of traffic rate bound. To investigate this phenomenon quantitatively, we need a measure to describe the similarity or dissimilarity between the pattern of Hist[MGAMA_F(n)] and that of Hist[MGAMA_C(n)], which will be explained in the next subsection.
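One possible reading of "maxima of traffic rate bound" is as block maxima: the rate bound series is grouped into consecutive blocks and the largest value in each block is kept. The sketch below illustrates that reading only. The grouping rule, the block length, the function name block_maxima, and the sample values are all assumptions. The sample values are chosen to mimic the situation described for Figure 6, where the attack-contained rate bound is larger in some intervals and smaller in others.

```python
import numpy as np

def block_maxima(gama, block_len):
    """Maximum of the rate bound series over consecutive blocks.

    Assumption of this sketch: the maxima series is formed by taking the
    largest GAMA(I, n) within each block of `block_len` intervals.
    """
    gama = np.asarray(gama, dtype=float)
    n_blocks = len(gama) // block_len         # drop an incomplete trailing block
    return gama[:n_blocks * block_len].reshape(n_blocks, block_len).max(axis=1)

# Hypothetical rate bounds; the attack series is lower in intervals 1 and 4
# and higher in intervals 2 and 3, so single intervals are inconclusive.
gama_free = [544.0, 775.0, 610.0, 480.0, 530.0, 700.0]
gama_attack = [500.0, 900.0, 950.0, 470.0, 520.0, 1200.0]

mgama_free = block_maxima(gama_free, block_len=3)
mgama_attack = block_maxima(gama_attack, block_len=3)
```

In this toy case the block maxima of the attack series exceed those of the attack-free series in every block, even though the interval-by-interval comparison does not.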

Correlation Coefficient Used as a Similarity Measure for Pattern Matching
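There are many measures to characterize the similarity or dissimilarity of two patterns in the field of pattern matching, see, for example, [44, 45]. Among them, the correlation coefficient is used in this paper. Denote by Corr_FC the correlation coefficient between Hist[MGAMA_F(n)] and Hist[MGAMA_C(n)], that is,

Corr_FC = corr{Hist[MGAMA_F(n)], Hist[MGAMA_C(n)]},

where corr implies the correlation operation.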
It is known that 0 ≤ Corr_FC ≤ 1. The larger the value of Corr_FC, the more similar the pattern of Hist[MGAMA_F(n)] is to that of Hist[MGAMA_C(n)]. Mathematically, the case of Corr_FC = 1 implies that the pattern of Hist[MGAMA_F(n)] is exactly the same as that of Hist[MGAMA_C(n)]. On the contrary, Corr_FC = 0 means that the pattern of Hist[MGAMA_F(n)] is totally different from that of Hist[MGAMA_C(n)]. From the point of view of engineering, however, the extreme cases of Corr_FC = 1 or Corr_FC = 0 do not make much sense due to errors and uncertainties in measurement and digital computation. In practical terms, one uses a threshold for Corr_FC to evaluate the similarity between the two patterns. The concrete value of the threshold depends on the requirement set by researchers, but it is quite common to take 0.7 as the smallest value of the threshold for pattern matching purposes. Suppose that we take 0.8 as the threshold value. Then, we say that the pattern of Hist[MGAMA_F(n)] is similar to that of Hist[MGAMA_C(n)] if Corr_FC ≥ 0.8 and dissimilar otherwise. By computation, we obtain Corr_FC = 0.01751 for OM-W1-1-1999AF and OM-W2-1-1999CF, implying that the pattern of Hist[MGAMA_F(n)] considerably differs from that of Hist[MGAMA_C(n)], as indicated in Figure 8(c). We will further demonstrate this interesting phenomenon in the next section.
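The threshold rule above can be sketched in a few lines of Python. The sketch assumes that corr is Pearson's correlation coefficient as computed by numpy.corrcoef, which is an assumption about the intended operation. Pearson's coefficient can in general be negative, and the rule below simply treats any value under the threshold as dissimilar. The function name hist_similarity and the two sample histograms are hypothetical.

```python
import numpy as np

def hist_similarity(hist_f, hist_c, threshold=0.8):
    """Return (Corr_FC, is_similar) for two histograms on the same bins.

    Assumption of this sketch: corr is Pearson's coefficient, and the two
    patterns are called similar when Corr_FC >= threshold (0.8 in the text).
    """
    corr = float(np.corrcoef(hist_f, hist_c)[0, 1])
    return corr, corr >= threshold

# Hypothetical bin counts: attack-free mass at low rate bounds,
# attack-contained mass at high rate bounds.
hist_f = [30, 12, 5, 2, 1, 0, 0, 0]
hist_c = [0, 0, 1, 2, 4, 10, 15, 18]

corr_fc, similar = hist_similarity(hist_f, hist_c)
```

Both histograms must be built on the same bin edges for the bin-by-bin correlation to be meaningful.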

Experimental Results
The value of Corr_FC for OM-W1-1-1999AF and OM-W2-1-1999CF has been given above. In this section, we illustrate experimental results describing Corr_FC for OM-W1-2-1999AF and OM-W2-2-1999CF. The plots illustrating Corr_FC for the pairs OM-W1-3-1999AF and OM-W2-3-1999CF, OM-W1-4-1999AF and OM-W2-4-1999CF, and OM-W1-5-1999AF and OM-W2-5-1999CF are listed in the appendices. Note that the values of Corr_FC for the other three pairs of test traces, see Figures 16(c), 20(c), and 24(c), also show that the pattern of Hist[MGAMA_F(n)] is noticeably different from that of Hist[MGAMA_C(n)]. We summarize the values of Corr_FC for all five pairs of traces in Table 3, which shows that Corr_FC < 0.2 for all pairs of test traces.

Discussions and Conclusions
The maxima of the rate bound of attack-contained traffic are not always higher than those of attack-free traffic, see Figure 7. Statistically, however, they are higher than those of attack-free traffic. The values in Table 3 indicate that the pattern of Hist[MGAMA_F(n)] is clearly different from that of Hist[MGAMA_C(n)]. Thus, the results in this paper suggest that the histogram of the maxima of traffic rate bound may serve as a traffic feature for distinctly identifying abnormal variation of traffic under DDOS flood attacks.
In comparison with fractal models of traffic as discussed in [18, 19, 43], the present feature has an apparent advantage. Recall that statistical models such as LRD processes, see, for example, [18, 19], are usually used for aggregate traffic, but there is a lack of evidence supporting their use to characterize statistical patterns of real traffic at connection level. As a matter of fact, finding statistical patterns of traffic at connection level may be a tough task. To overcome difficulties in describing traffic at connection level, bounded modeling was introduced [25-29]. Thus, if we let x_{j,k}(t) be all flows going through server k from input link j and let F_{j,k}(I) be the maximum traffic constraint function of x_{j,k}(t), the present analysis method is technically sound and usable for x_{j,k}(t), whereas fractal models may not be. Since bounded models of traffic are mainly used at connection level in applications such as real-time admission control, the present traffic feature for identifying abnormal variation of traffic under DDOS flood attacks can be extracted at an early stage of attacks.

Figure 15: Series of the maxima of traffic rate bound. (a) Maxima of GAMA(I, n) for OM-W1-3-1999AF. (b) Maxima of GAMA(I, n) for OM-W2-3-1999AC.
Denote by Hist[MGAMA_F(n)] and Hist[MGAMA_C(n)] the histograms of MGAMA_F(n) and MGAMA_C(n), respectively. They represent the empirical distributions of MGAMA_F(n) and MGAMA_C(n). Figures 8(a) and 8(b) show Hist[MGAMA_F(n)] and Hist[MGAMA_C(n)] for OM-W1-1-1999AF and OM-W2-1-1999CF, respectively. From Figure 8(c), we see that the pattern of Hist[MGAMA_F(n)] considerably differs from that of Hist[MGAMA_C(n)].
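A minimal sketch of building the two histograms is given below. It places both maxima series on one shared set of bin edges so that the resulting counts can be compared bin by bin. The choice of equal-width bins spanning both series, the function name paired_histograms, and the sample maxima are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def paired_histograms(mgama_f, mgama_c, bins=10):
    """Histogram two maxima series on common, equal-width bin edges.

    Assumption of this sketch: the bins span the combined range of both
    series, so that bin k refers to the same rate-bound range in each.
    """
    both = np.concatenate([mgama_f, mgama_c])
    edges = np.linspace(both.min(), both.max(), bins + 1)
    hist_f, _ = np.histogram(mgama_f, bins=edges)
    hist_c, _ = np.histogram(mgama_c, bins=edges)
    return hist_f, hist_c, edges

# Hypothetical maxima of the rate bound (not taken from the DARPA traces).
mgama_f = np.array([700.0, 720.0, 775.0, 690.0, 710.0])
mgama_c = np.array([950.0, 1200.0, 1100.0, 980.0, 1150.0])

hist_f, hist_c, edges = paired_histograms(mgama_f, mgama_c, bins=5)
```

Here the attack-free counts fall entirely in the lowest bin while the attack-contained counts fall in higher bins, which is the kind of pattern difference the histograms are meant to expose.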

Figure 19: Series of the maxima of traffic rate bound. (a) Maxima of GAMA(I, n) for OM-W1-4-1999AF. (b) Maxima of GAMA(I, n) for OM-W2-4-1999AC.

Figures 9(a) and 9(b) are the plots of the first 1024 points of OM-W1-2-1999AF and OM-W2-2-1999CF, respectively. Figures 10(a) and 10(b) show the series of traffic rate bound for OM-W1-2-1999AF and OM-W2-2-1999CF, respectively, for n = 0, 1, ..., 16 with I = 64. Figures 11(a) and 11(b) show the maxima of rate bound for both traffic traces for n = 0, 1, ..., 128. Figures 12(a) and 12(b) show the histograms of the maxima of traffic rate bound for both traces. Figure 12(c) gives the comparison between the two. By computation, we have Corr_FC = 0.163261, meaning that the pattern of Hist[MGAMA_F(n)] considerably differs from that of Hist[MGAMA_C(n)] for OM-W1-2-1999AF and OM-W2-2-1999AC.

Table 2: Data set for attack-contained traffic.