Application-layer distributed denial of service (AL-DDoS) attacks pose a serious threat to cyberspace security. Attack detection is an important part of security protection: rapid and accurate identification of attacks provides effective support for the defense system. According to how the attacker selects URLs of the Web service, AL-DDoS attacks are divided into three categories: fixed URL attacks, random URL attacks, and traversal URL attacks. To identify these attacks, a mapping matrix of the joint entropy vector is constructed. By defining and computing the entropy of URL requests per IP address (EUPI) and the entropy of IP addresses per URL (EIPU, used as the imaginary component jEIPU), a visual coordinate discrimination diagram of the entropy vector is proposed, which also reduces the data dimension from N to two. The class of an AL-DDoS attack can be determined from the boundary discrimination and the region in which the entropy vector falls. Experiments on training data sets and classification show that the algorithm can effectively distinguish web server DDoS attacks from normal burst traffic.
1. Introduction
Distributed denial of service (DDoS) attacks [1–3] are among the most serious threats to today’s networks and have aroused great concern around the world [4, 5]. A DDoS attack consumes the resources of the victim server and keeps the target from providing services to legitimate users. DDoS attacks fall into two classes: network-layer DDoS (NL-DDoS) attacks and application-layer DDoS (AL-DDoS) attacks [6, 7]. The early DDoS attacks were network-layer attacks. In NL-DDoS, attackers send a large number of bogus packets to the victim host, exploiting vulnerabilities that exist only at the network and transport layers. For example, IP spoofing uses fake connections to quickly consume the server’s bandwidth while hiding the attacker’s location; SYN flood attackers keep sending connection requests that carry only the SYN flag and are never completed, which exhausts the bandwidth and resources of the server on a massive number of half-open TCP connections. In an NL-DDoS attack, the victim server or IDS can easily distinguish legitimate packets from DDoS packets [8]. In contrast, in AL-DDoS, perpetrators attack the victim server through a flood of legitimate requests. An AL-DDoS attack saturates the bandwidth of the victim server not through inbound traffic but through outbound traffic. Because AL-DDoS attacks behave very much like a flash crowd, a legitimate event in which a very large number of users access a website simultaneously, the two are not easy to tell apart. Consequently, given the universality and variety of web services at the application layer, AL-DDoS may be stealthier and more dangerous than the traditional NL-DDoS attack.
As the impact of AL-DDoS attacks grows, researchers at home and abroad have done a great deal of related work. Jung et al. [9] analyzed in depth the difference between AL-DDoS attacks and flash crowds. During a flash crowd, a large number of previously seen address clusters recur, whereas an AL-DDoS attack brings a large number of new address clusters. The distribution of access addresses is uneven in a flash crowd but more uniform during a DDoS attack. Li et al. [10] use a mixed measure to detect shifts in the flow distribution and thereby distinguish DDoS attacks from flash crowds. Yu et al. [11] use the Sibson distance to measure the similarity between flows and so separate DDoS attack flows from flash crowd flows. Oikonomou and Mirkovic [12] built a model of normal behavior to distinguish attackers from normal visitors. Lee et al. [13] proposed an AL-DDoS detection algorithm based on information entropy. Using the information entropy of the URL access rate, the algorithm can detect DDoS attacks but cannot distinguish them from flash crowds. Rathika et al. [14] put forward a detection method based on the average number of requests per unit of time for each session (ANRS). However, when the DDoS attack flow is small and low-rate, the value of ANRS neither rises nor falls significantly. Xie and Yu [15] modeled users’ access and page request behaviors with a Markov chain. Based on the jump probability of each session, the behavior model takes the degree of deviation as the detection indicator.
In this paper, attacks are classified according to their URL access mode, based on the characteristics of user access behavior at the application layer. The IP-to-URL matrix is mapped to joint entropy vectors, which reduces the data dimension. By defining and computing EUPI and jEIPU, a coordinate discrimination diagram of the entropy vector is constructed, and the type of AL-DDoS attack is determined from the region in which the entropy vector falls. Simulation experiments show that the algorithm can effectively distinguish DDoS attacks from normal traffic.
2. Behavior of AL-DDoS Attack
On the web, a user’s access to a URL consists of three steps: visiting, lingering, and abandoning. Because different users are interested in different content and pages, legitimate access from users’ IP addresses to URLs is random. In contrast, AL-DDoS attacks are usually launched by a specific tool or by botnets, which makes their coordinated behavior more regular and less random.
According to how the attacker selects the attacked URLs, AL-DDoS attacks are divided into three categories. The first category is the fixed URL attack, in which the attacker targets one or a few URLs chosen in advance. To maximize the effect, the attacker often requests a large picture or file download; this type of attack is the most common and the easiest to implement. For example, SOAP replay attackers [16] repeatedly send a SOAP request message to a fixed URL, which exhausts the resources of the victim server through outbound traffic.
The second category is the random URL attack, which scans the site for a list of URLs and randomly selects URLs from it in every attack. Random URL attacks masquerade as normal web access in a low-density manner. Outwardly, the traffic looks as though a very large number of users were simultaneously accessing a few popular websites. Therefore, the random URL attack is stealthier than the fixed URL attack.
The third category is the traversal URL attack, which behaves like a web crawler: it fetches pages, collects the URLs they contain, and selects among them for its requests. The attacker starts from the home page URL and selects a URL as the next request. The process is then repeated until no new URL is obtained.
3. A Novel Detection Algorithm against AL-DDoS
3.1. Mapping from IP to URL
For a website, the user traffic that converges on a URL is a stream of successive URL requests. Let $x_i(k)$ denote the number of requests for the $k$th URL from the $i$th source IP address in a constant time window (interval), where $i = 1, 2, \ldots, N$ and $k = 1, 2, \ldots, K$. The matrix $[x_i(k)]_{N \times K}$ from source IP to URL is the map of web access. The map from IP to URL is shown in Table 1.
Here $x_i(k)$ is the element of the matrix in row $i$ and column $k$:
$$x_i \in X = \{x_1, x_2, \ldots, x_N\}, \tag{1}$$
where $x_i = (x_i(1), x_i(2), \ldots, x_i(K)) \in \mathbb{R}^K$ is a $K$-dimensional vector.
3.2. Definition of EUPI and EIPU
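The entropy of URL request per IP address (EUPI) is defined as
$$\mathrm{EUPI}\left(i \mid x_i(k) : x_i \in \mathbb{R}^K\right) = -\sum_{k=1}^{K} P_i^U(x_i) \log P_i^U(x_i), \tag{2}$$
where $P_i^U(x_i)$ is the probability of the $k$th URL request from the $i$th source IP address. $P_i^U(x_i)$ indicates the fraction of the requests from one IP address that go to each URL:
$$P_i^U(x_i) = \frac{x_i(k)}{\sum_{k=1}^{K} x_i(k)}. \tag{3}$$
The entropy of IP address per URL (EIPU) is defined as
$$\mathrm{EIPU}\left(k \mid x_i(k) : x_i \in \mathbb{R}^K\right) = -\sum_{i=1}^{N} P_k^I(x_i) \log P_k^I(x_i), \tag{4}$$
where $P_k^I(x_i)$ is the occurrence probability of the $i$th source IP address among the accesses to the $k$th URL:
$$P_k^I(x_i) = \frac{x_i(k)}{\sum_{i=1}^{N} x_i(k)}. \tag{5}$$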
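As a minimal sketch of Eqs. (2)–(5), the following Python computes EUPI for one row and EIPU for one column of a small request-count matrix. The function names and the toy matrix are hypothetical. The natural logarithm is an assumption, since the text does not state the base, and zero counts are taken to contribute nothing to the sum (the usual convention).

```python
import math

def entropy(counts):
    """Shannon entropy (natural log, assumed) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:                      # zero counts contribute nothing
            p = c / total              # Eq. (3) or Eq. (5), depending on the axis
            h -= p * math.log(p)
    return h

def eupi(x, i):
    """EUPI of source IP i: entropy over the URLs in row i (Eq. (2))."""
    return entropy(x[i])

def eipu(x, k):
    """EIPU of URL k: entropy over the source IPs in column k (Eq. (4))."""
    return entropy([row[k] for row in x])

# Hypothetical request counts: rows are source IPs, columns are URLs.
x = [
    [9, 0, 0],   # all requests go to one URL
    [1, 1, 1],   # requests spread evenly over the URLs
    [2, 1, 0],
]

print(eupi(x, 0), eupi(x, 1), eipu(x, 1))
```

A row concentrated on a single URL gives an EUPI of zero, while an evenly spread row gives the maximum for three URLs, log 3.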
Entropy, whether EUPI or EIPU, measures the uncertainty of a discrete random event. Information entropy is low in an orderly system; the more disordered and random the system is, the higher the information entropy becomes. It can therefore serve as a measure of the degree of order of the system. Typically, legitimate access to a site has a certain randomness, because users access web pages according to their interests. According to statistics, 80% of users visit 20% of the hot web pages [17].
For a fixed URL attack launched from IP address $i$, $P_i^U(x_i)$ increases significantly for the victim URLs, because the access requests converge on the one or few attacked URLs. Therefore, the value of EUPI decreases. For both random URL and traversal URL attacks, EUPI increases. In particular, because a traversal URL attack requests URLs with nearly equal probability, its EUPI is close to the maximum entropy, $\log K$.
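The effect of each access pattern on EUPI can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the authors' experiment: the three request rows are invented, K = 200 is borrowed from the simulation section, and the natural logarithm is assumed.

```python
import math

def entropy(counts):
    """Shannon entropy (natural log, assumed) of non-negative counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

K = 200  # number of URLs, as in the simulated website

# Fixed URL attacker (hypothetical): every request hits the same URL.
fixed = [0] * K
fixed[0] = 500

# Traversal attacker (hypothetical): each URL is requested exactly once.
traversal = [1] * K

# Legitimate user (hypothetical): 80 requests over 40 hot pages,
# 20 requests over 20 other pages, nothing elsewhere.
legitimate = [2] * 40 + [1] * 20 + [0] * (K - 60)

h_fixed = entropy(fixed)
h_traversal = entropy(traversal)
h_legitimate = entropy(legitimate)

print(h_fixed, h_legitimate, h_traversal, math.log(K))
```

With these toy rows the fixed attacker has zero entropy, the traversal attacker reaches log K, and the legitimate user lies in between.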
The space complexity of the algorithm is $O(n^2)$, which grows with the matrix dimension. Because the detection model only needs to calculate the conditional entropy of the matrix, the time complexity of the algorithm is $O(n \log n)$.
However, sudden surges of hits and demand (e.g., hot events, festival online shopping, and centralized e-ticketing) can sharply increase URL traffic within a certain time, and this situation must be considered. It is called a flash crowd, and it shares burst traffic and high volume with AL-DDoS attacks. Relying on EUPI alone would therefore lead to a higher false alarm rate. To optimize the detection method and reduce the false alarm rate, the entropy of IP address per URL (EIPU) is applied to distinguish between AL-DDoS attacks and flash crowds. When a URL is accessed from many IP addresses during a flash crowd, the traffic is approximately uniformly distributed over the source IP addresses. Thus, using the characteristics of EUPI and EIPU, a matrix transform $Z(\cdot)$ is constructed, defined as
$$Z\left([x_i(k)]_{N \times N}\right) = (z_1, z_2, \ldots, z_N)^T. \tag{6}$$
The transform requires $N = K$. Here $z_i$ is a joint entropy vector:
$$z_i = z_i(1) + j z_i(2) = \mathrm{EUPI}(i \mid x_i(k)) + j\,\mathrm{EIPU}(k = i \mid x_i(k)). \tag{7}$$
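So we obtain $z_i(1) = \mathrm{EUPI}(i \mid x_i(k))$ and $z_i(2) = \mathrm{EIPU}(k = i \mid x_i(k))$:
$$\begin{pmatrix} x_1(1) & x_1(2) & \cdots & x_1(k) & \cdots & x_1(N) \\ \vdots & \vdots & & \vdots & & \vdots \\ x_i(1) & x_i(2) & \cdots & x_i(k) & \cdots & x_i(N) \\ \vdots & \vdots & & \vdots & & \vdots \\ x_N(1) & x_N(2) & \cdots & x_N(k) & \cdots & x_N(N) \end{pmatrix} \longrightarrow \begin{pmatrix} z_1(1) & z_1(2) \\ \vdots & \vdots \\ z_i(1) & z_i(2) \\ \vdots & \vdots \\ z_N(1) & z_N(2) \end{pmatrix}. \tag{8}$$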
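The transform of Eqs. (6)–(8) can be sketched in a few lines of Python. This is an illustrative reading rather than the authors' implementation: the function names and toy matrix are hypothetical, the natural logarithm is assumed, and Python's built-in complex type is used only because it mirrors the notation $z_i = z_i(1) + j z_i(2)$.

```python
import math

def entropy(counts):
    """Shannon entropy (natural log, assumed) of non-negative counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

def joint_entropy_vectors(x):
    """Map a square N x N count matrix to N joint entropy vectors.

    z_i.real is the EUPI of row i and z_i.imag is the EIPU of column i.
    """
    n = len(x)
    if any(len(row) != n for row in x):
        raise ValueError("the transform requires a square matrix (N = K)")
    vectors = []
    for i in range(n):
        row_entropy = entropy(x[i])                      # z_i(1) = EUPI(i)
        col_entropy = entropy([row[i] for row in x])     # z_i(2) = EIPU(k = i)
        vectors.append(complex(row_entropy, col_entropy))
    return vectors

# Hypothetical 3 x 3 request counts (rows: source IPs, columns: URLs).
x = [
    [9, 0, 0],
    [1, 1, 1],
    [2, 1, 0],
]
z = joint_entropy_vectors(x)
print(z)
```

Each of the N rows of the N-by-N matrix is thus reduced to a single point in the plane.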
When $N \ne K$, the matrix is extended to a square matrix so that the condition $N = K$ is satisfied.
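(a) $N > K$. When $N$ is greater than $K$, the matrix needs to be extended to a square matrix of order $N$. The data in columns $K+1$ to $N$ come from the normal access traffic of the training data set. The extended URLs from $K+1$ to $N$ are called virtual URLs.
$$[x_i(k)]_{N \times K} \longrightarrow \begin{pmatrix} x_1(1) & x_1(2) & \cdots & x_1(K) & x_1(K+1) & \cdots & x_1(N) \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ x_i(1) & x_i(2) & \cdots & x_i(K) & x_i(K+1) & \cdots & x_i(N) \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ x_N(1) & x_N(2) & \cdots & x_N(K) & x_N(K+1) & \cdots & x_N(N) \end{pmatrix}. \tag{9}$$

(b) $N < K$. When $N$ is less than $K$, the matrix needs to be extended to a square matrix of order $K$. The data in rows $N+1$ to $K$ come from the normal access traffic of the training data set. The extended IP addresses from $N+1$ to $K$ are called virtual IPs.
$$[x_i(k)]_{N \times K} \longrightarrow \begin{pmatrix} x_1(1) & x_1(2) & \cdots & x_1(k) & \cdots & x_1(K) \\ \vdots & \vdots & & \vdots & & \vdots \\ x_i(1) & x_i(2) & \cdots & x_i(k) & \cdots & x_i(K) \\ \vdots & \vdots & & \vdots & & \vdots \\ x_N(1) & x_N(2) & \cdots & x_N(k) & \cdots & x_N(K) \\ x_{N+1}(1) & x_{N+1}(2) & \cdots & x_{N+1}(k) & \cdots & x_{N+1}(K) \\ \vdots & \vdots & & \vdots & & \vdots \\ x_K(1) & x_K(2) & \cdots & x_K(k) & \cdots & x_K(K) \end{pmatrix}. \tag{10}$$

Because the extended rows or columns are taken from the normal access traffic of the training data set, they do not affect the judgment of attack behaviors or abnormal traffic. For the joint entropy vector algorithm, the extension only adds normal entropy vectors and does not change the distribution or the number of attack vectors.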
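The extension step can be sketched as follows. The text does not say how the normal training traffic is selected, so this sketch assumes a hypothetical square matrix `normal` of normal-traffic counts, at least max(N, K) on each side, from which the missing cells are copied.

```python
def extend_to_square(x, normal):
    """Pad an N x K count matrix to max(N, K) x max(N, K).

    Missing columns (virtual URLs) or rows (virtual IPs) are copied from
    `normal`, a hypothetical square matrix of normal training traffic.
    """
    n, k = len(x), len(x[0])
    size = max(n, k)
    # Case (a), N > K: append virtual URL columns K+1..N to every existing row.
    out = [list(row) + [normal[i][c] for c in range(k, size)]
           for i, row in enumerate(x)]
    # Case (b), N < K: append virtual IP rows N+1..K.
    for r in range(n, size):
        out.append(list(normal[r][:size]))
    return out

# Hypothetical normal traffic and two non-square observations.
normal = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
tall = [[10, 11], [12, 13], [14, 15]]      # N = 3, K = 2
wide = [[10, 11, 12], [13, 14, 15]]        # N = 2, K = 3

print(extend_to_square(tall, normal))
print(extend_to_square(wide, normal))
```

The observed counts are left untouched; only the added column or row carries training data.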
So the mapping from the $\max(N,K)$-dimensional space to the two-dimensional space is expressed as
$$X \longrightarrow Z : z_i \in Z = \{z_1, z_2, \ldots, z_{\max(N,K)}\}, \tag{11}$$
where $z_i = (z_i(1), z_i(2))$ is a two-dimensional entropy vector.
3.3. Boundary Discriminant
Let $T$ be the training data set, defined as
$$T = \{(z_1, y_1), (z_2, y_2), \ldots, (z_N, y_N)\}, \tag{12}$$
where $y_i \in Y = \{c_1, c_2, \ldots, c_K\}$ is the class label of $z_i$ and $Y$ is the set of types of access behavior. In this instance, $c_1$ stands for the fixed attack, $c_2$ for the traversal attack, $c_3$ for flash crowd, and $c_4$ for normal access.
The classification decision rule is defined as
$$\arg\max_{c_j} \sum I(y_i = c_j), \quad j = 1, 2, \ldots, K. \tag{13}$$
$I(y_i = c_j)$ is the indicator function, defined as
$$I(x) = \begin{cases} 1, & x = \text{true}, \\ 0, & x = \text{false}. \end{cases} \tag{14}$$
In summary, detecting AL-DDoS attacks is transformed into classifying the points that represent entropy vectors in the coordinate system. According to the characteristics of the different attack behavior types, the entropy vector detection algorithm is implemented as follows.
From the training data set $T$, the number of points in each class is calculated:
$$K_i = \sum_{T} I(y_i = c_i), \tag{15}$$
where $K_i$ is the number of points in class $c_i$.
The boundary discriminant rule is defined as
$$V_j = \arg\Big\{\sum_{T} I(z_i(k) \le V_j) = K_j\Big\} \quad \text{or} \quad V_j = \arg\Big\{\sum_{T} I(z_i(k) \ge V_j) = K_j\Big\}. \tag{16}$$
According to the characteristics of the AL-DDoS entropy vector, the boundaries $V_j$ divide the coordinate plane into regions, and the region in which a point falls determines its class. The coordinate discrimination diagram of the entropy vector is shown in Figure 1.
Figure 1: The coordinate discrimination diagram of entropy vector.
In Figure 1, $c_1$, $c_2$, $c_3$, and $c_4$ stand for the classes of fixed URL attack, traversal URL attack, flash crowd, and normal access, respectively. On the basis of the training data set, $K_1$, $K_2$, and $K_3$ stand for the number of points in classes $c_1$, $c_2$, and $c_3$, respectively. The boundary $V_1$ can be calculated as
$$V_1 = \arg\Big\{\sum_{T} I(z_i(1) \le V_1) = K_1\Big\}. \tag{17}$$
The boundary $V_2$ can be calculated as
$$V_2 = \arg\Big\{\sum_{T} I(z_i(1) \ge V_2) = K_2\Big\}. \tag{18}$$
The boundary $V_3$ can be calculated as
$$V_3 = \arg\Big\{\sum_{T} I(z_i(2) \ge V_3) = K_3\Big\}. \tag{19}$$
An AL-DDoS attack inevitably changes the entropy, so we take the entropy vector $z_i$ as the index. The region in which $z_i$ falls shows whether an AL-DDoS attack has occurred and what type of attack it is.
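One way to read Eqs. (15) and (17)–(19) is that each boundary is an order statistic of the training points: $V_1$ is the $K_1$th smallest value of $z_i(1)$, $V_2$ the $K_2$th largest value of $z_i(1)$, and $V_3$ the $K_3$th largest value of $z_i(2)$. The sketch below follows that reading. The function names, the toy training set, and the order in which the regions are tested are assumptions; it also assumes distinct values and at least one training point per class.

```python
from collections import Counter

def learn_boundaries(training):
    """training: list of ((z1, z2), label) pairs with labels 'c1'..'c4'."""
    counts = Counter(label for _, label in training)      # K_i, Eq. (15)
    z1 = sorted(z[0] for z, _ in training)
    z2 = sorted(z[1] for z, _ in training)
    v1 = z1[counts["c1"] - 1]   # exactly K1 points have z(1) <= V1, Eq. (17)
    v2 = z1[-counts["c2"]]      # exactly K2 points have z(1) >= V2, Eq. (18)
    v3 = z2[-counts["c3"]]      # exactly K3 points have z(2) >= V3, Eq. (19)
    return v1, v2, v3

def classify(z, v1, v2, v3):
    """Assign a class from the region in which the entropy vector falls."""
    if z[0] <= v1:
        return "c1"   # fixed URL attack: low EUPI
    if z[0] >= v2:
        return "c2"   # traversal URL attack: high EUPI
    if z[1] >= v3:
        return "c3"   # flash crowd: high EIPU
    return "c4"       # normal access

# Hypothetical labeled entropy vectors (EUPI, EIPU).
training = [
    ((0.2, 3.0), "c1"), ((0.5, 3.2), "c1"),
    ((5.0, 3.1), "c2"), ((5.2, 2.9), "c2"), ((4.9, 3.3), "c2"),
    ((3.0, 4.8), "c3"),
    ((2.5, 3.0), "c4"), ((3.5, 3.4), "c4"), ((2.8, 2.7), "c4"), ((4.0, 3.1), "c4"),
]

v1, v2, v3 = learn_boundaries(training)
print(v1, v2, v3)
print(classify((0.3, 3.0), v1, v2, v3))
```

On separable toy data like this, the learned boundaries reproduce every training label; real traffic would generally overlap, which is why the optimum boundary is considered next.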
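When multiple training data sets are collected, the optimum classification boundary value can be obtained. The precision rate of class discrimination, $R_{\mathrm{pre}}$, is defined as
$$R_{\mathrm{pre}} = \frac{\sum_{T} I(z_i(k) \le V_j \land y_i = c_j)}{\sum_{T} I(y_i = c_j)} \quad \text{or} \quad R_{\mathrm{pre}} = \frac{\sum_{T} I(z_i(k) \ge V_j \land y_i = c_j)}{\sum_{T} I(y_i = c_j)}. \tag{20}$$
Suppose the total training data set is
$$T = \{T_1, T_2, \ldots, T_M\}, \tag{21}$$
where each subset $T_i = \{(z_1^i, y_1^i), (z_2^i, y_2^i), \ldots, (z_N^i, y_N^i)\}$ is the training data set collected in one sampling time $\Delta t$. The optimum classification boundary value $V_{\mathrm{opt}}$ is defined as
$$V_{\mathrm{opt}} = \arg\max \sum_{i=1}^{M} \frac{\sum_{T_i} I(z_j(k) \le V_{\mathrm{opt}} \land y_j^i = c_j)}{\sum_{T_i} I(y_j^i = c_j)} \tag{22}$$
or
$$V_{\mathrm{opt}} = \arg\max \sum_{i=1}^{M} \frac{\sum_{T_i} I(z_j(k) \ge V_{\mathrm{opt}} \land y_j^i = c_j)}{\sum_{T_i} I(y_j^i = c_j)}. \tag{23}$$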
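A hedged sketch of Eqs. (20)–(23) follows. It assumes that the objective is the sum of the per-subset precision rates and that candidate boundaries are the observed component values. Several candidates can attain the same maximum, so the sketch breaks ties by taking the tightest boundary that reaches it; that tie-breaking rule, like the function names and toy data, is an assumption and is not stated in the text.

```python
def precision_rate(subset, boundary, cls, component=0, below=True):
    """R_pre of Eq. (20) for one subset of ((z1, z2), label) pairs."""
    hits = total = 0
    for z, y in subset:
        if y != cls:
            continue
        total += 1
        if below:
            hits += z[component] <= boundary
        else:
            hits += z[component] >= boundary
    return hits / total if total else 0.0

def optimal_boundary(subsets, cls, component=0, below=True):
    """Boundary maximizing the summed precision rate over all subsets."""
    values = {z[component] for s in subsets for z, _ in s}
    # Scan from the tight side so the first maximizer found is the tightest.
    candidates = sorted(values, reverse=not below)
    best_v, best_score = None, -1.0
    for v in candidates:
        score = sum(precision_rate(s, v, cls, component, below) for s in subsets)
        if score > best_score:
            best_v, best_score = v, score
    return best_v

# Two hypothetical training subsets, one per sampling time.
t1 = [((0.2, 3.0), "c1"), ((0.6, 3.0), "c1"), ((2.5, 3.0), "c4")]
t2 = [((0.4, 3.1), "c1"), ((3.0, 3.0), "c4")]

print(precision_rate(t1, 0.4, "c1"))
print(optimal_boundary([t1, t2], "c1"))
```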
4. Simulation Experiment Verification and Analysis
4.1. Experimental Conditions and Processes
Based on an open website log and the MIT Lincoln Laboratory data sets [18], we use MATLAB to simulate access to the web server under normal conditions. The simulated website has 200 URLs, 10% of which are hot pages, and there are about 800 visits per simulation time unit. The EUPI under normal access is shown in Figure 2.
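The normal-access setup can be approximated with a small Python stand-in for the MATLAB simulation. This is only a sketch under assumptions: the share of visits that go to hot pages (80%), the random seed, the size of the injected flood, and the use of the entropy of the aggregated URL request counts are all illustrative choices that the text does not specify.

```python
import math
import random

def entropy(counts):
    """Shannon entropy (natural log, assumed) of non-negative counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

def simulate_counts(rng, n_urls=200, n_hot=20, visits=800, hot_share=0.8):
    """Per-URL request counts for one time unit of normal access."""
    counts = [0] * n_urls
    for _ in range(visits):
        if rng.random() < hot_share:
            url = rng.randrange(n_hot)            # hot pages: URLs 0..n_hot-1
        else:
            url = rng.randrange(n_hot, n_urls)    # remaining pages
        counts[url] += 1
    return counts

rng = random.Random(0)                 # fixed seed for reproducibility
normal = simulate_counts(rng)

# Hypothetical fixed URL attack: a flood of requests on a single URL.
attacked = list(normal)
attacked[0] += 2000

h_normal = entropy(normal)
h_attacked = entropy(attacked)
print(h_normal, h_attacked)
```

In this stand-in, the flood on one URL lowers the request entropy relative to normal access.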
Figure 2: EUPI under normal access.
When a fixed URL attack occurs, the EUPI of the Web server changes as shown in Figure 3. The fixed URL attack starts at the 30th time unit and causes the entropy to decrease markedly.
Figure 3: EUPI under fixed URL attack.
As shown in Figure 4, EUPI increases instantly when a random URL attack starts. The large number of random URL requests makes the server traffic more disordered, so the URL request entropy increases accordingly.
Figure 4: EUPI under random URL attack.
For traversal URL attacks starting at the 30th simulation time unit, the request entropy rises suddenly, as shown in Figure 5. In these attacks the URLs are relatively random within a single time unit, so the detection results are consistent with those of the random URL attack.
Figure 5: EUPI under traversal URL attacks.
4.2. Analysis and Optimization of Approach
The above experiments show that the URL request entropy changes instantly and markedly when fixed, random, or traversal URL attacks occur. The change in EUPI can therefore effectively detect the abrupt traffic changes caused by DDoS attacks.
However, a flash crowd, in which sudden surges of hits and demand (e.g., hot events, festival online shopping, and centralized e-ticketing) sharply increase URL traffic within a certain time, also changes the traffic abruptly. It is difficult to discriminate between attacks and flash crowds through EUPI alone, so EIPU is applied to detect whether AL-DDoS attacks exist.
Under normal circumstances, EIPU is shown in Figure 6.
Figure 6: EIPU under normal access.
As Figure 6 shows, EIPU is between 5.16 and 5.28 under normal access.
The URL attack starts at the 30th time unit and causes EIPU to decrease markedly, as shown in Figure 7. To detect whether an AL-DDoS attack exists and to discriminate its type, a simulation experiment of the joint entropy vector algorithm is designed. The simulation parameters are shown in Table 2.
Table 2: Simulation parameters.

| Class | Number ($K_i$) | Proportion | $V_j$ |
| --- | --- | --- | --- |
| Fixed URL attack | 6 (attacks) | 3% (source IP address) | $z_i(1) \le 0.69$ |
| Traversal URL attack | 20 (attacks) | 10% (source IP address) | $z_i(1) \ge 4.89$ |
| Flash crowd | 2 (legitimate) | 1% (URL) | $z_i(2) \ge 4.55$ |
| Legitimate IP | 194 | 87% | other |
Figure 7: EIPU under URL attack.
The simulation scenario is built from the interaction between 200 source IP addresses and 200 URLs, represented by the matrix $[x_i(k)]_{N \times N}$. There are 6 IP nodes for the fixed URL attack, 20 IP nodes for the traversal URL attack, and 194 legitimate nodes, accounting for 3%, 10%, and 87% of all nodes, respectively. There are 2 flash crowd URLs, accounting for 1% of all URLs. In the scenario, we use SOAP replay attacks to simulate DDoS: SOAP replay attackers on distributed nodes repeatedly send SOAP request messages to URLs, which exhausts the resources of the victim server. The number of requests is proportional to the attack strength and is taken as its measure. The experimental data come from the lab website log and the MIT Lincoln Laboratory data sets [18]. MATLAB is used to integrate the data and construct the matrix $[x_i(k)]_{N \times N}$. The nodes are divided into four classes, $c_1$, $c_2$, $c_3$, and $c_4$, which stand for fixed URL attack, traversal URL attack, flash crowd, and normal access, respectively. The corresponding access and attack behaviors are described in Section 2. The server records and saves the interaction process, and the data format follows that of the MIT Lincoln Laboratory data sets. We vary the distribution, attack strength, and proportion of the four kinds of nodes and run the simulation many times. The labeled data are used as training data. According to the boundary criterion (Section 3.3), the optimal boundary values satisfying formulas (22) and (23) are obtained.
The simulation results are shown in the coordinate plane with EUPI as the x-axis and jEIPU as the y-axis. As Figure 8 shows, the vector points belonging to different attack types are distributed in different regions of the coordinate plane. According to the analysis in Section 3 and Figure 7, the position of the entropy vector reflects the characteristics of the AL-DDoS attack.
Figure 8: The simulation distribution figure of entropy vector.
To verify the effectiveness of the algorithm in different cases, the AL-DDoS attack strength is defined as
$$A_{\mathrm{stre}}(j) = \frac{1}{N_j M_j} \sum_{j=1 \mid y_j = c_j}^{N_j} \sum_{k=1}^{M_j} x_j(k). \tag{24}$$
From formula (15), $N_j = K_j = \sum_{T} I(y_j = c_j)$. $M_j$ counts only the $x_j(k)$ that are not equal to 0:
$$M_j = \sum_{k=1}^{K} I(x_j(k) \ne 0). \tag{25}$$
We can also define the relative strength of an AL-DDoS attack:
$$R_{\mathrm{stre}} = \frac{A_{\mathrm{stre}}(j)\big|_{c_j = \text{attack}}}{A_{\mathrm{stre}}(i)\big|_{c_i = \text{normal}}}. \tag{26}$$
For Figure 9, 500 sets of data $T = \{T_1, T_2, \ldots, T_{500}\}$ are collected, and the detection precision rate $R_{\mathrm{pre}}$ is compared under three levels of relative strength $R_{\mathrm{stre}}$. As Figure 9 shows, the precision rate of the algorithm increases with the relative strength and can exceed 90%.
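Equations (24)–(26) can be read as the mean number of requests per accessed URL, averaged over the nodes of a class, and the ratio of that mean between attack and normal nodes. The sketch below follows that reading. Because $M_j$ depends on the node, the sketch divides each node's request total by its own count of nonzero entries before averaging, which is an interpretive assumption; the function names and toy data are hypothetical.

```python
def attack_strength(x, labels, cls):
    """Mean requests per accessed URL over the nodes labeled `cls`."""
    per_node = []
    for row, y in zip(x, labels):
        if y != cls:
            continue
        accessed = sum(1 for v in row if v != 0)       # M_j, Eq. (25)
        per_node.append(sum(row) / accessed if accessed else 0.0)
    # Average over the N_j nodes of the class, Eq. (24).
    return sum(per_node) / len(per_node) if per_node else 0.0

def relative_strength(x, labels, attack_cls, normal_cls="c4"):
    """R_stre of Eq. (26): attack strength relative to normal access."""
    return attack_strength(x, labels, attack_cls) / attack_strength(x, labels, normal_cls)

# Hypothetical request counts: one fixed URL attacker and two normal nodes.
x = [
    [30, 0, 0],
    [2, 1, 0],
    [1, 1, 1],
]
labels = ["c1", "c4", "c4"]

print(attack_strength(x, labels, "c1"))
print(attack_strength(x, labels, "c4"))
print(relative_strength(x, labels, "c1"))
```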
Figure 9: Comparisons of detection precision rate under different relative strength.
Figure 9 compares the detection precision rate under different relative strengths. The joint entropy vector can be used to quickly judge which class of DDoS attack has occurred. It can also effectively distinguish web server DDoS attacks from flash crowds, and the detection precision rate improves as the relative strength increases.
5. Conclusions and Future Work
With the popularity of the network and the rapid growth of network traffic, the burst traffic caused by hot events and centralized access often leads to service congestion and even paralysis. This burst of traffic is usually called a “flash crowd.” Flash crowds and DDoS attacks are essentially different. In this paper, an anomaly detection algorithm based on URL access entropy is proposed. The method can effectively distinguish AL-DDoS attacks from flash crowds, which has reference value for further analysis and effective detection of DDoS attacks. As discussed, an analysis of large botnet attacks, for example, those involving hundreds of thousands of zombie machines, is currently lacking. (The experimental scenario of this article is built from the interaction between 200 source IP addresses and 200 URLs.) Focusing more closely on the application layer, we plan to detect attacks on a larger network scale with more nodes in the future. As experimental conditions and the environment improve, subsequent studies will further analyze the complexity of the defense technique as the number of simulation nodes increases. As future work, we also plan to extend the detection capabilities of the framework by supporting other indicators that can be used as a measure of attack strength, such as the amount of traffic and the number of packets.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
Authors’ Contributions
Yuntao Zhao is the main contributor of this work, given that he originated the idea, provided the general design, and wrote most of the paper. Wenbo Zhang contributed to the implementation and testing of the simulation. All authors read and approved the final manuscript.
Acknowledgments
This work was supported by China Postdoctoral Science Foundation (2016M590234), General Project of Liaoning Provincial Department of Education (LG201611), the Open Foundation of Key Laboratory of Shenyang Ligong University (4771004kfs32), Postdoctoral Fund of Shenyang Ligong University, Project of Applied Basic Research of Shenyang (18-013-0-32), and 2017 Distinguished Professor Project.
References
[1] R. K. C. Chang, "Defending against flooding-based distributed denial-of-service attacks: a tutorial," vol. 40, no. 10, pp. 42–51, 2002. doi:10.1109/MCOM.2002.1039856
[2] W. Zhou, W. Jia, S. Wen, Y. Xiang, and W. Zhou, "Detection and defense of application-layer DDoS attacks in backbone web traffic," vol. 38, pp. 36–46, 2014. doi:10.1016/j.future.2013.08.002
[3] J. Mirkovic and P. Reiher, "A taxonomy of ddos attack and ddos defense mechanisms," vol. 34, no. 2, pp. 39–53, 2004. doi:10.1145/997150.997156
[4] S. Rajesh, "Protection from application layer DDoS attacks for popular websites," vol. 5, no. 6, pp. 555–558, 2013. doi:10.7763/ijcee.2013.v5.771
[5] S. Wen, W. Jia, W. Zhou, and C. Xu, "CALD: Surviving various application-layer DDoS attacks that mimic flash crowd," in Proceedings of the 4th International Conference on Network and System Security (NSS '10), pp. 247–254, September 2010. doi:10.1109/NSS.2010.69
[6] H. Beitollahi and G. Deconinck, "ConnectionScore: A statistical technique to resist application-layer DDoS attacks," vol. 5, no. 3, pp. 425–442, 2014. doi:10.1007/s12652-013-0196-5
[7] H. Beitollahi and G. Deconinck, "Tackling application-layer DDoS Attacks," in Proceedings of the 3rd International Conference on Ambient Systems, Networks and Technologies, ANT 2012 and 9th International Conference on Mobile Web Information Systems, MobiWIS 2012, pp. 432–441, Canada, August 2012. doi:10.1016/j.procs.2012.06.056
[8] G. Zhao and S. Yu, "Detecting application-layer DDoS attack based on analysis of users’ behaviors," vol. 28, no. 2, pp. 717–719, 2011.
[9] J. Jung, B. Krishnamurthy, and M. Rabinovich, "Flash crowds and denial of service attacks: Characterization and implications for CDNs and web sites," in Proceedings of the 11th International Conference on World Wide Web, WWW '02, pp. 293–304, May 2002. doi:10.1145/511446.511485
[10] K. Li, W. Zhou, P. Li, J. Hai, and J. Liu, "Distinguishing DDoS attacks from flash crowds using probability metrics," in Proceedings of the 2009 3rd International Conference on Network and System Security, NSS 2009, pp. 9–17, October 2009. doi:10.1109/NSS.2009.35
[11] S. Yu, T. Thapngam, J. Liu, S. Wei, and W. Zhou, "Discriminating DDoS flows from flash crowds using information distance," in Proceedings of the 2009 3rd International Conference on Network and System Security, NSS 2009, pp. 351–356, October 2009. doi:10.1109/NSS.2009.29
[12] G. Oikonomou and J. Mirkovic, "Modeling human behavior for defense against flash-crowd attacks," in Proceedings of the IEEE International Conference on Communications (ICC '09), June 2009. doi:10.1109/ICC.2009.5199191
[13] S. Lee, G. Kim, and S. Kim, "Sequence-order-independent network profiling for detecting application layer DDoS attacks," vol. 2011, no. 1, article no. 50, 2011. doi:10.1186/1687-1499-2011-50
[14] R. Rathika, B. Dharanya, and K. K. Devi, "Detecting the DDOS Attacks in Application Layer," 145, pp. 765–784, 2011.
[15] Y. Xie and S.-Z. Yu, "Monitoring the application-layer DDoS attacks for popular websites," vol. 17, no. 1, pp. 15–25, 2009. doi:10.1109/TNET.2008.925628
[16] J.-Q. Shi, B.-X. Fang, L. Guo, and L.-H. Wang, "Hybrid-structured onion scheme against replay attack of MIX," vol. 30, no. 3, pp. 21–26, 2009.
[17] C. Anderson, CITIC Press Group, 2016.
[18] MIT Lincoln Laboratory, 2000, http://www.ll.mit.edu/ideval/data/2000data.html