Adaptive DDoS attack detection method based on multiple-kernel learning

Distributed denial of service (DDoS) attacks have caused huge economic losses to society and have become one of the main threats to Internet security. Most current detection methods rely on a single feature and fixed model parameters, and cannot effectively detect early DDoS attacks in cloud and big data environments. In this paper, an adaptive DDoS attack detection method (ADADM) based on multiple kernel learning (MKL) is proposed. Based on the burstiness of DDoS attack flow, the distribution of addresses and the interactivity of communication, we define five features to describe the characteristics of network flow. Based on the ensemble learning framework, the weight of each dimension is adaptively adjusted by increasing the inter-class mean with gradient ascent and reducing the intra-class variance with gradient descent. The classifier that identifies an early DDoS attack is established by training two simple multiple kernel learning (SMKL) models, one driven by inter-class mean squared difference growth (M-SMKL) and one driven by intra-class variance descent (S-SMKL). A sliding window mechanism is used to coordinate S-SMKL and M-SMKL to detect the early DDoS attack. The experimental results indicate that this method can detect DDoS attacks early and accurately.


Introduction
In recent years, the security of computer networks, chips, virtual networks and mobile devices has attracted wide concern [1][2][3]. As an important platform for information exchange, the computer network has drawn particular attention to its security. Within computer network security, the distributed denial of service (DDoS) attack has remained an unsolved problem for a long time. DDoS is a traditional network attack method. The attacker controls a large number of zombie machines that send a large number of invalid network request packets to a target host. These packets consume and needlessly occupy the resources of the server, so that normal users are unable to use the services provided by the target host [4]. Although the DDoS attack mode is simple, its destructive power on the network far exceeds that of other network attacks. Anomaly-based detection is adopted by monitoring systems. By establishing a model of the normal behavior of the target system and its users, a monitoring system can determine whether the state of the system and the users' activities deviate from the normal profile, and thus judge whether there is an attack. The attack response is to properly filter or limit the network traffic after the DDoS attack is initiated, so that the attack traffic reaching the target host is reduced as much as possible and the influence of the denial of service attack is mitigated.
With the rise of cloud computing technologies and software-defined networking (SDN) concepts, DDoS attack detection in cloud computing environments and software-defined networks has received widespread attention [8,9]. As a new computing model, cloud computing has powerful distributed computing capabilities, massive storage capabilities, and diverse service capabilities [10,11]. It has become an important means of solving big data problems [12]. Therefore, establishing a cloud platform system is a necessary measure to effectively ensure the reliability, stability and security of cloud computing [13][14][15].
In recent years, machine learning has been applied to the field of security [16]. The method of constructing an attack detection model using machine learning has been widely used [17,18].
Machine learning methods play an important role in the traditional network environment, the cloud environment and the software-defined network architecture. The reason is that machine learning can deeply mine the important information hidden behind the data and combine prior knowledge to discriminate and predict new data [19]. Therefore, compared with traditional detection methods, machine learning methods can exhibit better detection accuracy [20][21][22][23][24][43]. Besides the above detection methods used to ensure the security of the system, some efficient cryptography techniques can be applied to protect the privacy of the system [44][45][46][47]. In this paper, an adaptive DDoS attack detection method is proposed. Firstly, we design the algorithms to extract five features.
Secondly, through an ensemble learning framework, the five features are used to train two multiple kernel learning models and obtain the adaptive feature weights with a gradient method.
Finally, the sliding window mechanism is used to coordinate the two models to improve the detection accuracy.
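To make the idea of coordinating two classifiers with a sliding window concrete, the following minimal Python sketch may help. It is only an illustration under stated assumptions, not the coordination rule of this paper: the function name sliding_window_alarms, the window length, the threshold and the averaging of the two model votes are all hypothetical choices made for the example.

```python
from collections import deque


def sliding_window_alarms(m_labels, s_labels, window=5, threshold=0.6):
    """Combine two per-sample label streams (1 = attack, 0 = normal).

    Assumption for this sketch: the two models vote with equal weight,
    and an alarm is raised when the mean vote over the last `window`
    samples reaches `threshold`. No alarm is raised before the window
    is full.
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` votes
    alarms = []
    for m, s in zip(m_labels, s_labels):
        recent.append((m + s) / 2.0)  # average vote of the two models
        window_full = len(recent) == window
        alarms.append(window_full and sum(recent) / window >= threshold)
    return alarms


if __name__ == "__main__":
    m_smkl = [0, 0, 1, 1, 1, 1, 1]  # hypothetical M-SMKL outputs
    s_smkl = [0, 0, 0, 1, 1, 1, 1]  # hypothetical S-SMKL outputs
    print(sliding_window_alarms(m_smkl, s_smkl, window=3, threshold=0.6))
```

Smoothing over a window in this way trades a short detection delay for robustness against isolated misclassifications by either model.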

Analysis of DDoS attack behavior
In the cloud environment, the botnets of DDoS attacks have distributed characteristics.
Each zombie machine has the ability to independently calculate, send and process data packets, and the source IP address of the packets can also be forged. The advantage of these DDoS [...] The IP data packets often present a situation in which multiple source IP addresses point to the same or several destination IP addresses, which is expressed as the asymmetry of the source IP and the destination IP in sending and receiving.
(2) Interactivity. Assume that there are two hosts, A (zombie host) and B (attacked host). When an attack occurs, there are two main ways of communication: (1) A sends packets to B (denoted as A→B); (2) A and B send packets to each other (denoted as A⇄B). The number of packets sent in the A→B way is much larger than the number sent in the A⇄B way. Therefore, compared with normal flow, the interactivity of DDoS attack flow shows different states in communication direction and amount.
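The address asymmetry and the interactivity described above can be illustrated with a minimal Python sketch. It is not one of the five features defined in this paper; the function name address_and_interaction_stats, the dictionary keys and the representation of packets as (source, destination) pairs are assumptions made only for this example.

```python
from collections import defaultdict


def address_and_interaction_stats(packets):
    """Summarize one sampling window of (src_ip, dst_ip) pairs.

    Returns the largest number of distinct sources pointing at a single
    destination, and the packet counts of one-way (A->B only) and
    two-way (A<->B) communication.
    """
    sources_per_dst = defaultdict(set)  # dst -> set of sources seen
    directed = defaultdict(int)         # (src, dst) -> packet count
    for src, dst in packets:
        sources_per_dst[dst].add(src)
        directed[(src, dst)] += 1

    one_way = 0
    two_way = 0
    for (src, dst), count in directed.items():
        if (dst, src) in directed:      # the reverse direction was also seen
            two_way += count
        else:
            one_way += count

    max_fan_in = max((len(s) for s in sources_per_dst.values()), default=0)
    return {
        "max_fan_in": max_fan_in,
        "one_way_packets": one_way,
        "two_way_packets": two_way,
    }


if __name__ == "__main__":
    window = [("a1", "v"), ("a2", "v"), ("a3", "v"), ("c", "s"), ("s", "c")]
    print(address_and_interaction_stats(window))
```

In a window dominated by attack flow, one would expect a large fan-in at the victim address and far more one-way packets than two-way packets.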

DDoS attack feature extraction
In the cloud environment, assume that network flow F is as follows: [...] (1)
In this part, [...] (2) is presented as follows: [...] In this equation, [...] is the threshold of the number of packets, [...] is the threshold of the number of ports, and t is the sampling time.
The weighted value of all packets in the SH class is defined as follows: [...] The weighted value of all packets in the SD class is defined as follows: [...] The weighted value of the number of packets of network flow F in unit time T is as follows: [...] In these equations, [...] is the SH-type port number abnormality threshold.
In this part, the MFF is defined as follows: [...] The HIAD is defined as follows: [...] In eq. (10), [...]
By the two-order alternation optimization, formula (12) [...] The gradient descent method is used to adjust J(d) on d and update d, and d and a are optimized alternately. An optimal solution is then obtained; that is, the original objective function eventually turns into (19). The detailed formulation is as follows: [...]
The normal intra-class variance is denoted: [...] The attack intra-class variance is denoted: [...] The optimal equation obtained using the above equations (22) and (23) [...]
To further determine whether the optimal equation has achieved good results, this paper sets two groups of constraint conditions, for M-SMKL and S-SMKL respectively, that do not conflict with the constraint conditions of formula (21). The constraint conditions of M-SMKL are as follows: [...] The constraint conditions of S-SMKL are as follows: [...] Here the first group of parameters, indexed 1 to 4, takes values close to 0; in the second group, the parameters indexed 1 to 3 take values close to 1 and the parameters indexed 4 to 6 take values close to 7.5. If the constraint condition is satisfied, the algorithm stops and formula (24) becomes the optimal function; otherwise, the weight of each dimension is updated iteratively. The gradients of M and S with respect to the weight of each dimension are as follows: [...] According to the gradients in equations (27) and (28), the weight of each dimension is updated as follows: [...] (29)
[...] results. Therefore, the sliding window mechanism is adopted to coordinate the two models to detect early DDoS attacks accurately.
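The adaptive weighting idea of this section, gradient ascent on an inter-class separation term M and gradient descent on an intra-class variance term S, can be sketched in Python as follows. This is a minimal sketch under explicit assumptions and does not reproduce equations (27) to (29): here M is assumed to be the sum over dimensions of the squared weight times the squared difference of class means, S is assumed to be the sum over dimensions of the squared weight times the sum of the two class variances, and the names class_stats, normalize, update_weights and step are hypothetical.

```python
def class_stats(rows):
    """Per-dimension mean and (population) variance of a list of feature rows."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(dims)]
    variances = [sum((r[k] - means[k]) ** 2 for r in rows) / n for k in range(dims)]
    return means, variances


def normalize(weights):
    """Clip negative weights to zero and rescale so the weights sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:  # degenerate case: fall back to uniform weights
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in clipped]


def update_weights(weights, normal, attack, step, mode):
    """One update of the dimension weights.

    mode "M": gradient ascent on the assumed inter-class term
              M(d) = sum_k d_k^2 * (mu_attack_k - mu_normal_k)^2
    mode "S": gradient descent on the assumed intra-class term
              S(d) = sum_k d_k^2 * (var_normal_k + var_attack_k)
    """
    mu_n, var_n = class_stats(normal)
    mu_a, var_a = class_stats(attack)
    updated = []
    for k, d in enumerate(weights):
        if mode == "M":
            grad = 2.0 * d * (mu_a[k] - mu_n[k]) ** 2
            updated.append(d + step * grad)  # ascent: grow separating dimensions
        else:
            grad = 2.0 * d * (var_n[k] + var_a[k])
            updated.append(d - step * grad)  # descent: shrink noisy dimensions
    return normalize(updated)


if __name__ == "__main__":
    normal = [[0.0, 1.0], [0.0, 3.0]]  # toy two-dimensional samples
    attack = [[4.0, 1.0], [4.0, 3.0]]
    print(update_weights([0.5, 0.5], normal, attack, step=0.01, mode="M"))
    print(update_weights([0.5, 0.5], normal, attack, step=0.01, mode="S"))
```

In this toy data the first dimension separates the classes and has no within-class spread, so both update modes shift weight toward it. In practice the update would be repeated until a stopping condition, such as the constraint conditions of this section, is met.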

Standards
The data set used for this experiment is the CAIDA "DDoS Attack 2007" data set [54]. The contents of this data set are TCP network traffic packets. Each TCP packet contains the source address, destination address, source port, destination port, packet size, and protocol type.
The duration of normal flow data used in this paper is 2 minutes in total, and the duration of attack data is 5 minutes in total.
We used the above five feature extraction algorithms to extract features from the data set.
In Figure 9, the value of the ordinate shows that the HIAD best reflects the difference between the normal flow and the attack flow, while having better stability in the latter half of the attack flow. After the early data, this feature distinguishes clearly between normal flow and abnormal flow, and therefore influences the classifier more and leads to better decisions.

Experimental Results and Analysis
In summary, all five features have their own unique characteristics. To make full use of the characteristics of each feature, the feature values extracted by these five algorithms are used together as a five-dimensional feature data set. Using these five feature values as training sets, two multiple kernel learning models, dominated by gradient ascent and gradient descent respectively, are trained, and the corresponding five-dimensional feature weight vectors are obtained. Finally, according to the framework of figure 2, the classification results of the test set are obtained and are used to verify the effectiveness of the method. The parameters of M-SMKL are set as follows: l1 = 2 × 10^-5, l2 = 2 × 10^-3; the three constraint parameters close to 1 (indices 1 to 3) are 1.002, 1.0065 and 1.007; and the two constraint parameters close to 0 (indices 1 and 2) are 0.000084 and 0.000001.
The parameters of S-SMKL are set as follows: l1 = 2 × 10^-5, l2 = 2 × 10^-2; the three constraint parameters close to 7.5 (indices 4 to 6) are 7.3425, 7.8340 and 7.8350; and the constraint parameter with index 3 is [...]
As shown in figure 10- [...] The experimental data are presented in Table 1, Table 2, and Table 3.
Figure 18: The FR contrast diagram of four algorithms for amplifying the normal flow
Table 2: Comparison results of four algorithms for narrowing the attack flow

Conclusion
In this paper, five-dimensional features are [...] We believe that the approach will have great value in the security of cloud computing, cloud robotics [56], intelligent transportation [57], IoT and so on.
In the follow-up work, we will further study how to transform the multi-dimensional weight adaptive problem based on multiple kernel learning into a convex optimization problem, and improve the detection rate and convergence speed of the method.

Conflicts of Interest
There are no conflicts of interest in this paper.