The prosperity of mobile networks and social networks brings revolutionary conveniences to our daily lives. However, due to the complexity and fragility of the network environment, network attacks are becoming more and more serious. Characterization of network traffic is commonly used to model and detect network anomalies and finally to raise the cybersecurity awareness capability of network administrators. As a tool to characterize system running status, entropy-based time-series complexity measurement methods such as Multiscale Entropy (MSE), Composite Multiscale Entropy (CMSE), and Fuzzy Approximate Entropy (FuzzyEn) have been widely used in anomaly detection. However, the existing methods calculate the distance between vectors solely using the two most different elements of the two vectors. Furthermore, the similarity of vectors is calculated using the Heaviside function, which has a problem of bouncing between 0 and 1. The Euclidean Distance-Based Multiscale Fuzzy Entropy (EDM-Fuzzy) algorithm was proposed to avoid the two disadvantages and to measure entropy values of system signals more precisely, accurately, and stably. In this paper, the EDM-Fuzzy is applied to analyze the characteristics of abnormal network traffic such as botnet network traffic and Distributed Denial of Service (DDoS) attack traffic. The experimental analysis shows that the EDM-Fuzzy entropy technology is able to characterize the differences between normal traffic and abnormal traffic. The EDM-Fuzzy entropy characteristics of ARP traffic discovered in this paper can be used to detect various types of network traffic anomalies including botnet and DDoS attacks.
The prosperity of network technologies, such as mobile networks and social networks, brings revolutionary changes to our daily lives. However, due to the complexity and fragility of the network infrastructures, network anomalies and attacks frequently cause serious problems and significant loss to people. Researchers are studying various cybersecurity awareness technologies to help people understand the security status and trend of networks. Characterization of network anomaly traffic is one of the key technologies commonly used to model and detect network anomalies and then to raise the cybersecurity awareness capability of network administrators. The existing approaches of network anomaly detection can be mainly classified into six categories [
Network anomaly detection via traffic feature distributions is becoming more and more popular these days. As the measure of uncertainty, entropy can be used to summarize feature distributions in a compact form [
Investigating the irregularity of signals generated by complex systems is valuable for predicting future states as well as detecting abnormal behaviors [
However, the existing methods calculate the distance between vectors solely using the two most different elements of the two vectors. Furthermore, the similarity of vectors is calculated with Heaviside function, which has a problem of bouncing between 0 and 1. To this end, we proposed a novel entropy technology named EDM-Fuzzy in the paper [
The rest of this paper is organized as follows. The related works are introduced in Section
Network anomaly traffic detection approaches have been extensively explored. The existing approaches can be mainly classified into six categories [
Entropy-based technologies are highly valued for measuring the degree of disorder or irregularity of a complex system. Thus, a number of entropy-based technologies have been proposed and widely applied in detecting anomalies of complex systems. Khan et al. [
Pincus [
Entropy-based network anomaly detection via traffic feature characterization is becoming more and more popular these days. Ranjan et al. [
However, the existing state-of-the-art entropy algorithms, such as MSE, CMSE, RCMSE, MMSE, and FuzzyEn, still have two disadvantages. First, they calculate the distance between two vectors solely from the two most different elements of the vectors. Second, they calculate the similarity of vectors using the Heaviside function, which has the problem of bouncing between 0 and 1. In order to address these shortcomings, we proposed a novel entropy technology [
EDM-Fuzzy measures the distance of the two vectors with Euclidean distance taking all the corresponding elements in the two vectors into the computation. Furthermore, in order to solve the problem of instability, we choose the hyperbolic function as the fuzzy function instead of the Heaviside function to define the similarity between vectors with full-range continuous values from zero to one based on the Euclidean distance of the two vectors. The computation process of EDM-Fuzzy is formally described in Algorithm
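The two design choices described above can be illustrated with a small numerical sketch. It is only an illustration under stated assumptions: the helper names are hypothetical, the "most different elements" distance is taken to be the maximum absolute difference, and the fuzzy function is assumed to be `1 - tanh(d / r)`, since the exact hyperbolic form is not given in this section.

```python
import numpy as np

def chebyshev(u, v):
    # Distance from the single most different pair of elements.
    return float(np.max(np.abs(u - v)))

def euclidean(u, v):
    # Distance that takes every pair of corresponding elements into account.
    return float(np.sqrt(np.sum((u - v) ** 2)))

def heaviside_similarity(d, r):
    # Hard threshold: similarity jumps between 1 and 0 at d == r.
    return 1.0 if d <= r else 0.0

def fuzzy_similarity(d, r):
    # ASSUMED hyperbolic form; continuous and decreasing from 1 towards 0.
    return float(1.0 - np.tanh(d / r))

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.1, 2.1, 3.1, 4.6])  # one large difference, three small ones
c = np.array([1.6, 2.6, 3.6, 4.6])  # every element differs by about 0.6

# The max-difference distance cannot tell b and c apart; Euclidean can.
print(chebyshev(a, b), chebyshev(a, c))
print(euclidean(a, b), euclidean(a, c))

# Near the tolerance r, the hard threshold flips while the fuzzy value barely moves.
r = 0.5
print(heaviside_similarity(0.499, r), heaviside_similarity(0.501, r))
print(fuzzy_similarity(0.499, r), fuzzy_similarity(0.501, r))
```

With the maximum-difference distance, `b` and `c` look equally far from `a`, whereas the Euclidean distance separates them; similarly, a tiny change in distance around `r` flips the Heaviside similarity but changes the fuzzy similarity only slightly.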
Input: time series; time scale; vector dimension; tolerance coefficient; standard deviation of the time series.
Output: EDM-Fuzzy entropy value of the time series.
Step 1: Coarse-grain the time series.
Step 2: Calculate the mean of each vector.
Step 3: Move the vectors.
Step 4: Calculate the Euclidean distance of the two vectors.
Step 5: Calculate the similarity between the two vectors.
Step 6: Calculate the average similarity between each vector and the other vectors.
Step 7: Compute the average of the average similarities.
Step 8: Compute the Euclidean distance-based fuzzy sample entropy value for every coarse-grained time series.
Step 9: Compute the fuzzy sample entropy value for the original time series at the given time scale.
The goal of the algorithm is to measure the complexity and irregularity of time series more accurately and stably. The input of the algorithm is a time series
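The steps of the algorithm can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function names, the defaults `m=2` and `r_coef=0.15`, the interpretation of "move the vectors" as subtracting each vector's mean, and the fuzzy function `1 - tanh(d / r)` are all assumptions made for illustration.

```python
import numpy as np

def coarse_grain(x, scale):
    # Average consecutive, non-overlapping windows of length `scale`.
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def _mean_similarity(y, dim, n_vec, r):
    # Build n_vec embedding vectors of length `dim`.
    vecs = np.array([y[i:i + dim] for i in range(n_vec)])
    # ASSUMPTION: "moving" a vector means removing its own mean (baseline).
    vecs = vecs - vecs.mean(axis=1, keepdims=True)
    # Euclidean distance between every pair of vectors.
    diff = vecs[:, None, :] - vecs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # ASSUMED fuzzy function: continuous similarity in (0, 1].
    sim = 1.0 - np.tanh(dist / r)
    np.fill_diagonal(sim, 0.0)  # exclude self-matches
    return sim.sum() / (n_vec * (n_vec - 1))

def edm_fuzzy_entropy(x, scale=1, m=2, r_coef=0.15):
    # Tolerance is the coefficient times the std of the original series.
    r = r_coef * np.std(x)
    y = coarse_grain(x, scale)
    n_vec = len(y) - m  # same number of vectors for dimensions m and m + 1
    phi_m = _mean_similarity(y, m, n_vec, r)
    phi_m1 = _mean_similarity(y, m + 1, n_vec, r)
    return float(np.log(phi_m) - np.log(phi_m1))
```

An entropy curve over time scales 1 to 40 would then be `[edm_fuzzy_entropy(x, scale=s) for s in range(1, 41)]`, provided the series is long enough that the coarse-grained series at scale 40 still contains more than `m + 1` points. The pairwise distance matrix grows quadratically with the series length, so long traces would need a chunked computation.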
A suitable network traffic trace is essential to research on the characterization of anomalous network traffic. The traces used in this paper are publicly accessible and record anomalous activities including botnet and DDoS attacks. By analyzing these public traces with the EDM-Fuzzy algorithm, we can discover the characteristics of such anomalous activities.
The botnet traffic trace used in this section is the CTU-13 trace that was collected and provided by the Stratosphere Laboratory of CTU University in the Czech Republic [
Scenarios of Botnet traffic.
ID | IRC | SPAM | CF | PS | DDoS | FF | P2P | US | HTTP |
---|---|---|---|---|---|---|---|---|---|
1 | ✓ | ✓ | ✓ | — | — | — | — | — | — |
2 | ✓ | ✓ | ✓ | — | — | — | — | — | — |
3 | ✓ | — | — | ✓ | — | — | — | ✓ | — |
4 | ✓ | — | — | — | ✓ | — | — | ✓ | — |
5 | — | ✓ | — | ✓ | — | — | — | — | ✓ |
6 | — | — | — | ✓ | — | — | — | — | — |
7 | — | — | — | — | — | — | — | — | ✓ |
8 | — | — | — | ✓ | — | — | — | — | — |
9 | ✓ | ✓ | ✓ | ✓ | — | — | — | — | — |
10 | ✓ | — | — | — | ✓ | — | — | — | ✓ |
11 | ✓ | — | — | — | ✓ | — | — | — | ✓ |
12 | — | — | — | — | — | — | ✓ | — | — |
13 | — | ✓ | — | ✓ | — | — | — | — | ✓ |
Traffic volume of 13 types of Botnet scenarios.
ID | Duration (hours) | Packets | Malware type | Infected hosts |
---|---|---|---|---|
1 | 6.15 | 71971482 | Neris-1 | 1 |
2 | 4.21 | 71851300 | Neris-2 | 1 |
3 | 66.85 | 167730395 | Rbot-1 | 1 |
4 | 4.21 | 62089135 | Rbot-2 | 1 |
5 | 11.63 | 4481167 | Virut-1 | 1 |
6 | 2.18 | 38764357 | Menti | 1 |
7 | 0.38 | 7467139 | Sogou | 1 |
8 | 19.5 | 155207799 | Murlo | 1 |
9 | 5.18 | 115415321 | Neris-3 | 10 |
10 | 4.75 | 90389782 | Rbot-3 | 10 |
11 | 0.26 | 6337202 | Rbot-4 | 3 |
12 | 1.21 | 13212268 | NSIS.ay | 3 |
13 | 16.36 | 50888256 | Virut-2 | 1 |
Table
Table
A DDoS attack is an abnormal network behavior designed to exhaust server resources. It congests the server, which then becomes unable to provide services to users. The DDoS traffic trace used in this paper is CICDDoS2019, which was published by the Canadian Institute for Cybersecurity (CIC) [
DDoS attack time on November 3.
Type ID | Attack type | Attack time |
---|---|---|
1 | PortMap | 9:43–9:51 |
2 | NetBIOS | 10:00–10:09 |
3 | LDAP | 10:21–10:30 |
4 | MSSQL | 10:33–10:42 |
5 | UDP | 10:53–11:03 |
6 | UDPLag | 11:14–11:24 |
7 | SYN | 11:28–17:35 |
DDoS attack time on December 1.
Type ID | Attack type | Attack time |
---|---|---|
1 | NTP | 10:35–10:45 |
2 | DNS | 10:52–11:05 |
3 | LDAP | 11:22–11:32 |
4 | MSSQL | 11:36–11:45 |
5 | NetBIOS | 11:50–12:00 |
6 | SNMP | 12:12–12:23 |
7 | SSDP | 12:27–12:37 |
8 | UDP | 12:45–13:09 |
9 | UDPLag | 13:11–13:15 |
10 | TFTP | 13:35–17:15 |
Two days of traffic were collected in this trace, which were November 3 and December 1, as shown in Tables
Types of DDoS attacks.
As shown in Figure
Entropy-based time-series complexity measurement methods are widely used in fault diagnosis and anomaly detection of various complex systems. In this section, we apply EDM-Fuzzy to the characterization and detection of network traffic anomalies. Two kinds of anomalies, botnet and DDoS attack, are analyzed. For each of them, we calculate the EDM-Fuzzy entropy values of the corresponding network protocol time series to obtain its entropy curves, and we then study the characteristics of the abnormal traffic by comparing the differences between the entropy curves.
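This section does not state how a protocol time series is derived from a packet trace. One common choice, assumed here purely for illustration, is to count the packets of the protocol of interest in fixed-length intervals. The function name and the one-second default interval below are hypothetical.

```python
import numpy as np

def packets_per_interval(timestamps, interval=1.0):
    # Turn packet arrival times (in seconds) into a count-per-interval series.
    # ASSUMPTION: the entropy is computed on such a count series.
    ts = np.sort(np.asarray(timestamps, dtype=float))
    if ts.size == 0:
        return np.zeros(0, dtype=int)
    # Index of the interval each packet falls into, relative to the first packet.
    idx = np.floor((ts - ts[0]) / interval).astype(int)
    n_bins = int(idx[-1]) + 1
    # Intervals without packets are kept as zeros so the series stays evenly spaced.
    return np.bincount(idx, minlength=n_bins)

# Hypothetical ARP packet arrival times in seconds.
arp_times = [0.0, 0.2, 0.9, 1.1, 3.5]
print(packets_per_interval(arp_times))  # [3 1 0 1]
```

Keeping the empty intervals matters because the coarse-graining step assumes evenly spaced samples.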
In this section, we study the EDM-Fuzzy entropy characteristics of the abnormal ARP traffic of the 13 botnet types in the CTU-13 dataset. In the TCP/IP architecture, ARP belongs to the network layer; its main function is address translation, that is, finding the physical address of the host that corresponds to a given IP address. We first calculate the entropy values for each type of botnet from the ARP traffic in the CTU-13 dataset at time scales from 1 to 40. The resulting entropy curves of the 13 types of botnets are shown in Figure
Entropy curves of 13 types of botnets in the CTU-13 dataset.
As can be seen from the figure, the entropy curves of most types of botnet traffic share common trends. More specifically, 11 entropy curves (Neris-1, Neris-2, Rbot-1, Rbot-2, Virut-1, Menti, Sogou, Murlo, Neris-3, NSIS.ay, and Virut-2) have an inflection point at the time scale of 20, and all entropy curves have an inflection point at the time scale of 30. For these 11 types of botnet ARP traffic, the entropy values between the two inflection points first increase and then decrease. The entropy curves of Rbot-3 and Rbot-4 follow a different trend: they grow steadily around the time scale of 20, although they also have an inflection point at the time scale of 30. Moreover, the entropy values of Rbot-4 are significantly larger than those of the other types of anomalies. These results suggest that the attack methods of Rbot-3 and Rbot-4 differ from those of the other types of botnets. This difference is caused by the way they infect hosts, and the complexity of the botnet is consistent with the complexity of its ARP traffic.
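The inflection points discussed above can be located programmatically once an entropy curve is available. The sketch below is an assumption about how this could be done, not the procedure used in the paper: it reads an "inflection point" as a time scale at which the curve changes direction, and the function name is hypothetical.

```python
import numpy as np

def turning_points(curve, first_scale=1):
    # curve[k] is the entropy value at time scale first_scale + k.
    curve = np.asarray(curve, dtype=float)
    slope = np.sign(np.diff(curve))
    # A turning point is where the slope changes from rising to falling or back.
    idx = np.where(slope[:-1] * slope[1:] < 0)[0] + 1
    return [int(i) + first_scale for i in idx]

# Hypothetical entropy values at time scales 1..6.
example = [0.10, 0.20, 0.30, 0.25, 0.20, 0.30]
print(turning_points(example))  # [3, 5]
```

On real curves, small fluctuations would produce many spurious turning points, so some smoothing or a minimum-change threshold would likely be needed before comparing the result with time scales 20 and 30.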
In this section, we study the EDM-Fuzzy entropy characteristics of the malicious traffic of the distributed denial-of-service attacks on November 3 and December 1 in the CICDDoS2019 dataset. Analyzing the trend of the entropy values reveals more characteristics of DDoS attack traffic. As introduced with the dataset, seven and ten types of DDoS attacks were launched on November 3 and December 1, respectively. We first calculate the entropy values of the ARP traffic of each type of DDoS attack in the CICDDoS2019 dataset at time scales from 1 to 40, and the entropy curves are shown in Figures
Entropy curves of DDoS attacks on November 3.
Entropy curves of DDoS attacks on December 1.
Figures
In this section, we analyze the characteristics of network traffic under normal conditions. The normal traffic trace used in this paper was captured and published by the Stratosphere Laboratory.
In order to study the entropy characteristics of normal traffic, the EDM-Fuzzy entropy values are calculated on the CTU-Normal-20 and CTU-Normal-23 traces with time scales from 1 to 40 and the results are shown in Figure
Entropy values of normal ARP traffic.
As can be seen from Figure
Compared with the time series of the CTU-13 dataset, Figure
In this section, we will compare the ARP traffic entropy curves between botnet, DDoS attack, and normal status and then characterize the differences between normal and abnormal traffic.
By comparing the entropy curves of ARP traffic of botnet, DDoS attack, and normal status, we find the following main differences between normal traffic and malicious traffic. Among the entropy curves of the 13 types of botnets, 11 curves (Neris-1, Neris-2, Rbot-1, Rbot-2, Virut-1, Menti, Sogou, Murlo, Neris-3, NSIS.ay, and Virut-2) have an inflection point at the time scale of 20, and all curves have an inflection point at the time scale of 30. For DDoS traffic, all entropy values are larger than 0.18 when the time scale is larger than 4, and the entropy values of most types of DDoS attacks gradually stabilize between 0.3 and 0.5 when the time scale is larger than 10. In contrast, the entropy values of normal ARP traffic grow slowly as the time scale increases and remain smaller than 0.18 at all time scales.
In order to be presented more intuitively, the main characteristics of entropy curves of ARP traffic of botnet, DDoS attack, and normal status are listed in Table
Characteristics of EDM-Fuzzy entropy curves of ARP traffic of botnet, DDoS, and normal status.
Characteristic | Botnet | DDoS | Normal |
---|---|---|---|
Value | Mostly between 0.1 and 0.4. | Larger than 0.18 when the time scale is larger than 4. | Smaller than 0.18. |
Trend | Entropy curves of all types of botnet traffic have an inflection point at a time scale of 30; 11 types have an inflection point at a time scale of 20. | Gradually stabilized to a value between 0.3 and 0.5 when the time scale is larger than 10. | Increase steadily from 0 to 0.18. |
On the basis of the above analysis, it is reasonable to conclude that the entropy curves of ARP traffic of botnet, DDoS, and normal status are clearly distinguishable. Thus, these characteristics can readily be used to detect these types of network traffic anomalies. In the future, we will study the characteristics of EDM-Fuzzy entropy curves of more types of network traffic anomalies and combine the learned characteristics with intelligent algorithms to detect network anomalies automatically.
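As an illustration of how the value ranges reported above might be turned into a rule, the following sketch labels an ARP entropy curve using only the 0.18 threshold. It is a hypothetical heuristic, not a detector proposed or evaluated in this paper; the function name and the labels are assumptions, and botnet curves, which are characterized mainly by their inflection points, are deliberately left undetermined.

```python
def label_arp_entropy_curve(curve, threshold=0.18, min_scale=5):
    # curve[k] is the entropy value at time scale k + 1.
    values = list(curve)
    if not values:
        return "undetermined"
    # Normal ARP traffic: entropy stays below the threshold at all time scales.
    if all(v < threshold for v in values):
        return "normal-like"
    # DDoS ARP traffic: entropy exceeds the threshold once the scale is above 4.
    tail = values[min_scale - 1:]
    if tail and all(v > threshold for v in tail):
        return "ddos-like"
    # Anything else (including botnet-like curves) needs further analysis.
    return "undetermined"

# Hypothetical curves over time scales 1..7.
print(label_arp_entropy_curve([0.01, 0.03, 0.06, 0.09, 0.12, 0.15, 0.17]))  # normal-like
print(label_arp_entropy_curve([0.05, 0.10, 0.15, 0.17, 0.20, 0.30, 0.40]))  # ddos-like
```

A practical detector would need validation on labeled traffic, since the thresholds here are taken from the specific traces analyzed in this paper.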
In order to raise the cybersecurity awareness capability of network administrators, it is necessary to develop new technologies for detecting network anomalies more accurately and efficiently. The basis of such network anomaly detection technologies is an understanding of the characteristics of abnormal network traffic. In this paper, we apply the EDM-Fuzzy technology as a tool to analyze the characteristics of abnormal network traffic such as botnet network traffic and DDoS attack traffic. EDM-Fuzzy is a technology that we proposed for analyzing and diagnosing faults and anomalies of complex systems by measuring the complexity and regularity of their time-series signals. The experimental analysis shows that the EDM-Fuzzy entropy curve is capable of characterizing the difference between normal traffic and abnormal traffic, and that the characteristics can readily be used to detect various types of network traffic anomalies. In the current work, we have not investigated other types of network anomalies and have not completed the automatic detection of network traffic anomalies. In the future, we will investigate the EDM-Fuzzy entropy characteristics of more types of network anomalies and then integrate EDM-Fuzzy entropy with deep-learning technologies to propose a novel network anomaly detection method.
The authors declare that they have no conflicts of interest.
This work was in part supported by the National Key Research and Development Program of China under Grant 2019YFB2102100, the China Postdoctoral Science Foundation under Grant 2016M600465, the Key Research and Development Program of Zhejiang Province under Grant 2019C03134, and the National Natural Science Foundation of China under Grant 61772165.