GLD-Net: Deep Learning to Detect DDoS Attack via Topological and Traffic Feature Fusion

Distributed denial of service (DDoS) attacks are the most common means of cyberattacks against infrastructure, and detection is the first step in combating them. The current DDoS detection mainly uses the improvement or fusion of machine learning and deep learning methods to improve classification performance. However, most classifiers are trained with statistical flow features as input, ignoring topological connection changes. This one-sidedness affects the detection accuracy and cannot provide a basis for the distribution of attack sources for defense deployment. In this study, we propose a topological and flow feature-based deep learning method (GLD-Net), which simultaneously extracts flow and topological features from time-series flow data and exploits graph attention network (GAT) to mine correlations between non-Euclidean features to fuse flow and topological features. The long short-term memory (LSTM) network connected behind GAT obtains the node neighborhood relationship, and the fully connected layer is utilized to achieve feature dimension reduction and traffic type mapping. Experiments on the NSL-KDD2009 and CIC-IDS2017 datasets show that the detection accuracy of the GLD-Net method for two classifications (normal and DDoS flow) and three classifications (normal, fast DDoS flow, and slow DDoS flow) reaches 0.993 and 0.942, respectively. Compared with the existing DDoS attack detection methods, its average improvement is 0.11 and 0.081, respectively. In addition, the correlation coefficient between the detection accuracy of attack flow and the four source distribution indicators ranges from 0.7 to 0.83, which lays a foundation for the inference of attack source distribution. Notably, we are the first to fuse topology and flow features and achieve high-performance DDoS attack intrusion detection through graph-style neural networks. This study has important implications for related research and development of network security systems in other fields.


Introduction
Popular industries such as shopping, education, finance, government affairs disclosure, and communications connect core services, such as payments, instant messaging, and big data analysis, to the Internet in real time for user access. Due to these services' vulnerability and high value, attacks on infrastructures that provide these services are favored by hackers. One of the most common attacks to block these services is the DDoS attack [1]. How to deal with DDoS attacks to ensure network smoothness has become a research hotspot [2]. e traditional defense strategy assumes that the attack topology is a static point-to-point model whose topology remains unchanged during the attack [3,4]. Under this premise, DDoS detection is mainly realized through changes in traffic size, and the corresponding single-point defense is relatively simple [5]. In 2022, Israel's network providers were hit by a large DDoS attack from abroad, paralyzing the website of the Interior Ministry for hours. e same year, the Ukrainian government suffered repeated DDoS attacks from Russia and Belarus, forcing multiple portals to shut down to avoid losses. e network situation has changed as DDoS attacks shift from individual behaviors to confrontations between countries. Except for the increase in the attack traffic, the range of attack sources continues to expand; the flow topology evolves during the attack. Conventional single-point defense cannot cope with these changes, and multisource protection requires attack source location. However, previous detections cannot identify the attack distribution, thus impossible to precise defense. erefore, we need a detection method that can determine the attack distribution through topology changes to support further attack tracing and defense deployment.
DDoS detection aims to distinguish attack traffic from legitimate traffic. According to the different fields of mathematics, the current mainstream DDoS detection methods can be divided into three categories: statistics, machine learning, and deep learning. e statistical method uses measures such as entropy to evaluate the traffic distribution's change. It is simple and requires no additional hardware support. However, its detection effect depends on thresholds, which researchers usually give directly [6]. is subjective assignment lacks an objective basis, affecting the reliability of results. Machine learning classifies network traffic through classifiers designed based on selected features. Due to modeling using features, it exhibits excellent flexibility. However, reliance on feature engineering makes it less adaptable in the face of complex real network traffic. In addition, traditional machine learning belongs to shallow learning, making it difficult to learn deep relationships. us, its accuracy is usually less than 90%. Deep learning utilizes multilayer neural networks to learn the inherent laws of network traffic. e feature extraction is contained in the neural network structure without additional processing. Besides, multilayer neural networks can mine deep information, making up for the defects of shallow learning. DDoS detection based on deep learning has high accuracy and efficiency. For different requirements and problems, many related research studies are emerging [7]. ese studies usually improve performance by improving or fusing network architectures. For example, convolutional neural network (CNN) and recurrent neural network (RNN) are used to process the relationship between features in time and space [8], RNN and automatic codec are combined to improve the detection sensitivity on SDN [9], and adaptive transfer learning is introduced to achieve small sample detection [10]. ese methods achieve good performance by exploiting the efficient information in the input as much as possible through elaborate devised architecture and parameters.
DDoS attacks have two notable characteristics: (1) from a spatial perspective, heavy traffic in the short term changes the distribution of adjacent upstream nodes of the victim host [11]. (2) From a time perspective, the prolonged blocking makes the limited attack nodes have multiple attack behaviors on the target [12]. ese two intrinsic peculiarities of DDoS attacks make the network topology before and after the attack significantly different. erefore, in addition to traffic characteristics, DDoS attacks can also be detected based on the topology changes [13]. is difference is implicit in the evolution of the topology structure, which traffic statistics cannot depict. Introducing topology changes can improve detection accuracy and help analyze the distribution of attack sources. e data (such as graph) considering topological connection are non-Euclidean data. Sample points (nodes) have different numbers of neighbor points, and edges depict their interdependence. However, traditional deep learning requires Euclidean data as input to extract features. For example, CNN needs the sample to be regular and independent. RNN demands the data to be a one-dimensional real vector. e linear input cannot deal with topological relationships. Graph attention network (GAT) is a powerful analysis tool for graph data [14]. It incorporates the attention mechanism into the graph neural network and captures associations through neighborhoods. Further, the attention mechanism assigns different weights to adjacent nodes, improving feature sensitivity. In this study, we treat topology as graph data. In particular, edge attributes denote traffic features, and node attributes indicate topological features. erefore, GATcan simultaneously analyze traffic and topological features with the graph as input. To our knowledge, we are the first to achieve DDoS attack detection using graph-style deep learning. e main contributions of this study are as follows: (1) e proposed dynamic topology construction algorithm integrates topology and flow features into node or edge attributes. (2) GAT is used to mine topology change patterns and train classifiers. e rest of this study is organized as follows. In Section 2, we discuss research related to DDoS detection. Section 3 describes the details of the proposed method in terms of feature extraction and deep learning architecture. Section 4 designs the experiments and analyzes the results. We summarize this research in Section 5. Finally, the shortcomings and future research are pointed out in Section 6.

Related Work
In recent years, DDoS detection research mainly acquires traffic features containing attack-specific information through feature acquisition [15]. en, characteristics are analyzed based on different theories or tools to discover traffic classification patterns [16]. Section 2 describes current DDoS feature acquisition methods and summarizes three mainstream DDoS detection methods: statistics, machine learning, and deep learning.

Feature Acquisition.
Valid feature input is critical to traffic classification performance since it determines the valuable information contained in samples. ere are two main methods for feature acquisition: output features using generator tools (such as CICFlowMeter) [17,18] and custom features based on subjective experience [19,20]. e former applies public datasets or traffic extraction tools to obtain features, while the latter designs corresponding features according to application requirements. In 2017, Yuan et al. [17]  e above methods propose efficient feature acquisition strategies. Nevertheless, these flow feature extraction ways lack the characterization of the topology. erefore, it is needed to define topological features and give corresponding acquisition methods.

Statistical Method.
Statistical methods use numerical distribution to differentiate traffic. In 2017, Hoque et al. [22] proposed a new correlation indicator NaHiD based on standard deviation and mean. Experimental results show that this measure is more robust and sensitive to state changes than traditional metrics. In 2022, Tsobdjou et al. [23] raised a dynamic entropy threshold algorithm based on Chebyshev inequality. Comparative experiments indicate that this method can better adapt to varied online environments than static thresholds. e same year, Ahalawat et al. [24] proposed a Renyi entropy DDoS attack detection technique based on the packet drop strategy. It can evaluate the probability distribution of flow fluctuations and achieve better results than the Shannon entropy. ese methods analyze the numerical fluctuation of flow from a macro-view. However, their application scope is narrow due to the lack of fine-grained characterization.
us, statistical methods are usually not used alone for comprehensive evaluation.

Machine Learning.
Machine learning can automatically learn feature patterns and create classifiers. In 2019, Gu et al. [25] proposed the DDoS detection algorithm SKM-HFS. Weighted K-means analysis balances the number of samples and accuracy, and the density clustering center algorithm optimizes the extreme values.
e results show that this method performs best when choosing TOPSIS as the evaluation factor. In 2020, Pande et al. [26] utilized the random forest algorithm to distinguish between normal and attack samples and used the WEKA tool to detect DDoS attack ping of death. Experiments on NSL-KDD indicate that random forest achieves the highest accuracy of 99.76% on specific attacks. In 2021, Cvitic et al. [27] understand DDoS detection as a multi-device classification problem and distinguish traffic generated by different IoT devices through a logical model tree. A comparison of four typical devices shows that the logical model tree can better identify DDoS traffic from IoT devices. In 2022, Kumar et al. [28] designed the recursive feature elimination method RFE. It is also combined with the random forest algorithm to train the classifier. Experiments show that this method can cope with fast detection under large network traffic.
e above methods extract relevant information from the traffic details. However, they rely heavily on feature engineering and have low performance in the face of large samples. Hence, we need to find a more efficient detection model.

Deep Learning.
Deep learning applies a multilayer neural network to obtain the correlation between input and output. In 2019, Liang and Znati [29] employed LSTM in a DDoS detection framework. LSTM captures the implicit sequence representation in the input vector through three gating units.
is method can learn flow-level modes, avoiding expensive and error-prone feature engineering. In 2020, Doriguzzi-Corin et al. [30] proposed LUCID, a lightweight DDoS detection system that utilizes one-dimensional CNN to reduce computational load. Experiments on ISCX2021, CIC-IDS2017, and CSE-CIC2018 datasets show that LUCID has a 40x reduction in processing time compared with other deep learning methods, so it is suitable for detection under limited resources. In 2021, Cil et al. [31] built a traffic classification model based on the deep neural network. Its structure contains feature extraction, and training can be completed with only three fully connected layers. Experiments on CIC-DDoS2019 show that the model has an accuracy of 95%. In 2022, Boonchai et al. [32] implemented two DDoS detection models using the DNN architecture and autoencoder, respectively, and verified the attack recognition ability of the models through the CIC-DDoS2019 dataset with an accuracy rate of 87% and 91.9%, respectively.
A single method is challenging to meet diverse DDoS detection needs. erefore, many scholars extend the applicability through method mixing. In 2019, Pektaş and Acarman [8] extracted five statistical features: duration, bytes, packets, periodicity, and states through network traffic summary and mined semantic information in the feature sequence through CNN and RNN. e accuracy of this method reaches 99.1%, significantly higher than a single network. In 2020, Wang and Liu [33] employed information entropy and deep learning to detect DDoS attacks in SDN. First, IP entropy identifies malicious traffic routers, and then, CNN classifies packet-level traffic. is method achieves 98.98% accuracy and also reduces training time. In 2020, Elsayed et al. [9] proposed DDoSNet, an intrusion detection system for SDN. is system combines RNN and autoencoder. RNNs capture sequence relationships, and autoencoders detect small perturbations. Compared with baseline methods such as decision tree, random forest, and support vector machine, DDoSNet is more stable and achieves an accuracy of 99%. In 2021, Shieh et al. [34] built a DDoS unknown traffic discovery model BI-LSTM-GMM. It consists of bidirectional LSTM (BI-LSTM) and Gaussian mixture model (GMM). GMM labels the unknown traffic and adds it to the new input of BI-LSTM. Experiments show that this method can identify unknown attacks through reinforcement learning. In 2022, Almaraz-Rivera et al. [35] designed a new intrusion detection system based on machine learning and deep learning models to solve the unbalanced detection of DDoS attack categories. It combines decision trees and multilayer perceptrons to test binary classification performance on different datasets, avoiding data and fragmentation effects.
Besides binary classification, multi-classification studies that ease defense deployment are also emerging. In 2019, Toupas et al. [36] utilized stacked fully connected layers for intrusion detection. Experiments show that this method can better learn the difference between fast and slow DDoS flows with an accuracy of 95.62%. In 2020, Alzahrani et al. [37] proposed FastGRNN, a DDoS multi-classification method for IoT, which reduces training complexity by adding residual to hidden states. It achieves 1:5 optimization of detection time and training time to adapt to real-time detection. In 2020, Hussain et al. [38] used ResNet for complex traffic detection. ey convert traffic into a three-channel format and analyze it through ResNet. is method achieves an accuracy of 87% in distinguishing normal flow, fast DDoS flow, and slow DDoS flow and an increase of 9% compared with other methods. In 2022, Rusyaidi et al. [39] designed a high-precision DDoS attack detection system based on DNN and LSTM. It achieved an accuracy of 97.37% on the NSL-KDD dataset and excellent performance in identifying 22 traffic types.
With the in-depth development of deep learning, many researchers also apply new architectures to optimize DDoS detection performance. In 2020, He et al. [10] employed transfer learning for small-sample DDoS detection. ey also define the transfer ability to evaluate different networks and select the best network structure and parameters. is method improves the detection accuracy on small samples by 20.8%, which can effectively cope with training degradation. In 2021, Novase et al. [40] utilized generative adversarial network (GAN) to detect DDoS adversarial attacks. It improves system robustness through adversarial training and uses IP entropy to analyze continuous traffic for real-time monitoring.
is method shows strong adaptability in detecting adversarial attacks. In 2022, Doriguzzi-Corin and Siracusa [41] proposed an adaptive mechanism for DDoS attack detection based on federated learning, FLAD. It updated federated learning to solve the integration problem in dynamic security confrontation, monitoring the status locally without interaction. e experimental results verified the efficiency and performance of the method.
Deep learning has shown advantages in different detection requirements. However, it can only process traffic characteristics and not extract topology information. us, we need to find a new way to consider both features to improve detection accuracy and lay a basis for attack source localization.

Methodology
is section details the procedure and implementation of deep learning detection based on topological and flow features. As shown in Figure 1, our proposed DDoS detection system has three main parts. e first part is the extraction module. It is responsible for extracting features from public datasets or actual scenes and transforming samples into graph data consisting of nodes and edges. e second part is the training module, which builds a classification model that can mine deep-level information from samples. e input is sample data, the output is label type, and parameters are optimized during training. e third part is the evaluation module, which compares detection effects under different hyperparameters to select the optimal configuration.
When the above stages are completed, the pattern analysis for feature extraction is saved as an extractor, and the trained neural network is preserved as a classifier. en, real traffic can be quickly classified by running through these processing parts only once without retraining.

Feature Extraction.
Building a topology diagram is the core of topology feature extraction. It maps from raw traffic data to dynamic topology plots that evolve; an example is shown in Figure 2. In particular, f i denotes the flow's distribution originating from the corresponding ip n , G m denotes the subgraph under time slice t m , and the indicators in the feature table denote extracted samples. Besides, F, Tab, G, and T represent the set of flow distribution f 1 , f 2 , · · · , f s , feature table [tab 1 , tab 2 , · · · , tab m ], topology map G 1 ∪ G 2 ∪ · · · ∪ G m , and time slice T 1 + T 2 + · · · + T m , respectively. In Figure 2, there are two major stages. e first stage realizes the transformation from traffic data to node or edge features. e traffic records of different source IPs are divided according to the time unit. en, the time slice proportions of features are formed into [IP-Feature] pairs and saved in the feature table. e second stage builds the connection graph, adds attributes to nodes or edges according to the feature table, and decides whether to add based on the flow that exists or not in the topology. e final topological feature map G � G 1 ∪ G 2 ∪ · · · ∪ G m on m subtime segments is obtained when the addition is finished.
In Figure 2, the feature table determines the attributes of edges and nodes. Besides statistical features, we also add connection state, packet marker, and centrality features. e attack pattern implied in the connection state sequence can distinguish attack phases [8]. For example, the target maintains many half-open connections in a SYN flood attack, making the state list of long LISTENs. e normal communication state is composed of LISTEN, ESTAB-LISHED, and CLOSED. In this case, the proportion of LISTEN differs from that of SYN attack. erefore, state sequences reflecting this divergence can be used for traffic classification. Packet flags reveal the attacker's malicious attack intention. For example, regular data packets must be queued in the buffer before parsing, while numerous URG flags set to 1 increase the processing priority, thus enabling fast attacks. Hence, the packet tampering details that macroscopic features cannot describe are hidden in the packet marking sequence, thus detecting the attack. We choose the degree and betweenness centrality based on the understanding that destructive attackers usually control more agents to execute attacks, and targets are generally critical nodes [42]. en, these two centralities can capture the attack preference to realize attack detection. In conclusion, we extract multiple node or edge features for attack detection, addressing the one-sidedness of the training data.
We extract features from ICMP, UDP, and TCP, respectively. e features of protocols except TCP are the same. For clarity, we take TCP features as an example to illustrate the attributes of the extracted features, as shown in Table 1. Among them, t s represents an arbitrary time field. Table 1 includes eight edge and two node features. In particular, the edge feature depicts the traffic distribution through the edge in period t s ; the node feature describes the spatial distribution of adjacent nodes within t s . URG and ECE flags are extended to lists to preserve the time-varying properties. We also introduce degree and betweenness centrality to characterize topology changes. e meaning and acquisition of the features in Table 1 are described below.
Among edge features, except "connection number," "connection states," "URG flag," and "ECE flag," the other four can be calculated by statistical formula. Notably, the    Computational Intelligence and Neuroscience standard deviation measures the discrete distribution of samples. When the number of samples with the same IP and protocol in t s is 3, regular flows are much more than attack flows, making the training challenging to converge. So, when the number of samples is not less than 3, the standard deviation has practical significance. In addition, in subsequent experiments, we found that the detection efficiency and accuracy are balanced when the number of samples is not less than 4. erefore, we only consider four or more identical protocol connections established between the same node pair as actual training data. Connection number and connection state are two macro-edge features. e former reflects the frequency of establishing connections between nodes; the latter reflects the continuous change in the protocol state. For example, CLOSED means all active links are closed; LISTEN signifies waiting for new requests; and ESTABLISHED means the connection is established. ese states can be obtained through traffic analysis tools like TShark or CICFlowMeter [43]. However, the acquired elements are of type string and hard to use for training directly. We use one hot to convert state sequences into real vectors to simplify computation [44]. In particular, zero indicates that the state is not enabled, and one denotes that the state is activated. "URG flag" and "ECE flag" record the state sequence of consecutive packets. ese two flags represent unexpected events during the sending of traffic. In particular, a URG of 1 indicates that the current data packet is prioritized and should be processed without queuing; an ECE of 1 indicates congestion, and the sending window decreases. e abnormal state to one is set, and the normal state is set to zero; then, the state sequence is a list of zeros and ones. is binary list can be used for training directly without encoding.
Two node features, degree and betweenness centrality, characterize the connectivity properties of the neighborhood. Degree centrality uses the number of adjacent nodes to denote the node importance. Let Len be a function of solving the number of non-repetitive elements; N neighbor represents the number of neighboring nodes of node N; 〈N s src , N s des 〉 indicate flow s. en, degree centrality is formulated as follows: Betweenness centrality measures the node importance by the ratio of shortest paths' number through a node. Let Path(N) denote the number of shortest paths containing node N, and Path(src i , dst j ) denote the number of shortest paths between src i and dst j . en, the formula of betweenness centrality is as follows: We also show the construction process of the DDoS topology map through pseudo-code, as shown in Algorithm 1.
Algorithm 1 can be divided into three main stages, namely, the initial assignment stage (1∼2), configuration stage (3∼11), and mapping stage (12∼21). e graph structure is initialized in the initial assignment phase, and necessary parameters are set. Statistical features are calculated based on item durations' total and average in the configuration stage. In the final phase, nodes and directed edges are added to graph G, and features are attached to them. After the above steps are completed, the topology graph G is constructed.
To sum up, by extracting the features in Table 1 and constructing a dynamic topology graph according to Algorithm 1, the flow or topology attributes are included in the node or edge features. Till now, we have obtained structured data for training. is part introduces GLD-Net, a deep learning model capable of analyzing and fusing topology and flow features. Its structure is shown in Figure 3. is model has three main parts: the GAT layer, LSTM, and the fully connected layer (also known as the dense layer). Firstly, an L-layer GAT network is used to analyze the topological data. Its output is a spatial sequence over the neighborhood. Secondly, sequence relationships in the output are mined by LSTM. Finally, the dense layer reduces the feature dimension, and the softmax function limits the output size between zero and one.
is value corresponds to the traffic label to achieve classification. In the following subsections, we will detail the processing method of each neural network and the information transfer within it. In particular, L and K in Figure 3 represent the number of attention mechanisms and the number of splice heads in multi-head attention, respectively. e detailed parameter functions and setting methods of the GAT, LSTM, and dense layer will be explained in each subsection.

GAT Layer.
e two basic units of GAT are attention coefficient calculation and information aggregation, as shown in the dotted box in Figure 3. e structure of the attention coefficient calculation is shown in Figure 4.
In Figure 4, we take four adjacent nodes N j � (n 1 j , n 2 j , n 3 j , n 4 j ) of node n i as an example to illustrate the information transfer progress in calculating the attention , ur/ij be the intermediate variables obtained by splicing the initial node n i and the adjacent node n r j after feature enhancement φ, and Concat denote concatenation operation; that is, Let w be a trainable shared weight, and φ(n i ) and φ(n r j ) can be obtained by linear transformations, which are, respectively, expressed as follows: A similarity coefficient e r ij can be obtained by the inner product of the intermediate variable u r ij and the trainable parameter vector s → . In addition, the deviation of similarity coefficients is corrected by LeakyReLU. e negative axis slope of LeakyReLU retains negative values so that similarity coefficients do not suffer from the loss of negative information like ReLU. e formula of e r ij is expressed as follows: e r ij needs to be normalized on the interval [0, 1] to facilitate information aggregation. According to different transformation modes, normalization can be divided into linear methods, such as min-max [45] and Z-score [46], and nonlinear methods, such as softmax [47]. In particular, minmax only needs extremum and current values. Its calculation is simple but easily affected by individual points. Z-score utilizes comprehensive information and is less affected by outliers. However, the data must meet the normal distribution; otherwise, the output will be seriously distorted. e exponential calculation of softmax is a smooth derivative transformation that retains each value's influence and has no data distribution requirements. We choose the nonlinear function softmax as the normalization method to prevent the loss of complex information in the transformation process. en, the attention coefficient σ r ij can be calculated by the following formula: e attention coefficient σ r ij contains the correlation between node n i and neighbor node n r j . en, the information aggregation based on neighborhood nodes can be realized with the attention coefficient. Its structure is shown in Figure 5.
In Figure 5, the new node feature n i ′ can be calculated by the weighted sum of all neighbor node features φ(n r j ) with σ r ij . e calculation formula is as follows: where τ is the transformation function that maps the original vector space R r to a new vector space R r′ centered at n i ′ . Due to errors, there may be offsets in a single calculation. erefore, we use multi-head attention with parameter K to improve the robustness of the results. Commonly used methods of combining multiple attention include concatenation and averaging [14]. We choose arithmetic averaging as the synthesis algorithm for reduced dimensionality and higher efficiency. en, the following formula is obtained: Except for the learnable parameters w and s → , other parameters, including the number of attention mechanisms L, the number of multiple heads K, and the gradient θ of LeakyReLU, are hyperparameters configured before training. Grassia et al. [48] pointed out that the size of L is related to the ability of information aggregation, and a single attention mechanism can learn node features up to 3 hops away. us, the bigger L is, the wider the range of information aggregation is. In datasets such as NSL-KDD2009, the number of IP hops of data packets does not exceed seven jumps [49], so L set to three can meet the requirements. K determines the learning perspective of relevant information. Efficiency and accuracy are balanced when K is set to 20 in subsequent experiments. θ affects the weight update rate, and its value should be adapted to the dataset size and learning depth. Combined with the tradeoff theory [50], θ is set to 0.3.
To sum up, we achieve local information aggregation of node features through GAT's feature transformation and multi-head attention mechanism. is study considers edges and nodes as entities of the same status. Edge features are merged into node features for unified processing to simplify computation.
Computational Intelligence and Neuroscience 7

LSTM.
After GAT training, the output vector n ′ � n i ′ , i ∈ no de is obtained. is vector contains the spatial sequence information of nodes in the neighborhood. Suppose it is directly poured into the classifier without processing. In that case, it will cause the loss of semantic information. Common network structures for processing sequence data are RNN and LSTM [51]. RNN uses memory units to retain historical data. us, the output is determined by the previous data and current input. However, due to the disappearance of the back propagation gradient, it is easy to cause short-term memory. LSTM enhances memory with gating units to learn relevant information in longer sequences. ere are many adjacent attack nodes in DDoS attacks, making the distance between related data in the original data larger. erefore, we choose LSTM to mine long sequence information. For clarification, we use the input x t of LSTM at time t as an example to illustrate the information flow. In Figure 3, the memory cell S t at time t comprises three gating units: input gate I(t), forgetting gate F(t), and output gate O(t). Assuming that the output of the memory cell at time t-1 is c t−1 , then the calculation formulas of I(t), F(t), and O(t) at the next moment t are as follows: where W t and b t represent the weight and bias of transformation at time t, respectively. e sigmoid function controls the values of I(t), F(t), and O(t) to fall within the interval [0,1]. In particular, one means all the information flow is passed, and zero means the information flow is blocked. Let B(t) be the data to be processed, and its calculation formula is as follows: where W t e and b t e are the parameters of memory cell state transition. Assume the intermediate state of the memory cell at time t-1 is B t−1 . e recording of B(t) and the forgetting of B t−1 are controlled by I(t) and F(t), respectively. Let ⊗ denote the defined gating transformation; then, at the next moment, the updated intermediate state B t can be expressed as follows: e output gate O(t) controls the actual information passing through the intermediate state B t . en, the formula to obtain the final output c t is as follows: In addition to the learnable weight W and bias b, the calculation of c t has three key parameters: the input vector dimension, the state dimension of the intermediate layer, and the number of memory cell layers. In particular, the input vector dimension is consistent with the received data n ′ . e middle layer's state dimension determines memory cells' learning ability. It is set to 32 to cover as many patterns as possible. e internal recursive structure of LSTM makes nonparallel operations more complicated. Noticeably, the excellent extraction ability of LSTM makes it unnecessary to stack too many layers in practical applications. For example, Google Translate only requires no more than eight layers to complete the vast majority of bidirectional translation tasks   Computational Intelligence and Neuroscience [52]. In this study, when the number of memory cell layers is set to 3, the correlation extraction of DDoS data can be satisfied.

Dense Layer.
e previous chapter realized the fusion of sequence information. en, in this part, the final evaluation value will be obtained based on information aggregation. In Figure 3, LSTM is followed by fully connected layers, constituting a classifier with the dropout layer and activation function. GAT and LSTM map the DDoS raw sample to the feature space. en, the fully connected layer maps the learned feature representation to the DDoS label space.
e dense layers are set to 3 to learn nonlinear correlation [53]. e number of neurons in each layer is 128, 64, and 32, considering the running efficiency and learning ability. We also add dropout layers after the first and second layers to avoid overfitting. In testing, the removal probability was set to 0.3 to improve the model's generalization ability. Distinguishing the attack type (slow or fast) is a multiclassification problem. Softmax is selected to assign probabilities between 0 and 1 for different input samples. e formula is as follows: is model belongs to supervised learning. Labels allow the model to use the feedback value of the cross-entropy loss function to correct errors during the back propagation. e weights and biases are updated layer by layer to approximate the expectation. Training ends when all iterations are over. Training is done multiple times, and the best performing parameters are saved for fast classification.

Experiment
In this section, we elaborate on the implementation and evaluation details of the proposed method. First, running environments are illustrated to enhance the reproducibility of results. Secondly, training datasets are selected, and data preprocessing is given. en, measures including accuracy, recall, precision, and F1-score are used to evaluate the effectiveness of detection methods. Finally, compared with baselines and state-of-the-art techniques, the performance of the proposed method is verified, and its efficiency is examined. Further, the correlation between the detection value and the source distribution is also analyzed.

Running Environment.
e experiments were run on a Windows 10 workstation with Intel Core i7-12700H 4.7 GHz processor, 32 GB RAM, 512 GB SSD, and NVIDIA RTX 3060 graphics card. e GLD-Net model uses Python 3.5 as the programming language and adopts Keras as the deep learning framework to improve portability. Keras provides structured modules and connects to the GPU for acceleration via the backend engine TensorFlow's cuDNN library. Additionally, libraries such as Pickle, NumPy, and SciPy are loaded to enhance the efficiency of algorithms.

Datasets.
Commonly used cybersecurity datasets include NSL-KDD2009 [54], CIC-IDS2017 [55], CIC-IDS2018 [56], and CIC-DDoS2019 [57]. In particular, CIC-IDS2018 and CIC-DDoS2019 simulate DDoS attacks through point-topoint transmission and lack topology characterization, which cannot meet the needs of this study. In addition to the 76 basic features collected based on CICFlowMeter, CIC-IDS2017 includes the timestamp, source IP and port, destination IP and port, protocol, and attack type. Topology structures in different periods can be obtained through the connection relationship's change between the source and destination IPs. Although IPs are not added in NSL-KDD2009, they can be obtained by parsing the original pcap of DARPA 98 and associated with the traffic record. erefore, this study chooses two public datasets, NSL-KDD2009 and CIC-IDS2017, as the experimental datasets. e attack on the fifth day of CIC-IDS2017 was a DDoS attack, and its traffic was recorded in "Friday-WorkingHours-Afternoon-DDos.pca-p_ISCX.csv." e topology change is illustrated by taking every 50 traffic records of the fifth day as a unit. Further, four destination IPs, 192.168.10.25/3/50/9, are selected as examples, as shown in Figure 6.
In Figure 6, a typical DDoS topology A-B-C appears in period 51∼100. Compared with the other two periods, the topology has changed significantly. From the number, there is a jump change 2-3-2; from the structure, there are different connection relationships: one-to-one, many-to-one, and one-to-many. e connections with the same source or destination address at various stages are also distinct. In brief, the topological changes like Figure 6 in the dataset can support the validation of the findings of this study. e first dataset, NSL-KDD2009, is an improved version of KDD99 [58]. It optimizes some inherent problems of KDD99, such as the repeated identical records, missing data, and disproportionate training and testing data. is dataset covers 39 conventional attack methods, including six information gathering (probe), ten blocking attacks (DDoS), nine privilege acquisition (U2L), and 14 remote logins (R2L). Its traffic composition is shown in Table 2, where the bold characters indicate the attack types of the training data. e second dataset, CIC-IDS2017, was developed by Sharafaldin et al. [59] to implement real network traffic collection based on user behavior simulation. It optimizes NSL-KDD2009 by adding the latest attack methods, expanding the feature set, and adding metadata. Fourteen new attack methods are included: two information gathering, six DDoS attacks, three privilege acquisition, and three remote logins. e dataset is not differentiated by training and test data but by acquisition period. Table 3 describes its composition.

Preprocessing.
DDoS-related traffic records are extracted from NSL-KDD2009 and CIC-IDS2017 and classified according to different attack principles, as shown in Table 4. Due to erroneous data, improper formatting, and redundancy, these two datasets require preprocessing before neural network training.

Computational Intelligence and Neuroscience
First, non-numeric features are standardized. ere are two types of non-numeric features in the dataset: irrelevant and categorical strings. e former, such as Flow ID, Source IP/Port, Destination IP/Port, and Timestamp in CIC-IDS2017, have nothing to do with flow characterization and are removed from the dataset. For the latter, such as pro-tocol_type, service, and flag in NSL-KDD2009, its classification includes detection information, which must be converted before use. ere are two standard methods of string conversion: one-hot and normalized encoding [60]. One-hot sparse matrix has an enormous dimension and low computational efficiency. is study uses continuous integers [0,1,2, ...] to encode the classification and map it between zero and one through normalization.
Secondly, the classification labels are digitized. Unlike categorical features, the Euclidean distance between labels  Input: NetFlow items F � f 1 , f 2 , · · · , f s , connection relationship C � < n u ⟶ n v > between node pair (n u , n v ) in node set N, time series T � t 1 , t 2 , · · · , t m Output: network structure graph G of node and edge distribution based on time series (1) Initialize graph G � G 1 ∪ G 2 ∪ · · · ∪ G m , normalize traffic F, create mapping table N ⟶ C F; (2) set edge list P �, time segment serial number i, item serial number j, items' quantity num; Calculate the total time T and average time T that fall into period t i ; (8) num � num + 1; (9) P.append(num); (10) extract features on t i , such as duration, interval, and bits; (11) save as edge i num � table(attr 1 , attr 2 , · · · , attr s ), (14) if graph G m does not contain node N src or N des of C i j then (15) add edge 〈N src , N des 〉 to graph G m ; (16) if graph G m contains the edge between N src and N des then (17) add extracted time attributes edge i num to the edge 〈N src , N des 〉; (18) else (19) create an edge between N src and N des ; (20) add extracted time attributes edge i num to the edge 〈N src , N des 〉; (21) return G ALGORITHM 1: Construction of dynamic DDoS topology graph based on time series. Computational Intelligence and Neuroscience used for error metrics cannot be represented by consecutive integers with uneven differences. After one-hot encoding, the distance between categories is the same and easy to matrix transformation. erefore, we choose one hot to represent labels for efficient loss function computation. en, the labels translate to normal traffic (1,0,0), fast traffic (0,1,0), and slow traffic (0,0,1). e columns marked with one here represent the corresponding classifications.
irdly, the invalid data are removed. ere are two types of useless data: useless row or column data. Useless row data include rows containing ambiguous characters "NaN" and "Infinity." Useless column data include the column where the 43rd feature "success_pred" of NSL-KDD2009 is located. is feature denotes the number of correct predictions, regardless of traffic attributes. Both are deleted from the dataset directly.
Finally, topological features are extracted and normalized. According to Algorithm 1, the topological structure data are obtained. We use min-max to normalize features to cancel the influence of different scales. x min and x max are used to represent the minimum and maximum values of feature x, respectively, and its calculation formula is as follows: x � x After the above processing, NSL-KDD2009 is transformed into a matrix consisting of 41 features and 45927 moments. CIC-IDS2017 is transformed into a matrix composed of 77 elements, with a total of 2827876 moments. e values in each matrix are between 0 and 1; then, we get normalized input data that are easy for deep learning architectures to process.

Performance Metrics.
e detection accuracy verification of the proposed method includes distinguishing between background traffic and attack traffic and between regular traffic, fast attack traffic, and slow attack traffic. e former is a two-class problem, and the latter is a three-class problem. e two have different fine-grained partitions, so we use the targeted evaluation criteria to measure. For binary classification, the indicators are established through the confusion matrix, which has four essential components: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). TP refers to the correct classification of positive samples as positive classes; FP is the proportion of negative samples misidentified as positive classes; TN refers to the correct classification of negative samples as negative classes; and FN refers to the misclassification of positive samples as negative classes. Based on the combination of these parts, we can get four performance metrics: precision, recall, precision, and F1score. In particular, accuracy is the proportion of correctly classified samples x correct to the total samples x total . e formula can be expressed as follows: Recall is the ratio of correctly classified positive samples x correct to the total positive samples x total . Its calculation is as follows: recall � x correct x total � TP TP + FN .
Precision is the ratio of the correctly classified positive samples x correct to the detected positive samples x total + . e formula is expressed as follows:  precision � x correct x total + � TP TP + FP . (17) F1-score refers to the weighted harmonic mean of recall and precision. It is used to measure the relative stability of the two. Its formula is as follows: For triple classification, we adopt comprehensive metrics to measure the overall performance of the detection method, such as macro-average and micro-average. In particular, the macro-average calculates the mean of the metrics under all categories; the micro-average is an extension of the twocategory metrics. Considering all class effects, we choose the macro-average as the three-class measure. Let n denote the number of classifications and Xi denote the ith value of indicator X, and the formula is as follows: Furthermore, this study also studies the relationship between the evaluation result and the distribution of attack source IPs. e Pearson coefficient is used as the correlation measure. Let cov(X, Y), σ x , and σ y denote the formulas for calculating the covariance and variance of variables x and y, respectively, and E(X) and E(Y) denote the expectation calculation. Its formula is as follows: . (20) 4.5. Results and Analysis. As shown in Table 4, the traffic composition of NSL-KDD2009 and CIC-IDS2017 is quite different. For these imbalanced datasets, 10-fold cross-validation is used for optimization. is method divides the sample into ten equal subsamples, sequentially uses one part for testing and the remaining nine parts for training, and takes the average of 10 times as the final result.
Testing the same method on different datasets may yield different results. erefore, all DDoS detection methods are validated on the same dataset for comparative effectiveness. In addition, the same features in Table 1 were chosen for training for control variables. e experiment consists of four parts: the comparison of two-classification methods, the comparison of three-classification methods, the correlation analysis of source IP distribution, and the analysis of method performance. Twoclassification and three-classification methods are not always the same. erefore, different baselines and state-of-the-art methods are selected as the comparison objects for the two comparison experiments.

Two-Classification Experiment.
We choose six baseline and state-of-the-art methods as comparison objects in the binary classification experiment. In particular, baselines include the statistical method NaHiD [22], machine learning SKM-HFS [25], and random forest [26]; the latest methods include LUCID [30], DDoSNet [9], and BI-LSTM-GMM [34]. Baselines are reproduced with the Python library. In particular, NaHiD is obtained according to the mean and standard deviation of NumPy. SKM-HFS and random forest are calculated according to scikit-learn. ese methods do not support GPU, so all baselines run on CPU. Furthermore, deep learning runs on GPU and compares the efficiency.
First, on NSL-KDD2009, GLD-Net is compared with six other means to verify binary classification performance. e epochs are set to 100, and the results are shown in Figure 7. As shown in Figure 7, these methods have different effects. Random forest achieves the best performance among the baselines with an accuracy of 0.896. e three deep learning methods show better detection performance with scores all above 0.9. In particular, BI-LSTM-GMM achieved the highest accuracy of 0.97 among the three. GLD-Net performs the highest metric on four indicators compared with the above techniques. Its accuracy reaches 0.991, which is 0.205 and 0.021 higher than the baselines and BI-LSTM-GMM, respectively.
Secondly, we also conducted a binary classification comparison experiment on CIC-IDS2017, and the result is shown in Figure 8. As shown in Figure 8, there is a significant gap between different methods. NaHiD still performs poorly, with both precision and recall not exceeding 0.65. e accuracy of deep learning is excellent, all exceeding 0.95. Compared with the other six methods, GLD-Net achieves the best performance, 0.191 and 0.0101 higher than the baseline and BI-LSTM-GMM.
Finally, we compare the accuracy distributions of GLD-Net and the three newest methods on two classifications, as shown in Figure 9. From Figure 9, GLD-Net has the highest accuracy and a concentrated distribution across multiple tests. e upper and lower quartile distances of LUCID and DDoSNet exceed 0.01, and the gap between the maximum and the minimum is close to 0.02. In contrast, the quantile distance of GLD-Net is only 0.003, more than four times lower than the average distance of 0.014 of other newest methods, showing the stability of the attack detection.
In summary, the accuracy of GLD-Net on NSL-KDD2009 and CIC-IDS2017 reaches 0.9914 and 0.9942, respectively. Compared with the latest methods, its average improvements are 0.021 and 0.0101; its stability increases four times, showing the best detection performance and stability.

ree-Classification Experiment.
Given the low precision, statistical methods are usually not used to solve multiclassification problems. We choose five baselines and state-ofthe-art methods as comparison objects in three-classification experiments. Among them, the baseline methods include decision tree [27] and random forest [26], and the latest methods include Stacked-DNN [36], FastGRNN [37], and ResNet [38]. Baselines are calculated according to scikit-learn.
First, we compare the three-classification performance of GLD-Net and five other methods on NSL-KDD2009. e results are shown in Figure 10. As shown in Figure 10, GLD-Net achieves the best three-classification performance compared with other methods. Its macro-accuracy reaches 0.958, an average improvement of 0.174 and 0.047 over baselines and the newest techniques. Secondly, based on CIC-IDS2017, the three-classification experiment was performed, and the results are shown in Figure 11. As shown in Figure 11, GLD-Net achieves the best performance, with the accuracy and F1-score reaching 0.925 and 0.924, respectively. Compared to Figure 10, deep learning performance degrades partly due to varying traffic types in datasets. Compared with baselines and the newest methods' averages, GLD-Net improves the accuracy by 0.131 and 0.019, respectively.
Finally, we compare the difference in the confusion matrix among GLD-Net and three other methods, as shown in Figure 12. e colors in the graph range from white to blue, representing accuracy from 0 to 1.0. e darker the blue, the higher the ratio. As shown in Figure 12, GLD-Net achieves good results in traffic type detection, and TP exceeds 0.9. While the normal flow of decision tree, slow flow of random forest, and fast flow of ResNet have lower TP, which is 0.62, 0.75, and 0.86, respectively. Detection based on GLD-Net has better balance and can meet the needs of fine-grained discrimination.
Compared with the state-of-the-art methods, the threeclass accuracy of GLD-Net is improved by 0.047 and 0.02, respectively. Its availability is also increased by 0.023, showing better performance and broad applicability.

Distribution Correlation Analysis.
First, the correlation between TP of attack detection and the number of attack source IPs is investigated. e results are shown in Figure 13. It can be seen from Figure 13 that there is a positive correlation between the TP of GLD-Net and the number of attack source IPs. Its Pearson coefficient is 0.789; greater than 0.75 shows a strong correlation.
Secondly, the correlation between TP and the IP hop count (the average hop count of all leaf nodes) is examined. e result is shown in Figure 14. As shown in Figure 14, a positive correlation exists between TP and IP hops' numbers using GLD-Net. e Pearson coefficient is 0.695, close to 0.7, indicating that the two have specific relevance. irdly, at the network level, the number of subnets [61] and closeness centrality [62] are used to investigate the aggregation and distribution of attack sources. e results are shown in Figure 15. It can be seen from Figure 15 that the number of subnetworks and closeness centrality increase with the rise of TP. After 200 cycles of calculation and taking the mean value, the Pearson coefficients obtained are 0.812 and 0.834, respectively, and over 0.8 indicates a strong correlation.
Finally, GLD-Net is compared with other detection methods using non-topological features as input in the correlation between TP and attack source IP number, hop number, subnet number, and closeness centrality. e results are shown in Figure 16. It can be seen from Figure 16 that the correlation coefficients of the comparative methods are primarily in the range 0.3∼0.5. e correlation coefficients of GLD-Net fall on the interval 0.7∼0.83, and the average increases in the four correlations are 0.441, 0.36, 0.393, and 0.391, respectively. e rise of around 0.4 shows that GLD-Net is more capable of inferring the distribution of attack sources based on the detection results than other methods.

Efficiency Analysis.
First, we investigate the accuracy variation of GLD-Net during 100 epochs. We choose BI-LSTM-GMM and ResNet as the two-classification and threeclassification comparison objects. e results are shown in Figure 17. It can be seen from Figure 17 that the accuracy of BI-LSTM-GMM and ResNet tends to be stable at the 22nd and 28th epochs, respectively. e accuracy of GLD-Net gradually stabilized at the 11th or 16th epoch. In contrast, our proposed model can converge faster and achieve better performance.
Secondly, we also analyze the training and testing time of GLD-Net. We selected six other methods for comparison, and the result is shown in Figure 18. As shown in Figure 18, the average training time of GLD-Net reaches 1312 s, 4.14 times that of random forest and 1.59 times that of ResNet. e test time of GLD-Net is 107 s, which is 8.16% of its training time and 50.47% of random forest. e results show that although the training time is slightly longer than other deep learning methods, the test time is still within the tolerance range. is overhead is worthwhile compared with the improved accuracy.
To summarize, GLD-Net converges in 11 or 16 epochs in binary or multi-classification, an average of 11.5 epochs ahead of the best other methods. e average training and testing times of GLD-Net are 1312 s and 107 s, respectively. Its training time is 4.14 times that of random forest; the test time is only 50.47% of that of random forest, indicating the better practical efficiency of GLD-Net.

Conclusions
In this study, we propose GLD-Net, a new deep learning DDoS attack detection method based on topological and flow features. A graph model is introduced for feature extraction. Traffic features are added to edge features, and node features represent topological features. A dynamic DDoS topology feature construction algorithm is proposed by calculating the feature table and mapping topological entities on the time series. For non-Euclidean input, GAT mines complex topological relationships, and LSTM extracts sequence correlation in vectors. Finally, the fully connected layer obtains the traffic type through data integration. e experimental results show that DDoS detection with topology and flow features as input can solve the problem of limited accuracy due to incomplete feature input. It can also estimate the distribution of attack sources based on the detection results, which facilitates the rapid and accurate deployment of subsequent security strategies. In the future, we also need to design a more fine-grained differentiation model for different traffic types and explore unknown traffic discovery techniques.
ese related researches will expand the scope of application to escort system security in the current increasingly complex network confrontation situation.

Future Research
We verified the effectiveness of the proposed DDoS detection method GLD-Net through comparative experiments on two network security datasets. Nevertheless, there are still the following issues to be studied. Question 1. We mainly verify the method's performance in distinguishing normal, fast, and slow traffic for the multiclassification. Whether this method is suitable for more finegrained differentiation, such as HTTP applications, requires further verification.

Question 2.
e neighborhood computation efficiency of GAT is not high, and it cannot cope with real-time training. Next, we need to study a lightweight GAT structure to improve the execution speed.

Question 3.
is research mainly focuses on relationship mining in the existing traffic. It cannot discover unknown traffic such as 0-day attacks. erefore, it is necessary to study novel deep learning methods that simultaneously identify known and anonymous traffic.
Data Availability e NSL-KDD2009 and CIC-IDS2017 datasets used to support the finding of this study are included within the article.

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this study.