Recurrent and Deep Learning Neural Network Models for DDoS Attack Detection

A distributed denial of service (DDoS) attack is a subclass of denial of service attack that inflicts severe damage in a cloud computing environment. It makes a malicious attempt to disrupt the usual services of a network or server by using botnets. Hence, an efficient intrusion detection system (IDS) is essential to detect this attack. Limitations of the existing IDS models for DDoS attack detection include delayed convergence, local stagnation, and trapping in local and global optima. These limitations are addressed by the proposed recurrent neural network (RNN) and deep learning- (DL-) based models, which can utilize the previous states of the hidden neurons. The proposed research uses a long short-term memory (LSTM) recurrent neural network and an autoencoder-decoder-based deep learning strategy with the gradient descent learning rule. The network parameters, such as weight vectors and bias coefficients, are tuned optimally by the proposed hybrid Harris Hawks optimization (HHO) and particle swarm optimization (PSO) algorithm. The proposed hybrid optimization algorithm selects the essential attributes, and the results obtained confirm that the proposed LSTM and deep learning models outperform all other models developed in the literature.


Introduction
1.1. Distributed Denial of Service Attack. The DDoS attack is one of the most severe and feared malicious cyber-attacks. It brings a website or server down by flooding it with fraudulent traffic, ultimately making it inactive. Generally, DDoS attack packets have a high bit rate and target the network layer [1]. In this attack, a botnet master controls botnet machines located at various remote sites. Botnets are used because their traffic closely resembles normal Internet traffic patterns, and the owners of the compromised machines are unaware of the commands they receive. Since numerous attack machines are involved, it is complicated to shut them all down. The four components of a DDoS attack are the attacker, the master, the zombies, and the victim. The attack is generally classified into two types: bandwidth depletion and resource depletion. In resource depletion, cloud resources are targeted, preventing legitimate users from accessing them. In bandwidth depletion, the victim's network resources are targeted, as shown in Figure 1.
The DDoS attack architecture follows an agent-handler model. In this model, the master communicates with the other parts of the DDoS attack system: the master uses software packages on the Internet called handlers to communicate with the agents. Botnets, the so-called compromised machines, run agent software that carries out the DDoS attack. The attacker can communicate with numerous handlers to identify the active agents and schedule an attack.
1.2. Recurrent Neural Network. Recurrent neural networks are deep neural networks that can be trained on large volumes of data and perform well on natural language processing, speech recognition, and other classification problems [2]. Recurrent neural networks differ from feedforward neural networks by their recurrent structure: storage units in the hidden layers keep the history of previous hidden states, which is utilized to estimate the output of the current iteration. The basic structure of the RNN is shown in Figure 2. The layer units are represented in Figure 3, in which the weight values of the input, hidden, and output units are defined as $W_{ih}$, $W_{hh}$, and $W_{oh}$. The previous hidden states are fed back through the delay unit $Z^{-1}$ to compute the current iteration's output, so the history of previous outputs is employed during the learning phase.
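The recurrent structure described above, with the previous hidden state fed back through the delay unit, can be sketched in a few lines. This is an illustrative NumPy example only; the dimensions, activation choice, and parameter names are ours, not taken from the paper:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_ih, W_hh, W_oh, b_h, b_o):
    """One recurrent step: h_prev plays the role of the delayed (Z^-1)
    hidden state that is combined with the current input x_t."""
    h_t = np.tanh(W_ih @ x_t + W_hh @ h_prev + b_h)  # hidden-state update
    y_t = W_oh @ h_t + b_o                           # output of this step
    return h_t, y_t

# Unroll over a short random sequence, carrying the hidden state forward.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 4, 2
W_ih = rng.standard_normal((n_hid, n_in))
W_hh = rng.standard_normal((n_hid, n_hid))
W_oh = rng.standard_normal((n_out, n_hid))
b_h, b_o = np.zeros(n_hid), np.zeros(n_out)

h = np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):  # sequence of 5 time steps
    h, y = rnn_step(x, h, W_ih, W_hh, W_oh, b_h, b_o)
```

Because `h` is both an output of one step and an input to the next, the network's prediction at time t depends on the whole input history, which is the property exploited for intrusion sequences.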
The long short-term memory network is an RNN proposed by Hochreiter and Schmidhuber in 1997 [3]. In practical applications, long short-term memory (LSTM) neural networks, which belong to the family of gated RNNs, can learn long-term dependencies more quickly than simple recurrent architectures [4]. The LSTM architecture handles the vanishing gradient problem. The data flow during training is maintained by special gates that coordinately decide when to read, when to write, and what data to store. The LSTM architecture is presented in Figure 4, where the input gate, output gate, and forget gate maintain the flow of signals between the layers with long-term learning dependencies. The stacked LSTM model is presented in Figure 5, where each layer is itself an LSTM network.
The recurrent unit is updated by the following expression:

$z_t = f(V x_t + W z_{t-1} + b)$,

where the hidden state is represented by $z_t$, the associated weight vectors are $V$ and $W$, $b$ represents the bias coefficient, $f$ is the nonlinear activation function employed, and $x = [x_1, x_2, x_3, \ldots, x_n]$ is the input vector. The learning sequence is initialized with the decision made in the forget gate, which decides what information is stored or thrown away. For each input $x_t$, the previous hidden state is passed through a sigmoid layer; if the result is 1, the data in $C_{t-1}$ is retained, else it is removed. The forget gate value is estimated by

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$.

Two factors decide what new information is stored in the cell memory. Initially, the input gate decides which data to update, and then the vector of new candidate values is created in the $\tanh$ layer:

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$, $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$.

Based on these two results, the vector written into the memory cell is

$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$.

Journal of Sensors
The output is estimated from the information stored in the cell state by the following expressions:

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$, $h_t = o_t \odot \tanh(C_t)$.

In recent years, deep learning has become the dominant paradigm in machine learning and computer vision due to the availability of extensive public data and computational resources [5]. Deep learning strategies extend the conventional neural network architecture by training with numerous hidden layers. The proposed research uses a deep learning strategy because a well-trained deep neural network can provide better intrusion classification performance than generic learning algorithms. The architecture of the DLNN is shown in Figure 6. The DLNN comprises several hidden layers, each carrying out a nonlinear transformation between layers. The DLNN is trained by an unsupervised learning technique and a backpropagation neural network: the unsupervised stage uses the autoencoder-decoder principle to pretrain the network, and backpropagation is adopted to fine-tune the DLNN. The autoencoder is used in unsupervised learning, and its output is a reconstruction of its input data [6]. The encoder network transforms the input data into a code, mapping it from high-dimensional space into low-dimensional space; the decoder then converts the code back into its original form. The encoder vector $e_v$ of the encoder neural network is given in the following.
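The gate operations described above can be collected into a single LSTM cell step. The sketch below uses the standard LSTM formulation with separate input and recurrent weight matrices per gate; the parameter layout and dimensions are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the input, recurrent, and bias
    parameters of the forget (f), input (i), candidate (c), and
    output (o) transforms."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])    # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])    # input gate
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat                          # cell-state update
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])    # output gate
    h_t = o_t * np.tanh(c_t)                                  # hidden state
    return h_t, c_t

rng = np.random.default_rng(1)
n_in, n_hid = 3, 4
W = {k: rng.standard_normal((n_hid, n_in)) for k in "fico"}
U = {k: rng.standard_normal((n_hid, n_hid)) for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):
    h, c = lstm_cell(x, h, c, W, U, b)
```

The cell state `c` is modified only by elementwise scaling and addition, which is what lets gradients flow across many time steps and mitigates the vanishing gradient problem mentioned earlier.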
$e_v = e_f(x_v)$,

where $e_f$ represents the encoding function and $x_v$ denotes the input data. In the decoder neural network, the reconstruction is performed by the decoding function $d_f$, which maps the data from low-dimensional space back into high-dimensional space:

$\hat{x} = d_f(e_v)$.

The reconstruction error $e(x, \hat{x})$ is minimized over the training samples by these encoder and decoder processes. The term $e(x, \hat{x})$ is the loss function, measuring the inconsistency between the encoded and decoded samples. Minimizing the reconstruction error is the main objective of the unsupervised autoencoder.
The encoding and decoding functions, together with the nonlinearity, are performed as

$e_v = e_{af}(W x_v + b)$, $\hat{x} = d_{af}(W^T e_v + b)$,

where $e_{af}$ and $d_{af}$ represent the encoder and decoder activation functions, the network bias is indicated by $b$, and the weight matrices of the network are given by $W$ and $W^T$. The reconstruction error is computed as

$e(x, \hat{x}) = \lVert x_v - \hat{x} \rVert^2$.
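The tied-weight encoder/decoder pair and the reconstruction error can be sketched as follows. This is a minimal illustration assuming sigmoid activations and a squared-error loss; variable names are ours:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W, b_enc):
    """e_v = e_af(W x + b): map the input to the low-dimensional code."""
    return sigmoid(W @ x + b_enc)

def decode(e_v, W, b_dec):
    """x_hat = d_af(W^T e_v + b): reconstruct using the tied weights W^T."""
    return sigmoid(W.T @ e_v + b_dec)

def reconstruction_error(x, x_hat):
    """Squared-error loss e(x, x_hat) that training would minimize."""
    return float(np.sum((x - x_hat) ** 2))

rng = np.random.default_rng(2)
n_in, n_code = 8, 3                       # 8 inputs compressed to a 3-d code
W = 0.1 * rng.standard_normal((n_code, n_in))
b_enc, b_dec = np.zeros(n_code), np.zeros(n_in)

x = rng.random(n_in)
e_v = encode(x, W, b_enc)
x_hat = decode(e_v, W, b_dec)
err = reconstruction_error(x, x_hat)
```

Gradient descent on `err` with respect to `W`, `b_enc`, and `b_dec` (omitted here) is what drives the unsupervised pretraining described in the text.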
The pretraining of the DLNN model is carried out using the encoder process developed in the previous module. The input layer of the DLNN together with the first hidden layer is regarded as the encoder neural network of the first autoencoder for the given input signal $x_v$. The first autoencoder is trained by minimizing its reconstruction error, and the trained parameters $\theta_1$ of its encoder network are used to initialize the first hidden layer of the DLNN:

$e_{v1} = e_{af}(x_v; \theta_1)$.
Now, the input data becomes the encoder vector $e_{v1}$. The encoder neural network for the second autoencoder is formed from the first and second hidden layers of the DLNN, and the second trained autoencoder is used to initialize the second hidden layer of the DLNN network. The above process is repeated until all N hidden layers are initialized; the N-th trained parameters of the encoder neural network are denoted by $\theta_N$. The hidden layers of the DLNN are thus pretrained by the N-stacked encoder process. This pretraining avoids local minima and improves generalization. The output of the DLNN model is calculated as

$y = f_{out}(e_{vN}; \theta_{N+1})$.
The trained parameters of the output layer are denoted by $\theta_{N+1}$. The output error is reduced by using the backpropagation algorithm.
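The greedy layer-wise scheme above can be sketched as a loop in which each layer's code becomes the next layer's input. In this sketch the per-layer autoencoder training is elided (weights are simply drawn at random) so that only the stacking and data flow are shown; sizes are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_stack(X, layer_sizes, rng):
    """Greedy layer-wise pretraining sketch: hidden layer k of the DLNN is
    initialized from the encoder parameters theta_k of the k-th autoencoder,
    whose input is the code e_v produced by layer k-1. (The actual
    reconstruction-error minimization per layer is omitted here.)"""
    params, h = [], X
    for n_out in layer_sizes:
        W = 0.1 * rng.standard_normal((n_out, h.shape[1]))
        b = np.zeros(n_out)
        params.append((W, b))        # theta_k initializes hidden layer k
        h = sigmoid(h @ W.T + b)     # e_vk becomes the next layer's input
    return params, h

rng = np.random.default_rng(3)
X = rng.random((10, 41))             # 10 samples, 41 features as in the dataset
params, codes = pretrain_stack(X, [32, 16, 8], rng)
```

After this initialization, a supervised output layer ($\theta_{N+1}$) would be added and the whole stack fine-tuned with backpropagation, as the text describes.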

Related Works on DDoS Attack Detection
In this section, an overview of existing intrusion detection techniques for DDoS attacks is discussed in detail and tabulated in Table 1. The existing DDoS detection algorithms and techniques are analyzed based on their performance metrics.

Proposed Hybrid Optimization Algorithm
The proposed hybrid swarm intelligence optimization algorithm serves a twofold purpose. Initially, the algorithm is employed to select the significant features used for attack identification, and then to tune the proposed neural network-based IDS models with optimal parameter settings. These objectives are achieved with the Harris Hawks optimization algorithm, which has limitations during the training process; these limitations are addressed by combining the HHO optimizer with the particle swarm optimization algorithm, which also attains a better tradeoff between the exploration and exploitation abilities of the algorithm. The main inspiration of HHO is the cooperative chasing behavior of Harris hawks in nature, called the surprise pounce. The detailed pseudocode of the HHO algorithm is presented in [39]. The conventional HHO algorithm suffers from poor exploration ability, as the hawks may need to wait for prey from several minutes to hours. This limitation is eliminated by improving the convergence speed of the algorithm through its integration with particle swarm optimization. PSO is a population-based optimization technique applied extensively to many engineering problems [40]. The hybrid algorithm proceeds as follows:

Input: population size, convergence criteria, random factors, acceleration coefficients, inertia factor, upper and lower bounds
Output: the fitness value and the corresponding position of the prey
1: Initialize the population
2: while (stopping criteria not met) do
3:   Evaluate the fitness of all hawks in the population
4:   if current pBest > pBest
5:     then pBest = current pBest
6:   else pBest remains unchanged
7:   gBest = particle with the best pBest among the population
8:   Define the position of the rabbit (prey)
9:   for (all hawks)
10:    Update the initial energy level of the prey and its jumping power
11:    Update the current energy level of the prey
# Exploration phase
12:    if (|E| >= 1)
13:      The position of each hawk in the population is adjusted by the exploration update equation
The proposed research has chosen the PSO algorithm because of its simplicity and excellent exploration ability. The detailed pseudocode of the PSO algorithm is presented in [41]. The advantages of HHO and PSO are combined into a hybrid HHO-PSO algorithm that attains a better tradeoff between the exploration and exploitation mechanisms than either algorithm alone.
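To make the pBest/gBest bookkeeping in the pseudocode concrete, a minimal PSO loop is sketched below. This is only the PSO half of the hybrid, with standard velocity/position updates on a toy sphere objective; the inertia factor, acceleration coefficients, bounds, and the HHO coupling are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def pso_minimize(f, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO sketch: w is the inertia factor, c1/c2 the
    acceleration coefficients from the algorithm's input list."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, (n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))                 # velocities
    pbest = x.copy()                                 # per-particle best positions
    pbest_f = np.apply_along_axis(f, 1, x)           # per-particle best fitness
    g = pbest[np.argmin(pbest_f)]                    # gBest position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))   # random factors
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        fx = np.apply_along_axis(f, 1, x)
        better = fx < pbest_f                        # update pBest where improved
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[np.argmin(pbest_f)]                # refresh gBest
    return g, float(pbest_f.min())

best_x, best_f = pso_minimize(lambda p: float(np.sum(p ** 2)), dim=3)
```

In the hybrid scheme described in the text, an HHO-style prey-energy phase would replace or alternate with these velocity updates; here only the PSO mechanics are shown.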

Proposed Hybrid HHO-PSO-LSTM and Deep Learning Models
The proposed hybrid learning model is shown in Figure 7.
The original dataset has 41 features, and the main objective of the proposed models is to attain better accuracy with a reduced number of features. So, the optimal feature selection is performed by the proposed HHO-PSO optimization algorithm; the algorithm parameter values are presented in Table 2. The selected features, shown in Tables 3 and 4, are fed into the network, and the corresponding performance is evaluated. 10-fold cross-validation is employed, the optimal features are selected for each fold, and the selected features are tabulated. At the end of the 10-fold cross-validation process, the optimal features are identified based on their frequency of occurrence. The features selected by the proposed HHO-PSO algorithm for the LSTM are given in Table 5, and the frequency of selected features for the proposed model and the existing models is presented in Table 6.
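The frequency-of-occurrence rule described above can be sketched as a simple counting step over the per-fold selections. The feature labels and the threshold below are hypothetical, purely for illustration:

```python
from collections import Counter

def select_by_frequency(fold_selections, min_count):
    """Keep the features chosen in at least min_count folds, mirroring
    the frequency-of-occurrence rule applied after cross-validation."""
    counts = Counter(f for fold in fold_selections for f in fold)
    return sorted(f for f, c in counts.items() if c >= min_count)

# Hypothetical per-fold feature selections for illustration only.
folds = [
    {"F3", "F4", "F5", "F8"},
    {"F4", "F5", "F8", "F12"},
    {"F3", "F4", "F8", "F25"},
]
stable = select_by_frequency(folds, min_count=2)
```

With a 10-fold run, the same function would be called on ten selection sets and the threshold chosen to keep only consistently selected features.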

Result Comparison and Discussion for Proposed Hybrid HHO-PSO-LSTM
The proposed IDS models are evaluated with NSL benchmark datasets. The model is iterated for ten trial runs to avoid biased output, and the model performance for each trial run is depicted in Figure 8. The average performance of the proposed LSTM model over the ten trial runs is presented in Table 7 and Figure 9. The classic LSTM model reported an accuracy of 0.9541, but its true negative class identification is poorer than that of the hybrid models. Feeding the model with a PSO-based feature selection strategy improves performance over the conventional LSTM strategy, and incorporating the HHO strategy improves the true negative and true positive cases further compared to PSO. So, the model is fed with the hybrid feature selection strategy, and its performance is compared against the other two algorithms: performance improves, with false negative cases significantly reduced. This shows the significance of the proposed hybrid HHO-PSO optimization strategy in enhancing the performance of the conventional LSTM model.
The random initialization of weight vectors and bias coefficients affects the conventional neural network model. The proposed LSTM model is optimally configured by feeding it optimal weight and bias coefficients from the proposed HHO-PSO optimization algorithm, which improves the convergence speed, as presented in Figure 10. The conventional LSTM model converges at the 353rd iteration, whereas the proposed optimally constructed LSTM model starts to converge at the 200th iteration; thus, the reported limitation of delayed convergence is handled effectively by the proposed LSTM model. To further demonstrate its effectiveness, the model's performance is compared with existing models from the literature and other models, as presented in Table 8. The reported results infer that the proposed hybrid HHO-PSO-LSTM IDS model outperformed all other models, with better intrusion classification performance.

Result Comparison and Discussion for Proposed Hybrid HHO-PSO Deep Learning IDS Model
The size of the original dataset is reduced after the feature selection process, and the selected feature subset is fed into the proposed deep learning model for intrusion classification. Parameter handling is the major challenge of deep learning models, addressed here by optimally choosing the weight and bias vectors of the proposed model. Initially, numerous trials are made to fix the number of hidden layers for developing the DLNN model. The numbers of hidden layers and hidden neurons are further factors deciding the complexity of the network, so they are fixed by trial and error, and the model performances for various numbers of hidden layers are presented in Table 9.
The error rate of the proposed model for the ten trial runs and the corresponding hidden layers are shown in Figure 11. Handling the overfitting issue is a primary task in deep learning models, which can be done by analyzing the training efficiency of the proposed model. The training and testing accuracy for various numbers of hidden layers are shown in Figure 12, which confirms that the network underfits up to trial 7 and overfits from trial 9. So, the overfitting and underfitting issues are avoided by fixing the number of hidden layers at 12. Further, Table 10 confirms that the proposed model achieved better intrusion classification performance with 12 hidden layers, so the proposed model uses 12 hidden layers. A comparison of the proposed models with existing models is shown in Table 11, and a comparison of the percentage improvement of the proposed HHO-PSO-DLNN model over the other models is shown in Table 10. The computational time of the proposed models and algorithms is tabulated in Table 12, which shows that the computational time of the proposed hybrid HHO-PSO-DLNN is higher than that of all the other models under comparison, due to its increased number of hidden layers. But the important tradeoff is that the performance metric values for DDoS attack detection with the proposed optimized DL model are much better than those of the other algorithms. The weight and bias vectors of the proposed DLNN model are tuned by individually employing the PSO and HHO algorithms and then by the proposed hybrid HHO-PSO optimization algorithm. The performance comparison of the conventional DLNN, PSO-DLNN, HHO-DLNN, and hybrid HHO-PSO-DLNN is shown in Figure 13, and the ROC curve and rate of convergence of the proposed model are shown in Figures 14 and 15. The performance of the proposed hybrid HHO-PSO-DLNN model is observed to be better than that of the conventional DLNN, PSO-DLNN, and HHO-DLNN models.

Statistical Analysis of the Proposed Models
Dietterich [58] recommended statistical tests for learning algorithms that can be executed 10 times; the 5 × 2 cv test is a powerful strategy for measuring the statistical variation of classifier algorithms. In the proposed scheme, twofold cross-validation is performed five times: 50% of the data samples are employed for training and the remaining 50% for testing, after which the roles are swapped so that the testing sample is employed for training and the training sample for testing. The performance difference between the two models is computed on each fold.
The mean and variance of the differences are evaluated by using the following equations:

$\bar{p}_i = \left(p_i^{(1)} + p_i^{(2)}\right)/2$, $s_i^2 = \left(p_i^{(1)} - \bar{p}_i\right)^2 + \left(p_i^{(2)} - \bar{p}_i\right)^2$,

where $p_i^{(j)}$ is the accuracy difference between the two models on fold $j$ of replication $i$.
The t statistic is evaluated after 5 iterations as

$t = p_1^{(1)} \Big/ \sqrt{\tfrac{1}{5}\sum_{i=1}^{5} s_i^2}$,

where $p_1^{(1)} = Acc_{A,1} - Acc_{B,1}$ is the accuracy difference on the first fold of the first replication. The statistic follows a t distribution with 5 degrees of freedom. The p value is estimated and compared with the significance level $\alpha = 0.05$. When the p value is less than the significance level, the null hypothesis is rejected and the two models differ significantly in performance; this signifies that the estimated difference is real. Otherwise, if the p value is greater than the significance level, the null hypothesis is not rejected, and the observed difference is probably due to stochastic factors or statistical coincidence. The performance of the proposed HHO-PSO-tuned DLNN model is compared with that of the other models, confirming that the proposed model performs better, as reported in Table 13. The results obtained confirm that the proposed HHO-PSO-DLNN model is statistically better than all other models developed in this research work.
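The 5 × 2 cv statistic above is straightforward to compute once the per-fold accuracy differences are available. The sketch below implements Dietterich's formula; the accuracy differences used are hypothetical numbers for illustration:

```python
import math

def cv_5x2_t(diffs):
    """5x2cv paired t statistic. diffs[i] = (p1, p2) are the accuracy
    differences of the two models on the two folds of replication i."""
    assert len(diffs) == 5, "the test uses exactly 5 replications"
    s2 = []
    for p1, p2 in diffs:
        p_bar = (p1 + p2) / 2.0                         # mean difference
        s2.append((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2)  # replication variance
    # Numerator is the difference on fold 1 of replication 1;
    # compare the result against a t distribution with 5 dof.
    return diffs[0][0] / math.sqrt(sum(s2) / 5.0)

# Hypothetical accuracy differences between two classifiers.
d = [(0.020, 0.015), (0.018, 0.022), (0.020, 0.017),
     (0.019, 0.021), (0.016, 0.020)]
t = cv_5x2_t(d)
```

The resulting `t` would then be converted to a p value under the t distribution with 5 degrees of freedom and compared with $\alpha = 0.05$, as described in the text.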

Conclusion and Future Directions
The proposed model in this paper employed a recurrent neural network and a deep learning neural network model for the intrusion classification problem. The LSTM network is adopted, and its intrusion detection performance is investigated. The hybrid HHO-PSO algorithm is employed to improve the model's performance, and the model's response is analyzed through the metric values obtained. It is observed that the proposed hybrid HHO-PSO optimization algorithm improved the performance of the neural network models by providing a minimal number of optimal feature subsets with better classification accuracy. The convergence speed of the proposed model is better than that of the other models in this study, and it demonstrated better performance than other models in the literature. The deep learning architecture for DDoS attack detection, based on the autoencoder-decoder strategy, is initially framed from numerous trial results. Then, the weight vectors of the proposed DLNN model are optimally tuned by the proposed hybrid HHO-PSO optimization algorithm, and the model performances are analyzed for each trial run. Further, the effectiveness of the proposed model is analyzed by comparing its performance with other existing works. Finally, a statistical analysis based on the 5 × 2 cv test justifies the superiority of the proposed hybrid HHO-PSO-DLNN model over all other existing models. It is confirmed that the proposed DLNN model outperformed all other models in intrusion detection in a cloud computing environment.
The following are future research directions for DDoS attack detection using DL methods:
(i) DL-based DDoS detection methods can be combined with eXplainable Artificial Intelligence (XAI) techniques, leading to global interpretation [59].
(ii) The proposed DL-based DDoS detection methods can be implemented in larger systems, where compromised endpoints are easier to detect. This also helps improve the performance of the proposed algorithms, making them skillful in handling the various abnormalities that occur in network performance.
(iii) Efficient and lightweight DL models can be developed for attack-prone networks that have limited computing resources.
(iv) Attack patterns change rapidly; hence, automatically updated DL models can be developed to detect new DDoS attack instances.