An Efficient and Effective Approach for Flooding Attack Detection in Optical Burst Switching Networks

Optical burst switching (OBS) networks are frequently compromised by attackers who can flood the networks with burst header packets (BHPs), causing a denial of service (DoS) attack, also known as a BHP flooding attack. Recently, a set of machine learning (ML) methods has been embedded into OBS core switches to detect these BHP flooding attacks. However, due to the redundant features of BHP data and the limited capability of OBS core switches, the existing technology still requires major improvements to work effectively and efficiently. In this paper, an efficient and effective ML-based security approach is proposed for detecting BHP flooding attacks. The proposed approach consists of a feature selection phase and a classification phase. The feature selection phase uses the information gain (IG) method to select the most important features, enhancing the efficiency of detection. For the classification phase, a decision tree (DT) classifier is used to build the model based on the selected features of BHPs, reducing the overfitting problem and improving the accuracy of detection. A set of experiments is conducted on a public dataset of OBS networks using 10-fold cross-validation and holdout techniques. Experimental results show that the proposed approach achieved the highest possible classification accuracy of 100% by using only three features.


Introduction
Optical burst switching (OBS) in networks has become an important dynamic sub-wavelength switching technique and a solution for developing the new type of Internet backbone infrastructure [1]. The OBS network mainly consists of three types of nodes, namely, core, ingress, and egress nodes. The core nodes represent the intermediate nodes, which are designed to reduce the processing and buffering of the optical data burst using a control data packet with specific information, namely, the burst header packet (BHP) [2].
In a network with burst traffic, OBS plays an essential role for packet switching with a higher level of necessary details than other existing networks' switching techniques. However, this type of switching still suffers from several challenges such as security and quality of service (QoS) due to BHP flooding attacks. The function of the BHP in OBS is to reserve the unused channel for the arrival of a data burst (DB). This function can be exploited by attackers to send fake BHPs without DB acknowledgment. Such fake BHPs can affect the network and reduce its performance by decreasing bandwidth utilization and increasing data loss, leading to a denial of service (DoS) attack [3], which is one of the most crucial security threats to networks.
Several methods have been proposed to tackle DoS and BHP flooding attacks on OBS networks in the literature and have achieved satisfactory results [4][5][6]. However, due to the limited capability of OBS core switches, developing a lightweight method that can attain high accuracy with a small number of features is still a challenging issue for developers and researchers.
In this research, an effective and efficient approach is proposed for securing OBS networks. Thus, the main objective of the work is to develop a lightweight ML model for detecting BHP flooding attacks based on the information gain (IG) feature selection method and a decision tree (DT) classifier. To achieve this objective, two key research questions are formulated and answered throughout this study. The first research question is whether the feature selection method improves the effectiveness of the DT model in detecting BHP flooding attacks. The second research question is whether the feature selection method improves the efficiency of the DT model in detecting BHP flooding attacks. The lightweight property of the model comes from the fact that only a small number of features are used to build the classifier. The model will be evaluated using a public OBS dataset based on a set of performance metrics such as accuracy, precision, recall, and F-measure. The remainder of the paper is organized as follows: (i) In Section 2, related works are introduced to give details about the proposed approaches and methods for detecting DoS attacks on different networks. (ii) Section 3 presents the architecture of the proposed approach for detecting BHP flooding attacks on OBS networks. (iii) Section 4 explains the experimental setup and results in more detail. (iv) Section 5 presents the conclusion of the study.

Related Works
Machine learning (ML) methods are now used in many intrusion detection systems (IDSs) to detect several types of network attacks. Feature selection methods are also used to select the significant features of network traffic without reducing the performance of the IDSs [7]. Feature selection is the process of selecting the best set of features that can be most effective for classification tasks [8,9]. A high number of features may decrease the performance and accuracy of many classification problems [10,11].
In the field of optimization, feature selection methods are classified into three main approaches: embedded, wrapper, and filter methods [12]. For the filter methods, there are two major types of evaluation: subset feature evaluation and groups of individual feature evaluation. In the groups of individual feature evaluation, heuristic or metaheuristic filter methods, or even a hybrid of them, are utilized for ranking the features, and then the best of them are selected based on some thresholds [11,13]. In contrast, the subset feature evaluation methods find the subset of candidate features using a certain measure or a certain strategy. They compare the previous best subset with the current subset to find the candidate subset of features. In the groups of individual feature evaluation methods, the redundant features are kept in the final subset of selected features according to their relevance, but the subset feature evaluation methods remove the features with similar ranks. In general, the filter methods are considered classifier-independent approaches [13]. The wrapper methods are classifier-dependent approaches that each time take a subset of features from the total features and calculate the accuracy of classifiers to find the best subset. Therefore, they are time consuming compared with filter methods [14]. The embedded methods combine wrapper and filter methods [15]. In this study, a filter-based method is used for feature selection.
In the intrusion detection literature, a set of ML and deep learning (DL) methods has been widely used to detect different types of attacks in several works [16][17][18][19][20]. Meanwhile, a set of related works has also been proposed for detecting BHP flooding attacks using different ML methods, such as the decision tree (DT) method in [21]. This work evaluated the performance of the adopted method using different metrics and reported a 93% accuracy rate in classifying the classes of BHP flooding attack. Liao et al. [22] introduced a classification approach to classify the access patterns of various users using sparse vector decomposition (SVD) and rhythm matching methods. This study demonstrates that the approach is able to distinguish between intruders and legal users in the application layer.
Xiao et al. [23] offered an effective scheme for detecting a distributed DoS (DDoS) attack using the correlation of the information generated by the data center and the k-nearest neighbors (KNN) method. They analyzed the flows of data traffic at the center to identify normal and abnormal flows. In [24], the authors proposed an approach for detecting DDoS attacks based on seven features and using an artificial neural network (ANN) method with a radial basis function (RBF). This NN-RBF approach can classify the data traffic into attack or normal classes by sending the IP address of the incoming packets from the source nodes to be filtered in the alarm modules, which then decide if these data packets can be sent to the destination nodes. The authors in [25] applied a data mining method for detecting a DDoS attack using the fuzzy clustering method (FCM) and a priori association algorithm to categorize the data traffic patterns and the status of the network. Another ML approach in [26] used a DT method with a grey relational analysis for detecting DDoS attacks. They also applied the pattern matching technique to the data flows for tracing back the estimated location of the attackers.
Alshboul [27] investigated the use of rule induction nodes for BHP classification in OBS networks. The author applied a set of data mining methods to the public OBS network dataset. He reported that the repeated incremental pruning to produce error reduction (RIPPER) rule induction algorithm, Naïve Bayes (NB), and Bayes Net were able to achieve a predictive accuracy of 98%, 69%, and 85%, respectively.
Chen et al. [28] developed a detection method to identify a DDoS attack using ANN. A set of different simulated DoS attacks was used for training the ANN model to recognize abnormal behaviors. Li et al. [29] offered different types of ANN models, including learning vector quantization (LVQ) models, to differentiate traffic associated with DDoS attacks from normal traffic. The authors converted the values of the dataset features into a numerical format before feeding them into the ANN model.
In [30], the authors presented a probabilistic ANN approach for classifying the different types of DDoS attacks. They categorized the DDoS attacks and normal traffic by applying a radial basis function neural network (RBF-NN) coupled with a Bayes decision rule. Nevertheless, the approach concentrated on the events of unscrambling flash crowds generated by DoS attacks.
Li and Liu [31] proposed a technique that integrates the network intrusion prevention system with SVM to improve the accuracy of detection and reduce the incidents of false alarms. In [32], Ibrahim offers a dynamic approach based on distributed time-delay ANN with soft computing methods. This approach achieved a fast conversion rate, high speed, and a high rate of anomaly detection for network intrusions.
Gao et al. [33] introduced a data mining method for analyzing the piggybacked packets of the network protocol to detect DDoS attacks. The advantage of this method is that it retains a high rate of detection without manual data construction. Hasan et al. [34] proposed a deep convolutional neural network (DCNN) model to detect BHP flooding attacks on OBS networks. They reported that the DCNN model works better than traditional machine learning models (e.g., SVM, Naïve Bayes, and KNN). However, due to the small number of samples in the dataset and the resource constraints of OBS switches, such deep learning models are not effective tools to detect BHP flooding attacks, and they are not computationally efficient to run in such networks.

Proposed Approach
The proposed approach in this paper consists of two main phases: feature selection and classification. The input of the approach is a set of OBS dataset features collected from network traffic. The output of the approach is a class label of the BHP flooding attacks. The flowchart of the proposed approach is illustrated in Figure 1.
In the feature selection phase of the approach, the input features of OBS network traffic are prepared for processing by using the information gain (IG) feature selection method. The purpose of IG is to rank the features and discover the merit of each of them according to the information gain evaluation of the entropy function. The output of the feature selection phase is a scored rank of features in decreasing order according to their merit, so that each subsequent feature in the ranking has a lower merit than the ones before it.
This is then followed by the classification phase, in which the dataset with selected features is used to train and test the DT classifier to detect attacks on OBS networks. The output of the classification phase is a trained DT model that is able to classify the BHP flooding attacks and return the class label of that attack. The following sections explain the methods used in the two phases of the proposed approach.

Information Gain (IG) Feature Selection Method.
Information gain (IG) is a statistical method used to measure the essential information for a class label of an instance based on the absence or presence of the feature in that instance. IG computes the amount of uncertainty that can be reduced by including the features. The uncertainty is usually calculated by using Shannon's entropy (E) [35] as

$$E(D) = -\sum_{i=1}^{n} P_i \log_2 P_i, \tag{1}$$

where n represents the number of class labels and P_i is the probability that an instance in a dataset D is labeled with the class label c_i, computed as the proportion of instances in D that belong to that class label:

$$P_i = \frac{|D_{c_i}|}{|D|}, \tag{2}$$

where |D_{c_i}| is the number of instances in D with class label c_i and |D| is the number of instances in D. A selected feature f divides the training set into subsets D_1, D_2, ..., D_v according to the values of f, where f has v distinct values. The information required to get the exact classification is measured by

$$E_f(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \, E(D_j), \tag{3}$$

where |D_j|/|D| represents the weight of the jth subset, |D| is the number of instances in the dataset D, |D_j| is the number of instances in the subset D_j, and E(D_j) is the entropy of the subset D_j. Therefore, the IG of every feature is calculated as

$$IG(f) = E(D) - E_f(D). \tag{4}$$

After calculating the IG for each feature, the top k features with the highest IG are selected as the feature set because they most reduce the information required to classify the flooding attack.
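The IG ranking described above can be illustrated with a minimal Python sketch. It assumes every feature takes discrete values (continuous features would first need to be discretized), and the function names, feature names, and toy data are hypothetical, invented only for illustration; they are not taken from the OBS dataset or from the tool used in this paper.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(f) = E(D) - sum_j (|D_j| / |D|) * E(D_j) for one discrete feature."""
    subsets = defaultdict(list)
    for value, label in zip(feature_values, labels):
        subsets[value].append(label)
    total = len(labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


def top_k_features(columns, labels, k):
    """Rank features (dict: name -> list of values) by IG and keep the top k."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data, invented for illustration only.
labels = ["block", "block", "no block", "no block"]
columns = {
    "flood_level": ["high", "high", "low", "low"],  # separates the classes perfectly
    "node_parity": ["even", "odd", "even", "odd"],  # carries no class information
}
selected, scores = top_k_features(columns, labels, k=1)
print(selected, scores)
```

In this toy case the feature that splits the two classes cleanly receives the full one bit of information gain, while the uninformative feature receives none, so only the former is kept when k = 1.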

Decision Tree Method.
Decision tree (DT) is a tree-like model of decisions with possible consequences that is commonly used in the fields of data mining, statistics, and machine learning [36]. In machine learning, the goal of DT is to build a model that predicts or classifies the value of a target class based on a learning process from several input features. The tree model that has a target class label with discrete values is called a classification tree model. In this model, the tree leaves constitute the values of the class label and the tree branches constitute aggregations of features that produce this class label.
DT learning is a simple process to represent the features for predicting or classifying instances. DT models are created by splitting the input feature set, starting at the tree root node, into subsets that establish the successor (child) nodes. Based on a set of splitting rules on the values of the features, the splitting process for each derived subset is repeated in a recursive manner [36]. This recursion stops when the splitting process no longer adds value to the predictions or when all instances in the subset at a node have the same value of the target class label. The DT can also be described as a mathematical model to support the categorization, description, and generalization of a given dataset.
Assume the dataset comes in the form of records as follows:

$$(\mathbf{x}, y) = (x_1, x_2, x_3, \ldots, x_k, y), \tag{5}$$

where the variable y is a dependent target variable that we need to generalize or classify. The vector x consists of the features x_1, x_2, x_3, ..., x_k, which are used to determine the variable y.

Security and Communication Networks
In principle, the DT is based on the C4.5 algorithm [37], which is an updated version of the ID3 algorithm [38]. C4.5 can avoid the overfitting problem of ID3 by using the rule post-pruning technique, which converts the built tree into a set of rules.
DT is used in the proposed approach because it is simple, very intuitive, and easy to implement. Furthermore, it deals with missing values, requires less effort in terms of data preprocessing, and does not need to scale or normalize the data [36].
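The recursive splitting process described in this section can be sketched in a few lines of Python. This is only an illustrative ID3-style tree on discrete features that splits on plain information gain; it is not C4.5 or J48 (there is no pruning, no gain ratio, and no handling of numeric thresholds or missing values), and all names and the toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def split(rows, labels, feature):
    """Group rows and labels by the value each row takes for `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        part = groups.setdefault(row[feature], ([], []))
        part[0].append(row)
        part[1].append(label)
    return groups


def gain(rows, labels, feature):
    total = len(labels)
    groups = split(rows, labels, feature).values()
    return entropy(labels) - sum(len(ls) / total * entropy(ls) for _, ls in groups)


def build_tree(rows, labels, features):
    """Return a class label (leaf) or a dict describing an internal node."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the subset is pure or no features remain.
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: gain(rows, labels, f))
    # Stop when splitting no longer adds information.
    if gain(rows, labels, best) <= 0:
        return majority
    rest = [f for f in features if f != best]
    children = {
        value: build_tree(sub_rows, sub_labels, rest)
        for value, (sub_rows, sub_labels) in split(rows, labels, best).items()
    }
    return {"feature": best, "children": children, "default": majority}


def predict(tree, row):
    """Walk from the root to a leaf; unseen values fall back to the majority class."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


# Toy records, invented for illustration only.
rows = [
    {"flood": "high", "drop": "high"},
    {"flood": "high", "drop": "low"},
    {"flood": "low", "drop": "high"},
    {"flood": "low", "drop": "low"},
]
labels = ["block", "block", "no block", "no block"]
tree = build_tree(rows, labels, ["flood", "drop"])
print(tree)
print(predict(tree, {"flood": "low", "drop": "high"}))
```

The two stopping conditions in `build_tree` mirror the ones stated in the text: recursion ends when a subset contains a single class or when a further split adds no information.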

Experiments and Discussion
The experiments of this research are implemented using a popular open source tool called the Waikato Environment for Knowledge Analysis (Weka) [39], which offers a rich toolbox of machine learning and data mining methods for preprocessing, analyzing, clustering, and classification. It offers Java-based graphical user interfaces (GUIs). The implementation was performed on a laptop with a 2.0 GHz Intel Core i7 CPU, 8 GB of RAM, and a 64-bit Windows 10 operating system. Due to the scarcity of OBS historical data, the experiments were conducted on a public optical burst switching (OBS) network dataset [1].

OBS Network Dataset Description.
The OBS network dataset is a public dataset, available from the UCI Machine Learning Repository [1]. It contains a number of BHP flooding attacks on OBS networks. There are 1,075 instances with 21 attributes as well as the target class label. This target label has four types of classes, which are NB-no block (not behaving-no block), block, no block, and NB-wait (not behaving-wait). All dataset features have numeric values except for the node status feature that takes a categorical value out of three values: B (behaving), NB (not behaving), and PNB (potentially not behaving). The description of the dataset features is given in Table 1. Table 2 shows the number of instances for each class in the dataset, while Figure 2 shows the distribution of instances over different types of BHP flooding attacks. This figure is deduced from the dataset.

Evaluation Measures.
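The experimental results will be evaluated using four evaluation measures. These measures are precision, recall, F-measure, and accuracy. The following equations show how these evaluation measures are computed:

$$\text{Precision} = \frac{TP}{TP + FP}, \tag{6}$$

$$\text{Recall} = \frac{TP}{TP + FN}, \tag{7}$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \tag{8}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{9}$$

where FP is the number of false positives, FN is the number of false negatives, TP is the number of true positives, and TN is the number of true negatives.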
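As a small illustration of these four measures, the following Python sketch computes them from the confusion counts. Because the target label has four classes, the counts are taken one class at a time (one-vs-rest), which is an assumption made here for illustration; the function names and the toy predictions are hypothetical.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F-measure, and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Toy predictions, invented for illustration only.
y_true = ["block", "block", "no block", "NB-wait", "no block"]
y_pred = ["block", "no block", "no block", "NB-wait", "no block"]
counts = one_vs_rest_counts(y_true, y_pred, positive="block")
metrics = binary_metrics(*counts)
print(counts, metrics)
```

The guards against zero denominators are a design choice of this sketch: a class that is never predicted gets a precision of 0 rather than raising an error.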

Results and Comparisons.
In this section, the experimental results for both the feature selection and classification phases of the proposed approach are given in detail. The average rank score and average merit of features from the IG feature selection method are shown in Table 3 and are based on a 10-fold cross-validation with stratified sampling in order to guarantee that both training and testing sets have the same ratio of classes. In Table 3, the dataset features are ranked in decreasing order according to their significance to target classes. The reason behind this variation in the feature significance is that the target class has four categorical labels, and for each label, different values for each feature are assigned. Therefore, the rank score from the IG method determines how much each feature contributes to the target class label. The rank scores in Table 3 show that the "packet received," "10-run-AVG-drop-rate," and "flood status" features have higher scores than all the other features. Thus, the hypothesis that those first three features (packet received, 10-run-AVG-drop-rate, and flood status) are more influential and more correlated to the labels of the target class will be checked experimentally in the following paragraphs. To accept or reject this hypothesis, the evaluation results of the DT method are presented using all features and the combinations of the three selected features. These evaluation results are reported based on the holdout and 10-fold cross-validation techniques. For the holdout technique, the dataset is divided into 75% for training and 25% for testing.

Table 1: Description of the OBS network dataset features.

No. | Feature | Description
4 | Reserved bandwidth | It is a numeric feature denoting the initial reserved bandwidth assigned to a given node.
5 | Average delay time per sec | It is a numeric feature denoting the average delay time per second for each node. It is also called the end-to-end delay feature.
6 | Percentage of lost packet rate | It is a numeric feature representing the percentage rate of lost packets for each node.
7 | Percentage of lost byte rate | It is a numeric feature representing the percentage rate of lost bytes for each node.
8 | Packet received rate | It is a numeric feature representing the packet received rate per second for each node based on the reserved bandwidth.
9 | Used bandwidth | It is a numeric feature representing the bandwidth used, or what each node could reserve from the reserved bandwidth.
10 | Lost bandwidth | It is a numeric feature denoting the amount of bandwidth lost by each node from the reserved bandwidth.
11 | Packet size byte | It is a numeric feature denoting the packet size in bytes allocated explicitly for each node to transmit. For instance, if the data size is 1440 bytes and there are 60 bytes for (IP header 40 bytes) + (UDP header 20 bytes), then all headers will be added to the data size to get 1500 bytes as follows: packet size = ((data size 1440 bytes) + (IP header 40 bytes) + (UDP header 20 bytes)) = 1500 bytes.
12 | Packet transmitted | This is a numeric feature representing the total packets transmitted per second for each node based on the reserved bandwidth.
13 | Packet received | This is a numeric feature representing the total packets received per second for each node based on the reserved bandwidth.
14 | Packet lost | This is a numeric feature representing the total packets lost per second for each node based on the lost bandwidth.
15 | Transmitted byte | This is a numeric feature representing the total bytes transmitted per second for each node.
16 | Received byte | It is a numeric feature denoting the total bytes received per second for each node based on the reserved bandwidth.
17 | 10-run-AVG-drop-rate | This is a numeric feature representing the average packet drop rate for 10 consecutive iterations and runs.
18 | 10-run-AVG-bandwidth-use | It is a numeric feature representing the average bandwidth that is utilized for 10 consecutive iterations and runs.
19 | 10-run-delay | This is a numeric feature representing the average delay time for 10 consecutive (run) iterations.
20 | Node status | This is a categorical feature. It is an initial classification of nodes based on the rate of packet drop, used bandwidth, and average delay time per second. The categorical values are B for behaving, NB for not behaving, and PNB for potentially not behaving.
21 | Flood status | This is a numeric feature that represents the percentage of flood per node. It is based on the packet drop rate, medium, and high level of BHP flood attack in case behaving (B).
22 | Class label | This feature is a categorical feature that represents the final classification of nodes based on the packet drop rate, reserved bandwidth, number of iterations, and used bandwidth. The categorical values of the class label are NB-no block, block, no block, and NB-wait.

Before applying the DT method for classifying the types of BHP flooding attacks and getting the results, an analysis of the DT parameters is carried out to tune and select the best values of these parameters. Practically, the DT classifier (J48) in Weka performs the pruning process based on a set of parameters, which are the subtree raising, the confidence factor, and the minimal number of objects. The default values of these parameters are true, 0.25, and 2, respectively. Subtree raising is the parameter that allows a node of the tree to be moved upwards towards the root, where it can replace other nodes during the pruning process. The confidence factor is a threshold of acceptable error in the data used when pruning the DT; smaller values lead to more pruning. However, in the proposed approach, the subtree raising and confidence factor parameters are kept at their default values. The minimal number of objects is a very important parameter that represents the minimal number of instances in a single leaf. It is used to obtain smaller and simpler decision trees based on the nature of the problem. To tune the minimal number of objects parameter, we try a set of different values and select the best one. Figure 3 shows the accuracies of the proposed approach at different values of the minimal number of objects in the range from 2 to 5. These accuracies are obtained using the holdout technique with 75% training and 25% testing.
As shown in Figure 3, it is clear that the best values of the minimal number of objects in a single leaf are 1 and 2, which generate a simple and accurate DT model. The value of this parameter is set to 2 to make the DT model moderately simple.
Once the values of the DT parameters are selected, the evaluation results of the proposed approach are reported in the following tables and figures. Table 4 presents the evaluation results of the holdout technique for classifying BHP flooding attacks using all features in the dataset. Figure 4 shows the confusion matrix of classification for the 25% testing set. Table 5 illustrates the evaluation results of the holdout technique for classifying the BHP flooding attacks using the first three selected features (packet received, 10-run-AVG-drop-rate, and flood status) of the dataset, and Figure 5 shows the confusion matrix of this evaluation result.
From Tables 4 and 5, as well as from Figures 4 and 5, it is clear that the selected features improved the values of evaluation measures for the DT method to classify the BHP flooding attacks. Moreover, for efficiency, detecting attacks using only three features is more efficient for the OBS core switches, which have limited resources.
To validate the evaluation results, other experiments for the DT classification method based on the 10-fold cross-validation technique were conducted using all features and using the first three selected features from the IG feature selection method. Table 6 shows the evaluation results, and Figure 6 shows the confusion matrix for classifying the BHP flooding attacks using all features based on the 10-fold cross-validation technique.
Similarly, Table 7 and Figure 7 present the evaluation results and the confusion matrix, respectively, for classifying the BHP flooding attacks using the first three selected features based on the 10-fold cross-validation technique. The 10-fold cross-validation results in Tables 6 and 7 and Figures 6 and 7 validate the holdout evaluation results and confirm the remarkable performance of the proposed approach. After further investigation, the evaluation results of the DT classification method using one and two features from the first three selected features are compared with the previous results of the holdout and the 10-fold cross-validation techniques and are shown in Figure 8. Table 8 summarizes a comparison between the proposed approach and the recent related works on the OBS network dataset. In this comparison, we can see that the proposed work achieves the highest accuracy with a small number of features compared to all these recent works. The results presented in Figure 8 and Table 8 support the hypothesis of the proposed approach that the first three features selected using the IG method are more influential and more correlated to the labels of BHP flooding attacks than any of the other features.
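The stratified 10-fold cross-validation used in these experiments keeps the class ratio roughly equal across folds. A minimal Python sketch of one way to build such folds is given below; the round-robin assignment, the function names, and the toy class counts are assumptions made for illustration and do not reproduce the exact procedure of the tool used in this paper.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign instance indices to k folds, dealing each class out round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        # Continue where the previous class stopped so fold sizes stay balanced.
        offset += len(indices)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) once for each held-out fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Toy labels with invented class counts, for illustration only.
toy_labels = ["block"] * 20 + ["no block"] * 30
for train, test in train_test_splits(toy_labels, k=10):
    blocks = sum(toy_labels[i] == "block" for i in test)
    print(len(train), len(test), blocks)
```

With these toy counts every test fold holds 5 instances, 2 of which are "block", so each fold preserves the 40/60 class ratio of the full set.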

Result Analysis.
To analyze the results and link them to the conclusion, we show how the proposed feature selection method can improve the model from three different angles: reducing overfitting, improving accuracy, and reducing training and testing (prediction) time.
By definition, the overfitting problem occurs when the training errors are low or very low and the validation errors are high or very high. Therefore, reducing the overfitting problem requires reducing the gap between the training and validation errors. To show how the proposed method can reduce the overfitting problem, we plot the training error against the validation error in Figure 9 for different sets of features, which are ordered according to the rank score given in Table 3. The training percentage is set to 75%, and the validation percentage is 25%. We notice that the gap between the training and validation errors decreases as the number of features decreases, until the gap reaches approximately zero when using the three selected features of the proposed method. We also notice that the overfitting problem is eliminated with 14 and 7 features. In our opinion, the overfitting problem is eliminated with 14 and 7 features because of an implicit pruning functionality implemented by the decision tree algorithm used (J48). In addition, it is clear that the accuracy is improved by the three selected features.
To evaluate the efficiency of the proposed feature selection approach, the average time of building and testing the DT model is computed. The DT model is trained on 75% of the dataset, which consists of 806 instances, and tested on 25% of the dataset, which consists of 269 instances. Table 9 shows the computed average time of training and testing the DT model using all features and using our three selected features.
As shown in Table 9, the DT model has a lower average time for training and testing using our three selected features than using all features. In terms of time complexity, represented by O notation, the overall average time of the DT method is O(m × n), where m is the number of features and n is the number of instances [40]. Because the number of features in classification problems is limited, the running time will be O(C × n), where C is a constant. Therefore, the time complexity of the DT method is O(n) for classification problems. The advantage of the proposed approach is that it reduces the number of features to three (reducing C), which leads to a faster running time compared with using all features. This confirms that the approach is able to detect the attacks more efficiently, especially in congested networks with limited computing resources.
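As a rough worked illustration of the O(m × n) growth term, take the 806 training instances and compare the 21 features of the full dataset with the three selected features. The figures below are only the m × n products under this simplified model, not measured running times:

$$m \times n = 21 \times 806 = 16{,}926 \quad \text{versus} \quad 3 \times 806 = 2{,}418, \qquad \frac{16{,}926}{2{,}418} = 7.$$

Under this assumption the three-feature model processes about one seventh of the feature values; the actual savings depend on the implementation and are the ones reported in Table 9.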
We can conclude that reducing the features to three and using the pruning process of the DT classifier helped the proposed approach to reduce the overfitting problem and classify the OBS flooding attacks. Consequently, all performance results demonstrate the effectiveness and efficiency of the DT model based on selected features to classify BHP flooding attacks. This indicates that the proposed approach is more accurate and suitable for real-time detection given the limited computing capability of OBS core switches.

Conclusion and Future Work
In this paper, an effective and efficient approach using the information gain (IG) feature selection method and the decision tree (DT) classifier is proposed to detect BHP flooding attacks on OBS networks. The approach starts with selecting the most important features of OBS network traffic to improve the accuracy and efficiency of attack detection in OBS switches that have limited resources. A set of experiments is conducted on an OBS network dataset using 10-fold cross-validation and holdout techniques to evaluate and validate the approach. The experimental results demonstrate that the proposed approach can classify the class labels of OBS nodes with 100% accuracy by using only three features. The comparison with recent related works reveals that the proposed approach is suitable for OBS network security in terms of effectiveness and efficiency.
One of the limitations of the proposed approach is the lack of evaluation on more OBS datasets that vary in size and types of attacks, because no OBS datasets other than the one used in the experiments of this study are available. Moreover, because the proposed approach is based on the decision tree method for classification, the training time is relatively expensive in the case of large training datasets. However, with the reduced number of features in the proposed approach and the emergence of high-speed processors, this limitation is no longer a major problem. In future work, a large set of OBS network data will be collected for further evaluation of the proposed approach and will be made available for researchers in the field. This is due to the lack of public OBS network datasets other than the dataset used in this research work.

Data Availability
The OBS-network dataset used in this study is publicly available at the UCI Machine Learning Repository [1].

Conflicts of Interest
The author declares that there are no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.