Cyber Intrusion Detection Based on a Mutative Scale Chaotic Bat Algorithm with Backpropagation Neural Network

In the current complex cyber environment, malicious attacks take various forms and change constantly. To overcome the low detection efficiency and high false alarm rate of traditional intrusion detection techniques, this paper proposes a new network intrusion detection method, MSCB-BPNN, that combines the mutative scale chaotic bat (MSCB) algorithm with a backpropagation neural network (BPNN). In the proposed approach, a new MSCB algorithm improves the BPNN by optimizing its thresholds and weights so as to prevent it from falling into a local optimum. To verify the practical classification performance of the proposed method for intrusion detection, it is compared experimentally with BPNN, SA-BPNN, and GA-BPNN on the benchmark intrusion detection datasets KDD Cup 99 and UNSW-NB15. The experimental results show that the proposed method reaches 99.4% and 99.8% accuracy on the KDD Cup 99 training and test sets, respectively, and 89.0% and 93.9% accuracy on the UNSW-NB15 training and test sets, respectively, which are higher than those of BPNN, SA-BPNN, and GA-BPNN. Furthermore, the precision and recall ratios of MSCB-BPNN are also superior to those of the other approaches. Thus, the proposed model has significant advantages over the BPNN, SA-BPNN, and GA-BPNN methods.


Introduction
The security of computer networks has received increasing attention as a natural consequence of the ongoing development of computer technology and networks. However, networks face a large number of complex attacks, which have become the main threats to network and information security [1]. As the second line of defense after the firewall, intrusion detection has proactive and dynamic defense capabilities, effectively making up for the deficiencies of traditional security technologies [2][3][4]. In addition, intrusion detection technology can assist system administrators with security audits, monitoring, attack detection, and response, and improve their security management capabilities. The commonly used intrusion detection techniques can be divided into two types: anomaly detection and misuse detection [5]. Misuse detection matches the features of known attack data against the data in the host or network and then judges from the matching result whether the host or network is in an abnormal state. However, this method can only detect the types of network attacks that already exist in the sample database. Anomaly detection first establishes a baseline of normal behavior from the sample database and then treats all behaviors that deviate from or do not conform to this baseline as intrusions [6].
Although intrusion detection, as an active and effective defense method, can predict intrusion behaviors before they occur, improving its prediction accuracy remains an open problem. In recent years, with its rapid development, machine learning technology has been widely used in medicine, autonomous driving, image recognition, and natural language processing [7]. Artificial neural networks offer good adaptive learning ability, parallel processing of large amounts of data, strong anti-interference performance, and fast optimization, and they can handle distorted and incomplete data; they are therefore well suited to identifying intrusion data in complex, high-speed, and large-scale network traffic. Therefore, improving the effectiveness and accuracy of intrusion detection using machine learning methods has become a research hotspot in the field of network security [8].
Network intrusion detection first needs to collect network status data of the target network, including connection duration, network layer protocol type, application layer service (protocol) type, etc., and then uses probability and statistical analysis or machine learning technology to analyze network behavior. With these technologies, the types of abnormal network behaviors can be identified, so that network managers can take the corresponding security measures according to the detection results. The network state is affected by a variety of factors and is both random and time-varying. Therefore, network intrusion detection is essentially a nonlinear classification problem, and it is difficult for traditional linear classification algorithms to establish an optimal classifier. Owing to the nonlinear and intelligent learning ability of machine learning algorithms, network intrusion detection classifiers based on neural networks and support vector machines have emerged. Among them, the backpropagation neural network (BPNN) is a multi-layer feedforward neural network with forward propagation of the working signal and backpropagation of the error. It has strong self-learning, generalization, and nonlinear mapping abilities, and hence has become one of the most widely used network intrusion detection methods [9][10][11].
Although BPNN can achieve better results than traditional intrusion detection, if the BPNN parameters are not properly selected, it is difficult to accurately and comprehensively reflect network trends and intrusion dynamics, which may lead to a low detection rate and rather high rates of false positives and missed detections [12,13]. To solve this problem, this paper proposes a new intrusion detection approach named MSCB-BPNN. The approach utilizes the strong optimization ability of the Mutative Scale Chaotic Bat (MSCB) algorithm to help the BPNN select the optimal thresholds and weights and thus prevent it from being trapped in local extreme values. Therefore, the BPNN can obtain higher classification accuracy. In MSCB, chaos theory is introduced, and a chaotic map is used to initialize the population to improve the randomness of individuals in the population. Moreover, a nonlinear mutative scale dynamic inertia weight mechanism is designed for the algorithm, which greatly improves its exploration and exploitation capabilities. The improved algorithm is then used to optimize the BPNN's initial thresholds and weights. In this paper, the benchmark intrusion detection datasets KDD Cup 99 and UNSW-NB15 are used to verify the practical performance of the proposed MSCB-BPNN approach in intrusion detection. The simulation results show that, compared with BPNN, SA-BPNN, and GA-BPNN, MSCB-BPNN has clear advantages in detection accuracy, precision, and recall ratio. The major contributions of this paper are as follows:
(1) A new MSCB algorithm is proposed to optimize the initial weights and thresholds of BPNN, which is of great help to the training process of BPNN. By optimizing the initial thresholds and weights, the BPNN is effectively trained, and the classification accuracy is satisfactory.
(2) To solve the problem of network intrusion detection based on network information, we introduce MSCB into BPNN. Using the newly proposed MSCB-BPNN model to accurately determine and discover suspicious attacks, the corresponding defense measures can be taken in time, allowing the network to avoid being attacked and thereby effectively reducing economic losses.
The rest of this paper is organized as follows. Section 2 describes related research on intrusion detection technology and BPNN optimization. In Section 3, the MSCB-BPNN approach is proposed: to overcome the shortcomings of the BPNN, we introduce the MSCB algorithm and detail how it optimizes the BPNN. Section 4 applies the proposed model and the other three models to the KDD Cup 99 and UNSW-NB15 intrusion detection datasets. The results of the comparative experiments are analyzed in Section 5. Section 6 summarizes the entire research work and briefly introduces the direction of future research.

Related Work
At present, intrusion detection technology has gradually become an important research direction in network security. Since it offers real-time detection, dynamic response, and smart monitoring, it is an essential supplement to traditional security products. Early intrusion detection mainly depended on statistical probability analysis and expert systems [14]. The author of paper [15] built a hidden Markov model of the operation status of computer systems. Paper [16] proposes an intrusion detection method based on a state transition analysis tool (NetStat). Paper [17] examines how an expert system identifies intrusions using a rule-based pattern recognition engine. Paper [18] presents Kitsune, a plug-and-play NIDS that can learn to detect attacks on the local network without supervision and in an efficient online manner. In paper [19], the researchers propose a two-stage layered network intrusion detection method called H2ID, which can help address new vulnerabilities and cyberattacks in the Internet of Things.
Compared with traditional intrusion detection methods, using a neural network for intrusion detection has more advantages. Because neural networks are capable of self-learning, self-organization, and self-adaptation, they can start training without detailed background knowledge, and intrusion detection technology based on neural networks has gradually become a research hotspot. In paper [20], Wired Equivalent Privacy (WEP) attack data are analyzed. However, no intrusion detection experiment is conducted. Paper [21] provides an intrusion detection system for DDoS that can quickly and effectively detect DDoS attacks. However, that system only targets particular sorts of attacks and has significant limits. In paper [22], Principal Component Analysis (PCA) was used to detect cyber intrusions. However, that algorithm's complexity is too high. To reduce the complexity of the algorithm, an intrusion detection approach based on DBN and probabilistic neural networks was proposed in paper [23]. Nevertheless, that method is only applicable to wireless networks with fewer levels.
Paper [24] proposes a dual support vector machine model; although the classification accuracy is improved, it is difficult to choose the correct parameters quickly. Paper [25] employs a deep self-coding network as a feature extractor in conjunction with a Softmax classifier to detect intrusions; however, the actual results are affected by artificial factors. Paper [26] proposes an improved method for intrusion detection in Bayesian networks based on deep learning and sliding windows to address attribute redundancy in the training sets. Nevertheless, it has difficulty working with data from a multi-dimensional network. Paper [27] proposes a least squares vector machine intrusion detection approach based on deep belief networks, which effectively solves the classification problem of massive intrusion data. Nevertheless, its detection performance still needs to be improved.
BPNN is an artificial feedforward neural network capable of handling complex nonlinear problems. It is composed of an input layer, a hidden layer, and an output layer. It has been widely used in many fields owing to its simple structure, adjustable parameters, multiple training algorithms, and good operability. However, BPNN has some inherent drawbacks: its performance depends heavily on the topology and on the initial connection weights and thresholds of the connected nodes, and it is prone to becoming trapped in local optimum solutions [28]. Thus, to address this challenge, some scholars have used swarm intelligence algorithms to improve the BPNN parameters. Paper [29] uses a genetic algorithm to improve BPNN and establishes the GA-BPNN model. Based on GA-BPNN, paper [30] used the Bayesian regularization (BR) algorithm to train the GA-BPNN further and improve the accuracy of the GA-BPNN agent model. Although this research has enhanced the agent model's fitting precision and computing efficiency to some degree, GA-based BPNN is still plagued by issues such as premature convergence and slow convergence.
Paper [31] used an improved harmonic search (HS) algorithm to optimize the thresholds and weights of BPNN. Although the HS-BP approach provides superior generalization capability and prediction accuracy, it requires constant updating of the HS repository, which leads to inefficiency. In the paper [32], a new BPNN model based on particle swarm optimization algorithm and artificial bee colony algorithm (PSO-ABC) is proposed to perform network traffic prediction. When optimizing the thresholds and weights of the BPNN using PSO-ABC, PSO-ABC has good global search capability from the beginning to the end of the iterative process. However, the search performance is not robust against its neighborhood in the late iteration of the algorithm, and it cannot perform an effective local search, so a more accurate solution cannot be obtained.
Summarizing the related work above, we present the research and problems on optimizing BPNN using optimization algorithms in Table 1. Building on this research, this paper proposes a new intrusion detection approach, MSCB-BPNN, to address the low detection accuracy of current BPNN-based intrusion detection systems mentioned in Section 1. The proposed method takes the number of errors on the training data as the objective function and, through continuous iterative optimization, reduces the number of misclassifications and improves the classification accuracy. The proposed MSCB method introduces chaos theory into the bat algorithm. It uses chaotic mapping to initialize the population to improve the randomness of individuals in the population. More importantly, the proposed algorithm also designs nonlinear convergence factors and dynamic weights, which give it advantages in helping the BPNN determine the optimal thresholds and weights.

Cyber Intrusion Detection Based on MSCB-BPNN
This section describes the principle of BPNN, the detailed optimization process of the proposed MSCB algorithm for the initial weights of BPNN, and the application of the new MSCB-BPNN method to network intrusion detection. Table 2 lists all the notations used in this paper.

Principle of BPNN.
The BPNN used in this paper is a three-layer feedforward neural network with error backpropagation. Each layer contains a certain number of neurons, and its model topology is shown in Figure 1. The learning process of the network consists of forward propagation of signals and backpropagation of errors: the output error is propagated back through the hidden layer toward the input layer, and the error is apportioned to the neurons of each layer. Based on the steepest descent method, the connection thresholds and weights of each layer are continuously adjusted so that the mean squared error of the network output is minimized.
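As a rough illustration of the forward and backward passes described above, the following sketch trains a three-layer network by plain gradient descent on a squared-error loss. The sigmoid activation, the layer sizes, the learning rate, and the function names (`init_network`, `forward`, `train_step`) are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hid, n_out, rng):
    """Random initial weights and thresholds (biases) in [-1, 1]."""
    def rand_matrix(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    return {
        "w1": rand_matrix(n_hid, n_in),                        # input -> hidden weights
        "b1": [rng.uniform(-1.0, 1.0) for _ in range(n_hid)],  # hidden thresholds
        "w2": rand_matrix(n_out, n_hid),                       # hidden -> output weights
        "b2": [rng.uniform(-1.0, 1.0) for _ in range(n_out)],  # output thresholds
    }


def forward(net, x):
    """Forward propagation: returns hidden and output activations."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["w1"], net["b1"])]
    y = [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
         for row, b in zip(net["w2"], net["b2"])]
    return h, y


def train_step(net, x, target, lr=0.5):
    """One steepest-descent update; returns the loss before the update."""
    h, y = forward(net, x)
    # Output-layer error terms, using sigmoid'(z) = y * (1 - y).
    d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
    # Hidden-layer error terms: output errors propagated back through w2.
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * net["w2"][k][j] for k in range(len(d_out)))
             for j, hj in enumerate(h)]
    for k, dk in enumerate(d_out):
        for j, hj in enumerate(h):
            net["w2"][k][j] -= lr * dk * hj
        net["b2"][k] -= lr * dk
    for j, dj in enumerate(d_hid):
        for i, xi in enumerate(x):
            net["w1"][j][i] -= lr * dj * xi
        net["b1"][j] -= lr * dj
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))
```

Calling `train_step` repeatedly on the same sample should drive the returned loss down. The random initialization in `init_network` is the part that a metaheuristic such as MSCB would replace with an optimized starting point.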

Parameter Optimization Based on MSCB.
Security and Communication Networks

Table 1
Method | Problem
GA-BPNN, adopted in papers [29] and [30] | GA-based BPNN is still plagued by issues such as premature convergence and slow convergence
HS-BP, adopted in paper [31] | The harmonic search (HS) algorithm requires constant updating of the HS repository, which leads to inefficiency
PSO-ABC, adopted in paper [32] | The search performance of PSO-ABC is not robust against its neighborhood in the late iterations of the algorithm, and it cannot perform an effective local search, so a more accurate solution cannot be obtained

Table 2: Notations used in this paper.

[Figure 1: BPNN topology, showing the input layer, hidden layer, and output layer with forward propagation of information.]

Since BPNN continuously corrects the weights and thresholds by backpropagating the error signal to minimize the error between the actual output and the desired output, the classification accuracy of BPNN is affected by the weights and thresholds used to train the network. Traditional BPNN initializes the weights and thresholds randomly, which can lead to poor accuracy when the amount of original data is small. Therefore, to solve the above problem, this paper proposes a mutative scale chaotic bat (MSCB) algorithm. In MSCB, a nonlinear convergence factor and a dynamic weight mechanism are designed, which dramatically improves the exploration and exploitation capabilities of the algorithm. Moreover, chaotic perturbation is introduced in the iterative process, and the range of the perturbations decreases as the number of iteration steps increases, so that convergence toward an accurate solution is accelerated in the later stage. In addition, MSCB uses random chaotic particles to replace some of the population particles to increase particle diversity and thus keep the algorithm from falling into a local optimum. By using the proposed MSCB algorithm to optimize the BPNN, the thresholds and weights of the network are optimized, which can significantly increase the BPNN's classification accuracy. As shown in Figure 2, the specific algorithm flow of MSCB is as follows:
Step 1. Configure the initial parameters of the MSCB algorithm. Parameters include bat population size, initial flight speed, loudness, pulse rate, frequency, volume decay coefficient, search frequency enhancement coefficient, and the maximum number of epochs.
Step 2. Randomly initialize the positions of the bats, load them into the BPNN, use the number of errors on the training data as the fitness, and select the current optimal individual according to the fitness value.
Step 3. Update the position and velocity of each bat.
Step 4. Generate a random number Rand1 and determine whether it is greater than the pulse emission rate of the bat. If so, generate a new bat position near the optimal solution using the chaotic perturbation strategy; otherwise, execute the next step of the algorithm directly.
Step 5. Generate a random number Rand2; if Rand2 is less than the loudness of the bat and fit(x_new) < fit(x_best), then replace the original solution with the new one and update the pulse emission rate and loudness of each bat.
Step 6. Determine whether the algorithm is trapped in a local optimum. If the optimal bat individual is not updated within a certain number of iteration steps, the algorithm is considered stuck in a local optimum, and mutative scale chaotic substitution is used to help it jump out.
Step 7. Check whether the maximum number of iterations has been reached; if yes, the algorithm terminates; if not, continue to execute steps 2-6.

Initialization of Bat Populations.
The initialization of the bat population needs to ensure that the individuals are evenly and widely distributed in the solution space so that the optimal configuration of BPNN weights and thresholds can be found. Therefore, this paper uses a chaotic mapping approach to initialize a bat population of size N_pop, which effectively improves the randomness of the individuals in the population. The most commonly used chaotic system is the logistic map. However, experimental analysis has shown that this map is not uniformly distributed over the mapping space, which affects the algorithm's optimization speed. Therefore, this paper uses a modified Tent chaotic map to initialize the population. The expression of this chaotic map is shown in (1). A random term rand × (1/N_T) is added to keep the Tent map from falling into small or unstable periodic points without destroying the properties of the chaotic variables, where N_T is the number of particles in the chaotic sequence and rand is a random variable with values in the range [0, 1].
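For orientation, the following is a sketch of what such a map can look like, assuming the random term rand × (1/N_T) is simply added to both branches of the standard Tent map. The symbol z_j for the j-th chaotic variable is introduced here for illustration, and the paper's exact expression (1) may differ.

```latex
z_{j+1} =
\begin{cases}
2 z_j + \mathrm{rand} \times \dfrac{1}{N_T}, & 0 \le z_j \le \dfrac{1}{2}, \\[6pt]
2 (1 - z_j) + \mathrm{rand} \times \dfrac{1}{N_T}, & \dfrac{1}{2} < z_j \le 1.
\end{cases}
```

Without the random term, a Tent map iterated in finite-precision arithmetic tends to collapse onto a fixed point after a few dozen iterations, which is the failure mode the added term is meant to avoid.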
The encoding length must be determined before generating the initial population using the chaotic mapping described above. In the bat population, the position of an individual bat represents one set of initial weights and thresholds of the BPNN; the encoding length L of the bat position can be calculated by (2), where N_input is the number of neurons in the input layer, N_hidden is the number of neurons in the hidden layer, and N_output is the number of neurons in the output layer.
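A minimal sketch of the encoding, assuming that L counts every weight and threshold of a three-layer network exactly once (input-to-hidden weights, hidden thresholds, hidden-to-output weights, output thresholds). The segment order used by `decode` and both function names are illustrative choices, not the paper's.

```python
def encoding_length(n_input, n_hidden, n_output):
    """Number of weights and thresholds in a three-layer network."""
    return n_input * n_hidden + n_hidden + n_hidden * n_output + n_output


def decode(position, n_input, n_hidden, n_output):
    """Split a flat bat position into weight matrices and threshold vectors."""
    if len(position) != encoding_length(n_input, n_hidden, n_output):
        raise ValueError("position length does not match the network size")
    it = iter(position)
    w1 = [[next(it) for _ in range(n_input)] for _ in range(n_hidden)]   # input -> hidden
    b1 = [next(it) for _ in range(n_hidden)]                             # hidden thresholds
    w2 = [[next(it) for _ in range(n_hidden)] for _ in range(n_output)]  # hidden -> output
    b2 = [next(it) for _ in range(n_output)]                             # output thresholds
    return w1, b1, w2, b2
```

For example, with 41 input features, a hypothetical hidden layer of 10 neurons, and 5 output classes, `encoding_length(41, 10, 5)` gives 41 × 10 + 10 + 10 × 5 + 5 = 475, so each bat would be a 475-dimensional vector.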
After the encoding length L is determined, a chaotic sequence of length L can be generated by (1). Then the chaotic sequence is mapped into the solution space of the problem by (3), where d_min and d_max denote the minimum and maximum weight and threshold values, respectively. Repeating this procedure N_pop times eventually generates a population of bats uniformly distributed in the solution space.
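The initialization procedure can be sketched as follows. The sketch assumes a Tent map with an added rand/N_T term, wraps values back into [0, 1) with a modulo (an implementation choice made here), and assumes the usual linear mapping d_min + z (d_max − d_min) into the solution space. The names `tent_sequence` and `init_population` are hypothetical.

```python
import random


def tent_sequence(length, n_t, rng):
    """Chaotic sequence in [0, 1) from a Tent map with a small random term."""
    z = rng.random()
    seq = []
    for _ in range(length):
        z = 2.0 * z if z <= 0.5 else 2.0 * (1.0 - z)
        # Random term rand * (1 / N_T); the modulo keeps z inside [0, 1).
        z = (z + rng.random() / n_t) % 1.0
        seq.append(z)
    return seq


def init_population(n_pop, length, d_min, d_max, seed=0):
    """Generate n_pop bat positions of the given length inside [d_min, d_max)."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_pop):
        chaos = tent_sequence(length, length, rng)
        population.append([d_min + z * (d_max - d_min) for z in chaos])
    return population
```

Here N_T is set equal to the sequence length, matching the description of N_T as the number of particles in the chaotic sequence.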

Change of Bat Position.
It can be seen from the velocity update formula that the exploration and exploitation capabilities of the algorithm can be adjusted by changing the coefficient of the velocity term. Therefore, a nonlinear decreasing inertia weight w is used in MSCB. In the early stage of the algorithm, w is large, which gives the algorithm a strong global search capability and accelerates its convergence. As the iterations proceed, w decreases exponentially, which makes the algorithm more capable of local search. This decreasing inertia weight effectively balances the global exploration and local exploitation abilities of the algorithm in different periods, which improves the accuracy of the algorithm. In this paper, the inertia weight w is defined by (4).
Thus, the velocity and position update equations of the bat in MSCB are given by (5)-(7), in which f_min and f_max denote the minimum and maximum values of the frequency, respectively; β is a random number with values in the range [0, 1]; x_best denotes the current global optimal solution; and i denotes the index of the individual bat.
Furthermore, after the position and velocity of an individual bat are updated, a new solution is generated locally near the current optimal position with a certain probability using (11), where ε is a random variable in the range [−1, 1] and A_avg^t denotes the average loudness of all bats in the current generation.
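The update can be sketched as below, assuming the standard bat-algorithm equations with the inertia weight w applied to the previous velocity. The sign convention (x_i − x_best), the per-dimension draw of ε, and the function names are assumptions for this example. The weight w is passed in as a parameter because its defining expression (4) is not reproduced here.

```python
import random


def update_bat(x, v, x_best, w, f_min, f_max, rng):
    """Return the new position and velocity of one bat."""
    beta = rng.random()                 # random number in [0, 1)
    f = f_min + (f_max - f_min) * beta  # frequency of this bat
    new_v = [w * vi + (xi - bi) * f for vi, xi, bi in zip(v, x, x_best)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v


def local_solution(x_best, avg_loudness, rng):
    """Random walk around the current best, scaled by the average loudness."""
    return [bi + rng.uniform(-1.0, 1.0) * avg_loudness for bi in x_best]
```

A large w early on lets the previous velocity dominate and spreads the bats widely. As w shrinks, the attraction term involving x_best dominates and the search becomes local.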

Adjustment of Pulse Emission Rate and Loudness.
It is known from the biological mechanism of bat predation that, while searching for prey, the bat colony initially emits loud, low-frequency ultrasonic waves to extend the scanning range. After the prey is found, the loudness A_i is gradually reduced. At the same time, the pulse emission rate r_i is increased to pinpoint the location of the prey. This algorithm uses (9) to model this predation feature, where A_i^t is the loudness emitted by the i-th bat at time t, and α denotes the attenuation coefficient of the loudness. r_i^0 denotes the initial pulse rate of the i-th bat, which is also the maximum value of the pulse rate. r_i^t is the pulse emission rate of the i-th bat at time t. ψ is the pulse emission rate increase coefficient, which is a constant greater than zero.
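A small sketch of this adjustment, assuming the standard bat-algorithm forms: the loudness is multiplied by α at each accepted update, and the pulse rate rises toward its maximum r_i^0 as r_i^0 (1 − exp(−ψ t)). The function names are hypothetical.

```python
import math


def update_loudness(loudness, alpha):
    """Attenuate the loudness by the coefficient alpha (0 < alpha < 1)."""
    return alpha * loudness


def pulse_rate(r0, psi, t):
    """Pulse emission rate at iteration t, approaching r0 from below."""
    return r0 * (1.0 - math.exp(-psi * t))
```

With these forms the pulse rate starts at zero and saturates at r_i^0, which agrees with the statement that r_i^0 is also the maximum pulse rate.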
Once the algorithm is trapped in a local optimum, it can be difficult to escape by chaotic perturbation alone. To avoid this situation, we propose to leave the local optimum by using chaotic substitution. The specific idea is as follows. First, determine whether the location of the best individual bat has changed over δ consecutive generations; if no change has occurred, we decide that the algorithm has fallen into a local optimum. Then the algorithm uses chaotic mapping to generate a random chaotic sequence of size P × D to replace a moderate number of bat individuals in the original population, where P is the replacement size. Meanwhile, as the number of steps the algorithm spends in the local optimum increases, the size of the chaotic substitution increases linearly until the algorithm jumps out of the local optimum.
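The stagnation test and the linearly growing replacement size can be sketched as follows. The base size, the growth step, the cap at the population size, and both function names are assumptions made for illustration, since the paper only states that the size grows linearly.

```python
def is_stuck(best_history, delta):
    """True if the best position is unchanged over the last delta generations."""
    if len(best_history) <= delta:
        return False
    recent = best_history[-(delta + 1):]
    return all(p == recent[0] for p in recent)


def replacement_size(stalled_generations, delta, base_size, step, n_pop):
    """Number of bats P to replace, growing linearly once stagnation is detected."""
    if stalled_generations < delta:
        return 0
    return min(n_pop, base_size + step * (stalled_generations - delta))
```

In a full implementation, the P selected bats would be overwritten with freshly generated chaotic positions of dimension D and then re-evaluated.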

Computational Complexity Analysis.
The complexity analysis of an algorithm evaluates its efficiency. The analysis mainly consists of measuring the time efficiency and the space efficiency of the algorithm's operation, and it indicates whether the algorithm is superior.
In this subsection, since the space complexity of the algorithm is not of concern, the analysis of the computational complexity of the proposed MSCB algorithm is equivalent to the analysis of its time complexity. In addition, to facilitate the complexity analysis of the MSCB-BPNN approach, we only consider the generation of random numbers and the multiplication operations during one round of algorithm execution.
Since MSCB is an improvement of the bat algorithm (BA), we first analyze the complexity of BA. Assume that the time required to initialize the population size N_pop, the bat coding dimension D, and the other parameters of the bat algorithm is T_1, the time required to generate a uniformly distributed random number is T_2, and the time required to calculate the fitness is T_f; the time complexity of the initial stage is expressed in terms of these quantities. After entering the loop, assume that the time required to generate a D-dimensional chaotic sequence is DT_2, the time required to update the velocity and position of an individual bat according to equations (4)-(7) is T_3, and the time required for the judgment operation and the loudness adjustment operation is T_4; the time complexity of this stage is expressed in terms of these quantities. In summary, the time complexity T(D) of a single round of the bat algorithm in the coding dimension D is O(D + T_f). In MSCB, we only improve the position update mechanism of the bat algorithm and introduce a mechanism for determining whether the algorithm is trapped in a local optimum. These operations do not actually introduce multiplication operations; therefore, the complexity of the proposed MSCB is equal to that of BA, namely O(D + T_f).

MSCB-BPNN for Network Intrusion Detection.
As shown in Figure 3, the proposed network intrusion detection approach comprises three main phases. The first stage is dataset preprocessing: the symbolic feature attributes in the intrusion detection dataset are converted to numerical values, and then all data are normalized. The second stage is weight optimization. The network topology of the BPNN must first be established before its initial thresholds and weights are optimized. Then the prediction error of the classifier is selected as the fitness function of the model. The evolutionary operations of MSCB are used to update the thresholds and weights of the BPNN continuously, and finally, the optimized MSCB-BPNN model is obtained. The third stage is to use the obtained MSCB-BPNN model to identify intrusions, output the detection results as a confusion matrix, and analyze the detection performance of the model.
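The preprocessing stage can be sketched as below. The sketch assumes that symbolic attributes are mapped to integer codes in order of first appearance and that normalization is column-wise min-max scaling to [0, 1]; the paper does not state which encoding or normalization scheme it uses, and the function names are hypothetical.

```python
def encode_symbolic(rows, symbolic_columns):
    """Replace symbolic values with integer codes; returns rows and the mappings."""
    mappings = {c: {} for c in symbolic_columns}
    encoded = []
    for row in rows:
        new_row = list(row)
        for c in symbolic_columns:
            table = mappings[c]
            # A new symbol receives the next unused code.
            new_row[c] = table.setdefault(row[c], len(table))
        encoded.append([float(v) for v in new_row])
    return encoded, mappings


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]
```

For a KDD-style record, the symbolic columns would be attributes such as the protocol type and service. Applying `min_max_normalize` to the output of `encode_symbolic` yields purely numeric inputs for the network.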

Datasets and Performance Metrics.
This paper uses two of the most widely used datasets in intrusion detection studies, KDD Cup 99 and UNSW-NB15, as the training and test datasets for MSCB-BPNN. KDD Cup 99 is derived from the Defense Advanced Research Projects Agency (DARPA) IDS evaluation data, and the dataset was also used in the KDD Cup 99 contest. The complete KDD Cup 99 dataset contains nearly 5 million records, each representing a network connection. To verify the classification performance of MSCB-BPNN and to increase the validation efficiency, we randomly selected a sample of 4500 records for the experiments from the 10% version of the KDD Cup 99 dataset, which contains 494021 instances. Each record has 41 features and one label, so each row has 42 elements. The first 41 elements are feature items, including basic TCP connection features, content features of TCP connections, host-based network traffic statistics, time-based network traffic statistics, etc., while the final item indicates the class of the record. The simulated attacks in the dataset fall into four categories, DoS, Probe, U2R, and R2L, and the dataset also contains normal samples. The specific percentages of the data distribution are shown in Table 3.
The UNSW-NB15 dataset was established by the Australian Centre for Cyber Security (ACCS) in 2015 and reflects modern network traffic patterns, containing a large amount of low-occupancy intrusion and deeply structured network traffic information. The dataset contains 2540044 data instances, including normal data and nine types of attack data. The specific percentages of the data distribution are shown in Table 4. In addition, there are 49 features for each record in the dataset, divided into 6 categories: Flow Features, Basic Features, Content Features, Time Features, Additional Generated Features, and Labelled Features, where the Labelled Features contain the type and label of the attack.
To understand the specific detection performance of the model, this paper mainly uses Accuracy, Precision, and Recall as the evaluation metrics. Accuracy is the ratio of correctly classified samples to the total number of samples. Precision is the proportion of samples that are actually positive among all samples predicted to be positive. Recall is a measure of coverage that reflects the ability of the model to identify positive examples.
Figure 4 shows the optimization results of the MSCB algorithm for the BPNN weights and thresholds under different numbers of epochs and population sizes. The number of misclassifications is used as the objective function in this paper; the smaller the value of the function, the better the model's performance. It can be seen from Figure 4 that MSCB-BPNN achieves good performance when the number of epochs G is greater than 80 and the value of N_pop is greater than 35.
Figure 5 compares MSCB-BPNN with the other three models in terms of how the number of misclassifications changes as the number of training generations increases. In this paper, BPNN is a three-layer structure consisting of an input layer, a hidden layer, and an output layer. The four models are identical in other parameters, such as network structure, and differ only in the optimization algorithm used. The four models, BPNN, SA-BPNN, GA-BPNN, and MSCB-BPNN, were trained for 100 rounds, and the resulting ratios of misclassifications to the total number of samples were 8.325%, 5.238%, 3.525%, and 0.608%, respectively.
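The Accuracy, Precision, and Recall metrics described at the start of this subsection can be computed from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) using their standard definitions. The sketch below treats one class as positive and all others as negative; the function names and the zero returned for an empty denominator are choices made for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for one class treated as positive."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """TP / (TP + FP), or 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN), or 0.0 when there are no positive samples."""
    return tp / (tp + fn) if (tp + fn) else 0.0
```

For a multi-class problem such as distinguishing Normal, DoS, Probe, U2R, and R2L traffic, the same functions can be applied once per class by passing that class as `positive`.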
It can be seen that the BPNN models tuned by an optimization algorithm perform better than the unoptimized BPNN and, most importantly, that among the three optimized models, the BPNN optimized by MSCB is significantly better than the other two at reducing the number of misclassifications.
Figure 6 compares the accuracy of MSCB-BPNN, GA-BPNN, SA-BPNN, and BPNN on the training data and the test data. The comparison shows that the MSCB-BPNN approach proposed in this paper outperforms the other three approaches in intrusion classification accuracy on both the training and test sets.
Table 7 shows the precision ratio of the four models for five types of network intrusions in classifying the test set. Table 7 shows that the MSCB-BPNN proposed in this paper has a good detection effect for all five intrusion types, unlike the other three models, which detect one type of intrusion well and another poorly.
Table 8 shows the recall ratio of the four models for the five network intrusions during the test set classification. The recall ratio is the fraction of the actual samples of a class that are correctly identified. In the experiments, a higher recall ratio indicates better performance of the approach. From Table 8, we can see that the recall ratio of MSCB-BPNN is higher than that of the other three models for four of the five network intrusion types.
In Figure 7, comparing the MSCB-BPNN with the other three techniques, the graph reveals that it has a substantially better recall ratio and average precision ratio, which means that MSCB-BPNN is more suitable for network intrusion detection than the other three methods.
In Figure 8, the ROC curves are used to reflect the classification performance of the four methods on the KDD Cup 99 dataset under all classification thresholds. In this paper, we make the samples with the attack category
To further verify the classification performance of the proposed algorithm, we conducted another simulation experiment on the UNSW-NB15 dataset. During the validation process, we downscaled the UNSW-NB15 dataset to improve the training speed, selected 26 critical features as the input data, and sampled the training and test sets in a 25% stratified manner according to the attack types.
It can be seen that even if the dataset used for validation is changed to UNSW-NB15, the proposed MSCB-BPNN still has advantages over the other three methods in terms of accuracy, average precision rate, and average recall rate.

Conclusions
To address the shortcomings of conventional intrusion detection approaches, such as low detection efficiency and a high rate of false alarms, this paper proposes a new intrusion detection approach called MSCB-BPNN, which uses the proposed MSCB algorithm to help the BPNN select its initial weights and thresholds. The experimental results show that the proposed algorithm performs better than BPNN, GA-BPNN, and SA-BPNN for network intrusion detection on the KDD Cup 99 and UNSW-NB15 datasets: MSCB-BPNN achieves 99.4% and 99.8% accuracy on the KDD Cup 99 training and test sets, respectively, and 89.0% and 93.9% accuracy on the UNSW-NB15 training and test sets, respectively. These results indicate that the proposed method can effectively handle the network intrusion detection problem.
For future research directions, an extension we found interesting is the use of XAI tools to interpret the behavior of DL-based traffic identification methods. As far as we know, Explainable Artificial Intelligence (XAI) proposes to move towards a more transparent AI. It aims to create a set of techniques to produce more interpretable models while maintaining high-performance levels [33,34]. So, we think much work needs to be done in that research direction.

Data Availability
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Disclosure
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

no. ZZZC201915B, and Postgraduate Education Innovation Program of the Autonomous Region.