Network Intrusion Detection Method Based on FCWGAN and BiLSTM

Imbalanced datasets greatly affect the analysis capability of intrusion detection models, biasing their classification results toward normal behavior and leading to high false-positive and false-negative rates. To alleviate the impact of class imbalance on the detection accuracy of network intrusion detection models and improve their effectiveness, this paper proposes a method based on a feature selection-conditional Wasserstein generative adversarial network (FCWGAN) and a bidirectional long short-term memory network (BiLSTM). The method uses the XGBoost algorithm together with Spearman's correlation coefficient to select the data features, filtering out useless and redundant features and simplifying the data structure. A conditional WGAN (CWGAN) then generates samples of the minority classes, which are added to the original training set to supplement it, and a BiLSTM is trained on the augmented set to perform the classification. In comparative tests on the NSL-KDD and UNSW-NB15 datasets, the accuracy of the proposed model reached 99.57% and 85.59%, respectively, which is 1.44% and 2.98% higher than that of the comparable CWGAN and deep neural network (CWGAN-DNN) model.


Introduction
The continuous development of computer and network technology has greatly improved people's lives, but it has also brought a variety of attacks and threats at the network level, making network security an unavoidable and urgent problem. As an effective means of detecting and defending against network attacks, the intrusion detection system (IDS) has been widely used. It monitors network traffic in real time, classifies it as normal or malicious, and provides the information needed by intrusion prevention systems. In recent years, machine learning and deep learning have been widely applied to intrusion detection. However, real-life network traffic is unbalanced and only a small fraction of it is malicious, so the training sets of such methods are severely imbalanced. Hence, while existing network intrusion detection systems decide accurately whether an attack is present, their detection accuracy for individual classes is still low, especially for minority-class attacks; such traffic is misclassified as other traffic, and performance requirements are not met. Therefore, it is important to solve the network data imbalance problem and improve the performance of intrusion detection models.
The class imbalance problem is commonly addressed by increasing the number of samples in the dataset to improve model training, and much research has followed this approach. Maryam Yousefnezhad et al. [1] proposed a feature extraction ensemble classification method based on deep learning. First, a feature selection algorithm based on ensemble margin selects the samples, and a deep learning method extracts the sample features. Finally, the outputs of multiple KNN and SVM classifiers are combined according to the Dempster-Shafer method. Because it relies on ensemble learning, this method can greatly improve the detection rate of attack types. At the same time, feature selection based on ensemble margin removes useless data from the original dataset, improves the overall detection accuracy, and shortens the training time to a certain extent. However, the structure is complex, there are many classifiers, and the overall computational cost is high. Moreover, the method uses KNN and SVM as classifiers, and its overall classification accuracy leaves considerable room for improvement. Considering the complexity of dimensions and the low efficiency of traditional algorithms, Kelidari and Hamidzadeh [2] proposed a chaotic cuckoo optimization algorithm with Levy flight, disruption operator, and opposition-based learning (CCOALFDO). The algorithm combines Levy flight, the disruption operator, and opposition-based learning to select the optimal feature subspace for classification. Levy flight can deal with uncertainty and better update the cuckoo steps in high-dimensional space. Opposition-based learning and the disruption operator improve the search ability of the algorithm and ensure the diversity of the population. By combining these advantages, the algorithm greatly reduces the randomness of feature selection and avoids falling into local optima.
At the same time, because some redundant features are eliminated, the classification accuracy can be greatly improved. However, combining multiple algorithms increases the overall computational complexity, which raises the computational cost, slows convergence, and lengthens the computation time. Gonzalez-Cuautle et al. [3] proposed a resampling method that integrates the synthetic minority oversampling technique (SMOTE) with a grid search algorithm to address overfitting and low classification accuracy. This method improved the classification results on intrusion detection system (IDS) datasets by merging synthetically generated balanced data and tuning different supervised learning algorithms. SMOTE oversamples the data and increases the number of minority-class samples. The grid search algorithm automatically tunes the parameters, finds those with the best detection effect, and applies them to the model, which avoids settling on a locally optimal configuration. However, SMOTE synthesizes new data randomly according to the k-nearest-neighbor principle and does not learn the underlying distribution of the original data, so the quality of the generated samples is poor. In addition, the grid search algorithm evaluates every parameter combination, which leads to a high computational cost and a long computation time, leaving considerable room for improvement. Lee and Park [4] proposed AE-CGAN-RF, a model that addresses the data imbalance problem by using an autoencoder to reduce the dimension of the network traffic and a conditional generative adversarial network (CGAN) to generate data samples, which are passed to a random forest (RF) to complete the intrusion detection classification.
The model could greatly improve the detection accuracy for minority-class samples, and it reduces the data dimension, which shortens training and lowers the computational cost. However, the use of RF as a classifier led to a low overall detection accuracy because of RF's weak classification ability. Lee and Park [5] proposed a detection model that uses a generative adversarial network (GAN) to generate minority-class attack samples and RF for classification. This method increases the number of minority samples in the CIC-IDS2017 dataset and improves the model's ability to detect minority attack samples, so that the model achieves a better detection effect. At the same time, the structure is simple and detection is fast. However, only an ordinary GAN is used for sample generation, so the instability of GAN training is not addressed and may compromise the generated samples; moreover, no other datasets or models were used to validate the approach, which makes the results less convincing. Liu et al. [6] proposed a GAN-FS method to address feature redundancy. The model selects dataset features based on feature variance, which largely eliminates the impact of redundant and useless data on the detection effect and improves the accuracy and speed of detection, and it uses a GAN to generate samples, which increases the number of samples and enhances the training effect. Comparative experiments confirmed that the method could effectively improve detection performance, but it does not constrain the degrees of freedom of GAN training, so the generated data are unsupervised and uncontrollable and, compared with a CGAN, less targeted. In addition, it selects features only according to feature variance, which is not a comprehensive criterion and has certain limitations.
He [7] addressed the low detection accuracy on class-imbalanced data and proposed a model that uses a conditional Wasserstein generative adversarial network (CWGAN) to generate minority-class attack samples and a deep neural network (DNN) as the classifier for network intrusion detection, which improves the detection effect compared with a DNN alone. However, with only a DNN as the classifier, there is still a large gap in detection accuracy compared with other deep learning methods. In addition, the high dimensionality of the data is not considered; because the data are high-dimensional and nonlinear, the use of a network intrusion detection system in a large-scale network environment will be limited by time and space complexity. Therefore, dimensionality reduction for high-dimensional data is a key step to improve detection speed and performance.
To solve the above problems, this paper combines feature selection with a CWGAN. The feature selection-based dimensionality reduction of high-dimensional data filters out redundant and useless features, simplifies the data structure, improves intrusion detection performance, and decreases training time. The CWGAN oversamples the minority-class data to supplement the samples and balance the data distribution, thus improving detection performance. A bidirectional long short-term memory network (BiLSTM) is used to extract and classify the features from the time series. The loss function and optimization algorithm are analyzed to select the most suitable hyperparameters.
This paper makes the following contributions:
(1) We propose FCWGAN-BiLSTM, a network intrusion detection system based on FCWGAN and a BiLSTM network, to alleviate the impact of class imbalance on detection performance and improve the overall performance of a network intrusion detection model.
(2) We use XGBoost and Spearman correlation coefficients for feature selection to filter out redundant and useless data and simplify the feature structure, which reduces computational difficulty and improves detection accuracy.
(3) We apply CWGAN to generate minority-class samples to supplement the dataset, enhance the model training effect, reduce the impact of class imbalance on the detection rate, and improve detection performance.
(4) A BiLSTM network captures information with long-term dependencies in network traffic data, extracts time-series-based traffic features, and effectively uses information from future time steps to improve the classification effect of the model.
(5) Model performance analysis experiments, model ablation experiments, and comparison experiments with different data augmentation algorithms and classification algorithms demonstrate the performance of the proposed model.
The rest of this paper is organized as follows. Section 2 presents the background and related work. Section 3 presents the proposed model, Section 4 provides experimental results and analysis, and Section 5 presents the conclusions.
Computational Intelligence and Neuroscience

Feature Selection.
Feature selection is a method of selecting relevant features of a dataset by obtaining a subset from the original feature set based on specific criteria. Data dimensionality reduction is often applied to high-dimensional complex data [8]. Unlike feature extraction, feature selection preserves the physical meaning of the original features by retaining some of the data, and thus makes the model more readable and interpretable [9,10]. In the field of intrusion detection, where datasets are characterized by a large volume of data and high dimensionality, feature selection reduces computational difficulty and eliminates data redundancy [11], thereby improving the detection rate of the model and reducing false positives. For example, a firefly algorithm was used for feature selection, and the selected features were passed through a classifier based on C4.5 and a Bayesian network (BN) to complete the classification for intrusion detection [12]. The method selected important features in the KDD CUP 99 dataset and reduced the 41-dimensional features to 10 dimensions, which achieved better detection performance and reduced computation. However, the method suffers from a low discovery rate and slow solution speed, which leads to long calculation times. Le et al. [13] proposed SFSDT, a feature selection model that combines a hybrid sequence forward selection (SFS) algorithm with a decision tree (DT) model to select the best feature subset from the complete set of features in a dataset. The CF function in the SFS algorithm is adjusted, and the accuracy and error score of the DT model on each feature subset are generated by the SFS. SFSDT starts from an empty set and sequentially adds features to enhance the accuracy of the DT model until it is maximized on a validation dataset (feature subset). The algorithm reduces execution time and required memory, and significantly improves detection performance.
However, SFS can only add features and cannot remove them, and it tends to fall into local optima. Thus, it requires a large number of experiments to obtain the best subset. Considering the above problems, we use XGBoost and the Spearman correlation coefficient for dataset feature selection.
2.1.1. XGBoost. Proposed by Chen in 2015, XGBoost (eXtreme Gradient Boosting) is a model framework based on the idea of the gradient boosting decision tree (GBDT) [14]. It has the advantages of high speed, high efficiency, and strong performance, and has been widely used to solve classification and regression problems. The core idea is to grow a tree by splitting on the features in a dataset and then to keep adding new trees. Each new tree fits the residual of the previous prediction to obtain a new function, and performance improves through iteration. The traditional GBDT algorithm uses only first-order derivative information, while XGBoost uses a second-order Taylor expansion of the loss function and a regularization term to speed up training and prevent overfitting. We use this method to rank the importance of features in the dataset [15].

Spearman Correlation Coefficient.
We use the Spearman correlation coefficient to measure the correlation between features. Proposed by Spearman in 1904, it measures the strength of the relationship between two variables [16], and it takes values in the range [−1, 1]. The Spearman correlation coefficient between variables $x_i$ and $y_i$ is calculated as
$$\rho = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}, \tag{1}$$
where $x_i$ $(i = 1, 2, \ldots, n)$ and $y_i$ $(i = 1, 2, \ldots, n)$ are the (rank-transformed) elements of the vectors $X$ and $Y$, respectively, and $\bar{x}$ and $\bar{y}$ are their means. A value of $\rho$ close to ±1 indicates a strong association; hence one of the two features can be filtered out. A value close to 0 indicates that there is no association between them, and both should be retained.
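As an illustration of the coefficient described above, the following is a minimal pure-Python sketch that ranks both vectors (averaging the ranks of tied values) and then computes the correlation of the ranks. The function names `average_ranks` and `spearman` are hypothetical, and returning 0 for a constant feature is an assumption made here, not something the paper specifies.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant feature is treated as uncorrelated
    return cov / sqrt(var_x * var_y)
```

Because only ranks enter the computation, any strictly increasing relationship between two features yields a coefficient of 1, even when the relationship is nonlinear.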

CWGAN.
A GAN is a deep learning model inspired by the two-person zero-sum game in game theory and is used to model complex high-dimensional distributions of real-world data. It consists of a generator (G) and a discriminator (D) [17], which are both neural networks. The generator captures the potential distribution of real data samples and generates new data samples. The discriminator is a binary classifier used to determine whether the input sample is real or generated data. The classification results are passed back to the generator and discriminator through updates of the weighted loss.
The above networks are trained until the discriminator can no longer distinguish between real and generated samples [18]. The optimization process is a minimax game whose goal is to reach a Nash equilibrium, so that the generator network can estimate the distribution of the data samples [19]. The objective function of the generative adversarial network is
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))], \tag{2}$$
where $p_{data}$ denotes the distribution of real samples, $p_z$ is the distribution of the noise $z$, the function $G(z)$ maps noise $z$ to the data space, and $D(x)$ is the probability that sample $x$ is real data. To distinguish between real and generated data, $D(x)$ should be as large as possible, and $D(G(z))$ as small as possible.
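The CGAN is based on a GAN, where category information and noise are merged with the original data as the input to the generator and discriminator [20], with loss function
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z \mid y)))], \tag{3}$$
where $y$ represents the category information, and other parameters are the same as in (2).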
A GAN differs from ordinary oversampling in that it generates new samples by learning the potential distribution of the original data and passing random noise into the generator. By training the generator and discriminator, the generated samples come to follow the original sample distribution with high confidence. GANs are used to generate samples for minority classes and to expand datasets. For example, the SIGMA method [21] generates new samples to enhance the ability of IDSs to resist new types of attacks, combining a GAN with hybrid local search and genetic algorithms to iteratively generate new samples and retrain the machine learning-based intrusion detection system until the detection rate converges. AEGAN [22] is a hybrid model consisting of adversarial environment reinforcement learning (AE-RL) and a CGAN, which is trained on a network intrusion detection dataset to generate synthetic samples to deal with class imbalance problems. The above methods can improve the performance of network intrusion detection systems, but none considers the vanishing gradient problem that might occur during the training of GANs.
GANs and CGANs can generate samples and reduce class imbalance problems. However, their use of the Jensen-Shannon divergence requires overlap between the distributions of real and generated samples; this overlap is nonexistent or negligible when the discriminator is trained to be optimal, which can lead to mode collapse and vanishing gradient problems [23].
To solve the above problems, we introduce the Lipschitz constraint and the Wasserstein distance into the CGAN to obtain a CWGAN for generating dataset samples, with the workflow shown in Figure 1.
We fix the discriminator, input the noise vector and labels to the generator, and train it to simulate the real data distribution. We use the discriminator to judge the real and generated samples. If it cannot distinguish between them, we fix the generator and train the discriminator, and if it can, we fix the discriminator and train the generator. We repeat these steps until the loss function of the discriminator is stabilized at about 0.5, at which time we generate attack samples and add them to the training set.
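Through the above method, the model can generate data of a specified pattern to supplement the dataset, while effectively avoiding the vanishing gradients caused by the failure of the discriminator to converge during training. The objective function of CWGAN is
$$L = \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x} \mid y)] - \mathbb{E}_{x \sim p_r}[D(x \mid y)] + \lambda\, \mathbb{E}_{\hat{x} \sim p_{penalty}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x} \mid y) \rVert_2 - 1\right)^2\right], \tag{4}$$
where $\lambda$ is a manually set penalty coefficient, $\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert$ is the norm of the gradient of $D$ with respect to its input, and $\hat{x} \sim p_{penalty}$ is a point on the straight line connecting points drawn from $p_r$ and $p_g$.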
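The pieces of the CWGAN objective described above can be sketched in a few lines. The sketch below assumes that the discriminator scores and the gradient norms at the interpolated points have already been computed by some training framework; the function names `interpolate` and `critic_loss` are hypothetical, and the default `lam=10.0` is an assumption borrowed from common gradient-penalty practice, not a value reported in the paper.

```python
def interpolate(real, fake, eps):
    """Point on the straight line between a real and a generated sample.

    eps in [0, 1] is the mixing weight (usually drawn uniformly at random).
    """
    return [eps * r + (1.0 - eps) * f for r, f in zip(real, fake)]

def critic_loss(d_real, d_fake, grad_norms, lam=10.0):
    """Discriminator loss with gradient penalty.

    d_real, d_fake: discriminator scores on real and generated samples.
    grad_norms: norms of the discriminator gradient at interpolated points.
    lam: penalty coefficient (hypothetical default, see lead-in).
    """
    def mean(xs):
        return sum(xs) / len(xs)

    wasserstein_term = mean(d_fake) - mean(d_real)
    penalty = mean([(g - 1.0) ** 2 for g in grad_norms])
    return wasserstein_term + lam * penalty
```

The penalty term is zero only when the gradient norm equals 1 at the interpolated points, which is how the Lipschitz constraint is encouraged without clipping weights.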

BiLSTM.
A traditional neural network focuses only on processing the current moment, while a recurrent neural network (RNN) can carry information processed at the current moment over to the next moment [24]. Considering the vanishing and exploding gradient problems that arise during the training of an RNN, Hochreiter et al. proposed the long short-term memory network (LSTM) [25], which adds a gate mechanism and a memory unit to the RNN to effectively solve these problems and to better handle long-distance dependencies [26]. LSTM has input, forget, and output gates, as shown in Figure 2. The LSTM structure is described as
$$\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f), \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C), \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \\
h_t &= o_t \odot \tanh(C_t),
\end{aligned} \tag{5}$$
where $f_t$ is the forget gate; $i_t$ is the input gate; $o_t$ is the output gate; $\tilde{C}_t$ and $C_t$ are the current input and unit state, respectively; $h_t$ is the hidden state; $\sigma$ is the sigmoid function; $W_f$, $W_i$, $W_o$, and $W_C$ are the weight matrices of the forget gate, input gate, output gate, and current input unit state, respectively; $[h_{t-1}, x_t]$ denotes the concatenation of the two vectors; and $b_f$, $b_i$, $b_o$, and $b_C$ are the bias terms of the forget gate, input gate, output gate, and current input unit state, respectively. The above parameters change continuously during training.
Considering the distinct temporal characteristics of network traffic data, RNN-like approaches have unique advantages for network intrusion problems. For example, in [27], a deep learning-based intrusion detection system, DL-IDS, uses a hybrid network of convolutional neural networks (CNNs) and LSTM to extract the spatiotemporal characteristics of network traffic data, thus providing a better intrusion detection system. However, that work does not consider that a unidirectional LSTM reads sequence data in one direction only and therefore cannot take into account the influence of subsequent information on the detection results.
Thus, BiLSTM is used instead of LSTM to process incoming data [28].
BiLSTM combines a forward and a backward LSTM to learn from time-series data in both directions. The hidden layer contains two units with the same input that are connected to the same output; one processes the forward time series and the other the backward time series. This lets more of the sequence contribute to training and improves feature learning, thus providing higher accuracy for longer time series.
The BiLSTM process is shown in Figure 3.
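The process is
$$\overrightarrow{h}_t = \mathrm{LSTM}(x_t, \overrightarrow{h}_{t-1}), \qquad \overleftarrow{h}_t = \mathrm{LSTM}(x_t, \overleftarrow{h}_{t+1}), \qquad h_t = W_T \overrightarrow{h}_t + W_V \overleftarrow{h}_t, \tag{6}$$
where the LSTM function represents the nonlinear transformation of the input feature, which is encoded as the corresponding hidden state of the LSTM (see (5)), and $W_T$ and $W_V$ are the weight coefficients corresponding to the forward and backward unit states, respectively.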
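To make the forward and backward passes concrete, here is a minimal sketch of a BiLSTM with scalar inputs and a scalar hidden state, written in pure Python. It follows the standard LSTM gate equations; the parameter layout (`w_h`, `w_x`, `b` per gate), the fixed values in `PARAMS`, and the combination weights `w_t` and `w_v` are all hypothetical choices for illustration, not the configuration used in the paper.

```python
from math import exp, tanh

def sigmoid(v):
    return 1.0 / (1.0 + exp(-v))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step; p maps gate name -> (w_h, w_x, b)."""
    def gate(name, activation):
        w_h, w_x, b = p[name]
        return activation(w_h * h_prev + w_x * x + b)

    f = gate("f", sigmoid)        # forget gate
    i = gate("i", sigmoid)        # input gate
    c_tilde = gate("c", tanh)     # candidate unit state
    o = gate("o", sigmoid)        # output gate
    c = f * c_prev + i * c_tilde  # new unit state
    h = o * tanh(c)               # new hidden state
    return h, c

def run_lstm(seq, p):
    """Hidden state at every time step of a unidirectional LSTM."""
    h, c, out = 0.0, 0.0, []
    for x in seq:
        h, c = lstm_step(x, h, c, p)
        out.append(h)
    return out

def run_bilstm(seq, p_fwd, p_bwd, w_t=0.5, w_v=0.5):
    """Weighted sum of forward and backward hidden states per time step."""
    fwd = run_lstm(seq, p_fwd)
    bwd = run_lstm(seq[::-1], p_bwd)[::-1]  # read the sequence in reverse
    return [w_t * a + w_v * b for a, b in zip(fwd, bwd)]

# Hypothetical fixed parameters; in practice these are learned.
PARAMS = {name: (0.5, 0.5, 0.0) for name in ("f", "i", "c", "o")}
```

With this sketch, changing the last element of a sequence leaves the first forward hidden state untouched but changes the first BiLSTM output, which is the effect of using subsequent information.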

Network Intrusion Detection Method Based on FCWGAN and BiLSTM
We propose a network intrusion detection method based on FCWGAN and BiLSTM. XGBoost is used in the feature selection stage to rank the importance of the features in the dataset, and their relevance is analyzed based on the Spearman correlation coefficient. Features with strong relevance and low importance are filtered out to simplify the feature structure. The selected features are passed into CWGAN together with the labels, and minority-class samples in the training set are generated in a controlled manner. Generated samples are passed into BiLSTM together with the original data in the training set for training, and the model is validated on a test set. The intrusion detection process includes the stages of data preprocessing, feature selection, sample generation, feature extraction and training, and testing, as shown in Figure 4.

Data Preprocessing.
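Label encoding was used to convert the string-type features in the NSL-KDD and UNSW-NB15 datasets to numeric type. The datasets were checked for null values, and when none were found, the data were normalized by min-max scaling,
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}, \tag{7}$$
where $x_{min}$ and $x_{max}$ are the minimum and maximum values of the feature.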
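The preprocessing steps can be sketched as follows in pure Python. The helper names `label_encode`, `has_null`, and `min_max` are hypothetical, the integer codes follow the sorted order of the distinct strings, and mapping a constant column to 0 is an assumption made here to avoid division by zero.

```python
def label_encode(column):
    """Map each distinct string to an integer code (sorted order)."""
    mapping = {value: code for code, value in enumerate(sorted(set(column)))}
    return [mapping[value] for value in column], mapping

def has_null(column):
    """True if any entry of the column is missing (None)."""
    return any(value is None for value in column)

def min_max(column):
    """Scale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # assumption: constant column maps to 0
    return [(value - lo) / (hi - lo) for value in column]

# Example with a protocol-type column such as the one in NSL-KDD.
codes, mapping = label_encode(["tcp", "udp", "icmp", "tcp"])
scaled = min_max(codes) if not has_null(codes) else None
```

In this example `codes` is `[1, 2, 0, 1]` and `scaled` is `[0.5, 1.0, 0.0, 0.5]`.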

Feature Selection.
In the feature selection stage, we used XGBoost to rank the feature importance, and Spearman's correlation coefficient to analyze the feature relevance. Irrelevant and redundant features were filtered out, and important features were retained to improve detection speed and enhance detection results.
XGBoost obtains a new function by fitting the residuals of the last prediction of the model and iterates to improve model performance [29]. Unlike the traditional GBDT algorithm that uses only first-order derivative information, the XGBoost algorithm performs a second-order Taylor expansion on the loss function and adds a regularization term to improve the model training speed and prevent overfitting. The target loss function of the XGBoost algorithm is
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2, \tag{8}$$
where $l(y_i, \hat{y}_i)$ is the loss function, which represents the difference between the predicted value $\hat{y}_i$ and true value $y_i$; and $\Omega(f_k)$ aims to prevent overfitting, where $T$ is the number of leaf nodes, $\omega$ denotes the leaf weights, $\gamma T$ reduces the number of leaf nodes in the tree, $\gamma$ is the penalty coefficient, $\lambda \lVert \omega \rVert^2$ is the regularization term, and $\lambda$ is the regularization coefficient. XGBoost requires several iterations to continuously generate trees [30]. Assuming that the $t$-th iteration produces the tree $f_t$, the objective function of the $t$-th iteration is
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t), \tag{9}$$
where $\Omega(f_t)$ is a function to prevent overfitting. We can evaluate the reasonableness of the decision tree structure based on the structure loss,
$$Obj = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T, \tag{10}$$
where $g_i$ and $h_i$ are the first- and second-order derivatives of the loss function with respect to the predicted values after iteration $t-1$, $I_j = \{i \mid p(x_i) = j\}$ is the index set of leaf node $j$, and a smaller structure loss indicates a better decision tree structure. If the tree splits at node $j$ into a left child with index set $I_L$ and a right child with index set $I_R$, the structure gain of the leaf node is
$$Gain = \frac{1}{2}\left[\frac{\left(\sum_{i \in I_L} g_i\right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda}\right] - \gamma, \tag{11}$$
where $\gamma$ is the split coefficient, which can reduce the complexity of the model and prevent overfitting. This split gain is used to judge the quality of the split node. The importance of the features is ranked according to formula (11), and the Spearman correlation coefficient is used to analyze the feature correlation. The two are combined to eliminate irrelevant and redundant features, retain the key features, and pass them to the GAN for minority-class sample generation.
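The structure loss and split gain discussed in this section follow the standard XGBoost formulation and can be checked numerically with a short sketch. The function names below are hypothetical; `g_sum` and `h_sum` stand for the sums of first- and second-order derivatives over the samples in a node, `lam` for the regularization coefficient, and `gamma` for the per-leaf penalty.

```python
def leaf_score(g_sum, h_sum, lam):
    """Squared gradient sum over regularized Hessian sum for one node."""
    return g_sum ** 2 / (h_sum + lam)

def structure_loss(leaves, lam, gamma):
    """Structure loss of a tree given (g_sum, h_sum) for each leaf."""
    total = sum(leaf_score(g, h, lam) for g, h in leaves)
    return -0.5 * total + gamma * len(leaves)

def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Reduction in structure loss obtained by splitting one leaf in two."""
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    return 0.5 * (children - parent) - gamma
```

The gain equals the structure loss of the unsplit leaf minus the structure loss of the two children, so a positive gain means the split improves the tree. Summing these gains over all splits that use a feature gives a total-gain importance score of the kind used to rank features.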

Sample Generation.
In the sample generation process, CWGAN was trained using noise and data samples that underwent feature selection and preprocessing [31], as shown in Table 1.
In the process of training CWGAN, the generator and discriminator were trained in turn, as follows:
(1) The discriminator is fixed, and the generator is trained to simulate the distribution of the real data.
(2) The generator is fixed, and the discriminator is trained until it cannot distinguish whether samples are from the real dataset or the generator.
(3) The discriminator is fixed, and the generator is trained until the discriminator cannot distinguish samples by successive training.
(4) Steps 1-3 are repeated until the loss value of the discriminator reaches 0.5.
(5) The generator is used to generate attack samples, and these are added to the training set to complete sample generation.

Feature Extraction and Training.
In the feature extraction stage, a BiLSTM layer learned the long-term temporal features in the dataset, Nadam optimization was applied to the neural network [32], a dropout layer alleviated overfitting, and a softmax classifier was used for network attack classification.

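3.5. Testing.
The trained model was used to classify the test set to obtain the prediction type. To ensure credible test results, the model was tested by k-fold cross-validation. The softmax function,
$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \tag{12}$$
was used to calculate the probability of each class, which was compared with the original labels. A Bayesian optimization algorithm was used for automatic optimization of model parameters, whose settings are shown in Table 2. The categorical cross-entropy loss function is
$$Loss = -\sum_{i=1}^{C} y_i \log \hat{y}_i, \tag{13}$$
where $C$ is the number of classes, $y_i$ is the one-hot true label, and $\hat{y}_i$ is the predicted probability of class $i$.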
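The softmax function and the categorical cross-entropy loss mentioned in this section can be sketched in pure Python as follows. Subtracting the maximum logit before exponentiation and clamping probabilities with `eps` are standard numerical safeguards added here as assumptions; they are not described in the paper.

```python
from math import exp, log

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * log(max(p, eps)) for t, p in zip(one_hot, probs))

# Example: three-class prediction where the true class is index 2.
probs = softmax([1.0, 2.0, 3.0])
predicted_class = probs.index(max(probs))
loss = categorical_cross_entropy([0, 0, 1], probs)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is why minimizing it pushes the softmax output toward the one-hot label.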

Dataset and Experimental Evaluation Criteria.
The proposed model was evaluated on the NSL-KDD and UNSW-NB15 datasets. The NSL-KDD dataset was obtained by Tavallaee et al. in 2009 by eliminating duplicate instances in the KDD99 dataset, which enables a more objective reflection of the detection accuracy of the model [33]. It includes DoS, Probe, R2L, and U2R attack types, and has 41 attributes, but the data are extremely unbalanced. It has far fewer attack instances than normal instances, with only 995 R2L attacks and 52 U2R attacks. The UNSW-NB15 dataset was created by the Cyber Range Lab of the Australian Centre for Cyber Security, and includes attack types other than those in NSL-KDD, i.e., Fuzzers, Analysis, Backdoor, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. Similarly, there are far fewer attack instances than normal instances. The distribution of training set types for the NSL-KDD dataset is shown in Figure 5.
The distribution of training set types for the UNSW-NB15 dataset is shown in Figure 6.
Comparative experiments used classification accuracy, precision, recall, and F1-score to judge the classification effectiveness of the models. The classification confusion matrix is shown in Table 3.

Table 1 (CWGAN training algorithm, partially recovered):
... = (z, y), where z is noise data and y is the class label
Output: s_G = [G(z, y′), y′]
(1) While D does not approach 0.5 /* CWGAN training */
(2) for t = 1, ..., n do /* optimize discriminator */
(3) Sampling ...
where θ_G, η_θG, θ_D, and η_θD respectively denote the network parameters and gradients of the generator and discriminator.

The four evaluation criteria are as follows:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}, \qquad Precision = \frac{TP}{TP + FP},$$
$$Recall = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times Precision \times Recall}{Precision + Recall},$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives in the confusion matrix.

FCWGAN was used to select the features of the training set samples of the NSL-KDD and UNSW-NB15 datasets, filter out redundant and useless features, and simplify the data structure. The feature importance was judged using XGBoost, and the feature importance scores obtained are shown in Figures 7 and 8. The importance score in Figures 7 and 8 is the total splitting gain, which better reflects the importance of variables to the model. From Figure 7, one can see that among the features of the NSL-KDD dataset, "dst_host_srv_count" is the most important and "su_attempted" the least important. Similarly, Figure 8 shows that among the features of the UNSW-NB15 dataset, "dur" is the most important and "ct_ftp_cmd" the least important. In both datasets, the importance of different features varies greatly, and the importance of some features is close to 0, so they have little influence on the discrimination of sample types. Therefore, these features with low importance can be screened out to simplify the feature structure. The feature correlations were analyzed using the Spearman correlation coefficient; the correlation between some pairs of features is strong, and the redundant features can be filtered out (Figures 9 and 10).
We combined the feature importance and correlation for analysis, and the filtered features are shown in Table 4.
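One way to combine importance and correlation is sketched below: features whose importance does not exceed a threshold are dropped first, and then, for each strongly correlated pair, the less important feature is dropped. This is an illustrative rule only; the function name `select_features`, the thresholds `low_importance` and `high_corr`, and the example scores are hypothetical, and the paper does not state the exact rule or cut-off values it used.

```python
def select_features(importance, corr, low_importance=0.0, high_corr=0.9):
    """Return the retained feature names, most important first.

    importance: dict mapping feature name -> importance score.
    corr: dict mapping (feature_a, feature_b) -> correlation coefficient.
    """
    # Step 1: drop features whose importance is not above the threshold.
    kept = {name for name, score in importance.items() if score > low_importance}
    # Step 2: for strongly correlated pairs, drop the less important feature.
    pairs = sorted(corr.items(), key=lambda item: -abs(item[1]))
    for (a, b), rho in pairs:
        if a in kept and b in kept and abs(rho) >= high_corr:
            kept.discard(a if importance[a] < importance[b] else b)
    return sorted(kept, key=lambda name: -importance[name])

# Hypothetical scores for four features.
importance = {"a": 0.5, "b": 0.3, "c": 0.0, "d": 0.2}
corr = {("a", "b"): 0.95, ("a", "d"): 0.1, ("b", "d"): -0.2}
selected = select_features(importance, corr)
```

Here `c` is dropped for having zero importance and `b` is dropped because it is strongly correlated with the more important `a`, leaving `["a", "d"]`.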
Training set samples were then generated based on the selected features. We expanded the training set samples and combined the generated and original samples. e data distribution of the combined training set is shown in Tables 5 and 6.
Finally, the training set was passed into the BiLSTM network for training, and the test data were passed into the trained model to evaluate the model detection effect. The trends of model detection accuracy and average loss with the number of iterations are shown in Figures 11 and 12. The trends of the detection rates for the individual classes with the number of iterations are shown in Figures 13 and 14.
From Figures 11 and 12, one can see that the accuracy of the model increases rapidly with the number of iterations at the early stage of training, and gradually stabilizes; the average loss decreases rapidly with the number of iterations, and can reach a stable state quickly. Using the proposed model to perform multiclassification on the NSL-KDD and UNSW-NB15 datasets, the best accuracy rates are 99.57% and 85.59%, respectively.
This shows that the model can distinguish types of network intrusion attacks well, thus obtaining high detection accuracy and a better detection effect.
From Figures 13 and 14, it can be seen that the proposed model can accurately identify normal and majority class attacks on both datasets, and the detection rate for minority class attacks can also reach a high standard, showing that the minority class samples generated by the model largely alleviate the impact of the class imbalance problem, thus improving the overall detection effect.

Model Noise Robustness Experiment.
In recent years, the network environment has become increasingly complex. In addition to a large amount of redundant and useless data, network data also contain noise, which reduces the robustness of intrusion detection systems [34]. To verify the robustness of the proposed model to noise, this section sets up a noise robustness experiment for the network intrusion detection method based on FCWGAN and BiLSTM.
Gaussian white noise at different levels, obeying N(0, 0.02), N(0, 0.04), N(0, 0.06), and N(0, 0.08), respectively, was added to the NSL-KDD and UNSW-NB15 datasets. The detection accuracy of the model under the different noise levels is shown in Table 7.
Table 7 shows that the accuracy on both datasets decreases to a certain extent as the noise level increases. However, the range of change did not exceed 1.5%. This shows that the proposed model is robust and stable under noise interference, and that a small amount of noisy data does not significantly affect its performance. In addition, following the conclusion of Section 3.3.1, different levels of Gaussian white noise were added to several features with different correlation strengths. The results show that adding noise to features with stronger correlation has a more pronounced impact on model performance, while adding noise to features with weaker correlation has little impact. This indicates that, when dealing with noise, it is not necessary to process all features, but only the noise-sensitive ones, which also confirms the necessity of feature selection.
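The noise injection described in this section can be sketched as follows. This is only an illustration under stated assumptions: the second parameter of N(0, x) is treated here as the standard deviation (the text does not say whether it is a variance or a standard deviation), features are assumed to be already normalized, and the function name, the `columns` argument, and the toy rows are hypothetical.

```python
import random


def add_gaussian_noise(rows, sigma, columns=None, seed=0):
    """Return a copy of `rows` with zero-mean Gaussian noise added.

    sigma   : noise level, assumed to be the standard deviation.
    columns : feature indices to perturb; None perturbs every feature.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    targets = set(range(n_features)) if columns is None else set(columns)
    return [
        [v + rng.gauss(0.0, sigma) if j in targets else v for j, v in enumerate(row)]
        for row in rows
    ]


# Toy normalized samples (hypothetical values).
clean = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.8]]

# Perturb all features at each of the four noise levels.
noisy_by_level = {s: add_gaussian_noise(clean, s) for s in (0.02, 0.04, 0.06, 0.08)}

# Perturb only one feature, as in the per-feature sensitivity check.
noisy_single = add_gaussian_noise(clean, 0.08, columns=[1])
```

Restricting the noise to selected columns allows the sensitivity of individual features to be compared while the remaining features stay unchanged.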

Model Ablation Experiment.
We set up model ablation experiments to verify the proposed feature selection and the ability of CWGAN to improve the detection effect of the model for minority samples.
Under the same experimental conditions, BiLSTM, GAN-BiLSTM, CWGAN-BiLSTM, and the model in this paper were compared on the NSL-KDD dataset.
The detection rates of each model for the various classes of the NSL-KDD dataset were evaluated and are displayed in Table 8.
Table 8 shows that feature selection and the proposed CWGAN played a relatively significant role in improving the detection rate for minority class samples. The reason is that real-world data contain many irrelevant, redundant, and noisy features, whose removal through feature selection can greatly reduce storage and computational costs, simplify the data structure, and improve the detection results. The proposed feature selection method was used to directly select a subset of relevant features for the model, eliminate useless and redundant features, and improve the test effectiveness from the original
We performed comparison experiments to verify the superiority of the FCWGAN data enhancement algorithm for network intrusion detection. Under the same experimental conditions, ROS, ADASYN, SMOTE, WGAN, and the proposed FCWGAN method were used for data enhancement on the NSL-KDD and UNSW-NB15 datasets, respectively, with BiLSTM as the classifier. The test results are shown in Tables 9 and 10.
From Tables 9 and 10, it can be seen that the proposed FCWGAN-BiLSTM achieved the best test results in terms of accuracy, precision, recall, and F1-score. Overall, FCWGAN was better for data enhancement. The time in the tables is the training time of a single epoch. The training time of the proposed model is lower than that of the other methods, indicating that it has the fastest calculation speed and the lowest computational cost.
This is because ROS only performs a simple resampling of the original data, while ADASYN and SMOTE randomly synthesize data from the original samples based on the k-nearest neighbor principle; none of these methods learns the nature of the original data. In contrast, FCWGAN, which is based on deep learning, can learn the potential distribution of the original data, randomly connect the data points with class labels, and pass them to the generator to generate new minority samples. Compared with WGAN, FCWGAN adds feature selection and simplifies the data structure, which reduces the computational cost and speeds up calculation. At the same time, a gradient penalty term solves the vanishing gradient problem during training, so that FCWGAN can generate minority class samples that are of higher quality and more similar to the original samples.
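To make the contrast with the baseline methods concrete, the following sketch shows, under simplifying assumptions, what ROS and a SMOTE-style method do to a minority class. It is not the implementation used in the experiments: the function names, the toy minority samples, and the choice of squared Euclidean distance are illustrative, and ADASYN's density-based weighting is omitted.

```python
import random


def random_oversample(minority, n_new, rng):
    """ROS: draw existing minority rows with replacement (exact copies)."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote_like(minority, n_new, k, rng):
    """SMOTE-style: interpolate between a row and one of its k nearest neighbours."""

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: dist2(base, row))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy minority class with two features (hypothetical values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rng = random.Random(0)

copies = random_oversample(minority, 5, rng)
interpolated = smote_like(minority, 5, k=2, rng=rng)
```

Both baselines only reuse or linearly mix existing rows, so every new sample stays on or between the original points; neither models the underlying distribution, which is the limitation that a generative approach is intended to address.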

Comparative Experiments with Different Classification Algorithms.
We performed comparison experiments to verify that BiLSTM could achieve better results for the classification of network intrusions.
Under the same experimental conditions, the dataset was processed using FCWGAN and then used to train RF, DNN, LSTM, and BiLSTM classifiers. The detection results of the different algorithms for network intrusion behavior are shown in Tables 11 and 12.
From Tables 11 and 12, it can be seen that the proposed FCWGAN-BiLSTM achieved the best results in terms of accuracy, precision, recall, and F1-score. Moreover, BiLSTM has advantages for the network intrusion detection problem. The reason is that network traffic data have obvious time-series characteristics, while LSTM and BiLSTM have strong time-series processing capability and can perform deeper feature extraction on long-term time-series data. Therefore, this type of method can achieve good results in network intrusion detection. LSTM can only read sequence data in one direction and cannot take the influence of subsequent information on the detection results into account. Thus, BiLSTM is used to process the incoming data to improve the
Figure 9: Feature correlation diagram of NSL-KDD.
Figure 10: Feature correlation diagram of UNSW-NB15.
From Tables 13 and 14, it can be seen that the proposed model achieved the best detection results on all metrics. Compared with CNN-BiLSTM and SSAE-LSTM, the proposed model uses FCWGAN to simplify the data features and reduce dataset dimensionality, which reduces the computational cost, while generating minority class samples to supplement the dataset, which alleviates the impact of class imbalance; it therefore obtains better detection results. Compared with CWGAN-DNN and AE-CGAN-RF, the proposed model avoids the curse of dimensionality and simplifies the data structure, while using BiLSTM for feature extraction and classification, which can extract more in-depth and comprehensive data features at the time-series level; it therefore obtains better multiclassification results.
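The difference between reading a sequence in one direction and in both directions, as discussed for LSTM and BiLSTM in this section, can be illustrated with a toy sketch. The recurrence below is a simple decaying average used as a stand-in for an LSTM cell; it is an assumption made for illustration and is not the network used in the experiments.

```python
def run_one_direction(sequence, decay=0.5):
    """Toy recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t (stand-in for an LSTM cell)."""
    state, states = 0.0, []
    for x in sequence:
        state = decay * state + (1 - decay) * x
        states.append(state)
    return states


def run_bidirectional(sequence, decay=0.5):
    """Pair each step's forward state with the state of a pass over the reversed sequence."""
    forward = run_one_direction(sequence, decay)
    backward = run_one_direction(sequence[::-1], decay)[::-1]
    return list(zip(forward, backward))


# Two toy sequences that share the first two steps and differ only at the last step.
seq_a = [1.0, 0.0, 0.0]
seq_b = [1.0, 0.0, 4.0]

first_a = run_bidirectional(seq_a)[0]
first_b = run_bidirectional(seq_b)[0]
print(first_a, first_b)
```

At the first step the forward states of the two sequences are identical, because a one-direction pass has not yet seen the later input, whereas the backward states differ. A bidirectional model therefore has access to subsequent information at every step.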

Conclusion
To alleviate the impact of class imbalance on the accuracy of network intrusion detection models and improve their effectiveness at detecting network intrusion attacks, we proposed a network intrusion detection method based on FCWGAN and BiLSTM. The method uses XGBoost and Spearman correlation coefficients to process the dataset, which effectively filters out redundant and useless features and simplifies the data structure, thereby reducing the computational cost and training time and avoiding the curse of dimensionality. Minority class samples are generated using CWGAN to supplement the dataset and alleviate class imbalance. BiLSTM is used to extract the time-series features of the data and complete the classification of network intrusions. Extensive experiments on the NSL-KDD and UNSW-NB15 datasets demonstrated that the model greatly improves the detection effect for minority class samples, has a strong feature extraction capability, high detection accuracy, and a low false-positive rate when processing large-scale network data, and shows promise for real-time intrusion detection systems. However, the accuracy of the model on the UNSW-NB15 dataset shows that there is room for improvement. Future work will focus on this deficiency, and we will investigate the construction of feature extraction and classification models to find ways to improve detection accuracy.

Data Availability
All data used in this paper can be obtained by contacting the authors of this study.

Ethical Approval
This article does not contain any studies with human or animal subjects performed by any of the authors.

Consent
Informed consent was obtained from all individual participants included in the study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.