Network Intrusion Detection Technology Based on Convolutional Neural Network and BiGRU

To address the low accuracy and high false-alarm rate of existing intrusion detection models when classifying multiple types of intrusion behavior, a network intrusion detection model incorporating a convolutional neural network and a bidirectional gated recurrent unit (BiGRU) is proposed. To handle the high feature dimensionality and the imbalance between positive and negative samples in the original traffic data, the data are resampled with a hybrid sampling algorithm combining ADASYN and RENN, and features are selected by combining the random forest algorithm with Pearson correlation analysis. Spatial features are then extracted by the convolutional neural network and further refined by fusing average pooling and max pooling, after which BiGRU extracts long-distance dependency features, so that feature learning is comprehensive and effective. Finally, the Softmax function is used for classification. The proposed model is evaluated on the UNSW_NB15, NSL-KDD, and CIC-IDS2017 data sets, achieving accuracies of 85.55%, 99.81%, and 99.70%, which are 1.25%, 0.59%, and 0.27% higher, respectively, than those of the comparable CNN-GRU model.


Introduction
Network intrusion detection is a security mechanism, developed in recent years, that dynamically monitors, prevents, and defends against system intrusions. It determines whether a network system is under attack or in violation of its security policy by analyzing information collected from several nodes of the network. Research on intrusion detection technologies, both domestically and abroad, began in the 1980s, and intrusion detection has since become an integral part of the network security architecture [1].
Traditional machine learning methods have been widely used in network intrusion detection systems, such as Bayesian methods [2][3][4], support vector machines [5][6][7][8][9][10], decision trees [11][12][13], and logistic regression [14][15][16]. They have all achieved good results. However, these methods are not suited to massive, high-dimensional data, and they remain sensitive to outliers and noise, which degrades classification performance. At the same time, with the continuous development of digital technology, network attack methods are becoming increasingly diverse, and traditional machine learning methods struggle to meet users' needs.
In recent years, deep learning techniques have been widely used in natural language processing [17], image recognition [18], and other fields. Deep learning combines low-level features into more abstract, nonlinear high-level representations and then mines the input-output relationships in the data; it has also achieved good results in intrusion detection. Deep learning techniques commonly used in intrusion detection include the convolutional neural network (CNN), the recurrent neural network (RNN), and the deep belief network. One study converts traffic data byte by byte into pixels to obtain images of the traffic, feeds the images into a convolutional neural network for convolution, pooling, and other operations, and finally obtains the classification results; the method achieves high accuracy in both binary and multiclass classification [19]. Another study conducts experiments on the widely recognized KDD99 data set using a long short-term memory (LSTM) network whose parameters are selected experimentally, and achieves fairly satisfactory results; however, improper selection of the training parameters leads to a high false-alarm rate [20]. A hierarchical intrusion detection system based on spatial and temporal features has also been proposed: it first learns low-level spatial features of network traffic with deep convolutional neural networks and then acquires high-level temporal features with LSTM, but it does not consider feature fusion or data imbalance [21]. Another paper combines WaveNet and the bidirectional gated recurrent unit (BiGRU) for feature extraction and proposes an intrusion detection method that fuses the two; the model achieves better detection accuracy but does not consider sample imbalance [22].
Although intrusion detection techniques have made great progress, the following problems remain. First, feature redundancy: more feature dimensions not only increase the training time of the model but also reduce its detection performance. One study proposes an intrusion detection method based on principal component analysis (PCA) and a recurrent neural network: PCA reduces the dimensionality and noise of the data to find the principal-component feature subset that carries the most information, and the processed data are then used to train a recurrent neural network for classification, achieving high accuracy [23]. Another study proposes an intrusion detection method that combines the advantages of an autoencoder and a residual network: features are extracted by reconstructing the input with an autoencoder, and the designed residual network is then trained on the extracted features.
The experimental results are better in terms of accuracy, true positive rate, and false-alarm rate [24]. Second, the data sets used to evaluate models have imbalanced positive and negative classes. One study applies an improved local adaptive synthetic minority oversampling technique to imbalanced traffic data and then detects abnormal traffic with an RNN, achieving high detection accuracy across different attack types [25].
In response to the above problems, this paper designs an intrusion detection model incorporating CNN and BiGRU. Its main contributions are as follows: (1) To address feature redundancy, this paper proposes a feature selection algorithm (the RFP algorithm), which uses the random forest algorithm to calculate feature importance and combines it with Pearson correlation analysis. (2) To address sample imbalance, this paper proposes a hybrid sampling algorithm (the ADRDB algorithm) that combines adaptive synthetic sampling (ADASYN) [26] and repeated edited nearest neighbors (RENN) [27]; density-based spatial clustering of applications with noise (DBSCAN) [28] is then used to eliminate noise, yielding a balanced data set.
(3) Spatial features are extracted by a split-residual-fuse convolutional neural network (SRFCNN), and features carrying long-distance dependency information are extracted by BiGRU, which accounts for the influence of both preceding and following attribute information, so that the data features are learned comprehensively and effectively.

Related Work
Network security intrusion detection is a broad research area. Models used in intrusion detection include convolutional neural networks, recurrent neural networks, traditional machine learning, and hybrid models. Researchers have used a variety of approaches to address low detection accuracy and the difficulty of detecting minority-class samples. Convolutional neural networks are mainly used in image and video analysis tasks, such as image classification, face recognition, object recognition, and image processing, and in recent years they have also been widely applied to intrusion detection. Recurrent neural networks are mainly used in tasks such as connected handwriting recognition and speech recognition; because they process time-series data effectively, they are also widely used in intrusion detection.
In terms of improving detection accuracy, Tama et al. combined particle swarm optimization, ant colony optimization, and genetic algorithms for feature selection to reduce the feature size of the training data, followed by a two-stage classification method to detect abnormal behavior in the network [29]. Bu and Cho combined a traditional learning classifier system with a convolutional neural network to detect anomalous behavior, and the proposed system has adaptive and learning capabilities [30]. Song et al. applied deep convolutional neural networks to intrusion detection systems, reducing model complexity while also improving detection accuracy [31]. Roy and Cheung proposed an IoT intrusion detection system based on a bidirectional long short-term memory recurrent neural network that achieves better results in detecting attacks [32]. Le et al. first performed feature selection with the SFSDT model and then classified with recurrent neural networks, achieving better results on both the NSL-KDD and ISCX data sets [33]. Hassan et al. proposed an intrusion detection system based on CNN and a weight-dropped long short-term memory network and achieved fairly satisfactory results [34]. Tama and Lim used a parallel architecture combining random forests, gradient boosting machines, and extreme gradient boosting to detect anomalous behavior with better results [35].
In terms of addressing the class imbalance: Louk et al. compared existing sampling methods and found that EasyEnsemble performed better in resolving sample imbalance [36]. Liu et al. divided the data set into hard and easy sets by ENN and reduced the imbalance of the original data set by processing the samples in the hard set through the K-means algorithm [37]. Yan

The Network Intrusion Detection Model Incorporating CNN and BiGRU
Traditional intrusion detection models focus on temporal features and ignore spatial features when detecting attacks. A single convolutional neural network has limited feature extraction ability, which in turn leads to low detection accuracy. The SRFCNN structure extracts the spatial features of traffic data more effectively and avoids gradient explosion as the model is deepened, but it is poor at extracting long-distance dependency information. BiGRU is strong at extracting long-distance dependency information and avoids forgetting during learning, but it has more parameters and takes longer to train. This paper integrates the two models to improve feature learning, extracting features in both the spatial and temporal dimensions and thereby achieving higher classification accuracy. The proposed network intrusion detection model integrating a convolutional neural network and BiGRU consists of three main stages.
First, the preprocessing stage: convert the original traffic data into numerical features and normalize them, balance the data set with the hybrid sampling method, and finally select features with the RFP algorithm.
Second, the training stage: features are extracted from the preprocessed data by the SRFCNN network and BiGRU and finally classified by the Softmax classifier.
Third, the testing stage: the test set is passed to the trained model for classification. The structure of the proposed model is shown in Figure 1.

Data Preprocessing.
In the preprocessing stage, this paper first converts the non-numerical features in the original traffic data into numerical features and normalizes them; second, a hybrid sampling algorithm (the ADRDB algorithm) combining ADASYN and RENN is used for sampling; third, features are selected by the RFP feature selection algorithm; finally, the resulting data are converted into grayscale maps. The process is shown in Figure 2.

Non-Numerical Feature Transformation and Normalization.
Traffic data can be used as model input only after cleaning, labeling, annotation, and preparation. In this paper, the LabelEncoder class in scikit-learn is used to convert the non-numeric features in the raw traffic data into numeric features, so that all data are numeric and the model can learn the data features. After conversion, however, features with large value ranges can dominate the distances between sample points in the feature space, while features with small value ranges have little influence. Data normalization rescales every feature to a common range and reduces the influence of outliers. In this paper, min-max normalization is used to map the feature values to [0, 1], as shown in the following formula:

$$h'_{i,j} = \frac{h_{i,j} - \min_{k} h_{k,j}}{\max_{k} h_{k,j} - \min_{k} h_{k,j}}$$

where $h_{i,j}$ represents the feature value in row $i$ and column $j$ of the data set and $h'_{i,j}$ is its normalized value.
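The two preprocessing steps can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: the hypothetical `label_encode` helper mimics the sorted-class convention of scikit-learn's `LabelEncoder`, `min_max_normalize` applies the formula above column by column, and the toy records are invented for the example.

```python
import numpy as np

def label_encode(column):
    """Map each distinct string to an integer code, assigned in sorted order
    (the same convention as scikit-learn's LabelEncoder)."""
    classes = sorted(set(column))
    index = {c: i for i, c in enumerate(classes)}
    return np.array([index[c] for c in column], dtype=float), classes

def min_max_normalize(H):
    """Scale every column of H to [0, 1]: h' = (h - min) / (max - min)."""
    H = np.asarray(H, dtype=float)
    col_min = H.min(axis=0)
    col_range = H.max(axis=0) - col_min
    # Constant columns have zero range; map them to 0 instead of dividing by zero.
    col_range[col_range == 0] = 1.0
    return (H - col_min) / col_range

# Toy records (hypothetical): a categorical protocol field plus two numeric fields.
protocol = ["tcp", "udp", "tcp", "icmp"]
codes, classes = label_encode(protocol)
numeric = np.array([[0.5, 100.0], [2.0, 300.0], [1.0, 200.0], [0.0, 100.0]])
H_norm = min_max_normalize(np.column_stack([codes, numeric]))
print(classes)   # ['icmp', 'tcp', 'udp']
print(H_norm)
```

Because the codes are normalized together with the numeric columns, a categorical field with many levels does not end up with a larger range than the other features.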
After normalization, the majority-class and minority-class samples are balanced by the proposed hybrid sampling algorithm to obtain a balanced data set. Useful features are then selected by the feature selection algorithm.

Hybrid Sampling Method Combining ADASYN and RENN.
The hybrid sampling method combining ADASYN and RENN proceeds as follows. First, the original data set is divided into majority and minority sample sets. A new majority sample set is obtained by undersampling with the RENN algorithm, and a new minority sample set is obtained by oversampling with the ADASYN algorithm. The two are then merged, and the merged data set is passed through the DBSCAN clustering algorithm to remove noise, yielding the balanced data set. The method is specified in Algorithm 1. Its inputs are the original majority sample set N, the minority sample set P, and the number of samples; its outputs are the balanced majority sample set newN and minority sample set newP.
(1) Calculate the degree of imbalance of the data set. … (4) and (5) to generate a new set of majority samples. … (7) Eliminate the noise in newP and newN to obtain the final newN and newP.
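Because the algorithm listing above is incomplete, the following NumPy sketch shows one plausible reading of ADRDB rather than the authors' implementation. Its assumptions are hypothetical: RENN runs first on the majority set with a majority vote over k = 3 neighbors, ADASYN then oversamples the minority set toward the size of the reduced majority set (β = 1, K = 5), and DBSCAN noise points are identified directly from the core-point definition instead of by a full clustering. The `eps` and `min_pts` values are illustrative only; libraries such as imbalanced-learn also provide `ADASYN` and `RepeatedEditedNearestNeighbours` classes.

```python
import numpy as np

def _self_masked_distances(A, X):
    """Euclidean distances from rows of A to rows of X, where A occupies the
    first len(A) rows of X; each point's distance to itself is set to inf."""
    d = np.linalg.norm(A[:, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(A)), np.arange(len(A))] = np.inf
    return d

def renn(N, P, k=3, max_iter=10):
    """Repeated ENN on the majority set N: drop majority samples whose k nearest
    neighbors are mostly minority samples, until no sample is removed."""
    for _ in range(max_iter):
        X = np.vstack([N, P])
        is_major = np.r_[np.ones(len(N), bool), np.zeros(len(P), bool)]
        nn = np.argsort(_self_masked_distances(N, X), axis=1)[:, :k]
        keep = is_major[nn].mean(axis=1) >= 0.5
        if keep.all():
            break
        N = N[keep]
    return N

def adasyn(P, N, k=5, beta=1.0, rng=None):
    """ADASYN: generate more synthetic minority samples around minority points
    whose neighborhood is dominated by the majority class."""
    rng = np.random.default_rng(0) if rng is None else rng
    G = max(int((len(N) - len(P)) * beta), 0)          # synthetic samples to create
    X = np.vstack([P, N])
    is_major = np.r_[np.zeros(len(P), bool), np.ones(len(N), bool)]
    nn = np.argsort(_self_masked_distances(P, X), axis=1)[:, :k]
    r = is_major[nn].mean(axis=1)                       # r_i = Delta_i / K
    if G == 0 or r.sum() == 0:
        return P
    g = np.rint(r / r.sum() * G).astype(int)           # g_i = r_hat_i * G
    nn_min = np.argsort(_self_masked_distances(P, P), axis=1)[:, :min(k, len(P) - 1)]
    synthetic = [P[i] + rng.random() * (P[j] - P[i])  # x_i + lambda * (x_zi - x_i)
                 for i in range(len(P)) for j in rng.choice(nn_min[i], size=g[i])]
    return np.vstack([P, *synthetic]) if synthetic else P

def dbscan_noise(X, eps, min_pts):
    """DBSCAN noise mask: a point is noise iff it is not a core point and lies
    farther than eps from every core point."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    within = d <= eps
    core = within.sum(axis=1) >= min_pts               # count includes the point itself
    return ~within[:, core].any(axis=1)

def adrdb(N, P, eps=0.5, min_pts=4):
    new_N = renn(N, P)                                  # undersample the majority class
    new_P = adasyn(P, new_N)                            # oversample the minority class
    noise = dbscan_noise(np.vstack([new_N, new_P]), eps, min_pts)
    return new_N[~noise[:len(new_N)]], new_P[~noise[len(new_N):]]

# Toy two-class data (hypothetical): 200 majority and 20 overlapping minority points.
rng = np.random.default_rng(42)
N = rng.normal(0.0, 1.0, size=(200, 2))
P = rng.normal(1.5, 0.7, size=(20, 2))
new_N, new_P = adrdb(N, P, eps=1.0, min_pts=3)
print(len(N), len(P), "->", len(new_N), len(new_P))
```

The brute-force distance matrices keep the sketch short but scale quadratically, so a real run on UNSW_NB15 or CIC-IDS2017 would need a spatial index or a library implementation.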

Feature Selection Algorithm.
To address feature redundancy in the data sets, this paper proposes a new feature selection algorithm. The algorithm first calculates the importance of each sample feature with the random forest algorithm and ranks the features by importance; it then calculates the correlation between features with the Pearson correlation coefficient; finally, it combines the two results to select features. The random forest (RF) algorithm is an ensemble learning algorithm based on decision trees. In feature engineering, RF can identify important features among a large number of sample features: it calculates the contribution of each feature on each tree, averages these contributions, and compares their magnitudes across features [41]. Existing methods usually use the Gini index or the out-of-bag (OOB) error rate as the evaluation metric. The OOB-based procedure is as follows: (1) For each base learner, compute the error on its corresponding out-of-bag data, denoted error_a. (2) Randomly perturb the values of the feature under evaluation in all out-of-bag samples and compute the error again, denoted error_b. (3) Assuming that the forest contains M trees, the importance of feature X is

$$I(X) = \frac{1}{M}\sum_{m=1}^{M}\left(\text{error\_b}_m - \text{error\_a}_m\right)$$

(4) Select the features with higher importance to construct a new data set.
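Steps (1) to (3) can be sketched as follows. The paper does not state which error measure or tree depth it uses, so this NumPy sketch makes two simplifying assumptions: each base learner is a depth-one decision stump instead of a full tree, and the error is the misclassification rate. The data at the end are synthetic, with only feature 0 carrying label information.

```python
import numpy as np

def fit_stump(X, y):
    """Best single-feature threshold split (a depth-1 tree), chosen by training error."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            lab_l = np.bincount(y[left]).argmax()
            lab_r = np.bincount(y[~left]).argmax() if (~left).any() else lab_l
            err = np.mean(np.where(left, lab_l, lab_r) != y)
            if err < best_err:
                best_err, best = err, (j, t, lab_l, lab_r)
    return best

def predict_stump(stump, X):
    j, t, lab_l, lab_r = stump
    return np.where(X[:, j] <= t, lab_l, lab_r)

def oob_permutation_importance(X, y, M=25, seed=0):
    """I(f) = (1/M) * sum over the M trees of (error_b - error_a) on OOB data."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    importance = np.zeros(d)
    for _ in range(M):
        boot = rng.integers(0, n, size=n)              # bootstrap sample for this tree
        oob = np.setdiff1d(np.arange(n), boot)         # rows this tree never saw
        tree = fit_stump(X[boot], y[boot])
        X_oob, y_oob = X[oob], y[oob]
        error_a = np.mean(predict_stump(tree, X_oob) != y_oob)
        for j in range(d):
            X_perm = X_oob.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # perturb feature j only
            error_b = np.mean(predict_stump(tree, X_perm) != y_oob)
            importance[j] += error_b - error_a
    return importance / M

rng = np.random.default_rng(7)
X = rng.random((200, 2))
y = (X[:, 0] > 0.5).astype(int)    # label depends on feature 0; feature 1 is noise
imp = oob_permutation_importance(X, y)
print(imp)
```

Permuting a feature the trees never split on leaves the predictions unchanged, so its importance is exactly zero here, while permuting the informative feature raises the OOB error sharply.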
The Pearson correlation coefficient measures the linear correlation between two variables X and Y and takes values in the range [−1, 1] [42].
The Pearson correlation coefficient between two features is obtained by dividing their covariance by the product of their standard deviations:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

If the Pearson correlation coefficient of two features is close to ±1, the features are highly correlated, and their relationship is well described by a linear equation. If it is close to 0, there is no linear relationship between them. The pseudo-code of the proposed feature selection algorithm is shown in Algorithm 2.
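The coefficient and the pair screening used later in the feature analysis (pairs with |ρ| ≥ 0.9) can be sketched in NumPy as follows. The feature names in the usage lines are borrowed from UNSW_NB15 purely as labels; the data are synthetic.

```python
import numpy as np

def pearson(x, y):
    """rho = cov(X, Y) / (sigma_X * sigma_Y); undefined for constant inputs."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def correlated_pairs(X, names, threshold=0.9):
    """Feature pairs whose |rho| >= threshold, with the rounded coefficient."""
    corr = np.corrcoef(X, rowvar=False)    # columns are features
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return pairs

rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = 3 * a + 0.01 * rng.normal(size=500)    # almost exactly linear in a
c = rng.normal(size=500)                   # independent of a and b
pairs = correlated_pairs(np.column_stack([a, b, c]), ["spkts", "sbytes", "dur"])
print(pairs)
```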
After feature selection, the traffic data are converted into grayscale maps. The grayscale maps for different categories are shown in Figure 3.
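The paper does not describe how a feature vector is laid out as an image. One simple possibility, sketched below purely as a hypothetical layout, is to zero-pad each normalized vector to the next perfect square and scale it to 8-bit gray levels: the 28 selected NSL-KDD or UNSW_NB15 features would then give a 6 × 6 map with 8 padded pixels, and the 52 CIC-IDS2017 features an 8 × 8 map with 12 padded pixels.

```python
import math
import numpy as np

def to_grayscale(row):
    """Zero-pad a feature vector with values in [0, 1] to the next square length
    and scale it to 8-bit gray levels (hypothetical layout)."""
    side = math.ceil(math.sqrt(len(row)))
    padded = np.zeros(side * side)
    padded[:len(row)] = row
    return np.rint(padded * 255).astype(np.uint8).reshape(side, side)

img = to_grayscale(np.ones(28))
print(img.shape)    # (6, 6)
```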

Model Structure.
A main advantage of CNN over traditional classification methods is that it learns the best filters on its own. Popular CNN structures include the residual network (ResNet) [43] and the inception network [44]; the inception network follows a split-transform-merge strategy.
To improve the expressiveness of the CNN and fully learn the diversity of features during classification, this paper proposes a new convolutional neural network based on split-residual-fuse, following the ideas of the residual neural network; its structure is shown in Figure 4. After the data are input, a split-block convolution divides them into different paths, and a different residual transformation is applied to each path. As shown in the figure, each path contains a different number of residual layers, so that the network learns transformations ranging from simple to complex. Finally, the feature maps produced by the residual paths are fused. The residual connections effectively mitigate the gradient explosion problem caused by increasing network depth. Two-dimensional convolution has shown excellent performance in computer vision, so this paper uses 2D convolution to extract the spatial features of the data. The proposed intrusion detection model consists of three main parts that together learn the data features comprehensively and effectively: first, the spatial features of the data are extracted by SRFCNN; second, feature extraction is further enhanced by fusing average pooling and max pooling; third, temporal features are extracted by BiGRU; finally, classification is performed by Softmax.
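The split-residual-fuse idea can be illustrated with a forward-pass sketch in NumPy. The details are assumptions, not taken from the paper: each input channel is routed to its own path, path c applies c + 1 residual units of the form x + ReLU(conv3×3(x)), and the paths are fused by stacking them back along the channel axis (summation or concatenation after further convolutions would be equally plausible).

```python
import numpy as np

def conv3x3_same(x, k):
    """3x3 cross-correlation of a single-channel map with zero 'same' padding."""
    H, W = x.shape
    p = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + H, j:j + W]
    return out

def residual_unit(x, k):
    """y = x + ReLU(conv(x)); the identity shortcut keeps a direct gradient path."""
    return x + np.maximum(conv3x3_same(x, k), 0.0)

def srf_block(x, kernels):
    """Split-residual-fuse sketch for x of shape (C, H, W).

    Split: channel c goes to its own path. Residual: path c applies the
    len(kernels[c]) residual units whose 3x3 kernels are listed in kernels[c].
    Fuse: the path outputs are stacked back along the channel axis.
    """
    paths = []
    for c in range(x.shape[0]):
        h = x[c].astype(float)
        for k in kernels[c]:
            h = residual_unit(h, k)
        paths.append(h)
    return np.stack(paths)

rng = np.random.default_rng(0)
x = rng.random((3, 6, 6))                       # e.g. three 6x6 feature maps
kernels = [[0.1 * rng.normal(size=(3, 3)) for _ in range(c + 1)] for c in range(3)]
y = srf_block(x, kernels)                       # paths of depth 1, 2 and 3
print(y.shape)    # (3, 6, 6)
```

With all kernels set to zero every residual unit reduces to the identity, which is the property that lets such paths be stacked deeply without degrading the signal.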
The specific structure of the model is shown in Figure 5:
(1) The grayscale map obtained after preprocessing is input to the SRFCNN network to extract spatial features, giving the output F.
(2) Spatial information in the feature map F is aggregated by fusing max pooling and average pooling, giving the new feature map F_C.
(3) F_C is passed to the BiGRU unit to extract the dependencies between features, giving the output F_G.
(4) F_G is passed to a fully connected layer with Softmax activation to classify the intrusion behavior.
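Steps (3) and (4) can be sketched as a NumPy forward pass. The sketch assumes that each row of the pooled map F_C is treated as one time step, uses one common GRU formulation (library implementations such as Keras differ in minor details), concatenates the final forward and backward hidden states as F_G, and uses five output classes as for NSL-KDD. All weights are random placeholders, not trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, p):
    """One GRU update with update gate z, reset gate r and candidate state h_tilde."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_tilde

def init_gru(d_in, d_h, rng):
    p = {}
    for g in "zrh":
        p["W" + g] = rng.normal(scale=0.1, size=(d_h, d_in))
        p["U" + g] = rng.normal(scale=0.1, size=(d_h, d_h))
        p["b" + g] = np.zeros(d_h)
    return p

def bigru(seq, fwd, bwd, d_h):
    """Run one GRU left to right and another right to left; concatenate final states."""
    h_f = np.zeros(d_h)
    for x in seq:
        h_f = gru_step(x, h_f, fwd)
    h_b = np.zeros(d_h)
    for x in seq[::-1]:
        h_b = gru_step(x, h_b, bwd)
    return np.concatenate([h_f, h_b])

def softmax(v):
    e = np.exp(v - v.max())    # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
F_C = rng.random((6, 6))       # pooled map: 6 time steps of 6 features (assumed)
d_h, n_classes = 8, 5
fwd, bwd = init_gru(6, d_h, rng), init_gru(6, d_h, rng)
F_G = bigru(F_C, fwd, bwd, d_h)                        # shape (16,)
W_out = rng.normal(scale=0.1, size=(n_classes, 2 * d_h))
probs = softmax(W_out @ F_G)                           # class probabilities
print(probs.argmax(), probs.sum())
```

Reading the sequence in both directions is what lets each output depend on attributes both before and after a given position, which a single forward GRU cannot do.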

ALGORITHM 2: Feature selection algorithm (RFP).
Input: original data set D. Output: processed data set NewD.
Procedure: (1) Choose the corresponding out-of-bag data and calculate the error, error_a. (2) Randomly perturb the out-of-bag samples and calculate the error, error_b. (3) Calculate feature importance. (4) Rank features by importance. (5) Calculate the Pearson correlation coefficients. (6) Select features by combining (4) and (5). (7) Output the processed data set NewD.

Computational Intelligence and Neuroscience

… with 16 GB RAM and an Nvidia GeForce GTX 1050 GPU (4 GB); the SRFCNN and BiGRU models of this paper were implemented with Python's TensorFlow library.

Data Set and Evaluation Criteria.
Over the years, many data sets for intrusion detection have been introduced for research and development, including KDDCup99 [45], UNSW-NB15 [46], NSL-KDD [47], and CIC-IDS2017 [48]. In this paper, the UNSW-NB15, NSL-KDD, and CIC-IDS2017 data sets are used to evaluate the proposed model. The NSL-KDD data set improves on the KDD99 data set by removing redundant and duplicate records from its training and test sets, so that the training and test sets are more reasonably constructed. It contains 41 attribute features and 1 class label, covering 5 categories: Normal, Probe, DoS, R2L, and U2R. The number of samples of each category in the NSL-KDD data set is shown in Table 1.
The UNSW-NB15 data set was generated in 2015 by the Cyber Range Laboratory of the Australian Centre for Cyber Security (ACCS), using the IXIA PerfectStorm tool to simulate realistic network environments. It consists of 47 attribute features and 2 class labels and contains 9 types of attacks: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. This paper uses the provided training and testing partitions directly to evaluate the model. The number of samples of each category in the UNSW_NB15 data set is shown in Table 2.
The CIC-IDS2017 data set was collected by the Canadian Institute for Cybersecurity (CIC) from July 3 to 7, 2017. It contains benign traffic as well as recent common attacks, and it fills a gap in the attack coverage of the UNSW-NB15 data set. The data set contains 78 attribute features and 1 class label, covering 15 attack types. In this paper, anomalous behaviors of a similar nature are merged, and the final data set contains 10 categories: BENIGN, DoS, PortScan, DDoS, Patator, Bot, Web Attack, Infiltration, and Heartbleed. The number of samples of each category in the CIC-IDS2017 data set is shown in Table 3.
The network security intrusion detection model is evaluated with four main metrics: accuracy, precision, recall, and F1-score. In the detection results, T (true) and F (false) indicate whether data are classified correctly or incorrectly, and P (positive) and N (negative) indicate whether the detection system predicts the data as abnormal or normal. Every sample therefore falls into one of four categories: TP means the system predicts abnormal attack data and is correct; TN means the system predicts normal data and is correct; FP means the system predicts abnormal attack data but is wrong; and FN means the system predicts normal data but is wrong. The model's classification results are summarized by the confusion matrix shown in Table 4.
Accuracy is the ratio of the number of correctly predicted samples to the total number of samples:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision is the ratio of the number of samples correctly predicted as positive to the number of all samples predicted as positive:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall is the ratio of the number of samples correctly predicted as positive to the number of all actually positive samples:

$$\text{Recall} = \frac{TP}{TP + FN}$$

The F1-score is the harmonic mean of precision and recall:

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

F1 therefore takes large values only when both precision and recall are large.
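For the multiclass experiments, the binary formulas are applied per class and then averaged. The NumPy sketch below uses support-weighted averaging, which is an assumption: the paper does not name its averaging scheme, although the identical accuracy and recall values reported in Table 7 are consistent with it, because the support-weighted mean of per-class recall reduces algebraically to accuracy. The confusion matrix is a toy example.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Accuracy plus support-weighted precision, recall and F1.

    cm[i, j] = number of samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)      # TP + FN for each class
    predicted = cm.sum(axis=0)    # TP + FP for each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    w = support / support.sum()   # class weights proportional to support
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": float(w @ precision),
        "recall": float(w @ recall),
        "f1": float(w @ f1),
    }

cm = [[50, 2, 0],
      [3, 40, 5],
      [0, 4, 46]]
m = metrics_from_confusion(cm)
print(m)
```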

Feature Selection Analysis Experiment.
To verify the classification performance of the model proposed in this paper, the public data sets UNSW_NB15, NSL-KDD, and CIC-IDS2017 were selected.
This section describes the UNSW_NB15 data set in detail. To visualize the distribution of each feature, histograms and box plots were drawn; those of some features are shown in Figures 6 and 7. Because some features have no outliers, Figure 7 shows only features with outliers. The analysis shows that six features (spkts, dpkts, ct_src_ltm, ct_srv_dst, ct_srv_src, and ct_state_ttl) have a few outliers, which have little impact on the whole data set, whereas state, dur, sloss, dloss, service, ct_dst_ltm, ct_src_dport_ltm, ct_dst_sport_ltm, tcprtt, synack, and ackdat have more outliers.
In this paper, the importance of each feature in the UNSW_NB15 data set is first calculated by the random forest algorithm and ranked, as shown in Figure 8. It can be seen from the figure … Feature selection based only on feature importance relies on a single criterion, and its results are not very convincing, so this paper combines feature importance with Pearson correlation analysis. To visualize the correlation between features, a feature correlation diagram is drawn, as shown in Figure 9, which clearly shows the correlations among these 42 features; its lightest and darkest cells mark strongly correlated feature pairs. To further check whether features X and Y are correlated in their joint distribution, correlation plots are drawn with feature X on the x-axis and feature Y on the y-axis. Because the number of feature dimensions is large, this paper analyzes only the feature pairs whose correlation coefficient is greater than or equal to 0.9 or less than or equal to −0.9; these pairs are listed in Table 5, and their correlation plots are shown in Figure 10.
Figure 10(a) shows that spkts, sbytes, and sloss are linearly correlated, and Figure 10(b) shows that dpkts, dbytes, and dloss are linearly correlated. Figure 10(c) shows that sinpkt and is_sm_ips_ports are not linearly correlated. Figure 10(d) shows that swin and dwin are not linearly correlated and that most of their values are either 0 or 255. Figure 10(e) shows some similarity between tcprtt and synack and between tcprtt and ackdat: as x increases, y also increases, but the values of synack and ackdat are relatively scattered. Figure 10(f) shows that the values of ct_dst_src_ltm, ct_srv_dst, and ct_srv_src are relatively dispersed but still show some linear relationship. Figure 10(g) shows that the values of ct_dst_ltm, ct_src_dport_ltm, ct_src_ltm, and ct_dst_sport_ltm are relatively scattered and also show some linear relationship. Figure 10(h) shows that is_ftp_login and ct_ftp_cmd are not linearly related.
Combining Figures 8 and 10, features are selected as follows. For features with a strong linear correlation, only the more important feature of each pair is retained. For features inside the analysis interval whose linear correlation is weak, the importance is examined, and the feature is eliminated if its importance is below 0.001. For features whose correlation coefficient lies outside the analysis interval, the importance is also examined, and features with importance below 0.0001 are eliminated. Finally, 28 features remain in the NSL-KDD data set, 28 in the UNSW_NB15 data set, and 52 in the CIC-IDS2017 data set.
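The selection rule above can be written as a short function. In the paper, whether a pair in the |ρ| ≥ 0.9 interval is "strongly linear" was judged from the plots in Figure 10, so this sketch takes those pairs as an explicit input (`strong_pairs`); the feature names, importances, and correlations in the usage lines are invented.

```python
import numpy as np

def rfp_select(names, importance, corr, strong_pairs,
               r_thresh=0.9, t_weak=1e-3, t_out=1e-4):
    """Combine random-forest importance with Pearson correlation.

    names: feature names; importance: importance per feature; corr: Pearson
    correlation matrix; strong_pairs: pairs (a, b) inside the |r| >= r_thresh
    interval that were judged to be strongly linear.
    """
    imp = dict(zip(names, importance))
    in_interval = {names[i] for i in range(len(names)) for j in range(len(names))
                   if i != j and abs(corr[i, j]) >= r_thresh}
    in_strong = {f for pair in strong_pairs for f in pair}
    dropped = set()
    # 1) strongly linear pairs: keep only the more important feature
    for a, b in strong_pairs:
        dropped.add(a if imp[a] < imp[b] else b)
    # 2) inside the interval but only weakly linear: drop if importance < t_weak
    for f in in_interval - in_strong:
        if imp[f] < t_weak:
            dropped.add(f)
    # 3) outside the interval: drop if importance < t_out
    for f in set(names) - in_interval:
        if imp[f] < t_out:
            dropped.add(f)
    return [f for f in names if f not in dropped]

names = ["a", "b", "c", "d", "e", "f"]
importance = [0.05, 0.02, 0.0005, 0.01, 0.00005, 0.002]
corr = np.eye(6)
corr[0, 1] = corr[1, 0] = 0.98     # a, b: strongly linear pair
corr[2, 3] = corr[3, 2] = 0.95     # c, d: in the interval, judged weakly linear
kept = rfp_select(names, importance, corr, strong_pairs=[("a", "b")])
print(kept)    # ['a', 'd', 'f']
```

For a chain of mutually correlated features such as spkts, sbytes, and sloss, listing every pair as strong leaves only the single most important feature of the chain.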

Experiment on the Number of SRFCNN Modules.
To choose the best number of SRFCNN modules, this section compares different module counts: under the same experimental conditions, the grayscale maps obtained after preprocessing are input to SRFCNN with 2, 3, 4, and 5 modules to extract features, and the resulting classification accuracy, precision, and F1-score are shown in Table 6. For the NSL-KDD data set, five modules give the best results, with accuracy, precision, and F1-score of 99.81%, 99.76%, and 99.79%, respectively. For the UNSW_NB15 data set, three modules give the best results, with 85.55%, 86.24%, and 85.61%, respectively. For the CIC-IDS2017 data set, three modules also give the best results, with 99.70%, 99.68%, and 99.69%, respectively. Therefore, this paper uses SRFCNN with five residual modules to extract the spatial features of the NSL-KDD data set and SRFCNN with three residual modules for the UNSW_NB15 and CIC-IDS2017 data sets.

Comparison Experiment between Single Model and Hybrid Model.
To verify the effectiveness of the proposed model for intrusion recognition, this section analyzes the performance of the intrusion detection model combining SRFCNN and BiGRU: SRFCNN alone, BiGRU alone, and the hybrid model are tested on the NSL-KDD, UNSW-NB15, and CIC-IDS2017 data sets under the same experimental conditions, and their classification accuracy, precision, recall, and F1-score are shown in Table 7.
Table 7 shows that, compared with the single SRFCNN and BiGRU models, the hybrid model combining SRFCNN and BiGRU extracts the features of raw traffic data more effectively and thus achieves effective intrusion detection. The detection accuracy, recall, precision, and F1-score reached 99.81%, 99.81%, 99.76%, and 99.79%, respectively, on NSL-KDD; 85.55%, 85.55%, 86.24%, and 85.61% on UNSW_NB15; and 99.70%, 99.70%, 99.68%, and 99.69% on CIC-IDS2017. The reason is that SRFCNN learns spatial features effectively by increasing the depth and width of the network, while BiGRU better extracts the temporal features of the data. By combining the two, the model learns both spatial and temporal features, achieving effective and comprehensive feature learning and thus better results.

Comparison Experiments of Different Feature Selection Methods.
To verify the effectiveness and applicability of the proposed feature selection method, this section compares different feature selection methods: under the same experimental conditions, the proposed RFP method is compared with the existing PCA [23] and AE [24] methods. With each of the three methods, the features are reduced to 28 dimensions for NSL-KDD, 28 for UNSW_NB15, and 52 for CIC-IDS2017. The results are shown in Table 8.
Table 8 shows that the model achieves better results on data processed by the proposed RFP algorithm. PCA relies on variance for dimensionality reduction, but non-principal components with small variance may still carry important information about differences between samples, so the reduction can harm subsequent processing. AE depends heavily on the training data when reconstructing the feature space. Neither method therefore achieves better results. The proposed RFP algorithm starts from the data itself and selects features according to their importance and correlation, which improves the classification accuracy of the model.

Comparison Experiments of Different Sampling Methods.
To address the imbalance of the data sets, this paper processes them with a hybrid sampling method that combines ADASYN and RENN. To verify the effectiveness of this method, this section sets up a comparison experiment of different sampling methods: under the same experimental conditions, the imbalanced data sets are processed with SMOTE, ADASYN, random undersampling, random oversampling, ENN, RENN, and ADRDB, respectively. The detection results are shown in Table 9.
From Table 9, it can be seen that, among the sampling methods compared, the proposed ADRDB, which integrates ADASYN and RENN, handles sample imbalance best. Single oversampling methods such as random oversampling, SMOTE, and ADASYN cannot effectively identify noisy data and easily generate a large amount of noisy data while synthesizing new samples, which degrades the classification performance of the model. Single undersampling methods such as random undersampling, ENN, and RENN tend to lose key information from the majority-class samples, which lowers the classification results. ADRDB samples the majority and minority classes separately and rejects noisy data with the DBSCAN algorithm, which both avoids the loss of key information and reduces the influence of noisy data on the classifier, thus achieving better results.
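The two building blocks of ADRDB can be sketched in NumPy for a binary problem (label 1 = minority). `adasyn` follows the standard ADASYN idea: minority samples whose neighborhoods are dominated by the majority class receive proportionally more synthetic samples, each created by interpolating toward a random minority neighbor. `renn` repeats edited nearest neighbors, removing majority samples whose neighbors mostly carry the other label, until nothing changes. This is a minimal sketch, not the paper's implementation: the function names, the k values, and applying oversampling before cleaning are assumptions, and ADRDB's DBSCAN noise-rejection step is omitted.

```python
import numpy as np

def knn_indices(X, queries, k, exclude_self=False):
    """Indices of the k nearest rows of X (Euclidean) for each query row."""
    d = np.linalg.norm(queries[:, None, :] - X[None, :, :], axis=2)
    if exclude_self:
        np.fill_diagonal(d, np.inf)  # only valid when queries is X itself
    return np.argsort(d, axis=1)[:, :k]

def adasyn(X, y, k=5, beta=1.0, rng=None):
    """ADASYN oversampling for binary labels where 1 is the minority class."""
    rng = np.random.default_rng(rng)
    X_min = X[y == 1]
    n_maj, n_min = int((y == 0).sum()), len(X_min)
    G = int((n_maj - n_min) * beta)  # total number of synthetic samples
    if G <= 0 or n_min < 2:
        return X, y
    # r_i: share of majority samples among each minority sample's k neighbours.
    nn_all = knn_indices(X, X_min, k + 1)[:, 1:]  # drop the sample itself
    r = (y[nn_all] == 0).mean(axis=1)
    if r.sum() == 0:
        return X, y  # no minority sample lies near the majority class
    g = np.rint(r / r.sum() * G).astype(int)  # synthetic samples per minority point
    k_min = min(k, n_min - 1)
    nn_min = knn_indices(X_min, X_min, k_min, exclude_self=True)
    new = []
    for i, gi in enumerate(g):
        for _ in range(gi):
            j = nn_min[i, rng.integers(k_min)]  # random minority neighbour
            lam = rng.random()                  # interpolation factor in [0, 1)
            new.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    if not new:
        return X, y
    X_new = np.vstack([X, np.array(new)])
    y_new = np.concatenate([y, np.ones(len(new), dtype=y.dtype)])
    return X_new, y_new

def renn(X, y, k=3, max_iter=10):
    """Repeated ENN: drop majority samples whose k neighbours mostly disagree with them."""
    for _ in range(max_iter):
        nn = knn_indices(X, X, k, exclude_self=True)
        disagree = (y[nn] != y[:, None]).mean(axis=1) > 0.5
        drop = disagree & (y == 0)  # assumption: only the majority class is cleaned
        if not drop.any():
            break
        X, y = X[~drop], y[~drop]
    return X, y

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), rng.normal(1.5, 0.7, size=(20, 2))])
y = np.array([0] * 200 + [1] * 20)
X_os, y_os = adasyn(X, y, rng=0)
X_bal, y_bal = renn(X_os, y_os)
print("before:", np.bincount(y), "after ADASYN:", np.bincount(y_os), "after RENN:", np.bincount(y_bal))
```

The pairwise distance matrix keeps the sketch short but uses memory quadratic in the number of samples; for data sets the size of CIC-IDS2017, a tree-based neighbor search (or a library implementation such as imbalanced-learn) would be needed.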

Comparison Experiments of Different Pooling Methods.
In this paper, max pooling and average pooling are fused to address the insufficient feature extraction ability of the model. To verify the effectiveness of this method, this section sets up comparison experiments of different pooling methods: under the same experimental conditions, the model extracts features with average pooling, max pooling, and fusion pooling, respectively. The detection accuracy is shown in Table 10.
From Table 10, it can be seen that the fusion pooling method is the most effective. Average pooling learns features by averaging the feature values over its pooling range, while max pooling keeps the maximum value of the feature points in each neighborhood; fusing the two lets each compensate for the other, so that the features are fully learned. The experimental results show that fusion pooling effectively improves the model's ability to learn features and substantially improves the classification results. Figure 11 shows how the classification accuracy and the loss of the intrusion detection model combining SRFCNN and BiGRU vary with the number of iteration steps; it indicates that the proposed model converges well.
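The fusion of average and max pooling described above can be sketched in NumPy on a 1-D feature map of shape (length, channels). How the paper combines the two pooled outputs is not specified in this section; the sketch assumes both use the same window and stride and that their outputs are concatenated along the channel axis (element-wise addition would be an equally plausible choice). The function names `pool1d` and `fusion_pool` are hypothetical.

```python
import numpy as np

def pool1d(x, size, stride, op):
    """Apply op (np.max or np.mean) over sliding windows of x with shape (length, channels)."""
    n_out = (x.shape[0] - size) // stride + 1
    return np.stack([op(x[i * stride : i * stride + size], axis=0) for i in range(n_out)])

def fusion_pool(x, size=2, stride=2):
    """Assumed fusion: concatenate max-pooled and average-pooled maps along the channel axis."""
    max_part = pool1d(x, size, stride, np.max)   # strongest response in each window
    avg_part = pool1d(x, size, stride, np.mean)  # average level of each window
    return np.concatenate([max_part, avg_part], axis=1)

# Feature map from a convolution layer: 6 positions, 2 channels.
x = np.array([[1.0, 0.0],
              [3.0, 2.0],
              [0.0, 4.0],
              [2.0, 0.0],
              [5.0, 1.0],
              [1.0, 1.0]])
print(fusion_pool(x))
# [[3. 2. 2. 1.]
#  [2. 4. 1. 2.]
#  [5. 1. 3. 1.]]
```

The first two output channels come from max pooling and the last two from average pooling, so the next layer sees both the peak and the mean of every window.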

Performance Analysis and Comparison Experiments.
To further verify the effectiveness of the proposed intrusion detection model, this section sets up performance comparison experiments: under the same experimental conditions, common machine learning methods such as random forest, K-means clustering, and decision tree, as well as recently proposed intrusion detection models, are applied to the data sets. The performance comparison is shown in Table 11.
From Table 11, it can be seen that the proposed model achieves better results on all evaluation metrics. Compared with the machine learning algorithms, the model in this paper learns features through neural networks, and by combining SRFCNN and BiGRU it extracts and learns both the spatial and the temporal features of the data. The extracted feature information is therefore more comprehensive, which leads to better results.
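The temporal half of the hybrid model can be sketched as a BiGRU forward pass in NumPy: one GRU reads the feature sequence from left to right, a second reads it from right to left, and their hidden states are concatenated at every step, so that each position carries context from both directions. This is a sketch under stated assumptions, not the paper's network: the weights are random and untrained, the sizes are arbitrary, and the gate equations follow one common GRU formulation (references differ on which side of the blend the update gate multiplies). The function names are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_gru(n_in, n_hidden, rng):
    """Random (untrained) weights for the update (z), reset (r) and candidate (h) transforms."""
    p = {}
    for gate in ("z", "r", "h"):
        p["W" + gate] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        p["U" + gate] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p["b" + gate] = np.zeros(n_hidden)
    return p

def gru_sequence(xs, p):
    """Run a GRU over xs of shape (T, n_in); return all hidden states, shape (T, n_hidden)."""
    h = np.zeros(p["Uz"].shape[0])
    states = []
    for x in xs:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])              # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])              # reset gate
        h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
        h = (1 - z) * h + z * h_tilde                                  # blend old and new state
        states.append(h)
    return np.array(states)

def bigru(xs, p_fwd, p_bwd):
    """Concatenate forward and backward GRU states at every time step."""
    fwd = gru_sequence(xs, p_fwd)
    bwd = gru_sequence(xs[::-1], p_bwd)[::-1]  # re-align backward states with time order
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(0)
T, n_in, n_hidden = 10, 8, 4  # hypothetical: 10 steps of CNN features with 8 channels each
xs = rng.normal(size=(T, n_in))
out = bigru(xs, init_gru(n_in, n_hidden, rng), init_gru(n_in, n_hidden, rng))
print(out.shape)  # (10, 8)
```

In a full CNN-BiGRU model, `xs` would be the pooled convolutional feature map, and the concatenated states would feed a Softmax classification layer.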

Conclusions
To address the incomplete feature extraction and mediocre multiclass classification performance of typical intrusion detection models, this paper proposes an intrusion detection model that fuses a convolutional neural network with a bidirectional gated recurrent unit. The model handles data set imbalance and feature redundancy with the ADRDB and RFP algorithms, respectively, and then achieves comprehensive and sufficient feature learning by fusing SRFCNN and BiGRU. Finally, feature selection analysis experiments, hybrid-versus-single model comparison experiments, feature extraction method comparison experiments, pooling method comparison experiments, and performance analysis experiments on the data sets show that the model has strong feature extraction capability, high detection accuracy, and a low false-alarm rate when processing large-scale, high-dimensional network data, providing some research support for intrusion detection systems.

Data Availability
All data used in this paper can be obtained by contacting the authors of this study.

Conflicts of Interest
The authors declare that there are no conflicts of interest.

Authors' Contributions
Y.S. contributed to resources; X.F. visualized the study; B.C. contributed to validation; B.C. reviewed and edited the paper; and C.L. supervised the study. All authors have read and agreed to the final version of the manuscript.