Application of Deep Neural Network with Frequency Domain Filtering in the Field of Intrusion Detection



Introduction
With the arrival of the information age, the Internet has undergone significant development as an important production tool and has gradually permeated all aspects of the national economy and social functioning. The Internet has long been recognized as one of the most critical infrastructures in every country, highlighting the importance of network security. The primary threat to network security is the intrusion of information systems through the network. The process of identifying and detecting intrusion behavior, whether attempted, ongoing, or completed, is known as intrusion detection [1]. The core concept of intrusion detection is to analyze collected network data to distinguish between normal and intrusive data, and subsequently identify unsafe network behavior. However, with the continuous advancement of network technology, attacker techniques have been improving, making it increasingly difficult to distinguish between abnormal and normal behavioral data, and network attacks are becoming increasingly covert. In the face of the escalating level of network attacks, existing intrusion detection technology is gradually exhibiting shortcomings, including lower accuracy, higher false positive rates, and difficulty in effectively differentiating the characteristic data of normal and abnormal samples.
Commonly used intrusion detection techniques include attack detection techniques based on statistical methods [2], intrusion detection methods based on expert systems [3], and methods based on machine learning and deep learning [4][5][6][7][8]. The main advantage of statistical methods is their ability to "learn" user habits, resulting in high detection rates and usability. However, this "learning" ability also provides intruders with the opportunity to gradually "train" intrusion events to mimic normal statistical patterns, leading to the failure of intrusion detection systems. The effectiveness of expert systems in preventing intrusion behavior relies on the completeness of the knowledge base, which is often impractical to achieve for large network systems. Due to the inherent incompleteness of expert system rules, expert systems alone are no longer suitable for intrusion detection, especially with the continuous development of network intrusion technology. Traditional machine learning methods often require extensive upfront feature engineering work, which relies heavily on expert knowledge. The quality of feature engineering has a significant impact on the effectiveness of the algorithm, making it susceptible to human factors.
The concept of deep learning was first introduced by Professor G. E. Hinton at the University of Toronto in 2006 [9]. The network structure of deep learning consists of a large number of individual components called neurons. Each neuron is connected to other neurons, and the strength of the connections between neurons is determined by weights that can be optimized during the learning process to determine the performance of the neural network. Deep learning algorithms, also known as deep neural networks (DNNs) [10], have gained significant attention in recent years as a new research direction in the field of machine learning. Deep neural networks have made breakthroughs in various applications, including speech recognition and computer vision [11][12][13][14][15]. Network intrusion data differs from speech, text, and image data in that its feature values do not exhibit obvious correlations. Speech and text data are examples of time series data. Recurrent neural networks (RNNs) and long short-term memory (LSTM) models can effectively process this type of data by capturing the interrelationships between feature values over time [16][17][18][19][20]. Image data exhibit a property known as translation invariance, and convolutional neural networks (CNNs) can be highly effective in processing image data by leveraging this property [21][22][23].
There is no fixed, stable, universal a priori knowledge known to humans that governs the relationships between feature data in network intrusion data. This knowledge evolves with the development of Internet technology and is influenced by the skill level and attack techniques of attackers. Since network intrusion data differ from image data and time series data, mainstream neural networks such as CNN, RNN, and LSTM are not well suited to handling network intrusion data. This limitation affects the ability of deep learning algorithms to effectively capture the features of network data. In other words, intrusion detection models based on traditional CNN, RNN, and LSTM have limited feature expression capabilities and struggle to accurately model the complex mapping relationship between network data and attack behaviors. The more complex the mapping relationship a model can capture between network data and attack behaviors, the more promising it is at distinguishing intricate and covert network intrusion behaviors. In essence, a more expressive model is capable of more accurately discerning network intrusion data. To uncover the underlying patterns within the intricate and dynamic network intrusion data, we propose a Fourier Neural Network (FNN) with enhanced data processing capabilities and an expanded data mapping space.
The core of the FNN is the Deep Fourier Neural Network Block (DFNNB), which consists of the Hadamard Neural Network (HNN) and the Fourier Neural Network Layer (FNNL). To apply the FNN to the field of intrusion detection, this paper first designs a Hadamard Neural Network (HNN) that combines the dot product operation of matrices with the Hadamard product operation. Using the HNN, the algorithm can effectively fit the network intrusion data X^t sampled in the multitemporal space, thereby enhancing the ability to represent data features. Once X^t is obtained, the frequency spectrum X^f in the frequency domain space is computed by applying a fast Fourier transform to X^t in the FNNL. To perform effective filtering operations on the data signal, this paper introduces the High-energy Filtering Process (HFP) to filter X^f and obtain the high-energy spectrum X^hf. Subsequently, X^hf is inverse transformed using the inverse fast Fourier transform to obtain the high-energy time domain feature data X^ht. Finally, X^ht is summed, compressed, and input into a fully connected neural network for classification. By stacking multiple layers of DFNNBs, the FNN achieves a more powerful mapping ability, thereby enhancing its performance on complex data.
Overall, we contribute to the intrusion detection field as follows: (1) The Hadamard Neural Network (HNN) is proposed, which can effectively enhance the data dimensions and provides a new method for future applications that require data dimension enhancement. The HNN assigns different weights to the network intrusion data samples to obtain the sample matrix X under different weights. Then, it performs the Hadamard product operation between X and a weight matrix W with the same dimension as X. This process enables the sampling of network intrusion data samples in different time domain spaces X^t. (2) The Fourier Neural Network Layer (FNNL) is proposed to integrate the Fourier transform with the neural network algorithm. The FNNL transforms the feature data of network intrusion data into the frequency domain for processing, and then applies the inverse Fourier transform to convert the processed frequency domain data back to the time domain. This process effectively enhances the feature extraction capability of the neural network algorithm for complex data, thereby improving its ability to handle network intrusion data. (3) The High-energy Filtering Process (HFP) is designed to effectively process frequency domain data. It can automatically filter out weak noise signals, thereby reducing their impact on the final performance of the neural network. (4) To validate the effectiveness of the proposed method, we conduct experimental tests on the FNN using network intrusion datasets including KDD Cup99, NSL-KDD, UNSW-NB15, and CICIDS2017. We evaluate the performance of the FNN using multiple evaluation metrics and compare it with various machine learning and deep learning algorithms.
The paper is organized as follows: Section 2 introduces the related work; Section 3 provides a general introduction to the FNN; Section 4 offers a detailed description of the components of the FNN and analyzes the back propagation of gradient information in the DFNNB; Section 5 elaborates on the steps of intrusion detection using the FNN; Section 6 describes the intrusion detection datasets; Section 7 presents the evaluation criteria for the experiments; Section 8 presents the experimental results and analyzes model performance; finally, Section 9 summarizes the paper and provides an outlook on future research directions for the FNN.

Related Work
In recent years, machine learning (ML) and deep learning (DL) algorithms have emerged as the predominant and efficacious models for numerous data processing applications. Notably, ML algorithms have found widespread use in the realm of intrusion detection. This section presents an overview of ML and DL algorithms employed in intrusion detection, with a specific focus on the KDDCUP99, NSL-KDD, UNSW-NB15, and CICIDS2017 datasets. The common ML algorithms utilized in intrusion detection encompass support vector machine (SVM), logistic regression (LR), decision tree (DT), naive Bayes, random forest (RF), K-nearest neighbors (KNN), and artificial neural network (NN). By optimizing these classical ML algorithms at various levels, the performance of intrusion detection systems can be significantly enhanced.
Jan et al. [24] used an SVM-based IDS on the CICIDS2017 dataset and achieved 98% accuracy. Safaldin et al. [25] proposed an intrusion detection method using an improved binary grey wolf optimizer (GWOSVM-IDS) and obtained 96% accuracy on the NSL-KDD dataset, but the detection time of this method is very long. Ponmalar and Dhanakoti [26] combined an ensemble support vector machine (SVM) with the Chaos Game Optimization (CGO) algorithm. The proposed ESVM algorithm was employed for classification prediction on the UNSW-NB15 dataset, while the CGO algorithm was utilized to fine-tune the parameters of the ESVM, thereby enhancing accuracy and reducing false positives. Boahen et al. [27] proposed a diversity enhancement strategy based on an improved particle swarm optimization (PSO) algorithm and the gravitational search algorithm (GSA), used to optimize a random forest classifier, achieving 98.92% detection accuracy. Ding et al. [28] designed a KNN-based undersampling mechanism and a generative adversarial network model for oversampling attack samples. The model undersamples normal traffic samples and oversamples attack traffic, thus balancing the dataset, but its detection rate is not significantly improved. Yousefnezhad et al.
[29] employed KNN and SVM for multiclass classification and used the Dempster-Shafer method to combine multiple outputs. Gu and Lu [7] applied naive Bayes feature embedding to the original data to obtain new high-quality training data and used a support vector machine to construct an intrusion detection classifier. However, the method is sensitive to data noise, which, if present, may affect the effectiveness of the feature transformation. In the context of intrusion detection, a substantial number of features and data are typically involved. By employing feature selection techniques [30], the number of features can be reduced, thereby diminishing the computational costs associated with training and testing deep learning models. Among the wide array of techniques and algorithms that have been developed, intelligent optimization algorithms [31] have proven successful in identifying the most representative and significant features, consequently reducing the dimensionality of the feature space. This reduction in dimensionality enhances the performance and efficiency of intrusion detection systems. Alazab et al. [32] used the moth flame optimization (MFO) method as a search algorithm and a decision tree (DT) as an evaluation algorithm to generate an effective subset of features for intrusion detection systems. Halim et al. [33] used a modified genetic algorithm (GA) to search for the best features and performed classification experiments on three machine learning models, namely SVM, kNN, and XGBoost. They designed a novel objective function for the GA that assigns fitness values to individuals in the GA population so that the chromosomes representing the best set of features can be selected, but the algorithm runs slowly due to multiple iterations and reproduction operations.
Although many works in the literature prove that intrusion detection methods based on traditional machine learning algorithms are indeed effective, these methods still suffer from the following drawbacks: (1) Traditional machine learning methods usually require human intervention and expertise when performing feature extraction. In intrusion detection, determining an effective feature set is challenging because intrusion behaviors can be dynamic and diverse, making it difficult to capture all intrusion patterns. (2) Traditional machine learning methods may take longer to complete training and prediction. In addition, some complex machine learning algorithms, such as SVM and DT, may be more expensive in terms of computational cost. (3) Intrusion detection data usually have high-dimensional features; traditional machine learning methods may encounter the curse of dimensionality when dealing with high-dimensional data and find it difficult to extract nonlinear feature information, which degrades model performance. Given the limitations of ML algorithms and their variants, as well as the emergence of deep learning, recent advances in DL algorithms have been applied to the field of intrusion detection. These include deep neural networks (DNNs), recurrent neural networks (RNNs), convolutional neural networks (CNNs), and deep belief networks (DBNs). Unlike traditional machine learning methods, deep learning methods can effectively extract the underlying patterns in sample feature data by constructing multilayer nonlinear network structures. Consequently, deep learning exhibits superior capability in learning and predicting high-dimensional feature data compared to traditional machine learning methods.
Thakkar and Lohiya [34] proposed a novel feature selection technique that combines statistical significance based on standard deviation and the difference between mean and median. They also employed deep neural networks (DNNs) to learn and derive patterns in simplified subsets of features. However, it should be noted that the presence of noise in the data may significantly impact the computational results and lead to instability in feature selection. Riyaz and Ganapathy [35] achieved an accuracy of 98.8% on the KDDCUP99 dataset using a CNN. Fu and Zhang [36] introduced a feature fusion technique based on gradient importance enhancement. They employed ResNet-18 as the detection model and incorporated feature fusion at each layer during training. Additionally, they applied feature enhancement at the last layer of the classification network before forwarding the data to the fully connected layer for classification. It is worth mentioning that the training and inference of CNNs typically demand substantial computational resources, particularly for large-scale datasets and complex model structures. This limitation may restrict the application of CNNs in resource-constrained intrusion detection environments. Ravi et al. [37] conducted a detailed study on recurrent deep learning models. They employed a sequential feature fusion technique to combine the functionality of different layers in the network, specifically on the RNN, LSTM, and GRU hidden layer features. Subsequently, the fused features from the recurrent hidden layer were forwarded to an integrated meta-classifier for classification. However, recurrent deep learning models exhibit slower training and testing times compared to CNNs, particularly when processing larger datasets. Moreover, they encounter limitations when addressing the issue of attack class imbalance. Wang et al.
[38] introduced an intrusion detection model based on improved deep belief networks (DBNs), which employs a kernel-based extreme learning machine (KELM) with supervised learning capability as an alternative to the BP algorithm in DBNs. Experimental evaluations were conducted on the KDDCUP99, NSL-KDD, UNSW-NB15, and CICIDS2017 datasets, demonstrating the robustness of the proposed approach.
By applying optimization algorithms to the engineering design problem of intrusion detection systems [39], it is possible to identify globally optimal intrusion detection model structures or parameters that can adapt to various network environments and intrusion behaviors [40]. Kanna and Santhi [41] combined hierarchical multiscale LSTM (HMLSTM) and CNN to effectively extract and learn spatiotemporal features. They also employed a novel metaheuristic method called lion swarm optimization to fine-tune the hyperparameters of the model, thereby enhancing the learning rate of spatial features. In their other proposed deep network model, BWO-CONV-LSTM, Kanna and Santhi [42] utilized the black widow optimization (BWO) algorithm to optimize the hyperparameters and achieve the desired architecture. However, their experiments were limited to binary classification and did not consider the detection of specific attack types. Balasubramaniam et al. [43] proposed the gradient hybrid leader optimization (GHLBO) algorithm to train deep stacked autoencoders (DSAs) for effective DDoS attack detection. Yang et al. [44] introduced a hybrid partitioning strategy into negative selection algorithms (NSAs), which divides the feature space into grids based on the density of sample distributions. This strategy generates specific candidate detectors in the boundary grids to effectively mitigate vulnerabilities caused by boundary diversity. Finally, the NSA is enhanced through self-clustering and a novel grey wolf optimizer, enabling adaptive adjustment of detector radius and position.
Current research has shown that deep learning methods for intrusion detection can compensate for the limitations of shallow machine learning techniques in detecting high-dimensional data and extracting nonlinear feature information. Table 1 provides a chronological summary of the approaches discussed in this section. However, deep learning-based intrusion detection techniques still have the following limitations: (1) they require a large number of parameters to be trained, resulting in high time and space costs for running the models; currently, parallel processing with multiple GPUs is often needed to handle large-scale data; (2) when the model becomes too deep, the long back propagation path during gradient descent can lead to vanishing or exploding gradients; (3) existing deep learning algorithms are primarily designed for problems in other domains, while network traffic exhibits large scale and high dimensionality, and network intrusion traffic is characterized by hidden diversity; as a result, many existing deep learning models are not fully suitable for the field of intrusion detection. With the development of information technology, network intrusion techniques have also advanced significantly. Network intrusion behaviors are becoming increasingly covert, making it more difficult to detect differences between network intrusion data and normal data. This paper fully considers the complex and variable characteristics of existing network intrusion data. Starting from improving the algorithm's ability to analyze data, a Fourier Neural Network (FNN) is designed based on deep learning, with stronger feature extraction and representation capabilities for complex data. The intrusion detection model designed with the FNN as its core is end-to-end and does not require manual feature selection. It can learn features directly from the raw data, thereby improving classification performance and demonstrating stronger generalization ability. Additionally, the high-energy filtering process in the model can be used to handle data noise, reducing the impact of weak noise signals on the neural network's final performance and improving the model's performance and robustness.

International Journal of Intelligent Systems

Fourier Neural Network Model
The detailed structure of the FNN is shown in Figure 2. The FNN can be composed of multiple Deep Fourier Neural Network Blocks (DFNNBs), and the detailed structure of the DFNNB is shown in the upper part of Figure 2. The DFNNB processes the data as follows: (1) before entering the jth DFNNB, the data are transformed by a DNN, which maps the learned feature representation of the (j−1)th DFNNB to a different feature space, achieving a change in data dimension; (2) the one-dimensional data X_{j−1} obtained after processing by the DNN are input into the jth DFNNB; (3) in the jth DFNNB, X_{j−1} is first dimensionally expanded by the HNN, fitting the network intrusion data X^t_j sampled in the multitemporal space; (4) after obtaining X^t_j, X^t_j is split into m one-dimensional tensors X^t_{ij}, where X^t_{ij} denotes the ith time domain spatial sampling of the network intrusion data within the jth DFNNB, and m and M are manually set hyperparameters taking integer values; (5) the fast Fourier transform (FFT) is applied to each X^t_{ij} to obtain m representations X^f_{ij} of the time domain signal in the frequency domain space; (6) each X^f_{ij} is filtered using a high-energy filtering process that removes the noisy signal components, to obtain the high-energy spectrum X^hf_{ij}; (7) using the inverse fast Fourier transform, the time domain representation X^ht_{ij} of X^hf_{ij} is obtained; (8) finally, the X^ht_{ij} are summed and compressed to obtain a feature signal X_j that integrates the spatial samples in each time domain; X_j is a vector with the same number of elements as X_{j−1}.
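To make the eight steps above concrete, the following sketch walks one input vector through a single DFNNB. It is a minimal illustration under stated assumptions: the HNN is reduced to a per-row scalar weight combined with an elementwise weight matrix, and the high-energy filtering is approximated by keeping only the largest-magnitude frequency bins rather than the iterative HSE procedure described later; all function and variable names are hypothetical.

```python
import cmath

def fft(x):
    # Recursive radix-2 FFT; len(x) must be a power of two.
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for z in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * z / n) * odd[z]
        out[z] = even[z] + w
        out[z + n // 2] = even[z] - w
    return out

def ifft(X):
    # Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / N
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]

def dfnnb_forward(x, W_e, W_h, keep=2):
    # Steps (3)-(4): simplified HNN expands x into m time-domain samplings X^t
    m, n = len(W_e), len(x)
    Xt = [[W_e[i] * x[k] * W_h[i][k] for k in range(n)] for i in range(m)]
    # Step (5): FFT of each row -> frequency spectra X^f
    Xf = [fft(row) for row in Xt]
    # Step (6): simplified high-energy filter: keep the `keep` strongest bins
    Xhf = []
    for row in Xf:
        top = sorted(range(n), key=lambda k: -abs(row[k]))[:keep]
        Xhf.append([row[k] if k in top else 0j for k in range(n)])
    # Steps (7)-(8): IFFT back to time domain, then sum the m rows into X_j
    Xht = [ifft(row) for row in Xhf]
    return [sum(Xht[i][k].real for i in range(m)) for k in range(n)]
```

With `keep` equal to the full length (no filtering), the block reduces to a weighted sum of the expanded rows, which is a useful sanity check.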
Similar to traditional DNN, VGG19, and VGG16 architectures, the FNN allows for a deeper exploration of data features by stacking multiple DFNNBs. In the FNN, a DNN is added at the end of the stacked DFNNB structure, enabling the mapping of feature representations learned by the DFNNBs to the sample labeling space, thereby achieving the goal of training the classifier and learning global features of the target.

Hadamard Neural Network.
The top left part of Figure 2 provides a detailed illustration of the Hadamard Neural Network structure. As shown in Figure 2, in the jth DFNNB, the HNN assigns different weights W_{je} to the input data X_{j−1} to obtain the signal expansion matrix X^e_j. Then, a weight matrix W_{jh} with the same shape as X^e_j is used in a Hadamard product operation with X^e_j to fit the network intrusion data X^t_j sampled in the multitemporal space. This process can be represented by (1) and (2) (the weights W_e and W_h are optimized using the back propagation algorithm combined with gradient descent; the specific optimization process is elaborated in the subsequent sections).
In (1), W_{je} is a vector and the number of elements in W_{je} is m, where m is a manually set hyperparameter. From the above, it is clear that m determines how many time domain spaces the fitted network intrusion data are sampled in.
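Equations (1) and (2) did not survive extraction here. Based on the surrounding description (W_{je} an m-element vector expanding X_{j−1} into the matrix X^e_j, followed by an elementwise product with W_{jh}), they plausibly take the following form; this is a reconstruction under assumed notation, not the authors' exact formulas:

```latex
% (1): each element of W_{je} scales the input vector X_{j-1},
%      stacking the m scaled copies into the expansion matrix X^e_j
X^{e}_{j} \;=\; W_{je}\, X_{j-1}^{\top},
  \qquad W_{je}\in\mathbb{R}^{m\times 1},\; X_{j-1}\in\mathbb{R}^{N\times 1}

% (2): Hadamard (elementwise) product with a weight matrix of the same shape
X^{t}_{j} \;=\; W_{jh} \odot X^{e}_{j},
  \qquad W_{jh},\,X^{e}_{j}\in\mathbb{R}^{m\times N}
```

Row i of X^t_j then plays the role of the ith time-domain sampling X^t_{ij}.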

Forward Propagation Process of Information in the Fourier Neural Network Layer.
As can be seen from Figure 2, the forward propagation of information in the FNNL accomplishes four operations: (1) the fast Fourier transform; (2) the high-energy filtering process; (3) the inverse fast Fourier transform (IFFT); and (4) the summation and compression of X^ht_{ij}. These four processes are introduced in detail below; the operation in process (4) is relatively simple and is introduced together with process (3).


Fast Fourier Transform inside FNNL.
The theory and methods of the Fourier transform have a wide range of applications in many disciplines, such as mathematical equations, linear system analysis, and signal processing.
Since computers can only handle discrete sequences of finite length, it is the discrete Fourier transform (DFT) that is actually computed in practice [45].
From the above analysis, it is evident that the FNNL performs a discrete Fourier transform on X^t_{ij}. The discrete sequence X^t_{ij} is composed of N elements, and the operation for obtaining X^f_{ij} is given by (3), where k is an integer. From (3), it can be seen that calculating one X^f_{ij}[k] requires N complex multiplications and N − 1 complex additions; calculating all values of X^f_{ij}[k] requires N^2 complex multiplications and N × (N − 1) complex additions, so the time complexity of the algorithm is O(N^2).
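The O(N^2) cost described above can be seen directly in a naive implementation of the DFT. A minimal sketch (the function name and the use of Python's cmath are our own):

```python
import cmath

def dft(x):
    # Direct evaluation of the DFT: X[k] = sum_n x[n] * W_N^(n*k),
    # with twiddle factor W_N = exp(-2j*pi/N). The nested loops make
    # the N^2 complex-multiplication count explicit.
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (n * k) for n in range(N)) for k in range(N)]
```

For a constant sequence the spectrum concentrates all energy in bin 0, e.g. `dft([1, 1, 1, 1])` is approximately `[4, 0, 0, 0]`.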
In order to reduce the time complexity and the running cost of the FNN, this paper uses the fast Fourier transform (FFT), with a time complexity of O((N/2) log2 N), within the FNNL [46]. The FFT is a fast computational method for the DFT. When using the FFT, N is constrained to N = 2^L with L a positive integer, which can be realized in the FNNL by controlling the number of neurons in the DNN part of the DFNNB.
The core idea of the FFT is to repeatedly divide the sequence X^t_{ij} into two subsequences with an equal number of elements, X^t_{ij-1} and X^t_{ij-2}, according to the odd and even positions of the elements, and then perform the DFT operation on X^t_{ij-1} and X^t_{ij-2}. This process can be expressed by equations (4)-(6), where 0 ≤ z ≤ N/2 − 1 and z is an integer, and the DFT results of X^t_{ij-1} and X^t_{ij-2} are combined as in (5) and (6). Equations (5) and (6) together form the FFT result of X^t_{ij}. The process described by (5) and (6) is referred to as a butterfly operation and is illustrated in Figure 3.
The number of complex multiplications and additions required to compute X^f_{ij} after one division of the sequence X^t_{ij} is N^2/2 + N/2 and N^2/2, respectively, so the workload is approximately halved by one decomposition.
According to the FFT, the elements in the sequence X^t_{ij} are repeatedly divided into two subsequences with an equal number of elements based on the odd and even positions, following the rule shown in (4), until each subsequence contains only one element. Then, the butterfly operation illustrated in Figure 3 is performed on each subsequence until the final result of the FFT is obtained. To illustrate this process clearly, Figure 4 demonstrates the FFT operation with N = 8 as an example.
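The split in (4)-(6) can be checked numerically: one level of the radix-2 decomposition, combining the DFTs of the even- and odd-position subsequences with twiddle factors, reproduces the full DFT. A sketch (names are ours; the helper `dft` is the direct O(N^2) transform):

```python
import cmath

def dft(x):
    # Direct DFT, used here only to verify the butterfly identity.
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (n * k) for n in range(N)) for k in range(N)]

def fft_one_split(x):
    # One level of the radix-2 split behind (4)-(6): DFT the even- and
    # odd-indexed subsequences, then combine them with twiddle factors.
    N = len(x)
    E = dft(x[0::2])   # DFT of X^t_{ij-1} (even positions)
    O = dft(x[1::2])   # DFT of X^t_{ij-2} (odd positions)
    out = [0j] * N
    for z in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * z / N)
        out[z] = E[z] + w * O[z]            # corresponds to (5)
        out[z + N // 2] = E[z] - w * O[z]   # corresponds to (6)
    return out
```

Applying the split recursively until single-element subsequences remain yields the full FFT with its O((N/2) log2 N) multiplication count.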
There is a specific correspondence between the elements in X^t_{ij} and the elements in X^t_{ij-end}, the reordered sequence on which the butterfly operations are performed. The correspondence is as follows: the ath element of X^t_{ij} becomes the bth element of X^t_{ij-end}, where b is obtained by reversing the bit order of the binary encoding of a. Table 2 illustrates the relationship between each element in X^t_{ij-end} and each element in the original sequence X^t_{ij}, using N = 8 as an example. The specific implementation can use Rader's algorithm [47][48][49][50] to obtain the correspondence between the elements of X^t_{ij-end} and X^t_{ij}, and thus obtain the sequence X^t_{ij-end}. Since Rader's algorithm has been described in many works, it is not repeated here.
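The bit-reversal correspondence illustrated in Table 2 can be generated directly. A small sketch (the function name is ours):

```python
def bit_reversed_order(N):
    # For each index a in 0..N-1, reverse its log2(N)-bit binary encoding
    # to obtain b: element a of X^t_ij becomes element b of X^t_ij-end.
    bits = N.bit_length() - 1
    return [int(format(a, '0{}b'.format(bits))[::-1], 2) for a in range(N)]
```

For N = 8 this yields the ordering 0, 4, 2, 6, 1, 5, 3, 7, i.e. X^t_{ij-end} consists of elements x0, x4, x2, x6, x1, x5, x3, x7 of the original sequence.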
After determining the elements in the sequence X^t_{ij-end} using Rader's algorithm, X^f_{ij} can be obtained according to the butterfly operation rules shown in Figures 3 and 4. This process is represented by Algorithm 1.
In summary, in the FNNL, the fast Fourier transform of X^t_{ij}, denoted X^f_{ij}, is obtained by combining Rader's algorithm with butterfly operations. This process can be represented by (7) for simplicity. From Figure 5, it can be observed that before entering the s-th high-energy filtering module, X^f_{ij} needs to be assigned a coefficient vector M^s_{ij}.

High-Energy Filtering Process
These coefficient vectors M^s_{ij} are determined during neural network training using the gradient descent algorithm.
The Hadamard product is performed between X^f_{ij} and M^s_{ij} to obtain the primary energy wave Z^s_{ij}. In the s-th high-energy filtering module, all Z^s_{ij} are combined to form the primary energy wave matrix E^s_j. This process can be represented by (8) and (9).
[Algorithm 1 (residue of the original pseudocode box): following the rule in (4), the elements of X^t_{ij} are recursively divided into two equal-sized groups by odd/even position until each group contains one element; in practice, the reordered sequence X^t_{ij-end} is obtained directly and the layered butterfly operations are then applied.]

International Journal of Intelligent Systems
In the actual implementation, the X^f_{ij} can be vertically concatenated to form a spectrum matrix X^f_j. Similarly, the M^s_{ij} can be vertically concatenated to form a coefficient matrix M^s_j. The Hadamard product is then performed between X^f_j and M^s_j. This process can be represented by (10).
(1) High-Energy Selection Algorithm. In this paper, the element Z^s_{ij} in row i of E^s_j is called the primary energy wave corresponding to X^f_{ij}. Each element in Z^s_{ij} represents the amplitude of the wave at a different frequency. The sum of the amplitudes at each frequency represents the total energy of the wave. A larger amplitude indicates a more energetic wave, which can be considered a primary component. To identify the main components Z^s_{ij} in E^s_j and attenuate the other, nonmain components, this paper proposes a high-energy selecting (HSE) algorithm. The operation steps of the HSE algorithm are as follows:
Step 1: Initialize the importance vector C^s and the summation vector V, with all internal elements of C^s and V set to 1, and initialize the number of iterations n.
Step 2: Calculate the sum of the amplitudes of each energy wave. E^s_j is subjected to a Hadamard product operation with C^s, as expressed in (11). Then, E^s_j ∘ C^s is subjected to a matrix multiplication with V to obtain the energy aggregation matrix E^s_e, as expressed in (12).
Step 3: Each element within E^s_e is processed using softmax to obtain the updated importance vector C^s, as represented by (13).
Step 4: Repeat steps 2 and 3 n times, with the importance vector C^s obtained in the nth iteration serving as the final importance coefficient of each primary energy wave.
Step 5: Replace C^s in (11) with C^s_n, then execute (11) to assign the importance coefficients to each primary energy wave, resulting in the final energy aggregation matrix E^s_{ee}. Then, calculate the columnwise sum of E^s_{ee} to obtain the high-energy spectrum X^hf_{ij}. This process can be represented by (14) and (15).
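Steps 1-5 can be summarized in a compact numerical sketch. This is an interpretation of the described procedure under assumptions (C^s is treated as one coefficient per energy-wave row, V as an all-ones summation vector, and softmax is applied across the row energies); all names are hypothetical:

```python
import math

def hse(E, n_iter=3):
    # High-energy selection sketch: E is an m x N matrix whose rows are
    # primary energy waves. Rows are iteratively weighted by the softmax
    # of their total energy, then collapsed to one high-energy spectrum.
    m, N = len(E), len(E[0])
    C = [1.0] * m                        # importance coefficients (step 1)
    for _ in range(n_iter):              # steps 2-4
        energy = [C[i] * sum(E[i]) for i in range(m)]   # (E ∘ C) · V
        mx = max(energy)
        exp = [math.exp(e - mx) for e in energy]
        s = sum(exp)
        C = [e / s for e in exp]         # softmax update of C, as in (13)
    # step 5: apply the final coefficients and sum the columns
    return [sum(C[i] * E[i][k] for i in range(m)) for k in range(N)]
```

In this toy form, a row with much larger total energy ends up dominating the output spectrum while weak rows are attenuated toward zero, which matches the stated goal of suppressing nonprimary components.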
The pseudocode for the high-energy selection algorithm is shown in Algorithm 2. Through the above analysis, it can be observed that the HSE algorithm determines the importance of each energy wave based on the relative magnitude of the 1-norm of the corresponding vector. It achieves the objective of assigning distinct importance coefficients to different energy waves through repeated iterations. Consequently, the HSE algorithm is capable of effectively identifying the primary components Z^s_{ij} in E^s_j while attenuating the nonprimary components. This process can be succinctly represented by (16). To maintain the invariance of the input eigenvalues' properties, it is necessary to convert the filtered frequency domain data X^hf_{ij} back to time domain data X^ht_{ij}, which can be processed by the subsequent neural network layers. This conversion is achieved by applying the inverse discrete Fourier transform to the high-energy spectrum X^hf_{ij}, as shown in (17).
Equation (17) has the same structure as (3), requiring only that the input be replaced and W_N^nk be substituted with W_N^-nk; apart from this substitution, (3) and (17) are identical. However, directly using (17) to process X^hf_ij is impractical in terms of computational cost, as discussed in Section 4.2.1. To address this, an approach similar to that of Section 4.2.1 is employed here, using the inverse fast Fourier transform (IFFT) to accelerate the computation of (17). Since (17) shares the same structure as (3), the operation of the IFFT aligns with that of the FFT. Initially, X^hf_ij is processed using Rader's algorithm to obtain the sequence X^hf_ij-end, which then undergoes the butterfly operation. Once X^hf_ij-end is obtained, X^ht_ij can be derived by applying the butterfly operation rules to X^hf_ij-end; specifically, this can be achieved by modifying the input of Algorithm 2 to X^hf_ij-end. This entire process can be simplified as shown in (18).
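The structural symmetry between (3) and (17) can be illustrated in numpy: replacing the twiddle factor W_N^nk = exp(-2jπnk/N) with W_N^-nk (plus the conventional 1/N scaling) turns the forward transform into the inverse. This is a sketch of that relationship only, not the paper's Rader/butterfly IFFT implementation, and the function name is an assumption:

```python
import numpy as np

def idft_via_dft_structure(X_hf):
    """Inverse DFT written with the forward-DFT structure of eq. (3).

    Building the matrix of W_N^{-nk} factors and scaling by 1/N yields
    eq. (17), recovering the time-domain data from the spectrum.
    """
    N = len(X_hf)
    n = np.arange(N)
    W_inv = np.exp(2j * np.pi * np.outer(n, n) / N)  # matrix of W_N^{-nk}
    return (W_inv @ X_hf) / N

x = np.array([1.0, 2.0, 0.5, -1.0])
X = np.fft.fft(x)                  # fast form of the forward transform
x_rec = idft_via_dft_structure(X)  # recover the time-domain data
```

The direct matrix form costs O(N^2) per vector, which is exactly why the text falls back on the O(N log N) IFFT for the actual computation.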
After obtaining X^ht_ij, it is necessary to perform a summation and compression operation on X^ht_ij, specifically by performing matrix addition on X^ht_1j, X^ht_2j, ..., X^ht_ij, ..., and X^ht_mj. This operation preserves the feature information extracted by the DFNNB while enabling further processing of the extracted feature data by subsequent network structures. The process of obtaining the final output X_j of the FNNL through the summation and compression of the X^ht_ij, as shown in Figure 2, is represented by (19).
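The summation-and-compression step above is plain elementwise matrix addition over the m recovered time-domain outputs; a minimal numpy sketch, with the shapes m and d assumed for illustration:

```python
import numpy as np

# m recovered time-domain outputs X^ht_1j ... X^ht_mj, each of length d
m, d = 4, 8
X_ht = np.random.default_rng(0).normal(size=(m, d))

# eq. (19): matrix addition across i compresses the m feature maps into
# the single FNNL output X_j while preserving the extracted features
X_j = X_ht.sum(axis=0)
```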

Information Back Propagation and Parameter Update.
In the FNN, there are two parts: the DNN and the DFNNB. The back propagation of gradient information in the DNN part has been extensively discussed in the existing literature; therefore, this paper focuses on elaborating the back propagation of gradient information within the DFNNB. Specifically, we analyze the back propagation of internal gradient information in the jth DFNNB. The variables requiring gradient computation in the jth DFNNB are the coefficient matrix M^s_j in the FNNL and the weights W_jh and W_je in the HNN. Let L denote the loss between the network output and the labeled value. From equations (10), (16), (18), and (19), the gradient relationship between L and the coefficient matrix M^s_j, as presented in Section 4.1, can be expressed by (20).
In order to compute the gradient of the coefficient matrix M^s_j in the FNNL on a computer, a further transformation of (20) is required. According to the analysis in Section 4.1, (18) is equivalent to (17), and (17) can be rewritten in matrix-operation form as shown in (21), where V_N can be expressed by (22).

(Algorithm 2: High-energy selection. Input: the primary energy wave matrix E^s_j; output: the high-energy spectrum. The algorithm first initializes the importance vector C^s and the summation vector V with all internal elements set to 1.)

International Journal of Intelligent Systems
In (22), W_N = e^(-j2π/N). From the analysis in Section 4.2.2, together with (14) and (15), it can be deduced that the work accomplished by (16) in the high-energy filtering process is equivalent to (23). From equations (20), (21), and (23), the gradient of the coefficient matrix M^s_j in the FNNL can be expressed by (24). Equation (24) represents the back propagation of gradient information for the coefficient matrix M^s_j inside the FNNL of a DFNNB. X^f_j is the matrix formed by combining the X^f_ij, where each X^f_ij is obtained from the corresponding X^t_ij through the FFT, which is equivalent to applying the DFT to each X^t_ij. The process of obtaining X^f_j by performing the DFT on the X^t_ij can be expressed by (25).
X^t_ij is the output of the HNN, and the only trainable parameters in the HNN are the weights W_je and W_jh. According to the chain rule, the gradient information of W_je and W_jh in the HNN can be back propagated; equations (27) and (28) represent the back propagation of gradient information for W_jh and W_je in the DFNNB, respectively.
Equations (24), (27), and (28) give the back propagation of the gradient information of the variables to be trained, M^s_j, W_jh, and W_je, in the jth DFNNB. The back propagation of the gradient information of the trainable parameters in the remaining DFNNBs proceeds in exactly the same way.
Combining equations (24), (27), and (28) with the gradient descent algorithm, the updating of the coefficient matrix M^s_j of the FNNL in the jth DFNNB and of the weights W_jh and W_je in the HNN can be expressed by equations (29), (30), and (31), respectively, where M^s'_j, W'_jh, and W'_je denote the updated M^s_j, W_jh, and W_je, and η is the learning rate. The parameters in the DFNNB can thus be updated according to equations (29)-(31). Sections 4.1 and 4.2 elaborate the structure and forward information propagation of the DFNNB in the FNN, and Section 4.3 describes the backward propagation of the gradients of the parameters to be updated, together with the parameter updating process in the DFNNB.
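The update rule of (29)-(31) is plain gradient descent, θ' = θ - η·∂L/∂θ, applied to each trainable tensor. A minimal sketch follows; the dict-based layout and function name are illustrative assumptions:

```python
import numpy as np

def sgd_update(params, grads, eta=0.001):
    """Gradient-descent update of eqs. (29)-(31): theta' = theta - eta * grad.

    params and grads hold M^s_j, W_jh, and W_je and their gradients
    (the gradients themselves would come from eqs. (24), (27), (28)).
    """
    return {name: params[name] - eta * grads[name] for name in params}

params = {"M_s_j": np.ones((2, 2)), "W_jh": np.ones((2, 2)), "W_je": np.ones((2, 2))}
grads = {name: np.full((2, 2), 0.5) for name in params}
updated = sgd_update(params, grads, eta=0.1)
```

In practice a framework autodiff (the paper uses TensorFlow) would compute the gradients and an optimizer such as Adam would apply a variant of this same update.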

Time Complexity Analysis.
A DFNNB consists of an HNN (Hadamard Neural Network) and an FNNL (Fourier Neural Network Layer). In this subsection, the time complexity of executing a DFNNB is analyzed.
First, let us analyze the time complexity of the HNN. The computational process of the HNN involves equations (1) and (2). Equation (1) represents the multiplication of an N × 1 vector with a 1 × M vector, with a time complexity of O(NM). Equation (2) involves the Hadamard product of an N × M matrix with another N × M matrix, which also has a time complexity of O(NM). Therefore, the time complexity of the HNN is O(NM).
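A minimal numpy sketch of the two HNN operations analyzed above, with sizes chosen for illustration only:

```python
import numpy as np

N, M = 4, 3
x = np.ones((N, 1))                          # N x 1 input vector
w = np.arange(1.0, M + 1).reshape(1, M)      # 1 x M weight vector

P = x @ w        # eq. (1): N x 1 times 1 x M, N*M multiplications -> O(NM)
H = np.full((N, M), 2.0)
out = P * H      # eq. (2): Hadamard product of two N x M matrices -> O(NM)
```

Both steps touch each of the N·M entries exactly once, which is where the O(NM) bound comes from.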
Next, consider the time complexity of the fast Fourier transform (FFT). As mentioned in Section 4.2, this process requires the fast Fourier transform of M vectors, resulting in a time complexity of O(MN log N). Through the FFT, a frequency domain representation of the data, X^f_ij, is obtained. Subsequently, a high-energy filtering process is performed on X^f_ij. In this process, each X^f_ij is multiplied elementwise with a coefficient vector M^s_ij to obtain the primary energy wave Z^s_ij; the time complexity of processing one X^f_ij is therefore O(N), and O(MN) for all M vectors. All the Z^s_ij together form the primary energy wave matrix E^s_j. E^s_j is then processed using the high-energy selection algorithm, which involves K iterations. In each iteration, the key computational steps are (11)-(13). Equation (11) represents the Hadamard product of matrices, with a time complexity of O(MN). Equation (12) represents matrix multiplication, also with a time complexity of O(MN). Equation (13) represents the softmax function, with a time complexity of O(N). Therefore, the time complexity of one iteration of the high-energy selection algorithm is O(MN), and that of K iterations is O(KMN).
Finally, according to the analysis in Section 4.2.3, the inverse FFT must be performed. Its time complexity is the same as that of the FFT, namely O(MN log N).
In summary, the time complexity of a DFNNB can be expressed as O(NM) + O(MN) + O(KMN) + O(MN log N), which simplifies to O(MN(K + log N)).

FNN Detection Steps
Step 1: Perform preprocessing operations on the data, including normalization and splitting the dataset into a training set and a test set.
Step 2: FNN training. (1) Determine the structure of the FNN and initialize its parameters.
(2) Input the training set data into the first DFNNB. Using equations (1) and (2), the HNN fits the network intrusion data in the multitemporal space. Then, the fitted samples are processed by the FNNL, yielding filtered and integrated feature data according to equations (7), (16), (18), and (19). These data serve as the output of the DFNNB and are also used as the input of the next DFNNB layer.

Step 3: FNN testing. Input the test dataset into the trained FNN to obtain the classification result for each test sample.
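The preprocessing of Step 1 can be sketched as follows. The paper only states that normalization and a train/test split are performed, so the min-max scheme, split ratio, and function name here are illustrative assumptions:

```python
import numpy as np

def preprocess(X, split=0.8, seed=0):
    """Step 1 sketch: min-max normalization, then a random train/test split."""
    # scale each feature column into [0, 1]; epsilon guards constant columns
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(split * len(X))
    return X[idx[:cut]], X[idx[cut:]]

X = np.arange(20.0).reshape(10, 2)
X_train, X_test = preprocess(X)
```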
The overall flowchart of FNN intrusion detection is illustrated in Figure 6.

Datasets Description
In order to verify the detection capability of the FNN in the face of network intrusion data, this paper carries out intrusion detection not only on the older intrusion detection datasets KDD CUP99 and NSL-KDD but also on the newer datasets UNSW-NB15 and CICIDS2017. This demonstrates the intrusion detection capability of the FNN more comprehensively and makes the experiments more convincing.

Evaluation Index
Due to the imbalance in intrusion detection datasets, with a significant disparity between the numbers of normal and abnormal samples, the accuracy rate alone cannot provide a comprehensive evaluation of an intrusion detection algorithm's performance. Therefore, this paper adopts multiple evaluation metrics, namely accuracy, precision, recall, F1-score, and AUC, to assess the effectiveness of the FNN algorithm [57]. These metrics are derived from the confusion matrix presented in Table 6, enabling a more comprehensive analysis of the algorithm's performance.
The above evaluation indicators are defined as follows:

Accuracy: the ratio of the number of correctly identified samples to the size of the entire test set. The higher the accuracy, the better the performance of the neural network (accuracy ∈ [0, 1]). It is a good metric for test datasets containing balanced classes.

Precision: the ratio of the number of correctly identified normal samples to all samples predicted as normal. The higher the precision, the better the performance of the neural network (precision ∈ [0, 1]); k denotes the sample category.

Recall: the ratio of correctly classified normal samples to the total number of normal samples. The higher the recall, the better the neural network model (recall ∈ [0, 1]).

F1-score: the harmonic mean of precision and recall. The higher the F1-score, the better the performance of the neural network (F1-score ∈ [0, 1]); α_k is a weight representing the proportion of the different sample categories.

ROC (receiver operating characteristic) curve: its horizontal axis is the false positive rate (FPR) and its vertical axis is the true positive rate (TPR). The AUC is the area under the ROC curve and is used, together with the ROC curve, as a comparison index for neural network models; the higher the AUC value, the better the model.

In the above, TP (true positive) is the number of normal samples correctly classified as normal, TN (true negative) is the number of attack samples correctly classified as attacks, FP (false positive) is the number of attack samples misclassified as normal, and FN (false negative) is the number of normal samples misclassified as attacks.
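These definitions can be computed directly from confusion-matrix counts. In the sketch below the normal class plays the positive role, matching the text's wording for precision and recall; the function name and `positive` parameter are illustrative assumptions:

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=0):
    """Accuracy, precision, recall, and F1 from Table 6-style counts.

    In the experiments normal samples are labeled 0, so positive=0
    treats the normal class as the positive class.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))  # normal kept normal
    tn = np.sum((y_true != positive) & (y_pred != positive))  # attack kept attack
    fp = np.sum((y_true != positive) & (y_pred == positive))  # attack -> normal
    fn = np.sum((y_true == positive) & (y_pred != positive))  # normal -> attack
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = binary_metrics([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
```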

Experiments and Discussions
In this section, two sets of comparative experiments were designed: the first group validates the effectiveness of the FNN and explores its performance on different datasets, while the second group investigates the impact of the number of iterations of the HSE algorithm on FNN performance.
In the field of intrusion detection, it is crucial to validate the effectiveness of a detection method by applying it to various real-world scenarios. To evaluate the effectiveness of the FNN, this study applied it to intrusion detection on the KDD Cup99, NSL-KDD, UNSW-NB15, and CICIDS2017 datasets. Multiple evaluation metrics, including accuracy, precision, recall, F1-score, ROC curve, and AUC value, were employed to assess its performance.
The deep learning framework used in this study was TensorFlow 2.13.0 (CPU version), and the machine learning library was scikit-learn 1.0.2. The programming language was Python 3.9.1. The hardware configuration consisted of an AMD Ryzen 7 5800H processor and 16 GB of RAM, and the operating system was Windows 11.
In the specific implementation of the experiments, for the binary classification experiments, normal samples are labeled as 0 and attack samples as 1. For the multiclassification experiments, taking the CICIDS2017 dataset as an example, normal samples are labeled as 0, Bot attack samples as 1, BruteForce attack samples as 2, DoS attack samples as 3, Infiltration attack samples as 4, PortScan attack samples as 5, and WebAttack samples as 6. One-hot encoding is employed for the labeling process. The epoch value for both the FNN and the comparative deep learning models during training is set to 50. The Adam optimizer with a learning rate of 0.001 is utilized, and categorical cross-entropy is adopted as the loss function.
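The labeling scheme described above can be sketched in numpy. The seven CICIDS2017 class indices come from the text; the helper name and class list variable are illustrative assumptions:

```python
import numpy as np

# CICIDS2017 class indices 0-6 as stated in the experimental setup
CICIDS_CLASSES = ["Normal", "Bot", "BruteForce", "DoS",
                  "Infiltration", "PortScan", "WebAttack"]

def one_hot(labels, num_classes):
    """One-hot encode integer labels for use with categorical cross-entropy."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

y = one_hot([0, 3, 5], num_classes=len(CICIDS_CLASSES))
```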
When performing binary classification tasks, the number of neurons in the last layer of the DNN network is 1, and the activation function is the sigmoid function. For multiclassification tasks, the number of neurons in the last layer of the DNN network is 5, 10, or 7 depending on the dataset (5 for the experiments on KDD Cup99 and NSL-KDD, 10 for UNSW-NB15, and 7 for CICIDS2017), and the activation function is in all cases the softmax function. Tables 9-16 present the accuracy, precision, recall, and F1-score of FNN1, FNN2, and FNN3 in binary classification and multiclassification, alongside classical deep learning and machine learning algorithms, including DNN, CNN, RNN, LSTM, RF, LR, KNN, DT, and SVM, on the KDD Cup99, NSL-KDD, UNSW-NB15, and CICIDS2017 datasets. Tables 9-16 show that all algorithms achieve strong detection performance. Overall, the performance of each algorithm on KDD Cup99, NSL-KDD, and CICIDS2017 is better than on UNSW-NB15; this discrepancy can be attributed to the higher data complexity of UNSW-NB15. Among the deep learning methods, CNN, RNN, and LSTM exhibit relatively poor performance, which can be attributed to the absence of translation invariance and clear sequential relationships in network intrusion data. Among the traditional machine learning methods, RF and KNN are more prominent. This is primarily because RF is an ensemble algorithm incorporating multiple decision trees, giving it certain advantages over individual traditional machine learning algorithms; additionally, network intrusion data have a relatively low feature dimension, which favors KNN. Because multiclassification places higher performance requirements on the algorithms than binary classification, the overall performance of each algorithm decreases across the various metrics when performing multiclassification on each dataset. This decrease is particularly evident on the UNSW-NB15 dataset, which is more complex than the other three and includes nine attack types, including newer ones; it is therefore expected that the algorithms exhibit a significant performance drop on UNSW-NB15. In general, each algorithm in the experiments effectively detects the attack samples in the datasets. Analyzing Tables 9-16, the FNN demonstrates superior detection performance compared to the other algorithms in the majority of cases; furthermore, its detection performance continues to improve as the number of DFNNB layers within the FNN increases.
In the vast majority of cases, the FNN demonstrates superior detection performance compared to traditional methods. In the binary classification test on the KDD Cup99 dataset, every detection method achieves excellent results. Specifically, FNN2 and FNN3 exhibit accuracy rates exceeding 0.996, a 19.2% improvement over the worst-performing LSTM. Moreover, as the depth of the FNN increases, both precision and recall increase; notably, FNN3 achieves precision and recall of 0.999 and 0.997, respectively, surpassing the other methods. However, the performance differences between FNN2 and FNN3 are not significant overall, indicating that further increasing the depth of the FNN does not yield substantial improvements. In the multiclassification test on the KDD Cup99 dataset, most algorithms, except LSTM, exhibited a slight decrease in performance, with FNN3 showing the smallest decrease in accuracy. One factor contributing to the marginal improvement in LSTM's performance is that the multiclassification dataset partially alleviates the class imbalance present in the binary classification dataset; this observation suggests that LSTM is less adept at handling significant imbalances in sample quantities. Furthermore, the FNN consistently outperforms the other algorithms in precision and recall, indicating its superiority. Due to certain limitations of the KDD Cup99 dataset, this study proceeds with further experiments on the NSL-KDD dataset, an improved version of KDD Cup99 that provides a better means to evaluate the performance of intrusion detection algorithms. Notably, the NSL-KDD dataset has a significantly reduced number of training samples, so fewer data features can be learned by the detection methods, and all algorithms experience varying degrees of degradation in their detection capabilities. However, comparing Tables 9-16, it can be observed that the FNN shows a relatively smaller performance decrease than the other algorithms. Moreover, FNN3 consistently achieves the best performance indicators in both binary classification and multiclassification, indicating its superiority over the other algorithms in classifying the KDD Cup99 and NSL-KDD datasets.
When performing binary classification on the UNSW-NB15 dataset, the FNN consistently outperforms the other algorithms in the majority of cases, and increasing the depth of the FNN brings a certain degree of performance improvement. The UNSW-NB15 dataset is relatively new and contains numerous novel attack types, resulting in higher data complexity than the NSL-KDD and KDD Cup99 datasets. Here, the accuracy rates of DNN, CNN, LSTM, LR, DT, and SVM fall below 0.800, with SVM achieving an accuracy of only 0.686, and the accuracy rates of RNN and KNN are below 0.850. The algorithms with accuracy rates exceeding 0.900 are FNN1, FNN2, FNN3, and RF, with RF at 0.900, lower than FNN1, FNN2, and FNN3. In terms of precision and recall, FNN1, FNN2, and FNN3 outperform the other methods. When performing multiclassification on the UNSW-NB15 dataset, the accuracy rates of FNN1, FNN2, and FNN3 are 0.790, 0.846, and 0.853, respectively, while those of RF and KNN are 0.809 and 0.802, higher than FNN1 but lower than FNN2 and FNN3. Overall, FNN2 performs similarly to FNN3, and both outperform the other algorithms.
Through the analysis of Tables 15 and 16, it is found that the FNN's performance in binary classification on the CICIDS2017 dataset is not as good as in multiclass classification. This is because the CICIDS2017 dataset has less noise contamination, and in binary classification there are only two labels, so the influence of data noise on the classification results is relatively small; if the filtering frequency of the FNN is too high, non-noise information is lost, and the effectiveness of the proposed high-energy selection algorithm cannot be fully exerted. From Table 16, it is evident that as the number of DFNNB layers in the FNN increases, detection performance improves consistently, with FNN3 achieving a high accuracy rate of 0.995. The multiclassification of CICIDS2017 is greatly affected by data noise, which significantly impacts its results; however, the high-energy selection algorithm in the FNN effectively filters out the noise information, thereby enhancing classification performance. Meanwhile, RNN and LSTM exhibit the poorest performance among the deep learning algorithms, indicating that the unoptimized designs of RNN and LSTM are not suitable for intrusion detection and again reflecting the absence of clear sequential relationships in network intrusion data. In terms of precision and recall, the FNN demonstrates a significant advantage over the other algorithms, except RF. RF performs well among the machine learning algorithms, highlighting its strengths as an ensemble algorithm. The F1-score combines precision and recall and thus better reflects algorithm performance. Tables 9, 11, and 13 indicate that, in binary classification on each dataset, FNN2 and FNN3 consistently achieve the optimal F1-score across all three datasets, with FNN1 also outperforming most other algorithms. Among the machine learning algorithms, RF and KNN exhibit relatively higher F1-scores than the others, highlighting the superiority of RF as an ensemble algorithm over traditional single algorithms and the advantage of KNN in handling low-dimensional network intrusion data. Overall, the FNN demonstrates superior detection performance in binary classification on the KDD Cup99, NSL-KDD, and UNSW-NB15 datasets. Tables 10, 12, 14, and 16 reveal that, in most cases, FNN2 and FNN3 hold a significant F1-score advantage over the other algorithms in multiclassification on each dataset; this suggests that the other algorithms are more prone to false negatives in these multiclassification tests, whereas the FNN performs better. In summary, the FNN exhibits superior detection performance in both binary and multiclassification tasks on the KDD Cup99, NSL-KDD, UNSW-NB15, and CICIDS2017 datasets, and, compared to the other algorithms, has a lower likelihood of false negatives during the detection of network intrusion data. The confusion matrix is an analytical table used in machine learning to summarize the predictions of classification models; it presents the relationship between the true attributes of the sample data and the predicted classes in matrix form and is a commonly used method for evaluating classifiers. It allows classification results to be visualized, enables the calculation of various evaluation metrics, and provides a clear picture of how normal and abnormal samples are classified by the intrusion detection model. Figure 7 illustrates the confusion matrices of FNN, RF, and KNN for binary classification on the different datasets. From Figure 7 it is evident that, in most cases, the FNN outperforms the other algorithms in classification accuracy, with FNN3 exhibiting the lowest rates of false positives and false negatives.
For the KDD Cup99 and NSL-KDD datasets, the KNN algorithm performs best overall among the machine learning algorithms. Although KNN outperforms FNN1 in detection performance on these datasets, it falls short of FNN2 and FNN3. Comparing Figure 7(i) with Figure 7(q), on the KDD Cup99 dataset FNN3 misses only 75 attack samples and incorrectly identifies 11 normal samples as anomalies; FNN3 thus demonstrates a clear advantage in both false negatives and false positives. Similarly, comparing Figure 7(j) with Figure 7(r) on the NSL-KDD dataset, FNN3 misclassifies 174 abnormal samples as normal, whereas KNN misclassifies 234, indicating a higher false negative rate for KNN. When detecting the UNSW-NB15 dataset, KNN's performance is noticeably inferior to that of the FNN and RF, owing to the high dimensionality and complexity of the UNSW-NB15 data. For the UNSW-NB15 and CICIDS2017 datasets, the RF algorithm achieves the best overall performance among the machine learning algorithms. Analyzing Figures 7(c), 7(g), and 7(k) together with Figure 7(o), RF has more misclassifications than FNN1, FNN2, and FNN3 on the UNSW-NB15 dataset, suggesting that RF does not perform as well as the FNN. Furthermore, examining Figures 7(d), 7(h), and 7(l), FNN2 has the fewest misclassifications among the FNN models on the CICIDS2017 dataset: the attack samples misclassified as normal account for 0.016 of the total attack samples, while the normal samples misclassified as attacks account for 0.0038 of the total normal samples. Although Figure 7(p) shows that RF has fewer misclassifications than the FNN on a single dataset (CICIDS2017), an analysis of the combined confusion matrices in Figure 7 indicates that the FNN has fewer false positives and false negatives across all the datasets, suggesting its superior performance in detecting the various datasets compared to the other algorithms.
To present the multiclassification experiment results more intuitively, we extracted 20,000 samples from each dataset in proportion to each class. These samples were processed by the respective FNN models, and the resulting values were passed to t-SNE for visualization. Additionally, the same 20,000 samples were passed directly to t-SNE without any processing, resulting in Figure 8.
From Figures 8(a), 8(c), and 8(e), it is visually evident that the distribution of the KDD Cup99 dataset is the simplest. In fact, applying t-SNE directly to the KDD Cup99 dataset yields satisfactory segmentation results. However, Figure 8(a) clearly shows significant overlap between the Probe, R2L, and U2R data and the Normal and DoS data, indicating that t-SNE alone cannot effectively differentiate the Probe, R2L, and U2R data of the KDD Cup99 dataset. Figure 8(b) demonstrates that after the data are processed with the FNN, t-SNE exhibits an improved ability to distinguish the Probe, R2L, and U2R data, particularly the Probe data. Due to the limited amount of training data for R2L and U2R, the algorithm's effect on the discrimination of these two types is not very pronounced. Nevertheless, Figure 8(b) shows that the R2L and U2R data are mostly located at the edges of the Normal and DoS data, indicating that even with a small amount of training data, the FNN still possesses the capability to differentiate attack data. From Figure 8(c), it is evident that the NSL-KDD dataset, as an improvement over KDD Cup99, is more complex. Unlike the KDD Cup99 dataset, which exhibits fairly distinct differentiation after direct t-SNE processing, the various data types of the NSL-KDD dataset lie closer together after t-SNE processing, with the R2L and U2R data showing significant overlap with the other types. Figure 8(d) clearly demonstrates that after FNN processing, all data types are differentiated significantly better than without processing: the Normal, DoS, and Probe data show more distinct separation, while the R2L and U2R data, although overlapping with other types, are mostly located at the edges of the other data clusters.
From Figure 8(e), it is evident that the UNSW-NB15 dataset is more complex than the KDD Cup99 and NSL-KDD datasets, with a greater variety of attack types and a more chaotic distribution of the various data types. Furthermore, Figure 8(f) clearly demonstrates a significant and effective differentiation between the different types of attack data after FNN processing.
Figure 8(g) illustrates that the CICIDS2017 dataset encompasses a wide range of attack types, resulting in a confusing distribution of these attack types. Notably, Bot, BruteForce, and WebAttack overlap significantly with the Normal class, indicating a potential for misclassification as Normal. Figure 8(h) shows that FNN processing significantly improves the data over the unprocessed case: a clear differentiation emerges between the Normal, DoS, and PortScan categories, and Bot, BruteForce, and WebAttack form distinct clusters residing at the periphery of the clusters of the other data types.
In conclusion, the visualized images derived from the original datasets and from the FNN-processed data reveal a distinct clustering pattern of the different data types in the low-dimensional space. The FNN model proposed in this study accurately identifies various types of anomalous network traffic. While the FNN exhibits superior detection performance in the field of intrusion detection, as a novel neural network there remains significant room for further enhancement of its capabilities.
Figures 9 and 10 depict the accuracy curves and accuracy box plots for each dataset, respectively. The accuracy curves provide a clear visualization of the convergence of each algorithm during training, while the box plots offer insights into the presence of outliers and the distribution characteristics of the data.
From Figure 9(a), it can be observed that in the binary classification of the KDD Cup99 dataset, all deep learning algorithms except CNN converge quickly. CNN exhibits a sudden increase in accuracy at the 8th iteration, indicating an unstable optimization process for its parameters; this substantial jump in performance suggests instability within CNN. Furthermore, Figure 10(a) reveals that the accuracy distributions of DNN and FNN are more concentrated, indicating a comparatively stable training process; combined with Figure 9(a), it becomes evident that the training of DNN and FNN is smoother than that of the other deep learning algorithms. In the case of multiclassification on the KDD Cup99 dataset, CNN again experiences a jump in accuracy at the 8th iteration, further highlighting the instability of its training process. Notably, LSTM achieves an accuracy exceeding 0.950 during training; however, Table 10 shows that LSTM's final accuracy is only 0.864, indicating overfitting. The accuracy trends of the remaining algorithms show no significant deviation from Figure 9(a).
When binary classification is performed on the NSL-KDD dataset, Figure 9(c) shows that the FNN has the fastest convergence speed among the deep learning algorithms, with its accuracy converging to an optimal value in the early stage of training. Figure 10(c) shows that the accuracy of the FNN is stably distributed around an optimal value, with only a few outliers. As in the binary classification of KDD Cup99, the accuracy of CNN again shows large jumps during training on the NSL-KDD dataset, and Figure 10(c) shows that CNN's accuracy is distributed over a large range, indicating that its training process is extremely unstable. In Figure 10(c), the accuracies of LSTM and RNN contain more outliers, which also indicates the instability of their training processes. From Figure 9(d), it can be seen that when multiclassification is performed on the NSL-KDD dataset, the instability of the various algorithms during training is significantly reduced, although CNN still shows large fluctuations in accuracy in the early stage of training.
Analyzing Figures 9(e) and 10(e), it can be observed that during binary classification of the UNSW-NB15 dataset, the training processes of all neural networks except DNN exhibit varying degrees of fluctuation. Additionally, Figure 10(e) displays a higher number of outliers, which can be attributed to the continuous improvement of algorithm accuracy with increasing training iterations. Figure 9(e) reveals that the accuracy of DNN stabilizes at a relatively low value early in the training process, indicating that DNN does not effectively capture additional useful data features in subsequent training stages. In the case of multiclassification on the UNSW-NB15 dataset, LSTM and RNN initially show improvements in accuracy during the early training stages but quickly stabilize at a lower level. This suggests that LSTM and RNN fail to learn significant data features through further training. Moreover, Figure 10(f) clearly illustrates the concentrated distribution of LSTM and RNN accuracy at a lower level.
Analyzing Figure 9(g), it can be observed that during binary classification of the CICIDS2017 dataset, the accuracy of the various deep learning algorithms, except for DNN, continues to improve with increasing training iterations. Each neural network's training process exhibits varying degrees of fluctuation, and Figure 10(g) suggests that the optimization process for RNN's parameters is highly unstable. From Figure 10(h), it can be observed that during multiclassification of the CICIDS2017 dataset, CNN and FNN demonstrate a more concentrated distribution of accuracy, indicating a more stable training process. This observation, combined with Figure 9(h), clearly indicates that the training of CNN and FNN is comparatively smoother than that of the other deep learning algorithms. Figure 10(h) also shows that DNN's accuracy is centrally distributed at a lower level, indicating that DNN fails to learn useful data features through training. On the other hand, LSTM and RNN exhibit significant fluctuations with increasing training iterations, with a higher number of outliers in their accuracy rates, further highlighting the instability of their training processes.
Overall, in most cases, FNN converges faster towards a higher accuracy rate than the other algorithms. The accuracy distribution of FNN remains more stable during the training process, with only a few outliers in the accuracy values. This indicates that FNN exhibits more consistent and stable performance.
In conclusion, the FNN exhibits superior detection performance in the binary and multiclass classification of network intrusion data compared with traditional deep learning and machine learning algorithms. It is worth noting that the FNN model has been validated on multiple datasets, further enhancing the credibility of the experimental results. This indicates that the exceptional performance of the FNN model is not attributable to chance or to specific features of a particular dataset, but possesses general applicability and stability. Therefore, the FNN model holds significant potential for application in the field of network intrusion detection.

Comparative Experiment Two.
In Section 4.2.2, we proposed a high-energy selection (HSE) algorithm in which the number of iterations n must be selected manually. Based on the analysis in Section 4.2.2, a larger value of n leads to a stronger filtering effect of the HSE algorithm on noise signal waves, but at the cost of losing more information. Conversely, a smaller value of n results in a weaker filtering effect on noise signal waves but less information loss. Excessive noise signals can degrade the algorithm's final performance, and a higher degree of information loss also leads to a decline in performance. To investigate the impact of n on the final performance of FNN, experiments were conducted with different values of n for FNN3: n = 1, 2, 3, 4, 5, 6. The performance metrics of FNN3 on the KDD Cup99, NSL-KDD, UNSW-NB15, and CICIDS2017 datasets were recorded under each condition. The detailed experimental results can be found in Tables 17-24.
As shown in Tables 17-24, the experimental results of classifying the KDD Cup99, NSL-KDD, UNSW-NB15, and CICIDS2017 datasets indicate that the detection performance of FNN first increases and then decreases as the number of iterations n of the HSE algorithm grows. Specifically, the detection performance of FNN usually peaks when n is set to 3 or 4. When classifying the KDD Cup99 dataset, the detection performance of FNN is optimal when n is set to 3. For the NSL-KDD dataset, the peak detection performance of FNN occurs at different values of n: for the binary classification task, the best detection performance is achieved when n is set to 3, while for the multiclass classification task, the optimum is observed when n is set to 4. When classifying the UNSW-NB15 dataset, the detection performance of FNN is highest when n is set to 4. For the binary classification task on the CICIDS2017 dataset, the best detection performance is achieved when n is set to 2, while for the multiclass classification task, the optimum is observed when n is set to 3.
From the above observations, increasing the value of n within a certain range allows the HSE algorithm to effectively filter out noise signals, thereby improving the final detection performance of the FNN. However, when the value of n exceeds that range, excessive filtering by the HSE algorithm leads to information loss and a subsequent decrease in the FNN's detection rate. Further analysis reveals that for complex datasets, the FNN's detection performance peaks at larger values of n; for instance, on the UNSW-NB15 dataset the optimal value of n is 4. This phenomenon provides valuable guidance for future research, enabling the selection of an appropriate value of n in the FNN based on the complexity of the dataset and the specific application requirements, ultimately enhancing the classification performance of the FNN.
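The exact HSE procedure is defined in Section 4.2.2; the sketch below merely illustrates the trade-off discussed above with a minimal iterative amplitude-based filter. The function name `high_energy_select` and the mean-amplitude threshold are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def high_energy_select(x, n):
    """Sketch of an iterative high-energy selection (HSE-style) filter.

    In each of the n iterations, frequency components whose amplitude falls
    below the mean amplitude of the current spectrum are zeroed out, so a
    larger n filters more aggressively (stronger denoising, but more
    information loss) -- the trade-off described in the text.
    """
    spectrum = np.fft.fft(x)
    for _ in range(n):
        amplitude = np.abs(spectrum)
        threshold = amplitude.mean()  # energy threshold (an assumption here)
        spectrum = np.where(amplitude >= threshold, spectrum, 0.0)
    return np.fft.ifft(spectrum).real

# A noisy signal: a low-frequency sine wave plus high-frequency noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 256, endpoint=False)
clean = np.sin(2 * np.pi * 4 * t)
noisy = clean + 0.3 * rng.standard_normal(256)

for n in (1, 3, 6):
    err = np.abs(high_energy_select(noisy, n) - clean).mean()
    print(f"n={n}: mean abs error vs clean signal = {err:.3f}")
```

Under this toy threshold the dominant sine component survives filtering while low-amplitude noise bins are discarded, mirroring the behaviour attributed to HSE above.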

Comparison of Performance with Other Studies.
Table 25 provides a comparison of different models in terms of accuracy and time consumption. Each row represents a model proposed for intrusion detection in a different reference. The table includes the accuracy, precision, recall, F1-score, training time, inference time, and loss of our proposed models on the KDD Cup99, NSL-KDD, UNSW-NB15, and CICIDS2017 datasets for comparison with existing studies. It is important to note that the selection and division of datasets in different references may affect the training and inference times of the corresponding models.
In our experiments, we specifically recorded the training and inference times of FNN1, FNN2, and FNN3 for each dataset. The training time represents the time required to train one epoch on the training set, while the inference time represents the time required to make predictions on the test set.
As shown in Table 25, some studies rely on a single dataset to validate their proposed models. This approach is not ideal, as it does not ensure the generalization of the model: a single dataset may possess unique features that the model can learn and overfit to, resulting in higher accuracy scores on that particular dataset but lower performance on others. To test a model's generalizability, it is crucial to evaluate it on multiple datasets with varying characteristics. Models that perform well across different datasets demonstrate their applicability to diverse environments, which is a key requirement for an effective IDS. Our proposed model has been validated on four distinct datasets: KDD Cup99, NSL-KDD, UNSW-NB15, and CICIDS2017. Hence, our FNN model can be considered more generalizable, indicating its ability to perform well on previously unseen datasets. In comparison to many other models in the table, our model exhibits strong performance across multiple datasets, making it more reliable than models that excel only on a single dataset. Overall, the proposed FNN stands out due to its high accuracy, competitive training time, and efficient inference time, making it a superior model compared to many other studies mentioned in Table 25.

Conclusion and Future Work
This paper has proposed a novel approach called the Fourier Neural Network (FNN). It utilizes the Fast Fourier Transform to convert network intrusion data into the frequency domain, applies filtering to the converted data, and subsequently converts the filtered data back to the time domain. By enabling the processing of network data in both the time and frequency domains, FNN enhances the neural network's capability to handle complex data and extract features. To eliminate noisy signals, this study has also introduced a high-energy selection process (HSE), which further enhances the performance of FNN in intrusion detection by filtering out low-energy noise signal waves based on their amplitude. The experimental results have shown that FNN has significant advantages over existing classical neural network algorithms and traditional machine learning algorithms in dealing with intrusion detection problems. In addition, this study has explored the effect of the number of iterations n in HSE on the performance of FNN, and the results have shown that choosing the right value of n is crucial for achieving the best performance. In summary, the main contributions of this study are twofold. First, the FNN framework was developed to enhance the ability of neural networks to process complex data using the Fourier transform. Second, HSE was introduced to effectively eliminate noisy signals and further improve the performance of FNN in intrusion detection. As a novel neural network, FNN has certain limitations in its application to other domains. For instance, FNN is currently capable of performing the Fourier transform only on one-dimensional data, restricting its ability to handle images and videos. Furthermore, the FNN structure is specifically suited for classification tasks and is not applicable to regression problems. In the future, further enhancements will be made to FNN to enable the effective processing of image and video data, as well as its application to regression problems. Additionally, improvements will be made to the HSE algorithm to develop a more efficient filtering algorithm, thereby further enhancing the performance of FNN.

Figure 2: Structural details of the Fourier Neural Network.

Figure 4: An example of the FFT operation process.
Input: a sequence of length N, where N = 2^k and k is a positive integer.
(1) Initialize dep, which records the layer of the butterfly operation currently being performed; m, which records the number of elements involved in each group of butterfly operations; and w_m, the principal m-th root of unity.
(2) for dep ← 1 to log2(N) do  /* determine the current layer of the butterfly operation */
      m ← 2^dep  /* the number of elements participating in each group of butterfly operations */
      w_m ← e^(i2π/m) = cos(2π/m) + i sin(2π/m)  /* principal m-th root of unity */
      for k ← 0 to N − 1 step m do  /* perform the butterfly operation, as shown in step two of Figure 4 */
          w ← 1
          for j ← 0 to m/2 − 1 do  /* complete one group of butterfly operations, as shown in Figure 3 */
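The butterfly scheme above can be sketched as an iterative radix-2 Cooley-Tukey FFT. One caveat: the pseudocode writes the principal root as e^(i2π/m), whereas the usual forward-DFT convention uses e^(-i2π/m) (the two differ only in the sign of the exponent); the sketch below uses the forward convention. The bit-reversal permutation corresponds to the reordering in step one of Figure 4.

```python
import cmath

def fft_iterative(a):
    """Iterative radix-2 FFT following the layered butterfly scheme above.

    len(a) must be a power of two. Uses the forward-DFT root e^(-i*2*pi/m);
    the pseudocode's e^(i*2*pi/m) would yield the conjugate transform.
    """
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two"
    A = list(a)
    # Step one: bit-reversal permutation of the input sequence.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            A[i], A[j] = A[j], A[i]
    # Step two onward: log2(n) layers of butterfly operations.
    m = 2
    while m <= n:                              # m elements per butterfly group
        w_m = cmath.exp(-2j * cmath.pi / m)    # principal m-th root of unity
        for k in range(0, n, m):
            w = 1.0
            for j2 in range(m // 2):           # one group of butterflies
                t = w * A[k + j2 + m // 2]
                u = A[k + j2]
                A[k + j2] = u + t
                A[k + j2 + m // 2] = u - t
                w *= w_m
        m *= 2
    return A

spectrum = fft_iterative([1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0])
print(abs(spectrum[0]))  # DC component equals the sum of the inputs: 10.0
```

This iterative form avoids the recursion of the textbook divide-and-conquer FFT while performing the same O(N log N) butterfly operations.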

(3) Use the output of the previous DFNNB layer as the input for the current DFNNB layer.
(4) Repeat step (3) until the output of the final DFNNB layer is obtained. This output is then used as the input for the DNN part of the FNN, and the output of the DNN part becomes the final output of the FNN.
(5) Calculate the loss value between the FNN's output and the labeled values. Use the gradient descent algorithm to update the trainable parameters in the FNN, reducing the difference between the output and the labeled values. Continue this process until the specified number of training iterations is reached.
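The training procedure in steps (3)-(5) can be sketched as follows. The `dfnnb_forward` function here is a simplified stand-in for a DFNNB (FFT, inverse FFT, then a linear layer with ReLU, with the HSE filtering step omitted), and for brevity gradient descent updates only the output head rather than all trainable parameters; the data, shapes, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dfnnb_forward(x, weight):
    # Simplified DFNNB stand-in: FFT -> (filtering omitted) -> iFFT -> linear + ReLU.
    spec = np.fft.fft(x, axis=-1)
    x_time = np.fft.ifft(spec, axis=-1).real
    return np.maximum(x_time @ weight, 0.0)

# Toy data: 64 samples with 16 features and binary labels.
X = rng.standard_normal((64, 16))
y = (X[:, 0] > 0).astype(float)

W1 = 0.1 * rng.standard_normal((16, 16))  # first DFNNB layer
W2 = 0.1 * rng.standard_normal((16, 16))  # second DFNNB layer
w_out = 0.1 * rng.standard_normal(16)     # DNN head (a single logistic unit)

lr, losses = 0.05, []
for epoch in range(200):
    # Steps (3)-(4): feed each DFNNB's output into the next, then into the DNN head.
    h1 = dfnnb_forward(X, W1)
    h2 = dfnnb_forward(h1, W2)
    p = 1.0 / (1.0 + np.exp(-(h2 @ w_out)))
    # Step (5): cross-entropy loss and a gradient-descent update of the head.
    losses.append(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))
    w_out -= lr * h2.T @ (p - y) / len(y)
```

Repeating the forward pass and update until a fixed iteration budget is reached mirrors step (5); in the full FNN, backpropagation would also update the DFNNB weights W1 and W2.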

8.1. Comparative Experiment One. The FNN network structure significantly influences the detection performance of FNN. To validate the effectiveness of FNN, this paper … of which have yielded impressive results.

Table 1: Comparison table of intrusion detection algorithms.

Table 2: Example of the element relationship between X^t_ij and X^t_ij-end.
(1) Input: the number of initial iterations n.
(2) for t ← 1 to n, do  /* iterate n times */

Table 3: Training and testing connection records of KDD Cup99 and NSL-KDD.

Table 4: Training and testing connection records of UNSW-NB15.

Table 5: Training and testing connection records of CICIDS2017.

Table 7: Network structure of FNN.

In order to determine the superiority of the proposed FNN compared to the other studies listed in Table 25, it is necessary to analyze the different metrics provided, including model accuracy, training time, and inference time. By comparing these metrics, we can gain insight into the advantages of the proposed FNN. The FNN achieves high accuracy on multiple datasets, with FNN3 achieving the highest accuracy of up to 99.7%. This demonstrates the model's effectiveness in accurately classifying network traffic and detecting intrusions. In comparison to other studies, the proposed FNN consistently maintains high accuracy, surpassing many other models listed in the table. Regarding training time, the proposed FNN exhibits relatively fast training compared to other models. The size and characteristics of the datasets, as well as the division of the training set, can influence the duration of the training process. The training time for the FNN ranges from 1 second to 717 seconds. Although the table does not provide training times for all models, the training time of the proposed FNN appears competitive, as it falls within a reasonable range compared to the other models in the table. Inference time refers to the time required for the model to make predictions on new, unseen attacks. In our experiments, we recorded the prediction time on the test set as the inference time. The proposed FNN demonstrates efficient inference times, ranging from 1 to 152 seconds across the different datasets. As with the training time, the inference time of the proposed FNN is competitive with the other models in the table.

Table 17: Binary classification test results for KDD Cup99 at different n values. The bold values indicate the best performance in this set of experiments.

Table 18: Multiclass classification test results for KDD Cup99 at different n values. The bold values indicate the best performance in this set of experiments.

Table 19: Binary classification test results for NSL-KDD at different n values. The bold values indicate the best performance in this set of experiments.

Table 20: Multiclass classification test results for NSL-KDD at different n values. The bold values indicate the best performance in this set of experiments.

Table 21: Binary classification test results for UNSW-NB15 at different n values. The bold values indicate the best performance in this set of experiments.

Table 22: Multiclass classification test results for UNSW-NB15 at different n values. The bold values indicate the best performance in this set of experiments.

Table 23: Binary classification test results for CICIDS2017 at different n values. The bold values indicate the best performance in this set of experiments.

Table 24: Multiclass classification test results for CICIDS2017 at different n values. The bold values indicate the best performance in this set of experiments.
International Journal of Intelligent Systems

Table 25: Performance comparison with other studies.