Efficient Privacy-Preserving Federated Deep Learning for Network Intrusion of Industrial IoT

Intrusion detection systems play a very important role in industrial Internet network security. However, in the large-scale, complex, and heterogeneous industrial Internet of Things (IoT), it is becoming more and more difficult to defend against network intrusion threats due to the insufficiency of high-quality attack samples. To solve the problem, an efficient federated network intrusion method called EFedID is proposed for industrial IoT, which allows different industrial agents to collaboratively train a comprehensive detection model. Specifically, the adaptive gradient sparsification method is introduced to alleviate the communication and computation overheads. To protect the data privacy of the agents, a CKKS cryptosystem-based secure communication protocol is designed to encrypt the model parameters throughout the federated training process. Our proposed system demonstrates exceptional detection performance on the NSL-KDD, KDD CUP 99, and CICIDS 2017 datasets. Notably, on the NSL-KDD dataset, the model compression rate reaches 9 times while the model accuracy reaches 84.31%. On the KDD CUP 99 dataset, the model compression rate reaches 8.9 times while the model accuracy reaches 97.3%. Lastly, on the CICIDS 2017 dataset, the model compression rate reaches 6.173 times while the model accuracy reaches 95.51%. The experimental results demonstrate that the proposed method is well suited to developing a high-accuracy detection model while protecting the data of industrial agents. Furthermore, the method can be extended to other recent deep learning networks for intrusion detection.


Introduction
Industrial Internet of Things (IoT) encapsulates communication technologies, edge computing, cloud servers, artificial intelligence (AI), and existing industrial control systems [1,2]. It aims to connect real-world scenarios with distributed computing principles to realize smart manufacturing, resource management, and other processes. With the rapid development of industrial IoT, the network security threats to IoT are becoming more and more serious [3,4] and cyber security has become a key issue for industrial IoT. Network intrusion refers to any unauthorized activity on a digital network, which is one of the most common threats in cyber space. It often involves stealing valuable network resources and jeopardizing the security of networks and/or their data. Hence, intrusion detection methods are proposed to monitor network operations in real time and detect suspicious invasions.
In recent years, to proactively detect and respond to network intrusions, researchers have used artificial intelligence (AI) technologies to design intrusion detection methods. Deep learning (DL) is predominant in the recent literature on network intrusion detection. Recent studies [5] have demonstrated that DL techniques can achieve excellent detection accuracy compared to conventional machine learning techniques. Many DL-based models have been used for network intrusion detection. For instance, the paper [6] developed an autoencoder-based intrusion detection framework that harnesses the power of convolutional and recurrent neural networks to proactively identify cyber threats in IIoT networks. The work placed a strong emphasis on model explainability, empowering security administrators to interpret the underlying data evidence and causal reasoning behind intrusion alerts. The framework applies a two-step sliding window (SW) to better learn the latent representations of the data features, effectively extracting features including malicious pattern contexts. Ismail et al. [7] investigated electricity theft attacks in smart grid cyber-physical systems and proposed a deep learning-based intrusion detection system. Wazzeh et al. [8] proposed a deep transfer learning (DTL)-based residual neural network (ResNet) to effectively detect various network threats against the heterogeneous Internet of Things. More recent related works are presented in Section 2.1.
Training high-performance deep learning models depends on a large amount of high-quality data. Currently, most of the existing DL-based intrusion detection methods assume that developers have sufficient high-quality cyberattack data. However, this assumption is difficult to satisfy in practical scenarios because it is usually very difficult and time-consuming for one industrial IoT owner to collect a large number of cyberattack samples. In addition, the traditional centralized learning (CL) approach, which centralizes distributed owners' data on a central server for model building, is also difficult to implement because industrial IoT owners are usually unwilling to share their attack samples with third parties out of security, privacy, and business interest considerations [8-11]. The open and troublesome problem of insufficient training samples has become a major obstacle in training high-quality intrusion detection models. Therefore, how to overcome this difficulty and develop accurate and efficient intrusion detection methods becomes a challenge for protecting intelligent networks in practical applications.
The application of federated learning to solve the insufficient data problem is a relatively new research area [9, 12-14]. Since federated learning can organize different participants to collaboratively train a comprehensive model, it has great potential to utilize more data from different participants and achieve better model performance [12,15,16]. There are also some federated learning-based intrusion detection methods [17,18] that have achieved good performance, but traditional federated learning requires a large number of parameters to be transmitted during the training process, which may not be applicable to resource-constrained industrial environments. Moreover, an adversary may well attack the parameter transmission process, which can cause data privacy leakage. In this paper, we first develop a new method for multiple industrial IoT owners to cooperatively build a comprehensive intrusion detection model to alleviate the problem of insufficient high-quality attack samples while preserving their local data. In addition, to be more applicable to industrial IoT environments with resource-constrained devices, we introduce adaptive gradient sparsification technology, which sends sparse vectors of the model parameters, to alleviate the cryptographic computing and communication overheads. Considering the security of the model parameters in the data transmission process, we design a secure communication protocol based on the CKKS cryptosystem. Compared to other traditional homomorphic encryption schemes, the CKKS scheme has a faster encryption/decryption speed and supports both additive and multiplicative homomorphic encryption. The main contributions of this study can be summarized as follows.
First, we present a network intrusion method named EFedID, which (1) relieves the open and troublesome problem of insufficient training samples and (2) supports data preprocessing at each industrial agent and preserves their local data information.
Second, an adaptive gradient sparsification method, named AGS, is developed to alleviate the resource overheads of the EFedID system while retaining high efficiency so that it can be deployed to a large number of resource-constrained devices.
Third, the CKKS cryptosystem-based secure communication protocol is designed for the federated learning system, by which the security and privacy of model parameters throughout the training process can be well preserved.
We present related works in Section 2 and describe the design of EFedID in Section 3. In Section 4, we analyze the security and functionality of our method. In Section 5, we give comparison experiments to verify the proposed method's performance. Finally, we conclude this study in Section 6. Table 1 shows a summary of the acronyms used in this paper.

Intrusion Detection Schemes for Industrial CPSs (Cyber-Physical Systems).
To fight against cyberattacks, various intrusion detection methods have been proposed. Wang et al. [19] proposed an intrusion detection framework based on SVM with feature augmentation. However, traditional machine learning-based detection methods are not suitable for detection on massive and high-dimensional network traffic data. In recent years, owing to the rapid development of deep learning (DL) technologies, many researchers proposed DL-based intrusion detection methods. For instance, Li et al. [20] introduced a convolutional neural network (CNN) to design a network intrusion detection model for industrial IoT, which achieves high accuracy on the NSL-KDD dataset. However, it has been evaluated on only one dataset. In practice, the performance of the classifier may fluctuate due to some redundant or inefficient features in different datasets. To alleviate this problem, Wu and Li [21] proposed some feature selection methods and introduced a combination of neural networks and random forests to improve the detection performance. Compared to similar methods, their approach provides better results in general by identifying important and closely related features. However, this scheme requires features to be extracted from existing training data samples and it lacks generalization. The paper [22] proposed a real-time industrial IoT intrusion detection system based on deep autoencoders. This system utilizes a statistical feature mining approach to extract relevant features from network traffic data, which is designed to improve the model's generalization and address the issues of low detection rates and high false positive rates (FPRs). Nevertheless, it is difficult to obtain enough high-quality data samples for detection model training in practical scenarios. Moreover, due to the sensitivity, privacy, and high value of industrial IoT data, data owners are usually reluctant to share data. Tang et al. [17] proposed a network intrusion detection method based on federated learning. Although this scheme alleviates the problem of insufficient attack samples, it does not consider the resource consumption of the FL system and therefore does not suit resource-constrained devices. Khan et al. [18] proposed the federated-SRU IDS model, which employs the improved simple recurrent unit architecture to reduce computational cost and mitigate the gradient vanishing problem in recurrent networks, enhancing the model's intrusion detection performance. Moreover, the system facilitates model aggregation through multiple communication rounds within the federated learning architecture, allowing multiple ICS networks and stakeholders to collaboratively build comprehensive IDS models while preserving their data privacy. However, they did not consider that the model parameters may be attacked by an adversary during transmission, so the scheme cannot protect the privacy and security of the local data.
International Journal of Intelligent Systems

Federated Learning-Based Industrial Applications.
Owing to its outstanding performance, many researchers proposed various FL-based industrial applications. For instance, Zhang et al. [23] proposed a rolling bearing fault diagnosis method based on federated learning and convolutional neural network. Lu et al. [24] proposed a federated learning scheme for the digital twin networks by incorporating IoT technologies, which improves communication efficiency and reduces the transmission energy cost. Zhang et al. [25] proposed a dynamic fusion-based FL for medical diagnosis to classify COVID-19 infections, which can adaptively determine the participants according to their local model performance and model aggregation scheme based on participants' training time. In 2022, Aloqaily et al. [26] proposed a hierarchical federated learning (HFL) solution based on blockchain, which can provide fast, safe, and accurate decision making for industrial machines.

Our Approach
In this section, we introduce our proposed EFedID, which combines our designed adaptive gradient sparsification (AGS) method and the CKKS cryptosystem-based secure communication protocol. We first describe the system model and then present the detailed operations within the method.

3.1. System Model.
Federated learning offers a solution to the challenge of insufficient data by facilitating collaborative model training among multiple institutions, all without the need to disclose their individual data to each other or to a central server. In this process, instead of transmitting raw data, each agent sends model parameters to a cloud server. We use the generic setting for the federated learning system, where a cloud server and K industrial agents collaboratively train a model for intrusion detection. As shown in Figure 1, the system is composed of three parties: (1) a key generation center (KGC), (2) a cloud server, and (3) K industrial agents.
These parties in the FL system are described as follows:
(1) KGC: The KGC is a trusted third-party organization, which is responsible for generating the keys based on the CKKS cryptosystem and distributing the keys to industrial agents. The KGC does not send keys to entities outside the system or without access.

Adversary Model.
We assume that the cloud server and all the industrial agents are honest-but-curious entities. An honest-but-curious entity faithfully follows the designed protocol and does not tamper with the calculation results but attempts to infer private information from the inputs of other entities in the scheme (industrial agents in this article). Meanwhile, the KGC is considered a trusted third party.
where CkkDec(·) denotes the decryption operation (see more details in Section 3.6).
After obtaining the sparse model parameters sparse($v^{glo}$), each industrial agent updates the local model.
where $w^{a,k}$ denotes the local model parameters of agent $k$.
Then, each agent uses the local data to train the local model, which can be expressed as
where MGD denotes the momentum gradient descent algorithm and $D^{batch,k}$ denotes the mini-batch data of agent $k$.

Each Industrial Agent Uploads Model Parameters.
Each industrial agent processes the local model parameters according to the sparsification rate φ downloaded from the cloud server to obtain sparse($v^{a,k}$) (see more details in Section 3.4).
After sparse($v^{a,k}$) is obtained, each agent encrypts the local sparse model parameters and uploads the encrypted parameters to the cloud server. The encryption formula is shown in the following equation:
where CkkEnc(·) denotes the encryption operation (see more details in Section 3.6).

Cloud Server Updates Global Parameters.
The server receives the encrypted sparse local parameters Enc(sparse($v^{a,k}$)) from all the industrial agents and computes the average model parameters Enc(sparse($v^{glo}$)) to update the global model parameters. The encrypted sparse local parameters Enc(sparse($v^{a,k}$)) can be aggregated without decryption because the CKKS encryption algorithm is homomorphic. The server uses the weighted federated averaging method, which assigns varying weights to individual agents based on the proportion of data each agent holds relative to the total dataset size. The aggregation formula is shown in the following equation:
where $\vartheta_k$ denotes the data contribution ratios calculated by
Then, the cloud server uses the AGS to make some adjustments to φ (see more details in Section 3.4).
Finally, the cloud server broadcasts the global model parameters Enc(sparse($v^{glo}$)) and the adjusted φ to all agents to start the next round of federated training, and this repeats until training ends.
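The weighted federated averaging step can be sketched on plaintext vectors as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the contribution ratio is assumed to be each agent's sample count divided by the total sample count (as the prose describes), and the homomorphic encryption layer is left out entirely.

```python
from typing import List, Sequence


def contribution_ratios(sample_counts: Sequence[int]) -> List[float]:
    """Share of the total data held by each agent (assumed form of the ratio)."""
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("at least one agent must hold data")
    return [n / total for n in sample_counts]


def weighted_average(updates: Sequence[Sequence[float]],
                     sample_counts: Sequence[int]) -> List[float]:
    """Weighted mean of the agents' (sparse) parameter vectors."""
    if len(updates) != len(sample_counts):
        raise ValueError("one sample count is needed per update")
    ratios = contribution_ratios(sample_counts)
    dim = len(updates[0])
    aggregated = [0.0] * dim
    for ratio, update in zip(ratios, updates):
        if len(update) != dim:
            raise ValueError("all updates must have the same length")
        for i, value in enumerate(update):
            aggregated[i] += ratio * value
    return aggregated


# Two agents holding 100 and 300 samples: weights 0.25 and 0.75.
global_update = weighted_average([[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]], [100, 300])
```

In the encrypted setting the same sum would be carried out on ciphertexts, with the ratios applied as plaintext scalars.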

Adaptive Gradient Sparsification.
The FL system communication process requires the transfer of a large number of parameters, which can put a huge strain on industrial IoT environments with limited network resources. In order to protect the data privacy of industrial agents, homomorphic encryption technology is usually used in the FL system. Encrypting and decrypting model parameters requires a lot of resources, especially when there are many agents in the system, which is often difficult on resource-constrained devices. To reduce the consumption of computing resources and improve communication efficiency, a gradient sparsification method is introduced in our scheme to reduce the number of parameters for communication transmission.
The cloud server receives the encrypted model parameters Enc(sparse($v^{a,1}$)), Enc(sparse($v^{a,2}$)), ..., Enc(sparse($v^{a,K}$)) and the validation accuracy values $\alpha_1, \alpha_2, \ldots, \alpha_K$ uploaded by each industrial agent.
In this paper, the momentum gradient descent (MGD) optimizer [27] is used to minimize $L(b_k, w_t^{a,k})$. The optimization objective is formulated as shown in the following equation:
where the local data of industrial agent $k$ are denoted by $D^{a,k}$. At every iteration $t$, industrial agent $k$ computes the loss $L(b_k, w_t^{a,k})$ and the model parameters $v_t^{a,k}$ with regard to $w_t^{a,k}$.
For $t = 1, 2, 3, \ldots, N$, where $w_t^{a,k}$ is the weight vector obtained at the end of the current iteration $t$, $L(f(x_i; w_t^{a,k}), y_i)$ is the loss obtained at the end of the previous iteration $t$ ($t = 0$ corresponds to model initialization), and $\nabla_{w_t^{a,k}} L(f(x_i; w_t^{a,k}), y_i) \in \mathbb{R}^T$ is the sparse gradient of the local loss in iteration $t$, with $T$ defined as the dimension of the weight vector.
Formally, in every iteration $t$, the weight vector of industrial agent $k$ is adjusted by MGD as follows:
When all industrial agents upload model parameters to the server, the global model parameters are computed as
In every iteration $t$, the global model weight vector is optimized by MGD by
The main goal of gradient sparsification is to exchange only a small number of important gradients. The cloud server calculates the sparse global gradients according to these gradients and sends the updated sparse global gradients to each agent. The gradient sparsification scheme is very effective in reducing the computation and communication costs in the FL system. Han et al. [28] provided theoretical analysis to prove that local and global models can still converge after gradient sparsification.
Setting the sparsification rate in the gradient sparsification scheme is critical and requires a trade-off between model performance and resource savings. A high sparsification rate can significantly reduce the resource overhead, but it can also significantly degrade model performance. A low sparsification rate guarantees a limited loss of model performance but saves very few computing and communication resources. To better balance model performance and resource consumption, we adopt the adaptive gradient sparsification method, modeled on the adaptive learning rate method [15]. Instead of using a fixed compression ratio, we adaptively adjust the compression rate according to the model performance observed during training. The experiments in Section 5.3 show that our AGS performs better than the fixed compression ratio approach (GS).
In AGS, we use the accuracy value as the model performance metric. Industrial agent $k$ computes the local accuracy value, denoted by $\alpha_k$. The local accuracy values $\alpha_k$ are sent from each industrial agent to the cloud server, and the server calculates the average of the accuracy values, which can be expressed as
We adjust the sparsification rate φ when the value of $\alpha^{glo}$ remains below the highest accuracy for τ consecutive rounds. The adjustment formula is as follows:
where $d_{rate}$ denotes the decay rate. We set the value of $d_{rate}$ to 0.001 in this paper based on experimentation. The value of τ affects the sensitivity of the proposed scheme. We set the value of τ to 2 in this paper based on experiments. The cloud server sends the sparsification rate φ to all industrial agents. After the agents receive φ, each agent calculates the absolute value abs($v_t^{a,k}$) of the local model parameters $v_t^{a,k}$ and then sorts the values of abs($v_t^{a,k}$) from smallest to largest, setting the value at the φ% position as the local sparsity threshold θ for the $t$-th iteration.
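The accuracy-driven adjustment of φ can be sketched as a small server-side controller. Note the assumptions: the adjustment formula itself is not reproduced in this text, so the multiplicative decay φ ← φ · (1 − d_rate) used here is a placeholder chosen for illustration only, and the class and attribute names are hypothetical. What follows the text is the averaging of the agents' accuracies, the patience τ = 2, and the decay rate 0.001.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AdaptiveSparsityController:
    phi: float                     # current sparsification rate (fraction dropped)
    decay_rate: float = 0.001      # d_rate in the text
    patience: int = 2              # tau in the text
    best_accuracy: float = float("-inf")
    rounds_below_best: int = 0

    def step(self, agent_accuracies: Sequence[float]) -> float:
        """Update phi from the agents' validation accuracies for one round."""
        if not agent_accuracies:
            raise ValueError("need at least one accuracy value")
        mean_accuracy = sum(agent_accuracies) / len(agent_accuracies)
        if mean_accuracy < self.best_accuracy:
            self.rounds_below_best += 1
            if self.rounds_below_best >= self.patience:
                # ASSUMPTION: multiplicative decay; the paper's exact rule
                # is not available in this text.
                self.phi *= 1.0 - self.decay_rate
                self.rounds_below_best = 0
        else:
            self.best_accuracy = mean_accuracy
            self.rounds_below_best = 0
        return self.phi


controller = AdaptiveSparsityController(phi=0.9)
controller.step([0.80, 0.90])   # new best accuracy, phi unchanged
controller.step([0.70, 0.70])   # first round below the best
controller.step([0.70, 0.70])   # second consecutive round below: phi decays
```

Lowering φ over time matches the reported behaviour that EFedID starts with a higher sparsification rate in the early rounds.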
Each industrial agent updates only the model parameters $v_t^{a,k}$ whose absolute value exceeds θ instead of all model parameters.
where sparse($v_t^{a,k}$) denotes the sparse model parameters. Θ[·] denotes the identity function that is equal to $v_t^{a,k}$ if the condition is satisfied and zero otherwise.
The rate of agent $k$ can be computed as
where the total number of sparse($v^{a,k}$) is denoted by |sparse($v^{a,k}$)| and the total number of $v^{a,k}$ is denoted by |$v^{a,k}$|. Finally, the cloud server receives the encrypted sparse model parameters Enc(sparse($v^{a,k}$)) and computes the sparse global model parameters Enc(sparse($v^{glo}$)) according to equation (5).
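The thresholding step on the agent side can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the function name is hypothetical, φ is taken as a fraction in [0, 1), the threshold θ is the magnitude at the φ position of the ascending sort, and the returned kept fraction (nonzero entries over all entries) is an assumed form of the per-agent rate, whose formula is not shown in this text.

```python
from typing import List, Sequence, Tuple


def sparsify(values: Sequence[float], phi: float) -> Tuple[List[float], float]:
    """Zero every entry whose magnitude does not exceed the phi-position threshold."""
    if not 0.0 <= phi < 1.0:
        raise ValueError("phi must lie in [0, 1)")
    if not values:
        raise ValueError("values must not be empty")
    magnitudes = sorted(abs(v) for v in values)      # smallest to largest
    drop = min(len(magnitudes), max(0, round(phi * len(magnitudes))))
    # With drop == 0 nothing is removed, so use a threshold below every magnitude.
    theta = magnitudes[drop - 1] if drop > 0 else -1.0
    sparse = [v if abs(v) > theta else 0.0 for v in values]
    kept_fraction = sum(1 for v in sparse if v != 0.0) / len(values)
    return sparse, kept_fraction


params = [0.05, -0.9, 0.1, 0.4, -0.02, 0.3, -0.7, 0.01, 0.2, -0.15]
sparse_params, kept = sparsify(params, phi=0.8)
# Only the two largest-magnitude entries (-0.9 and -0.7) survive; kept == 0.2.
```

Ties at the threshold are dropped together, so the kept fraction can fall slightly below 1 − φ when many parameters share the same magnitude.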
The overall process is shown in Algorithm 2. Next, we analyze the resource consumption of the AGS algorithm. $U_A$ denotes the number of agents, the total number of model parameters is denoted by $M$, and each model parameter takes up 4 bits.
The dropout layer with a dropout rate of 0.5 is adopted to control overfitting. Finally, we use the softmax layer that transforms the MLP output from exponential to probabilistic form. The cross-entropy function is used as the loss function, and the formula is as follows:
where $y_{i,j}$ represents the true label of the $i$-th sample, $\hat{y}_{i,j}$ represents the probability label of the $i$-th sample classified by the softmax layer, $J$ denotes the number of categories, and $B$ denotes the size of the batch data.
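As a concrete illustration of the softmax layer and the loss, the sketch below computes a batch-averaged cross-entropy in plain Python. It assumes the standard form, the negative mean over the B samples of the sum over the J classes of y·log(ŷ); the paper's own formula is not reproduced in this text, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits: Sequence[Sequence[float]],
                  batch_onehot: Sequence[Sequence[float]]) -> float:
    """Mean cross-entropy over a batch of one-hot labelled samples."""
    if len(batch_logits) != len(batch_onehot) or not batch_logits:
        raise ValueError("need a non-empty batch with one label row per sample")
    loss = 0.0
    for logits, onehot in zip(batch_logits, batch_onehot):
        probs = softmax(logits)
        # Clamp to avoid log(0) when a probability underflows.
        loss -= sum(y * math.log(max(p, 1e-12)) for y, p in zip(onehot, probs))
    return loss / len(batch_logits)


# A two-class sample with equal scores is maximally uncertain: loss = ln 2.
uncertain_loss = cross_entropy([[0.0, 0.0]], [[1.0, 0.0]])
```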

Model Training.
Model training is performed on the industrial agent side. At round $t$, the local models of all industrial agents are initialized to the global model $w_t^{glo}$. Then, each industrial agent trains our designed intrusion detection model locally on its private data resource $D_k$. $w_{t+1}^{a,k}$ is updated locally by the MGD optimizer, which adjusts the weights of the model to reduce the cross-entropy function value.
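A single local update of this kind can be sketched as below. The update equations are not shown in this text, so the sketch assumes one common form of momentum gradient descent (velocity = momentum · velocity + gradient, then weights = weights − learning rate · velocity); the function name is hypothetical, and the default learning rate of 0.05 and momentum rate of 0.5 are the values reported in the experimental setup.

```python
from typing import List, Sequence, Tuple


def mgd_step(weights: Sequence[float],
             velocity: Sequence[float],
             grads: Sequence[float],
             lr: float = 0.05,
             momentum: float = 0.5) -> Tuple[List[float], List[float]]:
    """One momentum gradient descent step (assumed heavy-ball form)."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_weights = [w - lr * v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy objective f(w) = w^2 with gradient 2w, starting from w = 1.
w, v = [1.0], [0.0]
for _ in range(2):
    w, v = mgd_step(w, v, [2.0 * w[0]])
# After two steps w[0] is approximately 0.76.
```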

CKKS Cryptosystem-Based Secure Communication Protocol.
Now, we introduce our designed secure communication protocol based on CKKS [29]. Compared with the Paillier homomorphic encryption algorithm, CKKS has great advantages in computing speed [30,31], which is very helpful in improving the operation speed of federated learning. It is worth noting that TLS/SSL is used in our protocol to create secure communication channels, which helps reduce the risk of potential external adversary attacks during parameter transmission between the cloud server and industrial agents. Our CKKS cryptosystem-based secure communication protocol contains a total of four functions: KeyGen, CkkEnc, CkkAgg, and CkkDec. The detailed algorithms are as follows:
(1) KeyGen(λ): The key generation center generates the public key and the private key by executing KeyGen(λ). The key generation center is given the security parameter λ. It selects a prime $p$ and integers $q_0$, $L$, and τ, and sets $q_l = p^l \cdot q_0$, where $l = 1, 2, \ldots, L$. The parameter $N = N(\lambda, q_L)$ and the $B$-bounded error distribution $\chi = \chi(\lambda, q_L)$ are selected appropriately as parameters. Next, a random vector $s \leftarrow HWT(h)$ is chosen to generate the secret key $sk$, where $HWT(h)$ is the set of signed binary vectors in $\{-1, 0, 1\}^N$ with Hamming weight $h$. In addition, random values $A \leftarrow \mathbb{Z}_{q_L}^{N \times \tau}$ and $e \leftarrow \chi^{\tau}$ are selected to generate the public key $pk = (-As + e \pmod{q_L}, A)$. Finally, the key pairs are distributed to industrial agents by the key generation center.
(2) CkkEnc(sparse($v^{a,k}$), pk): The CkkEnc function is executed by the industrial agents to encrypt the local model parameters before uploading them to the cloud server. It randomly selects a vector $r \leftarrow \{0, 1\}^{\tau}$ and obtains the ciphertext of the local model parameters CkkEnc(sparse($v^{a,k}$)). It is formulated as
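To illustrate why ciphertexts of this kind can be summed by the server without decryption, the toy sketch below implements a heavily simplified, symmetric-key, RLWE-style scheme with fixed-point encoding. It is not CKKS and not the protocol above: it uses a secret key instead of a public key, packs values directly into polynomial coefficients, supports addition only, and its parameters (N = 8, q = 2^40, scale 2^20) are far too small to be secure. All names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

N = 8              # toy ring degree (insecure, illustration only)
Q = 1 << 40        # toy ciphertext modulus
SCALE = 1 << 20    # fixed-point scaling factor

Poly = List[int]
Ciphertext = Tuple[Poly, Poly]


def poly_add(a: Poly, b: Poly) -> Poly:
    return [(x + y) % Q for x, y in zip(a, b)]


def poly_mul(a: Poly, b: Poly) -> Poly:
    """Multiply in Z_Q[X]/(X^N + 1), i.e. a negacyclic convolution."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                out[k] = (out[k] + ai * bj) % Q
            else:
                out[k - N] = (out[k - N] - ai * bj) % Q   # X^N = -1
    return out


def keygen(rng: random.Random) -> Poly:
    """Ternary secret key."""
    return [rng.choice((-1, 0, 1)) for _ in range(N)]


def encrypt(values: Sequence[float], s: Poly, rng: random.Random) -> Ciphertext:
    if len(values) > N:
        raise ValueError("too many values for the toy ring degree")
    padded = list(values) + [0.0] * (N - len(values))
    m = [round(v * SCALE) % Q for v in padded]            # fixed-point encoding
    a = [rng.randrange(Q) for _ in range(N)]
    e = [rng.choice((-1, 0, 1)) for _ in range(N)]        # small noise
    a_s = poly_mul(a, s)
    c0 = [(mi + ei - asi) % Q for mi, ei, asi in zip(m, e, a_s)]
    return c0, a


def add(ct1: Ciphertext, ct2: Ciphertext) -> Ciphertext:
    """Homomorphic addition: add the ciphertext components."""
    return poly_add(ct1[0], ct2[0]), poly_add(ct1[1], ct2[1])


def decrypt(ct: Ciphertext, s: Poly, length: int) -> List[float]:
    noisy = poly_add(ct[0], poly_mul(ct[1], s))           # equals m + e mod Q
    centered = [c - Q if c > Q // 2 else c for c in noisy[:length]]
    return [c / SCALE for c in centered]


rng = random.Random(0)
secret = keygen(rng)
ct_sum = add(encrypt([0.5, -1.25], secret, rng), encrypt([0.25, 2.0], secret, rng))
approx_sum = decrypt(ct_sum, secret, 2)   # close to [0.75, 0.75], up to small noise
```

The decrypted sum is only approximate, which mirrors the approximate arithmetic of CKKS. A real deployment would rely on a vetted homomorphic encryption library rather than code like this.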

Analysis
In this section, we present the security analysis and functionality analysis of our EFedID scheme.

Theorem. Our proposed method can guarantee that no industrial agents' data information will be leaked if the CKKS cryptosystem-based secure communication protocol is secure against chosen-plaintext attacks (CPA) and all components involved in the system are noncolluding.
Proof. Under CPA security, we assume that there is an adversary in our system who intercepts the ciphertext of all the model parameters. However, in the absence of collusion, the adversary has no way to obtain the private key sk needed to decrypt the ciphertext. The key generation center will not distribute key pairs to entities other than agents. So, the adversary cannot infer the true value of the model parameters. Against eavesdroppers in the process of parameter transmission, we have employed TLS/SSL (Transport Layer Security/Secure Sockets Layer) to establish separate communication channels between the server and each agent, which provides additional protection for the system. Moreover, each industrial agent cannot obtain the model parameters of other agents because each industrial agent uses a separate channel to communicate with the cloud server. Therefore, in our system, we can ensure the privacy of the parameters of all models and the privacy of the industrial agents' data. □

4.2. Functionality Analysis.
Now we compare the functionality of the latest FL deep learning models as shown in Table 2. MFL [27] uses the momentum term to accelerate convergence but does not take privacy into account. DPFL [32] and PFL [33] do not use the momentum term to speed up convergence, although privacy protection is considered. Compared with these schemes, our proposed EFedID uses AGS to reduce resource consumption while considering data privacy protection and model convergence rate during the training process.

Datasets and Data Preprocessing.
We select three network intrusion detection datasets from different domains to validate the effectiveness of our scheme. The KDD CUP 99 [34] dataset is extracted based on packet traces from military network environments, and it is one of the most widely used datasets in the field of intrusion detection. The NSL-KDD [35] dataset is an updated version of the KDD CUP 99, which eliminates redundant data and selects a number of records from each difficulty level group that is inversely proportional to the percentage of records in the original KDD CUP 99 dataset. Therefore, the classification rates of different machine learning methods vary over a wider range, which makes the accurate assessment of different learning techniques more effective. In the recent literature [20,27], many researchers use the NSL-KDD dataset as a valid baseline dataset, which can help researchers to compare different intrusion detection methods. CICIDS 2017 [36] was collected by the Canadian Institute for Cybersecurity in 2017 and contains benign traffic and up-to-date common attacks that resemble real-world data. Therefore, we decide to use the above three datasets to test the benchmark performance of our approach. The number of data records and characteristics in different datasets are given below:
We assume that the data are independently and identically distributed (IID). The data are shuffled and
Next, we use one-hot encoding to encode the three features of "protocol_type," "service," and "flag." This operation converts the data into numerical values, which is convenient for neural network processing. In addition, we use Min-Max normalization to scale the samples to the range of 0-1.
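The two preprocessing operations can be sketched in a few lines of plain Python. The helper names and the sample values are hypothetical and serve only to illustrate one-hot encoding of a categorical feature such as "protocol_type" and Min-Max scaling of a numeric column.

```python
from typing import List, Optional, Sequence, Tuple


def one_hot(values: Sequence[str],
            categories: Optional[Sequence[str]] = None) -> Tuple[List[List[float]], List[str]]:
    """Encode each categorical value as a 0/1 indicator vector."""
    cats = sorted(set(values)) if categories is None else list(categories)
    index = {c: i for i, c in enumerate(cats)}
    encoded = [[1.0 if index[v] == i else 0.0 for i in range(len(cats))]
               for v in values]
    return encoded, cats


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Rescale a numeric column linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                       # constant column: map everything to 0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


protocol_encoded, protocol_categories = one_hot(["tcp", "udp", "icmp", "tcp"])
# protocol_categories == ["icmp", "tcp", "udp"]; "tcp" maps to [0.0, 1.0, 0.0]
duration_scaled = min_max_scale([0.0, 5.0, 10.0, 2.0])
# duration_scaled == [0.0, 0.5, 1.0, 0.2]
```

In a federated setting the category list and the minimum and maximum would normally be fixed from training data and reused for test data, so that all agents produce vectors of the same shape.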

Models.
We design experiments to study the performance of the proposed EFedID for multiclass classification on the NSL-KDD, KDD CUP 99, and CICIDS 2017 datasets. We use our designed CNN-based intrusion detection model as the local model of industrial agents. A dropout layer with a dropout rate of 0.5 is used between the first fully connected layer and the second fully connected layer to control overfitting. Rectified linear unit (ReLU) is used as the activation function of each hidden layer. A momentum gradient descent (MGD) optimizer with a momentum rate of 0.5 is adopted to train models. The loss function is the cross-entropy cost function. The mini-batch size, the aggregation round, the learning rate, and the decay rate $d_{rate}$ used in training the networks are 512, 1500, 0.05, and 0.001, respectively.

Performance Metrics.
In this paper, we use the accuracy, precision, recall, F1-score, and compression rate to evaluate the performance of the different methods: (1) accuracy: the proportion of correctly classified samples to the total number of samples; (2) precision: the proportion of records predicted to be of a category that indeed belong to that category; (3) recall: the proportion of records of a category that are correctly predicted as that category; (4) F1-score: the harmonic mean of precision and recall, with a maximum value of 1 and a minimum value of 0; and (5) compression rate: the proportion of the number of the transmission parameters, computed as CR =
where N denotes the aggregation rounds.
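The classification metrics can be made concrete with a short sketch. The function names and the toy labels are hypothetical; precision, recall, and F1-score are computed per class from the usual true-positive, false-positive, and false-negative counts.

```python
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall_f1(y_true: Sequence[str], y_pred: Sequence[str],
                        label: str) -> Tuple[float, float, float]:
    """Per-class precision, recall, and F1 (0.0 when a denominator is zero)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


truth = ["normal", "dos", "dos", "probe", "normal", "dos"]
preds = ["normal", "dos", "normal", "probe", "normal", "dos"]
overall_accuracy = accuracy(truth, preds)                       # 5 of 6 correct
dos_precision, dos_recall, dos_f1 = precision_recall_f1(truth, preds, "dos")
# For "dos": precision 1.0, recall 2/3, F1 0.8
```

For a multiclass report, the per-class values can be averaged over all labels (macro averaging) or weighted by class frequency.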

Case 1.
In this experiment, we conduct a comprehensive evaluation by comparing our proposed EFedID approach with three other prominent methods: centralized learning (CL), privacy-preserving federated learning (PFL), and PFL-GS using a fixed sparsification rate of 0.8. The comparison is performed across various dimensions to assess the efficiency and effectiveness of EFedID.

Accuracy.
Accuracy is a critical metric for evaluating intrusion detection models. As illustrated in Figures 4(a), 5(a), and 6(a), our EFedID initially exhibits lower accuracy in the early aggregation rounds due to its higher sparsification rate during this phase. Then, with the increase of the number of aggregation rounds, the accuracy of EFedID steadily improves, eventually converging to levels closely matching those of PFL. Further insights are provided in Figures 7-9, which present histograms derived from experimental results.
Notably, the centralized learning (CL) approach attains the highest accuracy, achieving 84.33%, 97.35%, and 95.85% on different datasets, making it the top-performing model among the four learning approaches. For the purpose of baseline comparison, we chose PFL. Importantly, it is worth highlighting that EFedID and PFL-GS (φ = 0.8) demonstrate comparable model performance to PFL. This result underscores that EFedID does not affect model performance on different datasets.

Resource Costs.
Resource consumption is a critical concern in practical deployments. Figures 4(c), 5(c), and 6(c) show transmission parameter curves on different datasets, while Figures 7-9 present histograms of these transmission parameters. The number of transmission parameters plays an important role in determining cryptographic computation and communication overheads. Observing Figures 4(c), 5(c), and 6(c), it becomes evident that the number of transmission parameters in PFL significantly increases with each iterative round across diverse datasets. In contrast, the transmission parameter count in PFL-GS exhibits a more gradual increment. The histograms of the experimental results clearly illustrate the efficacy of our GS method, reducing the transmission parameter count by a factor of 5 (φ = 0.8). Moreover, EFedID further enhances compression rates by 1.92 times, 1.78 times, and 1.23 times on different datasets. This observation underscores the AGS method's capability to further reduce resource consumption while maintaining model performance.
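The factor of 5 for a fixed rate can be checked with a short calculation. It assumes that a fixed sparsification rate φ drops a fraction φ of the M parameters in every one of the N rounds, that ties at the threshold are negligible, and that the compression factor is the ratio of dense to sparse transmitted parameters (an assumed reading of the compression rate).

```latex
\[
\mathrm{CR}_{\text{fixed}}
  = \frac{N \cdot M}{N \cdot (1-\varphi)\,M}
  = \frac{1}{1-\varphi},
\qquad
\varphi = 0.8 \;\Rightarrow\; \mathrm{CR} = 5,
\qquad
\varphi = 0.9 \;\Rightarrow\; \mathrm{CR} = 10 .
\]
```

Both values agree with the factors reported for PFL-GS at φ = 0.8 and φ = 0.9. An adaptive scheme whose rate changes over the rounds falls outside this closed form, since each round then contributes its own (1 − φ) share.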

Case 2.
In this section, we examine the validity of our AGS method by comparing our EFedID approach with PFL using different fixed sparsification rates (φ = 0.8 and φ = 0.9). Figure 10 provides an overview of the experimental curves, while Figure 11 presents detailed experimental results on the NSL-KDD dataset. An initial observation reveals that EFedID's accuracy convergence appears slower than that of PFL-GS (φ = 0.9) and PFL-GS (φ = 0.8) in the early rounds of aggregation. This divergence can be attributed to the comparatively higher sparsification rate employed by EFedID during these initial aggregation rounds. Figure 10(c) underscores this point by showing that the number of transmission parameters gradually increases with iterative rounds in PFL-GS (φ = 0.8). However, when compared to PFL-GS (φ = 0.8), both EFedID and PFL-GS (φ = 0.9) exhibit a more gradual increase in the transmission parameter count. From Figure 11, we observe that the accuracy of PFL-GS (φ = 0.8) reaches 84.45%, surpassing the other two learning approaches, with EFedID closely following. Conversely, the lowest accuracy, 83.07%, is attributed to PFL-GS (φ = 0.9), underscoring the significant impact of the sparsification rate on model accuracy. Regarding the reduction in the number of transmitted parameters, PFL-GS (φ = 0.9) achieves a remarkable compression ratio of 10 times, followed by EFedID at 9.6 times. However, EFedID outperforms PFL-GS (φ = 0.9) in terms of accuracy. Notably, EFedID manages to halve the number of transmitted parameters compared to PFL-GS (φ = 0.8).
To validate our AGS method further, we extend our analysis to the KDD CUP 99 and CICIDS 2017 datasets, as illustrated in Figures 12-15. Remarkably, similar trends emerge across these datasets, affirming the consistent improvement in training efficiency achieved by our adaptive sparsification rate method compared to the fixed sparsification rate approach.
In conclusion, our results underscore the effectiveness of our AGS method in enhancing training efficiency, particularly when compared to fixed sparsification rate methods. EFedID's adaptability to different datasets and its ability to maintain accuracy while significantly reducing resource consumption make it a promising choice for privacy-preserving federated learning in various applications.

Case 3.
In this subsection, we explore the adaptability of our EFedID method to a distributed computing environment and investigate the influence of the momentum rate (c) on its convergence rate.

Varying the Number of Agents N.
We begin by varying the number of industrial agents (N) to assess how EFedID performs under different distributed scenarios. The curves of EFedID for N ranging from 10 to 50 are depicted in Figures 16-18, derived from simulations. As illustrated in Figures 16(a), 17(a), and 18(a), as the number of aggregation rounds increases, all accuracy curves of EFedID gradually improve and converge to similar accuracy levels by the end of the training process. Complementary insights are provided in Figures 19-21, which present histograms based on the experimental results. Notably, EFedID with N = 20 and N = 50 performs comparably to the baseline scenario with N = 10 (chosen as a reference point). This observation suggests that small variations in the number of agents (N) do not significantly affect EFedID's performance. This robustness underscores the adaptability of our proposed scheme to edge-computing industrial environments.
In summary, our EFedID method's consistent performance across varying numbers of agents demonstrates its suitability for a distributed computing environment. Additionally, our investigation into the impact of the momentum rate (c) on the convergence rate will provide further insights into EFedID's optimization potential in such environments.

Impact of c.
In this subsection, we explore the influence of the momentum rate (c) on the performance of our EFedID method, as illustrated in Figures 22-24. As observed in the figures, the convergence rates gradually increase as c is adjusted from 0 to 0.7. This trend indicates that the incorporation of the momentum term enhances the convergence speed of EFedID. Notably, the sparsification rate experiences a slightly faster decay as c increases. This behavior can be attributed to the acceleration of adaptive adjustments caused by the model rapidly approaching the adjustment threshold. To provide detailed insights into EFedID's performance with different c values, we present the results in Figures 25-27. In this evaluation, we use EFedID with c = 0 as the baseline for comparison. On the NSL-KDD dataset, EFedID with c = 0.7 achieves the highest accuracy at 0.8450, followed by EFedID with c = 0.5 (accuracy of 0.8431) and EFedID with c = 0.3 (accuracy of 0.8414). Notably, all models with nonzero c values outperform the baseline model with c = 0, highlighting the positive impact of the momentum term on accuracy. A similar trend is observed on the KDD CUP 99 dataset, where EFedID with c = 0.7 attains the highest accuracy of 0.9735, followed by EFedID with c = 0.5 (accuracy of 0.9712) and EFedID with c = 0.3 (accuracy of 0.9706). Once again, all models with varying c values

CNN [37] 92.14
Three-stage SMOTE-GAN-VAE [39] 94.0
NIDS-CNNLSTM [38] 97.05
EFedID 97.3

95.51
Bold values represent the best results of the several compared models.

Performance Comparison with Other Latest Methods.
In the same way, we compare the performance of EFedID with other recent state-of-the-art methods. The CNN [37] and NIDS-CNNLSTM [38] models were both trained using the KDDTrain+ dataset and tested using the KDDTest+ dataset. The SMOTE-GAN-VAE model [39] was evaluated on the NSL-KDD and CICIDS 2017 datasets, respectively. Muli-CNN [40] and SCAE + SVM [41] reported their performance on the NSL-KDD dataset. FL [42] and CPIO [43] reported their performance on the CICIDS 2017 dataset. Table 3 presents the experimental results of the compared methods. It can be seen that the proposed EFedID model performs better than the other state-of-the-art methods in multivariate classification.

Conclusion
In this article, we proposed EFedID, which allows distributed industrial agents to collaboratively train a network intrusion detection model. The AGS algorithm reduces the resource overheads of the FL system while preserving model performance. Moreover, we demonstrated the preservation of data privacy for industrial agents through a secure communication protocol based on the CKKS cryptosystem. The experimental results show that our proposed method can efficiently organize multiple industrial agents to collaboratively train a network intrusion detection model and protect the industrial agents' data.
Since the training data are collected by each client in its own local environment and follow its usage patterns, the size and distribution of the local datasets often differ. Non-IID (not independent and identically distributed) data therefore reflect practical scenarios for privacy-preserving federated learning models. In the future, we will also focus on investigating how to adjust the model aggregation interval to further improve training efficiency for privacy-preserving federated learning on non-IID data. This will allow us to continue pushing the boundaries of secure and efficient collaborative learning in diverse industrial settings.

(2) Cloud server: the cloud server performs three functions: (a) it establishes different communication channels for the industrial agents, (b) it collects the trained models from the industrial agents and then aggregates the models to obtain a new global model, and (c) it adjusts the sparsification rate φ.
(3) Industrial agents: each industrial agent collects and stores the raw data. The agents are responsible for training a model locally and uploading the trained model to the cloud server until the end of training. In our system, each agent sparsifies the model parameters before sending them.
denotes the number of data samples of agent k, and |D_all| denotes the number of data samples of all agents. The global sparsified model parameters are obtained by adding all encrypted sparsified model parameters according to the data contribution ratios.
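The weighted aggregation described here can be sketched in a few lines. The example below operates on plaintext lists purely for illustration. In the scheme itself the sum is taken over encrypted sparsified parameters, and the function and variable names are assumptions rather than the authors' code.

```python
from typing import List, Sequence


def aggregate(local_models: Sequence[Sequence[float]],
              data_sizes: Sequence[int]) -> List[float]:
    """Weighted sum of local parameter vectors using data contribution ratios."""
    if not local_models or len(local_models) != len(data_sizes):
        raise ValueError("need one data size per local model")
    total = sum(data_sizes)  # plays the role of |D_all|
    global_model = [0.0] * len(local_models[0])
    for params, size in zip(local_models, data_sizes):
        weight = size / total  # contribution ratio of this agent
        for i, value in enumerate(params):
            global_model[i] += weight * value
    return global_model


# Two hypothetical agents holding 100 and 300 samples.
# Zeros stand for entries removed by sparsification.
models = [[1.0, 0.0, 2.0], [0.0, 4.0, 2.0]]
sizes = [100, 300]
print(aggregate(models, sizes))
```

Because the aggregation uses only additions and multiplications by known scalars, it is the kind of computation that an additively and multiplicatively homomorphic scheme such as CKKS can evaluate on ciphertexts.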
3.4.1. Communication Cost. According to our algorithm, the communication cost mainly arises from data transmission. In each training round, the industrial agents send their sparsified local models and local accuracy information to the cloud server, which incurs a communication cost of $[(1 - \varphi)M + 1] \cdot 4U_A$ bits. The cloud server then returns the aggregated global model and the updated sparsification rate φ, which is of size $4(M + 1)$ bits. Thus, the communication cost is about $4[U_A M(1 - \varphi) + M + U_A + 1]$ bits per training round.
3.4.2. Computational Cost. Before uploading the local model, the industrial agent needs to perform a sparsification operation on the local model parameters. The time complexity of obtaining the absolute value of the local model parameters $v_t^{a,k}$ is $O(T)$, where $T$ is the dimension of the local model parameters. The time complexity of computing the sparsification threshold is $O(T \log T)$. The time complexity of selecting the updated model parameters is $O(T)$. The total time complexity of the AGS algorithm is $O(T \log T + 2T)$.
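The three sparsification steps and the per-round cost formula can be illustrated with a short sketch. It is not the authors' AGS implementation: it uses a fixed rate φ and a sort-based threshold, and it assumes that M is the number of model parameters and U_A the number of industrial agents. All function names are hypothetical.

```python
from typing import List


def sparsify(params: List[float], phi: float) -> List[float]:
    """Zero out all but the largest-magnitude (1 - phi) fraction of entries."""
    if not 0.0 <= phi < 1.0:
        raise ValueError("phi must lie in [0, 1)")
    if not params:
        return []
    magnitudes = [abs(v) for v in params]                   # O(T)
    keep = max(1, round(len(params) * (1.0 - phi)))         # entries to retain
    threshold = sorted(magnitudes, reverse=True)[keep - 1]  # O(T log T)
    # Ties at the threshold are all kept, so slightly more than `keep`
    # entries may survive when magnitudes repeat.
    return [v if abs(v) >= threshold else 0.0 for v in params]  # O(T)


def round_cost_bits(num_params: int, num_agents: int, phi: float) -> float:
    """Per-round cost following the expressions in Section 3.4.1."""
    upload = ((1.0 - phi) * num_params + 1) * 4 * num_agents
    download = 4 * (num_params + 1)
    return upload + download


update = [0.1, -0.9, 0.05, 0.4, -0.3, 0.02, 0.7, -0.15, 0.25, -0.6]
print(sparsify(update, phi=0.8))   # only -0.9 and 0.7 remain nonzero
print(round_cost_bits(num_params=1000, num_agents=10, phi=0.8))
```

The sort dominates the running time, which is where the O(T log T) term in the complexity estimate comes from.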

3.5. The CNN-Based Intrusion Detection Models. Now, we describe our designed intrusion detection model in detail.
3.5.1. Model Structure. Figure 3 shows the CNN-based intrusion detection model structure, which consists of a CNN module, an MLP (multilayer perceptron) module, and a softmax layer. The input data are reshaped into a 5 × 5 square, as expected by the CNN model. The CNN module contains two convolutional blocks, and each convolutional block contains one convolutional layer, one batch normalization layer, and one max-pooling layer. The activation function of each hidden layer is the rectified linear unit (ReLU). Cblock1 in the CNN module extracts a 5 × 5 feature map as the input of Cblock2. Cblock2 extracts a 5 × 5 feature map with 16 channels and inputs it into the MLP module. The MLP module, which contains two fully connected layers and one dropout layer, is used to predict classes.
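A quick shape trace shows how both convolutional blocks can keep the 5 × 5 spatial size and how many features reach the MLP module. The source does not state kernel sizes, strides, or padding. The values below (3 × 3 windows, stride 1, padding 1 for both the convolution and the max-pooling layers) are assumptions, chosen as one configuration that preserves the 5 × 5 size.

```python
def window_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Standard output-size formula for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical hyperparameters; not given in the text.
KERNEL, STRIDE, PADDING = 3, 1, 1


def block_out(size: int) -> int:
    """Spatial size after one block: convolution, batch norm, max pooling."""
    size = window_out(size, KERNEL, STRIDE, PADDING)  # convolutional layer
    # Batch normalization and ReLU leave the shape unchanged.
    size = window_out(size, KERNEL, STRIDE, PADDING)  # max-pooling layer
    return size


side = 5                        # input reshaped to 5 x 5
after_c1 = block_out(side)      # Cblock1 output side length
after_c2 = block_out(after_c1)  # Cblock2 output side length
flat = 16 * after_c2 * after_c2  # 16 channels flattened for the MLP module

print(after_c1, after_c2, flat)
```

With 16 channels of 5 × 5 feature maps, the first fully connected layer would receive 400 input features under these assumptions.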

Figure 3: The structure of the proposed CNN-based intrusion detection model.

Table 1: Summary of acronyms used.

Table 2: Comparison of functionality with the latest FL models.

partitioned to each industrial agent. Before model training, the industrial agents in the FL system process their local data to generate training and testing samples. First, the random forest algorithm is used to perform feature analysis on the data, and the top 25 features are selected for model training.
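The selection of the top 25 features pairs naturally with a 5 × 5 model input, since 25 values fill a 5 × 5 grid. The sketch below assumes that per-feature importance scores are already available, for example from a trained random forest. The scores used here are placeholders rather than real importances, and the helper names are hypothetical.

```python
from typing import List, Sequence


def select_top_features(importances: Sequence[float], k: int = 25) -> List[int]:
    """Indices of the k highest-scoring features, kept in original column order."""
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def to_square(sample: Sequence[float], indices: Sequence[int],
              side: int = 5) -> List[List[float]]:
    """Pick the selected features from one record and lay them out as a grid."""
    if len(indices) != side * side:
        raise ValueError("number of selected features must equal side * side")
    picked = [sample[i] for i in indices]
    return [picked[r * side:(r + 1) * side] for r in range(side)]


# Placeholder scores for a 41-feature record (the NSL-KDD feature count).
scores = [float((i * 7) % 41) for i in range(41)]
record = [float(i) for i in range(41)]

top25 = select_top_features(scores)
grid = to_square(record, top25)
print(len(top25), len(grid), len(grid[0]))
```

Keeping the selected indices in their original column order is a design choice made here for reproducibility; the source does not say how the 25 features are arranged in the grid.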

Table 3: The accuracy of EFedID and the other latest models.