A Deep Intelligent Attack Detection Framework for Fog-Based IoT Systems

Fog computing provides a multitude of services to end-based IoT systems. End IoT devices exchange information with fog nodes and the cloud to handle client tasks. During data exchange between the fog layer and the cloud, IoT end devices can be compromised by crucial attacks such as DDoS and many other security attacks. These network (NW) threats must be spotted early. Deep learning (DL) plays a prominent role in predicting end-user behaviour by extracting features and classifying the adversary in the network. However, because of the constrained nature of IoT devices in computation and storage, DL cannot be run on them. Here, a framework for fog-based attack detection is proffered, and different attacks are prognosticated utilizing long short-term memory (LSTM). The behaviour of end IoT devices can be prognosticated by installing a trained LSTMDL model at the fog-node computation module. Simulations are performed in Python by comparing the LSTMDL model with deep neural multilayer perceptron (DNMLP), bidirectional LSTM (Bi-LSTM), gated recurrent units (GRU), a hybrid ensemble model (HEM), and a hybrid deep learning model (CNN + LSTM) comprising a convolutional neural network (CNN) and LSTM, on the DDoS-SDN (Mendeley Dataset), NSLKDD, UNSW-NB15, and IoTID20 datasets. To evaluate the performance of the binary classifier, metrics like accuracy, precision, recall, F1-score, and ROC-AUC curves are considered on these datasets. The LSTMDL model outperforms the others in binary classification, with accuracies of 99.70%, 99.12%, 94.11%, and 99.88% on the respective datasets. The network simulation further shows the fog-layer communication behaviour detection time (CBDT) of the different DL models. DNMLP detects communication behaviour (CB) faster than the other models, but LSTMDL predicts assaults better.


Introduction
IoT gadgets such as IoVT, IoMT, smart grids, and smart electrical appliances, inter alia, are rapidly proliferating in current technologies, and their prominence in resource sharing has led to many attacks on these devices. Physical devices like sensors and actuators give on-demand administration over the cloud, but its centralization is hazardous. Along with this, providing cloud services to IoT faces high challenges in data latency, data security, data obtrusion, and data shielding [1][2][3][4][5][6][7][8].
An abstraction layer called fog is used to offer services close to the network's edge and solve cloud-based IoT challenges. Fog, a distributed decentralized model that lies between the cloud and client devices [2], has evolved to provide services with less latency and less bandwidth utilization in the network (NW). Howbeit, the fog nodes have a low level of data privacy and are vulnerable to assaults such as probe, DDoS, man-in-the-middle, port scan attacks, and many others [9]. As a result, the fog layer needs an attack detection system. Noncellular network protocols, including LoRa, CoAP, LoRaWAN, and MQTT, are required to enable communication between the smart devices. These protocols help end users by having low latency and low bandwidth utilization [5,10]. Data collected from end devices gets exchanged and set aside for the devices using fog communication protocols for further speedy retrieval. During the exchange of data, the fog layer/fog node is more susceptible to attacks. Hence, we require a security system in the fog layer to define attack detection. This work uses the LSTMDL model to prognosticate fog-layer attacks.
Likewise, the immense expansion in the use of the web and the huge amount of data movement have caused a greater number of anomalies. In equal measure, the incidence of attacks is also increasing steadily. Many organizations are consistently working on network attack discovery to offer secure services to end users. The high utilization of cloud administration and IoT over the fog layer leads to a more elevated risk of information infringement. In this regard, we are compelled to configure a more secure system with DL algorithms that can distinguish attacks powerfully.
With the ever-expanding web, society is moving towards modern technologies, where ML and DL approaches are broadly utilized to foresee, recognize, classify, and investigate network conduct. Hence, attack detection is becoming the latest trend and research scope for cyber threats.
Due to geo-distribution and location awareness, the fog layer became exacting in its nature. At first, ML strategies were heavily utilized to distinguish attacks, yet they are inadmissible for enormous magnitudes of data. To beat this limit of ML, DL is utilized in distinguishing assaults in the fog layer, as it has numerous processing layers with a high detection rate. On detection of an attacker, the fog node sends the behaviour update of the node to the cloud as malicious or nonmalicious, as well as multilabel classification [1,4,9,[11][12][13][14][15].
With a premier detection rate, DL has been used to categorize numerous attacks, producing binary classifications of typical and aberrant behaviour as well as multilabel classifications that are sent to the cloud for node behaviour updates [1,4,9,11,12]. Because of the resource-constrained nature of IoT, it is unreasonable to expect to execute complex DL computations on end devices. Along these lines, DL is reasonable to carry out on a fog node/fog layer with high precision. In light of the enormous amount of data, DL is superior to ML algorithms. The LSTMDL model is used in this work to identify a security attack on an IoT application based on fog nodes.
This work's accomplishments are as follows:
(1) For fog-based IoT systems, a deep intelligent attack detection framework is suggested in this paper. The framework uses the LSTMDL model to find NW security breaches.
(2) The LSTMDL model is set up in a fog node's compute module to analyze the behaviour of end IoT devices. To choose the most accurate DL model at the fog layer, capability balancing is performed across HEM, Bi-LSTM, GRU, CNN + LSTM, DNMLP, and LSTM.
(3) The experimentation is done using the Anaconda platform by considering the DDoS-SDN [16], NSLKDD [17], UNSW-NB15 [18], and IoTID20 [19] datasets for different attacks.
(4) From the upshots, it is found that the LSTMDL model showed finer accuracy than the other five models, and it is considered in this framework to prognosticate the attack. The LSTMDL model shows outperforming nature in binary classification with 99.70%, 99.12%, 94.11%, and 99.88% performance accuracies on experimentation with the respective datasets.
(5) The network simulation is also performed to show the performance of different DL models in presenting the behaviour detection time at the fog layer. From this upshot, it is found that DNMLP shows a smaller communication behaviour detection time (CBDT) than the other models; however, the LSTMDL model performs better in predicting the attacks. Additionally, the CBDT reduces with a rise in the number of fog nodes.
The remaining sections are organized as follows: the literature review is presented in Section 2, the system design with the network design and assault design is described in Section 3, the problem statement is discussed in Section 4, the proffered deep intelligent assault prognostication framework is described in Section 5, and the performance evaluation with the simulation setup, results, and discussion is presented in Section 6. Lastly, Section 7 presents the conclusion.

Literature Review
Numerous studies on the subject are presented in this section; they cover the best DL techniques for an assault prognostication structure for fog-based IoT systems. Samy et al. [1] proposed a framework for attack detection on several cyber-attacks using a DL technique, achieving a high detection rate of 99.65% in multiclass classification and 99.96% detection accuracy in binary classification, respectively. Lawal et al. [2] designed a framework with two modules for oddity detection using signature- and anomaly-based methods. Module 2 obtained 99% and 97% average recall, precision, and F1-score for binary and multiclass classification, respectively, using the XGBoost classifier, while showing six times lesser performance than Module 1. Puthal et al. [3] raised advanced research issues needed for fog architecture, along with the chances of threats, and discussed overcoming threats at each layer of a three-layered architecture. Sudqi Khater et al. [10] considered the ADFA-LD and ADFA-WD datasets to address problems of latency, mobility support, and location awareness on the cloud using an MLP-based lightweight IDS, which resulted in 92%, 94%, and 95% F1-measure, accuracy, and recall on the ADFA-WD dataset, and obtained 74% F1-measure, recall, and accuracy using a Raspberry Pi on the ADFA-LD dataset. Bhushan's [20] DDoS attack defence framework was evaluated on a Kali Linux machine using LOIC on TCP traffic; it allowed only legal requests to access the cloud by framing rules at the fog layer using a fog defender. Priyadarshini and Barik [21] designed a new DDoS defence model using DLMs that obstructs malicious packets transferred to the cloud, evaluated on the ISCX 2012 IDS and CTU-B Botnet datasets. The experimentation resulted in an accuracy of 98.88% with a 10-fold cross-validation scheme. Chaudhary et al. [11] surveyed the domain of computing and inspected existing work related to privacy, security, challenges, limitations, and open research directions.
Douligeris and Mitrokotsa [9] discussed elaborately the classification of DDoS attack systems and the advantages, disadvantages, and techniques of defence models. Potluri et al. [12] presented various approaches such as machine learning, deep learning, neural networks, blockchain, software-defined networks, and genetic algorithms in the cloud environment for detection and prevention mechanisms. Kalaivani and Chinnadurai [22] designed a fog computing intrusion detection model to predict attacks using CNN and LSTM on the NSL-KDD dataset with 96.5% accuracy. To protect against malicious users, the model is deployed in the fog layer and is used for predicting multiclass attack classifications. By taking into account a variety of criteria, Churcher et al. [23] performed a comparison of various machine learning techniques for binary and multilabel classification. Kilincer et al. [24] performed a comparative study using different ML algorithms on five different datasets, namely, CSE-CIC IDS-2018, UNSW-NB15, ISCX-2012, NSL-KDD, and CIDDS-001. On comparison, the decision tree classifier proved to be better than the other two classifiers, SVM and KNN. Many related research works can also be found in [25][26][27][28][29][30][31][32][33][34][35][36].
The research gaps identified from the study of distinct attack detection frameworks are as follows: (i) performance accuracy is evaluated on smaller datasets with fewer attributes, which falls behind in better attack detection; hence, we considered the newer DDoS-SDN [16] and IoTID20 [19] datasets with huge numbers of instances and attributes. (ii) Even with the increase in dataset size, most prognostications are made with conventional ML algorithms, which do not yield better accuracy for attack detection, and it becomes cumbersome to decide the best ML algorithm on the selected datasets. (iii) From observations on many datasets, only a small number of attacks are listed; hence datasets with more attacks need to be considered, which helps in better prognostication of attacks.

System Design
In this part, a framework is considered with both the NW and assault designs. The network model describes the network components, the network arrangement, and the communication between the network components. The attack model describes how the perpetrator attacks the network. The notations used in this paper are shown in Table 1.

Network Design.
The network design follows a three-layered architecture containing the cloud, fog, and IoT end devices in the top layer, amid layer, and bottom layer, respectively [1-3, 23, 24], as depicted in Figure 1. The cloud node (CN), in the upper layer, stores updated behaviours (attacker/normal) of the end devices as centralized data storage and is connected to the amid layer through a gateway (GW) and base station (BSS) using either wired or wireless communication.
The amid layer, called the secure fog layer, contains k fog nodes FN = {FN_1, FN_2, ..., FN_k} which perform computations, localized communication, and data storage for the nearby IoT end devices. They likewise record the behaviour of the devices promptly. An FN for the most part comprises a CMFN and an MMFN. The CMFN of an FN is prepared with a DL model to perform the task of anticipating the behaviours of the IoT devices that communicate with the FN in proximity. The FNs are likewise connected with one another through wired/wireless links for data communication among them. The secure fog layer is connected with the upper cloud layer through GWs and BSSs, over which communication happens using wired/wireless links. The secure fog layer is likewise connected with the lower layer through GWs and BSSs, through which communication happens.
The lowest layer, referred to as the sensing layer, is mainly composed of IoT devices iot_1, iot_2, ..., iot_l, which send enormous amounts of end-client information or requests to the fog or cloud for quick computation and service. For communication with the cloud or fog layer, the IoT devices use BSSs and GWs.

Assault Design.
A peculiarity in the NW now and again causes diversion from the ordinary flow of traffic, which leads to an assault by the attacker P_k. In a fog-based IoT environment, attackers may originate from IoT devices, protocols, applications, and software. Vulnerabilities can arise in various device parts such as the web interface, memory, and firmware. Protocols in IoT end devices, by means of communication channels and related applications and software, are also prone to security issues and attacks [20,21,37]. Figure 2 shows a typical attack sequence model. By taking possession of the IoT devices that are connected to the closest fog node, the attacker launches various attacks on the fog nodes in the fog layer.

Problem Statement
The problem is defined on a model with l IoT end devices I = {i_1, i_2, ..., i_l} communicating with k FNs, where k ≤ l, with distinct communication behaviours denoted as CB = {cb_1, cb_2, ..., cb_l}. Each cb_k has a set of communication instances CI = {ci_1, ci_2, ..., ci_m} at different time intervals TI = {t_1, t_2, ..., t_m}, where m is the number of CIs with distinct TIs between the IoT device and the FN. On communication, the IoT device and FN consider different attributes from the dataset, denoted as the set A = {a_1, a_2, ..., a_p}, to obtain the target label, which is either normal (0) or attacker (1). In this work, the main problem is to predict the behaviour of the IoT devices more accurately by training and testing on different standard datasets, implementing the DNMLP, LSTM, Bi-LSTM, GRU, CNN + LSTM, and HEM DL models.

Proffered Deep Intelligent Assault Prognostication Framework
The proffered assault prognostication framework presented here is designed to tackle the above issue. The framework principally comprises six stages: (1) network configuration/setup, (2) network's data classification setup, (3) deploying deep learning models and configuring the network, (4) identification of assault, (5) behaviour update at cloud, and (6) network update at FN. In Figure 3, the operational flow model of these six steps is shown.

Network Setup.
The cloud CN is first set up as depicted in Figure 3 and offers various kinds of services to the users. According to the framework, the cloud stores the IoT devices' behaviours and furthermore updates them in a convenient way. Then, the FNs are placed in the network in such a way that the IoT gadgets can communicate to obtain services in minimal time. The data classification setup is shown in Figure 4, the details of which are explained as follows.

Data Preprocessing.
The preprocessing of the dataset is as follows:
(1) Handling of missing values: the DL model encounters problems when a sizable fraction of the datasets utilized for classification have missing values. In the proffered framework, we dealt with the missing values by eliminating the columns or rows which have zeros or null values. Subsequently, we additionally use the mean and median techniques, supplanting the missing values with the mean or median; however, this is only employed for numeric data.
(2) Feature scaling: datasets having features of variable types and values need their features scaled to meet the specifications. Normalization and standardization are two of the most well-known methods. Put simply, the normalization method is used if the data does not have a Gaussian distribution, and the standardization method is used otherwise. The term "normalization" refers to the process of adjusting the absolute values of attributes in a dataset to create a consistent scale without affecting the relative variances between values. In the process of "standardization," the mean is lowered to zero and the standard deviation is raised to one.
(3) One-hot encoding: since the DL model cannot process categorical information, it is necessary to transform the dataset's categorical features into numeric data using one-hot encoding in order to improve prediction. The categorical data is transformed by this method into a new categorical vector, which maps to an integer, and each integer is represented by a binary vector.
Figure 4 shows the data preprocessing, training, and testing performed with the DL model in the FN.
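The three preprocessing steps above can be sketched with NumPy (a minimal, hedged sketch; the tiny matrix and the protocol labels are illustrative placeholders, not values from the actual datasets):

```python
import numpy as np

# (1) Missing values: a small synthetic feature matrix with one NaN,
# imputed with the column median (numeric data only).
X = np.array([[2.0, 10.0], [4.0, np.nan], [6.0, 30.0]])
col_median = np.nanmedian(X, axis=0)
idx = np.where(np.isnan(X))
X[idx] = np.take(col_median, idx[1])

# (2) Feature scaling: standardization (zero mean, unit standard deviation
# per column), as described for Gaussian-distributed features.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# (3) One-hot encoding of a categorical column (hypothetical protocol labels):
# each category maps to an integer, represented as a binary vector.
protocols = np.array(["TCP", "UDP", "TCP"])
categories = sorted(set(protocols))
one_hot = np.array([[1 if p == c else 0 for c in categories] for p in protocols])
```

In practice these steps would be applied column-by-column to the full dataset; scikit-learn's `SimpleImputer`, `StandardScaler`, and `OneHotEncoder` perform the same operations at scale.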

Splitting Dataset.
Here, the dataset is partitioned into training and test sets after completion of data preprocessing. The DLM is trained on the training dataset, and the model's prediction accuracy is tested on the test set. The considered datasets are partitioned in an 80 : 20 ratio.
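The 80 : 20 partition can be sketched as follows (synthetic placeholder data; a library helper such as scikit-learn's `train_test_split` would achieve the same effect):

```python
import numpy as np

# Synthetic stand-in for a preprocessed dataset: 100 rows, 5 features,
# binary target (0 = normal, 1 = attacker).
rng = np.random.default_rng(seed=42)
n = 100
X = rng.normal(size=(n, 5))
y = rng.integers(0, 2, size=n)

# Shuffle the row indices, then take the first 80% for training
# and the remaining 20% for testing.
perm = rng.permutation(n)
cut = int(0.8 * n)
train_idx, test_idx = perm[:cut], perm[cut:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```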

DLM Used for Prognostication of Attack Behaviour.
The fog nodes' CMFNs are trained on the training dataset, after partitioning, using DLMs. In the following sections, we consider various DLMs for IoT device behaviour prediction [1]. The models used are DNMLP, LSTM, Bi-LSTM, GRU, CNN + LSTM, and HEM.
(1) DNMLP. Fundamentally, an input layer, an output layer, and an arbitrarily chosen number of hidden layers make up a DNMLP architecture [10]. Except for the input layer, every neuron uses a nonlinear activation function. Information flows forward in the DNMLP, and the neurons are also trained with a backpropagation algorithm. The first step of the DNMLP computes the sum of the input values i_k multiplied by the weights w_k; in the subsequent step, the bias b is added, giving Y = Σ_k w_k i_k + b.

Computational Intelligence and Neuroscience
Now the value Y is advanced through the activation function, ReLU or Softmax, generally denoted by y: for ReLU, y = max(0, Y). This function returns zero if Y < 0, and if Y ≥ 0 the result is just the input. In the final step, the loss (Y − Ŷ)² is calculated; if it is high, it should be limited by modifying w_k and b, which is feasible with an optimizer. As a result, the cost function is calculated as Σ_{k=1}^{K} (Y_k − Ŷ_k)². We arrive at global minima using the backpropagation technique in a predetermined number of cycles, which we can consider the success of training the DNMLP.
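The forward computation and loss described above can be sketched numerically for a single neuron (illustrative weights, inputs, and target, not values from the trained model):

```python
import numpy as np

# Inputs i_k, weights w_k, and bias b (illustrative values).
i = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, 0.1])
b = 0.05

# Weighted sum of inputs plus bias.
Y = np.dot(w, i) + b

# ReLU activation: zero if Y < 0, otherwise just the input.
y = max(0.0, Y)

# Squared-error loss against a target, to be minimized by the optimizer
# via backpropagation updates of w_k and b.
target = 1.0
loss = (target - y) ** 2
```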
(2) LSTM. LSTM is specifically made to address the issue of RNNs' long-term dependencies [38]. It is employed for categorizing data and producing prognostications. A cell state, input gate, forget gate, and output gate make up each LSTM unit. It is employed in language modeling, network anomaly detection, picture captioning, and other processes. Because LSTM can retain data for a long time, it is frequently used to categorize data. A chain of LSTM units can be depicted as in Figure 5.
An LSTM cell's progressive flow is governed by the following equations, where e_t is the forget gate, a_t is the hidden state, n_t is the input gate, x_t is the cell state, y_t is the output gate, and s_t is the cell vector.
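The displayed equations are missing from this copy; the standard LSTM formulation, rewritten with the paper's symbols (a hedged reconstruction, with i_t as the input at time t), is:

```latex
\begin{aligned}
e_t &= \sigma\!\left(W_e\,[a_{t-1}, i_t] + b_e\right) && \text{(forget gate)}\\
n_t &= \sigma\!\left(W_n\,[a_{t-1}, i_t] + b_n\right) && \text{(input gate)}\\
s_t &= \tanh\!\left(W_s\,[a_{t-1}, i_t] + b_s\right) && \text{(cell vector)}\\
x_t &= e_t \odot x_{t-1} + n_t \odot s_t && \text{(cell state)}\\
y_t &= \sigma\!\left(W_y\,[a_{t-1}, i_t] + b_y\right) && \text{(output gate)}\\
a_t &= y_t \odot \tanh(x_t) && \text{(hidden state)}
\end{aligned}
```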
(3) Bi-LSTM. It stands for bidirectional LSTM and works on historical data to extract spatial features and bidirectional time dependencies [40]. It has been developed for many applications, like protein structure prediction, handwriting recognition, and speech recognition. The input sequence yields the best benefits from both former and future sequences. In this process, the first layer is given the input sequence, and the next layer is given a reverse copy as input, where the primary and secondary layers are connected to the same output layer.
(4) GRU. GRU is an RNN mechanism similar to LSTM but with no output gate [1,41]. It is considered a variant of LSTM used to overcome the vanishing gradient problem by means of an update and a reset gate. Both gates are utilized to regulate the movement of information into and out of memory. In comparison, GRU outperforms LSTM, which takes longer on large datasets, and GRU also performs better than LSTM for smaller datasets. Speech signal modeling, handwriting recognition, and polyphonic music modeling all make extensive use of GRU. The update gate (u) and the reset gate (rs) are the two gates that make up the GRU. The calculation of the u and rs gates at time t−1 is illustrated in the following equations.
(5) CNN + LSTM. It is a blended DLM intended for visual time series expectations and text-based classification, such as video description and image chaining. Figure 6 depicts the constructed CNN + LSTM model. The CNN + LSTM architecture consolidates CNN layers for feature extraction from inputs and LSTM layers for time sequence expectation. CNN + LSTM has accomplished upgrades in speech recognition over DNNs. It is utilized in visual acknowledgment and elucidation in [42].
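The GRU gate equations referenced in the GRU description above are missing from this copy; a standard formulation with the paper's symbols (a hedged reconstruction, with a_t as the hidden state and i_t as the input) is:

```latex
\begin{aligned}
u_t &= \sigma\!\left(W_u\,[a_{t-1}, i_t] + b_u\right) && \text{(update gate)}\\
rs_t &= \sigma\!\left(W_{rs}\,[a_{t-1}, i_t] + b_{rs}\right) && \text{(reset gate)}\\
\tilde{a}_t &= \tanh\!\left(W_a\,[rs_t \odot a_{t-1},\, i_t] + b_a\right) && \text{(candidate state)}\\
a_t &= (1 - u_t) \odot a_{t-1} + u_t \odot \tilde{a}_t && \text{(hidden state)}
\end{aligned}
```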
(6) HEM. In the proposed architecture, a hybrid ensemble method (Figure 7) is used for attack detection at the FN [25]. This model is constructed in three stages: data preprocessing, the hybrid ensemble mechanism, and data gathered from IoT end devices. In the second stage, the hybrid ensemble mechanism is implemented using k-fold cross-validation with k = 10 and is trained on five different ML algorithms, namely, logistic regression (LR), decision tree (DT), XGBoost, K-nearest neighbour (KNN), and Gaussian Naive Bayes (NB). The considered dataset is partitioned into k parts, of which the kth portion serves as the testing set, and the remaining k−1 parts serve for training. On these k−1 and kth parts, the above five algorithms are executed collaterally, which obtains five different prediction results, denoted R1, R2, R3, R4, and R5, that are used for final classification by the voting classifier. In the third stage, the data from IoT end devices is collected at the FN as test data, which is tested to classify the attack behaviour.
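The voting stage can be sketched with scikit-learn (a hedged sketch on synthetic data; XGBoost is omitted here because it lives in a separate third-party package, so only four of the five named base learners appear, and the cross-validation loop is left out for brevity):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingClassifier

# Synthetic binary-labelled data standing in for preprocessed traffic records.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hard voting takes the majority over the base predictions (R1..R4 here).
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=500)),
        ("dt", DecisionTreeClassifier(max_depth=4)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
ensemble.fit(X, y)
preds = ensemble.predict(X)
```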

Dataset Description.
The DLMs are assessed on old and novel datasets to distinguish the various attacks and characterize the end-client conduct (benign/assailant). The datasets used in this framework for training and testing are discussed as follows, with 40073, 59391, 55124, 121181, 55818, 183554, 35377, 22192, and 53073 records. The main advantages of the IoTID20 dataset are that it imitates a cutting-edge pattern of IoT network correspondence and that it is among the few openly accessible IoT intrusion detection datasets.

Deploying Deep Learning Models and Configuring the Network.
We select the DL model after completing the training of the above models on the considered datasets, choosing the one with the highest accuracy in prognosticating the behaviour of IoT end devices as normal or malicious. The maximum accuracy attained after training and testing each model is used to make the model selection. Once the chosen model has been deployed, the entire architecture is prepared for real-time processing, where the FN and IoT end devices communicate with one another through the CMFN of the fog nodes in the fog layer. The procedure for choosing the DL model is given in Algorithm 1. The network configuration and DL model installation at the fog layer are shown in Algorithm 2.
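The selection step of Algorithm 1 reduces to picking the model with the maximum test accuracy; a minimal sketch (the accuracy values are illustrative placeholders, not measured results):

```python
# Train/test each candidate DLM, record its accuracy, choose the maximum.
def choose_finest_dlm(accuracies):
    """Return the name of the model with the highest test accuracy."""
    return max(accuracies, key=accuracies.get)

# Placeholder accuracies for the six candidate models.
candidate_accuracies = {
    "DNMLP": 0.981,
    "LSTM": 0.9988,
    "Bi-LSTM": 0.995,
    "GRU": 0.991,
    "CNN+LSTM": 0.990,
    "HEM": 0.997,
}
finest = choose_finest_dlm(candidate_accuracies)  # → "LSTM"
```

The chosen model is then the one deployed on the CMFN of each fog node.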

Theorem 1. The IoT device iot_i's total result time (TRT) is denoted by TRT_iot_i.
Proof. Consider an IoT device iot_i near a fog node FN_i which sends a REQ to FN_i. The time to send the REQ is the sum of T_iot_i−BSS, the request time from iot_i to the BSS; T_BSS−GW, the request time from the BSS to the GW; and T_GW−FN_i, the request time from the GW to FN_i. The execution time to process the request by FN_i, represented as T_execution_FN_i, is the sum of T_queue_FN_i, the time spent in the waiting queue, and T_compute_FN_i, the computation time for processing the request to obtain the outcome. Then, the outcome is passed to iot_i in a time of T_FN_i−iot_i, the sum of T_FN_i−GW, T_GW−BSS, and T_BSS−iot_i, which together make up the time to send the outcome to iot_i. Therefore, the TRT is the sum of these three quantities.
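The displayed equations of this proof are missing from this copy; from the term definitions in the prose they can be reconstructed (a hedged reconstruction, with subscripts matching the surrounding text) as:

```latex
\begin{aligned}
T_{REQ} &= T_{iot_i\text{-}BSS} + T_{BSS\text{-}GW} + T_{GW\text{-}FN_i},\\
T_{execution}^{FN_i} &= T_{queue}^{FN_i} + T_{compute}^{FN_i},\\
T_{FN_i\text{-}iot_i} &= T_{FN_i\text{-}GW} + T_{GW\text{-}BSS} + T_{BSS\text{-}iot_i},\\
TRT_{iot_i} &= T_{REQ} + T_{execution}^{FN_i} + T_{FN_i\text{-}iot_i}.
\end{aligned}
```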

Theorem 2. An IoT device iot_i's communication behaviour detection time (CBDT) is expressed as CBDT_iot_i.
Proof. Allow l CBs for l IoT gadgets in the FN queue. Consequently, the CBDT_iot_i of an IoT device iot_i at a fog node FN_i is computed as the sum of T_queue_cb_i, the time the IoT device's communication behaviour spends in the FN_i waiting queue, and T_prognostication, the time needed by FN_i to detect the IoT device's communication behaviour. □
The time complexity of (18) depends on its execution, i.e., the model we selected for deployment on FN_i. It is clearly observed from Section 6.2 that the finest DLM obtained for classifying the behaviour of IoT end devices is the LSTM model, which is deployed on FN_i. The complexity of an LSTM model with multiple LSTM layers always depends on its implementation. Generally, any neural network model is tested by means of a forward pass. To obtain the complexity of any LSTM network with layers, we need to consider the LSTM units, which are connected in a recurrent manner.
Equations (4)-(9), which represent the forward pass of an LSTM layer, give a time complexity for n_t of O(n(d + n + 2)), where n and d are dimensions. The computations of e_t, x_t, and y_t are the same as n_t, and thus the complexity is O(4n(d + n + 2)). Considering the cell vector s_t and the hidden state a_t, the time complexity of each is O(2n). Hence, the total time complexity for a single LSTM layer forward pass is O(4n(d + n + 3)). According to (18), the complexity of CBDT_iot_i also depends on T_queue_cb_i for the insertion of cb_i into the queue, which is O(1). Hence, the total complexity of CBDT_iot_i in a single forward pass is only O(4n(d + n + 3)).

Behaviour Update at Cloud.
The cloud node CN updates the IoT device information table with the updated behaviours after receiving the behaviour of IoT end devices from the FNs. The device information table is updated after receiving the responses from FN_i. The behaviour update at the cloud node CN is shown in Algorithm 4. Here, the storage operations external to main memory depend on the table structure maintained, the type of indexing supported, the number of disc accesses performed, the complexity of the query, etc.

Network Update at FN.
Here, the cloud CN transmits the smart gadgets' TL_cb_i to the FNs via the transmission links CN to GW_CN, GW_CN to BSS, BSS to GW_i, and GW_i to FN_i to update the local tables at the FN closest to the BSS. Further, transmission among neighboring FNs occurs only when the behaviour is verified using the local database. If a device is discovered to be an assaulter, additional interaction with the network's adjacent nodes is terminated. The network refresh at the FN is shown in Algorithm 5.

Theorem 3. The time to refurbish/update the attacker/assaulter behaviour at the FN (TTR) is the total amount of time required by the cloud CN to update IoT end device behaviour at the FN.
Proof. Consider that at time T, the h attacker devices' prognosticated behaviours are denoted as TL_cb_1, TL_cb_2, ..., TL_cb_h for l IoT devices. These prognosticated behaviours are sent as a message Msg to the FNs. To calculate TTR, the time to send Msg from CN to FN is the sum of T_CN−GW_CN, the time required to send the message Msg from CN to GW_CN; T_GW_CN−BSS, the time required to send Msg from GW_CN to the BSS; T_BSS−GW_i, the time required to send Msg from the BSS to the ith GW; and T_GW_i−FN_i, the time required to send Msg from the ith GW to the ith FN.
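The displayed equation of this proof is missing from this copy; from the term definitions in the prose it can be reconstructed (hedged) as:

```latex
TTR = T_{CN\text{-}GW_{CN}} + T_{GW_{CN}\text{-}BSS} + T_{BSS\text{-}GW_i} + T_{GW_i\text{-}FN_i}.
```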

Performance Evaluation
To test how well the proffered framework works, Python 3 is used as the software requirement, and a Core i7-11370 CPU with a 3.30 GHz clock speed and 16 GB RAM is used as the hardware requirement. The framework is implemented with the various DLM models on four datasets, resulting in different accuracies. The accuracy (Accr), precision (P), recall (R), and F1-score (F1_S) of the DLM models are calculated using confusion matrix parameters, where true positive is T_P, true negative is T_N, false positive is F_P, and false negative is F_N:
(1) Accuracy (Accr): accuracy is characterized by the number of correct predictions obtained from the observed values.
(2) F1-score: the harmonic mean of recall and precision is used to reckon F1_S in order to provide more accurate results.
(3) Precision: precision is a model's consistency in categorizing samples as positive.
(4) Recall: recall is how well a model can identify positive samples.

ALGORITHM 1: Method of choosing the finest DL model for the fog tier.
(1) Input: dataset DTT; models DLM_1, ..., DLM_n
(2) T_r, T_s ← train_test_split(DTT)
(3) for DLM_1 to DLM_n do
(4) Train(T_r)
(5) Test(T_s)
(6) Accr_m ← Cal_Accr() ⊳ cal: calculate
(7) end for
(8) FINEST_DLM_Chosen ← Maximum(Accr_1, Accr_2, ..., Accr_m)

Input to Algorithm 2: FINEST_DLM_Chosen, FN_1, FN_2, ...

ALGORITHM 5: Network update at FN (fragment).
(12) if FN_i == P then ⊳ FN_j checks its local table
(13) there is no interaction between the parties
(14) else
(15) communicate
(16) end if
(17) end if

The sklearn library is used to implement and evaluate the hybrid ensemble model. Matplotlib is used to obtain graphs of the accuracy and loss performance.
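The metric equations themselves are missing from this copy; the standard confusion-matrix formulations they describe are:

```latex
\begin{aligned}
Accr &= \frac{T_P + T_N}{T_P + T_N + F_P + F_N}, \qquad
P = \frac{T_P}{T_P + F_P},\\[4pt]
R &= \frac{T_P}{T_P + F_N}, \qquad
F1\_S = \frac{2 \cdot P \cdot R}{P + R}.
\end{aligned}
```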
Using the DDoS-SDN, NSLKDD, UNSW-NB15, and IoTID20 datasets, we trained and evaluated the abovementioned models for binary classification (normal or attacker). Different attacks are involved in the considered datasets [16,19,42], which are utilized to gauge the ability of the DNMLP, LSTM, Bi-LSTM, GRU, CNN + LSTM, and HEM models for attack identification. To prognosticate the attacks, some features of the considered datasets are discarded on the basis of high correlation among the traits, or because they do not affect the prognostication. By removing these attributes, the computational burden is lowered, and thus the framework is built with the vital information. Utilizing a standardization strategy, the dataset is scaled across its different traits with their fluctuating ranges of values and divided into an 80 : 20 proportion of train and test data. The point of apportioning the dataset in an 80 : 20 proportion is to prepare the model with sufficient data and to corroborate the model with suitable data. To procure the most accurate trained model in the proposed framework using the LSTMDL model, we considered a mini-batch of 32 with 100 epochs and the Adam optimizer with a learning rate (LR) of 0.001, taking beta values of 0.9 and 0.999 as the first- and second-moment exponential decay rate estimates, which prevents an adverse effect on optimization for binary classification. The callback function for early stopping is invoked in TensorFlow, which keeps track of the flow to decide the termination condition on validation loss. For the datasets under investigation, the NNs are built using Keras on TensorFlow with the aforementioned models. In this work, we constructed a model for DNMLP, as shown in Figure 8, on the new IoTID20 dataset; a model built using LSTM is shown in Figure 9, and models are built in a similar way on Bi-LSTM and GRU. On the same dataset, a model is also built on CNN + LSTM, as shown in Figure 10.
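The training configuration just described can be sketched in Keras (a hedged sketch: the small LSTM stack and the input shape of 10 time steps × 8 features are assumptions for illustration, not the paper's exact architecture):

```python
import tensorflow as tf

# Illustrative LSTM network for binary classification (normal/attacker);
# the input shape (10, 8) is an assumed placeholder.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10, 8)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Adam with the stated learning rate and exponential decay rates.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping monitors the validation loss to decide termination.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)

# A real run would then be, e.g.:
# model.fit(X_train, y_train, batch_size=32, epochs=100,
#           validation_split=0.1, callbacks=[early_stop])

# Smoke-test forward pass on a zero batch.
out = model(tf.zeros((2, 10, 8)))
```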
We used ReLU as the activation function in the dense layers of the DL models and the sigmoid activation function in the output layer, since we perform binary classification. Using a stacking approach with a voting classifier, the HEM is built on the same IoTID20 dataset, as discussed in Section 5.2.3. In the same way as the models constructed on the IoTID20 dataset, models were constructed on the remaining datasets after performing one-hot encoding. The Keras Sequential model is used to create the networks; each layer added with the model's add method receives the output of the previous layer as its input, and Dense from the Keras package defines the fully connected layers.
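A minimal Keras sketch of such a sequential LSTM binary classifier follows. The layer sizes and the early-stopping patience are illustrative assumptions (Figure 9 is not reproduced here); the Adam hyperparameters, batch size, and epoch count are those stated above.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

n_features = 20                                 # assumed number of retained traits
model = Sequential([
    LSTM(64, input_shape=(1, n_features)),      # recurrent layer (size illustrative)
    Dense(32, activation="relu"),               # dense layer with ReLU
    Dense(1, activation="sigmoid"),             # sigmoid output for binary classification
])
# Adam with the hyperparameters stated in the text.
model.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999),
              loss="binary_crossentropy", metrics=["accuracy"])
# Early stopping tracks validation loss to decide the termination condition.
stopper = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
# model.fit(X_train, y_train, epochs=100, batch_size=32,
#           validation_split=0.1, callbacks=[stopper])
```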
In the implementation of the HEM using the stacking approach discussed in Section 5.2.3, after preprocessing, five algorithms, namely LR, DT, XGBoost, KNN, and NB, are imported from the sklearn machine learning library in stage 1. In the second stage, a voting classifier is used for the final classification by importing the package with "from sklearn.ensemble import VotingClassifier".
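The two-stage HEM can be sketched as follows. The paper uses XGBoost as one base learner; scikit-learn's GradientBoostingClassifier stands in here so the sketch needs only scikit-learn, and the synthetic data replaces the real datasets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stage 1: the five base learners (GradientBoosting stands in for XGBoost).
base = [("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("xgb", GradientBoostingClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("nb", GaussianNB())]

# Stage 2: a voting classifier combines the base learners' predictions.
hem = VotingClassifier(estimators=base, voting="hard")

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
hem.fit(X, y)
acc = hem.score(X, y)   # training accuracy of the ensemble
```

Majority voting lets the ensemble correct the occasional misclassification of any single base learner.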

Results and Discussion
The accuracy of the DL models for binary classification on the four datasets is evaluated. The experiments revealed that an LR of 0.001, a mini-batch of 32, and 100 epochs produced the best performance accuracy. The best accuracy over all the datasets is obtained with the LSTMDL model, as shown in Table 2, and the model accuracy, loss, recall, and precision graphs for the IoTID20 dataset are shown in Figures 11-14. As IoTID20 is a novel dataset on which only ML models were implemented in previous studies [43], Section 6 focuses on the DL models on the IoTID20 dataset, whose results are depicted in the graphs. The performance measures of each model on the considered datasets for binary classification are shown in Table 2. For the IoTID20 dataset, LSTM achieves the best accuracy (99.88%) with precision (99.77%), recall (98.4%), and F1-score (99.08%); the performance of HEM is comparable to that of LSTM, and it outperforms the Bi-LSTM, GRU, CNN + LSTM, and MLP models. With the UNSW-NB15 dataset, LSTM achieves the best accuracy (94.11%) with precision (95.87%), recall (94.47%), and F1-score (95.16%); Bi-LSTM performs close to LSTM and outperforms the GRU, CNN + LSTM, HEM, and MLP models. With the NSL-KDD dataset, LSTM achieves the best accuracy (99.12%) with precision (99.22%), recall (99.08%), and F1-score (99.15%); MLP performs close to LSTM and outperforms the Bi-LSTM, GRU, CNN + LSTM, and HEM models. With the DDoS-SDN dataset, LSTM achieves the best accuracy (99.7%) with precision (99.6%), recall (99.64%), and F1-score (99.62%); GRU and Bi-LSTM perform close to LSTM, and they outperform the CNN + LSTM, HEM, and MLP models. For binary classification, GRU did not perform well on any dataset except DDoS-SDN. In the initial GRU implementation, we used a dropout mechanism at every stage of the model; we therefore discarded dropout and implemented L2 regularization in GRU for better performance.
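The switch from dropout to L2 regularization in the GRU might look as follows in Keras; the layer size and the 0.01 penalty are illustrative assumptions, not values taken from the paper.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense
from tensorflow.keras.regularizers import l2

# GRU classifier with dropout removed and L2 weight penalties instead.
n_features = 20                                  # assumed number of retained traits
gru_model = Sequential([
    GRU(64, input_shape=(1, n_features),
        kernel_regularizer=l2(0.01),             # penalize input weights
        recurrent_regularizer=l2(0.01)),         # penalize recurrent weights
    Dense(1, activation="sigmoid"),              # binary output
])
gru_model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
```

Unlike dropout, the L2 penalty shrinks weights during every update rather than randomly silencing units, which can stabilize training of small recurrent models.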
Compared with all the models on the considered datasets, the false-positive rate (FPR/FAR/1 − specificity) of LSTM is not always the lowest, but on overall comparison, LSTM performs well with respect to FPR.
The ROC-AUC scores of the LSTMDL model on all four considered datasets are shown in Figures 15-18. ROC curves are obtained by plotting the true-positive rate (TPR/recall) against the FPR. AUC summarizes the ROC curve and takes a value between 0 and 1, where one indicates perfectly exact prediction by the classifier and zero the opposite. It is evident from the graphs in Figures 15-18 that the LSTMDL model exhibited a higher AUC score, which indicates its ability to classify positives and negatives exactly. The remaining algorithms on all four datasets also showed AUC scores between 0.98 and 1.
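The ROC-AUC computation behind these figures can be sketched with scikit-learn; the toy labels and scores below are illustrative (the real inputs are the test labels and the models' predicted probabilities).

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy example: 4 samples, 2 negatives and 2 positives.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]   # predicted probability of the positive class

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve
auc = roc_auc_score(y_true, y_score)               # area under that curve
print(auc)  # 0.75: three of the four positive/negative pairs are ranked correctly
```

AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, which is why it is threshold-independent.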
The comparison of performance measures among the DL models and HEM on the considered datasets is depicted in Figures 19-22. In terms of accuracy, LSTM performed better than the other DL models and HEM, as shown by the bold values in Table 2. The accuracies of all DL models and HEM for binary classification are shown in Figure 23; hence, the LSTM model ranks above all the others. Using the datasets, we trained and evaluated the DL models and the HEM for binary classification and found that the LSTMDL model showed better accuracy than all the remaining models in predicting whether the behaviour of the IoT end devices is normal or an attack. On a balanced dataset, accuracy alone is an adequate measure for assessing a model; but in this work, all datasets except NSL-KDD are imbalanced, so there is a possibility of more false positives (FP) and false negatives (FN). In these circumstances, it is better to pay attention to the other performance measures such as precision, recall, and F1-score. Recall considers only FN and TP and may therefore be high, while precision considers only FP and TP and may suffer a low value. The detailed values are listed in Table 2.
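The interplay between these measures can be made concrete with a small helper; the confusion-matrix counts below are hypothetical, chosen only to show that high recall does not guarantee high precision on imbalanced data.

```python
def precision_recall_f1(tp, fp, fn):
    """Binary-classification measures from confusion-matrix counts."""
    precision = tp / (tp + fp)          # penalized only by false positives
    recall = tp / (tp + fn)             # penalized only by false negatives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# Hypothetical counts for an imbalanced test set.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=20)
print(p, r, f1)  # 0.9, ~0.818, ~0.857
```

Because F1 is the harmonic mean, it stays low unless precision and recall are both high, which is why it is the preferred summary on the imbalanced datasets here.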

Computational Intelligence and Neuroscience
The simulation environment is configured as a three-tier framework with one cloud server coupled to numerous fog nodes to investigate the scalability problem. We assume that 10-100 IoT devices (smart gadgets) in the final layer are connected to the closest fog nodes. For instance, with 1 fog node and 10 smart gadgets, all 10 gadgets connect directly to that fog node; with more than 1 fog node, the smart gadgets are split equally among the fog nodes to provide the required service, so one fog node serves 5 smart gadgets if there are 2 fog nodes. In this case, we assume that a smart gadget links to the fog node and produces one sample (row), which the fog node then processes to forecast the behaviour (attack/assault or normal/benign). For 10 smart gadgets, the average CBDT is found to be 0.0000672 seconds for DNMLP, 0.0024924 seconds for LSTM, 0.004164 seconds for Bi-LSTM, 0.0021 seconds for GRU, 0.000476 seconds for CNN + LSTM, and 0.008 seconds for HEM. In this experiment, we examined how the ratio of smart gadgets to fog nodes affects the time needed to identify behaviour. Behaviour detection time (BDT) is the period of time during which fog nodes using DNMLP, LSTM, Bi-LSTM, GRU, CNN + LSTM, and HEM can determine whether a certain number of smart gadgets are benign or assaulters. The variables and values for the NW simulation are displayed in Table 3.
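The gadget-to-node assignment described above can be sketched as a toy timing model; this is an illustrative simplification, not the paper's simulator, and the per-sample time is an assumed constant derived from the 10-gadget DNMLP figure.

```python
def average_cbdt(per_sample_time, n_gadgets, n_fog_nodes):
    """Toy CBDT model: gadgets are split equally among fog nodes, which
    classify their assigned samples in parallel, one sample per gadget."""
    per_node = -(-n_gadgets // n_fog_nodes)   # ceiling division
    return per_sample_time * per_node

# Assumed constant per-sample time, derived from the 10-gadget measurement.
t_dnmlp = 0.0000672 / 10

one_node = average_cbdt(t_dnmlp, 100, 1)    # all 100 samples on a single node
five_nodes = average_cbdt(t_dnmlp, 100, 5)  # 20 samples per node
```

Under this model, adding fog nodes divides the per-node workload, which mirrors the observed trend that CBDT falls as the fog layer grows.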
From Figures 24-28, it is apparent that as the number of IoT devices in the NW grows, so does the time needed to detect the behaviour of all IoT devices. Figure 24 depicts the result when there is 1 fog node in the NW and 10-100 IoT devices. According to this graph, DNMLP detects behaviour faster than LSTM, Bi-LSTM, GRU, CNN + LSTM, and HEM; the average CBDT for 100 IoT devices with DNMLP, LSTM, Bi-LSTM, GRU, CNN + LSTM, and HEM was determined to be 0.003695 sec, 0.22902 sec, 0.1155 sec, 0.02618 sec, and 0.44 sec, respectively. Figure 25 shows the result when there are 3 fog nodes in the NW and 100 smart gadgets overall; again, DNMLP detects the CB faster than LSTM, Bi-LSTM, GRU, CNN + LSTM, and HEM, and the average CBDT for 100 IoT devices is determined to be 0.001232 seconds for DNMLP, 0.045694 seconds for LSTM, 0.0385 seconds for CNN + LSTM, and 0.146666 seconds for HEM. Figure 26 shows the outcome when there are 5 fog nodes in the NW and 100 smart gadgets in total; DNMLP again detects the CB faster than the other models, with average CBDTs for 100 IoT devices of 0.000739 sec, 0.027416 sec, 0.045804 sec, 0.0231 sec, 0.005236 sec, and 0.088 sec for DNMLP, LSTM, Bi-LSTM, GRU, CNN + LSTM, and HEM, respectively. Figure 27 shows the result when there are 7 fog nodes in the NW and 100 smart devices in total; DNMLP detects behaviour faster than the other models, with average CBDTs for 100 smart gadgets of 0.000527 sec, 0.019583 sec, 0.032717 sec, 0.016499 sec, 0.00374 sec, and 0.062857 sec for DNMLP, LSTM, Bi-LSTM, GRU, CNN + LSTM, and HEM, respectively. Figure 28 shows the result when there are 9 fog nodes in the NW and 100 smart devices in total.
According to this graph, DNMLP detects the CB faster than LSTM, Bi-LSTM, GRU, CNN + LSTM, and HEM; the average CBDT of 100 smart gadgets is determined to be 0.00041 sec, 0.015231 sec, … for DNMLP, LSTM, and the remaining models, respectively. From the aforementioned findings, it is also shown that the communication behaviour detection time is reduced as the number of fog nodes in the network increases.

Conclusion
This study proposes a DL model-based attack prediction system for a fog-based Internet of Things environment. The network consists of a smart sensing tier, a secure fog tier, and a cloud tier. A variety of DL models, including DNMLP, LSTM, Bi-LSTM, GRU, CNN + LSTM, and HEM, are assessed to determine the most accurate model for installation at the fog nodes. With the LSTMDL model, the DDoS-SDN, NSL-KDD, UNSW-NB15, and IoTID20 datasets yield 99.70%, 99.12%, 94.11%, and 99.88% accuracy, respectively. As a result, every fog node in the fog tier of the NW is equipped with the LSTMDL model. The deployed model performs binary classification into the two classes 1 and 0, assailant and benign respectively, and sends the device CB to the cloud for updating. The cloud then sends the misbehaviour data to the fog nodes, so that each of them is aware of the local attack situation in the fog layer. The individual fog nodes decide whether to communicate with these attacking devices in the future by evaluating their current behaviour. In a fog-based IoT environment, the proffered model of securing the fog layer is a finer stratagem against attacks, which overcomes the strategy of deploying the DL models in the sensing layer. The results of the proposed framework prove that the considered DL models can be adopted in cybersecurity to identify the cyberattacks present in distinct datasets. Additionally, a network simulation is used to demonstrate how well the various DL models perform in terms of the CBDT in the fog layer. According to this study, the LSTMDL model outperforms DNMLP in accurately forecasting the attacks, although it takes longer to identify the activity (CBDT) than some other models. It has also been found that the CBDT decreases as the number of fog nodes in the NW grows. In the future, we will implement a similar strategy for attack forecasting using multiclass classification.
Similarly, specific attacks can be discovered by using more recent datasets, and newer multilayer deep neural network models such as AlexNet, ResNet, VGGNet, DenseNet, and ShuffleNet can be created by training the fog nodes with a larger dataset.

Data Availability
Data are available upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.