Intrusion Detection System for IoT Based on Deep Learning and Modified Reptile Search Algorithm

This study proposes a novel framework to improve intrusion detection system (IDS) performance based on the data collected from the Internet of things (IoT) environments. The developed framework relies on deep learning and metaheuristic (MH) optimization algorithms to perform feature extraction and selection. A simple yet effective convolutional neural network (CNN) is implemented as the core feature extractor of the framework to learn better and more relevant representations of the input data in a lower-dimensional space. A new feature selection mechanism is proposed based on a recently developed MH method, called Reptile Search Algorithm (RSA), which is inspired by the hunting behaviors of the crocodiles. The RSA boosts the IDS system performance by selecting only the most important features (an optimal subset of features) from the extracted features using the CNN model. Several datasets, including KDDCup-99, NSL-KDD, CICIDS-2017, and BoT-IoT, were used to assess the IDS system performance. The proposed framework achieved competitive performance in classification metrics compared to other well-known optimization methods applied for feature selection problems.


Introduction
The emerging technology of the Internet of Things (IoT) has evolved constantly and been widely exploited in the last couple of years, enabling communication and interaction among many devices via a network; thus, it is propelling new business-process technologies [1]. Subsequently, several challenges, for example in finance, in proving credibility, in enforcement, and in business operations, have come to the fore as a result of the exponential growth of cybersecurity attacks [2]. Cloud computing is normally used for IoT data storage; it is formulated as a model that supplies various resources and services to the customer on demand. Typically, cloud computing minimizes the human intervention between users and providers [3]. Due to its impressive features, it has received serious attention from organizations and users. However, moving from a current platform to a cloud computing platform raises several issues related to the operation mechanism and security. The vulnerability of cloud computing is related to the valuable data stored remotely on servers. This security threat makes it a target for many cybercriminals and intruders; therefore, it deters many people from favoring or moving to the cloud computing platform. There are several reasons why cyberattacks have been growing substantially. One of the main reasons is the existence of accessible, easy-to-use hacking tools, which allow naive hackers to quickly attack cloud storage without advanced skills or specific knowledge [4][5][6][7][8].
In the last decades, considerable attention has been paid by a wide range of research communities to different issues in the cyberattack domain, such as intrusion detection systems (IDSs) [9]. Furthermore, various machine learning (ML) algorithms have been utilized to address cyberattack issues, such as the decision tree (DT) algorithm in [10,11], support vector machine (SVM) models in [12,13], k-means [14,15], k-nearest neighbor (kNN) [16,17], and many other machine learning algorithms [18][19][20]. Quite recently, many deep neural network solutions have been applied to IDSs in fog, cloud, and other IoT-based systems. Notable examples include the convolutional neural network (CNN) model [21], the deep recurrent neural network (RNN) model in [22], the restricted Boltzmann machines (RBMs) in [23], the multilayer perceptron neural network [24], and many others [25].
The IDS is modeled as a feature selection problem and has been successfully addressed by various traditional classifiers. With the rise of metaheuristic (MH) optimization algorithms, they have been used to tackle a wide range of complex optimization problems. MH algorithms utilized for IDSs include the particle swarm optimization (PSO) algorithm [26], crow search algorithm (CSA) [27], genetic algorithm [28][29][30], random harmony search algorithm [31], and grey wolf optimizer (GWO) algorithm [32,33].
In this article, we propose a novel and powerful IDS model utilizing advanced versions of deep learning (DL) and metaheuristic optimization algorithms. The features are first extracted efficiently and simply by a convolutional neural network (CNN) model, in which several consecutive convolution blocks are designed to extract the informative features. The CNN is used only in the feature extraction phase, which allows the extraction of meaningful features that represent the raw data in a lower-dimensional space. In addition, CNNs are well known for their ability to learn complex features with less complex architectures and fast training processes. Following the convolution blocks, fully connected layers are built to extract relevant features and detect malicious or intruder activities. Thereafter, a new and efficient version of the reptile search algorithm (RSA) [34] is proposed as a feature selection tool to improve the classification results of the IDS. The RSA is used because it is a recent yet efficient algorithm with several attractive features: it has few parameters to initialize, it does not require derivative information in the initial search, it is simple and easy to use, it is scalable and admissible, and it is sound and complete. It has been tested against several benchmark functions and engineering problems [34]. The RSA has also helped improve the neuro-fuzzy inference system for predicting the swelling potential of fine-grained soils [35]. Although the RSA has several advantages, as with other MH algorithms, its performance can be affected by the problem size and complexity. Accordingly, the RSA can suffer from premature convergence due to a lack of balance between its exploration and exploitation capabilities during the search. Therefore, the problem-specific knowledge embedded in the search space should be considered, and a suitable adjustment to the RSA optimization structure should be adopted.
The model proposed in this study starts by preparing the IoT dataset for feature extraction. The feature extractor is a CNN model that is trained on the preprocessed dataset. The outputs of the CNN model, which are the extracted features, are filtered, and the most relevant features are selected by the RSA. To evaluate the proposed model, four public datasets were used: KDDCup-99, NSL-KDD, Industrial IoT (IIoT) traffic data (BoT-IoT), and CICIDS-2017. Furthermore, the results of the proposed RSA-based model are evaluated against seven other well-established algorithms. The comparative results demonstrate the viability of the developed model, which shows significant performance on all datasets.
The main objective of this study is to propose a novel and efficient IDS model that utilizes the strengths of deep learning and MH algorithms. To achieve this objective, the contributions of this article are as follows:
(i) Design a CNN model as a feature extractor, with the goal of extracting features from the mentioned IoT datasets.
(ii) Propose an adapted version of the RSA as a feature selection technique for selecting the most relevant and informative features.
(iii) Assess the model by comparing its results against seven state-of-the-art models over four well-known public datasets.
The remaining parts of this article are organized as follows: Section 2 reviews the related research works on IDS models. Then, in Section 3, we elaborate on the basics and fundamentals of the RSA. The proposed IoT security model is presented in Section 4. The results and discussion are given in Section 5. Finally, Section 6 states the conclusion of the article and recommends possible future work.

Related Works
This section summarizes related work on IDSs that utilize metaheuristic algorithms. Deep learning and swarm intelligence approaches are combined by Saljoughi et al. [36] to address the IDS scheme for cloud computing. The authors used multilayer perceptron (MLP) neural networks as a feature extractor and particle swarm optimization (PSO) as a feature selection method. Two datasets are used for evaluation purposes: KDD-CUP and NSL-KDD. In the experimental validation, their proposed method yielded a significant performance in detecting intruders and cyberattacks. Also, in [37], denial-of-service (DoS) attack detection in cloud computing is tackled using an enhanced version of the artificial bee colony metaheuristic, which is utilized to boost the classifier's performance. Their developed system achieves a 72.4% average detection rate when compared to QPSO. In [38], Dash suggests two IDS methods based on artificial neural networks (ANNs) and metaheuristic algorithms. The first method utilizes the gravitational search (GS) algorithm, while the second combines GS with PSO. The two methods (GS and GS-PSO) are used as trainers for the ANN. Their performance is validated using a comparative evaluation against several well-established algorithms such as gradient descent, PSO, and GA.
The literature indicates the significant use of various metaheuristics together with machine learning classifiers for security protection applications, where the metaheuristic algorithms are utilized as feature selection optimizers and the classifiers as detectors of improper actions. For instance, the authors of [39] reported significant outcomes on the KDD-CUP 99 datasets with an intrusion detection system composed of a genetic algorithm and a fuzzy support vector machine (SVM). Similarly, Nazir and Khan built the Tabu Search Random Forest (TS-RF), a strong intrusion detection system (IDS), in [40], in which the TS algorithm is integrated with the RF classifier. The performance of the system was tested using the UNSW-NB15 dataset, where the results revealed an improvement in classification accuracy compared to several other methods.
In addition, an improved intrusion detection system was proposed by Mayuranathan et al. in [31], where the feature selection mechanism was optimized by applying the random harmony search (RHS) algorithm and distributed DoS (DDoS) detection was performed by a restricted Boltzmann machine classifier. The system was tested on the KDD'99 datasets, and the results showed a considerable detection performance.
On the other hand, other authors utilize neural network classifiers in their systems. In particular, an intrusion detection system for Internet of Medical Things applications [33] was built by integrating principal component analysis (PCA), the grey wolf optimizer (GWO) algorithm, and a deep neural network (DNN). The PCA-GWO hybrid was used to optimize the performance of the classifier (DNN). The optimized feature selection was reflected in the results, which indicated a respectable classification accuracy.
Furthermore, a new denial-of-service (DoS) detection system was proposed in [27] by SaiSindhuTeja and Shyam. The system optimizes the feature selection mechanism using a modified crow search algorithm (CSA), in which the CSA is integrated with opposition-based learning (OBL) to enhance the optimization performance. The second component of the system is the classifier, for which a recurrent neural network (RNN) is utilized. The strength of the system gave it the ability to compete with and outperform other detection systems.

Background
This section covers two main aspects of the RSA: its inspiration is illustrated in Section 3.2, while the detailed description of its procedural steps is given in Section 3.3. Before that, the following subsection presents a brief introduction to CNN-based models and their applications.

Convolutional Neural Network.
Nowadays, AI-based algorithms such as CNNs are widely exploited in fields such as computer vision. For instance, CNNs have been extensively used to identify COVID-19 and to diagnose quickly from image data. This section briefly covers recent advances and the existing literature on using CNNs in different applications. Depending on the CNN architecture and building blocks, CNN models can be applied to various data, including time-series data, textual data, images, and videos [41]. The crucial component of such a model is the convolution operation applied to the input data. The convolution operation extracts features from the input data using several convolutional filters with the same or different filter sizes. In addition, the convolution operation relies on the local correlation of the information, which can help extract more complex features and learn more meaningful feature representations. CNNs can suffer from variations in the data, such as translation, rotation, and scaling in image data. Thus, CNNs use a pooling operation to downsample the feature map extracted from the previous layer. Depending on the task, fully connected (FC) layers can be placed after a convolution block (convolution and pooling) or at the end of the network to classify or detect the input data.
Several CNN architectures have been proposed based on criteria such as the network depth or width, the type of the convolution operation, the number of convolutional filters and their corresponding size, the pooling operation and its size, the number of fully connected layers, and the deployment environment of the model. Many CNN-based models have been proposed, including MobileNet, ResNet, NASNet, EfficientNet, MnasNet, and AlexNet [42][43][44][45][46]. For instance, MobileNet has three versions, where MobileNetV3 implements the inverted residual block inherited from EfficientNet and ResNet [47]. MobileNetV3 uses a different type of convolution layer, named the depthwise separable convolution, which was proposed to replace the standard convolution operation and lower the computation cost, facilitating model deployment in embedded and edge systems. In addition, MobileNetV3 includes a novel building block named the Squeeze-and-Excite block [44]. The depthwise separable convolution uses the inverted residual connection to reduce the number of training parameters and improve the representations learned from the input data. The architectures mentioned above have been employed in a variety of computer vision tasks, such as image recognition, classification, image segmentation, face detection, and video classification [48]. CNNs have shown a great ability to extract features automatically, even when using simple networks. Thus, in our study, we propose a simple yet effective CNN architecture and adapt it to the network intrusion detection task.

Inspiration of RSA.
The RSA is a metaheuristic algorithm recently developed by Abualigah et al. in 2021 [34]. The RSA mimics the hunting behavior of crocodiles in their natural habitat. In general, crocodiles belong to the family "Crocodylinae," and they prefer to live in an environment where water and food are available. They are amphibious animals, capable of hunting in the water as well as out of it. The living behaviors of crocodiles are as follows:
(i) Vision: Crocodiles have a penetrating night vision that many other animals lack. They exploit the poor night vision of most other animals by hunting at night.
(ii) Eating: Crocodiles are predators residing at the top of the food chain; they feed on animals in the environment surrounding their habitat, such as fish, deer, cows, zebras, baby elephants, and small crocodiles. In addition, large crocodiles are not afraid to add other predators, such as sharks and cats, to their food sources. They are also able to live for long periods without food if the surrounding environment lacks any food source. Some of them have been reported to feed on fruits.
(iii) Locomotion: Crocodiles are able to swim, walk, and run. In swimming, they use the tail for steering, and the legs are not used. In walking, they use their legs to carry their bodies and facilitate their movement, and the tail is used for balancing and steering. Finally, crocodiles can run short distances out of the water to attack prey; in this case, energy is transmitted from the tail to the body to move forward at high speed.
(iv) Cognition: Crocodiles are able to recognize the patterns of prey; for example, they know which animals frequently come to the water to drink.
(v) Hunting: Crocodiles set ambushes inside the water to catch animals that come to drink at the water's edge or that dive into the water. At the right moment, crocodiles stealthily attack their prey from the water. Once a crocodile catches its prey, it drags it into the water and drowns it. Finally, the crocodile cuts its prey into large pieces and devours it completely. Frequently, crocodiles fight each other over shares of the prey.
(vi) Cooperation: Crocodiles prefer to live in groups. This pattern helps crocodiles cooperate in preparing ambushes. Every member of the group has a role in accomplishing the task of predation. For example, one crocodile attacks an animal drinking at the riverbank in order to push it towards the water, and then the crocodiles hiding in the water attack the prey.

Procedural Steps of RSA.
Figure 1 illustrates the procedural steps of the RSA, and a detailed description of these steps follows.

Phase 1: Initialization of RSA's Parameters.
The control parameters and the algorithmic parameters should be initialized before executing the RSA. The control parameters are N, the number of crocodiles, and T, the maximum number of iterations. Furthermore, two algorithmic parameters are used in the RSA, namely α and β. These two parameters are used to control the exploitation and exploration abilities, respectively, in order to reach the right balance between the two abilities during the search process.

Phase 2: RSA Population Initialization.
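During this phase, we randomly generate a set of initial solutions using the following equation [34]:
$$X_{i,j} = X_j^{\min} + rand \times \left(X_j^{\max} - X_j^{\min}\right), \quad j = 1, 2, \ldots, d, \tag{1}$$
where $X_{i,j}$ represents the decision variable of the $i$th solution at the $j$th position. The upper and the lower bounds of the decision variable at the $j$th position are $X_j^{\max}$ and $X_j^{\min}$, respectively. $rand$ is a randomly generated value between 0 and 1, while $d$ indicates the total number of decision variables in each solution. The set of $N$ solutions is generated and stored in $X$ as follows [34]:
$$X = \begin{bmatrix} X_{1,1} & \cdots & X_{1,j} & \cdots & X_{1,d} \\ \vdots & & \vdots & & \vdots \\ X_{i,1} & \cdots & X_{i,j} & \cdots & X_{i,d} \\ \vdots & & \vdots & & \vdots \\ X_{N,1} & \cdots & X_{N,j} & \cdots & X_{N,d} \end{bmatrix}, \tag{2}$$
where each row represents one solution.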
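The population initialization can be illustrated with a minimal Python sketch. The function name `init_population` and the use of plain lists are assumptions made for illustration only; the sketch simply draws every decision variable uniformly between its lower and upper bound.

```python
import random


def init_population(n_agents, lower, upper, rng=None):
    """Return n_agents random solutions; variable j lies in [lower[j], upper[j]]."""
    rng = rng or random.Random()
    d = len(lower)  # number of decision variables per solution
    return [
        [lower[j] + rng.random() * (upper[j] - lower[j]) for j in range(d)]
        for _ in range(n_agents)
    ]


# Hypothetical bounds for a three-variable problem, seeded for repeatability.
lower_bounds = [0.0, -1.0, 10.0]
upper_bounds = [1.0, 1.0, 20.0]
population = init_population(5, lower_bounds, upper_bounds, random.Random(0))
```

Each row of `population` plays the role of one crocodile, so the list has N rows and d columns.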

Phase 4: Encircling Phase.
This is the exploration behavior of crocodiles in the RSA. This phase is introduced to find a better solution by exploring new regions in the search space of the problem following two strategies, namely, high walking and belly walking, as shown in (3). The high walking strategy is applied when $t \le T/4$, while the belly walking strategy is applied when $T/4 < t \le 2T/4$ [34]:
$$X_{i,j}(t+1) = \begin{cases} X_j^{Best}(t) \times \left(-\eta_{i,j}(t)\right) \times \beta - R_{i,j}(t) \times rnd, & t \le T/4, \\ X_j^{Best}(t) \times X_{r1,j}(t) \times ES(t) \times rnd, & T/4 < t \le 2T/4, \end{cases} \tag{3}$$
where $X_{i,j}$ represents the decision variable of the $i$th solution at the $j$th position. $X_j^{Best}(t)$ is the $j$th position in the best solution obtained at iteration $t$. $t + 1$ is the new iteration, while $t$ is the previous iteration. The hunting operator of the $j$th position in the $i$th solution is denoted as $\eta_{i,j}(t)$, which is calculated using (4). The parameter $\beta$ controls the exploration capability of the high walking strategy. The value of $\beta$ is set to 0.1 according to [34]. $rnd$ is a randomly generated value ranging between zero and one. $X_{r1,j}(t)$ is the decision variable at the $j$th position in the $r1$th solution, where $r1 \in [1, N]$. $\eta_{i,j}(t)$, $P_{i,j}$, and $Avg(X_i)$ are calculated, respectively, as follows:
$$\eta_{i,j}(t) = X_j^{Best}(t) \times P_{i,j}, \tag{4}$$
$$P_{i,j} = \alpha + \frac{X_{i,j} - Avg(X_i)}{X_j^{Best}(t) \times \left(X_j^{\max} - X_j^{\min}\right) + \epsilon}, \tag{5}$$
$$Avg(X_i) = \frac{1}{d} \sum_{j=1}^{d} X_{i,j}, \tag{6}$$
where $P_{i,j}$ is the percentage difference between the decision variable at the $j$th position of the best solution $X^{Best}$ and the decision variable at the same position of the current solution $X_i$. $\alpha$ is set to 0.1 according to [34], and it is also used to control the exploration ability of the RSA during the hunting cooperation. $\epsilon$ is a random value between 0 and 2. $Avg(X_i)$ is the average value of all decision variables of the current solution $X_i$.
$R_{i,j}(t)$ is a factor used to reduce the search area of the $j$th position in the $i$th solution, and $ES(t)$ is the evolutionary sense probability, which takes a randomly decreasing value from 2 to -2 [34]. They are calculated, respectively, as follows:
$$R_{i,j}(t) = \frac{X_j^{Best}(t) - X_{r2,j}}{X_j^{Best}(t) + \epsilon}, \tag{7}$$
$$ES(t) = 2 \times r3 \times \left(1 - \frac{t}{T}\right), \tag{8}$$
where $r2$ is a randomly generated value ranging between 1 and $N$, which refers to the index of one randomly chosen solution in the population, and $r3$ is a random integer taking the value -1, 0, or 1.
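To make the exploration step concrete, the following Python sketch applies the high walking and belly walking rules to one solution. It is an illustration under stated assumptions rather than the reference implementation: the expressions for the hunting operator, the percentage difference, the reduce factor, and the evolutionary sense follow the usual statement of the RSA in [34]; `EPS` is treated as a small positive constant that protects the denominators; the evolutionary sense is assumed to decay with `1 - t / T`; and clamping the result to the bounds is an added safeguard. All function names are hypothetical.

```python
import random

EPS = 1e-10  # assumption: small constant that keeps denominators non-zero


def percentage_difference(x_ij, avg_i, best_j, lb_j, ub_j, alpha):
    """P_ij: relative gap between the current and the best solution at position j."""
    return alpha + (x_ij - avg_i) / (best_j * (ub_j - lb_j) + EPS)


def reduce_factor(best_j, x_r2j):
    """R_ij: factor that shrinks the search area around position j."""
    return (best_j - x_r2j) / (best_j + EPS)


def evolutionary_sense(t, T, rng):
    """ES(t): magnitude shrinks as t approaches T (assumed 1 - t/T schedule)."""
    r3 = rng.choice((-1, 0, 1))
    return 2 * r3 * (1 - t / T)


def encircle(population, best, i, t, T, lb, ub, alpha=0.1, beta=0.1, rng=None):
    """Exploration update of solution i, intended for iterations t <= T/2."""
    rng = rng or random.Random()
    x_i = population[i]
    avg_i = sum(x_i) / len(x_i)
    es = evolutionary_sense(t, T, rng)
    new_solution = []
    for j, x_ij in enumerate(x_i):
        rnd = rng.random()
        if t <= T / 4:  # high walking
            p = percentage_difference(x_ij, avg_i, best[j], lb[j], ub[j], alpha)
            eta = best[j] * p  # hunting operator
            r2 = rng.randrange(len(population))
            r = reduce_factor(best[j], population[r2][j])
            value = best[j] * (-eta) * beta - r * rnd
        else:  # belly walking
            r1 = rng.randrange(len(population))
            value = best[j] * population[r1][j] * es * rnd
        # Assumption: keep the new value inside the search bounds.
        new_solution.append(min(max(value, lb[j]), ub[j]))
    return new_solution


rng = random.Random(1)
pop = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(6)]
candidate = encircle(pop, pop[0], i=2, t=3, T=20, lb=[0.0] * 4, ub=[1.0] * 4, rng=rng)
```

In the usage lines, `pop[0]` merely stands in for the best solution; in a full run it would be the solution with the best fitness so far.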

Phase 5: Hunting Phase.
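This is the exploitation behavior of crocodiles in the RSA. This phase is designed to exploit the current search regions in order to find the optimal solutions according to two strategies: hunting coordination and hunting cooperation, as shown in (9). The hunting coordination strategy is applied when $2T/4 < t \le 3T/4$, while the hunting cooperation strategy is applied when $3T/4 < t \le T$ [34]:
$$X_{i,j}(t+1) = \begin{cases} X_j^{Best}(t) \times P_{i,j}(t) \times rnd, & 2T/4 < t \le 3T/4, \\ X_j^{Best}(t) - \eta_{i,j}(t) \times \epsilon - R_{i,j}(t) \times rnd, & 3T/4 < t \le T. \end{cases} \tag{9}$$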
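The exploitation step can be sketched in the same spirit. The Python code below is a hedged illustration, not the authors' implementation: the two update rules follow the common statement of the RSA in [34], `EPS` is assumed to be a small positive constant, and the clamping to the bounds as well as the function name `hunt` are assumptions introduced here.

```python
import random

EPS = 1e-10  # assumption: small constant used in the denominators and in the cooperation rule


def hunt(population, best, i, t, T, lb, ub, alpha=0.1, rng=None):
    """Exploitation update of solution i, intended for iterations T/2 < t <= T."""
    rng = rng or random.Random()
    x_i = population[i]
    avg_i = sum(x_i) / len(x_i)
    new_solution = []
    for j, x_ij in enumerate(x_i):
        rnd = rng.random()
        # Percentage difference between the current and the best solution.
        p = alpha + (x_ij - avg_i) / (best[j] * (ub[j] - lb[j]) + EPS)
        if t <= 3 * T / 4:  # hunting coordination
            value = best[j] * p * rnd
        else:  # hunting cooperation
            eta = best[j] * p  # hunting operator
            r2 = rng.randrange(len(population))
            r = (best[j] - population[r2][j]) / (best[j] + EPS)  # reduce factor
            value = best[j] - eta * EPS - r * rnd
        # Assumption: keep the new value inside the search bounds.
        new_solution.append(min(max(value, lb[j]), ub[j]))
    return new_solution


rng = random.Random(3)
pop = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(6)]
coordinated = hunt(pop, pop[0], i=2, t=14, T=20, lb=[0.0] * 4, ub=[1.0] * 4, rng=random.Random(4))
cooperative = hunt(pop, pop[0], i=2, t=19, T=20, lb=[0.0] * 4, ub=[1.0] * 4, rng=random.Random(4))
```

Both strategies stay close to the best solution, which is why this phase refines rather than explores.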
Figure 1: Procedural steps of the RSA.

Phase 6: Stop Criterion.
Repeat Step 3 to Step 5 until the maximum number of iterations T is reached.

Proposed Model
This section presents the phases of the proposed IoT security model, which extracts features from the data using a CNN and then selects the relevant features using a modified RSA. In general, the IoT security model consists of four phases, as given in Figure 2; each phase is described as follows.

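First Phase: Preparing the IoT Dataset.
The IoT traffic dataset TS is represented as
$$TS = \begin{bmatrix} TS_1 \\ TS_2 \\ \vdots \\ TS_n \end{bmatrix} = \begin{bmatrix} tf_{11} & tf_{12} & \cdots & tf_{1d} \\ tf_{21} & tf_{22} & \cdots & tf_{2d} \\ \vdots & \vdots & & \vdots \\ tf_{n1} & tf_{n2} & \cdots & tf_{nd} \end{bmatrix}, \tag{10}$$
where $TS_i$ stands for the features of the $i$th traffic sample, represented as $[tf_{i1}, tf_{i2}, \ldots, tf_{id}]$, and $tf_{ij}$ indicates the value of feature $j$ at sample $i$. $n$ is the number of samples, and $d$ stands for the number of features. Using the min-max approach to normalize TS, $DN_{ij}$ is formulated as [34]
$$DN_{ij} = \frac{tf_{ij} - \min(tf_j)}{\max(tf_j) - \min(tf_j)}, \tag{11}$$
where $\min(tf_j)$ and $\max(tf_j)$ are the minimum and maximum values of feature $j$ over all samples.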
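As a minimal illustration of min-max normalization, the following Python sketch rescales every feature column to the range [0, 1]. The function name `min_max_normalize` and the handling of constant columns (mapped to 0.0) are assumptions, since the text does not say how a feature with identical minimum and maximum is treated.

```python
def min_max_normalize(samples):
    """Rescale each feature (column) of a list of samples to [0, 1]."""
    n_features = len(samples[0])
    mins = [min(row[j] for row in samples) for j in range(n_features)]
    maxs = [max(row[j] for row in samples) for j in range(n_features)]
    normalized = []
    for row in samples:
        normalized.append([
            # Assumption: a constant feature carries no information and maps to 0.0.
            (row[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
            for j in range(n_features)
        ])
    return normalized


# Three hypothetical traffic samples with three features each.
traffic = [[1.0, 10.0, 7.0], [2.0, 20.0, 7.0], [3.0, 30.0, 7.0]]
scaled = min_max_normalize(traffic)
```

In practice the minima and maxima would be computed on the training split and reused for the testing split, which is a common convention rather than something stated in the text.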

Second Phase: CNN for Feature Extraction.
The CNN is a widely used automatic feature extractor in various applications [49,50] such as image classification, text classification, speech recognition, and others. In our study, we implemented a CNN model whose core building blocks are the convolution layer (Conv), the ReLU activation function, the fully connected layer (FC), and the pooling layer (Pool). The CNN learns complex representations as features from the network traffic samples and classifies them based on their intrusion type. Using a convolution operation, the CNN extracts local and position-invariant patterns while sharing the weights across the layers and channels [51]. In our case, the design of the CNN was based on the trial-and-error method, where the objective is to build a simple yet powerful model that maximizes the classification accuracy on the tackled task. In addition, the best-trained model, based on its performance on the test data, is used to extract the learned features for the feature selection stage. The proposed CNN is illustrated in Figure 3.
Figure 2: Steps of the presented IoT security method.
In the implemented CNN architecture, the Conv block is followed by a rectified linear unit (ReLU) [52], defined in (13), to prevent negative values from being propagated, while the pooling operator is used to reduce the dimensionality of the activation map $ReLU(x)$ of the input data $x$:
$$ReLU(x) = \max(0, x). \tag{13}$$
To reduce the model complexity and prevent overfitting, dropout layers with a dropout rate of 0.5 are used to randomly drop some neurons during training. Furthermore, the Conv1 layer [53] has a 1 × 3 kernel with 64 filters and a 1 × 1 stride. The 1D convolution operation applied to the input data $x^{l-1}$ of the previous layer is defined in the following equation:
$$x^{l} = W^{l} * x^{l-1} + b^{l}. \tag{14}$$
The output is defined as $x^{l}$, where $W^{l}$ and $b^{l}$ represent the weight matrix and the bias corresponding to the $l$th layer, respectively. Meanwhile, two types of pooling were used, namely max-pooling and adaptive average pooling [54], with size 2 × 2.
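The convolution block can be illustrated with a small, dependency-free Python sketch. It is not the trained model: it assumes a single input channel, uses two hand-picked filters instead of the 64 learned ones, and computes the sliding dot product (cross-correlation) that deep learning frameworks usually call convolution. The function names are hypothetical.

```python
def conv1d(x, kernels, biases, stride=1):
    """Slide each kernel over the 1D input x and add the filter's bias."""
    k = len(kernels[0])
    return [
        [sum(w[m] * x[p + m] for m in range(k)) + b
         for p in range(0, len(x) - k + 1, stride)]
        for w, b in zip(kernels, biases)
    ]


def relu(feature_maps):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in channel] for channel in feature_maps]


def max_pool1d(feature_maps, size=2):
    """Keep the maximum of each non-overlapping window of length `size`."""
    return [
        [max(channel[p:p + size]) for p in range(0, len(channel) - size + 1, size)]
        for channel in feature_maps
    ]


# One hypothetical traffic sample with five normalized features.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
kernels = [[1.0, 0.0, -1.0], [1.0, 1.0, 1.0]]  # two illustrative 1 x 3 filters
conv_out = conv1d(sample, kernels, biases=[0.0, 0.0])
activated = relu(conv_out)
pooled = max_pool1d(activated, size=2)
```

The first filter yields only negative responses, which ReLU removes, while the second filter sums neighboring features and survives pooling; this mirrors how the block keeps strong local patterns and discards the rest.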
As Figure 3 shows, the feature maps extracted after the last pooling operation are fed to a sequence of FC layers. The layers FC1, FC2, and FC3 are employed for feature extraction, whereas FC4 is used for the classification task. FC4 uses the Softmax function to output the probability of assigning a traffic sample to a specific type. As a regularization method, the CNN model uses batch normalization (BN) to normalize the input features fed to FC4. The feature vector extracted from FC3 for each sample is of size 1 × 64. The extracted features are fed into the FS algorithm, which selects only the most relevant features to boost the overall performance of the intrusion detection task.

Third Phase: Feature Selection.
During this phase, the proposed model selects the relevant features based on their quality. Thus, this process has a significant impact on IDS detection in IoT environments.
The proposed RSA-based FS approach (see Figure 4) begins by initializing the population X with N agents. After that, it converts each agent into its binary version and reduces the number of features by excluding those that correspond to zeros in the binary version. Thereafter, the proposed RSA approach assesses the quality of the chosen features by computing the classification error of the KNN classifier. Then, the best solutions (agents) are updated until the optimal solutions are reached.

Create Population.
The proposed RSA begins by dividing the given datasets into training and testing subsets, with 80% and 20% of the data, respectively. After that, the following equation is applied to construct the initial values of the population X with N agents:
$$X_i = LB + rand(1, D) \times (UB - LB), \quad i = 1, 2, \ldots, N, \tag{15}$$
where D represents the dimension of each agent, that is, the number of features. Moreover, rand(1, D) refers to a random vector, and LB and UB indicate the limits of the search space.

Updating Population.
In the updating phase, each agent $X_i$ is converted into its Boolean version, as in the following equation:
$$BX_{ij} = \begin{cases} 1, & \text{if } X_{ij} > 0.5, \\ 0, & \text{otherwise}. \end{cases} \tag{16}$$
Accordingly, the number of features in the training set can be decreased by eliminating the features that correspond to zeros. After that, the fitness value of each agent $X_i$ is computed as follows:
$$Fit_i = \lambda \times \gamma_i + (1 - \lambda) \times \frac{|BX_i|}{D}, \tag{17}$$
where $\gamma_i$ refers to the classification error, which is computed using the KNN on the training set. Moreover, $\lambda \in [0, 1]$ is a weight applied to balance the classification error against the ratio of relevant features ($|BX_i|/D$). Equation (17) is used to evaluate the quality of this selection process. The next stage is to obtain $X_b$, the agent with the best fitness value $Fit_b$. Thereafter, $X_b$ is used to update the current agents using the operators of the RSA.
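A small Python sketch may help clarify how an agent is scored. Everything specific in it is an assumption for illustration: the 0.5 binarization threshold, the weight `lam = 0.9`, the use of k = 1 neighbor, the separate validation split, and the rule that an agent selecting no features receives the worst possible fitness. The function names are hypothetical, and the nearest-neighbor classifier is a plain stand-in for the KNN mentioned in the text.

```python
from collections import Counter


def binarize(agent, threshold=0.5):
    """Map a real-valued agent to a 0/1 feature mask (threshold is an assumption)."""
    return [1 if v > threshold else 0 for v in agent]


def knn_error(train_X, train_y, val_X, val_y, k=1):
    """Fraction of validation samples misclassified by a k-nearest-neighbor vote."""
    errors = 0
    for x, y in zip(val_X, val_y):
        ranked = sorted(
            range(len(train_X)),
            key=lambda n: sum((a - b) ** 2 for a, b in zip(train_X[n], x)),
        )
        votes = Counter(train_y[n] for n in ranked[:k])
        if votes.most_common(1)[0][0] != y:
            errors += 1
    return errors / len(val_X)


def fitness(agent, train_X, train_y, val_X, val_y, lam=0.9, k=1):
    """Weighted sum of classification error and ratio of selected features."""
    mask = binarize(agent)
    keep = [j for j, bit in enumerate(mask) if bit]
    if not keep:  # assumption: an empty subset gets the worst fitness
        return 1.0
    project = lambda rows: [[row[j] for j in keep] for row in rows]
    error = knn_error(project(train_X), train_y, project(val_X), val_y, k)
    return lam * error + (1 - lam) * len(keep) / len(mask)


# Toy data: feature 0 separates the classes, feature 1 is noise.
train_X = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.2], [0.9, 0.8]]
train_y = [0, 0, 1, 1]
val_X = [[0.05, 0.9], [0.95, 5.1]]
val_y = [0, 1]
fit_one_feature = fitness([0.9, 0.1], train_X, train_y, val_X, val_y)
fit_two_features = fitness([0.9, 0.9], train_X, train_y, val_X, val_y)
```

Both agents classify the toy validation samples correctly, so the agent that keeps only the informative feature obtains the lower (better) fitness, which is the behavior the weighted objective is meant to reward.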

Stop Learning Phase.
During this phase, the termination criteria are checked. If they are not met, the updating process is performed again. Otherwise, $X_b$ is returned as the output, and it is applied to reduce the testing set that is used in the next phase.
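Reducing the testing set with the best agent amounts to keeping only the columns whose entries in the binary version of the agent equal one. A minimal Python sketch follows; the function name `select_features` and the 0.5 threshold are assumptions made for illustration.

```python
def select_features(samples, best_agent, threshold=0.5):
    """Keep only the features whose agent entry exceeds the threshold."""
    keep = [j for j, value in enumerate(best_agent) if value > threshold]
    return [[row[j] for j in keep] for row in samples]


# Hypothetical best agent that keeps the first and third of three features.
test_set = [[1, 2, 3], [4, 5, 6]]
reduced = select_features(test_set, [0.9, 0.2, 0.7])
```

The same mask must be applied to the training and testing sets so that the classifier sees identical feature subsets in both.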

Fourth Phase: Evaluation Performance.
To evaluate the performance of the developed RSA, the best agent $X_b$ is employed to remove from the testing set those features that correspond to zeros and are considered irrelevant. Then, the classification quality is computed using several evaluation measures. Algorithm 1 presents the full steps of the proposed RSA. The complexity of the developed RSA method is

Experimental Series and Results
This section presents the experiments conducted to evaluate the developed IoT security approach. The evaluation is based on different evaluation metrics and real-world datasets and includes extensive comparisons with different feature selection techniques.

Evaluation Measures.
Several evaluation indicators are used to assess the quality of the proposed approach and all comparative methods.
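We define these indicators according to the confusion matrix (see Table 1), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Average Accuracy (AV_Acc).
It is the accuracy of the best solution averaged over the runs:
$$AV_{Acc} = \frac{1}{N_r} \sum_{k=1}^{N_r} Acc_{Best}^{k}, \quad Acc_{Best} = \frac{TP + TN}{TP + TN + FP + FN},$$
where $N_r = 30$ refers to the number of runs.

Average Recall (AV_Sens).
This is also known as the true-positive rate (TPR), and it refers to the percentage of intrusions predicted positively. It is calculated as
$$AV_{Sens} = \frac{1}{N_r} \sum_{k=1}^{N_r} Sens_{Best}^{k}, \quad Sens_{Best} = \frac{TP}{TP + FN}.$$

Figure 4: Steps of the RSA as an FS model for IoT security.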
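The averaged indicators can be computed directly from per-run confusion-matrix counts. The Python sketch below assumes the standard definitions of accuracy and recall; the function names and the example counts are hypothetical and are not results from the experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, fn):
    """True-positive rate: share of actual intrusions that were detected."""
    return tp / (tp + fn)


def average_over_runs(values):
    """Mean of a metric over independent runs."""
    return sum(values) / len(values)


# Hypothetical (TP, TN, FP, FN) counts from two independent runs.
runs = [(90, 85, 15, 10), (80, 90, 10, 20)]
av_acc = average_over_runs([accuracy(tp, tn, fp, fn) for tp, tn, fp, fn in runs])
av_sens = average_over_runs([recall(tp, fn) for tp, tn, fp, fn in runs])
```

Averaging over runs matters here because metaheuristic feature selection is stochastic, so a single run can be unrepresentative.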

Performance Improvement Rate (PIR).
It is used to compute the rate of improvement achieved by the developed method over a competing algorithm. It is computed from $M_{RSA}$ and $M_{Alg}$, which indicate the value of a measure (i.e., precision, accuracy, recall, or F1-measure) for the RSA and for the other algorithm, respectively.

Experiments Setup.
The proposed CNN model in this study was trained for 100 epochs with early stopping, using 2024 samples in each training batch. We save the best model during training, which results in a good performance on each dataset. The Adam [55] optimizer was used, with the learning rate set to 0.005. The CNN model was trained on an Nvidia GTX 1080 GPU and implemented using the PyTorch framework. The complexity of the CNN can be measured by the total number of parameters updated during training, which is equal to 63,432. The proposed RSA was evaluated and compared to the following optimization algorithms: multiverse optimization algorithm (MVO) [56], whale optimization algorithm (WOA) [57], moth flame optimization (MFO) [58], grey wolf optimizer (GWO) [59], transient search optimization (TSO) [60], Bat (BAT) algorithm [61], and firefly algorithm (FFA) [62]. The parameters of each of these algorithms are set according to its original implementation. However, the common parameters, namely the number of iterations and the number of agents, are set to 50 and 20, respectively.

Dataset Description.
To validate the proposed framework, we used the KDDCup-99, NSL-KDD, CICIDS-2017, and BoT-IoT datasets. These are well-known datasets used to assess IDS techniques; the KDDCup-99 and NSL-KDD datasets share the same source of data and the same intrusion type labels. Both KDDCup-99 and NSL-KDD were used to compare the proposed framework with other methods. Tables 2-4 list the datasets and the corresponding labels and sample distributions in the training and testing sets. The NSL-KDD dataset was built from KDDCup-99 and represents a refined version without duplicated network traffic samples. KDDCup-99 was created during the challenge on intrusion detection held by DARPA (Defense Advanced Research Projects Agency) in 1998. The KDDCup-99 data were gathered from MIT Lincoln Laboratory experiments, where network traffic data were recorded over a period of 10 weeks. The experimental setup comprised around 1000 UNIX machines and 100 users. The collected network traffic data amounted to around 5 million records stored in a raw transmission control protocol/Internet protocol (TCP/IP) dump format. Due to the enormous size of the dataset, the data collectors released a smaller version representing only 10% of the total connection records, with 41 features for each record and the following types of attack: denial-of-service (DoS), probing, remote-to-user (R2L), and user-to-root (U2R). Meanwhile, the BoT-IoT dataset [63] consists of more than 72 million connection records gathered from many IoT devices. The dataset was collected by the Cyber Range Lab (at the UNSW Institute for Cyber Security) in Australia. We used only 5% of the entire dataset in our experiments, consisting of around 3.5 million records with ten features. The CICIDS-2017 dataset [64] consists of 79 network flow features extracted from network traffic using the CICFlowMeter tool. The CICIDS-2017 dataset was collected by the CIC (Canadian Institute for Cybersecurity) to emulate real-world data (PCAPs). In addition, the collected connection records cover a variety of network protocols, including SSH, e-mail, HTTP, and FTP, generated by 25 users on machines with varying operating systems.
Input: t_max: number of generations, and N: number of agents.
Normalize the IoT data using equation (11).
Apply the CNN-based feature extraction (see Section 4.2).
Divide the dataset, represented by the extracted features, into training and testing sets.
Generate the initial population X by applying (15).
Set t = 1.
While t ≤ t_max do
    Apply (16) to find the Boolean form of each solution X_i.
    Use (17) to calculate the fitness value Fit_i of each X_i.
    Allocate the best solution X_b.
    Update X_i using (3)-(9).
    t = t + 1.
End while
Use the relevant features (corresponding to ones) inside X_b to reduce the testing set.
Output: return X_b and the values of the evaluation indicators.

Tables 5 and 6 report the average of each performance measure over 25 independent runs for the binary and multiclass cases. For the multiclass case, two points stand out. First, the developed RSA outperforms the competing algorithms on all performance measures during the learning stage on KDD99, NSL-KDD, and CIC2017; on BIoT, however, it ranks second, behind MFO, which has better results. Second, the ability of the RSA to detect the attack type on the testing samples is higher than that of the other methods on all four datasets.
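The feature-selection loop in the procedure listed above can be illustrated with a minimal Python sketch. Equations (15)-(17) and (3)-(9) are not reproduced in this section, so several parts are assumptions rather than the authors' exact formulation: the uniform initialization in [0, 1], the 0.5 binarization threshold, and the weighted fitness (weight `lam` on the classification error, `1 - lam` on the fraction of selected features) are common choices in MH-based feature selection. The position update is a simple placeholder and is not the RSA operators. The functions `select_features`, `to_binary`, `fitness`, and `toy_error` are hypothetical names used only for illustration.

```python
import random

def to_binary(position, threshold=0.5):
    """Map a continuous position in [0, 1]^D to a 0/1 feature mask (assumed rule)."""
    return [1 if v > threshold else 0 for v in position]

def fitness(mask, error_fn, lam=0.9):
    """Assumed fitness: weighted sum of error and fraction of kept features."""
    if sum(mask) == 0:  # an empty subset cannot be classified
        return 1.0
    return lam * error_fn(mask) + (1.0 - lam) * sum(mask) / len(mask)

def select_features(error_fn, dim, n_agents=10, t_max=30, lam=0.9, seed=0):
    rng = random.Random(seed)
    # Initial population: uniform random positions in [0, 1]^D (assumption).
    pop = [[rng.random() for _ in range(dim)] for _ in range(n_agents)]
    best_pos, best_fit = None, float("inf")
    history = []
    for _ in range(t_max):
        # Binarize each agent, evaluate it, and keep the best solution so far.
        for pos in pop:
            fit = fitness(to_binary(pos), error_fn, lam)
            if fit < best_fit:
                best_pos, best_fit = list(pos), fit
        history.append(best_fit)
        # Placeholder update (NOT the RSA equations): drift toward the best
        # solution with Gaussian noise, then clip back into [0, 1].
        for pos in pop:
            for j in range(dim):
                step = rng.random() * (best_pos[j] - pos[j]) + rng.gauss(0.0, 0.1)
                pos[j] = min(1.0, max(0.0, pos[j] + step))
    return to_binary(best_pos), best_fit, history

def toy_error(mask):
    """Hypothetical stand-in for a classifier's error on the training set."""
    informative = (0, 3, 5)
    return 1.0 - sum(mask[j] for j in informative) / len(informative)

mask, fit, history = select_features(toy_error, dim=8)
print(mask, round(fit, 4))
```

In a real experiment, `toy_error` would be replaced by the error of a classifier trained on the columns of the CNN-extracted features selected by `mask`, and the final mask would be applied to the testing set.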
Furthermore, the binary classification results on the four datasets show the high performance of the RSA in both the learning and evaluation stages. In particular, high quality is achieved on KDD99 and NSL-KDD, whereas on the other two datasets (i.e., BIoT and CIC2017) the results of the competing methods are nearly the same, with slightly better performance for the developed method.
Moreover, Figure 5 depicts the average of each method over all the tested datasets in terms of each performance measure. It can be observed from this figure that the RSA has the highest average across the performance metrics in the training and testing stages of both binary and multiclass classification, followed by MVO in the multiclass case, which provides better accuracy than the remaining algorithms. BAT has a better recall value in the training and testing stages and provides a better F1-measure value in the testing stage. In training, MFO and GWO each have higher precision and F1-measure values than the other algorithms, whereas in testing, FFA has a higher precision value than the other methods. The same observation for MVO holds in the case of binary classification. MFO and GWO perform better in terms of F1-measure and precision, respectively, in the training and testing stages, and BAT provides a better recall value across the tested datasets in both stages.
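The four measures discussed above can be computed for a multiclass problem as in the following sketch. The averaging scheme is an assumption: the per-class precision, recall, and F1 are weighted by class support, which makes the averaged recall equal to the accuracy. The testing results reported later for KDD99 list identical accuracy and recall values, which is consistent with (though not proof of) this choice. The function name `weighted_metrics` and the labels are illustrative only.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 (assumed averaging)."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / count
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = count / n  # share of samples that belong to this class
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return accuracy, precision, recall, f1

# Made-up labels, for illustration only.
y_true = ["normal", "normal", "dos", "dos", "dos", "probe"]
y_pred = ["normal", "dos", "dos", "dos", "probe", "probe"]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(acc, prec, rec, f1)
```

With support weighting, each class contributes its true positives divided by the total number of samples, so the weighted recall sums to the overall fraction of correct predictions.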
For further analysis of the obtained results, we used the Friedman test [65] to check whether the difference between the competing methods is significant. The Friedman test provides a mean rank for each method, as given in Table 7. From these mean ranks, we can conclude that the RSA has the highest mean rank on the performance measures in both classification scenarios (binary and multiclass), followed by MVO, FFA, MFO, and BAT, which have high mean ranks according to accuracy, precision, F1-measure, and recall, respectively. These results indicate the high ability of the developed method to improve attack prediction in the IoT environment. However, the developed method has some limitations, such as the time consumed by learning the model; this could be mitigated by using transfer learning techniques.
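As a minimal sketch of how Friedman mean ranks are obtained, the following Python code ranks the methods within each block (for example, one dataset or one run), averages the ranks per method, and computes the Friedman chi-square statistic. The scores are made-up values for illustration, not the paper's results, and the function names are hypothetical. Ranks are ascending so that a higher mean rank means better performance, as in the discussion above; no tie correction is applied to the statistic, and the p-value (chi-square with k - 1 degrees of freedom) is not computed.

```python
def ascending_ranks(row):
    """Rank values from 1 (smallest) to k (largest); ties share the average rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        k = i
        while k + 1 < len(order) and row[order[k + 1]] == row[order[i]]:
            k += 1
        avg = (i + k) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. k+1
        for m in range(i, k + 1):
            ranks[order[m]] = avg
        i = k + 1
    return ranks

def friedman(scores):
    """scores[b][j] is the score of method j on block b (e.g., a dataset)."""
    n, k = len(scores), len(scores[0])
    rank_rows = [ascending_ranks(row) for row in scores]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    # Friedman statistic expressed through the mean ranks (no tie correction).
    stat = 12.0 * n / (k * (k + 1)) * (
        sum(m * m for m in mean_ranks) - k * (k + 1) ** 2 / 4.0
    )
    return mean_ranks, stat

# Made-up accuracy values: 4 blocks (rows) x 3 methods (columns).
scores = [
    [0.92, 0.90, 0.88],
    [0.95, 0.91, 0.93],
    [0.89, 0.87, 0.85],
    [0.97, 0.96, 0.94],
]
mean_ranks, stat = friedman(scores)
print(mean_ranks, stat)  # [3.0, 1.75, 1.25] 6.5
```

Here the first method is ranked best in every block, so it receives the maximum possible mean rank of 3.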

Conclusion
This article presented a new method for intrusion detection systems (IDSs) in Internet of things (IoT) and cloud environments. The main idea is to leverage deep learning and metaheuristic optimization algorithms to build robust feature extraction and selection techniques. First, a one-dimensional convolutional neural network (CNN) is used to extract the relevant features. Second, the reptile search algorithm (RSA) is employed to select an optimal feature subset in order to reduce data dimensionality and boost classification accuracy. Several well-known public datasets were used to assess the performance of the suggested techniques. Moreover, extensive experimental comparisons were carried out to confirm the quality of the RSA as a feature selection technique. The outcomes revealed that the RSA obtained better performance than several optimization approaches, such as PSO, FA, GWO, WOA, TSO, BAT, and MVO. It recorded over 99% in all training scenarios on all datasets. It also recorded high results in the testing scenario; for example, in multiclass classification, the RSA obtained 92.040%, 89.684%, 89.985%, and 92.040% for accuracy, precision, F1, and recall, respectively, on the KDD99 dataset. In binary classification, the proposed method also recorded high results; for example, it recorded 92.344%, 94.335%, 92.763%, and 92.344% for accuracy, precision, F1, and recall, respectively, on the KDD99 dataset in the testing scenario. For the other datasets, the proposed RSA also recorded superior results in all evaluation tests using several classification indicators. We conclude that applying the CNN with the RSA has a significant impact on the IDS classification process. For future work, other issues could be addressed; for example, the convergence speed of the RSA needs to be improved. Thus, other artificial search mechanisms could be integrated with the RSA to tackle this problem.
Also, in future work, we may consider applying the RSA for training deep learning models to boost the classification process for different applications, including IDS.

Data Availability
The data used to support the findings of this study are available from the authors upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.