Improved TLBO-JAYA Algorithm for Subset Feature Selection and Parameter Optimisation in Intrusion Detection System



Introduction
Recent advancements in, and the popularisation of, network and information technologies have increased the significance of network information security. Compared with conventional network defence mechanisms, human-based smart intrusion detection systems (IDSs) can either intercept or warn of network intrusion. Accordingly, most studies on information security have focused on ways to improve the effectiveness of smart network IDSs. The use of smart IDSs is an effective network security solution that can protect against attacks. Nonetheless, the detection rate of existing IDSs is low when they are faced with audit data that have a high overhead, which is why machine learning (ML) methods and optimisation algorithms are often used for intrusion detection [1]. The execution time can sometimes increase substantially when one attempts to raise the detection accuracy. Conversely, the execution time may be significantly reduced but at the cost of decreased accuracy. Therefore, the feature subset selection (FSS) problem can be considered a multiobjective optimisation problem; it has more than one solution, from which the best may be chosen. Users who prioritise precision select the solutions that offer superior accuracy, whereas others prefer the solutions with reduced execution times, even though accuracy is compromised to a certain extent.
The teaching-learning-based optimisation (TLBO) algorithm, a novel metaheuristic, has recently been applied to various intractable optimisation problems with considerable success. TLBO is superior to many other algorithms, such as genetic algorithms (GAs), particle swarm, and ant colony. Moreover, TLBO needs fewer parameters to be tuned during execution than other algorithms. Thus, the combination of improved multiobjective TLBO frameworks with supervised ML techniques is proposed in the present study for FSS in multiclass classification problems (MCPs) for intrusion detection. Selecting the smallest number of features without affecting the accuracy of the result is a multiobjective optimisation problem: the first objective is the number of features, and the second is the detection accuracy. Because TLBO has outperformed other metaheuristic algorithms, an improved TLBO (ITLBO) and a supervised support vector machine (SVM) were deployed in this study for the selection of the optimal feature subset.

JAYA is a metaheuristic optimisation algorithm proposed by Rao (2016) that has recently been applied to several intractable optimisation problems. JAYA differs from other optimisation algorithms in that it requires no algorithm-specific parameter tuning [2]. It has been tested on constrained and unconstrained benchmark functions, and although it is parameterless like TLBO, it requires no learning phase, which distinguishes it from TLBO [3]. The principle of JAYA is to establish the solution to a problem by moving towards the best result and away from the worst one. This movement depends only on common control parameters, such as the number of design variables, the maximum number of generations, and the population size; no algorithm-specific control parameter needs to be tuned before the computation phase. Thus, an improved parallel JAYA (IPJAYA) algorithm is used to tune the parameters of the SVM.

In this paper, to improve the feature selection process and SVM parameter tuning, we propose an improved algorithm for feature subset selection based on an enhanced TLBO algorithm, which adds a phase to TLBO to increase the information exchange between teachers and learners. SVM parameter tuning is based on the improved parallel JAYA algorithm, which uses parallel processing to increase the speed of parameter tuning. The proposed algorithm is called ITLBO-IPJAYA-SVM.
The remainder of this paper is organised as follows. Section 2 reviews work related to this study, and the FSS problem is introduced in Section 3. The ITLBO is discussed in Section 4, and Section 5 explains ML applied with ITLBO. Section 6 compares the results of the ITLBO and TLBO algorithms. Finally, Section 7 concludes this study.

Related Work
Intrusion detection is a prevalent security infrastructure topic in the era of big data. Combinations of different ML methods and optimisation algorithms have been developed and applied in IDSs to distinguish normal network access from attacks. Existing combinations include fuzzy logic, the cuttlefish optimisation algorithm, K-nearest neighbour, artificial neural networks, the particle swarm algorithm, support vector machine (SVM), and artificial immune system approaches [4]. Most methods that combine ML with optimisation algorithms outperform conventional classification methods. Numerous researchers have also proposed ML and optimisation-based IDSs [5]. Louvieris et al. [6] proposed a novel combination of techniques (K-means clustering, naïve Bayes (NB), Kruskal-Wallis (KW), and C4.5) that pinpointed attacks as anomalies with high accuracy even within cluttered and conflicted cyber-network environments. The inclusion of NB feature selection and the KW test in this method facilitates the classification of statistically significant and relevant feature sets, including a statistical benchmark for the validity of the method; however, its detection of SQL injection remains low. De la Hoz et al. [7] presented a method for network IDSs (NIDSs) based on self-organising maps (SOMs) and principal component analysis (PCA). Noise within the dataset and low-variance features were filtered by means of PCA and the Fisher discriminant ratio. This procedure uses the most discriminative projections based on the variance explained by the eigenvectors. Prototypes generated by the self-organising process are modelled by a Gaussian, where d is the number of SOM units. Therefore, this system must be trained only once; however, the main limitation of this work is that the detection rate remains low. Bamakan et al. [8] proposed a chaos-particle swarm optimisation method to provide a new ML IDS based on two conventional classifiers: multiple-criteria linear programming and an SVM. The proposed approach was applied to simultaneously set the parameters of these classifiers and provide the optimal feature subset. The main drawback of this work is the long training time needed. Therefore, even though these combinations can improve the performance of IDSs in terms of learning speed and detection rate compared with conventional algorithms, further improvement is needed.

The classification accuracy and training time of most IDSs are adversely affected by an increase in the number of audit data features.
The present paper proposes the use of TLBO to address this issue by supplying a fast and accurate optimisation process that can improve the capability of an IDS to find the optimal detection model based on ML. In the TLBO algorithm proposed by Rao et al. [9], the optimisation process for mechanical design problems does not need any user-defined parameter. This novel technique was tested on different benchmark functions, and the results demonstrated that TLBO outperformed particle evolutionary swarm optimisation, artificial bee colony (ABC), and cultural differential evolution (DE). Das and Padhy [10] studied the possibility of applying a novel TLBO algorithm to the selection of optimal free parameters for an SVM regression model of financial time-series data by using multicommodity futures index data retrieved from the Multi Commodity Exchange (MCX). Their experimental results showed that the proposed hybrid SVM-TLBO model successfully identified the optimal parameters and yielded better predictions than the conventional SVM. Das et al. [11] proposed an extension of the hybrid SVM-TLBO model by introducing a dimension reduction technique whereby the number of input variables is reduced by using one of three common dimension reduction methods: PCA, kernel PCA (KPCA), and independent component analysis (ICA). This study also examined the feasibility of the proposed model using multicommodity futures index data retrieved from MCX. Rao et al. [12] confirmed the superiority of the model compared with some population-inspired optimisation frameworks. Rao and Patel [13] investigated the effect of sample size and number of generations on algorithmic performance and concluded that this algorithm can be easily applied to several optimisation cases. Črepinšek et al. [14] solved the problems presented in [9,12] by using TLBO. Nayak et al. [15] developed a multiobjective TLBO in which a matrix of solutions was created for each objective. The teacher selection process in TLBO is mainly based on the best solution present in the solution space, and learners are taught merely to maximise that objective.
All the available solutions in the solution space were sorted to generate a collection of optimal solutions. Xu et al. [16] presented a multiobjective TLBO based on different teaching techniques. They used a crossover operator (rather than a scalar function) between solutions in the teaching and learning phases. Kiziloz et al. [17] suggested three multiobjective TLBO algorithms for FSS in binary classification problems (FSS-BCP). Among the presented methods, a multiobjective TLBO with scalar transformation was found to be the fastest algorithm, although it provided a limited number of nondominated solutions. Multiobjective TLBO with nondominated selection (MTLBO-NS) explores the solution space and produces a set of nondominated solutions but requires a long execution time. Multiobjective TLBO with minimum distance (MTLBO-MD) generates solutions similar to those of MTLBO-NS but in a significantly shorter time. The performance of these multiobjective TLBO algorithms was evaluated using logistic regression (LR), SVM, and extreme learning machine (ELM).

Wang et al. [18] suggested a novel "alcoholism identification method from healthy controls based on a computer-vision approach." This approach relied on three components: the proposed wavelet Renyi entropy, a feedforward neural network, and the proposed three-segment encoded JAYA algorithm. The results showed that the method exhibits good sensitivity, but its accuracy still needs improvement. Migallón et al. [19] developed parallel algorithms and presented their detailed analysis. They developed a hybrid algorithm that exploited inherent parallelism at two different levels. The lower level was exploited by parallel shared-memory platforms, while the upper level was exploited by distributed shared memory platforms. The results of both algorithms were good, especially in scalability. Hence, the proposed hybrid algorithm successfully used a number of processes with near-perfect efficiencies. The experiments showed that the method used about 60 processes to achieve near-ideal efficiencies as analysed on 30 unconstrained functions. Gong [20] suggested a "novel E-JAYA algorithm for the performance enhancement of the original JAYA algorithm." The proposed E-JAYA used the average of the better and worse groups to derive the best solution. The solution provided by E-JAYA was more accurate than that of the original JAYA. E-JAYA considered swarm behaviours rather than only the best and worst individual behaviours. The performance of E-JAYA was assessed on 12 benchmark functions of varying dimensionality.

Another study proposed an effective demand-side management scheme for residential home energy management systems (HEMS) [21]. The system was designed to prevent peak creation and thereby reduce electricity bills. This study applied JAYA, SBA, and EDE to realise its objectives; it also deployed the time-of-use (TOU) pricing scheme for electricity bill computation. According to the results, JAYA was sufficient in reducing the electricity bill and the peak-to-average ratio (PAR), thereby achieving customer satisfaction. Furthermore, SBA outperformed JAYA and EDE in achieving user comfort, as comfort related negatively to the electricity bill. Yu et al. [22] developed an improved JAYA (IJAYA) for steady and accurate photovoltaic (PV) model parameter estimation by incorporating a self-adaptive weight that adjusts the tendency to approach the best solution and avoid the worst solution while searching. The weight helps the framework reach the promising search region early and perform local search later. Furthermore, the algorithm contains a learning strategy derived from other individuals' experiences, which is applied randomly to improve population diversity. Table 1 shows the shortcomings and limitations of the IDS studies mentioned in the related work.

Feature Subset Selection Problem
This section explains the representation of the features and the problem of choosing the best feature subset. FSS refers to the selection of feature subsets from a larger feature set. FSS reduces the number of features in a dataset, thereby preventing complex calculations and improving the speed and performance of classifiers. Several definitions of FSS exist in the literature [23]; some deal with reducing the size of the selected subset, while others focus on improving prediction accuracy. FSS is essentially a process of constructing an effective subset that represents the information contained in a dataset by eliminating redundant and irrelevant features. FSS mainly aims to find the smallest number of features without significantly influencing classification accuracy. Owing to the complicated nature of optimal feature subset extraction, as well as the nonexistence of a polynomial-time algorithm for addressing it, FSS has been classified as an NP-hard problem [24].

There are four steps in typical FSS [23]. The first step is the selection of candidate features that will constitute the subsets, while the second step is the evaluation and comparison of these subsets with each other. In the third step, the termination condition is checked; if it is not satisfied, the first and second steps are repeated. The final step checks if the optimal feature subset has been established based on prior knowledge. With its two major aims, namely, reducing the number of features and preserving classification accuracy, FSS can be considered a multiobjective problem. A formal definition of finding optimal solutions through the satisfaction of both objectives is given in the following equation:

where k is the subset of the original dataset K which optimises f1 and f2 (the objectives). Establishing the best solution, or deciding whether a new individual is an improvement, is a complicated task in a multiobjective optimisation process. This is because an enhancement in one objective may cause a reduction in the other.
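One way to write the two objectives described in this section is sketched here. The symbols |k| (the number of selected features) and Acc(k) (the classification accuracy obtained with subset k) are assumed notation, introduced only for illustration; this is a plausible formalisation and not necessarily the exact equation used in this study.

```latex
\begin{aligned}
\min_{k \subseteq K} \quad & f_1(k) = |k| \\
\max_{k \subseteq K} \quad & f_2(k) = \mathrm{Acc}(k)
\end{aligned}
```

Because removing features can lower Acc(k), the two objectives generally conflict, and a subset k is preferred to another subset k' only when it is no worse in both objectives and strictly better in at least one.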

Improved TLBO Algorithm
The ITLBO algorithm was executed at the FSS phase in this study. It was initialised with a randomly generated initial population, namely, the teacher and a set of students, which represents the set of solutions. To represent the features, ITLBO borrowed the crossover and mutation operators from GA by encoding the features as chromosomes (a GA concept); these operators were then used to update the chromosomes. In the population (called a classroom), each solution is taken as an individual/chromosome (Figure 1). A feature gene with a value of 1 is considered selected, while a value of 0 denotes otherwise. Figure 1 shows a sample of the dataset; in Figure 2, features A, B, C, D, E, I, K, and L were selected (their values are 1), while features F, G, H, and J were not (their values are 0). The TLBO algorithm runs through iterations in which the teacher is the best individual in the population and the rest of the individuals become the students. Having selected the teacher, ITLBO works in three phases: Teacher, Best Classmates (Learner Phase 1), and Learner Phase 2.

In the Teacher phase, the teacher enhances the knowledge of each student by sharing knowledge with them, whereas in the Best Classmates phase, the two best students are selected and assigned the task of interacting with the other students. In Learner Phase 2, the students interact randomly in a bid to enhance their levels of knowledge. New chromosomes are generated in the proposed ITLBO using "half-uniform crossover and bit-flip mutation operators" (Figures 3 and 4). Two parent chromosomes (a teacher and a student, or two students) are needed for the crossover operator. The crossover operator relies on the information of the two parent chromosomes: if both parents have the same value for a gene, that gene is kept, but where the parents' genes differ, one parent's gene is chosen at random. Only one new chromosome is generated from this operation.

The "bit-flip mutation" works on a single chromosome and manipulates a single gene based on a probabilistic ratio. If the gene has a value of zero, it is updated to one, and vice versa. In the proposed ITLBO algorithm, nondominated sorting and selection were used. An individual dominates another if at least one of its objectives is superior to that of the other while all its other objectives are no worse. An individual is nondominated when no other individual dominates it. The front line of the solution set is filled by the nondominated individuals. Those that are closest to the ideal point in the front line are chosen as the teachers. All the teachers teach all students separately in the Teacher, Best Classmates, and Learner phases.
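The chromosome encoding and the two operators described in this section can be illustrated with a minimal Python sketch. The function names and the per-gene mutation rate are assumptions made for illustration; the crossover follows the description given here (shared genes are kept, differing genes are taken from a randomly chosen parent) rather than any particular library implementation.

```python
import random


def half_uniform_crossover(parent_a, parent_b, rng):
    """Return one child: shared genes are kept, differing genes come from a random parent."""
    child = []
    for gene_a, gene_b in zip(parent_a, parent_b):
        if gene_a == gene_b:
            child.append(gene_a)
        else:
            child.append(rng.choice((gene_a, gene_b)))
    return child


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each gene (0 -> 1, 1 -> 0) with probability `rate` (assumed per-gene rate)."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def selected_features(chromosome, names):
    """List the feature names whose gene is 1."""
    return [name for name, gene in zip(names, chromosome) if gene == 1]
```

For the twelve features A to L of the example in Figure 2, the chromosome `[1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]` decodes to the selected features A, B, C, D, E, I, K, and L.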
The details of the ITLBO algorithm are presented in Figures 5 and 6. The detailed steps of ITLBO are as follows:
(i) Step 1: initialise the population randomly, with each individual having a different set of features from 1 to the maximum number of features (41 in NSL-KDD). This step is captured in line 2 of Figure 5.
(ii) Step 2: choose the best individual as a teacher. The chosen teacher interacts with all other individuals separately; a crossover is applied with each one, and then a mutation is applied to all the resulting individuals. The operators used are the half-uniform crossover and bit-flip mutation operators (represented in lines 4 to 5 in Figure 5).
(iii) Step 3: check the individual (chromosome) that results from the crossover and mutation; if the new chromosome is better than the old one, the new one is kept; otherwise, the old one is retained. All the aforementioned steps are collectively called the Teacher phase because all individuals learn from the best one (the teacher). This step is represented in lines 6 to 13 in Figure 5.
(iv) Step 4: Learner Phase 1, or learning from the best classmates, is then started. This phase begins with the selection of the best two individuals as students, between whom a crossover is applied, followed by a mutation. If the new individual is better than the previous two students, the new one is kept; otherwise, the better of the old ones is kept. This process is repeated with all other individuals (students). At this point, Learner Phase 1 has terminated (viewed in lines 14 to 27 in Figure 5).
(v) Step 5: this step is Learner Phase 2, which involves choosing two random individuals (students) between whom a crossover is applied, followed by a mutation on the new individual. If the new individual is better than the old two students, the new one is kept; otherwise, the better of the old ones is retained. This step is repeated with all other students. At this point, the main three stages of ITLBO have been completed, and a check should be carried out on whether the termination criteria have been satisfied.

Table 1
Ref.    Limitation
[6]     Detection of SQL injection is low
[7]     Detection rate is low
[8]     Long training time

Figure 1: Schematic representation of a chromosome: 1 = selected features; 0 = unselected features.

Figure 5 (excerpt):
(7) for (i := 1 to number_of_individuals) do
(8)     Xnew := Crossover(Xteacher, Xi);
(9)     Xnew := Mutation(Xnew);
(10)    if (Xnew is better than Xi) then
(11)        Xi := Xnew
(12)    End if
(13) End for
(14) /* Learning from Best Classmates (learner phase 1) */
(15) for (i := 1 to number_of_individuals) do
(16)    m := Select_best_individual_from(population);
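The three phases listed in the steps above can be sketched as a short Python loop. This is a simplified illustration under stated assumptions, not the implementation used in this study: a single scalar `fitness` (higher is better) stands in for the multiobjective, nondominated comparison; the pairing rule of Learner Phase 1 (students paired by rank, each used once) is one reading of the text; and `toy_fitness`, `USEFUL`, and all default parameter values are hypothetical.

```python
import random


def crossover(a, b, rng):
    # Shared genes are kept; differing genes come from a randomly chosen parent.
    return [x if x == y else rng.choice((x, y)) for x, y in zip(a, b)]


def mutate(chromosome, rate, rng):
    # Flip each bit with probability `rate` (a per-gene rate is an assumption).
    return [1 - g if rng.random() < rate else g for g in chromosome]


def learn_in_pairs(pop, pairs, fitness, rate, rng):
    # Rule shared by both learner phases: the child replaces the weaker
    # parent only if it is better than both parents.
    for i, j in pairs:
        child = mutate(crossover(pop[i], pop[j], rng), rate, rng)
        if fitness(child) > max(fitness(pop[i]), fitness(pop[j])):
            weaker = i if fitness(pop[i]) < fitness(pop[j]) else j
            pop[weaker] = child


def itlbo(fitness, n_features, pop_size=10, generations=20, rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Teacher phase: every individual learns from the current best one.
        teacher = max(pop, key=fitness)
        for i in range(pop_size):
            child = mutate(crossover(teacher, pop[i], rng), rate, rng)
            if fitness(child) > fitness(pop[i]):
                pop[i] = child
        # Learner Phase 1: pair students by rank, starting with the best two.
        ranked = sorted(range(pop_size), key=lambda i: fitness(pop[i]), reverse=True)
        learn_in_pairs(pop, list(zip(ranked[0::2], ranked[1::2])), fitness, rate, rng)
        # Learner Phase 2: pair students at random.
        shuffled = list(range(pop_size))
        rng.shuffle(shuffled)
        learn_in_pairs(pop, list(zip(shuffled[0::2], shuffled[1::2])), fitness, rate, rng)
        history.append(max(fitness(c) for c in pop))
    return max(pop, key=fitness), history


# Hypothetical stand-in for classifier accuracy: reward "useful" genes, penalise the rest.
USEFUL = [1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]


def toy_fitness(chromosome):
    hits = sum(g for g, u in zip(chromosome, USEFUL) if u)
    noise = sum(g for g, u in zip(chromosome, USEFUL) if not u)
    return hits - 0.5 * noise
```

Calling `itlbo(toy_fitness, n_features=12)` returns the best chromosome found and the best fitness per generation. Because an individual is only ever replaced by a better one, that history never decreases.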

Parameter Optimisation
After the optimal feature subset has been selected, several SVM parameters are tuned. The tuning of SVM parameters is a problem that can determine the performance of the algorithm. The ... (2) and (3) represent the cost and gamma, respectively. In the next section, the two parameters (C and γ) are tuned by using the IPJAYA algorithm.

Improved Parallel JAYA Algorithm
The JAYA algorithm can be improved. One observation is that if the population is sorted from best to worst and divided into two groups, a best group and a worst group, the optimal solution is located in the best group [2]. Based on this observation, the JAYA algorithm was modified: rather than selecting the best and worst cases from the whole set of solutions, which places the worst solution far from the best solution and increases the number of iterations needed to reach the optimal solution, the solutions are divided into two groups. The best solution of the best group is chosen as "Best," and the best solution of the worst group is chosen as "Worst." This procedure preserves the population's diversity, makes the search start from a point closer to the optimal solution, and decreases the number of iterations needed to reach it. In the proposed work, the JAYA algorithm was improved to optimise two parameters of the SVM classifier simultaneously. Figure 7 shows the flowchart of IPJAYA, while Figure 8 shows the IPJAYA algorithm, followed by the detailed steps of IPJAYA.
To continue the optimisation process, the population is arranged from best to worst and split into two groups (Best and Worst groups), as shown in Table 4. The same procedure is repeated for the γ parameter; this time, C is kept at its default value, and the new value of γ is 11.006.
(iv) Tables 5 and 6 show the details of the γ parameter.
(v) Step 4: the result is taken as the objective function value for both C and γ and then compared with the other populations; the process continues until the termination criterion is satisfied. This step can be viewed in line 7 of Figure 8.
These two new values for C and γ are evaluated using the same feature subset at the same time, as shown in Table 7. This step can be viewed in lines 5 to 6 of Figure 8.
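The grouping idea described in this section can be sketched in Python as follows. The update rule is the standard JAYA move, x' = x + r1(best - |x|) - r2(worst - |x|), with "Best" taken from the best group and "Worst" taken as the best member of the worst group. Everything else is an assumption made for illustration: the function names, the search bounds, the default values, and `toy_score`, which stands in for the SVM validation accuracy. Threads are used only to show the structure of tuning each parameter independently.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def ipjaya_1d(objective, low, high, pop_size=10, iterations=30, seed=0):
    """Maximise `objective` over one parameter in [low, high]."""
    rng = random.Random(seed)
    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(iterations):
        pop.sort(key=objective, reverse=True)   # best to worst
        half = pop_size // 2
        best = pop[0]                           # best of the best group
        worst = pop[half]                       # best of the worst group
        for i, x in enumerate(pop):
            r1, r2 = rng.random(), rng.random()
            candidate = x + r1 * (best - abs(x)) - r2 * (worst - abs(x))
            candidate = min(max(candidate, low), high)  # keep inside the bounds
            if objective(candidate) > objective(x):     # accept only improvements
                pop[i] = candidate
    return max(pop, key=objective)


def tune_svm_parameters(score, c_default=1.0, gamma_default=0.1):
    """Tune C and gamma independently, each with the other held at its default."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        c_job = pool.submit(ipjaya_1d, lambda c: score(c, gamma_default), 0.01, 100.0, seed=1)
        g_job = pool.submit(ipjaya_1d, lambda g: score(c_default, g), 0.001, 20.0, seed=2)
        return c_job.result(), g_job.result()


def toy_score(c, gamma):
    # Hypothetical smooth surrogate with its maximum at C = 8, gamma = 0.5.
    return -((c - 8.0) ** 2) - (gamma - 0.5) ** 2
```

With a real classifier, `score(c, gamma)` would train and validate an SVM on the selected feature subset. In CPython, a process pool rather than a thread pool would be needed to obtain a real speed-up for such CPU-bound evaluations.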

The Proposed Method
This section describes the proposed combination of three different algorithms. Each algorithm has a different task, and together these tasks complete the work of the model. The first algorithm is the ITLBO, whose task is to choose the optimal feature subset from the whole set of features. The second algorithm is the IPJAYA algorithm, whose task is to optimise the parameters of the SVM. The third algorithm is the SVM classifier, which takes the outcome of the first two algorithms to determine whether the processed traffic is an intrusion or normal traffic. Figure 9 shows the flowchart of the proposed method. Figure 10 shows the pseudo-code of the proposed method, while Figure 11 ...

... and apply crossover between these two students. Then, apply mutation on the new one. If the new one is better than the old two students, keep the new one; otherwise, keep the better of the old ones, and apply this to all other individuals (students). The students are chosen once and will not be chosen again. At this point, Learner Phase 1 has ended. This step can be viewed in lines 14 to 27 of Figure 10.
(vi) Step 6: Learner Phase 2 is initiated with two objectives; one is to optimise the SVM parameters, and the other is to make students learn from each other. This phase starts with choosing two random individuals (students), applying crossover between these two students, and applying mutation on the new individual. After that, and before the classification process is initiated, check if the new student is better than the old two students. The SVM parameter optimisation is started using IPJAYA; this process starts at the 29th step by initialising the population size, the number of design variables, and the termination criteria for IPJAYA. The population size can be set before the execution, and each population is generated randomly. The design variables are the two parameters of the SVM which need to be optimised. The termination criterion can be the number of iterations. After that, each population for each parameter is evaluated separately (which one gives better accuracy), followed by a parallel pool for each parameter, sorting the population from best to worst (best accuracy to worst accuracy), and separating it into two groups (best and worst groups). The best population in the best group is chosen as best, and the best population in the worst group is chosen as worst. Then, the population is modified based on the equation in Figure 8 and updated if the new one is better than the old one. IPJAYA is repeated until the termination criterion is satisfied. The final step of IPJAYA is to deliver the best value of the two parameters to be used by the SVM. At this point, the parameter optimisation has ended, and Learner Phase 2 continues in the next step. This step can be viewed in lines 28 to 39 of Figure 10.
(vii) Step 7: evaluate the individuals (chromosomes) by using the outcome of IPJAYA. If the new individual is better than the old two students, keep the new one; otherwise, keep the better of the old ones. Apply this step to all other students. At this step, the main three stages of the ITLBO have finished. The next step is to check whether the termination criteria are satisfied; if so, proceed to the next step. Otherwise, the main three stages are repeated. This step can be viewed in lines 40 to 48 of Figure 10.
(viii) Step 8: the last step is to apply nondominated sorting on the result. A result (individual) is nondominated when no other individual dominates it. This step can be viewed in line 49 of Figure 10.
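The nondominated selection of Step 8 can be illustrated with a small Python sketch. It assumes that each result is summarised as a pair (number of features, accuracy), where fewer features and higher accuracy are better; the function names and the sample values are hypothetical.

```python
def dominates(a, b):
    """True if result `a` dominates result `b`.

    Each result is a (n_features, accuracy) pair: `a` must be no worse in
    both objectives and strictly better in at least one of them.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def nondominated(results):
    """Keep only the results that no other result dominates."""
    return [r for r in results if not any(dominates(other, r) for other in results)]
```

For the hypothetical results `[(10, 0.95), (12, 0.94), (15, 0.97), (10, 0.90)]`, only `(10, 0.95)` and `(15, 0.97)` remain: the former uses the fewest features, and the latter has the highest accuracy, so neither dominates the other.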

Evaluation Metrics
The metrics, measures, and validation procedures used in the evaluation of the experimental data are reviewed in this section. The literature review showed that most studies use overall accuracy as the major performance measure for ID systems. However, other metrics and validation measures have also been mentioned. Some works have detailed the information on false alarm rate (FAR) detections and missed detections, which are all useful system performance evaluation measures. The following section details the analysis based on standard metrics for objective evaluation of the results achieved by various classification methods.

The performance of the system was evaluated using several metrics based on the NSL-KDD and CICIDS 2017 datasets. A detailed description of learning performance measures has been provided by Singh et al. and Sokolova and Guy [25,26], while Phoungphol et al. [27] detailed the imbalanced dataset issues. One of these metrics is the accuracy as given in the following equation:

Accuracy is the capability of the classifier in predicting the actual class; here, TP = true positive, TN = true negative, FP = false positive, and FN = false negative.

Several metrics can be computed from the confusion matrix. The false-positive rate (FPR) is another metric; it is the percentage of the samples incorrectly predicted as positive by the classifier. It is calculated by using the following equation:

The false-negative rate (FNR) is the percentage of the data incorrectly classified by the classifier as negative. It is calculated by using the following equation:

The detection rate (DR) is the percentage of the samples correctly classified by the classifier to their correct class. It is calculated by using the following equation:

The recall quantifies the number of correct positive predictions made out of all positive predictions. It is calculated by using the following equation:

F-Measure provides a way to combine both detection rate and recall into a single measure that captures both properties. It is calculated by using the following equation:

The results were validated by using the k-fold cross-validation technique [27-30]. This technique requires a random partitioning of the data into k different parts; in each iteration, one part is selected as testing data, while the other (k−1) parts are considered as the training dataset. All the connection records are eventually used for training and testing. For all experiments, the value of k is taken as 10 to ensure low bias, low variance, low overfitting, and a good error estimate [28].
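As an illustration of how such metrics are obtained from the confusion matrix, the following Python sketch uses the usual textbook definitions, which may differ in naming from the equations of this study. In textbook terminology, the share of correct positives among all positive predictions is called precision, whereas recall is measured against all actual positives; both are computed here, and the F-measure is taken as their harmonic mean. The function names and the fold-splitting helper are assumptions made for illustration.

```python
import random


def classification_metrics(tp, tn, fp, fn):
    """Textbook confusion-matrix metrics (assumed definitions)."""
    precision = tp / (tp + fp)        # correct positives among positive predictions
    recall = tp / (tp + fn)           # correct positives among actual positives
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "fpr": fp / (fp + tn),        # negatives wrongly predicted as positive
        "fnr": fn / (fn + tp),        # positives wrongly predicted as negative
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
    }


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Each record appears in exactly one test fold and in the training set of the other k−1 folds, which matches the statement that all connection records are eventually used for both training and testing.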

Dataset Preprocessing and Partitioning
The whole dataset is preprocessed in this stage. Preprocessing consists of two steps, i.e., scaling and normalisation. In the scaling step, the dataset is converted from a string representation to a numerical representation. For example, the class label in the dataset contains two different categories, "Normal" and "Attack." After implementing this step, the label is changed to "1" and "0," where "1" means a normal case, while "0" means an attack. The second step is normalisation [31]. The normalisation process removes noise from the dataset and decreases the differences in the ranges between the features. In this work, the Min-Max normalisation method was used as shown in the following equation:

$$F_i' = \frac{F_i - \mathrm{Min}_i}{\mathrm{Max}_i - \mathrm{Min}_i}$$

where Fi represents the current feature that needs to be normalised and Mini and Maxi represent the minimum and the maximum value for that feature, respectively. The objective function represents the accuracy of the SVM when it is evaluated on the validation set. The validation set is a part of the training set. In order to make the validation fairer, K-fold validation can be used, with K = 10. The NSL-KDD and CICIDS 2017 datasets were used to evaluate the performance of the proposed models.
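The two preprocessing steps can be sketched in a few lines of Python. The function names are assumptions made for illustration, and returning zeros for a constant feature is a design choice of this sketch (it avoids a division by zero), not something stated in the text.

```python
def encode_label(label):
    """Map the string class label to a number: "Normal" -> 1, anything else (attack) -> 0."""
    return 1 if label == "Normal" else 0


def min_max_normalise(column):
    """Rescale one feature column to the range [0, 1]."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        # Constant feature: every value maps to 0.0 (choice made for this sketch).
        return [0.0 for _ in column]
    return [(value - lowest) / (highest - lowest) for value in column]
```

For example, `min_max_normalise([2, 4, 6])` maps the smallest value to 0.0, the largest to 1.0, and the middle value to 0.5, so that features with very different ranges become comparable.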

NSL-KDD Dataset
In this study, the NSL-KDD dataset was used to evaluate the proposed method. This dataset was suggested in 2009 by Tavallaee et al. [32] to address the drawbacks of KDD CUP 99. The NSL-KDD is a variant of the KDD CUP 99 dataset in which the redundant instances were discarded, followed by the reconstitution of the dataset structure [29]. The NSL-KDD dataset is commonly used for evaluating the performance of new ID approaches, especially anomaly-based network ID. There are a reasonable number of testing and training records in the NSL-KDD.

The training set (KDDTrain+) consists of 125,973 records, while the testing set (KDDTest+) contains 22,544 records. In this dataset, each traffic record has 41 features (six symbolic and 35 continuous) and one class label (Table 7). The features are classified into basic, content, and traffic types (Table 8). Attack classification in the NSL-KDD is based on the feature characteristics [33]. The NSL-KDD dataset can be downloaded from https://www.unb.ca/cic/datasets/nsl.html.

CICIDS 2017 Dataset
The CICIDS 2017 dataset consists of benign traffic and the most up-to-date common attacks, and it resembles real-world data (PCAPs). It also contains the results of a network traffic analysis obtained by using CICFlowMeter; the flows are labelled based on the timestamp, source and destination ports, source and destination IPs, protocols, and attack. The CICIDS 2017 dataset satisfies the 11 indispensable features of a valid IDS dataset, namely, anonymity, available protocols, feature set, attack diversity, complete capture, complete interaction, complete network configuration, complete traffic, metadata, heterogeneity, and labelling [34]. There are 2,830,540 rows in the CICIDS 2017, divided across eight files, with each row containing 79 features. In the CICIDS 2017, each row is labelled as benign or as one of the 14 attack types. A summary of the distribution of different attack types and the benign rows is presented in Table 9.

Results of ITLBO-IPJAYA vs. ITLBO and ITLBO-JAYA
This section provides the results of the improved method based on the ITLBO-IPJAYA algorithm. This method selects the best features and updates the values of the SVM parameters. This work proposed the idea of "parallel execution" to update the SVM parameters. The parameters for the ITLBO, ITLBO-JAYA, and ITLBO-IPJAYA methods used in this study are shown in Table 10.
The results show that ITLBO-IPJAYA performs better than ITLBO and ITLBO-JAYA in all metrics. Figure 11 shows the comparison results based on the accuracy of ITLBO, ITLBO-JAYA, and ITLBO-IPJAYA.
Figure 12 shows a comparison between ITLBO-JAYA and ITLBO-IPJAYA based on the number of iterations. ITLBO-IPJAYA performs better than ITLBO-JAYA even with fewer iterations, and its accuracy increases at a higher rate. For example, ITLBO-IPJAYA with 20 iterations performs better than ITLBO-JAYA with 30 iterations.
This means that ITLBO-IPJAYA has lower complexity and a shorter execution time. Figure 13 shows the average FAR of the three methods: ITLBO-IPJAYA performs better than ITLBO and ITLBO-JAYA even with fewer features, where ITLBO-IPJAYA with 19 features performs better than TLBO and ITLBO-JAYA with 21 and 22 features, respectively. The improvements shown in Sections 4 and 6 reduce the execution time of ITLBO-IPJAYA relative to ITLBO-JAYA. The parallel processing of each SVM parameter independently is the main factor in this reduction, as shown in Figure 14.
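The metrics used in this comparison (accuracy, detection rate, and FAR) can be computed from confusion-matrix counts. The sketch below uses the standard binary definitions, which are assumed here because this section does not restate them; the function name `ids_metrics` and the counts are illustrative only and are not results from this study.

```python
def ids_metrics(tp, tn, fp, fn):
    """Return (accuracy, detection_rate, false_alarm_rate) from binary counts.

    tp: attacks flagged as attacks      fn: attacks missed
    tn: benign flagged as benign        fp: benign flagged as attacks
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, detection_rate, false_alarm_rate


# Illustrative counts only.
acc, dr, far = ids_metrics(tp=90, tn=95, fp=5, fn=10)
```

A method is preferable when its detection rate is higher and its FAR is lower at the same time, which is the sense in which the methods are compared above.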
The results of the CICIDS 2017 dataset are shown in Table 12.
Finally, a statistical significance test (T-test) on the distribution of values in both samples showed a significant difference, which allowed us to reject the null hypothesis H0. The test shows the superiority of IPJAYA-ITLBO-SVM over JAYA-ITLBO-SVM. The P values and T-values are shown in Table 13; the small P values indicate that the advantage of the IPJAYA-ITLBO-SVM method (MV1) is highly significant.
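To make the significance test concrete, the following sketch computes a two-sample t statistic and its degrees of freedom. It assumes Welch's unequal-variance form, since the exact T-test variant is not stated here; the function name `welch_t` and the two accuracy samples are hypothetical and are not results from Table 13.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test (unequal variances)."""
    na, nb = len(sample_a), len(sample_b)
    # Per-sample variance of the mean (sample variance divided by n).
    va = statistics.variance(sample_a) / na
    vb = statistics.variance(sample_b) / nb
    if va + vb == 0:
        raise ValueError("both samples have zero variance")
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical per-run accuracies for two methods (illustration only).
runs_a = [0.985, 0.987, 0.986, 0.988, 0.984]
runs_b = [0.980, 0.981, 0.979, 0.982, 0.978]
t_value, dof = welch_t(runs_a, runs_b)
```

The resulting t value is compared against the critical value for the computed degrees of freedom (or converted to a P value) to decide whether H0 can be rejected.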

The Comparison of the Proposed Methods
To illustrate the effectiveness of our proposed IDS methods, the performance of the proposed methods is compared with six recently developed anomaly detection techniques.

Discussion
This work in general contains 4 sections based on the proposed method. Furthermore, all methods proposed in this work were evaluated based on the NSL-KDD and CICIDS 2017 datasets.
Firstly, the results of the proposed ITLBO-IPJAYA method for network intrusion detection were compared with those of TLBO, ITLBO, and ITLBO-JAYA, as shown in Tables 11 and 12. Additionally, the tables show the different numbers of features for the three algorithms, which reflect their different structures, to investigate the influence of an increase in features on performance.
The ITLBO-IPJAYA results showed higher stability and better accuracy than the ITLBO and ITLBO-JAYA algorithms.
Furthermore, Figure 13 shows that ITLBO-JAYA needs 60 iterations to reach an accuracy of 0.9816, whereas the ITLBO-IPJAYA algorithm achieved higher accuracy with 50 iterations. Therefore, ITLBO-IPJAYA achieved a better detection rate and a lower false alarm rate with less complexity of iterations.
Secondly, despite all the improvements of ITLBO-SVM mentioned above, the random selection of the main SVM parameters is considered one of the algorithm's limitations, because it may not provide optimal parameter values and can affect the model accuracy negatively. The results above showed that ITLBO-IPJAYA improved the basic SVM performance by providing the best parameter values, as shown in the ITLBO-IPJAYA block diagram in Figure 11. In the end, ITLBO-IPJAYA proves worthwhile in reducing the impact of randomly selected parameters.
As a result of the differences in the algorithm structure, the ITLBO structure contains three phases, which should prevent the algorithm from being trapped in local optima. Also, teachers not only teach learners (students) but also teach other teachers. In contrast, the TLBO structure contains only two phases, in which teachers teach learners only.
Furthermore, the ITLBO algorithm achieved higher accuracy than TLBO because the knowledge exchange rate is higher in ITLBO, since teachers teach both learners and other teachers. Therefore, ITLBO achieved a better detection rate and a lower false alarm rate with less complexity of iterations. Dividing the solutions of the IPJAYA algorithm into two groups, and choosing the best solution from the best solution group as "Best" and the best solution from the worst solution group as "Worst", causes IPJAYA to need fewer iterations than JAYA to reach better solutions, as shown in Figure 13. This also leads to improvement in accuracy and detection rate.
The parallel improvement made to the JAYA algorithm reduces the time needed for execution and hence reduces the total execution time for the ITLBO-IPJAYA-SVM model, as shown in Figure 15.
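The idea of tuning each SVM parameter independently and in parallel can be sketched as two searches submitted to a worker pool. This is only an illustration under stated assumptions: the random candidate search, the helper `search_1d`, and the toy score functions stand in for the IPJAYA update and the SVM evaluation, neither of which is reproduced here. The ranges for C and γ follow the initialisation ranges given in the description of Step 1.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def search_1d(score, low, high, n_candidates=20, seed=0):
    """Return the best of n random candidates in [low, high] (stand-in search)."""
    rng = random.Random(seed)
    candidates = [rng.uniform(low, high) for _ in range(n_candidates)]
    return max(candidates, key=score)


# Toy stand-ins for "SVM accuracy as a function of one parameter" (assumed).
def toy_score_c(c):
    return -(c - 10.0) ** 2


def toy_score_gamma(gamma):
    return -(gamma - 0.5) ** 2


# Each parameter is searched independently, so both jobs can run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    future_c = pool.submit(search_1d, toy_score_c, 0.001, 100.0, 20, 1)
    future_gamma = pool.submit(search_1d, toy_score_gamma, 0.0001, 64.0, 20, 2)
    best_c, best_gamma = future_c.result(), future_gamma.result()
```

For CPU-bound scoring in CPython, a process pool (`concurrent.futures.ProcessPoolExecutor`) is usually the better fit; threads are used here only to keep the sketch short.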

Figure 2: Sample of the dataset.

The detailed steps of IPJAYA are as follows:
(i) Step 1: select the population size and the number of design variables, and initialise the termination condition. To explain the parameter optimisation in detail, we assume the following scenario: population size = 3, design variables = 2, and termination criterion = 2 iterations. The population consists of the values of parameters C and γ; in this scenario, each one has 3 values. These values were initialised randomly, for C between 0.001 and 100 and for γ between 0.0001 and 64. Table 2 shows the values of C and γ.
(ii) Step 2: SVM needs three things to classify any labelled data, i.e., the features to choose, the value of the C parameter, and the value of the γ parameter. This step can be viewed in line 2 of Figure 8.
(iii) Step 3: the next step is to evaluate each value of both C and γ separately by using the SVM on the first student from Learner Phase 2 after applying crossover and mutation, as shown in Table

(1) Start
(2) Initialise the population size, number of design variables, and termination criteria
(3) Repeat Steps 3-6 until the termination criteria are met
(4) Arrange the solutions from best to worst and split the solutions into two groups: best and worst solutions
(5) Make the best solution in the best group as best, and make the best solution in the worst group as worst
(6) Modify the solution based on the following equation: $Y'_{j,k,I} = Y_{j,k,I} + r_{1,k,I}\,(Y_{j,\mathrm{best},I} - |Y_{j,k,I}|) - r_{2,k,I}\,(Y_{j,\mathrm{worst},I} - |Y_{j,k,I}|)$
(7) Update the previous solution if $Y'_{j,k,I} > Y_{j,k,I}$; otherwise, do not update the previous solution
(8) Display the established optimum solution
(9) End
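The listed steps can be sketched for a single scalar parameter as follows. This is a minimal sketch, not the authors' implementation: the toy fitness stands in for SVM accuracy, and the way an odd-sized population is split, the clamping to the search range, and the names `ipjaya_step` and `toy_fitness` are assumptions. The population size (3), the iteration count (2), and the range for C follow the scenario described in Step 1.

```python
import random


def ipjaya_step(population, fitness, rng, low, high):
    """One IPJAYA iteration over scalar candidates (higher fitness is better)."""
    if len(population) < 2:
        raise ValueError("need at least two candidates to form two groups")
    # Arrange from best to worst, then split into a best and a worst group.
    ranked = sorted(population, key=fitness, reverse=True)
    half = (len(ranked) + 1) // 2   # assumed split: best group takes the extra one
    best = ranked[0]                # best solution of the best group
    worst = ranked[half]            # best solution of the worst group
    updated = []
    for y in population:
        r1, r2 = rng.random(), rng.random()
        candidate = y + r1 * (best - abs(y)) - r2 * (worst - abs(y))
        candidate = min(max(candidate, low), high)  # assumed bound handling
        # Keep the modified solution only if it improves the fitness.
        updated.append(candidate if fitness(candidate) > fitness(y) else y)
    return updated


def toy_fitness(c):
    """Toy stand-in for SVM accuracy as a function of C (assumed)."""
    return -(c - 10.0) ** 2


rng = random.Random(0)
low, high = 0.001, 100.0
population = [rng.uniform(low, high) for _ in range(3)]  # population size = 3
initial_best = max(toy_fitness(c) for c in population)
for _ in range(2):                                       # termination = 2 iterations
    population = ipjaya_step(population, toy_fitness, rng, low, high)
final_best = max(toy_fitness(c) for c in population)
```

Because a candidate replaces its predecessor only when the fitness improves, the best fitness in the population never decreases from one iteration to the next.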

Figure 13: Accuracy comparison based on the number of iterations for the NSL-KDD dataset.

Figure 12: Accuracy based on the number of features for the NSL-KDD dataset.

Figure 14: Comparison based on the number of features head for the NSL-KDD dataset.

Figure 15: Execution time comparison for the NSL-KDD dataset.
RBF), kernel function of the SVM, is employed for the conversion of the completely nonseparable problem into a separable or approximately separable state. The RBF kernel parameter γ determines how the data are distributed in the new feature space, while parameter C sets the level of penalty for classification errors in the linearly nonseparable case. Equations (

Table 2: C and γ values.

Table 3: Evaluation of C.

Table 4: Best and Worst groups for C.

Table 5: Accuracy based on γ.

Table 6: Best and Worst groups for γ.

Table 7: Evaluation of features based on new C and γ.

Table 10: Parameters used in this study; margin.

Table 14
JAYA method, as shown in Table 11. However, Table 15 demonstrates the results achieved by the proposed methods compared with other methods tested on the CICIDS 2017 dataset in terms of detection rate and false alarm rate.

Table 15: Comparison with the existing work for the CICIDS 2017 dataset.

Table 14: Comparison with the existing work for the NSL-KDD dataset.