LNNLS-KH: A Feature Selection Method for Network Intrusion Detection

As an important part of intrusion detection, feature selection plays a significant role in improving detection performance. The krill herd (KH) algorithm is an efficient swarm intelligence algorithm with excellent performance in data mining. To address the low efficiency and high false positive rate of intrusion detection caused by increasingly high-dimensional data, an improved krill herd algorithm based on a linear nearest neighbor lasso step (LNNLS-KH) is proposed for feature selection in network intrusion detection. The number of selected features and the classification accuracy are introduced into the fitness evaluation function of the LNNLS-KH algorithm, and the physical diffusion motion of the krill individuals is transformed by a nonlinear method. Meanwhile, a linear nearest neighbor lasso step optimization is performed on the updated krill herd position in order to derive the global optimal solution. Experiments show that the LNNLS-KH algorithm retains 7 features in the NSL-KDD dataset and 10.2 features in the CICIDS2017 dataset on average, which effectively eliminates redundant features while ensuring high detection accuracy. Compared with the CMPSO, ACO, KH, and IKH algorithms, it reduces features by 44%, 42.86%, 34.88%, and 24.32% in the NSL-KDD dataset, and by 57.85%, 52.34%, 27.14%, and 25% in the CICIDS2017 dataset, respectively. The classification accuracy increases by 10.03% and 5.39%, and the detection rate increases by 8.63% and 5.45%. The time of intrusion detection decreases by 12.41% and 4.03% on average. Furthermore, the LNNLS-KH algorithm quickly escapes local optimal solutions and shows good performance in the optimal fitness iteration curve, convergence speed, and false positive rate of detection.


Introduction
With the advent of the era of big data, the dimensionality of information has increased exponentially. In many fields such as machine learning, data analysis, and text mining [1], it is increasingly difficult to handle large amounts of high-dimensional data. Irrelevant and redundant features increase the dimensional complexity and interfere with accurate classification, resulting in poor algorithm performance. An intrusion detection system (IDS) [2] relies on a large amount of network data: it monitors network traffic in real time and identifies and handles malicious use of computers and network resources.
The "curse of dimensionality" (COD) caused by the massive data of IDS leads to a low detection rate, poor detection effect, and high false positive rate, which seriously affect the efficiency of intrusion detection. How to improve the efficiency of intrusion detection while ensuring detection accuracy has become an urgent problem to be solved.
As a common method of data dimensionality reduction, feature selection has attracted more and more attention. It reduces the complexity of data by deleting unnecessary features, which is of great significance to IDS. Feature selection algorithms filter out redundant data to reduce the dimensions of network data. In addition, the computational load of the IDS is reduced and the detection speed is improved. Consequently, feature selection is one of the critical links of data preprocessing in IDS, with a significant impact on detection accuracy and model generalization ability. Generally, the feature selection framework is composed of four parts: the search module, the evaluation criterion, the judgment condition, and verification and output. The search module includes the search starting point and the search strategy. After the original feature set is processed by the search module, the corresponding feature subset is generated. Appropriate evaluation criteria are constructed to evaluate the feature subsets. When the termination condition of the feature selection process is reached, the final selected feature subset is output. Meanwhile, it is verified to evaluate the quality of the feature selection algorithm. The framework of feature selection is shown in Figure 1.
The swarm intelligence optimization method is a kind of group-oriented random search technique, which provides new ideas for solving the feature selection problem. The krill herd (KH) algorithm is a new type of swarm intelligence optimization method that studies the foraging rules and clustering behavior of krill herds in nature. By simulating the movement induced by other krill individuals, the foraging activity, and the physical diffusion motion of the krill herd, the positions of individuals are constantly updated. While looking for food and the highest krill herd density, the krill move toward the best solution and finally reach the global optimal solution. The KH algorithm has attracted wide attention from scholars and engineers for its excellent optimization performance and is considered one of the fastest developing nature-inspired heuristic algorithms for solving practical optimization problems [3]. It integrates a local robust search method with a population-based method and performs well in high-dimensional data processing. It is widely used in network path optimization [4], text clustering analysis [5], neural network training [6], multiple continuous optimization [7][8][9], combinatorial optimization [10,11], constraint optimization [12][13][14], and other scenarios [3]. The KH algorithm has good exploitation ability, but its exploration ability is not satisfactory, which means that the algorithm easily falls into local optimal solutions when solving practical problems. Although optimized variants of the KH algorithm exist, research on algorithms that can provide a high convergence rate and the global optimal solution is continuing. Therefore, improving the KH algorithm to balance its global exploration and local exploitation abilities is of great significance for improving solution accuracy and optimization efficiency.
In this paper, an optimized LNNLS-KH algorithm for feature selection is proposed to address the large size and high dimensionality of intrusion detection datasets. It filters out the redundant features of IDS data so that the efficiency of intrusion detection is significantly improved and the time cost is enormously reduced. The main contributions of this paper are listed as follows: (i) The number of dimensions and the detection accuracy of feature selection are introduced into the fitness function, which improves the ability of feature selection. (ii) To accelerate the convergence speed of the algorithm, we modify the physical diffusion motion of krill individuals by a nonlinear method. (iii) The LNNLS-KH algorithm is proposed for feature selection of intrusion detection data, which effectively enhances the local exploitation ability and global exploration ability of the algorithm. (iv) The proposed algorithm is comprehensively evaluated through a large number of experiments on the NSL-KDD and CICIDS2017 datasets. The experimental results show that the LNNLS-KH algorithm exhibits competitive performance on the evaluation indicators for intrusion detection. The remaining sections of this paper are organized as follows. Section 2 presents the related works about feature selection methods and the variants of the KH algorithm. Section 3 provides a detailed description of the proposed LNNLS-KH algorithm. Section 4 presents the experimental results and discussion. Section 5 concludes the paper and outlines future research.

Related Works
In this section, we review the three types of feature selection methods based on evaluation criteria and the feature selection algorithms used in IDS. Meanwhile, we summarize swarm intelligence algorithms, especially the KH algorithm and its variants.

Feature Selection Methods Based on the Evaluation Criteria.
There are three types of feature selection methods based on the evaluation criteria: the filter method, the wrapper method, and the embedded method [15]. The filter method assigns weights to the features of each dimension, filters the features in order of weight, and uses the resulting feature subsets to train the classification algorithm. Therefore, the process of feature selection is independent of the classification algorithm. Although the filter method occupies fewer computing resources and saves time for feature selection, the selected feature subset lacks the feedback of the classification algorithm, resulting in low classification accuracy. The wrapper method takes into account the effect of the classification algorithm's performance on the feature subsets, so it achieves high classification accuracy, but at a much greater cost in computation and time. The embedded method integrates the feature selection process with the classification algorithm and performs feature selection during classification training. Its computation cost and classification accuracy lie between those of the filter method and the wrapper method. The feature selection of intrusion detection data requires high accuracy, while the training time on offline data is not a major concern. Therefore, the wrapper method is adopted as the feature selection method in this paper. The frameworks of the three types of feature selection methods based on the evaluation criteria are shown in Figure 2.
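As an illustration of the wrapper strategy, the following Python sketch scores candidate subsets with a pluggable, classifier-backed evaluation callback; the feature names and the toy scoring function in the usage below are hypothetical, not taken from the paper.

```python
import itertools

def wrapper_select(features, evaluate, max_size=None):
    """Wrapper-style search: score every candidate feature subset with a
    classifier-backed evaluate() callback and return the best subset."""
    best_subset, best_score = (), float("-inf")
    limit = max_size or len(features)
    for k in range(1, limit + 1):
        for subset in itertools.combinations(features, k):
            score = evaluate(subset)  # e.g. cross-validated accuracy
            if score > best_score:
                best_subset, best_score = subset, score
    return list(best_subset), best_score

# Hypothetical usage: reward two "useful" features, penalize subset size.
score = lambda s: len(set(s) & {"src_bytes", "flag"}) - 0.1 * len(s)
best, val = wrapper_select(["src_bytes", "flag", "duration"], score)
```

In practice the exhaustive loop is replaced by a heuristic search (such as the swarm-based search discussed later), since enumerating all subsets is infeasible for dozens of features.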

Feature Selection Algorithms in IDS.
Feature selection is one of the most important parts of data preprocessing in intrusion detection, which is of great significance to IDS. In [16], Bayesian networks were used to adjust the correlation of attributes and PCA to extract the primary features on an institute-wide cloud system. The disadvantage is that the detection accuracy still needs further improvement. Zhao et al. [17] proposed a feature selection method based on Mahalanobis distance and applied it to network intrusion detection to obtain the optimal feature subset. Feature ranking based on Mahalanobis distance was used as the principal selection mechanism, and an improved exhaustive search was used to select the optimal ranked features. The experimental results based on the KDD CUP 99 dataset show that the algorithm performs well with both the support vector machine and the k-nearest neighbor classifier. Singh and Tiwari proposed an efficient approach for intrusion detection on reduced features of the KDD CUP 99 dataset in 2015 [18]. The Iterative Dichotomiser 3 (ID3) algorithm was used for feature reduction of large datasets, and KNNGA was used as a classifier for intrusion detection. The method performs well on the evaluation measures of sensitivity, specificity, and accuracy. However, both Zhao et al. and Singh and Tiwari [17,18] conducted experiments on outdated datasets, which hardly reflect the new attack features of modern networks. In [19], Ambusaidi et al. proposed a feature selection algorithm based on mutual information to deal with linearly and nonlinearly correlated data features. They established an intrusion detection system based on a least-squares support vector machine. Experimental results show that the proposed algorithm performs well in accuracy but poorly in false positive rate. Shone et al. proposed an unsupervised feature learning method based on a nonsymmetric deep autoencoder (NDAE) and a novel deep learning classification model constructed from stacked NDAEs [20]. The results demonstrated that the approach offers high levels of accuracy, precision, and recall together with reduced training time. Meanwhile, it is worth noting that the stacked NDAE model requires 98.81% less training time than the mainstream DBN technique. The limitation is that the model still needs to assess and extend its capability to handle zero-day attacks.
In [21], a self-adaptive differential evolution (SaDE) algorithm was proposed to deal with the feature selection problem. It uses an adaptive mechanism to select the most appropriate of four candidate solution generation strategies, which effectively reduces the number of features. The disadvantage is that the experiment uses small-sample data, and more data is needed to further support the conclusion. Shen et al. adopted principal component analysis and linear discriminant analysis to decrease the dimensionality of the dataset and combined them with Bayesian classification to construct an intrusion detection model [22]. Simulation experiments based on the CICIDS2017 dataset show that the proposed algorithm filters out the noise in the data and improves the time performance to a certain extent. However, the algorithm still needs to be optimized to further improve the classification accuracy. In [23], a hybrid network feature selection method based on a convolutional neural network (CNN) and a long short-term memory network (LSTM) was applied to IDS. According to the experimental results, the proposed feature selection algorithm achieves better accuracy than the CNN-only model and the LSTM-only model. However, the detection accuracy for Heartbleed and SSH-Patator attacks is low. In [24], Farahani proposed a new cross-correlation-based feature selection (CCFS) method to reduce the feature dimension of intrusion detection datasets. Compared with the cuttlefish algorithm (CFA) and mutual information-based feature selection (MIFS), the proposed algorithm was demonstrated to perform well in the accuracy, precision, and recall of classification. However, the author simply replaced the categorical attributes with numeric values when dealing with symbolic data, without considering the more reasonable one-hot encoding method. The summary of feature selection methods in IDS is shown in Table 1.

Swarm Intelligence Algorithms for Feature Selection.
The core of feature selection is the search strategy for generating feature subsets. Although an exhaustive search strategy can find the globally optimal feature subset, its excessive time complexity consumes huge computing resources. In recent years, swarm intelligence optimization methods inspired by natural phenomena have provided a new approach to the feature selection problem [10][11][12][13][14][15][16][17]. Therefore, we propose the LNNLS-KH algorithm, with its high search efficiency, as the search strategy for feature subsets. Swarm intelligence optimization methods simulate the survival-of-the-fittest evolution in nature and constitute a group-oriented random search technique that can be used to solve complex problems in large-scale data analysis [25]. Common swarm intelligence optimization methods include particle swarm optimization (PSO) [26], the ant colony optimization algorithm (ACO) [27], the cuckoo algorithm (CA) [28], the artificial fish swarm algorithm (AFSA) [29], the artificial bee colony algorithm (ABC) [30], the fruit fly optimization algorithm (FOA) [31], the monkey algorithm (MA) [32], the bat algorithm (BA) [33], and the salp swarm algorithm (SSA) [34].
Moreover, Ahmed et al. proposed a new chaotic chicken swarm optimization algorithm (CCSO) for feature selection [35]. By combining logistic maps and chaotic tent maps, the CCSO algorithm acquires a strong spatial search ability. The experimental results show that the classification accuracy of the model is further improved after CCSO feature selection. The disadvantage is the lack of comparison with other chaotic algorithms. Tabakhi et al. proposed an unsupervised feature selection method based on ant colony optimization (UFSACO) [36], which iteratively filters features through the heuristic and previous-stage information of the ant colony. Simultaneously, the similarity between features is quantified to reduce the redundancy of data features. However, the efficiency of the feature selection process needs to be improved.
To address the tendency to fall into local optimal solutions, Arora and Anand proposed a butterfly optimization algorithm (BOA) based on binary variables [37]. Modeled on the foraging behavior of butterflies, the algorithm uses each butterfly as a search agent to iteratively optimize the fitness function; it has good convergence ability and avoids premature convergence to a certain extent. Experimental results show that the algorithm reduces the length of the feature subset while selecting the optimal feature subset and improves the classification accuracy to a certain extent. However, its time cost is larger than those of the genetic algorithm and particle swarm optimization, and the feature subsets obtained across repeated experiments are inconsistent, indicating poor robustness.
In [38], Yan et al. proposed a hybrid optimization algorithm (BCROSAT) based on simulated annealing and binary coral reefs, which is used for feature selection in high-dimensional biomedical datasets. The algorithm increases the diversity of the initial population through a tournament selection strategy and uses the simulated annealing algorithm and binary coding to improve the search ability of the coral reef optimization algorithm. However, the algorithm has high time complexity. In [39], a new chaotic dragonfly algorithm (CDA) was proposed by Sayed et al., which combines 10 different chaotic maps with the search iteration process of the dragonfly algorithm, so as to accelerate the convergence speed of the algorithm and improve the efficiency of feature selection. The algorithm uses the worst fitness value, best fitness value, average fitness value, standard deviation, and average feature length as evaluation criteria. The experimental results show that the adjustment variable of the Gauss map significantly improves the dragonfly algorithm in classification performance, stability, number of selected features, and convergence speed. The disadvantage is that the experimental datasets are small, and the algorithm needs to be verified on large-scale datasets. Zhang et al. [40] combined the genetic algorithm and particle swarm optimization to conduct a tabu search around the produced optimal initial solution, and the result of the quadratic feature selection is the globally optimal feature subset. The algorithm not only guarantees good classification performance but also greatly reduces the false positive rate and false negative rate of the classification results. The disadvantage is that the algorithm incurs a large computation cost and a long offline training time.

Krill Herd (KH) Algorithm and Variants.
The krill herd (KH) algorithm is a new population-based swarm intelligence optimization method proposed by Gandomi and Alavi in 2012 [41]. The algorithm studies the foraging rules and clustering behavior of krill swarms in nature and simulates the induced movement, foraging activity, and random diffusion movement of the krill herd. Meanwhile, it obtains the optimal solution by continuously updating the positions of krill individuals.
Abualigah et al. introduced a multicriteria hybrid function based on the global-best concept into the KH algorithm and applied it to text clustering [5]. By combining the advantages of local neighborhood search and global wide-area search, the algorithm balances the exploitation and exploration processes of the krill herd. In [42], the influence of excellent neighbor individuals on the krill herd during evolution is considered and an improved KH algorithm is proposed to enhance the local search ability of the algorithm. In [43], a hybrid data clustering algorithm (IKH-KHM) based on an improved KH algorithm and k-harmonic means was proposed to address the sensitivity of the K-means algorithm to initial cluster centers.
This algorithm increases the diversity of the krill herd by alternately using the random walk of Lévy flight and the crossover operator of the genetic algorithm. It improves the global search ability of the algorithm and avoids premature convergence to some degree. Simulation experiments on 5 datasets from the UCI database show that the IKH-KHM algorithm overcomes the noise sensitivity problem to a certain extent and has a significant effect on the optimization of the objective function. However, its slow convergence results in a high time cost. In 2017, Li and Liu adopted a combined update mechanism of selection and mutation operators to enhance the global optimization ability of the KH algorithm. They addressed the imbalance between the local search and global search of the original KH algorithm [44].
To enhance the global search ability of the KH algorithm, an improved KH algorithm with a global search operator was proposed by Jensi and Jiji [9] and applied to data clustering. The algorithm continuously searches around the original area to guide the krill herd toward the global optimum. It defines a new step size formula, which makes it convenient for krill individuals to fine-tune their positions in the search space. At the same time, an elite selection strategy is introduced into the krill herd update process, which helps the algorithm jump out of local optimal solutions. Experimental results show that the improved KH algorithm has higher accuracy and better robustness.
In [45], Wang et al. proposed a stud KH algorithm. The method adopts a new krill herd genetics and reproduction mechanism, replacing the random selection in the standard KH algorithm with a stud selection and crossover operator. To balance the exploration and exploitation abilities of the KH algorithm, Li et al. proposed a linearly decreasing step KH algorithm [46]. In this algorithm, the step size scaling factor is improved linearly so that it decreases as the number of iterations increases, thereby enhancing the search ability of the algorithm.
Although the KH algorithm and its enhanced versions show better performance than other swarm intelligence algorithms, deficiencies such as unbalanced exploration and exploitation remain. In this paper, to minimize the number of selected features and achieve high classification accuracy, both objectives are introduced into the fitness evaluation function. The physical diffusion motion of krill individuals is nonlinearly improved to dynamically adjust the random diffusion amplitude and accelerate the convergence rate of the algorithm. At the same time, a linear nearest neighbor lasso step optimization is performed after updating the position of the krill herd, which effectively enhances the global exploration ability. This helps the algorithm achieve better performance, reduce the data dimension through feature selection, and improve the efficiency of intrusion detection.

Algorithm Design
In this section, we first provide a brief description of the KH algorithm; subsequently, we present an improved version of KH, named LNNLS-KH, to address the large size and high dimensionality of intrusion detection data in feature selection.

Standard KH Algorithm.
The framework of the KH algorithm is shown in Figure 3. It includes the three motions of krill individuals, the crossover operation, and the position update with the calculation of the fitness function. Krill individuals change their positions according to the three motions after completing initialization. Then, the crossover operator is executed to complete the position update, and the new fitness function value is calculated. If the number of iterations has not reached the maximum, the krill individuals repeat the process until the iteration is completed.
As a novel biologically inspired algorithm for solving optimization tasks, the KH algorithm represents a possible solution of the problem with each krill individual. By simulating the foraging behavior, the krill herd position is continuously updated to obtain the global optimal solution. The motion of a krill individual is mainly affected by the following three aspects: (1) movement induced by other krill individuals; (2) foraging activity; (3) physical diffusion motion. The KH algorithm adopts the Lagrangian model to search in multidimensional space. The position update of krill individuals is given as follows:

dX_i/dt = N_i + F_i + D_i,   (1)

where X_i = (X_{i,1}, X_{i,2}, ..., X_{i,NV}), N_i is the movement induced by other krill individuals, F_i is the foraging activity of the krill individual, and D_i is the random physical diffusion based on the density region.
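The Lagrangian update can be sketched in Python as a single vectorized step; this is a minimal illustration in which `N`, `F`, and `D` are assumed to be precomputed motion arrays of the same shape as the position matrix.

```python
import numpy as np

def kh_step(X, N, F, D, dt):
    """One Lagrangian position update for the whole herd:
    dX_i/dt = N_i + F_i + D_i, integrated with time step dt.
    X, N, F, D are (n_krill, n_dims) arrays."""
    return X + dt * (N + F + D)
```

In a full implementation, `N`, `F`, and `D` would be recomputed from the motion equations of the following subsections at every iteration.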

Movement Induced by Other Krill Individuals.
The movement induced by other krill individuals is described as follows:

N_i^new = N^max α_i + ω_n N_i^old,   α_i = α_i^local + α_i^target,   (2)

where N^max is the maximum induction velocity of the surrounding krill individuals, taken as 0.01 m/s [5]; ω_n represents the inertia weight in the range [0, 1]; N_i^old is the result of the last motion induced by other krill individuals; α_i^local is a parameter indicating the direction of guidance from neighbors; and α_i^target is the direction effect of the global optimal krill individual.
α_i^local is defined as follows:

α_i^local = Σ_{j=1}^{NN} K_{i,j} X_{i,j},   K_{i,j} = (K_i − K_j)/(K^worst − K^best),   X_{i,j} = (X_j − X_i)/(‖X_j − X_i‖ + ε),

where K^best and K^worst are the best and worst fitness values of the krill herd, K_i is the fitness value of the ith krill individual, K_j represents the fitness value of the jth neighbor krill individual (j = 1, 2, ..., NN), and NN represents the total number of neighbors. The ε in the denominator is a small positive number that avoids the singularity caused by a zero denominator.
When selecting surrounding krill individuals, the KH algorithm finds the nearest neighbors of the ith krill individual by defining a "neighborhood ratio." The neighborhood is a circular area centered on the ith krill individual with the perception distance d_{s,i} as its radius. d_{s,i} is described as follows:

d_{s,i} = (1/(5N)) Σ_{j=1}^{N} ‖X_i − X_j‖,

where N is the number of krill individuals and X_i and X_j represent the positions of the ith and jth krill individuals. α_i^target is defined as follows:

α_i^target = C^best K_{i,best} X_{i,best},   C^best = 2(rand + I/I_max),

where C^best is the effective coefficient between the ith and the global optimal krill individual, I is the current iteration number, I_max is the maximum number of iterations, and rand is a random number in [0, 1] used to enhance the exploration ability.
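The sensing distance and the resulting neighbor set can be sketched as follows; this is an illustrative Python fragment in which positions are the rows of a NumPy array.

```python
import numpy as np

def sensing_distance(X, i):
    """Perception radius d_{s,i} = (1/(5N)) * sum_j ||X_i - X_j||
    used to decide which krill induce motion on individual i."""
    N = len(X)
    dists = np.linalg.norm(X - X[i], axis=1)
    return dists.sum() / (5 * N)

def neighbors(X, i):
    """Indices j != i whose distance to krill i is below d_{s,i}."""
    d = sensing_distance(X, i)
    dists = np.linalg.norm(X - X[i], axis=1)
    return [j for j in range(len(X)) if j != i and dists[j] < d]
```

Dividing the average pairwise distance by 5 keeps the neighborhood local: only krill substantially closer than the herd average influence individual i.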

Foraging Activity.
Foraging activity is affected by the food distance and the experience of the food location, and it is described as follows:

F_i = V_f β_i + ω_f F_i^old,   β_i = β_i^food + β_i^best,   (9)

where V_f is the foraging speed, taken as 0.02 m/s [41]; ω_f is the inertia weight in the range [0, 1]; and β_i indicates the foraging direction, consisting of the food induction direction β_i^food and the induction direction of the historically optimal krill individual, β_i^best. The food position is a virtual location based on the concept of a "centroid." It is defined as follows:

X^food = (Σ_{i=1}^{N} X_i/K_i) / (Σ_{i=1}^{N} 1/K_i).

(1) The induced direction of the food to the ith krill individual is expressed as follows:

β_i^food = C^food K_{i,food} X_{i,food},

where C^food is the food coefficient, determined as follows:

C^food = 2(1 − I/I_max).

(2) The induced direction of the historical best krill individual to the ith krill individual is expressed as follows:

β_i^best = K_{i,best} X_{i,best},

where K_{i,best} represents the influence of the historical best position on the ith krill individual.
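The virtual food "centroid" can be computed directly; the sketch below assumes a minimization fitness with K_i > 0, so that better (smaller-fitness) krill pull the centroid harder.

```python
import numpy as np

def food_position(X, K):
    """Virtual food location: X_food = sum_i(X_i/K_i) / sum_i(1/K_i).
    X is an (n_krill, n_dims) position array, K the fitness vector."""
    w = 1.0 / np.asarray(K, dtype=float)          # inverse-fitness weights
    return (np.asarray(X, dtype=float) * w[:, None]).sum(axis=0) / w.sum()
```

For example, with two krill at 0 and 2 and fitness values 1 and 3, the food sits at 0.5, closer to the fitter krill.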

Physical Diffusion Motion.
Physical diffusion is a stochastic process. Its expression is as follows:

D_i = D^max (1 − I/I_max) δ,   (14)

where D^max is the maximum diffusion velocity in the range [0.002, 0.010] m/s; according to [41], it is taken as 0.005 m/s. δ represents the random direction vector, whose components take random values in [−1, 1].
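A minimal sketch of this stochastic diffusion term, with the linear damping factor (1 − I/I_max) applied to a uniformly random direction vector:

```python
import numpy as np

def physical_diffusion(n_dims, iteration, max_iter, d_max=0.005, rng=None):
    """Random physical diffusion D_i = D_max * (1 - I/I_max) * delta,
    where each component of delta is drawn uniformly from [-1, 1]."""
    if rng is None:
        rng = np.random.default_rng()
    delta = rng.uniform(-1.0, 1.0, size=n_dims)
    return d_max * (1.0 - iteration / max_iter) * delta
```

The damping means the random walk is largest at the start of the search and vanishes at the final iteration.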

Crossover.
The crossover operator is an effective global optimization strategy. An adaptive vectorized crossover scheme is added to the standard KH algorithm to further enhance its global search ability [41]. It is given as follows:

X_{i,m} = X_{r,m} if rand < Cr, and X_{i,m} otherwise,

where r is a random index with r ∈ {1, 2, ..., i − 1, i + 1, ..., N}, X_{i,m} represents the mth dimension of the ith krill individual, X_{r,m} represents the mth dimension of the rth krill individual, and Cr is the crossover probability, which decreases as the fitness improves; the crossover probability of the globally optimal individual is zero.
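The crossover can be sketched as a per-dimension binomial swap with a random partner; this is illustrative only, and the adaptive rule tying Cr to each krill's fitness is left to the caller.

```python
import numpy as np

def crossover(X, i, Cr, rng):
    """Binomial crossover: each dimension m of krill i is replaced by
    the same dimension of a random other krill r whenever rand < Cr."""
    N, dim = X.shape
    r = rng.choice([k for k in range(N) if k != i])  # random partner != i
    mask = rng.random(dim) < Cr                      # per-dimension draws
    child = X[i].copy()
    child[mask] = X[r, mask]
    return child
```

With Cr = 0 (the globally optimal individual) the position is left untouched; with Cr = 1 every dimension is copied from the partner.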

Movement Process of KH Algorithm.
Affected by the movement induced by other krill individuals, the foraging activity, and the physical diffusion, the krill herd changes its position toward the direction of optimal fitness. The position vector of a krill individual over the interval [t, t + Δt] is described as follows:

X_i(t + Δt) = X_i(t) + Δt (dX_i/dt),

where Δt is the scaling factor of the velocity vector. It depends entirely on the search space:

Δt = C_t Σ_{j=1}^{NV} (UB_j − LB_j),

where NV represents the dimension of the decision variables, LB_j and UB_j are the lower and upper bounds of the jth variable (j = 1, 2, ..., NV), and C_t is the step scaling factor in the range [0, 2].
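The scaling factor Δt follows directly from the search-space bounds, as in this small sketch:

```python
def time_interval(lb, ub, c_t=0.5):
    """Velocity scaling factor dt = C_t * sum_j (UB_j - LB_j).
    lb and ub are the per-dimension lower and upper bounds;
    c_t is the step scaling factor in [0, 2]."""
    return c_t * sum(u - l for l, u in zip(lb, ub))
```

Smaller C_t values make krill take finer steps, trading search speed for precision.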

The LNNLS-KH Algorithm.
In view of the unbalanced exploitation and exploration abilities of the KH algorithm, we propose the LNNLS-KH algorithm for feature selection to improve performance and pursue a high accuracy rate, a high detection rate, and a low false positive rate of intrusion detection. The improvement is reflected in the following three aspects.

A New Fitness Evaluation Function.
To improve the classification accuracy of feature subset detection, we introduce the feature selection dimension and the classification accuracy into the fitness evaluation function. The specific expression of the fitness is as follows:

Fitness = α (Feature_selected / Feature_all) + (1 − α)(1 − Accuracy),   (18)

where α ∈ [0, 1] is a weighting factor used to tune the importance between the number of selected features and the classification accuracy, Feature_selected is the number of selected features, Feature_all represents the total number of features, and Accuracy indicates the accuracy of the classification results. Moreover, k-nearest neighbor (KNN) is used as the classification algorithm, and the classification accuracy is defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN),

where TP, TN, FP, and FN are defined in the confusion matrix, as shown in Table 2.
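A sketch of the fitness evaluation, assuming α weights the fraction of retained features and 1 − α weights the classification error (lower fitness is better):

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN) from the confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)

def fitness(n_selected, n_total, acc, alpha=0.5):
    """Lower is better: balances the fraction of retained features
    against the classification error via the weight alpha in [0, 1]."""
    return alpha * (n_selected / n_total) + (1.0 - alpha) * (1.0 - acc)
```

Both fewer selected features and higher KNN accuracy reduce the fitness value, which is the direction the krill herd optimizes toward.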

Nonlinear Optimization of Physical Diffusion Motion.
The physical diffusion of the krill herd is a random diffusion process: the closer an individual is to the food, the less random its movement. Due to the strong convergence of the algorithm, the movement of krill individuals presents a nonlinear change from fast to slow, and the fitness function gradually decreases as the algorithm converges. According to equations (2) and (9), the movement induced by other krill individuals and the foraging activity are nonlinear. In the physical diffusion equation (14), the diffusion velocity D_i of the ith krill individual decreases linearly with the number of iterations. In order to fit the nonlinear motion of the krill herd, we introduce the optimization coefficient λ and the fitness factor μ_fit of the krill herd into the physical diffusion motion. The optimized physical diffusion motion is defined as follows:

D_i = λ D^max (1 − I/I_max) δ,   (20)

where λ is in the range [0, 1] and μ_fit is defined as follows:

μ_fit = K^best / K_i,   (21)

where K^best is the fitness value of the current optimal individual and K_i represents the fitness value of the ith krill individual. As the number of iterations increases, K_i gradually decreases until it approaches K^best. Therefore, μ_fit is in the range (0, 1]. Introducing the fitness factor μ_fit into equation (20) gives the new physical diffusion motion equation:

D_i = λ D^max (1 − μ_fit I/I_max) δ.   (22)

According to equation (22), the iteration number I, the fitness K_i of a krill individual, and the fitness K^best of the current optimal krill individual jointly determine the physical diffusion motion, which further adjusts the random diffusion amplitude. In the early stage of the iteration, the number of iterations is small and the fitness value of an individual is large, so the fitness factor is small, which is conducive to a large random diffusion of the krill herd. As the number of iterations gradually increases, the algorithm converges quickly and the fitness of the krill individuals approaches the global optimal solution. At the same time, the fitness factor increases nonlinearly, which makes the random diffusion more consistent with the movement process of the krill individuals.
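The nonlinearly damped diffusion can be sketched as follows; note that the form μ_fit = K_best/K_i and the damping factor (1 − μ_fit·I/I_max) are reconstructions from the textual description, not verbatim from the paper.

```python
import numpy as np

def optimized_diffusion(n_dims, iteration, max_iter, k_i, k_best,
                        lam=0.5, d_max=0.005, rng=None):
    """NO-KH-style diffusion sketch (assumed form):
    D_i = lam * D_max * (1 - mu_fit * I/I_max) * delta,
    mu_fit = K_best / K_i in (0, 1] for a minimisation fitness.
    Early on mu_fit is small -> large random diffusion; as K_i -> K_best
    and I -> I_max the diffusion shrinks nonlinearly."""
    if rng is None:
        rng = np.random.default_rng()
    mu_fit = k_best / k_i
    delta = rng.uniform(-1.0, 1.0, size=n_dims)
    return lam * d_max * (1.0 - mu_fit * iteration / max_iter) * delta
```

Compared with the linear damping of the standard diffusion, the amplitude here also reacts to how close each krill's fitness is to the current best, which is what accelerates convergence.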
To further evaluate the effect of the KH algorithm with nonlinear optimization of the physical diffusion motion (NO-KH), we conducted experiments on two classical benchmark functions. F1(x) is the Ackley function, a multimodal benchmark function. F2(x) is the Schwefel 2.22 function, a unimodal benchmark function. The experimental parameters of F1(x) and F2(x) are shown in Table 3. Figure 4 shows the graphs of the Ackley function and the Schwefel 2.22 function for n = 2. We use the standard KH algorithm and the NO-KH algorithm to find the optimal value on both benchmark functions. The numbers of krill and iterations are set to 25 and 500. Table 4 shows the best value, worst value, mean value, and standard deviation obtained by running the algorithms 20 times. Compared with the standard KH algorithm, the NO-KH algorithm finds smaller optimal solutions on both the unimodal and multimodal benchmark functions, and its global exploration ability is improved. The smaller standard deviation over repeated experiments shows that the NO-KH algorithm has better stability. Therefore, the nonlinear optimization of the physical diffusion motion of the KH algorithm is effective. The above analysis shows that introducing the optimization coefficient λ and the fitness factor μ_fit into the physical diffusion motion of the krill herd is conducive to dynamically adjusting the random diffusion amplitude of the krill individuals and accelerating the convergence speed of the algorithm. Meanwhile, it increases the nonlinearity of the physical diffusion motion and the global exploration ability of the algorithm.

Linear Nearest Neighbor Lasso Step Optimization.
When the KH algorithm is used to solve multidimensional complex function optimization problems, its local search ability is weak and exploitation and exploration are difficult to balance. To enhance the local exploitation and global exploration abilities of the algorithm, [42] considers the influence of excellent neighboring individuals on the krill herd during evolution and proposes an improved KH algorithm. That algorithm introduces a nearest neighbor lasso operator to mine the neighborhood of potentially excellent individuals and improve the local search ability of krill individuals, but the random parameters introduced in the lasso operator increase the uncertainty of the algorithm. To cope with this problem, we introduce an improved krill herd algorithm based on linear nearest neighbor lasso step optimization (LNNLS-KH), which finds the nearest neighbor of each krill individual after the position update and moves it linearly by a defined step to obtain a better fitness value. With this linearization, the nearest neighbor lasso step changes linearly with the number of iterations, balancing the exploitation and exploration abilities of the algorithm. In the early iterations, a large linear nearest neighbor lasso step is chosen so that krill individuals can quickly adjust their positions, improving the search efficiency of the algorithm. In the later iterations, the nearest neighbor lasso step decreases linearly to home in on the global optimal solution.
In the krill herd X = {X_1, X_2, ..., X_n}, assuming that the jth krill individual is the nearest neighbor of the ith krill individual, the Euclidean distance between the two individuals is defined as follows:

d_ij = ||X_i − X_j||, (23)

where X_i, X_j ⊂ S and i ≠ j. The linear nearest neighbor lasso step, which decreases linearly with the number of iterations, is given in equation (24). The fitness function is expressed as equation (18); a smaller fitness value therefore means that fewer features are selected at a higher accuracy, i.e., the position of the krill individual is better. The schematic diagram of LNNLS-KH is shown in Figure 5, and the new position Y_k of the jth krill individual is expressed in equation (25). When the ith and jth krill individuals move to opposite ends of the food, the new position Y_k will be far from the optimal solution after the linear nearest neighbor lasso step optimization, as shown in Figure 6.

Figure 5: Optimization of the linear nearest neighbor lasso step for krill individuals at the same end of the food.
Figure 6: Optimization of the linear nearest neighbor lasso step for krill individuals at opposite ends of the food.

The pseudocode of the LNNLS-KH algorithm is shown in Algorithm 1.
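The update described above can be sketched in a few lines. Since equations (24) and (25) are not reproduced here, this sketch assumes a lasso step that decays linearly as step = s_max · (1 − I/I_max) and a greedy acceptance rule: each krill moves that fraction of the way toward its nearest neighbor and keeps the move only if the (minimized) fitness improves.

```python
import math

def lnnls_update(positions, fitness, s_max, iteration, i_max):
    """Hedged sketch of the linear nearest neighbor lasso step optimization."""
    step = s_max * (1.0 - iteration / i_max)   # linearly shrinking lasso step
    for i, xi in enumerate(positions):
        # nearest neighbor of krill i by Euclidean distance
        j = min((k for k in range(len(positions)) if k != i),
                key=lambda k: math.dist(xi, positions[k]))
        # move a fraction `step` of the way toward the nearest neighbor
        candidate = [a + step * (b - a) for a, b in zip(xi, positions[j])]
        if fitness(candidate) < fitness(xi):   # keep only improving moves
            positions[i] = candidate
    return positions
```

The greedy acceptance mirrors the if/else comparison in Algorithm 1 (keep the better of the old and new fitness values), while the shrinking step reproduces the early-exploration, late-refinement behavior described in the text.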

Analysis of Time Complexity.
In the KH algorithm, each krill individual updates its position after the motion induced by other krill individuals, the foraging activity, and the physical diffusion motion, with a time complexity of O(N) per iteration. After I_max iterations, the time complexity of the algorithm is O(I_max · N). In the LNNLS-KH algorithm, the modified fitness function and the nonlinear optimization of the physical diffusion motion add almost no extra computation, so the time complexity is unchanged. In addition, the linear nearest neighbor lasso step optimization adds the calculations of equations (24) and (25) after each krill individual completes its position update, which again costs O(I_max · N). Therefore, the total time complexity of the LNNLS-KH algorithm is O(2I_max · N), which is still O(I_max · N).

Description of the LNNLS-KH Algorithm for IDS Feature
Selection. An IDS is a system that recognizes and processes malicious use of computers and network resources. An intrusion detection dataset records normal and abnormal traffic, including network traffic data and types of network attacks, and provides data support for the research and development of intrusion detection technology. An IDS generally consists of data acquisition, data preprocessing, detection units, and response actions, as shown in Figure 7.
The LNNLS-KH algorithm is used to select high-quality feature subsets for the IDS. The features of the intrusion detection dataset are randomly initialized to real numbers in the range [0, 1], which constitute the position vectors of the krill herd. By evaluating the fitness function and running the LNNLS-KH algorithm, the position vectors of the krill herd are continually updated. The fitness function is determined by the number of selected features and the classification accuracy, so the position vectors of the krill herd move toward the optimal fitness value. According to [47], it is appropriate to set the feature selection threshold to 0.7: when the maximum number of iterations is reached, the components of the krill position vector larger than the threshold are selected, and the corresponding features constitute the feature subset of the intrusion detection data. The selected feature subset is then sent to the detection units. Since the K-Nearest Neighbor (KNN) algorithm is theoretically mature, the detection units adopt KNN to construct the intrusion detection classifier. Finally, the intrusion detection results are evaluated on the test dataset. The process of the LNNLS-KH algorithm for IDS feature selection is shown in Figure 8.
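The selection rule and fitness evaluation described above can be sketched as follows. The thresholding comes directly from the text; the exact weighted form of the fitness in equation (18) is an assumption, with α = 0.02 weighting the selected-feature ratio and the remainder weighting the classification error so that accuracy dominates.

```python
def select_features(position, threshold=0.7):
    # Components of the krill position vector above the threshold
    # mark the corresponding features as selected (threshold from [47]).
    return [k for k, v in enumerate(position) if v > threshold]

def fitness(position, accuracy, alpha=0.02):
    """Assumed form of the fitness: a weighted sum of the selected-feature
    ratio and the classification error. Smaller is better, so fewer
    features at higher accuracy yield a better krill position."""
    d = len(select_features(position))
    return alpha * d / len(position) + (1 - alpha) * (1 - accuracy)
```

For example, a position vector `[0.9, 0.1, 0.8, 0.5]` selects features 0 and 2; raising the classifier accuracy lowers the fitness far more than dropping a feature does, matching the accuracy-first design.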

Results and Discussion
To verify the performance of the LNNLS-KH algorithm in IDS feature selection, we adopt the NSL-KDD network intrusion detection dataset and the CICIDS2017 dataset for experiments.

Datasets Analysis.
The NSL-KDD dataset is a classic dataset in the field of anomaly detection. As an improved version of the KDD CUP 99 dataset, it is currently one of the most reliable and influential intrusion detection datasets. Compared with KDD CUP 99, the NSL-KDD dataset eliminates duplicate data, so it hardly contains redundant records. Meanwhile, the proportion of each type of record has been adjusted to make the class distribution reasonable. Each record in the NSL-KDD dataset includes 41-dimensional features and a classification label. KDDTrain+ and KDDTest+ in the NSL-KDD dataset are selected as the training subset and the test subset. The attacks are divided into four types: denial of service (DoS), scan and probe (Probe), remote to local (R2L), and user to root (U2R). The detailed attack names and the distribution of sample categories are shown in Tables 5 and 6, and the features of the NSL-KDD dataset are shown in Table 7.
The NSL-KDD dataset includes four groups of features: the basic features of TCP connections (9 in total), the content features of TCP connections (13 in total), the time-based network traffic statistics (9 in total), and the host-based network traffic statistics (10 in total). Among all the features, "Protocol_type," "service," and "flag" are character-type features, which need to be preprocessed and mapped to ordered values. Because mixed numeric and character data types are difficult to handle, one-hot encoding is used to map different characters to different values. For example, the "Protocol_type" feature includes three protocols, denoted by icmp = [1, 0, 0], tcp = [0, 1, 0], and udp = [0, 0, 1]. Similarly, the 70 attributes in "service" and the 11 attributes in "flag" are numericalized in the same way, so the 41-dimensional feature vector expands to 122 dimensions after one-hot encoding. At the same time, the dataset is normalized to eliminate the influence of features of different orders of magnitude on the calculation results, reducing experimental error. This preprocessing helps improve classification accuracy and ensures the reliability of the results.
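The character-to-vector mapping above can be sketched as a small helper that assigns each distinct category a unit vector (here in sorted order, which happens to reproduce the icmp/tcp/udp example from the text):

```python
def one_hot(values):
    # Map each distinct category to a unit vector, in sorted order,
    # e.g. icmp -> [1, 0, 0], tcp -> [0, 1, 0], udp -> [0, 0, 1].
    cats = sorted(set(values))
    index = {c: i for i, c in enumerate(cats)}
    return [[1 if index[v] == i else 0 for i in range(len(cats))]
            for v in values]
```

Applied to "Protocol_type" (3 categories), "service" (70), and "flag" (11), the three character columns become 3 + 70 + 11 = 84 numeric columns, which together with the remaining 38 numeric features gives the 122-dimensional representation.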
The values corresponding to each feature are normalized to the interval [0, 1] with the following expression:

X* = (X − X_min) / (X_max − X_min),

where X* is the normalized eigenvalue, X is the original eigenvalue, and X_max and X_min represent the maximum and minimum values in the same feature dimension. Although NSL-KDD is a benchmark dataset in the field of network intrusion detection, some of its attack types are outdated due to the rapid development of network technology, so it hardly reflects the current real network environment. CICIDS2017 is a newer network intrusion detection dataset released by the Canadian Institute for Cybersecurity. The dataset collected traffic data for five days, with only normal traffic on Monday and attacks occurring in the morning and afternoon from Tuesday to Friday. It includes "FTP patator," "SSH patator," "DoS GoldenEye," "DoS Slowhttptest," "DoS Slowloris," "Heartbleed," "Web Attack Brute Force," "Web Attack Sql Injection," "Web Attack XSS," "Infiltration Attack," "Bot," "DDoS," and "PortScan," which are common attack types in modern networks. The distribution of attack times and types in the CICIDS2017 dataset is shown in Table 8. We use the MachineLearningCVE file in the CICIDS2017 dataset, which contains 78 features and an attack type label; the feature numbers and names are shown in Table 9. Compared with the NSL-KDD dataset, the attack types in the CICIDS2017 dataset are more in line with the situation of modern networks.
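The min-max normalization above is straightforward to implement per feature column; the guard for a constant column (X_max = X_min) is an added assumption to avoid division by zero:

```python
def min_max_normalize(column):
    # X* = (X - X_min) / (X_max - X_min), applied per feature dimension
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant feature: map to 0
    return [(x - lo) / (hi - lo) for x in column]
```

For example, `min_max_normalize([2.0, 4.0, 6.0])` yields `[0.0, 0.5, 1.0]`, so every feature contributes on the same [0, 1] scale regardless of its original magnitude.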

Experimental Results and Discussion of NSL-KDD Dataset.
The experiment is conducted in MATLAB R2016a on a Windows 64-bit operating system with an Intel(R) Core(TM) i7-4790 processor. Since training the algorithm requires both normal and abnormal samples, we mix normal samples with different types of attack samples to construct training and test sets for four different attack types, which also reduces the time of searching for the optimal features.

Input: Training set
Output: Global best solution, the number of selected features, and feature selection time
(1) Begin
(2) Initialize algorithm parameters: N_max, V_f, D_max, NV, I_max, UB, LB
(3) Initialize the krill herd position
(4) Evaluate the fitness of krill individuals and find the individuals with the best and worst fitness values
(5) for I = 1 to I_max do
(6)   for each krill individual i (i = 1, 2, ..., m) do
(7)     Calculate the three components of motion:
(8)       (1) the motion induced by other krill individuals
(9)       (2) the foraging activity
(10)      (3) the nonlinearly optimized physical diffusion
(11)    Implement the crossover operator
(12)    Update the krill herd position and fitness values
(13)    Calculate the linear nearest neighbor lasso step and the new position using equations (24) and (25), and update the new fitness values
(14)    if K_Yk > K_i (or K_j) then
(15)      Keep K_i (or K_j) and delete K_Yk
(16)    else
(17)      Keep K_Yk and delete K_i (or K_j)
(18)    end if
(19)  end for
(20)  Update X_gb and K_gb of the globally optimal individual
(21) end for
(22) Output the global best solution, the number of selected features, and the feature selection time
(23) End

Algorithm 1: The LNNLS-KH algorithm.

For the LNNLS-KH algorithm, the maximum number of iterations I_max and the number of krill individuals N are set to 100 and 30, respectively. Following [41], the foraging speed V_f is set to 0.02, the maximum random diffusion rate D_max to 0.05, and the maximum induction speed N_max to 0.01. Following [47], the threshold θ is set to 0.7.
As the LNNLS-KH algorithm is designed to first ensure high accuracy and then reduce the number of features, the weight factor α in the fitness function is set to 0.02. We adopt the iterative curve of the global optimal fitness value, feature selection time, test set detection time, data dimension after feature selection, classification accuracy, detection rate (DR), and false positive rate (FPR) as evaluation measures of feature selection for IDS. The accuracy is the ratio of correctly classified samples to the total number of samples, as defined in equation (19). FPR, also known as the false alarm rate (FAR), is the ratio of samples incorrectly detected as intrusions to all normal samples, as shown in equation (27). DR, also known as recall or sensitivity, is the probability that an abnormal sample is correctly detected, as shown in equation (28). The crossover-mutation PSO (CMPSO) algorithm [47], the ACO algorithm [48], the KH algorithm [41], and the IKH algorithm [9] are used as comparative experiments. The experimental results on the Probe, DoS, R2L, and U2R datasets are shown as follows.
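The three evaluation measures follow directly from the confusion counts (TP = attacks detected as attacks, TN = normal kept as normal, FP = normal flagged as attack, FN = missed attacks); a minimal sketch:

```python
def accuracy(tp, tn, fp, fn):
    # Equation (19): correctly classified samples over all samples
    return (tp + tn) / (tp + tn + fp + fn)

def detection_rate(tp, fn):
    # Equation (28): DR (recall): correctly detected attacks over all attacks
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    # Equation (27): FPR (false alarm rate): normal samples flagged as attacks
    return fp / (fp + tn)
```

With, say, 90 detected attacks, 10 missed attacks, 80 correctly passed normal samples, and 20 false alarms, accuracy is 0.85, DR is 0.9, and FPR is 0.2.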
To reflect the performance of the LNNLS-KH algorithm intuitively, the convergence curves of the fitness function for the Probe, DoS, U2R, and R2L datasets are shown in Figure 9.
In conclusion, the LNNLS-KH feature selection algorithm performs excellently in terms of the global optimal fitness iteration curve, test set detection time, dimension of the feature subset, classification accuracy, false positive rate, and detection rate. Although the offline training time of the LNNLS-KH algorithm is longer than that of the CMPSO, ACO, KH, and IKH algorithms, its lower feature dimension reduces detection time. Moreover, the algorithm has faster convergence, higher detection accuracy, a lower classification false positive rate, and a higher detection rate.

Experimental Results and Discussion of CICIDS2017
Dataset.
The experiment is conducted in MATLAB R2016a on a Windows 64-bit operating system with an Intel(R) Core(TM) i7-4790 processor. The MachineLearningCVE file in the CICIDS2017 dataset includes 8 CSV files of all traffic data, which contain 78 features plus an attack type label after some duplicate features are removed. We annotate the traffic records according to the different attack periods and types, and standardize and normalize the dataset. Because the analyzed CSV files contain an excessive amount of data, training the model on a single host would be excessively time consuming and converge slowly. Therefore, we simplified and reintegrated these CSV files while preserving the original attack timing features. We selected a total of 12090 records covering 5 types of traffic: 1 type of normal traffic and 4 types of attack traffic, namely "DoS," "DDoS," "PortScan," and "WebAttack." The data are randomly divided into training and test sets in a 2 : 1 ratio, with independent and repeated experiments.
The CMPSO, ACO, KH, and IKH algorithms are used for comparison with the LNNLS-KH algorithm. The preprocessed Normal, DoS, DDoS, PortScan, and WebAttack subsets are input into each algorithm in turn, and the dimensions and feature subsets of the feature selection are obtained. We adopt the KNN classification model as the classifier and obtain the intrusion detection accuracy on the test set data. The feature selection dimensions for the CICIDS2017 dataset are shown in Table 14. The LNNLS-KH algorithm selects different features for different attack types. For example, the selected features of the DoS subset are "Total Length of Bwd Packets," "Fwd Packet Length Min," "Flow IAT Min," "FIN Flag Count," "RST Flag Count," "URG Packets/Bulk," "Bwd Avg Packets/Bulk," "Idle Mean," and "Idle Std." For the WebAttack subset, "Total Fwd Packets," "Bwd IAT Max," "Bwd PSH Flags," "Fwd Packets/s," "Bwd Avg Packets/Bulk," "Subflow Fwd Bytes," "Active Max," and "Idle Max" are selected as attack features. The algorithm thus reduces the feature dimension of the IDS dataset while ensuring high accuracy. The average feature dimension selected by the LNNLS-KH algorithm is 10.2, accounting for 13.08% of the total number of features in the CICIDS2017 dataset, and the number of features decreases by 57.85%, 52.34%, 27.14%, and 25% compared with the CMPSO, ACO, KH, and IKH algorithms, respectively. Figure 12 shows the feature selection time and intrusion detection time of the 5 feature selection algorithms. As seen in Figure 12(a), in the feature selection stage the LNNLS-KH algorithm spends a long time finding the optimal feature subset due to the linear nearest neighbor lasso step optimization after the position update of the krill herd: compared with the KH and IKH algorithms, it increases the time by an average of 14.38% and 9.32%.
Although the LNNLS-KH algorithm uses more computation time, its convergence speed and global search ability are improved. Figure 12(b) shows the intrusion detection time of the 5 feature selection algorithms, i.e., the time taken by the KNN classifier to detect the sample dataset after the feature subset has been found, excluding the time spent searching for the optimal subset. The feature dimension of the LNNLS-KH algorithm is low and the amount of data processed during classification is small, which results in shorter detection time. Compared with the CMPSO, ACO, KH, and IKH algorithms, the intrusion detection time of the LNNLS-KH algorithm is reduced by 6.52%, 5.17%, 2.14%, and 2.28% on average. The selection results of the CMPSO, ACO, KH, IKH, and LNNLS-KH algorithms are used as feature subsets, and the KNN classifier is used to detect the test dataset; the classification accuracy of the different algorithms is shown in Table 15. Over the five subsets, the average classification accuracy of the proposed LNNLS-KH algorithm is 95.86%; in particular, it reaches 97.55% on the PortScan subset. Compared with the other four feature selection methods, the LNNLS-KH algorithm achieves average increases of 3.11%, 8.52%, 8.58%, 2.45%, and 4.29% on the Normal, DoS, DDoS, PortScan, and WebAttack subsets, respectively. Table 16 shows the classification FPR and DR of the different feature selection algorithms on the test sets: on all five test sets, the LNNLS-KH algorithm has lower FPR and higher DR than the other four algorithms.
We propose the LNNLS-KH algorithm, a novel feature selection algorithm for intrusion detection. Experiments based on NSL-KDD and CICIDS2017 datasets show that the algorithm has good feature selection performance and improves the efficiency of intrusion detection.

Conclusions
With the rapid development of network technology, intrusion detection plays an increasingly important role in network security. However, the "dimensional disaster" caused by massive data results in problems such as slow response and poor accuracy of intrusion detection systems. The KH algorithm is a new population-based swarm intelligence optimization method that performs well on high-dimensional data, providing a new approach for reducing the dimension of intrusion detection data and selecting useful features. In this paper, an improved KH algorithm, named LNNLS-KH, is proposed for feature selection on IDS datasets via linear nearest neighbor lasso step optimization. The LNNLS-KH algorithm introduces a new fitness function composed of the number of selected feature dimensions and the classification accuracy. Nonlinear optimization is introduced into the physical diffusion motion of krill individuals to accelerate the convergence of the algorithm. Moreover, the linear nearest neighbor lasso step optimization is proposed to balance the exploration and exploitation abilities and effectively obtain the globally optimal feature subset. Experiments on the NSL-KDD and CICIDS2017 datasets show that the LNNLS-KH algorithm retains 7 and 10.2 features on average, greatly reducing the feature dimension. On the NSL-KDD dataset, features are reduced by 44%, 42.86%, 34.88%, and 24.32% compared with the CMPSO, ACO, KH, and IKH algorithms, and on the CICIDS2017 dataset by 57.85%, 52.34%, 27.14%, and 25%, respectively. In addition, the classification accuracy of the LNNLS-KH feature selection algorithm increases by 10.03% and 5.39%, and the intrusion detection time decreases by 12.41% and 4.03% on the two datasets.
Furthermore, the LNNLS-KH algorithm enhances the ability to jump out of local optima and shows good performance in the optimal fitness iteration curve, false positive rate of detection, and convergence speed, which demonstrates that the proposed LNNLS-KH algorithm is an efficient feature selection method for network intrusion detection.
In this research, we recognized that the initialization of the LNNLS-KH algorithm involves a certain degree of randomness. Therefore, we conducted independent and repeated experiments to address this issue, and the results were reasonable and convincing. Although the proposed algorithm shows encouraging performance, it could be further improved.
In future work, we will consider using data balancing techniques to preprocess the experimental datasets to obtain more accurate feature selection results and stronger algorithm stability. We will also combine LNNLS-KH with other algorithms to improve its exploration and exploitation abilities, further shortening the time of training feature subsets and of classification detection. Furthermore, as the LNNLS-KH algorithm is universally applicable, it can be applied to more feature selection systems and to optimization problems in other fields.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.