Comparing and Analyzing Applications of Intelligent Techniques in Cyberattack Detection

Nowadays, advances in technology are increasing the use of automation, mobility, smart devices, and applications over the Internet, which creates serious problems for the protection and privacy of digital data and raises global security issues. Therefore, intelligent systems and techniques are needed to prevent attacks and protect data over the network. Cyberattacks are the most prominent problem in cybersecurity and a challenging research area for scientists and researchers. These attacks may destroy data, systems, and resources and can sometimes damage the whole network. Numerous traditional techniques have been used for the detection and mitigation of cyberattacks, but they are not efficient against new attacks. Today, machine learning and metaheuristic techniques are widely applied in different areas to achieve efficient computation and fast processing of complex network data. This paper discusses improvements to security models and frameworks for the detection and prevention of cyberattacks using different machine learning and optimization techniques in the domain of cybersecurity. It focuses on the literature on metaheuristic algorithms for optimal feature selection and machine learning techniques for the classification of attacks; prominent algorithms such as GA, evolutionary algorithms, PSO, and machine learning methods are discussed in detail. This study provides descriptions and tutorials drawn from various literature citations, references, and recent research papers. The techniques discussed have been applied with high performance to the detection, mitigation, and identification of cyberattacks and provide a security mechanism over the network. Hence, this survey presents a detailed description of various existing intelligent techniques, attack datasets, observations, and comparative studies.


Introduction
The heavy use of the Internet in many areas is encouraging researchers and scientists to adopt intelligent systems that support users and applications while ensuring efficient computation and maintaining quality of service over the network. Traditional methods are time-consuming, less efficient, and deliver only average performance, and they are not suited to complex, multiobjective, or real-world problems. Hence, efficient attack detection systems are needed to reduce the harmful effects of cyber threats [1]. Cybersecurity is the collection of technologies and security mechanisms developed to protect data, information, networks, and programs from attack activities such as data modification, theft, unauthorized access, and destruction over the Internet or a network. Cybersecurity components mainly concern host protection and network security systems [2]. It is currently used to protect many areas such as cloud computing [3], wireless sensor networks [4], and the IoT. Many security measures are available for protecting systems and networks, such as antivirus software, firewalls, and intrusion detection systems (IDS). Nevertheless, cyber threats continue to harm and disrupt Internet services every day. This motivates many researchers to contribute extensively to the design of security systems [5][6][7][8][9]. Popular cyberattacks include denial of service attacks, distributed denial of service attacks [10], remote to local attacks, probing, user to root attacks, adversarial attacks, poisoning and evasion attacks [11], botnets [7], phishing attacks [12], spamming [13], and zero-day attacks [6].
Attack detection methods are broadly divided into three categories: anomaly-based, misuse-based, and hybrid detection. Misuse-based detection scans traffic against prestored attack signatures and is mostly used to detect known attacks, which it does with few false alarms. It requires the attack signatures and rules in the database to be updated regularly. The anomaly-based technique can detect both known and unknown attacks. It captures network and host machine behavior and identifies anomalies as deviations from normal behavior. It is the most popular method because it can detect zero-day attacks. It has several merits, one of which is the customization of activity profiles, which leaves attackers unsure which activities they can perform while remaining undetected. However, it also has a drawback: it produces very high false alarm rates, and legitimate activity is sometimes classified as an anomaly.
The third is the hybrid technique, a fusion of anomaly- and misuse-based detection. It offers high detection performance and a low false alarm rate.
Here we present some existing research that highlights the contribution of machine learning and metaheuristic techniques to cyberattack detection, focusing on better classification and optimal feature extraction, together with the reported results. Tu et al. [14] proposed a hybridization of PSO and SVM for feature extraction; in this method, the fitness function of PSO is used for classification.
Athari and Borna [15] proposed a hybrid metaheuristic approach in which particle swarm optimization (PSO), the genetic algorithm (GA), and glowworm swarm optimization (GSO) are used together for classification and optimal feature extraction in wireless sensor networks. The purpose of using metaheuristic algorithms is to solve problems such as slow convergence and convergence to local optima. In this paper, several parameters were calculated: permittivity against DoS attack, reliability, number of active nodes, and energy consumption. The results show that, under DoS attack, PSO has the highest permittivity, reliability, and number of active nodes compared with GA and GSO. GA has lower permittivity, reliability, and number of active nodes than PSO and GSO, but its energy consumption is very low compared with the other two techniques because of its simplicity. Bioinspired algorithms are widely used for optimal feature selection and for solving optimization problems in different fields, in contrast to the data mining techniques previously used for classification and feature selection in applications such as pattern recognition, intrusion detection, clustering, and data classification.
Sagarin and Taylor [16] proposed a biological evolutionary system to provide better approaches in the field of security. Jamali and Shaker [17] proposed a metaheuristic approach for recognizing a type of denial of service attack on the TCP protocol called the TCP SYN flood attack, which floods the server with a huge number of TCP connection requests.
This attack detection framework is designed using particle swarm optimization (PSO) and a queuing model to use the buffer space optimally and solve the attack recognition problem over the network.
Tarao and Okamato [18] used an artificial immune algorithm from the metaheuristic family to model a framework for DoS attack detection that addresses server-side vulnerabilities. With this technique, the false alarm rate is minimized and detection performance is simulated by a machine learning approach. Metaheuristic algorithms are used very efficiently in cybersecurity to implement attack recognition frameworks with high learning capabilities. Bhattacharya et al. [19] proposed a hybrid principal component analysis (PCA) and firefly-based model to classify intrusion detection system datasets.
The model performs one-hot encoding to transform the attack datasets and then uses the hybrid PCA-firefly algorithm for dimensionality reduction. The XGBoost algorithm is then used for the classification of attacks.
This hybrid model performs well, achieving an accuracy of 99.9, a sensitivity of 93.1, and a specificity of 99.9.
Visumathi and Shunmuganathan [20] proposed intelligent computational techniques such as the self-organizing map (SOM), support vector machine (SVM), multilayer perceptron (MLP), Bayesian network (BN), and logistic regression for the classification of attack data. Srinoy [21] proposed a hybrid combination of particle swarm intelligence for optimized feature selection and an SVM that classifies attack data. The evaluation found that this hybrid technique can not only identify known attacks but also detect early suspicious activities that lead to unknown attacks. The method solves the feature selection problem efficiently and achieved a detection rate of 96.11% with high classification accuracy.
Mourougan and Aramudhan [22] proposed a computational model for solving classification problems and extracting features using a hybrid combination of the PSO technique and the GA algorithm. The proposed attack detection model identifies DoS attacks by means of genetic particle swarm intelligence-based binding feature extraction, which is mostly used for intrusion feature selection. The results show higher detection accuracy and fewer false alarms than the fuzzy clustering technique.
Akyazi and Sima Uyar [23] proposed a model built on an anomaly-based intrusion detection method that uses the artificial immune system (AIS) to improve a multiobjective evolutionary algorithm. The model was tested for the detection of DDoS attacks on the DARPA-based LLDOS 1.0 dataset. The proposed model is applied iteratively, and when negative selection is found, the objectives are redefined with the same concept. A zero percent false positive rate was obtained with this approach, and the results show that the method is successful with better accuracy. Ben Sujitha and Kavitha [24] proposed a method that detects cyberattacks efficiently and provides better accuracy and efficiency. To improve the performance of the attack detection model, an optimized feature selection approach was used. The proposed system provides an optimal feature extraction algorithm that constructs summarized features, which are then applied to the multiobjective PSO algorithm. The anomaly detection method was applied, and the proposed system was tested on the KDDCUP99 intrusion dataset. The results show that the system can deal successfully with real-time attacks at high speed.
The rest of the paper is arranged as follows: Section 2 focuses on the important steps of classification and machine learning. Section 3 focuses on the details of different metaheuristic algorithms used in cyberattack detection. Section 4 describes the different machine learning techniques used in cyberattack detection. Section 5 focuses on different datasets. Section 6 presents observations and evaluations. Section 7 presents challenges and future directions, and finally, Section 8 concludes the paper.

Important Steps of Classification and Machine Learning
Many techniques have been used in knowledge discovery in databases (KDD), especially data mining techniques such as clustering and data classification. KDD deals with extracting useful information from a data source. As shown in Figure 1, the steps for extracting knowledge are data preparation, data selection, data cleaning, and the extraction of features or patterns from the data. According to Periyar and Salem [22], data preprocessing is an essential step of machine learning computation that removes noisy data such as repeated values, out-of-range values, irrelevant data, null values, and missing terms or instances. Data preprocessing includes steps such as learning, normalization, transformation, feature selection, and feature extraction. The preprocessed data serve as the training sets from which knowledge is extracted for the testing phase. The precision of any classifier depends on selecting the optimal feature subsets from the original data [22].
Feature selection is the most essential step of data preprocessing, used before the classification process [24].
This method reduces repeated data patterns and noisy or unnecessary features, which helps to achieve classification accuracy and improves the attack detection rate. It selects a subset of the actual features and can also generate new features [24]. Feature selection (FS) has two basic objectives: to improve classification accuracy and to reduce the number of features. Complex datasets sometimes degrade classification performance in the attack detection process and can create problems such as irrelevant data, repeated features, uncertainty, and ambiguity. These problems are obstacles to both the speed and the performance of the detection process [24]. The approach comprises two major phases, the training phase and the testing phase, which are processed using the following steps:
(1) Identification of the features, attributes, or classes extracted from the data during the preprocessing phase
(2) Selection of the attributes that are useful for classification
(3) Learning with the help of training data
(4) Training the model used for the detection of unknown threats
These are the steps followed in the machine learning process. In the training phase, signature-based classes are learned from training sets. In the testing phase, the classifier tests new data and checks whether they match a learned class.
In the anomaly-based approach, by contrast, regular traffic data are characterized in the training phase, the trained model is applied to new data in the testing phase, and the test samples are finally classified as normal or malicious.
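The training and testing phases of the anomaly-based approach can be illustrated with a minimal sketch. It assumes a single traffic feature (packets per second), a mean and standard deviation profile, and a z-score threshold of 3; the feature, the sample values, and the function names `fit_normal_profile` and `classify` are illustrative assumptions, not taken from any of the surveyed systems.

```python
import statistics


def fit_normal_profile(normal_values):
    """Training phase: summarize regular traffic by its mean and standard deviation."""
    mean = statistics.fmean(normal_values)
    stdev = statistics.pstdev(normal_values)
    return mean, stdev


def classify(value, profile, threshold=3.0):
    """Testing phase: label a new observation as 'normal' or 'malicious'."""
    mean, stdev = profile
    if stdev == 0:
        # Degenerate profile: anything different from the mean is a deviation.
        return "normal" if value == mean else "malicious"
    z_score = abs(value - mean) / stdev
    return "malicious" if z_score > threshold else "normal"


# Hypothetical packets-per-second samples of regular traffic (illustrative only).
normal_traffic = [100, 104, 98, 101, 97, 103, 99, 102]
profile = fit_normal_profile(normal_traffic)

# New observations seen in the testing phase.
labels = [classify(v, profile) for v in [101, 96, 500]]
print(labels)
```

A low threshold flags more legitimate traffic as anomalous, which is the high false alarm behavior noted for anomaly-based detection, while a high threshold risks missing attacks.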
In many research papers, machine learning (ML) is broadly divided into three phases: the training phase, the testing phase, and the validation phase. Machine learning offers numerous methods for the training and testing process. Popular methods include the SVM and artificial neural network (ANN) methods such as the SOM and the multilayer perceptron (MLP), and these techniques have different parameters such as the number of layers, nodes, and processing units. When the training phase is completed, a number of models are available, and the selection of a model depends on its efficiency, accuracy, and error estimate. ML methods are broadly classified into three types: supervised, unsupervised, and semisupervised [2]. When the model is trained with training sets in which the data are well labeled, it comes under supervised learning. Most supervised anomaly detection techniques have been proposed using a support vector machine (SVM), multilayer perceptron (MLP), or decision tree [25].
When only part of the dataset is labeled during data preprocessing, the resulting problem comes under semisupervised learning. If the dataset is unlabeled, attributes, classes, structures, and patterns must be extracted from the data without labels, and such problems come under unsupervised learning [2]. Which model or learning method is used depends on the problem to be solved, and the best-suited learning approach is chosen accordingly. Once the training, validation, and testing steps of the classification model are complete, the model can be reused for further problem-solving.
Many machine learning methods are available for solving classification problems efficiently, including supervised and unsupervised artificial neural network learning methods such as the self-organizing map and other feedforward neural networks, linear logistic regression, naïve Bayes, support vector machine (SVM), and multilayer perceptron (MLP) classifiers. These methods have been applied to different benchmarks and are widely used for solving classification problems. For example, Srinoy et al. [21] used the well-known KDDCUP99 intrusion detection dataset with an anomaly-based approach that combines PSO and SVM for optimal feature selection and classification.
Shinde and Parvat [26] applied a hybrid of PSO and ABC with SVM on the NSL-KDD dataset to solve feature selection and classification problems and achieve a high detection rate (DR) and a low false alarm rate (FAR). Prasad et al. [27] analyzed metaheuristic anomaly-based algorithms for the real-time detection of application layer distributed denial of service attacks. The attacks were detected successfully using a hybrid combination of the cuckoo search, bat, and firefly algorithms, which proved efficient by improving accuracy, efficiency, and overall performance. Jadidi et al. [28] proposed a multilayer perceptron (MLP) based anomaly attack detection method for high-speed networks. A hybrid approach based on the PSOGSA and cuckoo algorithms was used, which improves the accuracy of classifying abnormal traffic.
Akyazi and Sima Uyar [23] proposed an attack detection model against the DoS attack that uses the AIS algorithm based on anomaly detection. Applied to the DARPA LLDOS 1.0 dataset, it provides efficient results with a high TPR and a very low FPR. The multiobjective evolutionary algorithm inspired by AIS thus proved very effective for DDoS attack detection. Machine learning methods are therefore widely used for cyberattack detection and have proved very efficient on various benchmarks. Today, for better computational results, hybrid metaheuristic algorithms and ML approaches are used for optimal feature extraction and for many classification problems, especially those involving complex datasets.
The section above discussed machine learning approaches together with some current research studies. After preprocessing, feature extraction, and classification, we now turn to the evaluation metrics of classification.
Several classification metrics are used for machine learning in the attack detection process, and they are discussed below. The evaluation rests on four basic quantities: false positives (FP), normal instances wrongly classified as attacks; true positives (TP), attacks correctly classified as attacks; true negatives (TN), normal instances correctly identified as normal; and false negatives (FN), attacks wrongly classified as normal [29]. The following attack detection metrics are based on the anomaly detection method. To measure overall performance, four metrics are broadly used: accuracy, error rate (ER), miss rate (MR), and false alarm rate (FAR) [28,30]. During the classification of data in the attack detection mechanism, the false alarm rate (FAR) and true positive rate (TPR) are evaluated. These two metrics tend to move together. They are plotted as a receiver operating characteristic (ROC) curve with FAR on the x-axis and TPR on the y-axis: when FAR increases, TPR increases, and if FAR falls, TPR falls [18]. Overall performance is also measured by the total detection accuracy (TDA), the ratio of correctly classified data items to the total number of samples, and by the average detection time (ADT), the ratio of the total detection time to the total number of samples [31], as well as by the class detection rate, detection rate, and false positive rate. Performance is further evaluated by recall and precision. Recall is measured as TP/(TP + FN) and precision as TP/(TP + FP). The F-measure is derived from precision and recall as their weighted mean and acts as a tradeoff between the two. It is computed as (2 * Recall * Precision)/(Recall + Precision) [32].
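As a worked example, the metrics above can be computed directly from the four confusion counts. The sketch below uses the recall, precision, and F-measure formulas given in the text; the definitions of error rate, miss rate, and false alarm rate are the commonly used ones and are an assumption here, since the text names these metrics without stating formulas. The counts are made-up illustrative values.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute common attack detection metrics from confusion counts.

    Assumes every denominator is nonzero (at least one attack, one normal
    sample, and one positive prediction).
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)        # also the true positive rate (TPR)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "miss_rate": fn / (tp + fn),          # assumed definition: 1 - TPR
        "false_alarm_rate": fp / (fp + tn),   # assumed definition: FP rate
        "recall": recall,
        "precision": precision,
        "f_measure": 2 * recall * precision / (recall + precision),
    }


# Hypothetical counts: 100 attacks and 100 normal samples.
m = detection_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 185/200 = 0.925, recall is 90/100 = 0.9, and the false alarm rate is 5/100 = 0.05.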

Metaheuristic Approaches for Optimal Feature Selection
In this section of the paper, we discuss various methods of metaheuristics that are broadly used in many areas for solving different complex optimization problems [5].

Particle Swarm Optimization.
Particle swarm optimization (PSO) is a widely used technique and a member of the swarm family. It was first discussed by James Kennedy and Russell C. Eberhart in 1995 [14] and is described in their first research paper, "A New Optimizer Using Particle Swarm Theory." It is an intelligent optimization technique and belongs to the class of metaheuristics. Particle swarm intelligence is inspired by the social behavior of animals, such as bird flocking and fish schooling.
Particle swarm intelligence is a simple yet powerful optimization algorithm that has been applied successfully to a number of applications in different areas, especially in science and engineering. According to its concept, each solution in the search space is considered a particle, and each particle moves using information from its surrounding particles [15]. The algorithm is easy to program and economical in terms of both storage and speed. Its computation and working steps are similar to those of evolutionary and genetic algorithms [17]. PSO has a strong ability to find the global best. Each particle of the population starts at a randomized position and has an associated velocity. The velocity of a particle is adjusted according to the previous behavior of the particle and of its neighbors as it moves through the search space. All particles update their positions and velocities toward the best value by communicating with each other [24].
Each particle keeps track of the coordinates in the problem space that are associated with the best fitness value it has achieved so far. This value is called P best. Another best value tracked by PSO is the best value obtained so far by any particle in the neighborhood of the particle, and its location is called L best. When a particle takes the whole population as its neighbors, the best value is called the global best, G best. Following this idea, at every time step PSO changes the velocity of each particle so that it moves toward its P best and L best locations [25,33].
Each particle is characterized by two parameters, a position vector and a velocity vector, denoted $Y_i(t)$ and $U_i(t)$. The location and velocity of particle $i$ at iteration $t$ in an $n$-dimensional search space can be presented as

$$Y_i(t) = \left(y_{i1}(t), y_{i2}(t), \ldots, y_{in}(t)\right) \tag{1}$$

$$U_i(t) = \left(u_{i1}(t), u_{i2}(t), \ldots, u_{in}(t)\right) \tag{2}$$

During the process, each particle has its independent knowledge, P best, which is its own best position so far, and its collective knowledge, G best, which is the best position found among its neighbors. The velocity is updated using formula (3) [33][34][35]:

$$U_i(t+1) = w\,U_i(t) + C_1 r_1 \left(P_{best,i} - Y_i(t)\right) + C_2 r_2 \left(G_{best} - Y_i(t)\right) \tag{3}$$

where $w$ is the inertia weight, $r_1$ and $r_2$ are random numbers between 0 and 1, and $C_1$ and $C_2$ are constants that steer the velocity of a particle towards P best and G best; their value can be set to 2 [25,33,34]. Equation (4) gives the updated position:

$$Y_i(t+1) = Y_i(t) + U_i(t+1) \tag{4}$$

Particle swarm optimization is applicable to optimal feature selection problems in various cyberattack detection systems. Here, we discuss some cyberattack detection systems from the literature that use particle swarm optimization to obtain optimal results. Jamali and Shaker [17] proposed a detection model against denial of service attacks, which are very prominent attacks over the Internet that block legitimate users from gaining access to network services. The proposed defensive framework uses particle swarm optimization to formulate and solve various optimization problems, improving the performance of the attack detection system while consuming buffer space efficiently.
In Hao et al.'s study [36], user sessions are extracted from log records to perform pattern recognition on weblog data and differentiate normal data from abnormal or malicious data. k-means clustering is combined with particle swarm optimization in a hybrid form to generate an efficient attack detection model.
This model successfully recognizes DDoS attacks with improved accuracy.
Momanyi Nyabuga et al. [37] provided defensive and preventive models against the denial of service (DoS) attack by applying the particle swarm intelligence algorithm in VANETs. Their results showed that PSO is efficient and highly accurate when used for optimization in the attack detection system. Shinde and Parvat [26] proposed a framework that aims at a high attack detection rate with minimal false alarms by applying a hybrid of SVM and swarm intelligence to select the correct parameters. The SVM provides efficient classification of attack data. The attack detection framework uses information gain for selecting features and combines it with a support vector machine classifier. The important features are selected using particle swarm intelligence as the optimization approach. Applied to the NSL-KDD dataset, the framework achieved a higher DR and a lower FAR than a regular SVM.
Guoli [38] presented an attack detection model that combines PSO with the Elman neural network. This fusion is used for optimal parameterization, which improves performance. The experimental results are better than those of traditional techniques.
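The PSO velocity and position updates described in this section can be sketched in a few lines. This is a minimal illustration rather than any of the cited detection systems: it minimizes a simple sphere function instead of a feature selection objective, takes the whole swarm as each particle's neighborhood (so L best coincides with G best), and uses assumed settings `w = 0.5`, `c1 = c2 = 2.0`, a fixed random seed, and position clamping to the search bounds.

```python
import random


def pso(fitness, dim, n_particles=30, iterations=200, w=0.5, c1=2.0, c2=2.0,
        bounds=(-5.0, 5.0), seed=0):
    """Minimize `fitness` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests (P best) and the global best (G best).
    p_best = [p[:] for p in pos]
    p_best_val = [fitness(p) for p in pos]
    g_idx = min(range(n_particles), key=lambda i: p_best_val[i])
    g_best, g_best_val = p_best[g_idx][:], p_best_val[g_idx]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to P best + pull to G best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                # Position update, clamped to the search bounds (an assumption).
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < p_best_val[i]:
                p_best[i], p_best_val[i] = pos[i][:], val
                if val < g_best_val:
                    g_best, g_best_val = pos[i][:], val
    return g_best, g_best_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=2)
print(best, best_val)
```

For feature selection, the position vector would typically be mapped to a binary mask over features and the fitness would be a classifier's validation error; that mapping is left out of this sketch.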

Genetic Algorithm.
The genetic algorithm (GA) is another metaheuristic technique that works on the concept of the theory of evolution. Genetic algorithms are also called population-based algorithms. These bioinspired algorithms are based on iterative operations, and their basic idea is adapted from genetics. A genetic algorithm can be designed as a simulation model in which a population of samples (chromosomes) drawn from the candidate solutions of an optimization problem evolves toward an improved solution. One of the main features of the genetic algorithm is that it works constantly on chromosomes and the solution space [15]. The population is a collection of chromosomes in which each chromosome represents a certain position in the problem domain and is a possible solution to the problem. Applying genetic operators to a population creates a new population that can have the same number of chromosomes.
Then, a fitness function is evaluated for the chromosomes, and a new generation is created using the operators generally used by genetic algorithms: selection, crossover, and mutation. The number of generations and the chromosome population size are determined in the algorithm initialization step, when the parameters are set.
In the selection phase of GA, several chromosomes are selected from the existing chromosomes in a population for reproduction. The best chromosomes have a higher chance of being selected. The chromosomes that make up the next generation are selected by this operator:

$$p_i = \frac{f_i}{\sum_{j} f_j} \tag{5}$$

Equation (5) gives the selection probability of each chromosome. $p_i$ is the probability of selecting the $i$th chromosome, $f_i$ is the fitness function value of the $i$th chromosome, and the denominator is the total fitness of all chromosomes. In equation (6), $l_i$ is the length of the $i$th chromosome. Then, the crossover operator produces the child chromosome.
This operator combines the genes of two parent chromosomes in the new (child) chromosome. The parent chromosomes for the crossover operation are those selected from the initial population. The mutation operator then selects a gene from the chromosome arbitrarily and alters the content of that gene.
The mutation operator helps to ensure that the genetic algorithm does not fall into the trap of a local minimum, and it recovers chromosomes that may be destroyed by other operations such as selection and crossover.
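The three operators described above can be sketched as follows. This is an illustrative toy, not a reconstruction of any cited system: chromosomes are bit lists (for example, a mask over candidate features), selection is fitness-proportional as in equation (5), crossover is single-point, and mutation flips bits. The fitness function `ones_fitness`, the parameter values, and the seed are assumptions chosen only to make the example runnable.

```python
import random


def selection_probabilities(fitnesses):
    """Equation (5): p_i = f_i / sum of all fitness values."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def run_ga(fitness, length=12, pop_size=20, generations=40, rate=0.02, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        fitnesses = [fitness(c) for c in population]
        population = [
            mutate(crossover(roulette_select(population, fitnesses, rng),
                             roulette_select(population, fitnesses, rng), rng),
                   rate, rng)
            for _ in range(pop_size)
        ]
        # Remember the best chromosome seen in any generation.
        best = max(population + [best], key=fitness)
    return best


def ones_fitness(chromosome):
    """Hypothetical fitness: 1 + number of set bits (always positive)."""
    return 1 + sum(chromosome)


probs = selection_probabilities([1, 3])
best = run_ga(ones_fitness)
print(probs, best, ones_fitness(best))
```

Keeping the fitness strictly positive matters for roulette-wheel selection, because the selection weights must not all be zero.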
Genetic algorithms search for global optima, and hence they are used efficiently in attack detection systems. The following researchers used this technique to solve the optimization problem in attack data classification.
Siva Sankari et al. [39] proposed a model that observes attacks over the Internet and predicts whether an attack is a DoS attack or not. In the proposed model, the genetic algorithm (GA) is used for optimal feature selection and for identifying DoS attacks. The GA is able to learn by itself and initiates the selection process. It is used to generate good solutions to complex problems. Genetic algorithms are implemented through the steps of selection, crossover, and mutation to find optimal solutions. This approach is very accurate and efficient for identifying a DoS attack.
The results of the proposed system showed better performance, and the system is capable of detecting a DoS attack with high accuracy.
Mizukoshi and Munetomo [40] proposed a system designed for attack detection that learns attack patterns and other anomalous traffic. The proposed system works as a real-time traffic pattern analyzer that uses GA to detect abnormal traffic behavior. The system is built on the Hadoop distributed infrastructure, and the results show the effectiveness of the DDoS defense system.
Lee et al. [41] proposed an approach that provides a defensive mechanism against DDoS attacks by using a traffic matrix. In this work, they proposed an improved attack detection model that enhances the traffic matrix construction process, with particular parameters optimized using the genetic algorithm (GA). The experiments were run on the DARPA 2000 and LBL-PKT-4 datasets, and the results show better detection accuracy and higher speed than previous techniques.
Bhuyan et al. [42] provided a comprehensive survey of DDoS attack detection and prevention, including detection techniques and tools for different networks. The article also discusses the issues, challenges, and feasible solutions in this domain.
In Dimitris et al.'s study [43], a neural network detector was designed for the detection of DDoS attacks. A genetic algorithm is used to select the optimal features from 44 statistical features extracted from the packet header. The computation is based on a genetic algorithm that creates an error-free neural network-based DDoS detector. The experimental results showed that the selected features improve DDoS detection and yield high accuracy.
Lee et al. [44] proposed an attack detection model that optimizes some parameters of the traffic matrix through GA to achieve a high attack detection rate. The traffic matrix construction operation was improved with a hash function to minimize the rate of collisions, and a packet-based window size was used to minimize cost. The evaluation was carried out on the DARPA 2000 LLDOS 1.0 and LBL-PKT-4 attack datasets. The proposed work has shown high feasibility in terms of attack detection accuracy and speed.

Ant Colony Optimization Algorithm.
Ant colony optimization is a commonly used technique for solving combinatorial optimization problems and is a member of the metaheuristic family.
The algorithm works as an agent-based system that simulates the behavior of ants to develop a learning-based system. Ants prefer to move in a straight line when searching for food and protecting themselves, and when they meet an obstacle, they first turn left or right at random. It is assumed that every ant moves at the same speed and deposits pheromone evenly along its trail. The ants that happen to choose the shorter way around the obstacle reach the food earlier, so pheromone accumulates faster on the shortest path. The other ants prefer to follow the way on which they find the larger amount of the chemical called pheromone, and hence all ants eventually reach the target (the food source) through the shortest path. Ant colony optimization differs from the traditional ant system in the way the pheromone trails are updated, which takes place in two phases. First, while ants construct a tour, they change the quantity of pheromone locally on the edges they traverse by means of a local update. Second, once every ant has completed its tour, a global update is applied to adjust the amount of pheromone on the edges that belong to the best ant tour [21]. This optimization phenomenon is used in different research studies for solving optimization problems. Along with its various other applications, ant colony optimization has also been used successfully in cyberattack detection models. Here, we discuss some research papers on its contribution to attack detection models. Dimitris et al. [43] proposed an ant colony system-based framework (DDIACS) for the identification and detection of the low-rate distributed denial of service (LDDoS) attack, another well-known attack over the network. The proposed detection model is built with ant colony optimization, a strong optimization algorithm used to solve complex optimization problems.
The proposed framework improved some properties that are very difficult to achieve while detecting multisource attacks, such as flexibility, fast convergence, and robustness. This framework was tested on the DARPA and KDD datasets.
The outcomes have shown that the proposed method overcame these problems with higher accuracy than existing models. The proposed model achieved a detection rate of more than 89% and an accuracy of 83%.
Aldwairi et al. [45] proposed an anomaly-based detection model for the detection of unknown attacks. They used the ant colony optimization technique to select optimized features and improve the overall classification accuracy by rejecting unwanted features. In this work, an ant colony optimization with three levels of updating in the feature selection process was proposed. This method efficiently used the information of each ant in the process of feature extraction and improved both the accuracy of the proposed system and the classification of features. The evaluation results have shown that the proposed approach performed well as compared with previously used feature selection techniques.
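The two-phase pheromone update described in this section can be illustrated with a minimal sketch. It is not the implementation of any cited work: the toy routing problem (four cities on a unit square), the function names, the initial pheromone level `tau0`, and all parameter values are assumptions chosen only for illustration.

```python
import math
import random

def tour_length(tour, dist):
    """Length of the closed tour that visits the cities in the given order."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

def ant_colony_tsp(dist, n_ants=8, n_iters=30, rho=0.1, beta=2.0, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    # Initial pheromone level (hypothetical choice based on an arbitrary tour).
    tau0 = 1.0 / (n * tour_length(list(range(n)), dist))
    tau = [[tau0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            while len(tour) < n:
                i = tour[-1]
                candidates = [j for j in range(n) if j not in tour]
                # Attractiveness of an edge: pheromone times inverse distance.
                weights = [tau[i][j] / dist[i][j] ** beta for j in candidates]
                j = rng.choices(candidates, weights=weights)[0]
                # Local update on the edge the ant has just used.
                tau[i][j] = tau[j][i] = (1 - rho) * tau[i][j] + rho * tau0
                tour.append(j)
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
        # Global update: only the edges of the best tour so far are reinforced.
        for k in range(n):
            a, b = best_tour[k], best_tour[(k + 1) % n]
            tau[a][b] = tau[b][a] = (1 - rho) * tau[a][b] + rho / best_len
    return best_tour, best_len

# Four cities on the corners of a unit square; the shortest tour is the perimeter.
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(p, q) for q in cities] for p in cities]
tour, length = ant_colony_tsp(dist)
```

In a feature selection setting, the "cities" would be candidate features and the tour cost a classifier error, but that mapping depends on the specific model and is not shown here.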

Artificial Bee Colony.
Swarm intelligence is a kind of self-organized system that can solve different optimization problems. Artificial bee colony (ABC) is another prominent optimization technique that imitates the foraging behavior of bee swarms, as described by Visumathi and Shunmuganathan [20]. The artificial bee colony algorithm is built on the following three basic components. The first is the food source, whose value depends on factors such as the amount and quality of nectar, the total effort needed for its extraction, and its proximity to the colony. The second is the employed foragers, which hold information about particular food sources. The third is the unemployed foragers, which continuously look for food sources and are broadly categorized into two types: scout bees and onlooker bees. The whole process of searching for food starts with scout bees being sent out from the colony to search for food sources in a randomly distributed manner. When the scout bees return, the food sources are rated against some threshold value and the bees perform a waggle dance [21]. The waggle dance is a unique form of interaction that helps to determine the direction of the food source, and the nectar amount it conveys represents the fitness value. Onlooker bees choose the best food source by collecting all the information that is exposed by the waggle dances. This information helps them to reach the best sources without the help of any maps [46]. The following authors used artificial bee colony (ABC) in attack detection systems for solving optimization problems. Mahale and Gothawal [47] proposed an ABC algorithm to optimize some attributes of the artificial neural network, mitigate the local optima problem, and overcome the low convergence speed of the neural network. The ABC algorithm can be efficiently used for finding optimal solutions in minimum time.
In this research work, the proposed algorithm was applied to attack detection, and the evaluation shows that the proposed method improved parameters such as the detection rate (DR) and efficiency.
Priyadharshini and Kuppusamy [48] proposed an attack detection model based on anomaly detection techniques that detects attacks and improves performance through a low false alarm rate.
This work is based on the ABC algorithm for anomaly-based attack detection, with a feature extraction technique to optimize some attributes for the classification. The experiments were performed on the KDDCUP99 dataset, and the results were evaluated by calculating parameters such as accuracy and speed. After evaluation, the accuracy was noted as 97.5% for known attacks and 93.2% for unknown attacks.
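The roles of employed, onlooker, and scout bees described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any cited paper: the cost function (`sphere`), the fitness formula `1 / (1 + cost)`, the abandonment `limit`, and all parameter values are hypothetical.

```python
import random

def abc_minimize(cost, dim, lo, hi, n_sources=10, limit=20, n_iters=100, seed=0):
    rng = random.Random(seed)

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(s) for s in sources]
    trials = [0] * n_sources          # failed improvement attempts per source
    best_i = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = sources[best_i][:], costs[best_i]

    def remember(i):
        nonlocal best, best_cost
        if costs[i] < best_cost:
            best, best_cost = sources[i][:], costs[i]

    def explore_neighbour(i):
        # Perturb one coordinate of source i relative to another random source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        candidate = sources[i][:]
        candidate[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        candidate[d] = min(hi, max(lo, candidate[d]))
        c = cost(candidate)
        if c < costs[i]:              # greedy selection keeps the better source
            sources[i], costs[i], trials[i] = candidate, c, 0
            remember(i)
        else:
            trials[i] += 1

    for _ in range(n_iters):
        for i in range(n_sources):    # employed bees: one per food source
            explore_neighbour(i)
        # Onlooker bees pick sources with probability proportional to fitness
        # (this fitness formula assumes a non-negative cost).
        fitness = [1.0 / (1.0 + c) for c in costs]
        for _ in range(n_sources):
            explore_neighbour(rng.choices(range(n_sources), weights=fitness)[0])
        for i in range(n_sources):    # scouts replace exhausted sources
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i], trials[i] = cost(sources[i]), 0
                remember(i)
    return best, best_cost

def sphere(x):
    """Toy cost function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best, best_cost = abc_minimize(sphere, dim=2, lo=-5.0, hi=5.0)
```

In an attack detection model, the cost would typically be a classifier error on selected features or tuned parameters; the toy cost above only makes the search loop easy to follow.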

Cuckoo Algorithm.
A cuckoo algorithm is one of the optimal search algorithms inspired by the brood parasitism of cuckoo birds. Birds of this type cannot complete their reproduction phase without a proper host; they lay their eggs in the nests of other birds whose eggs look like their own. The search approach followed by the bird has been adopted in different areas for solving optimization problems. Cuckoo search is applied with three traditional rules: first, the location of a host nest for placing eggs is searched randomly; second, nests that contain eggs similar to the cuckoo egg are preferred; third, the number of nests is finite and is taken as 15 for cuckoo search. Hence, a probability is associated with its eggs (in standard cuckoo search, the probability that a host bird discovers a foreign egg), which is represented as $P_a \in (0, 1)$. The following authors used the cuckoo algorithm to achieve optimization in the attack detection model.
Hao et al. [49] proposed a cross-layer approach as a defense against denial of service attacks. The cross-layer approach combined a device-driver packet filter (a cuckoo-based filter) with a remote firewall. The packet filter was designed to filter out abnormal network traffic before it consumes the resources of higher network protocol layers on the server side. The performance of the proposed technique was checked through wide-ranging simulations in Java, and it performed better for DDoS attack detection.
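The cuckoo search rules described in this section (random egg placement, a fixed number of 15 nests, and a discovery probability) can be sketched as below. This is a simplified illustration, not the method of any cited work: the Lévy step is drawn with Mantegna's algorithm, and the step scale `alpha`, the value `pa = 0.25`, and the toy cost function are assumptions.

```python
import math
import random

def levy_step(rng, beta=1.5):
    """Heavy-tailed step length drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.gauss(0.0, sigma) / abs(rng.gauss(0.0, 1.0)) ** (1 / beta)

def cuckoo_search(cost, dim, lo, hi, n_nests=15, pa=0.25, n_iters=200,
                  alpha=0.01, seed=0):
    rng = random.Random(seed)

    def random_nest():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    nests = [random_nest() for _ in range(n_nests)]
    costs = [cost(n) for n in nests]
    history = []                      # best cost after each generation
    for _ in range(n_iters):
        for i in range(n_nests):
            # A cuckoo lays a new egg by a Levy flight starting from nest i ...
            egg = [min(hi, max(lo, x + alpha * (hi - lo) * levy_step(rng)))
                   for x in nests[i]]
            # ... and drops it into a randomly chosen host nest, where it
            # survives only if it is better than the egg already there.
            j = rng.randrange(n_nests)
            c = cost(egg)
            if c < costs[j]:
                nests[j], costs[j] = egg, c
        # Host birds discover foreign eggs with probability pa and abandon
        # those nests; the current best nest is always kept.
        best_i = min(range(n_nests), key=costs.__getitem__)
        for i in range(n_nests):
            if i != best_i and rng.random() < pa:
                nests[i] = random_nest()
                costs[i] = cost(nests[i])
        history.append(min(costs))
    best_i = min(range(n_nests), key=costs.__getitem__)
    return nests[best_i], costs[best_i], history

def sphere(x):
    """Toy cost function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

nest, nest_cost, history = cuckoo_search(sphere, dim=2, lo=-5.0, hi=5.0)
```

Because the best nest is never abandoned, the best cost recorded in `history` cannot increase from one generation to the next.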
3.6. Bacterial Foraging Algorithm.
The BFO technique is inspired by the collective foraging behavior of bacteria such as E. coli and M. xanthus. In particular, the BFOA algorithm is based on the chemotaxis behavior of bacteria, which can sense chemical gradients and move toward or away from particular signals.
The information-conveying process of the algorithm allows cells to swarm together toward optima. This is implemented by a sequence of three main processes on a population of replicated cells: first, chemotaxis; second, reproduction; and last, elimination-dispersal. In the first step, the cost of each cell is redefined through its closeness to other cells, and the cells move along the modified cost surface. In the second step, only the cells that performed best over their lifetime are allowed to become part of the next generation. In the third step, cells may be discarded, and new random samples are added with low probability.
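The three processes named above (chemotaxis, reproduction, and elimination-dispersal) can be sketched as nested loops. This is a simplified variant written for illustration only: it omits the cell-to-cell attraction term that redefines the cost through the closeness of other cells, it assumes an even number of cells, and all names and parameter values are hypothetical.

```python
import math
import random

def bfo_minimize(cost, dim, lo, hi, n_cells=10, n_chem=20, n_swim=4,
                 n_repro=4, n_elim=2, p_elim=0.25, step=0.1, seed=0):
    rng = random.Random(seed)

    def random_cell():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    cells = [random_cell() for _ in range(n_cells)]   # n_cells assumed even
    best, best_cost = None, math.inf
    for _ in range(n_elim):
        for _ in range(n_repro):
            health = [0.0] * n_cells      # accumulated cost over a lifetime
            for _ in range(n_chem):
                for i in range(n_cells):
                    current = cost(cells[i])
                    # Tumble: pick a random unit direction.
                    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
                    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
                    # Swim: keep moving that way while the cost improves.
                    for _ in range(n_swim):
                        moved = [min(hi, max(lo, x + step * d / norm))
                                 for x, d in zip(cells[i], direction)]
                        c = cost(moved)
                        if c >= current:
                            break
                        cells[i], current = moved, c
                    health[i] += current
                    if current < best_cost:
                        best, best_cost = cells[i][:], current
            # Reproduction: the healthier half (lowest accumulated cost)
            # splits in two and replaces the other half.
            order = sorted(range(n_cells), key=health.__getitem__)
            survivors = [cells[i] for i in order[:n_cells // 2]]
            cells = [s[:] for s in survivors] + [s[:] for s in survivors]
        # Elimination-dispersal: with low probability a cell is discarded
        # and replaced by a new random sample.
        cells = [random_cell() if rng.random() < p_elim else cell for cell in cells]
    return best, best_cost

def sphere(x):
    """Toy cost function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best, best_cost = bfo_minimize(sphere, dim=2, lo=-5.0, hi=5.0)
```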
The bacterial foraging optimization algorithm has been used efficiently in cyberattack detection mechanisms. Damodaram and Valarmathi [50] applied the bacterial foraging algorithm to the detection of phishing attacks. Traditional systems based on association and classification data mining algorithms are intelligent, flexible, and efficient, but they do not succeed in providing the optimal solution. The proposed model introduced a hybrid optimization algorithm, BFOA, to achieve an optimal solution for identifying phishing websites. The experimental results were compared with those of the traditional techniques, and the proposed approach proved to be very efficient. Table 1 shows the comprehensive analysis including techniques, datasets, description, and outcomes of different articles from the literature. Table 2 presents a brief description of different metaheuristic techniques with their features and applications.

Machine Learning Methods
In our day-to-day life, artificial intelligence plays an important role in solving many complex problems. It includes many applications such as speech recognition, language processing, machine intelligence, and fog computing [53,54]. Machine learning is one of the popular fields of artificial intelligence and is successfully used to solve various computational problems in different areas [56,57]. Nowadays, it is extended to deeper networks such as deep learning [58], extreme learning [59], and deep extreme learning networks.
Machine learning algorithms are classified as "classification," "clustering," or "regression." This section of the paper discusses various methods of machine learning used in attack detection systems. Certain details of these techniques and their results are presented with the help of different research papers for each method. Figure 3 shows the classification of machine learning techniques, such as decision trees (DTs), artificial neural networks (ANNs), naive Bayes (NB), and the fuzzy set-based approach, as referred from the previous literature survey.
Some important intelligent classification techniques are discussed in detail below.

Table 1: Comprehensive analysis of techniques, datasets, descriptions, and outcomes of different articles from the literature.
| Authors | Technique | Dataset | Description | Outcome |
|---|---|---|---|---|
| | Particle swarm optimization | KDDCUP99 | The proposed model provides a review and discussion of denial of service attack detection and prevention mechanisms; moreover, it proposes that the particle swarm algorithm optimally helps to detect DoS attacks. | The simulated outcomes have shown that the proposed PSO-based model was efficiently used for attack detection as compared with other methods. |
| Shinde and Parvat [26] | Hybrid form of PSO + SVM | NSL-KDD | The attack detection model was designed using a hybrid form of SVM with the PSO technique for the selection of optimal features, to achieve high accuracy and performance and a lower false alarm rate (FAR) than a normal IDS. | The hybrid approach of machine learning and optimization technique (ABC-SVM) provides better results than a single approach. The results showed a detection rate of 98.53% and a false alarm rate of 0.0374. |
| Siva Sankari et al. [39] | Genetic algorithms | KDDCUP99 | The proposed model is designed by using the genetic algorithm (GA) for the detection of DoS. | This detection approach performed attack detection well but did not prove to be very efficient when compared with models based on hybrid techniques. However, it provides better results than the traditional one. |
| Mizukoshi and Munetomo [40] | Genetic algorithms | KDDCUP99 | This proposed model is based on real-time traffic pattern analysis using a genetic algorithm (GA) approach for optimal pattern extraction. | The experimental result has shown that the proposed method performed well as compared with other traditional methods. |
| Lee et al. [41] | Genetic algorithms | | This proposed model is designed for the detection of distributed denial of service attacks using a traffic matrix and optimizes some features of the traffic matrix by using GA. | The detection rate and accuracy of this method were better compared with other traditional techniques. |
| Dimitris et al. [43] | Genetic algorithms | KDDCUP99 | This proposed work is designed for the detection of DDoS attacks using a genetic algorithm for efficient feature selection and for optimizing some parameters. The genetic algorithm (GA) evaluation used a designed error-free neural network detector. | The evaluated results have shown that the features that best qualify for DDoS attack detection were optimally selected by the proposed approach, which provides better results. |
| Chen et al. [51] | Ant colony optimization | DARPA/LLDOS, KDDCUP99 | This proposed work investigated the complexity of the DDIACS framework and also presents its comparison with the swarm technique and other probability-based techniques. | The results have shown that the proposed framework successfully resolved the problems related to processing attributes, and the DDIACS framework provides higher performance than existing methods. |
| Kumar and Walia [52] | Ant colony optimization | KDDCUP99 | The objective of this work was to design and implement OLSR and DSR protocols for the blackhole attack and also to protect the system from the threat. | After evaluation, the results showed that the proposed approach performed well on various network performance metrics such as bit error rate, throughput, delay, and packet delivery ratio. |
| Rais and Mehmood [53] | Ant colony optimization | KDDCUP99 | The proposed model used the ACO optimization technique for better feature selection through various stages of pheromones that help ants to find the optimal features. | The evaluation shows that the proposed approach outperformed the traditional techniques in optimal feature selection. |
| Bhuyan et al. [42] | Artificial bee colony | KDDCUP99 | This proposed method applies the ABC algorithm. Anomaly-based attack detection is used with different feature selection techniques to minimize the number of unwanted features and pick the best ones. | Experimental results have shown that the performance of the ABC algorithm was better than traditional approaches and that it also achieved a high accuracy rate. |

Artificial Neural Networks (ANNs).
ANN is among the efficiently used systems whose working is inspired by the human brain [1]. The human brain comprises billions of neurons that are interlinked by synapses; correspondingly, the functionality of an ANN is separated into three major kinds of layers, which are the input, hidden, and output layers, and each connection is associated with a weight. The entire network is trained through weight adjustment during its learning phase, which enables it to compute the correct class for a set of inputs. An ANN, as shown in Figure 4, is also defined as a network of numerous computing elements or units that are closely interconnected with each other and transform a set of inputs into the required outputs. The outcomes are evaluated using the weights and the elements that are related to each other through their interconnections. The network can generate the desired output by modifying the links connecting the nodes [60]. An activation function is applied as the set of inputs is passed from the input nodes through the hidden layer nodes and finally to the output nodes. An ANN thus works as a well-designed transformation of a set of input values to output values. An artificial neural network can be used for both anomaly-based and signature-based attack detection [61].
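The layer-by-layer transformation of inputs into outputs described for the ANN can be illustrated with a minimal forward pass. The sigmoid activation, the network size, and the weight values below are assumptions chosen so that the result is easy to check by hand; training (weight adjustment) is not shown.

```python
import math

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> single hidden layer -> output layer."""
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)

# Two inputs, one hidden node, one output node (hypothetical weights).
x = [2.0, 2.0]
hidden_w, hidden_b = [[1.0, -1.0]], [0.0]   # weighted sum is 0, activation 0.5
out_w, out_b = [[2.0]], [-1.0]              # 2 * 0.5 - 1 = 0, activation 0.5
score = mlp_forward(x, hidden_w, hidden_b, out_w, out_b)[0]
```

In a detector, the output score would be compared with a threshold to label a record as normal or attack; the threshold is a design choice outside this sketch.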

Anomaly-Based Detection Using Artificial Neural Network. Jadidi et al. [28] proposed an attack detection model using a hybrid form of ANN on a flow-based dataset and applied metaheuristic optimization algorithms to achieve an optimal solution. In this work, two hybrid heuristic algorithms, PSOGSA and cuckoo, were used to optimize the interconnection weights of an MLP network. The resulting network was evaluated on flow-based datasets and compared with previously used techniques, and the proposed hybrid technique was found to detect attacks with better accuracy.
Jiang et al. [62] proposed an attack detection system built on hierarchical neural networks based on RBF. The proposed method combined the anomaly-based and signature-based detection methods and benefited from the short training time and better accuracy of the RBF network. The RBF anomaly classifier is used to identify data as normal or attack. Hence, the proposed method made it possible to analyze real-time network traffic.
In the study by Jadidi et al. [63], the proposed model was built on an anomaly-based detection approach, a well-known technique that is efficiently used for detecting unknown attacks. An MLP neural network with a single hidden layer was used, and GSA was used to optimize the interconnection weights of the multilayer perceptron network. The proposed GSA-based detection system achieved 99.43% accuracy.
Ryan et al. [64] proposed an intrusion detection model using an ANN, in which the BPNN algorithm was used to model the attack detection system for a set of system users. The dataset used for training and testing was taken from the logs of a UNIX environment. The evaluation found 96% accuracy and a 7% false alarm rate.
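The accuracy, detection rate, and false alarm rate figures quoted throughout this survey are derived from a confusion matrix. The sketch below uses the common definitions, which individual papers may vary; the function name and the toy labels are assumptions.

```python
def detection_metrics(y_true, y_pred):
    """Labels: 1 = attack, 0 = normal. Assumes both classes occur in y_true."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "detection_rate": tp / (tp + fn),      # share of attacks that are caught
        "false_alarm_rate": fp / (fp + tn),    # share of normal records flagged
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = detection_metrics(y_true, y_pred)
```

For the toy labels above, two of three attacks are detected and one of five normal records raises a false alarm.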

4.1.2. The Signature-Based Detection Approach. Cannady [65] proposed an intrusion detection model built with an artificial neural network designed as a multistage classifier for signature-based (misuse-based) detection. The data, created by a real-time secure network, contained attack signatures; on the order of thousands of events were analyzed, of which 3000 were simulated attacks.
Nine different features were selected after the data preprocessing step. Normal or abnormal traffic is recognized by training the system with an artificial neural network, which enables it to learn the collective signatures. The proposed model achieved 93% accuracy and was found to be efficient when compared with other algorithms.

Bayesian Network.
A Bayesian network is one of the ML techniques that work on the concept of a probabilistic graphical model, which is represented by a set of variables and the associations between them [66]. The Bayesian network can easily handle incomplete datasets [33]. The network is generally created in the form of a graph in which nodes or vertices (V) represent the random variables and edges (E) represent the associations between them, so that a directed acyclic graph (DAG) is set up. The lower-level nodes are called child nodes and depend on parent nodes or upper-level nodes. Every node is assigned a random variable and a conditional probability [16]. Bayesian classifiers based on Bayes' theorem are used for the classification of new instances of a data sample named Y. Each instance is a set of attribute values denoted as $Y = (y_1, y_2, \ldots, y_n)$. Considering $m$ classes, the sample $Y$ is assigned to the class $C_i$ if the following condition is satisfied: $P(C_i \mid Y) > P(C_j \mid Y)$ for all $i$ and $j$ in $(1, m)$ with $j \neq i$. That is, the sample is assigned to the class that has the maximum probability. In the naive Bayesian classifier, the attributes are assumed to be conditionally independent. Despite this assumption, naive Bayesian classification provides acceptable outcomes because it focuses on identifying the classes of the instances rather than on the probabilities. Hence, it can be used in various applications such as text data classification and attack data classification [16].

Anomaly-Based Detection. Panda and Patra [61] proposed a structure for an attack detection system that used the naïve Bayes algorithm, one of the techniques of machine learning. The experiments were carried out on a 10% KDDCUP99 dataset, and the system was evaluated by tenfold cross-validation. The experimental results show that the proposed approach achieved a higher detection rate than other approaches: the detection rate was noted as 95%, the error rate was 5%, and the approach was fast and cost-effective.
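The maximum-probability decision rule of the naive Bayes classifier can be sketched for categorical attributes as follows. This is an illustration only: the toy records (protocol and connection flag), the Laplace smoothing, and the function names are assumptions and are not taken from the cited experiments.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, attribute index) -> value counts
    domains = defaultdict(set)            # attribute index -> observed values
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            value_counts[(c, k)][v] += 1
            domains[k].add(v)
    return class_counts, value_counts, domains

def predict_nb(model, row):
    """Return the class with the largest posterior (computed in log space)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)                      # prior P(C)
        for k, v in enumerate(row):
            # Laplace-smoothed P(y_k | C); attributes treated as independent.
            score += math.log((value_counts[(c, k)][v] + 1) / (n_c + len(domains[k])))
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical toy records: (protocol, connection flag).
rows = [("tcp", "SF"), ("tcp", "SF"), ("udp", "SF"),
        ("tcp", "S0"), ("tcp", "S0"), ("icmp", "S0")]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = train_nb(rows, labels)
```

With these counts, `predict_nb(model, ("tcp", "S0"))` favors the attack class and `predict_nb(model, ("udp", "SF"))` favors the normal class.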
Farid et al. [67] proposed an approach to intrusion detection in which data classification is done by one of the popular learning algorithms, the naive Bayesian technique. The overall working of the proposed algorithm for intrusion detection was evaluated on 10% of KDDCUP99. The experimental results showed high accuracy with minimal false positives.
Muda et al. [66] proposed a hybrid method in which two machine learning approaches, naïve Bayes and the k-means clustering technique, were combined to solve a classification problem. The computational evaluation was performed on the benchmark KDDCUP99. The proposed model worked in two phases. In the first phase, similar data instances were grouped according to their behavior by using the k-means clustering technique. In the second phase, the naïve Bayes classifier was used for the classification task. With this approach, the accuracy was noted as 99% and the false alarm rate was less than 0.5%.
Ben Amor et al. [33] proposed a model built on the naïve Bayes classifier and also built a standard Bayesian network; the evaluation was carried out on the KDD 1999 dataset, with the attack classes grouped in three major stages for performance measurement. The first stage considers a single attack within normal data. The second stage contains all four attack types of the KDD 1999 dataset, and the problem was resolved by multiclass classification based on the misuse detection technique. The third stage consists of normal data and all four attack types, using the anomaly-based attack detection technique. The experimental results showed better accuracy.

The Signature-Based Detection Approach. Panda and Patra [61] proposed an attack detection model designed with the naïve Bayes technique using the Weka tool [23]. The experiments were carried out on the KDD 1999 dataset, grouped into its different attack categories; finally, the results were compared with a neural network classifier, and the naïve Bayes classifier was reported to have a higher accuracy and false alarm rate than the NN.

Support Vector Machine.
SVM, proposed by Vapnik [25], is capable of resolving various pattern recognition problems. SVM is a supervised learning method with associated learning algorithms, and in the last few years it has mostly been applied to signature-based detection. It transforms the set of inputs into a high-dimensional space and constructs an optimal separating hyperplane in that high-dimensional feature space [60]. The SVM classifier is applied to provide improved output for binary classification as compared with other classifiers. SVM promises good performance, and hence, it is used in various fields such as pattern recognition, bioinformatics, text categorization, speaker verification, character recognition, engineering and science, and financial market evaluation [32]. SVM is popular for solving various classification problems because of its robustness and because it deals efficiently with high-dimensional data, which mitigates the curse of dimensionality. SVM was initially designed for binary classification by constructing an optimal hyperplane that maximizes the margin between the negative and positive data [32].
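The idea of a maximum-margin separating hyperplane for binary classification can be sketched with a linear SVM trained by subgradient descent on the regularized hinge loss. This is one of several possible training procedures and is an assumption of this sketch, as are the learning rate, the regularization strength `lam`, and the toy data; kernels and the high-dimensional mapping are not shown.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Labels must be +1 or -1. Returns the hyperplane (w, b)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularization term: shrinking w corresponds to a wider margin.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: the point is misclassified or lies
                # inside the margin, so the hyperplane is moved toward it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Linearly separable toy data: +1 = attack, -1 = normal (hypothetical).
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

The parameters that PSO tunes in several of the works surveyed below would correspond, in this sketch, to values such as `lam`.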

Anomaly-Based Detection Using Artificial Neural Network. Mukkamala et al. [68] proposed a hybrid model that combines the ANN and SVM techniques for the attack detection system. The purpose of using SVM is to achieve better speed and scalability in the attack detection system. The experiments were carried out on the DARPA 1998 dataset. The results showed that the training time for SVMs was significantly lower, reported as 17.77 sec, which is shorter than that of neural networks. The SVM-based attack detection system also had a higher detection rate than neural networks.
Chen et al. [69] proposed a model that used a combination of rough set theory and SVM for the attack detection system. The experiments were carried out on the KDDCUP99 dataset, and rough set theory was applied in the preprocessing phase to optimize the features. The best features were selected and used to train the SVM model, which was then tested accordingly. The experimental results have shown that the accuracy was noted as 86.79% and the FPR was 29.97%. The accuracy was found to be better, with a reduced false positive rate.
Wang et al. [70] proposed an attack detection model that used a hybrid combination of two algorithms, PSO and PCA, to improve SVM. The experiments were performed on the KDDCUP99 dataset, and principal component analysis (PCA) was used as an effective technique for reducing the dimensions of the dataset. The particle swarm technique was applied to tune different parameters of the SVM. The results have shown that the attack detection rate was improved by the combination of PCA and PSO as compared with PSO-SVM.
Srinoy et al. [21] proposed an attack detection model built on a hybrid combination of algorithms: PSO is used for optimal feature selection, and the SVM acts as the fitness function of PSO. The evaluation showed that the proposed method successfully recognized both unknown and known attacks.
The proposed technique achieved better classification accuracy when compared with different traditional methods.
Shinde and Parvat [26] proposed a model that used PSO and SVM to choose optimal parameters for gaining a high detection rate and a low false alarm rate. The support vector machine (SVM) has the potential to achieve better classification for the attack detection system, but its performance depends on choosing suitable parameters.
This work used the SVM classifier with information gain for feature selection. The classification parameters of the SVM were chosen optimally by a PSO algorithm. The experiments were carried out on the NSL-KDD dataset, and the results show that the proposed method can attain a low false alarm rate and a higher detection rate.
Saxena and Richaariya [46] proposed a model that was a hybrid form of the PSO method and SVM. The experiments were carried out on the KDDCUP99 dataset. Optimized parameters were selected by binary PSO, and the classification problem was solved by the support vector machine. The binary PSO provides the most promising feature subset for creating a better intrusion detection system. The proposed method completed the task in the following major steps: preprocessing, feature reduction using information gain, and training using the hybrid SVM-PSO. Finally, the evaluation has shown that the hybrid combination of PSO and SVM achieved a higher detection rate than the simple SVM method.
Wang et al. [71] proposed a model that used an SVM-based feature selection algorithm to reduce the dimension of the sample data. The anomaly-based intrusion detection technique is used, and an MSVM whose parameters are optimized by particle swarm optimization (PSO) detects anomalous connections. The experiments were performed on the KDDCUP dataset to measure the efficiency of the proposed algorithms (FS-SVM and MSVM-PSO) and the detection precision of MSVM-PSO, and MSVM-PSO was also compared with three different algorithms: the Bayesian algorithm, k-means, and a multiclass support vector machine with grid-optimized parameters (MSVM-grid). The experimental results have shown that the hybrid MSVM-PSO outperforms the three algorithms in terms of parameters such as detection rate and accuracy.

Decision Trees.
DT is one of the prominent methods of ML and is represented in the form of a tree structure, in which each non-leaf node represents a decision or test on the data item under consideration [1]. The outcome of the test decides which branch is selected. To classify a data item, the decision tree algorithm starts at the root node and follows the assertions until a terminal node (leaf) is reached. After the terminal node is reached, a decision is made. A decision tree can also be seen as a special form of a rule set, characterized by the hierarchical association of rules. A decision tree comprises the following essentials [37]: a decision node, which represents a condition or a test on a data item; a branch or edge, which represents one of the possible attribute values or test outcomes; and a leaf, which gives the class to which the object belongs. To classify an object, the procedure starts from the root node of the decision tree and follows the branch indicated by the outcome of each test until a leaf node is reached. The class at that leaf node is assigned to the unknown object, and the information gain of the attributes identifies the best attribute for dividing the data into subsets.
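The use of information gain to pick the best attribute for a split can be made concrete with a small sketch. The entropy-based definition below is the standard one; the toy records and function names are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index attr."""
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """Attribute with the highest information gain (candidate for a decision node)."""
    return max(range(len(rows[0])), key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy records: (protocol, connection flag).
rows = [("tcp", "SF"), ("tcp", "S0"), ("udp", "SF"), ("udp", "S0")]
labels = ["attack", "attack", "normal", "normal"]
```

Here the protocol attribute separates the classes perfectly (gain of 1 bit), while the flag attribute carries no information (gain of 0), so the protocol would be tested at the root.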
Ben Amor et al. [28] proposed an attack detection model built on a combination of two ML techniques, DT and the Bayesian network. The experiments were carried out on the KDDCUP99 dataset, and the performance of both techniques was compared. The evaluation showed that the DT-based method provides much better results than naïve Bayes. In terms of computation, however, building a DT is much slower than building the Bayesian network. The decision tree selects the optimal feature for each node during the creation of the tree, based on some defined criteria. The advantage of DT is that it has a good speed of operation and high attack detection accuracy.
Stein et al. [72] proposed an attack detection model based on the GA approach for the optimal selection of features used with DT, to achieve a maximum detection rate and a minimum false alarm rate. The experiments were performed on different attack datasets, with the different attacks treated separately. The GA improved some of the categories, such as the performance on probe. It was also found that the performance improvement on other attack types is much higher on the testing data.
Karthik et al. [73] proposed a model that used three techniques for hybridization, i.e., chi-square, information gain, and relief, and compared their performance using the decision tree classifier. The evaluation was done on the KDDCUP99 dataset, and the results have shown that the performance of the decision tree classifier is improved, with high accuracy.

Random Forests.
RF is one of the classification techniques that consist of a set of tree-based classifiers [20]. RF is an ensemble of classification and regression trees that is regarded as highly competitive in terms of accuracy among the data mining (DM) techniques. The RF algorithm has numerous applications and is used in prediction, probability estimation, and pattern analysis [74]. Random forest classifiers consist of a collection of a large number of DTs. Each tree in the collection depends on the values of a random vector that is sampled independently and with the same distribution for all trees in the forest. When new records are taken as input, the random forest builds trees for those records and keeps them in the forest [74]. Malik et al. [75] proposed a hybrid combination of the binary PSO and RF algorithms for optimal feature selection and classification of attacks in a network. Particle swarm optimization is one of the popular algorithms of the swarm family; it has the capability of robust global search and is used for optimal feature extraction, while random forest (RF) is a highly accurate classifier and is used for classification. The experiments were carried out on the KDDCUP99 dataset. The authors also compared the results of the proposed system with other classifiers, and the final results show that the performance achieved by the proposed classifier is much better than that of the traditional approaches.
Malik and Khan [74] proposed a model that used the BMOPSO approach to detect PROBE attacks in the network. The proposed technique focused on two basic objectives, the attack DR and the FAR, to guide the feature selection procedure. The experiments were carried out on the KDDCUP99 dataset. The proposed approach is used for optimal feature selection from a set of different features, and the RF technique is used for highly accurate and fast classification.
The results have shown that the proposed method performed well for the classification.
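The idea that each tree of a random forest depends on an independently sampled random vector, and that the forest decides by voting, can be sketched as follows. To keep the example short, every tree is a depth-1 tree (a stump) on one randomly chosen feature, which is a strong simplification of a real random forest; the data, names, and parameters are assumptions.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Depth-1 tree: the threshold on one feature with the fewest training errors."""
    best = None
    for t in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, (feature, t, left_label, right_label))
    return best[1]

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) records with replacement.
        idx = [rng.randrange(len(X)) for _ in X]
        # Random choice of the feature this tree may split on.
        feature = rng.randrange(len(X[0]))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data with two numeric features.
X = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
y = ["normal"] * 4 + ["attack"] * 4
forest = fit_forest(X, y)
```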

Association Rule and Clustering.
Hao et al. [36] proposed an attack detection model for application-level DDoS attacks. Weblog records were used to extract user sessions, to recognize whether patterns in the data are normal or abnormal, and to evaluate the similarities between different sessions. The traditional k-means clustering algorithm tends to fall into local optima. Therefore, a hybrid of particle swarm optimization and the k-means clustering algorithm (PSO-KMC) was used to construct the attack detection model. The proposed model makes it possible to detect whether undetermined sessions are part of a DDoS attack or not. The experimental results have shown that the proposed technique can detect attacks effectively with high performance.
Srinoy and Kurutach [76] proposed an attack detection model based on a hybrid data mining approach. The algorithm hybridizes k-means with an artificial ant clustering algorithm, which is primarily used to generate raw clusters; these were then refined further by k-means particle swarm optimization (KPSO). The proposed model was developed as an evolutionary-based clustering technique. The authors hybridized the k-means algorithm and PSO to find good partitions of the data. The proposed approach allows recognizing not only known attacks but also unknown attacks. After evaluation, the results have shown that the proposed hybrid algorithm performed well with high accuracy.
Ensafi et al. [77] proposed an attack detection model that uses a hybrid combination of a fuzzy logic-based approach and a swarm-based approach. The proposed technique is applied to overcome local minima and to solve complex classification problems.
The proposed SFK-means approach combines the benefits of k-means, fuzzy k-means, and swarm k-means, and together they resolve most of these problems. The purpose of the SFK-means algorithm was to overcome the local convergence problem in fuzzy k-means and the sharp boundary problem in swarm k-means. The experiment was performed on the KDDCUP99 dataset, and the proposed approach was found to be effective in detecting various attacks.
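The local-optimality problem of plain k-means that motivates the swarm-based hybrids above can be seen on a small example. The sketch below is only an illustration on hypothetical one-dimensional data; it implements standard Lloyd-style k-means, not PSO-KMC, KPSO, or SFK-means. The same data converge to very different sums of squared errors depending on the initial centroids.

```python
def kmeans_1d(points, centroids, n_iters=20):
    """Lloyd's k-means on 1-D data; returns (centroids, sum of squared errors)."""
    centroids = list(centroids)
    for _ in range(n_iters):
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: (x - centroids[j]) ** 2)
            clusters[nearest].append(x)
        # Keep the old centroid if a cluster ends up empty.
        new = [sum(c) / len(c) if c else centroids[j]
               for j, c in enumerate(clusters)]
        if new == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new
    sse = sum(min((x - c) ** 2 for c in centroids) for x in points)
    return centroids, sse

# Hypothetical data with three obvious groups.
points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]
_, sse_good = kmeans_1d(points, [0.0, 10.0, 20.0])  # one centroid per group
_, sse_bad = kmeans_1d(points, [0.0, 1.0, 15.0])    # two centroids in one group
print(sse_good, sse_bad)  # 1.5 101.0
```

A population-based search such as PSO explores many centroid configurations at once, which is why the surveyed hybrids use it to reduce the dependence on a single starting point.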

Hidden Markov Models. Markov chains and hidden Markov models (HMMs) are members of the class of Markov models. A Markov model consists of a set of states that move from one state to another according to transition probabilities, which determine the logical structure of the model [31]. HMM is a machine learning model used in various applications such as speech, pattern, and gesture recognition, bioinformatics, and language processing. It is also called a statistical or sequence model, in which the system is modeled as a Markov process with unknown parameters. The challenging task in an HMM is to infer the hidden parameters from the visible ones [31]. The states of a hidden Markov model represent unobserved conditions that should be modeled, and each state has its own output probability distribution.
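How the hidden states and the per-state output distributions combine can be sketched with the standard forward algorithm, which computes the probability of an observed symbol sequence under a given HMM. The two-state model below (state 0 as "normal", state 1 as "attack") and all of its probabilities are hypothetical values chosen for illustration and do not come from any of the surveyed papers.

```python
def forward_likelihood(obs, start_p, trans_p, emit_p):
    """Return P(obs | model) computed with the forward algorithm."""
    n_states = len(start_p)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans_p[p][s] for p in range(n_states)) * emit_p[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical model: state 0 = "normal", state 1 = "attack";
# observed symbol 0 = benign-looking event, symbol 1 = suspicious event.
start_p = [0.8, 0.2]
trans_p = [[0.9, 0.1],
           [0.3, 0.7]]
emit_p = [[0.7, 0.3],
          [0.2, 0.8]]

benign = forward_likelihood([0, 0, 0], start_p, trans_p, emit_p)
suspicious = forward_likelihood([1, 1, 1], start_p, trans_p, emit_p)
print(benign > suspicious)  # the all-benign sequence is more likely here
```

In a detection setting, a sequence whose likelihood under a model of normal behavior falls below a chosen threshold could be flagged as anomalous; the threshold itself would have to be tuned on data.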

Anomaly Detection and Hybrid Detection. Joshi and Phoha [78] proposed a hidden Markov model for intrusion detection. The HMM was used with five states and six symbols per state. The states were interconnected in such a manner that any state can reach any other state. The Baum-Welch algorithm was applied to estimate the hidden Markov parameters. The experiments were performed on the KDD 1999 dataset, and other parameters such as the FP and FN rates were also evaluated. The results showed that the accuracy is significantly improved by using more than five features.

Misuse-Based Detection.
Ariu et al. [31] proposed an attack detection model for web applications that uses hidden Markov models for attack signature extraction. The proposed method competitively modeled the classifiers. The experiments were performed on the DARPA 1999 and HTTP datasets, and the detection rate was higher than 0.8.

Deep Learning.
Deep learning is a subfield of machine learning that deals with algorithms, called artificial neural networks, whose structure and function resemble the working of the human brain. It belongs to both the supervised and unsupervised learning categories and deals with multilevel representations and features of hierarchical architectures in classification and pattern recognition. Deep learning is popularly used in applications such as image processing and audio, text, and speech recognition. These techniques aim to learn the best feature representation from large amounts of unstructured data. Different deep learning-based methods are used to overcome the problems of modeling an efficient attack detection system. Various deep learning methods are available, such as the recurrent neural network (RNN) [79], Boltzmann machine (BM), restricted Boltzmann machine (RBM), deep Boltzmann machine (DBM), deep neural network (DNN) [80], autoencoder, deep/stacked autoencoder, stacked denoising autoencoder, distributed representation, and convolutional neural network (CNN). Self-taught learning (STL) [81] is also a DL approach, and it consists of two stages for classification. In the first stage, called unsupervised feature learning (UFL), a good feature representation is learned from a large amount of unlabeled data. In the second stage, the learned representation is applied to labeled data and used for classification. DL architectures are neural networks with multiple layers and various linear or nonlinear functions. A deep learning algorithm is considered successful when it works at high speed, learns quickly, and provides an efficient solution.
Niyaz et al. [82] proposed a DL-based multivector DDoS detection system for an SDN environment. SDN provides the facility to program network devices to accomplish different tasks. The proposed system is designed as a network application running on the controller of the software-defined network. Feature selection and classification are done by using deep learning. The results showed high accuracy with a low FP rate for attack detection in the proposed system.
Yin et al. [79] proposed an RNN-based intrusion detection system. Performance was evaluated for both binary and multiclass classification, with the number of layers, the number of neurons, and different learning rates taken into account. The model was compared with techniques such as ANN, RF, SVM, and other machine learning methods previously used by researchers. The experimental results have shown that the proposed intrusion detection model is suitable for building complex classification models with high accuracy, and that its performance is superior to the traditional machine learning classification methods in both binary and multiclass classification. The proposed model improves the accuracy of intrusion detection and thus provides a better method for intrusion detection.
Javaid et al. [81] proposed a deep learning-based IDS model that applies self-taught learning (STL), a DL technique, and evaluated it on the NSL-KDD dataset for network intrusion. The performance of the proposed approach is better than that of the traditional approaches of previous work. The comparison is made on accuracy, precision, recall, and F-measure values. Table 3 presents a comprehensive study of the machine learning techniques. Table 4 and Table 5 present the performance comparison among machine learning techniques.

Datasets Used in Cyberattack Detection
When machine learning and optimization algorithms are used for classification and feature selection problems, the dataset is a very important element. Since these techniques work with a learning phase and a testing phase in which they learn from existing data, it is essential to have proper knowledge of the dataset in order to understand how various authors and scientists apply different machine learning and optimization algorithms. In this section, we describe in detail the different types of datasets used for attack detection with machine learning and optimization algorithms. Here, we discuss three broad categories of datasets: public datasets, NetFlow datasets, and packet flow-based datasets.

Public Datasets.
The public datasets used for attack detection systems are discussed in detail in the following sections.

DARPA 1998.
DARPA 1998 was first created by the Cyber Systems and Technology Group of MIT Lincoln Laboratory, under the Defense Advanced Research Projects Agency and the Air Force Research Laboratory, for the assessment of network attack detection systems. DARPA 1998 and 1999 are widely used in various experiments and repeatedly cited in many publications. The DARPA 1998 dataset was prepared by MIT/LL. It attracted the interest of researchers working on different issues of networks, base stations, and network attack detection systems. The evaluation was designed to concentrate on core technology issues and to motivate wide participation in work on security and privacy concerns. The DARPA 1999 dataset contains significantly more attack types than the DARPA 1998 dataset. The 1999 DARPA intrusion detection evaluation had two parts: first an offline evaluation and second a real-time evaluation. In the offline evaluation, the attack detection systems were tested using network traffic and various audit logs collected on a simulation network, and these data were processed in batch mode.

Knowledge Discovery Dataset (KDD).
The most commonly used benchmark for attack detection is the KDD 1999 dataset [66], which was formed for the KDDCUP challenge in 1999. The KDD dataset consists of three major groups of features, namely basic, content, and traffic features, which together give 41 attributes (Table 6). The KDD 1999 dataset has some resemblance to the NetFlow dataset, but it is more complicated and has more detailed features, since it was built for evaluating attacks.
Another form, the NSL-KDD dataset, has 42 attributes (Table 6) and was used in this study. NSL-KDD is an enhanced form of the KDD99 dataset from which repeated instances were removed. The NSL-KDD dataset has several versions; in one of them, only 20 percent of the training data are used, denoted KDD train 20%, with 25192 instances.
The test dataset is denoted KDD test and has 22544 instances. Table 6 describes the KDD dataset attributes with class labels. Of these 42 attributes, 41 can be classified into four different classes as follows: basic (B) features describe individual TCP connections, content (C) features are features within a connection suggested by domain knowledge, traffic (T) features are computed over a two-second time window, and host (H) features are designed to assess attacks that last for more than two seconds.

Packet Flow-Based Dataset.
The protocols most broadly used over the Internet include IP, ICMP, IGMP, TCP, and UDP. Client programs running these protocols can create huge traffic over the network. All incoming and outgoing packets use physical interfaces, such as an Ethernet port, for transmission and reception. At the network layer, protocols such as ICMP, IGMP, TCP, and UDP are not passed directly to the lower layer; instead, they are encapsulated inside the data field of an IP packet, which is then passed to the lower layer. In the data link layer or MAC layer, an Ethernet frame carries up to about 1500 bytes of payload. This payload is the encapsulated IP packet, which contains its own IP header and, in its data section, higher-level protocols such as HTTP, NFS, POP, telnet, and TFTP.
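The nesting described above (Ethernet frame, then IP header, then the data section) can be made concrete with a small parsing sketch that uses only Python's standard `struct` module. The sample frame, its addresses, and the five-byte payload are hypothetical values built by hand for illustration; the header checksum is left at zero and is not verified, and IP options and VLAN tags are not handled.

```python
import struct

def parse_ethernet_ipv4(frame: bytes):
    """Split an Ethernet II frame carrying IPv4 into its nested parts."""
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0800:  # 0x0800 marks an IPv4 payload
        raise ValueError("not an IPv4 frame")
    ip = frame[14:]                       # the Ethernet payload is the IP packet
    header_len = (ip[0] & 0x0F) * 4       # IHL field, in 32-bit words
    total_len = struct.unpack("!H", ip[2:4])[0]
    return {
        "protocol": ip[9],                # e.g. 1 = ICMP, 6 = TCP, 17 = UDP
        "src_ip": ".".join(str(b) for b in ip[12:16]),
        "dst_ip": ".".join(str(b) for b in ip[16:20]),
        "payload": ip[header_len:total_len],  # data section of the IP packet
    }

# Hand-built sample frame (hypothetical addresses, checksum left at 0).
data = b"hello"  # stand-in for the encapsulated higher-level data
ip_header = struct.pack("!BBHHHBBH4s4s",
                        0x45, 0, 20 + len(data), 0, 0, 64, 17, 0,
                        bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
frame = b"\xaa" * 6 + b"\xbb" * 6 + b"\x08\x00" + ip_header + data

parsed = parse_ethernet_ipv4(frame)
print(parsed["src_ip"], parsed["dst_ip"], parsed["protocol"], parsed["payload"])
```

Packet flow-based datasets store traffic at this level of detail, so feature extraction typically starts from fields such as the ones returned here.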

NetFlow Data.
NetFlow data contain details of the router and its features; routers are able to collect IP packet traffic as packets enter and leave the router. NetFlow, generally Cisco's version 5, defines a network flow as a unidirectional sequence of packets that share seven attributes: source and destination IP address, IP protocol, source and destination port, interface, and IP type of service.
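The seven-attribute flow definition can be sketched as a grouping key. In the minimal example below, packets that agree on all seven attributes are counted as one flow; the dictionary field names and the sample packets are hypothetical and do not reproduce the NetFlow version 5 export record layout.

```python
from collections import defaultdict, namedtuple

# The seven attributes that identify a flow, as listed in the text.
FlowKey = namedtuple(
    "FlowKey", "src_ip dst_ip protocol src_port dst_port interface tos")

def aggregate_flows(packets):
    """Group packet records into unidirectional flows keyed by FlowKey."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = FlowKey(*(pkt[field] for field in FlowKey._fields))
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["length"]
    return dict(flows)

def make_packet(src_ip, dst_ip, src_port, dst_port, length):
    # Hypothetical packet record: TCP (protocol 6) on interface 1, ToS 0.
    return {"src_ip": src_ip, "dst_ip": dst_ip, "protocol": 6,
            "src_port": src_port, "dst_port": dst_port,
            "interface": 1, "tos": 0, "length": length}

packets = [
    make_packet("10.0.0.1", "10.0.0.2", 40000, 80, 60),
    make_packet("10.0.0.1", "10.0.0.2", 40000, 80, 1500),
    make_packet("10.0.0.2", "10.0.0.1", 80, 40000, 1500),  # reverse direction
]
flows = aggregate_flows(packets)
for key, stats in flows.items():
    print(key.src_ip, "->", key.dst_ip, stats)
```

Because a flow is unidirectional, the reply traffic in the third packet forms a separate flow from the first two.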

Observations and Evaluations
The overall performance of any attack detection system is assessed by evaluating the performance of the techniques applied to it with the help of certain parameters. In this literature survey, it is observed that the KDDCUP and DARPA benchmarks are very suitable and widely used for evaluating system performance with different metaheuristic and machine learning techniques. In this study, the machine learning and metaheuristic techniques were not applied to build an IDS but were applied to certain cyber data to evaluate their performance. The major parameters calculated by applying different techniques to the cyber data are accuracy, detection rate, false alarms, false positives, false negatives, etc. These parameters show whether a particular technique is suitable for attack detection or not. Based on the study of different research papers, a comparison is shown in terms of these parameters for different metaheuristic and machine learning techniques on various cyber data. Figures 5-7 show a comparison of the detection rate, accuracy, and FAR of various ML and MH techniques applied to different datasets by different researchers.
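The evaluation parameters named above are usually derived from the four confusion-matrix counts. The sketch below uses the commonly accepted definitions, which the surveyed papers may vary slightly; the function name and the example counts are hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute common attack detection metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fn: attacks missed
    fp: normal traffic flagged          tn: normal traffic passed
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "detection_rate": tp / (tp + fn),    # also called recall
        "false_alarm_rate": fp / (fp + tn),  # FAR, the false positive rate
        "precision": tp / (tp + fp),
        "f_measure": 2 * tp / (2 * tp + fp + fn),
    }

# Hypothetical counts: 100 attacks and 900 normal connections.
metrics = detection_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On imbalanced traffic such as this, accuracy alone can look high even when many attacks are missed, which is why the detection rate and FAR are reported alongside it.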
Figures 5-8 compare, on the basis of different criteria, previous research studies that used ML and MH techniques in cyberattack detection. It is found that ML and MH techniques improve performance in cyberattack detection models. In Figure 5, we use previous papers to show the improved detection rate of different models with ML and MH techniques. In Figure 6, we highlight another important parameter, accuracy, which is also seen to improve with ML and MH techniques. In Figure 7, we highlight the performance of the algorithms on the FAR parameter, which is also seen to improve, and finally, in Figure 8, we show the usage of the benchmarks that are popular in cyberattack detection. Hence, in this research, we present the successful usability of ML and MH techniques in the cybersecurity domain, and these techniques should be explored further in this domain. Table 7 presents the comparative study on the basis of performance among various literature articles and techniques.

Challenging Issues and Future Directions
Here, we discuss the different challenges of machine learning as well as optimization algorithms as follows.
Machine learning problems fall into two categories: regression and classification. The basic difference between the two approaches lies in the output value (continuous or discrete). In cybersecurity, most problems are classification problems, which produce a categorical output. In order to design a model that can classify unknown data, it is important to first train the network using representative examples. This phase is usually called training. To achieve this, a robust learning technique should be used. For example, commonly used techniques are SVM [20], decision tree [72,73], naïve Bayes [20], etc. The input data are an important factor for the learning techniques because they need preprocessing to meet the specification of the techniques [87]. Important measures for improving the performance of learning algorithms are as follows:
(i) The dataset used for training should be labeled in the case of output-based learning
(ii) The sample instances used in training must represent all classes of the model
(iii) Identify the algorithm with the better learning function, train the model, and tune the parameters using separate data
(iv) Evaluate the model on different data, such as test data
Feature selection is a most significant step for achieving a better input representation. Numerous methods have been used, such as filter-based, correlation-based, wrapper, and heuristic methods. Here, we have discussed the metaheuristic techniques that are popularly used in cybersecurity, mainly on attack datasets. Metaheuristic methods are either single-solution-based or population-based. The population-based approach is mostly used in cyberattack problems because it provides multiple candidate solutions. The most commonly used algorithms have already been discussed in this paper, such as PSO [21,26,38,88], GA [68], ACO [21,45], ABC [46], etc.
Generally, in cybersecurity, the attack detection problem requires high-level solution methods (metaheuristic methods) that make it possible to escape from local optima and perform a robust search of the solution space. However, these methods are unable to perform well on large-size or multidimensional datasets and sometimes suffer from convergence problems. Cybersecurity deals with high-dimensional data, as attack datasets are too large to handle. Hence, advanced learning techniques come into the picture to deal with high-dimensional data. Advanced learning techniques such as deep learning methods [79,81,82] and the latest artificial intelligence techniques can provide better solutions to the cyberattack detection problem by efficiently covering both classification and feature selection [89-92].

Conclusion and Discussion
Cyberattack detection is one of the challenging areas of research. This study surveys notable research in the field of cyberattack detection and summarizes the related techniques and the work done in recent years.
In this paper, more than eighty recent, relevant, high-quality publications from different conferences and journals were used, highlighting previous studies of different metaheuristic algorithms (MHs) and machine learning (ML) techniques in attack detection systems. The performance of ML and MH techniques is presented as a comparative study in Table 1 and Table 2. From Table 1, we found that these techniques provide better outcomes; hence, there is a need to explore the application of such techniques in other fields of cybersecurity. In Table 2, we discuss the internal properties of the algorithms for their better use in computation.
Initially, the most important step was finding representative papers that provide a detailed explanation of different machine learning (ML) and metaheuristic (MH) methods in the cybersecurity environment, for both signature-based and anomaly-based detection. The analysis of the detailed survey presented in this paper indicates that machine learning and optimization techniques are preferred; regrettably, however, no technique that provides maximum efficiency and performance has been invented yet. It is very difficult to provide one recommendation for each technique, since the choice depends on the type of attack that the system is meant to detect. Several criteria are available for evaluating the performance and effectiveness of any technique, and a technique cannot be judged by taking only some of them into account. The criteria evaluated are accuracy, detection rate, the time complexity of classifying an unknown instance with a trained model, and the understandability of the final solution of each machine learning and metaheuristic technique. One more critical aspect of machine learning and metaheuristic algorithms is the type of dataset used for training and testing in the attack detection process, which should be carefully selected and processed. Hence, the potential of ML and MH techniques for attack detection is inspiring the advances required to realize reliable, efficient, accurate, and robust attack detection systems.