An Ensemble-Based Multiclass Classifier for Intrusion Detection Using Internet of Things

Internet of Things (IoT) is the fastest growing technology that has applications in various domains such as healthcare, transportation. It interconnects trillions of smart devices through the Internet. A secure network is the basic necessity of the Internet of Things. Due to the increasing rate of interconnected and remotely accessible smart devices, more and more cybersecurity issues are being witnessed among cyber-physical systems. A perfect intrusion detection system (IDS) can probably identify various cybersecurity issues and their sources. In this article, using various telemetry datasets of different Internet of Things scenarios, we exhibit that external users can access the IoT devices and infer the victim user's activity by sniffing the network traffic. Further, the article presents the performance of various bagging and boosting ensemble decision tree techniques of machine learning in the design of an efficient IDS. Most of the previous IDSs just focused on good accuracy and ignored the execution speed that must be improved to optimize the performance of an ID model. Most of the earlier pieces of research focused on binary classification. This study attempts to evaluate the performance of various ensemble machine learning multiclass classification algorithms by deploying on openly available “TON-IoT” datasets of IoT and Industrial IoT (IIoT) sensors.


Introduction
For the last few decades, Internet of ings (IoT) technology has been continuously integrated with various application domains, especially in design of automation-enabled homes, cities, and industries. A plethora of physical and virtual "things" communicate with each other using the Internet. IoT has become a usual chunk of people's lives and it is expanding rapidly due to its capability of providing superior services. IoT has improved people's everyday lives by automating very common home services such as controlling the temperature of refrigerators, turning on/off light bulbs, operating ACs, and locking/unlocking doors. IoT has reshaped even the modern technologies with the absolute connection of things in various domains, namely, home, industry, and business [1]. However, due to the frequent rising of IoT technology, it is exposed to many technical and security challenges [2]. e Internet of ings is described as a group of physical objects and applications that are embedded with several components including sensors, actuators, software, processors, and many other technologies and services which enable devices and systems to connect and communicate with other devices over the Internet using a communication network. e sensors and actuators embedded with devices collect valuable information from the related environment of the physical world and transmit it over the network. Data gathered from these automated devices flow in the form of signals which might carry suspected network traffic along with the normal network traffic. e traffic signals flowing over the network might be stored on different levels of IoT platform. Data storage might occur on network, cloud, fog, or device itself, and unauthorized users might smartly access the whole system through anomalous traffic signals [3,4]. IoT devices could be easily compromised by malicious users by merging anomalous traffic with normal traffic. An attacker can easily access the user's login and trace his activities to figure out confidential information [5].
Every device operates in a specific pattern. If a user tries to operate any device differently, that results in a change in the normal behavior of that device, and the modified behavior is considered as malicious [6]. In such situations, the current behavior is matched with the historical behavior of the specific device to verify the mode of behavior (whether safe or unsafe). Intrusions could be identified by recognizing changes that happened in the behavior of events [7]. e actual authorized user could be alerted if such kind of changes occurs. In IoT-enabled smart home environment, the attackers can make physical effects on various devices like smart refrigerators, smart doorbells, fire alarms, smart heaters, domestic useable smart healthcare devices, etc. If these devices are controlled by malicious users, they can make changes in the behavior of these devices. By gaining unauthorized home access, the attackers can disrupt the power supply, close and open door locks without the user's permission, and give wrong instructions to a smart refrigerator or a heater, which may cause massive hazards [8]. Attacks on computers are generally limited to data loss, but attacks on IoT systems might result in the loss of data as well as loss of someone's life too.
e malicious access could be prevented by robust security schemes. In this context, there are several major methods that have been gaining the remarkable attention of researchers. Handling of any cybercrime-related problem should be started with the intrusion detection system (IDS). Intrusion detection (ID) is the most effective mechanism to detect the mode of a device's behavior [9]. Illustration of the user's behavioral pattern and secure cyber systems might lead to anomalous behavior detection [10]. Intrusion detection is a process of scanning the incidents which arise in different network systems and examining them to find clues related to incidents [11]. It is an effective technique that can reduce the growing cybersecurity issues through the proactive security system. So far, IDSs proposed by different researchers have achieved remarkable results to predict known and unknown intrusions in wired and wireless IoT networks. e network traffic might be examined to understand the properties of the attacks and the transmission medium [12]. e main contributions of this study are as follows: (i) is study provides a diversified evaluation of network traffic routines in IoT enabled smart environment. (ii) is study also highlights the review of various decision tree ensemble techniques for the classification of network traffic data of IoT systems. (iii) is study adopts various classification metrics to predict malicious network traffic as well as normal traffic with the help of collected data patterns. (iv) e work in this study also focuses on the comparison of various ensemble learning techniques for multiclass classification and threat detection in IoT environment.
For practical implementation, the "TON_IoT" datasets have been accessed from an open-access location [13]. Here, a CSV file of "TON_IoT" has been downloaded that includes heterogeneous data sources collected from telemetry datasets of IoT and IIoT sensors. It has been classified using various multiclass classifiers to predict the labels by training the model with training data samples. e proposed model is trained using the collected datasets and the extracted features. In a smart IoT environment, a particular device generally operates in a unique pattern. In this article, Section 2 presents the motivation and related work to the proposed work. Section 3 presents the methods and techniques used for the proposed work. Bagging and boosting ensemble approaches have been explored thoroughly for multiclass classification for the purpose of ID. Section 4 comes with details of the considered datasets and the procedure followed for dataset selection used in the practical implementation. Section 5 comprises the experimental results and metrics with a brief discussion, computation, and analysis.

Motivation and Related Work
IoT is an emerging technology that is growing day by day. IoT system's users have to face various unexpected situations. ere are numerous challenges in the path of successful IoT infrastructure. Smart devices are the major pillars of IoT infrastructure. e things of daily needs, which are positioned in domestic, industrial, and other application areas, are now interconnected with the Internet and are enabled with very few security measures. erefore, the security and privacy of the IoT environment are really unpredictable. e interconnected IoT systems generate an amount of digital data related to the objects, applications, and their behavior. e generated data needs to be collected, processed, analyzed, and distributed securely and efficiently.
In view of the increasing number of cybercrimes in connected devices, it is required to formulate new generation approaches to identify the classes of intrusions. e present section provides an exploration of some recently developed ID models using machine learning technologies. e IDS emerged in the 1980s for the security of traditional networks against various malicious activities [14]. So far, many security experts and researchers have formulated several IDSs to identify anomalous activities in IoT-enabled smart environments [15]. However, most of the ID models have been developed using machine learning techniques. IDS designed using a machine learning approach provides a promising solution to identify various security issues. Machine learning-based ID approaches are able to recognize malicious patterns in incoming network traffic [16].
Many pieces of research have been conducted so far for threat detection by inspecting the network traffic [17] and classification of events [2]. Authors in [18] presented an ID model to identify unsupervised anomalies and traffic classification using the dataset KDD-Cup99 [19] and real-world network logs for confirmation of the effectiveness of performance. Incremental statistics have been used for feature extraction. Authors in [20] proposed a dataset to incorporate legal and simulated IoT traffic with different types of attacks. Many researchers [20][21][22] have addressed various existing datasets and evaluated the reliability of the BoT-IoT dataset 2 Computational Intelligence and Neuroscience using statistical machine learning and deep learning approaches for forensics. IoTnetworks incorporate the usual network components (laptops, routers, and workstations) with smart IoT devices. Machine-to-machine communication, cloud instances, and popular IoT application providers come up with services in different distributed IoT environments [23]. In [24], the authors analyzed the vulnerable traffic using three machine learning classifiers, namely logistic regression, random forest, and support vector machine (SVM). e authors analyzed the data collected from various IoT devices using different statistical parameters. Data mining, machine learning, and deep learning are the most recent data processing and forecasting approaches which are useful for ID. However, machine learning is the most effective approach due to its better true positive rate (TPR) as compared to other approaches [25]. Rose et al. in [26] explored the prospective of network profiling and monitoring using a dynamic anomaly-based IDS for the investigation of suspected network transactions and potential attacks. Ibrahim et al. [27] compared the performance of the CatBoost classifier with SVM, logistic regression, gradient boosting, AdaBoost [28], random forest, decision tree, K-nearest neighbor (KNN), and many others. Alqahtani et al. [29] proposed a genetic XGBoost-based IDS model using the dimensionality reduction feature selection method to detect botnet attacks on data traffic in IoT, where a publicly available N-BaIoT [30] dataset was used for practical implementation. Tang et al. proposed IDS using light gradient boosting (LGB) and auto-encoder [31].
LGB is one of the most efficient methods of boosting family. e decision tree is one of the leading approaches of machine learning to make analysis and prediction of intrusions [32]. Decision tree algorithms are supervised machine learning mechanisms that make decisions using bias and variance analysis approaches. Ensemble learning decision tree-based classification model provides more accurate and efficient performance compared to a single decision tree-based classification model. Ensemble methods use the concept of integrating weak learners to attain a strong predictive model to gain better performance. e gradient boosting approach is more promising compared to traditional classification approaches of machine learning [33]. Hyper-parameters are required to be finely tuned to improve the accuracy performance of the model.
Ensemble methods have been implemented in several data mining competitions like KDD-Cup, Netflix Prize, and proposals for the ID framework [34]. Ensemble learning is a learning approach that improves the performance of a machine learning model by incorporating many machine learning models. Most of the winning results of various competitions (like Kaggle) have been in favor of ensemble learning-based approaches. Ensemble methods are commonly used for building ID models due to their feature characterization. Ensembles of different features are combined for a final decision [35]. Ensembles are also used to detect some malicious executables that have never been noticed earlier [36]. It was first implemented by Tianqi Chen but later contributed by many researchers [37]. It relates to a wide collection of tools under Distributed Machine Learning Community (DMLC). XGBoost is one of the best promising ensemble methods that come with competitive outputs [38]. XGBoost allows the tuning of regularization parameters to improve the accuracy, efficiency, and feasibility of the model. Authors in [39] focused on building a strong classification model for IDS. In this article, various decision tree ensemble learning techniques have been analyzed for the prediction of anomalous patterns in the network traffic of IoT systems. In this article, the practical will be carried out using the "TON-IoT" datasets [13].

Methods and Techniques Used for Proposed Work
Data analysis is an approach to discover useful information by evaluating raw data. It is performed using data analysis tools and a sequence of processes including data cleaning, data transformation, and data modeling. Data analysis helps to make useful predictions and forecasting from big data. ere are several techniques to perform data analysis such as traditional techniques, soft computing techniques, and forecasting techniques. e selection of data analysis technique depends on the aim of the investigation and type of data (quantitative or qualitative).
Traditional techniques include regression methods, exponential methods, and least-squares reweighting iterative methods. Traditional techniques are generally used for predictive analysis of small-size datasets for solving simple statistical problems. Some major limitations are associated with the traditional techniques, such as overfitting during processing big data, feature engineering problems, less accuracy, and execution speed. Overfitting and underfitting are the major problems exhibited by statistical models [40]. A statistical model is said to have underfitting when it is not able to capture the underlying logic of the data. It destroys the accuracy of the machine where the model is deployed. It usually happens due to a lack of availability of enough data to build an accurate model. is problem can be overcome by using more data and reducing features in feature selection. Underfitting gives high bias and low variance. On the other hand, a model is said to be overfitted when it is trained with a lot of data. Due to the bulkiness of data in the dataset, the model starts learning with noise and inaccurate data which can lead to the building of unrealistic models. It cannot be avoided completely but can be reduced by reducing the size of the network and appropriate data selection methods [41]. Overfitting has high variance and low bias.
Soft computing techniques include neural networks (ANN), fuzzy logic, knowledge-based expert systems, and genetic algorithms. Soft computing is a new multidisciplinary system that encourages the design of new generation artificial intelligence (AI) to provide solutions to real-world issues. It also motivates the integration of computational tools, techniques, and applications in different combinatorial forms. It is a cost-effective and rapid solution to various complex problems for which solutions do not exist [42]. But it is still a developing and growing technique. Some major limitations are associated with these systems such as high Computational Intelligence and Neuroscience execution time, loss of model interoperability, computational overloading, and limited generalization [43].
Forecasting methods have been focused on for the last few decades for various applications to predict a large number of service points. ese can be called as predictive analysis methods. Conventional forecasting and advanced forecasting techniques are commonly used data analysis and forecasting techniques. Conventional methods include simple linear regression, multiple linear regression, and straight-line methods. ese methods provide sensible forecasting prediction, but modern forecasting approaches provide much better accuracy as compared to conventional techniques. Moreover, modern techniques have advantages like flexibility, higher efficiency, and interpretability on big datasets. Modern forecasting methods include various machine learning methods such as gradient boosting methods (GBM) and deep learning techniques such as longshort term memory (LSTM). Artificial neural network (ANN) is also a type of advanced forecasting. ese techniques provide powerful time-series forecasting and predictive analysis even for big data [44]. is study utilizes machine learning-based modern data analysis methods for constructing and designing the proposed model.

Ensemble Learning.
Ensemble approaches have been extensively deployed for application forecasting in various areas due to their simplicity of implementation. Ensemble means to view a group of elements all together instead of using them individually. In the ensemble-based approach, multiple models are created and combined to solve a complex problem [45]. Machine learning algorithms aim to build an unbiased model from a dataset. e constructed model is designated as training or learning, and the model that learns from the data is named as learner or hypothesis. Figure 1 shows an ensemble-based learning model that assembles a set of classifiers to classify new data points by choosing certain weak predictions to combine them into a strong predictor. Instead of a single classifier, the ensemble methods use a combination of multiple classifiers or predictors which are trained to resolve the same problem and aggregated with each other to obtain better results. Ensemble methods may either use homogeneous base models or different types of base models [43]. Ensemble-based machine learning can optimize the performance of a model by aggregating the prediction results obtained from selected weak models [46]. While aggregating the base models, it is required that a base model with high variance and low bias must be aggregated using a variance reducing scheme, and a base model with high bias and low variance must be aggregated using a bias reducing scheme [47].
Most of the traditional learning approaches which generate a single hypothesis face many computational, statistical, and representational problems which could be conquered by ensemble learning to some extent [48]. e resulting prediction of the ensemble is obtained through majority voting [49]. ese techniques generally depend on randomization approaches, which are able to generate manifold solutions to imminent problems. Ensemble assists to upgrade the generalization and robustness of the model. Decision tree-based traditional learning methods face high variance and bias problems. Actually, the bias-variance trade-off is the basic property of a predictive model. Bias occurs due to wrong belief in the algorithm and high bias indicates the underfitting of the model. On the other hand, the variance occurs due to the sensitivity of the algorithm and indicates that the model is too complex, and it leads to overfitting. Hence, there must be a proper balance between bias and variance in an ideal model. An individual decision tree generates a single hypothesis, and an ensemble of decision trees can produce much better results by reducing bias and variance [48]. Bagging and boosting are most widely used ensemble approaches [50] that will be discussed further in this section.

Bagging.
Bagging is also recognized as bootstrap aggregation, and it uses sequential as well as parallel methods to generate samples. Bagging generally uses analogous weak learners and trains them concurrently, followed by combining them using certain averaging methods. In bagging, multiple base learners are hypothesized on a randomly selected set of training instances with replacement, and a base learner is trained on each set [51]. Bagging follows "voting" and "regression averaging" methods for solving the classification problems. Random forest (RF) is an example of bagging that is widely used in design of an IDS [52]. e structure of bagging algorithm is very much similar to the structure of general ensemble learning and has been shown in Figure 1.
(1). Random Forest (RF). Random forest is one of the most successful and well-known ML algorithms that is known for high accuracy and independent fast learning over datasets of distinct nature. In [49], Breiman proposed a value for this parameter which is "[log 2 (#f ) + 1]", where "f" is the set of features. Random forest algorithm is an ensemble approach that makes predictions on the basis of results obtained from a group of decision trees. It performs the resampling of trees using the bootstrap (bagging) mechanism [53]. e trees in a forest are trained using a bootstrap subset created for training. Each node in RF is split using the best predictor that is chosen randomly at the node level. e additive random layer makes it stronger against overfitting. A small de-correlating twist is made to improve the bagged trees. Many decision trees could be built by bagging on bootstrap sets of training data. A random sample of n predictors is selected as a splitting candidate out of a complete set of predictors. e number of trees in the forest becomes larger, so the generalization error of random forest meets a limit. e major advantage of random forest is that parameters are rarely required to be tuned and remarkable results could be obtained through default parameter settings [54]. Here, the parameters that need to be tuned are related to controlling the depth of the decision tree. e growth of the decision tree could be bounded by tuning the "maximum depth" and "number of instances per node." Random Forest (Bagging) Algorithm Step1: Select P random data points (random samples) from the training set of the given dataset.
Step2: Build the decision tree using selected data points for every point.
Step3: Specify m, the number of decision trees to be built.
Step5: Find the predictive value of each decision tree and the data points to the winner of majority voting.

Advantages of Random Forest
(i) RF has the potential to solve both classification and regression problems.
(ii) RF can improve the accuracy of the model. (iii) RF is less liable to the problem of overfitting.

Disadvantages of Random Forest
(i) Computations may become more complex due to the high number of trees. (ii) e algorithm may make modifications by minor data transformation.

Boosting.
Boosting is a technique used for improving the accuracy of a learning model. Boosting adopts the sequential ensemble method [55]. Using the boosting technique, the ensemble learner can boost the weak learner and convert into strong learner [50]. A strong learner is an optimized learner that approaches nearly perfect (moderate) performance. e idea behind boosting is to reduce classification errors and improve the results over many other classification algorithms. A set of learners are trained sequentially and merged for prediction. Each base model depends on the previous base model. Machine learning models designed using boosting algorithms emphasize the premium quality prediction done by a single model. Models designed using boosting methods produce superior results [56]. A specific weak model can be improved using the boosting mechanism. e boosting algorithm attempts to enhance the prediction potential by training a series of weak models. Each next individual model is trained with the input data and the weakness of its previous model and attempts to recover the deficiency of its predecessor. A model developed using the boosting technique is recognized as a generic model instead of the specific model. e idea behind boosting is to design an efficient algorithm to convert relatively weak hypotheses into very strong hypotheses. ese strong learners of boosting algorithms are also faster than the learners of bagging (random forest). Actually, boosting algorithm boosts the performance of classification as well as regression [57]. Figure 2 shows the boosting ensemble-based learning model, where each individual model learns on the weakness of its previous model. Boosting Algorithm Step 1: Train the first base model (say model 1) with input Dataset D and the learning algorithm.
Step 2: Calculate the result in the form of weight.
Step 3: Train the next base model with the weak predicted result of its previous model and repeat step 2.
Step 4: Repeat step 3 until model N.
Step 5: Obtain weight N as the final prediction and generate the final result.
(2) Gradient Boosting (GB): AdaBoost was later generalized as GB. In AdaBoost, the weak learner refers to the decision tree with a single split called decision stumps. GB is one of the most powerful decision tree algorithms of machine learning. Prediction models built using GB give outstanding accuracy and speed in the case of large and complex datasets (a large number of features and/or samples) [57]. Bias and variance are two significant errors that are solved by machine learning-based models. GB ensemble models reduce bias errors very efficiently. Gradient boosting algorithm integrates a number of weak classifiers to make a strong classifier F(x) [29]. In the GB approach, the classification depends on the residuals of the previous iterations. Consider a training dataset D; where D � {X i y i } 1 ..,..N , and the objective of GB is to get an approximation. Gradient boosting constructs an additive approximation of the weighted sum of functions (F * (X)) that has been presented as follows: where ρ k is the weight of the k th function h k (X). e approximation is built iteratively; for which constant approximation, F 0 (X) is obtained initially for F * (X). e functions are the models of an ensemble technique. If the iterative process is not regularized properly, the built model can face the overfitting problem. ere are many regularization parameters that can be considered to control the GB additive process. Gradient boosting can be regularized naturally using the shrinkage process to reduce every step of the gradient descent F * (X). e following equation introduces the shrinkage into GB using regularization parameters ] and k: where parameter ] is the "learning rate" and k represents the number of components. ese regularization parameters can control the degree of fit that affects the result optimality. Increasing or decreasing the value of learning rate influences the outcome. e effect of every feature is calculated sequentially in order to obtain target accuracy [45].
Loss function L(φ) is used to calculate the residuals. e loss function is optimized using gradient descent. Final result Ø(X) is calculated by adding the results of the T sequential classifiers. f k is the decision tree and M is the total number of iterations [58]. e following equation presents the mathematical calculation of final result of GB: where f k ∈ F. Gradient boosting method requires the following three main components: loss function optimization, prediction using weak learner, and an additive model to add a weak learner for minimization of a loss function. is algorithm incorporates many weak learners into a strong learner in a repetitive manner [59].
Advantages of Gradient Boosting (i) Gradient boosting is a greedy algorithm that can quickly overfit the training dataset. (ii) Performance of the algorithm can be improved by reducing the overfitting.
(iii) Using the regularization approach, various parts of the algorithm can be panelized.

Disadvantages of Gradient Boosting
(i) High running rate, power consumption, memory usage, and training time. (ii) Interpretability problem.
(3). Extreme Gradient Boosting (XGB): Extreme gradient boosting or XGBoost is a decision tree-based improved GB algorithm that can improve the accuracy, efficiency, and feasibility of ensemble-based IDS [60]. It can smoothly deal with bias-variance trade-offs. It is recently being dominated by prediction problems including unstructured data (text, images, voice). It can be used to solve a wide range of problems including classification, regression, user-defined prediction, and ranking. It performs parallel computation at the node level that makes it faster and more powerful than GB [61]. Many researchers have proved it as one of the fastest and memory-efficient machine learning algorithms. XGBoost as an anomaly detection system gives superior performance. e following equation shows the mathematical explanation of XGBoost: e main aim of XGB is residual fitting. Residual is the difference between real and predicted values. Here, X is the input data, F(X, w) is the model to be obtained, h k is used for a single tree, w is the tree's parameter, and α n is weight of k number of trees. We can obtain the optimal model by minimizing the loss function F * [38]. XGBoost also follows the randomization technique to improve the performance of training speed and to reduce the overfitting. e randomization technique in XGBoost contains the following four major hyper-parameters: column subsampling of a tree and its node levels; random subsamples for training independent trees; learning rate; and n estimators. XGBoost is a convenient algorithm to construct a robust classification model. Due to various features, it can efficiently deal with many  Computational Intelligence and Neuroscience issues related to data classification and high-level preprocessing [39]. It is able to convert a weak (hypothesis) learner into a strong (hypothesis) learner using the optimization process. By adding every new tree enables the classification model to develop fewer false alarms, accurate data classification, and easy data labeling [62]. Advantages of XGBoost (i) XGB is comparatively faster than other existing boosting algorithms.
(ii) It contains linear as well as tree learning algorithms. (iii) XGB library is mainly used to design faster and highly efficient decision tree models.

(4). Light Gradient Boosting Method (LGBM)
LGBM is a novel boosting model which was proposed by Microsoft in 2017. e outcomes of various machine learning techniques and the results of ensemble-based techniques are tested with various parameters such as accuracy and speed.
LGB is a distributed, quick, and highperformance gradient-based uplifting algorithm that is derived from popular machine learning algorithms [63]. Samples with small gradients are well trained (sometimes generate a small error in training) and those with large gradients are undertrained. is algorithm expands leaf-wise instead of node-wise and the maximum delta value is chosen for leaf-wise augmentation. It can be used for solving many machine learning problems like regression, classification, and prediction [58,64]. e process of bucketing continuous features into discrete bins increases the training speed. is factor also improves the efficiency. It follows the leaf-wise split method instead of level-wise, which results in much more complex trees.
is factor plays a role in attaining higher accuracy. However, it can lead to an overfitting problem which can be prevented by tuning the parameter "max_depth".
LGB is composed of decision trees which are constructed using the following procedure. e method to calculate the gain of variation occurs under weak and strong gradients. e training samples are organized in decreasing order as per the absolute value of their big and small gradients (g i ). e first s% samples with bigger gradients are preserved to construct the subset of samples S. e remaining set S C is created by the (1 − s) % of samples with smaller gradients [31]. e subset R with size r * | S C | is constructed randomly. Finally, the samples are divided in accordance with the evaluated V * j (d) (variance gain) on S ∪ R subset. Equation (5) presents the mathematical formula for variance gain. Let the feature set be where d is the point of partitioning the dataset to calculate the best gain invariance; and 1 − s/r is used for normalizing the gradient sum over R. Each feature in x i is utilized to calculate the split of training data covering all trees. Important hyper-parameters of LGB are "learning_rate", "max_depth", and "n_estimators" which could be tuned to obtain the best performance of the model. LGB uses exclusive feature bundling and gradient-based one-side sampling (GOSS) for faster processing [65]. Advantages of LGBM (i) LGB is the fastest among all decision tree-based algorithms.
(ii) It serves the fastest training speed and reduces computational complexity.
(iii) In LGB, continuous values are replaced by discrete bins resulting in less memory consumption.
(iv) It consumes low communication cost.
(v) It performs with good accuracy.
(vi) Less time consumption for data preprocessing and decision-making is the most significant feature of LGB.
Disadvantages of LGBM (i) Sometimes it compromises in accuracy.

(5) CatBoost (CB). CatBoost is a novel open-sourced GB
library that strongly deals with categorical features even during the time of preprocessing [27]. It is used as a new framework for leaf value calculation while choosing the tree schema, which helps to minimize overfitting. It gives good performance in terms of accuracy. Decision trees are suitable for datasets containing numerical features. However, datasets that contain categorical features cannot be predicted by a decision tree. Such features contain discrete sets of values such as name and ID, which are not comparable with each other. CatBoost is feasible for such features that convert categorical data to numerical data before training while preprocessing [66]. e main hyper-parameters of CatBoost are given in Table 1.

Advantages of CatBoost
(i) High training and test accuracy.

Disadvantages of CatBoost
(i) It compromises in speed.
ere are certain key differences between bagging and boosting ensemble algorithms which have been identified by exploration of the existing literature and research work done earlier in this section (see Table 2). e next section identifies the differences between bagging and boosting-based algorithms on the basis of practical analysis.

Dataset Selection and Practical Implementation
Several methods of machine learning have been used so far for anomaly detection considering binary classification with different experimental setups [66][67][68][69][70][71]. In many cases, they have achieved outperforming results. In this section, the study explores and analyzes the performance of bagging and boosting algorithms to identify the best algorithm for ID model considering the multiclass classification of the "TON-IoT dataset" which consists of further datasets related to individual IoT home scenarios. Each dataset has a specific number of features and number of instances (see Table 3). rough experiments, the study predicts the nature of the individual record in a dataset means whether it is normal or anomalous. is study utilizes ensemble-based bagging and boosting techniques that are trained on IoT datasets of certain home scenarios. Each dataset has 7 or 8 multiclass labels which will be further classified using ensemble-based classification techniques [72].

Procedure for Data Computing and Analysis.
e entire computing and data analysis procedure will be implemented using python programming in the jupyter notebook: An interactive computing environment.
(i) Identify and adopt a dataset suitable for ID problems that contain the records of network traffic of IoT environment. (ii) Download the "CSV" file containing the "TON-IoT" dataset that will be utilized as an array.
(iii) Load and prepare data to train and evaluate a model. Data will be prepared using certain preprocessing techniques, namely, data cleaning, data transformation [73], scaling, and feature engineering [74]. (iv) Split the dataset array (features or attributes) into X (input) and Y (output). Specify the attribute indices in the format of the NumPy array. After analyzing the significance of features, select the most promising features to compute the output [75]. e feature selection method removes the noisy data and improves the performance of the classifiers [76].
(v) Split the X and Y data into training and test data.
e training data will be used to prepare and train a model and test data will be used to make predictions. Specify the size of test data. Calculate Y_prediction using scikit learn method "model.predict()". (vi) Train the model using different ensemble-based classification algorithms of machine learning.

Experimental Results and Discussion
In this section, the results have been figured out by implementing the ensemble-based machine learning approach on training and test data of TON-IoT datasets based on IoT sensors. Generally, the test_size is taken as 20% to 35% of the total data and the rest of the data is used to train the model. Each individual ensemble algorithm has its classifier method that is utilized for designing the model. Table 1 presents the details of various classifiers considered in this article. Performance of bagging and boosting algorithms could be optimized by tuning their respective hyper-parameters. e values are assigned to the hyper-parameters of each classifier and tested until they reach to their best performance. Hyper-parameters of XGB could be more finely tuned as compared to other classifiers.

Selection of Ensemble Algorithms.
e algorithm for model designing will be selected on the basis of analysis and results obtained using different parameters. Different bagging and boosting algorithms will be examined and validated on couple of important metrics such as accuracy score, speed, precision, recall, F1-score, and mean accuracy. Table 4 shows the computed results of the train and test accuracy of ensemble bagging (random forest) and ensemble GB (XGB, LGB, and CB) classifiers. e accuracy performance depends on train_test_split parameters such as "test_size" and "random_state". e performance has been evaluated on data extracted from different IoT devices (fridge, garage door, GPS tracker, motion light, thermostat, and weather monitoring system) in a smart home environment. In different cases, the accuracy scores of the examined algorithms slightly vary from each other. Accuracy is a prime metric to compare ML-based models, and a good model must attain high accuracy. However, it is a necessary condition, not sufficient. e algorithm must be evaluated on a number of parameters to validate and prove its efficiency. 8 Computational Intelligence and Neuroscience
Just measuring the accuracy with good results is not sufficient to prove it a most efficient algorithm; still, there are chances that the model will predict false negative values.

TPR or Sensitivity or Recall
TPR or recall or sensitivity determines that how many relevant instances have been selected. TPR and TNR determine the percentage of total relevant attack vectors and normal events, respectively, which have been correctly classified by the classifier. TPR and FPR are the detection rates, where TPR is the actual positive rate and the FPR is the actual negative rate. e precision or specificity determines the percentage of relevant outcomes that means how many instances are relevant out of the total selected instances. Equations (9) and (10) refer to the mathematical formula of Precision and F1score, respectively. F1-score refers to the harmonic mean of precision and recall. It might be due to disproportionate class distribution in the training dataset.
Here, the precision, recall, and F1-score of different algorithms have been computed followed by the analysis of the classification reports of different decision tree classifiers for the considered datasets. Table 5 shows that in most cases, boosting-based algorithms result in the highest classification scores. But no specific boosting algorithm produces the highest classification score for all the considered datasets and this analysis could not prove the highest efficiency of any algorithm. Hence, this evaluation could not do enough work to identify the most efficient algorithm for classification and prediction model for ID. Now this study examines the mean accuracy of considered datasets using earlier discussed algorithms and some more GB (HistGradient boosting) algorithms to validate their efficiency (see Table 6). e mean accuracy score evaluation has been performed using "kFoldCrossValidation" method of "sklearn.model_selection" module of sklearn library with random sets of train and test data. It is obtained by calculating the average of k recorded accuracy. It also serves as a performance metric of the model that validates the performance more strongly. is is a method to train and test the model on a different set of samples instead of repeating the same data sample. By selecting the value of k, one can estimate the skill of the model on random partitions of the original data. Here, the "RepeatedStrati-fiedKfold" method of cross-validation has been utilized with parameters "n_splits", n_repeats", and "random_state" to obtain the prediction accuracy. Table 6 shows that using the k-fold cv, the accuracy of XGBoost and CatBoost was found better than other comparative algorithms. e parameters have been tuned to obtain the highest accuracy. Here, the code has been executed with different values assigned to the hyper-parameters. e values of "n_splits � 5" and "n_repeats � 3" have been assigned and the results produced by CatBoost algorithm are highest in accuracy. Table 7 shows the runtime consumed by various ensemble-based classification algorithms examined in this study. Runtime is the execution time that represents the total time consumed by an algorithm from start to stop. If the data size is big, then it will consume time in seconds. It is very much important to select an ideal algorithm that can execute the problem in minimum time duration. For example, a model has been designed for the classification of records for prediction of crime, and if it is taking too much execution time, then it might lead to a delay in crime investigation. It may also cause destruction and theft of pieces of evidence during the delay time. Hence, runtime is also an important parameter to select an algorithm. Here, the LGB classifier has taken the minimum time (in seconds) that is many times less compared to some other algorithms. erefore, an ID model designed using LGB can give the best possible performance in terms of speed. Figures 3(a) to 3(x) present the ROC-AUC curves for the "TON_IoT" dataset in different home scenarios. e curves measure the correctness of the rank order of classification [34]. e results have been represented using the receiver operating characteristic (ROC) curve that shows the graph of the performance of a classification framework at each classification threshold.
ese are important aspects of machine learning for graphical representation of true positive (actual positive) rate and false positive (actual negative) rate. ROC curves are generated by plotting the connection (trade-off) between TPR (recall) and FPR on distinct threshold locations. ROC curves are used to test and compare the adequacy of a model. Figures 3(a) Table 8 presents the ROC_AUC score that is computed using the ROC_AUC_Score () method whose parameters are "y", "predict_proba(X)", and "multi_class � ovr". e pre-dict_proba(X) calculates the probability of the types of output classes on the basis of input samples. e performance of the classification results generally depends on some hyper-parameters of "make_classification" function, namely, "Number    of samples", "selected features", and "random states". Here "ovr" stands for One-Vs-Rest that is used for multiclass classification. It divides the multiclass dataset into numerous binary classification problems. AUC presents the relation between TPR and FPR. e highlighted values in Table 8 show that the boosting classifier, especially LGB, produces highest ROC_AUC score in most of the cases. e ROC_AUC curves in Figures 3(a) to 3(x) show that the random forest and the light gradient boost (LGB) algorithms have the highest TPR and FPR. LGB takes the lowest runtime and high TPR and FPR. Sometimes LGB compromises in the accuracy but not too much. e accuracy results produced by LGB are not very much less than others. Runtime must also be considered for selecting an efficient predictive algorithm in designing a model for an IDS.

Conclusion
In this article, the ensemble-based machine learning algorithms handled very complex and big data of Internet of ings with good potentiality. e ensemble bagging and  LGBM, (d) Fridge using CatBoost, (e) Garage_Door using random forest, (f ) Garage_Door using XGBoost, (g) Garage_Door using LGBoost, (h) Garage_Door using CatBoost, (i) GPS_Tracker using random forest, (j) GPS_Tracker using XGBoost, (k) GPS_Tracker using LGBoost, (l) GPS_Tracker using CatBoost, (m) Motion_Light using random forest, (n) Motion_Light using XGBoost, (o) Motion_Light using LGBoost, (p) Motion_Light using CatBoost, (q) ermostat using random forest, (r) ermostat using XGBoost, (s) ermostat using LGBoost, (t) ermostat using CatBoost, (u) Weather using random forest, (v) Weather using XGBoost, (w) Weather using LGBoost, and (x) Weather using CatBoost. boosting classification approaches have been analyzed to retain a good multiclass classification model that can predict the types of normal and anomalous classes in IoT network traffic. is article also has validated the algorithms for designing a model for intrusion detection by training on several sets of train data and evaluated on separate test data by tuning the values of hyper-parameters. For avoiding the repetitions, each time the training and test sets have been shuffled using "RepeatedStratifiedKfold" methods of crossvalidation. In this article, the comparison of ensemble bagging and boosting algorithms has been carried out in terms of accuracy score, mean accuracy score, speed, precision, recall, F1-score, and auc_score. e light gradient boosting (LGB) algorithm is the most efficient algorithm in terms of speed and ROC_AUC score. Sometimes LGB compromises in terms of accuracy. Still, the accuracy score is very good, and it is not very much less than other classification algorithms. Random forest is also one of the most accurate algorithms but it takes high execution time and computational power and it is too much complex. Speed is an important metric for an intrusion detection model to obtain quick outcomes and LGB was found to be the fastest algorithm with lowest runtime. Hence, light gradient boost is the best algorithm to be selected for designing an efficient intrusion detection system. In future, the proposed work will be helpful to design a very fast, accurate, and lightweight intrusion detection model for IoT-based smart environment. e proposed work will be helpful to perform multiclass classification of datasets containing large number of complex data records. ere is a wide scope of the proposed work not particularly for intrusion detection in IoT but also for classification and prediction of various applications of different environments.
Data Availability e data are available on request from the corresponding author.

Conflicts of Interest
e authors declare that they have no conflicts of interest.