A New Intrusion Detection System Based on KNN Classification Algorithm in Wireless Sensor Network

The Internet of Things has broad applications in the military, commerce, environmental monitoring, and many other fields. However, the open nature of the information media and the poor deployment environment bring great risks to the security of wireless sensor networks and seriously restrict their application. An Internet of Things composed of wireless sensor networks faces security threats mainly from DoS attacks, replay attacks, integrity attacks, false routing information attacks, and flooding attacks. In this paper, we propose a new intrusion detection system based on the $K$-nearest neighbor (KNN) classification algorithm in wireless sensor networks. The system separates abnormal nodes from normal nodes by observing their abnormal behaviors, and we analyse the parameter selection and error rate of the intrusion detection system. The paper elaborates on the design and implementation of the detection system, which achieves efficient, rapid intrusion detection by improving the Ad hoc On-Demand Distance Vector (AODV) routing protocol. Finally, the test results show that the system has high detection accuracy and speed, in accordance with the requirements of wireless sensor network intrusion detection.


Introduction
The Internet of Things refers to the network that combines various sensing devices, such as radio frequency identification (RFID) devices, infrared sensors, global positioning systems, laser scanners, and other devices, with the Internet. This paper focuses on the security of the Internet of Things composed of wireless sensor networks (WSN). With the rapid development of microelectronic technology, computer technology, and wireless communication technology, the Internet of Things has broad application prospects in the military, commerce, environmental monitoring, and many other fields. However, the open nature of the information media and the poor deployment environment bring great risks to the security of wireless sensor networks and seriously restrict their application [1]. The Internet of Things faces security threats mainly from DoS attacks [2], replay attacks, integrity attacks, false routing information attacks, and flooding attacks.
In this paper, we first propose a new intrusion detection system based on the KNN classification algorithm in wireless sensor networks. The system separates abnormal nodes from normal nodes by observing their abnormal behaviors, and we analyse the parameter selection and error rate of the KNN-based intrusion detection system [3]. The paper elaborates on the design and implementation of the detection system. The system makes use of the GAINZ Zigbee nodes designed by Integrated Circuit Co., Ltd., Ningbo Branch. The nodes run the UC Berkeley TinyOS operating system. By improving the Ad hoc On-Demand Distance Vector (AODV) routing protocol, the intrusion detection system achieves efficient, rapid intrusion detection. Finally, the test results show that the system has high detection accuracy and speed, in accordance with the requirements of wireless sensor network intrusion detection.
The contributions of this paper are as follows.
(1) We identify and present a new intrusion detection system based on the KNN classification algorithm in wireless sensor networks. It separates abnormal nodes from normal nodes by observing their abnormal behaviors.
(2) We present the design and implementation of the detection system. By improving the Ad hoc On-Demand Distance Vector (AODV) routing protocol, the intrusion detection system achieves efficient, rapid intrusion detection.
(3) The test results show that the system has high detection accuracy and speed, in accordance with the requirements of wireless sensor network intrusion detection.
The remainder of this paper is structured as follows. Section 2 introduces a new intrusion detection system based on KNN in wireless sensor networks. In Section 3, we describe the design and implementation of the detection system and present simulation experiments. Section 4 briefly discusses related work. Section 5 concludes the paper.

The Intrusion Detection Algorithm Based on KNN
Related studies have proposed many intrusion detection models. According to the method of analysis, these models fall into two categories: feature-based intrusion detection models and anomaly-based intrusion detection models. Some feature-based intrusion detection systems have been proposed, but they have problems in extracting and analyzing the features. Anomaly-based intrusion detection systems have also been proposed. Through statistical analysis of the data, anomaly-based intrusion detection systems can identify anomalous data that deviate seriously from the mean value. Data mining technology can effectively mine the regular patterns of the data; thus, it can be applied to intrusion detection on the Internet of Things.
In this paper, we use data mining technology to design and implement the intrusion detection system. The system has three advantages: (1) the value of $K$ used for mining has little effect on the results; (2) the cutoff value used to identify abnormal nodes is easy to determine; (3) the algorithm is fast and efficient.

Overview of $K$-Nearest Neighbor (KNN) Classification Algorithm.
The $K$-nearest neighbor (KNN) classification algorithm is a theoretically mature data mining algorithm with low complexity. The basic idea is that, in a sample space, if most of a sample's $K$ nearest neighbor samples belong to one category, then the sample belongs to that category as well. A nearest neighbor is a sample whose single- or multidimensional feature vector is closest to that of the given sample, and closeness can be measured by the Euclidean distance between the feature vectors.
In the intrusion detection algorithm, we use an $n$-dimensional vector $(x_1, x_2, \ldots, x_n)$ to represent each node. The dimensions can be, for example, the number of routing messages sent in a period of time, the number of different destination nodes in the routing packets sent, the number of nodes with the same source node in the routing packets received, and so on. In general, nodes of the same type have the same characteristics, so abnormal nodes can be distinguished from normal ones, as shown in Figure 1.
The wireless sensor network intrusion detection algorithm based on the KNN classification algorithm (hereinafter referred to as "KNN") requires two parameters: the $K$ value and the cutoff value. The $K$ value is the number of nearest neighbor nodes considered. The cutoff value is the threshold used to judge whether a node is abnormal. To describe the algorithm, we use the following definitions: (1) the feature vector describing node $p$ is $(x_1, x_2, \ldots, x_n)$, with $n$ components in total; (2) the set of all nodes in the network (including abnormal nodes and normal nodes) is NS; (3) the Euclidean distance between two different nodes $p$ and $q$ is eudis($p$, $q$); (4) the $K$-distance function of node $p$ is the sum of the Euclidean distances from $p$ to its $K$ nearest nodes, divided by $K$.
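The definitions above can be illustrated with a minimal Python sketch. The name `eudis` follows the text; `k_distance` and `detect_abnormal` are hypothetical names, the sample feature values are invented for illustration, and flagging a node when its $K$-distance is strictly greater than the cutoff value is an assumption about how the threshold is applied.

```python
import math


def eudis(p, q):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_distance(node_id, features, k):
    """Mean Euclidean distance from one node to its k nearest other nodes."""
    distances = sorted(
        eudis(features[node_id], vec)
        for other, vec in features.items()
        if other != node_id
    )
    if not 1 <= k <= len(distances):
        raise ValueError("k must be between 1 and the number of other nodes")
    return sum(distances[:k]) / k


def detect_abnormal(features, k, cutoff):
    """Return the ids of nodes whose K-distance exceeds the cutoff (assumed rule)."""
    return sorted(n for n in features if k_distance(n, features, k) > cutoff)


# Hypothetical one-dimensional features (e.g. routing messages per second).
features = {"a": (2.0,), "b": (3.0,), "c": (2.5,), "d": (3.5,), "x": (50.0,)}
print(detect_abnormal(features, k=2, cutoff=10.0))  # prints ['x']
```

In this toy data set the four similar nodes have $K$-distances below 1, while node `x` has a $K$-distance of 46.75, so any cutoff between those values isolates it.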

The Application of KNN Algorithm in terms of Flooding Attack.
In this section, we describe the application of the KNN algorithm to flooding attack detection. We first describe the process of a flooding attack. A flooding attack [2] can result in denial of service when used against on-demand routing protocols for mobile ad hoc networks, such as AODV [4] and DSR [5]. The intruder broadcasts masses of useless route request packets or sends many useless DATA packets to exhaust the communication bandwidth and node resources, so that valid communication cannot be maintained.
In KNN, the choice of the $K$ value is a major factor affecting the detection effectiveness and cost, while the cutoff value directly affects the detection error rate. The KNN detection algorithm exploits the fact that, in a flooding attack, abnormal nodes send RREQ messages more frequently than normal nodes. By comparing the frequency with which each node in the network sends RREQ messages, we can find the abnormal nodes. The feature vector describing a node is therefore its RREQ message frequency.
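As a rough illustration of this feature, the sketch below derives a per-node RREQ frequency from a list of observed messages. The log format of `(timestamp, source_id)` pairs, the function name `rreq_frequency`, and the sample values are assumptions made for the example; the paper does not specify how the counts are collected.

```python
from collections import Counter


def rreq_frequency(rreq_log, window_seconds):
    """Return RREQ messages per second for each source node.

    rreq_log is an iterable of (timestamp_seconds, source_id) pairs
    observed during a window of the given length (hypothetical format).
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    counts = Counter(source for _, source in rreq_log)
    return {node: count / window_seconds for node, count in counts.items()}


# Invented observations over a 2-second window.
log = [(0.1, "n1"), (0.5, "n2"), (0.6, "n2"), (1.2, "n2"), (1.9, "n1")]
print(rreq_frequency(log, window_seconds=2))
```

Each resulting frequency can serve as the one-dimensional feature vector of its node.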
Here we discuss how to choose an appropriate $K$ value and cutoff value. We make the following assumptions and definitions for the KNN mining model: (1) the number of normal nodes in the network is $n_1$ and the number of abnormal nodes is $n_2$, with $n_1 > n_2$. The number of all nodes in the network (including abnormal nodes and normal nodes) is card(NS) $= n_1 + n_2$. Detection nodes in the network know the value of $n_1$ and the general scope or the upper limit of $n_2$. We set the upper limit of $n_2$ to $N$. (2) An appropriate $K$ value is one that makes the $K$-distance function of abnormal nodes as large as possible, while making the $K$-distance function of normal nodes as small as possible.
The KNN detection algorithm identifies abnormal nodes by comparing the $K$-distance function of each node with the cutoff value. The $K$ value affects the result of the $K$-distance function. We discuss how the $K$ value affects the $K$-distance function of abnormal nodes and of normal nodes and then derive a reasonable range for the $K$ value.
(1) Analysis on How $K$ Value Affects the $K$-Distance Function of Abnormal Nodes. When $K < n_2$, as shown in Figure 2, the abnormal node number $n_2$ is 3 (including one blue node and two red nodes), and the $K$ value is set to 2. We need to calculate the $K$-distance function of the blue node. The $K$ nearest neighbor nodes of the blue node are the two red nodes indicated by the arrowheads. As the distance between the blue node and the two red nodes is small, the $K$-distance function of the blue node is small. An appropriate $K$ value is one that makes the $K$-distance function of abnormal nodes as large as possible; thus, this $K$ value is not appropriate.
When $K \geq n_2$, as shown in Figure 3, the abnormal node number $n_2$ is 3 (including one blue node and two red nodes), and the $K$ value is set to 3. We need to calculate the $K$-distance function of the blue node. The $K$ nearest neighbor nodes of the blue node include the two abnormal red nodes and one normal red node. As the distance between the blue node and the normal red node is far larger than that between the blue node and the two abnormal red nodes, the $K$-distance function of the blue node is larger. Thus, this $K$ value is appropriate.
Therefore, when we compute and analyse the $K$-distance function of abnormal nodes, the $K$ value should be greater than or equal to $n_2$.
(2) Analysis on How $K$ Value Affects the $K$-Distance Function of Normal Nodes. When $K < n_1$, as shown in Figure 4, the $K$ value is set to 4, the normal node number $n_1$ is larger than the $K$ value, and the blue node is a normal node. We need to calculate the $K$-distance function of the blue node. The $K$ nearest neighbor nodes of the blue node are all normal nodes. As the distance between the blue node and the normal nodes is small, the $K$-distance function of the blue node is small. An appropriate $K$ value is one that makes the $K$-distance function of normal nodes as small as possible; thus, this $K$ value is appropriate.
When $K \geq n_1$, as shown in Figure 5, the $K$ value is larger than the normal node number $n_1$, and the blue node is a normal node. We need to calculate the $K$-distance function of the blue node. The $K$ nearest neighbor nodes of the blue node include all the red normal nodes and a red abnormal node. As the distance between the blue node and the abnormal red node is far larger than that between the blue node and the red normal nodes, the $K$-distance function of the blue node is large. Thus, this $K$ value is not appropriate.
Therefore, when we compute and analyse the $K$-distance function of normal nodes, the $K$ value should be less than $n_1$.
In summary, since the detection nodes usually do not know the specific number of abnormal nodes, we can use $N$ (the upper limit of $n_2$) to determine the range of the $K$ value, and the appropriate range of the $K$ value is $[N, n_1)$.
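The effect of the $K$ value can be checked on a small numerical example. The sketch below is illustrative only: the feature values are invented (five normal nodes and three abnormal nodes with a one-dimensional feature), and the helper names `k_distance`, `k_range`, and `others_of` are hypothetical. In one dimension the Euclidean distance reduces to the absolute difference.

```python
def k_distance(value, others, k):
    """Mean distance from a 1-D feature value to its k nearest values in others."""
    nearest = sorted(abs(value - o) for o in others)[:k]
    return sum(nearest) / k


def k_range(n1, upper_limit):
    """Admissible K values: the integers in [N, n1)."""
    if upper_limit < 1 or upper_limit >= n1:
        raise ValueError("need 1 <= N < n1 for a non-empty range")
    return range(upper_limit, n1)


# Invented feature values: n1 = 5 normal nodes, n2 = 3 abnormal nodes.
normal = [2.0, 2.5, 3.0, 3.5, 4.0]
abnormal = [50.0, 51.0, 52.0]


def others_of(value):
    """All feature values in the network except the given node's own."""
    nodes = normal + abnormal
    nodes.remove(value)
    return nodes


for k in (2, 3, 4, 5):
    abnormal_kdis = k_distance(50.0, others_of(50.0), k)
    normal_kdis = k_distance(3.0, others_of(3.0), k)
    print(k, round(abnormal_kdis, 3), round(normal_kdis, 3))

print(list(k_range(n1=5, upper_limit=3)))  # prints [3, 4]
```

With $K = 2$ (below $n_2$) the abnormal node's $K$-distance is only 1.5, because its two nearest neighbors are also abnormal. With $K = 5$ (equal to $n_1$) the normal node's $K$-distance jumps to 10.0, because an abnormal node enters its neighborhood. For $K = 3$ and $K = 4$ the two kinds of node are well separated, which matches the range $[N, n_1)$.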
The cutoff value directly affects the detection effect. An appropriate cutoff value should make the error rate of the detection algorithm as low as possible. Owing to the characteristics of wireless sensor networks, we are at present unable to give a single, accurate communication model for them. Because the distribution model of the network is unknown, we cannot obtain the best cutoff value through the minimum-error-rate Bayes decision rule. Nevertheless, we can determine the range of an appropriate cutoff value through observation and statistics of the network.
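One simple way to turn such observations into a cutoff range is sketched below. This is an assumed procedure, not the one used in the paper: it supposes that $K$-distances have been recorded in a calibration run for nodes known to be normal and nodes known to be attacking, and that a node is flagged when its $K$-distance is strictly greater than the cutoff. The function name and sample values are hypothetical.

```python
def cutoff_interval(normal_kdis, abnormal_kdis):
    """Return (low, high) so that any cutoff in [low, high) separates the groups.

    Returns None when the observed groups overlap and no single cutoff
    separates them without error.
    """
    low = max(normal_kdis)     # cutoff >= low: no normal node is flagged
    high = min(abnormal_kdis)  # cutoff < high: every abnormal node is flagged
    return (low, high) if low < high else None


# Invented calibration observations.
print(cutoff_interval([0.5, 0.75, 1.0], [16.0, 24.0]))  # prints (1.0, 16.0)
print(cutoff_interval([0.5, 20.0], [16.0, 24.0]))       # prints None
```

When the interval is empty, some error is unavoidable and the cutoff has to trade missed detections against false alarms.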

Evaluation
In this section, we introduce the simulation setup, describe the simulation results, and discuss the reason behind the simulation results.

System Implementation.
To study the performance of the intrusion detection system, we have implemented it. The hardware platform includes GAINZ wireless sensor nodes and terminal equipment equipped with a wired network card. The GAINZ wireless sensor nodes are produced by Ningbo Zhongke Integrated Circuit Co., Ltd., and they are used to acquire network traffic and broadcast the redlist. The terminal equipment is used to control the detection system, analyze network traffic, judge the abnormal nodes, and respond to attacks. The software platform includes a serial communication assistant, the TinyOS operating system, and the AVRStudio integrated development environment. The serial communication assistant is used to exchange control information messages with users.
The intrusion detection system includes the following modules: wireless network interface module, data storage module, analysis and judgment module, and intrusion response module. The wireless network interface module is implemented by GAINZ wireless sensor nodes. The data storage module receives data from the wireless network interface module, computes statistical information, and stores it in the data domain to be read by the analysis and judgment module. The analysis and judgment module reads the test parameters and the data from the data storage module, analyzes them, makes a judgment, and informs the intrusion response module of abnormal nodes. The intrusion response module adds abnormal nodes to the redlist and submits the redlist to the wireless network interface module. A redlist recording the abnormal nodes is broadcast in the network; from then on, normal nodes no longer receive or forward RREQ messages from the abnormal nodes. At the same time, the redlist is forwarded to other nodes to realize the flooding attack response. The system structure and module functions are shown in Figure 6.
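The redlist behavior described above can be modeled in a few lines. The sketch below is a plain Python model for illustration, not the TinyOS implementation running on the nodes; the class name `RedlistFilter`, its methods, and the node identifiers are hypothetical.

```python
class RedlistFilter:
    """Per-node model of the redlist: RREQs from listed sources are dropped."""

    def __init__(self):
        self.redlist = set()

    def update(self, abnormal_nodes):
        """Merge a broadcast redlist into the local copy."""
        self.redlist.update(abnormal_nodes)

    def accept_rreq(self, source_id):
        """Return True if an RREQ from source_id should be received and forwarded."""
        return source_id not in self.redlist


node_filter = RedlistFilter()
print(node_filter.accept_rreq("attacker"))  # prints True
node_filter.update({"attacker"})
print(node_filter.accept_rreq("attacker"))  # prints False
print(node_filter.accept_rreq("sensor-3"))  # prints True
```

Using a set keeps the membership test cheap, which matters because the check runs on every received RREQ.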

System Test Solution.
For feasibility and effectiveness, the test solution of this system is realized on a wireless sensor network. The wireless sensor nodes used in this test solution are GAINZ nodes designed by Ningbo Zhongke Integrated Circuit Co., Ltd., which use Zigbee technology and are compatible with 2.4 GHz wireless sensor nodes. The nodes run the TinyOS operating system, and the routing layer uses the Ad hoc On-Demand Distance Vector (AODV) routing protocol.
The network test model is composed of a detection node, several common sensor nodes, and several attacking sensor nodes. The network test model is shown in Figure 7. The detection node is composed of a control computer and a sensor node and receives messages from all the other sensor nodes. It works as the core module of the system and is responsible for collecting and detecting network data, providing alarm information, and making a response. Common sensor nodes are responsible for establishing the test network and use the AODV protocol to transmit data. Attacking sensor nodes launch the flooding attack, broadcasting a large number of RREQ packets to the network, increasing the network load, and consuming the resources of other nodes. The dotted line represents a one-hop transmission distance between two sensor nodes.

Simulation Results of Flooding Attack.
In the experiment, we choose the flooding attack frequency, the number of flooding attack nodes, and the number of normal nodes as the attack parameters. By adjusting these parameters, we observe the flow of one node on a certain link. First, we study the relationship between the flooding attack frequency and the flow. The first scenario is performed under different flooding attack frequencies with the same number of flooding attack nodes and normal nodes. The number of normal nodes is 3, while the number of flooding attack nodes is 1.
Figure 8 shows the average flow as the flooding attack frequency varies.When the attacker broadcasts 10 RREQ packets every second in a flooding attack, the average flow changes from 3675.4 Byte/s before the attack to 3312.2 Byte/s after the attack.When the attacker broadcasts 50 RREQ packets every second, the average flow changes from 3517.6 Byte/s before the attack to 2776.3 Byte/s after the attack.When the attacker broadcasts 100 RREQ packets every second, the average flow changes from 3360.0 Byte/s before the attack to 2498.5 Byte/s after the attack.
The flow clearly decreases as the flooding attack frequency increases. The more RREQ packets the attacking node broadcasts, the heavier the network load becomes, which reduces the flow of the other nodes in the network.
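The size of the decrease can be made explicit by computing the relative drop in average flow from the figures reported above. The short sketch below uses only those reported values; the function name `relative_drop` is hypothetical.

```python
# Average flow in Byte/s (before attack, after attack), keyed by the
# number of RREQ packets the attacker broadcasts per second.
measurements = {
    10: (3675.4, 3312.2),
    50: (3517.6, 2776.3),
    100: (3360.0, 2498.5),
}


def relative_drop(before, after):
    """Fraction of the original flow lost after the attack starts."""
    return (before - after) / before


for rate, (before, after) in measurements.items():
    print(f"{rate} RREQ/s: {relative_drop(before, after):.1%}")
```

The relative drop is roughly 9.9% at 10 RREQ packets per second, 21.1% at 50, and 25.6% at 100.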
Similar experiments show that the flow obviously decreases as the number of flooding attack nodes increases.
The flow also decreases as the number of normal nodes increases, because more nodes participate in forwarding the RREQ packets the attacker sends. Thus a flooding attack has a serious effect on a self-organizing network with a large number of nodes.

Simulation Results of Attack Prevention.
In this section, we analyze the simulations of the method to prevent the flooding attack, which is presented in Section 2.
In our experiment, the $K$ value is set to 4. We choose the flooding attack frequency, the number of flooding attack nodes, the number of normal nodes, and the cutoff value as the attack parameters. The averaged experimental data are shown in Table 1.
Experiment 1 shows that when the $K$ value is not between the number of attacking nodes and the number of normal nodes, the detection system produces a considerable rate of false alarms, which is consistent with our previous analysis. Experiments 2 and 3 show that the detection effect is obvious and the detection delay is tolerable when we set appropriate $K$ and cutoff values. In Experiment 4, the detection system cannot detect the attacking node because the cutoff value is not appropriate. Experiments 4, 5, and 6 have the same cutoff value and different attack frequencies; we can see that the more noticeable the attack feature is, the better the detection effect and the shorter the detection delay. This is because the more information the detection system obtains, the more easily it detects the attack. So the detection system can work better in larger networks.
Furthermore, we carry out experiments in a large network. The scenario is performed under different cutoff values with the same flooding attack frequency and the same numbers of flooding attack nodes and normal nodes. The number of normal nodes is 20, while the number of flooding attack nodes is 5. The $K$ value is set to 10. To ensure the accuracy of the experimental data, the test is repeated 50 times and the reported data are the average over the 50 tests. Figure 9 shows the detection effect as the cutoff value varies.
We can see that the detection rate of this detection system is basically above 98.5%, while the false alarm rate, 4.63%, is relatively high when the cutoff value is 10. This is consistent with our previous analysis. When the cutoff value is above 20, the average detection rate is 99.0%, and the average false alarm rate is 1.5%. Thus the system detects and prevents the flooding attack efficiently, with a high correct detection rate and a low false alarm rate.
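For reference, the two rates can be computed from one trial as sketched below. The paper does not state its formulas, so the definitions used here are an assumption: detection rate as the fraction of attacking nodes that are flagged, and false alarm rate as the fraction of normal nodes that are flagged. The function name, node identifiers, and trial outcome are invented for illustration and are not experimental data.

```python
def detection_metrics(flagged, attackers, normals):
    """Return (detection_rate, false_alarm_rate) for a single trial."""
    flagged, attackers, normals = set(flagged), set(attackers), set(normals)
    detection_rate = len(flagged & attackers) / len(attackers)
    false_alarm_rate = len(flagged & normals) / len(normals)
    return detection_rate, false_alarm_rate


# Hypothetical trial with 20 normal nodes and 5 attacking nodes.
normals = {f"n{i}" for i in range(20)}
attackers = {f"a{i}" for i in range(5)}
flagged = attackers | {"n7"}  # all attackers plus one normal node flagged
print(detection_metrics(flagged, attackers, normals))  # prints (1.0, 0.05)
```

Averaging these per-trial values over repeated trials gives figures comparable in form to those reported above.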

Related Work
Owing to its ubiquitous architecture and wireless transmission channel, the Internet of Things is vulnerable to many security attacks. The Internet of Things is composed of an application layer, a transport layer, and a perceptual layer; the corresponding security threats are therefore also divided into application layer, transport layer, and perceptual layer security threats. Because of the broadcast nature of the transmission medium, wireless networks are susceptible to security attacks such as denial of service (DoS), wormhole attacks, Hello flood attacks, sinkhole attacks, and Sybil attacks [6]. The structure of the Internet of Things is complex, and so is the environment it faces. The wireless sensor network is an important part of the perceptual layer, so comprehensive study is needed on its password and security technology [7,8], secure routing technology [9,10], secure data fusion technology [11][12][13], secure localization technology [14,15], and privacy protection technology [16,17].
Data mining techniques have been widely applied to intrusion detection in wired networks [18]. Previous studies [19,20] have proposed intrusion detection systems based on $K$-means clustering analysis, but such systems have two major problems: (1) the $K$ value has a serious influence on the mining results and (2) the abnormal node group is difficult to determine.
Yi et al. discussed the flooding attack [2,21], in which the intruder broadcasts masses of Route Request packets or sends many attacking data packets to exhaust the communication bandwidth and node resources. They presented neighbor suppression, a generic defense against the flooding attack in MANETs [22], and analyzed the effect of DoS attacks in wireless networks [23][24][25]. Yi et al. also presented a cross-layer detection scheme [26], an adaptive approach to detecting black and gray hole attacks in ad hoc networks based on a cross-layer design. Other intrusion detection methods they presented include finite state machine based detection [27], artificial immune system based detection [28], and distributed intrusion detection [29]. They also presented intrusion prevention mechanisms [30] for wireless networks, including the mobile firewall [31], multiagent cooperative intrusion response [32], and the green firewall [33].
Choi et al. proposed a flooding algorithm with retransmission node selection (FARNS) [34] for wireless sensor networks. It is an efficient cross-layer flooding technique that addresses the broadcast storm problem produced by simple flooding in wireless sensor networks. FARNS reduces unnecessary energy consumption by controlling retransmission across the whole network: retransmission candidate nodes are selected using the identifier information of neighbor nodes in the MAC layer and the distance to neighboring nodes derived from received signal strength information in the PHY layer. Nigam et al. proposed a profile based protection scheme (PPS) [35] against DDoS (distributed denial of service) attacks. The scheme checks the profile of each node in the network; if a node is found to be flooding unnecessary packets into the network, PPS treats it as the attacker and blocks it. Rughiniş and Gheorghe presented the Storm Control Mechanism [36], which aims at mitigating flooding and denial-of-sleep attacks. The system tracks the frequency of received packets and triggers an alert when it goes beyond a configured limit. The node tries to send the alert to the base station and then shuts down its wireless transceiver for a predefined period of time. Magotra and Kumar proposed a noncryptographic solution for HELLO flood attack detection [37] in wireless sensor networks (WSN). Du et al. presented an effective scheme to defend against DoS attacks on broadcast authentication in sensor networks [38]. They proposed using a sender-specific one-way key chain for broadcast authentication.
Wazid et al. proposed the Topology Based Efficient Service Prediction (TBESP) algorithm [39], which, based on their analysis, helps in choosing the best-suited topology for the network service requirement under black hole attack. They also proposed a novel technique [40] for the detection and prevention of black hole attacks in WSN.

Conclusion
As a new technology, the Internet of Things is being used more and more widely, and many related applications have appeared. As one of these applications, the wireless sensor network is becoming increasingly popular. As one of the most important technologies of the 21st century, the wireless sensor network plays an important role in connecting the logical information world and the existing physical world.
However, the open nature of the information media and the poor deployment environment bring great risks to the security of wireless sensor networks and seriously restrict their application. In this paper, we proposed a new intrusion detection system based on the KNN classification algorithm in wireless sensor networks. The system can detect flooding attacks in wireless sensor networks. We also conducted experiments to investigate the effect of the flooding attack. The simulation results show that a flooding attack can seriously affect the flow, especially in larger networks. After analyzing the flooding attack, we presented a method to detect and prevent it. The simulations show that the system can prevent the flooding attack efficiently.

Figure 1: The schematic diagram of KNN intrusion detection algorithm.

Figure 6: The schematic diagram of KNN intrusion detection algorithm.

Figure 7: The schematic diagram of the test network.

Figure 9: The detection effect with different cutoff values.

Table 1: The experiment data of average value under detection.