A Dynamic Programming Model for Internal Attack Detection in Wireless Sensor Networks

Internal attacks are a crucial security problem for wireless sensor networks (WSNs). In this paper, we focus on internal attack detection, which is an important way to locate attacks. We propose a state transition model, based on the continuous-time Markov chain (CTMC), to study the behavior of sensors in a WSN under internal attack. We then construct the internal attack detection model as an epidemiological model. In this model, we treat the detection rate as the rate of transition from the compromised state to the response state. By using the Bellman equation, the utility of the state transitions of a sensor can be written in the standard form of dynamic programming. This reveals a natural way to find the optimal detection rate: maximizing the total utility of the compromised state of the node (the sum of current utility and future utility). In particular, we encapsulate the current state, survivability, availability, and energy consumption of the WSN into an information set. We conduct extensive experiments, and the results show the effectiveness of our solution.


Introduction
A wireless sensor network (WSN) is vulnerable because it is usually deployed in hostile environments [1]. Attack behaviors in WSNs fall mainly into two types: external attacks and internal attacks. Because improved hardware performance makes public-key cryptography feasible, external attacks in WSNs can be prevented effectively with cryptography-based security structures [2][3][4]. Thus, this study focuses on internal attacks, including the detection, revocation, and tolerance of compromised nodes and replicated nodes that have been physically captured. Normally, there are three ways to detect internal attacks: analyzing the attack behavior [5][6][7][8], detecting the compromised nodes [9][10][11][12][13], and verifying replica attacks [14][15][16][17].
In a WSN, a sensor is typically in one of four states: healthy, compromised, responsive, or failed. At any time, a sensor is in exactly one of the four states. Because of internal attacks, the sensor transits among these states during its lifecycle. In this paper, we use the continuous-time Markov chain (CTMC) to model the state transitions of sensors. In addition, we build an internal attack detection model for WSNs based on the classical SIR epidemiological model. The model describes the behavior of the sensors in a WSN under internal attack.
We can then detect internal attacks using these models. In our study, the detection rate is viewed as the rate of transition from the compromised state to the responsive state. In this way, the system responds immediately when a sensor changes to the compromised state, that is, when the node has been attacked. Existing studies on internal attack detection in WSNs focus on more efficient detection methods and higher detection rates [18][19][20]. In practice, however, a higher detection rate is not always better, especially when it is constrained by network characteristics of a WSN such as power and computing capability. In contrast, we are more concerned with the trade-off between the detection rate and the network characteristics.
Therefore, we propose a solution that finds the optimal detection rate rather than choosing the highest one. By using the Bellman equation, the utility of the state transitions of a sensor can be written in the standard form of dynamic programming. In addition, we encapsulate four parameters, namely, current state, survivability, availability, and energy consumption, into an information set. The information set is a good indicator for balancing network characteristics and security. We find the optimal detection rate by maximizing the total utility. Extensive experiments show the effectiveness of our solution: the results indicate that it can improve the survivability of a WSN and can therefore guide WSN design.
The rest of this paper is organized as follows. In Section 2, we review related work and outline the perspectives and approaches in the existing literature. In Section 3, we propose the state transition model of internal attack and the internal attack detection model, based on the CTMC and the epidemiological model, respectively. We then establish a dynamic programming model via the Bellman equation to find the optimal detection rate. In Sections 4 and 5, we present the numerical simulation study of our methods. Finally, we conclude the paper and discuss future work in Section 6.

Related Work
The epidemiological model has been widely used to analyze the spread of malware in wired networks [21][22][23][24][25]. In [26], the impact of the network topology on viral prevalence was studied and a node-based approach was proposed. In [27], epidemic processes in complex networks were studied. In [28], a theoretical approach was proposed to assess the impact of patch forwarding on the prevalence of computer viruses.
In recent years, the epidemiological model has been applied increasingly widely to WSNs [29]. Analyses based on simulation and experiment show that the epidemiological model can effectively describe the dynamic propagation of malware when the number of nodes in the network is large enough. In [30], the attack behavior of malware was studied by combining the epidemiological model with a loss equation. In [31], a reaction-diffusion equation model of malware propagation was proposed based on epidemiological theory.
Normally, a sensor in a WSN is in one of four states: healthy (H), compromised (C), responsive (R), or failed (F). At any time, a sensor is in exactly one of the four states. The state of a sensor changes if it suffers an internal attack. We therefore use the CTMC to model the state transitions of a sensor: although the decisions of the malicious attacker in the attacked WSN are not random, the attack time is randomly distributed. The lifecycle of a sensor can be regarded as a dynamic system, so a stochastic process can be used to establish the corresponding model. In some related papers, the Markov chain [32] is also widely used to simulate the spread of malware in WSNs. The state transition process is as follows. A node in the WSN starts in H, functioning correctly. A healthy sensor becomes a compromised node under an attack; that is, its state changes from H to C. When the compromise is detected, the state changes to R; otherwise, it changes to F or remains C. If a sensor is in R, a response action is carried out. If an acknowledgement is received from the node, it moves to H; otherwise, it is regarded as F. The response actions include software rejuvenation and reconfiguration as countermeasures against attacks. Since a WSN is usually deployed in a hostile environment, sensors may also fail because of environmental influence or power outage.
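The transition process described above can be sketched as a small CTMC simulation. This is a minimal illustration, not the authors' implementation: the rate values, the `RATES` table, and the function name `simulate_sensor` are all hypothetical, and the set of arcs (including a direct failure arc out of the healthy state) is one reading of the description.

```python
import random

# Hypothetical transition rates per day; none of these values come from the paper.
RATES = {
    "H": {"C": 0.40, "F": 0.05},   # attacked, or failed from environment/power
    "C": {"R": 0.75, "F": 0.10},   # detected, or failed while compromised
    "R": {"H": 0.60, "F": 0.10},   # acknowledged recovery, or no acknowledgement
    "F": {},                       # failed sensors are not repaired (absorbing)
}


def simulate_sensor(start="H", horizon=30.0, rng=None):
    """Sample one CTMC trajectory as a list of (time, state) pairs."""
    if rng is None:
        rng = random.Random(0)
    t, state = 0.0, start
    path = [(t, state)]
    while RATES[state]:
        exits = RATES[state]
        # Holding time is exponential with the total exit rate of the state.
        t += rng.expovariate(sum(exits.values()))
        if t >= horizon:
            break
        # The next state is chosen with probability proportional to its rate.
        state = rng.choices(list(exits), weights=list(exits.values()))[0]
        path.append((t, state))
    return path


if __name__ == "__main__":
    for time, state in simulate_sensor(rng=random.Random(1)):
        print(f"t = {time:6.2f}  state = {state}")
```

Each run yields one possible lifecycle of a single sensor; averaging many runs would approximate the state proportions over time under the assumed rates.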

Internal Attack Detection Model.
We explore the impact of the detection rate on sensors under internal attack, and on the associated metrics, by combining a classical epidemiological model with an economic behavioral model based on a forward-looking, representative agent. Detection efforts determine the detection rate, which in turn determines the rate at which nodes move from C to R. This affects the survivability and availability of nodes, and the survivability and availability of a single node influence the entire cluster and network.
There are four types of nodes in the WSN. Given the total number of sensors, let H_t, C_t, R_t, and F_t be the numbers of healthy, compromised, responsive, and failed nodes, respectively. These quantities satisfy the differential equations (1), which formalize the four-state transition process of a sensor in a WSN under internal attack. The detection rate in these equations is the rate at which nodes in C are detected in each interval. The transition rate from C to R is taken to be the detection rate. In other words, response measures are taken as soon as a node is recognized as C. The other state transitions do not depend on the detection rate.
Model 1 above describes the dynamic evolution of a WSN under internal attack within a certain period. The dynamics of the internal attack detection model cannot be analyzed thoroughly over a short period, so we focus on the long-term dynamic evolution of the WSN. Because the power of a WSN is limited and the network is deployed in harsh environments, sensors cannot be repaired once they enter the failed state, so a large number of redundant sensors are normally deployed. After a sensor fails, a redundant node replaces it. We call these events "death" and "birth" and put forward model 2. Assume that the total number of sensors (H_t, C_t, R_t, and F_t, excluding redundant nodes) is constant;  0 is the number of "births", which equals the number of "deaths", namely, the number of redundant nodes that replace the dead ones. To simplify the counting process, let   =   =   = .
Dynamic analysis is carried out on model 2, and both the existence and the stability of the equilibrium point are discussed. According to (2), we find the steady state as follows: The Jacobian matrix of the model is acquired: (1) The Jacobian corresponding to  0 (1, 0, 0) is that and, thus, the eigenvalues of the Jacobian at  0 (1, 0, 0) must have negative real parts, which are equivalent to  1 = − < 0,  2 =   −   −  < 0, and  3 = −  −  < 0.
By using linear analysis, we can find that  * is always stable.
Model 1, which is the core of this article, is the basis of the subsequent dynamic programming model and the simulation tests. The dynamic analysis is carried out only on model 2.
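To make the four-compartment flow of model 1 concrete, the sketch below integrates a simplified version numerically. It is only an illustration under stated assumptions: every flow is taken to be linear in its source compartment, which may differ from the paper's equations (1), and the rate values and the names `RATES`, `derivatives`, and `integrate` are hypothetical. The key `"CR"` plays the role of the detection rate.

```python
# Hypothetical daily rates; "CR" plays the role of the detection rate.
RATES = {"HC": 0.40, "HF": 0.05, "CR": 0.75, "CF": 0.10, "RH": 0.60, "RF": 0.10}


def derivatives(x, r):
    """Right-hand side of an assumed linear four-compartment flow."""
    h, c, resp, f = x
    dh = -(r["HC"] + r["HF"]) * h + r["RH"] * resp
    dc = r["HC"] * h - (r["CR"] + r["CF"]) * c
    dr = r["CR"] * c - (r["RH"] + r["RF"]) * resp
    df = r["HF"] * h + r["CF"] * c + r["RF"] * resp
    return (dh, dc, dr, df)


def integrate(x0, r=RATES, days=9.0, dt=0.01):
    """Forward-Euler integration; returns the list of states, one per step."""
    steps = int(round(days / dt))
    x = tuple(x0)
    trajectory = [x]
    for _ in range(steps):
        dx = derivatives(x, r)
        x = tuple(xi + dt * dxi for xi, dxi in zip(x, dx))
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    # The two initial conditions used in the experiments.
    for x0 in [(0.9, 0.1, 0.0, 0.0), (0.6, 0.4, 0.0, 0.0)]:
        h, c, resp, f = integrate(x0)[-1]
        print(f"start {x0}: H={h:.3f} C={c:.3f} R={resp:.3f} F={f:.3f}")
```

Because the four derivatives sum to zero, the proportions always add up to one, and the failed compartment can only grow, which matches the qualitative behavior described for sensors that cannot be repaired.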

Dynamic Programming.
We next present a dynamic programming paradigm to find the optimal detection rate. The method is based on the observation that the highest detection rate is not always the best choice. Many factors influence the detection rate in a WSN, such as availability, survivability, and energy. Suppose a healthy sensor is under attack. The sensor can still provide service even after it transits to C because of the attack. However, the service breaks off if the sensor moves from C to R, and it resumes only when the sensor is successfully restored to the healthy state. The availability of the WSN therefore declines while the sensor in R is recovering. In this case, the utility of C is greater than that of R, and the compromised node might as well not have been detected. So a higher detection rate does not always mean better utility. Moreover, a higher detection rate means more energy consumption, which conflicts with the efficiency requirements of WSNs. For these reasons, we focus on the optimal rate instead of the highest one. All the factors of concern are abstracted as part of the information set.
We propose a new objective, namely, utility, which measures the quality of the information set. The detection rate is chosen to maximize the expected net present value of utility, and it influences both the current utility and the expected utility in future periods. To model this dynamic maximization, we define the utility within a period and the probabilities of transition across states. We switch to a discrete-time formulation, with time incremented in days and transition probabilities reformulated below on the basis of (1).
Suppose that we have complete statistics about the current value of utility, including negative utilities, and that the information set includes knowledge about survivability, availability, energy consumption, and H_t, C_t, R_t, and F_t.
Let   (t) be the current utility at time t of a sensor in state i (i ∈ {H, C, R, F}). Then, the utility of the sensor in  at time t is formally defined as follows: The utility function is a hybrid indicator that measures the content of the information set mentioned above, which simplifies the model and makes it more general. The utility function is concave and unimodal. The coefficients,  and , in (7) can be adjusted according to the application.
According to (1), the transition probabilities between pairs of states are written as (8). The detection rate is determined by the current utility of compromised nodes at time t and their expected utility at time t + 1. We use the Bellman equation to calculate the optimal detection rate, and the utility equations can be written in the standard form of dynamic programming, (9)-(12). In this equation system, the left-hand side of each equation is the utility of a sensor staying in state i (i = H, C, R, F) at time t. The right-hand side is the sum of the current utility and the expected utility at time t + 1, which is weighted by the transition probabilities between states (see (8)) and multiplied by the discount factor. This second term indicates that the utility at the future time t + 1 is discounted to the present time t.
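As a reading aid, the structure described in this paragraph corresponds to the generic Bellman recursion below. The symbols are assumed notation chosen for illustration, not necessarily the paper's own: V_i(t) is the utility of a sensor in state i at time t, u_i(t) is its current utility, \delta is the discount factor, and p_{ij} is the transition probability from state i to state j.

```latex
V_i(t) = u_i(t) + \delta \sum_{j \in \{H, C, R, F\}} p_{ij}\, V_j(t+1),
\qquad i \in \{H, C, R, F\}
```

Under this reading, the equation for the compromised state is the one that is maximized over the detection rate, since the detection rate governs the transition probability from C to R.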
Since the utilities are written in the standard form of dynamic programming, we can optimize the detection rate dynamically over a planning horizon of fixed length. At t = 0, the detection rate is chosen to solve the problem formalized by (9)-(12). In period t = 1, the system updates its knowledge of the information set and uses (9)-(12) to optimize anew over the next planning horizon. The process continues in this way. For example, if the horizon length is 7, then on February 1 the horizon runs through February 8, on February 2 it extends to February 9, and so on. Equation (13) takes the maximum of (10), from which the optimal detection rate can be obtained. The partial derivative of (13) with respect to the detection rate is formalized as (14). The left-hand side of (14) is the gain in utility at time t from a unit increase in the detection rate. The right-hand side of (14) is the expected benefit from a unit increase in the detection rate at time t, which comes from discounted future utility.
The optimal detection rate is determined by the information set at time t and its effects on the future values of H, C, R, and F. It is reasonable to assume that the system adapts to forecasts on the basis of the current information set.
The optimal detection rate can be obtained from the equation system (9)-(14) by backward induction over the planning period.
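The backward induction can be sketched in a few lines. This is a toy illustration under loudly stated assumptions, not the paper's equations (7)-(14): the transition probabilities, the base utilities, the quadratic energy cost charged in the compromised state, the zero terminal values, and the names `transition_probs`, `current_utility`, `total_utility`, and `backward_induction` are all hypothetical. The detection rate is searched on a grid rather than solved from a first-order condition.

```python
STATES = ("H", "C", "R", "F")


def transition_probs(d):
    """Per-period transition probabilities for detection rate d (all hypothetical)."""
    return {
        "H": {"H": 0.65, "C": 0.30, "F": 0.05},
        "C": {"C": 0.90 - d, "R": d, "F": 0.10},
        "R": {"H": 0.70, "R": 0.20, "F": 0.10},
        "F": {"F": 1.0},
    }


def current_utility(state, d):
    """Hypothetical current utility; detection costs energy only in state C."""
    base = {"H": 1.0, "C": 0.7, "R": 0.6, "F": 0.0}[state]
    cost = 0.5 * d * d if state == "C" else 0.0
    return base - cost


def total_utility(state, d, next_value, delta):
    """Current utility plus discounted expected utility of the next period."""
    probs = transition_probs(d)[state]
    future = sum(p * next_value[j] for j, p in probs.items())
    return current_utility(state, d) + delta * future


def backward_induction(horizon, delta=0.95):
    """Return (policy, value): detection rates for t = 0..horizon-1 and values at t = 0."""
    grid = [i / 100 for i in range(91)]        # candidate rates in [0, 0.90]
    next_value = {s: 0.0 for s in STATES}      # assumed terminal values
    policy = []
    for _ in range(horizon):                   # walk from the last period to t = 0
        best_d = max(grid, key=lambda d: total_utility("C", d, next_value, delta))
        next_value = {s: total_utility(s, best_d, next_value, delta) for s in STATES}
        policy.append(best_d)
    policy.reverse()
    return policy, next_value


if __name__ == "__main__":
    policy, value = backward_induction(horizon=9)
    print("detection rate per period:", policy)
    print("utility at t = 0:", {s: round(v, 3) for s, v in value.items()})
```

In a rolling scheme of the kind described above, one would call `backward_induction` again in each period with the updated information set and apply only the first entry of the returned policy.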

Experiments
In this section, we present the experimental study of our models. We simulate two different WSNs under internal attack and conduct three groups of experiments with them. The first group is designed to find the optimal detection rate by using the dynamic programming paradigm. In the second group, we verify the models. In the third group, we present comparative studies by varying the value of the detection rate.

Experimental Setup. We simulate two different WSNs in the experiments:
(1) In the first WSN, the number of healthy sensors is much larger than that of compromised sensors: H_t = 0.9, C_t = 0.1, R_t = 0, and F_t = 0.
(2) In the second WSN, the number of healthy sensors is comparable to that of compromised sensors: H_t = 0.6, C_t = 0.4, R_t = 0, and F_t = 0.
The parameter settings of the models are summarized in Tables 1 and 2. In particular, the utilities of H, C, R, and F fall in [0, 1]. Note that the parameters can be changed according to the application scenario.

Experimental Results
The Optimal Detection Rate. In the first group of experiments, we find the optimal detection rate. In this experiment, the current utilities of , , and  are initially set to 1, 0, and 0.6, respectively. We evaluate the detection rate for the two WSNs. As Figure 2 shows, there is no significant difference between the detection rates of the two WSNs. The results show that the ratio of healthy to compromised sensors has little influence on the detection rate, whose value gradually converges to 0.75 once the planning horizon exceeds 5. The optimal value is obtained at a planning horizon of 9, where the optimal values for both WSNs fall into [0.74, 0.75].
Verifying the Models. We apply the optimal detection rate of 0.75 in the second group of experiments. Figures 3-6 plot the change in the number of sensors in H, C, R, and F for the two WSNs over nine days. As Figure 3 shows, the number of sensors in H decreases when t is in [0, 1]. After this decline, there is a sudden increase, and the number of healthy sensors gradually converges to a constant value after t = 4; for example, the ratio of healthy sensors is around 0.9. Since there are more healthy sensors, the WSN is robust. In contrast, as shown in Figure 4, the number of sensors in C drops quickly to 0. These results support the effectiveness of our detection mechanism: the optimal detection rate is very effective for the transition of compromised nodes (the detection rate in the model is a transition rate). From Figure 5, we observe that the number of responsive sensors jumps quickly to a peak at t = 1 and then gradually decreases to 0. When t is in [0, 1], the number of nodes in C is greatest, and this is the period in which the most nodes move from C to R, so the number of nodes in R increases quickly and reaches its peak. Figure 6 shows that the number of failed sensors increases monotonically over time. This is because a WSN is usually deployed in a hostile environment and the sensors cannot be repaired once they fail. In Figures 3-6, the two curves correspond to the initial values H_t = 0.6, C_t = 0.4, R_t = 0, F_t = 0 and H_t = 0.9, C_t = 0.1, R_t = 0, F_t = 0. There are large deviations between the dashed and solid lines at the beginning, but the deviation gradually drops to 0 as time increases. This means that each WSN in our experiments converges to a steady state regardless of its initial condition during the observation period. Therefore, we conclude that our model is general and applicable to a wide range of WSNs.
Comparative Studies. In Section 3, we assumed that the optimal detection rate is better than the highest one. To justify this assumption, in this group of experiments we count the number of sensors in each state (H, C, R, and F) while varying the detection rate. The previous simulation yielded the optimal detection rate of 0.75 and showed that our model is valid for both WSNs, so we conduct the comparative experiments on only one WSN, the one with H_t = 0.9, C_t = 0.1, R_t = 0, and F_t = 0. In [33], the author chose five empirical values for the transition rate from C to R, and we select the highest of them, 0.3, as one detection rate. In addition, we select another detection rate, 0.9, for comparison. We plot the results in Figures 7-10, where the blue solid line represents the results for a detection rate of 0.75, the green dashed line the results for 0.3, and the red dashed line the results for 0.9.
As shown in Figure 7, each line drops at the beginning, but the blue solid line rises immediately at t = 1, whereas the other two lines, for detection rates 0.9 and 0.3, do not begin to rise until t = 2. This shows that our model makes the WSN more robust, since it is restored faster. Figure 8 shows the number of compromised sensors. We observe that the higher the detection rate, the faster the line drops. The red dashed line and the blue solid line gradually approach zero after t = 2, which means that the reliability of the WSN improves. We also observe that the blue solid line converges at almost the same speed as the red dashed line. In other words, our model and the detection rate 0.9 perform equally well, and both perform much better than the detection rate 0.3. Figure 9 plots the number of responsive sensors. As the figure shows, the blue solid line lies completely below the red dashed line, so the optimal detection rate is better than the higher one. Although it is outperformed by the detection rate 0.3 at t = 1, it improves quickly after that time. In addition, the blue solid line drops first, which indicates that the recovery process starts earlier than with the other choices. Figure 10 plots the change in the number of failed sensors, where the three lines show a similar trend. Note that the WSN has more failed nodes as the detection rate grows. When the detection rate is 0.75 or 0.3, the numbers of failed nodes are similar.
We have compared our solution with the alternatives in terms of recovery time, recovery rate, final number of failed nodes, and energy consumption. In general, the simulation results show that our solution outperforms the others. This supports our observation that the highest detection rate does not always serve as the best choice.

Conclusion
In this work, we investigated the problem of finding the optimal detection rate of a WSN under internal attack. First, we established a state transition model of sensors based on the CTMC. The model describes the behavior of sensors in a WSN under attack and the transitions between states. We are the first to observe that the detection rate is irrelevant to all state transitions except the transition from C to R. Therefore, we take the detection rate as the transition rate from C to R. Second, we modeled the state transition process of the sensors in a WSN under internal attack by using the epidemic model and gave a formal description of this model. Third, by using the dynamic programming paradigm (the Bellman equation), we can find the optimal detection rate for a WSN under internal attack. In addition, we encapsulated the influencing factors into an information set, which captures the current utility and the utility in future time. In this way, the detection rate can be optimized by maximizing the total utility, that is, the current utility plus the discounted future utility, in C. The experimental studies support the validity of our models.
In the future, we would like to quantify the influencing factors with respect to survivability, availability, and energy consumption in order to improve the accuracy and practicability of the detection rate. Moreover, it would be more meaningful to set the simulation parameters according to a real-world application. In addition, we will introduce an immune state into the model, referring to the SIRS model [34,35], for further study. This should accelerate the design of WSNs and improve their availability and survivability.

Figure 2: The value of the detection rate for several planning horizons.

Figure 3: The proportion of healthy sensors.

Figure 4: The proportion of compromised sensors.

Figure 5: The proportion of responsive sensors.

Figure 6: The proportion of failed sensors.
failed, which are marked with H, C, R, or F, respectively. Each arc in the diagram is associated with a rate, defined for each pair of states i, j ∈ {H, C, R, F}, which indicates the rate of the transition from state i to state j when the node suffers an internal attack.

Table 1: The parameters in models.

Table 2: The parameter values.