Secure localization under different forms of attack has become an essential task in wireless sensor networks. Despite significant research efforts in detecting malicious nodes, the problem of recognizing the type of a localization attack has not yet been well addressed. Motivated by this concern, we propose a novel exchange-based attack classification algorithm, which combines a distributed expectation maximization extractor with the PECPR-MKSVM classifier. First, mixed-distribution features based on probabilistic modeling are extracted using a distributed expectation maximization algorithm. After feature extraction, by introducing theory from support vector machines, an extension of the contractive Peaceman-Rachford splitting method is derived to build the distributed classifier, which diffuses the iterative calculation among neighbor sensors. To verify the efficiency of the distributed recognition scheme, four groups of experiments were carried out under various conditions. The average success rate of the proposed classification algorithm for external attacks reached about 93.9% in some cases. These results demonstrate that the proposed algorithm achieves a much higher recognition rate and remains robust and efficient even in heavily malicious scenarios.

Location information of sensor nodes plays a critical role in numerous applications of wireless sensor networks (WSNs), such as environment monitoring, target tracking, and automatic surveillance. It also helps some fundamental techniques in sensor networks (e.g., geographical routing protocols and topology control) to be aware of where the messages are located. Driven by those demands, earlier research efforts have resulted in many localization schemes, most of which assume that the sensors are deployed in a benign scenario. But when sensor nodes are deployed in malicious environments, they are prone to different forms of threats and risks. A simple malicious attack can disturb accurate position estimation and even make the entire network function improperly [

In recent years, designing secure localization schemes that provide valid location information resistant to external attacks has received much research attention [

Most existing work on WSN localization security has focused either on achieving a high detection ratio under different types of attacks or on developing robust positioning methods. Unfortunately, none of these techniques can explicitly differentiate among those attacks. This may leave the network defense in a passive position and hinder the prevention of repeated attacks. If the network only detects localization attacks without classifying and analyzing their type, two consequences follow. First, it is inconvenient for the network to restore location-related information. Second, it becomes difficult for the network to provide further information services and evidence for security event processing. Only after alert information is collected and analyzed can we determine the dangerous regions where attacks frequently take place and then design a targeted localization scheme for a given threat. Therefore, attack classification in localization is not only the premise and foundation of threat analysis but also a crucial component of network security situation awareness. An attack recognition algorithm should thus be executed as a second line of protection before the location information is used by other applications.

In this work, we propose a localization attack classification method based on the distributed expectation maximization algorithm followed by support vector machines, called PECPR-MKSVM. The classification mechanism consists of two phases: the feature extraction phase and the classification phase. The techniques developed in our solution offer the advantage of classifying various kinds of attacks. More specifically, our approach makes the following contributions.

To extract more effective attack features, an Exponential-Gaussian (EG) mixture distribution is first modeled by investigating the common properties of the initial features based on their probability distribution. The initial features are composed of distance and topology-related measurements.

A distributed version of expectation maximization (EM) algorithm which exchanges information with neighbor sensors is implemented for density estimation and feature extraction, where one term for time dependent information averaging is combined with another term for iterative information propagation.

In order to recognize multiple attacks more accurately and adapt to the distributed characteristics of sensor networks, we design an exchange-based classifier called proximal extension contractive Peaceman-Rachford splitting-multiple kernel support vector machines (PECPR-MKSVM).

To evaluate the effectiveness of our distributed recognition approach, comprehensively designed experiments are conducted on the attack dataset under different conditions. Compared with other similar schemes, the results show that the distributed classification algorithm achieves better recognition performance and stronger robustness, with very competitive runtime.

The remainder of the paper is structured as follows. Some related works on secure localization and recognition algorithms are reviewed in the next section (Section

To investigate schemes for classifying localization attacks in WSNs, a literature survey on secure localization mechanisms is provided first. We then give a succinct summary of research on the two essential components of the proposed method: the EM algorithm for feature extraction and support vector machines for classification.

In prior work on secure localization, one theme is to discover and eliminate suspicious nodes. In [

The other theme is an error-intolerant localization when there exist malicious adversaries and great measurement inaccuracy. Li et al. in [

To the best of our knowledge, the problem of localization attack recognition in sensor networks, which is our focus here, has not been well studied.

Unsupervised feature selection/extraction techniques are generally classified into three categories as wrappers, filters, and integrated-learning approaches. Several integrated-learning feature extraction algorithms like EM have been developed in various fields. In [

SVM, one of the most popular machine learning approaches to classification and regression problems, was first derived from research in statistical learning theory. The introduction of the kernel trick then gave rise to a new group of techniques for nonlinear problems with high-dimensional or small-sample data [

We consider three classes of nodes distributed randomly in the sensing area: sensors, anchors, and malicious nodes. The random network topology is modeled as an Erdős-Rényi (ER) random graph denoted by

The WSN is assumed to be deployed in an adversarial environment. The adversary launches only external attacks to disrupt the localization procedure, which means it carries out malicious behaviors without the right cryptographic keys. Moreover, the number of malicious nodes is small compared with the number of benign nodes in a local area. The attack type of a malicious node falls into one of three categories: wormhole, replay, and interference attack [

Sensor localization under a wormhole attack.

Because there is no single variable to directly characterize the external localization attack, it is necessary to build the original feature set. From the above-mentioned description of external attacks, we find that it might interact directly or indirectly with distance between nodes. The value of distance

It seems reasonable that the values of the original features vary when sensors are under attack. However, we found that, for some types of localization attack, the changes in the original features do not differ significantly, so the attacks cannot be separated by a threshold. Furthermore, this change is amplified under multiple attacks. An effective feature extraction method therefore needs to be explored. We note that the original features above can be described by statistical modeling. If a distribution model is constructed to represent the original features obtained by each sensor, the different attack types can be described more accurately by the extracted model parameters.
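To make the idea of "model parameters as features" concrete, the following minimal sketch fits a two-component mixture of one exponential and one Gaussian density to scalar samples with a plain, centralized EM loop. The mixture form `w*Exp(lam) + (1-w)*N(mu, sigma^2)`, the function name `fit_eg_mixture`, the initialization, and the synthetic data are all assumptions made for illustration; the exact EG model and the distributed update used in the paper are not reproduced here.

```python
import math
import random

def exp_pdf(x, lam):
    """Exponential density with rate lam (zero for negative x)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def gauss_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_eg_mixture(data, n_iter=200):
    """Fit w*Exp(lam) + (1-w)*N(mu, sigma^2) by EM; returns (w, lam, mu, sigma)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    # Crude initial guess (an assumption): both components start from global moments.
    w, lam, mu, sigma = 0.5, 1.0 / mean, mean, math.sqrt(var)
    for _ in range(n_iter):
        # E-step: responsibility of the exponential component for each sample.
        resp = []
        for x in data:
            pe = w * exp_pdf(x, lam)
            pg = (1.0 - w) * gauss_pdf(x, mu, sigma)
            resp.append(pe / (pe + pg))
        # M-step: responsibility-weighted maximum-likelihood updates.
        re = sum(resp)
        rg = n - re
        w = re / n
        lam = re / sum(r * x for r, x in zip(resp, data))
        mu = sum((1.0 - r) * x for r, x in zip(resp, data)) / rg
        var_g = sum((1.0 - r) * (x - mu) ** 2 for r, x in zip(resp, data)) / rg
        sigma = max(math.sqrt(var_g), 1e-6)
    return w, lam, mu, sigma

# Synthetic, hypothetical measurements: 40% exponential, 60% Gaussian.
rng = random.Random(0)
samples = [rng.expovariate(2.0) for _ in range(800)] + \
          [rng.gauss(5.0, 1.0) for _ in range(1200)]
w, lam, mu, sigma = fit_eg_mixture(samples)
print(f"w={w:.3f} lam={lam:.3f} mu={mu:.3f} sigma={sigma:.3f}")
```

The four estimated values play the role of extracted features: two sensors whose raw measurements look similar under a threshold test can still yield different fitted parameters.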

For each single element such as

Moreover, the feature vector

For the topology-related feature, the probability distribution is further investigated to model the irregular deviation along with the normal and attack scenario. Because of limited space, two representative parameters that form the mixture distribution were chosen to analyze the impact of external attacks.

The first parameter analyzed is the node degree, that is, the total number of neighbors connected to a given sensor. Degree distribution is defined by a probability

In this formula,

Figure

Degree distribution of sensors network and the probability density function of its Gaussian distribution approximation for different external attacks scenario. The parameters in Gaussian distribution are estimated as (a)

Panels: unassailed, wormhole attack, interference attack, replay attack.
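As a rough illustration of the Gaussian approximation of the degree distribution, the sketch below draws an Erdős-Rényi graph, collects node degrees, and estimates the Gaussian mean and standard deviation from the sample moments. The node count of 600 matches the simulation size reported later in the paper, but the connection probability `p = 0.05` and the helper names are assumptions chosen only for this example.

```python
import math
import random

def er_graph(n, p, seed=0):
    """Adjacency lists of an Erdos-Renyi G(n, p) random graph."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def gaussian_fit(values):
    """Sample mean and (population) standard deviation of a list of numbers."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return mu, sigma

n, p = 600, 0.05                      # p is a hypothetical value
adj = er_graph(n, p)
degrees = [len(neighbors) for neighbors in adj]
mu, sigma = gaussian_fit(degrees)

# In G(n, p) each degree is Binomial(n - 1, p), which is close to Gaussian for large n.
mu_theory = (n - 1) * p
sigma_theory = math.sqrt((n - 1) * p * (1 - p))
print(f"fitted mu={mu:.2f}, sigma={sigma:.2f}")
print(f"theory mu={mu_theory:.2f}, sigma={sigma_theory:.2f}")
```

Under an attack that adds or removes links, the fitted pair `(mu, sigma)` would shift, which is the kind of parameter change the figure compares across scenarios.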

The second property analyzed is normalized betweenness centrality. The betweenness centrality [

Figure

Normalized betweenness centrality distribution of sensors network and the probability density function of its exponential distribution approximation for different external attacks scenario. The parameter in exponential distribution is estimated as (a)

Panels: unassailed, wormhole attack, interference attack, replay attack.
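The sketch below shows one common way to compute normalized betweenness centrality on an unweighted, undirected graph (Brandes' accumulation scheme) and then to estimate the rate of an exponential approximation as the reciprocal of the sample mean. The normalization by `(n-1)(n-2)`, the random topology, and the function names are assumptions for illustration; the paper's exact definition is not recoverable from the text above.

```python
import random
from collections import deque

def normalized_betweenness(adj):
    """Betweenness of every node in an unweighted undirected graph (needs n >= 3)."""
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        order = []                         # nodes in order of non-decreasing distance
        preds = [[] for _ in range(n)]     # predecessors on shortest paths from s
        sigma = [0] * n                    # number of shortest paths from s
        dist = [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = [0.0] * n
        while order:                       # back-propagate pair dependencies
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted twice, hence (n-1)(n-2) rather than half of it.
    return [b / ((n - 1) * (n - 2)) for b in bc]

def exponential_rate(values):
    """Maximum-likelihood rate of an exponential distribution: 1 / sample mean."""
    return len(values) / sum(values)

# Sanity check on the path 0-1-2: only the middle node lies between the others.
path_bc = normalized_betweenness([[1], [0, 2], [1]])

# Hypothetical random topology; n and p are illustrative values.
rng = random.Random(0)
n, p = 80, 0.08
adj = [[] for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        if rng.random() < p:
            adj[i].append(j)
            adj[j].append(i)
bc = normalized_betweenness(adj)
rate = exponential_rate(bc)
print(path_bc, round(rate, 2))
```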

To demonstrate the quality of the distribution approximation, the mean square error (MSE) between the approximation curve and the observed probability of the data is calculated for all topological features under different attack scenarios, and the result is listed in Table

Mean square error (MSE) of the distribution approximation curve for all topological features under different external attacks.

Feature category | Unassailed | Wormhole | Interference | Replay |
---|---|---|---|---|
Degree | 0.0552 | 0.0376 | 0.0475 | 0.0437 |
Clustering coefficient | 0.0195 | 0.0285 | 0.0124 | 0.0122 |
Normalized betweenness centrality | 0.0153 | 0.0150 | 0.0146 | 0.0143 |
Coreness | 0.0246 | 0.0232 | 0.0280 | 0.0273 |
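One plausible way to obtain such an MSE figure is sketched below: the empirical probability of each observed value is compared with the fitted density at that value, and the squared differences are averaged. This definition, the helper `approximation_mse`, and the synthetic integer "degrees" are assumptions; the paper may bin the data or weight the errors differently, so the numbers are not comparable with the table.

```python
import math
import random
from collections import Counter

def gauss_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def approximation_mse(values):
    """MSE between empirical probabilities and the fitted Gaussian density."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    counts = Counter(values)
    # One squared error per distinct observed value (unit-width bins for integers).
    errors = [(c / n - gauss_pdf(k, mu, sigma)) ** 2 for k, c in counts.items()]
    return sum(errors) / len(errors)

# Hypothetical integer-valued feature, e.g. node degrees.
rng = random.Random(1)
degrees = [round(rng.gauss(30.0, 5.0)) for _ in range(5000)]
mse = approximation_mse(degrees)
print(f"MSE of Gaussian approximation: {mse:.6f}")
```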

For a set of

In order to explore the statistical properties embedded in the mixture density function and to describe the behavior of attack more completely, the EM algorithm can be adopted for calculating unknown model parameters [

The proposed exchange-based distributed EM method calculates and updates the parameters of the classic EM method using the neighbors’ information, based on the idea of the distributed averaging approach in [

Then, based on the probability density in Section

Then the log-likelihood for the features vector satisfies

After initializing

Then the conditional expectation with respect to the actual observed feature

Note that the calculation of current intermediate state

Time cost of distributed EM versus variation of

Consequently, the estimation of parameters can be updated as follows:

When feature extraction using the distributed EM algorithm is finished, five new features are acquired for each node, which are defined as

After these features have been selected and extracted, classification is performed to recognize the external attacks. A classification process with excellent generalization properties and minimal test error can compensate for deficiencies in the feature dimensions. As described in the last section, the EG mixture modeling and the distributed EM feature extraction both belong to the generative model, which can establish more distinct features from the variation of distance and topological parameters by exploiting their probability density. The generative model offers excellent modeling ability and flexibility for nonnormalized data. However, the optimization capability of a generative scheme in the recognition phase is typically weaker than that of its discriminative counterpart, especially when the labelled data is sufficient [

Here, it is noticed that, besides the MK-SVM algorithm, logistic regression (LR) is another prominent and competitive methodology among the discriminative classifiers, which has been used for an extensive range of recognition tasks [

For the multiclass problem, we can cast it as a linear equality-constrained optimization problem that consists of multiple separable objective functions:

Referring to the literature [

Considering a labelled training set

In order to convert the inequality constraints to an equality

Then optimization procedures can be divided into two parts, with

Substitute equations in (

In the following, we consider the

Repeat the above iterations until convergence. Therefore, we name (

While the PECPR-MKSVM algorithm runs effectively under normal conditions, it ignores attack scenarios in which the intermediate variables may be corrupted by data modified by an attacker. To address this inadequacy, a simple two-stage calculation verification policy is added to guard against adversarial and improper output. It requires the following steps to be carried out during the PECPR-MKSVM training. (1)

Based on the above design, the message transmission through neighboring nodes in the distributed recognition method can be summarized as follows and is clarified with Figure

Overview of distributed calculation among sensor nodes in the proposed recognition algorithm.

Panels: iteration visualization of the distributed EM algorithm in the feature extraction phase; iteration visualization of the PECPR-MKSVM classifier in the training phase.
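The neighbor exchange that both phases rely on can be illustrated with plain consensus averaging: each sensor repeatedly nudges its local value toward those of its direct neighbors until all sensors agree on the network-wide average. This is a generic stand-in rather than the paper's update rule; the ring topology, the step size `0.3`, and the function name are assumptions made for the example.

```python
def consensus_average(adj, values, step, n_iter):
    """Synchronous neighbor averaging; step must be below 1 / (max node degree)."""
    x = list(values)
    for _ in range(n_iter):
        # Every node uses only its own value and those of its direct neighbors.
        x = [xi + step * sum(x[j] - xi for j in adj[i]) for i, xi in enumerate(x)]
    return x

# Hypothetical ring of six sensors, each holding one local statistic.
n = 6
adj = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
local_stats = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

result = consensus_average(adj, local_stats, step=0.3, n_iter=200)
global_mean = sum(local_stats) / n
print([round(v, 6) for v in result], global_mean)
```

In a distributed EM setting, the values being averaged would be local sufficient statistics from the E-step, so that every sensor can run the M-step on network-wide quantities without a central node.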

To assess the effectiveness of our mechanism, we present four groups of experiments carried out under different localization attacks. In our simulation, 600 sensors including 48 anchors are randomly distributed over an

To evaluate the effectiveness of combining the distributed feature extraction with the classifier scheme, the recognition performance on two kinds of feature datasets is first compared between the proposed classifier and four similar classifiers: a distributed SVM (MoM-DSVM), a multiple kernel SVM (SimpleMKL), a typical SVM (C-SVM), and a logistic regression (LR) classifier. Table

Comparison of recognition rate in percentage (%) on the attack feature sets with different classifiers.

Columns marked "no FE" give the average recognition rate per category under localization attacks without feature extraction; columns marked "FE" give it with distributed feature extraction.

Classifier | Unassailed (no FE) | Wormhole (no FE) | Interference (no FE) | Replay (no FE) | Unassailed (FE) | Wormhole (FE) | Interference (FE) | Replay (FE) |
---|---|---|---|---|---|---|---|---|
PEPMK-SVM | 80.27 | 73.78 | 79.03 | 78.49 | 91.56 | 91.97 | 90.85 | 93.28 |
MK-SVM | 72.99 | 64.29 | 67.47 | 70.59 | 80.47 | 76.98 | 79.31 | 84.56 |
C-SVM | 59.14 | 56.49 | 38.63 | 61.17 | 62.54 | 66.15 | 49.25 | 68.72 |
MoM-DSVM | 73.42 | 68.96 | 73.85 | 75.21 | 82.39 | 83.76 | 80.02 | 86.54 |
LR | 71.84 | 59.31 | 71.75 | 66.84 | 79.27 | 70.38 | 82.59 | 82.81 |
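To read the table above at a glance, the short script below averages the four per-category rates of each classifier, with and without distributed feature extraction, and reports the gain. The numbers are copied from the table; the unweighted mean over the four categories is an assumption about how to summarize them and is not a figure reported by the authors.

```python
# Per-category recognition rates (%) copied from the table:
# (without feature extraction, with distributed feature extraction)
rates = {
    "PEPMK-SVM": ([80.27, 73.78, 79.03, 78.49], [91.56, 91.97, 90.85, 93.28]),
    "MK-SVM":    ([72.99, 64.29, 67.47, 70.59], [80.47, 76.98, 79.31, 84.56]),
    "C-SVM":     ([59.14, 56.49, 38.63, 61.17], [62.54, 66.15, 49.25, 68.72]),
    "MoM-DSVM":  ([73.42, 68.96, 73.85, 75.21], [82.39, 83.76, 80.02, 86.54]),
    "LR":        ([71.84, 59.31, 71.75, 66.84], [79.27, 70.38, 82.59, 82.81]),
}

summary = {}
for name, (without_fe, with_fe) in rates.items():
    avg_without = sum(without_fe) / len(without_fe)
    avg_with = sum(with_fe) / len(with_fe)
    # Gain in percentage points from adding distributed feature extraction.
    summary[name] = (avg_without, avg_with, avg_with - avg_without)

for name, (avg_without, avg_with, gain) in summary.items():
    print(f"{name:10s} no FE: {avg_without:6.2f}  FE: {avg_with:6.2f}  gain: {gain:5.2f}")
```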

We further explore classification robustness of the PECPR-MKSVM classifier with different kernel function. Figure

Average recognition accuracy comparison of the proposed classifiers with MoM-DSVM and MK-SVM under different kernel function.

Panels: when 20% of sensors are malicious; when 30% of sensors are malicious.

In order to assess the impact of positive scalar

Evolution of test error with different positive scalar

Additionally, to compare the time cost of classification across algorithms, we further perform an experiment in which the number of sensors varies from 200 to 1000 and plot the computational time in Figure

Time cost comparison by using different methods.

This paper presented a distributed classification scheme for external localization attack classification in WSNs. A novel distributed version of the EM feature extractor and of the MK-SVM classifier is also proposed. These schemes let each sensor share the computation of feature extraction and recognition with its neighbor sensors. In the first phase, the algorithm models the distance- and topology-based features as a mixed distribution. The parameter features are then extracted with a distributed EM scheme that fuses time and neighbor information as it evolves over the iterations. Finally, a distributed classifier, which incorporates MK-SVM with an extension of CPRSM, is designed to classify localization attack datasets into multiple classes. The experimental results show that using the distributed EM as feature extractor and PECPR-MKSVM as classifier achieves higher classification accuracy than other similar methods. Moreover, the attack recognition scheme presented in this paper is more robust to a wide range of attacks, with competitive time efficiency.

The authors declare that there is no conflict of interests regarding the publication of this paper.

This work was supported by the National Natural Science Foundation of China (no. 61401360), the Fundamental Research Funds for the Central Universities of China (no. 3102014JCQ01055), and the Natural Science Basis Research Plan in Shaanxi Province of China (no. 2014JQ2-6033).