Distributed Classification of Localization Attacks in Sensor Networks Using Exchange-Based Feature Extraction and Classifier



Introduction
The location information of sensor nodes plays a critical role in numerous wireless sensor network (WSN) applications such as environment monitoring, target tracking, and automatic surveillance. It also helps fundamental techniques in sensor networks (e.g., geographical routing protocols and topology control) to be aware of where messages are located. Driven by these demands, earlier research efforts have produced many localization schemes, most of which assume that the sensors are deployed in a benign scenario. But when sensor nodes are deployed in malicious environments, they are prone to various threats and risks. A simple malicious attack can disturb accurate position estimation and even make the entire network function improperly [1]. Existing attacks on localization can generally be divided into internal and external attacks. Internal attackers are usually compromised nodes whose encryption keys have been extracted; they can be prevented by advanced cryptographic techniques. An external attack is launched by one or more malicious nodes that distort information without the system's authorization, which means that traditional security mechanisms such as cryptography are of limited use in defending against this type of attack. In this paper, we mainly analyze the recognition of external attacks on the localization procedure.
In recent years, designing secure localization schemes that provide valid location information resistant to external attacks has received much research attention [2][3][4][5][6][7][8]. Most of these secure localization mechanisms can be broadly divided into two categories: cheating-node detection and robust localization algorithms. The former, such as [3,5,7], verify location-related parameters like distance or time during the positioning process to detect inconsistencies and then eliminate abnormal nodes, while the latter [2,4,6,8] design robust localization schemes that tolerate attacks rather than detecting them.
Most existing work on WSN localization security focuses on either achieving a high detection ratio under different types of attacks or developing robust positioning methods. Unfortunately, none of these techniques can explicitly differentiate the attacks. This can leave network defense in a passive position and hinder the prevention of future repeated attacks. If the network only detects localization attacks without classifying and analyzing their types, two consequences follow. First, it is inconvenient for the network to restore location-related information. Second, the network may struggle to provide additional information services and evidence during security event processing. Only after alert information is collected and analyzed can we determine the dangerous regions where attacks frequently take place and then design localization schemes targeted at specific threats. Therefore, attack classification in localization is not only the premise and foundation of threat analysis, but also a crucial component of network security situation awareness. The attack recognition algorithm should be executed as a second line of protection before the location information is used by other applications.
In this work, we propose a localization attack classification method based on a distributed expectation maximization algorithm followed by support vector machines, called PECPR-MKSVM. The classification mechanism consists of two phases: a feature extraction phase and a classification phase. The techniques developed in our solution offer the advantage of classifying various kinds of attacks. More specifically, our approach makes the following contributions.
(1) To extract more effective attack features, an Exponential-Gaussian (EG) mixture distribution is first modeled by investigating the common properties of the initial features based on their probability distributions. The initial features consist of distance and topology-related measurements. (2) A distributed version of the expectation maximization (EM) algorithm that exchanges information with neighboring sensors is implemented for density estimation and feature extraction, where a term for time-dependent information averaging is combined with a term for iterative information propagation. (3) To recognize multiple attacks more accurately and adapt to the distributed character of sensor networks, we design an exchange-based classifier called proximal extension contractive Peaceman-Rachford splitting multiple kernel support vector machines (PECPR-MKSVM). (4) To verify the effectiveness of our distributed recognition approach, comprehensively designed experiments are conducted by testing the attack dataset under different conditions. The results of comparisons with similar schemes clearly show that the distributed classification algorithm achieves better recognition performance and stronger robustness, with very competitive runtime.
The remainder of the paper is structured as follows. Related work on secure localization and recognition algorithms is reviewed in Section 2. In Section 3, we describe the attack assumptions and model the initial features with a joint Exponential-Gaussian distribution, while Section 4 presents the distributed EM-based feature extraction method built on the distributed averaging approach. In Section 5, by improving the contractive Peaceman-Rachford splitting method, a novel distributed classifier, PECPR-MKSVM, is presented. In Section 6 we verify the performance of the classification algorithm through extensive experiments. Finally, Section 7 concludes the paper.

Related Work
To investigate schemes for classifying localization attacks in WSNs, we first provide a short literature survey on secure localization mechanisms. We then give a succinct summary of research on the two essential components of the proposed method, namely, the EM algorithm for feature extraction and support vector machines for the classifier.

Secure Localization Mechanism.
In prior work on secure localization, one theme is to discover and eliminate suspicious nodes. In [9], the authors proposed a beacon-based secure localization method that uses a minimum mean square estimation approach to filter out suspicious nodes. This work was implemented by observing the inconsistency in location references between malicious beacon nodes and benign ones [10]. Similarly, Du et al. [11] created a general scheme that uses network deployment knowledge to detect localization anomalies when the level of inconsistency between expected and derived positions exceeds a certain threshold. Recently, another detection-based secure localization algorithm was proposed by Han et al., which has two steps. The anchor nodes first identify a node as suspicious if it sends abnormal reference information; a mesh generation method is then used to separate the suspicious nodes [12].
The other theme is error-tolerant localization in the presence of malicious adversaries and large measurement inaccuracies. Li et al. [13] employ an improved LM approach to achieve secure localization in scenarios where the fraction of malicious nodes is less than 50%. Based on candidate location identification, a similar method using the random sample consensus (RANSAC) algorithm was proposed in [14]. This method uses picked subsets of sensors and chooses the value that minimizes the median of the residuals as its solution. Alternatively, using a Taylor-series least squares scheme with different weightings, Yu et al. developed a two-stage secure localization method that applies a beta distribution function to tolerate the presence of malicious beacons [15]. Some other approaches realize secure location estimation by expressing it as a global optimization problem. For instance, a robust statistical method based on improved least median squares was developed to make positioning attack tolerant. In [16], Doherty et al. designed a feasible secure localization methodology using convex optimization based on pairwise angles and connectivity between nodes. Bao et al. extended this work from static to mobile scenarios with the help of a game-based strategy [17].
To the best of our knowledge, the problem of localization attack recognition for sensor networks, which is our focus here, has not been well studied.

EM Algorithm for Feature Extraction.
Unsupervised feature selection/extraction techniques are generally classified into three categories: wrappers, filters, and integrated-learning approaches. Several integrated-learning feature extraction algorithms such as EM have been developed in various fields. In [18], features were extracted from a continuous-valued dataset using a basic integrated-learning strategy. In another feature extraction algorithm, feature saliency is first treated as feature relevance, and the pruning behavior is then derived using EM optimization. Moreover, a double-loop EM algorithm was applied to medical detection tasks such as epileptic seizure detection so that supervised learning could fit well with the mixture-of-experts network structure [19]. In [20], the EM algorithm was applied to image feature extraction to identify the parameters of a generalized Gaussian mixture model; a Kullback divergence-based similarity measure was then presented and analyzed. However, the fact that the class of a texture distribution is influenced by its neighborhood was neglected. To address the issue of information loss, a shuffled frog-leaping method was added to the EM algorithm to enhance the performance of crack image segmentation [21]. According to an evaluation threshold, the neighborhood of each pixel was classified into one of three types. Because the threshold value is selected empirically, it may lead to inaccurate segmentation.

Support Vector Machines (SVM) for Classifier.
SVM, the most popular branch of machine learning theory for classification and regression problems, first arose from research in statistical learning theory. The introduction of the kernel trick then bred a new group of techniques for nonlinear problems with high-dimensional or small-sample data [22,23]. Based on MK-SVM, Yeh et al. proposed a new composite multiple kernel in the form of a linear weighted combination; they combined multiple kernels with SVMs to design a counterfeit banknote detection system [24]. Although all of these centralized learning approaches have performed well in various scenarios, they also increase memory and computational resource consumption, especially in energy-constrained WSNs. Therefore, new algorithms for distributed SVM have recently been presented. In [25], Forero et al. proposed a distributed SVM scheme that combines the alternating direction method of multipliers (ADMM) with consensus-based SVM to reduce training time. This algorithm enhances prediction performance with the help of ADMM optimization. However, the collaborative pattern may face a shortage of local processing resources as the number of kernels in the multiple kernel increases.

Network Assumptions and Statistics-Based Feature Model
We assume that each sensor's communication range is a circle with the same radius, while the malicious nodes' communication range is unlimited. The distance between two sensors is estimated from the received signal strength, whose background noise is Gaussian distributed. Likewise, the distance between a sensor and an anchor is provided by the anchor's measurement. We also assume that each sensor has its own ID and broadcasts it along with the distances to its neighbors, passively collects adjacent sensors' broadcasts, and then builds a list of IDs and positions, also called the sensor's neighborhood observation. When the sensors have received these packets from their neighbors, they transport the information in a multihop fashion to the nearest sensor node with the most energy, which carries out the feature extraction, recognition, and related computations in the WSN. The WSN is assumed to be deployed in an adversarial environment. The adversary launches only external attacks to disrupt the localization procedure; that is, it performs malicious behaviors without the correct cryptographic key. Moreover, the number of malicious nodes in a local area is small compared with the number of benign ones. The attacks of malicious nodes fall into three categories: wormhole, replay, and interference attacks [26]. A wormhole attack eavesdrops on location-reference packets at one position, creates a tunnel, and sends them to sensors that are far apart, thus causing inaccurate location estimates [27]. As illustrated in Figure 1, sensor S1 can only capture the beacon signal of anchor A1 under normal conditions. When a wormhole attack is launched, the malicious node M2 copies the messages from anchor A2 and sensor S2, tunnels them through a bidirectional link, and replays them at the location of M1. Eventually, node S1 will determine its location based on the positions of A1 and A2, and may also consider sensor S2 a neighbor. In an interference attack, the hostile sensor may act as an obstacle between signal sender and receiver to distort the signal measurement or the time of arrival used for ranging. For example, if a signal-strength-based localization process suffers a range enlargement attack, attackers may attenuate the node's transmission power. A replay attack is another common attack type, more likely to appear when the adversary's energy and computing resources are limited. A location message from an anchor is captured by the malicious node, and an incorrect location reference is retransmitted to the receiving sensor later; the position calculation in the sensors can be repeatedly affected by the invalid information. In addition to the above characteristics, an adversarial node mounting a wormhole or replay attack in a practical environment has the same data communication and processing abilities as normal sensors, meaning it can overhear other types of packets and then modify and broadcast them [28,29]. Again, in order to acquire more accurate attack-related information, it is inappropriate to use encryption techniques to eliminate all adversaries in advance. Furthermore, these attacks might be launched on an irregular schedule during the whole classification process. In this study, however, we are mainly concerned with the recognition of localization attacks, and it is therefore required that the proportion and extent of modification of other packet types not exceed those of the distance-related information.
Because no single variable directly characterizes an external localization attack, it is necessary to build an original feature set. From the above description of external attacks, we find that they interact directly or indirectly with the distances between nodes. The distance values, which are closely associated with whether a node suffers from an attack, can therefore be taken as main initial features for recognition. Thus the distance feature vector VD for sensor $i$ is given as $\mathrm{VD}_i = [d_{i1}, \ldots, d_{in}, l_{i1}, \ldots, l_{im}]$, where the $d$ entries are the measured distances to neighboring sensors and the $l$ entries the distances to anchors. While the distance values describe some information about external attacks, this single kind of feature is still insufficient to classify the attacks. To handle this problem, complex network theory is introduced to express feature information more comprehensively. Because a WSN is composed of a large number of sensors, it has a complicated network structure. Furthermore, its topological properties vary with fluctuations in sensor locations and distances, which implies that these properties can reveal the impact of a localization attack from a complex network perspective. A number of indexes have been developed to measure behavior in complex networks, such as degree and clustering coefficient, and they supply a framework that reflects various features of the network. In this work, the indexes considered are degree, clustering coefficient, betweenness centrality (normalized), and coreness. The topological feature vector VT for sensor $i$ is defined as $\mathrm{VT}_i = [D_i, C_i, B_i, K_i]$, where $D_i$, $C_i$, $B_i$, and $K_i$ represent degree, clustering coefficient, normalized betweenness centrality, and coreness, respectively.
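As a small illustration (a sketch under our own assumptions, not code from this paper), the first two entries of $\mathrm{VT}_i$ can be computed directly from a 0/1 adjacency matrix; the normalized betweenness centrality and coreness entries would be computed analogously, for example with a graph library:

```python
import numpy as np

def degree_and_clustering(adj):
    """Degree and local clustering coefficient for every node.
    adj: symmetric 0/1 NumPy adjacency matrix without self-loops."""
    deg = adj.sum(axis=1)
    cc = np.zeros(adj.shape[0])
    for i in range(adj.shape[0]):
        nbrs = np.flatnonzero(adj[i])
        k = len(nbrs)
        if k < 2:
            continue  # clustering undefined for fewer than 2 neighbours; keep 0
        # number of edges among the neighbours of node i
        links = adj[np.ix_(nbrs, nbrs)].sum() / 2
        cc[i] = 2.0 * links / (k * (k - 1))
    return deg, cc

# toy topology: triangle 0-1-2 with a pendant node 3 attached to node 0
adj = np.array([[0, 1, 1, 1],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [1, 0, 0, 0]])
deg, cc = degree_and_clustering(adj)
```

Here node 0 has degree 3 but clustering 1/3, since only one edge exists among its three neighbours.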
It seems plausible that the values of the original features will vary when sensors are under attack. However, we found that the difference in how the original features change between some types of localization attack is not significant and cannot be separated by a threshold. Furthermore, this change is amplified under multiple attacks. Thus an effective feature extraction method needs to be explored. We note that the above original features can be described by statistical modeling: if a distribution model is constructed to represent the original features collected by each sensor, the different attack types can be described more accurately by the extracted model parameters.
For each single element $d_{ij}$ in the feature vector $\mathrm{VD}_i$, the probability can be modeled by a Gaussian distribution with mean $\mu$ and variance $\Sigma$ according to [30], analyzed from the viewpoint of measurement error:

$$p(d) = \frac{1}{\sqrt{2\pi\Sigma}} \exp\left(-\frac{(d-\mu)^2}{2\Sigma}\right).$$

Moreover, part of the vector $\mathrm{VD}_i$ is constituted by shortest path lengths, which possess the properties of a complex network. In [31,32], the shortest path length is investigated as a negative exponentially distributed variable with rate parameter $\lambda$:

$$p(l) = \lambda e^{-\lambda l}, \quad l \ge 0.$$

Thus, to capture more detailed properties, the distance vector $\mathrm{VD}_i$ is modeled by the mixed distribution

$$p(x) = \alpha_1 \frac{1}{\sqrt{2\pi\Sigma}} \exp\left(-\frac{(x-\mu)^2}{2\Sigma}\right) + \alpha_2\, \lambda e^{-\lambda x}, \qquad \alpha_1 + \alpha_2 = 1.$$

For the topology-related features, the probability distributions are further investigated to model the irregular deviations in the normal and attack scenarios. Because of limited space, two representative parameters that form the mixture distribution were chosen to analyze the impact of external attacks.
The first parameter analyzed is the node degree, expressed as the total number of neighbors connected to a given sensor. The degree distribution is defined by a probability $P(k)$, which is the proportion of sensors with exactly $k$ connections. The vertex degree of an Erdös-Rényi random WSN graph follows the Poisson distribution [33]

$$P(k) = \frac{\lambda_k^{\,k}\, e^{-\lambda_k}}{k!},$$

where $\lambda_k$ denotes the expected node degree. Meanwhile, we observe that the mixture distribution of the distance feature is a continuous probability distribution, while the Poisson distribution of the degree feature is discrete, so it is difficult to construct a unified model from these two diverse variables. Moreover, the single parameter $\lambda_k$ of the Poisson distribution may be insufficient to distinguish between multiple attacks. It is known that, in the limit $\lambda_k \to \infty$ by the central limit theorem [34], the Poisson probability density is excellently approximated by a Gaussian distribution when the Poisson mean is high. The mean node degree under normal conditions, calculated by maximum likelihood estimation, is 14.72, which does not fully satisfy this large-mean requirement. Nevertheless, for the sake of reducing computational complexity and realizing a feasible mixture model, the Gaussian probability density function is still applied to approximate the Poisson distribution.
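The quality of this approximation at a mean near the reported value of 14.72 can be checked numerically. The following sketch (our own illustration, not the paper's code) compares the Poisson pmf with the density of a Gaussian having the same mean and variance:

```python
import math

lam = 14.72  # MLE of the mean node degree under normal conditions

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def gauss_pdf(x, mu, var):
    return math.exp(-(x - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# a Poisson(lam) variable has mean lam and variance lam,
# so the matching Gaussian approximation is N(lam, lam)
max_gap = max(abs(poisson_pmf(k, lam) - gauss_pdf(k, lam, lam))
              for k in range(60))
```

Even at this moderate mean, the largest pointwise gap between the two curves stays below about 0.01, which supports the pragmatic choice made above.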
Figure 2 exhibits the degree distribution of the WSN and its variation under different external attacks; curves of the probability density function (pdf) of the Gaussian approximation are also plotted. As seen in Figure 2(a), for the unattacked scene the degree distribution approximately agrees with a Gaussian distribution. However, the distances measured by RSSI in the real world are affected by multipath fading and by data modification by attackers, so the probability of each degree value fluctuates. Figures 2(b)-2(d) compare the degree distributions and the pdfs of their Gaussian approximations under the three types of external attack, respectively. As depicted in Figure 2(b), a probability peak emerges around degree 14, while another weak peak appears around degree 35, and the mean $\mu$ of the Gaussian approximation decreases to 14.16. The reason for this change is that the wormhole tunnel causes some faraway nodes to be incorrectly identified as neighbors. Under an interference attack, the maximum probability of the degree distribution is lower than in Figure 2(a), the proportion of sensors with low degree increases, and under the Gaussian approximation with $\mu = 15.71$ and $\Sigma = 45.32$ the pdf spreads wider than in Figure 2(a). As shown in Figure 2(d), the degree distribution varies similarly to Figure 2(c). We also note that the variance $\Sigma$ of the approximating Gaussian is highest there, which may correspond to the fact that packets relayed by the malicious node create nonexistent connections. These results demonstrate that the parameters of the approximating Gaussian distributions help to differentiate the external attacks. By a similar analysis, the clustering coefficient feature is also fitted by a Gaussian distribution.
The second property analyzed is normalized betweenness centrality. The betweenness centrality [35], denoted $B_i$, examines the potential of a sensor to control the connections with other sensors and evaluates the summed ratio of shortest paths passing through sensor $i$. The betweenness centrality of sensor $i$ is formulated as

$$B_i = \sum_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}},$$

where $\sigma_{st}$ represents the total number of shortest paths from sensor $s$ to sensor $t$ and $\sigma_{st}(i)$ represents the number of those shortest paths that pass through sensor $i$. For convenience, the normalized form of $B_i$ for an undirected network of $n$ sensors is obtained as

$$B'_i = \frac{2\,B_i}{(n-1)(n-2)}.$$

Figure 3 plots the normalized betweenness centrality distribution and the pdf of its exponential approximation for the same scenarios as the node degree. The normalized betweenness centrality distributions in all scenarios peak at the initial part and then decrease monotonically. Previous works found that normalized betweenness centrality tends to obey a power-law distribution [36]; however, the descending speed of the distribution in each scenario is not so sharp here. Furthermore, in order to build the mixture model, the remaining features should also be represented by a continuous function. Based on these considerations, the distribution of normalized betweenness centrality is instead approximated by a negative exponential distribution of the form

$$f(x) = \lambda e^{-\lambda x},$$

where $\lambda$ is a rate parameter. In addition, it can be observed in Figure 3(b) that the proportion of values with high probability increases by a small amount, because the malicious nodes indirectly enhance the communication capability of their neighbors; under the negative exponential approximation with $\lambda = 0.007114$, the pdf decays more rapidly than in Figure 3(a).
Compared to Figure 3(b), the variation of the distribution in the interference attack case (Figure 3(d)) is analogous to the wormhole case but shows a milder decrease, with the rate parameter $\lambda$ of the exponential approximation reaching 0.007020. In Figure 3(c), the distribution varies only slightly, so the change in the approximation parameter $\lambda$ is not significant. In general, the introduction of the new parameter $\lambda$ contributes to distinguishing certain attacks and improves classification performance.
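To make the role of the rate parameter concrete, here is a brief sketch (our own illustration, using synthetic draws rather than the paper's measurements) of how $\lambda$ would be estimated from a sample of betweenness-derived values by maximum likelihood:

```python
import random

random.seed(0)
lam_true = 0.007114  # rate reported above for the wormhole scenario

# synthetic sample standing in for the observed feature values
sample = [random.expovariate(lam_true) for _ in range(20000)]

# the MLE of an exponential rate is the reciprocal of the sample mean
lam_hat = len(sample) / sum(sample)
```

With 20,000 draws the estimate lands within a few percent of the true rate, which is the mechanism by which the fitted $\lambda$ tracks the attack-induced changes in the distribution.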
The last topological feature, coreness, has distribution characteristics similar to normalized betweenness centrality; accordingly, its distribution is also approximated by a negative exponential function.
To demonstrate the capability of the distribution approximations, the mean square error (MSE) of each approximation curve relative to the probabilities of the observed data is calculated for all topological features under the different attack scenarios, and the results are listed in Table 1. In general, the MSE stays at the same order of magnitude even under attack, except for some values under the wormhole attack. Second, the normalized betweenness centrality feature yields the smallest MSE among the topological features, which suggests that the exponential distribution is its best approximation. However, from the viewpoint of fitting accuracy, it cannot be claimed that these approximating distributions fit the feature data precisely, even for normalized betweenness centrality. The reason is that, in the simulated WSN localization, the distance-related messages and other data are influenced by factors such as channel fading, internode interference, and packet modification by malicious sensors. These elements bias the true measurements and further increase the approximation error, which also affects recognition performance. It is therefore indispensable to integrate other approaches, such as a classifier, to strengthen the recognition ability in later processing.
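The MSE figures in Table 1 are computed in the usual way; as a sketch (with toy numbers, not the paper's data):

```python
def mse(observed, fitted):
    """Mean square error between observed bin probabilities and an
    approximating pdf evaluated at the same points."""
    assert len(observed) == len(fitted)
    return sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed)

# toy example: observed probabilities for five bins vs. a fitted curve
observed = [0.02, 0.10, 0.21, 0.10, 0.02]
fitted   = [0.03, 0.09, 0.20, 0.11, 0.03]
err = mse(observed, fitted)
```

A uniform deviation of 0.01 per bin gives an MSE of 1e-4, which sets the scale for reading the magnitudes in Table 1.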
For a set of $M$ features collected by one sensor, with Gaussian and negative exponential distributions, respectively, the probability density function of the mixed features may be divided into two parts. One part, associated with node degree and clustering coefficient, is the Gaussian density $p_1(x \mid \theta_1) = \mathcal{N}(x; \mu, \Sigma)$; the other, associated with the remaining features, is the negative exponential density $p_2(x \mid \theta_2) = \lambda e^{-\lambda x}$ of (7). Combining the two, the probability density of a feature observation $x$ is modeled in the following manner [37]:

$$p(x \mid \Theta) = \alpha_1\, \mathcal{N}(x; \mu, \Sigma) + \alpha_2\, \lambda e^{-\lambda x}, \qquad \alpha_1 + \alpha_2 = 1,$$

where $p(x \mid \Theta)$ represents the mixture probability density of a feature vector and $\Theta = (\alpha_1, \alpha_2, \mu, \Sigma, \lambda)$ is the vector of distribution parameters to be estimated.

Distributed Feature Extractor Design
To explore the statistical properties embedded in the mixture density function and describe attack behavior more completely, the EM algorithm can be adopted to calculate the unknown model parameters [38]. However, in hostile environments, one cannot confirm whether the sensors used for computation and recognition are themselves malicious. If only centralized computation is used, the data may be ruined or viciously modified by the adversary, and the correctness of the extracted features and the classifier's recognition will decrease further. Consequently, the security of the computation should be taken into consideration. The great recent success of research on distributed computing attracts our attention [39]. The primary benefit of this technique is that multiple consistent intermediate variables, updated at each incremental step, can be conveyed to several adjacent nodes; the records kept on those nodes help to detect and isolate an attacker.

Depending on this merit, a distributed feature extraction scheme based on information exchange is presented, to which a verification policy is then added. The proposed exchange-based distributed EM method calculates and updates the parameters of the classic EM method using the neighbors' information, based on the idea of the distributed averaging approach in [40]. We use $\mathcal{N}_i$ to denote the set of nodes that communicate with sensor $i$; that is, there exists an edge $\{i, j\}$ between $i$ and any sensor or anchor $j \in \mathcal{N}_i$. A distributed linear iteration for a value $x$ among sensors $i$ and $j$ can be described as

$$x_i(t+1) = W_{ii}\, x_i(t) + \sum_{j \in \mathcal{N}_i} W_{ij}\, x_j(t), \tag{10}$$

where $t$ is a time variable and $W \in \mathbb{R}^{n \times n}$ is a weight matrix with $W_{ij} \neq 0$ only if $j \in \mathcal{N}_i$ and $i \in \mathcal{N}_j$. In order for (10) to be solved asymptotically by average consensus, $W$ should be symmetric and satisfy the following necessary and sufficient conditions [40]:

$$W\mathbf{1} = \mathbf{1}, \qquad \mathbf{1}^{T} W = \mathbf{1}^{T}, \qquad \rho\!\left(W - \mathbf{1}\mathbf{1}^{T}/n\right) < 1,$$

where $\mathbf{1} \in \mathbb{R}^{n}$ is the vector whose elements all equal 1, $\rho(\cdot)$ is the spectral radius of a matrix, and $\mathbf{1}\mathbf{1}^{T}/n$ is the averaging matrix. Then, based on the probability density in Section 3, the mixture distribution for the $M$ features is

$$p(x_k \mid \Theta) = \alpha_1\, \mathcal{N}(x_k; \mu, \Sigma) + \alpha_2\, \lambda e^{-\lambda x_k},$$

and the log-likelihood of the feature vector satisfies

$$L(\Theta) = \sum_{k=1}^{M} \log p(x_k \mid \Theta).$$

After initializing $\alpha_1^{0}$, $\alpha_2^{0}$, $\mu^{0}$, $\Sigma^{0}$, and $\lambda^{0}$, the distributed EM algorithm can be written as follows.
(A) Expectation Process. Let $h$ be the binary hidden variable indicating which component density generated an observation, given $\Theta$. For one feature $x_k$, we calculate the a posteriori probabilities of $h$ using Bayes' rule and the previous parameter values:

$$h_k^{(m)} = \frac{\alpha_m\, p_m(x_k \mid \theta_m)}{\sum_{m'=1}^{2} \alpha_{m'}\, p_{m'}(x_k \mid \theta_{m'})}, \qquad m = 1, 2.$$

Then the conditional expectation with respect to the actual observed features $x$ is defined by

$$Q(\Theta, \Theta^{(\mathrm{ste}-1)}) = E\left[\log p(x, h \mid \Theta) \,\middle|\, x, \Theta^{(\mathrm{ste}-1)}\right].$$

(B) Maximization Process. In the maximization process, the model parameters are updated by maximizing $Q(\Theta, \Theta^{(\mathrm{ste}-1)})$, computing the intermediate variables along the iterative steps $\mathrm{ste} = 1, \ldots, S$. Note that the calculation of the current intermediate state $\phi_i(\mathrm{ste})$ at the $i$th sensor at step ste exchanges information with its neighbors through the averaging matrix $W$, where $W_{ij}$ is nonzero for $j \in \mathcal{N}_i$: the state becomes a weighted combination of a local prediction and the value derived from neighborhood averaging. By this means, the local information $\phi_i$ gradually spreads over the network. Thus, each sensor can update its estimates $\alpha_{i1}$, $\alpha_{i2}$, $\mu_i$, $\Sigma_i$, and $\lambda_i$ using the intermediate variables until all nodes reach a fixed point on their values. The mixing parameter $\beta$ in (15) determines the influence of the information transmitted across the network, whereas the predictor parameter $\gamma$ is associated with the convergence rate. These two step-size coefficients are predecided real constants, and the choice of $\beta$ and $\gamma$ for our scenario must be investigated. The time-related mixing parameters are defined as in [41] by $\beta_t = 1/t^{\eta}$, where $\eta$ is a growth rate parameter. Based on the analysis in [42], the convergence rate of the distributed averaging algorithm speeds up with a large value of $\eta$ when the random network is well connected. Therefore, a tentative experimental test of the time cost of the proposed algorithm was conducted while varying $\eta$ in steps of 0.05. The runtime decays gradually as $\eta$ increases from 0.05 to 0.95, meaning the convergence rate of the distributed EM method increases. Moreover, the decline in time cost nearly stops once $\eta$ reaches 0.8, which demonstrates that increasing $\eta$ beyond 0.8 brings no further benefit; thus the rate parameter $\eta$ for distributed EM is chosen as 0.8. According to a convergence analysis omitted here for space, the proposed method converges to a fixed point of the centralized EM solution under the assumptions of (10). Consequently, the parameter estimates $\alpha_{i1}$, $\alpha_{i2}$, $\mu_i$, $\Sigma_i$, and $\lambda_i$ are updated in the standard closed form, with the sums replaced by their network-averaged counterparts:

$$\alpha_{i1} = \frac{1}{M}\sum_{k=1}^{M} h_k^{(1)}, \quad \mu_i = \frac{\sum_k h_k^{(1)} x_k}{\sum_k h_k^{(1)}}, \quad \Sigma_i = \frac{\sum_k h_k^{(1)} (x_k - \mu_i)^2}{\sum_k h_k^{(1)}}, \quad \lambda_i = \frac{\sum_k h_k^{(2)}}{\sum_k h_k^{(2)} x_k},$$

with $\alpha_{i2} = 1 - \alpha_{i1}$. Processes (A) and (B) are iterated until a suitable stopping criterion is reached. After each update of the conditional expectation and the mixture model parameters, the neighboring sensors involved in the computation store their local values in memory. Before the end of the operation, every node compares its calculated results with the records of at least two neighbors after a fixed time interval. If a discrepancy is found, the node with inconsistent values is considered an adversary and discarded, and the remaining sensors rerun the distributed EM algorithm.
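To illustrate the consensus backbone of the scheme, here is a minimal sketch (our own illustration, with an assumed four-node path topology and standard Metropolis weights, which satisfy the symmetry and averaging conditions on the weight matrix) showing the linear iteration converging to the network average:

```python
import numpy as np

# path topology 0-1-2-3
edges = [(0, 1), (1, 2), (2, 3)]
n = 4
deg = np.zeros(n, dtype=int)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

# Metropolis weights: W_ij = 1/(1 + max(deg_i, deg_j)) for each edge,
# diagonal chosen so every row sums to one; W is symmetric
W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
W += np.diag(1.0 - W.sum(axis=1))

x = np.array([1.0, 5.0, 3.0, 7.0])  # local statistics held by each sensor
for _ in range(200):
    x = W @ x                        # x(t+1) = W x(t)
```

After a few hundred exchanges every node holds (numerically) the global average 4.0, which is exactly the behavior the distributed M-step relies on when averaging sufficient statistics.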
When the feature extraction using the distributed EM algorithm is finished, five new features are acquired for each node, defined as $\mathrm{VE}_i = [\hat{\alpha}_{i1}, \hat{\alpha}_{i2}, \hat{\mu}_i, \hat{\Sigma}_i, \hat{\lambda}_i]$. These new features provide additional statistical information for attack classification. As a result, the full feature vector for sensor $i$ can be expressed as $x_i = [\mathrm{VD}_i, \mathrm{VT}_i, \mathrm{VE}_i]$, whose dimension is $n + m + 9$. The vector $x_i$ is then used in its entirety as input to the classifier in the next recognition stage.
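The five entries of $\mathrm{VE}_i$ come from fitting the EG mixture by EM. The following is a centralized sketch of those updates on synthetic one-dimensional data (our own illustration with assumed parameter values; the distributed version additionally averages the sufficient statistics over neighbors as described above):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic features: a Gaussian cluster plus an exponential component
x = np.concatenate([rng.normal(15.0, 2.0, 600),
                    rng.exponential(200.0, 400)])

# rough initial guesses (assumed here; any sensible start works)
a1, mu, var, lam = 0.5, float(np.median(x)), 25.0, 1.0 / float(x.mean())
for _ in range(200):
    # E-step: responsibility h of the Gaussian component for each point
    g = a1 * np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    e = (1 - a1) * lam * np.exp(-lam * x)
    h = g / (g + e)
    # M-step: closed-form updates of alpha1, mu, Sigma, lambda
    a1 = h.mean()
    mu = (h * x).sum() / h.sum()
    var = (h * (x - mu) ** 2).sum() / h.sum()
    lam = (1 - h).sum() / ((1 - h) * x).sum()
```

On this synthetic data the fit recovers a Gaussian mean near 15 and a mixing weight near 0.6, and the fitted $(\hat{\alpha}_1, \hat{\alpha}_2, \hat{\mu}, \hat{\Sigma}, \hat{\lambda})$ are precisely the quantities collected into $\mathrm{VE}_i$.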

Distributed Classifier Design
After these features have been selected and further extracted, we perform classification to recognize the external attacks. A classification process with excellent generalization properties and minimal test error can compensate for the limited feature dimensionality. As described in the last section, the EG mixture modeling and the distributed EM feature extraction both belong to the family of generative models, which can establish more distinctive features from the variation of the distance and topological parameters by exploiting their probability densities. A generative model possesses excellent modeling ability and flexibility for non-normalized data. However, the optimization capability of a generative scheme in the recognition phase is usually weaker than that of its discriminative counterpart, especially when labelled data is plentiful [43]. The other class of recognition techniques is the discriminative method, which maps inputs directly to a class label through the posterior probability and thus avoids rigid hypotheses about the underlying probability estimation. It generally attains a lower asymptotic error than the generative approach in recognition tasks [44]; however, it cannot capture the intrinsic relationship between the feature distribution and the observed features. In order to combine the advantages of the discriminative and generative approaches, it is preferable to couple the generative features with a discriminative classifier to obtain higher recognition accuracy.
Here, it is noticed that, besides the MK-SVM algorithm, logistic regression (LR) is another prominent and competitive discriminative classifier, which has been used for an extensive range of recognition tasks [45]. Although LR is computationally fast and can often achieve higher accuracy than a support vector machine, especially on very large datasets, classifying localization attack data with LR poses a potential challenge. LR is a linear classifier, meaning that it performs best on linearly separable features. For the distance and topological features, the variation trends of their approximated distributions and parameters are closely related to each other, but certain differences between them remain for some attacks, which introduces nonlinearity into the features extracted from the unified mixed distribution. In addition, the uncertain data modification by the malicious nodes further increases the nonlinearity. This accumulated nonlinearity may degrade the accuracy of LR in attack recognition. Comparatively, the MK-SVM utilizes a kernel function to transform the features nonlinearly into a higher-dimensional space, which is more appropriate for making the features distinguishable. Therefore, the MK-SVM is chosen as the classifier for attack classification. Furthermore, in order to adapt to distributed sensor networks, the PECPR-MKSVM algorithm is devised to fully exploit the strengths of machine learning, and a two-stage data verification policy is added at the end.
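To illustrate the kernel-combination idea, the following sketch evaluates an equally weighted RBF-plus-polynomial kernel of the kind the MK-SVM employs; the kernel parameters, the equal weights, and the input vectors are illustrative assumptions, not values from the paper.

```python
import math

def rbf(u, v, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def poly(u, v, degree=2, c0=1.0):
    """Polynomial kernel: (u . v + c0)^degree."""
    return (sum(a * b for a, b in zip(u, v)) + c0) ** degree

def combined_kernel(u, v, weights=(0.5, 0.5)):
    """Weighted combination of basic kernels, as used in MK-SVM;
    a convex combination of valid kernels is itself a valid kernel."""
    return weights[0] * rbf(u, v) + weights[1] * poly(u, v)

k = combined_kernel([1.0, 0.0], [0.0, 1.0])
```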

Extension for CPRSM.
For the multiclass problem, we can formulate it as a linear equality-constrained optimization problem consisting of multiple separable objective functions: where each objective is a closed proper convex function, the decision variables are the feature-related vectors, the coefficient matrices are given, and the right-hand side is a designated vector. Although the objective function in (18) is convex and linearly constrained, it is not well suited to a classic centralized solver, for reasons of both safety and time efficiency. On one hand, after feature extraction a new feature vector is generated and the detected adversaries are discarded; although the timing of an attack is uncertain, there is still a small possibility that undetected adversaries launch a data-modifying assault, which could severely damage recognition performance. On the other hand, the input feature dataset is typically large and high dimensional, so a conventional optimization scheme cannot easily fulfill the classifier training and testing task. It therefore becomes important to exploit a parallelized optimization method that guards against malicious sensors and processes the large feature dataset.
Referring to the literature [46], the contractive Peaceman-Rachford splitting method (CPRSM) has been developed for linearly constrained convex optimization problems that split into two parts. The augmented Lagrangian iterations are given by where the two objectives are closed convex functions, the two blocks of primal variables lie in their respective domains, the two coefficient matrices are given, and the right-hand side is a designated vector; the intermediate and fully updated Lagrange multipliers correspond to the linear constraints, and the penalty scalar is positive. Here the relaxation factor is restricted to (0, 1) to ensure that the sequence generated by (19) satisfies the strictly contractive condition; for convenience, it is assumed to be chosen close to 1. Inspired by the effectiveness of CPRSM, a natural idea for solving (18) is to extend the CPRSM scheme from this special situation to the general one, and the straightforward extension of CPRSM yields the following scheme: . . .
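The alternating pattern of (19), with its two primal updates and two multiplier updates per sweep, can be sketched on a toy two-block problem; the quadratic objectives, the constraint x1 = x2, and all parameter values below are our own illustrative choices, not the paper's classification problem.

```python
def cprsm_toy(a=0.0, c=4.0, beta=1.0, alpha=0.9, iters=200):
    """Minimize (x1-a)^2 + (x2-c)^2 subject to x1 = x2 via a
    Peaceman-Rachford splitting with relaxation factor alpha in (0, 1).
    Each block has a closed-form augmented-Lagrangian minimizer."""
    x1 = x2 = lam = 0.0
    for _ in range(iters):
        # first-block update: argmin of the augmented Lagrangian in x1
        x1 = (2 * a - lam + beta * x2) / (2 + beta)
        # intermediate multiplier update (the "+1/2" step)
        lam += alpha * beta * (x1 - x2)
        # second-block update: argmin of the augmented Lagrangian in x2
        x2 = (2 * c + lam + beta * x1) / (2 + beta)
        # final multiplier update
        lam += alpha * beta * (x1 - x2)
    return x1, x2

x1, x2 = cprsm_toy()
```

At the fixed point both blocks agree on the constrained minimizer (a + c) / 2, illustrating how the relaxed double multiplier update drives the residual x1 - x2 to zero.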

Proximal MK-SVM with ECPR Substitution.
Considering a labelled training set TS of feature-vector/label pairs, where each feature vector lies in a real feature space and each label belongs to {+1, −1}, MK-SVM places a separating hyperplane between the two categories in feature space. The minimization problem of the MK-SVM using an unweighted kernel combination is given as follows [47]: where the weight vector collects the hyperplane coefficients, bia represents the bias term corresponding to the hyperplane, and the basic kernel functions enter through the combination. The objective of this formulation is to optimize the weight vector and bia, which simultaneously maximizes the margin and minimizes the empirical error.
In order to convert the inequality constraints into an equality constraint, a squared slack variable is introduced into the optimization problem: The optimization procedure can then be divided into two parts, with the weight vector and bia optimized as one group and the squared slack variable as the other.
The first part solves the minimization problem with respect to the weight vector and bia using ECPR while the squared slack variable is fixed. For this optimization, the augmented Lagrangian function of (24) can be expressed as where the feature vectors, the elements of the weight vector, the bias term bia, and the basic kernel function are as before, a Lagrange multiplier is attached to the equality constraint, and the penalty scalar is positive. By applying ECPR to the augmented Lagrangian function, the distributed iterative form of problem (25) is obtained. To reduce the cost of computing derivatives, we then use the linearized proximal method proposed by Xu and Wu [48]: After differentiating (26), the primal variables can be calculated as Substituting the equations in (27) into the primal problem then converts the minimization into a dual problem: Next, we consider the optimization of the squared slack variable with the weight vector and bia fixed, which can be easily solved via a gradient method. Setting the derivative of (25) with respect to the squared slack variable equal to 0 yields the following result: The above iterations are repeated until convergence. We therefore name (28) and (29) the proximal extension contractive Peaceman-Rachford splitting multiple kernel support vector machine (PECPR-MKSVM).
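Because the squared slack variable turns the margin constraints into equalities, a closed-form training step in the style of least-squares SVMs is possible by solving a single linear system. The sketch below uses a plain linear kernel, a single kernel rather than a multiple-kernel combination, and a toy dataset, all simplifying assumptions relative to the paper's setting.

```python
def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def train_slack_squared_svm(X, y, gamma=10.0):
    """Slack-squared (least-squares style) SVM: the KKT conditions reduce
    to the linear system [[0, y^T], [y, Omega + I/gamma]] [bia; alpha] = [0; 1],
    where Omega_ij = y_i * y_j * K(x_i, x_j)."""
    n = len(X)
    K = lambda u, v: sum(a * b for a, b in zip(u, v))  # linear kernel
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] + [1.0] * n
    for i in range(n):
        A[0][i + 1] = A[i + 1][0] = float(y[i])
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * K(X[i], X[j]) + (1.0 / gamma if i == j else 0.0)
    sol = solve(A, rhs)
    bia, alpha = sol[0], sol[1:]
    def predict(v):
        s = sum(a * yi * K(xi, v) for a, yi, xi in zip(alpha, y, X)) + bia
        return 1 if s > 0 else -1
    return predict

# toy separable data: two classes in the plane
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -1.0), (-3.0, -2.0)]
y = [1, 1, -1, -1]
predict = train_slack_squared_svm(X, y)
```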
While the PECPR-MKSVM algorithm runs effectively under normal conditions, it ignores attack scenarios in which the intermediate variables may be disrupted by data modified by an attacker. To address this inadequacy, a simple two-stage calculation verification policy is added to guard against adversaries and improper output. The following steps are carried out during PECPR-MKSVM training. (1) Neighbor Node Verification Stage. Since the variable updated at each neighbor node differs, the data-forwarding sequence over all neighbor nodes is rearranged, for example shifted forward or backward by one position, after the first fixed time interval. The set of updated parameter information produced in the next interval is then checked against the record maintained at the same sensor, and the first sensor showing a divergence is identified as the attacker. (2) Host Node Verification Stage. Although the validation at the neighbor nodes can eliminate malicious nodes, it does not exclude the potential risk at the host node itself. It is therefore necessary to run the algorithm repeatedly on neighbor nodes that have already been authenticated, which ensures both attack prevention and calculation precision.
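The neighbor verification stage can be sketched as a simple majority check over the records exchanged in one round; the node ids, reported values, and tolerance below are hypothetical, and taking the majority value as the reference is an illustrative simplification of the record comparison described above.

```python
from collections import Counter

def detect_inconsistent_node(records, tol=1e-6):
    """Neighbor verification sketch: `records` maps a node id to the
    parameter value it reported for the same update round. The majority
    value is taken as the reference; the first node whose record diverges
    from it is flagged as a suspected attacker (None if all agree)."""
    rounded = {nid: round(v, 6) for nid, v in records.items()}
    ref, _ = Counter(rounded.values()).most_common(1)[0]
    for nid in sorted(records):
        if abs(records[nid] - ref) > tol:
            return nid
    return None

suspect = detect_inconsistent_node({1: 0.42, 2: 0.42, 3: 0.97, 4: 0.42})
```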

The Process Overview of the Proposed Algorithm and Calculation Verification Policy

Based on the above design, the message transmission among neighboring nodes in the distributed recognition method can be summarized as follows and is clarified in Figure 5. As shown in Figure 5(a), in the feature extraction phase, once every node obtains its own original feature set, it conveys the set to the sensor with the most energy and that sensor's one-hop neighbors to compute the statistical attack feature set, and it verifies its local states by exchanging records with its neighbors. As shown in Figure 5(b), in the PECPR-MKSVM training phase, the authenticated sensor with the most residual energy sets initial values for the model parameters and then computes the first update via (23); next, this node sends its newly updated variable to one of its one-hop neighboring sensors. After receiving it, iteration resumes when the next node performs its update using its own feature set. According to the forwarding rule, all intermediate variables are transmitted along the path through the initial sensor's direct neighbors one by one.
Eventually, the intermediate variable is sent back to the initial sensor to start a new cycle, and the final global minimum of the associated cost function is obtained by iteratively updating the distributed classifier.

Experimental Setup and Results
6.1. Simulation Setup. To assess the effectiveness of our mechanism, we present four groups of experiments carried out under different localization attacks. In our simulation, 600 sensors including 48 anchors are randomly distributed over a 300 m × 300 m area. The communication range of both sensors and anchors is set to 35 m. Moreover, three types of external localization attacks (wormhole, replay, and interference) exist in the network simultaneously. The fraction of malicious sensors is 20%, with each kind of external localization attack accounting for one-third of the total. If the sensor responsible for performing a computation happens to be an adversary, the probability and range of its data modification are below 30%. First, all sensors in the network collect the original feature information and convey the feature data to the sensor with the most energy and its one-hop neighbors; these sensors then conduct the distributed EM scheme to compute the new statistical features. Finally, we choose one of the authenticated sensors holding the new feature dataset to run PECPR-MKSVM for training and classification. The experiments are repeated 5 times: the features of the first four runs are adopted as training sets, whereas those of the last run are used as testing sets. Our PECPR-MKSVM classifier uses RBF+Poly kernels and a one-versus-all approach.
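A minimal sketch of this deployment, assuming uniform random placement and using the stated counts (600 sensors of which 48 are anchors, a 35 m range, and 20% malicious nodes split evenly across the three attack types); the random seed, the labeling of the first 48 nodes as anchors, and the helper names are our own illustrative choices.

```python
import random

random.seed(7)
AREA, N_NODES, N_ANCHORS, COMM_RANGE = 300.0, 600, 48, 35.0
ATTACKS = ["wormhole", "replay", "interference"]

# uniform random deployment over a 300 m x 300 m area
nodes = [(random.uniform(0, AREA), random.uniform(0, AREA))
         for _ in range(N_NODES)]
anchor_ids = set(range(N_ANCHORS))  # first 48 nodes treated as anchors

# 20% of nodes malicious, drawn from the non-anchor sensors and
# split equally among the three external attack types
n_mal = int(0.20 * N_NODES)
mal_ids = random.sample(range(N_ANCHORS, N_NODES), n_mal)
attack_of = {nid: ATTACKS[k % 3] for k, nid in enumerate(mal_ids)}

def neighbors(i):
    """One-hop neighbors: nodes within the 35 m communication range."""
    xi, yi = nodes[i]
    return [j for j, (xj, yj) in enumerate(nodes)
            if j != i and (xi - xj) ** 2 + (yi - yj) ** 2 <= COMM_RANGE ** 2]
```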

Attack Classification Performance with the Proposed Algorithm

To evaluate the effectiveness of combining the distributed feature extraction and classifier scheme, the recognition performance on two kinds of feature datasets is first compared between the proposed classifier and four similar classifiers: a distributed SVM (MoM-DSVM), a multiple kernel SVM (SimpleMKL), a typical SVM (C-SVM), and a logistic regression (LR) classifier. Table 2 shows the average recognition accuracy obtained by these algorithms under different external localization attacks. In general, as depicted in Table 2, the average successful classification rate for each kind of attack rose by 9.4% when feature extraction was used, compared with recognition by the classifier alone. Furthermore, it is worth mentioning that the proposed classifier obtains relatively higher accuracy than the other classifier schemes. For example, for the replay attack, the proposed classifier with the extracted features offers the highest classification accuracy of 93.28%, whereas SimpleMKL and C-SVM only achieve recognition accuracies of 84.56% and 68.72%, respectively. Although the MoM-DSVM classifier achieves satisfactory classification performance for the replay attack using a consensus-based support vector, it still cannot sufficiently recognize the wormhole and interference attacks. The recognition performance of LR is improved markedly by the extracted features, and LR outperforms MoM-DSVM and SimpleMKL on the interference attacks; however, it is still not comparable to the PECPR-MKSVM, owing to its limitations regarding safety and nonlinear features. These comparisons show that the combination of distributed feature extraction and the proposed classifier achieves higher recognition accuracy than the other recognition methods.

Classification Robustness of the PECPR-MKSVM with Different Kernel Functions

We further explore the classification robustness of the PECPR-MKSVM classifier with different kernel functions. Figure 6 shows the average recognition accuracies for various multikernel classifiers obtained by combining different kernels, such as the RBF, sigmoid, and polynomial kernels. In Figure 6(a), the average recognition accuracy of the proposed classifier is 4%-7% higher than that of the MoM-DSVM and MK-SVM methods. Moreover, the combination of the RBF and polynomial kernels achieves higher recognition accuracy than the others, whereas a single kernel fails to offer good recognition accuracy. The classification errors that remain can often be attributed to the lack of sufficient training samples for the classifier. Next, to show the robustness of the proposed classifier, Figure 6(b) compares the recognition performance under a higher ratio of malicious sensors. When the ratio of malicious sensors grows, the average attack recognition rates improve somewhat for all classifiers, which means that the additional data from the malicious sensors provides more samples to the classifier and shapes the classification hyperplanes. In particular, the average recognition rate of the proposed classifier with the RBF+Poly kernel increases from 91.9% to 93.9%. Thus, the proposed algorithm is more robust in recognizing localization attacks even under a severe scenario.

Convergence Performance of PECPR-MKSVM with Different Positive Scalars and Relaxation Factors

In order to assess the impact of the positive scalar and the relaxation factor on the proposed classifier, each node trains a local PECPR-MKSVM, and the convergence of its test error is compared with that obtained via MoM-DSVM. We first fix the relaxation factor and choose two values, 1 and 10, for the positive scalar of the PECPR-MKSVM classifier; the evolution of the iterations is then plotted for each choice. For comparison purposes, we also plot the convergence performance of MoM-DSVM with the same two values. As illustrated in Figure 7, the test error of PECPR-MKSVM decreases very rapidly within a few iterations and soon approaches the minimum value, outperforming the MoM-DSVM-based method. Moreover, the plot in Figure 7(a) also reveals that a very large value of the positive scalar may cause divergence and hinder the convergence rate; these results further reflect the importance of choosing this scalar when constructing the classifier. Finally, the plot in Figure 7(b) illustrates that, for each tested scalar, a larger relaxation factor tends to accelerate the convergence of the proposed classifier, thus shortening the runtime.

We also compare the computational time of the schemes as the number of sensors varies and plot the results in Figure 8. Here, we combine all the classifiers with the proposed feature extraction process. Generally, the proposed algorithm is the fastest among the compared schemes. More importantly, the time for the proposed algorithm increases roughly linearly with the number of sensors, and the growth rate remains slow even when the number of sensors increases to 1000. This is because the classification process is computed in a distributed manner, so the computational complexity depends on the number of neighboring sensors. In contrast, although the time cost of the consensus-based MoM-DSVM scheme is lower than that of the MK-SVM and LR algorithms, it still requires a large amount of calculation in the training process. The performance of the LR algorithm lies between that of the MK-SVM and the distributed SVM. The MK-SVM algorithm uses a centralized architecture to execute the classification, which increases the number of iterations and the computational complexity. Thus, the proposed algorithm is more computationally efficient than the MoM-DSVM, LR, and MK-SVM methods.

Conclusion
This paper presented a distributed classification scheme for external localization attack classification in WSNs. A novel distributed version of the EM feature extractor and the MK-SVM classifier is also proposed. These new schemes help each sensor compute during feature extraction and

Figure 3: Normalized betweenness centrality distribution of the sensor network and the probability density function of its exponential distribution approximation for different external attack scenarios. The parameter of the exponential distribution is estimated as (a) 0.006963, (b) 0.007114, (c) 0.006892, and (d) 0.007020.

Figure 4: Time cost of distributed EM versus the variation of the growth rate parameter.

Figure 5: Overview of distributed calculation among sensor nodes in the proposed recognition algorithm.

Figure 7: Evolution of the test error with different positive scalars and relaxation factors for the proposed and MoM-DSVM methods.
It is considered that three classes of nodes are distributed randomly in the sensing area: sensors, anchors, and malicious nodes. The random network topology is modeled as an Erdős–Rényi (ER) random graph G = (V, E), where V symbolizes the node set and E indicates the edge set. The node set V consists of the sensors and the anchors, and the malicious nodes are included in the sensor set. We define a distance between every pair of sensors and a distance from each sensor to each anchor. The total number of links through a sensor, calculated along shortest paths, is equal to the total number of sensors and anchors.
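The ER random graph model can be sketched as follows; the graph size and the edge probability are illustrative values of our own, and only Python's standard library is used.

```python
import random

def erdos_renyi(n, p, seed=0):
    """Generate an Erdos-Renyi random graph G = (V, E): each of the
    n*(n-1)/2 possible edges is included independently with probability p."""
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                edges.add((i, j))
    return edges

# a small illustrative instance: 50 nodes, edge probability 0.2
E = erdos_renyi(50, 0.2, seed=1)
deg = [0] * 50
for i, j in E:
    deg[i] += 1
    deg[j] += 1
```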

Table 1: Mean square error (MSE) of the distribution approximation curves for all topological features under different external attacks.

Table 2: Comparison of recognition rates in percentage (%) on the attack feature sets with different classifiers.