BCEAD: A Blockchain-Empowered Ensemble Anomaly Detection for Wireless Sensor Network via Isolation Forest

*e distributed deployment of wireless sensor networks (WSNs) makes the network more convenient, but it also causes more hidden security hazards that are difficult to be solved. For example, the unprotected deployment of sensors makes distributed anomaly detection systems for WSNs more vulnerable to internal attacks, and the limited computing resources of WSNs hinder the construction of a trusted environment. In recent years, the widely observed blockchain technology has shown the potential to strengthen the security of the Internet of *ings. *erefore, we propose a blockchain-based ensemble anomaly detection (BCEAD), which stores the model of a typical anomaly detection algorithm (isolated forest) in the blockchain for distributed anomaly detection in WSNs. By constructing a suitable block structure and consensus mechanism, the global model for detection can iteratively update to enhance detection performance. Moreover, the blockchain guarantees the trust environment of the network, making the detection algorithm resistant to internal attacks. Finally, compared with similar schemes, in terms of performance, cost, etc., the results prove that BCEAD performs better.


Introduction
In recent years, the booming Internet of ings is revolutionizing the world. As its supporting technology, wireless sensor networks have also received extensive attention [1,2]. WSNs are a multihop self-organizing network formed by many sensor nodes deployed in the monitoring area to communicate. It gets rid of the limitation of the cable, realizes the wireless communication of the network, and has a wide range of application scenarios. However, due to the backwardness of WSNs security technology, various security issues limit the practical application of WSNs [3,4].
Various security technologies and strategies have emerged for protecting network security. Intrusion detection is a classic network security technology [5]. Early intrusion detection systems (IDS) mostly utilize misuse detection. Misuse detection record the attacks by a signature database, then judge an intrusion with the events or data matching the signatures. However, misuse detection is not practical enough because it cannot detect unrecorded attacks. Nowadays, anomaly detection has been more widely used [6] with the development of machine learning. Anomaly detection comes from the statistical community [7]. It establishes a standard model and judges the events or data that do not match with the model as an intrusion. Although anomaly detection requires some model training time and produces a higher false alarm rate, it can detect new unknown intrusions. e performance of anomaly detection will continue to increase and make outstanding contributions to protecting network security with the optimization of modeling algorithms in anomaly detection. e structure of intrusion detection systems has become richer for stronger practicability and applicability. For example, the proposal of distributed intrusion detection systems (DIDSs) eases the pressure of detecting heterogeneous networks. e DIDS is similar to ensemble learning [8], and the system builds multiple detection models in a large-scale network. erefore, the DIDS not only balances network energy consumption but also improves detection performance. However, distributed anomaly detection must ensure trust between nodes to prevent internal attacks. It is the prerequisite for its further application.
In recent years, the emergence of blockchain technology has pointed out a path worth trying to solve the distributed trust problem in the Internet of ings environment. Blockchain is a peer-to-peer distributed network with features such as non-tempera, decentralization, transparency, and system autonomy [9], which can effectively enhance device security and network collaboration in the Internet of ings. Nowadays, some distributed intrusion detection based on blockchain has been proposed [10]. e system packs the detection results or alarms into blocks and shares them in the network to manage trust among domains. However, due to the limitation of the data shared by the blockchain, the above scheme still has its limitations.
erefore, we propose a blockchain-based ensemble anomaly detection scheme. e scheme stores the model of a typical anomaly detection algorithm (isolated forest) in the blockchain network and performs distributed anomaly detection in the WSNs. By constructing a suitable block structure and consensus mechanism, the global model for detection can be iteratively updated to enhance the detection performance. Moreover, the blockchain guarantees the trust environment of the network, making the detection algorithm resistant to internal attacks. Finally, compared with similar schemes in terms of performance, cost, security, etc., the results prove that BCEAD performs better.

Contributions.
(1) We propose an ensemble anomaly detection network structure for WSN. e structure has three levels, and high-performance nodes in the sink layers bear most of the storage and computing overhead required by the solution. Multiple sink nodes will independently detect anomalies in the local network based on the global anomaly detection model and optimize the model by submitting some parameters. is network structure is suitable for resource-constrained and heterogeneous WSNs and can be equipped with multiple types of anomaly detection algorithms to ensure network security.
(2) We design an anomaly detection model stored and updated by the blockchain. e detection node filters a batch of isolated trees stored in the blockchain to form an isolated forest for detection. In addition, new isolated trees are continuously generated and published to the blockchain, so that the isolated forest model, which is utilized to detect, is constantly updated. In this scheme, the detection model keeps dynamically updated with the environment, which enhances the detection performance while maintaining security.

Anomaly Detection Structure.
Intrusion detection is an important part of network security protection, and it is a network security technology that actively protects its network and system from illegal attacks. An intrusion detection system [11] usually consists of data collectors, data analyzers, alarm modules, and other parts. Traditional intrusion detection systems begin to feel weak with the expansion of the network scale and the complexity of information. erefore, distributed intrusion detection systems and collaborative intrusion detection systems have been proposed. ey not only analyze system logs but also analyze network traffic and introduce a distributed data collection mechanism into the structure. Subsequently, the anomaly detection structure for network attacks has gradually changed from local and centralized to a distributed structure. Specifically, the new detection system also deploys multiple data analyzers. Data analyzers in each network domain can communicate with each other and share detection models and strategies.
Distributed or collaborative anomaly detection has better detection performance for large-scale heterogeneous networks. However, the distributed anomaly detection has to consider the issue of trust among nodes. For example, the system may crash if a detection node in the detection system falsely sends alarms or maliciously updates the global detection model. erefore, it is vital to build a trusted environment that can prevent nodes from internal attacks. In addition, distributed detection faces privacy issues. During the detection process, a large amount of data is collected and uploaded by the agent, which exposes the system to hidden dangers and threats of data leakage.

Anomaly Detection Method.
Anomaly detection is the identification of events or observations that do not match the expected pattern. In different scenarios, anomalies are also called outliers, noises, deviations, etc. Isolation forest is a type of typical anomaly detection algorithm [12,13], which distinguishes abnormal data with "few" and "different" characteristics. Compared with other classic classification methods, isolated forests consume fewer computing resources and can still maintain good performance when processing large amounts of high-dimensional data. e isolated forest algorithm continuously divides the data by isolated trees, calculates the isolation score according to the height of the data point in the tree, and judges the anomaly according to the average isolated score. Suppose T is a node of an isolated tree, then T may be a leaf node, or an intermediate node with a decision threshold β and two child nodes (T l , T r ). Assuming a dataset X � x 1 , ..., x n , each data x is a d-dimensional vector. By continuously selecting attributes q ∈ d randomly and the decision threshold β ∈ q to continuously classify X ′ ⊆X, an isolated tree can be established. e isolated forest classifies the data by constructing a large number of random isolated trees. In general, data points classified into leaf nodes earlier may be more suitable for the definition of anomalies. erefore, the isolation forest can quantify the degree of the anomaly of the data by the average tree length path of the data points.
Specifically, given a sample set with φ instances, the average path length of each isolated tree is c, where H(i) is the harmonic number, which can be estimated with ln(i) + 0.5772156649 (Eulerian constant). c(φ) is the average h(x) for a given φ. e anomaly score s of instance x is defined as where E(h(x)) is the average value of h(x) in an isolated tree, which is the average height of the tree. From the results, the score of a sample close to 1 is judged to be abnormal; the score close to 0 is judged to be safe; and if the scores of all samples are 0.5, it means that the sample set has no obvious abnormality. However, the isolation forest algorithm also has shortcomings. Because it belongs to unsupervised learning, the algorithm is more dependent on the quality of the training set. In practical applications, it is necessary to ensure the real-time performance of the training set and continuously train and change the detection model to adapt to environmental changes and ensure detection performance.

Anomaly Detection and Blockchain.
Blockchain is a new decentralized infrastructure and distributed computing paradigm that has gradually emerged with the increasing popularity of digital cryptocurrencies such as Bitcoin. At present, it has been highly valued and widely concerned by government departments, financial institutions, technology companies, and capital markets [14]. Blockchain is often understood as a data structure [15]. e block stores the data composed of Merkle tree and forms a chain through hash pointers, thereby ensuring that the data are difficult to be changed. In addition, the blockchain uses pure mathematical methods to establish trust relationships among distributed nodes to form a decentralized trusted distributed system, which has the characteristics of decentralization, network robustness, security, and credibility. In addition, the blockchain uses pure mathematical methods to establish trust relationships among distributed nodes to form a decentralized trusted distributed system, which has the characteristics of decentralization, network robustness, security, and credibility. erefore, the blockchain enables mutual trust between different participants [16] in the Internet of ings environment, which greatly reduces the cost of reshaping or maintaining trust for each node. For the trust problem of distributed detection, blockchain is also a kind of solution worth trying. Some schemes, which combine intrusion detection with blockchain to ensure trusted data sharing in distributed intrusion detection, have been proposed [10,17,18]. ey store different data, such as detection characteristics, detection alarms, and detection results in the blocks, and publish to the blockchain network to share. Subsequently, a type of anomaly detection framework driven by blockchain on edge intelligence appeared [19]. e framework stores the data features to be analyzed on the blockchain, then the cloud-based detection model reads the data features on the blockchain for anomaly detection and feeds back the detection results. In addition, the framework transfers the overhead pressure for detection to the distributed edge network, which is more suitable for the system structure of the IoT. erefore, the advantages of low detection delay and global model update brought by distributed detection could be supported by the blockchain. Recently, a scheme for detecting abnormal behavior in social networks [20] has been proposed, which combines isolation forest and blockchain technology. e authors claim that the blockchain can protect the privacy leakage problem in the anomaly detection process. ey execute the isolated forest algorithm to detect data anomalies by the smart contracts, then marked and stored the abnormal data on a separate blockchain.
All the above detection schemes utilize the advantages of blockchain to solve a certain degree of security or privacy issues, but each has certain limitations. Specifically, they all store detection-related data on the blockchain. Although each solution has made optimizations to reduce the storage overhead, for example, the blockchain only stores the characteristic value or the hash value of the detection data. However, as the detection cycle lengthens, the blockchain will still face a storage bottleneck, resulting in much loss of detection performance. erefore, we propose the BCEAD, which is different from the traditional scheme, to solve the storage problem and enhance the detection performance by storing the detection model.

A Blockchain-Based Ensemble Anomaly Detection
As shown in Figure 1, there is the multilayer network structure in BCEAD, which consists of the sensing layer, the sink layer, and the blockchain layer. e roles and functions of each layer are explained as follows: Sensing Layer. e sensing layer contains a large number of low-cost, low-energy, and low-performance sensor nodes. e sensor node collects physical information from the external environment in real-time and converges it to the sink node. e collected data will be processed and used to generate the corresponding feature matrix for subsequent anomaly detection. Sink Layer. e sink layer converges the environmental information collected by the sensors and submits it to the base station. Compared with sensor nodes, sink nodes have better computing and storage capabilities. erefore, the sink node is responsible for most of the Security and Communication Networks 3 work of BCEAD. In detection, the sink nodes select a batch of the latest isolated trees from the blockchain by the sliding window algorithm. en, an isolation forest is constructed to detect anomalies for the data submitted by the sensor layer. After detection, the sink node will contribute a new isolated tree to help optimize the global model. Finally, the sink node will respond according to the detection result. Blockchain Layer. e blockchain layer is mainly responsible for maintaining a safe, usable, and constantly updated global detection model in a distributed network. e sink nodes keep verifying each released block according to the consensus mechanism to ensure that each update for blockchain is benign. e isolated forest, which is a global detection model, will be redundantly stored on the blockchain. Finally, the sliding windows mechanism will screen out suitable blocks to form the isolation forest model for anomaly detection.

Isolation Blocks.
In the detection, the sink node located in the sink layer utilizes the detection model stored in the blockchain layer to detect the data of the local sensor layer. e detection model (isolation forest) is composed of several isolated trees, so a single block in the blockchain is an isolated block containing an isolated tree. en, the isolated blocks continue to increase, and the blockchain is periodically updated. erefore, the detection model keeps iterative dynamically.
In BCEAD, the block format includes timeStamp, block ID, the hash value of current and previous blocks, node ID, and isolated tree, An isolated tree contains several nodes, each node contains the decision threshold β and the left node T l and the right node T r . e data evaluated by the tree will help the isolated tree to form its structure, iTree � iTree left ‖β‖iTree right .
As shown in Figure 2, the isolation tree iTree is the parameters submitted for each update of the global model. It is also the main content of the block released by the node after each round of consensus. is is the difference between BCEAD and previous solutions: BCEAD does not stores the detect-related data on the blockchain but instead stores the detection model. It avoids data privacy issues and reduces storage and communication overhead. However, the update of the global model can also be consensus because the global nodes can verify whether an isolation tree is appropriate.

Sliding Window.
e number of samples (isolated trees) used by the isolation forest will affect the detection performance of the model. e optimal value of the number of samples has been verified as φ � 250. BCEAD stores the detection model on the blockchain network, and the number of isolated blocks included in the blockchain keeps increasing. erefore, the detection scheme has to filter the appropriate number of isolated blocks.
As shown in Figure 3, BCEAD sets up a sliding window algorithm to screen isolated trees for the detection model. e sliding window is essentially a fixed-size list, including a  series of block indexes. With the addition of new blocks, the total length of the blockchain will gradually increase, but the swiping window always traces back from the latest block and includes φ blocks. e update of the swiping window after each round of consensus in the blockchain network, each sink node will get a new slide window when synchronizing the latest blockchain. erefore, after detection, the sink node submits an isolated tree trained by the current data. e isolated tree will be packaged into blocks and released to the blockchain network. e isolated tree will be packaged into blocks and released to the blockchain network, and the model generated by each subsequent detection may utilize the isolated tree.
is setting ensures the real-time update of the global detection model. Moreover, the training set used to generate the model is the current real-time detection data, which ensures the performance of the detection algorithm.

Block Forest.
In the blockchain network, all nodes store a public ledger locally. During the continuous update of the ledger data, all nodes always communicate to reach consensus and ensure consistency. In BCEAD, the blockchain network uses IOTA (tangle) consensus. Compared with the traditional blockchain, the block in tangle is a single transaction. Each new transaction will verify and quote the previous two, so the network does not need to reach a consensus immediately [21]. erefore, the network does not require miners, which avoids mining attacks such as in [22,23]. Users can complete transaction verification by themselves, and the cost of the transaction is only the computational cost of verifying the other two transactions [24]. erefore, the tangle blockchain can achieve mutual trust between nodes in distributed detection systems. It can both ensure security and considerable performance.
All sink nodes in BCEAD act as participants in the tangle, using isolated trees instead of blockchain transactions, and each block packs an isolated tree separately. In the block releasing, the new isolated tree will be packed into a specified structure block, and quote after the previous two verified blocks on the chain. e specific process is as follows: (1) Verify Block. e sink node visits the tips list (maintains unconfirmed blocks in the network) and randomly selects blocks for verification. e verification will review the format of the block, the node ID of the publishing block, and its reference block. When a block is verified, the sink node puts it into the list, which contains references, and ends the verification work when the list has 2 elements.
(2) Release Block. e sink node refers to two blocks in the list, attaches the block ID, its ID, encrypts the block content with its private key, and calculates the hash value of the current block, then broadcasts the block to the blockchain network. Other nodes in the network will review the format and source of the new block, add it to the end of the tangle, and update tips. After several rounds of blockchain updates, subsequent references to a block will prove its credibility. e application of a blockchain network for trusted communication among nodes can resist internal attacks because the blockchain network always verifies every message, even it comes from the trusted nodes. erefore, attackers can only deplete the performance of the global detection model by publishing extreme or malicious isolated trees. erefore, in the blockchain consensus, the verification includes querying the ID of the promulgator, and the purpose is to prohibit a node from frequently publishing the block. e experimental part in Section 4 proves that a single node publishing block has to set two rounds of consensus cycle cooling time, which can effectively prevent malicious nodes from destroying the model and ensure the benign performance of the detection model.

Algorithm Description.
is paper proposes an isolation forest-based anomaly detection algorithm based on blockchain. e algorithm stores the isolation forest in the blockchain, and each detection will build a model based on the blockchain to detect the network data. e details are shown in Algorithm 1.
In this algorithm, first initialize the sink node SN (line 2). e sensor node runs snifferPackets to capture the external environment information and obtain the unprocessed sensor information Raw d ata (line 3). rough the feature extraction function featureExtractor, the feature matrix FM is obtained after processing the sensor data (line 4-6).  Security and Communication Networks e sink node uses the algorithm sli de Win do w to filter out suitable blocks from the blockChain to form an isolation forest iForest (line 8). en, the sink node detects the feature matrix FM according the iForest and returns the detection result state (line 9). When the system detects an abnormality, that is, state � False, use Response to respond (line [10][11]. Finally, the blockchain will be updated according to the update algorithm up da teBlcokchain, which is detailed in Algorithm 2 as follows: In Algorithm 2, first initialize the number of samples of the detection algorithm, that is, the number of isolated trees φ (line 2). e algorithm MCMC filters out the latest and quotable block list tips in the chain (line 4). According to the selection algorithm Select, the block to be quoted is selected from the list of references (line 6) and is verified (line 7), and the index of the verified block in the List (line 8) is recorded until the number of List's elements reaches two (line 5). e isolated tree iTree to be submitted into blocks is packed and the two previous blocks that have been verified (line 11) is quoted. Finally, the packaged block to the entire network is broadcasted and the blockchain is updated (line 12).

Data Processing.
We implement related experiments of the proposed scheme through Python3.8. e isolation forest algorithm comes from the machine learning package Scikitlearn by Python. All experiments are executed on an x64 Windows10 personal computer using an Intel(R) Core (TM) i5-8500 CPU 3.00 GHz processor. We choose kddcup.da-ta_10percent.gz in the popular KDD CUP′99 dataset in IDS research, that is, 10% of the dataset sampling. We select four typical attack samples with different attributes, Buffer o verflow, po d, guess p assws, and nmap, remove the data labels, and mix them with normal samples to generate raw data for simulation. In the simulation process, samples are continuously sampled from the raw data to BCEAD for detection. e changes in the environment are simulated by controlling the abnormal proportions in the samples.
In this section, we conduct evaluation experiments through the following indicators: Table 1 is the confusion matrix, where TP indicates that the real sample is positive and the prediction is also positive. FP indicates that the real sample is negative, but the prediction is positive. FN indicates that the real sample is positive, but the prediction is negative. TN means that the real sample is negative and the prediction is also negative. Some evaluation indicators, such as accuracy rate, precision rate, recall rate, and F1 value, can be obtained by the confusion matrix.

Accuracy.
e ratio of the number of correctly classified samples to the total number of samples.

Precision.
e probability that a sample predicted to be positive is indeed a positive sample.

Recall
Rate. e probability of being correctly predicted as a positive sample among all positive samples is

Detection Performance.
Jia et al. [25] proposed the connection between model generalization error and individual learners in ensemble learning. e upper limit of the generalization error is Here, p is the average value of all relevancies between every two classifiers, and S denotes the mean intensity of the individual classifier. Equation (9) shows that the generalization performance of the global model is better when the classification of individual classifiers is stronger, and the correlation between the classifiers is smaller. In EAD, isolated trees are generated independently by sink nodes during detection, so the low correlation between isolated trees ensures the detection performance of the global isolation forest.
As shown in Figure 4, some sampling points of the Nmap attack are drawn after being processed by PCA, and BCEAD can effectively distinguish between normal points and abnormal points. e ACU can reach 96.67%, and the F1 value is about 0.6 in the detection. It shows that BCEAD can effectively detect anomalies and distinguish between normal traffic and attack events in the network. Figure 5 shows the accuracy of BCEAD's detection of various typical attacks. e detection performance of the detection model is maintained at a high level of 94% to 96%, and it has a good detection accuracy rate for all kinds of attacks. Figure 6 shows the false positive rate (FPR) of BCEAD detection of various typical attacks. e detection effect is worse for Guess passw d and Nmap attacks, which is consistent with the response of the curve in Figure 5. e difference in the detection performance for different attacks is mainly due to the different launch frequency of various attacks. It confirms that the quality of the training set directly affects the performance of the detection model, and BCEAD, which updates the training data in real-time, can effectively guarantee its detection performance. Figure 7 shows the F1 value changes of the scheme proposed by Liu et al. and BCEAD during a period. In the figure, the abscissa indicates the time by the ratio change of abnormal and normal data in the real-time detection. e blockchain-based anomaly detection proposed by Liu et al. does not have the update of the detection model, so the detection performance of the data is always constant in a input: the iTree given by last detection output: the newest detection model given by the blockchain (1) begin: (2) φ ← this; (3) //estimatorSize, the size of slideWindow (4) tips ← MCMC(); (5) while length(List)< 2 do (6) preBlock � Select(tips) (7) if Verify(preBlock) �� True then (8) add (List, preBlock) (9) end (10) end (11) curBlock � publish (List, iTree) (12) blockchain � Brodcast (curBlock) (13) end ALGORITHM 2: Update Blockchain.  Security and Communication Networks 7 fixed range. However, BCEAD will dynamically select the training set to update the detection model in real-time. erefore, the BECAD will optimize performance in longterm detection according to environmental changes. Figure 8 shows the cost comparison between the scheme proposed by Liu et al. and BCEAD during the detection. e experiment specifically compared the communication overhead and storage overhead. As can be seen from the figure, because Liu et al.'s scheme stores detection data by the blockchain, the overhead will continue to grow during the long-term detection process. However, BCEAD uses the blockchain to store the detection model, so the cost in the early detection stage will be slightly higher, but the increase in cost is low. After a period of detection, the cost of BCEAD is significantly lower than similar blockchain-based detection schemes.

Security.
Yang et al. proposed the security entropy to evaluate the benefits of system offense and defense [26], which can quantify the security of the system. We can suppose that the insecure factors caused by the attacker are q 1 , q 2 , ..., q n , and the insecure entropy of the detection system at time t is Q(t, q 1 , q 2 , . . . , q n ), or simply Q(t). If Q(t) grows with time, that is, differential dQ(t)/dt > 0, then the system will become more and more unsafe. If Q(t) decreases with time (differential dQ(t)/dt < 0), then the system will become more and more secure.
In BCEAD, the factors that affect the security of the system are generally classified into two: (1) the attacker publishes bad blocks, depletes the global model, and increases insecure entropy. (2) Ordinary nodes publish normal  blocks, gaining the global model, and not increasing the insecure entropy. en, the changes in the security entropy of the two factors are as follows: Solving this system of equations, Set a � a 1 /a 2 , set b � c 1 /c a 2 ; then, In addition, it can be obtained that It can be seen from Equations (13) and (14) that the attacker's loss rate and the system's loss rate of the detection model is a power function with each other, and the ratio of the two to the model's loss change rate is a constant. In BCEAD, WSN's device authentication has prevented external attacks, and internal attackers can only undermine system security through the loss detection model. Because the blockchain guarantees the format of the network communication content, the attacker's loss model can only be done by publishing unhelpful isolated tree blocks. erefore, restricting this single evil means can ensure the safety of the detection system. e experimental results of malicious attacks on the loss of the system are shown in Figure 9. When malicious blocks continue to be released, the performance indicators of the detection system continue to decrease. It can be seen from the figure that after the release of 30 malicious blocks, the detection performance has dropped significantly. Since the BCEAD verification block is randomly selected, the probability of continuous bad blocks is low, and the submission of normal blocks has a gain effect on the detection model, so the solution itself has a certain attack resistance. From the above analysis, it can be seen that the ratio of the attacker's loss to the model and the program's gain to the model is constant, so this confrontation can find a balanced threshold. By disabling the node's continuous release of blocks, the security resistance of the detection system can be sufficient to resist malicious internal attacks.

Conclusion
is paper studies the security of wireless sensor networks, applies distributed anomaly detection to WSNs by the blockchain technology, and proposes the BCEAD scheme. e scheme divides the WSN into multiple layers. e sink layer performs anomaly detection for each network domain in the sensor layer. e detection model (isolation forest) is stored in the blockchain (tangle). Besides, this paper compares and analyzes the detection performance of BCEAD through experiments and proves its superiority. However, the actual deployment of the blockchain may affect the performance of the detection system, and the communication and storage overhead of the blockchain technology is also difficult to be balanced. erefore, we will do some practice for the blockchain in the future. Nevertheless, the experiment proved that the scheme is compatible with some existing security mechanisms [27] for detection, which is enough to guarantee the application potential of the scheme. Data Availability e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare that they have no conflicts of interest.