Detection of Node Failure in Wireless Image Sensor Networks

The sustained operation of a fault-tolerant wireless image sensor network (WISN) is characterized by a sequenced process of fault detection followed by dissemination of the decision made at each node. This paper presents a distributed self-fault diagnosis model for WISNs in which fault diagnosis is achieved by disseminating the decision made at each node. An architecture for fault-tolerant wireless image sensor nodes is presented. Simulation results show that sensor nodes with hard and soft faults are identified with high accuracy over a wide range of fault rates. Both the time and the message complexity of the proposed algorithm are O(n) for an n-node WISN.


Introduction
WISNs are emerging as a promising solution for a variety of remote sensing applications such as battlefield surveillance, environmental monitoring, intruder detection, intelligent infrastructure monitoring, and scientific data collection [1]. Irrespective of their purpose, all sensor networks are characterized by the requirement for energy efficiency, scalability, and fault tolerance. These requirements are particularly crucial in image sensor networks. Two issues need to be addressed for the sustained operation of a WISN: (1) a WISN consisting of image sensor nodes may be deployed in unattended and possibly hostile environments, which increases the probability of node failure, and (2) unlike conventional sensor nodes, image sensor nodes generate bulk data that is routed to the sink node. Erroneous data generated by faulty sensor nodes must be prevented from entering the network for effective bandwidth and energy utilization. These issues motivate the exploration of distributed self-fault diagnosis processes for WISNs.
In this work, a distributed diagnosis algorithm is proposed which detects both hard and soft faults in the network. Each sensor node makes a decision based on a comparison between its own reading and the readings of its 1-hop neighbors. A sensor node is detected as fault-free if its reading agrees with the readings of more than $T_h$ neighbors, where $T_h$ is a threshold. A timeout mechanism is used to detect hard faults: a node that fails to report is detected as hard faulty. All local diagnostic information is finally disseminated in the network to ensure that each node has a global view of the network fault status, that is, each fault-free node correctly diagnoses the state of all the nodes in the system. A spanning tree (ST) which spans all fault-free sensor nodes is used to disseminate the local diagnostics.
The proposed image sensor node architecture (refer to Figure 1) is simple and can be implemented with limited additional hardware complexity by extending the architecture proposed in [2,3]. Each block is subject to failure, which in turn results in system failure. A node is detected as soft faulty when the CMOS camera, the image processing module, or the embedded processor is faulty. A node is detected as hard faulty for any of the following reasons: (i) the communication subsystem is faulty, (ii) the battery is drained, or (iii) the node is completely damaged.
The process of local detection and global diagnosis from a given fault instance is a multifaceted problem. The main contributions of this paper are as follows.
(1) It proposes an architecture for image sensor nodes for fault-tolerant WISN.
The remainder of the paper is organized as follows. Section 2 presents related work. Section 3 presents the system model. The distributed diagnosis scheme is investigated in Section 4. The performance of the proposed work is evaluated in Section 5, and finally the conclusion and future work are given in Section 6.

Related Works
System-level fault diagnosis was introduced by Preparata et al. in 1967 [4] as a technique intended to diagnose faults in a wired interconnected system. Comparison-based diagnosis is an effective approach to system-level fault diagnosis. The first comparison-based models, proposed by Malek [5] (asymmetric comparison model) and by Chwa and Hakimi [6] (symmetric comparison model), assume the existence of a central arbiter which gathers the comparison results. This comparison syndrome is then used to diagnose the system. Previously developed distributed diagnosis algorithms were designed for wired networks [4-10] and hence are not well suited to wireless networks.
The problem of fault detection and diagnosis in wireless sensor networks (WSNs) has been studied extensively in the literature [11-17]. The problem of identifying faulty (crashed) nodes in a WSN has been studied in [11], which proposes the WINdiag diagnosis protocol that creates an ST for dissemination of diagnostic information. The authors in [12] have proposed a fault-tolerant detection scheme that explicitly introduces the sensor fault probability into the optimal event detection process, where the optimal detection error decreases exponentially as the neighborhood size increases. Elhadef et al. have proposed a distributed fault identification protocol called Dynamic-DSDP for MANETs which uses an ST and a gossip-style dissemination strategy [13]. In [14], a localized fault diagnosis model for WSNs is proposed that executes in tree-like networks. The approach is based on local comparisons of sensed data and dissemination of the test results to the remaining sensors.
In [15], the authors have presented a distributed fault detection model for wireless sensor networks where each sensor node identifies its own state based on local comparisons of sensed data against some thresholds and dissemination of the test results. Krishnamachari and Iyengar have presented a Bayesian fault recognition model to solve the fault-event disambiguation problem in sensor networks [16]. A distributed fault detection scheme for sensor networks has been proposed in [17]. It uses local comparisons with a modified majority voting in which each sensor node makes a decision based on comparisons between its own sensed data and its neighbors' data, while considering the confidence level of its neighbors.
Most of the existing literature addresses the fault detection and diagnosis problem in WSNs by considering sensor nodes as temperature, humidity, or pressure sensors. To the authors' knowledge, there has been little work on the design of a fault diagnosis model for WISNs. Although there is a considerable amount of research on fault detection and diagnosis in WSNs, the current approaches may not be suitable for WISNs because of the associated processing and communication cost. Czarlinska and Kundur [18] have investigated the event acquisition properties of WISNs. The techniques considered include lightweight image processing, decisions from n sensors with or without cluster head fault, and attack detection. In [19], the authors have investigated the problem of image transport over error-prone wireless sensor networks, where a two-state Markov model of node transitions between an on and an off state is considered. That work does not investigate any node failure detection scheme. In [20], an improved distributed fault detection scheme is proposed which performs better in terms of detection accuracy but needs more message exchanges and is thus not energy efficient. In [21], the authors have proposed FIND, a method to detect nodes with data faults. In their work, nodes are ranked based on their sensor readings as well as their physical distances from the event. A node is considered faulty if there is a significant mismatch between the sensor data rank and the distance rank.
It is worth discussing why a fault detection model for image sensor nodes is indispensable. First, image data requires transmission bandwidth that is orders of magnitude higher than that supported by currently available sensors. Second, image compression models require complex hardware and make the energy consumption for computation comparable to the communication energy dissipation. If a faulty image sensor node is allowed to participate in network activity, then the data it generates will be routed to the sink node, and all the intermediate nodes will dissipate energy in relaying this faulty information. For a high rate of node failure, this leads to a severe decrease in network lifetime and wasted network bandwidth.

System Model
The proposed model considers a densely deployed wireless sensor network which includes camera-equipped nodes. It is assumed that there are n sensor nodes nonuniformly distributed in a square area of side L, which is much larger than the communication range of the sensors. Every camera-equipped node is a full-function device (FFD). A node responds to an image query by generating a raw image within its sensing area, compressing the raw image, and then applying a forward error correcting (FEC) code before transmitting the image, which is the general process of image transport in a WISN.
The proposed model considers both hard and soft faults [22]. In a hard-fault situation, the sensor node is unable to communicate with the rest of the network, whereas a node with a soft fault continues to operate and communicate with altered behavior. These malfunctioning (soft faulty) sensors could participate in the network activities since they are still capable of routing information. The proposed model assumes that the sensor fault probability p is uncorrelated and symmetric, that is,
$$P(S = 0 \mid A = 1) = P(S = 1 \mid A = 0) = p, \quad (1)$$
where S is the image data sensed by the sensor node and A is the actual image data.

Architecture of Proposed Wireless Image Sensor Nodes.
In this section, the architecture of the proposed image sensor nodes is described in detail (Figure 1). CMOS image sensors have received greater attention over the last few decades because their performance is very promising compared to CCDs [2,3]. However, remote and dangerous environments put more stress on the image sensing system (from radiation, heat, or pressure), possibly leading to pixel failure while making the replacement of faulty systems difficult. A fault-tolerant architecture [23] for the CMOS camera can be adapted that effectively combines hardware redundancy in the active pixel sensor (APS) cells with software correction techniques. This fault-tolerant architecture can, however, tolerate only up to a certain pixel failure rate ($PF_{rate}$), beyond which the quality reduction (QR) of a corrected image may not be tolerable, and the CMOS camera may be detected as faulty.
Uncompressed raw image data require excessive bandwidth for a multihop wireless environment. Conventional image compression models [24] are not suitable for resource-constrained wireless sensor networks because they require complex hardware and make the energy consumption for computation comparable to the communication energy dissipation. The proposed architecture uses the compression technique suggested in [25].
Forward error correction coding is required to achieve reliable transmission. The proposed architecture uses Reed-Solomon (RS) codes to identify and correct errors in transmission. Coding redundancy determines the error correction capability of an RS code. A self-checking RS encoder [26] is used by the proposed architecture. As suggested in [3], the wireless connection to other motes in the network can be established through a Texas Instruments CC2420 2.4 GHz IEEE 802.15.4/ZigBee-ready RF transceiver. Each ZigBee device holds information about the devices located within its transmission range in a table called the neighbor table N(i). As suggested by the authors in [2], Samsung's S3C44B0X is adopted as the embedded processor of the image sensor node.

Distributed Fault Diagnosis Scheme
This section describes the novel model for energy-efficient diagnosis of WISNs. The proposed diagnosis scheme has two main phases: (i) the detection phase and (ii) the dissemination phase.

The Detection Phase.
In this phase, the node enters normal mode (the S3C44B0X has four main modes: normal, slow, idle, and stop). Normal mode supplies clocks to the CPU as well as to all peripherals in the S3C44B0X. The CPU wakes the image sensor and the image processing module from power-down mode, and the image sensor starts to capture an image. In spite of the fault-tolerant architecture described in Section 3.1, an image produced by the image sensor may not be acceptable if the pixel failure rate is high. The CPU therefore calculates the quality reduction (QR) of the corrected image using the methods suggested in [23] and then decides whether or not to discard the image reading by comparing QR with a threshold $I_{th}$. The embedded processor sets $F_{state}$ to soft faulty if $QR \geq I_{th}$. The RS-encoder fault status of the proposed architecture is determined from $PC_{out}$, the output of the parity checker [26] (equation (2)). Using (2), the embedded processor sets $F_{state}$ to soft faulty or fault-free. The image processing module fetches the 8 × 8 test image stored in shared memory. The test image is processed in the processing module, and the generated coded bit stream is sent to the embedded processor. The processed image is then packed into the diagnosis packet format required by the network protocol. The CPU configures the CC2420 into transmission mode, the packets are broadcast by the CC2420, and the node returns to the receive state.
For each fault-free sensor node, its fault-free neighbors have broadcast similar coded information. Let $v_i$ be a neighbor of $v_j$, and let $C_i$ denote the coded information at node $v_i$. Node $v_i$ agrees with $v_j$ only when the Hamming distance between $C_i$ and $C_j$ satisfies $H_{i,j} \leq \delta$, where $H_{i,j}$ is the number of ones in $C_i \oplus C_j$ and $\delta$ is the maximum number of bits a Reed-Solomon decoder can correct. For RS(n, r) with s-bit symbols, $\delta = \lfloor (n - r)/2 \rfloor$. An arbitrary node $v_i$ receives the sensor readings from its neighboring nodes and forms a set $E \subset N(i)$ of nodes with similar reading S. Node $v_i$ then compares its own reading $S_i$ with these readings and takes a decision on the basis of agreement and disagreement. In this phase, each sensor node decides whether or not to discard its own sensor reading in the face of the evidence $|E|$, QR, and $PC_{out}$. A formal description of this phase is presented in Algorithm 1. The value of the threshold is $T_h = 0.5(N - 1)$ (see the Appendix).
Algorithm 1
(1) Obtain the sensor reading (image).
(2) Evaluate $QR_i$ and $RS_{status_i}$.
(3) Broadcast the coded test image $S_i$.
(4) Set timer $T_{out}$.
(5) Obtain the sensor readings of the 1-hop neighbors $\{N_i\}$.
(6) if $T_{out}$ = true then
(7) Declare unreported nodes $v_j \in \{N_i\}$ as hard faulty, i.e., $F_{state_j} \leftarrow$ hard faulty.
(8) end if
(9) Determine $\{E\}$, the set of 1-hop neighbors that report identical sensed data S.
The detection algorithm uses a timeout mechanism to detect hard faulty nodes. Node $v_i$ declares node $v_j \in N_i$ hard faulty if $v_i$ does not receive the sensor reading from $v_j$ before $T_{out}$. Node $v_j$ cannot report to $v_i$ if the transceiver of $v_j$ is faulty, its battery is drained, or the node is completely damaged. At the end of the detection phase, every fault-free node in the network has a local diagnostic view.
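The decision rule of the detection phase can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the use of `None` to represent a neighbor that did not report before $T_{out}$, and the choice of taking N − 1 as the number of entries in the neighbor table are all assumptions made for the example.

```python
from typing import Dict, Optional, Set, Tuple


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits, i.e. the number of ones in (a XOR b)."""
    if len(a) != len(b):
        raise ValueError("coded messages must have equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def local_diagnosis(own_code: bytes,
                    received: Dict[str, Optional[bytes]],
                    delta: int,
                    qr: float,
                    i_th: float,
                    rs_ok: bool) -> Tuple[str, Set[str]]:
    """Return (own state, set of neighbors declared hard faulty).

    `received` maps every 1-hop neighbor id to its coded test image, or to
    None when nothing arrived before the timeout (hypothetical encoding).
    """
    # Timeout rule: unreported neighbors are declared hard faulty.
    hard_faulty = {nid for nid, code in received.items() if code is None}

    # E: reporting neighbors whose code agrees with ours (H_ij <= delta).
    agreeing = [nid for nid, code in received.items()
                if code is not None
                and hamming_distance(own_code, code) <= delta]

    # T_h = 0.5 (N - 1); N - 1 is assumed to be the neighbor-table size.
    t_h = 0.5 * len(received)

    # Fault-free only if the image quality, the RS encoder check and the
    # neighbor agreement test all pass.
    healthy = qr < i_th and rs_ok and len(agreeing) > t_h
    return ("fault-free" if healthy else "soft faulty"), hard_faulty


# Example with made-up two-byte codes and four neighbors, one of them silent.
own = bytes([0x0F, 0x00])
received = {
    "a": bytes([0x0F, 0x01]),
    "b": bytes([0x0F, 0x00]),
    "c": bytes([0x0E, 0x00]),
    "d": None,
}
state, hard = local_diagnosis(own, received, delta=2,
                              qr=0.10, i_th=0.30, rs_ok=True)
```

In the example, three of the four neighbors agree with the node within the tolerance, which exceeds the threshold of 2, so the node keeps its reading and reports only neighbor `d` as hard faulty.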

Dissemination Phase.
The local diagnostic snapshots are disseminated to obtain a global diagnostic view of the network. The local diagnostic views are disseminated using an ST which is constructed immediately after the deployment of the network. This work uses the UDG-NNT algorithm [27] to construct an ST in which each node is assigned a rank. The sink node has the highest rank in the network. Each node $v_i$, except the sink node, selects the nearest node $v_j$ among its neighbors such that rank($v_i$) < rank($v_j$) and sends a connect message to $v_j$ to inform it that ($v_i$, $v_j$) is an edge in the ST. To maintain a connected ST, nodes check immediately after the detection phase whether they are still connected to the ST. If a node notices that its parent is faulty, it sends a connect message to the nearest fault-free node with higher rank.
All leaves of the ST send their local diagnostic views to their parents. Each parent has to wait until it has collected the diagnostics from each of its children. Once the parent has collected the diagnostics, it combines all of them with its own local diagnostic and updates its fault table. After the update, the aggregated diagnostic message is transmitted to its parent in the ST, and the process continues until the sink node has collected all the local diagnostics. Once the sink node has the global diagnostic view, it disseminates it down the tree to all nodes. The proposed model can now identify the set of faulty nodes $\{v_i\}_{i \in F_T}$ present in the network. Here, $F_T$ is the true set of faulty nodes present in the network at time T, and the set of faulty nodes inferred by the model is $\hat{F}_T$. The difference between $F_T$ and $\hat{F}_T$, that is, $(F_T - \hat{F}_T)$, is the diagnosis error.
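The parent-selection rule described above can be illustrated with a short Python sketch. It assumes that ranks and node positions are already known (how UDG-NNT assigns ranks is not covered here), and the names `choose_parent` and `build_tree` are hypothetical helpers introduced for the example.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Tuple

Position = Tuple[float, float]


def choose_parent(node: str,
                  positions: Dict[str, Position],
                  ranks: Dict[str, int],
                  neighbors: Dict[str, List[str]],
                  faulty: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Nearest fault-free neighbor with strictly higher rank, else None."""
    candidates = [v for v in neighbors[node]
                  if v not in faulty and ranks[v] > ranks[node]]
    if not candidates:
        return None  # the sink (highest rank) has no parent
    return min(candidates,
               key=lambda v: math.dist(positions[node], positions[v]))


def build_tree(positions: Dict[str, Position],
               ranks: Dict[str, int],
               neighbors: Dict[str, List[str]],
               faulty: FrozenSet[str] = frozenset()) -> Dict[str, Optional[str]]:
    """Map every fault-free node to its parent in the spanning tree."""
    return {u: choose_parent(u, positions, ranks, neighbors, faulty)
            for u in positions if u not in faulty}


# Made-up four-node layout; the sink has the highest rank.
positions = {"sink": (0.0, 0.0), "a": (1.0, 0.0), "b": (2.0, 0.0), "c": (2.0, 1.0)}
ranks = {"sink": 3, "a": 2, "b": 1, "c": 0}
neighbors = {"sink": ["a"], "a": ["sink", "b", "c"], "b": ["a", "c"], "c": ["a", "b"]}

tree = build_tree(positions, ranks, neighbors)
# Repair after the detection phase: b is found faulty, so c picks a new parent.
repaired = build_tree(positions, ranks, neighbors, faulty=frozenset({"b"}))
```

Calling `build_tree` again with the set of nodes found faulty mimics the maintenance step: node `c` initially attaches to `b`, its nearest higher-ranked neighbor, and reattaches to `a` once `b` is excluded.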
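The leaf-to-sink aggregation and the diagnosis error can be sketched as follows. This is an illustrative model only: each local view is reduced to a set of node ids believed faulty, the tree is given as a child-to-parent map, and the function name `gather_global_view` is an assumption of the example.

```python
from collections import deque
from typing import Dict, Optional, Set


def gather_global_view(parent: Dict[str, Optional[str]],
                       local_views: Dict[str, Set[str]]) -> Set[str]:
    """Merge local faulty sets up the tree until the sink holds them all."""
    roots = [v for v, p in parent.items() if p is None]
    if len(roots) != 1:
        raise ValueError("the spanning tree must have exactly one sink")
    sink = roots[0]

    children: Dict[str, list] = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)

    # Breadth-first order from the sink; reversing it visits children
    # before their parents, matching "each parent waits for its children".
    order, queue = [], deque([sink])
    while queue:
        v = queue.popleft()
        order.append(v)
        queue.extend(children[v])

    merged = {v: set(local_views.get(v, ())) for v in parent}
    for v in reversed(order):
        if parent[v] is not None:
            merged[parent[v]] |= merged[v]
    return merged[sink]


# Made-up tree and local views.
parent = {"sink": None, "a": "sink", "b": "a", "c": "a"}
local_views = {"sink": set(), "a": {"x", "z"}, "b": {"x"}, "c": {"y"}}

inferred = gather_global_view(parent, local_views)   # the inferred faulty set
true_faulty = {"w", "x", "y", "z"}                   # F_T, known only in simulation
diagnosis_error = true_faulty - inferred             # F_T minus the inferred set
```

The iterative traversal is used instead of recursion because the tree depth can reach n − 1 in the worst case.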

Performance Evaluation
Four performance metrics, namely, diagnosis latency, message complexity, detection accuracy (DA), and false detection rate (FDR), are used to evaluate the performance of the proposed algorithm. DA is defined as the ratio of the number of faulty sensor nodes detected to the total number of faulty sensor nodes in the network. FDR is defined as the ratio of the number of fault-free sensor nodes detected as faulty to the total number of fault-free nodes in the network. The upper bound on the time complexity is expressed in terms of the following bounds: (i) $T_p$: an upper bound on the time needed to propagate a message between sensor nodes; (ii) $T_{dip}$: an upper bound on the time required to encode (compression and RS encoding) the image.
Lemma 1. The proposed diagnosis model terminates before time $T_{dip} + (2d_{st} + 3)T_p + T_{out}$, where $d_{st}$ is the depth of the spanning tree.
Proof. The detection phase takes at most $T_{dip} + T_{out}$ time for a node to detect its own status and to obtain the IDs of its hard faulty 1-hop neighbors. In the ST maintenance phase, a node with a faulty parent needs at most $3T_p$ time to reconnect to the ST. In at most $d_{st}T_p$, the sink node obtains the global diagnostic view of the network. The sink node then disseminates this view, which reaches the farthest node in at most $d_{st}T_p$.
In the worst case, $d_{st} = n - 1$. The upper bound on the time complexity can then be expressed as
$$T_{dip} + (2(n - 1) + 3)T_p + T_{out} = T_{dip} + (2n + 1)T_p + T_{out} = O(n).$$
The total number of messages exchanged by nodes to establish a complete and correct diagnosis is termed the message complexity.
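The two accuracy metrics defined above, DA and FDR, translate directly into set arithmetic. The sketch below uses hypothetical function names and made-up node ids; it is meant only to make the two ratios concrete.

```python
from typing import Set


def detection_accuracy(true_faulty: Set[int], detected: Set[int]) -> float:
    """DA: faulty nodes correctly detected / all faulty nodes."""
    if not true_faulty:
        raise ValueError("DA is undefined when there are no faulty nodes")
    return len(true_faulty & detected) / len(true_faulty)


def false_detection_rate(all_nodes: Set[int],
                         true_faulty: Set[int],
                         detected: Set[int]) -> float:
    """FDR: fault-free nodes flagged as faulty / all fault-free nodes."""
    fault_free = all_nodes - true_faulty
    if not fault_free:
        raise ValueError("FDR is undefined when every node is faulty")
    return len(detected & fault_free) / len(fault_free)


# Made-up example: 20 nodes, 4 truly faulty, one miss and one false alarm.
all_nodes = set(range(20))
true_faulty = {0, 1, 2, 3}
detected = {0, 1, 2, 7}

da = detection_accuracy(true_faulty, detected)
fdr = false_detection_rate(all_nodes, true_faulty, detected)
```

Here three of the four faulty nodes are found and one of the sixteen fault-free nodes is wrongly flagged.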
Lemma 2. The proposed model has a worst-case message complexity O(n) in the network.
Proof. The diagnosis starts at each node by sending the coded message to its neighbors, costing one message per node, that is, n messages in the network. In the ST maintenance phase, a node with a faulty parent needs three message exchanges to reconnect to the ST. In the worst case, all nodes except the sink node need to find a new parent to maintain the ST, that is, 3(n − 1) messages need to be exchanged in the network to maintain the ST. Each node, excluding the sink, sends one local diagnostic message. Each node, excluding the leaf nodes, sends one global diagnostic message, and in the worst case, the depth of the ST is n − 1. Thus, the message cost for disseminating diagnostic messages is 2(n − 1). So, the total number of exchanged messages is
$$n + 3(n - 1) + 2(n - 1) = 6n - 5 = O(n).$$

Simulation Results.
The performance of the proposed scheme is evaluated via simulations in this section. This work uses OMNeT++ as the simulation tool, and all simulations are conducted on networks using IEEE 802.15.4 at the MAC layer. The free space physical layer model is adopted, in which all nodes within the transmission range of a transmitting node receive a packet transmitted by the node after a very short propagation delay. The simulation parameters are summarized in Table 1.
The RS code is used with m = 8 bits per symbol, n = 255, and r = 223. For the RS encoder, the time cost is 1.02 ms to encode the bit stream for an 8 × 8 image. The time consumed in compression is 4.08 ms [25] (for the 8 × 8 test image). The threshold value is $I_{th}$ = 30% pixel failure rate. The test image used is an 8 × 8 block of the Lena image. Every result shown is the average of 100 experiments. Each experiment uses a different randomly generated topology.
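The quantities quoted above can be plugged into the bounds of Lemmas 1 and 2 to get a feel for their size. The sketch below is a worked calculation under stated assumptions: the RS parameters and encoding times come from the text, whereas the values used for $T_p$, $T_{out}$, and the tree depth are placeholders chosen for illustration (the actual settings are those of Table 1).

```python
def rs_delta(n: int, r: int) -> int:
    """delta = floor((n - r) / 2) for an RS(n, r) code, as used in the text."""
    return (n - r) // 2


def latency_bound(t_dip: float, t_p: float, t_out: float, d_st: int) -> float:
    """Lemma 1: T_dip + (2 d_st + 3) T_p + T_out."""
    return t_dip + (2 * d_st + 3) * t_p + t_out


def message_bound(n: int) -> int:
    """Lemma 2 terms: n broadcasts + 3(n-1) ST repair + 2(n-1) dissemination."""
    return n + 3 * (n - 1) + 2 * (n - 1)


# Values taken from the text.
delta = rs_delta(255, 223)
t_dip_ms = 1.02 + 4.08          # RS encoding time + compression time, in ms

# Placeholder values, NOT taken from the paper's Table 1.
t_p_ms, t_out_ms, depth = 1.0, 10.0, 4

bound_ms = latency_bound(t_dip_ms, t_p_ms, t_out_ms, depth)
worst_case_ms = latency_bound(t_dip_ms, t_p_ms, t_out_ms, d_st=100 - 1)
messages = message_bound(100)
```

With a 100-node network the worst-case tree depth is 99, so the latency bound grows linearly with n through the $(2d_{st} + 3)T_p$ term, while the message bound evaluates to 6n − 5.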

Experiment 1.
In this experiment, two performance metrics of the proposed work, namely, DA and FDR, are compared with those of the schemes proposed in [15,16] for varying node failure rates and average numbers of neighbor nodes (d). In this simulation experiment, sensor nodes are assumed to be faulty with probabilities of 0.05, 0.10, 0.15, 0.20, 0.25, and 0.30. Both hard and soft faulty nodes are randomly deployed in the network. The simulation result for a low average number of neighbor nodes, d ≈ 4, is shown in Figure 2.
The main reason for not achieving extremely high performance is that, for a low d, fault-free sensor nodes are unlikely to pass the threshold test. The detection accuracy of the proposed work exceeds that of the scheme proposed in [16]. The work of [15] shows a marginal improvement over our work. The reason is that for $T_h = 0.5(N - 1)$ there is a probability that a faulty node with more than 0.5(N − 1) faulty neighbors is detected as fault-free. The scheme proposed in [15] considers $T_h \approx d$, and the probability that a node has d faulty neighbors is very low. Further, their scheme needs an additional n message exchanges in the network to achieve this marginal improvement. However, the proposed work shows better performance in terms of FDR. Put into context, since the proposed scheme will be used in WISNs, which are known to be resource constrained, it is preferable for a scheme to maintain a lower FDR and to be communication efficient. In other words, it is better to achieve high network reliability while maintaining a high level (>95%) of detection accuracy, which is what the proposed work tries to achieve. DA and FDR for d = 8 and d = 12 are plotted in Figures 3 and 4, respectively. The key conclusion from these plots is that the performance of the detection model improves as d increases. For d = 12, the DA of the proposed work is very close to that of the scheme of [15] while maintaining a low FDR. Because a high node degree is expected in wireless sensor networks, the proposed fault diagnosis scheme is robust.
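The effect of d on the threshold test can be made concrete with a small calculation. The sketch below assumes, in line with the uncorrelated fault model, that each of the d neighbors is faulty independently with probability p, so the number of faulty neighbors is binomially distributed; it then evaluates the chance that more than half of them are faulty, which is the situation in which a $T_h = 0.5(N - 1)$ majority test can be misled. The function name and the modelling choice are assumptions of this example, not results from the paper.

```python
from math import comb


def prob_faulty_majority(d: int, p: float) -> float:
    """P(more than d/2 of d neighbors are faulty), faults i.i.d. with prob p."""
    if d < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need d >= 0 and 0 <= p <= 1")
    return sum(comb(d, k) * p**k * (1 - p)**(d - k)
               for k in range(d + 1) if k > 0.5 * d)


# Highest fault probability used in Experiment 1, for the three node degrees.
risk_d4 = prob_faulty_majority(4, 0.30)
risk_d8 = prob_faulty_majority(8, 0.30)
risk_d12 = prob_faulty_majority(12, 0.30)
```

Under this simplified model the probability of a faulty majority shrinks as d grows from 4 to 12, which is consistent with the trend reported for Figures 2 to 4.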

Experiment 2.
In this experiment, the average and worst-case latency of isolating unhealthy nodes is analyzed for varying node failure rates and d = 12. Figures 5 and 6 show the diagnosis latency of the proposed work. From Lemma 1, it is clear that the dissemination of diagnostics contributes most to the change in diagnosis latency with respect to node density. The depth of the ST determines the variation in diagnosis latency, as the ST is used to disseminate the diagnostics. Thus, as expected and as depicted in Figure 6, the time required to diagnose the WISN remains almost constant as the fault rate changes.

Conclusions and Future Work
This paper presents a distributed model to address the fundamental problem of identifying faulty (soft and hard) nodes in a WISN. The model is simple and detects faulty sensor nodes with high accuracy for a wide range of fault