A Perturbed Compressed Sensing Protocol for Crowd Sensing

Crowd sensing networks are data-centric networks in which many participants upload environmental data through smart mobile devices or predeployed sensors; however, communication complexity and data confidentiality are major concerns in real applications. Compressed Sensing (CS) is a rapidly developing theory that uses nonadaptive linear projections to reduce the quantity of data and then reconstructs the original signal from them. Unfortunately, the privacy issues introduced by untrusted networks remain unresolved in practice. In this paper, we take crowd sensing using CS in a wireless sensor network (WSN) as the application scenario and propose a data collection protocol, the perturbed compressed sensing protocol (PCSP), which preserves data confidentiality while remaining practical. First, we briefly introduce CS theory and three factors that affect reconstruction quality. Second, we develop a secure CS-based framework that uses a secret disturbance to protect raw data in the WSN; in our protocol, each node collects, encrypts, measures, and transmits its sampled data. Formally, we prove that our protocol is CPA-secure on the basis of a theorem. Finally, evaluations on real and simulated datasets show that our protocol not only achieves higher efficiency than related algorithms but also protects the confidentiality of the signal.


Introduction
A crowd sensing network is a powerful sensor network that harnesses the power of the crowd. Crowd sensing is a form of wireless network sensing and can be realized with a WSN. Because a WSN consists of a large number of deployed sensors, it is limited by their relatively weak computational capability and low energy reserves. The primary task of a WSN is to sense, transmit, and process packets while keeping the energy cost to a minimum.
In traditional WSNs, where communication is conducted via an intranet or a private network, the large amount of data gathered during the collection phase must be transmitted; this consumes considerable bandwidth and can prevent commands from sensor nodes from being relayed to the information server in time. On the other hand, when trust management is left to a public network, data confidentiality may be exposed. Hence, designing secure transmission schemes has become a prerequisite for applying WSNs widely across many fields.
Compressed Sensing (CS), proposed by Candes et al. [1] and Donoho [2] in 2006, is a rapidly developing theory that, free of the constraints of the traditional signal acquisition process, captures and represents compressible signals at a sampling rate significantly lower than the Nyquist rate [3][4][5][6]. It first applies nonadaptive linear projections that preserve the structure of the signal and then reconstructs the signal from these projections by solving an optimization problem. Compressive sensing has a wide range of applications, such as compressive detection and estimation, DNA microarrays, and distributed compressed video sensing [7].
Moreover, traditional data compression methods for WSNs have several disadvantages, including the following.
(1) After the orthogonal transformation used in data compression, several important coefficients and their locations must be preserved; otherwise, the original data cannot be recovered [7].
(2) In layered multihop WSNs, hardware limitations keep the sensors' energy storage low. Nodes closer to the sink node relay more traffic, so their batteries drain faster and they die sooner, which results in imbalanced energy consumption among sensors at different positions.
Owing to the advantages of CS, more and more CS techniques have been integrated into WSNs, but most of them consider only the temporal correlation of a single node. In fact, spatial correlation also exists among the nodes of a WSN, which leads to Distributed Compressed Sensing (DCS): DCS treats the raw data as the original signal and compresses the signal before transmission. DCS has the following advantages.
(1) Each random measurement in DCS is a random linear combination of all elements of the original signal; thus, losing part of the measurements does not prevent the original signal from being reconstructed.
(2) In the DCS-based WSN model, every node transmits the same quantity of data, so energy consumption is balanced and the network lifetime is prolonged.
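Advantage (2) can be made concrete with a small count. The sketch below assumes a hypothetical chain topology in which node 1 is the leaf and node N is adjacent to the sink. With plain relaying, node i forwards the i readings it has accumulated. With CS-style aggregation, every node forwards one M-dimensional running sum. The values N = 200 and M = 125 mirror the dimensions reported in Experiment 2 and are used only for illustration.

```python
def chain_loads(n_nodes: int, m_measurements: int) -> tuple[list[int], list[int]]:
    """Values transmitted by each node of a hypothetical chain (node 1 = leaf, node n = next to sink)."""
    # Plain relaying: node i forwards its own reading plus the i - 1 readings it received.
    plain = list(range(1, n_nodes + 1))
    # CS-style aggregation: every node forwards one M-dimensional running sum.
    cs = [m_measurements] * n_nodes
    return plain, cs


if __name__ == "__main__":
    plain, cs = chain_loads(n_nodes=200, m_measurements=125)
    print("plain relaying: busiest node", max(plain), "lightest node", min(plain))
    print("CS aggregation: busiest node", max(cs), "lightest node", min(cs))
```

With aggregation, the per-node load becomes uniform, and the busiest node next to the sink sends 125 values instead of 200. This is the balancing effect described above.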
Although DCS can effectively solve the problems of traditional methods, data security cannot be overlooked, and research on CS security remains limited. Some works [8][9][10][11] modified the measurement matrix but did not apply their schemes to WSNs; others [12] applied encryption (e.g., AES) after compression to protect the data, which requires a secure network. Note that most WSNs are deployed in remote, unattended, or even hostile environments, so the reliability of individual nodes is difficult to guarantee. It is therefore crucial to design a secure model. In this paper, we propose a perturbed compressed sensing protocol (PCSP) that preserves data confidentiality with high practicality. Our contributions are as follows.
(i) We propose a perturbed compressed sensing protocol (PCSP) for crowd sensing in WSNs that explicitly reduces communication complexity.
(ii) We prove that PCSP provides data confidentiality; more specifically, PCSP is secure under chosen-plaintext attack.
(iii) We systematically evaluate PCSP against existing approaches; experiments show that PCSP achieves higher recovery accuracy.
Organization. The rest of this paper is organized as follows.
In Section 2, we review related work in the literature. Section 3 briefly introduces the main idea of CS. Section 4 describes our protocol in detail, and Section 5 discusses its security. Section 6 systematically evaluates the performance of PCSP against existing approaches, and Section 7 explains the limitations of our protocol and future work. Finally, Section 8 concludes the paper.

Related Work
Compressed Sensing (CS) is a new signal compression method that breaks through the traditional limit on sampling frequency. Through matrix computation at the encoding end, the original signal is compressed from a high dimension to a low dimension with a low sampling frequency and low computational complexity. At the decoding end, the original signal is reconstructed by solving a convex optimization problem. CS also provides a degree of encryption within its own structure, because each projection is a function of the measurement matrix, which can be regarded as a key shared between the encoding end and the decoding end.
Research on CS focuses on three factors that affect reconstruction quality: sparse representation, the measurement matrix, and the reconstruction algorithm. Sparse representation is a precondition for applying CS; common choices are the discrete cosine transform basis, the fast Fourier transform basis, the discrete wavelet transform basis, the curvelet basis, the Gabor basis, and redundant dictionaries [15]. In particular, a redundant (overcomplete) dictionary can adaptively find the optimal basis according to the sparsity of each signal, so that the signal attains the minimum sparsity on this basis and the best compression. The measurement matrix should satisfy the Null Space Property (NSP) [16] and the Restricted Isometry Property (RIP) [1,17,18,19]; such matrices include Gaussian random matrices, Bernoulli matrices, sparse random matrices, Toeplitz matrices, and circulant matrices. The works [1,2,15,20] proved that a measurement matrix composed of independent and identically distributed Gaussian random variables is incoherent with any overcomplete redundant dictionary, so the original signal can be recovered accurately even after compression. Hence, the Gaussian random matrix is one of the best choices for the measurement matrix, but it incurs high complexity, and pseudorandom matrices are an alternative used in research. In recent years, researchers have studied robust pursuit algorithms, such as greedy pursuit (including MP [21], OMP [22], StOMP [23], and ROMP [24]), convex relaxation (including BP [25], the interior point method [26], the gradient projection method [27], and the iterative thresholding method [28]), and combinations of the two (including Fourier sampling [29] and HHS [30]).
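The incoherence claim can be checked numerically. The following is a minimal sketch under two assumptions. First, an orthonormal DCT basis stands in for Ψ; the framework in this paper uses a Gabor/redundant dictionary instead. Second, the sketch uses the usual mutual-coherence definition μ(Φ, Ψ) = √N · max |⟨φ_i, ψ_j⟩|, computed over normalized rows of Φ and columns of Ψ. This quantity always lies between 1 (maximally incoherent) and √N (maximally coherent).

```python
import numpy as np


def dct_basis(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; column j is the j-th cosine atom."""
    k = np.arange(n)[:, None]  # sample index
    j = np.arange(n)[None, :]  # frequency index
    psi = np.cos(np.pi * (2 * k + 1) * j / (2 * n))
    psi[:, 0] *= np.sqrt(1.0 / n)
    psi[:, 1:] *= np.sqrt(2.0 / n)
    return psi


def mutual_coherence(phi: np.ndarray, psi: np.ndarray) -> float:
    """sqrt(N) * max |<phi_row, psi_col>| after normalising rows of phi and columns of psi."""
    rows = phi / np.linalg.norm(phi, axis=1, keepdims=True)
    cols = psi / np.linalg.norm(psi, axis=0, keepdims=True)
    return float(np.sqrt(psi.shape[0]) * np.max(np.abs(rows @ cols)))


if __name__ == "__main__":
    n, m = 200, 125
    rng = np.random.default_rng(0)
    psi = dct_basis(n)
    print("Gaussian rows vs DCT:", mutual_coherence(rng.standard_normal((m, n)), psi))
    print("DCT rows vs DCT:", mutual_coherence(psi.T[:m], psi))  # the maximum, sqrt(200)
```

For a Gaussian Φ, the value is typically on the order of √(2 ln N), far below the maximum √N ≈ 14.1. Measuring with rows of the sparsifying basis itself hits that maximum, which is why such a choice is useless for CS.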
The classic OMP algorithm [22] is a greedy pursuit method whose basic operation is the inner product. In each iteration, the column most correlated with the residual of the compressed measurements is selected, until the sparse representation of the original signal is found. The original signal is then retrieved through the inverse sparse transform and decryption. OMP is easy to implement, but it requires a relatively large number of measurements.
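A compact OMP can mirror the description above: inner-product correlation, greedy column selection, and a least-squares refit on the selected support. This is a generic textbook version, not the paper's Algorithm 1 or its POMP variant. Running exactly k iterations (k = the known sparsity) is an assumed stopping rule.

```python
import numpy as np


def omp(theta: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Orthogonal matching pursuit: greedily find a k-sparse alpha with theta @ alpha close to y."""
    n = theta.shape[1]
    residual = y.astype(float)
    support: list[int] = []
    coef = np.zeros(0)
    for _ in range(k):
        # Inner product of the residual with every column; pick the most correlated one.
        scores = np.abs(theta.T @ residual)
        scores[support] = -1.0  # never reselect a column already in the support
        support.append(int(np.argmax(scores)))
        # Orthogonal step: least-squares fit of y on all selected columns.
        sub = theta[:, support]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ coef
    alpha = np.zeros(n)
    alpha[support] = coef
    return alpha


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, m, k = 200, 125, 5
    theta = rng.standard_normal((m, n)) / np.sqrt(m)
    alpha = np.zeros(n)
    alpha[rng.choice(n, size=k, replace=False)] = 1.0 + rng.random(k)
    estimate = omp(theta, theta @ alpha, k)
    print("max abs error:", np.max(np.abs(estimate - alpha)))
```

The least-squares refit keeps the residual orthogonal to every selected column. This is what distinguishes OMP from plain matching pursuit, and it is why a column is never usefully selected twice.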
Since CS was proposed, using it to provide data security has also been a research hotspot. The works [30][31][32][33] pointed out that linear projection onto a measurement matrix inherently protects data secrecy to some extent. The work in [30] analyzed the security of CS under several possible attacks. The work in [31] compared CS with other encryption methods through quantization. The works [32,33] designed the measurement matrix as a symmetric secret key so that eavesdroppers cannot obtain the original signal. The work in [12] adopted AES and SHA to provide data confidentiality and data integrity after data compression.
To address the security problem raised by applying CS to WSNs, this paper proposes an encryption method based on the existing DCS model. Analysis and experiments show that our approach provides data confidentiality with high accuracy.

Preliminary
First, let us review the basic principles of CS. CS theory states that an $N$-dimensional original signal $x \in \mathbb{R}^{N}$ can be linearly projected into an $M \times 1$ vector $y$ ($M < N$) by an $M \times N$ measurement matrix $\Phi$. Using an orthogonal basis or atom set $\Psi$, such as a Gabor basis or a redundant dictionary [15] (the latter is used in our framework), $x$ can be represented by a coefficient vector $\alpha$ with only $K$ nonzero elements, that is,
$$x = \Psi\alpha. \qquad (1)$$
We call $x$ $K$-sparse, and we call the solution $\alpha$ of (1) the sparse representation or sparse decomposition. To further explain (1), we have
$$x = \sum_{i} \alpha_i \psi_i \qquad (2)$$
and
$$\|\alpha\|_0 = K \ll N, \qquad (3)$$
where $\psi_i$ is the $i$th column of $\Psi$. Then $x$ can be projected onto the $M \times N$ measurement matrix $\Phi$ to obtain the $M \times 1$ vector
$$y = \Phi x = \Phi\Psi\alpha = \Theta\alpha,$$
where $\Theta = \Phi\Psi$ is the sensing matrix. The measurement matrix must satisfy NSP and RIP. In [18,34], Gaussian random matrices are proved to be appropriate, so we use one in our protocol to measure the signal. The $M$-dimensional projection $y$ is then transmitted to the receiver to recover the original signal. As introduced in Section 2, OMP is a classical recovery algorithm in the CS field that obtains the sparse coefficients of the data. Therefore, our recovery algorithm is based on OMP, which is described in Algorithm 1.
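The measurement model can be traced end to end with concrete shapes. This is a minimal sketch under stated assumptions. N = 200 and M = 125 follow the compression reported in Experiment 2, while K = 6 is hypothetical. A random orthonormal basis obtained by QR stands in for Ψ; the paper's framework uses a Gabor/redundant dictionary.

```python
import numpy as np

N, M, K = 200, 125, 6  # N, M as in Experiment 2; K is a hypothetical sparsity

rng = np.random.default_rng(0)

# Stand-in orthonormal basis Psi (columns are atoms); not the paper's Gabor dictionary.
psi, _ = np.linalg.qr(rng.standard_normal((N, N)))

# K-sparse coefficient vector alpha, as in (1)-(3).
alpha = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
alpha[support] = rng.standard_normal(K)

x = psi @ alpha                                 # original signal, x = Psi alpha
phi = rng.standard_normal((M, N)) / np.sqrt(M)  # Gaussian measurement matrix Phi
y = phi @ x                                     # M-dimensional projection sent to the receiver
theta = phi @ psi                               # sensing matrix Theta = Phi Psi

print(x.shape, y.shape, theta.shape)  # (200,) (125,) (125, 200)
```

Because $y = \Theta\alpha$, a recovery algorithm such as OMP works on $\Theta$ and $y$ to find the sparse $\alpha$. The signal then follows as $x = \Psi\alpha$.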

Perturbed Compressed Sensing Protocol (PCSP)
Network Assumption.
For simplicity, we refer to both smart devices and sensors as nodes in the rest of the paper. We assume a general multihop network of alive nodes, a sink node, and a trusted server. An overview of the WSN is shown in Figure 1. Each node must register with the trusted registration authority RA to share a secret key with the server. Nodes can collect environmental information such as temperature, humidity, and pressure. They can also receive node information from the previous-hop node and forward it to the next-hop node. The server can compute each node's corresponding measurement coefficient matrix and use it as the sensing matrix for reconstruction.
Algorithm 1 (OMP). Input: compressed signal, sensing matrix, and signal sparsity; each iteration computes a least-squares solution.
Each alive node generates a packet. As packets travel towards the sink node, our protocol lets each node choose, as its next-hop node, the nearest neighbor whose distance to the sink node is smaller than its own. The category of collected information is distinguished by the network-layer data packets. The format of a packet (8 bytes) is shown in Figure 2. ID is the ID number of the current node. Flag indicates the category of information collected by the node (1 for temperature, 2 for humidity, 3 for pressure, 4 for light, and 5 for salinity). The value of the collected environmental information is stored in Value. ID List is NULL if the node is a leaf node; otherwise, it is the received ID List with the ID of the current node appended. Round Number stores the number of the information-gathering round, and Check holds the checksum of all bits. Because our framework is independent of the network-layer protocol, the data are abstracted as a pure digital signal in the following discussion.
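The next-hop rule and the ID List field can be sketched as follows. The sketch rests on several assumptions not specified in the text: node positions are known 2-D coordinates, "distance" is Euclidean, and ties are broken by the smaller node ID. The function names `next_hop` and `id_list_for` are hypothetical.

```python
import math
from typing import Optional

Point = tuple[float, float]


def next_hop(node: Point, neighbors: dict[int, Point], sink: Point) -> Optional[int]:
    """Nearest neighbour that is strictly closer to the sink than `node`, or None if there is none."""
    own_distance = math.dist(node, sink)
    candidates = [
        (math.dist(node, pos), nid)             # rank by distance from the current node
        for nid, pos in neighbors.items()
        if math.dist(pos, sink) < own_distance  # keep only neighbours that make progress
    ]
    return min(candidates)[1] if candidates else None


def id_list_for(own_id: int, received: Optional[list[int]], is_leaf: bool) -> Optional[list[int]]:
    """ID List field: NULL (None) for a leaf, otherwise the received list with own_id appended."""
    if is_leaf:
        return None
    return (received or []) + [own_id]


if __name__ == "__main__":
    sink = (0.0, 0.0)
    neighbours = {1: (1.0, 0.0), 2: (3.0, 0.5), 4: (2.0, 1.0)}
    print(next_hop((3.0, 0.0), neighbours, sink))  # 4
    print(id_list_for(9, [4, 5], is_leaf=False))   # [4, 5, 9]
```

Neighbour 2 is excluded because it is farther from the sink than the current node. Among the remaining candidates, neighbour 4 is closer to the current node than neighbour 1, so it is chosen.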

Adversarial Model.
We consider a polynomially bounded adversary capable of completely controlling a certain number of nodes. Once the adversary compromises a node, it obtains all of the node's secret keys and can modify, forge, or discard messages, or simply transmit false aggregation results. Its goal is to launch stealthy attacks [35], that is, to make the server accept false aggregation results without being detected.

PCSP.
We assume that the final result sensed by the nodes is an $N \times 1$ vector $x$. To ensure confidentiality, disturbances $d(i, r)$ ($i = 1, 2, \ldots, N$) are added to the corresponding elements of $x$, where $r$ is the round number. Each node $i$ encrypts its sensory data $x_i$ to $\mathrm{Enc}(x_i)$, which is then linearly projected onto the measurement matrix $\Phi$. From the perspective of the whole network, the raw data $x$ are changed to encrypted data $\mathrm{Enc}(x)$, which are transformed into compressed data $y$. When the final projection $y$ arrives at the trusted server from the sink node through the Internet, the server runs a perturbed orthogonal matching pursuit (POMP) algorithm to recover $\mathrm{Enc}(\hat{x})$ and then decrypts it to obtain the original data $\hat{x}$. The data transformation in PCSP is shown in Figure 3. Our protocol consists of two major components, described as follows.

Data Compression and Encryption during Free Routing.
Before the nodes start sensing, the trusted server performs the preparation work shown in Algorithm 2.
Each node $i$ is responsible for collecting raw data, computing its linear projection on the measurement matrix, and forwarding the message, as described below.
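In round $r$, node $i$ first senses raw data $x_i$ (e.g., temperature) and encrypts it to the ciphertext
$$\mathrm{Enc}(x_i) = x_i + H(k_i, r),$$
where $k_i$ is the secret key of node $i$ and $H$ is a hash function. Over the whole network, $\mathrm{Enc}(x)$ can thus be viewed as
$$\mathrm{Enc}(x) = x + \big(H(k_1, r), H(k_2, r), \ldots, H(k_N, r)\big)^{T}.$$
Node $i$ then computes its corresponding measurement coefficient vector $\phi_i$, which is the $i$th column of the measurement matrix $\Phi$. Finally, node $i$ forwards the message
$$y^{(i)} = \phi_i \, \mathrm{Enc}(x_i)$$
to the next node. After receiving $y^{(i)}$, node $j$ (which obtains $\phi_j$ in the same way) only needs to add its own measurement $\phi_j \, \mathrm{Enc}(x_j)$ to $y^{(i)}$ and send the result to its next hop, until the last node sends the data.
Algorithm 2 (initialization at the trusted server). Input: length $N$, key generation algorithm keyGen, original signal $x$.
(1) round number $r = 1$;
(2) for $i \leftarrow 1$ to $N$: $k_i$ = keyGen(…); distribute $k_i$ to node $i$;
(3) construct the Gabor dictionary parameter group ⟨…, …/2, …, V⟩;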
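The node-side steps (perturb, project onto one column, add to the running sum) can be sketched in a few lines. Several pieces here are assumptions made for illustration only. The sketch assumes the additive form Enc(x_i) = x_i + H(k_i, r) and instantiates H with HMAC-SHA256, mapping its output to [0, 1); a deployment would need a mask range wide enough to hide the readings. It derives φ_i from a seed and the node ID so that the server can regenerate it. The helper names, keys, and readings are hypothetical.

```python
import hashlib
import hmac
import random
from typing import Optional


def disturbance(key: bytes, round_no: int) -> float:
    """Hypothetical H(k_i, r): HMAC-SHA256 over the round number, top 53 bits mapped to [0, 1)."""
    digest = hmac.new(key, round_no.to_bytes(8, "big"), hashlib.sha256).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def encrypt(reading: float, key: bytes, round_no: int) -> float:
    """Enc(x_i) = x_i + H(k_i, r) (assumed additive form)."""
    return reading + disturbance(key, round_no)


def phi_column(node_id: int, m: int, seed: int = 2024) -> list[float]:
    """Pseudorandom Gaussian column phi_i, reproducible by the server from (seed, node_id)."""
    rng = random.Random(seed * 1_000_003 + node_id)
    return [rng.gauss(0.0, 1.0) for _ in range(m)]


def forward(message: Optional[list[float]], node_id: int, reading: float,
            key: bytes, round_no: int, m: int) -> list[float]:
    """Add this node's measurement phi_i * Enc(x_i) to the running sum and return it."""
    c = encrypt(reading, key, round_no)
    column = phi_column(node_id, m)
    if message is None:  # the first node on the path starts from zero
        message = [0.0] * m
    return [acc + v * c for acc, v in zip(message, column)]


if __name__ == "__main__":
    m, round_no = 4, 1
    keys = {i: f"key-{i}".encode() for i in range(1, 6)}       # hypothetical keys from keyGen
    readings = {1: 21.5, 2: 22.0, 3: 21.8, 4: 22.3, 5: 21.9}  # hypothetical temperatures
    message = None
    for node_id in (1, 2, 3, 4, 5):                            # path towards the sink
        message = forward(message, node_id, readings[node_id], keys[node_id], round_no, m)
    print(message)
```

Because every step is linear, the final message equals $\Phi\,\mathrm{Enc}(x) = \Phi x + \Phi d$, where $d$ collects the $H(k_i, r)$ values. A server that knows every $k_i$ and $r$ can therefore recompute the perturbation term $\Phi d$.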

Security Analysis
Adversaries can compromise a fraction of the nodes in the sensor network. Once a node is compromised, its private information, such as its secret key and ID, is leaked to the adversary, who can then launch a stealthy attack to make the server accept false data without being detected.
We consider the situation in which the adversary tries to forge a valid $\mathrm{Enc}(x_i)$ without knowledge of the key $k_i$. The success probability depends on the pseudorandomness of the chosen hash function: for an $\ell$-bit hash output, a blind guess produces an authentic $\mathrm{Enc}(x_i)$ with probability about $1/2^{\ell}$. Formally, Theorem 1 shows that our protocol is secure under chosen-plaintext attack.

Theorem 1. If the function $H$ used to generate the disturbances is a pseudorandom function, the PCSP scheme is secure under a chosen-plaintext attack.
Proof. Let $f$ be a truly random function. We construct a scheme PCSP′ that is identical to PCSP except that the pseudorandom function $H$ is replaced by $f$. Since $f$ is a random function, the probability that the adversary identifies the correct plaintext from the challenge ciphertext in PCSP′ is exactly 1/2. Now consider PCSP in the chosen-plaintext attack experiment, and let the probability that the adversary wins be $1/2 + \varepsilon(\lambda)$, where $\lambda$ is the security parameter. We construct a distinguisher $D$, with oracle access to a function $O$ that is either $H$ or $f$, which runs the adversary against the scheme in the chosen-plaintext attack experiment as follows.
(1) When a message $m$ needs to be encrypted, $D$ sends the adversary $m + O(k, r)$.
(2) When two plaintexts $m_0$ and $m_1$ are received, $D$ flips a coin $b \in \{0, 1\}$ and sends the adversary $m_b + O(k, r)$. Here $O$ is either the pseudorandom function or a random function.
(3) When the adversary outputs its guess $b'$, $D$ outputs "$O = H$" if $b' = b$ (the adversary wins); otherwise, $D$ outputs "$O = f$".
From the viewpoint of $D$, if $O = H$, the adversary wins with probability $1/2 + \varepsilon(\lambda)$; otherwise, it wins with probability exactly 1/2, since the challenge ciphertext is masked by a random value. Therefore, $D$ distinguishes $H$ from $f$ with advantage $\varepsilon(\lambda)$. Because $H$ is a pseudorandom function, $\varepsilon(\lambda)$ must be negligible.
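The game in the proof can be simulated for intuition. The sketch below is a toy built on assumptions. It works over integers mod 2^32 rather than real-valued readings, and uses HMAC-SHA256 truncated to 32 bits as a stand-in for H. It runs one hypothetical adversary, not all adversaries, and uses a fresh round number for the challenge. An empirical win rate near 1/2 is consistent with a negligible ε(λ) for this adversary; it is not a proof.

```python
import hashlib
import hmac
import random

MOD = 2**32  # hypothetical: plaintexts and masks are 32-bit integers, addition is mod 2^32


def prf(key: bytes, round_no: int) -> int:
    """HMAC-SHA256 truncated to 32 bits, standing in for the pseudorandom function H(k, r)."""
    tag = hmac.new(key, round_no.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(tag[:4], "big")


def cpa_win_rate(trials: int, seed: int = 7) -> float:
    """Empirical success rate of one simple adversary in the CPA game of Theorem 1."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        key = rng.getrandbits(128).to_bytes(16, "big")
        m0, m1 = 0, MOD // 2
        # Step (1): encryption-oracle query on m0 in round 1.
        oracle_ct = (m0 + prf(key, 1)) % MOD
        # Step (2): challenge ciphertext for m_b in a fresh round 2.
        b = rng.randrange(2)
        challenge = ((m0, m1)[b] + prf(key, 2)) % MOD
        # Step (3): the adversary guesses b from the difference of the two ciphertexts.
        guess = 0 if (challenge - oracle_ct) % MOD < MOD // 2 else 1
        wins += int(guess == b)
    return wins / trials


if __name__ == "__main__":
    print(cpa_win_rate(5000))
```

Because the masks for rounds 1 and 2 behave like independent uniform values, the ciphertext difference is uniform regardless of b, so this adversary's guess carries no information about b.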

Evaluation
In this section, we present performance evaluation results on real and simulated datasets. To evaluate the efficiency of our protocol, we use the estimation error from [36] to compare the accuracy of PCSP with that of three related algorithms (see Experiment 1). We then run simulations with and without encryption/decryption (Experiment 2) to show that our protocol protects data confidentiality while preserving accuracy.

Experiment 1 (comparison with related algorithms on real datasets). The datasets used in this experiment are NBDC-CTD [14] and Intel Lab [13], whose attributes are summarized in Table 1. We compare the performance of our method with the following state-of-the-art methods.
(1) Baseline. This algorithm uses basic routing and estimation methods and serves as the baseline in [36]. Sensor nodes transmit packets to the sink node along the shortest path. When the sink node receives the final packet, it sends the packet to the information server, which uses the k-Nearest Neighbors (kNN) algorithm [37] to recover the data.
(2) CDG [38]. This framework uses tree-based routing and traditional CS methods to reconstruct the data collected from the WSN. A sensor node does not send a packet to its parent node until it has received all packets from its children, so it gathers all sensor readings into one packet. The information server uses convex optimization to estimate the signal.
(3) CDC [36]. The CDC scheme uses opportunistic routing with compression and an NSRP-based estimator. The compression scheme adds or subtracts the reading of the last-hop node as the packet travels towards the sink node. The information server uses random linear projections of the orthonormal basis to estimate the coefficient vector and recover the original data, because the nonuniform sparse random projections (NSRP) used in compression preserve inner products within a small error.
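The kNN recovery step of the baseline can be sketched as follows. The sketch assumes that nodes are located by 2-D coordinates, that neighbours are ranked by Euclidean distance, and that the estimate is the unweighted mean of the k nearest known readings. The exact variant used in [36] and [37] may differ (for example, distance weighting).

```python
import math

Reading = tuple[tuple[float, float], float]  # ((x, y) position, sensed value)


def knn_estimate(target: tuple[float, float], known: list[Reading], k: int = 3) -> float:
    """Estimate the value at `target` as the unweighted mean of the k nearest known readings."""
    if not known:
        raise ValueError("need at least one known reading")
    nearest = sorted(known, key=lambda item: math.dist(item[0], target))[:k]
    return sum(value for _, value in nearest) / len(nearest)


if __name__ == "__main__":
    known = [((0.0, 0.0), 20.0), ((1.0, 0.0), 22.0), ((0.0, 1.0), 24.0), ((10.0, 10.0), 99.0)]
    print(knn_estimate((0.4, 0.4), known, k=3))  # 22.0
```

The distant outlier at (10, 10) does not affect the estimate at (0.4, 0.4), which illustrates why kNN relies on spatial correlation between nearby sensors.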
We follow a classic evaluation criterion, the estimation error (EE) [36] defined in (11), and compare the performance of our method with the CDG, CDC, and baseline algorithms. We run each algorithm 50 times and calculate its mean EE. From the results of this experiment, we infer that the estimation error of our protocol is robust to the scale of the WSN.
Experiment 2 (comparison between encrypted and unencrypted data on simulated datasets). First, the initialization algorithm (Algorithm 2) is run to start network sparse learning on both encrypted and unencrypted data. In the encryption process, nodes sense and encrypt data. We then use a pseudorandom Gaussian matrix as the measurement matrix Φ, and the final compressed signal arrives at the server, which uses the POMP algorithm to obtain x̂. In the process without encryption, nodes only sense data; we reuse the measurement matrix Φ generated in the encryption process, the compressed signal arrives at the server, and the server runs the OMP algorithm to reconstruct the original signal x̂. If the round number exceeds a threshold, the whole network is reinitialized. The parameters of both experiments are listed in Table 2.
To compare our method with the unencrypted method, we use the EE from Experiment 1 and another criterion, error. We run the experiments 50 times and calculate the mean EE. We also record the original data, recovered data, encrypted data, compressed data, EE, and error, as shown in Figure 5. Figure 5(a) shows that the original signal, the recovered encrypted signal, and the unencrypted signal follow the same trend. Figure 5(b) presents the encrypted data, from which the original data cannot be inferred. The compressed result of the encrypted data is shown in Figure 5(c); its dimensionality is lower than that of the original data (125 < 200). As shown in Figure 5, … that of the original data. Figure 5(e) shows the error density of the two experiments, with details in Figure 5(f).
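Equation (11) gives the exact form of EE. Assuming it is the relative ℓ2 error commonly used in CS-based data collection, ‖x − x̂‖₂ / ‖x‖₂, the evaluation loop (mean EE over repeated runs) reduces to the sketch below. If (11) uses a different normalization, only `estimation_error` needs to change. Both function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def estimation_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Assumed EE: relative l2 error ||x - x_hat||_2 / ||x||_2."""
    if len(x) != len(x_hat):
        raise ValueError("signals must have the same length")
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
    den = math.sqrt(sum(a * a for a in x))
    return num / den


def mean_ee(runs: Sequence[tuple[Sequence[float], Sequence[float]]]) -> float:
    """Mean EE over repeated runs (the experiments above average 50 runs)."""
    return statistics.fmean(estimation_error(x, x_hat) for x, x_hat in runs)


if __name__ == "__main__":
    runs = [([3.0, 4.0], [3.0, 4.0]), ([3.0, 4.0], [3.0, 0.0])]
    print(mean_ee(runs))  # 0.4
```

A perfect reconstruction scores 0, and returning the all-zero signal scores exactly 1. This gives a scale-free way to compare methods across datasets with different units.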

Discussion and Future Work
Crowd sensing by applying compressed sensing to WSNs is an extremely complex task. Although the work in this paper can perform this task with balanced sensor energy while preserving data privacy, some challenges remain. First, network synchronization between the WSN and the server is necessary to obtain the round number and the keys for encryption and decryption. Second, our work protects only data confidentiality, not data integrity or availability. Several improvements are therefore worth considering: (1) the accuracy of the reconstruction algorithm can be increased; (2) more security features, such as data integrity and availability, can be studied.

Conclusion
In the context of crowd sensing in WSNs, we proposed a perturbed compressed sensing protocol (PCSP) that combines perturbation with compressed sensing to address data confidentiality and sensor energy consumption. The protocol has two components: encrypted data are obtained by perturbing the sensor data gathered by each node, and the data are then compressed in the WSN through linear projection using compressed sensing. We then presented performance and security analyses together with experimental results, which demonstrate that our protocol transmits signals at a low energy cost while preserving data confidentiality. Finally, we described the limitations of our protocol and future work.