A Random Walk-Based Energy-Aware Compressive Data Collection for Wireless Sensor Networks

Energy efficiency in data collection is one of the most important research topics in wireless sensor networks (WSNs). Compressive sensing (CS)-based data collection schemes are popular and offer many advantages in terms of energy efficiency and load balance. Compared with dense sensing matrices, sparse random matrices can further improve the performance of CS-based data collection schemes. In this paper, we proposed a compressive data collection scheme based on random walks, which exploits the compressibility of data vectors in the network. Each measurement was collected along a random walk that is modeled as a Markov chain. The Minimum Expected Cost Data Collection (MECDC) scheme was proposed to iteratively find the optimal transition probability of the Markov chain such that the expected cost of a random walk could be minimized. In the MECDC scheme, a nonuniform sparse random matrix, which is equivalent to the optimal transition probability matrix, was adopted to accurately recover the original data vector by using the nonuniform sparse random projection (NSRP) estimator. Simulation results showed that the proposed scheme was able to reduce the energy consumption and balance the network load.


Introduction
This paper considers the energy efficiency issue of compressive data collection in wireless sensor networks (WSNs). A WSN consists of low-cost, low-power, and energy-constrained sensors which acquire and transmit information to the sink through wireless links [1][2][3][4][5]. In the area of the Internet of Things (IoT), a WSN is regarded as a key technology for data sensing and collection [6,7]. One of the most important factors that affects the performance of WSNs is the energy limitation of sensors [8,9]. A sensor ceases to operate once it depletes its battery energy. We intend to design an energy-efficient data collection scheme by applying the compressive sensing (CS) technology and random walks.
A popular approach to the data collection problem is the application of the CS technology [10,11]. In the CS technology, data are assumed to be sparse or sparse under some basis, which is very appropriate for data in WSNs [12,13].
The key idea behind CS is that, by exploiting the sparsity of the original data vector, a high-dimensional data vector can be reliably recovered from a significantly lower number of measurements. Early works mainly focused on applying the CS technology with a dense sensing matrix [14][15][16]. Luo et al. [14] proposed the Compressive Data Gathering (CDG) scheme in which the sink collects linear combinations of the original data vector instead of individual data samples. The sink is able to recover the original data vector by solving an $\ell_1$-based convex optimization problem as long as a sufficient number of linear combinations are collected. Compared with traditional schemes, the CDG scheme not only reduces the energy consumption but also evenly distributes loads across the network. Reference [15] improved the CDG scheme and proposed a hybrid-CS scheme in which data are only encoded at overloaded nodes. This significantly reduces the load of nodes which are far away from the sink. The authors showed that, compared with the CDG scheme, the hybrid-CS scheme can further improve the throughput of networks. Adopting the idea of the CDG scheme, reference [16] considered not only the energy efficiency but also the delay of data collection. The authors proposed a joint optimization problem which aims to minimize the delay of data collection with bounded transmissions. The NP-hardness of the joint optimization problem was proved. Thus, the authors proposed an approximate solution which decomposes the joint optimization problem into a forwarding tree construction subproblem and a link-scheduling subproblem.
Without sacrificing recovery fidelity, sparse random matrices have been proven to give better energy efficiency than dense random matrices [17,18]. Under the CDG framework of WSNs, the sparse random matrix can be either uniform [17][18][19][20] or nonuniform [21][22][23][24]. In a uniform sparse random matrix, each entry is equal to zero with an identical probability. In a nonuniform sparse random matrix, however, entries in different columns are equal to zero with different probabilities. Wang et al. [17] proposed a class of uniform sparse random matrices that do not compromise the recovery performance when compared to a Gaussian sensing matrix. In their scheme, each node aggregates one measurement as a linear combination of the original data vector. The sink collects measurements from nodes through different shortest paths. Zheng et al. [18] proposed a random walk-based data collection scheme and provided mathematical foundations from the perspectives of CS and graph theory. They showed that uniform sparse random matrices which are constructed from the proposed random walk scheme satisfy the expansion property of expander graphs. Singh et al. [19] proposed an On-Demand Explosion-Based Compressive Sensing (ODECS) technology to reduce the required number of measurements for the recovery of the data vector by exploiting the rate of change of the data vector. The ODECS technology is able to adapt itself to the occurrence of events. It has a very low communication rate when events are absent. Considering problems in existing schemes, such as semidynamic routing, nonuniform sampling, and the dependence on global coordinate information, Zhang et al. [20] proposed a dual random walk-based compressive data collection scheme. In the proposed scheme, a dual random walk, which does not rely on coordinate information, was first designed to achieve uniform sampling. Then, depending on the dual random walk, a dynamic and distributed CDG-based scheme was proposed to enhance the dynamic adaptability of the network.
Recently, nonuniform sparse random matrices were proved to give similar performance as the uniform sparse random matrices [21][22][23][24]. Liu et al. [21] proposed a novel compressive data collection scheme which compresses data under an opportunistic routing. The proposed scheme requires fewer compressed measurements and allows a simpler routing strategy without excessive computation and overheads. Moreover, the authors proposed the nonuniform sparse random projection (NSRP) algorithm to recover the original data vector. They proved that the NSRP-based estimator can achieve the optimal estimation error bound. Considering the large transmission energy consumption and low recovery accuracy problem in traditional schemes, Zhang et al. [22] proposed a ring topology-based compressive sensing data collection scheme. In the proposed scheme, the total number of hops is reduced by a ring topology-based random walk, and the recovery accuracy is improved by the dual compensation-based compressive sensing measurements. Huang and Soong [23] proposed a cost-aware stochastic compressive data collection scheme, where the cost diversity and the stochastic data collection process are considered by using the Markov chain model. The proposed scheme is aimed at minimizing the expected cost of a random walk subjected to constraints on the global degree of randomness and recovery error. Without loss of the recovery accuracy, the proposed scheme not only reduces the expected cost but also prolongs the network lifetime due to the load balance feature. The reference [24] proposed a mobile CDG scheme including a random walk-based algorithm and a kernel-based method for sparsifying sensory data from an irregular deployment. The sensing matrix, which is constructed from the proposed random walk algorithm combined with a kernel-based sparsity basis, was proved to satisfy the restricted isometry property. 
Moreover, the authors proved that $O(k \log(n/k))$ measurements, which can be collected within $O(k \log(n/k))$ steps, were sufficient for the accurate recovery of $k$-sparse signals in a network with $n$ nodes.
In this paper, we propose a data collection scheme for WSNs by integrating the compressive sensing technology and random walks. The total amount of energy consumption is reduced by exploiting the compressibility of the original data vector. Measurements are collected along random walks so that the local energy consumption is balanced. Specifically, each measurement is a linear combination of the original data in the nodes visited by a random walk. Each random walk proceeds as follows. Initially, a node other than the sink is selected as the starting node with a probability that is determined by the residual energy of every node in the network. Then, data are forwarded to the sink in a multihop manner. During the transmission process, each node selects the next-hop node from its candidate nodes according to a probability distribution that is determined by the residual energy of its candidate nodes. We can model this stochastic process as an absorbing Markov chain. The key problem is how to determine the transition probabilities of nodes in every random walk. We formulate this problem as an optimization problem which aims to find the optimal transition probability matrix such that the expected cost of a random walk is minimized. The Minimum Expected Cost Data Collection (MECDC) scheme is proposed to iteratively find the optimal transition probability matrix. After obtaining the optimal transition probability matrix, the sink is able to construct an equivalent sensing matrix based on it. Eventually, by using the NSRP-based estimator [21], the original data vector can be accurately recovered. For the compressive data collection problem, reference [23] adopted an idea similar to that of this paper. However, in their scheme, each random walk starts at a fixed node, which results in the rapid energy expenditure of the fixed node.
This paper extends reference [23] mainly in five aspects: (1) the starting node of random walks is variable; (2) nodes' residual energy is considered for the balance of network load; (3) the MECDC scheme along with its distributed realization is proposed; (4) the process of collecting measurements is accelerated by partitioning the network into layers; (5) the computation of the optimal transition probability matrix is simplified.
The main contributions of this paper are summarized as follows:
(i) We propose a random walk-based compressive data collection scheme which exploits the compressibility of the original data vector. Random walks are responsible for the collection of measurements. In order to reduce the energy consumption and balance the network load, the residual energy of nodes is considered in the process of data collection.
(ii) The absorbing Markov chain model is adopted to characterize the stochastic behavior of a random walk. We formulate an optimization problem to minimize the expected cost of a random walk and propose the MECDC scheme to find the optimal transition probability matrix.
(iii) A distributed realization of the MECDC scheme is proposed, where the update of transition probabilities for each node can be obtained based only on the information of its neighbors.
(iv) Simulation results are provided to demonstrate that the proposed scheme can both reduce the energy consumption and balance the network load.
The remainder of this paper is organized as follows. In Section 2, we present preliminaries of this paper. Next, we introduce the system model and problem formulation in Section 3. The MECDC scheme is proposed in Section 4. In Section 5, we present simulation results. Finally, Section 6 concludes the paper.

Preliminaries
We use boldface letters to denote vectors and matrices. The $i$th entry of vector $\mathbf{x}$ is denoted by $x_i$. The entry in the $i$th row and $j$th column of matrix $\mathbf{A}$ is denoted by $a_{ij}$. Denote $[n] := \{1, 2, \dots, n\}$. A vector $\mathbf{x}$ is said to be $k$-sparse if the number of its nonzero entries does not exceed $k$. Consider the following linear model:

$$\mathbf{y} = \mathbf{A}\mathbf{x},$$

where $\mathbf{A} \in \mathbb{R}^{m \times n}$ ($m \ll n$) is referred to as the sensing matrix, each entry $y_i$ in vector $\mathbf{y}$ is referred to as a measurement, and $\mathbf{x} \in \mathbb{R}^n$ is the data vector to be recovered. It is well known that the Gaussian random matrix can be used as the sensing matrix. When $m \ge O(k \log n)$, a $k$-sparse data vector can be recovered with high probability via linear programming [25,26]. It has been shown that sparse random matrices provide recovery performance similar to that of the Gaussian random matrix [17,27]. A sparse random matrix is a matrix whose entries are zero with some probability. Importantly, if $a_{ij} = 0$, we do not need the data $x_j$ when collecting $y_i$, because $y_i$ is a linear combination of entries in $\mathbf{x}$. By exploiting the sparsity of the sensing matrix, potential improvements, including reduced energy consumption and data collection delay, can be obtained [8,9]. In this paper, we consider the problem of recovering a compressible data vector by using a nonuniform sparse random matrix. Compressible data vectors can be seen as a subset of sparse data vectors. Specifically, a compressible data vector $\mathbf{x}$ can be represented as $\mathbf{x} = \boldsymbol{\Psi}\boldsymbol{\theta}$, where $\boldsymbol{\Psi}$ is an $n \times n$ orthonormal basis and $\boldsymbol{\theta}$ is a coefficient vector that decays according to the power law [28]. If we rearrange the entries of $\boldsymbol{\theta}$ according to their magnitude, then the $i$th largest entry $\theta_{(i)}$ satisfies

$$|\theta_{(i)}| \le c\, i^{-z},$$

where $c$ is a constant and $z$ controls the rate of decay. Throughout this paper, we assume that the data vector $\mathbf{x}$ is compressible in some basis. The best $k$-term approximation of $\mathbf{x}$ is obtained by keeping the largest $k$ coefficients and setting the others to zero. Let $\hat{\boldsymbol{\theta}}_k$ be the coefficient vector of the best $k$-term approximation of $\mathbf{x}$.
Then, we have that [28].
where $\hat{\mathbf{x}}_k = \boldsymbol{\Psi}\hat{\boldsymbol{\theta}}_k$ and $\zeta_r$ is a constant that depends only on $r$.
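To make the notion of compressibility concrete, the following minimal sketch builds a coefficient vector whose sorted magnitudes follow a power law and measures the error of its best $k$-term approximation. The decay form `c * i**(-z)`, the values `c = 1.0`, `z = 1.5`, `n = 200`, the alternating signs, and the use of the identity matrix as the basis $\Psi$ are all illustrative assumptions rather than settings taken from the paper.

```python
import math

def best_k_term(theta, k):
    """Keep the k largest-magnitude coefficients of theta and zero the rest."""
    order = sorted(range(len(theta)), key=lambda i: abs(theta[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(theta)]

def l2_error(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

# Hypothetical power-law coefficients: |theta_(i)| = c * i**(-z), alternating sign.
c, z, n = 1.0, 1.5, 200
theta = [c * (i ** -z) * (1 if i % 2 else -1) for i in range(1, n + 1)]

# With Psi = identity (assumed), the approximation error of x equals that of theta.
errors = [l2_error(theta, best_k_term(theta, k)) for k in (5, 10, 20, 40)]
```

The list `errors` shrinks as `k` grows, which is the behaviour that lets a compressible vector be summarized by a few dominant coefficients.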
For a compressible data vector $\mathbf{x}$, reference [21] proposed a nonuniform sparse random projection-based estimator which gives recovery performance comparable to the best $k$-term approximation, provided that $m = O(k^2 \log n)$ and the entries of $\mathbf{A} \in \mathbb{R}^{m \times n}$ are drawn independently from a sparse distribution [17,21,23] in which an entry in the $j$th column is nonzero with probability $\pi_j$,
where $0 \le \pi_j \le 1$. Unlike in uniform sparse random matrices, the probability of an entry being zero varies across the columns of a nonuniform sparse random matrix.
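As an illustration of such a matrix, the sketch below samples entries column by column. Only the per-column nonzero probability $\pi_j$ comes from the text. The entry values $\pm 1/\sqrt{\pi_j}$ with equal probability are an assumption borrowed from the usual sparse random projection construction, and the function name and the example probabilities are hypothetical.

```python
import math
import random

def sample_sensing_matrix(m, pi, rng):
    """Draw an m x len(pi) matrix whose column j is nonzero with probability pi[j].

    Assumed entry values: +1/sqrt(pi_j) or -1/sqrt(pi_j), each with probability pi_j / 2.
    """
    rows = []
    for _ in range(m):
        row = []
        for p in pi:
            u = rng.random()
            if u < p / 2:
                row.append(1.0 / math.sqrt(p))
            elif u < p:
                row.append(-1.0 / math.sqrt(p))
            else:
                row.append(0.0)
        rows.append(row)
    return rows

rng = random.Random(0)
pi = [0.1, 0.5, 0.9]  # hypothetical per-column nonzero probabilities
A = sample_sensing_matrix(2000, pi, rng)

# Empirical fraction of nonzero entries per column; it should be close to pi.
nonzero_fraction = [sum(1 for r in A if r[j] != 0.0) / len(A) for j in range(len(pi))]
```

Columns with a small $\pi_j$ are mostly zero, so the corresponding node rarely has to contribute its reading to a measurement.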

System Model and Problem Formulation
3.1. Network Model. In this paper, we consider a multihop wireless sensor network consisting of $n$ nodes, with node $n$ being the sink. Sensors are randomly deployed in a sensing field to sense the surrounding environment and periodically report readings to the sink through multihop transmissions. Define $x_i^t$ as the reading of sensor $i$ at time instant $t$. The sink aims to collect data $\mathbf{x}^t = [x_1^t, x_2^t, \dots, x_{n-1}^t]$ for different time instants. Previous works [17,28] have shown that most natural classes of signals, such as smooth signals with bounded derivatives and bounded variation signals, are compressible in some transform domain. As stated in the previous section, we assume that the data $\mathbf{x}^t$ is compressible.

Wireless Communications and Mobile Computing
Without loss of generality, we assume that sensors are randomly deployed in a unit square, and each sensor is equipped with an identical battery with the initial energy $E_0$. Any two nodes are able to communicate with each other if the Euclidean distance between them is no more than the communication range $R$. The WSN is modeled as a connected graph $G = (V, E)$, where $V = [n]$ is the set of nodes, including the root/sink $n$, and $E$ is the set of edges/wireless links. Each edge is associated with a weight which is related to the residual energy of nodes. Specifically, we define $w_{ij}$, the $ij$th entry of the weight matrix $\mathbf{W} \in \mathbb{R}^{(n-1) \times (n-1)}$, as the weight of edge $(i, j)$, representing the cost of transmitting data from node $i$ to node $j$. Note that we omit edges related to the sink. Suppose each node knows the information of its neighbors. Except for the sink, we partition nodes into layers $L_k$, $k = 1, \dots, T$, where $L_k$ consists of the nodes at distance $k$ (in hops) from the sink. For any node $i \in L_k$, its neighbors are divided into two disjoint sets: the successors set $S_i := \{j \mid (i, j) \in E, j \in L_{k-1}\}$ and the predecessors set $D_i := \{j \mid (i, j) \in E, j \in L_{k+1}\}$. Let $E_r(i)$ be the residual energy of node $i$. At the beginning of data collection, every node has the same initial energy $E_0$.
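The layering described above can be sketched with a breadth-first search from the sink. The function name, the tiny five-node deployment, and the convention that the sink itself forms layer 0 (so that nodes in $L_1$ have the sink as their only successor) are assumptions made for this example.

```python
import math
from collections import deque

def build_layers(points, sink, comm_range):
    """BFS hop distance from the sink, plus successor/predecessor sets per node."""
    n = len(points)
    # Two nodes are neighbors when their Euclidean distance is within the range.
    adj = {i: [j for j in range(n)
               if j != i and math.dist(points[i], points[j]) <= comm_range]
           for i in range(n)}
    hops = {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    # Successors lie one layer closer to the sink, predecessors one layer farther.
    succ = {i: [j for j in adj[i] if hops.get(j) == hops[i] - 1]
            for i in hops if i != sink}
    pred = {i: [j for j in adj[i] if hops.get(j) == hops[i] + 1]
            for i in hops if i != sink}
    return hops, succ, pred

# Hypothetical deployment: node 4 is the sink, communication range 0.2.
points = [(0.0, 0.0), (0.15, 0.0), (0.3, 0.0), (0.3, 0.15), (0.45, 0.0)]
hops, succ, pred = build_layers(points, sink=4, comm_range=0.2)
```

In this toy deployment node 2 is the only member of $L_1$, nodes 1 and 3 form $L_2$, and node 0 forms $L_3$.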

3.2. Opportunistic Routing. In this subsection, we describe how measurements are collected by the sink through random walks. The process of collecting measurements can be modeled as a discrete absorbing Markov chain [29] with the state set $\{s_1, s_2, \dots, s_n\}$ and the transition probability matrix $\mathbf{P}$. Each node in the network corresponds to a state in the discrete absorbing Markov chain. Specifically, we assume that node $i \in [n]$ corresponds to the state $s_i$, and $s_n$ is the absorbing state. The $ij$th entry in $\mathbf{P}$, i.e., the state transition probability $p_{ij}$, corresponds to the probability that the data is transmitted from node $i$ to node $j$.
Our goal is to collect $m$ measurements through $m$ random walks. Each measurement corresponds to a random walk that starts from a randomly selected node and ends at the sink. Figure 1 shows the process of collecting a measurement, say the $j$th measurement $y_j$, which corresponds to the $j$th random walk. Initially, node 1 in layer $L_k$ is chosen as the starting node. Then, it transmits data $+x_1$ or $-x_1$ to the randomly selected node $2 \in S_1$. Note that $S_1 \subseteq L_{k-1}$. Subsequently, node 2 adds or subtracts the received value to its own data and transmits the result, i.e., $\pm x_1 \pm x_2$, to a randomly selected node, say node $3 \in S_2$. The above process is repeated until the sink receives the measurement $y_j = \sum_{i=1}^{k} \pm x_i$. We can observe that the length of the $j$th random walk is exactly the layer index of the starting node.
In general, the process of collecting each measurement starts at a randomly chosen node. In this paper, a node $i$ ($i \ne n$) is selected as the starting node with probability $p_i$. Then, node $i$ randomly selects a successor according to a certain probability distribution and subsequently transmits its compressed data to the selected successor. After receiving data, the selected successor adds or subtracts its own data to the received data and transmits the result towards the sink. The process is repeated until the sink collects every measurement. Figure 2 shows the process of collecting seven measurements. In this figure, nodes are partitioned into five layers based on their hop distance to the sink.
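The collection of a single measurement can be sketched as follows. The dictionaries `succ`, `q`, and `x`, the function name, and the equal probability of the two signs are assumptions for illustration, and any scaling of the coefficients is left out.

```python
import random

def collect_measurement(start, succ, q, x, sink, rng):
    """Follow one random walk from `start` to `sink`; return (y, visited nodes)."""
    y, path, node = 0.0, [], start
    while node != sink:
        sign = 1.0 if rng.random() < 0.5 else -1.0  # random +/- coefficient (assumed fair)
        y += sign * x[node]                         # add or subtract the node's own reading
        path.append(node)
        # Pick the next hop among the successors according to the node's distribution.
        weights = [q[node][j] for j in succ[node]]
        node = rng.choices(succ[node], weights=weights)[0]
    return y, path

# Hypothetical layered network: node 4 is the sink, node 0 lies in layer 3.
succ = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
q = {0: {1: 0.5, 2: 0.5}, 1: {3: 1.0}, 2: {3: 1.0}, 3: {4: 1.0}}
x = {0: 1.0, 1: 2.0, 2: 4.0, 3: 8.0}
y, path = collect_measurement(0, succ, q, x, sink=4, rng=random.Random(1))
```

Because every hop moves exactly one layer closer to the sink, the walk started in layer 3 visits three nodes before it is absorbed.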

Figure 1: An example of collecting a measurement through a random walk. The random walk starts at node $1 \in L_k$ and ends at the sink.

3.3. The Transition Probability Matrix and the Sensing Matrix. The long-term behavior of random walks is closely related to the sensing matrix under the CS framework. In order to see this, let us write the transition probability matrix in the canonical form [29]:

$$\mathbf{P} = \begin{pmatrix} \mathbf{Q} & \mathbf{R} \\ \mathbf{0} & 1 \end{pmatrix},$$

where $\mathbf{Q} \in \mathbb{R}^{(n-1) \times (n-1)}$ contains the transition probabilities $q_{ij}$ among the nodes other than the sink and $\mathbf{R}$ contains the transition probabilities from these nodes to the sink. The fundamental matrix of the chain is $\mathbf{F} = (\mathbf{I} - \mathbf{Q})^{-1}$ with entries $f_{ij}$. In other words, $f_{ij}$ is the expected number of occurrences of node $j$ if the random walk starts at node $i$. In our formulation, every node occurs at most once in a random walk. Therefore, $f_{ij}$ represents the probability that a random walk started at node $i$ passes through node $j$. Furthermore, except for the sink $n$, node $i$ is selected as the starting node with probability $p_i$. Then, we have that

$$\pi_j = \sum_{i=1}^{n-1} p_i f_{ij},$$

where $\pi_j$, the probability that node $j$ occurs in a random walk, is referred to as the compression probability. As stated in references [21,23], $\pi_j$ is exactly the nonzero probability of entries in the $j$th column of the sensing matrix $\mathbf{A}$.
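The link between the transition probabilities and the compression probabilities can be checked numerically. The sketch below assumes a layered network, where every hop moves one layer closer to the sink, so that $(I - Q)^{-1}$ equals the finite sum $I + Q + Q^2 + \dots$ and no matrix inversion is needed. The four-node example and the starting probabilities are hypothetical.

```python
def fundamental_matrix(Q):
    """Return F = I + Q + Q^2 + ...; valid when Q is nilpotent (layered, acyclic walks)."""
    n = len(Q)
    F = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in F]
    for _ in range(n):
        # Next power of Q: one more hop of the walk.
        power = [[sum(power[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
        if all(v == 0.0 for row in power for v in row):
            break
        for i in range(n):
            for j in range(n):
                F[i][j] += power[i][j]
    return F

# Transient nodes 0..3; node 3 forwards to the sink, so its row in Q is zero.
Q = [[0.0, 0.5, 0.5, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 0.0]]
p = [0.4, 0.2, 0.2, 0.2]  # hypothetical starting probabilities

F = fundamental_matrix(Q)
# Compression probability of node j: pi_j = sum_i p_i * f_ij.
pi = [sum(p[i] * F[i][j] for i in range(len(p))) for j in range(len(p))]
```

Node 3 sits on every path to the sink, so its compression probability is the largest, while node 0 is only visited when it is chosen as the starting node.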

3.4. Problem Formulation. Energy efficiency is the key issue in this paper. We intend to decrease the energy consumption of data collection while balancing the load of sensors. Since opportunistic routing is a stochastic method, a natural idea is to minimize the expected cost of a random walk. Specifically, we define $c_i$, the $i$th entry in the vector $\mathbf{c} \in \mathbb{R}^{n-1}$, as the expected cost of a random walk if node $i$ is selected as the starting node. The goal is to minimize the expected cost of a random walk, which is given by

$$\sum_{i=1}^{n-1} p_i c_i.$$

Furthermore, for any $c_i$, we have that

$$c_i = \sum_{j \in S_i} q_{ij} \left( w_{ij} + c_j \right).$$

An immediate observation is that $c_i > c_j$ if $i \in L_k$, $j \in L_r$ with $k > r$. In other words, the expected cost of a random walk with the starting point in a high layer is more than that in a low layer.
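A small numerical sketch of the expected cost follows. It assumes that the cost of a walk satisfies the one-step recursion $c_i = \sum_{j \in S_i} q_{ij}(w_{ij} + c_j)$, that the cost at the sink and the weight of edges into the sink are zero, and that the overall objective is $\sum_i p_i c_i$. The network, weights, and probabilities are hypothetical.

```python
def expected_costs(layers, succ, q, w, sink):
    """Compute c_i layer by layer, starting from the layer next to the sink."""
    cost = {sink: 0.0}  # assumed: no further cost once the sink is reached
    for layer in layers:  # layers[0] is L_1
        for i in layer:
            # Edges into the sink are omitted from w, so they default to weight 0.
            cost[i] = sum(q[i][j] * (w[i].get(j, 0.0) + cost[j]) for j in succ[i])
    return cost

layers = [[3], [1, 2], [0]]
succ = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
q = {0: {1: 0.5, 2: 0.5}, 1: {3: 1.0}, 2: {3: 1.0}, 3: {4: 1.0}}
w = {0: {1: 1.0, 2: 3.0}, 1: {3: 2.0}, 2: {3: 1.0}, 3: {}}
p = {0: 0.4, 1: 0.2, 2: 0.2, 3: 0.2}

cost = expected_costs(layers, succ, q, w, sink=4)
# Expected cost of a random walk, averaged over the starting node.
total = sum(p[i] * cost[i] for i in p)
```

Here `cost[0]` evaluates to 3.5 and `total` to 2.0. Shifting probability mass at node 0 towards successor 1 lowers both, which is the effect the optimization exploits.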
Similar to reference [23], we introduce the concept of randomness for data collection in order to avoid vulnerability to attacks and load imbalance. Specifically, by using the Shannon entropy [30], the local randomness of node $i$ is denoted by

$$h_i = -\sum_{k \in S_i} q_{ik} \log q_{ik}.$$

Obviously, the uniform distribution achieves the maximum local randomness for each node. Let us consider a random walk with starting node $k$. The randomness of such a random walk is defined as the sum of the weighted local randomness of nodes in the random walk:

$$H_k = \sum_{j=1}^{n-1} f_{kj} h_j.$$

Eventually, the expected randomness of a random walk is given by

$$H = \sum_{k=1}^{n-1} p_k H_k = \sum_{k=1}^{n-1} \sum_{j=1}^{n-1} p_k f_{kj} h_j.$$

The expected randomness of a random walk measures the uncertainty of the measurement that is collected through this random walk.
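The claim about the uniform distribution is easy to verify numerically. The sketch below computes the Shannon entropy of three hypothetical next-hop distributions. The natural logarithm is an assumption, since the base is not fixed in the text.

```python
import math

def local_randomness(probs):
    """Shannon entropy (natural log, assumed) of a node's next-hop distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

uniform = local_randomness([0.25, 0.25, 0.25, 0.25])  # four equally likely successors
skewed = local_randomness([0.7, 0.1, 0.1, 0.1])       # one strongly preferred successor
deterministic = local_randomness([1.0])               # a single successor
```

The uniform choice attains `log(4)`, the skewed one is strictly smaller, and a node with a single successor contributes no randomness at all.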
Recall that the goal is to estimate the transition probability matrix $\mathbf{Q}$ such that the expected cost of a random walk is minimized. Specifically, given the expected randomness $H$ of any random walk and the probabilities of each node being the starting node $\mathbf{p} = [p_1, p_2, \dots, p_{n-1}]^T$, the problem can be formulated as follows.
where constraint (13) shows how to compute the cost of a random walk with a given starting node, constraint (14) guarantees the uncertainty of the collected measurements, and constraint (16) states that the probabilities of selecting successors must sum to one. In order to save energy and balance network loads, we relate the energy efficiency issue to the weight of edges and the $p_i$'s. The idea is to assign a smaller edge weight and a larger starting probability to nodes that contain more residual energy. Specifically, let $w_{ij}$ and $p_i$ be functions of $r_i$, where $r_i := E_r(i)/E_0$ is defined as the residual energy of node $i$ normalized by the initial energy $E_0$. Suppose the starting probability of node $i$ in a random walk is proportional to $r_i$, i.e., $p_i = \alpha r_i$, where $\alpha$ is a constant. In order to calculate the constant $\alpha$, let us recall that the sink needs to collect $m$ measurements so that the data vector can be precisely recovered. This means that $m$ random walks are required for the data recovery. Since $p_i$ is also the expected number of random walks that start at node $i$, we have that $\sum_{i=1}^{n-1} p_i = m$. Therefore, the constant $\alpha$ is given by

$$\alpha = \frac{m}{\sum_{i=1}^{n-1} r_i}.$$
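The normalization of the starting probabilities can be sketched in a few lines. The function name and the residual-energy values are hypothetical. Note that $\alpha r_i$ can only be read as a probability when it does not exceed one, which the example respects.

```python
def starting_probabilities(residual, e0, m):
    """Return p_i = alpha * r_i with alpha chosen so that the p_i sum to m."""
    r = [e / e0 for e in residual]  # normalized residual energy r_i
    alpha = m / sum(r)
    return [alpha * ri for ri in r]

# Four non-sink nodes with initial energy 100 J; 1.5 walks expected per round (hypothetical).
p = starting_probabilities([100.0, 80.0, 60.0, 60.0], e0=100.0, m=1.5)
```

Nodes with more residual energy receive a larger starting probability, so walks are started preferentially where energy is still plentiful.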

A similar idea is applied to the computation of edge weights. For a node $i$, we assign a larger weight to the edge connected to a successor with a smaller proportion of residual energy. The weight of transmitting data from node $i$ to node $j$ is defined as

Minimum Expected Cost Data Collection
In this section, we propose a network layer-based Minimum Expected Cost Data Collection (MECDC) scheme by using the absorbing Markov chain model. The MECDC scheme consists of two phases. In Phase I, the transition probability matrix $\mathbf{Q}$ is calculated, and the sensing matrix $\mathbf{A}$ is constructed based on $\mathbf{Q}$. In Phase II, measurements are collected by applying random walks with the transition probability matrix $\mathbf{Q}$. After receiving a sufficient number of measurements, the sink is able to recover the original data vector by using the NSRP decoder with the sensing matrix $\mathbf{A}$ [21,23].

4.1. Solution of the Optimization Problem. Let us first discuss how to derive the transition probability matrix $\mathbf{Q}$. By leveraging the idea in reference [23], we apply the Lagrange multiplier method to iteratively update the transition probabilities. The Lagrangian for the optimization problem is given by
where $\lambda_i$, $\mu_i$, and $\eta$ are the Lagrange multipliers. By setting $\partial L/\partial q_{kl} = 0$, we have that
where $h_j = -\sum_{k \in S_j} q_{jk} \log q_{jk}$. After some simple manipulations, we obtain that
where $\beta_k = \sum_{i=1}^{n-1} p_i f_{ik}$ and
Applying equation (21) for $l \in S_k$ to the fact that $\sum_{l \in S_k} q_{kl} = 1$, we have that
Substituting equation (23) into equation (21), we obtain that
In order to update $q_{kl}$ for a given $\mathbf{Q}$, we need to compute the parameters $\lambda_k$ and $\xi_{kl}$. Setting $\partial L/\partial c_k = 0$, we have that
Thus, the Lagrange multiplier $\lambda_k$ can be computed layer by layer.
Next, we compute

$$\frac{\partial f_{ij}}{\partial q_{kl}} = f_{ik} f_{lj}.$$

By substituting equation (26) into equation (22), we obtain that

$$\xi_{kl} = \beta_k \sum_{j=1}^{n-1} f_{lj} h_j.$$

Given a guess of $\mathbf{Q}$, the transition probabilities can be updated based on equation (24). Note that it is impossible to obtain an analytical expression of the Lagrange multiplier $\eta$ [23]. It controls the degree of randomness of a random walk: a larger value of $\eta$ implies a larger degree of randomness. Algorithm 1 shows how to compute the transition probability matrix $\mathbf{Q}$ iteratively. In line 5 of Algorithm 1, $\varepsilon$ represents the threshold of the stopping criterion.
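One plausible reading of the stopping criterion, based on the relative change $|(q_{kl}^t - q_{kl}^{t-1})/q_{kl}^{t-1}|$ that is discussed with the simulation results, is to stop once the largest relative change over all edges falls below $\varepsilon$. The sketch below implements only this test. The helper name and the example values are hypothetical, and the update rule itself is not reproduced.

```python
def max_relative_change(q_new, q_old):
    """Largest |q_new - q_old| / q_old over all edges with a positive old probability."""
    return max(abs(q_new[k][l] - q_old[k][l]) / q_old[k][l]
               for k in q_old for l in q_old[k] if q_old[k][l] > 0.0)

def has_converged(q_new, q_old, eps):
    """Assumed stopping test: every transition probability changed by less than eps."""
    return max_relative_change(q_new, q_old) < eps

q_old = {0: {1: 0.5, 2: 0.5}}
large_step = {0: {1: 0.6, 2: 0.4}}    # relative change of about 0.2
small_step = {0: {1: 0.52, 2: 0.48}}  # relative change of about 0.04
```

With a threshold of 0.1, the iteration would continue after `large_step` and stop after `small_step`.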
The sensing matrix $\mathbf{A}$ can be constructed based on $\mathbf{Q}$. Given $\mathbf{Q}$, the fundamental matrix is given by $\mathbf{F} = (\mathbf{I} - \mathbf{Q})^{-1}$. Then, each entry in the $j$th column of $\mathbf{A}$ is drawn independently from the nonuniform sparse distribution introduced in Section 2, where $\pi_j = \sum_{i=1}^{n-1} p_i f_{ij}$. After obtaining the transition probability matrix $\mathbf{Q}$, Phase II collects measurements through random walks. Except for the sink, any node $i$ starts a random walk with probability $p_i$. Then, packets are transmitted towards the sink in a layer-by-layer manner as stated in Section 3.2. Given $\mathbf{A}$, $m = O(k^2 \log n)$ measurements are sufficient for the sink to recover the original data vector by using the NSRP-based estimator [21,23].

4.2. Distributed Realization of MECDC. In this subsection, we show that the update of transition probabilities can be realized locally and in a distributed manner. In other words, $q_{kl}$ can be computed using only the information of neighbors. In order to see this, let us consider node $k$. Suppose each node knows its probability of being the starting node. Then, all of the parameters which are required to update $q_{kl}$ can be computed based on the neighboring nodes as follows.
(i) $\lambda_k$. From equation (25), $\lambda_k$ can be calculated by using the information of predecessors. Specifically, $\lambda_k = -p_k$ for any $k \in L_T$. Then, $\lambda_k$ for any $k \in L_i$ can be computed based on nodes in $D_k \subseteq L_{i+1}$. In such a manner, the values of $\lambda_k$ can be computed layer by layer.
(ii) $\beta_k$. Similar to $\lambda_k$, the value of $\beta_k$ can be computed based on the predecessors of node $k$. Recall that $\beta_k = \sum_{i=1}^{n-1} p_i f_{ik}$, which can be rewritten as

$$\beta_k = p_k + \sum_{i \in D_k} q_{ik} \beta_i.$$

Therefore, starting from layer $L_T$, the values of $\beta_k$ can be obtained layer by layer.
(iii) $h_k$. Recall that $h_k = -\sum_{l \in S_k} q_{kl} \log q_{kl}$, which can be computed based on the successors of node $k$.
(iv) $\xi_{kl}$. Denote $g_l = \sum_{j=1}^{n-1} f_{lj} h_j$. In order to obtain $\xi_{kl}$, we first compute $g_l$ for each $l \in [n-1]$. Based on the information of successors, we obtain that $g_l = h_l + \sum_{j \in S_l} q_{lj} g_j$. A node in $L_1$ forwards only to the sink, so its local randomness is zero. Thus, starting from layer $L_1$, the parameter $g_l$ can be computed layer by layer as follows:

$$g_l = \begin{cases} 0, & l \in L_1, \\ h_l + \sum_{j \in S_l} q_{lj} g_j, & \text{otherwise}. \end{cases}$$

Based on the values of $g_l$ and equation (27), we have that $\xi_{kl} = \beta_k g_l$.
In summary, the computation of transition probabilities can be realized in a distributed manner by storing the parameters $\lambda_k$, $\beta_k$, $h_k$, and $g_k$ in every node.
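The layer-by-layer computations can be sketched as two sweeps over the layers. The recursions used here, $\beta_k = p_k + \sum_{i \in D_k} q_{ik}\beta_i$ and $g_l = h_l + \sum_{j \in S_l} q_{lj} g_j$, follow from the identities $F = I + FQ$ and $F = I + QF$ for the fundamental matrix. The function name, the six-node network, and the natural logarithm are assumptions for this example, and the multiplier $\lambda_k$ is left out.

```python
import math

def layerwise_parameters(layers, succ, pred, q, p, sink):
    """Compute beta_k (top layer down) and g_l (bottom layer up) from neighbors only."""
    # Local randomness h_k needs only the node's own next-hop distribution.
    h = {k: -sum(q[k][l] * math.log(q[k][l]) for l in succ[k] if q[k][l] > 0.0)
         for layer in layers for k in layer}
    beta = {}
    for layer in reversed(layers):  # L_T first: these nodes have no predecessors
        for k in layer:
            beta[k] = p[k] + sum(q[i][k] * beta[i] for i in pred[k])
    g = {sink: 0.0}
    for layer in layers:            # L_1 first: these nodes forward only to the sink
        for l in layer:
            g[l] = h[l] + sum(q[l][j] * g[j] for j in succ[l])
    xi = {(k, l): beta[k] * g[l] for layer in layers for k in layer for l in succ[k]}
    return beta, g, xi

# Hypothetical network: node 5 is the sink; L_1 = {3, 4}, L_2 = {1, 2}, L_3 = {0}.
layers = [[3, 4], [1, 2], [0]]
succ = {0: [1, 2], 1: [3, 4], 2: [4], 3: [5], 4: [5]}
pred = {0: [], 1: [0], 2: [0], 3: [1], 4: [1, 2]}
q = {0: {1: 0.5, 2: 0.5}, 1: {3: 0.25, 4: 0.75}, 2: {4: 1.0}, 3: {5: 1.0}, 4: {5: 1.0}}
p = {k: 0.2 for k in range(5)}

beta, g, xi = layerwise_parameters(layers, succ, pred, q, p, sink=5)
```

Each node only reads values stored at its predecessors (for `beta`) or successors (for `g`), which is what makes a distributed realization possible.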

Algorithm 1: Iteratively solve for transition probabilities.
1. Input the graph $G = (V, E)$, the weight matrix $\mathbf{W}$, the starting probabilities of nodes $p_1, p_2, \dots, p_{n-1}$, and the randomness of random walks $\eta$.
2. Compute layers $L_1, L_2, \dots, L_T$, the successors set $S_i$, and the predecessors set $D_i$ for every node $i \in [n-1]$.
3. Output an estimator of $\mathbf{Q}$.
4. Initialize the step index $t = 0$ and $\mathbf{Q} = \mathbf{Q}^0$ such that.
8. Compute $\lambda_k$ based on equation (25) in a layer-by-layer manner.
9. Compute $\beta_j = \sum_{i=1}^{n-1} p_i f_{ij}$ for any $j \in [n-1]$.
10. Compute $h_j = -\sum_{k \in S_j} q_{jk} \log q_{jk}$ for any $j \notin L_1$.
11. Compute $\xi_{kl}$ based on equation (27) for any $k \in [n-1]$, $l \in S_k$.
12. Update $q_{kl}^t$ based on equation (24) for any $k \in [n-1]$, $l \in S_k$.
13. Update the step index $t = t + 1$.
14. end while

Simulation Results
In this section, we numerically compare the performance of the proposed scheme with that of a baseline scheme. Suppose $n$ nodes are uniformly and randomly deployed in a unit square area. The sink is located at the top right corner. There exists an edge between two nodes if the distance between them is not greater than the communication range 0.2. We assume that $m = k^2 \log n$ measurements are required to recover the data vector. The sparsity of the data vector is set to $k = 5$. Initially, each node is equipped with an identical battery that contains 100 joules of energy. For simplicity, we assume that a packet transmission consumes 0.1 joules of energy. In the baseline scheme, except for the sink, each node is selected as the starting node of a random walk with probability $p = m/(n-1)$. In a random walk, each node transmits data to its successors with an identical probability, i.e., $q_{ij} = 1/|S_i|$ for any $j \in S_i$. In the proposed scheme, we first compute the starting probability of every node and the transition probability matrix $\mathbf{Q}$ at the beginning of collecting every sample. Then, measurements are collected through random walks with the obtained parameters.
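As a quick worked example of these settings, the following lines evaluate the number of measurements and the baseline starting probability for the network size $n = 500$ that is used in several of the figures. The natural logarithm and the rounding up to an integer are assumptions, since neither is specified.

```python
import math

n, k = 500, 5
# m = k^2 log n, rounded up to a whole number of random walks (assumed): 156
m = math.ceil(k ** 2 * math.log(n))
# Baseline: every non-sink node starts a walk with the same probability.
p_baseline = m / (n - 1)
```

Under these assumptions each of the 499 sensors starts a walk with a probability of roughly 0.31 per collected sample.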
Let us first look at the convergence speed of Algorithm 1. Figure 3 shows the number of iterations until the proposed algorithm converges as the network size increases. In Figure 3, we set the threshold of the stopping criterion to $\varepsilon = 0.1$. We observe that Algorithm 1 converges very fast when the network size is not large. The number of iterations required for convergence increases as the network size increases. One possible reason is that the number of candidate edges $(k, l)$ over which the maximum of $|(q_{kl}^t - q_{kl}^{t-1})/q_{kl}^{t-1}|$ is taken increases with the network size. Furthermore, we observe that the slope of the convergence curve decreases as the network size increases. This implies that there may exist an upper bound on the number of iterations required by Algorithm 1.
Next, we compare the energy efficiency of the proposed scheme and the baseline scheme. Figure 4 shows the distributions of the normalized residual energy of nodes in the network. In Figure 4, we set the network size to $n = 500$, and the residual energy is obtained after 50 samples are collected. Note that the residual energy is normalized by the initial energy. We observe that the residual energy of nodes in both schemes mainly concentrates on the interval 90%-100%. This is because CDG-based schemes can balance network loads. However, the number of nodes with large residual energy in the proposed scheme is larger than that in the baseline scheme. This demonstrates that the proposed scheme is able to further reduce the energy consumption and balance the network loads.
Figure 5 compares the normalized expectation of the total energy consumption between the proposed scheme and the baseline scheme. In Figure 5, the residual energy is computed after 50 samples are collected. The expectation of the total energy consumption is normalized by the initial total energy of nodes in the whole network. We first observe that, for a fixed number of nodes, the normalized expected total energy consumption in the proposed scheme is smaller than that in the baseline scheme. This demonstrates that the proposed scheme is able to reduce the total energy consumption by considering the residual energy of nodes and optimizing the transition probability matrix. Another observation is that, as the number of nodes increases, the normalized expected total energy consumption decreases in both schemes. This implies that collecting a fixed number of samples consumes less normalized total energy in large-scale networks. In other words, the proposed scheme is more suitable for large-scale networks. Finally, we observe that the gap in the normalized expected total energy consumption between the proposed scheme and the baseline scheme increases as the number of nodes increases. This also implies that the proposed scheme performs better in large-scale networks.
Next, let us consider the minimum residual energy of nodes in the network. Large minimum residual energy implies balanced load of the network. Figure 6 compares the minimum residual energy of nodes between the proposed scheme and the baseline scheme. The residual energy is computed after 50 samples are collected. Similar to Figure 5, the residual energy is normalized based on the initial energy. A direct observation is that the minimum residual energy in the proposed scheme is larger than that in the baseline scheme. This demonstrates that the proposed scheme is able to balance the network load. Another observation is that the gap between the proposed scheme and the baseline scheme increases as the number of nodes increases, which suggests that the proposed scheme is more suitable for large-scale networks. Figure 7 compares the minimum residual energy of nodes when the number of collected samples increases. In Figure 7, we set the number of nodes n = 500. In order to collect a sample/data vector, the sink needs to collect m measurements so that the data vector can be precisely recovered. The residual energy is normalized based on the initial energy. We observe that, for a fixed number of samples, the minimum residual energy of the proposed scheme is larger than that of the baseline scheme. This is because the proposed scheme is able to balance loads of the network. Furthermore, the gap of the minimum residual energy between the proposed scheme and the baseline scheme increases as the number of collected samples increases. This demonstrates that the proposed scheme is more suitable for long-running networks.

Conclusions
In this paper, we studied the data collection problem in WSNs. Random walks and the compressive sensing technology with nonuniform sparse random matrices are adopted to collect measurements. Each measurement is collected through a random walk which is modeled as an absorbing Markov chain. By exploiting the residual energy of nodes, we formulate the process of collecting measurements as an optimization problem, which seeks the optimal transition probabilities of nodes so that the expected cost is minimized. An iterative method, referred to as the Minimum Expected Cost Data Collection (MECDC) scheme, is proposed to solve this optimization problem and collect measurements. A distributed realization of MECDC, where only local information is needed in the collection of measurements, is also proposed. Simulation results show that the proposed scheme not only reduces the energy consumption but also balances the network loads.

WSNs: Wireless sensor networks
CS: Compressive sensing
MECDC: Minimum Expected Cost Data Collection
IoT: Internet of Things
CDG: Compressive data gathering
NSRP: Nonuniform sparse random projection
ODECS: On-Demand Explosion-Based Compressive Sensing.

Data Availability
Data settings can be found in the draft.
