A Reinforcement Learning-Based Dynamic Clustering Algorithm for Compressive Data Gathering in Wireless Sensor Networks

Compressive data gathering (CDG) is an effective technique to handle large amounts of data transmissions in resource-constrained wireless sensor networks (WSNs). However, CDG with static clustering cannot adapt to time-varying environments in WSNs. In this paper, a reinforcement learning-based dynamic clustering algorithm (RLDCA) for CDG in WSNs is proposed. It is a dynamic and adaptive clustering method that aims to further reduce data transmissions and energy consumption in WSNs. Sensor nodes act as reinforcement learning (RL) agents that observe the environment and dynamically select a cluster to join. A well-designed reward scheme guides these agents to join a cluster with strong data correlation and a suitable distance. The method is also distributed and lightweight: all agents are independent and operate in parallel, and the additional overhead introduced by RL consists only of computing a linear reward function and a few comparison operations, so it is implementable in WSNs. Simulations performed in MATLAB validate the effectiveness of the proposed method and show that the algorithm achieves the desired effect and converges well. It decreases data transmissions by 16.6% and 54.4% and energy consumption by 6% and 29%, respectively, compared to the two contrastive schemes.


Introduction
Wireless sensor networks (WSNs) are the perceptual layer of the Internet of Things (IoT). With the increasing number of IoT applications, WSNs are larger in scale than ever before, and data transmissions in WSNs are growing exponentially. This is a severe challenge for resource-constrained WSNs. Fortunately, data compression can alleviate the problem: it cuts down data transmissions, reduces the energy consumption of sensor nodes, and prolongs the lifetime of WSNs.
Traditional data compression techniques, such as source coding and in-network coding [1][2][3], are usually computationally complex and need prior and global information about the network, which makes them hard to implement in WSNs. Compressive data gathering (CDG) combines compressive sensing (CS) [4] with data gathering and compresses sensing data while sampling. Nodes in WSNs execute the compression encoding, which needs only a few simple multiplications and additions, while the complicated decoding algorithms are carried out at the sink. Prior and global information about the network is not needed. CDG is therefore an excellent data compression solution for WSNs.
Luo et al. first developed the complete process of CDG in WSNs in [5]. Then, in [6], they reduced redundant data and improved the basic scheme. Reference [7] denoted the CDG scheme in [5] as plain-CDG and proved that plain-CDG had a limited effect on improving network throughput. They proposed a scheme called hybrid-CDG, which is similar to the framework in [6]. In order to further decrease data transmissions, reference [8] proposed a sparse-CDG scheme in which only some of the nodes in WSNs need to take part in data gathering. Plain-CDG, hybrid-CDG, and sparse-CDG are the three basic frameworks used in subsequent references.
In the past decade, a large body of literature has been committed to enhancing the performance of CDG through routing, clustering topology control, combination with metaheuristic or learning algorithms, etc. All of these works obtained performance gains, which are described in detail in the next section. However, further research is still worthwhile.
First, in the published works, most clustering and routing strategies in CDG are static and cannot adapt to fluctuations of sensing data and environment variations in WSNs. Secondly, they are usually based on minimum distance and do not consider the characteristics of the CS theory: sensing data from the nodes at minimum distance may not be the most correlated. Thirdly, in the majority of previous methods, the number of CS measurements required for data reconstruction, generally denoted as M, is constant. In reality, M depends on the sparsity of the sensing data, which varies over time. A constant M is therefore likely to cause one of two opposite problems, data redundancy or data insufficiency, which decrease the transmission efficiency of the network or the accuracy of data recovery, respectively.
To overcome the above problems, we propose a reinforcement learning-based dynamic clustering algorithm (RLDCA) for CDG in WSNs. We introduce reinforcement learning (RL) into the process of CDG. The learner of RL, i.e., the agent, observes the environment, takes actions, and then receives rewards. The agent attempts to maximize its received rewards during the learning process. Because it learns from interaction, the agent can adapt to variations in the surrounding environment. Meanwhile, unlike other learning schemes, RL involves only a few lightweight operations, which are implementable in resource-constrained WSNs. The main contributions of this paper are summarized as follows:

(i) The proposed scheme first adopts CDG to cut down data transmissions and then combines the RL algorithm to enhance the performance of CDG. It takes sensor nodes as agents. Each agent dynamically selects a cluster to join by a well-designed RL scheme with a trade-off between data correlation and distance, in order to minimize data transmissions and reduce energy consumption to prolong the lifetime of the WSN.

(ii) The lightweight Upper Confidence Bound (UCB) algorithm is used as the guideline for agents to take actions. The corresponding state vector, action set, and reward function are designed based on intracluster data correlation and distances between nodes and CHs.

(iii) A dynamic CS measurement number M is computed based on intracluster data sparsity in each round of data gathering, assuring both data transmission efficiency and accuracy of data recovery.

The rest of this paper is organized as follows. Section II briefly introduces some related works about CDG and RL. Section III describes the system model and problem formulation. Section IV presents the proposed RLDCA for CDG in WSNs in detail. Simulation results for performance evaluation are given in Section V. Finally, Section VI concludes this article.
Table 1 lists notations and major abbreviations used in this paper.

Related Works
In this section, a review of the research status of CDG technologies is presented. There are many studies on improving routing strategies for CDG. Reference [9] improved the Minimum-Spanning-Tree (MST) and took the sink as the root of the MST, which can reduce the transmission overhead from projection nodes to the sink. Reference [10] proposed a Weighted Compressive Data Aggregation method which introduced the parameter of the tree's cost. Both the hop count and the distances of routing paths are included in the tree's cost. References [11][12][13][14] adopted random walk (RW) routing schemes in CDG to reduce the computation of precise routing paths and extra information exchanges and to enhance network adaptability. These improved routing strategies for CDG did not take data correlation into consideration. Our proposed method is designed around data correlation among nodes, which can further cut down the data transmissions of CDG.
A large body of literature is devoted to enhancing the performance of CDG through various clustering technologies. Clustering benefits the traffic load balance and energy efficiency of the network, especially in large-scale WSNs. References [15][16][17] designed clustering methods by improving the criteria of CH selection. They incorporated many elements, such as residual energy, distance to the sink, intra-to-inter distance ratio, node density, and compression ratio, into the criteria of CH selection. References [18,19] paid attention to the load balance problem in clustering-based CDG. They grouped nodes into clusters based on node distribution density or equal energy consumption. Reference [20] addressed the hotspot area problem to achieve load balance. References [21][22][23] exploited data characteristics to devise clustering algorithms. Reference [21] exploited spatial correlation among sensors' data to decrease data transmissions. Reference [22] considered event-driven WSNs and estimated the location of the event source; nodes around the event source form a cluster. Reference [23] proposed a compressibility-based clustering algorithm. Reference [24] noticed that if some clusters are too small, it is inefficient to implement CDG, so they proposed a cluster size load balance technique for optimal utilization of CDG by keeping a minimum number of nodes in clusters. On the whole, most of these clustering methods are static and based on minimum distance, so they cannot adapt to fluctuations in sensing data or observed environments. Our proposed scheme is a dynamic clustering method based on an RL algorithm with better adaptivity.
Besides routing design and clustering topology control, many other technologies have been introduced into CDG to further improve performance. Metaheuristic algorithms are good at some optimization problems. Reference [25] introduced a Grey Wolf optimization algorithm to search for the best path from each CH to the sink with minimum energy consumption in clustering CDG. Reference [26] used the Bees algorithm to determine optimal results of CS recovery in clustering CDG. Reference [27] adopted a multiple-objective genetic algorithm to calculate the optimal number of CS measurements and the measurement matrix in CDG. Various learning algorithms can make CDG schemes more intelligent and adaptive. References [28,29] used the dictionary learning method to obtain a sparse basis which has better sparse representation ability in CDG. References [30,31] incorporated deep learning into CDG and designed a deep compressed sensing network to build a measurement matrix and reconstruct data from CS measurements. References [32,33] adopted fuzzy logic in data aggregation. Reference [32] proposed a two-tier distributed fuzzy logic based protocol (TTDFP) for efficient data aggregation. The two-tier protocol includes a fuzzy clustering algorithm and a fuzzy routing procedure. Reference [33] used a fuzzy rule system-based RL algorithm to select data aggregator nodes. Mobile collectors and mobile sinks can save transmission energy at the node side. Using unmanned aerial vehicles (UAVs) as mobile data collectors offers better flexibility to traverse the whole coverage area of WSNs. References [34,35] introduced a mobile sink or UAVs into data gathering and searched for the optimal traversing paths with the minimum length and energy consumption. References [36,37] noticed that data sparsity is variable in most scenarios due to its time-varying nature, so they changed the compression ratio of CDG in real time with data sparsity to transmit data more efficiently. They used a pretrained deep learning model for mapping sensor data to an optimal compression ratio.

These novel technologies combined with CDG have improved network performance with high energy efficiency and accuracy of data recovery. However, most of them are still static methods and some of them are computationally complex. Our proposed method is distributed and lightweight, which makes it implementable in resource-constrained WSNs.

System Model and Problem Formulation
3.1. Wireless Sensor Network Model. In this paper, we consider a two-dimensional WSN as shown in Figure 1. $N$ sensor nodes, denoted by the node set $\mathcal{N} = \{s_1, s_2, s_3, \cdots, s_N\}$, are randomly deployed in the plane area $A$ for environmental monitoring, e.g., temperature, humidity, PM2.5, and so on. A clustering topology control technology is adopted in the WSN. The $N$ sensor nodes are grouped into $p$ clusters. Each cluster is managed by a cluster head (CH) node denoted as $ch_j$, and these $p$ CHs are denoted as the CH set $\mathcal{C} = \{ch_1, ch_2, \cdots, ch_p\}$. All the nodes are static; they periodically sense ambient data and then transmit it to a CH through a single hop. CHs compress intracluster data and then send the compressed data to the sink through a backbone tree among CHs. The sink is located at the center of the area $A$, where the compressed data is recovered for further use.
Sensor nodes have equal initial energy, denoted as $E_0$. The CH nodes in $\mathcal{C}$ are homogeneous and equipped with more energy and more computational and storage resources than the normal sensor nodes in $\mathcal{N}$. The initial energy of the CHs is denoted as $E_c$. The maximum communication radii of normal sensor nodes and CHs are denoted as $r$ and $r_{ch}$, respectively.
Sensing data in the network is time-varying. Hence, the adopted clustering scheme is based on an RL algorithm to improve the adaptability of the network. It is a dynamic clustering technology in which nodes choose a cluster anew at the beginning of each round. All the nodes act as RL agents and independently perform the RL algorithm under uniform scheduling. In each round, node $s_i$ takes an action, i.e., selects the $j$th cluster to join, and then sends its sensing data to the CH $ch_j$. Then, $s_i$ receives a reward for the current action. The reward is stored in the memory of $s_i$ and accumulated as learning experience.

Mobile Information Systems

Data Transmission Model of CDG.
Suppose $N$ sensor nodes are divided into $p$ clusters and there are $L_i$ cluster members (CMs) in the $i$th cluster, i.e., $\sum_{i=1}^{p} L_i = N$. In the $i$th cluster, the sensing data sent from the CMs to the CH is denoted as the vector $x_i$ of length $L_i$. According to the CS theory, if $x_i$ is sparse or compressible and its sparsity is $K_i$, then $x_i$ can be compressed to $y_i$ with $M_i$ measurements, i.e.,

$$y_i = \Phi_i x_i,$$

where $\Phi_i$ is an $M_i \times L_i$ Gauss random matrix generated by the CH, and $\Psi_i$ is an $L_i \times L_i$ discrete cosine transformation basis used to convert a compressible signal into a sparse signal in the transformation domain. $A_i = \Phi_i \Psi_i$ is called the sensing matrix or measurement matrix in CS. If $A_i$ satisfies the restricted isometry property (RIP) and

$$M_i \ge C K_i \log(L_i / K_i),$$

where $C$ is a constant and $K_i$ denotes the sparsity of $x_i$, then $x_i$ can be recovered from the $M_i$ measurements in $y_i$ with a probability close to 1. Provided that $\Phi_i$ is a random matrix, $A_i$ satisfies the RIP with high probability [4].

The data correlation can be characterized by the Pearson correlation coefficient, as follows:

$$\rho_{ij} = \frac{\sum_{k} \left(d_i(k) - \bar{d}_i\right)\left(d_j(k) - \bar{d}_j\right)}{\sqrt{\sum_{k} \left(d_i(k) - \bar{d}_i\right)^2}\,\sqrt{\sum_{k} \left(d_j(k) - \bar{d}_j\right)^2}}, \tag{3}$$

where $\rho_{ij}$ represents the correlation coefficient of two data sets $d_i$ and $d_j$, $d_i(k)$ and $d_j(k)$ are their $k$th samples, and $\bar{d}_i$ and $\bar{d}_j$ represent the mean values of $d_i$ and $d_j$, respectively. Data sparsity is related to the correlation within the data set. Hence, sensor nodes with strongly correlated sensing data are grouped into a cluster. The intracluster data can then be compressed more compactly, and data transmissions in the whole network can be greatly reduced.
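As an illustration of the two operations described above, the following minimal Python sketch computes the Pearson correlation coefficient of two data sets and projects a data vector onto Gaussian random rows. The function names, the toy temperature readings, and the $1/\sqrt{m}$ scaling of the Gaussian entries are assumptions made for the example; they are not taken from the paper's MATLAB implementation.

```python
import math
import random


def pearson(d_i, d_j):
    """Pearson correlation coefficient of two equal-length data sets."""
    n = len(d_i)
    mean_i = sum(d_i) / n
    mean_j = sum(d_j) / n
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(d_i, d_j))
    var_i = sum((a - mean_i) ** 2 for a in d_i)
    var_j = sum((b - mean_j) ** 2 for b in d_j)
    if var_i == 0 or var_j == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(var_i * var_j)


def compress(x, m, rng):
    """Compute y = Phi x, where Phi is an m-by-len(x) Gaussian random matrix."""
    # Assumed scaling: entries drawn from N(0, 1/m).
    phi = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in x] for _ in range(m)]
    y = [sum(p * v for p, v in zip(row, x)) for row in phi]
    return phi, y


# Hypothetical readings from two nearby nodes (not from the paper's data set).
node_a = [20.1, 20.4, 20.9, 21.3, 21.0]
node_b = [19.8, 20.2, 20.6, 21.1, 20.7]
rho = pearson(node_a, node_b)

rng = random.Random(0)
phi, y = compress(node_a, 3, rng)  # 5 readings reduced to 3 measurements
```

Recovering the original vector from `y` requires a CS reconstruction algorithm at the sink, which is outside the scope of this sketch.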

Energy Consumption Model.
The energy of nodes is mainly consumed at the data transmission stage. The objective of the proposed RLDCA is to cut down the energy consumption of nodes by decreasing data transmissions, thereby prolonging the lifetime of the WSN. We adopt the classical energy consumption model in [38]. When sending a data packet of $l$ bits, the energy consumption $E_{Tx}$ in a node mainly includes the energy consumed in the processing circuits and the transmit amplifier. When receiving a data packet of $l$ bits, the energy consumption $E_{Rx}$ mainly consists of the energy consumed in the processing circuits, as follows:

$$E_{Tx}(l, d) = \begin{cases} l E_{elec} + l \varepsilon_{fs} d^2, & d < d_0, \\ l E_{elec} + l \varepsilon_{mp} d^4, & d \ge d_0, \end{cases} \qquad E_{Rx}(l) = l E_{elec},$$

where $E_{elec} = 50$ nJ/bit is the energy consumed in the processing circuits to transmit or receive 1 bit of data, $d$ is the distance between a sender and a receiver, and $d_0$ is the distance threshold of the free-space model. When $d < d_0$, the free-space channel model is adopted and the amplifier consumes $\varepsilon_{fs} d^2$ energy to send 1 bit of data, where $\varepsilon_{fs} = 10$ pJ/bit/m$^2$. When $d \ge d_0$, the multipath fading channel model is adopted and the amplifier consumes $\varepsilon_{mp} d^4$ energy to send 1 bit of data, where $\varepsilon_{mp} = 0.0013$ pJ/bit/m$^4$.
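The energy model above can be sketched in a few lines of Python. The constants are the values stated in the text, converted to joules. The threshold `D0` is not given in the text; the sketch assumes the crossover distance $\sqrt{\varepsilon_{fs}/\varepsilon_{mp}}$ that is commonly used with this first-order radio model, and the function names are illustrative.

```python
import math

E_ELEC = 50e-9       # J/bit, processing circuits (from the text)
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier (from the text)
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier (from the text)

# Assumption: d0 is the distance at which both amplifier terms are equal.
D0 = math.sqrt(EPS_FS / EPS_MP)  # roughly 87.7 m


def tx_energy(l_bits, d, d0=D0):
    """Energy in joules to send l_bits over distance d (meters)."""
    if d < d0:
        return l_bits * E_ELEC + l_bits * EPS_FS * d ** 2
    return l_bits * E_ELEC + l_bits * EPS_MP * d ** 4


def rx_energy(l_bits):
    """Energy in joules to receive l_bits."""
    return l_bits * E_ELEC
```

For example, `tx_energy(1000, 10)` uses the free-space branch, while `tx_energy(1000, 100)` falls in the multipath branch because 100 m exceeds the assumed threshold.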

Problem Formulation.
In this paper, we design a dynamic clustering algorithm for CDG based on RL that exploits the characteristics of CDG to minimize data transmissions, balance the traffic load, and prolong the lifetime of the WSN.

According to the CS theory, the stronger the correlation among the original data, the fewer CS measurements are required for accurate data recovery. Thus, in the proposed RLDCA, each node chooses to join the cluster with strong data correlation. The objective is to minimize data transmissions in the WSN, thereby reducing the energy consumption and prolonging the lifetime of the network, as follows:

$$\min \sum_{t=1}^{r_{\max}} \sum_{i=1}^{p} \left(L_i + M_i\right),$$

where, in each round, $L_i$ is the number of intracluster data transmissions in the $i$th cluster, $M_i$ is the number of data transmissions of the CH $ch_i$ in the $i$th cluster, and $r_{\max}$ denotes the number of running rounds during the lifetime of the WSN.
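To make the objective concrete, the sketch below counts the transmissions of one round as the sum of $L_i + M_i$ over all clusters. It assumes the common CS rule of thumb $M \ge C K \log(L/K)$ with a natural logarithm and $C = 1$, and caps $M_i$ at $L_i$; the function names and the example cluster sizes and sparsities are hypothetical.

```python
import math


def measurements_needed(k, l, c=1.0):
    """Rule-of-thumb measurement count ceil(c*k*ln(l/k)), kept within [1, l]."""
    if k <= 0:
        return 1  # assumption: send at least one measurement
    if k >= l:
        return l  # data is not compressible; send everything
    return min(l, max(1, math.ceil(c * k * math.log(l / k))))


def round_transmissions(cluster_sizes, sparsities, c=1.0):
    """Sum of L_i + M_i over all clusters for one round."""
    total = 0
    for l, k in zip(cluster_sizes, sparsities):
        total += l + measurements_needed(k, l, c)
    return total


# Two hypothetical clusters: 20 members with sparsity 2, 10 members with sparsity 10.
example_total = round_transmissions([20, 10], [2, 10])
```

In this example the first cluster needs 5 measurements for 20 readings, whereas the second cluster gains nothing from compression, which illustrates why grouping strongly correlated nodes lowers the objective.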

Algorithm Description
The proposed RLDCA is introduced in detail in this section. Each normal sensor node is an agent that performs an RL procedure to take an action (select a cluster) and then gets a reward, which is stored in its memory. It is an on-line learning scheme which does not need training data. The computation of RLDCA is also lightweight at normal sensor nodes: only a set of linear reward functions and comparison operations are performed. As for storage, only a few reward tables need to be stored. All agents are independent and operate in parallel. They are scheduled by the sink. Triggered by a START message, each agent begins a round of the workflow shown in Figure 2. Learning and data transmission alternate within a round. In the first and second phases, each agent selects an action (a cluster) and then performs this action (sends its sensing data to the CH of the selected cluster). In the third phase, the CH of the selected cluster returns a data correlation reward to the agent. Together with the locally saved reward for the distance to the CHs, each agent computes an integrated reward and saves it in its memory. In the last phase, CHs perform CDG, compress the intracluster data, and then transmit the data to the sink through a backbone tree.

Reinforcement Learning. RL imitates human behavior, which learns from interaction with the ambient environment. Unlike supervised learning, it needs no labeled training data. An RL agent (learner) is not told which action to select but instead takes an action autonomously and then receives a reward. It tries to find out which action yields the best reward. The objective of an agent is to maximize the total reward over the long run. Thus, a well-designed reward scheme instructs an agent to work as we expect.
The procedure of RL is that an agent observes the environment and gets the state vector S, then chooses an action from the action set A, and finally receives a reward r based on a reward function R. In order to maximize the total reward, the value function is used to measure the average reward of each action in A. However, in most cases, the value function is unknown and needs to be estimated by repeated trials. An agent can exploit the estimated value function to select the action with the highest value. But the estimated value function may deviate from the true value, so an agent also needs to explore actions that do not currently have the highest value. RL algorithms need to trade off exploitation and exploration during the process of learning. This is called the exploration-exploitation dilemma.
In this paper, the task of each agent is to choose a cluster to join. This is a multichoice problem that can be well solved by the Upper Confidence Bound (UCB) algorithm [39], which properly balances exploration and exploitation in RL. The action selection strategy is based on the following:

$$A_t = \arg\max_{a} \left[ Q_t(a) + c \sqrt{\frac{\ln t}{N_t(a)}} \right], \tag{6}$$

$$Q_t(a) = \frac{\sum_{i=1}^{t-1} r_i \cdot \mathbb{1}_{A_i = a}}{N_t(a)}. \tag{7}$$

In Equation (6), $A_t$ denotes the action selected at time step $t$, $Q_t(a)$ is the estimated value function of action $a$ defined in Equation (7), $c > 0$ is a constant used to control the degree of exploration, $\ln t$ denotes the natural logarithm of $t$, and $N_t(a)$ denotes the number of times that action $a$ has been selected prior to time $t$. In Equation (7), $r_i$ denotes the reward received at time $i$, and $\mathbb{1}_{A_i = a}$ equals 1 if action $a$ was selected at time $i$ and 0 otherwise.
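A minimal sketch of UCB action selection, following the description of Equation (6), is given below. The function name and argument layout are illustrative. One deliberate assumption: actions that have never been tried are selected first, in index order, which is the textbook UCB convention and avoids a division by zero; it may differ from how the paper handles the first round.

```python
import math


def ucb_select(q, n, t, c=1.0):
    """Return the index of the action maximizing q[a] + c*sqrt(ln(t)/n[a]).

    q: estimated value of each action, n: selection count of each action,
    t: current time step (t >= 1), c: exploration constant.
    """
    # An untried action has an unbounded exploration bonus, so try it first.
    for a, count in enumerate(n):
        if count == 0:
            return a
    scores = [q[a] + c * math.sqrt(math.log(t) / n[a]) for a in range(len(q))]
    return max(range(len(q)), key=scores.__getitem__)
```

With equal counts the action with the highest estimated value wins; an action that has been tried only rarely receives a large bonus and can be selected even if its current estimate is lower.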

Parameters Design of RLDCA.
The RLDCA is implemented in a distributed way. Each sensor node in the WSN, rather than the sink, acts as an RL agent. This distributed design decreases the size of the state and action vectors and reduces the communication overhead from the sink to the sensor nodes.
As described in Section III, there are N sensor nodes in the WSN. These N nodes are N parallel and independent RL agents in the proposed RLDCA. Although they are independent, N agents interact with the same environment and perform the same learning procedure. Thus, they own the same state vector S, action set A, and reward function R described as follows.

The State Vector S.
An agent observes the environment and gets the state vector S, based on which a reward is calculated. The state vector of the RLDCA is $S = [\rho_i, d_i]$, where $\rho_i$ represents the data correlation between an agent and the $i$th cluster, which is calculated by Equation (3), and $d_i$ represents the distance from an agent to the CH of the $i$th cluster.

The Action Set A.
According to the network model, there are $p$ clusters in total in the network. At the beginning of each round, each node chooses a cluster to join. So the action set is designed as $A = \{ch_1, ch_2, \cdots, ch_p\}$, where $ch_i$ is the cluster head of the $i$th cluster. An agent taking the action $ch_i$ in $A$ means that this node joins the $i$th cluster and transmits its sensing data to the cluster head $ch_i$.

The Reward Function R.
The reward function R considers the two elements in the state vector S and is defined as

$$R = \alpha r_c + (1 - \alpha) r_d, \tag{9}$$

where $0 < \alpha < 1$ is a constant that sets the relative weight of the two elements in $S$. If the value of $\alpha$ is higher, each node places more emphasis on data correlation when selecting a cluster to join. $r_c$ and $r_d$ are rewards based on $\rho_i$ and $d_i$ in $S$, respectively. In order to reduce the nodes' computational complexity, a linear and relative reward function model is adopted in this paper. First, the $p$ actions in the action set $A$ are sorted in descending order of $\rho_i$ and, separately, in ascending order of $d_i$. Then, the normalized reward values $r_c$ and $r_d$ are allocated to the $p$ actions in $A$ according to these orders. The reward values in $r_c$ and $r_d$ range from 1 to -1, and the step between two adjacent values is $\Delta$, calculated by

$$\Delta = \frac{2}{p - 1}.$$
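The rank-based reward described above can be sketched as follows. This is one reading of the scheme, not the authors' code: the sketch ranks all $p$ actions, assigns evenly spaced rewards from 1 (best) to -1 (worst), and combines the correlation reward and the distance reward as a weighted sum with weight `alpha` on correlation. The function names and the handling of ties (by index order) are assumptions.

```python
def rank_rewards(values, descending):
    """Map p values to evenly spaced rewards from 1 (best rank) to -1 (worst)."""
    p = len(values)
    if p == 1:
        return [1.0]
    step = 2.0 / (p - 1)  # spacing between adjacent reward values
    order = sorted(range(p), key=lambda a: values[a], reverse=descending)
    rewards = [0.0] * p
    for rank, a in enumerate(order):
        rewards[a] = 1.0 - rank * step
    return rewards


def integrated_reward(rho, dist, alpha=0.8):
    """Weighted sum of correlation and distance rewards for every action."""
    r_c = rank_rewards(rho, descending=True)    # higher correlation is better
    r_d = rank_rewards(dist, descending=False)  # shorter distance is better
    return [alpha * c + (1 - alpha) * d for c, d in zip(r_c, r_d)]
```

For three clusters with correlations `[0.9, 0.1, 0.5]` and distances `[30, 10, 20]`, the correlation rewards are `[1, -1, 0]` and the distance rewards are `[-1, 1, 0]`, so with `alpha = 0.8` the most correlated but farthest cluster still receives the highest integrated reward.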
4.3. The RLDCA Procedure. The complete procedure of the proposed RLDCA for CDG is presented in this part. The pseudocode of the RLDCA is listed in Algorithm 1. In the initial phase, all nodes create the initial action set $A$, the reward table $R$, and the action selection value (ASV) table $\mathrm{A}_t$. After receiving the START message, nodes start performing the RL strategy (lines 4-12). Lines 4-9 choose an action, and lines 10-11 get the reward and update the ASV table. Since there is no experience in the initial round, the reward and ASV tables of all actions are zero, so each agent randomly chooses an action from $A$. In the remaining rounds, the agent chooses the best action based on the accumulated experience preserved in the reward and ASV tables. In lines 13-19, the CHs perform CDG: they calculate the sparsity of the intracluster sensing data, generate a Gauss random matrix to compress the intracluster sensing data, and then send the compressed data to the sink. Finally, the sink verifies the integrity of the received data and then broadcasts the START message of the next round. It is an on-line learning algorithm.

Since a WSN is a resource-limited network, the time and space complexity of the algorithm should be analyzed. We focus on the per-round complexity at the normal node side. The extra memory required by the learning process at each normal node (agent) is $O(5p)$, used to store the values of $N_t(a)$, $Q_t(a)$, and $A_t$ in Equations (6) and (7) and $r_c$, $r_d$ in Equation (9). Computation is mainly spent on the comparison operations of action selection and reward computation. Action selection is based on Equation (6) and needs $p$ comparison operations. In the computation of the reward $r_c$, $p$ numbers need to be sorted, with a computational overhead of $O(p \log p)$. In addition, the computational complexity of Equation (7) is $O(p)$. On the whole, the proposed RLDCA is a lightweight algorithm for normal sensor nodes.
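One way to keep the per-node memory proportional to $p$ is to maintain the average reward of each action incrementally instead of storing the reward history. The sketch below uses the standard running-mean update $Q \leftarrow Q + (r - Q)/N$; the class name and layout are illustrative assumptions, not the paper's data structures.

```python
class ActionStats:
    """Per-action selection counts and running average rewards for p actions."""

    def __init__(self, p):
        self.n = [0] * p    # number of times each action was selected
        self.q = [0.0] * p  # running average reward of each action

    def update(self, a, reward):
        """Fold one new reward for action a into its running average."""
        self.n[a] += 1
        self.q[a] += (reward - self.q[a]) / self.n[a]


stats = ActionStats(3)
for r in (1.0, 0.0, 0.5):
    stats.update(1, r)  # action 1 is selected three times
```

After these three updates the stored average for action 1 equals the plain mean of the three rewards, while only two numbers per action are kept.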

Performance Evaluations
In this section, we present simulation results to evaluate the performance of the proposed RLDCA. Simulations are performed on the MATLAB platform. In the experimental evaluation, 100 sensor nodes are evenly distributed in a 70 m × 80 m rectangular area. These 100 nodes are divided into 6 clusters, each managed by a fixed CH, so there are 6 fixed CH nodes in total in the network. At the beginning of each round, sensor nodes dynamically select a cluster to join based on the proposed learning algorithm. All the nodes are homogeneous and have the same initial energy. The 6 fixed CHs are super nodes with 10 times the initial energy of normal sensor nodes. The sea surface temperature data set published by the Earth System Research Laboratory [40] is taken as the sensing data set. The parameters used in the simulation are listed in Table 2. We compare the proposed RLDCA with the Hybrid Clustering-based Data Collection Scheme (HCDCS) in [41], which is a static clustering CDG method without RL, and the Minimum Distance-based Clustering (MDC) method without CS and RL. The data transmissions and energy consumption in the WSN are analyzed.

Algorithm 1
1: initialize the algorithm's parameters: iteration round $t = 0$, maximum iteration number $r_{\max}$, action set $A = \{ch_1, ch_2, \cdots, ch_p\}$, reward table of each action $R = \{0, 0, \cdots, 0\}$, and ASV table $\mathrm{A}_t = \{0, 0, \cdots, 0\}$
2: while $t < r_{\max}$ do
3:   wait for START message
4:   for all nodes
5:     if $A_t == 0$
6:       randomly select an action from $A$
7:     else
8:       select the action $a$ with the max value in $\mathrm{A}_t$
9:     end if
10:    send the node's sensing data to the selected CH
11:    calculate the corresponding reward using (11) and update the value of $\mathrm{A}_t$ using (8)
12:  end for
13:  for all $CH_i$
14:    $CH_i$ receives $L_i$ intra-cluster data packets,
15:    calculates the sparsity $K_i$ of intracluster data,
16:    generates a $M_i \times L_i$ random Gaussian measurement matrix $\Phi_i$,
17:    compresses the intra-cluster data using (1)
Figure 3 shows the WSN's total data transmissions and energy consumption for the RLDCA, HCDCS, and MDC. We can see that the proposed RLDCA has fewer total data transmissions (Figure 3(a)) and lower energy consumption (Figure 3(b)) than the two contrastive algorithms. The HCDCS is also a data correlation-based clustering method, but it is a static clustering method without RL. Simulation results confirm the advantage of the proposed algorithm. At the end of the 500th round, the proposed RLDCA reduces total data transmissions by 16.6% and 54.4% and energy consumption by 6% and 29%, respectively, compared to the HCDCS and MDC. The drop in data transmissions is more pronounced than that in energy consumption. The reason is that energy consumption is related to distance: agents sometimes choose a far but more correlated cluster, which increases energy consumption to some extent. Both the proposed RLDCA and the HCDCS are CDG methods, and both clearly outperform the MDC method without CDG, which validates the advantage of CDG.
In order to analyze the statistical significance of the obtained performance gain, we repeated this comparison experiment 10 times with different data segments. The final performance gain is calculated by averaging the values of the 10 experiments. Using the statistical analysis software Statistical Program for Social Sciences (SPSS), a paired-sample t-test was applied to the 10 groups of experimental data. The results show that there are significant differences (P < 0.05) between the proposed algorithm and the two contrastive algorithms.
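For readers unfamiliar with the paired-sample t-test, the following sketch computes the test statistic and the degrees of freedom from two matched lists of measurements. It is a generic illustration of the method, not a reproduction of the SPSS analysis; the function name is illustrative and the numbers in the example are made up.

```python
import math


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for the paired differences a[k] - b[k].

    Assumes at least two pairs and that the differences are not all equal.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1


# Made-up matched measurements, for illustration only.
t_stat, dof = paired_t_statistic([1, 2, 3, 4], [0, 0, 1, 1])
```

With 10 paired experiments there are 9 degrees of freedom, and a two-sided test at the 0.05 level rejects the null hypothesis when the absolute value of the statistic exceeds about 2.262.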
However, the advantage of RL does not appear from the very beginning. We notice that the total data transmissions and energy consumption of the proposed RLDCA are larger than those of the HCDCS in the early stage. Figure 4 magnifies the results of the first 100 rounds in Figure 3. It shows that the total data transmissions and energy consumption of the proposed RLDCA are higher than those of the HCDCS in the first 100 rounds. This is in accordance with the characteristics of RL. At the beginning, RL agents have little experience and cannot choose a proper action, which leads to higher data transmissions and energy consumption. That is, a learning period of about 100 rounds is needed to accumulate enough experience. The proposed RLDCA stands out after this period.
The convergence of the proposed algorithm is also verified. Figure 5 shows the average reward of the sensor node $s_8$ with different values of the parameter $c$ in Equation (6). We can see that the proposed RLDCA converges: the average reward received by the agent rises steadily with learning rounds. The parameter $c$ in Equation (6) controls the degree of exploration in RL, and Figure 5 shows how different values of $c$ influence the average reward. In the proposed method, the agent gets the maximum average reward when $c$ equals 1. Hence, we choose $c = 1$ in the simulation. Figure 6 shows the relationship between the energy consumption and the parameter $\alpha$ in the reward function of Equation (9) in the 500th round. During clustering, $\alpha$ is used to balance the distance to CHs and data correlation. If the value of $\alpha$ is higher, an agent places more emphasis on data correlation when it chooses a cluster to join. We can see from Figure 6 that the energy consumption of the network declines as $\alpha$ increases. This shows that considering data correlation during the clustering phase can effectively reduce the energy consumption of the network. However, when the value of $\alpha$ exceeds 0.8, the energy consumption goes up. That means that, besides data correlation, the distances between nodes and CHs cannot be ignored in the proposed clustering algorithm. We find that the optimal value of $\alpha$ in the proposed RLDCA is 0.8, which results in the minimum energy consumption.

Conclusion
In this paper, a reinforcement learning-based dynamic clustering algorithm (RLDCA) for CDG in WSNs has been proposed. It is a distributed method in which all the normal sensor nodes are independent RL agents. The well-designed RL algorithm instructs the agents to select a cluster with strong data correlation and a suitable distance. It is a dynamic clustering method because each agent selects a cluster to join at the beginning of each round of data gathering. This dynamic scheme adapts well to time-varying data in the network. The fixed CHs receive intracluster data and perform CDG to cut down the data transmissions and energy consumption of the WSN. It is also a lightweight and distributed algorithm which is implementable in resource-constrained WSNs. The additional computational overhead introduced by the RL algorithm at the normal node side is only a few comparison operations and linear computations.
Simulation results validate the effectiveness of the proposed RLDCA by comparing it with the HCDCS and MDC algorithms. The proposed RLDCA reduces total data transmissions by 16.6% and 54.4% and energy consumption by 6% and 29%, respectively, compared to the two contrastive algorithms. The convergence of the algorithm is also verified in the simulation. Finally, due to the characteristics of clustering and distributed learning, the proposed RLDCA can also be easily extended to large-scale WSNs.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.