Data Aggregation in Heterogeneous Wireless Sensor Networks by Using Local Tree Reconstruction Algorithm

Aiming at the transmission of heterogeneous data in heterogeneous networks, a topology optimization algorithm for heterogeneous wireless sensor networks based on local tree reconstruction is proposed, which achieves better data transmission in such networks. First, the algorithm divides the nodes of the network into different layers by hop count and chooses different numbers of relay nodes in different layers. Second, the nodes are given different initial energies in different layers. Because the packets of different nodes have different sizes, we adopt data aggregation coefficients that match the actual data requirements of the network. Finally, the lifetime of the network is prolonged by updating the tree topology in real time during data transmission. The simulations indicate that, after these three steps, the proposed algorithm prolongs the lifetime of heterogeneous networks and improves node utilization effectively.


Introduction
In wireless sensor networks (WSNs), the data collected by the sensor nodes can be classified into different types. The data are homogeneous if they have the same properties; otherwise, they are heterogeneous data with different properties. In practical applications, the required data are often heterogeneous, such as image, video, and multidimensional big data. Heterogeneous data transmission is often required for special purposes, because heterogeneous data are multidimensional [1]. During data transmission, combining the data received from child nodes is called data aggregation. The nodes themselves are also heterogeneous, which widens the applicability of the network: they differ in energy, function, communication ability, and so on. Some nodes, called supernodes, are given stronger capabilities to enhance the communication ability of the network.
Researchers often abstract WSNs into graphs and then build and solve a mathematical model when studying network topology. Most of these ideas draw on graph theory. The network topology must satisfy the requirements of topology optimization and destruction resistance. The graph is undirected because communication between nodes is mutual. Therefore, the study of WSNs can be modelled as mathematical problems involving connectivity, sparsity, symmetry, node weights, and duality theory.
The core idea of topology control in WSNs is to ensure the safe and reliable transmission of the data collected by the sensor nodes. The main method of maintaining the constructed network topology is to preserve the integrity of the maximum connected graph. In recent years, the requirements of WSN application fields have increased continuously. The topology is formed by a self-organizing network. In many cases, not only complete communication but also scalability and universality of the network are required. In other words, a topology is required to satisfy the requirements of various types of users [2]. Such a topology greatly expands network operation. The network demands both theoretical operability and practical extensibility, which places higher requirements on the constructed topology [3,4]. In current mainstream research on WSN topology, the optimization is often designed for a specific network. For example, the goal of the MST algorithm is to maximize the network lifetime, and lifetime metrics are the main research target for many researchers. The MST algorithm considers multiple optimization targets; however, it only optimizes the energy consumption and robustness of the network. The applications of heterogeneous WSNs are becoming more and more widespread, and topology optimization technology needs to keep pace with practical applications.
The heterogeneous data-transmitting network constructed in this paper is a tree topology network, which shows good performance in data transmission [5][6][7][8][9][10][11][12]. The tree topology can transmit the collected data better than other network topologies and also has strong destruction resistance. Its advantages are high efficiency in data transmission and data aggregation by the nonleaf nodes of the tree. In references [7][8][9], the requirements of energy and network delay are considered together when constructing the minimum tree topology. Node energy consumption and throughput are the main criteria for measuring the advantages and disadvantages of the established network model [10][11][12].
There are many network topologies constructed for good transmission of the collected data, such as cluster-based topology [13][14][15][16][17], tree-based topology [18][19][20], and other topologies [21]. In references [22][23][24], the authors give a comprehensive introduction to the performance of the abovementioned topologies in WSNs, which indicates that the quality of the topology has a great influence on data transmission, especially in some practical networks [23,25,26]. Some types of computer software commonly used to simulate algorithm performance are also presented. The simulation results for different network sizes often differ, and the gap is large in some situations. Real-time delivery of the transmitted data is strictly required [27], and data security and energy efficiency are also considered in some network topologies [2,18,19,28,29]. Haseeb et al. [30] provide an energy-efficient and secure routing protocol for intrusion avoidance in IoT-based WSNs, which performs better in a physical security network. Liu et al. [31] use disjoint routing for adaptive big data transmission, and the data security is verified in simulations. Therefore, the construction of the network topology should be based on specific network requirements. The constructed model needs to accommodate as many node types as possible while also satisfying the network transmission performance.
In view of the variety of heterogeneous WSNs, we put forward a more practical algorithm that addresses current research needs and is oriented toward future research. The relationship between the graph and its properties is studied, and the corresponding conclusions are obtained. Following these steps, we establish reasonable transmission models aimed at practical applications through the network topology model.
For heterogeneous data transmission, data aggregation by using the local tree reconstruction algorithm (DA-LTRA) in heterogeneous WSNs is proposed in this paper. The network divides the nodes into different layers. Then, each layer selects a certain number of nodes to be relay nodes, and the nodes in different layers have different initial energies. The network updates the tree topology in real time during data transmission to prolong the nodes' survival time. The simulation results show that the lifetime of the network is prolonged effectively after the proposed adjustments, which indicates that the effective utilization rate of the nodes is improved. Therefore, the heterogeneous network achieves better performance in data transmission.
Heterogeneous data transmission places very high requirements on network topology in the context of big data, especially for networks in a specific application environment. Research on real-time data transmission and network energy efficiency is currently a hot topic. Motivated by these observations, we explore data aggregation in heterogeneous WSNs by using the local tree reconstruction algorithm in this paper. The main contributions are summarized as follows:
(1) The properties of the whole graph can be obtained by extending local optimization to global optimization for the network, which reflects the performance of the maximum connected subgraph of the studied graph. Our research provides a theoretical basis and a new idea for future research on data transmission in distributed multilayer networks.
(2) The optimal heterogeneous data transmission network topology is constructed in the context of big data. The routing topology of the whole network is realized through the maintenance and reconstruction of the local network topology, which is more conducive to big data transmission in some application backgrounds.
(3) Flexible data aggregation technology is proposed to remove redundant data and to ensure the authenticity and accuracy of the data. The constructed routing model guarantees low data transmission latency and low node energy consumption at the same time and prolongs the lifetime of the network. The operation of the real network is abstracted as a mathematical problem by establishing and solving the mathematical model. Our model for data transmission is innovative and provides a new approach to cross-disciplinary research on mathematical and engineering problems.
The rest of the paper is arranged as follows. The second part introduces the network model and communication model. The third part presents the main algorithm of our scheme, which models and analyzes the local tree reconstruction technology. The fourth part is the simulation. The fifth part summarizes this paper.

Network Model.
In order to better describe the proposed algorithm, we define the model as follows. The topology model in WSNs can generally be represented by the set of points, edges, and communication radii. This set can be named as a graph G = (V, E, r), where V represents the set of sensor nodes, E represents the set of edges formed by pairs of nodes that can communicate in the network, and r represents the communication radius of the sensor nodes. The network deploys N nodes randomly in a circular region, and each node is in one of two states: working or sleeping. Sink is located on the edge of the circular area, and all nodes transmit data to Sink through the relay nodes in the network. We assume the following:
(1) All nodes remain stationary after being deployed in the designated area.
(2) All nodes are heterogeneous. We set different initial energies for the nodes, expressed as E_initial.
(3) Each node has the same signal intensity value, and the power of the node signal is adjustable. The communication radius of every node is the same value r.
(4) The nodes can detect the information of their neighbor nodes.
(5) There are no attack nodes in the network.
(6) Sink is located at the edge of the network.
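As a rough illustration of the model above, the following Python sketch deploys nodes in a disc, connects every pair of nodes within the radius r, and assigns each node its hop count from Sink by breadth-first search. The function names, the random seed, and the way Sink is represented (as one index into the point list) are illustrative assumptions, not part of the original work.

```python
import math
import random
from collections import deque


def deploy_nodes(n, region_radius, seed=0):
    """Place n nodes uniformly at random inside a disc centred at the origin."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # The square root keeps the density uniform over the area of the disc.
        rho = region_radius * math.sqrt(rng.random())
        theta = 2 * math.pi * rng.random()
        points.append((rho * math.cos(theta), rho * math.sin(theta)))
    return points


def build_edges(points, r):
    """Undirected adjacency sets: two nodes are neighbours if within distance r."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def hop_layers(adj, sink):
    """Hop count from the sink for every reachable node (breadth-first search)."""
    hops = {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops
```

To place Sink on the edge of the region, one could append the point (region_radius, 0) to the list returned by deploy_nodes and pass its index to hop_layers. Nodes that do not appear in the result are disconnected from Sink.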

Energy Model.
The energy model is similar to the energy model given in reference [32]. The energy consumed in transmitting and receiving data is formulated according to the wireless communication model, where E_tx(l, d) represents the energy consumption of transmitting l-bit data over a distance d; E_rx(l) represents the energy consumption of receiving l-bit data; E_elec is the energy consumption of the wireless transceiver circuit in transmitting or receiving 1-bit data; and ε_fs represents the magnification of the signal amplifier power in the free-space model. The node also consumes energy while aggregating m-bit data.
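The exact expressions follow reference [32]. The sketch below assumes the widely used first-order radio model with free-space loss, E_tx(l, d) = l·E_elec + l·ε_fs·d² and E_rx(l) = l·E_elec, plus a per-bit aggregation cost. The numeric constants and the name E_DA are placeholder assumptions chosen only to make the example runnable; they are not values from Table 1.

```python
# Placeholder constants (assumptions, not taken from the paper).
E_ELEC = 50e-9   # J/bit, transceiver circuit energy
EPS_FS = 10e-12  # J/bit/m^2, free-space amplifier coefficient
E_DA = 5e-9      # J/bit, hypothetical per-bit aggregation cost


def e_tx(l, d):
    """Energy to transmit l bits over distance d (assumed free-space model)."""
    return l * E_ELEC + l * EPS_FS * d ** 2


def e_rx(l):
    """Energy to receive l bits."""
    return l * E_ELEC


def e_da(m):
    """Energy to aggregate m bits (assumed linear in m)."""
    return m * E_DA
```

Under these assumptions the transmission cost grows with the square of the distance, which is why shortening the hop to the relay node matters for nodes that forward large packets.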

Data Aggregation Model.
In the PEDAP algorithm [33], the node i aggregates its own data packets and its children's data packets into a single data packet and then transmits that packet to the next relay node. In many situations, Sink needs a certain amount of original data. However, very little of the original data remains when it arrives at Sink under the above aggregation method, so the accuracy of the information cannot be guaranteed. We want the data received by Sink to stay as close as possible to the data produced by the source nodes. Assume that the parent of the source node i is the node j and the parent of the node j is the node k. The node k is outside the communication radius of the node i, which means that the nodes i and k have low correlation. We want only a small amount of the source data of the node i to be aggregated away by the node j, so that the node k can retain more original data. For example, under the above data aggregation method, only 1 × 0.5 × 0.5 = 0.25 unit of the data packet from the node i remains at the node k. Even less original information remains if the node k continues to transmit the data to Sink over multiple hops. The above data aggregation algorithm therefore has a serious shortcoming and leads to a loss of data information.
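The shrinkage in this example can be checked with a few lines of Python. The helper below is only an illustration of the arithmetic; the fixed ratio 0.5 is the value used in the example, and the function name is an assumption.

```python
def remaining_fraction(ratios):
    """Fraction of a source packet left after each relay applies its aggregation ratio."""
    fraction = 1.0
    for ratio in ratios:
        fraction *= ratio
    return fraction


print(remaining_fraction([0.5, 0.5]))  # 0.25, the two-relay example in the text
print(remaining_fraction([0.5] * 5))   # 0.03125 after five such relays
```

With a fixed ratio the surviving share decays geometrically with the hop count, which is the shortcoming that motivates a hop-dependent coefficient.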
Therefore, the node aggregation policy needs to be redefined. The node aggregation coefficient is defined in terms of hop_max, the maximum hop count of the nodes in the network, and hop(i), the hop count from the node i to Sink. This coefficient relates the data packets of the node i before and after data aggregation.

Data Aggregation in Heterogeneous WSNs by Using the Local Tree Reconstruction Algorithm
The tree network topology in WSNs is well suited to fusing information across the whole network. For instance, in a nuclear power plant, we can establish a data aggregation tree to obtain useful information about each workshop without exposure to nuclear radiation. Figure 1 is the schematic diagram of the data-transmitting topology structure model in this research.
The tree structure has good scalability. Nodes near Sink consume energy quickly because they need to receive and transmit a lot of data. Therefore, the first step of the tree reconstruction method is to reselect the relay nodes when the tree is updated. The main purpose of the network topology design is thus to prolong the lifetimes of the bottleneck nodes.
As shown in Figure 1, the nodes a-o are the parent nodes. The parent nodes need to collect and receive the data from their children nodes and then transmit the data to their own parent nodes after aggregating all data. If all nodes in the network have the same initial energy, the bottleneck nodes consume a large amount of energy, which leads to nonuniform node energy consumption. Therefore, the end result is that the lifetime of the network is shortened and the network topology fails.
With a view to the information similarity property of the data collected by the nodes whose geographic positions are close, we let some parent nodes only transmit the data based on the proportion considered above. e amount of data will be reduced; meanwhile, the premature occurrence of the bottleneck nodes will be restrained after this treatment.

Selecting Method of Relay Nodes.
In this paper, the first step is to improve the energy balance of the whole network. The method we adopt is to choose nodes whose positions are near Sink as relay nodes.
As shown in Figure 2, assume that the node data aggregation rate is 1 and the packet size generated by each node is 1 in the whole network. In Figure 2(a), bottleneck nodes appear too early because the nodes a, c, d, and g carry very large data packets. After we select other nodes as relay nodes in a certain proportion, as shown in Figure 2(b), the packet size transmitted by the nodes a, c, d, and g is reduced to nearly half. This means that the network load is balanced.
However, the network needs to collect accurate data while we prolong its lifetime. The key here is that we do not allow too many nodes to be relay nodes. This ensures that more raw data information is collected and transmitted.
To ensure that a layer with more nodes has more relay nodes and a layer with fewer nodes has fewer relay nodes, the proportion of relay nodes ρ_{i−hop} in the layer i is chosen according to n_{i−hop}, the number of nodes in the layer i, and N, the total number of nodes in the network. The number of relay nodes in the layer i follows from this proportion. In this way, the nodes can save energy effectively.
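The exact expression for the proportion is not reproduced in this text. Purely for illustration, the sketch below assumes ρ_{i−hop} = n_{i−hop} / N and a relay count of ρ_{i−hop} · n_{i−hop}, rounded and kept at one or more; both the formula and the rounding rule are assumptions, chosen only because they give larger layers more relay nodes as the paragraph requires.

```python
def relay_counts(layer_sizes):
    """Map each hop layer to an assumed number of relay nodes.

    layer_sizes: dict mapping hop count i to the number of nodes n_i in that layer.
    """
    total = sum(layer_sizes.values())  # N, the total number of nodes
    counts = {}
    for hop, n_i in layer_sizes.items():
        # ASSUMPTION: proportion rho_i = n_i / N, so the count is n_i * n_i / N.
        # ASSUMPTION: round to the nearest integer and keep at least one relay.
        counts[hop] = max(1, round(n_i * n_i / total))
    return counts


print(relay_counts({1: 10, 2: 30, 3: 60}))  # {1: 1, 2: 9, 3: 36}
```

Any other monotone rule could be substituted in the marked lines without changing the surrounding logic.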

Method of Initial Energy Setting for Heterogeneous Nodes.
In heterogeneous networks, the task of the nodes is to monitor the network in real time. The method of data transmission is introduced in Section 2.3, so each node transmits packets of a different size to its relay node. Obviously, the nodes with lower hop counts transmit and receive more data, consume more energy, and die earlier.
As shown in Figure 3(a), assume that the initial energy of every node is 40. The remaining energy of the node d is 4 after 3 rounds, which is not enough to sustain the next round of data transmission, as shown in Figure 3(b).
In view of the above deficiencies, we set different initial energies for the heterogeneous nodes in different layers. We set higher energy for the nodes in lower layers, which means that the nodes near Sink have higher initial energy. Assume that the energy of the nodes with the maximum hop count is E_initial, and take the influence of the aggregation proportion into account. The node energy in the layer i is denoted E_{i−hop} and is defined from E_initial and the aggregation proportion. As shown in Figure 4, the node d and the other nodes near Sink have higher initial energy after the initial node energy is reset. Assume that 4 is the highest hop count in the network, so that the initial energy of the node d is 40 × 1.5 = 60. The usability of the node d is therefore improved effectively: the node d can run 5 rounds, and the network run time is increased by 2 rounds. The lifetime of the nodes in the heterogeneous network is thus clearly prolonged after the second step.

Load-Balancing Method of the Nodes.
The network collects, aggregates, and transmits data according to the topology (as shown in Figure 5(a)). The energy consumption of the node d is 12 in each round, and the node d has 12 units of energy left after running 4 rounds. The node d will die after the next round if the tree still runs in the same way, so the node lifetime in the network is greatly affected. We find that the node e, which is near the node d in the same layer, has higher remaining energy, so the node d transfers its children to the node e. Subsequently, the node d becomes a leaf node and is no longer responsible for aggregating and transmitting data. At the same time, the node e becomes a relay node and takes over aggregating and transmitting data. Finally, the network is rebuilt according to the above method.
As shown in Figure 5(b), the remaining energy of the node f is 0 after running 4 rounds if we continue to run the network. But it still prolongs the network lifetime to 4 rounds (as shown in Figure 5(c)) when we use the above load balance method similarly. erefore, the lifetime of the network is prolonged if we follow the same node topology adjustment method.
We extend the overall network lifetime after using the above method repeatedly, and the energy consumption of the nodes in the same layer is balanced after the above operation.
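The handover described in this subsection can be sketched as follows. This is a minimal illustration, assuming the tree is stored as a child-to-parent dictionary and that the caller passes only candidates that are close enough to serve the children; the function and parameter names are hypothetical.

```python
def hand_over_children(parent, energy, layer, relay, candidates):
    """Move the children of an exhausted relay to a same-layer node with more energy.

    parent:     dict child -> parent describing the current tree (modified in place)
    energy:     dict node -> remaining energy
    layer:      dict node -> hop layer
    relay:      the relay node that is about to run out of energy
    candidates: nodes near the relay (ASSUMED to be within range of its children)
    Returns the chosen successor, or None if no suitable node exists.
    """
    suitable = [
        c for c in candidates
        if c != relay and layer[c] == layer[relay] and energy[c] > energy[relay]
    ]
    if not suitable:
        return None
    successor = max(suitable, key=lambda c: energy[c])
    for child, p in parent.items():
        if p == relay:
            parent[child] = successor  # the old relay becomes a leaf
    return successor
```

For example, with parent = {"x": "d", "y": "d", "d": "a", "e": "a"}, remaining energies of 12 for d and 30 for e, and both in the same layer, calling the function with relay "d" and candidates ["e"] reattaches x and y to e and leaves d as a leaf under a.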

Flow Chart of the Algorithm.
To better describe the proposed algorithm in this research, a flow chart of the algorithm is shown in Figure 6.
First, the greedy algorithm is used to build a minimum routing tree network topology from Sink to the leaf nodes. The relay nodes are selected according to the first step. Then, in the second step, different initial node energies are set according to the different tasks of the different layers. Finally, the network performs the dynamic local routing tree evolution. In practical applications, the number of surviving nodes is used to determine the state of the network. The network is considered dead and outputs the results when a given proportion of the nodes have died. Otherwise, the tree over the surviving nodes is reconstructed. We repeat the network optimization operations until the given node death proportion is reached.
Our research on max-flow is actually a maximum connected subgraph optimization problem. The desired results are obtained by studying the maximum connected subgraph.
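The control flow described in this subsection can be sketched as a simple loop. The sketch assumes that the per-round energy cost and the tree reconstruction are supplied as callbacks; cost_per_round, rebuild, and the default parameter values are hypothetical names and settings introduced for illustration only.

```python
def simulate(energy, cost_per_round, rebuild, death_ratio=0.5, update_every=90):
    """Run rounds until the share of dead nodes reaches death_ratio.

    energy:         dict node -> remaining energy (modified in place)
    cost_per_round: callable(alive_nodes) -> dict node -> energy spent this round
    rebuild:        callable(alive_nodes), invoked every update_every rounds
    Returns the number of rounds completed.
    """
    total = len(energy)
    alive = {n for n, e in energy.items() if e > 0}
    rounds = 0
    while (total - len(alive)) < death_ratio * total:
        if rounds % update_every == 0:
            rebuild(alive)  # local tree reconstruction over surviving nodes
        for node, cost in cost_per_round(alive).items():
            energy[node] -= cost
        alive = {n for n in alive if energy[n] > 0}
        rounds += 1
    return rounds
```

Keeping the reconstruction behind a callback makes it easy to compare update intervals, since only update_every changes between runs.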

Theoretical Analysis on the Effectiveness of Energy Consumption of DA-LTRA
In this section, we analyse the effectiveness of the energy consumption in the proposed DA-LTRA. Through the analysis of the proposed model, we conclude that its energy savings are mainly embodied in the following two aspects: (i) for each node in the local network, the node energy consumption and packet load are greatly reduced owing to the participation of relay nodes and heterogeneous energy; (ii) for all nodes in the global network, in the process of relay transmission, a node selects as its relay the upper-layer node whose energy is higher than that of the others. To sum up, the average transmission distance, packet size, and energy consumption of the network are much smaller than those of networks that are not dynamically adjusted and not divided into multiple layers. We compare the proposed DA-LTRA with DADAT [32] under the same initial energy.
In DADAT, all the nodes can be chosen as relay nodes. The parent of the node i is the node j. Assume there are n nodes in the network, and the relay node i receives l_i-bit data and produces m_i-bit data. The energy consumption of the node i has three aspects: data reception, data aggregation, and data transmission. The energy for the node i is the sum of these three parts, and the energy consumption of all the nodes in each round is given by equation (11).
To ensure the smooth and efficient operation of the network, we simplify equation (11) to find the minimum energy consumption; the result is equation (12). Assume that the average amount of data produced by each node is m and the average distance between a node and its relay node is d_{C−S}. The minimum value of the energy consumption then follows from equation (12) and is given as equation (13). In a similar way, for our design of the DA-LTRA, assume that there are k leaf nodes and n − k relay nodes in the network. In the layer i, the number of nodes is n_{i−hop}, which determines the number of relay nodes in the layer i. The energy consumption is computed separately for the leaf node j, whose relay node is o, and for the relay node p, whose relay node is q; the other assumptions are the same as for DADAT. Therefore, the energy consumption of all the nodes in each round for the DA-LTRA is given by equation (16). Similarly, we can obtain the maximum of the energy consumption from equation (16), which is given as equation (17). Comparing equations (13) and (17), we can see that the maximal energy consumption of DA-LTRA is clearly smaller than that of DADAT. This simple mathematical analysis shows that the optimization in our algorithm reduces the redundant data and hence the energy consumption. Furthermore, our model raises the initial energy of the nodes according to their network layer and thus prolongs the lifetime of the network.

Experimental Environment and Simulation Parameters.
In order to verify the effectiveness of the algorithm proposed in this research, we perform simulation experiments to compare the proposed algorithm with the similar DADAT and GIT (Greedy Incremental Tree) algorithms. The simulation environment is 64-bit Windows 7 with MATLAB 2012a, an i7-4720HQ CPU, and 8.00 GB of memory. The proposed and compared algorithms are simulated under the same network environment with similar initial network settings. The simulation parameters in the network are shown in Table 1.

Simulation of the Optimization on the Network Update Rounds.
Under the optimal parameters, the simulation results can better reflect the network performance. The proposed algorithm is compared with DADAT in the simulation experiment. Under the same simulation environment, the DADAT tree is reestablished every 90 rounds, that is, the number of update rounds is 90.
It can be seen from Figure 7 that the lifetime of the network is reduced by about a factor of two when the number of network update rounds is more than 110 or less than 70. This indicates that the energy consumed by data calculation and data transmission has a great impact on the lifetime of the network. The network has the highest lifetime when the number of update rounds is 90. Therefore, we select 90 as the number of update rounds in this simulation.
If the network is updated too frequently (at intervals shorter than the optimal value of 90 rounds), the data transmission and computation of the nodes cause a lot of energy consumption, which shortens the overall network lifetime. On the other hand, if the network is updated too slowly, some nodes also consume energy rapidly because they undertake too much data forwarding. In the simulation experiment, the optimal tree topology update interval is selected and fixed in advance. Therefore, the topology evolution in the optimal environment can maximize the utilization rate of the network.

Simulation of the Network Lifetime.
The number of surviving nodes reflects the variation of the maximum connected subgraph in the network. As shown in Figures 8 and 9, the proposed DA-LTRA improves the lifetime of the network effectively. When the first node in the network dies, the network lifetime is 9160 rounds for DA-LTRA, 5945 rounds for DADAT, and 4121 rounds for GIT. The improvement in network lifetime reaches up to 122% and 45% compared with GIT and DADAT, respectively. This is because the proposed algorithm handles the initial node energy setting and the relay node selection better than the DADAT and GIT algorithms under the same heterogeneous conditions. Additionally, we use the local tree reconstruction technology to improve the network load balance in the tree maintenance phase. Figure 10 gives the number of packets in the network per round under 200 nodes and 300 nodes, respectively. We can see from the simulation diagram that the proposed DA-LTRA delivers more packets per round. The number of packets in one round is 1550, 1315, and 1250 under 200 nodes and 1735, 1437, and 1367 under 300 nodes for DA-LTRA, DADAT, and GIT, respectively. Therefore, the proposed algorithm still maintains a high data volume, and hence data accuracy, even after data aggregation. In the GIT algorithm, the data aggregation ratio for each relay node is 1, which easily leads to the loss of some packets. In the DADAT algorithm, the aggregation ratio is determined by node distance, and a lot of nodes are selected as relay nodes. The proposed algorithm improves the selection scheme of relay nodes and the aggregation ratio to avoid the above deficiencies. By selecting high-energy nodes as relay nodes for data transmission, the accuracy of the data can be guaranteed and energy consumption is reduced. The topological stability for data transmission can reach a very high level.
Therefore, our model achieves an optimal network topology for data transmission, and the whole network converges quickly to a global optimum through local optimization. Figure 13 gives the energy overheads under different numbers of rounds. When the network runs 5000 rounds, the energy consumption of DA-LTRA, DADAT, and GIT is 0.17 J, 0.22 J, and 0.39 J, respectively; at 9000 rounds, it is 0.3 J, 0.7 J, and 0.9 J. Compared with GIT, the improvement of the DA-LTRA algorithm reaches 129% at 5000 rounds and 200% at 9000 rounds. Therefore, the energy overhead of routing improves considerably as the number of network running rounds increases. Owing to the different energy settings of the nodes in different layers, the performance of the proposed algorithm suggests that the heterogeneous network topology has better properties for big data transmission.
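The percentages quoted against GIT can be reproduced from the reported values if the improvement is taken as the difference between the two values relative to the smaller one. That reading is an inference from the numbers, not a definition stated in the text, and the helper name below is an assumption.

```python
def relative_gap(a, b):
    """Difference between two positive metrics, relative to the smaller one."""
    return abs(a - b) / min(a, b)


comparisons = {
    "lifetime, first node death (rounds)": (9160, 4121),
    "energy at 5000 rounds (J)": (0.17, 0.39),
    "energy at 9000 rounds (J)": (0.3, 0.9),
}
for name, (da_ltra, git) in comparisons.items():
    print(f"{name}: {relative_gap(da_ltra, git):.0%}")
# lifetime, first node death (rounds): 122%
# energy at 5000 rounds (J): 129%
# energy at 9000 rounds (J): 200%
```

For lifetime the smaller value is the GIT baseline, whereas for energy the smaller value is DA-LTRA itself, so the two kinds of percentage are not normalized by the same algorithm.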

Routing Overheads Evaluation.
Real-time data transmission is another important measure of network overhead. In our work, the data transmission delays of the different algorithms are compared. From Figure 14, we can see that the proposed DA-LTRA has lower delays than GIT and DADAT. The delays are 20.75 ms, 21.42 ms, and 21.85 ms under 200 nodes and 21.60 ms, 22.25 ms, and 22.45 ms under 300 nodes for DA-LTRA, DADAT, and GIT, respectively. The improvement reaches almost 5% and 2% compared with GIT and DADAT, respectively. We can also see from Figure 14 that the overall delays of all algorithms grow as the number of network nodes increases. This is because a higher node density increases the amount of data and the number of subroutes, which increases the data transmission delay from the source node to Sink. In DA-LTRA, however, the nodes do not select nodes in the same layer as relay nodes, and only part of the nodes in each layer are selected as relay nodes, which reduces the subroutes. In this way, the delay is reduced effectively.

Packets Drop Ratio.
Packets drop ratio (PDR) is the ratio of the number of dropped packets to the number of packets sent to Sink. A lower PDR indicates better performance of the algorithm. Figure 15 compares the packets drop ratio of the proposed DA-LTRA with that of the DADAT and GIT algorithms.
The proposed system results in a lower packets drop ratio than the compared algorithms. In particular, when the network runs 9000 rounds, the packets drop ratio is 3.3% for the proposed DA-LTRA, compared with 4.6% and 5% for DADAT and GIT, respectively. The proposed algorithm still maintains a low packets drop ratio when there are some dead nodes. The above experimental results show that the proposed DA-LTRA performs better than the DADAT and GIT algorithms. Hence, the objective of the DA-LTRA is achieved after three steps of adjustment.

Conclusions
The tree topology has strong damage resistance, and the maximum connected subgraph is the basis for studying topology properties. In this research, a tree topology optimization algorithm is proposed for heterogeneous data, which place higher requirements on network topology.
In general, a homogeneous network has difficulty meeting the service quality requirements of heterogeneous data transmission. Therefore, the strategy we adopt in this research is to use heterogeneous nodes to complete the transmission of these different types of data. Such a data-transmitting mode can meet the real-time and accuracy requirements of data transmission. In the process of topology optimization, this research adopts three steps to adjust the structure of the network, after which the network has a robust topology. The simulation results show that the proposed algorithm achieves higher performance and effectively reduces the energy consumption of the nodes under the same simulation conditions and parameters. The data transferred in heterogeneous networks are of different types, including two-dimensional data, audio, image, and video. These data place heavy demands on the network topology, so their transmission requirements are difficult to meet in homogeneous networks. Usually, heterogeneous data packets are very large. The research work in this paper only involves the optimization of data transmission in WSNs. The tree-based sensor network topology proposed in this paper only considers the data transmission between link layers. Many problems remain to be studied further, for example, the topology power control of sensor nodes between different layers and the node power control strategies at different layers (such as the physical layer, link layer, and network layer). Furthermore, because of its technical difficulty, we have not studied the optimization problem under the data-sending rate in this paper, although this problem needs to be studied in data transmission-based networks. In our future work, we will study the related network topology optimization problems under data-sending rate scenarios.

Data Availability
The data (including the simulation parameters and simulation program) used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.