A Data Dissemination Method Based on Region Type Correlations for Mobile Opportunistic Networks

In Mobile Opportunistic Networks (MONs), node movements and the uncontrollable on/off switching of the carried communication devices make the contacts between nodes scarce and momentary, so a data packet must be transferred through several discrete hops. To avoid the costly flooding of data packets, the data packets are typically disseminated to relay nodes selected by the data holders. However, the mobility patterns of nodes differ across types of regions (such as residential, commercial, scenery, or industrial regions); i.e., the movement directions and movement ranges of nodes vary frequently as the nodes move among various regions. Existing works on data dissemination have not investigated region types or region type correlations. To this end, we propose a Region Type based Data Dissemination Method (RTDDM) for MONs, which exploits the region type correlations and selects proper relay nodes through a Markov decision model. To verify the performance of RTDDM, we give a theoretical analysis as well as an elaborate simulation study, the results of which show that RTDDM can improve the delivery ratio and reduce the delivery delay, especially in applications with various region types.


Introduction
Mobile Opportunistic Network (MON) [1,2] is one of the emerging communication paradigms in wireless mobile communications. An MON is defined as a mobile network where the communications between nodes are challenged by sporadic and intermittent contacts, as well as the frequent disconnections and reconnections, and thus a stable communication path from the source to the destination usually cannot be obtained. The MON applications are diverse, such as the vehicle networks for traffic information sharing, the mobile sensor networks for wildlife tracking, and the pocket switched networks comprised of human-carried mobile devices.
In most traditional routing methods, end-to-end paths must be established before the data packets are disseminated along them. The network topology must therefore be globally connected, so that at least one stable end-to-end communication path can be found. However, MON nodes move frequently, and the communication devices carried by the nodes switch between on and off unpredictably. These characteristics make end-to-end paths extremely difficult to establish, and hence the traditional routing methods cannot be applied to MONs. In MONs, the node contacts must be exploited for data dissemination; i.e., the data packets are disseminated from the data holders (the nodes carrying data packets) to the encountered nodes (the nodes not carrying data packets) and are finally transferred to the destination node through discrete hops.
Therefore, the held data packets are allowed to be disseminated to some relay nodes by the data holders. Due to the unpredictability in future movements of nodes, the data dissemination method (especially how to select the proper relay nodes) becomes a vital issue for the improvement of delivery ratio and the reduction of delivery delay.
The node contacts are concerned with the region types. In many MON applications, the nodes are likely to appear in some regions with special types more frequently or stay in these regions for a longer period. For example, the tourists visit some scenery regions more frequently, and the residents are more likely to stay in some residential regions, as shown in Figure 1. These features can help improve the data dissemination in MONs; i.e., the future contacts between nodes can be predicted according to the region types, and the proper relay nodes can be selected to make the data packets be delivered to the destination nodes as soon as possible. At present, the issues regarding the region types and the region type correlations have not been paid enough attention for the data dissemination of MONs. Note that the mobility patterns of nodes will be different in the different types of regions, and the movement directions and movement ranges of nodes are frequently varied when the nodes move among various regions. Thus, the data dissemination can be improved through differentiating the region types and investigating the region type correlations.
Our research is also inspired by the following observations: (a) the movements of communication devices in MONs are controlled by their carriers (human beings), and the mobility of human beings is driven by their habits, which are typically stable in the long term. Therefore, the movement directions and movement ranges in different types of regions (such as residential, commercial, scenery, or industrial regions) are quite different; (b) the randomness in node mobility allows the data dissemination problem to be formulated mathematically as a sequential decision problem, for which the Markov decision method is well suited [3,4].
This work is an extension of our previous conference paper [5]. In this paper, we improve the Region Type based Data Dissemination Method (RTDDM), especially in terms of the action determinations and message interactions, and some theoretical analysis is conducted for RTDDM. Besides, more simulation results are also provided to further clarify the merits of RTDDM. The remainder of this paper is organized as follows. Section 2 discusses related studies. Section 3 describes the network model. Section 4 presents our algorithm. Section 5 provides some theoretical analysis regarding RTDDM. The simulation study is reported in Section 6. This paper is concluded by Section 7.

Related Work
Recently, some relevant research has been conducted on data dissemination in MONs. McGeehan et al. propose a message routing approach, ChitChat [6], which exploits users' direct and transient social interests via discriminatory gossiping to penetrate messages deeper into the network. ChitChat enables message carriers to make opportunistic and distributed routing decisions based on the likelihood of forwarding the messages to the destination. In [7], a geographic-based spray-and-relay (GSaR) routing scheme is proposed. By estimating the movement range of the destination from historical geographic information, GSaR expedites the spraying of messages toward this range and postpones the spraying of messages away from it. Reference [8] gives two sociality-based routing schemes, named contact based routing and community aware two-hop routing, respectively. These schemes are designed based on graph properties (such as the average path length and modularity) and the node encounter process. Cai et al. construct a cross-layer distributed opportunistic routing protocol [9], which considers the channel sensing strategy, the forwarder selection for each secondary user, and the package division scheme on each link.
To overcome the communication problems of changing topology and missing infrastructure, [10] presents a lightweight and scalable routing protocol that exploits the limited social circle of each node. In [11], the Community-Based Forwarding (CBF) considering the community interactions is developed. CBF is able to explore the intermediate connections between clusters to route the messages with more balanced node participation and higher levels of reliability. To improve the dissemination efficiency, [12] presents a routing algorithm called Popular Node Gateway Protocol (PNGP). PNGP is validated through simulations, the results of which indicate that it performs satisfactorily in terms of delivery ratio, delivery delay, and transmission cost. In [13], an analytical model of message dissemination by applying the concept of mean free path is proposed, and each node contact is represented as a collision between two particles. In order to relieve the congestion without affecting the delivery delay, Ciobanu et al. present an interest-based dissemination algorithm where nodes tend to group together based on interests [14]. Reference [15] proposes a stochastic approach for modeling the message dissemination in opportunistic networks to choose the appropriate relay nodes according to the features of the application scenario.
In this paper, the issues of region type correlations and region weights are specially exploited and formulated, and then a Markov decision model is applied to determine the relay nodes of data holders. Thus, the data packets could be continuously propagated to the proper relay nodes to improve the delivery ratio and reduce the delivery delay.
Wireless Communications and Mobile Computing 3

Network Model
Time is divided into discrete time slots of equal length. Each data packet must be disseminated from its source node to the destination region before a deadline, which is expressed as a preset number of time slots.

3.1. Nodes. The network contains a fixed set of mobile nodes, all of which have the same communication range. Each node has a coordinate at every time slot, and two nodes are neighbors at a given time slot if the distance between their coordinates at that slot does not exceed the communication range.

3.2. Regions. The network area Ω is divided into a number of regions. Each region is denoted by a triple that consists of the two coordinates of its center and its region type. Several region types are distinguished, such as residential, business, industrial, and educational regions. The distance between two regions is calculated from their center coordinates.

The human mobility in MONs is analyzed with a real-world dataset (the Dartmouth College wireless local area network (WLAN) trajectories [16]), in which the connections and disconnections of nodes with access points (APs) are recorded. Figure 2 illustrates that the nodes usually visit a few APs frequently; the average probability of visiting these APs is larger than 0.7. This result motivates the notion of home regions, i.e., regions in which a node appears with very large probability. The home regions of a node form a region set, where the probability of the node appearing in each region of the set is larger than a preset threshold. The distance between the home region sets of two nodes is computed as the minimum distance between a region of one set and a region of the other.

3.3. Region Type Correlation and Region Weight. The nodes are prone to appear in correlated regions. The correlation between region types is calculated from the movement trajectories of nodes. It is built on an indicator defined over a pair of region types and a node, which is set to 1 if at least two such regions belong to the home region set of that node simultaneously, and to 0 otherwise. Each region also has a region weight in [0, 1], which indicates the possibility of data packets being disseminated from that region to the destination region. The region weight combines the region type correlation and the region distance through two preset exponents, which reflect the influence of region type and region distance on the region weight, respectively. Finally, at each time slot, a weight of receiving and storing the data packets is defined for each node.
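The neighbor test, the home region rule, and the distance between home region sets described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the list-of-visits input format, and the use of Euclidean distance between region centers are assumptions made here.

```python
import math
from collections import Counter


def are_neighbors(pos_a, pos_b, comm_range):
    """Two nodes are neighbors when their distance is within the communication range."""
    return math.dist(pos_a, pos_b) <= comm_range


def home_regions(visits, threshold):
    """Regions whose empirical visit probability exceeds the preset threshold.

    visits: one region id per observed time slot (hypothetical input format).
    """
    if not visits:
        return set()
    counts = Counter(visits)
    total = len(visits)
    return {region for region, c in counts.items() if c / total > threshold}


def home_set_distance(home_a, home_b, centers):
    """Minimum center-to-center distance between a region of one set and one of the other.

    Assumes both sets are non-empty and that distance is Euclidean.
    """
    return min(math.dist(centers[p], centers[q]) for p in home_a for q in home_b)
```

For instance, a node observed in regions `["R1", "R1", "R2", "R1"]` with a threshold of 0.5 has the single home region `R1`, because only `R1` is visited in more than half of the slots.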

Data Dissemination Method
In this section, we propose a Region Type based Data Dissemination Method (RTDDM) for MONs. In particular, in RTDDM, the Markov decision model is used to determine the relay nodes whose home regions are with larger region weights.
4.1. Dissemination Process Based on Markov Decision. The dissemination process based on Markov decision is described as follows:
(1) Decision stages: T = {1, . . . , N}, where N is the number of neighbors that the node encounters during a time slot.
(2) State set: S = {0, 1, △} is a finite set of states, where "1" indicates that the current encountered node is the best relay node so far and "0" indicates that it is not. △ indicates the termination of dissemination at the current time slot; i.e., the data holder has already encountered all of its neighbors, or the data packets have been disseminated to the encountered nodes.
(3) Action set: the action set depends on the current state. It contains the action of the current encountered node storing the data packets and the action of the current encountered node rejecting the data packets.
(4) Reward function: when the number of nodes is large enough, the nodes are considered to be approximately evenly distributed. At each time slot, the reward obtained at the k-th decision stage is built on the ratio k/N, which is the probability of the best relay node falling into the set of the first k encountered nodes.
According to (8), the transition probability is equal to 1/(k + 1) when the (k + 1)-th encountered node is so far the best relay node.
Then, we give the solution process for the above problem. At each time slot, the value of state 1 at the k-th decision stage is the maximum probability of the best relay node being selected during the period from the current decision stage to the end stage, given that the k-th encountered node is the best among the nodes encountered so far. Likewise, the value of state 0 is the maximum probability of selecting the best relay node from the next decision stage to the end stage, given that the k-th encountered node is not the best among the nodes encountered so far.
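The solution process has the structure of the classic best-choice (secretary) problem, and its stage values can be computed by backward induction. The sketch below is an illustration under simplifying assumptions: the function name is hypothetical, and the per-node weight that appears in the reward of RTDDM is fixed to 1, so the reward of storing at stage k is simply k/N.

```python
def stage_values(num_stages):
    """Backward induction for the best-choice problem with num_stages >= 1 stages.

    Returns two lists indexed by stage k = 1..num_stages (index 0 is unused):
      u1[k]: best success probability when the k-th node is the best so far,
      u0[k]: best success probability when it is not.
    """
    n = num_stages
    u1 = [0.0] * (n + 1)
    u0 = [0.0] * (n + 1)
    u1[n] = 1.0  # the last node, if best so far, is the overall best
    u0[n] = 0.0  # no node is left to select
    for k in range(n - 1, 0, -1):
        # The (k+1)-th node is the best so far with probability 1/(k+1).
        u0[k] = u1[k + 1] / (k + 1) + u0[k + 1] * k / (k + 1)
        # Storing succeeds with probability k/n; rejecting yields the value of continuing.
        u1[k] = max(k / n, u0[k])
    return u1, u0
```

With three decision stages, the value at the first stage is 0.5, which matches the known optimum of the best-choice problem for three candidates. Storing is preferable at stage k exactly when k/N is at least `u0[k]`.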
4.2. RTDDM. In RTDDM, to restrict the dissemination cost, the number of copies disseminated at each time slot is capped by a preset maximum. At each time slot, the relay nodes are selected through the Markov decision model described in Section 4.1. In particular, each data holder decides whether to disseminate the held data packets to its neighbors through several decision stages. In each decision stage, the data holder disseminates the held data packets to a bounded number of neighbors, some of which become relay nodes. The decision stages are repeated until the maximum number of relay nodes has been selected or all the neighbors have received the data packets. The detailed description of RTDDM is given as follows.
Step 1. According to the previous trajectories of nodes, each node determines its home regions. The region type correlations and the region weights are calculated.
Step 2. At each time slot, each node broadcasts an inquiry message, as shown in Figure 3, which carries the list of data packets held by the node at the current time slot.

Step 3. After receiving the inquiry message (as shown in Figure 3), each neighboring node replies with a response message. If a data holder receives a response message from the destination node, it sends the data packets to the destination node immediately; otherwise, Step 4 is carried out.
Step 4. Each data holder disseminates the held data packets to a bounded number of its nearest neighbors. For each receiver, let k be its position in the encounter order and N the number of decision stages. If the current state of the k-th neighbor is equal to 1 and its weight of receiving and storing the data packets, multiplied by k/N, is no less than the value of state 0 at that stage, the neighbor takes the action of storing the data packets; otherwise, if its state is equal to 0 or the product is smaller than the value of state 0, it takes the action of rejecting them.
Step 5. If some neighbors have chosen the storing action at the last decision stage, the number of selected relay nodes and the number of remaining neighbors are updated. After that, if the number of selected relay nodes is still below the maximum and some neighbors remain, Steps 4 and 5 are repeated; if the maximum has been reached or no neighbors remain, the dissemination process at the current time slot is terminated.
Step 6. The above steps are repeated until the data packets have been disseminated to the destination region or the deadline has expired. If the data packets are delivered to the destination region before the deadline, an announcement message is propagated to all data holders to stop further dissemination. The announcement message is very small and can be piggybacked on other messages.
The pseudocode of RTDDM is depicted in Algorithm 1.
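The relay selection that one data holder performs within one time slot can be sketched as follows. This is a simplified single-pass reading of Steps 4 and 5, not the authors' code: the function names and the `scores` input (a stand-in for how good each neighbor is as a relay, e.g., derived from the region weights of its home regions) are assumptions, and message exchange is not modeled.

```python
def continuation_values(n):
    """u0[k]: value of continuing after stage k in the best-choice problem (index 0 unused)."""
    u0 = [0.0] * (n + 1)
    u1_next = 1.0  # value of state 1 at the final stage
    for k in range(n - 1, 0, -1):
        u0[k] = u1_next / (k + 1) + u0[k + 1] * k / (k + 1)
        u1_next = max(k / n, u0[k])  # value of state 1 at stage k
    return u0


def select_relays(scores, weights, max_relays):
    """Return the indices of neighbors that store the packets, in encounter order.

    scores: hypothetical relay quality of each neighbor.
    weights: per-neighbor weight in [0, 1] of receiving and storing the packets.
    max_relays: cap on the number of relays selected in this time slot.
    """
    n = len(scores)
    if n == 0 or max_relays <= 0:
        return []
    u0 = continuation_values(n)
    relays = []
    best = float("-inf")
    for k, (score, weight) in enumerate(zip(scores, weights), start=1):
        is_best_so_far = score > best  # state 1 if True, state 0 otherwise
        best = max(best, score)
        if is_best_so_far and weight * k / n >= u0[k]:
            relays.append(k - 1)  # the neighbor stores the packets
            if len(relays) == max_relays:
                break  # quota reached: stop for this time slot
    return relays
```

For four neighbors with scores `[0.2, 0.5, 0.9, 0.4]`, unit weights, and a cap of two relays, the first neighbor is rejected (stage 1 is too early to commit), and the second and third neighbors, each the best so far when encountered, are selected.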

Theoretical Analysis
In this section, we give a theoretical analysis for RTDDM.
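5.1. RTDDM Complexity. In Step 2, each node broadcasts an inquiry message; therefore the total number of inquiry messages equals the number of nodes. In Step 3, the expected number of response messages is the square of the number of nodes multiplied by the ratio of the coverage area of a node to the network area Ω. Steps 4 and 5 make each data holder disseminate copies of the data packets in every decision stage, and the number of messages transmitted in these steps is bounded as well. As a result, the total number of transmitted messages of RTDDM is at most quadratic in the number of nodes.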
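The message counts of Steps 2 and 3 can be checked with a short calculation. The sketch assumes uniformly distributed nodes and a circular coverage area; the function and parameter names are illustrative only.

```python
import math


def expected_messages(num_nodes, comm_range, area):
    """Expected inquiry and response message counts for one time slot.

    Assumes nodes are uniformly distributed over `area` and that each node
    covers a disc of radius `comm_range`.
    """
    inquiries = num_nodes  # one broadcast per node
    expected_neighbors = num_nodes * math.pi * comm_range ** 2 / area
    responses = num_nodes * expected_neighbors  # each neighbor answers each inquiry
    return inquiries, responses
```

Doubling the number of nodes doubles the inquiry count and quadruples the expected response count, which is where the quadratic bound comes from.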

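5.2. Expected Delivery Ratio and Expected Delivery Delay. With regard to each data packet, the total number of copies produced before the deadline is expressed as a power of one plus the maximum number of copies per time slot, with the number of allowed time slots as the exponent. Similar to [17], we apply a continuous-time Markov model to analyze RTDDM, as shown in Figure 4. Each transient state of the chain corresponds to a number of data holders in the network, up to the total number of copies, and a separate state indicates that the data packets have been delivered to the destination region. The chain moves from a state with a given number of data holders to the state with one more data holder, or to the delivered state, each with its own transition probability.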
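One way to read the copy count is as a geometric bound: if every data holder hands copies to a limited number of new relay nodes per time slot, the number of holders can grow by a constant factor per slot. The symbols below ($m$ for the per-slot maximum, $T^*$ for the deadline in time slots, and $C_t$ for the number of copies after $t$ slots) are labels assumed here for illustration.

```latex
C_0 = 1, \qquad C_t \le (m+1)\,C_{t-1}
\;\Longrightarrow\; C_{T^*} \le (m+1)^{T^*},
\qquad \text{e.g. } m = 2,\ T^* = 3:\quad C_3 \le 3^3 = 27 .
```

The bound is reached only if every holder finds the full number of new relay nodes in every slot, so it is best read as an upper limit on the number of states of the Markov model.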
When the network is in a state with a given number of data holders, the probability of the data packet being disseminated to further neighbors is expressed in terms of that number. The average contact rate between nodes can be computed from the historical trajectories. Hence, the probability of moving to the delivered state is expressed through the average contact rate and a sum of node weights, and the probability of moving to the state with one more data holder is obtained from it. For each state, including the delivered state, consider the probability distribution of the delivery delay. According to the Kolmogorov equation [18], we obtain the first-order derivatives of these distributions, where ⨂ denotes the convolution operation. The expected delivery delay and the expected delivery ratio are then expressed as (17) and (18), respectively, and these theoretical values are compared against the simulation results of RTDDM in the next section.

Algorithm 1: RTDDM (excerpt).
    Each receiver replies with a response message.
  end for
  Neighbor sets of nodes are updated.
  for each data holder do
    while the number of selected relay nodes is below the maximum and some neighbors remain do
      The data holder disseminates the held data packets to neighbors;
      for each neighbor having received the data packets do
        The action is calculated;
        The received data packets are stored or rejected according to the action;
      end for
      Some of these neighbors become the relay nodes.

Simulations
In this section, we present a performance evaluation of the proposed RTDDM, along with a comparison against binary spray [19] (labeled BINARY), BUBBLE Rap [20] (labeled BUBBLE), and GSaR [7]. The performance metrics are the delivery ratio and the delivery delay. We developed a simulator in C++, which implements the mechanisms of overhearing and collisions (including the message interactions for synchronization and backoff). The simulation results are averaged over 500 runs. The main parameter settings are shown in Table 1.
Before the simulations, we collect the historical trajectories of node movements during the first 40 time slots, so that the average contact rate between nodes can be calculated and the home regions of nodes can be determined. We compare the simulation results of RTDDM with the theoretical results calculated by (17) and (18). From Figure 5, we observe that the overall trends of the two kinds of results are consistent, and the theoretical results are superior to the simulation results of RTDDM. This is because the theoretical results are the optimal results obtained from centralized computations. Note that the standard deviation curves drop gradually as the number of nodes increases, which indicates that the simulation results of RTDDM approach the optimal results when the nodes are deployed densely.
Figure 6 shows the results of RTDDM, BINARY, BUBBLE, and GSaR. The delivery ratios of all the algorithms increase as the number of nodes increases, because a denser deployment of nodes usually provides more opportunities for data holders to disseminate the held data packets. As illustrated in Figure 6(a), RTDDM achieves a larger delivery ratio than the other algorithms. Similarly, in Figure 6(b), RTDDM also outperforms the other algorithms in terms of delivery delay. This is attributed to the fact that RTDDM exploits the region type correlations, which reflect the movement preferences of nodes, and thus the data packets can be delivered to the destination region more quickly.
The delivery ratios of all the algorithms improve when the number of allowable time slots becomes larger, because more data packets can be delivered to the destination region during a longer period, as shown in Figure 7.
Besides, we observe the impact of the maximum number of copies, and the results are given in Figure 8. We find that a larger maximum (i.e., more copies of each data packet) gives rise to a larger delivery ratio, along with a shorter delivery delay, although the dissemination cost increases accordingly. In some practical applications (especially cost-sensitive applications), the dissemination cost must be taken into account, and the maximum number of copies should be restricted to reach a preferable tradeoff among delivery ratio, delivery delay, and dissemination cost.

Conclusions
This study explores the data dissemination problem for the delivery ratio enhancement and delivery delay reduction in MONs. The region type correlations and region weights are explored and utilized to select the relay nodes for data

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.