Mobile Data Traffic Offloading through Opportunistic Vehicular Communications

To cope with the exponentially increasing demand for mobile data traffic in cellular networks, proximity-based opportunistic vehicular communications can be exploited as a complementary means to offload traffic and reduce the load on cellular networks. In this paper, we propose a two-phase approach for mobile data traffic offloading, which exploits opportunistic contacts and future utility under user mobility. The proposed approach consists of an initial source selection phase followed by a data forwarding phase. In phase 1, we build a weighted reachability graph, a useful high-level abstraction for studying vehicular communication over time. We then propose an initial source selection algorithm, named VRank, and apply it to the weighted reachability graph to identify influential vehicles that serve as initial sources according to their VRank scores. In phase 2, we formulate the forwarding scheduling problem as a global utility maximization problem, which takes heterogeneous user interests and future utility contributions into consideration. We then propose an efficient scheme, MGUP, that solves the problem by deciding which object should be broadcast. The effectiveness of our approach is verified through extensive simulations using a real vehicular trace.


Introduction
With the development of wireless communication technologies and the growing number of mobile devices such as smartphones, tablets, and cars, cellular communication is facing the critical challenge of explosively increasing traffic demand [1,2]. According to a recent report from Cisco, mobile data traffic will grow at a compound annual growth rate of 47 percent from 2016 to 2021, reaching 49.0 exabytes per month by 2021 [3]. Although fourth-generation cellular networks can provide content downloading with broad coverage and high bandwidth, the immense mobile traffic demand imposes a heavy burden on current cellular networks. Especially during peak hours and in urban centers, cellular communication suffers severe performance degradation in the form of low bandwidth, missed calls, and unreliable coverage [4,5]. Therefore, an efficient and effective method to ease the burden on cellular networks is essential.
Recently, an efficient alternative generally known as mobile data offloading, which delivers mobile data traffic originally intended for cellular networks over other networks, has attracted much attention in the literature [6][7][8][9][10]. Traffic offloading can be implemented through Wi-Fi [11], femtocells [12], or opportunistic networks [6]. Wi-Fi and femtocells are mature technologies, but they rely on infrastructure. Opportunistic networks, on the other hand, allow users without a permanent connection to communicate over low-cost proximity-based links when they encounter each other. Opportunistic networks therefore offer a powerful alternative for relieving the cellular infrastructure of part of its traffic. Vehicular networks [13] are an important class of opportunistic networks because contacts, or transmission opportunities, between vehicles occur in a dynamic and unpredictable manner. In addition, vehicles are highly mobile and can communicate with each other using dedicated short-range communication (DSRC) radio, 802.11p, and LTE-V [14]. These features make vehicular networks a suitable candidate for cellular traffic offloading.
The offloading process for disseminating content objects to interested subscribers (vehicles) can be described as follows. The cellular base station first delivers the content objects to only a small group of users (we use the terms vehicle, user, and subscriber interchangeably), called seeds or initial sources. The initial sources then propagate the objects to the subscribers through opportunistic forwarding, and any user that receives the data forwards it further, resulting in an information epidemic. The efficiency of such opportunistic traffic offloading is therefore largely determined by two key factors: (1) initial source selection and (2) the opportunistic forwarding strategy. A suitable set of initial sources can quickly distribute content objects over the vehicular network. In addition, when a user has multiple objects to transmit, different forwarding orders can lead to different outcomes. Hence, both issues must be addressed in an opportunistic traffic offloading system.
Several existing studies have addressed traffic offloading through opportunistic communication. For initial source selection, most works choose the initial sources using various centrality metrics [15][16][17] or according to features and properties of users' contacts in historical traces [7,18]. Although these works help us understand how to select good seeds, their main limitation is that they suit static social networks, where the network topology is relatively stable. Moreover, it is hard to accurately infer users' future centrality because the topology of a vehicular network is highly dynamic. For data forwarding, most existing works focus on forwarding performance in terms of delivery ratio, delay, and network overhead [19][20][21]. Users' interests and the forwarding schedule of multiple content objects are seldom taken into consideration. However, how to schedule objects so as to best satisfy users' interests within a limited contact duration must also be addressed.
Based on these insights, we propose a two-phase approach for mobile data traffic offloading. Specifically, phase 1 selects the initial sources based on a weighted reachability graph, which characterizes vehicles' transmission opportunities via instantaneous and opportunistic communication. We apply the proposed algorithm, VRank, to identify influential users as initial sources, which leads to quick and wide spreading of content objects. Phase 2 then handles data forwarding, accommodating heterogeneous user preferences to distribute the appropriate objects within short contact periods. As a result, objects can be scheduled more effectively to satisfy users' interests and achieve the maximum utility.
The major contributions of this paper are as follows:
(1) For initial source selection, we propose an algorithm named VRank, which exploits opportunistic communication under user mobility, and apply it to the weighted reachability graph to select influential users as initial sources.
(2) For data forwarding, we formulate the problem as global utility maximization, taking heterogeneous user interests and future utility into consideration, and propose an optimal solution, MGUP, to solve it.
(3) We conduct extensive simulations with a real vehicular trace to evaluate the proposed approach. The results show that the two-phase approach effectively improves the offloading rate and the global utility with respect to users' interests.
The rest of the paper is organized as follows. Section 2 reviews related work on mobile data traffic offloading. Section 3 introduces the system scenario and network model. Section 4 presents the initial source selection algorithm. In Section 5, we formulate forwarding scheduling as a global utility maximization problem and solve it. Section 6 verifies the effectiveness of our approach using a real vehicular trace. The last section concludes the paper.

Related Work
With the growing popularity of mobile access to cellular networks, a number of innovative solutions have emerged to offload data traffic and reduce the load on cellular networks. According to [11], these methods fall roughly into three categories: traffic offloading through small cells, through Wi-Fi networks, and through opportunistic networks.

Traffic Offloading through Small Cells.
Traffic offloading through small cells is an effective way to reduce traffic load and network energy consumption. Small cells are small cellular base stations, typically designed for indoor use, that deliver wireless service to a small coverage area and are most likely user-installed. Existing work has shown that most mobile traffic is generated indoors, so cellular network operators have the opportunity to offload heavy data to small cells and provide users with a seamless quality of experience. In [22], Wang et al. proposed an auction-based algorithm, femto-matching, to achieve both load balancing among base stations and fairness among users. The potential of social and spatial proactive caching in small cell networks for mobile data traffic was investigated in [23]. The results show that precaching strategic content at the network edge yields significant backhaul offloading gains and resource savings.

Traffic Offloading through Wi-Fi Networks.
Recent research [11] has shown that Wi-Fi networks already carry and offload a large amount of mobile data traffic. Approximately 65% of cellular traffic can be offloaded merely by switching from cellular networks to Wi-Fi whenever Wi-Fi connectivity is available [12,24]. Bulut and Szymanski [25] compared different methods of deploying Wi-Fi APs for efficient offloading of mobile data traffic. To reduce cellular network congestion and improve user-perceived network performance, a mobile data offloading scheme that leases the wireless bandwidth and caching space of residential 802.11 (Wi-Fi) APs was proposed in [26]. In addition, Sou and Peng [27] presented an analytical model for multipath Wi-Fi offloading that derives the aggregate offloading time when an alternative path is used for multipath offloading.

Wireless Communications and Mobile Computing

Traffic Offloading through Opportunistic Networks.
Opportunistic communications have recently been considered an important means of offloading mobile data traffic [6].
Using opportunistic communications to offload cellular traffic for mobile content dissemination is a novel and interesting idea that has attracted considerable attention from researchers. Such an architecture is in fact a hybrid SDVN (Software Defined Vehicular Network). In [28], Zhao et al. investigated routing-related issues in SDVNs and presented a comprehensive overview of the state-of-the-art architectures, protocols, challenges, and potential solutions.
A number of studies have explored opportunistic mobile networks for mobile data traffic offloading. In [6,29], Han et al. were the first to exploit opportunistic communications to facilitate information dissemination and reduce the amount of mobile data traffic. They studied how to select a target set of k users such that the mobile data traffic over cellular networks is minimized, and proposed greedy, heuristic, and random algorithms to solve this problem. Their simulation results show that the heuristic algorithm can offload up to 73.66% of mobile data traffic for a real-world mobility trace. To avoid disseminating information only within a single community, a community-based algorithm is proposed in [7] to diffuse the information to the entire network as quickly as possible; it takes both contact probability and social relationships into account to select initial seeds belonging to different communities. In [30], Liu et al. proposed a multiple source selection method to find the optimal number of initial sources. In addition, a multilayer-based seed selection approach is designed in [31] to maximally satisfy users' interests, and the results show that the multilayer-based seed scheme maximizes content utility.
Taking short contact durations and large content sizes into consideration, Li et al. [32] developed a contact-duration-aware offloading scheme, named Coff, which adopts network coding to better utilize short contacts. In [33], the authors qualitatively analyzed how D2D communication can benefit from social features and quantitatively evaluated the large potential gains attainable in a practical social-aware D2D communication system. Sciancalepore et al. [34] developed a theoretical model to analyze the performance of opportunistic dissemination when data can be selectively injected through a cellular network.

System Model
We consider a mobile traffic offloading scenario as illustrated in Figure 1. The network system is composed of base stations and mobile vehicles equipped with two network interfaces: a cellular interface and a short-range communication interface. The former is used to communicate with the base stations, and the latter is devoted to opportunistic communication between vehicles. The base stations of the cellular network are connected to content servers in the Internet through wired links. A content server has a set of objects to distribute to a set of users before a deadline. The information to be delivered may include weather forecasts, multimedia newspapers, and movie trailers generated by content service providers. Because non-real-time applications are delay-tolerant, the service provider may deliver the objects to a small fraction of selected users, which are referred to as initial sources. The initial sources then disseminate the objects to the vehicles that request them through opportunistic communication, which occurs when two vehicles move into each other's communication range. In addition, a vehicle may receive multiple objects, but it can broadcast only a single object at a time. As a result, different forwarding schedules, i.e., different forwarding sequences of the forwarder's objects, can result in different utilities, so each forwarder should carefully decide when to deliver which content objects. The problem considered in this paper therefore has two components: (1) initial source selection, i.e., how to identify important vehicles to serve as the initial sources to which the objects are assigned, and (2) the opportunistic forwarding strategy, i.e., how to schedule the sequence of objects to achieve the maximum utility.

Network Model.
Due to the highly dynamic topology of the vehicular network, a real vehicular network is in fact a temporal graph in which a link may exist only for a short period [35]. We therefore represent the vehicular network as a temporal graph $G = (V, E)$, where $V$ is the set of vehicles and $E$ is the set of links between pairs of vehicles whose Euclidean distance is smaller than the wireless communication range $R$. An undirected edge $e \in E$ is a quadruple $(u, v, t, \lambda)$, where $u, v \in V$, $t$ is the contact time between $u$ and $v$, $\lambda$ is the link duration to go from $u$ to $v$ (and from $v$ to $u$) starting at time $t$, and $t + \lambda$ is the link ending time. We denote the starting time of $e$ by $t(e)$ and its duration by $\lambda(e)$. Because a pair of nodes may communicate at multiple time instances, the number of temporal edges between $u$ and $v$ can be large.

Time-Ordered Path or Journey.
A time-ordered path $P$ in a temporal graph $G$ is a sequence of vertices $P = \langle v_1, v_2, \ldots, v_{k+1} \rangle$ such that $(v_i, v_{i+1}, t_i, \lambda_i) \in E$ for $i = 1, \ldots, k$ and $t_{i+1} \ge t_i + \lambda_i$ for $i = 1, \ldots, k - 1$. In the literature [36], the terms "journey" and "time-respecting path" have also been used for time-ordered paths. We denote by $start(P)$ and $end(P)$ the starting time $t_1$ and the ending time $t_k + \lambda_k$ of a time-ordered path $P$. Time-ordered paths can be thought of as opportunistic communication routes over time from a source to a destination. We also define the temporal distance of $P$ as $dist(P) = end(P) - start(P)$, which corresponds to the delay of the route. Objects or resources can be transmitted through opportunistic communication from node $u$ to node $v$ in a vehicular network only if $u$ and $v$ are joined by a time-ordered path.
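To make these definitions concrete, the following Python sketch checks whether a sequence of hops forms a time-ordered path and returns $start(P)$, $end(P)$, and $dist(P)$. It assumes the ordering condition $t_{i+1} \ge t_i + \lambda_i$ between consecutive hops (the usual journey condition, consistent with $end(P) = t_k + \lambda_k$); the function name `journey_metrics` and the tuple layout are illustrative choices, not part of the original method.

```python
def journey_metrics(hops):
    """Return (start, end, dist) of a time-ordered path.

    hops: list of (v_i, v_next, t_i, lam_i) tuples in travel order, where
    t_i is the contact start time and lam_i the link duration of hop i.
    Assumes each hop may start only after the previous one has ended,
    i.e. t_{i+1} >= t_i + lam_i.
    """
    if not hops:
        raise ValueError("a journey needs at least one hop")
    for (_, head, t, lam), (tail, _, t_next, _) in zip(hops, hops[1:]):
        if head != tail:
            raise ValueError("consecutive hops do not share a vertex")
        if t_next < t + lam:
            raise ValueError("hops are not time-ordered")
    start = hops[0][2]                  # start(P) = t_1
    end = hops[-1][2] + hops[-1][3]     # end(P) = t_k + lambda_k
    return start, end, end - start      # dist(P) = end(P) - start(P)


# u meets w at t=1 for 2 time units, then w meets v at t=4 for 1 unit.
print(journey_metrics([("u", "w", 1, 2), ("w", "v", 4, 1)]))  # (1, 5, 4)
```

The temporal distance here (4) is the end-to-end delay of the route, which can be much larger than the sum of link durations when a carrier has to wait for the next contact.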

Phase 1: VRank-Based Initial Source Assignment
In this section, we present how to identify a set of vehicles to serve as the initial sources for distributing the objects. We propose an algorithm named VRank, which ranks the nodes in a directed graph. The reachability graph [37,38] is a useful high-level abstraction for studying vehicular communication over time in vehicular networks. We therefore first introduce the concept of the reachability graph and how to construct it. Next, we apply the VRank algorithm to the reachability graph to select influential vehicles to serve as initial sources.

Weighted Reachability Graph.
For a given temporal graph $G$ and time interval $[t_1, t_k]$, we define $C_k = (V_k, J_k)$ as the reachability graph of $G$ at time $k$. The node set $V_k$ is a subset of $V$ and represents the vehicles sampled in the given time interval. The arc set $J_k$ contains an arc for every available direct link or time-ordered path $P$ in $G$ within $[t_1, t_k]$. In other words, if $(u, v) \in J_k$, then $u$ is able to communicate with $v$ either through instantaneous communication or along a time-ordered path from $u$ to $v$. Notice that $C_k$ is not symmetric: the existence of a time-ordered path from $u$ to $v$ does not imply a path from $v$ to $u$. The reachability graph of a temporal graph can be obtained by computing the time-ordered paths between every pair of nodes and then adding an arc from $u$ to $v$ in the reachability graph if the temporal distance from $u$ to $v$ is finite or a direct link exists between $u$ and $v$. Figure 2 illustrates an example of a reachability graph consisting of six nodes and twelve directed links. The weight $w_{uv}$ of each arc is the number of direct links and time-ordered paths between the two nodes within the given duration.
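A minimal Python sketch of building the weighted reachability graph from a list of contacts is shown below. It assumes that only contacts lying entirely inside the window count, that each undirected contact can be traversed in either direction, that journeys do not revisit a vertex, and that $w_{uv}$ counts every such journey from $u$ to $v$ (single-hop direct links included). Exhaustive enumeration is exponential in the worst case, so this is meant only for small illustrative inputs; the names `Contact` and `reachability_graph` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Contact(NamedTuple):
    u: str
    v: str
    t: float    # contact start time t(e)
    lam: float  # link duration lambda(e)


def reachability_graph(contacts, t_start, t_end):
    """Return {(u, v): w_uv}, counting journeys from u to v inside [t_start, t_end]."""
    # Each undirected contact that fits in the window can be used in both directions.
    out = defaultdict(list)
    for c in contacts:
        if c.t >= t_start and c.t + c.lam <= t_end:
            out[c.u].append((c.v, c.t, c.lam))
            out[c.v].append((c.u, c.t, c.lam))

    weights = defaultdict(int)

    def extend(src, node, ready, visited):
        # `ready` is the earliest start time of the next hop (t_i + lambda_i).
        for nxt, t, lam in out[node]:
            if t >= ready and nxt not in visited:
                weights[(src, nxt)] += 1          # one more journey src -> nxt
                extend(src, nxt, t + lam, visited | {nxt})

    for src in list(out):
        extend(src, src, t_start, {src})
    return dict(weights)


contacts = [Contact("A", "B", 0, 1), Contact("B", "C", 2, 1)]
print(reachability_graph(contacts, 0, 10))
# {('A', 'B'): 1, ('A', 'C'): 1, ('B', 'A'): 1, ('B', 'C'): 1, ('C', 'B'): 1}
```

In this toy trace, C can never reach A because the A-B contact ends before C meets B, so the resulting graph is asymmetric, exactly as noted for $C_k$.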

VRank Algorithm.
Identifying a set of influential nodes is a challenging problem in complex networks. A number of centrality metrics have been proposed to address it, such as degree centrality, closeness centrality, and betweenness centrality [39,40]. PageRank [41], employed by Google, is the best-known algorithm for ranking web pages; it measures the relative importance of a page within a web graph. LeaderRank [42], a simple variant of PageRank, has been shown to be an effective and efficient method for identifying influential spreaders in directed networks. Motivated by the success of LeaderRank, we propose an algorithm named VRank. Instead of considering only node degree, VRank also takes the number of encounters between any two vehicles into account as edge weights. The main idea is that nodes with higher VRank values are generally more important for disseminating objects, since popular nodes are more likely to meet other nodes in the network.
Given a weighted directed network consisting of $N$ nodes and $E$ edges, a ground node is added and connected to every other node by bidirectional edges. The network then becomes strongly connected and consists of $N + 1$ nodes and $E + 2N$ edges. Initially, each node except the ground node is assigned one unit of resource, while the ground node is assigned none. Each node then distributes its resource to its neighbors along its outgoing edges in proportion to the edge weights, and each node updates its resource by summing what it receives along its incoming edges. This process of distributing and updating resources continues until a steady state is reached. The whole process can be described mathematically as follows.
Let $v_i(t)$ be the score of node $i$ at time step $t$, and let $g$ denote the ground node. The initial state is
$$v_i(0) = \begin{cases} 1, & i \neq g, \\ 0, & i = g, \end{cases}$$
and each node updates its score according to
$$v_i(t+1) = \sum_{j=1}^{N+1} \frac{w_{ji}}{\mathrm{sumw}(j)}\, v_j(t),$$
where $w_{ji}$ is the weight of edge $(j, i)$ ($w_{ji} = 0$ if the edge does not exist) and $\mathrm{sumw}(j) = \sum_{k} w_{jk}$ is the total weight of the outgoing edges of node $j$. In a vehicular network, user $i$ is called a fan of user $j$ if there is a directed link from $i$ to $j$; that is, $i$ could receive information from $j$, and thus $j$ receives scores from $i$ (if a node's fans are highly influential, the node is highly influential as well). Clearly, the number of fans is an important local indicator of a user's spreading influence. When the scores $v_i(t)$ of all nodes converge to a unique steady state at time $t_c$, the score of the ground node is evenly distributed to all other nodes, and the final score of node $i$ is
$$V_i = v_i(t_c) + \frac{v_g(t_c)}{N}.$$
To illustrate the ranking process, we provide a simple ranking example in Figure.
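The resource-spreading iteration described above can be sketched in Python as a weighted, LeaderRank-style power iteration. The sketch assumes that the ground-node links carry weight 1 in both directions (the text does not specify their weight), that the input is the arc-weight map of the weighted reachability graph, and that convergence is declared when no score changes by more than `tol`; `vrank` and `GROUND` are hypothetical names.

```python
from collections import defaultdict

GROUND = "__ground__"  # hypothetical label for the added ground node


def vrank(weights, tol=1e-12, max_iter=10_000):
    """weights: {(j, i): w_ji} for weighted arcs j -> i. Returns {node: score}."""
    nodes = sorted({n for arc in weights for n in arc})
    w = dict(weights)
    for n in nodes:                      # bidirectional ground links (weight 1 assumed)
        w[(GROUND, n)] = 1.0
        w[(n, GROUND)] = 1.0

    sumw = defaultdict(float)            # total outgoing weight of each node
    for (j, _), wji in w.items():
        sumw[j] += wji

    score = {n: 1.0 for n in nodes}      # v_i(0) = 1 for ordinary nodes
    score[GROUND] = 0.0                  # v_g(0) = 0
    for _ in range(max_iter):
        new = dict.fromkeys(score, 0.0)
        for (j, i), wji in w.items():    # j passes w_ji / sumw(j) of its score to i
            new[i] += wji / sumw[j] * score[j]
        done = max(abs(new[n] - score[n]) for n in score) < tol
        score = new
        if done:
            break

    share = score[GROUND] / len(nodes)   # spread the ground node's score evenly
    return {n: score[n] + share for n in nodes}


arcs = {("a", "hub"): 3, ("b", "hub"): 2, ("c", "hub"): 1, ("hub", "a"): 1}
scores = vrank(arcs)
print(sorted(scores, key=scores.get, reverse=True))  # ['hub', 'a', 'b', 'c']
```

Because every node passes on all of its score, the total stays equal to $N$ throughout. Selecting $k$ initial sources then amounts to taking the first $k$ entries of the sorted list.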

Phase 2: Opportunistic Forwarding Strategy
After phase 1, the initial sources receive the assigned objects via cellular communication and can then disseminate them to other vehicles through opportunistic communication. This section addresses how to schedule those objects so as to best satisfy users' interests. We first introduce the user interest and utility model. Next, we formulate forwarding scheduling as a global utility maximization problem and design an effective scheme to solve it.

Interest Model.
In a system with multiple content objects, a user has different interests in different objects. Some objects are popular and interest many subscribers, whereas others are unpopular and interest only a small number of subscribers. A keyword set is a set of keywords selected to describe objects and user interests. Let $K = \{K_1, K_2, \ldots, K_M\}$ be a keyword set, where each $K_i$ denotes one topic that can be rated. Each object $o \in O$ is described by a subset of keywords $K_c \subseteq K$ and weights $v_{k_c}$, where $v_{k_c}$ indicates the importance of keyword $k_c \in K_c$. An object is thus described by an $M \times 1$ vector $D = [d_1, d_2, \ldots, d_M]^T$, where $d_{k_c} = v_{k_c}$ for $k_c \in K_c$ and all other elements are 0. We assume the weights are normalized, i.e., $\sum_{k_c \in K_c} v_{k_c} = 1$. To model the interest of different subscribers in different objects, an $M \times 1$ vector $P_s = [P_s^1, P_s^2, \ldots, P_s^M]^T$ represents the interest profile of subscriber $s$, where each $P_s^i$ denotes the degree to which subscriber $s$ is interested in keyword $K_i$. In practice, $P_s^i$ is used to compare the subscriber's interest across keywords; hence, without loss of generality, we define $\sum_{j=1}^{M} P_s^j = 1$. Finally, the interest probability of subscriber $s$ for object $o$ is computed as
$$p_{s,o} = P_s^T D,$$
where $(\cdot)^T$ denotes matrix transpose. When forwarder $f$ meets a group of neighboring vehicles $V_f^{\tau}$ in time slot $\tau$, each vehicle is required to explicitly provide its interest in the predefined topics $K$. The forwarder $f$ can thus know the local utility of each neighboring vehicle and broadcast any object $o_j \in O_f^{\tau}$ to all its neighbors. Each vehicle $i \in V_f^{\tau}$ that does not own object $o_j$ then obtains the utility $u_{i,o_j}$. The local utility that forwarder $f$ can produce for its neighbors with object $o_j$ at time $\tau$ is computed as
$$u_{f,o_j}^{\tau} = \sum_{i \in V_f^{\tau},\ o_j \text{ not owned by } i} u_{i,o_j}.$$
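The interest model reduces to a dot product between a subscriber's profile and an object's keyword vector. The Python sketch below computes $p_{s,o} = P_s^T D$ and a local utility obtained by summing the neighbors' utilities for an object they do not yet own. It assumes, for illustration only, that a neighbor's utility $u_{i,o_j}$ equals its interest probability; the function names, vehicle IDs, and numbers are hypothetical.

```python
def interest_probability(profile, keyword_weights):
    """p_{s,o} = P_s^T D, with D stored sparsely as {keyword index: v_kc}."""
    return sum(profile[k] * v for k, v in keyword_weights.items())


def local_utility(obj, keyword_weights, neighbors):
    """Sum of utilities over neighbors that do not own obj yet.

    neighbors: {vehicle id: (interest profile, set of owned objects)}.
    Assumes u_{i,o} is the neighbor's interest probability p_{i,o}.
    """
    return sum(interest_probability(profile, keyword_weights)
               for profile, owned in neighbors.values() if obj not in owned)


# M = 4 keywords; each profile sums to 1 and the object weights sum to 1.
o1 = {0: 0.5, 1: 0.5}
neighbors = {
    "v1": ([0.4, 0.4, 0.1, 0.1], set()),
    "v2": ([0.1, 0.1, 0.4, 0.4], set()),
    "v3": ([0.7, 0.1, 0.1, 0.1], {"o1"}),   # already owns o1, contributes nothing
}
print(round(local_utility("o1", o1, neighbors), 3))  # 0.5
```

Storing $D$ sparsely mirrors the model: an object has non-zero entries only for its own keywords $K_c$.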

Utility
However, considering only the local utility is not enough, because it ignores the utility that vehicle $i \in V_f^{\tau}$ contributes to its future contacts. Let $U_{i,o_j,\tau}$ denote the future utility of vehicle $i$ if it receives object $o_j$ in time slot $\tau$. The global utility that forwarder $f$ can produce for its neighbors with object $o_j$ at time $\tau$ is then computed as
$$G_{f,o_j}^{\tau} = \sum_{i \in V_f^{\tau},\ o_j \text{ not owned by } i} \left( u_{i,o_j} + U_{i,o_j,\tau} \right).$$
Problem Formulation.
The contacts between forwarder $f$ and its neighbors last only a short period, allowing them to exchange a limited volume of data. Therefore, the neighbors of $f$ can receive only some of the objects they are interested in. When forwarder $f$ has multiple objects to broadcast, it must decide which objects to forward and how to schedule them to best satisfy user interests.
Intuitively, the forwarder $f$ could schedule its objects to maximize the global utility of the current slot. However, such a greedy approach cannot guarantee the highest long-term user satisfaction. Consider the scenario shown in Figure 3, where forwarder $f$ owns three objects ($o_1$, $o_2$, $o_3$) and is in contact with $u_1$ and $u_2$ at time slot $\tau$. The contact durations with $u_1$ and $u_2$ are four and two time slots, respectively. Each vehicle $u_i$ ($i \in \{1, 2\}$) has the future utility $U_{u_i,o_j,\tau}$ for object $o_j$ ($j \in \{1, 2, 3\}$) shown in Figure 3; in this example, we use $U_{u_i,o_j,\tau}$ to approximate $U_{u_i,o_j,\tau+1}$. Under the greedy strategy, forwarder $f$ distributes the objects in the order $[o_3, o_1, o_2]$, since the utilities obtainable from them are 0.13, 0.12, and 0.11, respectively. Because the link between $f$ and $u_2$ lasts only two time slots, $u_2$ can receive only two objects, and the cumulative utility obtained after $f$ broadcasts all its objects is 0.27 (0.13 + 0.12 + 0.02). However, when consecutive time slots are considered, object $o_2$ is more important to $u_2$ than object $o_1$. If $f$ sends the objects in the order $[o_3, o_2, o_1]$, the utility obtained is 0.31 (0.13 + 0.11 + 0.07). Therefore, a greedy algorithm that achieves the immediate optimum cannot guarantee the long-term optimum.
From the above example, we observe that forwarder $f$ should consider multiple consecutive time slots if it knows the contact duration $d_f^i$ exactly for every neighbor $i \in V_f^{\tau}$. We define the set of available time slots $T = \{\tau, \tau + 1, \ldots, \tau_{max}\}$, where $\tau_{max} = \max_{i \in V_f^{\tau}} (\tau + d_f^i)$. The transmission scheduling decision for each time slot $t$ can be represented by a vector
$$A(t) = [a_1(t), a_2(t), \ldots, a_{|O_f^{\tau}|}(t)],$$
where $a_i(t) = 1$ if object $o_i$ is broadcast in time slot $t$ and $a_i(t) = 0$ otherwise. The vector $A(t)$ is subject to the constraint $\sum_{i=1}^{|O_f^{\tau}|} a_i(t) \le 1$, which means that at most one object is scheduled for broadcast in slot $t$.
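With multiple slots taken into account, the problem becomes finding the best time slot allocation $A = \{A(\tau), A(\tau + 1), \ldots, A(\tau_{max})\}$ over the whole period up to $\tau_{max}$ such that the global utility is maximized. We therefore formulate the global utility maximization problem (GUMP) as follows:
$$\max_{A} \sum_{t \in T} \sum_{j=1}^{|O_f^{\tau}|} a_j(t)\, G_{f,o_j}^{t}$$
$$\text{subject to} \quad \sum_{j=1}^{|O_f^{\tau}|} a_j(t) \le 1, \quad a_j(t) \in \{0, 1\}, \quad \forall t \in T,$$
where $G_{f,o_j}^{t}$ denotes the global utility that $f$ produces by broadcasting $o_j$ in slot $t$. The objective is thus to find the optimal forwarding schedule for all $o_j \in O_f^{\tau}$ and $t \in T$ such that forwarder $f$ contributes the maximum total global utility.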
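For small instances, GUMP can be solved by exhaustive search over broadcast orders, which also makes the gap between the greedy strategy and the long-term optimum easy to see. The Python sketch below assumes one object per slot, that neighbor $i$ stays in range for the first $d_f^i$ slots, and that each neighbor's per-object gain $u_{i,o_j} + U_{i,o_j,\tau}$ is fixed over the contact (the same approximation used in the Figure 3 example). The per-neighbor numbers are hypothetical values chosen only so that the totals match the 0.27 and 0.31 of the Figure 3 example; they are not the values in Figure 3.

```python
from itertools import permutations


def total_utility(order, durations, gain):
    """Global utility of broadcasting `order`, one object per slot (slot 0 = tau).

    durations: {neighbor: number of slots it stays in range of the forwarder}
    gain: {neighbor: {object: u + U}}, the neighbor's global utility per object
    """
    owned = {i: set() for i in durations}
    total = 0.0
    for slot, obj in enumerate(order):
        in_range = [i for i, d in durations.items() if slot < d]
        total += sum(gain[i][obj] for i in in_range if obj not in owned[i])
        for i in in_range:
            owned[i].add(obj)
    return total


def horizon(objects, durations):
    return min(len(objects), max(durations.values()))


def greedy_order(objects, durations, gain):
    """Pick, slot by slot, the object with the largest immediate global utility."""
    order = []
    for _ in range(horizon(objects, durations)):
        rest = [o for o in objects if o not in order]
        order.append(max(rest, key=lambda o: total_utility(order + [o], durations, gain)))
    return order


def exhaustive_order(objects, durations, gain):
    """Solve GUMP by trying every broadcast order (factorial cost; small cases only)."""
    return max(permutations(objects, horizon(objects, durations)),
               key=lambda p: total_utility(p, durations, gain))


durations = {"u1": 4, "u2": 2}
gain = {"u1": {"o1": 0.07, "o2": 0.02, "o3": 0.06},   # hypothetical values
        "u2": {"o1": 0.05, "o2": 0.09, "o3": 0.07}}
objects = ["o1", "o2", "o3"]
g = greedy_order(objects, durations, gain)
best = exhaustive_order(objects, durations, gain)
print(g, round(total_utility(g, durations, gain), 2))   # ['o3', 'o1', 'o2'] 0.27
print(round(total_utility(best, durations, gain), 2))   # 0.31
```

The factorial growth of `exhaustive_order` with the number of objects is what motivates a lighter scheduling heuristic in practice.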

Prediction of the Future Utility.
We assume that the random variable $N_{ij}(t)$ denotes the cumulative number of contacts between nodes $i$ and $j$ up to time $t$, and that any two contacts between them are independent of each other. Thus, $N_{ij}(t)$ can be modeled as a homogeneous Poisson process with rate $\lambda_{ij}$. Assuming the length of a time slot is $\Delta t$, the number of contacts $N_{ij}(t + \Delta t) - N_{ij}(t)$ between nodes $i$ and $j$ within time $\Delta t$ follows the Poisson distribution
$$\Pr\{N_{ij}(t + \Delta t) - N_{ij}(t) = n\} = \frac{(\lambda_{ij} \Delta t)^n}{n!}\, e^{-\lambda_{ij} \Delta t}, \quad n = 0, 1, 2, \ldots$$
The contact probability between nodes $i$ and $j$ within time $\Delta t$ is then
$$p_{ij} = 1 - \Pr\{N_{ij}(t + \Delta t) - N_{ij}(t) = 0\} = 1 - e^{-\lambda_{ij} \Delta t}.$$
The value of $U_{i,o,t}$ is determined by two factors: (1) the probability that vehicle $i$ meets contacts that do not yet own object $o$ after time slot $t$ and (2) the utility of $i$'s contacts for object $o$. We first derive the probability that a contact $j \in V$ has not received object $o$ by time slot $t'$ from any contact $k \in V$, denoted by $\Phi_{j,o}(t')$. Following [18], user $j$ cannot download object $o$ from user $k$ before time slot $t$ if and only if one of the following two events occurs: (1) $j$ and $k$ never met before time slot $t$, i.e., $N_{jk}(t - 1) = 0$, or (2) the last contact between $j$ and $k$ was in time slot $z < t$, but $k$ did not have object $o$ in time slot $z$. Therefore, the probability $\phi_{j,k,o}(t)$ that $j$ cannot download object $o$ from $k$ before time slot $t$ can be computed as
$$\phi_{j,k,o}(t) = (1 - p_{jk})^{t-1} + \sum_{z=1}^{t-1} p_{jk} (1 - p_{jk})^{t-1-z}\, \Phi_{k,o}(z).$$
Then, the probability $\Phi_{j,o}(t)$ that user $j$ has not downloaded object $o$ by time slot $t$ equals the probability that $j$ cannot download object $o$ from any user $k$ before $t$:
$$\Phi_{j,o}(t) = \prod_{k \in V,\ k \neq j} \phi_{j,k,o}(t).$$
Since only the data sources have object $o$ in the initial time slot 1, we have $\Phi_{j,o}(1) = 0$ if $j$ is a data source of object $o$ and $\Phi_{j,o}(1) = 1$ otherwise. Given the initial values $\Phi_{j,o}(1)$, we can compute $\Phi_{j,o}(z)$ iteratively for all users for each time slot $z = 2, 3, \ldots, t - 1$.
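The iterative computation can be sketched in Python as follows. The sketch implements $p_{ij} = 1 - e^{-\lambda_{ij}\Delta t}$, the two-event decomposition written as $\phi_{j,k,o}(t) = (1-p_{jk})^{t-1} + \sum_{z<t} p_{jk}(1-p_{jk})^{t-1-z}\Phi_{k,o}(z)$, and $\Phi_{j,o}(t) = \prod_{k \neq j} \phi_{j,k,o}(t)$. It assumes contacts with different users are independent and that data sources keep the object in every slot; the function names and the three-vehicle example are hypothetical.

```python
import math


def contact_probability(rate, slot_length):
    """p_ij = 1 - exp(-lambda_ij * dt) for a Poisson contact process."""
    return 1.0 - math.exp(-rate * slot_length)


def not_received_probabilities(p, sources, users, t_max):
    """Phi[j][z]: probability that user j has not obtained the object by slot z.

    p: {(j, k): p_jk} per-slot contact probabilities (store both directions).
    Assumes sources hold the object in every slot and contacts with different
    users are independent, so Phi is a product over k of phi_{j,k}.
    """
    phi = {j: {1: 0.0 if j in sources else 1.0} for j in users}
    for t in range(2, t_max + 1):
        for j in users:
            if j in sources:
                phi[j][t] = 0.0
                continue
            prob = 1.0
            for k in users:
                pjk = p.get((j, k), 0.0)
                if k == j or pjk == 0.0:
                    continue                      # phi_{j,k} = 1: no contact possible
                never_met = (1 - pjk) ** (t - 1)
                last_met_without = sum(pjk * (1 - pjk) ** (t - 1 - z) * phi[k][z]
                                       for z in range(1, t))
                prob *= never_met + last_met_without
            phi[j][t] = prob
    return phi


p_half = contact_probability(math.log(2), 1.0)   # 0.5 per slot
p = {("s", "a"): p_half, ("a", "s"): p_half, ("a", "b"): p_half, ("b", "a"): p_half}
phi = not_received_probabilities(p, {"s"}, ["s", "a", "b"], t_max=3)
print(round(phi["a"][3], 3), round(phi["b"][3], 3))  # 0.25 0.75
```

In this chain $s$-$a$-$b$, vehicle $b$ can only be reached through $a$, so it lags one slot behind: it still misses the object at slot 3 with probability 0.75.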
Thus, the expected utility that $i$ can contribute to contact $j$ between $t$ and $T_o^{max}$ for object $o$ is computed as follows, where $T_o^{max}$ is the expiration time of object $o$. Finally, the future utility of user $i$ can be approximated as follows.
Optimal Solution.
The global utility maximization problem can be solved by searching all the states, but the computational complexity of doing so is very high. Here, we design a simple yet effective algorithm to achieve the maximum global utility. The total number of time slots that each vehicle $i \in V_f^{\tau}$ will stay with the forwarder $f$ can be calculated as follows. As described earlier, more consecutive time slots should be considered in the optimization problem. The maximum possible global utility achieved for each object $o_j \in O_f^{\tau}$ can be estimated if the object can always be transmitted. The average global utility potential $U_n$ of object $o_j$ in the next $n$ successive slots is defined as the total global utility obtained during the next $n$ successive slots divided by $n$. Computing the average global utility potential amounts to finding the object that can bring the highest utility over the following slots. Therefore, the maximum average global utility potential $\max(U_1, U_2, \ldots, U_n)$ is computed for scheduling, as shown in Algorithm 1. Once the average global utility potentials are computed, all objects are sorted by their potential, and the object with the highest potential is selected for transmission in the current time slot.

Performance Evaluation
We now evaluate the performance of the VRank-based initial source scheme and the forwarding schedule strategy using realistic vehicular traces from the city of Beijing, which were collected from 12:00 am to 11:59 pm local time on Jan. 5, 2009 [43]. The traces cover 2927 taxis, and we assume that all vehicles are subscribers.
The experiments are divided into two parts: (1) performance comparison with other seed selection methods and (2) the global utility gain of the proposed forwarding strategy.

Performance Comparison with Other Seed Selection Methods.
In this section, the proposed VRank initial source selection algorithm and several alternative methods were implemented in the experiments. The alternative methods include LeaderRank [42], PageRank [41], and POST [16]. In the LeaderRank strategy, the initial sources are selected based on the algorithm's ranking, where the probability that a random walker at a vehicle moves to another vehicle in the next step is mainly determined by the vehicle's out-degree. PageRank is also used to choose the initial sources, with damping factor $\alpha = 0.9$. The POST method infers the future centrality of each vehicle and ranks all vehicles by their predicted centrality values.
(1) Offloading Ratio versus the Number of the Initial Sources.
To evaluate the performance of our solution and the alternative methods, we use the offloading ratio as the metric, defined as the ratio of the number of vehicles that successfully receive the objects to the total number of vehicles in the network. We select an observation interval from 7:30 am to 9:30 am to analyze the performance of the different algorithms. The first half of the trace is used as a warm-up period for the vehicles to accumulate the necessary network information and construct the reachability graph. Table 1 shows the IDs of the top 8 vehicles ranked by the four approaches; the initial source sets differ from each other. Figure 4 plots the average offloading ratio as a function of the number of initial sources. The VRank scheme outperforms the other algorithms in all conditions. In particular, the offloading rate of VRank reaches 45%, while that of LeaderRank, PageRank, and POST is around 33% when the number of initial sources is 20. Similarly, VRank achieves a higher offloading rate than the other strategies as the number of initial sources ranges from 40 to 200. We also notice that the offloading rate increases with the number of initial sources: in VRank, it increases from 43% to 54% as the number of initial sources grows from 20 to 120. Clearly, the more initial sources the BS selects to distribute the objects, the more mobile vehicles can obtain them opportunistically. However, the offloading rate increases only slowly once the number of initial sources exceeds a certain value. In VRank, the first 120 seeds achieve an offloading rate of about 53%, but adding another 80 seeds brings only an extra 1%. A similar phenomenon is also observed in the other schemes.
This is because, as the number of seed vehicles increases, so does the probability that two selected vehicles occupy similar network positions. Such vehicles are redundant: selecting both of them adds little coverage beyond what either one achieves alone.
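The redundancy effect can be made concrete with a toy example. The sketch below is illustrative only and is not VRank. It assumes a hypothetical `reach` map from each candidate vehicle to the set of vehicles it can eventually reach. It then compares two rules for choosing k seeds:

- picking the k individually strongest vehicles;
- a greedy rule that, at each step, picks the vehicle adding the most not-yet-covered vehicles (the classic greedy heuristic for maximum coverage).

```python
def top_k_by_reach(reach, k):
    """Rank vehicles by their own reach size only, ignoring overlap."""
    chosen = sorted(reach, key=lambda v: (-len(reach[v]), v))[:k]
    return chosen, set().union(*(reach[v] for v in chosen))


def greedy_by_marginal_gain(reach, k):
    """Pick vehicles one at a time by how many new vehicles each one adds."""
    covered, chosen, gains = set(), [], []
    for _ in range(min(k, len(reach))):
        candidates = [v for v in sorted(reach) if v not in chosen]
        best = max(candidates, key=lambda v: len(reach[v] - covered))
        gains.append(len(reach[best] - covered))
        chosen.append(best)
        covered |= reach[best]
    return chosen, covered, gains


reach = {"a": {"a", "b", "c", "d"}, "b": {"a", "b", "c"}, "e": {"e", "f"}}
ranked, ranked_cov = top_k_by_reach(reach, 2)
chosen, covered, gains = greedy_by_marginal_gain(reach, 2)
print(ranked, len(ranked_cov))      # ['a', 'b'] 4
print(chosen, len(covered), gains)  # ['a', 'e'] 6 [4, 2]
```

Vehicles `a` and `b` occupy overlapping positions, so ranking alone wastes the second seed. The shrinking per-seed gains also mirror the slowdown in offloading ratio observed in Figure 4.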
(2) Offloading Ratio versus Time Scale. We further examine the impact of the time scale on the performance of VRank. We set the number of initial sources to 100, run all methods on the mobility trace, and report the average.
Figure 5 shows the average offloading ratio of the four schemes as a function of the time scale. Again, VRank outperforms the other schemes, achieving the best performance at every time scale. In particular, the offloading ratio of VRank reaches up to 34.28%, compared with up to 22.20% for LeaderRank and 20.95% for POST. We also notice that the performance of LeaderRank and POST is very close in both Figures 4 and 5. This is because both schemes take the degree of each vehicle into consideration when selecting the initial sources.

Global Utility Gain of the Proposed Forwarding Strategy
(1) User Interest Profile in Different Keywords. The user interest profiles are generated from a keyword set $K$ with $M = 20$ keywords. We assume that keyword $K_j \in K$ is the $j$-th most popular keyword in the network.

Wireless Communications and Mobile Computing
According to the interest model, each user's interest in keyword $K_j$ is drawn at random from a normal distribution with mean $P_j$. We use several distributions to generate $P_j$ for the different keywords: exponential, and Zipf with exponents $s = 1$ and $s = 2$. The resulting profiles are shown in Figure 6.
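The interest model can be sketched as follows. Several choices here are hypothetical and made for illustration only, since the paper's exact parameters are not given in this section:

- the normalization of $P_j$, scaled so that $P_1 = 1$;
- the exponential decay rate `lam`;
- the standard deviation `sigma`;
- clipping the samples to [0, 1].

```python
import math
import random


def keyword_means(m=20, kind="zipf", s=2.0, lam=0.5):
    """Mean interest P_j for the j-th most popular keyword, j = 1..m.

    ASSUMPTION: Zipf means are 1 / j**s and exponential means are
    exp(-lam * (j - 1)), both scaled so that P_1 = 1.
    """
    if kind == "zipf":
        return [1.0 / j ** s for j in range(1, m + 1)]
    if kind == "exp":
        return [math.exp(-lam * (j - 1)) for j in range(1, m + 1)]
    raise ValueError(f"unknown distribution: {kind}")


def user_profile(means, sigma=0.05, rng=random):
    """Draw one user's interest in each keyword from N(P_j, sigma^2),
    clipped to [0, 1] so it can be read as a probability."""
    return [min(1.0, max(0.0, rng.gauss(mu, sigma))) for mu in means]


rng = random.Random(42)  # seeded for reproducibility
means = keyword_means(m=20, kind="zipf", s=2.0)
profiles = [user_profile(means, rng=rng) for _ in range(1000)]
```

Switching `kind` and `s` reproduces the three families of mean profiles named above. A larger `s` concentrates interest on the first few keywords.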
(2) Average User Interest Probability in Different Objects. We assume that there are 15 objects to be disseminated in the system. Each object $o_j$ is described by 5 keywords with equal weights. To ensure that different objects have different popularity, the keyword indices of object $o_j$ are $\{j, \dots, j+4\}$. We can thus compute each user's interest probability in object $o_j$ according to Equation (4). Figure 7 shows the average user interest probability in different objects. When $P_j$ follows an exponential distribution or a Zipf distribution with exponent $s = 2$, most user interest concentrates on popular objects. For a Zipf distribution with exponent $s = 1$, this concentration decreases. Figure 7 therefore represents different object interest patterns of mobile users in our system.
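The mapping from keywords to objects can be sketched as below. Equation (4) is not reproduced in this section. The sketch therefore assumes, purely for illustration, that a user's interest in an object is the equal-weight average of that user's interest in the object's 5 keywords. The actual Equation (4) may combine them differently. Keyword indices are 1-based, matching $\{j, \dots, j+4\}$.

```python
def object_keywords(j, width=5):
    """1-based keyword indices {j, ..., j + width - 1} describing object o_j."""
    return list(range(j, j + width))


def object_interest(profile, keywords):
    """ASSUMPTION: equal-weight average of keyword interests (stand-in for Eq. (4))."""
    weight = 1.0 / len(keywords)
    return sum(weight * profile[k - 1] for k in keywords)


def average_object_interest(profiles, n_objects=15):
    """Average interest of all users in each object o_1..o_n."""
    return [
        sum(object_interest(p, object_keywords(j)) for p in profiles) / len(profiles)
        for j in range(1, n_objects + 1)
    ]


# A single user whose keyword interests follow Zipf with s = 2 (M = 20 keywords).
profile = [1.0 / j ** 2 for j in range(1, 21)]
avg = average_object_interest([profile])
print(object_keywords(15))  # [15, 16, 17, 18, 19]
```

With 15 objects and $M = 20$ keywords, the last object uses keywords 15 to 19, so every index stays in range. Because the windows slide over a decreasing sequence of keyword means, object popularity decreases with $j$.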
(3) Global Utility Gain versus the Time Scale. We compare the global utility gain of the proposed MGUP algorithm with that of the greedy, local utility maximum, and random schemes. All schemes select a fixed number of initial sources (20 in our experiment) with VRank. The user interest probabilities are generated from a Zipf distribution with exponent $s = 2$. Figure 8 shows the global utility gain as a function of the time scale. The global utility of all schemes increases with the time scale. By the 45-minute mark, MGUP produces 13.65 and 23.48 percent higher global utility than greedy and local utility maximum, respectively. These improvements arise mainly because each forwarder in MGUP considers the future utility contribution of its members and more consecutive time slots in the scheduling process. In particular, the future utility of neighbor $i$ in MGUP accounts for the different interests users have in different objects. It also predicts how many users can gain utility by obtaining object $o_j$ from neighbor $i$. Therefore, the forwarder $f$ can select the object that benefits all users in the system. The figure also shows that both MGUP and greedy achieve higher global utility than local utility maximum and random. This further indicates that future utility has a great impact on performance.
(4) Global Utility Gain versus the Number of Objects. We further evaluate the impact of the number of objects on the performance of the compared schemes, varying the number of objects from 5 to 60. Figure 9 plots the global utility gain for different numbers of objects. Global utility increases with the number of objects in all schemes. More objects can satisfy a wider range of user interests and thus produce a higher global utility.
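To illustrate why a future-utility term can change a forwarder's decision, the sketch below contrasts two rules. The local rule maximizes the interest of current neighbors that lack the object. The second rule also counts users those neighbors are predicted to meet later. This is a hypothetical simplification, not MGUP:

- the `future_contacts` prediction is assumed to be given as input;
- only a single time slot is scheduled;
- each user's gain is taken to be their interest probability in the object.

```python
def choose_object(objects, neighbors, have, interest, future_contacts=None):
    """Pick the object a forwarder broadcasts in one time slot.

    have[u]: set of objects user u already holds.
    interest[u][o]: interest probability of user u in object o.
    future_contacts[i]: users neighbor i is predicted to meet later
        (HYPOTHETICAL input). With None, this is a purely local rule.
    """
    def gain(user, o):
        return 0.0 if o in have[user] else interest[user][o]

    def score(o):
        receivers = [i for i in neighbors if o not in have[i]]
        total = sum(interest[i][o] for i in receivers)
        if future_contacts is not None:
            # Only neighbors that receive o now can carry it to later contacts.
            later = set()
            for i in receivers:
                later.update(future_contacts.get(i, ()))
            later -= set(receivers)  # do not count a receiver twice
            total += sum(gain(c, o) for c in later)
        return total

    return max(sorted(objects), key=score)


have = {"n1": set(), "u1": set(), "u2": set()}
interest = {
    "n1": {"o1": 0.6, "o2": 0.5},
    "u1": {"o1": 0.1, "o2": 0.7},
    "u2": {"o1": 0.0, "o2": 0.6},
}
future = {"n1": ["u1", "u2"]}
print(choose_object(["o1", "o2"], ["n1"], have, interest))          # o1
print(choose_object(["o1", "o2"], ["n1"], have, interest, future))  # o2
```

The local rule prefers `o1` because the current neighbor likes it slightly more. Once the users that `n1` will meet later are counted, `o2` yields more total utility. This is the kind of trade-off the future-utility term is meant to capture.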

Conclusion
With the increasing popularity of mobile devices and growing user demand, the amount of mobile data traffic in cellular networks is growing explosively. Users face severe performance degradation in the form of low or even no network bandwidth, missed calls, and unreliable coverage, and cellular providers urgently need quick and effective solutions. Proximity-based opportunistic communication, as a means to offload traffic and reduce the load of cellular networks, has therefore attracted the attention of service providers. In this paper, we study how to exploit opportunistic vehicular communications to offload mobile data traffic. We propose a two-phase approach consisting of an initial source selection phase and a subsequent data forwarding phase. For initial source selection, we propose an algorithm named VRank that identifies influential vehicles to serve as initial sources, leading to quick and wide spreading of content objects. For data forwarding, we formulate the problem as global utility maximization, which takes heterogeneous user interests and future utility into consideration. We then propose an efficient scheme, MGUP, to solve it. The effectiveness of our approach is verified through extensive simulation using a real vehicular trace.

Data Availability
The data used to support the findings of this study are included in the article.