Characterizing Pairwise Social Relationships Quantitatively: Interest-Oriented Mobility Modeling for Human Contacts in Delay Tolerant Networks



Introduction
Wireless portable devices (e.g., laptops, PDAs, and cell phones) are often carried by humans in a delay tolerant networking (DTN) scenario. Understanding human mobility and designing better routing protocols accordingly have drawn the attention of researchers working on DTNs. Using real traces to evaluate routing protocol performance does not allow enough flexibility to change the mobility settings in order to analyze a slightly different scenario [1]. Human mobility models, however, can depict real-life human mobility characteristics and can be used to obtain meaningful routing protocol performance results in simulations [2]. So far, a number of real-life experiments have been conducted to observe and summarize human mobility characteristics, including individual (e.g., spatio/temporal preferences), encounter-based (e.g., inter-contact time and contact duration), and social (e.g., group, community) metrics [3]. Although current human mobility models are good at reproducing individual and encounter-based metrics, their strategies for generating social metrics still need further exploration. For example, most existing models are able to reproduce the inherent social interactions, but only at the rather coarse granularity of the community. Thus, the social relationships of intercommunity nodes and of nodes that do not belong to any community are ignored. This omission leads to a considerable deviation of social relationships between the synthetic scenario and the real-life scenario. In addition, when those mobility models are used to evaluate routing protocols, the social relationships of intercommunity nodes cannot be utilized to assist data dissemination. This paper focuses on the granularity issue and proposes a new mobility model that takes the social relationship between each pair of nodes into account. We call this model the interest-oriented human contacts (IHC) mobility model. The design principle is based on two intuitive and empirical observations. First, people visit several spots periodically and visit the more interesting spots with a higher probability. Second, available contacts between a pair of people are usually detected when the two people are relatively static. By implementing these two rules rationally, IHC is able to

Related Work
A number of research articles on human mobility modeling have been published to date. Some of them are dedicated to exploring human mobility itself exhaustively, such as [1,4]; others are designed to assist accurate performance evaluation of DTN forwarding protocols in simulations, such as [2,5]. Our work belongs to the latter. To provide a good understanding of the mobility model framework as well as the relationships among human mobility characteristics, mobility models, and routing protocols in a DTN, we draw Figure 1.
In Figure 1, each entity represents a human mobility characteristic: a rectangular entity represents an individual metric, an elliptical entity represents an encounter-based metric, and a circular entity represents a social metric. A bold horizontal line divides Figure 1 into two portions. The entities above the bold line are taken as input parameters of mobility models, and the entities below the bold line are considered output metrics. In addition, two vertical dashed lines divide Figure 1 into three portions, namely, the individual metrics module, the encounter-based metrics module, and the social metrics module (in order from left to right). These three modules are demarcated based on the attributes of the metrics, regardless of whether they lie above or below the bold line. Based on an extensive study of the existing work, we find that the most significant metrics that lead to human-like "inter-contact time" and "contact duration" distributions are the "pause time" and "location preference and periodic reappearance" (two individual metrics). Therefore, individual metrics that do not affect the two encounter-based metrics are classified into the individual metrics module and denoted by "..." in Figure 1. Since this paper concerns a model designed to assist accurate performance evaluation of forwarding protocols in simulations, it does not consider such individual metrics.
Human mobility is driven by social relationships. Figure 1 reflects this intuition through the social metrics module. The "inherent social interaction" in the input portion works together with several individual metrics, such as "pause time" and "location preference and periodic reappearance," to generate "social metrics" in the output portion. However, early human mobility models (such as [6,7]) only have the individual metrics module and the encounter-based metrics module. As far as we know, CMM [8] is the first model that takes social relationships into consideration. Models after CMM always include the social metrics module, with different social representations such as the interaction matrix [8,9], communities [5,8,9], home-point distribution [5], overlapping communities [10], and centrality [10].
Due to the lack of end-to-end paths in DTNs, routing protocols have to utilize every contact opportunity between nodes to forward packets (see Figure 1, "encounter-based protocol-dependent metrics"). Individual contact opportunities are unstable and hard to capture. However, in DTNs, nodes' social relationships are steady and reliable. Hence, social-aware routing protocols utilize social metrics, such as node community [11], node centrality [12], and node interest [13], to help with routing decisions. Among these metrics, the community is an important concept in social network theory. Therefore, most existing models reproduce the "inherent social interactions" at the granularity of (and are limited to) the community. Our model breaks this limitation and reproduces social relationships at a pairwise granularity.
We argue that a good human mobility model should, first, be simple; second, be able to reproduce crucial human mobility metrics (both encounter-based and social); and third, predict well the performance of forwarding protocols on real DTNs. TLW [7] generates movement traces using a model similar to Levy walks, except that the flight lengths and the pause times have power-law distributions. TLW generates inter-contact times similar to those of real traces. However, as each node moves independently, TLW does not capture the nodes' social behaviors. As a subsequent work to TLW, SLAW [14] correlates the walks of different nodes and takes heterogeneously bounded mobility areas and fractal waypoints into consideration. SLAW generates inter-contact time and flight length distributions similar to those of real movements. However, there is no clear explanation of how to characterize nodes' social relationships. SMOOTH [15] is a mobility model with a principle and performance similar to those of SLAW.
As the first mobility model that utilizes social network theory, CMM [8] is not concerned with individual human mobility characteristics. Besides, CMM has been shown to be defective: specifically, in the majority of configurations, all users collapse into a single location, which practically invalidates the initial setting of the system [9]. To remove this defect, Boldrini and Passarella [9] improve CMM by adding the following two individual human mobility characteristics: people tend to visit just a few locations, where they spend the majority of their time (the same meaning as the "location preference and periodic reappearance" in Figure 1), and people prefer shorter paths to longer ones. They validate their synthetic data with real traces and show a good match of inter-contact time and flight lengths.
HHW [10] concerns heterogeneous human popularity. The model's input characteristics are "location preferential" (individual) and "overlapping communities" (social). Its output metrics, "inter-contact time" (encounter-based) and "centrality" (social), also match real traces well.
The intuition that inspires SWIM [5] is as follows: people go more often to places that are not very far from their homes and where they can meet many other people. This feature is essentially the same as "location preference and periodic reappearance" in Figure 1, with an explicit location preference weight determined by the location's distance and popularity. SWIM uses a bounded power-law-distributed "pause time." The authors validate their synthetic data with real traces and show that SWIM matches inter-contact time, contact duration, number of contacts, and community structure well.
Maiti et al. [1] collect a number of human mobility patterns and explore the dependencies between them. A framework is also proposed to reproduce these human mobility patterns in a model. Thakur and Helmy [3] propose a framework for analyzing mobility models. The model generated by the framework is validated with real traces on both encounter-based metrics and social metrics. STEPS [16] is designed based on a Markov chain modeling method. The human mobility pattern it depends on is also "location preference and periodic reappearance." A good match of both encounter-based and social metrics with real traces is reported in the literature. SAGA [17] is dedicated to the geographic diversity of the region of interest, which distinguishes it from all of the above-mentioned models. As a result, SAGA is validated with real traces on different metrics as well.
IHC also relies on "location preference and periodic reappearance" but uses a modeling method different from those of existing models. IHC is very simple to implement and is able to reproduce statistical human mobility properties, such as inter-contact time and contact duration, as well as accurate and meaningful social relationships at a pairwise granularity. It can also predict protocol performance on real DTNs well. Based on the analysis of human mobility patterns and the statistical data of real traces, IHC chooses to ignore the spatial human mobility characteristics that are uncorrelated with human contacts. The inherent social interaction in the IHC input parameters is node interest, so node interest analysis is easier in an IHC simulation environment.

Interest-Oriented Human Contacts Mobility Model
In this section, we propose the interest-oriented human contacts mobility model. We introduce the human mobility characteristics that inspire our work and describe the model in detail, followed by a similarity analysis of node interest based on IHC and two methods for generating input parameters from the social interaction matrix of a target scenario.

Design Principle.
We observe that people prefer to visit a few locations and spend plenty of time at such locations. In other words, a few locations are more attractive to people. In this paper, such locations are called hotspots.
In real-life scenarios, hotspots are the locations related to people's interests, for example, the locations where we work, study, have meals, or do sports. Usually, people spend a lot of time at some hotspots (such as those for work or study) and less time at others (such as those for having lunch or doing sports).
As people spend the majority of their time at hotspots, they spend little time on the journey. Besides, the means of transport people choose for the journey depends strongly on the distance to the next destination. For example, people prefer to walk to a nearby destination, ride to a farther one, and drive to a much farther one.
Further, we observe that an available contact (a contact in which two people meet and their wireless devices can detect each other and accordingly forward messages) between a pair of people usually happens when the two people are relatively static. Suppose that, in a college, two students with active wireless devices are walking past each other. Due to the transmission range of the wireless devices and the relative speed of the two students, the actual contact duration is too short for the devices to detect each other. Only when the two students appear at the same spot and stay for a while, such as when having dinner in the cafeteria or reading in the library, do their wireless devices have enough time to detect each other and then forward messages.
Given that an available contact occurs only if the two people are relatively static, the geographical position where the contact happens is no longer significant, provided that we do not take synchronized motion of the two people (e.g., the two people walking alongside each other) into consideration (because synchronized motion involves very specific social relationships, e.g., a colleague relationship and a particular schedule, which are hard for a mobility model to capture). Since temporary passing-bys (non-available contacts) on the way from one spot to another are negligible, the specific geographical positions of the two spots and the distance between them do not matter. Under these circumstances, the factors that affect the contact metrics of two people (e.g., A and B) at a given spot in a period of time are as follows: when A arrives at the spot, how long A stays (that is, when A leaves the spot), when B arrives at the spot, and how long B stays. If we extend this period of time to the overall runtime of the social network that A and B belong to, then the encounter-based metrics of A and B (inter-contact time and contact duration) depend on the probability that A and B visit this spot and on how long A and B stay there.
Consequently, in theory, by rationally setting the number of hotspots, the probability of visiting each hotspot, and the pause time after each arrival, it is quite promising to generate statistical characteristics and social relationships that match real traces well.

Detailed Model.
As an interest-oriented mobility model, IHC builds an environment where node interests are manifested as hotspots. In general, one interest corresponds to one hotspot and vice versa.
As in most existing mobility models, in IHC the mobility of a specific node is composed of a set of movement epochs throughout the simulation time. At the beginning of a movement epoch, the node chooses a destination and moves towards it at a certain speed. After arrival, the node stays at the destination spot for a time period known as the pause time. At the end of the pause time, the node chooses a new destination and starts the next movement epoch. Contacts occur when one node is within the transmission range of another node. However, in IHC, only available contacts are under consideration; that is, a contact is recorded if and only if two nodes are staying at the same hotspot simultaneously. In addition, nodes spend the majority of their time at hotspots; that is, the destination of each epoch can only be chosen from the hotspots. Therefore, the contact metrics of a pair of nodes in IHC are determined by their probabilities of visiting the same hotspot and the pause times they spend there. In such a case, the network area, the node transmission range, and the positions of hotspots have little effect on the contact metrics in IHC, and we choose to omit them. However, in order to ensure that the node interests are mutually independent, the distance between any two hotspots (although IHC does not care about their specific geographical positions) needs to be larger than the node transmission range so that nodes visiting different hotspots will not meet each other.
Since a node has different preferences for different interests, it visits the corresponding hotspots with different probabilities. Suppose that there are $n$ hotspots corresponding to $n$ interests in the network area. These $n$ node interests compose an $n$-dimensional interest space. Each node has an $n$-dimensional vector, corresponding to a point in the interest space. The interest vector of node $x$, $(i_1, i_2, \ldots, i_n)$, means that node $x$ visits hotspot $h_k$ with probability $i_k$ ($i_k \ge 0$), $k \in [1, n]$; then $\sum_{k=1}^{n} i_k = 1$. Figure 2 maps a network area containing three hotspots (i.e., playground, library, and laboratory) and two nodes (A and B) that determine each movement epoch following IHC. Initially, each node, for example, node A, can be anywhere in the network area. Node A's first epoch soon begins: it chooses its destination from all hotspots according to its interest vector. In Figure 2, node A chooses to visit the playground (hotspot $h_1$), the library (hotspot $h_2$), and the laboratory (hotspot $h_3$) with probabilities $a_1$, $a_2$, and $a_3$, respectively, such that $a_1 + a_2 + a_3 = 1$. Once the destination has been chosen, for example, the library (hotspot $h_2$), node A moves straight towards it, taking a constant flight time. A constant flight time means that the speed is proportional to the flight length (the distance between the starting point and the destination) in IHC. This proportional relation is based on the observation that in real-life scenarios people spend little time on the journey by choosing different vehicles for destinations at different distances. After the node reaches the destination, the pause time is determined by a random variable whose probability density function (PDF) obeys a bounded power-law distribution, as in [5]. Now node A is reading in the library and will stay for a time duration (the pause time). Note that node B has also been carrying out its movement epochs. If it happens to be staying at the library at the same time, node A and node B are able to detect this contact until one of them runs out of its pause time and begins its next epoch. Later, the other node will leave the library as well for its next epoch. Both A and B keep moving in this way until the end of the simulation time.
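The movement epochs described above can be illustrated with a minimal sketch. It assumes a pause-time PDF proportional to t^(-slope) on a bounded range and counts a contact only while both nodes pause at the same hotspot. The function and parameter names (`sample_pause`, `stay_intervals`, `total_contact_duration`) are hypothetical and are not taken from the authors' simulator.

```python
import random

def sample_pause(rng, slope, lo, hi):
    # Assumption: pdf proportional to t**(-slope) on [lo, hi] with slope != 1,
    # sampled by inverting the cumulative distribution function.
    e = 1.0 - slope
    u = rng.random()
    return (lo ** e + u * (hi ** e - lo ** e)) ** (1.0 / e)

def stay_intervals(rng, interests, sim_time, flight_time, slope, lo, hi):
    """Return the (hotspot, arrive, leave) stays of one node within [0, sim_time]."""
    hotspots = list(range(len(interests)))
    t, stays = 0.0, []
    while True:
        # One epoch: pick a hotspot by interest, fly for a constant time, then pause.
        dest = rng.choices(hotspots, weights=interests)[0]
        arrive = t + flight_time
        if arrive >= sim_time:
            return stays
        leave = min(arrive + sample_pause(rng, slope, lo, hi), sim_time)
        stays.append((dest, arrive, leave))
        t = leave

def total_contact_duration(stays_a, stays_b):
    """Total time during which both nodes pause at the same hotspot."""
    total, i, j = 0.0, 0, 0
    while i < len(stays_a) and j < len(stays_b):
        ha, a0, a1 = stays_a[i]
        hb, b0, b1 = stays_b[j]
        if ha == hb:
            total += max(0.0, min(a1, b1) - max(a0, b0))
        # Advance whichever stay ends first (both lists are time-ordered).
        if a1 <= b1:
            i += 1
        else:
            j += 1
    return total

if __name__ == "__main__":
    rng = random.Random(42)
    a = stay_intervals(rng, [0.3, 0.3, 0.3, 0.1], 259200, 10, 6, 120, 4800)
    b = stay_intervals(rng, [0.1, 0.2, 0.3, 0.4], 259200, 10, 6, 120, 4800)
    print(total_contact_duration(a, b))
```

Averaging `total_contact_duration` over many seeds for a fixed pair of interest vectors gives an estimate of the expected contact duration of that pair. Hotspot positions and the transmission range do not appear in the sketch, in line with the model's choice to omit them.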
It can be seen that IHC gives a clear expression of node interest and ignores specific geographical information. Such treatment makes tuning node interest parameters much more convenient. Other models do not provide direct node interest tuning. For example, in SWIM, the probability that a node visits a spot depends not only on the spot's popularity but also on the distance between the spot and the node's home. Although such settings make the preference of each node for each interest (corresponding to a spot) self-controlled, it is not easy to change the probability that a node visits a spot at will. Therefore, by ignoring specific geographical information, IHC replaces the visiting probability, which in SWIM is influenced by popularity and distance, with a single interest value. In this way, IHC avoids the inconvenience of a spot's visiting probability depending on a home's position. In addition, IHC keeps all temporal metrics of contacts and ignores the geographical position of each contact, which not only needs complicated settings but also lacks corresponding information in real traces and does not affect the performance of forwarding protocols. In IHC, the specific geographical position of each hotspot has no influence on either contact metrics or forwarding protocols' performance as long as the distance between any two hotspots exceeds the node transmission range. Table 1 summarizes all parameters of IHC and their meanings.

Similarity Analysis.
Thakur et al. [18] demonstrate that people with similar behavioral principles tie together, which means that user-location coupling can be used to identify similarity patterns in mobile users. They perform similarity analysis for several mobility models and show that many mobility models do not explicitly capture similarity and result in homogeneous users that are all similar to each other. Their similarity analysis is based on spatiotemporal preferences, preferential attachment to locations, and the frequency and duration of visits to these locations, which are precisely the design principles of IHC. Therefore, IHC is inherently suitable for similarity analysis.
Mei et al. [13] use the cosine similarity of node interest profiles to assist data forwarding in social-aware routing protocols because they believe that similar node interest profiles lead to close social interactions. However, as mentioned above, the mobility model they use, SWIM, cannot be used to measure node interest either accurately or conveniently. Additionally, whether the cosine similarity of node interest profiles can represent people's social interactions accurately is still unclear. In IHC, by contrast, node interest is taken as an input parameter, which makes similarity analysis natural and may lead to meaningful conclusions.
It is generally believed that a large contact duration represents a close relationship between nodes, so social relationships are often denoted simply by contact durations [8,12]. We also use this convention in this paper.
Intuitively, in IHC, the social relationship between two nodes ought to be related to the interests shared by the two nodes, as only their common interests result in the two nodes meeting at the corresponding hotspots. Based on this intuition, we conduct extensive simulations to observe the relation between the common interests and the contact durations of two nodes. In our simulations, there are only two nodes, namely, A and B, in the network area. Without specific input social interaction, pairwise contacts are mutually independent; thus, additional nodes would not bring new insights. Each node has exactly 4 interests, with interest vectors denoted by $I_A$ for node A and $I_B$ for node B. We let the number of interests that A and B have in common take the values 1, 2, 3, and 4, and set different values of $I_A$ and $I_B$, such as (0.25, 0.25, 0.25, 0.25), (0.3, 0.3, 0.3, 0.1), (0.1, 0.2, 0.3, 0.4), or (0.05, 0.05, 0.05, 0.85). Note that even for two specific vectors, different common interests should be assigned in each simulation. The simulation time is set to three days, that is, 259200 seconds. To get the expected value, for each scenario, we average the contact duration over 10000 runs using different seeds.
Interestingly, we find that there is a specific relation between the dot product of $I_A$ and $I_B$ and the expected value of A and B's contact duration, and we show the results in Figure 3. Each black point in Figure 3 represents a simulation scenario. The $x$-axis shows the dot product of $I_A$ and $I_B$ ($I_A \cdot I_B$), and the $y$-axis indicates the expected value of the contact duration averaged over 10000 runs. The maximum of $I_A \cdot I_B$ is 1, reached when both nodes have only one interest and it is the same interest. In this specific scenario, the two nodes stay at the same hotspot for the whole simulation. Thus, their contact duration is the simulation time, 259200 s.
Figure 3 shows a proportional relation between the dot product of $I_A$ and $I_B$ ($x$-axis) and the expected value of A and B's contact duration ($y$-axis). The dot product of $I_A$ and $I_B$ is a metric derived from A and B's interests, and the expected value of A and B's contact duration is a metric that can represent the social relationship between A and B. Figure 3 thus reveals a promising feature of IHC: IHC may be able to accurately reproduce a specific contact duration matrix, which is commonly regarded as a social interaction map. IHC can generate a specific expected value of contact duration accurately by setting appropriate values for the node interest vectors. Note that the $y$-axis in Figure 3 only shows the expected value of the contact duration. As a complement, the distributions of the contact durations for different expected values are shown in Figure 4.
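One hedged way to see why a proportional relation is plausible is the following back-of-the-envelope argument. It assumes that the two nodes move independently, that the simulation is long enough for time averages to settle, and that pause times are drawn from the same distribution at every hotspot. The symbols $t_f$ (flight time), $T_p$ (pause time), $\rho$, and the components $a_k$, $b_k$ of $I_A$, $I_B$ are notation introduced here for illustration only.

```latex
% Long-run fraction of time a node spends paused (at any hotspot):
\rho = \frac{\mathbb{E}[T_p]}{\mathbb{E}[T_p] + t_f}
% Fraction of time node A (resp. B) is paused at hotspot h_k:
\Pr[\text{A at } h_k] \approx \rho\, a_k, \qquad \Pr[\text{B at } h_k] \approx \rho\, b_k
% Expected contact duration over a simulation of length st:
\mathbb{E}[\mathit{cd}_{AB}] \approx \mathit{st} \sum_{k} (\rho\, a_k)(\rho\, b_k)
  = \rho^{2}\, \mathit{st}\, (I_A \cdot I_B)
```

When the flight time is small compared with the mean pause time, $\rho$ is close to 1 and the expected contact duration is approximately $\mathit{st}\,(I_A \cdot I_B)$, in line with the extreme case in which the dot product is 1 and the contact duration equals the simulation time.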
Figure 4 shows the distribution that the contact durations obey for a given expected value in Figure 3. Figure 4 is composed of eight subfigures. The expected value of the contact duration in each subfigure is denoted by Exp. We choose eight values of Exp to show in Figure 4, namely, 2500, 5000, 7500, 10000, 15000, 20000, 50000, and finally a very large one, 180000. As we can see in Figures 4(a)-4(d), for a small Exp (no larger than 10000), the distributions are far from the Gaussian distribution, so a randomly chosen value can deviate considerably from the expected value. In such a case, IHC may not be able to reproduce a small contact duration accurately by tuning node interest vectors. Fortunately, a small contact duration carries almost no meaning in social network analysis. When Exp becomes larger, for example, in Figures 4(e)-4(h), the distributions resemble a Gaussian distribution, so representing contact durations by the expected value becomes more accurate.

Interest Vectors Generator.
As shown in Table 1, we take node interest vectors as the exclusive social relationship input of IHC. The corresponding parameters, namely, the number of interests, the hotspot positions, and the node interest vectors, can be derived based on the conclusion in Section 3.3, that is, the expected value of A and B's contact duration is proportional to $I_A \cdot I_B$. Consider an extreme case: when two nodes both have only one interest and it is the same interest, the dot product of their interest vectors is 1 and their contact duration is exactly the simulation time $\mathit{st}$ (see Figure 2), so that $I_A \cdot I_B$ equals the expected contact duration multiplied by the coefficient $1/\mathit{st}$.
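Suppose that the $\mathit{num}$ nodes are node 1, node 2, ..., node $\mathit{num}$, and their interest vectors are $I_1, I_2, \ldots, I_{\mathit{num}}$, respectively, each of dimension $n$ (the number of interests). Denote the sum of the elements of a vector $I$ by $s(I)$. Denote the contact duration matrix of the scenario that we want to reproduce by $C$, such that $C$ is a $\mathit{num} \times \mathit{num}$ matrix. Then the contact duration between node $i$ and node $j$ is $c_{ij}$. The following equation set holds:
$$\mathit{st}\,(I_i \cdot I_j) = c_{ij}, \quad 1 \le i < j \le \mathit{num}; \qquad s(I_i) = 1, \quad 1 \le i \le \mathit{num}.$$
The equation set has $\mathit{num} \times n$ variables and $\mathit{num} \times (\mathit{num} - 1)/2 + \mathit{num}$ equations. A properly selected $n$ can make this equation set solvable. Approximate solutions can be derived with the Levenberg-Marquardt algorithm. The solutions include the parameter settings of $n$ and the interest vectors $I_i$. The hotspot positions can be anywhere in the network area as long as the distance between any two hotspots is larger than the node transmission range.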
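As an illustration of how the equation set above might be handed to a least-squares routine, the sketch below builds its residual vector: one residual per node pair and one per node. The function name `residuals` and its argument layout are assumptions made for this example; the solver itself is not shown.

```python
def residuals(vectors, durations, sim_time):
    """Residuals of the equation set for candidate interest vectors.

    vectors   : list of num interest vectors, each a list of n probabilities
    durations : num x num target contact duration matrix (only i < j is read)
    sim_time  : simulation time st
    """
    num = len(vectors)
    res = []
    # Pairwise equations: st * (I_i . I_j) - c_ij = 0 for i < j.
    for i in range(num):
        for j in range(i + 1, num):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            res.append(sim_time * dot - durations[i][j])
    # Normalization equations: s(I_i) - 1 = 0 for every node.
    for vec in vectors:
        res.append(sum(vec) - 1.0)
    return res
```

A Levenberg-Marquardt implementation, for example `scipy.optimize.least_squares` with `method="lm"`, could then minimize the sum of squared residuals over a flattened parameter array. Note that this SciPy method requires at least as many residuals as unknowns and does not enforce the nonnegativity of the probabilities, so the result would need to be checked.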
However, the above method of choosing the number of interests and the nodes' interest vectors, named method-1, may not generate small contact durations accurately when the simulation time is not long enough. The reason can be deduced from Figure 3. For a small expected value of contact duration, the smaller the sample size (the number of contacts between a pair of nodes) is, the harder it is to control the contact duration value (because the distribution is far from the Gaussian distribution). To tackle this problem, we provide another method of choosing the number of interests and the nodes' interest vectors, namely, method-2.
Method-2 proceeds as follows.
(1) Let $n = \mathit{num}$. To simulate the small contact durations in scenarios with a short simulation time, we assume that the number of node interests equals the number of nodes. In other words, each node can be regarded as having a home spot that it visits with a high probability. Under this circumstance, a node's interest vector $(i_1, i_2, \ldots, i_{\mathit{num}})$ means that this node visits node 1's home with probability $i_1$, node 2's home with probability $i_2$, and so on up to node $\mathit{num}$'s home with probability $i_{\mathit{num}}$. Therefore, if two nodes have a large contact duration value, one node will visit the other node's home with a high probability, and the value of the probability is determined by the contact duration of these two nodes in the corresponding scenario.
(2) Designate a public spot. We use a spot that is visited by all nodes with a specific probability to generate all small contact durations in the contact duration matrix of the scenario. Now, the value of $n$ is actually $\mathit{num} + 1$. A node's interest vector becomes $(p, i_1, i_2, \ldots, i_{\mathit{num}})$, where $p$ represents the probability with which the node visits the public spot. Generally, $p$ holds the same value for all nodes.
(4) Set a threshold for the target scenario's contact duration matrix. For the contact duration matrix, we set a threshold $\mathit{th}$ and select all the values that are no less than $\mathit{th}$ to reproduce in IHC. The value of $\mathit{th}$ is chosen intuitively and empirically, ensuring that a contact duration larger than $\mathit{th}$ is significant and meaningful enough to represent a close social relationship.
(5) Tune all nodes' interest vectors based on the values exceeding $\mathit{th}$ in the target scenario's contact duration matrix. Since all contact duration values no larger than $\mathit{th}$ are generated by visits to the public spot, the remaining contact duration values (exceeding $\mathit{th}$) can be generated by tuning the probabilities of home spots in the nodes' interest vectors. Generally, we deal with the nodes one by one in the order of node ID from 1 to $\mathit{num}$. That is, for node 1, the interest vector is $(p, i_1 = 1 - p, 0, \ldots, 0)$. Then, we search node 1's row in the contact duration matrix; if the contact duration $c_{1a}$ between node $a$ and node 1 exceeds $\mathit{th}$, node $a$'s interest vector is updated to $(p, i_1 = c_{1a}/(\mathit{st}(1 - p)), 0, \ldots, i_a = 1 - p - i_1, \ldots, 0)$, and so on. Different treatments can also be applied as long as the contact duration values exceeding $\mathit{th}$ are all preserved and, for each node's interest vector $(p, i_1, i_2, \ldots, i_{\mathit{num}})$, $p + \sum_{k=1}^{\mathit{num}} i_k = 1$ is ensured.
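The node-by-node tuning in method-2 can be sketched as follows. The text spells out the update only for node 1; extending it to later nodes by dividing by each node's own home-spot probability is an assumption of this sketch, as are the function name `method2_vectors` and the choice to raise an error when a node runs out of probability mass.

```python
def method2_vectors(durations, sim_time, threshold, public_prob):
    """Build interest vectors [p, i_1, ..., i_num] node by node (illustrative only).

    durations   : symmetric num x num contact duration matrix of the target scenario
    sim_time    : simulation time st
    threshold   : th; only durations exceeding it are reproduced via home spots
    public_prob : p, the probability of visiting the public spot
    """
    num = len(durations)
    home_budget = 1.0 - public_prob
    vectors = [[public_prob] + [0.0] * num for _ in range(num)]
    for j in range(num):
        # Probability not yet assigned to earlier homes goes to node j's own home.
        own = home_budget - sum(vectors[j][1:])
        if own <= 0.0:
            raise ValueError(f"node {j + 1} has no probability left for its home spot")
        vectors[j][1 + j] = own
        for a in range(j + 1, num):
            if durations[j][a] > threshold:
                # Assumed generalization of i_1 = c_1a / (st * (1 - p)):
                # node a visits node j's home so that st * own * i_j = c_ja.
                vectors[a][1 + j] = durations[j][a] / (sim_time * own)
    return vectors
```

As in the formula for node 1, the sketch ignores the small contribution of the public spot to the contact durations that exceed the threshold.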
In general, there are two cases in which the interest vectors of IHC need to be determined. If we want to reproduce a real-life scenario using IHC, we can derive the interest vectors by using method-1 or method-2. If we only want to set up a simulation scenario, the interest vectors can be set as needed, for example, to random values.

Model Verification
In order to show the accuracy of IHC in simulating real-life scenarios, we compare IHC with three real traces whose data were gathered from experiments conducted with wireless devices carried by people.

Trace Data.
We explain how we use the trace data as follows.
First, we only consider the data generated by mobile and homogeneous nodes. For example, in trace-1, there are 98 iMotes in total in the experiment, but 20 of them are long-range (around 100 meters) and static (deployed throughout the area or placed in the lift of the hotel). The remaining 78 iMotes are carried by participants of the Infocom student workshop, with a transmission range of around 30 meters. Hence, these 78 iMotes are our research objects in this experiment. In order to find neighboring iMotes, each iMote performs periodic desynchronized scanning. Each scan takes approximately 5 to 10 seconds, and the time granularity between two consecutive scans is 120 seconds. An iMote cannot respond to any request while it is actively scanning, so synchronization needs to be avoided. In this experiment, a contact is defined as a period of time in which all successive scans by one iMote receive a positive answer from another. That is, a contact can only be confirmed after at least two scans. Given that the scanning granularity is 120 s, the speed of a pedestrian is around 1 m/s, and the transmission range is around 30 m, a pedestrian passing a static iMote stays within range for at most about 60 s (a 60 m path at 1 m/s), which is shorter than one scanning interval. A moving iMote can therefore hardly detect a contact, which supports the rationality of ignoring the contacts of moving nodes in IHC.
Second, we symmetrize the contact duration matrix. In the three experiments, due to interference and other limitations, non-mutual sightings are common. As a result, the inter-contact time and contact duration are not symmetrical. We keep all inter-contact times detected by the mobile iMotes. However, for the contact duration between a pair of iMotes, we take the maximum of their detected results as the value. That is, if iMote A has detected that its contact duration with B is $c_{A-B}$, while iMote B has detected the value as $c_{B-A}$, we take $\max(c_{A-B}, c_{B-A})$ as the contact duration value between iMotes A and B. Note that these two treatments are applied to all three real traces.

Simulation Environment and Parameter Settings.
IHC takes the parameters listed in Table 1 as input. To compare IHC with real traces, we make the simulator output text files containing records of contact metrics and social relationships, including:
(i) inter-contact time.txt: recording all inter-contact times between any two nodes;
(ii) contact duration.txt: recording all contact durations between any two nodes;
(iii) interaction matrix.txt: recording all contact durations between any two nodes in a matrix.
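As an illustration of how an interaction matrix like the one in item (iii) could be assembled from per-device records, the sketch below also applies the max rule of Section 4.1 to non-mutual sightings. The input layout (a dictionary keyed by observer and peer) and the name `symmetric_interaction_matrix` are assumptions made for this example, not the trace format or the simulator's code.

```python
def symmetric_interaction_matrix(num, detected):
    """Build a symmetric contact duration matrix.

    num      : number of nodes, indexed 0 .. num-1
    detected : dict mapping (observer, peer) to the total contact duration
               recorded by the observer (hypothetical input layout)
    """
    matrix = [[0.0] * num for _ in range(num)]
    for (a, b), duration in detected.items():
        # Max rule: keep the larger of the two directions; a missing
        # reverse record counts as zero.
        best = max(duration, detected.get((b, a), 0.0))
        matrix[a][b] = matrix[b][a] = best
    return matrix
```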
As we build a discrete event simulator of IHC with VC++ 6.0, we are able to change the output of the simulator to observe each event, such as a node starting or finishing a movement and two nodes meeting or departing from each other. The parameters of IHC are chosen and tuned based on the scenario we want to simulate, for example, the Infocom 06 trace (trace-1). For the number of nodes and the simulation time, we assign exactly the same values as in trace-1, that is, 78 and 3 days (259200 seconds). The time spent on each journey is set to 10 seconds, based on the intuition that people spend little time on the journey. The pause time that makes the output (e.g., inter-contact time) best match the real traces is a bounded power law over the range [120 s, 4800 s] with slope 6, denoted by (slope, lower bound, upper bound) in Table 3. Among these, the lower bound affects the head of the inter-contact time CCDF, the upper bound affects the tail of the inter-contact time CCDF, and the slope only weakly affects the slope of the inter-contact time CCDF, within a very small range. The lower bound is determined by the scanning granularity, because it is the scanning granularity that strongly affects the head of the inter-contact time CCDF of the real traces. The values of the slope and the upper bound are determined by matching the simulation results against the inter-contact time CCDF of the real trace.
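For illustration, a bounded power-law pause time could be drawn by inverse transform sampling, as in the Python sketch below. It assumes that "slope" means the exponent alpha of a density proportional to x^(-alpha) on [lower, upper]; the text does not state this convention, and the function name is hypothetical rather than taken from the IHC simulator.

```python
import random


def sample_bounded_power_law(alpha, lower, upper, rng=random):
    """Draw one value from p(x) proportional to x**(-alpha) on [lower, upper].

    Uses inverse transform sampling; alpha must differ from 1.
    """
    if alpha == 1.0:
        raise ValueError("alpha = 1 needs the logarithmic form of the CDF")
    u = rng.random()                 # uniform in [0, 1)
    a = lower ** (1.0 - alpha)       # CDF numerator term at the lower bound
    b = upper ** (1.0 - alpha)       # CDF numerator term at the upper bound
    return (a + u * (b - a)) ** (1.0 / (1.0 - alpha))


# Pause times for the Infocom 06 setting quoted in the text: (6, 120 s, 4800 s).
rng = random.Random(42)
pauses = [sample_bounded_power_law(6.0, 120.0, 4800.0, rng) for _ in range(5)]
print(pauses)
```

With such a steep exponent, most samples fall close to the lower bound, which is consistent with the lower bound shaping the head of the distribution.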
Table 3 summarizes all parameter settings of the three scenarios. The remaining parameters are too lengthy to show in Table 3 and are thus omitted; the specific method for obtaining them is described in detail in Section 3.4.

Simulation Results. We show the simulation results of inter-contact time and contact duration for the Infocom 06 trace, the Infocom 05 trace, and the Cambridge trace in Figure 5, Figure 6, and Figure 7, respectively. Figures 5-7 validate that IHC can generate statistical metrics that approximate real traces. For a quantitative comparison, Table 4 reports the Jensen-Shannon divergence between the distributions of the real traces and the IHC traces, together with the corresponding results for SWIM traces, whose data can be found in [5]; we choose SWIM because it is a prominent work on human mobility modeling. The results in Table 4 indicate that our model outperforms SWIM in the accuracy of reproducing inter-contact time and contact duration.
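As a reminder of how such a comparison can be computed, the following Python sketch evaluates the Jensen-Shannon divergence between two histograms defined over the same bins. The base-2 logarithm (which bounds the divergence by 1) and the binning are assumptions here, since the text does not specify them; the helper names are hypothetical.

```python
import math


def normalize(counts):
    """Turn non-negative bin counts into a probability distribution."""
    total = float(sum(counts))
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions on the same bins."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical bin counts of inter-contact times for a real and a synthetic trace.
real = normalize([40, 30, 20, 10])
synthetic = normalize([38, 32, 19, 11])
print(js_divergence(real, synthetic))
```

A value near 0 indicates that the two distributions are nearly identical, while 1 (with base-2 logarithms) indicates distributions with disjoint support.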
We draw the meaningful social relationships in the real and IHC traces in Figures 8, 9, and 10, which are weighted undirected graphs. A vertex in a graph (Figures 8, 9, and 10) represents the node with the same ID in the network. An edge between two vertices indicates that the social relationship (contact duration) between these two nodes exceeds a certain threshold th. For the Infocom 06 trace, th is set to 20000 seconds because only the top 1.665% (50 out of 3003) of contact durations are larger than 20000. For the Infocom 05 trace, th is set to 10000 seconds because only the top 3.9% (32 out of 820) of contact durations are larger than 10000. The threshold for the Cambridge trace is chosen by a similar principle, with a value of 50000. The weight of an edge is calculated as the ratio of the contact duration between the two nodes to the network simulation time, rounded to two decimal places.
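The construction of these graphs can be sketched in a few lines of Python. The matrix layout (a symmetric list of lists indexed by node ID, in seconds) and the function name are assumptions made for illustration only.

```python
def relationship_edges(duration, threshold_s, sim_time_s):
    """Return (i, j, weight) for every node pair whose total contact
    duration exceeds the threshold; weight is duration / simulation time,
    rounded to two decimal places."""
    edges = []
    n = len(duration)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            if duration[i][j] > threshold_s:
                edges.append((i, j, round(duration[i][j] / sim_time_s, 2)))
    return edges


# Toy 3-node matrix with the Infocom 06 settings: th = 20000 s, 259200 s in total.
toy = [
    [0, 25920, 100],
    [25920, 0, 30000],
    [100, 30000, 0],
]
print(relationship_edges(toy, 20000, 259200))
```

With 78 nodes there are 78 x 77 / 2 = 3003 unordered pairs, which matches the denominator quoted for the Infocom 06 trace.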
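Figures 8, 9, and 10 visually show the social relationship similarity between the real traces and the corresponding IHC-generated ones. For a quantitative view, we conduct a Mantel Test on the real and IHC-generated social interaction matrices from which the raw data of Figures 8, 9, and 10 come. The Mantel Test measures the correlation between two matrices. Since Figures 5(b)-7(b) have shown that the social interaction matrices of the real traces and the corresponding IHC-generated ones have a very similar scale, a high correlation complementarily shows that the IHC-generated social interaction matrices are very similar to the real ones. The Mantel Test results are shown in Table 5. Figures 5(b)-7(b) and Table 5 show that IHC can accurately reproduce the overall social relationships in real-life scenarios. Further, Figures 8, 9, and 10 indicate that IHC is also able to characterize pairwise social relationships quantitatively. For example, there exist some close social relationships that cannot be detected by a community detection algorithm (e.g., k-clique [22], k > 2), such as the relationship between nodes 13 and 16 and that between nodes 18 and 25 in Figure 8(a). This kind of relationship is defined as "friendship" in [12]. IHC is able to reproduce the "friendship" in the target scenario. Further, the inter-community and intra-community social relationships that IHC generates, as well as the pairwise social relationships of nodes belonging to no community, all match the real traces very well. As far as we know, no other model has such a feature. Note that we only compare and show the pairwise social relationships of real traces and IHC traces, since other mobility models cannot reproduce social relationships on a pairwise basis.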
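For readers unfamiliar with the Mantel Test, a minimal permutation-based version is sketched below in Python. It assumes a Pearson correlation over the upper-triangular entries and a one-sided p-value; the text does not state which variant was used for Table 5, so this is an illustration rather than the exact procedure, and all names are hypothetical.

```python
import math
import random


def upper_triangle(m, order=None):
    """Upper-triangular entries of a square matrix, optionally after
    relabeling rows and columns with the permutation `order`."""
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mantel_test(a, b, permutations=999, seed=0):
    """Return (r, p): the observed matrix correlation and the fraction of
    random node relabelings of `b` that correlate at least as strongly."""
    rng = random.Random(seed)
    x = upper_triangle(a)
    r_obs = pearson(x, upper_triangle(b))
    order = list(range(len(b)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the nodes of b at random
        if pearson(x, upper_triangle(b, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Rows and columns are permuted together because the entries of an interaction matrix are not independent observations; shuffling individual entries would overstate significance.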

Building Cases to Use IHC
In this section, we build cases to use our model. We compare the performance of forwarding protocols running on real traces and on our simulated scenarios to validate that IHC can be used to predict protocol performance. We use the three real traces (the Infocom 06 trace, the Infocom 05 trace, and the Cambridge trace) and the three corresponding synthetic traces generated by IHC as the network environments. Our goal is to validate that IHC is able to predict the performance of forwarding protocols rather than to evaluate which forwarding protocol performs better. Therefore, the protocols we choose are appropriate for our goal: Epidemic Forwarding [23] and Spray and Wait [24], which are very mature and extensively used in DTNs, and BUBBLE [12], which is a sophisticated social-aware protocol.
As in [5,25], we choose two metrics to evaluate the performance of forwarding protocols: delivery cost (cost) and packet delivery ratio (pdr). The former indicates the price of forwarding a data packet successfully and accounts for the efficiency of the protocol. The delivery cost is calculated as the ratio of "the amount of received control packets plus the amount of data packets' replicas" to "the amount of received data packets". The packet delivery ratio, in contrast, is the success rate of forwarding data packets and accounts for the effectiveness of the protocol. It is calculated as the ratio of "the amount of received data packets" to "the amount of generated data packets".
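The two metrics follow directly from packet counters, as the Python sketch below illustrates. The counter names are hypothetical, and returning infinity when nothing is delivered is a convention chosen here, not one stated in the text.

```python
def delivery_cost(control_received, data_replicas, data_received):
    """(received control packets + data packet replicas) / received data packets."""
    if data_received == 0:
        return float("inf")  # assumed convention: nothing delivered
    return (control_received + data_replicas) / data_received


def packet_delivery_ratio(data_received, data_generated):
    """Received data packets / generated data packets."""
    return data_received / data_generated


# Toy counters: 300 control packets, 700 replicas, 50 of 200 packets delivered.
print(delivery_cost(300, 700, 50))     # 20.0
print(packet_delivery_ratio(50, 200))  # 0.25
```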
The following settings are used for each scenario. Messages are generated at intervals of 20 minutes, with sources and destinations chosen uniformly at random. We simulate the overall periods, that is, 3 days or 11 days, which differs significantly from [5], where each simulation runs for only 3 hours (choosing 3 hours out of 3 days or 11 days incurs too many uncertainties). However, the interest vector setting in Section 4.2 is based on the overall experiment duration (i.e., 3 days and 11 days). In IHC, all movement epochs are consecutive, but the actual movement epochs differ greatly between daytime and nighttime. Therefore, for a more meaningful simulation and to allow statistical analysis of each separate scenario, we divide the experiment duration equally into fragments. Concretely, the fragment length is chosen as 12 hours to capture daytime and nighttime motions, respectively. That is, the Infocom 06 and Infocom 05 scenarios are divided into 6 fragments and the Cambridge scenario is divided into 22 fragments. More importantly, the interest vectors are dynamic, and their values are derived using the "Interest Vectors Generator" based on the current experiment fragment. To avoid end-effects, no messages are generated in the last hour; the time-to-live of messages is set to 1 hour. The accumulated results of the forwarding protocols are shown in Figure 11. That is, in each simulation in Figure 11, the statistical metrics (cost and pdr) are calculated continuously, while the interest vectors change from one fragment to the next. Table 6 shows the average error percentage over all fragments in each separate scenario. In our simulations, IHC-generated traces always lead to lower cost and higher pdr than the corresponding real ones. Thus, in Table 6, the average error percentages of cost are all negative and those of pdr are all positive. However, the error percentages in Table 6 have small absolute values, showing that each of the three forwarding protocols has similar performance on the real traces and on the synthetic traces generated by IHC. In each subfigure, we draw six pillars representing, in order, the performance (cost or pdr, depending on the subfigure) of Epidemic Forwarding in the real trace and the synthetic trace, Spray and Wait in the real trace and the synthetic trace, and BUBBLE in the real trace and the synthetic trace. Figure 11 shows that the trend of the protocols in the real traces is the same as that in the corresponding synthetic ones. That is, the protocols that perform better on the real traces also perform better on the IHC-generated ones. Figure 11 and Table 6 both indicate that IHC can predict the performance of all three protocols well. As a result, IHC is a good model for protocol validation: the performance of protocols in real-life scenarios can be accurately predicted by running the protocols on the synthetic traces generated by IHC.
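The fragment division and the error percentage can be illustrated with the short Python sketch below. The error definition used here, (synthetic - real) / real x 100, is an assumption: it is consistent with the signs reported for Table 6 (lower cost gives a negative error), but the text does not spell out the formula. The function names and the toy values are hypothetical.

```python
FRAGMENT_SECONDS = 12 * 3600  # 12-hour fragments, as in the text


def fragment_bounds(total_seconds, fragment_seconds=FRAGMENT_SECONDS):
    """Split [0, total_seconds) into consecutive equal fragments."""
    return [(start, start + fragment_seconds)
            for start in range(0, total_seconds, fragment_seconds)]


def mean_error_percentage(real, synthetic):
    """Average of (synthetic - real) / real * 100 over all fragments."""
    errors = [(s - r) / r * 100.0 for r, s in zip(real, synthetic)]
    return sum(errors) / len(errors)


print(len(fragment_bounds(3 * 24 * 3600)))   # fragments in a 3-day scenario
print(len(fragment_bounds(11 * 24 * 3600)))  # fragments in an 11-day scenario

# Toy per-fragment cost values (not taken from the paper).
print(mean_error_percentage([10.0, 20.0], [9.0, 19.0]))
```

The two fragment counts printed are 6 and 22, matching the numbers given for the Infocom and Cambridge scenarios.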

Conclusions
In this paper, we propose a mobility model, IHC. IHC merges a few human mobility characteristics and is very simple to implement. IHC takes node interest as input to reproduce the nodes' social relationships. Correspondingly, we explore two methods to generate node interest vectors based on a contact duration matrix. Through comparisons with the real-life human mobility metrics of inter-contact time and contact duration, we validate that IHC can generate synthetic traces that approximate real traces. Unlike existing mobility models, IHC is able to characterize pairwise social relationships quantitatively. Further simulations have been conducted to show that IHC can predict the performance of forwarding protocols well.

Figure 2: An IHC scenario containing three hotspots and two nodes.

Figure 3: Proportional relation between a node interest metric and a social relationship metric.

Figure 4: Distributions of contact durations under different expected values.

Figure 10: Comparisons between Cambridge trace and IHC: meaningful social relationships.

Figure 11: Performance of forwarding protocols (interest vectors change with the alternate fragments).

Table 2: The three experimental data sets.

Table 4: Jensen-Shannon divergence between the distributions of the real and IHC traces, compared with the corresponding results of SWIM traces.

Table 5: Mantel Test results on the real and IHC-generated social interaction matrices.

Table 6: The average error percentage of all fragments in each separate scenario.