Learning Automata Based Caching for Efficient Data Access in Delay Tolerant Networks



Introduction
Delay Tolerant Networks (DTNs) [1] are proposed as a network architecture to address communication issues in challenging environments where network connectivity is subject to frequent and lasting disruptions. DTNs consist of a number of communicating devices that contact each other opportunistically, so only intermittent connectivity exists. As a result, DTNs are normally characterized by unpredictable node mobility, high forwarding latency, and the lack of global network state information. To support data access under these conditions, a "carry-and-forward" mechanism is used for data transmission: each mobile node acts as a relay that stores passing data and forwards the data when contacting other nodes.
DTNs have been introduced into many recent applications. For instance, users with personal mobile devices utilize mobile P2P networks to share data in a certain area [2]. DTNs can also be used in mobile military communication networks to deliver real-time battlefield information locally [3]. In these applications, mobile nodes not only request data but also generate data themselves. Meanwhile, all mobile nodes are responsible for forwarding passing data. It is therefore necessary to design an appropriate network scheme to coordinate all mobile nodes to cache and forward data in DTNs.
Various caching techniques have been widely used to improve data access performance. The basic idea of caching in networks is to store data at appropriate locations so that future requests for the data can be answered promptly. Although caching has been extensively studied in traditional and wireless networks, it is rarely applied to DTNs because of the harsh network environments. The intermittent connectivity makes it difficult to determine an appropriate set of caching nodes, and the unpredictable node mobility makes it hard to maintain such a set as the network topology changes.
To address the data caching and access problems in DTNs, a number of caching schemes have been proposed. Some schemes [4] select a set of caching nodes based on probabilistic metrics, and others [5] use Markov chains to predict node mobility and reduce data access delay. However, most existing schemes remain impractical because the required global network state information is hard to obtain and may change constantly in DTNs.
In order to overcome the aforementioned issues, we propose a novel data caching scheme based on distributed learning automata. Our basic idea is to select and maintain a set of caching nodes, called the Caching Node Set (CNS), which caches data and serves data requests from other nodes. When nodes generate new data, the new data are intentionally disseminated to all caching nodes. Moreover, the CNS is updated in real time via the learning automata mechanism to adapt to network topology changes, and with each update of the CNS, data are redistributed among the latest caching nodes by cache replacement strategies. The major contributions of the paper are as follows: (i) We propose a novel caching scheme that coordinates multiple caching nodes to serve data requests and improve the overall network performance in DTNs. (ii) We propose a novel algorithm utilizing distributed learning automata to construct an optimal Caching Node Set (CNS) that adapts to network topology changes in DTNs; to maintain the CNS in real time, two well-designed processes, a voting process and an updating process, are introduced. (iii) We propose a distributed algorithm requiring no global network state information to realize our scheme, which improves its practicability for DTN-based applications.
The remainder of the paper is organized as follows. Section 2 describes the basic idea of our approach; Section 3 describes the design and implementation of learning automata based caching node selection; Section 4 describes cache based data access; Section 5 presents our evaluation; Section 6 reviews related work; Section 7 concludes the article; and Section 8 gives a glimpse of future work in this area.

Setup
2.1. Problem Statement. DTNs can be described by a network contact graph G(t) = (V(t), E(t)), where each vertex in V(t) represents a network node in the DTN at time t and each edge e_ij(t) in E(t) represents a contact between node i and node j at time t. Due to node mobility, some nodes may leave the network while new nodes may join; thus V(t) changes over time. Similarly, E(t) represents the opportunistic contacts in the DTN and changes over time as well: edge e_ij(t) exists only when the node pair i, j ∈ V(t) are in contact at time t; otherwise it does not exist.
We assume that when node i and node j contact each other at time t, that is, when edge e_ij(t) exists, data can be forwarded from node i to node j. The network model and an example of data transmission in DTNs are shown in Figure 1. In the example, data are transmitted from a source node to a destination node.
In Figure 1(a), at time t1, there are six nodes in the network and three edges, which indicates that only these three node pairs can transmit data in the network.

In Figure 1(b), at time t2, the source node generates new data and prepares to transmit it to the destination node. Since there is no direct end-to-end path between them at time t2, the source node has to forward the data to a relay node. The relay node has to cache the data because it cannot contact other nodes and forward the data immediately. Meanwhile, the network topology changes and one node leaves the network.
In Figure 1(c), after a while, at time t3, the relay node moves and contacts a second relay node, to which it forwards the data; the second relay caches the data until it contacts other nodes.
In Figure 1(d), at time t4, the second relay contacts the destination node and forwards the data to it, and the data transmission finishes. Meanwhile, the node that left may move and join the network again.
Therefore, to improve the performance of data access, we need to find a shortest path between the requesting node and the hosting node. Here the shortest path means the path with the highest data delivery probability, which is realized through the contacts between the nodes along this path.

Learning Automata.
A learning automaton [6] is a self-adaptive unit designed to achieve certain goals by learning through repeated interactions with an outer random environment following a predefined sequence of rules. In the learning process, the learning automaton chooses, at each instant, an action from a finite set of actions based on a probability distribution. For each action taken by the learning automaton, the environment responds with a reinforcement signal, and the learning automaton then updates its action probability vector according to this feedback signal. The relationship between the learning automaton and the outer random environment is shown in Figure 2. The objective of learning automata is to evolve progressively toward a desired state.
Learning automata can be classified into two main structures: fixed structure and variable structure. A variable-structure learning automaton is represented by a triple ⟨α, β, L⟩, where α denotes the set of actions, β denotes the set of feedback signals, and L denotes the learning algorithm. The learning algorithm is a recurrence relation used to modify the action probability vector. Let α_i(k) and p(k) denote the action chosen at instant k and the action probability vector on which the choice is based, respectively. The recurrence equations for updating the action probability vector p are shown by (1) and (2).
Equation (1) applies when the taken action is rewarded by the environment (i.e., β(k) = 0), and (2) when it is penalized (i.e., β(k) = 1). Here r is the number of actions from which the learning automaton can choose, and a and b denote the reward and penalty parameters. If a = b, the learning algorithm is called the linear reward-penalty (L_{R-P}) algorithm; if a ≫ b, it is called the linear reward-ε-penalty (L_{R-εP}) algorithm; and if b = 0, it is called the linear reward-inaction (L_{R-I}) algorithm [7].
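For reference, the standard variable-structure recurrences, which (1) and (2) conventionally follow, can be sketched as below, assuming action α_i is chosen at instant k and with r, a, and b as defined above:

```latex
% Reward case, \beta(k) = 0:
p_j(k+1) =
\begin{cases}
p_j(k) + a\,\bigl[1 - p_j(k)\bigr], & j = i,\\[4pt]
(1 - a)\,p_j(k), & j \neq i.
\end{cases}
\qquad (1)

% Penalty case, \beta(k) = 1:
p_j(k+1) =
\begin{cases}
(1 - b)\,p_j(k), & j = i,\\[4pt]
\dfrac{b}{r-1} + (1 - b)\,p_j(k), & j \neq i.
\end{cases}
\qquad (2)
```

In both cases the components of p(k+1) still sum to one, so the vector remains a valid probability distribution over the r actions.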
Distributed learning automata consist of a number of learning automata which form a network and cooperatively achieve a globally optimal result. Each learning automaton in the network updates its own action probability vector based on the feedback signals received from neighboring learning automata.

Main Idea.
Our basic idea is to let the nodes decide automatically whether to cache data using the learning automata mechanism. The nodes that cache the data are selected as caching nodes and cooperatively constitute the Caching Node Set (CNS). In our proposed scheme, each network node is assigned a learning automaton. When the DTN starts operating, all learning automata are activated and start constructing the CNS to cache data and serve data requests based on the node state information. Following the rules of the learning automata algorithms, a number of easily accessed nodes become the caching nodes and together comprise the CNS. More importantly, to cope with the changing network environments in DTNs, the CNS updates itself in real time to adapt to network topology changes.
The construction and maintenance of the CNS are shown in Figure 3. For each node, the action set of the learning automaton includes two actions: setting itself as a CNS node and setting itself as a non-CNS node. At the beginning, every node chooses not to be a CNS node. After that, a set of nodes making frequent contacts with other nodes is selected as the CNS, corresponding to the gray nodes in Figure 3(a). All other nodes in the DTN are able to contact a node in the CNS at least once within a period of time with high probability: the arrow lines indicate the opportunistic contacts among nodes, and it is necessary to ensure that every non-CNS node can contact a caching node within a certain period of time. When the network topology changes over time, the CNS is updated to cover all nodes in the network as much as possible. In Figure 3(b), owing to the mobility of nodes and the variability of contacts among them, two nodes are selected as new caching nodes while one former caching node ceases to be one. Once the CNS is constructed, we focus on utilizing the available node buffers to improve the performance of data access. When a node generates new data with a globally unique identifier and a finite lifetime, copies of the data are disseminated to all caching nodes as quickly as possible. When a node requests data, it sends queries to neighboring caching nodes to pull the data. If the caching nodes hold the requested data copies, the request is served; otherwise the request fails.
The dissemination of data is shown in Figure 4(a), where the dashed lines indicate the transmission of data. New data are generated by a node and then disseminated to all caching nodes using the data forwarding strategy. When the CNS changes or nodes exhaust their storage, cached data are replaced and redistributed to ensure data accessibility. The data replacement is shown in Figure 4(b): when a node ceases to be a caching node, the data cached there are removed and redistributed to other caching nodes. The overall performance of a caching scheme relies on prompt data dissemination and optimal data replacement strategies.

Learning Automata Based Caching Node Selection
In this section, we discuss how to construct and maintain CNS based on distributed learning automata.

Preview.
To introduce our new scheme, we first present several new concepts and terms used in it. Each node accumulates votes from the nodes it contacts, and data are preferentially cached at the node with the greatest vote weight value, then at the node with the second greatest vote weight value, and so on. In this way, we can improve the success rate of data access, decrease data loss, and reduce the transmission delay.
Besides direct contacts between two nodes, data in DTNs may be transmitted from the starting node to the destination node through multiple contacts. Therefore, we distinguish two types of votes: direct votes and indirect votes. When node A votes for node B, the direct vote is the vote that node B receives from node A, and its weight value is added to the state value of node B directly. An indirect vote is a vote that node A relays to another neighboring node C through node B. Generally, node C is unaware of the vote from node A until node B delivers it; therefore node B has to cache the indirect vote until it contacts node C. When receiving the indirect vote, node C adds its weight value to its own state value.
Voting plays an essential part in caching node selection. However, direct voting between two nodes in contact alone is inadequate, so we propose indirect voting to refine the scheme. The main purpose of indirect voting is to intentionally redirect vote weight values toward the nodes that tend to be caching nodes, which increases the differences among the state values of all nodes and makes it easier to select an appropriate Caching Node Set. Otherwise, the votes may be dispersed uniformly among all nodes. A uniform distribution of state values would increase both the difficulty of caching node selection and the instability of the Caching Node Set, causing a loss of network performance.

Node Information.
Besides the passing data that caching nodes store to improve data transmission, each node also needs to cache relevant node information to build and maintain our scheme. The following node information is cached in each node:
(i) the state value, the sum of the vote weight values the node has received;
(ii) the action probability p of setting the node as a caching node (correspondingly, 1 − p is the probability of setting it as a noncaching node; whether a node is a caching node is finally determined by the value of p);
(iii) a record of the historical frequency of contacts with other nodes;
(iv) a table of cached node information of other nodes;
(v) the cached indirect vote values destined for other nodes;
(vi) a list of the neighboring nodes ordered by contact frequency;
(vii) the greatest state value among all neighboring nodes;
(viii) the ID of the neighboring node with the greatest state value;
(ix) the frequency of contacts with that node;
(x) a flag indicating whether the node is a caching node.
More specifically, a node allocates a queue to store all the necessary information about each contacted node, including its node information, the frequency of contacts with it, and the cached indirect vote destined for it. Ideally, all this information could be stored permanently; however, a network node provides only limited memory for information storage, so the restricted memory allocated to the queue requires eliminating obsolete entries. We first assign a time stamp to each queue item; the time stamp of the item storing a contacted node's information is updated every time the nodes make a contact. When the memory is depleted, the queue eliminates the items with the most outdated time stamps. In addition, each queue item has a predefined lifetime and is removed when it expires.
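The queue policy described above (time-stamp refresh on contact, capacity-driven eviction of the most outdated items, lifetime expiry) can be sketched as follows; the class and field names are hypothetical illustration, not part of the scheme's specification.

```python
import time
from collections import OrderedDict

class NodeInfoQueue:
    """Illustrative per-node store for contacted-node records.

    Each record carries a last-contact time stamp; records are evicted
    when they exceed `lifetime` or when `capacity` is exceeded (most
    outdated time stamp first), mirroring the queue policy above.
    """
    def __init__(self, capacity=64, lifetime=3600.0, clock=time.time):
        self.capacity = capacity
        self.lifetime = lifetime
        self.clock = clock
        self.records = OrderedDict()  # node_id -> (timestamp, info)

    def on_contact(self, node_id, info):
        # Refresh the time stamp on every contact with that node;
        # re-insertion keeps the OrderedDict sorted by recency.
        self.records.pop(node_id, None)
        self.records[node_id] = (self.clock(), info)
        while len(self.records) > self.capacity:
            self.records.popitem(last=False)  # drop most outdated item

    def expire(self):
        # Remove items whose predefined lifetime has elapsed.
        now = self.clock()
        stale = [n for n, (ts, _) in self.records.items()
                 if now - ts > self.lifetime]
        for n in stale:
            del self.records[n]
```

A node would call `on_contact` from the voting process and `expire` at the end of each updating process.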

Information Delivery.
Besides voting, a node also transmits its node information when contacting other nodes. When node A contacts node B, node A delivers its own node information to node B. On receiving the information, node B records and updates the cached information of node A. When delivering the node information, node A increases its contact counter for node B by 1. The information message contains the following details:
(i) the state value, the total weight value of the votes the node has received;
(ii) the greatest state value among all its neighboring nodes;
(iii) the ID of the neighboring node with the greatest state value;
(iv) the frequency of contacts with that node;
(v) a flag indicating whether the node is a caching node.
When receiving the message from node A, node B records the information of node A in its node information table. Moreover, node B updates its state value according to the received information: the direct vote value from node A is added to its state, and the indirect votes from node A to other nodes are cached. Node B also updates its record of the greatest neighboring state value to the maximum of its current record, node A's state value, and the greatest state value reported by node A, and updates the corresponding node ID to the node that attains this maximum.

Overview.
The proposed CNS selection process based on learning automata starts simultaneously with the network operation. It aims at selecting a set of nodes as caching nodes to construct the CNS and updating the CNS in real time to adapt to changes of the network topology. It consists of two processes, the voting process and the updating process, as shown in Figure 5.
The action set of each node includes two actions: setting itself as a CNS node and setting itself as a non-CNS node. The node selects its action based on its p value; the initial p is set to 0.5 for each node. Once the actions are selected, some nodes become CNS nodes, which affects the subsequent voting process: the votes the nodes receive change, and so do their state values. This leads to a new reinforcement signal, a new rewarding/penalizing parameter, and then a new action probability p.
In the voting process, when node A makes a contact with node B, node A starts a voting round in which it sends node B a vote whose weight value depends on the node information and the connection conditions between them. On receiving the vote from node A, node B updates its node information. Meanwhile, node A also sends a message containing its own node information to node B. Node A finishes the voting process when the message is delivered; node B finishes it when the message is received and its own node information is updated.
In the updating process, every node in the network updates its own node state information and p value according to the cached information of neighboring nodes. Each node repeats the updating process independently at a predefined time interval. Equipped with a learning automaton, each node updates its action probability vector based on the reward-penalty algorithm and the reinforcement feedback from other nodes, and decides whether to be a caching node based on the action probability vector before finishing the updating process.
These two kinds of processes occur independently. The voting process occurs whenever the node contacts other nodes, and the updating process repeats at a predefined time interval. Each process is affected by feedback information from the other: the updating process updates the node state information according to the votes and information messages received in the voting process, while the voting process decides the vote values according to the node state changes made in the updating process. The time line of a single node is shown in Figure 6, where each tick indicates that the node makes a contact with a neighboring node and a voting process occurs, and INTERVAL_TIME represents the time interval at which the updating processes occur.

Voting Process.
The voting process occurs whenever a contact happens. When node A contacts node B, the pair of nodes starts a voting process, which consists of two parts: sending votes and delivering node information. However, the voting processes at node A and node B differ. Node A, which starts the contact, votes for node B and delivers its node information to node B, while node B receives the votes from node A and updates its stored node information of node A.
A voting process is shown in Figure 7, where node A contacts and votes for node B. Generally, three nodes are involved: node A, node B, and node C, the node with the greatest state value that node A is aware of. Depending on the differences in the node information, there are three main scenarios in which the vote weight values are distributed differently.
Scenario 1 (the relationship between the state values of node B and node C is uncertain). In the first scenario, node A fails to realize the presence of node C; that is, node A's information table contains no information about node C, mainly because node A has never made a contact with node C and has received no message or data concerning it. In other words, node A is unaware of node C's state value and node information. This scenario is shown in Figure 8.
This kind of scenario usually happens in the initial period of the network, when there are only a few contacts between nodes and nodes lack information about their neighbors. As the network operates and node information is disseminated, the occurrence of the first scenario decreases.
In this scenario, when node A contacts node B, node A increases its contact counter for node B by 1, which records the accumulated number of contacts from node A to node B. Then the total vote weight value is determined from the place of node B in node A's contact-frequency list. In the first scenario, the vote sent from node A to node B is a direct vote alone, and the value of the indirect vote is 0. Node A sends the vote and the node information to node B; when the total weight value is 0, node A sends only the node information. The equations for the distribution of vote values are shown by (3).

Scenario 2 (B.state ≤ C.state). When node A is aware of the node information of node C, the node with the greatest state value in node A's neighborhood, the relationship between B.state and C.state is clear.
In the second scenario, we assume B.state ≤ C.state. Similar to the first scenario, the total vote weight value is determined first; then the split between the direct vote weight value and the indirect vote weight value is decided according to the accumulated numbers of contacts between the nodes.
Node C has the greatest state value of all neighboring nodes, so the relationship of the state values among node A, node B, and node C is shown by (4). According to (4), the second scenario can be classified into the following subscenarios.

Subscenario 2.1 (C.state > B.state). If node C is in node A's contact-frequency list, node A has contacted node C before and the number of contacts is relatively large. From the contact history, we can predict that node A tends to contact node C in the future. Thus, node A is more inclined to have the data cached in node C, and the data transmitted from node A to node B also tend to be forwarded to node C and cached there. Hence, the contact between node A and node B can also be considered a contact from node A to node C, and node A's contact counter for node C is increased by 1. This situation is shown in Figure 9. Then the split between the direct vote value and the indirect vote value can be determined.
The value of the direct vote sent from node A to node B and the value of the indirect vote relayed from node A to node C through node B are then determined by the corresponding equations. It can be argued that the state values decide the directions in which data are transmitted, while the vote values indicate the expected quantity of transmitted data.

Wireless Communications and Mobile Computing
If node C is not in node A's contact-frequency list, node A hardly or never contacts node C. Thus node A's contact counter for node C remains unchanged, and the split between the direct vote value and the indirect vote value can be determined. This situation is shown in Figure 10.
The values of the direct vote sent from node A to node B and of the indirect vote relayed from node A to node C through node B are again given by the corresponding equations.

Subscenario 2.2 (C.state = B.state). This subscenario is shown in Figure 11. According to the relationship between node B and node C, it can be classified into two situations: (i) node B itself has the greatest state value in the neighborhood; (ii) node B's greatest-state neighbor is another node whose state value equals node B's, so that B.state = C.state.
Since C.state equals B.state in both situations, node A tends to have the data cached in node B rather than in node C in this contact. The distributions of the vote values in the two situations are the same: once the total vote value is decided, the split between the direct vote value and the indirect vote value can be determined by the corresponding equations.

In the third scenario, the relationship of the state values is given by (10), and the scenario can accordingly be classified into the following subscenarios.
Subscenario 3.1. If node C is not in node A's contact-frequency list, node A hardly or never contacts node C; this situation is shown in Figure 13. The equations for the distribution of the vote values are given accordingly.

Subscenario 3.2 (the two state values are equal). According to the relationship between the nodes, it can again be classified into two situations, shown in Figure 14.
(i) Node A itself has the greatest state value in the neighborhood.
(ii) Node A's greatest-state neighbor is another node whose state value equals node A's, so that the two greatest state values coincide.
The distribution strategies of the vote value are the same in both situations: once the total vote value is decided, node A only adds the direct vote to itself rather than voting for node B or node C, and the equations for the distribution of the vote values follow accordingly. The algorithms used in the voting process are shown in detail by pseudocode: the voting process at node A, which sends the votes, is shown in Algorithm 1, and the voting process at node B, which receives the votes, is shown in Algorithm 2.
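The control flow of the sending side can be sketched as follows. This is a structural illustration only: the class fields (`state`, `contacts`, `max_state`, ...) and the fifty-fifty weight split are hypothetical placeholders, not the weight values prescribed by the scenario equations above.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical placeholders for the per-node information
    # described in the "Node Information" subsection.
    id: str
    state: float = 0.0
    contacts: dict = field(default_factory=dict)          # node_id -> contact count
    pending_indirect: dict = field(default_factory=dict)  # node_id -> cached weight
    max_state: float = 0.0
    max_state_node: str = None

def total_vote_weight(a: Node, b: Node, base: float = 1.0) -> float:
    # Placeholder: the weight decays with B's rank in A's contact-frequency list.
    ranked = sorted(a.contacts, key=a.contacts.get, reverse=True)
    rank = ranked.index(b.id) if b.id in ranked else len(ranked)
    return base / (1 + rank)

def vote(a: Node, b: Node) -> None:
    """Skeleton of the voting process at node A when it contacts node B."""
    a.contacts[b.id] = a.contacts.get(b.id, 0) + 1
    total = total_vote_weight(a, b)
    c = a.max_state_node                   # strongest node A knows about (node C)
    if c is None or c == b.id:
        direct, indirect = total, 0.0      # Scenario 1: no distinct node C known
    elif a.max_state > b.state:
        direct = 0.5 * total               # placeholder split toward node C
        indirect = total - direct
    else:
        direct, indirect = total, 0.0      # B already outranks C: B keeps the vote
    b.state += direct                      # direct vote applied immediately
    if indirect:
        # B caches the indirect vote until it contacts node C.
        b.pending_indirect[c] = b.pending_indirect.get(c, 0.0) + indirect
```

In the full scheme the branch conditions and weights follow equations (3) through (13), and node B would additionally receive node A's information message.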

Updating Process.
Each node repeats the updating process independently at a predefined time interval INTERVAL_TIME, during which it updates its own node state information based on the information obtained from other nodes. In the updating process, nodes update their action probability vectors and determine whether to set themselves as caching nodes. The updating process fundamentally depends on the learning automaton assigned to each node.
As mentioned, p represents the action probability of setting a node as a caching node. Each learning automaton updates the action probability p following the learning automata mechanism and the reinforcement signal β. Then the state of every node is determined by the action probability p and the predefined threshold value DOR_THRESHOLD. The updating process proceeds as follows.
First of all, the node suspends all contacts with other nodes before starting an updating process. Based on the neighboring node state information recorded in its table, the node calculates the average state value of the neighboring caching nodes and the average state value of the neighboring noncaching nodes. Meanwhile, the node also learns whether each neighboring node is a caching node from the caching flag stored in its table.
The reinforcement signal β is shown by (14), where one term counts the neighboring noncaching nodes. If the node has received votes (state > 0) and is surrounded by noncaching nodes, it is rewarded because it may be a good caching candidate; otherwise it is penalized, since it is already surrounded by a caching node.
According to the reinforcement signals, the nodes update the action probability p. All learning automata reward the action if β = 1, whereas they penalize it if β = 0. Let p(k) be the action probability at instant k. The parameter a denotes the reward and penalty parameter and determines the amount by which the action probabilities increase or decrease; a is given by (15), with 0 ≤ a ≤ 1.
If the state of a node is 0, the node has not received any vote during the last updating period. In this case, the node cannot be rewarded: the value of p stays the same, and if the node is penalized, p is set to 0, so its chance of becoming a CNS node is reduced significantly. Otherwise, the node has received votes during the last updating period, and the reward is decided by the ratio of its state value to the maximum state value among its neighbors, so the node that receives the most votes among its neighbors is rewarded most.
The node state is then determined by p and DOR_THRESHOLD: if p ≥ DOR_THRESHOLD, the node is set as a caching node; otherwise it is set as a noncaching node. This differs from a classical learning automaton, which selects its action by sampling from its action probability vector. With the threshold rule, nodes that are unsuitable for caching data get no chance to become CNS nodes, whereas probabilistic selection would lead to significant performance loss, especially in the beginning period and when the node contact pattern changes. To address this issue, we modified the classical learning automaton so that only suitable nodes become CNS nodes. This may produce a suboptimal solution; however, it finds a reasonable solution quickly and may be more appropriate for dynamic DTNs.
Finally, the node removes from its buffer the stored node information items whose lifetime has expired. When the node state is determined and the obsolete information is eliminated, the node resumes working and contacting other nodes, and the updating process is finished. The CNS comprises all the caching nodes in the network and may change as the network operates. More details about the updating process are shown by the pseudocode in Algorithm 3.
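One updating step can be sketched as follows. The reinforcement rule and the reward formula are simplified stand-ins for equations (14) and (15), and the field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CacheState:
    state: float = 0.0      # accumulated vote weight
    p: float = 0.5          # action probability (initially 0.5, see Overview)
    is_caching: bool = False

def updating_process(node, neighbors, a=0.1, threshold=0.9):
    """Illustrative updating step for one node.

    `neighbors` are the cached records of contacted nodes; `a` is the
    reward/penalty parameter and `threshold` plays the role of
    DOR_THRESHOLD."""
    caching = [n for n in neighbors if n.is_caching]
    # Reward (beta = 1) when the node has votes and no caching node is nearby.
    beta = 1 if node.state > 0 and not caching else 0
    if beta == 1:
        # Reward proportional to the node's share of the best state value
        # in its neighborhood (stand-in for equation (15)).
        max_state = max([n.state for n in neighbors] + [node.state])
        r = a * (node.state / max_state)
        node.p = node.p + r * (1.0 - node.p)
    elif node.state == 0:
        node.p = 0.0                  # penalized with no votes: drop out quickly
    else:
        node.p = (1.0 - a) * node.p   # ordinary penalty step
    # Threshold rule instead of sampling the action probability vector.
    node.is_caching = node.p >= threshold
```

A caching node whose p falls below the threshold would then trigger the data replacement described in Section 4.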

Cache Based Data Access
In this section, we focus on data distribution and cache replacement in the network.

Data Transmission.
There are several different routing algorithms for DTNs, each with its own advantages and disadvantages. The major concern of all of them is to balance the trade-off between data delivery rate and transmission overhead, and the appropriate algorithm depends on the given circumstances.
Epidemic routing [8] is now widely used in wireless networks. It is a flooding-based algorithm in which nodes continuously replicate and forward data to newly discovered nearby nodes indiscriminately. However, epidemic routing is resource-hungry because it fails to reduce redundant and unnecessary data replication. Thus, further techniques and protocols have been proposed to improve the performance of data transmission in DTNs.
Generally, there are two main types of data transmission in our proposed scheme:
(i) Data requests: when a node requests some data, it multicasts the queries only to nearby caching nodes, and the caching nodes reply with the data if they hold a copy.
(ii) Data distribution: when a new data item is generated by some node, the data need to be distributed to all the caching nodes; one copy of the data is cached at each caching node.
To achieve effective data transmission in DTNs, we need an appropriate routing protocol. Given the lack of connectivity and of instantaneous end-to-end paths in DTNs, popular ad hoc routing protocols such as AODV and DSR fail to establish routes. To overcome these challenges, we employ the PRoPHET routing protocol [9], which is specifically designed for routing in DTNs. In this adaptive algorithm, each node a maintains a vector of delivery predictabilities. Each delivery predictability P(a, b) indicates the probability of successful delivery from node a to the destination node b. The delivery predictabilities are determined by the following rules [9].
(i) When a node a makes a contact with another node b, P(a, b) is increased according to the first update rule, where P_init is a constant initialization parameter.
(ii) The delivery predictabilities for all destination nodes decay over time according to the aging rule, where γ is the decay constant and k is the number of time slots elapsed since the last decay.
(iii) When node a and node b exchange delivery predictability vectors, the delivery predictability of a destination c is updated based on the transitive property of predictability: node b has a P(b, c) value, and node a is assumed likely to meet node b again. The rule uses a scaling constant β.
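The three rules can be sketched with the update formulas from the original PRoPHET proposal [9]; the constants `P_INIT`, `GAMMA`, and `BETA` below are the default values suggested there, and the class itself is only an illustration.

```python
class Prophet:
    """Sketch of the PRoPHET delivery-predictability rules [9]."""
    P_INIT, GAMMA, BETA = 0.75, 0.98, 0.25   # protocol constants

    def __init__(self):
        self.p = {}   # destination id -> delivery predictability

    def on_contact(self, b_id):
        # Rule (i): meeting a node raises its predictability.
        old = self.p.get(b_id, 0.0)
        self.p[b_id] = old + (1.0 - old) * self.P_INIT

    def age(self, k):
        # Rule (ii): predictabilities decay over k elapsed time slots.
        for d in self.p:
            self.p[d] *= self.GAMMA ** k

    def transitive(self, b_id, p_b):
        # Rule (iii): transitivity through node b, whose vector is p_b.
        for c_id, p_bc in p_b.items():
            if c_id == b_id:
                continue
            old = self.p.get(c_id, 0.0)
            self.p[c_id] = old + (1.0 - old) * self.p.get(b_id, 0.0) * p_bc * self.BETA
```

A node would call `on_contact` and `transitive` during each encounter and `age` periodically.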
In the proposed scheme, every node is either a caching node or adjacent to caching nodes within a short probabilistic distance, so most data requests can be replied to quickly. Due to the distribution pattern of caching nodes in DTNs, the data are expected to reach destination nodes within a few hops. Data distribution in our proposed scheme, on the other hand, is flooding-based in nature among the caching nodes. When new data are generated by a random node, the node merely forwards the data to the neighboring known caching nodes. When a caching node receives the newly generated data, it caches the data and then forwards it to other neighboring known caching nodes. Caching nodes receiving the data repeat the process until each caching node holds a data copy.
To improve routing performance and reduce the waste of network resources, we also introduce some rules and restrictions into the routing protocol. First, we set a hop limit h, meaning that a data copy is discarded after an h-hop delivery. An appropriate hop limit h eliminates redundant data transmissions and prevents routing overflooding effects. Additionally, not only does the asymmetric probability distribution in the PRoPHET protocol prevent data from being transmitted in loops to some degree, but any transmitted data copies trapped in a loop also perish when exceeding the hop limit. This avoids useless repetitive data transmission and saves network resources.

Wireless Communications and Mobile Computing 13
We also enhance the PRoPHET routing protocol by introducing a predefined probability threshold P_threshold and a data delivery rule. When node a needs to transmit data to the destination node d through the relay node b, the nodes first check the relationship among P(b, d), P(a, d), and the probability threshold P_threshold. If P(b, d) < P_threshold and P(b, d) < P(a, d), the data copy is not delivered. This rule avoids data transmission through paths with a low transmission success rate and improves routing performance in the network.
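The two forwarding restrictions, the hop limit and the threshold-based delivery rule, can be combined into a single decision function. This is a minimal sketch under the notation above; the function and parameter names are illustrative assumptions.

```python
# Sketch of the forwarding restrictions: a hop limit h and the
# predefined probability threshold P_threshold.

def should_forward(hops_so_far, h_limit, p_relay_dest, p_self_dest, p_threshold):
    """Decide whether carrier a may hand a data copy to relay b for destination d.

    p_relay_dest = P(b, d), the relay's delivery predictability to d
    p_self_dest  = P(a, d), the current carrier's predictability to d
    """
    # Restriction 1: discard copies that already traveled h hops.
    if hops_so_far >= h_limit:
        return False
    # Restriction 2: skip a low-probability path that is also worse
    # than keeping the copy at the current carrier.
    if p_relay_dest < p_threshold and p_relay_dest < p_self_dest:
        return False
    return True
```

Note that a relay below the threshold is still acceptable when it is at least as good as the current carrier, which matches the conjunction ("and") in the rule above.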

Data Replacement.
One major restriction for any caching scheme is the limited storage space at caching nodes. Cache replacement is consequently necessary when caching buffers run out: some obsolete and less popular data are removed so that new data can be cached. Latency and hit ratio are the two major concerns for cache replacement. The cache replacement strategy used in our scheme is designed to optimize the trade-off between delay and data accessibility. In the proposed scheme, cache replacement occurs under two circumstances.
(i) When newly generated data reach a caching node lacking enough space for storage, the caching node removes obsolete data to cache the new data. The data copies cached at a caching node are ordered by utility values, and the caching node discards the data with the lowest utility values until there is enough space for the new data.
(ii) When a former caching node ceases to be a caching node, the data copies cached at the node are removed and, if necessary, transmitted to other caching nodes. The node strives to forward the data, from high utility values to low, to other neighboring caching nodes and then removes the data cached in itself.
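Circumstance (i), eviction at a full caching node, can be sketched as follows. The heap-based buffer layout and the function name are our own illustrative assumptions; the scheme only prescribes that the lowest-utility copies are discarded first.

```python
import heapq

def make_room(buffer, capacity, new_size):
    """Evict lowest-utility data copies until a new item of new_size fits.

    buffer: list of (utility, size, data_id) tuples; treated as a min-heap
    so the copy with the lowest utility is always evicted first.
    Returns the list of evicted data ids, in eviction order.
    """
    heapq.heapify(buffer)
    used = sum(size for _, size, _ in buffer)
    evicted = []
    while buffer and used + new_size > capacity:
        utility, size, data_id = heapq.heappop(buffer)  # lowest utility first
        used -= size
        evicted.append(data_id)
    return evicted
```

Circumstance (ii) reuses the same ordering in reverse: the departing node pops the highest-utility copies first when offloading them to neighboring caching nodes.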
Of the different cache replacement strategies, utility-based cache replacement is the best suited to DTNs and is widely used in many studies and applications [10][11][12]. We use a utility-based cache replacement strategy in our scheme to address the limited memory of caching nodes.
In the proposed scheme, a utility value determined by a utility function is assigned to each data copy cached at caching nodes. Utility values indicate how frequently each data item is requested based on the query history. The occurrence of data requests follows a Poisson distribution [4,13]. We assume that there are k requests for data i in the time period [t_1, t_k]; the parameter for the Poisson distribution is then λ_i = k/(t_k − t_1), and the utility value of each data item evolves over time as well. At time t, the utility value u_i for data i is determined by the utility function u_i = 1 − e^(−λ_i · (t − t_k)).
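Under the stated Poisson assumption, the utility computation is a one-liner; the sketch below assumes the rate λ_i = k/(t_k − t_1) is re-estimated from the query history as reconstructed above, and the function name is illustrative.

```python
import math

def utility(k, t1, tk, t):
    """Utility of data i at time t.

    k requests were observed in the period [t1, tk], giving the
    estimated Poisson rate lam = k / (tk - t1); the utility is the
    Poisson probability that at least one request arrives in (tk, t]:
    u_i = 1 - exp(-lam * (t - tk)).
    """
    lam = k / (tk - t1)
    return 1.0 - math.exp(-lam * (t - tk))
```

A frequently requested item (large λ_i) reaches a high utility soon after its last request, so it survives eviction, while a rarely requested item stays near zero utility.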

Performance Evaluation
In this section, we conduct simulations to evaluate the performance of the proposed caching scheme and compare its merits and demerits with two existing data caching schemes, Intentional Cache [4,13] and Adaptive Cache [14].
Each simulation is repeated multiple times for statistical convergence. The following metrics are used for evaluation: (i) Successful ratio: the ratio of requested data forwarded to requesters successfully.
(ii) Data access delay: the average delay for responding to queries with the requested data.
(iii) Number of caching nodes: the number of nodes set as caching nodes in a certain period. The number of caching nodes to some extent reflects the caching overhead and the number of cached data copies in the network.

Simulation Setting.
The performance evaluations are performed on the Infocom06 trace, collected by the Haggle project [15], and the MIT Reality trace, collected by the MIT Reality Mining project [16]. In the Infocom06 trace, 100 participants carried iMote devices to record their contacts over 4 days, yielding 227657 internal contacts. In the MIT Reality trace, 97 participants carried cellphones to record their contacts over 246 days, yielding 114046 internal contacts. Unlike the simulations in [14] and [4,13], where the first half of the trace is used as a warm-up period and only the second half is used for performance evaluation, the data and queries in our simulations are generated throughout the whole trace.
Each node periodically generates new data; the probability of generating new data is set as 0.2. As in the simulations in [4,13], each generated data item has a finite lifetime T, and the period for the data generation decision is also set as T.
The queries are randomly generated at all nodes. Similar to the query pattern used in [4,13], the query pattern follows a Zipf distribution [17]. We assume p_i ∈ [0, 1] is the probability that data i is requested and N is the number of data items in the network. Then p_i = (1/i^s)/(∑_{j=1}^{N} (1/j^s)), where s is an exponent parameter. At intervals of T/2, each node determines whether to request data i with probability p_i.
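The Zipf query pattern above can be generated as follows; this is a minimal sketch, with the function names and the pluggable random source being our own assumptions.

```python
import random

def zipf_probs(N, s):
    """Request probabilities p_i = (1/i^s) / sum_{j=1..N} (1/j^s)."""
    weights = [1.0 / (i ** s) for i in range(1, N + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def maybe_request(probs, rng=random):
    """At each T/2 interval, a node independently decides, per data
    item, whether to issue a request; returns the ids (1-based)
    of the data items requested in this interval."""
    return [i + 1 for i, p in enumerate(probs) if rng.random() < p]
```

With s = 1 and N = 5 the first item is requested with probability 60/137 ≈ 0.44, so a small set of popular items dominates the query load, which is exactly the situation utility-based caching exploits.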
In [4,13], the caching performance of Intentional Caching is evaluated by comparison with several other caching schemes, including No Cache, Random Cache, CacheData, and Bundle Cache. The results show that Intentional Caching has overall advantages over the other schemes, which is why we choose the Intentional Caching scheme as one object of comparison. Similar to our scheme, Adaptive Cache uses learning automata to decide whether a node should be a caching node or not; hence, we select Adaptive Cache as the other object of comparison.

Caching Performance.
In [4,13], the caching performance of the Intentional Caching scheme is evaluated only on the MIT Reality trace. Consequently, to compare our caching scheme with Intentional Caching, we also use only the MIT Reality trace to evaluate the caching performance of our proposed scheme. In Intentional Caching, the number of NCLs used to cache data is set as 8, while the number of caching nodes in our caching scheme varies according to the contact history and network conditions. We set the DOR THRESHOLD as 0.8.
We compare the performance of these two caching schemes with different data lifetimes T, as shown in Figures 15 and 16. When T increases from 12 hours to 2160 hours, the successful ratio of both caching schemes improves. While the Intentional Caching scheme achieves a satisfactory successful ratio and a short data access delay, our proposed caching scheme provides a similar successful ratio and reduces the data access delay significantly.
As shown in Figure 15, our scheme achieves a better successful ratio with a short data lifetime, while the performance of Intentional Caching slightly outmatches our scheme as T increases. On the other hand, our scheme reduces the data access delay significantly, as shown in Figure 16. The access delay for both caching schemes remains relatively low when T is small, but as T increases, the delay for Intentional Caching grows more steeply, and our scheme achieves a 60% shorter delay than Intentional Caching.
The Intentional Caching scheme uses global network information to select NCL (network central location) nodes, the nodes that contact the rest of the network most frequently. Once selected, NCL nodes cache data for the whole time. Without using global network information, our scheme is distributed and lets nodes themselves decide whether to be caching nodes. The CNS is reconstructed occasionally in response to changes in the network, ensuring that most data requests can be replied to quickly; thus our scheme outperforms Intentional Caching given a short data lifetime. The same difference also explains the high successful ratio of Intentional Caching given a long data lifetime.
In Intentional Caching, if the data are not found in an NCL, the request is broadcast to the entire network until the data source is found. Storage is limited, so if the data lifetime increases, there may not be enough space at the NCL nodes to cache all the data items, which degrades performance, for example, through longer delays. In our scheme, a node uses local information, namely the nodes it has contacted directly or indirectly and their roles, to decide whether to become a caching node. This may not yield a globally optimal solution, but more nodes get the chance to cache data. For example, a node making few contacts with other nodes may become a CNS node if none of its contacted nodes are CNS nodes. The storage limitation therefore has less effect on the performance of our scheme than on Intentional Caching.
Compared with our scheme, the Adaptive Caching scheme shows a similar variation tendency as the data lifetime varies, since both schemes use learning automata. However, its performance is lower than ours because it uses the forwarding ratio of a node as the indicator of the network environment, which makes it highly dependent on the data traffic pattern and the underlying routing protocols. For example, if there is no previous traffic in a certain area, none of the nodes in that area get the chance to become caching nodes; a node in that area wanting to access certain data may then suffer a long delay, since no nearby node caches the requested data.
In our scheme, the size of the CNS varies over time, which may sometimes require more caching nodes than NCLs. The selection of caching nodes contributes to this difference, and the details are discussed in the next section.
We also evaluated data access performance with different average data sizes when T is set as 168 hours. The results are shown in Figures 17 and 18. When the average data size increases, the performance of Intentional Caching weakens, while the performance of our proposed scheme is less susceptible to the variation in data size. When the average data size increases from 20 Mb to 200 Mb, the successful ratio of Intentional Caching decreases from 60% to 45%, but the successful ratio of our proposed caching scheme remains around 50%, and the data access delay of our scheme is overall shorter than that of Intentional Caching. As discussed above, when data items are large, the storage limitation degrades the performance of Intentional Caching; in our scheme, more nodes get the chance to cache the data, leading to better performance. The Adaptive Caching scheme is also less sensitive to the variation in data size than Intentional Caching, since it too is a distributed and adaptive scheme, but its performance is lower than ours because it relies on the forwarding ratio, as discussed before.
Comparatively, all caching schemes adopt effective cache replacement strategies to cache the most appropriate data. But the intelligent distribution of caching nodes in our scheme ensures that most nodes are able to contact multiple caching nodes within shorter opportunistic distances, which increases the successful ratio and reduces the data access delay significantly.

Selection of Caching Nodes.
When determining the caching nodes in the network, we are inclined to reduce the number of caching nodes so as to lower the overhead while maintaining the overall performance. In Intentional Caching, the number of NCLs is predefined before the simulations: using the Infocom06 trace, the authors set T = 3 hours and evaluate the impact of the number of NCLs on data access performance, finding that 5 NCLs is the best choice for the Infocom06 trace. But there are some defects in the selection of NCLs. The probabilistic selection metric is in fact based on global network state information, which is hard to obtain in practice. Moreover, a network warm-up period is required to build up the NCLs.
In our proposed caching scheme, by contrast, the selection of caching nodes is based entirely on local network state information and avoids the warm-up period. The number of caching nodes, instead of being constant, varies along with the changing network environment. In this section, we study the major factors influencing the number of caching nodes.
The sum of vote values for each node serves as an important metric for the selection of caching nodes, and the proposed vote mechanism ensures the differentiation of these sums. We evaluate the sum of vote values for each node using the Infocom06 trace and the MIT Reality trace, with T set as 14 hours for the Infocom06 trace and 7 days for the MIT Reality trace.
The results in Figure 19 show that the sum of vote values per node is highly skewed in each trace. This obvious differentiation is conducive to the selection of caching nodes and supports the validity of our proposed caching scheme. It also indicates that the selected caching nodes can be easily accessed by other nodes.
Although the number of caching nodes varies over time due to the proposed mechanism, the distribution of the metric is still consistent with the selection of caching nodes to some extent. For instance, the average number of caching nodes is around 10 when T is 14 hours in the Infocom06 trace. As shown in Figure 19(a), the number of caching nodes is generally no more than 10, and we can easily select the 6 nodes with the greatest vote value sums as caching nodes; the nodes selected from Figure 19(a) all serve as the primary caching nodes in the trace.
Next, we evaluate the impact of T on the number of caching nodes using the Infocom06 trace and the MIT Reality trace. Considering the lack of persistent connections among nodes, in a certain period some nodes in the network may be unable to contact any caching nodes, or even any other nodes at all. During the simulation, we record the number of caching nodes in the network every T hours; meanwhile, each noncaching node also periodically checks whether it can contact a caching node successfully. We regard nodes that are not caching nodes and are unable to contact any caching nodes as isolated nodes. As shown in Figure 20, the number of caching nodes is reduced when T increases; generally, caching nodes make up only 5% to 15% of all nodes. On the other side, the number of isolated nodes decreases significantly when T increases. In Figure 20(a), there are over 40 isolated nodes when T = 3 hours but only 5 isolated nodes when T = 30 hours. There are more isolated nodes in Figure 20(b), which is largely caused by the contact pattern among nodes in the MIT Reality trace. Although some nodes inevitably remain isolated, the reduction of isolated nodes still improves the overall performance.
The change in the number of caching nodes can also be seen as a clear indicator of the converging characteristics of our scheme. As Figure 20 shows, the number of caching nodes becomes relatively stable quickly, which means the learning automata reach a stationary status.
However, the number of caching nodes is not the only concern. The effectiveness of caching nodes requires that their states remain relatively constant; that is, when a node sets itself as a caching node, it must keep itself a caching node for a continuous period to cache data and reply to requests. Let the action probability of setting a node as a caching node be p ∈ [0, 1]. Since the states of nodes are determined by the action probability vector, the values of p for some nodes remain comparatively constant for a long, continuous period, while others vary and fluctuate dramatically, which is likely to disturb the stability of the network. The variation trends of the action probability reveal the frequency with which nodes change their states, either from caching nodes to noncaching nodes or the other way around. The average frequency of state changes for all nodes is shown in Figure 21. The frequency of state changes remains comparatively constant, which also shows that our scheme converges quickly. The Infocom06 trace describes the movement of people participating in an international conference, and the MIT Reality trace describes the movement of people studying and working in the same lab. The former case has higher dynamics; therefore, the frequency of node state changes in the Infocom06 trace is overall higher than in the MIT Reality trace.
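The caching decision above rests on a learning automaton per node that maintains the action probability p of being a caching node. The exact reinforcement scheme is defined in the algorithm sections of the paper; the sketch below uses a standard linear reward-inaction (L_RI) update as an illustrative stand-in, with the step size a and the function name being our own assumptions.

```python
def l_ri_update(p, chose_caching, rewarded, a=0.1):
    """One linear reward-inaction step on the probability p of the
    'become a caching node' action.

    chose_caching: which action the automaton just took
    rewarded: whether the environment response was favorable
    a: learning rate in (0, 1)
    """
    if not rewarded:
        return p  # reward-inaction: penalties leave p unchanged
    if chose_caching:
        return p + a * (1.0 - p)   # reinforce the caching action
    return p * (1.0 - a)           # reinforce the non-caching action
```

Under such an update, nodes that are consistently rewarded for one action drive p toward 0 or 1 and keep a stable state, which is the convergence behavior observed in Figure 21.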

Summary of Simulation.
We compare the performance of our proposed caching scheme with the Intentional Caching scheme [4] and the Adaptive Caching scheme [14], evaluating the successful ratio of data access and the data access delay. Our proposed caching scheme exhibits overall performance similar to that of Intentional Caching under some circumstances, while showing the best performance in terms of data access delay. The most noticeable advantage of our proposed scheme over the others, however, is that it requires no global network state information to select the caching nodes and is able to adjust to variations in the network topology automatically. In conclusion, our proposed caching scheme proves to be effective.

Related Work
The general concept of Delay Tolerant Networks was proposed in 2003. Since then, a growing number of studies have drawn attention to DTNs and sought to develop effective schemes to improve the overall performance of DTN-based networks.
Issues of data transmission in DTNs derive from epidemic routing in ad hoc networks [8]. However, the "carry-and-forward" mechanism [18] as used in epidemic routing is ineffective in DTNs. To improve routing performance in DTNs, later studies incorporate semi-Markov chains [19] and Hidden Markov Models [20,21]. Other studies [22][23][24] propose routing protocols based on historic contact records and predictions of mobility patterns. Taking a step further, [25] offers a hybrid routing algorithm and explores its application for providing health services in rural areas.
To facilitate data access in DTNs, caching techniques have been extensively studied, and cooperative caching schemes have been designed to intentionally coordinate multiple nodes to cache and share data. Different cooperative caching strategies applicable to DTNs have been proposed. For instance, [26] provides a cooperative caching scheme based on social relationships among nodes, and [27] proposes a content floating-based cooperative caching strategy. However, most caching schemes fail to address the changing network topology or the lack of global network state information in DTNs; thus these schemes rarely maintain consistently high performance in the long term.
To overcome the variability of the network topology, cellular automata [28,29] were introduced into traditional wireless networks. Given the limited resources supplied to nodes in wireless networks, [30] provides an energy-conservation solution that maximizes the lifetime of the network by minimizing energy consumption based on cellular automata. On the basis of cellular automata, later studies seek to develop self-organized and self-adaptive network schemes. Enlightened by solutions [31][32][33] to graph theory problems using the learning automata mechanism, distributed learning automata have been introduced into wireless network clustering algorithms [34], scheduling methods [35], and routing protocols [36]. Our scheme takes advantage of learning automata theory to overcome the unpredictable and inconsistent network environment in DTNs.
Hamid and Meybodi [37,38] proposed a continuous action-set learning automaton based call admission control algorithm that minimizes the blocking probability of new calls in cellular mobile networks, subject to a constraint on the dropping probability of handoff calls, and theoretically proved that it converges to the optimal solution. We apply the same idea to caching-based data access in DTNs. Since we cannot model the data access process as a random process such as a Markov process, we use simulation to demonstrate the effectiveness of our method.
When evaluating performance, besides [4], we also compare our scheme with other works theoretically. In [13], the authors improve the scheme proposed in [4] so that the set of caching nodes can be built in the absence of global network state information. However, this improvement comes at a great cost to performance. Inconsistency in the selection of caching nodes appears in [13], undermining the reliability of the selection; to neutralize it, the broadcasting period has to be extended and additional information must be exchanged between two contacting nodes. Moreover, some major deficits of [4] persist in [13]: for instance, the scheme in [13] still cannot cope with the changing network topology and needs a warm-up period to select the Caching Node Set.
Reisha Ali proposes a learning automata based scheme to choose a set of caching nodes that can contribute more to the entire network [14]. Although it has something in common with our proposed scheme, the two are essentially different. It uses the forwarding ratio of a node as the indicator of the network environment, which makes it highly dependent on the data traffic pattern and the underlying routing protocols. Furthermore, like the algorithms in [4,13], the algorithm in [14] needs a warm-up period for the selection of caching nodes and is unable to adapt to dynamic changes of the network topology.

Conclusion
In this paper, we propose a self-organized and self-adaptive caching scheme based on distributed learning automata. The basic idea of our proposed caching scheme is to maintain the Caching Node Set (CNS), which offers easy and effective data accessibility to all nodes. The selection and maintenance of the CNS are achieved by the learning automaton assigned to each node. Through the well-designed voting and updating processes, all nodes cooperate spontaneously to achieve effective data access in DTNs, optimizing the overall performance of the network. More importantly, our proposed caching scheme is specially designed for the DTN environment, which is characterized by the lack of global network state information, unpredictable node mobility, and high forwarding latency. The trace-driven simulations exhibit the effectiveness and advantages of our proposed scheme compared with other existing caching schemes.

Future Work
In our proposed distributed caching scheme, we use a utility-based cache replacement strategy to accommodate the limited storage space at each node; however, other possible and promising cache replacement strategies are not covered in our evaluations. Cache replacement strategies have an important influence on performance, and studying the effects of different strategies on the performance of our scheme is necessary. Furthermore, cache redistribution may cause traffic overhead and negatively affect performance in certain periods, especially when nodes change their states. Further research is still required.

Figure 2: Relationship between learning automata and random environment.

Figure 3: The construction and maintenance of CNS.

Figure 6: The timeline of a node using our scheme.

Figure 8: The relationship between . and . is uncertain.

Subscenario 3.1. According to the relationship between the two nodes, this subscenario can be classified into two situations. If the node is in the other node's max-k heap, the situation is shown in Figure 12; similarly, the counter recording the number of contacts between the two nodes is increased by 1 when the total value of votes is decided.

Figure 15: Successful ratio of data access with different data lifetime.

Figure 16: Data access delay with different data lifetime.

Figure 19: Sums of vote values for each node on realistic traces.

Figure 20: The number of caching nodes and isolated nodes.

Figure 21: The frequency of node states change.
VOTE: K different possible choices of nodes to which a node can transmit the data, ordered by the vote weight values from high to low. When transmitting data, the node prefers to transmit the data to the node that received the greatest vote weight value; if that node is unreachable or disabled, the node transmits the data to the next node in order.
Scenario 3. As in the second scenario, in the third scenario one node has contacted another and is aware of the third node's information. According to the relationship between the two recorded values, the scenario can be classified into the following situations: (i) the values differ, and the two recorded nodes are different nodes; (ii) the values are equal, and the recorded nodes refer to the same node or are different nodes with the same value; (iii) the node receives votes and node information messages from the contacted node, and updates its own node information accordingly.