Opportunistic Mobile Sensing in the Fog

The increasing adoption of mobile personal devices and Internet of Things devices is driving the emergence of a wide variety of opportunistic sensing applications. However, the designers of this type of application face a set of technical challenges related to the limitations and heterogeneity of the hardware and software platforms and to the dynamics of the scenarios where they are deployed. In this paper, we introduce a Semantic-Centric Fog-based framework aimed at effectively and efficiently supporting this type of application. The proposed framework is composed of local and distributed algorithms that support the establishment and coordination of sensing tasks in the Fog. First, it performs ontology-driven in-network processing to locate the most adequate devices to carry out a given sensing task; then, it establishes efficient multihop routes that are used to coordinate the relevant devices and to transport the collected sensory data to Fog sinks. We present a set of theorems that prove that the proposed algorithms are correct, together with the results of a series of detailed simulation-based experiments in NS3 that characterize the performance of the proposed platform. The results show that the proposed framework outperforms traditional sensing platforms that are based on centralized services.


Introduction
Fog computing is a distributed paradigm for transporting, storing, analyzing, and acting on data generated by a swarm of heterogeneous networked devices such as Internet of Things (IoT) [1] devices and personal mobile devices that are located at the network edge [2][3][4][5].
Fog computing provides Cloud-like services implemented close to where data is generated. The purpose is manifold: (1) to provide stable resources to the swarm at the network edge: this way, edge devices do not have to rely solely on their limited resources; (2) to improve scalability by offloading data traffic from the core network [5,6]: in the Fog computing model, only selected or preprocessed data is transported through the core network to the Cloud [4]; (3) to reduce response time: by analyzing and acting on time-sensitive data close to where it is generated, systems can eliminate a network round-trip time, reducing the response delay and jitter [7,8]; (4) to improve privacy by storing privacy-sensitive data at the local premises [3,5,7,8,9]; and (5) to improve efficiency by distributing processing, storing, and communication functions anywhere between the Cloud and the swarm at the edge [7,10].
The Fog computing paradigm provides an ideal platform for implementing sensing applications because it enables seamless integration between the unprecedented capabilities of dedicated sensor networks, personal mobile devices, and IoT devices to monitor the physical world [11] and the scalable storage and high-performance computing capabilities for data analytics of the Cloud [8]. In fact, despite the large diversity of Fog applications (e.g., smart grid, smart traffic, and smart buildings [9]), all of them include a sensing component involving IoT devices [4], mobile devices, and/or sensor networks. However, the vast majority of current sensing applications typically address a single scenario on a dedicated set of resources. While this approach provides performance guarantees and reliability, it prevents the explosion of possibilities that result from sharing data, hardware, and software services across applications [11].
In this paper, we present a platform for opportunistic sensing [12] in the Fog, where collections of heterogeneous networked devices (e.g., IoT devices, dedicated sensor networks, and personal mobile devices) can self-organize to collect relevant sensory data and to efficiently transport it to Fog sinks. The result is an adaptive platform that is able to opportunistically take advantage of the local sensing devices to support a wide diversity of sensing applications with optimized performance. Please note that opportunistic sensing and the Fog share a common fundamental research question, namely, how to distribute computational tasks over a dynamic set of heterogeneous resource-constrained wireless nodes [7].
Opportunistic sensing (OS) systems differ in important ways from traditional sensor networks, introducing new challenges but also opening new opportunities. While sensor networks are typically deployed as a well-known set of homogeneous devices, OS systems are usually composed of highly heterogeneous devices with diverse hardware characteristics [13]. Moreover, OS systems are much more dynamic in the sense that completely new types of networked sensors can come into play at any time during the system operation, and OS platforms should be able to seamlessly integrate them into the sensing tasks. Therefore, identifying the right set of devices, namely, those that can produce the desired sensory data with the proper context and at minimum network cost, becomes one of the most important and complex problems for the complete realization of an opportunistic sensing platform [12,14]. The context of a sensor may include its current battery energy level and its geographical location, but also, in the case of mobile devices such as smartphones, the set of applications running on the foreground, as well as instantaneous readings of their sensors that may indicate, for instance, whether a device is inside a bag.
Another important difference between OS and traditional sensor networks is the degree to which human users are involved in the sensing tasks. In the opportunistic sensing paradigm, sensing applications can run in the background of mobile personal devices while opportunistically collecting data. In this type of scenario, personal devices cannot be overloaded with continuous sensing tasks that may degrade the user experience by disrupting applications or depleting battery power, because this may discourage users from participating in future sensing tasks [15]. This has motivated the development of collaborative sensing strategies, where two or more mobile devices share the sensing, processing, or communication load to save resources [16].
In a nutshell, the main advantage of opportunistic sensing platforms is that they can harness the sensing and communication capabilities of the static sensor networks deployed in a given environment, but also of mobile and IoT devices that happen to be located in the same environment. All of this serves to provide sensing services to applications running on local devices but also to large-scale applications running on the Cloud.
The proposed framework for opportunistic sensing is based on Semantic-Centric Sensing Foglets (sensing Foglets for short), which are dynamic collections of local heterogeneous networked devices such as smartphones, traditional sensor networks, IoT devices, and mobile and desktop computers. These devices organize themselves to collect and deliver relevant sensory information to a designated local Fog data sink. For a given sensing request, the proposed framework locates and identifies the set of local sensing devices that are most fit to perform the task. This selection is based on the computation of a semantic distance function between semantic labels describing the requested sensor and the sensors installed on the local devices, the instantaneous context of the sensing devices, and the communication cost.
The main contributions of this paper are as follows: (1) an ontology-based semantic distance function, with an efficient implementation, that can be computed without accessing the whole ontology; (2) a semantic-driven distributed algorithm that uses in-network processing to locate a set of sensing devices that are able to perform a given sensing task at minimum network cost. This way, sensing tasks can be opportunistically carried out by a combination of mobile devices, IoT artifacts, traditional sensor networks, and Fog devices; and (3) an effective and efficient distributed algorithm that instantiates sensing Foglets by establishing and maintaining multihop paths connecting the best sensing devices in the environment to Fog sinks and that implements collaborative sensing schedules where multiple devices can share the load of implementing a sensing task. These sensing schedules can reduce local network contention and congestion by balancing the network load. They can also be used to improve the quality of the sensory data by providing redundant sources of information.
The rest of this paper is organized as follows. In Section 2, we describe existing mobile sensing platforms, emphasizing the fact that most of these platforms either are based on centralized architectures where mobile nodes communicate directly with services on the Internet or do not address the problem of finding the best set of sensing and communication nodes. In Section 3, we present the proposed opportunistic sensing architecture and formulate the problem of instantiating semantic-centric Foglets for sensing. In Section 4, we establish the correctness of the proposed algorithms. Section 5 presents the results of a series of detailed simulation-based experiments that show that the Semantic-Centric Sensing Foglets outperform traditional sensing platforms in terms of efficiency and effectiveness. Lastly, in Section 6 we present our concluding remarks and future research directions.

Mobile Sensing
Many sensing platforms have been proposed in recent years. Here we present a small but representative sample of that body of work. Sensing platforms can be classified as either centralized or distributed depending on the way the mobile nodes interact with each other. Representatives of centralized platforms are METIS [17], PRISM [25], Medusa [18], and InCense [19], while representatives of distributed platforms are COUPON [20] and the work by Ngai et al. [21].
METIS [17] is a sensing platform that offloads sensing tasks to sensors embedded in the environment with the goal of conserving energy. METIS assumes the presence of a centralized rendezvous point, which is used by smartphones to query the infrastructure about the available sensing resources and their capabilities. Offloading decisions are based on the estimated energy cost when sensing is performed on the phone and the prediction of the energy cost when sensing is performed by sensors in the environment. Unlike the proposed platform, METIS does not provide support to generate sensing schedules where multiple devices can be involved in a given sensing task. In [22], the authors present a human-based data-muling system where smartphones are used to collect data from sensor networks. Then, smartphones use a 3G cellular link or 802.11 to upload their collected data to a centralized server. EEMSS [23] and Jigsaw [24] are two sensing platforms that focus on energy efficiency through the optimized use of the smartphones' sensors. These proposals, however, consider neither the problem of sensor selection nor collaboration among smartphones.
PRISM [25] is based on a client-server architecture where a PRISM server accepts sensing jobs from application servers; then, these jobs are deployed by pushing an application into smartphones that comply with a set of predicates. The PRISM runtime platform implements a software sandbox where the sensing application is executed. This sandbox also provides functionality for resource metering, for preventing applications from retaining sensed data, and for allowing users to establish policies on the type of applications that they are willing to run on their phones. Once data is collected, it is transmitted back to the PRISM server through a wireless WAN link. AnonySense [26] also uses a client-server architecture where users of smartphones can volunteer to accept sensing tasks and send back anonymous reports. Tasks are accepted based on the acceptance condition defined by the task issuer and on the local policies of the phones. The main focus of AnonySense is to preserve privacy in opportunistic sensing environments. Medusa [18] provides a high-level language for developing crowd-sensing tasks, which are specified as a sequence of steps that are executed by the Medusa runtime system, which is structured as a set of services that run on the Cloud and on the phones. These services are in charge of coordinating the execution of the sensing task between the smartphones and a cluster on the Cloud. InCense [19] is a general-purpose mobile phone sensing platform for deploying sensing campaigns through a visual programming paradigm. The architecture of InCense is based on a centralized context-aware server that coordinates smartphones in order to improve the quality of the sensed data and reduce energy consumption. None of PRISM, AnonySense, Medusa, or InCense supports direct peer-to-peer communication between smartphones. These four proposals are complementary to the one presented here in the sense that they tackle related but orthogonal aspects of mobile opportunistic sensing.
COUPON [20] is a cooperative sensing and data forwarding framework that incorporates a cooperative sensing scheme and two store-carry-and-forward forwarding schemes with data fusion. In COUPON, the area of interest is divided into grid cells whose size is defined by the application requirements. Time is also divided into time slots that are further divided into sampling periods. Cooperative sensing consists of nodes reporting to their neighbors their coverage tables, which contain two-tuples consisting of grid cell identifiers and the time that the grid cell has been covered during a time period. Using this information, nodes may decide not to sense and transmit redundant information regarding a given cell in the current time period. The authors implicitly assume that nodes are homogeneous and capable of sensing the required variable with the appropriate resolution at any time. Lastly, in [21] the authors propose a context-aware sensing data dissemination framework where smartphones are used either as sensors or as data mules that opportunistically collect data from stationary sensors through short-range communications.

Semantic Fog for Opportunistic Sensing
Figure 1 depicts the proposed Cloud/Fog-based opportunistic sensing architecture that aims at integrating the storage and processing power of the Cloud with the sensing capabilities of any networked device that happens to be located in a region that is of interest for a sensing application. In the proposed architecture, sensing applications running either on a mobile device (e.g., a personal health-care application) or on the Cloud (e.g., a large-scale public-health sensing application) can issue sensing tasks to Fog environments requesting to monitor a given set of variables, during a specified period of time and in a region of interest. As a response to such a request, the devices in the Fog environment will organize themselves to create a Foglet composed of interconnected devices that have the capabilities to fulfill the sensing task by delivering the requested data to a designated Fog data sink. This is done at minimum network cost in terms of congestion and contention. Under this paradigm, the sensing Foglets are in charge of providing the sensing capabilities needed to implement personal, community, and large-scale sensing applications that run either on a mobile device or on the Cloud. It is important to mention that, even though our platform is specifically designed to instantiate Foglets for sensing, the same architecture can be used to provide other types of Fog services.
Figure 1 shows two examples of the way in which the proposed opportunistic sensing Fog environments operate. In the first example, Foglet A (composed of a smartphone and a personal computer) is instantiated as a response to a sensing task issued from the Cloud requesting to measure the noise level at the local environment. In the example, the smartphone is selected because it is equipped with an adequate sensor (e.g., a microphone), has the right context (e.g., it has enough energy in its battery, is not stored, and is located in the region of interest), and is located just one hop away from the designated Fog sink (the same personal computer that received the request). In the second example, Foglet B is instantiated as a response to a sensing task issued by an application running on a mobile device located in the local environment. This sensing task requests three devices with video cameras to record a video for a designated time period. The task of this example also designates a local Fog storage service as a sink so that the mobile device can move out of the environment without interrupting the data collection process.
For the sake of completeness, Figure 1 also shows two other Cloud services, which are relevant to opportunistic sensing platforms, namely, a storage service and a data analytics service.
In the following subsections, we present the system model and formally formulate the problem of instantiating an effective and efficient semantic-driven sensing Foglet.

System Model.
We use a time-varying graph $G(t) = (V(t), E(t))$ to model the topology of a dynamic network composed of a time-varying set $V(t)$ of heterogeneous nodes (e.g., desktop computers, mobile and IoT devices, and sensor nodes) that at time $t$ are located in a given Fog environment. Nodes in the Fog environment interact with each other by means of wireless links that are modeled by edges $(u, v) \in E(t)$. Two nodes $u, v \in V(t)$ are connected by a link $(u, v) \in E(t)$ at time $t$ if and only if $d(p_t(u), p_t(v)) \le r$, where $r$ is the radio transmission range, $p_t : V(t) \to \mathbb{R} \times \mathbb{R}$ is a function that assigns to each node $u \in V(t)$ a position in the $(x, y)$-plane at time $t$, and $d$ is the Euclidean distance between two points. In practice, individual nodes know only their own instantaneous position, usually through GPS or other positioning services.
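The link rule above can be illustrated with a minimal Python sketch. It assumes a single snapshot of the network at one time instant; the function name `build_links`, the node names, and the coordinates are hypothetical and serve only as an illustration of the distance test, not as part of the proposed platform.

```python
import math
from itertools import combinations


def build_links(positions, radio_range):
    """Return the undirected links among nodes that are within radio range.

    positions: dict mapping a node id to its (x, y) position at one instant.
    radio_range: transmission range, in the same units as the coordinates.
    """
    links = set()
    for u, v in combinations(sorted(positions), 2):
        # Two nodes are linked iff their Euclidean distance is at most the range.
        if math.dist(positions[u], positions[v]) <= radio_range:
            links.add((u, v))
    return links


# Hypothetical snapshot of three devices in a Fog environment.
snapshot = {"phone": (0.0, 0.0), "pc": (3.0, 4.0), "camera": (10.0, 0.0)}
links = build_links(snapshot, radio_range=5.5)
```

Because the node set and the positions change over time, a simulation would recompute the links for every snapshot.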
Since nodes are heterogeneous, they can be equipped with different types of sensors that provide them with a specific set of capabilities. Such sensors are described by means of a sensor ontology $\mathcal{O}_S = (C_S, R_S)$ that contains concepts $c \in C_S$ describing the different types of sensors and a set of unidirectional relations $(a, b) \in R_S$ between concepts $a, b \in C_S$. For this work, we assume that the relations in $R_S$, as well as the relations in the other ontologies, are is-a relations and that the ontologies are organized as hierarchies. Figure 2 presents a small portion of the sensor ontology $\mathcal{O}_S = (C_S, R_S)$ that shows the branch that describes the acoustic sensors. In the figure, the relation (Microphone, Acoustic sensor) indicates that a Microphone is a type of Acoustic sensor and, for instance, that a node equipped with a microphone is adequate to fulfill a request for an Acoustic sensor.
The particular sensors of a given device (also referred to as a node) $u \in V(t)$ are specified by a function $S : V(t) \to P(C_S)$, where $P(C_S)$ denotes the power set of the concepts $C_S$ in the ontology $\mathcal{O}_S$. This means that a node can have any subset (e.g., a Pressure sensor and an Acoustic sensor) of the sensor types described in the ontology $\mathcal{O}_S = (C_S, R_S)$. Please note that in practice, the information of function $S$ is distributed among the nodes in the environment, namely, each node only knows the information regarding its local sensors.
Similarly, a function $\kappa_t : V(t) \to \mathbb{R}^+ \times C_A \times \{T, F\}$ defines the individual instantaneous node context in terms of a 3-tuple $(e, a, s) \in \mathbb{R}^+ \times C_A \times \{T, F\}$, where $e \in \mathbb{R}^+$ is the remaining node energy. For the case of personal devices, $a \in C_A$ is the type of the application currently running on the foreground and $s \in \{T, F\}$ indicates whether the device is stored (e.g., in a pocket or a bag). As in the case of the types of sensors, an application ontology $\mathcal{O}_A = (C_A, R_A)$ is used to describe the different types of applications that can be executed by the devices. For instance, for a smartphone $u$ that at time $t$ is stored in a bag, has 50% of remaining battery charge, and is not currently running an application on the foreground, we would have $\kappa_t(u) = (0.5, -, T)$.
A sensing task is implemented by means of a set of sensing requests that are issued to the Fog by one or more devices. A sensing request issued by node $u$ at time $t$ is a 5-tuple of the form $\langle \mathit{id}_r, s_r, Q, k, \mathit{sink} \rangle$, where $\mathit{id}_r$ is the identifier of the request, $s_r \in C_S$ is the requested type of sensor, $Q = \langle p, r_q, h_{\max}, \phi \rangle$ is a 4-tuple that defines the restrictions that a node needs to fulfill in order to be considered a feasible data source, $k$ is the maximum number of devices that will generate data, and $\mathit{sink}$ is the identifier of the Fog data sink. The restrictions $Q$ include the geographic area of interest of the sensing request (a disk of radius $r_q$ with center at $p = p_t(u)$), the maximum number of hops ($h_{\max}$) that the request will be propagated, and a context predicate ($\phi = (e, a, s) \in \mathbb{R}^+ \times C_A \times \{T, F\}$) that defines the required device context, which includes the acceptable remaining energy level $e$, the type $a$ of application running on the foreground of the sensing device, and whether the device can be stored. When $k > 1$, the requesting node will be in charge of coordinating a collaborative sensing schedule where the selected nodes will monitor the required variables at intervals defined by a sensing schedule.
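As an illustration, the request tuple and its restrictions can be sketched as plain Python records. All class and field names below are hypothetical. The feasibility check covers only the region and context restrictions (the hop limit depends on the network topology), and it matches the application type by simple equality, which is a simplification made for this sketch.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ContextPredicate:
    min_energy: float         # acceptable remaining energy, as a fraction in [0, 1]
    app_type: Optional[str]   # required foreground application type; None = any
    stored: Optional[bool]    # required stored flag; None = irrelevant


@dataclass(frozen=True)
class Restrictions:
    center: Tuple[float, float]   # center of the area of interest
    radius: float                 # radius of the area of interest
    max_hops: int                 # maximum number of hops for the request
    context: ContextPredicate


@dataclass(frozen=True)
class SensingRequest:
    request_id: str
    sensor_type: str
    restrictions: Restrictions
    max_devices: int
    sink_id: str


def is_feasible(request, position, energy, app_type, stored):
    """Check the region and context restrictions for one candidate device."""
    res = request.restrictions
    ctx = res.context
    in_region = math.dist(position, res.center) <= res.radius
    enough_energy = energy >= ctx.min_energy
    app_ok = ctx.app_type is None or ctx.app_type == app_type
    stored_ok = ctx.stored is None or ctx.stored == stored
    return in_region and enough_energy and app_ok and stored_ok


# Hypothetical request: up to two microphones, not stored, at least 30% energy.
request = SensingRequest(
    request_id="r1",
    sensor_type="Microphone",
    restrictions=Restrictions((0.0, 0.0), 10.0, 3, ContextPredicate(0.3, None, False)),
    max_devices=2,
    sink_id="pc",
)
```

With this request, a phone at position (1, 1) with 50% battery that is stored in a bag is rejected, whereas the same phone lying on a table is accepted.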

Semantic Distance.
A semantic distance function between two concepts of an ontology $\mathcal{O} = (C, R)$, denoted by $d_{\mathcal{O}}$, is a function $d_{\mathcal{O}} : C \times C \to \mathbb{R}^+$ that assigns a positive real value to any pair of concepts depending on how taxonomically close the two concepts are in the ontology [27] (here we have dropped the subindex $S$ of $\mathcal{O}_S = (C_S, R_S)$ because we are not referring to a particular ontology). We use such a function to assess how appropriate a given piece of sensing hardware is to fulfill a sensing request. In particular, if the value of the semantic distance between the requested type of sensor and the sensing hardware installed on the device is small, we say that the device is adequate to fulfill the sensing request. From the example of Figure 2, a node equipped with either a condenser microphone or a carbon microphone is adequate to fulfill a request for a Microphone because both concepts are more concrete instances of the more abstract concept Microphone.
The definition of semantic distance between two concepts can be extended in a number of ways to obtain a semantic distance $d_{\mathcal{O}} : C \times P(C) \to \mathbb{R}^+$ between a concept $c \in C$ and a set of concepts $C' \subseteq C$. In this work, we propose (1), which simply returns the smallest semantic distance between concept $c$ and any of the concepts $c' \in C'$. The idea is that a device is adequate to fulfill a request if the type of any of its hardware components is semantically close to the requested type of sensor. To exemplify this, assume that a node $u$ is equipped with the set of sensors $S(u) = \{\text{Pressure sensor}, \text{Acoustic sensor}\}$ and that a request for a Pressure sensor arrives at the Fog environment. Then, since $d_{\mathcal{O}}(\text{Pressure sensor}, \{\text{Pressure sensor}, \text{Acoustic sensor}\}) = 0$, we can say that node $u$ is adequate to fulfill the request.
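The displayed form of (1) is not reproduced in this text. A reconstruction that is consistent with the description above, using the assumed symbols $d_{\mathcal{O}}$ for the semantic distance, $c$ for the requested concept, and $C'$ for the set of concepts installed on the device, would be:

```latex
d_{\mathcal{O}}(c, C') = \min_{c' \in C'} d_{\mathcal{O}}(c, c') \tag{1}
```

In the Pressure sensor example, the minimum is attained by the Pressure sensor installed on the node, which yields a distance of 0.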
Precise specifications of the way in which the proposed semantic distance function is computed are presented in Sections 3.5 and 3.6. Section 3.5 presents the formulation of the semantic distance function when the ontology is codified as a graph, and Section 3.6 presents an equivalent formulation that uses prefix-based labels as parameters.
(3) $\forall v \in D$ there is a path $u, w_1, \ldots, w_m, v$ of length $l \le h_{\max}$ in $G(t)$.
(5) Nodes in $D$ are those equipped with the most adequate sensing hardware, or more specifically, the nodes that minimize the semantic distance between the requested sensor type and their installed sensors.
Here, as defined in Section 3.1, $\kappa_t(v).e$ denotes the remaining energy in the battery of node $v$ at time $t$, $\kappa_t(v).a$ is the application that node $v$ is running on the foreground (if any) at time $t$, and $\kappa_t(v).s$ indicates whether node $v$ is stored at time $t$. Similarly, $\phi$ is a predicate that defines the required device context, which includes the acceptable remaining energy level $e$, the type $a$ of application running on the foreground, and whether the device can be stored ($s$) or whether the latter is irrelevant for the request (i.e., $s = *$).
This way, a solution to the problem of instantiating a sensing Foglet is a set of nodes $D$ currently in the environment, located in the area of interest (Condition 1), with the adequate context (Condition 2), and equipped with the most suitable sensing hardware (Condition 5), together with a set of devices that connect the requesting node $u$ with all the nodes in $D$ through paths in $G(t)$ of length at most $h_{\max}$ (Condition 3). Other criteria can be used to compute the paths connecting $u$ with the nodes in $D$, for instance, paths of Maximum-Minimum residual energy, paths composed of nodes not currently in use by their human users, or paths composed of nodes that have not recently participated in a sensing task.
Please note that the problem formulated in Definition 1 is a search problem where a subset $D \subseteq V(t)$ of feasible nodes is selected to perform a sensing task. Feasibility is defined by Conditions 1-4, whereas Condition 5 establishes that the best devices to fulfill the request are those equipped with the most adequate sensors for the request, namely, those with the smallest semantic distance between the requested sensor $s_r$ and themselves.

Semantic-Driven Foglet Formation.
The proposed semantic-driven distributed framework for the instantiation of sensing Foglets is composed of a distributed semantic-driven sensor selection algorithm, a sensor ontology, an application ontology, a semantic distance function that can be computed without accessing the whole ontology, a distributed protocol for implementing collaborative sensing plans, and an interest-driven [28,29] routing algorithm.
During the first phase of the sensor selection algorithm, a Foglet Request is disseminated across the Fog environment, establishing a breadth-first search tree rooted at the requesting device. The request is disseminated only among devices located inside the region of interest defined by the request and up to the maximum number of hops ($h_{\max}$). At the end of this phase, every node in the tree knows whether it is a feasible device, namely, whether it complies with the restrictions defined in the request.
During a second phase of the algorithm, starting at the leaves and up to the root, nodes inform their parents in the tree of the best feasible devices they have seen.A device is considered better than another device if the semantic distance between the requested type of sensor and any of the sensors in the device is smaller, or if the semantic distance is the same but the path connecting the root with the device is better in terms of hop length.This way, the node issuing the request receives only the information regarding the set of best feasible devices available in the Fog environment.With this information, the requesting node can determine if the selected devices are in fact adequate to implement an effective sensing Foglet.
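The two phases can be illustrated with a centralized Python sketch that simulates the outcome of the distributed procedure on a known topology. It is not the distributed protocol itself: the function name `select_devices` and its parameters are hypothetical, and the adjacency map is assumed to contain only devices located inside the region of interest.

```python
import heapq
from collections import deque


def select_devices(adjacency, root, max_hops, k, distance, feasible):
    """Simulate the two-phase selection and return the k best feasible devices.

    adjacency: dict mapping a node id to its one-hop neighbors.
    distance: dict mapping a node id to its semantic distance to the request.
    feasible: set of node ids that comply with the request restrictions.
    """
    # Phase 1: breadth-first dissemination, limited to max_hops from the root.
    parent, hops, order = {root: None}, {root: 0}, [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if hops[u] == max_hops:
            continue
        for v in adjacency[u]:
            if v not in parent:
                parent[v], hops[v] = u, hops[u] + 1
                order.append(v)
                queue.append(v)

    # Phase 2: from the leaves up to the root, each node forwards only the
    # k best candidates it knows, ranked by (semantic distance, hop count).
    best = {u: [] for u in order}
    for u in reversed(order):
        if u in feasible:
            best[u].append((distance[u], hops[u], u))
        best[u] = heapq.nsmallest(k, best[u])
        if parent[u] is not None:
            best[parent[u]].extend(best[u])
    return [node for _, _, node in best[root]]


# Hypothetical environment: device "d" lies beyond the two-hop limit.
adjacency = {"sink": ["a", "b"], "a": ["sink", "c"], "b": ["sink"],
             "c": ["a", "d"], "d": ["c"]}
distance = {"sink": 3, "a": 1, "b": 0, "c": 0, "d": 0}
selected = select_devices(adjacency, "sink", max_hops=2, k=2,
                          distance=distance, feasible={"a", "b", "c", "d"})
```

In this example, devices b and c tie on semantic distance, and b is ranked first because it is one hop closer to the root. Trimming each list to k entries before forwarding it is what keeps the information received by the requesting node small.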
Figures 3(a)-3(d) illustrate the previous idea. Figure 3(a) shows a Fog environment composed of a heterogeneous set of devices that are connected through wireless links. In the figure, a sensing task is received by a Fog server from the Cloud. As a response, the Fog server issues a series of Foglet requests that are disseminated inside the region of interest. If more than one device is selected to generate sensory data, the requesting node will coordinate the selected devices to implement a collaborative sensing plan where the selected devices and the nodes connecting them share the computing and communication load. For instance, the requesting node can compute and deliver sensing schedules to the set of selected devices so that they can turn off their radios or sensing hardware to preserve energy and bandwidth. Alternatively, the sink can also request sensory information from all the selected devices to improve the quality of the information through the redundant sources. Figures 3(b) and 3(d) show the BFS trees established over the nodes located inside a circular region of interest of radius $r_q$.
Figure 3(c) illustrates the case where a sensing request is issued by an application running on a mobile device.The request in this example designates a Fog server as the data sink (Figure 3(d)) so that data can be collected even if the mobile device is turned off or if it moves out of the Fog environment.Once the best sensors have been identified, interest-driven routing [28,29] is used to transport the sensory data to the designated sink (Figure 3(d)).
The following sections present detailed specifications of the proposed algorithms and protocols.

Asymmetric Semantic Distance.
We propose an asymmetric semantic distance function $d_{\mathcal{O}} : C \times C \to \mathbb{R}^+$ defined over the concepts of an ontology $\mathcal{O} = (C, R)$ that can be efficiently computed, even without the need to access the whole ontology. The objective of this semantic distance function is to serve as the key metric to identify the best available sensor to fulfill a sensing request and, for the case of personal devices, to determine if an adequate application is running on the foreground. Equation (3) shows the definition of the proposed $d_{\mathcal{O}}$, where we can observe that, given a pair of concepts $a, b \in C$, the function $d_{\mathcal{O}}(a, b)$ equals 0 if the requested concept $a$ is a hypernym (an ancestor) of concept $b$. This is because $b$ is a more specific instance of $a$ and hence, it can be used to fulfill the request. Otherwise, the value of $d_{\mathcal{O}}(a, b)$ is computed as the length of the path from $a$ to the first common ancestor of $a$ and $b$ in the ontology, which is denoted by $\mathit{path}(a, b)$. This semantic distance function is similar to the one proposed by Rada et al. [30] in the sense that both functions are based on the hop distance between concepts in the ontology. The proposed distance, however, is an asymmetric version aimed at evaluating to what extent an instance of concept $b$ can be used as an instance of concept $a$.
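A minimal Python sketch of this distance over an ontology stored as a child-to-parent map is shown next. It assumes that the path is measured from the requested concept up to the first common ancestor, and that a concept fulfills a request for itself at distance 0. The hierarchy used in the example, including the root concept Sensor, is an assumed fragment in the spirit of Figure 2.

```python
def ancestors(parent, concept):
    """Return the chain [concept, its parent, ..., root] of an is-a hierarchy."""
    chain = [concept]
    while parent.get(chain[-1]) is not None:
        chain.append(parent[chain[-1]])
    return chain


def semantic_distance(parent, requested, available):
    """Asymmetric distance from the requested concept to an available one.

    Returns 0 when `requested` is `available` or one of its ancestors;
    otherwise the number of is-a steps from `requested` up to the first
    ancestor that it shares with `available`.
    """
    available_chain = set(ancestors(parent, available))
    for steps, concept in enumerate(ancestors(parent, requested)):
        if concept in available_chain:
            return steps
    raise ValueError("the two concepts do not share a root")


# Assumed fragment of a sensor ontology (child -> parent).
ontology = {
    "Sensor": None,
    "Acoustic sensor": "Sensor",
    "Pressure sensor": "Sensor",
    "Microphone": "Acoustic sensor",
    "Condenser microphone": "Microphone",
    "Carbon microphone": "Microphone",
}
```

Under these assumptions, a request for a Microphone is fulfilled at distance 0 by a Carbon microphone, whereas a request for a Carbon microphone served by a generic Microphone has distance 1, which shows the asymmetry.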

Asymmetric Semantic Distance over Prefix-Based Labels.
In order to make the computation of (3) more efficient, every concept in $\mathcal{O}$ is mapped onto a prefix-based label that is derived from a finite alphabet $\Sigma$. A prefix label $l_c$ for concept $c \in C$ is a word in $\Sigma^*$ such that $l_c = l_p \oplus \sigma$, where $l_p$ is the prefix label of the parent concept of $c$ in $\mathcal{O}$, $\oplus$ is the concatenation operator, and $\sigma \in \Sigma$ is a suffix assigned to $c$ that is different from the suffixes assigned to its sibling concepts. The root concept of $\mathcal{O}$ is assigned the label $l_0$, which can be equal to any symbol $\sigma \in \Sigma$. We use the function $\Lambda : C \to \Sigma^*$ to denote such a labeling. Please note that we can assign prefix labels to all the concepts in linear time by performing a level-order traversal of the ontology. Now, we can reformulate (3) in terms of the prefix labels. We use $|l|$ to denote the length in symbols of label $l$ and $l_1 \oslash l_2$ to denote the label composed of the largest common prefix between $l_1$ and $l_2$. The new formulation is shown in (4). Please note that (4) can be easily implemented using simple string operations.
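The displayed form of (4) is not reproduced in this text. A reconstruction that is consistent with the definitions above, using the assumed symbols $l_a$ for the label of the requested concept and $l_b$ for the label of the available concept, would be:

```latex
d_{\Lambda}(l_a, l_b) =
\begin{cases}
0 & \text{if } l_a \oslash l_b = l_a,\\
|l_a| - |l_a \oslash l_b| & \text{otherwise.}
\end{cases} \tag{4}
```

The first case holds exactly when $l_a$ is a prefix of $l_b$, that is, when the requested concept is an ancestor of the available one. In the second case, each symbol of $l_a$ beyond the common prefix corresponds to one is-a step from the requested concept up to the first common ancestor.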
As we have already mentioned, the main advantage of this formulation is that devices do not need to access the whole ontology in order to compute the semantic distance between concepts. They only need to perform simple string operations over the corresponding prefix-based labels. Moreover, devices can use (4) to correctly compute the semantic distance between the labels of their sensors and a label of a concept that was added to the ontology to accommodate a new type of sensor. This discussion also applies to the applications running on the devices.
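Similar to the case of (1), we can define a semantic distance function $d_{\Lambda(\mathcal{O})} : \Lambda(\mathcal{O}) \times P(\Lambda(\mathcal{O})) \to \mathbb{N}^+$ between a prefix label $l_c \in \Lambda(\mathcal{O})$ and a set of prefix labels $L \in P(\Lambda(\mathcal{O}))$, where $P(\Lambda(\mathcal{O}))$ denotes the power set of the prefix labels in the labeling $\Lambda$. This is shown in (5).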
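The label-based computation can be sketched in Python with plain string operations. The sketch assumes one alphabet symbol per ontology level, reuses the same assumed ontology fragment as an example, and uses the hypothetical alphabet "0abc"; the function names are illustrative only.

```python
from collections import deque


def assign_labels(children, root, alphabet):
    """Assign prefix labels with a level-order traversal of the hierarchy."""
    labels = {root: alphabet[0]}
    queue = deque([root])
    while queue:
        concept = queue.popleft()
        kids = children.get(concept, [])
        if len(kids) > len(alphabet):
            raise ValueError("alphabet too small for the number of siblings")
        for symbol, child in zip(alphabet, kids):
            # A child label is the parent label plus a suffix unique among siblings.
            labels[child] = labels[concept] + symbol
            queue.append(child)
    return labels


def common_prefix(a, b):
    """Return the largest common prefix of two labels."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def label_distance(requested, available):
    """Distance between two labels: 0 if `requested` is a prefix of `available`."""
    return len(requested) - len(common_prefix(requested, available))


def set_distance(requested, available_labels):
    """Smallest distance between a requested label and a set of labels."""
    return min(label_distance(requested, label) for label in available_labels)


# Assumed ontology fragment (parent -> children) and hypothetical alphabet.
children = {
    "Sensor": ["Acoustic sensor", "Pressure sensor"],
    "Acoustic sensor": ["Microphone"],
    "Microphone": ["Condenser microphone", "Carbon microphone"],
}
labels = assign_labels(children, "Sensor", "0abc")
```

A device only needs the labels of its own sensors and the label carried by the request; the ontology itself is consulted once, when the labels are assigned.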
Figure 4 shows a small ontology and its corresponding prefix-based labeling derived from a four-symbol alphabet $\Sigma$ that includes the symbol 0. Please note that for any ontology $\mathcal{O} = (C, R)$ with a tree topology, the length $|l_c|$ of the prefix-based label of any $c \in C$ is in $O(h)$, where $h$ is the height of the ontology tree. Therefore, the space needed to store a prefix-based label and the number of operations needed to compute Equation (4) are also in $O(h)$.

Semantic-Driven Sensor Selection Algorithm.
As already mentioned, a sensing task is implemented by means of a set of sensing requests that are issued to the environment by one or more devices. A sensing request is a 5-tuple of the form ⟨  ,   , , , ⟩, where   is the identifier of the request,   ∈   is the requested sensor type encoded as a prefix label,  = ⟨,   ,   , ⟩ defines the restrictions that nodes need to fulfill in order to be considered feasible data sources,  is the maximum number of devices that will generate sensory data, and  is the identifier of the sink node.
As shown in Algorithm 1, when a node receives a sensing request from the upper layers, it creates a new element in the set of Foglets (, line (4)) that contains the fields and data structures that are used during the process of establishing and maintaining the sensing Foglet. All this information is soft state. An element of the Foglet set contains (i) a unique Foglet identifier (  ), (ii) the identifier of the root node of the tree (), (iii) the identifier of the sink (), (iv) the identifier of the parent of the current node in the tree (), which in the case of the node issuing the request is its own node identifier, (v) a monotonically increasing sequence number () that is further used to maintain the routes from nodes to the root node, (vi) the requested sensor type (  ), (vii) the maximum number of devices that will generate sensory data (), (viii) the set of restrictions (), (ix) a subset of one-hop neighbors that are known to be children of the current node in the tree (ℎ), (x) a subset of one-hop neighbors that are known not to be children of the current node in the tree (ℎ), (xi) the set of current one-hop neighbors (  ), (xii) a set containing the replies () received from the children nodes in the tree, (xiii) a flag that indicates if the node has already sent a Foglet Reply back to its parent (), and (xiv) a routing table () that contains pairs ⟨, ⟩, which are used to route packets from the root to the selected nodes.
A reply  ∈  is a 3-tuple that contains the identity of the node () that originally issued the reply, the semantic distance () between the requested sensor type and the sensors in the node, and the hop distance (ℎ) from the root to that particular node.
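The soft state listed above can be pictured as a handful of record types. The sketch below is only an illustration: every field and function name is hypothetical (the paper's own symbols are not reproduced), the restrictions are kept as an opaque dictionary, the routing table is assumed to map a destination to a next hop, and reusing the request identifier as the Foglet identifier is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SensingRequest:
    request_id: int
    sensor_type: str      # requested sensor type, as a prefix label
    restrictions: dict    # opaque here; the paper defines a 4-field tuple
    max_devices: int
    sink: int


@dataclass(frozen=True)
class Reply:
    node_id: int               # node that originally issued the reply
    semantic_distance: float   # infinity marks "no feasible sensor"
    hop_distance: int          # hops from the root to that node


@dataclass
class FogletState:
    foglet_id: int
    root: int
    sink: int
    parent: int
    seq_num: int
    sensor_type: str
    max_devices: int
    restrictions: dict
    children: set = field(default_factory=set)
    non_children: set = field(default_factory=set)
    neighbors: set = field(default_factory=set)
    replies: list = field(default_factory=list)
    reply_sent: bool = False
    routes: dict = field(default_factory=dict)  # assumed: destination -> next hop


def state_for_root(node_id, seq_num, request, neighbors):
    """State created by the node that receives the request from upper layers."""
    return FogletState(
        foglet_id=request.request_id,  # assumption: identifiers coincide
        root=node_id,
        sink=request.sink,
        parent=node_id,                # the issuing node is its own parent
        seq_num=seq_num,
        sensor_type=request.sensor_type,
        max_devices=request.max_devices,
        restrictions=request.restrictions,
        neighbors=set(neighbors),
    )


req = SensingRequest(request_id=1, sensor_type="0aab", restrictions={}, max_devices=3, sink=9)
state = state_for_root(node_id=7, seq_num=0, request=req, neighbors=[2, 4])
```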
Once the local state has been created, the root node transmits to its neighbors a Foglet Request which is an 8-tuple of the form  = ⟨  , , , ℎ,   , , , ⟩, where   is the Foglet identifier,  is the identifier of the root node,  is the root's sequence number, ℎ is the hop distance that the  has traversed so far,   is the requested sensor type,  is the set of restrictions,  is the maximum number of devices that will be used to generate sensory data, and  is the identifier of the sink. Note that the pair ⟨  , ⟩ uniquely identifies the Foglet that will be instantiated in response to the sensing request. Upon reception of a Foglet Request from neighbor , node  first checks if it is the first time it has received a request with the same ⟨  , ⟩ pair (line (8)). If so, it creates a new element in the set of Foglets and checks whether or not it has to relay the  message by verifying that it is inside the region of interest (by calling function (), line (11)) and that the  has not reached the maximum number of hops (.  ). If node  has already received another Foglet Request with the same ⟨  , ⟩ pair, it checks if it is a leaf of the tree by comparing its set of nonchild neighbors (ℎ) against its one-hop neighborhood at the time it received the  for the first time (line (30)).
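The first-reception checks described above can be sketched as a small decision function. This is a simplified reading and not Algorithm 1 itself: the dictionary keys, the action names, the circular region test, and the convention that `hops` is the hop distance of the receiving node are assumptions, and leaf detection from duplicate requests is left out.

```python
import math


def inside_region(position, center, radius):
    """True if position lies inside the circular region of interest."""
    return math.dist(position, center) <= radius


def handle_request(seen, req, position):
    """Decide what a node does with an incoming Foglet Request.

    seen: set of (foglet_id, root) pairs already received by this node.
    req:  dict with hypothetical keys foglet_id, root, hops, max_hops,
          center, radius.
    """
    key = (req["foglet_id"], req["root"])
    if key in seen:
        return "duplicate"          # handled by the leaf-detection logic
    seen.add(key)
    if not inside_region(position, req["center"], req["radius"]):
        return "reply_infeasible"   # reply with an infinite semantic distance
    if req["hops"] >= req["max_hops"]:
        return "reply_as_leaf"      # hop limit reached: do not relay further
    return "relay"


seen = set()
req = {"foglet_id": 1, "root": 7, "hops": 2, "max_hops": 4,
       "center": (0.0, 0.0), "radius": 500.0}
first = handle_request(seen, req, position=(300.0, 400.0))
second = handle_request(seen, req, position=(300.0, 400.0))
```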
Nodes know their one-hop neighborhood, denoted (), by periodically exchanging hello packets that contain the identity of the nodes.
In order to ensure termination in the presence of packet loss due to either channel effects or topological changes, nodes start a timer (line (13)) after they have relayed a Foglet Request. The value of the timer is proportional to the difference between .  and the node's hop distance to the root. When this timer expires, and if the node has not already done so, the node sends its Foglet Reply back to its parent. This way, in the worst case, the tree will be contracted from leaves to root in a time proportional to .  .
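A minimal numeric sketch of this timer rule follows. The proportionality constant (called `tau` here), the one-hop delay `delta`, and the function name are assumptions used only to illustrate why nodes farther from the root time out first.

```python
def contraction_timeout(tau, max_hops, hop_distance):
    """Timer value proportional to the remaining hop budget of a node."""
    return tau * (max_hops - hop_distance)


tau, max_hops, delta = 0.5, 4, 0.1   # hypothetical values, in seconds

# A parent at hop distance 1 starts its timer at t0; its child at hop
# distance 2 starts its own timer one transmission delay later.
t0 = 0.0
parent_expiry = t0 + contraction_timeout(tau, max_hops, 1)
child_expiry = t0 + delta + contraction_timeout(tau, max_hops, 2)

# Worst-case time for the whole tree to contract, as seen from the root.
root_timeout = contraction_timeout(tau, max_hops, 0)
```

With these values the child expires before the parent because the one-hop delay is smaller than `tau`, which is the situation the termination argument relies on.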
The contraction phase of the sensor selection algorithm starts when a  either reaches its maximum number of hops or when it is received by a node located outside of the region of interest. In the first case, the node considers itself a leaf of the tree and sends a Foglet Reply message back to its parent containing its own reply (line (19) of Algorithm 1). When the node does not fulfill the context restrictions defined in the Foglet Request (V(.)= ), it sets the value of the semantic distance to ∞ to indicate that it does not have a feasible sensor (line (22)). If the node is located outside of the region of interest ((.,.  ,   ()) = ), it also sends a Foglet Reply containing a value of semantic distance equal to ∞ (line (22)). A node can also detect that it is a leaf if it receives  from all of its current neighbors indicating other nodes as their parents (lines (27) and (30)).
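The reply rules above, together with the bound on the number of selected devices, suggest the following sketch of what a node might do during contraction. It is an illustration under stated assumptions rather than Algorithm 2: the names are hypothetical, infeasible replies are simply dropped when merging, and ties in semantic distance are broken by hop distance.

```python
import heapq
import math
from typing import NamedTuple


class Reply(NamedTuple):
    node_id: int
    semantic_distance: float
    hop_distance: int


def local_reply(node_id, hop_distance, in_region, context_ok, distance):
    """Build the node's own reply; infinity marks a node that is not feasible."""
    if not in_region or not context_ok:
        distance = math.inf
    return Reply(node_id, distance, hop_distance)


def merge_replies(own, from_children, k):
    """Keep at most k feasible replies, best semantic distance first."""
    feasible = [r for r in [own, *from_children]
                if r.semantic_distance != math.inf]
    return heapq.nsmallest(
        k, feasible, key=lambda r: (r.semantic_distance, r.hop_distance))


own = local_reply(node_id=5, hop_distance=2, in_region=True, context_ok=True, distance=1)
children = [
    Reply(8, 0, 3),
    Reply(9, math.inf, 3),   # child outside the region or with a bad context
    Reply(11, 2, 4),
]
best = merge_replies(own, children, k=2)
```

Because each node forwards at most `k` entries, the size of a Foglet Reply stays bounded regardless of the size of the subtree below the node.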
As shown in Algorithm 2, when a node  receives a Foglet Reply  = ⟨  , , ⟩ from one of its children (node ), it stores the replies  contained in the message in the  data structure and adds 's identifier to the ℎ data structure. Then, for each of the replies contained in

(3) Neither  nor  is an ancestor of the other. We also have that Λ() ⊘ Λ() ≠ Λ(). By construction of the prefix-based labeling Λ, the difference in length between the label Λ() of concept  and the label of any of its ancestors is equal to its distance in the ontology; therefore, |Λ()| − |Λ() ⊘ Λ()| is equal to  ℎ (, ), because, by definition, Λ() ⊘ Λ() is the prefix label of the first common ancestor between  and .
For the following theorems and corollary, we assume that packets sent in unicast mode are reliably delivered to their intended destinations and that every node  ∈ () knows the composition of its one-hop neighborhood at time , which is denoted by   (). We will use  = (  ,   ) to denote the BFS-tree composed of active links at the end of the contraction process that was induced by the  pointers established during the dissemination of the Foglet Request ().

Theorem 3. When the algorithm described in Algorithms 1 and 2 terminates, the set  sent to upper layers is composed of a subset of the nodes of the BFS-tree  = (  ,   ) that comply with the five conditions of Definition 1.

Proof.
Condition 1. We have to show that ∀ ∈ , (...−   ().)² + (...−   ().)² ≤ (  )², where   () denotes the position of node  at the time it received the . From line (11) of Algorithm 1 we know that nodes located outside of the region of interest do not propagate the  and reply with a value of semantic distance equal to ∞ (lines (22) and (40) of Algorithm 1) to indicate that they are not feasible nodes. Therefore, no node  ∈  was located outside of the region of interest at the time it received the .
Condition 2. To show that the selected nodes have a feasible context, we have to show that ∀ ∈ ,   ().≥ ... ∧  Λ(O S ) (...,   ().)= 0 ∧ (  (). = ...  ∨ ...  = * ). Similar to the previous case, nodes call V(..) to check whether their context   () complies with the restrictions defined in .. at the time the  was received. If not, they reply with a value of semantic distance equal to ∞ to indicate that they are not feasible nodes.
Condition 3. We have to show that at the time the Foglet is established, ∀ ∈ , the length of the path , . . .,  connecting a node  to the root in the BFS-tree is less than or equal to ...  .Assume there is a path , . . .,  of length |, . . ., | > ...  .This is not possible because  packets are propagated at most ...  hops away from the root, and the paths connecting the root to the selected devices are reverse paths established using the parent pointers defined during the propagation of the .
Condition 4. We have to show that || is as large as possible but || ≤ . Assume || <  and that at the time the algorithm terminates, there is a node  that complies with Conditions 1, 2, and 3 in the BFS-tree but that  ∉ . Since the algorithm terminates and unicast messages are reliable, the branch of the BFS-tree containing  should have completed its contraction process with the transmission of a  message to the root node containing a set  of feasible nodes. Now, since at the root node || < , it should be the case that the number of feasible nodes in  is also smaller than . Otherwise, the extra nodes could be included in . On the other hand, the information regarding  should have been omitted, either by  itself or by one of its ancestors in the BFS-tree. This means that at some point during the contraction of the branch containing , a node decided not to include  in the reply. But this is only possible if that node had information about other  feasible nodes which are better than . From this point on, all the  messages in that branch would contain information of at least  feasible nodes, which is a contradiction.
Condition 5. We have to show that ∀ ∈ P(  ) such that every node in  complies with Conditions 1, 2, 3, and 4, we have that Assume there is a feasible node  ∈   but not in  at the root node, such that  Λ(O S ) (  , ℎ()) <  Λ(O S ) (  , ℎ()), for some  ∈ . Since the algorithm terminates and unicast messages are reliable, the branch of the BFS-tree containing  should have completed its contraction process with the transmission of a  message to the root node containing a set  of feasible nodes. Since  ∉  at the root, it should be the case that at some point during the contraction of the branch containing , a node decided not to include  in the reply. But this is only possible if that node had information about other  feasible nodes which are better than . This leads to a contradiction since from this point on, only better nodes can be added to the s sent towards the root.

Theorem 4. All the branches of the BFS-tree  = (  ,   ) are correctly contracted.
Proof. By contradiction, assume that there is at least one branch of the BFS-tree  = (  ,   ) that is not correctly contracted up to the root. Let node  ∈   be the first node that did not send a  message to its parent in  and let  ∈   be its parent node. Note that  ≠  since we are assuming the branch did not finish its contracting process. Now, by line (30) of Algorithm 1 and line (10) of Algorithm 2, node  sends a  message to  when   () ∩   () =  ℎ ∪  ℎ , where   () is the one-hop neighborhood of  at the time it received the first ,   () is the current one-hop neighborhood of ,  ℎ is a set of nodes which are known to be children nodes of  in , and  ℎ is a set of nodes which are known not to be children nodes of  in . Since  adds nodes to  ℎ when it receives a  message from them, the conditions of line (30) of Algorithm 1 and line (10) of Algorithm 2 state that a node contracts itself when it has collected  messages from all of its children nodes which are still within radio range. Since  messages are transmitted using a reliable protocol, they always reach their intended destinations (if within radio range); therefore, the only way in which the contraction condition at  never becomes true is when one or more  messages stating a parent different from  are not correctly received at . But, since  is not actually waiting for information from those nodes, it can send a  message when a timer expires as described by lines (21) to (31) of Algorithm 2.
Now, we have to argue that the contraction timer at  expires after the contraction timer at . Let   be the time at which  started its contraction timer and ℎ the value of that timer, with ℎ ≥ 1. Now, the time at which  started its own timer equals   +   and its corresponding value equals (ℎ − 1), where   is the maximum one-hop delay experienced by a message. Therefore, the expiration time of 's timer is   +   + (ℎ − 1) and this time can be earlier than 's expiration time only if   > . So, by setting the value of  sufficiently large, the contraction timer at  will expire after the time at which the contraction timer at  expires.

Corollary 5. The algorithm described in Algorithms 1 and 2 terminates.
Proof.This is a direct consequence of Theorem 4 because the algorithm terminates when all the branches of the BFS-tree complete their contraction process.

Experimental Results
In this section, we present detailed simulation results comparing the performance of the Sensing Fog against that of a traditional opportunistic sensing environment in which sensing nodes periodically post their location, instantaneous context, and sensing capabilities to a well-known local ambient server. In this centralized environment, when a sensing request arrives at a node, that particular node forwards the request to the ambient server, which replies with a list of devices that can fulfill the requirements specified in the sensing request. Lastly, the requesting node contacts the selected devices to instruct them to generate sensory data. In this sensing environment, all the communications are supported by the AODV [31] routing protocol. This implementation mimics the behavior of many of the centralized environments described in Section 2. The algorithms and protocols that implement both sensing environments were developed over the Network Simulator 3 (ns-3) [32], which provides realistic models of the whole protocol stack. In particular, ns-3 provides well-tuned versions of AODV and of the underlying MAC protocol and physical layer. The source code can be downloaded from https://github.com/blunan/Stratos-Distributed and https://github.com/blunan/Stratos-Centralized.
We use search success ratio, satisfied-request ratio, packet delivery ratio, average reply delay, and overhead as our performance metrics. The search success ratio is computed as the ratio between the number of times the sensing environments were able to locate the best set of devices (according to Definition 1) and the total number of sensing requests received by any node in the environment. The satisfied-request ratio is computed as the ratio between the number of times the sensing environments located a valid set of devices (according to conditions 1 to 4 of Definition 1) and the total number of sensing requests received by any node in the environment. The average reply delay is the average time between the arrival of a sensing request and the reception, at the corresponding sink node, of the first data packet containing sensory data. The overhead is computed as the total number of bytes transmitted by the nodes in the environment, divided by the number of bytes of sensory data received at the sink nodes. All the algorithms were tested with IEEE 802.11 DCF as the underlying MAC protocol. Nodes move according to the random waypoint mobility model and each simulation was run for fifty different seed values. The confidence level for all the results presented in this section is 95%. Table 1 lists the details of the simulation environment.
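The metric definitions above translate directly into a few ratios. The sketch below is illustrative only: the function names and sample numbers are made up, and the 95% interval uses a normal approximation (z = 1.96) over the per-seed results, which is an assumption because the text does not state how the intervals were computed.

```python
import math
import statistics


def ratio(successes, total_requests):
    """Search success or satisfied-request ratio for one simulation run."""
    return successes / total_requests if total_requests else 0.0


def overhead(total_bytes_transmitted, sensory_bytes_at_sinks):
    """Bytes transmitted by all nodes per byte of sensory data delivered."""
    return total_bytes_transmitted / sensory_bytes_at_sinks


def mean_ci95(samples):
    """Mean and half-width of a 95% interval (normal approximation)."""
    mean = statistics.mean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width


# Hypothetical per-seed overhead values; the paper uses fifty seeds per point.
per_seed = [overhead(30_000, 1_024), overhead(28_000, 1_024), overhead(31_000, 1_024)]
mean, half_width = mean_ci95(per_seed)
```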
The results are organized in two sets. In the first set (Sections 5.1-5.2) we evaluate the performance of the algorithms when the sensing requests require a single source of sensory data, whereas in Section 5.3 the sink nodes coordinate sensing plans that are implemented by a set of devices. In all the experiments, the value of the radius (  ) of the region of interest is selected uniformly at random from the interval [400, 600], the maximum number of hops (  ) that the request is propagated equals four, and nodes are equipped with a set of sensors which are selected uniformly at random from the set of sensors defined in the sensor ontology. Nodes receiving sensing requests are also selected uniformly at random from all the nodes in the environment and the length of the data packets generated by the selected sensors is 256 bytes. Both the sensor type and the context included in the sensing request are also selected uniformly at random from the sensor and application ontologies and from all the possible values for node contexts, respectively. For the centralized environment, a node is selected as the ambient server uniformly at random at the beginning of the simulation. The ambient server remains static during the whole simulation time. All these values were selected to model scenarios as described in Section 3.

Performance with Increasing Proportion of Mobile Devices.
In these experiments, we evaluate the performance of the sensing environments as the percentage of mobile nodes is increased from 25% to 100% of the nodes in the environment. Every node is assigned two different types of sensors, there are four concurrent sensing requests, and selected sensors generate data flows composed of 10 data packets. The purpose of this scenario is to evaluate the ability of the sensing environments to cope with an increasing number of topological changes.
From Figure 5(a) we can observe that the sensing Foglets clearly outperform the centralized environment in terms of both search success ratio and satisfied-request ratio. As shown in Figure 5(c), the reduced performance attained by the centralized environment is mainly due to the extra communication overhead induced by the nodes while updating their state at the centralized server. Under this situation, a significant number of requests (or replies) are lost due to contention and congestion, which is reflected as a reduced number of successfully replied sensing requests. Moreover, as the number of mobile nodes increases and more links break, the control overhead induced by AODV's route repair mechanisms also increases, creating even more contention and congestion. The net effect is a reduction in the number of successfully replied sensing requests. This is consistent with the fact that the values of the search success ratio and the satisfied-request ratio attained by the centralized environment are almost the same, which indicates that this environment almost always identifies the best sensor, but sometimes it is unable to deliver a reply back to the requesting node.
The results (Figure 5(c)) also show that the on-demand approach used by the Foglets is far less costly than the proactive approach of the centralized server. While the overhead induced by the Foglets is consistently less than 30, the overhead induced by the centralized environment reaches 2417. The sensing Foglets attain this level of efficiency because nodes can locally compute semantic distances using the proposed algorithm that is based on labels. In this case, however, there is a gap between the search success ratio and the satisfied-request ratio attained by the Foglets that indicates that they do not always locate the best sensor in the environment. This usually happens when the best node does not correctly receive a Foglet Request (sent in unreliable broadcast mode) and hence, it is not included in the BFS-tree.
Figure 5(b) shows that the average delivery ratio attained by the Foglets is consistently better than that of the centralized environment. Moreover, the figure also shows that, unlike the centralized environment, the Foglets are fairly insensitive to the increase in the percentage of mobile nodes. This indicates that the routes connecting sensors to sinks computed and maintained by the Foglets are more effective than those computed and maintained by AODV.
Lastly, from Figure 5(d) we can observe that the average reply delay attained by the centralized environment is consistently better than that of the Foglets. This behavior was expected and the reason is as follows. First of all, it is important to recall that this metric considers the time between the moment in which the sensing request arrives at the environment and the time the first sensory data packet is received at the designated sink. Therefore, in the case of the sensing Foglets, this delay includes the time it takes to propagate the request within the environment, the time it takes to contract the BFS-tree established during the dissemination of the Foglet Request, and the time it takes to transport the first data packet to the sink. Moreover, as described in Section 3.7 and Theorem 4, due to the conservative way in which the BFS-tree is contracted (needed to guarantee termination), the BFS-tree contraction delay is proportional to the height of the BFS-tree and to the value of  which is a configuration parameter. On the other hand, due to the proactive approach of the centralized environment, all the information needed to locate a relevant sensor is already stored at the ambient server and hence, it can immediately reply to an incoming request.

Performance with Increasing Number of Sensors per Node.
In these experiments, we increase the number of sensors per node from 1 to 8, 50% of the nodes are mobile, there are four concurrent sensing requests, and selected sensors generate data flows composed of 10 data packets. The purpose of these experiments is to evaluate the ability of the environments to take advantage of having better-equipped devices.
Figure 6(a) shows the search success ratio and the satisfied-request ratio attained by the two environments. From the figure, we can observe that the Foglets are able to take advantage of having more nodes equipped with the requested sensors, which is reflected in an increased search success ratio. This is not the case for the centralized environment, where less than 50% of the requests are correctly fulfilled, even though the centralized server almost always identifies the best sensor in the environment. As in the previous section, the main reason for this poor performance is the extra communication overhead induced by the proactive way in which the state of the sensors is periodically updated at the ambient server (Figure 6(c)), which tends to congest the network around the server, increasing the probability of losing request and reply packets. In these experiments, the overhead induced by the Foglets is always less than 30, while the overhead induced by the centralized environment reaches 3002.
Figure 6(b) shows that the Foglets also outperform the centralized environment in terms of packet delivery ratio by delivering up to 30% more data packets at the sinks. From Figure 6(d) we can observe that the average reply delay attained by the centralized environment is consistently better than that of the Foglets. Again, the extra delay experienced by the Foglets is mainly due to the time it takes to contract the BFS-tree.

Collaborative Sensing Plans.
In this set of scenarios, we evaluate the performance of the sensing environments when the sensing requests ask for a set of three devices that will collaborate in order to share the computation and communication load. The three devices implement a sensing plan where a single node is selected at a time to sense a required variable and to send the collected data back to the sink. In both sensing environments, data sinks are in charge of coordinating the sensing nodes by instructing them to start or stop sending sensory data. Every node is assigned two different types of sensors, there are four concurrent sensing requests, and selected sensors generate data flows composed of 20 data packets.
From Figures 7(a)-7(d) we can observe that the sensing Foglets are capable of effectively and efficiently implementing a collaborative sensing task that is carried out by more than one device. The results presented by these figures are consistent with those presented in previous sections; specifically, the Foglets achieve higher search success ratio, satisfied-request ratio, and packet delivery ratio than the centralized environment. They do so while inducing between one and two orders of magnitude less overhead per sensory data byte received at the designated sinks (up to 30 against up to 1408), but with larger average reply delays.

Discussion and Future Research
In this paper, we presented a new distributed framework for opportunistic sensing in the Fog, where collections of potentially highly heterogeneous devices organize themselves into sensing Foglets to fulfill a sensing request issued to the Fog environment. The proposed framework uses semantic-driven in-network processing to locate the devices that are most fit to perform a given sensing task. It also establishes and maintains multihop paths, connecting the selected sensors to a designated data sink, which are used to deliver flows of sensory data. The sensing Foglets are instantiated in response to sensing requests generated either by an application running on a device currently in the Fog environment or by an application running on the Cloud. In general, the sensing Foglets can extend the sensing capabilities of any device by taking advantage of other devices located at a relevant Fog environment.
The fitness of a node is computed in terms of its instantaneous context and the semantic distance between its sensing hardware and the hardware specified in the sensing request. Semantic distances are computed locally at the nodes by means of a very efficient algorithm that takes prefix-based labels as input. The main advantage of this algorithm is that a node does not need to store the whole ontology in order to compute the semantic distance between the requested sensor and its own sensors. It only needs to store the prefix-based labels that codify the branches of the ontology that contain the concepts describing its sensing hardware. Moreover, the devices in the environment can correctly compute the semantic distance between their sensors and new types of sensors that are added to the ontology, without the need to update any information at the devices.
We presented the results of a series of simulation-based experiments that show that the proposed sensing Foglets outperform traditional sensing platforms that are based on centralized services by correctly answering more sensing requests and at a lower communication cost. This is true in scenarios where sensing Foglets incorporate a single sensing device and in scenarios where each sensing Foglet includes a set of nodes that implement collaborative sensing plans in order to distribute the computation and communication load.
Future work includes the development of delay-tolerant sensing Foglets and the use of heterogeneous networks including vehicular ad hoc networks (VANETs). Lastly, it is important to highlight the fact that authentication, access control, rogue node detection, intrusion detection, privacy, and trust management are crucial aspects that need to be fully addressed before opportunistic sensing platforms are adopted by the general public. The good news is that the Fog computing paradigm provides promising solutions to address many of these security and privacy issues. For instance, Fog nodes can provide cryptographic services to devices that lack the processing power to perform complex computations [33]. Relevant services such as authentication and reputation-based trust management are also good candidates to be implemented by local Fog services running on stable and trusted computers [34]. Privacy policies can also be enforced by Fog nodes that decide which data can be transported outside of the local Fog environment.

Figure 2: A small fraction of the sensor ontology.

Figure 3: Semantic-driven Foglet formation. (a) A Fog ambient where a sensing task is received from the Cloud by a desktop Fog computer. The Fog ambient is populated by desktop computers, mobile and IoT devices, and sensor nodes. (b) A sensing Foglet composed of sensor and relay nodes, located inside of the region of interest, is instantiated on-demand as a response to the sensing request received by the desktop computer. (c) A Fog ambient where a sensing task is issued by a local mobile device. (d) The Foglet transports the collected sensory data to a desktop Fog computer. Dotted arrows indicate parent-child pointers that define BFS spanning trees that are used to collect information about the sensors in the environment. Solid arrows indicate wireless links used to transport sensory data from sensors to the designated sinks.

Figure 4: Example of a small ontology and its corresponding prefix-based labeling.

Figure 5: Performance with increasing proportion of mobile devices.

Figure 6: Performance with increasing number of sensors per node.

Figure 7: Collaborative sensing plans: performance with increasing proportion of mobile devices.
Definition 1. Given an environment composed of a time varying set of devices () located at positions designated by function   , a sensor ontology O_S = (  ,   ) describing the different types of sensing hardware, an application ontology O_A = (  ,   ) describing the different types of applications running on the devices, a function  that describes the sensing hardware installed in the devices in (), a node context function   that describes the instantaneous context of each device, and a sensing request  = ⟨  ,   ,  = ⟨,   ,   , ⟩, ⟩ issued by node  ∈ () at time ; find a set of devices  ⊆ (), such that the following conditions hold:
(1) The selected sensing nodes are located inside of the region of interest: ∀ ∈ ,   () is a point located inside of the circumference of radius   with center at  =   ().
(2) Nodes in  have the required energy, are running an adequate type of application on the foreground (if any), and are not stored if it may interfere with the sensing process: ∀ ∈ ,   ().≥ ..∧,  O S (..,   ().)= 0 ∧ (  (). = ..  ∨ ..  = * ).

Let us assume that the concepts located inside of the dashed box (c_10, c_11, and c_12) were added to accommodate new types of sensors. We can see that a device equipped with a sensor of type c_9 can reply to a request for a sensor of type c_12 with a distance d_Λ(O)(l_c12, l_c9) = |"0aab"| − |"0"| = 3 because it only needs the labels of the two concepts.