Shortest Paths Based Web Service Selection in Internet of Things

Connecting things to the Internet makes it possible for smart things to access all kinds of Web services. However, smart things are energy-limited, and a suitable selection of Web services consumes fewer resources. In this paper, we study the problem of selecting some Web services from a candidate set. We formulate this selection of Web services for smart things as a single-source many-target shortest path problem. We design algorithms based on the Dijkstra and breadth-first search algorithms, propose an efficient pruning algorithm for breadth-first search, and analyze their performance in terms of the number of iterations and I/O cost. Our empirical evaluation on real-life graphs shows that our pruning algorithm is more efficient than the breadth-first search algorithm.


Introduction
Recently, the development of RFID, sensor, and networking technologies has made it possible to connect various physical-world things to the Internet, which is commonly called the Internet of Things (IoT). In IoT, more and more devices are getting connected to the Internet, and the next step is to use the World Wide Web and its associated technologies, such as Web services, as a platform for smart things.
On the Internet, there are a lot of Web services. These Web services connect with each other and form a graph topology. In IoT, all smart things are connected to the Internet, so they can access all kinds of Web services. However, smart things are usually energy-limited, so the selection of suitable Web services is extremely important. Guinard et al. [1] give an overall view of Web service access in the Internet of Things. They propose a process and a suitable system architecture that enable developers and business process designers to dynamically query, select, and use running instances of real-world services.
In this paper, we study the problem of Web service selection in the IoT setting; that is, given a smart thing and a graph of Web services, how to select several suitable Web services from the candidate set while providing satisfactory quality of service. If the selected Web services are the nearest ones, the smart thing spends less time accessing and waiting and therefore consumes less power. Here, we formulate the selection of Web services as a single-source many-target shortest path problem.
The shortest path problem can be classified as SSSTSP (single-source single-target shortest path), SSMTkSP (single-source many-target k shortest paths), APSP (all-pairs shortest paths), and kSP (k shortest paths). The MSSTkSP (many-source single-target k shortest paths) problem is the same as SSMTkSP if we reverse all edges in the graph.

Our Contribution.
In this paper, we study the problem of SSMTkSP (single-source many-target k shortest paths) on MapReduce, that is, finding k shortest paths from a candidate set for one source node. In our work, we do not care about the exact k paths, but rather the k nearest neighbors of the source node, so the SSMTkSP problem can also be considered as a single-source many-target k nearest neighbors problem. SSMTkSP can be used for recommending friends or ads in social networks and for searching malls or hotels in road networks.
We design algorithms based on the Dijkstra and breadth-first search algorithms, propose an efficient pruning algorithm for breadth-first search, and analyze their performance in terms of the number of iterations and I/O cost. Our empirical evaluation on real-life graphs on the Hadoop platform shows that the pruning algorithm is more efficient than the breadth-first search shortest path algorithm.
The rest of the paper is organized as follows. Section 2 gives the background on shortest paths for graphs, MapReduce, the Dijkstra algorithm, and breadth-first search. The SSMTkSP problem and its corresponding algorithms are presented in Section 3. We analyze the performance of our algorithms in Section 4, show experimental results in Section 5, and review the related work in Section 6. Finally, conclusions and future work are given in Section 7.

Background
In this section, we provide some background for shortest path for graphs, MapReduce, Dijkstra algorithm, and breadth-first search.
2.1. Shortest Path for Graphs. We consider a weighted directed graph $G = (V, E)$, where $V$ is the set of nodes, $E$ is the set of edges, and the numbers of nodes and edges are $|V|$ and $|E|$, respectively. We use $d$ to represent the diameter of $G$. For an edge $(u, v) \in E$, the weight is denoted by $w_{u,v}$; $u$ is $v$'s parent node and $v$ is $u$'s child node. A path $p$ in a graph $G$ is a sequence of nodes $\langle v_1, v_2, \ldots, v_l \rangle$, where $v_i \in V$ ($1 \le i \le l$) and $(v_i, v_{i+1}) \in E$ ($1 \le i < l$). $v_1$ and $v_l$ are called the start and target of $p$ and are linked by $p$. The length of $p$ between start and target is the sum of the weights of its edges, that is, $\mathrm{len}(p) = \sum_{i=1}^{l-1} w_{i,i+1}$, and the number of hops from start to target in $p$ is the number of edges, that is, $h(p) = l - 1$.
Definition 1. A path $p$ in graph $G$ from start $s$ to target $t$ ($s, t \in V$) is a shortest path if there does not exist any path $p'$ between $s$ and $t$ such that $\mathrm{len}(p') < \mathrm{len}(p)$.
To simplify the presentation, we assume for the rest of the paper that the weight of each edge of $G$ equals 1; that is, $\mathrm{len}(p) = h(p)$. This reduces $G$ to an unweighted graph but does not affect the result of our algorithms.

MapReduce.
MapReduce [2], proposed by Google, is a programming model for processing huge amounts of data in parallel using a large number of commodity machines, and its open-source implementation is Hadoop (http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html). By automatically handling lower-level issues, such as job distribution, data storage, and fault tolerance, it allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
In MapReduce-like systems, programs run iteratively in three phases: Map, Shuffle, and Reduce. In the Map phase, they read a collection of values or key/value pairs from input sources in parallel and emit zero or more key/value pairs for each input element by invoking a user-defined Mapper function. In the Shuffle phase, they group together all the Mapper-emitted key/value pairs sharing the same key and output all groups to the next phase. In the Reduce phase, they invoke a user-defined Reducer function for each distinct group independently and in parallel and emit zero or more key/value pairs that will be written to disk or serve as the input to the next iteration.
Example 2. Given the graph $G$ in Figure 1, if we choose $v_0$ as the source, the distances between $v_0$ and other nodes will be ]}, which can be done in 6 expanding processes.
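The Map, Shuffle, and Reduce phases described in this section can be sketched as a single in-memory round. This is only an illustration, not Hadoop code: the helper `run_mapreduce` and the out-degree job are hypothetical names chosen for the example, and a real system would run the mapper and reducer calls in parallel across machines.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run one Map -> Shuffle -> Reduce round in memory (illustrative only)."""
    # Map: every input record yields zero or more (key, value) pairs.
    emitted = []
    for record in records:
        emitted.extend(mapper(record))

    # Shuffle: group all emitted values that share the same key.
    groups = defaultdict(list)
    for key, value in emitted:
        groups[key].append(value)

    # Reduce: each group is processed independently of the others.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def out_degree_mapper(edge):
    """Emit (parent, 1) for each directed edge (parent, child)."""
    parent, _child = edge
    yield (parent, 1)


def sum_reducer(key, values):
    """Sum the counts collected for one key."""
    yield (key, sum(values))


# Hypothetical toy graph given as a list of directed edges.
edges = [("a", "b"), ("a", "c"), ("b", "c")]
out_degrees = run_mapreduce(edges, out_degree_mapper, sum_reducer)
```

Here `out_degrees` holds one `(node, out-degree)` pair per node that has at least one out-link. An iterative algorithm would feed the output of one round into the next.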

Breadth-First Search.
To find the shortest paths between the source $s$ and other nodes in graph $G$, we can conduct a breadth-first search (BFS for short) starting from $s$ and obtain a BFS-tree $T_s$ [4]. To construct the BFS-tree of $s$, we first initialize $T_s$ as a tree having only the root node $s$ and then add the children of $s$ in $G$ to $T_s$. That is, if $(s, u) \in E$, we add $u$ to $T_s$ as a child of $s$. Then, iteratively, for each leaf node $u$ in the current $T_s$, we search all its children $v$ such that $(u, v) \in E$ and $v$ has not been added to $T_s$ yet, and add $v$ to $T_s$ as a child of $u$.
Example 3. Consider again the graph $G$ in Figure 1; the BFS-tree of $G$ is shown in Figure 2. If we choose $v_0$ as the source, the distances between $v_0$ and other nodes are also 2], and [$v_5$, 3]}, but the BFS can finish in only 3 expanding processes.
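The BFS-tree construction above can be sketched in a few lines of sequential Python. The graph below is a hypothetical toy example (it is not the graph of Figure 1), and the adjacency-list representation is an assumption made for the sketch.

```python
from collections import deque


def bfs_tree(graph, source):
    """Return (hops, parent) for the BFS-tree rooted at source.

    graph maps each node to the list of its children (out-links).
    hops[v] is the number of edges on a shortest path from source to v;
    parent[v] is the node under which v was added to the tree.
    """
    hops = {source: 0}
    parent = {}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in hops:          # v has not been added to the tree yet
                hops[v] = hops[u] + 1
                parent[v] = u
                queue.append(v)
    return hops, parent


# Hypothetical unweighted directed graph.
graph = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": []}
hops, parent = bfs_tree(graph, "s")
```

With unit edge weights, `hops` gives the shortest distances directly, which is why the paper can treat hop count and path length interchangeably.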

Problem Statement
Definition 4. Given a (weighted) directed graph $G = (V, E)$ and a candidate node set $C$ chosen from $V$, the SSMTkSP problem is, for a source node $s \in V$, to find the $k$ nearest neighbors of $s$ by shortest path distance such that those nearest neighbors are in $C$.
In our work, we do not care about the $k$ exact paths, but rather the $k$ target nodes, also called nearest neighbors, so the SSMTkSP problem can also be considered as a single-source many-target $k$ nearest neighbors problem. The candidate set $C$ could be chosen from $V$ according to the nodes' importance in the graph, such as their PageRank scores [5].

DijkstraKNN Algorithm on MapReduce.
On MapReduce, we keep a root set $R$ for source $s$. At the beginning, $R$ contains only $(s, 0)$, and at each iteration, we add a nearest neighbor $u$ and the shortest distance between $s$ and $u$ to $R$. As soon as $R$ contains $k$ nodes coming from $C$, or there is no out-link from $R$, the expanding process terminates. Then we can get the $k$ nearest neighbors through another Map phase. Details of the DijkstraKNN algorithm can be seen in Algorithms 1 and 2.
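The root-set expansion described above can be simulated sequentially as follows. This is a sketch under stated assumptions rather than the paper's Algorithms 1 and 2: the function name `dijkstra_knn`, the edge-list format `(u, v, weight)`, and the choice not to count the source itself as a found candidate are all assumptions made here.

```python
def dijkstra_knn(edges, source, candidates, k):
    """Grow the root set one nearest node per iteration.

    Returns (found, iterations), where found lists up to k candidate
    nodes with their distances, in the order they joined the root set.
    """
    root = {source: 0}   # root set R: node -> shortest distance from source
    found = []           # candidate nodes already in R (source excluded)
    iterations = 0
    while len(found) < k:
        # Map step: propose every node reachable over one edge leaving R.
        proposals = [(root[u] + w, v) for u, v, w in edges
                     if u in root and v not in root]
        if not proposals:        # no out-link leaves R: stop early
            break
        # Reduce step: keep only the single nearest proposal.
        dist, node = min(proposals)
        root[node] = dist
        iterations += 1
        if node in candidates:
            found.append((node, dist))
    return found, iterations


# Hypothetical toy graph with unit weights.
edges = [("s", "a", 1), ("s", "b", 1), ("a", "c", 1),
         ("b", "c", 1), ("c", "d", 1)]
result = dijkstra_knn(edges, "s", {"c", "d"}, 1)
```

In this toy run, three iterations are needed to reach the first candidate because the two non-candidate nodes at distance 1 must be added to the root set first. Each iteration corresponds to one pass over the graph on MapReduce.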

BFSKNN and PruningBFSKNN Algorithms on MapReduce.
The DijkstraKNN algorithm expands one nearest neighbor in each iteration for the source $s$. However, we can expand all neighbors with the same number of hops at the same time; this is the BFSKNN algorithm. In the BFSKNN algorithm, we iteratively expand all neighbors with the same number of hops from $s$ and update the shortest distances between $s$ and these nodes. Once the distances no longer change, we have obtained the shortest distances between $s$ and the other nodes in the graph. Details of the BFSKNN algorithm can be seen in Algorithm 3.
Unlike the Dijkstra algorithm, which expands the nearest neighbor in each iteration, the BFSKNN algorithm expands all nodes that have the same number of hops from $s$ (see Algorithm 4 for details). Their termination conditions are also different. The Dijkstra algorithm terminates if we have found $k$ nearest neighbors or $R$ has no out-link. The BFSKNN algorithm, however, terminates when none of the distances between $s$ and the other nodes in $G$ change.
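A level-by-level expansion of this kind can be sketched as below. The code is a sequential simulation under assumptions, not the paper's Algorithms 3 and 4: `bfs_knn` is a hypothetical name, edges are unweighted pairs, the source is not reported as its own neighbor, and the round counter includes the final round in which no distance changes.

```python
def bfs_knn(edges, source, candidates, k):
    """Expand all nodes with the same hop count in each round.

    Returns (nearest, rounds): the k nearest candidate nodes with their
    distances, and the number of expansion rounds that were executed.
    """
    inf = float("inf")
    dist = {source: 0}
    frontier = {source}      # nodes whose distance changed in the last round
    rounds = 0
    while frontier:
        rounds += 1
        # Map step: each frontier node proposes dist + 1 to its children.
        proposals = {}
        for u, v in edges:
            if u in frontier:
                proposals[v] = min(proposals.get(v, inf), dist[u] + 1)
        # Reduce step: accept a proposal only if it improves the distance.
        frontier = set()
        for v, d in proposals.items():
            if d < dist.get(v, inf):
                dist[v] = d
                frontier.add(v)
    nearest = sorted((d, v) for v, d in dist.items()
                     if v in candidates and v != source)
    return [(v, d) for d, v in nearest[:k]], rounds


# Hypothetical toy graph (unit weights).
edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
result = bfs_knn(edges, "s", {"c", "d"}, 1)
```

The loop stops only when a round changes no distance, so the whole reachable graph is explored even when `k` is small. This is the cost that the pruning variant aims to reduce.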

Theorem 5. The BFSKNN algorithm terminates in a finite number of iterations, and the output is a valid solution to the SSMTkSP problem.
Proof. Given the graph $G$, construct a BFS-tree $T$ rooted at source $s$ in the following steps: (1) let $s$ be the root of $T$; (2) find all out-links for each leaf node of $T$ in $G$ and add them to $T$ as children of that leaf node; (3) repeat Step 2 until the path from the root to the leaf node forms a loop in $G$. So, there is a simple path from $s$ to $v$ in $G$ if and only if there is a path from the root to leaf node $v$ in $T$. The above steps finish in a finite number $d$ ($d \le |V|$) of iterations, so the BFSKNN algorithm terminates in a finite number of iterations. After $d$ iterations, we have found all the shortest paths between $s$ and the other nodes in $G$, so from the $(d + 1)$th iteration on, the shortest paths between $s$ and the other nodes no longer change, and the $k$ nearest neighbors of $s$ form a valid solution to the SSMTkSP problem.

DijkstraMapper (R, G):
    for all edges (u, t) in G do
        if u is in the root set R and t is not in R then
            emit(t, tentative distance of t through u);
        end if
    end for;
DijkstraReducer (output of DijkstraMapper):
    let n = (u, v) = (0, +∞);
    for all (u, v) do
        // find a nearest neighbour not in R
        if u ∉ R ∧ v < n.v then
            n = (u, v);
        end if
    end for;

However, in each expanding process of BFSKNN, all nodes expand along their out-links, which causes the problem of path expansion because all edges have to be accessed. To prune edges that do not need to be accessed, we keep a list cList for each node, which records the nodes from the candidate set on the current path. If cList shows that a path contains $k$ nodes from $C$, then we prune the paths derived from that path, because they do not contain any useful information about the $k$ nearest neighbors. Details can be seen in Algorithms 5 and 6.
Example 6. Given the graph $G$ in Figure 1, candidate set $C = \{v_0, v_2, v_5, v_6\}$, and $k = 2$, we can prune the BFS-tree of $G$ in Figure 2, and the results are in Figure 3.

Theorem 7. The PruningBFSKNN algorithm terminates in a finite number of iterations, and the output is a valid solution to the SSMTkSP problem.
Proof. Construct a BFS-tree $T$ as in the proof of Theorem 5; then there is a simple path from source $s$ to target $v$ in $G$ if and only if there is a path from root $s$ to leaf node $v$ in $T$. We prune every path that contains $k$ candidate nodes from $C$, cutting it after the $k$th candidate node. There are two kinds of pruned nodes: normal nodes (not belonging to $C$) and candidate nodes. If a pruned node is a candidate node, it cannot be one of the $k$ nearest neighbors in the pruned paths, because we have already found $k$ nearest neighbors in those paths. Otherwise, if a node is a normal node, then whether it appears in pruned paths or not affects neither the termination nor the correctness of the algorithm. So the PruningBFSKNN algorithm terminates in a finite number of iterations, and the output is a valid solution to the SSMTkSP problem.

Performance Analysis
In this section, we analyze and compare the number of iterations and the I/O cost of the DijkstraKNN, BFSKNN, and PruningBFSKNN algorithms. Realistic analysis of the efficiency of MapReduce algorithms is not straightforward, because the algorithms' efficiency in practice depends on many other factors, such as the distribution of data, the scheduling of jobs, and the proximity of the communicating machines. However, these factors are all controlled by the system, and

The Number of Iterations
Proof. Construct a BFS-tree $T$ as in the proof of Theorem 5, so the depth of $T$ equals the diameter $d$ of $G$, and the BFSKNN and PruningBFSKNN algorithms terminate in at most $d$ expanding processes; hence they finish in at most $d$ MapReduce iterations.

I/O Cost Analysis.
We start by analyzing the I/O cost of the DijkstraKNN algorithm. The DijkstraKNN algorithm terminates in $O(k \times (|V|/|C|))$ expanding processes. In each expanding process, the input is $R$ and $G$, so the total I/O cost is the size of this input multiplied by the number of expanding processes.
Next, we analyze the I/O cost of the BFSKNN algorithm. The BFSKNN algorithm terminates in at most $d$ expanding processes. In each expanding process, the input is $R$ and $G$, so the total I/O cost is again the size of this input multiplied by the number of expanding processes.
Finally, we analyze the I/O cost of the PruningBFSKNN algorithm. The PruningBFSKNN algorithm also terminates in at most $d$ expanding processes. In the $i$th ($1 \le i < k$) expanding process, the input is the same as for BFSKNN, but in the $i$th ($k \le i \le d$) expanding process, we prune unnecessary paths, so the I/O cost that we save is the input belonging to the pruned paths in those processes.
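The cost expressions for the three algorithms can be written out as follows. This is a hedged reconstruction from the iteration bounds stated in the text, not the paper's original formulas: it assumes the per-process input size is $|R| + |G|$, and the symbol $\Delta_i$ (the input pruned in the $i$th expanding process) is a hypothetical name introduced only for this sketch.

```latex
% Assumption: each expanding process reads R and G once, at cost |R| + |G|.
\begin{align*}
  \mathrm{IO}_{\mathrm{DijkstraKNN}}
    &= O\!\left(k \cdot \frac{|V|}{|C|} \cdot \bigl(|R| + |G|\bigr)\right), \\
  \mathrm{IO}_{\mathrm{BFSKNN}}
    &= O\!\left(d \cdot \bigl(|R| + |G|\bigr)\right), \\
  \mathrm{IO}_{\mathrm{PruningBFSKNN}}
    &= \mathrm{IO}_{\mathrm{BFSKNN}} - \sum_{i=k}^{d} \Delta_i .
\end{align*}
```

Under these assumptions, DijkstraKNN is cheaper than BFSKNN roughly when $k \cdot |V| / |C| < d$, that is, for small $k$ or a large candidate set.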

Experiments
In this section, we present the results of the experiments that we conducted to test the performance of our algorithms. Here $|V|$, $|E|$, and $d$ represent the number of nodes, the number of edges, and the diameter of a graph, respectively.

Experimental Setup
Dataset. To demonstrate the robustness of our methods and to show their performance on realistic data, we present experiments with two real-world datasets, the Epinions social network [6] and the LiveJournal social network [7,8]. Summary statistics about these datasets are presented in Table 1.
Experimental Platform. We implement the DijkstraKNN, BFSKNN, and PruningBFSKNN algorithms in Java on top of the Hadoop platform. Our experiments are executed on a cluster of 20 nodes, where each node is a commodity machine with a 2.16 GHz Intel Core 2 Duo CPU and 1 GB of RAM, running CentOS v6.0. We choose 20 source nodes from each graph and compute the average execution time.
As Figure 5 shows, the execution time of the DijkstraKNN algorithm changes greatly as we change the size of the candidate set, and a bigger candidate set leads to a smaller execution time. In contrast, the BFSKNN and PruningBFSKNN algorithms have more stable execution times. We also plot all results for candidate set sizes of 500 and 5000 in Figures 6(a) and 6(b), which show that the execution time grows linearly with the number of iterations. This is because the inputs are $R$ and $G$ in each iteration, and the processing time of each iteration is about 37 seconds. However, the execution time also grows nearly linearly with $k$, which is against our intuition that it should grow exponentially as the number of nodes grows exponentially. The reason is that when we search in a graph, we usually find nodes with high PageRank scores rather than fringe nodes.
We also compare the stability of the above algorithms; details are in Figure 7. The execution time of the DijkstraKNN algorithm changes greatly as we choose different sources, because sources with more influential neighbors need fewer iterations. The BFSKNN and PruningBFSKNN algorithms have more stable execution times because their execution time depends on the radius of the graph for the source, and the radius of a graph is more stable in practice.
For the LiveJournal social network, we choose the 5000 nodes with the highest PageRank scores as the candidate set and compare the performance of the above algorithms. The result is in Figure 8. The DijkstraKNN algorithm is efficient for small $k$, but if the candidate set is smaller and $k$ is bigger, the PruningBFSKNN algorithm is a better choice.

Related Work
In this section, we review some related work on MapReduce, shortest paths, and entity recommendation for graphs.
MapReduce. MapReduce algorithms have been designed or proposed in the literature for a wide range of applications, such as machine learning [10], text processing [11], and bioinformatics [12,13]. MapReduce provides an excellent tool for large graph processing as well; a number of graph processing systems have been built on it, such as Pegasus [9] and Giraph (http://incubator.apache.org/giraph/). In this paper, we study one of the most well-known graph computation problems, that is, computing the shortest paths or the nearest neighbors.
The Shortest Paths. Computing the shortest paths is a basic and important primitive that lies at the core of graph-related problems. The shortest path problem can be classified as SSSTSP, SSMTkSP, APSP, and kSP, and classical algorithms such as the Dijkstra algorithm [3] and the Floyd-Warshall algorithm [14] can only handle small graphs in a serial computation model. As graphs become larger and larger, researchers deal with this problem by approximation [15][16][17], preprocessing [18,19], or parallelization in order to answer queries in a timely manner. Current parallel algorithms for the shortest path problem are mainly based on searching the BFS-tree of the graph [18,20,21], but they either study the problem of kSP or focus on APSP. In this paper, we study the problem of SSMTkSP, which is a composition of the above two, design a BFS-based parallel algorithm on MapReduce, and propose a corresponding pruning strategy.
Entity Recommendation. As complex networks, such as social networks and web-page graphs, become more and more popular, entity recommendation on graphs has become a hot topic among researchers. Current entity recommendation methods are mainly content-based or link-based. Content-based entity recommendation algorithms consider only the content of entities and recommend the most similar ones for some entity (refer to [22,23] for details). Link-based entity recommendation algorithms consider the structure of the graphs and recommend the most influential ones for some entity. The influence of entities can be defined by PageRank [5], HITS [24], SimRank [25], or methods derived from them. In this paper, from the perspective of influence, we consider the nearest neighbors as the most influential entities and recommend the nearest neighbors from the selected candidate set for some entity.

Conclusion and Future Work
In this paper, we study the problem of single-source many-target $k$ shortest paths for graphs, that is, finding $k$ nearest neighbors from a candidate set for one source. To the best of our knowledge, this is the first study of recommending $k$ nearest neighbors from a selected candidate set. As the MapReduce programming model becomes more and more popular in data-intensive computing, we evaluate our algorithms on its open-source implementation, Hadoop. We design algorithms based on the Dijkstra and breadth-first search algorithms, propose an efficient pruning algorithm for breadth-first search, and evaluate their performance. As graphs become bigger and bigger, exact algorithms need much time to compute several nearest neighbors, and some fringe nodes may waste even more computing resources. In the future, we plan to seek approximation algorithms that can handle this problem.

Figure 5: Execution time of Epinions.

Table 1: Summary characteristics of datasets.