A Location-Based Business Information Recommendation Algorithm

In recent years, location-based recommendation of information (e.g., POIs and ads) has been studied extensively in both academia and industry. In this paper, we first construct a region-based location graph (RLG), in which each region node connects with user nodes and business information nodes, and then propose a location-based recommendation algorithm based on RLG. The algorithm combines users' short-ranged mobility formed by daily activity with long-distance mobility formed by social network ties, and can therefore recommend both local and long-distance business information to users. Moreover, it combines user-based collaborative filtering with item-based collaborative filtering, and it alleviates the cold start problem from which traditional recommender systems often suffer. Empirical studies on large-scale real-world data from Yelp demonstrate that our method outperforms other methods in recommendation accuracy.


Introduction
Rapid technological development has brought an increasing number of mobile devices equipped with GPS capabilities, such as laptops, PDAs, and mobile phones. It has made check-in behavior a new lifestyle for millions of users, who share their locations, tips, and experiences about points of interest (POIs) with their friends in location-based social networks. Recently, how to provide timely and personalized information and sharing services based on users' location information has been attracting a lot of attention from both industry and the research community. It has also formed an independent research area known as location-based services (LBS). In particular, personalized information recommendation is especially important, since it helps users discover new POIs or special promotions in the marketplace and explore their city, and it helps advertisers deliver advertisements to targeted users.
Recently, much research on location-based recommendation of information (e.g., POIs and ads) has been done in both academia and industry [1][2][3]. Collaborative filtering (CF) is the mainstream approach to this task. Both memory-based and model-based collaborative filtering methods have been proposed and investigated to learn users' preferences in LBS from users' location check-in data [1,4,5]. However, previously proposed methods consider all check-ins as a whole, and the basic laws governing human motion and dynamics are usually overlooked. There is also little research on the cold start problem, which results from users rarely rating items in location-based recommender systems. As shown in [6], humans experience a combination of periodic movement that is geographically limited and seemingly random jumps that are correlated with their social networks. About 50% to 70% of all human movements are short-ranged and periodic both spatially and temporally and are not affected by the social network structure, and about 10% to 30% of all human movements are long-distance and random and are usually influenced by social network ties. Hence, location-based recommendation should be sensitive to the range of users' movement, and in this paper we show how to alleviate the cold start problem in location-based recommender systems using users' basic movement laws.
Hence in this paper, unlike previous works, our goal is to provide users with information recommendation within the scope of users' movement in a very sparse rating system. The task is much harder than traditional location-based recommendation or prediction, because it recommends interesting business information within the scope of a user's daily movement. However, this task is more significant, since it can provide personalized local information combined with long-range travel information close to the homes of the user's friends. Thus, if we can divide a user's movement region into two parts, the local part and the remote part, then we can recommend favorite business information in each part; first of all, however, we must determine each user's scope of movement by exploiting the check-in log. In location-based services or social networks, the places where users check in are often points of interest or parkland they are visiting, so we cannot obtain users' successive location trajectories from their check-in data; more importantly, this results in a very sparse dataset in location-based recommender systems.
Based on these two properties of the check-in dataset and the studies in [6], we focus on explicitly modeling users' local movement and long-distance travel preferences for recommendation from their check-in data. There are two challenges: (1) how to determine users' local movement region and remote movement region and (2) how to find users' favorite business information in each of their movement regions. To address these challenges, we propose a region-based location graph (RLG) and design new algorithms to make accurate top-$N$ recommendations on RLG. The uniqueness of the proposed model is the introduction of users' movement region nodes, including users' local movement regions and remote movement regions, which help users find their local neighbors and remote friends for collaborative information recommendation; furthermore, the model captures users' local visit interest through user-local movement region connections and captures long-distance travel interest through user-remote movement region connections. When two users' local movement regions intersect, we call the two users local neighbors; when the local movement region of one user and the remote movement region of another user intersect, we call the two users remote friends.
To summarize, our main contributions are as follows.
(1) We construct a region-based location graph (RLG), in which region node connects with user node and business node, respectively.
(2) When the local movement regions of two users intersect, we call the two users local neighbors, and when the local movement region of one user and the remote movement region of another user intersect, we call the two users remote friends. Based on the RLG framework, we propose a novel location-based business information recommendation algorithm.
(3) We compare our approach with other methods on a real dataset and show its performance in alleviating the cold start user problem in location-based recommender systems.

Related Work
In recent years, mobile communication and mobile positioning technologies have developed greatly, and the growing use of social network sites in particular brings new opportunities for social applications of geospatial information. The willingness of users to share their current locations and experiences facilitates the creation of location-based recommender systems (LBRS) based on user-generated content and has drawn much attention from both the academic community and industry. Currently, there are two lines of work on location-based recommendation [5]. One line of research is based on GPS trajectory logs [7][8][9][10][11]. GPS trajectory data usually consist of a small number of users but dense records [12,13]. Many collaborative filtering algorithms have been proposed that treat locations or POIs as the items of traditional recommender systems, such as collective matrix factorization [8], tensor factorization [9], the memory-based collaborative location model [10], and the pattern recognition model [11]. The other line of work focuses on location-based social network data, which are very sparse and large-scale [1,4,14]. In this line, geographical influences have been addressed and fused with traditional CF algorithms, for example, by modeling the check-in probability as a power-law function of distance over the whole check-in history [1], modeling users' multicenter check-in behaviors via multicenter Gaussians [4], and mining user check-in behaviors [15].
The crucial point of location-based recommender systems is to enable users to read or ask for recommendations in the vicinity of a specified location that they are visiting or used to visit, so recommendations in location-based recommender systems must be strongly bound to users' movement regions. However, there is little previous research on this issue, and the cold start problem caused by the very sparse data in location-based recommender systems is not yet well studied. As a user can only visit a limited number of locations, especially when traveling to a new city, the user-location matrix is very sparse, which poses a big challenge to traditional collaborative filtering-based location recommender systems. To this end, Bao et al. [16] proposed a location recommender system that consists of two main parts: offline modeling and online recommendation. The offline modeling part models each individual's personal preferences with a weighted category hierarchy and infers the expertise of each user in a city with respect to different categories of locations according to their location histories, using an iterative learning model. The online recommendation part selects candidate local experts in a geospatial range that match the user's preferences using a preference-aware candidate selection algorithm and then infers a score for the candidate locations based on the opinions of the selected local experts. The significance of recommender systems in location-based services and this promising solution motivate us to investigate further in this paper.

Data Model and Problem Definition
In this section, we briefly introduce the related data model and define users' favorite business information finding problem in their different movement regions.

Data Model.
Unlike GPS trajectory data, users' check-in data in location-based social networks are not continuous in either the spatial or the temporal dimension. The places where users check in are often points of interest or parkland they are visiting. For example, when a user has dinner in a restaurant, he may share some information about this restaurant and his experience with his friends on a social network or a review website. Such check-in data indicate that the scope of a user's daily motion covers all businesses reviewed by him; in other words, the businesses reviewed by a user are limited to the scope of his daily motion.
Suppose that $U = \{u_i \mid i \in [1, N]\}$ is a mobile user set, where $N$ is the total number of mobile users. Each user has some essential attributes, such as gender, age, and occupation, denoted by $\{a_{i,k} \mid k \in [1, K]\}$. A user's reviewed and favorite businesses $B = \{b_j \mid j = 1, 2, \ldots\}$ can be expressed as triples ⟨user, business, rating⟩, and each business $b_j$ has some basic attributes, including its location (e.g., longitude and latitude, denoted by $l_j = (x_j, y_j)$) and its service categories (e.g., restaurant, hotel, bar, and shopping mall).
Our data are in the form of ⟨User, Business, Location⟩ triples, which can be modeled by a tripartite graph [17] or a tensor [18]. Both the tripartite graph and the tensor treat location as a universal dimension shared by all users; as a matter of fact, however, there is a one-to-one correspondence between business and location in users' check-in data, and users may never review businesses that lie outside their movement region in their daily lives. As argued in [6], most users' motion is composed of short-ranged daily travel between their homes and workplaces, which is periodic both spatially and temporally, and long-distance travel, which is more influenced by social network ties. In a recommender system, the fixed correlation between business and location is typically not significant, while the movement region plays an important role in generating recommendations, and the correlation between a user and his movement regions is more relevant than that between the user and the locations of the businesses he reviewed.
Therefore, according to the geographical distribution of all businesses reviewed by each user, we divide a user's movement region into a local movement region and a remote movement region. Let $c = (x_c, y_c)$ be the geographical center of a user's movement region, and let $d_{\max}$ be the farthest distance between the center $c$ and the positions the user can reach in his daily life. If there exists a radius $d_{int}$ ($0 < d_{int} \le d_{\max}$) such that the percentage of businesses reviewed by the user inside the circle centered at $c$ with radius $d_{int}$ reaches a fixed value, we call this circular region the user's local movement region $R_{loc}$ and the remaining region his remote movement region $R_{rem}$.
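The split into local and remote regions can be sketched as follows. This is only an illustration under stated assumptions: the center is taken to be the centroid of the reviewed business locations (the text does not say how the geographical center is computed), the local radius is the smallest one that covers the required fraction of businesses, and the function names `movement_center`, `local_radius`, and `split_regions` are hypothetical.

```python
import math


def movement_center(points):
    """Centroid of the reviewed business locations (an assumed choice of center)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def local_radius(points, fraction=0.7):
    """Return (center, d_int, d_max) for one user's reviewed business locations.

    d_int is the smallest radius around the center that covers at least
    `fraction` of the points; d_max is the distance to the farthest point.
    """
    center = movement_center(points)
    dists = sorted(math.dist(center, p) for p in points)
    needed = math.ceil(fraction * len(dists))  # points that must fall inside
    return center, dists[needed - 1], dists[-1]


def split_regions(points, fraction=0.7):
    """Partition the points into (local, remote) by the radius d_int."""
    center, d_int, _ = local_radius(points, fraction)
    local = [p for p in points if math.dist(center, p) <= d_int]
    remote = [p for p in points if math.dist(center, p) > d_int]
    return local, remote


# Four nearby businesses and one far away.
example_points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
example_local, example_remote = split_regions(example_points)
```

With the five example points, the four clustered businesses end up in the local region and the distant one in the remote region. Euclidean distance on raw coordinates is used here because the text itself speaks of Euclidean distance between points.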

RLG Construction
In this section, we treat the two movement regions of each user as new nodes, which enable new linkages between users and the locations of their reviewed businesses, and construct a graph that we call the region-based location graph (RLG). It contains three types of nodes: user nodes, movement region nodes, and business nodes. In this way, we transform ⟨user, business, location⟩ into ⟨user, region, business⟩ by forming each user's two movement regions. RLG is a bipartite graph $G(U, R, B, E, W)$, where $U$ denotes the set of all user nodes, $R$ is the set of users' movement region nodes, $B$ is the set of business nodes, $E$ is the set of edges, and $W : E \to \mathbb{R}^{*}$ is a nonnegative weight function for all edges (see Figure 1).
In RLG, each user node connects with his two movement region nodes, and the two movement region nodes connect only with business nodes that were reviewed by the user. If two users coreviewed a business, then their movement regions overlap; for example, in Figure 1, user $u_1$ and user $u_2$ coreviewed business $b_4$ in their respective local movement regions. This means that user $u_2$ is a local neighbor of user $u_1$ and the two regions overlap; that is to say, user $u_1$ can reach $u_2$'s local movement region, user $u_2$ can reach $u_1$'s local movement region, and they both may like some businesses in the region. Thus, based on the above empirical observations, we connect user node $u_1$ with region node $r_{21}$ and connect user node $u_2$ with region node $r_{11}$ in RLG (dotted lines in Figure 1). If we start from a user node ($u_1$) and pass through a region node ($r_{11}$), we find his local neighbor ($u_2$) and can reach an unknown business ($b_5$) in his local region; namely, $u_1 \to r_{11} \to u_2 \to r_{21} \to b_5$. In the same way, we obtain another path $u_2 \to r_{21} \to u_1 \to r_{11} \to b_1$ ($b_2$), which connects user node $u_2$ with business node $b_1$ or $b_2$. Hence, in this way, we can help a user search for favorite business information in his movement regions through local neighbors or remote friends.
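The path-based search described above can be illustrated on the Figure 1 example. The following is a minimal sketch, not the paper's implementation: regions are represented simply as sets of reviewed businesses, two regions are treated as overlapping when they share a business, and the helper names `overlapping_users` and `candidate_businesses` are hypothetical.

```python
# Region membership from the Figure 1 example: (user, region kind) -> businesses.
regions = {
    ("u1", "local"): {"b1", "b2", "b4"},
    ("u1", "remote"): {"b3"},
    ("u2", "local"): {"b4", "b5"},
    ("u2", "remote"): {"b6"},
}


def overlapping_users(user, my_kind, other_kind, regions):
    """Users whose `other_kind` region shares a business with `user`'s `my_kind` region."""
    mine = regions[(user, my_kind)]
    return sorted(
        other
        for (other, kind), items in regions.items()
        if other != user and kind == other_kind and mine & items
    )


def candidate_businesses(user, regions):
    """Unreviewed businesses reachable via
    user -> own local region -> local neighbor -> neighbor's local region."""
    seen = set()
    for (owner, _), items in regions.items():
        if owner == user:
            seen |= items
    found = set()
    for neighbor in overlapping_users(user, "local", "local", regions):
        found |= regions[(neighbor, "local")] - seen
    return sorted(found)
```

Calling `candidate_businesses("u1", regions)` follows the path $u_1 \to r_{11} \to u_2 \to r_{21}$ and yields `b5`, while the same call for `u2` yields `b1` and `b2`. Remote friends could be found analogously with `overlapping_users(user, "remote", "local", regions)`.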
The edge weights of RLG between user nodes and region nodes are defined as follows. Given an edge $(u_i, r_{j,k})$, if $i = j$ and $k = 1$, its weight is $w_1^*$, and if $i = j$ and $k = 2$, its weight is $w_2^*$, with $w_1^* > w_2^*$: each user's local or remote movement region is a part of his whole movement region, and the probability of the user being active in the local movement region is much greater than that in the remote region. We treat the ratio $w_1^*/w_2^*$ as a learning parameter in the following experiments; in this way, different weights model the influence of local mobility preferences and long-distance mobility preferences. If $i \neq j$, the weight is $\mathrm{sim}(u_i, u_j)$, the similarity between the two users, who coreviewed businesses in their intersecting regions. When one of the two users reviewed only one business, we use the Jaccard coefficient (formula (6)) between the sets of businesses reviewed by them as their similarity, whereas when both users reviewed more than one business, we calculate $\mathrm{sim}(u_i, u_j)$ using the cosine similarity (formula (7)).
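The choice between the two similarity measures can be sketched as below. This is an illustrative reading of the rule, assuming each user's reviews are given as a dictionary from business to rating and that the cosine similarity is taken over the coreviewed businesses only; the function names are hypothetical.

```python
import math


def jaccard(reviews_a, reviews_b):
    """Jaccard coefficient between the sets of businesses reviewed by two users."""
    a, b = set(reviews_a), set(reviews_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_on_common(reviews_a, reviews_b):
    """Cosine similarity of the ratings on coreviewed businesses."""
    common = set(reviews_a) & set(reviews_b)
    if not common:
        return 0.0
    dot = sum(reviews_a[k] * reviews_b[k] for k in common)
    norm = math.sqrt(
        sum(reviews_a[k] ** 2 for k in common)
        * sum(reviews_b[k] ** 2 for k in common)
    )
    return dot / norm if norm else 0.0


def user_similarity(reviews_a, reviews_b):
    """Jaccard when either user reviewed a single business, cosine otherwise."""
    if len(reviews_a) == 1 or len(reviews_b) == 1:
        return jaccard(reviews_a, reviews_b)
    return cosine_on_common(reviews_a, reviews_b)
```

The Jaccard branch matters for cold users: with a single rating, the cosine over coreviewed businesses carries no information about rating agreement, whereas the set overlap still distinguishes neighbors.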
The edge weights of RLG between region nodes and business nodes are defined by the reviewing score: the higher the reviewing score is, the more the user likes the business in a movement region. We denote by $\Pr(G)$ the transition probability matrix of RLG, where $\Pr(U \to R)$ is a $|U| \times |R|$ matrix representing the transition probability between user nodes and region nodes, as defined in (2), and $\Pr(R \to B)$ is a $|R| \times |B|$ matrix representing the transition probability between region nodes and business nodes, as defined in (3); both are symmetric matrices. We use a random walk with restart to simulate the location-based business recommendation process.
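One common way to turn nonnegative edge weights into transition probabilities is to normalize the outgoing weights of each node so that they sum to one. The sketch below shows that step only; whether the paper normalizes in exactly this way is an assumption, and `row_normalize` is a hypothetical helper.

```python
def row_normalize(weights):
    """Convert {node: {neighbor: weight}} into {node: {neighbor: probability}}.

    Each node's outgoing weights are divided by their sum; nodes whose
    weights sum to zero get an empty distribution.
    """
    probs = {}
    for node, neighbors in weights.items():
        total = sum(neighbors.values())
        if total > 0:
            probs[node] = {n: w / total for n, w in neighbors.items()}
        else:
            probs[node] = {}
    return probs


# A user whose local-to-remote weight ratio is 3/2.
example_probs = row_normalize({"u1": {"r11": 3.0, "r12": 2.0}})
```

In the example, the user moves to the local region node with probability 0.6 and to the remote region node with probability 0.4.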

Making Recommendation on RLG
So far, several graph-based methods have been introduced into recommender systems [19][20][21] to model the interaction between users and items on a graph and to compute node similarity from a global perspective, instead of the local pairwise computation of neighborhoods [19]. They essentially transform the recommendation process into a graph search problem. Random walks have shown rather good performance in graph-based recommender systems. PageRank, a typical random walk algorithm, has been widely used in search engines to rank items globally. We now describe the recommendation process as a graph search problem in RLG, using the example shown in Figure 1. Suppose that the system needs to recommend businesses to an active user in one of his movement regions. We first determine the association between the user and each business that has not been reviewed by this user. The businesses are then sorted according to these associations, and the top $N$ businesses are chosen for recommendation.
For a user $u_i$, if a larger score falls on a business reviewed by most of his local neighbors, this user will probably review this business in his local movement region. We adapt a modification [22] of the PageRank algorithm to calculate the association between an active user $u_i$ and a recommended business $b_j$. For ease of description, let $V$ denote the set of nodes that can form paths from $u_i$ to business nodes; let $v_j \in V$ denote a node, regardless of whether it is a user node, region node, or business node; let $w_{jk}$ denote the weight of the link between nodes $v_j \in V$ and $v_k \in V$; and let $w_{ij}$ denote the link weight between node $u_i$ and $v_j \in V$; the matrix $(w_{jk})$ is then formed from $\Pr(G)$. Furthermore, let $a_j(l)$ denote the association degree between $u_i$ and node $v_j \in V$ when considering paths of length $l$ ($l \le L$). The algorithm for computing the association between $u_i$ and business nodes is given as Algorithm 1.
Here, the decay parameter, which takes a value in $[0, 1]$, downweighs longer paths. We fix it at 0.5 in our experiments. The most time-consuming part of this algorithm is from line 3 to line 9, which requires $O(|V|^2)$ computations over all $v_j$. However, the matrix $(w_{jk})$ is very sparse, with most elements equal to zero, and it is symmetric. This allows us to use a sparse, triangular matrix representation for $(w_{jk})$, which reduces the complexity to $O(|V| \cdot m)$, where $m$ is the maximum number of nonzero elements in each row of the matrix $(w_{jk})$.
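The listing of Algorithm 1 is not reproduced in this text, so the following is only a hedged sketch of a path-based association computation in the same spirit, not the algorithm itself. It assumes that the association of a node is the sum, over all walks of length at most `max_len` from the active user, of the product of edge weights multiplied by `decay` once per step. The functions `associations` and `recommend` and all weights in `example_graph` are hypothetical; only the topology follows Figure 1.

```python
def associations(graph, source, max_len=4, decay=0.5):
    """Sum decayed walk weights from `source` to every node over walks of length <= max_len.

    graph: dict mapping node -> {neighbor: weight}; nodes absent from the dict are sinks.
    """
    current = {source: 1.0}  # scores carried by walks of the current length
    total = {}               # accumulated association per node
    for _ in range(max_len):
        nxt = {}
        for node, score in current.items():
            # Only nonzero entries are stored, so the cost per step is
            # proportional to the number of edges touched, not |V|^2.
            for nbr, weight in graph.get(node, {}).items():
                nxt[nbr] = nxt.get(nbr, 0.0) + decay * score * weight
        for node, score in nxt.items():
            total[node] = total.get(node, 0.0) + score
        current = nxt
    return total


def recommend(graph, user, reviewed, businesses, top_n=10):
    """Rank unreviewed businesses by their association with `user`."""
    scores = associations(graph, user)
    unseen = [b for b in businesses if b not in reviewed and scores.get(b, 0.0) > 0.0]
    return sorted(unseen, key=lambda b: scores[b], reverse=True)[:top_n]


# Hypothetical weights on the Figure 1 topology (business nodes are sinks).
example_graph = {
    "u1": {"r11": 0.6, "r12": 0.4, "r21": 0.5},
    "u2": {"r21": 0.6, "r22": 0.4, "r11": 0.5},
    "r11": {"u1": 0.6, "u2": 0.5, "b1": 1.0, "b2": 1.0, "b4": 1.0},
    "r12": {"u1": 0.4, "b3": 1.0},
    "r21": {"u2": 0.6, "u1": 0.5, "b4": 1.0, "b5": 1.0},
    "r22": {"u2": 0.4, "b6": 1.0},
}
example_ranking = recommend(
    example_graph, "u1", {"b1", "b2", "b3", "b4"},
    ["b1", "b2", "b3", "b4", "b5", "b6"],
)
```

For user `u1`, business `b5` in the neighbor's local region is ranked ahead of `b6` in the neighbor's remote region, because `b5` is reachable by a short path and `b6` only by paths of length 4. The dictionary-of-dictionaries layout plays the role of the sparse matrix representation mentioned above.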

Experimental Evaluation
Data Set. We use a publicly available dataset from Yelp in the following experiments [23]. It covers a US city, Phoenix; each reviewed business has a location associated with a unique pair of latitude and longitude coordinates. The dataset contains 43873 users, 229907 reviews, and 11537 pieces of business information. About half of all users reviewed only one business, and consequently the dataset is very sparse (99.9545% sparsity). Other information about the dataset is given in Table 1.
We now discuss the location distribution of the businesses reviewed by users, which reveals users' movement regions. We first randomly select ten users from the dataset; the numbers of businesses they reviewed are 1, 1, 14, 22, 28, 49, 69, 91, 102, and 112, respectively, and the location distribution of these businesses is shown in Figure 2. Figure 2 indicates that almost all businesses reviewed by each user are located in a certain region, that most of these regions overlap, and that obvious zoning appears across the whole area. This observation agrees with that in [6]. To further verify it, we perform a statistical analysis of the percentage of businesses reviewed by each user in three circular regions, whose center is the user's movement center $c = (x_c, y_c)$ and whose radii are $d_{\max}/3$, $d_{\max}/2$, and $2d_{\max}/3$, respectively (as shown in Figures 3, 4, and 5); we call them the $d_{\max}/3$ region, the $d_{\max}/2$ region, and the $2d_{\max}/3$ region.
We can see from the three figures that the more businesses a user has reviewed, the higher the proportion of those businesses in the central zone is. This also demonstrates that most of users' mobility is restricted to a local region by their daily activities. Furthermore, the percentage of users who reviewed more than half of their businesses in the $d_{\max}/2$ region is 48.95%, 86.03%, and 99.48% in the three figures, respectively. Therefore, we call the $d_{\max}/2$ region the local movement region of each user and the rest the remote region.
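The sparsity figure can be checked from the three counts given above. The short computation below assumes that every review corresponds to a distinct user-business pair, so that the number of reviews equals the number of filled cells in the user-business matrix.

```python
users = 43873
businesses = 11537
reviews = 229907

# Fraction of user-business cells that contain a review.
density = reviews / (users * businesses)
sparsity = 1.0 - density

print(f"sparsity = {sparsity:.4%}")
```

The result agrees with the reported 99.9545% up to rounding in the last digit.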

Evaluation Metric and Compared Methods. We use the Hit ratio [24] as the metric for top-$N$ recommendation. Our dataset is split into a training part and a testing part: for each user, the latest businesses he reviewed are selected as test data, and the other businesses are selected as training data. When a recommendation is made, we always generate a list of $N$ ($N = 10$) businesses for every user in his whole movement region. If the test business appears in the recommendation list, we call it a hit, and the Hit ratio is the number of hits divided by the number of test cases. We compare the recommendation performance of our method with several existing methods: popularity-based (Pop@1), item-based collaborative filtering (ItemKNN@1), and user-based collaborative filtering (UserKNN@1), together with their extensions in which each user's movement region is divided into two regions (Pop@2, ItemKNN@2, and UserKNN@2).
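The hit-based evaluation can be sketched in a few lines. This sketch assumes one held-out test business per user and defines the Hit ratio as the fraction of users whose test business appears in their recommendation list; the function name `hit_ratio` and the dictionary layout are hypothetical.

```python
def hit_ratio(recommendations, test_items):
    """Fraction of users whose held-out business is in their top-N list.

    recommendations: {user: [business, ...]}, the top-N list per user.
    test_items: {user: business}, the held-out business per user.
    """
    if not test_items:
        return 0.0
    hits = sum(
        1
        for user, item in test_items.items()
        if item in recommendations.get(user, [])
    )
    return hits / len(test_items)


example_hit_ratio = hit_ratio(
    {"u1": ["b5", "b6"], "u2": ["b1"]},
    {"u1": "b5", "u2": "b2"},
)
```

In the example, only `u1`'s held-out business is recommended, so one of two test cases is a hit.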

Experimental Results. In this section, we present the results of all methods and show the performance of our method for cold users who have not reviewed any business. We first investigate the impact of the parameter $w_1^*/w_2^*$ and then compare the Hit ratios of the four methods when recommending to cold users.

The Impact of Parameter $w_1^*/w_2^*$. We focus on analyzing the parameter $w_1^*/w_2^*$, which governs the relative influence of users' local mobility preference formed by their daily activity and long-distance mobility preference formed by their social network ties in location-based business recommendation. The Hit ratios of all algorithms as this parameter is tuned are shown in Figure 6. Pop@1, ItemKNN@1, and UserKNN@1 do not have this parameter; thus, their Hit ratios are drawn as straight lines. The results show that Pop@2, ItemKNN@2, and UserKNN@2 outperform Pop@1, ItemKNN@1, and UserKNN@1, respectively, and our method always outperforms the others regardless of the parameter value. Moreover, setting the parameter to 3/2 gives the best Hit ratio for most of the methods, so we simply fix it to 3/2 in the following experiments.

Make Recommendation for Cold Users.
One challenge for most of existing methods is that the recommendation accuracy suffers when the user-business matrix is very sparse.
From Table 2, we can see that over half of the users reviewed just one business in the dataset. Traditional user-based collaborative filtering cannot recommend any business to these users, because it is difficult to find nearest neighbors for them. We regard the location of the business reviewed by such a user as the center of the user's local region and the average $d_{\max}/2$ of other users as the radius of his local region. We use (6) to calculate $\mathrm{sim}(u_i, u_j)$, and thus our method possesses the advantages of both user-based and item-based collaborative filtering. Our method can alleviate the sparsity problem by exploiting movement regions to find users' local neighbors and remote friends. To verify this hypothesis, we use the four methods to recommend businesses in their local regions to all cold users, and the results are shown in Table 3. Our method performs better than the other methods.
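The local region assigned to a cold user can be sketched as follows, assuming the $d_{\max}$ values of the other users have already been computed. The helper names `cold_user_region` and `in_region` are hypothetical.

```python
import math


def cold_user_region(business_location, other_users_d_max):
    """Local region of a user with a single reviewed business.

    The center is the location of that business; the radius is the
    average of d_max / 2 over the other users.
    """
    radius = sum(d / 2 for d in other_users_d_max) / len(other_users_d_max)
    return business_location, radius


def in_region(center, radius, point):
    """True if `point` lies inside the circle (Euclidean distance)."""
    return math.dist(center, point) <= radius


example_center, example_radius = cold_user_region((0.0, 0.0), [2.0, 6.0])
```

With other users' $d_{\max}$ values of 2 and 6, the cold user's radius is 2, so a business at (1, 1) falls inside the local region and one at (3, 0) does not.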

Conclusion
User mobility often exhibits short- and long-distance components, which are formed by daily activity and social network ties, respectively. Tracking and leveraging these factors for location-based business information recommendation poses great challenges.
In this paper, we construct a region-based location graph (RLG), which combines users' short-ranged mobility formed by daily activity with long-distance mobility formed by social network ties and can therefore recommend both local and long-distance business information to users. Moreover, it combines user-based collaborative filtering with item-based collaborative filtering and can generate recommendations for cold start users; consequently, it alleviates the cold start problem from which traditional recommender systems often suffer. The experiments on a real dataset confirm that the proposed method is more effective than the other methods.

Figure 1: A simple example of RLG.

Figure 2: The location distribution of businesses reviewed by ten users.

Figure 3:

Figure 4: The percentage of businesses reviewed by 4185 users in three circle regions (users who reviewed more than 10 businesses).

Figure 5: The percentage of businesses reviewed by 195 users in three circle regions (users who reviewed more than 100 businesses).

Figure 6: Hit ratios of all algorithms with different values of the parameter $w_1^*/w_2^*$.
Here, $d(\cdot, \cdot)$ is the Euclidean distance between two points, and according to the conclusion in [6], we set the fixed percentage of businesses in the local movement region to 0.7. In addition, we call two users local neighbors if their local movement regions overlap, denoted by $N_{loc}$, whereas we call them remote friends if the remote movement region of one user and the local movement region of the other user overlap, denoted by $N_{rem}$.
With the data model defined above, we formally define the problem of recommending business information within users' entire motion range as follows: there are some basic datasets of all users, including the review log set $B = \{b_j \mid j = 1, 2, \ldots\}$, the local neighbor set $N_{loc}$, and the remote friend set $N_{rem}$. For a specific query user, a recommendation method should return a ranked list of businesses that the user would like; moreover, the ranking score should consider both the user's different movement regions and his social relationships.
Figure 1 is a simple example of RLG containing two user nodes, four region nodes, and six business nodes. It shows that user node $u_1$ interacts with his two movement region nodes $r_{11}$ and $r_{12}$; likewise, user node $u_2$ interacts with his two movement region nodes $r_{21}$ and $r_{22}$. Furthermore, region node $r_{11}$ is linked to business nodes $b_1$, $b_2$, and $b_4$ because user $u_1$ reviewed businesses $b_1$, $b_2$, and $b_4$ in his local movement region $r_{11}$, and region node $r_{12}$ is connected with business node $b_3$ because user $u_1$ reviewed business $b_3$ in his remote movement region $r_{12}$. Similarly, region node $r_{21}$ is connected with business nodes $b_4$ and $b_5$, and region node $r_{22}$ is connected with business node $b_6$.

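Table 1: The total number of users who reviewed $n$ businesses.

In computing the association between a user and a business, we differentiate two types of paths: paths via local neighbor nodes and paths via remote friend nodes. The length of both types of paths is 4.

Popularity-Based (Pop) Method. The popularity-based method generates a ranking list based on the popularity of businesses in the training dataset. It is not a personalized method and consequently generates the same list of recommended businesses for every user.

Item-Based Collaborative Filtering (ItemKNN) Method. The item-based collaborative filtering method finds the $k$ businesses that are most similar to the businesses reviewed by each user.

Cosine similarity between users $u_i$ and $u_j$ over their coreviewed businesses, where $B(u)$ is the set of businesses reviewed by $u$ and $s_{i,b}$ is the score given by $u_i$ to business $b$:

$$\mathrm{sim}(u_i, u_j) = \frac{\sum_{b \in B(u_i) \cap B(u_j)} s_{i,b} \cdot s_{j,b}}{\sqrt{\sum_{b \in B(u_i) \cap B(u_j)} s_{i,b}^{2} \cdot \sum_{b \in B(u_i) \cap B(u_j)} s_{j,b}^{2}}}$$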