Mining the Relationship between Spatial Mobility Patterns and POIs

Passengers move between urban places for diverse interests, which drives metropolitan regions, as aggregations of urban places, to group into network communities. This paper examines the relationship between the spatial patterns (represented by the network communities) of mobility flows and places of interest (POIs). Furthermore, it intends to identify the categories of POIs that play the most significant role in shaping the spatial patterns of mobility flows. To achieve these purposes, we partition the study area into disjoint regions and construct a network with each partitioned region as a node and the connections between them as links weighted by the mobility flows. A community detection algorithm is applied to the network to discover spatial mobility patterns, and multiclass classification based on logistic regression is adopted to classify spatial communities featured by POIs. Taking the taxi systems of Shanghai and Beijing as examples, we detect spatial communities based on the movement strengths among regions and then investigate their correlations with POIs. We find that the communities' modularity correlates linearly with POIs; in particular, governments, hotels, and traffic facilities are of the most significance for generating the mobility patterns. This study provides insight into understanding spatial mobility patterns from the perspective of POIs.


Introduction
People move in a city, generating population mobility flows between places. Acquiring the volume of mobility flows in different places in a city is particularly important as it benefits a range of applications, such as selecting the location of a retail store to attract more customers and placing advertisements to reach as many consumers as possible [1,2]. Technological advances allow for precise measurements of mobility flows on large datasets [3][4][5][6][7][8][9][10][11][12][13], including taxi trajectories [14][15][16], mobile phone trajectories [17,18], and transport smart cards [19].
By solving the privacy-preserving problem of mobility traces [20][21][22][23][24], retrospective studies have focused on modeling the mobility flows from one place to another; for example, the universal model called the radiation model [25] has been proposed and applied to predict human movements [26]. Though the model is parameter-free and requires only the population distribution as input, it disregards the spatial cluster features of mobility flows: most people travel within a specific range of regions instead of the whole city, and some citizens share a similar regional scope. To analyze the spatial variability of urban mobility flows, we construct the spatial network with the metropolitan regions as nodes and the connections between them as links weighted by the aggregated strengths of interregion movements [1,17,27]. Communities in the spatial network are used for further analysis of the spatial patterns of mobility flows, as they offer a visual representation of the spatial cluster features of mobility flows; a community is a set of nodes that have more connections among themselves than with the rest of the nodes [1,28]. A community in the spatial network is called a spatial community in this paper and represents the spatial patterns of urban mobility flows. Community detection allows one to identify the inner-community links, which play a very important role in understanding the travel patterns and interactions among urban regions [29,30]. For example, based on the mobility flows around the city area of Shanghai, Liu et al. [16] built the spatial network and adopted community detection to model spatial patterns around the city area.
Combined with network techniques, applications based on mobility flows are widely developed in the field of urban computing [31,32]. For example, the centrality metrics of the network are used to estimate the importance of road segments [14], and the network connectivity is applied to reveal new latent links among urban regions [33]. The studies mentioned above provide insights into using mobility flows in networks to reveal mobility patterns or urban structures. However, they have not explained the underlying mechanisms that motivate urban mobility flows from the land-use and socioeconomic perspectives.
Urban mobility flows are rooted in people's traveling activities (e.g., work or entertainment) [19,34], which are reflected by specific POIs. Retrospective studies of spatial communities improve our ability to analyze mobility flows from the perspective of the network. However, they do not provide insight into the factors that motivate the population mobility dynamics. As each urban movement contains an origin and a destination that are determined by the travel motivation [35], the regions serving as the origin and destination of a trip are the cause of mobility flows. In [19,36,37], POIs are collected to explain the activity patterns and to model the dynamic decision-making process that shapes individuals' movements. Besides, POIs are combined with mobility flows to discover functional regions [2], where the segmented regions of the city area carry socioeconomic functions as people live in the regions and POIs fall in the regions. In [38], POIs are applied to find the characteristics of resident trips based on a clustering method. That study finds that the residents' travel pattern on working days can be expressed as "spatial relative dispersion-spatial aggregation-spatial relative dispersion." The effectiveness of these models indicates that mobility flows are related to the POI distribution among urban regions.
However, no existing research concentrates on the relationship between spatial communities and POIs, which should be taken into consideration in future urban planning of POIs in order to predict community changes. In this study, we aim to study the relationship between spatial communities and POIs, and we intend to find the group of specific categories of POIs that explain the identified communities. Taking large-scale, real-world datasets of Shanghai and Beijing in China as examples, we construct networks from the urban regions and interregion movements and detect communities in them. We collect POIs for each node to characterize people's mobility motives and study the relationship between spatial communities and POI features by adopting stepwise logistic regression.
Researching the inherent relationship between spatial communities of mobility flows and POIs provides new insight into the underlying mechanism of urban movements. In accordance with the research aim of our work, the rest of this paper is organized as follows: Section 2 presents the methods used in this paper, including the relationship estimation model and the significant POIs identification method. Experiments are implemented in Section 3. We discuss the experimental findings in Section 4. Finally, we briefly conclude the paper in Section 5.

Relationship Estimation Model.
A mobility used in this paper is a 2-tuple $\langle (x_o, y_o), (x_d, y_d) \rangle$. Both $(x_o, y_o)$ and $(x_d, y_d)$ are geospatial positions, representing the origin and destination of a trip, respectively. In detail, the OD pair represents a trip starting at $(x_o, y_o)$ and ending at $(x_d, y_d)$.
As shown in Figure 1, to construct the spatial networks, the study area is segmented into disjoint grids, and each grid $g_i$ is set as a node. Trips between two nodes indicate the existence of an edge or a linkage. After extracting mobility flows from the travel trace dataset, the volume of mobility originating from $g_i$ and ending in $g_j$ is set as the weight $w_{ij}$ from $g_i$ to $g_j$. Thus a weighted and directed network is constructed.
As shown in Figure 1, some nodes have much stronger connections among themselves than with other nodes. By dividing the network into densely connected subnetworks, the urban area is partitioned into intensely interactive subregions. In network science, community detection methods can partition an entire network into tightly connected subnetworks, called communities, and reveal the network's clustering characteristics.
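The aggregation of trips into edge weights can be illustrated with a minimal sketch. It assumes that each trip has already been matched to an origin cell and a destination cell; the function name `build_flow_network` and the toy cell labels are hypothetical and not part of the original method description.

```python
from collections import Counter


def build_flow_network(od_cells):
    """Aggregate trips into directed edge weights.

    od_cells: iterable of (origin_cell, destination_cell) pairs, one per trip.
    Returns a dict mapping (i, j) to the number of trips from cell i to cell j.
    Trips that start and end in the same cell are kept as self-loops here,
    which is an assumption of this sketch.
    """
    weights = Counter()
    for origin, destination in od_cells:
        weights[(origin, destination)] += 1
    return dict(weights)


# Hypothetical toy trips between three grid cells.
trips = [("g1", "g2"), ("g1", "g2"), ("g2", "g1"), ("g2", "g3")]
weights = build_flow_network(trips)

# Nodes are the cells that appear in at least one trip.
nodes = sorted({cell for edge in weights for cell in edge})
```

Keeping the network as a plain dictionary of edge weights makes it straightforward to pass to any community detection routine that accepts a weighted, directed edge list.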
A community, also called a cluster or a module, is typically regarded as a group of vertices which probably share common properties or play similar roles within the network, and the metric of modularity is commonly used to evaluate community detection results [39][40][41]. When applied to weighted and directed networks, the modularity $Q$ is defined as [42]

$$Q = \sum_{c} \left[ \frac{w_{cc}}{w} - \frac{w_c^{\mathrm{in}} \, w_c^{\mathrm{out}}}{w^2} \right]. \tag{1}$$

Here $w_{cc}$ is the total weight of links starting and ending in module $c$, $w_c^{\mathrm{in}}$ and $w_c^{\mathrm{out}}$ are the total in- and out-weight of links in module $c$, and $w$ is the total weight of all links in the network.
To optimize $Q$, the vast majority of search strategies take one of the following steps to evolve the starting network partition: merging two communities, splitting a community into two, and moving nodes between distinct communities. We employ Combo [43], a high-quality modularity-based community detection algorithm that adopts all three strategies, as the community detection technique.
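As an illustration of how the modularity of a weighted, directed network is evaluated for a given partition, the following minimal sketch computes the per-module sums described above. It only scores a partition; it is not the Combo optimizer. The function name `directed_modularity` and the toy network are hypothetical.

```python
def directed_modularity(weights, community_of):
    """Modularity of a partition of a weighted, directed network.

    weights: dict mapping (i, j) to the weight of the link from i to j.
    community_of: dict mapping each node to its module label.
    """
    w_total = sum(weights.values())
    w_inside, w_in, w_out = {}, {}, {}
    for (i, j), w in weights.items():
        ci, cj = community_of[i], community_of[j]
        w_out[ci] = w_out.get(ci, 0.0) + w  # weight leaving nodes of module ci
        w_in[cj] = w_in.get(cj, 0.0) + w    # weight entering nodes of module cj
        if ci == cj:
            w_inside[ci] = w_inside.get(ci, 0.0) + w  # link starts and ends in ci
    q = 0.0
    for c in set(community_of.values()):
        q += (w_inside.get(c, 0.0) / w_total
              - w_in.get(c, 0.0) * w_out.get(c, 0.0) / w_total ** 2)
    return q


# Hypothetical toy network: two tightly linked pairs joined by one weak link.
toy_weights = {("a", "b"): 3, ("b", "a"): 3, ("c", "d"): 2, ("d", "c"): 2, ("b", "c"): 1}
two_modules = {"a": 0, "b": 0, "c": 1, "d": 1}
one_module = {"a": 0, "b": 0, "c": 0, "d": 0}

q_two = directed_modularity(toy_weights, two_modules)
q_one = directed_modularity(toy_weights, one_module)
```

In this toy case the two-module partition scores higher than the trivial single-module partition, whose modularity is zero by construction.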
In the spatial networks, partitioned regions are set as nodes, and the number of OD pairs is set as the weight on the directed edge. To explain how the spatial communities are generated by the mobility flows in the network, the most direct evidence for the underlying cause is obtained by matching the mobility patterns to the POIs distributed among regions. We match the POIs of the studied area to nodes according to their geolocation using map matching. We obtain the POI features of each node $i$ in the network, denoted as $x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})$, where $n$ is the number of POI categories (17 in our case studies) and $x_i^{(j)}$ is the number of POIs of category $j$ in node $i$. After applying the community detection algorithm to the networks, the nodes are partitioned into disjoint sets (spatial communities). Nodes in the same community share the same value $k$ of the classification label $Y$. The community label $Y$ is then set as the dependent variable. Suppose that the value set of $Y$ is $\{1, 2, \ldots, K\}$; then the multinomial logistic regression is defined as

$$P(Y = k \mid x) = \frac{\exp(\beta_k \cdot x + b_k)}{1 + \sum_{l=1}^{K-1} \exp(\beta_l \cdot x + b_l)}, \quad k = 1, \ldots, K-1, \qquad P(Y = K \mid x) = \frac{1}{1 + \sum_{l=1}^{K-1} \exp(\beta_l \cdot x + b_l)}, \tag{2}$$

where $\beta_k = (\beta_k^{(1)}, \beta_k^{(2)}, \ldots, \beta_k^{(n)})$ and $b_k$ are parameters of the model. Given the sample set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, let $T_k$ denote the samples labelled with $k$ and $\theta = (\beta, b)$; then maximum likelihood estimation (MLE) is applied to calculate the parameters:

$$\hat{\theta} = \arg\max_{\theta} \sum_{k=1}^{K} \sum_{x \in T_k} \log P(Y = k \mid x; \theta). \tag{3}$$

We adopt a stepwise strategy to select POI categories for the logistic regression, and the $R$-square fitness metric ensures that no redundant independent variables are selected. This means that we choose the categories of POIs that are useful for distinguishing the spatial communities.
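To make the reference-class form of the multinomial logistic regression concrete, the sketch below evaluates the class probabilities and the log-likelihood that maximum likelihood estimation would maximize. It assumes the last class acts as the reference class and does not include an optimizer or the stepwise variable selection; the function names and toy values are hypothetical.

```python
import math


def class_probabilities(x, coefficients, intercepts):
    """Return [P(Y=1|x), ..., P(Y=K|x)] with class K as the reference class.

    coefficients: list of K-1 coefficient vectors, one per non-reference class.
    intercepts: list of K-1 intercepts.
    """
    scores = [sum(c * v for c, v in zip(coef, x)) + b
              for coef, b in zip(coefficients, intercepts)]
    denominator = 1.0 + sum(math.exp(s) for s in scores)
    probabilities = [math.exp(s) / denominator for s in scores]
    probabilities.append(1.0 / denominator)  # reference class K
    return probabilities


def log_likelihood(samples, coefficients, intercepts):
    """Sum of log P(Y=y|x) over (x, y) samples, with labels y in 1..K."""
    total = 0.0
    for x, y in samples:
        total += math.log(class_probabilities(x, coefficients, intercepts)[y - 1])
    return total


# Hypothetical example: two POI features, three communities, all-zero parameters.
zero_coefficients = [[0.0, 0.0], [0.0, 0.0]]
zero_intercepts = [0.0, 0.0]
uniform = class_probabilities([1.0, 2.0], zero_coefficients, zero_intercepts)
ll = log_likelihood([([1.0, 2.0], 1), ([0.0, 1.0], 3)], zero_coefficients, zero_intercepts)
```

With all parameters set to zero, every community is equally likely, which gives a convenient baseline against which a fitted model's log-likelihood can be compared. For large feature counts the exponentials can overflow, so a practical implementation would rescale the features or shift the scores first.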

Significant POIs Identification.
For multiclass classification based on logistic regression, one class is always set as the reference class, as shown by (2). To identify the categories of POIs that affect the spatial communities, each community is set as the reference class in turn, and the frequency with which a POI category is significant is set as one element $f_{ij}$ of the feature vector $F_i$ of a community $C_i$. As shown in (4) and (5), each element of the feature vector for a community $C_i$ represents the frequency of the $j$th significant category of POIs in community $C_i$, and the feature value $S_i$ of a community is calculated as the norm of its feature vector multiplied by its entropy. This ensures that we select communities with more significant POI categories and more diverse significance frequencies.

$$f_{ij} = \mathrm{feq}(\mathrm{sig}(\mathrm{POI}_j)), \tag{4}$$

$$S_i = |F_i| \cdot H(F_i). \tag{5}$$

We then identify significant POIs by (5): the top $p$ percent of communities with the largest $S_i$ are selected as the candidate set, and for each candidate community the most significant category is added as one element of the ultimate significant POI set.
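The candidate selection described above (a feature value given by the norm of the frequency vector multiplied by its entropy, followed by keeping the top percentage of communities) can be sketched as follows. The source does not specify which norm or which entropy is used, so this sketch assumes the Euclidean norm and the Shannon entropy (natural logarithm) of the normalized frequencies; the function names and toy frequencies are hypothetical.

```python
import math


def feature_value(frequencies):
    """Norm of the significance-frequency vector times its entropy.

    Assumptions of this sketch: Euclidean norm, Shannon entropy of the
    frequencies normalized to sum to one.
    """
    total = sum(frequencies)
    if total == 0:
        return 0.0
    norm = math.sqrt(sum(f * f for f in frequencies))
    entropy = -sum((f / total) * math.log(f / total) for f in frequencies if f > 0)
    return norm * entropy


def select_candidates(frequencies_by_community, percent):
    """Keep the top `percent` of communities ranked by feature value."""
    ranked = sorted(frequencies_by_community,
                    key=lambda c: feature_value(frequencies_by_community[c]),
                    reverse=True)
    keep = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[:keep]


def most_significant_categories(frequencies_by_community, candidates, categories):
    """For each candidate, pick the category with the highest frequency."""
    picked = []
    for community in candidates:
        freqs = frequencies_by_community[community]
        picked.append(categories[freqs.index(max(freqs))])
    return picked


# Hypothetical significance frequencies for four communities and three categories.
categories = ["traffic facility", "enterprise", "hotel"]
frequencies = {"C1": [3, 2, 1], "C2": [1, 2, 1], "C3": [4, 0, 0], "C4": [1, 0, 0]}
candidates = select_candidates(frequencies, 50)
significant = most_significant_categories(frequencies, candidates, categories)
```

In this toy case the community with the largest norm is not selected, because all of its frequency sits in one category and its entropy is zero; this is the trade-off that the product of norm and entropy is meant to capture.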
For example, suppose that we obtain four communities, and the significance frequency of each POI for a community is shown in Table 1.
$|F_i|$ selects communities that contain a larger significance frequency over all significant POI categories, and the entropy tends to select spatial community candidates with more frequency variation. Specific to $C_2$ and $C_3$, $S_2 = 3$ and $S_3 = 1$; though $|F_3| > |F_2|$, the entropy of $C_3$ is smaller than that of $C_2$, meaning a smaller difference between the significance frequencies of POIs. When $p$ is set as 50, we select two communities to identify the significant POIs that feature the spatial communities. $C_1$ and $C_2$ are selected, and the POIs of traffic facility and enterprise are then identified as the ultimate significant ones that account for forming this community snapshot.

of OD = $\langle (x_o, y_o), (x_d, y_d) \rangle$ share the same occupation state $s = 1$. The extracted points for Shanghai and Beijing are, respectively, shown in Figure 3.

Wireless Communications and Mobile Computing
To construct the spatial networks, we first divide the spatial area into grids of 1 km by 1 km using OpenStreetMap (OSM) (http://www.openstreetmap.org/copyright); each grid is then set as a node in the network. We extract mobility flows between nodes by matching origin and destination points to grids using OSM. The mobility flow volume from grid $g_i$ to grid $g_j$ is set as the weight on the directed edge. Disregarding grids visited by no OD pairs, 2926 nodes remain for the spatial network of Shanghai and 3995 nodes for Beijing. The datasets of mobility flows are shown in Table 2.
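The matching of origin and destination points to 1 km by 1 km grids can be approximated without a map service, as in the minimal sketch below. It is a stand-in for the OSM-based matching used in the study: it assumes a flat-earth (equirectangular) approximation around a reference corner, and the constant, function name, and coordinates are hypothetical.

```python
import math

KM_PER_DEGREE_LATITUDE = 111.32  # approximate; assumption of this sketch


def grid_cell(lat, lon, lat0, lon0, cell_km=1.0):
    """Return the (row, column) index of the grid cell containing (lat, lon).

    (lat0, lon0) is the south-west corner of the study area. Distances are
    approximated on a plane, which is reasonable only at city scale.
    """
    km_per_degree_longitude = KM_PER_DEGREE_LATITUDE * math.cos(math.radians(lat0))
    row = math.floor((lat - lat0) * KM_PER_DEGREE_LATITUDE / cell_km)
    column = math.floor((lon - lon0) * km_per_degree_longitude / cell_km)
    return row, column


# Hypothetical corner and point, roughly in the Shanghai area.
corner_cell = grid_cell(31.0, 121.0, 31.0, 121.0)
point_cell = grid_cell(31.02, 121.02, 31.0, 121.0)
```

Each OD pair then maps to a pair of cell indices, which serve as the node labels of the spatial network.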
We use the Baidu APIs (Liu et al., 2015) to collect the POIs in the two cities. Seventeen categories of POIs are collected, as shown in Figure 4. As the categories of POIs are set as the independent variables in the relationship estimation model in this study, we label them as $x^{(j)}$, as shown in Table 3. Each category of POIs is set as a specific dimension of the independent variable. The number of POIs of each category in the two cities is shown in Table 3 and Figure 4; in total, we collect 1,446,865 POIs for the network of Shanghai and 1,405,954 POIs for the network of Beijing.

Table 3 (rows 9 to 14: number, category, variable, and POI counts for the two cities):
9   Culture         $x^{(9)}$    3,971    3,723
10  Scenic spot     $x^{(10)}$   56,996   48,463
11  Auto service    $x^{(11)}$   50,898   55,479
12  Living service  $x^{(12)}$   158,121  149,576
13  Food            $x^{(13)}$   86,301   82,021
14  Shopping        $x^{(14)}$   208

Relationship between Spatial Communities and POIs.
The spatial communities are affected by the travel distance. Thus we add a distance threshold (DT) to the spatial community detection process. As shown in Figure 5, for the networks of Shanghai, the edge number and the mobility flow reach 90% as the distance threshold gradually increases to 20 km and 14 km, respectively; similarly, for the networks of Beijing, the critical distances are 25 km and 9 km.
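The effect of the distance threshold on the share of retained edges and mobility flow can be sketched as follows. The sketch assumes that each edge carries a precomputed length in kilometres together with its flow volume; the function names and toy values are hypothetical.

```python
def coverage_within_threshold(edges, threshold_km):
    """Share of edges and share of flow kept when edges longer than the threshold are dropped.

    edges: non-empty list of (distance_km, flow) pairs.
    Returns (edge_share, flow_share).
    """
    kept = [(d, f) for d, f in edges if d <= threshold_km]
    edge_share = len(kept) / len(edges)
    flow_share = sum(f for _, f in kept) / sum(f for _, f in edges)
    return edge_share, flow_share


def critical_distance(edges, target=0.9):
    """Smallest edge length at which the cumulative flow share reaches `target`."""
    total = sum(f for _, f in edges)
    running = 0.0
    for distance, flow in sorted(edges):
        running += flow
        if running / total >= target:
            return distance
    return None


# Hypothetical edges: (length in km, number of trips).
toy_edges = [(1.0, 50), (5.0, 30), (10.0, 15), (30.0, 5)]
edge_share, flow_share = coverage_within_threshold(toy_edges, 10.0)
critical = critical_distance(toy_edges, target=0.9)
```

Because short edges typically carry most of the trips, the flow share reaches a given level at a smaller threshold than the edge share, which matches the pattern reported for both cities.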
The modularity of the community detection results for the two cities is shown in Figure 6, together with the $R$-square of the stepwise logistic regression. It can be seen that the modularity changes as the distance threshold (DT) increases. A larger DT means that more edges and more mobility flows are added to the networks, so that the modularity becomes smaller for the networks of both Shanghai and Beijing. The modularity also tends to converge as the distance threshold becomes larger, and the trend of the modularity variation is quite similar for both networks.
As shown in Figures 7(a) and 7(b), even nodes in the suburban area are connected to spatially close communities. Increasing the distance threshold by 1 km (from 16 km to 17 km) causes little variation in the spatial community detection results.
Note that the mobility flow density of the Beijing network is 466, while it is just 130 for the network of Shanghai. The modularity obtained for the spatial networks of Beijing is generally larger than that of Shanghai, as shown in Figure 6.
The results of regression fitness ($R$-square) for both networks also tend to converge. The median value of the adjusted $R^2$ is 0.3 for the Shanghai networks and 0.48 for the Beijing networks. This indicates that the spatial communities are closely correlated with the POI features. The quantitative correlation between the modularity and the $R$-square is shown in Figure 8. It shows that the adjusted $R^2$ is positively and linearly correlated with the modularity. This indicates that the spatial communities are correlated with POIs and can be explained by POIs.
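A linear relationship between two series such as the modularity and the adjusted $R^2$ can be checked with the Pearson correlation coefficient and an ordinary least-squares line, as in the minimal sketch below. The input values are hypothetical placeholders, not the values reported in the paper, and the function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Hypothetical modularity and adjusted R-square values, one pair per distance threshold.
modularity_values = [0.5, 0.6, 0.7, 0.8]
r_square_values = [0.2, 0.3, 0.4, 0.5]
r = pearson_r(modularity_values, r_square_values)
slope, intercept = least_squares_line(modularity_values, r_square_values)
```

A coefficient close to 1 together with a positive slope would correspond to the positive linear correlation described in the text.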

Identified POIs.
The community detection results obtained without a distance threshold for Shanghai and Beijing are depicted in Figure 9.

Table 4: Identified significant POIs. Columns: POI; Beijing using [38]; Beijing using our model; Shanghai using our model.

As shown in Figure 9, we get seven spatial communities for the spatial network of Shanghai and thirteen spatial communities for Beijing. It can be observed that both cities are polycentric.
Each community is then set as the reference class in turn to conduct the stepwise regression, and we use the $R$-square as the metric for evaluating the regression results. The significance of the variables is used to identify the POI categories that are closely correlated with the spatial communities. Note that some POI categories are identified for both cities. When $p$ is set as 50, which means that we select the half of the communities with the largest values of $S$, the significance frequency of each POI category in a community is as shown in Figures 10 and 11. The identified POIs for both cities are shown in Table 4.
To verify the effectiveness of the proposed method, the POIs identified using the method proposed in [38] are also listed in Table 4. The reference method also identifies the POI categories of living service, government, and education in Beijing, which supports the effectiveness of the proposed significant POI identification method. Compared with the findings in [38], which partitions the time of day into three intervals (morning, evening, and night), the proposed identification method finds that traffic facilities play an important role in shaping the community pattern in urban transport networks for both Beijing and Shanghai. These findings fit the actual situation in daily life, as traffic facilities satisfy daily commuting needs. This further supports the effectiveness of the proposed model.
The significant POIs for generating spatial communities in the network of Shanghai are shopping, enterprise, traffic facility, government, finance, and hotel, while those for the network of Beijing are living service, traffic facility, food, government, and hotel. The POIs of traffic facility, government, and hotel are identified as the common significant POIs that distinguish the communities in both networks.

Discussion
Understanding the spatial patterns and finding the driving factors of urban mobility flows help planners to evaluate urban construction plans. To study the drivers of communities of mobility flows, we propose to estimate the relationship between spatial communities and POIs.
Using the taxi systems of Shanghai and Beijing as case studies, the experimental results show that the communities in spatial networks generated by mobility flows correlate linearly with the POIs. To further recognize the specific factors that drive the spatial communities of mobility flows, stepwise logistic regression is used, and it is found that the POIs of governments, hotels, and traffic facilities are common features that play an important role in distinguishing communities for both cities.
From the socioeconomic perspective, the locations of governments in a city attract various types of facilities and improve the economic development of the surrounding area, which is reflected by the spatial communities of mobility flows. Similarly, hotels are always located in areas with numerous facilities. A small number of hotels can be a better representative of the regional features that attract mobility flows [44]. Traffic facilities play a role in forming the community pattern of mobility flows [45][46][47][48], which may be because these facilities satisfy the essential need for daily traveling and life. Note that the mobility flows used in this paper are extracted from taxicabs only. Thus, another reason for the significance of these categories of POIs may be that citizens are more likely to choose taxicabs because the departure time can be chosen freely. Possibly, taxicabs are also preferred as a means of transfer to the public transport system, such as train stations, subway stations, or bus stations. After all, most commuters are more likely to choose buses or the subway, and travelers are less likely to take taxis for long trips, especially in these two Chinese metropolises.
The computational complexity of the relationship estimation and POI identification model is dominated by the community detection process, which has an upper bound on the execution time of $O(n^2 \log c)$, where $n$ is the number of nodes and $c$ the number of communities in the network.
This study has some limitations. Mobility flows are extracted only from taxi trajectories, and other spatial community patterns may be found with other data sources. However, the same analysis methods could be used. In this study, we focus on the spatial communities generated by the taxi systems; future studies could consider the similarities and differences of the spatial communities in other public transport systems. Another limitation is that we adopt only the number of POIs in each category as the influencing factor, disregarding the scale of each POI, which should be considered in future work.

Conclusion
This paper proposes a model for estimating the relationship between the spatial communities of mobility flows and urban POIs, and thereby identifies the categories of POIs that drive mobility flows into network communities.
Taking the mobility flows in Beijing and Shanghai as case studies, we find that the spatial communities can be explained by the POIs. Specifically, the POIs of traffic facilities, government, and hotel are of great significance in shaping the spatial communities in both cities. This implies that experts could monitor the spatial distribution of urban mobility flows by observing the distribution of POIs, and urban planners could influence the spatial communities of mobility flows by changing the locations of these categories of POIs or adding new POIs of these categories.
In the future, we will further study the formation mechanism of the spatial communities of mobility flows. Meanwhile, we are going to employ other mobility data sources, such as cell-tower traces and check-ins in location-based services.

Figure 1: Illustration of the network and communities (this is the prototype proposed by Liu et al., 2015). To construct a network based on mobility flows, the study area is divided into small regions (a), with each small region corresponding to a node in the network. A directed edge or linkage exists between two nodes if there are mobility flows from one node to the other. The weight of an edge equals the volume of mobility flows represented in (b, c). Graphic (d) provides an illustration of the communities detected from a network, which is divided into four parts (depicted by four circles) in which the subnetworks have relatively dense connections. The community detection result corresponds to closely connected subregions (e).

Figure 2: Mobility extraction from taxi trajectories. The occupation state changing from unoccupied to occupied or from occupied to unoccupied is adopted to extract the origin and destination of an urban movement.
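The extraction of origins and destinations from changes in the occupation state can be sketched as follows. The sketch assumes a time-ordered list of (x, y, s) records for one taxi, where s = 1 means occupied, and it takes the first unoccupied record after an occupied run as the destination; both the record layout and that choice are assumptions, and the function name is hypothetical.

```python
def extract_od_pairs(trajectory):
    """Extract (origin, destination) pairs from one taxi's GPS records.

    trajectory: time-ordered list of (x, y, s), with s = 1 if occupied, else 0.
    A trip already in progress at the first record and a trip still open at
    the last record are skipped, because one of their endpoints is unknown.
    """
    od_pairs = []
    origin = None
    previous_state = None
    for x, y, s in trajectory:
        if previous_state == 0 and s == 1:
            origin = (x, y)  # unoccupied -> occupied: pick-up point
        elif previous_state == 1 and s == 0 and origin is not None:
            od_pairs.append((origin, (x, y)))  # occupied -> unoccupied: drop-off point
            origin = None
        previous_state = s
    return od_pairs


# Hypothetical trajectory with two complete trips.
toy_trajectory = [(0, 0, 0), (1, 1, 1), (2, 2, 1), (3, 3, 0),
                  (4, 4, 0), (5, 5, 1), (6, 6, 0)]
toy_od = extract_od_pairs(toy_trajectory)
```

Intermediate occupied records between a pick-up and a drop-off do not change the state and are therefore ignored, so only the two endpoints of each trip are kept.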

Figure 4: Categories of POIs for Shanghai and Beijing.

Figure 10: The significance frequency of each POI category of 13 communities for the network of Beijing is shown.

Figure 11: The significance frequency of each category in the seven spatial communities for the network of Shanghai is depicted.

Table 1: Illustration of the significant POIs identification.
$(x, y)$ is a pair of spatial coordinates representing latitude and longitude. $s = 1$ means that the taxi is occupied by passengers; otherwise $s = 0$. The flag $s$ bound to each trajectory position is essential for judging the taxi occupation state, which is used to extract the origin and destination points (OD) of a trip. All other GPS points between a pair

Table 2: Studied area and OD number for the networks of Shanghai and Beijing.

Table 3: Seventeen categories of POIs.