Understanding City-Wide Ride-Sourcing Travel Flow: A Geographically Weighted Regression Approach

The emerging ride-sourcing service has become an important element of urban mobility. A challenging question underlying the provision of such service is how and to what extent the built environment affects origin-destination (OD) travel flows. This paper employs the geographically weighted regression (GWR) model to analyze the OD-based ride-sourcing travel flow and compares it with the existing ordinary least squares (OLS) model and spatial autocorrelation model (SAM). We have collected ride-sourcing order data in Hangzhou, China, to provide an accurate source for acquiring ride-sourcing travel flow. We investigate the effects of the residential area, points of interest (POIs), and transit stations on ride-sourcing travel flow among traffic analysis zones (TAZs). The results show the following: (a) GWR has better goodness-of-fit than SAM and OLS. (b) Residential area, enterprise, and bus stations have positive correlations with ride-sourcing OD flows, but education and subway stations have negative correlations. We have further investigated the issue and found that the relationship between the bus station and OD flow is not causal but arises from collinearity between the two variables. Bus stations are built at locations with high demand, but their capacity is not large enough to reduce the ride-sourcing flow to a low level, which results in a positive coefficient. (c) Based on the estimated coefficients, the prediction of ride-sourcing flows is feasible, supporting the impact analysis for urban land use and transportation planning. This paper contributes to understanding OD-based ride-sourcing travel flow distributions and provides a framework of long-term OD flow prediction for urban land use and transportation planning.


Introduction
Government agencies and urban planning departments attach great importance to the prediction and understanding of large-scale origin-destination (OD) flow. The OD flow pattern reflects the distribution of travel demand and reveals the human mobility pattern. It helps plan traveling routes [1,2], discover commuting regularity [3], and analyze land use properties [4]. An accurate estimation of OD matrices helps decision-makers better coordinate urban resources and mobility demand. The advanced technologies of intelligent transportation systems (ITS) have offered extensive data collection methods to address the trip pattern identification problem. These technologies and data, including the automatic vehicle identification (AVI) technologies [5], radio frequency identification (RFID) and the license plate-based AVI [6], taxi GPS trace data [3-5], mobile phone data [7,8], and smart card fare data [9], have been studied extensively in the past. Apart from transit and taxi, ride-sourcing services offered by platforms like Uber and Lyft have made tremendous changes to transportation systems and have become an essential component of city transportation [10]. For instance, by 2020, Uber offered ride-sourcing services in more than 900 cities in 70 countries [11]. The number of rides served by Lyft had already reached one billion in the region of the US, Toronto, and Canada by September 2018 [12].

Objective.
There have been numerous studies in the literature investigating the influences of the built environment on travel demand. However, the analysis of OD-based ride-sourcing flows is inadequate. This paper aims to fill this research gap by applying the geographically weighted regression (GWR) model and two baseline models. We analyze and explain the relationship between OD flow and independent variables (built environment) on the origin and destination levels and then illustrate the marginal effect of the residential area and subway on ride-sourcing OD flow based on urban land use and transportation planning policies.

Methodology.
We adjust the GWR model to analyze the OD-based ride-sourcing travel flow and take the ordinary least squares (OLS) model and spatial autocorrelation model (SAM) as comparisons. The implications of model coefficients are analyzed. We collect the emerging ride-sourcing order data that can accurately reflect the OD flow distribution of on-demand ride services. We have also collected the built environment data for 70 traffic analysis zones (TAZs) in Hangzhou City, China, including the residential area, ten types of POIs (i.e., beauty, restaurant, education, enterprise, medicine, hotel, house, entertainment service, tourist spots, and shopping), and transit stations (i.e., bus and subway). We further analyze the policy influence of subway station development in 2020 and the increase of the residential areas in 2050 on TAZs. The plausible reasons for future OD flow changes are discussed and interpreted. The results could provide insights into the importance of existing factors and a better understanding of ride-sourcing flow distribution.

Results.
For the total OD flow, we find that the residential area, enterprise, and bus station variables are positively related to OD flow at both the origin and destination levels. However, education and the subway are always negatively related. When analyzing the results of different periods, we find that the enterprise attracts people in the morning rush hour, while in the evening rush hour, people travel from zones with dense enterprise buildings. There are some differences between weekdays and weekends. Entertainment service and tourist spots play a more important role on the weekend. We find that the residential area's marginal effect is positive in most TAZs at the inflow, outflow, and intraflow levels. In contrast, the subway station has negative effects on most TAZs.

Contribution.
The paper's major contributions are threefold: (a) This study is among the first attempts to utilize ride-sourcing order data to explore the influence of the built environment on OD flows. (b) We explore the influence of the built environment on ride-sourcing OD flow by applying three models (OLS, SAM, and modified GWR). The results show that GWR has better goodness-of-fit than SAM and OLS. (c) The future influences of built environment changes due to local government policies on ride-sourcing OD flows are estimated based on marginal analysis.
The rest of the paper is organized as follows: Section 2 reviews the related work on OD flow estimation in terms of models used to analyze travel flows over time. Section 3 introduces the three models (i.e., GWR, OLS, and SAM). Section 4 introduces the data collected for this study. Section 5 presents the results of the three models. Finally, Section 6 summarizes the findings and provides the future direction of research.

Literature Review
There has been a vast body of literature exploring the relationship between the built environment and traffic flow. However, when it comes to ride-sourcing data, only a handful of papers exist. Sabouri et al. [13] analyzed how Uber demand in 24 diverse US regions was affected by 5D variables with the multilevel modeling (MLM) method. They found that demand was negatively related to intersection density and destination accessibility variables. Yu and Peng [14] explored the relationship between ride-sourcing demand (from the RideAustin company) and built environment and socioeconomic factors. The factors they chose covered income, age, education, population, and transit accessibility. By applying the geographically weighted regression model, Bao et al. [15] established the relationships between ride-sourcing usage and various environmental factors such as commercial, residential, and parking areas. Gerte et al. [16] used a linear panel model to estimate the relationship between ridesharing adoption and time, built environment, and demographic variables.
However, the studies above aggregated the ride-sourcing flow into origin-based demand flow rather than OD flow among different regions. The latter reflects travel mobility in a more detailed way. By analyzing the relationship between OD flow and the built environment, we can examine how built environment factors affect inflow, outflow, and intraflow, which is a research gap worth filling.
Many studies have used spatial analysis approaches to understand the effect of the built environment on various transportation usages. The spatial autoregressive model (SAM) is one of them. LeSage and Pace [17] proposed the standard SAM, which considered the interaction among regions. Many researchers applied the model to predict traffic flow [18-21]. The GWR model differs from SAM by allowing the coefficients of explanatory variables to vary over space [22]. Plenty of researchers applied the GWR model to transit data. Cardozo et al. [23] explored station-level transit ridership. The prediction of ridership at the Madrid subway stations showed that GWR outperformed traditional ordinary least squares multiple regression. Chiou et al. [24] identified the major factors for the public transit usage rate in Taiwan. The results showed that the GWR model had better accommodation of spatial autocorrelation and better prediction accuracy than the Tobit regression model. Ma et al. [25] explored the relationship between the built environment and transit ridership in Beijing using one-month transit card data and POI data. Other studies utilized data such as taxi trajectory, walking demand, and daily activities. Li et al. [26] used taxi trajectory data within one week and POI information to estimate transportation factors such as pick-ups and drop-offs. Qian and Ukkusuri [27] modeled the spatial taxi ridership distribution through the GWR model with various sociodemographic and built-environment variables. They further investigated how the rise of TNCs affected traffic states and emissions [28]. Yang et al. [29] studied walking travel demand at intersections using walking counts over ten years in Chittenden County, Vermont, USA. Lucas et al. [30] studied the influence of travel disadvantage on travel amount with personal surveys. They found that the level of bus services, street connectivity, and neighborhood safety were all significant factors for the daily trips undertaken. Shen et al. [31] collected car license plate recognition data and analyzed the spatial-temporal automobile travel demand with a geographically and temporally weighted regression model.
Our study estimates how different factors influence the ride-sourcing OD flow, which benefits urban land use and transportation planning. The model can predict the changes in flow once the future built environment change is determined. To the best of our knowledge, this is the first study that applies GWR to ride-sourcing OD flow analyses.

Models
OLS is a traditional linear regression model that maps the independent variables linearly to the dependent variable. Compared with the traditional OLS model, the SAM introduces origin, destination, and OD dependence to capture the spatial autocorrelation. The GWR model differs from both by allowing the coefficients of explanatory variables to vary over space. We introduce these models in turn.
The traditional OLS has often been applied to estimate flows in population migration, transportation, and trade. However, the model assumes that observations are independent of each other, which could potentially cause inaccuracy in treating spatial problems. In this paper, we formulate the traditional OLS model as follows:
$$Y = \alpha \tau_N + X_o \beta_o + X_d \beta_d + c L + \varepsilon,$$
where $Y$ represents OD flows from origin to destination. It is an $n^2 \times 1$ vector in which $n$ represents the number of TAZs in our study region. Thus, there are $n^2$ OD pairs in total. We stack the vector $Y$ first by origins and then by destinations. $X_o$ and $X_d$ represent independent variables on the origin and destination sides. They are $n^2 \times k$ matrices in which $k$ is the number of independent variables. $\beta_o$ and $\beta_d$ are the associated coefficients. $L$ represents the travel distance between each OD pair. It is an $n^2 \times 1$ vector, and $c$ is the associated scalar coefficient. $\alpha \tau_N$ is the constant term, in which $\tau_N$ is a vector of ones with the size of $n^2 \times 1$, $\alpha$ is the associated scalar coefficient, and $\varepsilon$ is the random disturbance.
The spatial autocorrelation model extends the traditional OLS by using three spatial weight matrices for origin, destination, and origin-to-destination dependence. It can be formulated by adding three correlated terms [17]:
$$Y = \rho_o W_o Y + \rho_d W_d Y + \rho_w W_w Y + \alpha \tau_N + X_o \beta_o + X_d \beta_d + c L + \varepsilon,$$
where $W_o$, $W_d$, and $W_w$ represent the origin, destination, and OD dependence. They are $n^2 \times n^2$ spatial weight matrices whose elements are relevant to the distance between regions. They are constructed from $W = [w_{ij}]_{n \times n}$, in which $w_{ij}$ is the distance between origin $i$ and destination $j$, and from $I_n$, an identity matrix with the size of $n \times n$. $\rho_o$, $\rho_d$, and $\rho_w$ are the associated scalar coefficients representing the effect strength of $W_o$, $W_d$, and $W_w$, respectively.
The GWR model differs from the above models by allowing the coefficients of explanatory variables to vary over space. We adapt the model to estimate the OD flow. The model can be formulated as follows:
$$y_{ij} = \beta_{0ij}(u_i, v_i) + \sum_{k=1}^{K} \beta_{kij}(u_i, v_i)\, x_{ki} + \sum_{k=1}^{K} \beta_{(k+K)ij}(u_i, v_i)\, x_{kj} + \varepsilon_{ij},$$
where $(u_i, v_i)$ represents the location of the centroid of the $i$th TAZ. $y_{ij}$ is the OD flow from origin $i$ to destination $j$. $x_{ki}$ and $x_{kj}$ represent the $k$th independent variables on origin $i$ and destination $j$. $\beta_{kij}$ and $\beta_{(k+K)ij}$ are the associated coefficients using the same bandwidth. $\beta_{0ij}$ is the constant term. $\varepsilon_{ij}$ is the random disturbance.
Algebraically, the GWR estimates can be expressed as follows:
$$\hat{\beta}(u_i, v_i) = \left( X^{\top} W(u_i, v_i) X \right)^{-1} X^{\top} W(u_i, v_i) Y,$$
where $X$, which collects $X_o$ and $X_d$, represents the independent variables on the origin and destination sides, and $W(u_i, v_i)$ is the weight matrix whose element $w_{ij}$ denotes the allocated weight for neighboring TAZ $i$ and TAZ $j$. It is determined by the adaptive Gaussian kernel function of $d_{ij}$ and $b_i$, where $d_{ij}$ refers to the Euclidean distance between TAZ $i$ and TAZ $j$ and $b_i$ is an adaptive bandwidth. We use the Akaike information criterion (AIC) to choose the best specification of $W(u_i, v_i)$.
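To make the local estimation step concrete, the following is a minimal sketch of geographically weighted least squares on generic point observations. It is not the authors' implementation. The kernel form `exp(-0.5 * (d / b)^2)`, the definition of the adaptive bandwidth as the distance to the `n_neighbors`-th nearest other observation, and all function and variable names are assumptions made for illustration; the paper does not state the exact kernel expression.

```python
import numpy as np


def gaussian_weights(dist, bandwidth):
    """Assumed Gaussian kernel: w = exp(-0.5 * (d / b)^2)."""
    return np.exp(-0.5 * (dist / bandwidth) ** 2)


def adaptive_bandwidth(dist_row, n_neighbors):
    """Distance to the n_neighbors-th nearest other observation.

    Index 0 of the sorted distances is the observation itself (distance 0),
    so it is skipped. This definition is an assumption for the sketch.
    """
    return np.sort(dist_row)[n_neighbors]


def gwr_local_fit(X, y, coords, n_neighbors):
    """Fit one weighted least-squares regression per observation."""
    n, p = X.shape
    Xc = np.column_stack([np.ones(n), X])  # prepend the intercept column
    betas = np.empty((n, p + 1))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = gaussian_weights(d, adaptive_bandwidth(d, n_neighbors))
        XtW = Xc.T * w  # same as Xc.T @ diag(w), without building diag(w)
        betas[i] = np.linalg.solve(XtW @ Xc, XtW @ y)
    return betas


# Synthetic check: noise-free data with constant true coefficients (1, 2, -3).
rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
betas = gwr_local_fit(X, y, coords, n_neighbors=20)
```

Because the synthetic response is exactly linear with constant coefficients, every local fit should recover the same coefficient vector. On real data the rows of `betas` differ across locations, which is the spatial variation that GWR is meant to capture.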

Ride-Sourcing Passenger Flow Pattern.
The city-wide ride-sourcing order data were collected during March 6-12, 2017, from the Didi company in Hangzhou, China, as the ride-sourcing passenger flow input to our models. Figure 1 illustrates the hourly orders during the period. There are two peaks on weekdays in Figure 1(a) (7:00-9:00 and 17:00-19:00), while there is only one peak from 16:00 to 18:00 on weekends. As shown in Figures 1(b) and 1(c), we analyze the relationship between travel time and the number of orders. There is no significant difference between the two distributions. The spatial distributions of the trip origin in the morning and evening peak hours (7:00-9:00 and 17:00-19:00) are shown in Figures 1(d) and 1(e). Most trips originate from the city center. The distribution of ride-sourcing trips reflects the urban mobility pattern to some extent.

Data Statistics in TAZs.
The statistics (i.e., minimum, mean, maximum, and standard deviation) of the data at the TAZ level are shown in Table 1. The TAZ and residential area information is offered by the Hangzhou Planning Bureau, which is the government agency in charge of urban planning. We have collected the information on POIs from AMAP, one of the largest map service companies in China.
There are ten types of POIs in the dataset: beauty (barbershop and beauty salon), restaurant, education, enterprise, medicine, hotel, house, entertainment service (government bodies, gym, and places for recreation and entertainment), tourist spots, and shopping (grocery store, supermarket, furniture, and computer market). The numbers of subway stations and bus stations are also collected. There were only three subway lines in 2019, and the largest number of subway stations in one TAZ was only six. As shown in Figure 2, there are 70 TAZs in Hangzhou. The study region covers from 119.89 to 120.57 degrees longitude and from 30.07 to 30.5 degrees latitude, which contains the urban center.

Explanatory Variables.
We choose the residential area, ten types of POIs, transit stations, and travel distance as the explanatory variables. The residential area, POI, and transit station variables are defined at the TAZ level. Since we divide the study area into 70 TAZs, there are 70 observations for each of these variables. The ride-sourcing passenger flow and the travel distance are defined for each OD pair, which gives 4,900 (70 by 70) observations.
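The step from 70 zone-level observations to 4,900 OD-level observations can be sketched as follows. This is an illustrative construction only: the origin-major row ordering (row `r` corresponds to origin `r // n` and destination `r % n`), the function name `stack_od`, and the random placeholder data are assumptions, not details given in the paper.

```python
import numpy as np


def stack_od(zone_features, distance):
    """Expand zone-level features into OD-pair-level design blocks.

    zone_features: (n, k) array, one row per TAZ.
    distance:      (n, n) array of travel distances between TAZs.
    Returns X_o (n*n, k), X_d (n*n, k), and L (n*n,).
    Assumed ordering: row r is origin r // n, destination r % n.
    """
    n = zone_features.shape[0]
    origin_idx, dest_idx = np.divmod(np.arange(n * n), n)
    X_o = zone_features[origin_idx]   # origin-side copy of the features
    X_d = zone_features[dest_idx]     # destination-side copy of the features
    L = distance[origin_idx, dest_idx]
    return X_o, X_d, L


# Placeholder data with the paper's dimensions: 70 TAZs, 3 hypothetical features.
rng = np.random.default_rng(1)
features = rng.uniform(size=(70, 3))
dist = rng.uniform(size=(70, 70))
X_o, X_d, L = stack_od(features, dist)
```

With 70 zones, each block has 4,900 rows, matching the number of OD pairs stated above.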

Residential Area.
As the origin of travel demand, the residential area is a critical variable for the ride-sourcing passenger flow analysis. We have collected the land use data, including the base year of 2019 and the future year of 2050. It contains the size of residential land use as well as the commerce and residence land in each TAZ.
As shown in Figure 3(a), more than 80 thousand residents live in each of TAZs 1, 17, 27, 33, 38, 47, and 52. Since the residential area in these 70 TAZs varies significantly, we normalize the area size to the range between 0 and 1. The distribution of the residential area in 2050 is consistent with that in 2019, as shown in Figure 3(b).
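One common way to map a variable into the range from 0 to 1 is min-max scaling, sketched below. The paper does not say which normalization it uses, so min-max scaling is an assumption here, and the input values are made up for illustration.

```python
import numpy as np


def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All zones have the same value; return zeros to avoid dividing by zero.
        return np.zeros_like(values)
    return (values - values.min()) / span


# Hypothetical residential areas for three zones (arbitrary units).
normalized = min_max_normalize([2.0, 4.0, 10.0])
```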

POIs.
As attractions for passengers, POIs have significant impacts on the ride-sourcing flow. We have collected the POI data from AMAP, one of China's biggest map service companies. Among the ten types of POIs, medicine facilities include pharmacies, clinics, and hospitals. Beauty facilities contain shops such as barbershops and beauty salons where people improve their looks. Entertainment service facilities contain government bodies, gyms, and places for recreation and entertainment. Shopping facilities contain grocery stores, supermarkets, and furniture and computer markets. Education includes universities, middle schools, primary schools, and kindergartens. Enterprise mainly covers office buildings. We illustrate the zonal distribution of each type of POI in Figure 4.

Transit Stations.
The transit station is another important factor influencing ride-sourcing passenger flows. This paper collects the public transit data, including the numbers of bus stations and subway stations of each TAZ in 2019. Generally, the ride-sourcing flow is expected to rise with the number of bus stations and subway stations. As shown in Figure 5(a), the city center has the highest subway station density. There were 76 subway stations and three subway lines in Hangzhou in 2019. As shown in Figure 5

Multicollinearity Problem.
The existence of multicollinearity will lead to bias in the estimation results. To check for this problem, we calculated the Pearson correlations among all the variables. The results show that some variables are strongly related to others. The beauty variable is strongly related to the restaurant, medicine, house, and entertainment variables, with correlation coefficients of 0.910, 0.908, 0.910, and 0.910. The restaurant variable is closely related to the education, medicine, and house variables, with correlation coefficients of 0.917, 0.905, and 0.911. Medicine is closely related to the house, entertainment, and shopping variables, with correlation coefficients of 0.901, 0.952, and 0.918. The house variable is related to entertainment and shopping, with correlation coefficients of 0.877 and 0.869. The shopping variable is related to entertainment, with a correlation coefficient of 0.885. The remaining coefficients are below 0.7.
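A pairwise Pearson screening of this kind can be sketched as below. The threshold of 0.7 is taken from the statement that the remaining coefficients are below 0.7; treating it as a cutoff is an assumption. The variable names and synthetic data are hypothetical and do not reproduce the paper's values.

```python
import numpy as np


def highly_correlated_pairs(data, names, threshold=0.7):
    """List variable pairs whose absolute Pearson correlation meets the threshold.

    data:  (n_zones, n_variables) array, one column per variable.
    names: list of variable names, in column order.
    """
    corr = np.corrcoef(data, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Synthetic example with 70 zones: "b" is nearly a copy of "a", "c" is independent.
rng = np.random.default_rng(2)
a = rng.normal(size=70)
b = a + 0.1 * rng.normal(size=70)
c = rng.normal(size=70)
pairs = highly_correlated_pairs(np.column_stack([a, b, c]), ["a", "b", "c"])
```

In this synthetic case only the pair ("a", "b") should be flagged. Which member of a flagged pair to drop is a modeling decision that the screening itself does not make.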

Spatial Autocorrelation Test.
Spatial autocorrelation of an explanatory variable means that its value in one zone depends on its values in neighboring zones. The existence of spatial autocorrelation forms the basis of the GWR model. Thus, before applying the GWR model, spatial autocorrelation should be tested. We adopted Moran's I for testing spatial autocorrelation, as it is the most commonly used index in the literature. Moran's I values of all explanatory variables are summarized in Table 3. A Z test value larger than 1.64 or smaller than −1.64 means the variable is statistically significant and has a strong spatial autocorrelation. As can be seen, most variables are significant except for the shopping, house, and subway variables. The subway variable is not significant, since it is very sparse: only 76 subway stations had been built in Hangzhou in 2019. However, it is a rather important variable for transportation planning. Thus, we keep the subway station variable and remove the shopping and house variables.
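For reference, a global Moran's I and its Z value can be computed as in the sketch below. It uses the standard definition of Moran's I and the analytic variance under the normality assumption; whether the authors used this variance or a permutation test, and how they built the spatial weight matrix, is not stated, so those choices are assumptions. The ten-zone chain and its binary adjacency weights are toy data.

```python
import numpy as np


def morans_i(x, W):
    """Return (Moran's I, Z value) for values x and spatial weight matrix W.

    The Z value uses the analytic variance under the normality assumption.
    """
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    n = x.size
    z = x - x.mean()
    s0 = W.sum()
    moran = (n / s0) * (z @ W @ z) / (z @ z)
    expected = -1.0 / (n - 1)
    s1 = 0.5 * ((W + W.T) ** 2).sum()
    s2 = ((W.sum(axis=0) + W.sum(axis=1)) ** 2).sum()
    variance = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2) - expected**2
    return moran, (moran - expected) / np.sqrt(variance)


# Toy layout: ten zones in a row, each adjacent to its immediate neighbors.
n_zones = 10
W = np.zeros((n_zones, n_zones))
for i in range(n_zones - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

trend_i, trend_z = morans_i(np.arange(n_zones), W)        # smooth gradient
alt_i, alt_z = morans_i(np.arange(n_zones) % 2, W)        # alternating 0, 1, 0, 1, ...
```

The smooth gradient gives a positive index with a Z value above 1.64, while the alternating pattern gives a strongly negative index, which illustrates the two tails of the test described above.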

Model Estimation Results.
We use the overall OD data during March 6-12, 2017, and normalize the dependent variable and independent variables before estimation. Table 4 shows the coefficients of SAM and OLS. The −2 log-likelihood, AIC (Akaike information criterion), BIC (Bayesian information criterion), and AICc (second-order AIC) indicate that SAM is more accurate than OLS. This indicates that taking origin, destination, and OD dependence into consideration is essential. Most coefficients in SAM are statistically significant, which means that most variables have considerable influences on ride-sourcing flows. The result of the GWR is summarized in Table 5. The criterion for optimal bandwidth is AIC, and the chosen bandwidth is 342. Because GWR yields a separate set of coefficients for each observation, we cannot present them all in one table. Hence, the average, minimum, maximum, and standard deviation of the coefficients are presented. The −2 log-likelihood, AIC, BIC, and AICc show that the GWR model fits the data better than SAM. Hence, we will choose GWR to predict OD flow in Section 5.2.
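The four fit statistics named above follow from a model's log-likelihood, its number of parameters, and the sample size. The sketch below uses the textbook definitions with a plain parameter count; GWR software typically substitutes an effective number of parameters, which is not modeled here. The log-likelihood and parameter count in the example are hypothetical and are not taken from Tables 4 or 5.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Compute -2 log-likelihood, AIC, BIC, and AICc (lower is better)."""
    neg2ll = -2.0 * log_likelihood
    aic = neg2ll + 2 * n_params
    bic = neg2ll + n_params * math.log(n_obs)
    # Small-sample correction; requires n_obs > n_params + 1.
    aicc = aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return {"neg2ll": neg2ll, "aic": aic, "bic": bic, "aicc": aicc}


# Hypothetical model: log-likelihood -100 with 5 parameters on 4,900 OD pairs.
criteria = information_criteria(log_likelihood=-100.0, n_params=5, n_obs=4900)
```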

Influences of Residential Area and POIs.
The coefficients provide some insights into how these variables influence ride-sourcing flows. As shown in Table 4, the residential area at both origin and destination has a positive coefficient in OLS and SAM. This result is consistent with the GWR model in Table 5, which is reasonable, since the residential area is both a source and a strong attraction of traffic flow.
However, in the GWR model, education plays a negative role, since most Chinese parents prefer a kindergarten, primary school, or middle school close to their home, and their children do not need to take ride-sourcing services. A walk or bike ride is enough to cover the distance. For high schools and colleges/universities in China, most students live on campus, indicating that this group of people does not need to take ride-sourcing services often either. That is why the education variable does not have a positive coefficient, although it is supposed to be a strong attraction of flow. Enterprise has a positive effect on the ride-sourcing OD flow. Since office buildings are places with a large population density (hundreds of working people gather in one office building) and high access frequency (people access their workplace almost every weekday), their influences on flow are positive at both origin and destination levels.
Hotels are dense around transportation hubs, such as high-speed railway stations and bus terminals. Many people choose ride-sourcing services to travel from hotels to transportation hubs and vice versa, resulting in a positive coefficient. As for entertainment services, people go to these places for entertainment. Thus, they tend to choose the most relaxed transportation modes, like taxis or ride-sourcing services. The coefficient is mainly positive. Tourist spots at the destination have negative correlations with ride-sourcing flows, and the spots at the origin have little effect.
As shown in Table 4, bus stations at origin and destination have a strong positive effect on OD flow. The results are consistent with the GWR model in Table 5. The result is surprising, since the bus, as another mode of transportation, is supposed to reduce the pressure on the roadway. Figures 1 and 5(b) show that the distribution of bus stations is consistent with the trip origin/destination distribution. Thus, we infer that the relationship between the bus station and OD flow is not causal but correlational. Bus stations are built at locations with high demand to reduce the traffic pressure, but their capacity is not large enough to reduce the ride-sourcing flow to a low level, which results in a large coefficient. The effect of subway stations is consistent in GWR and SAM, where the average coefficient is negative. Some studies [25,26,31] find that subway stations were positively related to walking demand, transit ridership, or automobile travel demand, since subway stations do not compete with these travel modes. However, in our case, subway stations relieve traffic pressure on the roadway by attracting passengers to the subway. Thus, they reduce the ride-sourcing OD flow and obtain a negative coefficient.

Effects of Explanatory Variables in Different Periods.
Since the GWR model outperforms SAM, we further explore the effects of explanatory variables in different periods with the GWR model. As shown in Table 6, the residential area, education, hotel, bus, and subway station coefficients on weekdays and weekends do not differ in sign from those in Table 5. The residential area, hotel, and bus variables always play a positive role in attracting and generating traffic flow, and the education and subway station variables always play a negative role. On weekdays, the enterprise has a positive effect on OD flow, while on weekends, its effect is not that strong.
For the coefficients at different times of day, the effects of the residential area, education, hotel, and the subway station are consistent with those on weekdays and weekends. Nevertheless, for enterprise, in the morning rush hour, the OD flow going to zones with more enterprise buildings is enlarged. Meanwhile, in the evening rush hour, the OD flow from zones with more enterprise buildings is enlarged. This is related to the activity of commuters, who go to enterprise buildings in the morning rush hour and leave them in the evening rush hour.

Marginal Effects of Residential Area Change.
In 2050, the residential area is predicted to increase by 48.30%. The residential area distribution is shown in Figure 3(b). The increase in residential areas is not even across all TAZs. With the coefficients estimated in GWR, we predict the marginal effects of residential area change in 2050. The change ratio (local flow difference divided by local flow in 2019) of OD flow is illustrated in Figure 6.
As shown in Figure 6, we divide the flow into three types, that is, intraflow (trips that depart from and arrive at the same TAZ), inflow (summation of trips arriving at the TAZ), and outflow (summation of trips departing from the TAZ). As shown in Figure 6(b), the outflow changes are consistent with residential area changes. TAZs with a significant residential size increase (15, 12, 24, 54, 58, and 67) will have an increase in the outflow. TAZs with a residential size decrease, like 1 and 47, will have a decrease in the outflow. As shown in Figure 6(a), for most TAZs, the changes of outflow and inflow are similar. Meanwhile, for TAZ 47, a decrease in residential areas is accompanied by an increase in inflow. The possible explanation is that TAZ 54, which has a large increase in residential areas, influences its neighbor TAZ 47, causing many people to enter TAZ 47. A similar situation happens to TAZ 33 and TAZ 52, which have no increase in the residential area but have an increase in inflow, since their neighbors TAZ 25 and TAZ 24 have a rising residential area size. The intraflow change in Figure 6(c) is much more even. The intraflow of TAZ 69 and TAZ 17 decreases. It may be caused by the increase of residential areas in TAZ 58 and TAZ 15, which attract part of the intraflow. Overall, the outflow changes are consistent with residential area changes.
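The three flow types and the change ratio defined above can be computed from an OD matrix as in the following sketch. It assumes that rows are origins and columns are destinations and that intrazonal trips are counted in both inflow and outflow; the paper does not specify either convention. The 3-by-3 matrices are invented for illustration.

```python
import numpy as np


def zone_flows(od):
    """Summarize an OD matrix (rows = origins, columns = destinations) per zone."""
    od = np.asarray(od, dtype=float)
    return {
        "intraflow": np.diag(od).copy(),  # trips that start and end in the same zone
        "outflow": od.sum(axis=1),        # trips departing from each zone
        "inflow": od.sum(axis=0),         # trips arriving at each zone
    }


def change_ratio(base, future):
    """(future - base) / base; zones with zero base flow would need special handling."""
    return (future - base) / base


# Invented base-year and future-year OD matrices for three zones.
od_base = np.array([[10, 5, 5], [2, 8, 10], [3, 7, 20]])
od_future = np.array([[12, 6, 6], [2, 8, 10], [3, 7, 20]])

base, future = zone_flows(od_base), zone_flows(od_future)
ratios = {name: change_ratio(base[name], future[name]) for name in base}
```

In this toy case only trips leaving the first zone grow, so its outflow and intraflow rise by 20% while every zone's inflow rises slightly, a small-scale analogue of a change in one TAZ showing up in the inflow of others.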

Marginal Effects of Transit Station Change.
The local government planned to construct ten new subway lines by 2022. The newly built subway stations are illustrated in Figure 7. There will be 260 subway stations in Hangzhou, an increase of 242.11% compared to the base year. Based on the coefficients acquired in the GWR model above, the resulting changes in ride-sourcing OD flows in the whole city can be estimated.
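As a quick check of the stated percentage, using the 76 stations reported for 2019 as the base:

```latex
\frac{260 - 76}{76} = \frac{184}{76} \approx 2.4211 = 242.11\%
```

This corresponds to 184 additional stations relative to the base year.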
We show the OD flow change ratio in Figure 8. Since the average coefficient of the subway station is negative, the intraflow, outflow, and inflow of most TAZs will decrease. In Figure 8(a), TAZ 56 has the largest outflow decrease, since its number of subway stations increases dramatically from 0 to 11. For TAZ 38 and TAZ 66, which have no newly built subway stations, the outflow increase is caused by new stations in nearby TAZs like 42, 10, 6, and 57. People might flow out of the zone to take the subway. A similar conclusion can be drawn in the inflow case, where TAZ 38 and TAZ 66 have an increase in inflow. Since TAZ 38 and TAZ 66 did not contain dense subway stations in 2019, people had to take the ride-sourcing service to enter these TAZs, causing an increase in inflow. The intraflow is consistent with the outflow.

Conclusions
This paper explores the influences of several built environment variables (e.g., residential area, POIs, and transit stations) on ride-sourcing OD flow. This study differs from related research by analyzing ride-sourcing OD flow rather than just origin-based demand (outflow) or destination-based inflow, which offers more detailed spatial information. The results of the OLS, SAM, and GWR models are compared. The GWR model differs from the other models by allowing the coefficients of explanatory variables to vary over space. The result shows that the GWR model outperforms both the SAM and OLS models. On average, an increase in the residential area, enterprise, and bus station variables will increase OD flow at both the origin and destination levels. An increase in education and subway will cause the opposite result, since students are not the main force of ride-sourcing passengers, and the subway competes with ride-sourcing services. For different times of day, we find that enterprise attracts people in the morning rush hour, and people start to travel from enterprise buildings in the evening rush hour. This is related to the activity of commuters who go to work in the morning and leave work in the evening. Entertainment service and tourist spots play different roles on weekends and weekdays. We also calculate and illustrate the changes in OD flow based on the residential area and the subway line construction plan. The findings and the modeling approach in this study help better understand the ride-sourcing flow and provide planners and policymakers with scientific guidance on the design of urban land use and transportation planning.
This paper has a few limitations. All independent variables are regarded equally without considering their scales. For example, large shopping malls should have a larger weight than small ones. In the future, it would be better to introduce multisource datasets such as subway transaction data to compare the impact of the built environment on different transportation modes. Several important characteristics of the service (e.g., price and type of car) should be considered for mode choice evaluations.
Data Availability
The POI information can be accessed by utilizing the AMAP API service by visiting https://developer.amap.com/api/webservice/guide/api/search.

Conflicts of Interest
The authors declare that they have no conflicts of interest.