Taxi Driver's Operation Behavior and Passengers' Demand Analysis Based on GPS Data

Existing research has paid little attention to the relationship between land use and passenger demand, and taxi drivers' searching behavior over observation periods of different lengths has not been explored. This paper uses taxi GPS trajectory data from Shenzhen to explore taxi drivers' operation behavior and passengers' demand. The data cover 204 hours in Shenzhen, China, and include the taxi license number, time, longitude, latitude, speed, and whether passengers are in the vehicle, which makes it possible to track passengers' pick-up and drop-off information. This paper focuses on these topics: exploring taxi driver operation behavior through measurements of activity space and the connection between different activity spaces for different time durations; focusing mainly on eight traffic analysis zones (TAZs) of Shenzhen and exploring customers' real-time origin and destination demand in its spatial-temporal distribution on weekdays and weekends; and taxi station optimization based on passenger demand and the expected customer waiting time distribution. This research can help taxi drivers search for new passengers and help passengers find a taxi more easily.


Introduction
Urban land use and the built environment are considered to affect residents' travel demand along three dimensions: design, density, and diversity [1]. Traffic engineers and urban planners have been paying more attention to the correlation between land use and transportation, including the influence of land use on travel demand, the impact of the transport network on urban spatial development, and the integration of land use and the transport system [2][3][4][5][6].
As an important mode, taxis play a key role in the urban passenger transportation market and provide a convenient and comfortable service for passengers. In taxi service studies, researchers usually adopt virtual customer origin-destination demand patterns to analyze the model [7][8][9]; these patterns are connected with the land use of the area but cannot completely reflect the temporal and spatial characteristics of passenger demand. The rapid development of Information and Communication Technologies (ICT) provides more accurate time and location information for the study of human mobility. Taxis equipped with the Global Positioning System (GPS) can serve as city-wide probes, providing traffic conditions, time, taxi speed, and location information, as well as whether there are passengers in the taxi. Based on taxi GPS trace data, we can obtain customers' real-time origin and destination demand, which can assist researchers in validating taxi service models [10][11][12].
Because taxi GPS data capture a passenger's trajectory only for a specific period of time, it is difficult to use them to analyze human mobility over a longer period, but taxi GPS traces can be adopted to analyze urban transport and land use [13]. In the past few years, researchers have made progress in this field, for example Li et al. (2011) [14], Zheng et al. (2011) [15], and Yue et al. (2012) [16].
Recently, researchers have combined taxi GPS data with mathematical models (the Lévy flight model or Zipf's law) to analyze passengers' visiting frequency in an area [17], trip length distribution [18], and drivers' behavior [11,19]. However, existing research has paid little attention to taxi drivers' behavior over observation periods of different lengths, and the relationship between land use and passenger demand has not been explored. This paper therefore focuses on the dynamic time-series characteristics of passengers' temporal variation in certain land use types and on the connection between taxi drivers' searching behavior in different activity spaces over observation periods of different lengths. This paper focuses on the following topics.
(1) Exploring taxi driver operation behavior through measurements of activity space and the connection between different activity spaces for different time durations.
(2) Focusing mainly on eight TAZs of Shenzhen and exploring customers' real-time origin and destination demand in its spatial-temporal distribution on weekdays and weekends.
(3) Taxi station optimization based on passenger demand and the expected customer waiting time distribution.
The structure of this paper is as follows. Section 2 reviews the correlation between urban land use and travel demand, as well as taxi drivers' searching behavior. In Section 3, we present the taxi GPS trace data source and the analysis measurements in detail. Section 4 presents the results and discussion. Finally, we conclude this paper in Section 5.

Literature Review
Researchers usually use virtual customer origin-destination demand patterns to analyze the taxi service model; see Arnott (1996) [7], Yang and Wong (1998) [8], Wong et al. (2001) [20], Bian et al. (2007) [21], and Luo and Shi (2009) [9]. With the development of GPS hardware and communication technology, we can now collect taxi GPS trace data over longer periods than a typical earlier survey [16]. These data also provide more detailed information, such as trip length, travel time, and speed by time of day, which can assist researchers in validating the taxi service model. Some researchers are already working in this field [22,23]: Zhang and He (2011) [22] focused on the spatial distribution of taxi services in one day, while Hu et al. (2011) [23] mainly analyzed the one-day temporal distribution of customers' pick-up and drop-off times in Guangzhou, China.
Based on taxi GPS trace data, researchers can analyze urban transport and land use status at the macro level, which can compensate for the shortcomings of the traditional questionnaire survey [14][15][16]24]. Yue et al. (2012) [16] calibrated the parameters of spatial interaction models based on taxi GPS trace data from the central business district in Wuhan. Liu et al. (2012) [25] explored the temporal patterns of urban-scale trips in Shanghai and found that urban land use and structure can be expressed by taxi trip patterns.
Giraudo and Peruch (1988) [26] divided taxi operation into two phases, "the transport phase" and "the approach phase," which can be used to represent taxi operation with and without a passenger, respectively. The taxi driver's passenger-searching behavior happens in "the approach phase." After dropping off the prior passenger, the driver shortly begins driving around the area or region searching for the next passenger.
Because of the taxi driver's individual characteristics (driving experience, road network familiarity, etc.) and the randomness of passenger arrivals, the driver's search for the next passenger can be treated as a random variable. Luo (2009) [27] expressed the taxi driver's search for the next passenger as a double exponential (Gumbel) distribution.
Liu et al. (2010) [11] described taxi drivers' operation patterns and the differences between the behavior of top drivers and ordinary drivers in Shenzhen. Using daily taxi GPS trace data, they analyzed drivers' spatial selection behavior, operation behavior, and route choice behavior. However, Liu et al. (2010) [11] did not address drivers' searching space behavior pattern.
This paper attempts to bridge these gaps between theoretical research and practical development, using taxi GPS trajectory data from Shenzhen to explore urban land use and taxi drivers' operation behavior.

Data Source and Taxi Operation Activity Space Measurements
3.1. Data Source. In this research, we use taxi GPS trace data from Shenzhen, China, which contain 3198 taxi records over nine consecutive days, from 18 April, 2011 (Monday), to noon on 26 April, 2011 (Tuesday), a total of 204 hours. Table 1 shows the typical format of the taxi trajectory data, including taxi location (longitude, latitude), speed, direction (angle), and passenger pick-up and drop-off information (status), with associated time information. The data collection interval is generally around 5 to 15 seconds. Delays or missing data may occur depending on the GPS signal, and additional records are collected when the taxi load status changes.
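As a minimal sketch of how pick-up and drop-off events might be derived from such records, the following Python treats a change in load status between consecutive records of the same taxi as an event. The `GpsRecord` field names, the boolean `occupied` encoding of the status flag, and the `extract_events` helper are illustrative assumptions, not the actual format of Table 1.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class GpsRecord:
    """One GPS sample; all field names are hypothetical."""
    taxi_id: str
    timestamp: int   # seconds since an arbitrary epoch
    lon: float
    lat: float
    occupied: bool   # True when a passenger is on board


def extract_events(records: Iterable[GpsRecord]) -> List[Tuple[str, GpsRecord]]:
    """Return ("pickup" | "dropoff", record) pairs, one per status change."""
    events = []
    last_status = {}  # taxi_id -> occupied flag of the previous sample
    # Process each taxi's samples in time order.
    for rec in sorted(records, key=lambda r: (r.taxi_id, r.timestamp)):
        prev = last_status.get(rec.taxi_id)
        if prev is not None and prev != rec.occupied:
            # vacant -> occupied is a pick-up; occupied -> vacant is a drop-off
            events.append(("pickup" if rec.occupied else "dropoff", rec))
        last_status[rec.taxi_id] = rec.occupied
    return events
```

A taxi's first record only sets its baseline status, so a trip already in progress when the data begin produces a drop-off event without a matching pick-up.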
The level and scope of the data analysis can be divided into three aspects: the mean center analysis of the pick-up and drop-off events for each day, the passengers' spatial-temporal distribution in eight TAZs (traffic analysis zones) over the 204 continuous hours, and the exploration of taxi drivers' searching behavior at different levels.

Taxi Operation Activity Space Measurements.
The taxi operation activity measurements are mainly based on the distributions of basic parameters, such as the mean center, the standard deviational ellipse, the standard deviation of the $x$ and $y$ coordinates, and the kernel density. Based on existing research, we divided these measurements into two categories: the spatial distribution category and the extended second moments of activity locations measurement category.

The Spatial Distribution Category. The mean center (MC) is the average location of taxi operation activity service events in space (including pick-up and drop-off events), which can be calculated by the following [28]:

$$\bar{x}_{mc} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y}_{mc} = \frac{1}{n}\sum_{i=1}^{n} y_i, \tag{1}$$

where $\bar{x}_{mc}$, $\bar{y}_{mc}$ are the coordinates of the mean center, which determine the spatial location of the MC; $x_i$, $y_i$ are the coordinates of taxi service event $i$ in two dimensions; and $n$ is the total number of taxi service events.
Standard deviational ellipse (SDE) [28,29] can determine the directional factors of the spatial distribution and find the main direction of the taxi service events in space. The SDE is described by $\theta$ (the clockwise rotation angle of the SDE $x$-axis), $\sigma_1$ (the major axis of the SDE), and $\sigma_2$ (the minor axis of the SDE). Its area is $A = \pi \sigma_x \sigma_y$, where $\sigma_x$ is the semimajor axis of the SDE, $\sigma_y$ is the semiminor axis of the SDE, and $\bar{x}_{mc}$, $\bar{y}_{mc}$ and $x_i$, $y_i$ are consistent with formula (1).
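The two spatial distribution measures can be sketched in Python as follows. This is one common formulation, given here as an assumption and not necessarily the exact form used in [28,29]: the orientation is taken as the direction of maximum variance, measured counter-clockwise from the x-axis, and the semi-axes are the root-mean-square deviations along and across that direction with a 1/n normalization (other formulations scale the axes by a constant factor). Coordinates are assumed to be already projected to a planar system such as metres; the function names are illustrative.

```python
import math


def mean_center(points):
    """Mean center of (x, y) points, as in formula (1)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_deviational_ellipse(points):
    """Return (theta, semi_major, semi_minor, area) for planar (x, y) points.

    theta is in radians, counter-clockwise from the x-axis (an assumed
    convention for this sketch).
    """
    n = len(points)
    mx, my = mean_center(points)
    dx = [x - mx for x, _ in points]
    dy = [y - my for _, y in points]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    sxy = sum(a * b for a, b in zip(dx, dy))
    # Direction of maximum variance of the deviations.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # RMS deviation along the major direction and perpendicular to it.
    semi_major = math.sqrt(sum((a * c + b * s) ** 2 for a, b in zip(dx, dy)) / n)
    semi_minor = math.sqrt(sum((b * c - a * s) ** 2 for a, b in zip(dx, dy)) / n)
    area = math.pi * semi_major * semi_minor
    return theta, semi_major, semi_minor, area
```

For raw longitude and latitude values, a projection step would be needed before calling these functions, since a degree of longitude and a degree of latitude cover different ground distances.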

The Extended Second Moments of Activity Locations Measurement Category. The mean center of each taxi driver's daily activity space may be related to the centroid of the activity space of all taxi drivers [19], similar to the relationship among workers' daily activity locations analyzed by Susilo and Kitamura (2005) [30]. We can therefore analyze a taxi driver's day-to-day variation in activity space and statistically analyze the second moments of the activity locations.
Figure 1 illustrates the mean centers of the drop-off (pick-up) locations of each taxi driver and of all taxi drivers, which can be used to analyze each taxi driver's day-to-day variation in the pick-up and drop-off activity space. Based on our statistics, the distance between the two MCs is mainly concentrated between 200 m and 400 m, which may reflect the taxi driver's searching behavior around a certain MC.
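The distance between a driver's MC and the MC of all drivers could be computed along the following lines. This is a sketch under stated assumptions: events are given as (longitude, latitude) pairs in degrees, the overall MC is taken as the mean of all events pooled together, the arithmetic mean of coordinates is accepted as adequate at city scale, and the great-circle distance uses the haversine formula with a mean Earth radius. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_center(points):
    """Arithmetic mean of (lon, lat) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def driver_offsets_m(events_by_driver):
    """Map driver id -> distance (m) from that driver's MC to the MC of all events."""
    all_points = [p for pts in events_by_driver.values() for p in pts]
    overall = mean_center(all_points)
    return {
        driver: haversine_m(*mean_center(pts), *overall)
        for driver, pts in events_by_driver.items()
    }
```

A histogram of the returned distances would then show where the offsets concentrate.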

Taxi Driver's Operation Behavior Analysis.
In this section, we first explore taxi driver operation behavior through measurements of activity space and the connection between different activity spaces for different time durations. The MC and the location mean centers of each taxi driver and of all taxi drivers are used in the analysis. Figure 2 presents the spatial distribution of the mean centers of all taxi drivers' drop-off activity space, analyzed day by day.
From Figure 2, we can find that the mean centers of taxi drivers' drop-off activity space are mainly distributed between 22.562 and 22.576 (latitude) and between 114.035 and 114.070 (longitude). Comparing weekdays (Monday to Friday) and weekends, the mean centers fall into two areas, corresponding to 1 a.m. to 6 p.m. and 7 p.m. to midnight, respectively. The red circle in Figure 2 shows the distribution from 7 p.m. to midnight.
Figure 3 presents the spatial distribution of the mean centers of all taxi drivers' pick-up activity space, analyzed day by day. From Figure 3, we can also find that the mean centers of taxi drivers' pick-up activity space are mainly distributed between 22.560 and 22.574 (latitude) and between 114.035 and 114.070 (longitude). Comparing weekdays (Monday to Friday) and weekends, the mean centers again fall into two areas, corresponding to 1 a.m. to 6 p.m. and 7 p.m. to midnight, respectively. The red circle in Figure 3 shows the distribution from 7 p.m. to midnight.
From Figures 2 and 3, the mean centers of the taxi drivers' pick-up activity space are more concentrated than those of the drop-off activity space. Meanwhile, on weekdays, the mean centers of both the pick-up and the drop-off activity space are more concentrated than on weekends, which may reflect changes in passengers' daily routines.

Eight TAZs' Passenger Demands' Spatial-Temporal Distribution Analysis.
In this section, we present the analysis of the spatial-temporal distribution of passengers' origin and destination demand from 18 April, 2011 (Monday), to noon on 26 April, 2011 (Tuesday). We mainly focus on eight TAZs of Shenzhen (see Table 2); Figure 4 presents the passenger pick-up (blue line) and drop-off (red line) statistics for the eight TAZs.
Based on Figure 4, total passenger pick-ups and drop-offs are roughly balanced in the eight TAZs, but the peak hours of pick-up and drop-off demand differ between land use types.
For the commercial and CBD areas (TAZ1, TAZ2), the passenger pick-up and drop-off service frequency is higher than in the residential areas (TAZ6 and TAZ7). Meanwhile, the peak hours of the eight land use areas differ considerably from one another (see Table 3).
Because of the different land use types, the peak hours of the eight TAZs differ from one another, and the passenger pick-up and drop-off events are not synchronized. In Shenzhen, the peak hour of taxi passenger demand is close to midnight in several zones, such as TAZ2, TAZ7, and TAZ8, which is similar to the findings of Hu et al. (2014).
For each TAZ, the way pick-ups and drop-offs change with time is almost the same from Monday to Friday. On weekends, the peak hour differs slightly from that on weekdays, especially in TAZ1, TAZ5, and TAZ6.
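A minimal sketch of how such hourly profiles and peak hours could be tabulated from extracted events is given below. The input layout, a sequence of (TAZ label, event kind, hour of day) tuples, and the helper names are assumptions made for illustration; assigning each event to a TAZ is taken as already done.

```python
from collections import Counter, defaultdict


def hourly_profile(events):
    """Count events per hour for each (taz, kind) pair.

    events: iterable of (taz, kind, hour), kind in {"pickup", "dropoff"},
    hour an integer from 0 to 23.
    """
    profile = defaultdict(Counter)
    for taz, kind, hour in events:
        profile[(taz, kind)][hour] += 1
    return profile


def peak_hour(profile, taz, kind):
    """Hour with the most events (earliest hour wins ties); None if no events."""
    counts = profile.get((taz, kind))
    if not counts:
        return None
    return min(counts, key=lambda h: (-counts[h], h))
```

Building separate profiles for weekday and weekend events would allow the two peak hours to be compared for each TAZ.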
The service frequency of each taxi in each TAZ was then analyzed, as shown in Table 4. The table shows that the taxi supply differs between TAZs and that the number of services each taxi provides within a TAZ varies widely. Table 4 also shows that some taxi drivers cruise around particular areas, especially the drivers who provided more than 130 pick-up services in the 204 hours.
Based on this phenomenon, we divide the taxi drivers into different categories: some drivers provide only random service across the whole city, while others provide a relatively fixed service around a specific area, such as the CBD or a residential area. The distributions of taxi drivers' pick-up service times in the eight TAZs were then analyzed (as shown in Figure 5).
In TAZ1, TAZ5, and TAZ7, more than 60% of taxi drivers provided fewer than 5 pick-up services, while in TAZ3, TAZ4, TAZ6, and TAZ8, more than 85% of taxi drivers provided fewer than 20 pick-up services, so 20 pick-ups can be taken as the boundary between the two categories of taxi driver service pattern. From Figure 5, we can also find that in TAZ2 the average number of pick-up services per taxi driver is 46.47 and the 85th percentile of drivers' pick-up counts is 70, so in TAZ2, 70 pick-ups can serve as the boundary between the two categories.
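The boundary between the two service patterns can be explored with a small sketch like the one below, which reports the share of drivers under a threshold and a percentile of the per-driver pick-up counts. The nearest-rank percentile definition and the helper names are assumptions; the source does not state which percentile definition was used.

```python
def share_below(counts, threshold):
    """Fraction of drivers whose pick-up count in a TAZ is below threshold."""
    return sum(1 for c in counts if c < threshold) / len(counts)


def percentile_nearest_rank(counts, percent):
    """Nearest-rank percentile; percent is an integer from 1 to 100."""
    ordered = sorted(counts)
    # Integer ceiling of percent / 100 * n, avoiding floating-point rounding.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def split_drivers(counts_by_driver, boundary):
    """Split driver ids into (at or above boundary, below boundary)."""
    frequent = {d for d, c in counts_by_driver.items() if c >= boundary}
    occasional = set(counts_by_driver) - frequent
    return frequent, occasional
```

For one TAZ, `percentile_nearest_rank(counts, 85)` gives a candidate boundary, and `split_drivers` then separates drivers who serve the zone regularly from those who pass through occasionally.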

Taxi Station Optimization.
From the analysis, we can find that the largest passenger demand is in TAZ2, which lies along Shenzhen South Road and the international trade center. At present this TAZ has no taxi service station, which is inconvenient for passengers, so the provision of a taxi service station should be considered for this TAZ.
From Figure 4, we can find that the two peak hours of passenger pick-up service in TAZ2 are 2 p.m. to 3 p.m. and 9 p.m. to 10 p.m., which is connected with the land use and geographic location. The taxi station optimization is therefore based on the passenger demand and the expected customer waiting time distribution; the physical layout of the taxi station is not considered in this paper.
Regarding the service area of a taxi station, Daganzo (1978) [24] proposed the flexible transit design model (FTDM), and in 2012 he refined it into a transit optimization approach [31]. Based on the existing research of Nourbakhsh and Ouyang (2012) [32] and Sathaye (2014) [33], a taxi station optimization model is presented here to determine the service radius R.
According to the research of Nourbakhsh and Ouyang (2012) [32], each passenger's expected walk distance $E$ (in km) is expressed in terms of the length of the side of one square; each passenger's expected walk time in hours is then

$$T = \frac{E}{V},$$

where $V$ is the average operation speed (km/h). A taxi station's service radius $R$ (in km) can therefore be expressed in terms of the number of taxi stations. For the given D and Y, we can calculate the taxi station's service radius; the results are shown in Table 5. Zhang et al. (2015) [34], based on taxi GPS data and analysis, recommend a taxi station service distance of 300 m; this matches some of the results in Table 5 (shown in bold).
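As an illustration only, which may differ from the expression in [32], suppose demand is uniformly distributed over a square of side s km, a single station sits at the center of the square, and passengers walk along a rectilinear street grid. Under these assumptions the expected walk distance is s/2 km, because the expected absolute offset along each axis is s/4, and the expected walk time follows by dividing by a speed. The Python sketch below encodes this and checks it by simulation; all function names and parameter values are hypothetical.

```python
import random


def expected_walk_distance_km(side_km):
    """Expected rectilinear distance to the center of a square (assumed model)."""
    return side_km / 2.0


def expected_walk_time_h(side_km, speed_kmh):
    """Expected walk time in hours: distance divided by speed."""
    return expected_walk_distance_km(side_km) / speed_kmh


def simulate_walk_distance_km(side_km, samples=100_000, seed=0):
    """Monte Carlo estimate of the same expectation, as a sanity check."""
    rng = random.Random(seed)
    half = side_km / 2.0
    total = 0.0
    for _ in range(samples):
        # Uniform demand point in the square; station at the origin.
        total += abs(rng.uniform(-half, half)) + abs(rng.uniform(-half, half))
    return total / samples
```

With a side of 0.6 km, this model gives an expected walk distance of 0.3 km; a different demand pattern or distance metric would change the constant.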

Conclusions and Recommendations
This paper uses taxi GPS data to analyze the dynamic time-series characteristics of passengers' temporal variation in certain land use types and the connection between taxi drivers' searching behavior in different activity spaces over observation periods of different lengths. Using the GPS data, the paper identified passenger demand hot areas and proposed a taxi station optimization model, which can serve as a reference for taxi station location decisions.
By proposing appropriate measures and statistics to measure and analyze activity spaces while recognizing their geographical dependence, this study contributes to methodologies for measuring and analyzing behavioral dynamics. By clarifying how the activity spaces of taxi drivers change over time, it contributes directly to the field of travel behavior dynamics. The potential value of understanding these dynamics at the driver level for policy guidance should also not be underestimated. The analysis also provides a new method to optimize urban transport management, to investigate land-use planning, and to evaluate road network traffic conditions.
However, although taxi GPS data can reflect drivers' behavior fairly accurately, analysis at the passenger level needs to combine passenger characteristic surveys and booking data from Internet-booking applications with the taxi GPS data [35,36] in order to analyze passengers' trips and their relationship with land use. In addition, taxi service and the public passenger transport system are strongly complementary in big cities. In the future, we will take the main public transit facilities into account in taxi demand analysis.
On the other hand, based on the expected customer waiting time distribution, the approximate formula for the customer waiting time distribution can be validated and modified for different land use types, which will be more useful in describing the principal characteristics of the taxi market that affect customers' decisions and taxi drivers' cruising time [37,38].

Figure 1: An illustration of the mean center locations of each taxi driver and all taxi drivers.

Figure 2: Spatial distribution of taxi driver's drop-off activity space mean center (from Monday to next Tuesday).

Figure 4: Eight TAZs' passenger pick-up (blue line) and drop-off (red line) service statistics. Panels show the taxi pick-up and drop-off service frequency of TAZ1 through TAZ8.
Panels: histogram of service time for TAZ1, TAZ2, TAZ3, TAZ4, TAZ5.

Table 1: An example trace of taxi GPS data for a single vehicle.

Table 2: The detailed information of the eight TAZs.

Table 3: Peak hour statistics of different TAZs.

Table 4: Summary of each taxi vehicle's service frequency in each TAZ.

Table 5: Calculation table of taxi station's service radius (unit: km).