Exploring Urban Taxi Drivers’ Activity Distribution Based on GPS Data

With the rapid development of information communication technology and data mining technology, we can obtain taxi vehicles’ real-time operation status from large-scale taxi GPS trajectory data and explore the drivers’ activity distribution characteristics. Based on 204 continuous hours of operation data from 3198 taxi vehicles in Shenzhen, China, this paper analyzes urban taxi drivers’ activity distribution characteristics at different temporal and spatial levels. At the time level, we identify differences in the daily taxi operation pattern (weekdays versus weekends), continuous time in one day, passengers’ in-vehicle time, and taxi drivers’ operation frequency; at the space level, we explore taxi drivers’ searching pattern, including the spatial distribution of searching activity and the relationship between pick-up locations and drop-off locations. This research can help identify urban taxi drivers’ operation and behavior patterns and contributes to geographical activity space analysis.


Introduction
As an important component of the urban public passenger transportation system, the taxi offers all-weather, convenient, comfortable, and personalized travel services for urban residents and plays a key role in urban passenger transportation development [1]. Taking the capital of China, Beijing, as an example, there were 66,600 registered taxi vehicles in 2009, and these taxis carried 680 million passengers annually (about 1.9 million passengers per day), accounting for 9.4 percent of the total passenger volume of urban public transport (including conventional bus, metro transit, and taxi service) [2].
Because taxis operate 24 hours per day, they account for a relatively high proportion of urban traffic volume and air pollution emissions [3]. More researchers are therefore attempting to analyze taxi operation status and taxi drivers' route choice based on GPS trace data, aiming to reduce vacant travel distance and cruising time, which also saves energy. For taxi operation status analysis, researchers have examined the daily average operation distance, taxi passenger demand distribution, daily average operation frequency, and different drivers' operation patterns; the reader is referred to Hu and Feng [4], Chen [5], Jiang et al. [6], Wen et al. [7], Liu et al. [8], Hu et al. [9], and Zhang and He [10].
The above studies have paid more attention to basic taxi operation status and route choice analysis and less attention to the temporal and spatial analysis of taxi drivers' activity space. Meanwhile, each taxi driver's activity space is quite different from that of other drivers, which makes the analysis of taxi drivers' travel activity space an interesting research field to explore. When the taxi carries a passenger, the driver completes the trip in accordance with the passenger's wishes by adopting the shortest path from the origin to the destination [11]; this destination set can be called the taxi driver's drop-off locations. When the taxi is without a passenger, the driver wants to find the next potential passenger in the shortest possible time near or around the last drop-off location [10, 12]; this set can be regarded as the taxi driver's pick-up locations. Although a taxi driver will not pick up passengers at the same location each day, each driver's pick-up and drop-off locations are close to each other; the relationship between pick-up locations and drop-off locations can therefore be explored to describe the spatial distribution of taxi drivers.

Mathematical Problems in Engineering
Because taxi operation is random and lacks regularity, it is improper to describe a taxi driver's travel activity pattern by the order of places visited or by arrival at a fixed time. How, then, can the regularity of a taxi driver's operation space and time period be described, and is there any difference between weekends and weekdays? In the urban road network, when a taxi completes a trip at the passenger's destination, the driver enters the without-passenger stage and searches for the next passenger within a range of regions or areas. Taxi drivers may thus search and cruise around some region to find their next passenger, and there may be a relationship between a taxi driver's covering area and operation frequency. We assume that there is a relationship between the two activity spaces, drop-off locations and pick-up locations. Each taxi driver's activity space dynamics can also be explored as time evolves.
As a geographic concept, the activity space means "a composite of the locations where an individual conducts routine activities" [13-15]. A major application of activity theory is the empirical measurement and analysis of space time activity (STA) data, that is, records of where and when individuals conducted activities over a daily, weekly, or monthly cycle. Wang and Cheng [15] summarized the basic components of the activity theory, which include the activity, activity frequency, activity destination, trip, transport mode(s), and the activity space.
The special work characteristics of taxi drivers, as a special group, make their daily activity space quite different from the daily mobility of commuting workers or students, which is characterized by, for example, regular arrival at the workplace at a relatively fixed time [16], frequent visits to certain shops or restaurants [17], and visits to places in a certain order [18]. Some researchers focus on how taxi drivers' cognitive space or travel time differs from that of the general public, which can reflect taxi drivers' spatial-temporal cognition of the urban space. Pioneering work can be traced to Giraudo and Peruch [19], Peruch et al. [20], Spiers and Maguire [21], and Wakabayashi et al. [22]. Besides the traditional method of comparing cognitive maps, Wakabayashi et al. [22] also adopted the SDE (standard deviational ellipse) to analyze the difference between taxi drivers and university students.
The aim of this paper is to provide measurements for analyzing the taxi driver's activity space at the time level and the individual level. At the time level, the taxi driver's daily (weekdays versus weekends) pick-up locations and drop-off locations are measured and analyzed, and the relationship between the two activity spaces is explored; at the individual level, each taxi driver's activity space dynamics are explored as time evolves. We introduce the geographic concept of activity space into this research to analyze the taxi driver's activity space, which can reveal the relationship between the taxi driver's pick-up locations and drop-off locations and help drivers reduce cruising time and improve operation frequency. This research can also help save taxi energy consumption and lower air pollution emissions, supporting a more sustainable and environmentally friendly development of the urban taxi industry.
The paper is organized as follows. Section 2 provides an overview of taxi driver behavior analysis and a comparison of activity space measurements. In Section 3, we present the taxi GPS trace data source and the taxi driver's activity space measures in detail. Section 4 describes and compares the analysis results. Section 5 presents the discussion. Finally, we conclude the paper in Section 6.

Literature Review
Around the world, research on taxi drivers' travel behavior has received attention since the work of Pailhous [23] and Michon [24]. Many factors influence the taxi driver's travel choice behavior in the actual road network environment, such as the driver's familiarity with the road network, traffic control devices, road construction, and other urban traffic management and control measures. The development of ICT (information communication technology) provides a new way to explore the taxi driver's route choice or way-finding behavior, which involves shortest path optimization and the selection of driving destination or direction, based on a huge number of taxi trajectories. In particular, based on taxi GPS traces and data mining technology, one can obtain experienced (smart) taxi drivers' route choice behavior in real time and provide guidance for the general public's shortest path choice [11].
Another research field focuses on the taxi driver's practical travel behavior. Murakami and Wagner [25] adopted a PDA (personal digital assistant) with a GPS receiver to obtain taxi vehicles' speed and daily travel information, which can be seen as the first trial in analyzing taxi drivers' travel behavior using GPS trace data; this method can provide higher data quality on travel start and end times, total trip time, and destination locations than daily travel diary and telephone retrieval methods. Liu et al. [8] described the difference in searching behavior between top taxi drivers and ordinary drivers in Shenzhen and found that the top drivers pay more attention to less competitive and more profitable locations, rather than the central business district. Based on taxi GPS trajectory data in Wuhan, Yue et al. [26] forecasted taxi drivers' pick-up locations on the basis of selected drop-off locations. These studies help us understand taxi drivers' travel behavior, but they have paid less attention to the dynamic pattern of taxi drivers' searching behavior in their pick-up and drop-off activity space.
The activity theory has been adopted to analyze the spatial mobility dynamics of different kinds of travelers. The special work characteristics of taxi drivers, as a special group, make their daily activity space quite different from the daily mobility of commuting workers or students. Because of the nature of their profession, taxi drivers need to arrive at the destination according to the passenger's wishes or search for the next potential passenger according to their individual characteristics (driving experience and familiarity with the road network or place); there is no fixed or constant mobility route for the taxi driver to follow. Based on the summary of Wang and Cheng [15], we can identify the basic components of the taxi driver's activity theory, which are shown in Table 1. The taxi driver's activity space is a composite of the locations where a taxi driver conducts pick-up passenger activity and drop-off passenger activity, so it can be divided into pick-up activity space (location) and drop-off activity space (location). From Table 1, we can see that some basic components of the taxi driver's activity are already determined, such as the activity, trip, and transport mode(s). We therefore pay more attention to the taxi driver's activity frequency, activity destination, and activity space.
For the taxi driver's activity frequency, we can adopt a taxi's daily number of operation services to describe this component; it can also be divided into pick-up passenger activity and drop-off passenger activity. Because different passengers demand different destinations, the taxi driver's drop-off locations are irregular and follow the passengers' requirements; the taxi driver then searches for the next passenger near the last drop-off location, so the pick-up locations will have some relationship with the drop-off locations. The question therefore focuses on the pick-up locations and the drop-off locations, which together constitute the taxi driver's activity space analysis.
Ge et al. [27] clustered taxi drivers' pick-up locations during a certain time interval to obtain the centroids (mean centres) of these pick-up locations, which can serve as recommended pick-up locations with a certain probability. This approach can help a new taxi driver search for potential passengers in a certain area; meanwhile, it provides a reference for us to explore the dynamics of taxi drivers' activity spaces over time and the relationship between the area of each taxi driver's daily pick-up locations and the area of the drop-off locations. Susilo and Kitamura [16] extended the action space to the second moment of the activity locations that it contains and then examined the day-to-day variation in the second moment. The second moment of activity locations inspired our analysis of each taxi driver's day-to-day variation in the pick-up and drop-off activity space (location).
Here we hypothesize that the mean centre (centroid) of each taxi driver's activity space is related to the centroid of the activity space of all taxi drivers, just as Susilo and Kitamura [16] analyzed the relationship among workers' daily activity locations. We also analyze the taxi driver's day-to-day variation in activity space and statistically analyze the variation of the second moments.

Taxi Driver's Activity Space Measures
Previous research has used various measurements of the activity space, such as the SDE (standard deviational ellipse), mean centre, the SDE axis ratio, SDC (standard deviational circle), and kernel density. Based on existing research, we divide these measurements into two categories: the spatial distribution category and the extended second moments of activity locations measurement category.

The Spatial Distribution Category.
The spatial distribution estimates the basic parameters of the distribution; they include the mean centre, the standard deviation of the x and y coordinates, the standard deviational ellipse, the standard distance deviation, and the convex hull.
The mean centre (MC) is the average location of taxi service events in space (including pick-up and drop-off events), which can be calculated by [28]

$$\bar{x}_{mc} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y}_{mc} = \frac{1}{n}\sum_{i=1}^{n} y_i, \tag{1}$$

where $\bar{x}_{mc}$, $\bar{y}_{mc}$ are the coordinates of the mean centre, which determine the spatial location of the MC; $x_i$, $y_i$ are the coordinates of taxi service event $i$ in two-dimensional space; and $n$ is the total number of taxi service events.
Standard distance deviation (SDD) [28, 29] describes the absolute degree of dispersion of the taxi service events relative to the mean centre (MC); formula (2) is expressed in terms of $x_i$, $y_i$, the coordinates of taxi service event $i$, and $\bar{x}_{mc}$, $\bar{y}_{mc}$, the coordinates of the mean centre. Based on the mean centre and the standard distance deviation, we can draw the standard deviational circle (SDC), which expresses the dispersion of taxi service in all directions of space. The standard deviational ellipse (SDE) [28, 29] can determine the directional factors of the spatial distribution and find the main direction of the taxi service events in space. The SDE is characterized by its clockwise rotation angle together with its major axis and minor axis. The detailed formulas give the area of the SDE from its semimajor axis and semiminor axis; $\bar{x}_{mc}$, $\bar{y}_{mc}$ and $x_i$, $y_i$ are consistent with formulas (1) and (2).
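The measures in this category can be sketched in a few lines of Python. This is a minimal illustration and not the CrimeStat implementation used in the paper: it assumes planar (projected) coordinates, an n - 2 denominator for the SDD and the SDE axes, and the widely used arctangent form of the rotation angle. The function names are hypothetical.

```python
import math


def mean_centre(points):
    """Mean centre: the average x and average y of the (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance_deviation(points):
    """Dispersion about the mean centre (n - 2 denominator is an assumption)."""
    n = len(points)
    if n < 3:
        raise ValueError("at least three points are required")
    xc, yc = mean_centre(points)
    ss = sum((x - xc) ** 2 + (y - yc) ** 2 for x, y in points)
    return math.sqrt(ss / (n - 2))


def standard_deviational_ellipse(points):
    """Return (theta, s_x, s_y, area) of the standard deviational ellipse.

    theta is the clockwise rotation of the y-axis in radians; s_x and s_y are
    the deviations along the rotated x- and y-axes. With this choice of theta,
    the rotated y-axis follows the direction of greatest spread.
    """
    n = len(points)
    if n < 3:
        raise ValueError("at least three points are required")
    xc, yc = mean_centre(points)
    dx = [x - xc for x, _ in points]
    dy = [y - yc for _, y in points]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    sxy = sum(a * b for a, b in zip(dx, dy))
    a_term = sxx - syy
    b_term = math.sqrt(a_term ** 2 + 4 * sxy ** 2)
    # atan2 avoids division by zero when sxy == 0; it may differ from the
    # arctangent by pi, which describes the same axis orientation.
    theta = math.atan2(a_term + b_term, 2 * sxy)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    s_x = math.sqrt(2 * sum((a * cos_t - b * sin_t) ** 2
                            for a, b in zip(dx, dy)) / (n - 2))
    s_y = math.sqrt(2 * sum((a * sin_t + b * cos_t) ** 2
                            for a, b in zip(dx, dy)) / (n - 2))
    return theta, s_x, s_y, math.pi * s_x * s_y
```

Applied separately to one taxi's pick-up points and drop-off points, these functions would yield the two mean centres and the two ellipses that are compared in the analysis.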
Figure 1 shows an example of one taxi's operation in Shenzhen, China, from 0:00 on April 18, 2011, to 12:00 noon on April 26, 2011 (204 continuous hours). The red points represent the taxi drop-off activity locations, while the cyan points represent the taxi pick-up activity locations.
From Figure 1, we can see that the mean centre (centroid), the standard deviational ellipse, and the standard deviation of the x and y coordinates differ considerably between the taxi's drop-off locations and pick-up locations, and that each SDE is rotated clockwise through an angle. However, the mean centres of the drop-off locations and pick-up locations are quite near to each other; we may use the extended second moments of activity locations measurement category to analyze this relationship and distribution.
From Figure 2, we can see that the taxi driver's daily operation has no fixed origin, which is quite different from the home location that serves as the origin for other workers or students; meanwhile, there are no fixed operation lines or order for the taxi driver to follow. However, each taxi driver's operation has a mean centre that can be treated as a fixed location; based on this, we can measure the relationships in the taxi driver's daily operation.
Kamruzzaman and Hine [30] analyzed the correlations between different measures of activity space size; this method can help us understand the relationships between different activity space measures of each taxi driver's operation. We will analyze these relationships: (1) area size of SDE and area size of SDD of pick-up locations, (2) area size of SDE and area size of SDD of drop-off locations, (3) area size of SDE comparison between the pick-up and drop-off locations, and (4) area size of SDD comparison between the pick-up and drop-off locations.

The Extended Second Moments of Activity Locations Measurement Category.
The studies of Susilo and Kitamura [16] and Ge et al. [27] inspired our analysis of each taxi driver's day-to-day variation in the pick-up and drop-off activity space. The mean centre of each taxi driver's daily activity space may be related to the centroid of the activity space of all taxi drivers, just as Susilo and Kitamura [16] analyzed the relationship among workers' daily activity locations. We can analyze the taxi driver's day-to-day variation in activity space and statistically analyze the second moments of activity locations:
(1) the Great-circle distance between the mean centre of each taxi driver's drop-off locations and that of the pick-up locations, together with its statistical distribution (see MC pick and MC drop in Figure 1, computed using (1)),
(2) the Great-circle distance between the mean centre of each taxi driver's drop-off (pick-up) locations and that of all taxi drivers, together with its statistical distribution, as illustrated in Figure 3,
(3) the fullness of activity spaces, which describes the individual SDE feature class.
The next step is to calculate the Great-circle distance between two points on the surface of the globe. Here we adopt the simple spherical law of cosines formula [31, 32], treating the Earth as a sphere (ignoring ellipsoidal effects). The spherical law of cosines formula gives well-conditioned results down to distances as small as around 1 meter. Given the latitude and longitude of two points, the formula is as follows:

$$d = R \arccos\left[\sin(\mathrm{lat}_1)\sin(\mathrm{lat}_2) + \cos(\mathrm{lat}_1)\cos(\mathrm{lat}_2)\cos(\mathrm{long}_2 - \mathrm{long}_1)\right],$$

where $R$ is the Earth radius, for which we use the average radius of the Earth, 6371.004 kilometers; $\mathrm{lat}_i$, $\mathrm{long}_i$ represent the latitude and longitude of point $i$ (in radians), respectively; and $d$ is the Great-circle distance between the two points.
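The spherical law of cosines calculation described above can be sketched as follows. The function name is hypothetical, inputs are assumed to be in decimal degrees (converted to radians internally), and the clamping step is an added safeguard against floating-point rounding rather than part of the formula.

```python
import math

EARTH_RADIUS_KM = 6371.004  # average Earth radius used in the text


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers via the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    c = (math.sin(phi1) * math.sin(phi2)
         + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Rounding can push c slightly outside [-1, 1], where acos is undefined.
    c = max(-1.0, min(1.0, c))
    return EARTH_RADIUS_KM * math.acos(c)
```

For example, `great_circle_km(lat_pick, lon_pick, lat_drop, lon_drop)` would give the distance between one taxi's pick-up and drop-off mean centres, where the four arguments are placeholder names for those coordinates.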
Kamruzzaman and Hine [33] proposed the fullness of activity spaces to describe the individual SDE feature class. The fullness is computed from the lengths of the SDE x-axis and y-axis of the activity space locations.

For each taxi equipped with GPS, the recorded information includes taxi ID, date, time, location (longitude, latitude), velocity, driving direction, and taxi operation status (having passengers or not). Records are collected every 5 or 10 seconds; because of the traffic environment, the GPS record collection interval is not constant.
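The text does not state how pick-up and drop-off events are derived from the operation status field. One plausible approach, assumed here purely for illustration, is to treat a change from vacant to occupied as a pick-up and a change from occupied to vacant as a drop-off. The record layout and the function name below are hypothetical.

```python
def extract_events(records):
    """Split one taxi's time-ordered GPS records into pick-ups and drop-offs.

    Each record is (timestamp, longitude, latitude, occupied), where
    `occupied` is truthy when a passenger is on board. A pick-up is assumed
    at the first occupied record after a vacant one, and a drop-off at the
    first vacant record after an occupied one.
    """
    pickups, dropoffs = [], []
    previous = None
    for timestamp, lon, lat, occupied in records:
        occupied = bool(occupied)
        if previous is not None and occupied != previous:
            target = pickups if occupied else dropoffs
            target.append((timestamp, lon, lat))
        previous = occupied
    return pickups, dropoffs
```

Because the collection interval is 5 to 10 seconds and not constant, event locations obtained this way are only approximate: the true boarding or alighting point lies somewhere between two consecutive records.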

Results Analysis
To process the taxi GPS trace data, we use ArcGIS 9.3, the R statistics software, and the CrimeStat software for data mining and statistical work. Figure 4 shows the GPS data mining process.
We divide the GPS data mining into three levels: the time level, the personal (each taxi driver) level, and the comparison of the top 1% and last 1% of taxi drivers. At the time level, we want to find the daily taxi operation pattern (weekdays versus weekends), continuous time in one day, passengers' in-vehicle time, and taxi operation service frequency. These results can be used in taxi operation planning, taxi pricing models, and taxi management policy implementation. At the personal level, we want to identify the taxi driver's searching pattern, including the spatial distribution of searching activity and the relationship between the pick-up locations and the drop-off locations, which can be used in taxi dispatching and in setting taxi stop locations. From the comparison of the top and last 1% of taxi drivers, differences can be identified that will help reduce passengers' waiting time and increase drivers' daily operation service frequency.

Time Level Analysis.
Our first analysis takes the operations of all 3198 taxi vehicles as a whole to explore their hourly temporal operation pattern, which includes the analysis of passenger pick-up locations and drop-off locations.
Figure 5 shows the operation service frequency of all taxis, from which we can make the following observations. (1) On weekdays, there are three peak times in one day, which are 9 am to 10 am, 3 pm to 4 pm, and 10 pm to 11 pm, and the highest peak occurs from 10 pm to 11 pm.
(2) At weekends, there are only two peak times: a daytime peak (11 am to 12 pm on Saturday and 2 pm to 3 pm on Sunday) and the 10 pm to 11 pm peak at night. Meanwhile, the time at which trips begin to increase at weekends is an hour later than on weekdays.
(3) The trend of pick-up and drop-off changes with the time is almost the same from Monday to Friday, and the lowest service time is from 5 am to 6 am.
(4) When the service frequency increases, the number of pick-up locations will be higher than the drop-off locations; on the contrary, when the service frequency decreases, the number of pick-up locations will be lower than the drop-off locations.
Here we divide the service frequency by the total number of taxis (3198) to obtain the daily operation service frequency of each taxi. The results are shown in Table 2; the lowest taxi service frequency is less than 0.4 per hour, and the highest is almost 2.7 per hour. Each taxi operates between 39 and 44 times per day on average, which is higher than the statistics of other cities, such as Harbin, where each taxi operated 32 times per day in 2008 [3], and Beijing, where each taxi operated 31 to 33 times per day in 2007 [34]. For 2006 to 2007, Chen [5] also calculated the average number of operations for each taxi per day, which was between 35 and 40, also lower than that in 2011.
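The normalization described above is simple arithmetic, sketched below. The hourly trip counts used in the usage note are placeholders chosen for illustration and are not values from Table 2; the function names are hypothetical.

```python
TOTAL_TAXIS = 3198  # fleet size reported in the text


def per_taxi_hourly_frequency(hourly_trip_counts, n_taxis=TOTAL_TAXIS):
    """Average number of services per taxi for each hour."""
    return [count / n_taxis for count in hourly_trip_counts]


def per_taxi_daily_frequency(hourly_trip_counts, n_taxis=TOTAL_TAXIS):
    """Average number of services per taxi over all supplied hours."""
    return sum(hourly_trip_counts) / n_taxis
```

For instance, a hypothetical hour with 6396 fleet-wide trips corresponds to 2.0 services per taxi in that hour, and 24 such hourly counts summed and divided by 3198 give the daily figure that is compared with other cities.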
Following Kamruzzaman and Hine [30, 33], we analyze the correlations between the different measures of activity space area, which are shown in Table 3 and Figure 6.
From Figure 6 and Table 3, we can see that the drop-off locations cover a larger area than the pick-up locations, which reflects the disordered and dispersed character of the passengers' destination distribution. Meanwhile, the area of the locations' SDD is larger than that of the SDE, for both pick-up and drop-off locations.

The Personal Level.
In this part, we calculate the distance between the mean centre of each taxi driver's pick-up locations and that of the drop-off locations and then analyze the distribution of these distances and the relationship between the pick-up locations and drop-off locations.
Table 4 shows the statistics of the service frequency of all 3198 taxi vehicles and of the distance between the mean centres of each taxi driver's pick-up locations and drop-off locations. Together with Figure 7, it shows that more than 80 percent of the taxi vehicles had an operation service frequency between 300 and 500 during the 204 hours.
Figure 8 shows the distance distribution and the frequency distribution curve of the mean centres of taxi pick-up and drop-off passenger locations, respectively. The distance between the mean centres of each taxi vehicle's drop-off and pick-up locations is below 5000 meters; however, the distance between each taxi vehicle's pick-up (or drop-off) mean centre and that of the whole operation ranges from 500 meters to 30000 meters, which reflects the dispersion and randomness of each taxi vehicle's operation. The density line is blue, and the normal distribution line is red. From Figure 8, we can also see that the frequency distribution of the mean centres of taxi pick-up and drop-off passenger locations shows some aggregation, so we adopt the nearest neighbor hierarchical clustering method to cluster each taxi vehicle's pick-up and drop-off locations; the results are shown in Figures 9 and 10, respectively.
The nearest neighbor hierarchical clustering method can identify small geographical environments where pick-up or drop-off activities are concentrated. The linkages between several small clusters can be seen through the second- and third-order clusters, so that clustering of points is identified at different geographical scales. The analysis can also show the number of points and the density of points found per cluster.
From Figure 11, we can see that the drop-off locations cover a larger area than the pick-up locations, which also reflects the disorder of the passengers' destination distribution.
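The idea of first-order clusters that are themselves grouped into higher-order clusters can be illustrated with a simplified sketch. This is not the CrimeStat routine used in the paper: it assumes planar coordinates, uses a fixed distance threshold per order rather than one derived from a random nearest neighbor expectation, and links points by simple chaining. The function names, thresholds, and minimum cluster size are hypothetical.

```python
import math


def threshold_clusters(points, threshold, min_points):
    """Group points linked by chains of neighbours within `threshold`.

    Only groups with at least `min_points` members are kept.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, point in enumerate(points):
        groups.setdefault(find(i), []).append(point)
    return [g for g in groups.values() if len(g) >= min_points]


def cluster_centre(cluster):
    """Mean centre of the points in one cluster."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)


def hierarchical_orders(points, thresholds, min_points):
    """Cluster points, then cluster the resulting centres, order by order."""
    orders = []
    current = list(points)
    for threshold in thresholds:
        clusters = threshold_clusters(current, threshold, min_points)
        if not clusters:
            break
        orders.append(clusters)
        current = [cluster_centre(c) for c in clusters]
    return orders
```

The first element of the returned list holds the first-order clusters of events, and each later element holds clusters of the previous order's centres, which mirrors how second- and third-order clusters link smaller ones.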
Kamruzzaman and Hine [33] had proposed the fullness of activity spaces to describe the individual SDE feature class.This can be seen in Figure 12.

The Comparison between the Top and Last 1% Taxi Drivers' Level.
From Figure 13 and Table 5, we can see the difference between the top 1% and last 1% of taxi drivers in searching time and operation time. The top 1% of taxi drivers have shorter searching and operation times than the last 1%. More than 22 percent of the searching times of the last 1% of taxi drivers are above 60 minutes, whereas only 5 percent of those of the top 1% are above 60 minutes.
By analogy with the concept of the 85th percentile vehicle speed, we calculate the 85th percentile searching time and operation time of the top 1% and last 1% of taxi drivers, which are shown in Table 5. The 85th percentile operation and searching times of the top 1% of taxi drivers lie in the interval from 20 to 23 minutes; however, those of the last 1% of taxi drivers are 35 minutes and more than 60 minutes, respectively. That is why the top 1% of taxi drivers can complete more services than the last 1%.
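The two statistics used in this comparison can be sketched as follows. The nearest-rank definition of the percentile is an assumption, since the text does not say which definition was used, and the sample values in the usage note are placeholders, not data from Table 5. The function names are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def share_above(values, limit):
    """Fraction of values strictly greater than `limit`."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v > limit) / len(values)
```

With hypothetical searching times in minutes such as `[3, 5, 8, 12, 15, 18, 20, 22, 40, 65]`, `percentile_nearest_rank(times, 85)` selects the ninth ordered value and `share_above(times, 60)` gives the share of searches longer than one hour.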

Discussions
At the time level, we found that taxi operations on weekdays and at weekends have different patterns: on weekdays there are three peak hours during one day, whereas at weekends there are only two. Meanwhile, the time level of taxi operation differs from the peak hour distribution of normal traffic demand. There are always two peak hours in normal traffic demand (the dual hump curve), which correspond to going to work and getting off

and spatial distribution. From the time level, we found the difference between the daily taxi operation patterns (weekdays and weekends), continuous time in one day, passengers' in-vehicle time, and taxi operation frequency. From the personal level (each taxi vehicle), we have identified the taxi driver's searching pattern, including the spatial distribution of searching activity and the relationship between the pick-up locations and the drop-off locations. Through the comparison between the

Figure 1: Drop-off locations, pick-up locations, SDE, and mean centre of one taxi vehicle's operation.

Figure 3: An illustration of the locations' mean centre for each taxi driver and for all taxi drivers.

Figure 6: The correlations between the different measures of activity space area.

Figure 8: The frequency distribution curve of the mean centres of taxi pick-up and drop-off passenger locations.

Figure 11: The correlations between the different measures of activity space area.

Figure 12: The fullness of activity spaces of each taxi driver's pick-up locations and drop-off locations.

Table 1: Taxi driver's basic components of activity theory.

Table 2: Each taxi driver's daily average operation service frequency statistics.

Table 3: The statistics of the different measures of activity space area.

Table 4: The statistics of the different measures of each driver's operation activity space.

Table 5: The statistics of the top 1% and last 1% taxi searching time/operation time.