Exploring Intracity Taxi Mobility during the Holidays for Location-Based Marketing

Taxi mobility information is an important source of mobile location-based information for making marketing decisions. Studying the behavioral patterns of taxis in a Chinese city during the holidays using global positioning system (GPS) data can therefore yield insights into people’s holiday travel patterns, as well as into the odd-even day vehicle prohibition system. This paper studies the behavioral patterns of taxis during specific holidays in terms of pick-up and drop-off locations, travel distance, mobile step length, travel direction, and radius of gyration on the basis of GPS data. Our results support the idea of a polycentric city. The results show no significant changes in the distribution of pick-up and drop-off locations, travel distance, or travel direction during holidays in comparison to workdays, which suggests that human travel by taxi has a stable regularity. However, the radius of gyration of most taxis becomes significantly larger during holidays, which indicates more long-distance travel. The current study will be helpful for location-based marketing during the holidays.


Introduction
Human behavior is the source of all social phenomena, including location-based marketing. Current research mainly focuses on analyzing human behavior quantitatively within statistical physics and complexity science, and many researchers have analyzed human behavior from different aspects. Barabási (2005) studied the power-law characteristics in the distribution of the interevent time of human communication behavior [1]. Brockmann et al. (2006) analyzed data on dollar bills in circulation and obtained the moving step of each dollar bill in space. The authors concluded that the moving step probability has the obvious characteristics of a power-law distribution, with an exponent of −1.59 [2]. Similarly, Gonzalez et al. (2008) used mobile phone data to analyze the spatial distribution characteristics of the moving steps of mobile phone users. They observed that the moving step is in line with a power-law distribution with an exponent of −1.75 ± 0.15 [3]. Song et al. (2010) also reported that the accuracy of predicting human behavior lies in the range of 70% to 93% [4]. These results were based on an analysis of the mobile phone records of one million users over a period of three months.
Recently, the analysis and prediction of human spatial movement [5] has become an emerging research topic [6][7][8][9][10] in fields such as urban planning [11], the spread of infectious diseases [12], and catastrophic emergency management [13]. Most researchers in the field have only been able to analyze the behavioral patterns of human movement based upon data collected through surveys. Nowadays, researchers are also exploring the use of personal mobility data for analyzing human behavior, such as vehicle global positioning system (GPS) data [14] and mobile phone records [3].
GPS data provides a precise spatial resolution. Its capability to represent people's mobility features has led to its wide use for the analysis of human behavior. Rhee et al. (2011) collected and analyzed the GPS data of 44 volunteers. Their results confirm that the moving step of different groups of volunteers in different scenarios approximates a power-law distribution [15], whereas a few studies suggest an exponential distribution of the distance traveled by taxi passengers [16,17]. Considerable research has collected car GPS data in Rome, Bologna, Senigallia, and Florence and statistically analyzed private car drivers' travel trajectories. It was found that vehicle travel distance has an exponential distribution that remains invariant over time [18][19][20].
In the present paper, we analyze the behavior of human spatial movement by using taxi GPS data collected in Tianjin, China. We study the impact of holidays on human movement in various aspects, such as the distribution of urban residents' pick-up and drop-off locations by taxi, travel distances, travel directions, and taxis' scope of activities. The findings of the study are as follows:
(i) Pick-up and drop-off events for urban residents traveling by taxi are mainly concentrated in three time periods: (a) morning, 8:00-12:00; (b) afternoon, 14:00-20:00; (c) evening, 22:00-0:00 (next day). Within the morning period, they are mainly concentrated between 8:00 and 10:00 during weekdays and between 10:00 and 12:00 during the holidays.
(ii) Pick-up and drop-off locations for urban residents traveling by taxi are mainly distributed throughout Tianjin's main urban area and the Binhai New Area, as well as two isolated hub locations: Tianjin Binhai International Airport and Tianjin South Railway Station. A heterogeneous distribution of residents' travel distance by taxi and a centrally symmetric distribution of travel direction are observed. The holidays do not have a significant impact on the distribution of pick-up and drop-off locations, travel distance, or travel direction.
(iii) The taxis' radius of gyration becomes significantly larger during the holidays.
The remainder of this paper is organized as follows: Section 2 describes the GPS data and its preprocessing. Section 3 presents the statistical analysis and the results. Finally, Section 4 discusses the results and concludes the paper.

Data Description
In the current study, we collected the GPS data of 3051 taxis in Tianjin during the month of October 2012. The important features of the collected data are the taxi's vehicle identification number, vehicle meter status, longitude, latitude, date, and time. Empty taxis are sampled once every 20 seconds, and taxis carrying passengers are sampled once per minute. The collected data is presented in the form of records as described below:
Taxi ID: the unique ID of each taxi
Time: the sample timestamp (YYYY-MM-DD HH:MM:SS)
GPS position: the longitude and latitude of the sampled taxi at the sample time
Meter state: indicates whether the taxi meter is running; 0 means that there are no passengers in the taxi, and 1 means that there are passengers in the taxi
The collected data is preprocessed for further analysis by extracting the trips of each taxi and dropping the trips that fall outside the city under study.
The meter state identifies the presence of passenger(s) in the taxi. Therefore, the sequence of meter states along a taxi's trajectory looks like 000000111111100000. We extracted the O location (origin location, also called pick-up location) and the D location (destination location, also called drop-off location) for the purpose of analyzing travel distance and direction, selecting the OD locations as follows:
O location: the location where the meter state changes from 0 to 1
D location: the location where the meter state changes from 1 to 0
We then calculated the Euclidean distance from the latitude and longitude coordinates of each pair of OD locations and removed invalid data (pairs whose distance is too large or too small, or whose O and D locations fall in different time periods). Finally, we have a total of 1,957,470 records of O locations or D locations. The data statistics are shown in Table 1.
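The OD selection rule can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the `Sample` record, the function names, the flat-earth projection used for the straight-line distance, and the demo trajectory are all assumptions introduced here, and the first sample after each meter-state change is taken as the O or D location.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import cos, hypot, radians

KM_PER_DEG_LAT = 111.32  # assumed approximate length of one degree of latitude


@dataclass
class Sample:
    taxi_id: str
    time: datetime
    lon: float
    lat: float
    meter: int  # 0 = no passengers, 1 = passengers on board


def extract_trips(samples):
    """Pair each 0->1 transition (O) with the next 1->0 transition (D).

    `samples` must belong to one taxi and be sorted by time.
    """
    trips = []
    origin = None
    for prev, cur in zip(samples, samples[1:]):
        if prev.meter == 0 and cur.meter == 1:
            origin = cur
        elif prev.meter == 1 and cur.meter == 0 and origin is not None:
            trips.append((origin, cur))
            origin = None
    return trips


def od_distance_km(origin, dest):
    """Straight-line distance on a local flat-earth projection (an assumption)."""
    mean_lat = radians((origin.lat + dest.lat) / 2)
    dx = (dest.lon - origin.lon) * KM_PER_DEG_LAT * cos(mean_lat)
    dy = (dest.lat - origin.lat) * KM_PER_DEG_LAT
    return hypot(dx, dy)


# Hypothetical trajectory whose meter states read 0001111000.
start = datetime(2012, 10, 1, 8, 0, 0)
demo = [
    Sample("demo-taxi", start + timedelta(minutes=i), 117.20 + 0.01 * i, 39.10, int(state))
    for i, state in enumerate("0001111000")
]
demo_trips = extract_trips(demo)
for o, d in demo_trips:
    print(o.time, d.time, round(od_distance_km(o, d), 2))
```

A trip that is still open at the end of the trajectory is dropped, which is one simple way to avoid OD pairs that span different time periods.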
Every year, National Day is celebrated on October 1st to commemorate the founding of the People's Republic of China. The seven-day holiday from October 1st through 7th is the so-called "Golden Week." During the Golden Week, more Chinese people than usual travel around the country. Therefore, in order to analyze the impact of holidays, we divided the data collected in October 2012 into four parts by time (each part covering 7 days): (1) Oct 01-Oct 07; (2) Oct 08-Oct 14; (3) Oct 15-Oct 21; (4) Oct 22-Oct 28, as highlighted in Table 1. Here, the first part (Oct 01-Oct 07) represents the "Golden Week" of National Day.

Analysis of Time-Sharing Statistics of OD Locations.
For each seven-day slot, we count the number of OD locations in every two-hour interval (such as 0:00-2:00). The resulting statistics are shown in Figure 1. Seven 24-hour cycles can be clearly identified in Figure 1, which indicates that the distribution of OD locations repeats daily. The curves of the number of O locations and the number of D locations in the four slots are very similar: OD locations are mainly concentrated at 8:00-12:00 in the morning, at 14:00-20:00 in the afternoon, and at 22:00-0:00 (the next day) in the evening. However, the four curves also differ in some respects. The morning concentration lies between 10:00 and 12:00 during the holidays (the red curve) and weekends (the last two cycles of the black, green, and blue curves), but between 8:00 and 10:00 during weekdays (the first five cycles of the black, green, and blue curves). This shows that urban residents' morning travel by taxi is generally earlier on weekdays than during the holidays and weekends. Moreover, between 14:00 and 20:00, the number of OD locations fluctuates around 8000 during the holiday period (the red curve) and weekends (the last two cycles of the black, green, and blue curves), while it fluctuates around 7000 during weekdays (the first five cycles of the black, green, and blue curves). This also suggests that urban residents travel by taxi less on weekdays than during the holidays in the afternoon.
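The two-hour counting step can be illustrated with a short Python sketch. It assumes timestamps in the YYYY-MM-DD HH:MM:SS record format described in the data section; the function name and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def two_hour_counts(timestamps):
    """Count events per (date, start hour of the two-hour interval)."""
    counts = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        counts[(t.date(), (t.hour // 2) * 2)] += 1
    return counts


# Hypothetical pick-up timestamps, for illustration only.
demo_pickups = [
    "2012-10-01 08:15:00",
    "2012-10-01 09:59:59",
    "2012-10-01 10:00:00",
    "2012-10-02 23:30:00",
]
demo_counts = two_hour_counts(demo_pickups)
for (day, hour), n in sorted(demo_counts.items()):
    print(day, f"{hour:02d}:00-{hour + 2:02d}:00", n)
```

Keying the counter by date as well as by interval keeps the seven daily cycles of a slot separate, so they can be plotted one after another.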

Analysis of Spatial Distribution of OD Locations.
Mobile step length is an important metric of mobility. Spatial displacement is commonly used as the mobile step length in studies of human mobility, because displacement can represent mobility behavior without being affected by the details of paths [21,22].
Figure 2 shows the distribution of displacement by taxi for residents in Tianjin. The displacement distribution cannot be described by a power-law distribution alone. The current paper therefore compares four distributions, namely, the power-law distribution with an exponential cutoff (PLEXP), the lognormal distribution (LN), the Weibull distribution (WB), and the exponential distribution (EXP). The distributions are represented by the red dotted line, blue solid line, green dotted line, and carmine dotted line, respectively. Based on the results in Table 2 and the Akaike information criterion (AIC), it can be concluded that the displacement distribution for taxi passengers follows the lognormal distribution and the exponential distribution. Table 3 presents the optimal fitting parameters. Another interesting phenomenon is that the displacement distribution can be partitioned into two parts at 20 km. The first part shows a slow increase followed by a steady decrease. The second part shows an obvious peak.
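As an illustration of how Akaike weights rank candidate models, the following sketch fits only the two candidates that have closed-form maximum-likelihood estimates (EXP and LN) and converts their AIC values into weights. The sample displacements are hypothetical, and PLEXP and WB are omitted because they need numerical fitting; the paper does not state which fitting procedure was actually used.

```python
import math


def exponential_loglik(xs):
    """Maximized log-likelihood of an exponential model (1 parameter)."""
    n = len(xs)
    rate = n / sum(xs)  # maximum-likelihood estimate of the rate
    return n * math.log(rate) - rate * sum(xs)


def lognormal_loglik(xs):
    """Maximized log-likelihood of a lognormal model (2 parameters)."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * loglik


def akaike_weights(aic_by_model):
    """Relative likelihood of each model, normalized to sum to 1."""
    best = min(aic_by_model.values())
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aic_by_model.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# Hypothetical displacements in km, for illustration only.
demo_km = [0.8, 1.2, 1.5, 1.9, 2.4, 3.1, 3.8, 4.6, 6.0, 8.5, 12.0, 19.0]
demo_aic = {
    "EXP": aic(exponential_loglik(demo_km), 1),
    "LN": aic(lognormal_loglik(demo_km), 2),
}
demo_weights = akaike_weights(demo_aic)
print(demo_weights)
```

The model with the largest weight is the one best supported by the data among the candidates compared.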
It can be observed from Figure 2 that, for travel with a displacement of less than 20 km, the displacement distribution density function $P(\Delta r)$ first increases to a high level as the displacement increases and then decreases slowly. $P(\Delta r)$ reaches its peak value at $\Delta r = r_m$, where $r_m$ varies with time and ranges from 1.3 to 2.3 km. The initial increase in $P(\Delta r)$ is easy to understand, as residents prefer to travel on foot or by bike for distances of less than 1 km. Moreover, $r_m$ also indicates that people take travel costs into account in their daily life.
Analysis of the data sets over the four time periods reveals that 97% of passengers have a displacement of less than 20 km. This coincides with everyday experience: considering travel costs, residents prefer to travel long distances on public transport or by private car, rather than by taxi, during holidays. From Figure 3, it can be deduced that the frequency of trips with a displacement in the range from $r_m$ to 20 km decreases exponentially; the corresponding parameters are given in Table 4. The values indicate that this range accounts for more than 67% of the travel distance of all passengers. The fitted parameter varies with time, but its values remain close to each other. In a nutshell, the displacements of residents traveling by taxi in different cities share similar statistical features. All trips above 2 km follow the two-piece exponential distribution. The distribution during National Day does not differ greatly from the distribution during regular workdays.
In order to facilitate the statistical analysis of the spatial distribution of OD locations, we divide the map into grids (the resolution of each grid cell is 0.01° of longitude by 0.01° of latitude), count the number of OD locations in each cell, and mark the cells containing more than 1000 OD locations on the map, as depicted in Figure 4. It can be observed from Figure 4 that the spatial distribution of OD locations is similar in the four slots of the study period and is concentrated mainly in the Tianjin urban area (large gathering area in the red ellipse on the left), the Binhai New Area (red ellipse on the right beside the sea), and two isolated locations: Tianjin Binhai International Airport (orange grid on the right of the Tianjin urban area) and Tianjin South Railway Station (yellow grid on the bottom left of the Tianjin urban area). Of these areas, the airport is a hot spot with around 4400 OD locations, which accounts for about 4.7% of the total OD locations. The distribution of OD locations depicted in Figure 4 also reflects the geographic characteristics of Tianjin: "Two City," namely, the Tianjin urban area and Binhai New Area.
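The grid counting described for Figure 4 can be sketched as follows. The function names and sample points are hypothetical; the 0.01 degree resolution and the "more than threshold" rule follow the text, with a small threshold used in the demo instead of 1000.

```python
from collections import Counter
from math import floor


def grid_cell(lon, lat, resolution=0.01):
    """Index (column, row) of the grid cell that contains a point."""
    return (floor(lon / resolution), floor(lat / resolution))


def hot_cells(points, threshold, resolution=0.01):
    """Cells that contain more than `threshold` OD locations."""
    counts = Counter(grid_cell(lon, lat, resolution) for lon, lat in points)
    return {cell: n for cell, n in counts.items() if n > threshold}


# Hypothetical OD locations (longitude, latitude), for illustration only.
demo_points = [
    (117.205, 39.105),
    (117.2099, 39.1001),
    (117.215, 39.105),
    (117.206, 39.108),
]
demo_hot = hot_cells(demo_points, threshold=2)
print(demo_hot)
```

Points that lie exactly on a cell boundary can land on either side because of floating-point division, so a production version would need an explicit rounding rule.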

Analysis of Residents Travel Distance.
Travel distance is an important measurement in describing travel behavior. It is measured by calculating the Euclidean distance between the respective pairs of OD locations. The probability distributions of the residents' travel distance are shown in Figure 5. From Figure 5, it can be seen that the probability curves representing the different slots of October 2012 are also very similar. This similarity indicates that residents' travel distance by taxi is not affected by holidays.
Here, in Figure 5, the red curve represents the residents' travel distance distribution during National Day; the other three curves (black, green, and blue) represent the residents' travel distance distribution on the rest of the days. There is no significant difference among the four curves, which indicates that the holidays have no effect on residents' taxi travel distance. The Hellinger distance [23][24][25][26] is one of the most commonly used metrics for measuring the similarity between two probability distributions, so we use it to compare the distributions of travel distance. The Hellinger distance between two continuous probability density functions $p(x)$ and $q(x)$ over a domain $X$ is defined as follows [27]:

$$H^2(p, q) = \frac{1}{2} \int_X \left( \sqrt{p(x)} - \sqrt{q(x)} \right)^2 dx = 1 - \int_X \sqrt{p(x)\, q(x)}\, dx.$$

For discrete distributions $P = (p_1, \ldots, p_k)$ and $Q = (q_1, \ldots, q_k)$, the Hellinger distance is computed as follows:

$$H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2}.$$

We computed this measure for the travel distance distributions of the four slots of the study period and listed the values in Table 5. The lower left and upper right of Table 5 are perfectly symmetric, because the measure between $p(x)$ and $q(x)$ is the same as that between $q(x)$ and $p(x)$. It can be observed from Table 5 that the value for every pair of the four parts is greater than 0.999. Since $H = 0$ for identical distributions, values this close to 1 are consistent with a similarity score such as the affinity $\int_X \sqrt{p(x)\, q(x)}\, dx = 1 - H^2$, and they indicate a high similarity between the mobility patterns of any two parts. However, the value between National Day and each of the other three periods is about 0.9998, whereas the value between every pair of the other three parts is higher than 0.9999. This is a nuanced difference that is reflected by the red probability curve in Figure 3, which takes the lowest value among the four curves after 20 km.
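A discrete Hellinger computation on binned trip counts can be sketched as below. The bin counts are hypothetical, and the `affinity` helper (the Bhattacharyya coefficient, equal to 1 minus the squared distance) is included only as an assumption about how similarity values close to 1 could arise; the paper does not state which form was tabulated.

```python
import math


def normalize(counts):
    """Turn non-negative bin counts into a probability vector."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive sum")
    return [c / total for c in counts]


def hellinger_distance(p, q):
    """Standard Hellinger distance: 0 for identical, 1 for disjoint support."""
    if len(p) != len(q):
        raise ValueError("distributions must use the same bins")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2)


def affinity(p, q):
    """Bhattacharyya coefficient, equal to 1 - H^2; 1 means identical."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Hypothetical trip counts per distance bin for two periods.
holiday = normalize([120, 300, 260, 150, 40])
ordinary = normalize([110, 310, 270, 140, 45])
print(hellinger_distance(holiday, ordinary), affinity(holiday, ordinary))
```

Both functions are symmetric in their arguments, which is why a table of pairwise values mirrors across its diagonal.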
Therefore, it indicates that long-distance travel by taxi declined during the holiday period.

Analysis of Residents Travel Direction.
Travel direction is also an important measurement in describing travel behavior. Each trip can be represented as a vector (also called an OD vector) in space. The distribution of the residents' travel directions during the four slots of the study period is depicted in a polar coordinate system in Figure 6. It can be observed from Figure 6 that the overall distribution of travel directions is very similar across the four slots and has central symmetry. This shows that the movements of the vast majority of individuals can be considered round trips, such as leaving home to go to work in the morning and coming home from work in the evening. The elliptical shape of the curve indicates an uneven distribution of travel direction: travel in the northwest and southeast directions is more frequent than travel in the northeast and southwest directions.
We also calculate the Hellinger distance of travel direction between each pair of the four slots in the study period; the computed values are shown in Table 6. It can be observed from Table 6 that its lower left and upper right parts are also symmetric and that each value is greater than 0.999, which indicates a high similarity between the mobility patterns of every pair of slots. These similarities are more pronounced than those of travel distance. We further divided the data into two groups:
Weekdays (all the weekdays from Oct 8 to Oct 31)
Weekends (all the weekends from Oct 8 to Oct 31)
We compared the distributions of travel distances and travel directions between these two groups, as depicted in Figure 7. It can be observed that travel distance and travel direction by taxi are very similar on weekends and weekdays. This indicates that the regularity of human travel by taxi is not affected by holidays.
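The direction of an OD vector and its angular histogram can be sketched as follows. The angle convention (degrees counterclockwise from east), the 30-degree bins, the latitude correction of the east-west component, and the sample trips are all assumptions made for illustration.

```python
import math


def direction_deg(o_lon, o_lat, d_lon, d_lat):
    """Angle of the OD vector in degrees, counterclockwise from east."""
    mean_lat = math.radians((o_lat + d_lat) / 2)
    dx = (d_lon - o_lon) * math.cos(mean_lat)  # shrink longitude by latitude
    dy = d_lat - o_lat
    return math.degrees(math.atan2(dy, dx)) % 360.0


def direction_histogram(trips, n_bins=12):
    """Count trips per angular bin of width 360 / n_bins degrees."""
    width = 360.0 / n_bins
    hist = [0] * n_bins
    for o_lon, o_lat, d_lon, d_lat in trips:
        angle = direction_deg(o_lon, o_lat, d_lon, d_lat)
        hist[int(angle // width) % n_bins] += 1
    return hist


# Hypothetical round trip: out in the morning, back in the evening.
demo_trips_dir = [
    (117.20, 39.10, 117.25, 39.13),
    (117.25, 39.13, 117.20, 39.10),
]
demo_hist = direction_histogram(demo_trips_dir)
print(demo_hist)
```

A round trip contributes one count to a bin and one to the bin 180 degrees away, which is how commuting produces a centrally symmetric histogram.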

Analysis of the Scope of Activities of Taxis.
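The scope of a taxi's activities can be represented by its radius of gyration, which is defined as follows [3,28]:

$$r_g^a(t) = \sqrt{\frac{1}{n_c^a(t)} \sum_{i=1}^{n_c^a(t)} \left( \vec{r}_i^{\,a} - \vec{r}_{cm}^{\,a} \right)^2},$$

where $\vec{r}_i^{\,a}$ represents the $i = 1, \ldots, n_c^a(t)$ positions recorded for user $a$ and $\vec{r}_{cm}^{\,a} = \frac{1}{n_c^a(t)} \sum_{i=1}^{n_c^a(t)} \vec{r}_i^{\,a}$ is the center of mass of the trajectory [29].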
Computing the scope of a taxi's activities requires some preprocessing. We compute the radius of gyration of each taxi for every day in the month of October and keep only the taxis whose radius of gyration is larger than 1000 m (a radius of gyration of less than 1000 m is regarded as invalid data in this study). This leaves 2198 taxis, for which we calculate the radius of gyration in the four slots of the study period. The resulting statistical distributions are plotted in Figure 8.
It can be observed that, for radius of gyration values greater than 10 km, the curve for National Day (the red line) is significantly higher than those of the other three periods. This indicates that, during National Day, most taxis have a wider range of activities. The other three curves (black, green, and blue) are very close but not exactly the same, which also shows (1) the diversity of the taxis' movements and (2) that people's activities tend to be steady during ordinary times [30].
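The radius of gyration and the 1000 m screening step can be sketched as follows. The projection of longitude and latitude to kilometres, the constant `KM_PER_DEG_LAT`, and the two demo tracks are assumptions introduced for illustration; the paper does not specify how coordinates were converted to distances.

```python
import math

KM_PER_DEG_LAT = 111.32  # assumed approximate length of one degree of latitude


def radius_of_gyration_km(points):
    """Root-mean-square distance of (lon, lat) points from their center of mass."""
    n = len(points)
    if n == 0:
        raise ValueError("at least one position is required")
    mean_lat = sum(lat for _, lat in points) / n
    scale = math.cos(math.radians(mean_lat))
    xs = [lon * KM_PER_DEG_LAT * scale for lon, _ in points]
    ys = [lat * KM_PER_DEG_LAT for _, lat in points]
    cx, cy = sum(xs) / n, sum(ys) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in zip(xs, ys)) / n)


def keep_valid(rg_by_taxi, min_km=1.0):
    """Drop taxis whose radius of gyration does not exceed the screening threshold."""
    return {taxi: rg for taxi, rg in rg_by_taxi.items() if rg > min_km}


# Hypothetical tracks: taxi-A moves 0.2 degrees north-south, taxi-B never moves.
demo_tracks = {
    "taxi-A": [(117.20, 39.00), (117.20, 39.20)],
    "taxi-B": [(117.20, 39.10), (117.20, 39.10)],
}
demo_rg = {taxi: radius_of_gyration_km(pts) for taxi, pts in demo_tracks.items()}
demo_valid = keep_valid(demo_rg)
print(demo_valid)
```

Because every sample is weighted equally, a taxi that idles for long periods at one stand pulls its center of mass toward that stand and reduces its radius of gyration.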

Taxi Mobility throughout the Year.
We studied a data set of daily GPS tracks of 4,252 taxis in 2012, obtained from the taxi company, to characterize taxi mobility throughout the year. The data set contains over 4.5 billion GPS sample points. The tracks cover the entire city and are concentrated in the central urban zone. GPS sample points were taken at an interval of 24 seconds, and each sample point includes the serial number of the taxi, a time stamp, longitude, latitude, speed, and the number of passengers in the vehicle. We extracted over 25 million passenger trips by taxi. These passengers spent an average of 13.6 minutes inside the vehicle and had an average displacement of 4.2 km.
The number of passengers traveling by taxi in 2012 is depicted in Figure 9. It can be observed that residents' travel by taxi follows a regular weekly cycle. Some abnormal points exist where the number of travelers was extremely low. We analyzed these abnormalities and found that they occur on special days (e.g., Chinese New Year's Eve on January 22) or during terrible weather conditions. In particular, urban traffic during the rainstorm on July 26 fell to one-third of the average daily level. However, ordinary holidays such as National Day had little influence on traveling behavior.

Conclusions
In this paper, we analyzed urban residents' travel behavior patterns in Tianjin by using taxi GPS data. Analyzing taxi mobility in cities during the holidays enables us to study the behavioral patterns of people, and the analysis yields useful insights for the administration of transportation needs. We analyzed the impact of the holidays on the spatial distribution of urban residents' pick-up and drop-off locations by taxi, their travel distance and travel direction, and the taxis' scope of activities. Based upon the results, we conclude the following: (1) the holidays do not affect the spatial distribution of residents' pick-up and drop-off locations by taxi, travel distance, or travel direction; (2) human travel behavior tends to have a stable regularity; (3) during the holidays, taxis have a larger scope of activities, which may be associated with the city's economy as well as taxi operating patterns. These findings will help the transportation administration to allocate appropriate resources during the holidays. However, taxis are only one of the ways residents travel, alongside other means of transportation such as walking, buses, and cars, and, in addition to GPS data, other sources such as mobile phone records can also represent human movement. Future research in the field should be focused

Figure 1: Statistics of OD location numbers at different times. (a) Number of O locations. (b) Number of D locations.

Figure 2: Distribution of displacement by taxi.

Figure 3: Distribution of displacement for short-distance travels.

Figure 4: Spatial distribution of OD locations during (a) the first part of October (Oct 01-07), (b) the second part of October (Oct 08-14), (c) the third part of October (Oct 15-21), and (d) the fourth part of October (Oct 22-28).

Figure 6: The probability distribution of residents' travel direction.

Table 1: The data statistics of GPS data records.

Table 2: Akaike weights chosen in the AIC model for displacement distribution.

Table 3: Result of optimal fitting on displacement distribution.

Table 4: Results achieved by fitting displacement in piecewise exponential distribution.

Table 5: Hellinger distance of travel distance distribution.

Table 6: Hellinger distance of travel direction distribution.