The increasing availability of public transit smart card data has enabled several studies to identify passengers with similar spatial and/or temporal trip characteristics. This paper goes one step further by investigating the relationship between passengers’ spatial and temporal characteristics. For the first time, it investigates the correlation between spatial similarity and temporal similarity among public transit passengers by developing spatial and temporal similarity measures for the public transit network from a novel passenger-based perspective. This perspective treats passengers as agents who can make multiple trips in the network. The spatial similarity measure takes into account the direction of, as well as the distance between, the passengers’ trips. The temporal similarity measure considers both boarding and alighting times in a continuous linear space. The spatial-temporal similarity correlation between passengers is analysed using histograms, Pearson correlation coefficients, and hexagonal binning. The relations of the spatial and temporal similarity values to trip time and length are also examined. The proposed methodology is applied to four days of smart card data covering 80,000 passengers in Brisbane, Australia. The results show a nonlinear spatial-temporal similarity correlation among the passengers.
Analysing passengers’ movements in a public transit network is important for understanding passengers’ travel behaviours and designing more customised public transit services. By identifying passengers with similar spatial and/or temporal trip patterns and understanding their characteristics, transit operators could design services that better meet the needs of different passenger groups and develop strategies to encourage travellers to use the existing transit network more efficiently. For instance, if a group of passengers boards a specific bus route at a specific stop and time every day and changes buses at another stop to reach their final destination, then a dedicated bus (or minibus) can be allocated to that group between their first boarding stop and last alighting stop for particular periods.
With the availability of transit smart card data that provide boarding and alighting locations and times for each passenger trip, it is now possible to analyse the spatial and temporal movement patterns of each passenger and compare them across passengers, thereby allowing a deeper understanding of individual passengers and their relationships. However, the spatial and temporal dimensions of movement each have their own measures and units, which makes it difficult to study these dimensions simultaneously [
Exploring similarities among passengers’ trips, where trip similarity can be defined in terms of spatial and/or temporal dimensions, can reveal relationships among the passengers. By identifying how similar two passengers’ trips are spatially and temporally, the “passenger similarity” can be defined as a composite measure of trip similarity between two passengers. Such a passenger-level similarity measure can support the analysis of passenger characteristics as well as the design and development of various customer-centric transit services and mobility applications. Examples of such applications include demand responsive transport (DRT) systems [
To the best of our knowledge, this paper is the first to investigate the correlation between spatial similarity and temporal similarity among public transit passengers. It measures the spatial similarity and temporal similarity of public transit passengers from a passenger-based perspective, in which each passenger is modelled by one or more trips. The spatial and temporal similarity measures are developed for the public transit network. The spatial similarity measure considers the direction of, as well as the distance between, the passengers’ trips. The temporal similarity measure considers both boarding and alighting times in a continuous linear space. The spatial-temporal similarity correlation is analysed using histograms, Pearson correlation coefficients, and hexagonal binning. In addition, the relations of the spatial and temporal similarity values to travel time and trip length are examined. The proposed methodology is applied to four days of smart card data comprising about 180,000 trip legs made by 80,000 passengers in Brisbane, Australia.
In brief, the scientific contribution of the paper is threefold. First, a passenger-based perspective that characterizes passengers by both boarding and alighting transactions’ time and location is developed to study the travel behaviour in the public transit system. Second, specific metrics based on smart card data are developed for measuring the spatial and temporal similarities between passengers in the public transit network. Third, the correlation between the spatial and temporal similarity values is investigated.
The rest of the paper is structured as follows. The Literature Review discusses recent studies in these fields. The Methodology describes the proposed methods for measuring the spatial and temporal similarities between passengers. The Results present the case study and the correlation analysis. Finally, the Conclusion includes discussion, potential applications, and plans for future work.
The joint study of the spatial and temporal aspects of public transit passengers’ trips has received attention only recently. Nishiuchi et al. (2013) explored spatial, daily, and hourly variations in travel characteristics by defining regularity indices for both spatial and temporal aspects. They investigated the variations in a smart card dataset extracted from 31,749 passengers’ trips over one month. They found no significant difference between the numbers of hourly trips on weekdays [
Kieu et al. (2015) used a modified DBSCAN algorithm to discover spatial travel patterns at stop levels from a smart card dataset [
Recent literature concentrates on data mining techniques for discovering spatial and temporal patterns in the public transit system using smart card data. Most studies focus on discovering spatial patterns or regularity levels at the stop level of the network. Stop- or route-based perspectives ignore whether the trips are made by the same or by different passengers. A few studies have examined temporal patterns by discretising the temporal dimension into time windows. However, none of the studies mentioned above explored the spatial and temporal aspects of passengers’ trips from a passenger-based perspective, which treats passengers as dynamic objects characterized by the time and location of both their boarding and alighting transactions; the previous studies consider only boarding or only alighting transactions at different levels of their analysis. In addition, none of them considered direction in measuring spatial similarity or treated time as a continuous linear space. Furthermore, none investigated the correlation between the spatial similarity and temporal similarity of passengers. Therefore, questions such as “how are passengers’ trips correlated in the spatial and temporal dimensions?”, “are passengers with similar spatial trip patterns likely to have similar temporal trip patterns (i.e., do passengers travelling to similar places also choose similar departure times)?”, “what is the temporal (spatial) similarity between two passengers’ trips given their spatial (temporal) similarity?”, and “is there any relation between the spatial and temporal similarity values and trip length or time?” have been neglected in the literature. This paper aims to answer these questions.
Discovering the correlation can also help to improve Demand Responsive Transit (DRT) and friend recommendation systems in the public transit network. Knowing the correlation between the spatial and temporal similarities, the performance of a DRT system can be improved in two ways. First, one similarity value can be predicted from the other through the correlation between them; hence, it would be necessary to measure only one of the spatial or temporal similarity values (e.g., for datasets that include only spatial or only temporal attributes). Second, conditional probability models can determine the probability of having spatial or temporal similarity in different ranges of the similarity values. Because the spatial and temporal similarity values are related differently in different ranges, the outcome of the conditional probability models can be used to design different DRT services. Furthermore, the probability of two passengers encountering each other in the public transit network can be predicted from the spatial-temporal similarity correlation, which would improve the performance of current friend recommendation services.
The proposed method investigates the spatial similarity and temporal similarity between passengers in order to discover the correlation between the two. It uses smart card data to reconstruct passenger trips and develops spatial and temporal similarity measures for the public transit network. The measures are used to calculate the similarity matrices, which are in turn used to draw histograms, calculate Pearson correlation coefficients, plot hexagonal binning diagrams, and examine the relations of trip time and length to the similarity values. Figure
Methodology overview.
The smart card dataset includes time and location for the boarding and/or alighting transactions. The dataset first needs to be cleaned [
A trip is a movement in both the spatial and temporal dimensions, which have different concepts and metrics. Space is a 2-dimensional plane in which objects can move backward and forward in both dimensions, while time is a 1-dimensional linear space in which objects can only move forward. The spatial and temporal dimensions also have different units, such as meters and minutes. Hence, although a passenger moves through the network in both dimensions simultaneously, the spatial and temporal dimensions of the movement are studied in separate frameworks. Figure
Spatial and temporal movements by a passenger.
The proposed method for measuring the similarities is passenger-based, whereas most of the literature is stop- or route-based. Considering passengers as dynamic objects with one or more trips in the public transit network can disclose previously undiscovered behaviours in the network. Stop- or route-based studies are usually indifferent to the passengers; they emphasize the trips regardless of whether they are made by one passenger or by different passengers. In contrast, the passenger-based perspective models each passenger by all of their trips during a day, so that one or more trips characterize the passenger’s behaviour on that day. This perspective reveals relations between passengers. For instance, it can be used to investigate the similarity between two passengers’ behaviour in the public transit network.
Several trajectory similarity measures exist, such as longest common subsequence (LCS), Fréchet distance, dynamic time warping (DTW), and TRACLUS. Briefly, LCS defines a spatial proximity threshold and treats two points as similar or not based on that threshold; Fréchet distance minimizes the maximum distance between two trajectories; DTW minimizes the sum of distances at each point of the trajectories; TRACLUS is a density-based clustering algorithm that partitions the trajectories and considers closer parts as similar. These are the major measures introduced, studied, and compared in the literature. Each measure has its pros and cons that make it suitable for specific applications and networks [
All the measures above emphasize the distance between the trajectories’ points as the main criterion of similarity. They are suitable for trajectory datasets that consist of location measurements every few meters. However, smart card datasets include just two locations for each trip leg. There are also cases in the public transit network in which the direction of, or angle between, the trajectories must be considered as well as the distance. For instance, two passengers may board at the same stop but travel to stops located in different directions; in this case, the boarding stops are at the same location, but the trajectories are not similar. Given the existing algorithms, the public transit network, the structure of the smart card data, and the passenger-based perspective, two criteria are considered for verifying the spatial similarity between the passengers’ trips: (1) the distance between the origins or the destinations and (2) the direction of the trips.
The distance is assumed as 600 meters based on the studies in travel behaviour of public transit passengers and the walking speed [
Spatial similarity examples between trips.
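The two criteria can be illustrated with a minimal sketch. It is not the similarity measure defined in the paper, only a yes/no check of the two criteria for a pair of trip legs. The 600-meter distance threshold is taken from the text; the 45-degree angle threshold, the planar (x, y) coordinates in meters, and all function names are assumptions made for illustration.

```python
import math

DISTANCE_THRESHOLD_M = 600.0  # distance threshold stated in the text
ANGLE_THRESHOLD_DEG = 45.0    # hypothetical value; not taken from the paper


def distance(p, q):
    """Euclidean distance between two (x, y) points given in meters."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def angle_between(trip_a, trip_b):
    """Angle in degrees between the origin-to-destination vectors of two trips."""
    (ax0, ay0), (ax1, ay1) = trip_a
    (bx0, by0), (bx1, by1) = trip_b
    va = (ax1 - ax0, ay1 - ay0)
    vb = (bx1 - bx0, by1 - by0)
    norm = math.hypot(*va) * math.hypot(*vb)
    if norm == 0.0:
        return 0.0  # degenerate trip (same stop twice): no direction to compare
    cos_theta = (va[0] * vb[0] + va[1] * vb[1]) / norm
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def meets_spatial_criteria(trip_a, trip_b):
    """True if origins or destinations are close AND the trips head the same way."""
    origins_close = distance(trip_a[0], trip_b[0]) <= DISTANCE_THRESHOLD_M
    destinations_close = distance(trip_a[1], trip_b[1]) <= DISTANCE_THRESHOLD_M
    same_direction = angle_between(trip_a, trip_b) <= ANGLE_THRESHOLD_DEG
    return (origins_close or destinations_close) and same_direction


# Each trip is ((origin_x, origin_y), (destination_x, destination_y)).
trip_a = ((0.0, 0.0), (5000.0, 0.0))
trip_b = ((100.0, 0.0), (-4000.0, 0.0))    # nearby origin, opposite direction
trip_c = ((200.0, 100.0), (4000.0, 500.0))  # nearby origin, similar direction

print(meets_spatial_criteria(trip_a, trip_b))  # False
print(meets_spatial_criteria(trip_a, trip_c))  # True
```

The pair (trip_a, trip_b) reproduces the case described above: the boarding stops are close, but the trips point in opposite directions, so the pair fails the direction criterion.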
Equation (
Spatial similarity measure between trips is as follows:
Equation (
for ( for ( if (
The next step is measuring the temporal similarity. Previous studies mostly treat the temporal dimension as a discrete variable and model it using time windows, which leads to biased results, especially for times close to the boundaries of the time windows. Previous studies also take either the boarding time or the alighting time as representative of the temporal aspect of a movement. For instance, suppose the time windows are 7 a.m. to 8 a.m. (call it “E”) and 8 a.m. to 9 a.m. (call it “
An example for discrete temporal similarity.
Passenger | Boarding time/time window | Alighting time/time window |
---|---|---|
Passenger 1 | 7:30/E | 8:20/F |
Passenger 2 | 8:01/F | 8:21/F |
Passenger 3 | 7:48/E | 7:59/E |
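The boundary effect of discrete time windows can be checked on the three passengers in the table with a short sketch. It compares the hourly window labels with the number of minutes that two trips actually overlap when time is treated as continuous. The helper names are illustrative, and the overlap in minutes is only an intermediate quantity, not the normalised temporal similarity measure of the paper.

```python
def to_minutes(hhmm):
    """Convert an 'H:MM' clock time to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def window_label(hhmm):
    """Hourly time window of a clock time: 'E' for 7-8 a.m., 'F' for 8-9 a.m."""
    return {7: "E", 8: "F"}.get(to_minutes(hhmm) // 60, "other")


def overlap_minutes(trip_a, trip_b):
    """Minutes during which both trips (boarding, alighting) are under way."""
    start = max(to_minutes(trip_a[0]), to_minutes(trip_b[0]))
    end = min(to_minutes(trip_a[1]), to_minutes(trip_b[1]))
    return max(0, end - start)


# (boarding time, alighting time) for each passenger in the table.
trips = {
    "Passenger 1": ("7:30", "8:20"),
    "Passenger 2": ("8:01", "8:21"),
    "Passenger 3": ("7:48", "7:59"),
}

# Boarding windows: Passenger 1 and Passenger 3 share window E, Passenger 2 is in F.
print({name: window_label(t[0]) for name, t in trips.items()})

# Continuous overlap tells a different story.
print(overlap_minutes(trips["Passenger 1"], trips["Passenger 2"]))  # 19
print(overlap_minutes(trips["Passenger 1"], trips["Passenger 3"]))  # 11
print(overlap_minutes(trips["Passenger 2"], trips["Passenger 3"]))  # 0
```

Judged by boarding window alone, Passenger 1 matches Passenger 3 and not Passenger 2, although Passengers 1 and 2 travel simultaneously for 19 minutes and Passengers 1 and 3 for only 11.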
Regarding the instance provided in Table
Temporal similarity measure between trips is as follows:
Passengers can make more than one trip during a day. The trips of a passenger are temporally unique; a passenger cannot make more than one trip at the same time. Consequently, the time that a trip of one passenger overlaps with a trip of another passenger cannot also be covered by any other trip of these two passengers. Hence, calculating the temporal similarity between two passengers with multiple trips is simpler than calculating the spatial similarity. The temporal similarity between two passengers is defined as the ratio of the sum of the overlapped time between their trips to the greater of the two passengers’ total trip times. Figure
An example for the temporal similarity measure between passengers.
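The passenger-level ratio described above can be sketched in a few lines. The sketch assumes that each trip is a (boarding, alighting) pair expressed in minutes since midnight and that the trips of one passenger do not overlap each other; the function names and the example values are illustrative and not taken from the dataset.

```python
def trip_overlap(trip_a, trip_b):
    """Overlapped time between two trips given as (boarding, alighting)."""
    return max(0.0, min(trip_a[1], trip_b[1]) - max(trip_a[0], trip_b[0]))


def total_trip_time(trips):
    """Sum of the durations of all trips of one passenger."""
    return sum(alighting - boarding for boarding, alighting in trips)


def passenger_temporal_similarity(trips_a, trips_b):
    """Sum of pairwise overlaps divided by the greater total trip time."""
    greater_total = max(total_trip_time(trips_a), total_trip_time(trips_b))
    if greater_total == 0:
        return 0.0  # neither passenger has a trip with positive duration
    overlap = sum(trip_overlap(a, b) for a in trips_a for b in trips_b)
    return overlap / greater_total


# Hypothetical passengers with two trips each (minutes since midnight).
passenger_a = [(450, 500), (1020, 1060)]  # total trip time: 90 minutes
passenger_b = [(481, 501), (1030, 1080)]  # total trip time: 70 minutes

# Overlaps: 19 minutes in the morning and 30 in the afternoon, so 49 / 90.
print(round(passenger_temporal_similarity(passenger_a, passenger_b), 3))  # 0.544
```

Because the trips of one passenger do not overlap each other, the summed overlap cannot exceed the smaller total trip time, so the ratio stays between 0 and 1.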
Algorithm
for ( for ( if (
Correlation analysis reveals statistical relationships, usually between two variables. The relationships can be linear or nonlinear. The Pearson correlation coefficient is used to examine the linear correlation between two variables. The coefficient is unitless and takes values from −1 through 0 to +1. Values close to zero indicate no linear correlation, and values close to +1 or −1 indicate a strong linear correlation [
The smart card dataset used is from TransLink, the public transport authority of South East Queensland (SEQ), Australia. Data for three weekdays and one weekend day are selected from the SEQ bus, train, and ferry modes. Wednesday to Saturday (20–23 March 2013) are chosen because the weather on all four days was normal and there were no special events during those days. For each day, 20,000 passengers are randomly selected, who make approximately 45,000 trip legs per day. The sample size for each day is almost 10% of the whole number of transactions. Considering the analysis from Alsger et al. (2017), the sample size can appropriately represent the whole dataset [
Each similarity matrix consists of 400 million cells given that it is a symmetric matrix with a size of 20,000
The maximum frequency of the spatial similarity values occurs in the range (0.05, 0.1), after which the frequency decreases until the range (0.45, 0.5). Two apparent irregularities occur in the diagrams: the first in the range (0.45, 0.5) and the second in the range (0.95, 1). The former corresponds to passenger pairs for whom at least (approximately) half of the trip length lies in the same corridor. This first irregularity may be related to the geometry of the public transit network, which is divided into two parts by a river, and to the location of the main hub stations. Figure
Spatial similarity histograms.
Example of spatial similarity range of 0.45–0.5.
The temporal similarity matrices for the four days are generated using (
Temporal similarity histograms.
The correlation between the spatial and temporal similarity matrices can help to understand how the spatial similarity and temporal similarity between two passengers are related. Examining the correlation can also contribute to developing a model for predicting the probability of passengers encountering each other in the public transit network. The Pearson correlation coefficient is used to investigate the linear dependency between the two similarity matrices. Table
Pearson values.
Day | Pearson (spatial similarity, temporal similarity) |
---|---|
Wednesday | 0.04 |
Thursday | 0.03 |
Friday | 0.03 |
Saturday | 0.01 |
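A coefficient of this kind can be computed from two similarity matrices as in the following sketch. It assumes that the matrices are symmetric and that only the unique passenger pairs above the diagonal enter the calculation; the paper does not state which cells were used, so this choice, the function names, and the toy 3 × 3 matrices are all illustrative.

```python
import math


def upper_triangle(matrix):
    """Values above the diagonal of a square matrix, one per passenger pair."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("Pearson coefficient is undefined for constant values")
    return covariance / math.sqrt(var_x * var_y)


# Toy similarity matrices for three passengers (not data from the case study).
spatial = [
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
temporal = [
    [1.0, 0.3, 0.6],
    [0.3, 1.0, 0.1],
    [0.6, 0.1, 1.0],
]

r = pearson(upper_triangle(spatial), upper_triangle(temporal))
print(round(r, 2))
```

Restricting the calculation to the upper triangle avoids counting each passenger pair twice and leaves out the diagonal, where each passenger is compared with themselves.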
At each point of the hexagonal binning diagrams (specific values of the spatial and temporal similarities), the size and colour of the hexagons represent the number of passenger pairs. The diagrams can be divided into five areas of different colours. In all of them, it is more likely to have some temporal similarity with zero spatial similarity than to have some spatial similarity with zero temporal similarity, because more passenger pairs lie on the vertical axis (spatial similarity = 0) than on the horizontal axis (temporal similarity = 0); in other words, making trips in similar periods of a day is more probable than making trips on similar routes. The density of the diagrams also decreases with distance from the origin. In all the charts, a spatial similarity between 0.46 and 0.52 marks a boundary after which the density changes; this range (0.46, 0.52) is close to the range (0.45, 0.5) in the spatial similarity histograms, which has a higher frequency than its neighbours. Moreover, there is more consistency in the temporal similarity than in the spatial similarity. For instance, for all spatial similarity values between 0 and 1, increasing the temporal similarity decreases the probability of having the same spatial similarity. Also, when the temporal similarity is between 0 and 0.2 and the spatial similarity is between 0.52 and 1, increasing the spatial similarity increases the probability of having the same temporal similarity. The results from the hexagonal diagrams can be used in developing a conditional probability model. For instance,
Hexagonal binning diagrams.
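How binned counts of passenger pairs could feed a conditional probability model is sketched below. For simplicity the sketch uses square bins instead of hexagonal ones, so it is a simplified stand-in for the diagrams rather than a reproduction of them. The number of bins, the function names, and the example pairs are assumptions.

```python
from collections import Counter


def bin_index(value, n_bins):
    """Index of the equal-width bin on [0, 1] that contains the value."""
    return min(int(value * n_bins), n_bins - 1)  # value 1.0 goes to the last bin


def temporal_given_spatial(pairs, n_bins=5):
    """Estimate P(temporal bin | spatial bin) from (spatial, temporal) pairs."""
    counts = Counter(
        (bin_index(s, n_bins), bin_index(t, n_bins)) for s, t in pairs
    )
    spatial_totals = Counter()
    for (s_bin, _), count in counts.items():
        spatial_totals[s_bin] += count
    return {
        (s_bin, t_bin): count / spatial_totals[s_bin]
        for (s_bin, t_bin), count in counts.items()
    }


# Hypothetical (spatial similarity, temporal similarity) values of four pairs.
pairs = [(0.0, 0.1), (0.05, 0.5), (0.1, 0.15), (0.9, 0.95)]
probabilities = temporal_given_spatial(pairs, n_bins=5)

# Of the three pairs in spatial bin 0, two fall in temporal bin 0.
print(probabilities[(0, 0)])
```

Each entry gives the share of passenger pairs in a spatial similarity bin that fall into a given temporal similarity bin, which is the kind of quantity a conditional probability model would be built on.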
Figure
Spatial similarity and trip length diagram.
Figure
Temporal similarity and trip time diagram.
All the analyses are carried out for the four days, which include three weekdays and one weekend day. The results obtained from the histograms, coefficients, and diagrams are close to each other across the days. This similarity between the results for different days supports the stability of the method and the analyses, and the similar correlations found between the spatial and temporal similarity matrices on the different days support the validity of the results.
This paper measures the spatial similarity and temporal similarity of public transit passengers from a passenger-based perspective, in which each passenger is modelled by one or more trips. The spatial and temporal similarity measures are developed for the public transit network. The spatial similarity measure considers the direction of, as well as the distance between, the passengers’ trips. The temporal similarity measure considers both boarding and alighting times in a continuous linear space. Furthermore, this paper investigates the spatial-temporal similarity correlation between passengers of the public transit system. The similarity matrices are calculated for four days of smart card data comprising approximately 45,000 trip legs made by 20,000 passengers per day. The values of the similarity matrices are examined using histograms. The linear correlation between the spatial and temporal similarity matrices is calculated using the Pearson coefficient. The hexagonal binning technique is used to plot the frequency of corresponding values of the spatial and temporal similarity matrices. In addition, the relation between the spatial similarity and the trip lengths of the passengers is explored by plotting 3-dimensional scatter and density diagrams. Likewise, 3-dimensional scatter and density diagrams of temporal similarity against the trip times of the two passengers are plotted to investigate the relation between the temporal similarity and the trip time.
The passenger-based perspective reveals the spatial and temporal relations between passengers. About 97% of the passengers have some level of spatial similarity with at least one other passenger in the dataset, and all passengers have some level of temporal similarity with at least one other passenger. In addition, the spatial similarity histograms show higher frequencies for lower similarity values, except for the two intervals (0.45, 0.5) and (0.95, 1). The geometry of the network gives rise to the former range, and the latter shows the tendency of passengers to have complete spatial similarity. Likewise, lower values of the temporal similarity are more frequent across all ranges; passengers are more likely to have a small temporal similarity than a large one.
The spatial and temporal measures are calculated for all pairs of passengers. The developed measures show relations with trip time and length. Passengers with similar trip lengths are more likely to have spatial similarity; as the difference between the trip lengths of two passengers increases, the probability of their having spatial similarity decreases. Also, passengers with trip times of 60–90 minutes are more likely to have higher temporal similarity with each other.
The examined correlation between the spatial and temporal similarity matrices shows a nonlinear dependency. The Pearson coefficient indicates a weak linear correlation, close to zero, between the similarity matrices; the positive values of the coefficient imply a positive relationship between the spatial and temporal similarity values. The hexagonal binning diagrams present a nonlinear correlation with specific patterns; the diagrams can be divided into separate sections, and specific trends can be extracted from each section, which could be used to develop probabilistic models.
The computational complexity should be discussed. The computational complexity of the method is O (
Discovering the correlation between the spatial and temporal similarities of passengers can answer questions about the relations between these similarities. Understanding the correlation can also improve the efficiency of DRT and friend recommendation systems. DRT systems usually rely on similar spatial and temporal patterns in passengers’ trips; if a group of passengers has similar travel patterns in both the spatial and temporal dimensions, then a DRT service can be allocated to them. Spatial and temporal patterns consider passengers in groups, while the correlation analyses relations between passengers at the individual level and for all pairs of passengers (patterns and correlation are two different perspectives). Knowing the correlation between the spatial and temporal similarities of two passengers, the performance of a DRT system can be improved in two ways. First, one similarity value can be predicted from the other through the correlation between them; hence, it would be necessary to measure only one of the spatial or temporal similarity values (e.g., for datasets that include only spatial or only temporal attributes). Second, conditional probability models can determine the probability of having spatial or temporal similarity in different ranges of the similarity values. Because the spatial and temporal similarity values are related differently in different ranges, the outcome of the conditional probability models can be used to design different DRT services. Furthermore, the probability of two passengers encountering each other in the public transit network can be predicted from the spatial-temporal similarity correlation, which would improve the performance of current friend recommendation services.
Additional analyses can be performed to extend this work. The proposed method may be applied to public transit systems in other cities and the results compared. The effects of other parameters, such as geographical location or the start or end time of trips, on the spatial-temporal similarity correlation could also be examined if quality data become available. In addition, it would be valuable to develop a local search method that performs the computation only for passengers with some similarity; in other words, a method that first identifies and excludes passengers who have no potential for spatial or temporal similarity. Furthermore, the spatial-temporal similarity correlation could be examined together with the trip purpose similarity of the passengers.
The authors declare that they have no conflicts of interest.