Assigning passenger flows on a metro network plays an important role in passenger flow analysis, which is the foundation of metro operation. Traditional transit assignment models are becoming increasingly complex and inefficient, and they may not even be valid when the timetable changes suddenly or the metro system is disrupted. We propose a methodology for assigning passenger flows on a metro network based on automatic fare collection (AFC) data and the realized timetable. We find that the routes connecting a given origin and destination (OD) pair are related to their observed travel times (OTTs), and especially to their pure travel times (PTTs), which are extracted from AFC data combined with the realized timetable. A novel clustering algorithm is used to cluster trips between a given OD pair based on PTTs/OTTs and thereby complete the assignment. An initial application to categorical OD pairs on the Shanghai metro system, one of the largest systems in the world, shows that the proposed methodology works well. Accompanying the initial application, we also provide an approach for determining the theoretical maximum accuracy of the new assignment model.
As an efficient transport system, the metro system is now the mainstay of urban passenger transport in many megacities, especially in highly populated areas [
Unlike private cars, a metro system operates according to a timetable, which is an important constraint on a passenger's travel. New technologies have been widely introduced into metro systems, which has led to improvements in passenger flow assignment. For example, the automatic fare collection (AFC) system has become the main method of collecting metro fares in many cities around the world. This system records the origin and destination stations of a trip and their corresponding timestamps. The transaction data obtained through AFC systems contain a vast amount of archived information on how passengers use a metro system. To date, however, few studies have examined AFC data or how to assign passenger flows efficiently by combining these data with the timetable.
This paper mainly focuses on how to efficiently model the passenger flow assignment problem for a metro network with AFC data and timetable.
The metro timetable contains the set of all train trips with arrival and departure times per station and per train number. Figure
Example of timetable.
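To make the timetable structure concrete, the following Python sketch stores each train's arrival and departure at a station as one record and extracts the sorted departure times at a station for one line and direction. The record layout, the field names, the toy train numbers and station codes, and the use of seconds after midnight are assumptions for illustration, not the format used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StopEvent:
    """One timetable entry: a train's arrival and departure at one station.

    Times are seconds after midnight (an assumption for illustration).
    """
    train: str
    line: str
    direction: str
    station: str
    arrival: int
    departure: int


def hms(t: str) -> int:
    """Convert 'HH:MM:SS' to seconds after midnight."""
    h, m, s = (int(part) for part in t.split(":"))
    return 3600 * h + 60 * m + s


def departures_at(timetable: List[StopEvent], line: str, direction: str,
                  station: str) -> List[int]:
    """Sorted departure times of all trains of one line and direction at a station."""
    return sorted(
        e.departure
        for e in timetable
        if e.line == line and e.direction == direction and e.station == station
    )


# Toy timetable with hypothetical train numbers and station codes.
timetable = [
    StopEvent("T01", "L1", "up", "A", hms("09:00:00"), hms("09:00:30")),
    StopEvent("T01", "L1", "up", "B", hms("09:03:00"), hms("09:03:30")),
    StopEvent("T02", "L1", "up", "A", hms("09:05:00"), hms("09:05:30")),
    StopEvent("T02", "L1", "up", "B", hms("09:08:00"), hms("09:08:30")),
]
```

Keeping departures sorted per platform makes it cheap to look up the next train after a given time with a binary search.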
The assignment addressed in this paper requires AFC transaction data. The ID number of a smart card holder is recorded each time the holder passes an entry or exit gate, and the corresponding transaction record represents an unlinked trip. These smart card transaction records provide the ID number, the date, the departure station, the passage time at the entry gate, the arrival station, and the passage time at the exit gate. Entry and exit times are recorded to the second, from which observed travel times can be obtained. Example AFC data are shown in Table
Example of AFC transaction data.
Date  Departure station  Time at departure station  Arrival station  Time at arrival station  Smart card number 

20141117  0248  09:15:45  1056  09:36:00  1416107917 
20141117  0751  09:20:00  0727  09:36:17  1282520204 
20141117  1060  09:13:56  0248  09:36:22  0934484109 
20141117  0411  09:22:54  0750  09:36:41  1069233288 
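As a minimal sketch, the OTT of each record in the example table can be computed as the exit-gate time minus the entry-gate time. The helper name `observed_travel_time` is hypothetical, and the sketch assumes that no trip crosses midnight.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"


def observed_travel_time(entry_time: str, exit_time: str) -> int:
    """OTT in seconds: passage time at the exit gate minus passage time at the entry gate.

    Assumes the trip does not cross midnight.
    """
    delta = (datetime.strptime(exit_time, TIME_FORMAT)
             - datetime.strptime(entry_time, TIME_FORMAT))
    return int(delta.total_seconds())


# Rows of the example AFC table:
# (date, departure station, entry time, arrival station, exit time, smart card number)
records = [
    ("20141117", "0248", "09:15:45", "1056", "09:36:00", "1416107917"),
    ("20141117", "0751", "09:20:00", "0727", "09:36:17", "1282520204"),
    ("20141117", "1060", "09:13:56", "0248", "09:36:22", "0934484109"),
    ("20141117", "0411", "09:22:54", "0750", "09:36:41", "1069233288"),
]

otts = {card: observed_travel_time(t_in, t_out)
        for _, _, t_in, _, t_out, card in records}
for card, ott in otts.items():
    print(card, ott)
```

For the first record, for instance, the OTT is 20 min 15 s, or 1215 s.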

Frequency distribution of the observed travel times extracted from AFC data. The expected travel time of a route connecting a given OD pair is based on average travel time. It can be estimated by the cluster analysis technique proposed in this paper.
A single-route trip
A multi-route trip
The observed travel time depends on the passenger's travel process. Figure
Typical travel process of a metro passenger.
As mentioned in Section
Illustration of random WTs’ influence on OTTs.
Fortunately, it appears more promising to model the relationship between the possible routes for a given OD pair and the corresponding PTTs, which remove ENTs, EWTs, and origin WTs from OTTs (Figure
Distributions of observed travel times (OTTs) and pure travel times (PTTs). Points with different colors (red and blue) belong to different routes.
The objective of this paper is to propose a methodology for assigning passenger flows on a metro network based mainly on travel times (OTTs/PTTs) extracted from AFC data. To achieve this goal, we take the following approach:
We propose a transit assignment model using
We introduce a novel clustering approach to conduct the assignment. It relies only on the distances between data points, can detect nonspherical clusters, and automatically finds the correct number of clusters.
We find that PTT is better suited than OTT for clustering because it greatly reduces the variation of travel times within an OD pair.
Accompanying the initial application to categorical OD pairs on the Shanghai metro network, we also provide an approach for determining the theoretical maximum accuracy of our proposed assignment model.
The remainder of this paper is organized as follows. In Section
Passenger flow is required to make and coordinate operational plans for a metro system. Conventionally, models to solve passenger flow assignment problems can be classified according to whether Wardrop’s principle is followed [
In recent years, automatically collected fare data such as smart card data have been used by transit service providers to analyze passenger demand and system performance. These data have been used for OD matrices estimation [
Chan [
To the best of our knowledge, these existing studies on transit assignment models with AFC data are either too simple or too computationally costly and should be improved. This paper focuses on how to precisely and efficiently assign the real passenger flows on a metro network using AFC data and timetable.
Since CITs and COTs are recorded in the AFC data, OTTs are easy to obtain. This section focuses on extracting PTTs from AFC data. We first give some basic definitions related to the train timetable, which describes the space-time relationship of train operation. The main information it contains is the trains' arrival and departure times at each station. Denote the set of metro lines as
Let
The passengers can choose
Illustration of how to get BOT.
Similarly, let
Illustration of how to get GOT.
Passengers check out of the station as soon as they get off the train, so the exit process is simpler than the entry process because it contains no waiting time. Thus, GOT is equal to
Therefore, smart card data can be trimmed as follows:
From OD(ID, CIT, COT, enter_st_no, exit_st_no)
to OD(ID, BOT, GOT, enter_st_no, exit_st_no)
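The trimming step can be sketched in Python as follows, under assumptions that are ours rather than the paper's: BOT is taken as the earliest timetabled departure at the origin platform that a passenger could catch after a fixed minimum entry walking time, GOT as the latest timetabled arrival at the destination platform that still leaves a fixed minimum exit walking time before check-out, and PTT as GOT minus BOT. The walking-time parameters, function names, and toy numbers are hypothetical. The sketch is meant for OD pairs with a single feasible boarding direction and a non-transfer origin, where the boarding train can be identified.

```python
from bisect import bisect_left, bisect_right
from typing import NamedTuple, Optional, Sequence


class Trip(NamedTuple):
    """Raw AFC record OD(ID, CIT, COT, enter_st_no, exit_st_no); times in seconds."""
    card_id: str
    cit: int
    cot: int
    enter_st_no: str
    exit_st_no: str


class TrimmedTrip(NamedTuple):
    """Trimmed record OD(ID, BOT, GOT, enter_st_no, exit_st_no)."""
    card_id: str
    bot: int
    got: int
    enter_st_no: str
    exit_st_no: str


def trim(trip: Trip, origin_departures: Sequence[int], dest_arrivals: Sequence[int],
         entry_walk: int, exit_walk: int) -> Optional[TrimmedTrip]:
    """Replace CIT/COT by BOT/GOT using sorted timetable times (hypothetical rule).

    BOT: earliest departure at the origin platform at or after CIT + entry_walk.
    GOT: latest arrival at the destination platform at or before COT - exit_walk.
    Returns None when no consistent pair of train times exists.
    """
    i = bisect_left(origin_departures, trip.cit + entry_walk)
    j = bisect_right(dest_arrivals, trip.cot - exit_walk)
    if i == len(origin_departures) or j == 0:
        return None
    bot, got = origin_departures[i], dest_arrivals[j - 1]
    if got <= bot:
        return None
    return TrimmedTrip(trip.card_id, bot, got, trip.enter_st_no, trip.exit_st_no)


def pure_travel_time(t: TrimmedTrip) -> int:
    """PTT, assumed here to be GOT minus BOT."""
    return t.got - t.bot


# Toy example: hypothetical timetable times (seconds) and a 60 s walk at each end.
trip = Trip("card-1", cit=250, cot=1400, enter_st_no="A", exit_st_no="B")
trimmed = trim(trip, origin_departures=[100, 400, 700],
               dest_arrivals=[1000, 1300, 1600], entry_walk=60, exit_walk=60)
```

In this toy case the OTT is 1150 s, while the PTT is 900 s, because the 150 s spent walking and waiting at the origin and the 100 s between arrival and check-out are removed.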
After the AFC data record is trimmed from OD(ID, CIT, COT, enter_st_no, exit_st_no) to OD(ID, BOT, GOT, enter_st_no, exit_st_no), PTT can be expressed as
Since the AFC transaction data can be used to estimate passengers' route choices, they can also be used for passenger flow assignment. To achieve this, we apply cluster analysis. Unlike existing assignment models, the cluster analysis technique in this paper clusters trips between a given OD pair based on PTTs/OTTs derived from the AFC data, assuming that similar PTTs/OTTs correspond to the same route. The cluster centers for a given OD pair are taken as the expected travel times (ETTs) of the feasible routes, and each PTT/OTT is assigned to the corresponding cluster center.
Several clustering strategies have been proposed, including the
The
Distribution-based algorithms attempt to reproduce the observed data points using a mixture of predefined probability distribution functions. The accuracy of such methods depends on how well the trial probability distributions represent the data.
Density-based algorithms can easily detect clusters of arbitrary shape from the local density of data points, but they require choosing an appropriate density threshold, which may be nontrivial.
The mean-shift method allows for nonspherical clusters and does not require a nontrivial threshold, but it only works for data defined by a set of coordinates and is computationally costly.
The clustering approach proposed by Laio and Rodriguez [
The adopted clustering approach is based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. For each data point
For the point with the highest density, we conventionally take
The algorithm in two dimensions.
Point distribution. Data points are ranked in order of decreasing density
Decision graph for the data in (a). Different colors correspond to different clusters
After the cluster centers have been found, each remaining point is assigned to the same cluster as its nearest neighbor of higher density. The cluster assignment is performed in a single step, in contrast with other clustering algorithms where an objective function is optimized iteratively.
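The steps above can be sketched for one-dimensional travel times (the PTTs or OTTs of one OD pair) as follows. This is a minimal implementation of the density-peak idea with a cutoff kernel, written for clarity rather than speed (it builds the full O(n²) distance matrix). The thresholds `rho_min` and `delta_min` stand in for reading the centers off the decision graph and are illustrative assumptions, as are the toy travel times; ties in density are broken by index so that every non-center point has a well-defined nearest neighbor of higher density.

```python
from typing import List, Sequence, Tuple


def density_peak_clusters(x: Sequence[float], d_c: float, rho_min: int,
                          delta_min: float) -> Tuple[List[int], List[int]]:
    """Density-peak clustering of 1-D travel times.

    rho[i]   = number of other points closer than d_c to point i (cutoff kernel).
    delta[i] = distance to the nearest point of higher density; for the
               highest-density point, the largest distance to any point.
    Centers are the highest-density point plus every point with
    rho >= rho_min and delta >= delta_min. Each remaining point joins the
    cluster of its nearest neighbor of higher density, in one pass.
    Returns (labels, center indices).
    """
    n = len(x)
    dist = [[abs(a - b) for b in x] for a in x]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Rank by decreasing density; ties are broken by index so the order is strict.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j

    centers = [order[0]] + [i for i in order[1:]
                            if rho[i] >= rho_min and delta[i] >= delta_min]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Points are visited in decreasing density, so each neighbor is labeled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers


# Toy PTTs (seconds) for one OD pair with two hypothetical routes.
ptts = [1500, 1510, 1520, 1530, 1540, 1800, 1810, 1820, 1830]
labels, centers = density_peak_clusters(ptts, d_c=25, rho_min=2, delta_min=100)
```

Here the two centers are the trips with PTTs of 1520 s and 1810 s, which would play the role of the two routes' expected travel times; every other trip inherits the label of its nearest denser neighbor.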
Although the proposed approach aims to assign passenger flows to the routes between a given OD pair, OD pairs with a single route should be identified first. There are two types of OD pairs with a single route:
OD pairs with a unique physical route on the network.
OD pairs that have only one feasible route when we consider the travel cost threshold, although there is more than one physical route on the network.
In both cases, all passengers of the OD pair are assigned to that single route, and the procedure is similar to
Taking the Shanghai metro as an example, the feasible route set for a given OD pair is generated using a two-step route generation method. First, the
Initial statistics for the Shanghai metro network show that a large percentage of OD pairs have a single route (35.98% of OD pairs and 60.15% of trips).
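A minimal sketch of the second step and of the single-route statistics is given below. It assumes that candidate routes and their costs have already been produced by the first step, and it keeps a route if its cost is within a relative threshold of the cheapest route. The relative-threshold rule, the 20% default, and the toy data are hypothetical; the paper's actual travel cost threshold is not reproduced here.

```python
from typing import Dict, Hashable, List, Tuple

OD = Tuple[str, str]


def feasible_routes(route_costs: Dict[Hashable, float],
                    rel_threshold: float = 0.2) -> Dict[Hashable, float]:
    """Keep candidate routes whose cost is within rel_threshold of the cheapest one."""
    best = min(route_costs.values())
    return {r: c for r, c in route_costs.items() if c <= (1 + rel_threshold) * best}


def single_route_stats(od_routes: Dict[OD, Dict[Hashable, float]],
                       od_trips: Dict[OD, int],
                       rel_threshold: float = 0.2) -> Tuple[List[OD], float, float]:
    """Return the single-route OD pairs and their shares of OD pairs and of trips."""
    single = [od for od, costs in od_routes.items()
              if len(feasible_routes(costs, rel_threshold)) == 1]
    share_pairs = len(single) / len(od_routes)
    share_trips = sum(od_trips[od] for od in single) / sum(od_trips.values())
    return single, share_pairs, share_trips


# Toy data: candidate route costs (minutes) and trip counts, all hypothetical.
od_routes = {
    ("A", "B"): {"r1": 20.0},              # unique physical route
    ("A", "C"): {"r1": 30.0, "r2": 45.0},  # second route exceeds the threshold
    ("B", "C"): {"r1": 25.0, "r2": 28.0},  # two feasible routes
}
od_trips = {("A", "B"): 100, ("A", "C"): 50, ("B", "C"): 50}
single, share_pairs, share_trips = single_route_stats(od_routes, od_trips)
```

The two single-route OD pairs in the toy data correspond to the two types listed above: one has a unique physical route, and the other has a second physical route that is excluded by the cost threshold.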
Apart from OD pairs with a single route, there are many OD pairs with multiple routes, for which passenger route choices can be estimated using the proposed clustering technique with determinate PTTs.
Consider an example OD pair with two feasible routes on the Shanghai metro network. The distribution of OTTs and the corresponding probability density function are shown in Figure
Cluster analysis of the pure travel times for an OD pair with two feasible routes.
The test OD pairs discussed in this section are those with determinate PTTs, for which passenger route choices can be estimated with high accuracy and a precise passenger flow assignment can consequently be obtained. Taking the Shanghai metro network as an example, our initial calculations for all OD pairs on the network show that 42611 OD pairs (35.39% of OD pairs and 22.22% of trips) fall into this category.
However, there are also other categories of OD pairs that may not be suitable for the estimation of passenger route choices using PTTs. In these cases, a passenger’s travel behavior is so complex that it is difficult to determine the passenger’s PTT. For example, if both the upstream and downstream are feasible directions for the origin station of an OD pair to the destination (Figure
Illustrations of cases where we cannot judge which train a passenger boarded on and the corresponding PTT is not determinate.
Both upstream and downstream are feasible directions from the origin to the destination
The origin of an OD pair is a transfer station
For the category of OD pairs with multiple routes and indeterminate PTTs, we use the clustering technique with OTTs to estimate passenger route choices, on which the passenger flow assignment is then based. Our initial calculations for all OD pairs on the Shanghai network show that 34472 OD pairs (28.63% of OD pairs and 17.63% of trips) fall into these categories.
Of course, the assignment result for these OD pairs may not be accurate because their OTTs can vary widely. However, among these categories there is still a subset of OD pairs whose assignment results can be highly precise: the expected travel times of the routes connecting each such OD pair differ clearly from one another, so the corresponding OTTs can be clustered into the routes accurately. For the Shanghai metro network, this subset accounts for 5.06% of OD pairs and 7.14% of trips.
Moreover, for some OD pairs we cannot give accurate route choice estimations. These include OD pairs whose connecting routes have similar expected travel times and OD pairs with small flows, from several to several dozen passengers. For these OD pairs, passengers' route choices are largely stochastic. For the Shanghai metro network, they account for 19.46% of OD pairs and 3.87% of trips.
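The categories discussed in this section can be summarized as a decision procedure that follows the category numbering of the accuracy table. The Python sketch below is hypothetical: the ordering of the tests and the thresholds `clear_gap`, `similar_gap`, and `min_trips` are illustrative assumptions, since no numeric criteria are given here.

```python
from typing import Sequence


def od_category(route_etts: Sequence[float], determinate_ptt: bool, n_trips: int,
                clear_gap: float = 300.0, similar_gap: float = 60.0,
                min_trips: int = 50) -> str:
    """Assign an OD pair to one of the categories of the accuracy table.

    route_etts: expected travel times (s) of the feasible routes.
    The thresholds and the order of the tests are hypothetical.
    """
    if len(route_etts) == 1:
        return "1"        # single route
    if determinate_ptt:
        return "2.1"      # multiple routes, determinate PTTs
    etts = sorted(route_etts)
    min_gap = min(b - a for a, b in zip(etts, etts[1:]))
    if min_gap >= clear_gap:
        return "2.2.1"    # obviously different expected travel times
    if min_gap <= similar_gap:
        return "2.2.2"    # similar expected travel times
    if n_trips < min_trips:
        return "2.2.3"    # small flows
    return "2.2.4"        # others
```

Using the smallest gap between sorted expected travel times means a single pair of near-identical routes is enough to make an OD pair hard to cluster, which matches the intuition that overlapping travel-time clusters cannot be separated.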
From the analysis in the previous sections, the proposed approach can efficiently estimate metro passenger route choices using a novel clustering technique and processed AFC data (PTTs/OTTs), and it can consequently provide appropriate passenger flow assignments on a metro network. Furthermore, the approach makes it possible to measure its minimum and maximum accuracy in practice by classifying all OD pairs into several categories. Taking the Shanghai metro network as an example, as shown in Table
Illustration of our approach’s accuracies for different categories of OD pairs.
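OD pair category  OD pairs  Share of OD pairs  Trips  Share of trips
All OD pairs  120409  100.00%  3937275  100.00%
(1) OD pairs with a single route  43326  35.98%  2368217  60.15%
(2) OD pairs with multiple routes  77083  64.02%  1569058  39.85%
(2.1) With determinate PTTs  42611  35.39%  874857  22.22%
(2.2) With indeterminate PTTs  34472  28.63%  694201  17.63%
(2.2.1) With obviously different route expected travel times  6088  5.06%  281124  7.14%
(2.2.2) Some special OD pairs (similar expected travel times)  10978  9.12%  128241  3.26%
(2.2.3) Some special OD pairs (small flows)  12460  10.34%  24194  0.61%
(2.2.4) Others  4946  4.11%  260642  6.62%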
In summary, based on the above discussion of different categories of OD pairs, the minimum and maximum accuracy of the proposed approach with the clustering technique and AFC data can be measured in practice. Taking the Shanghai metro network as an example, the proposed approach is accurate for 89.51% of trips (categories (1), (2.1), and (2.2.1)), cannot be accurate for 3.87% of trips (categories (2.2.2) and (2.2.3)), and may be accurate for the remaining 6.62% of trips (category (2.2.4)). The total accuracy range is therefore roughly 76.4%~80.5% in terms of OD pairs and 89.51%~96.13% in terms of trips.
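The accuracy bounds can be recomputed from the counts in the table above. The sketch treats categories (1), (2.1), and (2.2.1) as accurate, (2.2.2) and (2.2.3) as not accurate, and (2.2.4) as possibly accurate, following the discussion in this section; the lower bound counts only the accurate categories, and the upper bound adds the possibly accurate one. The constant and function names are illustrative.

```python
# (OD pairs, trips) per category, copied from the table.
COUNTS = {
    "1": (43326, 2368217),
    "2.1": (42611, 874857),
    "2.2.1": (6088, 281124),
    "2.2.2": (10978, 128241),
    "2.2.3": (12460, 24194),
    "2.2.4": (4946, 260642),
}
TOTALS = (120409, 3937275)

ACCURATE = {"1", "2.1", "2.2.1"}
MAYBE_ACCURATE = {"2.2.4"}  # "others": the assignment may or may not be accurate


def share(categories, column):
    """Percentage of OD pairs (column 0) or trips (column 1) in the given categories."""
    return 100 * sum(COUNTS[c][column] for c in categories) / TOTALS[column]


# The categories partition the network, so their counts add up to the totals.
assert all(sum(v[k] for v in COUNTS.values()) == TOTALS[k] for k in (0, 1))

bounds = {
    "OD pairs": (share(ACCURATE, 0), share(ACCURATE | MAYBE_ACCURATE, 0)),
    "trips": (share(ACCURATE, 1), share(ACCURATE | MAYBE_ACCURATE, 1)),
}
for name, (low, high) in bounds.items():
    print(f"{name}: {low:.2f}% ~ {high:.2f}%")
```

Working from raw counts gives an upper bound of 80.53% of OD pairs, whereas adding the rounded percentages in the table gives 80.54%; the difference is only rounding. The trip bounds come out as 89.51% and 96.13%.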
A metro system is operated based on the timetable. Developments in automated data collection (ADC) systems such as AFC systems have made it possible to collect detailed passenger trip data on a metro network. In this paper, we propose an efficient approach to assigning passenger flows on a metro network by combining AFC data and the timetable. The advantages of the proposed approach include the following:
A posteriori transit assignment model, which uses
A novel clustering approach was introduced to conduct the assignment. It relies only on the distances between data points, can detect nonspherical clusters, and automatically finds the correct number of clusters.
PTT was found to be better suited than OTT for clustering because it greatly reduces the variation of travel times within an OD pair.
Accompanying the initial application to categorical OD pairs on the Shanghai metro network, an interesting approach was also provided for determining the theoretical maximum accuracy of our proposed assignment model.
However, some additional issues still need to be addressed. For example, several unusual phenomena during peak periods such as
Overall, this study provides a promising approach that can efficiently assign passenger flows on a metro network not only in the common case but also in the case of sudden changes in the timetable or disruptions in the metro system.
The authors declare that they have no conflicts of interest.
The study is financially supported by the National Natural Science Foundation of China (71271153) and Program for Young Excellent Talents in Tongji University (2014KJ015).