Assigning Passenger Flows on a Metro Network Based on Automatic Fare Collection Data and Timetable in Shenzhen Metro System

Abstract: Assigning passenger flows on a metro network plays an important role in passenger flow analysis, which is the foundation of metro operation. Traditional transit assignment models are becoming increasingly complex and inefficient. These models may not even be valid in the case of sudden changes in the timetable or disruptions in the metro system. We propose a methodology for assigning passenger flow on a metro network based on automatic fare collection (AFC) data and the realized timetable. We find that the routes connecting a given origin and destination (O-D) pair are related to their observed travel times (OTTs), especially their pure travel times (PTTs) abstracted from AFC data combined with the realized timetable. A novel clustering algorithm is used to cluster trips between a given O-D pair based on PTTs/OTTs and complete the assignment. An initial application to categorical O-D pairs on the Shenzhen metro system, which is one of the largest systems in the world, shows that the proposed methodology works well. Accompanying the initial application, an approach is also provided for determining the theoretical maximum accuracy of the new assignment model.

As an efficient transport system, the metro is now the mainstay of urban passenger transport in many megacities, especially in highly populated areas. Passenger flow is the foundation for making and coordinating operation plans, and assigning passenger flows on a metro network plays an important role in analyzing them. A number of studies have developed passenger flow assignment models. However, these models are becoming increasingly complex because of the many diverse parameter types. In the case of sudden changes in the timetable or disruptions in the metro system, these models may not be valid.

Unlike private cars, a metro system is operated according to the timetable, which is an important constraint on a passenger's travel. New technologies are widely introduced into metro systems, resulting in improvements in passenger flow assignment. For example, the automatic fare collection (AFC) system has become the main method for collecting metro fares in many cities in the world. This system records the origin and destination stations of a trip and their corresponding time stamps. The transaction data obtained through these AFC systems contain a vast amount of archived information on how passengers use a metro system. To date, however, few studies have examined AFC data or how to assign passenger flows efficiently by combining these data with the timetable.

This paper mainly focuses on how to efficiently model the passenger flow assignment problem for a metro network with AFC data and the timetable.

Timetable and AFC Transaction Data

Timetable Information
The metro timetable contains the set of all train trips with arrival and departure times per station and per train number. Figure 1 is an example of the timetable for a metro line in the Shenzhen metro system. Since the metro system is operated based on the timetable, a passenger's travel time between the origin and destination stations is subject not only to the chosen route but also to the timetable.
e-ISSN: 2456-3463 International Journal of Innovations in Engineering and Science, Vol. 3, No.5 2018 www.ijies.net

AFC Transaction Data
The assignment addressed in this paper requires AFC transaction data. The ID number of a smart card holder is recorded each time the holder passes the entry or exit gates, and the corresponding transaction record indicates an unlinked trip. These smart card transaction records provide information on ID numbers, the date, the departure station, the passage time at an entry gate, the arrival station, and the passage time at an exit gate. The entry and exit times are recorded to the second, so observed travel times can be obtained from them. Example AFC data are shown in Table 1. Our initial analysis (Figure 2) of the observed travel times indicates that the routes connecting a given O-D pair are related to their observed travel times, although there is also travel time uncertainty at the route level.
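As a minimal sketch of how observed travel times could be obtained from such records, the Python example below subtracts the entry-gate time from the exit-gate time and groups the results by O-D pair. The record layout, station codes, card IDs, and timestamps are all hypothetical and are not taken from Table 1.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical AFC records (illustrative only, not from Table 1):
# (card_id, entry_station, entry_time, exit_station, exit_time)
records = [
    ("A001", "S01", "2014-03-10 08:00:05", "S09", "2014-03-10 08:31:45"),
    ("A002", "S01", "2014-03-10 08:02:10", "S09", "2014-03-10 08:40:10"),
    ("A003", "S04", "2014-03-10 08:05:00", "S09", "2014-03-10 08:20:30"),
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def observed_travel_time(entry_time, exit_time):
    """Return the OTT in seconds: exit-gate time minus entry-gate time."""
    t_in = datetime.strptime(entry_time, TIME_FORMAT)
    t_out = datetime.strptime(exit_time, TIME_FORMAT)
    return int((t_out - t_in).total_seconds())


def otts_by_od(rows):
    """Group OTTs (in seconds) by (origin, destination) pair."""
    grouped = defaultdict(list)
    for _card_id, origin, t_in, dest, t_out in rows:
        grouped[(origin, dest)].append(observed_travel_time(t_in, t_out))
    return dict(grouped)


ott = otts_by_od(records)
```

Grouping by O-D pair at this stage mirrors the later steps, which work on the travel times of one O-D pair at a time.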

Problem Description
The observed travel time is related to the passenger travel process; Figure 3 shows a typical travel process of a metro passenger. Figure 5 shows the relationship between the possible routes for a given O-D pair and the corresponding PTTs, which are obtained by deleting ENTs, EWTs, and origin WTs from OTTs. The question is how to abstract these OTTs/PTTs from AFC data and then complete the passenger flow assignment with them. The objective is to propose a methodology for assigning passenger flow on a metro network based on travel times (OTTs/PTTs) abstracted from AFC data. The following approaches are used:
1. We propose a transit assignment model using revealed information, including AFC data and the realized timetable of the metro system.
2. We introduce a novel clustering approach to conduct the assignment. It is based only on the distance between data points, can detect nonspherical clusters, and automatically finds the correct number of clusters.
3. We find that PTT is better than OTT when used for clustering, because it can reduce the variation of travel times for O-D pairs to a great extent.
4. We also provide an approach, accompanying the initial application to categorical O-D pairs on the Shenzhen metro network, for determining the theoretical maximum accuracy.

II-LITERATURE REVIEW
Passenger flow is required to make and coordinate operational plans for a metro system. Conventionally, models for solving passenger flow assignment problems can be classified according to whether Wardrop's principle is followed [5]: one class is the nonequilibrium assignment model, and the other is the equilibrium assignment model [6]. Moreover, it is assumed that the passenger's choice process has some random characteristics because of imperfect knowledge of travel time, individual differences, measurement errors, and so on [5][6][7]. Therefore, for today's metro systems, the result of passengers' route choices can be described more appropriately by the stochastic user equilibrium (SUE) with time and space constraints, which has been proved by some simulation experiments [5,8] and full-scale case tests [4]. To date, the models for solving an SUE problem are becoming increasingly complex because of the many diverse parameter types. Thorough reviews are presented in the literature [2,3,9,10].
In recent years, automatically collected fare data such as smart card data have been used by transit service providers to analyze passenger demand and system performance. These data have been used for O-D matrix estimation [11,12], travel behavior analysis [15], operational management, and public transit planning [16][17][18]. Studies on the use of smart card data can be grouped into three categories: strategic (long-term planning), tactical (service adjustment and network development), and operational (ridership statistics and performance indicators).
Chan developed two applications based on Oyster card data in the London Underground: one estimated an O-D flow matrix, while the other constructed rail service reliability metrics. This was the first attempt at measuring service delivery quality using elapsed travel time. Xu et al. tried to estimate metro passengers' route choice behavior using smart card data and proposed a new model for passenger flow assignment based on an AFC system environment. However, the problem of calibrating the vast number of parameters in behavior functions, such as arrival/departure distributions, still exists. Zhu  The existing studies on transit assignment models with AFC data are either too simple or too computationally costly and should be improved.

3.1. OTT and PTT Abstracting Approach.
Since CITs and COTs are recorded in the AFC data, it is convenient to obtain OTTs. This section focuses on abstracting PTTs from AFC data. We first give some basic definitions on the train timetable. A trip record is written as (…, exit_st_no), where enter_st_no and exit_st_no represent the entry station ID and the exit station ID, respectively.

Determination of BOT
Let $t$ be the trip's CIT from the AFC data, and let $S_{l,i}$ be its enter_st_no. Searching in order through every train that runs through station $S_{l,i}$, the train that stops at station $S_{l,i}$ at time $t$ can be determined by locating the index $j$ that satisfies condition (1). The passenger can then take the $j$th train to reach the destination or a transfer station. The search process is illustrated in Figure 6. Therefore, the BOT is set from the timetable entry of train $j$ at station $S_{l,i}$ (2).
Figure 6: Illustration of how to get BOT

Determination of GOT
Similarly, take the trip's COT from the AFC data and its exit_st_no, as shown in Figure 7. Passengers check out of the station once they get off the train, so this case is simpler than entry because it contains no waiting time. Thus, the GOT is taken from the timetable entry of the matched train at the exit station. It should be noted that there is a minimum interval threshold between CIT and BOT, as well as between GOT and COT, because walking or waiting also takes time.
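The search for BOT and GOT can be illustrated with a small Python sketch. It rests on several assumptions that the text does not state explicitly: times are seconds since midnight, the passenger boards the first train departing at least `min_gap` seconds after the CIT, alights from the last train arriving at least `min_gap` seconds before the COT, and the PTT is read as GOT minus BOT. The function names, the toy timetable, and the 60-second threshold are hypothetical.

```python
from bisect import bisect_left, bisect_right


def boarding_time(cit, departures, min_gap=0):
    """BOT: first departure at the entry station at or after cit + min_gap.

    `departures` must be sorted; returns None if no such train exists.
    """
    k = bisect_left(departures, cit + min_gap)
    return departures[k] if k < len(departures) else None


def get_off_time(cot, arrivals, min_gap=0):
    """GOT: last arrival at the exit station at or before cot - min_gap.

    `arrivals` must be sorted; returns None if no such train exists.
    """
    k = bisect_right(arrivals, cot - min_gap) - 1
    return arrivals[k] if k >= 0 else None


def pure_travel_time(cit, cot, departures, arrivals, min_gap=0):
    """PTT read as GOT - BOT (an assumption), or None if it is undefined."""
    bot = boarding_time(cit, departures, min_gap)
    got = get_off_time(cot, arrivals, min_gap)
    if bot is None or got is None or got <= bot:
        return None
    return got - bot


# Toy timetable: three trains, all times in seconds since midnight.
departures = [28800, 29100, 29400]   # departures from the entry station
arrivals = [30300, 30600, 30900]     # arrivals at the exit station

cit, cot = 28830, 30700              # check-in and check-out times
bot = boarding_time(cit, departures, min_gap=60)
got = get_off_time(cot, arrivals, min_gap=60)
ptt = pure_travel_time(cit, cot, departures, arrivals, min_gap=60)
ott = cot - cit
```

In this toy case the OTT is 1870 s and the PTT is 1500 s. The sketch does not cover passengers who fail to board the first available train, nor the cases in which the boarded train cannot be determined.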
Therefore, smart card data can be trimmed as follows:

3.2. A Novel Clustering Approach.
Since the AFC transaction data can be used to estimate passengers' route choices, they can also be used for passenger flow assignment. To achieve this, we have applied cluster analysis techniques. Unlike the existing assignment models, the cluster analysis technique in this paper clusters trips between a given O-D pair based on PTTs/OTTs derived from the AFC data. It then assumes that similar PTTs/OTTs are linked to the same route.
Cluster centers for a given O-D pair are considered the expected travel times (ETTs) of the feasible routes, and each PTT/OTT is assigned to the corresponding cluster center.
Several clustering strategies have been proposed, including the k-means method [26], the k-medoids method [27], distribution-based algorithms [28], density-based algorithms [29], and the mean-shift method. However, a novel clustering approach was recently proposed by Laio and Rodriguez [30]. We have used this method for the following reasons:
(1) The k-means and k-medoids methods cannot detect nonspherical clusters, because a data point is always assigned to the nearest center. The OTTs for a given O-D pair consist of nonspherical clusters.
(2) Distribution-based algorithms attempt to reproduce the observed data points using a mix of predefined probability distribution functions. The accuracy of such methods depends on how well the trial probability represents the data.
(3) Density-based algorithms require choosing an appropriate threshold, which may be nontrivial, although clusters with an arbitrary shape can be easily detected by approaches based on the local density of data points.
(4) The mean-shift method only works for data defined by a set of coordinates and is computationally costly, although it does allow for nonspherical clusters and does not require a nontrivial threshold.
(5) The clustering approach proposed by Laio and Rodriguez [30] is superior because it is based only on the distance between data points, it can detect nonspherical clusters, and it automatically determines the appropriate number of clusters.
The adopted clustering approach is based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. For each data point $i$, we compute two quantities: its local density $\rho_i$ and its distance $\delta_i$ from points of higher density. Both quantities depend only on the distances $d_{ij}$ between data points, which are assumed to satisfy the triangular inequality. The local density $\rho_i$ of data point $i$ is defined as

$$\rho_i = \sum_{j} \psi(d_{ij} - d_c),$$

where $\psi(x) = 1$ if $x < 0$ and $\psi(x) = 0$ otherwise, $d_c$ is a cutoff distance, and $\rho_i$ is the number of points that are closer than $d_c$ to point $i$. The algorithm is only sensitive to the relative magnitudes of $\rho_i$ for different points. This implies that, for large data sets, the results of the analysis are robust with respect to the choice of $d_c$.
$\delta_i$ is determined by computing the minimum distance between point $i$ and any other point with higher density. That is,

$$\delta_i = \min_{j:\, \rho_j > \rho_i} (d_{ij}).$$

For the point with the highest density, we conventionally take $\delta_i = \max_j (d_{ij})$. Note that $\delta_i$ is much larger than the typical nearest neighbor distance only for points that are local or global maxima in the density. Thus, cluster centers are recognized as points for which the value of $\delta_i$ is anomalously large (as shown in Figure 8).

(2) O-D pairs that have only one feasible route when the travel cost threshold is considered, although there is more than one physical route on the network.
In both of the abovementioned cases, all the passengers for the O-D pair are assigned to only one route, and the procedure is similar to the all-or-nothing assignment model.
Taking the Shenzhen metro as an example, the feasible route set for a given O-D pair is generated using a two-step route generation method. First, the k-th shortest path algorithm is applied, and a universal route set is generated based on the physical topology of the metro network. Second, the universal set is filtered by judging the rationality of alternative routes based on the difference between the travel costs of the alternative and the shortest route. This narrows the feasible route set.
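The two-step route generation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used for the Shenzhen network: paths are enumerated by a simple best-first search over loop-free partial paths, which returns them in nondecreasing cost order but is only practical on small graphs, and the toy network, its link costs, and the `max_extra_cost` threshold are hypothetical.

```python
import heapq


def k_shortest_simple_paths(graph, origin, dest, k):
    """Return up to k loop-free paths as (cost, path), cheapest first.

    `graph` maps node -> {neighbour: nonnegative link cost}.
    """
    heap = [(0, [origin])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == dest:
            found.append((cost, path))
            continue
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in path:  # keep paths loop-free
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return found


def feasible_routes(candidates, max_extra_cost):
    """Keep routes whose cost exceeds the shortest by at most max_extra_cost."""
    if not candidates:
        return []
    best = candidates[0][0]
    return [(c, p) for c, p in candidates if c - best <= max_extra_cost]


# Toy undirected network with hypothetical link costs (e.g. minutes).
graph = {
    "A": {"B": 5, "C": 4},
    "B": {"A": 5, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}

universal = k_shortest_simple_paths(graph, "A", "D", k=4)   # step 1
feasible = feasible_routes(universal, max_extra_cost=2)     # step 2
```

Here the universal set holds four routes with costs 10, 11, 12, and 15, and the filter keeps the three routes within 2 cost units of the shortest.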
The initial statistics of the Shenzhen metro network show that a large percentage of O-D pairs have a single route (35.98% in terms of O-D pairs and 60.15% in terms of trips).

Figure 9(a) shows the OTTs of an example O-D pair. Using the abstracting approach proposed in Section 3.1, the OTTs can be further refined to the PTTs shown in Figure 9(b). We computed two quantities for each PTT point in this example data: its local density ($\rho_i$) and its distance from points with higher densities ($\delta_i$); the corresponding decision graph is shown in Figure 9(c).

Passenger Flow Assignment for the O-D Pairs with
We can see that two points (blue and red) have large $\delta$ values and a sizeable density. These two points correspond to cluster centers, which represent the expected PTTs of two routes between the O-D pair. After the two centers are determined, each point is assigned to a cluster, which is used to calculate route choice probabilities for the O-D pair (Figure 9(d)).

In some cases (Figure 10(a)), or when the origin station of an O-D pair is a transfer station (Figure 10(b)), we cannot judge which train a passenger actually boards, and consequently the corresponding PTT is indeterminate. The following section discusses how to estimate this category of O-D pairs.

Figure 10: Illustrations of cases where we cannot judge which train a passenger boarded, so the corresponding PTT is indeterminate.
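The clustering-based assignment for a multi-route O-D pair described in this section can be sketched in pure Python. The sketch computes, for one-dimensional travel times, each point's local density and its distance to the nearest point of higher density, picks cluster centers, assigns every remaining point to the cluster of its nearest higher-density neighbour, and reports route shares. Several choices here are assumptions made for illustration: the number of centers is passed in rather than read off a decision graph, centers are ranked by the product of density and distance, density ties are broken by index, and the PTT values and cutoff distance are invented.

```python
def density_peaks_1d(values, d_c, n_centers):
    """Cluster 1-D travel times with a density-peaks style procedure."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0] * n
    parent = [None] * n          # nearest point of higher density
    top = order[0]
    delta[top] = max(dist[top])  # convention for the densest point
    for pos in range(1, n):
        i = order[pos]
        j = min(order[:pos], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j
    # The densest point is always a center; rank the rest by rho * delta.
    rest = sorted(order[1:], key=lambda i: rho[i] * delta[i], reverse=True)
    centers = [top] + rest[:n_centers - 1]
    labels = [None] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:              # parents are always labelled first
        if labels[i] is None:
            labels[i] = labels[parent[i]]
    return rho, delta, centers, labels


def route_shares(labels, n_centers):
    """Share of trips assigned to each cluster (route)."""
    return [labels.count(c) / len(labels) for c in range(n_centers)]


# Invented PTTs (seconds) for one O-D pair with two plausible routes.
ptts = [1480, 1490, 1500, 1500, 1510, 1520, 1780, 1800, 1800, 1820]
rho, delta, centers, labels = density_peaks_1d(ptts, d_c=25, n_centers=2)
expected_ptts = [ptts[i] for i in centers]
shares = route_shares(labels, n_centers=2)
```

With these invented values the two centers sit at 1500 s and 1800 s, and the trips split 60% and 40% between the two routes.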

Extended Discussions to the Proposed Approach.
From the analysis in the previous sections, the proposed approach can efficiently estimate metro passenger route choices using a novel clustering technique and processed AFC data (PTTs/OTTs) and consequently provide appropriate passenger flow assignments on a metro network. Furthermore, the approach makes it possible to measure its minimum and maximum accuracy in practice by classifying all the O-D pairs into several categories. Taking the Shenzhen metro network as an example, as shown in Table 2, we can measure the minimum and maximum accuracy of the approach as follows.
(1) O-D pairs with a single route: the passenger flow assignment using the proposed approach is accurate for this category of O-D pairs because there is only one feasible route between a given O-D pair and a passenger's route choice is unique. For the Shenzhen metro network, the corresponding percentage is 35.98% in terms of O-D pairs and 60.15% in terms of trips. There is thus a large percentage of O-D pairs for which the estimated route choices are always correct, regardless of the assignment model. This is an interesting characteristic of a metro network compared with an urban road network.
(5) Others: apart from the above categories, the remainder are those O-D pairs for which the proposed approach cannot guarantee an accurate assignment but may, in theory, approach the actual route choices.

In summary, based on the above discussions for different categories of O-D pairs, the minimum and maximum accuracy of the proposed approach with the clustering technique and AFC data can be measured in practice. Taking the Shenzhen metro network as an example, the proposed approach is accurate for 94.10% of trips, cannot be accurate for 5.28% of trips, and may be accurate for 0.62% of trips. The total accuracy range is 75.43% to 79.54% in terms of O-D pairs and 89.51% to 96.13% in terms of trips.

Concluding Remarks.
A metro system is operated based on the timetable. Developments in the application of ADC systems such as AFC systems have made it possible to collect detailed passenger trip data in a metro network. In this paper, we aim to propose an efficient approach for assigning passenger flows on a metro network by combining AFC data and the timetable. The advantages of the proposed approach include the following:
(1) A posteriori transit assignment model was proposed, which uses revealed information, including AFC data and the timetable of metro systems, rather than a priori knowledge.
(2) A novel clustering approach was introduced to conduct the assignment. It is based only on the distance between data points, can detect nonspherical clusters, and automatically finds the correct number of clusters.
(3) It was found that PTT is better than OTT when used for clustering, because it can reduce the variation of travel times for O-D pairs to a great extent.
(4) Accompanying the initial application to categorical O-D pairs on the Shenzhen metro network, an approach was also provided for determining the theoretical maximum accuracy of our proposed assignment model.
However, some additional issues still need to be addressed. For example, several unusual phenomena during peak periods, such as "failing to board", should be accounted for in the assignment process, and the computational efficiency of the approach should be further improved considering the massive amounts of AFC data and timetable data. These issues are left for future work.
Overall, this study provides a promising approach that can efficiently assign passenger flows on a metro network not only in the common case but also in the case of sudden changes in the timetable or disruptions in the metro system.