Analysis of Spatial-Temporal Characteristics of Operations in Public Transport Networks Based on Multisource Data

Operational efficiency and stability are two critical aspects for measuring bus systems. Influenced by many stochastic factors, buses often suffer from delay and bunching. Traditional studies focus on a single route and lack a systematic evaluation of the bus network. In this paper, we propose a data-driven framework to analyze efficiency and stability from the perspective of the entire bus network, based on fine-grained GPS trajectory data. The IC card data and route data are used to extract the number of boarding passengers and the topological structure, respectively. The results show that the average headway of stations follows a lognormal distribution. Moreover, the distribution of the arrival efficiency of stations is inhomogeneous, and a small number of stations have large values. In addition, the relationships among the average headway of stations, the number of boarding passengers, the number of buses, and complex network indicators are revealed. It is found that the average headway of stations is negatively correlated with the other indicators, which implies that complex network connections and more passenger flows could weaken the efficiency of bus operations. This paper provides a way to evaluate the operational performance of bus networks and could help in monitoring and optimizing the daily operation of bus systems.


Introduction
Nowadays, public transport plays a growing role in alleviating traffic congestion and reducing greenhouse gas emissions. Building effective, convenient, and stable public transport has become a crucial step toward solving urban traffic problems in many countries. The efficiency and stability of public transport are two core concerns of both travelers and operators. Affected by many stochastic factors such as weather, congestion, passenger flow, and drivers' behavior, bus delay and bunching occur on many routes during operating hours [1][2][3][4]. Bus bunching occurs when adjacent buses on the same route run too close to each other. It disrupts the even spacing of buses and leaves large gaps to the other buses, which increases passengers' waiting time and lowers efficiency. In some bus systems with dense stations and large passenger flows, a small disturbance on one route may spread across the network. It is therefore imperative to construct a robust public transport system that provides better service.
The development of public transport information technology provides a powerful tool to monitor and manage the transportation system. The Automatic Vehicle Location (AVL) system and the Automatic Passenger Count (APC) system record vehicle trajectories and passenger information. In recent years, many studies have focused on the travel time, delay, and reliability of buses based on AVL data [5][6][7].
To address these problems, numerous control strategies have been proposed, such as speed control [8], holding strategies [9,10], and skip-stop strategies [11]. Prediction methods greatly improve the control effects compared with traditional models without predictions [12]. High-resolution bus GPS data can also be used to identify congestion hotspots on urban streets [13]. The APC data are commonly used to extract OD information, estimate waiting time, and find missing transfers [14][15][16]. Recently, some researchers have used APC data to identify public transit corridors and transit network flow characteristics [17,18]. Typically, the bus operational status can be obtained by combining AVL data with APC data [19].
Essentially, bus delay and bunching are caused by small disturbances such as additional boarding passengers and poor road traffic conditions. These small disturbances may spread, superpose, and amplify, which results in heavy disorder of the entire bus system. The public transit system is a complex spatiotemporal network embedded in complicated urban surroundings [5]. Understanding the mechanism of bus operation helps to enhance the efficiency of transit service. Most research has focused on a single route [8,20], and few studies pay attention to the operational stability of the whole transit network. The structures and dynamics of bus networks are complicated. In past years, researchers studied bus networks from a complex network perspective using line information and IC card data [21][22][23]. They mainly paid attention to structural characteristics of bus networks such as community structure and the "small world" property. Recently, many studies have focused on extracting bus network characteristics by merging AVL data and APC data. Chen et al. used IC card data and GPS data to estimate passenger boarding and alighting stations for the entire bus network and found that passenger flow is mainly distributed in an east-west belt-shaped downtown area [19]. Sui et al. constructed a layered network model to depict the public transport network, OD flows, and transfer flows [24]. Data-driven methods for analyzing the performance of the whole bus network have appeared in recent years. Zhang et al. studied the average headway and headway deviation of the entire bus network in Jinan, China, and found that the two indicators follow lognormal distributions [25]. Iliopoulou et al. used AVL data to identify spatiotemporal patterns of bus bunching by a clustering method. However, they did not study the passenger flow factor due to the lack of APC data [26]. Nowadays, network-based studies using big data have been successfully applied in many transportation systems [27][28][29][30][31].
Most of the aforementioned studies focus on the operational performance of a single route using AVL data. However, there is limited research on the whole network. In fact, the operational performance of the bus network matters more for bus network design and the optimization of operations. For example, planners consider passenger flow, network structure, accessibility, and transfers when designing or adding new routes. They rarely consider the operational performance systematically. Therefore, many stations have very poor operational performance. Passengers have to wait a long time or suffer from bus bunching at these stations. This is a major problem for bus systems. At present, network-based studies of bus networks concentrate on the topological structure rather than operational performance. How to evaluate the bus operational performance and find the hub stations is the key to solving the problem. To fill this gap, this paper designs proper indicators based on GPS trajectory data to evaluate the operational performance of the entire bus network.
In this study, we construct a data-driven framework to measure the operational status of the bus network by combining GPS trajectory data, IC card data, and route data. First, we propose the average headway of stations and the average arrival rate of stations to assess the stability and efficiency of the bus network. Second, the relationships among the two proposed indicators, boarding passengers, bus supply, and topological structure are studied.
This paper aims to construct a framework based on multisource data to evaluate the operation of bus systems. The rest of the paper is organized as follows. Section 2 gives the literature review. Section 3 proposes evaluation indicators of bus network performance. Section 4 introduces the data used in this paper. Section 5 shows the results. Section 6 concludes the paper.

Application of Locator Data in Transportation Area.
The GPS trajectory data form a space-time continuum, which is the foundation for studying bus operation status and residents' activity. In the transportation area, they have been widely applied to travelers' behavior, estimation of traffic demand, traffic status, and traffic model optimization. The GPS trajectory data are playing an increasingly important role in traffic planning and management. In some areas, they have become an alternative to traditional traffic surveys.
At present, studies of travelers' behavior and estimation of traffic demand concentrate on taxi trajectory data and mobile phone data. Tang et al. divided the studied area into small cells and estimated the distribution of OD [32]. Zhang et al. utilized complex network theory to reveal the urban traffic demand based on taxi trajectory data [33]. Mobile phone data are a good source for studying travelers' behavior, which is a research hotspot [34,35]. Dockless bike sharing is an emerging traffic mode, which plays a significant role in connecting with other traffic modes. The trip data can be used to analyze travelers' behavior, estimate traffic demand, and rebalance bicycles [36][37][38]. In the transit area, many studies focus on travel time prediction and delay [5,39]. The GPS trajectory data reflect the operational status of buses, which is crucial for operation management. Besides that, the trajectory data can be used together with IC card data to detect travelers' behavior. Tu et al. studied the dynamic characteristics of multimode travel by trajectory data, IC card data, and taxi trajectory data [40]. Tang et al. used GPS trajectory data and smart card data to optimize the timetable of a bus line [41]. The GPS trajectory data can also be used to identify the transportation mode by GIS information and machine learning methods [42,43].

Operational Stability of Public Transport.
The operational stability of public transport has drawn much attention since AVL and APC devices came into use. Chepuri et al. pointed out that travel times during peak hours followed normal distributions [44]. Fan and Machemehl examined the relationship between the waiting time of passengers and headway deviation and found a positive relationship between them [45]. The travel time of a bus consists of running time and stopping time. Studies show that the running time is affected by road traffic conditions and traffic signals [46,47]. Kieu et al. analyzed the distribution of transit travel time using transit signal priority data and found that the travel times followed lognormal distributions [48]. The stopping time comprises door opening, door closing, and passenger boarding and alighting. Research shows that the stopping time could account for 26% of total travel time [49]. Generally, the main factor that affects the stopping time is the number of boarding and alighting passengers [50]. Ji et al. claimed that enlarging the platform areas and installing guide guardrails could reduce the variation of dwell time, but not the dwell time itself [51]. Passengers' arrival times are related to the headway. When the headway is smaller than 12 minutes, passengers arrive randomly [45]. Chepuri et al. proposed new reliability indicators to measure bus routes by trajectory data at both the route level and the segment level [52]. Paudel revealed that high volumes of bus ridership could cause a significant increase in the variance of bus service reliability [53].

Evaluation of Transit System.
The service quality of the transit system is influenced by many components such as transit network structure and management level. Transit systems are fragile when they meet periodic passenger flow fluctuations and other stochastic factors. It is necessary to grasp the operational status of the transit system. Traditional studies focused on a single route rather than the entire network. From the perspective of transit network evaluation, there has been a variety of research in past years. Zhang et al. constructed an evaluation framework that includes convenience, comfort, security, reliability, and facility level according to survey data [54]. Many studies analyzed transit network topology using complex network theory [55][56][57]. Sun [60]. Wei et al. highlighted the value of bus lines and proposed a "line-line" network to examine the spatial characteristics of cross-administration bus lines [61]. Bree et al. studied the relationship between transit ridership and local accessibility and found that public transit ridership was predicted more closely when gravity-based accessibility was included in the model [62]. All in all, there have been many studies on the evaluation of transit systems, but few focus on the operational status of the entire transit network.

Evaluation Indicators of Bus Network Performance
The performance of a bus network is significant in daily operations. Reliability and efficiency are two key indicators for measuring the performance of bus systems. Headway is an important indicator for studying reliability. The headway between adjacent buses varies, affected by many stochastic factors such as traffic congestion and bad weather. The GPS trajectory data are a feasible source for detecting the operational status of buses. Figure 1 shows the GPS trajectory data of bus routes. Traditional studies mainly focus on the operational status of a single route. In this paper, we use the average headway of stations and the average arrival rate of stations to evaluate the operational performance of bus networks. We intend to extract the macroscopic operation status of bus networks by mining the GPS trajectory data.

Average Headway of Stations.
The bus run is considered as a series of events comprising section running, arrivals, and departures. Headway is the time difference between two successive vehicles belonging to the same route. Following [25], it is denoted by $\Delta H$. Generally, the headways of buses are even, according to the scheduled timetable, when they depart from their origin stations. The headways then fluctuate due to the complicated external environment and driver-related factors. Sometimes, the headway between two consecutive buses is smaller than the planned headway. When the headway is so small that two consecutive buses arrive at the same time, bus bunching appears. Conversely, the headway may be larger than planned. Figure 2 shows the four kinds of bus operation status. Bus bunching and large headways disturb the balance of operation, which could result in long waiting times for passengers.
In this paper, the average headway of stations during time period $t$ is defined in terms of the following quantities: $m_t$ is the number of routes that stop at station $k$, $n_i^t$ is the number of buses of route $i$ during time period $t$, and $s_k^t$ is the total number of buses that stop at station $k$ during time period $t$.
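As an illustration, the following Python sketch computes headways and a station-level average from arrival records. It assumes that the headway is the difference between consecutive arrival times of buses of the same route at a station, $\Delta H_{ij}^k = t_{ij}^k - t_{i(j-1)}^k$, and that the station average divides the summed headways by the number of headways observed. The exact normalization used for the indicator (for example, division by $s_k^t$) is an assumption here, as are the record layout and the function name `average_headway_by_station`.

```python
from collections import defaultdict


def average_headway_by_station(arrivals):
    """arrivals: iterable of (station, route, arrival_time_seconds).

    Returns {station: mean headway in seconds} over all routes serving
    the station. A station is omitted when no route has at least two
    arrivals there, because no headway can be formed.
    """
    # Group arrival times by (station, route): headways are only
    # meaningful between buses of the same route.
    by_station_route = defaultdict(list)
    for station, route, t in arrivals:
        by_station_route[(station, route)].append(t)

    headway_sum = defaultdict(float)
    headway_count = defaultdict(int)
    for (station, _route), times in by_station_route.items():
        times.sort()
        for earlier, later in zip(times, times[1:]):
            headway_sum[station] += later - earlier
            headway_count[station] += 1

    return {s: headway_sum[s] / headway_count[s] for s in headway_count}


# Hypothetical arrivals at station "A" (times in seconds):
# route 9 at 0, 600, 1500 and route 16 at 100, 400.
records = [("A", 9, 0), ("A", 9, 600), ("A", 9, 1500),
           ("A", 16, 100), ("A", 16, 400)]
result = average_headway_by_station(records)
```

In this toy input the three headways are 600 s, 900 s, and 300 s, so the station average is 600 s. Restricting `arrivals` to one hour of the day gives the value for that time period.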

Average Arrival Rate of Stations.
Bus operational efficiency is crucial for passengers and reflects the quality of service. In this paper, we introduce the average arrival rate of stations to measure the operational efficiency of the entire bus network. It is defined as the mean value of the time of the nearest bus that runs to a station at a certain time.
This index is correlated with the position of the buses. When a bus is delayed, the operational efficiency decreases. The GPS messages of different buses are not sent synchronously. To solve this problem, we introduce a small time period and choose the last GPS message in that period. In the definition of the average arrival rate of stations, $L$ is the number of bus routes, $O_i$ is the number of buses of the $i$-th route that are running to station $k$, $\Delta t$ is the given time period with $\Delta t \ge \max(t_{send})$, $t_{send}$ is the time interval between messages, $t_{ij}^k$ is the arrival time of the $j$-th vehicle of the $i$-th route at station $k$, $t_{ij}^c$ is the time of the last message of the $j$-th vehicle of the $i$-th route in the time period $t + \Delta t$, and $s_k^{\Delta t}$ is the number of buses that run to station $k$. $c_{ijk}^{\Delta t} = 1$ if the $j$-th vehicle of the $i$-th route is running to station $k$; otherwise, $c_{ijk}^{\Delta t} = 0$. Figure 3 shows the sketch map of the arrival efficiency of a bus station.
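The windowing step can be sketched in Python as follows. The sketch keeps, for each bus, the last GPS message sent within $[t, t + \Delta t]$ and averages the remaining time to arrival, $t_{ij}^k - t_{ij}^c$, over the buses heading to each station. Treating the indicator as this plain mean is an assumption, as are the message layout, the `arrival_time` lookup, and the function name `average_time_to_arrival`.

```python
from collections import defaultdict


def average_time_to_arrival(messages, arrival_time, t, dt):
    """messages: iterable of (bus_id, msg_time, target_station).
    arrival_time: {(bus_id, station): arrival time in seconds}.

    For each bus, only the last message sent within [t, t + dt] is used.
    Returns {station: mean seconds until arrival of buses heading there}.
    """
    # Keep the most recent message per bus inside the window, so that
    # unsynchronized senders are compared at (almost) the same instant.
    last_msg = {}
    for bus, msg_time, station in messages:
        if t <= msg_time <= t + dt:
            if bus not in last_msg or msg_time > last_msg[bus][0]:
                last_msg[bus] = (msg_time, station)

    remaining = defaultdict(list)
    for bus, (msg_time, station) in last_msg.items():
        arrive = arrival_time.get((bus, station))
        if arrive is not None and arrive >= msg_time:
            remaining[station].append(arrive - msg_time)

    return {s: sum(v) / len(v) for s, v in remaining.items()}


# Hypothetical messages: buses b1 and b2 are heading to station "K";
# the message of b3 falls outside the window and is ignored.
messages = [("b1", 5, "K"), ("b1", 12, "K"), ("b2", 9, "K"), ("b3", 40, "K")]
arrival_time = {("b1", "K"): 72, ("b2", "K"): 129}
rates = average_time_to_arrival(messages, arrival_time, t=0, dt=15)
```

Choosing `dt` no smaller than the longest sending interval, as required by $\Delta t \ge \max(t_{send})$, ensures that every running bus contributes one message to the window.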

Structural Measures.
In many complex networks, the structure of the network has an important impact on system functions. Some nodes with a large degree or betweenness play a critical role in system dynamics. During the past two decades, many studies have focused on the topological structure of public transport networks such as bus and metro networks. In this part, we introduce four main network-based indicators to study the relationship between bus network structure and operational performance.

Node Degree.
Node degree is defined as the number of nodes that are connected with the node. Typically, nodes with a very large degree account for a very small proportion. It is defined as
$$k_i = \sum_{j} e_{ij},$$
where $k_i$ denotes the degree of node $i$ and $e_{ij}$ is the connection status between node $i$ and node $j$: $e_{ij} = 1$ means that there is a connection between node $i$ and node $j$; otherwise $e_{ij} = 0$.
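A minimal Python sketch of the degree computation is given below. It assumes that two stations are connected when they are consecutive stops on at least one route and that the network is undirected; how edges are actually defined for the studied network is not stated here, so this construction, the route dictionary, and the function names are illustrative assumptions.

```python
from collections import defaultdict


def build_station_graph(routes):
    """routes: {route_id: [station ids in stop order]}.

    Returns {station: set of adjacent stations} for an undirected graph
    in which consecutive stops of a route are linked (an assumption).
    """
    adj = defaultdict(set)
    for stops in routes.values():
        for a, b in zip(stops, stops[1:]):
            if a != b:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def node_degree(adj):
    """Degree k_i = number of distinct stations adjacent to station i."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


# Hypothetical routes sharing the segment B-C.
routes = {"R1": ["A", "B", "C", "D"], "R2": ["E", "B", "C", "F"]}
degrees = node_degree(build_station_graph(routes))
```

Because neighbors are stored in sets, a segment shared by several routes is counted once, which matches the binary connection status $e_{ij}$.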

Betweenness.
In transportation networks, transport efficiency is crucial. Betweenness is another index for evaluating the importance of nodes in propagation. It is calculated as follows:
$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}},$$
where $C_B(v)$ is the betweenness value of node $v$, $\sigma_{st}$ is the number of shortest paths going from $s$ to $t$, and $\sigma_{st}(v)$ is the number of those shortest paths that pass through node $v$.

Figure 1: GPS trajectory points of bus lines.

Clustering Coefficient.
The clustering coefficient of a node is defined as
$$C(v_i) = \frac{2 e_i}{m_i (m_i - 1)},$$
where $C(v_i)$ is the clustering coefficient of node $v_i$, $e_i$ is the number of edges among the local neighbors of node $v_i$, and $m_i$ is the number of neighbors of node $v_i$.
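Both indicators can be computed with the standard library alone, as in the sketch below. It assumes an unweighted, undirected station graph stored as a dictionary of neighbor sets in which every neighbor also appears as a key. Betweenness is obtained with Brandes' accumulation scheme, which is one common way to evaluate the shortest-path ratio and is not necessarily the procedure used for the reported results; the clustering coefficient uses the usual undirected form $2 e_i / (m_i (m_i - 1))$. The function names and the toy graph are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair (s, t) was visited from both ends.
    return {v: c / 2 for v, c in cb.items()}


def clustering_coefficient(adj):
    """C(v) = 2 e / (m (m - 1)); set to 0 for fewer than two neighbors."""
    result = {}
    for v, neighbors in adj.items():
        m = len(neighbors)
        if m < 2:
            result[v] = 0.0
            continue
        # Every edge among the neighbors is seen from both endpoints.
        links = sum(1 for a in neighbors for b in adj[a] if b in neighbors) / 2
        result[v] = 2 * links / (m * (m - 1))
    return result


# Hypothetical graph: triangle A-B-C with a pendant station D on C.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
bc = betweenness(graph)
cc = clustering_coefficient(graph)
```

In the toy graph, station C carries the two shortest paths A-D and B-D, so its betweenness is 2, while only one of the three possible links among its neighbors exists, giving a clustering coefficient of 1/3.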

PageRank.
PageRank is widely used to measure node importance in many networks, and it captures global topological information. Thus, we introduce the PageRank algorithm to obtain the importance of nodes. In its definition, $p_i$ denotes the influence score of the $i$-th node, $p$ is the damping coefficient, $k_j^{out}$ is the out-degree of the $j$-th node, and $a_{ji}$ is the corresponding element of the adjacency matrix.
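A power-iteration sketch of PageRank in Python is shown below. It assumes the commonly used damped form $p_i = (1 - p)/N + p \sum_j a_{ji} p_j / k_j^{out}$, where $N$ is the number of nodes; $N$, the damping value 0.85, the uniform handling of nodes without out-links, and the function name are assumptions for illustration and are not taken from the text.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """out_links: {node: list of nodes it links to}; every node is a key.

    Returns {node: score}, with scores summing to 1.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for j in nodes:
            if out_links[j]:
                share = damping * rank[j] / len(out_links[j])
                for i in out_links[j]:
                    new[i] += share
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:
            break
    return rank


# Hypothetical hub station H linked in both directions to X, Y, and Z.
star = {"H": ["X", "Y", "Z"], "X": ["H"], "Y": ["H"], "Z": ["H"]}
ranks = pagerank(star)
```

For an undirected station network, each edge is listed in both directions, as in the toy input, and the hub receives the highest score.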

Data Description
The data were collected on October 25, 2018, in Xuchang, China. Xuchang is a famous historical city located in central Henan Province. The studied bus network of Xuchang comprises 39 routes and 629 stations. Figure 4 exhibits the bus network of Xuchang, China. There are three kinds of data: GPS trajectory data, IC card data, and bus route data. Table 1 shows the basic information of the bus route data, which includes line number, direction, station index, station name, longitude, and latitude. Direction "0" represents the upstream route and "1" represents the downstream route. The bus network can be constructed from the bus route information. Note that the station index in the table is given per route. In this paper, each station is given a unique index when generating the bus network. Table 2 gives the main information of the GPS trajectory data, which contains line number, bus number, date, time, longitude, latitude, station index, and direction. The GPS devices of the buses send a message every 10-15 seconds. The station index "0" means that the bus is running in a section between stations. The running status of each bus can be obtained from the GPS trajectory data. In addition, we can get the departure and arrival information at stations for each bus. Table 3 introduces the main information of the IC card data, which comprises card number, line number, bus number, date, time, and direction. We can obtain the number of boarding passengers by joining with the GPS trajectory data. The data were preprocessed to eliminate outliers. Duplicated and missing values are removed. Moreover, records whose longitudes and latitudes fall outside the studied area are removed. The bus network is constructed from the bus route data.
The performance of the bus network can be calculated by the proposed indicators with the given data. Then the statistical analysis and visualization are carried out. In this paper, we use ArcGIS 10.3 to visualize the data.
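The preprocessing described above could be sketched in Python as follows. The dictionary keys, the bounding box, and the rule that the first message of a run with the same nonzero station index marks the arrival are assumptions made for illustration; the field list follows Table 2, but the actual file layout is not given here.

```python
REQUIRED = ("line", "bus", "time", "lon", "lat", "station", "direction")


def clean_gps(records, lon_range, lat_range):
    """Drop records with missing fields, out-of-area coordinates, or
    exact duplicates. records: list of dicts with the REQUIRED keys."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(rec.get(key) is None for key in REQUIRED):
            continue
        if not lon_range[0] <= rec["lon"] <= lon_range[1]:
            continue
        if not lat_range[0] <= rec["lat"] <= lat_range[1]:
            continue
        key = tuple(rec[k] for k in REQUIRED)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def extract_arrivals(records):
    """Return (line, bus, direction, station, time) for the first message
    of each run of messages that share the same nonzero station index."""
    arrivals = []
    last_station = {}
    ordered = sorted(records, key=lambda r: (r["line"], r["bus"], r["time"]))
    for rec in ordered:
        bus_key = (rec["line"], rec["bus"])
        station = rec["station"]
        # Station index 0 means the bus is between stations.
        if station != 0 and last_station.get(bus_key) != station:
            arrivals.append((rec["line"], rec["bus"], rec["direction"],
                             station, rec["time"]))
        last_station[bus_key] = station
    return arrivals


def make(time, station, lon=113.82, lat=34.02):
    """Build one hypothetical record of bus B1 on line 9."""
    return {"line": 9, "bus": "B1", "time": time, "lon": lon, "lat": lat,
            "station": station, "direction": 0}


raw = [make(100, 0), make(110, 3), make(110, 3), make(120, 3), make(130, 0),
       make(140, 4), make(150, 4, lon=0.0, lat=0.0), make(160, 4, lon=None)]
cleaned = clean_gps(raw, lon_range=(113.0, 114.5), lat_range=(33.5, 34.5))
arrivals = extract_arrivals(cleaned)
```

The toy input contains one duplicate, one out-of-area point, and one record with a missing longitude; five records survive, and two arrivals (stations 3 and 4) are extracted.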

Traffic Demand and Bus Vehicle.
Passenger flow and bus supply play critical roles in daily management. On the one hand, buses will be delayed at stations if there are a large number of boarding and alighting passengers. On the other hand, bus operational efficiency will improve if the operating company provides more buses. Figure 5 shows the number of boarding passengers and bus vehicles during different hours of a day. We can see that the distribution of boarding passengers exhibits two obvious peaks, in the morning and in the afternoon. During the morning peak (7:00-9:00) and evening peak (16:00-18:00), there are many commuters. The peaks are not sharp because some elderly people prefer to travel outside peak hours [63]. The distribution of the number of buses is smooth, and the largest values appear in the morning peak hours. Figure 6 shows the distributions of the bus number and the boarding passenger number of stations in a day. The bus number of stations follows a lognormal distribution, and the boarding passenger number of stations follows an exponential distribution. The results indicate that stations with a large bus number and boarding passenger number account for small proportions. In bus networks, lognormal and exponential distributions are ubiquitous. It is reported that the average dwell time and headway deviation of stations follow lognormal distributions [25]. The bus network structure could have some impact on the results. A small proportion of stations in the bus network have many connections, and most stations have few connections. The degree of stations follows power-law distributions or exponential distributions [55]. Figure 7 shows the spatial distributions of the bus number and boarding passenger number. We can see that the stations with large values are located in the central city. The stations with a large number of boarding passengers are more concentrated. Operators should pay more attention to the balance of the operational status of the bus network.

Average Headway of Stations.
Bus headway is an important index for measuring operational efficiency and reliability. The headway of a route changes with many stochastic factors such as the number of boarding and alighting passengers, traffic signals, and weather conditions. For passengers, a stable and small headway is expected. An unstable headway will disturb passengers' travel plans, and a large headway will increase the waiting time. For operators, a stable headway could enhance the efficiency of bus routes.
A large number of studies focus on strategies to keep the headway stable by holding buses, skipping stations, and controlling signals [64][65][66]. Bus bunching, in which two or more buses of one route arrive at the same station simultaneously, is a serious headway stability problem in daily operation. Figure 8 exhibits the running maps of route 9 and route 16. As can be seen, many buses run with small headways on route 16 during the morning and evening peak hours.
Traditional headway studies mostly focus on single routes. The headway situation of the entire network is not very clear. Understanding the headway condition of bus networks could provide a macroscopic view for planners and managers when making schedules and strategies. In this paper, we use the proposed average headway of stations to evaluate the headway conditions of the whole network. Figure 9 shows the distributions of the average headway of stations in different time periods: (a) 8:00-9:00, (b) 10:00-11:00, (c) 12:00-13:00, (d) 14:00-15:00, (e) 16:00-17:00, and (f) 18:00-19:00. The red curves are the fitted lognormal curves. We can see that the average station headways follow lognormal distributions with obvious right tails. Most values are concentrated between 100 seconds and 1000 seconds. Stations with values of more than 1000 seconds account for a small proportion. Figure 10 shows the average headway of all stations in a day. We can see that some stations have very large values. The top five stations are Ruixianglu-Gongnonglu, Xuchang Shiyan School, Jianglijijie, Yangguangdadao Dongkou, and Nongji Wuliuyuan. In addition, the deviation of the values across different hours of a day differs greatly among the stations. The standard deviations range from 6.32 to 2780.
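One simple way to obtain a fitted lognormal curve is maximum likelihood on the logarithms of the values, sketched below in Python. The fitting procedure behind Figure 9 is not stated, so this estimator, the function names, and the sample values are assumptions for illustration.

```python
import math


def fit_lognormal(values):
    """Maximum-likelihood estimates (mu, sigma) of a lognormal
    distribution: the mean and standard deviation of log(values)."""
    logs = [math.log(v) for v in values if v > 0]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma


def lognormal_pdf(x, mu, sigma):
    """Density of the fitted lognormal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


# Hypothetical average station headways in seconds.
headways = [math.exp(5), math.exp(6), math.exp(7)]
mu, sigma = fit_lognormal(headways)
median_headway = math.exp(mu)
```

The median of the fitted distribution is $e^{\mu}$, about 403 seconds for these sample values, and the right tail grows with $\sigma$.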

Average Arrival Rate of Station.
The average arrival rate of stations is an index for measuring the instantaneous operational efficiency of stations, which can be obtained from the high-density GPS trajectory data. In this part, we mainly consider the buses that run to the nearest station. Figure 11 shows the average arrival rate of the stations that have buses running to them at 8:00, 10:00, 12:00, 14:00, 16:00, and 18:00. As we can see, the distributions of the values are inhomogeneous. A small number of stations have large values, while most stations have small values. In the morning and evening peak hours, many stations have a large value. The reason is that there are more buses running on the road in those time periods. For a given station, the average arrival rate changes over time. The indicator could measure the station service quality instantaneously. Operators could make targeted policies to enhance the arrival efficiency of stations by providing more buses. Figure 12 shows the mean value of the average arrival rate of stations at different hours of a day. We can see that the values are large in the peak hours and small in the off-peak hours.
The value is very small after 20:00 because most routes have stopped running. It is noted that some time periods have larger values. Due to some stochastic factors, the headways are affected and lose homogeneity. Operators should provide a more robust timetable to keep the values stable. As we know, the stability of headway plays an important role in residents' travel mode choice. High operational efficiency and stable headways could attract more people to use public transport. Headway control should pay attention not to a single route but to the entire bus network.

Topological Structural Characteristics.
In many complex networks, the topological structure is significant for system function. We introduce node degree, betweenness, clustering coefficient, and PageRank index to evaluate the structural characteristics of the studied bus network. Figure 13 shows the spatial distributions of the four indicators.
A small proportion of stations have very large values of the four indicators, and they are concentrated in the center of the city. Take node degree, for example; there is

Relationship between Indicators.
To explore the relationships among passengers, operational performance, and the connections of stations, Figure 14 shows the heatmap of the Pearson correlation coefficients among these indicators. As can be seen, the boarding number is strongly correlated with the bus number. Moreover, the boarding number is positively correlated with the topological structure indicators, which means that a better network connection could attract more passenger flows. It is remarkable that the average headway of stations is negatively correlated with the boarding passenger number, which demonstrates that the passenger number is a critical factor that disturbs the stability of operation. For the network structure, the PageRank index has a strong positive correlation with node degree. The clustering coefficient reflects the local connections. The figure shows that the clustering coefficient has almost no correlation with the other indicators, which indicates that local connections hardly affect the operation. Operators should pay more attention to the entire network. We can see that the average headway of stations is negatively correlated with the other indicators, which implies that complex network connections and more passenger flows could weaken the efficiency of bus operation.
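The Pearson correlation coefficient behind such a heatmap can be computed as in the following Python sketch. The indicator names and values are hypothetical and serve only to show the calculation; each list holds one value per station.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences.
    Both sequences must vary, otherwise the denominator is zero."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(indicators):
    """indicators: {name: list of per-station values}.
    Returns {(name_a, name_b): r} for every ordered pair of names."""
    names = list(indicators)
    return {(a, b): pearson(indicators[a], indicators[b])
            for a in names for b in names}


# Hypothetical per-station values for four stations.
indicators = {
    "headway": [900, 600, 300, 150],
    "boarding": [10, 20, 30, 45],
    "buses": [20, 40, 60, 90],
}
matrix = correlation_matrix(indicators)
```

In this toy input the bus number is exactly twice the boarding number, so their coefficient is 1, while the headway decreases as both grow and therefore has negative coefficients with them.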

Conclusions
The bus system is a complicated spatiotemporal network, which plays a key role in alleviating traffic congestion. Understanding the operational status of the whole network is significant for improving the service quality.
The bus system often suffers from delay and bunching due to many factors such as road traffic conditions, weather, traffic lights, and the number of boarding and alighting passengers. Any disturbance of bus operation could cause a cascade reaction from stops to a route, and even to the entire network. Efficiency and stability are two key indicators for measuring bus operation.
To evaluate the operational performance of the bus network, this paper proposes a spatiotemporal analysis of the bus network based on GPS trajectory data, IC card data, and route data. We build the average headway of stations and the arrival efficiency of stations to evaluate the bus network operation. The results show that the bus number and boarding passenger number of stations follow lognormal and exponential distributions, respectively. Moreover, the average headway of stations follows a lognormal distribution. There exist some stations where the average headway is very large during operating hours. Managers should arrange more buses for the routes that serve these stations. The distributions of the average arrival rate of stations are inhomogeneous. A small number of stations have large values, while most stations have small values. We test the relationships among passengers, operational performance, and the connections of stations. It is found that the average headway of stations is negatively correlated with the other indicators, which implies that complex network connections and large passenger flows could weaken the efficiency of bus operation. In addition, the boarding number is strongly correlated with the bus number. The boarding number is positively correlated with the topological structure indicators, which means that a better network connection could attract more passengers to use public transport.
This paper extends the study of the stability and efficiency of bus systems from a single route to the entire network, which has important theoretical and practical meaning for the systematic management and control of buses. The limitations of this study are as follows. This paper did not involve the number of alighting passengers due to data limitations. External factors, such as traffic conditions and weather, have not been considered in the paper. In future studies, more factors will be considered and prediction models will be studied.

Data Availability
The data are available by contacting the corresponding author.

Conflicts of Interest
The authors declare there are no conflicts of interest.