Identifying the Key Nodes and Sections of Urban Roadway Network Based on GPS Trajectory Data

This paper proposes a novel approach to identify the key nodes and sections of the roadway network. The taxi-GPS trajectory data are regarded as mobile sensors that probe large-scale urban traffic flows in real time. First, the urban primary roadway network model and dual roadway network model are developed, respectively, based on the weighted complex network. Second, an evaluation system of the key nodes and sections is developed from the aspects of dynamic traffic attributes and static topology. Finally, the taxi-GPS data collected in Xicheng District of Beijing, China, are analyzed. A comprehensive analysis of the spatial-temporal changes of the key nodes and sections is performed. Moreover, the repetition rate is used to evaluate the performance of the identification algorithm of key nodes and sections. The results show that the proposed method expresses the topological structure and dynamic traffic attributes of the roadway network simultaneously, which makes it more practicable and effective at a large scale.


Introduction
An urban roadway network is composed of multiple intersections and roadway sections. Studies show that the importance of each intersection and roadway section in the urban roadway network is different, and large-scale congestion of the roadway network is often caused by the congestion of several key intersections and sections. If the top 5% most important intersections and sections in the roadway network are attacked, the roadway network will be paralyzed [1][2][3]. Therefore, if the supporting and vulnerable nodes and sections can be identified, traffic planners and traffic managers can alleviate the traffic pressure by reasonably planning the topological structure of the urban roadway network. Besides, effective protection and scientific management can be carried out to improve the survivability and reliability of the urban roadway network, thereby avoiding the occurrence of large-scale traffic congestion.
In recent years, with the maturity of GPS technology, GPS data have provided a new data source for urban traffic research. Based on deep learning theory, Ma et al. [1] predicted the evolution of traffic congestion with taxi-GPS data. Based on taxi-GPS data, Kong et al. [2] constructed a method combining the support vector machine and the fuzzy comprehensive evaluation model, which realized the identification and prediction of traffic congestion. Feng et al. [3] proposed an identification method of critical roads based on the combination of GPS trajectory data and the directed weighted complex network. Compared with traditional data, such as video monitoring data and the data collected by loop coil vehicle sensors, GPS trajectory data have the advantages of easy processing, easy access, high quality, and low cost [4].
The GPS trajectory data can be automatically collected and sent to the data center in real time, and the high mobility of GPS trajectory data makes it possible to monitor a large-area roadway network. Among the vehicles equipped with GPS devices, taxis, as a random sample, are distributed in every corner of the city [5]. In other words, taxi-GPS trajectory data are high-quality data for the study of urban traffic problems. Based on this, the paper proposes a new method using taxi-GPS trajectory data to identify the key nodes and sections of the roadway network, which takes both topological structure and dynamic traffic attributes into consideration. The results can provide theoretical support for traffic managers and traffic planners to help (1) guide the uniform distribution of traffic flow, (2) formulate traffic management policies such as the planning of tidal lanes and off-peak travel policies, and (3) construct the emergency rescue plan for key nodes and sections to avoid the traffic paralysis of the city. The rest of this paper is organized as follows. In Section 2, we conduct a comprehensive literature review on identifying the key nodes and sections of the roadway network. Then, the proposed identification method is presented in Section 3. In Section 4, the experiment results are described. Finally, the paper is concluded in Section 5.

Literature Review
With traffic congestion becoming increasingly serious, the research on key nodes has been extended to identifying the key intersections in the roadway network. Hawick and James [6] used some quantifiable metrics of the complex network such as Dijkstra distance, mean degree, and Euclidean centroid to identify the key nodes of the roadway network. Zhang et al. [7] proposed a traffic shockwave model that uses the speed at which traffic congestion spreads to evaluate the importance of nodes in the roadway network. Jayaweera [8] proposed a new centrality measure called DelayFlow, incorporating travel time delay and commuter flow volume, to identify the key nodes in the urban roadway network, and an experiment on Singapore's subway network showed the method to be more relevant to the urban roadway network than network topology-based centrality measures. On the basis of the complex network model, the authors in [8] used centrality and betweenness measures to evaluate the importance of nodes which directly influenced the traffic congestion in roadway networks in Sri Lanka. Taking both node centrality and traffic flow into consideration, Xu et al. [9] introduced a data-driven framework to identify the key nodes with the use of comprehensive vehicle trajectories and geographic information. The authors in [10] proposed a method named the improved topological potential model considering entropy, which took both centrality and invulnerability into consideration to evaluate the key nodes of the metro network.
Another main line of research on the roadway network is to identify the key roadway sections. Scott et al. [11] proposed a method that considered network flows, link capacity, and network topology to identify the critical sections, and with the results measured by travel time savings, the method was shown to yield far greater system-wide benefits than the V/C ratio (under ideal conditions, the ratio of the maximum service traffic volume to the basic capacity). Sullivan et al. [12] used different link-based capacity-disruption values to rank the critical sections in the roadway network, and the research showed that the rank-ordering of the most critical sections in a network can vary dramatically based on both the capacity-disruption level and the overall connectivity of the network. Luathep et al. [13] adopted the relative accessibility index (AI) following the Hansen integral index to estimate the importance of sections in the roadway network, and the key sections were ranked according to the differences in the AIs between normal and degraded networks. Rupi et al. [14] suggested that the importance of a roadway section was linked to two aspects: the level of usage, and the impact that the closure of the link itself can have on the general functionality of the network as a whole. Kumar et al. [15] used three factors to rank the sections in the roadway network: the link flows at equilibrium, the importance of facilities served, and the number of origin-destination pairs served. Zhang and Khani [16] adopted the Lagrangian relaxation method to identify critical sections by maximizing network accessibility with a limited travel time budget and roadway construction cost budget. Li et al. [17] proposed an approach using the traffic flow betweenness index (TFBI) to identify critical links, which can significantly reduce the computational burden compared with the traditional full-scan method.
The urban roadway network is composed of static topological structure and dynamic traffic attributes [3]. Most of the abovementioned methods have considered only one of the two aspects, and a systematic analysis of both static topology and dynamic traffic attributes is lacking. Besides, in the existing literature, the undirected complex network model is often used to model the urban roadway network. However, in practice, vehicles run in two directions, and analyzing the differences between the two directions is of great significance for understanding the tidal phenomenon of urban traffic. To tackle this issue, this paper proposes a directed weighted complex network model which considers both dynamic traffic attributes and static topology to identify key nodes and sections of the roadway network.

Modelling Urban Roadway Network.
There are two approaches to constructing the urban roadway network model using the complex network: the primary network and the dual network. In the primary network, the intersection is abstracted as a node and the roadway section is abstracted as an edge. In the dual network, the intersection corresponds to the edge, and the roadway section corresponds to the node. The primary network has certain limitations in describing the dynamic traffic attributes of the roadway network; for example, it is more complex to reflect the key sections of the urban roadway network [18]. Nevertheless, the primary network is the preferred method for researchers because of its simplicity and directness. The dual network can describe the spatial connection relationship of the roadway sections in detail [19]. Therefore, this paper selects the primary network to identify the key nodes and the dual network to identify the key sections.
The construction of the urban roadway network model is described as follows.
3.1.1. The Primary Roadway Network Model.
Assume that there are $n$ intersections in the urban roadway network. The directed weighted complex network model can be defined by the following sets: $V = \{v_1, v_2, \ldots, v_i, \ldots, v_n\}$ is the set of nodes. $E = \{e_{11}, e_{12}, \ldots, e_{ij}, \ldots, e_{ji}, \ldots, e_{nn}\}$ is the set of edges that is formed in order by pairs of nodes. $e_{ij} = (v_i, v_j)$ represents the directed edge which starts at $v_i$ and ends at $v_j$. $R = \{r_1, r_2, \ldots, r_i, \ldots, r_n\}$ is the weight set of nodes, and $r_i$ is the weight of node $v_i$. Figures 1(a) and 1(b) illustrate the process of building the urban roadway network model with the primary method [20]. Generally, the weight of the directed complex network can reflect the dynamic traffic attributes of the roadway network, such as grade of roadway sections, traffic capacity, and traffic speed.

The Dual Roadway Network Model.
The dual roadway network model can be defined as follows: $V_{dual}$ is the set of nodes in the dual network corresponding to the edges in the primary network, $E_{dual}$ is the set of edges in the dual network corresponding to the nodes in the primary network, and $W = \{w_{e_{ij}} \mid e_{ij} \in E\}$ is the set of weights of the nodes in the dual network; $w_{e_{ij}}$ represents the weight of edge $e_{ij}$ in the primary network. The relationship between $R$ and $W$ is given by formula (1), where $N_i$ is the set of adjacent nodes of node $v_i$. When constructing the dual network, the corresponding link will not be created if there is a turn prohibition between $v_i$ and $v_j$. Let $T$ denote the set of possible turns and $P \subset T$ the set of turn prohibitions. The edge set $E_{dual}$ of the dual network therefore contains only the turns in $T$ that are not prohibited, as illustrated in Figures 1(c) and 1(d).
Here $f, g \in V_{dual}$ are nodes of the dual network, each corresponding to an edge of the primary network. The dual link $(f, g, Q_{fg})$ is established when the following three conditions are satisfied [20,21]:
(1) The origin node $f$ of the dual link is the edge $(s, d)$ of the primary network.
(2) The destination node $g$ of the dual link is the edge $(d, x)$ of the primary network.
(3) There is no turn prohibition between edge $(s, d)$ and edge $(d, x)$.
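The three conditions for establishing a dual link can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_dual_links`, the tuple encoding of edges, and the toy corridor with U-turn prohibitions are hypothetical and are not taken from the paper's implementation; the link attribute $Q_{fg}$ is omitted.

```python
def build_dual_links(edges, prohibited_turns=frozenset()):
    """Build dual links from the directed edges of a primary network.

    edges: list of (start_node, end_node) pairs of the primary network.
    prohibited_turns: set of ((s, d), (d, x)) pairs that must not be linked.
    """
    # Index the primary edges by their start node for quick lookup.
    outgoing = {}
    for s, d in edges:
        outgoing.setdefault(s, []).append((s, d))

    dual_links = []
    for f in edges:                       # condition (1): f is the edge (s, d)
        _, d = f
        for g in outgoing.get(d, []):     # condition (2): g is an edge (d, x)
            if (f, g) not in prohibited_turns:  # condition (3): turn allowed
                dual_links.append((f, g))
    return dual_links


# Hypothetical corridor 1 <-> 2 <-> 3 where U-turns at node 2 are prohibited.
edges = [(1, 2), (2, 1), (2, 3), (3, 2)]
prohibited = {((1, 2), (2, 1)), ((3, 2), (2, 3))}
links = build_dual_links(edges, prohibited)
```

Each element of `links` is a pair of primary edges, so every roadway section keeps its direction in the dual network, which is what allows the two directions of a road to be ranked separately.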

Evaluation Index System of Key Nodes and Sections.
This paper considers both static topology and dynamic traffic attributes when identifying key nodes and sections of the urban roadway network. For dynamic traffic attributes, we define two indexes, the level of traffic congestion and the grade of node, to build the node weight of the roadway network model, and for static topology, we construct three matrices based on node efficiency and shortest path. Figure 2 shows the evaluation index system of key nodes and sections.
The definition of each index is introduced as follows.

Dynamic Traffic Attributes.
In an actual roadway network, the expressway is more important than the arterial roadway, the minor arterial roadway, and the local roadway. Therefore, different grades of roadway sections should have different weights. The grade of a node of the dual network is denoted $S$, and Table 1 [22] shows the value of $S$.
Nodes and sections that are frequently congested are more important in the roadway network. According to the Urban Road Traffic Performance Index issued by Beijing, China, in 2011, the traffic states of roadway sections are divided into five degrees: smooth, basically smooth, light congestion, moderate congestion, and severe congestion.
This paper takes "moderate congestion" as the criterion to judge the congestion of roadway sections. Table 2 [23] shows the boundary speed of each grade of roadway sections.
GPS trajectory data include car number, longitude, latitude, time stamp, and speed. Therefore, based on the GPS trajectory data, we can calculate the average speed of roadway sections in different directions in a period of time.
The average speed of roadway section $i$ during the $k$th time period can be defined as $\bar{v}_i^k$:

$$\bar{v}_i^k = \frac{1}{m_i^k} \sum_{j=1}^{m_i^k} v_j^k,$$

where $v_j^k$ is the instantaneous speed of the $j$th vehicle in the $k$th time period and $m_i^k$ is the number of cars on roadway section $i$ during the $k$th time period.
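The averaging step can be illustrated with a short Python sketch. It assumes that each GPS sample has already been matched to a directed roadway section and that every sample counts as one vehicle observation; the function name `average_speeds`, the record layout, and the one-hour period length are illustrative assumptions rather than details given in the paper.

```python
from collections import defaultdict


def average_speeds(records, period_seconds=3600):
    """Average speed per (section, time period).

    records: iterable of (section_id, timestamp_seconds, speed_kmh) tuples,
    where section_id identifies a directed roadway section such as "1-2".
    """
    # For each (section, period) keep a running [sum of speeds, sample count].
    totals = defaultdict(lambda: [0.0, 0])
    for section, timestamp, speed in records:
        k = int(timestamp // period_seconds)   # index of the time period
        acc = totals[(section, k)]
        acc[0] += speed
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical samples: two directions of one road, two hourly periods.
records = [
    ("1-2", 0, 30.0),
    ("1-2", 1800, 10.0),
    ("1-2", 3600, 40.0),
    ("2-1", 100, 50.0),
]
speeds = average_speeds(records)
```

Keeping the direction in the section identifier means that sections "1-2" and "2-1" are averaged separately, which matches the directed network model.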
The level of traffic congestion of a roadway section, which is also the node congestion level of the dual roadway network, is denoted $CE_i$. In its definition, $V^*$ is the congested speed of the roadway section, which equals the boundary speed of moderate congestion shown in Table 2, and $K$ is the number of time periods. The weight of node $i$ in the dual network is then given by the piecewise formula (7): it combines the grade $S$ with the congestion level $CE_i$ when the roadway section is congested, and it equals $S$ when the roadway section is not congested.
Correspondingly, the weight of nodes in the primary network can be calculated by formula (1).

Static Topology.
Nodes in the network are influenced and restricted by each other. Therefore, the importance of each node can be obtained according to the influence ranking. From the perspective of traffic flow, the influence among nodes depends on the shortest path and the number of shortest paths [24]. It should be noted that the dependency relationship exists not only between adjacent nodes but also between nodes that can be reached by effective paths [24][25][26]. In summary, this paper uses the influence matrix based on node efficiency and the influence matrix based on the shortest path as the evaluation indexes to measure the importance of nodes. The efficiency between nodes $v_i$ and $v_j$, denoted $p_{ij}$, is the reciprocal of their distance. If $i = j$ or nodes $i$ and $j$ are not directly connected, $p_{ij} = 0$. Therefore, the efficiency matrix can be defined as $P = [p_{ij}]_{n \times n}$ [27,28].
According to [28], the influence matrix based on the shortest path consists of two parts. TIP is the target node-centered influence power matrix; its elements are defined using the number of shortest paths from node $v_i$ to node $v_j$, that is, the number of paths with length $d_{ij}$. The matrix TIP fixes the target node and considers the influence of other nodes on the target node. SIP is the source node-centered influence power matrix; its elements are defined using the number of paths of length $d_{ij}$ from node $v_j$ to the other nodes. The matrix SIP fixes the source node and considers the influence of the source node on other nodes.
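The two raw ingredients of these matrices, the efficiency of directly connected node pairs and the number of shortest paths between nodes, can be computed as in the following Python sketch. It is a minimal illustration, not the paper's implementation: the function names, the dictionary encoding of edge lengths, and the toy four-node network are assumptions, and the sketch stops short of assembling TIP and SIP themselves.

```python
import heapq


def efficiency_matrix(n, lengths):
    """P[i][j] = 1 / distance for directly connected i -> j, otherwise 0.

    lengths: dict {(i, j): distance} for directed edges, nodes numbered 0..n-1.
    """
    P = [[0.0] * n for _ in range(n)]
    for (i, j), distance in lengths.items():
        if i != j and distance > 0:
            P[i][j] = 1.0 / distance
    return P


def shortest_path_counts(n, lengths, source):
    """Dijkstra's algorithm that also counts the number of shortest paths.

    Returns (dist, count): shortest distance from source to each node and the
    number of distinct shortest paths. Equal path lengths are detected with
    ==, so lengths should be exactly representable (e.g. whole metres).
    """
    adjacency = {}
    for (i, j), distance in lengths.items():
        adjacency.setdefault(i, []).append((j, distance))

    dist = [float("inf")] * n
    count = [0] * n
    done = [False] * n
    dist[source], count[source] = 0.0, 1
    heap = [(0.0, source)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if done[u]:
            continue
        done[u] = True
        for v, w in adjacency.get(u, []):
            candidate = d_u + w
            if candidate < dist[v]:        # strictly shorter path found
                dist[v], count[v] = candidate, count[u]
                heapq.heappush(heap, (candidate, v))
            elif candidate == dist[v]:     # another path of the same length
                count[v] += count[u]
    return dist, count


# Hypothetical network: two equally long routes from node 0 to node 3.
lengths = {(0, 1): 1.0, (0, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0}
P = efficiency_matrix(4, lengths)
dist, count = shortest_path_counts(4, lengths, source=0)
```

Running the path count once per source node yields all pairwise distances and shortest-path counts, which is cheap for a network of the size studied here.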
In summary, the matrix $M$ which reflects the influence of the static topology can be defined as

$$M = \lambda_1 P + \lambda_2 \mathrm{TIP} + \lambda_3 \mathrm{SIP},$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$ represent the weight of the efficiency matrix, the weight of the target node-centered influence power matrix, and the weight of the source node-centered influence power matrix in turn. The relationship among $\lambda_1$, $\lambda_2$, $\lambda_3$ is

$$\lambda_1 + \lambda_2 + \lambda_3 = 1.$$

In the paper, $\lambda_1$, $\lambda_2$, $\lambda_3$ are calculated by the improved analytic hierarchy process [29]. Finally, the comprehensive importance of node $i$ is denoted $Y_i$, and after normalization, the comprehensive importance value of the node is denoted $SY_i$. The steps of the method to identify the key nodes and sections are summarized in Figure 3.
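As a hedged illustration of how the three matrices could be merged, the sketch below assumes that $M$ is the element-wise weighted sum of $P$, TIP, and SIP with weights that add up to one. The function name `combine_influence` and the toy 2 × 2 matrices are hypothetical; the default weights are the values reported in the experiment section. The later steps that turn $M$ into $Y_i$ and $SY_i$ are not reproduced here.

```python
def combine_influence(P, TIP, SIP, weights=(0.82, 0.09, 0.09)):
    """Element-wise weighted sum of three equally sized square matrices.

    Assumption: M = l1 * P + l2 * TIP + l3 * SIP with l1 + l2 + l3 = 1.
    """
    l1, l2, l3 = weights
    if abs(l1 + l2 + l3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n = len(P)
    return [
        [l1 * P[i][j] + l2 * TIP[i][j] + l3 * SIP[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2 x 2 matrices used only to exercise the function.
P = [[0.0, 1.0], [1.0, 0.0]]
TIP = [[0.0, 2.0], [0.0, 0.0]]
SIP = [[0.0, 0.0], [2.0, 0.0]]
M = combine_influence(P, TIP, SIP)
```

With a weight of 0.82 on the efficiency matrix, the combined matrix is dominated by direct connections, while the two shortest-path terms act as smaller corrections.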

Experiment and Discussion
The floating car data are real traffic GPS data collected by about 40,000 taxis in Beijing, China, over a period of one week (January 12-18, 2015). The traffic data are recorded approximately once per minute. The format of the GPS data is shown in Figure 4. We first preprocess the data to eliminate noisy sample points whose recorded positions are erratic. In the second step, the sample points with the same vehicle ID are linked to each other according to their time correlation. Then, we capture the floating car trajectories on the urban roadway network space [2]. Finally, map matching is carried out according to the floating car trajectories, the latitude and longitude of the vehicles, and the urban geographic information [30].
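The first two preprocessing steps, removing noisy points and linking points of the same vehicle in time order, might look like the following Python sketch. The noise rule used here (a bounding box plus a plausible speed range), the field names, and the function name `build_trajectories` are assumptions made for illustration; the paper does not state its exact filtering criteria, and map matching is not covered.

```python
from collections import defaultdict


def build_trajectories(points, bbox, max_speed_kmh=120.0):
    """Filter GPS points and group them into per-vehicle trajectories.

    points: iterable of dicts with keys vehicle_id, lon, lat, timestamp, speed.
    bbox: (min_lon, min_lat, max_lon, max_lat) of the study area (assumed rule).
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    by_vehicle = defaultdict(list)
    for p in points:
        inside = min_lon <= p["lon"] <= max_lon and min_lat <= p["lat"] <= max_lat
        plausible = 0.0 <= p["speed"] <= max_speed_kmh
        if inside and plausible:              # drop noisy sample points
            by_vehicle[p["vehicle_id"]].append(p)
    # Link the points of each vehicle in chronological order.
    return {
        vehicle: sorted(samples, key=lambda s: s["timestamp"])
        for vehicle, samples in by_vehicle.items()
    }


# Hypothetical samples; the second point lies far outside the bounding box.
points = [
    {"vehicle_id": "A", "lon": 116.36, "lat": 39.91, "timestamp": 120, "speed": 32.0},
    {"vehicle_id": "A", "lon": 10.00, "lat": 39.91, "timestamp": 60, "speed": 30.0},
    {"vehicle_id": "A", "lon": 116.35, "lat": 39.90, "timestamp": 0, "speed": 28.0},
    {"vehicle_id": "B", "lon": 116.37, "lat": 39.92, "timestamp": 30, "speed": 300.0},
]
trajectories = build_trajectories(points, bbox=(116.3, 39.8, 116.4, 40.0))
```

In this example vehicle "B" disappears entirely because its only sample reports an implausible speed, and vehicle "A" keeps two points ordered by timestamp.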
We select Xicheng District as the research object. It is one of the six major urban areas in Beijing, with a high population density and a developed economy. The administrative area of Xicheng District is 50.70 square kilometers.
The roadway network of Xicheng District is shown in Figure 5(a). As there are many pedestrian streets and small local roadways in the roadway network, filtering of the roadway network is required.
The filtering rules are as follows: (1) pedestrian streets are not considered; (2) local roadways with small traffic flow are neglected; (3) small intersections are neglected, and only large intersections are considered. Finally, the roadway network topology of Xicheng District is simplified as Figure 5(b), which contains 310 roadway sections (two-way) and 91 intersections. There are 46 expressways, 206 arterial roadways, and 58 minor arterial roadways (local roadways).

Analysis of Congestion in Roadway Network.
In this paper, "moderate congestion" is taken as the criterion to judge the traffic congestion of each grade of roadway section. In other words, a section is considered congested when the average speed of vehicles is under 35 km/h on an expressway, under 20 km/h on an arterial roadway, or under 10 km/h on a minor arterial roadway or local roadway. To clearly understand the congestion situation of Xicheng District, the congestion rate of each grade of roadway section is defined as the proportion of congested roadway sections to the total roadway sections per hour. Figures 6 and 7 illustrate the changes of the congestion rate of different grades of roadway section in Xicheng District from 6:00 to 24:00 on January 12 (Monday) and January 17 (Saturday). It should be noted that the starting time of the hour interval is used to represent the time period. For example, 6:00 in the paper represents the period from 6:00 to 7:00.
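The congestion rate defined above can be computed per grade for a single hour as in the following Python sketch. The speed thresholds are the ones quoted in the text; the function name `congestion_rate`, the grade labels, and the choice to count only sections that have a speed observation in that hour are assumptions for illustration.

```python
# Boundary speeds of "moderate congestion" in km/h, as quoted in the text.
CONGESTION_SPEED_KMH = {
    "expressway": 35.0,
    "arterial": 20.0,
    "minor_arterial_or_local": 10.0,
}


def congestion_rate(avg_speed_by_section, grade_by_section):
    """Share of congested sections per grade for one hourly period.

    avg_speed_by_section: {section_id: average speed in km/h} for that hour.
    grade_by_section: {section_id: grade label used in CONGESTION_SPEED_KMH}.
    Assumption: sections without a speed observation are left out.
    """
    congested, total = {}, {}
    for section, speed in avg_speed_by_section.items():
        grade = grade_by_section[section]
        total[grade] = total.get(grade, 0) + 1
        if speed < CONGESTION_SPEED_KMH[grade]:
            congested[grade] = congested.get(grade, 0) + 1
    return {grade: congested.get(grade, 0) / total[grade] for grade in total}


# Hypothetical hourly averages for four directed sections.
avg_speed = {"1-2": 30.0, "2-1": 40.0, "2-3": 15.0, "3-2": 25.0}
grades = {"1-2": "expressway", "2-1": "expressway",
          "2-3": "arterial", "3-2": "arterial"}
rates = congestion_rate(avg_speed, grades)
```

Calling the function once per hour from 6:00 to 24:00 would produce one curve per grade of the kind plotted in the congestion rate figures.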
To better understand the distribution of the congestion rate, Figure 8 shows the change curve of the congestion rate of each grade of roadway section in Xicheng District during the week. It can be seen that working days and weekends present a similar trend.

Identification of Key Roadway Sections.
Python is used to implement the identification algorithm proposed in this paper.
The results show that the weight of the efficiency matrix is $\lambda_1 = 0.82$, the weight of the target node-centered influence power matrix is $\lambda_2 = 0.09$, and the weight of the source node-centered influence power matrix is $\lambda_3 = 0.09$. The key nodes and sections of Xicheng District are identified within five seconds using Python packages.
First, the key sections in Xicheng District are analyzed. Figures 9 and 10 highlight the top 50 key sections in the morning and evening peak hours on January 12 (Monday) and January 17 (Saturday).
Red marks the 10 highest-ranked sections, and yellow marks the next 40 sections.
The following rules can be obtained from Figures 9 and 10:
(a) Whether on Monday or Saturday, the key sections are almost all expressways and arterial roadways. This is because the grade of roadway sections and the congestion level of expressways and arterial roadways in peak hours are both higher than those of minor arterial roadways and local roadways.
(b) The distribution of key sections in the morning and evening peak hours on Monday and Saturday is relatively similar. From the perspective of the whole Xicheng District, the importance value of the roadway sections in the center of the city is generally higher than that of the edge of the city because the node efficiency of the central part is higher than that of the edge part. Most of the sections centered on the Xidan business district are in the top 50 key sections due to the high congestion level of the roadway sections near the Xidan business district.
(c) Whether on Monday or Saturday, the importance values of the top 50 key sections change slightly between the morning and evening peak hours. To analyze this change, Table 3 lists the importance value of the top 50 key sections in the morning and evening peak hours on Monday. Each section is numbered according to the node serial number in Figure 5(b). For example, 1-2 represents the section from node 1 to node 2, and 2-1 represents the section from node 2 to node 1.
It can be seen that the 50 key roadway sections in the morning and evening peak hours on Monday are all expressways and arterial roadways. A comparative analysis of the distribution of the 50 key roadway sections between the morning and evening peak hours shows that 35 roadway sections appear in both. In addition, there are 16 opposite roadway sections (e.g., section 1-2 and section 2-1) between the morning and evening peak hours, and the rank of most opposite roadway sections changes between the morning and evening peak hours. For example, in the morning peak hours, roadway section 50-43 ranks 2nd, and its opposite roadway section 43-50 ranks 26th. In the evening peak hours, roadway section 50-43 drops to 4th, and roadway section 43-50 rises to 3rd. Figure 11 shows the change of the top 50 key sections from 6:00 to 22:00 on Monday. The top 50 key sections are colored according to the normalized importance. The larger the importance value, the darker the color. Sections ranked after 50 are uniformly colored light gray. The key sections throughout Monday are still distributed among expressways and arterial roadways, and the importance value of each section changes with time. It can be seen that the importance value of roadway sections changes with travel behavior; that is, the importance value of roadway sections is directly affected by travel behavior.
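The comparison between the morning and evening rankings can be expressed with simple set operations, as in the Python sketch below. It assumes one possible reading of "opposite roadway sections": a section in the morning list whose reverse direction appears in the evening list. The function name `compare_top_sections` and the short example lists are hypothetical and do not reproduce Table 3.

```python
def compare_top_sections(morning, evening):
    """Compare two lists of directed key sections given as (from, to) pairs.

    Returns (shared, opposite):
      shared   - sections present in both lists;
      opposite - morning sections whose reverse direction is in the evening
                 list (an assumed reading of "opposite roadway sections").
    """
    morning_set, evening_set = set(morning), set(evening)
    shared = morning_set & evening_set
    opposite = {(a, b) for (a, b) in morning_set if (b, a) in evening_set}
    return shared, opposite


# Hypothetical top-3 lists; section numbers follow the "from-to" convention.
morning = [(50, 43), (10, 11), (1, 2)]
evening = [(43, 50), (10, 11), (50, 43)]
shared, opposite = compare_top_sections(morning, evening)
```

Because sections are directed pairs, (50, 43) and (43, 50) are distinct entries, which is what makes the tidal shift between peak hours visible in the rankings.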

Identification of Key Nodes.
Next, the identification of key nodes in Xicheng District is studied. Figures 12 and 13 highlight the top 20 key nodes in the morning and evening peak hours on January 12 (Monday) and January 17 (Saturday).
Red marks the 10 highest-ranked nodes, and yellow marks the next 10 nodes.
It can be seen that most of the key nodes in the morning and evening peak hours are connected with expressways and arterial roadways on both Monday and Saturday. This is caused by the higher grade of roadway sections and congestion level of expressways and arterial roadways. The distribution of key nodes is similar to that of key sections, and the importance value of the nodes in the center of the city is generally higher than that of edge areas.
It can be found that the distribution of the key nodes in the morning and evening peak hours is roughly the same, and most of the top 20 key nodes are nodes with higher node grades. Moreover, the rank of the top 20 key nodes changes only slightly or remains the same between the morning and evening peak hours. For example, node 10 ranks 1st in the morning and 4th in the evening. The ranks of nodes 30, 68, and 62 remain the same between the morning and evening peak hours. Therefore, we can conclude that the change of the congestion rate of the roadway sections has little effect on the ranking of key nodes. This is because a node connects multiple roadway sections at the same time. Even if the tidal phenomenon of traffic congestion happens on opposite roadway sections, the average congestion rate of the node changes only slightly.

Identification with Traditional Methods.
In this part, the identification results of our proposed model are compared with those of the traditional evaluation methods based on traffic congestion and node betweenness. The repetition rate ranges between 0.2 and 0.6, with an average value of 0.32. In the traffic peak hours, the repetition rate of the traffic congestion method reaches its peak value, which is 0.52. At 6:00, 12:00, and 22:00, the repetition rate of node betweenness reaches its peak value, which is 0.46. This shows that with the aggravation of traffic congestion, the repetition rate of the traffic congestion method increases accordingly. As traffic congestion eases, the repetition rate between the identification results of the model proposed in this paper and those based on node betweenness increases. In conclusion, the identification model proposed in this paper expresses both the topological structure and the dynamic traffic attributes of the roadway network and is therefore more practicable and effective at a large scale.

Conclusion
In this paper, a method was proposed to identify the key nodes and sections of the roadway network. In the method, the taxi-GPS data were used to map the floating car sample points with the same vehicle ID to car trajectories according to their time correlation. First, the weighted complex network was used to construct the roadway network model. Then, the evaluation index system of key nodes and sections was developed from the perspectives of dynamic traffic attributes and roadway network topology. Finally, Xicheng District of Beijing, China, was selected as the experiment area, and the method proposed in this paper was compared with the conventional methods.
The results show that the identification results of our model are highly similar to those of the method based on traffic congestion in peak hours, while they are similar to those of the method based on topological structure at other times. In other words, the method proposed in this paper expresses the topological structure and dynamic traffic attributes of the roadway network simultaneously, which makes it more practicable and effective at a large scale.
Some problems still need to be analyzed further. Urban traffic is composed of subway, bus, taxi, and private car. Only taxi-GPS data are used in this paper, and integrated data of various traffic modes should be used in the study of key nodes and sections in the future.

Figure 1: Construction process of the roadway network. (a) The actual roadway network. (b) The primary network model. (c) The establishment rule of the dual link. (d) The dual network model.

Figure 3: Identification process of key nodes and sections of the roadway network.

Figure 4: Sample of original GPS data.

Figure 5: Network topology construction of Xicheng District.(a) Roadway map of Xicheng District.(b) Network topology of Xicheng District.

Figure 7: The traffic congestion ratio of each grade of roadway sections on Saturday.

Table 1: Weight value corresponding to each grade of roadway section.

Table 4 shows the importance value of the top 20 key nodes in the morning and evening peak hours on Monday.

Table 3: Top 50 key roadway sections in peak hours on Monday. Note: the subscript represents the grade of roadway section. E is expressway, and A is arterial roadway.

Figure 15: Repetition rate of the identification results of key roadway sections.