The introduction of complex network concepts in the study of transportation systems has brought about a paradigm shift, allowing different transport phenomena to be understood as the emergent result of the interactions between the elements composing them. In spite of several notable achievements, lurking pitfalls are undermining our understanding of the topological characteristics of transportation systems. In this study, we analyse four of the most common ones, specifically related to the assessment of the scale-freeness of networks, the interpretation and comparison of topological metrics, the definition of a node ranking, and the analysis of the resilience against random failures and targeted attacks. For each topic we present the problem from both a theoretical and an operational perspective, then review how it has been tackled in the literature, and finally propose a set of solutions. We further use six real-world transportation networks as case studies and discuss the implications of these four pitfalls in their analysis. Finally, we present some future lines of work that stem from these pitfalls and that will allow a deeper understanding of transportation systems from a complex network perspective.

In recent years, the topological structure of different transportation systems has become an important topic of research. This is the result of the convergence of two different lines of work. On one hand, the improvement in computational and data storage resources has given the transportation research community access to large amounts of real data, enabling the detailed description of those systems at different temporal and spatial scales. On the other hand, there has been a great effort from the statistical physics community in analysing the structure and dynamics of both theoretical and real complex networks [

The convergence of both research fields has resulted in a paradigm shift in the way transportation systems are conceptualised and analysed. It became clear that these are complex systems and that the focus ought to be moved from one transportation unit (e.g.

Although fruitful, this convergence is also hiding pitfalls and difficulties. These come from two fronts. Firstly, complex network theory was not developed with a specific application in mind, but it is instead a general framework for understanding interacting systems. A statistical physicist must then take into account the fact that not all complex network concepts are applicable to the transportation context and that some adaptation may be required. Secondly, even if

Among the hundreds of contributions that have appeared in the last decade about the use of complex networks to understand transportation systems, a significant number present one or more problems that make it difficult to interpret their results. These problems are not limited to minor research works: on the contrary, they can be found in recent publications and in highly respected journals. In this work we aim at fostering a debate around them, by raising awareness in the scientific community and eventually helping to develop novel solutions. For the sake of compactness, this debate is focused on the topological properties of transportation systems, as these are the most basic and easily understandable application of complex network theory. These problems have been organised around two major topics: (

Originally, two types of graphs were extensively studied: regular ones, in which all nodes have the same degree (i.e.

Such heterogeneity in the nodes’ importance is also present in transport networks. Nodes are not all the same, with some of them being much more important than others. On one hand, this may be due to the way the network is constructed, with some nodes designed to connect different parts of the system. On the other hand, it can also be the result of economic reasons, as, e.g.

An open topic of discussion within statistical physics is when we can confidently define a network to be scale-free (see, for instance, [

With respect to the size requirement, it is easy to see that most of the air transport networks do not fulfil it, as the number of airports in a country or even in a supranational region seldom reaches the thousands. In spite of this, scale-freeness has been claimed for the Italian [

In order to confirm the presence of a scale-free distribution of the degrees, the most common approach has been to resort to a graphical representation. Plenty of examples can be found in the literature, for maritime [

Not all research works suffer from this bias towards scale-freeness, and some noteworthy examples can be found. For instance, [

It is clear that the claim of the scale-free nature of many transportation networks has not been supported by suitable statistical tests. It is nevertheless undeniable that nodes are not homogeneous and that some of them attract most of the connections and traffic. Thus, even if these networks are not scale-free, they still present a

In order to understand this point, one has to take into account the fact that scale-free networks are a mathematical simplification, or model, of real-world networks. Defining an exact law for the degree distribution allows finding analytical solutions to problems like the dynamics of diseases [

In synthesis, assessing the scale-freeness of a transportation network requires a solid statistical validation. If such validation cannot be performed, for instance, because of the limited network size, it is better to avoid any mention of a scale-free topology, as this would largely be irrelevant. Put simply, and in spite of its lure, there is more to life than scale-freeness.

As previously introduced, there are two problems preventing an easy assessment of the scale-freeness of real-world networks: their limited size and the fact that statistical validations of the fits are seldom performed.

As for the first issue, it has been found that even perfect fits cannot be accepted as statistically significant when the number of samples (in this case, of nodes) is below

Regarding the second issue, i.e.

Metrics of goodness-of-fit for the degree distributions of the two networks depicted in Figure

Network | Fit | | Linearised | KLD | |
---|---|---|---|---|---|
Airport | Power law | | | | |
Airport | Exponential | | | | |
Bus | Power law | | | | |
Bus | Exponential | | | | |

Example of seemingly scale-free networks. The left and right panels, respectively, depict the cumulative degree distribution of the airport and the bus networks, as described in Appendix. Red and green dashed lines further depict the best power law and exponential fits of the two distributions.

First of all, one may be tempted to use the goodness-of-fit

A second option entails resorting to the Akaike Information Criterion (AIC), a metric estimating the relative quality of a statistical model given some empirical data [
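As an illustration of an AIC-based comparison, the following minimal sketch fits a power law and a shifted exponential by maximum likelihood and compares their AIC values. It rests on several assumptions that are not taken from the original analysis: degrees are treated as continuous values, the lower cut-off `x_min` is fixed by hand, and the data are synthetic samples rather than any of the networks studied here. All function names are hypothetical.

```python
import math
import random


def fit_power_law(xs, x_min):
    """MLE for a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    n = len(xs)
    log_sum = sum(math.log(x / x_min) for x in xs)
    alpha = 1.0 + n / log_sum
    log_lik = n * math.log((alpha - 1.0) / x_min) - alpha * log_sum
    return alpha, log_lik


def fit_exponential(xs, x_min):
    """MLE for a shifted exponential p(x) = lam * exp(-lam * (x - x_min))."""
    n = len(xs)
    excess = sum(x - x_min for x in xs)
    lam = n / excess
    log_lik = n * math.log(lam) - lam * excess
    return lam, log_lik


def aic(log_lik, n_params=1):
    """Akaike Information Criterion: lower values indicate a better model."""
    return 2.0 * n_params - 2.0 * log_lik


def compare_fits(values, x_min=1.0):
    """Return the AIC of both candidate models for the values >= x_min."""
    xs = [float(v) for v in values if v >= x_min]
    _, ll_pl = fit_power_law(xs, x_min)
    _, ll_exp = fit_exponential(xs, x_min)
    return {"power_law": aic(ll_pl), "exponential": aic(ll_exp)}


# Synthetic data (assumption): one heavy-tailed and one exponential sample.
rng = random.Random(0)
heavy = [rng.paretovariate(1.5) for _ in range(2000)]
light = [1.0 + rng.expovariate(0.5) for _ in range(2000)]

aic_heavy = compare_fits(heavy)
aic_light = compare_fits(light)
print("heavy-tailed sample:", aic_heavy)
print("exponential sample: ", aic_light)
```

The AIC only ranks the candidate models relative to each other; a lower value for the power law does not by itself show that the power law is a statistically acceptable description of the data.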

The third and best solution requires performing a full statistical test on the model, in order to obtain a

One final note should be added. In the previous analysis, the

The interested reader may find an excellent review of this third solution, along with some practical examples, in [

Once a network is obtained, the next logical step is to calculate a set of topological metrics to assess specific aspects of the structure, including the presence of triangles (i.e.

Let us explore this issue through a simple example. An important metric for a transportation system is the

It is straightforward to see how this metric is influenced by the number of links present in the network. Increasing the number of flights in an air transport network would also increase the number of passengers able to reach their destination directly. In the limit of all airports being connected with all other ones,
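The dependency on the number of links can be sketched numerically. The example below takes the global efficiency (the mean of the inverse shortest-path lengths over all node pairs, one of the metrics reported in the table of this section) as an assumed stand-in for the metric discussed here, and evaluates it on a toy ring network to which random links are progressively added. The network and the function name are hypothetical and only serve the illustration.

```python
import random
from collections import deque
from itertools import combinations


def global_efficiency(n, edges):
    """Mean of 1/d(i, j) over all ordered node pairs (0 if unreachable)."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for source in range(n):
        # Breadth-first search gives hop distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))


# Toy network (assumption): a ring of 30 nodes plus k random extra links.
n = 30
ring = {(i, i + 1) for i in range(n - 1)} | {(0, n - 1)}
extra = [e for e in combinations(range(n), 2) if e not in ring]
random.Random(1).shuffle(extra)

results = []
for k in (0, 30, 100, len(extra)):
    edges = list(ring) + extra[:k]
    density = len(edges) / (n * (n - 1) / 2)
    results.append((density, global_efficiency(n, edges)))

for density, eff in results:
    print(f"link density = {density:.3f}  efficiency = {eff:.3f}")
```

Since adding a link can never lengthen a shortest path, the efficiency grows with the link density and reaches 1 for the fully connected network, regardless of how the links are organised.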

If most topological metrics suffer from this dependency on the number of links composing the network, some of them are also defined as a function of the number of nodes. This is the case, for instance, of the diameter, defined as the shortest distance between the two most distant nodes in the network, and of the average path length [

How does this map to the problem of analysing a transportation network? First of all, conclusions cannot be drawn from the values of the topological metrics unless these are properly normalised, i.e.

Examples of topological metrics calculated over three networks described in Appendix. The three rightmost columns, respectively, report the raw metric value (i.e., as calculated on the original network), the average and standard deviation of the metric obtained in random equivalent networks, and the final Z-score.

Metric | Network | Raw value | Random value | Z-Score |
---|---|---|---|---|
Number of nodes | Light-rail | | − | − |
Number of nodes | Subway | | − | − |
Number of nodes | Tram | | − | − |
Link density | Light-rail | | − | − |
Link density | Subway | | − | − |
Link density | Tram | | − | − |
Efficiency | Light-rail | | | |
Efficiency | Subway | | | |
Efficiency | Tram | | | |
Modularity | Light-rail | | | |
Modularity | Subway | | | |
Modularity | Tram | | | |
Transitivity | Light-rail | | | |
Transitivity | Subway | | | |
Transitivity | Tram | | | |

Several interesting observations can be obtained. First of all, the efficiency seems to be substantially higher (to be precise,

In synthesis, the values of topological metrics are seldom relevant

For instance, [

Moving to street networks, [

Reference [

In a first approximation, normalising a topological metric is not a complex task. In synthesis, one needs to generate a large set of networks (called the

How should this null model then be defined? As the standard objective is to compare the real network against something that has no clear structure, the simplest solution entails using random Erdős-Rényi networks with the same number of nodes and links. This may nevertheless yield biased results. To illustrate, suppose one is studying a street network, which is by definition planar; in other words, when two streets intersect, a link between them is necessarily created. Additionally, let us suppose that streets are built at random. Would this result in a lack of structure? Surprisingly, no: triangles would be very common, as any triplet of long, mutually non-parallel streets would sooner or later intersect and form a triangle. If random networks are then used to normalise the transitivity metric, the result would be a very high Z-Score. Additionally, let us consider airport networks. While they lack the planar property, their construction is still guided by some principles that should be included in the null model: for instance, the fact that airports closer than 300 km are seldom connected by a direct flight. Once again, the use of a set of completely random networks may yield biased results.
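The simplest normalisation discussed above can be sketched as follows: the transitivity of a network is compared against an ensemble of random networks with the same number of nodes and links, and the result is expressed as a Z-Score. The example network (a ring lattice in which every node is linked to its neighbours at distance one and two), the ensemble size, and all function names are assumptions made for illustration; as argued in the text, such a fully random null model may itself be biased for planar or otherwise constrained systems.

```python
import random
import statistics
from itertools import combinations


def transitivity(n, edges):
    """Fraction of connected triplets that are closed (3 * triangles / triplets)."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    closed = 0    # triplets centred on a node whose two ends are linked
    triplets = 0  # all triplets centred on a node
    for node in range(n):
        k = len(adj[node])
        triplets += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(adj[node], 2) if b in adj[a])
    return closed / triplets if triplets else 0.0


def random_gnm(n, m, rng):
    """Random network with exactly n nodes and m links, chosen uniformly."""
    return rng.sample(list(combinations(range(n), 2)), m)


def z_score(n, edges, metric, n_random=200, seed=0):
    """Return (raw value, null mean, null std, Z-Score) for `metric`."""
    rng = random.Random(seed)
    raw = metric(n, edges)
    null = [metric(n, random_gnm(n, len(edges), rng)) for _ in range(n_random)]
    mu = statistics.mean(null)
    sigma = statistics.stdev(null)
    return raw, mu, sigma, (raw - mu) / sigma


# Toy network (assumption): ring lattice with links to 1st and 2nd neighbours.
n = 40
lattice = [(i, (i + 1) % n) for i in range(n)] + [(i, (i + 2) % n) for i in range(n)]

raw, mu, sigma, z = z_score(n, lattice, transitivity)
print(f"raw = {raw:.3f}, random = {mu:.3f} +/- {sigma:.3f}, Z-Score = {z:.1f}")
```

Replacing `random_gnm` with a generator that respects the constraints of the system under study (for instance, planarity or a minimum link length) changes the null distribution, and hence the Z-Score, without altering the rest of the procedure.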

In spite of the clear shortcomings associated with the use of an Erdős-Rényi model, no accepted alternative is available for transportation systems, and the topic is still a matter of debate in other scientific disciplines [

Since the release of the ground-breaking studies on complex networks and their properties, it has often been found that the failure of a small fraction of elements in these networks might lead to a cascade effect which, when related to critical infrastructure, would result in major disruptions in our society. A few examples of such extensive, wide-ranging network failures include large-scale power outages in the United States [

The robustness of a network is usually estimated based on the critical fraction of all nodes that, once removed, will cause a sudden disintegration [

An example of a network attack. The process starts with a target network (a), where we want to attack two nodes. In (b) we show an optimal disruption when being allowed to disrupt two nodes. The choice of

Panels: Original network (GC size=13); Optimal most-hazardous disruption (GC size=5); Normal disruption (GC size=7); Potentially harmless disruption (GC size=11).

While trying to quantify

These metrics (and similar ones) have been used in many existing studies in order to analyse the robustness of transportation networks. In Figure

Evolution of an attack generation on the tram network with DEG. The giant component is highlighted in red and bold. When attacking the 8th node, the size of the giant component is hardly changed.

Panels: Node 1 through Node 8.

Previous metrics are based on an initial estimation of node importance in the original network. Yet, throughout the dismantling process, the roles of nodes in a network can change significantly. With the elimination of a (critical) node from the network, shortest paths between other nodes often change completely. Therefore, it is recommended to recompute a network metric throughout the dismantling process. In the literature, this process is referred to as interactive/dynamic attack generation. In Figure
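The difference between a static and an interactive attack can be sketched on a toy network. The example below compares a static degree ranking (DEG, computed once on the intact network) with an interactive betweenness attack (BETWI, recomputed after every removal). Robustness is summarised with the R value, taken here as the average relative size of the giant component over all removal steps; this definition, the toy network (a seven-node backbone with three leaves at each end), and all function names are assumptions made for illustration.

```python
from collections import deque


def giant_component_size(adj, alive):
    """Size of the largest connected component among the `alive` nodes."""
    best, seen = 0, set()
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def betweenness(adj, alive):
    """Brandes' algorithm (unweighted), restricted to the `alive` nodes."""
    score = {v: 0.0 for v in alive}
    for s in alive:
        order, preds = [], {v: [] for v in alive}
        sigma = {v: 0 for v in alive}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in alive:
                    continue
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in alive}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


def static_degree_attack(adj):
    """DEG: rank nodes once, by their degree in the intact network."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))


def interactive_betweenness_attack(adj):
    """BETWI: recompute betweenness after every removal (ties: lowest id)."""
    alive, order = set(adj), []
    while alive:
        score = betweenness(adj, alive)
        target = max(alive, key=lambda v: (score[v], -v))
        order.append(target)
        alive.remove(target)
    return order


def robustness(adj, order):
    """R: mean relative giant component size over all removal steps."""
    n, alive, total = len(adj), set(adj), 0
    for v in order:
        alive.remove(v)
        total += giant_component_size(adj, alive)
    return total / (n * n)


# Toy network (assumption): backbone 0-1-...-6, three leaves on nodes 0 and 6.
links = [(i, i + 1) for i in range(6)] + [(0, 7), (0, 8), (0, 9), (6, 10), (6, 11), (6, 12)]
adj = {v: set() for v in range(13)}
for u, v in links:
    adj[u].add(v)
    adj[v].add(u)

r_deg = robustness(adj, static_degree_attack(adj))
r_betwi = robustness(adj, interactive_betweenness_attack(adj))
print(f"R (static degree) = {r_deg:.3f}, R (interactive betweenness) = {r_betwi:.3f}")
```

In this toy case the static degree ranking first targets the two peripheral hubs, whereas the interactive betweenness attack starts by cutting the backbone in the middle and obtains a lower R value. The sketch makes no claim about which heuristic performs best on other networks.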

Evolution of an attack generation on the tram network with BETWI. The giant component is highlighted in red and bold. When attacking the 8th node, the size of the giant component is already reduced to less than 20% of the original size.

Panels: Node 1 through Node 8.

In Figure

Comparison of robustness curves for different attack heuristics on six transportation networks, as described in Appendix.

Panels: Airport network; Bus network; Light rail network; Logistics network; Subway network; Tram network.

Comparison of attack heuristics regarding R value (x-axis) and Run time (y-axis) on six transportation networks, as described in Appendix.

Panels: Airport network; Bus network; Light rail network; Logistics network; Subway network; Tram network.

Since the choice of a node sequence significantly affects the level of disruption to a network, it is common to distinguish two classes of disruptions: random failures and targeted attacks. While the former have no driving force controlling the node sequence (which is thus completely random), the latter are specifically tuned to create the maximum damage to a network.

Existing studies often conclude with statements that the network is rather resilient to random failures, but more vulnerable to targeted attacks. These claims can be found on all kinds of transportation networks, including air transportation [

The general conclusion of random failures being less hazardous than targeted attacks is inherent to the definition of both node orderings, given that targeted attacks are specifically designed for a network at hand. Otherwise, if a targeted attack, for instance, as induced by a specific network metric, is worse than a random failure strategy, this simply means that this metric does not represent the node importance very well for the specific network.
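To make the comparison concrete, the following sketch contrasts the R value of a targeted attack with the envelope (minimum, mean, and maximum) of the R values obtained from many random failure sequences. Several elements are assumptions not taken from the original analysis: the toy network is a tree grown by preferential attachment, the targeted attack is a static degree ranking, R is taken as the average relative size of the giant component over all removal steps, and the function names are hypothetical. The giant component is tracked by re-inserting nodes in reverse removal order with a union-find structure, which avoids recomputing components after every removal.

```python
import random
import statistics


def robustness(adj, order):
    """R: mean relative giant component size over all removal steps.

    Nodes are re-inserted in reverse removal order; the largest component
    seen before re-inserting a node is the giant component of the network
    once that node and all earlier targets have been removed.
    """
    n = len(order)
    parent, size = {}, {}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    total, largest = 0, 0
    for v in reversed(order):
        total += largest
        parent[v], size[v] = v, 1
        for u in adj[v]:
            if u in parent:  # neighbour already re-inserted
                ru, rv = find(u), find(v)
                if ru != rv:
                    if size[ru] < size[rv]:
                        ru, rv = rv, ru
                    parent[rv] = ru
                    size[ru] += size[rv]
        largest = max(largest, size[find(v)])
    return total / (n * n)


def preferential_tree(n, rng):
    """Toy generator: each new node links to an old one chosen by degree."""
    adj = {0: {1}, 1: {0}}
    endpoints = [0, 1]  # every node appears once per incident link
    for new in range(2, n):
        target = rng.choice(endpoints)
        adj[new] = {target}
        adj[target].add(new)
        endpoints += [new, target]
    return adj


rng = random.Random(42)
network = preferential_tree(200, rng)

# Targeted attack (assumption): static ranking by degree.
targeted_order = sorted(network, key=lambda v: (-len(network[v]), v))
targeted_r = robustness(network, targeted_order)

# Random failures: envelope over 100 random removal sequences.
random_rs = []
for _ in range(100):
    order = list(network)
    rng.shuffle(order)
    random_rs.append(robustness(network, order))
envelope = (min(random_rs), statistics.mean(random_rs), max(random_rs))

print(f"targeted R = {targeted_r:.3f}")
print("random R (min, mean, max) = " + ", ".join(f"{r:.3f}" for r in envelope))
```

That the targeted R value falls below the random average is expected by construction; the informative quantity is the size of the gap, which only becomes meaningful once it is compared against a suitable reference.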

The pure statement that a network is more vulnerable to targeted attacks does not provide real, novel insights. A more interesting question is

In Figure

Comparison of random attack envelopes and the best known attack BETWI on six transportation networks, as described in Appendix.

Panels: Airport network; Bus network; Light rail network; Logistics network; Subway network; Tram network.

Comparison of difference between targeted/random attacks in real networks and their random counterparts (each of the 50 networks with the same number of nodes and links). The blue dashed curve is the kernel density estimation of TargetRandomQuotient in the random networks and the red vertical line indicates the TargetRandomQuotient for the real network.

In this work we have revisited some common problems found in papers that apply complex network theory to the study of the topology of transportation systems, analysed their impact in terms of how they can mislead our understanding of the underlying system, and presented a set of solutions. Four specific topics have been covered:

One of the most important topological properties of a network is scale-freeness, i.e.

Beyond scale-freeness, the first step in the analysis of a complex network is usually its description through a series of topological metrics, i.e.

We would like to increase awareness of the fact that network metrics do not lead to optimal attacks. In fact, there is no single metric which always outranks all other metrics. Empirically, the interactive variant of betweenness is the best approach for analysing the robustness of a complex network. This high quality of attack sequence, however, comes at a price: computing betweenness requires computation time cubic in the number of nodes. For large networks, the run time becomes unacceptable. Therefore, we point out recent developments in network dismantling, a novel research direction specifically targeting the robustness analysis of networks. Several of these methods provide an interesting trade-off between quality and run time

Comparisons between random and targeted attacks have to be performed with care. By definition, a targeted attack is more disruptive than a random attack. The interesting case, however, is to explore this problem with respect to a reference random network with the same number of nodes and links. Analysing the obtained

It is the authors’ belief that these problems must be taken seriously by the scientific community for two main reasons. First of all, they introduce the risk of obtaining biased (if not totally wrong) results; this may, in the long term, reduce the credibility associated with complex network analysis and hence create a burden on future ideas. Secondly, the problems discussed here are neither old nor limited to second-tier journals. On the contrary, it is easy to find examples of papers published in this same year [

In spite of this, this work also has an important bright side, as there is much ground for hope. While the number of papers that have fallen into these pitfalls is indeed large, one can also find many examples of technically sound and statistically robust analyses; see, for instance, [

It was suggested that an exact scale-freeness is not an essential requirement for subsequent analyses, as the fundamental point is the presence of a long tail in the distribution of degrees. At the same time, we reported that theoretical dismantling strategies, developed on the scale-free model, may not efficiently work on real networks. One may thus ask what is the effect of not following a perfect scale-free distribution, or, in other words, what are the consequences of having real, as opposed to theoretical, networks

Metrics normalisation requires the development of suitable null models, able to create networks without any specific structure, but still constrained by the characteristics of the system under study. A completely random network may not be a good null model for the airport network, as very short flights make no economic sense. This has been partially solved in other scientific fields, for instance, on protein networks [

Most transportation studies on complex network robustness are performed on undirected, unweighted networks with unit costs for dismantling nodes/links. Clearly, all these assumptions are simplifications made to keep the computation feasible and to cope with the limited amount of available data. We foresee the need for a generalised transportation network robustness framework, which, given a variable set of data (passenger data, schedules, etc.), computes a realistic measure of robustness for a transportation system. While there exist a number of studies tailored specifically to regional transportation systems at a high level of detail, there is no agreement on a common model for transportation network robustness. Such a benchmark model would help to push our understanding of network robustness further and eventually improve our critical transportation infrastructure

As a final note, we would like to highlight that the caution one should devote to the previously discussed pitfalls should also be applied to avoid misleading generalisations. Any network method applied to a transportation problem is very much dependent on the available data and the problem at hand. Just as one should carefully investigate the applicability of previously published methods, rather than simply borrowing them from other disciplines, the solutions proposed here should similarly be judged according to the context. To illustrate, some theoretical models may require an exact scale-free distribution to yield meaningful results, and the characteristics of a null model should be consistent with (and adapted to) the system under analysis. In synthesis, it is important to keep in mind that “one size does not fit all”.

In order to better introduce and illustrate the pitfalls in transportation network analysis discussed in this work, a set of exemplary transportation networks is used as case studies. These networks cover a wide range of transportation modes, including air, bus, light rail, subway, and tram. The networks and their setup are described below.

Graphical representations of the six transportation networks considered in this study, including both node-based (airport and logistics) and link-based networks (bus, light rail, subway, and tram).

Panels: Airport network (N=1,808, E=11,191); Bus network (N=12,272, E=19,584); Light rail network (N=166, E=184); Logistics network (N=80, E=80); Subway network (N=163, E=165); Tram network (N=420, E=489).

The network data used to support the findings of this study are available from the corresponding author upon request. The airport network data is available at

The authors declare that there are no conflicts of interest regarding the publication of this paper.

This work is supported by the National Natural Science Foundation of China (Grants no. 61650110516, no. 61601013, no. 71731001, and no. 61521091).