A Comparative Study on Flight Delay Networks of the USA and China

Recent studies have characterized the structures of air transport networks in different countries and regions using complex network metrics. These studies coincided with the trend of increasingly available large empirical flight datasets that enable researchers to investigate the dynamics of the system, such as the propagation of flight delay. However, linking network structure with network dynamics remains a challenging task. In this paper, we propose a method to construct flight delay networks from operational data. We provide a detailed comparison of the key structural properties of the flight delay networks in the United States and China. The comparisons of betweenness centrality of delay networks and flight networks show the advantage of the proposed method. We further find that airports in similar geographical locations exhibit similar delay patterns in both countries. To explore the underlying mechanisms, Multifractal Detrended Fluctuation Analysis (MF-DFA) is applied to the flight delay time series at both the airport level and the network level. Singularity spectra analyses reveal the fundamental characteristics of the airport systems and the air transportation system. Our findings contribute to the understanding of the structure and dynamics of air transportation systems.


Introduction
Air traffic is an integral part of intermodal transportation in both developing and developed countries, playing a critical role in the global economy and our daily life. The last decade has witnessed the improvement of air transportation systems in safety, capacity, and efficiency. Great efforts have been made to enhance the performance of the system, including the introduction of new operational concepts, the deployment of advanced automation systems, and long-term research and development activities [1,2]. However, the growing demand in air transport continues to challenge the existing system. There is an urgent need to better understand the structure and dynamics of this complex social-technological system in order to guide future development and management. The understanding of the airport network structure has improved considerably over the past few years, as is evident from numerous studies that revealed important characteristics of airport networks in different countries and regions around the world [3][4][5][6]. Traditionally, the networks under study are constructed from flight data. Airports or cities are the nodes, while the edges are determined by the connectivity of direct flights between two airports or cities. Statistical techniques are employed to uncover the properties of the network. A growing corpus of empirical analyses has revealed the fundamental properties of airport networks in a region or worldwide, such as the small-world and scale-free properties, contributing to the understanding of the structure of human-made complex systems. The dynamics occurring on top of airport networks have drawn much attention as well. Great efforts have been devoted to the study of flight delays from the perspectives of network science and operations research [7][8][9]. These studies generally focus on how flight delays propagate within airline or airport networks. However, the structural information of the airport network has not yet been considered.
One unanswered question is: what can we gain if we analyze the air transportation system by combining information about both structure and dynamics? A series of recent studies on outlier detection have demonstrated the advantage of investigating airport networks while taking into account both network structure and dynamics [10,11]. The authors constructed the airport networks based on the correlations between airport delays. The abnormal days of operation can then be identified by analyzing the networks with graph signal processing techniques. In this paper, we propose a method to construct airport networks that incorporate flight delay information. We then compare the fundamental properties of airport networks and flight delays in the USA and China, taking an initial step towards the comparison of the two systems. The rest of the paper is organized as follows. In Section 2, we summarize related work on airport networks and delay propagation. The method is described in Section 3, and the results are presented in Section 4. Concluding remarks are given in Section 5.

Literature Review
2.1. Structural Properties of the Airport Network.
The first reports of airport network studies date back to the 2000s [12], when the study of complex networks was still an immature but active research area. These studies consider the air transportation system as a complex network, with the components abstracted into nodes and the mutual relations between the components abstracted into edges. The most prevalent approach to constructing an air transport network is based on flight data [12,13]. Traditionally, the nodes in the networks are the airports, while the edges are linked based on the existence of direct flights between airports. Commonly investigated structural properties of networks, for instance, the degree distribution and betweenness centrality, are examined [14,15]. Most airport networks exhibit features of small-world networks, such as a short average path length and a large clustering coefficient, as well as scale-free characteristics [16][17][18].
There are certain core airports with a large degree, while other airports tend to connect with these airports [19]. He et al. [20] studied the Chinese airport network in 2004 and concluded that the network is of small-world type without the scale-free property. This is due to the fact that the degree distribution of the nodes of the Chinese airport network is exponential rather than heavy-tailed. Li and Cai showed that, in a weekly cycle, the Chinese airport network exhibits scale-free properties, and the weekly cumulative degree distribution of nodes follows a power law [21]. Recently, there has also been work that considers the multilayer nature of air transport networks by decomposing airport networks into different airline networks [22][23][24]. Table 1 lists the studies of worldwide airport networks.
Another line of research has focused on the evolution of air transport networks [26][27][28][29]. Using worldwide flight schedules data between 1979 and 2007, Azzam et al. analyzed the evolution of the worldwide air transportation network and found that the degree distribution is nonstationary, growing at an accelerated rate [30]. A recent comparative study on the evolution of worldwide airport networks can be found in [24].

Flight Delay Propagation.
While the investigation on the structural properties of airport networks is extensive, few studies can be found to characterize the dynamical properties of air transport networks [7]. Flight delay is intrinsically dynamic and typically propagates in an air transportation network causing other delays over time.
Many studies on delay propagation use empirical data to investigate the cause of initial and primary delays [31]. The primary delay can trigger a cascade of secondary delays, which may spread over the airline networks and airport networks. A comprehensive study reported in [32] analyzed reactionary delays, which are caused by upstream delays, in the European airports using the data collected by the Central Office for Delay Analysis. The authors suggested that the airline network structure plays an important role in absorbing delays. Recent advances in quantifying delay propagation in the US air transportation system have shed light on the intrinsic mechanisms of delay generation from network perspectives [9,33]. In particular, Pyrgiotis et al. [9] developed an analytical queuing and network decomposition model, namely, the Approximate Network Delays model, to study the delay propagation in the US air transport network.
The existing studies of flight delay mostly focus on the impact of airline flight plans and ground transit time on the propagation of flight delays. The methods of analyzing flight delay propagation mainly employ Bayesian networks and simulation models. Wu and Law [34] proposed an airline network delay propagation model using a Bayesian network and identified weak links in a flight network. The Bayesian network-based method is suitable for microanalysis. However, it is not suitable for analyzing the overall delay propagation process of the aviation network as a whole. On the other hand, the simulation method also has its limitations due to the large number of interlinked subsystems. The emergent behavior of the whole system is difficult to capture.
Commonly used network metrics characterize different aspects of the network structure. However, one should be cautious when applying these structural measures to flight delay management. Relying solely on these topological measures may be insufficient to capture the underlying properties of the air transport system. As discussed in a recent report, none of the most common and well-known centrality metrics (degree centrality, Katz centrality, and PageRank) is able to characterize the effect of delays in the US air transport system [35]. A new centrality metric, Trip Centrality, is proposed by the authors to capture the network effect of delays. In contrast to their work, we focus on the method to construct airport networks and on the comparison of existing network metrics in the US and Chinese airport networks rather than on the development of new network measures.
The database details flight departure and landing events in every Chinese mainland airport, providing a comprehensive picture of air transport in China. Each flight record reports the flight number, execution date, scheduled and actual departure/arrival airports, scheduled and actual departure/arrival times, and the unique aircraft registration number (tail number). Such data allow us to readily reconstruct the flight path of each aircraft in the network.

Methods
This paper is based on the flight data recorded during the period from 1 August 2012 to 31 August 2013, including a total of 196 airports and 4,007,532 flights.
On the US side, domestic flight data were obtained from the Bureau of Transportation Statistics, United States Department of Transportation. Each flight record contains the same information as in its Chinese counterpart. In order to make a fair comparison, the same time period was used. An overview of the two datasets is presented in Table 2.
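A common definition of flight delay is the time difference between the real and scheduled operations (arrival or departure). In our database, the actual departure and arrival times of a flight are recorded when the aircraft takes off from the origin airport and lands at the destination airport, respectively. This study focuses only on flight $i$'s departure delay $T^{i}_{\mathrm{depdelay}}$ and arrival delay $T^{i}_{\mathrm{arrdelay}}$, which are calculated as
\[
T^{i}_{\mathrm{depdelay}} = t^{i}_{\mathrm{actdep}} - t^{i}_{\mathrm{schdep}}, \qquad
T^{i}_{\mathrm{arrdelay}} = t^{i}_{\mathrm{actarr}} - t^{i}_{\mathrm{scharr}},
\]
where $t^{i}_{\mathrm{actdep}}$ and $t^{i}_{\mathrm{schdep}}$ are the actual and scheduled departure times of flight $i$, and $t^{i}_{\mathrm{actarr}}$ and $t^{i}_{\mathrm{scharr}}$ are its actual and scheduled arrival times.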
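As a minimal sketch of the delay definition above, the following Python snippet computes the departure and arrival delay of a single flight as actual minus scheduled time. The record layout, the field names, and the timestamp format are assumptions made for illustration; they are not the schema of either dataset.

```python
from datetime import datetime


def delay_minutes(scheduled: str, actual: str, fmt: str = "%Y-%m-%d %H:%M") -> float:
    """Return actual minus scheduled time in minutes (negative means early)."""
    sched = datetime.strptime(scheduled, fmt)
    act = datetime.strptime(actual, fmt)
    return (act - sched).total_seconds() / 60.0


# Hypothetical flight record; the keys are illustrative only.
flight = {
    "sched_dep": "2012-08-01 08:00", "actual_dep": "2012-08-01 08:25",
    "sched_arr": "2012-08-01 10:10", "actual_arr": "2012-08-01 10:20",
}

dep_delay = delay_minutes(flight["sched_dep"], flight["actual_dep"])
arr_delay = delay_minutes(flight["sched_arr"], flight["actual_arr"])
print(dep_delay, arr_delay)  # 25.0 10.0
```

Keeping the sign of the difference means that early operations appear as negative delays; whether such values are kept or clipped to zero is a modelling choice that the text does not specify.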

Construction of the Airport Delay Network.
Conventionally, an airport network can be constructed from the flight data and represented as a directed and weighted graph $G(V, A)$, with each airport being represented as a node $v \in V$, and each flight forming a link $a \in A$ that connects the origin and destination airports. The network is directed and weighted because of the directions and the volume of flights or passengers in a given time period.
Here, we construct a weighted airport delay network in which airports are the nodes. A link between two airports is established if there is a direct flight between them. The weight of the edge $W_{ij}$ is defined from the delays of the flights in $F_{ij}$, where $F_{ij}$ is the set of flights departing from airport $i$ to airport $j$, $D^{k}_{i}$ is the departure delay of flight $k$, and $A^{k}_{j}$ is the associated arrival delay.
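The construction described above can be sketched in Python as follows. The exact expression for the edge weight is not reproduced in this sketch; as a loudly labelled placeholder, it uses the mean of departure delay plus arrival delay over the flights on each origin-destination pair. The function names and the tuple layout are likewise hypothetical.

```python
from collections import defaultdict


def build_delay_network(flights):
    """Aggregate flights into directed edge weights.

    flights: iterable of (origin, destination, dep_delay, arr_delay).
    ASSUMPTION: weight = mean of (dep_delay + arr_delay) over the flights
    on the edge. This is a stand-in, not the paper's weight formula.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for origin, dest, dep_delay, arr_delay in flights:
        totals[(origin, dest)] += dep_delay + arr_delay
        counts[(origin, dest)] += 1
    return {edge: totals[edge] / counts[edge] for edge in totals}


def node_weights(weights):
    """Return (in_weight, out_weight) dictionaries summed over incident edges."""
    in_w, out_w = defaultdict(float), defaultdict(float)
    for (i, j), w in weights.items():
        out_w[i] += w
        in_w[j] += w
    return dict(in_w), dict(out_w)


# Made-up flights between two airports, delays in minutes.
flights = [
    ("PEK", "CAN", 10.0, 20.0),
    ("PEK", "CAN", 30.0, 40.0),
    ("CAN", "PEK", 5.0, 15.0),
]
weights = build_delay_network(flights)
in_w, out_w = node_weights(weights)
print(weights[("PEK", "CAN")], in_w["CAN"], out_w["CAN"])  # 50.0 50.0 20.0
```

Swapping in a different aggregation only requires changing the line that accumulates `totals`; the node-level in-weights and out-weights are then derived in the same way.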

Correlations between Time Series Data.
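The Pearson correlation coefficient is used to capture the correlations between flight delays at different airports. To make the time series comparable, we use 15 minutes as the sampling interval to calculate departure flight delays at each airport. Let $X_i = \{X_i(t)\}$ represent the departure delay time series at airport $i$, and let $\bar{X}_i$ be its mean. To compute the correlation $\rho_{ij}$ between the $i$th airport and the $j$th airport, one can use the following equation:
\[
\rho_{ij} = \frac{\sum_{t} \bigl(X_i(t) - \bar{X}_i\bigr)\bigl(X_j(t) - \bar{X}_j\bigr)}{\sqrt{\sum_{t} \bigl(X_i(t) - \bar{X}_i\bigr)^{2}}\,\sqrt{\sum_{t} \bigl(X_j(t) - \bar{X}_j\bigr)^{2}}}, \tag{3}
\]
where $\rho_{ij} \in [-1, 1]$.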
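A minimal Python sketch of this step is given below: delays are first aggregated into 15-minute bins, and the Pearson coefficient is then computed between two binned series. Summing the delays within a bin is an assumption (the text does not state whether totals or averages are used), and the function names are illustrative.

```python
import math


def bin_delays(events, n_bins, bin_minutes=15):
    """Sum delays into fixed-width time bins.

    events: iterable of (minutes_since_start, delay_minutes).
    ASSUMPTION: delays falling in the same bin are summed.
    """
    series = [0.0] * n_bins
    for minute, delay in events:
        index = int(minute // bin_minutes)
        if 0 <= index < n_bins:
            series[index] += delay
    return series


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up departure events at two airports over 45 minutes.
airport_a = bin_delays([(0, 5), (14, 5), (15, 10), (44, 1)], n_bins=3)
airport_b = bin_delays([(3, 20), (20, 22), (40, 2)], n_bins=3)
rho = pearson(airport_a, airport_b)
print(airport_a, airport_b, round(rho, 3))
```

Applying `pearson` to every pair of airports yields the cross-correlation matrix; constant series (for example, airports with no recorded delay) need separate handling because the coefficient is undefined for them.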

Multifractal Detrended Fluctuation Analysis (MF-DFA).
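Peng et al. first proposed the Detrended Fluctuation Analysis (DFA) method to analyze the statistical self-affinity of a time series [36]. Later, the Multifractal Detrended Fluctuation Analysis (MF-DFA) method was developed by Kantelhardt et al. to analyze nonstationary time series and has been widely applied in different fields. For a given time series $X = \{x_k \mid k = 1, 2, \ldots, N\}$, we summarize the five steps of MF-DFA in the following.
Step 1. The sequence of summarized displacements (the profile) is defined as follows:
\[
Y(i) = \sum_{k=1}^{i} \bigl(x_k - \bar{x}\bigr), \qquad i = 1, 2, \ldots, N,
\]
where $\bar{x}$ is the mean of the series.
Step 2. We divide $Y(i)$ into $N_s = \mathrm{int}(N/s)$ nonoverlapping segments of length $s$. Then, this step is repeated starting from the opposite end. Therefore, $2N_s$ segments are obtained.
Step 3. For each segment $v$ ($v = 1, 2, \ldots, 2N_s$), we apply least squares with a $k$-order polynomial to fit all the $s$ data points:
\[
y_v(i) = a_0 + a_1 i + \cdots + a_k i^{k}, \qquad i = 1, 2, \ldots, s.
\]
Step 4. For each segment, the variance of the detrended profile is calculated. For $v = 1, 2, \ldots, N_s$,
\[
F^{2}(s, v) = \frac{1}{s} \sum_{i=1}^{s} \bigl\{ Y[(v-1)s + i] - y_v(i) \bigr\}^{2},
\]
and for $v = N_s + 1, \ldots, 2N_s$,
\[
F^{2}(s, v) = \frac{1}{s} \sum_{i=1}^{s} \bigl\{ Y[N - (v - N_s)s + i] - y_v(i) \bigr\}^{2}.
\]
Step 5. The fluctuation function $F_q(s)$ for a given real number $q \neq 0$ is determined as
\[
F_q(s) = \left\{ \frac{1}{2N_s} \sum_{v=1}^{2N_s} \bigl[ F^{2}(s, v) \bigr]^{q/2} \right\}^{1/q}.
\]
When $q = 0$, $F_0(s)$ is obtained as
\[
F_0(s) = \exp\left\{ \frac{1}{4N_s} \sum_{v=1}^{2N_s} \ln \bigl[ F^{2}(s, v) \bigr] \right\},
\]
where $F_q(s)$ is a function of data length $s$ and fractal order $q$.
If the series has long-range power-law correlations, $F_q(s)$ scales with $s$ as
\[
F_q(s) \sim s^{h(q)},
\]
where $h(q)$ is generally referred to as the generalized Hurst exponent. When $q = 2$, $F_2(s)$ becomes the standard DFA. $h(2)$ indicates whether time series $X$ has long-memory processes or is $1/f$ noise. Specifically, $h(2)$ is interpreted as follows: for $0 < h(2) < 0.5$, the series is long-range anticorrelated (antipersistent); for $h(2) = 0.5$, the series is uncorrelated and has no long memory; for $0.5 < h(2) < 1$, the series is long-range positively correlated (persistent); and for $h(2) = 1$, the series is $1/f$ noise.
The generalized Hurst exponent is related to the scaling exponent $\tau(q)$ by
\[
\tau(q) = q\,h(q) - 1. \tag{11}
\]
Based on the Legendre transformation, one can have
\[
\alpha = \frac{d\tau(q)}{dq}, \qquad f(\alpha) = q\alpha - \tau(q). \tag{12}
\]
Here, $\alpha$ is referred to as the singularity strength or Hölder exponent. $f(\alpha)$ is the spectrum of singularities, which measures the dimension of the subset of the time series characterized by $\alpha$. With equations (11) and (12), we have
\[
\alpha = h(q) + q\,\frac{dh(q)}{dq}, \qquad f(\alpha) = q\bigl[\alpha - h(q)\bigr] + 1.
\]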
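The MF-DFA steps above can be sketched in pure Python for the first-order case (linear detrending). This is an illustrative implementation under stated assumptions, not the code used in the study: the function names, the choice of scales, and the white-noise test series are all hypothetical.

```python
import math
import random


def profile(x):
    """Step 1: cumulative sum of the mean-removed series."""
    mean = sum(x) / len(x)
    total, out = 0.0, []
    for value in x:
        total += value - mean
        out.append(total)
    return out


def detrended_variance(segment):
    """Steps 3-4 with a first-order fit: mean squared residual of a line fit."""
    s = len(segment)
    t_mean = (s - 1) / 2.0
    y_mean = sum(segment) / s
    s_tt = sum((t - t_mean) ** 2 for t in range(s))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    slope = s_ty / s_tt
    return sum((y - y_mean - slope * (t - t_mean)) ** 2
               for t, y in enumerate(segment)) / s


def fluctuation(x, s, q):
    """Steps 2-5: q-th order fluctuation function F_q(s); requires s >= 3."""
    y = profile(x)
    n = len(y)
    ns = n // s
    if s < 3 or ns == 0:
        raise ValueError("need 3 <= s <= len(x)")
    # Segments from the start, then the same number from the opposite end.
    segments = [y[v * s:(v + 1) * s] for v in range(ns)]
    segments += [y[n - (v + 1) * s:n - v * s] for v in range(ns)]
    f2 = [detrended_variance(seg) for seg in segments]
    if q == 0:
        return math.exp(sum(math.log(f) for f in f2) / (4 * ns))
    return (sum(f ** (q / 2.0) for f in f2) / (2 * ns)) ** (1.0 / q)


def hurst_exponent(x, q, scales):
    """h(q): least-squares slope of log F_q(s) against log s."""
    log_s = [math.log(s) for s in scales]
    log_f = [math.log(fluctuation(x, s, q)) for s in scales]
    ms, mf = sum(log_s) / len(log_s), sum(log_f) / len(log_f)
    return (sum((a - ms) * (b - mf) for a, b in zip(log_s, log_f))
            / sum((a - ms) ** 2 for a in log_s))


# Synthetic white noise: h(2) is expected to be close to 0.5.
random.seed(42)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
print(hurst_exponent(noise, 2, [16, 32, 64, 128, 256]))
```

For real delay series, the range of scales should be chosen so that every segment contains enough points and no segment has zero detrended variance, since the logarithm and the negative powers used for q ≤ 0 are undefined in that case.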

Comparison of Airport Network Structures.
We first construct the Chinese and US airport networks from flight delay data based on the method proposed in Section 3. These networks are referred to as "Delay Networks." Statistical results on the two flight delay networks are shown in Table 3.
As can be seen, these two networks have significant small-world characteristics. We plot the cumulative degree distributions of the two networks on a double logarithmic graph (Figure 1(a)). It can be clearly seen that both cumulative degree distributions have two affine components, with the transition point being k = 54 (China) and k = 49 (USA), respectively. In other words, the degree distributions follow double power-law distributions or truncated power-law distributions. Therefore, the delay networks have scale-free characteristics and the majority of the nodes have low degrees. A few airports have relatively large degrees, which suggests that they may play dominant roles in the air transport systems. Figure 1(b) depicts the relationship between the in-weight and out-weight of the nodes. The in-weight of a node represents the flight delay that occurred before arriving at this airport, while the out-weight of the node is the delay after departing from this airport. The data points for both networks suggest a linear relationship between in-weight and out-weight, with R-square values of 0.975 (China) and 0.938 (USA), respectively, obtained by least-squares fitting. The slopes of the best-fitting lines are 1.017 (China) and 0.725 (USA). In contrast to the Chinese air transportation system, the in-weights of the US airports are significantly larger than the out-weights, which suggests that the large airports in the USA have the capability of absorbing flight delays. Therefore, the outbound flight delay of these airports is less than the inbound flight delay.
To study the relationship between neighboring nodes in the network, we consider the assortativity of the flight delay networks. The assortativity of the Chinese network is −0.422 and it is −0.497 for the US network. Both networks are disassortative, meaning that the nodes with large degrees tend to connect to the nodes with small degrees. Because of the hub-and-spoke structure, the flights of nonhub airports need to transit at the hub airports, causing flight delays to occur mainly at the hub airports.
We calculate the rich-club coefficient to uncover the cores of the network. Figure 1(c) plots the relationship between the rich-club coefficient and the standardized parameter r/n (the percentage of the richest nodes) for the flight delay networks of China and the USA, where n is the size of the network and r is the number of the richest nodes. As we can see, the log of the rich-club coefficient increases linearly as the log of the standardized parameter r/n decreases, indicating a power-law relationship. The top 15 richest nodes of China's network constitute a fully connected graph, while the top 16 richest nodes in the US network also form a fully connected graph. A small number of rich nodes thus constitute a rich club and become the core of the airport delay network. These airports are shown in Figures 2(a) and 2(b). As can be seen, the core airports in the delay network are mainly distributed in the central and eastern regions of China. Due to the huge amount of traffic, these regions exhibit larger flight delays.
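The rich-club analysis described above can be illustrated with a short Python sketch. It assumes the common definition of the rich-club coefficient as the fraction of possible links that actually exist among the r highest-degree nodes of the undirected network; the function name and the toy graph are hypothetical.

```python
def rich_club_coefficient(edges, r):
    """Link density among the r highest-degree nodes of an undirected graph.

    edges: iterable of (u, v) pairs; self-loops and duplicates are ignored.
    Ties in degree are broken by node label so the result is deterministic.
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    if r < 2 or r > len(neighbors):
        raise ValueError("r must be between 2 and the number of nodes")
    ranked = sorted(neighbors, key=lambda node: (-len(neighbors[node]), node))
    club = set(ranked[:r])
    # Each link inside the club is seen from both endpoints, hence // 2.
    links = sum(1 for u in club for v in neighbors[u] if v in club) // 2
    return 2.0 * links / (r * (r - 1))


# Toy network: a fully connected core (A, B, C), each with one spoke.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("A", "D"), ("B", "E"), ("C", "F")]
print(rich_club_coefficient(edges, 3))  # 1.0
print(rich_club_coefficient(edges, 4))
```

A coefficient of 1 for the top r nodes corresponds to the fully connected cores reported for the two delay networks.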
To compare the differences between traditional airport network studies and our work, we construct airport networks using the most commonly used method. More specifically, the edges between airports are linked by the direct flights, while the weight of the edge is the number of flights between two airports. We denote this network as "Flight Network." In fact, there is no difference between the flight network and the delay network if only topological properties, such as degree distributions and rich clubs, are considered. However, the measures of weighted networks can reveal interesting information that is embedded in the network. Here we examine the betweenness centrality of these two networks. The betweenness centrality $c_B(v)$ of a node $v$ is defined as the sum of the fractions of all shortest paths in the network which pass through $v$:
\[
c_B(v) = \sum_{s, e \in V} \frac{\sigma(s, e \mid v)}{\sigma(s, e)},
\]
where $\sigma(s, e)$ is the total number of shortest paths between $(s, e)$ and $\sigma(s, e \mid v)$ is the number of those paths that pass through node $v$. Figure 3 plots the relationship between the betweenness centrality and the degree of airports in the two networks. The focus has been given to the analysis of the top 30 airports in the two airport systems (i.e., the FAA core 30 airports in the USA and the top 30 airports in terms of takeoffs and landings in China). It was previously reported that nodes with large degree normally have large betweenness. Airports such as Beijing Capital International airport (PEK), Atlanta airport (ATL), O'Hare International airport (ORD), and Dallas Fort Worth airport (DFW) manifest this pattern [37]. Our results also demonstrate this relationship, as shown in the far right of Figure 3. These airports play an important transportation role in the networks, as they are also ranked at the top of the air transport system. Figure 3 reveals that there are only slight differences in the betweenness centrality as obtained from the flight network and the delay network of the US, as most of the blue squares and black circles overlap. One obvious outlier is MSP (Minneapolis Saint Paul International Airport). MSP has much higher betweenness centrality in the delay network than in the flight network. In contrast, the difference in the betweenness centrality between the flight and delay networks of China is very obvious. A few airports with medium degree have significantly higher betweenness centrality in the delay network. The betweenness centrality measures the importance of the node in terms of information/traffic control in the network. A higher betweenness centrality would suggest a stronger ability to control delay propagation over the network. One would expect that a node with higher betweenness centrality in the flight network should have higher betweenness centrality in the delay network.
The hypothesis is that an airport with more flights has more ability to control delay. As we can see from Figure 3, the US airport network does support this hypothesis. However, the nine airports standing out in the Chinese airport network deserve further investigation. All these nine airports are the hub airports located in the capital cities of different provinces (except TSN). Most of these airports have lower ranks in the national air transport system (XIY: 7, HGH: 9, NKG: 11, TSN: 17, URC: 18, HRB: 22, SHE: 23, TNA: 25, and LHW: 27). The higher betweenness centrality of these airports in the delay network, however, indicates that they may have a significant influence on delay propagation.
To explain why Chinese airports with higher betweenness centrality emerged in the delay network, we examine two factors: the geographical characteristics of the airports and their operational characteristics. First, the nine airports are widely distributed in mainland China, as shown in Figure 2. Among them, URC and HRB are close to the northern border of the country, while XIY and LHW are the major transfer hubs in the center of the mainland. Although none of these airports are the main bases of the major Chinese airlines, they serve as hubs for several airlines in the region. Flight delay can be quickly propagated in the network through them if there is serious flight delay at the connected airports. The other four airports (TSN, HGH, SHE, and NKG) have already been reported in a previous study [38]. Simulations have shown that their role in measuring the resilience of the airport network is underestimated if only structural metrics are considered. Again, the difference in betweenness centrality between the two network types suggests that analyzing the network with network dynamics information incorporated can offer insight into the fundamental nature of the complex system.
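To illustrate how edge weights can change betweenness centrality even when the topology is identical, the following Python sketch implements a Brandes-style accumulation with Dijkstra searches. It assumes that each edge carries a positive length that is summed along a path; how the flight counts or delay weights of the two networks are mapped to such lengths is not specified here and is left as an assumption. The function names and the three-node example are hypothetical.

```python
import heapq
from collections import defaultdict


def betweenness_centrality(graph):
    """Unnormalised betweenness over ordered node pairs.

    graph: {u: {v: length}} with strictly positive lengths (ASSUMPTION:
    weights have already been converted to path lengths). Lengths are
    compared exactly, so ties are only detected for exactly equal sums.
    """
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    centrality = {v: 0.0 for v in nodes}
    for s in nodes:
        dist = {s: 0.0}
        sigma = defaultdict(float)   # number of shortest paths from s
        sigma[s] = 1.0
        preds = defaultdict(list)    # predecessors on shortest paths
        order, done = [], set()
        heap = [(0.0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue             # stale heap entry
            done.add(v)
            order.append(v)
            for w, length in graph.get(v, {}).items():
                nd = d + length
                if w not in dist or nd < dist[w]:
                    dist[w] = nd
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == dist[w]:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    return centrality


def undirected(edges):
    """Build a symmetric adjacency dict from (u, v, length) triples."""
    graph = defaultdict(dict)
    for u, v, length in edges:
        graph[u][v] = length
        graph[v][u] = length
    return dict(graph)


# Same topology, different lengths on the A-C edge.
detour = undirected([("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 5.0)])
direct = undirected([("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 1.0)])
print(betweenness_centrality(detour)["B"])  # 2.0
print(betweenness_centrality(direct)["B"])  # 0.0
```

In the example, node B lies on every shortest path between A and C only when the direct A-C edge is long, which mirrors how an airport can gain betweenness in one weighted network but not in another with the same links.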

Correlation of Flight Delay between Airports.
To study the correlation of the flight delays in the airport network, we calculate the cross-correlation matrix C of airport delay time series data according to equation (3). It is very interesting to find that the most correlated airports in both countries have quite similar characteristics. Figure 4 plots the most correlated airport pairs in the two air transport systems.
Newark airport (EWR), LaGuardia airport (LGA), and Kennedy airport (JFK) are all located in the New York metropolitan area, while Boston Logan airport (BOS) and Philadelphia airport (PHL) are geographically close and have many flights to these three airports. Due to similar operational environments and geographical locations, flight delays at these airports also show similar characteristics. Likewise, Baltimore airport (BWI) and Washington National airport (DCA) are located in the Washington area, and Fort Lauderdale airport (FLL), Orlando airport (MCO), and Tampa airport (TPA) are located in Florida. The other six airports with the highest delay correlations are located in the west of the USA. In China, the correlation coefficients between Guangzhou (CAN) and Shenzhen (SZX), Shanghai Pudong (PVG) and Shanghai Hongqiao (SHA), and Guangzhou (CAN) and Shanghai Hongqiao (SHA) are much larger than those of the other airport pairs. The top airport pairs are generally airports in the same metropolitan area. It is very likely that they experience the same meteorological conditions or are subject to the same Traffic Management Initiatives because of their geographic proximity. Therefore, the correlation indicates that geographical location is an exogenous factor that has a significant impact on flight delays.

Analysis of Delay Detrended Fluctuation.
In order to further reveal the inherent patterns of flight delays pertaining to the temporal characteristics of flight delays, we adopted the standard MF-DFA to analyze the time series of flight delays.
Here, based on empirical data of mainland China and USA in 2012, we calculated the total flight delays by hour to obtain year-long delay time series for both regions.
In the first-order MF-DFA analysis, the value of q is in the range of [−10, 10]. Figures 5(a) [...] They have a long-range power-law relationship and are sensitive to the initial condition. Flight delay data are typical fractal time series data. The dynamics of flight delays are not a random process: past delays will affect future delays. Figure 6(a) shows that h(q) for both China and the USA is greater than 0.5 when q is less than 0, so small fluctuations have a strong positive correlation. The strength of the positive correlation of the US airports is greater than that of the Chinese airports. h(q) decreases gradually when q is greater than 0. However, its value is still greater than 0.5. This indicates that, even after a long time interval, the fluctuation of flight delays still has a positive autocorrelation. Flight delays will not change too much for a certain period. Figure 6(b) shows the multifractal spectra of the Chinese and US flight delays. From multifractal theory, we know that the singularity spectrum f(α) at the minimum value of α indicates the maximum fluctuation of the system. A smaller value of α denotes larger fluctuation of the system. It can be seen from Figure 6(b) that the US air transport system has larger fluctuation. Compared to the Chinese air transport system, where the value of α is around 0.68, the US air transport system has a smaller value: α ≈ 0.61. This result suggests that there is more delay fluctuation in the US air transport system. The range of singularities (Δα) measures the difference between the maximum and the minimum fluctuations of the system, which can be used to capture the strength of the multifractal characteristics of the system. Δα of the US air transport system is 0.41, while Δα for the Chinese air transport system is 0.25. Therefore, the air transport system in the USA has stronger multifractal characteristics than that in China. It also indicates that the air transport system in the USA is more capable of handling such fluctuation, that is, unexpected flight delays.
We further applied the MF-DFA approach to the flight delay time series data of all airports to show the multifractal characteristics of the airport system. Detailed results of the airports with the larger range of singularity are included in Table 4. Most of the airports in USA have the range over 0.5, while only 7 airports in China have the range over 0.5, meaning there are more airports in USA which must deal with unexpected flight delays.
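The singularity strength α, the spectrum f(α), and the range Δα discussed above can be obtained numerically from a sampled h(q) curve. The Python sketch below applies α = h(q) + q h′(q) and f(α) = q[α − h(q)] + 1 with finite differences; the function name and the linear h(q) used as input are made up for illustration and are not results from either dataset.

```python
def singularity_spectrum(qs, hs):
    """Return (alphas, fs) from h(q) sampled at increasing q values.

    The derivative h'(q) is approximated by central differences in the
    interior and one-sided differences at the two ends.
    """
    if len(qs) != len(hs) or len(qs) < 3:
        raise ValueError("need at least 3 matching (q, h) samples")
    n = len(qs)
    alphas, fs = [], []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dh_dq = (hs[hi] - hs[lo]) / (qs[hi] - qs[lo])
        alpha = hs[i] + qs[i] * dh_dq
        alphas.append(alpha)
        fs.append(qs[i] * (alpha - hs[i]) + 1.0)
    return alphas, fs


# Made-up generalized Hurst exponents that decrease linearly with q.
qs = [float(q) for q in range(-10, 11)]
hs = [0.8 - 0.02 * q for q in qs]
alphas, fs = singularity_spectrum(qs, hs)
delta_alpha = max(alphas) - min(alphas)
print(round(min(alphas), 3), round(max(alphas), 3), round(delta_alpha, 3))
```

A constant h(q) (a monofractal series) collapses the spectrum to a single point with Δα = 0, whereas a stronger dependence of h(q) on q widens the spectrum, which is the sense in which Δα measures the strength of multifractality.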

Conclusions and Discussions
Network science has been advancing our knowledge of complex systems. Various measures and techniques have been proposed to capture the fundamental properties of the systems. In the field of air transportation, airport networks have been investigated from different perspectives. The analytical results depend on how the network is constructed, but there has been little study of this dependency. In this paper, a quantitative method aimed at the comparison of the delay networks of the USA and China is introduced. Based on actual operational data, we propose a method to construct an airport network containing delay information. We carried out the comparison between the delay networks in China and the USA from the following three aspects. Firstly, we compared the structural properties of the flight delay networks in the USA and China. The degree distributions of these two networks follow double power-law distributions and have scale-free characteristics. Furthermore, we compared the betweenness centrality in the flight and delay networks. We found nine airports with higher betweenness centrality in the delay network, suggesting that these airports play an important role in delay propagation. Secondly, we calculated the correlations of flight delay between airports and found that geographical location is the external factor that has the greatest impact on flight delays. Finally, the flight delay time series data were analyzed using the MF-DFA method. We found that flight delays exhibit obvious fractal characteristics that cannot be described by the single fractal method. The air transport system in the USA seems to be more capable of handling fluctuation such as unexpected flight delays. The contribution of this work is twofold. From the theoretical perspective, we demonstrated that the construction of an airport network from delay data can provide new insights into the fundamental properties of the air transport system.
From a more practical point of view, our analytical results have identified several influential airports and general patterns in the air transport system. Systemwide managers, such as network managers, should focus on those airports that have higher betweenness centrality in the delay networks. Additionally, when an airport is experiencing serious flight delay, precautions should be taken at the airports whose delays are most strongly correlated with it. This paper provides novel insights into the network structure and dynamics. Many outstanding questions remain to be answered. It would be particularly important to explore the implications of network structures. A clearer understanding of air transportation systems requires continued investigation into the specifics of each subsystem by adopting a domain-specific perspective.

Data Availability
The US data used to support the findings of this study are publicly available on the website of the Bureau of Transportation Statistics (https://www.bts.gov). The Chinese data used to support the findings of this study are available from the Operation Monitor Center, Civil Aviation Administration of China, upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.