Data-Driven Modeling of Systemic Air Traffic Delay Propagation: An Epidemic Model Approach

To better understand the mechanism of air traffic delay propagation at the system level, an efficient modeling approach based on the epidemic model is developed for delay propagation in airport networks. The normal release rate (NRR) and average flight delay (AFD) are used to measure airport delay. Through fluctuation analysis of the average flight delay based on complex network theory, we find that the long-term dynamics of airport delay are dominated by the propagation factor (PF), which indicates that these dynamics should be studied from the perspective of propagation. An integrated airport-based Susceptible-Infected-Recovered-Susceptible (ASIRS) epidemic model for air traffic delay propagation is developed from the network-level perspective to create a simulator that reproduces delay propagation in airport networks. The evolution of airport delay propagation is obtained by analyzing the phase trajectory of the model. The simulator is run on empirical data from China. The simulation results show that the model reproduces the long-term evolution of delay propagation and that its short-term accuracy in predicting the number of delayed airports is much higher than that of the probabilistic prediction method. The model can thus serve managers as a tool for predicting the temporal and spatial evolution of air traffic delay.


Introduction
Flight delays are one of the most important performance indicators of the air transportation system. They have become increasingly serious and directly damage the quality of civil aviation services through reduced operational safety, higher operating costs, and greater environmental pollution. Commercial aviation players define delay as the difference between the scheduled and actual times of departing or arriving flights [1]. According to the Federal Aviation Administration (FAA), a flight can be considered delayed if the operation takes place at least 15 minutes after schedule [2]. The delay of an individual flight seems random at first glance. However, a flight delay may be transferred and amplified by subsequent operations. Some delays that originate from upstream flights spread to downstream flights, which is particularly evident when an aircraft flies multiple flight legs.
This phenomenon is defined as delay propagation (DP) [3][4][5][6]. DP causes delays to obey certain statistical laws [7] when long-term delay records for a large number of flights are taken into account, and these delays form airport delay propagation. Consequently, a congested airport may propagate delays to connecting airports through delayed flights, which eventually affects the performance of a significant part of the entire network [8]. The delay can be magnified when it is examined in multiairport networks [9].
There are many factors affecting airport delays. They can be divided into two categories: propagation factors (PF), that is, airport delays caused by connected delayed airports, and nonpropagation factors (NPF), that is, airport delays caused by original factors such as extreme weather and equipment trouble [10]. As the number of flights increases, more and more airport delays are caused by the PF [11]. If there are many delayed flights at one airport, the connected airports may become delayed, which can affect further operations in a cascade-like effect. Because of the complexity of air transportation, the mechanisms of delay propagation at the airport level are not fully understood. Therefore, research on the mechanism of delay propagation is timely yet challenging.
Delay spreading has received considerable attention from the Air Traffic Management (ATM) community during the last decade. Some studies [12][13][14] have established flight delay propagation models based on Bayesian networks and analyzed the internal factors influencing air traffic delay propagation. Pablo Fleurquin [15] introduced an agent-based model that reproduces the delay propagation patterns observed in U.S. performance data and identified passenger and crew connectivity as the most relevant internal factor contributing to delay spreading. Qiu [16] constructed a joint distribution of continuous flight delays by using the 2D copula function. Wong and Tsai [17] established a Cox proportional hazard model for flight delay propagation based on survival analysis theory. Kalfe and Zou [3] proposed a delay propagation pattern based on an econometric method and analyzed the contributing factors by using the Heckman two-step method.
To the best of our knowledge, gaps still remain in the understanding of delay propagation in airport networks. The process of delay propagation needs to be analyzed from a broader, network-based perspective because flight scheduling for airlines and airport operations is increasingly synchronized from the perspective of network operation. The links in the airport network are the direct operations by the airlines connecting the airports [18], which is an essential feature of the network structure of the air traffic system. Therefore, the propagation dynamics cannot be understood without referring to the underlying complex network structure. Some scholars have used complex network theory to characterize transportation [19][20][21][22]. The initial studies [23,24] identified a high heterogeneity in the traffic sustained by each edge. Most airport networks exhibit a heavy-tailed degree distribution, which is often well approximated over a significant range of degree k by a power law (P(k) ∼ k^(−c)), from which the name "scale-free network" originated [25][26][27][28][29]. Further studies have used complex network characteristics to explain the propagation of air traffic congestion and flight delay [29][30][31][32][33][34]. From a macro perspective, airport delay is usually caused by air traffic congestion. Thus, the application of complex network theory to air traffic problems is feasible. However, most of these studies focused on delay propagation between sequential flights from the perspective of single-airport operation.
As mentioned above, because of the large number of airports and their complex interactions, the features of delay propagation cannot be understood from the information of an individual airport. Complex network theory and its associated metrics and tools offer a suitable approach to studying the air transport system beyond what is offered by classical techniques. Since existing work has largely taken the single-airport view, we propose in this paper a network-based approach to modeling airport delay propagation.
There are several classical epidemic models in complex network theory, such as the Susceptible-Infected (SI) model, the Susceptible-Infected-Susceptible (SIS) model, the Susceptible-Infected-Recovered (SIR) model, and the Susceptible-Infected-Recovered-Susceptible (SIRS) model. Because the propagation mechanism of SIRS is the most similar to that of airport delay propagation (the details are given in Section 4.1), the SIRS model is used in this paper to understand the process of air traffic delay propagation in the context of an airport network and to explain the spreading characteristics between different airports. The SIRS model has commonly been used to simulate how diseases [35], safety risks [36], or computer viruses [37] spread.
To the best of the authors' knowledge, this study is the first to apply the SIRS model to air traffic delay propagation. First, the metrics for measuring airport delays are introduced. Then, the fluctuation of airport delay is studied at different time scales based on complex network theory in order to determine whether the PF or the NPF dominates. Next, an integrated airport-based SIRS (ASIRS) model is developed. Finally, the effectiveness and accuracy of this model are demonstrated using empirical data from China.
The remainder of this paper is organized as follows. The data sources and a measurement of airport delay are provided in Section 2. Section 3 is devoted to fluctuations of the average flight delay and the determination of airport delay status. The ASIRS model is established in Section 4. The data-driven description of the ASIRS model is presented in Section 5. Finally, Section 6 contains the conclusions.

Data Description and Airport Delay Measurement
The dataset analyzed in this paper is provided by the Civil Aviation Administration of China (CAAC) and consists of all flight information of China from June to December 2015. There are 93630 flight records in total. The number of carriers in the data is 295. The information from the historical flight data consists of flight ID, date of flight, actual and scheduled departure (arrival) times (Beijing Time), origin, and destination. A short sample of the original database is shown in Table 1.
There are 205 domestic airports in the database. A sample of airports is shown in Table 2.
The identification of the airport delay state is the first problem to be solved. The delay state of an airport is the aggregate expression of individual flight delays. Thus, the airport delay is measured by the delays of arriving and departing flights.
In our study, the normal release rate (NRR) and average flight delay (AFD) are used to measure airport delay:
(1) The NRR of an airport is the ratio of the number of normally released flights to the total number of departure flights. According to the Federal Aviation Administration (FAA), a flight is considered abnormal if the departure operation takes place more than 15 minutes after schedule:
$$\mathrm{NRR} = \frac{N - n}{N},$$
where N represents the number of departure flights and n represents the number of abnormal flights.
(2) The AFD of an airport is the ratio of the total delay time to the total number of departure and arrival flights of the airport:
$$\mathrm{AFD} = \frac{\sum_{i=1}^{N+M} d_i}{N + M},$$
where M represents the number of arrival flights and d_i represents the delay of flight i.
The NRR and AFD are computed for every 1-hour interval using the above database. Partial statistical results are shown in Tables 3 and 4.
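The two metrics can be illustrated with a minimal Python sketch for one airport in one hourly bin. The function name `airport_metrics`, the list-based input, and the sample delays are hypothetical; whether early flights enter the AFD with negative delay is not stated in the text, so the delays are simply used as given.

```python
DELAY_THRESHOLD_MIN = 15  # a departure more than 15 min late counts as abnormal

def airport_metrics(departure_delays, arrival_delays):
    """Return (NRR, AFD) for one airport in one hourly bin.

    Both arguments are lists of delays in minutes (hypothetical input format).
    A metric is None when it has no flights to be computed from.
    """
    n_dep = len(departure_delays)   # N in the text
    n_arr = len(arrival_delays)     # M in the text
    abnormal = sum(1 for d in departure_delays if d > DELAY_THRESHOLD_MIN)  # n
    nrr = (n_dep - abnormal) / n_dep if n_dep else None
    total_flights = n_dep + n_arr
    total_delay = sum(departure_delays) + sum(arrival_delays)
    afd = total_delay / total_flights if total_flights else None
    return nrr, afd

# Made-up delays for illustration: four departures and two arrivals in one hour.
nrr, afd = airport_metrics([0, 5, 20, 40], [10, 25])
print(nrr, afd)
```

In this made-up hour, two of the four departures exceed the 15-minute threshold, so half of the departures are released normally.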

Fluctuations of Airport Delay and Determination of Airport Delay Status

Fluctuations of Airport Delay.
To explore the propagation laws of airport delay, the delay fluctuations should be studied from the perspective of airport networks. Fluctuations can be examined through the coupling between the average flux and its fluctuations, that is, a mean and standard deviation analysis, as developed in [38][39][40][41][42]. The standard deviation σ_i and the average flux ⟨f_i⟩ on individual nodes are found to obey a scaling law:
$$\sigma_i \sim \langle f_i \rangle^{\alpha},$$
where ⟨f_i⟩ and σ_i are the time average and the standard deviation of f_i^Δt(t), and f_i^Δt(t) denotes the flux of node i in the time interval [t, t + Δt]. As the strength of the external driving force increases, the value of α gradually increases.
A method to separate the internal dynamics from the external fluctuations of complex systems is also proposed in [38][39][40][41][42].
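The dynamical variable f_i(t) can be separated into two components:
$$f_i(t) = f_i^{\mathrm{ext}}(t) + f_i^{\mathrm{int}}(t),$$
where f_i^ext(t) is generated by external factors and f_i^int(t) is generated by internal factors. Following [38][39][40][41][42], they can be described as follows:
$$f_i^{\mathrm{ext}}(t) = A_i \sum_{j} f_j(t), \qquad A_i = \frac{\sum_{t} f_i(t)}{\sum_{t} \sum_{j} f_j(t)}, \qquad f_i^{\mathrm{int}}(t) = f_i(t) - f_i^{\mathrm{ext}}(t),$$
where the sums run over all nodes j and all time steps t. Furthermore, whether the fluctuations are mainly internally or externally imposed can be determined from the ratio of the standard deviations of the two components:
$$\eta_i = \frac{\sigma_i^{\mathrm{ext}}}{\sigma_i^{\mathrm{int}}}.$$
If η_i ≫ 1, the system dynamics are dominated by the network-wide factors, while for η_i ≪ 1, local dynamics overshadow the network-imposed changes.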
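The separation into external and internal components can be sketched in a few lines of Python. This is a minimal illustration, assuming the decomposition commonly used in the fluctuation literature (each node receives a fixed share A_i of the network-wide total at every time step); the function name `eta_ratios` and the toy flux matrix are hypothetical.

```python
from statistics import pstdev

def eta_ratios(flux):
    """flux[i][t] is the value (e.g., AFD) of node i at time step t.

    Returns eta_i = std(external part) / std(internal part) for every node.
    """
    n_steps = len(flux[0])
    total_per_step = [sum(row[t] for row in flux) for t in range(n_steps)]
    grand_total = sum(total_per_step)
    etas = []
    for row in flux:
        share = sum(row) / grand_total                        # A_i
        external = [share * tot for tot in total_per_step]    # f_i^ext(t)
        internal = [f - e for f, e in zip(row, external)]     # f_i^int(t)
        sigma_int = pstdev(internal)
        # A node that follows the network total exactly has no internal part.
        etas.append(pstdev(external) / sigma_int if sigma_int > 0 else float("inf"))
    return etas

# Toy data: two nodes, four time steps (made-up values).
etas = eta_ratios([[1.0, 2.0, 3.0, 6.0], [2.0, 1.0, 4.0, 5.0]])
print(etas)
```

Both toy nodes yield a ratio above 1, which in the terminology of the text would indicate dynamics dominated by network-wide factors.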
We aggregate the data and carry out the scaling law analysis at different time scales. In our study, ⟨f_i^Δt⟩ represents the AFD of airport i at the time scale Δt and σ_i^Δt represents its standard deviation. Figure 1 shows the relationship between ⟨f_i^Δt⟩ and σ_i^Δt for time scales Δt = 1 h, 3 h, and 6 h. The scaling law between σ and ⟨f⟩ can be clearly observed.
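One common way to estimate the scaling exponent α is an ordinary least-squares fit of log σ against log ⟨f⟩ across airports. The sketch below assumes that procedure, which is not specified in the text; the function name `scaling_exponent` and the synthetic data are hypothetical.

```python
import math

def scaling_exponent(means, stds):
    """Least-squares slope of log(std) versus log(mean); all inputs must be > 0."""
    xs = [math.log(m) for m in means]
    ys = [math.log(s) for s in stds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic check: data generated with sigma = 2 * <f>^0.7 should return 0.7.
synthetic_means = [1.0, 2.0, 4.0, 8.0]
synthetic_stds = [2.0 * m ** 0.7 for m in synthetic_means]
alpha_hat = scaling_exponent(synthetic_means, synthetic_stds)
print(round(alpha_hat, 6))
```

Repeating the fit for Δt = 1 h, 3 h, and 6 h would give one exponent per time scale, which is the comparison discussed next.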
It can be seen that the scaling exponent α increases with Δt, suggesting that the system may be subject to an inhomogeneous influence, as pointed out by Eisler and Kertesz [43]. The reason is that, when Δt is large, the fluctuations of AFD are due mainly to network-wide factors such as the PF: airport delays may be caused by connected delayed airports, because delays that originate from upstream flights spread to downstream flights, particularly when an aircraft flies multiple flight legs. When Δt is small, by contrast, the fluctuations of AFD are due mainly to local factors such as the NPF, for example, extreme weather and equipment trouble. Additionally, the ratio η_i is calculated using the above method. For the 1-hour interval, the average η_i is 5.665353, which shows that the hourly dynamics of airport delay are dominated by the PF. The larger the time scale, the larger the value of η_i: it is 6.431234 at 3 hours and 8.534778 at 6 hours. Thus, we have to study how delay originating at one airport propagates to other airports over a large time scale.

Determination of Airport Delay Status.
NRR and AFD are used to determine whether an airport is in the delay state. The specific criteria are based on two thresholds: the AFD is compared with a threshold ξ1 and the NRR with a threshold ξ2. To explore the characteristics of airport delay propagation, the time interval t is set to 1 hour, because the hourly dynamics of airport delay are dominated by the PF. When delay propagates in the airport network, the airport delay is usually severe, whereas the airport delay induced by the NPF is much smaller on the time scale of one hour. Thus, in order to eliminate the influence of the NPF, the threshold ξ1 should be relatively large and the threshold ξ2 should be relatively small. Here, ξ1 = 15 min and ξ2 = 0.7.
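A threshold test of this kind can be sketched as follows. The thresholds ξ1 = 15 min and ξ2 = 0.7 are taken from the text, but how the two comparisons are combined is an assumption: the sketch exposes a hypothetical `require_both` switch (AFD high and NRR low, versus either one) and uses non-strict inequalities, neither of which is confirmed by the source.

```python
XI_1_MIN = 15.0  # AFD threshold xi_1 from the text, in minutes
XI_2 = 0.7       # NRR threshold xi_2 from the text

def is_delayed(afd, nrr, require_both=True):
    """Classify one airport-hour as delayed or not.

    ASSUMPTION: the combination rule (and/or) and the non-strict
    inequalities are illustrative choices, not taken from the source.
    """
    afd_high = afd >= XI_1_MIN
    nrr_low = nrr <= XI_2
    return (afd_high and nrr_low) if require_both else (afd_high or nrr_low)

# Made-up airport-hours for illustration.
examples = [
    is_delayed(20.0, 0.6),                      # long delays, few normal releases
    is_delayed(20.0, 0.9),                      # long delays, but most flights normal
    is_delayed(20.0, 0.9, require_both=False),  # same hour under the "either" rule
    is_delayed(5.0, 0.95, require_both=False),  # healthy hour under either rule
]
print(examples)
```

The second and third examples show why the combination rule matters: the same airport-hour is classified differently under the two readings.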

Airport-Based Susceptible-Infected-Recovered-Susceptible Model.
From the above analysis of delay propagation, we find that the epidemic model in a complex network is a valuable research tool for exploring the fundamental laws and trends of delay propagation in airport networks. There are three kinds of individuals in the SIRS model: susceptible ones (S), infected ones (I), and recovered ones (R). The susceptible ones are currently in a healthy state, and when they come into contact with the source of the infection, they become infected with an infection rate α. The infected ones are unhealthy and can infect susceptible ones. The infected ones are cured with a cure rate β and become recovered ones, which are healthy and have immunity. The immunity disappears under certain conditions, and the recovered ones become susceptible again with an immunity-loss rate c. The infectious mechanism is described in Figure 2.
In an airport network, the original airport delay may be due to capacity reduction, airport equipment trouble, or extreme weather. In the process of delay propagation for resource-shared flights, delays are propagated from an upstream flight at the departure airport to the arrival airport. As shown in Figure 3, the airports with "delay root" represent the susceptible ones, the airports with "delay propagation" represent the infected ones, and the airports with "delay termination" represent the recovered ones. The propagation of airport delay has traditionally been described with graphs whose vertices represent airports and whose edges represent connectivity. When the delay is serious at one airport, the delay of its connected airports may increase because of delay spreading. Furthermore, the delay of the affected airports may be absorbed in subsequent operations, and they will not be influenced again by the same initial airport delay. However, they may be affected later by another original airport delay.
Because of the complexity of airport networks [32], the evolution of delay within them possesses the characteristics of propagation in complex networks.
From the above analysis, we find that the infectious mechanism of SIRS is similar to the propagation characteristics of airport delay discussed before. Suppose there are three kinds of airports in the network at time t: non-delayed airports (S), which are easily infected; delayed airports (I); and recovered airports (R), which used to be delayed but are back to normal. The recovered airports only have immunity to the current delay spread and may become susceptible later.
Because flight operations are stochastic, airports are affected with a certain probability.
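The dynamics of the ASIRS model can be written as
$$\frac{dS(t)}{dt} = -\alpha S(t) I(t) + c R(t), \qquad \frac{dI(t)}{dt} = \alpha S(t) I(t) - \beta I(t), \qquad \frac{dR(t)}{dt} = \beta I(t) - c R(t), \tag{7}$$
where S(t), I(t), and R(t) represent the fraction of susceptible airports, infected airports, and recovered airports, respectively, at time t; α is the infection rate; β is the cure rate; and c is the immunity-loss rate. Assume that the proportions of infected airports, susceptible airports, and recovered airports at the initial moment t_0 are I_0 (I_0 > 0), S_0 (S_0 > 0), and R_0 (R_0 = 0), respectively:
$$S(t_0) = S_0, \qquad I(t_0) = I_0, \qquad R(t_0) = R_0 = 0, \qquad S_0 + I_0 = 1.$$
The phase trajectory of the ASIRS model is analyzed. The S-I plane is called the phase plane, and the domain of the phase trajectory is (S, I) ∈ D:
$$D = \{(S, I) \mid S \ge 0,\ I \ge 0,\ S + I \le 1\}.$$
Let δ = β/α. Neglecting the immunity-loss term cR, a simplification under which the trajectory has a closed form, the following equation can be obtained:
$$I = (S_0 + I_0) - S + \delta \ln\frac{S}{S_0}. \tag{10}$$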
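For reference, the phase-trajectory relation can be derived in two steps. The sketch below assumes the standard SIRS rate equations (infection term αSI, cure term βI) and neglects the immunity-loss term cR, so it should be read as an approximation that is most accurate while R(t) is still small.

```latex
\begin{align}
  \frac{dI}{dS}
    &= \frac{\alpha S I - \beta I}{-\alpha S I}
     = -1 + \frac{\delta}{S},
     \qquad \delta = \frac{\beta}{\alpha}, \\
  \int_{I_0}^{I} dI'
    &= \int_{S_0}^{S} \left( -1 + \frac{\delta}{S'} \right) dS'
  \;\Longrightarrow\;
  I = (S_0 + I_0) - S + \delta \ln \frac{S}{S_0}.
\end{align}
```

Since dI/dS vanishes at S = δ, the infected fraction along this trajectory peaks exactly when the susceptible fraction equals δ.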

Journal of Advanced Transportation
The phase trajectory diagram is shown in Figure 4. When t → ∞, the limit values of I(t), S(t), and R(t) are I_∞, S_∞, and R_∞, respectively:
(1) Along the phase trajectory, no matter how the initial values S_0 and I_0 change, the airport delay situation will eventually disappear, I_∞ = 0.
(2) In equation (10), let I(t) = 0. The value of S_∞ can then be calculated as the root of equation (11) in the range (0, δ), that is, the abscissa of the intersection point between the phase trajectory and the horizontal axis in the range (0, δ):
$$S_0 + I_0 - S_\infty + \delta \ln\frac{S_\infty}{S_0} = 0. \tag{11}$$
(3) If S_0 > δ, I(t) increases first; when S(t) = δ, I(t) reaches its maximum and then decreases to zero. At the same time, S(t) decreases monotonically to S_∞:
$$I_m = S_0 + I_0 - \delta\left(1 + \ln\frac{S_0}{\delta}\right),$$
where I_m is the maximum of I.
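The limit value S_∞ has no closed form, but it can be found numerically. The following Python sketch assumes the logarithmic phase trajectory I = (S_0 + I_0) − S + δ ln(S/S_0), which holds when the immunity-loss term is neglected, and locates its root in (0, δ) by bisection. The function names and the initial values S_0 = 0.99, I_0 = 0.01 are hypothetical.

```python
import math

def infected_on_trajectory(s, s0, i0, delta):
    """Infected fraction on the phase trajectory at susceptible fraction s."""
    return s0 + i0 - s + delta * math.log(s / s0)

def final_susceptible(s0, i0, delta, iterations=200):
    """Root of the trajectory in (0, min(s0, delta)) found by bisection.

    The trajectory is increasing in s on (0, delta), positive at the upper
    end, and tends to minus infinity as s -> 0, so a single root exists.
    (For extremely small delta the root may underflow; not handled here.)
    """
    high = min(s0, delta)
    low = high
    while infected_on_trajectory(low, s0, i0, delta) > 0:
        low /= 2.0
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if infected_on_trajectory(mid, s0, i0, delta) > 0:
            high = mid   # root lies below mid
        else:
            low = mid    # root lies at or above mid
    return (low + high) / 2.0

def peak_infected(s0, i0, delta):
    """Maximum infected fraction; it occurs at S = delta when s0 > delta."""
    if s0 <= delta:
        return i0  # I(t) never grows
    return s0 + i0 - delta * (1.0 + math.log(s0 / delta))

# Illustrative values: delta = beta / alpha = 0.1 / 1.
s_inf = final_susceptible(0.99, 0.01, 0.1)
i_max = peak_infected(0.99, 0.01, 0.1)
print(s_inf, i_max)
```

With these illustrative values, roughly two-thirds of the airports are infected at the peak and almost no airport remains susceptible along the approximate trajectory.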
According to the above analysis, and noting that dI/dt = (αS(t) − β)I(t), the following conclusions can be drawn:
(1) If αS(t) > β, I(t) increases and the airport delay will spread to more airports.
(2) If αS(t) < β, I(t) decreases, the delay situation of the airport network will be alleviated, and the airport delay will not spread to others.
(3) If αS(t) = β, I(t) reaches its maximum and the delay situation of the airport network is the most serious.

4.2. Parameter Analysis.
As mentioned above, the ASIRS model contains three parameters: α, β, and c. To investigate the change of I(t), S(t), and R(t) over time, suppose α = 1, β = 0.1, and c = 0.1. The change of I(t), S(t), and R(t) is shown in Figure 5.
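Curves of this kind can be generated by integrating the rate equations numerically. The sketch below assumes the standard SIRS form (dS/dt = −αSI + cR, dI/dt = αSI − βI, dR/dt = βI − cR) and uses a hand-written fourth-order Runge-Kutta step; the initial fractions S_0 = 0.99 and I_0 = 0.01, the step size, and the function names are hypothetical choices for illustration.

```python
def sirs_derivatives(state, alpha, beta, c):
    """Right-hand side of the assumed SIRS equations for state = (S, I, R)."""
    s, i, r = state
    return (-alpha * s * i + c * r, alpha * s * i - beta * i, beta * i - c * r)

def rk4_step(state, dt, alpha, beta, c):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))
    k1 = sirs_derivatives(state, alpha, beta, c)
    k2 = sirs_derivatives(shifted(state, k1, dt / 2), alpha, beta, c)
    k3 = sirs_derivatives(shifted(state, k2, dt / 2), alpha, beta, c)
    k4 = sirs_derivatives(shifted(state, k3, dt), alpha, beta, c)
    return tuple(
        x + dt / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
        for x, q1, q2, q3, q4 in zip(state, k1, k2, k3, k4)
    )

def simulate(s0, i0, r0, alpha, beta, c, t_end, dt=0.01):
    """Return the list of (S, I, R) states from t = 0 to t = t_end."""
    state = (s0, i0, r0)
    trajectory = [state]
    for _ in range(round(t_end / dt)):
        state = rk4_step(state, dt, alpha, beta, c)
        trajectory.append(state)
    return trajectory

# Parameter values from the text; initial fractions are assumed.
trajectory = simulate(0.99, 0.01, 0.0, alpha=1.0, beta=0.1, c=0.1, t_end=200.0)
peak_i = max(i for _, i, _ in trajectory)
final_s, final_i, final_r = trajectory[-1]
print(peak_i, final_s, final_i, final_r)
```

Rerunning `simulate` with different values of α, β, or c gives the kind of parameter comparison discussed in the following subsections.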

Analysis of Parameter α.
First, the influence of α on airport delay propagation is discussed. Figure 6 shows the changes of I(t), S(t), and R(t) under different values of α with the assumptions that β = 0.1 and c = 0.1. As seen, the higher α is, the earlier I(t) and R(t) reach their peak values and the higher the peaks are. The higher α is, the earlier S(t) reaches its minimum and the lower the minimum is. Thus, it can be concluded that, as the speed at which airport delay propagates increases, more airports are infected more quickly, and more airports recover.

Analysis of Parameter β.
Figure 7 shows the changes of I(t), S(t), and R(t) under different values of β with the assumptions that α = 1 and c = 0.1. When the value of β becomes smaller, the peak of I(t) appears later, and so does the trough of S(t). The trends of R(t) for β = 0.5 and 1 differ from those for β = 0.1 and 0.2. Thus, the behavior of R(t) is much more complex than that of I(t) and S(t).
Figure 8 shows the changes of I(t), S(t), and R(t) under different values of c with the assumptions that α = 1 and β = 0.1. The higher c is, the earlier I(t) reaches its peak value and the higher the peak value is. The lower c is, the later S(t) reaches its minimum, the lower the minimum is, and the higher R(t) is. Parameter c has little effect on I(t) and S(t) in the early stage and mainly affects the later stage.

Statistical Calculation of I(t), S(t), and R(t).
According to Section 3.2, we calculate I(t), S(t), and R(t) based on the following criteria:
(1) If the airport satisfies the delay criteria of Section 3.2 at time t, the airport is infectious at time t.
(2) If the airport does not satisfy the delay criteria at time t but was infectious at time t − 1, the airport is recovered at time t.
(3) Apart from the above situations, the airport is susceptible and can be easily infected by an infectious airport.
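The labeling can be sketched as a small state machine over the hourly delay flags of each airport. The rule encoded below (infected while delayed, recovered in the first non-delayed hour after an infected hour, susceptible otherwise) is one reading of the criteria; the airport names `A1` and `A2`, the flag sequences, and the treatment of the first hour are hypothetical.

```python
def label_states(delayed_flags):
    """Map hourly booleans (delayed or not) to 'S', 'I', 'R' labels.

    ASSUMPTION: before the first hour the airport is treated as susceptible.
    """
    states = []
    previous = "S"
    for delayed in delayed_flags:
        if delayed:
            state = "I"
        elif previous == "I":
            state = "R"   # first normal hour right after an infected hour
        else:
            state = "S"
        states.append(state)
        previous = state
    return states

def state_fractions(states_by_airport, hour):
    """Fractions S(t), I(t), R(t) over all airports at the given hour index."""
    counts = {"S": 0, "I": 0, "R": 0}
    for states in states_by_airport.values():
        counts[states[hour]] += 1
    total = len(states_by_airport)
    return {label: count / total for label, count in counts.items()}

# Two hypothetical airports observed for five hours.
states_by_airport = {
    "A1": label_states([False, True, True, False, False]),
    "A2": label_states([True, False, False, True, True]),
}
fractions_hour_1 = state_fractions(states_by_airport, hour=1)
print(states_by_airport, fractions_hour_1)
```

Applying `state_fractions` to every hour of a day yields the empirical I(t), S(t), and R(t) series against which the model is calibrated.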

Determination of ASIRS Parameters.
The hourly states of all the airports in the network can be identified based on the above criteria using Tables 1 and 2. Taking airport ZBNY on October 1, 2015, as an example, Figure 9(a) shows its time-varying state. The airport is infected during 6:00-10:00 and 14:00-20:00, susceptible during 0:00-6:00, 11:00-14:00, and 21:00-24:00, and recovered during 10:00-11:00 and 20:00-21:00. We also investigate the air traffic flow of ZBNY, which is shown in Figure 9(b). Comparing Figures 9(a) and 9(b), we find that the larger the traffic flow is, the more the airport tends to be infected. It should be noted that, even when the airport handles only a small number of flights, the flight delay can be serious, which may be induced by delay propagation in the airport network.
In addition, we find that there are 7 infectious airports during 4:00-5:00: ZBAA, ZGSZ, ZUCK, ZGGG, ZSHC, ZSPD, and ZHCC. Six of these are among the top 10 airports by throughput in China (the throughput of airports is provided by the CAAC), as shown in Table 5. These six airports handle many flights. Large departure delays may influence the operations of the arrival airports, and flights will then arrive at or depart from those airports with large delays. Thus, the phenomenon of delay propagation appears.
The simulation model is the ASIRS model established in Section 4.1, as shown in equation (7). The simulation method is to use the real flight information of China to calculate the values of I(t), S(t), and R(t) and then calibrate the parameters α, β, and c. Thus, the ASIRS model for simulating the real flight data can be obtained.

Next, we calculate the values of I(t), S(t), and R(t). Partial statistical results are shown in Table 6. The parameters in the ASIRS model are calibrated against the statistical values of I(t), S(t), and R(t) by using the numerical simulation method. For each day's traffic situation, a well-fitting ASIRS model can be built. Taking October 1, 2015 (16804 scheduled flights connecting 199 different commercial airports), as an example, we find that the model fits the actual operation best when α = 0.47, β = 0.175, and c = 0.45.
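The calibration procedure is not detailed in the text. One simple possibility, sketched below, is a grid search that simulates the assumed SIRS equations for each candidate (α, β, c) and keeps the triple with the smallest squared error against the hourly observations. The Euler integrator, the candidate grids, and the synthetic "observations" are all hypothetical; real use would pass the empirical hourly (S, I, R) fractions instead.

```python
from itertools import product

def simulate_hourly(alpha, beta, c, s0, i0, r0, hours, steps_per_hour=100):
    """Euler integration of the assumed SIRS equations; one sample per hour."""
    dt = 1.0 / steps_per_hour
    s, i, r = s0, i0, r0
    samples = [(s, i, r)]
    for _ in range(hours):
        for _ in range(steps_per_hour):
            ds = -alpha * s * i + c * r
            di = alpha * s * i - beta * i
            dr = beta * i - c * r
            s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        samples.append((s, i, r))
    return samples

def squared_error(simulated, observed):
    """Sum of squared differences over all hours and all three fractions."""
    return sum(
        (a - b) ** 2
        for sim, obs in zip(simulated, observed)
        for a, b in zip(sim, obs)
    )

def calibrate(observed, alphas, betas, cs):
    """Return the (alpha, beta, c) triple on the grid with the smallest error."""
    s0, i0, r0 = observed[0]
    hours = len(observed) - 1
    best_error, best_params = None, None
    for alpha, beta, c in product(alphas, betas, cs):
        simulated = simulate_hourly(alpha, beta, c, s0, i0, r0, hours)
        error = squared_error(simulated, observed)
        if best_error is None or error < best_error:
            best_error, best_params = error, (alpha, beta, c)
    return best_params

# Synthetic observations generated from known parameters, for a self-check only.
observed = simulate_hourly(0.47, 0.175, 0.45, 0.9, 0.1, 0.0, hours=12)
best_params = calibrate(observed, [0.3, 0.47, 0.6], [0.1, 0.175, 0.3], [0.2, 0.45, 0.7])
print(best_params)
```

A finer grid or a gradient-free optimizer would refine the estimate; the grid search is shown only because it is easy to verify.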

Accuracy of the Long-Term Simulation of Delay Propagation.
The values of I(t), S(t), and R(t) for October 1, 2015, in Figure 10 are the actual values, which are counted from the database. The values of I(t), S(t), and R(t) in Figure 11 are the simulations of the ASIRS model for that day.
Comparing Figure 10 with Figure 11, it can be seen that the long-term trends of the actual and predicted values of I(t), S(t), and R(t) are similar, although there are some differences between local values. Thus, from a qualitative point of view, we can conclude that the ASIRS model is reasonable and can describe the real situation of airport delay propagation to a certain extent.
To further examine the application of the ASIRS model, we also study the characteristics of airport delay propagation derived from the degree distribution.
Next, the degree k and the degree distribution p(k) of the Chinese airport network are counted for two periods, a nonpeak period (00:00-07:00) and a peak period (12:00-19:00). The degree of airport A represents the number of airports with flights to or from A in the statistical time period. Thus, the same airport may have different degrees in different time periods. The statistics are shown in Table 7.
As seen in Table 7, the maximum airport degrees of the nonpeak period and the peak period are 92 and 275, respectively. Figure 12 shows the relationship between k and p(k) for the time periods 00:00-07:00 and 12:00-19:00. A scaling law between k and p(k) can be observed clearly.
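Counting period-specific degrees amounts to collecting, for each airport, the set of distinct airports it exchanges flights with inside the time window. The sketch below assumes that a flight is assigned to a window by its departure hour, which the text does not specify; the airport codes appear in the paper, but the flight list itself is made up.

```python
from collections import Counter, defaultdict

def airport_degrees(flights, start_hour, end_hour):
    """Degree of each airport from flights departing in [start_hour, end_hour).

    flights: iterable of (origin, destination, departure_hour) tuples.
    ASSUMPTION: a flight belongs to a period according to its departure hour.
    """
    neighbors = defaultdict(set)
    for origin, destination, departure_hour in flights:
        if start_hour <= departure_hour < end_hour:
            neighbors[origin].add(destination)
            neighbors[destination].add(origin)
    return {airport: len(linked) for airport, linked in neighbors.items()}

def degree_distribution(degrees):
    """Fraction p(k) of airports having each observed degree k."""
    counts = Counter(degrees.values())
    total = len(degrees)
    return {k: counts[k] / total for k in sorted(counts)}

# Made-up flights between real airport codes, for illustration only.
flights = [
    ("ZBAA", "ZGGG", 13),
    ("ZBAA", "ZGSZ", 14),
    ("ZGGG", "ZBAA", 15),
    ("ZUCK", "ZBAA", 18),
    ("ZGSZ", "ZUCK", 3),
]
peak_degrees = airport_degrees(flights, 12, 19)
nonpeak_degrees = airport_degrees(flights, 0, 7)
peak_distribution = degree_distribution(peak_degrees)
print(peak_degrees, nonpeak_degrees, peak_distribution)
```

Even in this toy example the same airport has a different degree in the two windows, which is the effect noted above.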
Figures 12(a) and 12(b) each show two segments that follow power laws. For 00:00-07:00,
$$p(k) \sim \begin{cases} k^{-1.1966}, & k \le 53, \\ k^{-0.4054}, & k > 53. \end{cases}$$
The degree distributions are strikingly different from those of random graphs, small-world networks, and scale-free networks. The degree distributions of the Chinese airport network display two segments and follow the Double Pareto Law. The degree of the critical airports is approximately 53 for 00:00-07:00 and 83 for 12:00-19:00. The smaller the exponent is, the stronger the heterogeneous characteristics of the network are [44]. The exponents for 12:00-19:00 are smaller than those for 00:00-07:00, which indicates that the heterogeneous characteristics of the Chinese airport network during 12:00-19:00 are stronger than those during 00:00-07:00. Thus, airport delays during 12:00-19:00 spread over the airport network more easily, quickly, and widely.
Figure 9: (a) Time-varying states of airport ZBNY. (b) Time-varying air traffic flow volume of airport ZBNY. In (a), "1" represents the infected state, "2" represents the susceptible state, and "3" represents the recovered state.
The results from the ASIRS model and the degree distribution are consistent, which further indicates that the ASIRS model is well suited for characterizing airport delay propagation.

Accuracy of the Short-Term Prediction of Delay Propagation.
As mentioned above, the ASIRS model can describe the characteristics of delay propagation in the long term, although its long-term forecast accuracy is limited (see Figures 10 and 11). How accurate is the ASIRS model in the short term?
To gauge the forecast accuracy, we introduce the probabilistic prediction method, which is common in delay forecasting [45], as a baseline. Taking the delay period (usually no more than 4 hours) on a typical day, we forecast the number of delayed airports by constructing an ASIRS model for every 4 hours. The prediction results are shown in Figure 13.
We use the Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) to compare the accuracy of the two methods:
$$\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} (A_t - F_t)^2}, \qquad \mathrm{MAPE} = \frac{100\%}{T} \sum_{t=1}^{T} \left| \frac{A_t - F_t}{A_t} \right|,$$
where A_t is the actual value in the time interval t, F_t is the forecast value in the time interval t, and T is the number of time intervals. Table 8 shows a comparison of the two methods. It can be seen that the MAPE and RMSE of the ASIRS model are smaller than those of the probabilistic model. Thus, the ASIRS model is more accurate than the model based on probability in forecasting the number of delayed airports in the short term.
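Both error measures are straightforward to compute. The sketch below uses the standard definitions of RMSE and MAPE; the actual and forecast counts of delayed airports are made-up numbers, not values from Table 8.

```python
import math

def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical hourly counts of delayed airports over a 4-hour window.
actual_counts = [10, 20, 40, 20]
forecast_counts = [12, 18, 40, 25]
rmse_value = rmse(actual_counts, forecast_counts)
mape_value = mape(actual_counts, forecast_counts)
print(rmse_value, mape_value)
```

Because MAPE divides by the actual value, intervals with no delayed airports would have to be excluded or handled separately.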

Conclusion
Understanding the process and evolution of airport delay propagation is very important for both air traffic management and aviation planning. In this study, we investigated the mechanism of delay propagation among airports from a new perspective:
(1) The delay fluctuations of airport networks are studied. To quantify the delay dynamics, we collected the airport delay at 199 Chinese airports and identified the existence of a scaling law, which indicates that the dynamics of airport delay are dominated by the propagation factor.
(2) The ASIRS model for airport delay propagation is presented to reveal the macroscopic appearance of delay propagation. The modeling approach is data driven in the sense that it is based on real performance data from China.
(3) The long-term characteristics of delay propagation are described by building the ASIRS model. The accuracy of the short-term prediction of delay propagation is also examined.
It is worth noting that airport delay results from the coupling of different factors and that the datasets contain no information on delay factors. We therefore cannot determine which factors cause a given delay. Thus, we study delay propagation from the overall delay data and simulate the overall delay without considering the specific factors.
Our ongoing work involves further calibration and validation of the ASIRS model. It would be interesting to compare the epidemic model of airport delay propagation across different countries and to investigate the practices of those countries. We expect such international comparisons to yield insights for mitigating airport delay.

Data Availability
The data used to support the findings of this study have not been made available because the data also form part of an ongoing study.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.