Comparative analysis on propagation effects of flight delays: a case study of China airlines
Journal of Advanced Transportation



Introduction
Flight delays are a persistent problem in airport management and flight scheduling, reducing the efficiency of air transport operations and complicating passengers' travel choices. A large number of passengers have difficulty choosing a reliable flight or airport. In Europe, more than 2.4 million flights are delayed or canceled each year due to various factors, such as weather, airlines, and air traffic control (ATC). In China, delays can be even more serious at some airports because of the increasing demand for air travel. According to FlightStats [1], among the 61 largest airports worldwide, seven Chinese airports rank at the bottom in terms of on-time performance (OTP) rates.
Although some airports and airlines have made management efforts to reduce possible delays, flight delays remain unavoidable at some airports. In reality, the multiple factors that affect flight delay are in many cases independent. Some factors relate to the departure delay, such as aircraft type, flight schedule, and flight departure sequence (Dai and Liou [2]). Others relate to external triggers, such as weather and airport capacity. A comprehensive overview of the potential factors that influence flight delay has been given by Xu et al. [3], where more than 50 potential factors were identified based on a detailed airport analysis. To find a suitable solution to flight delays, Liu et al. [4] presented an optimized ground delay program (GDP) strategy, in which operational efficiency, airline and flight equity, and ATC risks were taken into account. The simulation study showed that the proposed solution reduced the total delay time, the unnecessary ground delay, and the number of flights subject to unnecessary ground delay.
Beyond the direct effects of the main factors on flight delay, many delays are induced by the previous flight, where the delay may be propagated to the subsequent flights (Wu [5]; Hansen and Hsiao [6]). Vigneau [7] presented the propagation of sequential flight delays as the dependence between upstream arrival delays and downstream departure delays, which are caused by factors not related to air traffic control (ATC). Mott [8] presented a method for modeling air carrier departure delays by considering the correlation of a priori demand data to significantly reduce prediction error. The results demonstrated that the accuracy of the predicted delay time can be improved. Kafle and Zou [9] examined the degree of delay propagation by considering the new delays generated by each sequence of flights based on a joint discrete-continuous econometric model. They found that connected resources can significantly influence the initiation and progression of delay propagation.
Meanwhile, studies also found that delay time may be magnified when two subsequent flights are sharing the same aircraft (Kondo [10]). Shervin et al. [11] analyzed the slack between sequential flights in the planned schedule when delay occurs and showed how delay propagation can be reduced by redistributing the existing slack in the planning process. Moreover, the econometric analysis from Kafle and Zou [9] quantified how much propagated delay will be generated out of the newly formed delays that occur to each sequence of flights and revealed the effects of various influencing factors on the initiation and progression of propagated delays. Pyrgiotis et al. [12] proposed an AND (Approximate Network Delays) model based on a queuing engine and a delay propagation algorithm to study the complex phenomenon of delay propagation between airports. The model was applied to the network composed of 34 busy airports in the United States. The results show that, in some major airports, especially in hub airports, the propagated delay tends to put off traffic demand. Apart from that, Xu et al. [13] found that departure delays were the primary reason for the over-one-hour delay at the destination airports, and the taxiing delay was related to the previous delay. However, departure delay could be absorbed by scheduled turnaround time. When the cascading delay exceeds 30 minutes, more than 80% of flights may reduce their actual turnaround time.
In spite of the studies mentioned above, existing literature, which involves regression analysis, neural networks, Bayesian network models, and simulation methods, mostly analyzes the flight delays and delay propagation by focusing on specific flight legs. Studies on the relations between arrival delay of upstream flight and departure delay of its downstream flight, considering the total delay time of the entire flight legs, are rare. In order to fill this gap, this paper contributes to the analysis of propagation effects of flight delays by proposing a copula-based approach. In this regard, the correlation between sequential flight delays under the influence of different delay factors is explored. The magnitudes of delays on the subsequent flights are compared and possible scenarios are designed to examine the possible reduction of delay propagation by the adjustment and improvement of flight schedules.

Methodology
We consider flight delay propagation, that is, the correlation between arrival and departure delays that arises when the late arrival of one flight causes the late departure of the next flight on the itinerary of the same aircraft. We use the concept of propagated delay in line with the definition given by Lan et al. [14]: the delay induced by the delay of the prior flight. More specifically, we focus on the sequential flight delays $D_i$ and $A_{i-1}$, which are the departure delay of flight $i$ and the arrival delay of its upstream flight $i-1$, respectively.
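The pairing of an upstream arrival delay with the downstream departure delay of the same aircraft can be sketched as follows. This is a minimal illustration only: the record layout, the tail numbers, and the function name `sequential_delay_pairs` are hypothetical and are not taken from the airline data set used in this paper.

```python
from collections import defaultdict

# Hypothetical flight records (not the paper's data):
# (tail_number, scheduled_departure_min, departure_delay_min, arrival_delay_min)
flights = [
    ("B-1001", 480, 5, 40),
    ("B-1001", 660, 35, 20),
    ("B-1001", 840, 10, 0),
    ("B-2002", 500, 0, 15),
    ("B-2002", 700, 25, 30),
]

def sequential_delay_pairs(records):
    """Return (upstream arrival delay, downstream departure delay) pairs per aircraft."""
    by_aircraft = defaultdict(list)
    for tail, sched_dep, dep_delay, arr_delay in records:
        by_aircraft[tail].append((sched_dep, dep_delay, arr_delay))
    pairs = []
    for legs in by_aircraft.values():
        legs.sort()  # order the legs of one aircraft by scheduled departure time
        for upstream, downstream in zip(legs, legs[1:]):
            # upstream[2] is the arrival delay, downstream[1] the departure delay
            pairs.append((upstream[2], downstream[1]))
    return pairs

pairs = sequential_delay_pairs(flights)
print(pairs)
```

Each resulting pair is one observation of the two dependent delay variables studied in the remainder of the analysis.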
To measure the correlation between sequential flight delays induced by different delay factors, this paper uses a copula function. Copula functions have typically been used in finance and hydrology to examine the interdependency between random variables (Nelsen [15]). They have also been used to measure the predictability of on-time gate arrivals using the degree of concordance between gate arrival delays and block delays (Diana [16]).
In general, a copula function $C$ links the joint distribution function of the random variables, $F(x_1, x_2, \ldots, x_n)$, to the marginal distribution functions $F_1(x_1), F_2(x_2), \ldots, F_n(x_n)$. A copula function satisfies
$$F(x_1, x_2, \ldots, x_n) = C(F_1(x_1), F_2(x_2), \ldots, F_n(x_n)). \tag{1}$$
If the delay time of the upstream flight, $A_{i-1}$, is a continuous random variable and the delay time of the downstream flight that shares the same aircraft, $D_i$, is a continuous random variable, with joint distribution function $F$, then (1) can be transformed as follows:
$$F(a_{i-1}, d_i) = C(F_1(a_{i-1}), F_2(d_i)), \tag{2}$$
where $F_1(a_{i-1})$ and $F_2(d_i)$ are the marginal cumulative distribution functions of $A_{i-1}$ and $D_i$ and $C$ is a bivariate function. Then, we determine the dependence structure of the random variables $A_{i-1}$ and $D_i$ by specifying a meaningful copula function to quantify the dependence between sequential flight delays.
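The construction of a joint distribution from two marginals and a copula can be illustrated with a short sketch. It assumes a Gumbel copula and lognormal marginals purely for illustration; the parameter values and the function names are hypothetical and are not the fitted values of this study.

```python
import math

def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution with log-mean mu and log-std sigma."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v) for theta >= 1; theta = 1 gives independence."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))

def joint_cdf(a, d, theta, upstream_margin, downstream_margin):
    """Joint CDF of (upstream delay, downstream delay): C(F1(a), F2(d))."""
    u = lognormal_cdf(a, *upstream_margin)
    v = lognormal_cdf(d, *downstream_margin)
    return gumbel_copula(u, v, theta)

# Hypothetical marginal parameters (mu, sigma) and copula parameter theta.
upstream_margin = (3.8, 0.7)
downstream_margin = (3.6, 0.8)
p = joint_cdf(60.0, 45.0, 2.0, upstream_margin, downstream_margin)
print(p)  # probability that both delays stay below 60 and 45 minutes
```

The marginals and the copula can be swapped independently, which is what allows each delay group to receive its own marginal family and its own dependence structure.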
The main issue related to the dependence between sequential flight delays is the magnitude of the influence of the upstream delay. It is necessary to understand how the downstream flight delay time changes according to the change of upstream delay time. In this particular case, the tail dependence analysis of copulas would be useful.
Let $\lambda_{up}$ and $\lambda_{lo}$ denote the upper-tail and lower-tail dependence coefficients of the random variables; then
$$\lambda_{up} = \lim_{u \to 1^-} P\left(D_i > F_2^{-1}(u) \mid A_{i-1} > F_1^{-1}(u)\right), \qquad \lambda_{lo} = \lim_{u \to 0^+} P\left(D_i \le F_2^{-1}(u) \mid A_{i-1} \le F_1^{-1}(u)\right), \tag{3}$$
where $u$ is a probability value and $F_j^{-1}(u)$, $j = 1, 2$, represents the corresponding $u$-quantile of the marginal distribution $F_j$, with $F_1$ and $F_2$ the marginal distribution functions of the upstream arrival delay $A_{i-1}$ and the downstream departure delay $D_i$. Tail dependence coefficients measure the dependence between the delay times of sequential flights at extreme values. That is, they measure the change in the probability of departure delay for the downstream flight when the arrival delay of the upstream flight changes to a great extent. The tail dependence coefficients lie within [0, 1]. If $\lambda = 0$, then the propagated delays are asymptotically tail-independent; otherwise, they are tail-dependent. When $\lambda$ is close to 1, the probability of changes in the downstream delays becomes correspondingly larger once the upstream delays reach a certain value.
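For a given copula, the tail dependence at a finite probability level can be evaluated directly from the copula diagonal. The sketch below assumes a Gumbel copula with a hypothetical parameter `theta`; the helper names `upper_tail` and `lower_tail` are introduced here for illustration only.

```python
import math

def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v) for 0 < u, v <= 1 and theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))

def upper_tail(u, copula):
    """P(V > u | U > u) for uniform margins U and V."""
    return (1.0 - 2.0 * u + copula(u, u)) / (1.0 - u)

def lower_tail(u, copula):
    """P(V <= u | U <= u) for uniform margins U and V."""
    return copula(u, u) / u

THETA = 2.0  # hypothetical value, not a fitted parameter

def copula(u, v):
    return gumbel_copula(u, v, THETA)

for level in (0.90, 0.95, 0.99):
    print(level, upper_tail(level, copula), lower_tail(1.0 - level, copula))
```

Evaluating both functions over a grid of probability levels gives curves of the kind that can be compared across delay factors.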

Data and Analysis
Flight data were sourced from an airline based in the Asia Pacific and span one year. In total, 9,325 flight delays were recorded. The data include flight data (e.g., time of departure and arrival), flight delays (e.g., time of the delay), and delay factors (e.g., delay causes). Figure 1 shows how propagated delay happens between sequential flights, and Figure 2 shows the distribution of delay causes. Around 70% of the delays are attributable to the factors related to airlines (air carrier delay and aircraft late arrival), weather, and ATC, where the air carrier delay takes the largest share (25%). Figure 3 shows that factors related to airlines account for the largest share during 9:00 a.m.-11:00 a.m., 11:00 a.m.-1:00 p.m., and 1:00 p.m.-3:00 p.m. This indicates that flights during these periods may be disturbed frequently. Figure 4 represents the delay distribution induced by weather. It shows that the largest proportion of delay happens during 7:00 a.m.-9:00 a.m. and 5:00 p.m.-7:00 p.m. Figure 5 represents the delays due to ATC. It shows that the delays during 7:00 a.m.-9:00 a.m., 9:00 a.m.-11:00 a.m., 11:00 a.m.-1:00 p.m., and 5:00 p.m.-7:00 p.m. are higher than at other times. To narrow down the scope of this study, this paper focuses on the major delay causes, namely airlines, weather, and ATC, and examines how they affect flight delays and delay propagation.

Delays Induced by Airlines, Weather, and ATC.
In summary, it can be found that different delay factors during different periods within one day can have diverse effects on flight delays, which result in different levels of delay propagation through aircraft routing for the subsequent flights.

Exploring Sequential Flight Delays.
In order to examine the characteristics of delays in sequential flights and delay propagation, sequential flights that share the same aircraft resources were selected. A copula function is appropriate for identifying the tail relationship between different factors. In the current study, we investigate how these factors affect flight delay propagation. Samples were chosen based on delay types so as to include those sequential flights with upstream delays induced by airlines, ATC, or weather conditions and downstream delays caused by the late arrival of aircraft. As shown in Table 1, the sample data were divided into three groups according to the delay factors of the upstream flights.
To identify the correlation within each of the three groups, the scatter plots in Figure 6 visualize the relationship between the delay times for sequential flights. These plots show that the points have distinctly concentrated, fat-tailed distributions. Therefore, linear additive models would not be suitable for describing the causal relationship between sequential flight delays.

Marginal Distribution of Sample Data.
In order to determine the marginal distribution for the three groups of delay data, four commonly used probability distribution functions are tested: normal distribution, lognormal distribution, Weibull distribution, and Gamma distribution (Wiboonpongse et al. [17]). Maximum likelihood estimation was used to estimate the parameters of the continuous probability distributions. The Cramér-von Mises test was used to test the goodness of fit of the random variables and to determine the best-fitting probability distribution function (Cohen [18]). Results are shown in Table 2. It can be found that the marginal distribution of one sample follows the Weibull distribution, while the others follow a lognormal distribution.
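The fitting step can be sketched for the lognormal case, where the maximum likelihood estimates have a closed form, followed by the Cramér-von Mises statistic of the fitted distribution. The delay values below are hypothetical, and the function names are illustrative; the Weibull and Gamma families would need a numerical optimizer and are omitted from this sketch.

```python
import math

def fit_lognormal(sample):
    """Closed-form maximum likelihood estimates (mu, sigma) of a lognormal distribution."""
    logs = [math.log(x) for x in sample]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma

def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution evaluated at x > 0."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def cramer_von_mises(sample, cdf):
    """Cramer-von Mises statistic of a sample against a candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (cdf(x) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )

# Hypothetical positive delay times in minutes (not the paper's data).
delays = [18.0, 22.0, 35.0, 47.0, 60.0, 95.0, 130.0]
mu, sigma = fit_lognormal(delays)
w2 = cramer_von_mises(delays, lambda x: lognormal_cdf(x, mu, sigma))
print(mu, sigma, w2)
```

A smaller statistic indicates a closer fit, so repeating the computation for each candidate family gives a simple ranking. Because the parameters are estimated from the same sample, critical values for a formal test differ from those of the fully specified case.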

Estimation of the Copula Parameter and Goodness-of-Fit Statistics.
The joint distribution of the sample data was built using five copulas: Gaussian copula, t-copula, Gumbel copula, Clayton copula, and Frank copula. The parameters were then estimated using the multistep estimation method (Chevillon [19]). The ordinary least squares (OLS) method was used to test the goodness of fit. Table 3 shows that the t-copula provides the best fit for the data of group 1 (airline factors), while the Gumbel copula yields the best fit for group 2 (weather factors) and group 3 (ATC factors). Hence, in the subsequent analysis, we use the t-copula for group 1 and the Gumbel copula for groups 2 and 3.
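As a rough illustration of copula fitting and of an OLS-type goodness-of-fit measure, the sketch below estimates the Gumbel parameter by inverting Kendall's tau and then computes the root mean squared distance between the empirical copula and the fitted one. Kendall's tau inversion is a simpler technique swapped in here for brevity; it is not the multistep estimation method used in this paper, and the delay pairs are hypothetical.

```python
import math

def kendall_tau(pairs):
    """Kendall's tau for a list of (x, y) pairs, assuming no ties."""
    n = len(pairs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            s += (prod > 0) - (prod < 0)  # +1 if concordant, -1 if discordant
    return 2.0 * s / (n * (n - 1))

def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v) for 0 < u, v < 1 and theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))

def ols_distance(pairs, theta):
    """Root mean squared gap between the empirical copula and a Gumbel copula."""
    n = len(pairs)
    xs = sorted(p[0] for p in pairs)
    ys = sorted(p[1] for p in pairs)
    # Pseudo-observations: ranks scaled into the open interval (0, 1).
    uv = [((xs.index(x) + 1) / (n + 1), (ys.index(y) + 1) / (n + 1)) for x, y in pairs]
    total = 0.0
    for u, v in uv:
        empirical = sum(1 for a, b in uv if a <= u and b <= v) / n
        total += (empirical - gumbel_copula(u, v, theta)) ** 2
    return math.sqrt(total / n)

# Hypothetical (upstream arrival delay, downstream departure delay) pairs in minutes.
pairs = [(10, 12), (20, 18), (30, 45), (40, 30), (50, 70), (60, 65)]
tau = kendall_tau(pairs)
theta = 1.0 / (1.0 - tau)  # Gumbel relation tau = 1 - 1/theta, valid for 0 <= tau < 1
ols = ols_distance(pairs, theta)
print(tau, theta, ols)
```

Computing the same distance for each candidate copula and keeping the smallest value mirrors the selection logic summarized in Table 3.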

Tail Dependence Coefficients for Sequential Flight Delays.
In order to examine the tail dependency of sequential delays, the dependency coefficients were calculated. Figure 7 shows the density function and contour distribution for the three groups. Figure 7(a) shows that the sample points are predominantly distributed in the top right and bottom left areas, with a high density and a symmetrical distribution, indicating that the delay times of sequential flights due to airline operations are strongly dependent in both tail areas. That is, if the upstream delays are longer, the dependence between upstream and downstream delays is stronger; hence, there is a greater probability of downstream delays. Figure 7(b) shows that the sample points are distributed mainly in the top right area, with a tendency to diffuse into the surrounding areas. Figure 7(c) shows that the sample points cluster around the diagonal and are concentrated at the top right. This result indicates that the dependence of sequential flight delays caused by weather (Figure 7(b)) or ATC (Figure 7(c)) is more pronounced in the upper tail area. Specifically, in the upper tail area, where the upstream delays are longer, the dependence between upstream and downstream delays is also stronger. Hence, there is a greater probability of downstream delays.
According to (3), the tail dependence coefficient $\lambda$ of sequential flight delays caused by these three factors can be calculated for different values of the $u$-quantile, where $u$ is within (0, 1). When $u$ is close to 1, flight delays become larger, and the maximum delay correlation coefficient is expressed by $\lambda_{up}$. When $u$ is close to 0, flight delays become shorter; similarly, the minimum delay correlation coefficient $\lambda_{lo}$ can be obtained. The tail dependence coefficients for the three factors (weather, airline, and ATC) are shown in Figure 8. From the figure, one can observe the following: (1) When the upstream delays are short (i.e., when the value of $u$ decreases), the lower-tail dependence coefficients $\lambda_{lo}$ for weather and ATC causes (groups 2 and 3) tend to zero, which indicates that these causes have little effect on downstream delays. In contrast, for airline factors (group 1), even when the upstream arrival delays are short, the airline factor has a large effect on downstream departure delays ($\lambda_{lo} = 41.8\%$).
(2) As $u$ increases to a certain value, $\lambda_{up}$ reaches a limit. That is, when the upstream arrival delays are large, the dependence between arrival and departure delays remains constant. In Figure 8, when the upstream delay probability is close to 0.95, the dependency coefficient $\lambda_{up}$ remains constant for each of the three factors. Note that the effect of upstream delay under the ATC factor is the highest, with $\lambda_{up}$ close to 60%.
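The contrast between a vanishing lower-tail coefficient and a positive upper-tail coefficient is a known property of the Gumbel copula and can be derived in a few lines. In the sketch below, the parameter symbol $\theta$ and the finite-level notation $\lambda_{lo}(u)$, $\lambda_{up}(u)$ are assumptions introduced for illustration and are not notation taken from this paper.

```latex
\begin{align*}
\lambda_{lo}(u) &= \frac{C(u,u)}{u}, &
\lambda_{up}(u) &= \frac{1 - 2u + C(u,u)}{1-u}, \\
C_\theta(u,u) &= \exp\!\left(-\left[2(-\ln u)^{\theta}\right]^{1/\theta}\right)
               = u^{2^{1/\theta}}, \\
\lambda_{lo} &= \lim_{u \to 0^+} u^{2^{1/\theta}-1} = 0 \quad (\theta > 1), &
\lambda_{up} &= \lim_{u \to 1^-} \frac{1 - 2u + u^{2^{1/\theta}}}{1-u}
              = 2 - 2^{1/\theta}.
\end{align*}
```

The upper limit follows from L'Hôpital's rule. The Student t copula, by contrast, is symmetric and has equal, nonzero upper- and lower-tail coefficients, which is consistent with a nonzero lower-tail dependence for the airline group.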

Conditional Probability Analysis Using Copulas.
According to the above analysis, the departure delay time of a downstream flight is related to how long its upstream flight was delayed. Therefore, we can calculate the probability of the downstream flight delay based on the upstream delay caused by different delay factors as follows:
$$P(d_1 \le D_i < d_2 \mid a_1 \le A_{i-1} < a_2) = \frac{C(F_1(a_2), F_2(d_2)) - C(F_1(a_1), F_2(d_2)) - C(F_1(a_2), F_2(d_1)) + C(F_1(a_1), F_2(d_1))}{F_1(a_2) - F_1(a_1)}, \tag{4}$$
where $F_1$ and $F_2$ are the marginal distribution functions of the upstream arrival delay $A_{i-1}$ and the downstream departure delay $D_i$, and $C$ is the fitted copula. The conditional probability $P(d_1 \le D_i < d_2 \mid a_1 \le A_{i-1} < a_2)$ is the probability that the delay time of the downstream flight falls in $[d_1, d_2)$ given that the delay time of its upstream flight falls in $[a_1, a_2)$. Because a flight is conventionally counted as delayed only when its delay exceeds 15 minutes (Nelsen [15]), we categorize the arrival and departure delays into five levels (between 15 minutes and 165 minutes) and add the extreme ranges [0, 15) and [165, +∞) for a comprehensive investigation. Then the value of $P$ was calculated under the conditions of airline, weather, and ATC factors, respectively, using (4).
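The interval-conditional probability can be computed from any bivariate copula once the marginal distribution values at the interval bounds are known. The sketch below assumes a Gumbel copula; the bin edges are expressed directly as hypothetical marginal CDF values, and the parameter value 2.5 is illustrative rather than fitted.

```python
import math

def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v) for theta >= 1, with C = 0 on the lower boundary."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))

def conditional_probability(u1, u2, v1, v2, theta):
    """P(downstream in [v1, v2) | upstream in [u1, u2)) on the CDF scale."""
    rectangle = (
        gumbel_copula(u2, v2, theta)
        - gumbel_copula(u1, v2, theta)
        - gumbel_copula(u2, v1, theta)
        + gumbel_copula(u1, v1, theta)
    )
    return rectangle / (u2 - u1)

# Hypothetical marginal CDF values at the bin edges (not fitted values).
upstream_lo, upstream_hi = 0.5, 0.8
downstream_edges = [0.0, 0.3, 0.6, 0.85, 1.0]
row = [
    conditional_probability(upstream_lo, upstream_hi, lo, hi, 2.5)
    for lo, hi in zip(downstream_edges, downstream_edges[1:])
]
print(row)
```

Because the downstream bins partition the whole range, the entries of `row` sum to one, which is a convenient sanity check on any implementation.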
The conditional probability examines the delays propagated to downstream flights. The propagated delay could be decreased ($a_1 \ge d_2$), passed on ($d_1 = a_1$, $d_2 = a_2$), or increased ($a_2 \le d_1$), where $[a_1, a_2)$ is the delay range of the upstream flight and $[d_1, d_2)$ is the delay range of the downstream flight. Figure 9 shows the probability histograms in terms of delay decrease, pass-on, and increase in sequential flights caused by the three delay factors. Figure 9(a) shows that the probability of the delay reduction effect under weather factors is larger than that under the airline and ATC factors, and this probability increases with an increase in upstream delay. Results show that delays due to weather from flight D tend to be lower than those from flight C, especially when the length of the delays from flight C increases. Figure 9(b) shows that the probabilities of delay pass-on under the three delay factors are similar: they first decrease with the increase in the upstream delay time; then in the category of [165, +∞), the probability increases abruptly. Therefore, if the upstream delay is long enough (at least 165 minutes) without any flight delay recovery actions, the delay is likely to be passed on or increased. Results show that, in terms of the delays related to airline and ATC factors, when the delay time of flights A and E exceeds 165 minutes, there is a high probability that the delay time of flights B and F will also exceed 165 minutes. Figure 9(c) shows the probability of delay-increase effects on the downstream flight under airline, weather, and ATC factors. The probability of delay increase under the ATC factor has an overall increasing trend. That is, delays due to ATC from flight F tend to be higher relative to flight E, especially when delays from flight E increase. In contrast, the trend for the probability under the airline factor remains relatively constant.
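Given one row of conditional probabilities over ordered delay categories, the decrease, pass-on, and increase probabilities are obtained by summing the entries below, at, and above the upstream category. The probabilities in this sketch are hypothetical, and the function name is illustrative.

```python
def propagation_effects(row, upstream_bin):
    """Split a conditional-probability row into (decrease, pass-on, increase).

    row[j] is the probability that the downstream delay falls in category j,
    given that the upstream delay falls in category upstream_bin.
    """
    decrease = sum(row[:upstream_bin])       # downstream category is lower
    pass_on = row[upstream_bin]              # downstream category is the same
    increase = sum(row[upstream_bin + 1:])   # downstream category is higher
    return decrease, pass_on, increase

# Hypothetical probabilities for five ordered delay categories.
row = [0.10, 0.20, 0.40, 0.20, 0.10]
effects = propagation_effects(row, 2)
print(effects)
```

Repeating this for every upstream category and every delay factor yields the three families of histograms compared in the text.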

Sensitivity Analysis.
In order to examine the influence of ATC and airline factors on the level of flight delay propagation, a sensitivity analysis was conducted, focusing on the possible effects of increasing the buffer time in flight schedules. Two scenarios were created that adjust the original flight schedule by increasing the turnaround buffer time by 10 minutes and 20 minutes, respectively. The degrees of delay reduction of the adjusted schedules are then compared with the original schedule, in which the buffer time is unchanged. Concretely, the original flight delay data are processed as follows: if the turnaround time increases by 10 minutes, then the downstream flight delay time decreases by 10 minutes. The result is set to zero if the original downstream flight delay time is less than 10 minutes, assuming that such delays can be fully absorbed by the scheduled buffer time. Figure 10 shows the delay reduction probability for an increase in the buffer time under ATC factors. The average delay reduction probability increases from 30% to 87% when the buffer time increases by 10 minutes and from 30% to 96% when the buffer time increases by 20 minutes. This indicates that an additional 10 minutes of buffer time has a large impact on the conditional probability of delay reduction, whereas an additional 20 minutes provides only a marginal improvement beyond the effect of adding 10 minutes. Therefore, it would be more efficient to add 10 minutes rather than 20 minutes to the buffer time. In practice, for flights that are frequently influenced by ATC factors, an extra 10 minutes of buffer time would be effective in reducing the probability of delay propagation to downstream flights. Figure 11 shows the delay reduction probability for an increase in the buffer time under airline factors.
The average delay reduction probability decreases from 30% to 25% when the buffer time increases by 10 minutes, while it increases from 30% to 54% when the buffer time increases by 20 minutes. This suggests that the extra 10 minutes would have a small negative effect on the conditional probability, while the extra 20 minutes would have a positive effect. Therefore, it is suggested that 20 minutes should be added to the turnaround buffer time of flights that are frequently influenced by airline factors in order to reduce the downstream delays.
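The schedule adjustment used in the two scenarios amounts to subtracting the extra buffer from each downstream delay and truncating at zero. The sketch below applies that rule to hypothetical delays and reports the share of flights still at or above the 15-minute delay threshold. This share is a simplified proxy chosen for illustration; it is not the copula-based delay reduction probability reported in Figures 10 and 11.

```python
DELAY_THRESHOLD_MIN = 15.0  # delays below this are not counted as delayed

def apply_buffer(downstream_delays, extra_buffer_min):
    """Shift each downstream delay by the extra buffer, absorbing it fully at zero."""
    return [max(0.0, d - extra_buffer_min) for d in downstream_delays]

def delayed_share(delays):
    """Fraction of flights whose delay is at or above the threshold."""
    return sum(1 for d in delays if d >= DELAY_THRESHOLD_MIN) / len(delays)

# Hypothetical downstream departure delays in minutes (not the paper's data).
original = [5.0, 12.0, 20.0, 30.0, 50.0]
for extra in (0.0, 10.0, 20.0):
    adjusted = apply_buffer(original, extra)
    print(extra, adjusted, delayed_share(adjusted))
```

In a full replication, the adjusted delays would be fed back into the marginal and copula fitting steps before the conditional probabilities are recomputed.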

Conclusion
This paper investigated the correlation between sequential flight delays under the delay factors of airline, weather, and ATC. It also examined the effects of buffer time on delay propagation in flight scheduling in order to reduce delay propagation. The main findings of this paper can be summarized as follows.
First, unlike previous studies, this paper analyzes the correlation of delay propagation between sequential flights by using a copula function and the tail dependence coefficients $\lambda_{up}$ and $\lambda_{lo}$. The lower-tail coefficient $\lambda_{lo}$ showed that, when the delay from upstream flights is short, the factors of weather and ATC have small impacts on the delay of downstream flights, whereas the impact of the airline factor is the largest. This suggests that, when the delay between sequential flights is short, airline flight scheduling (especially the scheduling of buffer time) plays an important role in delay propagation. Meanwhile, the upper-tail dependence coefficients $\lambda_{up}$ showed that, under all three factors, the delay from upstream flights has an increasing effect on downstream flights that eventually levels off at a constant value, which indicates that the influence of the three factors is limited. As the delay from upstream flights gets larger, the ATC factor has the largest impact on delay propagation, which means that, among the external influences, ATC is the strongest.
Second, according to the conditional probabilities of delay propagation, flight scheduling can be effectively improved through reducing the uncertainty under different delay factors. With different scheduled buffer time, it is found that, for the flights that are frequently influenced by airline factors, 20 minutes should be added to the turnaround buffer time for delay reduction. For flights that are frequently influenced by ATC factors, adding 10-minute buffer time would be sufficient for delay reduction.
As presented above, the copula function used in this study is useful for analyzing the dependence structure of sequential delays. Conclusions were drawn for each of the influential factors separately. However, this approach may not identify the possible joint effects of multiple factors. Future work is therefore necessary to further investigate flight delay correlation under combined delay factors, such as weather, airline, and ATC.

Conflicts of Interest
The authors declare that they have no conflicts of interest.