Beyond Linear Delay Multipliers in Air Transport

Delays are considered one of the most important burdens of air transport, both for their social and environmental consequences and for the cost they cause for airlines and passengers. It is therefore not surprising that a large effort has been devoted to study how they propagate through the system. One of the most important indicators to assess such propagation is the delay multiplier, a ratio between outbound and inbound average delays; in spite of its widespread utilisation, its simplicity precludes capturing all details about the dynamics behind the diffusion process. Here we present a methodology that extracts a more complete relationship between the inand outbound delays, distinguishing a linear and a nonlinear phase and thus yielding a richer description of the system’s response as a function of the delay magnitude. We validate the methodology through the study of a historical data set of flights crossing the European airspace and show how its most important airports have heterogeneous ways of reacting to extreme delays and that this reaction strongly depends on some of their global properties.


Introduction
The analysis and characterisation of delays is one of the most important research topics in air transport management, due to their profound implications in the cost efficiency, safety, and environmental friendliness of the system [1][2][3][4].One of the most important metrics for understanding delays propagation is the delay multiplier (DM).Firstly introduced in [5], it is defined as the relation between the downline () and the initial () delays as seen by an airport or an airline: DM = /, or alternatively as DM = (+)/.Here downline delay refers to the departure delay of the same tail number airplane that had an original delay at landing.High DM values indicate that an initial inbound delay is, on average, multiplied in the airport, thus triggering a (average) higher outbound delay; conversely, small values suggest that delays are efficiently recovered or dampened.Since its introduction, it has been used countless times, both to understand the delay propagation taking place at one single airport [6,7], across multiple airports [8,9], and as seen by one airline [10][11][12].
The main advantage of the delay multiplier is its simplicity; a single number is able to synthesise the capacity of the system to cope with delays and assess whether a single element (e.g., an airport) is negatively contributing to the delays observed in the system, or, on the other hand, if it has adequate procedures in place to reduce them.This advantage is, at the same time, its main drawback; little or no information is yielded about the details of this multiplication/absorption. Let us consider the case of the DM associated with an airport.It is reasonable to think that airports may behave according to two different phases, that is, in a different way when normal or abnormally high delays are received from inbound flights [13].One may thus ask the following question: is a high DM value due to a systematic delay generation, or is it the result of the presence of a few abnormal events?In the former case, is this due to an inadequate capacity of the airport, or due to too small airline buffers?
In this contribution, we propose to study the airportmediated propagation of delays by reconstructing a more rich metric for delay multiplications.Such metric is based on a polynomial fit between inbound and outbound average delays, calculated within one-hour time windows.This allows synthesising two indicators: the linear part of the system's response, that is, corresponding to small delays and yielding a meaning similar to the standard delay multiplier, and the nonlinear part, representing the additional response triggered by abnormally high delays.We show how these two metrics can be calculated for different time shifts, allowing differentiating between the propagation due to airportrelated issues (appearing concomitantly at departures and arrival) and the propagation due to aircraft turn-around (appearing after some time).We further validate the proposed methodology by analysing a 5.7 million flights' data set corresponding to the European airspace and characterising its most important airports.We demonstrate how the linear and nonlinear parts of the delay propagation are uncorrelated, as some airports strongly propagate small delays, but are better able to stop the propagation of large ones.Each flight is characterised by an array including the day; the hours of take-off and landing; the origin and destination airports; and the corresponding take-off and landing delays.Delays have been calculated as the difference between the actual landing/take-off time and the ones predicted in the last filed flight plan.

Methods
All flights have been preprocessed, to discard those for which the previously listed fields were not available.First of all, some days of October 2011 have been filtered, due to the lack of executed trajectories information, reducing the effective number of days to 286.Additionally, all intercontinental flights have been discarded, as delays outside the European airspace are not reported, thus reducing the number of analysed flights to 5.7 million.Afterwards, negative delays (i.e., flights departing or arriving before what was planned in the last filed flight plan) have been set to 0. The rationale of discarding negative delays resides in the fact that they are not expected to generate negative reactionary delays.In other words, an aircraft arriving before the scheduled time might generate small delays to other departing flights, for instance, because of an unexpected utilisation of the runway, but will definitively wait until the planned time before taking off again.On the other hand, a considerable positive delay (e.g., one hour) will surely postpone the next planned departure of the same aircraft.The effects of negative delays are thus negligible and must not be used in this context to compensate positive delays.An excerpt of the resulting data set is shown in Table 1.

The Proposed Metrics.
With the aim of reaching a more complete characterisation of the delay propagation process, here we propose a metric that highlights the dynamics of system's response as a function of the inbound delay.In other words, it is based on the analysis of the nonlinear relationship between in-and outbound delays of flights on a per-airport base.
In broad terms, the average landing and departure delays are calculated, for a given airport, considering 1-hour time windows.Note that here we consider all departure delays and not just the one of direct consequence of inbound flights, as in the downline delay of the standard DM [5].
All time windows presenting a similar inbound average delay are then grouped, and the corresponding average outbound delay is calculated, as the average of the average outbound delay for these same time windows.Finally, a polynomial fit is performed on the resulting values.
While the proposed methodology is prima facie easy to describe and visualise, for the sake of completeness, it is necessary to describe it in mathematical terms.This can be done in seven steps, which are listed here below and developed in the following pages: (1) Create inbound and outbound data sets.
(3) Compute the evolution of the average inbound and outbound delays.
(4) Cluster the average inbound delays in bins of 1 minute.
(5) Compute the corresponding average outbound for each bin.
It is firstly necessary to consider the subset of all flights departing from one given airport and, on the other hand, a second subset of all flights arriving at that same airport (step 1).For the sake of simplicity, we will here consider the example of Munich airport (EDDM), although the same methodology is valid for any other airport of the study.Both subsets are represented as vectors, one of them outlining the arrival delay of the flights landing at EDDM and the other listing the departure delay of the flight departing from EDDM:  1).The reader should also note that the two vectors' lengths  and  are usually distinct; that is, the number of flights arriving at and departing from EDDM in this case and at/from any airport in general differs.
A direct comparison of these two vectors is impossible due to the fact that arrivals and departures are not synchronised (step 2).It is indeed possible, during specific hours (e.g., early and late hours), to have only arrivals and no departures, or the opposite.Flights are then grouped according to their time stamp, which by definition represents 1-hour time windows.In order to reduce noise, time stamps with less than 3 landing and 3 take-off operations are discarded.Both vectors are then merged by grouping the inbound (  )  and outbound (  )  delays occurring at the same time stamp: ( is an index that denotes a specific day and a specific hour where at least 3 landing and 3 take-off operations took place (  ≥ 3).Then, inbound and outbound delays are averaged at each time stamp and the pair is assigned to the following bivariate vector D (step 3): ≤ min (, ) . ( D thus represents the average outbound delay corresponding to each average inbound delay suffered by the airport.  and    correspond to the average delay of the flights landing at and taking off from EDDM at the time stamp   .
The aim of the methodology consists in extracting the relation between inbound and outbound delays.In other words, for an increase of Δ of the average inbound delay, we extract how much the departure delay on average increases.For that, let us consider the range of Then, for an average landing delay of , we have an average corresponding departure delay of (step 5): Moreover, delays are not necessarily propagated within the same hour; outbound delay might be caused by an inbound delay that happened  hours before.To tackle this problem, a phase  must be introduced (step 6): At this point, we have sliced the average inbound delay in parts of one minute and calculated for each of these minutes the corresponding average outbound delay.This can be resumed in the following vector: The last step of the method (step 7) consists in a least squares fitting of a polynomial function of degree 2 between [1, 2, . . ., ] (i.e., the average inbound delay) and Δ  EDDM (i.e., the corresponding average outbound delay  hours later): represents the linear part of the system's response, that is, how outbound delays are linearly linked to the inbound at each airport, while  highlights the nonlinear part.In order to make those two coefficients comparable, we further calculate a normalised quadratic coefficient, defined as  * = /.
Summarising, the proposed methodology yields two different metrics: (1) A linear coefficient of the fit, , representing the correlation between arrival and departure delays for small delay values.

Results
In this section, we present the results obtained by the proposed methodology for the European airports, according to the 2011 data contained in the data set described in Section 2.1.Figure 1 depicts a scatter plot of the average departure delay as a function of the average delay at landing for the 10 most important European airports, in terms of the number of operations recorded, for a phase of zero.It can be appreciated that, in all cases, departure and arrival delays are positively correlated, which is to be expected; this is confirmed by the linear coefficient being always positive (Figure 1(b), top graph).At the same time, the normalised quadratic coefficient (bottom graph) presents a more heterogeneous behaviour.Some airports, for example, Paris Charles de Gaulle (LFPG) and Roma Leonardo da Vinci (LIRF), present a marked negative behaviour; note how, for high inbound delays, the fit is almost horizontal, thus indicating that the outbound delay becomes independent of the inbound one.On the other hand, the opposite situation is exemplified by the Munich airport (EDDM), in which the positive quadratic coefficient indicates that outbound delays increase superlinearly with inbound delays.The reader may note the higher dispersion of points for high inbound delays.These statistical fluctuations are due to the lower occurrence frequency of high delays, which results in a larger uncertainty of the average.
In order to confirm that a negative normalised quadratic coefficient, or equivalently a horizontal linear fit, indicates an independence between inbound and outbound delays, Figure 2 presents three different validations for the EDDM airport.Specifically, Figure 2   Finally, the third and fourth panels represent the result of considering large phases, that is, 12 and 36 hours, for which little or no correlation is expected.In all cases, the fit is almost lineal, with negative quadratic coefficients in the two latter cases.
When considering a phase of zero, as in Figure 1, inbound and outbound delays are matched for flights operating in the same 1-hour time window.As those flights are independent (as a single aircraft seldom operates twice in such a short time frame), the correlation among delays is due to external factors.These may be, for instance, the global traffic level at the airport; the more operations are performed, the easier it is to suffer from delays on both departure and arrival.
To complement the results of Figure 1, Figure 3 presents the same graphs for a phase of 1 hour.The turn-around process usually requires between 60 and 90 minutes [14]; therefore, when considering a phase of one hour, one is effectively matching aircraft arriving and departing with a difference of 60-120 minutes.In other words, aircrafts have enough time to execute one turn-around, and therefore the correlation between inbound and outbound delays is expected to be due to the presence of reactionary delays.The focus is thus shifted from the airport to the airline operation and procedures.The more extreme behaviours that appear in this case are worth noting: three airports (EDDM, Munich, LEMD, Madrid Barajas, and LEBL, Barcelona El Prat) present a superlinear behaviour, while five of them (EHAM, Schiphol, LFPG, Paris Charles de Gaulle, EGLL, London Heathrow, LIRF, Roma Leonardo da Vinci, and LOWW, Vienna) present a loss of correlation for high inbound delays.
In order to better understand the behaviour of the system as a whole, Figure 4(a) depicts the histogram of the normalised quadratic coefficients for the most important European airports.Only airports for which enough delay data were available (i.e., for which all points in the scatter plot were defined) have here been considered, for a final set of 47 airports.On average, large inbound and outbound delays are not correlated; 66% of the coefficients are negative, with an average of −4.34 ⋅ 10 −5 and a mode of −5.3491 ⋅ 10 −4 ; nevertheless, an important number of airports show a superlinear behaviour.This is further depicted in Figure 4(b), which represents the scatter plot for the three airports with lowest (top graphs) and highest (bottom graphs) quadratic coefficients.The latter case is especially interesting; the airports of Rodhes (LGRP), Verona Villafranca (LIPX), and Iraklion Nikos Kazantzakis (LGIR) both exhibit high outbound delays and note that the vertical scale has been changed to 0-60 minutes to accommodate all points.
We finally study if the normalised quadratic coefficient is related to some airport properties.Figure 5 presents six scatter plots, respectively, representing (Figures 5(a)-5(f)) the coefficient as a function of the following: (i) The airport size, that is, its total number of departure and arrival operations: note that as intercontinental flights are here not considered some long-range hubs rank lower than expected.
(iii) The average inbound (black squares) and outbound (green diamonds) delay of delayed flights, in seconds.
(iv) The entropy of the airlines distribution at the airport: this metric assesses whether one airline dominates the activity at the airport or if on the other hand multiple airlines are sharing the resources.It is calculated as follows: being the considered airport and   () the fraction of flights operated by airline  at  (over the total number of operations).The lower   is, the more one airline is dominating the airport.
(v) The entropy of the distribution of aircraft types, operating at the considered airport: for the sake of simplicity, aircrafts have been classified in three categories: turbofan narrow body (above 80 seats), turbofan regional aircraft (below 80 seats), and turboprop.Other types of aircraft, for example, freight ones, have been discarded; additionally, note that wide-body aircrafts were not present in the data set, as intercontinental flights have been discarded; see Section 2.1.
Given the probability   () of a flight being operated at airport  by an aircraft of type , with  ∈ (1, 2, 3), the corresponding Shannon entropy is defined as (vi) The number of flights operating at the airport, per available runway.
It can be appreciated that the quadratic coefficient is mostly independent of these metrics.Specifically, a linear fit (solid lines in Figure 5) yields -squared of 0.066 (slope of 8.81 ⋅ 10 −3 ) for the airport size and of 0.067 (slope of 88.32) for the standard delay multiplier.This last result is of special importance, as it confirms that the proposed metric conveys information that is complementary to the standard approach of the delay multiplier.The quadratic coefficient is slightly more correlated with the airport average delay, with -squared of 0.163 (slope of 1.34 ⋅ 10 5 ) for inbound and of 0.340 (slope of 1.81 ⋅ 10 5 ) for outbound delays.Both the airlines distribution entropy and the aircraft type entropy present a small correlation, almost null in the latter case, with, respectively, -squared of 0.067 (slope of 8.81 ⋅ 10 6 ) and of −0.029 (slope of −23.73).Finally, the correlation with the number of flights per runway yields -squared of 0.072 (slope of 3.28 ⋅ 10 6 ); note that this metric is strongly correlated with the number of operations at the airport (-squared of 0.805), as runways are built according to the forecasted demand.
While these characteristics are not able to explain the dynamics of delays propagation at one airport when considered independently, a deeper understanding may be obtained by considering them altogether.Towards this aim, we propose the use of predictive models that, taking as inputs the previously described characteristics, try to forecast the real value of the normalised quadratic coefficient.The goodness of the prediction, assessed through -squared of the lineal fit between the real and forecasted values, can then be used as an indicator of the quantity of information that these characteristics encode about the behaviour of the airports.We here consider four commonly used regression models: (1) Linear regression model: mathematically, the predicted value of  * at one airport is given by 1 ⋅ ⋅ ⋅  7 being the seven airport features and they are as follows.

Variables and Results of the Fit
Variables  1 : average arrival delay  2 : average departure delay (2) Regression tree: an evolution of the classical decision tree classification model [15].It involves generating comprehensive tree structures that predict the records' value by sorting them based on attribute values.Each node in a decision tree represents an attribute in the record under analysis, while each branch represents a value that the attribute can take.The Classification And Regression Tree (CART) model has been considered here [16].
(3) LASSO: the Least Absolute Shrinkage and Selection Operator is a regression method that involves penalising the absolute size of the regression coefficients, thus allowing some regression parameter estimates to be exactly zero [17,18].This is especially useful when some of the features are expected to be highly correlated, as, for instance, in this case, the size of the airport and the number of flights per runway.
(4) Multilayer perceptron is a regression model inspired by the structural aspects of biological neural networks and composed of a set of connected nodes in which each connection has a weight associated with it; the resulting network learns the classification function adjusting the node weights [19,20].Following the standard configuration, neurons were organised in three layers: an input one, with a number of neurons equal to the number of input features; an intermediate, or hidden one, with ten neurons; and a final output layer comprising just one computational element.
The training has been performed with the standard backpropagation algorithm [21].
Table 1 reports the -squared achieved by the four considered models.The multilayer perceptron model yields the best results: an -squared of 0.6653, indicating that it is able to explain the 66.53% of the variance of  * .This is partly to be expected, as this is the only model, of the four considered, that includes nonlinearities between parameters.On the other hand, the three other models present a consistent behaviour in the range 0.5409-0.5499.
In spite of its relatively low performance, we present further details of the linear regression model, as it is the only one

Discussion and Conclusions
In this contribution, we have explored an alternative formulation of the well-known delay multiplier (DM) metric, aimed at better characterising the phenomenon of delay propagation in air transport.It is based on a nonlinear fit between the average inbound delay at one airport, calculated within 1-hour time windows, and the corresponding observed response, that is, the average outbound delay in the same window.If the linear part of the fit roughly corresponds to the classical DM, the nonlinear coefficient yields additional information about the behaviour of the system under strong stresses.By analysing a large set of flights operating over the European airspace, we have shown that airports can be classified into two groups: those in which outbound delays get independent of inbound ones, as indicated by a negative normalised quadratic coefficient, and those in which large inbound delays are propagated, on average, under the form of large outbound delays.Independently of this, a linear relationship can be observed in both groups between small inbound and outbound delays; this indicates that the system reacts in a linear fashion to small perturbations, as represented by the standard delay multiplier, for then bifurcating at increasing stresses.
We have further demonstrated, by means of a linear model, that half of the nonlinear behaviour of an airport can be explained in terms of simple characteristics, like the average inbound and outbound delays or its size; this suggests that such behaviour is only partly the result of local idiosyncrasies and that global regulations and procedures play a relevant role.
The gain in understanding yielded by the proposed metric comes at the cost of increased data requirements.In order to reconstruct the scatter plots of outbound versus inbound average delays (like the ones of Figures 1 and 3), a large number of time windows are required, with both a specific inbound average delay and a large enough number of operations; these may not be available for small airports, especially for those operating only few flights per hour.Note that the nonlinear fit can be obtained even if some points are missing; yet, this will increase the relative weight in the fit of the available points, thus potentially introducing biases.
For the sake of simplicity, we have here opted for a quadratic polynomial fit, while any other nonlinear functions may in principle be considered.One may, for instance, pursuit a better fit by increasing the order of the polynomial fit; we have nevertheless chosen against this possibility for two reasons.First, the objective here is not reaching a perfect fit, but instead a fit that we can easily explain and interpret.Following the previous example, and in spite of achieving a lower error, it would be more complex to understand and classify airports if a fourth-order polynomial was used, as the nonlinear behaviour would then be described by three different and interacting coefficients.Second, only 30 points are available in each scatter plot, as average delays above 30 minutes are extremely infrequent; with higher order functions requiring a larger number of parameters, there is an important risk of overfitting.
The proposed metric has here been applied as an airport descriptive tool; that is, it has been used to describe the behaviour of airports and classify them according to the nonlinear part of their reaction.Future works may easily expand this scope.For instance, the delay propagation process may be studied from the point of view of an airline (as the aggregation of all its flights, independently on the connected airports) or of an Area Control Centre (ACC, as the aggregation of all flights entering and exiting the corresponding air space).Additionally, the metric can be used as a predictive tool to forecast the outbound delays at one airport, given a forecast of its inbound delays, and as a prescriptive tool, to guide the design of new procedures, once a causal analysis is used to highlight which elements are responsible for positive and negative values of the quadratic coefficient.

2. 1 .
Data Set.Information about flights crossing the European airspace has been extracted from the Flight Trajectory (ALL-FT+) data set as provided by the EUROCONTROL's PRISME group.The data set covers the period from 1st of March to the 31st of December 2011, including a total of 10.3 million flights.
and bins of 1 minute (step 4) as follows: bin

Figure 1 :
Figure 1: Behaviour of the top 10 European airports for a phase of zero.(a) depicts the ten scatter plots of outbound versus inbound average delays and (b) the sign and magnitude of the linear and normalised quadratic coefficients.Green and red, respectively, represent airports with highly negative and positive quadratic coefficients.
(a) represents the original scatter plot of Figure 1.
Figure 2(b) depicts the result of a randomisation, in which the delays are randomly assigned to a new flight, thus effectively breaking any temporal correlation.

Figure 2 :
Figure 2: Validation of the meaning of the horizontal fit.From (a) to (d), the panels represent the scatter plot for the following: the original Munich airport data; a randomisation of the delay data; a phase of 12 hours; and a phase of 36 hours.

Figure 3 :
Figure 3: Behaviour of the top 10 European airports for a phase of one hour.Panels and colours represent the same information as in Figure 1.

Figure 4 :
Figure 4: Global behaviour of the European system.(a) represents the histogram of quadratic coefficients for the top 200 airports and (b) the three airports with lower (top graphs) and higher (bottom graphs) quadratic coefficients.

𝑥 3 :LASSOFigure 5 : 6 ⋅ 2 =
Figure5: Normalised quadratic coefficient and airport properties.From (a) to (f), the six panels, respectively, depict the correlation between the coefficient and the size of the airport (number of operations); the airport standard delay multiplier; the average inbound (black squares) and outbound (green diamonds) delay; the entropy of the airline distribution; the entropy of the aircraft type distribution; and the number of flights per runway; see main text for definitions.

Table 1 :
Excerpt of the considered data set; see main text for details.
(  )  and (  )  , respectively, represent the landing and take-off delays (columns 6 and 7 of Table can easily be represented and interpreted.The last part of Table1reports the full model, while Figure6depicts a scatter plot comparing observed and forecasted values.Interestingly, the model resulting from the LASSO algorithm presents coefficients that are almost equal to the linear regression one (differences are of the order of 0.1%), suggesting that no characteristic can be safely discarded.It can be noted that this model is able to explain a large part of the original variance, thus indicating that the airport's response to delays is mostly driven by a linear combination of the seven previously identified features.If quadratic relations are included, the percentage of explained variance rises to 81.14%, yet at the cost of increasing the number of model parameters and thus increasing the risk of overfitting. that