Missed Approach, a Safety-Critical Go-Around Procedure in Aviation: Prediction Based on Machine Learning-Ensemble Imbalance Learning



Introduction
During the final approach phase of a flight, bad weather conditions, runway excursions, and unstabilized approaches are the primary causes of nearly half of all aviation accidents worldwide. An unsafe landing can be avoided by initiating a missed approach (MAP) protocol. Although the protocol exists to prevent unsafe landings, the complex maneuvering and the limited time available can themselves increase risk, especially in extreme weather conditions. MAPs also adversely affect airport throughput and the on-time performance of flights, and they increase the workload of air traffic controllers and noise levels [1][2][3][4]. Most MAPs are made at low altitudes and slow speeds; therefore, several actions must be taken at once, such as altering the altitude, thrust, and flight path of the aircraft, while ensuring that no conflicts with nearby air traffic arise as a result. Since safe and effective MAP execution depends on both the pilot and the air traffic controller, their roles are crucial.
While wind shear is a local meteorological phenomenon, it may be associated with larger-scale phenomena, such as thunderstorms and cold fronts, on the mesoscale in both space and time. The International Civil Aviation Organization (ICAO) specifies low-level wind shear as a sustained change of 15 knots or more in headwind or tailwind within 1600 feet above ground level, and this is a crucial weather phenomenon from an aviation perspective. It alters the lift of the aircraft, which can cause it to deviate from its intended approach path and put both arriving and departing aircraft at risk [5,6]. This can ultimately lead to the execution of MAPs (Figure 1).
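The threshold definition quoted above can be expressed as a simple check. The sketch below is illustrative only: the function name, its signature, and the signed convention for the wind change are assumptions made for this example, not part of any ICAO or HKIA interface.

```python
# Thresholds taken from the definition quoted in the text. The function name
# and signature are illustrative assumptions.
SHEAR_THRESHOLD_KT = 15.0   # sustained headwind/tailwind change, in knots
MAX_ALTITUDE_FT = 1600.0    # upper bound of the "low-level" layer, in feet AGL


def is_low_level_wind_shear(wind_change_kt: float, altitude_ft: float) -> bool:
    """Return True when a reported change meets the low-level wind shear criteria.

    wind_change_kt is treated as signed (gain or loss); only its magnitude
    is compared against the threshold.
    """
    within_layer = 0.0 <= altitude_ft <= MAX_ALTITUDE_FT
    return within_layer and abs(wind_change_kt) >= SHEAR_THRESHOLD_KT
```

Under these assumptions, a 20-knot loss reported at 500 ft qualifies, whereas the same loss reported at 2000 ft does not.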
Low-level wind shear during the final approach phase places pilots under enormous pressure and can have two potentially catastrophic effects on the aircraft: (1) it can destabilize the glide path, and (2) it can cause the approach speed to deviate from the predefined threshold [7]. The effects of decreasing and increasing headwind shear on an aircraft are depicted in Figure 2 under the assumption of no pilot intervention and a standard instrument landing system (ILS) approach with a glide slope of 3 degrees. The first case, depicted in Figure 2(a), is the decreasing headwind experienced by an incoming aircraft. The aircraft's airspeed (its speed relative to the air flow around it) decreases as it gets closer to the ground, which reduces lift and typically results in a steeper descent angle due to the momentary force imbalance. If this occurs, the aircraft may touch down short of the runway. The pilot in this case may increase the throttle to go around (MAP) and make a second attempt. With the same 3-degree glide slope but a rising headwind (Figure 2(b)), the aircraft's airspeed increases relative to the surrounding air flow, creating more lift that tends to result in a flatter angle of descent or even a climb. In this case, the pilot can choose to abort the landing and proceed with a MAP.
In either case, MAP may be activated, and pilots and controllers must be able to collaborate in order to make MAP decisions based on anticipated weather conditions, which is essential for the safe approach of aircraft.
Numerous studies have examined the criteria and variables that contribute to MAP occurrences, and researchers have tried a variety of methods for predicting and modeling MAPs from a wide range of input factors. Using statistical and machine learning models, Zaal et al. [8] assessed the influence of environmental factors on MAPs. They noticed that visibility, wind speed, and localizer deviation significantly impact MAP decision-making. Chou et al. [9] used machine learning models to analyze the causes of MAPs. They observed that the categorical boosting model performed better than other models and that factors such as visibility, wind speed, and pressure were among the most important causes of MAPs. Donavalli et al. [10] utilized a statistical method to evaluate the weather factors influencing MAPs. The results indicated that thunderstorms and winds exceeding 29 miles per hour significantly increase the likelihood of a MAP, whereas visibility did not show a significant impact. Proud [11], who employed and compared various MAP detection methods, attributed numerous MAP occurrences to adverse weather conditions, particularly convective storms on the approach path.
In addition, several researchers have emphasized the modeling of MAPs resulting from an unstable approach and a change in runway configuration. Using the sparse variational Gaussian process (SVGP) model, Singh et al. [12] developed a framework to model the aircraft's 4D trajectories during the final approach phase. The experimental analysis revealed that SVGP delivers an interpretable probabilistic bound for aircraft parameters that can be used to assess deviation and detect anomalies in real time. To predict the occurrence of MAPs, the authors of [13] used a logistic regression model based on principal component analysis. They found that factors such as flight spacing, approach stability, departure air traffic, and ceiling have a significant impact on MAPs. A number of researchers have also investigated MAPs with a focus on the performance and behavior of pilots and air traffic controllers. Causse et al. [14] found that MAPs are tied to unpleasant psychological effects: the uncertainty of a decision's outcome temporarily compromises pilot decision-making and cognitive performance. The authors of [15] analyzed anomalies in pilot flying performance during the execution of MAPs, including flight path deviations and visual scanning behaviors. According to Jou et al. [2], situational unawareness by air traffic controllers was the major cause of MAP occurrences. Kennedy et al. [16] observed that the age and experience of air traffic controllers have significant effects on MAP decision-making.
Although machine learning models are more flexible than statistical models, their predictions are less transparent because of their black-box nature. A comprehensive explanation of how the model actually works is therefore just as important as its accuracy for evaluating its effectiveness. Traditional methods for interpreting the outcomes of machine learning models rely on feature ranking analysis, such as importance scores based on permutation. Although feature ranking can order the different factors by significance, it cannot show interactions among factors or how much each factor influences the model's prediction. Recent studies have utilized the posthoc SHapley Additive exPlanations (SHAP) interpretation tool, which is based on game theory, to assess the effect of different factors on the outcome [17]. Using SHAP in conjunction with machine learning models, it is possible to assess the importance and relative contribution of various factors to the prediction. SHAP has been used in a variety of fields, such as the safety assessment of infrastructure projects [18,19]; clinical, medicine, and healthcare modeling [20][21][22][23][24][25][26][27]; transportation and traffic safety [28][29][30][31][32][33][34][35][36][37][38]; finance and economics risk analysis [39][40][41][42]; and offshore safety analysis [43].
Advances in Meteorology
Although MAPs are exceptionally rare, being able to foresee when they might occur in response to extreme weather is crucial. Prior research has provided useful insights into the numerous factors that contribute to MAPs; however, to the best of our knowledge, no study has accounted for the impact of low-level wind shear while using ensemble imbalance learning strategies in conjunction with the SHAP interpretation tool, and this area remains largely unexplored in the existing literature.
This study aims to quantify the factors that contribute to MAPs at Hong Kong International Airport (HKIA) triggered by extreme weather and situational factors. It employs three state-of-the-art machine learning-ensemble imbalance learning techniques, namely the self-paced ensemble (SPE) framework [44], the balance cascade model [45], and the easy ensemble model [45], with three state-of-the-art machine learning classifiers as base estimators, namely random forest [46], light gradient boosting machine (LGBM) [47], and extreme gradient boosting (XGBoost) [48].
The intent of this paper is threefold. First, the outcomes of our study would help pilots, air traffic controllers, and aviation policymakers understand the factors that raise the likelihood of a MAP occurring. Second, MAPs and the conditions that give rise to them may be regarded as unusual; therefore, analyzing the causes of MAP occurrences can help identify specific measures to lessen the frequency with which MAPs are executed. Third, the number of MAPs can be reduced by implementing mitigation strategies such as adjusting protocols, flight training, and technical equipment designs.
To that end, we first examine the HKIA-based pilot reports (PIREPs) to identify the causes of the MAPs. In civil aviation, pilot reports are commonly referred to by the acronym PIREP. Pilots who execute a MAP or encounter other forms of rough weather are directed to notify air traffic controllers via PIREPs. Low-level wind shear conditions (the magnitude, altitude, and encounter location from the runway threshold, as well as its causes), type of aircraft (narrow or wide-body), precipitation (clear sky or rainfall), flight type (HKIA inbound international or domestic), arrival (approaching) runway (07L, 07R, 25L, and 25R), and temporal factors including season of the year and time of day can all contribute to MAPs. After establishing which factors are related to MAPs, we build machine learning-ensemble imbalance learning models and test their efficacy. The SHAP interpretation method is then used to determine how much each factor contributes to MAP occurrence.
The remainder of this paper is structured as follows: Section 2 details the study's methodology, including its data source and analysis phases, machine learning-ensemble imbalance learning models, and posthoc SHAP explanation approach. In Section 3, we present the performance assessment of machine learning-ensemble imbalance learning models as well as posthoc explanation results via SHAP. In Section 4, we summarize our findings and discuss their relevance.

Method and Data
In this study, machine learning-ensemble imbalance learning techniques, including self-paced ensemble (SPE), balance cascade, and easy ensemble, were used with state-of-the-art machine learning models as base estimators, i.e., random forest (RF), light gradient boosting machine (LGBM), and extreme gradient boosting machine (XGBoost). The PIREP data were first preprocessed to handle missing values and to remove irrelevant information, such as hazardous weather conditions reported during takeoff. Then, for the development and assessment of the machine learning-ensemble imbalance learning techniques, the data were split into training (70%) and testing (30%) datasets. As part of model development, the hyperparameters of LGBM, XGBoost, and RF (base estimators) were tuned using Bayesian optimization in combination with 10-fold cross-validation. Machine learning-ensemble imbalance learning techniques with fine-tuned base estimators were used for performance evaluation and model comparison. To further interpret the effect of each factor on MAP occurrences, the optimal model was used to calculate Shapley additive values using the posthoc SHAP analysis tool, which provides factor importance and contribution analysis as well as factor interaction analysis. The overall conceptual framework proposed in this study is shown in Figure 3.

Study Location.
HKIA (IATA code: HKG, ICAO code: VHHH) is the main airport of Hong Kong and is located on the artificial island of Lantau off the subtropical coast of the Chinese mainland (Figure 4). Hong Kong's regular convective weather consists of tropical cyclones and the southwest monsoon. In addition to causing flight delays, the convective weather also tends to bring thunderstorms and downpours to the area. HKIA is one of the airports most susceptible to low-level wind shear. Numerous observation-based and simulation studies have demonstrated that HKIA's complex land-sea contrast and intricate orography are favorable conditions for the emergence of low-level wind shear [49]. A significant low-level wind shear event takes place approximately once every 400-500 flights [50]. From 1998, when HKIA first opened, through 2015, 97% of reports indicated low-level wind shear between 15 and 25 knots [51].

Data Processing from Pilot Reports (PIREPs).
The aviation sector generally abbreviates pilot reports as PIREPs. Pilots alert air traffic controllers when they encounter potentially dangerous weather conditions. Turbulence, icing, and the status of the flight path are typical aspects covered in PIREPs. Since HKIA is particularly prone to low-level wind shear, specific information about its occurrence is also provided in HKIA-based PIREPs. This includes the type of aircraft and the flight, the vertical encounter locations of low-level wind shear (such as 200 ft, 500 ft, and 1000 ft), the horizontal encounter locations of low-level wind shear from the runway threshold (such as 1 MF, 2 MF, and 3 MF), the magnitude of low-level wind shear, and the date and time of its occurrence. The pilot may also report a MAP in the HKIA-based PIREPs if it is executed due to low-level wind shear caused by a sea breeze or gust front, as shown in Table 1. It should be noted that shear from a headwind and from a tailwind is represented by plus and minus signs, respectively. From 2017 to 2021, PIREPs recorded a total of 1731 occurrences of low-level wind shear on both outbound and inbound flights. Of these, 1388 (80.18%) were reported by HKIA inbound flights and 343 (19.82%) by HKIA outbound flights. As this study considers only low-level wind shear-induced MAPs, the data reported by arriving flights were kept in the dataset, while those from outbound flights were excluded. In addition, the dataset was preprocessed to remove extraneous data. Once redundant and erroneous data were excluded, a dataset of 765 low-level wind shear occurrences was obtained, in 184 of which a MAP was observed.
Furthermore, a binary classification problem was set up by labeling all MAPs (the minority class) as "1" and all approaches (APs, the majority class) as "0." Table 2 lists all the variables and provides descriptions of each.
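The labeling and the 70/30 split can be sketched as follows. The class counts (184 MAPs among 765 records) are taken from the text; the use of a stratified split, the random seed, and the function names are assumptions made for illustration, since the sampling procedure behind the split is not specified here.

```python
import random


def encode_label(outcome: str) -> int:
    """Binary target: 1 for a missed approach (MAP), 0 for an approach (AP)."""
    return 1 if outcome == "MAP" else 0


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Split indices so that each class keeps roughly the same share in both parts.

    Stratification is an assumption of this sketch, not a stated detail of the study.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class counts reported in the text: 765 records, 184 of them MAPs.
labels = [encode_label("MAP")] * 184 + [encode_label("AP")] * (765 - 184)
train_idx, test_idx = stratified_split(labels)

# Majority-to-minority ratio that motivates imbalance learning.
imbalance_ratio = labels.count(0) / labels.count(1)
```

With these counts there are roughly three approaches for every MAP, which is why plain accuracy is a poor guide for this problem.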

Machine Learning-Ensemble Imbalanced Learning Techniques.
In this study, three ensemble imbalanced learning techniques were used for the prediction of MAP occurrence: the self-paced ensemble (SPE) framework, the balance cascade model, and the easy ensemble. These ensemble imbalanced learning techniques combine resampling and ensemble learning models. We incorporated LGBM, XGBoost, and RF as the base estimators to evaluate the performance of ensemble imbalanced learning techniques in MAP prediction. The details of each ensemble imbalanced learning technique are provided below, while the details of the base estimators are provided in Appendix A-1.

Easy Ensemble Approach.
Easy ensemble is a straightforward approach and is known for its high performance. It is essentially an integration of an undersampling technique and a base estimator. It first draws several subsets of the majority class independently and then trains a base estimator on each subset. The final robust ensemble is built by combining all of the generated classifiers. The working procedure of EasyEnsemble is described in Algorithm 1 [45].
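A minimal sketch of this sampling-and-combining scheme is given below. It is not the configuration used in this study: the base estimator is a toy nearest-centroid scorer standing in for the boosted learners, the data are synthetic, and all function names are hypothetical.

```python
import random


def fit_centroid_scorer(X, y):
    """Toy base estimator (a stand-in for the boosted learners in the paper).

    Returns a function whose output is positive when a point lies closer to
    the minority-class centroid than to the majority-class centroid.
    """
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_min = centroid([x for x, t in zip(X, y) if t == 1])
    c_maj = centroid([x for x, t in zip(X, y) if t == 0])

    def score(x):
        d_min = sum((a - b) ** 2 for a, b in zip(x, c_min))
        d_maj = sum((a - b) ** 2 for a, b in zip(x, c_maj))
        return d_maj - d_min

    return score


def easy_ensemble(X, y, n_subsets=10, seed=0):
    """Train one base estimator per balanced subset and add up their scores."""
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == 0]
    scorers = []
    for _ in range(n_subsets):
        # Independent random under-sample of the majority class, |B_j| = |A|.
        subset = rng.sample(majority, len(minority))
        scorers.append(
            fit_centroid_scorer(minority + subset,
                                [1] * len(minority) + [0] * len(subset))
        )

    def predict(x):
        # Combine all members by summing their scores; 1 = MAP, 0 = AP.
        return 1 if sum(s(x) for s in scorers) > 0 else 0

    return predict


# Synthetic two-feature data (hypothetical, not the PIREP dataset).
_rng = random.Random(42)
X_maj = [[_rng.gauss(0, 0.5), _rng.gauss(0, 0.5)] for _ in range(200)]
X_min = [[_rng.gauss(3, 0.5), _rng.gauss(3, 0.5)] for _ in range(20)]
predict = easy_ensemble(X_maj + X_min, [0] * 200 + [1] * 20)
```

Because every member sees all minority instances but a different slice of the majority class, the ensemble as a whole uses far more of the majority data than any single balanced subset would.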

Balance Cascade Approach.
In this approach, multiple balanced subsets of data are generated, and a weak classifier is learned for each subset. The approach shrinks the majority-class training set at each step by removing all correctly classified instances. It differs from EasyEnsemble in two ways. First, each classifier is adjusted in accordance with the false positive rate that it must achieve. Second, instances that have been correctly classified are eliminated. This sequential dependence focuses primarily on minimizing redundant information in the majority class. The operational procedure of the balance cascade is described in Algorithm 2 [45].
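The cascade can be sketched in the same spirit. This is a simplified illustration under stated assumptions: the toy centroid scorer again replaces the boosted base estimator, the adjustment is implemented as a per-member decision threshold chosen to hit a target false positive rate, that rate is taken as the (T - 1)-th root of |A|/|B| following the original balance cascade description in [45], and all names and data are hypothetical.

```python
import random


def fit_centroid_scorer(X, y):
    """Toy base estimator: positive score means closer to the minority centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_min = centroid([x for x, t in zip(X, y) if t == 1])
    c_maj = centroid([x for x, t in zip(X, y) if t == 0])

    def score(x):
        d_min = sum((a - b) ** 2 for a, b in zip(x, c_min))
        d_maj = sum((a - b) ** 2 for a, b in zip(x, c_maj))
        return d_maj - d_min

    return score


def balance_cascade(X, y, n_rounds=4, seed=0):
    """Sequentially train members while discarding easy majority instances."""
    if n_rounds < 2:
        raise ValueError("n_rounds must be at least 2")
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == 0]
    # Per-round false positive rate, chosen so that the majority set shrinks
    # to about the minority size after n_rounds - 1 removals.
    fp_rate = (len(minority) / len(majority)) ** (1.0 / (n_rounds - 1))
    members = []
    for _ in range(n_rounds):
        subset = rng.sample(majority, min(len(minority), len(majority)))
        scorer = fit_centroid_scorer(minority + subset,
                                     [1] * len(minority) + [0] * len(subset))
        # Threshold such that a fraction fp_rate of the remaining majority
        # instances scores at or above it (these are the false positives).
        scores = sorted((scorer(x) for x in majority), reverse=True)
        n_keep = max(1, round(fp_rate * len(majority)))
        threshold = scores[n_keep - 1]
        members.append((scorer, threshold))
        # Remove the majority instances this member classifies correctly.
        majority = [x for x in majority if scorer(x) >= threshold]

    def predict(x):
        margin = sum(scorer(x) - thr for scorer, thr in members)
        return 1 if margin > 0 else 0

    return predict


# Synthetic two-feature data (hypothetical, not the PIREP dataset).
_rng = random.Random(7)
X_maj = [[_rng.gauss(0, 0.5), _rng.gauss(0, 0.5)] for _ in range(200)]
X_min = [[_rng.gauss(3, 0.5), _rng.gauss(3, 0.5)] for _ in range(20)]
cascade_predict = balance_cascade(X_maj + X_min, [0] * 200 + [1] * 20)
```

Unlike the easy ensemble, later members here are trained only on majority instances that earlier members found difficult, which is the sequential dependence described above.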

Self-Paced Ensemble Approach.
In this study, we also employed the recently developed SPE framework [44], which is an ensemble-based imbalance learning framework. The basic concepts of hardness harmonize and the self-paced factor are outlined below, followed by the SPE algorithm.
(1) Hardness Harmonize. Each instance from the majority class is sorted into one of the "β" bins according to its hardness value. Every k-th bin indicates a distinct degree of hardness. The dataset is then balanced by under-sampling instances from the majority class while preserving the same total hardness contribution in each bin. Harmonize is the term used to describe this approach in the "gradient-based optimization" literature. The initial iteration employs a similar strategy to harmonize the hardness. Nevertheless, hardness harmonize is not used in all iterations. The main reason is that, as the ensemble classifier learns to fit the training set, the number of trivial instances rises. Consequently, a significant number of trivial instances remains after merely harmonizing the hardness contribution. These less informative instances significantly slow down later iterations of the learning process. Instead, "self-paced factors" have been developed to enable under-sampling to be performed at a user-defined pace.
(2) Self-Paced Factor. Specifically, the sampling probability of bins with a large population is gradually reduced after harmonizing the hardness contribution of each bin. The self-paced factor (Ω) sets the rate of decay. When Ω is large, simple harmonizing of the hardness contribution takes a back seat to a heightened focus on the harder samples. Outliers and noise have less effect on the model's ability to generalize in the early stages of training because the framework prioritizes informative borderline samples. To avoid over-fitting, the framework keeps a reasonable fraction of trivial (high-confidence) instances as the "skeleton," even in later iterations when Ω is large. Algorithm 3 describes in detail how the SPE framework works. It is important to note that the hardness value is updated in each iteration to choose the most useful data instances for the current ensemble. The growth of the self-paced factor is controlled by the tangent function. As such, the self-paced factor is zero in the first iteration and infinite in the last iteration (see Algorithm 3 [44]).
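The interplay between hardness bins and the self-paced factor can be illustrated with a short sketch. The tangent schedule and the weight 1/(mean hardness + Ω) follow the description above; the equal-width binning over [0, 1], the cap at the bin population, the example hardness values, and the function names are assumptions made for illustration.

```python
import math


def self_paced_factor(iteration, n_estimators):
    """Tangent schedule: 0 at iteration 0, unbounded at the last iteration."""
    if iteration >= n_estimators:
        return math.inf
    return math.tan(iteration * math.pi / (2 * n_estimators))


def bin_sample_counts(hardness, n_bins, omega, n_minority):
    """Number of majority instances to draw from each hardness bin.

    hardness: per-instance values in [0, 1] for the majority class.
    """
    bins = [[] for _ in range(n_bins)]
    for h in hardness:
        bins[min(int(h * n_bins), n_bins - 1)].append(h)

    if math.isinf(omega):
        # Limit of 1 / (mean + omega): every non-empty bin weighs the same.
        weights = [1.0 if b else 0.0 for b in bins]
    else:
        weights = [1.0 / max(sum(b) / len(b) + omega, 1e-12) if b else 0.0
                   for b in bins]
    total = sum(weights)
    if total == 0:
        return [0] * n_bins
    # Cap each request at the number of instances actually in the bin.
    return [min(round(w / total * n_minority), len(b))
            for w, b in zip(weights, bins)]


# Hypothetical hardness values: many trivial, few hard majority instances.
hardness = [0.05] * 80 + [0.5] * 15 + [0.95] * 5
early_counts = bin_sample_counts(hardness, 3, self_paced_factor(0, 10), 20)
late_counts = bin_sample_counts(hardness, 3, self_paced_factor(10, 10), 20)
```

Comparing `early_counts` with `late_counts` shows the intended effect: as Ω grows, the share drawn from the crowded, trivial bin falls and the sparsely populated harder bins are sampled more heavily, while some trivial instances are still retained.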

Performance Metrics.
Overall classification accuracy is a common model performance metric that is calculated as the ratio of the total number of correct predictions to the total number of predictions. This metric tends to favor the majority class; therefore, it can be misleading when the data are not evenly distributed. This precludes the use of classification accuracy as the sole performance metric. Several other performance metrics can be used to deal with this issue. For the binary classification task, let $n_1$ denote the sample size of the majority class and $n_2$ that of the minority class. The total number of records of both majority and minority classes in the training set is $n$. The binary classifier determines the likelihood that each instance is positive or negative. Thus, it produces four distinct results: true positive ($\Delta_p$), true negative ($\Delta_n$), false positive ($\nabla_p$), and false negative ($\nabla_n$), as illustrated in the confusion matrix (Figure 5). Several important measures, including balanced accuracy, recall, precision, F1-score, and the geometric mean (G-mean), can be computed from the entries of the confusion matrix.
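For orientation, the sketch below computes these measures from the four confusion matrix entries using their commonly used textbook definitions. These definitions are an assumption of this example and may differ from the exact equations adopted in this study (in particular for balanced accuracy); the counts in the example are hypothetical.

```python
import math


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def imbalance_metrics(tp, tn, fp, fn):
    """Common imbalance-aware metrics from confusion matrix counts.

    tp, tn, fp, fn correspond to the true positive, true negative,
    false positive, and false negative entries, respectively.
    """
    recall = _ratio(tp, tp + fn)            # true positive rate
    specificity = _ratio(tn, tn + fp)       # true negative rate
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "recall": recall,
        "precision": precision,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
        "g_mean": math.sqrt(recall * specificity),
    }


# Hypothetical counts chosen only to illustrate the calculation.
metrics = imbalance_metrics(tp=30, tn=160, fp=40, fn=20)
```

In this made-up example the overall accuracy is higher than the recall on the minority class, which illustrates how accuracy can mask poor minority-class performance.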

Posthoc Interpretation of Ensemble Imbalance Learning Model.
The ensemble imbalance learning model was interpreted with SHAP, a posthoc explanation tool developed by Lundberg and Lee [17]. The fundamental concept underlying SHAP is to calculate the marginal contribution of each factor to the model's outcome and to interpret the results in both a global and a local context. A prediction value is computed for each instance during the training of the model, and the SHAP value corresponds to the value assigned to each factor in the instance. Equation (6) is used to calculate each factor's contribution, which is represented by the Shapley value.
$$\varphi_i = \sum_{\Upsilon \subseteq \Pi \setminus \{i\}} \frac{|\Upsilon|!\,\left(|\Pi| - |\Upsilon| - 1\right)!}{|\Pi|!}\left[f(\Upsilon \cup \{i\}) - f(\Upsilon)\right] \quad (6)$$

where $\varphi_i$ is the contribution of the $i$-th input factor from the dataset, $\Pi$ is the set of all the input factors from the dataset, $\Upsilon$ is a subset of the input factors that excludes the $i$-th factor, and $f(\Upsilon \cup \{i\})$ and $f(\Upsilon)$ are the outcomes with and without the $i$-th factor, respectively. Using an additive factor attribution strategy, the SHAP tool generates an interpretable explanation of the machine learning model. The outcome of the model is represented as a linear sum of all the input factors, as shown by the following equation:

$$g(z') = \varphi_0 + \sum_{i=1}^{\Lambda} \varphi_i z'_i \quad (7)$$

where $z' \in \{0, 1\}^{\Lambda}$, with $z'_i = 1$ if the $i$-th factor is supplied and $z'_i = 0$ otherwise. $\Lambda$ is the number of input factors that are supplied, and $\varphi_0$ is the base value.

(1) Input: Minority class in the training data ($A$), majority class in the training data ($B$), number of subsets $T$ to sample from the majority class instances set, and the number of iterations $k_j$ required to learn the base estimator $H_j$.
(2) $j \Leftarrow 0$
(3) repeat
(4) $j \Leftarrow j + 1$
(5) Randomly sample a subset $B_j$ from $B$, $|B_j| = |A|$
(6) Learn $H_j$ by using $A$ and $B_j$. $H_j$ is the base estimator with $k_j$ weak classifiers $h_{j,l}$ and the corresponding weights $\omega_{j,l}$. The ensemble's threshold is $\Delta_j$, i.e., $H_j(x) = \operatorname{sgn}\left(\sum_{l=1}^{k_j} \omega_{j,l}\, h_{j,l}(x) - \Delta_j\right)$
(7) until $j = T$
(8) Output: The robust ensemble model: $H(x) = \operatorname{sgn}\left(\sum_{j=1}^{T}\sum_{l=1}^{k_j} \omega_{j,l}\, h_{j,l}(x) - \sum_{j=1}^{T}\Delta_j\right)$
ALGORITHM 1: EasyEnsemble Approach.

(1) Input: Minority class in the training data ($A$), majority class in the training data ($B$), number of subsets $T$ to sample from the majority class instances set, and the number of iterations $k_j$ required to learn the base estimator $H_j$.
(2) $j \Leftarrow 0$
(3) The false positive rate $\nabla_{pr} \Leftarrow \sqrt[T-1]{|A|/|B|}$ that model $H_j$ has to achieve. $\nabla_{pr}$ is the rate at which majority class instances are misclassified as the minority class
(4) repeat
(5) $j \Leftarrow j + 1$
(6) Sample a subset $B_j$ randomly from $B$ such that $|B_j| = |A|$
(7) Learn the model $H_j$ by using minority class set $A$ and subset $B_j$. $H_j$ is the base estimator with weak classifiers $h_{j,l}$ and corresponding weights $\omega_{j,l}$. The ensemble's threshold is $\Delta_j$, i.e., $H_j(x) = \operatorname{sgn}\left(\sum_{l=1}^{k_j} \omega_{j,l}\, h_{j,l}(x) - \Delta_j\right)$
(8) Adjust $\Delta_j$ such that the false positive rate of $H_j$ is $\nabla_{pr}$
(9) Remove from $B$ all instances that are correctly classified by $H_j$
(10) until $j = T$
(11) Output: The robust ensemble model: $H(x) = \operatorname{sgn}\left(\sum_{j=1}^{T}\sum_{l=1}^{k_j} \omega_{j,l}\, h_{j,l}(x) - \sum_{j=1}^{T}\Delta_j\right)$
ALGORITHM 2: Balance Cascade Approach.

(1) Input: Training dataset $\{(x_k, y_k)\}_{1}^{n}$, hardness function ($z$), total number of bins ($\beta$), base estimator ($\zeta$), the number of base estimators ($\forall$), minority class in the training data ($A$), majority class in the training data ($B$)
(2) Initialize: With the subsets of majority class ($B'$) and minority class ($A$), train the base estimator $\zeta_0$ by utilizing the random under-sampling approach such that $|B'| = |A|$
(3) for $k = 1$ to $\forall$ do
(4) Ensemble of the base estimators $F_k(x) = \frac{1}{k}\sum_{j=0}^{k-1} \zeta_j(x)$
(5) The majority class dataset is separated into $\beta$ bins with regard to $z(x, y, F_k)$: $(b_1, b_2, \ldots, b_\beta)$
(6) The mean hardness contribution in the $i$-th bin is obtained as $z_i = \sum_{s \in b_i} z(x_s, y_s, F_k) / |b_i|$, $i = 1, 2, \ldots, \beta$
(7) The self-paced factor is updated as $\Omega = \tan\left(\frac{k\pi}{2\forall}\right)$
(8) For the $i$-th bin, the non-normalized sampling weight is obtained as $\theta_i = \frac{1}{z_i + \Omega}$
(9) From the $i$-th bin, perform the under-sampling with $\frac{\theta_i}{\sum_m \theta_m} \cdot |A|$ instances
(10) Using the newly under-sampled data subset, train $\zeta_k$
(11) End
ALGORITHM 3: Self-Paced Ensemble.
In this research, the optimal ensemble imbalanced learning technique was interpreted using the posthoc SHAP analysis tool, and the most important factors likely to cause MAPs were evaluated. Additionally, the SHAP tool in this study performed an analysis of the factor interaction.
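To make the Shapley calculation concrete, the sketch below evaluates it exactly for a small, made-up model with three factors by enumerating every subset. This brute-force enumeration is only feasible for a handful of factors and is not how the SHAP tool computes values for tree ensembles; the value function and its coefficients are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_factors):
    """Exact Shapley values by enumerating all subsets of the other factors.

    value_fn maps a frozenset of factor indices to the model outcome obtained
    when only those factors are supplied.
    """
    phi = [0.0] * n_factors
    for i in range(n_factors):
        others = [j for j in range(n_factors) if j != i]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_factors - size - 1)
                      / factorial(n_factors))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of factor i given the subset s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(supplied):
    """Hypothetical outcome: a base value, two main effects, one interaction."""
    out = 0.2                                  # base value
    out += 0.3 if 0 in supplied else 0.0       # main effect of factor 0
    out += 0.1 if 1 in supplied else 0.0       # main effect of factor 1
    if 0 in supplied and 2 in supplied:        # interaction of factors 0 and 2
        out += 0.2
    return out


phi = shapley_values(toy_model, 3)
base_value = toy_model(frozenset())
```

The interaction term is shared equally between the two factors involved, and the base value plus the sum of the contributions recovers the outcome with all factors supplied, which is the additive property the linear form relies on.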

Results and Discussion
The ensemble imbalance learning techniques with various base estimators were used in conjunction with HKIA-based PIREPs to address the imbalanced data problem and predict the occurrence of MAPs under low-level wind shear conditions. Table 3 provides the aggregate statistics for all of the factors in the HKIA-based PIREPs. We used Pearson correlation analysis to look for correlations between the various factors in the PIREPs. As Figure 6 shows, the absolute values of the Pearson correlation coefficients range from 0.019 to 0.62, indicating weak to moderate correlations. Although we found a negative Pearson correlation coefficient of −0.62 between the causes of low-level wind shear and precipitation, we decided not to exclude either factor from further modeling because the relationship between them is only moderate. Both factors are environment-specific and could have a major effect if incorporated into the model. The analysis was conducted in the Python programming environment, and the Python code can be found in Appendix B.
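The Pearson coefficient used in this screening step can be computed as sketched below. This is a plain-Python illustration with a hypothetical function name and made-up inputs; it is not the code from Appendix B.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Applying such a function to every pair of encoded factors yields the kind of correlation matrix summarized in Figure 6.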
To commence the modeling process, the PIREP data were divided into two subsets: 70% of the data were used to train the ensemble imbalance learning models, while the remaining 30% were used to evaluate their performance. Before the models were developed, the hyperparameters of the machine learning models (the base estimators of ensemble imbalance learning) were tuned. This is a critical step because it influences generalization capability, helps prevent over-fitting, and reduces model complexity. A Bayesian optimization strategy was employed to obtain the optimal hyperparameters for the base estimators by maximizing the G-mean. A 10-fold cross-validation approach was used in conjunction with Bayesian optimization; it randomly splits the training set into 10 subsets (each time, nine subsets are used for training and one for testing). Table 4 lists the hyperparameters of the different base estimators of the ensemble imbalance learning models together with the search space and optimal values.
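The cross-validation part of this procedure can be sketched as follows. The Bayesian optimizer itself is not shown; the sketch only illustrates the objective such an optimizer would evaluate for each candidate hyperparameter setting. The function names, the seed, and the callback interface are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def cross_validated_score(fit_and_score, n_samples, k=10, seed=0):
    """Average a user-supplied score (for example, G-mean) over the k folds.

    fit_and_score(train_idx, val_idx) is a hypothetical callback that trains
    a model on the training indices and returns its validation score.
    """
    scores = [fit_and_score(train, val)
              for train, val in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

A tuner would call `cross_validated_score` once per candidate setting and keep the setting with the highest mean score.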

Performance Assessment of Base Estimators of Ensemble Imbalance Learning Models.
In this study, the positive and negative classes are referred to as MAP and AP, respectively. Initially, each base estimator (LGBM, RF, and XGBoost) was evaluated separately, without the use of an ensemble imbalance learning model. Using the testing dataset, the confusion matrix (Figure 7) was built, and the required performance metrics of balanced accuracy, precision, recall, F1-score, and G-mean were derived. It is crucial to recognize that, unlike the AP class, the MAP class is a minority class and that we are more concerned with the correct classification of this class.
Only 28 out of 61 instances that were actually MAPs were correctly classified by the LGBM model on the testing dataset, as shown in Figure 8(a). Meanwhile, 33 instances were incorrectly classified, resulting in a recall value of 46.24%. Although 181 out of 194 instances of the majority class AP were correctly classified, this class is not the focus of the study. The LGBM model was thus able to classify majority-class instances with a high degree of precision but was unable to classify minority-class instances effectively, which is the outcome of interest. The overall balanced accuracy was quite low at 41.40%, while the F1-score was 55.43% and the G-mean was 65.33%. The XGBoost model was evaluated next, and its confusion matrix was obtained. Figure 8(b) demonstrates that 23 out of the 58 actual MAP instances were correctly predicted as MAPs, while 35 were incorrectly predicted as APs, resulting in a recall value of 40.21%. The results obtained from the XGBoost model were lower than those obtained from the LGBM model, with a balanced accuracy of 34.32%, an F1-score of 46.65%, and a G-mean of 60.41%. The RF model was then evaluated, and its confusion matrix was obtained. For the standalone RF model, Figure 8(c) shows that only 27 instances of actual MAPs were correctly predicted as MAPs, while 31 were incorrectly predicted as APs. This case also had a high percentage of incorrect classifications, yielding a recall value of 46.75%, precision of 66.42%, F1-score of 55.46%, balanced accuracy of 41.30%, and G-mean of 66.04%, as shown in Table 5.
For all three state-of-the-art machine learning models used as base estimators of ensemble imbalance learning models, the percentage of correctly classified MAPs was quite low, so none of them can be used as a standalone model for the classification of MAPs. RF was the best of the three machine learning models; however, as a standalone classifier, it still fails to show a promising result.

Performance Assessment of Ensemble Imbalanced Learning Models with Different Base Estimators.
Although we employed state-of-the-art machine learning models for the prediction and classification of MAPs, we observed that the minority class MAP was poorly classified even with fine-tuned LGBM, XGBoost, and RF. To improve the correct classification of the minority class MAP, the SPE, balance cascade, and easy ensemble models were used with LGBM, XGBoost, and RF as their base estimators. These base estimators were used with the optimal hyperparameters previously obtained via Bayesian optimization. First, LGBM, XGBoost, and RF were used as base estimators for the SPE framework; then, LGBM, XGBoost, and RF were used as base estimators for the balance cascade and finally for the easy ensemble model. Figure 8 depicts the confusion matrices of the SPE, balance cascade, and easy ensemble models; Table 6 reports the performance indicators derived from these matrices. In the case of the SPE framework using XGBoost as the base estimator, 48 out of 61 instances were correctly classified, yielding a recall value of 79.69%. The balanced accuracy, precision, recall, F1-score, and G-mean were 59.68%, 50.11%, 79.69%, 61.25%, and 78.14%, respectively. These results were superior to those of the other models. The next model in the sequence was the balance cascade model with XGBoost as the base estimator, which had a G-mean value of 77.31% and a balanced accuracy of 59.22%. The SPE model with LGBM as the base estimator performed the worst out of all the models, with a G-mean value of 70.02% and a balanced accuracy of 49.33%.
As shown in Figures 9(a) and 9(b), among the three ensemble imbalance learning models with three machine learning base estimators, the best result was achieved by the SPE framework with XGBoost as the base estimator, with a G-mean of 78.14% and a balanced accuracy of 59.68%. It was followed by the balance cascade model with XGBoost as the base estimator, with a G-mean of 77.31% and a balanced accuracy of 59.22%. The balance cascade with RF as the base estimator (G-mean 74.71%, balanced accuracy 54.22%) ranked third, and the fourth-ranked configuration achieved a G-mean of 74.15% and a balanced accuracy of 53.71%. The results of the top two models in terms of G-mean and balanced accuracy are quite close to each other, and both can be considered optimal models for the classification and prediction of MAPs. Furthermore, the best model was used in conjunction with SHAP analysis to identify the significant factors and to assess the interactions among risk factors.

Sensitivity Analysis.
The development of a precise MAP prediction model is crucial because more accurate models can describe the relationship between MAPs and various environmental and situational factors more effectively. The ability to interpret the outcomes of ensemble imbalance learning models is just as vital. To interpret the results of the best model (SPE with XGBoost) and determine the effect of particular risk factors and their interactions, SHAP was applied, as discussed in this section.

Factor Importance and Contribution. SPE with XGBoost as the base estimator was the optimal model for the prediction of MAPs based on its performance measures, and therefore, we used it to analyze the importance and contribution of specific factors. It is important to note that "factor importance" and "factor contribution" are not synonymous. The importance of a factor indicates the extent to which it contributes to the model's accuracy, whereas the factor contributions explain how individual factors drive the predicted outcome (MAP or AP). Figure 10(a) shows the SHAP global importance scores of the factors in the SPE with the XGBoost base estimator. It demonstrates that low-level wind shear magnitude, with a mean SHAP value of +0.210, is the most important factor causing MAPs, followed by runway orientation, with a mean SHAP value of +0.160, and the altitude of low-level wind shear, with a value of +0.110. Nonetheless, this result does not reveal how much each factor contributed to the overall probability of a MAP occurrence. The SPE with XGBoost was therefore investigated further via a SHAP contribution evaluation using SHAP beeswarm plots (Figure 10(b)). In the SHAP contribution plots, the Shapley values are combined into a quantitative value that expresses the factor contributions of the SPE model. On the vertical axis, the input factors are listed in order of decreasing influence, from most influential to least. The horizontal axis shows the SHAP value, and the color scale runs from blue (low factor value) to red (high factor value).
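The distinction between importance and contribution can be made concrete with a small sketch. It assumes, as in the SHAP library's global bar plot, that the global importance of a factor is the mean of the absolute SHAP values across samples; the SHAP matrix and the short factor names below are hypothetical and do not reproduce the values in Figure 10.

```python
def global_importance(shap_rows, names):
    """Mean absolute SHAP value per factor, sorted from most to least important."""
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-sample SHAP values (rows: samples, columns: factors).
names = ["magnitude", "runway", "v_location"]
shap_rows = [
    [0.30, -0.10, 0.05],
    [-0.20, 0.20, -0.15],
    [0.10, -0.15, 0.10],
]

ranking = global_importance(shap_rows, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")

# The signed mean, by contrast, lets positive and negative contributions cancel.
signed_mean_magnitude = sum(row[0] for row in shap_rows) / len(shap_rows)
```

In this toy example the signed mean for "magnitude" is much smaller than its mean absolute value, which is why a global importance score alone cannot show whether a factor pushes predictions toward or away from a MAP; the per-sample view of the beeswarm plot is needed for that.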
The SHAP beeswarm plot of the SPE with XGBoost as the base estimator illustrated that most of the tailwind events resulted in the initiation of MAPs. In this case, the aircraft may not be able to touch down at the designated touchdown location. The result is consistent with previous research [52,53], in which strong tailwinds caused by thunderstorm downdrafts were also observed together with MAPs. The orientation of the runway was the second most influential factor. Runways 07C and 07R were more prone to MAP initiation. Previous studies [54][55][56] employing numerical simulation and wind tunnel testing also observed that southerly or southeasterly gusts at HKIA are more likely to trigger low-level wind shear, which can have a significant impact on runways 07C and 07R. Therefore, approaching aircraft could experience severe low-level wind shear and initiate a MAP. Given that MAPs have become a safety concern, runways 07C and 07R should not be utilized for landings during low-level wind shear events.
The third most crucial factor was the low-level wind shear V_Location. According to Figure 11(b), low-level wind shear events at lower altitudes led to the high number of MAPs. The outcomes are in line with the wind tunnel tests of [57], which showed that about 55% of severe low-level wind shear occurs below 600 feet, so this range could be regarded as a critical zone for aircraft on the final approach. The combination of bad weather, complicated terrain, and nearby buildings causes more turbulence along the glide path. Because of this, the cockpit crew is continuously occupied during the landing phase, and the captain and copilot must make a number of split-second decisions while completing the landing checklist. The best course of action when unexpected low-level wind shear occurs very close to the runway is to abort the landing attempt and perform a go-around.
A previous study [58] demonstrated that 300 ft above the ground might be an acceptable height for initiating a MAP, provided that no environmental factors are involved. However, when low-level wind shear is present in the proximity of the runway, initiating a MAP at 300 ft above the ground might be very dangerous.

Factor Interaction.
The factor importance and contribution (beeswarm) plots revealed no evident connection between the shift in factor value and the change in SHAP value. By analyzing the interaction plots provided by SHAP, we were able to assess how much each input factor affected the final score obtained from SPE with XGBoost as the base estimator. The impact of low-level wind shear V_Location and runway orientation on model predictions is shown in Figure 11(a). For V_Location from 0 to 500 ft, the points with high densities lie above the SHAP 0.00 reference line. The majority of these points have blue labels, which indicate Runways 07C and 07R. This demonstrates that the majority of the MAPs were seen on these runways at lower altitudes. As a result, caution should be used when landing on these runways within 500 feet above ground because low-level wind shear could occur. The impact of low-level wind shear magnitude in relation to low-level wind shear V_Location is also of interest, and this relationship is shown in Figure 11(b). All headwinds are shown on the horizontal axis to the right of reference point 0, and all tailwinds are shown to the left of reference point 0. The graph shows that MAPs were mostly initiated at low altitude and with a tailwind. Because the recovery time margin is limited when low-level wind shear occurs at low altitude, pilots abort the landing and proceed to a second attempt to reduce the risk of landing short of the runway. As a result, pilots at HKIA need to be more vigilant during strong tailwind situations.

Figure 12: Bootstrapping and aggregation in random forest.

Figure 13: Tree learning mechanism of XGBoost.
Additionally, as shown in Figure 11(c), we evaluated the effects of the Season of the Year factor and runway orientation. There were more blue dots than red, which demonstrated that runways 07C and 07R are extremely susceptible to the occurrence of MAPs year-round. However, the majority of the MAPs were seen in the summer. The pilots' final approach might have been affected by tropical cyclones and southern monsoon winds. These results are also in line with earlier research findings [59][60][61][62].

Input: Training data $D = \{(X_k, Y_k)\}_{k=1}^{M}$, loss function $L(Y, \varphi(X))$, iterations $J$, sampling ratio of large-gradient data $\alpha$, sampling ratio of small-gradient data $\beta$
(1) Using the exclusive feature bundling (EFB) approach, combine the factors $X_k$ that are mutually exclusive, i.e., that never take nonzero values simultaneously
(2) Set $\varphi_0(X) = \arg\min_c \sum_{k=1}^{M} L(Y_k, c)$
(3) for $j = 1$ to $J$ do
(4) Compute the absolute values of the gradients $\lambda_k = \left| \frac{\partial L(Y_k, \varphi(X_k))}{\partial \varphi(X_k)} \right|_{\varphi(X) = \varphi_{j-1}(X)}$
(5) Resample the data set using the GOSS approach: $D' = A + B$
(6) Compute the information gains: $V_e(d) = \frac{1}{n}\left[\frac{\left(\sum_{x_k \in A_l} r_k + \frac{1-\alpha}{\beta}\sum_{x_k \in B_l} r_k\right)^2}{n_l^e(d)} + \frac{\left(\sum_{x_k \in A_r} r_k + \frac{1-\alpha}{\beta}\sum_{x_k \in B_r} r_k\right)^2}{n_r^e(d)}\right]$
(7) Develop a new decision tree $\varphi_j'(X)$ on the set $D'$
(8) Update $\varphi_j(X) = \varphi_{j-1}(X) + \varphi_j'(X)$
(9) End for
(10) Return $\varphi(X) = \varphi_J(X)$
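The GOSS resampling in step (5) can be sketched as follows. This is a simplified illustration and not LightGBM's internal implementation: it assumes that both ratios are fractions of the full data set, that set A holds the $\alpha \cdot n$ instances with the largest absolute gradients, that set B is a random draw of $\beta \cdot n$ instances from the remainder, and that the instances in B are up-weighted by $(1 - \alpha)/\beta$, the factor that appears in step (6). The function name `goss_sample` and the gradient values are hypothetical.

```python
import random

def goss_sample(gradients, alpha, beta, seed=0):
    """Return (indices, weights) selected by gradient-based one-side sampling."""
    rng = random.Random(seed)
    n = len(gradients)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda k: abs(gradients[k]), reverse=True)
    top_n = int(alpha * n)
    rand_n = int(beta * n)
    set_a = order[:top_n]                      # large gradients: always kept
    set_b = rng.sample(order[top_n:], rand_n)  # random draw from the remainder
    amplify = (1 - alpha) / beta               # compensates for the dropped data
    indices = set_a + set_b
    weights = [1.0] * len(set_a) + [amplify] * len(set_b)
    return indices, weights

# Hypothetical gradients for ten training instances.
gradients = [0.9, -0.05, 0.02, -0.8, 0.1, 0.03, -0.01, 0.6, 0.04, -0.07]
indices, weights = goss_sample(gradients, alpha=0.2, beta=0.3)
print(indices[:2])   # the two largest-gradient instances: [0, 3]
print(len(indices))  # 2 kept + 3 sampled = 5
```

Keeping every large-gradient instance preserves the samples the current model fits worst, while the weight on set B keeps the gradient sums in the information gain approximately unbiased despite the discarded small-gradient data.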

Conclusions and Future Work
This study presents the application of three machine learning-ensemble imbalance learning techniques, namely the SPE framework, balance cascade, and easy ensemble, to the prediction of MAP events. For these three ensemble imbalance learning techniques, three state-of-the-art machine learning models, LGBM, XGBoost, and RF, were used as base estimators. To the best of our knowledge, this is the first work to detect, model, and interpret MAP occurrences from HKIA-based PIREPs data using ensemble imbalance learning techniques in conjunction with SHAP. The MAPs were predicted under low-level wind shear conditions, considering both environmental and situational factors, using 2017 to 2021 low-level wind shear data from HKIA-based PIREPs. Initially, LGBM, XGBoost, and RF were evaluated separately to assess their performance on the imbalanced low-level wind shear data. Afterwards, these models were employed as base estimators for SPE, balance cascade, and easy ensemble, and performance measures were obtained. Machine learning algorithms are often criticized for being opaque and difficult to interpret. However, their greater flexibility and often better reliability in engineering applications, compared with more conventional statistical prediction methods, explain their widespread popularity. In this research, the post hoc SHAP interpretation tool was used to interpret the best-performing ensemble imbalance learning model, and the impact of various factors on the likelihood of a MAP event was demonstrated. From the study, the following conclusions can be drawn:
(i) On the testing dataset, all three machine learning models, LGBM, XGBoost, and RF, even with well-tuned hyperparameters, showed poor performance in predicting MAPs under low-level wind shear conditions, with G-mean values of 65.19%, 60.541%, and 66.04%, respectively.
(ii) The performance of the individual machine learning models varied marginally. The G-mean of LGBM was relatively 7.33% higher than that of XGBoost. Similarly, the G-mean of RF was 1.28% higher than that of LGBM and 8.5% higher than that of XGBoost.
(iii) The SPE framework with XGBoost as the base estimator performed best among all models, with a G-mean value of 78.14% and a balanced accuracy of 59.68%. It was followed by the balance cascade model with XGBoost as the base estimator, with a G-mean value of 77.31% and a balanced accuracy of 59.22%.
(iv) SHAP proved effective in interpreting the outcome of the optimal model (SPE with XGBoost as the base estimator). The low-level wind shear magnitude was the most influential factor, followed by runway orientation and low-level wind shear V_Location.
(v) Most of the MAPs were observed during strong tailwind situations. Runway 07C and Runway 07R were observed to be highly vulnerable to MAPs.
(vi) Similarly, most of the MAPs were initiated within a ceiling of 500 ft. In case of severe low-level wind shear, recovery from MAPs initiated at altitudes as low as 300 ft might be very dangerous. However, in calm weather conditions, 300 ft might be suitable for initiating a go-around protocol.
The proposed method could be used to examine MAPs on a large scale at other airports around the world. It can serve as a benchmark for aviation authorities and researchers interested in air safety. Because aviation and meteorological applications depend on understanding the intricate interactions among the risk factors that influence the occurrence of MAPs, researchers focusing on civil aviation safety could build on this approach. This article only addressed the prediction of MAPs under low-level wind shear conditions, taking context-specific and environmental factors into account. Additional research could combine different machine learning or deep learning approaches with a wider variety of potential risk factors.