Improving the Landslide Susceptibility Prediction Accuracy by Using Genetic Algorithm Optimized Machine Learning Approach



Introduction
Slope instability can develop during mining, water conservancy works, and other construction projects, and it can have catastrophic consequences [1,2]. However, there are few effective and accurate tools for prediction and prevention [3,4]. Accurate prediction of landslide susceptibility provides a basis for the safe operation of projects and helps to determine the stability state of slopes, prevent hazards, and provide safety management solutions. Therefore, constructing a reliable and efficient landslide susceptibility prediction model is of great theoretical and practical significance.
Current research broadly classifies landslide susceptibility prediction models into two categories: traditional and machine learning. Traditional models mainly include analytical methods [5,6], numerical simulations [7,8], and others. These evaluate landslide susceptibility based on the corresponding mechanical theories (e.g., elastoplasticity theory and viscoelastic theory). However, these methods have several limitations. The limit equilibrium method is convenient and straightforward, but its theoretical basis is imperfect: it requires an assumed slip surface, which does not reflect the actual stress conditions on that surface, so the simplified assumptions reduce accuracy [9]. Numerical methods are usually time-consuming, and their accuracy depends heavily on the assessment of the geotechnical and physical parameters [10,11]. As a nonlinear, dynamic, and complex open-loop system, a slope has many risk factors that are difficult to assess. As a result, traditional models face significant limitations in achieving the desired prediction accuracy and effectiveness.
Conventional landslide susceptibility analysis methods are complex and iterative, and they overload the computational system [12]. This has inspired researchers to look for alternative methods for calculating landslide susceptibility. Soft computing techniques can solve highly complex, nonlinear, multivariate problems [13]. Machine learning algorithms such as neural networks and support vector machines have also been used to solve landslide susceptibility problems [14,15]. The increased use of machine learning algorithms has promoted the crossover, integration, and development of such tools with slope engineering problems, providing new ideas and methods for landslide susceptibility prediction [16]. Qi and Tang [17] proposed and compared six integrated artificial intelligence (AI) methods for landslide susceptibility prediction based on metaheuristics and machine learning algorithms and demonstrated that integrated AI methods have great potential in predicting landslide susceptibility. Tien Bui et al. [18] employed machine learning-based techniques to predict the safety factor of slope failures, showing that the multilayer perceptron (MLP) outperformed other machine learning-based models. Chang et al. [19] investigated the performance of eight commonly used machine learning models for predicting slope safety coefficients; parameter optimization and cross-validation were combined with historical slope data to establish a machine learning-based slope safety coefficient prediction system. Lin et al. [20] developed a machine learning (ML) model for landslide susceptibility evaluation and found that the performance and reliability of nonlinear regression methods were slightly better than those of linear regression methods. Huang et al. [21] used 369 recorded landslides and 13 associated conditional factors to study landslide areas. They compared analytic hierarchy process (AHP), general linear model (GLM), information value (IV), binary logistic regression (BLR), multilayer perceptron (MLP), back-propagation neural network (BPNN), support vector machine (SVM), and C5.0 decision tree (C5.0DT) models for prediction and found that machine learning models are more suitable for landslide susceptibility prediction than the heuristic and general statistical models. Machine learning models based on remote sensing (RS) imagery and geographic information systems (GIS) have been widely and effectively implemented for landslide susceptibility prediction. Chang et al. [22] compared the landslide susceptibility prediction performance of supervised machine learning (SML) and unsupervised machine learning (USML) models to further explore their strengths and weaknesses, and they were able to achieve more accurate and reliable prediction results. In summary, machine learning algorithms have become a hot research topic in data mining and classification prediction for landslide susceptibility, but each prediction algorithm has its own limitations [23,24].
Landslide susceptibility evaluation research focuses on high prediction accuracy, which requires researchers to continually seek newer and more robust algorithms for building landslide susceptibility evaluation models [25]. Therefore, it is necessary to find intelligent algorithms with high accuracy and better applicability. Integrated learning can train multiple algorithms, resulting in complementary advantages and better landslide susceptibility prediction results than a single algorithm. Gradient boosting decision tree (GBDT) and eXtreme gradient boosting (Xgboost) are both ensemble learning methods built on decision trees, as is random forest (RF); Xgboost is an optimized engineering implementation of gradient boosting. The purpose of integrated learning is to improve on a single learner's generalization ability and robustness by combining the prediction results of multiple base learners. Achour and Pourghasemi [26] compared RF, support vector machine (SVM), and boosted regression tree (BRT) models for assessing the susceptibility of landslides near roads and found that the RF model achieved the highest prediction accuracy. Pham et al. [27] used 16 landslide condition factors to predict landslide susceptibility and, after comparing traditional models with machine learning models, found that RF predicted more accurately. Achour et al. [28] prepared an inventory map with 12 variables (including geomorphological, geological, hydrological, and environmental factors) to predict landslide susceptibility. The RF and Xgboost models had the same prediction accuracy (AUC) and better prediction performance. Drid et al. [29] selected and evaluated eleven gully erosion condition factors to identify the areas most vulnerable to this hazard, and the results showed that the Xgboost model had the best predictive performance. Xgboost and GBDT have been widely used in various scenarios and achieved good results, but the performance of a single integrated learning model is affected by its parameters. In this paper, we explore how optimizing these models with a heuristic algorithm affects the accuracy of landslide susceptibility prediction.
This study investigates the feasibility of the GA-GBDT and GA-Xgboost algorithms for landslide susceptibility prediction and compares them with numerous machine learning algorithms. First, the collected data are described and analyzed. Then the principles of the GA-optimized GBDT and Xgboost algorithms and the accuracy evaluation criteria of the prediction models are introduced. The performance of the different prediction models on the same data is compared by calculating various evaluation indexes and by quantitative tests based on the receiver operating characteristic (ROC) curve, in order to explore the feasibility of the method in this paper.

Dataset and Predictor Variables.
The database includes 290 cases (156 stable slopes and 134 failed slopes) derived from the information in Feng et al. [30] and Zhou et al. [31]. The database contains the basic slope design parameters: slope height (H), slope angle (β), unit weight (γ), cohesion (c), friction angle (φ), and pore water pressure coefficient (ru). The external trigger considered in this study is the pore water pressure coefficient (ru), defined as the ratio of pore water pressure to overburden pressure (Michalowski, 1995; Kim et al., 1999) [6]. The six parameters chosen are strongly correlated with the geometry and geotechnical properties of the slope and have different degrees of influence on slope stability. Slope instability is recorded as 0, and stability is recorded as 1. Figure 1 shows the violin plots of the six influencing factors for the 290 historical slope cases under the failure and stability scenarios. The advantage of the violin plot is that it gives an intuitive view of how each influencing factor is distributed for stable and failed slopes. The violin plot combines a box plot and a density plot to show the dispersion statistics for each group and the density of the data distribution. Wider violin curves correspond to higher densities and represent areas of concentrated distribution. Here, the data are widely distributed, the distributions of the variables are asymmetric, and each influencing factor spans a broad range under both stable and unstable conditions. From basic plots such as Figure 1, it is not possible to visually identify the parameters that most affect landslide susceptibility.

Principal Component Analysis.
Principal component analysis (PCA) was used to further determine the influence of the six basic characteristics studied on landslide susceptibility. First, PCA investigated the relative contribution of each factor to landslide susceptibility, summarizing and visualizing the collected landslide susceptibility data in order to interpret its variance-covariance structure. PCA also assessed the database to ensure a representative dataset.
As shown in Figure 2, slope height has the highest contribution to PC1 at 41.983%, and pore water pressure has the lowest contribution to PC1 at 5.992%. PCA allows the classification of the slope dataset to be visualized in a two-dimensional plane. In this two-dimensional space, the domains of the two slope states overlap. In addition, some indicators with significant skewness can impact the prediction model. According to the PCA results (shown in Figure 2(a)), the components of the first and second dimensions are visualized (Figure 2(b)). The data distribution areas for the two types of slope states on the first two components are relatively close, with overlapping areas.
A comprehensive analysis of the statistical characteristics of the six influencing factors of the slopes shows that each characteristic presents a different distribution, and the span and density of the values are relatively large. This indicates that the database contains slopes of different heights, slope angles, lithologies, and types. Table 1 shows the variability of the effect of different influencing factors on slope stability. By using completely different types of slopes and using their common characteristics for slope instability prediction, the powerful ability of machine learning algorithms to handle nonlinear data can be better reflected.
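To make the notion of a variable's "contribution" to PC1 concrete, the following is a minimal sketch, not the authors' procedure. It assumes the common convention that a variable's contribution to a component is its squared loading on the unit-length eigenvector, expressed as a percentage; the paper does not state which convention it used. The three-column toy data and all function names are hypothetical.

```python
import math

def standardize(rows):
    """Scale every column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Covariance matrix of rows that are already centred."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / n for b in range(d)] for a in range(d)]

def first_component(cov, iters=1000):
    """Leading eigenvector of a symmetric matrix, found by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def contributions(unit_vector):
    """Percentage contribution of each variable (assumed: squared loading x 100)."""
    return [100.0 * x * x for x in unit_vector]

# Hypothetical toy data: columns 0 and 1 are strongly correlated, column 2 is not.
toy = [[float(i), i + 0.5 * (-1) ** i, float((7 * i) % 5)] for i in range(20)]
pc1 = first_component(covariance(standardize(toy)))
contrib = contributions(pc1)
print([round(c, 2) for c in contrib])
```

Because the eigenvector has unit length, the contributions sum to 100%, which is why a single variable can be said to account for a given share of PC1.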

GBDT Model.
GBDT is a boosting ensemble learning algorithm that is often used for classification and regression problems. Rather than simply adjusting the weights of weak learners, GBDT reduces the residuals after each computation by building a new model in the direction of gradient descent of the residuals [32]. The GBDT model inherits the advantages of statistical models and artificial intelligence methods: it can calculate the relative importance of variables while identifying complex nonlinear relationships [33]. Because a slope system exhibits complex nonlinear relationships, this paper selected GBDT as the core model for the landslide susceptibility judgment problem. The steps for training the GBDT model are as follows:
Step 1: Initialize the model function with the slope training set and the loss function. Each $i$th record of the training slope dataset is of the form $(y_i, \vec{x}_i)$, where $(y, \vec{x})$ are known and $\vec{x}$ refers to the slope features (e.g., unit weight, slope height, etc.). The aim is to predict the value of $y$ (landslide susceptibility) based on $\vec{x}$. Hence, a mapping $F^*(\vec{x}): \vec{x} \rightarrow y$ needs to be identified such that the expected value of the loss function $\vartheta(y, F(\vec{x}))$ is minimized, as given in the following equation:
$$F^*(\vec{x}) = \arg\min_{F(\vec{x})} E_{y,\vec{x}}\, \vartheta\big(y, F(\vec{x})\big).$$
The mapping is approximated by an additive expansion:
$$F(\vec{x}) = \sum_{m=0}^{M} \beta_m\, \tau(\vec{x}; \vec{a}_m),$$
where $\tau(\vec{x}; \vec{a})$ is a base learner with its parameters given by $\vec{a}$. Every iteration attempts to find a better fit for the expansion coefficients $\{\beta_m\}_0^M$ and the parameters of the base function $\{\vec{a}_m\}_0^M$ to achieve a better prediction. In the beginning, the training is initialized by guessing $F_0(\vec{x})$.
Step 2: For $m = 1, 2, \cdots, M$, $M$ regression trees are generated iteratively. At each iteration, the base learner $\tau(\vec{x}; \vec{a}_m)$ is fit by minimizing the $k$-class multinomial negative log-likelihood. The predicted value of the output for $\vec{x}_i$ at the $m$th iteration is given in the following equation:
$$F_m(\vec{x}_i) = F_{m-1}(\vec{x}_i) + \beta_m\, \tau(\vec{x}_i; \vec{a}_m).$$
The base learner $\tau(\vec{x}; \vec{a})$ is a decision tree where, in each iteration $m$, the tree segments the input slope feature space $\vec{x}$ into $Z$ disjoint regions $\{R_{zm}\}_{z=1}^{Z}$ and predicts a separate constant value in each one, as described in the following equation:
$$\tau\big(\vec{x}; \{R_{zm}\}_{1}^{Z}\big) = \sum_{z=1}^{Z} \bar{y}_{zm}\, 1(\vec{x} \in R_{zm}),$$
where $\bar{y}_{zm}$ is the majority class predicted in each region $R_{zm}$, i.e., the majority of the points in the $R_{zm}$ region are predicted to belong to this class. It can also be considered as the class with the highest probability of being predicted in that region, i.e., $\bar{y}_{zm} = \arg\max_{y_{zm}} p_{m, y_{zm}}$. In decision trees, the parameters are the variable being split at each node and the specific value at which the chosen variable is split. These two parameters define the regions $\{R_{zm}\}_{1}^{Z}$ of the partition at the $m$th iteration.
Because the decision tree produces a constant value $\bar{y}_{zm}$ within each region $R_{zm}$, the expansion coefficient and the base learner's value can be reduced to a single coefficient per region, as in the following equation:
$$\gamma_{zm} = \beta_m\, \bar{y}_{zm}.$$
The current approximation $F_{m-1}(\vec{x})$ is updated in each region as depicted in the following equation:
$$F_m(\vec{x}) = F_{m-1}(\vec{x}) + t \cdot \gamma_{zm}\, 1(\vec{x} \in R_{zm}).$$
The shrinkage parameter $0 < t \le 1$ controls the learning rate of the procedure [32,34].
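The additive update with shrinkage described above can be sketched in a few dozen lines. The following is an illustrative simplification under stated assumptions, not the implementation used in the paper: it handles the binary case (stable = 1, failed = 0) with log-loss instead of the k-class likelihood, uses depth-one trees (two regions per iteration), and sets each region's constant with a single Newton step. The toy slope records and all names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals, hessians):
    """Find the (feature, threshold) split that best fits the residuals in the
    least-squares sense, then give each of the two regions a constant value."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X})[:-1]:   # keep both sides non-empty
            left = [i for i, row in enumerate(X) if row[j] <= thr]
            right = [i for i, row in enumerate(X) if row[j] > thr]
            sse = 0.0
            for idx in (left, right):
                mean = sum(residuals[i] for i in idx) / len(idx)
                sse += sum((residuals[i] - mean) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, j, thr, left, right)
    _, j, thr, left, right = best

    def region_value(idx):
        # One Newton step: sum of residuals over sum of second derivatives.
        return sum(residuals[i] for i in idx) / max(sum(hessians[i] for i in idx), 1e-12)

    return j, thr, region_value(left), region_value(right)

def fit_gbdt(X, y, n_trees=50, shrinkage=0.1):
    """Boosting loop: F_m = F_{m-1} + shrinkage * (constant of the region containing x)."""
    positive_rate = sum(y) / len(y)
    f0 = math.log(positive_rate / (1.0 - positive_rate))   # initial guess F_0
    scores = [f0] * len(y)
    trees = []
    for _ in range(n_trees):
        probs = [sigmoid(s) for s in scores]
        residuals = [yi - pi for yi, pi in zip(y, probs)]   # negative gradient of log-loss
        hessians = [pi * (1.0 - pi) for pi in probs]
        j, thr, left_val, right_val = fit_stump(X, residuals, hessians)
        trees.append((j, thr, left_val, right_val))
        for i, row in enumerate(X):
            scores[i] += shrinkage * (left_val if row[j] <= thr else right_val)
    return f0, trees, shrinkage

def predict_proba(model, row):
    """Probability that a slope is stable under the fitted additive model."""
    f0, trees, shrinkage = model
    score = f0 + sum(shrinkage * (lv if row[j] <= thr else rv) for j, thr, lv, rv in trees)
    return sigmoid(score)

# Hypothetical records: [slope angle (deg), friction angle (deg)]; 1 = stable, 0 = failed.
X = [[20, 35], [25, 30], [30, 32], [22, 28], [45, 20], [50, 25], [40, 15], [55, 18]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
model = fit_gbdt(X, y)
print(predict_proba(model, [21, 34]), predict_proba(model, [52, 17]))
```

A smaller shrinkage value slows each update and usually needs more trees to reach the same fit, which is one reason the learning rate and the number of trees are tuned together.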

Xgboost Model.
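Xgboost is widely used in major machine learning tasks and is well suited to landslide susceptibility prediction. Chen and Guestrin [35] added a regularization term to the loss function in Xgboost to control the complexity of the trees and prevent overfitting. The model is represented as follows:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F,$$
where $x_i$ is the $i$th input sample; $\hat{y}_i$ is the model prediction for the $i$th sample; $K$ is the number of trees; $F$ is the set space of trees; and $f_k$ is a function in the set space $F$.
The objective function consists of two parts. The first calculates the error between the predicted value $\hat{y}_i$ and the true value $y_i$. The second is the regularization term, which represents the sum of the complexity of each tree:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \quad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2,$$
where, following the notation of Chen and Guestrin [35], $n$ is the number of samples, $T$ is the number of leaf nodes of a tree, $w$ is the vector of leaf weights, and $\gamma$ and $\lambda$ are penalty coefficients. For the $t$th round, the loss function is approximated by a second-order Taylor expansion around the prediction of round $t-1$:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t),$$
where $g_i$ and $h_i$ are the first and second derivatives of $l$ with respect to $\hat{y}_i^{(t-1)}$, and the loss function can be accumulated over every leaf node as follows:
$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \Big( \sum_{i \in I_j} g_i \Big) w_j + \frac{1}{2} \Big( \sum_{i \in I_j} h_i + \lambda \Big) w_j^2 \right] + \gamma T,$$
where $I_j$ represents the samples in leaf node $j$.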
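The leaf-wise form of the regularized objective has a closed-form minimizer, which the short sketch below works through. It follows the notation of Chen and Guestrin [35], which is assumed here rather than taken from this paper: G and H are the sums of first and second derivatives of the loss over the samples in a leaf, lam is the L2 penalty on leaf weights, and gamma is the per-leaf penalty. The three-sample leaf is a hypothetical example using logistic loss.

```python
import math

def logistic_grad_hess(y_true, raw_score):
    """First and second derivatives of log-loss with respect to the raw score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y_true, p * (1.0 - p)

def leaf_weight(G, H, lam):
    """Weight w* that minimizes G*w + 0.5*(H + lam)*w**2."""
    return -G / (H + lam)

def leaf_objective(G, H, lam, gamma):
    """Objective contribution of one leaf evaluated at its optimal weight."""
    return -0.5 * G * G / (H + lam) + gamma

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Reduction in the objective obtained by splitting one leaf into two."""
    parent = (GL + GR) ** 2 / (HL + HR + lam)
    children = GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam)
    return 0.5 * (children - parent) - gamma

# Hypothetical leaf: two stable slopes (label 1) and one failed slope (label 0),
# all currently at raw score 0, i.e. a predicted probability of 0.5.
pairs = [logistic_grad_hess(label, 0.0) for label in (1, 1, 0)]
G = sum(g for g, _ in pairs)
H = sum(h for _, h in pairs)
w_star = leaf_weight(G, H, lam=1.0)
print(G, H, w_star)
```

Increasing lam shrinks the leaf weight toward zero, and increasing gamma makes every additional leaf more expensive, which is how the regularization term limits tree complexity.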
3.4. Hyperparameter Tuning. Classical machine learning prediction algorithms are sensitive to their hyperparameters, which are crucial for building high-accuracy prediction models for landslide susceptibility. To find the optimal hyperparameters, particle swarm optimization [36], genetic algorithm (GA) [37], artificial bee colony (ABC) [24], grid search [38], and firefly algorithm (FA) [39] have been adopted, amongst others. GBDT and Xgboost have many parameters that are tedious to adjust. Furthermore, each parameter can significantly impact the algorithm's prediction performance and needs to be tuned. Because GBDT and Xgboost integrate multiple decision trees, the global search ability and flexibility of GA are used to compensate for the defects of the GBDT and Xgboost models, including the large number of parameters to tune, slow convergence, and a tendency to fall into local optima.
3.5. GA Feature Selection. GA is a class of randomized search algorithms that draws on natural selection and natural genetic mechanisms in the biological world [40]. After each iteration, the termination criterion is checked to see if convergence has been reached or if the maximum number of iterations allowed has been completed. Convergence occurs when all chromosomes in the population have reached the same fitness level. This indicates that an optimal set of characteristics has been determined, and the process can be terminated. However, because this condition may not always be satisfied, a limit is placed on the number of iterations (the maximum number of iterations = 50). Thus, the algorithm ends if convergence is achieved before the completion of 50 iterations. Otherwise, the maximum number of generations is executed. If neither of these termination criteria is met, the next chromosome population is generated from the previous population by applying tournament selection, mutation, and crossover operations. Figure 3 shows the principle and optimization flowchart of the GA.
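The loop described above (evaluate fitness, check the two termination criteria, then build the next population by tournament selection, crossover, and mutation) can be sketched as follows. This is an illustration under assumptions, not the paper's configuration: the search space, population size, mutation rate, and the `toy_fitness` function are all hypothetical, and `toy_fitness` merely stands in for what would in practice be a cross-validated score of a GBDT or Xgboost model trained with the encoded hyperparameters.

```python
import random

# Hypothetical search space; the ranges actually searched are not reproduced here.
SPACE = {
    "n_estimators": list(range(50, 501, 50)),
    "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.3],
    "max_depth": list(range(2, 11)),
}
KEYS = list(SPACE)

def toy_fitness(chrom):
    """Stand-in for a validation score; peaks at (200, 0.1, 4) by construction."""
    n_estimators, learning_rate, max_depth = chrom
    return 1.0 - abs(n_estimators - 200) / 1000 - abs(learning_rate - 0.1) - abs(max_depth - 4) / 20

def random_chromosome(rng):
    return tuple(rng.choice(SPACE[k]) for k in KEYS)

def tournament(pop, fits, rng, size=3):
    """Return the fittest of `size` randomly drawn chromosomes."""
    picks = rng.sample(range(len(pop)), size)
    return pop[max(picks, key=lambda i: fits[i])]

def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent."""
    return tuple(rng.choice(pair) for pair in zip(a, b))

def mutate(chrom, rng, rate=0.1):
    """Resample each gene from its allowed values with probability `rate`."""
    return tuple(rng.choice(SPACE[k]) if rng.random() < rate else g
                 for k, g in zip(KEYS, chrom))

def run_ga(fitness, pop_size=20, max_generations=50, seed=0):
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_generations):            # criterion 2: iteration limit (50)
        fits = [fitness(c) for c in pop]
        best = max(pop + [best], key=fitness)   # remember the best chromosome seen
        if len(set(fits)) == 1:                 # criterion 1: all fitness values equal
            break
        pop = [mutate(crossover(tournament(pop, fits, rng),
                                tournament(pop, fits, rng), rng), rng)
               for _ in range(pop_size)]
    best = max(pop + [best], key=fitness)
    return best, fitness(best)

best, best_fit = run_ga(toy_fitness)
print(dict(zip(KEYS, best)), round(best_fit, 4))
```

Keeping the best chromosome seen so far is a design choice made in this sketch so that the returned solution never gets worse between generations; the source does not say whether elitism was used.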
3.6. Evaluation Criterion. This work evaluated the predictive performance of classification algorithms for landslide data using the area under the receiver operating characteristic (ROC) curve (AUC). The ROC curve plots the relationship between sensitivity and specificity and is used to evaluate the performance of different classifiers.
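As a concrete illustration of the AUC, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is equivalent to the area under the ROC curve. The labels and scores are made-up values, and treating label 1 as the positive class is an assumption of the example.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted scores for six slopes.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 8 of the 9 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, independent of any particular decision threshold.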

Comparison of Model Performance after GA Optimization.
The input features are height (H), cohesion (c), slope angle (β), unit weight (γ), friction angle (φ), and pore water pressure (ru) [10]. Table 3 shows the optimal values of the parameters after optimization by the GA. The n_estimators parameter is numeric with a default value of 100 and specifies the number of weak classifiers. The max_depth parameter is numeric with a default value of 3 and is related to pruning. The learning_rate parameter is numeric with a default value of 0.1 and specifies the learning rate. The random_state parameter is the random seed that controls the randomness of the algorithm. Table 4 shows the number of accurate predictions and the recall of the different models for predicting landslide susceptibility and instability conditions. Figure 6 shows the AUC line graphs of the different models before and after optimization. Before optimization by the GA, the GBDT and Xgboost algorithms have the same accuracy of 93% and recall of 85.2%; after optimization, both the model performance and the accuracy improve. Overall, GA-GBDT has better accuracy and model performance.

Multivariate Statistical Logistic Regression Prediction Model.
Multiple logistic regression is simple and convenient for analyzing multifactor models, and it can accurately measure the degree of correlation and fit between each factor [43]. The landslide data are dichotomous, so binary logistic regression was used to further investigate the effect of the six factors on the slope. First, the overall validity of the model was analyzed. In the likelihood ratio test in Table 5, the null hypothesis is that the quality of the model is the same whether or not the independent variables (slope height, slope angle, unit weight, cohesion, friction angle, and pore water pressure) are included. The p value is less than 0.05, which indicates that the null hypothesis is rejected, i.e., including the independent variables improves the model. Slope height, slope angle, unit weight, cohesion, friction angle, and pore water pressure were used as independent variables, while stability was the dependent variable for the binary logit regression analysis. Table 6 shows that slope height, slope angle, unit weight, cohesion, friction angle, and pore water pressure can explain 0.22 (22%) of the variation in stability. The fitted model has the form $\ln\big(p/(1-p)\big) = b_0 + b_1 H + b_2 \beta + b_3 \gamma + b_4 c + b_5 \varphi + b_6 r_u$, where p represents the probability that stability is 1, 1 − p represents the probability that stability is 0, and $b_0, \ldots, b_6$ denote the fitted regression coefficients.
Based on the results of multiple logistic regression, the AUC and recall are 0.824 and 77.8%, respectively, which indicates lower accuracy and performance than GA-GBDT and GA-Xgboost. For cohesion, the test statistics (z = 0.529, p = 0.597 > 0.05) imply that cohesion has little influence on stability. The regression coefficient of the friction angle is 0.086 and is significant at the 0.01 level (z = 3.806, p ≤ 0.01), implying that the friction angle significantly influences stability. The odds ratio (OR) is 1.090, implying that the odds of stability are multiplied by 1.090 when the friction angle increases by one unit. The regression coefficient of the pore water pressure is 0.370, but it is not significant (z = 0.444, p = 0.657 > 0.05), implying that the pore water pressure ratio does not have a significant influence on stability.
Overall, according to the results in Table 7, unit weight and friction angle have a significant positive effect on stability. The slope angle has a negative effect on stability. However, slope height, cohesion, and pore water pressure ratio have less influence on slope stability.
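The odds ratios quoted in the regression results follow directly from the coefficients: in a binary logit model, a one-unit increase in a variable multiplies the odds p/(1 − p) by exp(coefficient). The short check below reproduces the two reported odds ratios from the two reported coefficients; only the dictionary layout is an addition of this example.

```python
import math

# Regression coefficients as reported in the text for the two significant factors.
coefficients = {"unit weight": 0.202, "friction angle": 0.086}

# Odds ratio = exp(coefficient): the factor by which the odds of stability
# change when the variable increases by one unit, other variables held fixed.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: OR = {ratio:.3f}")
```

Note that an odds ratio of 1.224 describes a change in the odds, not in the probability itself; the corresponding change in probability depends on the baseline value of p.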

Comparison of Statistical-Based Models with Multiple Machine Learning Algorithm Models.
To compare the performance of different classification algorithms, SVM, logistic regression (LR), GBDT, K-nearest neighbor (KNN), RF, Xgboost, Naive Bayes (GaussianNB), GA-Xgboost, and GA-GBDT methods were used for landslide susceptibility prediction. The prediction results of the models are shown in Figure 7. The AUC of SVM is 0.746, LR is 0.824, GBDT is 0.894, KNN is 0.817, RF is 0.894, Xgboost is 0.910, GaussianNB is 0.824, GA-Xgboost is 0.928, and GA-GBDT is 0.933. Table 8 shows the recall of SVM, LR, KNN, RF, and GaussianNB. It is worth noting that the AUC summarizes classifier performance, with 1.0 representing ideal performance. The ROC curves of the GA-optimized models are closer to the left and upper axes than those of the other models. The AUC of GA-GBDT is the highest at 0.933, slightly higher than that of GA-Xgboost (0.928) and clearly higher than those of the remaining models.
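The reported AUC values can be ranked and mapped onto the qualitative performance bands the paper quotes from [41,42] (0.9 to 1.0 outstanding, 0.8 to 0.9 good, 0.7 to 0.8 average). The snippet below does this with the figures given in the text; how values exactly on a band boundary are assigned, and the label used below 0.7, are assumptions of this example.

```python
# AUC values as reported in the text for the nine models.
reported_auc = {
    "SVM": 0.746, "LR": 0.824, "GBDT": 0.894, "KNN": 0.817, "RF": 0.894,
    "Xgboost": 0.910, "GaussianNB": 0.824, "GA-Xgboost": 0.928, "GA-GBDT": 0.933,
}

def performance_band(auc):
    """Map an AUC to the qualitative bands quoted from [41,42]."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "average"
    return "below the quoted bands"   # label assumed for this example

ranking = sorted(reported_auc, key=reported_auc.get, reverse=True)
for name in ranking:
    print(f"{name:11s} {reported_auc[name]:.3f}  {performance_band(reported_auc[name])}")
```

On these figures, only Xgboost and the two GA-optimized models fall in the top band, and SVM is the only model in the "average" band.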
Figure 8 is the radar chart of the AUC and recall distribution, which visually shows the prediction effects of the different algorithm models. The ensemble learning algorithms are more suitable for landslide susceptibility prediction than the other algorithms, while the prediction performance of SVM is the lowest. Using GA to optimize Xgboost and GBDT significantly improves the AUC compared with the original models, indicating that the algorithm accuracy strongly correlates with the parameters. Of the two optimized models, GA-GBDT has the best accuracy and performance. The test results also show that the GA-GBDT model is more suitable for predicting landslide susceptibility.
Different modeling approaches can lead to different results. An evaluation of the predictive performance of numerous machine learning models shows that most models are more accurate than the traditional statistical models used for landslide susceptibility modeling. Machine learning methods can automatically identify the hidden complex relationships between valid variables. In line with recent studies [27-29], the results of this study show that integrated learning algorithms are well suited to landslide susceptibility prediction and that the accuracy of the model improves after optimization with a heuristic algorithm. Although the results of integrated algorithm models differ among studies from different countries and regions of the world, they widely show excellent landslide susceptibility predictions (AUC > 0.8). It is worth noting that different geological conditions and regional influences have important effects on the accuracy of the model prediction results. Therefore, selecting a rich and high-quality landslide dataset can increase the actual predictive power of the model.

Analysis of the Importance of Influencing Factors.
Determining the sensitivity of the factors affecting landslide susceptibility is crucial for evaluating landslide susceptibility and designing support structures. The GA-GBDT algorithm has a good feature identification function and can output the strength of the influence of different parameters on landslide susceptibility. Figure 9 shows the importance of the six influencing factors ranked by the GA-GBDT model. This study used relative importance scores to investigate the sensitivity based on the best prediction results (GA-GBDT). The method was selected based on its superior performance on the test set. The results of the GA-GBDT feature selection were displayed to obtain the variable importance ranking. Figure 9 shows the normalized scores for the importance of the variables. Unit weight (score = 0.4593), friction angle (score = 0.1245), and cohesion (score = 0.1237) are the most sensitive factors for landslide susceptibility, which indicates the importance of the slope material variables. Therefore, the values of material unit weight, friction angle, and cohesion in artificial slopes must be selected reasonably and accurately based on specific indoor and field tests. The cohesion and friction angle of the geological material should be increased when assessing a slope for landslide potential. The importance scores of slope angle and height are 0.1161 and 0.1026, respectively, indicating that geometric variables also affect landslide susceptibility. Optimizing these two variables in the actual design is a feasible approach to ensure stability. Finally, pore water pressure (ru) (0.0737) has the lowest sensitivity.

Variability in Model Performance.
In the landslide susceptibility prediction results, the integrated learning models GBDT, RF, and Xgboost have higher accuracy than the statistical models and some machine learning models such as SVM and GaussianNB. The models optimized by the heuristic algorithm GA had increased prediction accuracy. Pham et al. [27] strongly recommend implementing the chosen heuristic along with the machine learning model rather than directly applying only the machine learning model or its integrated form. It is also essential to discuss the strengths and weaknesses of the applied models, as each model has its own. Usually, model performance depends on the research area and the related influencing factors [44]. In this study, GA-Xgboost and GA-GBDT have higher accuracy than the unoptimized machine learning models (e.g., Xgboost and GaussianNB). GA has the advantages of a fast random search capability, scalability, and independence from the problem domain, and it is easy to combine with other algorithms, but the potential power of the algorithm's parallel mechanism is not fully exploited [45]. There are several other drawbacks as well, such as the dependence on the fitness function, which remains a topic of current GA research [46]. RF is highly tolerant to outliers and noise and can handle multidimensional data without overfitting, yet a large number of trees may slow down the algorithm and prevent real-time prediction (Arabameri et al., 2019). KNN only needs to store the training samples and their labels, without estimating parameters or training. However, when the samples are unbalanced, the predicted category of a new sample is biased towards the dominant category in the training sample, which can easily lead to prediction errors [47]. The GaussianNB model can achieve high prediction accuracy when learning from data with missing values [30]. However, it assumes that the attributes of the data are independent, which leads to difficulties in practical applications and low classification efficiency when there are many data attributes or strong correlations among them [48]. The Xgboost method supports both linear classifiers and CART classifiers, and it can better prevent overfitting and reduce model complexity despite lacking smoothness [49]. Although SVM is a classical small-sample learning algorithm that can reach a high level of learning accuracy with a small amount of data, its accuracy and computational speed both decrease when facing multidimensional variables and large datasets [50]. The performance of GBDT is a step up from RF, so its advantages are also obvious. It handles various data types flexibly and has high prediction accuracy with relatively little tuning time. Because it is a boosting method, there is a serial relationship between the base learners, which makes it challenging to train in parallel [51]. Therefore, it is necessary to apply the selected models to different datasets and scenarios of landslide susceptibility problems for training and comparative analysis of the merits and performance of each model.

Challenges in Landslide Susceptibility Assessment.
With the deepening of research, many researchers have proposed optimization models with high prediction accuracy and strong generalization ability [52]. These techniques have a very strong self-learning ability and can process large amounts of data efficiently and accurately. In addition, these techniques are mainly used in landslide susceptibility evaluation to help make decisions on risk reduction [53]. However, there are still some problems and challenges to be solved in the future. (1) Incomplete consideration of influencing factors when using machine learning techniques for landslide susceptibility modeling. Most current research focuses on geotechnical properties, slope angle, height, and so on, but factors that are not covered, such as climate and environmental changes, often have significant impacts. (2) Machine learning is a data-driven search for inherent hidden relationships, used as a way to predict landslide geohazards that interact with the natural world. Even with a better machine learning model, the prediction process cannot produce better results if higher-quality data are not available. (3) A data-driven machine learning prediction model has difficulty explaining the mechanism of landslide occurrence, which limits the model's applicability and also affects the accuracy of its predictions, so improving the interpretability of machine learning algorithms is also an important research direction for the future.

Conclusions
This work successfully used the GA-GBDT and GA-Xgboost methods to study landslide susceptibility prediction using 290 historical case records of slope conditions. The variability of model performance with different parameters was checked, and GA was used to optimize four parameters of Xgboost and GBDT: n_estimators, learning_rate, max_depth, and random_state.
The GA-GBDT and GA-Xgboost models were compared with SVM, LR, KNN, RF, GaussianNB, Xgboost, and GBDT to check the fit. It was found that Xgboost and GBDT optimized using GA gave the classification models with the highest AUC, 0.928 and 0.933, respectively. Relative variable importance analysis showed that the slope material parameters (unit weight, friction angle, and cohesion) significantly affected landslide susceptibility. The results suggest that GA-GBDT and GA-Xgboost can explore the nonlinear relationship between landslide susceptibility and its influencing factors.
Based on the database of this paper, the logistic regression results on the influence of the different variables show that unit weight and friction angle have a significant positive influence on stability, and slope angle has a significant negative influence on stability. However, slope height, cohesion, and pore water pressure have less influence on stability. The intrinsic effects were further explored.
Although the analysis results are encouraging, there are still some outstanding issues. The impact of data imbalance on landslide susceptibility prediction will be discussed in future research. The developed model can be improved by analyzing more extensive datasets, and it can be recommended for other mining and geotechnical damage problems when data are available.

3.1. Ensemble Learning Algorithm. The boosting form of ensemble learning, of which GBDT and Xgboost are common examples, works by changing the data distribution: the weight of each sample is determined by whether the sample was correctly classified in each training round and by the accuracy of the previous overall classification. The new dataset with modified weights is sent to the next classifier for training. The classifiers obtained from each training round are finally combined into the final decision classifier.

Figure 1: Violin diagram of data distribution of influencing factors.

Figure 4: Performance of the Xgboost model with the different hyperparameters.

Figure 5: Performance of the GBDT model with the different hyperparameters.

Table 1: Variance explanation rate table.
The ROC curve is a two-dimensional plot of the false positive rate (FPR, or 1 − specificity) on the horizontal axis versus the true positive rate (TPR, or sensitivity) on the vertical axis. It is a quantitative metric for assessing the model's overall performance. The AUC represents the area under the ROC curve and is mainly used to measure the model's generalization performance, namely, the quality of the classification effect, so it can be used to compare models quantitatively. Different AUC values reflect different classification effects [41,42]: 0.900-1.00 represents outstanding performance, 0.800-0.900 represents good performance, and 0.700-0.800 represents average performance. Because the consequences of slope instability are severe, recall = TP/(TP + FN) is introduced alongside accuracy to evaluate the model performance. An ideal classifier never misses a slope that actually fails, so the classifier's recall must be as high as possible for a given accuracy.
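The recall definition above can be made concrete with a small confusion-matrix calculation. The sketch below follows the paper's coding of the two classes (instability = 0, stability = 1) and assumes that the class of interest, the "positive" class for recall, is the failed slope; the eight predictions are invented for illustration.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FN, FP, TN) with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn

def recall(tp, fn):
    """Share of actual failures that the model catches: TP / (TP + FN)."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """Share of stable slopes wrongly flagged as failures: FP / (FP + TN)."""
    return fp / (fp + tn)

# Hypothetical labels: 0 = failed slope, 1 = stable slope.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1]
tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive=0)
print(tp, fn, fp, tn, recall(tp, fn), false_positive_rate(fp, tn))
```

Here one of the four actual failures is missed, giving a recall of 0.75; each (FPR, recall) pair obtained at a different decision threshold is one point on the ROC curve.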

Table 2: Hyperparameter search space.
The slope angle can significantly negatively influence stability. The regression coefficient of unit weight is 0.202 and is significant at the 0.01 level (z = 3.849, p ≤ 0.01), which indicates that the unit weight significantly positively influences stability. Furthermore, the odds ratio (OR) is 1.224, indicating that the odds of stability are multiplied by 1.224 when the unit weight increases by one unit. The regression coefficient of cohesion is 0.002, but it does not show significance.

Table 4: Confusion matrix and recall before and after optimization of GBDT and Xgboost.

Table 5: Likelihood ratio test results of binary logit regression model.

Table 6: Summary of results of binary logit regression analysis.

Table 7: Results of binary logit regression analysis (simplified format).

Table 8: Confusion matrix and recall for the prediction model based on 290 slope cases.
Figure 8: Radar plot of AUC vs recall distribution for different models.