Hybridization of Machine Learning Algorithms and an Empirical Regression Model for Predicting Debris-Flow-Endangered Areas

Introduction
Debris flow is a common geological hazard in mountainous areas, characterized by the rapid movement of saturated soil, rocks, and organic debris down steep mountain channels or slopes [1][2][3][4][5][6]. Debris flows can cause serious damage to local residents, buildings, and infrastructure on the depositional area [7][8][9][10][11][12][13]. Therefore, it is crucial to delineate debris-flow-endangered areas accurately in order to provide practical guidance for local authorities in debris-flow-hazard assessment and the design of control measures.
The maximum runout distance, L_f, on a depositional fan is one of the most important parameters for delineating debris-flow-endangered areas [14][15][16][17]. Currently, statistical methods have proved to be a simple and effective tool for predicting the maximum runout distance of debris flow [18, 19]. Statistical methods generally relate the endangered area to geometric-morphological factors and the debris-flow volume through mathematical models based on field investigation data of historical debris flows [20][21][22][23]. Both empirical regression analysis and machine-learning algorithms are widely employed to develop such mathematical models, for example, univariate or multivariate nonlinear regression [16, 17], multivariate adaptive regression splines (MARS) [20], and support vector machine (SVM) [24][25][26][27]. The empirical models are generally developed by nonlinear regression analysis, where the underlying regression functions are preassumed by experienced engineers or researchers. However, the assumed form of empirical models may not objectively capture the complex nonlinear relationship between the debris-flow runout distance and its influencing factors. Compared with empirical models, machine-learning models have a powerful capacity for capturing complex internal patterns from data and providing rational forecasts as guidance [28][29][30]. Machine-learning algorithms generally require sufficient data to guarantee their accuracy: an insufficient sample size in the training datasets may affect the accuracy and generalization ability of the machine-learning models. However, the number of site-specific investigation data for a given debris basin is usually very limited. For example, Xu et al. [27] applied only 53 shallow landslides and 22 loess-bedrock landslides in the Heifangtai terrace to construct and validate optimal machine-learning algorithms for runout distance prediction of shallow and loess-bedrock landslides, respectively. When a very limited number of data are applied, the machine-learning models may overfit or underfit [28]. Thus, developing a robust model for accurate forecasting of debris-flow-endangered areas remains a difficult task.
Different statistical models have their inherent mathematical theories and strengths. Currently, due to their ability to reasonably describe complex relationships, machine-learning models seem to have better accuracy than empirical models in debris-flow runout forecasting [20]. However, the empirical models derived from historical debris flows can still provide useful information for the preliminary assessment of endangered areas. The empirical models reflect the main influencing factors of debris-flow runout from the perspective of expert experience, thereby reducing data dimensions and avoiding overfitting. Therefore, empirical statistical models can be regarded as supplementary information for machine-learning models. In this way, the limited investigation data are expanded to increase the amount of training data and thereby improve the robustness and accuracy of the predictive model. However, hybridizing machine-learning algorithms and empirical models has scarcely been explored in debris-flow runout prediction.
This paper proposes a hybrid method for predicting the endangered areas of debris flow by integrating machine-learning algorithms and an empirical regression model. First, the collected investigation data are randomly divided into training and testing sets by k-fold cross-validation. Three commonly used machine-learning models (i.e., MARS, RF, and SVM; see the Appendix) and a nonlinear regression model (NLRM) are developed based on the randomly selected training data. Then, the maximum runout distance calculated by the empirical model is used as supplementary input and hybridized with the machine-learning models to generate a hybrid training model. Finally, the performance metrics (i.e., goodness-of-fit, root-mean-square error (RMSE), and mean absolute error (MAE)) of the hybrid models are comprehensively investigated and compared with those of the single machine-learning models and the empirical model under k-fold cross-validation. A total of 134 datasets of channelized debris-flow events in the Wenchuan earthquake zone are used to illustrate the applicability and reliability of the proposed approach.
The rest of this paper is organized as follows: Section 2 introduces the databases and data preparation. Section 3 elaborates the methodology. Section 4 presents the results, followed by a discussion of the limitations of the paper, and Section 5 concludes the paper.

Databases and Data Preparation
2.1. Study Area. This study reanalyzed 134 datasets of channelized debris-flow events in Sichuan province, China [17]. The 134 debris flows occurred along the Yingxiu-Beichuan fault zone (e.g., the Beichuan, Qingping, and Longchi areas) between 2008 and 2012 [17]. The Yingxiu-Beichuan fault triggered the magnitude-8 Wenchuan earthquake and caused a number of landslides in Sichuan province. Loose landslide debris deposited on slopes or in channels was easily carried downstream by torrential rain to form debris flows, resulting in serious damage to the local people on the depositional fan [12]. Previous studies in the study area show that the maximum runout distance of debris flow on a fan is mainly affected by the catchment internal relief (H) and the debris-flow volume (V_D) [17]. Therefore, these two parameters are taken as input variables to predict the endangered area of debris flow in the study area. More details can be found in Zhou et al. [17]. Figure 1 shows the histograms of the maximum runout distance (L_f) and the input variables (i.e., H and V_D), as well as their mean values and standard deviations. It can be seen that the distributions of these three parameters approximate a lognormal distribution.

Advances in Civil Engineering

2.2. Cross-Validation. To construct and validate a predictive statistical model, the 134 debris-flow datasets in the study area are randomly divided into training data for model development and testing data for model validation. Different random combinations of training and testing datasets may lead to fluctuations in model performance evaluation. To avoid bias in data selection, the k-fold cross-validation approach is employed to randomly select the training and testing data [31]. In each of the k iterations, the model is trained on k − 1 folds of the data. Then, the model's performance is evaluated on each validation dataset. After the k iterations, the performance metrics obtained from each validation dataset are averaged to provide a single evaluation metric for the model's performance. k-fold cross-validation ensures that a predictive model is evaluated on different subsets of the data, providing a more comprehensive assessment of how well the model generalizes to unseen data. Previous studies show that a value of k = 5 (or 10) is very common in applications of machine-learning algorithms [31]. By taking k = 5, the datasets are randomly divided into five equal parts, denoted as CR1, CR2, CR3, CR4, and CR5, respectively. 80% and 20% of the total datasets are training and testing data, respectively (as shown in Figure 2). Among the 134 datasets of debris flow in the Wenchuan area, 107 sets of data are selected for training the models and 27 sets of data are selected to evaluate the performance of the predictive models. Table 1 shows the ranges of H, V_D, and L_f for the training and testing data under fivefold cross-validation. It can be seen that the data ranges of H, V_D, and L_f for the fivefold cross-validation are almost the same. The training data for the five different splits CR1, CR2, CR3, CR4, and CR5 are then used to construct the machine-learning models and a NLRM. The testing data for the five different splits are used for validation.
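The fivefold split described above can be sketched with scikit-learn's KFold. The array below is a synthetic stand-in for the 134 rows of (H, V_D, L_f); variable names and values are illustrative, not the original dataset.

```python
import numpy as np
from sklearn.model_selection import KFold

# Synthetic stand-in for the 134 investigated events; each row holds
# illustrative (H, V_D, L_f) values drawn from a lognormal distribution,
# mirroring the approximately lognormal histograms in Figure 1.
rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=0.5, size=(134, 3))

# Five random splits (CR1-CR5); each event appears in exactly one test fold.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for i, (train_idx, test_idx) in enumerate(kf.split(data), start=1):
    print(f"CR{i}: {len(train_idx)} training, {len(test_idx)} testing")
```

With 134 events, KFold yields test folds of 27 or 26 samples, close to the 107/27 split quoted above.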

Methodology
Various machine-learning models (e.g., C&RT, CHAID, boosting trees, MARS, RF, and SVM) can characterize the complex nonlinear relationships between input and output parameters [28][29][30]. In this paper, three commonly used machine-learning models (i.e., MARS, RF, and SVM) are employed. The procedure for developing the hybrid prediction model of the maximum runout distance of debris flow is shown in Figure 3. First, the three machine-learning models and a NLRM are developed independently. The hybrid models (i.e., MARS-NLRM, RF-NLRM, and SVM-NLRM) are then generated by combining the NLRM with the machine-learning models. The proposed method is briefly presented as follows.
3.1. Multi-Nonlinear Regression Empirical Model. According to previous studies, the maximum runout distance of debris flow is usually related to geometric-morphological factors and the debris-flow volume through statistical regression analysis. The existing empirical models are typically power-law formulas [20], which can be expressed as follows:

L_f = aH^b V_D^c, (1)

where a, b, and c are the unknown parameters of the empirical model. By applying the multi-nonlinear regression method, the model parameters in Equation (1) can be obtained for the five different splits CR1, CR2, CR3, CR4, and CR5.
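A minimal sketch of fitting a power-law model of this type, assuming the form L_f = a·H^b·V_D^c; the "true" coefficients and synthetic data below are illustrative only, not the values of Table 2. Fitting in log space makes the problem linear in the parameters and numerically stable.

```python
import numpy as np
from scipy.optimize import curve_fit

def log_model(logX, log_a, b, c):
    # log L_f = log a + b*log H + c*log V_D  (log-transformed power law)
    logH, logV = logX
    return log_a + b * logH + c * logV

# Synthetic training data with assumed "true" parameters a=1.9, b=0.83, c=0.16.
rng = np.random.default_rng(1)
H = rng.uniform(0.5, 2.0, 107)        # catchment internal relief (illustrative units)
V_D = rng.uniform(1e4, 1e6, 107)      # debris-flow volume (illustrative units)
L_f = 1.9 * H**0.83 * V_D**0.16 * rng.lognormal(0.0, 0.05, 107)

(log_a, b, c), _ = curve_fit(log_model, (np.log(H), np.log(V_D)), np.log(L_f))
a = np.exp(log_a)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")
```

The recovered parameters should be close to the assumed values; with real field data, the fit would be repeated once per cross-validation split.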

3.2. Machine-Learning Models.
The core idea of machine-learning algorithms is to apply data-driven learning to construct models that generalize well to the data, make accurate predictions, and improve their performance over time as they are exposed to more data. In this paper, three common machine-learning models are used to construct the training models, namely MARS, RF, and SVM.

3.2.1. MARS Model.
(Figure 2. Division of the total dataset into training and testing folds under fivefold cross-validation.)

MARS is a nonparametric regression analysis method that can be used for modeling multidimensional nonlinear problems [33]. This method does not assume a specific functional relationship between the input variables and the output variable. Instead, it adaptively selects knots to partition the training dataset into independent segments with different gradients [34, 35]. Each segment of MARS is called a basis function, and the endpoints of each segment are called knots. The basis functions are generated by MARS through a stepwise search, and the knot positions are selected by the adaptive regression algorithm. By estimating the contributions of the basis functions, the additive and interaction effects of the predictor variables can be determined. The MARS algorithm consists of two steps: forward selection and backward pruning. The forward selection process involves partitioning and fitting the sample data using spline functions to obtain new basis functions and a fitted model. As a basis function is added to the model space, interactions with basis functions that are already in the model are also considered. Basis functions are added until the model reaches some maximum specified number of terms, leading to a purposely overfit model. The backward pruning process removes the basis functions with smaller contributions while maintaining the model's accuracy. The basis functions retained in the final optimal model are selected from the set of all candidate basis functions used in the forward selection step. Model subsets are compared using the less computationally expensive method of generalized cross-validation (GCV).
Finally, the optimal model is selected as the regression model to avoid overfitting.
Consider n independent (input) variables X = (x_1, x_2, …, x_n). The dependent (output) variable y can be estimated from a function g(X) with a model error ε:

y = g(X) + ε. (2)

The function g(X) can be approximated by a linear combination of basis functions and their interactions. The MARS model of g(X) can be expressed as follows:

g(X) = β_0 + ∑_{j=1}^{M} β_j ω_j(X), (3)

where β_0 is a constant; M is the number of basis functions; ω_j(X) is the j-th basis function; and β_j is the coefficient of the j-th basis function. The coefficients β_0, β_1, …, β_M are determined by the least-squares method. The basis function ω_j(X) is a piecewise linear function of the form

ω_j(X) = max(0, x − t) or max(0, t − x), (4)

where t is the value of the knot.
The construction of the MARS model is an adaptive process in which the basis functions and knots are all entirely "driven" by the training data. To obtain the MARS model in Equation (3), the forward building procedure is performed on the training data. Basis functions that produce the largest decrease in the training model error are added until the predefined maximum number of terms is reached. This procedure can easily lead to an overfitting MARS model. Subsequently, the backward procedure prunes extraneous variables and basis functions with the least contributions based on the GCV method. The GCV index penalizes the complexity of large numbers of basis functions in the MARS model in order to reduce overfitting. With N observations of training data, the GCV for a model can be obtained by:

GCV = (1/N) ∑_{i=1}^{N} [y_i − g(x_i)]² / [1 − C(M)/N]², (5)

where C(M) = (M + 1) + c_0 M is the effective number of model parameters; c_0 is the penalizing parameter; and g(x_i) are the values predicted by the MARS model in Equation (3). The penalizing parameter c_0 is set to a default value of three according to Friedman [33]. Based on the GCV index, the MARS model with the minimum value of GCV is selected as the optimal MARS model. In addition, it should be noted that the maximum number of basis functions needs to be predefined. The optimal preassumed M can be determined by comparing the evaluation metrics of the MARS model (e.g., RMSE) for different preset values; the one with the lowest RMSE is taken as the optimal predefined maximum number of basis functions for the final MARS model.
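The GCV criterion can be computed directly. The complexity penalty C(M) = (M + 1) + c0·M used below (with Friedman's default c0 = 3) is one common formulation, so treat the exact penalty as an assumption rather than the original study's implementation.

```python
import numpy as np

def gcv(y, y_hat, n_basis, c0=3.0):
    """Generalized cross-validation score for a MARS candidate model.

    Penalizes the training mean squared error by an effective number of
    parameters C(M) = (M + 1) + c0 * M, where M is the basis-function count.
    """
    n = len(y)
    mse = np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)
    complexity = (n_basis + 1) + c0 * n_basis
    return mse / (1.0 - complexity / n) ** 2

# A larger model must fit noticeably better to win on GCV:
y = np.linspace(0.1, 1.0, 500)
score_small = gcv(y, y + 0.05, n_basis=5)    # M = 5, residual 0.05
score_large = gcv(y, y + 0.04, n_basis=30)   # M = 30, slightly smaller residual
print(score_small, score_large)
```

Here the larger model's small gain in fit does not offset its complexity penalty, so backward pruning would keep the smaller candidate.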

3.2.2. RF Model.
RF is a typical ensemble learning method based on classification and regression trees (CART) [36]. RF creates a collection of decision trees and combines their predictions through averaging to make the final prediction. Decision trees can be divided into classification trees and regression trees. Because the primary objective of this research is to predict the endangered area of debris flow, only the regression tree is discussed in this section. RF regression uses bootstrap resampling to extract multiple samples from the original sample, fits a regression tree to each bootstrap sample, and averages the predictions of the multiple decision trees to make the final forecast [37, 38].
Consider X = (x_1, x_2, …, x_n), an n-dimensional input vector that forms a forest. RF consists of a set of K trees {y_1(X), y_2(X), …, y_K(X)}. The ensemble produces K outputs corresponding to each tree y_k (k = 1, 2, …, K). The modeling procedure of RF is as follows: draw a bootstrap sample from the original dataset. For each bootstrap sample, about two-thirds of the sample is utilized for deriving the regression function, and the remaining one-third constitutes the out-of-bag (OOB) sample. Each time, a regression tree is constructed using a randomized training sample drawn from the original dataset. The OOB sample is utilized to validate accuracy. After the predictions of the K regression trees are collected, a regression model sequence {y_1(X), y_2(X), …, y_K(X)} is obtained. Then, the final prediction is obtained by averaging all tree predictions:

f(x) = (1/K) ∑_{k=1}^{K} y_k(X), (6)

where f(x) represents the combined regression model, y_k represents an individual decision-tree regression model, and K is the number of regression trees. The number of regression trees in the RF model structure is a critical hyperparameter, which can be determined by the grid search method. Grid search is a hyperparameter-tuning technique used in machine learning to systematically search a predefined hyperparameter space for the combination that maximizes or minimizes a chosen evaluation metric. It involves exhaustively exploring various combinations of hyperparameters to identify the one that yields the best performance. In this work, the optimal value for the number of regression trees is selected using the grid search method.
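Tuning the tree count by grid search can be sketched with scikit-learn's RandomForestRegressor and GridSearchCV; the data and candidate values below are illustrative stand-ins, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 107 training rows of (H, V_D) -> L_f.
rng = np.random.default_rng(2)
X = rng.uniform(size=(107, 2))
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, 107)

# Exhaustive search over a tree-count grid, scored by cross-validated RMSE.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300, 500]},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print("best n_estimators:", search.best_params_["n_estimators"])
```

The same pattern extends to other RF hyperparameters (e.g., maximum tree depth) by adding keys to `param_grid`.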
3.2.3. SVM Model. SVM is a powerful supervised machine-learning algorithm used for both classification and regression [39]. Based on statistical learning theory and the structural risk minimization principle, SVM optimizes a tradeoff between the complexity and learning ability of the model to obtain the best generalization ability from limited sample information. Its basic idea is to map input vectors into a high-dimensional feature space via a kernel function and to construct an optimal separating hyperplane that best separates or fits the data and maximizes the margin between the different classes or regression targets [40, 41]. The hyperplane effectively separates the data points and is supported by a small subset of critical data points, called support vectors. SVM can effectively handle high-dimensional data and nonlinearly separable problems by using the kernel trick to transform the data into a higher-dimensional space.
Given a set of training data (x_i, y_i), i = 1, 2, …, l, x_i ∈ R^n, y_i ∈ R, a support vector machine for regression seeks a function of the following form:

f(x) = ω · φ(x) + β, (7)

where ω and β are the parameters to be determined from the training set, and φ(x) represents a high-dimensional feature space that is nonlinearly mapped from the low-dimensional space x. ω and β can be determined by minimizing the regularized risk function:

R = (1/2) ||ω||² + C (1/l) ∑_{i=1}^{l} h_ε(y_i, f(x_i)), (8)

where ||ω||² is the regularized term; C is the penalty factor; and (1/l) ∑_{i=1}^{l} h_ε(y_i, f(x_i)) is the empirical error measured by the ε-insensitive loss function:

h_ε(y, f(x)) = max(0, |y − f(x)| − ε). (9)

In order to obtain ω and β, kernel functions are usually used so that computations are performed directly in the input space, without calculating φ(x). At present, four basic kernels are widely used, namely the linear kernel, the polynomial kernel, the radial basis function, and the sigmoid kernel. In this paper, the Gaussian radial basis function (RBF) kernel is used to construct the SVM model. More details of the RBF kernel function can be found in [40]. In SVM regression, two critical hyperparameters, i.e., the gamma parameter g and the penalty factor C, need to be preset before the learning process. The grid search method is employed to select the optimal values of the gamma parameter and the penalty factor.
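The C and g tuning described above can be sketched with scikit-learn's SVR. The power-of-two grid below is an assumption for illustration, not the original search space, and standardizing the inputs first is good practice for the RBF kernel.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 107 training rows.
rng = np.random.default_rng(3)
X = rng.uniform(size=(107, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, 107)

# Logarithmic grid over penalty factor C and RBF width gamma (illustrative).
model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
param_grid = {
    "svr__C": [2.0**k for k in range(0, 11, 2)],       # 1 ... 1024
    "svr__gamma": [2.0**k for k in range(-4, 4, 2)],   # 0.0625 ... 4
}
search = GridSearchCV(model, param_grid, cv=5,
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)
print("best C:", search.best_params_["svr__C"],
      "best gamma:", search.best_params_["svr__gamma"])
```

A logarithmic grid is a natural choice here, since useful C and gamma values can span several orders of magnitude.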

3.3. Hybrid Models.
To improve the predictive accuracy of the debris-flow runout distance, hybrid models are proposed that integrate the strengths of the machine-learning models with the empirical regression model, mitigating the limitation of small training datasets. The maximum runout distance of debris flow obtained from the NLRM using the training data is taken as an auxiliary input for the three machine-learning algorithms to construct the training models. Take the MARS-NLRM hybrid model as an example. First, the 107 sets of training data for the CR1 split in Table 1 are used to develop the empirical relationship between the maximum runout distance and its influencing factors (i.e., H and V_D) using Equation (1). Then, the maximum runout distance is calculated by using the developed empirical model for CR1 and the training data. The calculated maximum runout distance and the corresponding catchment H and V_D are applied as supplementary input data for the MARS. To this end, a total of 214 training datasets are generated to construct the MARS-NLRM data-driven model. Finally, the performance of the established hybrid MARS-NLRM model is evaluated by using the testing data for the CR1 split. A similar procedure is applied for CR2, CR3, CR4, and CR5, respectively. After completing the five iterations, the performance metrics of the hybrid MARS-NLRM model derived from each testing dataset are averaged to provide a final evaluation of the MARS-NLRM model's performance. For the other two hybrid models (i.e., RF-NLRM and SVM-NLRM), the same procedure is adopted to obtain their model evaluation indexes.
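The doubling of the training set (107 measured rows plus 107 NLRM-generated rows, giving 214) can be sketched as follows; the NLRM form and parameter values are illustrative assumptions, and any regressor (MARS, RF, or SVM) would then be fitted to the augmented arrays.

```python
import numpy as np

# Synthetic stand-in for the 107 measured training rows (H, V_D -> L_f).
rng = np.random.default_rng(4)
H = rng.uniform(0.5, 2.0, 107)
V_D = rng.uniform(1e4, 1e6, 107)
L_meas = 1.9 * H**0.83 * V_D**0.16 * rng.lognormal(0.0, 0.1, 107)

# Illustrative NLRM fitted on the same split (parameters assumed, not Table 2's).
a, b, c = 1.9, 0.83, 0.16
L_nlrm = a * H**b * V_D**c

# Stack measured targets and empirical predictions on identical inputs:
X_block = np.column_stack([H, V_D])
X_aug = np.vstack([X_block, X_block])          # 214 x 2 inputs
y_aug = np.concatenate([L_meas, L_nlrm])       # 214 targets
print(X_aug.shape, y_aug.shape)
```

The supplementary rows share the original inputs but carry the empirical model's smoothed targets, which is what regularizes the machine-learning fit on small samples.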

3.4. Performance Metrics of Different Models.
It is essential to evaluate the accuracy and reliability of a predictive model. Hence, the predictive accuracy and robustness of the proposed hybrid models are assessed by three mathematical metrics, i.e., the coefficient of determination (R²), RMSE, and MAE. R² is a statistical metric used to assess the goodness-of-fit of a predictive model. It reflects the proportion of the variance in the measured values that is explained by the predictions, which is given by:

R² = 1 − ∑_{i=1}^{N} (L^m_{f,i} − L^p_{f,i})² / ∑_{i=1}^{N} (L^m_{f,i} − L̄^m_f)², (10)

where N is the number of data; L^m_{f,i} represents the actual measured maximum runout distance; L^p_{f,i} is the predicted maximum runout distance; and L̄^m_f is the mean value of the actual measured maximum runout distance. The value of R² ranges from 0 to 1, where a higher value indicates a better fit of the model to the observed data.
RMSE quantifies the average discrepancy between the actual measured values and the predicted values produced by the predictive model, which is calculated as follows:

RMSE = √[(1/N) ∑_{i=1}^{N} (L^m_{f,i} − L^p_{f,i})²]. (11)

A lower RMSE value indicates better predictive performance of the model, with smaller deviations between predictions and actual measurements. RMSE not only considers the relationship between predicted and actual values but also quantifies the magnitude of the errors.
MAE computes the absolute errors between the actual values and the predicted values, which is calculated as follows:

MAE = (1/N) ∑_{i=1}^{N} |L^m_{f,i} − L^p_{f,i}|. (12)

MAE characterizes the accuracy of the predictive model by considering the absolute size of the errors, and it is less sensitive to large individual errors than RMSE.
These three statistical metrics are used to evaluate the predictive models under k-fold cross-validation, where R² reflects how well the model fits the data, RMSE gauges the accuracy of the model, and MAE quantifies the absolute errors. A good predictive model should ideally have a high R², a low RMSE, and a small MAE, demonstrating its ability to effectively model the data and make accurate predictions for unseen datasets.
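The three metrics can be written down directly in NumPy; these are the standard definitions of R², RMSE, and MAE used above, with illustrative data.

```python
import numpy as np

def r2(y_meas, y_pred):
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = np.sum((y_meas - y_pred) ** 2)
    ss_tot = np.sum((y_meas - np.mean(y_meas)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_meas, y_pred):
    # Root-mean-square error.
    return float(np.sqrt(np.mean((y_meas - y_pred) ** 2)))

def mae(y_meas, y_pred):
    # Mean absolute error.
    return float(np.mean(np.abs(y_meas - y_pred)))

y_meas = np.array([0.30, 0.45, 0.60, 0.39])   # illustrative measured L_f (km)
y_pred = np.array([0.32, 0.44, 0.55, 0.38])   # illustrative predicted L_f (km)
print(r2(y_meas, y_pred), rmse(y_meas, y_pred), mae(y_meas, y_pred))
```

Averaging each metric over the five cross-validation folds gives the summary values reported in the next section.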

Results and Discussion
4.1. Hyperparameters for Machine-Learning Models. The empirical models for CR1, CR2, CR3, CR4, and CR5 are developed based on the fivefold cross-validation by using Equation (1). The model parameters of the multi-nonlinear empirical relationships for the five different splits are summarized in Table 2. It is found that the values of the model parameters, i.e., a, b, and c, are very close to each other under the fivefold cross-validation. Then, the maximum runout distance is calculated by using the developed empirical models for the fivefold cross-validation and used as supplementary input to establish the hybrid models. In total, four single models (i.e., NLRM, MARS, RF, and SVM) and three hybrid models (i.e., MARS-NLRM, RF-NLRM, and SVM-NLRM) are generated for CR1, CR2, CR3, CR4, and CR5, respectively.
In machine learning, a hyperparameter is a parameter that needs to be predefined before the modeling procedure. For MARS, RF, and SVM, several hyperparameters have significant effects on the predictive accuracy, and reasonable selection of the optimal hyperparameters is a necessary prerequisite. To tune the hyperparameters of these three machine-learning algorithms, grid search is used to systematically search a predefined hyperparameter space for the combination that minimizes RMSE.
As for the MARS model, the predetermined maximum number of basis functions for CR1, CR2, CR3, CR4, and CR5 is determined by the minimum RMSE, with the maximum number of basis functions searched up to 60. The optimal MARS models adopt 30, 40, 40, 30, and 28 linear spline basis functions for CR1, CR2, CR3, CR4, and CR5, respectively. As for the MARS-NLRM modeling procedure, the optimal maximum numbers of basis functions for CR1, CR2, CR3, CR4, and CR5 are 40, 30, 30, 30, and 40, respectively. As for the RF and RF-NLRM models, a maximum number of regression trees is defined as 500, i.e., N_R = 500 trees. Through trial and error, tree numbers higher than 300 for the five different splits have no significant effect on the model performance. As for the SVM modeling procedure, the ranges of C and g are set as (0, 1,500) and (0, 10), respectively. The optimal C values of the SVM model for the five different splits are 1,024, 1,024, 8, 256, and 512, respectively, and the gamma parameter g equals 0.177, 0.125, 0.177, 0.022, and 0.707, respectively. For the SVM-NLRM model, the optimal C values for CR1, CR2, CR3, CR4, and CR5 are 724, 512, 512, 0.707, and 0.5, while the optimal g is 0.354, 0.25, 0.125, 2.828, and 8, respectively. These optimal hyperparameters are applied in the machine-learning algorithms to generate the single training models and the hybrid models.

Table 3 shows the performance metrics of all the predictive models under the fivefold cross-validation. It is shown that, except for MARS, the single models (NLRM, RF, and SVM) have comparable predictive performance. Under the fivefold cross-validation, the MARS model shows the worst performance, with the lowest average R², the largest mean RMSE, and the largest mean MAE. Meanwhile, it can be seen that the hybrid models generally show better performance than the individual models in terms of R², RMSE, and MAE. From Table 3, the ranges of the R² values of NLRM, MARS, RF, and SVM for the testing data are 0.292–0.840, 0.115–0.747, 0.252–0.722, and 0.233–0.619, respectively, whereas the R² ranges of MARS-NLRM, RF-NLRM, and SVM-NLRM are 0.542–0.861, 0.543–0.763, and 0.540–0.769, respectively. Compared with NLRM, the R² values of MARS-NLRM, RF-NLRM, and SVM-NLRM are generally improved. In particular, for a small sample size, the RF algorithm is prone to overfitting, which leads to degradation of model performance. However, after incorporating the supplementary datasets from the NLRM, RF-NLRM has better data-fitting ability in both the training and testing stages, showing better predictive accuracy. Similar results are also observed for MARS-NLRM and SVM-NLRM. This indicates that the hybrid models perform better than the individual models in model evaluation. Figure 5 plots the RMSE curves of the training and testing stages under fivefold cross-validation. It is clear that, at the training stage, the hybrid models have smaller RMSE values than the stand-alone models except for RF. Moreover, under fivefold cross-validation, the RMSE values of MARS-NLRM, RF-NLRM, and SVM-NLRM are all smaller than that of the empirical model. Overall, the hybrid predictive models have better performance and smaller deviations between predictions and actual measurements compared with an individual model.

4.2. Comparisons of Different Predictive Models.
Figure 6 plots the MAE curves of the training and testing stages under fivefold cross-validation. At the training stage, the MAE values of the three hybrid models are lower than those of the single models except for RF. However, at the testing stage, the MAE values of the hybrid models are significantly smaller than those of all the individual predictive models, implying that the hybridization of the machine-learning models and the empirical model yields better predictive accuracy. For example, the MAE values of MARS for the testing dataset under fivefold cross-validation are 0.055, 0.048, 0.084, 0.070, and 0.150, respectively, whereas the MAE values of MARS-NLRM for the five splits are 0.036, 0.035, 0.045, 0.044, and 0.035, respectively, which are obviously lower than those of MARS and of NLRM (i.e., 0.047, 0.054, 0.069, 0.065, and 0.051). This shows that the integration of MARS with NLRM can greatly reduce the absolute errors between the actual and predicted values and improve the predictive accuracy. Similar results can also be observed for RF-NLRM and SVM-NLRM. It can be concluded that hybridizing empirical statistical models and machine-learning algorithms expands the amount of training data and thereby improves the robustness and accuracy of the predictive model.
The average values of R², RMSE, and MAE under the fivefold cross-validation for all the predictive models are summarized in Table 4. As shown in Table 4, the mean values of R² for the testing dataset of MARS-NLRM, RF-NLRM, and SVM-NLRM are 0.71, 0.70, and 0.69, respectively. They all outperform NLRM (0.58), MARS (0.46), RF (0.54), and SVM (0.46). R² is improved by an average of 70.5% compared to the three machine-learning algorithms. Compared with the NLRM, the total improvement of the average R² values for the hybrid models is about 29.6%.
It can also be seen that the mean values of RMSE for the testing dataset of MARS-NLRM, RF-NLRM, and SVM-NLRM are 0.061, 0.062, and 0.063, respectively, which are lower than those of the single models. In contrast to the NLRM, the average MAE values of MARS-NLRM, RF-NLRM, and SVM-NLRM decrease by about 32.5%. Compared with the three machine-learning algorithms, the average MAE value of the corresponding hybrid models is reduced by 41.1%. Furthermore, the three hybrid models generally exhibit comparable predictive performance, with similar values of R², RMSE, and MAE. On the whole, the three hybrid models have higher prediction accuracy and lower errors. This implies that the proposed method can generate more accurate and reliable predictions than the single machine-learning algorithms and the empirical relationship.

4.3. Model Evaluations of Different Models at k = 1.
The results of k = 1 are selected as representative to show the performance of the hybrid models. Figure 7 shows the predicted runout distance values of NLRM, MARS, RF, SVM, MARS-NLRM, RF-NLRM, and SVM-NLRM for the testing data versus the measured values. From Figure 7, the estimated values of the maximum runout distance using the single models and the hybrid models are all close to the measured values. To further illustrate the performance of the hybrid models, Figures 8 and 9 show comparisons of the predictive performance of the hybrid models (i.e., MARS-NLRM, RF-NLRM, and SVM-NLRM) with the single models (k = 1) for the testing dataset. As shown in Figure 8, the gray area represents the 95% confidence interval for predictions of the maximum runout distance. The blue squares, purple triangles, and red circles in Figure 8 represent the predicted L_f derived from NLRM, MARS, and MARS-NLRM, respectively. It is clear that almost all the predicted data points fall into the 95% confidence interval. The predicted L_f using MARS-NLRM is closer to the 1:1 line compared with NLRM and MARS. As for the performance of RF-NLRM, the predicted values of L_f are also closer to the actual values compared with NLRM and RF, as shown in Figure 10. Similar results can also be found in Figure 9: the predicted values derived from SVM-NLRM are generally closer to the actual values than those of NLRM and SVM. In particular, when the actual measured L_f equals 0.39 km, the predicted values from NLRM, RF-NLRM, and SVM-NLRM are 0.36, 0.38, and 0.39 km, respectively. It is clear that the RF-NLRM and SVM-NLRM models can provide more accurate predictions of the maximum runout distance for extreme debris-flow events.

Figure 11 displays, as radar diagrams, the R², RMSE, and MAE statistical measures of accuracy for the four single predictive models and the three hybrid models for the training dataset under the first cross-validation. Figure 11 shows that the MARS-NLRM, RF-NLRM, and SVM-NLRM models perform more accurately than the individual models in calculating the maximum runout distance of debris flow. Figure 12 plots the performance metrics of the four single predictive models and the three hybrid models for the testing dataset. It is apparent that the three hybrid models have larger R² and smaller deviations between predictions and actual measurements (i.e., lower RMSE and MAE) than the other models. Evaluation of the performance of the predictive models shows that MARS-NLRM, RF-NLRM, and SVM-NLRM outperform the empirical model and the single machine-learning algorithms for both the training data and the validation data in terms of prediction accuracy. It is clear that combining the empirical model and machine-learning algorithms to predict the debris-flow runout zone can potentially overcome the overfitting or underfitting of machine-learning models caused by the limited amount of sampling data.

Summary and Conclusions
Accurate delineation of debris-flow-endangered areas (e.g., the maximum runout distance) is a necessary prerequisite for debris-flow risk assessment and countermeasure design. This paper proposes a hybrid method that integrates machine-learning models with an empirical regression model to predict the maximum runout distance of debris flow. The proposed method takes the maximum runout distance calculated by the empirical model as supplementary inputs to increase the amount of training data for the machine-learning models. The predictive performance of the proposed hybrid models is comprehensively evaluated by three statistical accuracy metrics (i.e., R 2 , RMSE, and MAE) and compared with the single predictive models (i.e., MARS, RF, SVM, and NLRM) under fivefold cross-validation. The proposed method is illustrated using 134 datasets of channelized debris-flow events in Sichuan province, China. The following conclusions are drawn from the results and analysis:
(1) Among the individual predictive models, the MARS model shows the worst performance compared with the other single models (i.e., NLRM, RF, and SVM). The NLRM, RF, and SVM models have comparable predictive accuracy in estimating the maximum runout distance of debris flow in the Wenchuan earthquake area.
(2) For the study area, all the proposed hybrid MARS-NLRM, RF-NLRM, and SVM-NLRM models provide more accurate predicted values of the maximum runout distance than NLRM, MARS, RF, and SVM under fivefold cross-validation.

Cross-Validation. To construct and validate a predictive statistical model, the 134 debris-flow datasets in the study area are randomly divided into training data for model development and testing data for model validation. Different random combinations of training and testing datasets may lead to fluctuations in model performance evaluation. To avoid bias in data selection, the k-fold cross-validation approach is employed for random selection of the training and testing data [31, 32]. k-fold cross-validation is a popular technique in machine learning and model evaluation to assess the performance and generalization ability of a model. It randomly divides the original dataset into k subsets D 1 , D 2 , …, D k of approximately equal size. One of the k subsets is used as testing data, and the other k-1 subsets are used as training data. The process is repeated k times, so that the model is trained k times with k-1 folds as the training data and the remaining fold as the validation data.
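The fivefold splitting described above can be sketched as follows (a minimal NumPy implementation; n = 134 matches the dataset size used in the paper, while the seed and function name are illustrative assumptions):

```python
import numpy as np

def kfold_splits(n, k, seed=0):
    """Shuffle indices 0..n-1 and yield (train, test) index arrays for k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# Fivefold cross-validation over the 134 debris-flow events
splits = list(kfold_splits(134, 5))
for train, test in splits:
    assert len(train) + len(test) == 134  # every event used exactly once per round
```

Each of the five rounds trains on roughly 107 events and tests on the remaining 27 (or 26), so every event serves as validation data exactly once.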

FIGURE 1: Histograms of L f and input variables (i.e., H and V D ): (a) V D , (b) H, and (c) L f .

FIGURE 4: R 2 curves of training and testing stages under fivefold cross-validation: (a) training data and (b) testing data.

FIGURE 5: RMSE curves of training and testing stages under fivefold cross-validation: (a) training data and (b) testing data.

FIGURE 6: MAE curves of training and testing stages under fivefold cross-validation: (a) training data and (b) testing data.

FIGURE 8: Comparisons of predictive performance of MARS-NLRM with single model for testing dataset (k = 1).

FIGURE 10: Comparisons of predictive performance of SVM-NLRM with single model for testing dataset (k = 1).

FIGURE 9: Comparisons of predictive performance of RF-NLRM with single model for testing dataset (k = 1).

FIGURE 11: Performance metrics of the four single predictive models and three hybrid models for the training dataset (k = 1): (a) R 2 , (b) RMSE, and (c) MAE.

FIGURE 12: Performance metrics of the four single predictive models and three hybrid models for the testing dataset (k = 1): (a) R 2 , (b) RMSE, and (c) MAE.

TABLE 1: Ranges of V D , H, and L f at training and testing stages.
FIGURE 3: Flowchart of hybrid model development.

TABLE 2: Model parameters of multi-nonlinear regression empirical models.
The R 2 values of MARS-NLRM, RF-NLRM, and SVM-NLRM at the training stage are generally larger than those of NLRM, MARS, and SVM. At the testing stage, the R 2 values of the three hybrid models are obviously larger than those of the other four stand-alone models. Although RF achieves a good training model, its predictions show a large deviation on the testing data.

TABLE 3: Performance metrics of all the predictive models under fivefold cross-validation.
From Table 3, the ranges of RMSE values for NLRM, MARS, RF, and SVM at the testing stage are 0.057-0.104, 0.061-0.257, 0.060-0.115, and 0.059-0.127, respectively, while the RMSE ranges for the three hybrid models are 0.048-0.074, 0.044-0.074, and 0.042-0.078, respectively. Compared with NLRM, the RMSE values of the hybrid models decrease by about 24%, 22%, and 21%, respectively. In contrast to MARS, RF, and SVM, the RMSE values of MARS-NLRM, RF-NLRM, and SVM-NLRM are improved by 37.5%, 26.6%, and 34.4%, respectively. Hybridization of the empirical model and machine-learning algorithms can thus significantly reduce the prediction deviation of any individual model, and the performance improvement over the individual machine-learning models is larger than that over the empirical model.
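The quoted relative improvements follow from the usual percentage-reduction formula; the sketch below illustrates it (the two RMSE values in the example are hypothetical, not taken from Table 3):

```python
def rmse_reduction_pct(rmse_single, rmse_hybrid):
    """Percentage decrease of RMSE when moving from a single model to a hybrid."""
    return (rmse_single - rmse_hybrid) / rmse_single * 100.0

# Hypothetical example: RMSE dropping from 0.080 km to 0.060 km is a 25% reduction
print(rmse_reduction_pct(0.080, 0.060))
```

The same formula applied fold by fold to the Table 3 values yields the 24%, 22%, and 21% reductions relative to NLRM reported above.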

TABLE 4: Comparisons of performance metrics of the predictive models.