Landslide Susceptibility Assessment in Vietnam Using Support Vector Machines, Decision Tree, and Naïve Bayes Models

1 Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003IMT, 1432 Aas, Norway
2 Faculty of Surveying and Mapping, Hanoi University of Mining and Geology, Dong Ngac, Tu Liem, Hanoi, Vietnam
3 Department of Civil Engineering, Spatial and Numerical Modelling Research Group, Faculty of Engineering, Universiti Putra Malaysia, Selangor, 43400 Serdang, Malaysia


Introduction
Vietnam is identified as a country that is particularly vulnerable to some of the worst manifestations of climate change such as sea level rise, flooding, and landslides. In the recent and incomplete data [48]. The main potential drawback of this method is that it requires independence of attributes. However, this method is considered to be relatively robust [49].
The main objective of this study is to investigate and compare the results of three data mining approaches, that is, SVM, DT, and NB, for the spatial prediction of landslide hazards in the Hoa Binh province (Vietnam). The main difference between this study and the aforementioned works is that SVM with two kernel functions (radial basis and polynomial kernels) and NB were applied for landslide susceptibility modeling. To assess these methods, the susceptibility maps obtained from the three data mining approaches were compared to those obtained by the logistic regression model reported by the same authors [2]. The computation was carried out using MATLAB 7.11 and LIBSVM [50] for SVM and WEKA ver. 3.6.6 (The University of Waikato, 2011) for DT and NB.

Study Area
Hoa Binh has an area of about 4,660 km² and is located between the longitudes 104°48′E and 105°50′E and the latitudes 20°17′N and 21°08′N in the northwest mountainous area of Vietnam (Figure 1). The province is hilly, with elevations ranging between 0 and 1,510 m, an average value of 315 m, and a standard deviation of 271.5 m. The terrain gradient computed from a digital elevation model (DEM) with a spatial resolution of 20 × 20 m ranges from 0° to 60°, with a mean value of 13.8° and a standard deviation of 10.4°.
More than 38 geologic formations crop out in the province (Figure 2). Six geological formations, Dong Giao, Tan Lac, Vien Nam, Song Boi, Suoi Bang, and Ben Khe, cover about 72.8% of the total area. The main lithologies are limestone, conglomerate, aphyric basalt, sandstone, silty sandstone, and black clay shale. The ages of the rocks vary from the Paleozoic to the Cenozoic, with different physical properties and chemical compositions. Five major fracture zones pass through the province, causing rock mass weakness: Hoa Binh, Da Bac, Muong La-Cho Bo, Son La-Bim Son, and Song Da.
The study area receives heavy, high-intensity rainfall, especially during tropical rainstorms, with an average annual precipitation varying from 1,353 to 1,857 mm (data for the period 1973-2002). Precipitation is most abundant from May to October, a period that accounts for 84-90% of the annual precipitation. Rainfall usually peaks in August and September, with an average of around 300 to 400 mm per month. The climate is typical of the monsoonal region: hot, rainy, and highly humid. January is usually the coldest month, with an average temperature of 14.9°C, whereas the warmest month is July, with an average temperature of 26.7°C.
Landslides occurred mostly in the rainy season, when heavy rains exceeded 100 mm per day and continued for three days. Landslides also occurred when rainfall continued for five to seven days, with rainfall larger than 100 mm on the last day. For example, landslides occurred in the Doc Cun and Doi Thai areas in September 2000, when the 7-day accumulated rainfalls were 308 and 383 mm, respectively. Many landslides occurred on 5 October 2007 in the Thung Khe, Toan Son, Phuc San, Tan Mai, Doc Cun, and surrounding areas, with 3-day accumulated rainfalls ranging from 334 to 529 mm.

Data
Landslides are assumed to occur in the future under the same conditions as past and current landslides [10]. Therefore, a landslide inventory map is considered to be the most important factor for the prediction of future landslides. The landslide inventory map portrays the spatial distribution of a single landslide event (a single trigger) or of multiple landslide events over time (historical) [51]. For the study area, the landslide inventory map (Figure 1) constructed by Tien Bui et al. [2] was used to analyze the relationships between landslide occurrence and landslide conditioning factors. The map shows 118 landslides that occurred during the last ten years, including 97 landslide polygons and 21 rock fall locations. The size of the largest landslide is 3,440 m², the smallest is 380 m², and the average landslide size is 3,440 m².
Based on previous research carried out by Tien Bui et al. [2], ten landslide conditioning factors were selected to build the landslide models and to predict the spatial distribution of landslides in this study: slope angle, slope aspect, relief amplitude, lithology, soil type, land use, distance to roads, distance to rivers, distance to faults, and rainfall. The slope angle, slope aspect, and relief amplitude were extracted from a DEM that was generated from national topographic maps at the scale of 1 : 25,000. The slope angle map was constructed with 6 categories (Figure 3(a)). The slope aspect map was constructed with nine classes: flat, north, northeast, east, southeast, south, southwest, west, and northwest. The relief amplitude, which represents the maximum difference in height per unit area [52], was constructed with 6 categories: 0-50 m, 50-100 m, 100-150 m, 150-200 m, 200-250 m, and 250-532 m. For the construction of the relief amplitude map, different sizes of the unit area were tested to choose the best one (20 × 20 pixels) using the focal statistics module in the ArcGIS 10 software.
The lithology and faults were extracted from four tiles of the Geological and Mineral Resources Map of Vietnam at the scale of 1 : 200,000. This is the only geological map available for the study area. The lithology map (Figure 3(b)) was constructed with seven groups based on clay composition, degree of weathering, estimated strength, and density [53, 54]. The distance-to-faults map was constructed by buffering the fault lines with 5 categories: 0-200 m, 200-400 m, 400-700 m, 700-1,000 m, and >1,000 m. The soil type map (Figure 3(c)) was constructed with 13 categories. The land-use map (Figure 3(d)) was constructed with twelve categories. A road network that undercuts slopes was extracted from the topographic map at the scale of 1 : 50,000. A distance-to-roads map was constructed with 4 categories: 0-40 m, 40-80 m, 80-120 m, and >120 m. A hydrological network that undercuts slopes was also extracted from the topographic map at the scale of 1 : 50,000, and a distance-to-rivers map was then constructed with 4 categories: 0-40 m, 40-80 m, 80-120 m, and >120 m.
The rainfall map was prepared using the maximum rainfall of eight days (seven rainfall days plus a last day with rainfall larger than 100 mm) for the period from 1990 to 2010, using the Inverse Distance Weighted (IDW) method. The precipitation data were extracted from a database of the Institute of Meteorology and Hydrology in Vietnam.
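As an illustration of the relief amplitude described above, the following is a minimal sketch of a focal maximum-minus-minimum statistic followed by assignment to the six listed classes. It is not the ArcGIS focal statistics tool used in the study: the square window defined by a hypothetical `radius` parameter, the edge clipping, and the handling of values that fall exactly on a class boundary are all assumptions made for the example.

```python
from bisect import bisect_right

def relief_amplitude(dem, radius=1):
    """Max-minus-min elevation in a (2*radius+1)-cell square window around each cell.

    `dem` is a list of rows of elevations; windows are clipped at the raster edges
    (an assumption, real GIS tools offer several edge policies).
    """
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = [dem[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = max(window) - min(window)
    return out

# Upper bounds (metres) of the first five relief amplitude classes listed in the text.
RELIEF_BREAKS = [50, 100, 150, 200, 250]

def relief_class(amplitude):
    """0-based class index; a value on a boundary goes to the higher class (assumption)."""
    return bisect_right(RELIEF_BREAKS, amplitude)
```

For a 3 × 3 toy DEM with elevations 1 to 9, the centre cell sees all nine values, so its amplitude is 8.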
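The IDW interpolation mentioned above can be sketched as follows. This is a generic inverse-distance-weighted estimator, not the exact configuration used for the rainfall map: the power of 2, the use of all stations for every cell, and the station tuples in the usage note are assumptions for illustration.

```python
import math

def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    `stations` is a list of (x, y, value) tuples; `power` is an assumed exponent.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The query point coincides with a station: return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

With two hypothetical stations recording 100 mm and 200 mm, a point exactly halfway between them receives the average of the two values, and points closer to one station are pulled toward that station's value.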

Support Vector Machines (SVM)
Support vector machines are a relatively new supervised learning method based on statistical learning theory and the structural risk minimization principle [55]. Using the training data, SVM implicitly maps the original input space into a high-dimensional feature space. Subsequently, in the feature space, the optimal hyperplane is determined by maximizing the margins of class boundaries [56]. The training points that are closest to the optimal hyperplane are called support vectors. Once the decision surface is obtained, it can be used for classifying new data.
Consider a training dataset of instance-label pairs $(x_i, y_i)$ with $x_i \in R^n$, $y_i \in \{1, -1\}$, and $i = 1, \ldots, m$. In the current context of landslide susceptibility, $x$ is a vector of the input space that contains slope angle, lithology, rainfall, soil type, slope aspect, land use, distance to roads, distance to rivers, distance to faults, and relief amplitude. The two classes $\{1, -1\}$ denote landslide pixels and no-landslide pixels. The aim of the SVM classification is to find an optimal separating hyperplane that can distinguish the two classes, that is, landslides and no landslides $\{1, -1\}$, from the mentioned set of training data.
For the case of linearly separable data, a separating hyperplane can be defined as

$$y_i (w \cdot x_i + b) \geq 1 - \xi_i,$$

where $w$ is a coefficient vector that determines the orientation of the hyperplane in the feature space, $b$ is the offset of the hyperplane from the origin, and $\xi_i$ are the positive slack variables [57].
The determination of an optimal hyperplane leads to solving the following optimization problem using Lagrangian multipliers [58]:

$$\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{subject to} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C,$$

where $\alpha_i$ are Lagrange multipliers, $C$ is the penalty, and the slack variables $\xi_i$ allow for penalized constraint violation. The decision function, which will be used for the classification of new data, can then be written as

$$g(x) = \operatorname{sign}\left( \sum_{i=1}^{m} y_i \alpha_i (x_i \cdot x) + b \right).$$

In cases when it is impossible to find the separating hyperplane using the linear kernel function, the original input data may be transferred into a high-dimensional feature space through nonlinear kernel functions. The classification decision function is then written as

$$g(x) = \operatorname{sign}\left( \sum_{i=1}^{m} y_i \alpha_i K(x_i, x) + b \right),$$

where $K(x_i, x)$ is the kernel function evaluated between the training point $x_i$ and the new point $x$.
The choice of the kernel function is crucial for successful SVM training and classification accuracy [59]. Four groups of kernel functions are commonly used in SVM: the linear kernel (LN), the polynomial kernel (PL), the radial basis function (RBF) kernel, and the sigmoid kernel (SIG). The LN is considered to be a specific case of the RBF, whereas the SIG behaves like the RBF for certain parameters [60]. According to Keerthi and Lin [61], the LN is not needed when the RBF is used. In general, the classification accuracy of the SIG may not be better than that of the RBF [62]. Therefore, in this study, only two kernel functions, RBF and PL, were selected. According to Zhu et al. [63], the main advantage of the RBF is its good interpolation ability; however, it may fail to provide longer-range extrapolation. In contrast, the PL has better extrapolation abilities at lower-order degrees but requires higher-order degrees for good interpolation. The formulas and their parameters are shown in Table 2.
The performance of the SVM model depends on the choice of the kernel parameters. For the RBF-SVM, the regularization parameter C and the kernel width γ are the two parameters that need to be determined, whereas for the PL-SVM three parameters need to be determined: C, γ, and the degree of the polynomial kernel d. Parameter C controls the tradeoff between training errors and margin, which helps to control overfitting of the model. A large value of C leads to few training errors, whereas a small value of C generates a larger margin and thus increases the number of training errors [64]. Parameter γ controls the degree of nonlinearity of the SVM model. Parameter d defines the degree of the polynomial kernel.
The process of picking the parameter combination that produces the best classification result is considered to be an important research issue in the data mining area [65]. Many methods have been proposed, such as heuristic parameter selection [66], the gradient descent algorithm [67], the Levenberg-Marquardt method [68], and the cross-validation method [69]. However, the grid search method, which is widely used for determining SVM parameters, is still considered to be the most reliable optimization method [70] and was selected for this study. Firstly, the range and step size of each parameter were determined. Secondly, the grid search was performed by varying the SVM hyperparameters. Finally, the performance of every combination was assessed to find the best parameters. However, the grid search is only suitable for adjusting a small number of parameters because of its computational complexity [71].
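To make the roles of γ and d concrete, the following is a minimal sketch of the two kernels and of a kernel decision function. The kernel forms used here, exp(-γ‖x - z‖²) for RBF and (γ x·z + coef0)^d for PL, are the common textbook parameterizations and are assumed rather than taken from Table 2; `coef0`, the support vectors, and the multipliers in the usage note are hypothetical.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel: larger gamma gives a narrower, more nonlinear kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def poly_kernel(x, z, gamma, degree, coef0=0.0):
    """Polynomial kernel of the given degree; coef0 is an assumed offset term."""
    return (gamma * sum(a * b for a, b in zip(x, z)) + coef0) ** degree

def decide(x, support_vectors, labels, alphas, b, kernel):
    """Sign of the kernel expansion over the support vectors: +1 landslide, -1 no landslide."""
    total = sum(y * a * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if total + b >= 0 else -1
```

For instance, with two hypothetical support vectors `[0, 0]` (label -1) and `[2, 2]` (label +1), equal multipliers, `b = 0`, and `kernel=lambda u, v: rbf_kernel(u, v, 0.25)`, a new point is assigned to the class of the nearer support vector.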

Decision Tree (DT)
A DT is a hierarchical model composed of decision rules that recursively split independent variables into homogeneous zones [72]. The objective of DT building is to find the set of decision rules that can be used to predict the outcome from a set of input variables. A DT is called a classification or a regression tree if the target variables are discrete or continuous, respectively [73]. DT has been applied successfully in many real-world situations for classification and prediction [74].
The main advantage of DT models is their capability of modeling complex relationships between variables. They can incorporate both categorical and continuous variables without strict assumptions with respect to the distribution of the data [75]. In addition, DTs are easy to construct, and the resulting models can be easily interpreted. Furthermore, the DT model results provide clear information on the relative importance of input factors [76]. The main disadvantage of DTs is that they are susceptible to noisy data and that multiple output attributes are not allowed [77].
Many algorithms for constructing decision tree models have been proposed in the literature, such as classification and regression tree (CART) [78], chi-square automatic interaction detector decision tree (CHAID) [79], ID3 [80], and C4.5 [81]. In this study, the J48 algorithm [82], which is a Java reimplementation of the C4.5 algorithm, was used. C4.5 uses an entropy-based measure as the selection criterion and is considered to be the fastest algorithm for machine learning with good classification accuracy [83]. Given a training dataset $T$ with subsets $T_i$, $i = 1, 2, \ldots, s$, the C4.5 algorithm constructs a DT using a top-down, recursive-splitting technique. A tree structure consists of a root node, internal nodes, and leaf nodes. The root node contains all the input data. An internal node can have two or more branches and is associated with a decision function. A leaf node indicates the output of a given input vector.
The procedure of DT modeling consists of two steps: (1) tree building and (2) tree pruning [84]. Tree building begins by determining the input variable with the highest gain ratio as the root node of the DT. Then the training dataset is split based on the root values, and subnodes are created. For discrete input variables, a subnode of the tree is created for each possible value. For continuous input variables, two subnodes are created based on a threshold that is determined in the threshold-finding process [81]. In the next step, the gain ratio is calculated for all the subnodes individually, and the process is repeated until all examples in a node belong to the same class. Those nodes are called leaf nodes and are labeled with class values.
Since the tree obtained in the building step may have a large number of branches and may therefore cause overfitting [85], the tree needs to be pruned to achieve better classification accuracy on new data. There are two types of tree pruning: pre-pruning and post-pruning. In pre-pruning, the growing of the tree is stopped when a certain criterion is satisfied, whereas in post-pruning the full tree is constructed first, and then the ending subtrees are replaced by leaves based on a comparison of the error of the tree before and after replacing the subtrees.
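The information gain ratio for attribute $A$ is as follows:

$$\begin{aligned}
\text{GainRatio}(A, T) &= \frac{\text{Gain}(A, T)}{\text{SplitInfo}(A, T)}, \quad \text{where} \\
\text{Gain}(A, T) &= \text{Entropy}(T) - \sum_{i=1}^{s} \frac{|T_i|}{|T|} \, \text{Entropy}(T_i), \\
\text{SplitInfo}(A, T) &= -\sum_{i=1}^{s} \frac{|T_i|}{|T|} \log_2 \frac{|T_i|}{|T|}.
\end{aligned} \tag{3.6}$$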
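The gain ratio criterion can be illustrated with a small sketch for a discrete attribute. It follows the standard C4.5 definitions (entropy-based gain divided by split information) and is not the J48 source code; returning 0 when the split information is zero is an assumption made to avoid division by zero.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio of a discrete attribute whose values split `labels` into subsets T_i."""
    n = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(s) / n * entropy(s) for s in subsets.values())
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets.values())
    # A single-valued attribute carries no split information (assumed to score 0).
    return gain / split_info if split_info > 0 else 0.0
```

An attribute that separates the two classes perfectly into two equal halves has a gain of 1 bit and a split information of 1 bit, so its gain ratio is 1; an attribute that leaves each subset as mixed as the parent has a gain ratio of 0.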
A DT can estimate the probability that an observation belongs to a specific class, and this probability is used to predict the probability of landslide occurrence for each pixel. The estimated probability is based on the natural frequency at the tree leaf. However, this raw frequency might not give sound probabilistic estimates; therefore, Laplace smoothing [86] was used in this study.
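The effect of Laplace smoothing at a leaf can be shown with a short sketch. The form used here, (k + 1)/(n + c) for k landslide instances among n instances in a leaf and c classes, is the common Laplace correction and is assumed to correspond to the smoothing applied in the study.

```python
def laplace_leaf_probability(n_landslide, n_total, n_classes=2):
    """Smoothed probability of the landslide class at a leaf.

    Adds one pseudo-count per class so that small, pure leaves do not
    report probabilities of exactly 0 or 1.
    """
    return (n_landslide + 1) / (n_total + n_classes)
```

A pure leaf with 7 landslide instances out of 7 yields 8/9 ≈ 0.889 instead of 1, whereas a pure leaf with 288 out of 288 yields 289/290 ≈ 0.997, so larger leaves are trusted more.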

Naïve Bayes (NB)
An NB classifier is a classification system based on Bayes' theorem that assumes that all the attributes are fully independent given the output class, which is called the conditional independence assumption [48]. The main advantage of the NB classifier is that it is very easy to construct, without needing any complicated iterative parameter estimation schemes [40]. In addition, the NB classifier is robust to noise and irrelevant attributes. This method has been successfully applied in many fields [87]. Consider an observation consisting of $k$ attributes $x_i$, $i = 1, 2, \ldots, k$ ($x_i$ is a landslide conditioning factor), and let $y_j$, $j \in \{\text{landslide}, \text{no landslide}\}$, be the output class. NB estimates the probability $P(y_j \mid x_i)$ for all possible output classes. The prediction is made for the class with the largest posterior probability as

$$y_{NB} = \underset{y_j \in \{\text{landslide},\, \text{no landslide}\}}{\operatorname{argmax}} \; P(y_j) \prod_{i=1}^{k} P(x_i \mid y_j). \tag{3.7}$$

The prior probability $P(y_j)$ can be estimated using the proportion of the observations with output class $y_j$ in the training dataset. The conditional probability is calculated using

$$P(x_i \mid y_j) = \frac{1}{\sqrt{2\pi}\,\delta} \exp\left( -\frac{(x_i - \mu)^2}{2\delta^2} \right),$$

where $\mu$ is the mean and $\delta$ is the standard deviation of $x_i$.
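The NB decision rule with Gaussian conditional probabilities can be sketched as below. This is a minimal illustration, not the WEKA implementation: the population standard deviation, the small floor on δ, and the use of log-probabilities for numerical stability are assumptions, and the data in the usage note are made up.

```python
import math

def fit(X, y):
    """Estimate the prior and per-attribute (mean, std) for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, max(math.sqrt(variance), 1e-9)))  # floor avoids std = 0
        model[label] = (len(rows) / len(y), stats)
    return model

def log_gaussian(x, mean, std):
    """Logarithm of the Gaussian density with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def predict(model, x):
    """Class with the largest log prior plus summed log conditional densities."""
    def log_posterior(label):
        prior, stats = model[label]
        return math.log(prior) + sum(
            log_gaussian(v, m, s) for v, (m, s) in zip(x, stats))
    return max(model, key=log_posterior)
```

Trained on four hypothetical observations, two with low rescaled factor values labeled 0 and two with high values labeled 1, the classifier assigns a new observation to the class whose attribute means it is closest to.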

Performance Evaluation
The performances of the trained landslide models were assessed using several statistical evaluation criteria based on counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The TP rate (sensitivity) measures the proportion of landslide pixels that are correctly classified as landslides and is defined as TP/(TP + FN). The TN rate (specificity) measures the proportion of non-landslide pixels that are correctly classified as non-landslide and is defined as TN/(TN + FP). Precision measures the proportion of pixels classified as landslides that are actual landslide occurrences and is defined as TP/(TP + FP). Overall accuracy is calculated as (TP + TN)/total number of training pixels. The F-measure combines precision and sensitivity into their harmonic mean and is defined as 2 × Precision × Sensitivity/(Precision + Sensitivity) [88].
In order to measure the reliability of the landslide susceptibility models, the Cohen kappa index (κ) [89-91] was used to assess the model classification compared to chance selection:

$$\kappa = \frac{P_C - P_{exp}}{1 - P_{exp}},$$

where $P_C$ is the proportion of pixels that are correctly classified as landslide or non-landslide and is calculated as (TP + TN)/total number of pixels, and $P_{exp}$ is the expected agreement, calculated as ((TP + FN)(TP + FP) + (FP + TN)(FN + TN))/(total number of training pixels)².
A κ value of 0 indicates that no agreement exists between the landslide model and reality, whereas a κ value of 1 indicates perfect agreement. A negative κ value indicates poor agreement. A κ value in the range 0.80-1 is considered an indicator of almost perfect agreement, while a value in the range 0.60-0.80 indicates substantial agreement between the model and reality. For a value in the interval 0.40-0.60, the agreement is moderate, and values of 0.20-0.40 and <0.2 indicate fair and slight agreement, respectively [92].
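The evaluation criteria above can be computed directly from the four confusion-matrix counts, as in the following sketch. The function name and the counts in the usage note are illustrative only; the F-measure is computed as the harmonic mean of precision and sensitivity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision, accuracy, and F-measure from raw counts."""
    sensitivity = tp / (tp + fn)   # TP rate
    specificity = tn / (tn + fp)   # TN rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }
```

For hypothetical counts TP = 90, FP = 10, TN = 90, and FN = 10, every criterion equals 0.9.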
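A short sketch of the kappa computation and of the agreement scale described above follows. The expected agreement divides by the square of the total number of pixels; assigning a value that falls exactly on a class boundary to the higher category is an assumption.

```python
def cohen_kappa(tp, fp, tn, fn):
    """Cohen kappa index from confusion-matrix counts."""
    n = tp + fp + tn + fn
    p_c = (tp + tn) / n
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return (p_c - p_exp) / (1 - p_exp)

def agreement_label(kappa):
    """Verbal agreement category for a kappa value."""
    if kappa < 0:
        return "poor"
    if kappa < 0.2:
        return "slight"
    if kappa < 0.4:
        return "fair"
    if kappa < 0.6:
        return "moderate"
    if kappa < 0.8:
        return "substantial"
    return "almost perfect"
```

With hypothetical counts TP = 90, FP = 10, TN = 90, and FN = 10, the observed agreement is 0.9 and the expected agreement is 0.5, which gives κ = 0.8.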

Preparation of the Training and the Validation Datasets
In this study, a total of ten landslide conditioning factors were used: slope angle, lithology, rainfall, soil type, slope aspect, land use, distance to roads, distance to rivers, distance to faults, and relief amplitude. For each conditioning factor, a map was generated. These maps were then converted into a pixel format with a spatial resolution of 20 × 20 m. In the next step, frequency ratio values [93] were calculated for all categories based on the landslide grid cells. Based on these ratio values, each category was assigned an attribute number and was then rescaled to the range 0.1 to 0.9 (Table 1) using the Max-Min normalization procedure [94] as follows:

$$v' = \frac{v - \min(v)}{\max(v) - \min(v)} (U - L) + L,$$

where $v'$ is the normalized data matrix, $v$ is the original data matrix, and $U$ and $L$ are the upper and lower normalization boundaries. In landslide modeling, the landslide data should be split into two parts, training and validation datasets. Without the splitting, it would not be possible to validate the results [95]. In this study, the landslide inventory map with 118 landslide polygons was randomly split into two subsets: subset 1 comprised 70% of the data (82 landslides with 684 landslide grid cells) and was used in the training phase of the landslide models; subset 2 is a validation dataset with 30% of the data (36 landslides with 315 landslide grid cells) and was used to validate the resulting models and estimate their prediction accuracy.
All of the 684 landslide grid cells in subset 1 were assigned the value of 1. The performance of SVM may be seriously degraded when the numbers of landslide and non-landslide grid cells in the training dataset are significantly unbalanced. Therefore, the same number of no-landslide grid cells was randomly sampled from the landslide-free area and assigned the value of −1. In the cases of the DT and NB classifiers, no-landslide grid cells were assigned the value 0. Finally, an extraction process was conducted to extract values of the ten landslide conditioning factors to build a training dataset. This dataset contains a total of 1368 observations, ten input variables, and one target variable (landslide, no landslide).
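The rescaling step described above can be sketched as follows. The `frequency_ratio` helper uses the usual definition (share of landslide cells in a category divided by the share of all cells in that category), which is assumed here because the text does not spell it out; the fallback for constant input and the numbers in the usage note are also illustrative assumptions.

```python
def frequency_ratio(landslide_in_class, landslide_total, cells_in_class, cells_total):
    """Share of landslide cells in a category relative to the category's share of area."""
    return (landslide_in_class / landslide_total) / (cells_in_class / cells_total)

def min_max_rescale(values, lower=0.1, upper=0.9):
    """Max-Min normalization of `values` into [lower, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: map everything to the midpoint (assumption).
        return [(lower + upper) / 2 for _ in values]
    return [(v - lo) / (hi - lo) * (upper - lower) + lower for v in values]
```

For example, attribute numbers 1 to 5 are mapped to 0.1, 0.3, 0.5, 0.7, and 0.9, so the category with the lowest value receives 0.1 and the one with the highest receives 0.9.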
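The construction of a balanced training set can be sketched as below. The cell identifiers, the fixed random seed, and the function name are hypothetical; the sketch only shows equal-size random sampling of landslide-free cells and the two label conventions (−1 for SVM, 0 for DT and NB).

```python
import random

def build_balanced_labels(landslide_cells, landslide_free_cells, negative_label=-1, seed=0):
    """Pair every landslide cell with label 1 and an equal-size random sample
    of landslide-free cells with `negative_label`, then shuffle."""
    rng = random.Random(seed)
    negatives = rng.sample(landslide_free_cells, len(landslide_cells))
    data = [(cell, 1) for cell in landslide_cells]
    data += [(cell, negative_label) for cell in negatives]
    rng.shuffle(data)
    return data
```

With 684 landslide cells, the result holds 1368 labeled cells, half of them positive; passing `negative_label=0` produces the encoding used for the DT and NB classifiers.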

Training of the Support Vector Machines, Decision Tree, and Naïve Bayes Models and Generation of Landslide Susceptibility Indexes

Support Vector Machines (SVM)
In the case of SVM, model selection and the search for optimal parameters play a crucial role in the performance of the model. In this study, RBF and PL kernel functions were selected. The training process was started by searching for the optimal kernel parameters using the grid search method with cross-validation, which can help to prevent overfitting. Since the number of landslide grid cells in the study area is not large, 5-fold cross-validation was used to find the best kernel parameters. The training dataset was randomly split into 5 equally sized subsets. Each subset was used as a test dataset for the SVM model trained on the remaining 4 subsets, so the cross-validation process was repeated five times, with each of the five subsets used once as the test dataset.
With the RBF kernel, the two parameters C and γ need to be determined. The procedure is as follows: (1) set a grid space of (C, γ), where $C = 2^{-5}, 2^{-4}, \ldots, 2^{10}$ and $\gamma = 2^{10}, 2^{9}, \ldots, 2^{-4}$; (2) for each parameter pair (C, γ) in the grid space, conduct 5-fold cross-validation on the training dataset; (3) choose the parameter pair (C, γ) that has the highest classification accuracy; (4) use the best parameters to construct an SVM model for landslide prediction on new data. The best C and γ were determined to be 8 and 0.25, respectively. The correctly classified rate is 91.1%.
With the PL kernel, the parameters C, γ, and d need to be determined. Table 3 shows the results of training the SVM model using different d values. The results show that as d increases, the AUC on the training dataset increases as well. However, the AUC on the validation dataset increases until d equals 3 and then decreases as d increases further. Therefore, the SVM model with a polynomial kernel of degree three was selected. The correctly classified rate of SVM using the PL kernel is 91.1%. The best C and γ were determined to be 1 and 0.3536, respectively.
A detailed accuracy assessment for RBF-SVM and PL-SVM is shown in Tables 4 and 5. It can be seen that precision, F-measure, and TP rate are high (>90%), whereas the FP rate is low (<10%). This indicates a high classification capacity on the training dataset for the two models. The Cohen kappa indexes are 0.822 and 0.823 for RBF-SVM and PL-SVM, respectively, which indicates good agreement between the observed and the predicted values.
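The 5-fold splitting described above can be sketched with a small index generator. The shuffling seed and the interleaved assignment of indices to folds are assumptions; when the number of observations is not divisible by 5, the folds differ in size by at most one.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

For the 1368 training observations, this produces five train/test pairs in which every observation appears in exactly one test fold.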
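The grid search over (C, γ) can be sketched as a loop over the exponent grid that keeps the best-scoring pair. The `score` callback stands for the mean 5-fold cross-validation accuracy of an SVM trained with the given parameters; it is a hypothetical placeholder, and `toy_score` below is an artificial function used only to exercise the loop, not a reproduction of the study's results.

```python
import math
from itertools import product

def grid_search(score, c_exponents=range(-5, 11), gamma_exponents=range(10, -5, -1)):
    """Return (best_score, C, gamma) over C = 2^-5..2^10 and gamma = 2^10..2^-4."""
    best = None
    for c_exp, g_exp in product(c_exponents, gamma_exponents):
        c, gamma = 2.0 ** c_exp, 2.0 ** g_exp
        accuracy = score(c, gamma)  # e.g. mean cross-validation accuracy
        if best is None or accuracy > best[0]:
            best = (accuracy, c, gamma)
    return best

def toy_score(c, gamma):
    """Artificial score peaking at C = 8, gamma = 0.25 (illustration only)."""
    return -abs(math.log2(c) - 3) - abs(math.log2(gamma) + 2)
```

The grid contains 16 × 15 = 240 parameter pairs, so with 5-fold cross-validation a full search trains 1200 models, which illustrates why the method is practical only for a small number of parameters.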

Decision Tree (DT)
In the case of DT, the first step is to determine the optimal values of the algorithm parameters, namely, the minimum number of instances (MNI) per leaf and the confidence factor (CF). The lower the MNI required per leaf, the more branching is created, resulting in a larger tree, which may cause overfitting. In contrast, a higher MNI required per leaf results in a smaller tree. Figure 4 shows the MNI required per leaf versus the classification accuracy. In this test, the MNI required in a leaf was varied from 1 to 25 with a step of one, and the corresponding classification accuracies were obtained and plotted. The result shows that the highest classification accuracy is 92.8%, corresponding to an MNI of 6. Therefore, an MNI per leaf of 6 was selected.
In order to explore the effect of the CF on the classification accuracy, the CF value was varied from 0.1 to 1 using a step size of 0.05, and the corresponding classification accuracy was calculated. The result is shown in Figure 5. The highest classification accuracy occurred with a CF of 0.35; therefore, a CF of 0.35 was selected. With the two aforementioned parameters determined, the decision tree model was constructed using the J48 algorithm with 10-fold cross-validation. The probability of belonging to the landslide or the no-landslide class for each observation was estimated using Laplace smoothing. The correctly classified rate is 92.9%. The Cohen kappa index is 0.860. A detailed accuracy assessment of the decision tree model by class is shown in Tables 4 and 5. It can be observed that the TP rate, the precision, and the F-measure are greater than 90%. FP rates are 9.5% and 4.5% for the landslide and the non-landslide classes, respectively.
Figure 6 depicts the inferred DT model for landslide susceptibility assessment in this study. The size of the tree is 55, including the root node, 26 internal nodes, and 28 leaves (green rectangular boxes). In leaf nodes, a value of 0.1 indicates the no-landslide class, whereas a value of 0.9 indicates the landslide class. The number in parentheses at each leaf node represents the number of instances in that leaf. Some instances are misclassified in some leaves; the number of misclassified instances is specified after a slash (Figure 6). The highest number of instances in a leaf node is 288, whereas the lowest number of instances in a leaf node is 7. The top-down induction of the tree shows that landslide conditioning factors at higher levels of the tree are more important. The relative importance of the landslide conditioning factors is as follows: distance to roads (81.5% in relative importance), slope (71.6%), land use (66.7%), aspect (61.1%), rainfall (61.5%), relief amplitude (61.6%), distance to rivers (60.1%), distance to faults (58.7%), lithology (57.7%), and soil type (52.8%).

Naïve Bayes (NB)
In the case of the NB classifier, the probability is first calculated for each output class (landslide, no landslide), and the classification is then made for the class with the largest posterior probability. The NB model was constructed using the WEKA software. The NB model obtained an overall classification accuracy of 86.1% on average. The TP rate, precision, and F-measure vary from 83% to 89%. The Cohen kappa index of 0.722 indicates that the strength of agreement between the observed and the predicted values is substantial. A summary of the model assessment and performance is shown in Tables 4 and 5.

Mathematical Problems in Engineering
Once the SVM, DT, and NB models were successfully trained in the training phase, they were used to calculate the landslide susceptibility indexes (LSIs) for all the pixels in the study area. The results were then transferred into a GIS and loaded in the ArcGIS 10 software for visualization.

Success Rate and Prediction Rate for Landslide Susceptibility Maps
The validation of the four landslide susceptibility maps was performed by comparing them with the landslide locations using the success-rate and prediction-rate methods [95]. The success-rate results were obtained using the landslide grid cells in the training dataset. Figure 7 shows the success-rate curves of the four landslide susceptibility maps obtained from the RBF-SVM, PL-SVM, DT, and NB models in this study in comparison with the logistic regression model. It can be observed that RBF-SVM and logistic regression have the highest area under the curve, with AUC values of 0.961 and 0.962, respectively. They are followed by PL-SVM (0.956), DT (0.952), and NB (0.935). Based on these results, we can conclude that the capability of correctly classifying the areas with existing landslides is highest for the RBF-SVM (on par with logistic regression), followed by the PL-SVM, DT, and NB.
Since the success-rate method uses the landslide pixels in the training dataset that have already been used for constructing the landslide models, the success rate may not be a suitable measure of the prediction capability of the landslide models [96]. According to Chung and Fabbri [95], the prediction rate can be used to estimate the prediction capability of the landslide models. In this study, the prediction-rate results of the four landslide susceptibility models were obtained by comparing them with the landslide grid cells in the validation dataset. The areas under the prediction-rate curves (AUCs) were then estimated. The closer the AUC value is to 1, the better the landslide model.
The prediction-rate curves and AUCs of the four landslide susceptibility maps are shown in Figure 8. The results show that AUCs for the four models vary from 0.909 to 0.955, which indicates that all the models have a good prediction capability. The highest prediction capability is for RBF-SVM and PL-SVM, with AUC values of 0.954 and 0.955, respectively. They are followed by NB (0.935) and DT (0.907). Compared with the logistic regression (AUC of 0.938), which used the same data, the prediction capability of the two SVM models may be slightly better, whereas the prediction capability of DT and NB is lower.
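The area under a success-rate or prediction-rate curve can be sketched as follows: cells are ranked by descending susceptibility index, the cumulative share of landslide cells is plotted against the cumulative share of the study area, and the area is integrated with the trapezoidal rule. This is a simplified illustration that ignores tied index values; the input lists in the usage note are made up.

```python
def rate_curve_auc(lsi, is_landslide):
    """Area under the cumulative landslide share versus cumulative area share curve.

    `lsi` holds one susceptibility index per cell; `is_landslide` holds 1 for
    landslide cells (training cells for the success rate, validation cells
    for the prediction rate) and 0 otherwise.
    """
    order = sorted(range(len(lsi)), key=lambda i: lsi[i], reverse=True)
    total_landslides = sum(is_landslide)
    step = 1.0 / len(lsi)          # each cell adds the same share of area
    captured = 0
    previous_share = 0.0
    area = 0.0
    for i in order:
        captured += is_landslide[i]
        share = captured / total_landslides
        area += (previous_share + share) / 2 * step   # trapezoid for this step
        previous_share = share
    return area
```

A map that ranks all landslide cells first approaches an area of 1, while a map that ranks them last approaches 0; for four cells with indexes 0.9, 0.8, 0.2, and 0.1, where the first two are landslides, the area is 0.75.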

Reclassification of Landslide Susceptibility Indexes
The landslide susceptibility indexes were reclassified into four relative susceptibility classes: high, moderate, low, and very low. In this study, the classification method proposed by Pradhan and Lee [8] was used to determine the landslide susceptibility class breaks based on percentage of area: high (10%), moderate (10%), low (20%), and very low (60%) (Figure 9).
Landslide density analysis was performed on the four landslide susceptibility classes [97]. Landslide density is defined as the ratio of landslide pixels to the total number of pixels in the susceptibility class. An ideal landslide susceptibility map has a landslide density that increases from the very-low- to the high-susceptibility class [32]. A plot of the landslide density for the four susceptibility classes of the four landslide susceptibility models (RBF-SVM, PL-SVM, DT, and NB) is shown in Figure 10. It can be observed that the landslide density gradually increases from the very-low- to the high-susceptibility class. Figure 11 shows the landslide susceptibility maps obtained using the RBF-SVM, PL-SVM, DT, and NB models.
Table 6 shows the characteristics of the four susceptibility classes of the four maps of the study area. It can be observed that the percentages of existing landslide pixels in the high class are 87.2%, 87.5%, 90.7%, and 81.3% for RBF-SVM, PL-SVM, DT, and NB, respectively. In contrast, 80% of the pixels in the study area are in the low- and very-low-susceptibility classes. These maps satisfy two spatially effective rules [98]: (1) the existing landslide pixels should belong to the high-susceptibility class and (2) the high-susceptibility class should cover only small areas.
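The area-based reclassification can be sketched by ranking cells from highest to lowest index and cutting the ranking at the cumulative area shares. The rounding of class boundaries to whole cells and the arbitrary ordering of tied index values are assumptions of this sketch.

```python
AREA_SHARES = (("high", 0.10), ("moderate", 0.10), ("low", 0.20), ("very low", 0.60))

def reclassify_by_area(lsi, shares=AREA_SHARES):
    """Assign each cell a susceptibility class so that classes cover fixed shares of area."""
    order = sorted(range(len(lsi)), key=lambda i: lsi[i], reverse=True)
    classes = [None] * len(lsi)
    start = 0
    cumulative = 0.0
    for name, share in shares:
        cumulative += share
        end = round(cumulative * len(lsi))   # boundary rounded to a whole cell
        for i in order[start:end]:
            classes[i] = name
        start = end
    return classes
```

For ten cells with distinct indexes, the single highest cell is classified as high, the next one as moderate, the next two as low, and the remaining six as very low.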
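Landslide density per susceptibility class follows directly from its definition, as in this short sketch; the class labels and flags in the usage note are made-up inputs.

```python
from collections import Counter

def landslide_density(classes, is_landslide):
    """Ratio of landslide pixels to all pixels, computed per susceptibility class."""
    totals = Counter(classes)
    hits = Counter(c for c, flag in zip(classes, is_landslide) if flag)
    return {c: hits[c] / totals[c] for c in totals}
```

For example, if one of two pixels in the high class and one of four pixels in the low class are landslide pixels, the densities are 0.5 and 0.25, which matches the expected pattern of density increasing toward the high-susceptibility class.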

Discussions and Conclusions
This paper presents a comparative study of three data mining approaches SVM, DT, and NB for landslide susceptibility mapping in the Hoa Binh province Vietnam .The landslide inventory was constructed with 118 polygons of landslides that occurred during the last ten years.A total of ten landslide conditioning factors were used in this analysis, including slope  angle, lithology, rainfall, soil type, slope aspect, landuse, distance to roads, distance to rivers, distance to faults, and relief amplitude.For building the models, a training dataset was extracted with 70% of the landslide inventory, whereas the remaining landslide inventory was used for the assessment of the prediction capability of the models.Using the three data mining algorithms, SVM, DT, and NB, the landslide susceptibility maps were produced.These maps present spatial predictions of landslides.They do not include information "when" and "how frequently" landslides will occur.
In the case of SVM, the selection of the kernel function and its parameters plays an important role in landslide susceptibility assessment. For the RBF function, the best values of the parameters C and γ are 8 and 0.25, respectively. For the PL function, the degree of the polynomial had a significant effect on the model. The SVM model with a polynomial degree of 3 has the highest accuracy; its best values of C and γ are 1 and 0.3536, respectively. In the case of DT, the landslide susceptibility index was calculated as the probability that an observation belongs to the landslide class, estimated using Laplace smoothing. For building the DT model, the selection of the MNI per leaf and the CF largely affected the accuracy of the model. In this study, the best decision tree model was found with an MNI per leaf of 6 and a CF of 0.35. The relative importance of the landslide conditioning factors is as follows: distance to roads, slope angle, landuse, slope aspect, rainfall, relief amplitude, distance to rivers, distance to faults, lithology, and soil type. In the case of NB, the application for landslide modeling is relatively robust. This is not a time-consuming method, the logistic regression. Additionally, the findings also agree with Marjanović et al. [101], who reported that SVM outperformed the logistic regression and DT. Similarly, the results also agree with Ballabio and Sterlacchini [102], who concluded that SVM was found to outperform the logistic regression, linear discriminant, and NB.
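To make the reported parameters concrete, the following sketch evaluates the two kernel functions in the form used by LIBSVM and a Laplace-smoothed leaf probability. It is a sketch under stated assumptions: the offset coef0 of the polynomial kernel is not given in this section and is set to 0 here, the two-class form of Laplace smoothing, (k + 1) / (n + 2), is assumed, and the feature vectors and leaf counts are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, y, gamma, degree, coef0=0.0):
    """Polynomial kernel: (gamma * <x, y> + coef0)^degree; coef0 is an assumption."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def laplace_leaf_probability(n_landslide, n_total, n_classes=2):
    """Laplace-smoothed share of landslide samples in a decision tree leaf."""
    return (n_landslide + 1) / (n_total + n_classes)


# Hypothetical normalized feature vectors (assumption).
u = [0.2, 0.7, 0.1]
v = [0.4, 0.5, 0.3]

k_rbf = rbf_kernel(u, v, gamma=0.25)                       # reported RBF gamma
k_pl = polynomial_kernel(u, v, gamma=0.3536, degree=3)     # reported PL gamma, degree
p_leaf = laplace_leaf_probability(n_landslide=5, n_total=6)  # hypothetical leaf counts
```

Laplace smoothing keeps the leaf probability away from exactly 0 or 1, so small leaves (here at least 6 instances) still yield a graded susceptibility index rather than a hard class label.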
The reliability of the landslide models was assessed using the Cohen kappa index (κ). In this study, the kappa indexes are 0.822, 0.823, and 0.860 for RBF-SVM, PL-SVM, and DT, respectively, indicating almost perfect agreement between the observed and the predicted values. The Cohen kappa index for NB is 0.722, indicating substantial agreement between the observed and the predicted values. The reliability analysis results are satisfactory compared with other works such as Guzzetti et al. [91] and Saito et al. [44].
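The kappa index compares the observed agreement with the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). A minimal sketch follows; the confusion matrix is hypothetical, and the labelling bands (0.61 to 0.80 substantial, 0.81 to 1.00 almost perfect) are assumed to be the Landis and Koch scale, which matches the wording used above.

```python
def cohen_kappa(confusion):
    """Cohen kappa from a square confusion matrix (rows observed, columns predicted)."""
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Upper bands of the Landis and Koch scale (assumed interpretation)."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    return "moderate or lower"


# Hypothetical landslide / non-landslide confusion matrix (assumption).
example = [[45, 5],
           [5, 45]]
kappa_example = cohen_kappa(example)

# Kappa values reported in the text.
labels = {name: agreement_label(k) for name, k in
          [("RBF-SVM", 0.822), ("PL-SVM", 0.823), ("DT", 0.860), ("NB", 0.722)]}
```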
Landslide susceptibility maps are considered to be a useful tool for territorial planning, disaster management, and natural hazard mitigation. This study shows that SVM can be considered a powerful tool for landslide susceptibility assessment with high accuracy. As a final conclusion, the results obtained from the study can provide very useful information for decision making and policy planning in landslide areas.

Figure 1: Landslide inventory map of the study area.

Figure 2: Geologic map of the study area.

Figure 3: Landslide conditioning factor maps: (a) slope, (b) lithology, (c) soil type, and (d) landuse.

Figure 4: Minimum number of instances per leaf versus classification accuracy.

Figure 6: Decision tree model for landslide susceptibility assessment for the study area.

Figure 7: Success-rate curves and areas under the curves (AUCs) of RBF-SVM, PL-SVM, DT, and NB models in comparison with the logistic regression model.

Figure 9: Percentage of landslides against percentage of landslide susceptibility maps using RBF-SVM, PL-SVM, DT, and NB models.

Figure 10: Landslide density plots of four landslide susceptibility classes of RBF-SVM, PL-SVM, DT, and NB models.

Table 1: Normalized classes of landslide conditioning factors used.

Table 2: RBF and PL kernels and their parameters.

Table 3: Degree of polynomial kernel versus area under the ROC curves in the training and validation datasets.

Table 4: Detailed accuracy assessment by classes of RBF-SVM, PL-SVM, DT, and NB models.
Confidence factor used for pruning versus classification accuracy.

Table 6: Characteristics of the four susceptibility zones of the four landslide susceptibility models obtained from RBF-SVM, PL-SVM, DT, and NB models.