A Comparative Study of Landslide Susceptibility Mapping Using SVM and PSO-SVM Models Based on Grid and Slope Units

This study applies and compares support vector machine (SVM) and particle swarm optimization coupled with support vector machine (PSO-SVM) models for landslide susceptibility mapping in Lueyang County, China, assesses the rationality of the resulting maps, and analyzes the use of grid units and slope units, the latter chosen to strengthen the connection with the natural terrain. A total of 186 landslide locations were identified from earlier reports and field surveys. The landslide inventory was randomly divided into two parts: 70% for the training dataset and 30% for the validation dataset. Based on the multisource data and geological environment, 16 landslide conditioning factors were selected, including control factors and triggering factors (i.e., altitude, slope angle, slope aspect, plan curvature, profile curvature, SPI, TPI, TRI, lithology, distance to faults, TWI, distance to rivers, NDVI, distance to roads, land use, and rainfall). The susceptibility relationship between each conditioning factor and landslides was derived using a certainty factor model. Subsequently, the landslide susceptibility models were built with the SVM and PSO-SVM methods on both grid units and slope units. The predictive accuracy of the landslide susceptibility maps produced by the different models and units was verified with receiver operating characteristic (ROC) curves. The results showed that the PSO-SVM model based on slope units had the best performance in landslide susceptibility mapping, with area under the curve (AUC) values of 0.945 and 0.9245 for the training and validation datasets, respectively. Hence, the machine learning algorithm coupled with slope units can be considered a reliable and effective technique in landslide susceptibility mapping.


Introduction
Landslides are a damaging geological phenomenon worldwide, characterized by wide distribution, high frequency, and strong destructiveness [1][2][3][4]. China is one of the countries most affected by landslides, which cause great losses to national construction and to people's lives and property every year [5,6]. It is reported that a total of 6,186 geological disasters occurred in 2019, resulting in 211 dead, 13 missing, 75 injured, and direct economic losses of 2.77 billion CNY (https://www.cigem.cgs.gov.cn). The occurrence of landslide disasters has directly or indirectly affected economic development and social stability. Therefore, the quantitative assessment of landslide susceptibility not only provides a scientific basis for landslide prevention and land resource utilization planning but is also of great significance for predicting landslide stability in the medium and long term [7].
In recent years, the development and application of "3S" technology (global positioning system, remote sensing, and geographic information system) has provided important theoretical and technical means for monitoring and preventing landslides [8][9][10]. A large number of methods have been applied in landslide prediction, and many GIS-based models built on different theories are available for landslide susceptibility analysis and mapping. The current models can be summarized into two groups: knowledge-driven models and data-driven models [11]. The first group, also called heuristic analysis, is based on the experience of geological experts and on field work, and includes the analytic hierarchy process [12,13] and fuzzy mathematics [14][15][16]. The main limitations of knowledge-driven models are their strong subjectivity and their lack of a uniform standard. The second group establishes a functional relationship between landslides and conditioning factors by selecting an appropriate mathematical method, so as to conduct landslide susceptibility mapping; examples are frequency ratio [17][18][19], weights-of-evidence [20][21][22][23], certainty factors [24][25][26], and logistic regression [27][28][29]. The occurrence of landslides is a complicated nonlinear process affected by conditioning factors, including geomorphological, geological, hydrological, surface cover index, geophysical, and meteorological factors [30,31].
Although many models have been used in landslide susceptibility mapping, a comparative study of SVM and PSO-SVM models based on grid units and slope units has seldom been carried out so far. Therefore, this study aims to construct landslide susceptibility models with different mapping units in Lueyang County, China. The performance of every model and unit was also evaluated and compared. The results can serve as a reference for other areas.

Study Area and Data
Lueyang County is located in the southwest part of Hanzhong City, Shaanxi Province, China, between the longitudes 105°42′E∼106°31′E and latitudes 33°07′N∼33°38′N, and covers an area of 2831 km² (Figure 1). The highest altitude of the study area is 2399 m and the lowest is 559 m, and the altitude increases from the southwest to the northeast. The landform can be classified into mountain, hill, and plain. The study area is characterized by a typical subtropical humid continental monsoon climate. According to years of meteorological data, the average annual temperature is 13.2°C, and the mean annual precipitation is 826.2 mm. The rivers of Lueyang County are densely distributed and belong to the Yangtze River Basin, which is divided here into the Hanjiang River and Jialing River. The geology of the study area is very complicated. The lithological strata vary from Proterozoic to Quaternary, and the main outcropped lithologies are granite, tuff, phyllite, sandstone, shale, and limestone. The geological structure system of the study area belongs to the Kunlun-Qinling fold system. Several faults are present and make this area highly susceptible to landslides. These faults are well developed and have approximately SEE-NWW and NNE-SWW directions.

Landslide Inventory.
The most significant and critical step in landslide mapping is to identify the location and type of the existing landslides. The landslide dataset determines the quality of the landslide susceptibility modeling. Based on historical reports, aerial photographs, image interpretation, and field investigation in the study area, the landslide inventory map was produced and 186 landslides were ascertained. Analysis of the landslide inventory map shows that a large proportion of the landslides in the study area are slides (178) and a very small proportion are rock falls (8). The smallest landslide was about 450 m², the largest was about 4.9 × 10⁴ m², and the average was about 1.8 × 10⁴ m². Most of the landslides in the study area are less than 10,000 m² and shallow seated (<6 m). Therefore, the centroid point was used to represent the corresponding landslide location, and the 186 landslide points were randomly divided into 70% (130 landslides) for training and the remaining 30% (56 landslides) for validation. The nonlandslide points were randomly selected from the landslide-free areas and also randomly divided in the same proportion (70/30) to build the training and validation datasets.
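The random 70/30 division described above can be sketched as follows. This is a minimal illustration only: the point identifiers, the helper name `split_inventory`, and the random seeds are hypothetical, and the sampling procedure actually used by the authors is not specified in the text.

```python
import random

def split_inventory(points, train_fraction=0.7, seed=42):
    """Shuffle points and split them into training and validation subsets."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical IDs standing in for the 186 landslide centroid points.
landslide_ids = [f"LS{i:03d}" for i in range(1, 187)]
# Hypothetical IDs for an equal number of points from landslide-free areas.
nonlandslide_ids = [f"NL{i:03d}" for i in range(1, 187)]

ls_train, ls_valid = split_inventory(landslide_ids)
nl_train, nl_valid = split_inventory(nonlandslide_ids, seed=7)

# Label 1 marks a landslide point, label 0 a nonlandslide point.
training = [(p, 1) for p in ls_train] + [(p, 0) for p in nl_train]
validation = [(p, 1) for p in ls_valid] + [(p, 0) for p in nl_valid]

print(len(ls_train), len(ls_valid))  # 130 56
```

Rounding 0.7 × 186 gives 130 training landslides and leaves 56 for validation, which matches the counts reported in the text.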

Landslide Conditioning Factors.
According to an analysis of historical landslide data and a summary of previous research, the occurrence of landslides is affected by various factors. The selection principle is to consider the mechanism and geoenvironmental characteristics of landslide occurrence in the study area. In the present study, the landslide conditioning factors used to evaluate susceptibility are classified into two categories, namely, control factors and triggering factors. In this area, the control factors consist of altitude, slope angle, slope aspect, plan curvature, profile curvature, stream power index (SPI), lithology, topographic position index (TPI), topographic ruggedness index (TRI), distance to faults, topographic wetness index (TWI), and distance to rivers, and the triggering factors include normalized difference vegetation index (NDVI), land use, distance to roads, and rainfall. The landslide conditioning factors were obtained from a variety of data sources, such as point, polygon, and raster data (Table 1). These were extracted from available sources and had to be transformed into the same format, resolution, and coordinate system.
Altitude affects slopes through sunshine, vegetation, temperature, and human activity. According to the DEM data, the altitude ranges from 559 m to 2399 m and was classified into five classes (Table 2 and Figure 2(a)).
Slope angle describes the degree of slope inclination at a point and directly influences slope stability through stress and runoff nonuniformity. In this study, slope angle was derived from DEM data and divided into five classes (Table 2, Figure 2(b)).
Slope aspect represents the orientation of the slope. It is a very important parameter for assessing landslide susceptibility and was also derived from DEM data. The aspect values vary from −1 to 360° and were divided into nine classes with an equal interval of 45° (Table 2, Figure 2(c)).
Plan curvature and profile curvature are quantitative indices that describe the degree of terrain distortion. The plan curvature is the rate of change of slope aspect at a point and reflects the degree of bending of the contour line. The profile curvature is the rate of change of slope angle, that is, the second derivative of the altitude. Using the natural break method, the plan curvature and profile curvature derived from DEM data were classified into three categories (Table 2, Figures 2(d) and 2(e)).
SPI expresses the erosive power of water flow and is considered an important hydrological factor. The SPI values were classified into five categories (Table 2, Figure 2(f)). SPI is calculated by the following equation:
$$\mathrm{SPI} = A_s \times \tan\beta.$$
Here, $A_s$ is the specific catchment area and $\beta$ is the local slope gradient in degrees.
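TPI describes the position information of a point, and its values were classified into five classes (Table 2, Figure 2(g)). TPI is defined as follows:
$$\mathrm{TPI} = e - \frac{1}{n}\sum_{i=1}^{n} e_i.$$
Here, $e$ is the center point altitude, $e_i$ is the altitude of the $i$th neighborhood cell, and $n$ is the number of neighborhood cells.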
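As an illustration of the TPI definition, the sketch below takes the altitude of a centre cell and subtracts the mean altitude of its neighbours. The 3 × 3 moving window, the function name `tpi`, and the toy DEM values are assumptions made for this example; the window size used in the study is not stated.

```python
def tpi(grid, row, col):
    """TPI of an interior cell: its altitude minus the mean of its 8 neighbours.

    Assumes a 3 x 3 window and that (row, col) is not on the grid border.
    """
    neighbours = [
        grid[r][c]
        for r in (row - 1, row, row + 1)
        for c in (col - 1, col, col + 1)
        if (r, c) != (row, col)
    ]
    return grid[row][col] - sum(neighbours) / len(neighbours)

# Toy DEM (altitudes in metres): a 10 m bump on a flat surface.
dem = [
    [100, 100, 100],
    [100, 110, 100],
    [100, 100, 100],
]
print(tpi(dem, 1, 1))  # 10.0
```

A positive value indicates a cell higher than its surroundings (ridge-like position), and a negative value indicates a cell lower than its surroundings (valley-like position).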
TRI reflects the terrain fluctuation and the degree of erosion and is generally expressed as the ratio of the surface area to its projected area. TRI was derived from DEM data, and its values were grouped into five classes (Table 2, Figure 2(h)).
Lithology is the material basis of slope development; at the same time, it can control the occurrence of slope failure. The lithology classes are shown in Figure 2(i).
Faults cut the rock and soil, leaving them broken, and may eventually influence slope stability. The distance to faults was therefore selected as an important conditioning factor and was grouped into five classes (Table 2, Figure 2(j)).
TWI represents the saturation state of soil moisture within a certain watershed and describes the effect of topography on hydrological processes. The TWI values were classified into five groups (Table 2, Figure 2(k)) and are calculated by
$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta}\right).$$
Here, $\alpha$ is the area drained per unit contour length at a point and $\beta$ is the slope.
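The two hydrological indices can be evaluated cell by cell once the specific catchment area and slope are available. The sketch below uses the commonly cited definitions SPI = A_s · tan β and TWI = ln(α / tan β); the function names and the small minimum slope used to avoid division by zero on flat cells are assumptions of this example, not details given in the text.

```python
import math

def spi(specific_catchment_area, slope_deg):
    """Stream power index: catchment area times the tangent of the slope."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))

def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index: ln(area / tan(slope)).

    Slopes below min_slope_deg are raised to that value (an assumption of
    this sketch) so that flat cells do not cause a division by zero.
    """
    slope = max(slope_deg, min_slope_deg)
    return math.log(specific_catchment_area / math.tan(math.radians(slope)))

# Hypothetical cell: 100 m of drained area per unit contour length, 45 degree slope.
print(spi(100.0, 45.0))
print(twi(100.0, 45.0))
```

For a fixed catchment area, a steeper slope raises SPI (more erosive power) and lowers TWI (less water accumulation), so the two indices tend to be inversely related.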
River erosion reduces the strength of rock and soil. In order to investigate the relationship between rivers and slopes, the distance to rivers was selected as a factor and classified into five categories (Table 2, Figure 2(l)).
NDVI is an index that reflects the growth status and spatial distribution of vegetation. The NDVI values were grouped into five categories (Table 2, Figure 2(m)) and are computed as
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - R}{\mathrm{NIR} + R}.$$
Here, NIR is the near-infrared band and R is the red band.
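For a single pixel, the NDVI computation reduces to one ratio of band reflectances. The sketch below is illustrative only; the function name and the choice to return 0.0 when both bands are zero are assumptions of this example.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    denominator = nir + red
    if denominator == 0:
        # Assumption of this sketch: treat a pixel with no signal as NDVI = 0.
        return 0.0
    return (nir - red) / denominator

# Hypothetical reflectances: dense vegetation reflects strongly in the NIR band.
print(ndvi(0.5, 0.1))
# Hypothetical bare surface with similar reflectance in both bands.
print(ndvi(0.3, 0.3))  # 0.0
```

Values close to 1 indicate dense, healthy vegetation, while values near or below 0 indicate bare ground, built-up surfaces, or water.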
The distance to roads is closely related to human engineering activity. Such activity changes the original topographical structure and generally accelerates slope instability. The distance was divided into five groups (Table 2, Figure 2(n)).
Land use was extracted from remote sensing imagery of the study area. The interpretation result was classified into five groups (Table 2, Figure 2(o)).
Rainfall can trigger the occurrence of landslides to some extent. The average annual rainfall was classified into five categories (Table 2, Figure 2(p)).

Methodology
The methodology of this study is shown as a flowchart (Figure 3). First, the temporal and spatial information of the landslides in the study area was identified. Next, according to the multisource data, 16 factors were selected, and the susceptibility relationship between each conditioning factor and landslides was determined using a certainty factor model. Then, the SVM and PSO-SVM models were applied, and the grid units and slope units were compared for landslide susceptibility mapping. Finally, the receiver operating characteristic (ROC) curve was used to select the best unit and model.

Mapping Units.
The first step in landslide susceptibility assessment is the selection of mapping units, and the rationality of this choice determines the accuracy and reliability of the assessment results. A mapping unit is the smallest indivisible spatial unit in landslide susceptibility evaluation and can be either regular or irregular [52]. According to current research, all units can be classified into five types: grid unit, terrain unit, unique condition unit, slope unit, and topographic unit [53,54]. Grid units divide the territory into regular squares of the same size and are easy to compute and sample [55]. Slope units divide the territory into independent slopes along ridge and valley lines and can reflect the natural topography of the study area [56,57]. Based on the above, and considering the characteristics of the thematic data of the landslide conditioning factors, grid units and slope units were both selected to calculate the landslide susceptibility in this study. The resolution of the grid units is 30 × 30 m, giving a total of 3,141,646 grids. The slope units are based on the 30 × 30 m ASTER DEM data and comprise 18,346 slopes in total, with a minimum area of 900 m² and a maximum of 1.86 km². The result is shown in Figure 4.

Certainty Factors.
The certainty factor (CF), a probability function, was first proposed by Shortliffe [58,59] and improved by Heckerman [60]. It expresses the degree of certainty of an event under specific conditions and can also be used to analyze the susceptibility associated with landslide conditioning factors [61]. The function expression is as follows:
$$\mathrm{CF} = \begin{cases} \dfrac{PP_a - PP_s}{PP_a\,(1 - PP_s)}, & PP_a \ge PP_s, \\[2ex] \dfrac{PP_a - PP_s}{PP_s\,(1 - PP_a)}, & PP_a < PP_s, \end{cases}$$
where $PP_a$ is the conditional probability, which is expressed as the ratio of landslide area to the total area in classification $a$, and $PP_s$ is the prior probability, which is expressed as the ratio of the total landslide area to the total area of the study area. The values of CF range from −1 to 1; positive values represent a high certainty of landslide occurrence, which indicates that landslides are prone to occur, while negative values represent a low certainty, which indicates that the landslide susceptibility is lower.
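A small numerical sketch may help to read CF values. It uses the widely cited piecewise form of the certainty factor, CF = (PPa − PPs) / (PPa(1 − PPs)) when PPa ≥ PPs and CF = (PPa − PPs) / (PPs(1 − PPa)) otherwise; the function name and the probabilities below are hypothetical and are not taken from Table 2.

```python
def certainty_factor(pp_a, pp_s):
    """Certainty factor of one factor class.

    pp_a: landslide density inside the class (conditional probability).
    pp_s: landslide density over the whole study area (prior probability).
    """
    if pp_a == pp_s:
        return 0.0  # the class is neither more nor less prone than average
    if pp_a > pp_s:
        return (pp_a - pp_s) / (pp_a * (1 - pp_s))
    return (pp_a - pp_s) / (pp_s * (1 - pp_a))

prior = 0.01  # hypothetical: 1% of the study area is landslide area

print(certainty_factor(0.02, prior))   # class twice as dense as average: positive
print(certainty_factor(0.005, prior))  # class half as dense as average: negative
print(certainty_factor(0.0, prior))    # class without landslides: -1.0
```

A class whose landslide density is above the study-area average receives a positive CF, and one below the average receives a negative CF, with −1 reached when the class contains no landslides at all.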

Support Vector Machine.
Support vector machine (SVM) is a machine learning algorithm that was first proposed by Vapnik [62,63]. SVM derives from statistical learning theory and is based on the principle of structural risk minimization. It is especially suitable for processing small sample datasets and aims to construct a classification hyperplane for separating different data [64,65]. The core idea of SVM can be summarized as follows: first, the input vector is mapped to a high-dimensional feature space by a preselected nonlinear mapping (kernel function), and then the optimal classification hyperplane is found in the feature space so that the two types of data points are separated as correctly as possible while the classification margin is maximized (Figure 5(a)).
For example, consider a training set of instance-label pairs $(x_i, y_i)$, $i = 1, 2, \ldots, n$, where $x_i \in R^n$ is an input vector that includes the landslide conditioning factors, $y_i \in \{+1, -1\}$ is the output class that represents landslide and nonlandslide, and $n$ is the number of training samples [66]. The optimal hyperplane can be expressed as follows:
$$\min \frac{1}{2}\lVert w \rVert^2,$$
subject to the constraints
$$y_i\left((w \cdot x_i) + b\right) \ge 1.$$

Mathematical Problems in Engineering
Here, $w$ is the coefficient vector that determines the orientation of the hyperplane in the feature space and $b$ is the offset of the hyperplane from the origin. Introducing the Lagrange multipliers $\lambda_i$, the cost function can be defined as follows:
$$L = \frac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{n} \lambda_i \left( y_i\left((w \cdot x_i) + b\right) - 1 \right).$$
For the nonseparable case, slack variables $\zeta_i$ and a penalty parameter $C$ are introduced; the former describe the margin (classification) errors and the latter controls how strongly misclassified samples are penalized. Hence, the problem can be modified as
$$\min \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{n} \zeta_i, \quad \text{subject to } y_i\left((w \cdot x_i) + b\right) \ge 1 - \zeta_i,\ \zeta_i \ge 0.$$
Besides, the commonly used kernel functions at present are the linear function (LN), polynomial function (PL), sigmoid function (SIG), and radial basis function (RBF). Among these four kernel functions, RBF has a strong nonlinear mapping ability and has been widely used in landslide susceptibility mapping. In this study, RBF is also used to construct the decision function of the optimal hyperplane.
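To make the role of the RBF kernel concrete, the sketch below evaluates the standard kernel K(x, z) = exp(−γ‖x − z‖²) and the usual dual-form decision value f(x) = Σ λᵢ yᵢ K(xᵢ, x) + b for an already trained model. It does not train an SVM: the support vectors, multipliers, bias, and γ are hypothetical toy values chosen for illustration, and the function names are not from the original study.

```python
import math

def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def decision_value(x, support_vectors, labels, multipliers, bias, gamma):
    """Dual-form SVM decision value; its sign gives the predicted class."""
    total = sum(
        lam * y * rbf_kernel(sv, x, gamma)
        for sv, y, lam in zip(support_vectors, labels, multipliers)
    )
    return total + bias

# Hypothetical trained model with two support vectors in a 2-D factor space.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [+1, -1]          # +1 = landslide, -1 = nonlandslide
multipliers = [1.0, 1.0]   # Lagrange multipliers (toy values)
bias = 0.0
gamma = 0.5

for point in [(0.1, 0.2), (1.9, 2.1)]:
    value = decision_value(point, support_vectors, labels, multipliers, bias, gamma)
    print(point, "landslide" if value > 0 else "nonlandslide")
```

A larger γ makes the kernel decay faster with distance, so each support vector influences a smaller neighbourhood; together with the penalty parameter C, this is why the two parameters have to be tuned jointly.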

Particle Swarm Optimization.
Particle swarm optimization (PSO) is an intelligent evolutionary algorithm derived from complex adaptive systems (CASs). It was first proposed by Kennedy and Eberhart [67] and was inspired by the foraging behavior of birds. In PSO, each candidate solution of the optimization problem is regarded as a particle in the search space. Each particle is adjusted according to its own fitness value and that of the swarm, and the iteration continues until all the particles converge to the optimal solution [68][69][70][71] (Figure 5(b)). Hence, particles are optimized by constantly updating their velocity and position, a process that can be expressed as
$$V_i^{n+1} = w V_i^{n} + c_1 r_1 \left(p_i^{n} - x_i^{n}\right) + c_2 r_2 \left(p_g^{n} - x_i^{n}\right),$$
$$x_i^{n+1} = x_i^{n} + V_i^{n+1},$$
where $i = 1, 2, \ldots, m$, $m$ is the total number of particles in the current optimization problem, $n$ is the iteration number, $w$ is the inertia weight, $c_1$ and $c_2$ are learning factors, $r_1$ and $r_2$ are two random numbers between 0 and 1, and $p_i^{n}$ and $p_g^{n}$ are the best position found by the $i$th particle and the best position found by the whole swarm up to the $n$th iteration, respectively. $V_i^{n+1}$ and $x_i^{n+1}$ are the updated velocity and position of the $i$th particle at the $(n+1)$th iteration, respectively.
In landslide susceptibility mapping, the optimal parameters must be found when RBF is used as the kernel function of the SVM model. PSO can be applied to seek these optimal parameters, namely, the penalty factor C and the kernel parameter γ. Therefore, in order to improve the performance of the SVM model, PSO is coupled with SVM (PSO-SVM) in this study.
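The velocity and position updates of PSO can be sketched in a few lines. The example below is a generic minimizer, not the authors' implementation: the swarm size (50), iteration count (200), and learning factors (1.5) follow the settings quoted later in the paper, whereas the inertia weight of 0.7, the search bounds, and the toy objective are assumptions. In a PSO-SVM setting the objective would be a validation error of the SVM as a function of its two parameters; here a simple quadratic stands in for it so the example runs on its own.

```python
import random

def pso_minimize(objective, bounds, n_particles=50, n_iterations=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_best_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=lambda i: personal_best_val[i])
    global_best, global_best_val = personal_best[g][:], personal_best_val[g]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to own best + pull to swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                # Position update, clamped to the search bounds.
                lo, hi = bounds[d]
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)
            value = objective(positions[i])
            if value < personal_best_val[i]:
                personal_best[i], personal_best_val[i] = positions[i][:], value
                if value < global_best_val:
                    global_best, global_best_val = positions[i][:], value
    return global_best, global_best_val

def toy_error(params):
    """Hypothetical error surface with its minimum at log2(C) = 3, log2(gamma) = -2."""
    log2_c, log2_gamma = params
    return (log2_c - 3.0) ** 2 + (log2_gamma + 2.0) ** 2

best, best_error = pso_minimize(toy_error, bounds=[(-5.0, 15.0), (-15.0, 3.0)])
print(best, best_error)
```

Searching in log2 space is a common convention for SVM parameters because useful values of the penalty and kernel parameters span several orders of magnitude; it is used here purely as an illustrative choice.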

Results
The results of this study consist of four parts: (1) analysis of the susceptibility relationship between conditioning factors and landslides by the CF method; (2) screening of the landslide conditioning factors using correlation analysis; (3) application of the SVM and PSO-SVM models based on grid and slope units; and (4) validation and comparison of the performance of the above models using ROC curves.

Landslide Conditioning Factors Susceptibility.
The susceptibility relationship between landslides and the classes of each conditioning factor was calculated using a statistical method, the certainty factor (CF) model, and GIS technology. The conditioning factors may be either categorical or numerical. Categorical variables were generally classified according to the heuristic classification of the related thematic information. Numerical variables were classified using the equal interval or natural breaks methods. Table 2 shows the classes, percentage of landslides, percentage of domain, and CF values of each conditioning factor. Among the five altitude classes, the 559-931 m class has the highest CF value because of its dense population. For slope angle, the 0-20° class has the largest CF value. For slope aspect, the largest CF value is for south and the smallest is for flat. Among the three categories of plan curvature and profile curvature, the largest CF values belong to the classes −0.31 to 0.18 and −0.40 to 0.13, respectively. For SPI, the class with the highest value is 3000-4000. For TPI, the largest and smallest values belong to the classes −8.14∼−2.52 and 8.56-66.27, respectively. For TRI, the highest value belongs to the class 0-5.50. For lithology, soft rocks have a higher CF value. The distances to faults, rivers, and roads are negatively correlated with both the number of landslides and the CF values: as the distance increases, the number of landslides and the CF values gradually decrease. For TWI, the classes 1.12-6.19 and 11.14-13.92 have positive CF values. In terms of NDVI, the highest CF value belongs to the class greater than 0.21. The land use classification indicates that landslides are prone to occur in residential areas. The relationship between landslides and rainfall in this area is positive, which reflects that rainfall is a triggering factor.

Screening of Landslide Conditioning Factors.
If there is a strong correlation between some environmental factors, it will decrease the running speed of the model and cause the assessment result to overfit. Hence, it is necessary to screen the landslide conditioning factors before modeling. In this study, the Pearson correlation method was used, and the result is shown in Table 3. The correlation coefficients between slope angle and TRI, plan curvature and TPI, profile curvature and TPI, and SPI and TWI are 0.985, 0.799, −0.819, and −0.613, respectively, showing high correlation. Consequently, the conditioning factors TPI, TRI, and TWI were removed, and the other factors were used to build the model.
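The screening step can be illustrated with a pairwise Pearson correlation check. The sketch below flags factor pairs whose absolute correlation reaches a threshold; the threshold of 0.6, the function names, and the toy factor values are assumptions of this example (the text reports the flagged coefficients but does not state a cutoff).

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def highly_correlated_pairs(factors, threshold=0.6):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(factors.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs

# Toy values sampled at five hypothetical points (not data from the study).
factors = {
    "slope_angle": [5, 10, 15, 20, 25],
    "TRI": [1.0, 1.1, 1.25, 1.4, 1.6],
    "slope_aspect": [100, 50, 300, 50, 100],
}

for name_a, name_b, r in highly_correlated_pairs(factors):
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

In this toy data only slope angle and TRI are flagged, mirroring the kind of redundancy that led to TRI being dropped; which member of a correlated pair to remove remains a modelling decision.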

Landslide Susceptibility Mapping.
Equal numbers of landslide and nonlandslide points were randomly divided into a training dataset and a validation dataset, accounting for 70% and 30%, respectively. With these data, the SVM and PSO-SVM models were built with the RBF kernel function. In addition, it is crucial to find the best penalty parameter (C) and kernel parameter (γ). The SVM model obtained the optimal parameters through a grid-search method, whereas the PSO-SVM model obtained them with an intelligent optimization algorithm. The initial values of the PSO algorithm were a total number of particles m = 50, a number of iterations n = 200, and learning factors c₁ = c₂ = 1.5. The landslide susceptibility index (LSI) is computed by the models, and the LSI values are positively correlated with the probability of landslide occurrence. Finally, landslide susceptibility maps were produced by the SVM and PSO-SVM models based on grid and slope units. The LSI for all models ranges from 0 to 1. In regional landslide susceptibility map (LSM) research, the natural breaks classification is usually used. This method minimizes the variance within classes and maximizes the variance between classes. According to this method, the four maps were classified into five categories, namely, very low, low, moderate, high, and very high (Figure 6). The proportion of areas with very low, low, moderate, high, and very high is 11.58%, 9.97%, 26

Validation and Comparison.
The predictive capability of the landslide susceptibility assessment directly and indirectly affects the prevention and control of landslide disasters in the study area. In order to evaluate the performance of the landslide susceptibility models, the receiver operating characteristic (ROC) curve was introduced to analyze the accuracy. The ROC curve plots sensitivity on the Y-axis against 1-specificity on the X-axis. The area under the curve (AUC) is a key indicator of the accuracy of the ROC, and its value ranges between 0 and 1 [72,73]. In the calculation, P (positive) and N (negative) are the total numbers of landslides and nonlandslides in the study area, respectively, and TP (true positive) and FP (false positive) denote the numbers of landslides correctly classified as landslides and of nonlandslides incorrectly classified as landslides, respectively. The confusion matrix of the training dataset is shown in Table 4, and it was used to evaluate the performance of the landslide susceptibility models. According to the results, the highest and lowest accuracy values belong to the PSO-SVM model based on slope units (95.00%) and the SVM model based on grid units (84.23%), respectively.
The highest values of PPV and NPV also belong to the PSO-SVM model based on slope units (96.15% and 93.85%). Similarly, the confusion matrix of the validation dataset is given in Table 5 (see also Figure 7(b)). The results indicated that the prediction capability of the PSO-SVM model and of slope units is higher than that of the SVM model and of grid units, respectively.
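The validation statistics discussed above can be reproduced from predicted susceptibility values with a few lines of code. The sketch below computes accuracy, PPV, and NPV from a confusion matrix and the AUC through its rank interpretation (the probability that a randomly chosen landslide point receives a higher score than a randomly chosen nonlandslide point). The labels, scores, 0.5 threshold, and function names are hypothetical and are not values from Tables 4 and 5.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, PPV, and NPV from binary labels (1 = landslide, 0 = nonlandslide)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Return None instead of failing when a class is never predicted.
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),   # positive predictive value
        "npv": ratio(tn, tn + fn),   # negative predictive value
    }

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical labels and susceptibility index values for six points.
y_true = [1, 1, 1, 0, 0, 0]
lsi = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in lsi]  # 0.5 threshold is an assumption

print(confusion_metrics(y_true, y_pred))
print(auc(y_true, lsi))
```

Accuracy, PPV, and NPV depend on the chosen classification threshold, whereas the AUC summarizes the ranking quality over all thresholds, which is why the two kinds of statistics are reported side by side.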

Discussion
A large number of machine learning and data mining algorithms have been applied to regional-scale landslide susceptibility modeling to address the nonlinear relationship between landslides and conditioning factors. Previous research has indicated that methods and techniques have improved; however, prediction performance remains a challenge. In this study, the landslide susceptibility maps were produced through SVM and PSO-SVM models coupled with grid units and slope units in mountainous Lueyang County, China. The prediction accuracy of landslide susceptibility assessment is influenced by the model, the mapping unit, and the landslide conditioning factors. The machine learning model SVM can transform nonlinear data to a high-dimensional space to seek the optimal classification hyperplane. The two key parameters of SVM directly determine the model fit and performance. The SVM model obtained the parameters by the grid-search method, which is time and memory consuming. PSO, as an evolutionary algorithm, can optimize the parameters and improve robustness. The accuracy of the PSO-SVM model is 5% higher on average than that of the SVM model. On the other hand, grid units are easier to obtain, sample, and calculate in GIS but are not closely related to the topographic environment. Slope units are independent watershed areas generated from DEM and reverse DEM data. The advantages and limitations of grid units and slope units were thus both evident in this study. The conditioning factors were selected from spatial multisource data, including geomorphological, geological, hydrological, surface cover index, geophysical, and meteorological factors. The CF method was applied to explore the susceptibility relationship between conditioning factors and landslides.
The results clearly demonstrate that residential areas, altitude, and the distances to roads, rivers, and faults have a positive effect on landslide occurrence.

Conclusion
Landslide susceptibility mapping is the preliminary preparation for landslide forecasting and warning. Therefore, it is very important for landslide prevention, management of landslide-prone areas, and land use planning. In the present study, the machine learning (SVM) model and the intelligent evolutionary optimization algorithm (PSO-SVM) model were applied for landslide susceptibility mapping in Lueyang County, Shaanxi Province, China. Grid units and slope units were considered as computing units for analysis and comparison. A total of 16 landslide conditioning factors, including altitude, slope angle, slope aspect, plan curvature, profile curvature, SPI, TPI, TRI, lithology, distance to faults, TWI, distance to rivers, NDVI, distance to roads, land use, and rainfall, were selected to build the model. The susceptibility relationship between landslides and conditioning factors was calculated by the CF method, and the strongly correlated factors TPI, TRI, and TWI were removed. The ROC curve was introduced to evaluate the performance of the two models and two mapping units. The results show that the PSO-SVM model based on slope units presented a higher accuracy in landslide susceptibility mapping than the other three models (SVM model based on grid units, SVM model based on slope units, and PSO-SVM model based on grid units). The use of PSO helps to find the optimal parameters, and slope units enhance the relationship with the natural terrain and geological environment. Nevertheless, the classification of landslide conditioning factors was mainly based on the natural break method, which might not be the most appropriate choice for this study. In future studies, the effect of different classification methods should be explored for landslide susceptibility assessment. In conclusion, the PSO-SVM model based on slope units provides a useful tool for landslide susceptibility mapping and can be extended to other mountainous regions to help mitigate landslide hazard.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.