GIS-Based Landslide Susceptibility Mapping Using Information, Frequency Ratio, and Artificial Neural Network Methods in Qinghai Province, Northwestern China

Landslides are among the natural hazards that cause heavy casualties and property losses worldwide. Over the last decades, many researchers have contributed to landslide susceptibility mapping using qualitative and quantitative methods. Parameters derived from DEM, geology, and other sources are selected to analyze the mechanism of landslides. Data quality is essential in landslide studies, and more credible results can be obtained when the data are adequate and accurate across a wide range of parameters. The aim of this study is to evaluate the landslide susceptibility of Huangyuan County, Qinghai. Through field investigations, 100 landslide locations in the study area were selected, and 11 influencing factors, namely elevation, slope, aspect, plane curvature, profile curvature, road distance, river distance, fault distance, stratum rock property, vegetation coverage index, and terrain humidity index, were selected based on GIS. The information method (IM) model, frequency ratio (FR) model, and artificial neural network (ANN) model are used to evaluate the susceptibility to geological hazards, and the receiver operating characteristic (ROC) curve of disaster points at different levels is used to test the evaluation accuracy of the three models. The results show that the factors with the greatest influence on landslides are associated with wetness, and the terrain humidity index has the highest weight in landslide occurrence. The AUC values indicate that the ANN model is the most suitable evaluation model for the study area and can be extremely useful for landslide hazard mitigation strategies. Based on the ANN model, three valley areas are identified as having high landslide susceptibility, where necessary reinforcement measures should be taken.


Introduction
Landslides are a natural hazard in mountainous regions that threatens human life and property [1][2][3][4]. Over the past decades, the catastrophic potential of landslides has led many researchers to devote themselves to assessing landslide susceptibility [5][6][7]. From 2014 to 2020, 47614 geological disasters occurred in China, including 33659 landslides, accounting for 68.27% of the total number of disasters. The Qinghai region contains large loess areas with mountainous topography and a high incidence of severe landslides. The occurrence of landslides is extremely complicated and is affected by many factors such as geologic structure, lithological association, topography, rainfall, earthquakes, and human activity [8][9][10][11][12]. Based on these factors, various assessment methods have been proposed to analyze landslide susceptibility, which can be divided into two categories: qualitative analysis and quantitative analysis [13][14][15][16][17][18]. The qualitative method relies mainly on the judgment of experts and can therefore be seen as a subjective method [19][20][21][22][23][24]. In qualitative analysis, the spatial distribution of unstable slopes is based on experts' understanding of the relationship between landslide occurrence and the assumed antecedent factors, and is determined directly from existing landslides or potentially unstable areas. Quantitative analysis, an objective evaluation, is a form of numerical estimation, such as calculating the probability of landslides. This analysis estimates the potentially unstable area by using the inducing factors related to landslide occurrence. The quantitative methods mainly include deterministic and statistical ones [25]. A deterministic method is a mathematical model based on the physical and mechanical mechanisms that control slope failure. The most common approach is to combine various hydrological models and slope stability evaluation methods to obtain the safety factor of the corresponding units. This approach has the highest accuracy among these models. A statistical model, on the other hand, analyzes the relationship between the landslide catalogue map and the factors affecting landslide occurrence to obtain the spatial probability of landslide occurrence, and so belongs to the indirect quantitative evaluation methods. This method is more practical for landslide evaluation at the mesoscale and is currently the most widely used method in landslide sensitivity evaluation. Each evaluation model has been shown to have different advantages and disadvantages, and some may be more suitable for solving specific problems in specific areas or for implementing specific projects [2].
Although a large number of models and methods have been proposed to produce landslide susceptibility maps (LSM) using geographic information systems (GIS) [13][14][15][16][17], no consensus has been established regarding which methods are the most suitable. Qualitative techniques can be limited by unconsidered phenomena or by the incomplete knowledge on which expert decisions are based, whereas quantitative methods suffer from inaccurate or low-precision data. As of July 2012, 171 geological disaster points had been investigated, of which landslides and debris flows account for 81% [26]. The disasters damaged 20 houses, killed 54 people, and injured thousands of people. Economic losses caused by these events are estimated at around 70 million RMB (about 10 million U.S. dollars). Therefore, it is necessary to assess and manage areas that are susceptible to landslides and to mitigate any risk associated with them.
Our study aims to find a more suitable landslide susceptibility model for Huangyuan County, Qinghai Province, China. Firstly, a total of 100 landslides were mapped in the study area based on a geological hazard survey (1 : 50,000) of the Qinghai region. Then, according to the geological data, field survey, and landslide information, eleven influencing factors, namely, elevation, slope, aspect, plane curvature, profile curvature, road distance, river distance, fault distance, stratum rock property, vegetation coverage index, and terrain humidity index, were selected for landslide susceptibility mapping. Next, the information method (IM) model, frequency ratio (FR) model, and artificial neural network (ANN) model were adopted to establish landslide susceptibility models. Finally, the receiver operating characteristic (ROC) curve was used to validate and compare the prediction abilities of the landslide susceptibility models and to select the optimal one. The results of this study can be extremely useful for landslide hazard mitigation strategies in the Qinghai region.

Study Area
The study area is located in the southeast of Qinghai Province, in the transition zone between the first and second terrain steps in China (Figure 1) and on the margin of the uplift of the Qinghai-Tibet Plateau. This geographical location creates distinctive geological environment conditions, which provide a hazard-forming environment favorable to the development of landslide disasters in the study area. The study area covers 1545 km², with an altitude of 2470 m to 4484 m and a maximum vertical elevation difference of 2014 m, and the terrain inclines from the north, west, and south toward the east. According to the geomorphic types, the study area can be divided into four units: tectonic erosion high mountain area, tectonic erosion middle mountain area, tectonic erosion low mountain and hilly area, and valley belt plain area. There are 86 large and small rivers, which belong to the Yellow River system. The exposed pre-Quaternary strata are Proterozoic, Triassic, Cretaceous, and Paleogene. Tectonically, the study area is located in the first-order tectonics of the Qilian geosynclinal fold system, crossing two second-order tectonic belts, the middle Qilian geosyncline and the South Qilian geosyncline. Fault structures are well developed, and the folds are mostly in the form of compound structures. The tectonic movements of each stage are reflected to varying degrees. The main faults are distributed along the NW, NW, nearly EW, NE. The tectonic line is NWW. Affected by various external forces, geological disasters occur easily, making this one of the areas of the Huangshui River Basin where they are most developed. The study area has a continental semiarid climate; rainfall is concentrated in May to September every year, and the average annual precipitation is about 405.5 mm. According to statistics, many fatal landslides occurred in this period.
The data of this study include 100 landslide and collapse points, which were provided by the Qinghai geological environment monitoring station. The landslides were studied in depth through field investigation. The investigation shows that almost all landslides and most soil collapses occur in the loess layer, and the bedrock is the basement stratum of the whole area, which constitutes the sliding bed of the loess-bedrock interface landslides. Loess slopes are the main slope type in the study area, as shown in Figure 2.

Landslide Inventory Mapping.
Information on the distribution of existing landslides is essential for identifying the likelihood of landslides [27]. In this study, landslide locations were determined from the analysis of Google Earth images, from historical records (the landslide inventory maps of the Department of Natural Resources of Qinghai Province), and from field investigation (Table 1). In total, 100 landslides were mapped, which translate to 572 pixels at a landslide cell size of 50 meters. The landslide inventory was divided into training data (80% of total landslide cases, 456 pixels) and validation data (20% of total landslide cases, 116 pixels).

Influencing Factors.
The occurrence of landslides is the combined effect of the influencing factors. Through the analysis of literature and data, combined with field investigation, this paper selects 11 landslide-prone factors for modeling: elevation, slope angle, slope aspect, plane curvature, profile curvature, distance to road, distance to river, distance to fault, stratum lithology, vegetation coverage index, and terrain humidity index, as shown in Figure 4. Elevation is connected with landslide occurrence [29], especially in plateau areas. Slope angle affects slope stability, and slope aspect controls the hours of sunshine and the effects of rainfall, moisture, and wind conditions over the study area [30]. Plane curvature and profile curvature are morphological factors that control the water flow on the Earth's surface and thereby affect landslide occurrence [31]. Traffic and road construction produce vibration, which can induce landslides, so the distance to road is also an important influencing factor [29]. Erosion of the bank by the river can reduce the strength of the soil and make the slope less stable, so slope stability is positively correlated with the distance to river [30]. The existence of faults fragments the rock and increases the probability of slope instability [30,32]. Stratum lithology is the material basis of landslides and is also an influencing factor of slope stability [32]. Vegetation coverage index and terrain humidity index are important environmental factors associated with the structure of the soil and are frequently used in landslide susceptibility mapping [30]. The Jenks natural breaks method is a one-dimensional clustering algorithm. It assumes that there are discontinuities in the data themselves and optimizes the classification so that the variance between different categories is the largest and the variance within the same category is the smallest. The Jenks natural breaks method can be used to classify the landslide impact factors and characterizes their distribution well. Therefore, the Jenks natural breaks method was applied in the process of classification [29].
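The classification idea described above can be sketched as a small dynamic program. This is an illustrative implementation of the general Fisher-Jenks criterion (minimum total within-class squared deviation), not the ArcGIS routine used in the study; the function name `jenks_breaks` and the sample values are hypothetical.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each class for a 1-D natural-breaks split.

    Classes are contiguous runs of the sorted data chosen to minimize the
    total within-class sum of squared deviations (exact O(k * n^2) search).
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice in O(1).
    s1, s2 = [0.0], [0.0]
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j], j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting the first j values into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk back through the stored split points to recover class bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]


# Hypothetical factor values with three obvious clusters.
sample = [1, 2, 3, 10, 11, 12, 50, 51, 52]
print(jenks_breaks(sample, 3))
```

The exact search is quadratic in the number of values, so for a full raster it would normally be run on a sample or a histogram of the factor values rather than on every cell.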

Slope Aspect.
The surface receives different amounts of solar radiation on slopes of different aspect, which affects the degree of vegetation coverage, surface weathering, and surface evaporation, and thus the occurrence of landslides. The DEM data can be divided into 9 types according to the slope aspect: 0°-40°, 40°-80°, 80°-120°, 120°-160°, 160°-200°, 200°-240°, 240°-280°, 280°-320°, and 320°-360°. The landslides in the study area mainly occur on slopes with an aspect between 160° and 200°, which account for the largest proportion of landslides at 21%.
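As a minimal sketch of the nine equal 40° aspect classes listed above, the following maps an aspect angle to its class. The function name `aspect_class` is hypothetical, and the treatment of negative values as flat cells (returning `None`) is an assumption based on the common GIS convention of coding flat terrain as -1; the paper does not state how flat cells were handled.

```python
def aspect_class(aspect_deg, width=40.0):
    """Map an aspect angle in degrees to (class index, (lower, upper)).

    Returns None for negative input, assumed here to mark flat cells.
    An aspect of exactly 360 is treated as 0 (due north).
    """
    if aspect_deg < 0:
        return None
    angle = aspect_deg % 360.0
    index = int(angle // width)
    lower = index * width
    return index, (lower, lower + width)


print(aspect_class(180.0))   # falls in the 160-200 degree class
print(aspect_class(-1.0))    # flat cell under the stated assumption
```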

Plan Curvature.
The plane curvature reflects the shape of the terrain. A positive value means that the terrain surface is convex, a negative value means that the terrain surface is concave, and 0 means that the ground is flat. According to the plane curvature, DEM data can be divided into nine categories:

Profile Curvature.
The curvature of the profile reflects the change rate of the ground slope. A positive value indicates that the terrain surface is convex, a negative value indicates that the terrain surface is concave, and 0 indicates that the ground is flat. The DEM data are classified into 9 types according to the slope curvature by the Jenks natural breaks method:

Distance to Road.
When people build roads, they carry out a series of activities, such as manual excavation of the slope toe and blasting. These activities change the original rock and soil structure, affect the stability of the rock and soil, and leave hidden dangers for the occurrence of landslide disasters. According to the Jenks natural breaks method, it can be divided into 9

Distance to River.
The occurrence of landslide disasters is closely related to surface water. The river constantly erodes and hollows out the slope toe, which leads to slope instability. Based on the Euclidean distance calculation of the river data in the study area by ArcGIS, the river data can be

Stratigraphic Lithology.
Stratigraphic lithology is the internal control factor that affects the occurrence of landslide geological disasters, and its type and composition affect slope stability. There are 5 types of lithology in the study area: hard thick to medium thick-bedded metamorphic rock group (Pt1), hard block intrusive rock group (Pt2), single structure Aeolian loess group (Q1), double structure alluvial proluvial sand and sand gravel pebble soil group (Q2), and multilayer structure clay, argillaceous gravel pebble, and broken stone soil group (Q3). According to the ArcGIS data reclassification by stratum lithology, the landslide disasters in the study area occur mainly in the single structure Aeolian loess, accounting for 34%.

Normalized Difference Vegetation Index.
Vegetation plays a role in soil and water conservation and slope protection, which has a certain impact on the occurrence of landslide geological disasters. The Normalized Difference Vegetation Index (NDVI) is used to represent the state of plant growth. A negative value indicates that the ground is covered by clouds or snow, 0 indicates that the ground is covered by bedrock or bare soil, and a positive value indicates that the ground is covered by vegetation. Band image data are obtained from the geospatial data of Huangyuan County: Landsat 8 scene LC81320352016028LGN00. The image was acquired on January 28, 2016, with a cloud cover of 1.89%, which allows the vegetation coverage in the study area to be observed. NDVI was calculated in ENVI 5.
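For reference, NDVI is the normalized difference between near-infrared and red reflectance, NDVI = (NIR - Red) / (NIR + Red); on Landsat 8 these are bands 5 and 4. The sketch below applies that standard definition to small in-memory grids. It is not the ENVI workflow used in the study; the function names and reflectance values are hypothetical, and returning `None` where both bands are zero is an assumption about how no-data cells might be handled.

```python
def ndvi(nir, red):
    """Standard NDVI for one cell; None when the denominator is zero."""
    denominator = nir + red
    if denominator == 0:
        return None
    return (nir - red) / denominator


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() cell by cell to two equally shaped 2-D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical surface reflectance values for a 2 x 2 window.
nir = [[0.50, 0.30], [0.10, 0.00]]
red = [[0.10, 0.30], [0.30, 0.00]]
for row in ndvi_grid(nir, red):
    print(row)
```

In this toy window the first cell is strongly positive (vegetated), the second is 0 (bare surface), and the third is negative, matching the interpretation of the sign given in the text.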

Variance Inflation Factor (VIF).
The variance inflation factor is a method to judge multicollinearity by examining the degree to which a given explanatory variable is explained by all other explanatory variables in the equation. Any influencing factor with a VIF value greater than 10 should be excluded from the landslide susceptibility model. The VIFs in Table 1 show that the values of the influence factors are all below 10, so no factor needed to be excluded from the landslide susceptibility model.
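A minimal sketch of the VIF check, assuming the usual definition VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing factor j on all the others. The code uses the equivalent identity that VIF_j is the j-th diagonal element of the inverse correlation matrix. The function names and sample columns are hypothetical, and the paper does not state which software computed its VIFs.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    p = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(p)
        ]
        for i in range(p)
    ]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("factors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


# Two hypothetical factor samples with correlation 0.8.
factor_a = [1, 2, 3, 4, 5]
factor_b = [2, 1, 4, 3, 5]
values = vif([factor_a, factor_b])
print(values, all(v < 10 for v in values))
```

With a correlation of 0.8 both VIFs are 1 / (1 - 0.64), well under the threshold of 10 used in the text. A constant column has zero variance and would raise a division error in this sketch.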

Information Model.
The information method (IM) [16,19] is a bivariate statistical analysis method. By analyzing the actual conditions and the information provided in areas where geological disasters have occurred or deformation is under way, it studies the quantity and quality of the information that affects their stability and quantifies the degree of its impact through the information value.
The information value is calculated as follows:

$$I_i = \ln\frac{N_i/N}{S_i/S}, \qquad I = \sum_{i=1}^{n} I_i,$$

where $N_i$ is the total number of geological hazards in the class $i$ evaluation factor of the study area; $N$ is the total number of units with geological hazards in the study area; $S_i$ is the number of units with class $i$ evaluation factors in the study area; $S$ is the total number of evaluation factor units in the study area; $I$ is the total information value of the study area; $n$ is the number of evaluation factors. The landslide susceptibility index (LSI) can be calculated as follows:

$$LSI_{ICM} = IM_{\text{elevation}} + IM_{\text{slope angle}} + IM_{\text{slope aspect}} + IM_{\text{plane curvature}} + IM_{\text{profile curvature}} + IM_{\text{distance to road}} + IM_{\text{distance to river}} + IM_{\text{distance to fault}} + IM_{\text{stratum lithology}} + IM_{\text{vegetation coverage index}} + IM_{\text{terrain humidity index}},$$

where IM indicates the influencing factor maps that have been reclassified as per their information content values.
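A minimal sketch of the information-value calculation, assuming the commonly used form I_i = ln((N_i / N) / (S_i / S)) with a natural logarithm (the log base is an assumption), and an LSI formed by summing the values of the classes a cell falls into. All counts, class labels, and function names are hypothetical and are not taken from Table 2.

```python
import math


def information_value(n_i, n_total, s_i, s_total):
    """Information value of one factor class.

    n_i / n_total: share of hazard units that fall in the class.
    s_i / s_total: share of all units that fall in the class.
    Returns None when the class holds no hazards (log undefined).
    """
    if n_i == 0:
        return None
    return math.log((n_i / n_total) / (s_i / s_total))


def lsi_for_cell(cell_classes, class_values):
    """Sum the information values of the classes one cell belongs to."""
    return sum(class_values[factor][cls] for factor, cls in cell_classes.items())


# Hypothetical counts: 100 hazard units among 1000 units in total.
values = {
    "slope": {
        "gentle": information_value(20, 100, 500, 1000),
        "steep": information_value(80, 100, 500, 1000),
    },
    "lithology": {
        "loess": information_value(60, 100, 300, 1000),
        "rock": information_value(40, 100, 700, 1000),
    },
}
cell = {"slope": "steep", "lithology": "loess"}
print(lsi_for_cell(cell, values))
```

A class whose share of hazards exceeds its share of the area gets a positive value, which is why positive IM values are read as a higher landslide tendency.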

Frequency Ratio Method.
The frequency ratio (FR) method [31,33] is a more traditional statistical analysis model. A landslide (M) is affected by many factors (n) (such as elevation, aspect, and lithology), so the frequency ratio method divides M into n classes or n grades according to certain rules. In the simplified calculation formula, $FR_{ij}$ is the frequency ratio of the $j$th subfactor of the $i$th factor; $HA_{ij}$ is the landslide area of the $j$th subfactor of the $i$th factor; $XA_{ij}$ is the area of the $j$th subfactor of the $i$th factor; $LSI_{FR}$ is the landslide sensitivity index.
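As an illustration only, the sketch below uses one widely used normalized form of the frequency ratio: the share of landslide area in a class divided by the share of total area in that class. This normalization is an assumption and may differ from the paper's simplified formula. The function name, class labels, and areas are hypothetical.

```python
def frequency_ratios(landslide_area, class_area):
    """Frequency ratio per class of one factor.

    landslide_area[c]: landslide area inside class c (the role of HA).
    class_area[c]:     total area of class c (the role of XA).
    A ratio above 1 means landslides are over-represented in the class.
    """
    total_landslide = sum(landslide_area.values())
    total_area = sum(class_area.values())
    return {
        c: (landslide_area.get(c, 0.0) / total_landslide)
        / (class_area[c] / total_area)
        for c in class_area
    }


# Hypothetical areas (in cells) for two classes of one factor.
ratios = frequency_ratios(
    landslide_area={"near_river": 30, "far_from_river": 10},
    class_area={"near_river": 100, "far_from_river": 300},
)
print(ratios)
```

Under this form, the sensitivity index of a cell would be the sum of the ratios of the classes it belongs to across all factors.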

Artificial Neural Network.
The artificial neural network (ANN) model [31,34] consists of an input layer, one or more hidden layers, and an output layer, each with a different number of neurons. Adjacent layers are fully connected, and each connection is assigned a weight. According to the Kolmogorov theorem [35], an artificial neural network model with one hidden layer can approximate any nonlinear mapping from n dimensions to m dimensions on a closed set with arbitrary precision. Therefore, this study adopts a three-layer network structure comprising an input layer, a hidden layer, and an output layer. In the input layer, the number of neurons equals the number of landslide risk factors; in the hidden layer, the tanh function is used; in the output layer, the softmax function is used to output two nodes, which represent landslide-prone and not prone. Landslide-prone is marked as (1, 0), and not prone is marked as (0, 1). The model structure used in this paper is shown in Figure 5. During calculation, each neuron receives the output of the neurons in the previous layer, processes the data according to the corresponding connection weights, and passes the result to the neurons in the next layer. The learning process of the artificial neural network model is the process of continuously adjusting the network parameters. The BP (backpropagation) artificial neural network model uses a gradient descent algorithm to optimize the parameters based on the cumulative error between the real value and the output value. The determination of disaster areas can be divided into three steps. Firstly, nonlandslide units and landslide units are selected from the information partition; 80% of them are used as the training set, and the remaining 20% are used as the test set. Then, the 1826315 grid cells of Huangyuan County are input into the trained neural network model to obtain the probability for each grid cell. Finally, the Jenks natural breaks method is used to reclassify the probabilities.
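The forward pass of the three-layer structure described above (tanh hidden layer, two-node softmax output) can be sketched as follows. The hidden-layer size of 3, the weights, and the input values are hypothetical placeholders; the paper does not report its hidden-layer size or trained parameters, and training by backpropagation is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: input -> tanh hidden layer -> softmax output.

    Returns [p_prone, p_not_prone], matching the (1, 0) / (0, 1) labels.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    return softmax(logits)


# 11 scaled factor values for one grid cell (hypothetical).
cell = [0.2] * 11
# Hypothetical parameters: 3 hidden neurons, 2 output nodes.
w_hidden = [[0.1] * 11, [-0.2] * 11, [0.05] * 11]
b_hidden = [0.0, 0.1, -0.1]
w_out = [[1.0, -1.0, 0.5], [-1.0, 1.0, -0.5]]
b_out = [0.0, 0.0]

p_prone, p_not_prone = forward(cell, w_hidden, b_hidden, w_out, b_out)
print(p_prone, p_not_prone)
```

The first output is the value that would be mapped over all grid cells and then reclassified into susceptibility classes.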

IM Method.
Based on the information model (IM), the IM values of the influencing factors were calculated, and the results are presented in Table 2. An IM value greater than 0 means that the tendency for landslides is high; otherwise, the tendency is low. The IM values of distance to river, road, and fault show that a very small distance can also cause landslides. In particular, the vibration from passing vehicles on the road, the erosion of the river, and the weakness of faults are the main influencing factors. The values in Table 3 show that the IM values of classes Q1 and Q2 are 1.395 and 1.322, respectively. This indicates that Q1 and Q2 have a high probability of landslide occurrence in this region. The lithology of Q1 is mainly loess, and the lithology of Q2 is mainly argillaceous gravel pebble and broken stone soil. The soil of Q1 and Q2 is extremely weak and easily loses stability. Thus, landslides occur easily in these two strata. Based on the IM values of the Normalized Difference Vegetation Index and the Topographic Wetness Index, vegetation can effectively reduce landslides, whereas soil wetness can promote them. The landslide susceptibility map produced by the IM method is shown in Figure 6(a).
The FR values are also presented in Table 2. The landslide susceptibility maps were reclassified using the Jenks natural breaks method. The outcome was an interpretable map showing the increasing spatial probability of future landslide incidence, ranging from very low to very high susceptibility (Figure 6(b)).

ANN Method.
The ANN method was performed with a mean square error of 0.02 in the training process. The landslide susceptibility value is obtained from $fw_i$, the weight of each factor, and $w_{ij}$, the normalized weight for the category of factor $i$. The values of importance and normalized importance are presented in Table 4. The importance values of TWI, slope aspect, and NDVI are 0.16, 0.14, and 0.12, respectively, and the corresponding normalized importance values are 100%, 87.5%, and 75%. The values of elevation, profile curvature, and plane curvature are much smaller, at 0.06, 0.05, and 0.03, respectively. The results show that the factors associated with wetness are the main causes of landslides. Loess is the main soil component in the study area, and it loses stability extremely easily under the influence of water.
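The normalized importance values quoted above follow from dividing each importance by the largest one. The sketch below reproduces that arithmetic for the six values stated in the text and also shows a weighted sum of the form Σ fw_i · w_ij, which is an assumed reading of how the factor weights and category weights combine. The function names and the category weights in the example are hypothetical.

```python
def normalized_importance(importance):
    """Scale importances so that the largest equals 100 (percent)."""
    top = max(importance.values())
    return {name: 100.0 * value / top for name, value in importance.items()}


def weighted_susceptibility(factor_weights, category_weights):
    """Assumed form: sum over factors of fw_i * w_ij for one cell."""
    return sum(factor_weights[f] * category_weights[f] for f in factor_weights)


# Importance values as stated in the text.
importance = {
    "TWI": 0.16,
    "slope aspect": 0.14,
    "NDVI": 0.12,
    "elevation": 0.06,
    "profile curvature": 0.05,
    "plane curvature": 0.03,
}
for name, percent in normalized_importance(importance).items():
    print(f"{name}: {percent:.1f}%")

# Hypothetical normalized category weights for one cell, three factors only.
cell_categories = {"TWI": 0.9, "slope aspect": 0.5, "NDVI": 0.2}
subset = {name: importance[name] for name in cell_categories}
print(weighted_susceptibility(subset, cell_categories))
```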

Validation.
Based on the grid maps of the susceptibility index obtained from the three models, the random point tool in ArcGIS 10.2 is used to create 100 nonlandslide points, equal in number to the geological hazard points studied, and all points are imported into SPSS software. The accuracy of the three models is evaluated through the ROC curve analysis tool in the software. The AUC value can be obtained by equation (5) [36]. The results are shown in Figure 7.
In equation (5), coefficients a and b represent the dependence of the test accuracy on the threshold; x is the value of ROC. Figure 7 shows that the ROC curves of the three evaluation models all lie above the reference line, which meets the feasibility expectation of ROC curve verification. The accuracy of the ANN model is the highest, with an AUC of 0.907; the accuracy of the FR model is the lowest, with an AUC of 0.867. In this respect, the ANN model has a higher predictive ability than the IM model and the FR model.
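To make the AUC figures concrete, the sketch below computes AUC with the rank-based (Mann-Whitney) estimate: the fraction of landslide and nonlandslide point pairs in which the landslide point has the higher susceptibility score, counting ties as one half. This is a standard way to obtain the area under the ROC curve, named here as an alternative; it is not equation (5) of the paper nor the SPSS procedure. The function name and scores are hypothetical.

```python
def auc_from_scores(landslide_scores, nonlandslide_scores):
    """Rank-based AUC: P(score of landslide point > score of nonlandslide point).

    Ties between a landslide and a nonlandslide score count as 0.5.
    """
    wins = 0.0
    for pos in landslide_scores:
        for neg in nonlandslide_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(landslide_scores) * len(nonlandslide_scores))


# Hypothetical susceptibility indices sampled at validation points.
landslide = [0.9, 0.8, 0.4]
nonlandslide = [0.5, 0.3, 0.2]
print(auc_from_scores(landslide, nonlandslide))
```

An AUC of 0.5 corresponds to the reference line in the ROC plot, and values closer to 1 indicate that the model ranks landslide points above nonlandslide points more consistently.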

Discussion.
The landslide susceptibilities were calculated by three methods based on the GIS engine, and the areas and percentage distributions of the susceptibility classes were produced. The susceptibility was divided into five classes: very low, low, moderate, high, and very high. To compare the models in detail, the statistical results of the landslide susceptibility maps produced in this study are also summarized. According to the landslide susceptibility map produced by the IM method, the very low and low landslide susceptibility areas are 17.24% and 32.69% of the total study area, respectively. Moderate, high, and very high susceptibility areas account for 27%, 15.19%, and 7.88% of the total area. In the landslide susceptibility map generated by the FR method, 31.3% of the total area is classified as very low landslide susceptibility. Low, moderate, and high areas make up 35.14%, 20.54%, and 8.63%, respectively. The very high landslide susceptibility area is 4.39% of the total study area. The landslide susceptibility map produced by the ANN method gives results similar to those of the IM method but different from those of the FR method. The very low, low, and moderate areas account for 14.32%, 30.54%, and 29.36% of the total study area, respectively. The high and very high susceptibility areas are 18.36% and 7.45% of the total area, respectively. The number of landslides falling in the high and very high classes characterizes the accuracy of the model. In this respect, the ANN method has the highest predictive ability. According to the above analysis, the ANN model is the optimal model. Thus, the ANN model was adopted for landslide susceptibility map analysis in this study. Figure 6 shows that the very high susceptibility area is mainly distributed along the rivers and valleys.
This is consistent with the landslide distribution along the rivers and valleys found in our field investigation. The very high susceptibility area is mainly distributed in three regions: (1) Yuandong-Dahua; (2) Dongga-Shangshigou; (3) Dongxia-Yuanju. In the study area, collapses and landslides are distributed on both sides of the valleys or rivers, and their disaster-causing effect and distribution density are closely related to the erosion and cutting of the valleys and rivers. Generally, at the source and upstream of a gully, vertical erosion is the main factor, and landslides and collapses occur frequently on both sides of the gully. In the middle and lower reaches of a valley, side erosion is the main cause. The unloading and weathering of the slopes on both sides of the valley are strong, and the slope on the eroded bank of the river is prone to collapse and landslide.

Conclusion
In this study, a landslide susceptibility assessment for loess slopes in Huangyuan County is proposed. According to geological data, field survey, and landslide information, eleven influencing factors, namely elevation, slope, aspect, plane curvature, profile curvature, road distance, river distance, fault distance, stratum rock property, vegetation coverage index, and terrain humidity index, were selected for the landslide susceptibility assessment of Huangyuan, Qinghai. The IM, FR, and ANN methods were adopted to establish the landslide susceptibility models, which were validated by the ROC curve.
The analysis of the 11 factors by the three methods shows that every factor has a specific subclass with high landslide susceptibility. Based on the results of the ANN method, the landslides in this study area are mainly influenced by factors associated with wetness. Therefore, the high-susceptibility subclass of every factor is one that favors the accumulation of water, which leads to the occurrence of landslides. The ANN model is the optimal model in the study area, with an AUC value of 0.907, followed by the IM model (0.900) and the FR model (0.867). The high and very high landslide susceptibility classes contain 84%, 84%, and 95% of the total landslide count in the IM model, FR model, and ANN model, respectively, which also indicates that the ANN model is better than the others. In the landslide susceptibility map produced by the ANN model, the very high and high susceptibility areas are mainly distributed in valley areas. Therefore, it is necessary to prevent and control landslide disasters in these areas by applying measures such as slope protection, retention, and anchoring.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.