Landslide Susceptibility Assessment Using Frequency Ratio Technique with Iterative Random Sampling

1Geo-Environmental Hazards&QuaternaryGeology Research Center, Korea Institute of Geoscience andMineral Resources (KIGAM), 124 Gwahang-no, Yuseong-gu, Daejeon 305-350, Republic of Korea 2Geological Research Center, Korea Institute of Geoscience and Mineral Resources (KIGAM), 124 Gwahang-no, Yuseong-gu, Daejeon 305-350, Republic of Korea 3Korea University of Science and Technology, 217 Gajeong-ro, Yuseong-gu, Daejeon 305-350, Republic of Korea 4Department of English Language and Literature, University of Seoul, 163 Seoulsiripdaero, Dongdaemun-gu, Seoul 02504, Republic of Korea


Introduction
Damage caused by natural hazards has been increasing due largely to residential expansion and population growth in many countries in the world [1][2][3].Thus, government employees and researchers in those countries have been believed to be vulnerable to natural disasters and hence have studied hazard warning systems and hazard risk analyses in order to prevent the loss of lives and minimize the damage to property.In case of landslide hazards, high-risk landslide areas were identified and detected in the 1980s, and the landslide susceptibility analysis has been applied for the purpose of assessing the degree of risk in landslide-prone areas.
Most studies on landslide susceptibility modeling (1) separated the training and validation data from given landslide location data through the random sampling approach, (2) created the landslide susceptibility map using the selected training data, and (3) calculated the accuracy of the landslide susceptibility map using the validation data.In those studies, the modeling procedure was applied just once and thus had inevitably fundamental weaknesses.That is, the outcomes obtained from the one-time modeling were most likely to be biased.For this reason, it is logically necessary to derive the landslide susceptibility maps through a well-established statistical approach via an iterative random sampling.In essence, this paper assessed the performance of the landslide susceptibility analysis by way of frequency ratio (FR) with an iterative random sampling.The FR model has a key advantage that it is able to attain the rank of the causative factors with respect to a landslide occurrence as well as determine whether a given range of causative factor values will be threatening in the case of a landslide occurrence or not [27].First, a pair of before-and-after digital aerial photographs with 50 cm spatial resolution was used to detect the occurrence of landslides in Yongin area, Korea.
Then, a training and a validation dataset were iteratively random-sampled half and half from the detected landslide occurrences ten times.In each iteration, a landslide susceptibility map was generated from the training data by using the FR method and was validated from the validation data (Figure 1).Indeed, running FR modeling ten times enabled us to properly understand the regularized relationship between the causative factors and landslide susceptibility.

Mapping Landslides Using Aerial Photos
Mapping landslide occurrence areas is the first and foremost step to assess landslide susceptibility [28,29].One of the most common methods for landslide mapping is either a field survey or visual interpretation of stereoscopic aerial photography.However, these methods are not only time-consuming but also resource-intensive.What is more, they evidently require data for the extensive areas.So, the development of state-of-the-art remote sensing techniques is especially useful to detect landslide occurrence areas from high-resolution images [28,[30][31][32][33][34].Most of all, a high-resolution digital aerial photograph can be used to produce a high-quality landslide inventory map through the visual interpretation of the aerial photographs with a field survey.This method has been employed as a principal source for landslide inventory map preparation [35][36][37].
Rainfall-triggered debris flows and shallow soil slide were both caused by intensive rainfall at Yongin area, Korea, in July 2011 (Figure 2).Landslide mappings of the study area were impossible with the field survey.Instead, high-resolution digital aerial photos were used for landslide mappings.In Korea, it is easy to download time-series digital aerial photos with 50 cm spatial resolution from a web portal (http://map.daum.net).The relevant photos were obtained biennially in 2008 and sky maps in 2012 were downloaded (http://www.skymaps.co.kr).The photos were taken from UltraCam X sensor (Microsoft, Graz, Austria) by Samah Aerial Survey Company (http://www.samah.com).The details of the UltraCam X instrument can be found in Lee et al. [38] and Gruber and Schneider [39].
For this research, the high-resolution photos were collected from Skymaps (http://www.skymaps.co.kr) and Daum (http://map.daum.net)portal sites.Because landslides occurred during the rainy season of 2011, both the preslide images in 2010 from Daum and the postslide photos taken in the same season in 2012 and 2014 from Skymaps were used.The photos were selected from each region of landslide occurrences.So, a total of eighteen photos were used and five ground control points (GCPs) were applied to each photo.
The photos were rectified via GCP from digital topographic features using ArcMap.The separation of landslides from tombs was a particularly demanding task in the study area.Thus, landslide locations were accurately detected by visual interpretation of the aerial photographs taken before and after landslide occurrences and then were checked in field investigations (Figures 3 and 4).Figures 3(a

Construction of Predisposing Factors
Landslides and debris flows are controlled by the interaction of complex factors.Generally, shallow landslides occur during intense rainfalls due to elevated levels of soil moisture and low soil strength [40].Topography affects the shallow landslide initiation through a gradient on the slope stability and concentration of a subsurface flow.Other predisposing factors include soil thickness and strength, rainfall direction, lithology, fracture, and root strength.The preparations of thematic data layers were essential for a landslide susceptibility mapping (enhancing or mitigating) [41].Although the above factors were important controls, in practice, spatial distribution of those factors was difficult to determine.However, digital topographic, soil, forest, and geological maps were obtainable.Accordingly, those attainable maps were collected for this work.The scale of all those maps was 1 : 25,000 except for soil and forest maps with 1 : 5,000 (Table 1).And landslide-related topographic, hydrologic, soil, forest, and geologic factors can be used for the FR modeling.The factors are listed in Table 1 and  The topographic and hydrologic factors were prepared by a digital elevation model (DEM) which was derived from a triangulated irregular network (TIN) surface.The TIN was created from contour lines at 5 m intervals from digital topographic maps using ArcMap 10.2.Topographic and hydrologic factors such as slope gradient, plan curvature, slope aspect, midslope position (MSP), topographic position index (TPI), stream power index (SPI), and topographic wetness index (TWI) were taken into account for landslide susceptibility and calculated from DEM using SAGA GIS modules [42][43][44][45] (Figures 5(a)-5(g)).MSP and TPI with continuous value types were divided into ten equal areas in order to make a quantifiable comparison of a frequency ratio as per each factor per class.However, slope gradients were divided into nine classes, SPI was divided into six classes, and TWI was divided into nine classes since the specific values of these three factors cover more than 10% of the study area (Table 2).
In the landslide susceptibility assessment, it is important to check the multicollinearity of landslide causative factors.The tolerance and variance inflation factor (VIF) are wellestablished indexes for checking the multicollinearity.That a tolerance value is less than 0.1 or a VIF value exceeds 10 indicates a multicollinearity problem (Tien Bui et al., 2011).In this study, the tolerance and VIF values were calculated with the eighty-two landslide occurrences using SPSS software, and the results are shown in Table 2.The results indicate that there is no multicollinearity among the thirteen landslide causative factors.

Methods
The FR, as a leading probability model, is based on the observed spatial relationships between landslide causal factors and landslide occurrences.Consequently, the FR can be used to quantitatively assess the landslide susceptibility.The FR score for each range of factors was calculated (Table 2).More details of the FR score are found in Table 3.And then the landslide susceptibility index (LSI) was quantified by summation of each factor's FR score using the following equation [46]: where LSI is landslide susceptibility index and FR is the FR of each factor range or class.
Here, the FR represents that of the area where landslides occurred in the entire study area.Generally, a value of 1.0 is an average value.When the value is higher than 1.0, it means a higher correlation between landslide occurrences and their causal factors.On the other hand, when the value is lower than 1.0, it indicates a lower correlation between landslide occurrences and their causal factors.

Result
The tolerance and VIF values of thirteen causative factors were calculated to detect the multicollinearity problem.The maximum VIF and minimum tolerance were 8.49 and 0.11, respectively.Therefore, there was no multicollinearity between these factors (Table 2).
Slope gradients, SPI, and TWI with continuous values were divided into nine, six, and nine classes, respectively, since the specific value of these three factors covers more than 10% of the study area (Table 2).
As shown in Table 2, when a class of a factor has the FR value higher than 1.0, it may be assumed that the class is susceptible to a landslide.The FR trend of the slope degree factor showed a clear positive correlation.The classes with 18.45 to 62.53 slope degrees showed FR value almost higher than 1.0.In the case of plan curvature, the concave area with a              2.18 ratio was much more susceptible than the flat and convex area with 0.36 and 1.31 ratios, respectively.The east (E) and southeastern (SE) classes of the aspect factor had higher than 2.05 ratio values at all the estimates.The FR values of the MSP factor were higher than 1.0 in the 1st to the 4th classes.In the SPI factor, FR values higher than 1.0 are found in the 2nd to the 6th classes The FR values of the TWI were higher than 1.0 in the 1st, 2nd, and 5th classes.
The FR values of soil thickness greater than 20 cm and less than 100 cm were above 10.The FR values of the soil material and topography factors were higher than 1.3 in gneiss residuum and hilly areas classes.The FR values of timber diameter and density factors exceeded 1.0 in all areas except for the regions with no available data.In particular, the FR values of very small and small classes of diameter factors were above 5 while those of loose and moderate classes of density factors were above 3.In the lithology factor, the FR values were the highest in the biotite gneiss (PCEbgn) class.
A landslide susceptibility map should be able to effectively predict future landslide areas and it also ought to be validated by incorporating data from new landslide locations as they may occur.Therefore, the results from the landslide susceptibility analysis were confirmed by using the validation data.To approve each landslide susceptibility map, the calculated LSI values of all cells were sorted in a descending order.Next, the ordered cell values were divided into one hundred classes with cumulative 1% intervals.Additionally, the above procedure was adapted for the cells in which landslides had occurred by comparing the 100 classes in the study area.Then, a graph was made to compare the two sets of classifications.
To compare the results quantitatively, the areas under curves (AUCs) were recalculated as the total study area [47,48].The validation results showed 91.81%, 92.15%, 92.24%, 89.70%, 92.04%, 93.21%, 89.48%, 91.43%, 92.33%, and 91.77% for the 10 landslide susceptibility maps, each.The difference between the highest and lowest susceptibility accuracies was as small as 3.73%.The landslide susceptibility maps of the highest and lowest susceptibility accuracies are compared in Figure 6, and the success rate curves of them are shown in Figure 7.This means that the landslide susceptibility accuracy of the FR method is almost higher than 89%.

Conclusion and Discussion
In Korea, shallow landslides and debris flows have been caused by heavy rainfalls every single year.Accordingly, the landslide susceptibility mapping is instrumental in making a decision for the land use planning and urban development in Korea.Most studies for the landslide susceptibility modeling randomly sampled the training and validation data from the landslide occurrence data only once.However, those methods are highly dependent on the only one-time randomsampling step.So, this study performed the FR modeling with the 10 times iterative random sampling in order to properly understand the regularized relationship between the causative factors and the landslide susceptibility and then evaluated the performance of the landslide susceptibility analysis using the iterative FR modeling.The FR model can be used for the landslide susceptibility assessment because the landslide susceptibilities among classes in each landslide causative factor can be compared.It is logically necessary to generate a qualified landslide inventory map in the landslide susceptibility assessment.The accuracy of landslide mappings is closely based on (1) the scale and quality of the aerial photographs or the satellite imagery, (2) the scale and quality of base or thematic maps, and (3) the interpreters' experience and skills.As the qualities of images, maps, and interpreters' skills get bettered, a margin of error for a landslide mapping will get decreased.Fortunately, time-series digital photos with the ground resolution of 50 cm have been provided in Korea every two years since 2008.The photos can be freely downloaded from the Daum and Skymap websites.
Even though landslide experts can detect landslide scars in photos, it is extremely hard to distinguish landslide scars from the tomb and its surrounding areas in the aerial photos due to close similarities.Nonetheless, with the help of visual comparison of before and after landslide photos and digital topographic maps, the landslide scars will be considerably easily distinguished from the tomb and its surrounding areas to improve the accuracy of the landslide scar identification.
Debris flows are typically controlled by the interaction of complex parameters such as intense rainfalls and topographic, hydrologic, soil, forest, and geological factors.However, it is particularly difficult to express the complex parameters in terms of spatial distribution.So, in this study, the landslide-related thirteen factors without multicollinearity among these factors were extracted from DEM, soil, forest, and geological maps in order to replace those complex parameters.
The landslide susceptibility was calculated according to the FR value in a class of each factor.If a class of a factor has the FR value of more than 1.0, the class must be susceptible to a landslide.As the slope degree increases, FR is higher in the slope classes of 18. 45-21.80, 21.81-26.33, and 26.34-62.53.This means that when a slope gets higher, a landslide is more likely to occur.The landslide is more susceptible in the concave and convex areas than in the flat area.On average, 50 percent of landslides occurred in concaves areas, and 30 percent occurred in convex areas.This suggests that concave slopes are more vulnerable to shallow landslides due to running water concentration and the build-up of pore pressure [49].Wind is also taken to be a crucial element which affects rainfall intensity on various slope aspects [50].The landslide is highly susceptible on SE slope aspects.According to the data of Korea Meteorological Administration (KMA, http://web.kma.go.kr), the July 27, 2017, landslide was caused by the torrential rain.The then wind direction was reported to be almost SSE or SE.A most likely landslide initial location can be estimated based on MSP and TPI.MSP values are zero (e.g., midslope location) to one (e.g., valley or ridge with maximum vertical distance).Ridges and valleys have positive and negative TPI, respectively.Therefore, the risk of landslides gets higher in midslope to ridge where MSP value is below 0.47 and TPI value is above 1.0.As the catchment area enlarges and the slope gradient gets steeper, the amount of water influenced by upslope areas and the velocity of water flow will get increased.Accordingly, the SPI increases.The result of FR of SPI factors suggests that the region with the higher SPI has a higher risk of landslides.TWI has been used to describe the effect of topography on the location and size of saturated areas.The lower slope positions have a large catchment area and small slope angle; conversely, the upper slope positions have a small catchment area and high slope angle.The former has higher TWI and the latter has lower TWI.As a consequence, the areas with lower TWI are more prone to landslides in the study area.
Besides topological and hydrologic factors extracted from DEM, other factors also influence the spatial distribution of shallow landslide including soil thickness, soil strength, bedrock, and root strength (David and William, 1994).The landslide susceptibility was determined from FR values in their spatial distribution.Landslide typically occurred in hilly areas with 20-100 cm of soil depth and gneiss residuum.The forest factors related to strength of the root-soil bond restrict occurrences of a slope failure [51].The large and dense timber usually tends to have strong roots and less landslide susceptibility.Consequently, this study confirms that when the FR of the area with the very small and loose timber is above 10, this area is more prone to landslides.
The results indicate a regularized relationship between the causative factors and the landslide susceptibility, which was obtained from the 10 times iterative modeling.Thus, the regularized relationship can be expansively used for a knowledge-driven landslide susceptibility mapping to other study areas.Moreover, it can also be applied for a threshold value of landslide early warning system.
The LSI was calculated by summation of each factor's FR score and then it carried out a landslide susceptibility assessment.The 10 landslide susceptibility maps were generated by the 10 times iterative frequency ratio modeling.The accuracy of the susceptibility maps was each 91.81%, 92.15%, 92.24%, 89.70%, 92.04%, 93.21%, 89.48%, 91.43%, 92.33%, and 91.77% from the 1st to the 10th iteration.The difference between the highest and their lowest accuracy was as small as 3.73%.Indeed, frequency ratio modeling with both random subsamples of high-resolution and high-quality data and high-scale maps confirmed their accuracy as high as 89%.

Figure 2 :
Figure 2: Digital aerial photograph with 82 landslide occurrences in the study area.

Figure 3 :
Figure 3: Digital aerial photographs (a) before and (b, c) after landslide occurrence and (d) the installed debris barrier after landslide occurrence in 2011.
spatial distributions of the factors are shown in Figures 5(a)-5(m).The factors had a 5 m × 5 m grid format.The dimensions of the study area were 1,760 columns by 1,090 rows for a total of 1,918,400 cells in Yongin area, Korea.

Figure 4 :
Figure 4: Digital aerial photographs (a) before and (b, c) after landslide occurrence and (d) the installed debris barrier after landslide occurrence near an elementary school and houses in 2011.

Figure 5 :
Figure 5: Spatial database of the landslide causative factors.

Figure 6 :
Figure 6: Landslide susceptibility maps generated by the frequency ratio approach.The rank was classified into four classes based on the area for simple and visual interpretation: very high, high, medium, and low index ranges in 5%, 10%, 15%, and 70% of the study area, respectively: (a) 6th and (b) 7th susceptibility maps with the highest and lowest susceptibility accuracies, respectively.

Figure 7 :
Figure 7: Success rate curves of the 6th and 7th landslide susceptibility maps with the highest and lowest susceptibility accuracies.

Table 1 :
Data layer related to landslide of study area.

Table 2 :
The multicollinearity diagnosis indexes for factors and frequency ratio (FR) values by 10 iterations of training data (TD) (bold FR values mean 1 >

Table 3 :
Frequency ratio (FR) values by iterations of training data (TD) (bold FR values mean 1 > of FR value).