The main purpose of the present study is to investigate the possible application of decision trees to landslide susceptibility assessment. The study area having a surface area of 174.8

Landslide susceptibility and hazard assessments can be carried out by using either direct or indirect mapping techniques. Direct hazard assessment, in which the degree of hazard is determined by the mapping geomorphologist, based on his/her experience and knowledge of the terrain conditions [

According to Miller and Han [

Location map of the study area; the black rectangle indicated by the arrow marks the study area.

The study area having a surface area of 174.8

In the study area, summers are hot and slightly rainy while winters are warm and rainy. The topography of the region and presence of lakes and dams also affect the weather conditions (

Various lithological units ranging in age from Middle-Late Eocene to Quaternary crop out in the region. The 1:25,000-scale geological map of the study area was prepared by Duman et al. [

The distribution of the geological formations with respect to landslides in the study area.

Formation | Symbol | Grid cells with landslides (frequency) | Grid cells with landslides (%) | All grid cells (frequency) | All grid cells (%) | Landslide density (%) |
---|---|---|---|---|---|---|
Quaternary | Qa | 1127 | 2.1 | 15495 | 5.54 | 7.27 |
Bakirkoy fm. | Tmb | 4473 | 8.34 | 56231 | 20.1 | 7.95 |
Ergene fm. | Tme | 13617 | 25.37 | 71287 | 25.49 | 19.1 |
Cantakoy fm. | Toc | 5706 | 10.63 | 14210 | 5.08 | 40.15 |
Danisment fm. | Tod | 4728 | 8.81 | 20650 | 7.38 | 22.9 |
Ihsaniye fm. | Teoi | 1810 | 3.37 | 43337 | 15.49 | 4.18 |
Kirklareli limestone | Tek | 0 | 0 | 1613 | 0.58 | 0 |
Danisment fm. Acmalar m. | Toda | 17151 | 31.95 | 34213 | 12.23 | 50.13 |
Ihsaniye fm. Tuff m. | Teoi2 | 66 | 0.12 | 478 | 0.17 | 13.81 |
Suloglu fm. | Tos | 4996 | 9.31 | 21166 | 7.57 | 23.6 |
Yassioren limestone | Teoiy | 0 | 0 | 1035 | 0.37 | 0 |
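The landslide density column of the table above appears to be the share of each unit's grid cells that contain landslides. The short Python sketch below recomputes it from the frequencies in the table, as a way of checking the figures and ranking the units. The names `FORMATIONS` and `landslide_density` are illustrative and do not come from the original study.

```python
# Grid-cell counts copied from the formation table:
# symbol -> (grid cells with landslides, all grid cells).
FORMATIONS = {
    "Qa": (1127, 15495),
    "Tmb": (4473, 56231),
    "Tme": (13617, 71287),
    "Toc": (5706, 14210),
    "Tod": (4728, 20650),
    "Teoi": (1810, 43337),
    "Tek": (0, 1613),
    "Toda": (17151, 34213),
    "Teoi2": (66, 478),
    "Tos": (4996, 21166),
    "Teoiy": (0, 1035),
}


def landslide_density(landslide_cells, all_cells):
    """Percentage of a unit's grid cells that contain landslides."""
    return 100.0 * landslide_cells / all_cells


# Density per unit, and the units ordered from most to least landslide-prone.
densities = {
    symbol: landslide_density(slides, total)
    for symbol, (slides, total) in FORMATIONS.items()
}
ranked = sorted(densities, key=densities.get, reverse=True)

for symbol in ranked:
    print(f"{symbol:6s} {densities[symbol]:6.2f} %")
```

Under this reading, the Acmalar member of the Danisment formation (Toda) and the Cantakoy formation (Toc) have the highest densities, which agrees with the values listed in the table.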

Geological map of the study area [

The altitude values in the study area are between 0 and 200 m while the dominant altitude ranges are 75–100 and 100–125 m (Table

General descriptive statistics of topographical variables with respect to landslides.

Data | Variable | Min. | Max. | Mean | Std. deviation |
---|---|---|---|---|---|
Grid cells with landslides | Altitude (m) | 0.000 | 194.680 | 85.009 | 45.165 |
Grid cells with landslides | Slope gradient (°) | 0.000 | 57.950 | 7.966 | 5.307 |
Grid cells with landslides | Plan curvature | –3.130 | 2.870 | 0.238 | |
Grid cells with landslides | Profile curvature | –2.930 | 3.080 | 0.023 | 0.303 |
Grid cells with landslides | Heat load | 0.000 | 1.000 | 0.529 | 0.338 |
Grid cells with landslides | Stream power index | 0.000 | 8.330 | 0.724 | 0.934 |
Grid cells without landslides | Altitude (m) | 0.000 | 200.000 | 98.229 | 52.370 |
Grid cells without landslides | Slope gradient (°) | 0.000 | 55.230 | 4.642 | 3.924 |
Grid cells without landslides | Plan curvature | –4.500 | 2.690 | 0.004 | 0.137 |
Grid cells without landslides | Profile curvature | –2.920 | 3.430 | 0.188 | |
Grid cells without landslides | Heat load | 0.000 | 1.000 | 0.524 | 0.342 |
Grid cells without landslides | Stream power index | 0.000 | 8.670 | 0.453 | 0.775 |

In this section, the landslide conditioning factors observed in the study area are explained. The data used are described first. In the present study, the digital elevation model (DEM) produced by Duman et al. [

One of the most important stages of landslide susceptibility mapping is to describe the factors governing the landslides identified in the area. A landslide susceptibility mapping procedure for the application site has been performed previously by Duman et al. [

The landslides identified in the region are mainly deep-seated and active. They are generally located in lithologies that include both permeable sandstone layers and impermeable layers such as claystone, siltstone, and mudstone. This is typical for the landslides identified in the study area. Given this finding, lithology can be regarded as one of the main conditioning factors of the landslides in the study area [

One of the most important topographical factors conditioning landslides is the slope gradient. In regional landslide susceptibility or hazard assessments, several researchers (e.g., [

The term curvature is generally defined as the curvature of a line formed by intersection of a random plane with the terrain surface [

The last parameter considered in the present study is the stream power index (SPI). It is a measure of the erosive power of water flow, based on the assumption that discharge (q) is proportional to the specific catchment area (
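As an illustration of the stream power index, the Python sketch below uses the formulation most often found in the terrain analysis literature, SPI = A_s · tan(β), where A_s is the specific catchment area and β is the local slope gradient. This is an assumption: the exact expression, units, and any scaling (for example, a logarithmic transform) used in the study are not shown in the text above. The function name `stream_power_index` is hypothetical.

```python
import math


def stream_power_index(specific_catchment_area, slope_deg):
    """Commonly used form SPI = A_s * tan(beta).

    specific_catchment_area: upslope area per unit contour width (assumed input).
    slope_deg: local slope gradient in degrees.
    This is the textbook formulation, not necessarily the one used in the study.
    """
    return specific_catchment_area * math.tan(math.radians(slope_deg))


# Flat ground has no erosive power under this definition; for a given
# catchment area, steeper cells score higher.
for slope in (0.0, 5.0, 15.0, 30.0):
    print(slope, round(stream_power_index(10.0, slope), 3))
```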

Data mining involves various techniques that have been developed over the years, such as statistics, neural networks, decision trees, genetic algorithms, and visualization techniques. Data mining problems are generally categorized as association, clustering, classification, and prediction [

In practice, several commercial data mining tools are available, such as Oracle DM, SQL Server Analysis Services, SPSS Clementine, and SAS Enterprise Miner. In the present study, the decision tree technique is used to predict the landslide susceptibility classes by employing Microsoft SQL Server 2008 Analysis Services. The decision tree is a data mining approach that is often used for classification and prediction. Although other methodologies such as neural networks can also be used for classification, decision trees have the advantage of being easy to interpret and understand, so decision makers can compare the results with their domain knowledge for validation and justify their decisions [

Decision trees are built through recursive data partitioning, where in each iteration the data is split according to the values of a selected attribute. The recursion stops at “pure” data subsets which only include instances of the same class [
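The recursive partitioning described above can be sketched in a few lines of Python. The example assumes categorical attributes and chooses each split by information gain, as ID3 does; the split criterion, the toy data, and all function names are illustrative assumptions and are not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split gives the largest information gain."""
    def gain(attr):
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    """Recursively partition the data until the subsets are pure."""
    if len(set(labels)) == 1:      # pure subset: every instance has the same class
        return labels[0]
    if not attributes:             # nothing left to split on: use the majority class
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in set(row[attr] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return (attr, branches)


def predict(tree, row):
    """Follow the row's attribute values from the root down to a leaf."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]  # raises KeyError for values unseen in training
    return tree


# Hypothetical toy training set (not data from the study).
rows = [
    {"slope": "steep", "lithology": "claystone"},
    {"slope": "steep", "lithology": "limestone"},
    {"slope": "gentle", "lithology": "claystone"},
    {"slope": "gentle", "lithology": "limestone"},
]
labels = ["landslide", "stable", "stable", "stable"]
tree = build_tree(rows, labels, ["slope", "lithology"])
print(predict(tree, {"slope": "steep", "lithology": "claystone"}))
```

In the sketch, a leaf is a plain class label and an internal node is an `(attribute, branches)` pair, so the tree can be read directly as a set of nested rules.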

Schematic illustration of the construction of decision tree by using training data set and an example view of a prediction on test data [

ID3 is a well-known decision tree algorithm proposed by Ross Quinlan of the University of Sydney, Australia. ID3 was later enhanced into C4.5, which can handle numeric attributes, missing values, and noisy data. Some decision trees can perform regression tasks, for example, to predict continuous variables such as temperature and humidity. The Classification and Regression Tree (CART) proposed by Breiman is a popular decision tree algorithm for classification and regression [

To perform the research reported in the present paper, Microsoft SQL Server 2008 Analysis Services was chosen as the analysis platform because it supports decision trees with continuous variables (called regression trees). Its high scalability and its support for nested tables, automatic feature selection, and automatic cardinality reduction are the other reasons for choosing this data mining platform. Additionally, Microsoft Analysis Services allows building data mining applications via the support of the Microsoft Visual Studio Integrated Development Environment and ADOMD extensions [

In this study, all of the input variables and the target output variable are continuous, so the resulting tree is a special kind of decision tree called a regression tree. Regression is similar to classification; the only difference is that regression predicts continuous attributes. Although the basic task of a decision tree algorithm is classification, it can be used for regression as well. Another well-known regression tree algorithm is CART. The Microsoft Decision Trees algorithm supports regression in SQL Server 2005 and 2008. Microsoft Regression Trees contain a linear regression formula at each leaf node. A regression tree has an advantage over simple linear regression in that a tree can represent both linear and nonlinear relationships [
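To illustrate how a tree with a linear formula in each leaf can capture a nonlinear relationship, the Python sketch below fits a one-split regression tree on a single continuous input and places a least-squares line in each of the two leaves. It is a deliberately simplified illustration and not the Microsoft Decision Trees algorithm; the function names, the squared-error split criterion, and the toy data are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:                       # all x equal: fall back to a constant leaf
        return mean_y, 0.0
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def sse(xs, ys, line):
    """Sum of squared errors of a fitted line on the given points."""
    a, b = line
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def fit_stump(xs, ys, min_leaf=2):
    """One-split regression tree with a linear model in each leaf."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                   # cannot split between identical x values
        lx, ly = zip(*pairs[:i])
        rx, ry = zip(*pairs[i:])
        left_line, right_line = fit_line(lx, ly), fit_line(rx, ry)
        error = sse(lx, ly, left_line) + sse(rx, ry, right_line)
        if best is None or error < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (error, threshold, left_line, right_line)
    if best is None:
        raise ValueError("not enough distinct data points to split")
    return best[1:]                    # (threshold, left_line, right_line)


def predict_stump(model, x):
    """Route x to a leaf, then apply that leaf's linear formula."""
    threshold, left_line, right_line = model
    a, b = left_line if x < threshold else right_line
    return a + b * x


# Hypothetical V-shaped data: a single straight line fits it poorly,
# while two leaf-level lines fit it exactly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [3, 2, 1, 0, 1, 2, 3]
model = fit_stump(xs, ys)
print(predict_stump(model, -2.5), predict_stump(model, 2.5))
print(sse(xs, ys, fit_line(xs, ys)))   # error of one global line, for comparison
```

A full regression tree would apply the same split search recursively to each leaf and over several input attributes.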

The data understanding and the data preparation stages are among the most important steps in the data mining applications [

The attributes considered in the study and their order of importance for the predictable variable.

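Attribute | Continuous/discrete | Usage | Effect importance order on the output |
---|---|---|---|
Altitude | Continuous | Input | 12 |
Heat load index | Continuous | Input | 14 |
Plan curvature | Continuous | Input | 15 |
Profile curvature | Continuous | Input | 5 |
| | Continuous | Input | |
| | Continuous | Input | |
Alluvium (Qa) | Discrete in nature, handled as continuous | Input | 10 |
| | Discrete in nature, handled as continuous | Input | |
Ihsaniye fm. (Teoi) | Discrete in nature, handled as continuous | Input | 6 |
Ihsaniye fm. Tuff m. (Teoi2) | Discrete in nature, handled as continuous | Input | 13 |
Yassioren limestone (Teoiy) | Discrete in nature, handled as continuous | Input | 17 |
Bakirkoy fm. (Tmb) | Discrete in nature, handled as continuous | Input | 16 |
Ergene (Tme) | Discrete in nature, handled as continuous | Input | 8 |
Cantakoy fm. (Toc) | Discrete in nature, handled as continuous | Input | 9 |
Danisment fm. (Tod) | Discrete in nature, handled as continuous | Input | 11 |
| | Discrete in nature, handled as continuous | Input | |
Suloglu fm. (Tos) | Discrete in nature, handled as continuous | Input | 7 |
Landslide | Discrete in nature, handled as continuous | Output | |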
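The table above lists each lithological unit as a separate attribute that is discrete in nature but handled as continuous. One plausible reading, assumed here and not confirmed by the text, is that every unit becomes its own 0/1 indicator column, so that each grid cell is a purely numeric record. The Python sketch below shows that encoding; `encode_cell` and the example cell are hypothetical.

```python
# Unit symbols taken from the formation table of the study area.
LITHOLOGY_UNITS = [
    "Qa", "Tmb", "Tme", "Toc", "Tod", "Teoi",
    "Tek", "Toda", "Teoi2", "Tos", "Teoiy",
]


def encode_cell(cell):
    """Turn one grid cell into a flat numeric record.

    Continuous attributes are copied as floats; the single categorical
    'lithology' value is expanded into one 0/1 column per unit.
    """
    record = {key: float(value) for key, value in cell.items() if key != "lithology"}
    for unit in LITHOLOGY_UNITS:
        record[unit] = 1.0 if cell["lithology"] == unit else 0.0
    return record


# Hypothetical grid cell (values are illustrative, not from the study).
cell = {"altitude": 85.0, "slope_gradient": 8.0, "lithology": "Toda"}
record = encode_cell(cell)
print(record)
```

With this encoding, a split such as `Toda >= 0.5` in a regression tree simply separates cells inside the unit from cells outside it.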

An example view of a part of the decision tree.

Using the predicted landslide susceptibility values, the landslide susceptibility map of the study area is produced (Figure

Landslide susceptibility map of the study area produced by using the decision tree technique.

ROC (Receiver-Operating Characteristic) curve evaluation of the constructed model.
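For readers unfamiliar with ROC evaluation, the Python sketch below shows how an ROC curve and the area under it (AUC) can be computed from predicted susceptibility values and observed landslide presence. The scores and labels are made-up illustrative values, and the function names are hypothetical; this is not the evaluation code or the data of the study.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs for every threshold.

    scores: predicted susceptibility per grid cell (higher = more susceptible).
    labels: 1 if the cell contains a landslide, 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Cells with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Illustrative values only.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(roc_points(scores, labels)), 4))
```

An AUC of 0.5 corresponds to a model that ranks cells no better than chance, and a value of 1.0 to a model that ranks every landslide cell above every landslide-free cell.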

Duman et al. [

In conclusion, the present study investigates the decision tree, one of the data mining methods, as a means of producing a landslide susceptibility map of a landslide-prone area (Cekmece, Istanbul, Turkey). The developed decision tree model yields two important results: it predicts the degree of landslide susceptibility, and it ranks the input attributes by their effect on landslide occurrence.