Modeling Typhoon Event-Induced Landslides Using GIS-Based Logistic Regression: A Case Study of Alishan Forestry Railway, Taiwan

This study develops a model for evaluating the hazard level of landslides at Alishan Forestry Railway, Taiwan, by using logistic regression with the assistance of a geographical information system (GIS). A typhoon event-induced landslide inventory, independent variables, and a triggering factor were used to build the model. The environmental factors such as bedrock lithology from the geology database; topographic aspect, terrain roughness, profile curvature, and distance to river, from the topographic database; and the vegetation index value from SPOT 4 satellite images were used as variables that influence landslide occurrence. The area under curve (AUC) of a receiver operator characteristic (ROC) curve was used to validate themodel. Effects of parameters on landslide occurrencewere assessed from the corresponding coefficient that appears in the logistic regression function.Thereafter, the model was applied to predict the probability of landslides for rainfall data of different return periods. Using a predicted map of probability, the study area was classified into four ranks of landslide susceptibility: low, medium, high, and very high. As a result, most high susceptibility areas are located on the western portion of the study area. Several train stations and railways are located on sites with a high susceptibility ranking.


Introduction
Landslides are major geological hazards in numerous areas of the world and cause economic losses, deaths, and injuries.To prevent damage, numerous government and research institutions have attempted to assess landslide hazards and produce susceptibility maps that portray their spatial distribution [1].Landslide susceptibility mapping fits an appropriate model to assess the probability of failure caused by a landslide at specific locations [2].Various approaches for landslide susceptibility mapping have been proposed and implemented.These approaches may be distinguished into 2 main classes: qualitative and quantitative approaches.The hazard levels for the qualitative approaches are subjectively represented by descriptive terms based on expert opinions.The predictive power of these approaches differs in the weights of the parameters.However, weights evaluated by experts are highly personal and may contain a degree of virtual admission [3].Quantitative approaches reduce personality and bias in the weight assessment process and analyze the relationship between landslide occurrence and its dependency on environmental factors.There are 2 types of quantitative approaches: deterministic and statistical [4].Deterministic methods analyze the slope stability using physically based models and are expressed as the safety factor [5,6].Deterministic methods are limited for predicting potential landslides on a regional scale because of the need for detailed geotechnical data [7].Statistical approaches are used to analyze landslide-controlling factors associated with landslide occurrence and rank the factors by using statistical models.Statistical approaches have recently become common for evaluating landslide hazards.Statistical approaches with logistic regression models were applied in landslide studies of various regions during the past few years, [3,[8][9][10][11][12][13][14][15][16][17][18][19][20][21].The Alishan Forestry Railway and its vicinity suffered from landslides for numerous years.Typhoon Morakot and its heavy rainfall occurred in the southern portion of Taiwan on August 7, 2009 and left 652 people dead and 47 missing.It caused approximately US$ 3.3 billion of property damage [22].The extreme amount of rainfall caused numerous landslide disasters in the mountainous areas throughout Southern Taiwan [23].This study attempts to extend the application of logistic regression because the method overcomes the bias in the weight assessment process and may predict the potential distribution of landslides in large areas.A landslide susceptibility map for the Alishan Forestry Railway is produced, which uses a logistic regression model with a landslide inventory and environmental factors developed with using geographic information system (GIS) technology.The readability and classification of the obtained landslide susceptibility map are the focus because the map will provide valuable novel information for landslide assessment and hazard mitigation.

Study Area
The Alishan Forest Railway is located in the southwestern portion of Taiwan with a 71.4 km network of railways (Figure 1).The main line begins from the city of Chiayi (30 m a.s.l.) to the final Shihou station (2,216 m a.s.l.).The lines were originally constructed for logging, but the lines today cater mostly to tourists.The vegetation along the railway varies from tropical, temperate, and alpine because of the vast range of altitudes.The average annual precipitation of the Alishan area varied from 2,200 to 5,900 mm during 1999-2010.Rainfall is heavy and occasionally intense during thunderstorms and typhoons (Figure 2).The Alishan experienced significant landslide damage in recent years following 3  be considered a trigger for landslides in the future.Figure 3 shows the distribution of landslides triggered by Typhoons Toraji, Mindulle, and Morakot.Landslides are mainly located in the western portion of Dulishan Station.The area along a 64.8 km long railway, beginning from Dulishan Station, with a buffer of 1,200 m was selected for this study.The study area encompasses a total area of 96.38 km 2 .Table 1 shows a summary of the number, total area, and maximum area for landslides triggered by Typhoons Toraji, Mindulle, and Morakot within the study area.The largest number and landslides occurred during Typhoon Morakot.The Alishan Forest Railway was severely damaged by Typhoon Morakot and currently remains unrepaired (Figure 4).

Landslide Inventory
The reconstructions following Typhoon Morakot increased the need for analyzing the structural and geomorphological  features of landslides.A landslide prediction model based on an event-based landslide hazard analysis [24] was built to characterize the landslides triggered by Typhoon Morakot.High-resolution satellite images taken in pre-and post-Typhoon Morakot were interpreted to produce the landslide map used in this study.Aerial photographs and a geographic information system (GIS) were used to examine the quality of the landslide map.A field survey was performed to verify the landslides in the study area.An event-based landslide inventory map was established following the examination of the map.Thereafter, the landslide inventory map was transformed into a grid database with 20 m × 20 m cell sizes.A total of 131 landslides were mapped within 3.87 km 2 to assemble the database.The landslide inventory map of the study area is shown in Figure 5.Other data such as digital elevation models (DEM), SPOT4 satellite images, 1 : 50,000 geologic maps of the study area, and GIS railway map required in the development of landslide prediction model were additionally prepared.

Landslide Causative Factors
Identification and mapping of an appropriate set of factors related to landslides require a prior knowledge of their main causes [1].Three categories of landslide-causative factors  were used in the present study: topography, geology, and hydrology.The topographic category includes 7 factors: aspect, terrain roughness, profile curvature, distance to river, normalized difference vegetation index (NDVI), and normalized difference water index (NDWI).Moreover, the geology category includes bedrock lithology, and the hydrologic category includes the maximum 1 h rainfall intensity.The hydrologic factor is considered as the triggering factor.The factors considered for the analysis were extracted from the constructed spatial database, including the DEM with a 20 × 20 m 2 resolution, SPOT 4 satellite images, 1 : 50,000 geologic maps of the study area, GIS road maps, and GIS railway maps.Table 2 shows a summary of the data source and characteristics of landslide-causative factors.The factors were transformed into a vector-type spatial database using GIS software, and landslide-causative factors were extracted using the database.Thereafter, the factors were converted to a raster grid for application.All factors were computed for a 20 × 20 m 2 grid because of the limited spatial resolution of DEM.The aspect, terrain roughness, and profile curvature were derived from the DEM using GIS software.The distance to river was extracted using the geologic map and ArcMap.The NDVI and NDWI were acquired from the SPOT4 satellite images taken prior to the typhoon event.The values of the NDVI and NDWI were calculated using NDVI = (NIR − RED)/(NIR + RED) and NDWI = (NIR − SWIR)/(NIR + SWIR), respectively.NIR is the reflectance in the near-infrared channel, RED is the reflectance in the visible channel, and SWIR is the reflectance in the midinfrared channel.The geological formations of the study area mainly consist of Miocene and Pleistocene deposits.Five geologic formations are distributed along the study area: Cholan formation (Cl), Chinshui formation (Cs), Kueichoulin formation (Kc), Shihti formation (St), and Toukoshan formation (Tk).The bedrock consists of the Cholan formation and Toukoshan formation, which are composed of alternating sandstone, siltstone, and mudstone beds.The Kueichoulin formation and Shihti formation consists of sandstone and shale.The Chinshui formation consist of shale with sandy or silty intervals.Lithological units with similar geological properties and conditions were grouped, and the geological maps were differentiated into 3 lithological units (Figure 6).The rainfall event by Typhoon Morakot from August 7 to 9, 2009 is considered in this study.This event has a return period of over 200 years [23].The maximum 24 h and 48 h rainfalls recorded at the Alishan weather Station were 1623 mm and 2361 mm, respectively.These figures were both near the respective world records.The rainfall data from 17 weather stations near and within the study area were gathered to examine the hydrological factors.The rainfall data comprised maximum hourly rainfall figures from 2 d before and after the rainfall event.Rainfall data were used to determine the maximum 1 h rainfall intensity.Thereafter, the rainfall data were used for a spatial interpolation using the Kriging interpolation within GIS software (Figure 7).

Logistic Regression Model
In logistic regression, the nonlinear relationship between a dependent variable and several independent variables is constructed based on a multiple regression.The dependent variable is dichotomous, and the independent variables may be a nominal, ordinal, interval, or ratio scale.The relationship between a dependent variable and independent variables is established using the maximum likelihood or Bayesian methods.In this study, the dependent variable is a binary variable representing the presence (1) or absence (0) of a landslide.The results of the logistic regression generate a useful formula for predicting the probability of the dependent  variable with a range of 0 to 1.Moreover, logistic regression coefficients can be used to measure the contribution of the independent variables.The logistic regression model of the dependent variable on independent variables can be expressed in a simple form as where  is the probability of dependent variable occurrence which denotes the probability of landslide occurrence in this study.The value of  is constrained within a range between 0 and 1, where 0 indicates a 0% probability of a landslide and 1 indicates a 100% probability.The  variable is the linear combination of independent variables, and its value varies from −∞ to +∞.The linear combination  can be expressed as where  0 is a constant,   ( = 0, 1, 2, . . ., ) are the independent variables,  is the number of independent variables, and   ( = 0, 1, 2, . . ., ) are the coefficients of the logistic regression model.The coefficients of the logistic regression model were assessed by using the maximum likelihood method in this study.A landslide has a higher probability of occurring when a coefficient is positive and a lower probability of occurring when a coefficient is negative.Variation Inflation Factors (VIF) and Tolerance (TOL) were used as indexes for multicollinearity diagnosis.Variables with VIF > 2 and TOL < 0.4 were excluded from the logistic analysis in order to avoid multicollinearity.Analysis of landslide occurrence involves a high level of uncertainty due to data limitation [25].In this respect, sample datasets were input into the logistic regression algorithm within the Statistical Package for Social Science (SPSS) to model the relationship between landslide probability and the independent variables.It is generally recommended that an equal number of landslide presence (1) and absence (0) cells are in the training dataset [8,10,19,26].Five different training datasets were created by using the total number of landslide cells and an equal number of randomly selected nonlandslide cells from the landslide-free areas.A chisquare of the Hosmer-Lemeshow test, Cox and Snell  2 , and Nagelkerke  2 were used to evaluate the influence of these datasets on model performance.The results of classification accuracy for different datasets are summarized in Table 3.As the values of chi-square are larger than 0.05, the model is considered to be able to fit the equation.On the other hand, the values of Cox and Snell  2 and Nagelkerke  2 suggest the independent variables can explain the landslide probability.Considering different random samplings, the values of overall corrected percentage depending on different datasets are acceptable.The maximum overall corrected percentage is 74.1%.There is only a little difference for the predicted accuracy between different datasets.As a result of the logistic regression analysis, the randomly selected training data are able to represent generalized conditions of the study area.

Results and Discussions
During the logistic regression analysis, 8 significant independent variables were included in the model: bedrock lithology, aspect, terrain roughness, profile curvature, distance to river, NDVI, NDWI, and maximum 1 h rainfall intensity.The aspect, terrain roughness, and profile curvature were derived from the DEM using GIS software.The distance to river was extracted using the GIS software and geologic map.To obtain accurate information, the NDVI and NDWI were obtained from SPOT4 satellite images taken prior to the typhoon event.Based on the significant independent variables the logistic regression equation ( 2) is compiled as where TRRRAIN ROUGHNESS is the terrain roughness value; PROFILE CURVATURE is the profile curvature value; NDVI is the NDVI value; NDWI is the NDWI value; DISTANCE TO RIVER is the distance to river; MAXIMUM 1-H RAINFALL is the maximum 1 h rainfall intensity; and ASPECT and LITHOLOGY are logistic regression coefficient values shown in Table 4.The landslide occurrence probability was calculated for all cells in the study area by using ( 1) and ( 3).The probabilities ranged between 0.018 and 0.957.A probability value of 0.5 was selected as the cutoff value [8], which distinguishes the cells into 2 cases: places where landslide occurred and did not occur.Figure 8 shows a comparison between known landslides following Typhoon Morakot and predicted landslides.The logistic regression model generally shows good classifications for the presence and absence of landslides.Moreover, the known landslide location data with the landslide susceptibility map were used in a cumulative frequency diagram as shown in Figure 9(a).
The figure indicates that the model successfully explained 46.4% of the landslides in the 90%-100% (10%) class of the study area.The model successfully explained 58.5% of the landslides for the 80%-100% (20%) class of the study area.Moreover, the model successfully explained 74.4% of the landslides in the 60%-100% (40%) class of the study area.Thereafter, the performance of the constructed model was evaluated.The area under curve (AUC) of the receiver operator characteristic (ROC) curve was used to qualitatively assess the prediction accuracy.The AUC value ranges from 0.5 to 1, where 1 indicates an excellent prediction and 0.5 represents a failed prediction.The ROC curve for the logistic regression analysis is shown in Figure 9(b).The AUC value is 0.816, which can be regarded as a good correlation between the independent and dependent variables.The coefficients in (3) can be used to assess the relative importance of the independent variables.Among the used independent variables, the aspect (N), aspect (NE), terrain roughness, profile curvature, and NDWI are positively associated with landslide occurrence, whereas aspect (E), aspect (SE), aspect (S), aspect (SW), and aspect (W), distance to river, and NDVI are negatively associated with the landslide occurrence in the study area.The contributions from the lithology classes show a clear indication of the influence of the bedrock type on landslide occurrence in the study area.The lithology of sandstone and schist has a positive coefficient estimate, indicating that the lithology type contributes to landslides.The lithology of sandstone and mudstone also has a positive coefficient estimate.However, its coefficient slightly deviated from 0 and resulted in a small impact on landslide occurrence.Terrain variables show that terrain roughness is strongly correlated to the probability of landslide occurrence.Moreover, profile curvature has a positive contribution to landslide occurrence when its value is high.Contributions from the aspect classes indicate that the landslides have high probabilities of occurring at the eastern (E) and northeastern (NE) slopes.
The probabilities of the predicted landslides from the logistic regression model were used to produce susceptibility maps for the study area.The landslide susceptibility map provides stability rankings that show where landslides may occur and is a useful general guide for future spatial planning, site selection, policy making, and land use assessment for the hazard area.In the literature, numerous methods were adopted for dividing the landslide probability of a susceptibility map into various ranks [1,[8][9][10][11]24].In this study, a quartile-based method was used to generate susceptibility ranks, and the susceptibility map was classified into 4 ranks: low (0.00-0.25), medium (>0.25-0.50),high (>0.50-0.75),and very high (>0.75-1.00).When considering rainfall as the triggering factor for landslides, rainfall data for a particular return period, such as 10 years, 20 years, 50 years, and 100 years, can be used to replace a Typhoon event's rainfall data to produce susceptibility maps [27,28].Frequency analysis was performed using the rainfall data from weather stations, as shown in Figure 7.The Kriging method within GIS software was applied for spatial interpolation to prepare the spatial rainfall data.Total rainfall data corresponding to return periods of 10 years, 20 years, 50 years, and 100 years were used as input in the developed logistic regression model.Figure 10 shows the ranked susceptibility maps for the study area with various return periods of maximum 1 h rainfall intensity.Moreover, the areas with high susceptibility are revealed by the susceptibility maps.The maps indicate that several train stations and railways in Figure 10 are located on a high-susceptibility site.The western side of the study area is especially covered by relatively weak bedrocks such as the Cholan formation and Chinshui formation, which are classified to have a high susceptibility.Furthermore, the susceptibility in these areas increases significantly as the return periods increase.These results can be used as basic data to assist hazard assessments and reconstruction plans for the Alishan Forestry Railway.

Conclusion and Outlook
This study applied a statistical method with data from typhoon event-induced landslides to prepare a susceptibility map for the Alishan Forestry Railway, Taiwan.A logistic regression model within the GIS framework for landslide susceptibility estimation was developed based on landslides triggered by Typhoon Morakot in combination with independent variables such as bedrock lithology from the geology database; topographic aspect, profile curvature, terrain roughness, and distance to river, from the topographic database; and the vegetation index value from SPOT 4 satellite images.It is shown that the training datasets only result in a minor difference for the predicted accuracy.Using the area under curve (AUC) of the receiver operator characteristic (ROC) curve, the validation results show that the logistic regression model is capable of predicting the presence or absence of landslides.Aspect (E) and terrain roughness are found to be relatively important factors among the independent variables because maximum 1 h rainfall intensity was selected as the triggering factor.Furthermore, the landslide susceptibility maps were derived from a combination of both the logistic regression model and various return periods of total rainfall.An inference was completed to divide the study area into 4 ranks of landslide susceptibility: low, medium, high, and very high.Several train stations and railways are located on a high-susceptibility site.These results can be used to assist planners and engineers for hazard assessments and reconstruction plans for the Alishan Forestry Railway following the damage from Typhoon Morakot.This study showing logistic regression model may be a valuable tool for predicting landslide potential in the study area.However, detailed spatial information is needed to improve model predictions.The information is useful for spatial statistical modeling of landslides in a more realistic manner.

Figure 1 :
Figure 1: Location and extent of the study area.

Figure 2 :
Figure 2: Mean monthly rainfall values for the period between 1999 and 2010 for the Alishan weather Station.

Figure 6 :
Figure 6: Regional geological map of the study area.

Figure 7 :
Figure 7: Distribution of maximum 1 h rainfall intensity during typhoon Morakot in 2009.

Figure 8 :
Figure 8: Comparison between the observed landslides after the Typhoon Morakot and predicted landslides based on logistic regression model with cutoff value equal to 0.5.

Figure 9 :
Figure 9: (a) Cumulative curve shows success rate of landslide prediction in cumulative percentage of landslide area; (b) ROC curve used to assess the predictive performance of the logistic regression model.

Table 2 :
Data, resolution/scale, data source, and procedure for the independent variables.Ministry of the Interior Aspect, surface roughness, and profile curvature are derived from DEM data.

Table 3 :
Classification accuracy for logistic regression analysis with respect to different random samplings.

Table 4 :
Regression coefficients obtained and statistics for the independent variables.