Flood Detection and Susceptibility Mapping Using Sentinel-1 Time Series, Alternating Decision Trees, and Bag-ADTree Models

Department of Remote Sensing and GIS, University of Tabriz, Tabriz 5166616471, Iran
Institute of Environment, University of Tabriz, Tabriz 5166616471, Iran
Department of Architecture and Building Engineering, Tokyo Institute of Technology, Yokohama 226-8502, Japan
Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran
Department of Zrebar Lake Environmental Research, Kurdistan Studies Institute, University of Kurdistan, Sanandaj 66177-15175, Iran
Department of Civil, Environmental and Natural Resources Engineering, Lulea University of Technology, Lulea 971 87, Sweden


Introduction
Natural disasters such as tornadoes, floods, volcanic eruptions, hurricanes, tsunamis, and earthquakes occur across the globe [1,2], with associated loss of life and damage to economies [1,3]. Flooding is one of the natural hazards with the highest impact and social, economic, and environmental consequences, especially for cities and agricultural lands [4].
During the past three years, different parts of Iran have been devastated by floods that claimed hundreds of lives and caused millions of dollars in property damage [5,6].
Thus, comprehensive flood assessment and management are essential to reduce the effects of floods on human lives and livelihoods [7][8][9].
Typically, floods occur when rivers overflow their banks after heavy rainfall. Floods lead to economic, environmental, and social problems, such as damage to roads, farms, and infrastructure, and sometimes pollute surface water resources through the transfer of industrial waste [10,11], which in turn creates many health problems. Flash floods claim about 20,000 lives annually [12], and, since 1995, approximately 110 million people have been affected by flood damage [10,13,14]. During the last 50 years, there were 2400 severe floods in Iran, which left a large number of people homeless and caused serious financial damage of about 200 million USD [15]. Iran is a country with an arid and semiarid climate that is prone to destructive floods [5].
In the last three years, almost all provinces of the country have been affected by floods [6]. One of the most recent and destructive floods happened in Sistan and Baluchestan Province in January 2020. The factors contributing to the extent of damage were heavy rainfall, poor watershed management, farming on river banks, and poor flood warning systems [5]. Flood susceptibility mapping (FSM) is an essential management tool for identifying areas at risk and for preventing licensed and unlicensed construction in high-risk areas [1]. Because they were based on superficial assessment, the maps generated for the floods in Sistan and Baluchestan Province have not been suitable for flood crisis management. Our aim with this research is to improve flood detection and susceptibility mapping to aid flood managers and minimize loss of life and property.
During the past decades, remote sensing (RS) and Geographic Information Systems (GIS) have opened up new opportunities to create and assess large datasets and to extract more accurate and valuable flood hazard maps [5,12,16]. Synthetic Aperture Radar (SAR) can collect data day or night, penetrates clouds, and is available on both airborne and spaceborne platforms [17,18]. The European Space Agency (ESA) provides researchers with free and comprehensive Sentinel data for a variety of purposes. In this study, Sentinel-1 was applied for flood analysis. Hybrid algorithms are generally considered more reliable than single models [5,19], because an ensemble can increase the prediction accuracy of a single classifier [5,20,21]. Researchers tend to apply machine learning ensemble models, which are better suited for sophisticated flood assessment. They have improved flood prediction [1,2,4,5,10,22-24]. It is worth mentioning that no model is yet universally accepted as superior to all others [5]. In this study, we use ADT and bagging ensemble based alternating decision trees (bag-ADT) models for mapping areas susceptible to floods.
Unfortunately, many flood susceptibility models have performance limitations: they do not involve nonflooded locations and generally consider only class weights instead of weights for layers [5,51]. Additionally, MCDM models introduce bias and errors because they rely on experts' opinions [5]. At the same time, flooding at a county scale is a complex and nonlinear process, which cannot be predicted and mapped using these simple techniques.
Our study has two main purposes: (1) to present a new ensemble approach of bag-ADTree for mapping areas prone to flooding, which has rarely been applied for FSM, either locally or globally, and (2) to map flooded areas using time series analysis of Sentinel-1 data, which has not yet been applied in Iran.

Description of the Study Area
Over three days, from 10 to 12 January 2020, a very destructive flood occurred in Sistan and Baluchestan Province, southeast Iran (Figure 1). We selected Fanuj County (1835 km²) as our study area because it was heavily impacted by flooding. Fanuj is one of the oldest cities in the southwest of Sistan and Baluchestan Province. It is located 180 km south of Iranshahr city and 550 km from Zahedan city (the capital of Sistan and Baluchestan Province). The study area is located at 26°34′33″N 59°38′23″E and is 185 meters above sea level [52].
This study area was selected for two reasons: (1) based on local and government reports, it experienced most of the damage, and (2) it is located in the southern part of the province, so runoff from upstream areas drains into it. It has a warm and dry climate with relatively low humidity and an annual rainfall of about 550 mm [53]. Khermat, Katich, Ramak, Zavar, Kaskan, Megoon, Pir Sehran, Fenuj Gorge, Muskotan, and Modanche are the most important rivers within the study area [15,54]. Much of the arid landscape is poor rangeland. It is rocky with frequent rock outcrops and entisol and aridisol soil types. Bedrock is predominantly Eocene flysch (composed of shale, marl, sandstone, conglomerate, and limestone). This arid landscape is prone to flash floods. Figure 2 shows the extent of the 2020 flooding in the study area.

Data Acquisition.
We used Sentinel-1 data (collected from scihub.copernicus.eu) for the flood inventory and time series analysis of the study area. A 30 m spatial resolution Digital Elevation Model (DEM) was collected from the United States Geological Survey (USGS) website and used for extracting a variety of layers, such as slope, aspect, elevation, stream power index (SPI), topographic wetness index (TWI), curvature, proximity to the river, and river density (river networks were extracted using the hydrology toolbox in GIS). Geology and soil maps were acquired from the Geological Organization of Iran and used to digitize the geology and soil type layers. Rainfall data were obtained from the Tropical Rainfall Measuring Mission (TRMM) and, finally, the Normalized Difference Vegetation Index (NDVI) was extracted from Sentinel-2 data.
Figure 3 shows the methodology we used in this study step by step. The workflow consists of the following: (1) data collection and preparation (including appropriate conditioning factors, satellite data for flood detection, and time series assessment), (2) extraction of a flood inventory map, (3) modelling flood susceptibility using the ADT and bag-ADT algorithms as well as time series analysis, and finally (4) accuracy assessment and comparison of the models. A pair of Sentinel-1 scenes was employed for flood detection. The flood inventory was randomly divided into two groups of 70% (for training the models) and 30% (for validating the algorithms). Moreover, twelve Sentinel-1 scenes were acquired and used for extracting the area of water in the flooded regions of the study area. ADT and bag-ADT models were employed for flood susceptibility analysis in the study area. A set of statistical measures, including RMSE, overall accuracy, and the receiver operating characteristic (ROC) curve, was used for validation.
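The random division of the flood inventory into training and validation groups can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_inventory`, the seed, and the integer point IDs are hypothetical stand-ins for the digitized flood locations, and the study's own split was carried out in its GIS and modelling software.

```python
import random


def split_inventory(points, train_fraction=0.7, seed=42):
    """Shuffle inventory points and split them into training and validation sets."""
    shuffled = list(points)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical inventory: 199 point IDs standing in for digitized flood locations.
flood_points = list(range(199))
train, validation = split_inventory(flood_points)
print(len(train), len(validation))
```

Changing `train_fraction` gives other proportions without touching the rest of the workflow.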

Selection of the Best Factors for Flood Modelling.
To obtain reliable results, the spatial association between each factor and flood occurrence should be established [34]. Eliminating factors with no predictive contribution can enhance the prediction power of the methods [46,55,56]. The Latent Semantic Analysis (LSA) technique, implemented in the Waikato Environment for Knowledge Analysis (WEKA 3.6.9) software, was applied to select the most effective factors in the study area. The candidate factors were chosen based on previous studies and the geographical setting of the study area.

Flood Conditioning Factors.
In order to run a model, a set of conditioning and triggering parameters must be collected and applied [5,34,57]. It is generally accepted that the magnitude of a flood depends greatly on rainfall intensity and duration [46]. Other factors, including topography, geology, vegetation, and soil types, can also have considerable effects on flooding. In this study, twelve conditioning and triggering parameters (related to the geoenvironmental characteristics of the study area) were initially selected for flood susceptibility assessment. These factors are slope, elevation, aspect, curvature, SPI, TWI, lithology, rainfall, NDVI, river density, proximity to river, and soil type.
Steeper slopes increase surface runoff and water velocity [5,58]; conversely, gentler slopes are associated with greater flood depths [5,56]. In this study, the slope layer was created from the SRTM DEM (Figure 4(a)). Relative elevation is an important factor in flood analysis, as lower elevations have a greater potential for runoff and flooding [56], while higher areas lie above flood levels [5]. Figure 4(b) shows the elevation factor for the current study. Aspect was selected as one of the modelling factors because it is related to the direction and convergence of water flow [59]. For this study, we categorized the aspect map into nine classes (Figure 4(c)). Stream flow is normally affected by curvature [60]. Based on other studies, areas with a curvature of zero tend to have a higher flood potential than areas with positive or negative curvature [5,11,56] (Figure 4(d)).

SPI refers to the erosive power of water in rivers and is calculated by the following equation [60-63]:

$$\mathrm{SPI} = \alpha \tan\beta,$$

where α is the flow accumulation of streams in an area and β is the slope. Generally, sediment relocation and river bank erosion are among the most noticeable issues related to SPI, and high SPI values lead to extreme channel transformation [64,65]. Therefore, higher SPI values are associated with a greater flood potential (Figure 5(a)).

TWI (Figure 5(b)) is an important factor in flood susceptibility analysis. It describes the accumulation of water at any location under the influence of gravity [65,66] and highlights soil moisture patterns [67]. TWI is calculated through the following equation [65]:

$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta + c}\right),$$

where α is the flow accumulation of streams, β is the slope of an area, and c is 0.001.

Variations in the permeability of rocks and sediments may influence water flow [4,65]. We digitized the lithology layer in a GIS environment using the geology map (Figure 5(c)).
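The two terrain indices can be sketched for a single raster cell as below. The code assumes the commonly used forms SPI = α·tan β and TWI = ln(α / (tan β + c)), with the small constant c = 0.001 guarding against division by zero on flat cells; the function names and the sample values are hypothetical, and the slope is assumed to be given in degrees.

```python
import math


def stream_power_index(flow_accumulation, slope_deg):
    """SPI = alpha * tan(beta), with the slope beta given in degrees."""
    return flow_accumulation * math.tan(math.radians(slope_deg))


def topographic_wetness_index(flow_accumulation, slope_deg, c=0.001):
    """TWI = ln(alpha / (tan(beta) + c)); c avoids division by zero on flat cells."""
    if flow_accumulation <= 0:
        raise ValueError("flow accumulation must be positive")
    return math.log(flow_accumulation / (math.tan(math.radians(slope_deg)) + c))


# Hypothetical cell: flow accumulation of 100 on a flat and on a steep slope.
print(topographic_wetness_index(100.0, 0.0))
print(topographic_wetness_index(100.0, 30.0))
print(stream_power_index(100.0, 30.0))
```

For the same flow accumulation, the flat cell has the higher TWI and the steep cell the higher SPI, which matches the roles the two indices play in the text.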
Rainfall is a significant factor in flood susceptibility analysis [55]. The rainfall layer was extracted from TRMM data for the five most recent years, from early 2015 to late 2019. We used the Inverse Distance Weighted (IDW) method in a GIS environment to create the rainfall layer (Figure 5(d)).
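A minimal sketch of inverse distance weighting is given below to show how a rainfall value at an unsampled location is estimated from nearby samples. The power of 2 and the sample coordinates and values are assumptions chosen for illustration; the GIS tool used in the study may apply different defaults, such as a search radius or a neighbour limit.

```python
import math


def idw(x, y, samples, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) samples by inverse distance weighting."""
    if not samples:
        raise ValueError("at least one sample is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The target coincides with a sample, so return that sample directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical rainfall samples (x, y, mm); not values from the study.
rain_samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(1.0, 0.0, rain_samples))
```

Nearer samples dominate the estimate, so the interpolated surface passes through the sample values and blends smoothly between them.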
NDVI can provide a good indication of flood potential [5], as vegetation can control runoff [68]. We extracted NDVI for the study area from Sentinel-2 data using the following formula (Figure 6(a)):

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}},$$

where NIR and RED are the reflectances in the near-infrared and red bands, respectively.

River density refers to the quantity of streams in a specific area [5,62]. Areas with a high river density have a greater potential for flooding, and vice versa [10] (Figure 6(b)). Proximity to river plays a major role in the magnitude and distribution of floods [1,59,69]. The greater the distance from a river, the lower the probability of flooding [70]; in addition, the low infiltration rates in the study area produce rapid runoff (Figure 6(c)). Because it governs infiltration and runoff, soil type has to be considered in flood susceptibility assessment [46,61,67]. The soil layer was digitized from the soil map of the study area. The layer has three groups: rock outcrops/entisols, rocky lands, and entisols/aridisols (Figure 6(d)).

Classification of geoenvironmental factors must be carried out for flood susceptibility assessment [11,71]. Based on the mechanism of flooding in the study area, a set of twelve conditioning and triggering parameters was finally taken into account. The flood-affecting factors were divided into classes (Table 1). This classification is based on the susceptibility of each factor to flood occurrence.
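The NDVI calculation for a single pixel can be sketched as follows. The source does not name the bands used; for Sentinel-2 the near-infrared and red reflectances are commonly taken from bands B8 and B4, which is stated here as an assumption. The reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    denominator = nir + red
    if denominator == 0:
        # No signal in either band; return a neutral value instead of dividing by zero.
        return 0.0
    return (nir - red) / denominator


# Hypothetical reflectances: dense vegetation versus bare soil.
print(ndvi(0.5, 0.1))
print(ndvi(0.25, 0.2))
```

Values close to 1 indicate dense vegetation, while values near 0 are typical of bare ground such as the rocky rangeland described for the study area.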

Flood Inventory Using Sentinel-1 Data.
Sentinel-1 (C-band, about 5.5 cm wavelength) is the first Sentinel mission of the European Space Agency (ESA) and consists of two satellites, Sentinel-1A and Sentinel-1B [57,72]. It carries a Synthetic Aperture Radar (SAR) instrument that acquires data in all weather conditions, day and night [5,57,72,73]. Sentinel-1 has three product types: (1) Ground Range Detected (GRD), (2) Single-Look Complex (SLC), and (3) Ocean [72]. Strip Map (SM), Wave (WV), Interferometric Wide (IW) Swath, and Extra Wide (EW) Swath are the four sensor modes of the satellite, with different applications and spatial resolutions [57]. The main limitation of C-band satellite missions is that the signal cannot penetrate dense vegetation cover [5,72,74]. Sentinel data are freely available online (scihub.copernicus.eu and vertex.daac.asf.alaska.edu). In this study, we used Sentinel-1 (GRD and IW) data for flood detection (Table 2).

Data Preprocessing, Processing, and Postprocessing.
Inventory maps play an important role in flood susceptibility studies [11,29,59]. A pair of Sentinel-1 (GRD and IW) scenes, dated 31/12/2019 and 12/01/2020, was acquired to identify and detect flood locations in the study area. The data were preprocessed in the Sentinel Application Platform (SNAP 7.0) and processed in ArcGIS 10.5. First, in SNAP, the data were clipped to the study area, their orbit files were updated, and they were then calibrated. Raw SAR data normally contain speckle, so the data were smoothed using the speckle filtering command in SNAP. To georeference the data and remove distortions such as layover, shadow, and foreshortening, the terrain correction module was employed. To extract as much information as possible, we created a dB band for both images. Finally, we stacked the data for further processing in ArcGIS. Because ArcGIS can clip data exactly to basin borders, all bands were clipped one by one to the study area and used to create an RGB composite. The dB band of the pre-event image was assigned to the red channel, while the dB band of the post-event image was assigned to the green and blue channels. With this composite, flooded areas can be easily identified and recorded, and they can even be differentiated from permanent water bodies. The red areas (showing flooded areas) were therefore digitized as flood prone areas. They were then converted to point format, and these points were divided into training (80%) and testing (20%) sets. A total of 199 locations were extracted as flood susceptible regions. Using a handheld Global Positioning System (GPS), 20% (40 locations) of the detected flood locations were validated in the study area (Figure 7).
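The logic behind the RGB composite can be sketched per pixel. Open water returns very little radar energy, so a pixel that was dry before the event and inundated after it has a higher pre-event dB value (red channel) than post-event dB value (green and blue channels) and therefore appears red, while permanent water is dark in both scenes. In the sketch below the function names, the clamping floor, and the 3 dB threshold are hypothetical; the study identified the red areas visually and digitized them by hand.

```python
import math


def to_db(backscatter, floor=1e-6):
    """Convert linear backscatter to decibels, clamping to avoid the log of zero."""
    return 10.0 * math.log10(max(backscatter, floor))


def change_composite(pre_linear, post_linear):
    """Return (red, green, blue): red = pre-event dB, green = blue = post-event dB."""
    pre_db = to_db(pre_linear)
    post_db = to_db(post_linear)
    return pre_db, post_db, post_db


def looks_flooded(pre_linear, post_linear, drop_db=3.0):
    """Flag a pixel whose backscatter dropped by at least drop_db (assumed threshold)."""
    red, green, _ = change_composite(pre_linear, post_linear)
    return (red - green) >= drop_db


# Hypothetical pixels: newly flooded land versus a permanent water body.
print(looks_flooded(0.1, 0.01))
print(looks_flooded(0.01, 0.01))
```

A fixed threshold is a simplification; in practice it would need tuning per scene, which is one reason manual interpretation of the composite remains common.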

Time Series Analysis.
In this study, twelve Sentinel-1 scenes were acquired and used for time series analysis. All scenes are GRD products in IW mode; five are from Sentinel-1B (S1B) and the remaining seven are from Sentinel-1A (S1A) (Table 3). The scenes dated 31/12/2019 and 12/01/2020 were acquired before and right after the event, respectively.

Data Preprocessing and Processing.
Preprocessing of twelve images is time-consuming. For efficiency, we employed Graph Builder to preprocess the data in the SNAP environment using the workflow described in Section 3.4.1 (e.g., subset, apply orbit file, thermal noise removal, calibration, speckle filtering, and terrain correction). To extract more information, the imagery should be manipulated on a pixel basis. Texture analysis is an essential method for distinguishing objects, especially in SAR data [34]. This technique was applied to improve the discrimination among different objects. Using the pixel info tools in SNAP, flooded areas can be identified initially. Once the pixel values of flooded areas had been determined, we extracted those areas using the band math tools. For further processing, all images were exported to ArcGIS as GeoTIFF.
To remove isolated pixels and noise, we postprocessed the data in an ArcGIS environment. First, a majority filter was applied to the pixels. Next, the boundaries between pixel groups were smoothed with the boundary clean module, and connected pixels were grouped with the region group tool so that noisy and isolated pixels could be separated from the others. The noisy pixels were then selected using the set null command and eliminated with the nibble tool. Finally, prior to conversion into polygons, the data were reclassified into two groups, flooded and nonflooded (Figure 8).
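Once the data are reclassified into flooded and nonflooded cells, the water area for each date follows from a pixel count. The sketch below assumes a binary mask (1 = flooded, 0 = nonflooded) and a 10 m pixel size, which is typical for terrain-corrected Sentinel-1 IW GRD products but is not stated in the source; the function name and the small mask are hypothetical.

```python
def water_area_km2(mask, pixel_size_m=10.0):
    """Area of flooded cells (value 1) in a binary mask, in square kilometres."""
    flooded_pixels = sum(sum(1 for cell in row if cell == 1) for row in mask)
    # Each pixel covers pixel_size_m squared square metres; 1 km2 = 1,000,000 m2.
    return flooded_pixels * pixel_size_m ** 2 / 1_000_000.0


# Hypothetical 10 x 10 mask in which every cell is flooded.
full_mask = [[1] * 10 for _ in range(10)]
print(water_area_km2(full_mask))
```

Applying the same function to the mask of every acquisition date yields the kind of area-versus-date series used in the time series analysis.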

Alternating Decision Trees (ADTree) Method.
The ADT algorithm is a robust technique for modelling [75]. The method builds on simple decision stumps [4,76], and the resulting tree structure is relatively simple [75,77]. An ADT model has two types of nodes, decision nodes and prediction nodes [4]: a decision node specifies a conditional statement, while a prediction node holds a numeric value [75][76][77]. The model is grown with a boosting procedure until a specified number of iterations is reached [76]. The final prediction is obtained by summing the prediction values along all paths that an instance satisfies [4,75]. The prediction values are calculated using the following equations:

$$a = \frac{1}{2}\ln\frac{W_{+}(C_1 \cap C_2)}{W_{-}(C_1 \cap C_2)}, \qquad b = \frac{1}{2}\ln\frac{W_{+}(C_1 \cap \neg C_2)}{W_{-}(C_1 \cap \neg C_2)},$$

where C_1 is a precondition, C_2 is a base condition, ∩ is intersection, W_+ and W_- denote the total weights of the positive and negative instances that satisfy the given condition, and a and b are two real numbers.
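How an alternating decision tree turns an instance into a prediction can be sketched as follows. This is a simplified illustration of the general ADTree scheme (each rule has a precondition, a base condition, and two prediction values), not the WEKA implementation used in the study; the rules, thresholds, and values are hypothetical and not fitted to any data.

```python
import math


def prediction_value(weight_positive, weight_negative):
    """Half the log ratio of positive to negative instance weights for one condition."""
    return 0.5 * math.log(weight_positive / weight_negative)


def adtree_score(instance, root_value, rules):
    """Sum the prediction values of every rule whose precondition the instance meets."""
    score = root_value
    for precondition, condition, a, b in rules:
        if precondition(instance):
            score += a if condition(instance) else b
    return score


# Hypothetical rules: (precondition, base condition, value if true, value if false).
rules = [
    (lambda x: True, lambda x: x["rainfall_mm"] > 100, 0.8, -0.4),
    (lambda x: x["rainfall_mm"] > 100, lambda x: x["distance_to_river_m"] < 200, 0.6, -0.3),
]
wet_cell = {"rainfall_mm": 120, "distance_to_river_m": 150}
dry_cell = {"rainfall_mm": 50, "distance_to_river_m": 150}
for cell in (wet_cell, dry_cell):
    score = adtree_score(cell, root_value=-0.1, rules=rules)
    print(score, "flood" if score > 0 else "nonflood")
```

The sign of the score gives the class and its magnitude acts as a confidence measure, which is what allows the output to be mapped as a continuous susceptibility index.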

Bagging Ensemble.
Finding a single best technique for susceptibility mapping is very difficult [23]. Recently, hybrid models have been considered among the best solutions for developing new algorithms. Bagging is very popular among researchers for constructing ensemble models and improving the predictive accuracy of single algorithms [5,71]. In bagging, training sets are formed by constructing bootstrap replicates of the original training set [5,78]: given a training set T of N instances, a new training set is formed by drawing N instances from T with replacement [5]. Each instance has a probability of $1 - (1 - 1/N)^N$ of being selected at least once in the N draws [78]. Bagging creates multiple classifiers by training them on these resampled training sets and then aggregates them [78][79][80]. The combined classifier produced by bagging generally has a lower error rate than the single models [78,80].
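The bootstrap step and the selection probability quoted above can be checked with a short sketch. The function names are hypothetical, and the majority vote shown at the end is one common way of aggregating the classifiers, assumed here for illustration; the software used in the study may aggregate differently, for example by averaging class probabilities.

```python
import random


def inclusion_probability(n):
    """Probability that a given instance appears at least once in n draws with replacement."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def bootstrap_replicate(training_set, rng):
    """Draw len(training_set) instances with replacement from the training set."""
    n = len(training_set)
    return [training_set[rng.randrange(n)] for _ in range(n)]


def bagged_vote(predictions):
    """Majority vote over binary predictions (1 = flood, 0 = nonflood)."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0


rng = random.Random(0)
replicate = bootstrap_replicate(list(range(199)), rng)
print(len(replicate), len(set(replicate)))
print(inclusion_probability(199))
```

For large N the probability approaches 1 − 1/e, so each replicate contains roughly 63% of the distinct original instances, and this variation between replicates is what makes the trained classifiers differ.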

Root Mean Square Error (RMSE).
To evaluate the applicability and performance of a new model, proven evaluation methods should be employed. One such method is RMSE, which quantifies the difference between predicted and observed values ([5,19]; Bui et al. [40,82]). Researchers tend to use RMSE as a standard way to calculate the errors of models [83]. RMSE is calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(F_{\mathrm{est},i} - F_{\mathrm{obs},i}\right)^{2}},$$

where F_est, F_obs, and N are the estimated flood values, the observed flood values, and the number of flood samples, respectively.
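As a small worked example, RMSE can be computed as the square root of the mean squared difference between estimated and observed values. The function name and the sample values (1 = flood, 0 = nonflood) are hypothetical.

```python
import math


def rmse(estimated, observed):
    """Root mean square error between estimated and observed values."""
    if len(estimated) != len(observed) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical labels: one of four locations is predicted incorrectly.
print(rmse([1, 0, 1, 0], [1, 1, 1, 0]))
```

Lower values indicate a closer match between the model output and the inventory.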

Overall Accuracy.
Overall accuracy is determined as a further test of the models' capability. It is calculated from the True Negative (TN), True Positive (TP), False Negative (FN), and False Positive (FP) values [5]. TP is the number of flood pixels that are correctly classified as flood, and TN is the number of nonflood pixels that are correctly classified as nonflood [5,18,84]. FP is the number of nonflood pixels that are incorrectly classified as flood, and FN is the number of flood pixels that are incorrectly classified as nonflood [5,11,63,70]. Overall accuracy is calculated as follows:

$$\mathrm{Overall\ accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
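The four counts and the resulting overall accuracy can be sketched as below, using the standard definition of accuracy as the share of correctly classified pixels. The function names and the labels (1 = flood, 0 = nonflood) are hypothetical.

```python
def confusion_counts(predicted, observed):
    """Return (TP, TN, FP, FN) for binary labels where 1 = flood and 0 = nonflood."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p == 1 and o == 1)
    tn = sum(1 for p, o in pairs if p == 0 and o == 0)
    fp = sum(1 for p, o in pairs if p == 1 and o == 0)
    fn = sum(1 for p, o in pairs if p == 0 and o == 1)
    return tp, tn, fp, fn


def overall_accuracy(tp, tn, fp, fn):
    """Share of all pixels that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical predictions and observations for five pixels.
counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(counts, overall_accuracy(*counts))
```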

Area under the ROC Curve (AUC).
We tested the overall capability of our models via AUC, which is known to be a robust method ([5,10,84]; Bui et al. [40,50]). In the ROC curve, 1 − specificity (the proportion of nonflood pixels incorrectly classified as flood) is plotted on the x-axis, while sensitivity (the proportion of flood pixels correctly classified as flood) is plotted on the y-axis [85]. A value of 1 indicates a perfect model, while values approaching 0 indicate a poor model [25,86]. The AUC can be calculated as follows:

$$\mathrm{AUC} = \frac{\sum TP + \sum TN}{M + N},$$

where M represents the total number of flood pixels and N represents the total number of nonflood pixels [5].
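The area under the ROC curve can also be computed directly from model scores. The sketch below uses the rank-based (Mann-Whitney) formulation, in which the AUC equals the probability that a randomly chosen flood location receives a higher susceptibility score than a randomly chosen nonflood location. This is a swapped-in, equivalent way of obtaining the area under the curve, not the calculation used in the study; the function name and scores are hypothetical.

```python
def auc_rank(flood_scores, nonflood_scores):
    """AUC as the share of (flood, nonflood) pairs ranked correctly; ties count as half."""
    if not flood_scores or not nonflood_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for flood in flood_scores:
        for nonflood in nonflood_scores:
            if flood > nonflood:
                wins += 1.0
            elif flood == nonflood:
                wins += 0.5
    return wins / (len(flood_scores) * len(nonflood_scores))


# Hypothetical susceptibility scores for flood and nonflood locations.
print(auc_rank([0.9, 0.4], [0.5, 0.2]))
```

Applied to the training set this gives a success rate, and applied to the validation set it gives a prediction rate.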

Flood Detection Using Sentinel-1 Data.
Using SNAP software, a coregistered image was created from a pair of Sentinel-1 scenes (before and after the flood event). Before coregistration, we created dB bands for both scenes, and using these bands in an RGB composite in ArcGIS we identified and digitized the flooded areas. In this composite, the flooded areas are differentiated (in red) from the other areas (preflood water bodies). A total of 199 locations were finally recognized and mapped as flood prone regions (Figure 11). Because of local reports of crocodile sightings in the study area during the flooding (Figure 12), we were able to validate only 20% (40 locations) in the field. These locations were selected randomly and checked within seven working days. Traces of flooding were visible at all the validated locations. Figure 13 shows the geographical position of the validated locations.

Time Series Analysis.
From 10 to 12 January 2020 (three days), severe and destructive rainfall occurred in Fanuj County, Iran. Because the intense rainfall was followed by minor rainfall events, the flood waters receded only gradually (Figures 14 and 15), and the flooded areas were among those that suffered heavy damage, especially economic damage.
This section examines the area of water using Sentinel-1 time series data. Twelve images were acquired and analyzed with SNAP 7.0 and ArcGIS 10.5. The results show that, by 23/02/2020, the study area had returned to its preflood condition: the area of water on both 31/12/2019 (before the flood) and 23/02/2020 (the last date) was about 5.5 km². Right after the rainfall (on 12/01/2020), the area of water was almost 583 km². From then on, the area of water declined markedly from date to date. Based on the field survey and local reports, the area of water within the study area was sustained by rainfall in upstream areas as well as by scattered local rainfall during the study period; otherwise, the study area would have returned to its normal condition sooner. By 02/02/2020, the water had lost about half of its area (approximately 280 km² remained). The area of water for the other dates is shown in Figures 14 and 15. Before the flooding in December 2019, less than 0.3% of the area was covered by water. At peak flooding on 12 January 2020, more than 42% of the study area was inundated. It took more than a month for flood waters to subside to normal levels.

Evaluation and Comparison.
Validation was carried out to determine the accuracy and prediction capabilities of the models. The ROC was applied to the two FSMs using the training and validation datasets, as shown in Figure 18 and Table 4. Both models gave reasonable results: the success rates were 0.736 for the bag-ADTree model and 0.714 for the ADTree model, and the prediction rates were 0.786 and 0.784, respectively. The results demonstrate that the bag-ADTree model was marginally better than the ADTree algorithm.
The bag-ADTree model also performed slightly better in terms of accuracy and RMSE (

Discussion and Conclusion
Complex natural hazards, such as flash floods, can never be entirely eliminated. Flood prediction methods can help mitigate the loss of human life and economic damage. Widespread flooding over the last three years in Iran motivated us to improve flood susceptibility mapping capabilities. We chose Sentinel-1 data because they are freely available and applicable to the study of flood hazard [84]. In recent years, ensemble techniques have become mainstream in flood and landslide susceptibility assessment and have been favoured over single-evaluation methods because of their higher predictive capabilities [1,10]. We previously used the bag-ADTree model successfully for landslide susceptibility [88] and tested it for flood susceptibility mapping in this study. The factors that control flood susceptibility are complex, and their mechanisms remain under study [1]. From an initial fifteen flood susceptibility factors, we excluded topographic position index (TPI), sediment transport index (STI), and terrain ruggedness index (TRI), as they had a negligible effect on flood occurrence based on the LSA analysis. The final analysis included twelve factors: slope, elevation, aspect, curvature, SPI, TWI, lithology, rainfall, NDVI, river density, proximity to river, and soil type. Rainfall was the most important factor for flood detection.
We used SNAP and ArcGIS together to preprocess, process, and postprocess a pair of Sentinel-1 scenes to extract flood prone areas in the study area. A total of 199 locations were identified and mapped, of which 20% (40 locations) were randomly selected for field validation. Validation with a handheld GPS confirmed that the locations had been extracted correctly. Because of scattered local rainfall and runoff from rainfall in upstream cities, about fifty days were needed before the surface water cleared and the study area returned to its normal condition. The results of the time series analysis showed that the high and very high susceptibility classes overlap with the flood prone areas.
We selected the ADTree and bag-ADTree algorithms to map areas susceptible to flooding. The ADTree model has been widely used in hazard studies; in the current study, it was combined with a bagging ensemble. In both models, the low susceptibility class covered the smallest area, 355 km² for ADTree and 294 km² for bag-ADTree. For the ADTree model, the moderate susceptibility class was the largest (529 km²), while for the bag-ADTree algorithm the largest was the high susceptibility class (564 km²). Despite these differences, the evaluation based on RMSE, accuracy, and the ROC curve shows that both models can map flood susceptible areas with high accuracy. RMSE for the ADTree and bag-ADTree algorithms was 0.31 and 0.30, respectively. With 86.61% accuracy compared with 85.44% for ADTree, the bag-ADTree model proved to be a marginally superior technique for mapping areas prone to flooding. Success rates were also higher for the bag-ADTree model (0.736) than for the ADTree model (0.714) [1,10]. Prediction rates were almost identical, but the bag-ADTree model was marginally better at 0.786, compared to 0.784 for the ADTree model. Overall, based on these results, we conclude that the objectives of the study have been achieved.
One limitation of this study comes from the time series input data, which provide no information on flood depth, only on the area of water. Another source of error is that radar remote sensing is sensitive to soil moisture, so nonflooded wet soils could be mistaken for flooded units. We hope that the findings of this study will help guide local policy and decision-makers to better cope with future floods. We recommend the use of our bag-ADTree model for future flood susceptibility modelling and suggest that the infiltration and permeability of the area's soils and rocks be studied further.

Data Availability
The data used in the current research are subject to the copyright rules of the University of Tabriz, Iran. The data used in this manuscript were collected from the scihub.copernicus.eu website and preprocessed, processed, and postprocessed using SNAP and ArcGIS software; they are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.