Artificial Intelligence in Geospatial Analysis for Flood Vulnerability Assessment: A Case of Dire Dawa Watershed, Awash Basin, Ethiopia

This study presents a novel application of artificial intelligence in geospatial analysis for flood vulnerability assessment in Dire Dawa, Ethiopia. Flood-causing factors such as rainfall, slope, LULC, elevation, NDVI, TWI, SAVI, K-factor, R-factor, river distance, geomorphology, road distance, SPI, and population density were used to train the ANN model. The ANN model generated the weights used to prioritize these factors. The neural network (NN) was initialized with random weights and trained with the feedforward process. Ground-truthing points collected from the historical flood events of 2006 were used as target data during training. A rough flood hazard map generated in the feedforward pass was compared with the actual data, the errors were propagated back into the NN with the backpropagation technique, and this step was repeated until the GIS-ANN result agreed well with the historical flood events. The results overlapped with the ground-truthing points at 88.46% and 89.15% agreement during the training and validation periods, respectively. Therefore, the application of the GIS-ANN to the assessment of flood-vulnerable zones for this city and its catchment was successful. The results of this study can be considered further for practical flood management in the city and its catchment.


Introduction
A flood is a natural hazard that occurs when the rainfall intensity exceeds either the capacity of the river banks or the infiltration capacity of the soil [1,2]. This natural hazard occurs when torrential rainfall lasting a few minutes or hours causes the natural river banks to overflow. Both natural factors and human activities can drive this hazard [3]. Floods can harm both property and human lives [4]. The impacts of floods are increasing dramatically worldwide [5]. This natural hazard can affect residents of both urban and rural areas, and the magnitude of the impacts is relatively high in urban areas. According to global natural disaster reports, over 2.4 billion people were affected and about 165,020 people lost their lives due to this natural hazard between 2019 and 2020, as approximated by the United Nations (UN). In Africa, over the past two decades of the 21st century, floods have caused approximately $280 billion in economic damage. African countries such as Uganda, Burundi, Djibouti, Kenya, Rwanda, and Somalia have experienced overwhelming floods in the last few years [6][7][8]. In Ethiopia, one of the East African countries, more than 1.1 million people have been affected by flooding. Several studies predict that the severity of this natural disaster will rise significantly under climate change scenarios [9][10][11]. Currently, researchers emphasize the long-term contribution of geospatial analysis to sustainable disaster risk management. An effective flood risk management strategy aims to reduce the loss of human lives and property before a disaster occurs. Reduction and mitigation strategies require high performance and accuracy in spatial analysis. Geospatial analysis is used intensively as a significant tool for effective flood risk planning and for handling flood hazards [12,13].
Due to the absence of geospatial-based flood risk planning in most poor countries such as Ethiopia, the impacts of this natural hazard are doubling. Until recently, geographic information system (GIS) and remote sensing (RS) technologies played a vital role in planning for flood disaster risks in urban watersheds [14]. Geospatial technology has played a vital role in flood vulnerability assessment over the past few decades and has provided the best possible results for decision-making in flood risk management [15]. As flood risk is spatial in nature, the application of GIS and RS techniques is very significant to flood management strategies. Another capability of GIS is its suitability for processing spatially varied physical factors in flood risk assessment. Researchers of natural hazard management carry out flood vulnerability analysis and mapping intensively in urban catchments because it gives information on the severity of the flood. Geospatial analysis and multicriteria analysis (MCA) are integrated approaches that have gained international attention for decision-making on the complicated interrelationship between the physical, social, and economic aspects of floods. Several research studies link MCA and GIS in geospatial analysis. Flood vulnerability analysis is a function of different factors: hydrologic factors (rainfall, stream power index (SPI), and stream networks), morphometric factors (elevation, slope, landforms, and distance from river), permeability factors (soil type, topographic wetness index (TWI), soil erodibility factor (K), and rainfall erosivity factor (R)), surface dynamics (LULC, soil-adjusted vegetation index (SAVI), and normalized difference vegetation index (NDVI)), and anthropogenic influence (population density). In MCA, the significance of these factors is prioritized and weights are assigned by a method called the analytical hierarchy process (AHP).
Nowadays, the application of AHP in prioritizing flood-driving factors is losing its novelty since the application of artificial intelligence began in geospatial analysis. The artificial neural network (ANN) is a machine-learning approach that uses the processing of the human brain as a basis to develop algorithms that can model complicated relationships among spatial phenomena and assess the spatial vulnerability of floods. The ANN is an empirical modelling technique that can capture the complicated relationships between physical and nonphysical phenomena. Currently, several research studies have used the ANN as a modelling tool in hydrological modelling. Wahab and Muhamad Ludin [16] assessed flood vulnerability using the ANN model and obtained a good result. The novelty of the ANN model is that it can easily capture the complicated characteristics of both factual and value-based information that traditional geospatial techniques cannot handle. The literature shows that scholars of hydrological modelling have used the ANN model intensively for flood forecasting worldwide [17][18][19]; however, the application of this newly emerged approach is very rare in geospatial analysis. The current study area (Dire Dawa) is located in the east-central part of Ethiopia and is one of the most flood-prone cities in the country. In Ethiopia, rainfall intensity is very high for three consecutive months between June and September, and the majority of flood events were observed in one of these months. The flood event that occurred on 5th and 6th August 2006 displaced 3000 people and claimed the lives of about 200 people in the city (Dire Dawa) in one night. As reported in [20,21], the city was flooded due to torrential rainfall from the upstream highlands.
The assessment of flood vulnerability using different criteria, such as the hydrologic, morphometric, permeability, anthropogenic, and surface dynamic change characteristics of the city and its watershed, is vital because it provides information about the spatial severity of the flood. Therefore, this study aimed to present a novel combination of the ANN model and geospatial analysis to assess flood vulnerability in Dire Dawa city, Ethiopia.

Study Area.
The current study was conducted in Dire Dawa, one of the cities in Ethiopia. Dire Dawa is situated in east-central Ethiopia and is the city that links Ethiopia with Djibouti. The city is geographically located between 9°25′N and 9°45′N latitude and 41°40′E and 42°50′E longitude (Figure 1). The city is surrounded by mountainous areas. The elevation of the city ranges from 1000 to 1600 m above sea level, and the floods come from the mountains of the upper catchment. Dhangago is the highest mountain upstream of the city. The urban land of the city is divided by several streams such as Dachatu, Goro, Malka Labu, Laga Hare, and Butuji. These small rivers drain the entire city. The city is bordered in the north, east, and west by Somali Regional State and in the south and southwest by Oromia Regional State [21]. The total population of the city was estimated at 400,000 in 2007 and reached 466,000 one year later (2008). This city is known as the most flood-prone area when compared to other cities of the country. The 5th and 6th of August 2006 are known as the black days of the city, when very heavy rain caused flooding that damaged both human lives and property. According to the national flood report, over 3000 people were displaced, while about 200 people lost their lives.

Data and Sources
To achieve the objective of this study, data under five criteria, namely, hydrologic criteria, morphometric criteria, permeability criteria, surface dynamic change, and anthropogenic influences, were adopted as the basis for assessing flood vulnerability, and the details of these data are presented in Table 1.
The data used in this study were derived from the digital elevation model (DEM), river networks, soil map, Landsat images, geology, landforms, rainfall, and population number. The type, source, purpose, and spatial and temporal resolution of each dataset are provided in Table 1.

Methodology
This study employed a novel integrated ANN model and geospatial analysis approach [16][22][23][24] to assess flood vulnerability in Dire Dawa city and its catchments. The assessment of flood vulnerability is a dynamic and complicated task. The artificial neural network (ANN) is a machine-learning method used to solve complicated problems with multiple criteria by prioritizing the significance of the individual criteria. The overall procedure used in this study was as follows: selecting the flood-driving factors (criteria), preprocessing and normalizing the selected criteria, selecting the best ANN architecture and setting up the model, assigning random weight values to the selected ANN network, feedforward training, backpropagation, testing the model, and finally generating the flood vulnerability zones using the updated weights in an overlay analysis. The current study uses the ANN multilayer perceptron (ANN-MLP) architecture, which consists of three layers: the input layer (the connection between input nodes and hidden nodes), the hidden layer (the connection between hidden nodes and output nodes), and the output layer (the last node in the network). Before the input data were assigned to the input nodes, the derived input for each criterion (the third column in Table 1) was prepared and resampled to the same spatial resolution of 30 m × 30 m using a spatial analyst tool in a GIS environment. The resampled input parameters were normalized so that the value in each pixel was squashed between 0 and 1. Normalization is important because it minimizes the training time. Once the values in each pixel of the resampled criteria were normalized using equation (1), random weights were assigned to the network to start the training process.
In equation (1), X is the normalized input parameter entered into the network, and X min and X max are the minimum and maximum values of that input parameter. The raw datasets and the derived input parameters were linked to the NN. The input parameters were selected on the basis of hydrologic, morphometric, permeability, surface dynamic change, and census criteria. Thematic maps were prepared for the individual criteria and prioritized by their importance for flood vulnerability assessment. The key factors for flood vulnerability analysis derived from the existing data, namely rainfall intensity, land use/land cover (LULC), topographic wetness index (TWI), stream power index (SPI), soil classification, rainfall erosivity factor (R-factor), normalized difference vegetation index (NDVI), soil-adjusted vegetation index (SAVI), population density, areal rainfall, slope, and stream distance of the study area, are shown in Figures 2 and 3. The selected input parameters were prioritized in the ANN, as shown in Figure 3, by weighting the criteria. The final weights of the individual criteria were obtained after training the network with the backpropagation algorithm. For training, 26 points were selected randomly (covering both non-flood-prone and flood-prone areas), while 20 points of flood events (flood-prone areas) were selected for testing the vulnerability map generated in the ANN. An overlay analysis in the GIS environment was used to regenerate the flood vulnerability map until the selected points and the flood vulnerability map agreed [19,25,26].
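The normalization step described above can be sketched in a few lines. Equation (1) is not reproduced in this text, so the sketch assumes the standard min-max form, (x − X min)/(X max − X min), which is consistent with the stated 0 to 1 range; the function name `min_max_normalize` and the sample slope values are hypothetical and serve only as an illustration.

```python
def min_max_normalize(grid):
    """Rescale a 2-D grid of pixel values to the range [0, 1].

    Assumes the min-max form (x - x_min) / (x_max - x_min); the exact
    form of equation (1) is an assumption, not taken from the study.
    """
    flat = [value for row in grid for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant layer carries no contrast; map every pixel to 0.0.
        return [[0.0 for _ in row] for row in grid]
    span = hi - lo
    return [[(value - lo) / span for value in row] for row in grid]


# Hypothetical 2 x 3 slope raster in degrees (illustrative values only).
slope = [[0.0, 5.0, 10.0],
         [20.0, 15.0, 40.0]]
normalized_slope = min_max_normalize(slope)
```

Each criterion layer would be rescaled independently in this way, so that layers with large raw ranges (for example, elevation in metres) do not dominate layers with small ranges during training.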

Flood Vulnerability Assessment in the ANN.
Geospatial analysis is used intensively worldwide for flood vulnerability assessment; however, the application of machine learning (ANN) is still emerging. Up to now, very few studies have applied the artificial neural network (ANN) as a flood assessment tool. The hydrologic, morphometric, permeability, and surface dynamic change criteria and the status of the population were used as information to train the ANN model. The random weight values initially assigned to the NN were systematically trained and updated. The information retrieved from the geospatial characteristics of the criteria was reused in the training process until the flood vulnerability map derived with the GIS-ANN method agreed with the flood events represented by the ground-truthing points. The general flowchart for fixing the weights of the selected flood vulnerability assessment criteria is shown in the corresponding figure. As the flowchart shows, the geospatial input parameters were linked to the NN, and the weighted sum of the initial weights and input parameters was sent to the hidden nodes for a further process called activation (equation (2)). After the hidden nodes, the network sends the weighted sum of the activated values to the output nodes, and at this stage, a rough result is obtained. Once the rough result is obtained in the final node (output layer), the backpropagation process reduces the difference between the rough result obtained in the NN and the target values. At this stage, the pixel-to-pixel values of the roughly generated flood vulnerability map in the ANN are compared to the map generated from the flood events (ground-truthing values) recorded in the city. The NN keeps training until the ANN result and the ground-truthing values agree. The feedforward process receives the updated weights from the backpropagation, and this cycle continues until both results (GIS-ANN and ground-truthing) agree.
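The feedforward and backpropagation cycle described above can be illustrated with a minimal sketch. It assumes one hidden layer, a sigmoid activation (as named for equation (2)), a squared-error loss, and per-sample gradient descent; the study itself reports using R, so this Python version, the layer sizes, the learning rate, and the sample points are hypothetical and not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    """Sigmoid activation used in the hidden and output nodes."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, seed=0):
    """Assign small random initial weights; the last entry of each row is a bias."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w_hidden, w_out


def forward(net, x):
    """Feedforward pass: weighted sums followed by sigmoid activation."""
    w_hidden, w_out = net
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1])
              for ws in w_hidden]
    y = sigmoid(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])
    return hidden, y


def train(net, samples, targets, lr=0.5, epochs=2000):
    """Backpropagation: push the output error back and update all weights in place."""
    w_hidden, w_out = net
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            hidden, y = forward(net, x)
            # Error signal at the output node (squared error, sigmoid output).
            delta_out = (y - t) * y * (1.0 - y)
            # Hidden error signals, computed before the output weights change.
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            for j, h in enumerate(hidden):
                w_out[j] -= lr * delta_out * h
            w_out[-1] -= lr * delta_out
            for j, ws in enumerate(w_hidden):
                for i, xi in enumerate(x):
                    ws[i] -= lr * delta_hidden[j] * xi
                ws[-1] -= lr * delta_hidden[j]
    return net


def mean_squared_error(net, samples, targets):
    """Average squared difference between network output and target."""
    return sum((forward(net, x)[1] - t) ** 2
               for x, t in zip(samples, targets)) / len(samples)


# Hypothetical normalized (rainfall, slope) values at six points;
# target 1 = flooded in the historical record, 0 = not flooded.
samples = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1],
           [0.2, 0.9], [0.1, 0.8], [0.3, 0.7]]
targets = [1, 1, 1, 0, 0, 0]

net = init_network(n_in=2, n_hidden=3, seed=42)
error_before = mean_squared_error(net, samples, targets)
train(net, samples, targets)
error_after = mean_squared_error(net, samples, targets)
```

In the study, the inputs would be the normalized pixel values of all selected criteria at the ground-truthing points rather than two toy factors, and the stopping rule is agreement with the historical flood points rather than a fixed number of epochs.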
After training, the result was further evaluated and checked against other floodplain areas.
To check the performance of the GIS-ANN method in assessing flood vulnerability, the GIS-ANN result was compared with the floodplain areas identified for testing, using pie chart plots and the overlap of the result with the ground-truthing points. The pie charts plot the degree of importance of each criterion against the flood levels generated in the GIS-ANN, and the visualization is based on the percentage of points that overlap (equation (3)) between the generated flood-vulnerable areas and the flood events (points). The detailed training process implemented in this study is shown in Figure 5.
The ground-truthing points were overlaid on the flood-vulnerable areas, and the points falling inside them were counted. The ratio of the counted points to the total number of points was calculated with the intersection analyst tool in the GIS environment [27].
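The overlap ratio described above reduces to simple arithmetic. Equation (3) is not reproduced in this text, so the sketch below assumes it is the count of overlapping points divided by the total number of points, expressed as a percentage. The function name and the class labels attached to the three non-overlapping points are hypothetical; the counts (23 of 26 training points) are those reported in the results.

```python
def overlap_percentage(point_classes, target_class="very high"):
    """Percentage of ground-truthing points that fall inside the target zone."""
    if not point_classes:
        raise ValueError("at least one ground-truthing point is required")
    hits = sum(1 for zone in point_classes if zone == target_class)
    return 100.0 * hits / len(point_classes)


# 26 training points, 23 of which fall in the very high zone (as reported);
# the zone assigned to the remaining three points is an assumption.
training_points = ["very high"] * 23 + ["high"] * 3
agreement = round(overlap_percentage(training_points), 2)  # 88.46
```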

Results and Discussion
This study presents a novel application of artificial intelligence in geospatial analysis for flood vulnerability assessment in the Dire Dawa watershed, Awash basin, Ethiopia. The study used flood-causing factors grouped as hydrologic factors (rainfall, river distance from the settlement, and stream power index), morphometric factors (elevation, slope, geomorphology, and distance of roads from rivers), permeability factors (soil erodibility factor, rainfall erosivity, and topographic wetness index), surface dynamic factors (land use/land cover, normalized difference vegetation index, and soil-adjusted vegetation index), and census status (population density around the flood-prone areas) to assess flood-vulnerable areas in Dire Dawa city and its catchments. The raw datasets were processed in a GIS environment and passed to the ANN model, which prioritized (weighted) them by their importance for the assessment of flood-vulnerable areas. The weights of the individual criteria were fixed through the feedforward and backpropagation processes. The reclassified maps of flood-causing factors derived from the existing data (Figures 2 and 3) are shown in Figures 6 and 7.
The categories presented in all the reclassified maps are based on studies conducted in different parts of the world and reported in the literature [4,9,28,29]. Random weight values were assigned to the NN. The current study used the ANN multilayer perceptron (ANN-MLP) architecture shown in Figure 4. The training process started with initial random weight values assigned in R, which were activated in the hidden nodes with the sigmoid activation function (equation (2)). With the initial weights, a rough flood vulnerability map was obtained from the first feedforward pass, as shown in Figure 5, and compared to the ground-truthing points, and the error was propagated back into the NN. During training, two major activities were performed simultaneously: the NN improved the weights assigned to the individual criteria, and the flood-vulnerable (flood-prone) areas were generated from the improved weights in the GIS environment with the overlay analyst tool. The second round of training was performed in the same fashion, and the flood-prone areas were generated again. These steps were repeated until the final map, shown in Figure 5, was obtained; this map agrees best with the historical flood points. The weights generated in the ANN model were used to prioritize the individual factors. Wahab and Muhamad Ludin [16] ranked the key significant factors and assigned a percentage to each flood-prone classification.
In this study, rainfall, slope, elevation, and LULC proved the most significant factors for the assessment of flood hazard zones. Factors such as rainfall erosivity factor (R-factor), soil erodibility factor (K-factor), topographic wetness index (TWI), stream power index (SPI), normalized difference vegetation index (NDVI), soil-adjusted vegetation index (SAVI), and road distance from the settlement showed high to moderate significance, whereas the remaining flood-driving factors, geomorphology and census status, showed relatively less significance. The vulnerable areas were identified using the weights updated in the ANN. The final updated weights were used in an overlay analysis in the GIS environment. Sarkar and Mondal [4] used an overlay analysis to generate flood-vulnerable areas with the values prioritized in the AHP technique and obtained five flood hazard zones. In the present study, the flood hazard zones were first generated at the catchment level and then extracted for the city. The general flood-vulnerable zones identified at the catchment and city levels are shown in Figure 9.
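The overlay analysis with the updated weights can be pictured as a per-pixel weighted sum of the normalized criterion layers, followed by classification into five levels. The sketch below is a minimal illustration under stated assumptions: the two layers, their weights, and the equal-width class breaks are hypothetical, since the trained weights and the class thresholds are not given in this text, and the study performed this step with an overlay tool in a GIS environment rather than in code.

```python
CLASSES = ["very low", "low", "moderate", "high", "very high"]


def weighted_overlay(layers, weights):
    """Per-pixel weighted average of normalized layers of identical shape."""
    names = list(layers)
    total = sum(weights[name] for name in names)
    rows = len(layers[names[0]])
    cols = len(layers[names[0]][0])
    return [[sum(weights[name] * layers[name][r][c] for name in names) / total
             for c in range(cols)]
            for r in range(rows)]


def classify(score):
    """Map a score in [0, 1] to one of five equal-width classes (assumed breaks)."""
    index = min(int(score * 5), 4)
    return CLASSES[index]


# Hypothetical normalized 2 x 2 layers; "flatness" stands for 1 - normalized slope.
layers = {
    "rainfall": [[1.0, 0.5], [0.2, 0.0]],
    "flatness": [[1.0, 0.5], [0.1, 0.0]],
}
# Hypothetical weights standing in for those produced by the trained ANN.
weights = {"rainfall": 0.6, "flatness": 0.4}

scores = weighted_overlay(layers, weights)
zones = [[classify(score) for score in row] for row in scores]
```

Dividing by the sum of the weights keeps the overlay score in the 0 to 1 range even when the weights do not sum to one, which makes a fixed set of class breaks applicable after every training round.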
Five qualitative flood-vulnerable zones were identified in this study, as shown in Figure 9. The figure classifies the flood-prone areas by flood severity as very high (red), high (yellow), moderate, low, and very low; see also Figure 10.
The floodplain areas result from the torrential rainfall upstream of the city. As described in the topographic characteristics of the study area, the upper part of the catchment is surrounded by mountainous regions. As reported for the flood events of 2006 [21,30], the city was flooded as a result of heavy rain from the mountainous areas.
The results obtained with the GIS-ANN method were overlaid on the ground-truthing points for further evaluation. A total of 26 points were collected from the floodplains; 88.46% (23 points) and 89.15% of the points overlapped fully with the generated flood hazard zones classified as very high flood level during the training and validation periods, respectively.

Conclusion
This study presented a novel application of artificial intelligence in geospatial analysis for flood vulnerability assessment in the Dire Dawa watershed, Awash basin, Ethiopia. To achieve the objective of the study, five major criteria, namely, hydrologic criteria (rainfall, stream power index (SPI), and river Euclidean distance from settlement), morphometric criteria (elevation, road distance from the floodplains, slope, and geomorphology), soil permeability criteria (soil erodibility factor (K-factor), topographic wetness index (TWI), and rainfall erosivity factor (R-factor)), surface dynamic change criteria (LULC, normalized difference vegetation index (NDVI), and soil-adjusted vegetation index (SAVI)), and census status (population density), were adopted as flood-causing factors to train the ANN model. The raw datasets of the selected flood-causing factors were preprocessed and resampled to a 30 m × 30 m spatial resolution with a spatial analyst tool in a GIS environment. The significance of the selected flood-causing factors was prioritized in the ANN model with the feedforward and backpropagation training processes. The neural network (NN) was initialized with random weight values before the feedforward training started. With the initial weights and the feedforward pass, rough flood hazard zones were prepared, and at this stage, the error was very high. The backpropagation process then targeted the ground-truthing points collected from the historical flood events, and the process was repeated until the GIS-ANN result agreed with the actual data. Accordingly, flood-vulnerable zones classified under five flood levels, namely, very low, low, moderate, high, and very high, were identified. The performance of the GIS-ANN results was further evaluated by overlaying the ground-truthing points, and about 88.46% agreement was obtained.
Among the flood-causing criteria selected for the assessment of flood-vulnerable zones, LULC, rainfall, elevation, and slope are the most important, which is due to the topographic condition of the city. Therefore, the application of integrated artificial intelligence and geospatial analysis to the assessment of flood-vulnerable zones was successful.

Data Availability
All data generated during the manuscript analysis are included within the article. Furthermore, datasets are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
The corresponding author initiated the research idea, reviewed the relevant literature, designed the methods, collected the field data, took part in data cleaning, analyzed and interpreted the data, and prepared the draft manuscript for publication. The coauthor evaluated the research idea, supervised all research activities, and developed the manuscript. All authors read and approved the final manuscript.