Wildfire Susceptibility Mapping Using Five Boosting Machine Learning Algorithms: The Case Study of the Mediterranean Region of Turkey

Forest fires caused by different environmental and human factors are responsible for the extensive destruction of natural and economic resources. Modern machine learning techniques have become popular for developing accurate susceptibility maps of various natural disasters to help reduce the occurrence of such calamities. The present study applies and tests multiple algorithms to map the areas susceptible to wildfire in the Mediterranean Region of Turkey. The performance of the XGBoost, CatBoost, Gradient Boost (GBM), AdaBoost, and LightGBM methods for wildfire susceptibility mapping is also examined. The results reveal that the CatBoost algorithm has the highest testing accuracy (95.47%), followed by the LightGBM (94.70%), XGBoost (88.8%), AdaBoost (86.0%), and GBM (84.48%) algorithms. The resultant wildfire susceptibility maps provide useful inventories for forest engineers, planners, and local governments for future policies regarding disaster management in Turkey.


Introduction
Forest fires are critical natural disasters with severe ecological, economic, and social consequences worldwide [1,2]. In recent years, a significant increase in the frequency of wildfire incidents and in the extent of affected areas has been observed, indicating the severity of the problem. According to the European Forest Fire Information System (EFFIS), an area of 831.46 km² in Turkey was affected by wildfires in 2019, nearly double that of 2018, while the figure reached 998.57 km² in 2020 [3]. These statistics show that the area affected by forest fires in Turkey is increasing with time.
Research reveals that anthropogenic factors [4,5] and climate change [6][7][8] play a critical role in the frequency of wildfire occurrence and the increase in affected areas. Wildfires are not only responsible for the mass destruction of forests but also have several adverse effects on the natural environment, such as increased erosion risk [9][10][11][12], poor water quality [9], changes in land use [13], and elimination of wildlife [14]. Nevertheless, many Mediterranean plant species have evolved some form of adaptation mechanism for survival in a fire [15]. For instance, in the Mediterranean Region of Turkey, plants like Pinus brutia have become resistant to fire regimes [16].
The findings of previous literature indicate that multiple environmental and human factors may be responsible for triggering forest fires in the Mediterranean Region [17,18]. Related studies have used data on various environmental parameters like elevation, aspect, slope, vegetation, temperature, humidity, and wind, along with human parameters like distance to roads, distance to settlements, and population, to identify the causes of wildfires [19][20][21][22]. A review of the related literature shows that no single model is used consistently for fire risk analysis. Weights and variables in modelling indices may differ between regions, as wildfires have specific characteristics worldwide [23,24]. According to the relevant literature, many different models have been used in forest fire risk analysis [25,26]. In various studies, a Geographic Information System (GIS) program is frequently used to process large datasets and produce useful fire susceptibility maps [27][28][29]. In addition, the use of satellite images in a GIS-based program for forest fire risk analysis has been shown empirically to produce more robust results [30,31].
Statistical methods such as bivariate and multivariate analysis [32,33], multiple linear regression [34,35], and logistic regression [36][37][38][39] have been widely used for forest fire modelling. In recent years, machine learning algorithms have also gained popularity in forest fire risk analysis [40][41][42][43][44]. Rodrigues and De La Riva [45] developed random forest (RF), boosted regression trees (BRT), and support vector machine (SVM) models in their study of a region covering almost the entire Spanish peninsula. They concluded that RF achieved the highest performance. Nelson et al. [46] compared CART, BRT, and RF in a study conducted in British Columbia, Canada, and found that the best performing model was BRT, followed by CART and RF. Analysis of related studies reveals that selecting an appropriate model for forest fire risk mapping is challenging, as the results of each algorithm vary from region to region [47].
Machine learning techniques have been applied and tested extensively in many empirical studies for developing susceptibility maps and accurate predictions of various natural calamities [48]. For example, Ma et al. [49] used an extreme gradient boosting (XGBoost) method for flash flood risk assessment in Yunnan Province of southwest China. The XGBoost method successfully identified the relationships between selected factors and flash flood events and outperformed the comparative LSSVM_RBF models. A study on landslide susceptibility mapping in the Ulus district of Bartın Province in the Western Black Sea Region of Turkey [50] compared four gradient boosting algorithms: gradient boosting machine (GBM), categorical boosting (CatBoost), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM). The accuracy results revealed the highest predictive capacity of the CatBoost model.
On the contrary, the RF method was found to have the lowest predictive ability compared to the ensemble methods. Wu et al. [51] created a map of landslide susceptibility using an alternating decision tree (ADTree) in Longxian County (Shaanxi Province, China). Additionally, they used GIS-based ensemble techniques that combine ADTree with bootstrap aggregating (Bagging) and adaptive boosting (AdaBoost), alongside the standalone ADTree. The outcome showed that the ADTree-AdaBoost model had the best results. For the Three Gorges Reservoir area in China, Chen et al. [52] created a landslide susceptibility map using three models: gradient boosting decision tree (GBDT), random forest (RF), and information value (InV). Among the compared models, the GBDT method showed the highest accuracy. Can et al. [53] used the extreme gradient boosting (XGBoost) method for landslide susceptibility mapping of the Atatürk Dam upper basin in Turkey. The performance of the XGBoost algorithm was found to be high on various metrics. In the Wanzhou section of the Three Gorges Reservoir area (China), a landslide susceptibility map was developed using the weighted gradient boosting decision tree (weighted GBDT) model by [54]. The logistic regression (LR) model and the gradient boosting decision tree (GBDT) model were also used for comparison in that study. The results showed that the weighted GBDT model had the highest accuracy, followed by the GBDT and LR models. However, the weighted GBDT and GBDT models produced very similar results.
Dang et al. [55] used AdaBoost, XGBoost, RF, and multilayer perceptron (MLP) machine learning algorithms to identify commercial buildings at high fire risk in the Humberside area, UK. The results revealed that AdaBoost performed better than the other algorithms. In the Yunnan Province of China, Zhou et al. [56] applied the CatBoost algorithm for the risk estimation of forest fires. The analysis was made using five forest fire risk factors, and the model predictions were seen to overlap with the actual fire points. Rosadi and Andriyani [57] compared the AdaBoost algorithm with decision tree and SVM methods to predict the occurrence of forest fires. The study explained that fuzzy c-means clustering and AdaBoost methods provided good results in predicting forest fires. Michael et al. [58] used two satellite-derived measurements (NDVIW and NDVIT) in three machine learning algorithms (LR, RF, and XGBoost) to improve fire risk mapping. The research, conducted in a region of Greece, determined that the XGBoost model produced the best results.
The review of recent studies reveals the application and performance of various algorithms for developing susceptibility maps of different natural disasters. However, most of these machine learning methods have been used in landslide susceptibility mapping. A literature gap is observed, as very few studies have applied machine learning techniques to produce wildfire susceptibility maps. In addition, this research is aligned with the United Nations' 2030 Agenda for Sustainable Development. The present study aims to apply and test various algorithms to map the areas susceptible to wildfire in the Mediterranean Region of Turkey. Susceptibility maps are essential to locate the areas prone to wildfire and to take the necessary measures to avoid mishaps in the future. The main contribution of this article is to evaluate the performance of the XGBoost, CatBoost, Gradient Boost, AdaBoost, and LightGBM methods for wildfire susceptibility mapping. To the best of our knowledge, none of the earlier studies investigated the performance of these models for wildfire susceptibility mapping. Therefore, finding the best-performing model for wildfire susceptibility maps will help improve future policymaking to reduce the risks of forest fires.

The study area is the Mediterranean Region of Turkey (Figure 1). The selected area remains under the influence of the Azores High in summer, while in winter the typical climatic characteristics of the confluence of the northern polar air mass and the southern tropical air mass dominate the region [59,60]. The area generally has a typical Mediterranean climate with hot, dry summers and mild, rainy winters. The altitude rises from sea level at the coast to approximately 3500 m. The annual temperature varies from 12°C to 20°C, the annual precipitation averages 400-1200 mm, and the relative humidity averages about 53-69 percent (http://www.mgm.gov.tr).
In addition, prevailing dry and continuous north winds are considered highly responsible for forest fires, especially during fire periods. The area has the highest number of endemic plants in Turkey due to the limestone-covered lands and climatic conditions suitable for karstification [61][62][63][64].
The selected area extends over 66,014.26 km², of which approximately 54% is covered by forests (Table 1). The area is divided into different ecological regions based on the increase in altitude from the coastline. Some leading plant associations and forests in the study area are Quercus coccifera, Olea

Historical Forest Fires.
The preparation of a historical forest fire inventory based on different sources (satellite images, fieldwork, historical archives, etc.) is the first step in modelling the susceptibility of forest fires [41,[67][68][69][70]. In this research, 3256 samples of past forest fire events were determined. The historical forest fire dataset was generated using data from NASA's Fire Information for Resource Management System (FIRMS) (https://earthdata.nasa.gov/firms) and NASA's Earth Observing System Data and Information System (EOSDIS). The dataset of forest fire events was produced from near real-time (NRT) moderate resolution imaging spectroradiometer (MODIS) thermal anomalies/fire locations with 1 km spatial resolution from the Terra and Aqua platforms. The data, known as MODIS Collection 61, are provided in the WGS84 coordinate system. The dataset covers the period from April 2021 to August 2021, as the area was declared "Disaster Areas Affecting General Life" by AFAD. According to the dataset, the majority of forest fire events occurred in Mugla in August and in Antalya in July. In contrast, almost no forest fire events occurred in Mersin in April or in Osmaniye in August, as shown in Figure 2 and Table 2.

Forest Fire Conditioning Factors.
The possibility of a wildfire occurrence depends on the environmental conditions of a forest area. Environmental conditions can be topographical, climatic, vegetation-related, and human-related. These classes are known as wildfire conditioning factors and are essential to generate a final susceptibility map [71]. Thirteen wildfire conditioning factors, including elevation, slope degree, slope aspect, Topographic Wetness Index (TWI), annual mean temperature, annual mean relative humidity, annual mean wind speed, land use, distance from water bodies, distance from residential areas, distance from roads, Normalized Difference Vegetation Index (NDVI), and land surface temperature (LST), were selected and produced in the GIS framework as a geospatial database (Figures 3-5).
The LST is a critical indicator for a wide range of research topics. It represents the interaction of the atmosphere with the surface as well as the energy flow between them. LST is commonly retrieved from thermal infrared data acquired by polar-orbiting and geostationary satellites. Various algorithms have been developed to overcome external influences and retrieve LST data with high accuracy. In this research, the LST data from MODIS (moderate resolution imaging spectroradiometer) was calculated based on the generalized split-window (GSW) algorithm of [72]:

$$T_s = a_0 + \left(a_1 + a_2\,\frac{1-\varepsilon}{\varepsilon} + a_3\,\frac{\Delta\varepsilon}{\varepsilon^2}\right)\frac{T_i + T_j}{2} + \left(a_4 + a_5\,\frac{1-\varepsilon}{\varepsilon} + a_6\,\frac{\Delta\varepsilon}{\varepsilon^2}\right)\frac{T_i - T_j}{2}$$

In the formula, $T_s$ represents LST; $\varepsilon = (\varepsilon_i + \varepsilon_j)/2$ and $\Delta\varepsilon = \varepsilon_i - \varepsilon_j$, where $\varepsilon_i$ and $\varepsilon_j$ represent the land surface emissivities in channels $i$ and $j$. $T_i$ and $T_j$ are the top of atmosphere (TOA) brightness temperatures measured in channels $i$ and $j$, with $i = 31$ and $j = 32$ for MODIS data. $a_0$ to $a_6$ are coefficients obtained from simulated data.
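As an illustration only, the split-window retrieval can be written as a small Python function. The sketch assumes the commonly cited form of the GSW algorithm, in which the mean and the difference of the two brightness temperatures are each scaled by an emissivity-dependent term. The function name `split_window_lst` and the placeholder coefficients are hypothetical, since the coefficient values are not given in the text.

```python
def split_window_lst(t_i, t_j, emis_i, emis_j, a):
    """Generalized split-window LST estimate (assumed form, see lead-in).

    t_i, t_j       : TOA brightness temperatures in channels i and j (K)
    emis_i, emis_j : land surface emissivities in channels i and j
    a              : sequence of seven coefficients a[0]..a[6]
    """
    eps = (emis_i + emis_j) / 2.0      # mean emissivity
    d_eps = emis_i - emis_j            # emissivity difference
    # Emissivity-dependent scaling of the mean brightness temperature.
    mean_term = a[1] + a[2] * (1.0 - eps) / eps + a[3] * d_eps / eps ** 2
    # Emissivity-dependent scaling of the brightness temperature difference.
    diff_term = a[4] + a[5] * (1.0 - eps) / eps + a[6] * d_eps / eps ** 2
    return a[0] + mean_term * (t_i + t_j) / 2.0 + diff_term * (t_i - t_j) / 2.0


# Placeholder coefficients (NOT real MODIS values): with a1 = 1 and all
# others 0, the estimate reduces to the mean of the two temperatures.
placeholder = (0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0)
lst_example = split_window_lst(300.0, 298.0, 1.0, 1.0, placeholder)
```

In practice the seven coefficients would come from the lookup tables of the retrieval product, not from values chosen by hand.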
All the spatial variables are represented in the WGS 1984 Mercator coordinate system. Raw datasets were obtained from various data sources, as shown in Table 3.
Advances in Civil Engineering

Initially, the digital elevation model (DEM) was acquired from ASTER (advanced spaceborne thermal emission and reflection radiometer) as the GDEM (global digital elevation model) with a 30 m spatial resolution. Topographical variables such as elevation, slope, aspect, and TWI (topographic wetness index) were generated from the DEM. The following equation was used to generate the TWI:

$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right)$$

In the equation, $A_s$ indicates the specific catchment area (m²/m) and $\beta$ is the slope angle in degrees.
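A minimal sketch of the TWI calculation for a single raster cell is given below, assuming the standard definition of the index as the natural logarithm of the specific catchment area divided by the tangent of the slope angle. The function name and the `min_slope_degrees` floor are hypothetical additions for illustration; the floor only prevents division by zero on perfectly flat cells.

```python
import math


def topographic_wetness_index(specific_catchment_area, slope_degrees,
                              min_slope_degrees=0.1):
    """Return ln(A_s / tan(beta)) for one cell.

    specific_catchment_area : upslope area per unit contour length (m^2/m)
    slope_degrees           : slope angle in degrees
    min_slope_degrees       : hypothetical floor for flat cells
    """
    beta = math.radians(max(slope_degrees, min_slope_degrees))
    return math.log(specific_catchment_area / math.tan(beta))


# Same catchment area, different slopes: gentler slopes give a higher TWI.
twi_gentle = topographic_wetness_index(500.0, 2.0)
twi_steep = topographic_wetness_index(500.0, 30.0)
```

With the same catchment area, the gentler cell receives the higher index, which matches the interpretation of TWI as a proxy for moisture accumulation.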
Long-term observations of climatic elements were obtained from MGM (Directorate General of Meteorology) for each meteorological station. The selected climatic variables (annual mean temperature, annual mean relative humidity, and annual mean wind speed) were joined to the station locations in the GIS environment. Afterwards, raster layers of the variables were produced using the IDW (inverse distance weighted) interpolation method. Vector-based data were acquired from the Turkish Ministry of Agriculture and Forestry to generate the land use, distance from water bodies, and distance from residential areas variables. The data were processed in GIS with tools such as rasterization for land use and Euclidean distance for the proximity of water reservoirs and settlements. The road network data was downloaded from OSM (OpenStreetMap) to produce distances from roads using the Euclidean distance tool in the GIS. These variables were represented at the same spatial resolution (30 m). In contrast, the variables obtained from MODIS (moderate resolution imaging spectroradiometer) had different spatial resolutions; for example, the resolutions of NDVI and LST were 250 m and 1 km, respectively.
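The IDW interpolation step can be sketched as follows. This is an illustrative pure-Python version, not the GIS tool used in the study: the function name, the station coordinates, and the power parameter of 2 are assumptions, since the text does not report the IDW settings.

```python
import math


def idw_interpolate(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations : list of (station_x, station_y, value) tuples
    power    : distance exponent (assumed to be 2; not stated in the text)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The query point coincides with a station: return its value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical stations with annual mean temperatures of 12 and 20 degrees C.
stations = [(0.0, 0.0, 12.0), (10.0, 0.0, 20.0)]
midpoint_value = idw_interpolate(stations, 5.0, 0.0)
near_first = idw_interpolate(stations, 1.0, 0.0)
```

Because the weights depend only on distance to the stations, the estimate ignores terrain, which is the limitation of interpolated climate layers noted in the Conclusion.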

Boosting Algorithms
(1) Gradient Boosting Machine (GBM). Gradient boosting machines (GBMs) are machine learning algorithms that have shown significant success in many research fields. The method was derived by formulating boosting as gradient descent, which established a statistical framework for boosting; these boosting methods and the related algorithms were named gradient boosting machines [73,74]. In GBMs, the learning procedure sequentially fits new models so that the response variable is estimated more accurately at each step. The main idea of the algorithm is to construct each new base learner so that it is maximally correlated with the negative gradient of the loss function of the whole ensemble. GBMs have a significant record of success both in practical applications and in various machine learning and data mining challenges [75].
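The sequential fitting described above can be sketched in a few lines. The example below is a simplified illustration, not the GBM implementation used in the study: it assumes a squared-error loss (for which the negative gradient equals the residual), a single input feature, and one-split decision stumps as base learners. All names and the toy data are hypothetical.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump that minimises squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the negative gradient of the squared-error loss."""
    base = sum(ys) / len(ys)               # initial constant prediction
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Toy data: the response steps from 0 to 1 between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = gradient_boost(xs, ys)
```

Each round shrinks the remaining residual by the learning rate, so the ensemble approaches the step function gradually instead of fitting it in a single step.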
(2) Extreme Gradient Boosting (XGBoost). XGBoost is a gradient boosting algorithm that combines the predictions of weak learners to obtain a stronger learner. It has been frequently used by data scientists in recent research to obtain better results [76]. In the XGBoost algorithm, CART acts as the base classifier. The input sample of each decision tree depends on the training and prediction results of the previous decision tree, so the trees are associated with each other and decide jointly. XGBoost solves both regression and classification problems and is flexible with respect to its intended use [49].

(3) Light Gradient Boosting Machine (LightGBM). LightGBM is a kind of gradient boosting decision tree. This algorithm is mainly utilised in classification, ranking, and regression. LightGBM uses a histogram-based algorithm to increase computational speed and reduce complexity. It supports algorithms such as GBM, GBDT, GBRT, and MART, and its accuracy and efficiency are high [77,78]. In LightGBM, gradient-based one-side sampling (GOSS) is used when calculating information gain, so that under-trained instances (those with large gradients) contribute more to the information gain [79]. With these aspects, LightGBM provides faster training and higher efficiency, low memory usage, good accuracy, support for GPU learning, and the capacity to process large-scale data [80].
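The idea behind GOSS can be illustrated with a short sketch. It assumes the usual description of the technique: the instances with the largest absolute gradients are always kept, a random subset of the remaining instances is drawn, and the sampled small-gradient instances are up-weighted so that the gain estimate stays approximately unbiased. The function name, the rates, and the toy gradients are hypothetical and do not reflect the settings used in the study.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {instance index: weight} for one GOSS sampling step."""
    rng = random.Random(seed)
    n = len(gradients)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top = order[:n_top]                               # always kept
    sampled = rng.sample(order[n_top:], n_other)      # random small-gradient subset
    # Compensate for under-sampling the small-gradient instances.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights


# Ten toy gradients: the first two instances are clearly under-trained.
gradients = [0.9, -0.8, 0.05, 0.02, -0.01, 0.03, 0.04, -0.02, 0.01, 0.06]
goss_weights = goss_sample(gradients)
```

Only three of the ten instances are used to evaluate splits in this toy case, which is the source of the speed-up attributed to the method.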
(4) Categorical Boosting (CatBoost). Categorical boosting (CatBoost) was developed by [81]. CatBoost is a GBDT implementation in machine learning. This algorithm has two important features: ordered target statistics and ordered boosting. CatBoost performs well on complex problems; however, it may not be the most suitable choice for problems that are not very complicated [82]. Another major feature of CatBoost is that it captures high-order dependencies by using combinations of categorical features [83].
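Ordered target statistics can be illustrated with a small sketch. It assumes the usual formulation, in which each sample's categorical value is encoded using only the targets of the samples that precede it in an ordering, smoothed by a prior. Here the given order of the list stands in for the random permutation that CatBoost draws, and the function name, prior, and toy data are hypothetical.

```python
def ordered_target_statistics(categories, targets, prior=0.5, prior_weight=1.0):
    """Encode each category using targets of earlier samples only."""
    sums = {}      # running sum of targets per category
    counts = {}    # running number of samples per category
    encoded = []
    for category, target in zip(categories, targets):
        s = sums.get(category, 0.0)
        c = counts.get(category, 0)
        # Smoothed mean of the targets seen so far for this category.
        encoded.append((s + prior_weight * prior) / (c + prior_weight))
        # Update the history only after encoding, so a sample never
        # sees its own target.
        sums[category] = s + target
        counts[category] = c + 1
    return encoded


# Toy land use classes with binary fire labels (1 = fire, 0 = nonfire).
land_use = ["forest", "forest", "urban", "forest", "urban"]
fire = [1, 1, 0, 0, 1]
encoded = ordered_target_statistics(land_use, fire)
```

Because a sample's own label never enters its encoding, this scheme avoids the target leakage that a plain per-category mean would introduce.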
(5) Adaptive Boosting (AdaBoost). Freund and Schapire developed the AdaBoost algorithm in 1995; its weights are adjusted adaptively, without requiring prior knowledge about the weak learner [84]. In 1997, Freund and Schapire extended the algorithm to solve multiclass problems with a large number of categories [85]. Because it is adaptive, AdaBoost is among the most common boosting algorithms. In addition, AdaBoost is straightforward to use and practical in solving problems. It usually gives very effective results [86].
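The adaptive weighting can be sketched for a single boosting round. The example assumes the classical binary formulation with labels in {-1, +1}; the function name and the toy predictions are hypothetical.

```python
import math


def adaboost_round(weights, labels, predictions):
    """One AdaBoost round: learner weight and updated sample weights.

    labels and predictions take values in {-1, +1}; the weighted error of
    the weak learner must lie strictly between 0 and 1.
    """
    error = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    alpha = 0.5 * math.log((1.0 - error) / error)    # weight of the weak learner
    # Misclassified samples (y * h = -1) are up-weighted, others down-weighted.
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, labels, predictions)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Four samples with uniform weights; the weak learner misclassifies the last.
labels = [1, 1, -1, -1]
predictions = [1, 1, -1, 1]
alpha, new_weights = adaboost_round([0.25] * 4, labels, predictions)
```

After the update, the misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly.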

Importance of Wildfire Conditioning Factors.
The degree of importance of the wildfire conditioning factors used in the present study is given in Figure 6. The factors had the same importance ranking in all the models. Wind speed was the most important factor in all the models, followed by humidity, temperature, LST, distance from water bodies, distance from residential areas, elevation, slope, land use, NDVI, distance from roads, TWI, and aspect, respectively. The AdaBoost model ignored the effects of some factors, namely land use, NDVI, distance from roads, TWI, and aspect, while the GBM model disregarded the impact of aspect.

Wildfire Susceptibility Models.
In this study, we evaluated the predictive performance of five machine learning algorithms (LightGBM, GBM, XGBoost, AdaBoost, and CatBoost) in wildfire susceptibility mapping. The prediction performance of all models was compared. Thirteen conditioning factors were prepared to analyze the wildfire susceptibility of the study area. Afterwards, all the factors were extracted to the sample points, which comprise the historical fire points and the nonfire points (3036). The target was encoded as binary: 0 for nonfire and 1 for historical fire samples. The models were trained with these sample points as the input dataset. The input dataset was divided into 70% for training and 30% for validation. After the analysis, maps of the results of all selected models were produced in ArcMap. The resultant wildfire susceptibility maps were classified into five classes (very low, low, medium, high, and very high) using the natural breaks classifier (Figure 7).
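The 70/30 partition of the binary-labelled sample points can be sketched as below. The text does not state whether the split was stratified or which random seed was used, so the stratification, the seed, the function name, and the toy label counts are all assumptions made for illustration.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Split sample indices into training and validation sets per class."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train_idx.extend(members[:cut])     # 70% of this class for training
        test_idx.extend(members[cut:])      # remaining 30% for validation
    return train_idx, test_idx


# Toy labels: 1 = historical fire sample, 0 = nonfire sample.
labels = [1] * 10 + [0] * 10
train_idx, test_idx = stratified_split(labels)
```

Splitting within each class keeps the fire to nonfire ratio the same in both subsets, which makes the training and testing scores easier to compare.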
The spatial distribution of the forest fire susceptibility classes according to the models is presented in Table 4 and Figure 8. The high and very high susceptibility classes together cover 25%, 11%, 23%, 78%, and 10% of the total area for the XGBoost, CatBoost, GBM, AdaBoost, and LightGBM methods, respectively.

Evaluation and Comparison of the Wildfire Susceptibility Models.
Statistical evaluation of the selected models is presented in Table 5. Accordingly, all the models show high and acceptable accuracy in both training and testing scores. The training scores are higher than the testing scores in all the models; because the testing scores nevertheless remain high, the models do not appear to suffer from severe overfitting.
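The statistical measures used to compare the models (overall accuracy, precision, recall, specificity, F1 score, and the Kappa index) can all be derived from a binary confusion matrix. The sketch below shows the standard definitions; the function name and the confusion-matrix counts are hypothetical and are not taken from Table 5.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common binary classification measures from raw counts.

    tp, fp, fn, tn : true positives, false positives,
                     false negatives, true negatives
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts for a validation set of 200 samples.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

The Kappa index corrects the overall accuracy for the agreement expected by chance, which is why it is lower than the accuracy in this example.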
According to the statistical measures, the testing scores reveal that the CatBoost algorithm performs better than the other models, followed by the LightGBM, XGBoost, AdaBoost, and GBM algorithms. The overall accuracy scores demonstrate that the CatBoost model correctly classifies the samples with 95% accuracy. The CatBoost model also achieves a precision of 0.951 and a recall of 0.954, both higher than those of the other models. The F1 score, which combines precision and recall, shows that the CatBoost model (0.952) reached the highest accuracy, followed by the LightGBM ... (Figure 9). The algorithms used in the present study have been applied in many predictive mapping studies of various natural processes because of their strong and discriminative prediction performance, as an alternative to traditional statistical and machine learning methods. The results of the present study are in line with many previous studies that compare the efficiency of different algorithms in susceptibility mapping. Sahin [88] found that the CatBoost model is superior in predicting landslide susceptibility areas in the Bolu region of Turkey. Similarly, Saber et al. [89] reported favorable results for the CatBoost and LightGBM algorithms in flash flood susceptibility mapping. Zhou et al. [56] proposed a fire prediction model using the CatBoost algorithm for Yunnan Province in China. The model, which used inputs such as vegetation, meteorological, terrain, and human factors, reached an AUC value of 0.83. The results indicated that the CatBoost model effectively predicted the risk of forest fire occurrence.
The CatBoost algorithm has not been tested extensively in previous studies on wildfire susceptibility despite its higher performance and accuracy in producing susceptibility maps. Hakim et al. [90] employed only AdaBoost and LogitBoost algorithms, while Arabameri et al. [91] tested VIKOR and Cforest models to create land subsidence susceptibility maps. Rosadi and Andriyani [57] applied the AdaBoost algorithm to predict forest fire occurrence and compared it with classical classification methods such as SVM (support vector machine) and decision tree methods. Michael et al. [58] investigated long-term vegetation condition effects on wildfire risk mapping by applying RF,

Conclusion
Wildfires are one of the most dangerous natural hazards for forest areas and habitats. Due to global warming, the frequency of forest fires has increased in the last decades, especially in the Mediterranean climate zones. Predictive susceptibility mapping for wildfires is an effective tool for planners and managers to prevent and protect against the undesirable effects of wildfires. The reliability of the susceptibility maps depends on the input parameters and the methodology used. In recent years, ML-based predictive mapping studies have been increasing rapidly and gaining the trust of researchers. In the present study, we compared state-of-the-art ML algorithms, namely XGBoost, CatBoost, GBM, AdaBoost, and LightGBM, to produce wildfire susceptibility maps for the Mediterranean Region of Turkey. To the best of our knowledge, no study has compared these algorithms before in the wildfire susceptibility mapping literature. For the analysis, thirteen input parameters were used: elevation, slope degree, slope aspect, TWI, temperature, humidity, wind speed, land use, distance from water bodies, distance from residential areas, distance from roads, NDVI, and LST. According to the order of importance of the factors, wind speed is the most crucial factor, while aspect is the least important in all the models. After producing the susceptibility maps, statistical accuracy assessment techniques such as overall accuracy, precision, recall, sensitivity, specificity, AUC, F1 score, and Kappa Index were applied. The results showed that the CatBoost algorithm had higher accuracy than the other models, followed by the LightGBM, XGBoost, AdaBoost, and GBM algorithms. Nevertheless, all the models showed reasonably good AUC performance: 0.955, 0.888, 0.859, and 0.846, respectively.

The present study has several limitations. First, the wind speed, humidity, and temperature factors were produced by spatial interpolation. The interpolation technique has disadvantages such as dependence on sample locations, generalization, and ignoring the geomorphological conditions. Therefore, some parameters may not fully reflect the actual climatic conditions of the selected area. Second, the present study does not depict human-induced factors of wildfire due to the absence of a spatial dimension in such data. Third, the present study only provides an ML-based evaluation to give some idea about wildfire-sensitive areas in the study area. Therefore, future wildfires should be monitored and examined against the maps produced in this study. Another main limitation is the tension between the large extent of the study area and the level of detail required for the modelling. As a result, explaining the factors during model development becomes more complicated due to the missing details. In addition, the high computational capacity required made it challenging to model all the available resolutions.
The present study is considered novel research in producing hotspot and wildfire susceptibility maps of Turkey's Mediterranean Region. The maps are expected to provide valuable inventories for forest engineers, planners, and local governments for future policies regarding disaster management in Turkey. The present study also provides a comparative analysis of relatively new ML algorithms, namely XGBoost, CatBoost, GBM, AdaBoost, and LightGBM, for wildfire susceptibility mapping research. Suggestions for future research include comparing the methods used in this study with other statistical and ML-based methods and using different input parameters in the models. Future research is also recommended to use cloud computing platforms such as Google Earth Engine, Google Colab, Amazon Web Services (AWS), and Kaggle for wildfire susceptibility modelling.

Data Availability
All the relevant data are included in the article.

Disclosure
This work was performed as part of the duties of the authors as well as a collaboration between the International College for Engineering and Management (ICEM) and Karabuk University (KBU).