Innovative Landslide Susceptibility Mapping Portrayed by CA-AQD and K-Means Clustering Algorithms

This study proposes and designs an improved clustering algorithm for assessing landslide susceptibility that integrates the Chameleon algorithm with an adaptive quadratic distance (the CA-AQD algorithm). It aims to improve the prediction capacity of clustering algorithms in landslide susceptibility modelling by overcoming limitations of existing clustering models, including strong dependence on the initial partition, sensitivity to noise and outliers, and difficulty in quantifying triggering factors (such as rainfall). The model was implemented in Baota District, Shaanxi Province, China. The CA-AQD algorithm was used to split all grids in the study area into many groups with similar characteristic values, using the AQD to quantify the uncertain (rainfall) values efficiently. The K-means algorithm then divided these groups into five susceptibility classes according to the landslide density of each group. The model was evaluated using statistical metrics, and its performance was validated and compared with that of the traditional Chameleon algorithm and the KPSO algorithm. The results show that the CA-AQD algorithm attained the best performance in assessing landslide susceptibility in the study area. Thus, this work adds to the literature by introducing the first empirical integration and application of the CA-AQD algorithm to landslide assessment in the study area, which offers a new insight to the field. The method can also support landslide management for better social and economic development.


Introduction
Landslides are among the world's most threatening natural hazards; a landslide is the movement of a mass of rock, debris, or soil down a steep slope [1]. The factors that determine the probability of a landslide may be classified into two groups: (1) intrinsic causes that accelerate slope failure, such as geological and morphological properties, and (2) extrinsic causes that change the slope from a marginally stable state to an unstable state, such as rainfall, earthquake shaking, and human activity [2].
In China, according to related reports [3], nearly all landslides are directly triggered by or related to rainfall. Being regional, clustered, abrupt, and disastrous, these rainfall-triggered landslides pose tremendous threats to people's lives and to economic activities [4]. To reduce the likelihood of damage caused by landslides, it is important to develop accurate and effective methods for mapping landslide susceptibility, both to ensure a safe environment and stable economic activity and to support reliable hazard prevention and reduction [5].
With the technological advancement of remote-sensing (RS) imagery and geographic information system (GIS) data processing, it has become easier to obtain the information needed to analyze landslide susceptibility. Considerable effort has been devoted to assessing landslide susceptibility using RS and GIS technology. Initially, specialists created susceptibility maps to provide an inventory of landslides using qualitative overlays of topographical and geological attributes [6]. Later, more landslide susceptibility assessment methods were developed for specific areas by applying deterministic approaches, statistical analyses, and computational intelligence methods [7,8]. Because appropriate soil and rock engineering data, slope geometry, discontinuity features, and hydrological factors are required to compute the relationship between resisting and driving forces, deterministic models have been limited to small study areas [9,10]. Statistical models, such as linear and logistic regression [11][12][13], bivariate statistical models [14][15][16], frequency ratio [17][18][19][20][21], and weight of evidence models [22][23][24], have been applied widely to construct assessment models for landslide susceptibility. These models, however, cannot easily determine the relationship between significant landslide-influencing factors and complicated landslide systems [25].
Landslide prediction models based on classification algorithms from data mining can overcome such difficulties. Specifically, seeking learning methods that can capture the nonlinear relationship between landslides and environmental factors [26], many researchers have successfully adopted such algorithms, for example, support vector machine [27][28][29], decision tree [30][31][32][33][34][35], naïve Bayesian [36,37], artificial neural networks [38][39][40][41], random forest models [42,43], and others, to construct landslide susceptibility maps. These models, however, depend on a large training data set to improve prediction accuracy. Building a training data set requires geoscientists or engineers to survey landslide sites, which in practice is not easy, particularly when rainfall information must be captured.
Clustering analysis algorithms classify sets of objects (grids) into groups whose members are more similar to each other than they are to objects in other clusters (groups) [44]. This process is conducted by primitive observation with little or no prior knowledge; that is, it is unsupervised learning. Because of this advantage, several researchers have applied K-means [45,46], fuzzy C-means (FCM) [47], and K-means particle swarm optimization (KPSO) [47,48] to assess landslide susceptibility. The K-means and FCM algorithms can be effective if the initial partitions in the prediction model are chosen correctly [49]. In practice, these parameter thresholds (each cluster center) are not easy to set for large data sets, that is, for large study areas [50]. KPSO can break away from dependence on the initial partitions by using iterations to identify the best cluster partitions, but it is sensitive to clusters of diverse shapes, densities, and sizes and to outliers and noise [51], which restricts the advantages of using KPSO to assess landslide susceptibility in large study areas. Fortunately, the Chameleon algorithm [51] is free from dependence on the initial partition and is insensitive to noise and outliers, because it merges clusters using a dynamic model to find natural and homogeneous clusters.
The Chameleon algorithm, however, treats rainfall as an average value or a discrete value, which distorts the value [33] and influences the clustering results. The advantage of the adaptive quadratic distance (AQD) [52] is that it changes at every iteration of the algorithm and may either be the same for all clusters or differ from one cluster to another; thus, we integrated the AQD and the Chameleon algorithm to construct a spatial prediction model that can be applied to any study area and that classifies all objects into groups with similar topography and geology. The landslide density of each group can then be calculated with the sorting tool in ArcGIS. Finally, the K-means algorithm [53] was adopted to assign these groups to five susceptibility classes (very high, high, moderate, low, and very low) according to the landslide density of each group. The algorithms are applied to landslide susceptibility mapping in the study area of Baota District, China. More details of this study are described in the following sections.

Study Area.
The study area is part of the Loess Plateau, located in the northern part of Shaanxi Province between latitudes 36°11′ and 37°02′N and longitudes 109°14′ and 110°07′E. It covers 3,556 km² and is prone to landslides that are usually triggered by rainfall. The Yanhe River bounds the study area to the north, and the Fenchuan River extends to the south (Figure 1). The geomorphology of the study area is characterized by undulating slopes and ravines, with elevations ranging between 800 and 1,800 m. The annual mean temperature is 10°C. Historical data show that the highest rainfall, which occurs between June and October, varies from 114 mm to 460 mm, with an annual average of 550 mm. The landslide survey data indicate that 71.4% of the total rainfall and 84.6% of the landslides occur in this area between June and October. Figure 2 depicts how landslides relate to rainfall in the study area [54].

Landslide Inventory Map.
The landslide inventory is an important factor in landslide susceptibility mapping. The Xi'an Center for Geological Survey (CGS) has carried out thorough landslide surveys in Baota District, which include interpretation of SPOT-5 satellite images for the whole area and of QuickBird satellite images covering 225 km² of the urban area. CGS constructed a landslide inventory of the study area using aerial photos. In the study area, landslides were identified and analyzed at 1,081 locations, and 428 landslides were surveyed. Most of the landslides were triggered by rain, and 293 landslides (see the right-hand side of Figure 1) had recorded precipitation information. Most of the landslides were scattered along the sides of the Yanhe and Fenchuan Rivers. Nearly the entire area was covered by Quaternary loess and thick, loose loess deposits; thus, almost all the landslides were soil landslides. Most of the landslides were medium-scale landslides with a sliding body volume between 10¹ × 10⁴ and 10² × 10⁴ m³.
Small-scale landslides, with a sliding body volume of less than 10¹ × 10⁴ m³, accounted for 30.7% of the landslides. Large-scale landslides had a sliding body volume between 10² × 10⁴ and 10³ × 10⁴ m³ and accounted for only 16.7% of the landslides [54]. The distribution of landslides is shown in Table 1.

Data Preparation.
Previous studies [54] have classified landslide conditioning factors in the study area into four groups: topography, geology, underlying surface, and triggering factor.

Advances in Civil Engineering 3
Topography factors, which depict the geomorphologic and topographic character of the study area [5], comprise elevation, slope angle, slope aspect, and profile curvature. We derived these data layers (Figures 3(a)-3(d)) from a digital elevation model (DEM) with a resolution of 25 m, which was constructed from topographic maps at a scale of 1 : 50,000. By computing and analyzing these data layers, previous research [54] showed that the stability of slope angle and elevation ranged from 25° to 55° and from 20 m to 120 m, respectively. Landslides were more likely to occur along shaded slopes. The geological data layer was obtained by digitizing a geological map at a scale of 1 : 50,000 (Figure 4) provided by CGS. The geological layer includes Jurassic, Triassic, Neogene, and Quaternary strata, of which the Quaternary loess and the Neogene red clay are prone to landslides [54]. Because of the thick, loose loess deposits in Baota District, landslides and mudflows occur frequently. Therefore, we selected the rock-soil structure to evaluate the geologic condition.
Because vegetation cover was considered an underlying-surface factor in the previous study [54], the vegetation cover data layer (Figure 3(e)) was extracted from Enhanced Thematic Mapper Plus (ETM+) RS images. Vegetation coverage was more than 60% in the southern area, where, according to a field survey, landslides were scarce; conversely, poor vegetation and extensive landslides were found in the north [54].
Rain penetrates rock and soil and erodes them along fractures, so the average rainfall reflects its erosive capability. At 19 rainfall stations in the study area, CGS recorded the maximal and minimal rainfall for every month. The rainfall map (Figure 3(f)) was constructed from the maximal average monthly rainfall in July during 2017 to 2018. The rainfall classes are defined in Table 2.

Chameleon Algorithm.
Chameleon is a clustering algorithm that uses dynamic modelling in hierarchical clustering. In its clustering process, two clusters are merged if the interconnectivity and closeness (proximity) between them are high relative to the internal interconnectivity and closeness of the objects within each cluster. A merging process based on a dynamic model facilitates the discovery of natural and homogeneous clusters and applies to all types of data as long as a similarity function is specified [51]. The Chameleon algorithm has the advantages of not depending on a static, user-supplied model, of adapting to the internal characteristics of the clusters, of being independent of the initial partitions, and of being insensitive to noise and outliers [44]; thus, we used the Chameleon algorithm to assess landslide susceptibility. In general, for landslide susceptibility assessment using a clustering algorithm, an object is a grid. In the Chameleon algorithm, an object is a node of a weighted graph. The main concept of the Chameleon algorithm is presented in Figure 5. First, the algorithm applies a graph partitioning algorithm to cluster the data nodes into a large number of relatively small subgroups. Then, it uses an agglomerative hierarchical clustering algorithm to identify the genuine clusters by merging the subgroups in an iterative process. In discovering the pairs of most similar subgroups, it takes into account the internal characteristics of the groups themselves (both the interconnectivity and the closeness of the clusters). Thus, the algorithm does not depend on a static, user-supplied model and can automatically adapt to the internal characteristics of the merged groups.
Definition 1. Suppose $i$ and $j$ are nodes in a weighted graph with $n$-dimensional data; then, the Euclidean distance between them is
$$d(i, j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^{2}}.$$
[...] that is used to define modularity to evaluate clustering results.
Definition 4. If two nodes have exactly the same adjacent nodes, then the two nodes are called structural equivalents. The equivalence similarity of the structure is calculated as follows: [...]
The main processes of the Chameleon algorithm are as follows:
(1) Build a weighted graph in which the number of initialized clusters is n; that is, each node is a cluster
(2) [...]
[...] only in a range of $160,000-200,000. Similarly, a rainfall value over one or more days cannot be determined as a specific value and can be determined only as a range, say, of 10 mm to 20 mm. In an uncertain data model (interval-valued data), each uncertain datum is stored in a two-dimensional array holding the lower and upper bounds of the interval.
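The interval representation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the rainfall values and the helper names `midpoint` and `width` are assumptions made for the example. It stores each uncertain rainfall value as a [lower, upper] row of a two-dimensional array and shows what is lost when an interval is collapsed to a single number.

```python
import numpy as np

# Hypothetical rainfall intervals (mm) for three grids; each row is [lower, upper].
rainfall_intervals = np.array([
    [10.0, 20.0],
    [14.0, 16.0],
    [40.0, 55.0],
])

def midpoint(intervals):
    """Single value that an average-based (crisp) model would keep per interval."""
    intervals = np.asarray(intervals, dtype=float)
    return intervals.mean(axis=-1)

def width(intervals):
    """Upper bound minus lower bound: the spread discarded by the midpoint."""
    intervals = np.asarray(intervals, dtype=float)
    return intervals[..., 1] - intervals[..., 0]

mids = midpoint(rainfall_intervals)
widths = width(rainfall_intervals)
```

In this made-up data the first two grids share the same midpoint but have very different widths, so a distance computed on midpoints alone would treat them as identical.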

CA-AQD Algorithm.
In the Chameleon algorithm, the similarity of two nodes depends on the Euclidean distance between them. The traditional Euclidean distance formula can process only nodes whose values are continuous or discrete [50]. In the rainfall-induced regional landslide hazard assessment model, the data types of the node attributes (landslide conditioning factors) include discrete (slope aspect), continuous (slope height), and uncertain (rainfall) values [37]. The traditional Euclidean distance formula, however, cannot describe nodes with uncertain data (rainfall values). To remedy this weak point, the AQD between two nodes x and y is calculated in this paper.
The full formula can be found in the literature [52]. Its basic definition follows.
We assume that the prototype of cluster $P_k$ ($k = 1, \ldots, K$) can be represented as a vector of intervals $y_k = (y_k^1, y_k^2, \ldots, y_k^p)$, and that a set of such interval vectors can be arranged as a two-dimensional matrix. For example, let $\Omega$ be a set of $n$ objects $1, \ldots, n$, where each object $i$ is described by a vector of intervals $x_i = (x_i^1, x_i^2, \ldots, x_i^p)$. We use $x_{iL}$ and $x_{iU}$ to denote the vectors of the lower and upper boundaries, respectively, of the intervals describing $x_i$, and $y_{kL}$ and $y_{kU}$ to denote the vectors of the lower and upper boundaries, respectively, of the intervals of $y_k$. In this way the prototype of $P_k$ is also described. The adaptive quadratic distance is as follows:
$$d(x_i, y_k) = (x_{iL} - y_{kL})^{T} M (x_{iL} - y_{kL}) + (x_{iU} - y_{kU})^{T} M (x_{iU} - y_{kU}), \quad (3)$$
where $M$ is a full positive definite symmetric matrix. Equation (3) can be used to calculate the distance between two nodes whose values are uncertain. We replaced the Euclidean distance formula with equation (3) to obtain the weight and the structural equivalence similarity between nodes in the Chameleon algorithm; the result is called the Chameleon adopted adaptive quadratic distance (CA-AQD) algorithm.
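A quadratic distance on interval bounds can be sketched in a few lines of NumPy. The sketch assumes the distance is the sum of two quadratic forms, one on the lower-bound vectors and one on the upper-bound vectors, with a shared positive definite matrix `M`. The function name `aqd_distance`, the identity default for `M`, and the sample values are illustrative assumptions; the adaptive update of `M` described in [52] is not shown.

```python
import numpy as np

def aqd_distance(x_lower, x_upper, y_lower, y_upper, M=None):
    """Quadratic distance between two interval-valued vectors.

    Computes (xL - yL)^T M (xL - yL) + (xU - yU)^T M (xU - yU).
    M defaults to the identity (an assumption for illustration only).
    """
    x_lower = np.asarray(x_lower, dtype=float)
    x_upper = np.asarray(x_upper, dtype=float)
    y_lower = np.asarray(y_lower, dtype=float)
    y_upper = np.asarray(y_upper, dtype=float)
    if M is None:
        M = np.eye(len(x_lower))
    M = np.asarray(M, dtype=float)
    diff_lower = x_lower - y_lower
    diff_upper = x_upper - y_upper
    return float(diff_lower @ M @ diff_lower + diff_upper @ M @ diff_upper)

# Two hypothetical grids with attributes (rainfall in mm, normalized slope).
# Rainfall is an interval; slope is certain, so its lower and upper bounds coincide.
d_identity = aqd_distance([10.0, 0.5], [20.0, 0.5], [15.0, 0.5], [30.0, 0.5])
d_weighted = aqd_distance([10.0, 0.5], [20.0, 0.5], [15.0, 0.5], [30.0, 0.5],
                          M=np.diag([2.0, 1.0]))
```

A certain (non-interval) attribute is represented by equal lower and upper bounds, so with the identity matrix the distance between two fully certain vectors reduces to twice their squared Euclidean distance.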

K-Means Algorithm.
The K-means algorithm performs well when the number of clusters k is supplied and the initial cluster centers are chosen randomly in advance; thus, we adopted it to classify the landslide density of each group into five clusters (susceptibility classes). Its main steps are as follows:
(1) Arbitrarily set k objects as the initial cluster centers
(2) Assign every object to the cluster whose center is closest to it
(3) Calculate the new cluster centers and update them
(4) Repeat steps 2 and 3 until the cluster centers almost stop changing
Following the above steps, with the landslide density of each group as an object, the K-means algorithm classifies these objects into k clusters, where each cluster is regarded as a susceptibility class.
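The four steps above can be sketched for one-dimensional data such as landslide density. This is a minimal illustration under stated assumptions, not the authors' code: the function name `kmeans_1d`, the optional `init` argument (added so the example is reproducible), the ranking of clusters by center, and the density values are all hypothetical.

```python
import numpy as np

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]

def kmeans_1d(values, k=5, init=None, n_iter=100, seed=0):
    """Cluster scalar values into k groups; class 0 has the lowest center."""
    values = np.asarray(values, dtype=float)
    if init is None:
        # Step 1: arbitrarily pick k objects as the initial centers.
        rng = np.random.default_rng(seed)
        centers = rng.choice(values, size=k, replace=False)
    else:
        centers = np.asarray(init, dtype=float)
    for _ in range(n_iter):
        # Step 2: assign every object to its closest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Step 3: recompute each center as the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            values[labels == c].mean() if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        # Step 4: stop once the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Relabel clusters so that a higher label means a higher density.
    order = np.argsort(centers)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)
    return rank[labels], centers[order]

# Hypothetical landslide densities for ten groups.
densities = [0.0, 0.1, 1.0, 1.1, 5.0, 5.2, 9.0, 9.1, 20.0, 21.0]
labels, centers = kmeans_1d(densities, k=5, init=[0, 1, 5, 9, 20])
classes = [CLASS_NAMES[i] for i in labels]
```

Sorting the final centers before labelling is a design choice that lets the cluster index be read directly as a susceptibility rank, following the principle that higher landslide density means higher susceptibility.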

Model Performance Evaluation.
To evaluate the performance of this study and compare it with other studies, common statistical measures, namely the Cohen kappa index (k), sensitivity, specificity, accuracy, and F1-measure, were used. The Cohen kappa index (k) [55] was used to analyze and compare the reliability of the classification results of the landslide susceptibility models. The value of k normally lies between 0 and 1: a value close to 1 implies agreement between the landslide model and the actual data, while a value close to 0 implies disagreement [56]. A negative value of k implies low agreement. A value of k from 0.80 to 1 implies almost complete agreement, while a value between 0.60 and 0.80 implies substantial agreement. A value from 0.40 to 0.60 indicates moderate agreement, whilst values from 0.20 to 0.40 and from 0 to 0.20 indicate fair and slight agreement, respectively.
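The measures named above can be computed from a 2 × 2 confusion matrix. The sketch below uses the standard textbook definitions in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN); the function name is hypothetical. The example counts are an inference and not data reported in the source: they were chosen so that 293 landslide and 213 nonlandslide points give a sensitivity, specificity, accuracy, and kappa that round to the values reported for CA-AQD.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)          # share of landslides correctly found
    specificity = tn / (tn + fp)          # share of nonlandslides correctly found
    accuracy = (tp + tn) / n
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_obs = accuracy
    p_exp = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "kappa": kappa,
    }

# Back-calculated (assumed) counts: 293 landslides, 213 nonlandslides.
m = classification_metrics(tp=268, tn=200, fp=13, fn=25)
```

Because kappa subtracts the agreement expected by chance, it is lower than the raw accuracy for the same counts, which is why both are reported together.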

Moreover, the receiver operating characteristic (ROC) curve was adopted; it is a popular metric for measuring the overall performance of a landslide susceptibility model. The ROC curve plots the proportion of zones wrongly identified as landslide-prone (1 − specificity) on the x-axis against the proportion of landslide zones correctly identified as landslide-prone (sensitivity) on the y-axis [45]. The area under the ROC curve (AUC) measures model performance and quality: an AUC value of 1 implies an excellent model, a value between 0.8 and 0.9 implies a very good model, and a value of 0.6-0.7 implies average performance. A poor-quality model has an AUC value between 0.5 and 0.6.
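The AUC can be computed without drawing the curve, using the fact that it equals the probability that a randomly chosen landslide point receives a higher susceptibility score than a randomly chosen nonlandslide point (ties count as one half). The following sketch uses that pairwise definition; the function name and the scores are illustrative assumptions.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for landslide points, 0 for nonlandslide points.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # landslide ranked above nonlandslide
            elif p == q:
                wins += 0.5      # a tie counts as half
    return wins / (len(positives) * len(negatives))

# Hypothetical susceptibility scores for five validation points.
auc_example = roc_auc([0.9, 0.8, 0.7, 0.3, 0.2], [1, 1, 0, 1, 0])
auc_perfect = roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

The pairwise form is quadratic in the number of points, which is acceptable for a few hundred validation points; a rank-based implementation would scale better for larger sets.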

Models Comparison.
To assess the proposed landslide susceptibility model, its performance was compared with that of the Chameleon and KPSO (from the literature [47]) algorithms. The KPSO clustering algorithm was chosen because of its remarkable computational efficiency and ease of implementation. Because they depend on initial parameter thresholds, the K-means and FCM algorithms were not taken into consideration. Furthermore, the uncertain decision tree (DTU) [33] and uncertain naïve Bayesian (NBU) [37] classification algorithms from the literature were compared against the proposed model in terms of quantifying the value of rainfall and attaining better prediction accuracy. State-of-the-art benchmark models such as SVM, ANN, and RF [5] were also chosen to check the ability of the proposed model.

Results
We used the CA-AQD algorithm outlined in this study to construct a landslide susceptibility map and to validate and compare its performance with that of other methods. The study workflow is shown in Figure 6. The process includes four phases: data collection, clustering analysis, describing the landslide susceptibility map, and model validation. To collect data, we extracted the values of the landslide conditioning factors from thematic maps in ArcGIS. To conduct the cluster analysis, we partitioned all grids into many groups with similar geology and geomorphology characteristics. To describe the landslide susceptibility map, we determined the landslide susceptibility class of each group according to the K-means algorithm or according to the characteristics of the landslide conditioning factors. To validate the model, we validated the performance of the proposed method and compared it with that of the other methods.

Data Collection.
After the landslide conditioning factors were classified, the study regions were identified as polygons, which were converted into a raster map. The map featured 25 × 25 m grid spacing, and the study area had 5,672,922 grids, which included landslides and nonlandslides. The values for each grid were extracted from the thematic maps of the indicator variables (which were treated as attributes of the grids). The attributes of each grid included discrete, continuous, and uncertain data types, depending on the algorithm, as shown in Table 2.

Clustering Analysis.
We used the CA-AQD algorithm to classify all grids into groups with similar geology and geomorphology. The main process was as follows.
Compared with the K-means and FCM algorithms, CA-AQD does not need initial parameter values to be set. It simply calculates the distance between two grids using the AQD, which accommodates values expressed as ranges. Thus, we imported the normalized attribute values of each grid into the CA-AQD algorithm and divided the 5,672,922 grids in the study area into 465 groups for July, without manually setting the initial cluster parameters. Some of the groups produced by the CA-AQD algorithm for July are shown in Figure 7.
As shown in Figure 7, the grids of the CA-AQD groups were distributed intricately within the study area. For example, the red grids of the second group were mainly located along the side of the Yanhe River and sporadically scattered along the sides of the Fenchuan River. Additionally, the left side of Figure 7 shows the attribute values for all grids of the first and second groups. As Figure 7 shows, the geological and geomorphological characteristics, as indicated by the attribute values, differed between the first and second groups but were very similar within each group. For example, in Figure 7(a), the attribute values in the first group were almost the same, but the grid IDs were different, which means that grids of the same group were distributed intricately within the study area.
These results showed that the CA-AQD clustering algorithm could be used to effectively divide the spatial grids in the study area into groups.

Describing Landslide Susceptibility Map.
The CA-AQD algorithm placed grids in the study area that had similar geological and geomorphic environments in the same group. Because the grids in one group were similar in all attributes, we used the mean value of each attribute to reflect the eigenvalues of all attributes for each cluster in the study area, as shown in the left-hand column of Table 3. However, the susceptibility classes cannot be labeled for each group from Table 3 alone. Based on the general principle that the higher the landslide density is, the higher the susceptibility ought to be, we used the K-means algorithm to solve this problem. The steps are as follows. First, the landslide density of each group, which is the only attribute of the objects in K-means, is calculated. In the experiment, we interpreted SPOT-5 satellite RS images for 1,081 locations to compute the landslide density of the entire study area and of each group using the sorting tool in ArcGIS. The landslide density of some groups is shown in the right-hand columns of Table 3. Next, the number of clusters is set to five landslide susceptibility classes based on the landslide susceptibility analyses in the study area. Initial cluster centers were chosen randomly within the range of landslide density over all groups. Finally, the above parameters were input to the K-means algorithm, and the landslide susceptibility classes of all groups are shown in Figure 8.
When a group has a landslide density of 0, its susceptibility level may be considered very low; however, if the landslide observation points simply do not cover the group, its true susceptibility level might be very high. To resolve this ambiguity, experts ought to use the eigenvalues of the group (the left-hand column of Table 3) to identify its susceptibility level.
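The per-group landslide density and the zero-density case discussed above can be sketched as follows. The source computes density with the sorting tool in ArcGIS and does not give a formula, so defining density as landslide points per square kilometre of group area is an assumption here, as are the function name and the toy data; only the 25 m × 25 m grid size comes from the text.

```python
from collections import Counter

GRID_AREA_KM2 = 25 * 25 / 1_000_000  # one 25 m x 25 m grid cell, in km^2

def landslide_density(group_of_grid, landslide_grid_ids):
    """Landslide points per km^2 for every group.

    group_of_grid: mapping grid id -> group id (the clustering result).
    landslide_grid_ids: grid ids that contain a mapped landslide.
    """
    grids_per_group = Counter(group_of_grid.values())
    slides_per_group = Counter(group_of_grid[g] for g in landslide_grid_ids)
    return {
        group: slides_per_group.get(group, 0) / (n_grids * GRID_AREA_KM2)
        for group, n_grids in grids_per_group.items()
    }

# Toy clustering result: six grids in two groups, one mapped landslide.
group_of_grid = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
density = landslide_density(group_of_grid, landslide_grid_ids=[1])

# Groups with zero density need expert review of their attribute means,
# because no observed landslide is not the same as low susceptibility.
needs_review = sorted(g for g, d in density.items() if d == 0)
```

Flagging zero-density groups separately, rather than passing them straight to K-means, keeps the ambiguous case visible for the expert judgement the text calls for.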
To evaluate the influence of rainfall on landslides, we obtained rainfall data from the rainfall maps for April and July, which are characterized by scattered rain and heavy rain, respectively. According to the susceptibility class of each group, the susceptibility assessment maps for April and July based on the CA-AQD algorithm were drawn, as shown in Figure 9. Figure 9 demonstrates that landslides occurred more frequently in July than in April, which was almost entirely consistent with the actual situation reported in the previous study [54]. The majority of the landslides occurred as a result of rainwater infiltration, because precipitation increased the pore-water pressure and the weight of the loess slope.

Model Validation.
As most landslides occurred from June to October, we validated the prediction model using precipitation data from July. This study compared the CA-AQD model with the Chameleon and KPSO models to test whether the CA-AQD model outperformed them. To compute the Cohen kappa index and the performance accuracy, 293 landslides and 213 nonlandslides were randomly selected for model validation.
The Cohen kappa indices (k) for the CA-AQD and Chameleon algorithms were greater than 0.8, which demonstrated that the two models were in nearly complete agreement with the field survey. The prediction accuracy values of the CA-AQD, Chameleon, and KPSO algorithms were 0.9249, 0.9110, and 0.6621, respectively, whereas the Cohen kappa indices were 0.8471, 0.8192, and 0.3161, respectively (Table 3). Thus, the CA-AQD model, which applied the AQD to quantify precipitation in order to improve prediction accuracy, had the highest accuracy of the three models. As shown in Table 3, among the CA-AQD, Chameleon, and KPSO models, the CA-AQD model obtained the highest sensitivity, specificity, and F1-measure, of 0.9147, 0.9390, and 0.9341, respectively. Furthermore, as Figure 10 shows, the CA-AQD model had the highest AUC value of the three models, 0.884. The AUC values of the CA-AQD and Chameleon models were both close to 1, which implies that the prediction capability of these two models was good, on the principle that the overall accuracy of a landslide prediction model increases as the AUC value moves closer to 1 [34].
Moreover, to compare the performance of landslide prediction models based on classification (supervised) and clustering analysis (unsupervised) algorithms, the accuracy measure was used to validate the DTU, NBU, SVM, ANN, RF, and CA-AQD algorithms, as shown in Figure 11. Landslide susceptibility maps based on the DTU and NBU algorithms can be found in the literature [33,37], in which 506 points (landslides) were used to develop and evaluate the DTU and NBU prediction models. Thus, we divided the 506 points from the literature into training and test data sets. We first used 101 points (20% of the data) as the training data set and 405 points (80% of the data) to evaluate the accuracy of the prediction models. We then repeatedly moved 10% of the data from the test data set to the training data set until the training data set accounted for 80% of the data (405 points). SVM, ANN, and RF used the same training and test data sets.
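The sequence of training and test set sizes described above can be written out explicitly. This is a small sketch under an assumption: the source gives only the end points (101/405 and 405/101), so rounding each size to the nearest whole point and the function name are illustrative choices.

```python
def split_schedule(n_points=506, start_pct=20, stop_pct=80, step_pct=10):
    """List (training %, training size, test size) for each experiment round."""
    schedule = []
    for pct in range(start_pct, stop_pct + 1, step_pct):
        n_train = round(n_points * pct / 100)  # nearest whole point (assumed)
        schedule.append((pct, n_train, n_points - n_train))
    return schedule

schedule = split_schedule()
for pct, n_train, n_test in schedule:
    print(f"{pct}% training: {n_train} train / {n_test} test")
```

With 506 points this yields seven rounds, from 20% to 80% training data, and the first and last rounds match the 101-point and 405-point training sets mentioned in the text.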
As shown in Figure 11, the larger the training data set, the higher the accuracy and the Cohen kappa index (k) of the DTU model, which was followed by the CA-AQD model and the NBU model. As the percentage of total points used for training increased, the k value and accuracy of the DTU and NBU algorithms also increased; in particular, these values were quite low when only a few points were used, which indicated a strong dependence on the landslide sample data set to obtain higher prediction accuracy. As the percentage of total points increased, the k value and accuracy of CA-AQD remained almost the same, which indicated that it does not depend on the training data set to reach a high prediction accuracy.

Discussion
Landslides are very complicated processes that are constrained by various topographical and environmental factors. A landslide susceptibility model is therefore an important visual means of determining landslide-prone areas. The primary aim of this work was to use the CA-AQD algorithm to assess landslide susceptibility and to compare its performance with that of the Chameleon and KPSO clustering algorithms, as well as the DTU, NBU, SVM, ANN, and RF classification algorithms, in Baota District, China.
To evaluate the validity of the models, statistical metrics and the ROC curve with its AUC were applied. The results indicated that the CA-AQD and Chameleon models outperformed the KPSO model in assessing landslide susceptibility in the study area and produced more reliable landslide susceptibility maps, as their AUC values were closer to 1. These results suggest that both models are good at assigning the mapping units to their respective clusters. This is due to their ability to perform well in a large study area, to detect clusters of arbitrary shape and size, and to handle noisy data efficiently, none of which the KPSO model does well. Moreover, the CA-AQD model achieved the best performance compared with the Chameleon and KPSO models in assessing landslide susceptibility. This result is significant because CA-AQD is an improved version of the Chameleon algorithm that improves accuracy by processing uncertain data, which has a significant effect on the clustering results and thus sets it apart from the others.
On the other hand, the accuracy of the DTU, NBU, SVM, ANN, and RF classification algorithms depended heavily on the training data sets used in the experiments (which are in fact not easy to collect and prepare). That is, the accuracy of those models increased as the amount of training data increased, so less training data meant lower accuracy. The CA-AQD model, by contrast, showed no such dependence: it obtained almost constant accuracy throughout the experiment, so its good accuracy does not rely on the size of a training data set.

Conclusion
This study aimed to propose and design an improved clustering algorithm for assessing landslide susceptibility by integrating the Chameleon algorithm with an adaptive quadratic distance (the CA-AQD algorithm). It targeted improving the prediction capacity of clustering algorithms in landslide susceptibility modelling by overcoming limitations of existing clustering models, including strong dependence on the initial partition, sensitivity to noise and outliers, and difficulty in quantifying triggering factors (such as rainfall). The model was implemented in Baota District, Shaanxi Province, China, and validated using statistical metrics as well as the ROC curve and its AUC. It was then compared with the Chameleon and KPSO clustering algorithms, as well as the DTU, NBU, SVM, ANN, and RF algorithms. The results suggested that the CA-AQD model achieved the best performance among these algorithms in assessing landslide susceptibility in the area. Thus, this work adds to the literature by introducing the first empirical integration and application of the CA-AQD algorithm to landslide assessment in the study area, which offers a new insight to the field. The method can also support landslide management for better social and economic development.

Data Availability
The data used in this paper were taken from the Xi'an Center for Geological Survey (CGS).

Conflicts of Interest
The authors declare that they have no conflicts of interest.