An Improved KNN-Based Slope Stability Prediction Model



Introduction
Landslides are a complex natural phenomenon of slope instability and usually cause huge losses of human life and property. It is widely understood that slope stability depends on many parameters, such as cohesion, internal friction angle, rainfall, and earthquakes. At present, numerical analysis is commonly adopted in slope stability analysis. However, numerical analysis alone is not sufficient to analyze slope stability because a slope is a complex dynamic system affected by many factors [1]. Consequently, the prediction of slope stability is of practical significance. The aim of this work is to propose a prediction approach for slope stability based on machine learning techniques.
Predicting slope stability is still a challenge. The factors that influence slope stability are various and complicated, and the main influencing factors can be roughly divided into three categories [2]: physical and mechanical properties of the slope soil (unit weight, cohesion, and the angle of internal friction), natural topography (slope height and slope angle), and external factors (rainfall infiltration, groundwater seepage, and earthquake load). It is difficult to predict slope stability because of these various and complicated factors [3]. Lin et al. [4] chose six typical slope parameters (unit weight, cohesion, internal friction angle, slope inclination, slope height, and pore water ratio) to establish the evaluation index system and predicted the slope stability using four supervised learning methods. Zhao et al. [5] chose six input variables (density, friction angle, friction coefficient, slope angle, slope height, and pore water pressure) for the prediction of slope stability using the relevance vector machine (RVM) method and found that the RVM is a robust tool for the prediction of slope stability. Samui and Kothari [6] chose six input variables (unit weight, cohesion, angle of internal friction, slope angle, height, and pore water pressure coefficient) for the prediction of slope stability using the least square support vector machine (LSSVM) method and found that the developed LSSVM is a robust model for slope stability analysis. Hu et al. [7] used the support vector machine method to forecast slope instances and found that the forecasting results are consistent with the actual states of slope stability. Li and Jiang [8] chose six characteristic parameters (unit weight, cohesion, angle of internal friction, slope angle, height, and pore water pressure coefficient) for the prediction of slope stability using KNN and found that the KNN method was more accurate than the backpropagation neural network algorithm. 
Xiong and Li [9] applied PNN to rock slope stability forecasting, and the results of the case study show that the analysis results are completely consistent with the actual situation. Consequently, machine learning approaches are being increasingly used for slope stability. However, the main external influencing factors of slope stability have been neglected in previous studies, such as rainfall and earthquake, which are the main inducing factors of landslide.
It is worth noting that the k-nearest neighbor (KNN) algorithm [10-12], which is one of the most well-known algorithms in classification recognition, has been proven to be very effective in prediction. To improve prediction accuracy, scholars have carried out studies adjusting its weights [13,14]. Dudani [15] proposed a weighted voting method named the distance-weighted k-nearest neighbor (WKNN) rule, which is the first distance-based vote weighting scheme. Gou et al. [16] presented a dual weighted k-nearest neighbor (DWKNN) rule that extended the linear mapping of Dudani. Furthermore, because the KNN algorithm generally requires a preset k value and multiple experiments with different k values to obtain the best prediction results, some new ideas have been proposed to improve the selection of the k value. Zheng [17] proposed a strategy of dynamically setting k values. Liu and Zhang [18] proposed a scheme that reconstructs points of the test dataset by learning the correlation matrix, in which different k values are assigned to different points of the test data based on the training data. In addition, Ma et al. [19] proposed a coefficient-weighted KNN classifier and a residual-weighted KNN classifier for making classification decisions on the basis of sparse coefficients in the sparse representation. Gou et al. [20] proposed the two-phase probabilistic collaborative representation-based classification (TPCRC) to enhance the power of pattern discrimination in PCRC. Huang et al. [21,22] analyzed the factors influencing the rockfall runout distance, predicted the rockfall runout distance based on an improved KNN algorithm, and predicted sand liquefaction using the local mean-based pseudo-nearest neighbor algorithm; however, the accuracy of the prediction still needs to be improved. Previous studies show that the KNN algorithm is widely used in classification prediction, but its performance depends on the number of training samples. 
In this study, we improved the KNN algorithm to reduce its sample dependence and improve the robustness of the algorithm and built the prediction model of the slope.

Establishment of the Prediction Model
2.1. Our Improved KNN Algorithm. KNN, as a simple, effective, and nonparametric prediction method, was first proposed by Cover and Hart to solve text prediction problems [18]. Its principle is to expand the area from the test sample point x constantly until k training sample points are included. In addition, the test sample point x is classified into the category that most frequently appears in the nearest k training sample points.
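The basic rule described above (take the k training samples closest to the query and return the most frequent class among them) can be sketched in a few lines of Python. This is only an illustration of plain KNN, not the improved algorithm of this study; the function name and the toy samples are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training sample.
    dists = [(math.dist(x, query), label) for x, label in zip(train_x, train_y)]
    # Ascending order of distance; the sort is stable, so equal distances
    # keep their original training order.
    dists.sort(key=lambda pair: pair[0])
    neighbors = [label for _, label in dists[:k]]
    # Most frequent label; on a tie, Counter returns the label seen first,
    # i.e. the tied label whose neighbor is closest to the query.
    return Counter(neighbors).most_common(1)[0][0]


# Hypothetical two-feature samples, for illustration only.
train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
train_y = [0, 0, 1, 1, 1]
print(knn_predict(train_x, train_y, (0.2, 0.1), k=3))
```

With k = 3 the query at (0.2, 0.1) has two neighbors of class 0 and one of class 1, so the vote returns class 0.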
(1) The KNN algorithm implementation steps are shown in Figure 1.

(2) Dudani [15] first introduced a weighted voting method for the KNN, called the distance-weighted k-nearest neighbor rule (WKNN). In the WKNN, the closer neighbors are weighted more heavily than the farther ones, using the distance-weighted function. Let $d_1 \le d_2 \le \cdots \le d_k$ denote the distances from the query $x$ to its $k$ nearest neighbors. The weighted function of the WKNN is shown as follows:

$$w_i = \begin{cases} \dfrac{d_k - d_i}{d_k - d_1}, & d_k \neq d_1, \\ 1, & d_k = d_1. \end{cases}$$

Accordingly, the prediction result of the query is made by the majority weighted voting as defined in the following:

$$c' = \arg\max_{c} \sum_{i=1}^{k} w_i \, \delta(c = c_i),$$

where $c_i$ is the class label of the $i$-th nearest neighbor and $\delta(\cdot)$ equals 1 when its argument is true and 0 otherwise.

(3) DWKNN [16] is based on the WKNN: different weights are given to the k nearest neighbors according to their distances, with closer neighbors having greater weights. The dual distance-weighted function of the DWKNN is defined as

$$w_i = \begin{cases} \dfrac{d_k - d_i}{d_k - d_1} \cdot \dfrac{d_k + d_1}{d_k + d_i}, & d_k \neq d_1, \\ 1, & d_k = d_1. \end{cases}$$

Then, we label the query $x$ by the majority weighted vote of the k nearest neighbors, the same as in the WKNN.

Through comparative study, we find that the method in equation (5) for improvement has better robustness and less sample dependence. Thus, in this study, we used this method to predict the slope stability. Accordingly, we classify the query point $x$ into class $c$ by majority weighted voting of its neighbors. The prediction model of the slope stability based on our improved algorithm can be expressed as follows.

Let $X = \{x_n \in \mathbb{R}^m\}_{n=1}^{N}$ denote the set of slope stability samples, where $x_i$ represents the features of the $i$-th slope stability sample, $N$ is the total number of samples, and $m$ is the feature dimension. In addition, let $y_i$ represent the slope stability level, with $y_i \in \{0, 1, 2, 3, 4\}$, $i = 1, 2, \ldots, N$. Therefore, the sample set of the prediction model is shown as follows:

$$\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} & y_1 \\ x_{21} & x_{22} & \cdots & x_{2m} & y_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nm} & y_N \end{bmatrix}$$

Given the unknown sample $x = (x_1, x_2, \ldots, x_m)$, our proposed slope stability prediction model based on our improved KNN algorithm assigns $x$ to a class using $x_i^{PNN}$, the nearest neighbors of the unknown sample $x$ in class $w_i$ ($i = 1, 2, 3$). Hence, the unknown sample $x$ is classified into the class $y$ that has the closest neighbor among all classes.
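The distance-weighted voting rules of WKNN [15] and DWKNN [16] can be illustrated with a short Python sketch. It does not reproduce the improved weighting used in this study; the function names are hypothetical, and the weight formulas follow the forms commonly given for these two rules, with all weights set to 1 when the nearest and farthest neighbors are equally distant.

```python
from collections import defaultdict


def wknn_weights(sorted_dists):
    """Linear distance weights for neighbor distances in ascending order."""
    d1, dk = sorted_dists[0], sorted_dists[-1]
    if dk == d1:
        # All k neighbors are equally far away: equal weights.
        return [1.0] * len(sorted_dists)
    return [(dk - d) / (dk - d1) for d in sorted_dists]


def dwknn_weights(sorted_dists):
    """Dual weights: the linear weight times (d_k + d_1) / (d_k + d_i)."""
    d1, dk = sorted_dists[0], sorted_dists[-1]
    if dk == d1:
        return [1.0] * len(sorted_dists)
    return [(dk - d) / (dk - d1) * (dk + d1) / (dk + d) for d in sorted_dists]


def weighted_vote(labels, weights):
    """Return the label with the largest summed weight."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical neighbor distances and labels, nearest neighbor first.
dists = [1.0, 2.0, 3.0]
labels = ["stable", "unstable", "unstable"]
print(weighted_vote(labels, wknn_weights(dists)))
print(weighted_vote(labels, dwknn_weights(dists)))
```

In this toy case an unweighted majority vote would pick "unstable" (two of three neighbors), whereas both weighted rules pick "stable" because the farthest neighbor receives weight 0 and the nearest receives weight 1.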

Prediction Model of the Slope Stability Based on Our Improved KNN Algorithm

The prediction model is established using the training samples in [3]. There are 50 cases used for training and 14 cases used for testing.

Data Information and Predictors.
The slope stability prediction is performed to find the nonlinear relationship between the influencing factors and the slope stability. The main influencing factors can be roughly divided into three categories [3]: physical and mechanical properties of the slope soil (unit weight, cohesion, and the angle of internal friction), natural topography of the slope (slope height and slope angle), and external factors (rainfall infiltration, groundwater seepage, and earthquake load). In our study, we chose eight representative influencing factors: unit weight, cohesion, internal friction angle, slope height, slope angle, groundwater level, earthquake intensity, and rainfall intensity.
By comparing the slope codes of earthquake-prone countries (China, Japan, European countries, and the United States), the evaluation methods of slope seismic stability in specifications at home and abroad were identified, as shown in Table 1.
Based on this comparison of slope codes in different countries, we used the safety factor and the permanent displacement to evaluate the slope stability. According to the safety factor of the slope, Xiong [23] classified the slope stability into five grades: particular instability, instability, potential instability, basic stability, and stability. The five grades are labeled as I, II, III, IV, and V, respectively, as depicted in Table 2.

Normalization.
Figure 1: KNN algorithm implementation steps. Input: a training set with $N$ training samples, where the class label of sample $x_n$ is $c_n$, and a query pattern $x$. Steps: compute the distances of the training samples to $x$; rank these distances in increasing order; find the k nearest neighbors with the smallest distances; assign class $c'$ to $x$ by majority voting of its neighbors.

Since the range of each predictor is significantly different and the test results might rely on the values of a few predictors, the predictors are preprocessed using normalization [24]. We compute the upper and lower bounds of each predictor, and the normalization process is represented as

Advances in Civil Engineering
where $y = (y_1, y_2, \ldots, y_n)$ denotes the values of each predictor. Accordingly, the value of each predictor is normalized to between 0 and 1 based on equations (9)-(11).
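A minimal sketch of min-max scaling, one common way to map each predictor into [0, 1] from its lower and upper bounds, is given here. It is an assumption that equations (9)-(11) take this form; the function name and the sample values are hypothetical.

```python
def min_max_normalize(columns):
    """Scale each predictor (one list per predictor) to the range [0, 1]."""
    normalized = []
    for column in columns:
        lower, upper = min(column), max(column)
        span = upper - lower
        # A constant predictor has zero span; map it to 0.0 rather than
        # dividing by zero.
        normalized.append([(v - lower) / span if span else 0.0 for v in column])
    return normalized


# Hypothetical unit-weight values in kN/m^3, for illustration only.
unit_weight = [15.0, 21.75, 28.5]
print(min_max_normalize([unit_weight]))  # [[0.0, 0.5, 1.0]]
```

When this kind of scaling is applied to separate training and test sets, the bounds are usually taken from the training samples and reused for the test samples so that both are on the same scale.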

Criteria for Our Prediction Model Performance.
The accuracy, computed as the percentage of all test samples classified correctly, is used to evaluate the prediction performance of the slope stability. It is defined as follows:

$$\text{Accuracy} = \frac{\#\,\text{test samples predicted correctly}}{\#\,\text{test samples}} \times 100\%,$$

where #test samples denotes the total number of test samples and #test samples predicted correctly is the number of test samples that are predicted correctly.
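As a worked example of this criterion, the sketch below computes the accuracy for the 13 shaking table tests listed in Table 11, where the predicted grade differs from the test grade only for test 11#. The function and variable names are hypothetical.

```python
def accuracy(predicted, actual):
    """Percentage of test samples whose predicted grade matches the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Grades from Table 11 (tests 1# to 13#).
shaking_table = ["V", "IV", "III", "II", "I", "I", "IV",
                 "IV", "III", "II", "III", "III", "II"]
model = ["V", "IV", "III", "II", "I", "I", "IV",
         "IV", "III", "II", "II", "III", "II"]
print(f"{accuracy(model, shaking_table):.2f}%")
```

Twelve of the thirteen grades match, which gives 12/13 ≈ 92.3%, in line with the figure reported for the shaking table comparison.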

Procedure Algorithm of Our Proposed Prediction Model.
In this study, we improve the KNN algorithm to further overcome the influence of the neighborhood size $k$. Let $T = \{x_n \in \mathbb{R}^d\}_{n=1}^{N}$ denote a training set with $M$ classes $w_1, w_2, \ldots, w_M$. The training samples of class $w_i$ form the subset $T_{w_i}$ of $T$, $d$ is the dimension of the feature space, and $N$ is the number of training samples. In our improved KNN algorithm, the class label of a query point $x$ is computed as shown in the following steps.

The $k$ nearest neighbors of the unknown query point $x$ are selected from the set $T$ and sorted in ascending order of their Euclidean distance to $x$. Different weights are assigned to the $k$ nearest neighbors, with $w_j$ denoting the weight of the $j$-th nearest neighbor. Accordingly, we classify the query point $x$ into class $c$ by majority weighted voting of its neighbors.

3.5. Slope Stability Prediction. In this section, our proposed prediction model is trained on 50 typical slope stability cases and tested on 14 typical slope stability cases. The neighborhood size $k$ ranges from 1 to 7 with an interval of 1, which is inspired by [25]. The 50 typical slope stability cases are shown in Table 3, and the 14 typical slope stability cases are shown in Table 4.
This prediction experiment is implemented in Java using Eclipse 3.7.2, and the hardware environment is an Intel Core i7-6700 CPU at 3.40 GHz. As shown in Table 4, our proposed prediction model has high accuracy and reliability, and its prediction results are in good agreement with the actual results. The accuracy of our proposed prediction model is up to 92.85%. This illustrates that our proposed prediction model is feasible for predicting the slope stability and could be used to evaluate the slope stability before the design and construction of slope engineering.
Next, the prediction performance of our proposed prediction model is compared with that of other prediction models based on the KNN algorithm, the WKNN algorithm [15], and the DWKNN algorithm [16]. The following prediction experiments show whether our proposed prediction model achieves better prediction performance. The comparison results between different prediction models are shown in Figures 2 and 3.
As can be seen in Figures 2 and 3, the prediction accuracy of our proposed prediction model is somewhat better than that of the prediction models based on the KNN, WKNN, and DWKNN algorithms in almost all of the test cases, which shows that our proposed prediction approach performs better than the other approaches as the neighborhood size $k$ increases. The accuracy of our proposed prediction model is highest when the neighborhood size $k$ is 4, where it reaches 92.85%. This result suggests that our proposed prediction model based on the improved KNN algorithm is robust to the choice of the neighborhood size $k$ and performs well in predicting the slope stability.
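The comparison over neighborhood sizes can be organized as a simple sweep: for each k from 1 to 7, classify every test case and record the accuracy. The sketch below assumes a classifier passed in as a function with the hypothetical signature `classify(train_x, train_y, query, k)`; it is a generic harness and not the Java implementation used in this study.

```python
def sweep_k(classify, train_x, train_y, test_x, test_y, ks=range(1, 8)):
    """Return {k: accuracy in percent} for every neighborhood size in `ks`."""
    results = {}
    for k in ks:
        hits = sum(
            classify(train_x, train_y, query, k) == label
            for query, label in zip(test_x, test_y)
        )
        results[k] = 100.0 * hits / len(test_y)
    return results


def best_k(results):
    """Smallest k that attains the highest accuracy."""
    top = max(results.values())
    return min(k for k, acc in results.items() if acc == top)
```

Any of the voting rules (KNN, WKNN, DWKNN, or an improved variant) can be plugged in as `classify`, so that all models are evaluated on the same training and test cases for each k.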

Engineering Application of Our Proposed Prediction Model
To further determine the performance of our proposed prediction approach based on the improved KNN algorithm in engineering applications, we also conducted experiments to evaluate the slope stability along the Sichuan-Tibet railway in China and compared the prediction results with the finite element method and shaking table test results. We considered the intensities of the historical earthquakes within a radius of 500 km of the survey region.
Our research group drilled many boreholes in our survey region along the Sichuan-Tibet railway. On the basis of the mass of borehole data, the values of the influencing factors (unit weight, cohesion, internal friction angle, and groundwater level) are obtained, and we use our proposed prediction approach based on the improved KNN algorithm to predict the stability of the slope along the Sichuan-Tibet railway.
The seismic activity around the Sichuan-Tibet railway is relatively frequent.

Slope Stability Prediction of the Sichuan-Tibet Railway.
Cutting slopes along the Sichuan-Tibet railway in China are chosen as the research object. We simplified the slope shape, and the simplified slope models are established with the finite element software MIDAS GTS NX. The Mohr-Coulomb elastoplastic model is used to model the stress-strain behavior of the soil. The grid size of the finite element model is 0.5 m, as shown in Figure 4. Furthermore, the bottom is set as a fixed boundary, and the left and right sides are set as viscoelastic artificial boundaries.
In the numerical simulation model, the mass damping coefficient $\alpha$ and the stiffness damping coefficient $\beta$ are fixed as 0.2 and 0.0019, respectively. The damping of the numerical simulation model is calculated by the Rayleigh damping formula, in which $\alpha$ denotes the mass damping coefficient and $\beta$ is the stiffness damping coefficient. The two coefficients are computed from $\omega_i$, the natural frequency of the first mode, $\omega_j$, the natural frequency of the second mode, and the conventional damping ratios $\xi_i$ and $\xi_j$, which range from 2% to 7%. The engineering geological conditions of the slope along the Sichuan-Tibet railway were investigated, and the influencing factor values were determined based on indoor experiments. The value ranges of the influencing factors are shown in Table 5. First, to understand how each factor influences the slope stability, we compute the safety factors of the slope under different influencing factors, as shown in Figure 5.
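For readers who want to reproduce this step, the following sketch assumes the standard two-mode form of Rayleigh damping, in which the damping ratio of mode n is ξ_n = α/(2ω_n) + βω_n/2, and solves this relation at two modes for α and β. The function names are hypothetical, and the frequencies in the example are illustrative values, not those of the slope model in this study.

```python
def rayleigh_coefficients(omega_i, omega_j, xi_i, xi_j):
    """Solve xi_n = alpha / (2 * omega_n) + beta * omega_n / 2 at two modes.

    omega_i, omega_j: circular natural frequencies (rad/s) of the two modes.
    xi_i, xi_j: damping ratios assigned to those modes.
    """
    if omega_i == omega_j:
        raise ValueError("the two modes must have different frequencies")
    denom = omega_j ** 2 - omega_i ** 2
    alpha = 2.0 * omega_i * omega_j * (xi_i * omega_j - xi_j * omega_i) / denom
    beta = 2.0 * (xi_j * omega_j - xi_i * omega_i) / denom
    return alpha, beta


def damping_ratio(alpha, beta, omega):
    """Damping ratio implied by (alpha, beta) at circular frequency omega."""
    return alpha / (2.0 * omega) + beta * omega / 2.0


# Illustrative frequencies with a 5% damping ratio at both modes.
alpha, beta = rayleigh_coefficients(10.0, 30.0, 0.05, 0.05)
print(alpha, beta)
```

When both modes share the same damping ratio ξ, the expressions reduce to α = 2ξω_iω_j/(ω_i + ω_j) and β = 2ξ/(ω_i + ω_j), which is a convenient check on the computed values.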
As shown in Figure 5, the safety factors increase with increasing cohesion and internal friction angle, while the safety factors decrease as the other factors increase. Also, we found that the slope stability is significantly affected by slope angle, slope height, cohesion, internal friction angle, groundwater level, and peak acceleration.
In order to demonstrate the influence of the different influencing factors on slope stability more directly, the influencing factors are normalized. The safety factors under the normalized influencing factors are shown in Figure 6.
As shown in Figure 6, the factors selected in our study are all sensitive to slope stability, which supports the choice of influencing factors in our study. Also, we found that the slope is least stable under the influence of the peak acceleration, which shows that the impact of potential future earthquakes on the slope cannot be ignored. Consequently, we could use our proposed prediction model to predict the slope stability under potential future earthquakes, and reinforcement measures can be taken according to the predicted results, which is important and useful for solving realistic engineering problems.
Based on the nonlinear finite element method and the strength reduction method, the slope damage contour can be obtained under different slope stability states. We could more intuitively determine the slope failure degree by the contour. Grade V indicates that the slope is in a stable state; thus, we only plot the slope damage contour for slope stability grades I, II, III, and IV. In this section, the groundwater levels vary, and other factor values remain constant. The slope damage contours for different slope stability grades are shown in Figure 7.

Table 5: Value ranges of the influencing factors (rows 4 to 8).

| Number | Influencing factor | Value range |
|---|---|---|
| 4 | Internal friction angle (°) | 15 to 32 |
| 5 | Unit weight (kN/m³) | 15 to 28.5 |
| 6 | Groundwater level (m) | 0 to 30 |
| 7 | Earthquake intensity | 4 to 7 |
| 8 | Rainfall intensity (mm) | 28 to 60 |
As shown in Figure 7, the slope shows different stability states at different groundwater levels. The slope stability degree could be determined through the plastic zone distribution, and we could more intuitively determine the slope stability grades by the finite element method. Thus, we could verify the accuracy of our proposed prediction model by comparing its results with the finite element results.

To assess the stability of the slope along the Sichuan-Tibet railway, we chose 16 slope cases and predicted the slope stability using our proposed prediction model and the finite element method, respectively. The predicted results obtained by our proposed prediction model and the finite element method are compared in Table 6.
As shown in Table 6, the results of our proposed prediction approach based on the improved KNN algorithm agree closely with those of the finite element method. The prediction accuracy is up to 93.75%, which demonstrates that our proposed prediction model could be used for slope stability discrimination in engineering geological hazard safety assessment.

Comparison of Our Prediction Model Results with Shaking Table Test Results.
In this section, we mainly study the effect of the earthquake on slope stability, and the other influencing factors remain constant. Thus, we conduct a shaking table test, which can reproduce the failure process of the slope under real earthquakes, to evaluate the prediction performance of our proposed prediction model. Figure 8 shows the shaking table test equipment used to model the slope failure process under seismic excitation.
As can be seen in Figure 8, the main technical indicators of the shaking table test equipment include the rated working frequency (40 Hz), the maximum acceleration (20 m/s²), the maximum test load (5000 kg), and the dimensions of the shaking table (1.5 m × 1.5 m).
The size of the test model is 1.96 m × 0.96 m × 1.2 m. The slope ratio is 1 : 1.5. To keep the sandy soil uniform, the sandy slope is repeatedly stirred. A sponge with a thickness of 20 mm is used to reduce the reflection of seismic waves at the border of the slope. The test model is shown in Figure 9. The dynamic pore water pressure change of the slope under the earthquake is the main cause of the slope failure. In order to obtain the dynamic pore water pressure, many sensors are deployed at the slope toe. The layout of the monitoring points of the test model is shown in Figure 10. The far-field seismic wave (type I: T1-II-1) and the near-field seismic wave (type II: T2-II-1) are applied to the slope stability analysis. The parameters of the earthquake motions are shown in Table 7, and the acceleration-time histories of the seismic waves are shown in Figure 11. According to the Code for Seismic Design of Railway Engineering (GB50111-2006) [26] of China, the peak accelerations of the seismic waves are adjusted to correspond to earthquake intensities of 4, 5, 6, and 7 degrees.
In the laboratory test, the effect of different influencing factors on slope stability is investigated. We could determine the damage degree of the slope at different stability levels through the shaking table test. The values of the influencing factors in the shaking table test are shown in Table 8. The scaling law between our test model and the actual projects follows the Buckingham Pi theorem [27], and the proportional relation for the similarity ratio was developed by Jiang et al. [28]. Poisson's ratio $\mu$ of the soil in the test is 0.35, the coefficient of lateral pressure $K = \mu/(1 - \mu)$ is 0.54, and the dimensionless index $n = 2$. Other similarity coefficients [29] based on the similarity principle are shown in Table 9.
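The lateral pressure coefficient quoted above follows directly from Poisson's ratio. A one-function check (the function name is hypothetical):

```python
def lateral_pressure_coefficient(poisson_ratio):
    """K = mu / (1 - mu) for a Poisson's ratio mu with 0 <= mu < 1."""
    if not 0.0 <= poisson_ratio < 1.0:
        raise ValueError("Poisson's ratio must satisfy 0 <= mu < 1")
    return poisson_ratio / (1.0 - poisson_ratio)


k_lateral = lateral_pressure_coefficient(0.35)
print(round(k_lateral, 2))  # 0.54
```

With μ = 0.35 this gives 0.35/0.65 ≈ 0.538, which rounds to the value of 0.54 used in the test.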
Table 9: Similarity coefficients between our test model and the actual projects (columns: physical quantity, similarity coefficient, exegesis).

Here, $[\tau]$ is the horizontal shear strength, which is computed from the normal pressure stress $\sigma_v$ (i.e., the geostatic stress caused by burial depth), the internal friction angle $\varphi$, the cohesion $c$, and the lateral pressure coefficient $k$. In our experiment, a similar material was developed based on the slope material of the Sichuan-Tibet railway. The similarity coefficients simulating the slope are shown in Table 10.
The simulation of the groundwater level determines the accuracy of the test results. The test program for simulating the groundwater level is shown in Figure 12.
As shown in Figure 12, water is injected at the left side of the slope. The groundwater level on the right side is always kept at the height of the slope toe by turning on the tap. A stable seepage field forms inside the slope after a long period of seepage. Then, the seismic excitation is input into the model.
Before prediction, we analyze the development laws of the dynamic pore water pressure inside the slope, which is a more intuitive indicator of slope failure. The dynamic pore water pressure of the slope at different positions is calculated at groundwater levels of 14 m and 20 m (before scaling using the equation in Table 10), as shown in Figures 13 and 14.
As shown in Figures 13 and 14, the dynamic pore water pressure rises sharply within a short time under both near-field and far-field earthquakes. The rising pore water pressure does not have time to dissipate and shows large fluctuations. In particular, the dynamic pore water pressure values at monitoring point G are greater than those at the other monitoring points. The dynamic pore water pressure at the slope toe is more strongly influenced by the seepage force, which shows that the slope toe is the position most prone to plastic damage. The generation of dynamic pore water pressure under the earthquake could decrease the strength of the soil slope, change the effective stress acting on the soil skeleton, limit its deformation, and cause the destruction of the slope. Shear sliding occurs at the slope toe under the action of the dynamic pore water pressure; thus, the slope toe should be treated as the key protection position in actual engineering.
Grade V of the slope stability indicates that the slope is in a stable state; thus, we determined the level of the slope damage for grades I, II, III, and IV, as shown in Figure 15.
As can be seen in Figure 15, the shaking table test allows us to determine the failure degree of the slope at different stability grades more directly. The slope slides as the earthquake intensity increases, and the destruction begins at the slope toe. The cracks in the slope spread gradually as the slope stability grade varies from IV to I. Meanwhile, the top of the slope shows obvious settlement. The sliding surface is approximately circular in shape. The slope is in the particular instability state when the slope stability grade is I; thus, the shaking table test could accurately determine the slope stability grades. Then, we compare the shaking table test results with the prediction results of our proposed prediction model, and the comparison results are shown in Table 11.
As can be seen in Table 11, the results of our proposed prediction model based on the improved KNN algorithm agree well with the shaking table test results. The prediction accuracy is up to 92.30%, which demonstrates that our proposed prediction model could be used for slope stability prediction before major project construction near the slope.

Conclusions
(1) We improved the KNN algorithm and established a prediction model of the slope stability. The performance of our proposed prediction model is evaluated by conducting experiments on slope stability grade prediction, and the experimental results demonstrate the effectiveness of our proposed prediction model.
(2) We used our proposed prediction model to evaluate the stability of actual slope engineering, and the evaluation results using the finite element method match well with the predicted results of our proposed prediction model, which shows that our proposed prediction approach is an effective method to predict the slope stability.
(3) The progressive failure process of the slope is reproduced by the shaking table test, and the failure degree of the slope at different stability grades is determined. The failure degree predicted by our proposed prediction model is consistent with the experimental results, which further demonstrates the effectiveness of our proposed prediction model of the slope stability.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
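The authors declare that they have no conflicts of interest.

Table 11: Comparison of the shaking table test results with the results of our proposed prediction model.

| Test number | Shaking table test results | Our proposed prediction model results | Comparison results |
|---|---|---|---|
| 1# | V | V | √ |
| 2# | IV | IV | √ |
| 3# | III | III | √ |
| 4# | II | II | √ |
| 5# | I | I | √ |
| 6# | I | I | √ |
| 7# | IV | IV | √ |
| 8# | IV | IV | √ |
| 9# | III | III | √ |
| 10# | II | II | √ |
| 11# | III | II | × |
| 12# | III | III | √ |
| 13# | II | II | √ |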