Landslide Displacement Prediction Based on Transfer Learning and Bi-GRU

Slope deformation prediction is crucial for early warning of slope failure, which helps prevent property damage and save lives. In practice, however, equipment maintenance interrupts the displacement record, and traditional prediction models based on deep networks perform poorly on such data. To maintain prediction accuracy with discontinuous and inadequate data, we propose a combined displacement prediction model that integrates the bidirectional gated recurrent unit (Bi-GRU), an attention mechanism, and transfer learning. The Bi-GRU extracts the forward and backward characteristics of the displacement series, and the attention mechanism assigns different weights to the extracted information so as to highlight the critical information. Transfer learning is used to maintain prediction accuracy when the data are discontinuous and limited. The model is then applied to predict the slope displacement at the JinYu Cement Plant in China. The modeling results agree closely with the measured displacement, especially when sample data are insufficient.


Introduction
There have been enormous casualties and economic losses due to landslides recently. On July 23, 2019, a landslide occurred in Shuicheng, China, which affected more than 1,600 people and caused an economic loss of 190 million RMB. On July 16, 2020, three landslides in Dunhao, China, resulted in two deaths and four missing people. Due to the considerable damage from landslide disasters, it is necessary to establish reliable early warning models [1]. Using Web GIS techniques, areas at risk from landslides have been estimated from various geospatial data [2]. Landslides are generally believed to be governed by their physical characteristics and triggering factors. The physical characteristics primarily include stratigraphic lithology, pore-fluid pressure, and gravitational stress. The triggering factors are much more diverse: rainfall, earthquakes, human activities, vegetation cover, water level fluctuations, and erosion processes may all cause slope failure [3]. The variation of the slope displacement can reflect the instability of landslides and has been considered a critical index for evaluating the stability and status of landslides [4]. The monitored series of slope displacements reflect the nonlinear dynamic evolution of the slope under the dual impact of internal and external environments.
By analyzing the monitored series of slope displacement, various displacement prediction models are constructed based on intelligent algorithms, such as the gray model, the support vector regression model, the wavelet neural network, and the particle swarm optimization (PSO). Combined with actual slope engineering, the mechanism and law of slope deformation are grasped to provide guidance for landslide prevention.
Kuradusenge et al. [5] proposed random forest (RF) and logistic regression (LR) models to predict rainfall-induced landslides in Rwanda. They used rainfall data sets and various internal and external parameters for landslide prediction, thereby improving its accuracy. In another work, Yan et al. [6] employed the Holt-Winters damped model to predict the deep horizontal displacement of sloping soil. He et al. [7] presented a machine-learning-aided stochastic reliability analysis of spatially variable slopes, and it significantly reduced the computational efforts and provided a complete statistical description of the safety factor with reasonable accuracy. Wang et al. [8] also developed a direct interval prediction approach based on least-squares support-vector machines (LS-SVM) and the differential search algorithm to predict the landslide displacements in the Three Gorges Reservoir area. In another work, Ma et al. [9] proposed measures based on mutual information (MI) for input variable selection (IVS) and optimized support vector regression (SVR) for predicting the displacement of seepage-driven landslides. Their experimental results demonstrated that MI-based measures could effectively identify relevant or critical variables, and the optimized SVR methods based on variable-reduced inputs could significantly enhance prediction performance.
Further, Wang et al. [10] compared the performance of several popular machine learning methods in predicting the displacement of reservoir landslides. Ray et al. [11] also developed an artificial neural network (ANN) to evaluate the safety factor of Shiwalik Hills in the Himalayan region. Shihabudheen et al. [12] developed an extreme learning, adaptive, neuro-fuzzy inference system (ELANFIS) using the empirical-mode decomposition (EMD) technique to predict the displacement of step-like landslides.
However, the prediction accuracy of the above methods depends on their hyper-parameters, which usually differ for each time series. Furthermore, misspecification reduces prediction accuracy by generating overfitted or underfitted models [13]. Table 1 lists the characteristics of some models for predicting slope deformation.
Since frameworks based on deep learning perform well in terms of discovering and extracting the internal structure of data, researchers are inspired to employ deep architectures in preference prediction tasks. Ref. [14] proposed an algorithm based on the deep recurrent neural network (RNN) called DeepVM to predict vehicle mobility in a future period of several (or tens) of minutes. Ref. [15] also developed a deep neural network named convolution-based LSTM network by mining the in situ vibration data to predict the remaining useful life of rotating machinery. In another work [16], a deep-learning-based approach was developed to characterize the probability density function (PDF) of wind for the following hours, where the convolutional neural network (CNN) and the gated recurrent unit (GRU) were employed to capture spatial and temporal features simultaneously. A composite GRU-Prophet model was also constructed to predict sales volume [17].
Further, Ref. [18] utilized the bidirectional long short-term memory (Bi-LSTM) and bidirectional GRU (Bi-GRU) to predict the network traffic matrix, and the results showed that the bidirectional models offered the ability to look into data series in two opposite directions, allowing the acquisition of the full knowledge of past and future statuses for better prediction results. The social force model (SFM) was incorporated into an original LSTM network to predict vessel trajectories, taking account of the social force-driven LSTM network and the mixed loss function in artificial intelligence (AI)-powered maritime Internet of Things (IoT) systems to achieve robust prediction [19]. Ref. [20] proposed a Bi-GRU-based framework to predict ratings for customer reviews, and the experimental results demonstrated that the developed framework could significantly enhance the rating prediction for both balanced and imbalanced data sets. The LSTM, GRU, RNN, Bi-GRU, and Bi-LSTM were also developed to forecast solar irradiance, and the experimental results showed that the Bi-GRU model performed better than the others in terms of the training time, the trainable parameters, and the epoch ratio [21]. It can be stated that, in a given time, bidirectional models are more reliable due to their capacity to learn more abstract features of time-series data sets.
The remainder of the paper is organized as follows. Section 2 reviews the related literature on slope displacement prediction, and Section 3 presents the architecture of the proposed displacement prediction model and its execution procedure. Section 4 reports the experimental results of a case study in the JinYu Cement Plant, an overview of the study site, the prediction results, and the model performance evaluation. Section 5 summarizes the model outcomes and limitations.

Related Works
The related literature has addressed slope displacement prediction from different perspectives [22][23][24][25][26]. Here, we review the recent progress in displacement prediction methods based on deep learning.
Ref. [27] proposed a dynamic model based on the LSTM neural network to predict landslide displacement. The accumulated displacement was decomposed into a trend term and a periodic term, where a cubic polynomial function was employed to estimate the trend displacement, and an LSTM model was used to predict the periodic displacement. The performance of the model was validated with the observations of two step-wise landslides in the Three Gorges Reservoir area, the Baishuihe and Bazimen landslides. Another work introduced two new concepts, namely, the trend sequence and the sensitivity states, to quantitatively characterize the response of landslide displacement to external factors and the internal states of landslides, respectively [28]. Then, the PSO algorithm and the SVR method were used to obtain the trend sequence, and the LSTM neural network was employed to predict the sensitivity states. Taking the Baishuihe landslide located in the Three Gorges Reservoir area as a case study, the proposed model showed satisfactory performance. Another study [29] investigated a coupling prediction model based on the double moving average (DMA) method and the LSTM network to improve the accuracy of landslide displacement point prediction and quantify the uncertainties associated with the predicted values.
Moreover, three prediction models, namely the LSTM, GRU, and RF, were utilized to estimate the periodic and total accumulated displacement of three step-wise landslides in the Three Gorges Dam Reservoir area [30]. Zhang et al. [31] also developed a novel dynamic model to predict the displacements of step-like landslides. To this end, variational mode decomposition (VMD) was used to decompose the cumulative displacement into stochastic, periodic, and trend components. A polynomial expression with an optimized order fitted the trend displacement, and the bidirectional long short-term memory model dynamically modeled the periodic and stochastic displacements. The experiments demonstrated that the proposed model better predicted the displacement of step-like landslides. In another work [32], a novel prediction model based on the graph convolutional network was derived to estimate the slope deformation; the model considered the spatial correlations between all points in the entire displacement monitoring system. Furthermore, the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) algorithm was employed to divide the total displacement into trend, periodic, and residual terms [33]. Then, a novel model based on the LSTM was established to predict the displacement of landslides.
However, in real-world scenarios, the training data sets for slope displacement are often insufficient, which restricts the application of the above methods. In addition, monitoring equipment can be destroyed under complex external conditions, and monitoring interruptions result in missing data for model training. The transfer learning (TL) method can address these problems because the knowledge acquired from source tasks with sufficient training data can be applied to a target task without enough training data. Therefore, this paper combines the Bi-GRU network, the attention mechanism, and transfer learning to construct a model for predicting the slope displacement of the landslide at Hebei JinYu Zenith Cement Co., Ltd., China. For comparison, several ordinary models (the SVR, LSTM, GRU, Bi-LSTM, Bi-GRU, and Bi-GRU with the attention mechanism) are evaluated alongside the transfer network proposed in this work.

Establishment of the Slope Displacement Prediction Model
Predicting slope displacement can be regarded as a time-series prediction problem, and the Bi-GRU can deeply capture the relationship between the forward and backward characteristics of the displacement data. Handling data bidirectionally in the learning process yields a strong representation of the features and thus a better prediction of the future step. The attention mechanism focuses on only the relevant pieces of information; specifically, the importance of each piece of information is determined by a probability distribution, which suppresses unnecessary information.
During the process of slope deformation, the distribution of displacement monitoring data changes remarkably over time, and the maintenance of monitoring equipment causes discontinuity in the data, resulting in the discontinuity between the historical and newly collected data on displacement. Discontinuity in the training data leads to the deterioration of network prediction performance, so the transfer learning strategy is introduced to solve this problem. The model trained with historical data is used as the source domain model, and the knowledge and skills learned from the source domain model are applied to the new data (the target task) over a long period to assist in predicting new data. In this way, knowledge acquisition is no longer from scratch, which can enhance prediction accuracy to a certain extent. Figure 1 shows the technical method this paper employs to predict the slope displacement accurately. First, the collected displacement data are divided into the source and target data. Afterward, a displacement prediction model based on the Bi-GRU and the attention mechanism (denoted as Bi-GRU-ATT) is constructed, and model training is conducted using the source data. Then, a small amount of target data is used to fine-tune the prediction model so as to obtain the target model. After that, the prediction performance of the model is evaluated using the testing data set. Finally, the prediction results of the model are denormalized to estimate the displacement of the landslide. Figure 2 illustrates the model structure diagram, comprising the input layer, the Bi-GRU layer, the attention layer, the fully connected layer, and the output layer.
Table 1: Characteristics of some models for predicting slope deformation.

| Model | Input characteristics | Limitations |
| --- | --- | --- |
| Statistical model | The correlation analysis of the input series is required, and the order and parameters of the model should be determined in advance. | Since a variety of factors affect landslide deformation, the statistical model cannot reflect the fluctuation in the landslide displacement with the influencing factors. |
| Gray model and related improved models | The original time series should be accumulated to yield the first-order accumulated generating operation (1-AGO) data sequence. | |
| Machine-learning-based models: extreme learning machine model, support-vector machine (SVM) model, neural network model | The inputs are usually the selected inducing factors or features. | Static fitting is performed on historical data, which cannot reflect the nature of the dynamic evolution of landslides. |
| Deep-learning-based models, such as long short-term memory (LSTM) and gated recurrent unit (GRU) models | The input can be original displacement series and various influencing factors. | When the sample data are limited, overfitting can easily occur, resulting in poor prediction accuracy. |

3.1. Bidirectional Gated Recurrent Unit. The recurrent neural network (RNN) focuses on time-series problems because of its advantages in handling sequence dependence. However, due to exploding and vanishing gradients, there is a length limitation when applying the RNN algorithm [34]. Subsequently, variants of the RNN, such as the LSTM and GRU neural networks, were proposed to deal with long-sequence prediction problems. The GRU retains some properties of the LSTM and simplifies the structure: it realizes information forgetting and memorizing with the same gate, thereby giving rise to fewer parameters and faster convergence [35]. Figure 3 depicts the structure of the gated recurrent unit.
The GRU model consists of two gates, the reset gate and the update gate. The update gate controls the previous information that will be carried over to the current layer, while the reset gate decides the amount of information to forget. As seen in Figure 3, the GRU transmits the input state $x_t$ at the current moment and the hidden layer's output state $h_{t-1}$ at the previous moment to the reset gate, which determines how much memory information is saved. It can be formulated as
$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r) \tag{1}$$
where $r_t$ is the output of the reset gate, $\sigma$ is the logistic sigmoid function, and $x_t$ and $h_{t-1}$ are the input and the previous hidden state, respectively. $W_r$ and $U_r$ are learned weight matrices, and $b_r$ is the bias. Similarly, the output of the update gate is computed by
$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z) \tag{2}$$
The new memory state $\tilde{h}_t$ can be computed by
$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \circ h_{t-1}) + b_h) \tag{3}$$
The final hidden state $h_t$ can be computed by
$$h_t = z_t \circ h_{t-1} + (1 - z_t) \circ \tilde{h}_t \tag{4}$$
where $\circ$ represents the Hadamard product, and $W_z$, $U_z$, $b_z$, $W_h$, $U_h$, and $b_h$ are the corresponding weights and biases. The gating signal $z_t$ ranges from 0 to 1. The closer the gating signal is to 1, the more data will be memorized, whereas the closer to 0, the more forgotten. The unidirectional GRU model uses previous information to predict follow-up information, while the bidirectional GRU can comprehensively learn the time-correlated information in both forward and backward directions simultaneously to improve prediction accuracy [36]. The Bi-GRU extends the unidirectional GRU by introducing a second layer, as shown in Figure 4.
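The gate computations described above can be illustrated with a minimal pure-Python sketch of a single GRU time step. The parameter names (`W_r`, `U_r`, `b_r`, and so on), the random initialization, and the toy inputs are assumptions made for illustration, as is the update convention in which a gate value near 1 keeps the previous state; this is not the authors' implementation.

```python
import math
import random


def sigmoid(values):
    """Logistic sigmoid applied element-wise to a list."""
    return [1.0 / (1.0 + math.exp(-a)) for a in values]


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def gru_step(x, h_prev, p):
    """One GRU time step; p maps hypothetical parameter names to values."""
    r = sigmoid(affine(p["W_r"], x, p["U_r"], h_prev, p["b_r"]))  # reset gate
    z = sigmoid(affine(p["W_z"], x, p["U_z"], h_prev, p["b_z"]))  # update gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]                # r * h_{t-1}
    cand = [math.tanh(a) for a in affine(p["W_h"], x, p["U_h"], gated, p["b_h"])]
    # Assumed convention: z near 1 keeps the previous state, z near 0 replaces it.
    return [zi * hp + (1.0 - zi) * c for zi, hp, c in zip(z, h_prev, cand)]


def random_params(n_in, n_hid, seed=0):
    """Small random weights and zero biases for the three GRU blocks."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for gate in ("r", "z", "h"):
        p["W_" + gate] = matrix(n_hid, n_in)
        p["U_" + gate] = matrix(n_hid, n_hid)
        p["b_" + gate] = [0.0] * n_hid
    return p


params = random_params(n_in=1, n_hid=3)
h = [0.0, 0.0, 0.0]
for x_t in [0.1, 0.2, 0.35, 0.5]:  # toy normalized displacement values
    h = gru_step([x_t], h, params)
```

Because the new state is a convex combination of the previous state and a `tanh` output, every component of `h` stays strictly between -1 and 1 when the initial state is zero.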
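In the horizontal direction, the forward GRU hidden vector ($\overrightarrow{h}_t$) and the backward GRU hidden vector ($\overleftarrow{h}_t$) are calculated simultaneously at each time step $t$. The vertical direction represents a unidirectional flow from the input layer to the hidden layer and then to the output layer. The final prediction of the Bi-GRU is obtained by connecting the two hidden states, as described below:
$$\overrightarrow{h}_t = \mathrm{GRU}(x_t, \overrightarrow{h}_{t-1}) \tag{5}$$
$$\overleftarrow{h}_t = \mathrm{GRU}(x_t, \overleftarrow{h}_{t+1}) \tag{6}$$
$$y_t = W_t \overrightarrow{h}_t + V_t \overleftarrow{h}_t + b_y \tag{7}$$
where $\mathrm{GRU}(\cdot)$ represents the GRU function, $W_t$ and $V_t$ are the weights of the forward GRU and the backward GRU, respectively, and $b_y$ is the bias of the output layer.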
Predicting the displacement of a slope can be considered a regression problem related to time series. The bidirectional structure offers the ability to look into the displacement series in two opposite directions, which allows acquiring the full knowledge of past and future statuses for better prediction results.
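As a rough illustration of the bidirectional structure, the sketch below runs one GRU over the series in the forward direction and a second GRU over the reversed series, then combines the two hidden states at each time step. To keep it short, the hidden size is 1 (all weights are scalars); the parameter values, the helper names `run_gru` and `bi_gru`, and the update convention are assumptions for illustration only.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(x, h_prev, p):
    """Scalar GRU step (hidden size 1) to keep the sketch short."""
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])
    cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    return z * h_prev + (1.0 - z) * cand  # assumed update convention


def run_gru(xs, p):
    """Return the hidden state after every time step, in the order of xs."""
    h, states = 0.0, []
    for x in xs:
        h = gru_step(x, h, p)
        states.append(h)
    return states


def bi_gru(xs, p_fwd, p_bwd, w_out, v_out, b_out):
    """Combine forward and backward hidden states at each time step."""
    h_fwd = run_gru(xs, p_fwd)              # reads x_1 ... x_n
    h_bwd = run_gru(xs[::-1], p_bwd)[::-1]  # reads x_n ... x_1, re-aligned to t
    return [w_out * f + v_out * b + b_out for f, b in zip(h_fwd, h_bwd)]


# Hypothetical parameter values, chosen only for illustration.
p_fwd = dict(w_r=0.5, u_r=0.1, b_r=0.0, w_z=0.4, u_z=0.2, b_z=0.0,
             w_h=0.9, u_h=0.3, b_h=0.0)
p_bwd = dict(w_r=0.3, u_r=0.2, b_r=0.0, w_z=0.6, u_z=0.1, b_z=0.0,
             w_h=0.7, u_h=0.4, b_h=0.0)
series = [0.10, 0.15, 0.22, 0.30, 0.41]  # toy normalized displacements
y = bi_gru(series, p_fwd, p_bwd, w_out=1.0, v_out=1.0, b_out=0.0)
```

In this sketch, the output at the first time step depends on the last input only through the backward pass, which is the sense in which the bidirectional structure sees the series from both ends.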

3.2. Attention Mechanism.
Generally, the GRU model can effectively avoid vanishing gradients; however, when the time series is too long, the model may still suffer from them. Therefore, an attention mechanism, one of the most influential ideas in the field of deep learning, is introduced to reduce the loss of historical information and strengthen the influence of important information. The attention mechanism assigns different weights to the outputs of the Bi-GRU hidden layer, which highlights the critical information, thereby optimizing the output value and improving the prediction accuracy. Figure 5 depicts a schematic of the attention mechanism. The attention mechanism can simply be a weighted summation [37]. It starts by calculating the importance of the input information; then, the softmax function is utilized to ensure that the sum of all the weights equals 1.0. The weights and the input information are multiplied correspondingly and eventually summed to obtain the output. The relevant equations are as follows:
$$u_t = \tanh(W_d y_t + b_d) \tag{8}$$
$$\alpha_t = \frac{\exp(u_t)}{\sum_{j=1}^{n} \exp(u_j)} \tag{9}$$
$$s = \sum_{t=1}^{n} \alpha_t y_t \tag{10}$$
where $n$ is the total number of time steps of the input sequence, $\alpha_t$ is the weight calculated for each state $u_t$ at each time step, $W_d$ is the weight matrix, $b_d$ is the weight offset, $y_t$ is the hidden state of the Bi-GRU, and $s$ is the output of the attention layer.
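The weighted summation can be sketched in a few lines of Python. The scoring function (a `tanh` of a weighted sum of each hidden state), the names `w_d` and `b_d`, and the toy hidden states are assumptions for illustration; the sketch only shows the three steps described above: score, softmax, weighted sum.

```python
import math


def attention(states, w_d, b_d):
    """Weighted sum of hidden states with softmax-normalized scores.

    states: list of hidden-state vectors, one list per time step.
    w_d, b_d: hypothetical scoring weights; score = tanh(w_d . y_t + b_d).
    """
    scores = [math.tanh(sum(w * y for w, y in zip(w_d, y_t)) + b_d)
              for y_t in states]
    peak = max(scores)                        # subtract the max for stability
    exps = [math.exp(u - peak) for u in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]        # weights sum to 1.0
    dim = len(states[0])
    context = [sum(a * y_t[k] for a, y_t in zip(alphas, states))
               for k in range(dim)]
    return alphas, context


states = [[0.1, -0.2], [0.4, 0.3], [0.9, 0.8]]  # toy Bi-GRU outputs
alphas, context = attention(states, w_d=[1.0, 1.0], b_d=0.0)
```

With these toy values, the scores increase from the first to the last time step, so the last hidden state receives the largest weight and contributes most to `context`.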

3.3. Transfer Learning.
Transfer learning is a primary method for solving the problem of insufficient labeled data in machine learning [38]. This method relaxes the assumption that the training and testing data sets are identically distributed so as to transfer knowledge from the source domain to the target domain. After the network based on the Bi-GRU and the attention mechanism is established, it is trained on the source data set to obtain the source model. Then, the target data set is employed to fine-tune the fully connected layer of the source model so as to obtain the target model for predicting small data sets. Figure 6 shows the specific process. Deep learning prediction models are prone to overfitting on small data sets, and their prediction accuracy is then low [21,39]. On the other hand, obtaining a perfect data set is expensive and time-consuming. Moreover, the collected data on slope displacement are usually insufficient, and the data recorded when the slope is close to sliding are limited, which affects the accuracy of early warning. Transfer learning addresses this fundamental problem of insufficient training data. Therefore, given the scarcity of effective sample data over a long period of slope deformation, we introduce a transfer learning algorithm to predict slope displacement with limited sample data.
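The fine-tuning step can be sketched as follows, under strong simplifying assumptions: a fixed `tanh` feature map stands in for the pre-trained Bi-GRU and attention layers, a linear head stands in for the fully connected layer, and plain stochastic gradient descent updates only the head. All names, weights, and data values are hypothetical and serve only to show which parameters are frozen and which are updated.

```python
import math


def features(window, frozen):
    """Frozen feature extractor standing in for the pre-trained layers."""
    return [math.tanh(sum(w * x for w, x in zip(row, window))) for row in frozen]


def predict(window, frozen, head):
    f = features(window, frozen)
    return sum(w * v for w, v in zip(head["w"], f)) + head["b"]


def mse(data, frozen, head):
    return sum((predict(x, frozen, head) - y) ** 2 for x, y in data) / len(data)


def fine_tune(data, frozen, head, lr=0.1, epochs=200):
    """Update only the fully connected head; frozen weights are never touched."""
    head = {"w": list(head["w"]), "b": head["b"]}  # copy, keep source head intact
    for _ in range(epochs):
        for window, target in data:
            f = features(window, frozen)
            err = sum(w * v for w, v in zip(head["w"], f)) + head["b"] - target
            head["w"] = [w - lr * err * v for w, v in zip(head["w"], f)]
            head["b"] -= lr * err
    return head


frozen = [[0.5, 0.5], [1.0, -1.0]]         # stands in for pre-trained layers
source_head = {"w": [1.0, 0.0], "b": 0.0}  # stands in for the source-trained head
target_data = [                            # toy target samples, not site data
    ([0.50, 0.52], 0.55),
    ([0.52, 0.55], 0.59),
    ([0.55, 0.59], 0.64),
    ([0.59, 0.64], 0.70),
]
before = mse(target_data, frozen, source_head)
target_head = fine_tune(target_data, frozen, source_head)
after = mse(target_data, frozen, target_head)
```

Starting from the source head rather than from random weights is what gives the target model its initialization; only the small head has to adapt to the few target samples.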

Case Study
4.1. Project Description. The case study area is the slope of the JinYu Cement Plant in the Luquan District, Shijiazhuang, China. The JinYu Cement Plant established a highly integrated global navigation satellite system (GNSS) online monitoring system, realizing the comprehensive network monitoring of the slope with high efficiency, sensitivity, quick response, and reliable operation, to improve the slope monitoring and the emergency response level. Figure 7 shows the location and actual view of the slope studied, and Figure 8 depicts the layout of the monitoring points and installed GNSS monitoring equipment.
Nine monitoring points are arranged on this slope, as shown in Figure 8. Table 2 presents some of the data collected by monitoring point G112 in 2019, where X, Y, and Z represent the displacement in the north, east, and vertical directions of the coordinate system, respectively. A negative value indicates the opposite direction.

4.2. Data Preprocessing. The experiment uses 798 displacement records obtained from monitoring point G112 from 2019 to 2021. The first 648 records, from June 2019 to January 2021, are used as the source data. Because of the equipment overhaul, the 150 records collected afterward, from March 2021 to July 2021, are used as the target data. For target model training, the first 110 records of the target data set are selected as the training data set. After the model is fine-tuned, the last 40 records are used for testing.
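The chronological split described above can be sketched as follows. The record counts (648, 110, and 40) come from the text; the function names, the placeholder values, and the look-back window length of 5 are assumptions for illustration.

```python
def split_displacement(series, n_source=648, n_target_train=110, n_target_test=40):
    """Split a chronologically ordered series into source, fine-tuning, and test parts."""
    expected = n_source + n_target_train + n_target_test
    if len(series) != expected:
        raise ValueError(f"expected {expected} records, got {len(series)}")
    source = series[:n_source]
    target = series[n_source:]
    return source, target[:n_target_train], target[n_target_train:]


def make_windows(values, lookback):
    """Turn a series into (input window, next value) pairs for supervised training."""
    return [(values[i:i + lookback], values[i + lookback])
            for i in range(len(values) - lookback)]


records = [0.01 * i for i in range(798)]  # placeholder values, not the G112 data
source, target_train, target_test = split_displacement(records)
pairs = make_windows(source, lookback=5)  # lookback of 5 is an assumed value
```

Keeping the split chronological, rather than shuffling, ensures that the test records come strictly after the records used for training and fine-tuning.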

Journal of Sensors
To improve the training speed and prediction accuracy, we normalize the collected sample data as follows:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{11}$$
where $x$ is the original data, $x^{*}$ is the normalized data, and $x_{\max}$ and $x_{\min}$ indicate the maximum and the minimum in the sample, respectively. The root-mean-square error (RMSE) and mean absolute error (MAE) are used as the evaluation criteria, which can be expressed by Equations (12) and (13), respectively.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{12}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{13}$$
where $y_i$ denotes the measured accumulative displacement of the slope, $\hat{y}_i$ denotes the predicted accumulative displacement of the slope, and $n$ represents the number of predicted values. The RMSE and MAE reflect the difference between the predicted and measured values and the dispersion degree of the errors. The smaller the RMSE and MAE, the better the model performance.
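Min-max normalization, its inverse, and the two evaluation criteria can be written in a few lines of Python. The function names and the toy values are assumptions for illustration and are not data from the study site.

```python
import math


def min_max_fit(values):
    """Return the minimum and maximum used for scaling."""
    return min(values), max(values)


def normalize(values, lo, hi):
    """Map values to [0, 1] using the fitted minimum and maximum."""
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(values, lo, hi):
    """Invert the min-max scaling to recover displacement units."""
    return [v * (hi - lo) + lo for v in values]


def rmse(measured, predicted):
    n = len(measured)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)


def mae(measured, predicted):
    n = len(measured)
    return sum(abs(y - p) for y, p in zip(measured, predicted)) / n


measured = [2.0, 4.0, 6.0, 8.0]   # toy displacements, not site data
predicted = [2.5, 3.5, 6.0, 9.0]
lo, hi = min_max_fit(measured)
scaled = normalize(measured, lo, hi)
restored = denormalize(scaled, lo, hi)
```

For these toy values, the MAE is 0.5 and the RMSE is the square root of 0.375. The errors are computed after denormalization here, so they are expressed in the same units as the measured displacement.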

4.3. Results and Discussion.
Following the training process described in the previous section, the performance of the ordinary models (the SVR, LSTM, GRU, Bi-GRU, Bi-LSTM, and Bi-GRU with the attention mechanism) and of the transfer network model (the Bi-GRU with the attention mechanism and transfer learning) is evaluated on a testing data set. Since deep learning methods have a degree of randomness, model training and predictions may differ from run to run. Each algorithm is therefore run 100 times, and the mean prediction accuracy is evaluated so as to reduce the uncertainty of the prediction results. For the SVR, the kernel function is linear and the penalty factor (C) is set at 200, which gave the best prediction accuracy. The structural parameters and the training method of the Bi-GRU-ATT (the Bi-GRU model with the attention mechanism) and the TRA-Bi-GRU-ATT (the Bi-GRU model with the attention mechanism and transfer learning) are consistent: both comprise two Bi-GRU layers, with 64 and 128 hidden nodes, respectively. The batch size and the number of epochs are set at 30 and 120, respectively. Since the stochastic gradient descent method maintains a single learning rate for all parameters during training, its convergence is relatively slow. Therefore, the Adam optimization algorithm is adopted to adapt the learning rate for each parameter. Tables 3 and 4 and Figures 9-11 present the comparisons, and Figure 10 shows the absolute prediction errors of the different models.
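The per-parameter adaptation that motivates the choice of Adam can be seen in a minimal pure-Python version of the update rule. The hyper-parameter values are the commonly used defaults, and the toy loss, the function names, and the step count are assumptions for illustration; the training settings of the paper are not reproduced here.

```python
import math


def adam_minimize(grad_fn, params, lr=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=2000):
    """Minimal Adam loop: each parameter gets its own effective step size."""
    params = list(params)
    m = [0.0] * len(params)  # running mean of gradients
    v = [0.0] * len(params)  # running mean of squared gradients
    for t in range(1, steps + 1):
        grads = grad_fn(params)
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


def toy_grad(p):
    """Gradient of the toy loss 50*(a - 1)^2 + 0.5*(b + 2)^2."""
    return [100.0 * (p[0] - 1.0), 1.0 * (p[1] + 2.0)]


solution = adam_minimize(toy_grad, [0.0, 0.0])
```

The two parameters of the toy loss have gradients that differ by two orders of magnitude, yet dividing by the running root-mean-square gradient gives both a comparable step size, and both approach their minimizers (1 and -2).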
The experimental results confirm that the TRA-Bi-GRU-ATT model proposed in this paper achieves the highest prediction accuracy. According to the best values of the RMSE and the MAE, the single prediction accuracy ranks in descending order as TRA-Bi-GRU-ATT > Bi-GRU-ATT > Bi-GRU > Bi-LSTM > GRU > SVR > LSTM. The models can likewise be ranked by mean prediction accuracy according to the mean values of the RMSE and the MAE. Although the TRA-Bi-GRU-ATT model obtains the highest single and mean prediction accuracy, it is less stable than the SVR. Across multiple experiments, once a suitable kernel function is chosen, the prediction accuracy of the SVR remains unchanged. However, because many parameters must be tuned, the prediction stability of the TRA-Bi-GRU-ATT, Bi-GRU-ATT, Bi-GRU, Bi-LSTM, GRU, and LSTM is slightly poorer. Figure 11 shows box plots of the absolute error distributions, comparing the prediction performance of the different models. The absolute error distribution of the TRA-Bi-GRU-ATT model is more concentrated than that of the other models, and its prediction performance is superior to the others. Moreover, it can be concluded that transfer learning enhances the prediction performance of the model by extracting the critical features from the historical displacement sequence to assist with learning new data. It can fully exploit the historical information within a long period and provide a good weight initialization for the training of the slope displacement model.
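The three aspects compared above (best single accuracy, mean accuracy, and stability over repeated runs) can be computed from per-run RMSE values as sketched below. The model names and RMSE values are made up for illustration and are not results from this study; stability is measured here by the standard deviation across runs, which is an assumption.

```python
import statistics


def summarize_runs(rmse_by_model):
    """Return best, mean, and standard deviation of RMSE for each model."""
    return {
        name: {
            "best": min(values),
            "mean": statistics.mean(values),
            "std": statistics.pstdev(values),
        }
        for name, values in rmse_by_model.items()
    }


def rank(summary, key):
    """Model names ordered from lowest (best) to highest value of the given key."""
    return sorted(summary, key=lambda name: summary[name][key])


# Made-up RMSE values for three hypothetical models; not results from this study.
runs = {
    "model_a": [0.30, 0.30, 0.30, 0.30],
    "model_b": [0.12, 0.40, 0.25, 0.35],
    "model_c": [0.15, 0.18, 0.16, 0.19],
}
summary = summarize_runs(runs)
by_best = rank(summary, "best")
by_mean = rank(summary, "mean")
by_stability = rank(summary, "std")
```

In this toy example, each of the three rankings is led by a different model, which illustrates why the best value, the mean, and the spread are reported separately.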

Conclusions and Future Perspectives
This work developed a predictive model combining transfer learning and a deep neural network, where the deep neural network adopted the Bi-GRU to extract the forward and backward information from the analyzed displacement series. The attention mechanism was introduced to highlight the critical information and reduce the loss of historical information. Due to complex external conditions, monitoring systems can be destroyed, and monitoring interruptions result in missing data for model training. To address this problem, we introduced transfer learning, which employs historical data to assist with learning new data for displacement prediction. Taking the slope of the JinYu Cement Plant as an example, the experimental results demonstrated that the proposed model offered a higher degree of accuracy in predicting the slope displacement and had the potential to estimate the displacement of landslides, particularly when sample data are limited.
The shortcoming of the developed model was that the displacement prediction curve showed evolution characteristics from a single monitoring point. However, there were nine monitoring points, and the spatial correlations between them were overlooked. Thus, it was difficult to reveal the displacement changes in the entire monitoring system. In the future, we plan to investigate the correlation between the displacement series of different monitoring points and devise a spatiotemporal prediction method based on time-series data from an entire monitoring system containing multiple points. Furthermore, since machine learning methods have a degree of randomness, model training and predictions may differ between runs. Ref. [10] compared the performance of five popular machine learning methods, and the experimental results showed that no method achieved the best results in all three aspects: the highest single accuracy, the mean accuracy, and the stability. Therefore, improving the stability of the prediction model while ensuring its accuracy can be one of the future research focuses.

Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.