Analysis of Diaphragm Wall Deflection Induced by Excavation Based on Machine Learning

For excavations supported by concrete diaphragm walls (CDWs), excessive wall deflection may pose a risk to adjacent structures and utilities in urban areas. It is therefore important to predict CDW deformation accurately and efficiently. This paper investigates three machine learning algorithms, namely, back-propagation neural network (BPNN), long short-term memory (LSTM), and gated recurrent unit (GRU), to predict excavation-induced CDW deflection. A database of field measurements collected from an excavation project in Suzhou, China, is used to verify the proposed models. The results show that GRU exhibits lower prediction errors and better robustness in 10-fold cross validation than BPNN and requires less computational time than LSTM. Therefore, GRU is the most suitable algorithm for CDW deflection prediction considering both effectiveness and efficiency, and the predicted results can support safety monitoring and early warning strategies on the construction site.


Introduction
In recent years, metro construction in China has developed rapidly. The metro stations are generally constructed using the cut and cover method, and the concrete diaphragm wall (CDW) is one of the most widely used support techniques, especially for deep excavation in saturated soils. The lateral deformation of the CDW caused by excavation is a major concern to both engineers and researchers, as it poses potential risks to the surrounding facilities and structures, especially in urban areas [1][2][3][4]. Therefore, accurate and prompt prediction of CDW deflection is essential in engineering practice.
Traditionally, methods to predict excavation-induced CDW deformation fall into two groups: empirical formulas and numerical simulation. Empirical formulas based on historic projects are relatively simple and easy to apply [5][6][7][8], but the predicted results tend to be broad, and the formulas cannot represent the dynamic evolution of wall deflection. Elaborate numerical simulation is theoretically more precise because it considers the soil-structure interaction, but it is still difficult to take all the intrinsic and extrinsic factors into account, and a disparity between estimated results and field measurements frequently occurs [9,10].
Alternatively, soft computing techniques such as machine learning (ML) are fast becoming widely accepted for predictive models in geotechnical applications, as they can capture high-dimensional nonlinear characteristics and have demonstrated superior predictive ability [11]. The artificial neural network (ANN) is one of the most prevalent ML algorithms used in geotechnical engineering. Goh et al. [12] presented a back-propagation neural network (BPNN) model to provide initial estimates of maximum wall deflections for braced excavations in soft clay. Kung et al. [13] demonstrated that the wall deflection can be accurately predicted by ANN, using hypothetical cases from finite element simulation for model training and 12 excavation case histories in Taipei for model validation. Zhang et al. [14] employed ensemble learning methods (ELMs), including eXtreme Gradient Boosting (XGBoost) and Random Forest Regression (RFR), to predict the maximum lateral wall deformation.
Deep learning (DL) algorithms, considered as a subset or an evolution of ML, have deeper structure and can learn much more complex nonlinear features than conventional neural networks. For example, long short-term memory (LSTM) has achieved good practical application results in the dynamic and deep processing of massive, long-term, dependent data series [15,16]. Qu et al. [17] established the concrete dam deformation prediction model based on LSTM. Li et al. [18] developed an LSTM model to predict the TBM performances including the total thrust and the cutter-head torque in a real-time manner. In recent years, gated recurrent unit (GRU) has been successfully applied to spatial-temporal data and has been quite popular in many fields. Khan et al. [19] adopted a GRU-based deep learning approach to predict hourly traffic volume. Li et al. [20] proposed a prediction model utilizing GRU algorithm for the electricity generation. As for the evaluation of retaining structure behaviors, few studies have considered the data-driven models using DL algorithms.
This paper proposes a dynamic prediction model for CDW deflection based on data-mining algorithms. The applicability and generality of three algorithms, BPNN, LSTM, and GRU, are studied and compared: BPNN represents the classic ML algorithms applicable to predictive problems, whereas LSTM and GRU represent more advanced DL algorithms specialized in sequential data. The algorithm with the best performance is then recommended as a useful solution for predicting excavation-induced CDW deflections, and the predicted results can serve as an early alert for field engineers. The remainder of the paper proceeds as follows: in Section 2, the BPNN, LSTM, and GRU prediction models are presented together with the performance evaluation indicators. In Section 3, a real-life excavation project is adopted to test the applicability of the proposed prediction models. In Section 4, the prediction results of the three prediction models are compared and discussed. Finally, Section 5 summarizes the conclusions of the study and highlights the major findings.

Methodology
2.1. Machine Learning Algorithm
2.1.1. BPNN. BPNN is a classic feedforward neural network consisting of an input layer, one or several hidden layers, and an output layer [21], as shown in Figure 1. Neurons in the input layer receive and transmit data. The hidden layer and the output layer are composed of M-P neurons with activation functions. Each M-P neuron applies the activation function to a weighted sum of its inputs offset by a bias term. In these relations, b_h is the result of the h-th hidden neuron, x_i is the i-th input value out of d inputs, y_j is the j-th output value, q is the total number of hidden neurons, f is the sigmoid activation function, v_hi (input to hidden) and w_hj (hidden to output) are the weight terms, and c_h and θ_j are the bias terms.
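The layer relations described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the paper's implementation: it assumes the biases c_h and θ_j are added to the weighted sums, and the function name bpnn_forward and all numeric values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation f."""
    return 1.0 / (1.0 + math.exp(-z))


def bpnn_forward(x, v, c, w, theta):
    """Forward pass of a network with one hidden layer.

    x: the d input values
    v[h][i]: weight from input i to hidden neuron h; c[h]: hidden bias
    w[j][h]: weight from hidden neuron h to output j; theta[j]: output bias
    Assumption: biases are added to the weighted sums.
    """
    b = [sigmoid(sum(v_hi * x_i for v_hi, x_i in zip(v_h, x)) + c_h)
         for v_h, c_h in zip(v, c)]
    y = [sigmoid(sum(w_jh * b_h for w_jh, b_h in zip(w_j, b)) + theta_j)
         for w_j, theta_j in zip(w, theta)]
    return b, y


# Hypothetical example: d = 3 inputs, q = 2 hidden neurons, 1 output.
x = [0.2, 0.5, 0.1]
v = [[0.1, -0.3, 0.8], [0.4, 0.2, -0.5]]
c = [0.0, 0.1]
w = [[0.7, -0.6]]
theta = [0.05]
hidden, output = bpnn_forward(x, v, c, w, theta)
print(hidden, output)
```

Because the sigmoid maps every weighted sum into the interval (0, 1), measured deflections would need to be scaled into that range before training a network of this form.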
2.1.2. LSTM. LSTM is a deep neural network designed for data with sequence characteristics [22]. As shown in Figure 2, there are two states in an LSTM neuron: the cell state C^(t), containing long-term memory, and the hidden state h^(t), containing short-term memory, where new information is selectively recorded using three "gate" modules. The "gate" modules effectively mitigate the vanishing gradient problem in long-term series. Each gate signal ranges from 0 to 1, where the value 0 means abandoning all the input data and the value 1 indicates that all the new information is highly relevant and should be memorized. The "forget gate" f^(t) calculates the forget ratio of the cell state at time t. The "input gate" i^(t) determines the proportion of new information to be added to the cell state. The "output gate" o^(t) calculates the output and updates the hidden state. In the gate relations, C̃^(t) is the candidate cell state of the LSTM neuron at time t; W and b are weight and bias terms; σ is the sigmoid activation function; and tanh is the hyperbolic tangent activation function.
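For reference, the gate relations of a standard LSTM cell can be written as follows. This is the common textbook formulation, given here as an assumption about the form intended above; the subscripted weight and bias names (W_f, b_f, and so on) and the concatenation [h^(t-1), x^(t)] are notational choices rather than the paper's own.

```latex
\begin{aligned}
f^{(t)} &= \sigma\left(W_f\,[h^{(t-1)}, x^{(t)}] + b_f\right) \\
i^{(t)} &= \sigma\left(W_i\,[h^{(t-1)}, x^{(t)}] + b_i\right) \\
\tilde{C}^{(t)} &= \tanh\left(W_C\,[h^{(t-1)}, x^{(t)}] + b_C\right) \\
C^{(t)} &= f^{(t)} \odot C^{(t-1)} + i^{(t)} \odot \tilde{C}^{(t)} \\
o^{(t)} &= \sigma\left(W_o\,[h^{(t-1)}, x^{(t)}] + b_o\right) \\
h^{(t)} &= o^{(t)} \odot \tanh\left(C^{(t)}\right)
\end{aligned}
```

The additive update of C^(t) is what lets gradients pass through many time steps without repeated squashing, which is the usual explanation for the improved behavior on long series.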
2.1.3. GRU. GRU was proposed as a modification for LSTM by Cho et al. [23] in 2014, which was initially used for language models. Figure 3 is the typical structure of GRU neuron, where the "update" gate is designed to combine the functions of "forget" and "input" gates in the LSTM neuron.
The GRU network structure is simplified and fewer parameters need to be trained; therefore, GRU can achieve high prediction accuracy at lower computational cost [24]. The first module is termed the "reset gate" r^(t), which determines the proportion of the last hidden state to be added to the new hidden state and is governed by equation (3), where the superscript t denotes the time sequence, x^(t) is the input at time t, h^(t−1) is the hidden state of the GRU neuron at time t − 1, W is the weight term, and σ is the sigmoid activation function. The second module is called the "update gate" z^(t), which is used to calculate the memorize ratio of the new input, expressed as equation (4). Then, the output of the GRU neuron is obtained via equations (5)-(7), where h̃^(t) is the candidate hidden state.
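The gate logic described above can be sketched as a single time step in plain Python. This is a hedged illustration of the common GRU formulation rather than a transcription of the paper's equations (3)-(7): bias terms are omitted, the update gate z is assumed to weight the candidate state (some references use the opposite convention), and the function names and toy weights are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def gru_step(x_t, h_prev, W_r, W_z, W_h):
    """One GRU time step without bias terms.

    Each weight matrix acts on the concatenation [h_prev, x_t].
    Assumption: the update gate z weights the candidate state.
    """
    hx = h_prev + x_t                                    # [h^(t-1), x^(t)]
    r = [sigmoid(a) for a in matvec(W_r, hx)]            # reset gate
    z = [sigmoid(a) for a in matvec(W_z, hx)]            # update gate
    rh_x = [r_k * h_k for r_k, h_k in zip(r, h_prev)] + x_t
    h_cand = [math.tanh(a) for a in matvec(W_h, rh_x)]   # candidate state
    return [(1.0 - z_k) * h_k + z_k * c_k
            for z_k, h_k, c_k in zip(z, h_prev, h_cand)]


# Hypothetical example: hidden size 2, input size 1, three time steps.
W_r = [[0.1, 0.2, 0.3], [-0.2, 0.1, 0.4]]
W_z = [[0.3, -0.1, 0.2], [0.2, 0.2, -0.3]]
W_h = [[0.5, -0.4, 0.6], [0.1, 0.3, 0.2]]
h = [0.0, 0.0]
for x_t in ([0.1], [0.4], [0.9]):
    h = gru_step(x_t, h, W_r, W_z, W_h)
print(h)
```

Compared with an LSTM cell, this step carries one state vector instead of two and uses three weight matrices instead of four, which is where the reduction in trainable parameters comes from.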

Performance Analysis. Prediction error inevitably exists between the output value y_i and the real value ŷ_i for each training sample. The prediction error can be evaluated through loss functions, which are used both to update the parameters of the neural network and to assess the accuracy of the prediction models. Three commonly used loss functions are summarized in Table 1.
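The three error measures named later in the text (MSE, MAE, and MAPE) can be computed as in the sketch below, assuming these are the loss functions of Table 1 in their usual definitions. The function names and the sample values are illustrative only and are not data from the project.

```python
def mse(measured, predicted):
    """Mean squared error, in squared units of the data (e.g. mm^2)."""
    n = len(measured)
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n


def mae(measured, predicted):
    """Mean absolute error, in the units of the data (e.g. mm)."""
    n = len(measured)
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / n


def mape(measured, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when a measured value is zero.
    """
    n = len(measured)
    return 100.0 * sum(abs((m - p) / m) for m, p in zip(measured, predicted)) / n


# Made-up deflections in mm, for illustration only.
measured = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
print(mse(measured, predicted), mae(measured, predicted), mape(measured, predicted))
```

MSE penalizes large errors more heavily than MAE, while MAPE is scale-free but becomes unstable when measured deflections are close to zero, as they are early in an excavation.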

Development of Prediction Models
2.2.1. Inputs and Outputs. Lateral displacements of the CDW were measured by inclinometers embedded in the wall. The observation value collected from each inclinometer is denoted as x_i^t, indicating the deformation value of measuring point i on day t. During construction, the inclinometers might be covered or interrupted, leading to discontinuity of the recorded time series. In the data preprocessing stage, the missing values can be filled by linear interpolation or other data augmentation methods. The input layer and output layer of the prediction model are listed in Table 2. The input information length N and output prediction step M directly affect the training speed and prediction accuracy. Generally, the richer the input information, the higher the prediction accuracy and the longer the training time. However, the continuity of measured data might be restricted by the actual conditions of the construction site. Long-term prediction can provide plenty of time for precaution and implementation of deformation control measures, but whether the prediction accuracy is acceptable needs to be discussed. Therefore, four prediction tasks are designed to verify the dynamic prediction ability of the BPNN, LSTM, and GRU models for CDW deformation prediction. These tasks differ in prediction time span, that is, short-term and long-term predictions, and in input information, that is, prediction using abundant information and prediction using limited information. The four tasks are as follows:
Task 1: 1-day deformation prediction using the last 3 days of monitored data
Task 2: 7-day deformation prediction using the last 3 days of monitored data
Task 3: 1-day deformation prediction using the last 15 days of monitored data
Task 4: 7-day deformation prediction using the last 15 days of monitored data

Figure 1: Schematic view of BPNN (input layer, hidden layer, and output layer).
Mathematical Problems in Engineering 3

2.2.2. Optimization. Training epochs, learning rate, and number of hidden neurons were the three hyperparameters set manually before training. Training with too few epochs may lead to insufficient learning and incomplete extraction of data characteristics, termed "underfitting," whereas too many epochs lead to "overfitting," where prediction performance on validation data is poor even though the prediction accuracy on training data has improved greatly. A small learning rate is time-consuming, but with a large learning rate the parameters may change too fast and miss the best convergence point. Too few hidden neurons may limit the learning ability of the prediction model, whereas a large number of hidden neurons may lead to computational inefficiency. The weight and bias parameters of the prediction models are obtained through autonomous learning from the training samples. Since the MAE and MAPE losses are not smooth where the error is close to 0, the MSE loss is the most commonly used loss function in the training process. For BPNN, stochastic gradient descent (SGD) is applied to minimize the MSE loss of the training samples and update the parameters. As for LSTM and GRU, the deep network architecture leads to difficulties in parameter optimization; therefore, adaptive moment estimation (ADAM) is employed. Compared with the SGD algorithm, the ADAM algorithm sets adaptive learning rates for different parameters and has a better chance of reaching the global optimum by considering additional gradient moments.
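The sample construction of Section 2.2.1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' preprocessing code: missing readings are marked as None, the first and last readings are present, and each sample pairs the last N readings with the single reading M days ahead. The names fill_missing and make_samples and the numbers are hypothetical.

```python
def fill_missing(series):
    """Fill None entries by linear interpolation between known neighbours.

    Assumption: the first and last entries of the series are known.
    """
    filled = list(series)
    known = [k for k, value in enumerate(filled) if value is not None]
    for a, b in zip(known, known[1:]):
        for k in range(a + 1, b):
            frac = (k - a) / (b - a)
            filled[k] = filled[a] + frac * (filled[b] - filled[a])
    return filled


def make_samples(series, n_in, m_step):
    """Pair the last n_in observations with the value m_step days ahead."""
    samples = []
    for end in range(n_in, len(series) - m_step + 1):
        window = series[end - n_in:end]      # days t-N+1 .. t
        target = series[end + m_step - 1]    # day t+M
        samples.append((window, target))
    return samples


# Made-up daily deflections (mm) with two missing readings.
raw = [1.0, None, 3.0, 4.0, 5.0, None, 7.0, 8.0]
filled = fill_missing(raw)
short_term = make_samples(filled, n_in=3, m_step=1)
long_term = make_samples(filled, n_in=3, m_step=2)
print(filled)
print(short_term[0], long_term[0])
```

For a fixed record length, a larger N or M leaves fewer usable samples, which is one practical cost of the long-input and long-horizon tasks.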

Generalization. The prediction model usually shows an excellent regression level on training sets, but it is more practically significant to achieve sound performance on datasets outside the training samples. This paper adopts the K-fold cross validation (K-CV) method to evaluate the generalization ability of the prediction model. K-CV is a data partition technique in which the original database is randomly divided into K subdatasets, and K-1 subdatasets are used as the training set with the remaining subdataset used for testing [25]. The process is repeated K times so that each sample is used for both training and testing. Therefore, the effect of the randomness of database division is mitigated, and the distribution characteristics of the original database are preserved to the maximum degree.
The prediction performance of each model is determined by the average MSE of the K validation sets, and the model with the lowest MSE is chosen as the most appropriate one for CDW deformation prediction.
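The K-CV procedure described above can be sketched in plain Python as follows. This is a minimal illustration with a toy "model" that predicts the mean training target; the helper names k_fold_indices and cross_validate, the fixed seed, and the sample data are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for K-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # random partition
    folds = [indices[i::k] for i in range(k)]    # K nearly equal subdatasets
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
        yield train_idx, test_idx


def cross_validate(samples, k, fit, score):
    """Average validation score (e.g. MSE) over the K folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train_idx])
        scores.append(score(model, [samples[i] for i in test_idx]))
    return sum(scores) / k


def fit_mean(train):
    """Toy model: always predict the mean training target."""
    return sum(target for _, target in train) / len(train)


def score_mse(model, test):
    """MSE of the constant prediction on the held-out fold."""
    return sum((target - model) ** 2 for _, target in test) / len(test)


# Made-up (window, target) samples, for illustration only.
samples = [([float(t)], float(t % 3)) for t in range(30)]
print(cross_validate(samples, 10, fit_mean, score_mse))
```

The three prediction models would be compared by calling cross_validate with the same folds and ranking them by the returned average MSE.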

Model Development. Python (version 3.7.6) and the PyTorch machine learning library are used to program the CDW deformation prediction models. Figure 4 illustrates the prediction process: first, the database is preprocessed and divided into training sets and testing sets based on 10-fold CV; second, BPNN, LSTM, and GRU are applied to learn the training samples and obtain the optimal network parameters; finally, the performances of BPNN, LSTM, and GRU are compared using the evaluation indices, and the most suitable prediction model is identified accordingly.

Data Sources
3.1. Project Overview. Figure 5 shows the excavation layout of a subway station on Suzhou Metro Line 5, China. The excavation area is divided into Area I and Area II by the installation of 1 m thick CDWs. The layout of Area I is irregular, with a length of 87 m; its width is 27 m at the west and increases to 36 m at the east. The layout of Area II is virtually rectangular, with a width of 24 m and a length of 104 m. As shown in Figure 5, the inclinometers were installed along the periphery of the excavation zone. There were 9 inclinometers installed in the CDWs with a spacing of roughly 20 m in Area I, while 14 inclinometers were installed in the CDWs in Area II. The monitoring spacing was reduced to 11 m in the north of Area II, where buildings are adjacent to the CDWs with a minimum distance of 1.7 m, leading to an extremely sensitive construction environment. Figure 6 shows the typical cross-sectional profiles of excavation Area I and Area II. For Area I (Figure 6(a)), the embedment ratio is roughly 1.0. In Figure 6(b), similar geological conditions are found for the excavation in Area II. Excavation Area II has a depth of 24 m and is supported by 45 m long CDWs with an embedment ratio of 0.875. The parameters of the subsurface strata are listed in Table 3, as provided by the geotechnical data report (GDR) of the project. During the excavation, six levels of struts, comprising three concrete struts and three steel struts, were installed. The excavation procedure is detailed in Table 4.

Table 2: Input layer and output layer of the prediction model.

In Situ Monitoring. The monitoring report was recorded daily from August 8, 2018, to July 9, 2019 (336 daily records for each inclinometer). The size of the CDW deformation database after missing data imputation is 7728. Figure 7 plots the time curves of the maximum lateral displacement of the CDW during excavation. The plot shows that the development of wall deflection at each inclinometer exhibits a similar pattern: in the initial stage of construction, owing to the small excavation depth and the support of the first reinforced concrete strut, the deflection is relatively small; as the excavation proceeds, the deflections continuously increase, and substantial deflection occurs during the exposure time without supports, accounting for more than 60% of the total deformation; after the bottom slab is poured, the wall deflection stabilizes as a result of the synergistic effect of the whole supporting system.

10-Fold CV. The influence of data input size N on prediction accuracy can be validated by comparing the results of task 1 and task 3 and the results of task 2 and task 4.

The performance of short-term predictions (M = 1) and long-term predictions (M = 7) can be compared through the results of task 1 and task 2 and the results of task 3 and task 4. As shown in Table 5, when the input information length is fixed to 15 days, long-term predictions using BPNN achieve an average MSE of 12.79 mm², which is double the short-term prediction error. The average MSE of LSTM increases from 3.70 to 6.79 mm², and that of GRU increases from 0.81 to 4.23 mm² when the prediction step extends from 1 day to 7 days. Therefore, it can be inferred that increasing the prediction step has an adverse effect on prediction accuracy, and GRU achieves the highest precision even in long-term prediction tasks because it can extract the correlation of the input data in time sequence.
Table 6 shows the training times of each algorithm on the different tasks. BPNN trains the model more efficiently than the two DL algorithms because it has a simple network structure and is relatively easy to train, but the prediction accuracy and robustness of BPNN are inferior. GRU trains the model slightly faster than LSTM, and this advantage will be amplified when applied to a larger-scale database. Therefore, to balance efficiency and effectiveness, GRU is considered to be the most suitable algorithm for CDW deflection prediction.

New Monitored Points. In order to further verify the generalization ability of the deformation prediction model, the optimal model obtained from 10-fold CV is selected to make predictions at three new measuring points excluded from the original database, namely, CX04, CX16, and CX23. As shown in Figure 5, CX04 is noteworthy as it is at the section with large excavation width in Area I. CX16, located in the center of the southern wall in Area II, and CX23, located in the center of the western wall in Area I, were chosen for verification because the deflections in the middle of the excavation zone are generally much larger than those near the corners. The prediction model performs well on CX04 and CX16. It can be seen from Figures 8 and 9 that the predicted deflections closely fit the measurements in short-term predictions (task 1 and task 3). Long-term predictions (task 2 and task 4) display a higher degree of dispersion, but the MAEs are under 3 mm, which remains acceptable. The prediction results on CX23 deviate from the measured deflections, with MAPE reaching 45.6% in task 4. CX23 is in the transversal wall of excavation Area I, where the geometry and support system are quite different from those at all the other monitoring points in the longitudinal walls; in addition, the measured data are limited owing to the low monitoring frequency at this point. These reasons together lead to the poor prediction performance on CX23. It should therefore be noted that the reliability of the trained model cannot be guaranteed unless the training data are sufficiently similar and sufficient in quantity.

Conclusions
This study established a dynamic prediction method for excavation-induced CDW deformation based on the classic algorithm BPNN and the DL algorithms LSTM and GRU, which can automatically extract the temporal correlation of monitored data. Four prediction tasks are designed to verify the influence of data input size and prediction time span. A database of in situ measurements collected from a real-life excavation project is used to evaluate the applicability of the proposed method. The results show the following:
(1) Considering more historical deformation data not only improves the stability of the model but also reduces the prediction error. Increasing the prediction step reduces precision, but DL algorithms maintain satisfactory performance even in long-term prediction tasks.
(2) DL algorithms outperform BPNN in all prediction tasks with a substantial improvement in accuracy and exhibit less variation and strong robustness in 10-fold CV.
(3) BPNN trains the model much faster but yields less accurate and less robust prediction results. GRU achieves a good balance between effectiveness and efficiency. Therefore, GRU is considered to be the most suitable algorithm for dynamic prediction of CDW deformation.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.