A Data-Driven Method for Predicting the Cutterhead Torque of EPB Shield Machine

This paper studies the prediction of the cutterhead torque of earth pressure balance (EPB) shield machines. First, the idea of dividing shield tunneling into stages is proposed: the process of shield tunneling from start to stop (or pause) is divided into a start-up stage and a stationary driving stage. A change point detection method based on linear regression identifies the separation points between the start-up stage and the stationary driving stage in the original construction data, and the datasets of the two stages are then extracted separately. Next, for the start-up stage, the linear regression method is suggested for cutterhead torque prediction, because key parameters such as the cutterhead torque and the thrust force are strongly linearly correlated. For the stationary driving stage, the key parameters vary smoothly and show obvious inertia, so a long short-term memory (LSTM) network can be used to model the relationship between the cutterhead torque and other key parameters, such as the thrust force. Tests on construction data from the Zhengzhou, Luoyang, and Dalian shield projects show that the proposed segmented modeling method adapts well and that the cutterhead torque prediction model achieves high prediction accuracy.


Introduction
The cutterhead torque is one of the key parameters of a tunnel boring machine (TBM). It keeps the cutterhead rotating and cutting the soil continuously, and it is also a basic parameter for equipment energy efficiency control and safety state monitoring. It is closely related to the size and configuration of the equipment, the geomechanical parameters, the shield operation parameters, etc., so its influencing factors are very complex [1,2]. Estimating the cutterhead torque is an important basis for equipment parameter design and construction control, and it is also a basic guarantee of fast, safe, and efficient tunneling.
The existing literature mainly focuses on the geometric characteristics of the cutterhead components, the loading state, and the soil composition, and establishes models that predict the cutterhead torque through mechanical analysis. Usually, many assumptions and approximations are needed to satisfy the ideal modeling conditions. However, the physical and mechanical properties of soil depend strongly on soil composition, water content, geological structure, and other factors, so current soil strength theories apply only to certain types of soil under specific loading conditions, and in practice there is no accurate constitutive relationship that describes its mechanical properties [3][4][5]. For example, the friction coefficient between the cutterhead and the surrounding soil must be assumed when calculating the cutterhead torque. All these assumptions can lead to inaccurate estimates of the cutterhead torque [6][7][8][9][10].
In recent years, with the rapid development of computer and artificial intelligence technology, various data-driven methods have provided strong support for analyzing huge volumes of engineering data [11][12][13]. In [14], Wang et al. studied the prediction of cutterhead torque based on real data from the Pine diversion and water supply project in Jilin Province, China, using nonlinear support vector regression (NSVR) with the cutterhead torque as the output and several operation parameters as inputs. However, models such as artificial neural networks (ANN), support vector machines (SVM), and random forest models (RFM) are difficult to interpret because of their complex nonlinear structure, and they can be misleading when used to understand the relationship between the response and the predictors [15][16][17][18].
In this paper, based on the periodic characteristics of the shield tunneling process, the data of the start-up and stationary driving stages of the TBM are separated, and two different regression models relating the cutterhead torque to other operation parameters are proposed, according to the different distributions of the data in the two stages. The analysis of actual TBM operation data shows that the proposed data-driven piecewise modeling method for predicting the cutterhead torque has better accuracy and applicability. Specifically, the main contributions of this paper are as follows:
(1) A clear data collection method is proposed; that is, data are collected under similar geological conditions and from the same type of shield machine.
(2) The change point between the start-up stage and the stationary driving stage in an excavation cycle is identified by a change point detection method based on linear regression, and the data are separated into a start-up dataset and a stationary driving dataset.
(3) After denoising and normalization, linear regression models are applied to the start-up dataset, and a long short-term memory (LSTM) recurrent neural network [19] is applied to the stationary driving dataset.
(4) Compared with existing methods, segmented regression models are established, and the proposed method takes into account not only the influence of geological conditions and TBM type on the modeling but also the influence of the different operating states of the TBM.
The rest of this paper is organized as follows. In Section 2, the necessity of data segmentation for shield construction is clarified and a data segmentation method is provided. In Section 3, the modeling methods for the start-up stage and the stationary driving stage are presented. In Section 4, an analysis of real data illustrates the effectiveness of the proposed modeling methods. Finally, Section 5 concludes the paper and points towards future research directions.

Data Extraction and Preprocessing
During shield construction, the TBM successively goes through starting, driving, suspension, driving, suspension, ..., driving, shutdown, etc. The operation parameters clearly have different distributions in the start-up and driving stages. Therefore, to predict the cutterhead torque accurately, it is necessary to separate the data into two sets, corresponding to the start-up stage and the driving stage, respectively.

Necessity for the Separation of Data according to Stages.
The operation parameters show different trends when the TBM works in different stages, such as start-up and stationary driving. Figure 1 shows local curves of the cutterhead rotation speed (CRS), the advance velocity (AV), the total thrust force (TTF), and the cutterhead torque (CT) (sampled once per second) for Ring 550 of the Hui-Shang section of the Line 4 subway tunnel project in Zhengzhou, China. Operation parameters such as CRS, CT, and TTF rise quickly in the start-up stage and then oscillate slightly. TBM operation data lasting at least 120 s from start-up to suspension (or shutdown) are regarded as normal operation data, data lasting less than 120 s from start-up to suspension (or shutdown) are regarded as abnormal operation data, and the data in the suspension or shutdown stage are defined as shutdown data. Normal operation data, abnormal operation data, and shutdown data form continuous segments that appear alternately. As shown in Figure 1, while the shield machine is working, the total thrust force and the advance velocity are never zero at the same time; when both are zero, the shield machine is suspended or shut down. Using this feature, the starting and ending points of each normal operation data segment can be detected, and the normal operation segments that meet the requirements can be extracted. Figure 2 shows typical segments of normal operation data in Ring 550. When the shield machine starts, the key parameters (CT, TTF, AV, and CRS) increase rapidly; after a period (about 100 s) they become stationary and the shield machine enters a relatively steady working state, which is called the stationary driving stage.
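The segment-extraction rule above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: it assumes 1 Hz sampling (so 120 samples correspond to 120 s), treats a sample as "running" whenever the total thrust force and the advance velocity are not both exactly zero, and the function name `extract_normal_segments` is hypothetical. Real sensor data may need a small tolerance instead of an exact zero test.

```python
import numpy as np

def extract_normal_segments(thrust, velocity, min_len=120):
    """Return (start, end) index pairs (end exclusive) of normal operation segments.

    Hypothetical helper. Assumes 1 Hz sampling, so min_len=120 samples ~ 120 s.
    A sample is 'running' unless thrust and advance velocity are both zero.
    """
    thrust = np.asarray(thrust, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    running = ~((thrust == 0) & (velocity == 0))

    # Pad with False so every running run has a rising and a falling edge.
    padded = np.concatenate(([False], running, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first running index of each run
    ends = np.flatnonzero(edges == -1)    # one past the last running index

    # Runs shorter than min_len are 'abnormal operation data' and are dropped.
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_len]


# Toy signal: a 130 s run (kept) and a 50 s run (dropped), separated by stops.
thrust = np.array([0] * 5 + [1] * 130 + [0] * 10 + [1] * 50 + [0] * 5)
velocity = np.zeros_like(thrust)
print(extract_normal_segments(thrust, velocity))  # [(5, 135)]
```

Working with run boundaries (rising and falling edges of a boolean mask) avoids an explicit per-sample loop, which matters for day-long 1 Hz logs.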
Because the operation parameters differ significantly between the start-up stage and the stationary driving stage, the regression model must be established segmentally. All the data are divided into three parts according to the start-up stage, the stationary driving stage, and other stages, where the datasets of the other stages contain data that share the characteristics of neither the start-up stage nor the stationary driving stage. In this way, the segmented models built on the datasets of different stages better characterize the operation parameters in each stage and achieve good prediction accuracy.

Change Point Detection between the Start-Up Stage and Stationary Driving Stage.
Because the operation parameters behave differently in different stages, the change point between the start-up stage and the stationary driving stage can be identified. The data are then divided into the start-up, stationary driving, and other stages, and segmented regression models are established on the start-up dataset and the stationary driving dataset, respectively.
Figure 2 shows that the thrust force follows clearly different trends in the start-up and stationary stages, and the advance velocity and the cutterhead torque behave similarly. Therefore, the change point detection method based on the linear regression model is applied to all three parameters, so that the true change point can be determined comprehensively.
Change point detection based on the linear regression model can be described as follows. Let $(t_j, y_j)$, $1 \le j \le N$, be the observations, where $y_j$ is observed at instant $t_j$, and suppose there exist $m$ ($1 < m < N$) and $a_i, b_i$ ($i = 1, 2$) that attain the minimum of
$$S_m = \sum_{j=1}^{m} (y_j - a_1 - b_1 t_j)^2 + \sum_{j=m+1}^{N} (y_j - a_2 - b_2 t_j)^2.$$
If $a_1 = a_2$ and $b_1 = b_2$ do not both hold, the regression model changes abruptly at $m$, and $m$ is called the change point.
In essence, two linear regression models with different slopes represent the relationship between the operation parameter and time $t$, and the start-up stage and the stationary driving stage are separated by the change point.
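The criterion $S_m$ can be minimized by brute force: for every admissible split $m$, fit an ordinary least-squares line to each side and keep the split with the smallest total residual sum of squares. The Python sketch below is our illustration of that search, not the authors' implementation; the minimum segment length `min_size` and the function names are assumptions, and how the change points found on thrust force, advance velocity, and cutterhead torque are combined is left open here, as in the text.

```python
import numpy as np

def _line_sse(t, y):
    """Residual sum of squares of the least-squares line y ~ a + b*t."""
    design = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def detect_change_point(t, y, min_size=3):
    """Return (m, S_m): the first segment is t[:m], the second is t[m:].

    Brute-force minimization of S_m over m; min_size (assumed) keeps each
    segment long enough for a meaningful line fit.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    best_m, best_s = None, np.inf
    for m in range(min_size, n - min_size + 1):
        s = _line_sse(t[:m], y[:m]) + _line_sse(t[m:], y[m:])
        if s < best_s:
            best_m, best_s = m, s
    return best_m, best_s


# Synthetic "start-up then steady" signal: steep ramp, then a nearly flat line.
t = np.arange(200.0)
y = np.where(t < 80, 2.0 * t, 150.0 + 0.01 * (t - 80))
m, s = detect_change_point(t, y)
print(m)  # 80
```

The search costs two small least-squares fits per candidate split, which is cheap for a start-up window of a few hundred 1 Hz samples.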

Denoising and Normalization.
To reduce the interference of outliers, the 3σ rule is applied: data outside the interval $[\bar{x} - 3\sigma, \bar{x} + 3\sigma]$ are removed, where $\bar{x}$ and $\sigma$ are the sample mean and standard deviation of operation parameter $x$. All operation parameters are also scaled to the interval [0, 1], i.e.,
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the sample of $x$. Then, the Pearson correlation coefficients are calculated and sorted to select the important parameters that are strongly related to the cutterhead torque.
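A compact version of this preprocessing chain might look like the following. It is a sketch under assumptions: rows are dropped when any parameter falls outside its own $\bar{x} \pm 3\sigma$ band (the text does not say whether the rule is applied per parameter or per row), the sample standard deviation uses `ddof=1`, and the helper names are hypothetical.

```python
import numpy as np

def remove_outliers_3sigma(data):
    """Drop rows where any column lies outside mean +/- 3 * sample std."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)
    keep = np.all(np.abs(data - mean) <= 3 * std, axis=1)
    return data[keep]

def min_max_scale(data):
    """Scale each column to [0, 1] with (x - x_min) / (x_max - x_min)."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / (hi - lo)

def rank_by_pearson(data, target, names):
    """Sort parameter names by |Pearson r| with the target (e.g. CT), strongest first."""
    r = {name: np.corrcoef(data[:, k], target)[0, 1] for k, name in enumerate(names)}
    return sorted(names, key=lambda name: abs(r[name]), reverse=True), r


raw = np.column_stack([np.arange(100.0), 2.0 * np.arange(100.0)])
raw[50, 0] = 1e4                      # one gross outlier in the first column
clean = min_max_scale(remove_outliers_3sigma(raw))
print(clean.shape)                    # (99, 2)
```

Outliers are removed before scaling so that a single spike does not compress all other values towards zero.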

Regression Model for Cutterhead Torque on Other Operation Parameters
In this section, we will establish different regression models to capture the relationship between cutterhead torque and other operation parameters based on the start-up dataset and the stationary driving dataset.

Regression Model for Start-Up Stage.
The data collected by the State Key Laboratory of Shield Tunneling Technology of China on the intelligent big data platform for shield tunneling construction contain more than 500 attributes.
The key attributes (or parameters) include CT, the propelling pressures of cylinder groups A, B, C, and D (PPA, PPB, PPC, and PPD), AV, CRS, and the rotation speed of the screw conveyor (RSSC). To reduce the influence of geological conditions and TBM type, construction data from the same type of TBM under similar geological conditions are sampled. For example, we sampled the dataset from Rings 550-556 of the Hui-Shang section of the Line 4 subway tunnel project in Zhengzhou, China, where the main geology is fine sand, the standard penetration test value is 23.1, and the TBM diameter is 6.8 m. After normalizing the start-up dataset, we computed the correlation coefficient matrix of the selected operation parameters; the results are listed in Table 1.
The correlation coefficient between the variables (parameters) $x_i$ and $x_j$ is defined as
$$\mathrm{Cor}(x_i, x_j) = \frac{\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)^2}\,\sqrt{\sum_{k=1}^{n} (x_{jk} - \bar{x}_j)^2}},$$
where $x_{ik}$ is the $k$th observation of $x_i$ and $n$ is the sample size. A larger $|\mathrm{Cor}(x_i, x_j)|$ means a stronger linear relationship between $x_i$ and $x_j$. Table 1 shows that the cutterhead torque has a strong linear relationship with the advance velocity, the rotation speed of the screw conveyor, and the propelling pressures of the four cylinder groups. Thus, the following linear regression model is used to predict the cutterhead torque:
$$y = \beta_0 + \sum_{i=1}^{7} \beta_i x_i + \varepsilon,$$
where the response $y$ is the cutterhead torque, $x_i$ ($i = 1, 2, \ldots, 7$) are the operation variables, and $\varepsilon$ is the random error. The coefficients $\beta_i$ ($i = 0, 1, \ldots, 7$) can easily be estimated by the least squares method, and the estimators are denoted by $\hat{\beta}_i$. Consequently, given the operation parameters $x_i$ ($i = 1, 2, \ldots, 7$), the cutterhead torque is predicted as $\hat{y} = \hat{\beta}_0 + \sum_{i=1}^{7} \hat{\beta}_i x_i$. Although machine learning algorithms such as random forest models (RFM), support vector regression (SVR), artificial neural networks (ANN), and long short-term memory networks (LSTM) could be used to predict the cutterhead torque, here we prefer the linear regression model (LM) for its good interpretability and low computational cost, and especially because of the strong linear correlation between the cutterhead torque and the other operation parameters shown in Table 1.
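Least-squares estimation of $\beta_0, \ldots, \beta_7$ and the plug-in prediction of the cutterhead torque can be written in a few lines of NumPy. This is a sketch, not the authors' code: the synthetic data below reuse the coefficients reported later for the Zhengzhou start-up model purely as ground truth for a noiseless toy check, and the function names are hypothetical.

```python
import numpy as np

def fit_linear_model(X, y):
    """Least-squares estimates (beta_0, beta_1, ..., beta_p) for y = beta_0 + X @ beta + eps."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])   # intercept column first
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta_hat

def predict_linear(beta_hat, X):
    """y_hat = beta_0 + sum_i beta_i * x_i for each row of X."""
    return beta_hat[0] + np.asarray(X, dtype=float) @ beta_hat[1:]


# Toy check with 7 normalized inputs (CRS, AV, PPA, PPB, PPC, PPD, RSSC).
rng = np.random.default_rng(0)
X = rng.random((200, 7))
beta_true = np.array([-0.13, 0.09, 0.06, -0.74, 1.79, -0.48, -0.16, 0.54])
y = beta_true[0] + X @ beta_true[1:]
beta_hat = fit_linear_model(X, y)
print(np.round(beta_hat, 2))
```

`np.linalg.lstsq` solves the least-squares problem without forming $(X^\top X)^{-1}$ explicitly, which is numerically safer when propelling pressures of different cylinder groups are highly collinear.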
The analysis of actual data in the next section demonstrates that the LM performs well compared with SVR and RFM.

Regression Model for Stationary Driving Stage.
Unlike in the start-up stage, the cutterhead torque and the other parameters in the stationary driving stage are in a relatively steady state. The observations of each parameter can be regarded as a stationary time series, which can be verified by the augmented Dickey-Fuller (ADF) test. Let $\{X_t \mid t = 0, 1, 2, \ldots\}$ be a time series with expectation $\mu_t = E(X_t)$, variance $\sigma_t^2 = \mathrm{Var}(X_t)$, and covariance $c(s, t) = \mathrm{Cov}(X_s, X_t)$. If $\mu_t = \mu$, $\sigma_t^2 = \sigma^2$, and $c(s, t) = c(t - s)$ for all $s$ and $t$, then $\{X_t\}$ is called a stationary time series. If the p value of the ADF test is close to zero, the null hypothesis of a unit root is rejected, and the time series can be regarded as stationary.
For a stationary time series, the lag order p can be determined from the autocorrelation function (ACF) and the partial autocorrelation function (PACF). As shown in Figure 2, the cutterhead torque in the stationary driving stage behaves as a stationary time series. In addition, the cutterhead torque is strongly correlated with other operating parameters during the stationary driving stage. Specifically, we take the dataset sampled from Rings 550-556 of the Hui-Shang section of the Line 4 subway tunnel project in Zhengzhou, China, as an example. After extracting and normalizing the stationary driving dataset, we obtain the correlation matrix between the cutterhead torque and the other key operation parameters shown in Table 2.
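The sample ACF and PACF used to choose the lag order p can be computed directly. The sketch below uses the standard Durbin-Levinson recursion for the PACF and a common rule of thumb (an assumption here, not stated in the text) that a lag is significant when its PACF exceeds $1.96/\sqrt{n}$ in absolute value. The function names are hypothetical; libraries such as statsmodels provide equivalent routines.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([1.0] + [(x[:-k] @ x[k:]) / denom for k in range(1, max_lag + 1)])

def sample_pacf(x, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(x, max_lag)
    phi = np.zeros((max_lag + 1, max_lag + 1))
    pacf = np.ones(max_lag + 1)
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi[k - 1, j] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[k - 1, j] * rho[j] for j in range(1, k))
        phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        pacf[k] = phi[k, k]
    return pacf

def cutoff_lag(pacf, n):
    """Largest p such that lags 1..p are all outside the +/-1.96/sqrt(n) band."""
    bound = 1.96 / np.sqrt(n)
    for k in range(1, len(pacf)):
        if abs(pacf[k]) < bound:
            return k - 1
    return len(pacf) - 1


# Toy AR(1) series x_t = 0.8 x_{t-1} + e_t, whose PACF should cut off after lag 1.
rng = np.random.default_rng(1)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for t in range(1, len(e)):
    x[t] = 0.8 * x[t - 1] + e[t]
pacf = sample_pacf(x, max_lag=10)
print(np.round(pacf[:4], 3))
```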
In Table 2, RA (rolling angle) and PA (pitch angle) are two additional parameters. The correlation coefficients between the cutterhead torque and the other seven parameters (CRS, AV, PPA, PPB, PPC, PPD, and RSSC) are all above 0.6. Because the cutterhead torque is autocorrelated and strongly linearly correlated with other parameters, an LSTM neural network is used to predict the cutterhead torque in the stationary driving stage. The LSTM is a special recurrent neural network (RNN) whose hidden nodes (also called cells) contain three gates: the input gate, the forget gate, and the output gate. By adjusting these gates, the information flowing through the LSTM's hidden layers can be controlled, which overcomes the vanishing or exploding gradients that usually occur when traditional RNNs are trained on long input sequences [20,21].
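To make the gate mechanism concrete, here is a NumPy sketch of the standard LSTM cell update (input gate $i$, forget gate $f$, output gate $o$, candidate $g$). It is illustrative only: the stacked weight layout and gate ordering are our assumptions, weights would normally be learned with a deep learning framework, and the text does not specify the network size.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4H, D + H) and b has shape (4H,); rows are stacked as
    [input gate, forget gate, candidate, output gate] (assumed layout).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])             # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])         # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * H:3 * H])     # candidate cell update
    o = sigmoid(z[3 * H:4 * H])     # output gate: how much cell state to expose
    c = f * c_prev + i * g          # additive cell update (eases gradient flow)
    h = o * np.tanh(c)
    return h, c

def lstm_forward(xs, W, b, hidden_size):
    """Run the cell over a sequence xs of shape (T, D); return final (h, c)."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in xs:
        h, c = lstm_step(x, h, c, W, b)
    return h, c


rng = np.random.default_rng(0)
D, H = 7, 4                          # 7 input parameters, 4 hidden units (toy sizes)
W = 0.1 * rng.standard_normal((4 * H, D + H))
b = np.zeros(4 * H)
h, c = lstm_forward(rng.random((5, D)), W, b, H)
print(h.shape)                       # (4,)
```

The additive update of `c` is what lets gradients pass through many time steps without repeatedly shrinking, which is the property the paragraph above refers to.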
In predicting the cutterhead torque during the stationary driving stage, an LSTM model accounts not only for the influence of the cutterhead torque at previous moments but also for the correlation between the cutterhead torque and other key equipment parameters. In this way, while ensuring prediction accuracy, it also enhances the adaptability of the model and extends its applicable scope. The LSTM network for cutterhead torque prediction is set up as follows:
Input: CRS, RSSC, PPA, PPB, PPC, PPD, and CT, with training window width p
Output: CT at the next time instant
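The input/output specification above amounts to a sliding-window supervised dataset. A minimal NumPy sketch, assuming the seven input columns are already normalized and ordered as listed, and using a hypothetical helper name, is:

```python
import numpy as np

def make_windows(features, ct, p):
    """Build LSTM training pairs from a stationary driving segment.

    features: (T, 7) array, columns assumed to be CRS, RSSC, PPA, PPB, PPC, PPD, CT.
    ct:       (T,) cutterhead torque series.
    Returns X of shape (T - p, p, 7) holding steps k..k+p-1, and y of shape
    (T - p,) with y[k] = ct[k + p], i.e. CT at the next time instant.
    """
    features = np.asarray(features, dtype=float)
    ct = np.asarray(ct, dtype=float)
    T = len(ct)
    X = np.stack([features[k:k + p] for k in range(T - p)])
    y = ct[p:]
    return X, y


T, p = 10, 3
ct = np.linspace(0.0, 1.0, T)
features = np.column_stack([np.random.default_rng(0).random((T, 6)), ct])
X, y = make_windows(features, ct, p)
print(X.shape, y.shape)   # (7, 3, 7) (7,)
```

The `(samples, window, features)` layout is the shape most LSTM implementations expect for batched sequence input.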

Evaluation Statistic for Regression Model.
$R^2$ is commonly used to assess the performance of a regression model and is defined as
$$R^2 = 1 - \frac{\sum_{j=1}^{N} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{N} (y_j - \bar{y})^2},$$
where $y_j$ is the $j$th observation of the response, $\hat{y}_j$ is its predicted value, $\bar{y}$ is the sample mean of the response, and $N$ is the sample size. The closer $R^2$ is to 1, the better the regression model performs.
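For reference, the $R^2$ statistic described above can be computed as follows (a straightforward sketch; the function name is ours):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


print(r_squared([1, 2, 3, 4], [1, 2, 3, 5]))  # 0.8
```

Predicting the sample mean everywhere gives $R^2 = 0$, and a model worse than that gives a negative value, so $R^2$ on held-out rings is a useful check for overfitting.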

Analysis of Actual Data
In this section, actual datasets from subway tunnel projects in Zhengzhou, Luoyang, and Dalian, China, are analyzed to illustrate the effectiveness of the proposed methods and models. First, the data are denoised and normalized. Second, the start-up and stationary driving datasets are extracted by the change point detection method based on linear regression models. Third, segmented regression models are established for the start-up and stationary driving stages and compared with other machine learning models such as SVR and RFM.

Separation between Start-Up and Stationary Driving Stages.
The separations between the start-up and stationary driving stages look similar across rings, so only Figure 5 is presented as an example, where PDA denotes the propelling displacement of cylinder group A. Figure 5 marks the change points with dotted vertical lines; the data are from Ring 552 of the Hui-Shang section of the Line 4 subway tunnel project in Zhengzhou, China.

Linear Regression Model for Cutterhead in the Start-Up Stage.
Take the data from Rings 550-556 of the Hui-Shang section of the Line 4 tunnel project in Zhengzhou as an example: the data from the first six rings are used as training data and the last ring as testing data. The estimated linear regression model for the cutterhead torque is
$$\hat{y} = -0.13 + 0.09x_1 + 0.06x_2 - 0.74x_3 + 1.79x_4 - 0.48x_5 - 0.16x_6 + 0.54x_7,$$
where all variables are normalized and $x_i$ ($i = 1, 2, \ldots, 7$) represent CRS, AV, PPA, PPB, PPC, PPD, and RSSC, respectively. Figures 6-8 show the predicted and true curves on the testing data for the proposed linear regression model, SVR, and RFM, respectively. The predicted curves of the linear regression model on the nonsegmented data are depicted in Figure 9. The linear regression model performs well compared with SVR and RFM on the start-up datasets, but it performs poorly on the nonsegmented data. This shows that segmented modeling is necessary for better prediction of the cutterhead torque. In addition, the comparison results for these models on different datasets are presented in Table 3. The data are processed using Python 3.7 on a computer with Core i5-
Discrete Dynamics in Nature and Society
$$\hat{y} = -0.07 + 0.40x_1 + 0.31x_2 + 0.06x_3 + 0.12x_4 - 0.04x_5 + 0.05x_6 + 0.08x_7.$$
Also, the estimated linear regression model on the dataset sampled from Rings 8047-8048 of the Line 5 subway tunnel project in Dalian, China, is
$$\hat{y} = -0.10 - 0.14x_1 + 0.31x_2 + 0.74x_3 + 0.48x_4 - 0.56x_5 + 0.04x_6 + 0.13x_7.$$
As shown in Table 3, the linear regression model performs better on the testing data than SVR and RFM, and the latter two models overfit the training data. In addition, LM takes much less time than SVR and RFM. Because of its good interpretability and computational simplicity, the linear regression model is used to predict the cutterhead torque in the start-up stage.

LSTM Prediction Model for Cutterhead in the Stationary Driving Stage.
Take the dataset from Rings 520-641 of the Hui-Shang section of the Line 4 subway tunnel project in Zhengzhou, China, as an example. After removing the outliers, the stationary driving dataset contains 248400 samples. Observations from 100 rings are selected as training data, and the remaining 49680 observations are used as testing data. The ADF test for stationarity of the cutterhead torque gives a p value of $4.798 \times 10^{-24}$, which is close to zero, so the cutterhead torque series can be regarded as stationary. The ACF and PACF are shown in Figure 10; the PACF cuts off after lag $p = 1$. The predicted and true cutterhead torque curves on the testing data are depicted in Figure 11, with $R^2 = 0.97$. The results show that the LSTM model captures the relationship between the cutterhead torque and other operating parameters well. Because it considers both the dependence within the series and the influence of other operation parameters, the LSTM performs well in predicting the cutterhead torque in the stationary driving stage.

Conclusions
According to the characteristics of the key operation parameters, the idea of segmenting the data is proposed, and change point detection based on the linear regression model is used to separate the start-up and stationary driving stages. Then, segmented regression models are established for predicting the cutterhead torque. The main conclusions are as follows:
(1) The proposed change point detection method based on the linear regression model separates the data into a start-up dataset and a stationary driving dataset quite well.
(2) In the start-up stage, CT, CRS, and TTF increase quickly, and the cutterhead torque is strongly linearly correlated with other key operation parameters. Compared with machine learning models such as SVR and RFM, the established linear model performs better. Moreover, the linear model offers good interpretability and is easy to implement, which makes it a suitable choice for predicting the cutterhead torque in the start-up stage.
(3) In the stationary driving stage, considering the dependence within the parameter series and the influence of other operation parameters, the LSTM model is used to predict the cutterhead torque. The analysis of actual data demonstrates that the LSTM performs quite well.
(4) The proposed segmented cutterhead torque prediction model can also be applied for early warning. Briefly, a cutterhead torque prediction model is first established on the data from the previous two rings; the cutterhead torque is then predicted, and a warning is issued if the prediction error exceeds a preset threshold. This application is currently being tested on the intelligent big data platform for shield and TBM construction at the State Key Laboratory of Shield Machine and Boring Technology.
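Conclusion (4) describes a rolling early-warning scheme. A sketch of the idea follows, with several assumptions that the text leaves open: each ring is represented by an `(X, y)` pair, a start-up-style linear model stands in for the full segmented model, the error is measured as an absolute residual, and the threshold value is illustrative. The function name is hypothetical.

```python
import numpy as np

def rolling_torque_warnings(rings, threshold):
    """For each ring r >= 2, fit on rings r-2 and r-1, then flag samples of ring r.

    rings: list of (X, y) pairs in excavation order; X is (n, d), y is (n,).
    Returns a list of (ring_index, indices_where_|error| > threshold).
    """
    alerts = []
    for r in range(2, len(rings)):
        X_train = np.vstack([rings[r - 2][0], rings[r - 1][0]])
        y_train = np.concatenate([rings[r - 2][1], rings[r - 1][1]])
        design = np.column_stack([np.ones(len(X_train)), X_train])
        beta, *_ = np.linalg.lstsq(design, y_train, rcond=None)

        X_new, y_new = rings[r]
        y_pred = beta[0] + X_new @ beta[1:]
        flagged = np.flatnonzero(np.abs(y_new - y_pred) > threshold)
        alerts.append((r, flagged))
    return alerts


# Three toy rings following y = 1 + 2x; ring 2 has one abnormal torque spike.
rings = []
for r in range(3):
    x = np.arange(10.0 * r, 10.0 * r + 10).reshape(-1, 1)
    rings.append((x, 1.0 + 2.0 * x[:, 0]))
rings[2][1][4] += 5.0
print(rolling_torque_warnings(rings, threshold=1.0))
```

Refitting on only the two most recent rings keeps the model tied to the current geology, at the cost of noisier coefficient estimates.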
Next, we will further optimize the modeling process according to the test results, look for effective methods of classifying geological conditions, and explore an automatic cutterhead torque alarm system, so as to further improve the safety of shield construction.
Data Availability
The data sampled from Rings 520-641 of the Hui-Shang section of the Line 4 tunnel project in Zhengzhou, Rings 104-106 of Line 1 in Luoyang, and Rings 47-48 of Line 5 in Dalian, China, are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.