A New Embedded Estimation Model for Soil Temperature Prediction

With the continuous development of Earth science, soil temperature (Ts) has received increasing attention as an important parameter in Earth system research. The change of Ts across regions and over time is affected by many factors, which makes it difficult to predict soil temperature accurately and robustly. In this paper, an embedded network prediction model based on the gated recurrent unit (GRU) is proposed to learn the local and global features of historical temperature and thereby improve the prediction of soil temperature. We input different numbers of time steps into the GRU model, and the outputs are weighted to obtain the final prediction. To obtain the global characteristics of soil temperature, we connect the previous steps directly to the output layer; the local characteristics are obtained through the following steps. This paper uses soil temperature data from two meteorological stations in Switzerland (Laegern and Fluehli) as input to predict the soil temperature at different soil depths (5 cm, 10 cm, and 15 cm) and different lead times (6 hrs, 12 hrs, and 24 hrs), with RMSE, MAE, MSE, and R2 as the evaluation criteria. The experimental results show that our method performs best among the compared methods: artificial neural network (ANN), extreme learning machine (ELM), long short-term memory network (LSTM), and gated recurrent unit network (GRU). In particular, for the soil temperature at a depth of 10 cm at the Fluehli station over the coming 6 hrs, our model achieved the maximum R2 (0.9914) and the minimum RMSE (0.4668), MAE (0.2585), and MSE (0.2214) compared with the other four models. Therefore, our model can not only predict the soil temperature at different depths but also improve the prediction accuracy.


Introduction
Geoscience plays an important role in social development and economic construction. Soil temperature (Ts) and its daily fluctuations are among the most vital meteorological parameters in fields such as agriculture, forestry, and geology [1,2], and Ts is an important variable in land-atmosphere interactions [3]. Many factors affect the change of soil temperature; soil depth, for example, has a prominent effect. Research has shown that, during plant growth, shallow soil has a significant impact on seed germination, while deep soil affects root absorption activity [4]. Therefore, accurate prediction of soil temperature at different depths can guide practical applications in some fields and can replace manual on-site measurement with traditional sensors [5].
Currently, most soil temperature prediction methods estimate Ts from environmental factors [6]. However, in some regions such data are unavailable and cannot be used for prediction, which reduces the accuracy of model predictions [7]. Therefore, this paper recommends using the time series itself as the input data for the soil temperature prediction model.
In recent years, researchers have mainly used physical models to predict soil temperature through the heat transfer mechanism of the soil itself [6,8]. However, physical model parameterization and scale issues impose many limitations in practical applications [9]. With its continuous development, machine learning has been widely used in Earth sciences [10-12]. Ghorbani et al. proposed a method based on the support vector machine to estimate the soil field capacity and permanent wilting point [13]. Machine learning also plays an important role in the soil temperature field [14]. Feng et al. used the extreme learning machine (ELM) to improve the accuracy of soil temperature prediction [15]. Furthermore, the long short-term memory network (LSTM) has also received attention from researchers [16].
Among machine learning methods, the artificial neural network (ANN) [2,17,18] and ELM [15,19] can learn features from the input data without using a physical model, so there is no need to understand the internal physical processes. The artificial neural network has strong self-adaptation and self-learning capabilities and can continuously update the model parameters to bring the output closer to the real value, so it is commonly used as a soil temperature prediction model [20-22]. Bilgili proposed a method based on the artificial neural network to predict monthly average soil temperature [23]. Mehdizadeh et al. used models based on feedforward backpropagation neural networks (FFBPNN) and gene expression programming (GEP) to estimate the daily soil temperature at different depths [24]. When the traditional artificial neural network is used to predict the soil temperature, the accuracy of the output is reduced because the correlation within the time series is not considered. To solve this problem, many researchers incorporate genetic algorithms into artificial neural networks to optimize them [20,25,26]. However, the genetic algorithm is inefficient and prone to premature convergence. Therefore, this approach still needs further research.
Deep learning methods are widely used to deal with time series data. Compared with recurrent neural networks (RNN) and LSTM, GRU has a simpler structure and can handle long-term dependence [27]. Therefore, this paper chooses a model based on the GRU network to predict soil temperature. The model uses hidden states to convey information and capture the relevance within time series data, and its network structure has led to wide use in many fields. Liu et al. proposed GRU-based nonlinear predictive denoising autoencoders for fault diagnosis of rolling bearings [28]. Miau and Hung designed the Conv-GRU model to estimate the water level, and their results show the effectiveness of the method [29]. Rui et al. combined GRU and LSTM models to predict traffic flow [30]. To our knowledge, the GRU network has not previously been used for soil temperature prediction.
This article focuses on two questions. The first is how to choose the input data for the Ts estimation model. The estimation of Ts is affected by the past Ts. Although related meteorological data have some influence on Ts estimation, errors between the provided data and the real data degrade the accuracy of the model. Consequently, this paper concentrates on the time series of the data. The second is how to construct the network model. In the GRU network, information is transferred by updating the hidden state. As the number of time steps increases, the correlation between the initial data and the output decreases, which reduces prediction accuracy.
The motivation of this paper is to address this long-term dependence in soil temperature series, which lowers prediction accuracy. With this goal, this paper proposes a new embedded estimation model based on the GRU network for Ts estimation.
To obtain the global characteristics of soil temperature, we connect the previous steps directly to the output layer, and the local characteristics of soil temperature are obtained through the following steps. Each channel is assigned a different number of steps, and the channel outputs are combined through a fully connected layer to produce the estimate. We use the past Ts data from the Laegern and Fluehli stations in Switzerland from 2006 to 2014 to estimate Ts in the next 6 hrs, 12 hrs, and 24 hrs at different soil depths (5, 10, and 15 cm).
The main contributions of this paper for Ts estimation are as follows:
(1) To our knowledge, the GRU network has not yet been used for soil temperature prediction; this paper implements a GRU-based method for estimating soil temperature.
(2) To obtain the global characteristics of soil temperature, we connect the previous steps directly to the output layer, and the local characteristics of soil temperature are obtained through the following steps.
(3) The results show that our method performs better than the other advanced methods considered.
Figure 1 shows the overall structure of our study.

The Structure of GRU.
The GRU network has a simple structure and trains quickly, and it can carry relevant information along the time series for prediction. It is widely applied in many fields precisely because of these advantages. The GRU handles time series problems and alleviates the gradient problem in backpropagation. The GRU unit structure is shown in Figure 2. It has two gates: the reset gate and the update gate. The update gate decides which new information is added and which is discarded. The reset gate decides how much past information to forget. In the calculation formulas of the GRU, x(t) represents the input value at the current moment and h(t−1) is the hidden state of the previous node; together they determine the gate states. r(t) is the reset gate, z(t) is the update gate, h̃(t) represents the new memory, h(t) represents the hidden state, y(t) is the output of the output layer, σ(·) is the sigmoid activation function, and tanh(·) is the hyperbolic tangent function.
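For reference, one commonly used formulation of the GRU that matches the symbols defined above is sketched below. The weight matrices W and U, the bias vectors b, and the linear form of the output layer are notation assumed here for illustration; they are not given in the text. Some references also swap the roles of z(t) and 1 − z(t) in the hidden-state update.

```latex
\begin{aligned}
r(t) &= \sigma\left(W_r\, x(t) + U_r\, h(t-1) + b_r\right) \\
z(t) &= \sigma\left(W_z\, x(t) + U_z\, h(t-1) + b_z\right) \\
\tilde{h}(t) &= \tanh\left(W_h\, x(t) + U_h \left(r(t) \odot h(t-1)\right) + b_h\right) \\
h(t) &= \left(1 - z(t)\right) \odot h(t-1) + z(t) \odot \tilde{h}(t) \\
y(t) &= W_o\, h(t) + b_o
\end{aligned}
```

Here ⊙ denotes element-wise multiplication. When r(t) is close to 0 the new memory ignores h(t−1), and when z(t) is close to 0 the previous hidden state is carried forward almost unchanged.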

The Structure of Our Model.
As the previous analysis shows, when there are a large number of cells in the GRU network, the correlation between the features decreases as the time series extends. Therefore, we propose a GRU-based model to solve this problem and improve the accuracy of the estimation model; its topological structure is shown in Figure 3.
Our model network is composed of the traditional GRU network and the auxiliary networks. In the GRU model, information is passed to the next cell through the propagation of the hidden state, which weakens the correlation with earlier inputs. The traditional GRU network serves as the basic network to obtain local features, and the auxiliary network is composed of the outputs of different steps to obtain global features; the two sets of features are merged to form the output of the entire network model. The past Ts data are input into our model to learn the pattern of periodic changes in Ts, which strengthens the correlation with past Ts data and improves the accuracy of the prediction.
The final output of our model at time step t, y_{our method}(t), combines all the channels through the fully connected layer.
Adam adapts the learning rates for different parameters. The method is suitable for processing large-scale data and optimizing parameters, as well as for solving sparse gradient problems. Adam is widely used in the field of deep learning, and it is used to optimize the model in this paper. This article uses the mean square error between the observed station data y(t) and the model output y_{our method}(t) as the loss for optimizing our model.
Figure 1: The overall structure of our study.
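The combination of channel outputs and the mean square error loss can be sketched in NumPy as follows. This is only one possible reading of the description, not the authors' implementation: it assumes a single GRU is run over the input window, that each "channel" taps the hidden state after a different number of steps, and that the tapped states are concatenated and passed through one linear fully connected layer. The tap positions (6, 12, 18, 24), the random weights, and all function names are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_params(hidden_dim, n_channels, rng):
    """Small random weights standing in for trained parameters (illustration only)."""
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)

    # Each gate acts on the concatenation [x(t), h(t-1)] of a scalar input and the state.
    gru = {name: w(hidden_dim, 1 + hidden_dim) for name in ("r", "z", "h")}
    bias = {name: np.zeros(hidden_dim) for name in ("r", "z", "h")}
    w_out = w(n_channels * hidden_dim)
    b_out = 0.0
    return gru, bias, w_out, b_out


def gru_states(window, gru, bias):
    """Return the hidden state after every time step, shape (steps, hidden_dim)."""
    h = np.zeros(bias["r"].shape[0])
    states = []
    for x in window:
        xh = np.concatenate(([x], h))
        r = sigmoid(gru["r"] @ xh + bias["r"])   # reset gate
        z = sigmoid(gru["z"] @ xh + bias["z"])   # update gate
        cand = np.tanh(gru["h"] @ np.concatenate(([x], r * h)) + bias["h"])
        h = (1.0 - z) * h + z * cand             # new hidden state
        states.append(h)
    return np.stack(states)


def embedded_output(window, tap_steps, gru, bias, w_out, b_out):
    """Concatenate hidden states tapped at several steps and apply a dense layer."""
    states = gru_states(window, gru, bias)
    features = np.concatenate([states[s - 1] for s in tap_steps])
    return float(w_out @ features + b_out)


def mse_loss(y_obs, y_pred):
    """Mean square error between observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_obs - y_pred) ** 2))


rng = np.random.default_rng(0)
gru, bias, w_out, b_out = init_params(hidden_dim=32, n_channels=4, rng=rng)
window = np.sin(np.linspace(0.0, np.pi, 24))  # synthetic stand-in for 24 past Ts values
y_hat = embedded_output(window, (6, 12, 18, 24), gru, bias, w_out, b_out)
```

Early taps expose states that the final hidden state may have partly forgotten, which is the intuition behind connecting earlier steps directly to the output layer. In practice the weights would be trained by minimizing `mse_loss` with Adam rather than drawn at random.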
The learning rate is an important parameter in deep learning, and its value affects the convergence of the loss function. When the learning rate is too small, convergence is very slow; when it is too large, the parameter updates become unstable. We use an exponentially decaying learning rate to improve the convergence of Adam. With the decay step size set to 100 and the decay rate set to 0.96, the learning rate is adjusted continually so that the algorithm approaches the optimal solution.
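The decay schedule described above can be illustrated with a few lines of Python. This is a minimal sketch of the usual exponential decay rule, lr = lr0 · rate^(step / decay_steps); the function name is hypothetical, and using 0.03 as the initial learning rate is an assumption based on the value reported later for the experiments.

```python
def exponential_decay(initial_lr, step, decay_steps=100, decay_rate=0.96, staircase=False):
    """Learning rate after `step` training steps under exponential decay."""
    # With staircase=True the rate drops once every `decay_steps` steps
    # instead of changing continuously.
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent


# Learning rate at a few training steps, starting from an assumed 0.03.
schedule = [exponential_decay(0.03, s) for s in (0, 100, 200, 1000)]
```

After every 100 steps the rate is multiplied by 0.96, so early updates are large and later updates become progressively finer.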

Model Training and Test.
In this paper, the past Ts data from the Laegern and Fluehli stations in Switzerland (2006 to 2014) serve as the input to our model for estimating Ts, and the model uses a TensorFlow backend. The model is tested on an Intel Core(TM) i7-5820K 3.30 GHz CPU with 64 GB of memory, running PyCharm 2018. We use three-quarters of all data as training samples (2006.1.1-2013.3.14) and the remainder as testing samples (2013.3.15-2014.12.31). We denote the value at point t in the time series by x(t), which is predicted from the first t − 1 elements; we use half a day of soil temperature data to predict the values of Ts in the following 6 hrs, 12 hrs, and 24 hrs, and set t to 24. We compare our model with the other models (ANN, LSTM, ELM, and GRU) and calculate several evaluation criteria (RMSE, MAE, MSE, and R2) to assess model performance. In these criteria, N denotes the total number of data points, y_i is the observed value at moment i, ŷ_i is the value predicted by the respective method, and ȳ is the mean of the observed values. These criteria indicate how well the model fits and how accurately it predicts: the smaller the RMSE, MAE, and MSE and the larger the R2, the better the model performs.
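The four evaluation criteria can be computed as in the sketch below. It uses the standard definitions; in particular, R2 is taken here as the coefficient of determination, 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)², which is an assumption, since other definitions (such as the squared correlation coefficient) are also in use. The function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(y_obs, y_pred):
    """Return RMSE, MAE, MSE, and R2 for observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_obs - y_pred

    mse = float(np.mean(err ** 2))             # (1/N) * sum (y_i - yhat_i)^2
    rmse = float(np.sqrt(mse))                 # square root of the MSE
    mae = float(np.mean(np.abs(err)))          # (1/N) * sum |y_i - yhat_i|
    ss_res = float(np.sum(err ** 2))           # residual sum of squares
    ss_tot = float(np.sum((y_obs - y_obs.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                 # assumed definition of R2

    return {"RMSE": rmse, "MAE": mae, "MSE": mse, "R2": r2}


scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

Note that RMSE and MAE are in the same unit as Ts, whereas MSE is in squared units and R2 is dimensionless.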

Study Area and Field Experiment.
This paper studies data from two stations in Switzerland (Laegern, 47.48° N, 8.37° E; Fluehli, 46.88° N, 8.01° E) and uses the half-hourly past Ts data downloaded from FLUXNET (https://fluxnet.fluxdata.org/) to verify our model. Since these two stations are located in domestic ecological nature reserves, Ts has a certain impact on the surrounding ecological environment, such as plant growth, soil fertility, and microbial activity, as shown in Figure 4. This article takes the past Ts data from the stations as the input. The station data show that the soil temperature at a depth of 15 cm at the Laegern station is the most stable, with the smallest temperature range and standard deviation. The statistics are shown in Table 1, where x_min is the minimum value, x_max is the maximum value, x_mean is the average value, z_sd is the standard deviation, z_s is the skewness, and z_v is the coefficient of variation.
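With half-hourly records, lead times of 6, 12, and 24 hrs correspond to 12, 24, and 48 samples, and an input length of 24 covers half a day. The sketch below shows one way to build input windows and targets and to split them chronologically. It assumes that the target is the single value at the end of the lead time (rather than, say, an average over it); the function names and the synthetic series are hypothetical.

```python
import numpy as np

SAMPLES_PER_HOUR = 2  # half-hourly records


def make_windows(series, window=24, horizon_hours=6):
    """Pair each window of past values with the value `horizon_hours` later."""
    series = np.asarray(series, dtype=float)
    offset = horizon_hours * SAMPLES_PER_HOUR
    n = len(series) - window - offset + 1
    if n <= 0:
        raise ValueError("series is too short for this window and horizon")
    X = np.stack([series[i:i + window] for i in range(n)])
    # The target lies `offset` samples after the last value of each window.
    first_target = window - 1 + offset
    y = series[first_target:first_target + n]
    return X, y


def chronological_split(X, y, train_fraction=0.75):
    """Keep the earliest samples for training and the latest for testing."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


series = np.arange(100.0)  # synthetic stand-in for a Ts record
X, y = make_windows(series, window=24, horizon_hours=6)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```

Splitting by time rather than at random keeps every test sample later than every training sample, which mirrors the date-based split used in this paper.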

Results and Discussions
In this paper, we compare our model with four other models (BPNN, LSTM, ELM, and GRU) for estimating Ts from the data of the two stations, use Adam to optimize the models, and run the experiments with scikit-learn. The ANN, for instance, consists of an input layer, a hidden layer, and an output layer. It performs best with the batch size set to 10000, the number of iterations to 100, the learning rate to 0.03, and the number of nodes to 32. For the ELM model we apply the elm function, use the sigmoid activation function in the hidden layer, and set the number of nodes to the same value as for the ANN. We set the same hyperparameters for the GRU and our model so that the comparison reflects the performance of our model.

Evaluation for the Hyperparameters in Our Model.
According to our research, the values of the hyperparameters have a certain influence on the performance of the model. The hyperparameters considered are the number of channels, the number of iterations, the learning rate, and the number of nodes (num_{our model}). As an example, we estimate the soil temperature at a depth of 5 cm for the following 6 hrs at the Laegern station. The results show that num_{our model} has a certain impact on the fitting of the model and on the acquisition of relevant information. Our model performs best with the learning rate set to 0.03, the number of nodes to 32, the number of channels to 4, and the number of iterations to 100.
When the model overfits during training, it adapts poorly to changes in the data. In contrast, when it underfits, it fails to extract the relevant information from the data. Moreover, when the learning rate is too large, the predictive model can easily be trapped in a local optimum during learning; when the learning rate is too small, the parameters can hardly converge to their optimal values. The results are shown in Table 2.

Evaluation for Different Models.
In this part, we compare our model with the other four models (BPNN, ELM, LSTM, and GRU). The inputs to all the predictive models are the past Ts data from the stations, and the outputs are the estimated Ts values in the following 6 hrs, 12 hrs, and 24 hrs. The results of the five models at depths of 5, 10, and 15 cm in the following 6 hrs, 12 hrs, and 24 hrs for the Laegern station are shown in Table 3. For the depth of 5 cm in the coming 6 hrs, our model performs better than the other models. For the depth of 5 cm in the following 12 hrs and 24 hrs, we can draw the same conclusion: our model still outperforms the other four models. The accuracy improves continuously as the soil depth increases from 5 cm to 15 cm, but it decreases as the lead time extends from 6 hrs to 24 hrs. This may be caused by systematic errors when the model makes long-term predictions [32]. However, the ELM model achieves higher accuracy than the other models for the depth of 5 cm in the following 12 hrs, the depth of 10 cm in the following 6 hrs, and the depth of 15 cm. A possible reason is that ELM is a feedforward neural network architecture in which parameters are randomly chosen [19]; sometimes a nonoptimal solution may be generated, which affects the performance of the model.
We estimate the soil temperature at depths of 5 cm, 10 cm, and 15 cm at the Laegern station in the following 6 hrs, 12 hrs, and 24 hrs. The linear relationship between the estimated and observed values is shown in Figure 5. The scatter plots show that the fitted line of our model is closer to the ideal line (y = x) and that its R2 value is higher than those of the other models. For instance, Figures 5(a) and 5(b) show that the highest R2, 0.9638, is obtained by our model for the depth of 5 cm in the following 6 hrs, with the linear relationship y = 1.0063x + 0.0438. However, all the tested models perform well at the depth of 15 cm in the following 6 hrs and 12 hrs, as shown in Figure 5(c), because the estimated and observed values are more consistent in this case. As the lead time extends to 24 hrs, our model still maintains good accuracy, while the accuracy of the other models starts to decrease. Overall, the experimental results show that our model has a certain degree of robustness for long-term estimation.
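A fitted line of the form y = ax + b, as reported for the scatter plots, can be obtained with an ordinary least-squares fit. The following is a minimal sketch; the helper name `fit_line` and the choice of the observed value as x and the estimated value as y are assumptions, since the axis assignment is not stated in the text.

```python
import numpy as np


def fit_line(y_obs, y_est):
    """Least-squares line y_est = slope * y_obs + intercept."""
    slope, intercept = np.polyfit(np.asarray(y_obs, dtype=float),
                                  np.asarray(y_est, dtype=float), deg=1)
    return float(slope), float(intercept)


obs = np.array([0.0, 1.0, 2.0, 3.0])        # synthetic observed values
slope, intercept = fit_line(obs, 2.0 * obs + 1.0)
```

A slope near 1 and an intercept near 0 indicate that the estimates track the ideal line y = x.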
The frequency plot of the absolute estimation error is shown in Figure 6, where each bar indicates the percentage of errors in the corresponding range. Our model has the highest frequency in the smallest error range when estimating Ts at the Laegern station. For example, for the depth of 5 cm in the following 6 hrs, the frequency for our model (73.8%) is higher than for the other four models: 46.5% (GRU), 45.9% (LSTM), 46.6% (ELM), and 36.8% (BPNN).
The different predictive models were also tested separately on the data from the Fluehli station at depths of 5 cm, 10 cm, and 15 cm in the following 6 hrs, 12 hrs, and 24 hrs; the results are shown in Table 4.
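The percentages behind a frequency plot of absolute errors, such as the one described for Figure 6, can be computed by binning the errors. The sketch below is illustrative only: the bin edges are hypothetical, since the error ranges used in the figure are not given in the text, and the last bin is treated as open-ended.

```python
import numpy as np


def abs_error_frequency(y_obs, y_pred, edges=(0.0, 0.5, 1.0, 1.5, 2.0)):
    """Percentage of absolute errors in [edges[k], edges[k+1]); last bin is open-ended."""
    abs_err = np.abs(np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float))
    edges = np.asarray(edges, dtype=float)
    # Index of the bin whose lower edge is the largest one not exceeding the error.
    idx = np.searchsorted(edges, abs_err, side="right") - 1
    counts = np.bincount(idx, minlength=len(edges))
    return 100.0 * counts / abs_err.size


freq = abs_error_frequency([0.0, 0.0, 0.0, 0.0], [0.1, 0.4, 0.7, 3.0])
```

A model with most of its mass in the first bin makes mostly small errors, which is the property highlighted for the Laegern results.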
The results show that our model mostly performs better than the others; for example, at the depth of 5 cm in the following 6 hrs at the Fluehli station, the proposed model achieved RMSE = 0.6534, MAE = 0.3928, MSE = 0.4735, and R2 = 0.9860, which is better than the other four models. Generally speaking, our model mainly shows superior performance in estimating Ts across different regions, lead times, and soil depths.

Conclusions
Soil temperature (Ts) is an important land surface variable in Earth science and is used in many research fields; for example, it affects the growth and development of plants and the formation of soil. In this study, we examine the performance of the backpropagation neural network (BPNN), gated recurrent unit (GRU), extreme learning machine (ELM), long short-term memory (LSTM) network, and our model for estimating Ts at depths of 5 cm, 10 cm, and 15 cm in the following 6 hrs, 12 hrs, and 24 hrs at the Laegern and Fluehli stations in Switzerland. The statistical results indicate that our model mostly performs better than the other four models for Ts estimation.
To reduce the influence of long time series on the accuracy of soil temperature estimation and to obtain the global characteristics of soil temperature, we connect the previous steps directly to the output layer, and the local characteristics of soil temperature are obtained through the following steps. The soil temperature is affected by many factors, such as atmospheric temperature and precipitation. EEMD can decompose time series into signals of different frequencies.

Data Availability
The data included in this paper are available without any restriction.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.