A Novel Long Short-Term Memory Predicted Algorithm for BDS Short-Term Satellite Clock Offsets


communication network and high-performance data processing equipment, and it is also unstable and difficult for common users to access [11][12][13][14]. Therefore, developing a high-precision satellite clock offset prediction method that lets common users obtain high-precision clock offsets at low cost has become one of the significant research issues in recent years.
Many traditional time-series modeling methods have been applied to satellite clock offset forecasting, such as the polynomial model, the GM (1,1) prediction model, the Kalman filter model, the ARIMA time-series model, and the spectral analysis model, and they have achieved good performance. However, each of them has its own deficiencies [7,15-17]. The polynomial model degrades as the forecast horizon grows. The performance of the GM (1,1) prediction model depends on its exponential coefficients, and its requirements on the training data set are strict. Only with sufficient prior knowledge about the atomic clock's own operating characteristics and its environment can the Kalman filter model achieve good prediction results. As for the ARIMA time-series model and the spectral analysis model, the former's optimal fixed order is inconsistent across clock types and orbits [18], while the latter requires a large amount of prior data to analyze the environmental noise and periodic factors of the satellite clock [19]. Owing to the limitations of these traditional models, researchers have tried in recent years to introduce machine learning methods into the modeling of satellite clock errors. M. Kim and J. Kim proposed a GA-ARIMA model to revise the RTS forecast products of IGS, which is 32% more accurate than the traditional ARIMA prediction method, but their research object is limited to GPS [20]. Wang et al. built a short-term satellite clock offset model with a wavelet neural network optimized by the particle swarm optimization algorithm, which achieves 1 h forecasting results better than 0.3 ns. Although particle swarm optimization provides an optimization method for the wavelet neural network, it also easily falls into local optima, and this research is based on the GPS constellation with its stable spatial structure; it is uncertain whether it is suitable for the BDS constellation [21]. He et al.
established a BeiDou-2/3 short-term forecast model based on the least squares support vector machine (LS-SVM), whose 24-hour forecast accuracy is improved by 37.7% compared with the traditional method. However, for the LS-SVM model there is no definite standard for selecting the length of the training data: a longer input data length does not necessarily yield a better forecasting effect [22]. Huang et al. applied a BP neural network to correct the BeiDou-2 ultrarapid clock prediction product; the forecasting accuracy for 3 hours, 6 hours, 12 hours, and 24 hours was improved by 23.1%, 21.3%, 20.2%, and 19.8% compared with IGS's original product. But their study did not give a clear rule for selecting the parameters used to train the BP neural network, such as the time interval of the input samples [23].
Most machine learning methods are behavior modeling algorithms that use the information generated by past behavior, and they can accurately estimate various complex nonlinear time series [24]. Long Short-Term Memory (LSTM) stands out among time-series prediction algorithms: it uses the memory blocks in its hidden layer to extract and store time-series information and uses the forget gate in each memory block to filter the stored sequence information, so that it can make full use of time-series information and give accurate predictions [25,26]. LSTM has gradually been applied to GNSS data processing problems. Kaselimi et al. constructed an ionospheric electron content prediction model based on LSTM to replace the Global Ionospheric Map (GIM) data product, and it performs better than the GIM product [27]. Tao et al. established a multipath CNN-LSTM network model to optimize multi-GNSS positioning performance, which reduces the positioning root mean square error in the east, north, and vertical directions by 62.3%, 70.8%, and 66.0% [28]. In this paper, a general deep learning method for time-series modeling, LSTM, is used to construct the clock bias prediction model. To avoid a complicated LSTM structure, we build only a single-layer LSTM and use the genetic algorithm (GA) to optimize the number of hidden layer units. We then divide the satellite clock features into system features and environment features, and the QP-based Long Short-Term Memory (QPLSTM) model is constructed and applied to short-term clock bias prediction for the first time. We use the IGU-O product to conduct six-hour forecasting compared with the IGU-P product to evaluate the performance of QPLSTM. Furthermore, we also use the precise satellite clock product to train our model and compare it with four conventional methods for 30 min and 1 h forecasting.
Finally, the results show that the proposed method realizes higher-precision short-term satellite clock offset prediction than the methods mentioned above, and it can also replace the IGU-P product.
The remaining sections of the paper are organized as follows: Section 2 introduces in detail the original RNN and its LSTM variant used for time-series modeling. Section 3 focuses on the construction of the clock offset prediction model, QPLSTM, based on the LSTM variant introduced in Section 2. The simulation results and discussion are presented in Section 4. Section 5 summarizes the performance of the QPLSTM and points out areas that need further optimization and improvement.

LSTM RNN
LSTM is a variant of RNN. This section introduces the RNN and the LSTM variant that is used to establish the satellite clock offset prediction model in the next section [29]. We start from the basic RNN model and then derive LSTM from it.
RNN can build a deep structure in the time dimension [30], and it has been widely used in time series modeling [31,32]. Figure 1 shows an RNN model expanded into a full network; the mathematical symbols in Figure 1 are introduced as follows:

(1) Set $x_t$ as the input vector at time $t$

(2) Set $H_t$ as the hidden unit value at time $t$. Its value depends on the input vector at the current moment and the value of the hidden layer unit at the previous moment, and it can be calculated by the following formula:

$$H_t = f\left(W_{XH} x_t + W_{HH} H_{t-1}\right)$$

In the above formula, $f$ is the activation function, such as the sigmoid function or ReLU. Normally, $H_0$ is initialized to 0

(3) Set $o_t$ as the output vector at time $t$, which can be calculated by the following formula:

$$o_t = W_{HO} H_t$$

(4) In the above formulas, $W_{XH}$ and $W_{HO}$ represent the weight matrices connecting the input layer and the output layer to the hidden layer, and $W_{HH}$ represents the transfer weight matrix between hidden layer units

Even though RNN does a good job in time series modeling, its deep gradient is calculated through partial derivatives multiplied across each activation function, and when the input becomes a large-scale time series, the computed gradient easily vanishes or explodes, making it difficult to calculate and update. LSTM is one of the effective variants designed to solve this problem [26].
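As an illustration only, the recurrence above can be sketched in a few lines of NumPy. The weight shapes and the choice of tanh as the activation $f$ here are our own assumptions for the sketch, not taken from the paper:

```python
import numpy as np

def rnn_forward(x_seq, W_XH, W_HH, W_HO, f=np.tanh):
    """Unrolled vanilla-RNN forward pass: H_t = f(W_XH x_t + W_HH H_{t-1})."""
    hidden = W_HH.shape[0]
    H = np.zeros(hidden)            # H_0 initialized to 0, as in the text
    outputs = []
    for x_t in x_seq:
        H = f(W_XH @ x_t + W_HH @ H)
        outputs.append(W_HO @ H)    # o_t = W_HO H_t
    return np.array(outputs)
```

Running this on a short random sequence yields one output vector per time step, with each hidden state depending on the whole input history.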
LSTM replaces the hidden layer unit in the original RNN architecture with a memory block. The memory block includes an input gate, an output gate, a forget gate, and a self-circulating neuron [33], and its structure is shown in Figure 2. The input gate determines how much current information is involved in the calculation of the self-circulating neuron state of the memory block, the forget gate determines how much historical information from the last moment is used to calculate the self-circulating neuron state at this time, and the output gate controls the information that can be output from the memory block. By training the combination of these gates, not only is the problem of gradient disappearance or explosion caused by large-scale training sequences solved, but the time series variables can also be predicted more finely. The number of parameters that LSTM needs to control is four times that of RNN. Figure 3 shows the LSTM model when multiple memory blocks are combined, and it also reflects the iterative steps of each gate. The mathematical expression of each step is as follows:

(1) Set $x_t$ as the input vector to the memory block at time $t$, and set $h_{t-1}$ and $c_{t-1}$ as the memory block output and self-circulating neuron state at the previous moment

(2) Set $i_t$ and $o_t$ as the activation values of the input gate and the output gate, respectively, at time $t$:

$$i_t = \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + b_i\right), \quad o_t = \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right)$$

(3) Set $f_t$ and $c_t$ as the output value of the forget gate and the value of the self-circulating neuron state, respectively, at time $t$. They are calculated by the following formulas:

$$f_t = \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + b_f\right), \quad c_t = f_t c_{t-1} + i_t\, g\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right)$$

(4) The memory block output at time $t$ is then $h_t = o_t\, h\left(c_t\right)$

In the above formulas, $\sigma(\cdot)$, $g(\cdot)$, and $h(\cdot)$ represent squashing functions based on the standard logistic sigmoid, with ranges $[0, 1]$, $[-2, 2]$, and $[-1, 1]$, respectively:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \quad g(x) = \frac{4}{1 + e^{-x}} - 2, \quad h(x) = \frac{2}{1 + e^{-x}} - 1$$

A modified version of Real-Time Recurrent Learning (RTRL) optimized by gradient descent, together with truncated Back Propagation Through Time (BPTT), is selected as the basic theory to train the LSTM RNN [34]. The objective of training is to minimize the mean square error. The training error does not directly affect the output of the memory block but enters the linear CEC of the memory block and is passed back from there. In this way, the LSTM neural network gains the ability to handle large-scale time series. Before training, it is necessary to determine the number of hidden layers, the number of hidden layer units, and which part of the historical data is used for training and testing. Since there is no clear theoretical basis for selecting the number of hidden layers and hidden layer units [25,26], a single-layer LSTM is used in the subsequent experiments and GA is selected to find the best number of hidden layer units. The initial learning rate, batch size, and number of iterations are 0.005, 125, and 1000, respectively. Each experiment converges to the optimal solution.
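A minimal sketch of one memory-block update, using the squashing-function ranges described above ($g$ in $[-2, 2]$, $h$ in $[-1, 1]$); the dictionary-based weight naming is our own convention for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squash_g(x):
    """Input squashing function with range [-2, 2]."""
    return 4.0 * sigmoid(x) - 2.0

def squash_h(x):
    """Output squashing function with range [-1, 1]."""
    return 2.0 * sigmoid(x) - 1.0

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One memory-block update: input gate i, forget gate f,
    self-circulating neuron state c, output gate o."""
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    c_t = f_t * c_prev + i_t * squash_g(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    h_t = o_t * squash_h(c_t)
    return h_t, c_t
```

The forget gate scales the previous cell state, so the block can retain or discard history per time step instead of overwriting it as a vanilla RNN does.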

Construction of QPLSTM Used for Satellite Clock Offset Prediction
The current BDS constellation is composed of GEO, IGSO, and MEO, and the orbit type of each satellite and the atomic clock mounted on it are shown in Table 2.
The various orbit types and satellite clock characteristics mean that the traditional polynomial-plus-periodic-term modeling method cannot achieve good forecast results for all satellites [35,36]. The method proposed in this paper models all satellites for which observations are available, except the satellite under test, and its forecasting performance is significantly improved compared with conventional methods.
To better understand the proposed method, this section first introduces the on-board characteristics of the satellite clock and then introduces the new QPLSTM model, which combines the traditional method with LSTM.
Assume that the satellite clock offset is $Cx(t)$; it can be decomposed into a system variation component, a periodic variation component, and a random variation component [37,38], written in the following form:

$$Cx(t) = a_0 + b_0 t + c_0 t^2 + \sum_{i=1}^{n_p}\left(cs_i \cos(i\mu) + ss_i \sin(i\mu)\right) + \Delta(t)$$

The first three terms are caused by the characteristics of the satellite clock itself: $a_0$ is the initial phase, $b_0$ is the frequency deviation, and $c_0$ is the frequency drift. The fourth term is the periodic variation component caused by the on-board motion of the satellite, where $n_p$ is the number of periodic variation components contained in the clock offset sequence, $cs_i$ is the cosine coefficient of the $i$th period term, $ss_i$ is the sine coefficient of the $i$th period term, and $\mu$ is the satellite orbit angle calculated from midnight. The fifth term, $\Delta(t)$, is the noise caused by environmental factors, which is a random variation component.
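To make the decomposition concrete, the deterministic part of this model can be fitted by linear least squares; the residual returned is the component that the LSTM later models. This is a minimal sketch on synthetic data (one periodic term, $n_p = 1$), not the paper's actual fitting code:

```python
import numpy as np

def fit_qp_periodic(t, cx, mu, n_p=1):
    """Least-squares fit of Cx(t) = a0 + b0 t + c0 t^2
    + sum_i (cs_i cos(i mu) + ss_i sin(i mu))."""
    cols = [np.ones_like(t), t, t ** 2]
    for i in range(1, n_p + 1):
        cols += [np.cos(i * mu), np.sin(i * mu)]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, cx, rcond=None)
    return coef, cx - A @ coef   # coefficients and residuals
```

On noise-free synthetic data the known coefficients are recovered exactly and the residual is numerically zero; on real clock data the residual carries the periodic mismatch and noise used as LSTM features.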
In many cases, the traditional polynomial model has proved to be a reliable method for BDS clock offset modeling [23]. Instead of using a longer time series to analyze the periodic term and the noise term of the random variation component, the method proposed in this paper takes the periodic term and the noise term as features to construct the LSTM model.
When using LSTM for prediction in practice, it is very important to determine the number of input and output layers and the number of hidden layers in the network, but there is no theory that can be followed. It is usually necessary to ensure that the vector scale of the input layer matches that of the output layer, and the biggest difficulty then lies in determining the number of hidden layer units. Before determining the number of hidden layer units, the other parameters of the LSTM for satellite clock offset prediction must be set, mainly the loss function, the training method, and the initial parameters: (1) Selection of the loss function for LSTM training. The choice of LSTM loss function differs between scenarios [39]. Taking into account the complexity of the clock error model, we use the mean square error loss function, which is suitable in this scenario and often used in nonlinear time series forecasting; it is defined as $L = \frac{1}{2}\sum_i \|\hat{o}_i - o_i\|^2$, where $\hat{o}_i$ represents the predicted value of the output layer (2) Selection of the training algorithm. Truncated Back Propagation Through Time (BPTT) based on gradient descent has advantages such as rigorous derivation, a solid theoretical foundation, and strong versatility, and it is usually used as the main learning algorithm [34,40]. Therefore, we also use it to train the LSTM model for satellite clock offset prediction. Since we do not have to worry about gradient disappearance or explosion caused by gradient descent, we only need to set the number of learning iterations, the learning round size, and the dropping rate. The LSTM is then trained as follows: (a) Initializing the parameters of the network. The method of adaptive moment estimation is used to adjust the learning rate.
Set the initial learning rate $\eta$, learning round size $N$, dropping iteration $n$, and dropping rate $lrate$; after every $n$ rounds of adaptive learning, the next round's learning rate is $\eta^{(i+1)} = \eta^{(i)} \cdot lrate$, where $i \in [1, \lceil N/n \rceil]$. (b) Setting the LSTM as a single-memory-block network, with the transfer weights of each layer denoted $W_i$; the input gate, forget gate, output gate, and self-circulating neuron are updated according to the gate equations given in Section 2. Before the end of the $N$th iteration of training, the learning rate changes continually, and the gradients and weights are continuously updated according to the above formulas.
(3) Optimization of the model parameters. GA is a global optimization algorithm that follows a fitness function and uses selection, crossover, and mutation genetic operators based on multipoint random parallel search, effectively preventing the parameters from converging to local optima [41,42]. It is therefore selected to optimize the number of hidden layer units in the LSTM. The fitness function of the GA is set as $f_{\text{fitness}} = 1/\text{RMS}$, where RMS is the root mean square error between the LSTM's forecast result and the actual value, defined as follows:

$$\text{RMS} = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(Cx_i - \hat{Cx}_i\right)^2}$$

In the above formula, $K$ represents the total number of targets that need to be predicted, $Cx_i$ represents the actual observation, and $\hat{Cx}_i$ represents the prediction by the LSTM.
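The learning-rate dropping schedule described in step (2) can be sketched as follows; it is a plain piecewise-constant decay, shown here only to make the update rule concrete:

```python
def learning_rates(eta0, N, n, lrate):
    """Piecewise-constant schedule: after every n rounds,
    eta^(i+1) = eta^(i) * lrate, for i in [1, ceil(N/n)]."""
    rates, eta = [], eta0
    for round_idx in range(1, N + 1):
        rates.append(eta)
        if round_idx % n == 0:
            eta *= lrate   # drop the rate after each block of n rounds
    return rates
```

With the paper's initial rate of 0.005, the rate is simply halved (or scaled by whatever `lrate` is chosen) every `n` rounds until the $N$th round.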
By training the above LSTM model, the satellite clock offset prediction model can be constructed. The flow of the proposed model is shown in Figure 4. First, a data preprocessing method is used to remove gross errors from the original data; the specific methods and effects are introduced in the next section. The system variation component of the satellite clock is then obtained through the QP model, the residuals of the QP model fit are standardized and divided into training data and test data, and the LSTM is trained under a fixed number of hidden layer units.
The prediction and the test data are used to calculate the RMS together to help the GA find the optimal number of hidden layer units and obtain the optimal residual fitting model. Finally, by combining the residual fitting model and the QP model, QPLSTM is constructed.
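The GA search over the number of hidden-layer units can be sketched as below. `evaluate` is a hypothetical stand-in for training an LSTM with a given unit count and returning its fitness $f_{\text{fitness}} = 1/\text{RMS}$; the population size, generation count, and mutation scheme are our own illustrative choices:

```python
import math
import random

def rms(actual, predicted):
    """RMS between forecasts and actual observations over K targets."""
    K = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / K)

def fitness(actual, predicted):
    return 1.0 / rms(actual, predicted)   # f_fitness = 1/RMS

def ga_hidden_units(evaluate, low, high, pop=8, gens=10, pmut=0.2, seed=0):
    """Tiny integer GA over the hidden-unit count; evaluate(h) is assumed
    to train an LSTM with h units and return its fitness."""
    rng = random.Random(seed)
    popu = [rng.randint(low, high) for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(popu, key=evaluate, reverse=True)
        elite = scored[: pop // 2]                 # selection
        children = []
        while len(children) < pop - len(elite):
            a, b = rng.sample(elite, 2)
            child = (a + b) // 2                   # crossover (blend)
            if rng.random() < pmut:                # mutation
                child = min(high, max(low, child + rng.randint(-3, 3)))
            children.append(child)
        popu = elite + children
    return max(popu, key=evaluate)
```

In the actual pipeline, each fitness evaluation involves a full LSTM training run, so the multipoint parallel search of the GA is what keeps the cost of escaping local optima manageable.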
A machine learning method uses prior data to learn the complex nonlinear relationship between the past and the future. In practical applications, if we need to predict $T_l$ periods of data in the future and have $N_l$ periods of prior data, then, to make better use of LSTM's selectivity for time-series features, we use all $N_l - T_l$ periods of data for training and use GA to find the best correction model, which is then saved. The trained model is run online to provide real-time QP model correction terms, and during online operation there is no need to retrain on each data set. When the model becomes inaccurate, we change back to the traditional prediction method and, at the same time, collect new data and train a new LSTM model to construct QPLSTM.
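This online fallback logic can be sketched as a thin wrapper; all four callables here are hypothetical interfaces introduced only for illustration, not names from the paper:

```python
def predict_with_fallback(qplstm_predict, qp_predict, recent_rms, threshold):
    """Serve QPLSTM corrections online, but fall back to the traditional QP
    prediction when the model's recent RMS exceeds a threshold."""
    def predict(window):
        if recent_rms() > threshold:
            return qp_predict(window)     # model deemed inaccurate: fall back
        return qplstm_predict(window)
    return predict
```

The key design point is that the accuracy check is evaluated at call time, so a degrading model is replaced immediately while a new LSTM is retrained in the background.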
In this section, QPLSTM has been constructed, and the entire algorithm flow is shown in Figure 4. The data preprocessing method and its performance are presented at the beginning of Section 4, and then the experiments and evaluation of QPLSTM are conducted on the processed data.

Data Preprocessing.
During the operation of the atomic clock on-board, changes in the external environment and abnormal hardware signal output of the atomic clock can cause abnormal conditions such as gross errors and data discontinuities in the satellite time-frequency sequence.
Therefore, before establishing the satellite clock offset prediction model, it is necessary to ensure that the observed time-frequency characteristic sequence of the on-board atomic clock is clean and stable [42,43]. A combined gross error detection method is adopted to deal simultaneously with the gross errors contained in the phase and frequency data [44].
The method is as follows. First, the phase sequence of the satellite clock offset observations is processed through the 3-sigma rule. A quadratic polynomial is fitted to the clock offset phase sequence, and the standard deviation of the fitting residuals is defined as

$$\delta = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(\upsilon_j - \text{mean}(\upsilon)\right)^2}$$

where $Cx$ represents the satellite clock offset phase sequence obtained by observation, $\upsilon$ represents the quadratic polynomial fitting residual vector of the clock offset phase sequence, and $n$ represents the number of epochs of phase data. If the $j$th phase residual $\upsilon_j$ satisfies $|\upsilon_j - \text{mean}(\upsilon)| > 3\delta$, it is regarded as a gross error, where $\text{mean}(\upsilon)$ is the mean value of $\upsilon$.
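A minimal NumPy sketch of this phase-domain check on synthetic data; the paper does not give implementation details, so the quadratic fit via `polyfit` is our own choice:

```python
import numpy as np

def three_sigma_outliers(cx):
    """Flag phase gross errors: fit a quadratic polynomial, then mark
    residuals farther than 3*delta from the residual mean."""
    n = len(cx)
    t = np.arange(n, dtype=float)
    coef = np.polyfit(t, cx, 2)            # quadratic polynomial fit
    v = cx - np.polyval(coef, t)           # fitting residual vector
    delta = np.sqrt(np.sum((v - v.mean()) ** 2) / (n - 1))
    return np.abs(v - v.mean()) > 3 * delta
```

An injected spike on an otherwise smooth quadratic sequence is the only epoch flagged, since all other residuals stay well inside the 3-sigma band.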
After the phase data is processed, a well-known gross error detection method, the Median Absolute Deviation (MAD) method, is applied to the frequency data of the satellite clock offset observations. The frequency sequence is obtained by differencing the phase data, $b_k = (Cx_{k+1} - Cx_k)/\Delta t$, and the MAD threshold is

$$M = \frac{\text{median}\left(\left|b_k - m\right|\right)}{0.6745}$$

where $M$ is the threshold of the gross error detection model, $u$ is the number of epochs of the frequency data, $\Delta t$ is the sampling interval of the phase data, and $m$ is the median of the frequency data. When the $k$th epoch value in the frequency data satisfies $|b_k| > (m + r \cdot M)$, it is regarded as a gross error ($r = 3$ in this model).
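A sketch of the frequency-domain check, implementing the stated condition $|b_k| > (m + r \cdot M)$ literally; the 0.6745 scale factor is the conventional MAD normalization and is assumed here:

```python
import numpy as np

def mad_outliers(cx, dt, r=3.0):
    """Frequency-domain gross-error detection: difference the phase to get
    frequency, then threshold against the median absolute deviation."""
    b = np.diff(cx) / dt                   # frequency from phase differences
    m = np.median(b)
    M = np.median(np.abs(b - m)) / 0.6745  # MAD threshold
    return np.abs(b) > (m + r * M)
```

Note that a single phase jump produces two adjacent frequency spikes (one up, one down), and both are flagged.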
In order to verify the effectiveness of the above method, we select the precise clock offset product of the C36 satellite on October 15, 2020, as a representative to show the effect after data preprocessing. Both the raw data and the processed data are shown in Figure 5.
In Figure 5, the gross error is obviously eliminated through the combined gross error detection method.
The data used in the following experiments are all preprocessed, and the selected data are continuous, without jumps, within each day. Unless otherwise specified, we regard the 30-second-sampling-interval precise clock offset product provided by IGS as the most accurate value of the clock offset.
The LP, QP, GM (1,1), and ARIMA models are used as comparisons to QPLSTM in the following experiments. The root mean squared error (RMS) and mean absolute error (MAE), calculated from the prediction result of each model and the real data selected as testing data, are used to evaluate the performance of each prediction model:

$$\text{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Ypre_i - Yreal_i\right)^2}, \quad \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Ypre_i - Yreal_i\right|$$

where $Ypre_i$ represents the $i$th output of the prediction method and $Yreal_i$ represents the target testing data.
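The two evaluation metrics are straightforward to compute; a short sketch:

```python
import numpy as np

def rms_error(y_pre, y_real):
    """Root mean squared prediction error."""
    y_pre, y_real = np.asarray(y_pre), np.asarray(y_real)
    return float(np.sqrt(np.mean((y_pre - y_real) ** 2)))

def mae_error(y_pre, y_real):
    """Mean absolute prediction error."""
    y_pre, y_real = np.asarray(y_pre), np.asarray(y_real)
    return float(np.mean(np.abs(y_pre - y_real)))
```

RMS penalizes large excursions quadratically while MAE weights all errors equally, which is why the two tables can rank borderline cases slightly differently.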

Experiment One.
We selected the ultrarapid clock offset product with a sampling interval of 15 min provided by the Wuhan University GNSS monitoring station, updated at 03:00 on October 15, 2020, for the experiment. Twelve hours of ultrarapid observation data were used as prior data to train the LSTM model, the subsequent 6 hours of clock offset data were then predicted, and the result was compared with the IGU-P product.
As the current ultrarapid clock offset products only contain BDS-2 satellites, we give the 6 h prediction residual curves of QPLSTM and IGU-P for each type of condition, calculated from the prediction result and the observation value. Experiments were conducted on all satellites available in the ultrarapid clock offset products during the day, divided into three categories: BDS2-MEO-Rb, BDS2-GEO-Rb, and BDS2-IGSO-Rb. The simulation results are shown in Figures 6-8. Curves of different colors represent satellites with different PRN numbers; the curves marked with stars represent the IGU-P prediction errors, and the curves marked with plus signs represent the QPLSTM prediction errors. It can be seen that the clock offset value predicted by QPLSTM is closer to the actual clock offset observation value than the IGU-P product. When the external environment has only a small impact on the clock offset, the QPLSTM method improves only slightly over the IGU-P method, as for PRN16 and PRN14.

Experiment Two.
To further highlight the superiority of the QPLSTM method, the precise clock offset products provided by IGS on October 15, 2020, are used for the experiments: the data from the first 5 hours of the day (600 epochs, 0:00~5:00) are used to predict the clock offset in the next 30 minutes and 60 minutes, and QPLSTM is compared with the other four traditional methods. The real precise data are selected as the testing data to verify the accuracy of each prediction model's output. We select the (C12), (C01, C03), (C08, C09, C16), (C19, C20, C37, C45), (C27, C28, C34, C42), and (C40) satellites as representatives of BDS clocks in different scenarios. Figure 9 shows the prediction accuracy of the clock offset for the selected 15 BDS satellites under the five models for the two durations. It can be seen that none of the LP, QP, GM (1,1), and ARIMA models can bring the RMS below 0.3 ns in 1 h forecasting for every type of satellite.
As Figure 9 shows, the prediction accuracy of the QPLSTM model is better than that of the other four models over the two prediction durations for all 15 satellites. In particular, for C01, QPLSTM exhibits high prediction accuracy over both durations. Tables 3 and 4 give the average prediction RMS and MAE of each category of satellite on October 15, using the first 5 hours of data (600 epochs, 0:00~5:00) to forecast the next 30 min and 1 h. In the tables, "All" represents the average prediction accuracy over all observed satellites.

In Tables 5 and 6, the averaged prediction RMS and MAE are calculated for every hour of October 15, with the preceding five hours of data used for training and the next 30 min and 60 min used for testing (e.g., data in 19:00~24:00 on October 14 are used for training, and data in 0:00~1:00 on October 15 are used for testing). Each single hour is tested, and the results are averaged over the 24 hours for each clock category. "All" represents the average prediction accuracy over the 24 single-hour tests for all observed satellites.
As can be seen from Tables 3 and 4 and from Tables 5 and 6, QPLSTM is stable for any hour of the day. Compared with the other four models, the average prediction accuracy of the 30 min and 60 min forecasts is improved by 79.6%, 69.2%, 80.4%, and 77.1% and 68.3%, 52.7%, 66.5%, and 69.8% in RMS, and by 83.1%, 73.0%, 83.8%, and 75.6% and 73.8%, 58.8%, 74.5%, and 74.6% in MAE. This shows that the forecasting ability of the QPLSTM method is not accidental: QPLSTM achieves more precise short-term prediction than the other four models for every category of BDS satellite clock.

Experiment Three.
To further explore the versatility and flexibility of the proposed method, the precise clock offset data provided by IGS on November 9, 2020, are used in this part of the experiments, taking the data of the first 5 hours and the first 10 hours of the day to predict the clock offset over the same next hour (120 epochs, 10:00~11:00), and QPLSTM is again compared with the other four traditional methods. Figure 10 shows the prediction accuracy of the satellite clock offset for the selected 15 BDS satellites under the five models. It can be seen that the data of different days have a significant impact on the accuracy of clock offset prediction for the LP, QP, GM (1,1), and ARIMA models, and the accuracy for most satellites is worse than 0.3 ns. The QPLSTM model, however, performs stably and better than the other four models.
It can be seen from Figure 10 that the modeling effect of the QPLSTM model is better than the other four models even when the scale of prior data used for training differs. For the C34 satellite, it can be clearly seen that with the increase of prior data, the prediction accuracy of the four traditional models is significantly reduced, while the prediction accuracy of the QPLSTM model is improved. For the C27 and C19 satellites, the prediction accuracy of the LP and GM (1,1) models improves with the increase of prior data. For the C16 satellite, the prediction performance of all five models decreases even with the increase of prior data.
To show the prediction accuracy of the five models and the influence of the increased prior data more precisely, Tables 7 and 8 list the average prediction RMS and MAE for each category of satellite using 5 h or 10 h of prior data to forecast the same one hour. Every hour of November 9, 2020, is included in the test, and the averaged RMS is then calculated. As can be seen from Tables 7 and 8, with the increase of prior data the average prediction accuracy of QPLSTM improves for every clock category, while the prediction accuracy of the other four models does not show the same trend. For the BDS2-MEO-Rb category, the prediction accuracy of the LP, GM (1,1), and ARIMA models remains stable and changes little, the accuracy of the QP model decreases significantly, and that of QPLSTM improves significantly. For the BDS2-GEO-Rb and BDS2-IGSO-Rb categories, the prediction accuracy of LP and GM (1,1) decreases significantly, while the performance of the other three models is stable. For the BD3-MEO-Rb category, the prediction accuracy of LP and GM (1,1) decreases significantly, while that of QP, ARIMA, and QPLSTM improves. For the BD3-MEO-H category, the prediction accuracy of the QP and ARIMA models decreases, the performance of LP and GM (1,1) is relatively stable, and the prediction accuracy of the QPLSTM model improves. For the BD3-IGSO-H category, the prediction accuracy of the LP, GM (1,1), and ARIMA models decreases, while the accuracy of QP and QPLSTM improves.
Finally, the short-term clock offset prediction performance of the five models is summarized as follows: (1) For the LP and QP models, noise is the main factor that causes the prediction accuracy to vary with the prior data. The performance of the LP model and the GM (1,1) model is largely similar, and in some cases they are better than the QP model; at the same time, with the increase of prior data, the performance of both decreases to different degrees. The prediction performance of these two models is also greatly affected by the satellite's environment, which limits their wide application to BDS clock offset prediction. The prediction accuracy of the ARIMA model is greatly affected by its AR and MA orders; in practice, its optimal order must constantly be re-searched as the satellite changes or different input data are used, requiring frequent model changes. Thus, within a range of fixed orders, it performs unevenly. (2) Compared with the other four models, the QPLSTM model is suitable for every category of clock, and changes in the on-board clock environment do not have a significant impact on its prediction ability. QPLSTM can further improve its modeling accuracy as the prior data increase, and its modeling accuracy can reach below 0.2 ns. Compared with the clock offset prediction product provided by IGU-P, QPLSTM is more effective. Moreover, LSTM is introduced into the model as a correction term, making its application scenarios more flexible.

Conclusions
Aiming to improve the performance of clock offset prediction for BDS, this paper proposes a high-precision short-term clock offset prediction method based on LSTM that considers the characteristics of the equipped atomic clocks, named QPLSTM. To eliminate outliers in the clock offset data and improve the LSTM modeling performance, a combined gross error detection method is introduced to preprocess the original data, and GA optimization is used to select the number of LSTM hidden layer units. Experiments show that the QPLSTM method is better than the IGU-P solution and that, compared with the four conventional models, its 30 min and 60 min prediction accuracy is improved by approximately 79.6%, 69.2%, 80.4%, and 77.1% and 68.3%, 52.7%, 66.5%, and 69.8% in RMS. QPLSTM is also more versatile and stable. In addition, this modeling approach, which combines physical properties with neural networks, can provide a reference for other fields.
Due to the limitations of LSTM theory, and although the LSTM network has many variants and ways of being applied, the amount of computation increases significantly when the prior data scale is enlarged in pursuit of high prediction performance. LSTM solves the gradient problem of RNN only to a certain extent: this does not mean that the gradient will not vanish or explode when a much larger-scale time series is used as the training input. The next step in using LSTM to construct a satellite clock offset prediction model is to find a way that maintains high-precision prediction while avoiding the increased computational burden and the gradient problems caused by deep learning structures.

Data Availability
The experimental data in the manuscript are all public data and can be downloaded from the analysis center of Wuhan University website (http://www.igs.gnsswhu.cn/).

Conflicts of Interest
The authors declare that there is no conflict of interests regarding the publication of this paper.