Improved Generalized Predictive Control for High-Speed Train Network Systems Based on EMD-AQPSO-LS-SVM Time Delay Prediction Model

Various control signals of high-speed trains (HSTs) are transmitted through the train communication network. However, the time delay generated during transmission poses a significant threat to the stability and safe operation of the train. To overcome the effect of time delay on the train control system, a least squares support vector machine (LS-SVM) time delay prediction model based on empirical mode decomposition (EMD) and adaptive quantum particle swarm optimization (AQPSO) is proposed in this paper. The EMD algorithm is used to decompose the time delay sequence into several subsequences, which emphasizes the different local characteristics of the time delay sequence. By improving the calculation method for the success value of particle iteration, an AQPSO algorithm with an adaptive contraction-expansion coefficient is designed to optimize the parameters of the different LS-SVM models that predict each time delay component, which improves the prediction accuracy of network delay. Further, based on the actor-critic reinforcement learning algorithm, an improved generalized predictive control method is proposed for the train network system. The actor-critic network is used to predict the future output of the system, and the recursive least squares identification algorithm with a variable forgetting factor is adopted to identify the future system model parameters. Combined with the accurately predicted time delay, the control quantity is sent in advance according to a properly arranged time series, which efficiently compensates for the influence of the time delay on the control system. Simulation results show that, compared with other control methods, the proposed method has better robustness and stability, which ensures the safe operation of high-speed trains under various working conditions.


Introduction
At present, HSTs and urban rail vehicles all use the train communication network (TCN) to realize train control and fault diagnosis [1]. All kinds of control signals are transmitted to the corresponding actuators through the wire train bus (WTB) and the multifunction vehicle bus (MVB). However, the time delay that arises during information transmission seriously affects the safety and stability of the train control system [2]. In addition to the end-to-end time delay in the TCN, there is also the time delay generated by signal processing, control logic judgment, and similar operations. If the time delay is too long, it greatly affects the stability of the control system [3]. The network environment of HSTs is complex, and key systems such as traction and braking have obvious nonlinear characteristics [4]. In order to suppress the adverse effect of time delay on the control performance, it is necessary to test and study the real network characteristics of the TCN so that real-time and stable control can be realized according to the actual nonlinear characteristics of the key train control systems. In recent years, some scholars have studied the scheduling algorithms of the train network and the time delay problem of composite Ethernet [5,6], but there are still few reports on the time delay control of the TCN.
Controller design and time delay prediction are the two main problems to be solved in networked control systems. For controller design, researchers have combined network control with sliding mode control [7], neural network control [8], $H_\infty$ control [9], and other related theories, which provides a variety of solutions for the networked control of nonlinear systems. Li et al. [7] studied the tracking control problem of networked control systems for intelligent vehicles with external disturbance and network-induced delay; a high-order sliding mode controller was designed to reduce the effect of the external disturbance, and a state observer was used to compensate for the time delay disturbance in the network. Xu et al. [8] proposed a robust adaptive neural network control method to compensate for the system uncertainty and the network delay disturbance, which solves the remote-control problem of ship course with uncertain time delay. Chen et al. [9] designed an $H_\infty$ sampling controller by constructing a Lyapunov-Krasovskii functional with delay, which realizes the networked control of an asynchronous traction motor.
For time delay prediction, some scholars use regression models to accurately model the time delay sequence samples [10,11]. However, the process of solving the model parameters is too complex, so this approach is not suitable when the network delay fluctuates over a large range. By introducing a compensation device into the feedback loop, the Smith predictor can eliminate the adverse effect of network delay on the control system [12,13]. In practical applications, however, when the parameters of the controlled system are unstable or disturbed, the Smith prediction model fails, which degrades the control effect and can even cause oscillation. Owing to their strong nonlinear identification ability and fast operation speed, neural networks can use past data to predict the future state of the system, which realizes time delay prediction and compensation [14,15]. Nevertheless, the neural network prediction method easily falls into local extrema and relies too much on the autocorrelation coefficient of the input time delay sequence. The support vector machine (SVM) has unique advantages in dealing with nonlinear, small-sample, and high-dimensional recognition problems and has stronger generalization ability than neural networks, which makes it suitable for predicting network delay with strong nonlinear characteristics [16]. Suykens and Vandewalle [17] proposed the LS-SVM algorithm to make up for the disadvantages of SVM, such as long computing time and heavy computation. Under equality constraints, the convex quadratic programming problem in SVM is transformed into the solution of a system of linear equations, which greatly reduces the training time of the model. Considering the uncertainty and nonlinearity of the time delay, Tian et al. [18] introduced EMD into the LS-SVM time delay prediction model, and the time delay sequence was decomposed into several intrinsic mode functions that were predicted separately, which reduces the modelling complexity. Since the kernel function parameters of LS-SVM have a great impact on the learning and generalization ability of the model and are difficult to determine uniformly, Tian et al. [19] used a genetic algorithm for offline optimization of the LS-SVM kernel function parameters, which effectively improves the prediction accuracy of the time delay.
However, the network environments and nonlinear controlled objects of the above methods are completely different from those of the train network control system, so these methods are difficult to apply to the high-speed train network system, which has high real-time performance requirements. As a model-based advanced control method, generalized predictive control (GPC) offers advantages such as a predictive model, control optimization, rolling implementation, and a stable output variable, and it is widely used for tracking control of complex nonlinear systems such as HSTs [20], spacecraft [21], and underwater robots [22]. Li and Yan [20] designed a fast GPC algorithm based on an extreme learning machine for the speed tracking control of HSTs; the extreme learning machine was used to learn the parameter mapping between the system model and the controller, which greatly reduces the computational burden of the algorithm. Chen et al. [21] proposed a GPC method with an extended state observer to solve the attitude tracking control problem of spacecraft; the extended state observer, built on the hyperbolic tangent function, estimates and compensates for the uncertainties and unknown disturbances of the system, which achieves high-precision tracking of the spacecraft attitude. Zhu et al. [22] designed an improved generalized predictive control (IGPC) method for the motion control of underwater robots, in which an incremental proportional-integral-derivative (PID) algorithm was used to optimize the generalized predictive controller in the initial stage, which improves the stability of the system.
In this paper, an IGPC method based on the actor-critic reinforcement learning algorithm is proposed for the key nonlinear network control system of the train, and the EMD-AQPSO-LS-SVM time delay prediction model is introduced into the IGPC method to reduce the impact of time delay on the control effect. The EMD algorithm is used to decompose the original time delay sequence into several intrinsic mode functions (IMFs), and by improving the calculation method for the particle iteration success value, an AQPSO algorithm with a dynamic contraction-expansion coefficient is designed to optimize the parameters of the different LS-SVM models, which improves the prediction accuracy of each time delay component. The actor-critic reinforcement learning algorithm and the recursive least squares (RLS) method with variable forgetting factor are used to predict the future output and identify the future parameters of the IGPC, respectively, and predictive controllers are designed for each real-time linear system. Combined with the accurate forward time delay prediction results, the output sequence of the control signal is adjusted to compensate for the influence of network delay on control performance. The effectiveness of the proposed method is verified by simulation experiments on a TCN platform.
The rest of this paper is organized as follows. In Section 2, the high-speed train network control system model is introduced. In Section 3, the LS-SVM time delay prediction model is designed based on the EMD and AQPSO algorithms, and the AQPSO algorithm with dynamic contraction-expansion coefficient is proposed to optimize the LS-SVM model parameters. In Section 4, the IGPC strategy for HSTs is designed based on the reinforcement learning algorithm. Section 5 analyses and compares the performance of different time delay prediction models and verifies the real-time performance and effectiveness of the proposed method. Section 6 discusses the advantages and limitations of the proposed method and future research directions. Section 7 concludes this paper.

High-Speed Train Network Control System
The general train network control system model can be described by (1), where $y(k)$ is the speed of the train, $\chi$ is the acceleration coefficient, $u(k-\tau_{ca})$ is the unit control force of the train with the forward channel time delay $\tau_{ca}$ expressed in sampling periods (that is, divided by the sampling period $T$), $d(k)$ is the unit additional resistance caused by the complicated operating environment such as wind, tunnels, and curves, and $f_0(y(k))$ is the unit general resistance of the train, which can be described as [3]

$$f_0(y(k)) = \alpha_0(k) + \alpha_1(k)\,y(k) + \alpha_2(k)\,y^2(k), \tag{2}$$

where $\alpha_0(k)$ is the rolling mechanical resistance coefficient, $\alpha_1(k)$ is the other mechanical resistance coefficient, and $\alpha_2(k)$ is the external air resistance coefficient.
Combining (1) and (2), the system model can be rewritten as (3), where $\alpha_0(k)$, $\alpha_1(k)$, and $\alpha_2(k)$ are highly uncertain and change constantly with the operating conditions, which gives the train traction control system obvious multiple working conditions and nonlinear characteristics [3].
In practical application, the high-speed train network system is composed of an automatic train operation (ATO) system, a traction control system, and sensors. The network simulation system constructed in this paper is composed of two central control units (CCUs) and a human machine interface (HMI). CCU1 simulates the controller of the ATO system, CCU2 simulates the actuator of the traction control system, and the HMI simulates the sensor. Each device communicates on the MVB network with process data [23]. The network system structure is shown in Figure 1. Figure 1 shows that CCU1 simulates the ATO function and sends the control signal to CCU2 to realize nonlinear control of the train traction system. CCU2 simulates the execution of traction and braking and sends the completed information to the HMI. The HMI measures the output signal of the system and sends feedback to CCU1, which closes the network control loop. In the communication process, the transmission time delay of the MVB network includes two parts: the forward channel time delay $\tau_{ca}$ and the feedback channel time delay $\tau_{sc}$.
Due to the large number of nodes and ports, irregular traffic changes, uneven network load distribution, and other reasons, the information in the TCN is always in a dynamic and uncertain time-varying environment, which makes the analysis and design of the train network control system more complex and difficult [24]. As the performance requirements of HSTs increase, system information must be obtained at a higher frequency, which further increases the transmission delay of control signals. Therefore, it is of considerable significance to adopt appropriate control methods to compensate for and control the time delay generated in the TCN. The premise of controlling the time delay is to understand the characteristics of delay transmission and, on that basis, to establish a description or prediction model.

LS-SVM Time Delay Prediction Model Based on EMD and AQPSO Algorithms
Considering the randomness and nonlinearity of train network delay, the EMD algorithm is used to decompose the time delay sequence into several time delay components, which highlights the local characteristic signals of the original data at different time scales so that the modelling complexity is reduced. Meanwhile, the LS-SVM algorithm is used to build prediction models for the different time delay components, and the predicted values are superposed to obtain the final prediction results. Because the parameters of the LS-SVM algorithm are difficult to determine, an AQPSO algorithm with an adaptive contraction-expansion coefficient is proposed to optimize the model parameters offline, which effectively improves the prediction accuracy of network delay.
The EMD algorithm is suitable for the analysis and processing of nonlinear and nonstationary time series [25]. In addition, each IMF has to meet the following two conditions: (1) over the whole data sequence, the number of extreme points and the number of zero crossings must be equal or differ by at most one; (2) at any point, the average value of the envelopes formed by the local maxima and the local minima is zero [26]. For the forward channel delay sequence $\tau_{ca}(t)$, the EMD algorithm is presented in Algorithm 1.
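The sifting idea behind EMD can be illustrated with a minimal sketch. It is not the paper's Algorithm 1: linear interpolation (`np.interp`) is swapped in for cubic spline fitting to keep the example dependency-free, a fixed number of sifting passes (`n_sift`) is assumed as the stopping rule, and all function names are hypothetical.

```python
import numpy as np

def local_extrema(x):
    """Return indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def envelope_mean(x):
    """Mean of the upper and lower envelopes, or None if too few extrema.

    Assumption: envelopes are piecewise linear here, whereas the paper
    fits them with cubic splines.
    """
    t = np.arange(x.size)
    maxima, minima = local_extrema(x)
    if maxima.size < 2 or minima.size < 2:
        return None
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return 0.5 * (upper + lower)

def emd(x, max_imfs=6, n_sift=10):
    """Decompose x into a list of IMF-like components plus a residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        h = residue.copy()
        m = envelope_mean(h)
        if m is None:            # residue has too few extrema: stop
            break
        for _ in range(n_sift):  # fixed-count sifting (assumed stopping rule)
            h = h - m
            m = envelope_mean(h)
            if m is None:
                break
        imfs.append(h)
        residue = residue - h
    return imfs, residue

# Toy delay-like signal: a slow and a fast oscillation.
t = np.linspace(0.0, 1.0, 400)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imfs, residue = emd(signal)
```

By construction, the components and the residue add back up to the original sequence, which is the property the hybrid model relies on when the component predictions are summed.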

Time Delay Sequence Modelling Based on LS-SVM Algorithm.
As a variant of the SVM algorithm, LS-SVM transforms the inequality-constrained problem into the solution of a linear system and has the advantages of simple operation, fast training speed, and strong generalization ability, so it is widely used in the prediction modelling of complex data sequences [27]. Therefore, we adopt the LS-SVM algorithm to train models for the time delay subsequences produced by EMD. For a given training set $U = \{(x_i, y_i) \mid x_i \in R^n, y_i \in R, i = 1, 2, \ldots, l\}$, the LS-SVM model can be built as

$$y = w^{T}\varphi(x) + b, \tag{4}$$

where $x$ is the input vector, $y$ is the output, $w$ is the weight vector, $b$ is the offset, and $\varphi(x)$ is the nonlinear mapping function that maps the input space to a high-dimensional feature space, which turns the nonlinear fitting problem in the input space into a linear fitting problem in the feature space. According to the criterion of structural risk minimization, the objective function can be described as [17]

$$\min J(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{i=1}^{l} e_i^{2}, \quad \text{s.t. } y_i = w^{T}\varphi(x_i) + b + e_i, \; i = 1, 2, \ldots, l, \tag{5}$$

where $e_i$ is the estimation error and $\gamma$ is the regularization parameter, which indicates how strongly the error is penalized. To solve the above optimization problem, Lagrange multipliers are introduced into (5) to obtain

$$L(w, b, e, \lambda) = J(w, e) - \sum_{i=1}^{l} \lambda_i \left[ w^{T}\varphi(x_i) + b + e_i - y_i \right], \tag{6}$$

where $\lambda_i$ is the Lagrange multiplier. Setting the partial derivatives of $L$ with respect to $w$, $e$, $\lambda$, and $b$ to zero gives

$$w = \sum_{i=1}^{l} \lambda_i \varphi(x_i), \quad \lambda_i = \gamma e_i, \quad w^{T}\varphi(x_i) + b + e_i - y_i = 0, \quad \sum_{i=1}^{l} \lambda_i = 0. \tag{7}$$

Eliminating $w$ and $e$, the optimization problem is converted to solving the following linear equations:

$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \tag{8}$$

where $\mathbf{1} = [1, \ldots, 1]^{T}$, $\lambda = [\lambda_1, \ldots, \lambda_l]^{T}$, $y = [y_1, \ldots, y_l]^{T}$, $I$ is the identity matrix, and $\Omega_{ij} = \varphi(x_i)^{T}\varphi(x_j) = K(x_i, x_j)$. According to the given time delay data set $U$, using (8) to solve for $\lambda$ and $b$, the LS-SVM time delay prediction model can be obtained:

$$y(x) = \sum_{i=1}^{l} \lambda_i K(x, x_i) + b, \tag{9}$$

where the radial basis function (RBF) is selected as the kernel function $K(x, x_i)$:

$$K(x, x_i) = \exp\left( -\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}} \right), \tag{10}$$

where $x_i$ is the centre vector and $\sigma$ is the Gaussian kernel width. It can be seen from the above modelling process that the regularization parameter $\gamma$ and the kernel width $\sigma$ determine the learning accuracy and generalization ability of the LS-SVM model. If $\gamma$ is too small and $\sigma$ is too large, the penalty on the error function decreases and the kernel function tends to 1, resulting in underfitting of the model. Conversely, if $\gamma$ is too large and $\sigma$ is too small, the kernel function tends to 0 and the generalization ability becomes worse, which makes the model overfit the samples. Thus, we propose an AQPSO algorithm to optimize $\gamma$ and $\sigma$ of the LS-SVM model, which not only improves the prediction accuracy but also enhances the generalization ability of the model for samples with different time delays.
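The modelling steps above reduce training to one linear solve. The following is a minimal sketch under stated assumptions: it uses the standard LS-SVM linear system with a Gaussian RBF kernel, the function names (`rbf_kernel`, `lssvm_fit`, `lssvm_predict`) are hypothetical, and the toy data and parameter values are illustrative rather than taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma, sigma):
    """Solve the LS-SVM linear system for the offset b and multipliers lam."""
    l = X.shape[0]
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = 1.0                      # first row enforces sum(lam) = 0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(l) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, lam, sigma, X_new):
    """Evaluate y(x) = sum_i lam_i K(x, x_i) + b at the rows of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ lam + b

# Toy regression problem (illustrative data, not delay measurements).
X = np.linspace(0.0, 2.0 * np.pi, 40).reshape(-1, 1)
y = np.sin(X).ravel()
b, lam = lssvm_fit(X, y, gamma=100.0, sigma=0.5)
y_hat = lssvm_predict(X, b, lam, 0.5, X)
train_rmse = float(np.sqrt(np.mean((y_hat - y) ** 2)))
```

Re-running the fit with a very small `gamma` or a very large `sigma` flattens the prediction, which mirrors the underfitting case discussed in the text.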

Parameter Optimization Based on AQPSO Algorithms.
As an extension of particle swarm optimization (PSO) in quantum space, the quantum particle swarm optimization (QPSO) algorithm has the advantages of few control parameters, fast convergence speed, and strong search ability, which makes it suitable for complex optimization problems [28]. Based on the δ quantum well, the QPSO algorithm obtains the probability density function of a particle at a certain point by solving the Schrödinger equation and uses the Monte Carlo method to obtain the evolution equation of the particles [29]:

$$p_i^{(t)} = \beta\, pbest_i^{(t)} + (1 - \beta)\, gbest^{(t)}, \qquad x_i^{(t+1)} = p_i^{(t)} \pm \frac{L_i^{(t)}}{2} \ln\frac{1}{\mu}, \tag{11}$$

where $p_i^{(t)}$ is the random centre position of the δ quantum well of the $i$th particle in the $t$th iteration, $pbest_i^{(t)}$ is the best position of the $i$th particle in the $t$th iteration, $gbest^{(t)}$ is the best position of the population in the $t$th iteration, $x_i^{(t+1)}$ is the position of the $i$th particle in the $(t+1)$th iteration, $\beta$ and $\mu$ are uniformly distributed random numbers in the interval [0, 1], and $L_i^{(t)}$ is the δ quantum well characteristic length of the $i$th particle in the $t$th iteration, which is defined as

$$L_i^{(t)} = 2\alpha \left| \frac{1}{N} \sum_{j=1}^{N} pbest_j^{(t)} - x_i^{(t)} \right|, \tag{12}$$

where $N$ is the size of the particle population and $\alpha$ is the contraction-expansion coefficient, which determines the convergence performance of the algorithm. A larger $\alpha$ improves the global searching ability of the particles and prevents local convergence, whereas a smaller $\alpha$ enhances the local searching ability of the particles and improves the convergence accuracy. Note that $\alpha < 1.781$ is the convergence condition of the QPSO algorithm; otherwise, the algorithm diverges [29].
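To make the particle update concrete, here is a minimal QPSO sketch with a fixed contraction-expansion coefficient. It follows the commonly published QPSO update (random attractor between the personal and global bests, characteristic length proportional to the distance from the mean best position); the function name `qpso_minimize`, the bound handling by clipping, and the test function are assumptions made for illustration.

```python
import numpy as np

def qpso_minimize(f, lower, upper, n_particles=30, n_iter=200, alpha=0.75, seed=0):
    """Minimise f over a box with a basic QPSO loop (fixed alpha)."""
    rng = np.random.default_rng(seed)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in pbest])
    g = int(np.argmin(pbest_f))
    gbest, gbest_f = pbest[g].copy(), float(pbest_f[g])
    for _ in range(n_iter):
        mbest = pbest.mean(axis=0)                      # mean of personal bests
        beta = rng.uniform(size=(n_particles, dim))
        mu = 1.0 - rng.uniform(size=(n_particles, dim)) # in (0, 1], keeps log finite
        p = beta * pbest + (1.0 - beta) * gbest         # random attractor
        L = 2.0 * alpha * np.abs(mbest - x)             # characteristic length
        sign = np.where(rng.uniform(size=(n_particles, dim)) < 0.5, -1.0, 1.0)
        x = np.clip(p + sign * 0.5 * L * np.log(1.0 / mu), lower, upper)
        fx = np.array([f(xi) for xi in x])
        improved = fx < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = fx[improved]
        g = int(np.argmin(pbest_f))
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g].copy(), float(pbest_f[g])
    return gbest, gbest_f

# Illustrative use on a 2-D sphere function.
sphere = lambda z: float(np.sum(z ** 2))
best_x, best_f = qpso_minimize(sphere, np.array([-5.0, -5.0]), np.array([5.0, 5.0]))
```

In the setting of this paper, `f` would be the LS-SVM fitness evaluated at a candidate parameter pair, and `alpha` would be recomputed every iteration instead of being held fixed.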
Algorithm 1: EMD of the forward channel delay sequence $\tau_{ca}(t)$.
Initialization: Determine all local extreme points of the original signal $\tau_{ca}(t)$.
Procedure:
(1) repeat
(2) Obtain the upper envelope $e_1$ and lower envelope $f_1$ by cubic spline fitting of all maximum and minimum points
(3) Calculate the average envelope
…
(12) Obtain $e_i$ and $f_i$ by cubic spline fitting of all maximum and minimum points
…

Mathematical Problems in Engineering

On the basis of ensuring algorithm convergence, how to control the value of $\alpha$ is the key to improving algorithm performance and efficiency. At present, studies on this coefficient in the QPSO algorithm mainly focus on a fixed value [29], linear decline [30], and nonlinear decline [31]. Compared with a fixed value, the dynamic decline methods improve the global search performance of the algorithm by taking a larger $\alpha$ in the early stage of iteration and enhance the local search performance by taking a smaller $\alpha$ in the late stage. However, these dynamic decline methods only let the contraction-expansion coefficient change with the number of iterations, which cannot fully reflect the state changes in the actual evolution process of the particles and limits their ability to handle complex and nonlinear optimization problems. In consideration of the limitations of the above parameter evaluation methods, we introduce the concept of the success rate of particle swarm iteration [32,33]. By improving the calculation method for the success value of particle iteration in [33], we propose an adaptive contraction-expansion coefficient evaluation strategy that comprehensively reflects the changes of the particle position state.
The adaptive contraction-expansion coefficient $\alpha(t)$ is given by (13), where $\alpha_{max}$ and $\alpha_{min}$ are the maximum and minimum values of the contraction-expansion coefficient, respectively, and $P_s(t)$ is the success rate of the population in the $t$th iteration, defined by (14) in terms of $S_i(t)$, the success value of the $i$th particle in the $t$th iteration. In [33], $S_i(t)$ is expressed by (15), where $f(pbest_i^{(t)})$ is the fitness of the optimal position of the $i$th particle in the $t$th iteration and $f(gbest^{(t)})$ is the fitness of the global optimal position of the population in the $t$th iteration. From (15), it can be seen that an $S_i(t)$ value greater than 0 represents the success of optimization; otherwise, it represents the failure of optimization [33]. However, this is only a simple piecewise constant function, which ignores the comparison between the optimal position of the particle and the optimal position of the population during the evolution process. As a result, the calculation of $P_s(t)$ and $\alpha(t)$ is not accurate enough, so the algorithm cannot effectively balance the global and local search capabilities and easily falls into a local optimal solution. To address this inaccuracy in the calculation of $S_i(t)$, we evaluate the particle position and state in more detail and improve the calculation method as given by (16). From (16), it can be seen that when the particle iteration is successful, the closer the optimal position of the particle is to the global optimal solution, the closer $f(pbest_i^{(t)})$ is to $f(gbest^{(t)})$, which makes $S_i(t)$ larger. Conversely, when the optimal position of the particle is far from the global optimal solution, there is a large gap between $f(pbest_i^{(t)})$ and $f(gbest^{(t)})$, which makes $S_i(t)$ smaller. Therefore, the calculation accuracy of $P_s(t)$ and $\alpha(t)$ is improved by considering the change of the particle position state, and an AQPSO algorithm that dynamically balances the global and local search capabilities is obtained.
During the algorithm iteration, a larger $P_s(t)$ indicates that the population is still far from the global optimal solution, so $\alpha(t)$ should be increased to enhance the global searching ability of the algorithm, which improves the diversity of the population. A smaller $P_s(t)$ indicates that the population is close to the global optimal solution, so $\alpha(t)$ should be reduced to enhance the local search ability of the algorithm, which ensures the convergence accuracy of the optimization algorithm. To optimize the regularization parameter $\gamma$ and the kernel function width $\sigma$ of the LS-SVM model, the AQPSO algorithm is presented in Algorithm 2. The fitness function used to evaluate a candidate solution is constructed as (17), where $l$ is the total number of samples, $y_j^{*}$ is the actual value of the $j$th sample, and $y_j(\gamma, \sigma)$ is the predicted value of the $j$th sample.
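The feedback from success rate to contraction-expansion coefficient can be sketched as follows. This is only an illustration under loudly stated assumptions: the success value is the simple binary one (1 if a particle's personal best fitness improved in the current iteration, 0 otherwise), standing in for the paper's refined definition, and the mapping from $P_s(t)$ to $\alpha(t)$ is assumed to be linear between $\alpha_{min}$ and $\alpha_{max}$. Both function names and the default bounds are hypothetical.

```python
import numpy as np

def success_rate(pbest_f_prev, pbest_f_curr):
    """Fraction of particles whose personal best fitness improved.

    Assumption: binary success values; the paper's refined success value,
    which also compares against the global best, is not reproduced here.
    """
    success = (np.asarray(pbest_f_curr) < np.asarray(pbest_f_prev)).astype(float)
    return float(success.mean())

def adaptive_alpha(ps, alpha_min=0.5, alpha_max=1.0):
    """Assumed linear map: a higher success rate gives a larger coefficient."""
    return alpha_min + (alpha_max - alpha_min) * ps

# Four particles, two of which improved their personal best.
ps = success_rate(np.array([3.0, 2.0, 5.0, 1.0]), np.array([2.0, 2.0, 4.0, 1.0]))
alpha_t = adaptive_alpha(ps)
```

The only property taken from the text is the direction of the relationship: the coefficient grows with the success rate and stays within its two bounds.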

LS-SVM Time Delay Prediction Model Based on EMD and AQPSO Algorithms.
Assume that, after EMD processing, a component of the TCN time delay sequence $\tau_{ca}(t)$ can be expressed as the following time delay sequence:

$$\tau = \{\tau(1), \tau(2), \ldots, \tau(l)\}, \tag{18}$$

where $l$ is the sequence length. Sorting and transforming the time delay sequence $\tau$, the input and output training sets of the LS-SVM model can be obtained:

$$X = \begin{bmatrix} \tau(1) & \tau(2) & \cdots & \tau(m) \\ \tau(2) & \tau(3) & \cdots & \tau(m+1) \\ \vdots & \vdots & & \vdots \\ \tau(l-m) & \tau(l-m+1) & \cdots & \tau(l-1) \end{bmatrix}, \tag{19}$$

$$Y = \begin{bmatrix} \tau(m+1) & \tau(m+2) & \cdots & \tau(l) \end{bmatrix}^{T}, \tag{20}$$

where $X$ is the input training set, $Y$ is the output training set, and $m$ is the embedding dimension. According to (19) and (20), the LS-SVM model uses the time delay data of the previous $m$ moments to predict the time delay at the next moment. The LS-SVM time delay prediction model based on the EMD and AQPSO algorithms is shown in Figure 2. The hybrid delay prediction model in Figure 2 is established as follows. First, the TCN time delay sequence is decomposed into several IMFs and a residue (R) by the EMD algorithm. Second, all IMF and R data are normalized, and the input and output training sets of each LS-SVM model are generated by using (19) and (20). Third, the AQPSO algorithm is used to optimize the parameters of the different LS-SVM models offline and to establish the prediction model of each time delay component. Finally, the prediction results of each time delay component are inverse normalized, and the final prediction result is obtained by summing the components with equal weights.
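The construction of the training sets from a delay sequence is a sliding-window embedding. A minimal sketch follows; the function name `make_training_sets` and the toy sequence are hypothetical and only illustrate how the previous $m$ samples are paired with the next one.

```python
import numpy as np

def make_training_sets(tau, m):
    """Build (X, Y) so that row i of X holds m consecutive delays
    and Y[i] is the delay that immediately follows them."""
    tau = np.asarray(tau, dtype=float)
    l = tau.size
    if not 0 < m < l:
        raise ValueError("embedding dimension m must satisfy 0 < m < len(tau)")
    X = np.stack([tau[i:i + m] for i in range(l - m)])
    Y = tau[m:]
    return X, Y

# Toy sequence of six delay samples with embedding dimension 3.
X, Y = make_training_sets([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], m=3)
```

In the hybrid model, this step would be applied to each normalized IMF and to the residue separately, giving one training set per LS-SVM model.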

IGPC Strategy for HSTs Based on Reinforcement Learning Algorithm
Considering that HSTs are disturbed by many factors such as operating conditions, line ramps, and curve resistance, the model parameters are highly uncertain, which has a great impact on the control effect. Thus, the actor-critic neural network in reinforcement learning is used to approximate the actual operating conditions of the train and to predict the future output of the train traction system. Meanwhile, the LS-SVM hybrid time delay prediction model is adopted to predict the forward channel time delay and overcome its influence on the control performance. In combination with the prediction results, the RLS algorithm with variable forgetting factor is used to identify the time-varying model parameters, which improves the IGPC law so that abrupt changes of the control parameters are smoothed and the time delay is compensated effectively in real time. The nonlinear network control structure is shown in Figure 3. As shown in Figure 3, the ATO system packs the calculated control quantity $u(k-\tau_{ca})$ and the forward timestamp $T_{ca}$ into a process data packet and sends it to the train traction control system through the MVB network. The actuators in the traction control system execute the latest control law and record the actual forward channel time delay $\tau_{ca}$. The sensor node periodically acquires the output $y(k)$ according to the set sampling time, packs $y(k)$, the feedback timestamp $T_{sc}$, and $\tau_{ca}$ into a process data packet, and sends it to the ATO system through the MVB network. The ATO system analyses the actual output $y(k)$ together with the speed and control quantity at the previous moment, and the actor-critic neural network is used to predict the future output $\hat{y}(k)$ of the system. Furthermore, combined with the RLS identification and the IGPC algorithm, the appropriate control quantity $u(k)$ is obtained in advance. Meanwhile, using the prediction results of the LS-SVM hybrid time delay prediction model, the time series is arranged so that the appropriate $u(k-\tau_{ca})$ is sent out in advance, which compensates for the influence of $\tau_{ca}$ on the control performance. In addition, because the clocks of the subdevices are not consistent, clock correction must also be performed when the timestamps are received.

Algorithm 2: AQPSO optimization of the LS-SVM model parameters.
Initialization: Set the size of the population $N$, the maximum number of iterations $T_0$, the tolerable error $lr$, and the random positions of the particles.
…
Calculate the fitness value of the $i$th particle by (17)
…
(10) Calculate the iteration success value $S_i(t)$ of each particle by (16)
(11) end for
(12) Calculate the success rate of the population $P_s(t)$ by (14)
(13) Calculate the contraction-expansion coefficient $\alpha(t)$ by (13)
(14) for $i = 1:N$ do
(15) Calculate the δ quantum well characteristic length $L_i^{(t)}$ of each particle by (12)
(16) …

Actor-Critic Neural Network Multistep Prediction.
Neural networks have become a common method for dealing with nonlinear systems because of their good fitting characteristics. Control methods based on neural networks can effectively solve the problems caused by the uncertainty and nonlinearity of the system [34]. Built on neural network technology, reinforcement learning is a control method with stronger learning ability and higher robustness. It emphasizes that agents modify their action strategies according to the return from the environment after each action so as to reach an optimal decision, and it has been widely applied in artificial intelligence and intelligent control [35,36]. In a reinforcement learning algorithm based on the actor-critic structure, the actor network is responsible for learning optimal decisions so that the agent can choose the best action for different environmental states, and the critic network is responsible for fitting the value function, which improves the agent's ability to evaluate the environment. The combination of the two networks ensures the effectiveness and robustness of the reinforcement learning algorithm [37,38]. In [37], an actor-critic-based reinforcement learning algorithm is used to approximate the value function and the uncertainty of the system, respectively, which effectively solves the deterministic nonlinear discrete-time tracking control problem in the presence of input constraints. In [38], an actor-critic reinforcement learning algorithm based on a fractional gradient descent RBF neural network is proposed to control the inverted pendulum system, which improves the convergence speed and stability. Following the design ideas of the above literature, we adopt RBF neural networks to construct the critic and actor networks, which approximate the value function and the action function, respectively, learn the nonlinear characteristics of the traction system, and predict the future output of the system.

[Figure 2: LS-SVM time delay prediction model based on EMD and AQPSO algorithms. Recoverable block labels: data inverse normalization, summation, prediction results.]

Design of Critic Neural Network.
The state value function (21) is defined to represent the discounted sum of the expected returns of a given strategy starting from time $k$ [37], where $\gamma_0 \in [0, 1]$ is the discount factor, which determines the present value of future returns. If $\gamma_0 = 0$, the agent is short-sighted and only concerned with maximizing immediate returns. As $\gamma_0$ gets closer to 1, the discounted return takes more account of future returns, which means that the agent becomes more far-sighted. $R(k)$ is the utility function, and it is designed as [37]

$$R(k) = \begin{cases} 0, & e^{2}(k) \le \xi, \\ 1, & \text{otherwise}, \end{cases} \tag{22}$$

where $e(k) = y_r(k) - y(k)$ represents the error between the desired speed and the actual speed of the train, $\xi$ is the threshold, and $R(k)$ represents the current system performance index; that is, $R(k) = 0$ stands for ideal tracking performance and $R(k) = 1$ indicates poor tracking performance. The temporal difference (TD) error $e_c(k)$ reflects how good the action selected by the actor neural network is. Rewriting (21) in recursive form, $e_c(k)$ is defined by (23) [36], where $e_c(k) > 0$ indicates that the actual effect is better than expected, so the next decision will be more inclined to choose this action, and $e_c(k) < 0$ means that the actual effect is worse than expected, so the next decision will be less inclined to choose this action. $V(k)$ is approximated by the RBF neural network as

$$\hat{V}(k) = \hat{w}_c^{T}(k)\,\varphi_c(x(k)), \tag{24}$$

where $\hat{w}_c(k)$ is the estimate of the ideal network weight $w_c(k)$, the weight error is defined as $\tilde{w}_c(k) = \hat{w}_c(k) - w_c(k)$, and $\varphi_c(x(k))$ is the Gaussian basis function. Substituting (24) into (23), we obtain (25). The cost function of the critic neural network is defined by (26), and its partial derivative with respect to the weight is given by (27). Taking the partial derivative of (25) gives (28), and substituting (25) and (28) into (27) yields (29). According to the gradient descent method, the weight update law for $\hat{w}_c(k)$ is given by (30), where $\eta_c$ is the learning rate of the critic neural network weight. Substituting (29) into (30), (30) is rewritten as (31).
Note that there are no convergence guarantees with the weight update since it is an approximation to gradient descent but it proved successful in the simulations in this paper. One can refer to [38,39] for an exact gradient descent algorithm with improved convergence guarantees.
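A critic update of this kind can be sketched with the textbook semi-gradient TD(0) rule on a linear-in-features value estimate. This is an assumption-laden illustration, not the paper's update law: the time indexing of the TD error, the function names, and the feature vectors used in the example are all hypothetical, and only the binary utility follows the description in the text.

```python
import numpy as np

def utility(e, xi):
    """Binary utility: 0 for acceptable tracking error, 1 otherwise."""
    return 0.0 if e ** 2 <= xi else 1.0

def rbf_features(x, centres, width):
    """Gaussian basis functions evaluated at input x (one value per centre)."""
    return np.exp(-np.sum((centres - x) ** 2, axis=1) / (2.0 * width ** 2))

def critic_step(w_c, phi_prev, phi_curr, R, gamma0, eta_c):
    """One semi-gradient TD(0) update of the critic weights.

    Assumed textbook form: e_c = R + gamma0 * V(curr) - V(prev), with the
    gradient taken only through V(prev), which is why it is an approximation
    to true gradient descent.
    """
    v_prev = float(w_c @ phi_prev)
    v_curr = float(w_c @ phi_curr)
    e_c = R + gamma0 * v_curr - v_prev
    return w_c + eta_c * e_c * phi_prev, e_c

# Single update from zero weights with hand-picked feature vectors.
w0 = np.zeros(2)
w1, e_c = critic_step(w0, np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                      R=1.0, gamma0=0.9, eta_c=0.1)
```

Only the weight attached to the previously active feature moves, in proportion to the TD error, which is the behaviour the learning-rate discussion in the text refers to.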

Design of Actor Neural Network.
The actor neural network uses historical data to predict the future dynamic output of the system; its ideal model is as follows:

where $w_a(k)$ is the ideal network weight, $\varphi_a(x(k))$ is the Gaussian basis function (the network input is selected as $x(k) = [y(k-1), \ldots, y(k-n_y), u(k-1), \ldots, u(k-n_u)]$), $n_y$ and $n_u$ are the orders of the output and control sequences, $y(k)$ is the train speed at the next moment, and $\varepsilon(k)$ is the estimation error. The actual output of the actor neural network is as follows:

where $\hat{w}_a(k)$ is the estimate of $w_a(k)$ and $\hat{y}(k)$ is the predicted speed; the weight error $\tilde{w}_a(k)$ is defined as the difference between $\hat{w}_a(k)$ and $w_a(k)$. The error of the actor neural network is defined as

The cost function of the actor neural network is defined as

From (35), the gradient is derived as

Taking the partial derivative of (34) gives

Substituting (34) and (37) into (36), we obtain

According to the gradient descent method, the updating law of the actor neural network weight is as follows:

where $\eta_a$ is the learning rate of the actor neural network weight. Substituting (38) into (39), (39) is rewritten as

Note that the convergence proof of the actor-critic network during learning is provided in [37,40]. In the ATO system, the control output sequence needs to be reconstructed and sent in advance. Considering that the forward channel time delay is generally several times the sampling period, the number of recursive prediction steps is taken as $d = \tau_{ca\text{-}\max}/T$, i.e., the maximum forward channel time delay under the current configuration divided by the sampling period. Given the relevant input and output information at time $k$, the actor-critic neural network can perform real-time online recursive prediction of the output sequence over the next $d$ periods. The prediction process is shown in Figure 4.
Figure 4 shows that the actor and critic networks are both implemented as RBF neural networks, and the external environment consists of the relevant input and output sequences of the system at time $k$. The actor network uses the outputs and controls of previous moments to predict the future train speed $y(k)$ one step ahead and updates its next decision based on the TD error $e_c(k)$ obtained by the critic network. Meanwhile, the critic network uses the same $e_c(k)$ to adjust the state value function; after the actor network outputs its prediction to the environment, the critic network computes $e_c(k)$ from $R(k)$ and $V(k)$ to prepare for the next prediction. Note that, in the actor-critic prediction process, the actor network does not receive $R(k)$ directly, and the critic network does not receive $y(k)$ directly.
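The recursive multistep prediction described above can be sketched as follows. This is an illustrative sketch only: `model` is a hypothetical stand-in for the trained actor network (any callable mapping the regressor $x(k)$ to a one-step prediction), the future control values are assumed to be supplied by the caller, and rounding the ratio $\tau_{ca\text{-}\max}/T$ up to an integer is an assumption for the case where it is not a whole number.

```python
import math


def prediction_steps(tau_ca_max, T):
    """Number of recursive prediction steps d (ratio rounded up, at least 1)."""
    return max(1, math.ceil(tau_ca_max / T))


def recursive_predict(model, y_hist, u_hist, u_future, d):
    """Roll a one-step predictor forward d steps.

    y_hist = [y(k-1), ..., y(k-ny)] and u_hist = [u(k-1), ..., u(k-nu)],
    most recent sample first; u_future[i] is the control assumed at time k+i.
    """
    y_win, u_win = list(y_hist), list(u_hist)
    predictions = []
    for i in range(d):
        x = y_win + u_win                   # regressor x(k+i)
        y_hat = model(x)                    # one-step-ahead prediction
        predictions.append(y_hat)
        y_win = [y_hat] + y_win[:-1]        # feed the prediction back
        u_win = [u_future[i]] + u_win[:-1]  # shift in the next control value
    return predictions
```

Because each prediction is fed back as an input for the next step, prediction errors can accumulate over the $d$ steps, which is one reason the number of steps is tied to the maximum forward channel delay rather than chosen larger.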

Online Model Parameter Identification with RLS Algorithm.
IGPC needs the linear model parameters of the controlled object. Thus, based on the future outputs predicted by the actor-critic network and the past outputs and control quantities, the RLS algorithm with a variable forgetting factor is used to identify the model parameters of the IGPC strategy at different times. The model parameters are $\theta(k) = [a_1, \ldots, a_{n_a}, b_1, \ldots, b_{n_b}]^{T}$, and the corresponding identification formula is as follows [41]:

where the data vector is the set of system input and output samples, $n_a$ and $n_b$ are the output and input orders, respectively, and $K(k)$ is the gain matrix, which can be represented as

where $P(k)$ is the covariance matrix of the error and $\lambda_0(k) \in (0, 1]$ is the forgetting factor, a parameter that corrects the performance index in order to prevent "data saturation." Its value is determined by the error $\varepsilon(k)$ and the parameter $\sigma$. $\lambda_0(k)$ and $P(k)$ can be expressed as follows:

and the initial value is set as $P(0) = 10^{6} I$, where $I$ is the identity matrix.
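For readers who want to experiment with this identification step, a minimal Python sketch of RLS with a variable forgetting factor is given below. The gain, parameter, and covariance updates follow the standard RLS recursion. The forgetting-factor rule, however, is a hypothetical choice made for illustration (close to 1 for small errors, approaching `lam_min` for large errors), since the exact expression is not reproduced in this copy; `lam_min` and the function names are likewise assumptions.

```python
import numpy as np


def forgetting_factor(eps, sigma, lam_min=0.9):
    """Hypothetical rule: lambda -> 1 for small errors, -> lam_min for large ones."""
    return lam_min + (1.0 - lam_min) * np.exp(-(eps ** 2) / sigma)


def rls_step(theta, P, phi, y, sigma, lam_min=0.9):
    """One standard RLS update with a data-dependent forgetting factor."""
    eps = y - phi @ theta                    # a priori prediction error
    lam = forgetting_factor(eps, sigma, lam_min)
    K = P @ phi / (lam + phi @ P @ phi)      # gain vector
    theta = theta + K * eps                  # parameter update
    P = (P - np.outer(K, phi) @ P) / lam     # covariance update
    return theta, P, lam


def identify(y, u, na, nb, sigma=1.0):
    """Estimate [a_1..a_na, b_1..b_nb] of A(z^-1) y(k) = B(z^-1) u(k)."""
    n = na + nb
    theta, P = np.zeros(n), 1e6 * np.eye(n)  # P(0) = 10^6 I as in the text
    for k in range(max(na, nb), len(y)):
        # Regressor: [-y(k-1), ..., -y(k-na), u(k-1), ..., u(k-nb)]
        phi = np.concatenate([-y[k - na:k][::-1], u[k - nb:k][::-1]])
        theta, P, _ = rls_step(theta, P, phi, y[k], sigma)
    return theta
```

In this sign convention, a plant $y(k) = 0.8\,y(k-1) + 0.5\,u(k-1)$ corresponds to $a_1 = -0.8$ and $b_1 = 0.5$.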

Design of Improved Generalized Predictive Controller.
The train network control system identified at different moments is described by the following controlled autoregressive integrated moving average (CARIMA) model [42]:

where $\Delta = 1 - z^{-1}$ is the difference operator, $z^{-1}$ is the delay operator, $\zeta(k)$ is a white noise sequence with zero mean, and $A(z^{-1})$, $B(z^{-1})$, and $C(z^{-1})$ are the following polynomials in $z^{-1}$ of orders $n_a$, $n_b$, and $n_c$, respectively.
where $a_i$ ($i = 1, 2, \ldots, n_a$), $b_i$ ($i = 1, 2, \ldots, n_b$), and $c_i$ ($i = 1, 2, \ldots, n_c$) are the polynomial coefficients. Define the following variables:

where $e_i$ ($i = 1, 2, \ldots, j$), $f_i$ ($i = 1, 2, \ldots, n + 1$), $g_i$ ($i = 1, 2, \ldots, j$), and $h_i$ ($i = 1, 2, \ldots, n - 1$) are the polynomial coefficients. Solve the Diophantine equation:

Figure 4: Actor-critic neural network multistep prediction process.
According to (47), we can obtain $F_j(z^{-1})$, $G_j(z^{-1})$, and $H_j(z^{-1})$. The performance index of CARIMA is defined as follows:

where $P$ is the prediction horizon, $M$ is the control horizon, and $\lambda$ is the control weighting factor. The performance index $J$ is minimized to derive the control law:

where $[y_r(k+1), \ldots, y_r(k+P)]^{T}$ is the vector of desired speeds over the prediction horizon and $K_1$ is the first row of the matrix $(G^{T}G + \lambda I)^{-1}G^{T}$. The above process is based on the latest acquired data packet information, which realizes the rolling optimization of each real-time sublinear model.
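To make the role of $K_1$ concrete, the sketch below computes the first row of $(G^{T}G + \lambda I)^{-1}G^{T}$ and applies it to the difference between the desired outputs and a free-response vector. It assumes the standard GPC construction in which $G$ is the lower-triangular matrix of step-response coefficients; the free-response vector `free_response` and all function names are hypothetical, and the paper's own control law may arrange the terms differently.

```python
import numpy as np


def dynamic_matrix(g, P, M):
    """Lower-triangular P x M matrix built from step-response coefficients.

    g[0] is the first coefficient g_1; entry (i, j) holds g[i - j] for j <= i.
    """
    G = np.zeros((P, M))
    for i in range(P):
        for j in range(min(i + 1, M)):
            G[i, j] = g[i - j]
    return G


def gpc_gain(G, lam):
    """First row K1 of (G^T G + lam * I)^{-1} G^T."""
    M = G.shape[1]
    return np.linalg.solve(G.T @ G + lam * np.eye(M), G.T)[0]


def control_increment(K1, y_ref, free_response):
    """Receding-horizon move: only the first optimal increment is applied."""
    return float(K1 @ (np.asarray(y_ref) - np.asarray(free_response)))
```

Solving the linear system with `np.linalg.solve` instead of forming the inverse explicitly is a numerical design choice; a larger $\lambda$ shrinks $K_1$ and therefore gives smaller, smoother control increments.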

Simulation and Analysis
To verify the effectiveness of the proposed method, the CRH3 train [2] is taken as the controlled object. The main parameters of the CRH3 train are as follows: the maximum running speed is 350 km/h, the sustained running speed is 300 km/h, the total train weight is 400 tons, and the rotating mass coefficient is 0.06. The control parameter settings are shown in Table 1. The TCN simulation platform is built from CCU1, CCU2, and HMI. The model of the train traction control system is set up in CCU2. The latest speed information is analysed in CCU1, and the LS-SVM hybrid delay prediction algorithm, the actor-critic neural network algorithm, the RLS algorithm with variable forgetting factor, and the IGPC algorithm are deployed on it. The task execution period of each device is configured to be 50 ms, and all the transceiver data modules, algorithm modules, and model modules are executed in the same task to achieve synchronous calculation [23]. In practical applications, the characteristic period of each port is usually selected as a multiple of 64 ms. Therefore, the sampling period $T$ is selected as 64 ms.

Construction of TCN Simulation Platform.
To provide an operating environment consistent with the actual train and the functions of each node, the semiphysical simulation platform of the TCN is set up as shown in Figure 5. During the test, the task period is set to 50 ms, the load rate is set to 45%, and the characteristic period of the source port is set to 64 ms and 128 ms, respectively. Figure 5 shows that all vehicle control signals are transmitted through the MVB network. As the controller node, CCU1 simulates the ATO system, while as the actuator node, CCU2 simulates the train traction control system. In the control process, CCU1 sends the control instruction to CCU2, which simulates the train traction and braking execution process and sends the completed execution information to the sensor node. As the sensor node, HMI measures the output signal of the system and feeds it back to CCU1. While the data are transmitted through the MVB network, the computer connects to the network analyser through an RS-232 serial line and an Ethernet data line, tests the forward and feedback channel delays of the MVB network, and analyses the delay data samples according to the captured process data.

Performance Analysis of Different Time Delay Prediction Models.
To verify the prediction performance of the proposed time delay prediction model, we compare it with the LS-SVM model based on EMD (EMD-LS-SVM) [25], the LS-SVM model [27], the Elman neural network model (ELMANNN) [43], and the least mean square algorithm based on the AR model (LMS-AR) [44]. For all prediction models, the training set contains 500 sets of data, the test set contains 50 sets of data, the input end of each model is , and the output end is $Y = \tau(k)$. The parameters of the EMD-LS-SVM model are set as follows: $c = 1854.945$ and $\sigma^{2} = 0.842$. The parameters of the LS-SVM model are set as follows: $c = 943.571$ and $\sigma^{2} = 0.598$. The parameters of the ELMANNN model are set as follows: the maximum number of training iterations is 5000, the training target is 0.01, the learning rate is 0.1, and the number of neurons is 50. The parameters of the LMS-AR model are set as follows: the order is 20 and the convergence factor is $7.8 \times 10^{-7}$. When the characteristic period is 64 ms, the time delay prediction results of the different models are shown in Figure 6(a). Figure 6(b) shows the time delay prediction results of the different models when the characteristic period is 128 ms. To better measure the prediction effect of each model on the time delay series, the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and prediction time are used to evaluate the overall performance of each prediction model. The performance indexes of each prediction model under different characteristic periods are recorded in Table 2. The RMSE, MAE, and MAPE are defined as follows:

where $L$ is the length of the time delay sequence, $\tau^{*}(k)$ is the real time delay value at time $k$, and $\tau(k)$ is the predicted delay value at time $k$. Figure 6 shows that, as the characteristic period of the source port increases, the decoding and access time of MVB process data becomes longer, resulting in severe time delay jitter in the system.
Compared with the other prediction models, the proposed prediction model maintains high prediction accuracy under different degrees of delay jitter, which effectively overcomes the impact of network delay on the train control system and ensures the real-time performance of control signal transmission. Table 2 shows that the prediction accuracy of the proposed prediction model is significantly higher than that of the other prediction models. The main reason is that, after the original delay sequence is processed by EMD, the different local characteristics of the time delay components are highlighted, and the frequency content and waveform of each component become simpler and more regular, which effectively reduces the prediction difficulty. At the same time, the proposed model optimized by the AQPSO algorithm has more accurate kernel function parameters, which effectively improves the prediction accuracy compared with the EMD-LS-SVM model. On the other hand, since the prediction result of the proposed model is composed of the predicted values of the different components generated by EMD, the algorithm takes slightly longer to execute; however, the prediction accuracy is greatly improved, and the one-step delay prediction time of the proposed model is far less than the task execution period, which meets the real-time requirements of the high-speed train network control system.
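The three accuracy indexes used above can be computed as in the following short sketch. It uses the conventional definitions of RMSE, MAE, and MAPE; expressing MAPE in percent is an assumption, and the function name is hypothetical.

```python
import numpy as np


def delay_metrics(tau_true, tau_pred):
    """Return (RMSE, MAE, MAPE in percent) for a delay sequence of length L."""
    tau_true = np.asarray(tau_true, dtype=float)
    tau_pred = np.asarray(tau_pred, dtype=float)
    err = tau_true - tau_pred
    rmse = np.sqrt(np.mean(err ** 2))              # penalises large errors more
    mae = np.mean(np.abs(err))                     # average absolute deviation
    mape = 100.0 * np.mean(np.abs(err / tau_true)) # relative error, needs tau_true != 0
    return rmse, mae, mape
```

For example, real delays of 100 ms and 200 ms predicted as 110 ms and 180 ms give an MAE of 15 ms and a MAPE of 10%.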

Comparison of Optimization Performance of Different PSO Algorithms.
To validate the optimization performance of the AQPSO algorithm, we compare it with the QPSO algorithm based on a linearly decreasing contraction-expansion coefficient (QPSO-LDCE) [30], the QPSO algorithm based on a nonlinear decreasing contraction-expansion coefficient (QPSO-NDCE) [31], the PSO algorithm based on swarm success rate descending inertia weight (SSRDIWPSO) [32], and the PSO algorithm based on adaptive inertia weight (AIWPSO) [33]. The above algorithms are simulated in CCU. The hardware configuration of CCU is an Intel(R) Core(TM) i5-7300HQ CPU @ 2.50 GHz processor and 8.00 GB of memory. For all optimization algorithms, the maximum number of iterations is 30, the population size is 50, and the tolerance error is set as 0.01. After testing, the parameters of the AQPSO algorithm are set as follows: $\alpha_{\max} = 1$ and $\alpha_{\min} = 0.5$. The parameters of the QPSO-LDCE algorithm are set as follows: $m = 1$ and $n = 0.5$. The parameters of the QPSO-NDCE algorithm are set as follows: $\alpha_{\text{initial}} = 1$ and $n = 2$. The parameters of the SSRDIWPSO algorithm are set as follows: $w_{\text{start}} = 0.9$ and

Figure 7 shows that, compared with the other PSO algorithms, the AQPSO algorithm has better convergence performance in optimizing the parameters of the models for the different time delay components. The main reason is that the PSO algorithm is extended to the quantum space and represents the motion state of particles in the form of a wave function, which simplifies the evolution mode and enables a particle to appear at any position in the entire search space with a certain probability. Thus, the AQPSO algorithm has stronger global search performance than the SSRDIWPSO and AIWPSO algorithms. On the other hand, the AQPSO algorithm comprehensively considers the position and state changes of particles in the iterative process and uses an adaptive contraction-expansion coefficient, so that it can dynamically balance the global and local search abilities of the particles. Therefore, the AQPSO algorithm has higher average convergence precision and stability than the QPSO-LDCE and QPSO-NDCE algorithms. In terms of running time, the proposed algorithm shows no significant increase compared with the other PSO algorithms, and the running time of each algorithm under a given tolerance error is basically the same.
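To illustrate the quantum-behaved position update that underlies this comparison, the following Python sketch implements the generic QPSO iteration with a linearly decreasing contraction-expansion coefficient. The adaptive coefficient rule of AQPSO is not restated in this section, so the sketch is closer to QPSO-LDCE than to the proposed algorithm; the function name, the bound handling by clipping, and the default values (taken from the population size, iteration count, and coefficient range quoted above) are assumptions for illustration.

```python
import numpy as np


def qpso(objective, bounds, n_particles=50, n_iter=30,
         alpha_max=1.0, alpha_min=0.5, seed=0):
    """Minimise objective over box bounds [(lo, hi), ...] with generic QPSO."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    x = rng.uniform(lo, hi, size=(n_particles, dim))
    pbest = x.copy()
    pbest_val = np.array([objective(p) for p in x])
    g = np.argmin(pbest_val)
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for it in range(n_iter):
        # Linearly decreasing contraction-expansion coefficient (not adaptive).
        alpha = alpha_max - (alpha_max - alpha_min) * it / max(n_iter - 1, 1)
        mbest = pbest.mean(axis=0)                      # mean best position
        phi = rng.uniform(size=(n_particles, dim))
        attractor = phi * pbest + (1.0 - phi) * gbest   # local attractor
        u = 1.0 - rng.uniform(size=(n_particles, dim))  # u in (0, 1]
        sign = np.where(rng.uniform(size=(n_particles, dim)) < 0.5, -1.0, 1.0)
        x = attractor + sign * alpha * np.abs(mbest - x) * np.log(1.0 / u)
        x = np.clip(x, lo, hi)                          # keep particles in bounds
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = np.argmin(pbest_val)
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val
```

In an LS-SVM setting, `objective` would typically be a validation error evaluated at a candidate pair of regularization and kernel parameters, with one such optimization run per delay component.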

Time Delay Compensation Effect of Different Characteristic Periods.
To analyse the influence of time delay compensation on the control method under different characteristic periods, the characteristic period of the source port is set as 64 ms and 128 ms, respectively, a sine wave trajectory is selected as the reference input, the sampling time is set as 64 ms, and the initial control quantity is set as 0. Figure 8(a) shows the time delay compensation effect when the characteristic period is 64 ms, and Figure 8(b) shows the time delay compensation effect when the characteristic period is 128 ms. Figure 8 shows that, when the characteristic period is 64 ms, the proposed method with time delay compensation tracks the sine wave quickly and accurately with almost no overshoot, and the traction and braking effect is close to ideal. Without delay compensation, the output occasionally oscillates and becomes unstable at different time points. When the characteristic period is 128 ms, the forward and feedback channel delays increase further, and their sum exceeds 300 ms. If the delay control strategy is not applied, the output oscillation is very severe. The reason for the oscillation is that the time delay varies randomly and the time at which the control quantity arrives at the actuator is not fixed, which causes the control method to fail at many moments. It is worth noting that the larger the characteristic period, the greater the time delay and the more serious the oscillation. In the future, train data transmission will become more and more frequent, and the forward channel time delay will be several times the sampling period or even more. Thus, the key to realizing fast and stable control of traction and braking is to adopt effective methods that compensate for the influence of time delay on the train control system.
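The compensation idea discussed above, sending control quantities in advance so that the actuator can apply the one that matches the delay actually experienced, can be illustrated with the small sketch below. It assumes a common networked predictive control arrangement in which each packet carries the control values planned for times $k, k+1, \ldots$ and the actuator picks the entry matching the delay expressed in sampling periods. This selection rule, the rounding up, and the function name are assumptions for illustration, not a description of the platform used in this paper.

```python
import math


def select_control(packet, delay_ms, period_ms):
    """Pick the control value matching a delay, from a packet sent in advance.

    packet[i] is the control planned for time k + i; the delay is converted to
    a whole number of sampling periods (rounded up) and clamped to the packet.
    """
    steps = math.ceil(delay_ms / period_ms) if delay_ms > 0 else 0
    return packet[min(steps, len(packet) - 1)]
```

With a 64 ms sampling period, a 150 ms delay corresponds to three periods after rounding up, so the fourth entry of the packet is applied; delays longer than the packet covers fall back to the last entry.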

Comparison of Tracking Effects between Different Control Methods.
To verify the real-time performance and tracking performance of the proposed method, the nonsingular fast terminal sliding mode control method (NFTSM) [2], the RBF neural network adaptive control method (RBFNN) [34], and the proportional-integral-differential control method (PID) [45] are compared with the proposed method. The parameters of the PID control method are set as follows: $k_p = 50$, $k_i = 20$, and $k_d = 5$. The characteristic period of the source port is 64 ms, the reference input is a sine wave and the actual working condition, respectively, the sampling time is set to 64 ms, and the initial control quantity is set to 0. Table 3 records the statistical results of the tracking errors of the different control methods under actual working conditions. Figure 9 shows that, compared with the proposed method and the NFTSM method, the tracking performance of the RBFNN adaptive control method and the PID control method is poorer. Meanwhile, their speed tracking error fluctuates more as the sine wave changes rapidly, which illustrates that these control methods are difficult to apply to a high-speed train network control system with strongly nonlinear changes and uncertainty. Compared with the other control methods, the proposed method has a smaller speed tracking error, higher tracking accuracy, and faster response at different moments, so it can accurately track the rapid change of the sine wave trajectory with better real-time performance. The advantage of the proposed method lies in the fact that the actor-critic neural network continuously interacts with the environment during the control process, which gives full play to the ability of the reinforcement learning algorithm to learn complex systems. Thus, the actor-critic neural network can accurately predict the future output of the traction control system, which effectively improves the control accuracy of the proposed method.
Figure 10 shows that, when the tracking signal alternates between acceleration and braking, the proposed method performs better than the other control methods throughout the acceleration and braking phases. The proposed method achieves a smooth transition between different static working points and has a faster response and better dynamic performance, which guarantees high-precision tracking of the given train speed. Therefore, the proposed method meets the demands of safe operation of HSTs under complex conditions. It can be seen from Table 3 that the proposed method has better tracking performance and real-time performance than the other control methods, which provides an effective means for the safe and reliable operation of HSTs. In addition, the proposed method is simple in structure, computationally light, and easy to apply to train communication engineering.

Discussion
To predict the uncertain and nonlinear train network delay, an LS-SVM time delay prediction model based on the EMD and AQPSO algorithms is designed. After EMD processing, the original time delay sequence is transformed from a long correlation sequence into short correlation sequences, which highlights the different local characteristics of the original signal and effectively reduces the modelling complexity. By improving the calculation method for the successful value of particle iteration, an AQPSO algorithm with an adaptive contraction-expansion coefficient is proposed to optimize the parameters of the LS-SVM model, which enhances the prediction performance of the time delay prediction model and overcomes the effect of time delay on the train control system.
Considering the nonlinear characteristics of the train during traction and braking, the actor-critic network is used in the control process to continuously interact with the environment and accurately predict the future output of the system, and the RLS identification algorithm with a variable forgetting factor is adopted to identify the future system model parameters, which realizes predictive control of the nonlinear train network system. Further, combined with effective time delay prediction and compensation methods, the IGPC scheme for the key nonlinear train network system is implemented.
In this paper, we compensate for the time delay generated in the TCN without further considering the actual impact of packet loss on the train control system. However, during the time delay tests, it was found that, when the port characteristic period increases, a small number of packets are lost during data transmission through the MVB network, so an effective method should be proposed to suppress the influence of both time delay and packet loss. Thus, we will design a more efficient and robust train network control method for such situations in the future.

Conclusions
In this paper, an LS-SVM time delay prediction model based on the EMD and AQPSO algorithms is proposed to accurately predict the forward channel time delay and thereby compensate for the effect of network delay on train control performance. Based on the actor-critic reinforcement learning algorithm, an IGPC method is designed for HSTs: the actor-critic reinforcement learning network predicts the system output multiple steps ahead, and, according to the output prediction, the RLS algorithm identifies the future system model parameters. Combined with the forward delay prediction results, a suitable control quantity is sent in advance, which realizes time delay compensation control of the nonlinear train network control system. Simulation results show that the proposed method tracks changes of the reference signal quickly and has good real-time performance, robustness, and stability. The research in this paper provides a reference for the optimal control of the train communication network and plays an important role in further enhancing the economy, safety, and reliability of high-speed train operation.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments
This research was funded by the Natural Science Foundation of Liaoning Province (grant no. 20180551003).