For the optimal tracking guidance of aircraft, the conventional time-based kinematics model is transformed into a downrange-based model by replacing the independent variable. Deviations in altitude and flight path angle are penalized and corrected to achieve high-precision tracking of the reference trajectory. Applying small perturbation theory, the tracking problem is solved as a linear quadratic regulator, and the approximate dynamic programming method is used to handle the finite-horizon optimization. An actor-critic structure is established to approximate the optimal tracking controller and the minimum cost function. The least squares method and the Adam optimization algorithm are adopted to learn the parameters of the critic network and the actor network, respectively. A boosting trajectory with maximum final velocity is generated by the Gauss pseudospectral method to validate the guidance strategy. The results show that the trained feedback control parameters can effectively resist random wind disturbance, correct initial altitude and flight path angle deviations, and follow a given trajectory.
Funding: National Natural Science Foundation of China (61503301); NSAF (U1630127)
1. Introduction
Trajectory optimization plays an important role in aerospace engineering, which helps to realize the management and control of time, energy, and flight states. Limited by computational complexity, practical usage of online optimization seems infeasible. Therefore, tracking optimized trajectory in real time becomes the main choice of flight missions, such as the patrol of unmanned aerial vehicles, the attack of fixed ground targets, and the orbit adjustment of spacecraft. However, there are various kinds of in-flight disturbances and initial deviations, which make the aircraft depart from the reference trajectory. This requires us to design the guidance loop, correct the deviations in the flight, and ensure that the aircraft flies in accordance with the ideal trajectory.
Different theories have been applied to the optimal tracking problem. For example, based on a discounted cost function, Najafi Birgani et al. [1] solved the quadratic tracking problem for time-invariant systems in the presence of disturbance, which may not be asymptotically stable. Chai et al. [2] used model predictive control to help an aeroassisted spacecraft track a reconnaissance trajectory; a two-nested gradient method was used to further decrease the computational demand. Combining a proportional-integral-derivative controller with a continuous sliding-mode controller, Ríos et al. [3] proposed a robust tracking strategy for quadrotors with a desired time-varying trajectory. Chu et al. [4] designed an adaptive trajectory tracking controller for a remotely operated vehicle, equipped with an adaptive terminal sliding-mode observer for state estimation and a local recurrent neural network for dynamic model identification. In [5], on-policy and off-policy reinforcement learning algorithms are used to solve the tracking problem with constrained input. In this paper, the approximate dynamic programming method is used to design the guidance controller for the optimal trajectory tracking of aircraft.
Approximate dynamic programming (ADP), also known as adaptive dynamic programming, was first proposed by Werbos [6]. It mainly creates an approximate structure (e.g., neural networks) to estimate the cost or strategic utility function, so as to optimize the process of dynamic programming. According to the outcome of the critic network, approximate dynamic programming can be divided into three families: heuristic dynamic programming (HDP) and dual heuristic programming (DHP) proposed by Werbos, and global dual heuristic programming (GDHP) proposed by Prokhorov and Wunsch [7]. Besides, three action-dependent forms were presented without mediation of model network in the actor-critic connections. Approximate dynamic programming is an intelligent optimization method which integrates dynamic programming, reinforcement learning, and neural network disciplines. It helps to overcome the curse of dimensionality brought by dynamic programming, avoid the difficulty of solving nonlinear Hamilton-Jacobi-Bellman equation or algebraic Riccati equation, and even get rid of the dependence on system dynamics by means of reinforcement learning.
Researchers in control and intelligent computing have continuously studied the optimality and convergence of approximate dynamic programming, and a series of applications can be found. Kiumarsi et al. [8, 9], Li et al. [10], and Wang and Mu [11] applied approximate dynamic programming to the infinite-horizon linear quadratic tracker for systems with dynamical uncertainties. To solve zero-sum differential games, Mehraeen et al. [12], Sun et al. [13, 14], and Zhu et al. [15] used an iterative approach to approximate the Hamilton–Jacobi–Isaacs equation with neural networks. On the other hand, Abouheaf and Lewis et al. [16, 17] applied the policy iteration algorithm to learn the Nash solution for multiplayer cooperative games. As for constrained problems, inspired by the form of the optimal cost-to-go in [18], a new value function involving Lagrange multipliers was introduced by Heydari and Balakrishnan [19] to handle terminal constraints; Kim [20] also successfully applied this idea to the finite-horizon control of spacecraft, while Adhyaru et al. [21] and Xu et al. [22] used a nonquadratic term in the performance function to deal with magnitude constraints on the control input. Most real-life optimal problems come with finite-horizon limitations, and several solutions exist. Heydari and Balakrishnan [23–27] developed a sequence of neural networks to account for the optimal weights at each time step. Zhao et al. [28, 29] solved the finite-horizon optimal problem by applying time-varying basis functions with additional time-to-go terms. Ding et al. [30] and Wang et al. [31] studied ϵ-optimal control, where the error between the cost function and its optimal value should be bounded within finite iterations.
In future air battles, the long-range air-to-air missile is vital for beyond-visual-range combat. Normally, it is released from the fighter with an initial cruising speed and then climbs to a higher altitude with a fixed final flight path angle. An optimal trajectory tracking method is preferred in the boosting phase, until the target information can be acquired. An online controller is designed to follow the given trajectory and realize maximum terminal speed. To solve this problem, a novel tracking guidance for aircraft is proposed on the basis of approximate dynamic programming. By penalizing load factor, altitude, and flight path angle deviations simultaneously, the tracking of in-flight position and flight path angle can be realized. The actor-critic structure is established, and the least squares method and the Adam algorithm are used for the training of the critic network and the actor network, respectively. In addition, a new approach to solving the optimization problem with a fixed terminal state is presented. This is done by replacing the independent variable of the system, normally the time variable, with a higher-priority state quantity. Fixed-final-time optimal theory is then applied to realize the control of this specific state.
The rest of this paper is organized as follows. In Section 2, kinematics model of the aircraft in the longitudinal plane is established, and a simplified perturbation system is derived ignoring the velocity equation. Focusing on the linear quadratic regulator problem, Section 3 develops the actor-critic network for the training of optimal tracking controller; then the actor is embedded into the guidance loop. In Section 4, parameters are trained within a boosting trajectory generated by GPOPS [32]; the tracking guidance is verified in conditions of random wind disturbance and initial deviations. Finally, appropriate conclusions are made in Section 5.
2. Problem Formulation
The general equations of motion in the longitudinal plane for an aircraft [33], assumed to be a point mass, are given by

$$
\begin{aligned}
m\dot{V} &= P\cos\alpha - C_x qS - mg\sin\theta,\\
mV\dot{\theta} &= P\sin\alpha + C_y qS - mg\cos\theta,\\
\dot{x} &= V\cos\theta,\\
\dot{y} &= V\sin\theta,
\end{aligned}
\tag{1}
$$

where $V$ is the velocity; $\theta$ is the flight path angle; $x$ and $y$ are the downrange and altitude of the flight, respectively; $\alpha$ is the angle of attack; $m$ and $S$ denote the mass and reference area of the aircraft, respectively; $P$ is the thrust of the engine; $q=\rho V^2/2$ is the dynamic pressure and $g$ is the gravitational acceleration; $C_x$ and $C_y$ are the drag coefficient and lift coefficient, respectively, which are given as follows:

$$
\begin{aligned}
C_y &= C_y^{\alpha}(Ma)\,\alpha,\\
C_x &= C_{x0}(Ma) + K(Ma)\,C_y^2,
\end{aligned}
\tag{2}
$$

where $Ma$ stands for the Mach number of the aircraft; $C_y^{\alpha}$ is the lift curve slope, $C_{x0}$ is the zero-lift drag coefficient, and $K$ is the induced drag factor, which are all functions of $Ma$.
Ignoring the curvature and rotation of the earth, atmospheric density and gravitational acceleration are given as functions of altitude

$$
\begin{aligned}
\rho(y) &= \rho_0\exp\left(-\frac{y}{y_p}\right),\\
g(y) &= g_0\frac{R_e^2}{(R_e+y)^2},
\end{aligned}
\tag{3}
$$

where $\rho_0$ represents the sea-level reference density; $y_p$ is the atmospheric density scale height; $g_0$ is the gravitational acceleration on the ground and $R_e$ stands for the radius of the earth. Here $\rho_0=1.225$ kg/m³, $y_p=7254.3$ m, $g_0=9.806$ m/s², and $R_e=6371$ km.
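The environment model in (3) is simple enough to check numerically. The following is a minimal sketch using the constants quoted above; the function and constant names are illustrative and not taken from the paper.

```python
import math

RHO0 = 1.225    # sea-level reference density, kg/m^3
Y_P = 7254.3    # atmospheric density scale height, m
G0 = 9.806      # gravitational acceleration on the ground, m/s^2
R_E = 6371e3    # radius of the earth, m

def air_density(y):
    """Exponential atmosphere of Eq. (3); y is altitude in metres."""
    return RHO0 * math.exp(-y / Y_P)

def gravity(y):
    """Inverse-square gravity of Eq. (3); y is altitude in metres."""
    return G0 * R_E**2 / (R_E + y)**2
```

At one scale height the density drops by a factor of e, which is a convenient sanity check on the sign of the exponent.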
The longitudinal load factor is expressed as

$$
n_y = \frac{P\sin\alpha + C_y qS}{mg}.
\tag{4}
$$
On account of the in-flight stability of the body structure and attitude control system, the angle of attack command should be limited; generally $|\alpha|\le 20^\circ$, which yields $\sin\alpha\approx\alpha$. Therefore, according to (4), one has

$$
\alpha \approx \frac{n_y mg}{P + C_y^{\alpha} qS}.
\tag{5}
$$
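To illustrate how (5) inverts (4) under the small-angle assumption, here is a hedged sketch. The argument names (`thrust`, `cy_alpha`, `s_ref`, and so on) are illustrative, and the aerodynamic data are assumed to be supplied by the caller.

```python
import math

def load_factor(alpha, thrust, cy_alpha, q, s_ref, mass, g):
    """Eq. (4) with C_y = cy_alpha * alpha from Eq. (2); alpha in radians."""
    lift = cy_alpha * alpha * q * s_ref
    return (thrust * math.sin(alpha) + lift) / (mass * g)

def alpha_from_load_factor(n_y, thrust, cy_alpha, q, s_ref, mass, g):
    """Eq. (5): small-angle inversion of Eq. (4), returning alpha in radians."""
    return n_y * mass * g / (thrust + cy_alpha * q * s_ref)
```

Feeding the output of `load_factor` back into `alpha_from_load_factor` recovers the original angle only up to the error of replacing sin α by α, which grows toward the 20° limit.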
In the general design of an aircraft, the speed characteristics [34] are first investigated to determine the engine parameters. However, the speed becomes difficult to control during the flight owing to the unpredictability of air resistance. Generally, only the longitudinal and lateral overloads are designed in the guidance process. Ignoring the flight speed control, the velocity equation is removed from the kinematics. Selecting downrange $x$ as the independent variable in place of time, the reduced-order equations of motion are given by

$$
\begin{aligned}
\frac{dy}{dx} &= \tan\theta,\\
\frac{d\theta}{dx} &= -\frac{g}{V^2} + \frac{g n_y}{V^2\cos\theta}.
\end{aligned}
\tag{6}
$$
By choosing downrange as the independent variable, we can still keep track of $x$ without speed control. In addition, the guidance process takes a short time but covers a long distance in most cases, and the downrange-based kinematics can effectively improve control accuracy. Notice that (6) requires $V\cos\theta\neq 0$; namely, this model is not applicable to vertical climbing or landing.
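As a minimal sketch of the reduced-order model (6), the right-hand side can be written as a function of the current state and the load factor. The speed `V` and gravity `g` are treated here as known inputs, which is an assumption of this illustration; the function name is not from the paper.

```python
import math

def downrange_dynamics(theta, n_y, V, g):
    """Right-hand side of Eq. (6): returns (dy/dx, dtheta/dx).

    theta is in radians. The model is singular when V*cos(theta) = 0,
    so vertical flight is rejected explicitly.
    """
    c = math.cos(theta)
    if V == 0.0 or abs(c) < 1e-9:
        raise ValueError("Eq. (6) requires V*cos(theta) != 0")
    dy_dx = math.tan(theta)
    dtheta_dx = -g / V**2 + g * n_y / (V**2 * c)
    return dy_dx, dtheta_dx
```

In level flight with unit load factor both derivatives vanish, which matches the expectation that the aircraft holds altitude and flight path angle.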
Small perturbation theory [35] is widely used in flight control design. Under this assumption, deviations caused by flight disturbances are relatively small, and the aircraft will always follow the reference trajectory provided that the guidance and control system works properly. The perturbations of the state and control vectors for system (6) are given by

$$
\begin{aligned}
\Delta y &= y - y_b,\\
\Delta\theta &= \theta - \theta_b,\\
\Delta n_y &= n_y - n_{yb},
\end{aligned}
\tag{7}
$$

where the subscript “b” denotes the value on the reference trajectory.
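Applying the first-order partial derivative method to (6) yields

$$
\begin{aligned}
\frac{d\Delta y}{dx} &= a_{11}\Delta y + a_{12}\Delta\theta + b_1\Delta n_y,\\
\frac{d\Delta\theta}{dx} &= a_{21}\Delta y + a_{22}\Delta\theta + b_2\Delta n_y,
\end{aligned}
\tag{8}
$$

wherein

$$
\begin{aligned}
a_{11} &= 0, & a_{12} &= \frac{1}{\cos^2\theta},\\
a_{21} &= -\frac{\rho C_y S}{2 m h_p\cos\theta} + \frac{2R_e^2 g_0}{(R_e+y)^3}, & a_{22} &= \frac{n_y g\sin\theta}{V^2\cos^2\theta},\\
b_1 &= 0, & b_2 &= \frac{g}{V^2\cos\theta}.
\end{aligned}
\tag{9}
$$

Here $h_p$ is the atmospheric density scale height, written $y_p$ in (3).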
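The coefficients in (9) that follow directly from (6) can be checked by differentiating with respect to $\theta$ and $n_y$ while holding $V$ and $g$ fixed. This is a worked check for those terms only; it does not cover $a_{21}$, which involves the altitude dependence of density and gravity.

```latex
\begin{aligned}
a_{11} &= \frac{\partial}{\partial y}\tan\theta = 0, \qquad
b_1 = \frac{\partial}{\partial n_y}\tan\theta = 0,\\
a_{12} &= \frac{\partial}{\partial\theta}\tan\theta = \frac{1}{\cos^2\theta},\\
a_{22} &= \frac{\partial}{\partial\theta}\left(-\frac{g}{V^2} + \frac{g n_y}{V^2\cos\theta}\right)
        = \frac{n_y g\sin\theta}{V^2\cos^2\theta},\\
b_2 &= \frac{\partial}{\partial n_y}\left(\frac{g n_y}{V^2\cos\theta}\right)
     = \frac{g}{V^2\cos\theta}.
\end{aligned}
```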
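Define $z=[\Delta y,\Delta\theta]^T\in\mathbb{R}^2$ as the state vector and $u=\Delta n_y\in\mathbb{R}$ as the control vector; the dynamics of the perturbation system can be formulated as

$$
\frac{dz}{dx} = Az + Bu,
\tag{10}
$$

where the matrices are given by

$$
A = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix},\qquad
B = \begin{bmatrix} b_1\\ b_2\end{bmatrix}.
\tag{11}
$$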
As long as the perturbation system stabilizes at the origin, the deviations between real states and reference states would disappear. In this way, the linear quadratic tracking problem is solved as a regulation problem and the aircraft flies as programmed.
To minimize the tracking errors and extra control, a finite-horizon cost function is chosen as

$$
J = \frac{1}{2}z^T(x_f)Q_f z(x_f) + \frac{1}{2}\int_{x_0}^{x_f}\left[z^T(x)Qz(x) + u^T(x)Ru(x)\right]dx,
\tag{12}
$$

where $x_0$ and $x_f$ are the initial and terminal values of downrange on the reference trajectory; $Q_f\in\mathbb{R}^{2\times 2}$, $Q\in\mathbb{R}^{2\times 2}$, and $R\in\mathbb{R}$ are the penalizing matrices for the terminal states, states, and controls, respectively. $Q_f$ and $Q$ should be positive semidefinite and $R$ positive definite.
The quadratic optimization problem of trajectory tracking can be specified as follows: given system dynamics (10), find the optimal control $u^*$ so that cost function (12) is minimized. Normally, the optimal feedback control is given by

$$
u^* = -R^{-1}B^T P z,
\tag{13}
$$

where $P$ is the solution of the continuous-time Riccati differential equation

$$
\frac{dP}{dx} = -PA - A^T P + PBR^{-1}B^T P - Q
\tag{14}
$$

with the boundary condition

$$
P(x_f) = Q_f.
\tag{15}
$$
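For reference, (14) and (15) can also be integrated numerically, which gives a baseline against which a learned controller could be compared. The sketch below uses a classical fourth-order Runge-Kutta scheme marched backwards in downrange; the integrator choice and the callable interface `A_of_x`, `B_of_x` are assumptions of this illustration, not the method used in this paper.

```python
import numpy as np

def riccati_rhs(P, A, B, Q, R):
    """dP/dx from Eq. (14)."""
    return -P @ A - A.T @ P + P @ B @ np.linalg.solve(R, B.T @ P) - Q

def solve_riccati_backward(A_of_x, B_of_x, Q, R, Qf, x0, xf, n_steps):
    """Integrate Eq. (14) from xf back to x0, starting from P(xf) = Qf.

    A_of_x and B_of_x return A(x) and B(x) on the reference trajectory.
    Returns the grid xs and a list Ps with Ps[i] = P(xs[i]).
    """
    xs = np.linspace(x0, xf, n_steps + 1)
    h = (xf - x0) / n_steps
    P = np.array(Qf, dtype=float)
    Ps = [P]
    for x in xs[:0:-1]:                      # xf, xf - h, ..., x0 + h
        f = lambda xx, PP: riccati_rhs(PP, A_of_x(xx), B_of_x(xx), Q, R)
        k1 = f(x, P)
        k2 = f(x - h / 2, P - h / 2 * k1)
        k3 = f(x - h / 2, P - h / 2 * k2)
        k4 = f(x - h, P - h * k3)
        P = P - h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        Ps.append(P)
    return xs, Ps[::-1]

def feedback_gain(P, B, R):
    """Gain K such that u* = -K z, from Eq. (13)."""
    return np.linalg.solve(R, B.T @ P)
```

All matrices are expected as two-dimensional arrays, so the scalar penalty R is passed as a 1×1 array.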
3. Derivation of Tracking Guidance
Similar to the fixed-final-time optimization problem, the matrix function $P$ obtained from (14) depends on the aircraft’s downrange $x$ and is solved backwards from $x_f$ to $x_0$. There is no analytical solution for the finite-horizon Riccati equation, and the solving process is mathematically impracticable for most nonlinear problems due to the curse of dimensionality. Therefore, the approximate dynamic programming method is used.
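Taking a small sampling step $\Delta x$, so that the number of steps is $N=(x_f-x_0)/\Delta x$, system (10) is discretized by the Euler integration scheme, which yields

$$
z_{k+1} = \bar{A}z_k + \bar{B}u_k,\qquad k=0,1,2,\ldots,N-1,
\tag{16}
$$

where the matrices are $\bar{A}=I+\Delta x A$ and $\bar{B}=\Delta x B$.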
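The corresponding quadratic cost function is expressed as

$$
J = \frac{1}{2}z_N^T\bar{Q}_f z_N + \frac{1}{2}\sum_{k=0}^{N-1}\left(z_k^T\bar{Q}z_k + u_k^T\bar{R}u_k\right).
\tag{17}
$$

Likewise, one has $\bar{Q}=\Delta x Q$ and $\bar{R}=\Delta x R$.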
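The discretization in (16) and the cost (17) can be sketched as follows. The helper names and the list-based storage of per-step matrices are illustrative assumptions; the feedback is taken to be linear, `u_k = gains[k] @ z_k`.

```python
import numpy as np

def discretize(A, B, Q, R, dx):
    """Euler discretization: returns A_bar, B_bar, Q_bar, R_bar of Eqs. (16)-(17)."""
    A_bar = np.eye(A.shape[0]) + dx * A
    B_bar = dx * B
    return A_bar, B_bar, dx * Q, dx * R

def rollout_cost(z0, gains, A_bars, B_bars, Q_bar, R_bar, Qf_bar):
    """Evaluate cost (17) along a trajectory driven by u_k = gains[k] @ z_k."""
    z, J = np.asarray(z0, dtype=float), 0.0
    for K, A_bar, B_bar in zip(gains, A_bars, B_bars):
        u = K @ z
        J += 0.5 * (z @ Q_bar @ z + u @ R_bar @ u)   # stage cost at step k
        z = A_bar @ z + B_bar @ u                     # Eq. (16)
    return J + 0.5 * z @ Qf_bar @ z                   # terminal cost
```

Because A and B vary along the reference trajectory, the matrices are passed per step rather than as constants.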
For continuous cost function (12) and feedback control (13), according to the approximation properties of neural networks and the idea of the actor-critic algorithm, smooth function approximators can be established. In this way, the critic network $V_k$ and actor network $u_k$ at step $k$ are given as

$$
\begin{aligned}
V_k(z_k) &= W_k^T\sigma_c(z_k) + \epsilon_c, & k &= 0,1,2,\ldots,N,\\
u_k(z_k) &= \Gamma_k^T\sigma_a(z_k) + \epsilon_a, & k &= 0,1,2,\ldots,N-1,
\end{aligned}
\tag{18}
$$

where $\sigma_c:\mathbb{R}^2\to\mathbb{R}^p$ and $\sigma_a:\mathbb{R}^2\to\mathbb{R}^q$ are the basis functions of the critic network and actor network with $p$ and $q$ neurons, respectively; $W_k$ and $\Gamma_k$ are the corresponding network weights; $\epsilon_c$ and $\epsilon_a$ represent the approximation errors of the neural networks.
Though exact reconstruction of the function is not possible, the approximation error is negligible provided that the basis functions are rich enough and the training data are sufficient, according to the universal approximation theorem [36–38]. Therefore, one has

$$
\begin{aligned}
\hat{V}_k(z_k) &= \hat{W}_k^T\sigma_c(z_k),\\
\hat{u}_k(z_k) &= \hat{\Gamma}_k^T\sigma_a(z_k).
\end{aligned}
\tag{19}
$$

Note that the approximation is performed over the domain of interest $\Omega$, a compact set including the origin, and the entire state trajectory should remain within this domain.
3.1. Critic Network Training
In this section, the learning process of the critic network is presented. Following the dynamic programming process, we start from the terminal step $k=N$, where the boundary condition gives

$$
J_N^*(z_N) = \frac{1}{2}z_N^T\bar{Q}_f z_N.
\tag{20}
$$
Given the domain of interest $\Omega\subset\mathbb{R}^2$ based on the specific perturbation system, randomly select $z_N^i\in\Omega$, $i=1,2,\ldots,m$, and one has

$$
\hat{W}_N^T\sigma_c(z_N^i) \cong J_N^*(z_N^i).
\tag{21}
$$
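Define the matrix $\sigma_{cN}\in\mathbb{R}^{p\times m}$ and the row vector $J_N^*\in\mathbb{R}^{1\times m}$

$$
\begin{aligned}
\sigma_{cN} &= \begin{bmatrix}\sigma_c(z_N^1) & \sigma_c(z_N^2) & \cdots & \sigma_c(z_N^m)\end{bmatrix},\\
J_N^* &= \begin{bmatrix}J_N^*(z_N^1) & J_N^*(z_N^2) & \cdots & J_N^*(z_N^m)\end{bmatrix}.
\end{aligned}
\tag{22}
$$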
Applying the least squares method yields

$$
\hat{W}_N = \left(\sigma_{cN}\sigma_{cN}^T\right)^{-1}\sigma_{cN}J_N^{*T}.
\tag{23}
$$

Notice that the inverse of $\sigma_{cN}\sigma_{cN}^T$ must exist, which requires that the basis functions $\sigma_c(z_N^i)$ are mutually independent and that $m$ is greater than or equal to the number of neurons of the critic network. The optimal solution obtained by approximate dynamic programming is valid in the training domain $\Omega$; that is, the control strategy is applicable to any disturbance and initial state within the neighborhood of the reference trajectory. In order to make the critic network as close as possible to the optimal value function, the training set needs to be large enough [39], and a completely random sampling method is adopted to ensure sufficient exploration of the operation domain.
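The terminal fit (20)-(23) can be sketched in a few lines. This illustration assumes the quadratic critic basis given later in Section 4.2 and uniform random sampling over a box-shaped Ω; it calls `numpy.linalg.lstsq`, which solves the same least-squares problem as the normal equations in (23) but is usually better conditioned. The function names and the sampling seed are illustrative.

```python
import numpy as np

def sigma_c(z):
    """Critic basis [dy^2, dtheta^2, dy*dtheta] (Section 4.2)."""
    dy, dth = z
    return np.array([dy**2, dth**2, dy * dth])

def fit_terminal_critic(Qf_bar, z_low, z_high, m=200, seed=0):
    """Least-squares estimate of W_N from m random samples in Omega."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(z_low, z_high, size=(m, 2))
    Sigma = np.stack([sigma_c(z) for z in Z], axis=1)    # p x m, Eq. (22)
    J = np.array([0.5 * z @ Qf_bar @ z for z in Z])      # Eq. (20)
    W, *_ = np.linalg.lstsq(Sigma.T, J, rcond=None)      # solves Eq. (21) in the LS sense
    return W
```

With this basis the terminal cost is represented exactly, so the fitted weights equal half the diagonal entries of the terminal penalty matrix and its off-diagonal entry.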
According to the principle of optimality, the Bellman equation is given for $k=N-1,N-2,\ldots,0$ as

$$
J_k^*(z_k) = \frac{1}{2}\left(z_k^T\bar{Q}z_k + u_k^{*T}\bar{R}u_k^*\right) + J_{k+1}^*(z_{k+1}).
\tag{24}
$$
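Randomly selecting $z_k^i\in\Omega$, $i=1,2,\ldots,m$, and substituting the critic network approximator into (24) yields

$$
\hat{W}_k^T\sigma_c(z_k^i) \cong U_k^i + \hat{W}_{k+1}^T\sigma_c(z_{k+1}^i),
\tag{25}
$$

where

$$
\begin{aligned}
U_k^i &= \frac{1}{2}\left(z_k^{iT}\bar{Q}z_k^i + u_k^{*iT}\bar{R}u_k^{*i}\right),\\
u_k^{*i} &= \hat{\Gamma}_k^T\sigma_a(z_k^i),\\
z_{k+1}^i &= \bar{A}z_k^i + \bar{B}u_k^{*i}.
\end{aligned}
\tag{26}
$$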
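Define the matrix $\sigma_{ck}\in\mathbb{R}^{p\times m}$ and the row vector $J_k^*\in\mathbb{R}^{1\times m}$

$$
\begin{aligned}
\sigma_{ck} &= \begin{bmatrix}\sigma_c(z_k^1) & \sigma_c(z_k^2) & \cdots & \sigma_c(z_k^m)\end{bmatrix},\\
J_k^* &= \begin{bmatrix}U_k^1 + \hat{W}_{k+1}^T\sigma_c(z_{k+1}^1) & U_k^2 + \hat{W}_{k+1}^T\sigma_c(z_{k+1}^2) & \cdots & U_k^m + \hat{W}_{k+1}^T\sigma_c(z_{k+1}^m)\end{bmatrix}.
\end{aligned}
\tag{27}
$$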
Applying the least squares method yields

$$
\hat{W}_k = \left(\sigma_{ck}\sigma_{ck}^T\right)^{-1}\sigma_{ck}J_k^{*T},\qquad k=N-1,N-2,\ldots,0.
\tag{28}
$$

Likewise, the inverse of $\sigma_{ck}\sigma_{ck}^T$ must exist.
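One backward critic update, (25) through (28), can be sketched as below. The sketch assumes the basis functions of Section 4.2 (quadratic critic, linear actor) and an actor weight array of shape (2, 1); as in the terminal step, `numpy.linalg.lstsq` is used in place of the explicit inverse in (28). Names are illustrative.

```python
import numpy as np

def sigma_c(z):
    """Critic basis [dy^2, dtheta^2, dy*dtheta] (Section 4.2)."""
    dy, dth = z
    return np.array([dy**2, dth**2, dy * dth])

def sigma_a(z):
    """Actor basis [dy, dtheta] (Section 4.2)."""
    return np.asarray(z, dtype=float)

def critic_step(W_next, Gamma_k, A_bar, B_bar, Q_bar, R_bar, Z):
    """Fit W_k from sampled states Z (m x 2) given W_{k+1} and the trained actor."""
    Sigma, targets = [], []
    for z in Z:
        u = Gamma_k.T @ sigma_a(z)                      # actor output, Eq. (26)
        z_next = A_bar @ z + B_bar @ u                  # next state, Eq. (26)
        U = 0.5 * (z @ Q_bar @ z + u @ R_bar @ u)       # stage cost, Eq. (26)
        Sigma.append(sigma_c(z))
        targets.append(U + W_next @ sigma_c(z_next))    # entry of J_k^*, Eq. (27)
    W_k, *_ = np.linalg.lstsq(np.array(Sigma), np.array(targets), rcond=None)
    return W_k
```

For a linear actor and a quadratic critic the target is itself quadratic in the state, so the fit is exact and can be compared with the closed-loop cost recursion.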
3.2. Actor Network Training
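Recalling continuous system (10) and finite-horizon cost function (12), the Hamilton-Jacobi-Bellman equation is given as

$$
0 = \min_u\left[\frac{1}{2}\left(z^T Qz + u^T Ru\right) + \frac{\partial J^*}{\partial z}\left(Az + Bu\right) + \frac{\partial J^*}{\partial x}\right].
\tag{29}
$$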
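Applying the stationarity condition, the partial derivative of the bracketed term in (29) with respect to the control $u$ should satisfy

$$
\frac{\partial(\cdot)}{\partial u} = u^T R + \frac{\partial J^*}{\partial z}B = 0.
\tag{30}
$$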
Since $R$ is positive definite, rearranging (30) yields

$$
u^* = -R^{-1}B^T\left(\frac{\partial J^*}{\partial z}\right)^T.
\tag{31}
$$
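Bringing in the critic network, the optimal control for the discretized system is formulated as

$$
u_k^* = -\bar{R}^{-1}\bar{B}^T\nabla V_{k+1}(z_{k+1}^*),
\tag{32}
$$

where $\nabla V_{k+1}(z_{k+1}^*)$ stands for the gradient of the value function, $\partial V_{k+1}^*(z_{k+1})/\partial z_{k+1}$, and the next state is $z_{k+1}^*=\bar{A}z_k+\bar{B}u_k^*$.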
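Because $z_{k+1}^*$ itself depends on $u_k^*$, (32) is implicit in the control. As an illustration only, suppose the critic is exactly quadratic, $V_{k+1}(z)=\frac{1}{2}z^T P_{k+1}z$ with a symmetric matrix $P_{k+1}$ (a symbol introduced here for the sketch), so that $\nabla V_{k+1}(z)=P_{k+1}z$. Then (32) can be solved in closed form:

```latex
\begin{aligned}
u_k^* &= -\bar{R}^{-1}\bar{B}^T P_{k+1}\left(\bar{A}z_k + \bar{B}u_k^*\right)\\
\Longrightarrow\quad \left(\bar{R} + \bar{B}^T P_{k+1}\bar{B}\right)u_k^* &= -\bar{B}^T P_{k+1}\bar{A}z_k\\
\Longrightarrow\quad u_k^* &= -\left(\bar{R} + \bar{B}^T P_{k+1}\bar{B}\right)^{-1}\bar{B}^T P_{k+1}\bar{A}\,z_k .
\end{aligned}
```

This is the familiar discrete-time linear quadratic gain. The training procedure in this section does not use the closed form; it resolves the implicit dependence iteratively through the actor network.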
Define the approximation error of the actor network as

$$
e_a(z_k) = \hat{\Gamma}_k^T\sigma_a(z_k) - u_k^*.
\tag{33}
$$
Actor network weights are updated by minimizing the following performance function:

$$
E_a = \frac{1}{2}e_a^T(z_k)e_a(z_k).
\tag{34}
$$
There are plenty of optimization algorithms available at present, typically the gradient descent method. However, the adaptive moment estimation (Adam) algorithm [40], proposed by Kingma and Ba, is preferred here. Differing from the fixed learning rate of stochastic gradient descent, the Adam algorithm makes use of the first and second moments of the gradients to adjust learning rates adaptively. This method shows advantages in many ways, for instance, being straightforward to implement, computationally efficient, having low memory requirements, and being appropriate for nonstationary objectives. Let $t$ denote the iteration index in the network training process. Arbitrarily select $z_k^t\in\Omega$ in each iteration, calculate the gradient, and update $\hat{\Gamma}_k$ as follows:

$$
\begin{aligned}
g_t &= \nabla_{\hat{\Gamma}_k}E_a,\\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)g_t,\\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)g_t^2,\\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t},\\
\hat{v}_t &= \frac{v_t}{1-\beta_2^t},\\
\hat{\Gamma}_k^t &= \hat{\Gamma}_k^{t-1} - \alpha\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon},
\end{aligned}
\tag{35}
$$

where $g$ is the gradient of the performance function $E_a$ with respect to the weights $\hat{\Gamma}_k$, and $m$ and $v$ are the biased estimates of the first moment and the second moment of the gradient $g$, respectively; $\hat{m}$ and $\hat{v}$ are the bias-corrected moment estimates. $\alpha=0.001$, $\beta_1=0.9$, $\beta_2=0.999$, and $\epsilon=10^{-8}$ are hyperparameters of the learning process.
Given the error tolerance $\hat{\Gamma}_E$, the process ends when $\|\hat{\Gamma}_k^t-\hat{\Gamma}_k^{t-1}\|_2<\hat{\Gamma}_E$. Note that the iteration of $\hat{\Gamma}_k$ requires an initial value $\hat{\Gamma}_k^0$. Normally $\hat{\Gamma}_{k+1}$ is selected in the backward learning process considering the continuity of control.
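A single Adam update following (35), together with the gradient of (34) for a linear actor, might look like the sketch below. It assumes the actor basis σ_a(z) = z from Section 4.2, an actor weight array of shape (2, 1), and a target control that is held fixed while differentiating; the function names are illustrative.

```python
import numpy as np

def adam_step(Gamma, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update of Eq. (35); t is the 1-based iteration index."""
    m = beta1 * m + (1 - beta1) * grad           # biased first moment
    v = beta2 * v + (1 - beta2) * grad**2        # biased second moment
    m_hat = m / (1 - beta1**t)                   # bias correction
    v_hat = v / (1 - beta2**t)
    Gamma = Gamma - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return Gamma, m, v

def actor_gradient(Gamma, z, u_star):
    """Gradient of E_a in Eq. (34) with respect to Gamma, for u = Gamma^T z."""
    e = Gamma.T @ z - u_star                     # actor error, Eq. (33)
    return np.outer(z, e)                        # same shape as Gamma
```

On the first iteration the bias-corrected ratio reduces to the sign of the gradient, so each weight moves by almost exactly α regardless of the gradient's scale. This is why the very different magnitudes of Δy and Δθ do not by themselves unbalance the update.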
3.3. Guidance Process
There are two major approaches of convergence proof for this approximate dynamic programming in the literature; interested readers are referred to [41, 42]. In summary, the actor-critic learning process for tracking guidance is given in Algorithm 1.
Algorithm 1: Actor-critic learning procedure of tracking guidance.
Input:
  Perturbation equations at every downrange step.
  Cost function along the trajectory and at final state.
Output:
  Optimal control weights $\hat{\Gamma}$ for tracking the reference trajectory.
(1) Randomly select $m$ sets of $z_N^i\in\Omega$, $i=1,2,\ldots,m$, calculate $\sigma_{cN}$ and $J_N^{*T}$, and obtain $\hat{W}_N$ by (23).
(2) for $k=N-1$ to $0$ do
(3)   Initialize $\hat{\Gamma}_k^0=\hat{\Gamma}_{k+1}$, $m_0=0$, $v_0=0$, and actor training step $t=0$.
(4)   repeat
(5)     Randomly select $z_k^t\in\Omega$, and apply the previous control $u_{kt}^g=(\hat{\Gamma}_k^t)^T\sigma_a(z_k^t)$ to calculate $z_{k+1}^t$.
(6)     Substitute $\hat{W}_{k+1}$ into (32) to obtain $u_{kt}^*$.
(7)     Get the error of the actor network from $u_{kt}^g$ and $u_{kt}^*$.
(8)     Calculate the gradient of the weights and update $m_t$, $v_t$, and $\hat{\Gamma}_k^t$ by (35).
(9)     Advance the training step, $t=t+1$.
(10)  until $\|\hat{\Gamma}_k^t-\hat{\Gamma}_k^{t-1}\|_2<\hat{\Gamma}_E$
(11)  Randomly select $m$ sets of $z_k^i\in\Omega$, $i=1,2,\ldots,m$, and apply the actor network to get $u_k^{*i}$.
(12)  Calculate $z_{k+1}^i$, $\sigma_{ck}$, and $J_k^*$ according to Eqs. (26) and (27).
(13)  Apply the least squares estimate (28) to get $\hat{W}_k$.
(14) end for
Once the actor network is obtained, it can be embedded into the on-line guidance of aircraft. In the presence of initial trajectory deviations or random disturbances, the motion states of the aircraft are calculated by on-board computer, acquiring the information from sensors like inertial measurement unit (IMU). The deviations of altitude Δy and flight path angle Δθ are obtained in comparison with the reference trajectory. In accordance with the downrange, the corresponding optimal weights Γ^ are taken from our standard trajectory database; thus extra control Δny is calculated. A total control of ny is implemented to force the aircraft to fly along the reference trajectory. The flowchart of tracking guidance is shown in Figure 1.
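The online step described above amounts to a table lookup followed by a linear feedback. The sketch below assumes the reference trajectory and the actor weights are stored on a uniform downrange grid `x0 + k*dx` as plain sequences; this storage layout and all names are hypothetical, chosen only for illustration.

```python
def guidance_command(x, y, theta, ref_y, ref_theta, ref_ny, gammas, x0, dx):
    """Total load-factor command n_y = n_yb + Gamma_k^T [dy, dtheta].

    ref_y, ref_theta, ref_ny: reference altitude, flight path angle and
    load factor at each grid point; gammas[k] = (weight on dy, weight on dtheta).
    """
    # index of the stored step at or just below the current downrange
    k = min(max(int((x - x0) // dx), 0), len(gammas) - 1)
    dy = y - ref_y[k]                   # altitude deviation, Eq. (7)
    dtheta = theta - ref_theta[k]       # flight path angle deviation, Eq. (7)
    delta_ny = gammas[k][0] * dy + gammas[k][1] * dtheta
    return ref_ny[k] + delta_ny
```

A practical implementation would probably interpolate both the reference states and the weights between grid points rather than take the nearest stored step; that refinement is omitted here.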
Figure 1: Flowchart of ADP trajectory tracking guidance.
4. Numerical Simulation
To validate the effectiveness of our tracking guidance, boosting trajectories with random wind disturbance, initial altitude deviation, and initial flight path angle deviation are simulated.
4.1. Problem Setup
When passing through a random wind field, the aircraft is inevitably disturbed and its flight path changes unintentionally. In general, the wind field is modeled as vertical wind and horizontal wind according to its direction, and it is closely related to flight altitude. In simulation, the random wind disturbance is introduced in the form of an additional angle of attack, in the following steps.
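Step 1.
Generate random ground wind

$$
\begin{aligned}
U_0 &= \mathrm{rand}\cdot\sigma + \mu,\\
W_0 &= \mathrm{rand}\cdot 2\sigma + \mu,
\end{aligned}
\tag{36}
$$

where $U_0$ and $W_0$ are the vertical and horizontal winds on the ground, respectively; rand is a unit random number; $\mu$ and $\sigma$ are the mean and mean square deviation of the ground wind ($\mu=0$, $\sigma=4$ m/s for the following simulations).
Step 2.
Transform ground wind into high-altitude wind

$$
\begin{aligned}
U &= U_0\frac{\rho_0}{\rho},\\
W &= W_0\frac{\rho_0}{\rho},
\end{aligned}
\tag{37}
$$

where $U$ and $W$ are the vertical and horizontal winds at the specific altitude, respectively; $\rho_0$ and $\rho$ are the air densities on the ground and at high altitude, respectively.
Step 3.
Calculate the disturbed angle of attack components

$$
\begin{aligned}
\Delta\alpha_1 &= \arctan\frac{U\cos\theta}{V - U\sin\theta},\\
\Delta\alpha_2 &= \arctan\frac{W\sin\theta}{V + W\cos\theta},
\end{aligned}
\tag{38}
$$

where $\Delta\alpha_1$ and $\Delta\alpha_2$ represent the angle of attack introduced by vertical wind and horizontal wind, respectively; $\theta$ is the flight path angle of the aircraft.
Step 4.
Compound the additional angle of attack

$$
\Delta\alpha = \Delta\alpha_1 + \Delta\alpha_2.
\tag{39}
$$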
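Steps 3 and 4 can be sketched as a single function. The sampling of ground wind and its scaling with altitude (Steps 1 and 2) are left to the caller, since the distribution behind "rand" is not specified here; the function name is illustrative.

```python
import math

def wind_alpha(U, W, V, theta):
    """Additional angle of attack (rad) from Eqs. (38)-(39).

    U, W: vertical and horizontal wind at altitude, m/s;
    V: flight speed, m/s; theta: flight path angle, rad.
    """
    d_alpha1 = math.atan(U * math.cos(theta) / (V - U * math.sin(theta)))
    d_alpha2 = math.atan(W * math.sin(theta) / (V + W * math.cos(theta)))
    return d_alpha1 + d_alpha2
```

In level flight the horizontal wind contributes nothing to the additional angle of attack, and the vertical wind contributes arctan(U/V).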
Considering the dynamics of the aircraft, a boosting trajectory with maximum terminal speed is generated with the help of the Gauss pseudospectral method and the GPOPS toolbox. The initial and final states of the aircraft are given in Table 1. In addition, the engine works for 20 s and produces 8 kN of thrust with a fuel mass flow rate of 3.5 kg/s.
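Table 1: Simulation condition.

| Parameter | Initial value | Final value |
|-----------|---------------|-------------|
| x (m)     | 0             | 80000       |
| y (m)     | 12000         | 35000       |
| θ (°)     | 0             | 0           |
| m (kg)    | 150           | 80          |
| V (m/s)   | 280           | 1100~1500   |

“~” means value range from minimum to maximum.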
4.2. Parameter Generation
The choice of neural networks usually comes from engineering experience. In order to match the general state feedback control, the basis function for the actor network is selected as $\sigma_a(z)=[\Delta y,\Delta\theta]^T$, while for the critic network it is $\sigma_c(z)=[\Delta y^2,\Delta\theta^2,\Delta y\Delta\theta]^T$. Although the sufficiency remains to be proved, the results have indeed met the tracking requirements. Then we obtain $\partial\sigma_c(z)/\partial z=[2\Delta y,0,\Delta\theta;\,0,2\Delta\theta,\Delta y]$. The domain of interest for the perturbation system is limited by $\Delta y\in[-500,500]$ m and $\Delta\theta\in[-5,5]^\circ$. In addition, the penalizing matrices in cost function (17) are $\bar{Q}=\mathrm{diag}(1,10000)$, $\bar{R}=10000$, and $\bar{Q}_f=100\bar{Q}$, respectively.
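The critic basis and its derivative quoted above translate directly into code. The following sketch also shows how the value-function gradient needed in (32) follows from the critic weights; the function names are illustrative.

```python
import numpy as np

def sigma_c(z):
    """Critic basis [dy^2, dtheta^2, dy*dtheta]."""
    dy, dth = z
    return np.array([dy**2, dth**2, dy * dth])

def dsigma_c_dz(z):
    """The 2 x 3 matrix [2dy, 0, dtheta; 0, 2dtheta, dy] quoted above."""
    dy, dth = z
    return np.array([[2 * dy, 0.0, dth],
                     [0.0, 2 * dth, dy]])

def critic_gradient(W, z):
    """Gradient of V(z) = W^T sigma_c(z) with respect to z."""
    return dsigma_c_dz(z) @ W
```

Since the basis is purely quadratic, the gradient is linear in the state, which is what makes a linear actor basis a natural match.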
Applying Algorithm 1, the training process is implemented on a laptop with an Intel Core i7-6700HQ CPU and 8 GB RAM, running Windows 10 and MATLAB 2018 (single thread). The actor network for this trajectory is obtained in no more than 12.3 s with a fixed downrange step of 50 m. For presentation in Figure 2, the weights of the critic network and actor network are normalized by dividing by their own maximum values along the reference trajectory. Each maximum value is given in the legend for the corresponding weight line.
Figure 2: Actor-critic training weight. (a) Critic weight. (b) Actor weight.
4.3. Performance Analysis
4.3.1. Wind Disturbance
To evaluate the tracking ability against random wind disturbance, all initial deviations of the trajectory are set to zero. Additional angle of attack is injected to the guidance loop. After simulation, the tracking deviations are listed in Table 2, and the flight parameters are shown in Figure 3.
Table 2: Guidance comparison with wind disturbance.

| Guidance     | Δy¹ (m)²          | Δθ (°)          | ΔV (m/s)         |
|--------------|-------------------|-----------------|------------------|
| Open loop    | (-0.048, 1497.7)  | (-0.111, 2.273) | (-8.443, 12.282) |
| ADP tracking | (-0.329, 0.661)   | (-0.145, 0.127) | (-5.698, 14.887) |

¹ Δ = reference data − real data.
² Brackets give the (min, max) value along the trajectory.
Figure 3: Trajectory under wind disturbance. (a) Position. (b) Velocity. (c) Angle of attack. (d) Flight path angle.
We can see the following:
According to Figure 3(c), the deviation between wind influenced angle of attack and reference angle of attack increases with the downrange. It is reasonable because altitude increases with downrange in our boosting phase and the factor ρ0/ρ in (37) increases with altitude.
Under the open-loop control, the altitude and flight path angle of the aircraft deviate from the reference trajectory because of wind disturbance, and the deviations gradually increase during the flight. However, the impact on the speed is relatively small.
The guidance controller obtained by approximate dynamic programming can accomplish the tracking task of position and flight path angle, yet the speed deviation cannot be eliminated.
4.3.2. Initial Altitude Deviation
To evaluate the tracking ability against initial altitude deviation, all initial deviations are set to zero except altitude, specifically y0=11600m. Importing the same set of wind disturbance as in Section 4.3.1, simulation results are shown in Table 3 and Figure 4.
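Table 3: Guidance comparison with initial altitude deviation.

| Guidance     | Δy (m)               | Δθ (°)           | ΔV (m/s)         |
|--------------|----------------------|------------------|------------------|
| Open loop    | (-3996.194, 400.014) | (-5.309, 0)      | (-8.490, 30.585) |
| ADP tracking | (-0.429, 400.014)    | (-17.050, 0.058) | (-6.298, 61.872) |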
Figure 4: Trajectory with initial altitude deviation. (a) Position. (b) Velocity. (c) Angle of attack. (d) Flight path angle.
We can see the following:
Under the open-loop control, the altitude and flight path angle are greatly affected; the altitude deviation reaches 4km at the end. The velocity is also changed with the trajectory.
When guided with the proposed controller, a large angle of attack is generated to correct the altitude deviation at the beginning. The flight path angle also fluctuates greatly, reaching 17∘. However, after less than 4km of downrange, the position and flight path angle will return to reference values.
It can be seen from Figure 4(b) that, because the guidance strategy does not control velocity, the aircraft will lose speed after correcting initial altitude deviation. Once the flight is consistent with the reference trajectory, the velocity deviation does not accumulate over time, with a maximum deviation of around 60m/s.
4.3.3. Initial Flight Path Angle Deviation
To evaluate the tracking ability against initial flight path angle deviation, all initial deviations are set to zero except flight path angle, specifically θ0=5∘. Importing the same set of wind disturbance as in Section 4.3.1, simulation results are shown in Table 4 and Figure 5.
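Table 4: Guidance comparison with initial flight path angle deviation.

| Guidance     | Δy (m)         | Δθ (°)           | ΔV (m/s)          |
|--------------|----------------|------------------|-------------------|
| Open loop    | (-8760.229, 0) | (-9.992, -3.537) | (-23.105, 52.749) |
| ADP tracking | (-9.764, 0.798)| (-5, 0.676)      | (-4.352, 16.291)  |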
Figure 5: Trajectory with initial flight path angle deviation. (a) Position. (b) Velocity. (c) Angle of attack. (d) Flight path angle.
We can see the following:
Under the open-loop control, the trajectory is greatly influenced by initial flight path angle.
Applying the proposed tracking guidance, the position, velocity, and flight path angle quickly become consistent with the reference trajectory, and the initial flight path angle deviation has little effect on the speed curve.
As can be seen from Figure 5(d), the correction of flight path angle deviation can be very fast, mainly because the flight path angle is directly controlled by aircraft’s overload.
5. Conclusion
Owing to initial deviations and random disturbances, closed-loop tracking guidance must be designed to ensure that the aircraft flies along the reference trajectory. Otherwise, trajectory deflection will occur and accumulate quickly over the flight. In this paper, the approximate dynamic programming method is introduced to acquire the trajectory tracking controller through actor-critic network learning. This guidance strategy not only ensures that the reference trajectory is followed, but also minimizes the quadratic cost function of deviations and extra controls. Simulation results show that the proposed method can effectively reject random wind disturbance and correct the initial altitude and flight path angle deviations, thus meeting the high-precision tracking requirements for position and flight path angle in flight. This guidance law shows great capability of eliminating deviations, especially the initial flight path angle deviation. In-flight speed control is difficult for many reasons, such as the fixed thrust of the engine or continuous changes of air resistance. Nevertheless, the flight speed is much less sensitive to random wind disturbance and flight path angle deviation than to altitude deviation. Therefore, as long as the aircraft maintains flight along the reference trajectory, the speed is unlikely to deviate greatly.
Data Availability
Data is available when required.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the NSFC [grant number 61503301] and the NSAF [grant number U1630127]. The authors are grateful to the editors and reviewers for their valuable comments and constructive suggestions with regard to the revision of the paper.
References
[1] Najafi Birgani S., Moaveni B., Khaki-Sedigh A., “Infinite horizon linear quadratic tracking problem: a discounted cost function approach,” vol. 39, no. 4, pp. 1549–1572, 2018. doi: 10.1002/oca.2425
[2] Chai R., Savvaris A., Tsourdos A., Chai S., Xia Y., “Optimal tracking guidance for aeroassisted spacecraft reconnaissance mission based on receding horizon control,” vol. 54, no. 4, pp. 1575–1588, 2018. doi: 10.1109/TAES.2018.2798219
[3] Rios H., Falcon R., Gonzalez O. A., Dzul A., “Continuous sliding-mode control strategies for quadrotor robust tracking: real-time application,” vol. 66, no. 2, pp. 1264–1272, 2019. doi: 10.1109/TIE.2018.2831191
[4] Chu Z., Zhu D., Yang S. X., “Observer-based adaptive neural network trajectory tracking control for remotely operated vehicle,” vol. 28, no. 7, pp. 1633–1645, 2017. doi: 10.1109/TNNLS.2016.2544786
[5] Kiumarsi B., Modares H., Lewis F. L., “Optimal tracking control of uncertain systems: on-policy and off-policy reinforcement learning approaches,” Butterworth-Heinemann, pp. 165–186, 2016.
[6] Werbos P. J., “Advanced forecasting methods for global crisis warning and models of intelligence,” vol. 22, pp. 25–38, 1977.
[7] Prokhorov D. V., Wunsch D. C. II, “Adaptive critic designs,” vol. 8, no. 5, pp. 997–1007, 1997. doi: 10.1109/72.623201
[8] Kiumarsi B., Lewis F. L., Modares H., Karimpour A., Naghibi-Sistani M., “Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics,” vol. 50, no. 4, pp. 1167–1175, 2014. doi: 10.1016/j.automatica.2014.02.015
[9] Kiumarsi B., Lewis F. L., Naghibi-Sistani M.-B., Karimpour A., “Optimal tracking control of unknown discrete-time linear systems using input-output measured data,” vol. 45, no. 12, pp. 2770–2779, 2015. doi: 10.1109/tcyb.2014.2384016
[10] Li X., Xue L., Sun C., “Linear quadratic tracking control of unknown discrete-time systems using value iteration algorithm,” vol. 314, pp. 86–93, 2018. doi: 10.1016/j.neucom.2018.05.111
[11] Wang D., Mu C., “Adaptive-critic-based robust trajectory tracking of uncertain dynamics and its application to a spring-mass-damper system,” vol. 65, no. 1, pp. 654–663, 2018. doi: 10.1109/TIE.2017.2722424
[12] Mehraeen S., Dierks T., Jagannathan S., Crow M. L., “Zero-sum two-player game theoretic formulation of affine nonlinear discrete-time systems using neural networks,” vol. 43, no. 6, pp. 1641–1655, 2013. doi: 10.1109/TSMCB.2012.2227253
[13] Sun J., Liu C., “Zero-sum differential games for nonlinear systems using adaptive dynamic programming with input constraint,” in Proceedings of the 36th Chinese Control Conference, CCC 2017, IEEE, China, pp. 2501–2506, July 2017.
[14] Sun J., Liu C., Ye Q., “Robust differential game guidance laws design for uncertain interceptor-target engagement via adaptive dynamic programming,” vol. 90, no. 5, pp. 990–1004, 2017. doi: 10.1080/00207179.2016.1192687
[15] Zhu Y., Zhao D., Li X., “Iterative adaptive dynamic programming for solving unknown nonlinear zero-sum game based on online data,” vol. 28, no. 3, pp. 714–725, 2017. doi: 10.1109/TNNLS.2016.2561300
[16] Abouheaf M. I., Lewis F. L., Mahmoud M. S., Mikulski D. G., “Discrete-time dynamic graphical games: model-free reinforcement learning solution,” vol. 13, no. 1, pp. 55–69, 2015. doi: 10.1007/s11768-015-3203-x
[17] Abouheaf M. I., Lewis F. L., Vamvoudakis K. G., Haesaert S., Babuska R., “Multi-agent discrete-time graphical games and reinforcement learning solutions,” vol. 50, no. 12, pp. 3038–3053, 2014. doi: 10.1016/j.automatica.2014.10.047
[18] Vadali S. R., Sharma R., “Optimal finite-time feedback controllers for nonlinear systems with terminal constraints,” vol. 29, no. 4, pp. 921–928, 2006. doi: 10.2514/1.16790
[19] Heydari A., Balakrishnan S. N., “Fixed-final-time optimal control of nonlinear systems with terminal constraints,” vol. 48, pp. 61–71, 2013. doi: 10.1016/j.neunet.2013.07.002
[20] Kim Y., Kim Y., Park C., “Adaptive critics design with support vector machine for spacecraft finite-horizon optimal control,” vol. 32, no. 1, Article ID 04018111, 2019. doi: 10.1061/(ASCE)AS.1943-5525.0000941
[21] Adhyaru D. M., Kar I. N., Gopal M., “Fixed final time optimal control approach for bounded robust controller design using Hamilton-Jacobi-Bellman solution,” vol. 3, no. 9, pp. 1183–1195, 2009. doi: 10.1049/iet-cta.2008.0288
[22] Xu H., Zhao Q., Jagannathan S., “Finite-horizon near-optimal output feedback neural network control of quantized nonlinear discrete-time systems with input constraint,” vol. 26, no. 8, pp. 1776–1788, 2015. doi: 10.1109/TNNLS.2015.2409301
[23] Heydari A., Balakrishnan S. N., “Fixed-final-time optimal tracking control of input-affine nonlinear systems,” vol. 129, pp. 528–539, 2014. doi: 10.1016/j.neucom.2013.09.006
[24] Heydari A., Balakrishnan S. N., “Global optimality of approximate dynamic programming and its use in non-convex function minimization,” vol. 24, pp. 291–303, 2014. doi: 10.1016/j.asoc.2014.07.003
[25] Heydari A., Balakrishnan S. N., “Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics,” vol. 24, no. 1, pp. 145–157, 2013. doi: 10.1109/TNNLS.2012.2227339
[26] Heydari A., Balakrishnan S. N., “Optimal switching between controlled subsystems with free mode sequence,” vol. 149, pp. 1620–1630, 2015. doi: 10.1016/j.neucom.2014.08.030
[27] Heydari A., Balakrishnan S. N., “Optimal switching between autonomous subsystems,” vol. 351, no. 5, pp. 2675–2690, 2014. doi: 10.1016/j.jfranklin.2013.12.008
[28] Zhao Q., Missouri University of Science and Technology, 2013.
[29] Zhao Q., Xu H., Jagannathan S., “Fixed final time optimal adaptive control of linear discrete-time systems in input-output form,” vol. 3, no. 3, pp. 175–187, 2013. doi: 10.2478/jaiscr-2014-0012
[30] Wang D., Liu D., Wei Q., “Adaptive dynamic programming for finite-horizon optimal tracking control of a class of nonlinear systems,” in Proceedings of the 30th Chinese Control Conference, CCC 2011, China, pp. 2450–2455, July 2011.
[31] Wang F.-Y., Jin N., Liu D., Wei Q., “Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound,” vol. 22, no. 1, pp. 24–36, 2011. doi: 10.1109/tnn.2010.2076370
[32] Patterson M. A., Rao A. V., “GPOPS-II: a MATLAB software for solving multiple-phase optimal control problems using hp-adaptive gaussian quadrature collocation methods and sparse nonlinear programming,” vol. 41, no. 1, pp. 1–37, 2014. doi: 10.1145/2558904
[33] Li N., Lei H., Shao L., Liu T., Wang B., “Trajectory optimization based on multi-interval mesh refinement method,” vol. 2017, Article ID 8521368, 8 pages, 2017. doi: 10.1155/2017/8521368
[34] Zhang D.-Y., Lei H.-M., Wu L., Shao L., Wang J., “A trajectory tracking guidance law based on LQR,” vol. 37, no. 6, pp. 763–768, 2014.
[35] Drenick R., “The perturbation calculus in missile ballistics,” vol. 251, no. 4, pp. 423–436, 1951. doi: 10.1016/0016-0032(51)90002-6
[36] Hornik K., Stinchcombe M., White H., “Multilayer feedforward networks are universal approximators,” vol. 2, no. 5, pp. 359–366, 1989. doi: 10.1016/0893-6080(89)90020-8
[37] Cybenko G., “Approximation by superpositions of a sigmoidal function,” vol. 2, no. 4, pp. 303–314, 1989. doi: 10.1007/BF02551274
[38] Lu Z., Pu H., Wang F., Hu Z., Wang L., “The expressive power of neural networks: a view from the width,” in Proceedings of the 31st Annual Conference on Neural Information Processing Systems, NIPS, pp. 6232–6240, 2017.
[39] Kamalapurkar R., Walters P., Dixon W. E., “Model-based reinforcement learning for approximate optimal regulation,” vol. 64, pp. 94–104, 2016. doi: 10.1016/j.automatica.2015.10.039
[40] Kingma D., Ba J., “Adam: a method for stochastic optimization,” in Proceedings of the International Conference on Learning Representations, pp. 1–15, 2015.
[41] Al-Tamimi A., Lewis F. L., Abu-Khalaf M., “Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof,” vol. 38, no. 4, pp. 943–949, 2008. doi: 10.1109/tsmcb.2008.926614
[42] Heydari A., “Revisiting approximate dynamic programming and its convergence,” vol. 44, no. 12, pp. 2733–2743, 2014. doi: 10.1109/TCYB.2014.2314612