A method to solve nonlinear optimal control problems is proposed in
this work. The method implements an approximating sequence of time-varying linear quadratic regulators that converge to the solution of the
original, nonlinear problem. Each subproblem is solved by manipulating
the state transition matrix of the state-costate dynamics. Hard, soft,
and mixed boundary conditions are handled. The presented method is
a modified version of an algorithm known as “approximating sequence
of Riccati equations.” Sample problems in astrodynamics are treated to
show the effectiveness of the method, whose limitations are also discussed.
1. Introduction
Optimal control problems are solved with indirect or direct methods. Indirect methods stem from the calculus of variations [1, 2]; direct methods rely on nonlinear programming [3, 4]. Both approaches require the solution of a complex set of equations (Euler-Lagrange differential equations or Karush-Kuhn-Tucker algebraic equations), for which iterative numerical methods are used. These iterative procedures implement some form of Newton's method to find the zeros of a nonlinear function, and they are initiated by providing an initial guess. Guessing an appropriate initial solution is not trivial and requires deep knowledge of the problem at hand. In indirect methods, the initial values of the Lagrange multipliers have to be provided; their lack of physical meaning makes it difficult to formulate a good guess. In direct methods, the initial trajectory and control have to be guessed at discrete points over the whole time interval.
This paper presents an approximate method to solve nonlinear optimal control problems. This is a modification of the method known as “approximating sequence of Riccati equations” (ASRE) [5, 6]. It transforms the nonlinear dynamics and objective function into a pseudolinear and quadratic-like structure, respectively, by using state- and control-dependent functions. At each iteration, these functions are evaluated by using the solutions at the previous iteration, and therefore, a series of time-varying linear quadratic regulators is treated. This sequence is solved with a state transition matrix approach, where three different final conditions are handled: final state fully specified, final state not specified, and final state not completely specified. These define hard, soft, and mixed constrained problems, respectively.
The main feature of the presented method is that it does not require guessing any initial solution or Lagrange multiplier. In fact, iterations start by evaluating the state- and control-dependent functions at the initial condition and zero control, respectively. The way the dynamics and objective function are factorized recalls the state-dependent Riccati equations (SDRE) method [7–9]. The two methods share some similarities, although they solve the optimal control problem in different ways. Since the method is approximate, suboptimal solutions are derived. These can be used as first-guess solutions for either indirect or direct methods.
2. The Nonlinear Optimal Control Problem
The optimal control problem requires that, given a set of n first-order differential equations
$$\dot{x}=f(x,u,t), \tag{1}$$
the m control functions u(t) must be determined over the interval between the initial and final times, ti and tf, such that the performance index
$$J=\varphi(x(t_f),t_f)+\int_{t_i}^{t_f}L(x,u,t)\,dt \tag{2}$$
is minimized while satisfying n+q two-point conditions
$$x(t_i)=x_i,\qquad \psi(x(t_f),t_f)=0. \tag{3}$$
The problem consists in finding a solution that represents a stationary point of the augmented performance index
$$\bar{J}=\varphi(x(t_f),t_f)+\nu^T\psi(x(t_f),t_f)+\int_{t_i}^{t_f}\left[L(x,u,t)+\lambda^T\big(f(x,u,t)-\dot{x}\big)\right]dt, \tag{4}$$
where λ is the costate vector and ν is the multiplier associated with the final boundary condition. The necessary conditions for optimality, also referred to as the Euler-Lagrange equations, are
$$\dot{x}=\frac{\partial H}{\partial \lambda},\qquad \dot{\lambda}=-\frac{\partial H}{\partial x},\qquad \frac{\partial H}{\partial u}=0, \tag{5}$$
where H, the Hamiltonian, is
$$H(x,\lambda,u,t)=L(x,u,t)+\lambda^T f(x,u,t). \tag{6}$$
The differential-algebraic system (5) must be solved together with the final boundary conditions (3) and the transversality conditions
$$\lambda(t_f)=\left[\frac{\partial \varphi}{\partial x}+\left(\frac{\partial \psi}{\partial x}\right)^{T}\nu\right]_{t=t_f}, \tag{7}$$
which define a differential-algebraic parametric two-point boundary value problem whose solution supplies ν and the functions x(t), λ(t), u(t), t∈[ti,tf].
3. The Approximating Sequence of Riccati Equations
Let the controlled dynamics (1) be rewritten in the form
$$\dot{x}=A(x,t)\,x+B(x,u,t)\,u, \tag{8}$$
and let the objective function (2) be rearranged as
$$J=\frac{1}{2}\,x^T(t_f)\,S(x(t_f),t_f)\,x(t_f)+\frac{1}{2}\int_{t_i}^{t_f}\left[x^TQ(x,t)\,x+u^TR(x,u,t)\,u\right]dt, \tag{9}$$
where the operators A, B, S, Q, and R have appropriate dimensions. The nonlinear dynamics (8) and the performance index (9) define an optimal control problem. The initial state, xi, is assumed to be given, while the final condition (ψ in (3)) can assume three different forms (see Section 4). The problem is formulated as an approximating sequence of Riccati equations. This method reduces problem (8)-(9) to a series of time-varying linear quadratic regulators that are defined by evaluating the state- and control-dependent matrices using the solution of the previous iteration (the first iteration considers the initial condition and zero control).
The initial step consists in solving problem 0, which is defined as follows:
$$\dot{x}^{(0)}=A(x_i,t)\,x^{(0)}+B(x_i,0,t)\,u^{(0)},$$
$$J=\frac{1}{2}\,x^{(0)T}(t_f)\,S(x_i,t_f)\,x^{(0)}(t_f)+\frac{1}{2}\int_{t_i}^{t_f}\left[x^{(0)T}Q(x_i,t)\,x^{(0)}+u^{(0)T}R(x_i,0,t)\,u^{(0)}\right]dt. \tag{10}$$
Problem 0 is a standard time-varying linear quadratic regulator (TVLQR), as the arguments of A, B, S, Q, and R are all given except for the time. This problem is solved to yield x(0)(t) and u(0)(t), t∈[ti,tf], where the superscript denotes the problem that the solution refers to.
At a generic, subsequent iteration, problem k has to be solved. This is defined as follows:
$$\dot{x}^{(k)}=A(x^{(k-1)}(t),t)\,x^{(k)}+B(x^{(k-1)}(t),u^{(k-1)}(t),t)\,u^{(k)},$$
$$J=\frac{1}{2}\,x^{(k)T}(t_f)\,S(x^{(k-1)}(t_f),t_f)\,x^{(k)}(t_f)+\frac{1}{2}\int_{t_i}^{t_f}\left[x^{(k)T}Q(x^{(k-1)}(t),t)\,x^{(k)}+u^{(k)T}R(x^{(k-1)}(t),u^{(k-1)}(t),t)\,u^{(k)}\right]dt. \tag{11}$$
Problem k is again a TVLQR; note that x(k-1) and u(k-1) are the solutions of problem k-1, obtained at the previous iteration. Solving problem k yields x(k)(t) and u(k)(t), t∈[ti,tf].
Iterations continue until a certain convergence criterion is satisfied. In the present implementation of the algorithm, the convergence is reached when
$$\left\|x^{(k)}-x^{(k-1)}\right\|_{\infty}=\max_{t\in[t_i,t_f]}\left\{\left|x_j^{(k)}(t)-x_j^{(k-1)}(t)\right|,\ j=1,\dots,n\right\}\leq\varepsilon, \tag{12}$$
where ɛ is a prescribed tolerance. That is, iterations terminate when no component of the state, evaluated over the whole time interval, changes by more than ɛ between two consecutive iterations.
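The iteration above can be sketched in a few lines. The following is a minimal, illustrative skeleton, not the authors' implementation: `solve_tvlqr` is a hypothetical routine that solves one frozen TVLQR subproblem (for instance, with the state transition matrix approach of Section 4) and returns the trajectory and control sampled on a time grid.

```python
import numpy as np

def asre_loop(solve_tvlqr, x_i, m, t_grid, eps=1e-9, max_iter=100):
    """Sketch of the ASRE outer loop (Section 3).

    solve_tvlqr(x_prev, u_prev) is a hypothetical solver for one TVLQR
    subproblem: it receives the previous trajectory and control (sampled
    on t_grid), with which A, B, S, Q, R are frozen, and returns the new
    (x, u) samples.  x_i is the initial state; m is the control dimension.
    """
    n_t = len(t_grid)
    # Problem 0: matrices evaluated at the initial condition and zero control.
    x_prev = np.tile(x_i, (n_t, 1))
    u_prev = np.zeros((n_t, m))
    for k in range(max_iter):
        x, u = solve_tvlqr(x_prev, u_prev)
        # Convergence criterion (12): sup-norm of the state difference.
        if np.max(np.abs(x - x_prev)) <= eps:
            return x, u, k + 1
        x_prev, u_prev = x, u
    raise RuntimeError("ASRE did not converge within max_iter iterations")
```

Problem 0 corresponds to `x_prev` frozen at the initial condition and `u_prev` at zero, as in (10).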
4. Solution of the Time-Varying Linear Quadratic Regulator by the State Transition Matrix
With the approach sketched in Section 3, a fully nonlinear optimal control problem is reduced to a sequence of time-varying linear quadratic regulators. These can be solved a number of times to achieve an approximate solution of the original, nonlinear problem. This is done by exploiting the structure of the problem as well as its state transition matrix. This scheme differs from that implemented in [5, 6], and, in part, is described in [1].
Suppose that the following dynamics are given:
$$\dot{x}=A(t)\,x+B(t)\,u, \tag{13}$$
together with the quadratic objective function
$$J=\frac{1}{2}\,x^T(t_f)\,S(t_f)\,x(t_f)+\frac{1}{2}\int_{t_i}^{t_f}\left[x^TQ(t)\,x+u^TR(t)\,u\right]dt, \tag{14}$$
where Q(t) and S are positive semidefinite and R(t) is positive definite, all time-varying matrices with appropriate dimensions. The Hamiltonian of this problem is
$$H=\frac{1}{2}\left[x^TQ(t)\,x+u^TR(t)\,u\right]+\lambda^T\left[A(t)\,x+B(t)\,u\right], \tag{15}$$
and the optimality conditions (5) read
$$\dot{x}=A(t)\,x+B(t)\,u, \tag{16}$$
$$\dot{\lambda}=-Q(t)\,x-A^T(t)\,\lambda, \tag{17}$$
$$0=R(t)\,u+B^T(t)\,\lambda. \tag{18}$$
From (18), it is possible to get
$$u=-R^{-1}(t)\,B^T(t)\,\lambda, \tag{19}$$
which can be substituted into (16)-(17) to yield
$$\dot{x}=A(t)\,x-B(t)R^{-1}(t)B^T(t)\,\lambda,\qquad \dot{\lambda}=-Q(t)\,x-A^T(t)\,\lambda. \tag{20}$$
In a compact form, (20) can be arranged as
$$\begin{bmatrix}\dot{x}\\ \dot{\lambda}\end{bmatrix}=\begin{bmatrix}A(t) & -B(t)R^{-1}(t)B^T(t)\\ -Q(t) & -A^T(t)\end{bmatrix}\begin{bmatrix}x\\ \lambda\end{bmatrix}. \tag{21}$$
Since (21) is a system of linear differential equations, the exact solution can be written as
$$x(t)=\phi_{xx}(t_i,t)\,x_i+\phi_{x\lambda}(t_i,t)\,\lambda_i, \tag{22}$$
$$\lambda(t)=\phi_{\lambda x}(t_i,t)\,x_i+\phi_{\lambda\lambda}(t_i,t)\,\lambda_i, \tag{23}$$
where the functions ϕxx, ϕxλ, ϕλx, and ϕλλ are the components of the state transition matrix, which can be found by integrating the following dynamics:
$$\begin{bmatrix}\dot{\phi}_{xx} & \dot{\phi}_{x\lambda}\\ \dot{\phi}_{\lambda x} & \dot{\phi}_{\lambda\lambda}\end{bmatrix}=\begin{bmatrix}A(t) & -B(t)R^{-1}(t)B^T(t)\\ -Q(t) & -A^T(t)\end{bmatrix}\begin{bmatrix}\phi_{xx} & \phi_{x\lambda}\\ \phi_{\lambda x} & \phi_{\lambda\lambda}\end{bmatrix}, \tag{24}$$
with the initial conditions
$$\phi_{xx}(t_i,t_i)=\phi_{\lambda\lambda}(t_i,t_i)=I_{n\times n},\qquad \phi_{x\lambda}(t_i,t_i)=\phi_{\lambda x}(t_i,t_i)=0_{n\times n}. \tag{25}$$
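The blocks of the state transition matrix are obtained by propagating (24) from the identity initial condition (25). The following Python sketch is one possible implementation (the paper does not prescribe a language or integrator); it assembles the Hamiltonian matrix of (21) and propagates the full 2n × 2n matrix with `scipy.integrate.solve_ivp`.

```python
import numpy as np
from scipy.integrate import solve_ivp

def transition_matrix(A, B, Q, R, t_i, t_f, n):
    """Integrate the STM dynamics (24) from the identity condition (25).

    A, B, Q, R are callables of time returning the problem matrices;
    n is the state dimension.  Returns the 2n x 2n matrix whose blocks
    are phi_xx, phi_xlambda (top row) and phi_lambdax, phi_lambdalambda
    (bottom row), as in (22)-(23).
    """
    def rhs(t, phi_flat):
        Phi = phi_flat.reshape(2 * n, 2 * n)
        # Hamiltonian matrix of (21), rebuilt at each time step.
        M = np.block([[A(t), -B(t) @ np.linalg.inv(R(t)) @ B(t).T],
                      [-Q(t), -A(t).T]])
        return (M @ Phi).ravel()

    sol = solve_ivp(rhs, (t_i, t_f), np.eye(2 * n).ravel(),
                    rtol=1e-10, atol=1e-12)
    return sol.y[:, -1].reshape(2 * n, 2 * n)
```

As a quick check, for the scalar problem ẋ = u with A = 0, B = R = 1, Q = 0, the Hamiltonian matrix in (21) is constant and nilpotent, and the transition matrix over a unit interval is [[1, −1], [0, 1]].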
If both xi and λi were given, it would be possible to compute x(t) and λ(t) through (22)-(23), and therefore the optimal control function u(t) with (19). The initial condition is assumed to be given, whereas the computation of λi depends on the final condition, which, in the present algorithm, can be defined in three different ways.
4.1. Hard Constrained Problem
In a hard constrained problem (HCP), the value of the final state is fully specified, x(tf)=xf, and therefore, (14) does not account for S. The value of λi can be found by writing (22) at final time
$$x_f=\phi_{xx}(t_i,t_f)\,x_i+\phi_{x\lambda}(t_i,t_f)\,\lambda_i \tag{26}$$
and by solving for λi; that is,
$$\lambda_i(x_i,x_f,t_i,t_f)=\phi_{x\lambda}^{-1}(t_i,t_f)\left[x_f-\phi_{xx}(t_i,t_f)\,x_i\right]. \tag{27}$$
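Numerically, (27) amounts to one linear solve against the upper blocks of the transition matrix; inverting φxλ explicitly is unnecessary. A minimal sketch, with the block ordering of (22)-(23):

```python
import numpy as np

def initial_costate_hcp(Phi, x_i, x_f):
    """Initial costate for the hard constrained problem, eq. (27).

    Phi is the 2n x 2n state transition matrix from t_i to t_f.
    """
    n = len(x_i)
    phi_xx, phi_xl = Phi[:n, :n], Phi[:n, n:]
    # Solve phi_xl @ lambda_i = x_f - phi_xx @ x_i rather than inverting.
    return np.linalg.solve(phi_xl, x_f - phi_xx @ x_i)
```

For the scalar ẋ = u example with Φ = [[1, −1], [0, 1]], xi = 0, and xf = 1, this gives λi = −1, hence the constant control u = −λ = 1, i.e. the minimum-energy straight-line solution.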
4.2. Soft Constrained Problem
In a soft constrained problem (SCP), the final state is not specified, and thus S in (14) is an n×n positive definite matrix. The transversality condition (7) sets a relation between the state and costate at final time
$$\lambda(t_f)=S(t_f)\,x(t_f), \tag{28}$$
which can be used to find λi. This is done by writing (22)-(23) at final time and using (28)
$$x(t_f)=\phi_{xx}(t_i,t_f)\,x_i+\phi_{x\lambda}(t_i,t_f)\,\lambda_i,\qquad S(t_f)\,x(t_f)=\phi_{\lambda x}(t_i,t_f)\,x_i+\phi_{\lambda\lambda}(t_i,t_f)\,\lambda_i. \tag{29}$$
Equations (29) represent a linear algebraic system of 2n equations in the 2n unknowns {x(tf),λi}. The system can be solved by substitution to yield
$$\lambda_i(x_i,t_i,t_f)=\left[\phi_{\lambda\lambda}(t_i,t_f)-S(t_f)\,\phi_{x\lambda}(t_i,t_f)\right]^{-1}\left[S(t_f)\,\phi_{xx}(t_i,t_f)-\phi_{\lambda x}(t_i,t_f)\right]x_i. \tag{30}$$
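Equation (30) translates into a similar linear solve. A minimal sketch:

```python
import numpy as np

def initial_costate_scp(Phi, S_f, x_i):
    """Initial costate for the soft constrained problem, eq. (30).

    Phi: 2n x 2n state transition matrix from t_i to t_f; S_f = S(t_f).
    """
    n = len(x_i)
    pxx, pxl = Phi[:n, :n], Phi[:n, n:]
    plx, pll = Phi[n:, :n], Phi[n:, n:]
    # [pll - S pxl] lambda_i = [S pxx - plx] x_i, solved directly.
    return np.linalg.solve(pll - S_f @ pxl, (S_f @ pxx - plx) @ x_i)
```

A quick scalar check: for ẋ = u with Φ = [[1, −1], [0, 1]], S = 1, and xi = 1, the routine returns λi = 0.5, which matches the analytic value S·xi/(1 + S(tf − ti)) obtained from λ(tf) = S x(tf) with constant costate.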
4.3. Mixed Constrained Problem
In a mixed constrained problem (MCP), some components of the final state are specified and some are not. Without any loss of generality, let the state be decomposed as x=(y,z), where y are the p components known at final time, y(tf)=yf, and z are the remaining n-p elements. The costate is decomposed accordingly as λ=(ξ,η). With this formalism, S in (14) is (n-p)×(n-p), and it is pre- and postmultiplied by z(tf). The transversality condition (7) is η(tf)=S(tf)z(tf).
The MCP is solved by partitioning the state transition matrix in a suitable form such that, at final time, (22)-(23) read
$$\begin{bmatrix}y(t_f)\\ z(t_f)\end{bmatrix}=\begin{bmatrix}\phi_{yy} & \phi_{yz}\\ \phi_{zy} & \phi_{zz}\end{bmatrix}\begin{bmatrix}y_i\\ z_i\end{bmatrix}+\begin{bmatrix}\phi_{y\xi} & \phi_{y\eta}\\ \phi_{z\xi} & \phi_{z\eta}\end{bmatrix}\begin{bmatrix}\xi_i\\ \eta_i\end{bmatrix}, \tag{31}$$
$$\begin{bmatrix}\xi(t_f)\\ \eta(t_f)\end{bmatrix}=\begin{bmatrix}\phi_{\xi y} & \phi_{\xi z}\\ \phi_{\eta y} & \phi_{\eta z}\end{bmatrix}\begin{bmatrix}y_i\\ z_i\end{bmatrix}+\begin{bmatrix}\phi_{\xi\xi} & \phi_{\xi\eta}\\ \phi_{\eta\xi} & \phi_{\eta\eta}\end{bmatrix}\begin{bmatrix}\xi_i\\ \eta_i\end{bmatrix}, \tag{32}$$
where the dependence of the state transition matrix components on ti, tf is omitted for brevity. From the first row of (31), it is possible to get
$$\xi_i=\phi_{y\xi}^{-1}\left[y_f-\phi_{yy}\,y_i-\phi_{yz}\,z_i\right]-\phi_{y\xi}^{-1}\phi_{y\eta}\,\eta_i, \tag{33}$$
which can be substituted in the second row of (31) to yield
$$z(t_f)=\left[\phi_{zy}-\phi_{z\xi}\phi_{y\xi}^{-1}\phi_{yy}\right]y_i+\left[\phi_{zz}-\phi_{z\xi}\phi_{y\xi}^{-1}\phi_{yz}\right]z_i+\phi_{z\xi}\phi_{y\xi}^{-1}\,y_f+\left[\phi_{z\eta}-\phi_{z\xi}\phi_{y\xi}^{-1}\phi_{y\eta}\right]\eta_i. \tag{34}$$
Equations (33)-(34), together with the transversality condition η(tf)=S(tf)z(tf), can be substituted in the second row of (32) to compute the component of the initial costate
$$\eta_i(x_i,y_f,t_i,t_f)=\left[\tilde{\phi}_{\eta\eta}\right]^{-1}w(x_i,y_f,t_i,t_f), \tag{35}$$
where
$$\tilde{\phi}_{\eta\eta}=\phi_{\eta\eta}-\phi_{\eta\xi}\phi_{y\xi}^{-1}\phi_{y\eta}-S\left(\phi_{z\eta}-\phi_{z\xi}\phi_{y\xi}^{-1}\phi_{y\eta}\right),$$
$$w(x_i,y_f,t_i,t_f)=\left[S\left(\phi_{zy}-\phi_{z\xi}\phi_{y\xi}^{-1}\phi_{yy}\right)-\phi_{\eta y}+\phi_{\eta\xi}\phi_{y\xi}^{-1}\phi_{yy}\right]y_i+\left[S\left(\phi_{zz}-\phi_{z\xi}\phi_{y\xi}^{-1}\phi_{yz}\right)-\phi_{\eta z}+\phi_{\eta\xi}\phi_{y\xi}^{-1}\phi_{yz}\right]z_i+\left[S\,\phi_{z\xi}\phi_{y\xi}^{-1}-\phi_{\eta\xi}\phi_{y\xi}^{-1}\right]y_f. \tag{36}$$
Once ηi is known, the remaining part of the initial costate, ξi, is computed through (33); therefore, the full initial costate is obtained as a function of the initial condition, the given final condition, and the initial and final times; that is, λi(xi,yf,ti,tf)=(ξi(xi,yf,ti,tf),ηi(xi,yf,ti,tf)).
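The elimination leading to (35)-(36) can be coded directly from the partitioned transition matrix. The sketch below substitutes (33)-(34) into the transversality condition η(tf) = S(tf)z(tf) and solves for ηi, then back-substitutes for ξi; the block ordering (y, z, ξ, η) and p = dim(y) follow Section 4.3, while any numerical values used to exercise it are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def initial_costate_mcp(Phi, S, p, y_i, z_i, y_f):
    """Initial costate (xi_i, eta_i) for the mixed constrained problem.

    Phi: 2n x 2n transition matrix with rows/columns ordered (y, z, xi, eta);
    p = dim(y), the components fixed at t_f; S weighs the free part z(t_f).
    Implements the elimination of (33)-(35).
    """
    n = len(y_i) + len(z_i)
    idx = [0, p, n, n + p, 2 * n]
    blk = lambda i, j: Phi[idx[i]:idx[i + 1], idx[j]:idx[j + 1]]
    pyy, pyz, pyxi, pyeta = (blk(0, j) for j in range(4))
    pzy, pzz, pzxi, pzeta = (blk(1, j) for j in range(4))
    pey, pez, pexi, peeta = (blk(3, j) for j in range(4))
    T = np.linalg.inv(pyxi)                      # phi_yxi^{-1}
    # Coefficient of eta_i after substituting (33)-(34) into eta(tf) = S z(tf).
    phi_tilde = peeta - pexi @ T @ pyeta - S @ (pzeta - pzxi @ T @ pyeta)
    w = ((S @ (pzy - pzxi @ T @ pyy) - pey + pexi @ T @ pyy) @ y_i
         + (S @ (pzz - pzxi @ T @ pyz) - pez + pexi @ T @ pyz) @ z_i
         + (S @ pzxi @ T - pexi @ T) @ y_f)
    eta_i = np.linalg.solve(phi_tilde, w)
    # Back-substitute into (33) for the remaining costate components.
    xi_i = T @ (y_f - pyy @ y_i - pyz @ z_i) - T @ pyeta @ eta_i
    return xi_i, eta_i
```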
5. Numerical Examples
Two simple problems with nonlinear dynamics are considered to apply the developed algorithm. These correspond to the controlled relative spacecraft motion and to the controlled two-body dynamics for low-thrust transfers.
5.1. Low-Thrust Rendezvous
This problem is taken from the literature, where a solution is available for comparison [10, 11]. Consider the planar, relative motion of two particles in a central gravity field expressed in a rotating frame with normalized units: the length unit is equal to the orbital radius, the time unit is such that the orbital period is 2π, and the gravitational parameter is equal to 1. In these dynamics, the state, x=(x1,x2,x3,x4), represents the radial and tangential displacements (x1,x2) and the radial and tangential velocity deviations (x3,x4), respectively. The control, u=(u1,u2), is made up of the radial and tangential accelerations, respectively.
The equations of motion are
$$\dot{x}_1=x_3,\qquad \dot{x}_2=x_4,\qquad \dot{x}_3=2x_4-(1+x_1)\left(\frac{1}{r^3}-1\right)+u_1,\qquad \dot{x}_4=-2x_3-x_2\left(\frac{1}{r^3}-1\right)+u_2, \tag{37}$$
with r = [(x1+1)^2 + x2^2]^(1/2). The initial condition is xi=(0.2,0.2,0.1,0.1). Two different problems are solved to test the algorithm in both hard and soft constrained conditions.
Hard Constrained Rendezvous. The HCP consists in minimizing
$$J=\frac{1}{2}\int_{t_i}^{t_f}u^Tu\,dt \tag{38}$$
with the given final condition xf=(0,0,0,0), ti=0, and tf=1.
Soft Constrained Rendezvous. The SCP considers the following objective function:
$$J=\frac{1}{2}\,x^T(t_f)\,S\,x(t_f)+\frac{1}{2}\int_{t_i}^{t_f}u^Tu\,dt, \tag{39}$$
with S=diag(25,15,10,10), ti=0 and tf=1 (xf is free).
The differential equations (37) are factorized into the form of (8) as
$$\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ \dot{x}_4\end{bmatrix}=\underbrace{\begin{bmatrix}0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ f(x_1,x_2)\left(1+\dfrac{1}{x_1}\right) & 0 & 0 & 2\\ 0 & f(x_1,x_2) & -2 & 0\end{bmatrix}}_{A(x)}\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\end{bmatrix}+\underbrace{\begin{bmatrix}0 & 0\\ 0 & 0\\ 1 & 0\\ 0 & 1\end{bmatrix}}_{B}\begin{bmatrix}u_1\\ u_2\end{bmatrix}, \tag{40}$$
with f(x1,x2) = -1/[(x1+1)^2 + x2^2]^(3/2) + 1. Thus, the problem is put into the pseudo-LQR form (8)-(9) by defining A(x) and B as in (40) and by setting Q to the 4×4 zero matrix and R to the 2×2 identity matrix.
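Since a state-dependent factorization is not unique, a useful sanity check is that A(x)x + Bu reproduces the right-hand side of (37) at arbitrary states. A hedged Python sketch of both forms (note that this particular factorization is singular at x1 = 0):

```python
import numpy as np

def f_rendezvous(x, u):
    """Right-hand side of the relative-motion equations (37)."""
    x1, x2, x3, x4 = x
    g = 1.0 / ((x1 + 1.0)**2 + x2**2)**1.5 - 1.0   # 1/r^3 - 1
    return np.array([x3,
                     x4,
                     2.0 * x4 - (1.0 + x1) * g + u[0],
                     -2.0 * x3 - x2 * g + u[1]])

def A_rendezvous(x):
    """State-dependent matrix of the factorization (40), with f = 1 - 1/r^3.

    The 1/x1 term makes this factorization singular at x1 = 0.
    """
    x1, x2 = x[0], x[1]
    f = -1.0 / ((x1 + 1.0)**2 + x2**2)**1.5 + 1.0
    return np.array([[0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0],
                     [f * (1.0 + 1.0 / x1), 0.0, 0.0, 2.0],
                     [0.0, f, -2.0, 0.0]])

B_rendezvous = np.array([[0.0, 0.0],
                         [0.0, 0.0],
                         [1.0, 0.0],
                         [0.0, 1.0]])
```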
The two problems have been solved with the developed method. Table 1 reports the details of the HCP and SCP, whose solutions are shown in Figures 1 and 2, respectively. In Table 1, J is the objective function at the final iteration, “Iter” is the number of iterations, and the “CPU time” is the computational time (this refers to an Intel Core 2 Duo 2 GHz with 4 GB RAM running Mac OS X 10.6). The termination tolerance ɛ in (12) is 10-9. The optimal solutions found replicate those already known in the literature [10, 11], indicating the effectiveness of the developed method.
Table 1: Rendezvous solutions details.

Problem    J        Iter    CPU time (s)
HCP        0.9586   5       0.375
SCP        0.5660   6       0.426
Figure 1: Hard constrained rendezvous. Panels: x1 versus x2; x3 versus x4; u1 versus u2.
Figure 2: Soft constrained rendezvous. Panels: x1 versus x2; x3 versus x4; u1 versus u2.
5.2. Low-Thrust Orbital Transfer
In this problem, the controlled, planar Keplerian motion of a spacecraft in polar coordinates is studied. The dynamics are written in scaled coordinates, where the length unit corresponds to the radius of the initial orbit, the time unit is such that its period is 2π, and the gravitational parameter is 1. The state, x=(x1,x2,x3,x4), is made up of the radial distance from the attractor (x1), the phase angle (x2), the radial velocity (x3), and the angular velocity (x4), whereas the control, u=(u1,u2), corresponds to the radial and transversal accelerations, respectively [12, 13]. The equations of motion are
$$\dot{x}_1=x_3,\qquad \dot{x}_2=x_4,\qquad \dot{x}_3=x_1x_4^2-\frac{1}{x_1^2}+u_1,\qquad \dot{x}_4=-\frac{2x_3x_4}{x_1}+\frac{u_2}{x_1}, \tag{41}$$
and the objective function is
$$J=\frac{1}{2}\int_{t_i}^{t_f}u^Tu\,dt, \tag{42}$$
with ti=0 and tf=π. The initial state corresponds to the conditions on the initial orbit; that is, xi=(1,0,0,1). Two different HCPs are solved, which correspond to the final states xf=(1.52, π, 0, 1.52^(-3/2)) and xf=(1.52, 1.5π, 0, 1.52^(-3/2)), respectively. This setup mimics an Earth-Mars low-thrust transfer. The dynamics (41) and the objective function (42) are put in the form (8)-(9) by setting Q to the 4×4 zero matrix, R to the 2×2 identity matrix, and
$$A(x)=\begin{bmatrix}0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ -\dfrac{1}{x_1^3} & 0 & 0 & x_1x_4\\ 0 & 0 & -\dfrac{2x_4}{x_1} & 0\end{bmatrix},\qquad B(x)=\begin{bmatrix}0 & 0\\ 0 & 0\\ 1 & 0\\ 0 & \dfrac{1}{x_1}\end{bmatrix}. \tag{43}$$
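The same sanity check used for the rendezvous factorization applies here: A(x)x + B(x)u must reproduce the right-hand side of (41) at arbitrary states. A hedged Python sketch (this factorization is singular at x1 = 0):

```python
import numpy as np

def f_transfer(x, u):
    """Right-hand side of the polar two-body equations (41)."""
    x1, x2, x3, x4 = x
    return np.array([x3,
                     x4,
                     x1 * x4**2 - 1.0 / x1**2 + u[0],
                     -2.0 * x3 * x4 / x1 + u[1] / x1])

def A_transfer(x):
    """State-dependent matrix of the factorization (43)."""
    x1, x4 = x[0], x[3]
    return np.array([[0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0],
                     [-1.0 / x1**3, 0.0, 0.0, x1 * x4],
                     [0.0, 0.0, -2.0 * x4 / x1, 0.0]])

def B_transfer(x):
    """Control matrix of the factorization (43)."""
    return np.array([[0.0, 0.0],
                     [0.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.0 / x[0]]])
```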
The two HCPs have been solved with the developed method. The solutions’ details are reported in Table 2, whose columns have the same meaning as in Table 1. It can be seen that more iterations and an increased computational burden are required to solve this problem. The solution with x2,f=1.5π is reported in Figure 3.
Table 2: Earth-Mars transfer details.

Problem         J        Iter    CPU time (s)
x2,f = π        0.5298   22      5.425
x2,f = 1.5π     4.8665   123     41.831
Figure 3: Orbital transfer with x2,f = 1.5π. Panels: transfer trajectory; control profile.
6. Conclusion
In this paper, an approximate method to solve nonlinear optimal control problems has been presented, with applications to sample cases in astrodynamics. With this method, the nonlinear dynamics and objective function are factorized into pseudolinear and quadratic-like forms, similar to those used in the state-dependent Riccati equation approach. Once in this form, a sequence of time-varying linear quadratic regulator problems is solved by means of a state transition matrix approach. The results show the effectiveness of the method, which can be used either to obtain suboptimal solutions or to provide initial guesses for more accurate optimizers.
References

[1] A. E. Bryson and Y. C. Ho, Applied Optimal Control, John Wiley & Sons, New York, NY, USA, 1975.
[2] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The Mathematical Theory of Optimal Processes, John Wiley & Sons, New York, NY, USA, 1962.
[3] J. T. Betts, Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, SIAM, Philadelphia, Pa, USA, 2010.
[4] B. Conway, Spacecraft Trajectory Optimization Using Direct Transcription and Nonlinear Programming, Cambridge University Press, Cambridge, UK, 2010, pp. 37-78.
[5] T. Çimen and S. P. Banks, "Global optimal feedback control for general nonlinear systems with nonquadratic performance criteria," Systems & Control Letters, vol. 53, no. 5, pp. 327-346, 2004.
[6] T. Çimen and S. P. Banks, "Nonlinear optimal tracking control with application to super-tankers for autopilot design," Automatica, vol. 40, no. 11, pp. 1845-1863, 2004.
[7] C. P. Mracek and J. R. Cloutier, "Control designs for the nonlinear benchmark problem via the state-dependent Riccati equation method," International Journal of Robust and Nonlinear Control, vol. 8, no. 4-5, pp. 401-433, 1998.
[8] J. D. Pearson, "Approximation methods in optimal control," Journal of Electronics and Control, vol. 13, pp. 453-469, 1962.
[9] A. Wernli and G. Cook, "Suboptimal control for the nonlinear quadratic regulator problem," Automatica, vol. 11, no. 1, pp. 75-84, 1975.
[10] C. Park, V. Guibout, and D. J. Scheeres, "Solving optimal continuous thrust rendezvous problems with generating functions," Journal of Guidance, Control, and Dynamics, vol. 29, no. 2, pp. 321-331, 2006.
[11] C. Park and D. J. Scheeres, "Determination of optimal feedback terminal controllers for general boundary conditions using generating functions," Automatica, vol. 42, no. 5, pp. 869-875, 2006.
[12] A. Owis, F. Topputo, and F. Bernelli-Zazzera, "Radially accelerated optimal feedback orbits in central gravity field with linear drag," Celestial Mechanics and Dynamical Astronomy, vol. 103, no. 1, pp. 1-16, 2009.
[13] F. Topputo, A. H. Owis, and F. Bernelli-Zazzera, "Analytical solution of optimal feedback control for radially accelerated orbits," Journal of Guidance, Control, and Dynamics, vol. 31, no. 5, pp. 1352-1359, 2008.