The numerical solution of optimal control problems by direct collocation is a widely used approach. Quasi-Newton approximations of the Hessian of the Lagrangian of the resulting nonlinear program are also common practice. We illustrate that the transcribed problem is separable with respect to the primal variables and propose the application of dense quasi-Newton updates to the small diagonal blocks of the Hessian. This approach resolves memory limitations, preserves the correct sparsity pattern, and generates more accurate curvature information. The effectiveness of this improvement when applied to engineering problems is demonstrated. As an example, the fuel-optimal and emission-constrained control of a turbocharged diesel engine is considered. First results indicate a significantly faster convergence of the nonlinear program solver when the method proposed is used instead of the standard quasi-Newton approximation.
1. Introduction
Quasi-Newton (QN) methods have become very popular in the context of nonlinear optimisation. In particular, in nonlinear programs (NLPs) arising from the direct transcription of optimal control problems (OCPs), the Hessian of the Lagrangian often cannot be derived analytically in a convenient way. Algorithmic differentiation may fail due to unsupported operations or black-box parts in the model functions. Furthermore, both approaches are computationally expensive if the model functions are complex and yield long expressions for the second derivatives. On the other hand, numerical approximation by finite differences is inaccurate and hardly improves the computational performance.
A common approach in these cases is to approximate the Hessian by QN updates using gradient information collected during the NLP iterations. However, if applied to real-world OCPs, several limitations arise. These problems often exhibit large dimensions; thus, only limited-memory versions of the approximations are applicable. Since many updates may be necessary until a good approximation of the full Hessian is obtained, the approximation remains poor when only the most recent steps are used. Furthermore, the favourable sparsity structure of the underlying discretisation scheme is generally not preserved. The resulting fill-in drastically reduces the performance of the solvers for the linear system of equations that defines the step direction during each NLP iteration.
Partial separability, a concept introduced by [1], describes a structural property of a nonlinear optimisation problem. When present, this property allows for partitioned QN updates of the Hessian of the Lagrangian (or of the objective, in the unconstrained case). For unconstrained optimisation, this approach was proposed and its convergence properties were analysed in [2]. Although only superlinear local convergence can be proven, a performance close to that obtained with exact Newton methods, which exhibit quadratic local convergence, was observed in practical experiments [3, 4].
This brief paper presents how partitioned QN updates can be applied to the NLPs resulting from direct collocation of OCPs. The concept of direct collocation to solve OCPs has been widely used and analysed [5–9]. We will show that the Lagrangian of the resulting NLPs is separable in the primal variables at each discretisation point. Due to this separability, its Hessian can be approximated by full-memory QN updates of the small diagonal blocks. This procedure increases the accuracy of the approximation, preserves the sparsity structure, and resolves memory limitations. The results are first derived for Radau collocation schemes, which include the right interval boundary as a collocation point. The adaptations to Gauss schemes, which have internal collocation points only, and to Lobatto schemes, which include both boundaries, are provided thereafter in condensed form. A consistent description of all three families of collocation is provided in [10].
The partitioned update is applied to a real-world engineering problem. The fuel consumption of a turbocharged diesel engine is minimised while the limits on the cumulative pollutant emissions need to be satisfied. This problem is cast in the form of an OCP and is transcribed by Radau collocation. The resulting NLP is solved by an exact and a quasi-Newton method. For the latter, the partitioned update achieves an increased convergence rate and a higher robustness with respect to a poor initialisation of the approximation as compared to the full QN update. Therefore, the findings for the unconstrained case seem to transfer to the NLPs resulting from direct collocation of OCPs.
2. Material and Methods
Consider the system of nonautonomous ordinary differential equations (ODEs) ẋ = f(x, u) on the interval [t_0, t_f], with x, f ∈ ℝ^{1×n_x} and u ∈ ℝ^{1×n_u}. Radau collocation represents each element of the state vector x(t) as a polynomial, say of degree N. The time derivative of this polynomial is then equated to the values of f at the N collocation points t_0 < τ_1 < τ_2 < ⋯ < τ_N = t_f,
$$
\begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \vdots \\ \dot{x}_N \end{pmatrix}
\approx
\underbrace{\big[\, d_0,\ \tilde{D} \,\big]}_{=:D}
\cdot
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix}
\overset{!}{=}
\begin{pmatrix} f(x_1,u_1) \\ f(x_2,u_2) \\ \vdots \\ f(x_N,u_N) \end{pmatrix}. \tag{1}
$$
The notation x_j := x(τ_j) is adopted. The left boundary τ_0 = t_0 is a noncollocated point in Radau collocation schemes. By introducing the appropriate matrices, this matrix equation in ℝ^{N×n_x} can be written compactly as
$$
D \begin{bmatrix} x_0 \\ X \end{bmatrix} = F(X, U). \tag{2}
$$
The rows of X and F correspond to one collocation point each. In turn, the columns of X and F represent one state variable and its corresponding ODE right-hand side at all collocation points. In the following, consider the notation in (2) as shorthand for stacking the transpose of the rows in one large column vector.
The step length h = t_f − t_0 of the interval is assumed to be accounted for in the differentiation matrix D. Lagrange interpolation by barycentric weights is used to calculate D along with the vector of the quadrature weights w [11]. The latter may be used to approximate the definite integral of a function g(t) as
$$
\int_{t_0}^{t_f} g(t)\,\mathrm{d}t \approx \sum_{j=1}^{N} w_j\, g(\tau_j).
$$
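To make the construction concrete, the following self-contained sketch (illustrative only, not the implementation used here) builds D from the barycentric weights for the two-point Radau IIA nodes τ = {1/3, 1} on [0, 1] and solves the linear test dynamics ẋ = −x via (2); the choice of nodes and of the test ODE are our own assumptions for illustration.

```python
import numpy as np

# Nodes: noncollocated left boundary tau_0 = 0, plus the two Radau IIA
# collocation points 1/3 and 1 on the unit interval.
nodes = np.array([0.0, 1.0 / 3.0, 1.0])

# Differentiation matrix of Lagrange interpolation via barycentric weights:
# w_j = 1 / prod_{k != j}(tau_j - tau_k);  D_ij = (w_j / w_i) / (tau_i - tau_j).
diff = nodes[:, None] - nodes[None, :]
np.fill_diagonal(diff, 1.0)
w_bary = 1.0 / diff.prod(axis=1)
D_full = (w_bary[None, :] / w_bary[:, None]) / diff
np.fill_diagonal(D_full, 0.0)
np.fill_diagonal(D_full, -D_full.sum(axis=1))  # rows of D sum to zero

# Keep only the rows belonging to collocation points: D = [d0, D_tilde].
D = D_full[1:, :]
d0, D_tilde = D[:, 0], D[:, 1:]

# Solve x' = -x with x(0) = 1. Equation (2) with F(X) = -X becomes the
# linear system (D_tilde + I) X = -d0 * x0.
x0 = 1.0
X = np.linalg.solve(D_tilde + np.eye(2), -d0 * x0)
```

For this example, X[-1] reproduces the two-stage Radau IIA value 4/11 ≈ 0.3636, close to the exact solution exp(−1) ≈ 0.3679.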
2.1. Direct Collocation of Optimal Control Problems
We consider an OCP of the form
$$
\begin{aligned}
\min_{x(\cdot),\,u(\cdot)} \quad & \int_0^T L\big(x(t),u(t)\big)\,\mathrm{d}t, && \text{(3a)}\\
\text{s.t.} \quad & \dot{x}(t) - f\big(x(t),u(t)\big) = 0, \quad \forall t \in [0,T], && \text{(3b)}\\
& \int_0^T g\big(x(t),u(t)\big)\,\mathrm{d}t \le 0, && \text{(3c)}\\
& c\big(x(t),u(t),t\big) \le 0, \quad \forall t \in [0,T], && \text{(3d)}
\end{aligned}
$$
where L ∈ ℝ, g ∈ ℝ^{n_g}, and c ∈ ℝ^{n_c}. Simple bounds on x and u, equality constraints, and a fixed initial or end state can be included in the path constraints (3d). An objective term in Lagrange form is used, which is preferable over an equivalent Mayer term [12, Section 4.9].
Direct transcription discretises all functions and integrals by consistently applying an integration scheme. Here, k = 1, …, m integration intervals [t_{k−1}, t_k] are used with 0 = t_0 < t_1 < ⋯ < t_m = T. The number of collocation points N_k can be different for each interval. Summing up the collocation points throughout all integration intervals results in a total of M = l(m, N_m) discretisation points. The "linear index" l thereby corresponds to collocation node i in interval k,
$$
l := l(k,i) = i + \sum_{\alpha=1}^{k-1} N_\alpha. \tag{4}
$$
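For illustration, the index map (4) is a single line of code; the function name and the representation of the counts N_k as a Python list are our own conventions.

```python
def linear_index(k, i, N):
    """Global discretisation index l = l(k, i) from eq. (4).

    k: interval number (1-based), i: collocation node within interval k
    (1-based), N: list of collocation-point counts [N_1, ..., N_m].
    """
    return i + sum(N[:k - 1])
```

With N = [3, 3, 2], for instance, the last node of the last interval yields linear_index(3, 2, N) = 8 = M, in agreement with M = l(m, N_m).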
The transcribed OCP reads
$$
\begin{aligned}
\min_{x_\bullet,\,u_\bullet} \quad & \sum_{l=1}^{M} W_l \, L(x_l,u_l), && \text{(5a)}\\
\text{s.t.} \quad & D^{(k)} \begin{bmatrix} X_{N_{k-1},\bullet}^{(k-1)} \\[2pt] X^{(k)} \end{bmatrix} - F\big(X^{(k)},U^{(k)}\big) = 0, \quad k = 1,\dots,m, && \text{(5b)}\\
& \sum_{l=1}^{M} W_l \, g(x_l,u_l) \le 0, && \text{(5c)}\\
& c_l(x_l,u_l) \le 0, \quad l = 1,\dots,M. && \text{(5d)}
\end{aligned}
$$
The notation x_• denotes all instances of the variable x_l at any applicable index l. The vector of the "global" quadrature weights W results from stacking the vectors of the quadrature weights w^{(k)} of each interval k after removing the first element, which is zero. For the first interval (k = 1), X_{N_{k-1},\bullet}^{(k-1)} is the initial state x(0).
The Lagrangian of the NLP (5a)–(5d) is the sum of the objective (5a) and all constraints (5b), (5c), and (5d), which are weighted by the Lagrange multipliers λ. To clarify the notation, the Lagrange multipliers are grouped according to the problem structure. The n_x·N_k multipliers for the discretised dynamic constraints on each integration interval k are denoted by λ_d^{(k)}, the n_g multipliers for the integral inequalities are stacked in λ_g, and the n_c multipliers for the path constraints at each discretisation point l are gathered in the vector λ_{c,l}.
2.2. Separability in the Primal Variables
The objective (5a), the integral inequalities (5c), and the path constraints (5d) are inherently separated with respect to time; that is, the individual terms are pairwise disjoint in x_l and u_l. We thus focus on the separability of the dynamic constraints (5b). For the derivation, we assume that f, x, and u are scalar. The extension to the vector-valued case is straightforward and will be provided subsequently.
Consider the term of the Lagrangian representing the discretised dynamic constraints (5b) for interval k,
$$
\tilde{\mathcal{L}}_d^{(k)} := \sum_{i=1}^{N_k} \lambda_{d,i}^{(k)} \bigg( d_{0,i}^{(k)}\, x_{N_{k-1}}^{(k-1)} + \sum_{j=1}^{N_k} \tilde{D}_{ij}^{(k)}\, x_j^{(k)} - f\big(x_i^{(k)}, u_i^{(k)}\big) \bigg). \tag{6}
$$
This formulation constitutes a separation in the dual variables (the Lagrange multipliers). By collecting terms at each collocation point and accounting for the d_0 terms in the previous interval, we obtain a separation in the primal variables,
$$
\mathcal{L}_d^{(k)} = \sum_{i=1}^{N_k} \Bigg( \bigg[ \sum_{j=1}^{N_k} \tilde{D}_{ji}^{(k)}\, \lambda_{d,j}^{(k)} + \delta_i^{(k)} \sum_{j=1}^{N_{k+1}} d_{0,j}^{(k+1)}\, \lambda_{d,j}^{(k+1)} \bigg]\, x_i^{(k)} - \lambda_{d,i}^{(k)}\, f\big(x_i^{(k)}, u_i^{(k)}\big) \Bigg), \tag{7}
$$
with
$$
\delta_i^{(k)} = \begin{cases} 1, & \text{if } i = N_k,\ k \ne m, \\ 0, & \text{otherwise.} \end{cases} \tag{8}
$$
Each term inside the round brackets in (7) is a collocation-point-separated part of the Lagrangian which stems from the dynamic constraints. We denote these terms by ℒ_{d,i}^{(k)} and introduce the notation
$$
\omega := (x, u), \qquad \omega_i^{(k)} := \big(x_i^{(k)}, u_i^{(k)}\big). \tag{9}
$$
The gradient with respect to the primal variables is
$$
\nabla_{\omega} \mathcal{L}_{d,i}^{(k)} = \Big( \lambda_d^{(k)\,\mathsf{T}} \tilde{D}_{\bullet i}^{(k)} + \delta_i^{(k)}\, \lambda_d^{(k+1)\,\mathsf{T}} d_0^{(k+1)},\; 0 \Big) - \lambda_{d,i}^{(k)}\, \frac{\partial f(\omega_i^{(k)})}{\partial \omega}. \tag{10}
$$
The Hessian is simply
$$
\nabla_{\omega}^2 \mathcal{L}_{d,i}^{(k)} = -\lambda_{d,i}^{(k)}\, \frac{\partial^2 f(\omega_i^{(k)})}{\partial \omega^2}. \tag{11}
$$
2.2.1. Vector-Valued Case and Complete Element Lagrangian
For multiple control inputs and state variables, the primal variables ω_l at each collocation point become a vector in ℝ^{1×(n_u+n_x)}. Consistently, we define the gradient of a scalar function with respect to ω as a row vector. Thus, the model Jacobian ∂f/∂ω is a matrix in ℝ^{n_x×(n_u+n_x)}, and the Hessian of each model-function element s, ∂²f_s/∂ω², is a square matrix of size (n_u+n_x). The multiplier λ_{d,i}^{(k)} itself also becomes a vector in ℝ^{n_x}. All terms involving f, its Jacobian, or its Hessian therefore turn into sums.
The full element Lagrangian ℒ_i^{(k)} consists of the terms of the dynamic constraints ℒ_{d,i}^{(k)} as derived above, plus the contributions of the objective, the integral inequalities, and the path constraints,
$$
\begin{aligned}
\mathcal{L}_i^{(k)} = {}& W_l\, L(\omega_l) + x_i^{(k)} \bigg( \sum_{j=1}^{N_k} \tilde{D}_{ji}^{(k)}\, \lambda_{d,j}^{(k)} + \delta_i^{(k)} \sum_{j=1}^{N_{k+1}} d_{0,j}^{(k+1)}\, \lambda_{d,j}^{(k+1)} \bigg) \\
& - f\big(\omega_i^{(k)}\big)\, \lambda_{d,i}^{(k)} + W_l\, \lambda_g^{\mathsf{T}}\, g(\omega_l) + \lambda_{c,l}^{\mathsf{T}}\, c_l(\omega_l).
\end{aligned} \tag{12}
$$
The Lagrangian of the full NLP is obtained by summing these element Lagrangians, which are separated in the primal variables. Its Hessian is thus a perfectly block-diagonal matrix with uniformly sized square blocks of dimension (n_u+n_x).
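The resulting structure can be visualised with a small NumPy sketch; the random symmetric blocks merely stand in for the element Hessians (11)/(12), and all sizes are hypothetical.

```python
import numpy as np

M, nx, nu = 5, 2, 1      # hypothetical: 5 collocation points, 2 states, 1 input
b = nx + nu              # uniform dimension of each diagonal block

rng = np.random.default_rng(0)
H = np.zeros((M * b, M * b))  # dense here only for illustration
for l in range(M):
    A = rng.standard_normal((b, b))
    H[l * b:(l + 1) * b, l * b:(l + 1) * b] = (A + A.T) / 2  # symmetric block

# Every entry outside the diagonal blocks is structurally zero, so a sparse
# linear solver never sees fill-in originating from the Hessian itself.
```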
2.2.2. Extension to Gauss and Lobatto Collocation
Gauss collocation does not include the right interval boundary. Thus, the terms involving d0 can be included locally in each interval, which simplifies the separation in the primal variables. However, the continuity constraint
$$
x_0^{(k+1)} = x_0^{(k)} + w^{(k)\,\mathsf{T}}\, F\big(X^{(k)}, U^{(k)}\big) \tag{13}
$$
has to be introduced for each interval. Similarly to the procedure above, this constraint can be separated. The quadrature weights w(k) are stacked in W without any modification.
Lobatto collocation includes both boundaries as collocation points. Thus, the matrix D in (1) and (2) has an additional "zeroth" row, and the argument of F in (2) becomes [x_0^T, X^T]^T. The additional term
$$
-\delta_i^{(k)}\, \lambda_{d,0}^{(k+1)}\, f\big(\omega_i^{(k)}\big) \tag{14}
$$
arises in ℒ_{d,i}^{(k)}. Each element of W corresponding to the interval boundary between any two neighbouring intervals k and k+1 is the sum of the two weights w_{N_k}^{(k)} and w_0^{(k+1)}.
2.3. Block-Diagonal Approximation of the Hessian
The separability of the problem with respect to the primal variables allows a perfect exploitation of the problem sparsity. In fact, the Jacobian of the objective and the constraints, as well as the Hessian of the Lagrangian, can be constructed from the first and second partial derivatives of the nonlinear model functions L, f, g, and c at each discretisation point [13].
We propose to also exploit the separability when calculating QN approximations of the Hessian. These iterative updates collect information about the curvature of the Lagrangian by observing the change of its gradient along the NLP iterations. Although they perform well in practice, they exhibit several drawbacks for large problems.
(I) Loss of sparsity. QN approximations generally do not preserve the sparsity pattern of the exact Hessian, which leads to low computational performance [12, Section 4.13]. Enforcing the correct sparsity pattern results in QN schemes with poor performance [14, Section 7.3].
(II) Storage versus accuracy. Due to the loss of sparsity, the approximated Hessian needs to be stored in dense format. To resolve possible memory limitations, "limited-memory" updates can be applied, which rely on a few recent gradient samples only. However, these methods provide less accuracy than their full-memory equivalents [14, Section 7.2].
(III) Dimensionality versus sampling. When sampling the gradient of a function that lives in a high-dimensional space, many samples are required to construct an accurate approximation. In fact, to obtain an approximation that is valid along any direction, an entire spanning set needs to be sampled. Although QN methods require accurate second-order information only along the direction of the steps [15], the step direction may change fast in highly nonlinear problems such as the one considered here. In these cases, an exhaustive set of gradient samples would be required to ensure fast convergence, which conflicts with (II).
Using approximations of the small diagonal blocks, that is, exploiting the separability illustrated in Section 2.2, resolves these problems:
(I) The exact sparsity pattern of the Hessian is preserved.
(II) Only M(n_x+n_u)² numbers have to be stored, compared to M²(n_x+n_u)² for the full Hessian.
(III) Since the dimension of each diagonal block is small, a good approximation is already obtained after few iterations of the Hessian update [3, 4].
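The storage counts can be checked directly for test case (b) of Section 3, which has M = 217 collocation points and n_x + n_u = 9 variables per point; the arithmetic below is illustrative only.

```python
# Dense storage for the Hessian approximation, test case (b) from Section 3:
# M = 217 collocation points, b = nx + nu = 9 primal variables per point.
M, b = 217, 9

partitioned = M * b ** 2    # one dense (b x b) block per collocation point
full = (M * b) ** 2         # dense approximation of the full Hessian

ratio = full / partitioned  # equals M: the memory saving grows with M
```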
The partitioned QN update can be combined with the exploitation of the problem sparsity to reduce the number of the model evaluations required. In fact, when these two concepts are combined, the gradients of the model functions at each collocation point are sufficient to construct an accurate and sparsity-preserving approximation of the Hessian of the Lagrangian.
2.4. Implementation
Any QN approximation operates with the differences between two consecutive iterates and the corresponding gradients of the Lagrangian. For constrained problems,
$$
s^{\mathsf{T}} := \hat{\omega} - \omega, \tag{15}
$$
$$
y^{\mathsf{T}} := \nabla_{\omega} \mathcal{L}(\hat{\omega}, \hat{\lambda}) - \nabla_{\omega} \mathcal{L}(\omega, \hat{\lambda}). \tag{16}
$$
The hat indicates the values at the current iteration, that is, the new data. In the following formulas, B denotes the QN approximation. Here, the damped BFGS update is used [14, Section 18.3], which reads
$$
\theta = \begin{cases} 1, & \text{if } s^{\mathsf{T}} y \ge 0.2\, s^{\mathsf{T}} B s, \\[4pt] \dfrac{0.8\, s^{\mathsf{T}} B s}{s^{\mathsf{T}} B s - s^{\mathsf{T}} y}, & \text{otherwise,} \end{cases} \tag{17a}
$$
$$
r = \theta\, y + (1 - \theta)\, B s, \tag{17b}
$$
$$
\hat{B} = B - \frac{B s\, s^{\mathsf{T}} B}{s^{\mathsf{T}} B s} + \frac{r\, r^{\mathsf{T}}}{s^{\mathsf{T}} r}. \tag{17c}
$$
This update scheme preserves positive definiteness, which is mandatory if a line-search globalisation is used. In a trust-region framework, indefinite approaches such as the safeguarded SR1 update [14, Section 6.2] could be advantageous since they can approximate the generally indefinite Hessian of the full or element Lagrangian more accurately.
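A direct transcription of (17a)–(17c) into NumPy reads as follows; this is a sketch under the assumption that B is symmetric positive definite and s is nonzero, and the function and variable names are our own.

```python
import numpy as np

def damped_bfgs_update(B, s, y):
    """One damped BFGS update (17a)-(17c); preserves positive definiteness."""
    Bs = B @ s
    sBs = s @ Bs              # s^T B s  (> 0 for s != 0 and B pos. definite)
    sy = s @ y                # s^T y
    # (17a): Powell damping factor
    theta = 1.0 if sy >= 0.2 * sBs else 0.8 * sBs / (sBs - sy)
    # (17b): damped gradient difference
    r = theta * y + (1.0 - theta) * Bs
    # (17c): standard BFGS formula with r in place of y
    return B - np.outer(Bs, Bs) / sBs + np.outer(r, r) / (s @ r)
```

With θ = 1 the formula reduces to the standard BFGS update; the damping branch keeps s^T r positive even when the curvature condition s^T y > 0 fails.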
The Hessian block B_l corresponding to the element Lagrangian (12) at collocation point l is approximated as follows. In the difference of the gradients, all linear terms cancel. Thus, (16) becomes
$$
\begin{aligned}
y_l^{\mathsf{T}} = {}& \nabla_{\omega} \mathcal{L}_l(\hat{\omega}, \hat{\lambda}) - \nabla_{\omega} \mathcal{L}_l(\omega, \hat{\lambda}) \\
= {}& W_l \left( \frac{\partial L(\hat{\omega}_l)}{\partial \omega} - \frac{\partial L(\omega_l)}{\partial \omega} \right) + \lambda_{d,l}^{\mathsf{T}} \left( \frac{\partial f(\omega_l)}{\partial \omega} - \frac{\partial f(\hat{\omega}_l)}{\partial \omega} \right) \\
& + W_l\, \lambda_g^{\mathsf{T}} \left( \frac{\partial g(\hat{\omega}_l)}{\partial \omega} - \frac{\partial g(\omega_l)}{\partial \omega} \right) + \lambda_{c,l}^{\mathsf{T}} \left( \frac{\partial c_l(\hat{\omega}_l)}{\partial \omega} - \frac{\partial c_l(\omega_l)}{\partial \omega} \right).
\end{aligned} \tag{18}
$$
Recall that the linear index l is defined such that λ_{d,l} = λ_{d,i}^{(k)}. The QN update (17a)–(17c) is applied to each diagonal block B_l individually, with s_l^T = ω̂_l − ω_l and y_l given by (18). As initialisation, one of the approaches described in [14, Chapter 6] can be used.
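The block-wise procedure can be sketched as follows; this is again a hypothetical, self-contained fragment in which the per-block damped update (17a)–(17c) is inlined and the primal variables are assumed to be ordered point by point, so that block l owns the slice [l·b, (l+1)·b).

```python
import numpy as np

def partitioned_update(blocks, s, y_blocks):
    """Apply the damped BFGS update (17a)-(17c) to each diagonal block B_l.

    blocks:   list of M symmetric positive-definite (b x b) matrices
    s:        full primal step, variables ordered point by point
    y_blocks: per-point gradient differences y_l according to (18)
    """
    b = blocks[0].shape[0]
    updated = []
    for l, (B, y) in enumerate(zip(blocks, y_blocks)):
        sl = s[l * b:(l + 1) * b]     # s_l = omega_hat_l - omega_l
        Bs = B @ sl
        sBs, sy = sl @ Bs, sl @ y
        theta = 1.0 if sy >= 0.2 * sBs else 0.8 * sBs / (sBs - sy)
        r = theta * y + (1.0 - theta) * Bs
        updated.append(B - np.outer(Bs, Bs) / sBs + np.outer(r, r) / (sl @ r))
    return updated
```

Each block update costs only O(b³) with b = n_x + n_u, so the total effort grows linearly in the number of collocation points M.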
2.5. Engineering Test Problem
As a real-world engineering problem, we consider the minimisation of the fuel consumption of a turbocharged diesel engine. The statutory limits for the cumulative NOx and soot emissions are imposed as n_g = 2 integral inequality constraints. The n_u = 4 control inputs are the position of the variable-geometry turbine, the start of the main injection, the common-rail pressure, and the fuel mass injected per cylinder and combustion cycle. The model is described and its experimental validation is provided in [16]. It features a dynamic mean-value model for the air path with n_x = 5 state variables, and physics-based models for the combustion and for the pollutant emissions. The resulting OCP is stated in [17, 18]. The desired load torque, the bounds on the actuator ranges, and mechanical and thermal limits are imposed as nonlinear and linear path constraints (3d).
The model evaluations are expensive. Therefore, QN updates are preferable over exact Newton methods to achieve a fast solution process. The main drawback is the slow local convergence rate of QN methods when applied to the large NLPs resulting from the consideration of long time horizons in the OCP [18].
3. Results and Discussion
The results presented here are generated using the NLP solver IPOPT [19, 20] with the linear solver MUMPS [21] and the fill-reducing preordering implemented in METIS [22]. Either the exact Hessian, calculated by central finite differences on the model functions, or one of the two QN updates just described, full or partitioned, is supplied to the solver as a user-defined Hessian. In all cases, the first derivatives are calculated by forward finite differences.
Radau collocation at flipped Legendre nodes is applied. These collocation points are the roots of the orthogonal Legendre polynomials and have to be computed numerically [23, Section 2.3]. The resulting scheme, sometimes termed Radau IIA, exhibits a combination of advantageous properties [24], [25, Section 3.5].
Two test cases for the engineering problem outlined in Section 2.5 are considered. Case (a) is a mild driving pattern of 6 s duration which is discretised by first-order collocation with a uniform step length of 0.5 s. This discretisation results in M=13 total collocation points and 117 NLP variables. Test case (b) considers a more demanding driving pattern of 58 s duration and uses third-order collocation with a step size of 0.8 s, resulting in M=217 collocation points and 1,953 NLP variables.
The performance is assessed in terms of the number of iterations required to achieve the requested tolerance. Figure 1 shows the convergence behaviour of the exact-Newton method and of the two QN methods. Starting with iteration 5, the full Newton step is always accepted. Thus, the difference between the local quadratic convergence of the exact Newton method and the local superlinear convergence of the full BFGS update becomes obvious. The partitioned update performs substantially better than the full update. Moreover, the advantage becomes more pronounced when the size (longer time horizon) and the complexity (more transient driving profile, higher-order collocation) of the problem are increased from the simple test case (a) to the more meaningful case (b).
Figure 1: Convergence behaviour of IPOPT on the two test cases.
The Hessian approximation is initialised by a multiple of the identity matrix, B_0 = βI. A factor of β = 0.05 is found to be a good choice for the problem at hand. Table 1 shows the number of iterations required as β is changed. The partitioned update is robust against a poor initialisation, whereas the full update requires a significant number of iterations to recover. This finding confirms that an accurate approximation is obtained in fewer iterations when the partitioned QN update is applied.
Table 1: Effect of the initialisation of the Hessian approximation on the number of NLP iterations until convergence, test case (a).

Method            | β = 0.01 | β = 0.05 | β = 0.1
------------------|----------|----------|---------
Full BFGS         |    36    |    28    |    38
Partitioned BFGS  |    22    |    20    |    21
4. Conclusion
We illustrated the separability of the nonlinear program resulting from the application of direct collocation to an optimal control problem. Subsequently, we presented how this structure can be exploited to apply a partitioned quasi-Newton update to the Hessian of the Lagrangian. This sparsity-preserving update yields a more accurate approximation of the Hessian in fewer iterations and thus increases the convergence rate of the NLP solver.
A more accurate approximation of the second derivatives from first-order information is especially beneficial for highly nonlinear problems for which the exact second derivatives are expensive to evaluate. In fact, for the real-world engineering problem used as a test case here, symbolic or algorithmic differentiation is not an expedient option due to the complexity and the structure of the model. In this situation, using a quasi-Newton approximation based on first derivatives calculated by finite differences is a valuable alternative. The numerical tests presented in this paper indicate that a convergence rate close to that of an exact Newton method can be reclaimed by the application of a partitioned BFGS update.
A self-contained implementation of the partitioned update in the framework of an NLP solver itself could fully exploit the advantages of the method proposed. Furthermore, it should be assessed whether a trust-region globalisation is able to take advantage of an indefinite but possibly more accurate quasi-Newton approximation of the diagonal blocks of the Hessian of the Lagrangian.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was partially funded by the Swiss Innovation Promotion Agency CTI under Grant no. 10808.1 PFIW-IW.
References
[1] A. Griewank and Ph. L. Toint, "On the unconstrained optimization of partially separable functions," in Nonlinear Optimization 1981, M. J. D. Powell, Ed., Academic Press, 1982.
[2] A. Griewank and Ph. L. Toint, "Local convergence analysis for partitioned quasi-Newton updates," Numerische Mathematik, vol. 39, 1982.
[3] D. C. Liu and J. Nocedal, "On the limited memory BFGS method for large scale optimization," Mathematical Programming, vol. 45, 1989.
[4] J. Nocedal, "Large scale unconstrained optimization."
[5] L. T. Biegler, "Solution of dynamic optimization problems by successive quadratic programming and orthogonal collocation," Computers & Chemical Engineering, vol. 8, 1984.
[6] D. Tieu, W. R. Cluett, and A. Penlidis, "A comparison of collocation methods for solving dynamic optimization problems."
[7] A. L. Herman and B. A. Conway, "Direct optimization using collocation based on high-order Gauss-Lobatto quadrature rules," Journal of Guidance, Control, and Dynamics, vol. 19, 1996.
[8] D. A. Benson, G. T. Huntington, T. P. Thorvaldsen, and A. V. Rao, "Direct trajectory optimization and costate estimation via an orthogonal collocation method," Journal of Guidance, Control, and Dynamics, vol. 29, 2006.
[9] S. Kameswaran and L. T. Biegler, "Convergence rates for direct transcription of optimal control problems using collocation at Radau points," Computational Optimization and Applications, vol. 41, 2008.
[10] D. Garg, M. Patterson, W. W. Hager, A. V. Rao, D. A. Benson, and G. T. Huntington, "A unified framework for the numerical solution of optimal control problems using pseudospectral methods," Automatica, vol. 46, 2010.
[11] J.-P. Berrut and L. N. Trefethen, "Barycentric Lagrange interpolation," SIAM Review, vol. 46, 2004.
[12] J. T. Betts, Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, SIAM, 2010.
[13] M. A. Patterson and A. V. Rao, "Exploiting sparsity in direct collocation pseudospectral methods for solving optimal control problems," Journal of Spacecraft and Rockets, vol. 49, 2012.
[14] J. Nocedal and S. J. Wright, Numerical Optimization, Springer, 2006.
[15] P. E. Gill and E. Wong, "Sequential quadratic programming methods," in Mixed Integer Nonlinear Programming, J. Lee and S. Leyffer, Eds., Springer, 2012.
[16] J. Asprion, O. Chinellato, and L. Guzzella, "Optimisation-oriented modelling of the NOx emissions of a diesel engine."
[17] J. Asprion, C. H. Onder, and L. Guzzella, "Including drag phases in numerical optimal control of diesel engines," in Proceedings of the 7th IFAC Symposium on Advances in Automotive Control, Tokyo, Japan, 2013, pp. 489–494, doi: 10.3182/20130904-4-JP-2042.00130.
[18] J. Asprion, O. Chinellato, and L. Guzzella, "Optimal control of diesel engines: numerical methods, applications, and experimental validation."
[19] A. Wächter and L. T. Biegler, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming," Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006.
[20] IPOPT homepage, 2013, http://www.coin-or.org/Ipopt
[21] MUMPS homepage, 2013, http://mumps.enseeiht.fr
[22] METIS—serial graph partitioning and fill-reducing matrix ordering, 2013, http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
[23] C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang, Spectral Methods: Fundamentals in Single Domains, Springer, 2006.
[24] E. Hairer and G. Wanner, "Stiff differential equations solved by Radau methods," Journal of Computational and Applied Mathematics, vol. 111, 1999.
[25] J. C. Butcher, Numerical Methods for Ordinary Differential Equations, Wiley, 2008.