Partitioned Quasi-Newton Approximation for Direct Collocation Methods and Its Application to the Fuel-Optimal Control of a Diesel Engine

The numerical solution of optimal control problems by direct collocation is a widely used approach. Quasi-Newton approximations of the Hessian of the Lagrangian of the resulting nonlinear program are also common practice. We illustrate that the transcribed problem is separable with respect to the primal variables and propose the application of dense quasi-Newton updates to the small diagonal blocks of the Hessian. This approach resolves memory limitations, preserves the correct sparsity pattern, and generates more accurate curvature information. The effectiveness of this improvement when applied to engineering problems is demonstrated. As an example, the fuel-optimal and emission-constrained control of a turbocharged diesel engine is considered. First results indicate a significantly faster convergence of the nonlinear program solver when the method proposed is used instead of the standard quasi-Newton approximation.


Introduction
Quasi-Newton (QN) methods have become very popular in the context of nonlinear optimisation. Above all, in nonlinear programs (NLPs) arising from direct transcription of optimal control problems (OCPs), the Hessian of the Lagrangian often cannot be derived analytically in a convenient way. Algorithmic differentiation may fail due to unsupported operations or black-box parts in the model functions. Furthermore, both approaches are computationally expensive if the model functions are complex and yield long expressions for the second derivatives. On the other hand, numerical approximation by finite differences is inaccurate and hardly improves the computational performance.
A common approach in these cases is to approximate the Hessian by QN updates using gradient information collected during the NLP iterations. However, several limitations arise when these updates are applied to real-world OCPs. These problems often exhibit large dimensions; thus, only limited-memory versions of the approximations are applicable. Since many updates may be necessary until a good approximation of the full Hessian is obtained, the approximation remains poor when only the most recent steps are used. Furthermore, the favourable sparsity structure of the underlying discretisation scheme is generally not preserved. This fill-in drastically reduces the performance of the solvers for the linear system of equations that defines the step direction during each NLP iteration.
Partial separability, a concept introduced by [1], describes a structural property of a nonlinear optimisation problem. When present, this property allows for partitioned QN updates of the Hessian of the Lagrangian (or of the objective, in the unconstrained case). For unconstrained optimisation, this approach was proposed and its convergence properties were analysed in [2]. Although only superlinear local convergence can be proven, a performance close to that obtained with exact Newton methods, which exhibit quadratic local convergence, was observed in practical experiments [3,4].
This brief paper presents how partitioned QN updates can be applied to the NLPs resulting from direct collocation of OCPs. The concept of direct collocation to solve OCPs has been widely used and analysed [5][6][7][8][9]. We will show that the Lagrangian of the resulting NLPs is separable in the primal variables at each discretisation point. Due to this separability, its Hessian can be approximated by full-memory QN updates of the small diagonal blocks. This procedure increases the accuracy of the approximation, preserves the sparsity structure, and resolves memory limitations. The results are first derived for Radau collocation schemes, which include the right interval boundary as a collocation point. The adaptations to Gauss schemes, which have internal collocation points only, and to Lobatto schemes, which include both boundaries, are provided thereafter in condensed form. A consistent description of all three families of collocation is provided in [10].
The partitioned update is applied to a real-world engineering problem. The fuel consumption of a turbocharged diesel engine is minimised while limits on the cumulative pollutant emissions need to be satisfied. This problem is cast in the form of an OCP and is transcribed by Radau collocation. The resulting NLP is solved by an exact and a quasi-Newton method. For the latter, the partitioned update achieves an increased convergence rate and a higher robustness with respect to a poor initialisation of the approximation as compared to the full QN update. Therefore, the findings for the unconstrained case seem to transfer to the NLPs resulting from direct collocation of OCPs.

Material and Methods
Consider the system of nonautonomous ordinary differential equations (ODEs) ẋ = f(x, u) on the interval [0, T], with x, f ∈ R^(1×n_x) and u ∈ R^(1×n_u). Radau collocation represents each element of the state vector x(t) as a polynomial, say of degree σ. The time derivative of this polynomial is then equated to the values of f at the σ collocation points t_1, ..., t_σ,

    ẋ(t_j) = f(x(t_j), u(t_j)),   j = 1, ..., σ. (1)

The notation x_j := x(t_j) is adopted. The left boundary t_0 is a noncollocated point in Radau collocation schemes. By introducing the appropriate matrices, this matrix equation in R^(σ×n_x) can be written in short as

    D [x_0; X] = F(X, U). (2)

The rows of X and F correspond to one collocation point each. In turn, the columns of X and F represent one state variable and its corresponding ODE right-hand side at all collocation points. In the following, consider the notation in (2) as shorthand for stacking the transpose of the rows in one large column vector. The step length h = t_σ − t_0 of the interval is assumed to be accounted for in the differentiation matrix D. Lagrange interpolation by barycentric weights is used to calculate D along with the vector of the quadrature weights w [11]. The latter may be used to approximate the definite integral of a function φ(t) as ∫ φ(t) dt ≈ w^T [φ(t_1), ..., φ(t_σ)]^T.

The OCP is considered in the form

    min ∫_0^T φ(x, u) dt (3a)
    s.t. ẋ = f(x, u), (3b)
         ∫_0^T g(x, u) dt ≤ 0, (3c)
         c(x, u) ≤ 0, (3d)

where φ ∈ R, g ∈ R^(1×n_g), and c ∈ R^(1×n_c). Simple bounds on x and u, equality constraints, and a fixed initial or end state can be included in the path constraints (3d). An objective term in Lagrange form is used, which is preferable over an equivalent Mayer term [12, Section 4.9]. Direct transcription discretises all functions and integrals by consistently applying an integration scheme. Here, k = 1, ..., K integration intervals [t_(k−1), t_k] are used, with 0 = t_0 < t_1 < ⋅⋅⋅ < t_K = T. The number of collocation points σ_k can be different for each interval. Summing up the collocation points throughout all integration intervals results in a total of N = Σ_k σ_k discretisation points. The "linear index" i thereby corresponds to collocation node j in interval k,

    i = σ_1 + ⋅⋅⋅ + σ_(k−1) + j. (4)

The transcribed OCP reads

    min W^T Φ(X_•, U_•) (5a)
    s.t. D^(k) [X^(k−1)_(σ_(k−1),•); X^(k)] = F(X^(k), U^(k)),   k = 1, ..., K, (5b)
         W^T G(X_•, U_•) ≤ 0, (5c)
         c(x_i, u_i) ≤ 0,   i = 1, ..., N. (5d)

The notation v_• denotes all instances of variable v_i at any applicable index i. The vector of the "global" quadrature weights W results from stacking the vectors of the quadrature weights w^(k) of each interval k after removing the first element, which is zero. For the first interval (k = 1), X^(k−1)_(σ_(k−1),•) is the initial state x(0).
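To make the construction of the differentiation matrix concrete, the following sketch builds a square differentiation matrix over all nodes (including the noncollocated left boundary) from barycentric weights; the helper name diff_matrix and the node values are illustrative only, and the actual collocation matrix D keeps only the rows belonging to collocation points. The interpolating polynomial of any function of degree at most the number of nodes minus one is differentiated exactly.

```python
import numpy as np

def diff_matrix(nodes):
    """Differentiation matrix for Lagrange interpolation at the given
    nodes, built from barycentric weights (illustrative helper; the
    paper cites [11] for this construction)."""
    n = len(nodes)
    # Barycentric weights: w_j = 1 / prod_{k != j} (t_j - t_k)
    w = np.ones(n)
    for j in range(n):
        for k in range(n):
            if k != j:
                w[j] /= nodes[j] - nodes[k]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                D[i, j] = (w[j] / w[i]) / (nodes[i] - nodes[j])
        D[i, i] = -np.sum(D[i, :])   # rows sum to zero: derivative of a constant is 0
    return D

# t0 = 0 plus the three interior/right nodes of a 3-stage Radau IIA scheme
t = np.array([0.0, 0.155051, 0.644949, 1.0])
D = diff_matrix(t)
# The cubic p(t) = t^3 is reproduced exactly by 4-point interpolation,
# so its derivative 3 t^2 is recovered exactly at the nodes:
print(np.allclose(D @ t**3, 3 * t**2))  # True
```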
The Lagrangian of the NLP (5a)-(5d) is the sum of the objective (5a) and all constraints (5b), (5c), and (5d), which are weighted by the Lagrange multipliers. To clarify the notation, the Lagrange multipliers are grouped according to the problem structure. The σ_k · n_x multipliers for the discretised dynamic constraints on each integration interval k are denoted by λ^(k), the n_g multipliers for the integral inequalities are stacked in μ, and the n_c multipliers for the path constraints at each discretisation point i are gathered in the vector ν_i.

Separability in the Primal Variables.
The objective (5a), the integral inequalities (5c), and the path constraints (5d) are inherently separated with respect to time; that is, the individual terms are pairwise disjoint in x_i and u_i. We thus focus on the separability of the dynamic constraints (5b). For the derivation, we assume that x, u, and f are scalar. The extension to the vector-valued case is straightforward and will be provided subsequently.
Consider the term of the Lagrangian representing the discretised dynamic constraints (5b) for interval k,

    L^(k) = λ^(k)T (D [x_0; X^(k)] − F(X^(k), U^(k)))
          = Σ_(j=1..σ_k) λ^(k)_j (Σ_(i=0..σ_k) d_(j,i) x_i − f(x_j, u_j)). (6)

This formulation constitutes a separation in the dual variables (the Lagrange multipliers). By collecting terms at each collocation point and accounting for the x_0 terms in the previous interval, we obtain a separation in the primal variables,

    L^(k) = Σ_(i=1..σ_k) (d_(•,i)^T λ^(k) x_i − λ^(k)_i f(x_i, u_i)) + d_(•,0)^T λ^(k) x_0, (7)

with d_(•,i) the ith column of D. Each term inside the round brackets in (7) is a collocation-point-separated part of the Lagrangian which stems from the dynamic constraints. We denote these terms by L^(k)_i and introduce the notation ω_i := (x_i, u_i) for the primal variables at collocation point i. The gradient with respect to the primal variables is

    ∇_(ω_i) L^(k)_i = [d_(•,i)^T λ^(k), 0] − λ^(k)_i ∂f/∂ω(ω_i). (10)

Since the first term is linear in the primal variables, the Hessian is simply

    ∇²_(ω_i) L^(k)_i = −λ^(k)_i ∂²f/∂ω²(ω_i). (11)
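The block structure just derived can be checked numerically. The sketch below is illustrative only: the model function f, the stand-in differentiation matrix, and the multiplier values are invented for the test. It assembles the dynamic-constraint term of the Lagrangian for a scalar state and control at two collocation points (ignoring the x_0 term for brevity) and compares a finite-difference Hessian with the blocks −λ_j ∂²f/∂ω² from (11).

```python
import numpy as np

# Toy model function and its exact Hessian with respect to (x, u)
f = lambda x, u: np.sin(x) * u + 0.5 * u**2
def f_hess(x, u):
    return np.array([[-np.sin(x) * u, np.cos(x)],
                     [np.cos(x), 1.0]])

D = np.array([[1.5, -0.5], [2.0, -1.0]])   # stand-in differentiation matrix
lam = np.array([0.7, -1.3])                # multipliers of the two collocation equations

def L(z):
    x, u = z[0::2], z[1::2]                # z = (x1, u1, x2, u2)
    return lam @ (D @ x - f(x, u))         # dynamic-constraint Lagrangian term

# Finite-difference Hessian of the Lagrangian term
n, h = 4, 1e-5
z0 = np.array([0.3, -0.2, 1.1, 0.4])
H = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        e_i, e_j = np.eye(n)[i] * h, np.eye(n)[j] * h
        H[i, j] = (L(z0 + e_i + e_j) - L(z0 + e_i) - L(z0 + e_j) + L(z0)) / h**2

# The linear D-term carries no curvature: H is block-diagonal with
# one block -lam_j * f_hess per collocation point.
H_expected = np.zeros((4, 4))
H_expected[:2, :2] = -lam[0] * f_hess(0.3, -0.2)
H_expected[2:, 2:] = -lam[1] * f_hess(1.1, 0.4)
print(np.allclose(H, H_expected, atol=1e-3))  # True
```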

Vector-Valued Case and Complete Element Lagrangian.
For multiple control inputs and state variables, the primal variables ω_i at each collocation point become a vector in R^(1×(n_x+n_u)). Consistently, we define the gradient of a scalar function with respect to ω as a row vector. Thus, the model Jacobian ∂f/∂ω is a matrix in R^(n_x×(n_x+n_u)), and the Hessian of each model-function element f_l, ∂²f_l/∂ω², is a square matrix of size (n_x + n_u). The multiplier λ^(k)_i itself also becomes a vector in R^(n_x). All terms involving f, its Jacobian, or its Hessian therefore turn into sums over the n_x elements.
The full element Lagrangian L^e_i consists of the terms of the dynamic constraints L^(k)_i, as derived above, plus the contributions of the objective, the integral inequalities, and the path constraints,

    L^e_i = W_i φ(ω_i) + L^(k)_i + W_i g(ω_i) μ + c(ω_i) ν_i, (12)

where W_i is the global quadrature weight at discretisation point i. The Lagrangian of the full NLP is obtained by summing these element Lagrangians, which are separated in the primal variables. Its Hessian thus is a perfect block-diagonal matrix with uniformly sized square blocks of size (n_x + n_u).

Extension to Gauss and Lobatto Collocation.
Gauss collocation does not include the right interval boundary. Thus, the terms involving d_(•,0) can be included locally in each interval, which simplifies the separation in the primal variables. However, the continuity constraint has to be introduced for each interval. Similarly to the procedure above, this constraint can be separated. The quadrature weights w^(k) are stacked in W without any modification. Lobatto collocation includes both boundaries as collocation points. Thus, the matrix D in (1) and (2) has an additional "zeroth" row, and the argument of F becomes [x_0, X^T]^T in (2). The additional term −λ^(k+1)T d^(k+1)_(•,0) x_(σ_k) arises in L^(k)_(σ_k). Each element of W corresponding to the interval boundary between any two neighbouring intervals k and k+1 is the sum of the two weights w^(k)_(σ_k) and w^(k+1)_0.

Block-Diagonal Approximation of the Hessian.
The separability of the problem with respect to the primal variables allows a perfect exploitation of the problem sparsity. In fact, the Jacobian of the objective and the constraints, as well as the Hessian of the Lagrangian, can be constructed from the first and second partial derivatives of the nonlinear model functions φ, f, g, and c at each discretisation point [13].
We propose to also exploit the separability when calculating QN approximations of the Hessian. These iterative updates collect information about the curvature of the Lagrangian by observing the change of its gradient along the NLP iterations. Although they perform well in practice, they exhibit several drawbacks for large problems.
(I) Loss of sparsity. QN approximations generally do not preserve the sparsity pattern of the exact Hessian, which leads to low computational performance [12, Section 4.13]. Enforcing the correct sparsity pattern results in QN schemes with poor performance [14, Section 7.3].
(II) Storage versus accuracy. Due to the loss of sparsity, the approximated Hessian needs to be stored in dense format. To resolve possible memory limitations, "limited-memory" updates can be applied, which rely on a few recent gradient samples only. However, these methods provide less accuracy than their full-memory equivalents [14, Section 7.2].
(III) Dimensionality versus sampling. When sampling the gradient of a function that lives in a high-dimensional space, many samples are required to construct an accurate approximation. In fact, to obtain an approximation that is valid along any direction, an entire spanning set needs to be sampled. Although QN methods require accurate second-order information only along the direction of the steps [15], the step direction may change fast in highly nonlinear problems such as the one considered here. In these cases, an exhaustive set of gradient samples would ensure fast convergence, which conflicts with (II).
Using approximations of the small diagonal blocks, that is, exploiting the separability illustrated in Section 2.2, resolves these problems.
(I) The exact sparsity pattern of the Hessian is preserved.
(II) Only N (n_x + n_u)² numbers have to be stored, compared to N² (n_x + n_u)² for the full Hessian.
(III) Since the dimension of each diagonal block is small, a good approximation is already obtained after few iterations of the Hessian update [3,4].
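As a rough storage comparison (ignoring symmetry savings), the following counts use the engine-problem dimensions n_x = 5 and n_u = 4 and the N = 217 collocation points of the larger test case reported later:

```python
# Dense storage: partitioned (block-diagonal) versus full QN approximation
N, n_x, n_u = 217, 5, 4
block = (n_x + n_u) ** 2          # one dense diagonal block
partitioned = N * block           # all N blocks of the partitioned update
full = (N * (n_x + n_u)) ** 2     # dense full-Hessian approximation
print(block, partitioned, full)   # 81 17577 3814209
```

The partitioned update thus stores roughly a factor N fewer numbers than the dense full approximation.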
The partitioned QN update can be combined with the exploitation of the problem sparsity to reduce the number of model evaluations required. In fact, when these two concepts are combined, the gradients of the model functions at each collocation point are sufficient to construct an accurate and sparsity-preserving approximation of the Hessian of the Lagrangian.

Implementation.
Any QN approximation operates with the differences between two consecutive iterates and the corresponding gradients of the Lagrangian. For constrained problems,

    s = ω̂ − ω, (15)
    y = ∇_ω L(ω̂, λ̂) − ∇_ω L(ω, λ̂). (16)

The hat indicates the values at the current iteration, that is, the new data. In the following formulas, B denotes the QN approximation. Here, the damped BFGS update is used [14, Section 18.3], which reads

    θ = 1   if s^T y ≥ 0.2 s^T B s,
    θ = 0.8 s^T B s / (s^T B s − s^T y)   otherwise, (17a)
    r = θ y + (1 − θ) B s, (17b)
    B ← B − (B s s^T B) / (s^T B s) + (r r^T) / (s^T r). (17c)

This update scheme preserves positive definiteness, which is mandatory if a line-search globalisation is used. In a trust-region framework, indefinite approaches such as the safeguarded SR1 update [14, Section 6.2] could be advantageous since they can approximate the generally indefinite Hessian of the full or element Lagrangian more accurately.

The Hessian block B_i corresponding to the element Lagrangian (12) at collocation point i is approximated as follows. In the difference of the gradients, all linear terms cancel. Thus, (16) becomes

    y_i = W_i (∇φ(ω̂_i) − ∇φ(ω_i))^T − Σ_l λ̂_(i,l) (∇f_l(ω̂_i) − ∇f_l(ω_i))^T
          + W_i Σ_m μ̂_m (∇g_m(ω̂_i) − ∇g_m(ω_i))^T + Σ_n ν̂_(i,n) (∇c_n(ω̂_i) − ∇c_n(ω_i))^T. (18)

Recall that the linear index i is defined such that λ_i = λ^(k)_j. The QN update (17a)-(17c) is applied to each diagonal block B_i individually, with s_i = ω̂_i − ω_i and y_i given by (18). As initialisation, one of the approaches described in [14, Chapter 6] can be used.
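A minimal sketch of the damped BFGS update applied to one diagonal block follows; the helper name damped_bfgs_update and the random test data are illustrative only, and the formulas are those of [14, Section 18.3].

```python
import numpy as np

def damped_bfgs_update(B, s, y):
    """Damped BFGS update: keeps B positive definite even when the
    curvature condition s^T y > 0 is violated."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    theta = 1.0 if sy >= 0.2 * sBs else 0.8 * sBs / (sBs - sy)
    r = theta * y + (1.0 - theta) * Bs   # damped gradient difference
    return B - np.outer(Bs, Bs) / sBs + np.outer(r, r) / (r @ s)

# Update one (n_x + n_u)-sized diagonal block, starting from a scaled
# identity (B0 = delta * I with delta = 0.05, as used in Section 3).
n = 9                                    # n_x + n_u for the engine problem
rng = np.random.default_rng(0)
B = 0.05 * np.eye(n)
for _ in range(5):
    s = rng.standard_normal(n)
    y = rng.standard_normal(n)           # random data may violate s^T y > 0
    B = damped_bfgs_update(B, s, y)
print(np.all(np.linalg.eigvalsh(B) > 0))  # True: positive definiteness preserved
```

The damping guarantees s^T r = 0.2 s^T B s > 0, which is exactly the condition under which the BFGS formula preserves positive definiteness.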

Engineering Test Problem.
As a real-world engineering problem, we consider the minimisation of the fuel consumption of a turbocharged diesel engine. The statutory limits for the cumulative NO_x and soot emissions are imposed as n_g = 2 integral inequality constraints. The n_u = 4 control inputs are the position of the variable-geometry turbine, the start of the main injection, the common-rail pressure, and the fuel mass injected per cylinder and combustion cycle. The model is described and its experimental validation is provided in [16]. It features a dynamic mean-value model for the air path with n_x = 5 state variables, and physics-based models for the combustion and for the pollutant emissions. The resulting OCP is stated in [17,18]. The desired load torque, the bounds on the actuator ranges, and mechanical and thermal limits are imposed as nonlinear and linear path constraints (3d).
The model evaluations are expensive. Therefore, QN updates are preferable over exact Newton methods to achieve a fast solution process. The main drawback is the slow local convergence rate of QN methods when applied to the large NLPs resulting from the consideration of long time horizons in the OCP [18].

Results and Discussion
The results presented here are generated using the NLP solver IPOPT [19,20] with the linear solver MUMPS [21] and the fill-reducing preordering implemented in METIS [22]. Either the exact Hessian, calculated using central finite differences on the model functions, or the full or the partitioned QN update just described is supplied to the solver as the user-defined Hessian. In all cases, the first derivatives are calculated by forward finite differences.
Radau collocation at flipped Legendre nodes is applied. These collocation points are the roots of the orthogonal Legendre polynomials and have to be computed numerically [23, Section 2.3]. The resulting scheme, sometimes termed Radau IIA, exhibits a combination of advantageous properties [24], [25, Section 3.5].
Two test cases for the engineering problem outlined in Section 2.5 are considered. Case (a) is a mild driving pattern of 6 s duration which is discretised by first-order collocation with a uniform step length of 0.5 s. This discretisation results in N = 13 total collocation points and 117 NLP variables. Test case (b) considers a more demanding driving pattern of 58 s duration and uses third-order collocation with a step length of 0.8 s, resulting in N = 217 collocation points and 1,953 NLP variables.
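The stated problem sizes can be verified by a quick count, assuming (as the stated counts imply) that the noncollocated initial point also carries a full set of state and control variables:

```python
# Collocation-point and NLP-variable counts for the two test cases
# (n_x = 5 states, n_u = 4 controls, as stated in Section 2.5).
n_x, n_u = 5, 4

# Case (a): 6 s horizon, first-order collocation, 0.5 s steps:
# 12 intervals with 1 collocation point each, plus the noncollocated
# initial point.
N_a = int(6 / 0.5) * 1 + 1
print(N_a, N_a * (n_x + n_u))    # 13 117

# Case (b): N = 217 third-order collocation points.
N_b = 217
print(N_b * (n_x + n_u))         # 1953
```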
The performance is assessed in terms of the number of iterations required to achieve the requested tolerance. Figure 1 shows the convergence behaviour of the exact Newton method and of the two QN methods. Starting with iteration 5, the full Newton step is always accepted. Thus, the difference between the local quadratic convergence of the exact Newton method and the local superlinear convergence of the full BFGS update becomes obvious. The partitioned update performs substantially better than the full update. Moreover, its advantage becomes more pronounced when the size (longer time horizon) and the complexity (more transient driving profile, higher-order collocation) of the problem are increased from the simple test case (a) to the more meaningful case (b).
The Hessian approximation is initialised by a multiple of the identity matrix, B_0 = δI. A factor of δ = 0.05 is found to be a good choice for the problem at hand. Table 1 shows the number of iterations required as δ is changed. The partitioned update is robust against a poor initialisation, whereas the full update requires a significant number of iterations to recover. This finding confirms that an accurate approximation is obtained in fewer iterations when the partitioned QN update is applied.

Conclusion
We illustrated the separability of the nonlinear program resulting from the application of direct collocation to an optimal control problem. Subsequently, we presented how this structure can be exploited to apply a partitioned quasi-Newton update to the Hessian of the Lagrangian. This sparsity-preserving update yields a more accurate approximation of the Hessian in fewer iterations and thus increases the convergence rate of the NLP solver. A more accurate approximation of the second derivatives from first-order information is especially beneficial for highly nonlinear problems for which the exact second derivatives are expensive to evaluate. In fact, for the real-world engineering problem used as a test case here, symbolic or algorithmic differentiation is not an expedient option due to the complexity and the structure of the model. In this situation, using a quasi-Newton approximation based on first derivatives calculated by finite differences is a valuable alternative. The numerical tests presented in this paper indicate that a convergence rate close to that of an exact Newton method can be reclaimed by the application of a partitioned BFGS update.
A self-contained implementation of the partitioned update in the framework of an NLP solver itself could fully exploit the advantages of the method proposed. Furthermore, it should be assessed whether a trust-region globalisation is able to take advantage of an indefinite but possibly more accurate quasi-Newton approximation of the diagonal blocks of the Hessian of the Lagrangian.

Figure 1: Convergence behaviour of IPOPT on the two test cases.

Table 1: Effect of the initialisation of the Hessian approximation on the number of NLP iterations until convergence, test case (a).