SYNTHESES OF DIFFERENTIAL GAMES AND PSEUDO-RICCATI EQUATIONS

For differential games of fixed duration of linear dynamical systems with nonquadratic payoff functionals, it is proved that the value and a saddle point of optimal strategies exist whenever the associated pseudo-Riccati equation has a regular solution $P(t,x)$. The closed-loop optimal strategies are then given by $u(t) = -R^{-1}B^{*}P(t,x(t))$ and $v(t) = -S^{-1}C^{*}P(t,x(t))$. For differential game problems of Mayer type, the existence of a regular solution to the pseudo-Riccati equation is proved under certain assumptions, and a constructive expression of that solution is obtained by solving an algebraic equation with a time parameter.


Introduction
The theory of differential games has been developed over several decades. Early results on differential games of fixed duration can be found in [2,3,5] and the references therein. For linear-quadratic differential and integral games of distributed systems, closed-loop syntheses have been established in various ways and cases in [6,8,10], and most generally in terms of causal synthesis in [12,14].
In a related direction, synthesis results for nonquadratic optimal control problems of linear dynamical systems have been obtained in [11,13] and some of the references therein. There the key issue is how to find and implement nonlinear closed-loop optimal controls under nonquadratic criteria, and it has been resolved with the aid of a quasi-Riccati equation.
In this paper, we investigate nonquadratic differential games of a finite-dimensional linear system; the generalization of the results obtained here to infinite-dimensional distributed systems involves no essential difficulty.
Here the primary objective is to determine whether the linear-nonquadratic differential game problem has a value, and whether a saddle point of optimal strategies exists and can be found in the form of an explicit state feedback.
Since the players' sets of choices are not compact for such a differential game of fixed duration, and (unlike quadratic optimal control problems) its payoff functional is in general neither convex nor concave, the existence of a value, the existence of a saddle point, and, most importantly, a constructive feedback implementation of optimal strategies for this type of game are still open issues. We tackle these issues with a new idea: the pseudo-Riccati equation.
Let $T > 0$ be finite and fixed. Consider a linear system of differential equations:
$$x'(t) = Ax(t) + Bu(t) + Cv(t), \quad 0 \le t \le T, \qquad x(0) = x_0, \tag{1.1}$$
where the state function $x(t)$ and the initial data $x_0$ take values in $\mathbb{R}^n$; $u(t)$, the control of player (I), takes values in $\mathbb{R}^m$ and is governed by a strategy (denoted simply by $u$); and $v(t)$, the control of player (II), takes values in $\mathbb{R}^k$ and is governed by a strategy (denoted simply by $v$). The inner products in $\mathbb{R}^n$, $\mathbb{R}^m$, and $\mathbb{R}^k$ are all denoted by $\langle\cdot,\cdot\rangle$; which one is meant will be clear from the context. Define the function spaces $X = L^2(0,T;\mathbb{R}^n)$, $X_c = C([0,T];\mathbb{R}^n)$, $U = L^2(0,T;\mathbb{R}^m)$, and $V = L^2(0,T;\mathbb{R}^k)$. Assume that $A$, $B$, and $C$ are $n\times n$, $n\times m$, and $n\times k$ constant matrices, respectively. Any pair of strategies $\{u,v\}\in U\times V$ is called a pair of admissible strategies.
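Set a nonquadratic payoff functional
$$J(x_0,u,v) = M(x(T)) + \int_0^T \Big[ Q(x(t)) + \tfrac12\langle Ru(t),u(t)\rangle + \tfrac12\langle Sv(t),v(t)\rangle \Big]\,dt, \tag{1.2}$$
where $M$ and $Q$ are functions in $C^2(\mathbb{R}^n)$, $R$ is an $m\times m$ positive definite matrix, and $S$ is a $k\times k$ negative definite matrix. The game problem is to find a pair of optimal strategies $\{\hat u,\hat v\}\in U\times V$ in the following sense of saddle point: for any admissible strategies $u\in U$ and $v\in V$,
$$J(x_0,\hat u,v) \le J(x_0,\hat u,\hat v) \le J(x_0,u,\hat v). \tag{1.3}$$
If
$$\inf_{u\in U}\sup_{v\in V} J(x_0,u,v) = \sup_{v\in V}\inf_{u\in U} J(x_0,u,v), \tag{1.4}$$
then the number given by (1.4) is denoted by $J^*(x_0)$ and called the value of this game. It is seen that whenever a pair of optimal strategies exists, the game has a value and indeed $J^*(x_0) = J(x_0,\hat u,\hat v)$.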

Yuncheng You 63
We denote by $\mathcal{L}(E_1,E_2)$ the space of bounded linear operators from a Banach space $E_1$ to a Banach space $E_2$, equipped with the operator norm. If $E_1 = E_2$, this operator space is denoted by $\mathcal{L}(E_1)$. A matrix with superscript $*$ denotes its transpose, and a bounded linear operator with superscript $*$ denotes its adjoint. We also recall the following relation between a continuously Fréchet differentiable mapping $f$ defined on a convex open set of a Banach space and its Fréchet derivative $Df$:
$$f(x) - f(y) = \int_0^1 Df\big(y + s(x-y)\big)(x-y)\,ds. \tag{1.5}$$
All the concepts and results of nonlinear analysis used in this paper, such as gradient operators and proper mappings, can be found in [1,9].
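Formula (1.5) is easy to check numerically for a scalar-valued $f$. The sketch below uses a hypothetical smooth functional `f` on $\mathbb{R}^2$ (not from the paper) with a hand-coded gradient, so that $Df(z)h = \langle\nabla f(z),h\rangle$, and approximates the integral in (1.5) by the composite trapezoidal rule.

```python
import numpy as np

def f(z):
    # Illustrative smooth functional on R^2 (hypothetical, not from the paper).
    return np.sin(z[0]) * np.exp(z[1]) + z[0] ** 2 * z[1]

def grad_f(z):
    # Gradient of f, so that Df(z)h = <grad_f(z), h>.
    return np.array([
        np.cos(z[0]) * np.exp(z[1]) + 2.0 * z[0] * z[1],
        np.sin(z[0]) * np.exp(z[1]) + z[0] ** 2,
    ])

def mean_value_rhs(x, y, n=2001):
    # Composite trapezoidal rule for int_0^1 Df(y + s(x - y))(x - y) ds.
    s = np.linspace(0.0, 1.0, n)
    vals = np.array([grad_f(y + si * (x - y)) @ (x - y) for si in s])
    h = s[1] - s[0]
    return h * (0.5 * vals[0] + vals[1:-1].sum() + 0.5 * vals[-1])

x = np.array([0.7, -0.3])
y = np.array([-0.4, 0.5])
lhs = f(x) - f(y)
rhs = mean_value_rhs(x, y)
print(lhs, rhs)   # the two numbers agree up to quadrature error
```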

Pseudo-Riccati equations
To study the solvability of the nonquadratic differential game problem described by (1.1), (1.2), and (1.3), we consider the following pseudo-Riccati equation associated with the game problem:
$$P_t(t,x) + P_x(t,x)\big[Ax - (BR^{-1}B^* + CS^{-1}C^*)P(t,x)\big] + A^*P(t,x) + Q'(x) = 0, \tag{2.1}$$
with the terminal condition
$$P(T,x) = M'(x), \quad x\in\mathbb{R}^n, \tag{2.2}$$
where $M'$ and $Q'$ denote the gradients of $M$ and $Q$. The unknown of the pseudo-Riccati equation is a nonlinear mapping $P(t,x): [0,T]\times\mathbb{R}^n\to\mathbb{R}^n$. We use $P_t$ and $P_x$ to denote the partial derivatives of $P$ with respect to $t$ and $x$, respectively. The pseudo-Riccati equation (2.1) with the terminal condition (2.2) will be denoted by (PRE).
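Definition 2.1. A mapping $P\in C^1([0,T]\times\mathbb{R}^n;\mathbb{R}^n)$ satisfying (2.1) and (2.2) is called a regular solution of the (PRE) if $P(t,\cdot)$ is a gradient operator on $\mathbb{R}^n$ for every $t\in[0,T]$ and the initial value problem
$$x'(t) = Ax(t) - (BR^{-1}B^* + CS^{-1}C^*)P(t,x(t)), \qquad x(0) = x_0, \tag{2.3}$$
has a unique global solution $x\in X_c$ for any given $x_0\in\mathbb{R}^n$.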
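In the linear-quadratic special case the (PRE) collapses to a Riccati equation, which gives a simple consistency check. The sketch assumes the form $P_t + P_x[Ax - \Gamma P] + A^*P + Q'(x) = 0$, $P(T,x) = M'(x)$, with $\Gamma = BR^{-1}B^* + CS^{-1}C^*$, and takes hypothetical scalar data with $Q\equiv 0$ and $M(x) = \tfrac12 m x^2$. With the ansatz $P(t,x) = \pi(t)x$, the equation reduces to $\pi' + 2a\pi - \gamma\pi^2 = 0$, $\pi(T) = m$, where $\gamma = b^2/r + c^2/s$; the substitution $w = 1/\pi$ gives a closed form. The code evaluates the residual of the (PRE) by central differences.

```python
import numpy as np

# Hypothetical scalar data (n = m = k = 1); s < 0 mirrors the negative definite S.
a, b, c, r, s, m, T = 0.5, 1.0, 0.5, 1.0, -2.0, 2.0, 1.0
gamma = b**2 / r + c**2 / s          # scalar BR^{-1}B^* + CS^{-1}C^*

def pi(t):
    # Closed-form solution of pi' + 2 a pi - gamma pi^2 = 0, pi(T) = m,
    # from the Bernoulli substitution w = 1/pi, which satisfies w' = 2 a w - gamma.
    e = np.exp(-2.0 * a * (T - t))
    w = gamma / (2.0 * a) * (1.0 - e) + e / m
    return 1.0 / w

def P(t, x):
    # Linear ansatz for M(x) = m x^2 / 2 and Q = 0.
    return pi(t) * x

def pre_residual(t, x, h=1e-5):
    # Residual of the assumed (2.1) with Q' = 0, using central differences.
    P_t = (P(t + h, x) - P(t - h, x)) / (2 * h)
    P_x = (P(t, x + h) - P(t, x - h)) / (2 * h)
    return P_t + P_x * (a * x - gamma * P(t, x)) + a * P(t, x)

pts = [(t, x) for t in np.linspace(0.05, 0.95, 7) for x in (-1.5, 0.3, 2.0)]
max_res = max(abs(pre_residual(t, x)) for t, x in pts)
terminal_gap = abs(P(T, 1.7) - m * 1.7)   # (2.2): P(T, x) = M'(x) = m x
print(max_res, terminal_gap)
```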
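Suppose $P$ is a regular solution of the (PRE). According to the definition of gradient operators (cf. [1]), for any $t\in[0,T]$ there exist anti-derivatives $\Phi(t,x)$ of $P(t,x)$, which are nonlinear functionals $\Phi:[0,T]\times\mathbb{R}^n\to\mathbb{R}$ such that
$$D_x\Phi(t,x)h = \langle P(t,x),h\rangle \quad\text{for all } h\in\mathbb{R}^n, \qquad\text{i.e.,}\quad \nabla_x\Phi(t,x) = P(t,x). \tag{2.4}$$
Since, for each fixed $t$, anti-derivatives can differ only by a constant, we fix the constant by the normalization condition (2.5).
Lemma 2.2. Let $P$ be a regular solution of the (PRE) and let $\Phi$ be its anti-derivative satisfying (2.5). Then, for any $x_0\in\mathbb{R}^n$ and any admissible strategies $\{u,v\}$, $\Phi(\cdot,x(\cdot))\in AC([0,T];\mathbb{R})$, where $x$ is the corresponding state trajectory of (1.1).
Proof. From the expression of any state trajectory, $x$ is continuous and hence bounded on $[0,T]$. Let $\Omega$ be a closed, bounded, and convex set in $\mathbb{R}^n$ containing $\{x(t): 0\le t\le T\}$, for example the closed ball centered at the origin with radius $\max_{0\le t\le T}\|x(t)\|$. According to Definition 2.1, $P(t,x)$, $P_t(t,x)$, and $P_x(t,x)$ are all uniformly bounded in norm over the compact convex set $[0,T]\times\Omega$. By the mean value theorem, $P(t,x)$ satisfies a uniform Lipschitz condition with respect to $(t,x)\in[0,T]\times\Omega$. These facts yield the straightforward estimate
$$|\Phi(t,x(t)) - \Phi(\tau,x(\tau))| \le K\big(|t-\tau| + \|x(t)-x(\tau)\|\big) \tag{2.9}$$
for any $t,\tau\in[0,T]$, obtained by the mean value theorem with intermediate points $\xi$ between $t$ and $\tau$ and $\eta$ between $x(t)$ and $x(\tau)$, where $K$ is a constant depending only on $\{x_0,u,v,T\}$. Since $x$ is absolutely continuous on $[0,T]$ (its derivative $Ax + Bu + Cv$ belongs to $L^2(0,T;\mathbb{R}^n)$), (2.9) implies that $\Phi(\cdot,x(\cdot))\in AC([0,T];\mathbb{R})$. The proof is complete. Now we prove a key lemma, which establishes the connection between the (PRE) and the differential game problem under consideration.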
Lemma 2.3. Under the same assumptions as in Lemma 2.2, it holds that (2.10). In the computation (2.11) proving it, the penultimate equality uses the pseudo-Riccati equation (2.1) and the fact that $P_x(t,x)$ is a self-adjoint operator (a symmetric matrix), $P_x(t,x) = P_x(t,x)^*$, because $P(t,x)$ is a gradient operator with respect to $x$ (cf. [1, Theorem 2.5.2]). Then, using integration by parts to treat the term at the end of (2.11), we obtain (2.12), and by (1.5) the inner integral in the last term of (2.12) can be rewritten as (2.13). Substituting (2.12) and (2.13) into (2.11), we obtain the asserted identity.
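Two facts about gradient operators are used above: an anti-derivative $\Phi$ exists, and the Jacobian $P_x$ is symmetric. The sketch below illustrates both for a hypothetical gradient map $P = \nabla F$ on $\mathbb{R}^2$. It builds $\Phi$ by the line integral $\Phi(x) = \int_0^1\langle P(sx),x\rangle\,ds$, which follows from (1.5) and is normalized by $\Phi(0) = 0$; whether this matches the normalization (2.5) is an assumption. It then compares a finite-difference Jacobian of $P$ with its transpose, and includes a rotation field, which is not a gradient, for contrast.

```python
import numpy as np

def F(x):
    # Hypothetical potential on R^2; P = grad F is then a gradient operator.
    return x[0]**4 / 4 + x[0] * x[1] + np.cosh(x[1])

def P(x):
    return np.array([x[0]**3 + x[1], x[0] + np.sinh(x[1])])

def antiderivative(x, n=2001):
    # Phi(x) = int_0^1 <P(s x), x> ds by the composite Simpson rule, so Phi(0) = 0.
    s = np.linspace(0.0, 1.0, n)
    vals = np.array([P(si * x) @ x for si in s])
    w = np.ones(n); w[1:-1:2] = 4.0; w[2:-1:2] = 2.0
    return (s[1] - s[0]) / 3.0 * (w @ vals)

def jacobian(G, x, h=1e-6):
    # Central-difference Jacobian; column j approximates dG/dx_j.
    J = np.empty((x.size, x.size))
    for j in range(x.size):
        e = np.zeros(x.size); e[j] = h
        J[:, j] = (G(x + e) - G(x - e)) / (2 * h)
    return J

x = np.array([0.8, -1.1])
phi = antiderivative(x)
exact = F(x) - F(np.zeros(2))            # anti-derivatives differ only by a constant
grad_phi = np.array([(antiderivative(x + 1e-5 * e) - antiderivative(x - 1e-5 * e)) / 2e-5
                     for e in np.eye(2)])
Jp = jacobian(P, x)                      # symmetric, as for any gradient operator
rot = lambda z: np.array([-z[1], z[0]])  # not a gradient operator
Jr = jacobian(rot, x)
print(phi, exact, grad_phi, P(x))
print(np.abs(Jp - Jp.T).max(), np.abs(Jr - Jr.T).max())
```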

Closed-loop optimal strategies
Under the assumption that there is a regular solution of the pseudo-Riccati equation (2.1), (2.2), we can show the existence, uniqueness, and closed-loop expressions of a pair of optimal strategies, as well as the existence of the value of this differential game. This is one of the main results of this work.
Theorem 3.1. Assume that there exists a regular solution $P(t,x)$ of the (PRE).
Then, for any given $x_0\in\mathbb{R}^n$, the differential game described by (1.1), (1.2), and (1.3) has a value and a unique pair of optimal strategies in the saddle-point sense. Moreover, the optimal strategies are given by the closed-loop expressions
$$\hat u(t) = -R^{-1}B^*P(t,x(t)), \qquad \hat v(t) = -S^{-1}C^*P(t,x(t)), \qquad 0\le t\le T, \tag{3.1}$$
where $x$ stands for the corresponding state trajectory of (1.1).
Proof. Let $\Phi(t,x)$ be the anti-derivative of $P(t,x)$ such that (2.5) is satisfied. For any given $x_0$ and any admissible strategies $\{u,v\}$, Lemma 2.3 yields (3.5). Integrating both ends of equality (3.5) in $t$ over $[0,T]$, which is legitimate since $\Phi(\cdot,x(\cdot))$ is absolutely continuous, we end up with (3.6). Note that (2.2) and (2.5) imply $\Phi(T,x) - \Phi(T,0) = M(x) - M(0)$ for all $x\in\mathbb{R}^n$, and (3.7). Then, with (3.7) substituted, (3.6) can be written as (3.8), where $W(x_0,T)$ is given by (3.9). Note that (3.8) holds for any admissible strategies $\{u,v\}$.
According to Definition 2.1, the initial value problem (2.3) has a global solution $x(\cdot)\in X_c$ over $[0,T]$. Hence the strategies given by the state feedback expressions in (3.1) are admissible. Moreover, (3.8) shows that
$$J(x_0,\hat u,\hat v) = W(x_0,T), \tag{3.10}$$
which depends on $x_0$ and $T$ only. For any other admissible strategies $\{u,v\}$, (3.8) implies
$$J(x_0,\hat u,v) \le W(x_0,T) \le J(x_0,u,\hat v), \tag{3.11}$$
since $R$ is positive definite and $S$ is negative definite. This proves that there exists a unique pair of optimal strategies $\{\hat u,\hat v\}$, given by (3.1), and that the value of this game exists. In fact, the value is $J^*(x_0) = J(x_0,\hat u,\hat v) = W(x_0,T)$.
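The mechanism behind (3.8) is a completing-the-square argument. As a sketch, assume the payoff is normalized as $J = M(x(T)) + \int_0^T[Q(x) + \tfrac12\langle Ru,u\rangle + \tfrac12\langle Sv,v\rangle]\,dt$ (the normalization under which $P(T,x) = M'(x)$ and the feedback (3.1) fit together) and write $p = P(t,x(t))$ along any trajectory. Since $R$ and $S$ are symmetric and invertible,

$$
\tfrac12\langle Ru,u\rangle + \langle B^*p,u\rangle
= \tfrac12\big\langle R\,(u + R^{-1}B^*p),\, u + R^{-1}B^*p\big\rangle - \tfrac12\langle R^{-1}B^*p,\, B^*p\rangle,
$$
$$
\tfrac12\langle Sv,v\rangle + \langle C^*p,v\rangle
= \tfrac12\big\langle S\,(v + S^{-1}C^*p),\, v + S^{-1}C^*p\big\rangle - \tfrac12\langle S^{-1}C^*p,\, C^*p\rangle .
$$

Differentiating $\Phi(t,x(t))$ along the trajectory, using the (PRE) (which forces $\Phi_t + \langle P,Ax\rangle - \tfrac12\langle(BR^{-1}B^* + CS^{-1}C^*)P,P\rangle + Q$ to depend on $t$ only), and integrating over $[0,T]$ then gives an identity of the form

$$
J(x_0,u,v) = W(x_0,T)
+ \frac12\int_0^T \big\langle R\,(u + R^{-1}B^*p),\, u + R^{-1}B^*p\big\rangle\,dt
+ \frac12\int_0^T \big\langle S\,(v + S^{-1}C^*p),\, v + S^{-1}C^*p\big\rangle\,dt,
$$

where $W(x_0,T)$ collects the terms that depend only on $x_0$ and $T$. The $R$-integral is nonnegative and the $S$-integral nonpositive, which is what the saddle-point inequalities require.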
Remark 3.2. In the above argument, which goes from (3.8) to (3.11), it is important to distinguish clearly between two concepts: the strategies $u$ and $v$ used by the players, and the control functions $u(t)$ and $v(t)$ of the time variable $t\in[0,T]$. A strategy is a pattern, such as the feedback law (3.1) or any other admissible feedback. When a strategy is implemented, $u$ and $v$ become concrete functions of time, usually called the control functions of the players. When a pair of strategies $\{u,v\}$ differs from $\{\hat u,\hat v\}$, the state trajectories $x(t;x_0,\hat u,\hat v)$, $x(t;x_0,\hat u,v)$, and $x(t;x_0,u,\hat v)$ are in general different functions; but as long as the optimal strategy patterns are those given by (3.1), we have $\hat u(t) = -R^{-1}B^*P(t,x(t))$ along $x(t) = x(t;x_0,\hat u,v)$ and $\hat v(t) = -S^{-1}C^*P(t,x(t))$ along $x(t) = x(t;x_0,u,\hat v)$, and this is what is used in the derivation of (3.11).
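Remark 3.2 can be illustrated in the scalar linear-quadratic Mayer case, where $P(t,x) = \pi(t)x$ is explicit. Assuming the payoff normalization $J = \tfrac12 m x(T)^2 + \int_0^T(\tfrac12 r u^2 + \tfrac12 s v^2)\,dt$ with hypothetical data and $s<0$, each player below follows the feedback pattern (3.1), and one of them adds an open-loop perturbation $w(t)$. The completing-the-square identity then predicts $J = \tfrac12\pi(0)x_0^2 + \tfrac12 r\int_0^T w^2\,dt$ when player (I) deviates and $J = \tfrac12\pi(0)x_0^2 + \tfrac12 s\int_0^T w^2\,dt$ when player (II) deviates, even though the three trajectories differ.

```python
import numpy as np

# Hypothetical scalar data; r > 0, s < 0, Q = 0, M(x) = m x^2 / 2.
a, b, c, r, s, m, T = 0.5, 1.0, 0.5, 1.0, -2.0, 2.0, 1.0
gamma = b**2 / r + c**2 / s

def pi(t):
    # Closed-form scalar Riccati solution, so that P(t, x) = pi(t) x.
    e = np.exp(-2.0 * a * (T - t))
    return 1.0 / (gamma / (2.0 * a) * (1.0 - e) + e / m)

def payoff(x0, du=lambda t: 0.0, dv=lambda t: 0.0, N=4000):
    """Each player uses the feedback pattern (3.1) plus an open-loop term du or dv."""
    def rhs(t, z):
        x = z[0]
        u = -b / r * pi(t) * x + du(t)
        v = -c / s * pi(t) * x + dv(t)
        return np.array([a * x + b * u + c * v, 0.5 * r * u**2 + 0.5 * s * v**2])
    h = T / N
    z = np.array([x0, 0.0])               # state augmented with the running cost
    for i in range(N):                    # classical RK4
        t = i * h
        k1 = rhs(t, z)
        k2 = rhs(t + h / 2, z + h / 2 * k1)
        k3 = rhs(t + h / 2, z + h / 2 * k2)
        k4 = rhs(t + h, z + h * k3)
        z = z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return 0.5 * m * z[0]**2 + z[1]

x0 = 1.3
w = lambda t: 0.4 * np.sin(3.0 * t)
int_w2 = 0.5 * 0.16 * (T - np.sin(6.0 * T) / 6.0)   # exact int_0^T w(t)^2 dt
J_star = payoff(x0)
J_dev_v = payoff(x0, dv=w)     # player (II) deviates from its feedback pattern
J_dev_u = payoff(x0, du=w)     # player (I) deviates from its feedback pattern
print(J_star, 0.5 * pi(0.0) * x0**2, J_dev_v, J_dev_u)
```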

Mayer problem: solution to the pseudo-Riccati equation
In this section, we assume that $Q(x)\equiv 0$. Then the payoff functional reduces to
$$J(x_0,u,v) = M(x(T)) + \frac12\int_0^T\big[\langle Ru(t),u(t)\rangle + \langle Sv(t),v(t)\rangle\big]\,dt. \tag{4.1}$$
This type of differential game, described by (1.1), (4.1), and (1.3), can be referred to as the Mayer problem, after its counterpart in optimal control theory and the calculus of variations. Since a general problem (1.1), (1.2) can be reduced to a Mayer problem by augmenting the state variable with one additional dimension, there is no loss of generality in considering Mayer problems only. Associated with this Mayer problem, we consider the following nonlinear algebraic equation with a parameter $\tau$, $0\le\tau\le T$:
$$y + G(T-\tau)M'(y) = e^{A(T-\tau)}x, \tag{4.2}$$
where
$$G(t) = \int_0^t e^{As}\big(BR^{-1}B^* + CS^{-1}C^*\big)e^{A^*s}\,ds, \quad 0\le t\le T. \tag{4.3}$$
Here (4.2) has an unknown $y\in\mathbb{R}^n$ and a parameter $\tau\in[0,T]$. Equation (4.2) can also be written as
$$y + G(t)M'(y) = e^{At}x, \tag{4.4}$$
with $t = T-\tau$, $0\le t\le T$. Note that $G(t)$ is a symmetric matrix for each $t\in[0,T]$. However, unlike in optimal control problems, here $G(t)$ is in general neither nonnegative nor nonpositive, due to the assumptions on $R$ and $S$.
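The symmetry and possible indefiniteness of $G(t)$ can be seen numerically. The sketch assumes $G(t) = \int_0^t e^{As}(BR^{-1}B^* + CS^{-1}C^*)e^{A^*s}\,ds$; the matrices `A`, `B`, `C`, `R`, `S` are hypothetical, and `A` is taken symmetric only so that $e^{As}$ can be computed from `numpy.linalg.eigh` without SciPy. With $R>0$ and $S<0$ the middle factor is indefinite, and so is $G(1)$.

```python
import numpy as np

# Hypothetical data: n = 2, m = k = 1; symmetric A so e^{As} follows from eigh.
A = np.array([[0.2, 0.1], [0.1, -0.3]])
B = np.array([[1.0], [0.0]])
C = np.array([[0.0], [1.0]])
R = np.array([[1.0]])     # positive definite
S = np.array([[-1.0]])    # negative definite
Gamma = B @ np.linalg.inv(R) @ B.T + C @ np.linalg.inv(S) @ C.T

lam, V = np.linalg.eigh(A)

def expA(t):
    # e^{At} for symmetric A via its eigen-decomposition.
    return V @ np.diag(np.exp(lam * t)) @ V.T

def G(t, n=401):
    # Composite Simpson approximation of int_0^t e^{As} Gamma e^{A^* s} ds.
    s = np.linspace(0.0, t, n)
    vals = np.array([expA(si) @ Gamma @ expA(si).T for si in s])
    w = np.ones(n); w[1:-1:2] = 4.0; w[2:-1:2] = 2.0
    return (t / (n - 1)) / 3.0 * np.tensordot(w, vals, axes=1)

Gt = G(1.0)
eig = np.linalg.eigvalsh(Gt)   # ascending eigenvalues of the symmetric matrix
print(np.round(Gt, 4), eig)
```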

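First consider a family of differential games defined over a time interval $[\tau,T]$, where $0\le\tau\le T$ is arbitrarily fixed. We use $(\mathrm{DGP})_\tau$ to denote the differential game problem for the linear system
$$x'(t) = Ax(t) + Bu(t) + Cv(t), \quad \tau\le t\le T, \qquad x(\tau) = x_0, \tag{4.5}$$
with respect to the payoff functional
$$J_\tau(x_0,u,v) = M(x(T)) + \frac12\int_\tau^T\big[\langle Ru(t),u(t)\rangle + \langle Sv(t),v(t)\rangle\big]\,dt, \tag{4.6}$$
in the sense of saddle point, that is,
$$J_\tau(x_0,\hat u,v) \le J_\tau(x_0,\hat u,\hat v) \le J_\tau(x_0,u,\hat v) \quad\text{for all } u\in L^2(\tau,T;\mathbb{R}^m),\ v\in L^2(\tau,T;\mathbb{R}^k), \tag{4.7}$$
where $A$, $B$, $C$, $M$, $R$, and $S$ satisfy the same assumptions made in Section 1.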
We first investigate the solution of (4.2) and then establish its connection to a regular solution of the pseudo-Riccati equation (PRE). The entire process goes through several lemmas, as follows.
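Proof. Suppose that $\{\hat u,\hat v\}$ is a pair of saddle-point strategies for $(\mathrm{DGP})_\tau$. Then
$$J_\tau(x_0,\hat u,\hat v) = \min_{u} J_\tau(x_0,u,\hat v), \tag{4.8}$$
$$J_\tau(x_0,\hat u,\hat v) = \max_{v} J_\tau(x_0,\hat u,v). \tag{4.9}$$
In other words, $\hat u$ is the minimizer of $J_\tau(x_0,u,\hat v)$ subject to constraint (4.5) with $v = \hat v$, and $\hat v$ is the maximizer of $J_\tau(x_0,\hat u,v)$ subject to constraint (4.5) with $u = \hat u$. Thus one can apply the Pontryagin maximum principle (cf. [7]). The Hamiltonians in these two cases are, respectively,
$$H_1(x,u,\phi) = \langle\phi, Ax + Bu + C\hat v\rangle + \tfrac12\langle Ru,u\rangle, \qquad H_2(x,v,\psi) = \langle\psi, Ax + B\hat u + Cv\rangle + \tfrac12\langle Sv,v\rangle. \tag{4.10}$$
Hence the co-state function $\phi$ associated with the optimal control $\hat u$ in (4.8) satisfies the terminal value problem
$$\phi'(t) = -A^*\phi(t), \quad \tau\le t\le T, \qquad \phi(T) = M'(x(T)), \tag{4.11}$$
and the co-state function $\psi$ associated with the optimal control $\hat v$ in (4.9) satisfies the same terminal value problem (4.11), with the same value $x(T)$ corresponding to the control functions $\{\hat u,\hat v\}$. Therefore,
$$\phi(t) = \psi(t) = e^{A^*(T-t)}M'(x(T)), \quad \tau\le t\le T. \tag{4.12}$$
By the maximum principle, the saddle-point strategies can be expressed as the following functions of the time variable $t$:
$$\hat u(t) = -R^{-1}B^*\phi(t), \qquad \hat v(t) = -S^{-1}C^*\phi(t). \tag{4.13}$$
Hence the state trajectory $x$ corresponding to the saddle-point strategies $\{\hat u,\hat v\}$ satisfies, for $t\in[\tau,T]$,
$$x'(t) = Ax(t) - \big(BR^{-1}B^* + CS^{-1}C^*\big)e^{A^*(T-t)}M'(x(T)), \qquad x(\tau) = x_0, \tag{4.14}$$
so that, by the variation of constants formula and (4.3),
$$x(T) + G(T-\tau)M'(x(T)) = e^{A(T-\tau)}x_0. \tag{4.15}$$
Equation (4.15) shows that, since $x_0\in\mathbb{R}^n$ is arbitrary, for any given $x = x_0\in\mathbb{R}^n$ on the right-hand side of (4.2) there exists a solution $y$ to (4.2), given by
$$y = x(T;x_0,\tau) \quad (\text{simply denoted by } x(T)), \tag{4.16}$$
where $x(T;x_0,\tau)$ represents the terminal value of the saddle-point state trajectory with the initial status $x(\tau) = x_0$.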
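The passage from (4.14) to (4.16) can be traced numerically in the scalar case. Taking (4.2) in the form $y + G(T-\tau)M'(y) = e^{A(T-\tau)}x$ with $G(t) = \int_0^t e^{As}(BR^{-1}B^* + CS^{-1}C^*)e^{A^*s}\,ds$, the sketch solves for $y$ with a bisection-safeguarded Newton iteration (our choice; no numerical method is prescribed in the text) and then integrates the state equation driven by the co-state $\phi(t) = e^{A^*(T-t)}M'(y)$ from $x(\tau) = x_0$. The terminal state should reproduce $y$. All data, including the illustrative $M(y) = y^4/4 + y^2/2$, are hypothetical, and $\gamma = b^2/r + c^2/s > 0$ is chosen so that $G\ge 0$ and the solver's bracket is valid.

```python
import numpy as np

# Hypothetical scalar data; gamma = b^2/r + c^2/s > 0, so G(t) >= 0 here.
a, b, c, r, s, T = 0.5, 1.0, 0.5, 1.0, -2.0, 1.0
gamma = b**2 / r + c**2 / s

def G(t):   return gamma * (np.exp(2 * a * t) - 1.0) / (2 * a)   # scalar G(t)
def dM(y):  return y**3 + y          # M'(y) for M(y) = y^4/4 + y^2/2
def d2M(y): return 3 * y**2 + 1.0    # M''(y)

def solve_42(x, tau, tol=1e-13, max_iter=200):
    """Safeguarded Newton iteration for y + G(T - tau) M'(y) = e^{a (T - tau)} x."""
    g, rhs = G(T - tau), np.exp(a * (T - tau)) * x
    k = lambda y: y + g * dM(y) - rhs
    lo, hi = -abs(rhs) - 1.0, abs(rhs) + 1.0     # k(lo) < 0 < k(hi) because g >= 0
    y = 0.0
    for _ in range(max_iter):
        y_new = y - k(y) / (1.0 + g * d2M(y))    # 1 + g M''(y) is the 1-D derivative
        if not lo <= y_new <= hi:
            y_new = 0.5 * (lo + hi)              # bisection fallback stays bracketed
        if k(y_new) > 0:
            hi = y_new
        else:
            lo = y_new
        if abs(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def saddle_terminal_state(x0, tau, y, N=2000):
    """RK4 for x' = a x - gamma e^{a (T - t)} M'(y), x(tau) = x0 (scalar (4.14))."""
    f = lambda t, x: a * x - gamma * np.exp(a * (T - t)) * dM(y)
    h, t, x = (T - tau) / N, tau, x0
    for _ in range(N):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x, t = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4), t + h
    return x

x0, tau = 0.9, 0.2
y = solve_42(x0, tau)                     # candidate terminal value from (4.2)
xT = saddle_terminal_state(x0, tau, y)    # terminal state of the co-state-driven trajectory
print(y, xT)
```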
It is, however, quite difficult to address the uniqueness of solutions to (4.2). We now exploit a homotopy-type result from nonlinear analysis for this purpose, based on a reasonable assumption below. For each $\tau\in[0,T]$, define a mapping $K_\tau:\mathbb{R}^n\to\mathbb{R}^n$ by
$$K_\tau(y) = y + G(T-\tau)M'(y), \tag{4.17}$$
where $G(\cdot)$ is given by (4.3). Actually, $K_\tau(y)$ is the left-hand side of (4.2). Also let $K(y,\tau) = K_\tau(y)$. We make another assumption here.
Assumption 4.3. Assume that $M$ is an analytic function on $\mathbb{R}^n$ and that $K$ is uniformly coercive in the sense that $\|K(y,\tau)\|\to\infty$ uniformly in $\tau$ whenever $\|y\|\to\infty$.
constant-valued function, which contradicts Lemma 4.2. As a consequence, each point in $K_\tau^{-1}(p)$ must be isolated. Therefore, there exists a sufficiently small open neighborhood $N_0(p)$ of $p$ such that the component of $K_\tau^{-1}(N_0(p))$ containing $y_0$ is an open neighborhood $O(y_0)$ that does not intersect the components containing any other preimages (if any) of $p$. Moreover, as a consequence of this and the continuity, Third, for $\tau = T$ we have $K_T = I$, the identity mapping on $\mathbb{R}^n$, which is certainly a homeomorphism. Thus condition (c) of Lemma 4.4 is satisfied.
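Therefore, applying Lemma 4.4, we conclude that for every $\tau\in[0,T]$ the mapping $K_\tau$ is a homeomorphism on $\mathbb{R}^n$. Finally, since $M$ is analytic, the mapping $K_\tau$ is clearly a $C^1$ mapping. It remains to show that $K_\tau^{-1}$ is also a $C^1$ mapping. Indeed, by (4.16) and the uniqueness of the solution to (4.2) just established through the homeomorphism, we can assert that
$$K_\tau^{-1}(p) = x\big(T; e^{-A(T-\tau)}p, \tau\big), \quad p\in\mathbb{R}^n, \tag{4.18}$$
where $x(t)$, $\tau\le t\le T$, satisfies (4.14) with $x_0 = e^{-A(T-\tau)}p$, or equivalently $\{x,\phi\}$ satisfies the differential equations and initial-terminal conditions
$$x' = Ax - \big(BR^{-1}B^* + CS^{-1}C^*\big)\phi, \quad \phi' = -A^*\phi, \quad x(\tau) = e^{-A(T-\tau)}p, \quad \phi(T) = M'(x(T)). \tag{4.19}$$
By the differentiability of solutions of ODEs with respect to initial data, or directly by the successive approximation approach, we can show that for any $\tau\le t\le T$,
$$\frac{\partial}{\partial p}\,x\big(t; e^{-A(T-\tau)}p, \tau\big) \ \text{exists and is continuous in } p. \tag{4.20}$$
Corollary 4.7. Under Assumptions 4.1 and 4.3, for every $\tau\in[0,T]$ and every $y\in\mathbb{R}^n$, the derivative
$$DK_\tau(y) = I + G(T-\tau)M''(y) \tag{4.21}$$
is a nonsingular matrix.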
The inverse matrix of (4.21) will be denoted by $[I + G(T-\tau)M''(y)]^{-1}$. Corollary 4.7 is a direct consequence of Lemma 4.6 and the chain rule (cf. [1] or [9]). Thanks to Lemma 4.6 and the linear homeomorphism $e^{-A(T-\tau)}$, there exists a unique solution $y$ of (4.2) for any given $\tau\in[0,T]$ and any given $x\in\mathbb{R}^n$.

This solution $y$ can be written as a mapping
$$y = H(T-\tau, x). \tag{4.22}$$
This mapping $H$ will be referred to as the solution mapping of (4.2). We now establish the properties of the nonlinear mapping $H$ that will be used later in proving the main theorem of this section.
where $H_t$ and $H_x$ stand for the partial derivatives of $H$ with respect to $t$ and $x$, respectively.
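Proof. Define a mapping $E:[0,T]\times\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}^n$ by
$$E(t,y,x) = y + G(T-t)M'(y) - e^{A(T-t)}x, \tag{4.25}$$
where $(t,y,x)\in[0,T]\times\mathbb{R}^n\times\mathbb{R}^n$. Obviously, $E$ is a $C^1$ mapping, and $E_y = DK_t(y)$ is invertible by Corollary 4.7, for any $t$ and $y$. Note that (4.2), with $\tau$ renamed $t$, is exactly $E(t,y,x) = 0$. Then, by the implicit function theorem and its corollary (cf. [1]), the solution mapping $H(T-t,x)$ of (4.2) is a $C^1$ mapping with respect to $(t,x)$. Its partial derivatives are given by
$$H_t = -E_y^{-1}E_t, \tag{4.26}$$
$$H_x = -E_y^{-1}E_x. \tag{4.27}$$
Directly calculating $E_t$ and $E_x$ from (4.25) and (4.3) and then substituting them into (4.26) and (4.27), we obtain (4.23) and (4.24).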
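Implicit-function formulas of the type (4.26) and (4.27) can be checked against finite differences. In the scalar case with hypothetical data, and assuming $E(t,y,x) = y + G(T-t)M'(y) - e^{a(T-t)}x$ with $G(\sigma) = \gamma(e^{2a\sigma}-1)/(2a)$, one gets $\partial_x H = e^{a(T-t)}/(1 + G(T-t)M''(y))$ and $\partial_t H = -E_t/E_y$, where $\partial_t$ means differentiation of $t\mapsto H(T-t,x)$. The solution map is evaluated here by plain bisection.

```python
import numpy as np

# Hypothetical scalar data; gamma > 0 makes G >= 0, so the bisection bracket is valid.
a, b, c, r, s, T = 0.5, 1.0, 0.5, 1.0, -2.0, 1.0
gamma = b**2 / r + c**2 / s

def G(sig):  return gamma * (np.exp(2 * a * sig) - 1.0) / (2 * a)
def dG(sig): return gamma * np.exp(2 * a * sig)
def dM(y):   return y**3 + y          # M'(y), M(y) = y^4/4 + y^2/2 (illustrative)
def d2M(y):  return 3 * y**2 + 1.0    # M''(y)

def H(sig, x, iters=100):
    # Solution y of y + G(sig) M'(y) = e^{a sig} x, i.e. (4.2) with sig = T - tau.
    g, rhs = G(sig), np.exp(a * sig) * x
    lo, hi = -abs(rhs) - 1.0, abs(rhs) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + g * dM(mid) - rhs > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

t, x, h = 0.3, 0.9, 1e-5
sig = T - t
y = H(sig, x)
Ey = 1.0 + G(sig) * d2M(y)                        # E_y, the 1-D form of (4.21)
Et = -dG(sig) * dM(y) + a * np.exp(a * sig) * x   # E_t
Ex = -np.exp(a * sig)                             # E_x
Ht_ift, Hx_ift = -Et / Ey, -Ex / Ey               # implicit function theorem

# Central differences of t -> H(T - t, x) and x -> H(T - t, x).
Ht_fd = (H(T - (t + h), x) - H(T - (t - h), x)) / (2 * h)
Hx_fd = (H(sig, x + h) - H(sig, x - h)) / (2 * h)
print(Ht_ift, Ht_fd, Hx_ift, Hx_fd)
```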
Before presenting the main result of this section, we need a lemma that provides some properties of the inverses of certain types of operators. These properties will be used to prove the self-adjointness of the relevant operators in the main result. Lemma 4.9. Let $X$ and $Y$ be Banach spaces, and let $A_0\in\mathcal{L}(X)$, $B_0\in\mathcal{L}(Y,X)$, $C_0\in\mathcal{L}(X,Y)$, and $D_0\in\mathcal{L}(Y)$. Then the following statements hold: (a) if $A_0$, $D_0$, and $D_0 - C_0A_0^{-1}B_0$ are boundedly invertible, then $A_0 - B_0D_0^{-1}C_0$ is boundedly invertible and its inverse operator is given by $(A_0 - B_0D_0^{-1}C_0)^{-1} = A_0^{-1} + A_0^{-1}B_0(D_0 - C_0A_0^{-1}B_0)^{-1}C_0A_0^{-1}$; (b) … is also boundedly invertible and the following equality holds: Proof. The proof is similar to the matrix case and is omitted. Now we can present and prove the main result of this section.
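In the matrix case, the inversion formula of statement (a), $(A_0 - B_0D_0^{-1}C_0)^{-1} = A_0^{-1} + A_0^{-1}B_0(D_0 - C_0A_0^{-1}B_0)^{-1}C_0A_0^{-1}$, is the familiar Sherman-Morrison-Woodbury-type identity. The check below uses random, well-conditioned matrices (hypothetical data); it does not address statement (b).

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3
A0 = 3.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
B0 = 0.5 * rng.standard_normal((n, k))
C0 = 0.5 * rng.standard_normal((k, n))
D0 = 2.0 * np.eye(k) + 0.3 * rng.standard_normal((k, k))

inv = np.linalg.inv
lhs = inv(A0 - B0 @ inv(D0) @ C0)
rhs = inv(A0) + inv(A0) @ B0 @ inv(D0 - C0 @ inv(A0) @ B0) @ C0 @ inv(A0)
print(np.abs(lhs - rhs).max())   # agreement up to rounding error
```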
where the last equality follows from Lemma 4.9(b) and (4.29). Hence $P_x(t,x)$ is self-adjoint and, consequently, $P(t,\cdot)$ is a gradient operator for every $t\in[0,T]$.
Step 3. Finally, we show the existence of a global solution $x(\cdot)\in X_c$ to the initial value problem (2.3) over $[0,T]$ for any given $x_0\in\mathbb{R}^n$. Indeed, by Lemma 4.2, there exists a trajectory $x(\cdot)\in X_c$ corresponding to a saddle-point pair of strategies of $(\mathrm{DGP})_{\tau=0}$ with any given initial state $x_0$. Then by (4.14), the terminal value of this trajectory satisfies in the saddle-point sense. Here $T>0$ is a fixed number. Suppose that $a$, $b$, $c$, and $m$ are nonzero constants and that $R$ and $S$ are positive constants. We are going to show that Corollary 4.11 can be applied to this problem, with some restrictions on the parameters in (5.2). Then we will show that the regular solution of the corresponding pseudo-Riccati equation and the closed-loop optimal strategies can be found constructively.
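Since the data (5.2) of this example are not reproduced here, the following sketch uses its own hypothetical scalar data and the illustrative $M(y) = y^4/4 + y^2/2$. It assumes, as the maximum-principle argument of Section 4 suggests, that the regular solution has the form $P(t,x) = e^{A^*(T-t)}M'(H(T-t,x))$, where $H$ is the solution mapping of (4.2) taken as $y + G(T-\tau)M'(y) = e^{A(T-\tau)}x$. It integrates the closed-loop system $x' = Ax - (BR^{-1}B^* + CS^{-1}C^*)P(t,x)$ generated by the feedback (3.1) and monitors $H(T-t,x(t))$, which should stay constant and equal to $x(T)$.

```python
import numpy as np

# Hypothetical scalar data; gamma > 0 keeps G >= 0, so the bisection bracket is valid.
a, b, c, r, s, T = 0.5, 1.0, 0.5, 1.0, -2.0, 1.0
gamma = b**2 / r + c**2 / s

def G(sig): return gamma * (np.exp(2 * a * sig) - 1.0) / (2 * a)
def dM(y):  return y**3 + y          # M'(y) for M(y) = y^4/4 + y^2/2

def H(sig, x, iters=100):
    # Bisection for y + G(sig) M'(y) = e^{a sig} x.
    g, rhs = G(sig), np.exp(a * sig) * x
    lo, hi = -abs(rhs) - 1.0, abs(rhs) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + g * dM(mid) - rhs > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def P(t, x):
    # Assumed regular solution: P(t, x) = e^{a (T - t)} M'(H(T - t, x)).
    return np.exp(a * (T - t)) * dM(H(T - t, x))

def closed_loop(x0, N=400):
    # RK4 for x' = a x - gamma P(t, x), i.e. (2.3) with the feedback (3.1).
    f = lambda t, x: a * x - gamma * P(t, x)
    h, x, xs = T / N, x0, [x0]
    for i in range(N):
        t = i * h
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        xs.append(x)
    return np.array(xs)

x0 = 1.5
xs = closed_loop(x0)
ts = np.linspace(0.0, T, xs.size)
ys = np.array([H(T - t, x) for t, x in zip(ts, xs)])   # should stay constant
print(xs[-1], H(T, x0), ys.max() - ys.min())
```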
The only condition we impose on the parameters is
Since the right-hand side function $f(t,x)$ of (5.7) is a $C^1$ mapping in $t$ and in $x$, as shown by Lemma 4.8, it satisfies a local Lipschitz condition. Hence, for any initial status $x(0) = x_0\in\mathbb{R}$, (5.7) has a unique local solution $x(t)$, $t\in[0,t_1]$, for some $0<t_1\le T$. It suffices to prove that this solution does not blow up, so that $t_1 = T$.
Let the maximal interval of existence of this solution be $[0,t_\omega)$. Multiplying both sides of (5.