NEAR-NASH EQUILIBRIUM STRATEGIES FOR LQ DIFFERENTIAL GAMES WITH INACCURATE STATE INFORMATION

An ε-Nash equilibrium, or "near equilibrium," for a linear quadratic (LQ) differential game is considered. Because the state information is inaccurate, the standard feedback Nash equilibrium solution cannot be applied. Instead, an estimate of the players' state is substituted into the optimal control strategies obtained for the perfect-state-information case. The magnitude of ε in the ε-Nash equilibrium depends on the quality of the estimation process. To illustrate the approach, a Luenberger-type observer is used in the numerical example to generate the players' state estimates in a two-player non-zero-sum LQ differential game.


Introduction
Game theory deals with the development of suitable concepts to describe and understand conflict situations [1]. Dynamic differential game theory can be viewed as the offspring of game theory and optimal control theory [2]. More advanced studies in this area concern noncooperative linear quadratic (LQ) games, in which the players' dynamics are governed by a linear ordinary differential equation (ODE) and the cost functionals are quadratic in both the state vector and the control actions [15]. LQ games have many applications in engineering and economics (see [5, 6, 13]). All the publications mentioned above deal with perfect state information for all players during the game.
Most practical applications involve inaccurate systems in which only a part, or a combination, of the game's state-space coordinates is known [4, 7-12, 14, 16, 17]. None of these publications analyzes the ε-Nash equilibrium in relation to the uncertainty and the quality of the applied observation process. For example, in [16], several problems of inaccurate state information in LQ games are presented; quadratic cost functionals are used, and white noise additively corrupts the output. It remains unclear, however, how the estimation (measurement) errors affect the cost functionals of each player.
In [17], a two-player stochastic differential game is considered and a solution is presented; however, the question "how rapidly does the resulting optimal control converge to a limit, if it exists?" remains open. In [11], a partially observed system is considered, but the disturbances are assumed to be quadratically integrable on an infinite horizon, that is, they tend to zero. In [10], the inaccurate-information case for two players is considered when the functionals depend on a parameter: at the start of the game only player 1 knows the exact value of this parameter, while player 2 has only a (subjective) probability (fiducial) distribution for it.
In this paper, the ε-Nash equilibrium of non-zero-sum N-person LQ differential games with inaccurate state information is analyzed, when only estimates of the state are available. The outline of the paper is as follows. In Section 2, the players' model description and the basic assumptions are presented. In Section 3, the definition of the ε-Nash equilibrium is introduced. Section 4 presents the perfect-information case, giving a direct proof of the standard feedback Nash equilibrium solution. The formulation of the main problem tackled in this paper is given in Section 5. Section 6 deals with the near-equilibrium calculation. Section 7 presents the stationary case with average cost functionals. Section 8 illustrates the proposed approach with a numerical example, and the concluding remarks are given last.

Players, model description, and basic assumptions
First, consider a game in which the players' dynamics are governed by linear ordinary differential equations and each player's individual aim is a quadratic cost functional. The players' state dynamics are assumed to be subject to an external exogenous input. Thus, the LQ dynamic game (LQDG) given by the following state dynamics is considered:

    \dot{x}(t) = A(t)x(t) + \sum_{j=1}^{N} B_j(t)u_j(t) + d(t), \qquad x(0) = x_0,
    y(t) = C(t)x(t) + \xi(t).        (2.1)

Here j = 1,...,N denotes the player, d(t) ∈ R^n is a known exciting signal, x(t) ∈ R^n is the state vector of the game, u_j(t) ∈ R^{m_j} is the control action of the j-th player, y(t) ∈ R^p is the output of the game, which can be measured at each time instant, C(t) ∈ R^{p×n}, and ξ(t) ∈ R^p is an output noise that is assumed to be bounded but unmeasured. The performance of each i-th player is characterized by the following penalty cost functional:

    J_T^i(u_i, u_{\bar ı}) = \int_0^T \Big( x^T(t) Q_i(t) x(t) + \sum_{j=1}^{N} u_j^T(t) R_{ij}(t) u_j(t) \Big) dt,        (2.2)

where u_i is the control action of player i and u_{\bar ı} denotes the control actions of the remaining players (\bar ı is the counter-coalition collection of indices opposing the player with index i).
Here Q_i(t) = Q_i^T(t) ≥ 0 and R_{ij}(t) = R_{ij}^T(t), with R_{ii}(t) > 0 (i, j = 1,...,N), are the given weighting matrices. The vector ξ(t) represents an external, unknown, bounded (not necessarily uniformly so) output perturbation.
The definitions and assumptions below will be applied throughout this paper.
(A1) The matrices A(t), B_j(t) of the game as well as the exciting input d(t) are assumed to be known to every participant, and d(t) is bounded, that is, for all t ≥ 0,

    \| d(t) \| \le d^+ < \infty.        (2.3)

(A2) The class U_admis of admissible control actions u_i(x,t) (i = 1,...,N) contains all nonstationary feedback controls satisfying the uniform (in t) Lipschitz condition in x, that is, for any t ≥ 0 and any x, x' ∈ R^n and some constant L,

    \| u_i(x,t) - u_i(x',t) \| \le L \| x - x' \|,        (2.4)

and which are quadratically integrable in t on the time interval [0, T] for any x, that is,

    \int_0^T \| u_i(x,t) \|^2 dt < \infty.        (2.5)
The following simple fact will be useful in our subsequent considerations.
In the equilibrium strategies (4.2), P_i is a solution (if it exists) of the coupled differential Riccati equation (4.3) (for simplicity of presentation, the time dependence of all functions considered below is omitted), and the "shifting" vector p_i satisfies the ODE (4.5). The cost functionals J_T^{i*}(u_i^*, u_{\bar ı}^*) (i = 1,...,N) at the equilibrium are given by (4.7). Notice that in this case the equilibrium state trajectory is governed by the corresponding closed-loop dynamics, and it may be shown that ε = 0. The first result announced in this section can be found in [2], where it is obtained by the dynamic programming method [3]. Here a direct proof of the same result is presented, based on the use of an energetic function. The intermediate relations obtained within this proof will be needed below for the analysis of the same game with average cost functionals, in both the perfect- and inaccurate-information cases.
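For orientation, the closed-loop expressions used later in Section 6 (the terms B_j R_{jj}^{-1} B_j^T (P_j x + p_j) in (6.15)) suggest the standard feedback Nash structure; written out under this assumption (the exact sign and index conventions should be taken from (4.2)-(4.5)):

    u_i^*(x,t) = -R_{ii}^{-1}(t) B_i^T(t) \big[ P_i(t) x(t) + p_i(t) \big], \qquad i = 1,...,N,

so that the equilibrium state trajectory obeys

    \dot{x}^* = \tilde{A} x^* - \sum_{j=1}^{N} B_j R_{jj}^{-1} B_j^T p_j + d, \qquad \tilde{A} := A - \sum_{j=1}^{N} B_j R_{jj}^{-1} B_j^T P_j.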

Direct proof of the main result.
(a) Energetic function. Consider the functions V_i, where P_i(t) = P_i^T(t) ≥ 0 is an (n × n) matrix with continuously differentiable elements, p_i(t) ∈ R^n is a continuously differentiable vector, and m_i(t) ∈ R is a continuously differentiable function. In view of (2.1), the time derivative of V_i along the game trajectories can be computed, and integrating this relation gives an integral identity for V_i. Adding and subtracting the terms in (4.12), rearranging the resulting expression, and completing the square, one obtains a representation of the cost functional; for J_T^{i*}(u_i^*, u_{\bar ı}^*) this gives (4.15). If P_i and p_i are the solutions of (4.3) and (4.5), respectively, and m_i satisfies the corresponding scalar ODE with β defined in (4.6), then (4.7) is obtained, since by (4.16), adding and subtracting the appropriate terms and using (3.3), the required identity is derived. Minimizing the right-hand side of this identity with respect to u_i leads to the conclusion that the minimizing control action is u^* = (u_1^*,...,u_N^*), as in (4.22). It is easy to check directly that V_T^*(u^*, u^*) = 0, so, by (3.3), the strategies (4.22) ensure a Nash equilibrium, and the functional J_T^{i*}(u_i^*, u_{\bar ı}^*) takes the form (4.7). This is the well-known standard result, and the main result is proven.
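As an orientation for the square-completion argument, one plausible form of the energetic function, assuming the standard quadratic-plus-linear structure suggested by the definitions of P_i, p_i, and m_i above, is

    V_i(x,t) = x^T P_i(t) x + 2 p_i^T(t) x + m_i(t);

differentiating V_i along (2.1) and completing the square in u_i then produces the Riccati equation (4.3) for P_i, the ODE (4.5) for p_i, and a scalar equation for m_i.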

Proposition and additional assumptions.
In the case of inaccurate information, one deals with an unmeasurable noise ξ(t) corrupting the output y(t) of the game model (2.1). In addition, the output signal may differ from the state vector x(t), that is, C ≠ I.
Proposition 5.1. It is natural to use a current state estimate x̂(t) (if it is available) instead of x(t) in the equilibrium control laws (4.2), that is, to apply û_i^*(t) := u_i^*(x̂(t), t) (5.1), where û_i^* denotes the control action based on the estimate x̂.
(A3) Consider the class of state estimates x̂(t) satisfying the following estimation-quality bound (5.2), where Λ_x^i (i = 1,...,N) are given normalizing weights permitting one to work with state components of different physical nature, and ||z||_Λ^2 := z^T Λ z is the standard weighted Euclidean norm.
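A natural reading of this bound, consistent with the weighted norm just introduced and with the integral estimation-error terms used in the proof of Theorem 6.1 (an assumed reconstruction rather than a verbatim statement of (5.2)), is

    \int_0^T \| x(t) - \hat{x}(t) \|_{\Lambda_x^i}^2 \, dt \le \varepsilon_{obs}, \qquad i = 1,...,N.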

Problem formulation.
The main problem considered in this paper is as follows: for the given class of state estimates fulfilling (5.2), demonstrate that the control actions û_i^*(·) given by (5.1) provide an ε-Nash equilibrium with ε_T related to ε_obs; that is, for any admissible u_i ∈ U_admis an inequality of the form sketched below holds, where J̃_T^{i*} denotes the functional evaluated under the control actions û_i^*.
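In line with the standard ε-Nash definition referred to in Section 3, the inequality in question presumably takes the form (an assumed reconstruction; the precise notation is that of (3.1) and (5.3))

    J_T^i \big( \hat{u}_i^*, \hat{u}_{\bar ı}^* \big) \le J_T^i \big( u_i, \hat{u}_{\bar ı}^* \big) + \varepsilon_T \qquad \text{for all } u_i \in U_{admis}, \; i = 1,...,N.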

Near-equilibrium calculation
Since the closed-loop system under the optimal control always has a quadratically bounded state vector, there exists a constant γ_T such that the corresponding quadratic bound holds. Consider the auxiliary functions and matrices defined in (6.2)-(6.6), where B_j(t), R_{jj}^{-1}(t), and B_j^T(t) are the known matrices of the system, P_j(t) is the solution of (4.3), and:
(i) x^*(t) is the players' state trajectory when perfect state information is available to each player and all players use the Nash equilibrium solution (4.2);
(ii) x̃(t) is the players' state trajectory when each player substitutes the state estimate x̂ into the control (5.1);
(iii) x(t | i) is the players' state trajectory when perfect state information is available to each player and all players use the Nash equilibrium solution except player i (the index i reflects this dependence);
(iv) x̃(t | i) is the players' state trajectory when every player substitutes the state estimate x̂ into the control except player i.
Define several constants as in (6.6); in particular,

    \tilde{Q}_{ji}^{P_j} := P_j B_j R_{jj}^{-1} R_{ij} R_{jj}^{-1} B_j^T P_j,

together with further constants, including \tilde{Q}_i. Here the matrices Λ_i are weighting (n × n) matrices. Now the main result can be formulated.

Theorem 6.1 (the inaccurate-information case). Suppose that for the LQ game given by (2.1) with the strategies given by (5.1) the following additional assumptions are fulfilled:
(1) (A1), (A2), and (A3) hold;
(2) there exist positive definite matrices Λ_1 and Λ_2^i (i = 1,...,N) such that the set of differential Riccati equations (6.7) has positive definite matrix solutions S and S_i, with \tilde{A} and \tilde{A}_i defined in (4.4) and (6.4), respectively.
Then these control actions provide an ε-Nash equilibrium of the game, that is, for any admissible u_i satisfying (2.4), the ε-Nash inequality is guaranteed with the tolerance level given in (6.9).

Proof. The Nash equilibrium definition (3.1) gives the starting inequality. Adding and subtracting J̃_T^{i*}(û_i^*, û_{\bar ı}^*) and J_T^i(u_i, û_{\bar ı}^*) on both sides, where û_i^* is defined by (5.1), and using (3.3), the difference splits into two terms, ΔJ_{T1}^i and ΔJ_{T2}^i, which are estimated in what follows.

(A) ΔJ_{T1}^i-term estimation. Under perfect state information, the Nash equilibrium solution u^* := ((u_i^*)^T, (u_{\bar ı}^*)^T), given by (4.2), generates the players' dynamics x^*, and the corresponding cost functional has an integrand containing the terms x^{*T} \tilde{Q}_{ij}^{P_j} x^* + p_j^T L_{ij} p_j + 2 M_{ij} x^*. The trajectory x̃, driven by the controls (5.1) based on the state estimates x̂, satisfies the closed-loop equation (6.15), whose right-hand side involves the feedback terms B_j R_{jj}^{-1} B_j^T (P_j x̂ + p_j) together with the exciting input d; the corresponding cost functional J_T^i(û_i^*, û_{\bar ı}^*) in (6.16) has the same integrand structure evaluated along x̃. Thus ΔJ_{T1}^i can be computed using (6.6). Adding and subtracting the term x^* in the last identity and passing to the variable z defined in (6.2), the difference is rewritten in terms of z. For the joint energetic function V, defined in (6.2), the time derivative is bounded with the help of a constant matrix L_1. Selecting S as a solution of the differential Riccati equation (6.7) with \tilde{A} given by (4.4) and integrating V yields a bound on the integral of ||z||^2. Using (6.6) and applying the Jensen inequality to the resulting estimates gives (6.28). Using (6.25), which is also valid on the interval [T − Δ, T) for Δ small enough, dividing by Δ, and letting Δ tend to zero, the bound for ΔJ_{T1}^i is obtained.

(B) ΔJ_{T2}^i-term estimation. Under perfect state information, when all players use the equilibrium solution except player i, the strategy u := ((u_i)^T, (u_{\bar ı}^*)^T) generates the players' dynamics x(t | i); for simplicity, the dependence of x(t | i) on t and i is omitted below. The corresponding cost functional (6.33) has the integrand structure described above. When the players substitute the estimates x̂(t | i), i = 1,...,N, the trajectories x̃(t | i) satisfy the analogue of (6.15), and the corresponding cost functionals are evaluated along them. So, the term of the joint function related to ΔJ_{T2}^i can be computed as in (6.37). Using \tilde{Q}_i, defined in (6.6), and adding and subtracting x̃ on both sides of the last equality implies (6.42); hence, given the inequalities above and applying Jensen's inequality, a bound in terms of the integral of the squared estimation error follows.

This yields the estimate (6.47). Dividing by Δ and letting Δ tend to zero, the corresponding bound for ΔJ_{T2}^i is obtained. The theorem is proven.
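Schematically, and consistent with the remark in the Conclusion that ε is proportional to the integral estimation quality and to its square root, the resulting tolerance level can be summarized as

    \varepsilon_T^i = a_i \, \varepsilon_{obs} + b_i \, \sqrt{\varepsilon_{obs}},

where a_i and b_i are shorthand (not the paper's notation in (6.9)) for combinations of the constants defined in (6.6) and of the bound γ_T.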

Stationary systems with average cost functionals
In this section, linear systems with stationary parameters are considered; namely, the matrices A, B_i, C, Q_i, and R_{ij} are supposed to be constant, as is the exogenous input d. The results below deal with the so-called average cost functionals.
Theorem 7.1 (average cost functionals under perfect information). In the stationary case (when A, B_i, C, Q_i, R_{ij}, and d are constant), the optimal strategies for the LQ game (2.1) retain the feedback form of Section 4, where P_i and p_i are now the constant solutions of the algebraic versions (with the derivatives set equal to zero) of (4.3) and (4.5), respectively. The corresponding average cost functional J_∞^{i*} of each player is expressed in terms of these constant solutions.
Proof. It follows directly from (4.7) by dividing by T and taking the upper limits sequentially, first on the right-hand side and then on the left-hand side.
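As a computational aside, the constant matrices P_i of the stationary case can be approximated numerically. The sketch below (Python, NumPy/SciPy) runs a naive fixed-point iteration on coupled algebraic Riccati equations written in the standard feedback-Nash form; this assumed form may differ in detail from the algebraic version of (4.3), the numerical data are hypothetical placeholders, and convergence of the iteration is not guaranteed.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def coupled_care(A, B, Q, R, n_iter=200, tol=1e-9):
    """Fixed-point iteration for coupled AREs of a feedback-Nash LQ game.

    Assumed (standard) stationary form for each player i:
        0 = Atil^T P_i + P_i Atil + Q_i + sum_j P_j B_j R_jj^{-1} R_ij R_jj^{-1} B_j^T P_j,
        Atil = A - sum_j B_j R_jj^{-1} B_j^T P_j.
    B and Q are lists of per-player matrices; R[i][j] plays the role of R_ij.
    """
    N, n = len(B), A.shape[0]
    P = [np.eye(n) for _ in range(N)]          # initial guess
    for _ in range(n_iter):
        # closed-loop matrix built from the current iterates P_j
        Atil = A - sum(B[j] @ np.linalg.solve(R[j][j], B[j].T) @ P[j] for j in range(N))
        P_new = []
        for i in range(N):
            # right-hand side: Q_i plus the coupling terms P_j B_j R_jj^{-1} R_ij R_jj^{-1} B_j^T P_j
            M = Q[i].copy()
            for j in range(N):
                Kj = np.linalg.solve(R[j][j], B[j].T @ P[j])   # R_jj^{-1} B_j^T P_j
                M = M + Kj.T @ R[i][j] @ Kj
            # solve the Lyapunov equation Atil^T P + P Atil = -M for the new P_i
            P_new.append(solve_continuous_lyapunov(Atil.T, -M))
        if max(np.linalg.norm(P_new[i] - P[i]) for i in range(N)) < tol:
            return P_new
        P = P_new
    return P

# Hypothetical two-player data (placeholders, not the paper's example)
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]
Q = [np.eye(2), 2.0 * np.eye(2)]
R = [[np.eye(1), 0.5 * np.eye(1)], [0.5 * np.eye(1), np.eye(1)]]
P1, P2 = coupled_care(A, B, Q, R)
```

Each sweep freezes the current iterates P_j inside the closed-loop matrix and the coupling terms, which reduces every player's equation to a linear Lyapunov equation.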
Theorem 7.2 (the "near-Nash equilibrium" under inaccurate information). Assume that A, B_i, C, Q_i, R_{ij}, and d, appearing in the stationary version of the LQ game (2.1), are constant, and that conditions (1) and (2) of Theorem 6.1 hold with constant matrices S, S_i (i = 1,...,N). Consider the "realizable version" of the equilibrium control, namely the strategies (7.3) obtained by substituting the state estimates x̂ into the stationary equilibrium strategies of Theorem 7.1. Then these strategies provide an ε-Nash equilibrium of the game, that is, for any admissible strategy u_i (i = 1,...,N), the corresponding inequalities hold with a tolerance level in which all the constants involved are those defined in (6.6), the sup operation being omitted because of the stationary nature of the expressions considered.
Proof. It follows directly from (6.25) and (6.43) by dividing by T and taking the upper limits sequentially, first on the right-hand side and then on the left-hand side.

Numerical example
Consider the LQ game governed by (2.1) and by the average cost functionals (2.2), with the example's numerical parameter matrices. The solution of (4.3) is the corresponding pair of matrices P_1, P_2. For the given LQDG, the Luenberger observer structure

    \dot{\hat{x}}(t) = A \hat{x}(t) + \sum_j B_j u_j(t) + K_1 \big( y(t) - \hat{y}(t) \big), \qquad \hat{y}(t) = C \hat{x}(t), \qquad K_1 ∈ R^{2×2},        (8.3)

is used; here K_1 = I is taken. With these parameters and for T = 5000 s, the players' cost functionals take the values shown in Table 8.1. The observation quality (5.2) is ε_obs = 33.2, which gives ε = ε_1 + ε_2 = 40.449. The players' estimated states (the observer of player 1) are depicted in Figure 8.1, and the corresponding cost functionals are shown in Figure 8.2. The simulations (see Figure 8.1) show that the players' actions tend to the near-equilibrium point under bounded (nonvanishing) perturbations.
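The sketch below (Python, NumPy) mirrors the certainty-equivalence scheme of this example: a Luenberger observer with gain K_1 = I produces x̂, which is substituted into feedback laws of the assumed form u_j = -R_{jj}^{-1} B_j^T (P_j x̂ + p_j), and the integral estimation quality is accumulated. All numerical data (system matrices, the stand-in gains P_j, p_j, and the noise level) are hypothetical placeholders rather than the values of this example, so the resulting numbers are not comparable with Table 8.1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-player game data (placeholders, not the paper's (8.1))
A  = np.array([[0.0, 1.0], [-1.0, -0.5]])
B  = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]
R  = [np.eye(1), np.eye(1)]                 # R_11, R_22
P  = [np.eye(2), 1.5 * np.eye(2)]           # stand-ins for the Riccati solutions P_j
p  = [np.zeros(2), np.zeros(2)]             # stand-ins for the shifting vectors p_j
d  = np.array([0.1, 0.0])                   # bounded exciting signal
C  = np.eye(2)
K1 = np.eye(2)                              # observer gain, as in the example

dt, T = 1e-3, 20.0
x    = np.array([1.0, -1.0])                # true state
xhat = np.zeros(2)                          # observer state
eps_obs = 0.0                               # integral estimation quality (5.2), with Lambda = I

for _ in range(int(T / dt)):
    # certainty-equivalence controls u_j = -R_jj^{-1} B_j^T (P_j xhat + p_j)
    u = [-np.linalg.solve(R[j], B[j].T @ (P[j] @ xhat + p[j])) for j in range(2)]
    xi = 0.05 * rng.uniform(-1.0, 1.0, size=2)      # bounded output noise
    y = C @ x + xi
    # plant and Luenberger observer (explicit Euler step)
    xdot    = A @ x    + sum(B[j] @ u[j] for j in range(2)) + d
    xhatdot = A @ xhat + sum(B[j] @ u[j] for j in range(2)) + K1 @ (y - C @ xhat)
    x, xhat = x + dt * xdot, xhat + dt * xhatdot
    eps_obs += dt * np.dot(x - xhat, x - xhat)

print("integral estimation quality eps_obs ~", eps_obs)
```

Replacing the placeholder matrices with the example's data and extending the horizon to T = 5000 s would yield quantities of the type reported above (ε_obs and the realized cost functionals).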

Conclusion
The ε-Nash equilibrium, or "near equilibrium," for an LQ average-cost game has been considered in detail for both the perfect- and inaccurate-information cases. When perfect information on the players' states is unavailable, the standard approach to feedback Nash equilibrium design cannot be applied. Instead, state estimates are used, when available, in the control actions designed under complete state information. The dependence of the ε-level of the Nash equilibrium on the quality ε_obs of the state estimation process has been derived; it turns out to be proportional to the integral quality of the state estimation process and to its square root. A Luenberger-type observer is applied in the illustrative example to generate the current players' state estimates in a two-player non-zero-sum LQ differential game. The numerical results show the workability of the proposed approach.