A Necessary Condition for Nash Equilibrium in Two-Person Zero-Sum Constrained Stochastic Games

Altman and Shwartz [1] established a sufficient condition for the existence of a stationary Markovian constrained Nash equilibrium (CNE) policy pair in a general model of finite two-person zero-sum constrained stochastic games, and Alvarez-Mena and Hernández-Lerma [2] extended the result to infinite state and action spaces. Even though a few computational studies exist for average-payoff models with additional simplifying assumptions (see, e.g., [3–5]), there seems to be no work providing a meaningful necessary condition for CNE or any general approximation scheme for CNE within the general discounted-cost model. This brief paper establishes a necessary condition that a CNE policy pair satisfies by a novel characterization of the set of all feasible policies of one player when the other player's policy is fixed. This is done by identifying feasible mixed actions of one player at a current state when the expected total discounted constraint cost from each reachable next state is given by a value function defined over the state space. The necessary condition provides a general method of testing whether a given policy pair is a CNE policy pair and can induce a general approximation scheme for CNE.


Introduction
Altman and Shwartz [1] established a sufficient condition for the existence of a stationary Markovian constrained Nash equilibrium (CNE) policy pair in a general model of finite two-person zero-sum constrained stochastic games, and Alvarez-Mena and Hernández-Lerma [2] extended the result to infinite state and action spaces. Even though a few computational studies exist for average-payoff models with additional simplifying assumptions (see, e.g., [3–5]), there seems to be no work providing a meaningful necessary condition for CNE or any general approximation scheme for CNE within the general discounted-cost model.
This brief paper establishes a necessary condition that a CNE policy pair satisfies by a novel characterization of the set of all feasible policies of one player when the other player's policy is fixed. This is done by identifying feasible mixed actions of one player at a current state when the expected total discounted constraint cost from each reachable next state is given by a value function defined over the state space. The necessary condition provides a general method of testing whether a given policy pair is a CNE policy pair and can induce a general approximation scheme for CNE.

Preliminaries
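Consider a two-person zero-sum Markov game (MG) [6] $M = (X, A, B, P, C)$, where $X$ is a finite state set, and $A(x)$ and $B(x)$ are nonempty finite pure-action sets for the minimizer and the maximizer, respectively, at $x$ in $X$ with $A = \bigcup_{x \in X} A(x)$ and $B = \bigcup_{x \in X} B(x)$. We denote the mixed action sets at $x$ in $X$ over $A(x)$ and $B(x)$ by $\Delta[A(x)]$ and $\Delta[B(x)]$, respectively, with $\Delta[A] = \bigcup_{x \in X} \Delta[A(x)]$ and $\Delta[B] = \bigcup_{x \in X} \Delta[B(x)]$.
Once $f$ in $\Delta[A(x)]$ and $g$ in $\Delta[B(x)]$ are simultaneously taken at $x$ in $X$ by the minimizer and the maximizer, respectively (with complete knowledge of the state but without knowing each other's current action), $x$ makes a transition to a next state $y$ with probability $P_{xy}^{fg}$ given as
$$P_{xy}^{fg} = \sum_{a \in A(x)} \sum_{b \in B(x)} f(a)\, g(b)\, P(y \mid x, a, b).$$
Here $f(a)$ denotes the probability of selecting $a$, similar to $g(b)$, and $P(y \mid x, a, b)$ denotes the probability of moving from $x$ to $y$ by $a$ and $b$. Then the minimizer obtains an expected cost of $C(x, f, g)$ given by
$$C(x, f, g) = \sum_{y \in X} \sum_{a \in A(x)} \sum_{b \in B(x)} C(x, a, b, y)\, P(y \mid x, a, b)\, f(a)\, g(b),$$
where $C(x, a, b, y)$ in $\mathbb{R}$ is a payoff to the minimizer (the negative of this is incurred by the maximizer).
We define a stationary Markovian policy $\pi$ of the minimizer as a function $\pi : X \to \Delta[A]$ with $\pi(x) \in \Delta[A(x)]$ for all $x \in X$ and denote by $\Pi$ the set of all such policies. A policy $\phi$ is similarly defined for the maximizer with $\Delta[B]$, and we denote by $\Phi$ the set of all such policies. Define the objective value of $\pi$ in $\Pi$ and $\phi$ in $\Phi$ with an initial state $x$ in $X$ as
$$J(\pi, \phi)(x) = E\left[\sum_{t=0}^{\infty} \gamma^{t}\, C(X_t, \pi(X_t), \phi(X_t)) \,\middle|\, X_0 = x\right],$$
where $X_t$ is a random variable denoting the state at time $t$ by following $\pi$ and $\phi$, and $\gamma \in (0, 1)$ is a fixed discounting factor. We let $J^{\delta}(\pi, \phi) = \sum_{x \in X} \delta(x)\, J(\pi, \phi)(x)$ for a given initial state distribution $\delta$ over $X$.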
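The quantities defined above can be computed directly for a small finite game. The following is a minimal sketch, not part of the original note: the dictionary layout, the function names `joint_transition`, `expected_cost`, and `evaluate`, and the two-state toy game are all assumptions made for illustration. It approximates the discounted objective value of a fixed policy pair by repeatedly applying the one-step recursion (expected cost plus discounted expected value of the next state).

```python
from itertools import product

def joint_transition(P, x, f, g):
    """Next-state distribution at x under mixed actions f and g:
    sum over a, b of f(a) * g(b) * P(y | x, a, b)."""
    out = {}
    for (a, pa), (b, pb) in product(f.items(), g.items()):
        for y, p in P[(x, a, b)].items():
            out[y] = out.get(y, 0.0) + pa * pb * p
    return out

def expected_cost(C, P, x, f, g):
    """Expected one-step cost at x under mixed actions f and g."""
    total = 0.0
    for (a, pa), (b, pb) in product(f.items(), g.items()):
        for y, p in P[(x, a, b)].items():
            total += C[(x, a, b, y)] * p * pa * pb
    return total

def evaluate(states, P, C, pi, phi, gamma, sweeps=2000):
    """Approximate the discounted value of the pair (pi, phi) at every
    state by fixed-point iteration (a contraction since gamma < 1)."""
    J = {x: 0.0 for x in states}
    for _ in range(sweeps):
        J = {
            x: expected_cost(C, P, x, pi[x], phi[x])
            + gamma * sum(p * J[y]
                          for y, p in joint_transition(P, x, pi[x], phi[x]).items())
            for x in states
        }
    return J

# Hypothetical toy game: state 1 is absorbing with zero cost.
states = [0, 1]
P = {(0, "a0", "b"): {0: 1.0}, (0, "a1", "b"): {1: 1.0}, (1, "a0", "b"): {1: 1.0}}
C = {(0, "a0", "b", 0): 2.0, (0, "a1", "b", 1): 0.0, (1, "a0", "b", 1): 0.0}
pi = {0: {"a0": 0.5, "a1": 0.5}, 1: {"a0": 1.0}}   # minimizer mixes at state 0
phi = {0: {"b": 1.0}, 1: {"b": 1.0}}               # maximizer has one action
J = evaluate(states, P, C, pi, phi, gamma=0.5)
# At state 0 the value solves J = 1 + 0.25 * J, so J[0] is approximately 4/3.
```

The same routine evaluates a discounted constraint cost if the constraint-cost table and its own discount factor are passed in place of `C` and `gamma`.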
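The MG $M$ is associated with constraint functions $D_i$, $i = 1, 2$, defined over $X$ and constraint-cost functions $C_i$, $i = 1, 2$, where $C_i(x, a, b, y) \in \mathbb{R}$, $x, y \in X$, $a \in A(x)$, and $b \in B(x)$, is a constraint cost paid by the minimizer if $i = 1$ and by the maximizer if $i = 2$. (For simplicity, we consider the model in [1,2] with only one side constraint for each player.)
A policy $\pi$ in $\Pi$ is called $\delta$-feasible with respect to $\phi$ in $\Phi$ if the pair of $\pi$ and $\phi$ satisfies the constraint inequality
$$\sum_{x \in X} \delta(x)\, J_1(\pi, \phi)(x) \le \sum_{x \in X} \delta(x)\, D_1(x),$$
where $J_1$ is defined with $\beta_1 \in (0, 1)$ such that
$$J_1(\pi, \phi)(x) = E\left[\sum_{t=0}^{\infty} \beta_1^{t}\, C_1(X_t, \pi(X_t), \phi(X_t)) \,\middle|\, X_0 = x\right].$$
The expected constraint cost $C_1(x, f, g)$ is given by
$$C_1(x, f, g) = \sum_{y \in X} \sum_{a \in A(x)} \sum_{b \in B(x)} C_1(x, a, b, y)\, P(y \mid x, a, b)\, f(a)\, g(b).$$
Similarly, $\phi$ in $\Phi$ is called $\delta$-feasible with respect to $\pi$ in $\Pi$ if
$$\sum_{x \in X} \delta(x)\, J_2(\pi, \phi)(x) \le \sum_{x \in X} \delta(x)\, D_2(x),$$
where $J_2$ is defined with $C_2$ and $\beta_2 \in (0, 1)$.
We say that $\pi$ is feasible with respect to $\phi$ if, for all $x$ in $X$, $J_1(\pi, \phi)(x) \le D_1(x)$, and $\phi$ is feasible with respect to $\pi$ if, for all $x$ in $X$, $J_2(\pi, \phi)(x) \le D_2(x)$. Note that if $\pi$ is feasible with respect to $\phi$, then $\pi$ is $\delta$-feasible with respect to $\phi$ for any $\delta$. Let
$$\Pi^{\delta}_{\phi} = \{\pi \in \Pi : \pi \text{ is } \delta\text{-feasible with respect to } \phi\}, \qquad \Phi^{\delta}_{\pi} = \{\phi \in \Phi : \phi \text{ is } \delta\text{-feasible with respect to } \pi\}.$$
That is, $\Pi^{\delta}_{\phi}$ is the set of all $\delta$-feasible policies in $\Pi$ with respect to $\phi$ (when the maximizer's policy is fixed by $\phi$); similarly $\Phi^{\delta}_{\pi}$ is obtained when the minimizer's policy is fixed by $\pi$. Then $M$ has a constrained Nash equilibrium (CNE) if there exists a pair of $\pi^*$ in $\Pi$ and $\phi^*$ in $\Phi$ such that $\pi^*$ is $\delta$-feasible with respect to $\phi^*$ and $\phi^*$ is $\delta$-feasible with respect to $\pi^*$, and
$$J^{\delta}(\pi^*, \phi) \le J^{\delta}(\pi^*, \phi^*) \le J^{\delta}(\pi, \phi^*) \quad \text{for all } \pi \in \Pi^{\delta}_{\phi^*} \text{ and } \phi \in \Phi^{\delta}_{\pi^*}.$$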
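The difference between the two feasibility notions is easy to see numerically. The sketch below is an illustration only: the names `is_feasible` and `is_delta_feasible`, the tolerance argument, and the numbers are assumptions, and the per-state constraint values are taken as already computed. Feasibility requires the constraint to hold at every state, whereas the weighted notion only requires it to hold on average under the initial state distribution.

```python
def is_feasible(constraint_value, bound, states, tol=1e-12):
    """Statewise check: the discounted constraint cost must not exceed
    the bound at any state."""
    return all(constraint_value[x] <= bound[x] + tol for x in states)

def is_delta_feasible(constraint_value, bound, delta, states, tol=1e-12):
    """Weighted check: the constraint must hold on average under the
    initial state distribution delta."""
    lhs = sum(delta[x] * constraint_value[x] for x in states)
    rhs = sum(delta[x] * bound[x] for x in states)
    return lhs <= rhs + tol

# Hypothetical numbers: the constraint is violated at state 1 only.
states = [0, 1]
value = {0: 1.0, 1: 3.0}   # discounted constraint cost per initial state
bound = {0: 2.0, 1: 2.5}   # constraint function per state

statewise = is_feasible(value, bound, states)
mostly_state_0 = is_delta_feasible(value, bound, {0: 0.8, 1: 0.2}, states)
only_state_1 = is_delta_feasible(value, bound, {0: 0.0, 1: 1.0}, states)
```

Here the policy is not feasible, yet it passes the weighted check for the distribution concentrated on state 0. The converse direction never fails: a statewise-feasible policy passes the weighted check for every distribution.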
The second statement can be proven by similar, symmetric reasoning.
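Some notable examples of choosing $V$ in $B(X)$ for $\Delta[A(x)]_{\phi,V}$ and $\Delta[B(x)]_{\pi,V}$ are $D_1$, $D_2$, $J_1(\pi, \phi)$, and $J_2(\pi, \phi)$, for some pair of $\pi$ and $\phi$, and the zero function such that $V(x) = 0$ for all $x$ in $X$. In particular, if […].
We now let
$$\Pi_{\phi,V} = \{\pi \in \Pi : \pi(x) \in \Delta[A(x)]_{\phi,V} \text{ for all } x \in X\}, \qquad \Phi_{\pi,V} = \{\phi \in \Phi : \phi(x) \in \Delta[B(x)]_{\pi,V} \text{ for all } x \in X\}.$$
If $\Delta[A(x)]_{\phi,V} = \emptyset$ for some $x$ in $X$, then $\Pi_{\phi,V} = \emptyset$, and similarly $\Phi_{\pi,V} = \emptyset$ if $\Delta[B(x)]_{\pi,V} = \emptyset$ for some $x$ in $X$. The following theorem characterizes the set of all feasible policies of one player with respect to a fixed policy of the other player.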
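Theorem 2. Given $\phi$ in $\Phi$, $\pi$ in $\Pi$ is feasible with respect to $\phi$ if and only if $\pi$ is in $\bigcup_{V \in B(X)} \Pi_{\phi,V}$. Furthermore, given $\pi \in \Pi$, $\phi$ in $\Phi$ is feasible with respect to $\pi$ if and only if $\phi$ is in $\bigcup_{V \in B(X)} \Phi_{\pi,V}$.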
Proof. We prove only the first part of the statement, for the minimizer. Suppose that $\pi$ is feasible with respect to $\phi$. Then by setting $V = J_1(\pi, \phi)$, $(1 - \beta_1) \min_{x \in X} (D_1(x) - J_1(\pi, \phi)(x)) \ge 0$. It follows that […], and this implies that there exists $V$ in $B(X)$ such that $\pi(x)$ is in $\Delta[A(x)]_{\phi,V}$ for all $x$ in $X$.
For the other direction, if $\pi$ is in $\bigcup_{V \in B(X)} \Pi_{\phi,V}$, then, for some $V \in B(X)$, $\pi(x)$ is in $\Delta[A(x)]_{\phi,V}$ for all $x$ in $X$. By Lemma 1, $\pi$ is feasible with respect to $\phi$.
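Because a policy of one player which is feasible with respect to a policy of the other player is $\delta$-feasible with respect to the policy of the other player, the following necessary condition satisfied by a CNE policy pair is immediate.
Corollary 3. If a pair of $\pi^*$ in $\Pi$ and $\phi^*$ in $\Phi$ is a CNE pair for a given $\delta$, then the pair satisfies the following saddle-point inequality:
$$J^{\delta}(\pi^*, \phi) \le J^{\delta}(\pi^*, \phi^*) \le J^{\delta}(\pi, \phi^*) \quad \text{for all } \pi \in \bigcup_{V \in B(X)} \Pi_{\phi^*,V} \text{ and } \phi \in \bigcup_{V \in B(X)} \Phi_{\pi^*,V}.$$
Given a pair of $\pi$ in $\Pi$ and $\phi$ in $\Phi$, if we find some $V, U$ in $B(X)$ such that the pair does not satisfy the saddle-point inequality over nonempty $\Pi_{\phi,V}$ and $\Phi_{\pi,U}$, then the pair is not a CNE pair.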
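The test described above can be sketched for finite sets of candidate deviations. This is an illustrative sketch under stated assumptions, not the note's procedure: the function name `violates_saddle_point`, the callable `value` (returning the weighted objective of a policy pair), and the toy value table are hypothetical, and the caller is assumed to pass only deviations already known to be feasible with respect to the opponent's fixed policy.

```python
def violates_saddle_point(value, pi_star, phi_star,
                          pi_candidates, phi_candidates, tol=1e-9):
    """Search for a feasible unilateral deviation that breaks the
    saddle-point inequality at (pi_star, phi_star).

    Returns ("minimizer", pi) if some candidate pi gives a strictly lower
    objective against phi_star, ("maximizer", phi) if some candidate phi
    gives a strictly higher objective against pi_star, else None."""
    base = value(pi_star, phi_star)
    for pi in pi_candidates:
        if value(pi, phi_star) < base - tol:
            return ("minimizer", pi)
    for phi in phi_candidates:
        if value(pi_star, phi) > base + tol:
            return ("maximizer", phi)
    return None

# Hypothetical objective table over labelled policies.
table = {("p0", "q0"): 2.0, ("p0", "q1"): 1.0,
         ("p1", "q0"): 3.0, ("p1", "q1"): 4.0}

def value(pi, phi):
    return table[(pi, phi)]

no_witness = violates_saddle_point(value, "p0", "q0", ["p0", "p1"], ["q0", "q1"])
witness = violates_saddle_point(value, "p1", "q1", ["p0", "p1"], ["q0", "q1"])
```

A returned witness shows that the pair is not a CNE pair. A result of `None` only means that no violation was found among the supplied candidates, since the condition is necessary rather than sufficient.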

An Example of Approximation Scheme for CNE
We now provide an example of a general approximation scheme for CNE based on the necessary condition. Basically, we fix some $U, V$ in $B(X)$ and try to find an equilibrium policy pair that satisfies the saddle-point inequalities over subsets of the feasible policy spaces induced with the selected $U, V$ and the pair, which is an approximate CNE pair. We start by selecting arbitrary $V_1, V_2$ in $B(X)$ and define feasible mixed joint-action sets induced with $V_1, V_2$: […] for all $x \in X$, where $f_a$ denotes the mixed action with $f_a(a) = 1$ and $g_b$ the mixed action with $g_b(b) = 1$.
We then obtain a pair of nonempty $\Delta$ […] and similarly […]. That is, $A_{\phi,V}(x)$ is a $\phi$-constrained feasible pure-action set with $V$ at $x$ for the minimizer, and similarly $B_{\pi,V}(x)$ is for the maximizer.
For any subset $A'(x) \subseteq A(x)$, $x \in X$, with $\bigcup_{x \in X} A'(x) = A'$, we denote by $\Delta[A'(x)]$ the set of all probability distributions over $A(x)$ that have zero probabilities for the actions in $A(x) \setminus A'(x)$. If $A'(x) = \emptyset$, then $\Delta[A'(x)] = \emptyset$. The notation $\Delta[B'(x)]$ for $B'(x) \subseteq B(x)$ is similarly defined. (Note that $\Delta[A_{\phi,V}(x)] \subseteq \Delta[A(x)]_{\phi,V} \subseteq \Delta[A(x)]$ for all $x$ in $X$ in general.) We further let […]. If $\Delta[A'(x)] = \emptyset$ for some $x$ in $X$, $\Pi[A'] = \emptyset$, and similarly […].
By construction, the following result is immediate. For all $x \in X$, […], which further implies that, for any $(\hat{\pi}, \hat{\phi})$ […] for all $x$ in $X$, $\hat{\pi}$ is feasible with respect to $\hat{\phi}$ and $\hat{\phi}$ is feasible with respect to $\hat{\pi}$.
Consider now the unconstrained game $M_{V_1,V_2}$ […], where $P$ and $C$ are evaluated only at […] for all $x$ in $X$, and denote the set of all NE policy pairs of $M_{V_1,V_2}$ by $\mathrm{NE}(M_{V_1,V_2})$. The above two results finally imply that any $(\hat{\pi}, \hat{\phi}) \in \mathrm{NE}(M_{V_1,V_2})$ is a local CNE for $M$. In other words, for any $\delta$, $\hat{\pi}$ is $\delta$-feasible with respect to $\hat{\phi}$ and $\hat{\phi}$ is $\delta$-feasible with respect to $\hat{\pi}$, and […]. That is, the local CNE pair $(\hat{\pi}, \hat{\phi})$ satisfies all of the conditions of CNE except that the saddle-point inequality is satisfied locally for the subsets of $\Pi^{\delta}_{\hat{\phi}}$ and $\Phi^{\delta}_{\hat{\pi}}$. In fact, related solution concepts for games that are resistant to local deviations, called "local NE," have already been established in economics (see, e.g., [7]).
Projecting $\Delta_{V_1,V_2}(x)$ turns out to be equivalent to obtaining a complete bipartite subgraph, or biclique, from a bipartite graph. The problem of finding a biclique in a given bipartite graph is well studied in the graph theory literature (see, e.g., [8]).
Another issue is how we set $V_1$ and $V_2$ such that $\Delta_{V_1,V_2}(x)$ is nonempty for all $x$ in $X$. If there exists a pure-policy pair of $\pi$ in $\Pi$ and $\phi$ in $\Phi$ such that, for all $x$ in $X$, $\pi(x)(a) = 1$ for some $a \in A(x)$ and $\phi(x)(b) = 1$ for some $b \in B(x)$ and one policy is feasible with respect to the other policy, then by setting $V_i = J_i(\pi, \phi)$, $i = 1, 2$, we have $(\pi(x), \phi(x)) \in \Delta_{V_1,V_2}(x) \ne \emptyset$ for all $x$ in $X$. We can put the following feasibility assumption on $M$ to assure the existence of such a pure-policy pair: for all $x$ in $X$, […]. In other words, by this assumption there exists at least one pure-policy pair such that one policy is feasible with respect to the other policy. The existence comes from the fact that there exists a pure-policy pair $(\pi, \phi)$ that achieves $\inf_{\pi', \phi'} J_1(\pi', \phi')(x)$ for all $x$ in $X$ in solving this Markov decision process problem [9], and likewise for $\inf_{\pi', \phi'} J_2(\pi', \phi')(x)$ for all $x$ in $X$.
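The biclique view of the projection step can be illustrated with a brute-force search. This is a minimal sketch, not the method of [8]: the function name `largest_biclique`, the representation of the feasible joint-action set at one state as a set of (minimizer action, maximizer action) pairs, and the example data are assumptions. It looks for subsets of actions of each player whose full product lies inside the feasible joint-action set, and it enumerates all subsets of one side, so it is only practical for small action sets.

```python
from itertools import combinations

def largest_biclique(left, right, edges):
    """Return (L, R) with L x R contained in edges and |L| * |R| maximal,
    found by enumerating every nonempty subset of `left`."""
    best = (set(), set())
    for r in range(1, len(left) + 1):
        for subset in combinations(left, r):
            # Right-side actions compatible with every chosen left action.
            common = set(right)
            for a in subset:
                common &= {b for b in right if (a, b) in edges}
            if common and len(subset) * len(common) > len(best[0]) * len(best[1]):
                best = (set(subset), common)
    return best

# Hypothetical feasible joint pure actions at a single state.
left = ["a1", "a2", "a3"]    # minimizer's pure actions
right = ["b1", "b2"]         # maximizer's pure actions
edges = {("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"), ("a3", "b1")}

A_sub, B_sub = largest_biclique(left, right, edges)
# A_sub == {"a1", "a2"} and B_sub == {"b1", "b2"}
```

Maximizing the product of the two sizes is one possible selection rule; any nonempty biclique yields a valid pair of restricted action sets, and other criteria could be substituted.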

Concluding Remark
For simplicity, the model of the present note deals with only one side constraint for each player. Generalizing this to multiple side constraints per player would make the definitions more complex, but the ideas should be the same as in the case of one side constraint.