2. Preliminaries
Consider a two-person zero-sum Markov game (MG) [6] M=(X,A,B,P,C), where X is a finite state set, and A(x) and B(x) are nonempty finite pure-action sets for the minimizer and the maximizer, respectively, at x in X with A=⋃x∈XA(x) and B=⋃x∈XB(x). We denote the mixed action sets at x in X over A(x) and B(x) to be D[A(x)] and D[B(x)], respectively, with D[A]=⋃x∈XD[A(x)] and D[B]=⋃x∈XD[B(x)].

Once f in D[A(x)] and g in D[B(x)] are taken simultaneously at x in X by the minimizer and the maximizer, respectively (each with complete knowledge of the state but without knowing the other's current action), the state moves from x to a next state y with probability $P_{xy}^{fg}$ given by
(1) $P_{xy}^{fg} = \sum_{a\in A(x)} \sum_{b\in B(x)} f(a)\, g(b)\, p(y \mid x,a,b).$
Here f(a) denotes the probability of selecting a (and similarly g(b) for b), and p(y∣x,a,b) denotes the probability of moving from x to y under a and b. Then the minimizer incurs an expected cost C(x,f,g) given by
(2) $C(x,f,g) = \sum_{y\in X} \sum_{a\in A(x)} \sum_{b\in B(x)} c(x,y,a,b)\, p(y \mid x,a,b)\, f(a)\, g(b),$
where c(x,y,a,b) in ℝ is the cost incurred by the minimizer (the maximizer incurs its negative).
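
As a small numerical sketch of Eqs. (1) and (2), the following Python snippet evaluates the mixed-action transition probability and the expected cost on a two-state toy instance. The dictionary layout (keys `(a, b)` with the minimizer's action first, states indexed from 0), the function names, and the numbers are assumptions made for illustration only.

```python
from itertools import product

# Illustrative data: p[(a, b)][x][y] = p(y | x, a, b), c[(a, b)][x][y] = c(x, y, a, b).
p = {
    ("a", "a"): [[0.1, 0.9], [0.9, 0.1]],
    ("a", "b"): [[0.9, 0.1], [0.1, 0.9]],
}
c = {
    ("a", "a"): [[0, 5], [5, 0]],
    ("a", "b"): [[0, 5], [5, 0]],
}


def mixed_transition(p, f, g, x, y):
    """Eq. (1): sum over a, b of f(a) g(b) p(y | x, a, b)."""
    return sum(f[a] * g[b] * p[(a, b)][x][y] for a, b in product(f, g))


def expected_cost(p, c, f, g, x):
    """Eq. (2): expected one-step cost at state x under mixed actions f and g."""
    n_states = len(next(iter(p.values())))
    return sum(
        c[(a, b)][x][y] * p[(a, b)][x][y] * f[a] * g[b]
        for y in range(n_states)
        for a, b in product(f, g)
    )


f = {"a": 1.0}            # minimizer plays a with probability 1
g = {"a": 0.5, "b": 0.5}  # maximizer randomizes evenly

row = [mixed_transition(p, f, g, 0, y) for y in range(2)]
cost = expected_cost(p, c, f, g, 0)
print(row, cost)
```

Because the minimizer's action is fixed here, both quantities are plain averages over the maximizer's two actions, and the transition row still sums to one.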

We define a stationary Markovian policy π of the minimizer as a function π:X→D[A] with π(x)∈D[A(x)] for all x∈X and denote Π to be the set of all possible such policies. A policy ϕ is similarly defined for the maximizer with D[B], and we denote Φ to be the set of all possible such policies. Define the objective value of π in Π and ϕ in Φ with an initial state x in X as
(3) $V(\pi,\phi)(x) = E\left[\sum_{t=0}^{\infty} \gamma^{t}\, C(X_t,\pi(X_t),\phi(X_t)) \,\middle|\, X_0 = x\right],$
where $X_t$ is a random variable denoting the state at time t by following π and ϕ, and γ∈(0,1) is a fixed discounting factor. We let $V_{\delta}(\pi,\phi) = \sum_{x\in X} \delta(x)\, V(\pi,\phi)(x)$ for a given initial state distribution δ over X.
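
The value in Eq. (3) is the fixed point of the usual discounted evaluation recursion, so it can be approximated by simple iteration once the policy pair has been fixed. The sketch below assumes that the per-state expected cost and the induced transition matrix have already been computed for the pair; the function names, the stopping rule, and the numbers are illustrative assumptions.

```python
def evaluate(stage_cost, trans, discount, tol=1e-12, max_iter=100_000):
    """Iterate v <- stage_cost + discount * trans @ v until successive iterates agree."""
    n = len(stage_cost)
    v = [0.0] * n
    for _ in range(max_iter):
        new = [
            stage_cost[x] + discount * sum(trans[x][y] * v[y] for y in range(n))
            for x in range(n)
        ]
        if max(abs(new[x] - v[x]) for x in range(n)) < tol:
            return new
        v = new
    return v


def weighted_value(delta, v):
    """V_delta: expectation of the state values under the initial distribution delta."""
    return sum(d * vx for d, vx in zip(delta, v))


# Illustrative inputs for a fixed policy pair on two states.
stage_cost = [4.5, 4.5]
trans = [[0.1, 0.9], [0.9, 0.1]]
V = evaluate(stage_cost, trans, discount=0.9)
V_delta = weighted_value([0.5, 0.5], V)
print(V, V_delta)
```

With a constant stage cost of 4.5 and a discount of 0.9, the value is 4.5/(1-0.9)=45 at both states, which gives a quick sanity check on the iteration.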

The MG M is associated with constraint functions κi, i=1,2, defined over X and constraint-cost functions di, i=1,2, where di(x,y,a,b)∈ℝ, x,y∈X, a∈A(x), and b∈B(x), is a constraint-cost paid by the minimizer if i=1 and by the maximizer if i=2. (For simplicity, we consider the model in [1, 2] with only one side constraint for each player.)

A policy π in Π is called δ-feasible with respect to ϕ in Φ if the pair of π and ϕ satisfies the constraint inequality of ∑x∈Xδ(x)J1(π,ϕ)(x)≤∑x∈Xδ(x)κ1(x), where J1 is defined with β1∈(0,1) such that
(4) $J_1(\pi,\phi)(x) = E\left[\sum_{t=0}^{\infty} \beta_1^{t}\, D_1(X_t,\pi(X_t),\phi(X_t)) \,\middle|\, X_0 = x\right], \quad x\in X.$
The expected constraint cost D1(x,f,g) is given by
(5)D1(x,f,g) =∑y∈X ∑a∈A(x) ∑b∈B(x)d1(x,y,a,b)p(y∣x,a,b)f(a)g(b).
Similarly, ϕ in Φ is δ-feasible with respect to π in Π if ∑x∈Xδ(x)J2(π,ϕ)(x)≤∑x∈Xδ(x)κ2(x), where J2 is defined with D2 and β2∈(0,1). We say that π is feasible with respect to ϕ if, for all x in X, J1(π,ϕ)(x)≤κ1(x), and ϕ is feasible with respect to π if, for all x in X, J2(π,ϕ)(x)≤κ2(x). Note that if π is feasible with respect to ϕ, then π is δ-feasible with respect to ϕ for any δ.
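
The difference between feasibility (a statewise condition) and δ-feasibility (a δ-weighted condition) can be seen in a short check. The sketch below takes the constraint values J and bounds κ as plain lists; the function names and all numbers are hypothetical.

```python
def is_feasible(J, kappa, tol=1e-9):
    """Statewise condition: J(x) <= kappa(x) for every state x."""
    return all(j <= k + tol for j, k in zip(J, kappa))


def is_delta_feasible(J, kappa, delta, tol=1e-9):
    """Weighted condition: sum_x delta(x) J(x) <= sum_x delta(x) kappa(x)."""
    lhs = sum(d * j for d, j in zip(delta, J))
    rhs = sum(d * k for d, k in zip(delta, kappa))
    return lhs <= rhs + tol


kappa = [40.0, 50.0]
J_statewise_ok = [18.0, 18.0]   # satisfies the bound at every state
J_violates_one = [45.0, 30.0]   # exceeds kappa at the first state only

print(is_feasible(J_statewise_ok, kappa))                      # statewise check
print(is_delta_feasible(J_violates_one, kappa, [0.5, 0.5]))    # weighted check
print(is_delta_feasible(J_violates_one, kappa, [1.0, 0.0]))    # weighted check, point mass
```

The second vector is δ-feasible for the uniform δ but not for the point mass on the first state, which shows that δ-feasibility for one δ does not imply feasibility, whereas feasibility implies δ-feasibility for every δ.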

Let
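(6) $\Pi_{\phi} = \{\pi\in\Pi : \sum_{x\in X}\delta(x)\, J_1(\pi,\phi)(x) \le \sum_{x\in X}\delta(x)\, \kappa_1(x)\}, \quad \phi\in\Phi,$
$\Phi_{\pi} = \{\phi\in\Phi : \sum_{x\in X}\delta(x)\, J_2(\pi,\phi)(x) \le \sum_{x\in X}\delta(x)\, \kappa_2(x)\}, \quad \pi\in\Pi.$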
That is, Πϕ is the set of all δ-feasible policies in Π with respect to ϕ (when the maximizer's policy is fixed by ϕ); similarly Φπ is obtained when the minimizer's policy is fixed by π. Then M has a constrained Nash equilibrium (CNE) if there exists a pair of π* in Π and ϕ* in Φ such that π* is δ-feasible with respect to ϕ* and ϕ* is δ-feasible with respect to π*, and
(7)Vδ(π*,ϕ)≤Vδ(π*,ϕ*)≤Vδ(π,ϕ*) ∀π∈Πϕ*, ∀ϕ∈Φπ*.

3. A Necessary Condition for CNE
Let B(X) be the set of all real-valued functions on X. Given ϕ in Φ and v in B(X), define the ϕ-constrained feasible mixed action set with v for the minimizer: for all x in X,
(8) $D[A(x)]_{\phi,v} = \{ f\in D[A(x)] : D_1(x,f,\phi(x)) + \beta_1 \sum_{y\in X} P_{xy}^{f\phi(x)}\, v(y) \le v(x) + (1-\beta_1)\min_{x'\in X}(\kappa_1(x') - v(x')) \}.$
We further define the π-constrained feasible mixed action set with v for the maximizer, for π in Π and v in B(X): for all x in X,
(9) $D[B(x)]_{\pi,v} = \{ g\in D[B(x)] : D_2(x,\pi(x),g) + \beta_2 \sum_{y\in X} P_{xy}^{\pi(x)g}\, v(y) \le v(x) + (1-\beta_2)\min_{x'\in X}(\kappa_2(x') - v(x')) \}.$

Lemma 1.
Given ϕ in Φ and v in B(X), suppose that D[A(x)]ϕ,v≠∅ for all x in X. Then any π in Π such that π(x)∈D[A(x)]ϕ,v for all x in X is feasible with respect to ϕ. Similarly, given π in Π and v in B(X), if D[B(x)]π,v≠∅ for all x in X, then any ϕ in Φ such that ϕ(x)∈D[B(x)]π,v for all x in X is feasible with respect to π.

Proof.
Consider an operator Tπ,ϕ:B(X)→B(X) given as
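(10) $T_{\pi,\phi}(v)(x) = D_1(x,\pi(x),\phi(x)) + \beta_1 \sum_{y\in X} P_{xy}^{\pi(x)\phi(x)}\, v(y), \quad \forall x\in X,\ \forall v\in B(X).$
Then we have that, for τ in ℝ and u in B(X), if $T_{\pi,\phi}(u)(x) \le u(x) + \tau$ for all x in X, then, for all x∈X, $J_1(\pi,\phi)(x) \le u(x) + \tau(1-\beta_1)^{-1}$ and $\lim_{n\to\infty} (T_{\pi,\phi})^{n}(u)(x) = J_1(\pi,\phi)(x)$ [6].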

Because π(x) is in D[A(x)]ϕ,v for all x in X, Tπ,ϕ(v)(x)≤v(x)+(1-β1)minx′∈X(κ1(x′)-v(x′)) for all x in X. With τ=(1-β1)minx′∈X(κ1(x′)-v(x′)), we have that, for all x in X,
(11)J1(π,ϕ)(x)≤v(x)+minx′∈X(κ1(x′)-v(x′))≤κ1(x).
The second statement follows by symmetric reasoning.

Some notable examples of choosing v in B(X) for D[A(x)]ϕ,v and D[B(x)]π,v are κ1, κ2, J1(π,ϕ), and J2(π,ϕ), for some pair of π and ϕ, and the zero function such that v(x)=0 for all x in X. In particular, if v(x)=0 for all x∈X, then D[A(x)]ϕ,v={f∈D[A(x)]:D1(x,f,ϕ(x))≤(1-β1)minx′∈Xκ1(x′)}.
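
Membership of a mixed action in the constrained set can be tested directly from the defining inequality. The sketch below assumes that the expected constraint cost at x and the transition row induced by the action pair have already been computed; the function name, the tolerance, and the numbers are illustrative assumptions.

```python
def in_constrained_set(x, d_x, p_row, v, kappa, beta, tol=1e-9):
    """Check d_x + beta * sum_y p_row[y] v[y] <= v[x] + (1 - beta) * min_x'(kappa - v).

    d_x   : expected constraint cost at state x under the candidate action pair
    p_row : induced transition probabilities from x
    """
    slack = (1.0 - beta) * min(k - vy for k, vy in zip(kappa, v))
    lhs = d_x + beta * sum(p * vy for p, vy in zip(p_row, v))
    return lhs <= v[x] + slack + tol


kappa = [40.0, 50.0]
beta = 0.9
zero = [0.0, 0.0]

# With v = 0 the test reduces to d_x <= (1 - beta) * min kappa, about 4 here.
print(in_constrained_set(0, 1.8, [0.1, 0.9], zero, kappa, beta))
print(in_constrained_set(0, 4.5, [0.1, 0.9], zero, kappa, beta))

# A nonzero v changes both sides of the inequality.
v = [18.0, 18.0]
print(in_constrained_set(0, 1.8, [0.1, 0.9], v, kappa, beta))
```

For the zero function the transition row plays no role, which matches the simplified form of the set stated above for v(x)=0.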

We now let
(12) $\Pi_{\phi,v} = \{\pi\in\Pi : \pi(x)\in D[A(x)]_{\phi,v},\ \forall x\in X\}, \quad \phi\in\Phi,\ v\in B(X),$
$\Phi_{\pi,v} = \{\phi\in\Phi : \phi(x)\in D[B(x)]_{\pi,v},\ \forall x\in X\}, \quad \pi\in\Pi,\ v\in B(X).$
If D[A(x)]ϕ,v=∅ for some x in X, then Πϕ,v=∅, and similarly Φπ,v=∅ if D[B(x)]π,v=∅ for some x in X. The following theorem characterizes the set of all policies of one player that are feasible with respect to a fixed policy of the other player.

Theorem 2.
Given ϕ in Φ, π in Π is feasible with respect to ϕ if and only if π is in ⋃v∈B(X)Πϕ,v. Furthermore, given π∈Π, ϕ in Φ is feasible with respect to π if and only if ϕ is in ⋃v∈B(X)Φπ,v.

Proof.
We prove only the first part of the statement, for the minimizer. Suppose that π is feasible with respect to ϕ, and set v=J1(π,ϕ). Since J1(π,ϕ)(x)≤κ1(x) for all x in X, we have (1-β1)minx∈X(κ1(x)-J1(π,ϕ)(x))≥0. Because J1(π,ϕ) is the fixed point of Tπ,ϕ, it follows that
(13) $D_1(x,\pi(x),\phi(x)) + \beta_1 \sum_{y\in X} P_{xy}^{\pi(x)\phi(x)}\, J_1(\pi,\phi)(y) = J_1(\pi,\phi)(x) \le J_1(\pi,\phi)(x) + (1-\beta_1)\min_{x'\in X}(\kappa_1(x') - J_1(\pi,\phi)(x')),$
and this implies that, with v=J1(π,ϕ) in B(X), π(x) is in D[A(x)]ϕ,v for all x in X.

For the other direction, if π is in ⋃v∈B(X)Πϕ,v, then, for some v∈B(X), π(x) is in D[A(x)]ϕ,v for all x in X. By Lemma 1, π is feasible with respect to ϕ.

Because a policy of one player which is feasible with respect to a policy of the other player is δ-feasible with respect to the policy of the other player, the following necessary condition satisfied by a CNE policy pair is immediate.

Corollary 3.
If a pair of π* in Π and ϕ* in Φ is a CNE pair for a given δ, then the pair satisfies the following saddle-point inequality:
(14)Vδ(π*,ϕ)≤Vδ(π*,ϕ*)≤Vδ(π,ϕ*),∀π∈⋃v∈B(X)Πϕ*,v, ∀ϕ∈⋃v∈B(X)Φπ*,v.

Given a pair of π in Π and ϕ in Φ, if we find some v, u in B(X) such that the pair does not satisfy the saddle-point inequality over nonempty Πϕ,v and Φπ,u, then the pair is not a CNE pair.

4. An Example of Approximation Scheme for CNE
We now provide an example of a general approximation scheme for CNE based on the necessary condition. The idea is to fix some u, v in B(X) and look for an equilibrium policy pair that satisfies the saddle-point inequalities over subsets of the feasible policy spaces induced by the selected u, v and the pair; such a pair is an approximate CNE pair.

We start with selecting arbitrary v1, v2 in B(X) and define feasible mixed joint-action sets induced with v1, v2: for all x∈X,
(15) $D_{v_1,v_2}(x) = \bigcap_{i=1,2} \{ (f,g)\in D[A(x)]\times D[B(x)] : D_i(x,f,g) + \beta_i \sum_{y\in X} P_{xy}^{fg}\, v_i(y) \le v_i(x) + (1-\beta_i)\min_{x'\in X}(\kappa_i(x') - v_i(x')) \}.$

Let
(16)Δv1,v2(x) ={(a,b)∈A(x)×B(x):(fa,gb)∈Dv1,v2(x)},
where fa denotes the mixed action with fa(a)=1 and gb with gb(b)=1.

Assume that Δv1,v2(x)≠∅ for all x in X. We then obtain a pair of nonempty ΔAv1,v2(x)⊆A(x) and ΔBv1,v2(x)⊆B(x) such that ΔAv1,v2(x)×ΔBv1,v2(x)⊆Δv1,v2(x).
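
For small action sets, a product subset of the admissible pure joint actions at a state can be found by exhaustive search. The sketch below is one hypothetical way to do this; the function names and the selection rule (largest product size, first found among ties) are assumptions, and the search is exponential in the number of actions, so it is meant only for tiny instances.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield all nonempty subsets of items as sorted tuples, smallest first."""
    items = sorted(items)
    for r in range(1, len(items) + 1):
        yield from combinations(items, r)


def largest_product_subset(delta, actions_a, actions_b):
    """Return (A', B') with A' x B' contained in delta and |A'| * |B'| as large as possible.

    delta is a set of admissible pure joint actions (a, b) at one state.
    Returns None when delta is empty.
    """
    best = None
    for sub_a in nonempty_subsets(actions_a):
        for sub_b in nonempty_subsets(actions_b):
            inside = all((a, b) in delta for a in sub_a for b in sub_b)
            if inside and (best is None
                           or len(sub_a) * len(sub_b) > len(best[0]) * len(best[1])):
                best = (sub_a, sub_b)
    return best


# Illustrative admissible set at one state.
delta_x = {("a", "a"), ("a", "b")}
result = largest_product_subset(delta_x, {"a", "b"}, {"a", "b"})
print(result)
```

Any pair returned this way satisfies the required inclusion by construction; other selection rules (for example, favoring a larger set for one particular player) would serve equally well for the scheme.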

Let
(17) $A_{\phi,v}(x) = \{a\in A(x) : f_a \in D[A(x)]_{\phi,v}\}, \quad x\in X,\ v\in B(X),\ \phi\in\Phi,$
and similarly
(18) $B_{\pi,v}(x) = \{b\in B(x) : g_b \in D[B(x)]_{\pi,v}\}, \quad x\in X,\ v\in B(X),\ \pi\in\Pi.$
That is, Aϕ,v(x) is the ϕ-constrained feasible pure-action set with v at x for the minimizer, and Bπ,v(x) is the corresponding set for the maximizer. For any subsets A′(x)⊆A(x), x∈X, with ⋃x∈XA′(x)=A′, we denote by D[A′(x)] the set of all probability distributions over A(x) that assign zero probability to the actions in A(x)∖A′(x). If A′(x)=∅, then D[A′(x)]=∅. The notation D[B′(x)] for B′(x)⊆B(x) is defined similarly. (Note that, in general, D[Aϕ,v(x)]⊆D[A(x)]ϕ,v⊆D[A(x)] for all x in X.) We further let
(19)Π[A′]={π∈Π:π(x)∈D[A′(x)],∀x∈X},Φ[B′]={ϕ∈Φ:ϕ(x)∈D[B′(x)],∀x∈X}.
If D[A′(x)]=∅ for some x in X, Π[A′]=∅, and similarly Φ[B′]=∅ if D[B′(x)]=∅ for some x in X.

By construction, the following result is immediate. For all x∈X,
(20)ΔAv1,v2(x)⊆⋂ϕ∈Φ[ΔBv1,v2]Aϕ,v1(x),ΔBv1,v2(x)⊆⋂π∈Π[ΔAv1,v2]Bπ,v2(x),
which further implies that, for any (π~,ϕ~) such that π~(x)∈D[ΔAv1,v2(x)] for all x in X and ϕ~(x)∈D[ΔBv1,v2(x)] for all x in X, π~ is feasible with respect to ϕ~ and ϕ~ is feasible with respect to π~.

Consider now the unconstrained game Mv1,v2=(X,ΔAv1,v2,ΔBv1,v2,P,C), where P and C are evaluated only at f in D[ΔAv1,v2(x)] and g in D[ΔBv1,v2(x)] for all x in X, and denote by NE(Mv1,v2) the set of all NE policy pairs of Mv1,v2. The two results above then imply that any (π~,ϕ~)∈NE(Mv1,v2) is a local CNE for M. In other words, for any δ, π~ is δ-feasible with respect to ϕ~ and ϕ~ is δ-feasible with respect to π~, Π[ΔAv1,v2]⊆Πϕ~ and Φ[ΔBv1,v2]⊆Φπ~, and
(21)Vδ(π~,ϕ)≤Vδ(π~,ϕ~)≤Vδ(π,ϕ~),∀π∈Π[ΔAv1,v2], ∀ϕ∈Φ[ΔBv1,v2].
That is, the local CNE pair (π~,ϕ~) satisfies all of the conditions of CNE except that the saddle-point inequality is satisfied locally for the subsets of Πϕ~ and Φπ~. In fact, related solution concepts for games that are resistant to local deviations, called “local NE,” have already been established in economics (see, e.g., [7]).

Projecting Δv1,v2(x) into the two sets of ΔAv1,v2(x) and ΔBv1,v2(x) turns out to be equivalent to obtaining a complete bipartite subgraph or biclique subgraph from a (bipartite) graph. The problem of finding a biclique in a given bipartite graph is well studied in the graph theory literature (see, e.g., [8]). Another issue is how we set v1 and v2 such that Δv1,v2(x) is nonempty for all x in X. If there exists a pure-policy pair of π in Π and ϕ in Φ such that, for all x in X, π(x)(a)=1 for some a∈A(x) and ϕ(x)(b)=1 for some b∈B(x) and one policy is feasible with respect to the other policy, then by setting vi=Ji(π,ϕ), i=1,2, we have (π(x),ϕ(x))∈Dv1,v2(x) for all x in X, making Δv1,v2(x)≠∅ for all x in X. We can put the following feasibility assumption on M to assure the existence of such a pure-policy pair: for all x in X,
(22) $\min\{\kappa_1(x), \kappa_2(x)\} \ge \max\{ \inf_{\pi\in\Pi,\phi\in\Phi} J_1(\pi,\phi)(x),\ \inf_{\pi\in\Pi,\phi\in\Phi} J_2(\pi,\phi)(x) \}.$

In other words, by this assumption there exists at least one pure-policy pair such that one policy is feasible with respect to the other policy. The existence comes from the fact that there exists a pure-policy pair (π,ϕ) that achieves infπ′,ϕ′J1(π′,ϕ′)(x) for all x in X in solving this Markov decision process problem [9] and also for the case of infπ′,ϕ′J2(π′,ϕ′)(x) for all x in X.

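To illustrate how the above method works, we consider the following simple MG M=(X,A,B,P,C), where X={1,2}, A(x)=B(x)={a,b}, ∀x∈X, and p, c, d1, and d2 are given in matrix form, respectively:
$p^{aa} = \begin{bmatrix} 0.1 & 0.9 \\ 0.9 & 0.1 \end{bmatrix}$, $p^{ab} = \begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix}$, $p^{ba} = \begin{bmatrix} 0.9 & 0.1 \\ 0.9 & 0.1 \end{bmatrix}$, and $p^{bb} = \begin{bmatrix} 0.1 & 0.9 \\ 0.1 & 0.9 \end{bmatrix}$;
$c^{aa} = c^{ab} = \begin{bmatrix} 0 & 5 \\ 5 & 0 \end{bmatrix}$, $c^{ba} = c^{bb} = \begin{bmatrix} 5 & 10 \\ 10 & 5 \end{bmatrix}$;
$d_1^{aa} = d_1^{ab} = \begin{bmatrix} 0 & 2 \\ 2 & 0 \end{bmatrix}$, $d_1^{ba} = d_1^{bb} = \begin{bmatrix} 2 & 4 \\ 4 & 2 \end{bmatrix}$, $d_2^{aa} = d_2^{ab} = \begin{bmatrix} 0 & 2 \\ 2 & 0 \end{bmatrix}$, and $d_2^{ba} = d_2^{bb} = \begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix}$.
In this matrix form, for example, the (i,j)th entries of $p^{ab}$ and $c^{ab}$ refer to p(j|i,a,b) and c(i,j,a,b), respectively. The other parameters are given such that γ=β1=0.9, β2=0.95, κ1(1)=κ2(1)=40, and κ1(2)=κ2(2)=50.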

We first observe that, for the pure-policy pair (π,ϕ)=((a,a),(a,a)), one policy is feasible with respect to the other policy. The notation of π=(π(1),π(2)) here refers to π(1)∈D[A(1)] and π(2)∈D[A(2)]. For simplicity, we write π(1) as a for the distribution concentrated on the action a. (Note that we can obtain another such pure-policy pair ((a,a),(b,b)) which achieves infπ∈Π,ϕ∈ΦJ1(π,ϕ)(x)=2, ∀x∈X.)
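
The feasibility claims for these pure-policy pairs can be checked numerically. The sketch below encodes the example with states indexed from 0 and keys `(a, b)` giving the minimizer's action first; this encoding, the function names, and the fixed iteration count are assumptions made for the illustration.

```python
# Example data: table[(a, b)][x][y], states 1 and 2 stored as indices 0 and 1.
P = {
    ("a", "a"): [[0.1, 0.9], [0.9, 0.1]], ("a", "b"): [[0.9, 0.1], [0.1, 0.9]],
    ("b", "a"): [[0.9, 0.1], [0.9, 0.1]], ("b", "b"): [[0.1, 0.9], [0.1, 0.9]],
}
D1 = {
    ("a", "a"): [[0, 2], [2, 0]], ("a", "b"): [[0, 2], [2, 0]],
    ("b", "a"): [[2, 4], [4, 2]], ("b", "b"): [[2, 4], [4, 2]],
}
D2 = {
    ("a", "a"): [[0, 2], [2, 0]], ("a", "b"): [[0, 2], [2, 0]],
    ("b", "a"): [[4, 2], [2, 4]], ("b", "b"): [[4, 2], [2, 4]],
}
KAPPA = [40.0, 50.0]  # kappa_1 and kappa_2 coincide in this example


def stage(table, x, a, b):
    """Expected one-step constraint cost at x: sum_y table(x, y, a, b) p(y | x, a, b)."""
    return sum(t * p for t, p in zip(table[(a, b)][x], P[(a, b)][x]))


def discounted(table, pi, phi, beta, iters=2000):
    """Approximate J(pi, phi) for pure stationary policies by fixed-point iteration."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [
            stage(table, x, pi[x], phi[x])
            + beta * sum(p * vy for p, vy in zip(P[(pi[x], phi[x])][x], v))
            for x in range(2)
        ]
    return v


J1 = discounted(D1, ("a", "a"), ("a", "a"), 0.9)
J2 = discounted(D2, ("a", "a"), ("a", "a"), 0.95)
J1_alt = discounted(D1, ("a", "a"), ("b", "b"), 0.9)
feasible = all(j <= k for j, k in zip(J1, KAPPA)) and all(j <= k for j, k in zip(J2, KAPPA))
print(J1, J2, J1_alt, feasible)
```

Under this encoding the pair ((a,a),(a,a)) gives J1 of about 18 and J2 of about 36 at both states, within the bounds 40 and 50, and the pair ((a,a),(b,b)) gives J1 of about 2, consistent with the note above.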

Now, by setting vi=Ji(π,ϕ), i=1,2, we first obtain Δv1,v2(x)={(a,a),(a,b)}, for all x∈X, thereby having the sets of ΔAv1,v2(x)={a} and ΔBv1,v2(x)={a,b}, for all x∈X. We solve the unconstrained two-person zero-sum MG Mv1,v2, obtaining a pure NE policy pair of (π~,ϕ~)=((a,a),(a,a)) for Mv1,v2, which is then a local CNE for M.
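
The steps of this example can be reproduced with a short script. The sketch below uses the same assumed encoding of the example data (states indexed from 0, keys `(a, b)` with the minimizer's action first). Because the minimizer's restricted action set is a singleton here, the restricted game reduces to a Markov decision process for the maximizer, so comparing the maximizer's pure policies suffices in this particular case; that shortcut is specific to this example and is not a general solver for zero-sum MGs.

```python
from itertools import product

ACTIONS = ("a", "b")
P = {
    ("a", "a"): [[0.1, 0.9], [0.9, 0.1]], ("a", "b"): [[0.9, 0.1], [0.1, 0.9]],
    ("b", "a"): [[0.9, 0.1], [0.9, 0.1]], ("b", "b"): [[0.1, 0.9], [0.1, 0.9]],
}
C = {
    ("a", "a"): [[0, 5], [5, 0]], ("a", "b"): [[0, 5], [5, 0]],
    ("b", "a"): [[5, 10], [10, 5]], ("b", "b"): [[5, 10], [10, 5]],
}
D1 = {
    ("a", "a"): [[0, 2], [2, 0]], ("a", "b"): [[0, 2], [2, 0]],
    ("b", "a"): [[2, 4], [4, 2]], ("b", "b"): [[2, 4], [4, 2]],
}
D2 = {
    ("a", "a"): [[0, 2], [2, 0]], ("a", "b"): [[0, 2], [2, 0]],
    ("b", "a"): [[4, 2], [2, 4]], ("b", "b"): [[4, 2], [2, 4]],
}
KAPPA = [40.0, 50.0]  # kappa_1 and kappa_2 coincide in this example
GAMMA, BETA1, BETA2 = 0.9, 0.9, 0.95


def stage(table, x, a, b):
    """Expected one-step value at x under the pure joint action (a, b)."""
    return sum(t * p for t, p in zip(table[(a, b)][x], P[(a, b)][x]))


def value(table, pi, phi, beta, iters=2000):
    """Discounted value of pure stationary policies by fixed-point iteration."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [
            stage(table, x, pi[x], phi[x])
            + beta * sum(p * vy for p, vy in zip(P[(pi[x], phi[x])][x], v))
            for x in range(2)
        ]
    return v


def admissible(table, v, beta, x, a, b, tol=1e-9):
    """Constraint inequality defining the feasible joint-action set, for one i."""
    lhs = stage(table, x, a, b) + beta * sum(p * vy for p, vy in zip(P[(a, b)][x], v))
    slack = (1.0 - beta) * min(k - vy for k, vy in zip(KAPPA, v))
    return lhs <= v[x] + slack + tol


base = ("a", "a")
v1 = value(D1, base, base, BETA1)
v2 = value(D2, base, base, BETA2)
delta = {
    x: {(a, b) for a, b in product(ACTIONS, repeat=2)
        if admissible(D1, v1, BETA1, x, a, b) and admissible(D2, v2, BETA2, x, a, b)}
    for x in range(2)
}

# Restricted game: the minimizer keeps only action a, the maximizer keeps a and b.
pi_tilde = ("a", "a")
candidates = {phi: value(C, pi_tilde, phi, GAMMA) for phi in product(ACTIONS, repeat=2)}
phi_tilde = max(candidates, key=lambda phi: sum(candidates[phi]))
print(delta, phi_tilde, candidates[phi_tilde])
```

Under these assumptions the script recovers the admissible set {(a,a),(a,b)} at both states and selects (a,a) for the maximizer, in agreement with the pair reported above.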