From the classical point of view, it is important to determine whether, in a Markov decision process (MDP), the uniqueness of the optimal policies is guaranteed in addition to their existence. It is well known that uniqueness does not always hold in optimization problems (for instance, in linear programming). On the other hand, in such problems a slight perturbation of the cost functional can restore uniqueness. In this paper, it is proved that, under adequate conditions, the value functions of an MDP and of its cost-perturbed version stay close, which in some sense is a priority. We are interested in the stability of Markov decision processes with respect to perturbations of the cost-as-you-go function.

From the classical point of view (for instance, in Hadamard’s concept of well-posedness [

In this paper, we study a family of perturbations of the cost of an MDP and establish that, under convexity and adequate bounds, the value functions of both the original and the cost-perturbed Markov decision processes (MDPs) are uniformly close. This result will eventually help us determine whether both the uniqueness and the nonuniqueness are stable with respect to this kind of perturbation.
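The closeness of the two value functions can be illustrated numerically in a much simpler setting than the one treated in this paper. The following is a minimal sketch, assuming a finite state and action space and a discounted cost criterion with discount factor `alpha`; these choices, the random model, the perturbation term `g`, and all function names are hypothetical and are not taken from the paper. In that setting, if the perturbed cost differs from the original one by at most `eps` everywhere, the two optimal value functions differ by at most `eps / (1 - alpha)`.

```python
import random


def value_iteration(cost, trans, alpha, tol=1e-10, max_iter=100_000):
    """Approximate the optimal discounted value function of a finite MDP.

    cost[x][a]     : one-stage cost of action a in state x
    trans[x][a][y] : probability of moving from x to y under action a
    alpha          : discount factor in (0, 1)
    """
    n = len(cost)
    v = [0.0] * n
    for _ in range(max_iter):
        # One application of the dynamic programming operator.
        new_v = [
            min(
                cost[x][a]
                + alpha * sum(p * v[y] for y, p in enumerate(trans[x][a]))
                for a in range(len(cost[x]))
            )
            for x in range(n)
        ]
        if max(abs(s - t) for s, t in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def random_mdp(n_states, n_actions, rng):
    """Build a hypothetical MDP with costs in [0, 1] and random transitions."""
    cost = [[rng.uniform(0.0, 1.0) for _ in range(n_actions)]
            for _ in range(n_states)]
    trans = []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() + 1e-9 for _ in range(n_states)]
            total = sum(w)
            rows.append([wi / total for wi in w])
        trans.append(rows)
    return cost, trans


rng = random.Random(0)
alpha, eps = 0.9, 0.01
n_states, n_actions = 5, 3
cost, trans = random_mdp(n_states, n_actions, rng)

# Hypothetical perturbation: c_eps = c + eps * g with 0 <= g <= 1.
g = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
cost_eps = [[cost[x][a] + eps * g[x][a] for a in range(n_actions)]
            for x in range(n_states)]

v = value_iteration(cost, trans, alpha)
v_eps = value_iteration(cost_eps, trans, alpha)

gap = max(abs(s - t) for s, t in zip(v, v_eps))
bound = eps / (1.0 - alpha)
print(f"sup-norm gap = {gap:.6f}, bound eps/(1-alpha) = {bound:.6f}")
```

The bound used here is the standard contraction estimate for bounded costs in the discounted finite case. It is shown only to make the notion of uniform closeness concrete and does not reproduce the conditions or the estimate established in this paper.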

The structure of this paper is simple. First, the preliminaries and assumptions of the model are outlined. Second, the main theorem is stated and proved, followed by the main example. A brief section of concluding remarks closes the paper.

Let

Let

(a)

(b) The transition law

(c) There exists a policy

The following consequences of Assumption

The optimal value function

There is also

and

For every

Let

It will be also supposed that the MDPs taken into account satisfy one of the following Assumptions

(a)

(b)

(c)

(d)

(a) The same as Assumption

(b)

(c)

(d)

Assumptions

For

There is a policy

Suppose that, for

For

Suppose that, for

Let

It is easy to verify, using Assumption

There exists a measurable function

With respect to the existence of the function

Suppose that Assumptions

if

under Condition

The proof of case (a) follows from the proof of case (b) given that

(b) Assume that

Firstly, for each

Secondly, assume that for some positive integer

Consequently, using Condition

On the other hand, from (

In conclusion, combining (

The following corollary is immediate.

Suppose that Assumptions

Let

Example

Assumption

By direct computations we get, for the stationary policy

On the other hand, Assumptions

Now, take

Now, for each

Hence, taking

The specific form of the perturbation used in this paper is taken from [

Both state and action spaces are considered to be subsets of

Theorem

Finally, we should mention that this research was motivated by our interest in understanding the relationship between nonuniqueness and robustness in several statistical procedures based on optimization.