Explicit Solution of the Average-Cost Optimality Equation for a Pest-Control Problem

We introduce a Markov decision process in continuous time for the optimal control of a simple symmetrical immigration-emigration process by the introduction of total catastrophes. It is proved that a particular control-limit policy is average cost optimal within the class of all stationary policies by verifying that the relative values of this policy are the solution of the corresponding optimality equation.


Introduction
The term "Markov decision process" was introduced by Bellman [1] for the description of a stochastic process controlled by a sequence of actions. During the last fifty years, the Markov decision process has been the subject of remarkable research activity. It is called a discrete-time Markov decision process or a semi-Markov decision process according as the times between consecutive decision epochs are constant or random, respectively. The Markov decision process in continuous time is a special semi-Markov decision process in which the times between consecutive decision epochs are exponentially distributed. Collections of results with some emphasis on the theoretical aspects of Markov decision processes are given in the books of Derman [2], Ross [3], Whittle [4, 5], Puterman [6], and Sennott [7]. The computational aspects of Markov decision processes are treated in detail in the books of Puterman [6] and Tijms [8].
The most widely used optimization criteria in a Markov decision process are the minimization of the finite-horizon expected cost, the minimization of the infinite-horizon total expected discounted cost, and the minimization of the long-run expected average cost per unit time. An intuitively appealing class of policies is the class of stationary policies. A policy is said to be stationary if, at each decision epoch, it chooses one action which depends only on the current state of the process. In the present paper, a particular stationary policy is shown to be average-cost optimal within the wider class of all stationary policies. Numerical results that compare the optimal policy of the present problem with the optimal policy of the problem in which a simple immigration process is controlled through total catastrophes are presented in Section 5. In Section 6, we consider the case in which the immigration rate is not equal to the emigration rate, and we give a condition that guarantees the optimality of the policy which always introduces catastrophes. In Section 7, we summarize the results of the paper.

The Model
Consider a population of individuals which grows in a habitat of infinite capacity according to a simple symmetrical immigration-emigration process. Let ν denote the common immigration and emigration rate. We suppose that the individuals are harmful. For example, the individuals may be insects which destroy a crop or spread a disease. We refer to such individuals as pests. Assume that the rate at which damage is caused by the pests is proportional to their population size. Defining the unit of cost to be the cost per unit time of the damage caused by each pest, it follows that the cost of the damage caused by the pests is i per unit time, where i is the current population size.
The pest population may be controlled by some action which introduces total catastrophes. When such a catastrophe occurs, the population size of the pests is instantaneously reduced to zero. Let u be the control variable, where u is the rate at which catastrophes occur. Assume that the unit of time has been chosen in such a way that the available values of u are restricted to the closed interval [0, 1], so that if the maximal level of control is being applied, then catastrophes occur at an average rate of one per unit time and the length of time until the occurrence of a catastrophe is exponentially distributed with unit mean. The controlling action gives rise to costs due to labour, materials, risk, and so forth. Let the cost of taking controlling action u be ku per unit time, where k > 0. A stationary policy is defined by a sequence {u_i}, where u_i ∈ [0, 1] is the level of control applied whenever the process is in state i, i ≥ 1.
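The dynamics just described (immigration at rate ν, emigration at rate ν, total catastrophes at rate u_i, and cost rate i + ku_i in state i) can be made concrete with a short Monte Carlo sketch. This is purely illustrative and not part of the paper's analysis; the parameter values, the horizon, and the control rule used in the example are arbitrary choices.

```python
import random

def average_cost_sim(policy_u, nu=1.0, k=5.0, T=100_000.0, seed=1):
    """Monte Carlo estimate of the long-run average cost per unit time.

    policy_u(i) returns the control level u in [0, 1] used in state i.
    The cost rate in state i under control u is i (damage) + k*u (control).
    """
    rng = random.Random(seed)
    t, i, cost = 0.0, 0, 0.0
    while t < T:
        u = policy_u(i)
        down = nu if i >= 1 else 0.0        # emigration only if pests present
        rate = nu + down + u                # total transition rate out of i
        dt = min(rng.expovariate(rate), T - t)
        cost += (i + k * u) * dt            # cost accrues continuously in time
        t += dt
        if t >= T:
            break
        r = rng.random() * rate
        if r < nu:
            i += 1                          # immigration
        elif r < nu + down:
            i -= 1                          # emigration
        else:
            i = 0                           # total catastrophe

    return cost / T

# example: apply full control whenever the population size is at least 3
est = average_cost_sim(lambda i: 1.0 if i >= 3 else 0.0)
```

The example control rule is of the "bang-bang" kind studied later in the paper; any function of the state could be passed in its place.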
If the stationary policy {u_i} is employed, our assumptions imply that we have a continuous-time Markov chain model for the population growth of the pests with state space S = {0, 1, 2, ...} and the following transition rates:

q(i, i + 1) = ν, i ≥ 0,
q(i, i − 1) = ν, i ≥ 2,
q(1, 0) = ν + u_1,
q(i, 0) = u_i, i ≥ 2.

(2.1)
The expected long-run average cost per unit time of a policy {u_i} is defined as the limit as t → ∞ of the expected cost incurred in the time interval [0, t] divided by t, given that the policy is employed. We aim to find a stationary policy which minimises the expected long-run average cost per unit time among all stationary policies. An intuitively appealing class of policies is P ≡ {P_x : x = 1, 2, ...}, where P_x is the stationary policy according to which u_i = 0 for 1 ≤ i ≤ x − 1 and u_i = 1 for i ≥ x. Thus, P_x is a policy of "bang-bang" type, where controlling action is not taken as long as the population size is less than x, but controlling action is taken at the maximal possible level whenever the population size is greater than or equal to x. We refer to P_x as a control-limit policy. In Section 3, the optimal policy P_{x*} within P is found, and in Section 4, the optimality of P_{x*} within the wider class of all stationary policies is established using Bather's [10] general results.
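Before the exact analysis of Section 3, the average cost g(x) of a control-limit policy can be approximated numerically by truncating the state space and computing the equilibrium distribution of the resulting finite chain by power iteration on the uniformized transition matrix. The sketch below is an illustration under assumed parameter values (ν = 1, k = 5) and an arbitrary truncation level; it is not part of the paper's derivation.

```python
def avg_cost(x, nu, k, N=80, iters=20000, tol=1e-12):
    """Approximate the long-run average cost g(x) of the control-limit
    policy P_x on the state space truncated at N.

    Rates: i -> i+1 at nu (immigration), i -> i-1 at nu (emigration,
    i >= 1), i -> 0 at 1 (catastrophe, i >= x).  The cost rate in state
    i is i (damage) plus k when control is being applied (i >= x).
    """
    Lam = 2 * nu + 1.0                      # uniformization constant
    pi = [1.0 / (N + 1)] * (N + 1)          # initial distribution
    for _ in range(iters):
        new = [0.0] * (N + 1)
        for i, p in enumerate(pi):
            up = nu if i < N else 0.0       # immigration (cut at truncation)
            down = nu if i >= 1 else 0.0    # emigration
            cat = 1.0 if i >= x else 0.0    # catastrophe under P_x
            new[i] += p * (Lam - up - down - cat) / Lam
            if i < N:
                new[i + 1] += p * up / Lam
            if i >= 1:
                new[i - 1] += p * down / Lam
            new[0] += p * cat / Lam
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            pi = new
            break
        pi = new
    # damage cost + control cost in equilibrium
    return sum(i * p for i, p in enumerate(pi)) + k * sum(pi[x:])

# scan the control limits for nu = 1, k = 5
costs = {x: avg_cost(x, 1.0, 5.0) for x in range(1, 9)}
```

Scanning `costs` for its minimising key gives a numerical guess for the optimal control limit, which can be compared with the exact characterization derived in Section 3.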
Bather's theory has also been applied to two other pest-control models (see [12, 13]). It has not been applied to any other problem for the optimal control of a denumerable Markov chain in continuous time. Note that the present model is a special case of a more general model that was studied in [14]. In that paper, it was assumed that the immigration and emigration rates were not necessarily equal and that the cost rate caused by i pests was equal to d_i, where {d_i} is a nondecreasing sequence such that d_i → ∞ as i → ∞ and d_i ≤ Ai^m for some constant A > 0 and integer m. The characterization of the form of the optimal policy was achieved in that work by the standard method of successive approximations that we describe in Section 1. The approach that we follow in the present paper is direct, since it does not use discounted programming. It involves only the solution of the corresponding average-cost optimality equation and the verification of certain conditions on the transition rates and on the cost rates. It also enables us, as we will see in Section 3, (i) to obtain a measure of the advantage of starting the process in state i ≥ 0 rather than in some other state j ≥ 0 when using the optimal policy and (ii) to obtain some interesting inequalities for the minimum average cost and the optimal critical value x*.

The Optimal Policy within the Class P
The equilibrium probabilities π_i, i = 0, 1, ..., under the policy P_x, x ≥ 1, satisfy the following balance equations:

ν π_0 = ν π_1 + Σ_{i=x}^∞ π_i,
2ν π_i = ν π_{i−1} + ν π_{i+1}, 1 ≤ i ≤ x − 1,
(2ν + 1) π_i = ν π_{i−1} + ν π_{i+1}, i ≥ x.

(3.1)
The above equations, together with the normalising condition Σ_{i=0}^∞ π_i = 1, yield the explicit expression (3.2) for the equilibrium probabilities, where ρ = (2ν + 1 − √(4ν + 1))/(2ν). Using a well-known result (see, e.g., Theorem 5.10 in Ross [3]), the average cost g(x) under the policy P_x can be expressed in terms of the equilibrium probabilities π_i, i = 0, 1, ..., and the cost rates under P_x as g(x) = Σ_{i=0}^∞ i π_i + k Σ_{i=x}^∞ π_i (3.3). Substituting (3.2) into (3.3), we obtain the closed-form expression (3.4).

Proposition 3.1. There exists a policy P_{x*} that is optimal within the class P. The appropriate value of x* is equal to θ + 1, where θ is the integer part of the unique positive root of the polynomial

3.5
Proof. Using the above expression for g(x), we find for x ≥ 1 that g(x + 1) − g(x) is equal to A(x) multiplied by a positive quantity. Temporarily, treat x as a continuous real variable in the interval [0, ∞). Note that A″(x) > 0 for x ≥ 0, since 0 < ρ < 1. Hence, A(x), x ≥ 0, is strictly convex. The convexity of A(x), the inequality A(0) = −12kρ < 0, and the fact that A(x) → ∞ as x → ∞ imply that the equation A(x) = 0 has a unique positive root r, with A(x) < 0 for 0 ≤ x < r and A(x) > 0 for x > r. It follows that the sequence g(x), x = 1, 2, ..., attains its minimum at the integer x* = θ + 1, where θ is the integer part of r.
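The constant ρ can be sanity-checked numerically. For states above the control limit, the balance equations (3.1) give a second-order linear recursion in π_i whose characteristic equation is ν t² − (2ν + 1) t + ν = 0 (this reading of (3.1) is our gloss, not a displayed formula of the paper), and ρ is the root of that equation lying in (0, 1):

```python
import math

def rho(nu):
    # the root in (0, 1) of nu*t**2 - (2*nu + 1)*t + nu = 0
    return (2 * nu + 1 - math.sqrt(4 * nu + 1)) / (2 * nu)

for nu in (0.25, 0.5, 1.0, 2.0, 10.0):
    r = rho(nu)
    assert 0.0 < r < 1.0                                  # geometric tail decays
    assert abs(nu * r * r - (2 * nu + 1) * r + nu) < 1e-9  # satisfies the quadratic
```

Since the product of the two roots of the quadratic equals 1, the second root is 1/ρ > 1, which is excluded by the summability of the equilibrium probabilities.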

Verification of Optimality
According to the results of Bather [10], the policy P_{x*} is optimal within the class of all stationary policies if there exist a constant g and a sequence of nonnegative numbers {h_i}, i = 0, 1, 2, ..., satisfying (4.1)-(4.4), and each stationary policy satisfies certain conditions on the cost rates and transition rates. Equation (4.1) is referred to as the optimality equation. The cost structure and transition rates for the present problem are such that all of Bather's conditions are clearly satisfied, except for the following one, which needs careful attention. Given any stationary policy, let c_i be the cost rate in state i and q_i the sum of the transition rates out of state i, i ≥ 0. For every stationary policy, there must exist a positive, decreasing sequence {φ_i} and a positive integer n such that Σ_{i=n}^∞ φ_i = ∞ and the inequalities (4.5) hold. Note that if (4.1)-(4.4) and the condition (4.5) are satisfied, then the constant g turns out to be the minimum average cost. Thus, to verify the optimality of P_{x*}, we choose g = g(x*), which from (3.4) takes the form (4.6); we then find a sequence {h_i} such that (4.1)-(4.4) are satisfied, and, for every stationary policy, find a sequence {φ_i} and an integer n such that condition (4.5) is satisfied. The unique solution of the difference equations (4.3) and (4.4), given the conditions (4.2) and (4.7) and the expression (4.6) for g, is given by (4.8) and (4.9).

Lemma 4.1. The sequence {h_i}, i = 0, 1, ..., defined by (4.8) and (4.9) is nonnegative and increasing.
Proof. From the expression (4.8), we have

4.11
Therefore, to prove that the sequence {h_i}, 0 ≤ i ≤ x*, is nonnegative and increasing, it is sufficient to show that a certain inequality holds. Given the expression (4.6) for g, it can be easily verified that the inequality A(x* − 1) ≤ 0, which follows from the proof of Proposition 3.1, implies this inequality. Hence, {h_i}, 0 ≤ i ≤ x*, is nonnegative and increasing. From the expression (4.9), a similar argument shows that {h_i}, i ≥ x*, is also increasing.
The function to be minimised on the right-hand side of (4.1) is a linear function of u. Hence, the minimum is achieved either at u = 0 or at u = 1. Since {h_i} is a solution of (4.3) and (4.4), to prove that (4.1) is satisfied, we need to check that

which simplify to the inequalities (4.15)-(4.16). Using the result of Lemma 4.1, we deduce that these inequalities hold if and only if h_{x*} ≥ k, which, using (4.8) with i = x*, can be written equivalently as (4.17). Substituting for g from (4.6) and using (4.18), the inequality (4.17) reduces to A(x* − 1) ≤ 0, which follows from the proof of Proposition 3.1. Thus, it has been proved that the sequence {h_i}, i ≥ 0, defined by (4.8) and (4.9), satisfies (4.1). To prove the extra condition (4.5), let n = x* and

4.20
The fact that the sequence {φ_i}, i ≥ x*, is positive and decreasing with Σ_{i=n}^∞ φ_i = ∞ is immediate from its definition and Lemma 4.1. For every stationary policy and every state i, i ≥ x*, we have that c_i ≥ i and q_i ≤ 2ν + 1. Hence, the sequence {φ_i} will satisfy (4.5) for all stationary policies if certain inequalities hold, and the sequence {φ_i} as defined in (4.20) satisfies these inequalities. The proof of the following proposition, which is the main result of the paper, has been completed.

Proposition 4.2. The policy P_{x*} is optimal within the class of all stationary policies.

Remark 4.3. The method that we use in the present work for proving the optimality of P_{x*} enables us to compute the difference h_i − h_j for any states i, j ≥ 0. This difference is equal to the difference in total expected costs over an infinitely long period of time incurred by starting in state i rather than in state j when using the policy P_{x*} (see Chapter 3 in Tijms's book [8]).
Remark 4.4. The inequalities (4.17) and (4.19) imply that the minimum average cost g is bounded below and above by two rational functions of the optimal critical point x*. Note also that from (4.17) and (4.19) we deduce that x* is bounded above by √(1 + 12νk).
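The two properties established above, the monotonicity of {h_i} (Lemma 4.1) and the optimality equation (4.1), can be checked numerically on a truncated chain by relative value iteration for the fixed policy P_{x*}. The sketch below is an independent numerical check only; the parameters (ν = 1, k = 5, for which x* = 3 in our computation) and the truncation level are illustrative choices.

```python
def relative_values(x, nu, k, N=100, iters=5000):
    """Relative value iteration for the fixed policy P_x on the
    uniformized chain truncated at N; returns (g, h) with h[0] = 0."""
    Lam = 2 * nu + 1.0                      # uniformization constant
    w = [0.0] * (N + 1)
    g = 0.0
    for _ in range(iters):
        new = []
        for i in range(N + 1):
            u = 1.0 if i >= x else 0.0      # the policy's action in state i
            up = nu if i < N else 0.0
            down = nu if i >= 1 else 0.0
            stay = Lam - up - down - u
            new.append((i + k * u
                        + up * w[min(i + 1, N)]
                        + down * w[max(i - 1, 0)]
                        + u * w[0]
                        + stay * w[i]) / Lam)
        g = Lam * new[0]                    # w[0] == 0 after renormalization
        w = [v - new[0] for v in new]       # renormalize so that h[0] = 0
    return g, w

nu, k, xstar = 1.0, 5.0, 3                  # illustrative parameters
g, h = relative_values(xstar, nu, k)

# Lemma 4.1: h is nonnegative and increasing (checked away from truncation)
assert all(h[i] >= -1e-8 and h[i] < h[i + 1] + 1e-8 for i in range(60))

# Optimality equation (4.1): at every state, the minimizing action over
# u in {0, 1} achieves g and agrees with the control-limit policy
for i in range(60):
    down = nu if i >= 1 else 0.0
    val = {u: i + k * u + nu * (h[i + 1] - h[i])
              + down * ((h[i - 1] - h[i]) if i >= 1 else 0.0)
              + u * (h[0] - h[i])
           for u in (0.0, 1.0)}
    best = min(val, key=val.get)
    assert abs(val[best] - g) < 1e-5
    assert best == (1.0 if i >= xstar else 0.0)
```

The second loop also exhibits the linearity-in-u argument used above: the coefficient of u in the minimand is k + h_0 − h_i = k − h_i, so u = 1 is optimal exactly in the states where h_i ≥ k.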

Numerical Results
In Table 1, we present, for different values of ν and k, the critical number x* for the present model and the critical number x̃ for the simpler model in which a simple immigration process is controlled through total catastrophes (see Kyriakidis and Abakuks [12]). The value of x* is equal to θ + 1, where θ is the integer part of the unique positive root of A(x) = 0, while x̃ is found from (2.9) in [12]. In Table 2, we present the corresponding minimum average costs g(x*) and g̃(x̃) (see (3.4) above and (3.5) in [12]).
From Table 1, we see that the critical points x* and x̃ are nondecreasing in k, for fixed ν. This is intuitively reasonable, since it seems preferable to avoid introducing catastrophes if the cost of their introduction takes large values. From this table we can also see that the critical points x* and x̃ are nondecreasing in ν, for fixed k. We also observe that for k ∈ {1, 5, 10} the critical value x* is greater than or equal to the critical value x̃, while for k ∈ {20, 50} the critical value x* is smaller than or equal to x̃. This can be explained intuitively: in the model of the present paper, for large values of k, it seems preferable to initiate the mechanism that introduces catastrophes when the population size is relatively small, because the emigration of pests may reduce the population below the critical point before a catastrophe occurs, in which case the catastrophe mechanism, which causes high costs, stops. From Table 2, we see that, for a fixed value of ν, the minimum average costs g(x*) and g̃(x̃) increase as k increases. From this table we can also see that, for a fixed value of k, the minimum average costs g(x*) and g̃(x̃) increase as ν increases. We also observe that in all cases g(x*) is considerably smaller than g̃(x̃). This is intuitively reasonable, since the emigration of the individuals causes a considerable reduction of the cost that they cause.

Control of the Asymmetric Process
Consider the same model as the one introduced in Section 2 with the following modification: the emigration rate of the pests is μ, where μ is different from the immigration rate ν. As stated in Section 2, the optimality of a control-limit policy can be proved through the discounted-cost problem (see [14]). However, in this case, it is not possible to repeat the approach that we developed in the case in which ν is equal to μ. This is due to the fact that it is difficult to minimize analytically with respect to x the average cost g(x), x ≥ 1, of the control-limit policy P_x, since it is now given by a more complicated expression. If, however, a certain condition on ν, μ, and k holds, then the policy P_1 is optimal, since the corresponding optimality equation is satisfied with g = g(1) and the sequence {h_i} given by (6.6).
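Although the closed-form minimisation over x is no longer tractable when μ ≠ ν, the average cost g(x) of P_x can still be evaluated numerically by the same truncation device as in the symmetric case. The following sketch is illustrative only (parameter values, truncation level, and iteration counts are arbitrary choices); when μ = ν it reduces to the symmetric model of Section 2.

```python
def avg_cost(x, nu, mu, k, N=100, iters=20000, tol=1e-12):
    """Average cost of the control-limit policy P_x when the immigration
    rate is nu and the emigration rate is mu (state space truncated at N)."""
    Lam = nu + mu + 1.0                     # uniformization constant
    pi = [1.0 / (N + 1)] * (N + 1)
    for _ in range(iters):
        new = [0.0] * (N + 1)
        for i, p in enumerate(pi):
            up = nu if i < N else 0.0       # immigration (cut at truncation)
            down = mu if i >= 1 else 0.0    # emigration
            cat = 1.0 if i >= x else 0.0    # catastrophe under P_x
            new[i] += p * (Lam - up - down - cat) / Lam
            if i < N:
                new[i + 1] += p * up / Lam
            if i >= 1:
                new[i - 1] += p * down / Lam
            new[0] += p * cat / Lam
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            pi = new
            break
        pi = new
    return sum(i * p for i, p in enumerate(pi)) + k * sum(pi[x:])

# with mu = nu the symmetric model of Section 2 is recovered
g_sym = avg_cost(3, 1.0, 1.0, 5.0)

# when control is almost free (small k), always introducing catastrophes
# (the policy P_1) should be best, in line with the condition of this section
gs = {x: avg_cost(x, 1.0, 0.5, 0.01) for x in range(1, 7)}
```

Such a scan gives the minimising control limit numerically for any (ν, μ, k), which is useful precisely because the analytic minimisation is unavailable in the asymmetric case.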

Discussion
A widely used criterion for the optimal control of a stochastic process is the minimization of the expected long-run average cost per unit time. A usual method in the literature for proving that the optimal policy has a specific structure is to show that the solution of the corresponding average-cost optimality equation possesses specific properties such as monotonicity or convexity. This is usually achieved through the corresponding finite-horizon and infinite-horizon discounted-cost problems, since, in general, it is difficult to solve the average-cost optimality equation explicitly.
In the present paper, we present a problem of controlling a denumerable Markov chain in continuous time in which it is possible to solve the average-cost optimality equation explicitly. Consequently, a particular stationary policy is proved to be average-cost optimal, since in this problem some extra conditions on the cost rates and on the transition rates are valid.

Table 1 :
The critical numbers x * , x .

Table 2 :
The minimum average costs g x * , g x .