ON THE CONTROL OF A TRUNCATED GENERAL IMMIGRATION PROCESS THROUGH THE INTRODUCTION OF A PREDATOR

This paper is concerned with the problem of controlling a truncated general immigration process, which represents a population of harmful individuals, by the introduction of a predator. If the parameters of the model satisfy some mild conditions, the existence of a control-limit policy that is average-cost optimal is proved. The proof is based on the uniformization technique and on the variation of a fictitious parameter over the entire real line. Furthermore, an efficient Markov decision algorithm is developed that generates a sequence of improving control-limit policies converging to the optimal policy.


Introduction
In many problems dealing with the optimal control of a stochastic process under the criterion of minimizing the expected long-run average cost per unit time, it is possible to prove that the optimal policy initiates the controlling action if and only if the state of the process exceeds a critical level. Such a policy is usually called a control-limit policy. A method that in some problems leads to the proof of the optimality of a control-limit policy is a parametric analysis introduced by Federgruen and So [2] in a queueing model. According to this method, it is first shown that an optimal control-limit policy exists when a parameter (possibly fictitious) takes sufficiently small values. This assertion is then extended inductively from interval to interval of the parameter values. An important advantage of the Federgruen-So method is that in many cases, as a corollary, it can be proved that any local minimum within the set of the average costs of control-limit policies is a global minimum within this set. This result enables us to compute the optimal policy very quickly using the usual bisection procedure or a special-purpose policy iteration algorithm that creates a sequence of strictly improving control-limit policies.
The present paper is concerned with the problem of controlling a pest population, which grows stochastically according to a general immigration process in a habitat with finite capacity, through the introduction of a predator. It is assumed that the predator captures the pests one at a time and then emigrates from the habitat. The capture rate of the predator depends on the number of pests. A finite-state continuous-time Markov decision model is constructed and it is proved that there exists an average-cost optimal control-limit policy if the parameters of the model satisfy some mild conditions. The proof is based on the Federgruen-So method.
Note that the Federgruen-So method has been applied to two other Markov decision models for pest control. These models differ from the present one in the way the pest population grows or in the way the pest population is controlled. Specifically, in the first of these models (see [7]) it was assumed that the pest population grows according to a general immigration process in a habitat with finite capacity and is controlled through total catastrophes, which instantaneously annihilate the pest population. In the second model (see [8]) it was assumed that the pest population grows according to a Poisson process in a habitat with unlimited capacity and is controlled through the introduction of a predator. The capture rate of the predator was assumed to be constant.
The structure of the rest of the paper is as follows. In Section 2 we give a detailed description of the Markov decision process. In Section 3 we first find a necessary and sufficient condition under which the policy of never controlling is optimal. When this condition fails, the optimality of control-limit policies is shown by applying the Federgruen-So technique. In Section 4 a tailor-made policy iteration algorithm is developed that generates a sequence of improving control-limit policies and converges to the optimal policy.

The model
Consider a population of individuals that cause some kind of damage (e.g., pests) which grows stochastically according to a general immigration process in a habitat with carrying capacity $N$, where $N$ is a positive integer. Assume that the immigration rate that corresponds to each state $i$, $0 \le i \le N-1$, is equal to $\nu_i > 0$. The immigration rate $\nu_N$ that corresponds to the state $N$ is necessarily equal to zero, since $N$ is the carrying capacity of the habitat. It is assumed that the damage done by the pests is represented by a cost $c_i$, $0 \le i \le N$, for each unit of time during which the population size is $i$. We impose the natural assumptions that the sequence $\{c_i\}$ is non-decreasing and $c_0 = 0$.
We suppose that there is a controller who observes the evolution of the population continuously and may take an action that introduces a predator in the habitat whenever a new state is entered. That is, the controller takes actions in a discrete-time mode. More specifically, we assume that there exists a controlling mechanism which can be in one of two modes: on or off. Whenever the mechanism is turned off, the pest population evolves without being influenced. When it is turned on, a predator is introduced in the habitat after some random time that is exponentially distributed. The presence of the predator immediately stops the immigration of the pests, that is, the rates $\nu_i$, $0 \le i \le N-1$, immediately take the value 0. As soon as the predator is introduced in the habitat, it captures the pests one at a time until their population size is reduced to zero and then it emigrates with rate $\vartheta > 0$. It is assumed that the predator captures the pests with rate $\sigma_i > 0$ when their population size is $i$, $1 \le i \le N$. The unit of time has been chosen in such a way that the rate at which the predator is introduced in the habitat is equal to one. Thus, when the controlling mechanism is on, the length of time until the introduction of the predator is exponentially distributed with unit mean. Whenever the controlling mechanism is on, it incurs a cost of $K > 0$ per time unit.
Let $i$ and $i'$ be the states of the process at which the population size of the pests is $i$, $0 \le i \le N$, and the predator is absent from their habitat or present, respectively. A stationary policy $f$ is defined by a sequence $\{f_i : 0 \le i \le N\}$, where $f_i$ is the action taken when the process is at state $i$. It is assumed that $f_i = 1$ when the controlling mechanism is on, and $f_i = 0$ when the controlling mechanism is off. If the stationary policy $f \equiv \{f_i : 0 \le i \le N\}$ is used, our assumptions imply that we have a continuous-time Markov chain model for the population growth of the pests with state space $S = \{0, 0', 1, 1', \ldots, N, N'\}$.
Our goal is to find a policy that minimizes the expected long-run average cost per unit time for every initial state among all stationary policies. The decision epochs include the epochs at which an immigration of a pest occurs and the epochs at which the predator emigrates. An intuitively appealing class of policies is the class of control-limit policies $\{P_n : n = 0, 1, \ldots, N\}$, where $P_n$ is the stationary policy under which the controlling action is taken if and only if the population size of the pests is equal to or exceeds $n$. It seems reasonable that the optimal policy will be of control-limit type if $K$ is sufficiently small. In an earlier paper (see [6]) a similar model was introduced, in which the pest population grows in a habitat with unlimited capacity according to a simple immigration process. The cost rates $c_i$ and the capture rates were taken as $c_i = i$ and $\sigma_i = \sigma$, $i \ge 0$. In that work the optimality of a particular control-limit policy within the wider class of all stationary policies was established by proving that it satisfies the optimality equation and certain conditions given by [1].
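The transition structure induced by a control-limit policy can be made concrete with a short sketch. The function name `control_limit_rates` and the encoding of a state as a pair `(i, present)` are illustrative assumptions and do not appear in the paper; the rates themselves follow the verbal description of the model in this section.

```python
def control_limit_rates(n, nu, sigma, theta):
    """Transition rates of the chain under the control-limit policy P_n.

    A state is a pair (i, present): i pests, predator absent (False)
    or present (True). This encoding is an assumption of the sketch.

    nu[i]    : immigration rate in state i, 0 <= i <= N-1
    sigma[i] : capture rate with i pests, 1 <= i <= N (sigma[0] is unused)
    theta    : emigration rate of the predator from the empty habitat
    """
    N = len(nu)
    rates = {}
    for i in range(N + 1):
        out = {}
        if i < N:
            # Immigration continues while the predator is absent.
            out[(i + 1, False)] = nu[i]
        if i >= n:
            # Mechanism on: the predator arrives at rate one (choice of time unit).
            out[(i, True)] = 1.0
        rates[(i, False)] = out
        if i >= 1:
            # Predator present: pests are captured one at a time.
            rates[(i, True)] = {(i - 1, True): sigma[i]}
        else:
            # Habitat empty: the predator emigrates.
            rates[(0, True)] = {(0, False): theta}
    return rates
```

For example, `control_limit_rates(1, [2.0, 1.0], [0.0, 3.0, 4.0], 5.0)` describes a habitat of capacity 2 in which the mechanism is switched on as soon as one pest is present.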
In the present model, it seems difficult to repeat the same proof since the expression for the average cost under a control-limit policy is too complicated. However, if we impose some mild conditions on the parameters of the model, we can prove the existence of an optimal control-limit policy by applying the Federgruen-So technique, which, as mentioned in the previous section, is based on a variation of a parameter over the entire real line. The same technique has been applied in some other queueing and maintenance models (see Federgruen & So [3,4], So [13], So & Tang [14,15]) and in two pest control models (see Kyriakidis [7,8]).
The conditions that we impose on the parameters of the model are given below.
The proposition below, which can be proved by induction on i, gives a sufficient condition for the validity of Condition 1.

The optimality of control-limit policies
If the process is never controlled, the long-run average cost per unit time is $c_N$, since $N$ is an absorbing state in this case. In the proposition below a necessary and sufficient condition is given under which the policy of never controlling is optimal. Its proof is presented in the appendix.
Proposition 3.1. The policy that never introduces the predator in the habitat is optimal if and only if

Assume now that the relation (3.1) is not valid. In this case the policy that never introduces the predator is not optimal. The average cost of a stationary policy which prescribes action 0 at state $N$ is equal to the average cost of the policy that never introduces the predator, since $N$ is an absorbing state under such a stationary policy. Consequently, we can restrict ourselves to the stationary policies that prescribe action 1 at state $N$. All the results that we will present in the rest of this section are concerned with the optimal policy among these stationary policies. Let $r$ be a real number (possibly negative) that represents a fictitious cost incurred each unit of time the process is occupying the state $0'$. In Theorem 3.5 it will be shown that a control-limit policy is optimal for any fixed value of $r$, in particular for $r = 0$.
Let $T^{(n)}_{i0'}$ and $T^{(n)}_{i'0'}$, $0 \le i \le N$, be the expected time until the process under the policy $P_n$, $0 \le n \le N$, reaches the state $0'$, given that the initial state is $i$ or $i'$, respectively. Let also $C^{(n)}_{i0'}$ and $C^{(n)}_{i'0'}$, $0 \le i \le N$, be the expected cost until the process under the policy $P_n$, $0 \le n \le N$, reaches the state $0'$, given that the initial state is $i$ or $i'$, respectively. Conditioning on the first transition from the state $i$, we obtain:

Note also that

Given the above values of $T^{(n)}_{N0'}$ and $C^{(n)}_{N0'}$, the quantities $T^{(n)}_{i0'}$ and $C^{(n)}_{i0'}$, $i = N-1, \ldots, n$, can be found from (3.2) and (3.3), recursively.

Let $g_n$ denote the expected long-run average cost per unit time under the policy $P_n$, $0 \le n \le N$. The process under the policy $P_n$ is a regenerative process, where the successive entries into state $0'$ can be taken as regenerative epochs between successive cycles. From a well-known regenerative argument (see [11, Proposition 5.9]) it follows that $g_n$ is equal to the expected cost of a cycle divided by the expected time of the cycle. Hence,

Let $h^{(n)}_i$, $0 \le i \le N$, be the relative value associated with the policy $P_n$, $0 \le n \le N$, that corresponds to the state $i$, and let $w^{(n)}_i$, $0 \le i \le N$, be the relative value associated with the policy $P_n$, $0 \le n \le N$, that corresponds to the state $i'$. These quantities are defined by (see relation (3.1.7) in Tijms [16])

Clearly, since $g_n = C^{(n)}_{0'0'} / T^{(n)}_{0'0'}$, by the usual regenerative argument. According to the semi-Markov version of Theorem 3.1.1 in Tijms [16] (see [16, page 220]) the numbers $h^{(n)}_i$, $w^{(n)}_i$, $0 \le i \le N$, and $g_n$ satisfy the system of equations:

The results of Lemmas 3.2, 3.3 and 3.4 will be used in the proof of Theorem 3.5. The proof of Lemma 3.2 is similar to the proof of Proposition 3 in [8] and the proof of Lemma 3.3 is similar to the proof of Lemma 2 in [7].
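The regenerative computation of $g_n$ described above can be sketched in code. This is a minimal illustration and not a transcription of the paper's displayed formulas, which are not reproduced here. It rests on assumptions labelled in the comments: the damage cost $c_i$ also accrues while the predator is present, the cost $K$ accrues only while the mechanism is on and the predator has not yet arrived, and the only cost in state $0'$ is the fictitious rate $r$. The function name `average_cost` is hypothetical.

```python
def average_cost(n, nu, sigma, c, theta, K, r=0.0):
    """Average cost g_n of the control-limit policy P_n via cycles between visits to 0'.

    nu[i], 0 <= i <= N-1 : immigration rates
    sigma[i], 1 <= i <= N: capture rates (sigma[0] unused)
    c[i], 0 <= i <= N    : damage cost rates, c[0] = 0
    """
    N = len(nu)
    # Predator present: expected time and cost from i' down to 0'.
    # ASSUMPTION: cost rate c[i] is paid in state i', and K is not.
    Tp = [0.0] * (N + 1)
    Cp = [0.0] * (N + 1)
    for i in range(1, N + 1):
        Tp[i] = Tp[i - 1] + 1.0 / sigma[i]
        Cp[i] = Cp[i - 1] + c[i] / sigma[i]
    # Predator absent, mechanism on: start at N, then recurse for i = N-1, ..., n.
    T = [0.0] * (N + 1)
    C = [0.0] * (N + 1)
    T[N] = 1.0 + Tp[N]                 # wait a mean of one time unit for the predator
    C[N] = (c[N] + K) + Cp[N]
    for i in range(N - 1, n - 1, -1):
        total = 1.0 + nu[i]            # exit rate: predator arrival or immigration
        T[i] = (1.0 + nu[i] * T[i + 1] + Tp[i]) / total
        C[i] = (c[i] + K + nu[i] * C[i + 1] + Cp[i]) / total
    # Predator absent, mechanism off: the population climbs from i to i + 1.
    for i in range(n - 1, -1, -1):
        T[i] = 1.0 / nu[i] + T[i + 1]
        C[i] = c[i] / nu[i] + C[i + 1]
    # One cycle: sojourn in 0' (mean 1/theta, ASSUMED cost rate r), then 0 back to 0'.
    cycle_time = 1.0 / theta + T[0]
    cycle_cost = r / theta + C[0]
    return cycle_cost / cycle_time
```

Under these assumptions the returned value is increasing in `r` for every `n`, which matches the monotonicity of $g_n$ in $r$ used later in the proof of Theorem 3.5.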
Lemma 3.2. The policy $P_n$ is optimal if and only if

(3.13)

Lemma 3.3. Assume that the policy $P_n$, $n < N$, is optimal for some fixed value $R$ of the parameter $r$. Then, it is impossible for the policy $P_n$ to be optimal for all $r \ge R$ (simultaneously).
(ii) Condition 2 implies that the sequence $\{B^{(n)}_i\}$ is non-decreasing in $i$, $n \le i < N$, for each $n = 0, 1, \ldots$.

Theorem 3.5. There exists a sequence

Proof. The proof is by induction on $n$. We first establish that a number $R > -\infty$ exists such that the policy $P_0$ is optimal for all $r \le R$. In view of Lemma 3.2, it suffices to show that the numbers $h^{(0)}_i$ and $w^{(0)}_i$, $0 \le i \le N$, and $g_0$ satisfy the inequalities:

With $n = 0$ the above inequalities reduce to

with $0 \le i \le N-1$. Note that the process under $P_0$ must pass through the state $i'$ before it enters the state $0'$, if the initial state is $i+1$. Hence, $T^{(0)}_{i+1,0'} - T^{(0)}_{i'0'} > 0$. From (3.5) we have that $g_0 \to -\infty$ as $r \to -\infty$. Thus, there exists a number $R > -\infty$ such that (3.15) hold simultaneously for all $r \le R$. From Lemma 3.3 it follows that $R_1 < +\infty$, where $R_1 = \sup\{w : w \ge R$ and the policy $P_0$ is optimal for all $r \le w\}$.
Suppose that there exists a sequence

where $n < N$, such that the policy $P_s$, $0 \le s \le n$, is optimal for all $r \in [R_s, R_{s+1}]$ with $R_{s+1} = \sup\{w : w \ge R_s$ and the policy $P_s$ is optimal for all $r \in [R_s, w]\} < +\infty$. We will show that the policy $P_{n+1}$ is optimal for $r = R_{n+1}$. To achieve this, we use the standard uniformization technique (see Serfozo [12]) to transform the original Markov decision process into an equivalent one in which the times between transitions have the same exponential parameter $\nu = \max_{1 \le i \le N} \{\nu_0, 1 + \nu_i, \sigma_i, \vartheta\}$ whatever the state and the action are. The reformulated Markov decision process has the same average cost as the original one under any stationary policy. Thus both models have the same optimal policy. Let $g_n$ and $h^{(n)}_i$, $w^{(n)}_i$, $0 \le i \le N$, denote the average cost and the relative values under the policy $P_n$ in the new model. Let also $T^{(n)}_{i0'}$, $T^{(n)}_{i'0'}$, and $C^{(n)}_{i0'}$, $C^{(n)}_{i'0'}$, $0 \le i \le N$, be the expected times and costs, respectively, until the new process under the policy $P_n$, $1 \le n \le N$, reaches the state $0'$, given that the initial state is $i$ or $i'$.

Consider now some $\varepsilon > 0$. If $r = R_{n+1} + \varepsilon$, the policy $P_n$ is not optimal for the original and, consequently, for the new model. Hence, according to the corresponding result of Lemma 3.2 for the equivalent model, one of the following two cases occurs:

Case 1. For some $i$ with $0 \le i \le n-1$:

The above inequality is equivalent to $\psi_i(R_{n+1} + \varepsilon) > 0$, with

where the last equality follows from the fact that the original and the reformulated process have the same generator (see Serfozo [12]). Since $T^{(n)}_{i0'} - T^{(n)}_{i'0'} > 0$ and $g_n$ as given in (3.5) is increasing in $r$, we deduce that $\psi_i(r)$ is decreasing in $r$. Thus,

where the last inequality follows from the optimality of $P_n$ for $r = R_{n+1}$. Clearly, this is a contradiction and the following Case 2 must arise.
Case 2. For some $i$ with $n \le i \le N$:

The above inequality is equivalent to $\psi_i(R_{n+1} + \varepsilon) < 0$, with

(3.20)

From Lemma 3.4 we deduce that $\psi_i(r)$, $n \le i \le N$, is non-decreasing in $i$. Thus,

Consider a sequence $\{\varepsilon_\ell\} \downarrow 0$. In view of the above inequality we have that for all $\ell$, $\psi_n(R_{n+1} + \varepsilon_\ell) < 0$. From the continuity of $\psi_n(r)$ in $r$ it follows that $\psi_n(R_{n+1}) \le 0$. However, $\psi_n(R_{n+1}) \ge 0$ since the policy $P_n$ is optimal for $r = R_{n+1}$. Thus, $\psi_n(R_{n+1}) = 0$. The last equality means that in the new model for $r = R_{n+1}$ the actions prescribed, for each state $i$, by the policy $P_{n+1}$ minimize the right-hand side of the optimality equation (see [16, equation (3.5.4)]), which is satisfied by the numbers $h^{(n)}_i$, $w^{(n)}_i$, $0 \le i \le N$, and $g_n$. Thus the policy $P_{n+1}$ is optimal for $r = R_{n+1}$ in the new and, consequently, in the original model.
Consider again the original model. If $n + 1 < N$, we define $R_{n+2} = \sup\{w : w \ge R_{n+1}$ and the policy $P_{n+1}$ is optimal for all $r \in [R_{n+1}, w]\}$. From Lemma 3.3 it follows that $R_{n+2} < \infty$. If $n + 1 = N$, it can be shown that the policy $P_N$ is optimal for all $r \ge R_N$, using a similar analysis as in Case 1 without transforming the model.

Lemma 3.6. $\{T^{(n)}_{n0'}\}$, $0 \le n \le N$, is non-decreasing in $n$.

Proof. Conditioning on the first transition from state $n$ we have that

where $p_{n+1,j}(t)$ is the probability that the state of the (uncontrolled) general immigration process at time $t$ will be $j$, given that the state at time 0 is $n + 1$. The relations (3.23) and (3.24) give the result of the lemma.
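The uniformization step used in the proof of Theorem 3.5 can be illustrated with a small sketch that turns a table of transition rates into the one-step transition probabilities of an equivalent chain in which every sojourn time has the same exponential rate. The function name `uniformize` and the dictionary layout are assumptions of this sketch. By default it uses the largest total exit rate of the supplied chain, which need not coincide with the constant $\nu$ chosen in the text; any constant at least that large works.

```python
def uniformize(rates, big_rate=None):
    """Uniformize a continuous-time chain given as {state: {target: rate}}.

    Returns (probs, big_rate), where probs[state][target] is the one-step
    transition probability of the uniformized chain. The probability mass
    not used by real transitions becomes a fictitious self-transition.
    """
    if big_rate is None:
        # ASSUMPTION: default to the largest total exit rate in the chain.
        big_rate = max(sum(out.values()) for out in rates.values())
    probs = {}
    for state, out in rates.items():
        row = {target: q / big_rate for target, q in out.items()}
        stay = 1.0 - sum(out.values()) / big_rate
        row[state] = row.get(state, 0.0) + stay
        probs[state] = row
    return probs, big_rate
```

Multiplying each probability by `big_rate` and subtracting `big_rate` on the diagonal recovers the original generator, which is the fact used in the proof when the original and the reformulated process are said to have the same generator.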
From (3.5) and Lemma 3.6 we deduce that $g_n$ can be written as

where $g_n$ is independent of $r$ and $\pi_n$ is decreasing in $n$. Using this result and Theorem 3.5, the following proposition, which will be useful in the computation of the optimal control-limit policy, can be proved in the same way as Lemma 5.2 in Federgruen and So [2].
Proposition 3.7. For any fixed $r$, any local minimum within the set $\{g_n : 0 \le n \le N\}$ is a global minimum within this set.
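Proposition 3.7 means that a search over the critical level can stop at the first local minimum. The sketch below uses a plain left-to-right scan rather than the bisection or policy iteration procedures mentioned in the paper; the function name `first_local_minimum` and the callable `g`, which is assumed to return the average cost $g_n$ for a given $n$, are hypothetical.

```python
def first_local_minimum(g, N):
    """Return the smallest n in 0..N at which g stops strictly decreasing.

    If every local minimum of g over 0..N is a global minimum, as
    Proposition 3.7 asserts for the average costs g_n, the returned
    index is a global minimizer.
    """
    best = 0
    current = g(0)
    while best < N:
        nxt = g(best + 1)
        if nxt >= current:
            break              # g(best) <= g(best + 1): local minimum reached
        best += 1
        current = nxt
    return best
```

With the illustrative cost sequence `[5.0, 3.0, 2.0, 2.5, 4.0]`, the call `first_local_minimum(lambda n: costs[n], 4)` stops at index 2 after evaluating only the first four costs.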

The computation of the optimal policy
In this section we assume that $r = 0$. So, we consider again the model introduced in Section 2. In view of Theorem 3.5, if condition (3.1) fails, there exists an optimal control-limit policy $P_{n^*}$. From Proposition 3.7 it follows that the optimal critical point $n^*$ can be on the time until the introduction of the predator we obtain T