THREE DIFFERENT OPERATIONS RESEARCH MODELS FOR THE SAME (s, S) POLICY

Operations Research techniques are usually presented as distinct models. Difficult as it may often be, establishing links between these models can reveal their interdependence and make them easier for the user to understand. In this article, three different models, namely Markov Chain, Dynamic Programming, and Markov Sequential Decision Processes, are used to solve an inventory problem based on the periodic review system. We show how the three models converge to the same (s, S) policy, and we provide a numerical example to illustrate this convergence.


Introduction
Operations Research is usually perceived as a set of models, each of which is applicable to a specific type of problem. Operations Research textbooks often fail to establish links between these models and deal with them as "unrelated" topics. Such links are essential to the integrity of Operations Research. The present article aims at linking three different Operations Research models, namely Markov Chain, Dynamic Programming, and Markov Sequential Decision Processes, by applying each of them to the same inventory control problem. The article explains how a solution is obtained by each of the three models and how the three solutions are equivalent even though they may look quite different.

The Problem and the (s, S) Policy Solution
Let us consider a hypothetical company estimating the distribution of demand D for one of the items it produces by P[D = j] = p_j, for j ∈ {m, . . ., M}, where P[D = j] is the probability of having a level of demand equal to j, and p_j the value of that probability. The demand for any period n can be satisfied by the quantity x_n produced during period n and/or the quantity i_n available in inventory at the beginning of n. A holding cost c_h is incurred for every unit stored from one period to another, and a stockout cost c_u is incurred for every unit unavailable when requested (lost sale). The production cost c_g(x_n), expressed as a function of the quantity produced x_n, is assumed to be zero when x_n equals zero and concave for x_n > 0.
Since no specific inventory policy has been adopted, the management of the company is now interested in developing a process control system whereby reorder decisions are automatically generated according to a production policy δ_n that associates to each inventory level i_n, at the beginning of period n, a fixed production quantity δ_n(i_n) chosen from the set of possible production quantities {x_n}.
Scarf [1] proved the existence, for each period n, of an optimal production policy δ*_n that brings the inventory level up to a target level S*_n whenever the initial inventory position i_n for the item is lower than (or equal to) a determined value s*_n. One important feature of our problem is that the cost functions, the demand distribution, and the possible levels of initial inventory are the same for all periods. This implies the existence of a steady state, so that to any possible value of initial inventory i corresponds one optimal policy δ*(i), independently of the period n. Therefore, our concern is to find the optimal decision policy δ* that associates to each inventory position i the production quantity δ*(i) that minimizes the total production, holding, and stockout costs over an infinite horizon. Such a policy is determined by the two optimal values s* and S* of the two variables s and S, respectively:

δ*(i) = S* − i if i ≤ s*, and δ*(i) = 0 if i > s*. (1)

Further, the values of i can never exceed S (the highest possible level) minus m (the lowest possible demand):

0 ≤ i ≤ S − m. (2)

Constraints (1) and (2) implicitly require that:

S > s and S ≥ m. (3)

Moreover, we assume an inventory capacity restriction of K units, so that:

s < S and S ∈ {m, m+1, . . ., K}. (4)
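As a minimal sketch of the order-up-to rule in (1), the stationary policy can be written as a small Python function; the values s = 0 and S = 3 below are those of the numerical example discussed later in the article:

```python
def order_up_to_policy(s, S):
    """Stationary (s, S) policy of (1): produce up to the target level S
    whenever initial inventory i is at or below the reorder point s;
    otherwise produce nothing."""
    def delta(i):
        return S - i if i <= s else 0
    return delta

delta = order_up_to_policy(s=0, S=3)   # the (0, 3) policy of the example
```

With these values, delta(0) = 3 while delta(1) = delta(2) = 0; that is, production is triggered only when the inventory is empty.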

The Markov Chain Model
The inventory level I_n at the beginning of each period n is a discrete-time stochastic process whose possible values are {0, 1, . . ., S − m}, as stated in (2). Since I_n is always equal to I_{n−1} plus production minus sales, its probability distribution depends on the inventory level I_{n−1} and not on the states the stochastic process passed through on the way to I_{n−1}. For all states i and k and all periods n, the probability that the system, being in state i at the beginning of period n−1, will be in state k at the beginning of period n does not depend on n, but does depend on the specified policy (s, S). Therefore, the transition probabilities can be written as a stationary transition matrix. Let y and z be two natural numbers verifying 0 ≤ y ≤ M and m ≤ z ≤ S; the transition matrix Q^{sS} from the states i = 0, 1, . . ., s, s+1, . . ., M−y, . . ., S−z, . . ., S−m to the states k = 0, 1, . . ., M−y, . . ., S−z, . . ., S−m can be represented as shown below, where p_j = 0 for all j < 0 (e.g., p_{m−z} = 0 if z > m) and Σ_{i=x}^{y} p_i = 0 for all y < x (e.g., Σ_{i=S}^{M} p_i = 0 if S > M). At optimality, we must have s < M, for if we had enough stock to satisfy all the demand of the period, there would be no need to order and incur unnecessary holding cost [2]. However, as we are uncertain whether M < S or M > S, we include the two parameters y and z.
As none of the states in the chain is transient or periodic, and since all of them communicate with each other, we can conclude that the chain is ergodic [3], [4]. Therefore, there exists a steady-state distribution π^{sS} = [π^{sS}_0, π^{sS}_1, . . ., π^{sS}_{S−m}] for the chain, which can be calculated by solving the system π^{sS} Q^{sS} = π^{sS} together with the normalization condition Σ_i π^{sS}_i = 1. Let us call g(s, S), h(s, S), and u(s, S) the expected per-period production, holding, and stockout costs, respectively, as functions of the reorder point s and the target level S. Let w(s, S) be the expected total cost, equal to the sum of the three functions (6)–(8). The optimal values s* and S* can be obtained by minimizing w(s, S) = g(s, S) + h(s, S) + u(s, S) subject to (4):

min w(s, S) = g(s, S) + h(s, S) + u(s, S), s.t. s < S and S ∈ {m, m+1, . . ., K}.
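To make the steady-state system concrete, the following sketch builds Q^{sS} for a given policy and solves π Q = π, Σ_i π_i = 1 numerically. The demand distribution below is a hypothetical stand-in, not the article's data:

```python
import numpy as np

# Hypothetical demand distribution P[D = j] = p_j (not the article's data)
p = {1: 0.3, 2: 0.5, 3: 0.2}
m = min(p)                                # lowest possible demand

def transition_matrix(s, S):
    """Q[i, k] = P[next inventory = k | current inventory = i] under
    the (s, S) policy, with unmet demand lost (state floored at 0)."""
    n = S - m + 1                         # states 0, 1, ..., S - m
    Q = np.zeros((n, n))
    for i in range(n):
        produce = S - i if i <= s else 0  # order-up-to rule
        for j, pj in p.items():
            Q[i, max(0, i + produce - j)] += pj
    return Q

def steady_state(Q):
    """Solve pi Q = pi with sum(pi) = 1 by replacing one balance
    equation with the normalization condition."""
    n = Q.shape[0]
    A = np.vstack([(Q.T - np.eye(n))[:-1], np.ones(n)])
    b = np.zeros(n); b[-1] = 1.0
    return np.linalg.solve(A, b)

Q = transition_matrix(s=0, S=3)
pi = steady_state(Q)                      # stationary distribution
```

Replacing one balance equation by the normalization condition makes the linear system nonsingular for an ergodic chain; the expected per-period costs g, h, and u are then weighted sums over this π.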

The Dynamic Programming Model
Let the period n be the phase and the inventory level i_n at the beginning of period n the state. The process evolves from state i_n to state i_{n+1} as

i_{n+1} = max{0, i_n + δ_n(i_n) − j},

where j belongs to the set {m, . . ., M} and δ_n(i_n) is the quantity to produce during period n according to the policy δ_n, as a function of the initial inventory level i_n. Let v(i_n, δ_n(i_n)) denote the expected total cost of production, holding, and stockout for any period n having an initial inventory of i_n units and a production of δ_n(i_n) units:

v(i_n, δ_n(i_n)) = c_g(δ_n(i_n)) + Σ_{j=m}^{M} p_j [c_h · max{0, i_n + δ_n(i_n) − j} + c_u · max{0, j − i_n − δ_n(i_n)}]. (10)

The objective is to minimize the expected total cost for the periods 1, 2, . . ., given that the inventory level is initially i_1. If we denote the objective function by f_1(i_1), we can generate a more general function f_n(i_n), defined as the minimal expected total cost for the periods n, n+1, . . ., given that i_n units are initially available in inventory. The recurrence relation between f_n(i_n) and f_{n+1}(i_{n+1}) can be expressed as:

f_n(i_n) = min_{δ_n(i_n)} { v(i_n, δ_n(i_n)) + Σ_{j=m}^{M} p_j f_{n+1}(max{0, i_n + δ_n(i_n) − j}) }. (12)

However, dynamic programming models require a finite horizon [5], [6], since f_n(i_n) in (12) cannot be computed before f_{n+1}(i_{n+1}). This imposes a last period N as the starting point of the recurrence relation. N should be chosen large enough to enable the process to reach a steady state. For the first periods, one optimal policy δ*(i) corresponds to each possible value of initial inventory i, independently of the period n. The last periods, however, could be different, as they may carry the effect of the introduction of the "dummy" last period N. The solution of the dynamic program is achieved first by minimizing v(i_N, δ_N(i_N)) to obtain δ*_N(i). Then, we use the recursion in (12) to find δ*_{N−1}(i), δ*_{N−2}(i), . . .
and so on, until the procedure reaches a period beyond which the optimal policy no longer changes, thereby solving the problem for the last L periods only, over the domain

i ∈ {0, 1, . . ., K − m} and δ(i) ∈ {0, 1, . . ., K − i}. (16)

The first part of (16) is justified exactly in the same way as (2): i can never exceed the highest possible level (K) minus the lowest possible demand (m). The second part is directly obtained from (4).
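The backward recursion (12) can be sketched as follows. The demand distribution, capacity, and cost figures are hypothetical placeholders (the article's numerical data are not restated here); the structure, however, is exactly f_n(i) = min_x { v(i, x) + Σ_j p_j f_{n+1}(max{0, i + x − j}) }:

```python
p = {1: 0.3, 2: 0.5, 3: 0.2}            # hypothetical demand distribution
K = 3                                   # inventory capacity (assumed)
c_h, c_u = 1.0, 10.0                    # hypothetical holding / stockout costs

def c_g(x):                             # concave production cost, zero at x = 0
    return 0.0 if x == 0 else 3.0 + 2.0 * x

def v(i, x):
    """One-period expected production + holding + stockout cost, as in (10)."""
    return c_g(x) + sum(pj * (c_h * max(0, i + x - j) +
                              c_u * max(0, j - i - x)) for j, pj in p.items())

def solve_dp(N):
    """Backward recursion (12) from the dummy last period N; returns the
    period-1 policy, which is stationary when N is large enough."""
    states = range(K - min(p) + 1)      # i in {0, ..., K - m}, as in (16)
    f_next = {i: 0.0 for i in states}   # boundary condition f_{N+1} = 0
    policy = {}
    for _ in range(N):
        f = {}
        for i in states:
            f[i], policy[i] = min(
                (v(i, x) + sum(pj * f_next[max(0, i + x - j)]
                               for j, pj in p.items()), x)
                for x in range(K - i + 1))   # feasibility: i + x <= K
        f_next = f
    return policy
```

Calling solve_dp(50), for instance, runs the recursion far enough back that the returned period-1 decisions no longer depend on the dummy terminal period.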

The Markov Sequential Decision Processes Model
A Markov sequential decision process can be defined as an infinite-horizon probabilistic dynamic program. It can also be defined as a Markov process with a finite number of states and with an economic value structure associated with the transitions from one state to another [3], [7]. In our case, the state continues to be the initial inventory of the period. Let f_δ(i) be the expected cost incurred during an infinite number of periods, given that, at the beginning of period 1, the state is i and the stationary policy δ is followed:

f_δ(i) = v(i, δ(i)) + Σ_{j=m}^{M} p_j f_δ(max{0, i + δ(i) − j}), (17)

where v(i, δ(i)) is the expected cost incurred during the current period, as defined in (10). The horizon being infinite, f_δ(i) will also be infinite.
To cope with this problem, we can use the expected discounted total cost. We assume that $1 paid in the next period has the same value as β dollars paid during the current period. Let V_δ(i) be the expected discounted cost incurred during an infinite number of periods, given that, at the beginning of period 1, the state is i and the stationary policy δ is followed:

V_δ(i) = v(i, δ(i)) + β Σ_{j=m}^{M} p_j V_δ(max{0, i + δ(i) − j}), (18)

where β Σ_{j=m}^{M} p_j V_δ(max{0, i + δ(i) − j}) is the expected cost, discounted back to the beginning of period 2, incurred from the beginning of period 2 onward. The smallest value of V_δ(i), which we denote by V(i), is the expected discounted cost incurred during an infinite number of periods, provided that the state at the beginning of period 1 is i and the optimal stationary policy δ* is followed:

V(i) = min_δ V_δ(i). (19)

Using (16) and (18), equality (19) can be equivalently written as, for i = 0, . . ., K − m:

V(i) = min_{δ(i) ∈ {0, . . ., K−i}} { v(i, δ(i)) + β Σ_{j=m}^{M} p_j V(max{0, i + δ(i) − j}) }. (20)

This can be transformed into the following K − m + 1 linear programs:

max V(i); for i = 0, . . ., K − m (21)

s.t. V(i) ≤ v(i, δ(i)) + β Σ_{j=m}^{M} p_j V(max{0, i + δ(i) − j}), for all δ(i) ∈ {0, . . ., K − i}. (22)

It can be shown [8] that the solutions of the K − m + 1 interdependent linear programs (21)–(22) are achieved simply by taking the sum of all the objectives, thus obtaining a single-objective linear program.
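The article solves (20) through the linear program; as a numerical cross-check, (20) can also be solved by successive approximations, since its right-hand side is a contraction for β < 1. All figures below (demand, costs, β) are hypothetical:

```python
p = {1: 0.3, 2: 0.5, 3: 0.2}             # hypothetical demand distribution
K, beta = 3, 0.9                          # capacity and discount factor (assumed)
c_h, c_u = 1.0, 10.0

def c_g(x):                               # concave production cost
    return 0.0 if x == 0 else 3.0 + 2.0 * x

def v(i, x):                              # one-period expected cost, as in (10)
    return c_g(x) + sum(pj * (c_h * max(0, i + x - j) +
                              c_u * max(0, j - i - x)) for j, pj in p.items())

def discounted_values(tol=1e-10):
    """Iterate the right-hand side of (20) to its fixed point V(i)."""
    states = range(K - min(p) + 1)        # i in {0, ..., K - m}
    V = {i: 0.0 for i in states}
    while True:
        V_new = {i: min(v(i, x) + beta * sum(pj * V[max(0, i + x - j)]
                                             for j, pj in p.items())
                        for x in range(K - i + 1))
                 for i in states}
        if max(abs(V_new[i] - V[i]) for i in states) < tol:
            return V_new
        V = V_new

V = discounted_values()                   # fixed point of (20)
```

Each sweep shrinks the distance to the fixed point by at least a factor β, so the iteration converges to the same V(i) that the linear program produces.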

Linking the Models
First we show the link between the last two models, then between the first and the last ones.

Linking the Dynamic Programming Model and the Markov Decision Process Model
The solution of the dynamic program is that of (15)–(16). However, as the horizon is initially infinite, we can choose n sufficiently large that f_n(i) = f_{n+1}(i) for all i. This allows the writing of (15)–(16) as, for i = 0, . . ., K − m:

f(i) = min_{δ(i) ∈ {0, . . ., K−i}} { v(i, δ(i)) + Σ_{j=m}^{M} p_j f(max{0, i + δ(i) − j}) }. (25)

The same equality is obtained by giving β the value 1 in (20), for i = 0, . . ., K − m:

V(i) = min_{δ(i) ∈ {0, . . ., K−i}} { v(i, δ(i)) + Σ_{j=m}^{M} p_j V(max{0, i + δ(i) − j}) }. (26)

Therefore, both (15)–(16) and (20) are obtained from (25). The two models diverged when dealing with the infinite value of the function in (25): in (15)–(16) a finite number of periods was fixed, while in (20) the expected cost was discounted.

Linking the Markov Decision Process Model and the Markov Chain Model
Let us focus on (17), which was the starting point of the Markov sequential decision processes model. To simplify the representation, we assume that the state evolves from i_0 to i_{j_0}, then i_{j_1}, i_{j_2}, i_{j_3}, . . . This means that we denote max{0, i + δ(i) − j_k} by i_{j_k}:

Solution of the Dynamic Programming Model
The solutions for the periods N and N−1 are provided in the following table, where i, i_{N−1}, i_N, δ(i), δ_{N−1}(i), and δ_N(i) are as defined in (16), v(i, δ(i)) is as defined in (10), and i_N = max{0, i + δ(i) − j}, as in (10). Based on the last column of the table, the optimal policy is to produce 3 only when i = 0. The same solution is obtained for f_{N−2}, f_{N−3}, . . . (calculations not shown), which means that (0, 3) is the optimal steady-state policy (as found previously). Solving the single-objective linear program of the Markov sequential decision processes model gives V(0) = 1150.382, V(1) = 1143.793, and V(2) = 1137.963. The discounted expected cost for the infinite horizon, which we denote by W, can be calculated on the basis of the steady-state probabilities. The same value of W can be found by dividing w(0, 3) by 1 − β. This illustrates the convergence of the three models.
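The identity behind this last step can be checked numerically: if π is the steady-state distribution of the chain induced by the optimal stationary policy and V solves (20), then Σ_i π_i V(i) = w/(1 − β) exactly, where w is the expected per-period cost of that policy. The sketch below verifies this on hypothetical data (the article's own figures, e.g. V(0) = 1150.382, come from data not restated here):

```python
import numpy as np

p = {1: 0.3, 2: 0.5, 3: 0.2}               # hypothetical demand distribution
K, beta = 3, 0.9                            # capacity and discount factor (assumed)
c_h, c_u = 1.0, 10.0                        # hypothetical unit costs
m = min(p)
states = range(K - m + 1)                   # i in {0, ..., K - m}

def c_g(x):                                 # concave production cost
    return 0.0 if x == 0 else 3.0 + 2.0 * x

def v(i, x):                                # one-period expected cost, as in (10)
    return c_g(x) + sum(pj * (c_h * max(0, i + x - j) +
                              c_u * max(0, j - i - x)) for j, pj in p.items())

def q(i, x):                                # discounted cost of producing x in state i
    return v(i, x) + beta * sum(pj * V[max(0, i + x - j)] for j, pj in p.items())

# 1. Solve the discounted Bellman equation (20) by value iteration.
V = {i: 0.0 for i in states}
for _ in range(2000):
    V = {i: min(q(i, x) for x in range(K - i + 1)) for i in states}

# 2. Extract the greedy stationary policy and build its Markov chain.
delta = {i: min(range(K - i + 1), key=lambda x: q(i, x)) for i in states}
n = len(V)
Q = np.zeros((n, n))
for i in states:
    for j, pj in p.items():
        Q[i, max(0, i + delta[i] - j)] += pj
A = np.vstack([(Q.T - np.eye(n))[:-1], np.ones(n)])
b = np.zeros(n); b[-1] = 1.0
pi = np.linalg.solve(A, b)                  # steady-state distribution

# 3. Both routes to W coincide: sum_i pi_i V(i) equals w / (1 - beta).
w = sum(pi[i] * v(i, delta[i]) for i in states)
W_mdp = sum(pi[i] * V[i] for i in states)
W_chain = w / (1 - beta)
```

The agreement of W_mdp and W_chain is exactly the convergence the article illustrates: the Markov chain route (steady-state costs) and the Markov decision process route (discounted values) price the same stationary (s, S) policy.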

Conclusion
In this paper, we used three different models to solve the same problem, based on the same notation, the same data, and the same assumptions. Despite some similarities, the three models approached the problem in different ways. Having different theoretical bases, the resulting formulations showed major differences, but they all converged to the same optimal solution, as illustrated by the numerical application. Such convergence is justified by the fact that all three models lead to an exact solution, which is the optimal (s, S) policy.
