A Risk-Averse Inventory Model with Markovian Purchasing Costs

We study several dynamic risk-averse inventory models with additive utility functions and incorporate Markovian purchasing costs. Such costs reflect market conditions in a global supply chain, for example fluctuations in exchange rates or the existence of product spot markets. We formulate the problems as finite- and infinite-horizon Markov decision processes (MDPs). For the finite-horizon models, we first prove (joint) concavity of the model for each state and obtain a (modified) base-stock optimal policy. We then conduct a comparative static analysis of the model parameters and derive monotone properties of the optimal solutions. For the infinite-horizon models, we show the existence of stationary base-stock optimal policies and the inheritance of the monotone properties proven for the finite-horizon models.


Introduction
Various inventory problems have been studied as dynamic programming models in the supply chain literature. In these models, a product is sold over a (finite or infinite) time horizon. On the one hand, when demand exceeds supply, the shortage is backordered and fulfilled from supply in the next period at a backordering cost. On the other hand, when supply exceeds demand, the excess inventory is carried to meet potential demand in the next period at a holding cost. The firm's objective is to determine the ordering quantity that optimizes expected total profit or cost.
The literature focuses mainly on risk-neutral performance measures, in which the firm maximizes the expected total profit or minimizes the expected total cost. This implies that inventory managers are risk-neutral. Risk neutrality yields the best decision only on average, and in that sense it is consistent with rational decision making. However, we cannot assume that all inventory managers are risk-neutral. In the supply chain literature, Schweitzer and Cachon [1] provide experimental evidence suggesting that inventory managers may be risk-averse for products with high profits.
In recent years, risk-averse inventory models have received increasing attention in the literature. Most work to date focuses on single-period (newsvendor) models. For single-period risk-averse models, Chen et al. [2] provide an excellent review and summary of the literature in this direction, and Choi et al. [3] review the more recent literature after Chen et al. [2]. For multiperiod risk-averse models, Bouakiz and Sobel [4] initiated the analysis with exponential utility functions and characterized the optimal ordering policies. Subsequently, Chen et al. [2] and Chen and Sun [5] studied dynamic inventory models in conjunction with hedging opportunities in financial markets. The typical key contribution of this stream of research is the characterization of optimal base-stock policies in dynamic inventory models.
This paper follows Chen et al. [2] and Chen and Sun [5], who consider several risk-averse models with a finite time horizon in the former and with an infinite time horizon in the latter. Chen et al. [2] consider several joint inventory and pricing models with and without financial hedging opportunities. They establish a consumption model with an income flow in a multiperiod setting. As risk measures, they use additive exponential utility functions to analyze the models and derive state-dependent (modified) base-stock optimal policies. The work of Chen and Sun [5] is a natural extension of Chen et al. [2] to an infinite time horizon. They also use additive exponential utility functions, but without financial hedging opportunities. As a result, they obtain state-dependent (modified) base-stock optimal policies over an infinite time horizon.
In our work, we add Markovian (discrete-state) behavior of purchasing costs, which distinguishes this work from Chen et al. [2] and Chen and Sun [5], where the model parameters are fixed or at most determined deterministically by their historical trajectory. That is, they do not consider probabilistic characteristics in their model parameters. Our Markovian purchasing costs reflect typical market situations in global supply chains, such as fluctuations in exchange rates or the existence of product spot markets. Inventory managers can benefit by exploiting such cost changes. Thus, a (random) fluctuation in purchasing cost affects the optimal ordering quantity significantly, and it has been studied frequently in the literature on risk-neutral inventory models (e.g., Gavirneni [6] and Yang and Xia [7]).
Gavirneni [6] considers a risk-neutral multiperiod inventory model. By analyzing the model with both finite and infinite time horizons, he obtains a base-stock optimal policy and, under some conditions, a monotone property of the impact of purchasing cost fluctuations on the optimal ordering quantity. Yang and Xia [7] study a continuous-review risk-neutral inventory system with a continuous MDP (Markov decision process) formulation. They identify conditions under which the base-stock order-up-to level is monotone in the (random) fluctuations in purchasing costs. However, both works study only risk-neutral models. With Markovian purchasing costs in risk-averse models, our key contributions are to conduct a comparative static analysis with finite and infinite time horizons and to obtain monotone properties of the optimal solution, which have not been studied in the literature.
The remainder of this paper is organized as follows. In Section 3, we establish the models with MDP formulations using general additive utility functions. Then, in Section 3.1, we prove the concavity of the model and the optimality of a state-dependent base-stock policy. This implies that these properties are preserved under risk aversion as well as risk neutrality. In addition, for our comparative static analysis, we prove the impacts of backordering and inventory holding costs on the optimal order-up-to level. Then, for the special case of exponential utility functions in Section 3.2, we also prove the impacts of (random) price changes and cost fluctuations on the optimal solution. We extend the analytical results to infinite-horizon models in Section 4. Computational results that confirm the analytical results are presented in Section 5. Finally, we provide some concluding remarks in Section 6.

Problem Formulation
We consider a risk-averse firm that makes sequential decisions at times $t = 1, 2, \ldots, T$, where $T$ is the length of the time horizon. In each period $t$, it faces a nonnegative, bounded random demand $D_t \in [0, D_{\max}]$, and demands in different periods are independent. The firm also has a (linear) time-invariant resale price, an inventory holding cost $h$, and a backordering cost per unit per period.
Let $x_t$ denote the initial on-hand inventory at time $t$ before placing an order. Similarly, $y_t$ is the inventory level at time $t$ after receiving the order. The lead time is zero, so the ordered amount is delivered instantaneously.
We next define the fluctuations in purchasing costs. Let $N$ denote the number of possible values of the purchasing cost in each period $t$, where, without loss of generality, $c_1 \ge c_2 \ge \cdots \ge c_i \ge \cdots \ge c_N$ for $i = 1, \ldots, N$. The purchasing cost evolves as a Markov chain with transition matrix $P = [p_{ij}]$, where $p_{ij}$ is the probability that the purchasing cost is $c_j$ in the next period given that it is $c_i$ in this period. The current profit function $\Pi_t$ at time $t$ is defined, with backordering allowed, for a given target on-hand inventory $y$, initial inventory $x$, and purchasing cost $c_i$ in state $i$: [equation missing]
In addition to profit functions, we assume that inventory managers can borrow or lend money at a risk-free interest rate in financial markets. That is, we consider both consumption and profit income in our model: the current profit $\Pi_t$ and a nonnegative consumption level at time $t$ change the current wealth level as follows: [equation missing]
To be consistent with risk aversion, each period's utility function $u_t$ is nondecreasing and concave. As a special case of a (general) additive utility function, the exponential utility function has the form $u_t(z) = -e^{-z/\rho_t}$ with $\rho_t > 0$. Here, $\rho_t$ can be interpreted as a risk tolerance factor, so a lower $\rho_t$ means a more risk-averse decision maker.
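The Markovian purchasing cost described above can be illustrated with a short simulation. This is only a sketch: the three cost values, the transition matrix, and the function name `simulate_cost_states` are hypothetical choices made for illustration and do not come from the paper.

```python
import random

# Hypothetical purchasing costs c_1 >= c_2 >= c_3 (illustrative values only).
COSTS = [60.0, 50.0, 40.0]

# Hypothetical transition matrix: P[i][j] is the probability that the cost
# moves from state i in this period to state j in the next period.
P = [
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]


def simulate_cost_states(P, start_state, horizon, rng):
    """Simulate the cost-state index for `horizon` consecutive periods."""
    states = [start_state]
    for _ in range(horizon - 1):
        current = states[-1]
        # Draw the next state using row `current` of P as the weights.
        next_state = rng.choices(range(len(P)), weights=P[current])[0]
        states.append(next_state)
    return states


rng = random.Random(0)
path = simulate_cost_states(P, start_state=0, horizon=10, rng=rng)
print([COSTS[s] for s in path])
```

Each row of the matrix is the conditional distribution of next period's cost state, so the special case in which all rows coincide corresponds to costs that are independent across periods.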
The original model is max [equation missing]
For a risk-neutral model, consider the profits-to-go function up to the end of the time horizon $T$, with backordering allowed, given initial on-hand inventory $x$, wealth level $w$, and cost state $i$ at time $t$: [equation missing], where [equation missing], with a boundary condition [equation missing]. Because the expectation operator is additive, wealth and consumption levels can be separated from the model, as they do not affect the optimal ordering quantity. This implies that our risk-neutral case is an income-flow model without consumption through financial markets, which is equivalent to the model in Gavirneni [6].

Additive Utility Functions.
In this subsection, we focus on additive utility functions to analyze a dynamic consumption model. First, we define the value function as the utility-of-profits-to-go function of additive utility up to the end of the time horizon $T$, with backordering allowed, given initial on-hand inventory $x$, purchasing-cost state $i$, and wealth level $w$ at time $t$. Consider [equation missing], where [equation missing], with a boundary condition [equation missing]. By an equivalent formulation [equation missing] and the modified income at time $t$, [equation missing], a new problem is [equation missing], where [equation missing], with a boundary condition [equation missing].
Proposition 1 (existence of a wealth-dependent base-stock optimal policy). The value function is jointly concave in inventory and wealth for each cost state. In addition, a wealth-dependent base-stock policy is optimal.
Proof. The proof is by induction. At $t = T$, the value function is jointly concave in inventory and wealth for every cost state because the utility function is nondecreasing and concave. Next, we assume that the value function at time $t + 1$ is jointly concave in inventory and wealth for every cost state. Finally, we prove that a wealth-dependent base-stock policy is optimal and that the value function at time $t$ is jointly concave in inventory and wealth for every cost state.
We first prove that a wealth-dependent base-stock policy is optimal. Let $y^*$ denote an optimal solution, for a given wealth level and cost state, of the problem [equation missing]. Because the expected value function is concave in the order-up-to level $y$ for a given wealth level and cost state, it is optimal to order up to $y^*$ if $x < y^*$ and not to order otherwise. That is, a wealth-dependent base-stock policy is optimal.
Finally, after a proper modification of Theorem A.4 (convexity preservation under minimization) in Porteus [8], the value function at time $t$ is jointly concave in inventory and wealth for every cost state.
We now conduct a comparative static analysis of the model parameters. For single-period models, this analysis was done by Eeckhoudt et al. [9]. We extend it to multiperiod inventory models with general utility functions in Section 3.1 and with exponential utility functions in Section 3.2. The dynamic characteristics of our multiperiod models make the analysis nontrivial and considerably more challenging.
Proof. The proof uses supermodularity and has two steps. First, we find the commonality between our model and the (single-period) model in Eeckhoudt et al. [9]. Specifically, we show that our boundary case at time $t = T$ is the same as the case of Eeckhoudt et al. [9]. Second, we show that supermodularity is preserved recursively across time periods because maximization preserves it.
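The base-stock rule stated in the proof above (order up to the target level when inventory is below it, otherwise do not order) can be written out as a minimal sketch. The function names and the numerical levels are hypothetical; in the paper the target level depends on the wealth level and the cost state, which here is simply passed in as a number.

```python
def order_quantity(x, base_stock_level):
    """Base-stock rule: order the shortfall below the target, else nothing."""
    return max(base_stock_level - x, 0.0)


def inventory_after_ordering(x, base_stock_level):
    """Inventory position right after the (zero lead time) order arrives."""
    return x + order_quantity(x, base_stock_level)


# Hypothetical state-dependent targets, indexed by cost state (illustrative).
# A cheaper cost state is given a higher target in this made-up table.
targets_by_cost_state = {0: 40.0, 1: 55.0, 2: 70.0}

for state, target in targets_by_cost_state.items():
    for x in (-10.0, 50.0, 90.0):  # negative x represents backorders
        print(state, x, order_quantity(x, target))
```

Because backordering is allowed, the initial inventory can be negative, and the rule then orders enough to clear the backlog and reach the target.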

Additive Exponential Utility Functions. We now specialize to exponential utility functions for further analysis.
To analyze the model with a risk tolerance parameter, denote the "certainty equivalent" operator with respect to a random variable by [equation missing]. We also consider the "effective risk tolerance" per period, defined as [equation missing]. Then, at time $t$ with an additive exponential utility function, [equation missing], where [equation missing] (20). For any given state, the first-order optimality condition is [equation missing]. Thus, [equation missing]. After taking logarithms and rearranging, the maximizer is [equation missing]. The optimal consumption level can then be calculated as [equation missing], where [equation missing]. Plugging (22) into (21) gives [equation missing], where [equation missing]. Therefore, with exponential utility functions the optimal order-up-to level is independent of the wealth level, which simplifies the model.
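As a hedged illustration of the certainty equivalent idea, the sketch below assumes the standard form associated with exponential utility, $-\rho \ln E[e^{-X/\rho}]$, for a discrete random variable $X$. The exact expression used in the paper is not reproduced here, so this formula, the function name, and the sample distribution are assumptions.

```python
import math


def certainty_equivalent(outcomes, probs, rho):
    """Return -rho * ln E[exp(-X / rho)] for a discrete random variable X."""
    # Shift by the smallest outcome so every exponent is <= 0 (no overflow).
    m = min(outcomes)
    expectation = sum(q * math.exp(-(x - m) / rho)
                      for x, q in zip(outcomes, probs))
    return m - rho * math.log(expectation)


# Hypothetical profit distribution (illustrative only).
outcomes = [0.0, 100.0, 200.0]
probs = [0.25, 0.5, 0.25]
mean = sum(q * x for x, q in zip(outcomes, probs))

for rho in (10.0, 100.0, 1000.0, 1e6):
    print(rho, certainty_equivalent(outcomes, probs, rho))
print("mean", mean)
```

Under this assumed form, the certainty equivalent never exceeds the mean and approaches it as the risk tolerance grows, which is the sense in which a very risk-tolerant decision maker behaves like a risk-neutral one.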
Proposition 3 (the impact of resale price on the optimal base-stock level). The optimal order-up-to level $\hat{y}^{RA}_t$ is a nonincreasing function of the resale price; that is, the corresponding objective function is submodular in the order-up-to level and the resale price. Hence a higher resale price means a lower order-up-to level at each time $t = 1, \ldots, T$.
Proof. The proof is similar to that of Proposition 2.
Next, we discuss the preservation of supermodularity for Proposition 4.
Proposition 4 (the impact of fluctuations on the optimal base-stock level). When the cost in a period is independent of the cost in the previous period (that is, the transition probabilities do not depend on the current cost state), the base-stock solution is order-preserving with respect to the costs: $\hat{y}^{RA}_t(i + 1, \cdot) \ge \hat{y}^{RA}_t(i, \cdot)$ for all $t = 1, \ldots, T$.
Proof. The proof uses the concept of supermodularity. First, define the auxiliary function [equation missing]. Its supermodularity with respect to the cost state and the order-up-to level is equivalent to [equation missing]. For this proposition, it is sufficient to prove that the state space and the set of state and action spaces are lattices and that [equation missing], which follows from the concavity proven in Proposition 1. At time $t = T$, we need to prove [equation missing], where $y$ denotes the optimal order-up-to level $\hat{y}^{RA}_t$. The first inequality holds because $c_i \ge c_{i+1}$ and the exponential term $\exp(-\Pi_t/\rho_t)$ is nonnegative. Finally, the state space $S = \{(1, \ldots, N), [-D_{\max} \times T, D_{\max} \times T]\}$ and the action space $[x, D_{\max} \times T]$ trivially satisfy the definition of a lattice.
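The supermodularity argument can be made concrete on a finite grid, where supermodularity of a function of two ordered variables amounts to increasing differences. The sketch below is a generic numerical check under that assumption; the helper names and the two test functions are hypothetical and are not the paper's objective function.

```python
def has_increasing_differences(f, xs, ys):
    """Check f(x', y') - f(x', y) >= f(x, y') - f(x, y) on adjacent grid points.

    Checking adjacent pairs is enough on a grid, because differences over
    longer steps are sums of adjacent differences.
    """
    for x_lo, x_hi in zip(xs, xs[1:]):
        for y_lo, y_hi in zip(ys, ys[1:]):
            gain_at_high_x = f(x_hi, y_hi) - f(x_hi, y_lo)
            gain_at_low_x = f(x_lo, y_hi) - f(x_lo, y_lo)
            if gain_at_high_x < gain_at_low_x:
                return False
    return True


def smallest_maximizer(f, x, ys):
    """Smallest y on the grid that maximizes f(x, y)."""
    return max(ys, key=lambda y: f(x, y))


grid = list(range(6))


def supermodular_example(i, y):
    return -(y - i) ** 2   # cross term +2*i*y


def submodular_example(i, y):
    return -(y + i) ** 2   # cross term -2*i*y


print(has_increasing_differences(supermodular_example, grid, grid))
print(has_increasing_differences(submodular_example, grid, grid))
print([smallest_maximizer(supermodular_example, i, grid) for i in grid])
```

For the supermodular example the maximizer moves up as the first argument increases, which is the kind of monotone comparative statics that the proposition establishes for the base-stock level across cost states.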

Extension to Infinite Time Horizon Model
In this section, we consider the infinite-horizon problem as a special, limiting case of the finite-horizon problem when $T \to \infty$. For the infinite-horizon model, we focus on a subset of the additive exponential utility functions of Section 3.2. For this subset, we impose two conditions. The first is uniform boundedness: the utility has a finite value over all action and state spaces. [equation missing] The second is that the purchasing cost in a period is independent of the cost in the previous period and that the model parameters are time-invariant, so that analytical results similar to those of the finite-horizon MDP models can be discussed, such as [equation missing]. Under this second condition, the cost parameters are stationary. We can now consider the infinite-horizon model with stationary parameters, which we study under an expected discounted profit criterion.
Proposition 5 (existence of a stationary and wealth-independent base-stock optimal policy). A stationary and wealth-independent base-stock policy is optimal for the infinite-horizon model.
Proof. There are two ways to prove stationarity for the infinite-horizon model. The first is to prove that the optimal value operator is a contraction mapping; then, by Banach's fixed-point theorem, there exists a unique optimal solution that satisfies stationarity. In this paper, we use an alternative method shown in Puterman [11]. By Theorem 6.11.10 of Puterman [11], what we need to show is uniform boundedness of the expected utility, which we focus on in this section, as all other conditions are trivial. The only difference between the finite- and infinite-horizon models is that the maximization is replaced by a supremum, because we consider continuous state and action spaces. Thus, for each state, there exists [equation missing]. Moreover, this is the unique solution of [equation missing], where [equation missing].
Corollary 6 (inheritance of the monotone optimal policy for backordering costs and inventory holding costs in the infinite-horizon problem). The base-stock solution is order-preserving (or order-reversing) with respect to the backordering costs (or inventory holding costs); that is, the optimal order-up-to level $\hat{y}^{RA}$ is a nondecreasing function of the backordering cost (or a nonincreasing function of the holding cost $h$).
Proof. This follows from Propositions 2 and 5 together with Lemma 8-4 (a) of Heyman and Sobel [12], as our state space is trivially a partially ordered set.
Corollary 7 (inheritance of the monotone policy for purchasing costs and resale price in the infinite-horizon problem). The base-stock solution is order-preserving (or order-reversing) with respect to the purchasing costs (or resale price); that is, $\hat{y}^{RA}(i + 1, \cdot) \ge \hat{y}^{RA}(i, \cdot)$, and $\hat{y}^{RA}$ is a nonincreasing function of the resale price.
Proof. The proof is the same as that of Corollary 6.
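The first proof route mentioned for Proposition 5, showing that the optimal value operator is a contraction and invoking Banach's fixed-point theorem, can be illustrated numerically. The sketch below uses a generic finite discounted MDP with randomly generated rewards and transitions; it is not the paper's exponential-utility model, and all names and numbers are hypothetical.

```python
import random


def bellman_update(v, rewards, transitions, discount):
    """(Tv)(s) = max_a [ r(s, a) + discount * sum_s' P(s' | s, a) * v(s') ]."""
    new_v = []
    for s in range(len(v)):
        best = max(
            rewards[s][a]
            + discount * sum(p * v[s2] for s2, p in enumerate(transitions[s][a]))
            for a in range(len(rewards[s]))
        )
        new_v.append(best)
    return new_v


def sup_norm_distance(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def random_mdp(n_states, n_actions, rng):
    """Random rewards in [-1, 1] and strictly positive transition rows."""
    rewards = [[rng.uniform(-1, 1) for _ in range(n_actions)]
               for _ in range(n_states)]
    transitions = []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            weights = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(weights)
            rows.append([w / total for w in weights])
        transitions.append(rows)
    return rewards, transitions


rng = random.Random(1)
rewards, transitions = random_mdp(4, 3, rng)
discount = 0.9

# Contraction: one update shrinks the sup-norm distance by at least `discount`.
u = [rng.uniform(-10, 10) for _ in range(4)]
v = [rng.uniform(-10, 10) for _ in range(4)]
before = sup_norm_distance(u, v)
after = sup_norm_distance(bellman_update(u, rewards, transitions, discount),
                          bellman_update(v, rewards, transitions, discount))
print(before, after)

# Repeated updates converge to the fixed point of the operator.
value = [0.0] * 4
for _ in range(500):
    value = bellman_update(value, rewards, transitions, discount)
residual = sup_norm_distance(value,
                             bellman_update(value, rewards, transitions, discount))
print(residual)
```

The fixed point does not depend on the time index, which is the sense in which a contraction argument delivers a stationary solution.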

Computational Study
In this section, we provide numerical results to confirm the analytical results in Section 3. We consider additive exponential utility with a common risk tolerance factor in all periods [remainder of the base-case specification missing].
Figure 1 shows how the risk tolerance factor affects the optimal solution. We set the factor to 1000 and 3000 and compare the optimal order-up-to levels with the risk-neutral solutions. As the risk tolerance factor increases, the optimal solution becomes higher and converges to the risk-neutral solution in the limit.
Figures 2-4 present the numerical results for the comparative static analysis with respect to backordering costs, inventory holding costs, and resale price, respectively. In Figure 2, we set the backordering cost to 100, 300, and 600, with all other parameters as in the base case. As the backordering cost increases, the optimal solutions also increase for each time $t = 1, \ldots, T$ and each cost state $i = 1, \ldots, N$. In Figures 3 and 4, we study the impacts of the inventory holding cost and the resale price on the optimal solutions. We again take the base-case values, except that $h = 10, 150, 300$ in Figure 3 and the resale price is 200, 400, and 800 in Figure 4. In all cases, the analytical results are confirmed: the optimal solutions decrease when the inventory holding cost or the resale price increases.
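A much simpler single-period analogue of the risk tolerance experiment can be sketched as follows. It is not the paper's multiperiod model: there is no backordering, no Markovian cost, and no consumption, and the price, cost, demand distribution, and exponential-utility certainty equivalent are all assumptions chosen for illustration.

```python
import math


def profit(y, d, price, cost):
    """Single-period profit: sell min(y, d) at `price`, buy y at `cost`."""
    return price * min(y, d) - cost * y


def certainty_equivalent(values, probs, rho):
    """Assumed form: -rho * ln E[exp(-X / rho)], shifted to avoid overflow."""
    m = min(values)
    total = sum(q * math.exp(-(v - m) / rho) for v, q in zip(values, probs))
    return m - rho * math.log(total)


def best_order(demands, probs, price, cost, rho=None):
    """Smallest grid quantity maximizing expected profit (rho=None) or the CE."""
    def score(y):
        values = [profit(y, d, price, cost) for d in demands]
        if rho is None:
            return sum(q * v for v, q in zip(values, probs))
        return certainty_equivalent(values, probs, rho)
    return max(demands, key=score)


# Hypothetical data: demand uniform on {0, ..., 100}, price 10, unit cost 4.
demands = list(range(101))
probs = [1.0 / 101] * 101
price, cost = 10.0, 4.0

risk_neutral = best_order(demands, probs, price, cost)
print("risk-neutral", risk_neutral)
for rho in (50.0, 500.0, 1e7):
    print(rho, best_order(demands, probs, price, cost, rho=rho))
```

In this toy setting the risk-averse order quantity does not exceed the risk-neutral one and coincides with it for a very large risk tolerance, in line with the qualitative pattern described for Figure 1.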

Conclusion
This paper reconsiders risk-averse inventory models in the supply chain literature. Unlike previous work, we combine two key features, namely multiperiod models and fluctuations in purchasing costs. Although most of the results appear consistent with those in the literature, they are analytically challenging and need to be proved rigorously through independent investigation. In fact, most multiperiod inventory models focus on characterizing base-stock optimal ordering policies in general, regardless of risk preferences. This paper helps fill this gap in the literature by conducting a comparative static analysis as a further step in this research stream.
As a limitation, the impact of the risk tolerance factor has been discussed only numerically in this paper, not analytically. Figure 1 in Section 5 suggests that the risk tolerance factor may have a monotone impact in multiperiod inventory models. In the literature on risk-averse inventory models, such a monotone impact has been studied in various single-period models (e.g., Eeckhoudt et al. [9]). To the best of our knowledge, it has not yet been proved for any multiperiod risk-averse inventory model, so it remains an interesting conjecture and a possible direction for further research in this stream.

Figure 1: The impact of the risk tolerance factor on the optimal order-up-to level.

Figure 2: The impact of backordering costs on the optimal order-up-to level.
Figure 3: The impact of inventory holding costs on the optimal order-up-to level.

Figure 4: The impact of resale price on the optimal order-up-to level.