A Consumption and Investment Problem via a Markov Decision Processes Approach with Random Horizon

This work is devoted to a consumption and investment problem in which an investor with a certain initial wealth decides, at each of a series of successive times, how much of that wealth will be consumed and how much will be invested. The key issue is to find a wealth-allocation rule that maximizes the performance criterion; this will be achieved by the dynamic programming technique for Markov decision processes with a random horizon.


Introduction
Markov decision processes (MDPs) provide a very useful framework for creating and implementing a decision-making process whose results are partially random. MDPs are useful stochastic processes for addressing a wide range of optimization problems of continuous or discrete nature (only the discrete framework is considered in this paper). Throughout, at each step, the process is in some state and the decision maker may choose any action that is available for that state. The process responds at the next stage by randomly moving to a new state and giving a reward to the decision maker. The central problem of MDPs is to find an "optimal policy", i.e., a function that specifies some mechanism for selecting actions optimally at each stage.
MDPs can be solved by dynamic programming. For example, in [1], a comprehensive and theoretical treatment of the mathematical foundations of optimal stochastic control of discrete-time systems is given; meanwhile, in [2], interest is mostly limited to MDPs with a Borel state space and possibly unbounded costs. In [3], it is explained that the theory of the stochastic dynamic programming method is easily applicable to many practical problems, even for nonstationary models.
However, there exist other methods that may be considered for solving stochastic optimization problems. In [4], an emended minimax method is developed, based on the semi-autonomized multiobjective optimization algorithm, by amending the classical minimax method; it leads to desirable optimal values in the certitude state and finds another Pareto optimal solution under fuzziness in the incertitude state. In [5], the author focuses on a general fractile criterion iterative-interactive optimization process in order to obtain the preferable Pareto optimal solution, subject to a specified main objective function, for multiobjective stochastic linear programming problems in a fuzzy environment. In addition, in [6], a real-life-based cost-effective and customer-centric closed-loop supply chain management model is considered, together with the T-set that represents the inherent impreciseness of the objective functions, leading to optimal values that are superior to the stipulated goals for both objective functions in the T environment. In [7], the effects of setup cost reduction and quality improvement in a two-echelon supply chain model with deterioration are developed; the objective is to minimize the total cost of the entire supply chain model by simultaneously optimizing setup cost, process quality, number of deliveries, and lot size. In [8], a set of very interesting situations coming from mobile and wireless networks, connection management, and the Internet is considered, in which optimal decisions are required, and a side view of control problems and the theory behind them is provided.
In this paper, the possibility that external factors may force the process to finish earlier than planned will be taken into account. In this way, it is necessary to consider the horizon as a random variable, which may be independent of the state-action space [9]. Such an idea has been explored before; in [10], for example, the optimal selection strategy for the armed-bandit paradigm with random horizon and possibly random discount factors is found.
Hence, an investor with a certain initial capital will be considered who, at each of a random number of times, may invest in risky assets, consume, or invest in a risk-free bond. The goal is to conceive a strategy of consumption and investment that maximizes the expected sum of a utility derived exclusively from the capital consumed at each stage. Hence, in this paper, via the theory of MDPs with a finite random horizon, an optimal policy of consumption and investment will be established for the case in which the utility function responsible for evaluating consumption is of the exponential type. Although this kind of utility function is rather classical, it is also useful, since such functions exhibit constant absolute risk aversion and are the only risk-averse increasing utility functions whose risk premium is invariant with respect to wealth [11, 12]. The term risk aversion refers to the preference for stochastic realizations with limited deviation from the expected value. In risk-averse optimal control, one may prefer a policy with a higher cost in expectation but lower deviations over one with a lower cost but possibly higher deviations [13].
This work is organized as follows: in Section 2, fundamental ideas on MDPs with a random horizon are analyzed, jointly with an equivalence between the performance criterion of an MDP with a finite random horizon and the one associated with an MDP with a deterministic horizon. Section 3 addresses a consumption and investment scenario with a random horizon, together with a numerical experiment. Additionally, this section contains the main contribution of this text: finding the optimal policy for the consumption and investment problem with a finite random horizon and an exponential utility function. Finally, two appendices are included. On the one hand, Appendix A deals with basic definitions of MDP theory, together with some useful assumptions for solving the consumption and investment problem by means of the dynamic programming technique. On the other hand, the concept of a financial market is discussed in Appendix B.

Markov Decision Process with a Random Horizon
In the literature, one may find references where discrete-time control problems with a random horizon are discussed, for example, [9, 10]. Accordingly, let τ be a discrete random variable defined on some probability space (Ω′, F′, P). Suppose that the mass function of τ is known and given by ρ_n ≔ P(τ = n), with n = 0, 1, 2, . . . , N, where N is a natural number or N = ∞. Consider now a Markov decision model (E, A, D, Q, r_n, g) and define the following performance criterion: where π ∈ Δ, x ∈ E, and E denotes the expected value with respect to the joint distribution of the process (x_n, a_n) and τ. In order to introduce the corresponding optimal control problem, the optimal value function is defined as follows: In this way, the optimal control problem with a random horizon consists of finding a policy π* ∈ Δ such that V_τ(π*, x) = V_τ(x) for all x ∈ E. The following assumption will be considered for simplifying the performance criterion under a discrete random horizon [9].
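The displayed formulas were dropped from this copy; a standard form of the random-horizon criterion and the associated optimal value function, written here as a hedged reconstruction in the spirit of [9] rather than a verbatim transcription of the original displays, is:

```latex
V_\tau(\pi,x) := \mathbb{E}\left[\sum_{n=0}^{\tau} r_n\left(x_n,a_n\right)\right],
\qquad
V_\tau(x) := \sup_{\pi \in \Delta} V_\tau(\pi,x),
\qquad x \in E .
```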
where for each n = 0, 1, . . . , N, P_n ≔ P[τ ≥ n]. Thus, the optimal control problem with random horizon τ is equivalent to the optimal control problem with planning horizon N + 1, a nonhomogeneous reward function, and an identically zero terminal reward. Hence, Theorem A.6 may be applied under the conditions of Assumption A.5. An alternative approach discussed in [9] considers a different set of assumptions on the reward function (which remains fixed at each stage) and the transition kernel Q.
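The equivalence above rests on an interchange of expectation and summation: for a fixed reward sequence, the expected reward accumulated up to the random horizon equals the survival-weighted sum with weights P_n = P[τ ≥ n]. A minimal numerical sketch of this identity (function names are illustrative and independent of the model's details):

```python
# For a fixed reward sequence r[0..N] and a horizon with pmf[m] = P(tau = m),
# E[sum_{n<=tau} r_n] equals sum_n P[tau >= n] * r_n.
def expected_reward_by_horizon(r, pmf):
    # direct computation: condition on tau = m
    return sum(p * sum(r[: m + 1]) for m, p in enumerate(pmf))

def expected_reward_by_survival(r, pmf):
    # survival weights P_n = P[tau >= n]
    N = len(pmf) - 1
    P = [sum(pmf[n:]) for n in range(N + 1)]
    return sum(P[n] * r[n] for n in range(N + 1))
```

For example, with r = [1, 2, 3] and pmf = [0.2, 0.3, 0.5], both computations give 4.1.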

Consumption and Investment Problem with a Random Horizon
An investor has an initial wealth of x > 0 and, at the beginning of each of at most N periods (implicitly, a random horizon τ with support on {0, 1, . . . , N} is contemplated), he/she can determine which part of his/her wealth will be consumed and which part will be invested in the financial market described in Appendix B. The amount c_n denotes the quantity consumed at time n and will be evaluated by a utility function U_c. The remaining money will be invested in d risky assets and in a risk-free bond. The terminal wealth (X_τ) is evaluated via another utility function U_p. The main problem is designing a strategy of sequential consumption and investment decisions in order to maximize the sum of his/her expected gains.
Throughout, Assumption B.9 will be supposed and no arbitrage opportunities are available. In addition, it is supposed that the domain of both utility functions U_c and U_p is [0, ∞). In this context, the dynamics of the wealth is as follows: where (c_n, a_n) is a consumption and investment strategy, i.e., (a_n) and (c_n) are (F_n)-adapted, 0 ≤ c_n ≤ X_n, and X_0 = x. The consumption and investment problem previously described may be associated with a Markov decision model with the following components: In this framework, the value function is defined as follows: where the supremum is taken over all policies π = (f_0, . . . , f_{N−1}) with f_n(x) = (c_n(x), a_n(x)). Sufficient conditions will be given to propose the solution of the consumption and investment problem with a random horizon of finite support. Under Assumption 1, the proof of the following result can be obtained by using Theorem A.6 and the wealth dynamics given in (4) [14]. Its conclusion allows us to associate an MDP with a random horizon with support on {0, 1, . . . , N} with another MDP with a nonhomogeneous reward function, deterministic horizon (N + 1), and identically zero terminal cost.
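As an illustration, a trajectory of a wealth process of this type can be simulated. The recursion below, X_{n+1} = (1 + i_{n+1})(X_n − c_n + a_n · R̄_{n+1}), is one common form of such dynamics in [14]-style models; it is an assumption here and may differ in detail from equation (4). All names are illustrative:

```python
import random

def simulate_wealth(x0, policy, rates, sample_risk, steps, seed=0):
    """Simulate X_{n+1} = (1 + i_{n+1}) * (X_n - c_n + a_n . Rbar_{n+1}).

    policy(n, x) -> (c, a), with 0 <= c <= x; sample_risk(rng) returns the
    relative risk vector for the next period.
    """
    rng = random.Random(seed)
    x, path = x0, [x0]
    for n in range(steps):
        c, a = policy(n, x)
        rbar = sample_risk(rng)  # relative risk vector Rbar_{n+1}
        x = (1.0 + rates[n]) * (x - c + sum(ai * ri for ai, ri in zip(a, rbar)))
        path.append(x)
    return path
```

With zero consumption and no risky investment, the wealth simply accrues interest, which gives a quick sanity check of the recursion.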

Theorem 1. In the multiperiod consumption and investment problem, we define the functions
Then, there exist maximizers f*_n of V_n, and the strategy (f*_0, . . . , f*_N) is optimal for the consumption and investment problem.

Exponential Utility Function.
This section deals with a version of the consumption and investment problem with exponential utility functions and a finite random horizon. In this setting, the process that describes the evolution of the investor's capital may end before some fixed horizon due to external causes. However, Assumption 1 prevents the decision maker from finishing this process because of bad investments, which could drive the wealth process below zero.
Utility functions arise naturally in economics and finance. For example, in the mean-variance approach of Merton and Samuelson, it has already been found that a quadratic utility provides a closed-form solution for portfolio selection under very general conditions; however, in the case of the power and exponential utility functions, there is no possibility of finding closed-form solutions without information on the distribution of the return process [15]. In addition, in [16], by assuming that a portfolio's returns follow an approximate log-normal distribution, closed-form expressions for the optimal portfolio weights were obtained for both power and logarithmic utility functions.
In portfolio optimization, in order to maximize the widely used logarithmic utility of some investor, assets whose prices depend on their past values in a non-Markovian way are taken into account [17]. On the same topic, Chapter 9 of [18] provides a very interesting contribution on the treatment of utility functions; in particular, risk aversion is addressed in depth.
On a similar matter, in [19], it is possible to review a self-contained survey of utility functions (exponential and power utilities of the first and second kind) together with some of their applications in finance. This reference also discusses Pareto optimal risk exchanges and presents very illustrative examples dealing with the aforementioned utility functions.
Exponential utility functions are widely employed because they exhibit constant absolute risk aversion [20–22], and they are the only risk-averse increasing utility functions whose risk premium is invariant with respect to wealth [11, 12]. The term risk aversion refers to the preference for stochastic realizations with limited deviation from the expected value. In risk-averse optimal control, one may prefer a policy with a higher cost in expectation but lower deviations over one with a lower cost but possibly higher deviations [13]. In addition, from the technical point of view, if both utility functions U_c and U_p are of the following form: then they are bounded above, and hence, Assumption B.9 is directly satisfied.
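The exact display (5) is not reproduced in this copy, but one common normalization consistent with the boundedness claim is U(x) = (1/c)(1 − e^{−cx}); under this assumed form, a one-line computation confirms that the absolute risk aversion coefficient is indeed the constant c:

```latex
U(x) = \frac{1}{c}\left(1 - e^{-c x}\right), \qquad
U'(x) = e^{-c x} > 0, \qquad
U''(x) = -c\, e^{-c x} < 0,
```

```latex
A(x) := -\frac{U''(x)}{U'(x)} = c, \qquad \sup_{x \ge 0} U(x) = \frac{1}{c},
```

so the utility is strictly increasing, strictly concave, bounded above, and has constant absolute risk aversion c.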
The following assumption is needed to ensure that optimal consumptions do not exceed the available capital.
Then, it is possible to deduce the following result for the consumption and investment problem with a (finite) random horizon and a risk-averse increasing utility function whose risk premium is invariant with respect to wealth (the exponential utility function).
This theorem provides a mechanism for acting optimally with respect to consumption and risky investments at each stage; such optimal decisions come from optimizing the equations expressed in Theorem 1, which are relatively simple in this case thanks to the definitions of the coefficients b_n and k_n.

Theorem 2.
Suppose that both U_c(c) and U_p(x) are exponential utility functions of the form (5) with c > 0. Then, it holds that, for all x ≥ 0, where k_n, b_n, and v_n are given by equations (9)–(11), respectively. In addition, the optimal consumption at stage n is and the optimal investment at stage n is Remark 1. The preceding theorem supplies the pursued optimal consumption and investment policy (f*_n), which is such that with ζ*_n and a*_n as in Theorem 2.
Proof 1. First of all, consider the following sets of functions in order to verify Assumption A.5:

Consider the transformation c = ζx and a = (S^0_N / S^0_n) b (1 + i_{n+1}) α; hence, where v_n is given in (11). (iii) The existence of a maximizer in the set Ψ will be proven. For this, we examine the real function ℓ(ζ) stated as It is possible to find its maximum through standard optimization techniques. Hence, from Assumption 2, it is observed that the only critical point of ℓ in [0, 1] is as follows: which is a relative maximum via the second-derivative criterion. By substituting this value of ζ in T_n v(x), it is found that: Therefore, the maximizer for v is of the form (ζx, a) and T_n v(x) ∈ M. (iv) Expressions for V_n and their corresponding maximizers will be obtained by utilizing Theorem 1.
For each x > 0, By an inductive process and following essentially the same lines as those in (ii) and (iii), it may be found for n = 1, . . . , N − 1 that: where k_{N−n} and b_{N−n} are expressed in (9) and (10). Additionally, the optimal consumption is given by c_{N−n} = ζ_{N−n} x with: and the optimal investment is a*_{N−n}. □ Remark 2. The earlier theorem states that, under its assumptions, it is possible to find the optimal strategy explicitly; hence, it is not necessary to employ numerical methods for solving the dynamic programming equation at each stage. If this were not the case, there exist several papers dealing with the complexity of solution algorithms for MDPs with finite state and action spaces. In particular, we refer the reader to the contribution of Chow and Tsitsiklis [23], where tight lower bounds are given on the computational complexity of dynamic programming for the case in which the state space is continuous and the problem is to be solved approximately, within a specified accuracy. In the same direction, Section 12.5 of [24] is also relevant for the framework studied in this article.

Numerical Example.
In this section, the results of Section 3.1 will be illustrated. For this, consider d = 2 (two risky assets) and assume that the distribution of the relative risk random vectors (R_n) may be approximated by a bivariate normal distribution with parameters μ_n and Σ_n. In this case, the corresponding quantities may be computed explicitly. For the sake of simplicity, we consider that the random vectors (R_n) are independent and identically distributed with parameters Σ_{n+1} = diag(1/8, 1/8) and μ_{n+1} = (0.1, 0.2).
Additionally, set c = 0.6 in order to have a not-so-flat utility function over the horizontal axis (bigger values of c lead to utility functions closer to the x-axis); in addition, a constant interest rate i_n = 0.05 will be contemplated. Finally, the random horizon will follow a binomial distribution with parameters 10 and 0.5. Now, we split the simulation example into two stages:
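The randomness of this experiment can be reproduced with standard samplers; since the assumed covariance matrix is diagonal, the two components of R_n may be drawn independently. A sketch with illustrative names:

```python
import math
import random

def sample_horizon(rng, n=10, p=0.5):
    # binomial(10, 0.5) random horizon, as in the example
    return sum(rng.random() < p for _ in range(n))

def sample_relative_risk(rng, mu=(0.1, 0.2), var=0.125):
    # diagonal covariance Sigma = diag(1/8, 1/8) => independent components
    s = math.sqrt(var)
    return tuple(m + rng.gauss(0.0, s) for m in mu)
```

Averaging many draws recovers the chosen parameters: the horizon mean is near 5 and the return means are near (0.1, 0.2).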

Stage I: Before Implementing the Dynamics of the Wealth Process.
At this stage, it is possible to find the corresponding values of v_n and a*_n. Given the simplifications considered above, the constant values v_n = 0.877204 and a*_n = (1.26984, 2.53968) are obtained.

The values of b_n and k_n (which will help in the construction of the optimal consumptions) are shown in Table 1. This is possible since these parameters do not depend on the initial capital or the wealth process.

Stage II: Performing the Wealth Process.
At this stage, the initial capital becomes relevant; hence, Tables 2–4 expose the evolution of the relative and absolute optimal consumptions as well as a trajectory of (X_n) for the chosen values of the initial capital. A decreasing behavior is observed in the wealth process, accompanied by an increasing behavior of the relative consumption, which leads to an "almost" constant behavior of the absolute consumption. In the same fashion, Figures 1–3 illustrate the same observations but allow comparing the trajectories of ζ_n, c_n, and x_n.
In Figure 4, the dynamics of the relative consumption are observed for different values of c, with the remaining parameters fixed as before. It may be noticed that practically no effect is observed.
In Figure 5, a sensitivity analysis of the optimal relative consumption with respect to the interest rate is schematized. Figure 6 exhibits a shape of the absolute optimal consumptions similar to that of Figure 3. In addition, (27) implies a decreasing behavior of the optimal investments. Furthermore, Figure 7 shows a concave behavior in the wealth dynamics as the interest rate increases; however, one may remark that, naturally, wealth is bigger for larger values of i.
In order to perform an analysis of the execution time of implementing the strategy dictated by Theorem 2, a fixed initial capital, a fixed interest rate of 0.05, and a horizon with a discrete uniform distribution will be taken into account. Hence, in Table 5, the support of the random horizon is increased as well as the value of the parameter c. From this table, together with Figure 8, it is observed that the value of c has no influence on the execution time and that the execution time exhibits a slow growth rate (below the identity function) as the maximum value of the random horizon grows.
In the same fashion, a discrete uniform horizon will be considered again with the same initial capital, but now c = 0.3 and the interest rate varies from 0.01 to 0.75. In this setting, it is possible to observe in Table 6, as well as in Figure 9, that the interest rate has no repercussion on the execution time and that the execution time rises moderately as the random horizon has a larger support.
In summary, it may be seen that neither the value of the parameter of the exponential utility function nor the interest rate has an effect on the execution time, and that the implementation time grows slower than the maximum value of the horizon.
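Timing experiments of this kind can be reproduced with a simple wall-clock wrapper. The backward recursion below is only a toy stand-in for the computation of the coefficients (b_n, k_n), since equations (9)–(11) are not reproduced here; only the measurement pattern is the point:

```python
import time

def backward_coefficients(N, gamma=0.3, rate=0.05):
    # toy stand-in for a backward recursion over the horizon;
    # NOT the true recursions (9)-(10) of Theorem 2
    b = [0.0] * (N + 1)
    for n in range(N - 1, -1, -1):
        b[n] = (b[n + 1] + gamma) / (1.0 + rate)
    return b

def timed(fn, *args):
    # wall-clock timing of a single call
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0
```

Calling `timed(backward_coefficients, N)` for growing N yields the kind of running-time curve summarized in Tables 5 and 6.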

Conclusions
In this paper, a consumption and investment problem was studied through a Markov decision process with a random horizon of finite support. In this framework, the optimal consumption and investment strategy was obtained via a dynamic programming approach, evaluating consumptions by means of an exponential utility function.

A. Markov Decision Processes with Fixed Horizon
First of all, the main topic of this paper will be defined: Markov decision processes, which will be the tool to solve the consumption and investment problem described in Section 3. As a starting point, the horizon will be considered a fixed natural number. Hence, in this context, the following definition [2] is provided: } is called the set of admissible actions. For each n ≥ 1, r_n: D ⟶ R is a measurable function, which gives the one-stage reward of the system at stage n. g: E ⟶ R is a measurable function such that g(x) provides the terminal reward if the final state is x. Q is a stochastic transition kernel from D to E. The quantity Q(B|x, a) gives the probability that the next state is in B if the current state is x and action a is taken.
Remark A.2. Usually, the definition of a Markov decision model considers all its components as time invariant; however, in view of the purposes of this paper, the reward function will depend on time; in fact, this condition arises naturally when a random horizon is considered. Additionally, it is also possible to define an MDM in a more general setting by considering a time dependence of the state-action sets, transition functions, and transition kernels; however, such a paradigm is beyond the interest of this paper. Nevertheless, the corresponding ideas may be found, for example, in [14, 25].
In Section 3, the transition kernel Q is characterized by random variables (Z_n) defined on some measurable space (Z, Z) called the disturbance space. It is assumed that such random variables have a common distribution Q_Z, which may depend on (x, a) ∈ D, and that there exists a measurable function T: D × Z ⟶ E known as the transition function. Here, T(x, a, z) provides the next state of the system when the current state is x, action a ∈ D(x) is taken, and disturbance z occurs. Hence, the corresponding transition kernel is defined as follows: In the context of an MDM, decisions are modeled via measurable functions from E to A, as can be observed in the following definition.
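The relationship between the transition function T and the kernel Q can be sketched as a simulator: the next state is T(x, a, Z) with Z drawn from Q_Z. All names below are illustrative, and the additive-noise T in the usage example is only a toy choice:

```python
import random

def make_kernel_sampler(T, sample_disturbance):
    """Turn a transition function T: D x Z -> E and a disturbance sampler
    (which may depend on (x, a)) into a one-step simulator of Q(. | x, a)."""
    def step(rng, x, a):
        z = sample_disturbance(rng, x, a)
        return T(x, a, z)
    return step

# toy usage: additive dynamics x' = x + a + z with z ~ N(0, 1)
step = make_kernel_sampler(
    lambda x, a, z: x + a + z,
    lambda rng, x, a: rng.gauss(0.0, 1.0),
)
```

With a deterministic disturbance (z ≡ 0), the simulator reduces to the transition function itself.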
for any x ∈ E, is called a decision rule. Let F denote the set of all decision rules.
with f_n ∈ F is called a policy or strategy. The set of this class of policies is denoted by Δ.
One can find more general approaches dealing with policies; a very important reference in that direction is [25]. However, the last definition was adjusted to the intentions of Section 3. The formalization of Markov decision models under a probability space will allow us to associate them with some probability measure, and consequently, it will be possible to define the corresponding mathematical expectation.
For this, we contemplate a Markov decision model in N stages, an initial state x ∈ E, a fixed policy π = (f_0, f_1, . . . , f_{N−1}), and the canonical probability space guaranteed by the Ionescu-Tulcea Theorem [14, 25], usually denoted by (Ω, F, P^π_x), where Ω = E^{N+1} and F is the corresponding product σ-algebra. In addition, if ω = (x_0, x_1, . . . , x_N) ∈ Ω, the state of the system at time n is modeled via a random variable X_n, for n = 0, 1, . . . , N, by On this probability space, (X_n) is called the Markov decision process. Given that the optimization problems treated in this article are related to the optimization of expected values of aggregated rewards, the following assumption [14] is considered: In all the sequel, it will be supposed that Assumption A.4 holds for a Markov decision process with horizon N.
The contemplated performance criterion of a policy π, when the initial state is x ∈ E, is the so-called total expected reward: Then, the value function for x ∈ E is defined by the following:
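For finite E and A, the value function can be computed by the backward induction that Theorem A.6 formalizes. A self-contained sketch (all names illustrative; Q[(x, a)] maps next states to probabilities, r(n, x, a) is the stage reward, and g is the terminal reward):

```python
def backward_induction(states, actions, Q, r, g, N):
    """Finite-horizon dynamic programming for a finite MDP: returns the
    value function V_0 and a deterministic Markov policy (f_0,...,f_{N-1})."""
    V = {x: g(x) for x in states}
    policy = []
    for n in range(N - 1, -1, -1):
        f, Vn = {}, {}
        for x in states:
            best_a, best_v = None, float("-inf")
            for a in actions(x):
                v = r(n, x, a) + sum(p * V[y] for y, p in Q[(x, a)].items())
                if v > best_v:
                    best_a, best_v = a, v
            f[x], Vn[x] = best_a, best_v
        policy.insert(0, f)
        V = Vn
    return V, policy
```

For instance, in a two-state toy model where action "go" always pays 1 and "stay" pays 0, the optimal policy chooses "go" at every stage and the value over N stages is N.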

The functions V(π, x) and V(x) are well defined since A policy π* is called optimal for an N-stage MDP if V(π*, x) = V*(x) for all x ∈ E [2, 25]. The following assumption provides sufficient conditions to establish the existence of optimal policies [14].
|v is measurable}. and Ψ ⊂ F, for n = 0, 1, . . . , N − 1, such that: The dynamic programming technique is expressed in the following theorem, whose proof may be found, for example, in [2, 14, 25]. Suppose that Assumption A.5 holds; then there exist maximizers f_n ∈ Ψ of V_n, the deterministic Markov policy π* = (f_0, . . . , f_{N−1}) is optimal, and the value function V* equals V_0, i.e., (A.8)

B. Financial Markets

Financial markets allow an efficient allocation of resources within the economy. Through organized and regulated exchanges, these markets give participants a certain guarantee that they will be treated fairly and honestly. In short, a financial market is a platform that allows traders to easily buy and sell financial instruments and securities, for example, stocks, bonds, commercial paper, bills of exchange, debentures, and more. The importance of financial markets lies in the fact that they act as an intermediary between savers and investors, or they help savers become investors [26, 27].
A financial market of N periods with d risky assets and a risk-free bond will be considered, with the considerations treated in [14]. It will be assumed that the random variables are defined on a probability space (Ω, F, P) together with a filtration (F_n) with F_0 ≔ {∅, Ω}. The financial market is given by the following: a risk-free bond with S^0_0 ≡ 1 and S^0_{n+1} ≔ S^0_n (1 + i_{n+1}), n = 0, 1, . . . , N − 1, (A.9) where i_{n+1} denotes the deterministic interest rate for the period [n, n + 1). There are d risky assets, and the price process of the k-th asset is given by S^k_0 = s^k_0 known and S^k_{n+1} = S^k_n R^k_{n+1}, n = 0, . . . , N − 1, (A.10) where R^k_{n+1} > 0 P-a.s. for all k and n. R^k_{n+1} is the relative price change over the interval [n, n + 1) for the k-th risky asset, and the process (S^k_n) is assumed to be adapted to (F_n) for every k.
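The price recursions (A.9) and (A.10) can be written directly as code (a sketch; the constant rates and given relative changes are only example inputs):

```python
def bond_prices(rates):
    # (A.9): S0_0 = 1, S0_{n+1} = S0_n * (1 + i_{n+1})
    prices = [1.0]
    for i in rates:
        prices.append(prices[-1] * (1.0 + i))
    return prices

def asset_prices(s0, relative_changes):
    # (A.10): Sk_{n+1} = Sk_n * Rk_{n+1}, with Rk_{n+1} > 0
    prices = [s0]
    for R in relative_changes:
        prices.append(prices[-1] * R)
    return prices
```

For example, two periods at a 5% rate give a bond price of 1.05^2 = 1.1025.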
The positive random variable R_{n+1} defines the relative price change S_{n+1}/S_n. The relative risk process (R̄_n) is defined by R̄_n ≔ (R̄^1_n, . . . , R̄^d_n) with R̄^k_{n+1} ≔ R^k_{n+1}/(1 + i_{n+1}) − 1, k = 1, . . . , d.
Consider now the following notation: S_n ≔ (S^1_n, . . . , S^d_n), R_n ≔ (R^1_n, . . . , R^d_n), and F^S_n ≔ σ(S_0, . . . , S_n). As (S_n) is (F_n)-adapted, it holds that F^S_n ⊂ F_n for n = 0, 1, . . . , N − 1. It is assumed that (F_n) is the filtration generated by the stock prices, that is, F_n = F^S_n. The subsequent definition is the main mathematical object needed for investing in the financial market described earlier.
Definition B.7. A portfolio or trading portfolio is an (F_n)-adapted stochastic process ϕ = (ϕ^0_n, ϕ_n), where ϕ^0_n ∈ R and ϕ_n = (ϕ^1_n, . . . , ϕ^d_n) ∈ R^d for n = 0, 1, . . . , N − 1. The random variable ϕ^k_n denotes the amount of money invested in the k-th asset during [n, n + 1). Therefore, the wealth process evolves as follows: In order to solve the consumption and investment problem of Section 3, a utility function for evaluating consumptions will be needed.
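The wealth evolution displayed after Definition B.7 is not reproduced in this copy; under the stated convention that ϕ^k_n is an amount of money held over [n, n + 1), one natural form of the one-step update is sketched below (an assumption, not a verbatim transcription):

```python
def next_wealth(phi0, phi, i_next, R_next):
    """One-step portfolio wealth: money phi0 in the bond earns (1 + i),
    and money phi[k] in asset k is multiplied by its relative change R[k]."""
    return phi0 * (1.0 + i_next) + sum(pk * rk for pk, rk in zip(phi, R_next))
```

For instance, 100 units in the bond at a 5% rate plus 50 units in one asset with relative change 1.1 yield 105 + 55 = 160.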
Definition B.8. A function U: dom U ⟶ R is called a utility function if U is strictly increasing, strictly concave, and continuous on its domain. The following assumption corresponds to the proper version of Assumption A.4 in the financial market context. Assumption B.9. E‖R_n‖ < ∞, where ‖R_n‖ = |R^1_n| + . . . + |R^d_n|, for R_n ∈ R^d and n = 1, . . . , N.