A Stochastic Dynamic Programming Approach Based on Bounded Rationality and Application to Dynamic Portfolio Choice

Dynamic portfolio choice is an important problem in finance, but the optimal strategy analysis is difficult when considering multiple stochastic volatility variables such as the stock price, interest rate, and income. Moreover, recent research in experimental economics indicates that the agent shows limited attention, considering only the variables with high fluctuations and ignoring those with small ones. By extending the sparse max method, we propose an approach to solve dynamic programming problems with small stochastic volatility and the agent's bounded rationality. This approach incorporates the agent's behavioral factors and effectively avoids the "Curse of Dimensionality" in a dynamic programming problem with more than a few state variables. We then apply it to the Merton dynamic portfolio choice model with stochastic volatility and obtain a tractable solution. Finally, the numerical analysis shows that the bounded rational agent may pay no attention to a varying equity premium and interest rate with small variance.


Introduction
In reality, how to choose a portfolio of consumption and investment is one of the most important decisions for many people. In the modern portfolio choice field, Merton [1,2] provides a general framework for understanding the portfolio demand of long-term investors when investment opportunities change over time. In the classical Merton model [1,2], however, the riskless interest rate, the risky mean rate of return, and the volatility coefficient are usually assumed to be constant. These assumptions lack realism, particularly over long time intervals. A large volume of empirical research on financial markets indicates that the assumption that these variables are stochastically volatile and follow a certain stochastic process (e.g., an Ornstein-Uhlenbeck process) is more realistic [3,4]. But when these stochastic variables are introduced into the Merton-style portfolio choice model, the problem becomes increasingly complicated and formidable to solve. It also leads to the "Curse of Dimensionality." Quite a few approaches have been developed to deal with this kind of problem, such as martingale methods [5][6][7][8] and various approximate numerical algorithms [9][10][11][12]. However, these methods rely on restrictive assumptions and are too complex to yield a tractable solution with strong explanatory power. Based on the control of small noise, Judd and Guu [13] proposed a method to solve dynamic programming problems with stochastic disturbance. They make the simplifying assumption that uncertainty is small and obtain the first- and higher-order solutions of complicated dynamic programming models. This method provides a quite suitable solution for the dynamic portfolio choice model with stochastic volatility.
On the other hand, a growing body of empirical studies indicates that the agent considers only the variables with high fluctuations and ignores those with small ones [14][15][16]. Bordalo et al. [17] showed that the agent rationally chooses to be inattentive to news. Kőszegi and Szeidl [18] analyzed monetary policy and found that when a price changes, decision makers are usually unaware of it. Many studies also show that the agent pays attention to salient factors. Sims [19] uses two empirical strategies to analyze how individuals optimize with respect to the incentives created by tax policies and shows that tax salience affects agents' behavioral responses. Peng and Xiong [20] study the allocation of investors' attention among different information. They find that investors with limited attention focus on macroeconomic and industry information rather than that of a specific firm. Seasholes and Wu [21] demonstrate that attention-grabbing events attract investors' attention. In their model, they regard these events as proxy variables, and their results empirically indicate that such events have a significant impact on the allocation of investors' attention. Maćkowiak and Wiederholt [22] show that decision makers' attention is usually drawn to salient payoffs.
In recent years, Gabaix [23] has provided a sparse max operator to model dynamic programming with bounded rationality. Under the sparse max, the agent pays less or no attention to features whose fluctuations are smaller than certain thresholds, and he tries to strike a good balance between the utility loss of inattention and the cognitive cost, which can be regarded as the loss from taking time to think about decisions rather than to enjoy oneself. The sparse max seems more realistic than traditional economic models since it has a very robust psychological foundation. It can also handle maximization problems with constraints easily and obtain a tractable solution in a parsimonious way.
However, Gabaix [23] only studies dynamic programming in a stationary environment without stochastic volatility terms. Yet the financial market is strewn with stochastic dynamic programming problems, and these problems are hard to solve due to their multitudinous state variables. To address this issue, we extend the sparse max operator and develop a stochastic version of Gabaix's method. The distinctive feature of this method is that it incorporates the agent's behavioral factors (limited attention) and can effectively avoid the "Curse of Dimensionality" with multiple variables. To verify the validity and practicability of our model, we consider the Merton dynamic portfolio choice problem with stochastic volatility variables (e.g., [24,25]) and obtain a tractable solution.
The remainder of this paper is organized as follows. Section 2 presents the sparse dynamic programming method proposed by Gabaix [23]. Section 3 extends this model and gives a general principle for solving continuous-time dynamic programming with stochastic variables. In Section 4, we apply our method to Merton dynamic portfolio choice. Finally, we discuss some implications of our findings and suggest topics for future research in Section 5.

The Sparse Max Operator without Constraints
We mainly introduce the sparse max operator proposed by Gabaix [23] in this section. In the traditional version, the agent faces a maximization problem max_a u(a, y) subject to b(a, y) ≥ 0, where y = (y_1, y_2, …, y_n), u is a utility function, and b is a constraint. The variable a and the function b have arbitrary dimensions. For any optimal decision, in principle, thousands of considerations are relevant to the agent. Since it would be too burdensome to take all of these variables into account, the agent typically discards most of them. At the same time, his attention is allocated purposefully to important variables.
Hence, the agent might sensibly pick a "sparse" representation of the variables; namely, choose the attention vector m = (m_1, m_2, …, m_n) to replace each variable y_i with y_i^s = m_i y_i, i ∈ {1, 2, …, n}, where the superscript s of y_i^s stands for sparse. The optimal attention vector is obtained by weighing the utility losses from imperfect attention against the cost savings from not thinking too much.
Gabaix [23] assumes the cognitive cost is g(m_i) = κ|m_i|^α, where α ≥ 0 and the parameter κ ≥ 0 is a penalty for lack of sparsity. If κ = 0, the agent is a traditional, rational agent.
Based on the above analysis, Gabaix [23] defines the sparse max operator as follows.

Definition 1 (see [23]; sparse max operator without constraints). The sparse max is defined by the following procedure.

Step 1. Choose the attention vector m*. Define y_i^s = m_i^* y_i as the sparse representation of y_i.
From Figure 1 we know that the agent will not consider the variable when 0 ≤ σ_i² ≤ κ (with α = 1).
From Figure 2, we know that an agent who seeks "sparsity" sensibly drops relatively unimportant features. In addition, features larger than that cutoff are still dampened: in Figure 2, τ(b, κ) lies below the 45-degree line (for positive b, in general, |τ(b, κ)| < |b|).
Based on the analysis above, we can use the truncation function τ to represent the sparse agent's optimal action.

Remark 2 (see [23]). If the rational optimal action is a^r = a^d + Σ_i a_{y_i} y_i, obtained by a Taylor expansion around the default action a^d, then the sparse agent's optimal action is a^s = a^d + Σ_i τ(a_{y_i}, κ/σ_i) y_i, where σ_i is the standard deviation of y_i.
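As a minimal executable sketch of Remark 2 (assuming the lasso-style α = 1 case, in which the truncation function reduces to soft-thresholding; Gabaix [23] defines the general family, so this specific form is an illustrative assumption):

```python
# Assumed alpha = 1 (lasso-style) truncation: tau(b, k) = sign(b) * max(|b| - k, 0).
# It is zero below the cutoff k and dampened above it, so |tau(b, k)| < |b|.

def tau(b, k):
    """Truncation function: drop small sensitivities, dampen large ones."""
    if abs(b) <= k:
        return 0.0
    return (1 if b > 0 else -1) * (abs(b) - k)

def sparse_action(a_default, sensitivities, sigmas, y, kappa):
    """a^s = a^d + sum_i tau(a_{y_i}, kappa / sigma_i) * y_i  (Remark 2)."""
    return a_default + sum(
        tau(a_yi, kappa / s) * yi
        for a_yi, s, yi in zip(sensitivities, sigmas, y)
    )

print(tau(2.0, 1.0))   # dampened: 1.0, strictly below the 45-degree line
print(tau(0.5, 1.0))   # below the cutoff: 0.0
```

Note that the cutoff κ/σ_i shrinks as σ_i grows: high-variance factors are more likely to receive attention, exactly the pattern described above.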

A Stochastic Dynamic Programming Approach Based on Sparse Max Operator
In order to deal effectively with stochastic dynamic programming in finance, in this section we extend Gabaix's [23] sparse max operator and propose a bounded rational stochastic dynamic programming model in continuous time.
The general model of stochastic dynamic programming in continuous time is

max_a E ∫_0^∞ e^{-ρt} u(a, x, y) dt,
s.t. dx = f(x, y, a) dt + σ(x, a) dW_x,
dy = h(x, y) dt + ε(y) dW_y,

where ρ denotes the discount factor, u is the utility function, a is the decision variable (which has an arbitrary dimension), the vector x represents important factors that are always considered by the agent, and the vector y defined in Section 2 represents factors that may not be considered by the sparse agent. f(x, y, a) and h(x, y) are the state transition functions of x and y, respectively, and σ(x, a) and ε(y) represent the stochastic volatilities of x and y, respectively. W_x, W_y are independent standard Brownian motions; namely, E[dW_x dW_y] = 0.
Assumption 4. All state variables are stochastic and independent of each other; the stochastic volatility of x is a function of x and a, while the stochastic volatility of y is uncorrelated with a.
Assumption 5. x is one-dimensional; that is, only one variable is always considered by the agent, and the other variables may not be considered by the agent.

Assumption 6. Following Judd and Guu [13], we assume the variance of each component of the vector y is small and independent of the others.
To facilitate analysis, we write the one-dimensional vector x simply as x, denote the stochastic differential equation of y_i by dy_i = h_i(x(t), y_i(t)) dt + ε_i(y_i(t)) dW_i(t), i ∈ {1, 2, …, n}, and use the notation D_{y_i}[·] = ∂_{y_i}[·] + a_{y_i} ∂_a[·] for the total derivative with respect to y_i (i.e., the full impact of a change in y_i, including the impact it has on a change in the action a).
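The small-noise factor dynamics dy_i = h_i dt + ε_i dW_i can be simulated with a simple Euler-Maruyama scheme. The mean-reverting drift h(y) = −λy and the small constant volatility ε used below are illustrative assumptions (an Ornstein-Uhlenbeck process of the kind used in Section 4), not the paper's calibration:

```python
import math
import random

def euler_maruyama(y0, drift, vol, dt, n_steps, rng):
    """Simulate dy = drift(y) dt + vol(y) dW by the Euler-Maruyama scheme."""
    y = y0
    path = [y]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        y = y + drift(y) * dt + vol(y) * dW
        path.append(y)
    return path

# Illustrative OU factor with small volatility (the Judd-Guu small-noise setting):
rng = random.Random(0)
lam, eps = 0.5, 0.01  # assumed mean-reversion speed and small volatility
path = euler_maruyama(1.0, lambda y: -lam * y, lambda y: eps, 0.01, 1000, rng)
print(path[-1])  # small: the factor has decayed toward its long-run mean of zero
```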
Based on Remark 2 in Section 2, we have the following proposition.
Proposition 7. The optimal action in the bounded rationality model (7) is a^s = a^d + Σ_i τ(a_{y_i}, κ/σ_i) y_i, where σ_i is the standard deviation of y_i.

Proof. See the appendix.
From Proposition 7, we know that, in order to derive the optimal action a^s, we must obtain the default action a^d, which is related to x, and the sensitivities a_{y_i}. The detailed process of solving for them is described in the following steps, which contain the main results of our method.
By substituting y = 0 into the basic model (7), we get the default model: max_a E ∫_0^∞ e^{-ρt} u(a, x, 0) dt. This is a general dynamic programming model in continuous time whose state variable is one-dimensional, so we can easily obtain the optimal default action a^d and the value function V(x, 0) = ∫_0^∞ e^{-ρt} u(a^d, x, 0) dt [26].
The following Proposition 8 and its proof in the appendix show the result and the process of obtaining a_{y_i}.

Proposition 8. The impact of a change in y_i on the value function is V_{y_i}(x, 0). By the implicit function theorem, the impact of a change in y_i on the optimal action is a_{y_i} = −u_{a y_i}/u_{aa}.

Proof. See the appendix.

Now we can obtain the optimal action from the two steps above. By the analysis of Proposition 7, τ(a_{y_i}, κ/σ_i) represents the impact of variable y_i on the action a^s. When a_{y_i} is smaller than κ/σ_i, τ(a_{y_i}, κ/σ_i) = 0, which means the agent discards this factor.
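The implicit-function-theorem step can be checked numerically on a toy objective. Below, u(a, y) = −(a − βy)² is an assumed quadratic example (the function and β are hypothetical): its optimal action is a*(y) = βy, and indeed −u_{ay}/u_{aa} = −(2β)/(−2) = β, which a finite-difference sensitivity of the argmax reproduces:

```python
def u(a, y, beta=0.7):
    # Toy concave objective with known optimum a*(y) = beta * y.
    return -(a - beta * y) ** 2

def argmax_a(y, beta=0.7):
    # Brute-force argmax over a fine grid (adequate for this smooth toy case).
    grid = [i / 10000.0 for i in range(-20000, 20001)]
    return max(grid, key=lambda a: u(a, y, beta))

h = 0.01
a_y_fd = (argmax_a(h) - argmax_a(-h)) / (2 * h)  # finite-difference a_{y}

# Analytic IFT value: u_aa = -2, u_ay = 2*beta, so a_y = -u_ay/u_aa = beta = 0.7.
print(a_y_fd)
```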

Application: Dynamic Portfolio Choice
where ρ is the discount factor, u is the utility function, w(t) is the wealth at time t, σ is the standard deviation of the risky asset's return, and c(t) is the consumption at time t. The investment control π(t) at time t is the fraction of wealth invested in the risky asset, so 1 − π(t) is the fraction of wealth invested in the riskless asset. r is the riskless interest rate and μ is the risky mean rate of return; W(t) is a standard Brownian motion. We assume the utility function u(c) = c^{1−γ}/(1 − γ), where γ (0 < γ < 1) is the risk-preference parameter. The goal is to choose the consumption c(t) and investment π(t) control processes to maximize long-run utility.
In model (13), the riskless interest rate r and the risky mean rate of return μ are assumed to be constant [1]. However, this assumption is unrealistic, particularly over long time intervals [27,28]. Instead, we now assume that these two variables are stochastic, satisfying r(t) = r̄ + δr(t), μ(t) = μ̄ + δb(t), where r̄ and μ̄ represent the long-run means of the riskless interest rate and the risky rate of return, respectively, and δr(t), δb(t) are their volatile parts. δr(t) and δb(t) depend on some "economic factor" z(t) [24]; namely,
We assume that z(t) follows an Ornstein-Uhlenbeck process too, so we obtain dδb(t) = −λ_b δb(t) dt + ε_b dW_b(t), where λ_b has the same meaning as λ_r. ε_b is the standard deviation of δb(t), and we assume ε_r and ε_b are small [13]. W_b(t) is a Brownian motion independent of W(t) and W_r(t). Then we get the following model:

From Section 3, we know that, for the bounded rational agent, the optimal consumption c^s and the optimal fraction of wealth allocated to the risky asset π^s in model (18) can be expressed as

c^s = c^d + τ(c_δr, κ/σ_r) δr + τ(c_δb, κ/σ_b) δb,
π^s = π^d + τ(π_δr, κ/σ_r) δr + τ(π_δb, κ/σ_b) δb,

where c^d and π^d are the default actions when δr(t) = 0, δb(t) = 0, σ_r and σ_b are the standard deviations of δr and δb, respectively, and c_δr, c_δb, π_δr, and π_δb are the impacts of δr and δb on c and π, respectively. Next we give the process of solving for them using the approach described in Section 3.

Step 1. Solve the default actions c^d and π^d.

Step 2. Solve c_δr, c_δb, π_δr, and π_δb.
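Step 1 is the classical Merton problem, whose closed-form solution for CRRA utility is well known. The sketch below computes the default risky share π^d = (μ̄ − r̄)/(γσ²) and the default consumption-wealth ratio; the parameter values are illustrative assumptions, not the paper's calibration:

```python
def merton_default(mu, r, sigma, gamma, rho):
    """Closed-form Merton solution of the default (delta-r = delta-b = 0) model.

    Returns (pi_d, cw_d): the default fraction of wealth in the risky asset
    and the default consumption-wealth ratio, for CRRA utility c^(1-g)/(1-g).
    """
    pi_d = (mu - r) / (gamma * sigma ** 2)
    cw_d = (rho - (1.0 - gamma)
            * (r + (mu - r) ** 2 / (2.0 * gamma * sigma ** 2))) / gamma
    return pi_d, cw_d

# Illustrative (assumed) parameters:
pi_d, cw_d = merton_default(mu=0.08, r=0.03, sigma=0.2, gamma=0.5, rho=0.10)
print(round(pi_d, 6))   # 2.5
print(round(cw_d, 6))   # 0.1075
```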
Next we give the expressions for c_δr, c_δb, π_δr, and π_δb. Proposition 9 states them, and the proof is the solution process.

Proposition 9. Based on the results of (22) and the implicit function theorem, we obtain c_δr, c_δb, π_δr, and π_δb, and hence the final results of model (18).

Proof. See the appendix.
Proposition 9 makes predictions about the sparse agent's choices. When κ = 0, the agent is the traditional, perfectly rational agent; when κ > 0, the policy is that of a sparse agent. A larger κ indicates that the agent is less sensitive to fluctuations of both the riskless interest rate and the risky mean rate of return.
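These comparative statics in κ can be illustrated with the soft-threshold form of τ (an assumed α = 1 specification; the sensitivity and volatility values below are made up for illustration and are not the calibration behind the paper's figures):

```python
def tau(b, k):
    # Assumed soft-threshold (alpha = 1) form of the truncation function.
    return 0.0 if abs(b) <= k else (1 if b > 0 else -1) * (abs(b) - k)

b_r, sigma_r = -2.19, 1.5  # hypothetical sensitivity (in %) and factor volatility
for kappa in [0.0, 0.5, 1.0, 2.0, 4.0]:
    # Larger cognitive cost kappa -> the sensitivity is dampened toward zero;
    # kappa = 0 recovers the fully rational sensitivity.
    print(kappa, round(tau(b_r, kappa / sigma_r), 4))
```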
From Figure 3, we know that whatever κ is, |c_δr| > 0 and |c_δb| > 0, which means that when the variances of δr and δb are large, the agent considers them in the process of making a decision.
Figure 3(a) shows that if κ = 0, the agent reacts like the rational agent: when δr goes up by 1%, consumption falls by 2.19% (the agent saves more). For κ = 1, if δr goes up by 1%, consumption falls by only 1.85%. This indicates that the greater the cognitive cost of a factor, the less attention the boundedly rational agent pays to it. From Figure 3(b), we reach a similar conclusion.
Figure 4 also shows that the agent always considers δr and δb, that is, |π_δr| > 0 and |π_δb| > 0, whatever κ is. In addition, if κ = 0, then π_δr = −0.34% and π_δb = 0.34%, which means the rational agent is equally sensitive to δr and δb when deciding π. As κ increases, the absolute values of π_δr and π_δb both decrease, meaning the agent pays less attention to them; in other words, the impact of δr and δb on π shrinks as the cognitive cost rises. Next, we assume the standard deviations of the riskless interest rate and the risky mean rate of return are smaller: σ_r = 0.15, σ_b = 0.25. Keeping the other parameters fixed, we obtain the results shown in Figures 5 and 6.
Figure 5(a) shows that when κ ≥ 0.26, c_δr = 0, which means that if the fluctuation is small, the agent may discard δr when deciding optimal consumption. We reach a similar conclusion from Figure 5(b): when κ ≥ 0.95, c_δb = 0 with σ_b = 0.25.
From Figure 5(a), we know that if κ = 0, then c_δr = −2.19%, while Figure 3(a) also shows c_δr = −2.19% at κ = 0, which means that, for the rational agent, the sensitivity of c to δr has nothing to do with δr's variance. However, boundedly rational agents react differently to δr as κ increases: for example, when κ = 0.26, c_δr = 0 with σ_r = 0.15 in Figure 5(a), while c_δr = −2.1% with σ_r = 1.5 in Figure 3(a). This disparity indicates that when cognitive costs are the same and κ > 0, that is, when agents have the same degree of bounded rationality, more volatile factors are still considered while factors with smaller variance may be neglected.
Additionally, when 0.26 ≤ κ < 0.95, the agent does not react to δr, namely, c_δr = 0 (Figure 5(a)), but still reacts to a change in δb (Figure 5(b)), which is more important: the sensitivity of c to δb remains high even for a high cognitive friction κ. Note that this "feature by feature" selective attention could not be rationalized by a fixed cost to consumption alone, which is not feature dependent. But when κ ≥ 0.95, c_δr = c_δb = 0, which indicates that the agent pays no attention to either δr or δb once their thinking costs exceed the thresholds.
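This "feature by feature" selection can likewise be sketched: under a common cognitive cost κ, the factor with the smaller standard deviation faces the higher cutoff κ/σ and is dropped first. The soft-threshold form of τ and all numbers below (except the two standard deviations from the text) are illustrative assumptions:

```python
def tau(b, k):
    # Assumed soft-threshold (alpha = 1) form of the truncation function.
    return 0.0 if abs(b) <= k else (1 if b > 0 else -1) * (abs(b) - k)

b = -2.0                       # hypothetical common sensitivity of c to each factor
sigma_r, sigma_b = 0.15, 0.25  # the smaller standard deviations from the text
kappa = 0.4                    # an assumed intermediate cognitive cost

s_r = tau(b, kappa / sigma_r)  # cutoff 0.4/0.15 ~ 2.67 > |b| -> factor dropped
s_b = tau(b, kappa / sigma_b)  # cutoff 0.4/0.25 = 1.6  < |b| -> dampened, kept
print(s_r, s_b)
```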
Considering Figure 6(a), we see that when κ ≥ 0.11, π_δr = 0, while Figure 4(a) shows that whatever κ is, |π_δr| > 0 with σ_r = 1.5, which means the smaller a factor's variance, the more likely the agent is to ignore it. From Figure 6(b), we obtain the same conclusion.

Conclusion
Dynamic portfolio choice is an important but complex problem in modern finance, and extant methods often entail complicated numerical calculations due to the numerous state variables. To address this problem, this paper extends the sparse max operator proposed by Gabaix [23] and proposes a new approach to dynamic programming with stochastic terms under the assumption of the agent's limited attention. We apply this method to the Merton dynamic portfolio choice problem and find that it effectively simplifies the solution process and avoids the "Curse of Dimensionality." Finally, a numerical example shows that the method has significant economic implications and clearly interprets the agent's economic behavior when he makes a portfolio choice.
Our study can be extended in several directions. Future research should consider the case where the stochastic factors are correlated with each other, as this is more realistic. Besides, the information faced by the agent is often imprecise and incomplete, and fuzzy set theory is an important approach to this kind of problem [31][32][33]. Hence, using fuzzy set theory to handle imprecise values in dynamic programming may be another direction for further research.

Figure 3: (a) Impact of a change in δr on c with σ_r = 1.5, σ_b = 1.7. (b) Impact of a change in δb on c with σ_r = 1.5, σ_b = 1.7.

Figure 5: (a) Impact of a change in δr on c with σ_r = 0.15, σ_b = 0.25. (b) Impact of a change in δb on c with σ_r = 0.15, σ_b = 0.25.