Nash Equilibrium Strategy for a DC Pension Plan with State-Dependent Risk Aversion: A Multiperiod Mean-Variance Framework

This paper investigates a defined contribution (DC) pension plan investment problem during the accumulation phase under the multiperiod mean-variance criterion. Unlike most studies in the literature, where the investor's risk aversion is state-independent, we choose a state-dependent risk aversion parameter, which is a fractional function of the current wealth level. Moreover, we incorporate the wage income factor into our model, which leads to a more complicated problem than the portfolio selection problems studied in related papers. Because the resulting problem is time-inconsistent, we derive explicit expressions for the equilibrium strategy and the corresponding equilibrium value function by adopting the game theoretic framework and using the extended Bellman equation. Further, two special cases are discussed. Finally, based on real data from the American market, we illustrate some prominent features of the equilibrium strategy established in our theoretical derivations and compare them with the results in the existing literature.


Introduction
According to the definition of benefits, pension funds can be divided into defined benefit (DB) pension plans and defined contribution (DC) pension plans. In a DB plan, benefits are defined in advance by the sponsor, and contributions are initially set and subsequently adjusted to keep the fund in balance, so that the fund's inadequacy risk is borne by the sponsor. In a DC plan, contributions are explicitly defined beforehand, and benefits are generated solely from the accumulation of these contributions and the fund's investment returns. For this kind of plan, contributors are the only risk takers (Devolder et al. [1]). Due to the expenses and long-term obligations associated with running a DB plan, more and more employers are replacing DB plans with DC plans. Risk management for DC pension plans during the accumulation phase is becoming more important because the benefits are volatile and depend on the economic environment.
The mean-variance criterion pioneered by Markowitz [2] has been one of the key research topics in financial economics and has stimulated numerous extensions and applications from different perspectives. Typical studies for DC pension plans include Poterba et al. [3], Yao et al. [4], Liu et al. [5], Vigna [6], Guan and Liang [7], Blake et al. [8], and Lin et al. [9]. An investor who adopts the mean-variance criterion needs to strike a balance between maximizing the expected value of the terminal wealth and minimizing the risk measured by the variance of the terminal wealth. However, the mean-variance criterion lacks the smoothing property, and the corresponding optimization problem is time-inconsistent in the sense that the Bellman optimality principle no longer applies. In other words, the strategy that optimizes the mean-variance utility at the initial time does not remain optimal at a later time. Therefore, at a later time point, the investor has to commit himself to the initial optimal strategy and disregard the fact that it is no longer optimal. In the literature, this strategy is termed the precommitment strategy, which has been criticized for lacking rationality. Time consistency has therefore become a basic requirement in the field of dynamic risk management and investment analysis (see Artzner et al. [10]). There are basically two ways to handle time-inconsistent problems in the literature. The first approach investigates the precommitted problem, where "optimal" is interpreted as "optimal from the viewpoint of the initial time." Under this circumstance, only the admissible strategy that optimizes the initial objective function is considered, regardless of whether it remains optimal for the objective function at a later time. Richardson [11] was perhaps the first to study a precommitted mean-variance model; he considers a single stock with a constant risk-free rate in a continuous-time setting.
Under the discrete-time setting, Li and Ng [12] adopt an embedding scheme to transform the originally time-inconsistent mean-variance problem into a tractable stochastic linear quadratic control problem, followed by Zhou and Li [13] for the continuous-time case. The other approach formulates the problem under a game theoretic framework, which is also the approach we adopt in the present paper. The primitive idea of this approach can be traced back to Strotz [14]. Later on, Ekeland and Lazrak [15] and Ekeland and Pirvu [16] investigate the time-consistent strategy for a consumption and investment problem under hyperbolic discounting in continuous time and propose a precise definition of the game theoretic equilibrium strategy for the first time. This work is extended by Björk and Murgoci [17], who consider a general class of objective functions and a general controlled Markov process. Within this framework, the authors formally define the equilibrium concept and derive the extended Hamilton-Jacobi-Bellman (HJB) system in conjunction with the verification theorem. Then, Kryger and Steffensen [18] obtain explicit solutions for several cases including the mean-standard deviation, the endogenous habit formation for quadratic utility, and group utility. Further, Björk and Murgoci [19] and Björk et al. [20] extend the work in Björk and Murgoci [17] and analyze the game theoretic approach in detail under the discrete-time setting and the continuous-time setting, respectively. More results with this approach can be found in Wang and Forsyth [21], Li et al. [22], Wei et al. [23], Wu and Chen [24], Zhao et al. [25], Wu et al. [26], Zeng et al. [27], and so on.
When solving an optimization problem under the mean-variance framework, another crucial issue is how to correctly assess the investor's risk aversion attitude. Because of its fundamental influence, there has been renewed interest in studying the risk aversion parameter recently. In Basak and Chabakauri [28], the authors assume a constant risk aversion parameter and derive that the optimal policy, that is, the optimal amount invested in the risky asset, is independent of the current wealth level. Björk et al. [29] point out that this result is economically unreasonable. They then study a mean-variance problem with the risk aversion parameter taking a fractional form of the current wealth level in the continuous-time setting. This choice of the risk aversion parameter is intuitive and reasonable from both the dimensional-analysis and the economic point of view. Wu [30] adopts the same model for the risk aversion parameter and investigates the mean-variance portfolio selection problem in the multiperiod setting. Hu et al. [31] define the risk aversion as a linear function of the current wealth level in a continuous-time setting. Cui et al. [32] and Cui et al. [33] propose a rather flexible behavioral risk aversion model, which takes a piecewise linear form of the surplus or shortage with respect to a preset investment target in the multiperiod setting and the continuous-time setting, respectively.
With the above in mind, the main goal of this paper is to study the mean-variance problem with a state-dependent risk aversion for a DC pension plan in a multiperiod setting. To the best of our knowledge, this problem has not been studied before. We choose a state-dependent risk aversion model, which is a fractional function of the current wealth level, to describe the risk attitude of the mean-variance investor. Meanwhile, we incorporate the wage income factor into our model, which leads to a more complicated problem than those in Wu [30] and Cui et al. [33]. Using game theory and the extended Bellman equation, we derive analytical expressions for the equilibrium strategy and the corresponding equilibrium value function. Finally, we use real data from the American market to illustrate the theoretical results obtained in this paper. Some prominent features of the equilibrium strategy, which are quite different from those in the existing literature, are identified.
The main contributions of this paper include the following:
(1) We choose a state-dependent risk aversion parameter to describe the investor's risk attitude and investigate the multiperiod mean-variance optimization problem for a DC pension plan, which is much more complicated than the portfolio selection problems studied in related papers.
(2) We obtain the closed form of the equilibrium strategy for our problem, which takes a linear feedback form of the current wealth level and the current contribution level.
(3) Our derivation approach is highly technical and may inform similar research.
(4) Some new prominent features of the equilibrium strategy are observed from our numerical experiments.
The rest of this paper is organized as follows. In Section 2, we introduce the market structure and propose the optimization model. Section 3 proceeds with the solution of our optimization problem. Two special cases are discussed in Section 4. In Section 5, using real data from the American market, we present some numerical results. Finally, concluding remarks are made in Section 6.

Model Formulation
We consider a T-period asset allocation problem for a DC pension fund. The financial market under consideration consists of n risky assets with random returns and one riskless asset with a deterministic return. Suppose that a representative pension fund member (wage earner) joins a pension plan at time 0 with an initial wealth x_0 (≥ 0) and an initial wage income y_0 (> 0) and plans to retire at time T. Wealth can be reallocated among these n + 1 assets at the beginning of each of the following (T − 1) consecutive time periods. The deterministic return of the riskless asset at time period k (i.e., the investment interval from time k to time k + 1) is denoted by r_k (> 1), and the random returns of the risky assets at time period k are denoted by the vector e_k = [e_k^1, ⋅ ⋅ ⋅ , e_k^n]′, where the notation ′ represents the transpose operation.
Discrete Dynamics in Nature and Society
Before retirement, the investor needs to contribute a predefined amount of money as premiums at the beginning of each period k. Upon retirement, he can convert the pension fund into an annuity so that he receives a scheduled pension stream after retirement. If he dies before retirement, his heir can withdraw all the money in his pension account.
Similar to Yao et al. [4], we denote by y_k the wage income received at time k, which is assumed to be uncontrollable in this paper. The dynamics of the wage income are given by
y_{k+1} = q_k y_k, k = 0, 1, ⋅ ⋅ ⋅ , T − 1, (1)
where q_k is an exogenous random variable representing the stochastic growth rate of the wage income over period k. We assume that q_k > 0 almost surely for k = 0, ⋅ ⋅ ⋅ , T − 1 to ensure the nonnegativity of the wage income. Suppose that the wage earner contributes a fixed percentage, c, of his wage income at the beginning of each period until retirement; here c is a constant called the contribution rate. In other words, c y_k is the amount of contribution at time k. Denote by x_k the wealth of the pension fund at the beginning of the kth time period; then, after his contribution at time k, the wealth of the pension fund is x_k + c y_k.
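The wage and contribution mechanics described above can be sketched in a few lines. This is only an illustration: the names `wage_and_contributions`, `y0`, `c`, and `growth_rates` are ours, and the lognormal distribution used for the growth rates is an assumption made purely for the example, since the model only requires the growth rate to be positive almost surely.

```python
import random


def wage_and_contributions(y0, c, growth_rates):
    """Return (wages, contributions) when each wage equals the previous wage
    times a positive growth rate and the contribution is c times the wage."""
    wages = [y0]
    for q in growth_rates:
        if q <= 0:
            raise ValueError("growth rates must be positive to keep wages nonnegative")
        wages.append(wages[-1] * q)
    # Contributions are paid at the beginning of periods 0..T-1 only,
    # so the terminal wage does not generate a contribution.
    contributions = [c * y for y in wages[:-1]]
    return wages, contributions


rng = random.Random(0)  # fixed seed so the example is reproducible
T = 10
# Hypothetical lognormal growth rates (an assumption, not the paper's calibration).
growth = [rng.lognormvariate(0.02, 0.05) for _ in range(T)]
wages, contributions = wage_and_contributions(1.0, 0.2, growth)
```

A negative `c` is accepted on purpose, in line with the remark that a negative contribution rate can model withdrawals.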

Remark 1.
As pointed out in Chiu and Li [34], if the wage income is controllable, the contribution could be affected by the portfolio; therefore the mean-variance optimization problem for the DC pension plan reduces to a standard portfolio selection problem.

Remark 2.
We do not require c > 0 in order to keep our model general. When c < 0, the (negative) contribution c y_k can be interpreted as the consumption of the member or the distribution of the pension fund over period k. Therefore, our model can also be used to study the DC pension fund management problem in the decumulation phase. When c = 0, our model reduces to an ordinary portfolio selection problem.
Let u_k^i, i = 1, ⋅ ⋅ ⋅ , n, be the amount invested in the ith risky asset at the beginning of the kth time period. Then, incorporating the contribution at the beginning of period k, the amount invested in the riskless asset at the beginning of the kth time period is (x_k + c y_k) − ∑_{i=1}^{n} u_k^i. Thus, the wealth process of the pension fund follows the dynamics
x_{k+1} = r_k [(x_k + c y_k) − ∑_{i=1}^{n} u_k^i] + ∑_{i=1}^{n} e_k^i u_k^i = r_k (x_k + c y_k) + P_k′ u_k, k = 0, 1, ⋅ ⋅ ⋅ , T − 1, (2)
where P_k = [e_k^1 − r_k, ⋅ ⋅ ⋅ , e_k^n − r_k]′ and u_k = [u_k^1, ⋅ ⋅ ⋅ , u_k^n]′ are the excess return vector and the investment strategy at time k, respectively. Let (Ω, {F_k}, P) be a filtered complete probability space, where F_k represents the information available up to time k; that is, F_k = σ{(x_j, y_j) | 0 ≤ j ≤ k}. An investment strategy starting from time k, u(k) = {u_j, j = k, k + 1, ⋅ ⋅ ⋅ , T − 1}, is called time-k admissible if u_j is adapted to F_j for all j = k, ⋅ ⋅ ⋅ , T − 1. Denote by Θ_k the collection of all time-k admissible investment strategies.
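As a quick check of the wealth dynamics, the following sketch computes the next-period wealth in two ways: by tracking the riskless and risky holdings separately, and by using the excess return vector. The function and argument names (`x` for wealth, `y` for wage, `u` for the risky amounts, `e` for the risky gross returns, `r` for the riskless gross return, `c` for the contribution rate) are illustrative choices, not notation taken from the paper.

```python
def next_wealth_direct(x, y, u, e, r, c):
    """Next-period wealth, tracking riskless and risky holdings separately."""
    total = x + c * y            # wealth right after the contribution
    riskless = total - sum(u)    # remainder goes to the riskless asset (may be negative, i.e. borrowing)
    return r * riskless + sum(ei * ui for ei, ui in zip(e, u))


def next_wealth_excess(x, y, u, e, r, c):
    """The same update written with excess returns e_i - r."""
    excess = [ei - r for ei in e]
    return r * (x + c * y) + sum(p * ui for p, ui in zip(excess, u))
```

Both functions return the same value for any inputs, which is exactly the algebraic step that moves the riskless return out of the sum over risky assets.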
As in most of the existing literature, such as Li and Ng [12], Wu [30], and Yao et al. [4, 5], we make the following assumptions throughout this paper.
Assumption 6. The financial market is frictionless; that is, there are no transaction costs and no taxes, short-selling is permitted for all risky assets in all periods, and borrowing from the money market (at the interest rate r_k) is also allowed. Meanwhile, we assume that there are no extreme situations in the market.
Remark 7. Assumption 3 implies that E[P_k P_k′] is positive definite for all k = 0, ⋅ ⋅ ⋅ , T − 1. Similar assumptions are also used in Li and Ng [12], Wu [30], and Yao et al. [4, 5]. It means that the assets in the market are not redundant. Assumption 4 is reasonable; it states that at least two assets have different expected returns. Assumption 5 means that the excess returns of risky assets and the wage income are statistically independent across different time periods.
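In practice, the positive definiteness of the second-moment matrix of excess returns mentioned in Remark 7 can be checked on data. The sketch below is one possible way to do so, not the authors' procedure: it estimates the matrix from samples and attempts a Cholesky factorization. The helper names and the absolute tolerance `tol` are assumptions for illustration.

```python
import math


def second_moment_matrix(samples):
    """Sample estimate of the second-moment matrix from a list of excess return vectors."""
    n = len(samples[0])
    m = [[0.0] * n for _ in range(n)]
    for p in samples:
        for i in range(n):
            for j in range(n):
                m[i][j] += p[i] * p[j] / len(samples)
    return m


def is_positive_definite(m, tol=1e-12):
    """Try a Cholesky factorization of a symmetric matrix; it succeeds
    exactly when every pivot is strictly larger than tol."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False  # zero or negative pivot: not positive definite
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True
```

If one asset's excess return is always a fixed multiple of another's, the estimated matrix is singular and the check fails, which is the redundancy that the assumption rules out.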
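The optimal asset allocation problem for a DC pension fund under the multiperiod mean-variance framework with state-dependent risk aversion is to find the optimal admissible investment strategy of the following problem at the beginning of period k:
max_{u(k) ∈ Θ_k} J_k(x_k, y_k; u(k)) = E_{k,x_k,y_k}[x_T] − ω_k(x_k) Var_{k,x_k,y_k}[x_T],
s.t. x_{j+1} = r_j x_j + c r_j y_j + P_j′ u_j, j = k, k + 1, ⋅ ⋅ ⋅ , T − 1, (3)
where E_{k,x_k,y_k}[⋅] and Var_{k,x_k,y_k}[⋅] are the expectation and variance conditioned on the wealth and the wage income at time k being x_k and y_k, respectively.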
ω_k(x_k) = ω_k / x_k is the state-dependent risk aversion parameter of the investor; here ω_k > 0 is the k-dependent risk aversion coefficient.
Remark 8. Our risk aversion parameter takes a fractional form of the current wealth level. The intuition behind this setting is clear: the larger the current wealth level, the smaller the degree of risk aversion. This model is the same as that in Björk et al. [29], where the authors give a clear explanation for this choice from two aspects: dimensional analysis and the economic point of view. We find this explanation convincing; see Björk et al. [29] for detailed discussions.
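The dimensional-analysis argument can be illustrated numerically. The sketch below assumes an objective of the form "expected terminal wealth minus (coefficient / current wealth) times variance of terminal wealth"; this form, the function name `mv_objective`, and the sample values are assumptions made for illustration. Under that assumption, expressing all amounts in a different currency unit rescales the objective by the same factor, so the ranking of strategies does not depend on the unit.

```python
import statistics


def mv_objective(current_wealth, terminal_samples, omega):
    """Sample estimate of mean minus (omega / current wealth) times variance."""
    if current_wealth <= 0:
        raise ValueError("the fractional risk aversion requires positive current wealth")
    mean = statistics.fmean(terminal_samples)
    var = statistics.pvariance(terminal_samples)
    return mean - (omega / current_wealth) * var


samples = [0.9, 1.1, 1.3, 1.6]          # hypothetical terminal wealth outcomes
base = mv_objective(1.0, samples, omega=0.5)
# Same situation with every amount multiplied by 100 (e.g. a change of unit).
scaled = mv_objective(100.0, [100.0 * s for s in samples], omega=0.5)
```

Here `scaled` equals `100 * base` up to rounding. With a constant risk aversion parameter the variance term would instead scale by 100 squared, and the trade-off would change with the unit of account.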
A disturbing issue of the above optimization problem is that it is time-inconsistent, since the variance operator does not satisfy the smoothing property; that is, Var_{k,x_k,y_k}[Var_{j,x_j,y_j}[⋅]] ≠ Var_{k,x_k,y_k}[⋅], ∀ j > k. To derive the time-consistent investment strategy, we formulate the problem in a game theoretic framework, as adopted in Björk and Murgoci [19]. The basic idea is to treat the decision-making process as a noncooperative game with one decision-maker at each time point. At time k, under the current information (x_k, y_k), decision-maker k can only choose the current strategy, u_k, under the condition that he already knows the optimal strategy û_j of decision-maker j for all j > k. The objective function for decision-maker k is given by J_k(x_k, y_k; u(k)), where J_k only depends on the strategy restricted to the time interval [k, T]. Given this point of view, the strategy obtained through this process is time-consistent. Moreover, since decision-maker k can only choose the current strategy, u_k, which is optimal with respect to the subgame, we call this strategy the subgame perfect Nash equilibrium strategy. The above solution scheme works by backward induction, starting from decision-maker T − 1 and proceeding back to decision-maker k.
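As background for the time inconsistency noted above, it may help to place the tower property of conditional expectations next to the law of total variance. The notation here (a subscript for the conditioning time, `x_T` for terminal wealth) is a simplified shorthand of ours.

```latex
\mathrm{E}_k\big[\mathrm{E}_j[x_T]\big] = \mathrm{E}_k[x_T],
\qquad
\mathrm{Var}_k[x_T] = \mathrm{E}_k\big[\mathrm{Var}_j[x_T]\big]
                    + \mathrm{Var}_k\big[\mathrm{E}_j[x_T]\big],
\qquad k < j \le T .
```

The extra term, the variance of the later conditional mean, is what prevents the variance from being propagated backward on its own. This is consistent with the later use of more than one auxiliary function sequence in the recursion.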

Equilibrium Strategy and Equilibrium Value Function
In this section, we proceed with the solution of the problem in (3) by formulating it in game theoretic terms. We aim to derive the equilibrium strategy and the corresponding equilibrium value function.
In what follows, we adopt the following definition of equilibrium strategy given by Björk and Murgoci [19].

The Recursion for the Equilibrium Value Function.
We now introduce two function sequences that will play a central role in the later derivation process.

Equilibrium Strategy and Equilibrium Value Function.
In order to obtain the explicit solution to Problem ( ), we construct series of , , , , and , = 0, 1 ⋅ ⋅ ⋅ , − 1, satisfying the following recursive relations and boundary conditions: And With the above definitions, we have the following conclusions regarding , , and .
In what follows, for notational convenience, for any vector M ∈ R , any matrix N ∈ R × , and = 1, ⋅ ⋅ ⋅ , , we define We are now ready to state the main result for our problem.
Remark 13. The condition that x_k > 0 during the whole investment horizon is theoretically necessary, and it is not a severe requirement in real-world applications. We need this condition when using the extended Bellman equation (7). In the real world, since there are no extreme situations in the market by Assumption 6, if the initial wealth is large enough and the risky assets have stable returns, then the wealth level can remain greater than 0 in each period.

Remark 14.
It is interesting to see from (17) that the equilibrium strategy takes a linear feedback form of the current wealth level x_k and the current contribution level c y_k. Further, the equilibrium strategy is independent of the initial information (initial wealth, initial wage, and initial interest rate), which is quite different from the precommitment strategy. This result is consistent with the results in Wu [30] and Wu and Chen [24]. The reason is that the equilibrium investor makes decisions based on the forthcoming information, while the precommitment investor aims to find the globally optimal strategy from the viewpoint of the initial time. Moreover, we see that the return randomness of risky assets characterized by P_k and the randomness of the income level dictated by q_k affect the equilibrium strategy both separately and jointly.
Then, the equilibrium strategy (17) can be rewritten as in (22). We can view the two normalized vectors appearing in (22) as two mutual funds; it is easy to see that (22) reveals a "two-fund theorem" property of the equilibrium strategy. That is to say, the equilibrium strategy is the sum of two portfolios, each proportional to one of the two funds. Moreover, the current wealth level and the investor's risk aversion coefficient only affect the amount invested in the first fund, while the current wage income level and the contribution rate affect the investment amounts in both funds. Therefore, the investor just needs to strike a balance among these two "mutual funds" and the riskless asset, according to his current wealth level, his current wage income level, his risk aversion coefficient, and his contribution rate. Analogous results, namely the two-fund theorem with stochastic interest rate and the three-fund theorem with stochastic interest rate and stochastic liability, are obtained in Yao et al. [35].
We finish this section by giving the mean and variance of the terminal wealth in the following theorem.

Special Cases
In this section, we present two special cases of our model.

Special Case 1.
We assume that the wage income is uncorrelated with all the risky assets. Mathematically, in such case, is uncorrelated with P , = 0, ⋅ ⋅ ⋅ , − 1. So we have By (16), it follows that where +1 , +1 , and +1 are still given by (16). According to (11)- (15) and (26), and are given by (11) and (13), while , , and can be further rewritten as where the boundary conditions are still given by (11)- (15). The results in Theorem 12 can be rewritten as follows.
Theorem 16. For k = 0, ⋅ ⋅ ⋅ , T − 1, if q_k is uncorrelated with P_k and x_k > 0, the equilibrium strategy can be rewritten in the form (30), and the corresponding equilibrium value function simplifies accordingly, where the five coefficient sequences are defined by (11), (13), and (27)-(29).
Remark 17. From (30), we can see that when the wage income is uncorrelated with all the risky assets, the return randomness of risky assets characterized by P_k and the randomness of the wage income dictated by q_k affect the equilibrium strategy only separately. Moreover, the two funds coincide, so the equilibrium strategy is now proportional to a single fund. Thus, the previous "two-fund theorem" degenerates to the "one-fund theorem." This is purely a consequence of assuming that the wage income is uncorrelated with all the risky assets. It reveals that the correlation between the wage income and the risky assets is what separates the "two-fund theorem" from the "one-fund theorem" in the equilibrium strategy.

Special Case 2.
No pension contribution is considered. In this situation, we only need to set c = 0 in our model, which then degenerates to a multiperiod mean-variance portfolio selection model with state-dependent risk aversion. The equilibrium strategy and the corresponding value function can be simplified as follows. For k = 0, ⋅ ⋅ ⋅ , T − 1, if x_k > 0, the equilibrium strategy simplifies to the form in (34), and the corresponding equilibrium value function simplifies accordingly, where the two coefficient sequences involved are defined in (11) and (13).
Remark 19. We can see from (34) that the equilibrium strategy is now just a linear function of the current wealth level and is only affected by the return randomness of risky assets characterized by P_k, and the corresponding value function only depends on the current wealth level. The equilibrium strategy in (34) is a special case of the result in special case 1; thus the previous analysis of the reduction from the "two-fund theorem" to the "one-fund theorem" is still valid. Further, it is clear from (34)-(37) that our results are consistent with Theorem 7 in Wu [30], as expected.
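To make the backward-induction logic concrete for the no-contribution case, the following sketch computes wealth-proportional equilibrium positions under simplifying assumptions that are ours, not the paper's: a single risky asset, excess returns that are independent and identically distributed across periods with mean `mu` and second moment `s`, a constant riskless gross return `r`, a constant risk aversion coefficient `omega`, an objective of the form "mean minus (omega / wealth) times variance", and wealth that stays positive. The closed-form step is obtained here from the first-order condition of the stage problem and is not copied from (34).

```python
def stage_objective(kappa, a_next, b_next, r, mu, s, omega):
    """Objective of one decision-maker per unit of current wealth when the
    amount kappa * wealth is invested and later players use their strategies.
    a_next and b_next are the first and second moments of terminal wealth
    per unit of next-period wealth."""
    mean = a_next * (r + mu * kappa)
    second = b_next * (r * r + 2.0 * r * mu * kappa + s * kappa * kappa)
    return mean - omega * (second - mean * mean)


def equilibrium_proportions(r, mu, s, omega, T):
    """Backward recursion for kappa_k, where the amount invested at time k
    is kappa_k times the wealth at time k (no contributions)."""
    a, b = 1.0, 1.0  # moments per unit wealth at the terminal time
    kappas = [0.0] * T
    for k in reversed(range(T)):
        # Maximizer of the concave quadratic stage objective (first-order condition).
        kappa = mu * (a / (2.0 * omega) - r * (b - a * a)) / (b * s - a * a * mu * mu)
        kappas[k] = kappa
        # Update the moments so that they refer to one unit of wealth at time k.
        b = b * (r * r + 2.0 * r * mu * kappa + s * kappa * kappa)
        a = a * (r + mu * kappa)
    return kappas


# Hypothetical parameters: mean excess return 0.02, standard deviation 0.2.
kappas = equilibrium_proportions(r=1.01, mu=0.02, s=0.02**2 + 0.2**2, omega=1.0, T=3)
```

In this simplified setting, the last decision-maker's position reduces to the familiar single-period ratio of mean excess return to twice the risk aversion coefficient times the variance. Earlier decision-makers differ only through the two moment sequences carried backward.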

Numerical Illustration
In this section, using real data from the American stock market, we present some numerical results to illustrate the equilibrium strategy obtained in this paper. Consider an investor who enters a DC pension plan at time 0 with an initial wealth x_0 = 1, an initial wage income y_0 = 1, and a contribution rate c = 0.2 for an investment horizon of T = 10. For simplicity, we assume that the risk aversion coefficient and the market parameters are independent of time k. Based on the above data set, the monthly risk-free return rate is 0.0115; that is, r_k = 1.0115; other market parameters for k = 0, ⋅ ⋅ ⋅ , 9 can be calculated as follows:

Data Set and Parameter
On the basis of the above historical data, we can further calculate the five coefficient sequences backwards by (11)-(15). Table 1 presents their values in each period under different risk aversion coefficients. We can observe from Table 1 that the larger the value of ω, the smaller the values of all five coefficients in each period. The reason is straightforward. The larger the value of ω, the more risk averse the investor is at time k. Thus, the investor will choose a relatively safer investment strategy, that is, hold a low position in the risky assets. According to (23) and (24), the five coefficients are related to the conditional expected value and variance of the terminal wealth. Low positions in risky assets may result in a small conditional expected value and variance of the terminal wealth and hence small values of these coefficients.

Sensitivity Analysis of the Risk Aversion Coefficient.
In this subsection, we analyze the effects of the risk aversion coefficient on the equilibrium strategy, the equilibrium value function, and the global investment performance, that is, the Sharpe ratio of the terminal wealth achieved by the equilibrium strategy. Parameters x_0, y_0, c, and T take the same values as those in Section 5.1. It should be pointed out that the graphs in the lower right corner of Figures 1, 3, and 4 carry no information; we plot them just to show the legend.
We can see from Figure 1 that the amounts invested in the three stocks all increase over time. However, as ω increases, the investment in each stock decreases in each period. This decreasing trend becomes more evident when approaching the terminal time. The reason is that the larger the value of ω, the more risk averse the investor is at time k, which results in a lower investment amount in the risky assets. Note that our risk aversion parameter is essentially the same as the risk tolerance model in Cui et al. [33]. Accordingly, our result depicted in Figure 1 is consistent with the results in Cui et al. [33], where they show that the larger the value of either risk tolerance parameter, that is, the less risk averse the investor, the larger the absolute value of K^− (or K^+), that is, the larger the position in the risky asset at each time.
The equilibrium value function at the initial time increases as the risk aversion coefficient increases. We present this tendency on the left-hand side of Figure 2. This result is consistent with the result in Wu and Chen [24], where they show a similar result when the volatility of the risk aversion changes. Wu and Chen [24] point out that the volatility of the risk aversion parameter, as a component of risk, plays an important role in the equilibrium value function. As our risk aversion parameter is state-dependent, even though every risk aversion coefficient increases with the same volatility, the volatility of our risk aversion parameter is still changing.

(3) Effect of the Risk Aversion Coefficient on the Sharpe Ratio of the Terminal Wealth Achieved by the Equilibrium Strategy. Define
(E_{0,x_0,y_0}[x_T] − x_T^0) / √(Var_{0,x_0,y_0}[x_T])
as the Sharpe ratio of the terminal wealth achieved by the equilibrium strategy; here x_T^0 is the wealth obtained if we invest all the wealth in the riskless asset. In order to investigate the global investment performance of the equilibrium strategy, we choose this Sharpe ratio as a performance index and examine the relationship between the Sharpe ratio and the risk aversion coefficient. Let the risk aversion coefficient fluctuate as (39) shows; we find that increasing the risk aversion coefficient gives rise to an increasing Sharpe ratio of the terminal wealth. This observation is consistent with the result in the second analysis in this subsection. The variation of the Sharpe ratio with respect to the risk aversion coefficient is depicted on the right-hand side of Figure 2.
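A minimal sketch of how such a Sharpe ratio could be estimated from simulated terminal wealth is given below. It assumes the ratio is the expected terminal wealth in excess of the riskless benchmark divided by the standard deviation of terminal wealth; the function name and the use of the population standard deviation are our choices for illustration.

```python
import statistics


def sharpe_ratio(terminal_samples, riskless_terminal_wealth):
    """(mean terminal wealth - riskless benchmark) / standard deviation,
    estimated from simulated terminal wealth outcomes."""
    mean = statistics.fmean(terminal_samples)
    std = statistics.pstdev(terminal_samples)
    if std == 0:
        raise ValueError("the Sharpe ratio is undefined when terminal wealth has zero variance")
    return (mean - riskless_terminal_wealth) / std


# Hypothetical outcomes and benchmark, for illustration only.
ratio = sharpe_ratio([1.0, 2.0, 3.0], 1.5)
```

The benchmark is passed in explicitly because, with stochastic wage income, the all-riskless terminal wealth has to be computed or estimated separately.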

Effects of Other Parameters on the Equilibrium Strategy.
To obtain a more comprehensive understanding, we proceed with the analyses of the influence of the investment time horizon and the contribution rate on the equilibrium strategy. In the following experiments, parameters x_0 and y_0 take the same values as those in Section 5.1; the risk aversion coefficient is ω_k = 0.5. Let c = 0.2 and let the investment time horizon T increase from 5 to 11 with a step size of 2; the equilibrium strategies under different values of T are shown in Figure 3. We can draw some conclusions from this figure: (1) The allocations in risky assets all increase rapidly during the whole investment horizon for each T. (2) The shorter T is, the larger the investment in each risky asset in each period. This is easy to understand, since the investor would have less confidence in controlling the investment uncertainty in the future when the investment horizon becomes longer, which results in smaller investments in risky assets, that is, a relatively safer investment strategy. (3) Wu and Chen [24] point out that the investment in each risky asset takes the same value when approaching the terminal time no matter which value T takes. However, no such phenomenon has been found in our result. At the maturity date, the investment in each risky asset is the highest. The longer the investment time horizon, the larger the amount invested in each risky asset at the end of the investment horizon. Since we contribute to the pension account in each period, our accumulated wealth is basically increasing over the entire investment horizon; hence the amount that we can actually invest increases gradually over time.
Finally, we examine the effect of the contribution rate on the equilibrium strategy. Let T = 10 and let the contribution rate c increase from 0 to 0.3 with a step size of 0.1. It is natural to anticipate that a larger investment comes with a higher contribution rate, as it directly leads to a higher accumulated wealth for the pension fund. The results shown in Figure 4 support this intuition. Further, we see from Figure 4 that the investment in each risky asset increases over time for each c. Note the case c = 0, in which our model degenerates to a trivial multiperiod mean-variance portfolio selection problem. Compared with the other three cases, when c = 0 the amount invested in each risky asset is the lowest and the investment amount increases most slowly during the whole investment period. Take T = 11 in Figure 3 as a comparative example (note that c = 0.2 there); when there is no pension contribution in each period (c = 0), the investment in each risky asset is lower and increases much more slowly than that of T = 11 in Figure 3 during the whole investment horizon, which illustrates the rationality of the first conclusion drawn from Figure 3.

All the above numerical results and analyses further demonstrate the impact of different parameters in our model on the equilibrium strategy. These observations give us a more intuitive and better understanding of the theoretical results obtained in this paper.

Conclusion
We study the mean-variance optimization problem for a multiperiod DC pension fund, where the risk aversion parameter takes a fractional state-dependent form. As the resulting problem is time-inconsistent, we deal with it within a game theoretic framework and obtain the explicit expressions for the equilibrium strategy and the corresponding value function. We find that the equilibrium strategy takes a linear feedback form of the current wealth level and the current contribution level. We also examine two special cases of our model. Finally, we carry out a series of empirical tests based on real data from the American market, which shed light on some important features of the equilibrium strategy obtained in our theoretical results. Some new observations different from those in the existing literature can be summarized as follows: (1) The equilibrium strategy does not depend on the initial information such as the initial wealth, initial wage income, and initial interest rate.
(2) The equilibrium strategy decreases as the risk aversion coefficient increases, and this trend becomes more evident when approaching the terminal time. Moreover, the larger the value of ω, the larger the value of the equilibrium value function. Last, increasing ω results in an increasing Sharpe ratio of the terminal wealth achieved by the equilibrium strategy.
(3) The amounts invested in risky assets increase over time. The shorter T is, the larger the amount invested in each risky asset in each period. The longer T is, the larger the amount invested in each risky asset at the end of the investment horizon.
(4) The higher the contribution rate, the larger the investment in each risky asset at each period.

A. Proof of Proposition 11
We prove this proposition by mathematical induction on .
Then is obviously positive definite, since −1 is positive definite by Assumption 3.
By (11) and (13), it follows that For any nonzero -dimensional vector , noting E[P −2 P −2 ] is positive definite by Assumption 3, we have Now, supposing that, for + 1, +1 > 2 +1 and +1 is positive definite, we consider the case for : Note the following expression: Then, we have Therefore, The positive definiteness of can be established similar to that of −1 ; thus we omit it. By the principle of mathematical induction, > 2 holds for = 0, ⋅ ⋅ ⋅ , − 1, and is positive definite for = 1, ⋅ ⋅ ⋅ , . This completes the proof of the proposition.
(B.13) Plugging (B.11) into the above equation and after a series of complicated calculations, we obtain (B.14) Again, plugging (B.11) into (B.10) and taking notice of (16), we can deduce the equilibrium value function as

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.