Relationship between Maximum Principle and Dynamic Programming for Stochastic Recursive Optimal Control Problems and Applications

This paper is concerned with the relationship between maximum principle and dynamic programming for stochastic recursive optimal control problems. Under certain differentiability conditions, relations among the adjoint processes, the generalized Hamiltonian function, and the value function are given. A linear quadratic recursive utility portfolio optimization problem in financial engineering is discussed as an explicit illustration of the main result.


Introduction
The nonlinear backward stochastic differential equation (BSDE) was introduced by Pardoux and Peng [1]. Independently, Duffie and Epstein [2] introduced BSDEs from an economic background. In [2], they presented a stochastic differential formulation of recursive utility. Recursive utility is an extension of the standard additive utility, with the instantaneous utility depending not only on the instantaneous consumption rate but also on the future utility. As found by El Karoui et al. [3], the utility process can be regarded as a solution to a special BSDE. The optimal control problem in which the cost functional is described by the solution to a BSDE is called a stochastic recursive optimal control problem. In this case, the control systems become forward-backward stochastic differential equations (FBSDEs). This kind of optimal control problem has found important applications in real-world problems arising in mathematical economics, mathematical finance, and engineering (see Schroder and Skiadas [4], El Karoui et al. [3, 5], Ji and Zhou [6], Williams [7], and Wang and Wu [8]).
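To fix ideas, here is a minimal sketch of the objects just described, in notation standard in this literature (the symbols $f$, $\xi$, $W$, and the filtration are assumptions for illustration, not reproduced from the paper's own displays). A BSDE driven by a Brownian motion $W$ is an equation of the form
\[
Y(t) = \xi + \int_t^T f\big(s, Y(s), Z(s)\big)\,ds - \int_t^T Z(s)\,dW(s), \qquad 0\le t\le T,
\]
and a recursive utility of the Duffie-Epstein type is the first component $Y$ of the solution when the generator $f$ depends on the consumption rate $c$ and on the future utility, for instance $f = f\big(c(s), Y(s)\big)$.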
It is well known that Pontryagin's maximum principle and Bellman's dynamic programming are two of the most important tools in solving stochastic optimal control problems. See the famous reference book by Yong and Zhou [9] for a systematic discussion. For stochastic recursive optimal control problems, Peng [10] first obtained a maximum principle when the control domain is convex. Xu [11] then studied the case of a nonconvex control domain, but under the assumption that the diffusion coefficient does not contain the control variable. Ji and Zhou [6] established a maximum principle when the forward state is constrained in a convex set at the terminal time. Wu [12] established a general maximum principle, where the control domain is nonconvex and the diffusion coefficient depends on the control variable. A maximum principle for stochastic recursive optimal control systems with Poisson jumps, together with its applications in finance, was studied by Shi and Wu [13], where the control domain is convex.
For the other important approach to stochastic recursive optimal control problems, dynamic programming, Peng [14] (see also Peng [15]) first obtained the generalized dynamic programming principle and introduced a generalized Hamilton-Jacobi-Bellman (HJB) equation, which is a second-order parabolic partial differential equation (PDE). It is also proved in [14] that the value function is a viscosity solution to the generalized HJB equation. Wu and Yu [16] extended the results of [14, 15] to the case of an obstacle constraint, with the cost functional described by the solution to a reflected backward stochastic differential equation, and proved that the value function is the unique viscosity solution to their generalized HJB equation. Li and Peng [17] generalized the results of [14, 15] by considering a cost functional defined by a controlled BSDE with jumps. They proved that the value function is the viscosity solution to the associated generalized HJB equation with integro-differential operators.
Hence, a natural question arises: are there any relations between these two methods? Such a topic was intuitively discussed by Bismut [18] and Bensoussan [19] and then studied by many researchers. Under certain differentiability conditions, the relationship between the maximum principle and dynamic programming is essentially the relationship between the derivatives of the value function and the solution to the adjoint equation along the optimal state. However, the smoothness conditions do not hold in general and are difficult to verify a priori; see Zhou [20] for the deterministic case and Yong and Zhou [9] for its stochastic counterpart. Zhou [21] first obtained the relationship between the general maximum principle and dynamic programming using viscosity solution theory (see also Zhou [22] or Yong and Zhou [9]), without the assumption that the value function is smooth. For diffusions with jumps, the relationship between maximum principle and dynamic programming was first given by Framstad et al. [23, 24] under certain differentiability conditions, and then Shi and Wu [25] eliminated these restrictions within the framework of viscosity solutions. For singular stochastic optimal control problems, the relationship between maximum principle and dynamic programming was given by Bahlali et al. [26] in terms of the derivatives of the value function. For the Markovian regime-switching jump diffusion model, the relationship between maximum principle and dynamic programming was given by Zhang et al. [27], also in terms of the derivatives of the value function.
In this paper, we derive the relationship between maximum principle and dynamic programming for the stochastic recursive optimal control problem. For this problem, we connect the maximum principle of [10] with the dynamic programming of [14, 15] under certain differentiability conditions. Specifically, when the value function is smooth, we give relations among the adjoint processes, the generalized Hamiltonian function, and the value function. To this end, in Section 2, we first recall some related results of [14, 15], which are stated in this paper as a stochastic verification theorem. We also prove that, under additional convexity conditions, the necessary conditions in the maximum principle of [10] are in fact sufficient. In Section 3, we show the relationship between maximum principle and dynamic programming under certain differentiability conditions for our stochastic recursive optimal control problem, by the martingale representation technique. In Section 4, we discuss a linear quadratic (LQ) recursive utility portfolio optimization problem in financial engineering. In this problem, the state feedback optimal control is obtained by both the maximum principle and dynamic programming approaches, and the relations we obtained are illustrated explicitly. Finally, we end this paper with some concluding remarks in Section 5.
Notations. Throughout this paper, we denote by $\mathbb{R}^{n}$ the $n$-dimensional Euclidean space, by $\mathbb{R}^{n\times d}$ the space of $n\times d$ matrices, and by $\mathcal{S}^{n}$ the space of $n\times n$ symmetric matrices. $\langle\cdot,\cdot\rangle$ and $|\cdot|$ denote the scalar product and the norm in Euclidean space, respectively. The superscript $\top$ denotes the transpose of a matrix.
Given $u(\cdot)\in\mathcal{U}[t,T]$, we introduce the cost functional. Our stochastic recursive optimal control problem is the following.
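Since the displayed formulas for the control system, the cost functional, and the value function are not reproduced above, the following is a minimal sketch of the standard formulation used in this literature; the coefficient names $b$, $\sigma$, $f$, $\Phi$ and the infimum orientation are assumptions for illustration only:
\[
\begin{cases}
dX^{t,x;u}(s) = b\big(s, X^{t,x;u}(s), u(s)\big)\,ds + \sigma\big(s, X^{t,x;u}(s), u(s)\big)\,dW(s), & s\in[t,T],\\
-\,dY^{t,x;u}(s) = f\big(s, X^{t,x;u}(s), Y^{t,x;u}(s), Z^{t,x;u}(s), u(s)\big)\,ds - Z^{t,x;u}(s)\,dW(s), & s\in[t,T],\\
X^{t,x;u}(t) = x, \qquad Y^{t,x;u}(T) = \Phi\big(X^{t,x;u}(T)\big),
\end{cases}
\]
with cost functional and value function of the form
\[
J\big(t,x;u(\cdot)\big) := Y^{t,x;u}(s)\big|_{s=t}, \qquad
V(t,x) := \inf_{u(\cdot)\in\mathcal{U}[t,T]} J\big(t,x;u(\cdot)\big).
\]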
Remark 1. Because the coefficients of the control system are all deterministic functions, we know from Peng [15, Proposition 5.1] that, under (H1) and (H2), the above value function is a deterministic function. So our definition (6) is meaningful. We introduce the following generalized HJB equation, in which the generalized Hamiltonian function appears (a sketch of its typical form is given below). We have the following result.
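The displayed generalized HJB equation is not reproduced above. For reference, in the formulation of Peng [14, 15] it typically takes the following form, written in the assumed notation of the sketch after the problem statement (in particular the symbol $G$ and the infimum orientation are assumptions):
\[
\begin{cases}
\dfrac{\partial V}{\partial t}(t,x) + \inf_{u\in U} G\big(t, x, V(t,x), V_x(t,x), V_{xx}(t,x), u\big) = 0, & (t,x)\in[0,T)\times\mathbb{R}^{n},\\[4pt]
V(T,x) = \Phi(x), & x\in\mathbb{R}^{n},
\end{cases}
\]
where the generalized Hamiltonian function is
\[
G(t,x,v,p,P,u) := \tfrac{1}{2}\operatorname{tr}\!\big(\sigma(t,x,u)\sigma(t,x,u)^{\top}P\big) + \langle p, b(t,x,u)\rangle + f\big(t,x,v,\langle p,\sigma(t,x,u)\rangle,u\big).
\]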
This kind of control system was studied by Peng [10], and a maximum principle was obtained. In order to state his result, we need the following assumption.
The proof is complete.

Applications to Financial Portfolio Optimization
In this section, we consider an LQ recursive utility portfolio optimization problem in financial engineering. In this problem, the optimal portfolio in state feedback form is obtained by both the maximum principle and dynamic programming approaches, and the relations obtained in Theorem 5 are illustrated explicitly. Suppose the investor has two kinds of securities in the market available for investment.
(i) A risk-free security (e.g., a bond), whose price $S_0(t)$ at time $t$ is given by (39); here $r(\cdot)$ is a bounded deterministic function.
(ii) A risky security (e.g., a stock), whose price $S_1(t)$ at time $t$ is given by (40); here $W(\cdot)$ is a 1-dimensional Brownian motion, and $\mu(\cdot)$ and $\sigma(\cdot)\neq 0$ are bounded deterministic functions with $\mu(t) > r(t)$.
Let $\pi(t)$ denote the total market value of the investor's wealth invested in the risky security, which we call the portfolio. Given the initial wealth $X^{\pi}(0) = x_0 \geq 0$ and combining (39) and (40), we can obtain the wealth dynamics (a sketch of the standard form is given below). We denote by $\mathcal{U}_{ad}$ the set of admissible portfolios valued in $U = \mathbb{R}$.
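The price and wealth equations are not reproduced above. As a point of reference, the standard Black-Scholes-type market with the data described in (i)-(ii) reads as follows, where the symbols $S_0$, $S_1$, $r$, $\mu$, $\sigma$, $X^{\pi}$, and $x_0$ are assumed for illustration:
\[
dS_0(t) = r(t)S_0(t)\,dt, \qquad
dS_1(t) = S_1(t)\big[\mu(t)\,dt + \sigma(t)\,dW(t)\big],
\]
and, for a self-financing portfolio $\pi(\cdot)$, the wealth process then satisfies
\[
\begin{cases}
dX^{\pi}(t) = \big[r(t)X^{\pi}(t) + \big(\mu(t)-r(t)\big)\pi(t)\big]\,dt + \sigma(t)\pi(t)\,dW(t),\\
X^{\pi}(0) = x_0.
\end{cases}
\]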
For any given initial wealth $x_0 > 0$, Kohlmann and Zhou [28] discussed a mean-variance portfolio optimization problem. That is, the investor's objective is to find an admissible portfolio $\pi^{*}(\cdot)$ which minimizes the variance $\operatorname{Var}[X^{\pi}(T)] := \mathbb{E}\big[(X^{\pi}(T) - \mathbb{E}[X^{\pi}(T)])^{2}\big]$ at some future time $T > 0$, under the condition that $\mathbb{E}[X^{\pi}(T)]$ equals a given constant in $\mathbb{R}$. Using the Lagrange multiplier method, we know that it is equivalent to study the following maximization problem, in which a given multiplier in $\mathbb{R}$ appears (a sketch is given below). Using the completion-of-squares technique, [28] obtained an optimal portfolio in state feedback form via a stochastic Riccati equation and a BSDE. The optimal value function was also obtained.
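The displayed equivalent problem is not reproduced above. Under the Lagrange multiplier method, one common reformulation (with the multiplier written as $\lambda$, an assumed symbol) is
\[
\sup_{\pi(\cdot)\in\,\mathcal{U}_{ad}} \mathbb{E}\Big[-\tfrac{1}{2}\big(X^{\pi}(T)\big)^{2} + \lambda X^{\pi}(T)\Big],
\]
since $\mathbb{E}\big[(X^{\pi}(T)-\lambda)^{2}\big] = \mathbb{E}\big[(X^{\pi}(T))^{2}\big] - 2\lambda\,\mathbb{E}\big[X^{\pi}(T)\big] + \lambda^{2}$, so that, for each fixed $\lambda$, maximizing the expression above is the same as minimizing $\mathbb{E}\big[(X^{\pi}(T)-\lambda)^{2}\big]$.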
In this paper, we generalize the above mean-variance portfolio optimization problem to a recursive utility portfolio optimization problem. Recursive utility means that the utility at time $t$ is a function of the future utility (in this section, we do not consider consumption). In fact, in our framework, the recursive utility can be assumed to satisfy some controlled BSDE.
We consider a small investor, endowed with initial wealth $x_0 > 0$, who chooses at each time $t$ his/her portfolio $\pi(t)$. The investor wants to choose an optimal portfolio $\pi^{*}(\cdot)\in\mathcal{U}_{ad}$ to maximize the recursive utility functional (43), whose generator involves a nonnegative constant (a sketch of a typical form is given below).
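The recursive utility functional (43) and its generator are not reproduced above. As an illustration only, a typical standard additive (discounted) recursive utility of the kind mentioned in Remark 6, with the discount constant written as $\beta\geq 0$ and the terminal reward as $\Phi$ (both assumed symbols), would read
\[
Y^{\pi}(t) = \mathbb{E}\Big[\Phi\big(X^{\pi}(T)\big) - \int_t^T \beta\, Y^{\pi}(s)\,ds \;\Big|\; \mathcal{F}_t\Big],
\]
or, equivalently, $Y^{\pi}$ is the first component of the solution to the controlled BSDE
\[
-\,dY^{\pi}(t) = -\beta\, Y^{\pi}(t)\,dt - Z^{\pi}(t)\,dW(t), \qquad Y^{\pi}(T) = \Phi\big(X^{\pi}(T)\big).
\]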
Remark 6. In fact, the recursive utility functional (43) defined above stands for a standard additive utility of recursive type. It is a meaningful and nontrivial generalization of the classical standard additive utility and has many applications in mathematical economics and mathematical finance. For more details about utility functions, see Duffie and Epstein [2], Section 1.4 of El Karoui et al. [3], or Schroder and Skiadas [4].

More precisely, for any $\pi(\cdot)\in\mathcal{U}_{ad}$, the investor's utility functional is defined in terms of the recursive utility process $Y^{\pi}(t)$. In fact, in our framework, the wealth process $X^{\pi}(\cdot)$ and the recursive utility process $Y^{\pi}(\cdot)$ can be regarded as the solution to the following controlled FBSDE for $t\in[0,T]$. The value function is given by (76), where the coefficient functions appearing in it are determined by (61), (62), and (75), respectively.

4.3. Relationship. We can now explicitly illustrate the relations in Theorem 5. In fact, relation (23) is obvious from (67), and (65) is exactly the relation given in (25) and (24).

Concluding Remarks
In this paper, we have studied the relationship between maximum principle and dynamic programming for stochastic recursive optimal control problems. Under certain differentiability conditions, we give relations among the adjoint processes, the generalized Hamiltonian function, and the value function. A linear quadratic recursive utility portfolio optimization problem in the financial market is discussed as an explicit illustration of our result. An interesting and challenging problem remains open: for the stochastic recursive optimal control problem, what is the relationship between maximum principle and dynamic programming without imposing differentiability conditions on the value function? This problem may be solved in the framework of nonsmooth analysis. Viscosity solution theory is certainly a nice tool (e.g., see Yong and Zhou [9]). A new result on the stochastic verification theorem for forward-backward controlled systems using viscosity solutions has been published very recently by Zhang [29]. However, at this moment, we do not have publishable results for the relationship within the framework of viscosity solutions. We hope to address this problem in future work.