Terminal-Dependent Statistical Inference for the Integral Form of FBSDE

Backward Stochastic Differential Equations (BSDEs) have been well studied and widely applied. The main difference from the Ordinary Stochastic Differential Equation (OSDE) is that the BSDE is designed to depend on a terminal condition, which is a key factor in some financial and ecological settings. However, to the best of my knowledge, terminal-dependent statistical inference for such a model has not been explored in the existing literature. This paper is concerned with statistical inference for the integral form of the Forward-Backward Stochastic Differential Equation (FBSDE). I use the integral form rather than the differential form because the inference procedure built on it inherits the terminal-dependent characteristic. The FBSDE is first rewritten as a regression model, and then a semiparametric estimation procedure is proposed. Because of the integral form, the resulting regression model is more complex than the classical one, and thus the inference methods differ somewhat from those designed for the OSDE. Even so, the statistical properties of the new method are similar to the classical ones. Simulations are conducted to demonstrate the finite-sample behavior of the proposed estimators.


Introduction
The Backward Stochastic Differential Equation (BSDE) was first presented by Bismut [1] for the linear case and by Pardoux and Peng [2] for the general case. The solution of a BSDE consists of a pair of adapted processes $(Y_t, Z_t)$ satisfying

$$dY_t = -g(t, Y_t, Z_t)\,dt + Z_t\,dB_t, \qquad Y_T = \xi,$$

where $g$ is the generator, $B_t$ is the standard Brownian motion, and $\xi$ is the terminal condition. Usually the terminal condition is taken to be a random variable with a given distribution.
If $g$ meets certain conditions, the BSDE has a unique solution.
The integral form of the BSDE can be expressed as

$$Y_t = \xi + \int_t^T g(s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dB_s.$$

The history of BSDE research is relatively short, but the field has progressed rapidly. In addition to its interesting mathematical nature, its extensive applications have attracted increasing attention; see, for example, Peng [3], Pardoux and Peng [4], Pardoux and Tang [5], Peng and Wu [6], Ma and Yong [7], and Nualart and Schoutens [8]. Duffie and Epstein [9] used the BSDE to describe consumer preferences under an uncertain economic environment (i.e., the stochastic differential utility). El Karoui and Quenez [10] stated that, in financial markets, the prices of many important derivative securities can be obtained by solving a certain BSDE. Lin et al. [11] used an extended statistical model to describe an ecological problem. Furthermore, the BSDE is closely related to nonlinear partial differential equations and, more generally, to nonlinear semigroups and stochastic control problems. Meanwhile, this type of equation appears frequently in mathematical finance, as pointed out by Quenez [12]. Recently, Delong [13] surveyed the most recent advances in BSDEs (including FBSDEs) and applied BSDEs with jumps to insurance and finance.
On the backward side, within a complete market the equation characterizes the dynamic value of a replicating portfolio $Y_t$ with final wealth $\xi$ and a special quantity $Z_t$ that depends on the hedging portfolio. In particular, when the randomness of $(g, \xi)$ in the BSDE comes from the state of a forward equation, the corresponding system becomes a Forward-Backward Stochastic Differential Equation (FBSDE), which can be expressed as

$$dY_t = -g(Y_t, Z_t, X_t, t)\,dt + Z_t\,dB_t, \qquad Y_T = \xi,$$

with $X_t$ satisfying

$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dB_t.$$

Compared with the Ordinary Stochastic Differential Equation (OSDE), which contains an initial condition, the solution of the FBSDE is affected by the terminal condition $Y_T = \xi(X_T)$.
As is well known, a number of parametric and nonparametric methods exist for estimation and testing in the OSDE. However, these methods cannot be directly employed for inference on the BSDE and FBSDE, because both models involve a terminal condition.
For the FBSDE defined above, statistical inference was first investigated by Su and Lin [14] and Chen and Lin [15], and a related model was proposed by Lin et al. [11]. However, they did not take the terminal condition into account in the inference procedure. In the framework of the FBSDE mentioned above, the terminal condition is an additional constraint that is not nested in the equation. It is therefore essentially difficult to use the terminal condition to refine the inference procedure, and, as a result, their methods do not cover the full problem posed by the FBSDE.
The FBSDE can also be written in integral form:

$$Y_t = \xi + \int_t^T g(s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dB_s, \qquad dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dB_t, \quad X_0 = x.$$

In this paper I focus only on the integral form because it contains the terminal condition as an additive term of the equation. With such a construction, a terminal-dependent inference procedure can be built. I am concerned with the semiparametric estimation of the FBSDE in this paper. Note that $Z_t$ is usually unobservable and $g$ cannot be completely specified in the financial market. The problems of interest are therefore to give proper estimators of both the generator $g$ and the process $Z_t$ based on the observed data $(X_t, Y_t)$ and the terminal condition $\xi$. As an initial investigation, this paper only considers the model whose generator has a parametric structure; that is, $g$ can be written in the form $g = g(\beta, t, Y_t, Z_t)$, where $\beta$ is an unknown parameter vector. Even so, such a simplified form is widely used in financial markets, and the proposed methods can be extended to other, more complicated forms.
It is worth mentioning that the key point of the method is the use of the integral equation rather than the differential equation. This change distinguishes the present work from the existing research. Unlike in the forward equation, the integral makes the cumulative error non-negligible; nevertheless, the resulting estimator is still asymptotically unbiased under the mixing-dependence condition imposed on $X_t$. Another difference from the ordinary model is that the generator contains the unobservable process $Z_t$, so it is necessary to estimate $Z_t$ first. After plugging the estimator of $Z_t$ into the generator, I can infer the generator $g$ with the newly proposed methods.
The paper is organized as follows. In Section 2, the FBSDE is first rewritten as a special regression model, and, based on this representation, the estimation procedure for the FBSDE with linear generator is designed. I then discuss the asymptotic properties in Section 3. A supplement to the inference procedure and a brief extension to nonlinear models are given in Section 4. A simulation study is presented in Section 5 to illustrate the methods. The proofs of the theorems are presented in Section 6.

Terminal-Dependent Semiparametric Estimation for the FBSDE

Consider the FBSDE in integral form

$$Y_t = \xi + \int_t^T g(s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dB_s, \qquad dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dB_t, \tag{6}$$

where $\{B_t\}_{t \ge 0}$ is the Brownian motion and $\sigma$ is a smooth function. Here the generator $g$ is a function of $t$, $Y_t$, and $Z_t$, with $Z_t$ being usually unobservable. Furthermore, the adapted processes $Y_t$ and $Z_t$ and the terminal condition can be expressed as functions of $X_t$. The existence and uniqueness of solutions to the FBSDE have been studied extensively. This section represents the FBSDE in a statistical framework and then constructs estimators of $g$ and $Z_t$ based on the observed data $\{X_t, Y_t\}$ and the terminal condition $\xi$.
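To recast model (6) as a statistical model, I first examine the last term of the first equation in (6). By the properties of the Itô integral and the relation between the two equations in (6), I have

$$E\left(\int_t^T Z_s\,dB_s\right) = 0, \qquad \mathrm{Var}\left(\int_t^T Z_s\,dB_s\right) = E\int_t^T Z_s^2\,ds. \tag{7}$$

I then regard $-\int_t^T Z_s\,dB_s$ as an error and rewrite the first equation of model (6) as

$$Y_t = \xi + \int_t^T g(s, Y_s, Z_s)\,ds + \varepsilon_t, \qquad \mathrm{Var}(\varepsilon_t) = E\int_t^T Z_s^2\,ds, \tag{8}$$

where $\varepsilon_t$ is the error term with mean zero and bounded variance, and the adapted processes $Y_t$, $Z_t$ and the terminal condition $\xi$ depend on $X_t$ via the second equation of (6).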
Remark 1. Formula (8) appears to define a regression determined by both a mean structure and a variance structure. However, such a regression is quite unlike the classical one.
In the newly defined structure, although the expectation of the error is zero, the conditional expectation of the error is nonzero. Even so, the resulting estimator is asymptotically unbiased, and thus the consistency of the estimators defined below still holds under the mixing-dependence condition on $X_t$ given below; for details see the theorems and their proofs.
Given the initial calendar time point $t_1$, I record the observed time series data $\{X(t_i), Y(t_i), i = 1, \dots, n\}$ at the equally spaced time points $\{t_i = t_1 + (i-1)\Delta, i = 1, \dots, n\} \subseteq [0, T]$. Denote $\Delta_i = t_{i+1} - t_i$ $(= \Delta)$ for $1 \le i \le n-1$ and $\Delta_n = T - t_n$. Note that $\Delta_n$ is the distance between the last observation time $t_n$ and the terminal time $T$; it may be quite large, which would make formula (9) below inaccurate. I therefore first assume that $\Delta_n$ is small enough, that is, $\Delta_n = O(\Delta)$, and then propose an adjustment in Section 4 for the case with larger $\Delta_n$. On the other hand, since the distribution of $\xi$ is assumed known, I can draw samples $\xi_1, \dots, \xi_m$ from it.
In this section I assume that $g$ is a linear function, $g(t, Y_t, Z_t) = a + bY_t + cZ_t$, where $a$, $b$, and $c$ are unknown parameters. Then model (8) can be approximately rewritten as

$$Y_{t_i} = \bar\xi + \sum_{j=i}^{n} (a + bY_{t_j} + cZ_{t_j})\Delta_j + \varepsilon_{t_i} + \nu, \qquad \mathrm{Var}(\varepsilon_{t_i}) \approx \sum_{j=i}^{n} Z_{t_j}^2\,\Delta_j, \tag{9}$$

where $\bar\xi = (1/m)\sum_{j=1}^{m} \xi_j$ and $\nu = \xi - \bar\xi$ satisfies $E(\nu) = 0$. This is the statistical version of (8), a new regression model. It is worth mentioning that the new model (9) differs somewhat from the classical regression; in addition to the mean-variance structure, it has a more complicated structure and contains terminal information.
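The backward accumulation in a discretized model of this kind can be illustrated with a short sketch. The Python below assumes the linear generator $a + bY_t + cZ_t$ and a left-endpoint Riemann sum running from $t_i$ to the terminal time; the function names `backward_sums` and `design_rows` are hypothetical and are not taken from the paper.

```python
def backward_sums(values, deltas):
    """Return S_i = sum over j >= i of values[j] * deltas[j], for every i."""
    out = [0.0] * len(values)
    acc = 0.0
    # Walk from the last observation back to the first, accumulating the sum.
    for i in range(len(values) - 1, -1, -1):
        acc += values[i] * deltas[i]
        out[i] = acc
    return out


def design_rows(y, z, deltas):
    """Rows of regressors that multiply (a, b, c) under the assumed linear generator."""
    ones = backward_sums([1.0] * len(y), deltas)  # multiplies a
    sum_y = backward_sums(y, deltas)              # multiplies b
    sum_z = backward_sums(z, deltas)              # multiplies c
    return list(zip(ones, sum_y, sum_z))
```

Because each row sums from $t_i$ up to the terminal time, rows for early observations accumulate more terms than rows for late ones, which is where the cumulative error discussed in the text comes from.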

Semiparametric Estimation for the FBSDE. I now turn to estimating the unknown parameter vector $\beta = (a, b, c)^\top$ in model (9). Because the generator contains the unobservable process $Z_t$, it is necessary to estimate $Z_t$ first and plug the estimator into the generator. After that, common parametric estimation methods can be employed to estimate the parameters.
Concerning inference on $Z_t$, despite the connection between $Z_t$ and the variance of $\varepsilon_t$ in (9), the second formula of (9) involves a weighted sum of $Z_t^2$, which makes it inconvenient to estimate $Z_t$ by a residual-based method. I therefore adopt a difference-based method instead.
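Denote $X(t_i)$ and $Y(t_i)$ by $X_i$ and $Y_i$, respectively, for short. By a Taylor expansion of the squared increment of $Y$, an approximation of $Z^2$ can be expressed as

$$E\left[(Y_{i+1} - Y_i)^2 \mid X_i = x_0\right] \approx Z^2(x_0)\,\Delta. \tag{19}$$

By (19), I regard $Z^2(x_0)$ as a point-wise nonparametric regression function. For simplicity, the N-W kernel estimator is taken here as an example of a nonparametric smoother:

$$\hat Z^2(x_0) = \frac{\sum_{i=1}^{n-1} K_h(X_i - x_0)\,(Y_{i+1} - Y_i)^2/\Delta}{\sum_{i=1}^{n-1} K_h(X_i - x_0)}, \tag{20}$$

where $K_h(\cdot) = K(\cdot/h)/h$, $K(\cdot)$ is a kernel function satisfying the regularity condition given below, and $h$ is the bandwidth or smoothing parameter. Similarly, if $Z_t$ also depends on $t$ besides $X_t$, the corresponding estimator can be obtained by additionally applying a kernel weight in $t$. Having calculated $\hat Z^2$, I plug it into the first formula of (9). From this, it is simple to derive the estimator of $\beta = (a, b, c)^\top$ with common parametric methods, for example, the least squares method, by minimizing

$$\sum_{i=1}^{n} \Big( Y_i - \bar\xi - \sum_{j=i}^{n} (a + bY_j + c\hat Z_j)\Delta_j \Big)^2.$$

For simplicity, denote by $\hat U$ the $n \times 3$ matrix whose $i$th row is $\big(\sum_{j=i}^{n} \Delta_j,\ \sum_{j=i}^{n} Y_j\Delta_j,\ \sum_{j=i}^{n} \hat Z_j\Delta_j\big)$ and by $V$ the vector with entries $Y_i - \bar\xi$. Finally, I can write the estimator as

$$\hat\beta = (\hat U^\top \hat U)^{-1} \hat U^\top V.$$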
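The two estimation steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes that the regression response for $Z^2$ is the squared increment of $Y$ divided by $\Delta$, uses a Gaussian kernel as in the simulations (its normalizing constant cancels in the ratio), and solves the three-parameter least squares problem through the normal equations. The names `nw_z2`, `solve3`, and `least_squares` are hypothetical.

```python
import math


def nw_z2(x0, x, y, delta, h):
    """Nadaraya-Watson estimate of Z^2 at x0 from squared increments of Y."""
    num = 0.0
    den = 0.0
    for i in range(len(y) - 1):
        u = (x[i] - x0) / h
        w = math.exp(-0.5 * u * u)  # Gaussian weight; constants cancel in num/den
        num += w * (y[i + 1] - y[i]) ** 2 / delta
        den += w
    return num / den


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    sol = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(M[k][c] * sol[c] for c in range(k + 1, n))
        sol[k] = (M[k][n] - s) / M[k][k]
    return sol


def least_squares(rows, resp):
    """Ordinary least squares for three coefficients via the normal equations."""
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * v for r, v in zip(rows, resp)) for i in range(3)]
    return solve3(A, b)
```

In use, `rows` would hold the backward-accumulated regressors with the plug-in estimate of $Z$ in the third column, and `resp` the observed $Y_i$ minus the sample mean of the terminal draws; both conventions are assumptions of this sketch.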

Asymptotic Results
The following two theorems concern the asymptotic properties of the estimators derived in the previous section. I first introduce several conditions.
(c) The continuous kernel function $K(\cdot)$ is symmetric about 0, with support $[-1, 1]$, and where the matrix $\Sigma$ is nonsingular and satisfies with $\lambda_{\min}(\Sigma)$ and $\lambda_{\max}(\Sigma)$ being the smallest and largest eigenvalues of $\Sigma$, respectively.
Condition (a) is commonly used for weakly dependent processes; see, for example, Rosenblatt [18,19], Kolmogorov and Rozanov [20], Bradley and Bryc [21], Lin and Lu [22], and Su and Lin [14]. Condition (b) is also reasonable because, as shown by (19), $Z_t$ can be regarded as the deviation between two adjacent observations. Condition (c) is standard for nonparametric kernel functions, and condition (d) is common because it describes the behavior of an average. Furthermore, as remarked in the previous section, to express the estimator in terms of $X_t$ rather than the model variables $Y_t$ and $Z_t$, I impose conditions mainly on the latent variable $X_t$, including the stationary mixing Markov property used in the following theorems. In practice the process $\{X_t\}$ may be nonstationary.

Theorem 3. In addition to the condition of Theorem 2, if the condition (d) holds, then as
where  2 = Var(/T).
The proof is also presented in Section 6. The result is standard in the sense of asymptotic normality with a convergence rate of order $\sqrt{n}$. As noted in the remark in the previous section, even though the conditional mean of the model error is nonzero, the newly proposed estimator is consistent because of the mixing dependency; for details see the proof of Theorem 3. Furthermore, because of the terminal condition, the asymptotic variance is larger than that obtained without using the terminal condition.

Supplement and Extension
4.1. Supplement. As mentioned in Section 2.1, when the last observation time $t_n$ is far from the terminal time $T$, the new model (9) becomes inaccurate. In this case an adjustment is needed to obtain a relatively accurate model. The adjustment consists of three main steps: first, I ignore the terminal condition to obtain an accurate model and parameter estimates restricted to $(0, t_n)$; next, I estimate the unobservable variables in the interval $(t_n, T)$ using the model estimated in the first step; finally, I substitute these estimates for the unobservable variables in $(t_n, T)$ and build a relatively accurate model that is defined on the whole interval $[0, T]$ and related to the terminal condition.
For arbitrary $t \le t_n$,

$$Y_t = Y_{t_n} + \int_t^{t_n} g(s, Y_s, Z_s)\,ds - \int_t^{t_n} Z_s\,dB_s. \tag{32}$$

This equation is exact, and thus I can obtain the estimators $\hat g$ and $\hat Z$ of $g$ and $Z_t$ for $t \le t_n$ by the methods given in Section 2. When $t > t_n$, however, this method is unsuitable for estimating $Z_t$ because it cannot be extrapolated to the interval $(t_n, T)$, so I attempt to complete the data within this interval. Set $t_n < t_{n+1} < \cdots < t_{n+k} < T$. Discretize model (32) and write its forward linear version as

$$Y_{t_{i+1}} = Y_{t_i} - (a + bY_{t_i} + cZ_{t_i})\Delta + Z_{t_i}(B_{t_{i+1}} - B_{t_i}). \tag{33}$$

Similar to formula (9), the expectation-variance structure is

$$E\left(Y_{t_{i+1}} - Y_{t_i} \mid \mathcal{F}_{t_i}\right) = -(a + bY_{t_i} + cZ_{t_i})\Delta, \qquad \mathrm{Var}\left(Y_{t_{i+1}} - Y_{t_i} \mid \mathcal{F}_{t_i}\right) = Z_{t_i}^2\,\Delta. \tag{34}$$

To estimate the unobservable data $(Y(t_i), Z(t_i))$ for $i = n+1, \dots, n+k$, I treat $Z_t$ as parameterizable. It is known from, for example, Morris [23] that the variance can be expressed as a quadratic function of the mean for several common distributions, such as the normal, gamma, binomial, negative binomial, and Poisson. For the mean-variance structure in (34), I might as well suppose the following parametric structure: for some parameters $\omega_i$. By a simple transformation and neglecting $O(\Delta^2)$ terms, I see where $\omega_2 = 1$ or $-1$, and denote $\omega = (\omega_1, \omega_2, \omega_3, \omega_4)$.
Let $Z(t_i) = \omega_1 + \omega_2\sqrt{\omega_3 + \omega_4 Y(t_i)}$ $(1 \le i \le n)$ and plug $Z(t_i)$ into (32). I can then obtain the estimators through the methods in Section 2; denote by $\hat b$, $\hat c$, and $\hat\omega$ the estimators of $b$, $c$, and $\omega$, respectively. Finally, I can refine the original orbit and estimate $Y(t_{n+1})$ one step at a time; more precisely, Iterating the above procedure, I obtain the complete data in $[0, T]$. Consequently, the same approach as in Section 2 can be performed again, and a refined estimator of $\beta$ can be constructed.
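The one-step-at-a-time completion of the data beyond the last observation can be sketched as below. This is only an illustration under stated assumptions: it takes the dynamics $dY_t = -g\,dt + Z_t\,dB_t$ with linear generator $a + bY + cZ$, drops the martingale term so that each step follows the conditional mean, and uses the parametric link $Z = \omega_1 + \omega_2\sqrt{\omega_3 + \omega_4 Y}$. The function names and the presence of a separate intercept `a` are hypothetical.

```python
import math


def z_from_y(y, omega):
    """Assumed parametric link Z = w1 + w2 * sqrt(w3 + w4 * Y), with w2 = +1 or -1."""
    w1, w2, w3, w4 = omega
    return w1 + w2 * math.sqrt(w3 + w4 * y)  # w3 + w4 * y is assumed nonnegative


def extrapolate(y_last, a, b, c, omega, delta, steps):
    """Extend Y beyond the last observation with the noise-free forward step."""
    path = []
    y = y_last
    for _ in range(steps):
        z = z_from_y(y, omega)
        # Mean step of the forward discretization: Y_{i+1} = Y_i - g(Y_i, Z_i) * delta.
        y = y - (a + b * y + c * z) * delta
        path.append(y)
    return path
```

The extended path, together with the implied values of $Z$, could then be appended to the observed data before the estimation of Section 2 is repeated on the whole interval.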

4.2. Extension. The semiparametric models in Section 2 have a linear structure in the sense that $g$ is linear in the parameters $a$, $b$, and $c$. However, some generators are nonlinear in the parameters, and the resulting model (9) is then nonlinear. For example, Constantinides [24] presented the resulting model with the specification form: See Fan [25], Fan and Zhang [26], Chan et al. [27], and Aït-Sahalia [28] for other examples.
To model such cases flexibly, a nonlinear semiparametric model can be defined as where $\varepsilon_t = \int_t^T Z_s\,dB_s$ satisfies $E(\varepsilon_t) = 0$, $g$ is a given function, and $\theta$ is an unknown finite-dimensional parameter vector.
Before estimating the nonlinear model (39), $Z_t$ can be estimated as in (19) or (20), because its estimator does not depend on the structure of $g$. Furthermore, the resulting estimator has the same asymptotic properties as in Theorem 2. I therefore focus only on the estimation of the parameter vector $\theta$ here.
After plugging the estimator $\hat Z$ of $Z_t$ into the first formula of (39), I can adopt a common method to obtain an estimator of $\theta$, for example, by minimizing where $\hat g_i(\theta) = g(\theta, t_i, Y_{t_i}, \hat Z_{t_i})$. Under regularity conditions, $\hat\theta$ can also be obtained by solving the following equation: where $\dot Q(\theta)$ denotes the derivative of $Q(\theta)$. By arguments similar to those in the previous section, the resulting estimator is asymptotically normally distributed; the details are omitted here.
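Solving an estimating equation of this kind numerically can be sketched for a scalar parameter. The example below swaps in a plain Gauss-Newton iteration for a generic nonlinear least squares criterion; its fixed point is a root of the derivative of the criterion. The mean function `f`, its derivative `df`, and the name `gauss_newton_scalar` are hypothetical and stand in for the plug-in generator, which would be supplied in practice.

```python
import math


def gauss_newton_scalar(f, df, u, v, theta0, iters=50, tol=1e-12):
    """Minimize sum_i (v_i - f(theta, u_i))^2 over a scalar theta by Gauss-Newton."""
    theta = theta0
    for _ in range(iters):
        # Score-type numerator: it vanishes exactly when the derivative of the criterion is zero.
        num = sum(df(theta, ui) * (vi - f(theta, ui)) for ui, vi in zip(u, v))
        den = sum(df(theta, ui) ** 2 for ui in u)
        step = num / den
        theta += step
        if abs(step) < tol:
            break
    return theta


def f(theta, u):
    """Hypothetical nonlinear mean function used only for illustration."""
    return math.exp(-theta * u)


def df(theta, u):
    """Derivative of f with respect to theta."""
    return -u * math.exp(-theta * u)
```

For a vector-valued parameter the same iteration applies with the scalar division replaced by a linear solve; a good starting value matters because the iteration is only locally convergent.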

Simulations
In this section I investigate the finite-sample behavior of the estimators by simulation. Although Theorems 2 and 3 are based on the stationarity of $X_t$, I also apply the method to nonstationary processes such as Geometric Brownian Motion. I use the mean, standard deviation (STD), or mean squared error (MSE) to evaluate the estimators, based on 300 repetitions. As expected, the method works better when the stationarity condition holds.
Example 4. Consider the Cox-Ingersoll-Ross (CIR) process: This model describes interest rate dynamics and is stationary when 2 ≥  2 . On the other hand, the riskless asset with price per unit   evolves as follows: with  being the constant short rate. Let  0 () and  1 () denote the quantities invested in bond   and asset   , respectively. Naturally the total wealth process   satisfies   =  0 ()  +  1 ()  . Similar to the classic self-financing FBSDE model in El Karoui et al. [29], the resulting model is Denote parameters  =  1 ,  = , and  = ( + )/ 1  2 . I use equally spaced time periods of length $\Delta = 0.4$ and choose sample size $n = 300$, so the observation interval is $[0, 120]$. The terminal time is chosen as 122, which is quite near the last observation time. Let  = 0.2,  = 0.06,  = 0.08,  = 0.05, and  0 =  1 = 10. In the estimation procedure, I use the Gaussian kernel $K(u) = (1/\sqrt{2\pi})\exp(-u^2/2)$; the optimal bandwidth is theoretically of order $O(n^{-1/5})$, and popular data-driven methods such as CV, GCV, or the plug-in approach can also be used. In the simulation, I set $h = \mathrm{std}(X)\,n^{-1/5}$ for simplicity. The simulation results with other choices are similar. I present the true curves and the N-W nonparametric estimated curves for $Z_t$ and the generator $g$, and report the mean and MSE of the estimator $\hat\beta$ of $\beta \triangleq (a, b, c)$ in Table 1. These results show that the estimators of $a$ and $b$ work well. However, because of the plug-in estimator $\hat Z$, the estimator of the coefficient $c$ has fairly large bias and MSE. On the other hand, Figure 1 shows that the estimated curves of drift and diffusion are close to the true ones.
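A forward path for this kind of experiment and the rule-of-thumb bandwidth can be sketched as follows. The sketch assumes the standard CIR parametrization $dX_t = \kappa(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,dB_t$ and an Euler scheme with full truncation; the parameter names `kappa`, `theta`, `sigma`, the starting value, and the use of the sample standard deviation for std are assumptions made for illustration only.

```python
import math
import random
import statistics


def simulate_cir(n, delta, kappa, theta, sigma, x0, seed=0):
    """Euler scheme with full truncation for an assumed CIR parametrization."""
    rng = random.Random(seed)
    x = x0
    path = [x0]
    for _ in range(n - 1):
        xp = max(x, 0.0)  # truncate so the square root stays real
        dB = rng.gauss(0.0, math.sqrt(delta))  # Brownian increment over one step
        x = x + kappa * (theta - xp) * delta + sigma * math.sqrt(xp) * dB
        path.append(max(x, 0.0))  # store the truncated value
    return path


def rule_of_thumb_bandwidth(xs):
    """Bandwidth h = std(X) * n^(-1/5), using the sample standard deviation."""
    return statistics.stdev(xs) * len(xs) ** (-1.0 / 5.0)
```

A call such as `simulate_cir(n=300, delta=0.4, kappa=0.2, theta=0.06, sigma=0.08, x0=0.06)` would produce 300 points on a grid of step 0.4; the assignment of the numerical values to these parameter names is a guess, since the text does not make it recoverable.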
Example 5. In this part I consider the case in which the terminal time is far from the last observed time. The distance between $T = 124$ and $t_n = 120$ is larger than that in Example 4. I add 10 estimated points by the method given in Section 4 and employ the same model and parameters as before. Table 2 reports the simulation results, which show that the parameter estimators do not perform as well as before but remain feasible. Besides, Figure 2 presents the estimated curves for $Z_t$ and $g$, which also perform well, although they are not better than the estimates in Example 4.
Example 6. I turn to the nonstationary case in this part. When the forward process $X_t$ does not satisfy the stationarity condition, the cumulative effect induced by the backward summation becomes more pronounced, which makes the statistical inference quite a challenge. In this situation, I choose a particular model and parameters to control the relative stationarity, where $X_t$ is a Geometric Brownian Motion for modeling the stock price satisfying

$$dX_t = \mu X_t\,dt + \sigma X_t\,dB_t,$$

while the riskless asset is the same as in formula (43),   =  0 . First, let $\mu = 0.1$, $\sigma = 0.01$, $\Delta = 0.12$, $n = 300$, $T = 36.6$, and  0 =  1 = 10. Obviously   =  1   . I choose the same kernel function and the bandwidth $h = \mathrm{std}(X)\,n^{-1/5}$. Table 3 reports the simulation results. The results show that the estimators of  work well, but  have larger bias and STD because of the plug-in estimator $\hat Z$. The curves can nevertheless still be fitted well; that is, the estimated curves of drift and diffusion are close to the true ones. Figure 3 presents the estimated curves for the diffusion $Z_t$ and the drift $g$ from one simulation.
Finally, I choose relatively large values of $\sigma$, namely 0.05 and 0.12, which represent different levels of volatility. From Tables 4 and 5 and Figures 4 and 5, I can see that the performance remains acceptable, which means that the approach could be applied more widely.

Furthermore, from the conditions of the Markov process and the -mixing coefficient, (50) Notably, both the conditional expectation and the variance are independent of C, so the condition can be dropped. From Lemma 1 of Politis and Romano [30] and the relation between the -mixing condition and the -mixing condition (e.g., Theorem 1.1.1 of [22]), I can ensure that $\{(Y_{i+1} - Y_i)^2, i = 1, \dots, n-1\}$ is a -mixing-dependent process and that the mixing coefficient, denoted by   (), satisfies where  is a positive constant. Finally, I use the Central Limit Theorem for -mixing-dependent processes (e.g., Theorem 4.0.1 of [22]) to complete the proof.
Proof of Theorem 3. I present the basic results for $(1/\sqrt{n})(\hat U - U)^\top$, which lead to the rate of convergence and asymptotic expansions. Similar to Cui et al. [31] or Su and Lin [14], I need the following decomposition: This completes the proof.

Figure 1: The solid lines are the true curves of $Z_t$ and the function $g$, respectively, and the dashed lines are the corresponding estimated curves in Example 4.

Figure 2: The solid lines are the true curves of $Z_t$ and the function $g(\cdot)$, respectively, and the dashed lines are the corresponding estimated curves in Example 5.

Figure 4: The solid lines are the true curves of $Z_t$ and the function $g(\cdot)$, respectively, and the dashed lines are the corresponding estimated curves with $\sigma = 0.05$.