On Local Linear Approximations to Diffusion Processes

Diffusion models have been used extensively in many applications. These models, such as those used in financial engineering, usually contain unknown parameters which we wish to determine. One way is to use the maximum likelihood method with discrete samplings to devise statistics for the unknown parameters. In general, the likelihood functions for diffusion models are not available explicitly, hence it is difficult to derive the exact maximum likelihood estimator (MLE). Many different approaches have been proposed by various authors over the past years; see, for example, the excellent books by Kutoyants (2004), Liptser and Shiryayev (1977), Kushner and Dupuis (2002), and Prakasa Rao (1999), and also the recent works by Aït-Sahalia (1999, 2004, 2002), and so forth. Shoji and Ozaki (1998; see also Shoji and Ozaki 1995 and Shoji and Ozaki 1997) proposed a simple local linear approximation. In this paper, among other things, we show that Shoji's local linear Gaussian approximation indeed yields a good MLE.


Introduction
Diffusion processes are used as theoretical models in analyzing random phenomena evolving in continuous time. These models may be described in terms of Itô-type stochastic differential equations

$$dX_t = A(X_t, \theta)\,dt + \sigma(X_t, \theta)\,dW_t, \tag{1.1}$$

where $(W_t)_{t \ge 0}$ is a Brownian motion, with some unknown parameters $\theta$ to be determined in rational ways.

International Journal of Mathematics and Mathematical Sciences
It is, however, difficult to derive the maximum likelihood estimator for $\theta$ if the diffusion coefficient (i.e., the volatility) $\sigma$ is unknown. On the other hand, in practice, the volatility is determined first by using the fact that the quadratic variation of $X$ over $[0, T]$ equals $\sigma^2 T$ when $\sigma$ is a constant. Therefore we will limit ourselves to diffusion models with constant volatility:

$$dX_t = A(X_t, \theta)\,dt + dW_t. \tag{1.3}$$

Since there is not much difference at the technical level, we will consider one-dimensional models only. That is, we will assume throughout the paper that $W$ is a one-dimensional Brownian motion, and $X$ is real valued. The distribution $\mu^X_T$ of $(X_t)_{t \ge 0}$ over a finite time interval $[0, T]$ has a density with respect to the Wiener measure $\mu^W_T$ (the law of the Brownian motion $W$), given by the Cameron-Martin formula:

$$L(\theta) = \frac{d\mu^X_T}{d\mu^W_T} = \exp\left\{\int_0^T A(X_t, \theta)\,dX_t - \frac{1}{2}\int_0^T A(X_t, \theta)^2\,dt\right\}, \tag{1.4}$$

which is in turn the likelihood function with continuous observation. In practice, only discrete values $X_{t_0}, \dots, X_{t_n}$ may be observed over the duration $[0, T]$, where $0 = t_0 < t_1 < \cdots < t_n = T$ and $t_i - t_{i-1} = \delta$. The corresponding likelihood function is the conditional expectation of $L(\theta)$ under the Wiener measure:

$$E\left(L(\theta) \mid X_{t_0}, \dots, X_{t_n}\right) = \frac{\prod_{j=1}^n p_\theta(\delta, X_{t_{j-1}}, X_{t_j})}{\prod_{j=1}^n G(\delta, X_{t_{j-1}}, X_{t_j})}, \tag{1.5}$$

where $p_\theta(t, x, y)$ is the conditional probability density function of $X_t$ given $X_0 = x$, and $G(t, x, y)$ is the Gaussian density $(1/\sqrt{2\pi t})\exp\{-|x - y|^2/2t\}$ (see [1]). Since the denominator of (1.5) does not depend on $\theta$, we may simply consider the numerator

$$L(X_{t_0}, \dots, X_{t_n}) \equiv \prod_{j=1}^n p_\theta(\delta, X_{t_{j-1}}, X_{t_j}) \tag{1.6}$$

as a likelihood function. Therefore, the MLE for $\theta$ under a discrete observation may be found by solving (either explicitly if possible or numerically) the likelihood equation

$$\nabla L(X_{t_0}, \dots, X_{t_n}) = 0. \tag{1.7}$$

The difficulty with this approach is that, except for very special drift vector fields $A$, an explicit formula for $p_\theta(t, x, y)$ is not known. To overcome this difficulty, many approximation methods have been proposed in the literature by various authors. The idea is to replace the diffusion model (1.3) by an approximation model for which an explicit formula for the likelihood function is available. One possible candidate is of course the Euler-Maruyama approximation

$$\tilde X_j = \tilde X_{j-1} + A(\tilde X_{j-1}, \theta)\,\delta + \sqrt{\delta}\,\xi_j,$$

where $\{\xi_j\}$ is an i.i.d. sequence with standard normal distribution $N(0, 1)$ and $\tilde X_0 = X_0$. However, the likelihood function $L_1(X_0, \dots, X_n)$ for this model is not, in general, close enough to that of the diffusion model if measured in terms of the ratio of their corresponding likelihood functions.

The second approach is to discretize the likelihood function $d\mu^X_T/d\mu^W_T$ for continuous observations. In order to utilize this likelihood function, we need to handle the Itô integral. By Itô's formula,

$$\int_0^T A(X_t, \theta)\,dX_t = F(X_T, \theta) - F(X_0, \theta) - \frac{1}{2}\int_0^T \frac{\partial A}{\partial x}(X_t, \theta)\,dt,$$

where $F(\cdot, \theta)$ denotes an antiderivative of $A(\cdot, \theta)$; here the right-hand side involves only the sample $X$. This idea of getting rid of Itô's integral and replacing it by an ordinary one has far-reaching consequences; see the interesting paper [2] for some applications.
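As an illustration of the Euler-Maruyama candidate described above, the following is a minimal sketch, not the authors' code. The drift `drift(x, theta) = theta * sin(x)` is a hypothetical choice made only so that the example runs (it has bounded derivatives); the function names and the numerical values of `T`, `n`, and `theta` are likewise assumptions.

```python
import math
import random

def drift(x, theta):
    # Hypothetical drift A(x, theta) = theta * sin(x); an assumption for illustration.
    return theta * math.sin(x)

def euler_maruyama_path(x0, theta, T, n, rng):
    """Simulate X_j = X_{j-1} + A(X_{j-1}, theta) * delta + sqrt(delta) * xi_j."""
    delta = T / n
    path = [x0]
    for _ in range(n):
        x = path[-1]
        path.append(x + drift(x, theta) * delta + math.sqrt(delta) * rng.gauss(0.0, 1.0))
    return path

def euler_log_likelihood(path, theta, T):
    """Log-likelihood of the Euler-Maruyama chain: each step is Gaussian with
    mean x_prev + A(x_prev, theta) * delta and variance delta."""
    n = len(path) - 1
    delta = T / n
    total = 0.0
    for x_prev, x_next in zip(path, path[1:]):
        resid = x_next - x_prev - drift(x_prev, theta) * delta
        total += -0.5 * math.log(2.0 * math.pi * delta) - resid * resid / (2.0 * delta)
    return total

rng = random.Random(0)
path = euler_maruyama_path(0.5, 1.0, 1.0, 200, rng)
ll_true = euler_log_likelihood(path, 1.0, 1.0)
```

Maximizing `euler_log_likelihood` over `theta` gives the Euler-Maruyama estimate; the text above explains why this likelihood need not be close, in ratio, to that of the diffusion itself.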
One can also use approximations to the probability density function $p_\theta(t, x, y)$ and construct functions which are close to the likelihood function. There are a great number of articles devoted to this approach, such as [3-5], for example. The difficulty, however, is that even if $f(t, x, y)$ is a uniform approximation of $p_\theta(t, x, y)$, there is no guarantee that the approximate likelihood function $\prod_j f(t, x_{j-1}, x_j)$ would tend to $\prod_j p_\theta(t, x_{j-1}, x_j)$ when $n \to \infty$.
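In this paper we consider the linear diffusion approximation proposed by Shoji and Ozaki [6] to the diffusion model (1.3), which leads to the following approximation of the likelihood function $L(X_{t_0}, \dots, X_{t_n})$:

$$L_2(X_{t_0}, \dots, X_{t_n}) = \prod_{j=1}^n h_j(\delta, X_{t_{j-1}}, X_{t_j}),$$

where $t_j = jT/n$, so that $X_{t_j}$ is a sample with fixed duration $\delta = t_j - t_{j-1}$ over $[0, T]$, and $h_j(t, x, y)$ is the probability transition density of the following linear diffusion model

$$d\tilde X_t = \left[A(X_{t_{j-1}}, \theta) + \frac{\partial A}{\partial x}(X_{t_{j-1}}, \theta)\,(\tilde X_t - X_{t_{j-1}})\right]dt + dW_t \tag{1.12}$$

when $t_{j-1} \le t < t_j$ and $\tilde X_{t_{j-1}} = X_{t_{j-1}}$.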

The approximation (1.12) is called the local linearization of the diffusion model (1.3), which has been studied in Shoji and Ozaki [6]. Shoji has shown numerically that local linearizations do yield better estimates. Shoji's approximation was revisited in Prakasa Rao [7], without a definite conclusion.
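The local linearization can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the implementation of [6]: the drift `theta * sin(x)` and all function names are hypothetical. On each sampling interval the drift is replaced by its first-order expansion at the left endpoint, with slope $a = \partial A/\partial x$ evaluated there, and the resulting linear diffusion with unit volatility has a Gaussian transition density with mean $x + \frac{e^{a\delta} - 1}{a}A(x, \theta)$ and variance $\frac{e^{2a\delta} - 1}{2a}$.

```python
import math

def drift(x, theta):
    # Hypothetical drift A(x, theta) = theta * sin(x); an assumption for illustration.
    return theta * math.sin(x)

def drift_dx(x, theta):
    # Its derivative in x: theta * cos(x).
    return theta * math.cos(x)

def local_linear_log_density(delta, x_prev, x_next, theta):
    """Log transition density over time delta of the linear diffusion
    dX = (b + a X) dt + dW started at x_prev, where a = A'(x_prev) and
    b = A(x_prev) - a * x_prev, so that a * x_prev + b = A(x_prev)."""
    a = drift_dx(x_prev, theta)
    drift_here = drift(x_prev, theta)
    if abs(a * delta) < 1e-8:
        # Limit a -> 0: Brownian motion with constant drift.
        mean = x_prev + drift_here * delta
        var = delta
    else:
        mean = x_prev + math.expm1(a * delta) / a * drift_here
        var = math.expm1(2.0 * a * delta) / (2.0 * a)
    resid = x_next - mean
    return -0.5 * math.log(2.0 * math.pi * var) - resid * resid / (2.0 * var)

def local_linear_log_likelihood(samples, theta, delta):
    """Sum of the local linear log transition densities over consecutive samples."""
    return sum(
        local_linear_log_density(delta, x0, x1, theta)
        for x0, x1 in zip(samples, samples[1:])
    )
```

Using `math.expm1` keeps the mean and variance accurate when `a * delta` is small, which is the regime of interest as the sampling interval shrinks.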
The main goal of the paper is to prove Theorem 3.1, which implies that the local linear approximations (1.12) are efficient for the purpose of deriving the MLE with discrete samples.
The paper is organized as follows. In Section 2, we present the MLE for linear models such as (1.12). In Section 3, we state our main result for Shoji's local linear approximation, and give some comments about the conditions on the sampling data. Our main theorem provides a deterministic convergence rate for the likelihood functions. In Section 4, we prove that the likelihood function for the local linear approximation converges to the Cameron-Martin density, but only in probability. Sections 5, 6, and 7 are devoted to the proof of our main result. In Section 5, we state the main tool, a representation formula for diffusions, established by Qian and Zheng [8]. In Section 6, we develop the main technical estimates in order to prove Theorem 3.1, whose proof is completed in Section 7. Section 8, which concludes the paper, contains a discussion of the Euler-Maruyama approximation.

Linear Diffusions
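Let us begin with the MLE of the parameters $a$, $b$, and $\sigma > 0$ for the linear diffusion model (Mishra and Bishwal [9] discussed a similar model)

$$dX_t = (b - aX_t)\,dt + \sigma\,dW_t, \tag{2.1}$$

whose finite-dimensional distributions are Gaussian, determined through the probability transition function $h(t, x, y)$. Fortunately we have an explicit formula for $h$. Indeed the linear equation (2.1) may be solved explicitly and its solution is given by the formula

$$X_t = e^{-at}X_0 + \frac{b}{a}\left(1 - e^{-at}\right) + \sigma\int_0^t e^{-a(t-s)}\,dW_s \tag{2.2}$$

(formula (6.8) of Karatzas and Shreve [10], page 354), and therefore

$$h(t, x, y) = \sqrt{\frac{a}{\pi\sigma^2\left(1 - e^{-2at}\right)}}\,\exp\left\{-\frac{a\left|y - e^{-at}x - (b/a)\left(1 - e^{-at}\right)\right|^2}{\sigma^2\left(1 - e^{-2at}\right)}\right\}.$$

Suppose we have a discrete sample observed at equally spaced times during the period $[0, T]$, namely $X_{iT/n}$, $i = 0, \dots, n$. According to the Markov property, their joint density, that is, the likelihood function, is

$$L(a, b, \sigma; x_0, \dots, x_n) = \mu(x_0)\prod_{i=1}^n h(\delta, x_{i-1}, x_i),$$

where $\delta = T/n$, and $\mu(x)$ is the probability density function of the initial distribution. Therefore the logarithm of the likelihood function is

$$l(a, b, \sigma; x_0, \dots, x_n) = \log\mu(x_0) + \sum_{i=1}^n \log h(\delta, x_{i-1}, x_i). \tag{2.5}$$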
The maximum likelihood estimates for $a$, $b$, and $\sigma$ are the stationary points of $l$, that is, solutions to the equation $\nabla l = 0$. Set $\rho = e^{-aT/n}$. Then $a = -(n/T)\log\rho$ and $\beta = b/a$.
As an interesting consequence we have the following.
Corollary 2.2. The maximum likelihood estimators $\hat a$, $\hat b$, $\hat\sigma$ for the linear diffusion model (2.1) are not sufficient statistics, while $(\hat a, \hat b, \hat\sigma, X_0, X_n)$ are sufficient.
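To make the role of $\rho = e^{-aT/n}$ and $\beta = b/a$ concrete, here is a minimal sketch of a conditional maximum likelihood fit. It rests on assumptions that are not taken from the paper: the model is written as $dX_t = (b - aX_t)\,dt + \sigma\,dW_t$, the initial density $\mu(x_0)$ is ignored, and the fitted $\rho$ lies in $(0, 1)$. Under these assumptions the transitions form a Gaussian autoregression $x_i = \beta + \rho(x_{i-1} - \beta) + \varepsilon_i$, so the estimates reduce to a least-squares regression. The function names are hypothetical, and the result need not coincide with the formulas of Proposition 2.1.

```python
import math
import random

def linear_diffusion_mle(samples, delta):
    """Conditional MLE for dX = (b - a X) dt + sigma dW observed at spacing delta.
    Assumes the fitted rho lies in (0, 1); the initial density is ignored."""
    x_prev = samples[:-1]
    x_next = samples[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    s_xx = sum((u - mean_prev) ** 2 for u in x_prev)
    s_xy = sum((u - mean_prev) * (v - mean_next) for u, v in zip(x_prev, x_next))
    rho = s_xy / s_xx                       # estimate of exp(-a * delta)
    intercept = mean_next - rho * mean_prev
    beta = intercept / (1.0 - rho)          # estimate of b / a
    resid_var = sum((v - intercept - rho * u) ** 2 for u, v in zip(x_prev, x_next)) / n
    a = -math.log(rho) / delta              # a = -(n / T) * log(rho)
    sigma2 = 2.0 * a * resid_var / (1.0 - rho * rho)
    return {"a": a, "b": a * beta, "sigma": math.sqrt(sigma2), "rho": rho, "beta": beta}

def simulate_linear_diffusion(x0, a, b, sigma, delta, n, rng):
    """Exact sampling of the linear diffusion at spacing delta (requires a > 0)."""
    rho = math.exp(-a * delta)
    beta = b / a
    step_sd = sigma * math.sqrt((1.0 - rho * rho) / (2.0 * a))
    xs = [x0]
    for _ in range(n):
        xs.append(beta + rho * (xs[-1] - beta) + step_sd * rng.gauss(0.0, 1.0))
    return xs

rng = random.Random(0)
data = simulate_linear_diffusion(1.0, 2.0, 2.0, 0.5, 0.01, 5000, rng)
estimates = linear_diffusion_mle(data, 0.01)
```

The drift parameter `a` is typically the least precisely estimated of the three over a fixed observation window, since its accuracy depends on the total duration rather than on the number of samples.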

Diffusion Models
We consider the diffusion model (1.3). Our approach and our conclusions are applicable to multidimensional cases as long as the diffusion coefficients are constant. For simplicity, we only consider the one-dimensional case. The question is to estimate $\theta$ under a discrete observation $\{x_0, \dots, x_n\}$ over the time scale $\delta$ in the time interval $[0, T]$. Then, up to a constant factor, its likelihood function is

$$L(x_0, \dots, x_n) = \prod_{j=1}^n p(\delta, x_{j-1}, x_j), \tag{3.1}$$

where $p(t, z, y)$ is the transition probability density of $X_t$ (we have dropped the subscript $\theta$ for simplicity). The approximate likelihood function, proposed in [6], is given by

$$L_2(x_0, \dots, x_n) = \prod_{j=1}^n h_j(\delta, x_{j-1}, x_j), \tag{3.2}$$

where $h_j(t, x, y)$ is the transition density function of the linear diffusion model

$$dX_t = \left[A(x_{j-1}, \theta) + A'(x_{j-1}, \theta)\,(X_t - x_{j-1})\right]dt + dW_t, \tag{3.3}$$

which is the first-order approximation to (1.3).
In what follows we assume that $A$ has bounded first and second derivatives, with bounds given by some constant $C_0 > 0$ independent of the parameters $\theta$.
The main result of the paper is as follows.
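Theorem 3.1. Assume that $A'(\cdot, \theta)$ and $A''(\cdot, \theta)$ are bounded uniformly in $\theta$. Let $T > 0$ be a fixed time and $C > 0$ be a constant. Suppose $\{x^n_j\}_{j \le n}$ ($n = 1, 2, \dots$) is a family of discrete samples such that

$$|x^n_j - x^n_{j-1}|^2 \le C\delta_n, \qquad |x^n_j| \le C \tag{3.5}$$

for all pairs $(j, n)$ such that $j \le n$, $n = 1, 2, \dots$, where $\delta_n = T/n$. Then

$$\lim_{n \to \infty}\frac{L(x^n_0, \dots, x^n_n)}{L_2(x^n_0, \dots, x^n_n)} = 1, \tag{3.6}$$

where $L$ and $L_2$ are defined in (3.1) and (3.2) with $\delta = \delta_n = T/n$.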
The convergence in (3.6) happens in a deterministic sense, and therefore conditions such as those imposed in Theorem 3.1 are natural: on average we should have $|x^n_j - x^n_{j-1}|^2 \le C\delta_n$. Since $X_t$ has continuous sample paths, $\{X_t(\omega) : t \in [0, T]\}$ is bounded for a fixed sample point $\omega$. Since the $x^n_j$ are sampled from the fixed duration $[0, T]$, we can assume that $\{x^n_j\}$ is bounded, though here we have countably many samples. It is possible to relax this constraint; for example, we may impose that $|x^n_j| \le Cn^\alpha$ with $\alpha < 1/2$, but for simplicity we only consider the bounded case. This condition is placed as a kind of "integrability" condition on the samples.
From the asymptotics of the transition density function $p(t, x, y)$, it is easy to see that

$$\lim_{n \to \infty}\frac{p(\delta_n, x_{j-1}, x_j)}{h_j(\delta_n, x_{j-1}, x_j)} = 1 \tag{3.8}$$

for each $j$, while, as our observation $\{x_0, \dots, x_n\}$ happens over a fixed time interval $[0, T]$, the ratio in (3.6) as $n \to \infty$ is really an infinite product, and its behavior thus depends on the global behavior of $p(t, x, y)$. Although there are many results about bounds of $p(t, x, y)$ in the literature (see, e.g., [2, 11]), the best we could find are those which yield (3.8) uniformly in $x_j$; none of them yields the precise limit (3.6). In fact, the proof of (3.6) depends on careful estimates on $p(t, x, y)$ through a representation formula established in [8].

Linear Diffusion Approximations
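Without loss of generality, we may assume that $T = 1$. Let $X_{j/n}$ be a discrete observation of the diffusion model (1.3) at $t_j = j/n$ ($j = 0, \dots, n$). For simplicity, write $X_{j/n}$ as $X_j$ if no confusion may arise. Consider the family of linear diffusions

$$d\tilde X_t = (a_j\tilde X_t + b_j)\,dt + dW_t, \qquad t_{j-1} \le t < t_j, \qquad \tilde X_{t_{j-1}} = X_{j-1},$$

where

$$a_j = A'(X_{j-1}, \theta), \qquad b_j = A(X_{j-1}, \theta) - a_jX_{j-1}. \tag{4.2}$$

Then the transition density of the $j$th linear diffusion is

$$h_j(\delta, x, y) = \sqrt{\frac{a_j}{\pi\left(e^{2a_j\delta} - 1\right)}}\,\exp\left\{-\frac{a_j}{e^{2a_j\delta} - 1}\left|y - e^{a_j\delta}x - \frac{e^{a_j\delta} - 1}{a_j}\,b_j\right|^2\right\},$$

where $\delta = 1/n$. The approximating likelihood function is

$$L_2(\delta) = \prod_{j=1}^n h_j(\delta, X_{j-1}, X_j).$$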

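We need to compare this function to the likelihood function with continuous observation, the Cameron-Martin density, which, however, should be discounted with respect to the Wiener measure. Thus we have to renormalize $L_2(\delta)$ against the discrete version of Brownian motion, which is given by

$$\prod_{j=1}^n G(\delta, X_{j-1}, X_j),$$

where $G(t, x, y) = (1/\sqrt{2\pi t})\exp\{-|x - y|^2/2t\}$. Hence its logarithm is

$$\sum_{j=1}^n\left[\log h_j(\delta, X_{j-1}, X_j) - \log G(\delta, X_{j-1}, X_j)\right],$$

which converges in probability, as $n \to \infty$, to the logarithm of the Cameron-Martin density (1.4).

Proof. Let $D_j = X_j - X_{j-1}$. Then (4.10) holds. Since $b_j = A(X_{j-1}, \theta) - a_jX_{j-1}$ and $a_j = A'(X_{j-1}, \theta)$, we obtain (4.11). However, the limit (4.12) holds in probability. The claim thus follows immediately.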
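The comparison made in this section can be explored numerically. The sketch below is only an illustration under stated assumptions: the drift `theta * sin(x)` is hypothetical, the path is generated by an Euler-Maruyama scheme as a stand-in for the diffusion, and the Cameron-Martin log-density is replaced by its left-point Itô-Riemann sum. It computes the renormalized log-likelihood $\sum_j[\log h_j - \log G]$ and the discretized value of $\int_0^1 A\,dX - \frac{1}{2}\int_0^1 A^2\,dt$; for large $n$ one would expect the two numbers to be close.

```python
import math
import random

def drift(x, theta):
    # Hypothetical drift A(x, theta) = theta * sin(x); an assumption for illustration.
    return theta * math.sin(x)

def drift_dx(x, theta):
    return theta * math.cos(x)

def log_gauss(resid, var):
    """Log of the centered Gaussian density with variance var at resid."""
    return -0.5 * math.log(2.0 * math.pi * var) - resid * resid / (2.0 * var)

def renormalized_log_likelihood(samples, theta, delta):
    """Sum over j of log h_j(delta, X_{j-1}, X_j) - log G(delta, X_{j-1}, X_j)."""
    total = 0.0
    for x_prev, x_next in zip(samples, samples[1:]):
        a = drift_dx(x_prev, theta)
        if abs(a * delta) < 1e-8:
            # Limit a -> 0 of the linear diffusion: constant drift, variance delta.
            mean_shift, var = drift(x_prev, theta) * delta, delta
        else:
            mean_shift = math.expm1(a * delta) / a * drift(x_prev, theta)
            var = math.expm1(2.0 * a * delta) / (2.0 * a)
        d = x_next - x_prev
        total += log_gauss(d - mean_shift, var) - log_gauss(d, delta)
    return total

def discretized_cameron_martin(samples, theta, delta):
    """Left-point sum for int A dX - (1/2) int A^2 dt."""
    total = 0.0
    for x_prev, x_next in zip(samples, samples[1:]):
        drift_here = drift(x_prev, theta)
        total += drift_here * (x_next - x_prev) - 0.5 * drift_here ** 2 * delta
    return total

def simulate(x0, theta, n, rng):
    """Euler-Maruyama proxy for the diffusion on [0, 1] (an assumption of this sketch)."""
    delta = 1.0 / n
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] + drift(xs[-1], theta) * delta + math.sqrt(delta) * rng.gauss(0.0, 1.0))
    return xs

rng = random.Random(1)
n = 1000
samples = simulate(0.5, 1.0, n, rng)
gap = (renormalized_log_likelihood(samples, 1.0, 1.0 / n)
       - discretized_cameron_martin(samples, 1.0, 1.0 / n))
```

Here `gap` is random, in line with the statement that the convergence holds only in probability; repeating the experiment with different seeds and larger `n` gives a feel for how it shrinks.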

A Representation Formula
From this section on, we develop the necessary estimates in order to prove Theorem 3.1. In this section, we recall the main tool in our proof, a representation formula proved by Qian and Zheng [8]. Based on this formula, we prove the main estimate (6.65), which has independent interest, in the next section. We conclude the proof of Theorem 3.1 in Section 7.

Let $x \in \mathbb{R}$. Consider the linear diffusion

$$dX_t = (aX_t + b)\,dt + dW_t, \qquad X_0 = x. \tag{5.1}$$

Our main tool is a representation formula (5.7) discovered in [8]. Let $(X_t, P^x)$ be the solution to the linear stochastic differential equation (5.1).

Proposition 5.2. For $x, y \in \mathbb{R}$ and $T > 0$ one has

$$\frac{p(T, x, y)}{h(T, x, y)} = 1 + \cdots, \tag{5.7}$$

where the remainder is a double integral involving a process $U_t$ which is a martingale under the probability $P^x$.
To prove (3.6), we need to estimate the double integral appearing on the right-hand side of (5.7), which requires a precise estimate for the ratio $\nabla h(T - t, X_t, y)/h(T, x, y)$; this can be achieved since we know the precise form of $h(T, x, y)$. Of course, if we knew the joint distribution of $(U_t, X_t)$, our task would be easy, but unfortunately this is rarely the case.
Our arguments are based on the fact that $U_t$ is a martingale under $P^x$, together with some delicate estimates for the functional integral

$$P^x\left(\left|\frac{\nabla h(T - t, X_t, y)}{h(T, x, y)}\right|^p\right), \tag{5.10}$$

which will be done in the next section.

Main Estimates
We use the notations established in the previous section. Let $T > 0$, $x, y \in \mathbb{R}$, and $d = y - x$. Then (6.2) holds, where

$$S_T = y - x - \sigma(a, T)^2(ax + b). \tag{6.3}$$
For $t \in [0, T]$ and $p > 1$ we fix some shorthand notation for simplicity.

Lemma 6.1. For any $p > 1$ one has a two-sided inequality valid for all $t \in [0, T]$.

Proof. The two inequalities follow from the fact that

$$\frac{e^{2aT} - 1}{p\left(e^{2aT} - 1\right) - (p - 1)\left(e^{2a(T-t)} - 1\right)} \tag{6.6}$$

assumes its maximum 1 and minimum $1/p$.
In what follows, we always assume that $T > 0$ is chosen such that the condition (6.23) is satisfied. Next we estimate $D_p(t)$, which is provided in the following.

Lemma 6.5. Let $p > 1$. Then $D_p(t)$ satisfies a bound in which the positive constant $C_1$ depends only on $p$, $\zeta$, and $C_0$.

Proof. The bound is obtained from the Hölder inequality (6.27), followed by an estimate of the resulting expectation (6.31).
Lemma 6.6. Let $T > 0$ satisfy condition (6.23), $x, y \in \mathbb{R}$, and $p > 1$ and $q > 1$ such that $1/p + 1/q = 1$. Then

$$\frac{p(T, x, y)}{h(T, x, y)} \le 1 + \cdots. \tag{6.32}$$

Proof. Since the process in question is a martingale under $P^x$, we obtain (6.34). By the Hölder inequality we deduce (6.35). Equation (6.32) follows from the representation (6.9).

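The estimate (6.36) is proved as follows.

Proof. We have

$$(\cdots)^p \times \exp\left\{-\frac{pa}{e^{2a(T-t)} - 1}\left|y - e^{a(T-t)}X_t - \frac{e^{a(T-t)} - 1}{a}\,b\right|^2\right\}. \tag{6.37}$$

Under the probability $P^x$, this can be expressed in terms of $Z_t$ and $d = y - x$ as in (6.40). Make the change of variable

$$N = \sqrt{\frac{2a}{e^{2aT} - e^{2a(T-t)}}}\;Z_t. \tag{6.41}$$

Then, under $P^x$, $N$ has the standard normal distribution $N(0, 1)$, so that (6.42) holds.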

Let us simplify the last integral. Indeed, introduce the notation (6.43). Then we rewrite the term appearing in the exponential in the last line of (6.42). By (6.45), the inequality (6.42) may be rewritten as (6.48). Thus (6.46) yields (6.50). Therefore (6.60) holds. In particular, we obtain a bound for the integral of $D_p(t)\,dt$.

Proof of Theorem 3.1
We are now in a position to prove Theorem 3.1. We may assume that $T = 1$, so that $\delta_n = 1/n$. Let $x^n_j$ ($j = 0, 1, \dots, n$) be discrete samplings with time scale $\delta = \delta_n = 1/n$ on $[0, 1]$. By our assumptions, $|x^n_j - x^n_{j-1}|^2 \le C\delta_n$ and $|x^n_j| \le C$ for all pairs $(j, n)$ such that $0 \le j \le n$ and $n \ge 1$. For simplicity we write $x_j$ for $x^n_j$ if no confusion may arise. In the proof below, we will use $C_i$ to denote nonnegative constants which may depend on $C$, $T = 1$, and the bounds of $A'$ and $A''$ appearing in our diffusion model (1.3), but are independent of $n$.

Recall that $h_j(t, x, y)$ is the probability transition density function of the diffusion (3.3), that is,

$$dX_t = (b_j + a_jX_t)\,dt + dW_t, \tag{7.1}$$

where $b_j = A(x_{j-1}, \theta) - x_{j-1}A'(x_{j-1}, \theta)$ and $a_j = A'(x_{j-1}, \theta)$.

The Euler-Maruyama Approximation
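Recall that the Euler-Maruyama approximation to (1.3) is a Markov chain given by

$$X_j = X_{j-1} + A(\theta, X_{j-1})\,\delta + \xi_j\sqrt{\delta}, \tag{8.1}$$

where $\{\xi_j\}$ is an i.i.d. random sequence with standard normal distribution $N(0, 1)$. The conditional distribution of $X_j$ given $X_{j-1} = x_{j-1}$ is Gaussian with mean $x_{j-1} + A(\theta, x_{j-1})\delta$ and variance $\delta$, so that the likelihood function is given as

$$L_1(x_0, \dots, x_n) = \prod_{j=1}^n \frac{1}{\sqrt{2\pi\delta}}\exp\left\{-\frac{\left|x_j - x_{j-1} - A(\theta, x_{j-1})\delta\right|^2}{2\delta}\right\}.$$

From this we may deduce an estimate which holds uniformly in $\theta$, in probability with respect to the Wiener measure, where $l$ is the log of the Cameron-Martin density (1.4).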

According to (6.65) we have

$$\frac{p(\delta, x_{j-1}, x_j)}{h_j(\delta, x_{j-1}, x_j)} \le 1 + C_8\,e^{|a_j|\delta}\cdots. \tag{7.4}$$

Since $a_j$ and $x_j$ are bounded, $|a_jx_{j-1} + b_j| = |A(x_{j-1}, \theta)| \le C_4(1 + |x_{j-1}|)$. Combining these estimates, the proof of Theorem 3.1 is complete.

Proposition 2.1. The maximum likelihood estimates for the linear diffusion model (2.1) with discrete observations are given by (2.6).

For the linear diffusion of Section 5, $X_t - x - \sigma(a, t)^2(ax + b) = \int_0^t e^{a(t-s)}\,dW_s$.