1. Introduction
The conditional least squares (CL) estimator is one of the most fundamental estimators for financial time series models. It has two advantages: it can be calculated with ease, and it requires no knowledge of the innovation process (i.e., the error term). Hence this convenient estimator has been widely used for many financial time series models. However, Amano and Taniguchi [1] proved that it is not efficient for the ARCH model, which is the most famous financial time series model.
The estimating function estimator was introduced by Godambe ([2, 3]) and Hansen [4]. Recently, Chandra and Taniguchi [5] constructed the optimal estimating function estimator (G estimator), based on Godambe's asymptotically optimal estimating function, for the parameters of the random coefficient autoregressive (RCA) model, which was introduced to describe occasional sharp spikes exhibited in many fields, and of the ARCH model. In Chandra and Taniguchi [5], it was shown by simulation that the G estimator is better than the CL estimator. Furthermore, Amano [6] applied the CL and G estimators to some important time series models (RCA, GARCH, and nonlinear AR models) and proved theoretically that the G estimator is better than the CL estimator in the sense of efficiency. Amano [6] also derived conditions, which are natural and not strict, under which the G estimator becomes asymptotically optimal.
However, in Amano [6], the G estimator was not applied to the conditional heteroscedastic autoregressive nonlinear (CHARN) model. The CHARN model was proposed by Härdle and Tsybakov [7] and Härdle et al. [8]; it includes many financial time series models and is widely used in finance. Kanai et al. [9] applied the G estimator to the CHARN model and proved its asymptotic normality. However, Kanai et al. [9] did not compare the efficiencies of the CL and G estimators, nor did they discuss the asymptotic optimality of the G estimator theoretically. Since the CHARN model is an important and rich model, which includes many financial time series models and can serve as a return process of assets, further investigation of the CL and G estimators for this model is needed. Hence, in this paper, we compare the efficiencies of the CL and G estimators and investigate the asymptotic optimality of the G estimator for this model.
This paper is organized as follows. Section 2 describes the definitions of the CL and G estimators. In Section 3, the CL and G estimators are applied to the CHARN model, and the efficiencies of these estimators are compared. Furthermore, we derive the condition for the asymptotic optimality of the G estimator. We also compare the mean squared errors of θ^CL and θ^G by simulation in Section 4. Proofs of the theorems are relegated to Section 5. Throughout this paper, we use the following notation: ‖A‖ denotes the sum of the absolute values of all entries of a matrix A.
2. Definitions of CL and G Estimators
One of the most fundamental estimators for the parameters of financial time series models is the conditional least squares (CL) estimator θ^CL introduced by Tjøstheim [10], and it has been widely used in finance. For a time series model {Xt}, θ^CL is obtained by minimizing the penalty function
(2.1) Q_n(θ) ≡ ∑_{t=m+1}^{n} (X_t − E[X_t ∣ F_t^{(m)}])²,
where F_t^{(m)} is the σ-algebra generated by {X_s : t−m ≤ s ≤ t−1}, and m is an appropriate positive integer (e.g., if {X_t} follows a kth-order nonlinear autoregressive model, we can take m = k). The CL estimator generally has a simple expression. However, it is not asymptotically optimal in general (see Amano and Taniguchi [1]).
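To make the definition concrete, the following sketch (ours, not from the paper) computes θ^CL for a first-order linear AR model, where E[X_t ∣ F_t^{(1)}] = θX_{t−1} and the minimizer of Q_n(θ) has a closed form; all function names are illustrative.

```python
import random

def simulate_ar1(theta, n, seed=0):
    """Simulate X_t = theta * X_{t-1} + u_t with i.i.d. standard normal u_t."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(theta * x[-1] + rng.gauss(0.0, 1.0))
    return x

def cl_estimate_ar1(x):
    """CL estimator: minimize Q_n(theta) = sum_t (X_t - theta * X_{t-1})^2.

    Setting dQ_n/dtheta = 0 gives the closed form
    theta_hat = sum X_{t-1} X_t / sum X_{t-1}^2.
    """
    num = sum(x[t - 1] * x[t] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den
```

For a stationary AR(1) (|θ| < 1) this estimator is consistent, which a short simulated path confirms.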
Hence, Chandra and Taniguchi [5] constructed the G estimator θ^G based on Godambe's asymptotically optimal estimating function for the RCA and ARCH models. For the definition of θ^G, we prepare the following estimating function G(θ). Let {X_t} be a stochastic process depending on the k-dimensional parameter θ_0; then G(θ) is given by
(2.2) G(θ) = ∑_{t=1}^{n} a_{t−1} h_t,
where a_{t−1} is a k-dimensional vector depending on X_1, …, X_{t−1} and θ, h_t = X_t − E[X_t ∣ F_{t−1}], and F_{t−1} is the σ-field generated by {X_s, s ≤ t−1}. The estimating function estimator θ^E for the parameter θ_0 is defined as the solution of
(2.3) G(θ^E) = 0.
Chandra and Taniguchi [5] derived the asymptotic variance of √n(θ^E − θ_0) as
(2.4) ((1/n) E[(∂/∂θ′)G(θ_0)])^{−1} ((1/n) E[G(θ_0) G′(θ_0)]) ((1/n) E[(∂/∂θ′)G(θ_0)]′)^{−1}
and gave the following lemma by extending the result of Godambe [3].
Lemma 2.1.
The asymptotic variance (2.4) is minimized if G(θ) = G*(θ), where
(2.5) G*(θ) = ∑_{t=1}^{n} a*_{θ,t−1} h_t,  a*_{θ,t−1} = E[∂h_t/∂θ ∣ F_{t−1}] E[h_t² ∣ F_{t−1}]^{−1}.
Based on the estimating function G*(θ) in Lemma 2.1, Chandra and Taniguchi [5] constructed the G estimator θ^G for the parameters of the RCA and ARCH models and showed by simulation that θ^G is better than θ^CL. Furthermore, Amano [6] applied θ^G to some important financial time series models (RCA, GARCH, and nonlinear AR models) and showed theoretically that θ^G is better than θ^CL in the sense of efficiency. Amano [6] also derived conditions under which θ^G becomes asymptotically optimal. However, in Amano [6], θ^CL and θ^G were not applied to the CHARN model, which includes many important financial time series models. Hence, in the next section, we apply θ^CL and θ^G to this model and prove that θ^G is better than θ^CL in the sense of efficiency for this model. Furthermore, conditions for the asymptotic optimality of θ^G are also derived.
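As an illustration of Lemma 2.1 in the RCA setting of Chandra and Taniguchi [5], consider a first-order RCA model X_t = (θ + b_t)X_{t−1} + u_t with b_t ~ N(0, σ_b²). Here h_t = X_t − θX_{t−1}, ∂h_t/∂θ = −X_{t−1}, and E[h_t² ∣ F_{t−1}] = σ_b²X_{t−1}² + 1, so the optimal weight a*_{θ,t−1} is available in closed form and the root of G*(θ) = 0 is a weighted least squares estimate. The sketch below is ours and assumes σ_b is known.

```python
import random

def simulate_rca1(theta, sig_b, n, seed=1):
    """RCA(1): X_t = (theta + b_t) X_{t-1} + u_t, b_t ~ N(0, sig_b^2), u_t ~ N(0, 1)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        b = rng.gauss(0.0, sig_b)
        x.append((theta + b) * x[-1] + rng.gauss(0.0, 1.0))
    return x

def g_estimate_rca1(x, sig_b):
    """Solve sum_t a*_{theta,t-1} h_t = 0 with h_t = X_t - theta X_{t-1} and
    E[h_t^2 | F_{t-1}] = sig_b^2 X_{t-1}^2 + 1 (the optimal weight of Lemma 2.1)."""
    num = den = 0.0
    for t in range(1, len(x)):
        w = 1.0 / (sig_b ** 2 * x[t - 1] ** 2 + 1.0)
        num += w * x[t - 1] * x[t]
        den += w * x[t - 1] ** 2
    return num / den
```

Because the conditional variance grows with X_{t−1}², the optimal weight downweights observations following large values of the process, which is exactly where the CL estimating equation is noisiest.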
3. CL and G Estimators for CHARN Model
In this section, we discuss the asymptotics of θ^CL and θ^G for the CHARN model.
The CHARN model of order m is defined as
(3.1) X_t = F_θ(X_{t−1}, …, X_{t−m}) + H_θ(X_{t−1}, …, X_{t−m}) u_t,
where F_θ, H_θ : R^m → R are measurable functions, and {u_t} is a sequence of i.i.d. random variables with E[u_t] = 0, E[u_t²] = 1, independent of {X_s; s < t}. Here, the parameter vector θ = (θ_1, …, θ_k)′ is assumed to lie in an open set Θ ⊂ R^k. Its true value is denoted by θ_0.
First, we estimate the true parameter θ_0 of (3.1) by means of θ^CL, which is obtained by minimizing the penalty function
(3.2) Q_n(θ) = ∑_{t=m+1}^{n} (X_t − E[X_t ∣ F_t^{(m)}])² = ∑_{t=m+1}^{n} (X_t − F_θ)².
For the asymptotics of θ^CL, we impose the following assumptions.
Assumption 3.1.
(1) u_t has a probability density function f(u) > 0 a.e. u ∈ R.
(2) There exist constants a_i ≥ 0, b_i ≥ 0, 1 ≤ i ≤ m, such that, for x ∈ R^m with ‖x‖ → ∞,
(3.3) |F_θ(x)| ≤ ∑_{i=1}^{m} a_i |x_i| + o(‖x‖),  |H_θ(x)| ≤ ∑_{i=1}^{m} b_i |x_i| + o(‖x‖).
(3) H_θ(x) is continuous and symmetric on R^m, and there exists a positive constant λ such that
(3.4) H_θ(x) ≥ λ for all x ∈ R^m.
(4) The constants above satisfy
(3.5) ∑_{i=1}^{m} a_i + E|u_1| ∑_{i=1}^{m} b_i < 1.
Assumption 3.1 makes {X_t} strictly stationary and ergodic (see [11]). We further impose the following.
Assumption 3.2.
Consider the following:
(3.6) E_θ[F_θ(X_{t−1}, …, X_{t−m})²] < ∞,  E_θ[H_θ(X_{t−1}, …, X_{t−m})²] < ∞, for all θ ∈ Θ.
Assumption 3.3.
(1) F_θ and H_θ are almost surely twice continuously differentiable in Θ, and their derivatives ∂F_θ/∂θ_j and ∂H_θ/∂θ_j, j = 1, …, k, satisfy the condition that there exist square-integrable functions A_j and B_j such that
(3.7) |∂F_θ/∂θ_j| ≤ A_j,  |∂H_θ/∂θ_j| ≤ B_j,
for all θ ∈ Θ.
(2) f(u) satisfies
(3.8) lim_{|u|→∞} u f(u) = 0,  ∫ u² f(u) du = 1.
(3) The continuous derivative f′(u) ≡ ∂f(u)/∂u exists on R and satisfies
(3.9) ∫ (f′/f)⁴ f(u) du < ∞,  ∫ u² (f′/f)² f(u) du < ∞.
From Tjøstheim [10], the following lemma holds.
Lemma 3.4.
Under Assumptions 3.1, 3.2, and 3.3, θ^CL has the following asymptotic normality:
(3.10) √n(θ^CL − θ_0) →_d N(0, U^{−1} W U^{−1}),
where
(3.11) W = E[H_{θ_0}² (∂/∂θ)F_{θ_0} (∂/∂θ′)F_{θ_0}],  U = E[(∂/∂θ)F_{θ_0} (∂/∂θ′)F_{θ_0}].
Next, we apply θ^G to the CHARN model. From Lemma 2.1, θ^G is obtained by solving the equation
(3.12) ∑_{t=m+1}^{n} (1/H_θ²) (∂/∂θ)F_θ (X_t − F_θ) = 0.
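As a concrete scalar instance of (3.12), take a CHARN model with a bounded nonlinear mean function, X_t = θ g(X_{t−1}) + H(X_{t−1}) u_t, where g(x) = x/(1 + x²) and H(x) = √(1 + 0.5x²) are our illustrative choices (not from the paper) and H is treated as known. Then ∂F_θ/∂θ = g(X_{t−1}), so (3.12) is linear in θ and θ^G is a weighted least squares estimate:

```python
import math
import random

def g(x):
    """Bounded regression function g(x) = x / (1 + x^2)."""
    return x / (1.0 + x * x)

def simulate_charn(theta, n, seed=2):
    """CHARN: X_t = theta*g(X_{t-1}) + sqrt(1 + 0.5*X_{t-1}^2) * u_t."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        h = math.sqrt(1.0 + 0.5 * x[-1] ** 2)
        x.append(theta * g(x[-1]) + h * rng.gauss(0.0, 1.0))
    return x

def g_estimate_charn(x):
    """Solve (3.12): sum_t (1/H^2) (dF/dtheta) (X_t - F_theta) = 0 for theta."""
    num = den = 0.0
    for t in range(1, len(x)):
        w = 1.0 / (1.0 + 0.5 * x[t - 1] ** 2)   # 1 / H(X_{t-1})^2
        num += w * g(x[t - 1]) * x[t]
        den += w * g(x[t - 1]) ** 2
    return num / den
```

This specification satisfies the growth conditions of Assumption 3.1 (F is bounded and the tail slope of H times E|u_1| is below one), so the simulated process is stationary and the estimate is consistent.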
For the asymptotics of θ^G, we impose the following assumptions.
Assumption 3.5.
(1) For all θ ∈ Θ,
(3.13) E_θ‖a*_{θ,t−1}‖² < ∞.
(2) a*_{θ,t−1} is almost surely twice continuously differentiable in Θ, and for the derivatives ∂a*_{θ,t−1}/∂θ_j, j = 1, …, k, there exist square-integrable functions C_j such that
(3.14) |∂a*_{θ,t−1}/∂θ_j| ≤ C_j,
for all θ ∈ Θ.
(3) V = E[(1/H_{θ_0}²)(∂/∂θ)F_{θ_0}(∂/∂θ′)F_{θ_0}] is a k×k positive definite matrix and satisfies
(3.15) ‖V‖ < ∞.
(4) For θ ∈ B (a neighborhood of θ_0 in Θ), there exist integrable functions P_θ^{ijl}(X^{(t−1)}), Q_θ^{ijl}(X^{(t−1)}), and R_θ^{ijl}(X^{(t−1)}) such that
(3.16) |(∂²/∂θ_j∂θ_l)(a*_{θ,t−1})_i h_t| ≤ P_θ^{ijl}(X^{(t−1)}),  |(∂/∂θ_j)(a*_{θ,t−1})_i (∂/∂θ_l)h_t| ≤ Q_θ^{ijl}(X^{(t−1)}),  |(a*_{θ,t−1})_i (∂²/∂θ_j∂θ_l)h_t| ≤ R_θ^{ijl}(X^{(t−1)}),
for i, j, l = 1, …, k, where X^{(t−1)} = (X_1, …, X_{t−1}).
From Kanai et al. [9], the following lemma holds.
Lemma 3.6.
Under Assumptions 3.1, 3.2, 3.3, and 3.5, the following statement holds:
(3.17) √n(θ^G − θ_0) →_d N(0, V^{−1}).
Finally, we compare the efficiencies of θ^CL and θ^G. We give the following theorem.
Theorem 3.7.
Under Assumptions 3.1, 3.2, 3.3, and 3.5, the following inequality holds:
(3.18) U^{−1} W U^{−1} ≥ V^{−1},
and equality holds if and only if H_{θ_0} is constant or ∂F_{θ_0}/∂θ = 0 (for matrices A and B, A ≥ B means that A − B is positive semidefinite).
This theorem is proved by use of the Kholevo inequality (see Kholevo [12]). From this theorem, we can see that the asymptotic variance of θ^G is smaller than that of θ^CL, and that the condition under which these asymptotic variances coincide is strict. Therefore, θ^G is better than θ^CL in the sense of efficiency. Hence, we evaluate the condition under which θ^G is asymptotically optimal, based on local asymptotic normality (LAN). LAN is the concept of local asymptotic normality for the likelihood ratio of general statistical models, which was established by Le Cam [13]. Once LAN is established, the asymptotic optimality of estimators and tests can be described in terms of the LAN property. In particular, the Fisher information matrix Γ is described in terms of LAN, and the asymptotic variance of an estimator has the lower bound Γ^{−1}. Now, we prepare the following lemma, which is due to Kato et al. [14].
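In the scalar case (k = 1), the bound of Theorem 3.7 reduces to W·V ≥ U², i.e., E[H²(∂F/∂θ)²]·E[(∂F/∂θ)²/H²] ≥ (E[(∂F/∂θ)²])², which is a Cauchy–Schwarz inequality and therefore also holds exactly for the corresponding sample averages along any path. The sketch below (our illustration, reusing the toy specification g(x) = x/(1 + x²), H(x)² = 1 + 0.5x² as an assumption) checks this on a simulated path.

```python
import math
import random

def charn_path(theta, n, seed=3):
    """Simulate X_t = theta*g(X_{t-1}) + H(X_{t-1})*u_t,
    g(x) = x/(1+x^2), H(x) = sqrt(1 + 0.5*x^2)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        g = x[-1] / (1.0 + x[-1] ** 2)
        h = math.sqrt(1.0 + 0.5 * x[-1] ** 2)
        x.append(theta * g + h * rng.gauss(0.0, 1.0))
    return x

def sample_UWV(x):
    """Sample analogues of U = E[(dF/dtheta)^2], W = E[H^2 (dF/dtheta)^2],
    and V = E[(dF/dtheta)^2 / H^2] for the toy model above."""
    u = w = v = 0.0
    for t in range(1, len(x)):
        g = x[t - 1] / (1.0 + x[t - 1] ** 2)
        h2 = 1.0 + 0.5 * x[t - 1] ** 2
        u += g * g
        w += h2 * g * g
        v += g * g / h2
    m = len(x) - 1
    return u / m, w / m, v / m
```

On any such path W·V ≥ U², so the variance ordering U^{−1}WU^{−1} ≥ V^{−1} holds for the sample analogues as well; equality would require H to be constant, matching the equality condition of Theorem 3.7.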
Lemma 3.8.
Under Assumptions 3.1, 3.2, and 3.3, the CHARN model has the LAN property, and its Fisher information matrix Γ is
(3.19) Γ = E[(1/H_{θ_0}²) (∂H_{θ_0}/∂θ, ∂F_{θ_0}/∂θ) (a c; c b) (∂H_{θ_0}/∂θ′; ∂F_{θ_0}/∂θ′)],
where
(3.20) a_t = u_t f′(u_t)/f(u_t) + 1, b_t = f′(u_t)/f(u_t), a = E[a_t²], b = E[b_t²], c = E[a_t b_t], and (a c; c b) denotes the 2×2 matrix with rows (a, c) and (c, b).
From this lemma, the asymptotic variance V^{−1} of θ^G has the lower bound Γ^{−1}; that is,
(3.21) V^{−1} ≥ Γ^{−1}.
The next theorem gives the condition under which V^{−1} equals Γ^{−1}, that is, under which θ^G becomes asymptotically optimal.
Theorem 3.9.
Under Assumptions 3.1, 3.2, 3.3, and 3.5, if ∂H_{θ_0}/∂θ = 0 and u_t is Gaussian, then θ^G is asymptotically optimal; that is,
(3.22) V^{−1} = Γ^{−1}.
Finally, we give the following example which satisfies the assumptions in Theorems 3.7 and 3.9.
Example 3.10.
The CHARN model includes the following nonlinear AR model:
(3.23) X_t = F_θ(X_{t−1}, …, X_{t−m}) + u_t,
where F_θ : R^m → R is a measurable function, {u_t} is a sequence of i.i.d. random variables with E[u_t] = 0, E[u_t²] = 1, independent of {X_s; s < t}, and Assumptions 3.1, 3.2, 3.3, and 3.5 are satisfied (for example, we may take F_θ = a_0 + a_1 X_{t−1}² + ⋯ + a_m X_{t−m}², where a_0 > 0, a_j ≥ 0, j = 1, …, m, ∑_{j=1}^{m} a_j < 1). In Amano [6], it was shown that the asymptotic variance of θ^CL attains that of θ^G. Amano [6] also showed that, under the condition that u_t is Gaussian, θ^G is asymptotically optimal.
5. Proofs
This section provides the proofs of the theorems. First, we prepare the following lemma, which is used to compare the asymptotic variances of the CL and G estimators (see Kholevo [12]).
Lemma 5.1.
We define ψ(ω) and ϕ(ω) as r×s and t×s random matrices, respectively, and h(ω) as a random variable that is positive everywhere. If the matrix
E[ϕϕ′/h]^{−1}
exists, then the following inequality holds:
(5.1) E[ψψ′h] ≥ E[ψϕ′] E[ϕϕ′/h]^{−1} E[ψϕ′]′.
The equality holds if and only if there exists a constant r×t matrix C such that
(5.2) hψ + Cϕ = 0 a.e.
Now we proceed to prove Theorem 3.7.
Proof of Theorem 3.7.
Let ψ = (∂/∂θ)F_{θ_0}, ϕ = (∂/∂θ)F_{θ_0}, and h = H_{θ_0}²; then, from the definitions of the matrices U, W, and V, we can write
(5.3) U = E[ψϕ′],  W = E[ψψ′h],  V = E[ϕϕ′/h].
Hence from Lemma 5.1, we can see that
(5.4) W ≥ U V^{−1} U.
From this inequality, we can see that
(5.5) U^{−1} W U^{−1} ≥ V^{−1}.
Proof of Theorem 3.9.
The Fisher information matrix Γ of the CHARN model based on LAN can be represented as
(5.6) Γ = E[(1/H_{θ_0}²) (∂H_{θ_0}/∂θ, ∂F_{θ_0}/∂θ) (a c; c b) (∂H_{θ_0}/∂θ′; ∂F_{θ_0}/∂θ′)]
= E[(1/H_{θ_0}²) (a ∂H_{θ_0}/∂θ ∂H_{θ_0}/∂θ′ + c ∂F_{θ_0}/∂θ ∂H_{θ_0}/∂θ′ + c ∂H_{θ_0}/∂θ ∂F_{θ_0}/∂θ′ + b ∂F_{θ_0}/∂θ ∂F_{θ_0}/∂θ′)]
= E[(1/H_{θ_0}²) (a ∂H_{θ_0}/∂θ ∂H_{θ_0}/∂θ′ + c ∂F_{θ_0}/∂θ ∂H_{θ_0}/∂θ′ + c ∂H_{θ_0}/∂θ ∂F_{θ_0}/∂θ′)] + E[(f′(u_t)/f(u_t))²] E[(1/H_{θ_0}²) ∂F_{θ_0}/∂θ ∂F_{θ_0}/∂θ′].
From (5.6), if ∂H_{θ_0}/∂θ = 0, then Γ becomes
(5.7) Γ = E[(f′(u_t)/f(u_t))²] E[(1/H_{θ_0}²) ∂F_{θ_0}/∂θ ∂F_{θ_0}/∂θ′].
Next, we show that, under Gaussianity of u_t, E[(f′(u_t)/f(u_t))²] = 1. From the Schwarz inequality, using E[u_t²] = 1, we obtain
(5.8) E[(f′(u_t)/f(u_t))²] = E[u_t²] E[(f′(u_t)/f(u_t))²] ≥ (E[u_t f′(u_t)/f(u_t)])² = (∫_{−∞}^{∞} x f′(x) dx)² = (−∫_{−∞}^{∞} f(x) dx)² = 1.
The equality holds if and only if there exists some constant c such that
(5.9) cx = f′(x)/f(x).
Equation (5.9) becomes, for some constant k,
(5.10) cx = (log f(x))′,  (c/2)x² + k = log f(x),  f(x) = e^{(c/2)x² + k} = e^k e^{(c/2)x²}.
Hence, c is −1 (so that f is integrable with E[u_t²] = 1), and f(x) becomes the density function of the standard normal distribution.
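The equality case in (5.8) can also be checked numerically: for the standard normal, f′(u)/f(u) = −u, so E[(f′/f)²] = E[u²] = 1, while for a heavier-tailed unit-variance Laplace density (scale 1/√2), f′(u)/f(u) = −√2·sign(u), whose mean square is 2, strictly above the bound. A minimal Monte Carlo sketch (ours):

```python
import math
import random

def mean_sq_score(score, draw, n=100000, seed=5):
    """Monte Carlo estimate of E[(f'(u)/f(u))^2] under i.i.d. draws from f."""
    rng = random.Random(seed)
    return sum(score(draw(rng)) ** 2 for _ in range(n)) / n

# Standard normal: score is -u, so the mean square is E[u^2] = 1 (equality case).
gauss_val = mean_sq_score(lambda u: -u, lambda r: r.gauss(0.0, 1.0))

# Unit-variance Laplace: |u| ~ Exp(sqrt(2)) with a random sign; score is
# -sqrt(2)*sign(u), so the mean square is exactly 2 > 1 (strict inequality).
laplace_val = mean_sq_score(
    lambda u: -math.sqrt(2.0) * math.copysign(1.0, u),
    lambda r: r.expovariate(math.sqrt(2.0)) * r.choice([-1.0, 1.0]),
)
```

Only the Gaussian attains the Schwarz lower bound, matching the conclusion of the proof that equality forces f to be the standard normal density.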