Joint Estimation Using Quadratic Estimating Function

A class of martingale estimating functions is convenient and plays an important role in inference for nonlinear time series models. However, when information about the first four conditional moments of the observed process is available, quadratic estimating functions are more informative. In this paper, a general framework for joint estimation of conditional mean and variance parameters in time series models using quadratic estimating functions is developed. The superiority of the approach is demonstrated by comparing the information associated with the optimal quadratic estimating function with the information associated with other estimating functions. The method is used to study the optimal quadratic estimating functions of the parameters of autoregressive conditional duration (ACD) models, random coefficient autoregressive (RCA) models, doubly stochastic models, and regression models with ARCH errors. Closed-form expressions for the information gain are also discussed in some detail.


Introduction
Godambe [1] was the first to study inference for discrete time stochastic processes using the estimating function method. Thavaneswaran and Abraham [2] studied nonlinear time series estimation problems using linear estimating functions. Naik-Nimbalkar and Rajashi [3] and Thavaneswaran and Heyde [4] studied filtering and prediction problems using linear estimating functions in the Bayesian context. Chandra and Taniguchi [5], Merkouris [6], and Ghahramani and Thavaneswaran [7], among others, have studied estimation problems using estimating functions. In this paper, we study the linear and quadratic martingale estimating functions and show that the quadratic estimating functions are more informative when the conditional mean and variance of the observed process depend on the same parameter of interest. This paper is organized as follows. The rest of Section 1 presents the basics of estimating functions and the information associated with estimating functions. Section 2 presents the general model for the multiparameter case and the form of the optimal quadratic estimating function. In Section 3, the theory is applied to four different models.
Suppose that $\{y_t, t = 1, \ldots, n\}$ is a realization of a discrete time stochastic process, and its distribution depends on a vector parameter $\theta$ belonging to an open subset $\Theta$ of the $p$-dimensional Euclidean space. Let $(\Omega, \mathcal{F}, P_\theta)$ denote the underlying probability space, and let $\mathcal{F}_t^y$ be the $\sigma$-field generated by $\{y_1, \ldots, y_t\}$, $t \ge 1$. Let $h_t = h_t(y_1, \ldots, y_t, \theta)$, $1 \le t \le n$, be specified $q$-dimensional vectors that are martingales. We consider the class $\mathcal{M}$ of zero mean and square integrable $p$-dimensional martingale estimating functions of the form
$$\mathcal{M} = \Big\{ g_n(\theta) : g_n(\theta) = \sum_{t=1}^{n} a_{t-1} h_t \Big\},$$
where $a_{t-1}$ are $p \times q$ matrices depending on $y_1, \ldots, y_{t-1}$, $1 \le t \le n$. The estimating functions $g_n(\theta)$ are further assumed to be almost surely differentiable with respect to the components of $\theta$ and such that $E[\partial g_n(\theta)/\partial\theta \mid \mathcal{F}_{n-1}^y]$ and $E[g_n(\theta) g_n(\theta)' \mid \mathcal{F}_{n-1}^y]$ are nonsingular for all $\theta \in \Theta$ and for each $n \ge 1$. The expectations are always taken with respect to $P_\theta$. Estimators of $\theta$ can be obtained by solving the estimating equation $g_n(\theta) = 0$. Furthermore, the $p \times p$ matrix $E[g_n(\theta) g_n(\theta)' \mid \mathcal{F}_{n-1}^y]$ is assumed to be positive definite for all $\theta \in \Theta$. Then, in the class $\mathcal{M}$ of all zero mean and square integrable martingale estimating functions, the optimal estimating function $g_n^*(\theta)$, which maximizes, in the partial order of nonnegative definite matrices, the information matrix
$$I_{g_n}(\theta) = \Big(E\Big[\frac{\partial g_n(\theta)}{\partial\theta} \,\Big|\, \mathcal{F}_{n-1}^y\Big]\Big)' \Big(E\big[g_n(\theta) g_n(\theta)' \mid \mathcal{F}_{n-1}^y\big]\Big)^{-1} \Big(E\Big[\frac{\partial g_n(\theta)}{\partial\theta} \,\Big|\, \mathcal{F}_{n-1}^y\Big]\Big),$$
is given by
$$g_n^*(\theta) = \sum_{t=1}^{n} a_{t-1}^* h_t = \sum_{t=1}^{n} \Big(E\Big[\frac{\partial h_t}{\partial\theta} \,\Big|\, \mathcal{F}_{t-1}^y\Big]\Big)' \Big(E\big[h_t h_t' \mid \mathcal{F}_{t-1}^y\big]\Big)^{-1} h_t.$$
This is a more general result in the sense that, for its validity, we do not need to assume that the true underlying distribution belongs to the exponential family of distributions. The maximum correlation between the optimal estimating function and the true unknown score justifies the terminology "quasi-score" for $g_n^*(\theta)$. Moreover, it follows from Lindsay [8, page 916] that if we solve an unbiased estimating equation $g_n(\theta) = 0$ to get an estimator, then the asymptotic variance of the resulting estimator is the inverse of the information $I_{g_n}$. Hence, the estimator obtained from a more informative estimating equation is asymptotically more efficient.
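As a minimal illustration of solving $g_n(\theta) = 0$, consider a hypothetical AR(1) process $y_t = \theta y_{t-1} + e_t$ with constant innovation variance. This is an assumed toy example, not one of the models treated in this paper. With $h_t = y_t - \theta y_{t-1}$, the optimal weight is $-y_{t-1}/\sigma^2$, so the quasi-score is linear in $\theta$ and its root is available in closed form. The function names, the Gaussian innovations, and the simulation settings are illustrative choices.

```python
import random

def simulate_ar1(theta, n, sigma=1.0, seed=0):
    """Simulate y_0 = 0, y_t = theta * y_{t-1} + e_t with Gaussian e_t (assumed)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(theta * y[-1] + rng.gauss(0.0, sigma))
    return y

def quasi_score(theta, y, sigma2=1.0):
    """g*_n(theta) = -sum_t (y_{t-1} / sigma^2) * (y_t - theta * y_{t-1})."""
    total = sum(y[t - 1] * (y[t] - theta * y[t - 1]) for t in range(1, len(y)))
    return -total / sigma2

def solve_quasi_score(y):
    """Root of the quasi-score; it is linear in theta, so no iteration is needed."""
    num = sum(y[t - 1] * y[t] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

y = simulate_ar1(theta=0.5, n=2000)
theta_hat = solve_quasi_score(y)
print(theta_hat, quasi_score(theta_hat, y))
```

In this toy case the root coincides with the conditional least squares estimate, because the conditional variance does not depend on the past; the two differ once $\sigma_t^2$ varies with $t$.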

General Model and Method
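Consider a discrete time stochastic process $\{y_t, t = 1, 2, \ldots\}$ with conditional moments
$$\mu_t(\theta) = E[y_t \mid \mathcal{F}_{t-1}^y], \qquad \sigma_t^2(\theta) = \mathrm{Var}(y_t \mid \mathcal{F}_{t-1}^y),$$
$$\gamma_t(\theta) = \frac{1}{\sigma_t^3(\theta)} E\big[(y_t - \mu_t(\theta))^3 \mid \mathcal{F}_{t-1}^y\big], \qquad \kappa_t(\theta) = \frac{1}{\sigma_t^4(\theta)} E\big[(y_t - \mu_t(\theta))^4 \mid \mathcal{F}_{t-1}^y\big] - 3. \qquad (2.1)$$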
That is, we assume that the skewness and the excess kurtosis of the standardized variable $y_t$ do not contain any additional parameters. In order to estimate the parameter $\theta$ based on the observations $y_1, \ldots$
(a) the optimal estimating function is given by $g$
When the conditional skewness $\gamma$ and kurtosis $\kappa$ are constants, the optimal quadratic estimating function and associated information, based on the martingale differences $m_t = y_t - \mu_t$ and $s_t = m_t^2 - \sigma_t^2$, are given by (2.10)
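The comparison of informations can be sketched numerically for a scalar parameter with constant $\gamma$ and $\kappa$. The code below uses $\langle m\rangle_t = \sigma_t^2$, $\langle s\rangle_t = \sigma_t^4(\kappa + 2)$, and $\langle m, s\rangle_t = \sigma_t^3\gamma$, which follow from the definitions of $m_t$, $s_t$, skewness, and excess kurtosis. The quadratic information is computed as the information from $m_t$ plus the information from the orthogonalized difference $\psi_t = s_t - \sigma_t\gamma m_t$ used in the proof of Theorem 2.1. The function names and the inputs (lists holding $\partial\mu_t/\partial\theta$, $\partial\sigma_t^2/\partial\theta$, and $\sigma_t^2$) are assumptions made for illustration, not the paper's notation.

```python
def info_linear(dmu, var):
    """Information based on m_t: sum_t (dmu_t)^2 / <m>_t, with <m>_t = var_t."""
    return sum(d * d / v for d, v in zip(dmu, var))

def info_squared(dvar, var, kappa):
    """Information based on s_t: sum_t (dvar_t)^2 / <s>_t, <s>_t = var_t^2 (kappa + 2)."""
    return sum(d * d / (v * v * (kappa + 2.0)) for d, v in zip(dvar, var))

def info_quadratic(dmu, dvar, var, gamma, kappa):
    """Information based on (m_t, psi_t), psi_t = s_t - sd_t * gamma * m_t.

    Since m_t and psi_t are orthogonal, their informations add up.
    """
    denom = kappa + 2.0 - gamma * gamma
    if denom <= 0.0:
        raise ValueError("need kappa + 2 > gamma^2")
    total = info_linear(dmu, var)
    for dm, dv, v in zip(dmu, dvar, var):
        sd = v ** 0.5
        # E[d psi_t / d theta | past] = sd * gamma * dmu_t - dvar_t
        total += (sd * gamma * dm - dv) ** 2 / (v * v * denom)
    return total

# Hypothetical derivative and variance sequences, chosen only for illustration.
dmu = [1.0, 0.5, 2.0]
dvar = [0.3, 1.2, 0.8]
var = [1.0, 2.0, 0.5]
print(info_linear(dmu, var),
      info_squared(dvar, var, kappa=1.0),
      info_quadratic(dmu, dvar, var, gamma=0.5, kappa=1.0))
```

When $\gamma = 0$, $m_t$ and $s_t$ are already orthogonal and the quadratic information reduces to the sum of the other two.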

Autoregressive Conditional Duration (ACD) Models
There is growing interest in the analysis of intraday financial data such as transaction and quote data. Such data have increasingly been made available by many stock exchanges. Unlike closing prices, which are measured daily, monthly, or yearly, intraday or high-frequency data tend to be irregularly spaced. Furthermore, the durations between events themselves are random variables. The autoregressive conditional duration (ACD) process due to Engle and Russell [10] was proposed to model such durations, in order to study the dynamic structure of the adjusted durations $x_i$, with $x_i = t_i - t_{i-1}$, where $t_i$ is the time of the $i$th transaction. The crucial assumption underlying the ACD model is that the time dependence is described by a function $\psi_i$, where $\psi_i$ is the conditional expectation of the adjusted duration between the $(i-1)$th and the $i$th trades. The basic ACD model is defined as
$$x_i = \psi_i \varepsilon_i, \qquad \psi_i = E[x_i \mid \mathcal{F}^x_{t_{i-1}}],$$
where $\varepsilon_i$ are iid nonnegative random variables with density function $f(\cdot)$ and unit mean, and $\mathcal{F}^x_{t_{i-1}}$ is the information available at the $(i-1)$th trade. We also assume that $\varepsilon_i$ is independent of $\mathcal{F}^x_{t_{i-1}}$. It is clear that the types of ACD models vary according to different distributions of $\varepsilon_i$ and specifications of $\psi_i$. In this paper, we will discuss a specific class of models, known as the ACD($p$, $q$) model and given by
$$x_t = \psi_t \varepsilon_t, \qquad \psi_t = \omega + \sum_{j=1}^{p} a_j x_{t-j} + \sum_{j=1}^{q} b_j \psi_{t-j},$$
where $\omega > 0$, $a_j > 0$, $b_j > 0$, and $\sum_{j=1}^{\max(p,q)} (a_j + b_j) < 1$. We assume that the $\varepsilon_t$ are iid nonnegative random variables with mean $\mu_\varepsilon$, variance $\sigma_\varepsilon^2$, skewness $\gamma_\varepsilon$, and excess kurtosis $\kappa_\varepsilon$. In order to estimate the parameter vector $\theta = (\omega, a_1, \ldots, a_p, b_1, \ldots, b_q)'$, we use the estimating function approach. For this model, the conditional moments are $\mu_t = \mu_\varepsilon \psi_t$, $\sigma_t^2 = \sigma_\varepsilon^2 \psi_t^2$, $\gamma_t = \gamma_\varepsilon$, and $\kappa_t = \kappa_\varepsilon$, and the information gain in using $g_Q^*(\theta)$ over $g_S^*(\theta)$ is
$$I_{g_Q^*}(\theta) - I_{g_S^*}(\theta) = \frac{\big((\kappa_\varepsilon + 2)\mu_\varepsilon - 2\sigma_\varepsilon\gamma_\varepsilon\big)^2}{\sigma_\varepsilon^2 (\kappa_\varepsilon + 2)(\kappa_\varepsilon + 2 - \gamma_\varepsilon^2)} \sum_{t=1}^{n} \frac{1}{\psi_t^2} \frac{\partial\psi_t}{\partial\theta} \frac{\partial\psi_t}{\partial\theta'}.$$
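A short numerical sketch may help here. Under the conditional moments $\mu_t = \mu_\varepsilon\psi_t$ and $\sigma_t^2 = \sigma_\varepsilon^2\psi_t^2$, the informations based on $m_t$, on $s_t$, and on both are each a scalar multiple of $\sum_t \psi_t^{-2}\,(\partial\psi_t/\partial\theta)(\partial\psi_t/\partial\theta')$. The code computes those three scalars and a closed-form gain of the quadratic over the $s_t$-based information. The scalars are derived here from $\langle m\rangle_t = \sigma_t^2$, $\langle s\rangle_t = \sigma_t^4(\kappa_\varepsilon + 2)$, and $\langle m, s\rangle_t = \sigma_t^3\gamma_\varepsilon$; the function names, the ACD(1, 1) restriction, and the start-up value for $\psi_t$ are assumptions for illustration.

```python
def acd_psi(x, omega, a, b):
    """Conditional durations psi_t for an ACD(1, 1) recursion.

    psi_0 is started at omega / (1 - a - b), the unconditional mean of x_t
    under unit-mean errors (an assumed initialisation).
    """
    psi = [omega / (1.0 - a - b)]
    for t in range(1, len(x)):
        psi.append(omega + a * x[t - 1] + b * psi[t - 1])
    return psi

def acd_info_scalars(mu_e, sigma2_e, gamma_e, kappa_e):
    """Scalars multiplying sum_t psi_t^{-2} dpsi_t dpsi_t' for (m_t, s_t, both)."""
    k2 = kappa_e + 2.0
    denom = k2 - gamma_e ** 2
    if denom <= 0.0:
        raise ValueError("need kappa + 2 > gamma^2")
    sd_e = sigma2_e ** 0.5
    c_m = mu_e ** 2 / sigma2_e
    c_s = 4.0 / k2
    c_q = c_m + (gamma_e * mu_e - 2.0 * sd_e) ** 2 / (sigma2_e * denom)
    return c_m, c_s, c_q

def acd_gain_over_s(mu_e, sigma2_e, gamma_e, kappa_e):
    """Closed-form scalar for the gain of the quadratic over the s_t-based information."""
    k2 = kappa_e + 2.0
    sd_e = sigma2_e ** 0.5
    return (k2 * mu_e - 2.0 * sd_e * gamma_e) ** 2 / (
        sigma2_e * k2 * (k2 - gamma_e ** 2))

# Unit exponential errors: mean 1, variance 1, skewness 2, excess kurtosis 6.
c_m, c_s, c_q = acd_info_scalars(1.0, 1.0, 2.0, 6.0)
print(c_m, c_s, c_q, acd_gain_over_s(1.0, 1.0, 2.0, 6.0))
print(acd_psi([1.2, 0.4, 2.0, 0.9], omega=0.1, a=0.2, b=0.7))
```

For unit exponential errors the quadratic scalar equals the linear one, so in that special case the squared differences add no information beyond $m_t$, while the gain over the $s_t$-based function is strictly positive.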

Random Coefficient Autoregressive Models
In this section, we will investigate the properties of the quadratic estimating functions for the random coefficient autoregressive (RCA) time series, which were first introduced by Nicholls and Quinn [11].
Consider the RCA model
$$y_t = (\theta + b_t) y_{t-1} + \varepsilon_t, \qquad (3.6)$$
where $\{b_t\}$ and $\{\varepsilon_t\}$ are uncorrelated zero mean processes with unknown variance $\sigma_b^2$ and variance $\sigma_\varepsilon^2 = \sigma_\varepsilon^2(\theta)$ with unknown parameter $\theta$, respectively. Further, we denote the skewness and excess kurtosis of $\{b_t\}$ by $\gamma_b$, $\kappa_b$, which are known, and of $\{\varepsilon_t\}$ by $\gamma_\varepsilon(\theta)$, $\kappa_\varepsilon(\theta)$, respectively. In the model (3.6), both the parameter $\theta$ and $\beta = \sigma_b^2$ need to be estimated. Let $\boldsymbol{\theta} = (\theta, \beta)'$; we will discuss the joint estimation of $\theta$ and $\beta$. In this model, the conditional mean is $\mu_t = y_{t-1}\theta$ and the conditional variance is $\sigma_t^2 = y_{t-1}^2\beta + \sigma_\varepsilon^2(\theta)$.
(3.14) implies that $I_{\mathrm{CLS}}(\theta) \le I_{g_M^*}(\theta)$. Hence the optimal estimating function is more informative than the conditional least squares one. The optimal quadratic estimating function based on the martingale differences $m_t$ and $s_t$ is given by (3.8) and (3.11), respectively. It is easy to see that the information of $g_Q^*(\theta)$ is larger than that of $g_M^*(\theta)$. Therefore, we can conclude that for the RCA model, $I_{\mathrm{CLS}}(\theta) \le I_{g_M^*}(\theta) \le I_{g_Q^*}(\theta)$, and hence the estimate obtained by solving the optimal quadratic estimating equation is more efficient than the CLS estimate and the estimate obtained by solving the optimal linear estimating equation.
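The comparison between the CLS and optimal linear estimating functions can be illustrated numerically. The sketch below simulates an RCA path with Gaussian $b_t$ and $\varepsilon_t$ and a constant $\sigma_\varepsilon^2$ (both simplifying assumptions), then evaluates sample analogues of the two informations for $\theta$: $(\sum_t y_{t-1}^2)^2 / \sum_t y_{t-1}^2\sigma_t^2$ for the CLS weights $y_{t-1}$, and $\sum_t y_{t-1}^2/\sigma_t^2$ for the optimal weights $y_{t-1}/\sigma_t^2$. These sample expressions and all function names are illustrative assumptions rather than the paper's exact definitions; the ordering between them follows from the Cauchy-Schwarz inequality.

```python
import random

def simulate_rca(theta, sigma2_b, sigma2_e, n, seed=1):
    """Simulate y_t = (theta + b_t) * y_{t-1} + e_t with Gaussian b_t, e_t (assumed)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        b = rng.gauss(0.0, sigma2_b ** 0.5)
        e = rng.gauss(0.0, sigma2_e ** 0.5)
        y.append((theta + b) * y[-1] + e)
    return y

def rca_informations(y, sigma2_b, sigma2_e):
    """Sample analogues of the CLS and optimal linear informations for theta."""
    lag2 = [v * v for v in y[:-1]]                     # y_{t-1}^2, t = 1..n
    cvar = [sigma2_b * q + sigma2_e for q in lag2]     # conditional variance sigma_t^2
    info_cls = sum(lag2) ** 2 / sum(q * v for q, v in zip(lag2, cvar))
    info_opt = sum(q / v for q, v in zip(lag2, cvar))
    return info_cls, info_opt

y = simulate_rca(theta=0.3, sigma2_b=0.2, sigma2_e=1.0, n=1000)
info_cls, info_opt = rca_informations(y, sigma2_b=0.2, sigma2_e=1.0)
print(info_cls, info_opt)
```

The two quantities coincide only when $\sigma_t^2$ is constant over $t$, that is, when $\sigma_b^2 = 0$ and the model reduces to an ordinary AR(1).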

Doubly Stochastic Time Series Model
The random coefficient autoregressive models discussed in the previous section are special cases of what Tjøstheim [12] refers to as doubly stochastic time series models. In the nonlinear case, these models are given by

(3.17)
where $u_0 = 0$ and $v_0 = \sigma_a^2$. Hence, the conditional mean and variance of $y_t$ are given by

(3.19)
Then, the inequality (3.20) implies that $I_{\mathrm{CLS}}(\theta) \le I_{g_M^*}(\theta)$; that is, the optimal linear estimating function $g_M^*(\theta)$ is more informative than the conditional least squares estimating function $g_{\mathrm{CLS}}(\theta)$.
The optimal estimating function $g_S^*(\theta)$ and the associated information based on $s_t$ are given by

(3.22)
Hence, by Theorem 2.1, the optimal quadratic estimating function is given by

(3.24)
It is easy to see that the information of $g_Q^*$ is larger than that of $g_M^*$ and $g_S^*$, and hence the estimate obtained by solving the optimal quadratic estimating equation is more efficient than the CLS estimate and the estimate obtained by solving the optimal linear estimating equation. Moreover, the relations

(3.27)
Moreover, the information matrix for $\theta = (\beta', \alpha')'$ is given by

Conclusions
In this paper, we use appropriate martingale differences and derive the general form of the optimal quadratic estimating function for the multiparameter case with dependent observations. We also show that the optimal quadratic estimating function is more informative than the estimating function used in Thavaneswaran and Abraham [2]. Following Lindsay [8], we conclude that the resulting estimates are more efficient in general.
Examples based on ACD models, RCA models, doubly stochastic models, and the regression model with ARCH errors are also discussed in some detail.For RCA models and doubly stochastic models, we have shown the superiority of the approach over the CLS method.

the optimal quadratic estimating function with independent observations. For the discrete time stochastic process $\{y_t\}$, the following theorem provides optimality of the quadratic estimating function for the multiparameter case.
Theorem 2.1. For
$\ldots, y_n$, we consider two classes of martingale differences $\{m_t(\theta) = y_t - \mu_t(\theta), t = 1, \ldots, n\}$ and $\{s_t(\theta) = m_t^2(\theta) - \sigma_t^2(\theta), t = 1, \ldots, n\}$
the general model in (2.1), in the class of all quadratic estimating functions of the form
That is, $m_t$ and $\psi_t$ are uncorrelated with conditional variances $\langle m\rangle_t$ and $\langle\psi\rangle_t$, respectively. Moreover, the optimal martingale estimating function and associated information based on the martingale differences $\psi_t$ are
Then, the quadratic estimating function based on $m_t$ and $\psi_t$ becomes
$I_{g_Q^*}(\theta) - I_{g_S^*}(\theta)$ is given by (2.6)
Proof. We choose two orthogonal martingale differences $m_t$ and $\psi_t = s_t - \sigma_t\gamma_t m_t$, where the conditional variance of $\psi_t$ is given by $\langle\psi\rangle_t = (\langle m\rangle_t\langle s\rangle_t - \langle m, s\rangle_t^2)/\langle m\rangle_t = \sigma_t^4(\kappa_t + 2 - \gamma_t^2)$.
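The orthogonalization step in the proof can be checked exactly on a small discrete distribution. The sketch below treats a single period, with unconditional moments standing in for the conditional ones, and verifies that $m$ and $\psi = s - \sigma\gamma m$ are uncorrelated and that the variance of $\psi$ equals $\sigma^4(\kappa + 2 - \gamma^2)$. The three-point distribution and the function names are arbitrary illustrative choices.

```python
def moments(values, probs):
    """Mean, variance, skewness, and excess kurtosis of a discrete distribution."""
    mu = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mu) ** 2 for v, p in zip(values, probs))
    gamma = sum(p * (v - mu) ** 3 for v, p in zip(values, probs)) / var ** 1.5
    kappa = sum(p * (v - mu) ** 4 for v, p in zip(values, probs)) / var ** 2 - 3.0
    return mu, var, gamma, kappa

def psi_checks(values, probs):
    """Return E[m * psi], E[psi^2], and the closed form var^2 * (kappa + 2 - gamma^2)."""
    mu, var, gamma, kappa = moments(values, probs)
    sd = var ** 0.5
    cov_m_psi = 0.0
    var_psi = 0.0
    for v, p in zip(values, probs):
        m = v - mu                    # m = y - mu
        s = m * m - var               # s = m^2 - sigma^2
        psi = s - sd * gamma * m      # orthogonalized difference
        cov_m_psi += p * m * psi
        var_psi += p * psi * psi
    return cov_m_psi, var_psi, var ** 2 * (kappa + 2.0 - gamma ** 2)

cov_m_psi, var_psi, closed_form = psi_checks([0.0, 1.0, 4.0], [0.5, 0.3, 0.2])
print(cov_m_psi, var_psi, closed_form)
```

The first printed value is zero up to rounding and the last two agree, which matches the expression for $\langle\psi\rangle_t$ stated in the proof.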
(3.15) where $\{\theta + b_t\}$ of (3.6) is replaced by a more general stochastic sequence $\{\theta_t\}$ and $y_{t-1}$ is replaced by a function of the past, $\mathcal{F}_{t-1}^y$
of $a_t$. Under the normality assumption of $\{\varepsilon_t\}$ and $\{a_t\}$, and the initial condition $y_0 = 0$, $u_t$ and $v_t$ satisfy the following Kalman-like recursive algorithms (see [13, page 439]):
$\{a_t\}$ consists of square integrable independent random variables with mean zero and variance $\sigma_a^2$. We further assume that $\{\varepsilon_t\}$ and $\{a_t\}$ are independent, then $E[y_t \mid \mathcal{F}$
, then $\{m_t\}$ and $\{s_t\}$ are sequences of martingale differences. We can derive that $\langle m, s\rangle_t = 0$, $\langle m\rangle_t = \sigma_e^2(\theta) + f^2(t, \mathcal{F}$
, and the conditional skewness and excess kurtosis are assumed to be constants $\gamma$ and $\kappa$, respectively. It follows from Theorem 2.1 that the optimal component quadratic estimating function for the parameter vector $\theta = (\beta_1, \ldots, \beta_r, \alpha_0, \ldots, \alpha_s)'$
$\cdots + \alpha_s\varepsilon_{t-s}^2$. In this model, the conditional mean is $\mu_t = x_t\beta$, the conditional variance is $\sigma_t^2 = h_t$
$(\beta', \alpha')'$ is $g$
It is of interest to note that when $\{\varepsilon_t\}$ are conditionally Gaussian such that $\gamma = 0$, $\kappa = 0$, estimating functions for $\beta$ and $\alpha$ based on the estimating functions $m_t = y_t - x_t\beta$ and $s_t = m_t^2 - h_t$ are, respectively, given by
$\sum_{j=1}^{s} \alpha_j x_t \varepsilon_{t-j}$
$(1, \varepsilon_{t-1}^2, \ldots, \varepsilon^2$