Estimation and Properties of a Time-Varying GQARCH ( 1 , 1 )-M Model

Time-varying GARCH-Mmodels are commonly used in econometrics and financial economics. Yet the recursive nature of the conditional variance makes exact likelihood analysis of these models computationally infeasible. This paper outlines the issues and suggests to employ a Markov chain Monte Carlo algorithm which allows the calculation of a classical estimator via the simulated EM algorithm or a simulated Bayesian solution in onlyO T computational operations, where T is the sample size. Furthermore, the theoretical dynamic properties of a time-varying GQARCH 1,1 -M are derived. We discuss them and apply the suggested Bayesian estimation to three major stock markets.


Introduction
Time series data, emerging from diverse fields appear to possess time-varying second conditional moments.Furthermore, theoretical results seem to postulate quite often, specific relationships between the second and the first conditional moment.For instance, in the stock market context, the first conditional moment of stock market's excess returns, given some information set, is a possibly time-varying, linear function of volatility see, e.g., Merton 1 ,Glosten et al. 2 .These have led to modifications and extensions of the initial ARCH model of Engle 3 and its generalization by Bollerslev 4 , giving rise to a plethora of dynamic heteroscedasticity models.These models have been employed extensively to capture the time variation in the conditional variance of economic series, in general, and of financial time series, in particular see Bollerslev et al. 5 for a survey .
Although the vast majority of the research in conditional heteroscedasticity is being processed aiming at the stylized facts of financial stock returns and of economic time series in general, Arvanitis and Demos 6 have shown that a family of time-varying GARCH-M models can in fact be consistent with the sample characteristics of time series describing the temporal evolution of velocity changes of turbulent fluid and gas molecules.Despite the fact that the latter statistical characteristics match in a considerable degree their financial analogues e.g., leptokurtosis, volatility clustering, and quasi long-range dependence in the squares are common , there are also significant differences in the behavior of the before mentioned physical systems as opposed to financial markets examples are the anticorrelation effect and asymmetry of velocity changes in contrast to zero autocorrelation and the leverage effect of financial returns see Barndorff-Nielsen and Shephard 7 as well as Mantegna and Stanley 8, 9 .It was shown that the above-mentioned family of models can even create anticorrelation in the means as far as an AR 1 time-varying parameter is introduced.
It is clear that from an econometric viewpoint it is important to study how to efficiently estimate models with partially unobserved GARCH processes.In this context, our main contribution is to show how to employ the method proposed in Fiorentini et al. 10 to achieve MCMC likelihood-based estimation of a time-varying GARCH-M model by means of feasible O T algorithms, where T is the sample size.The crucial idea is to transform the GARCH model in a first-order Markov's model.However, in our model, the error term enters the inmean equation multiplicatively and not additively as it does in the latent factor models of Fiorentini et al. 10 .Thus, we show that their method applies to more complicated models, as well.
We prefer to employ a GQARCH specification for the conditional variance Engle 3 and Sentana 11 since it encompasses all the existing restricted quadratic variance functions e.g., augmented ARCH model , its properties are very similar to those of GARCH models e.g., stationarity conditions but avoids some of their criticisms e.g., very easy to generalize to multivariate models .Moreover, many theories in finance involve an explicit tradeoff between the risk and the expected returns.For that matter, we use an in-mean model which is ideally suited to handling such questions in a time series context where the conditional variance may be time varying.However, a number of studies question the existence of a positive mean/variance ratio directly challenging the mean/variance paradigm.In Glosten et al. 2 when they explicitly include the nominal risk free rate in the conditioning information set, they obtain a negative ARCH-M parameter.For the above, we allow the conditional variance to affect the mean with a possibly time varying coefficient which we assume for simplicity that it follows an AR 1 process.Thus, our model is a time-varying GQARCH-M-AR 1 model.As we shall see in Section 2.1, this model is able to capture the, so-called, stylized facts of excess stock returns.These are i the sample mean is positive and much smaller than the standard deviation, that is, high coefficient of variation, ii the autocorrelation of excess returns is insignificant with a possible exception of the 1st one, iii the distribution of returns is nonnormal mainly due to excess kurtosis and may be asymmetry negative , iv there is strong volatility clustering, that is, significant positive autocorrelation of squared returns even for high lags, and v the so-called leverage effect; that is, negative errors increase future volatility more than positive ones of the same size.
The structure of the paper is as follows.In Section 2, we present the model and derive the theoretical properties the GQARCH 1,1 -M-AR 1 model.Next, we review Bayesian and classical likelihood approaches to inference for the time-varying GQARCH-M model.We show that the key task in both cases is to be able to produce consistent simulators and that the estimation problem arises from the existence of two unobserved processes, causing exact likelihood-based estimations computationally infeasible.Hence, we demonstrate that the method proposed by Fiorentini et al. 10 is needed to achieve a first-order Markov's transformation of the model and thus, reducing the computations from O T 2 to O T .A comparison of the efficient O T calculations and the inefficient O T 2 ones simulator is also given.An illustrative empirical application on weekly returns from three major stock markets is presented in Section 4, and we conclude in Section 5.
Definition 2.1.The time-varying parameter GQARCH 1,1 -M-AR 1 model: where z t are independent for all t s, and where {r t } T t 1 are the observed excess returns, T is the sample size, {δ t } T t 1 is an unobserved AR 1 process independent with δ 0 δ of {ε t } T t 1 , and {h t } T t 1 is the conditional variance with h 0 equal to the unconditional variance and ε 0 0 which is supposed to follow a GQARCH 1,1 .It is obvious that δ t is the market price of risk see, e.g., Merton  Modelling the theoretical properties of this model has been a quite important issue.Specifically, it would be interesting to investigate whether this model can accommodate the main stylized facts of the financial markets.On other hand, the estimation of the model requires its transformation into a first-order Markov's model to implement the method of Fiorentini et al. 10 .Let us start with the theoretical properties.

Theoretical Properties
Let us consider first the moments of the conditional variance h t , needed for the moments of r t .The proof of the following lemma is based on raising h t to the appropriate power, in 2.
Consequently, the moments of r t are given in the following theorem taken from Arvanitis and Demos 6 .where

2.7
and In terms of stationarity, the process {r t } is 4th-order stationary if and only if These conditions are the same as in Arvanitis and Demos 6 , indicating that the presence of the asymmetry parameter, γ , does not affect the stationarity conditions see also Bollerslev 4 and Sentana 11 .Furthermore, the 4th-order stationarity is needed as we would like to measure the autocorrelation of the squared r t 's volatility clustering , as well as the correlation of r 2 t and r t−1 leverage effect .The dynamic moments of the conditional variance and those between the conditional variance h t and the error ε t are given in the following two lemmas for a proof see Appendix B .

2.10
where A and B are given in Lemma 2.4.
Furthermore, from Arvanitis and Demos 6 we know that the following results hold.
Theorem 2.6.The autocovariance of returns for the model in 2.1 -2.3 is given by and the covariance of squares' levels and the autocovariance of squares are

2.12
where all needed covariances and expectations of the right-hand sizes are given in Lemmas 2.4 and 2.5,

2.13
From the above theorems and lemmas it is obvious that our model can accommodate all stylized facts.For example, negative asymmetry is possible; volatility clustering and the leverage effect negative Cov h t , ε t−k can be accommodated, and so forth.Furthermore, the model can accommodate negative autocorrelations, γ k , something that is not possible for the GARCH-M model see Fiorentini and Sentana 12 .Finally, another interesting issue is the diffusion limit of the time-varying GQARCH-M process.As already presented by Arvanitis 13 , the weak convergence of the Time-varying GQARCH 1,1 -M coincides with the general conclusions presented elsewhere in the literature.These are that weak limits of the endogenous volatility models are exogenous stochastic volatility continuous-time processes.Moreover, Arvanitis 13 suggests that there is a distributional relation between the GQARCH model and the continuous-time Ornstein-Uhlenbeck models with respect to appropriate nonnegative Levy's processes.
Let us turn our attention to the estimation of our model.We will show that estimating our model is a hard task and the use of well-known methods such as the EM-algorithm cannot handle the problem due to the huge computational load that such methods require.

Likelihood-Inference: EM and Bayesian Approaches
The purpose of this section is the estimation of the time-varying GQARCH 1,1 -M model.Since our model involves two unobserved components one from the time-varying in-mean parameter and one from the error term , the estimation method required is an EM and more specifically a simulated EM SEM , as the expectation terms at the E step cannot be computed.The main modern way of carrying out likelihood inference in such situations is via a Markov chain Monte Carlo MCMC algorithm see Chib 14 for an extensive review .This simulation procedure can be used either to carry out Bayesian inference or to classically estimate the parameters by means of a simulated EM algorithm.
The idea behind the MCMC methods is that in order to sample a given probability distribution, which is referred to as the target distribution, a suitable Markov chain is constructed using a Metropolis-Hasting M-H algorithm or a Gibbs sampling method with the property that its limiting, invariant distribution is the target distribution.In most problems, the target distribution is absolutely continuous, and as a result the theory of MCMC methods is based on that of the Markov chains on continuous state spaces 15 .This means that by simulating the Markov chain a large number of times and recording its values a sample of correlated draws from the target distribution can be obtained.It should be noted that the Markov chain samplers are invariant by construction, and, therefore, the existence of the invariant distribution does not have to be checked in any particular application of MCMC method.
The Metropolis-Hasting algorithm M-H is a general MCMC method to produce sample variates from a given multivariate distribution.It is based on a candidate generating density that is used to supply a proposal value that is accepted with probability given as the ratio of the target density times the ratio of the proposal density.There are a number of choices of the proposal density e.g., random walk M-H chain, independence M-H chain, tailored M-H chain and the components may be revised either in one block or in several blocks.Another MCMC method, which is special case of the multiple block M-H method with acceptance rate always equal to one, is called the Gibbs sampling method and was brought into statistical prominence by Gelfand and Smith 16 .In this algorithm, the parameters are grouped into blocks, and each block is sampled according to the full conditional distribution denoted as π φ t /φ /t .By Bayes' theorem, we have π φ t /φ /t ∝ π φ t φ /t , the joint distribution of all blocks, and so full conditional distributions are usually quite simply derived.One cycle of the Gibbs sampling algorithm is completed by simulating {φ t } p t 1 , where p is the number of blocks, from the full conditional distributions, recursively updating the conditioning variables as one moves through each distribution.Under some general conditions, it is verified that the Markov chain generated by the M-H or the Gibbs sampling algorithm converges to the target density as the number of iterations becomes large.
Within the Bayesian framework, MCMC methods have proved very popular, and the posterior distribution of the parameters is the target density see 17 .Another application of the MCMC is the analysis of hidden Markov's models where the approach relies on augmenting the parameter space to include the unobserved states and simulate the target distribution via the conditional distributions this procedure is called data augmentation and was pioneered by Tanner and Wong 18 .Kim et al. 19 discuss an MCMC algorithm of the stochastic volatility SV model which is an example of a state space model in which the state variable h t log-volatility appears non-linearly in the observation equation.The idea is to approximate the model by a conditionally Gaussian state space model with the introduction of multinomial random variables that follow a seven-point discrete distribution.
The analysis of a time-varying GQARCH-M model becomes substantially complicated since the log-likelihood of the observed variables can no longer be written in closed form.In this paper, we focus on both the Bayesian and the classical estimation of the model.Unfortunately, the non-Markovian nature of the GARCH process implies that each time we simulate one error we implicitly change all future conditional variances.As pointed out by Shephard 20 , a regrettable consequence of this path dependence in volatility is that standard MCMC algorithms will evolve in O T 2 computational load see 21 .Since this cost has to be borne for each parameter value, such procedures are generally infeasible for large financial datasets that we see in practice.

Estimation Problem: Simulated EM Algorithm
As mentioned already, the estimation problem arises because of the fact that we have two unobserved processes.More specifically, we cannot write down the likelihood function in closed form since we do not observe both ε t and δ t .On the other hand, the conditional loglikelihood function of our model assuming that δ t were observed would be the following: where r r 1 , . . ., r T , δ δ 1 , . . ., δ T , and h h 1 , . . ., h T .However, the δ t 's are unobserved, and, thus, to classically estimate the model, we have to rely on an EM algorithm 22 to obtain estimates as close to the optimum as desired.At each iteration, the EM algorithm obtains φ n 1 , where φ is the parameter vector, by maximizing the expectation of the log-likelihood conditional on the data and the current parameter values, that is, E • | r, φ n , F 0 with respect to φ keeping φ n fixed.
The E step, thus, requires the expectation of the complete log-likelihood.For our model, this is given by:

3.2
It is obvious that we cannot compute such quantities.For that matter, we may rely on a simulated EM where the expectation terms are replaced by averages over simulations, and so we will have an SEM or a simulated score.The SEM log-likelihood is:

3.3
Consequently, we need to obtain the following quantities: where M is the number of simulations.Thus, to classically estimate our model by using an SEM algorithm, the basic problem is to sample from h | φ, r, F 0 where φ is the vector of the unknown parameters and also sample from δ | φ, r, F 0 .
In terms of identification, the model is not, up to second moment, identified see Corollary 1 in Sentana and Fiorentini 23 .The reason is that we can transfer unconditional variance from the error, ε t , to the price of risk, δ t , and vice versa.One possible solution is to fix ω such that E h t is 1 or to set ϕ u to a specific value.In fact in an earlier version of the paper, we fixed ϕ u to be 1 see Anyfantaki and Demos 24 .Nevertheless, from a Bayesian viewpoint, the lack of identification is not too much of a problem, as the parameters are identified through their proper priors see Poirier 25 .
Next, we will exploit the Bayesian estimation of the model, and, since we need to resort to simulations, we will show that the key task is again to simulate from δ | φ, r, F 0 .

Simulation-Based Bayesian Inference
In our problem, the key issue is that the likelihood function of the sample p r | φ, F 0 is intractable which precludes the direct analysis of the posterior density p φ | r, F 0 .This problem may be overcome by focusing instead on the posterior density of the model using Bayes' rule: where

3.6
On the other hand, is the full-information likelihood.Once we have the posterior density, we get the parameters' marginal posterior density by integrating the posterior density.MCMC is one way of numerical integration.The Hammersley-Clifford theorem see Clifford 26 says that a joint distribution can be characterized by its complete conditional distribution.Hence, given initial values {δ t } 0 , φ 0 , we draw {δ t } 1 from p {δ t } 1 | r, φ 0 and then φ 1 from p φ 1 | {δ t } 1 , r .
Iterating these steps, we finally get {δ t } i , φ i M i 1 , and under mild conditions it is shown that the distribution of the sequence converges to the joint posterior distribution p φ, δ | r .
The above simulation procedure may be carried out by first dividing the parameters into two blocks: Then the algorithm is described as follows.
3 Draw from p φ | δ, r in the following blocks: i draw from p φ 1 | δ, r using the Gibbs sampling.This is updated in one block; ii draw from p φ 2 | r by M-H.This is updated in a second block.
We review the implementation of each step.

Gibbs Sampling
The task of simulating from an AR model has been already discussed.Here, we will follow the approach of Chib 27 , but we do not have any MA terms which makes inference simpler.Suppose that the prior distribution of δ, ϕ 2 u , ϕ is given by: which means that δ, ϕ 2 u is a priory independent of ϕ.Also the following holds for the prior distributions of the parameter subvector φ 1 :

3.10
where I ϕ ensures that ϕ lies outside the unit circle, IG is the inverted gamma distribution, and the hyperparameters v 0 , d 0 , δ pr , σ 2 δ pr , ϕ 0 , σ 2 ϕ 0 have to be defined.Now, the joint posterior is proportional to

3.11
From a Bayesian viewpoint, the right-hand side of the above equation is equal to the "augmented" prior, that is, the prior augmented by the latent δ We would like to thank the associate editor for bringing this to our attention.We proceed to the generation of these parameters.

Generation of δ
First we see how to generate δ.Following again Chib 27 , we may write or, otherwise,

3.13
Under the above and using Chib's 1993 notation, we have that the proposal distribution is the following Gaussian distribution see Chib 27 for a proof . where

3.15
Hence, the generation of δ is completed, and we may turn on the generation of the other parameters.

Generation of ϕ 2 u
For the generation of ϕ 2 u and using 27 notation, we have the following. where 3.17 Finally, we turn on the generation of ϕ.

Generation of ϕ
For the generation of ϕ, we follow again Chib 27 and write

3.18
We may now state the following proposition see Chib 27 for a proof .
Proposition 3.3.The proposal distribution of ϕ is where

3.20
The Gibbs sampling scheme has been completed, and the next step of the algorithm requires the generation of the conditional variance parameters via an M-H algorithm which is now presented.

Metropolis-Hasting
Step 3 -ii is the task of simulating from the posterior of the parameters of a GQARCH-M process.This has been already addressed by Kim et  First, we need to decide on the priors.For the parameters α, β, γ, ω, we use normal densities as priors: and similarly where α ω, α , I α , I β are the indicators ensuring the constraints α > 0, α β < 1 and β > 0, α β < 1, respectively.μ ., σ 2  .are the hyperparameters.
We form the joint prior by assuming prior independence between α, β, γ, and the joint posterior is then obtained by combining the joint prior and likelihood function by Bayes' rule:

3.23
For the M-H algorithm, we use the following approximated GARCH model as in Nakatsuma 29 which is derived by the well-known property of GARCH models 4 : where w t ε 2 t − h t with w t ∼ N 0, 2h 2 t .Then the corresponding approximated likelihood is written as and the generation of α, β, γ is based on the above likelihood where we update {h t } each time after the corresponding parameters are updated.The generation of the four variance parameters is given.

Generation of α
For the generation of α, we first note that w t in 3.32 , below, can be written as a linear function of α: where

3.27
Now, let the two following vectors be

3.28
Then the likelihood function of the approximated model is rewritten as

3.29
Using this we have the following proposal distribution of α see Nakatsuma 29 or Ardia 30 for a proof .
Hence a candidate α is sampled from this proposal density and accepted with probability: where α * is the previous draw.
Similar procedure is used for the generation of β and γ .

Generation of β
Following Nakatsuma 29 , we linearize w t by the first-order Taylor expansion where ξ t is the first-order derivative of w t β evaluated at β * the previous draw of the M-H sampler.Define as where g t β * −ξ t β * which is computed by the recursion: 3.34 ξ t 0 for t ≤ 0 30 .Then,

3.36
Then, the likelihood function of the approximated model is rewritten as

3.37
We have the following proposal distribution for β for a proof see Nakatsuma 29 or Ardia 30 .
Hence, a candidate β is sampled from this proposal density and accepted with probability: Finally, we explain the generation of γ .

Generation of γ
As with β, we linearize w t by a first-order Taylor, expansion at a point γ * the previous draw in the M-H sampler.In this case, where g t γ * −ξ t γ * which is computed by the recursion:

3.44
Thus, we have the following proposal distribution for γ for proof see Nakatsuma 29 and Ardia 30 . 45

3.46
The algorithm described above is a special case of a MCMC algorithm, which converges as it iterates, to draws from the required density p φ, δ | r .Posterior moments and marginal densities can be estimated simulation consistently by averaging the relevant function of interest over the sample variates.The posterior mean of φ is simply estimated by the sample mean of the simulated φ values.These estimated values can be made arbitrarily accurate by increasing the simulation sample size.However, it should be remembered that sample variates from an MCMC algorithm are a high dimensional correlated sample from the target density, and sometimes the serial correlation can be quite high for badly behaved algorithms.
All that remains, therefore, is step 2 .Thus, from the above, it is seen that the main task is again as with the classical estimation of the model, to simulate from δ | φ, r, F 0 .

MCMC Simulation of
For a given set of parameter values and initial conditions, it is generally simpler to simulate {ε t } for t 1, . . ., T and then compute {δ t } T t 1 than to simulate {δ t } T t 1 directly.For that matter, we concentrate on simulators of ε t given r and φ.We set the mean and the variance of ε 0 equal to their unconditional values, and, given that h t is a sufficient statistic for F t−1 and the unconditional variance is a deterministic function of φ, F 0 can be eliminated from the information set without any information loss.

3.50
Nevertheless, each time we revise one ε t , we have also to revise T − t conditional variances because of the recursive nature of the GARCH model which makes h new,t s depend upon ε new t for s t 1, . . ., T. And since t 1, . . ., T, it is obvious that we need to calculate T 2 normal densities, and so this algorithm is O T 2 .And this should be done for every φ.To avoid this huge computational load, we show how to use the method proposed by Fiorentini et al. 10 and so do MCMC with only O T calculations.The method is described in the following subsection.

Estimation Method Proposed: Classical and Bayesian Estimation
The method proposed by Fiorentini et al. 10 is to transform the GARCH model into a firstorder Markov's model and so do MCMC with only O T calculations.Following their transformation, we augment the state vector with the variables h t 1 and then sample the joint Markov process {h t 1 , s t } | r, φ ∈ F t where s t sign ε t − γ , 3.51 so that s t ±1 with probability one.The mapping is one to one and has no singularities.More specifically, if we know {h t 1 } and ϕ, then we know the value of

3.52
Hence the additional knowledge of the signs of ε t − γ would reveal the entire path of {ε t } so long as h 0 which equals the unconditional value in our case is known, and, thus, we may now reveal also the unobserved random variable {δ t } | r, φ, {h t 1 }.

3.55
From the above, it is seen that we should first simulate {h t 1 } | r, φ since we do not alter the volatility process when we flip from s t −1 to s t 1 implying that the signs do not cause the volatility process , but we do alter ε t and then simulate {s t } | {h t 1 }, r, φ.The second step is a Gibbs sampling scheme whose acceptance rate is always one and also conditional on {h t 1 }, r, φ the elements of {s t } are independent which further simplifies the calculations.We prefer to review first the Gibbs sampling scheme and then the simulation of the conditional variance.

Simulations of {s t } | {h t 1 }, r, φ
First, we see how to sample from {s t } | {h t 1 }, r, φ.To obtain the required conditionally Bernoulli distribution, we establish first some notation.We have the following see Appendix A :

3.63
Now, since the following should hold: and similarly we have the support of the conditional distribution of h t 1 given that h t is bounded from below by ω βh t , and the same applies to the distribution of h t 2 given h t 1 lower limit corresponds to d t 0 and the upper limit to d t 1 0 .This means that the range of values of h t 1 compatible with h t and h t 2 in the GQARCH case is bounded from above and below; that is:

3.66
From the above, we understand that it makes sense to make the proposal to obey the support of the density, and so it is seen that we can simplify the acceptance rate by setting A numerically efficient way to simulate h t 1 from p h t 1 | r t , h t , φ is to sample an underlying Gaussian random variable doubly truncated by using an inverse transform method.More specifically, we may draw doubly truncated so that it remains within the following bounds: The inverse transform method to draw the doubly truncated Gaussian random variable first draws a uniform random number u ∼ U 0, 1 3.73 and then computes the following:

3.74
A draw is then given by However, if the bounds are close to each other the degree of truncation is small the extra computations involved make this method unnecessarily slow, and so we prefer to use the accept-reject method where we draw and accept the draw if γ − l t ≤ ε new t ≤ γ l t , and otherwise we repeat the drawing this method is inefficient if the truncation lies in the tails of the distribution .It may be worth assessing the degree of truncation first, and, depending on its tightness, choose one simulation method or the other.
The conditional density of ε new t will be given according to the definition of a truncated normal distribution: where Φ • is the cdf of the standard normal.By using the change of variable formula, we have that the density of h new t 1 will be

3.78
Using Bayes theorem we have that the acceptance probability will be min 1,

3.79
Since the degree of truncation is the same for old and new, the acceptance probability will be min 1, where p r t 1 | h t 1 is a mixture of two univariate normal densities so

3.81
Hence, and the acceptance probability becomes min where

3.84
Overall the MCMC of {h t 1 } | r, φ includes the following steps.
i Use an inverse transform method to simulate 3.85 doubly truncated.ii Calculate

3.86
Steps 2 a i and 2 a ii are equivalent to draw .

iv Set
Remark 3.7.Every time we change h t 1 , we calculate only one normal density since the transformation is Markovian, and, since t 0, . . ., T − 1, we need O T calculations.
Notice that if we retain h new t 1 , then ε new t is retained and we will not need to simulate s t at a later stage.In fact we only need to simulate s t at t T since we need to know ε T .The final step involves computing

3.90
Using all the above simulated values, we may now take average of simulations and compute the quantities needed for the SEM algortihm.As for the Bayesian inference, having completed Step 2 we may now proceed to the Gibbs sampling and M-H steps to obtain draws from the required posterior density.Thus, the first-order Markov transformation of the model made feasible an MCMC algorithm which allows the calculation of a classical estimator via the simulated EM algorithm and a simulation-based Bayesian inference in O T computational operations.

A Comparison of the Simulators
In order to compare the performance of the inefficient and the efficient MCMC sampler introduced in the previous subsection, we have generated realizations of size T 240 for the simple GQARCH 1,1 -M-AR 1 model with parameters δ 0.1, ϕ 0.85, α 0.084, β 0.688, γ 0.314 which are centered around typical values that we tend to see in the empirical literature .We first examine the increase in the variance of the sample mean of ε t across 500,000 simulations due to the autocorrelation in the drawings relative to an ideal but infeasible independent sampler.
We do so by recording the inefficient ratios for the observations t 80 and t 160 using standard spectral density techniques.In addition, we record the mean acceptance probabilities over all observations and the average CPU time needed to simulate one complete drawing.The behavior of the two simulators is summarized in Table 1 and is very much as one would expect.The computationally inefficient sampler shows high serial correlation for both t 80 and t 160 and a low acceptance rate for each individual t.Moreover, it is extremely time consuming to compute even though our sample size is fairly small.In fact, when we increase T from 240 to 2400, and 24000 the average CPU time increases by a factor of 100 and 10000, respectively, as opposed to 10 and 100 for the other one the efficient , which makes it impossible to implement in most cases of practical interest.On the other hand, the single-move efficient sampler produces results much faster, with a reasonably high acceptance rate but more autocorrelation in the drawings for t 160.

Empirical Application: Bayesian Estimation of Weekly Excess Returns from Three Major Stock Markets: Dow-Jones, FTSE, and Nikkei
In this section we apply the procedures described above to weekly excess returns from three major stock markets: Dow-Jones, FTSE, and Nikkei for the period of the last week of 1979:8 to the second to the last week of 2008:5 1,500 observations .To guarantee 0 ≤ β ≤ 1 − α ≤ 1 and to ensure that ω > 0 we also used some accept-reject method for the Bayesian inference.This means that, when drawing from the posterior as well as from the prior , we had to ensure that α, β > 0, α β < 1 and ω > 0.
In order to implement our proposed Bayesian approach, we first have to specify the hyperparameters that characterize the prior distributions of the parameters.In this respect, our aim is to employ informative priors that would be in accordance with the "received wisdom."In particular, for all data sets, we set the prior mean for β equal to 0.7, and for a, ω, and γ we decided to set their prior means equal to 0.15, 0.4, and 0.0, respectively.We had also to decide on the prior mean of δ.We set its prior mean equal to 0.05, for all markets.These prior means imply an annual excess return of around 4%, which is a typical value for annualized stock excess returns.Finally, we set the prior mean of ϕ equal to 0.75, of ϕ 2 u equal to 0.01, and the hyperparameters ν 0 and d o equal to 1550 and 3, respectively, for all three datasets, something which is consistent with the "common wisdom" of high autocorrelation of the price of risk.We employ a rather vague prior and set its prior variance equal to 10,000 for all datasets.
We ran a chain for 200,000 simulations for the three datasets and decided to use every tenth point, instead of all points, in the sample path to avoid strong serial correlation.The posterior statistics for the Dow-Jones, FTSE, and Nikkey are reported in Table 2. Inefficiency  factors are calculated using a Parzen window equal to 0.1T where, recall, T is the number of observations and indicate that the M-H sampling algorithm has converged and well behaved This is also justified by the ACFs of the draws.However, they are not presented for space considerations and are available upon request.With the exception of the constants δ and the ϕ 2 u 's, there is low uncertainty with the estimation of the parameters.The estimated persistence, α β, for all three markets is close to 0.8 with the highest being the one of FTSE 0.822 , indicating that the half life of a shock is around 3.5.The estimated asymmetry parameters are round 0.4 with "t-statistics" higher than 3.2, indicating that the leverage effect is important in all three markets.In a nut shell, all estimated parameters have plausible values, which are in accordance with previous results in the literature.
We have also performed a sensitivity analysis to our choice of priors.In particular, we have halved and doubled the dispersion of the prior distributions around their respective means.Figures 1, 2, and 3 show the kernel density estimates for all parameters for all datasets for the posterior distributions for the three cases: when the variances are 10,000 baseline posterior , when the variances are halved small variance posterior , and when the variances are doubled large variance posterior .We used a canonical Epanechnikov kernel, and the optimal bandwidth was determined automatically by the data.The results which are reported in Figures 1, 2, and 3 indicate that the choice of priors does not unduly influence our conclusions.
Finally, treating the estimated posterior means as the "true" parameters, we can employ the formulae of Section 2.1 and compare the moments implied by the estimates and the sample ones.One fact is immediately obvious.All order autocorrelations of excess returns implied by the estimates are positive but small, with the 1st one being around 0.04, which is in accordance with the ii stylized fact see Section 1 .However, for all the three markets, the sample skewness coefficients are negative, ranging from −0.89 FTSE to −0.12 Nikkey , whereas the implied ones are all positive, ranging from 0.036 FTSE to 0.042 Dow-Jones .Nevertheless, the model is matching all the other stylized facts satisfactorily, that is, the estimated parameter values accommodate high coefficient of variation, leptokurtosis as well the volatility clustering and leverage effect.

Conclusions
In this paper, we derive exact likelihood-based estimators for our time-varying GQARCH 1,1 -M model.Since in general the expression for the likelihood function is unknown, we resort to simulation methods.In this context, we show that MCMC likelihoodbased estimation of such a model can in fact be handled by means of feasible O T algorithms.Our samplers involve two main steps.First we augment the state vector to achieve a firstorder Markovian process in an analogous manner to the way in which GARCH models are simulated in practice.Then, we discuss how to simulate first the conditional variance and then the sign given these simulated series so that the unobserved in mean process is revealed as a residual term.We also develop simulation-based Bayesian inference procedures by combining within a Gibbs sampler the MCMC simulators.Furthermore, we derive the theoretical properties of this model, as far as moments and dynamic moments are concerned.
In order to investigate the practical performance of the proposed procedure, we estimate within a Bayesian context our time-varying GQARCH 1,1 -M-AR 1 model for weekly excess stock returns from the Dow-Jones, Nikkei, and FTSE index.With the exception of the returns' skewness, the suggested likelihood-based estimation method and our model is producing satisfactory results, as far as a comparison between sample and theoretical moments is concerned.
Although we have developed the method within the context of an AR 1 price of risk, it applies much more widely.For example, we could assume that the market price of risk is a Bernoulli process or a Markov's switching process.A Bernoulli's distributed price of risk would allow a negative third moment by appropriately choosing the two values of  the in-mean process.However, this would make all computations much more complicated.
In an earlier version of the paper, we assumed that the market price of risk follows a normal distribution, and we applied both the classical and the Bayesian procedure to three stock markets where we decided to set the posterior means as initial values for the simulated EM algorithm .The results suggested that the Bayesian and the classical procedures are quite in agreement see Anyfantaki and Demos 24 .Finally, it is known that e.g., 32, pages 84 and 85 the EM algorithm slows down significantly in the neighborhood of the optimum.As a result, after some initial EM iterations, it is tempting to switch to a derivative-based optimization routine, which is more likely to quickly converge to the maximum.EM-type arguments can be used to facilitate this switch by allowing the computation of the score.In particular, it is easy to see that E ∂ ln p δ | r, φ, F 0 ∂φ | r, φ n , F 0 0, 5.1 so it is clear that the score can be obtained as the expected value given r, φ, F 0 of the sum of the unobservable scores corresponding to ln p r | δ, φ, F 0 and ln p δ | φ, F 0 .This could be very useful for the classical estimation procedure, not presented here, as even though our algorithm is an O T one, it is still rather slow.We leave these issues for further research.

Appendices
A. Proof of 3.69 and 3.82 Proof of Equation 3.69 .This is easily derived using the fact that And the result comes straightforward.

B. Dynamic Moments of the Conditional Variance
We    Finally, B.17
1 Glosten at al. 2 .Let us call F t−1 the sequence of natural filtrations generated by the past values of {ε t } and {r t }.
Now sampling from p ε | r, φ ∝ p r | ε, φ p ε | φ is feasible by using an M-H algorithm where we update each time only one ε t leaving all the other unchanged 20 .In particular, let us write the nth iteration of a Markov chain as ε n t .Then we generate a potential new value of the Markov chain ε new Using the above notation, we see that the probability of drawing s t 1 conditional on {h t 1 } is equal to the probability of drawing ε t γ d t conditional on h t 1 , h t , r t , φ, where d t is given by 3.70 , which is given byp s t 1 | {h t 1 }, r, φ p ε t γ d t | h t 1 , h t ,r t , φ On the other hand, the first step involves simulating from {h t 1 } | r, φ.To avoid large dependence in the chain, we use an M-H algorithm where we simulate one h t 1 at a time leaving the others unchanged 20, 31 .So if h t 1 n is the current value of the nth iteration of a Markov chain, then we draw a candidate value of the Markov chain h new t } | {h t 1 }, r, φ using a Gibbs sampling scheme.Specifically, since conditional on {h t 1 }, r, φ the elements of {s t } are independent, we actually draw from the marginal distribution, and the acceptance rate for this algorithm is always one.The Gibbs sampling algorithm for drawing {s t } | {h t 1 }, r, φ may be 3 Return the values {s 1 , ..., s M }.3.3.2.Simulations of {ht 1 } | r, φ (Single Move Samplers) p h | r, φ p h /t | r, φ p h t | h /t , r, φ .3.62 However, we may simplify further the acceptance rate.More specifically, we have that

Table 1 :
Comparison between the efficient and inefficient MCMC simulator.: MAP denotes mean acceptance probability over the whole sample, while IR refers to inefficiency ratio of the MCMC drawings at observations 80 and 160.Time refers to the total CPU time in seconds taken to simulate one complete drawing. Note