Semi- and Nonparametric ARCH Processes

ARCH/GARCH modelling has been successfully applied in empirical finance for many years. This paper surveys the semiparametric and nonparametric methods in univariate and multivariate ARCH/GARCH models. First, we introduce some specific semiparametric models and investigate the semiparametric and nonparametric estimation techniques applied to: the error density, the functional form of the volatility function, the relationship between mean and variance, long memory processes, locally stationary processes, continuous time processes, and multivariate models. The second part of the paper is about the general properties of such processes, including stationarity conditions, ergodicity conditions, and mixing conditions. The last part is on the estimation methods in ARCH/GARCH processes.


Introduction
The key properties of financial time series appear to be the following: (a) marginal distributions have heavy tails and thin centres (leptokurtosis); (b) the scale or spread appears to change over time; (c) return series appear to be almost uncorrelated over time but to be dependent through higher moments. See [1, 2] for some early discussions. The traditional linear models like the autoregressive moving average class do not capture all these phenomena well. This is the motivation for using nonlinear models. The ARCH class of processes has been a staple tool of empirical finance for many years now, because it has addressed all these issues with some success. This chapter is about the nonparametric approach.

The GARCH Model
Stochastic volatility models are of considerable current interest in empirical finance following the seminal work of Engle [3]. Perhaps still the most popular version is Bollerslev's [4] GARCH(1,1) model, in which the conditional variance $\sigma_t^2$ of a martingale difference sequence $y_t$ is

$$\sigma_t^2 = \omega + \beta \sigma_{t-1}^2 + \gamma y_{t-1}^2, \tag{2.1}$$

where the ARCH(1) process corresponds to $\beta = 0$. This model has been extensively studied and generalized in various ways; see the review of Bollerslev et al. [5]. Following Drost and Nijman [6], we can give three interpretations to (2.1). The strong form GARCH(1,1) process arises when

$$\frac{y_t}{\sigma_t} = \varepsilon_t \tag{2.2}$$

is i.i.d. with mean zero and variance one, where $\sigma_t^2$ is defined in (2.1). The most common special case is where the $\varepsilon_t$ are also standard normal. The semistrong form arises when, for $\varepsilon_t$ in (2.2),

$$E(\varepsilon_t \mid \mathcal{F}_{t-1}) = 0, \qquad E(\varepsilon_t^2 - 1 \mid \mathcal{F}_{t-1}) = 0,$$

where $\mathcal{F}_{t-1}$ is the sigma field generated by the entire past history of the $y$ process. Finally, there is a weak form in which $\sigma_t^2$ is defined as a projection on a certain subspace, so that the actual conditional variance may not coincide with (2.1). The properties of the strong GARCH process are well understood: under restrictions on the parameters $\theta = (\omega, \beta, \gamma)$, the conditional variance can be shown to be strictly positive with probability one, and the process can be shown to be weakly and/or strictly stationary, and to be geometrically mixing and ergodic. The weaknesses of the model are by now well documented; see, for example, Tsay [7].
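As an illustration of the strong form, the following minimal sketch simulates a GARCH(1,1) path with standard normal errors, using the recursion $\sigma_t^2 = \omega + \beta\sigma_{t-1}^2 + \gamma y_{t-1}^2$. The function name, the parameter values, and the choice to start the recursion at the unconditional variance are assumptions made for the example only.

```python
import random


def simulate_garch11(n, omega, beta, gamma, seed=0):
    """Simulate y_t = sigma_t * eps_t, eps_t i.i.d. N(0, 1), with
    sigma_t^2 = omega + beta * sigma_{t-1}^2 + gamma * y_{t-1}^2."""
    if not (omega > 0 and beta >= 0 and gamma >= 0 and beta + gamma < 1):
        raise ValueError("need omega > 0, beta, gamma >= 0, beta + gamma < 1")
    rng = random.Random(seed)
    # Assumption: start at the unconditional variance omega / (1 - beta - gamma).
    sigma2 = omega / (1.0 - beta - gamma)
    ys, sigma2s = [], []
    for _ in range(n):
        y = (sigma2 ** 0.5) * rng.gauss(0.0, 1.0)
        ys.append(y)
        sigma2s.append(sigma2)
        # Update the conditional variance for the next period.
        sigma2 = omega + beta * sigma2 + gamma * y * y
    return ys, sigma2s


# Illustrative parameter values (not estimates from any data set).
ys, sigma2s = simulate_garch11(5000, omega=0.05, beta=0.85, gamma=0.10)
sample_var = sum(y * y for y in ys) / len(ys)
print(sample_var)  # should be in the vicinity of omega / (1 - beta - gamma)
```

Since $\omega > 0$ and $\beta, \gamma \geq 0$, every simulated $\sigma_t^2$ is bounded below by $\omega$ after the first step, which is the positivity property mentioned above.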

The Univariate Model
There are several different ways in which nonparametric components have been introduced into stochastic volatility models. This work was designed to overcome some of the restrictiveness of the parametric assumptions in Gaussian strong GARCH models.

Error Density
Estimation of the strong GARCH process usually proceeds by specifying that the error $\varepsilon_t$ is standard normal and then maximizing the Gaussian likelihood function (conditional on initial values). It has been shown that the resulting estimators are consistent and asymptotically normal under a variety of conditions. The quasi-maximum likelihood estimation (QMLE) results of Weiss [8] and Bollerslev and Wooldridge [9] show that the estimators of the parameters obtained by maximizing a likelihood function constructed under the normality assumption can still be consistent even if the true density is not normal. In many cases, there is evidence that the standardized residuals $\varepsilon_t = y_t/\sigma_t$ from estimated GARCH models are not normally distributed, especially for high-frequency financial time series. Engle and Gonzàlez-Rivera [10] initiated the study of semiparametric models in which $y_t = \varepsilon_t \sigma_t$, where $\varepsilon_t$ is i.i.d. with a density $f$ of unknown functional form that may be nonnormal. One can obtain more efficient estimates of the parameters of interest by estimating $f$ nonparametrically. Linton [11] and Drost and Klaassen [12] developed kernel-based estimates and established the semiparametric efficiency bounds for estimation of the parameters. In some cases, for example, if $f$ is symmetric about zero, it is possible to adaptively estimate some parameters, that is, one can achieve the same asymptotic efficiency as if one knew the error density. In other cases, or for some parameters, it is not possible to adapt, that is, it is not possible to estimate as efficiently as if $f$ were known. These semiparametric models can readily be applied to deliver value at risk and conditional value at risk measures based on the estimated density.
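The nonparametric step can be sketched as follows: standardize the returns by fitted conditional standard deviations and apply a kernel density estimator to the result. This is only a sketch under stated assumptions: the Gaussian kernel, the fixed bandwidth, and the toy numbers are choices made for illustration and are not the specific estimators of [10-12].

```python
import math


def standardized_residuals(ys, sigma2s):
    """Rescale returns by fitted conditional standard deviations."""
    return [y / math.sqrt(s2) for y, s2 in zip(ys, sigma2s)]


def gaussian_kde(data, bandwidth):
    """Return x -> f_hat(x), a Gaussian-kernel density estimate of the data."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        return norm * sum(
            math.exp(-0.5 * ((x - e) / bandwidth) ** 2) for e in data
        )

    return f_hat


# Toy inputs (hypothetical numbers, not estimates from real data).
ys = [0.3, -1.2, 0.8, 2.0, -0.5]
sigma2s = [1.0, 1.44, 0.64, 4.0, 0.25]
eps_hat = standardized_residuals(ys, sigma2s)
f_hat = gaussian_kde(eps_hat, bandwidth=0.5)
print(f_hat(0.0))
```

In a semiparametric procedure the estimated density would then replace the Gaussian density in the likelihood; the bandwidth choice matters in practice and is left open here.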

Functional Form of Volatility Function
Another line of work has been to question the specific functional form of the volatility function, since estimation is not robust with respect to its specification. The news impact curve is the relationship between $\sigma_t^2$ and $y_{t-1} = y$, holding past values $\sigma_{t-1}^2$ constant at some level $\sigma^2$. This is an important relationship that describes how new information affects volatility. For the GARCH process, the news impact curve is

$$m(y, \sigma^2) = \omega + \gamma y^2 + \beta \sigma^2. \tag{3.2}$$

It is separable in $\sigma^2$, that is, $\partial m(y, \sigma^2)/\partial \sigma^2$ does not depend on $y$; it is an even function of news $y$, that is, $m(y, \sigma^2) = m(-y, \sigma^2)$; and it is a quadratic function of $y$ with minimum at zero. The evenness property implies that $\mathrm{cov}(y_t^2, y_{t-j}) = 0$ for $\varepsilon_t$ with distribution symmetric about zero.
Because of limited liability, we might expect that negative and positive shocks have different effects on the volatility of stock returns, for example. The evenness of the GARCH process news impact curve rules out such "leverage effects". Nelson [13] introduced the Exponential GARCH model to address this issue. Let $h_t = \log \sigma_t^2$ and let $h_t = \omega + \gamma(\theta \varepsilon_{t-1} + \delta|\varepsilon_{t-1}|) + \beta h_{t-1}$, where $\varepsilon_t = y_t/\sigma_t$ is i.i.d. with mean zero and variance one. This allows an asymmetric effect of past shocks $\varepsilon_{t-j}$ on current volatility, that is, the news impact curve is allowed to be asymmetric. For example, $\mathrm{cov}(y_t^2, y_{t-j}) \neq 0$ even when $\varepsilon_t$ is symmetric about zero. An alternative approach to allowing an asymmetric news impact curve is the model of Glosten et al.
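The contrast between the two news impact curves can be checked numerically. The sketch below assumes the GARCH curve $m(y, \sigma^2) = \omega + \gamma y^2 + \beta\sigma^2$ and an EGARCH recursion of the form $h_t = \omega + \gamma(\theta\varepsilon_{t-1} + \delta|\varepsilon_{t-1}|) + \beta h_{t-1}$ with $h_t = \log\sigma_t^2$; all parameter values are hypothetical.

```python
import math


def garch_news_impact(y, sigma2, omega, beta, gamma):
    """GARCH(1,1) news impact curve: even and quadratic in the news y."""
    return omega + gamma * y * y + beta * sigma2


def egarch_news_impact(eps, h, omega, beta, gamma, theta, delta):
    """Next-period variance exp(h_t) under the assumed EGARCH recursion,
    holding the lagged log-variance fixed at h."""
    return math.exp(omega + gamma * (theta * eps + delta * abs(eps)) + beta * h)


# Hypothetical parameters; theta < 0 makes bad news raise volatility more.
g_neg = garch_news_impact(-1.5, 1.0, omega=0.05, beta=0.85, gamma=0.10)
g_pos = garch_news_impact(1.5, 1.0, omega=0.05, beta=0.85, gamma=0.10)
e_neg = egarch_news_impact(-1.0, 0.0, omega=0.0, beta=0.9,
                           gamma=0.2, theta=-0.5, delta=1.0)
e_pos = egarch_news_impact(1.0, 0.0, omega=0.0, beta=0.9,
                           gamma=0.2, theta=-0.5, delta=1.0)
print(g_neg == g_pos, e_neg > e_pos)
```

With $\theta < 0$ a negative shock of a given size produces a larger next-period variance than a positive shock of the same size, which the symmetric GARCH curve cannot reproduce.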
There are many different parametric approaches to modelling the news impact curve and they can give quite different answers in the range of perhaps most interest to practitioners. This motivates a nonparametric approach, because of the greater flexibility in functional form thereby allowed. The nonparametric ARCH literature apparently begins with Pagan and Schwert [15] and Pagan and Hong [16]. They consider the case where $\sigma_t^2 = \sigma^2(y_{t-1})$, where $\sigma(\cdot)$ is a smooth but unknown function, and the multilag version $\sigma_t^2 = \sigma^2(y_{t-1}, y_{t-2}, \ldots, y_{t-d})$. This allows for a general shape to the news impact curve and nests all the parametric ARCH processes. Under some general conditions on $\sigma(\cdot)$ (for example, that $\sigma(\cdot)$ does not grow at a more than quadratic rate in the tails), the process $y$ is geometrically strong mixing. Härdle and Tsybakov [17] applied a local linear fit to estimate the volatility function together with the mean function and derived their joint asymptotic properties. The multivariate extension is given by Härdle et al. [18]. Masry and Tjøstheim [19] also estimated nonparametric ARCH models using the Nadaraya-Watson kernel estimator. Lu and Linton [20] extended the CLT to processes that are only near epoch dependent.
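For the one-lag case $\sigma_t^2 = \sigma^2(y_{t-1})$ with zero conditional mean, a Nadaraya-Watson estimate can be sketched as a kernel regression of $y_t^2$ on $y_{t-1}$. The Gaussian kernel, the fixed bandwidth, and the ARCH(1) data-generating process used to try it out are assumptions for this example, not the settings of the cited papers.

```python
import math
import random


def nw_volatility(ys, bandwidth):
    """Nadaraya-Watson estimate of sigma^2(x) = E(y_t^2 | y_{t-1} = x)."""
    lagged = ys[:-1]
    squared = [y * y for y in ys[1:]]

    def sigma2_hat(x):
        weights = [math.exp(-0.5 * ((x - l) / bandwidth) ** 2) for l in lagged]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("no data near x; increase the bandwidth")
        return sum(w * s for w, s in zip(weights, squared)) / total

    return sigma2_hat


# Hypothetical ARCH(1) data: sigma_t^2 = 0.2 + 0.5 * y_{t-1}^2.
rng = random.Random(1)
ys, y_prev = [], 0.0
for _ in range(5000):
    y_prev = math.sqrt(0.2 + 0.5 * y_prev * y_prev) * rng.gauss(0.0, 1.0)
    ys.append(y_prev)

sigma2_hat = nw_volatility(ys, bandwidth=0.2)
print(sigma2_hat(0.0), sigma2_hat(1.0))
```

The estimate at zero should sit near the intercept of the true curve and the estimate at one should be visibly larger, tracing out the news impact curve without imposing a parametric shape.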
Fan and Yao [21] have discussed efficiency issues in this model; see also Avramidis [22]. Franke et al. [23] have considered the application of the bootstrap for improved inference. In practice, it is necessary to include many lagged variables in $\sigma^2(\cdot)$ to match the dependence found in financial data. The problem with this is that nonparametric estimation of a multidimensional regression surface suffers from the well-known "curse of dimensionality": the optimal rate of convergence decreases with dimensionality $d$; see Stone [24]. In addition, it is hard to describe, interpret, and understand the estimated regression surface when the dimension is more than two. Furthermore, even for large $d$ this model greatly restricts the dynamics for the variance process, since it effectively corresponds to an ARCH($d$) model, which is known in the parametric case not to capture the dynamics well. In particular, if the conditional variance is highly persistent, the nonparametric estimator of the conditional variance will provide a poor approximation, as reported by Perron [25]. So not only does this model fail to capture adequately the time series properties of many datasets, but the statistical properties of the estimators can be poor, and the resulting estimators hard to interpret.
Additive models offer a flexible but parsimonious alternative to nonparametric models, and have been used in many contexts; see [26]. Suppose that

$$\sigma_t^2 = c_v + \sum_{j=1}^{d} \sigma_j^2(y_{t-j}) \tag{3.3}$$

for some unknown functions $\sigma_j^2$. The functions $\sigma_j^2$ are allowed to be of general functional form but only depend on $y_{t-j}$. This class of processes nests many parametric ARCH models. Again, under growth conditions the process $y$ can be shown to be stationary and geometrically mixing. The functions $\sigma_j^2$ can be estimated by special kernel regression techniques, such as the method of marginal integration; see [27, 28]. The best achievable rate of convergence for estimates of $\sigma_j^2(\cdot)$ is that of one-dimensional nonparametric regression; see [29]. Masry and Tjøstheim [19] developed estimators for a class of time series models including (3.3). Yang et al. [30] proposed an alternative nonlinear ARCH model in which the conditional mean is again additive, but the volatility is multiplicative: $\sigma_t^2 = c_v \prod_{j=1}^{d} \sigma_j^2(y_{t-j})$. Kim and Linton [31] generalized this model to allow for arbitrary but known transformations, $G(\sigma_t^2) = c_v + \sum_{j=1}^{d} \sigma_j^2(y_{t-j})$, where $G(\cdot)$ is a known function like log or level. The typical empirical findings are that the news impact curves have an inverted asymmetric U-shape.
These models address the curse of dimensionality but they are rather restrictive with respect to the amount of information allowed to affect volatility, and in particular do not nest the GARCH(1,1) process. Linton and Mammen [32] proposed the following model:

$$\sigma_t^2(\theta, m) = \sum_{j=1}^{\infty} \psi_j(\theta)\, m(y_{t-j}),$$

where $\theta \in \Theta \subset \mathbb{R}^p$ and $m$ is an unknown but smooth function. The coefficients $\psi_j(\theta)$ satisfy at least $\psi_j(\theta) \geq 0$ and $\sum_{j=1}^{\infty} \psi_j(\theta) < \infty$ for all $\theta \in \Theta$. A special case of this model is the Engle and Ng [33] PNP model, where

$$\sigma_t^2 = \beta \sigma_{t-1}^2 + m(y_{t-1})$$

and $m(\cdot)$ is a smooth but unknown function. This model nests the simple GARCH(1,1) model but permits more general functional form: it allows for an asymmetric leverage effect, and as much dynamics as GARCH(1,1). Estimation methods for these models are based on iterative smoothing. Linton and Mammen [32] showed that the news impact curves for daily and weekly S&P500 data are quite asymmetric with nonquadratic tails and are minimal not at zero but at some positive return. Below we show their estimator, denoted PNP here, in comparison with a common parametric fit, denoted AGARCH. Yang [34] introduced a semiparametric index model

$$\sigma_t^2 = g\Bigl(\sum_{j=1}^{\infty} \nu_j(y_{t-j}; \theta)\Bigr),$$

where $\nu_j(y; \theta)$ are known functions for each $j$ satisfying some decay condition and $g$ is smooth but unknown. This process nests the GARCH(1,1) when $g$ is the identity, but also the quadratic model considered in Robinson [35]. Audrino and Bühlmann [36] proposed the model $\sigma_t^2 = \Lambda(y_{t-1}, \sigma_{t-1}^2)$ for some smooth but unknown function $\Lambda(\cdot)$, which includes the PNP model as a special case. They proposed an estimation algorithm. However, they did not establish the distribution theory of their estimator, and this may be very difficult to establish due to the generality of the model.
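To make the nesting concrete, the following sketch runs a recursion of the PNP type, $\sigma_t^2 = \beta\sigma_{t-1}^2 + m(y_{t-1})$, for two choices of $m$: the quadratic $m(y) = \omega + \gamma y^2$, which reproduces GARCH(1,1), and a shifted quadratic with its minimum at a positive return. Both functions and all numbers are hypothetical illustrations, not estimates from [32].

```python
def pnp_variance_path(ys, beta, m, sigma2_0):
    """Conditional variances from sigma_t^2 = beta * sigma_{t-1}^2 + m(y_{t-1}),
    started at sigma2_0 for the first observation."""
    path = [sigma2_0]
    for y_prev in ys[:-1]:
        path.append(beta * path[-1] + m(y_prev))
    return path


def garch_m(y, omega=0.05, gamma=0.1):
    """Quadratic news impact: the recursion collapses to GARCH(1,1)."""
    return omega + gamma * y * y


def shifted_m(y, omega=0.05, gamma=0.1, c=0.5):
    """Hypothetical asymmetric curve with its minimum at the positive return c."""
    return omega + gamma * (y - c) ** 2


ys = [0.5, -1.0, 0.2, 1.5]
garch_path = pnp_variance_path(ys, beta=0.8, m=garch_m, sigma2_0=1.0)
asym_path = pnp_variance_path(ys, beta=0.8, m=shifted_m, sigma2_0=1.0)
print(garch_path)
print(asym_path)
```

After the negative return in the second period the shifted curve produces a higher variance than the symmetric one, which is the kind of leverage effect a flexible $m$ can pick up.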

Relationship between Mean and Variance
The above discussion has centered on the evolution of volatility itself, whereas one is often very interested in the mean as well. One might expect that risk and return should be related; see [37]. The GARCH-in-Mean process captures this idea; it is

$$y_t = g(\sigma_t^2; b) + \varepsilon_t \sigma_t$$

for various functional forms of $g$, for example, linear and log-linear, and for some given specification of $\sigma_t^2$. Engle et al. [38] introduced this model and applied it to the study of the term structure. Here, $b$ are parameters to be estimated along with the parameters of the error variance. Some authors find small but significant effects. Again, the nonparametric approach is well motivated here on grounds of flexibility. Pagan and Hong [16] and Pagan and Ullah [39] considered a case where the conditional variance is nonparametric with a finite number of lags but enters in the mean equation linearly or log-linearly. Linton and Perron [40] studied the case where $g$ is nonparametric but $\sigma_t^2$ is parametric, for example GARCH. The estimation algorithm was applied to stock index return data. Their estimated $g$ function was nonmonotonic for daily S&P500 returns.

Long Memory
Another line of work has argued that conventional models involve a dependence structure that does not fit the data well enough. The GARCH(1,1) process can be represented as

$$\sigma_t^2 = c_0 + \sum_{j=1}^{\infty} c_j y_{t-j}^2 \tag{3.8}$$

for constants $c_j$ satisfying $c_j = \gamma \beta^{j-1}$ for $j \geq 1$, provided the process is weakly stationary, which requires $\gamma + \beta < 1$. These coefficients decay very rapidly, so the actual amount of memory is quite limited. There is some empirical evidence on the autocorrelation function of $y_t^2$ for high-frequency returns data that suggests a slower decay rate than would be implied by these coefficients; see [41]. Long memory models essentially are of the form (3.8) but with slower decay rates. For example, suppose that $c_j = j^{-\theta}$ for some $\theta > 0$. The coefficients satisfy $\sum_{j=1}^{\infty} c_j^2 < \infty$ provided $\theta > 1/2$. Fractional integration (FIGARCH) leads to such an expansion. There is a single parameter called $d$ that determines the memory properties of the series through the fractional differencing operator $(1 - L)^d$. When $d = 1$ we have the standard IGARCH model. For $d \neq 1$ we can define the binomial expansion of $(1 - L)^{-d}$ in the form given above. See Robinson [35] and Bollerslev and Mikkelsen [41] for models. The evidence for long memory is often based on sample autocovariances of $y_t^2$, and this may be questionable when only few moments of $y_t$ exist; see [42]. See the work of Giraitis [43] for a nice review.
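The difference in decay rates is easy to see numerically. The sketch below computes the geometric GARCH(1,1) weights $c_j = \gamma\beta^{j-1}$, hyperbolic weights $c_j = j^{-\theta}$, and the binomial coefficients of $(1 - L)^d$ via the standard recursion $\pi_0 = 1$, $\pi_k = \pi_{k-1}(k - 1 - d)/k$. Function names and parameter values are illustrative assumptions.

```python
def garch_arch_inf_weights(beta, gamma, n):
    """ARCH(infinity) weights c_j = gamma * beta**(j - 1), j = 1..n."""
    return [gamma * beta ** (j - 1) for j in range(1, n + 1)]


def hyperbolic_weights(theta, n):
    """Slowly decaying weights c_j = j**(-theta), j = 1..n."""
    return [j ** (-theta) for j in range(1, n + 1)]


def frac_diff_coeffs(d, n):
    """Coefficients pi_0..pi_{n-1} in (1 - L)^d = sum_k pi_k L^k."""
    coeffs = [1.0]
    for k in range(1, n):
        coeffs.append(coeffs[-1] * (k - 1 - d) / k)
    return coeffs


geo = garch_arch_inf_weights(beta=0.9, gamma=0.05, n=50)
hyp = hyperbolic_weights(theta=0.75, n=50)
# Weight at lag 50 relative to lag 1: geometric decay is far faster.
print(geo[-1] / geo[0], hyp[-1] / hyp[0])
print(frac_diff_coeffs(0.5, 4))
```

For $d = 1$ the recursion returns the ordinary first difference $1 - L$, while for fractional $d$ the coefficients decay hyperbolically rather than vanishing.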

Locally Stationary Processes
Recently, another criticism of GARCH processes has come to the fore, namely their usual assumption of stationarity. The IGARCH process, where $\beta + \gamma = 1$, is one type of nonstationary GARCH model but it has certain undesirable features like the nonexistence of the variance. An alternative approach is to model the coefficients of a GARCH process as changing over time, thus

$$\sigma_t^2 = \omega(x_{tT}) + \beta(x_{tT})\,\sigma_{t-1}^2 + \gamma(x_{tT})\, y_{t-1}^2,$$

where $\omega$, $\beta$, and $\gamma$ are smooth but otherwise unknown functions of a variable $x_{tT}$. When $x_{tT} = t/T$, this class of processes is nonstationary but can be viewed as locally stationary along the lines of Dahlhaus [44], provided the memory is weak, that is, $\beta(\cdot) + \gamma(\cdot) < 1$. In this way the unconditional variance exists, that is, $E(\sigma_t^2) < \infty$, but can change slowly over time, as can the memory. Dahlhaus and Subba Rao [45] have recently provided a comprehensive theory of such processes and of inference methods for the ARCH special case. See [46] for a further review.
Engle and Rangel [47] propose a special case of this model where the unconditional variance $\sigma^2(t/T) = \omega(t/T)/(1 - \beta(t/T) - \gamma(t/T))$ varies over time but the coefficients $\beta(t/T)$ and $\gamma(t/T)$ are assumed to be constant. In this way, we can write $y_t = \sigma(t/T)\, g_t^{1/2} \varepsilon_t$, where $g_t$ is a unit GARCH(1,1) process representing "high-frequency" volatility, while $\sigma^2(t/T)$ is the low-frequency unconditional volatility modelled nonparametrically. Engle and Rangel [47] also allow for covariates in the low-frequency component of volatility.
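A minimal simulation of this multiplicative decomposition, $y_t = \sigma(t/T)\, g_t^{1/2}\varepsilon_t$, is sketched below. The sinusoidal low-frequency variance is purely hypothetical (Engle and Rangel use splines), and the unit GARCH(1,1) recursion is written so that $g_t$ has unconditional mean one.

```python
import math
import random


def low_freq_variance(u):
    """Hypothetical smooth unconditional variance sigma^2(u), u = t/T in [0, 1]."""
    return 1.0 + 0.5 * math.sin(2.0 * math.pi * u)


def simulate_multiplicative_garch(T, beta, gamma, seed=0):
    """Simulate y_t = sigma(t/T) * sqrt(g_t) * eps_t with a unit GARCH(1,1) g_t."""
    rng = random.Random(seed)
    g = 1.0  # start the high-frequency component at its unconditional mean
    ys, gs = [], []
    for t in range(1, T + 1):
        eps = rng.gauss(0.0, 1.0)
        ys.append(math.sqrt(low_freq_variance(t / T) * g) * eps)
        gs.append(g)
        # Unit GARCH: the intercept 1 - beta - gamma forces E(g_t) = 1.
        g = (1.0 - beta - gamma) + beta * g + gamma * g * eps * eps
    return ys, gs


ys, gs = simulate_multiplicative_garch(4000, beta=0.8, gamma=0.1)
mean_g = sum(gs) / len(gs)
print(mean_g)  # should be close to one
```

Because $g_t$ averages to one, the slowly varying function carries the level of volatility and the GARCH component only carries short-run fluctuations around it.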

Continuous Time
Recently there has been much work on nonparametric estimation of continuous time processes; see, for example, [48]. Given a complete record of transaction or quote prices, it is natural to model prices in continuous time (e.g., [49]). This matches with the vast continuous time financial economic arbitrage-free theory based on a frictionless market. Under the standard assumptions that the return process does not allow for arbitrage and has a finite instantaneous mean, the asset price process, as well as smooth transformations thereof, belongs to the class of special semimartingales, as detailed by Back [50]. Under some conditions, the semiparametric GARCH processes we reviewed can approximate such continuous time processes as the sampling frequency increases. Work on continuous time is reviewed elsewhere in this volume, so here we just point out that this methodology can be viewed as nonparametric and as a competitor of the discrete time models we outlined above.

The Multivariate Case
It is important to extend the volatility models to the multivariate framework, as understanding the comovements of different financial returns is also of great interest. The specification of an MGARCH model should be flexible enough to represent the dynamic structure of the conditional variances and covariance matrix, and parsimonious enough to deal with the rapid expansion of the parameters when the dimension increases. Semiparametric and nonparametric methods offer an alternative to parametric estimation, with the advantage of not imposing a particular structure on the data. In general we have a vector time series $y_t \in \mathbb{R}^n$ that satisfies

$$y_t = \Sigma_t^{1/2} \varepsilon_t,$$

where $\varepsilon_t$ is a vector of martingale difference sequences satisfying $E(\varepsilon_t \mid \mathcal{F}_{t-1}) = 0$ and $E(\varepsilon_t \varepsilon_t^\top - I_n \mid \mathcal{F}_{t-1}) = 0$, while $\Sigma_t$ is a symmetric positive definite matrix. In this case, $\Sigma_t$ is the conditional covariance matrix of $y_t$ given its own history. The usual approach here is to specify a parametric model for $\Sigma_t$ and perhaps also the marginal density of $\varepsilon_t$. There are many parametric models for $\Sigma_t$, and we just mention two recent developments that are particularly useful for large dimensional systems. The first is the so-called CCC (constant conditional correlation) model of Bollerslev [51],

$$\Sigma_t = D_t R D_t,$$

where $D_t$ is a diagonal matrix with elements $\sigma_{it}$, where $\sigma_{it}^2$ follows a univariate parametric GARCH or other specification, while $R$ is an $n$ by $n$ correlation matrix. The second model generalizes this to allow $R$ to vary with time, albeit in a restricted parametric way, and is thereby called DCC (dynamic conditional correlation); see Engle [52].
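The CCC construction is simple enough to write out directly. The sketch below assembles $\Sigma_t = D_t R D_t$ from a vector of conditional standard deviations and a fixed correlation matrix, using plain nested lists; the numbers are hypothetical.

```python
def ccc_covariance(sigmas, R):
    """Conditional covariance Sigma_t = D_t R D_t, where D_t = diag(sigmas)
    holds the conditional standard deviations and R is a correlation matrix."""
    n = len(sigmas)
    if len(R) != n or any(len(row) != n for row in R):
        raise ValueError("R must be an n-by-n matrix")
    return [[sigmas[i] * R[i][j] * sigmas[j] for j in range(n)] for i in range(n)]


# Hypothetical conditional standard deviations at one date and a fixed R.
sigmas_t = [2.0, 0.5]
R = [[1.0, 0.3], [0.3, 1.0]]
Sigma_t = ccc_covariance(sigmas_t, R)
print(Sigma_t)
```

Only the diagonal matrix changes over time in the CCC model, so the number of correlation parameters does not depend on the univariate volatility dynamics; the DCC model would replace the fixed `R` with a time-varying matrix.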

Error Density
Hafner and Rombouts [53] considered a number of semiparametric models where the functional form of the conditional covariance matrix is parametrically specified while the innovation distribution is unspecified, that is, $\varepsilon_t$ is i.i.d. with density function $f: \mathbb{R}^n \to \mathbb{R}$, where $f$ is of unknown functional form. In the most general case, they treat the multivariate extension of the semiparametric model of Engle and Gonzàlez-Rivera [10]. They show that it is not generally possible to adapt, although one can achieve a semiparametric efficiency bound for the identified parameters. The semiparametric estimators are more efficient than the QMLE if the innovation distribution is nonnormal. These methods can often deliver efficiency gains but may not be robust to, say, dependent or time-varying $\varepsilon_t$. In practice, the estimated density is quite heavy tailed but close to symmetric for stock returns.
It is also worth mentioning the SNP (SemiNonParametric) method, which was first introduced by Gallant and Tauchen [54]. The procedure for estimating the conditional density of a stationary multivariate time series relies on a Hermite series expansion, combined with a model selection strategy to determine the appropriate degree of the expansion. The estimator is consistent under some reasonable regularity conditions. One major issue with the unrestricted semiparametric model is the curse of dimensionality: as $n$ increases, the best possible rate at which the error density can be estimated gets worse and worse. In practice, allowing for four or more variables in an unrestricted way is impractical even with enormous sample sizes. This motivates restricted versions of the general model that embody a compromise between flexibility of functional form and reasonable small sample properties of estimation methods.
The first class of models is the family of spherically symmetric densities, in which

$$f(x) = g(x^\top x),$$

where $g: \mathbb{R} \to \mathbb{R}$ is an unknown scalar function. This construction avoids the "curse of dimensionality" problem, and can in principle be applied to very high dimensional systems. This class of distributions is important in finance, since the CAPM is consistent with returns being jointly elliptically symmetric (i.e., spherically symmetric after location and scale transformation); see Ingersoll [55]. Hafner and Rombouts [53] develop estimation methods for parametrically specified $\Sigma_t$ under this assumption. Another approach is based on copula functions. By Sklar's theorem, any multivariate distribution can be modelled by the marginal distribution of each individual series and the dependence structure between the individual series, which is captured by a copula function.

Journal of Probability and Statistics 9
A copula itself is a multivariate distribution function with uniform marginals. The joint distribution function of random variables $X$ and $Y$ can be written as $F_{X,Y}(x, y) = C(F(x), G(y))$, a bivariate distribution function whose marginals are $F(\cdot)$ and $G(\cdot)$, where the copula function $C(\cdot): [0,1]^2 \to \mathbb{R}$ measures the dependence.
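As a concrete instance of $F_{X,Y}(x, y) = C(F(x), G(y))$, the sketch below uses the Clayton copula, $C(u, v; \theta) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ with $\theta > 0$, together with standard normal marginals. The Clayton family and the normal marginals are chosen here only for illustration; they are not singled out in the text.

```python
import math


def clayton_copula(u, v, theta):
    """Clayton copula C(u, v; theta) on [0, 1]^2, for theta > 0."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    if u == 0.0 or v == 0.0:
        return 0.0
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def joint_cdf(x, y, theta):
    """Joint distribution F(x, y) = C(F(x), G(y)) with N(0, 1) marginals."""
    return clayton_copula(std_normal_cdf(x), std_normal_cdf(y), theta)


# Uniform marginals: C(u, 1) = u for any u in (0, 1].
print(clayton_copula(0.3, 1.0, 2.0))
print(joint_cdf(0.0, 0.0, 2.0))
```

Setting one argument of the copula to one returns the other argument, which is the uniform-marginal property; all dependence between the two series is governed by the single parameter $\theta$.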
Chen and Fan [56] proposed a new class of semiparametric copula-based multivariate dynamic models, the so-called SCOMDY models, in which the conditional mean and the conditional variance of a multivariate time series are specified parametrically, while the multivariate distribution of the standardized innovation is specified semiparametrically as a parametric copula evaluated at nonparametric marginals. The advantages of this method are a very flexible innovation distribution, obtained by estimating the univariate marginal distributions nonparametrically and fitting a parametric copula, and its circumvention of the "curse of dimensionality". An important class of the SCOMDY models is the semiparametric copula-based multivariate GARCH models, in which $\varepsilon_t = (\varepsilon_{1,t}, \ldots, \varepsilon_{n,t})^\top$ is a sequence of i.i.d. random vectors with zero mean and unit variance. In this case, the conditional covariance matrix of returns is in the class of the CCC models. The key feature of the SCOMDY is the semiparametric form taken by the joint distribution function $F_\varepsilon$ of $\varepsilon_t$:

$$F_\varepsilon(\varepsilon_1, \ldots, \varepsilon_n) = C\bigl(F_{\varepsilon,1}(\varepsilon_1), \ldots, F_{\varepsilon,n}(\varepsilon_n); \theta_0\bigr), \tag{4.5}$$

where $C(\cdot)$ is a parametrized copula function depending on an unknown parameter $\theta_0 \in \Theta \subset \mathbb{R}^m$, and for $i = 1, \ldots, n$, $F_{\varepsilon,i}(\cdot)$ is the marginal distribution function of the innovation, which is assumed to be continuous but otherwise unspecified. Many examples of combinations have been introduced in the paper, such as {GARCH(1,1), Normal copula} and {GARCH(1,1), Student's-t copula}. Embrechts et al. [57] was the most influential paper of the early study of copulas in finance, and since then numerous copula-based models have been introduced and used in financial applications. The copula-GARCH models of Patton [58, 59] make the parameter of the copula time varying in a dynamic fashion. Jondeau and Rockinger [60] modelled daily return series with a univariate time-varying skewed Student-t distribution and a Gaussian or Student-t copula for the dependence.
Panchenko [61] also considered a semiparametric copula-based model applied to risk management. Rodriguez

Conditional Covariance Matrix
Hafner et al. [66] proposed a semiparametric approach for the conditional covariance matrix which allows the conditional variances to be modelled parametrically by using any choice of univariate GARCH-type models, while the conditional correlations are estimated by nonparametric methods. The conditional covariance matrix $\Sigma_t$ is defined as follows:

$$\Sigma_t = D_t R_t D_t,$$

where $D_t$ is parametrically modelled by any choice of univariate GARCH specification, and $R_t$ is treated nonparametrically as an unknown function of a state variable $x_t$, thus $R_t = R(x_t)$ for some unknown matrix function $R(\cdot)$. The function $R(\cdot)$ is estimated using kernel methods based on the rescaled residuals from the initial univariate parametric fits of the GARCH models.
Recently, Hafner and Linton [67] introduced a multivariate multiplicative volatility model which can be regarded as the multivariate version of the spline-GARCH model of Engle and Rangel [47]. A vector time series $y_t$ takes the form

$$y_t = H(t/T)^{1/2}\, G_t^{1/2} \varepsilon_t,$$

where $\varepsilon_t$ is at least a strictly stationary unit conditional variance martingale difference sequence. The model allows the slowly varying unconditional variance matrix $H(\cdot)$ to be unknown along with the short-run dynamics captured through $G_t$, which is itself a unit variance multivariate GARCH process, for example the BEKK model with parameter matrices $A$, $B$ and $u_t = G_t^{1/2} \varepsilon_t$. Feng [68] proposes an alternative specification called the local dynamic conditional correlation (LDCC) model, where the total covariance matrix is decomposed into conditional and unconditional components. Specifically, $\sigma^L_{it} = \sigma^L_i(t/T)$, while $\sigma^{2C}_{it}$ follows a parametric unit GARCH-type process. As in parametric DCC models, one first proceeds by estimating the univariate models and then uses the standardized residuals to estimate the model for $R_t$.

General Properties
The properties of the parametric strong GARCH(1,1) model are well described in Nelson [69]. The necessary and sufficient condition for weak stationarity is that $\beta + \gamma < 1$ and $E\varepsilon_t^2 < \infty$, while the process has a unique strictly stationary solution if and only if $E \ln(\beta + \gamma \varepsilon_t^2) < 0$, when $\omega > 0$. Bougerol and Picard [70] extended the study and found the necessary and sufficient conditions for strict stationarity of the GARCH($p$,$q$) process. In Giraitis et al. [71], a broad class of nonnegative ARCH($\infty$) processes was studied and sufficient conditions for the existence of a stationary solution were established. Ling and McAleer [72] established the necessary and sufficient conditions for the existence of the higher order moments of the GARCH($p$,$q$) model. See Lindner [73] for a very nice review of stationarity, mixing, distributional properties, and moments of GARCH($p$,$q$) processes.
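The two conditions can be compared numerically. The sketch below checks $\beta + \gamma < 1$ directly and approximates $E\ln(\beta + \gamma\varepsilon_t^2)$ by Monte Carlo, assuming standard normal errors; the sample size, seed, and parameter values are arbitrary choices for illustration.

```python
import math
import random


def weakly_stationary(beta, gamma):
    """Weak stationarity condition beta + gamma < 1 (unit-variance errors)."""
    return beta + gamma < 1.0


def log_moment_mc(beta, gamma, n=20000, seed=0):
    """Monte Carlo estimate of E ln(beta + gamma * eps^2), eps ~ N(0, 1).
    A negative value indicates strict stationarity (for omega > 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        total += math.log(beta + gamma * eps * eps)
    return total / n


# IGARCH-type case: not weakly stationary, yet the log-moment is negative.
print(weakly_stationary(0.5, 0.5), log_moment_mc(0.5, 0.5))
# A case with a large ARCH coefficient, where the log-moment turns positive.
print(weakly_stationary(0.9, 1.5), log_moment_mc(0.9, 1.5))
```

By Jensen's inequality $E\ln(\beta + \gamma\varepsilon_t^2) < \ln(\beta + \gamma)$, so the strict stationarity region is strictly larger than the weak one; the case $\beta + \gamma = 1$ illustrates this gap.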
Masry and Tjøstheim [19] proved that under specific conditions the nonparametric ARCH model is strongly mixing. Giraitis et al. [43] surveyed the ARCH($\infty$) model and its properties. The ARCH($\infty$) process can be expanded into its Volterra representation. With $b_j$ denoting the ARCH($\infty$) coefficients, the necessary and sufficient condition for the existence of a unique stationary solution with $E y_t^2 < \infty$ is $\sum_{j=1}^{\infty} b_j < 1$. The necessary and sufficient conditions for the existence of a stationary solution with $E y_t^4 < \infty$ are also provided in the paper. An alternative approach to the problem was discussed by Kazakevičius et al. [74]. Kazakevičius and Leipus [75] obtained a sufficient condition for the existence of a stationary ARCH($\infty$) model without moment conditions. FIGARCH has no stationary solution with finite second moment, while in Douc et al. [76], a nonzero stationary solution was provided by adding additional assumptions on the distribution of $\varepsilon_t$.
Dahlhaus and Subba Rao [45] investigated the nonstationary class of ARCH($\infty$) processes with time-varying coefficients. Such time-varying ARCH processes can be locally approximated by stationary ARCH processes, hence the name "locally stationary ARCH($\infty$) process". In addition, they provided some sufficient conditions to ensure that this process is $\alpha$-mixing.
The class of copula-based, strictly stationary, semiparametric first-order Markov models can be described by the copula dependence parameter and the invariant one-dimensional marginal distribution. For this class of models, Chen and Fan [77] showed that the $\beta$-mixing temporal dependence measure is completely determined by the properties of the copula function. Beare [78] provided sufficient conditions for geometric $\beta$-mixing in terms of copulas without any tail dependence. Chen et al. [79] showed that many widely used tail-dependent copula-based Markov models are geometrically ergodic and hence geometrically $\beta$-mixing.

Estimation
For the parametric estimation of the GARCH model, the quasi-maximum likelihood estimator (QMLE) is generally consistent and has a limiting normal distribution provided only that the conditional mean and the conditional variance are correctly specified, that is, semistrong rather than strong GARCH is required and conditional normality is not required; see [9]. Weiss [8] was the first paper to study the asymptotic properties of the ARCH MLE, showing that the MLE is consistent and asymptotically normal, requiring that $y_t$ has finite fourth moments. In Lumsdaine [80], consistency and asymptotic normality were proved for GARCH(1,1) and IGARCH(1,1), with the main auxiliary assumption that $\varepsilon_t$ is i.i.d., symmetric and unimodal, with $E(\varepsilon_t^{32}) < \infty$. Lee and Hansen [81] proved the consistency and asymptotic normality of the QMLE for the strictly stationary semistrong GARCH(1,1) model with errors whose conditional fourth moment is uniformly bounded. Hall and Yao [82] assumed weak stationarity and showed that asymptotic normality holds if $E(\varepsilon_t^4) < \infty$ in the strong GARCH case, but also established non-normal limiting behaviour when $E(\varepsilon_t^4) = \infty$ under weaker moment conditions. Jensen and Rahbek [83] were the first to consider the asymptotic theory of the QMLE for the nonstationary strong GARCH model. The likelihood-based estimator for the parameters in GARCH(1,1) is consistent and asymptotically normal in the entire parameter space regardless of whether the process is stationary or explosive, as long as a finite conditional fourth moment is assumed. The asymptotic theory for the QMLE becomes more complicated in the GARCH($p$,$q$) case, so other estimation methods have to be considered, including adaptive estimation for ARCH models [11] and Whittle estimation for a general ARCH($\infty$) case in [84]. Least absolute deviation estimators (LADE) [85] are consistent when $\varepsilon_t^2$ has median one. The estimator is asymptotically normal under weak moment conditions. Linton et al. [86] estimated a nonstationary semistrong GARCH(1,1) model with heavy-tailed errors and proved that the LADE converges at the rate $\sqrt{n}$ to a normal distribution under a very mild moment condition for the errors. The result implies that the LADE is always asymptotically normal regardless of whether there is a stationary solution or not, and even when the errors are heavy-tailed. The LADE is therefore more appealing when the errors are heavy-tailed.
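The Gaussian quasi-likelihood criterion underlying the QMLE can be sketched as follows for GARCH(1,1). The objective is the average of $\tfrac12(\ln\sigma_t^2 + y_t^2/\sigma_t^2)$ with additive constants dropped; initializing the recursion at the sample variance and the simulated data are assumptions of the example, and a real application would pass the objective to a numerical optimizer rather than compare two parameter vectors.

```python
import math
import random


def gaussian_quasi_nll(params, ys):
    """Average negative Gaussian quasi-log-likelihood of GARCH(1,1),
    up to an additive constant. params = (omega, beta, gamma)."""
    omega, beta, gamma = params
    if omega <= 0 or beta < 0 or gamma < 0:
        return float("inf")
    # Assumption: initialize the variance recursion at the sample variance.
    sigma2 = sum(y * y for y in ys) / len(ys)
    nll = 0.0
    for y in ys:
        nll += 0.5 * (math.log(sigma2) + y * y / sigma2)
        sigma2 = omega + beta * sigma2 + gamma * y * y
    return nll / len(ys)


def simulate_garch11(n, omega, beta, gamma, seed=0):
    """Simulate a strong GARCH(1,1) path with N(0, 1) errors."""
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - beta - gamma)
    ys = []
    for _ in range(n):
        y = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        ys.append(y)
        sigma2 = omega + beta * sigma2 + gamma * y * y
    return ys


true_params = (0.1, 0.8, 0.1)   # hypothetical data-generating values
wrong_params = (0.1, 0.1, 0.1)  # implies far too little volatility
ys = simulate_garch11(3000, *true_params)
nll_true = gaussian_quasi_nll(true_params, ys)
nll_wrong = gaussian_quasi_nll(wrong_params, ys)
print(nll_true, nll_wrong)
```

The criterion is well defined without any normality assumption on the data, which is why the resulting estimator can remain consistent under the semistrong conditions discussed above.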
When the error density is unknown, Engle and Gonzàlez-Rivera [10] proposed an estimator, based on a nonparametric density estimate, that is more efficient than the QMLE. But this semiparametric estimator does not seem to capture the total potential gain in efficiency. Linton [11] and Drost and Klaassen [12] developed kernel-based estimates and established the semiparametric efficiency bounds for estimation of the parameters. Sun and Stengos [87] considered a semiparametric efficient adaptive estimator of an asymmetric GARCH model by a two-step method. The unknown density function of the disturbances was first estimated by kernel methods, and then the Newton-Raphson method was applied to obtain a more efficient estimator than the QMLE.
Härdle and Tsybakov [17] applied a local linear (polynomial) fit to estimate the volatility function when both the conditional mean and conditional variance are unknown. Their main result is the pointwise joint asymptotic normality of the local polynomial (LP) estimators of the conditional mean and variance. Masry and Tjøstheim [19] estimated nonparametric ARCH models using the Nadaraya-Watson kernel estimator and established strong consistency along with sharp rates of convergence under mild regularity assumptions. Asymptotic theory was also provided. Engle and Rangel [47] constructed a nonstationary GARCH model by using spline methodology. This model can be seen as a special case of the class of locally stationary processes we mentioned before. Dahlhaus and Subba Rao [45] estimated the parameters of the nonstationary ARCH($\infty$) process by a weighted quasi-maximum likelihood method on a segment, but the estimator is biased due to nonstationarity in the segment. A drawback of the approach is that the bias term can only be expressed under the existence of the 12th moment. Under weaker conditions, the estimator is still asymptotically normal, but the explicit form of the bias cannot be evaluated.
Kim and Linton 88 investigated a semiparametric IGARCH(1,1) model, which nests the standard IGARCH(1,1) model and allows more flexibility in functional form. The estimation strategy was based on the nonparametric instrumental variable method. They established the optimal convergence rate and the uniform convergence rate of the nonparametric part, and the consistency of the parametric part. Asymptotic normality at rate √T can still be obtained under some conditions, but it is not guaranteed.
Hafner and Rombouts 53 proposed a semiparametric multivariate volatility model, which is the multivariate version of Engle and Gonzàlez-Rivera 10 . The paper characterizes the semiparametric lower bound for estimation and suggests two alternative semiparametric estimators: the first applies to general innovation densities, whereas the second is based on the assumption of sphericity. Yet another possibility for dealing with the error density is to work with copulae. Chen and Fan 56 constructed simple estimators of the copula and other parameters. They established the large sample properties of the estimators under a misspecified parametric copula, showing that the estimators of the unknown dynamic parameters and of the marginal distributions are still consistent, while the estimator of the copula dependence parameter converges to a pseudo-true value in this case. Chen and Fan 77 modelled a univariate version of this class of semiparametric models, but the simulation study of Chen et al. 79 shows that their two-step estimators are inefficient, and even biased, when the time series has strong tail dependence. The latter paper considers efficient estimation by a sieve MLE method, which was first introduced by Chen et al. 89 . In addition, for nonparametric copula estimation, Fermanian et al. 90 constructed an empirical copula estimator, while Fermanian and Scaillet 91 and Chen and Huang 92 proposed kernel smoothing methods.
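As an illustration of the nonparametric copula estimators mentioned above, the following sketch computes the classical bivariate empirical copula, C_n(u, v) = n⁻¹ Σ 1{F_n(X_i) ≤ u, G_n(Y_i) ≤ v}, where F_n and G_n are the empirical distribution functions (ranks divided by n). It assumes there are no ties, and in a volatility application the inputs would typically be standardized residuals rather than raw returns. The function names are hypothetical and the sketch does not reproduce the specific estimators of 90, 91 or 92.

```python
def empirical_cdf_values(x):
    """Return F_n(x_i) = rank(x_i) / n for each observation (ties assumed absent)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    values = [0.0] * n
    for rank, index in enumerate(order, start=1):
        values[index] = rank / n
    return values


def empirical_copula(x, y, u, v):
    """Share of observations whose rank-transformed pair lies in [0, u] x [0, v]."""
    fx = empirical_cdf_values(x)
    gy = empirical_cdf_values(y)
    count = sum(1 for a, b in zip(fx, gy) if a <= u and b <= v)
    return count / len(x)


# Perfect positive dependence gives the upper Frechet bound min(u, v);
# perfect negative dependence gives the lower bound max(u + v - 1, 0).
x = [1.0, 2.0, 3.0, 4.0]
print(empirical_copula(x, [10.0, 20.0, 30.0, 40.0], 0.5, 0.5))
print(empirical_copula(x, [40.0, 30.0, 20.0, 10.0], 0.5, 0.5))
```

Kernel smoothing methods replace the indicator functions in this step function by smooth kernel distribution functions, which yields a differentiable copula estimate.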
Hafner and Linton 67 estimated a multivariate multiplicative volatility model that allows for nonstationarity and can be regarded as a generalization of Engle and Rangel 47 . Their estimators of the unknown parameters of the low- and high-frequency volatility components are based on kernel methods and are then improved by Gaussian likelihood estimation, which takes full account of the dependence and nonstationary structure. Feng 68 estimated the LDCC model by multivariate kernel regression, introducing a multivariate k-NN method to address the "curse of dimensionality" problem; asymptotic properties of the estimators were also discussed.

Conclusion
In conclusion, there have been many advances in the application of nonparametric methods to the study of volatility, and many difficult problems have been overcome. These methods have offered new insights into functional form, dependence, tail thickness, and nonstationarity, all of which are fundamental to the behaviour of asset returns. They can be used by themselves to estimate quantities of interest such as value at risk. They can also be used as a specification device, enabling the practitioner to see which features of the data their parametric model fits well.