Long Memory Process in Asset Returns with Multivariate GARCH innovations

The main purpose of this paper is to use the multivariate GARCH (MGARCH) framework to model the volatility of a multivariate process exhibiting long-term dependence in stock returns. More precisely, long-term dependence is examined in the first conditional moment of US stock returns through a multivariate ARFIMA process, and the time-varying feature of volatility is explained by MGARCH models. An empirical application to the returns series is carried out to illustrate the usefulness of our approach. The main results confirm the presence of the long memory property in the conditional mean of all stock returns.

Keywords: Forecasting, Long memory, Multivariate GARCH, Stock Returns.

JEL classification: C22, C51, C53, G12

GREQAM, Université de la Méditerranée, France, tel: +33.4.91.14.07.70, fax: +33.4.91.90.02.27, Email: imene.mootamri@etumel.univmed.fr


Introduction
Long memory has in recent years become a very widespread phenomenon in the modelling of economic and financial time series. It can be defined in terms of the persistence of the autocorrelations, which decay at a very slow hyperbolic rate. A large number of papers demonstrate the existence of long memory in financial economics. Peters (1991) and Greene and Fielitz (1977) found evidence of long-term positive dependence in stock returns by applying the rescaled range (R/S) statistic proposed by Hurst (1951) and later modified by Lo (1991). Similar evidence on German stock returns is given by Lux (1996). However, Jegadeesh (1990) challenged the notion of mean reversion for stock returns. He reported negative first-order serial correlation and significant positive serial correlation at longer lags using monthly returns for individual stocks. Kim et al. (1991) also challenged the findings of mean reversion. Their findings for post-war data showed persistence in returns. Volos and Siokis (2006) examined the presence of long-range dependence in a sample of 34 stock index returns using the procedures of Geweke and Porter-Hudak (1983) and Robinson (1995b). Their results provided significant and robust evidence of fractional dynamics in most major and small stock markets over the sample periods. Goetzmann (1993) applied R/S tests, which provided some evidence that London Stock Exchange and New York Stock Exchange prices may exhibit long-term memory. Some authors found significant and robust evidence of positive long-term persistence in the Greek stock market (Barkoulas et al. (2000)), in the Brazilian stock market (Cajueiro and Tabak (2005) and Cavalcante et al. (2004)), and in Finnish stock market return data (Tolvi (2003)). Sadique and Silvapulle (2001) examined the presence of long memory in weekly stock returns of seven countries and found evidence of long-term dependence in four of them. Moreover, Cajueiro and Tabak (2004) found that the markets of Hong Kong, Singapore and China exhibited long-range dependence, while Mills (1993) and Zhuang et al. (2000) investigated British stock returns and found little evidence of long-range dependence. Limam (2003) analyzed stock index returns in 14 markets and concluded that long memory tends to be associated with thin markets. Huang and Yang (1999) applied the modified R/S technique to intraday data and provided evidence of long memory in both the New York Stock Exchange and Nasdaq indices.
All these empirical works are based on univariate models. However, for many important questions in the empirical literature, multivariate settings are preferable. For example, suppose one is considering a portfolio of many assets. The return of the portfolio can be directly computed if one knows the asset shares and the return of each asset (Brooks et al. (2003)). Granger and Joyeux (1980) proposed the univariate fractionally integrated autoregressive moving average (ARFIMA) model to explain the long memory property in the conditional mean. We therefore consider in this study the multivariate ARFIMA model, in which the fractional integration parameters determine the long memory properties of the data. Apart from the presence of long-term dependence in the conditional mean, we take into account the volatility properties of the series. More precisely, the time-varying feature of volatility is explained by multivariate generalized autoregressive conditionally heteroscedastic (MGARCH) models. Although univariate descriptions are useful and important, various financial operations require a multivariate framework, since high volatilities are often observed in the same time periods across different assets. The development of MGARCH models from the original univariate specifications represents a major step forward in the modelling of time series: these models permit time-varying conditional covariances as well as variances. The main contribution of this paper is thus to incorporate the MGARCH framework to model the volatility of a multivariate process exhibiting long-term dependence and slow decay in stock returns. More precisely, we examine the long memory property in the first conditional moment of daily stock returns; the robustness of the results is also investigated by allowing the innovations to be generated by an MGARCH process. We find that the long memory property exists in the conditional mean of the Nasdaq 100, New York Stock Exchange (NYSE) composite and Russell 3000 stock returns.
The rest of this paper is organized as follows. We briefly review the multivariate models in the next Section. Section 3 outlines the quasi-maximum likelihood estimation and testing procedures for the models, which are applied to US stock returns. Section 4 describes multi-step forecasting with some MGARCH models, and Section 5 presents the data used and provides empirical results. The paper ends with a short concluding section.

Econometric framework
The following is a brief description of the time series models used in this study. Vector ARFIMA models are discussed in detail in Sowell (1989b) and Luceño (1996). Tsay (2007) provided a more detailed description of the estimation of vector ARFIMA models. He suggested a conditional likelihood Durbin-Levinson algorithm to efficiently evaluate the conditional likelihood function of vector ARFIMA processes. Hosoya (1996) and Nielsen (2004) proposed a class of maximum likelihood estimators and tests. Lobato (1999) analyzed a two-step estimator of the long memory parameters of a vector process by using a semiparametric version of the multivariate Gaussian likelihood function in the frequency domain.
Consider a vector ARFIMA process:

$$\Phi(L)\,\Delta(L)\,(y_t - \mu) = \Theta(L)\,\varepsilon_t,$$

where $y_t = (y_{1t}, \ldots, y_{nt})'$, $t = 1, \ldots, T$, is an $n$-dimensional vector of observations, $\mu = (\mu_1, \ldots, \mu_n)'$ is the mean vector of the process, and $\Phi(L) = I_n - \Phi_1 L - \cdots - \Phi_p L^p$ and $\Theta(L) = I_n + \Theta_1 L + \cdots + \Theta_q L^q$ are $(n \times n)$ matrix polynomials in the lag operator $L$ that satisfy the usual stationarity and invertibility conditions respectively, i.e., the roots of $\det(\Phi(L))$ and $\det(\Theta(L))$ lie outside the unit circle. $I_n$ is the identity matrix of order $n$ and

$$\Delta(L) = \operatorname{diag}\left[(1-L)^{d_1}, \ldots, (1-L)^{d_n}\right].$$

The fractional differencing operator is defined by the binomial expansion

$$(1-L)^{d_i} = \sum_{j=0}^{\infty} \frac{\Gamma(j - d_i)}{\Gamma(j+1)\,\Gamma(-d_i)}\, L^j, \qquad i = 1, \ldots, n,$$

where $\Gamma(\cdot)$ is the Gamma function. Hence, the long-range dependence between observations is ultimately determined only by the fractional differencing parameter. These characteristics can be seen in the shapes of the spectral density and the autocorrelation function. Indeed, if $|d_i| < 0.5$ for $i = 1, \ldots, n$, the multivariate process is stationary and invertible. If $0 < d_i < 0.5$, the process is characterized by strong positive dependence between observations. In the frequency domain, this shows up as a spectral density that increases without bound at the zero frequency. In the time domain, the persistence is indicated by the slow decline of the autocorrelation functions, which are not absolutely summable. In this case, $y_t$ is said to have long memory. If $-0.5 < d_i < 0$, the process exhibits negative dependence between observations. In the frequency domain, this is indicated by the decline of the spectral density to zero as the frequency approaches zero. In the time domain, the antipersistence is indicated by the rapid decline of the autocorrelation functions.
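The binomial expansion of the fractional differencing operator can be illustrated numerically. The following is a minimal sketch, assuming NumPy and a filter truncated at the sample length; the function names `frac_diff_weights` and `frac_diff` are illustrative and do not come from the paper. It uses the recursion $\pi_j = \pi_{j-1}(j-1-d)/j$, which follows from the Gamma-function ratio in the expansion.

```python
import numpy as np

def frac_diff_weights(d, n_terms):
    """Coefficients pi_j of (1 - L)^d = sum_j pi_j L^j for j = 0..n_terms-1."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for j in range(1, n_terms):
        # Ratio of consecutive Gamma-function terms: pi_j / pi_{j-1} = (j - 1 - d) / j
        w[j] = w[j - 1] * (j - 1 - d) / j
    return w

def frac_diff(x, d):
    """Apply the truncated filter (1 - L)^d to a one-dimensional series x."""
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, len(x))
    # y_t = sum_{j=0}^{t} pi_j * x_{t-j}; pre-sample values are treated as zero
    return np.array([w[: t + 1] @ x[t::-1] for t in range(len(x))])

# Hypothetical usage: filter each of the n series with its own d_i
# filtered = np.column_stack([frac_diff(y[:, i], d[i]) for i in range(y.shape[1])])
```

For $0 < d < 0.5$ the weights decay hyperbolically rather than geometrically, which is the slow decay described above; with $d = 1$ the filter reduces to the ordinary first difference.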
In this paper, we therefore consider a vector ARFIMA model, which generates the long memory property in the first conditional moment and which allows its innovations to be generated by an MGARCH process. As an illustration, the proposed model is applied to the daily stock returns of the Nasdaq, New York Stock Exchange and Russell indices.
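The most commonly employed distribution in the literature is the multivariate normal. Thus, we assume that the stochastic vector process $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{nt})'$ is conditionally multivariate normal with zero expected value and covariance matrix $H_t$:

$$\varepsilon_t \mid \Omega_{t-1} \sim N(0, H_t),$$

with $E_{t-1}(\varepsilon_t) = 0$. We denote by $\Omega_{t-1}$ the information set generated by the observed series until time $t-1$. Equivalently, $\varepsilon_t = H_t^{1/2} z_t$, where $H_t^{1/2}$ is an $(n \times n)$ positive definite matrix such that $H_t = E_{t-1}(\varepsilon_t \varepsilon_t')$ and $z_t$ is an independent identically distributed random vector error process such that $E(z_t) = 0$ and $E(z_t z_t') = I_n$.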
As noted by Silvennoinen and Teräsvirta (2008), the specification of an MGARCH model should be flexible enough to represent the dynamics of the conditional variances and covariances and, as the number of parameters in an MGARCH model often increases rapidly with the dimension of the model, parsimonious enough to allow for easy estimation. Another feature that needs to be taken into account in the specification is that the conditional covariance matrices should be positive definite.

Generalizations of the univariate GARCH model: VEC and BEKK Models
The VEC(p, q) model of Bollerslev et al. (1988) is given by:

$$h_t = C + \sum_{j=1}^{q} A_j\, e_{t-j} + \sum_{j=1}^{p} B_j\, h_{t-j},$$

where $h_t = \operatorname{vech}(H_t)$ [8], $H_t$ is the $(n \times n)$ conditional covariance matrix, $e_t = \operatorname{vech}(\varepsilon_t \varepsilon_t')$, $C$ is an $\frac{n(n+1)}{2} \times 1$ parameter vector, and $A_j$ and $B_j$ are parameter matrices. The main disadvantage of this model is the number of parameters, which is equal to $\frac{n(n+1)}{2}\left[1 + (p+q)\frac{n(n+1)}{2}\right]$ and which grows very quickly as the number of variables increases. Thus, the estimation of the parameters is difficult. Furthermore, the positive definiteness of $H_t$ is not guaranteed. To overcome this problem, Bollerslev et al. (1988) suggest the diagonal VEC [9] model, in which $A_j$ and $B_j$ are assumed to be diagonal, each element $h_{ijt}$ depending only on its own lag and on the previous value of $\varepsilon_{it}\varepsilon_{jt}$. This restriction reduces the number of parameters to $\frac{n(n+5)}{2}$. But even under this diagonality assumption, large-scale systems are still highly parameterized and difficult to work with in practice.
One of the most general forms, proposed in Engle and Kroner (1995), is the BEKK (Baba-Engle-Kraft-Kroner) representation. This formulation uses a general quadratic form for the conditional covariance equation, which eliminates the problem of ensuring the positive definiteness of the conditional covariance matrix. The BEKK(p, q, K) representation for the $(n \times n)$ conditional covariance matrix $H_t$ takes the form:

$$H_t = C'C + \sum_{k=1}^{K}\sum_{j=1}^{q} A_{kj}'\,\varepsilon_{t-j}\varepsilon_{t-j}'\,A_{kj} + \sum_{k=1}^{K}\sum_{j=1}^{p} B_{kj}'\,H_{t-j}\,B_{kj}, \qquad (4)$$

where the summation limit $K$ determines the generality of the process, $C$ is an upper triangular $(n \times n)$ matrix, and $A_{kj}$ and $B_{kj}$ are both $(n \times n)$ parameter matrices. This representation guarantees that $H_t$ is positive definite. Although the form of the above model is quite general, especially when $K$ is reasonably large, it suffers from overparametrization [10].

[8] vech is the operator that stacks the columns of the lower triangular part of $H_t$ into an $\frac{n(n+1)}{2} \times 1$ column vector.
[9] See also Bauwens et al. (2006) and Silvennoinen and Teräsvirta (2008).
[10] See Engle and Kroner (1995) for more discussion of the identification problem of this model.
Unlike in the VEC model, the parameters of the BEKK model do not directly represent the impact of the different lagged terms on the elements of $H_t$. The number of parameters in the BEKK model is equal to $(p+q)Kn^2 + \frac{n(n+1)}{2}$ and is still quite large. Thus, as already mentioned, a problem with these models is that the number of parameters can increase very rapidly as the dimension of the process increases, which creates difficulties in the estimation of the models due to several matrix inversions. It is therefore typically assumed that $p = q = K = 1$ in applications of this model. A further simplified version of (4), in which $A_{kj}$ and $B_{kj}$ are diagonal matrices, has sometimes appeared in applications. This is the diagonal BEKK model, in which the number of parameters is equal to $(p+q)Kn + \frac{n(n+1)}{2}$. This model is also a DVEC model but it is less general, although it is guaranteed to be positive definite while the DVEC is not.
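To make the quadratic form of the BEKK recursion concrete, the sketch below filters a conditional covariance path for the case $p = q = K = 1$. It is an illustration only: the function name `bekk11_filter`, the starting value `H0` and the NumPy implementation are assumptions, not the estimation code used in this paper.

```python
import numpy as np

def bekk11_filter(eps, C, A, B, H0):
    """Conditional covariances of a BEKK(1,1,1) model.

    eps : (T, n) array of innovations
    C   : (n, n) upper triangular matrix; A, B : (n, n) parameter matrices
    H0  : (n, n) starting value for the recursion (an assumption of this sketch)
    Returns a (T, n, n) array with H[t] = C'C + A' e_{t-1} e_{t-1}' A + B' H[t-1] B.
    """
    T, n = eps.shape
    H = np.empty((T, n, n))
    H[0] = H0
    CC = C.T @ C
    for t in range(1, T):
        e = eps[t - 1][:, None]  # previous innovation as a column vector
        H[t] = CC + A.T @ (e @ e.T) @ A + B.T @ H[t - 1] @ B
    return H
```

Because every term is of the form $M'XM$ with $X$ positive semi-definite, and $C'C$ is positive definite when $C$ has full rank, each $H_t$ produced this way is positive definite without further parameter restrictions. Passing diagonal `A` and `B` gives the diagonal BEKK case.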

Nonlinear combinations of univariate GARCH models: CCC and DCC Models
Bollerslev (1990) proposed a class of MGARCH models in which the conditional correlation matrix is time-invariant, so that the conditional covariances are proportional to the product of the corresponding conditional standard deviations. This is the so-called Constant Conditional Correlation (CCC) model. The restriction greatly reduces the number of unknown parameters and thus simplifies estimation. The conditional covariance matrix $H_t$ can then be decomposed as:

$$H_t = D_t R D_t = \left(\rho_{ij}\sqrt{h_{iit}\,h_{jjt}}\right),$$

where $h_{iit}$ can be defined as any univariate GARCH model, and $R$ is an $(n \times n)$ symmetric positive definite matrix of conditional correlations with typical element $\rho_{ij}$, with $\rho_{ii} = 1$, $\forall\, i = 1, \ldots, n$. $D_t$ denotes the $(n \times n)$ diagonal matrix with typical element $\sqrt{\operatorname{var}_{t-1}(\varepsilon_{it})} = \sqrt{h_{iit}}$. The CCC-GARCH model assumes that the conditional correlations are constant, $\rho_{ijt} = \rho_{ij}$, so that the temporal variation in $H_t$ is determined solely by the time-varying conditional variances of the elements of $\varepsilon_t$. As long as each conditional variance is positive [12], the CCC model guarantees that the resulting conditional covariance matrices are positive definite.
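The CCC decomposition amounts to scaling a fixed correlation matrix by time-varying standard deviations. A minimal sketch, assuming the univariate conditional variances have already been obtained from some GARCH fit (the function name `ccc_covariance` is illustrative):

```python
import numpy as np

def ccc_covariance(h, R):
    """Build H_t = D_t R D_t for every t.

    h : (T, n) array of univariate conditional variances h_iit
    R : (n, n) constant conditional correlation matrix
    Returns a (T, n, n) array of conditional covariance matrices.
    """
    s = np.sqrt(np.asarray(h, dtype=float))  # conditional standard deviations
    # Element (i, j) at time t is s[t, i] * R[i, j] * s[t, j]
    return s[:, :, None] * R[None, :, :] * s[:, None, :]
```

For example, with conditional variances 4 and 9 and a correlation of 0.5, the implied conditional covariance is $0.5 \times 2 \times 3 = 3$.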
Despite the simplicity of this model, the assumption that the conditional correlations are constant may seem too restrictive and unrealistic. Engle (2002) proposed a new class of estimators that preserves the ease of estimation of Bollerslev's constant correlation model yet allows for non-constant correlations. The Dynamic Conditional Correlation (DCC)-GARCH model [13] preserves the parsimony of univariate GARCH models of individual assets' volatility and adds a simple GARCH-like process for the correlations. Further, the number of parameters to be estimated by maximum likelihood is considerably smaller than in both the VEC and the BEKK models. Tse and Tsui (2002) have also proposed a dynamic correlation MGARCH model; however, no attempt has been made there to allow for separate estimation of the univariate GARCH processes and the dynamic correlation estimator.
The DCC model of Engle (2002) computes the time-varying conditional correlation matrix from the standardized residual series:

$$H_t = D_t R_t D_t, \qquad R_t = Q_t^{*-1}\, Q_t\, Q_t^{*-1},$$

where $Q_t^{*} = (\operatorname{diag} Q_t)^{1/2}$ and $Q_t$ is the $(n \times n)$ symmetric positive definite matrix given by:

$$Q_t = (1 - \theta_1 - \theta_2)\,\bar{Q} + \theta_1\, u_{t-1}u_{t-1}' + \theta_2\, Q_{t-1}.$$

$\theta_1$ and $\theta_2$ are non-negative scalar parameters satisfying $\theta_1 + \theta_2 < 1$. $\bar{Q}$ is the $(n \times n)$ unconditional covariance matrix of the standardized residuals resulting from the first-step estimation, $u_t = (u_{1t}, u_{2t}, \ldots, u_{nt})'$ is the standardized residual vector [14], and $Q_t = (q_{ijt})$. The typical element of $R_t$ is of the form:

$$\rho_{ijt} = \frac{q_{ijt}}{\sqrt{q_{iit}\,q_{jjt}}}.$$

A slightly different formulation was suggested by Tse and Tsui (2002):

$$R_t = (1 - \theta_1 - \theta_2)\,R + \theta_1\,\Psi_{t-1} + \theta_2\,R_{t-1},$$

where $\theta_1$ and $\theta_2$ are non-negative scalar parameters such that $\theta_1 + \theta_2 < 1$. Here $R = (\rho_{ij})$ is a time-invariant $(n \times n)$ symmetric positive definite parameter matrix of conditional correlations with ones on the diagonal, and $\Psi_{t-1}$ is the $(n \times n)$ correlation matrix of the past $m$ standardized residual vectors. The positive definiteness of $R_t$ is ensured by construction if $R$ and $\Psi_{t-1}$ are positive definite, which requires $m \geq n$. The typical element of $\Psi_{t-1}$ is of the form:

$$\psi_{ij,t-1} = \frac{\sum_{h=1}^{m} u_{i,t-h}\,u_{j,t-h}}{\sqrt{\left(\sum_{h=1}^{m} u_{i,t-h}^2\right)\left(\sum_{h=1}^{m} u_{j,t-h}^2\right)}},$$

where $u_{it} = \varepsilon_{it}/\sqrt{h_{iit}}$, for $i = 1, \ldots, n$. The number of parameters in both DCC models is equal to $\frac{(n+1)(n+4)}{2}$ if the conditional variances are specified as GARCH(1,1). To check whether the assumption of constant conditional correlations is empirically relevant, one can test $\theta_1 = \theta_2 = 0$. A drawback of the DCC models is that $\theta_1$ and $\theta_2$ are scalars, so that all the conditional correlations obey the same dynamics. This is necessary to ensure that $R_t$ is positive definite through sufficient conditions on the parameters.

[12] See Nelson and Cao (1992) for a discussion of positivity conditions for $h_{it}$ in univariate GARCH(p, q) models.
[13] Various generalizations of the DCC-GARCH model have been proposed in the literature (Pelletier (2006), Billio and Caporin (2006), Cappiello et al. (2006), Silvennoinen and Teräsvirta (2009), ...).
[14] $u_{it} = \varepsilon_{it}/\sqrt{h_{iit}}$, for $i = 1, \ldots, n$.
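The correlation recursion of the Engle (2002) model can be sketched in a few lines. This is a hedged illustration, not the paper's code: the two scalar DCC parameters are called `theta1` and `theta2` here, the recursion is started at the sample covariance of the standardized residuals (an initialization chosen for this sketch), and `dcc_correlations` is a hypothetical name.

```python
import numpy as np

def dcc_correlations(u, theta1, theta2):
    """Conditional correlation matrices R_t implied by the DCC recursion.

    u : (T, n) array of standardized residuals from the first-stage GARCH fits
    theta1, theta2 : non-negative scalars with theta1 + theta2 < 1
    """
    T, n = u.shape
    Q_bar = u.T @ u / T   # unconditional covariance of the standardized residuals
    Q = Q_bar.copy()      # starting value Q_0 = Q_bar (assumption of this sketch)
    R = np.empty((T, n, n))
    for t in range(T):
        if t > 0:
            v = u[t - 1][:, None]
            Q = (1 - theta1 - theta2) * Q_bar + theta1 * (v @ v.T) + theta2 * Q
        q = np.sqrt(np.diag(Q))
        R[t] = Q / np.outer(q, q)  # rescale so that the diagonal equals one
    return R
```

Setting `theta1 = theta2 = 0` makes every `R[t]` equal to the same matrix, which is the constant-correlation case that the restriction discussed above would test.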

Estimation of MGARCH model
Bollerslev (1990) introduced the CCC-GARCH specification, in which univariate GARCH models are estimated for each asset and the correlation matrix is then estimated with the standard closed-form MLE correlation estimator, applied to the residuals standardized by their estimated conditional standard deviations. The assumption of constant correlation makes estimating a large model feasible and ensures that the estimator is positive definite, simply requiring each univariate conditional variance to be non-zero and the correlation matrix to be of full rank. However, the constant correlation estimator, as proposed, does not provide a method to construct consistent standard errors using the multi-stage estimation process. Bollerslev (1990) found the notion of constant correlation plausible, but Tse (2000) and Tse and Tsui (2002) found that it can be rejected for some assets.
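For the maximum likelihood estimation (MLE) of the parameters we assume conditional normality of the errors. The log-likelihood function of the model $\varepsilon_t \mid \Omega_{t-1} \sim N(0, H_t)$ has the following form:

$$L_T(\theta) = -\frac{1}{2}\sum_{t=1}^{T}\left(n\log(2\pi) + \log|H_t| + \varepsilon_t' H_t^{-1}\varepsilon_t\right),$$

where $\theta$ is the vector of all the parameters in the model.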
Estimating the mean parameters simultaneously with the conditional variance parameters would increase efficiency, at least in large samples, but this is computationally more difficult. For this reason, we first estimate the fractionally integrated model for the conditional mean and then treat the filtered series $\Delta(L)\,y_t$ as the data for fitting the MGARCH model. Engle and Sheppard (2001) [15] showed that the log-likelihood can be written as the sum of a mean and volatility part, depending on a set of unknown parameters $\phi$, and a correlation part, depending on the parameters $\psi$, that is:

$$L_T(\theta) = L_V(\phi) + L_C(\phi, \psi).$$

The conditional variance matrix of a DCC model can be expressed as

$$H_t = D_t R_t D_t.$$

The DCC model was designed to allow for two-stage estimation: in the first stage, univariate GARCH models are estimated for each residual series; in the second stage, the residuals, standardized by the standard deviations estimated in the first stage, are used to estimate the parameters of the dynamic correlation. The likelihood used in the first stage is obtained by replacing $R_t$ by the identity matrix in (15). Let $\theta = (\phi_1, \ldots, \phi_n, \psi) = (\phi, \psi)$, where the elements of $\phi_i$ correspond to the parameters of the univariate GARCH model for the $i$-th asset returns, $\phi_i = (c_i, \alpha_i, \beta_i)$, for $i = 1, \ldots, n$.

[15] See also Sheppard (2003).
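The resulting first-stage quasi-likelihood function is:

$$QL_1(\phi \mid \varepsilon_t) = -\frac{1}{2}\sum_{t=1}^{T}\left(n\log(2\pi) + 2\log|D_t| + \varepsilon_t' D_t^{-2}\varepsilon_t\right) = -\frac{1}{2}\sum_{i=1}^{n}\left(T\log(2\pi) + \sum_{t=1}^{T}\left(\log h_{iit} + \frac{\varepsilon_{it}^2}{h_{iit}}\right)\right),$$

which is simply the sum of the log-likelihoods of the individual GARCH models for each of the asset returns. Once the first stage has been estimated, the second stage is estimated using the correctly specified likelihood, conditional on the parameters estimated in the first step:

$$QL_2(\psi \mid \hat{\phi}, \varepsilon_t) = -\frac{1}{2}\sum_{t=1}^{T}\left(n\log(2\pi) + 2\log|D_t| + \log|R_t| + u_t' R_t^{-1} u_t\right),$$

where $u_t = D_t^{-1}\varepsilon_t$ are the univariate GARCH standardized residuals. Since we are conditioning on $\hat{\phi}$, the only portion of the log-likelihood that influences the parameter selection is $\log|R_t| + u_t' R_t^{-1} u_t$. In the estimation of the DCC parameters, it is therefore often easier to exclude the constant terms and simply maximize:

$$QL_2^{*}(\psi \mid \hat{\phi}, \varepsilon_t) = -\frac{1}{2}\sum_{t=1}^{T}\left(\log|R_t| + u_t' R_t^{-1} u_t\right).$$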
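The second-stage objective can be evaluated directly once standardized residuals are available. The sketch below is one possible NumPy implementation under stated assumptions: the DCC recursion is restarted from the sample covariance of the standardized residuals, the two scalar correlation parameters are passed as `params = (theta1, theta2)`, and `dcc_stage2_loglik` is a hypothetical name rather than a function from any library.

```python
import numpy as np

def dcc_stage2_loglik(params, u):
    """Second-stage DCC quasi-log-likelihood without constant terms.

    params : (theta1, theta2), the scalar correlation parameters
    u      : (T, n) array of first-stage standardized residuals
    Returns -0.5 * sum_t (log|R_t| + u_t' R_t^{-1} u_t).
    """
    theta1, theta2 = params
    # Reject parameter values outside the admissible region
    if theta1 < 0 or theta2 < 0 or theta1 + theta2 >= 1:
        return -np.inf
    T, n = u.shape
    Q_bar = u.T @ u / T
    Q = Q_bar.copy()  # starting value (assumption of this sketch)
    ll = 0.0
    for t in range(T):
        if t > 0:
            v = u[t - 1][:, None]
            Q = (1 - theta1 - theta2) * Q_bar + theta1 * (v @ v.T) + theta2 * Q
        q = np.sqrt(np.diag(Q))
        R = Q / np.outer(q, q)
        _, logdet = np.linalg.slogdet(R)
        ll += -0.5 * (logdet + u[t] @ np.linalg.solve(R, u[t]))
    return ll
```

In practice the negative of this function would be handed to a numerical optimizer (for example `scipy.optimize.minimize` with bounds on the two parameters); that step is left out here to keep the sketch dependency-free.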

Test of constant conditional correlations
The first step in modelling time-varying conditional correlations is to test the hypothesis of constant correlations. Testing data for constant correlation has proven to be a difficult problem: testing for dynamic correlation with data that have time-varying volatilities can give misleading conclusions and can lead to rejecting constant correlation when it is true, because of mis-specified volatility models. Bera and Kim (2002) provided a test of the null of constant correlation against the alternative of a dynamic correlation structure. It is an information matrix-type test that, besides constant correlations, examines at the same time various other features of the specified model. An alternative test has been proposed by Longin and Solnik (1995). We are interested in testing the null of constant correlation against the alternative of dynamic conditional correlation via the Lagrange Multiplier (LM) approach suggested by Tse (2000), who tested a null of constant conditional correlation against an ARCH-in-correlation alternative. A rejection of the null hypothesis supports the hypothesis of time-varying correlations. The model is rewritten as:

$$h_{iit} = c_i + \alpha_i\,\varepsilon_{i,t-1}^2 + \beta_i\,h_{ii,t-1}, \qquad \rho_{ijt} = \rho_{ij} + \delta_{ij}\,\varepsilon_{i,t-1}\,\varepsilon_{j,t-1},$$

and the null hypothesis $H_0: \delta_{ij} = 0$ for all $1 \leq i < j \leq n$ is tested against the alternative $H_1: \delta_{ij} \neq 0$ for some $i \neq j$, where the conditional variances are GARCH(1,1). Under $H_0$, the LM statistic is asymptotically distributed as $\chi^2(n(n-1)/2)$. Under the normality assumption, the conditional log-likelihood of the observation at time $t$ is given by:

$$l_t = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|D_t R_t D_t| - \frac{1}{2}\,\varepsilon_t' D_t^{-1} R_t^{-1} D_t^{-1}\varepsilon_t,$$

and the log-likelihood function $L$ is given by $L = \sum_{t=1}^{T} l_t$. Engle and Sheppard (2001) proposed another test of the constant correlation hypothesis in DCC models. The null hypothesis $H_0: R_t = \bar{R}\ \forall\, t$ is tested against the alternative $H_1: \operatorname{vech}(R_t) = \operatorname{vech}(\bar{R}) + \gamma_1 \operatorname{vech}(R_{t-1}) + \cdots + \gamma_p \operatorname{vech}(R_{t-p})$. The test is easy to implement, since $H_0$ implies that the coefficients in the regression $X_t = \gamma_0 + \gamma_1 X_{t-1} + \cdots + \gamma_p X_{t-p} + r_t$ are equal to zero, where $X_t = \operatorname{vech}^u(\hat{u}_t\hat{u}_t' - I_n)$, $\operatorname{vech}^u$ is like the vech operator but selects only the elements above the diagonal, and $\hat{u}_t = \hat{R}^{-1/2}\hat{D}_t^{-1}\hat{\varepsilon}_t$ is the vector of standardized residuals under the null.

Portmanteau Statistics
It is crucial to check the adequacy of the MGARCH specification. Bollerslev (1990) suggested some diagnostics for the constant correlation MGARCH model. He computed the Ljung-Box portmanteau statistic on the cross products of the standardized residuals across different equations, with critical values based on the $\chi^2$ distribution. As mentioned by Tse (2000), diagnostics for conditional heteroscedasticity models applied in the literature can be divided into three categories: portmanteau tests of the Ljung-Box type, residual-based diagnostics and Lagrange multiplier tests. To check the overall significance of the residual autocorrelation, we consider the Ljung-Box portmanteau statistic. The original test was introduced by Box and Pierce (1970) for goodness-of-fit checking of univariate strong ARMA models. Ljung and Box (1978) proposed a slightly different portmanteau test, which is nowadays one of the most popular diagnostic checking tools in ARMA modelling of time series. Following Hosking (1980, 1981b), a multivariate version of the Ljung-Box portmanteau statistic is given by:

$$Q_m(k) = T^2 \sum_{j=1}^{k} (T-j)^{-1}\, \operatorname{tr}\left\{ C_{Y_t}^{-1}(0)\, C_{Y_t}(j)\, C_{Y_t}^{-1}(0)\, C_{Y_t}'(j) \right\},$$

where tr denotes the trace of a matrix, $Y_t = \operatorname{vech}(y_t y_t')$ and $C_{Y_t}(j)$ is the sample autocovariance matrix of order $j$. Under the null hypothesis, $Q_m(k)$ is distributed asymptotically as $\chi^2(p^2 k)$, where $p$ is taken here to be the dimension of the vector to which the statistic is applied. Duchesne and Lalancette (2003) generalized this statistic using a spectral approach and obtained higher asymptotic power by using a different kernel than the truncated uniform kernel used in $Q_m(k)$. This test is also used to detect misspecification in the conditional variance matrix $H_t$, by replacing $y_t$ by $\hat{z}_t = \hat{H}_t^{-1/2}\hat{\varepsilon}_t$. Ling and Li (1997) proposed an alternative portmanteau statistic for multivariate conditional heteroscedasticity. They defined the sample lag-$h$ (transformed) residual autocorrelation as:

$$G(h) = \frac{\sum_{t=h+1}^{T}\left(\hat{\varepsilon}_t'\hat{H}_t^{-1}\hat{\varepsilon}_t - n\right)\left(\hat{\varepsilon}_{t-h}'\hat{H}_{t-h}^{-1}\hat{\varepsilon}_{t-h} - n\right)}{\sum_{t=1}^{T}\left(\hat{\varepsilon}_t'\hat{H}_t^{-1}\hat{\varepsilon}_t - n\right)^2}.$$

Their test statistic is given by $LL(k) = T\sum_{h=1}^{k} G^2(h)$ and is asymptotically distributed as $\chi^2(k)$ under the null hypothesis. In the derivation of the asymptotic results, normality of the innovation process is not assumed. The statistic is thus robust with regard to the distribution choice. Tse and Tsui (2002) showed that there is a loss of information in the transformation $\hat{\varepsilon}_t'\hat{H}_t^{-1}\hat{\varepsilon}_t$ of the residuals, so the test may suffer from a reduction in power.
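The Ling and Li (1997) statistic is straightforward to compute from fitted residuals and conditional covariance matrices. The sketch below follows the standard definition of the transformed residual autocorrelation (the quadratic form $\hat{\varepsilon}_t'\hat{H}_t^{-1}\hat{\varepsilon}_t$ centred at $n$); the function name `ling_li_statistic` is illustrative and NumPy is assumed.

```python
import numpy as np

def ling_li_statistic(eps, H, k):
    """Ling-Li portmanteau statistic LL(k) = T * sum_{h=1}^{k} G(h)^2.

    eps : (T, n) array of residuals
    H   : (T, n, n) array of fitted conditional covariance matrices
    k   : number of lags
    """
    T, n = eps.shape
    # Quadratic forms e_t' H_t^{-1} e_t, centred at their expected value n
    q = np.array([e @ np.linalg.solve(Ht, e) for e, Ht in zip(eps, H)]) - n
    denom = np.sum(q ** 2)
    # Lag-h autocorrelation of the centred quadratic forms
    G = np.array([np.sum(q[h:] * q[:-h]) / denom for h in range(1, k + 1)])
    return T * np.sum(G ** 2)
```

The resulting value would be compared with a $\chi^2(k)$ critical value; large values point to remaining conditional heteroscedasticity in the standardized residuals.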

Multivariate GARCH Prediction
Forecasting is one of the main objectives of multivariate time series analysis. Predictions from multivariate GARCH models can be generated in a similar fashion to predictions from univariate GARCH models [17]. Indeed, for models built on univariate GARCH models, such as the CCC model and the principal component model, the predictions are generated from the underlying univariate GARCH models and then converted to the scale of the original multivariate time series by using the appropriate transformation. This section focuses on prediction from the diagonal BEKK and DCC models.
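To illustrate the prediction of the conditional covariance matrix for multivariate GARCH models, consider the conditional variance equation of the diagonal BEKK(1,1,1) model:

$$H_t = C'C + A'\,\varepsilon_{t-1}\varepsilon_{t-1}'\,A + B'\,H_{t-1}\,B,$$

where $C$, $A$ and $B$ are $(3 \times 3)$ matrices, $C$ is upper triangular and $A$ and $B$ are diagonal matrices. The model (16) is estimated over the time period $t = 1, \ldots, T$. Given the information at time $T$, the one-step-ahead prediction ($k = 1$) of the conditional covariance matrix at time $T+1$ is given by:

$$E_T(H_{T+1}) = C'C + A'\,\varepsilon_T\varepsilon_T'\,A + B'\,H_T\,B.$$

For $k = 2$, since $E_T(\varepsilon_{T+1}\varepsilon_{T+1}') = E_T(H_{T+1})$,

$$E_T(H_{T+2}) = C'C + A'\,E_T(H_{T+1})\,A + B'\,E_T(H_{T+1})\,B,$$

where $E_T(H_{T+1})$ is obtained in the previous step. This procedure can be iterated to obtain $E_T(H_{T+k})$ for $k > 2$. Let us now consider the DCC(1,1)-GARCH(1,1) model given by (14), which can be written as:

$$\begin{cases} h_{iit} = c_i + \alpha_i\,\varepsilon_{i,t-1}^2 + \beta_i\,h_{ii,t-1}, \\ Q_t = (1-\theta_1-\theta_2)\,\bar{Q} + \theta_1\,u_{t-1}u_{t-1}' + \theta_2\,Q_{t-1}, \\ R_t = Q_t^{*-1}\,Q_t\,Q_t^{*-1}, \\ H_t = D_t R_t D_t, \end{cases}$$

where $Q_t^{*} = (\operatorname{diag} Q_t)^{1/2}$, $Q_t$ is the $(n \times n)$ symmetric positive definite matrix, $\theta_1$ and $\theta_2$ are non-negative scalar parameters satisfying $\theta_1 + \theta_2 < 1$, and $\bar{Q}$ is the $(n \times n)$ unconditional covariance matrix of the standardized residuals resulting from the first-step estimation. The $k$-step-ahead forecast of a standard GARCH(1,1) and the DCC evolution process are given by:

$$\begin{cases} E_t(h_{ii,t+k}) = \sum_{j=0}^{k-2} c_i\,(\alpha_i+\beta_i)^j + (\alpha_i+\beta_i)^{k-1}\,h_{ii,t+1}, \\ E_t(Q_{t+k}) = (1-\theta_1-\theta_2)\,\bar{Q} + \theta_1\,E_t(u_{t+k-1}u_{t+k-1}') + \theta_2\,E_t(Q_{t+k-1}), \end{cases}$$

where $E_t(u_{t+k-1}u_{t+k-1}') = E_t(R_{t+k-1})$ and $R_{t+k} = Q_{t+k}^{*-1}\,Q_{t+k}\,Q_{t+k}^{*-1}$. Thus, the $k$-step-ahead forecast of the correlation cannot be directly solved forward to provide a convenient method for forecasting. In examining methods to overcome this difficulty, two forecasts seem to be the most natural, each requiring a different set of approximations. The first technique generates the $k$-step-ahead forecast of $Q_t$ by making the approximation $E_t(u_{t+i}u_{t+i}') \approx E_t(Q_{t+i})$ for $i \in \{1, \ldots, k\}$, so that

$$E_t(Q_{t+k}) = \sum_{i=0}^{k-2}(1-\theta_1-\theta_2)\,\bar{Q}\,(\theta_1+\theta_2)^i + (\theta_1+\theta_2)^{k-1}\,Q_{t+1}$$

and $R_{t+k} = Q_{t+k}^{*-1}\,Q_{t+k}\,Q_{t+k}^{*-1}$. An alternative approximation is that $\bar{Q} \approx \bar{R}$ and $E_t(Q_{t+1}) \approx E_t(R_{t+1})$. Using this approximation, we can forecast $R_{t+k}$ directly using the relationship

$$E_t(R_{t+k}) = \sum_{i=0}^{k-2}(1-\theta_1-\theta_2)\,\bar{R}\,(\theta_1+\theta_2)^i + (\theta_1+\theta_2)^{k-1}\,R_{t+1}.$$

To test which of these approximations performs better, Engle and Sheppard (2001) conducted a Monte Carlo experiment. They concluded that the forecast produced by solving forward for $Q_{t+k}$ was more biased than the forecast obtained by solving $R_{t+k}$ forward, which had better bias properties for almost all correlations and horizons. Also of interest is that both forecasts appear to be unbiased, and are essentially identical, when the unconditional correlation is zero. Although neither technique significantly outperformed the other, a logical choice for forecasting is the method that directly solves $R_{t+k}$ forward, which appears easier to implement. We therefore use this second technique in what follows.

[17] For more details see Moon et al. (2008) and Hlouskova et al. (2009).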
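The two forecasting recursions discussed in this section can be sketched as follows. This is an illustration under assumptions: `bekk11_forecast` iterates the BEKK(1,1,1) prediction using $E_T(\varepsilon\varepsilon') = E_T(H)$ beyond one step, and `dcc_corr_forecast` implements the second approximation (solving the correlation forward directly), with the geometric sum written in closed form. Both names, and the parameter names `theta1` and `theta2` for the scalar DCC parameters, are hypothetical.

```python
import numpy as np

def bekk11_forecast(C, A, B, eps_T, H_T, k):
    """Forecasts E_T(H_{T+1}), ..., E_T(H_{T+k}) from a BEKK(1,1,1) model."""
    CC = C.T @ C
    e = eps_T[:, None]
    F = CC + A.T @ (e @ e.T) @ A + B.T @ H_T @ B  # one-step-ahead forecast
    out = [F]
    for _ in range(1, k):
        # Beyond one step, the expected outer product of innovations equals E_T(H)
        F = CC + A.T @ F @ A + B.T @ F @ B
        out.append(F)
    return np.array(out)

def dcc_corr_forecast(R_bar, R_next, theta1, theta2, k):
    """k-step-ahead correlation forecast obtained by solving R forward.

    R_bar  : unconditional correlation matrix
    R_next : one-step-ahead correlation matrix R_{t+1}
    """
    s = theta1 + theta2
    w = s ** (k - 1)
    # sum_{i=0}^{k-2} (1 - s) s^i = 1 - s^(k-1), so the forecast is a weighted average
    return (1 - w) * R_bar + w * R_next
```

The closed form makes the mean reversion explicit: as $k$ grows, the weight on $R_{t+1}$ decays geometrically and the forecast converges to the unconditional correlation matrix.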

The data
The data employed in this study are taken from Datastream and consist of 4530 daily observations on the Nasdaq 100 (NAS), New York Stock Exchange composite (NYA) and Russell 3000 (RUA) stock indices over the period January 4, 1988 to December 21, 2005. The returns series, denoted by $r_t$, are calculated as $r_t = 100\,\log(y_t / y_{t-1})$, where $y_t$ is the price index. The models used are Full BEKK(1,1), Diagonal BEKK(1,1), CCC(1,1)-GARCH and DCC(1,1)-GARCH, where each of the univariate GARCH models estimated is a GARCH(1,1), and we focus our attention on the modelling of the covariance matrix.
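The return transformation can be written in one line. A minimal sketch, assuming the price index is held in a NumPy-compatible array (the function name `log_returns` is illustrative):

```python
import numpy as np

def log_returns(prices):
    """Percentage log returns r_t = 100 * log(y_t / y_{t-1})."""
    p = np.asarray(prices, dtype=float)
    # The difference of log prices equals the log of the price ratio
    return 100.0 * np.diff(np.log(p))
```

The output has one observation fewer than the price series, since the first price has no predecessor.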
Figures 1 and 2 give the plots of the daily price indices and daily stock returns. We can see that market volatility changes over time, which suggests that a suitable model for the data should have a time-varying volatility structure, as in the GARCH model.

Estimation results
For the above-mentioned indices, the sample mean of returns, the standard deviation of returns, the skewness and kurtosis coefficients, as well as the Jarque-Bera and Ljung-Box (univariate and multivariate versions) tests are all reported in Table 1. Skewness describes asymmetry relative to the normal distribution and can be positive or negative, depending on whether the data points are skewed to the left (negative coefficient) or to the right (positive coefficient) of the data average. Negative skewness means that there is a substantial probability of a big negative return, whereas positive skewness means that there is a greater than normal probability of a big positive return: the tail on the right side is longer than the left one and the bulk of the values lie to the left of the mean. The kurtosis measures the peakedness and the fatness of the tails of a probability distribution. A fat-tailed distribution has higher than normal chances of a big positive or negative realization. For all series, the returns distributions display positive skewness. Moreover, the data show a high degree of excess kurtosis (leptokurtosis), since the kurtosis coefficients are significantly larger than that of a normal distribution, which is three. The returns series appear extremely non-normal based on the Jarque-Bera test. The Ljung-Box test applied to the series and the squared series provides clear evidence against the hypothesis of serial independence of observations and indicates the existence of ARCH effects. The unit root tests of Phillips and Perron (1988), Kwiatkowski et al. (1992) and the augmented Dickey-Fuller test (Dickey and Fuller (1979)) reject the stationarity hypothesis at both the 5% and 1% significance levels for all series. The estimates of the fractional integration parameter $d$ based on the GPH procedure, as well as those of the CCC- and DCC-GARCH models, are given in Tables 2 and 3 respectively. The results reveal clear evidence of long-range dependence for all stock returns, since the $d$ estimates are significantly positive and imply covariance-stationarity of the process. These results are statistically significant and contrast with most studies of long memory in asset returns, which have generally found weak or no evidence of long memory. For example, Henry (2002) investigated long-range dependence in nine international stock index returns and found evidence of long memory in four of them, the German, Japanese, South Korean and Taiwanese markets, but not in the markets of the UK, USA, Hong Kong, Singapore and Australia. Furthermore, Serletis and Rosenberg (2009) analyzed daily data on four US stock market indices and concluded that US stock market returns display antipersistence. Our finding of long memory implies that the behaviour of stock returns is inconsistent with the efficient market hypothesis, which asserts that stock market returns are unpredictable from previous price changes (Narayan (2008) and Narayan and Smyth (2007)).
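For readers unfamiliar with the GPH procedure, the estimator of Geweke and Porter-Hudak (1983) regresses the log periodogram at the lowest Fourier frequencies on $-\log(4\sin^2(\lambda_j/2))$; the slope estimates $d$. The sketch below is a generic illustration, not the code behind Table 2: the bandwidth $m = T^{0.5}$ is a common default chosen here as an assumption, and `gph_estimate` is a hypothetical name.

```python
import numpy as np

def gph_estimate(x, power=0.5):
    """Log-periodogram (GPH) estimate of the fractional integration parameter d.

    x     : one-dimensional series
    power : bandwidth exponent, m = T**power frequencies are used (assumed default)
    Returns (d_hat, asymptotic standard error).
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    m = int(T ** power)
    lam = 2 * np.pi * np.arange(1, m + 1) / T           # Fourier frequencies
    dft = np.fft.fft(x - x.mean())[1 : m + 1]
    periodogram = np.abs(dft) ** 2 / (2 * np.pi * T)
    X = -2 * np.log(2 * np.sin(lam / 2))                # equals -log(4 sin^2(lam/2))
    Xc = X - X.mean()
    d_hat = Xc @ np.log(periodogram) / (Xc @ Xc)        # OLS slope
    se = np.pi / np.sqrt(6 * (Xc @ Xc))                 # from Var(log-periodogram error) = pi^2 / 6
    return d_hat, se
```

A $t$-ratio `d_hat / se` well above conventional critical values would correspond to the "significantly positive" estimates of $d$ reported in the text; the choice of bandwidth can affect the estimate noticeably, so results are usually reported for several values of `power`.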
Moreover, the correlations across the three stock returns give strong evidence of time variation. The last column of Table 3 presents the results of the estimation of the DCC parameters. We note that the estimate of $\theta_2$ is statistically significant at the 1% level, meaning that the correlation is significantly time-varying. The DCC parameter estimates $\hat{\theta}_1$ and $\hat{\theta}_2$ imply a highly persistent correlation; however, they satisfy the stationarity condition $0 < \hat{\theta}_1 + \hat{\theta}_2 < 1$. Thus, the model is mean reverting and the conditional correlation matrix is positive semi-definite. In addition to the results in the tables, we compute the Lagrange Multiplier statistic proposed by Tse (2000) for the constant conditional correlation test in the trivariate model; it is significant at the 5% level, with a p-value of 0.0001. Thus, there is evidence against time-invariant correlations among the selected stock returns. In Figure 3, we observe a slow decay of the autocorrelation functions, which indicates the presence of long memory behaviour. The plots of the conditional variances of the BEKK-GARCH and CCC-GARCH models are shown in Figures 4 and 5 respectively.
To check the goodness of fit of our model, we consider several diagnostic tests on the standardized residuals: the Ljung-Box test for 12th-order serial correlation, applied to the standardized residuals, and for heteroskedasticity, applied to the squared standardized residuals. We use the Jarque-Bera test and the skewness and kurtosis coefficients to test the normality of the standardized residuals. From Table 3, the multivariate portmanteau test reveals that the hypothesis of no residual autocorrelation is rejected at the 5% significance level only for the Full and Diagonal BEKK(1,1,1) models. From Table 4, we can see that, for most series, the hypothesis of uncorrelated standardized and squared standardized residuals is well supported, indicating that there is no statistically significant evidence of misspecification. The skewness and kurtosis coefficients indicate that the standardized residuals are still not normally distributed, which is confirmed by the Jarque-Bera test. Finally, the hypothesis of no conditional heteroscedasticity is not rejected in the residuals for any of the series at the 5% level of significance.
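The univariate diagnostics described above can be sketched as follows. The code computes the Ljung-Box Q statistic and the Jarque-Bera statistic directly from their textbook formulas; the function names are hypothetical, and the statistics are meant to be compared with chi-square critical values (21.026 at the 5% level for 12 degrees of freedom, 5.991 for 2 degrees of freedom) rather than to reproduce the p-values in Table 4.

```python
import numpy as np


def ljung_box_q(x, lags=12):
    """Ljung-Box Q statistic: n (n + 2) * sum_k rho_k^2 / (n - k)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    total = 0.0
    for k in range(1, lags + 1):
        rho_k = np.sum(xc[k:] * xc[:-k]) / denom   # lag-k autocorrelation
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


def jarque_bera(x):
    """Return (JB statistic, skewness, excess kurtosis)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    var = np.mean(xc ** 2)
    skew = np.mean(xc ** 3) / var ** 1.5
    excess_kurt = np.mean(xc ** 4) / var ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return jb, skew, excess_kurt
```

Applying ljung_box_q to the standardized residuals checks for remaining serial correlation, while applying it to their squares checks for remaining conditional heteroskedasticity.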

Forecasting performance of estimated models
The DCC model appears to be the best of the models in terms of information criteria. To assess the out-of-sample forecasting performance of the diagonal BEKK, CCC and DCC models, we use the root mean square error (RMSE) and the mean absolute error (MAE) as the two criteria for comparison. We select an out-of-sample forecast data set consisting of the last 1000 observations of the original data. We re-estimate the models each time a new observation is added and obtain the k-day-ahead forecasts for k = 1, 3 and 5. The results are shown in Table 5. They show that the forecast errors increase with the forecast horizon. Moreover, the forecasts are more accurate for the DCC model than for the other models, whichever criterion is adopted: the DCC model has the lowest RMSE and the lowest MAE. In addition, the CCC model performs even worse than the diagonal BEKK model. The results thus provide evidence in favour of the predictive superiority of the DCC model over the diagonal BEKK and CCC models. We can thus conclude that the forecasts of the DCC model are significantly better than those of the other models.
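The evaluation scheme described above can be sketched in Python as follows. The sketch uses a scalar series and a generic forecaster callable as stand-ins; both are assumptions made for illustration, since the paper's forecasts concern covariance matrices and the proxy used for the realized values is not restated here. The names expanding_window_forecasts, rmse and mae are hypothetical.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean square error."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(actual, predicted):
    """Mean absolute error."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(err)))


def expanding_window_forecasts(series, forecaster, n_out, k):
    """k-step-ahead forecasts for the last n_out observations.

    forecaster(sample, k) is a placeholder for "re-estimate the model on
    `sample` and return its k-step-ahead forecast". The estimation sample
    grows by one observation at each forecast origin.
    """
    series = np.asarray(series, dtype=float)
    n = series.size
    if n_out + k > n:
        raise ValueError("not enough observations for the first sample")

    targets, preds = [], []
    for target_idx in range(n - n_out, n):
        sample = series[: target_idx - k + 1]   # data known k steps earlier
        preds.append(forecaster(sample, k))
        targets.append(series[target_idx])
    return np.array(targets), np.array(preds)
```

For example, with a naive forecaster such as lambda sample, k: sample[-1], calling expanding_window_forecasts(series, forecaster, n_out=1000, k=5) and passing the two returned arrays to rmse and mae gives the two criteria for the 5-day horizon; repeating this for each model and each k would yield a table with the layout of Table 5.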

Conclusion
The aim of the present paper is to study the dynamic modelling of US stock returns. We considered a multivariate GARCH framework to model the time-varying covariance matrices of a process exhibiting long-term dependence and to produce out-of-sample forecasts. In particular, we examined the persistence phenomenon in the first conditional moment of daily stock returns; the robustness of the results was also investigated by assuming that the innovations are generated by a multivariate GARCH process. As an illustration, we applied our models to a trivariate system.
The estimated parameters show that the return series are characterized by long memory behaviour and time-varying correlations. These results are statistically significant and contrast with most studies of long memory in return series, which show weak or no evidence of long-term dependence. Using daily returns of the Nasdaq 100, the New York Stock Exchange composite and the Russell 3000, the results provide evidence that the DCC model outperforms the other models in estimating and forecasting covariance matrices in the out-of-sample analysis.

Figure 3. Autocorrelation of the return series

Table 1. Properties of the distribution of the return series
Notes: the table reports the sample mean and the standard deviation; k is the excess kurtosis, sk is the skewness coefficient, JB is the Jarque-Bera normality test, Q(12) and Q^2(12) are respectively the 12-th order Ljung-Box tests for serial correlation in the residuals and squared residuals. Q_m(12) and Q^2_m(12) are their multivariate versions. The numbers in parentheses are the p-values.

Table 2. Estimation of the long memory parameter

Table 3. Estimation of the MGARCH model

Table 3. Continued
Notes: The numbers in parentheses are standard errors; the coefficient indexed by ij is the correlation coefficient between series i and j; AIC, BIC and HQ are respectively the Akaike, Bayesian and Hannan-Quinn information criteria. Q_m(12) and Q^2_m(12) are respectively the 12-th order multivariate portmanteau tests for serial correlation in the standardized and squared standardized residuals (null hypothesis: no serial correlation). The numbers in brackets are the p-values.

Table 4. Diagnostic tests
Notes: k is the excess kurtosis, sk is the skewness coefficient, JB is the Jarque-Bera normality test, Q(12) and Q^2(12) are respectively the 12-th order Ljung-Box tests for serial correlation in the standardized and squared standardized residuals (null hypothesis: no autocorrelation). LM is the test for ARCH effects (null hypothesis: no ARCH effects).

Table 5. Out-of-sample forecast