Sample correlations of infinite variance time series models: an empirical and theoretical study

When the elements of a stationary ergodic time series have finite variance 
the sample correlation function converges (with probability 1) to the theoretical correlation function. What happens in the case where the variance 
is infinite? In certain cases, the sample correlation function converges in 
probability to a constant, but not always. If within a class of heavy tailed 
time series the sample correlation functions do not converge to a constant, 
then more care must be taken in making inferences and in model selection 
on the basis of sample autocorrelations. We experimented with simulating 
various heavy tailed stationary sequences in an attempt to understand 
what causes the sample correlation function to converge, or fail to converge,
to a constant. In two new cases, namely the sum of two independent moving averages and a random permutation scheme, we are able to provide 
theoretical explanations for a random limit of the sample autocorrelation 
function as the sample size grows.


Introduction
For a stationary sequence $\{X_n, n = 0, \pm 1, \pm 2, \ldots\}$ the classical definition of the sample correlation function is¹
$$\hat\rho(h) = \frac{\sum_{j=1}^{n-h}(X_j - \bar X)(X_{j+h} - \bar X)}{\sum_{j=1}^{n}(X_j - \bar X)^2}, \qquad h = 0, 1, \ldots, \qquad (1.1)$$
where $\bar X = n^{-1}\sum_{i=1}^{n} X_i$ is the sample mean. When the variance of $X_n$ is finite, and the sequence is ergodic, $\hat\rho(h) \to \mathrm{Correlation}(X_0, X_h)$ with probability 1, for every $h$. See Brockwell and Davis [1]. When heavy tails are present, and the variance of $X_n$ is infinite, it makes little sense to center at $\bar X$, and the following heavy tailed version of the sample correlation function is often used:
$$\hat\rho_H(h) = \frac{\sum_{j=1}^{n-h} X_j X_{j+h}}{\sum_{j=1}^{n} X_j^2}. \qquad (1.2)$$

Many common models for the infinite variance case are based on $\alpha$-stable random variables, $0 < \alpha < 2$, or, more generally, on random variables in the domain of attraction of $\alpha$-stable random variables. Recall that a random variable $Z$ is in the domain of attraction of an $\alpha$-stable law if it has appropriately regularly varying tails; that is, if
$$P(|Z| > \lambda) = \lambda^{-\alpha} L(\lambda),$$
where $L$ is slowly varying, together with a tail balance condition.

Observe that $\hat\rho_H(h)$ in (1.2) is not the theoretical correlation, which does not exist. However, Davis and Resnick [5] have produced an example of a bilinear time series where
$$\left(\hat\rho_H(1), \ldots, \hat\rho_H(h)\right) \Rightarrow \left(L(1), \ldots, L(h)\right)$$
in $\mathbb{R}^h$ for any $h > 0$, where "$\Rightarrow$" denotes weak convergence and where $L(h)$ is a non-degenerate random variable.

For finite variance time series models, infinite order moving averages are dense in the sense that any empirical sample correlation function can be mimicked for any fixed number of lags by an appropriate autoregression. However, for infinite variance time series this is no longer the case; in fact most heavy tailed stationary processes are nonlinear and, in many senses, very far from linear processes. See, for example, Rosiński [11]. The study of the sample correlation of more general nonlinear heavy tailed stationary processes is only beginning, the required point process and regular variation tools are still being polished, and researchers have only a limited intuition into the question of which classes of processes have the property that the sample correlations converge to a non-random limit. It is precisely to develop this kind of intuition that we undertook an experimental project. In Section 2 we describe the models we have simulated. In Section 3 the simulation results are presented. Section 4 deals theoretically with two of the models and shows why a random limit occurs for the sample acf for the models under consideration.

¹Resnick's and Samorodnitsky's research was partially supported by NSF Grant DMS-97-04982 and NSA Grant MDA904-H-1036 at Cornell University. Jason Cohen was supported by NSF Grant DMS-94-00535 at Cornell.
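Since everything that follows revolves around the two estimators (1.1) and (1.2), here is a minimal computational sketch of both (the helper names are ours and NumPy is assumed; the paper specifies no implementation):

```python
import numpy as np

def sample_acf_heavy(x, max_lag):
    """Heavy-tailed sample ACF (1.2): no centering at the sample mean."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[: n - h] * x[h:]) / denom
                     for h in range(1, max_lag + 1)])

def sample_acf_classical(x, max_lag):
    """Classical sample ACF (1.1): center at the sample mean first."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[: n - h] * x[h:]) / denom
                     for h in range(1, max_lag + 1)])
```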
The statistical significance of whether the sample autocorrelation is asymptotically random is profound. For example, model selection techniques for heavy tailed autoregressions based on the AIC criterion, as well as coefficient estimation techniques based on Yule-Walker estimation, all rely on the sample autocorrelation function converging to a constant; when this is not the case, the mischief potential for misspecifying a model is great. When the sample acf is asymptotically random, new statistical tools and parametric models need to be developed. This difficulty, as discussed in Feigin and Resnick [7], is not academic: all examples known to us of non-simulated, real, heavy tailed data exhibit the disturbing characteristic that the sample acf plot computed for a subset of the data set is not stable as the subset varies within the full data set. For example, splitting the data into three disjoint subsets and plotting the sample acf for all three produces plots which look quite different from each other.
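The subset-stability check just described is easy to automate; a short sketch, reusing the illustrative sample_acf_heavy helper above:

```python
import numpy as np

def acf_stability_check(x, max_lag=10, pieces=3):
    """Heavy-tailed sample ACFs over disjoint pieces of the data.

    If the sample ACF converged to a constant, the rows should roughly
    agree; for real heavy tailed data sets they typically do not.
    """
    chunks = np.array_split(np.asarray(x, dtype=float), pieces)
    return np.vstack([sample_acf_heavy(c, max_lag) for c in chunks])
```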

Models
There are several classes of heavy tailed processes used in the literature. One is based on various modifications of linear time series. We present two such examples here, in Subsections 2.1 and 2.5; note that the former example is much more "standard" than the latter. In Subsection 2.2 we consider the standard ARCH(1) process. Finally, our remaining examples are stationary symmetric $\alpha$-stable (S$\alpha$S) processes, $0 < \alpha < 2$. The structure of these processes is fairly well understood, which makes them an attractive source of examples. See Samorodnitsky and Taqqu [13]. An S$\alpha$S process can be represented in the form
$$X_n = \int_E f_n(x)\, M(dx), \qquad n = 0, \pm 1, \pm 2, \ldots, \qquad (2.1)$$
where $M$ is an S$\alpha$S random measure on $E$ with a $\sigma$-finite control measure $m$, and $f_n \in L^\alpha(m)$ for all $n$. Only very special choices of the kernels $f_n$ will produce a stationary S$\alpha$S process (Rosiński [11]), and even more special kernels are needed to produce ergodic stationary S$\alpha$S processes (Rosiński and Samorodnitsky [12]). Two examples of stationary ergodic S$\alpha$S processes are presented in Subsections 2.3 and 2.4; once again the former example is much more "standard" than the latter.

Sum of Two Moving Averages
The simplest possible departure from the linear moving average process is, of course, just a sum of two such independent processes. We simulate
$$X_n = \sum_{j=1}^{10} (j+1)^{-2} Z^{(1)}_{n-j} + \sum_{j=1}^{10} (j+1)^{-1} Z^{(2)}_{n-j}, \qquad (2.2)$$
where $\{Z^{(i)}_j, j = 0, \pm 1, \ldots\}$, $i = 1, 2$, are independent sequences of iid Pareto($\alpha$) random variables; that is, $P(Z^{(i)}_j > \lambda) = \lambda^{-\alpha}$, $\lambda \ge 1$. We have chosen $\alpha = 1.5$. Of course, we could have used $\alpha$-stable random variables in place of Pareto ones, but this would not change the nature of the results, and so the ease with which Pareto random variables can be generated determined our choice.
Note that the choice of the coefficients in (2.2) is basically arbitrary. One only has to make sure that the two coefficient sequences are not proportional to each other, since in that case the process reduces to the usual moving average. Any other choice of the coefficients does not change the nature of the results. A theoretical discussion of this example appears in Section 4.1.
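A simulation sketch of (2.2), generating the Pareto($\alpha$) innovations by inverting uniforms (the function names are ours; as noted above, any non-proportional coefficient choice behaves the same way):

```python
import numpy as np

def pareto(alpha, size, rng):
    """iid Pareto(alpha) draws: P(Z > z) = z**(-alpha) for z >= 1."""
    return rng.random(size) ** (-1.0 / alpha)

def sum_of_two_mas(n, alpha=1.5, q=10, rng=None):
    """Sketch of (2.2): sum of two independent MA(q) processes with
    Pareto(alpha) innovations and non-proportional coefficients."""
    rng = rng or np.random.default_rng()
    c1 = (np.arange(1, q + 1) + 1.0) ** -2
    c2 = (np.arange(1, q + 1) + 1.0) ** -1
    z1, z2 = pareto(alpha, n + q, rng), pareto(alpha, n + q, rng)
    # np.convolve(z, c, 'valid')[t] = sum_j c[j] * z[t + q - 1 - j],
    # i.e., an MA(q) filter applied to the innovation stream
    x1 = np.convolve(z1, c1, mode="valid")[:n]
    x2 = np.convolve(z2, c2, mode="valid")[:n]
    return x1 + x2
```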

ARCH(1)
The ARCH(1) process is defined by
$$X_n = \epsilon_n \left( a + b X_{n-1}^2 \right)^{1/2}, \qquad n = 1, 2, \ldots,$$
where $\{\epsilon_n, n = 1, 2, \ldots\}$ are iid standard normal random variables, independent of $X_0$, $a > 0$ and $0 < b < 1$. Of course, only a particular choice of the initial distribution (that of $X_0$) will make this random recursion stationary. Instead, in simulation we start the process at 0 and discard the first 1000 observations to eliminate the initial transient in the system.
For this simulation, we used $a = 1$ and $b = 0.99$, which gives $P(X_n^2 > \lambda) \sim C \lambda^{-1.014}$ as $\lambda \to \infty$. See de Haan et al. [6]. One of the major differences between this process and those based on linear models is that, in an ARCH process, heavy tails appear not because of an innovation with heavy tails, but due to the combined effect of infinitely many light tailed innovations.
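A sketch of the recursion as simulated (started at 0 with a burn-in discarded, as described above; the helper name is ours):

```python
import numpy as np

def arch1(n, a=1.0, b=0.99, burn=1000, rng=None):
    """ARCH(1): X_k = eps_k * sqrt(a + b * X_{k-1}**2), started at X_0 = 0,
    with the first `burn` observations discarded as in the text."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for k in range(1, n + burn):
        x[k] = eps[k] * np.sqrt(a + b * x[k - 1] ** 2)
    return x[burn:]
```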

Mixed Moving Average
A mixed moving average process represents yet another step away from a linear moving average process. We present it in the context of S$\alpha$S processes. A mixed moving average S$\alpha$S process can be written in the form
$$X_n = \int_{\mathbb{R} \times E} f(n - u, x)\, M(du, dx), \qquad n = 0, \pm 1, \pm 2, \ldots,$$
where $M$ is now an S$\alpha$S random measure on $\mathbb{R} \times E$ with a $\sigma$-finite control measure $\mathrm{Leb} \times m$, $m$ being a $\sigma$-finite measure on $E$. The function $f$ is in $L^\alpha(\mathrm{Leb} \times m)$. This process is ergodic (even mixing); see Surgailis et al. [14].
We have simulated a mixed moving average process with $E = (0,1)$ and $m = \mathrm{Leb}$, using a truncated series representation (2.5) of the process, in which $a_\alpha$ is a positive constant that depends only on $\alpha$ and $\{\epsilon_j\}$ are iid Rademacher random variables, i.e., $P(\epsilon_j = 1) = P(\epsilon_j = -1) = 1/2$. Even though the representation (2.5) is valid for every admissible choice of the auxiliary probability measure appearing in it, the practical necessity of truncating the sum in (2.5) at a finite number of terms makes that choice an important one.

Random Walk

Our second example of a stationary ergodic S$\alpha$S process is built from a simple symmetric random walk on $\mathbb{Z}$. Simulating it through a truncated series representation again involves selecting a probability measure on $\mathbb{Z}$ equivalent to $m$. In the present context this has an intuitive interpretation of selecting a probability law on $\mathbb{Z}$ with all positive probabilities, according to which the initial position of the random walk is chosen. Unlike the previous example, in this case it is not so obvious why one choice of such a distribution will perform better than another when the series representing the process is truncated. Nevertheless, it still seems that choosing the initial state according to a heavy tailed distribution will "mix" the random walks better than a light tailed initial distribution will, and so the approximate process will be closer to true stationarity in the former case than in the latter. To get a feeling of whether this is, in fact, so, we have simulated this process twice, once with the initial state chosen according to
$$p_m = c\, 2^{-|m|}, \qquad m = 0, \pm 1, \pm 2, \ldots, \qquad (2.6)$$
with $c = 1/3$, leading to simulating the series
$$X_n = a_\alpha \sum_{j=1}^{M} \epsilon_j\, \Gamma_j^{-1/\alpha}\, p_{Y_0^{(j)}}^{-1/\alpha}\, 1\left( Y_n^{(j)} = 0 \right), \qquad n = 0, 1, 2, \ldots,$$
where, as before, $\{\epsilon_j\}$ are iid Rademacher, $\{\Gamma_j\}$ are the arrival times of a unit rate Poisson process, and $\{Y^{(j)}\}$ are iid simple symmetric random walks with initial distribution given by (2.6).
The number of terms $M$ is large. The second choice of the initial distribution is a heavy tailed one,
$$p_m = c\, (|m| + 1)^{-2}, \qquad m = 0, \pm 1, \pm 2, \ldots, \qquad (2.7)$$
with $c = 3/(\pi^2 - 3)$. This leads to having to simulate the series
$$X_n = a_\alpha \sum_{j=1}^{M} \epsilon_j\, \Gamma_j^{-1/\alpha}\, p_{Y_0^{(j)}}^{-1/\alpha}\, 1\left( Y_n^{(j)} = 0 \right), \qquad n = 0, 1, 2, \ldots,$$
where, this time, the initial state $Y_0^{(j)}$ of each simple symmetric random walk has distribution (2.7). Once again, the number of terms $M$ in the series is large.
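Since the series representation above is only partially recoverable from the text, the following sketch should be read as our reconstruction under stated assumptions: a LePage-type series with Rademacher signs, Poisson arrival weights $\Gamma_j^{-1/\alpha}$, an importance factor $p_{Y_0}^{-1/\alpha}$ for the initial state, and kernel $1(Y_n^{(j)} = 0)$; the constant $a_\alpha$ is dropped, and the state space for the initial law is truncated purely as a simulation shortcut.

```python
import numpy as np

def rw_sas(n_obs, alpha=1.5, M=1000, heavy_init=False, rng=None):
    """Hedged sketch of the truncated random-walk SaS series."""
    rng = rng or np.random.default_rng()
    gamma = np.cumsum(rng.exponential(size=M))      # Poisson arrival times
    eps = rng.choice([-1.0, 1.0], size=M)           # Rademacher signs
    m = np.arange(-200, 201)                        # truncated copy of Z
    p = (np.abs(m) + 1.0) ** -2 if heavy_init else 2.0 ** -np.abs(m)
    p = p / p.sum()                                 # (2.7) or (2.6), normalized
    idx0 = rng.choice(len(m), size=M, p=p)          # initial states Y_0^(j)
    steps = rng.choice([-1, 1], size=(M, n_obs))
    paths = m[idx0][:, None] + np.cumsum(steps, axis=1)   # walks Y_n^(j)
    w = eps * gamma ** (-1.0 / alpha) * p[idx0] ** (-1.0 / alpha)
    return (w[:, None] * (paths == 0)).sum(axis=0)

# Usage: compare the light and heavy tailed initial laws of Section 3
x_light = rw_sas(2000, heavy_init=False, rng=np.random.default_rng(0))
x_heavy = rw_sas(2000, heavy_init=True, rng=np.random.default_rng(0))
```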

Coefficient Permutation
Our final example represents yet another modification of the linear time series. Let $\{c_j\}$ be a doubly infinite sequence of coefficients and $\{Z_j\}$ a random noise sequence such that the series $\sum_j c_j Z_{-j}$ converges. A linear process $X_n = \sum_j c_j Z_{n-j}$ can then be viewed as follows. Start with a sequence $c^{(0)} = \{c_j\}$ such that $c_j = 0$ for $j < 0$, and define $X_0 = \sum_j c^{(0)}_j Z_{-j}$.
To find $X_1$ we apply a shift to the sequence of coefficients,
$$c^{(1)}_j = c^{(0)}_{j-1}, \qquad X_1 = \sum_j c^{(1)}_j Z_{-j}.$$
To compute $X_2$, apply the shift to $c^{(1)}$, etc. Our idea was to use an operation on the sequence of coefficients different from the pure shift.
The operation on the sequences we have chosen for this example is a combination of a shift with a randomly chosen rearrangement of the coefficients. Since, in theory, we are dealing with an infinite sequence of coefficients, which makes it difficult to deal with general permutations, we rearrange the coefficients by moving the first (non-zero) coefficient into a random position. Specifically, let $\{K_j\}$ and $\{M_j\}$ be two independent sequences of iid positive integer valued random variables. Given a sequence $c^{(0)} = \{c_0, c_1, \ldots\}$, we define recursively, for $j = 1, \ldots, M_1 - 1$, the sequence $c^{(j)}$ obtained from $c^{(j-1)}$ by taking the initial entry of $c^{(j-1)}$ and moving it to the $K_j$-th spot, after displacing that entry one step to the right to clear room, and set $c^{(M_1)} = c^{(0)}$. (2.8) We then continue the recursion (2.8) for $j = M_1 + 1, \ldots, M_1 + M_2 - 1$, set $c^{(M_1 + M_2)} = c^{(0)}$, etc. The resulting process is
$$X_n = \sum_{j=0}^{\infty} c^{(n)}_j Z_{n-j}, \qquad n = 0, 1, 2, \ldots. \qquad (2.9)$$
The reason for "resetting" the coefficients back to their initial state $c^{(0)}$ from time to time is that, without such an action, the vector of coefficients tends to zero and the resulting process would be very difficult to simulate.
Of course, the series in (2.9) has to be truncated as well, so we actually simulate
$$X_n = \sum_{j=0}^{M} c^{(n)}_j Z_{n-j}, \qquad n = 0, 1, 2, \ldots,$$
for some large $M$. In particular, only the first $M$ coefficients get permuted. If a particular $K_j$ takes a value exceeding $M$, we discard this value and generate $K_j$ anew.
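A sketch of the permutation-with-reset scheme as we read it; the laws of $K_j$ and of the reset epochs $M_i$ are not pinned down in the text, so the geometric choices below are purely illustrative:

```python
import numpy as np

def move_front(c, k):
    """Move the first entry of c to position k, shifting the displaced
    entries one step to the right (the rearrangement step in (2.8))."""
    c = list(c)
    c.insert(k, c.pop(0))
    return np.asarray(c)

def permutation_process(n_obs, alpha=1.5, M=200, rng=None):
    """Truncated version of (2.9): only the first M coefficients permuted."""
    rng = rng or np.random.default_rng()
    c0 = np.arange(1, M + 1, dtype=float) ** -2     # illustrative c^(0)
    z = rng.random(n_obs + M) ** (-1.0 / alpha)     # Pareto(alpha) innovations
    c, to_reset = c0.copy(), rng.geometric(0.05)
    x = np.empty(n_obs)
    for n in range(n_obs):
        x[n] = c @ z[n : n + M][::-1]               # X_n = sum_j c_j^(n) Z_{n-j}
        k = rng.geometric(0.1)                      # K_j; redraw if it exceeds M
        while k >= M:
            k = rng.geometric(0.1)
        c = move_front(c, k)
        to_reset -= 1
        if to_reset == 0:                           # reset epoch: back to c^(0)
            c, to_reset = c0.copy(), rng.geometric(0.05)
    return x
```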

Results
For all the examples we present time series plots of several runs and the corresponding sample correlations computed from these runs.

Sum of Two Moving Averages
The 9 sample autocorrelation functions shown in Figure 1 show enough variation that one must suspect that, for the sum of two independent moving average processes, the sample correlations do not converge to a constant limit. The 9 time series plots are given in Figure 2 and look rather different from one another. This result may be somewhat counterintuitive, since the sum of two independent linear processes behaves, in many respects, similarly to a linear process. A theoretical analysis of this case is presented in the next section, where we verify that the sample correlation function converges in law to a nondegenerate limit. We have generated 10000 observations in each run.
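In the same spirit as Figure 1, a quick replication experiment using the illustrative sketches from Sections 1 and 2.1 (sample_acf_heavy and sum_of_two_mas):

```python
import numpy as np

lag1 = [sample_acf_heavy(sum_of_two_mas(10_000,
                                        rng=np.random.default_rng(seed)), 1)[0]
        for seed in range(9)]
print(np.round(lag1, 3))   # a visible spread across runs, not a single value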
Mixed Moving Average

To simulate this process we have chosen the number of terms $M = 10^5$, and each of the 9 simulation runs was of length 1000. The results, presented in Figures 5 and 6, seem to clearly indicate that the sample acf does not converge to a constant. To check this unexpected conclusion we have also generated 4 additional simulation runs with $M = 10^6$, each run having length $10^5$. The resulting ACF, presented in Figure 7, seems to support the above conclusion. We have used $\alpha = 1.5$ throughout.

Random Walk
For this model we generated two batches of runs; the first one used the light tailed initial distribution (2.6), while the second one used the heavy tailed initial distribution (2.7). Once again we have used $M = 10^5$ terms in the series representation of the process, simulated each run of length $10^4$, and used $\alpha = 1.5$ throughout. One conclusion is that, as inspection of the time series plots indicates, we see less "action" towards the end of the plot than at its beginning when using the light tailed initial distribution (2.6). This phenomenon is not seen in the case of the heavy tailed initial distribution (2.7). Secondly, the sample correlations appear, once again, to converge to a non-degenerate limit, even though this phenomenon is not as obvious here as it is in the previous example.

Coefficient Permutation

The sample correlations for this model, shown in Figure 12, again appear to converge to a random limit, implying that, for this new class of stochastic processes, certain phenomena occur which require explanation. As we will show in Section 4.2, the randomness in the limit is due to the random coefficients.

Analytical Results
This section is devoted to outlining explanations of why the sample correlation converges to a random limit for the cases discussed in Sections 2.1 and 2.5, namely for the sum of two moving averages and for the coefficient permutation with reset example.
The methods of proof are standard and use the connection between point processes and regular variation as outlined in Resnick [9], Section 4.5. The method for time series was developed by Davis and Resnick [3,4].
We denote a Radon point measure on a nice space $\mathbb{E}$ by $m = \sum_k \epsilon_{x_k}$, where $\epsilon_x$ is the point mass at $x$. We examine in more detail the example of the sum of two moving averages. The fact that the lag 1 correlation differs substantially from simulation run to simulation run raises doubts that the sample correlation function converges to a constant in this case. We will prove the following.
Theorem 1: Suppose $\{\{Z^{(i)}_n, -\infty < n < \infty\}, i = 1, 2\}$ are two independent sequences of iid random variables with the same common distribution $F$ satisfying
$$P(|Z| > x) = x^{-\alpha} L(x), \qquad (4.1)$$
where $L$ is slowly varying, together with the tail balance condition
$$\frac{P(Z > x)}{P(|Z| > x)} \to p, \qquad \frac{P(Z < -x)}{P(|Z| > x)} \to q = 1 - p. \qquad (4.2)$$
Let the moving average coefficients satisfy the condition, for $i = 1, 2$,
$$\sum_n \left|c^{(i)}_n\right|^{\delta} < \infty, \quad \text{for some } 0 < \delta < \alpha \wedge 1. \qquad (4.3)$$
Then, for each $k \ge 1$, $(\hat\rho_H(1), \ldots, \hat\rho_H(k))$ converges weakly to the nondegenerate random vector displayed in (4.12) below.
Because (4.7) says that two components cannot be simultaneously large, we get the following vague convergence:
$$n P\left[ b_n^{-1}\left( \mathbf{Z}^{(m,1)}, \mathbf{Z}^{(m,2)} \right) \in \cdot\, \right] \xrightarrow{v} \nu^{(m)} \times \epsilon_{\mathbf{0}} + \epsilon_{\mathbf{0}} \times \nu^{(m)}$$
(a hat over a variable denotes that that variable is deleted).
Step 4: Define, for $i = 1, 2$, the map which sends a pair of $(2m+1)$-vectors $(\mathbf{z}^{(1)}, \mathbf{z}^{(2)})$ into
$$\left( \sum_j c^{(1)}_j z^{(1)}_j, \; \sum_j c^{(2)}_j z^{(2)}_j \right).$$
Applying this to the convergence in (4.9) yields, after a continuous mapping argument, (4.10).

Step 5: We now modify the procedure in Step 4. For a nonnegative integer $h$, we define a map from $\mathbb{R}^{2m+1} \times \mathbb{R}^{2m+1}$ to $\mathbb{R}^4$, apply this map to (4.9), and let $m \to \infty$; adding the first and third and then the second and fourth components of the result yields (4.11). The convergence in (4.11) holds for $h = 1, \ldots, k$ for any $k$, and in fact, examining the outline of the proof, we see that the convergence holds jointly. Apply the functional which takes the product of the components and then sums the points of the point measure to (4.11), once with $h = 0$ and once with $h$ arbitrary. After a truncation and continuity argument we get, in $\mathbb{R}^k$,
$$\left( \hat\rho_H(h),\; 1 \le h \le k \right) \Rightarrow \left( \frac{S^{(1)} \sum_l c^{(1)}_l c^{(1)}_{l+h} + S^{(2)} \sum_l c^{(2)}_l c^{(2)}_{l+h}}{S^{(1)} \sum_l \left(c^{(1)}_l\right)^2 + S^{(2)} \sum_l \left(c^{(2)}_l\right)^2},\; 1 \le h \le k \right). \qquad (4.12)$$
This is the desired result, since for $i = 1, 2$, $S^{(i)}$ is a positive strictly stable random variable with index $\alpha/2$.
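The limit law (4.12) is easy to sample from directly, since positive strictly $\alpha/2$-stable variables can be generated by the Chambers-Mallows-Stuck/Kanter representation. A Monte Carlo sketch, with the illustrative coefficients from our Section 2.1 sketch (all names ours):

```python
import numpy as np

def pos_stable(a, size, rng):
    """Positive strictly a-stable draws, 0 < a < 1 (Kanter/CMS form)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    e = rng.exponential(size=size)
    return (np.sin(a * (u + np.pi / 2)) / np.cos(u) ** (1.0 / a)
            * (np.cos(u - a * (u + np.pi / 2)) / e) ** ((1.0 - a) / a))

rng = np.random.default_rng(1)
q = 10
c1 = (np.arange(1, q + 1) + 1.0) ** -2
c2 = (np.arange(1, q + 1) + 1.0) ** -1
s1 = pos_stable(0.75, 100_000, rng)          # index alpha/2 with alpha = 1.5
s2 = pos_stable(0.75, 100_000, rng)
num = s1 * (c1[:-1] @ c1[1:]) + s2 * (c2[:-1] @ c2[1:])   # lag h = 1 terms
den = s1 * (c1 @ c1) + s2 * (c2 @ c2)
print(np.quantile(num / den, [0.1, 0.5, 0.9]))  # nondegenerate spread at lag 1
```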

Coefficient Permutation with Reset
In this subsection we outline a justification for the empirical results of Section 3.5 and show why the sample correlation function converges to a random limit.
Recall that we start with a sequence $c = \{c_j, j \ge 0\}$. We assume this sequence satisfies the analogue of (4.3), namely
$$\sum_{n=0}^{\infty} |c_n|^{\delta} < \infty, \quad \text{for some } 0 < \delta < \alpha \wedge 1. \qquad (4.13)$$
We also have an iid innovation sequence $\{Z_n\}$ whose common distribution $F$ satisfies (4.1) and (4.2). Remember that we get $c^{(1)}$ from $c$ by moving the initial entry of the sequence to the $K_1$-th spot, displacing the entries with index greater than or equal to $K_1$ one slot to the right; this is repeated until the reset time $M_1$, and so on.

Before we explain why the sample correlation has a random limit for the shift and reset process, we need the following lemma.

Lemma 2: Suppose $x^{(n)}_i$, $n \ge 1$, $1 \le i \le n$, are points of $\mathbb{E}$ such that ... (iii) for any compact $K \subset \mathbb{E}$ and any fixed integer $L$ we have
$$\lim_{n \to \infty} \sum_{i=1}^{L} \epsilon_{x^{(n)}_i}(K) = 0,$$
so that, for any fixed $i$, $x^{(n)}_i \in K^c$ for all $n$ large enough. The desired conclusion (4.18) will follow if we show
$$f\left(x^{(n)}_i, c^{(i)}\right) \to f\left(x^{(\infty)}_i, c^{(\infty)}\right).$$
However, for fixed $c^{(\infty)}$, $f(\cdot, c^{(\infty)}) \in C_K^+(\mathbb{E})$, and so from (4.15) the difference tends to 0 as $n \to \infty$. We are now in a position to show why the sample acf converges, in general, to a random limit.
Proposition: For any $k$, as $n \to \infty$,
$$\left( \hat\rho_H(h),\; 1 \le h \le k \right) \Rightarrow \left( \frac{\sum_{l=0}^{\infty} \theta_l \theta_{l+h}}{\sum_{l=0}^{\infty} \theta_l^2},\; 1 \le h \le k \right),$$
where $\{\theta_l, l \ge 0\}$ is the process described in Lemma 1. Note that all the randomness in the limit is caused by the random coefficients and not by the innovation sequence; we will see that, in the limit, the randomness caused by the $Z$'s cancels out.
Proof: As in the proof of Theorem 1, we proceed in a series of steps.
Step 1: For an integer $m$, define a vector of length $m+1$ by
$$\mathbf{Z}^{(m)}_j = \left( Z_j, Z_{j-1}, \ldots, Z_{j-m} \right)'.$$
Then
$$\sum_{j=1}^{n} \epsilon_{\mathbf{Z}^{(m)}_j / b_n} \Rightarrow \sum_k \sum_{l=0}^{m} \epsilon_{J_k e_l}, \qquad (4.22)$$
where $\{J_k\}$ are the points of a Poisson process on $[-\infty, \infty] \setminus \{0\}$ with mean measure $\nu$ given in (4.6), and $e_l$ is a basis vector of length $m+1$ all of whose entries are zero except for a 1 in the $l$-th spot. (See Resnick [9], page 232, or Davis and Resnick [3].)
Step 2: Define $c^{(m,j)} = (c^{(j)}_0, c^{(j)}_1, \ldots, c^{(j)}_m)$, $j = 0, 1, \ldots$. Applying Lemma 2 to the convergences (4.22) and (4.14), we get
$$\sum_{j=1}^{n} \epsilon_{\left( \mathbf{Z}^{(m)}_j / b_n,\; c^{(m,j)} \right)} \Rightarrow \sum_k \sum_{l=0}^{m} \epsilon_{\left( J_k e_l,\; \theta^{(m)} \right)},$$
where $\theta^{(m)} = (\theta_0, \ldots, \theta_m)$ comes from Lemma 1.

Step 3: Applying to this the functional which takes the product of the components of each point and sums over the points yields the weak limit of $b_n^{-2} \sum_{j=1}^{n} X_j^2$.

Step 4: We now redo Step 2. We again apply Lemma 2 to the convergences (4.22) and (4.14) to get, for any integer $h$, the joint convergence (4.25). Note that the second component on the right is obtained by ignoring the first $h$ components of $c^{(j)}$ and then taking the inner product with the correspondingly truncated version of $\mathbf{z}$.
Apply this mapping to the components of (4.25). After a continuous mapping argument and a justification of the replacement of $m$ by $\infty$, we get the joint weak limit of $b_n^{-2}\sum_j X_j X_{j+h}$ and $b_n^{-2}\sum_j X_j^2$; note that any $c_j$ with a negative subscript is interpreted as 0. Taking products of components and summing points, and then taking ratios, yields the result:
$$\left( \hat\rho_H(h),\; 1 \le h \le k \right) \Rightarrow \left( \frac{\sum_{l=0}^{\infty} \theta_l \theta_{l+h}}{\sum_{l=0}^{\infty} \theta_l^2},\; 1 \le h \le k \right). \qquad \Box$$