ASYMPTOTIC NORMALITY OF A HURST PARAMETER ESTIMATOR BASED ON THE MODIFIED ALLAN VARIANCE

ABSTRACT. In order to estimate the memory parameter of Internet traffic data, a log-regression estimator based on the so-called modified Allan variance (MAVAR) has recently been proposed. Simulations have shown that this estimator achieves higher accuracy and better confidence than other methods. In this paper we present a rigorous study of the log-regression MAVAR estimator. In particular, under the assumption that the signal process is a fractional Brownian motion, we prove that it is consistent and asymptotically normal. Finally, we discuss its connection with the wavelet estimators.


INTRODUCTION
It is well known that many kinds of real data (hydrology, telecommunication networks, economics, biology) display self-similarity and long-range dependence (LRD) on various time scales. By self-similarity we refer to the property that a dilated portion of a realization has the same statistical characterization as the original realization. This can be well represented by a self-similar random process with a given scaling exponent H (Hurst parameter). Long-range dependence, also called long-memory, emphasizes the long-range time-correlation between past and future observations; it is thus commonly equated to an asymptotic power-law decrease of the spectral density or, equivalently, of the autocovariance function of a given stationary random process. In this situation, the memory parameter of the process is given by the exponent d characterizing the power-law of the spectral density. (For a review of historical and statistical aspects of self-similarity and long-memory see [5].) Though a self-similar process cannot be stationary (and thus cannot be LRD), these two properties are often related in the following sense. Under the hypothesis that a self-similar process has stationary (or weakly stationary) increments, the scaling parameter H enters in the description of the spectral density and covariance function of the increments, providing an asymptotic power-law with exponent d = 2H − 1. Under this assumption, we can say that the self-similarity of the process is reflected in the long-range dependence of its increments. The most paradigmatic example of this connection is provided by the fractional Brownian motion and by its increment process, the fractional Gaussian noise [15].
In this paper we will consider the problem of estimating the Hurst parameter H of a self-similar process with weakly stationary increments. Among the different techniques introduced in the literature, we will focus on a method based on the log-regression of the so-called modified Allan variance (MAVAR). The MAVAR is a well-known time-domain quantity generalizing the classic Allan variance [6,7], which was first proposed as a traffic analysis tool in [10]. In a series of papers [10,9,8], its performance has been evaluated by simulation and by comparison with real IP traffic. These works have pointed out the high accuracy of the method in estimating the parameter H, and have shown that it achieves better confidence than the well-established wavelet log-diagram.
The aim of the present work is to substantiate these results from the theoretical point of view, studying the rate of convergence of the estimator toward the memory parameter. In particular, our goal is to provide the limit properties and the precise asymptotic normality of the MAVAR log-regression estimator, in order to compute the related confidence intervals. This will be achieved under the assumption that the signal process is a fractional Brownian motion. Although this hypothesis may look restrictive (the estimator is indeed designed for a larger class of stochastic processes), the results obtained are a first step toward the mathematical study of the MAVAR log-regression estimator. To our knowledge, there are no similar results in the literature.
For a survey on Hurst parameter estimators for the fractional Brownian motion, we refer to [11,12]. However, we stress once again that the MAVAR estimator is not specifically targeted at the fractional Brownian motion, but has been conceived and successfully used for more general processes.
Our theorems can be viewed as a counterpart of the already established results concerning the asymptotic normality of the wavelet log-regression estimators [17,18,19]. Indeed, although the MAVAR can be related to a suitable Haar-type wavelet family (see [4] for the classical Allan variance), the MAVAR and wavelet log-regression estimators do not match, as the regression runs on different parameters (see Sec. 5). Hence, we adopt a different argument which in turn allows us to avoid some technical difficulties due to the poor regularity of the Haar-type functions.
The paper is organized as follows. In section 2 we recall the properties of self-similarity and long-range dependence for stochastic processes, and the definition of fractional Brownian motion; in section 3 we introduce the MAVAR and its estimator, with their main properties; in section 4 we state and prove the main results concerning the asymptotic normality of the estimator; in section 5 we make some comments on the link between the MAVAR and the wavelet estimators and on the modified Hadamard variance, which is a generalization of the MAVAR; in the appendix we recall some results used along the proofs.

SELF-SIMILARITY AND LONG-RANGE DEPENDENCE
In the sequel we consider a centered real-valued stochastic process X = {X(t), t ∈ R}, with X(0) = 0, that can be interpreted as the signal process. Sometimes it is also useful to consider the τ-increment of the process X, which is defined, for every τ ∈ R⁺ and t ∈ R, as

Y_τ(t) := X(t + τ) − X(t).

In order to reproduce the behavior of the real data, it is commonly assumed that X satisfies one of the two following properties: (i) self-similarity; (ii) long-range dependence.
(i) The self-similarity of a process X refers to the existence of a parameter H ∈ (0, 1), called Hurst index or Hurst parameter of the process, such that, for all a > 0, it holds

{X(at), t ∈ R} = {a^H X(t), t ∈ R} in distribution. (2)

In this case we say that X is an H-self-similar process.

(ii) We first recall that a stochastic process X is weakly stationary if it is square-integrable and its autocovariance function, C_X(t, s) := Cov(X(t), X(s)), is translation invariant, namely if

C_X(t + h, s + h) = C_X(t, s), for all t, s, h ∈ R.

In this case we also set R_X(t) := C_X(t, 0). If X is a weakly stationary process, we say that it displays long-range dependence, or long-memory, if there exists d ∈ (0, 1) such that the spectral density of the process, f_X(λ), satisfies the condition

f_X(λ) ∼ c_f |λ|^(−d), as λ → 0, (3)

for some finite constant c_f ≠ 0, where we write f(x) ∼ g(x) as x → x₀ if lim_{x→x₀} f(x)/g(x) = 1. Due to the correspondence between the spectral density and the autocovariance function, the long-range condition (3) can be often stated in terms of the autocovariance of the process as

R_X(t) ∼ c_R |t|^(−β), as |t| → +∞, (4)

for some finite constant c_R ≠ 0 and β = (1 − d) ∈ (0, 1). Notice that if X is a self-similar process, then it cannot be weakly stationary, due to the normalization factor a^H. On the other hand, assuming that X is an H-self-similar process with weakly stationary increments, i.e. such that the quantity Cov(Y_τ(s + t), Y_τ(s)) does not depend on s, it turns out that the autocovariance function is given by

C_X(t, s) = (σ²/2) ( |t|^{2H} + |s|^{2H} − |t − s|^{2H} ), (5)

with σ² := Var(X(1)). In particular, if H ∈ (1/2, 1), the process Y_τ displays long-range dependence in the sense of (4) with β = 2 − 2H. Under this assumption, we thus embrace the two main empirical properties of a wide collection of real data.
A basic example of the connection between self-similarity and long-range dependence is provided by the fractional Brownian motion B_H = {B_H(t), t ∈ R}. This is a centered Gaussian process with autocovariance function given by (5), where σ² = Var(B_H(1)). It can be proven that B_H is a self-similar process with Hurst index H ∈ (0, 1), which corresponds, for H = 1/2, to the standard Brownian motion. Moreover, its increment process {B_H(t + 1) − B_H(t), t ∈ R}, called fractional Gaussian noise, turns out to be a weakly stationary Gaussian process [15,22].
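As an illustration (our own, not part of the paper's argument), one can sample fBm paths directly from the covariance (5) with σ² = 1 and check the variance scaling Var(B_H(t)) = t^{2H} numerically. The function name and the grid choice below are ours:

```python
import numpy as np

def fbm_cholesky(n, H, T=1.0, seed=0):
    """Sample a fractional Brownian motion on a regular grid of (0, T]
    from its covariance C(t, s) = 0.5 * (|t|^{2H} + |s|^{2H} - |t - s|^{2H}),
    i.e. (5) with sigma^2 = 1, via Cholesky factorization."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n, T, n)  # exclude t = 0, where B_H(0) = 0
    tt, ss = np.meshgrid(t, t, indexing="ij")
    C = 0.5 * (tt ** (2 * H) + ss ** (2 * H) - np.abs(tt - ss) ** (2 * H))
    L = np.linalg.cholesky(C + 1e-12 * np.eye(n))  # small jitter for stability
    return t, L @ rng.standard_normal(n)
```

Averaging the squared paths over many independent seeds reproduces the power law Var(B_H(t)) = t^{2H}.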
In the next sections we will perform the analysis of the modified Allan variance and of the related estimator of the memory parameter.

THE MODIFIED ALLAN VARIANCE
In this section we introduce and recall the main properties of the Modified Allan variance (MAVAR), and of the log-regression estimator of the memory parameter based on it.
Suppose that X has weakly stationary increments. Let τ₀ > 0 be the "sampling period" and define the sequence of times {t_k}_{k≥1} taking t₁ ∈ R and setting t_i − t_{i−1} = τ₀, i.e.

t_i = t₁ + (i − 1) τ₀, i ≥ 1.
Definition 3.1. For any fixed integer p ≥ 1, the modified Allan variance (MAVAR) is defined as

σ²_p := (1 / (2 τ² p²)) E[ ( Σ_{i=1}^{p} ( X_{t_{i+2p}} − 2 X_{t_{i+p}} + X_{t_i} ) )² ],

where we set τ := τ₀ p. For p = 1 we recover the well-known Allan variance.
Let us assume that a finite sample X₁, . . ., X_n of the process X is given, and that the observations are taken at times t₁, . . ., t_n. In other words, we set X_i = X_{t_i} for i = 1, . . ., n.
For k ∈ Z, let us set

d_{p,k} := (1/p) Σ_{i=1}^{p} ( X_{i+k+2p} − 2 X_{i+k+p} + X_{i+k} ),

so that the process {d_{p,k}}_k turns out to be weakly stationary for each fixed p, with E[d_{p,k}] = 0, and we can write

σ²_p = (1 / (2τ²)) E[ d²_{p,k} ].

3.1. Some properties. Let us further assume that X is an H-self-similar process (see (2)), with H ∈ (1/2, 1). Applying the covariance formula (5), the variance E[d²_{p,k}] can be computed explicitly as a double finite sum whose terms involve a polynomial P_{2H} of degree 2H. Since we are interested in the limit for p → ∞, we approximate the two finite sums in (13) by the corresponding double integral. Computing the integral and inserting the result in (13), we get from (12) and (14) the asymptotic relation

σ²_p = c_H (τ₀ p)^{2H−2} ( 1 + o(1) ), as p → ∞, (16)

where c_H is a finite, strictly positive constant depending only on H. Under the above hypotheses on X, one can also prove that the process {d_{p,k}}_{p,k} satisfies the stationarity condition (17): writing the covariance E[d_{p,k} d_{p,k′}] explicitly, one checks that it depends on k and k′ only through their difference, which immediately provides (17).
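The empirical counterparts of these quantities are straightforward to compute. The following sketch implements the d_{p,k} and the empirical MAVAR for a finite sample, assuming the normalization 1/(2τ²) with τ = τ₀p used above; the function name is ours:

```python
import numpy as np

def mavar(x, p, tau0=1.0):
    """Empirical modified Allan variance of a sample x[0..n-1], following
    d_{p,k} = (1/p) * sum_{i=1..p} (x[i+k+2p] - 2*x[i+k+p] + x[i+k])
    with the normalization 1/(2*tau^2), tau = tau0*p, assumed above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 3 * p:
        raise ValueError("need n >= 3p samples")
    # second differences at lag p: length n - 2p
    d2 = x[2 * p:] - 2 * x[p:-p] + x[:-2 * p]
    # moving average of length p yields d_{p,k}, k = 0, ..., n - 3p
    d = np.convolve(d2, np.ones(p) / p, mode="valid")
    tau = tau0 * p
    return float(np.mean(d ** 2) / (2 * tau ** 2))
```

A pure drift x_i = a·i has vanishing second differences, so its MAVAR is exactly zero; this gives a quick sanity check of the indexing.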
The limit relation (23) describes the decay of the correlations of the d_{p,k}'s as |m| → +∞. Consider now a finite integer l ≥ 1 and a weight-vector w = (w₀, . . ., w_l) satisfying the conditions

Σ_{ℓ=0}^{l} w_ℓ = 0,  Σ_{ℓ=0}^{l} w_ℓ log(1 + ℓ) = 1. (24)

The log-regression MAVAR estimator associated to the weights w is defined as

α̂_n(p, w) := Σ_{ℓ=0}^{l} w_ℓ log( σ̂²_{p(1+ℓ)}(n) ), (25)

where σ̂²_{p(1+ℓ)}(n) denotes the empirical variance. Roughly speaking, the idea behind this definition is to use the approximation σ̂²_{p(1+ℓ)}(n) ≈ σ²_{p(1+ℓ)}, in order to get, by (16) and (24),

α̂_n(p, w) ≈ Σ_{ℓ=0}^{l} w_ℓ [ log c_H + α log( τ₀ p (1 + ℓ) ) ] = α,

where α := 2H − 2. Thus, given the data X₁, . . ., X_n, the following procedure is used to estimate H:
• compute the modified Allan variance by (9), for integer values p(1 + ℓ), with 1 ≤ p(1 + ℓ) ≤ p_max(n) = ⌊n/3⌋;
• compute the weighted log-regression MAVAR estimator by (25), in order to get an estimate α̂ of α;
• estimate H by Ĥ = (α̂ + 2)/2.
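The three-step procedure above can be sketched as follows. The weight conditions Σ_ℓ w_ℓ = 0 and Σ_ℓ w_ℓ log(1 + ℓ) = 1 are assumed, and the helper names are ours; this is an illustration, not the paper's reference implementation:

```python
import numpy as np

def mavar(x, p, tau0=1.0):
    # empirical MAVAR as in Sec. 3 (normalization 1/(2*tau^2) assumed)
    x = np.asarray(x, dtype=float)
    d2 = x[2 * p:] - 2 * x[p:-p] + x[:-2 * p]
    d = np.convolve(d2, np.ones(p) / p, mode="valid")
    return float(np.mean(d ** 2) / (2 * (tau0 * p) ** 2))

def estimate_hurst(x, p, ells, w, tau0=1.0):
    """Weighted log-regression: alpha_hat = sum_l w_l * log MAVAR(p*(1+l)),
    then H_hat = (alpha_hat + 2) / 2.  The weights are assumed to satisfy
    sum_l w_l = 0 and sum_l w_l * log(1 + l) = 1."""
    logs = np.array([np.log(mavar(x, p * (1 + l), tau0)) for l in ells])
    alpha_hat = float(np.dot(w, logs))
    return (alpha_hat + 2.0) / 2.0

# two-point regression (l = 0, 1): w = (-1, 1)/log 2 satisfies both conditions
w2 = (-1.0 / np.log(2.0), 1.0 / np.log(2.0))
```

On a standard Brownian path (H = 1/2), built as a cumulative sum of white noise, the estimate lands near 0.5, as expected from α = 2H − 2 = −1.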
In the sequel we will give, under suitable assumptions, two convergence results in order to justify these approximations and to get the rate of convergence of α̂_n(p, w) toward α = 2H − 2. Note that we need to take p = p(n) → +∞ as n → +∞ in order to reach jointly the above two approximations.

THE ASYMPTOTIC NORMALITY OF THE ESTIMATOR
From now on we will always assume that X is a fractional Brownian motion (with H ∈ (1/2, 1)), so that the process {d_{p,k}}_{p,k} is also Gaussian. Under this assumption, and with the notation introduced before, we can state the following results.
Theorem 4.1. Let p = p(n) be a sequence of integers such that p(n) → +∞ and n p(n)^{−1} → +∞ as n → +∞. Then the vector of empirical variances ( σ̂²_{p(1+ℓ)}(n) )_{ℓ=0,...,l}, centered at ( σ²_{p(1+ℓ)} )_{ℓ=0,...,l} and rescaled by √(n_p) (τ₀ p)^{2−2H}, converges in distribution to a centered Gaussian vector with covariance matrix W(H), depending only on H.
From this Theorem, as an application of the δ-method, we can state the following result.
Theorem 4.2. Let α̂_n(p, w) be defined as in (25), for some finite integer l and a weight-vector w satisfying (24). If p = p(n) is a sequence of integers such that p(n) → +∞ sufficiently fast and n p(n)^{−1} → +∞, then √(n_p) ( α̂_n(p, w) − α ) converges in distribution to a centered Gaussian law with variance proportional to w*ᵀ W(H) w*, where α = 2H − 2, the vector w* is such that [w*]_ℓ := w_ℓ (1 + ℓ)^{2−2H}, and W(H) is the covariance matrix appearing in Theorem 4.1. Let us stress that from the above result, and due to the condition n p(n)^{−1} → +∞, it follows that the estimator α̂_n(p, w) is consistent.
Proof. Since n/p → +∞, without loss of generality we can assume that p(1 + l) ≤ p_max(n) = ⌊n/3⌋ for each n. We use the notation n_p = n − 3p + 1, and set p_ℓ := p(1 + ℓ) and n_ℓ := n_{p_ℓ} ∼ n. From the definition of the empirical variance, and applying Wick's rule for jointly Gaussian random variables (see the appendix), the covariance of two empirical variances can be expressed through the covariances of the d's, with u_{ℓ′ℓ} = p_ℓ − p_{ℓ′}. Since, by (22), the function G_H(q/p_ℓ, u_{ℓ′ℓ}/p_ℓ, p_ℓ) only depends on q/p_ℓ and on u_{ℓ′ℓ}/p_ℓ = (ℓ − ℓ′)/(1 + ℓ) as n → +∞, the resulting expression, multiplied by n_p (τ₀ p)^{4−4H}, converges to a finite strictly positive limit if and only if the corresponding series over the lag m converges. From (23), the summands decay as m^{4H−8}; thus the quantity (32) is controlled, in the limit n → ∞, by the sum Σ_{m=1}^{∞} m^{4H−8}, which is convergent since 4H − 8 < −4.
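The key reduction in this computation is Wick's (Isserlis') rule for fourth moments of centered jointly Gaussian variables, E[X²Y²] = E[X²]E[Y²] + 2E[XY]², i.e. Cov(X², Y²) = 2 Cov(X, Y)². A quick numerical sanity check of this identity (our own, with an arbitrarily chosen correlation 0.6):

```python
import numpy as np

# Wick's rule for centered jointly Gaussian (X, Y):
#   E[X^2 Y^2] = E[X^2] E[Y^2] + 2 E[XY]^2,  so  Cov(X^2, Y^2) = 2 Cov(X, Y)^2.
rng = np.random.default_rng(0)
rho = 0.6  # arbitrary correlation, chosen for the check
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=1_000_000)
x, y = z[:, 0], z[:, 1]
emp = np.cov(x ** 2, y ** 2)[0, 1]
exact = 2 * rho ** 2
print(emp, exact)  # both close to 0.72
```

The same identity, applied entrywise to the vector of the d_{p,k}'s, is what turns the covariance of the empirical variances into a sum of squared covariances.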
Proof of Theorem 4.1. As before, without loss of generality, we can assume that p(1 + l) ≤ p_max(n) for each n. Moreover, set again n_ℓ = n_{p_ℓ} and n_{ℓ′} = n_{p_{ℓ′}}. For a given real vector vᵀ = (v₀, . . ., v_l), let us consider the random variable T_n = T(p(n), l, v) defined as the linear combination of the empirical variances σ̂²_p(n), . . ., σ̂²_{p(1+l)}(n) given by

T_n := Σ_{ℓ=0}^{l} v_ℓ σ̂²_{p(1+ℓ)}(n).

In order to prove the convergence stated in Theorem 4.1, we have to show that the random variable

√(n_p) (τ₀ p)^{2−2H} ( T_n − Σ_{ℓ=0}^{l} v_ℓ σ²_{p(1+ℓ)} )

converges to the normal distribution with zero mean and variance vᵀ W(H) v. To this purpose, we note that T_n can be written as a quadratic form V_nᵀ A_n V_n, where V_n is the random vector with entries d_{p(1+ℓ),k}, for 0 ≤ ℓ ≤ l and 0 ≤ k ≤ n_ℓ − 1, and A_n is the diagonal matrix with entries v_ℓ / (2 τ_ℓ² n_ℓ), with τ_ℓ := τ₀ p(1 + ℓ); therefore, condition (1) of Lemma A.1 is satisfied.
In order to verify condition (2) of Lemma A.1, let C_n denote the covariance matrix of the random vector V_n, and let ρ[C_n] denote its spectral radius. By Lemma A.2, ρ[C_n] is controlled by the spectral radii of the matrices C_n^{p(1+ℓ)}, where C_n^{p(1+ℓ)} is the covariance matrix of the subvector (d_{p(1+ℓ),k})_{k=0,...,n_ℓ−1}. Applying the spectral radius estimate (41), and from equality (21), the required bound then reduces to the boundedness of the function γ_H(k/p, p, u, s/(τ₀ p)). Indeed, by Itô's formula one computes the covariance explicitly; inserting this expression in (11) and using Itô's formula again, after the change of variables s = τ₀(pv + k′) and ν = τ₀(pr + k), the claimed bound follows.

SOME COMMENTS

5.1. The modified Allan variance and the wavelet estimators. Suppose that X is a self-similar process with weakly stationary increments, and consider the generalized process Y = {Y(t), t ∈ R} defined through the set of identities

X(t) = ∫₀ᵗ Y(s) ds, t ∈ R.

In short, we write Y = Ẋ. From this definition, and with the notation introduced in Sec. 3, we can rewrite the MAVAR and its related estimator in terms of Y. Now we claim that, for p fixed, the quantity d_{p,k} can be set in correspondence with a family of discrete wavelet transforms of the process Y, indexed by τ₀ and k. To see that, let us fix j ∈ N and k ∈ Z, and set τ₀ = 2^j and t₁ = 2^j, so that t_{i+k} = 2^j (i + k), for all i ∈ N. With this choice of the sequence of times, it is not difficult to construct a wavelet ψ(s) whose coefficients at scale 2^j reproduce the d_{p,k}'s, as required in Eq. (39). An easy check shows that the resulting function is a proper wavelet satisfying Eq. (39). Notice also that the components ψ_i, i = 1, . . ., p, of the mother wavelet are suitably translated and re-normalized Haar functions.
In the case p = 1, corresponding to the classical Allan variance, the mother wavelet is exactly given by the Haar function, as was already pointed out in [4].
Though the wavelet representation could be convenient in many respects, the Haar mother wavelet does not satisfy one of the conditions which are usually required in order to study the convergence of the estimator (see (W2) in [20]). Moreover, there is a fundamental difference between the two methods: in the wavelet setting the log-regression is done over the scale parameter τ₀ for p fixed, while the MAVAR log-regression given in (25) is performed over p and for τ₀ fixed.

5.2. The modified Hadamard variance. Further generalizing the notion of the MAVAR, one can define the modified Hadamard variance (MHVAR): for fixed integers p, Q and τ₀ ∈ R⁺, one replaces the second-order increments d_{p,k} by the Q-th order increments

d^{(Q)}_{p,k} := (1/p) Σ_{i=1}^{p} Σ_{q=0}^{Q} c(Q, q) X_{i+k+qp},

where c(Q, q) := (−1)^q Q! / (q! (Q − q)!). Notice that for Q = 2 we recover the modified Allan variance. The MHVAR is again a time-domain quantity which has been introduced in [8] for the analysis of the network traffic. A standard estimator for this variance is given by the corresponding empirical variance, in analogy with (9). This suggests that convergence results, similar to Theorems 4.1 and 4.2, can be achieved also for the MHVAR and its related log-regression estimator.
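For Q = 2 the coefficients c(Q, q) reduce to the second-difference weights (1, −2, 1) underlying the MAVAR. A minimal sketch of the coefficient computation (helper name ours):

```python
from math import comb

def hadamard_coeffs(Q):
    """Coefficients c(Q, q) = (-1)^q * Q! / (q! (Q - q)!) of the Q-th
    order finite difference underlying the MHVAR."""
    return [(-1) ** q * comb(Q, q) for q in range(Q + 1)]
```

Since the coefficients sum to zero for every Q ≥ 1, the Q-th order increment filters out constants (and, more generally, polynomial trends of degree Q − 1).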
5.3. The case of stationary processes. In applications, the MAVAR and the MHVAR are also used to estimate the memory parameter of long-range dependent processes. This general case is not included in our analysis (which is restricted to the fractional Brownian motion), and it requires a more involved investigation. To our knowledge, there are no theoretical results in this direction.
