A Study of Wavelet Analysis and Data Extraction from Second-Order Self-Similar Time Series

Statistical analysis and synthesis of self-similar discrete time signals are presented. The analysis equation is formally defined through a special family of basis functions, of which the simplest case matches the Haar wavelet. The original discrete time series is synthesized without loss by a linear combination of the basis functions after some scaling, displacement, and phase shift. The decomposition is then used to synthesize a new second-order self-similar signal with a different Hurst index than the original. The components are also used to describe the behavior of the estimated mean and variance of self-similar discrete time series. It is shown that the sample mean, although unbiased, provides less information about the process mean as the Hurst index increases. It is also demonstrated that the classical variance estimator is biased and that the widely accepted aggregated-variance-based estimator of the Hurst index is biased not by nature (it is in fact unbiased and has minimal variance) but because of flaws in its implementation. Using the proposed decomposition, the correct estimation of the Variance Plot is described, as well as its close association with the popular Logscale Diagram.


Introduction
In the past decades, self-similar processes and long-range dependence (LRD, or long memory) have been applied to the study and modeling of many natural and man-made complex phenomena. These kinds of processes have been particularly attractive in the pursuit of optimal design and configuration of network communications.
The published work of Leland et al. in 1993 and 1994 [1, 2] demonstrated that Ethernet traffic is statistically self-similar and that the commonly used models are unable to capture that fractal behavior, highlighting that burstiness and LRD are present when H > 0.5. Since then, researchers have studied long memory processes and their impact on network performance extensively. For example, Karagiannis et al. stated that the identification of LRD is not trivial and that not all scenarios in modern networks present LRD characteristics; for instance, traffic in the Internet backbone is more likely to be Poisson type instead of LRD [3].
Many researchers have also addressed whether network traffic is sufficiently modeled by self-similar processes or a more general model is needed, for example, one that considers multiscaling or multifractality [4-6]. The advantage of modeling complex systems with self-similar processes is that the correlation structure is defined by a single parameter: the Hurst index (H).
Unlike other statistics, the Hurst index, although mathematically well defined, cannot be estimated unambiguously from real-world samples. Several methods have therefore been developed to estimate it. Examples of classical estimators are those based on the R/S statistic [7] (and its unbiased version [8]), detrended fluctuation analysis (DFA) [8, 9], maximum likelihood (ML) [10], aggregated variance (VAR) [7], wavelet analysis [11, 12], and so forth. In [13], Clegg developed an empirical comparison of estimators for data in raw and corrupted form. An important observation is that the estimation of the Hurst index may differ from one estimator to another, and the selection of the most adequate estimator is a difficult task. This selection depends greatly on how well the data sample meets the assumptions the estimator is based on. However, through analytical and empirical studies, it has been found that the estimators with the best performance in bias and standard deviation, and, consequently, in mean squared error (MSE), are Whittle ML and the wavelet-based estimator proposed by Veitch and Abry in [11].
Of these two estimators, the wavelet-based one is computationally simpler and faster [7, 11].
In addition to the Hurst index, other statistical characteristics are needed to describe the phenomenon under study. The most common are the first- and second-order statistics, that is, mean, variance, and correlation. The classical estimators of these characteristics were proposed decades ago; for example, Kenney and Keeping demonstrated in 1939 and 1951 that the classical variance estimator is unbiased for independent and identically distributed Gaussian observations [14, 15]. Confidence intervals are also given for these estimations, for example, P[μ̂ ∈ (−3/√n, 3/√n)] ≈ 99% (where μ̂ is the sample mean estimated from a sample of size n) for a standardized white noise process. This confidence interval is narrower as the sample size increases.
It has been claimed that most processes satisfy those common assumptions [16]. However, other authors (Leland et al. [1], Taqqu et al. [5], Tsybakov and Georganas [17], Veitch and Abry [11], and many others) conclude that traffic characteristics present correlation and that the estimation of these statistics (including confidence intervals) with the classical estimators (which do not consider correlation) may lead to estimation errors and, consequently, to wrong decisions or inaccurate models, especially when data presents accentuated LRD.
Several estimators of the Hurst index have been proposed, but many of them do not consider the effect of correlation on the estimation of first- and second-order statistics, thus applying the classical formulae incorrectly. In particular, it has been claimed that the aggregated variance method can only be used as a heuristic and that the Variance Plot (also named Variance-time Plot; see Section 2.3) can only be used to check whether the time series is self-similar and, if so, to obtain a crude guess for the Hurst index [22, page 44]. This work clarifies this point, demonstrating that the Variance Plot can be estimated efficiently and that the estimation of the Hurst index from it is actually unbiased and has minimum variance (similarly to the wavelet-based estimator).
This work is motivated by the mentioned importance of the self-similar processes in many areas, especially in the analysis and modeling of Internet traffic, and by the fact that there are still some misunderstandings and bad practices that must be overcome.
This document is organized as follows. Section 2 defines discrete time self-similarity and some of its statistical properties. Section 3 describes the proposed set of basis functions and the analysis and synthesis equations, which are used to generate self-similar samples whose Hurst index matches the estimator proposed by Veitch et al. and the corrected version of the variance-based estimator. Section 4 defines statistics for the sample mean and variance of self-similar discrete processes; Section 5 explains how the variance-based estimator has been misunderstood and defines the correct estimator, which coincides with the Haar wavelet-based one in a particular case. Section 6 presents some simulations and measurements and, finally, Section 7 concludes the work.

Self-Similarity
Self-similarity describes the phenomenon where certain properties are preserved irrespective of scaling in space or time. Deterministic self-similarity is clearly exemplified by popular figures such as Sierpinski's triangle or Koch's snowflake. This form of self-similarity is named scale invariance and makes different scales of the same object indistinguishable. Stochastic self-similarity is not that obvious; it refers to how the statistical properties of a stochastic process are preserved under time expansion. Stochastic self-similarity is defined for both continuous and discrete time stochastic processes.
Self-similarity (either continuous or discrete time) is tightly related to short- and long-range dependence (SRD and LRD, resp.). The degree of self-similarity is defined and measured through the so-named Hurst index H (0 < H < 1). It is known that processes with H < 0.5 are SRD and processes with H > 0.5 are LRD. If H = 0.5, neither SRD nor LRD is present. For example, the commonly used white Gaussian noise (WGN) always has H = 0.5 and does not present any time dependency.
Processes with LRD are also named long memory, as current and future realizations are strongly correlated. The dividing line between SRD and LRD processes is not ambiguous: for LRD processes, the autocovariance function is not absolutely convergent (i.e., the sum is not finite), while it is for SRD processes. This work refers only to stochastic discrete time self-similarity.
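This dividing line can be checked numerically. The sketch below (assuming the standard FGN autocovariance γ(k) = 0.5σ²[(k+1)^{2H} − 2k^{2H} + (k−1)^{2H}], which appears later as expression (8); the function name is illustrative) accumulates |γ(k)| for an SRD and an LRD case: the first partial sum stays bounded while the second keeps growing.

```python
import numpy as np

def fgn_acvf(k, H, sigma2=1.0):
    """Autocovariance of fractional Gaussian noise at lag k (expression (8))."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * sigma2 * ((k + 1)**(2*H) - 2*k**(2*H) + np.abs(k - 1)**(2*H))

lags = np.arange(1, 100001)
srd = np.cumsum(np.abs(fgn_acvf(lags, H=0.3)))   # absolutely convergent (SRD)
lrd = np.cumsum(np.abs(fgn_acvf(lags, H=0.9)))   # diverges (LRD)
print(srd[-1])   # bounded: stays below 0.5
print(lrd[-1])   # keeps growing with the number of lags
```

Note that for H = 0.5 the same formula gives γ(k) = 0 for all k ≥ 1, consistent with WGN having no time dependency.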

Discrete Time Self-Similarity.
The definition of discrete time stochastic self-similarity is given in terms of the aggregated processes. Let {X_t; t ∈ N} be a discrete time series derived from a self-similar process with stationary increments and Hurst index H (H-SSSI). The aggregated time series derived from X_t is the sequence given by [17]

X^(m) = {X^(m)_k; k ∈ N},

where each term is defined as

X^(m)_k = (1/m) ∑_{t=(k−1)m+1}^{km} X_t,

and m represents the aggregation level. That is, each new time series is obtained by partitioning the original time series into nonoverlapping blocks of size m and then averaging each block to obtain its respective values. Let X_t be a covariance stationary discrete time series with mean μ_X = 0, variance σ²_X, and autocovariance function (ACvF) γ_X(k), and let X^(m) be its aggregated series. Then X_t is said to be self-similar (H-SS) if the following holds [18]:

X_t ∼ m^{1−H} X^(m)_k,

where ∼ means equality in distribution.

The many methods for generating artificial discrete time self-similar sequences are classified into sequential and fixed-length. All of these have particular advantages and shortcomings associated with accuracy, generation time, memory or processing resources, and so forth. Sequential generators are sometimes more appropriate for long-duration simulations, but a level of approximation and several parameters are needed, while fixed-length generators depend only on the desired Hurst index but may need to store part or all of the sequence in memory. For the purpose of the simulations described in Section 6, a fixed-length Davies-Harte fractional Gaussian noise (FGN) generator is used. FGN is a special type of noise whose autocovariance function has a particular shape (expression (8)). Section 3.2 describes a theoretical method to synthesize self-similar discrete sequences using a Haar wavelet-based decomposition. Although Daubechies wavelet- (DW-) based approximations are better in mean and variance, the Haar-based method generates a self-similar sequence of specified Hurst index from practically any given sequence, regardless of whether it is self-similar or not, provided some conditions are met [19]. For other applications, DW are preferred; see [20] as an example.
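The aggregation operation itself is simple to implement: partition the series into non-overlapping blocks of size m and average each block. A minimal sketch (names are illustrative):

```python
import numpy as np

def aggregate(x, m):
    """Aggregated series X^(m): means of non-overlapping blocks of size m."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // m) * m          # drop an incomplete tail block, if any
    return x[:n].reshape(-1, m).mean(axis=1)

x = np.arange(1.0, 13.0)           # 1, 2, ..., 12
print(aggregate(x, 4))             # block means: [ 2.5  6.5 10.5]
```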

Properties of Self-Similar Discrete Time Series.
The definition of discrete stochastic self-similarity (3) has important implications for the stochastic process X_t; these implications include the following properties [21].

(i) Zero mean: E[X_t] = 0.

(ii) Power law of the qth-order moments: E[(X^(m)_k)^q] = m^{q(H−1)} E[(X_t)^q].

(iii) Power law of the qth-order absolute moments: E[|X^(m)_k|^q] = m^{q(H−1)} E[|X_t|^q].

Second-Order Discrete Self-Similarity.
The second-order definition of self-similarity is derived from (5) for q = 2. The variance of the aggregated time series is defined by the following [17]:

var(X^(m)) = σ²_X m^{2H−2}.

An equivalent definition is γ^(m)(k) = γ_X(k), where γ^(m)(k) is the autocovariance function of X^(m). If a discrete time series X_t satisfies these conditions, it is called second-order self-similar with Hurst index H (H-SOSS). Note that the mean of an H-SOSS process is not necessarily zero.
The plot of log[var(X^(m))] versus log(m) is known as the Variance Plot. It is a straight line of slope 2H − 2 for self-similar processes. This plot is the basis of the variance-based estimator of the Hurst index. It has been "shown" in the literature that the variance-based estimator underestimates the Hurst index and throws only a coarse estimation of the true Hurst index. Section 5 demonstrates that this is a consequence of inadequate implementations of this estimator; that is, the aggregated variance is estimated with the classical formula (36), which is not correct if any correlation exists. An apparent solution is to use the proposed unbiased estimator, but that leads to an ill-conditioned problem: the Hurst index is needed to estimate the variance and vice versa. The solution for this situation is also described (see Section 5.1).
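The slope claim can be illustrated without data: building the Variance Plot from the exact population variances var(X^(m)) = σ²m^{2H−2} and fitting a line in log-log coordinates recovers H exactly. A sketch with σ² = 1:

```python
import numpy as np

H = 0.8
m = 2.0 ** np.arange(0, 11)           # aggregation levels 1, 2, ..., 1024
var_m = m ** (2*H - 2)                # exact var(X^(m)) for sigma^2 = 1

# Variance Plot: log(var) versus log(m) is a line of slope 2H - 2
slope = np.polyfit(np.log10(m), np.log10(var_m), 1)[0]
H_hat = 1 + slope / 2
print(round(H_hat, 10))               # 0.8
```

With empirical data the variances on the vertical axis must of course be estimated, and that is precisely where the bias discussed in Section 5 enters.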

Wavelet Decomposition and the Logscale Diagram.
The wavelet decomposition transforms a signal X_t into a sum of orthogonal components as follows:

X_t = ∑_j ∑_k d_X(j, k) ψ_{j,k}(t),

where each function ψ_{j,k}(t) is derived from a basis function ψ_0(t), namely, the mother wavelet, by scaling and displacement, that is,

ψ_{j,k}(t) = 2^{−j/2} ψ_0(2^{−j} t − k),

and coefficient d_X(j, k) is the value at time k of scale j, computed as an inner product between the signal X_t and the wavelet function ψ_{j,k}(t):

d_X(j, k) = ⟨X(t), ψ_{j,k}(t)⟩.
The statistic S_2(j) is then defined from these coefficients as

S_2(j) = (1/n_j) ∑_k d_X(j, k)²,

which, for an H-SOSS process, is related to the Hurst index as

E[S_2(j)] = c_f 2^{j(2H−1)},

where the quantity c_f, related to the power of the process, is considered a constant. The plot of log_2 S_2(j) versus j forms the widely known Logscale Diagram described by Veitch and Abry [11]. The Logscale Diagram of an H-SOSS process is a straight line of slope 2H − 1. To obtain an unbiased estimation of the Hurst index based on the Logscale Diagram, it is also necessary to subtract the bias g_j that results from averaging the logarithms of the respective variances of real-world time series, which is estimated as in [11], where n_j is the number of coefficients available at octave j, that is, n_j = n/2^j. If the Logscale Diagram of a time series cannot be adequately modeled with a linear model, possibly the scaling behavior needs to be described with more than one scaling parameter; that is, the Hurst parameter is not adequate (or insufficient) [22, 23]. However, even if the time series under study is not self-similar, the Logscale Diagram can show whether or not it presents LRD.
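For the Haar case, S_2(j) can be computed with a few lines of pairwise averaging and differencing (the orthonormal Haar transform). The sketch below is illustrative, not the authors' implementation; n_j = n/2^j coefficients survive at octave j:

```python
import numpy as np

def haar_S2(x):
    """Mean squared Haar detail coefficient per octave j (orthonormal Haar)."""
    a = np.asarray(x, dtype=float)
    S2 = {}
    j = 1
    while len(a) >= 2:
        even, odd = a[0::2], a[1::2]
        d = (even - odd) / np.sqrt(2.0)   # detail coefficients d_X(j, k)
        a = (even + odd) / np.sqrt(2.0)   # approximation fed to the next octave
        S2[j] = float(np.mean(d**2))
        j += 1
    return S2, a                           # a holds the final approximation

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
S2, approx = haar_S2(x)
# Logscale Diagram: plot log2(S2[j]) versus j; slope close to 2H - 1 for H-SOSS data
```

Because the transform is orthonormal, the energies of the detail and approximation coefficients add up to the energy of the signal, which is a convenient sanity check.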

An Orthogonal Decomposition Performed by Subtracting Aggregated Series
The time series X_t can be decomposed into a set of time series (components), each one defined as

C_{a,j,t} = X̃^{(a^{j−1})}_t − X̃^{(a^j)}_t,

where X̃^{(a^j)} is the time series X_t after two operations, which are as follows.

(1) Aggregation of level a^j, as defined by (2).

(2) Expansion of level a^j, which consists of "repeating" each element of the aggregated time series a^j times.

These zero-mean components C_{a,j,t} have three important properties.
(i) They synthesize the original time series without loss (assuming zero mean), that is, X_t = ∑_{j≥1} C_{a,j,t}.

(ii) They are pairwise orthogonal: ⟨C_{a,i}, C_{a,j}⟩ = 0 for i ≠ j.

(iii) If X_t is exactly or at least second-order self-similar, then the variance of its components satisfies

var(C_{a,j}) = K_a σ²_X a^{(j−1)(2H−2)},     (19)

where K_a = 1 − a^{2H−2}.     (20)

Another useful property relates the variance of the component to the variances of the aggregated series, that is,

var(C_{a,j}) = var(X^{(a^{j−1})}) − var(X^{(a^j)}).     (21)

It is easy to prove (21): from (16), it turns out that var[C_{a,j}] = var[X̃^{(a^{j−1})}] + var[X̃^{(a^j)}] − 2 cov[X̃^{(a^{j−1})}, X̃^{(a^j)}], and (21) comes after a substitution.
Properties (i), (ii), and (iii) imply that

σ²_X = ∑_{j≥1} var(C_{a,j}).

Then, the variance of the jth component is related to the variance of X_t as

var(C_{a,j}) = σ²_X (1 − a^{2H−2}) a^{(j−1)(2H−2)}.     (24)

An immediate consequence of (24) is that the plot of j + log_a[var(C_{a,j})] versus j is a straight line for H-SOSS time series, and its slope β is related to the Hurst index so that β = 2H − 1.
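Properties (i) and (ii) are straightforward to verify numerically. The sketch below forms each component by subtracting consecutive expanded aggregated series, then checks lossless synthesis and pairwise orthogonality; names are illustrative:

```python
import numpy as np

def components(x, a=2):
    """Components C_{a,j}: expanded aggregation of level a^(j-1) minus level a^j.
    Requires len(x) to be a power of a."""
    x = np.asarray(x, dtype=float)
    comps, prev, m = [], x, 1
    while m < len(x):
        m *= a
        cur = np.repeat(x.reshape(-1, m).mean(axis=1), m)   # aggregate, then expand
        comps.append(prev - cur)                            # zero-mean component
        prev = cur                                          # ends as the repeated mean
    return comps

rng = np.random.default_rng(1)
x = rng.standard_normal(16)
C = components(x, a=2)
recon = sum(C) + x.mean()        # property (i): lossless synthesis (plus the mean)
```

The orthogonality in property (ii) follows because consecutive aggregations are nested projections onto block-constant sequences, so their differences lie in mutually orthogonal subspaces.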
For example, for a = 2 the statistic var(C_{2,j}) corresponds (up to normalization) to the Haar-wavelet statistic S_2(j), and in this case the basis function is the Haar wavelet, which is defined as [24, 25]

ψ_0(t) = 1 for 0 ≤ t < 1/2, −1 for 1/2 ≤ t < 1, and 0 otherwise.

Figure 1 shows the components obtained from an H-SOSS sample of size 32 and H = 0.9. The squared form of the components is due to the expansion of the aggregated series and disappears after the downsampling.

The authors of [26] described the aggregation as an inner product between the signal and the Haar "father" wavelet, and then the relation between wavelet coefficients and aggregation levels is obvious. At this point there are similarities between this section and that previous work, the most important being that the relation between aggregation levels and the Haar wavelet is described by Abry et al. However, two differences must be highlighted: (1) the decomposition presented in this work only coincides with that definition for a = 2 in (16); for higher values, the Haar wavelet is not sufficient to describe the components of (16); and (2) the authors of [26] discard anyway the estimation of the Hurst index based on the so-named "m-aggregation." In this work, it is clarified that the "m-aggregation" has been misunderstood, leading to incorrect implementations (i.e., the "classical" variance estimator), and a generalization of the "m-aggregation" is proposed.
The function defined by (27) is a generalized form of the Haar wavelet. It is always a rectangle-shaped function, but it is not symmetric about the horizontal axis except for a = 2.
Figure 3 shows the basis function for a = 3. Note that the phase shift moves the rectangle of height 2/3 from one third to another. In this case, there also exists redundant information, as ψ_{3,0,0,2}(t) = −ψ_{3,0,0,0}(t) − ψ_{3,0,0,1}(t) and, in the general case, ψ_{a,j,k,p}(t) = −∑_{q≠p} ψ_{a,j,k,q}(t), which means that one of the coefficients, for example, the one obtained with the last phase shift, can be discarded. This means that the sequence of coefficients can be downsampled without loss of information. For a sample of length n, it is easy to verify that the number of observations that remain in all components (sequences of coefficients) after the downsampling is n − 1, which can be complemented with the sample mean, in case this is not zero.
A new time series can then be synthesized as the weighted sum (29), where these weights w_j are defined by (30), in which Ŝ_2(j) and ĉ_f are the respective estimations of S_2(j) and c_f (the associated power parameter [4]) from X_t, and H_1 is the desired Hurst index of the new synthetic series. It is necessary that Ŝ_2(j) > 0 for all j = 1, …, J.
The weighted sum (29) can also be expressed in terms of the orthogonal components described in Section 3 and defined by (16), as in (31), where the weights are computed as in (32), with K_a defined by (20). Note that the only restriction of (32), similarly to (30), is that the estimated variance var(Ĉ_{a,j}) must be nonzero. Even though this synthesis does not depend on either the marginal distribution or the correlation structure of the input signal, it is preferable that the input is self-similar (e.g., FGN) and that its Hurst index is close to that of the target series, that is, H_1 ≈ H.
Pathological behavior can be produced in the output series under some critical conditions. For example, noticeable steps, which may produce a nonstationary signal, result from transforming an SRD or uncorrelated input signal into an LRD output with H close to 1. Impulses of very large magnitude (outliers) can also be produced when, for some j, Ŝ_2(j) is close to zero.
A similar methodology was developed by Deléchelle et al. [27] to synthesize fractional Gaussian noise by performing a weighted sum of the intrinsic mode functions (IMF) of a white noise process. The advantages of the proposed synthesis compared to that of Deléchelle et al. are that the components defined by (16) are exactly orthogonal, the relation between the weights for the reconstruction sum is mathematically well defined, and the Hurst index of the synthesized time series matches perfectly the wavelet estimator proposed by Abry et al. [4]; that is, the estimated H of the synthesized series is unbiased (E(Ĥ) = H) and has zero variance (var(Ĥ) = 0). The disadvantages are that the components are sequences of square-shaped signals (because of the expansion described in Section 3) and not sinusoids, and noticeable steps arise when synthesizing a time series with a high Hurst index, for example, close to 1, from an input that is SRD or weakly correlated. A solution for this problem is to apply interpolation (as in EMD) instead of expanding the series, in order to produce smoother components (sinusoidal or polynomial) instead of square type, with the consequence that the Hurst index is no longer exact, but approximate.

Estimation of Mean and Variance of Self-Similar Time Series

Sample Mean.
The sample mean of a self-similar process is unbiased: its expected value is the process mean, that is, E(μ̂) = μ_X, where μ̂ = (1/n) ∑_{t=1}^{n} X_t, regardless of the presence of correlation between observations. However, its variance depends not only on the sample size (n) but also on the degree of self-similarity (H) of the process, as follows:

var(μ̂) = σ²_X n^{2H−2},     (33)

which becomes var(μ̂) = σ²_X/n (the classical expression (34)) for H = 0.5 (uncorrelated observations).
Figure 4 shows the probability distribution function (PDF) of the sample mean of standardized FGN.
To derive (33), consider that μ̂, estimated from a sample of size n, behaves exactly the same as the stationary aggregated process X^(n), defined by (1), and its variance is determined by the definition of second-order self-similarity (7). Expression (33) can also be derived (for H > 0.5) from the autocorrelation coefficient ρ(k) = 0.5[(k + 1)^{2H} − 2k^{2H} + (k − 1)^{2H}] for k ≥ 1 (ρ(0) = 1) and var(μ̂) = (σ²_X/n²) ∑_{i,j=1}^{n} ρ(i − j). Important implications of (33) for the uncertainty of the mean are (1) that it increases with the Hurst index, for example, var(μ̂) tends to σ²_X as H tends to 1, which makes the sample mean worth a single observation, and (2) that it cannot be zero in any case when estimated from a finite-size sample.
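The double-sum derivation can be checked exactly: with the FGN autocorrelation coefficient ρ(k) = 0.5[(k+1)^{2H} − 2k^{2H} + (k−1)^{2H}], the quantity (1/n²)∑_{i,j=1}^{n} ρ(i−j) collapses to n^{2H−2} with no approximation. A sketch (σ²_X = 1):

```python
import numpy as np

def rho(k, H):
    """FGN autocorrelation coefficient; rho(0) = 1."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1)**(2*H) - 2*k**(2*H) + np.abs(k - 1)**(2*H))

n, H = 64, 0.7
i = np.arange(n)
var_mean = rho(i[:, None] - i[None, :], H).sum() / n**2  # (1/n^2) sum of rho(i - j)
print(var_mean, n ** (2*H - 2))                          # the two values coincide
```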

Sample Variance.
For a high number of observations, the sample variance is usually calculated as

σ̂² = (1/n) ∑_{t=1}^{n} (X_t − μ̂)².     (35)

It is known that estimator (35) is biased, so for small samples it is more adequate to use

σ̂² = (1/(n − 1)) ∑_{t=1}^{n} (X_t − μ̂)².     (36)

In particular, if the observations X_t are independent and come from a normal distribution, the estimator (35) is distributed as σ̂² ∼ (σ²_X/n) χ²_{n−1}, as stated by Cochran's theorem [28, 29]. Formula (36) is the most used estimator of the sample variance [14] but, as Beran indicates in [30], it is necessary to know which assumptions this estimator is based on in order to apply it correctly; otherwise, it may be a source of errors that in practice cannot be neglected in all cases.
A self-similar process is uncorrelated if and only if the Hurst index is 0.5. In this particular case, the classical estimator of the sample variance, defined by (36), is unbiased.
The expected value of the sample variance defined by (36) is given by (37), which can be expressed as (38); then,

E(σ̂²) = σ²_X (n − n^{2H−1})/(n − 1).     (39)

Expression (39) proves that the classical estimator (36) is biased (i.e., E(σ̂²) ≠ σ²_X) for H ≠ 0.5. It is straightforward that the unbiased variance estimator for self-similar processes is then

σ̂²_H = (1/(n − n^{2H−1})) ∑_{t=1}^{n} (X_t − μ̂)²,     (40)

which obviously becomes (36) for H = 0.5.
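A sketch of the corrected estimator, assuming the divisor n − n^{2H−1} implied by the bias expression (the function name is illustrative); at H = 0.5 it reduces exactly to the classical divisor n − 1:

```python
import numpy as np

def var_unbiased_H(x, H):
    """Variance estimator for an H-SOSS sample: divide the centered sum of
    squares by n - n^(2H-1) instead of n - 1 (a sketch of (40))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.sum((x - x.mean())**2) / (n - n**(2*H - 1))

x = np.array([1.0, 2.0, 4.0, 7.0])
print(var_unbiased_H(x, 0.5), np.var(x, ddof=1))   # both 7.0: classical case
print(var_unbiased_H(x, 0.9))                      # larger: compensates the H > 0.5 bias
```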
A plot of log_10 of the bias (σ²_X − E(σ̂²)) versus H is shown in Figure 5. Note that, for a fixed sample size, as H increases, the variance estimated by means of the classical estimator (36) becomes less significant.
Figure 5 exemplifies that as the sample size grows, the classical variance estimator has less bias; however, as the Hurst index of the process grows, the bias also grows (in magnitude). Note that as H approaches 1, the variance is considerably underestimated, which makes the classical variance estimator useless.
The variance of the estimated variance of a self-similar time series can be approximated by applying the formula proposed by Yunhua in [31] for μ = 0, that is, (41). This approximation is close to the variance of σ̂²_H, with the disadvantage that it has a discontinuity at H = 0.75. Further work can be developed in order to verify this approximation and to quantify its error.
Let us mention that, although the proposed estimator of the sample variance is unbiased, its performance relies on the estimation of the Hurst index. This dependence is very noticeable as H approaches 1, as the statistic 1 − n^{2Ĥ−2} is especially sensitive to the variation of Ĥ under that condition. The derivative ∂(1 − n^{2Ĥ−2})/∂Ĥ = −2 n^{2Ĥ−2} ln(n), plotted versus Ĥ, is shown in Figure 6. Note that as the estimated H increases, the estimation of the mean of the sample variance is more variable.
An immediate implication is that processes with Hurst index close to 1 must be treated carefully, as slight deviations in the Hurst index estimation lead to a non-negligible error in the estimation of the process variance.

Statistics of the Aggregated Process.
The aggregated process X^(m) (defined by (2)) derived from an H-SS process is also H-SS (self-similar with the same Hurst index). The same is true for H-SOSS processes, and this aggregated process is, by definition, identically distributed to the sample mean obtained from a set of n = m observations (m is the aggregation level, as in (2)); that is, X^(m) ∼ μ̂. Also, the aggregated sample variance obtained with the classical estimator (36) is biased, as is well known [32].
The variance of the aggregated series of an H-SOSS process must then be estimated with the unbiased formula (40) adapted to the number of observations in the sample, that is,

σ̂²_H(m) = (1/(n_m − n_m^{2H−1})) ∑_{k=1}^{n_m} (X^(m)_k − μ̂)²,     (42)

where n_m is the size of the series after aggregation (i.e., n_m = n/m). Note that the estimation of the sample mean from the aggregated sample is also unbiased, that is, E[μ̂^(m)] = E(μ̂), and it is more reliable than the mean estimated from a sample of the same size, that is, var[μ̂^(m)] = m^{2H−2} var(μ̂) if the samples are of equal size.

Statistics of the Orthogonal Components.
Let {X_t; t = 1, …, n} be a finite-length self-similar time series such that n = a^J (J < ∞) and a ≥ 2 (i.e., n is a power of a); then a set of J nonzero components (Ĉ_{a,j}; j = 1, …, J) can be obtained as expressed by the analysis equation (16). As the components are pairwise orthogonal, the variance of X (σ̂²_X) is the sum of a finite number of variances:

σ̂²_X = ∑_{j=1}^{J} var(Ĉ_{a,j}).     (43)

Expressions (43) and (19) imply that the variance of the jth component Ĉ_{a,j} (computed using formula (35)), which also has finite length, is given by (44). Estimating the variances in this way, (46) implies that the estimation of the variance of the components is unbiased (a desirable property) and, as a consequence, so is the estimation of the statistic S_2(j). Another implication of (46) is that the estimations of the Hurst index and the power parameter c_f from the Logscale Diagram are unbiased, as has previously been proven by the authors of [11].

Correlation of the Wavelet Coefficients.
Let us describe the autocovariance of the Haar wavelet coefficients, that is, the case a = 2. The structure of the jth component C_{2,j,t}, as a function of the elements of X_t, is given by (47). Assuming that X_t represents a zero-mean H-SOSS process, and according to definition (8), the expectation in (48) is calculated as in (49), and the correlation coefficient of the jth component is then given by (50). As (50) shows, the correlation structure is the same for all components; that is, ρ_C(k) is independent of j. Other implications of (50) are given by (51), which can be approximated as (52); that is, the coefficients are weakly correlated and the sum of correlations is finite [33]. Furthermore, none of the components (sequences of wavelet coefficients) can be a self-similar time series except for H = 0.5; for example, the components of a white noise process are also white noise processes. A plot of ρ_C(k) versus k is shown in Figure 7 for H = {0.1, 0.3, 0.5, 0.7, 0.9}. The sum ∑_{k=0}^{∞} ρ_C(k) versus H is shown in Figure 8.

Variance Plot-Based Estimation of the Hurst Index
As described in Section 2.3, the Variance Plot is a straight line for self-similar time series but, as many authors have claimed, it underestimates the Hurst index when working with real-world data. This is a consequence of the inadequate estimation of the aggregated variance, caused by the application of the classical formula (36) regardless of whether the original process presents any type of correlation. The solution would then be to apply the unbiased formula (42), but it leads to an ill-conditioned problem: the Hurst index needs to be estimated and known at the same time. Then, it is not that the Variance Plot is inadequate for estimating the Hurst index; rather, the flaw of those implementations is that the aggregated variance is underestimated. Nevertheless, it is actually possible to estimate the Hurst index analytically from the Variance Plot: the key is to choose the aggregation levels so that they form a geometric series, as explained in Section 5.1. Note that a numerical method can also be applied to estimate the Variance Plot and the Hurst index simultaneously, but the proposed solution is computationally simpler and more efficient.
Note that the Variance Plot can be estimated without bias even if the Hurst index is not estimated from it. Furthermore, if the aggregation levels are taken as m_j = 2^j, the estimator is exactly the same as the one that uses the Haar wavelet. The authors of [34] developed an empirical study of the estimation of the Hurst index from series with the presence of trends. They conclude that a method named differenced variance (a variation of the variance-based estimator) should not be used for estimating the Hurst index. The proposed solution is also a differenced-variance-type method, but it can be used to estimate the Hurst index without bias and with optimal variance. Evidently, when working with real-world traces, the Variance Plot may differ from the straight line, and an additional bias results from the logarithm, as E[log(⋅)] ≠ log[E(⋅)]. This bias can be subtracted analogously to (15), that is, as in (53), where g_j is the bias defined in (14).
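The effect of geometric aggregation levels can be illustrated with population quantities: with m_j = 2^j, consecutive differenced aggregated variances fall off by the constant factor 2^{2H−2}, from which H follows in closed form. A sketch with exact variances (σ² = 1):

```python
import numpy as np

H = 0.7
j = np.arange(0, 10)
var_agg = (2.0 ** j) ** (2*H - 2)         # exact var(X^(2^j)), sigma^2 = 1
var_comp = var_agg[:-1] - var_agg[1:]     # differenced (component) variances

ratio = var_comp[1:] / var_comp[:-1]      # constant ratio 2^(2H - 2)
H_hat = 1 + np.log2(ratio.mean()) / 2
print(round(float(H_hat), 10))            # 0.7
```

With estimated variances the ratios fluctuate, but the geometric spacing of the levels is what keeps the regression well conditioned.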

Simulation and Measurements
Estimation of the Sample Mean.
In order to verify the equations that describe the mean and variance of the sample mean, a set of zero-mean, unit-variance FGN time series of size n = 10^6 observations is generated using an implementation of the generator proposed by Davies and Harte in [35], one for each Hurst index H = {0.30, 0.50, 0.70, 0.90}. Then, the mean is estimated from blocks of size m = 100, and the empirical PDF is obtained from the estimations and compared to the classical (34) and proposed (33) expressions. Only the proposed expression (33) represents this phenomenon adequately in all four cases.
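For reproducing this kind of experiment on a small scale, an exact FGN sample can be drawn by Cholesky factorization of the covariance matrix; this is a simple O(n³) stand-in for the Davies-Harte generator used here, not the authors' code, and the names are illustrative:

```python
import numpy as np

def fgn_cov(n, H, sigma2=1.0):
    """Exact covariance matrix of n consecutive FGN observations."""
    k = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
    return 0.5 * sigma2 * ((k + 1)**(2*H) - 2*k**(2*H) + np.abs(k - 1)**(2*H))

def fgn_sample(n, H, rng):
    """Draw one exact FGN sample of length n via Cholesky factorization."""
    L = np.linalg.cholesky(fgn_cov(n, H))
    return L @ rng.standard_normal(n)

rng = np.random.default_rng(42)
x = fgn_sample(512, 0.9, rng)
block_means = x.reshape(-1, 64).mean(axis=1)
# var of these block means should be near 64**(2*0.9 - 2) ~ 0.435,
# far above the i.i.d. value 1/64 ~ 0.016
```

The Davies-Harte method achieves the same exact covariance structure in O(n log n) time via circulant embedding, which is why it is preferred for n = 10^6.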

Estimation of the Sample Variance.
The procedure followed to verify the proposed estimator of the sample variance (σ̂²_H) consists of generating a set of 100 FGN samples of size 1024 for each value of H = {0.05, 0.10, 0.15, …, 0.95} and estimating the variance using the classical formula (36) and the proposed estimator (40). The respective mean of both estimations for each set was obtained. For the estimation of the Hurst index, the wavelet estimator of Veitch and Abry [11] is used.
Figure 10 shows that the classical formula underestimates the variance noticeably for higher values of H; for H = 0.95, the estimated variance is less than half of the process variance. The proposed formula (40) does not underestimate the variance, but for high values of H the estimated variance differs significantly from one realization to another. This variation results from the estimation of the Hurst index, as the statistic 1 − n^{2Ĥ−2} is very sensitive to variations of Ĥ, which depends, in turn, on the accuracy of the sample generator. This is an indicator that the generator proposed by Davies and Harte may be less accurate as H gets closer to 1.
Figure 11 shows the variance of the estimated variance obtained with the proposed formula (40) compared to the approximation proposed by Yunhua in [31]. As expected, the variance of the estimation is close to zero (lower than 0.001) for H < 0.5, but as the Hurst index increases, it becomes noticeable when H > 0.75. This verifies the observation of [31], which says that beyond H = 0.75 the precision of the autocorrelation is about one order lower than when H < 0.75.

Synthesis of 𝐻-SOSS Time Series.
To exemplify the proposed wavelet-based synthesis (described in Section 3.2), four time series with respective Hurst indices 0.3, 0.5, 0.7, and 0.9 were synthesized from an FGN sample of size 1024. The Logscale Diagram of each of the four new time series was obtained and compared to that of the original sample. The plot of X_t versus t for each of the four synthesized series is shown in Figure 12. One can visually check the presence of positive correlation in Figures 12(c) and 12(d).
The Logscale Diagram of these artificial series is shown in Figure 13. Note that the Logscale Diagram of the original source sample is not a straight line (its estimated Hurst index is Ĥ = 0.56), but the estimated Hurst index of each of the four generated series is exactly the desired one, for example, Ĥ = 0.30 for the series shown in Figure 12(a), and their respective Logscale Diagrams are straight lines.

A further analysis concerns VoIP jitter data traces [36]. The discovery of LRD (a kind of asymptotic fractal scaling) and weak self-similarity in the VoIP jitter data traces was followed by further work showing evidence of multifractal behavior, a richer form of scaling behavior associated with nonuniform local variability, which could lead to a complete and robust model of IP network traffic over all time scales. Motivated by such concerns, the evidence for multifractal behavior of VoIP jitter data traces is reviewed here. In order to accomplish this analysis, the time series of VoIP jitter is decomposed into a set of components C_{2,j,t}, as defined by (16). The behavior of these components is used to determine the kind of asymptotic fractal scaling. If the variance of the components of a time series is modeled by a straight line, the time series exhibits monofractal behavior, and a linear regression can be applied in order to estimate the Hurst parameter. On the other hand, if the variance of the components cannot be adequately modeled with a linear model, then the scaling behavior should be described with more than one scaling parameter; that is, the time series exhibits multifractal behavior [23]. Figures 14 and 15 show the behavior of the components of the collected VoIP jitter data traces. Figure 14 shows the components of a VoIP jitter data trace that belongs to the data sets with SRD. It is observed that the variance of the components of this time series is modeled by a straight line; therefore, the time series exhibits monofractal behavior.
Figure 15 shows the component behavior of a VoIP jitter data trace that belongs to the data sets with LRD. It is observed that the variance of the components of this time series cannot be adequately modeled with a linear model, and the scaling behavior must be described with multiple scaling parameters (biscaling); therefore, this time series exhibits multifractal behavior. These results show that VoIP jitter with SRD or LRD exhibits monofractal or multifractal behavior, respectively. This phenomenon explains the behavior of the data traces with SRD and a high degree of self-similarity (scale invariance), because self-similarity is defined by a single scaling parameter. On the other hand, the data traces with LRD exhibit only weak self-similarity because they have an associated nonuniform local variability (multifractal behavior).
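The linearity test described above can be sketched in a few lines of Python. Here the component variances at dyadic aggregation levels are computed with Haar-like pairwise half-differences, an illustrative stand-in for the decomposition of (16); a least-squares line is then fit to the log-variances, and residuals well above sampling noise signal more than one scaling exponent. The function names and the residual tolerance are illustrative choices, not taken from the paper.

```python
import math
import random

def component_log_variances(x, levels):
    """log2 of the variance of Haar-like components at dyadic levels.

    Components are pairwise half-differences of successively aggregated
    copies of the series (an illustrative stand-in for (16))."""
    series = list(x)
    logs = []
    for _ in range(levels):
        comp = [(series[2*i] - series[2*i + 1]) / 2
                for i in range(len(series) // 2)]
        series = [(series[2*i] + series[2*i + 1]) / 2
                  for i in range(len(series) // 2)]
        mu = sum(comp) / len(comp)
        logs.append(math.log2(sum((c - mu) ** 2 for c in comp) / len(comp)))
    return logs

def scaling_diagnosis(x, levels=6, tol=0.5):
    """Fit a line to log-variance vs. level; return (monofractal?, slope).

    For an H-SOSS series the slope is s = 2H - 2; large residuals
    suggest multifractal (multi-exponent) scaling."""
    ys = component_log_variances(x, levels)
    n = len(ys)
    jbar = (n - 1) / 2
    ybar = sum(ys) / n
    sxx = sum((j - jbar) ** 2 for j in range(n))
    slope = sum((j - jbar) * (ys[j] - ybar) for j in range(n)) / sxx
    intercept = ybar - slope * jbar
    max_resid = max(abs(ys[j] - (intercept + slope * j)) for j in range(n))
    return max_resid < tol, slope
```

For uncorrelated noise (H = 0.5) the log-variances fall on a line of slope −1 up to sampling noise, so the diagnosis is monofractal.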

Mathematical Problems in Engineering
The implication of this behavior for VoIP and other interactive multimedia services is that the receiver's dejitter buffer may not be large enough to mask jitter with LRD and multifractal characteristics.

Conclusion
An orthogonal decomposition that constitutes a powerful statistical tool for studying discrete time series is presented. The resulting zero-mean components have three desirable properties: (1) they synthesize the source signal without loss, (2) they are pairwise orthogonal, and (3) their variances form a geometric series whose ratio is related to the Hurst index.
This decomposition is first compared to the Haar wavelet decomposition. For a particular case the two coincide, but the proposed one is more general, as other levels of aggregation may occur. In that case, the Haar wavelet is not sufficient, and a special class of basis functions is defined; the wavelet coefficients are obtained from the inner products between the signal under study and scaled, displaced, and phase-shifted versions of the basis function, in contrast to other wavelet decompositions, which apply only scaling and displacement. For a fixed scaling and displacement, the phase shift can be discarded (i.e., the components are downsampled), as it does not provide additional information.
The proposed decomposition can be used to estimate the Hurst index, as the plot of the logarithm of the component variances versus the aggregation level is equivalent to the Logscale Diagram proposed by Veitch and Abry in [11]. This plot is a straight line for H-SOSS time series, and its slope s is related to the Hurst index by s = 2H − 1.
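As a sketch of this equivalence, the Python fragment below builds a Logscale-Diagram-style estimate from orthonormal Haar detail coefficients, the special case of the proposed basis; the function names and the number of octaves are illustrative, and the plain least-squares fit omits the per-octave weighting used by Veitch and Abry.

```python
import math
import random

def haar_detail_log_variances(x, octaves):
    """log2 variance of orthonormal Haar detail coefficients per octave."""
    approx = list(x)
    logs = []
    for _ in range(octaves):
        detail = [(approx[2*i] - approx[2*i + 1]) / math.sqrt(2)
                  for i in range(len(approx) // 2)]
        approx = [(approx[2*i] + approx[2*i + 1]) / math.sqrt(2)
                  for i in range(len(approx) // 2)]
        mu = sum(detail) / len(detail)
        logs.append(math.log2(sum((d - mu) ** 2 for d in detail)
                              / len(detail)))
    return logs

def hurst_logscale(x, octaves=6):
    """The slope s of the diagram satisfies s = 2H - 1, so H = (s + 1)/2."""
    ys = haar_detail_log_variances(x, octaves)
    jbar = (octaves - 1) / 2
    ybar = sum(ys) / octaves
    slope = (sum((j - jbar) * (y - ybar) for j, y in enumerate(ys))
             / sum((j - jbar) ** 2 for j in range(octaves)))
    return (slope + 1) / 2
```

For uncorrelated noise the detail variances are flat across octaves (s = 0), giving H ≈ 0.5.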
The components can also be used to synthesize H-SOSS time series by means of the weighted sum defined by (31) and (32), regardless of the distribution of the source signal and whether or not it is self-similar. The synthesis is exact; that is, the Hurst index of the synthesized series is exactly the desired one, which is an advantage over the synthesis based on the IMFs of an EMD decomposition described in [27].
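The exact weights are those of (31) and (32), defined earlier in the paper. As a hedged illustration of the same idea, the sketch below Haar-decomposes a white-noise series, rescales each detail level so that the level variances form a geometric series with ratio 2^(2H−1), and inverts the transform. This is only an analogue of the paper's construction; the function name, the choice of white noise as the source, and the zeroing of the coarse approximation are assumptions.

```python
import math
import random

def synthesize_target_hurst(num_levels, depth, target_h, rng):
    """Rescale Haar detail levels of white noise so that the detail
    variances follow var_j = 2^(j * (2H - 1)), then invert.

    An illustrative analogue of a weighted-sum synthesis, not the
    exact construction of (31)-(32)."""
    n = 2 ** num_levels
    approx = [rng.gauss(0.0, 1.0) for _ in range(n)]
    details = []
    for _ in range(depth):                      # forward orthonormal Haar
        d = [(approx[2*i] - approx[2*i + 1]) / math.sqrt(2)
             for i in range(len(approx) // 2)]
        approx = [(approx[2*i] + approx[2*i + 1]) / math.sqrt(2)
                  for i in range(len(approx) // 2)]
        details.append(d)
    for j, d in enumerate(details):             # impose geometric variances
        mu = sum(d) / len(d)
        sd = math.sqrt(sum((v - mu) ** 2 for v in d) / len(d))
        scale = 2 ** (j * (2 * target_h - 1) / 2) / sd
        details[j] = [v * scale for v in d]
    approx = [0.0] * len(approx)                # drop the coarse approximation
    for d in reversed(details):                 # inverse transform
        merged = []
        for a, dv in zip(approx, d):
            merged.append((a + dv) / math.sqrt(2))
            merged.append((a - dv) / math.sqrt(2))
        approx = merged
    return approx
```

Because the Haar transform is orthonormal and invertible, re-decomposing the output recovers exactly the imposed detail variances, so the geometric ratio 2^(2H−1) between octaves is exact by construction.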
A study of the estimated mean and variance of self-similar time series is also presented. Both statistics, mean and variance, can be estimated without bias by applying the classic sample mean (X̄ = (1/n) ∑ Xᵢ) and the proposed variance estimator (40). The variance of these estimators depends on the Hurst index of the process. In the case of the sample mean, its variance is an increasing function of H, as expressed by (33): var(X̄) = σ²n^(2H−2), which lies in (σ²/n, σ²) for H ∈ (0.5, 1). Note that the sample mean becomes less significant as H approaches 1. For the variance of σ̂², it can be observed that it has the best performance when H = 0.5 (the variance of σ̂² is minimal). For H < 0.5, the variance increases as H decreases, but it remains acceptable (e.g., var(σ̂²) < 0.001σ⁴ when H → 0). But when H gets close to 1, the uncertainty of the sample variance increases rapidly, making the estimation of σ² less significant. It is also demonstrated (clarifying a popular misunderstanding) that the Variance Plot can be used to estimate the Hurst index efficiently. The claims of many researchers about the inefficiency of the Variance Plot are the result of an ill-conditioned problem: to estimate the Hurst index, the variance of the aggregated series is needed, and vice versa, leading to a vicious circle. It is shown how this problem is avoided by estimating the Variance Plot using a set of aggregation levels that follow a geometric series and calculating the differences between the variances of the corresponding aggregated series; then, a weighted linear regression is applied to estimate the slope ŝ, and the estimated Hurst index is Ĥ = ŝ/2 + 1. This result gives more significance to the results published by Abry et al. in [26].
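A minimal version of a Variance Plot over geometric aggregation levels can be sketched as follows, assuming averaging aggregation over m = 2^j. For brevity the sketch fits the log-variances directly with an ordinary (unweighted) least-squares regression, rather than using the variance differences and weighted regression described above; the function name is illustrative.

```python
import math
import random

def variance_plot_hurst(x, num_levels=6):
    """Variance Plot over geometric aggregation levels m = 2^j.

    For an H-SOSS series var(X^(m)) = sigma^2 * m^(2H - 2), so the
    slope s of log2 var vs. j gives H = s/2 + 1."""
    series = list(x)
    pts = []
    for j in range(num_levels):
        mu = sum(series) / len(series)
        var = sum((s - mu) ** 2 for s in series) / len(series)
        pts.append((j, math.log2(var)))
        # aggregate by averaging non-overlapping pairs (m doubles)
        series = [(series[2*i] + series[2*i + 1]) / 2
                  for i in range(len(series) // 2)]
    jbar = sum(j for j, _ in pts) / len(pts)
    ybar = sum(y for _, y in pts) / len(pts)
    slope = (sum((j - jbar) * (y - ybar) for j, y in pts)
             / sum((j - jbar) ** 2 for j, _ in pts))
    return slope / 2 + 1
```

For uncorrelated noise the aggregated variance halves at every doubling of m (slope −1), so the estimate is close to H = 0.5.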

Figure 5: Logarithm of the variance bias for different sample sizes.

Figure 8: Sum of correlation coefficients of the Haar wavelet components.

Figure 9 shows that the variance of the estimated mean does not fit the classical model when SRD or LRD is present (Figures 9(a), 9(c), and 9(d)); it fits only in the uncorrelated case (Figure 9(b)).

Figure 15: Components behavior of VoIP jitter data traces: multifractal behavior.