Smoothing the Sample Autocorrelation of Long-Range-Dependent Traffic

This paper describes our work on smoothing the sample autocorrelation function (ACF) of traffic. The experimental results show that the sample ACF of traffic may be smoothed by averaging. In addition, the results imply that the sum of the sample ACFs of traffic converges. Since the traffic data used in this research are long-range dependent (LRD), the latter may be meaningful for the theoretical research of LRD traffic.


Introduction
Let $x(t(i))$ be a sample record of a teletraffic time series (traffic for short), where $t(i)$ for $i \in \mathbf{N}$ (the set of natural numbers) is the series of timestamps, indicating the timestamp of the $i$th packet arriving at a server. Thus, $x(t(i))$ represents the size of the $i$th packet recorded at time $t(i)$ on a packet-by-packet basis. We use a real-traffic trace named BC-Aug89, recorded on an Ethernet at the Bellcore Morristown Research and Engineering facility, which contains 1,000,000 packets [1]. It was used in the pioneering work that revealed some fractal statistical properties of traffic, such as self-similarity and long-range dependence (LRD) [2, 3].
Note 1. The statistical properties described in the early literature, for example, [2, 3], turn out to be ubiquitous in today's traffic, according to the research reported in [4, 5]. Thus, the trace measured in 1989 keeps its value in the research of general patterns of traffic.
The word traffic is a collective noun. In addition to the traffic on a packet-by-packet basis as previously described, it may imply the time series called interarrival times, which is in the form (see [6, 7])
$$s(i) = t(i+1) - t(i). \qquad (1)$$
The term traffic may also imply the time series named accumulated traffic, denoted by $y(n)$, on an interval-by-interval basis, which is given by
$$y(n) = \sum_{(n-1)T \le t(i) < nT} x(t(i)), \qquad (2)$$
where $T$ is the interval width. It stands for the accumulated bytes of arrival traffic in the $n$th interval.
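As an illustration of the two derived series just described, the following is a minimal sketch and not the procedure used for BC-Aug89. The function names, the toy data, and the half-open interval convention are assumptions made only for this example.

```python
def interarrival_times(timestamps):
    """Differences between consecutive packet timestamps."""
    return [t1 - t0 for t0, t1 in zip(timestamps, timestamps[1:])]


def accumulated_traffic(timestamps, sizes, width):
    """Sum packet sizes (bytes) over consecutive intervals of the given width.

    Interval n (0-based here) covers [n*width, (n+1)*width); this half-open
    convention is an assumption of the sketch.
    """
    if not timestamps:
        return []
    n_bins = int(max(timestamps) // width) + 1
    totals = [0] * n_bins
    for t, size in zip(timestamps, sizes):
        totals[int(t // width)] += size
    return totals


# Hypothetical toy data: arrival times in seconds, packet sizes in bytes.
timestamps = [0.0, 0.25, 0.5, 1.5, 3.0]
sizes = [64, 1518, 64, 512, 64]

s = interarrival_times(timestamps)
y = accumulated_traffic(timestamps, sizes, width=1.0)
```

Here the third interval contains no packet, so its accumulated traffic is zero, which is the kind of gap associated with bursty arrivals.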
Note 3. The attributes of traffic are application dependent. Other meanings of traffic are available. It may mean the packet count [2]. In some applications, one may be interested in the number of connections within a given time interval [4], the packet size or the number of packets on a flow-by-flow basis [8], envelopes of traffic [9], or traffic bounds [10]. We will use $x(t)$ in the discussions instead of $x(t(i))$ for simplicity.
Remark 1 (burstiness). Without considering the Ethernet preamble, header, or CRC, the Ethernet protocol forces all packets to have a minimum size of 64 bytes and a maximum size of 1518 bytes. The fixed limit of 1518 bytes is specified by the IEEE standard without technical reason. Thus, it is often the case that the packet size of traffic takes the same value within a short period of time, as Figure 1(c) shows. In addition, traffic exhibits "burstiness." By burstiness, one means that if traffic is observed over a long period of time, there are no packets transmitted for a while, then a flurry of transmission, then no transmission for another long time, and so forth [11, 12]. This phenomenon, indicated in Figures 1(b)-1(d), was described as intermittency by Tobagi et al. [13].
Note 4. The intermittency of a random function is conventionally discussed in the field of turbulence [14], but we note that it is also a phenomenon of traffic.
The traffic series $x(t)$ is LRD. Denote its autocorrelation function (ACF) by $r(\tau)$ in the stationary case. Then,
$$r(\tau) = E[x(t)x(t+\tau)], \qquad (3)$$
where $E$ is the mean operator and $\tau$ is the lag. Its power spectrum density (PSD), denoted by $S(\omega)$, is the Fourier transform of $r(\tau)$:
$$S(\omega) = F[r(\tau)] = \int_{-\infty}^{\infty} r(\tau)e^{-j\omega\tau}\,d\tau, \qquad (4)$$
where $F$ is the operator of the Fourier transform and $\omega = 2\pi f$, where $f$ is the frequency. Note that the PSD of $x(t)$ belongs to the class of $1/f$ noise [15]. Thus, even from the viewpoint of data processing, $r(\tau)$ is preferred [16], because $S(\omega)$ is divergent at $\omega = 0$. In addition, a correlation model of traffic is desired in networking; see the statement of Paxson and Floyd [17, p. 5] as follows.
The issue of "how to go from the pure correlational structure, expressed in terms of a time series of packet arrivals per unit time, to the details of exactly when within each unit of time each individual packet arrives" has not been solved.

For this reason, we discuss the issue of ACF estimation of LRD traffic. The remainder of this paper is organized as follows. In Section 2, we briefly review the preliminaries. Smoothing the sample ACF of traffic is discussed in Section 3. A case study is shown in Section 4. Discussions and future work are in Section 5, which is followed by conclusions.

Brief of Time Series
Denote by $\{x_l(t)\}$ a set of sample functions for $l \in \mathbf{N}$ and $0 < t < \infty$, where $x_l(t) \in \mathbf{R}$ (the set of real numbers) is the $l$th sample function. A process consists of a set of sample functions [18, 19].
Note 5. In the case of traffic, requiring a set of sample functions $\{x_l(t)\}$ for $l \in \mathbf{N}$ at a specific point in a network may be unrealistic, since one can only measure a single history of a traffic trace at that point. One may never obtain a set of sample records of real traffic in the sense of repeated experiments under exactly the same conditions for $0 < t < \infty$. Therefore, in traffic engineering, we are interested in a sample function $x(t)$ instead of a process.
Note 6. We consider a time series $x(t)$ that is a random function. In this research, the terms random function, time series, and process are interchangeable if no confusion arises.

Moment.
Denote by $p(x, t)$ the probability density function (PDF) of a random function $x(t)$, which is usually written as $p(x)$ for short. Then, the following is called its moment of order $n$:
$$E[x^n(t)] = \int_{-\infty}^{\infty} x^n p(x, t)\,dx. \qquad (5)$$
The moment of a random function is consistent in expression with the moment of a force in physics and mechanics [20]. It serves as a useful tool to represent certain important characteristics of a random function.

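Mean and Mean Square.
Using the concept of the moment, one may conveniently represent the mean of $x(t)$ as its first-order moment, given by
$$\mu(t) = E[x(t)] = \int_{-\infty}^{\infty} x\,p(x, t)\,dx. \qquad (6)$$
The second moment of $x(t)$ may be its mean square value, denoted by
$$\Psi(t) = E[x^2(t)] = \int_{-\infty}^{\infty} x^2 p(x, t)\,dx. \qquad (7)$$
Note 7. The mean of $x(t)$ represents its average value, around which $x(t)$ fluctuates.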
Note 8. The mean square of $x(t)$ stands for its strength or average power. To explain this, assume that $x(t)$ is a voltage applied to a resistor of one ohm. In this case, $x^2(t)$ is the power the resistor consumes at time $t$. Therefore, (7) gives the average power and, hence, the strength of $x(t)$.
2.1.2. ACF. Now, we consider the product of $x(t)$ at two points, say $x(t_1)x(t_2)$ for $t_1 \neq t_2$. Since both $x(t_1)$ and $x(t_2)$ are random variables, we denote by $p(x_1, t_1; x_2, t_2)$ the joint PDF of $x(t_1)$ and $x(t_2)$. With the help of the concept of the moment, $E[x(t_1)x(t_2)]$ may be expressed in the form
$$r(t_1, t_2) = E[x(t_1)x(t_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2\, p(x_1, t_1; x_2, t_2)\,dx_1\,dx_2. \qquad (8)$$
The function $r(t_1, t_2)$ in (8) is called the ACF of $x(t)$. It represents how the random variable $x(t_1)$ at time $t_1$ correlates with the random variable $x(t_2)$ at another time $t_2$. In other words, it represents the correlation of $x(t)$ at two different points $t_1$ and $t_2$.
Note 9. In the case of $t_1 = t_2 = t$, the ACF of $x(t)$ reduces to its mean square:
$$r(t, t) = E[x^2(t)] = \Psi(t). \qquad (9)$$
Note 10. The term second-order moment of $x(t)$ may refer to the moments of order 2, which in the wide sense include the mean square (7) and the ACF (8).
Note 11. If we consider the moments of $x(t)$ up to order 2, $x(t)$ is called a second-order random function, which plays a role in engineering. The moments of orders higher than 2 correspond to the case of higher-order statistics, which we do not discuss in this paper.

PSD.
The Fourier transform of the ACF $r(t, \tau)$ is given by
$$S(t, \omega) = \int_{-\infty}^{\infty} r(t, \tau)e^{-j\omega\tau}\,d\tau.$$
It represents the energy distribution of $x(t)$.
Note 12. In general, $p(x, t)$ is time dependent. Therefore, the mean, mean square, ACF, and PSD may generally be time dependent. In the stationary case, they are independent of time.
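As a matter of fact, the PDF of a Gaussian random function is given by
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]. \qquad (16)$$
In (16), $\mu$ can be determined by (15), while $\sigma$ can be obtained from the following:
$$\sigma^2 = E[(x - \mu)^2].$$
Hence, Remark 4 results.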
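Note 16. If $x(t)$ is weakly stationary, its $\mu(t)$ and $\Psi(t)$ are constants. Its ACF depends only on the lag:
$$r(t_1, t_2) = r(\tau), \quad \tau = t_2 - t_1.$$
We list two properties of the ACF below.
P1: The ACF is an even function, $r(\tau) = r(-\tau)$.
P2: The ACF reaches its maximum at zero lag, $r(0) \ge |r(\tau)|$.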
Note 17. P1 is obvious because the correlation between $x(t_1)$ and $x(t_2)$ is always equal to that between $x(t_2)$ and $x(t_1)$. P2 is natural because the correlation of $x(t_1)$ with itself always reaches the maximum.
Without loss of generality for the statistical analysis of random functions, one may adopt the concept of normalized random functions. By normalized, we mean $r(0) = 1$ or $C(0) = 1$. Therefore, a normalized random function may be obtained by $x(t)/\sqrt{r(t, t)}$.

Computational Methods.
The previous expressions in (5)-(15) regarding the mean, variance, and ACF of a random function $x(t)$ are associated with its PDF. That implies that they can be determined under the condition that the PDF is known. However, that condition is usually too restrictive in practical engineering applications. Fortunately, Wiener et al. proposed a computation approach using the time average, without reference to the PDF, if $x(t)$ is ergodic [18, 31, 32].
Note 18. It may be very difficult, if not impossible, to test ergodicity from a single sample function of a traffic trace. In practice, one may simply assume that a traffic trace $x(t)$ is ergodic.
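In what follows, we suppose that $x(t)$ is causal. By causal, we mean that $x(t)$ is defined for $0 \le t < \infty$ and $x(t) = 0$ for $t < 0$. In addition, we only consider $x(t)$ in the weakly stationary sense. By using the time average, therefore, the mean of $x(t)$ is given by
$$\mu = \lim_{T \to \infty}\frac{1}{T}\int_0^T x(t)\,dt.$$
Its mean square is written as
$$\Psi = \lim_{T \to \infty}\frac{1}{T}\int_0^T x^2(t)\,dt.$$
Its ACF is expressed by
$$r(\tau) = \lim_{T \to \infty}\frac{1}{T}\int_0^T x(t)x(t+\tau)\,dt.$$
Its variance is given by
$$\sigma^2 = \lim_{T \to \infty}\frac{1}{T}\int_0^T [x(t) - \mu]^2\,dt.$$
Similarly, its autocovariance is given by
$$C(\tau) = \lim_{T \to \infty}\frac{1}{T}\int_0^T [x(t) - \mu][x(t+\tau) - \mu]\,dt.$$
In what follows, we only consider random functions with mean zero. Accordingly, $C(\tau)$ is equal to $r(\tau)$, and $\Psi = \sigma^2$ unless otherwise stated.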
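The time averages above can be sketched for a finite discrete record as follows. This is only an illustrative approximation: the limit is replaced by a finite sum, the function names are hypothetical, and dividing by the number of overlapping pairs is one possible normalization chosen for this example.

```python
def time_mean(x):
    """Finite-record time average of x."""
    return sum(x) / len(x)


def time_acf(x, lag):
    """Time average of x(i) * x(i + lag) over the overlapping pairs."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n


def time_autocov(x, lag):
    """Same as time_acf, but computed on the mean-removed record."""
    mu = time_mean(x)
    centered = [v - mu for v in x]
    return time_acf(centered, lag)


# Hypothetical zero-mean toy record (not traffic data).
x = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

mean = time_mean(x)
mean_square = time_acf(x, 0)   # the ACF at lag zero is the mean square
variance = time_autocov(x, 0)  # the autocovariance at lag zero is the variance
```

For this zero-mean record the autocovariance coincides with the ACF and the mean square coincides with the variance, as stated for the zero-mean case.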
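We write the PSD of $x(t)$ as
$$S(\omega) = \int_{-\infty}^{\infty} r(\tau)e^{-j\omega\tau}\,d\tau.$$
Alternatively, $S(\omega)$ can be expressed by
$$S(\omega) = \lim_{T \to \infty}\frac{1}{T}\left|\int_0^T x(t)e^{-j\omega t}\,dt\right|^2.$$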
(i) The mean of $x(t)$ exists.

Smoothing Sample ACF of Traffic
The previous discussions require that $-\infty < t < \infty$. Even in the case of $x(t)$ being causal, $0 < t < \infty$ is always required. However, that requirement cannot be satisfied in practice, since traffic $x(t)$ can be measured only in a finite time interval.

Sample ACF.
Suppose that one records traffic $x(t)$ in $[0, T]$. Then, one attains a sample ACF of $x(t)$, which may be estimated by $r_1(\tau)$:
$$r_1(\tau) = \frac{1}{T}\int_0^T x(t)x(t+\tau)\,dt.$$
Note 19. The sample ACF $r_1(\tau)$ may be a representative of the true ACF $r(\tau)$ in a way. Mathematically, $r_1(\tau) = r(\tau)$ under the condition of $T \to \infty$. Unfortunately, the condition $T \to \infty$ is physically unrealizable.
Now, assume that another person measures the traffic at the same point in the network, but in the time interval $[T, 2T]$. Then, the sample ACF is obtained by
$$r_2(\tau) = \frac{1}{T}\int_T^{2T} x(t)x(t+\tau)\,dt.$$
Due to the randomness of $x(t)$, errors in numerical computations, and errors in the measurement of traffic data, though the width of $[T, 2T]$ is equal to that of $[0, T]$, one has, in general,
$$r_1(\tau) \neq r_2(\tau). \qquad (28)$$
Denote by $r_n(\tau)$ the sample ACF of $x(t)$ in the interval $[(n-1)T, nT]$ for $n = 1, 2, \ldots, N$. Then,
$$r_n(\tau) = \frac{1}{T}\int_{(n-1)T}^{nT} x(t)x(t+\tau)\,dt.$$
For reasons similar to those given for (28), one generally has
$$r_m(\tau) \neq r_n(\tau), \quad m \neq n.$$
Note 20. It may be quite reasonable to consider each sample ACF $r_n(\tau)$ as an estimate of the true ACF of $x(t)$, but none of them may be appropriate unless $T$ is large enough. In fact, $r_n(\tau)$ is a random variable [32].
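To make the point about differing sample ACFs concrete, the sketch below computes the sample ACF of two adjacent blocks of one record. The record is a synthetic short-memory AR(1) series used only as a stand-in (it is not LRD and not the BC-Aug89 trace), the function name is hypothetical, and restricting the products to samples inside each block is a simplification assumed for this example.

```python
import random


def block_sample_acf(x, block_len, n, max_lag):
    """Sample ACF of the n-th block (n = 1, 2, ...) for lags 0..max_lag.

    Products are taken only inside the block and divided by the number
    of available pairs; both choices are assumptions of this sketch.
    """
    block = x[(n - 1) * block_len : n * block_len]
    acf = []
    for k in range(max_lag + 1):
        pairs = block_len - k
        acf.append(sum(block[i] * block[i + k] for i in range(pairs)) / pairs)
    return acf


# Synthetic AR(1) record with a fixed seed so the run is repeatable.
random.seed(0)
x, prev = [], 0.0
for _ in range(512):
    prev = 0.8 * prev + random.gauss(0.0, 1.0)
    x.append(prev)

r1 = block_sample_acf(x, block_len=256, n=1, max_lag=8)
r2 = block_sample_acf(x, block_len=256, n=2, max_lag=8)
max_gap = max(abs(a - b) for a, b in zip(r1, r2))
```

Although both blocks have the same width, `max_gap` is nonzero, which reflects the random nature of each block's sample ACF.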

Smoothed Sample ACF.
Denote by $\hat{r}(\tau)$ the ACF estimate of $x(t)$. The estimate $\hat{r}(\tau)$ is again a random variable. It has a distribution of the general form (31). Studying the concrete form of (31) is interesting, but it is beyond the scope of this paper. In this research, we are interested in a good estimate of $r(\tau)$.
By a good estimate, we mean one whose bias and variance are both small. Since the sample ACF $r_n(\tau)$ is unbiased, what one is interested in is finding a way to make $\mathrm{Var}[\hat{r}(\tau)]$ small. The literature about this is relatively rich; see, for example, [31] and references therein. A simple way to reduce $\mathrm{Var}[\hat{r}(\tau)]$ is averaging. That is, one may compute $\hat{r}(\tau)$ as the average of the sample ACFs as follows, assuming that both $\lim_{T \to \infty} r_n(\tau)$ and $\lim_{N \to \infty} \hat{r}(\tau)$ exist:
$$\hat{r}(\tau) = \frac{1}{N}\sum_{n=1}^{N} r_n(\tau).$$
In that case, $\mathrm{Var}[\hat{r}(\tau)]$ is inversely proportional to $N$ [31]:
$$\mathrm{Var}[\hat{r}(\tau)] \propto \frac{1}{N}.$$
Whether $\lim_{T \to \infty} r_n(\tau)$ or $\lim_{N \to \infty} \hat{r}(\tau)$ exists is an attractive research question, but it is outside the scope of this paper. In the experimental research discussed in this paper, we assume that both exist.
Note 21. For the purpose of ACF estimation of real traffic $x(t)$, the previous expression requires deliberately sectioning the sample record of a traffic trace into a set of blocks such that the number of blocks, that is, the average count $N$, is large enough for the desired level of $\mathrm{Var}[\hat{r}(\tau)]$.
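The variance reduction by averaging can be checked numerically with a small Monte Carlo sketch. It uses independent white Gaussian noise blocks, which is an assumption made only to keep the example simple. Blocks cut from LRD traffic are correlated, so the sketch illustrates the mechanism rather than the exact rate for real traffic. All names and parameter values are hypothetical.

```python
import random
import statistics


def lag1_sample_acf(block):
    """Sample ACF of one block at lag 1."""
    pairs = len(block) - 1
    return sum(block[i] * block[i + 1] for i in range(pairs)) / pairs


def averaged_estimate(blocks):
    """Average of the lag-1 sample ACFs over all blocks."""
    return sum(lag1_sample_acf(b) for b in blocks) / len(blocks)


random.seed(1)
BLOCK_LEN, N, TRIALS = 64, 16, 50

single, averaged = [], []
for _ in range(TRIALS):
    # N independent white-noise blocks per trial (assumption of the sketch).
    blocks = [[random.gauss(0.0, 1.0) for _ in range(BLOCK_LEN)] for _ in range(N)]
    single.append(lag1_sample_acf(blocks[0]))   # estimate from one block
    averaged.append(averaged_estimate(blocks))  # estimate averaged over N blocks

var_single = statistics.variance(single)
var_averaged = statistics.variance(averaged)
```

Across trials, `var_averaged` comes out much smaller than `var_single`, in line with the inverse dependence on the average count for independent blocks.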
Note 22. The ACF estimate $\hat{r}(\tau)$ is the average of the sample ACFs, that is, the sum of the sample ACFs divided by $N$. Other smoothing methods are available; see, for example, [40].
The previous discussions use integrals. In numerical computations, the integrals should be replaced by summations. In the discrete case, we replace $\tau$ by $k$ for $k = 1, 2, \ldots, L$. In addition, $r_n(\tau)$ is replaced by $r_n(k)$ and $x(t)$ by $x(i)$. Thus, we have
$$r_n(k) = \frac{1}{L}\sum_{i=(n-1)L}^{nL} x(i)x(i+k), \qquad \hat{r}(k) = \frac{1}{N}\sum_{n=1}^{N} r_n(k).$$
In practice, the above computation does not follow (35) directly. Instead, the fast Fourier transform (FFT) and its inverse (IFFT) are suggested. More precisely, in the interval $[(n-1)L, nL]$, according to the Wiener theorem [18-20, 23, 31, 32, 34, 35], we have
$$S_n(\omega) = \frac{1}{L}\left|\mathrm{FFT}[x(i)]\right|^2.$$
Then,
$$r_n(k) = \mathrm{IFFT}[S_n(\omega)].$$
Note 23. Usually, $L$ as well as $N$ takes the form of $2^m$, where $m$ is a positive integer.
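The FFT route described above can be sketched in self-contained form as follows. This is a sketch under stated assumptions and not the authors' code: the small recursive radix-2 FFT, the zero-padding used to avoid circular wrap-around, the division by the block length, and all function names are choices made for this example.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.

    The inverse transform is left unnormalised.
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def acf_fft(block):
    """Sample ACF of one block via |FFT|^2 followed by the inverse FFT."""
    size = len(block)
    padded = list(block) + [0.0] * size        # zero-pad to avoid wrap-around
    power = [abs(c) ** 2 for c in fft(padded)]
    corr = fft(power, inverse=True)
    # Divide by 2*size to normalise the inverse FFT, then by size for the average.
    return [corr[k].real / (2 * size) / size for k in range(size)]


def acf_direct(block):
    """Same quantity computed by direct summation, for comparison."""
    size = len(block)
    return [sum(block[i] * block[i + k] for i in range(size - k)) / size
            for k in range(size)]


def smoothed_acf(x, block_len, n_blocks):
    """Average the block sample ACFs over n_blocks consecutive blocks."""
    acfs = [acf_fft(x[n * block_len:(n + 1) * block_len]) for n in range(n_blocks)]
    return [sum(r[k] for r in acfs) / n_blocks for k in range(block_len)]


# Deterministic toy record (hypothetical, not traffic data).
x = [float((7 * i) % 5 - 2) for i in range(32)]
r_hat = smoothed_acf(x, block_len=8, n_blocks=4)
```

At lag zero the smoothed estimate equals the mean square of the whole record, and `acf_fft` agrees with `acf_direct` up to rounding, which gives a quick sanity check of the FFT route.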

A Case Study
Using the real-traffic trace BC-Aug89 in this case study, we set $L = 2048$. Averaging may reduce the variance of the sample ACF. Denote by $\hat{r}_{16}(k)$ the average of $r_n(k)$ for $n = 1, \ldots, 16$, by $\hat{r}_{32}(k)$ the average of $r_n(k)$ for $n = 1, \ldots, 32$, and by $\hat{r}_{64}(k)$ the average of $r_n(k)$ for $n = 1, \ldots, 64$. Figures 4, 5, and 6, respectively, show the smoothed sample ACFs $\hat{r}_{16}(k)$, $\hat{r}_{32}(k)$, and $\hat{r}_{64}(k)$. It can be seen that the fluctuations in Figure 3 are considerably reduced in $\hat{r}_{16}(k)$. In general, the larger the average count, the smoother the curve of the sample ACF estimate; see Figures 5 and 6.
Note 25. Though $\mathrm{Var}[\hat{r}(\tau)]$ is inversely proportional to the average count $N$, an overly large $N$ may be unnecessary for improving an estimate. For instance, one may see by eye that the curve in Figure 6 does not show much improvement over that in Figure 5.

Discussions and Future Work
The previous section exhibits the obvious effect of smoothing sample ACFs by averaging. However, there are critical points regarding the smoothing of sample ACFs of traffic that need discussion.
Traffic is LRD [1-5]. According to Taqqu's law, it is heavy tailed [41]. Resnick et al. [42] explained an important result regarding the sample ACF of heavy-tailed time series. It was stated in [42] that the sample ACF of a heavy-tailed series may be random when the sample size approaches infinity if the series has infinite variance. The case study in Section 4 demonstrates that the sum of sample ACFs is convergent. Consequently, the sample ACF is convergent too. Thus, may we infer that traffic, at least the data used there, has finite variance? The answer to that question may be desired in traffic theory. We shall work on it in the future. Finally, it is noted that the relationship between the sample size and the variance of the sample ACF is discussed in [43]. In addition, the relationship between the sample size and the variance bound of the sample ACF of fractional Gaussian noise with LRD is described in [44].

Conclusions
We have discussed the smoothing effect of averaging on the sample ACFs of traffic. Future research on whether traffic has finite or infinite variance has been noted.
2.1.4. Weak Stationarity. If all moments of $x(t)$ do not vary with time, $x(t)$ has the property of strong stationarity. If the moments up to order 2 are independent of time, irrespective of the moments of order higher than 2, we say that $x(t)$ is weakly stationary, or stationary in the wide sense.

Note 14. How much $x(t)$ varies away from its mean $\mu(t)$ is characterized by its variance or standard deviation.
Analysis of variance (ANOVA) is a branch of statistics that plays a role in many technical fields, especially statistical tests and experimental design [21-23].
2.2.2. Autocovariance. To characterize the correlation property of the zero-mean fluctuation $[x(t) - \mu(t)]$, one uses the autocovariance function denoted by $C(t_1, t_2)$ (ACF for short again).