ON A KINETIC MODEL OF THE INTERNET TRAFFIC

We present the modification of the Prigogine-Herman kinetic equation related to the network traffic. We discuss a solution of this equation for homogeneous time-independent situations and for the lognormal desired speed distribution function, obtained from the traffic measurements. This solution clearly shows two modes corresponding to individual flow patterns (low concentration mode) and to collective flow patterns (traffic jam mode). For situations with low concentration, there is almost a linear dependence of the information flow versus the concentration and the higher the average speed the lower the concentration at which the optimum flow takes place. When approaching the critical concentration, there are no essential differences in the flow for different average speeds, whereas for the individual flow regions there are dramatic differences.


Introduction
Within the global framework of the information society, fast and reliable data exchange between local and wide-area computer networks is a priority issue. In this connection, a major challenge for the emerging high-speed integrated-services communication networks is to elaborate models that can realistically capture basic features of the network traffic. Such models may serve as a basis for development of methods and tools for quality assessment, providing more efficient control and management of information flows on the Internet [7,17].
Prigogine and Herman developed (see [14] and the references therein) a model of vehicular traffic dynamics based on principles of statistical physics and kinetic theory. This model has proved to be quite successful for description and explanation of the main features of vehicular traffic [16]. In [6] we applied the kinetic approach to the network traffic measurements and discussed the first results on the network traffic dynamics. The present work is a revised and extended version of paper [6].
Section 2 describes the data acquisition system for traffic measurements, implemented on a standard IBM PC at the gateway of a medium-size local area network. In Section 3, we present some basic concepts of the kinetic equation and discuss its modification related to the network traffic. In Section 4, we discuss the form of the statistical distribution of the network traffic and the corresponding desired speed distribution function, which is needed for the kinetic equation. Section 5 analyzes solutions of the kinetic equation for the chosen desired distribution function.

Data acquisition system
Our study is based on the traffic measurements obtained at the input of the Dubna University (http://www.uni-dubna.ru) local area network (LAN), which comprises approximately 200-250 interconnected computers. Two protocols were used in the "Dubna" LAN: Net-BEUI, applied only for internal exchanges, and TCP/IP, used for external communications. The network traffic was measured at the external side of the LAN gateway. The data acquisition system is based on a driver operating in the open (promiscuous) mode [18]; see Figure 2.1.
Under standard conditions, the network adapter of a computer is in the carrier-detection mode (main harmonic 4-6 MHz). When the bits of a packet preamble appear in the cable, the network adapter enters bit and byte synchronization with the transmitter and starts receiving the first bytes of the frame header. As soon as the MAC address of the frame's recipient has been extracted from the first bytes received, the adapter compares it with its own address. If the addresses do not match, the adapter stops recording the frame's bytes into its internal buffer, clears the buffer, and waits for the next frame. In order to receive and analyze all the packets transmitted over the network, the adapter must be switched to the free (promiscuous) mode, in which all frames are recorded in the buffer. This operation is executed through the instructions of the NDIS driver.
The free mode driver records the accepted packets in the preliminary capture buffer and sets the packet-received flag. Then the packet-receiving module is activated and the packet type field is analyzed in order to extract the TCP/IP packets from the whole stream.
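The type-field check described above can be illustrated by a minimal sketch, assuming standard Ethernet II framing with an IPv4 payload. The function name `ipv4_protocol` and the sample frame are hypothetical; the original system worked at the NDIS driver level and is not reproduced here.

```python
import struct

ETHERTYPE_IPV4 = 0x0800   # EtherType value assigned to IPv4
ETH_HEADER_LEN = 14       # destination MAC (6) + source MAC (6) + EtherType (2)
MIN_IPV4_HEADER = 20      # IPv4 header length without options

def ipv4_protocol(frame: bytes):
    """Return the IP protocol number of an Ethernet II frame, or None if it is not IPv4."""
    if len(frame) < ETH_HEADER_LEN + MIN_IPV4_HEADER:
        return None                              # too short to hold both headers
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return None                              # non-IP traffic is filtered out
    return frame[ETH_HEADER_LEN + 9]             # byte 9 of the IPv4 header: protocol

# Hypothetical frame: zeroed MAC addresses, IPv4 EtherType, protocol 6 (TCP)
ip_header = bytearray(MIN_IPV4_HEADER)
ip_header[0] = 0x45                              # version 4, header length 5 words
ip_header[9] = 6
frame = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4) + bytes(ip_header)
print(ipv4_protocol(frame))
```

Only the header is inspected, which matches the idea of discarding the data block and keeping the headers.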
After identification, the data block can be separated and discarded, and the headers recorded in the SQL server database. Each header is recorded together with its time stamp, with a frequency up to 10 kHz. Although the recording is buffered, saving the packet headers requires enormous server resources, because data are continually written to the hard disk in small portions. For this reason, this mode is switched on only when required, at the instruction of the management system.
The system also controls the external traffic of the LAN by monitoring the records in the router table. Initial information on the legal IP addresses is stored in the database of the LAN computers, from which it is loaded into an array in main memory. Users that do not contribute to the external traffic are not taken into account when counting the transferred and received bytes. In order to reduce the number of sessions in which the external traffic information is written to the database, a buffer-flush timer and a date-change timer have been introduced into the system.
The recorded traffic data correspond to approximately 20 hours of measurements (1600000 records with a frequency up to 10 kHz, which corresponds to 1 millisecond bin size). A part of this series, corresponding to approximately 1 hour of measurements and aggregated with different bin sizes, is presented in Figure 2.2. The contribution of the Net-BEUI traffic has been estimated at around 1-6 packets per second during daily working hours. This is negligibly small compared to the TCP/IP traffic, so we may neglect the influence of non-IP traffic on the TCP/IP traffic.
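Aggregation with a given bin size can be sketched as follows. This is a minimal illustration, assuming that each record is a (time stamp in seconds, packet size in bytes) pair and that the aggregated quantity is the total size per bin; the record layout and the function name `aggregate` are assumptions, not the format of the original database.

```python
from collections import defaultdict

def aggregate(records, bin_size):
    """Sum packet sizes over consecutive time bins of width bin_size (seconds).

    records: iterable of (timestamp_seconds, size_bytes) pairs (assumed layout).
    Returns per-bin totals from bin 0 to the last occupied bin, empty bins included.
    """
    totals = defaultdict(int)
    last_bin = -1
    for timestamp, size in records:
        index = int(timestamp // bin_size)       # index of the bin holding this packet
        totals[index] += size
        last_bin = max(last_bin, index)
    return [totals.get(i, 0) for i in range(last_bin + 1)]

# Hypothetical records, for illustration only
records = [(0.0004, 60), (0.0012, 1500), (0.0150, 40), (0.1200, 576), (1.0500, 1500)]
print(aggregate(records, 1.0))   # [2176, 1500]
```

Calling `aggregate` on the same records with smaller bin sizes gives series of the kind compared in Figure 2.2.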

Basic concepts of the kinetic model of network traffic
Following the basic concepts of the Prigogine-Herman kinetic model of vehicular traffic [14], we assume that for the network traffic as well, the velocity distribution function $f(x,v,t)$ satisfies the following kinetic equation:

$$\frac{\partial f}{\partial t} + v\frac{\partial f}{\partial x} = -\frac{f - f_0}{T} + c(\bar v - v)(1 - P)f. \tag{3.1}$$

Here $f_0$ is the "desired" velocity distribution function [14], $x$ and $v$ are the position and velocity of the information packets ("vehicles"), $\bar v$ is the average velocity, $c$ is the concentration of information packets, $P$ is the probability of "passing" in the sense of increase of information flow, and $T$ is the relaxation time.
In particular, for the network traffic we assume the following: (i) the space dependence x is omitted, because the passing of packets is a homogeneous space-independent process; (ii) the velocity v is the amount of information passing during the time interval corresponding to the chosen aggregation window; (iii) the time t is discrete, with the unit equal to the size of the aggregation window (for example, 1 second).
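The number of information packets $dN$ within the velocity interval between $v$ and $v + dv$ at time $t$ is given by

$$dN = f(v,t)\,dv. \tag{3.2}$$

Once the velocity distribution function $f(v,t)$ is known, we may derive other quantities involved in (3.1), such as the concentration $c(t)$, the average velocity $\bar v(t)$, and the information flow $q(t)$,

$$c(t) = \int_0^\infty f(v,t)\,dv, \tag{3.3}$$

$$\bar v(t) = \frac{1}{c(t)}\int_0^\infty v f(v,t)\,dv, \tag{3.4}$$

$$q(t) = c(t)\,\bar v(t) = \int_0^\infty v f(v,t)\,dv. \tag{3.5}$$

To find the distribution function $f(v,t)$, we first study the stationary solution $f(v)$, which satisfies either of the following equivalent equations:

$$-\frac{f - f_0}{T} + c(\bar v - v)(1 - P)f = 0, \qquad f = \frac{f_0}{1 - cT(\bar v - v)(1 - P)}. \tag{3.6}$$

The quantity $f(v)$ describes the situation in which there is a steady state between the slowing down of information transfer caused by the "interaction processes" and the speeding up corresponding to more effective passing of information through the network.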
In order to perform calculations, we need to know how the relaxation time $T$ and the probability $P$ of passing change with concentration. Following [14], we suppose that the relaxation time $T$ is of the form

$$T = \tau\,\frac{1 - P}{P}, \tag{3.7}$$

where $\tau$ is an arbitrary parameter. We also assume [14] that the probability $P$ of passing has the form

$$P = 1 - \frac{c}{c_p} = 1 - \eta, \tag{3.8}$$

where $c_p$ is the limiting or jam concentration, at which the information packets can no longer pass, and $\eta = c/c_p$ is the normalized concentration. Note that for small concentration $c$ the relaxation time $T$ is also small, whereas for large concentrations the relaxation time is very long.
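The limiting behavior of the relaxation time can be checked with a short sketch. It assumes the Prigogine-Herman forms $P = 1 - \eta$ and $T = \tau(1 - P)/P$ with $\eta = c/c_p$; the function names are illustrative only.

```python
def passing_probability(eta: float) -> float:
    """P = 1 - eta, where eta = c / c_p is the normalized concentration (assumed form)."""
    return 1.0 - eta

def relaxation_time(eta: float, tau: float) -> float:
    """T = tau * (1 - P) / P (assumed form); grows without bound as eta approaches 1."""
    p = passing_probability(eta)
    if p <= 0.0:
        raise ValueError("eta must stay below 1, the jam concentration")
    return tau * (1.0 - p) / p

# T is small at low concentration and very long near the jam concentration
for eta in (0.01, 0.5, 0.99):
    print(eta, relaxation_time(eta, tau=1.0))
```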
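We go back to the stationary solution of the kinetic equation. The denominator can be rewritten as follows:

$$1 - cT(\bar v - v)(1 - P) = cT(1 - P)\,v + \bigl[1 - cT\bar v(1 - P)\bigr]. \tag{3.9}$$

In the case

$$1 - cT\bar v(1 - P) > 0, \tag{3.10}$$

$f$ is given by

$$f(v) = \frac{f_0(v)}{cT(1 - P)\,v + \bigl[1 - cT\bar v(1 - P)\bigr]}. \tag{3.11}$$

This solution reduces to the desired speed distribution function $f_0$ in the limit $c \to 0$. If $1 - cT\bar v(1 - P) < 0$, we have no solution because the distribution function cannot be negative.
In the case

$$1 - cT\bar v(1 - P) = 0, \tag{3.12}$$

$f$ is given by

$$f(v) = \frac{f_0(v)}{cT(1 - P)\,v}. \tag{3.13}$$

The important feature of (3.13) is that it admits the deterministic solution

$$f(v) = \alpha\,\delta(v), \tag{3.14}$$

where $\alpha$ is an arbitrary constant and $\delta(v)$ is the Dirac delta function.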
As a result, we have the following general form of the stationary solution:

$$f(v) = \frac{f_0(v)}{1 - cT(\bar v - v)(1 - P)} + \alpha\,\delta\bigl(1 - cT(\bar v - v)(1 - P)\bigr), \tag{3.15}$$

where $\alpha$ is unknown. However, in the case (3.10), the singularity occurs at a negative value of $v$ and we, therefore, have to take $\alpha = 0$. On the contrary, if we had $1 - cT\bar v(1 - P) < 0$, the singularity would occur at a positive value of $v$, which is impossible. Therefore, we have only two possibilities, given by (3.11) and (3.13). The solution given by (3.11) corresponds to what may be called the individual flow pattern and is related in a simple way to the ideal or desired speed distribution function. The second solution, given by (3.13), corresponds to what may be called the collective flow pattern. This solution is characterized by the occurrence of a singularity at the origin. However, the critical concentration at which the individual flow becomes collective does not depend on the desired speed distribution function.
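The admissibility of the delta-function term can be checked directly. The following is a sketch, assuming the stationary equation has the Prigogine-Herman form $f\,[1 - cT(\bar v - v)(1 - P)] = f_0$; at the critical concentration, where $1 - cT\bar v(1 - P) = 0$, it reduces to the first line below.

```latex
% Stationary equation at the critical concentration
cT(1-P)\,v\,f(v) = f_0(v).
% Adding \alpha\,\delta(v) to a solution changes the left-hand side by
cT(1-P)\,\alpha\,v\,\delta(v) = 0,
% because v\,\delta(v) vanishes identically. Hence
f(v) = \frac{f_0(v)}{cT(1-P)\,v} + \alpha\,\delta(v)
% satisfies the same equation for every constant \alpha.
```

The term $\alpha\,\delta(v)$ carries packets at zero velocity, which is consistent with reading this branch as the collective (traffic jam) mode.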

Lognormal distribution of traffic measurements
In [2,3,4] we systematically applied the nonlinear time series analysis approach [1] to the above-described traffic measurements. It has been found that these techniques can be successfully used for deeper understanding of the main features of the traffic data. In order to reconstruct the underlying dynamical system, we estimated the correlation length and the embedding dimension of the traffic series. Reliable values of the correlation length and the embedding dimension made it possible to apply a layered neural network for identification and reconstruction of the dynamical system. We have found that the trained neural network reproduces the packet size distribution of real measurements aggregated with 1 second bin size. This distribution looks like the lognormal distribution [3,9].
The lognormal distribution was first observed, to our knowledge, by Lucas et al. [12] for the empirical probability distributions of packet arrivals aggregated at 100 milliseconds. Later they developed the background traffic model, or (M,P,S) model [11], which realistically generated the aggregated traffic flows for a large campus network. The lognormal distributions for packet arrivals were observed at different stream scales [11]. Similar interarrival time distributions for channel arrivals were observed in cellular telephony [15]. However, there was no reliable explanation of the reasons responsible for the appearance of such a distribution.
Having available traffic data measured at high frequency (each arriving packet was recorded independently, see Section 2), we were able to analyze the influence of the aggregation bin on the form of the packet size distribution. (The detailed analysis of the influence of the aggregation process on the form of the statistical distribution of the traffic measurements is given in paper [5].) One can clearly see that for small bin sizes, the packet size distributions have a rather chaotic and nonsystematic character. However, when the aggregation bin size approaches 1 second (see Figure 4.4), the distribution assumes a stable form that does not change with further increase of the aggregation bin.
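The distribution in Figure 4.4 is well approximated by the lognormal function [9]

$$f(x) = \frac{A}{\sqrt{2\pi}\,\sigma x}\exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \tag{4.1}$$

where $x$ is the variable, $\sigma$ and $\mu$ are the parameters of the lognormal distribution, and $A$ is the normalizing multiplier.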
The fit was performed with the MINUIT package [10] within the well-known PAW (Physics Analysis Workstation) system; see details in [8]. The MINUIT package is conceived as a tool to find the minimum value of a multiparameter function and to analyze the shape of the function around the minimum [10]. In Table 4.1, we present the results of fitting the lognormal distribution to the packet size distributions aggregated with different bin sizes. (In paper [6] the $\chi^2$ values of Table 4.1 were given with errors caused by incorrect output of the MINUIT package within PAW.) Here the calculated value of the minimized function FCN, usually defined as $\chi^2$ [10], is

$$\chi^2(a) = \sum_{i=1}^{N}\frac{\bigl[x_i - y_i(a)\bigr]^2}{e_i^2}, \tag{4.2}$$

where $a$ is the parameter vector, $x_i$ is the observed content of channel $i$, $y_i(a)$ is the corresponding value of the fitting function, $e_i^2 = x_i$ is the square of the error on the individual observations, and $N$ is the number of channels in the fitting histogram.
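The minimized function can be written out in a few lines. This is a sketch of the usual least-squares $\chi^2$ with $e_i^2 = x_i$, not the MINUIT implementation itself; skipping empty channels (where the error would vanish) is an assumption made here, and the histogram values are hypothetical.

```python
def chi_square(observed, fitted):
    """Sum over channels of (x_i - y_i)^2 / e_i^2 with e_i^2 = x_i.

    observed: channel contents x_i of the histogram.
    fitted:   values y_i of the fitting function in the same channels.
    Channels with x_i == 0 are skipped (assumption: their error is undefined).
    """
    total = 0.0
    for x, y in zip(observed, fitted):
        if x > 0:
            total += (x - y) ** 2 / x
    return total

# Hypothetical three-channel histogram
observed = [100, 50, 25]
fitted = [90.0, 55.0, 25.0]
print(chi_square(observed, fitted))   # 1.5
```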
One can see that the fitting curve corresponding to the lognormal distribution approximates the experimental distribution with reasonable accuracy over the whole range of the analyzed distribution. However, as can be seen from Table 4.1, the experimental distributions did not pass the $\chi^2$ test.
The main reason is that the analyzed distributions are based on the whole data set, which corresponds approximately to 20 hours of measurements. However, the traffic series, as well as the corresponding statistical distributions, behave differently depending on whether the measurements were done during working hours or not. In this connection, we used a part of the daily traffic (shown in Figure 2.2) and tested the correspondence of the experimental distributions to the null hypothesis (4.1), applying the $\chi^2$ goodness-of-fit criterion. The results of this analysis are presented in Table 4.2.
Here α is the probability (in %) that the observed chi-square will exceed the value $\chi^2$ by chance even for a correct model; see, for instance, [9,13].
These results show that the hypothesis (4.1) can be accepted with a high probability; see also Figure 4.5. At the same time it must be noted (see Figure 4.4) that the inactive period of the LAN does not significantly change the fundamental form of the statistical distribution of the traffic data.
We conclude [5], therefore, that (i) the aggregation of traffic measurements forms (starting from some threshold value of the aggregation window) a statistical distribution, which does not change its form with further increase of the aggregation window; (ii) this distribution is approximated with high accuracy by the lognormal distribution.
For further use of the lognormal distributions presented above for different levels of aggregation, it is convenient to make the change of variables $y = x/A$. In this case, instead of (4.1), we get

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma y}\exp\left[-\frac{(\ln y - \tilde\mu)^2}{2\sigma^2}\right], \tag{4.3}$$

where $\tilde\mu = \mu - \ln A$, and the parameter $\sigma$ has the same value as in (4.1); see also Table 4.1. This function can be considered as the desired velocity distribution function $f_0$.
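The change of variables can be verified numerically. The sketch below assumes that the fitted function has the standard lognormal form $A/(\sqrt{2\pi}\sigma x)\exp[-(\ln x - \mu)^2/(2\sigma^2)]$; the function names and the sample values of $A$ and $\mu$ are hypothetical, while $\sigma = 0.93$ and the shifted parameter $0.83$ are the values quoted for Figure 5.2.

```python
import math

def lognormal_fit(x, A, mu, sigma):
    """Assumed fitted form: A / (sqrt(2 pi) sigma x) * exp(-(ln x - mu)^2 / (2 sigma^2))."""
    norm = math.sqrt(2.0 * math.pi) * sigma * x
    return A / norm * math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2))

def desired_speed_pdf(y, mu_shifted, sigma):
    """Normalized lognormal in y = x / A, with mu_shifted = mu - ln A."""
    return lognormal_fit(y, 1.0, mu_shifted, sigma)

def desired_mean_speed(mu_shifted, sigma):
    """Mean of a lognormal distribution: exp(mu + sigma^2 / 2)."""
    return math.exp(mu_shifted + 0.5 * sigma ** 2)

A, mu, sigma = 250.0, 6.0, 0.93          # hypothetical fit parameters
mu_shifted = mu - math.log(A)
x = 400.0
# Both expressions describe the same curve: f(x) equals the rescaled density at y = x / A
print(lognormal_fit(x, A, mu, sigma), desired_speed_pdf(x / A, mu_shifted, sigma))
print(desired_mean_speed(0.83, 0.93))
```

The rescaling removes the multiplier $A$ and leaves a density normalized to one, which is convenient when the function is used as a desired speed distribution.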

Analysis of solutions of the kinetic equation
We first analyze the case when concentrations are much smaller than the critical concentration $c_{\mathrm{crit}}$ (see (3.12)) given by

$$1 - c_{\mathrm{crit}}T\bar v(1 - P) = 0, \tag{5.1}$$

at which the transition to the collective flow occurs. Then we may obtain from (3.11), as the first approximation to the desired speed distribution function,

$$f(v) \approx f_0(v)\bigl[1 + cT(1 - P)(\bar v_0 - v)\bigr], \tag{5.2}$$

or

$$f(v) \approx f_0(v)\bigl[1 + \gamma(\bar v_0 - v)\bigr] \tag{5.3}$$

with

$$\gamma = Tc(1 - P). \tag{5.4}$$

Note that the value of $\bar v_0$ is obtained by using the desired speed distribution function (4.3). Using expressions (3.7) and (3.8), we may rewrite (5.4) in the form

$$\gamma = \tau c_p\,\frac{\eta^3}{1 - \eta}. \tag{5.5}$$

One can see from (5.5) that for low concentration situations, the value $\gamma$ varies approximately as the third power of the concentration. Using (5.3) we may obtain the form of the flow at low concentration:

$$q \approx c\,\bigl(\bar v_0 - \gamma\,\sigma_0^2\bigr), \tag{5.6}$$

where $\sigma_0^2$ is the dispersion of the desired speed distribution function. This result shows that the first-order deviations from the linear portion of the flow curve are determined by the dispersion of the desired speed distribution function. For higher concentrations, higher statistical moments contribute to this effect.
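The role of the dispersion in the low-concentration flow can be made explicit with a short worked step. It is a sketch under two assumptions: the first-order form $f \approx f_0[1 + \gamma(\bar v_0 - v)]$ with $\gamma = Tc(1 - P)$, and a desired distribution normalized to the concentration, $\int_0^\infty f_0\,dv = c$. The symbols $\overline{v^2}_0$ and $\sigma_0^2$ are introduced here for the second moment and the dispersion of the desired speed.

```latex
q = \int_0^\infty v f(v)\,dv
  \approx \int_0^\infty v f_0(v)\,\bigl[1 + \gamma(\bar v_0 - v)\bigr]\,dv
  = c\,\bar v_0 + \gamma c\,\bigl(\bar v_0^{\,2} - \overline{v^2}_0\bigr)
  = c\,\bigl(\bar v_0 - \gamma\,\sigma_0^2\bigr),
\qquad
\sigma_0^2 = \overline{v^2}_0 - \bar v_0^{\,2}.
```

For a lognormal desired distribution with parameters $\tilde\mu$ and $\sigma$, the two moments are $\bar v_0 = \exp(\tilde\mu + \sigma^2/2)$ and $\sigma_0^2 = (e^{\sigma^2} - 1)\,\bar v_0^{\,2}$, so the correction to the linear flow grows quickly with $\sigma$.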
As for higher concentrations, it appears impossible to express the distribution function f analytically; numerical analysis becomes necessary.The results of numerical computations are presented in Figures 5.1 and 5.2.
Figure 5.1 shows the dependence of the normalized flow $q/c_p$ on the normalized concentration $\eta$ for the desired speed distribution function (4.3). The family of curves (dashed and dashed-dotted curves) is for the case $c_p\tau = 0.1$ and an average desired speed $\bar v_0$ ranging from 10 to 80. These curves demonstrate the individual flows for various values of $\bar v_0$. The linear dependence of the curves for low concentrations is clearly seen. We also see that the higher the average speed, the lower the concentration at which the optimum flow takes place. It must be noted also that for $\eta \sim 1$, there are no essential differences in the flow for different average desired speeds, whereas for lower $\eta$ there are dramatic differences. The solid curve represents the collective flow curve.
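A flow curve of this kind can be computed with the following sketch. It is not the authors' numerical procedure; it assumes $T = \tau(1 - P)/P$, $P = 1 - \eta$, a desired distribution equal to $c$ times a normalized lognormal, and the stationary form $f = f_0/(\lambda + \gamma v)$ with $\gamma = \tau c_p\eta^3/(1 - \eta)$ and $\lambda = 1 - \gamma\bar v$. The function names are hypothetical; the values $\tau c_p = 0.1$, $\sigma = 0.93$, and $\mu = 0.83$ are those quoted for Figure 5.2.

```python
import math

def lognormal_mean(h, mu, sigma, n=2001, z_max=8.0):
    """Average of h(v) over a normalized lognormal in v, using the trapezoid
    rule in z = (ln v - mu) / sigma, where z is standard normal."""
    dz = 2.0 * z_max / (n - 1)
    total = 0.0
    for i in range(n):
        z = -z_max + i * dz
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-0.5 * z * z) * h(math.exp(mu + sigma * z))
    return total * dz / math.sqrt(2.0 * math.pi)

def normalized_flow(eta, tau_cp, mu, sigma):
    """Return (q / c_p, regime) for a normalized concentration 0 < eta < 1."""
    gamma = tau_cp * eta ** 3 / (1.0 - eta)

    def norm(lam):
        # Normalization integral; it equals 1 at the self-consistent lam
        return lognormal_mean(lambda v: 1.0 / (lam + gamma * v), mu, sigma)

    if norm(0.0) <= 1.0:
        # No root with lam > 0: collective branch, mean speed 1 / gamma
        return eta / gamma, "collective"
    lo, hi = 0.0, 1.0                      # norm(lo) > 1 > norm(hi)
    for _ in range(60):                    # bisection on the decreasing function norm
        mid = 0.5 * (lo + hi)
        if norm(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    mean_speed = lognormal_mean(lambda v: v / (lam + gamma * v), mu, sigma)
    return eta * mean_speed, "individual"

for eta in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(eta, normalized_flow(eta, tau_cp=0.1, mu=0.83, sigma=0.93))
```

In this sketch the collective branch reduces to $q/c_p = (1 - \eta)/(\tau c_p\eta^2)$, which does not involve the desired speed distribution.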
It is quite interesting that the critical values of the normalized flow do not depend on the aggregation window. This may imply that the phase lines obtained with the different values of σ and µ corresponding to a given aggregation window (Table 4.1) have the same form.
In Figure 5.2, we present the dependence of the average speed on the normalized concentration for different average desired speeds.
Both Figures 5.1 and 5.2 look very similar to what has been obtained by Prigogine and Herman [14] for the vehicular traffic.

Conclusion
We present here the modification of the Prigogine-Herman kinetic equation related to the network traffic. For the lognormal desired speed function obtained from traffic measurements and for given probability $P$ of passing, limiting concentration $c_p$, and parameter $\tau$, the dependence of the normalized flow $q/c_p$ on the normalized concentration $\eta$ clearly shows two modes corresponding to individual flow patterns (low concentration mode) and to collective flow patterns (traffic jam mode). For low concentration situations, the normalized flow depends linearly on $\eta$, and for higher average speed, the concentration at which the optimum flow takes place is lower. When approaching the critical concentration, there is no essential difference in the flow for various average desired speeds, whereas for lower $\eta$ (corresponding to the individual flow region) there are dramatic differences.
The first results demonstrate a very interesting behavior of information traffic flows that can be useful from the practical point of view. However, to establish closer relations between the predictions given by the kinetic model and real network traffic, more detailed studies are needed, including investigation of the relaxation time $T$, the form of the probability $P$, and so forth.

Figure 4.1 shows the packet size distribution for raw traffic measurements, while Figures 4.2, 4.3, and 4.4 present the distributions for measurements aggregated with bin sizes of 10 milliseconds, 100 milliseconds, and 1 second, respectively.

Figure 5.2. The average velocity as a function of $\eta = c/c_p$ for different average desired speeds $\bar v_0$ and $\tau c_p = 0.1$ for the lognormal distribution with σ = 0.93 and µ = 0.83.

Table 4.2. Results of fitting the daily part of the packet size distributions aggregated with different bin sizes by the function (4.1).