Research Article Characteristics of Video Traffic from Videoconference Applications: From H.261 to H.264

This paper presents modelling results for H.26x video traffic generated by popular videoconference software applications. The analysis of videoconference data, measured during realistic point-to-point videoconference sessions, led us to the general conclusion that the traffic can be divided into two categories: unconstrained and constrained. In the unconstrained traffic, there is a direct relation between the encoder and the form of the frequency histogram of the frame-size sequence. Moreover, for this type of traffic, strong correlations between successive video frames can be found. On the other hand, where bandwidth constraints are imposed during the encoding process, the generated traffic exhibits similar characteristics for all the examined encoders, the most notable being very low autocorrelation values. On the basis of these results, this study proposes methods to calculate the parameters of a widely adopted autoregressive model for both types of traffic.


Introduction
H.26x videoconference traffic is expected to account for large portions of the multimedia traffic in future heterogeneous networks (wired, wireless, and satellite). The videoconference traffic models for these networks must cover a wide range of traffic types and characteristics because the type of the terminals will range from a single home or mobile user (low video bit rate), where constrained video traffic is mainly produced, to a terminal connected to a backbone network (high video bit rate), where the traffic is expected to be unconstrained.
Partly due to the above reasons, the modelling and performance evaluation of videoconference traffic has been extensively studied in the literature and a wide range of modelling methods exist. The results of relevant early studies [1][2][3][4][5][6][7][8] concerning the statistical analysis of variable bit rate videoconference streams multiplexed in ATM networks indicate that the histogram of the videoconference frame-size sequence exhibits an asymmetric bell shape and that the autocorrelation function decays approximately exponentially to zero. An important contribution to videoconference traffic modelling is the approach in [5], where the DAR(1) [9] model was proposed. More explicitly, in this study, the authors noted that AR models of at least order two are required for a satisfactory modelling of the examined H.261-encoded traffic patterns. However, in the same study, the authors observed that a simple DAR(1) model, based on a discrete-time, discrete-state Markov chain, performs better with respect to queueing than a simple AR(2) model. The results of this study are further verified by similar studies of videoconference traffic modelling [7] and VBR video performance and simulation [6,10]. In [11], Heyman proposed and evaluated the GBAR process as an accurate and well-performing single-source videoconference traffic model.
The DAR(1) and GBAR(1) models provide a basis for videoconference traffic modelling through the matching of basic statistical features of the sample traffic. On this basis, and towards the modelling of videoconference traffic encoded by the Intra-H261 encoder of the ViC tool, the author in [12] proposed a DAR(p) model using the Weibull instead of the Gamma density for the fit of the sample histogram. In [13], the authors concluded that Long Range Dependence (LRD) has minimal impact on videoconference traffic modelling.
Relevant newer studies of videoconference traffic modelling reinforce the general conclusions obtained by the above earlier studies by evaluating and extending the existing models and also proposing new methods for successful and accurate modelling [14][15][16][17][18]. An extensive publicly available library of frame-size traces of unconstrained and constrained MPEG-4, H.263, and H.263+ offline encoded video was presented in [19] along with a detailed statistical analysis of the generated traces. In the same study, the use of movies as visual content led to frame generation with a Gamma-like frame-size sequence histogram (more complex when a target rate was imposed) and an autocorrelation function that quickly decayed to zero (a traffic model was not proposed in that study, though).
Of particular relevance to our work is the approach in [20], where an extensive study on multipoint videoconference traffic (H.261-encoded) modelling techniques was presented. In this study, the authors discussed methods for correctly matching the parameters of the modelling components to the measured H.261-encoded data derived from realistic multipoint conferences (in "continuous presence" mode).
The above studies certainly constitute a valuable body of knowledge. However, most of the above studies examine videoconference traffic traces compressed by encoders (mainly H.261) that were operating in an unconstrained mode and as a result produced traffic with similar characteristics (frame-size histogram of Gamma form and strong short-term correlations). Today, a large number of videoconference platforms exist, the majority of them operating over IP-based networking infrastructures and using practical implementations of the H.261 [21], H.263 [22,23], H.263+ [22,23], and H.264 [24] encoders. These encoders operate within sophisticated commercial software packages that are capable of working in both unconstrained and constrained modes of operation. In unconstrained VBR mode, the video system operates independently of the network (i.e., using a constant quantization scale throughout transmission). In the constrained mode, the encoder has knowledge of the networking constraints (either imposed offline by the user or online by an adaptive bandwidth adjustment mechanism of the encoder) and modulates its output in order to achieve the maximum video quality for the given content (by changing the quantization scale, skipping frames, or combining multiple frames into one). Furthermore, most of the previous studies have dealt with the H.261 encoding of movies (like Starwars) that exhibit abrupt scene changes. However, the traffic patterns generated by differential coding algorithms depend strongly on the variation of the visual information. For videoconference, the use of a single model based on a few physically meaningful parameters and applicable to a large number of sequences seems possible, as the visual information is typical head-and-shoulders content that does not contain abrupt scene changes and is consequently more amenable to modelling. Moreover, an understanding of the statistical nature of the constrained VBR sources is useful for designing call admission procedures.
Modelling constrained VBR sources, to the best knowledge of these authors, is an open area for study. Our approach in this direction was to gather video data generated by constrained VBR encoders that used a particular rate control algorithm to meet a predefined channel constraint and then model the resulting trace using techniques similar to those used for unconstrained VBR. The difficulty with this approach is that the resulting model could not be used to understand the behaviour of a constrained VBR source operating with a different rate control algorithm or a different channel constraint. However, given that in constrained VBR the encoder is in the loop, it is more likely that network constraints are not violated and that the source operates closer to its maximum allowable traffic. This may make constrained VBR traffic more amenable to modelling than unconstrained VBR traffic. The basic idea is that we can assume worst-case sources (i.e., high-motion contents) operating close to the maximum capacity and then characterize these sources.
Taking into account the above, it is important to examine whether the models established in the literature are appropriate for handling this contemporary setting in general. It is an open question whether all coding strategies result in significantly different statistics for a fixed or different sequence. Along the above lines, this study undertook measurements of the videoconference traffic encoded, during realistic low- and high-motion head-and-shoulders experiments, by a variety of encoders of popular commercial software modules operating in both unconstrained and constrained modes. Moreover, the modelling proposal was validated with various traces available in the literature [19] (referred to as "TKN traces" from now on).
The rest of the paper is structured as follows. Section 1 describes the experiment characteristics and presents the first-order statistical quantities of the measured data. Section 2 discusses appropriate methods for parameter assessment of the encoded traffic. Finally, Section 3 culminates with conclusions and pointers to further research.

The Experimental and Measurement Work
The study reported in this paper employed measurements of the IP traffic generated by different videoconference encoders operating in both unconstrained and constrained modes. More explicitly, we measured the traffic generated by the H.26x encoders (the NV, NVDCT, BVC, and CellB encoders [25] were examined in [26] and were found to result in traffic patterns similar to those of the H.261 encoder; hence, the modelling proposal for H.261 in the current study is expected to be applicable to these encoders, too) included in the following videoconference software tools: ViC (version v2.8ucl1.1.6) [27], VCON Vpoint HD [28] In particular, H.264 was examined in [32], the results of which are also presented under the generic context of the current study. All traces examined are representative of the H.26x family of video systems. In particular, the ViC video system uses encoders implemented by the open H.323 community [33]. These encoders are based on stable and open standards and, as a consequence, their examination is more likely to give reusable modelling results. At this point we must note that VCON Vpoint HD could not establish an H.264 connection with Polycom PVX and vice versa. This is due to the fact that the RTP payload format for H.264 still has some open issues (media-unaware fragmentation). More explicitly, the clients use different RTP payload types to communicate.
For all the examined encoders, compression is achieved by removing the spatial (intraframe) and the temporal (interframe) redundancy. In intraframe coding, a transform coding technique is applied to the image blocks, while in interframe coding, a temporal prediction is performed using motion compensation or another technique. Then, the difference or residual quantity is transform coded. Here, we must note that the ViC H.261 encoder [34,35] performs only intraframe coding, in contrast to the H.261 encoders of Vpoint, eConf, and EnVision, where blocks are inter- or intracoded. The above encoding variations influence the video bit rate performance of the encoders and, as a consequence, the statistical characteristics of the generated traffic traces.
At this point, we may discuss the basic functionality of the examined video systems, which is a fundamental factor in the derived statistical features of the encoded traffic and a basic reason for the experimental approach we followed. The rate control parameter (bandwidth and frame rate) sets a traffic policy, that is, an upper bound on the encoded traffic according to the user's preference (obviously depending on his/her physical link). An encoder's conformance to the rate control of the system is commonly achieved by reducing the video quality (and consequently the frame size) through the dynamic modulation of the quantization level. In the case of ViC, a simpler method is applied. The video quality remains invariant and a frame rate reduction is performed when the exhibited video bit rate tends to exceed the bandwidth bound. In fact, in ViC, the video quality of a specific encoder is a parameter determined a priori by the user. In the case of Vpoint, Polycom, eConf, and EnVision, the frame rate remains invariant and a video quality reduction is performed when the exhibited video bit rate tends to exceed the bandwidth bound. This threshold can be set through the network setting of each client. Moreover, Vpoint utilizes adaptive bandwidth adjustment (ABA). ABA works primarily by monitoring packet loss. If the endpoint detects that packet loss exceeds a predefined threshold, it will automatically drop to a lower conference data rate while instructing the other conference participant's endpoint to do the same.
Two experimental cases were examined, as presented in Table 1 (TKN traces are also included). Case 1 included experiments where the terminal clients were operating in unconstrained mode, while Case 2 covered constrained-mode trials. In both Cases, two "talking-heads" raw-format video contents were imported into the video systems through a Virtual Camera tool [36], and then peer-to-peer sessions of at least half an hour were run in order to ensure a satisfactory trace length for statistical analysis. These contents were produced offline by a typical webcam in uncompressed RGB-24 format: one with mild movement and no abrupt scene changes, "listener" (referred to as VC-L), and one with higher motion activity and occasional zoom/span, "talker" (VC-H). The video size was QCIF (176 × 144) in both Cases and all scenarios (VC-H and VC-L). In Case 1, no constraint was imposed either by a gatekeeper or by the software itself. The target video bit rates that were imposed in Case 2 are shown in Table 1. In each case, the UDP packets were captured by a network sniffer and the collected data were further postprocessed at the frame level by tracing a common packet timestamp. (It is important to note here that analysis at the MacroBlock (MB) level, as in [14], has been examined and found to provide only a typical smoothing in the sample data. We believe that the analysis at the frame level is simpler and offers a realistic view of the traffic.) The produced frame-size sequences were used for further statistical analysis.
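The frame-level postprocessing step can be illustrated with a minimal Python sketch. It assumes that each captured packet has already been reduced to a (timestamp, payload size) pair; the function name `frame_sizes` and the tuple layout are hypothetical and do not correspond to the output format of any particular sniffer.

```python
def frame_sizes(packets):
    """Sum the payload bytes of packets that share a timestamp.

    `packets` is an iterable of (timestamp, payload_bytes) pairs in capture
    order. This layout is an assumption made for illustration only.
    """
    frames = {}  # dicts keep insertion order, so frames stay in capture order
    for timestamp, payload_bytes in packets:
        frames[timestamp] = frames.get(timestamp, 0) + payload_bytes
    return list(frames.values())


# Hypothetical capture: three frames, two of them split over two packets.
capture = [(3000, 1200), (3000, 800), (6000, 950), (9000, 1400), (9000, 100)]
sizes = frame_sizes(capture)
```

The resulting list plays the role of the frame-size sequence on which the statistics in the rest of the analysis are computed.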
Specific parameters shown in Table 1 for the VC-H and VC-L traces depend on the particular coding scheme, the nature of the moving scene, and the confidence of the measured statistics. Moreover, traffic traces available in the literature were used for further validation. Specifically, the traces used were "office cam" and "lecture room cam" (from the TKN library). These traces were H.263 encoded offline in both constrained and unconstrained modes. Some primary conclusions, as supported by the experimental results (see Table 1), arise concerning the statistical trends of the encoders' traffic patterns. Specifically, H.263+ produces a lower video bit rate than H.263 and H.261 do. This was expected, since the later encoder versions have improved compression algorithms compared with the earlier ones (always with respect to the rate produced). Finally, for all the encoders, the use of the VC-H content led to higher rates, as is reasonable. Similar results were observed for the mean frame size and variance quantities. In all cases, the variance of the VC-H content was higher than that of VC-L, with the exception of the ViC H.263+ encoder (Case 1-Traces 5, 6), where the opposite was observed.
H.264 and the encoders used for the production of the TKN traces tend to adjust their quality in a "greedy" manner so as to use up as much of the allowed bandwidth as possible.At this point, we must note that Trace 4 of Case 2 is semiconstrained (i.e., the client did not always need the available network bandwidth).However, this particular case can be covered by the "worst-case" Case 2-Trace 3, where the target rate is reached (full-constrained traffic).
Taking into account the above context, the following questions naturally arise.
(i) What is the impact of the encoders' differences on the generated videoconference traffic trends?
(ii) Can a common model capture both types of traffic, unconstrained and constrained?
(iii) Are the traffic trends invariant of the constraint rate selected?
(iv) How does the motion of the content influence the generated traffic, for each encoder, and the parameters of the proposed traffic model?
(v) Can a common traffic model be applied for all the above cases?
The above questions pose the research subject, which is thoroughly examined in the sections that follow. Their answers will be given along with the respective analysis.

Traffic Analysis and Modelling Assessment
The analysis of the measured traffic for all experimental sets confirms the general body of knowledge that the literature has formed concerning videoconference traffic. Traffic analysis was carried out for all experimental cases. More explicitly, in all cases, the frame-size sequence can be represented as a stationary stochastic process, with a frequency histogram of an approximately bell-shaped Probability Distribution Function (PDF) form (narrower in the case of H.263, H.263+, and H.264 encoding); see Figures 1(a)-1(c) and 2(a)-2(e). The form is more complex in the TKN traces, as their content (office and lecture cam) probably contained more scene changes than our contents VC-L and VC-H. Examining the sample histograms more thoroughly, we noted that the smoothed frame-size frequency histograms of the H.261 encoder have an almost similar bell shape (see Figures 1(a), 1(b), 2(a), and 2(b)), while a narrower shape appears in the H.263, H.263+, and H.264 histograms (Figures 1(c) and 2(b)-2(e)). The VC-H frame-size frequency histograms appeared to be more symmetrically shaped than the corresponding VC-L histograms. This is reasonable, as the rate of the H.26x encoders depends on the activity of the scene, increasing during active motion (VC-H) and decreasing during inactive periods (VC-L).
Furthermore, the AutoCorrelation Function (ACF) of the unconstrained traffic (for all traces of Case 1) showed strong correlation over the first 100 lags (short-term) and decayed slowly to values near zero (see the indicative Figures 3(a)-3(c) for the traces of Case 1). On the contrary, the ACFs of the constrained traffic (Case 2) decayed very quickly to zero, denoting the lack of short-term correlation (see Figures 3(d) and 3(e)). This conclusion is critical for queueing, as the short-term correlation has been found to strongly affect buffer occupancy and overflow probabilities for videoconference traffic. In fact, to verify this, we measured the buffer occupancy of the constrained traces in queueing experiments of different traffic intensities. Buffering was found to be very small, to a degree not affecting queueing. On this basis, it is evident that a common model cannot be applied to both types of traffic. More explicitly, a correlated model is needed for unconstrained traffic, while a simpler noncorrelated model is enough for constrained traffic. The DAR model, proposed in [5], has an exponentially decaying autocorrelation and so matches the autocorrelation of the data over approximately one hundred frame lags. This match is more than enough for videoconference traffic engineering. Consequently, this model is a proper solution for the treatment of unconstrained traffic. When using the DAR model, it is sufficient to know the mean, the variance, and the autocorrelation decay rate of the source for admission control and traffic forecasts. For the constrained traffic traces, a simple random number generator based on the fit of the sample frame-size histogram can be directly applied. The DAR model with the autocorrelation decay rate set to zero can also be a solution. This feature makes constrained videoconference traffic more amenable to traffic modelling than its unconstrained counterpart, as only two parameters are needed: the mean and the variance of the sample.
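A buffer-occupancy check of this kind can be sketched with the Lindley recursion for a single-server queue fed by a frame-size trace. This is an illustrative sketch and not the experimental setup used in the study: the function name `buffer_occupancy`, the constant amount served per frame slot, and the optional finite buffer are all assumptions.

```python
def buffer_occupancy(frame_sizes, service_per_slot, buffer_limit=None):
    """Backlog after each frame slot, via the Lindley recursion.

    All quantities share the units of `frame_sizes` (e.g. bytes). Traffic
    intensity is mean(frame_sizes) / service_per_slot. With `buffer_limit`
    set, the excess over the limit is counted as lost.
    """
    backlog = 0.0
    lost = 0.0
    history = []
    for size in frame_sizes:
        # Arrivals join the backlog, then one slot's worth of service is removed.
        backlog = max(0.0, backlog + size - service_per_slot)
        if buffer_limit is not None and backlog > buffer_limit:
            lost += backlog - buffer_limit
            backlog = buffer_limit
        history.append(backlog)
    return history, lost


# Hypothetical five-frame trace served at 1000 units per slot.
trace = [900, 1100, 1000, 1300, 700]
history, lost = buffer_occupancy(trace, service_per_slot=1000)
limited_history, limited_lost = buffer_occupancy(trace, 1000, buffer_limit=300)
```

Running the same recursion on a measured trace and on a model-generated trace at several service rates is one simple way to compare them from the queueing point of view.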
Specifically, as shown in [32], the H.264 traffic can be characterized as constrained; the generated traffic is uncorrelated and can be accurately represented by a  queue.
The rest of the paper discusses methods for correctly matching the parameters of the modelling components to the data and for combining these components into the DAR model.
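As a rough illustration of how the components combine, the following Python sketch generates a DAR(1) frame-size sequence: with probability rho the previous frame size is repeated, otherwise a fresh value is drawn from the fitted marginal distribution. The names `dar1` and `gamma_marginal` are hypothetical, the parameter values are made up, and the Gamma marginal is used only as an example. Setting rho to zero reduces the generator to independent draws, which corresponds to the uncorrelated treatment of constrained traffic.

```python
import random


def gamma_marginal(shape, scale):
    """Return a sampler for a Gamma marginal (illustrative choice)."""
    return lambda rng: rng.gammavariate(shape, scale)


def dar1(n_frames, rho, draw_frame_size, rng=None):
    """Generate n_frames sizes from a DAR(1) process.

    rho is the lag-1 autocorrelation; the lag-k autocorrelation of the
    generated sequence is rho**k.
    """
    if rng is None:
        rng = random.Random()
    current = draw_frame_size(rng)
    trace = [current]
    for _ in range(n_frames - 1):
        if rng.random() >= rho:          # with probability 1 - rho: new draw
            current = draw_frame_size(rng)
        trace.append(current)            # otherwise the previous size repeats
    return trace


# Hypothetical parameters: shape 4, scale 250 (mean 1000), rho 0.98.
marginal = gamma_marginal(4.0, 250.0)
trace = dar1(1000, 0.98, marginal, random.Random(7))
```

Only the marginal parameters and rho enter the generator, which is why the mean, the variance, and the decay rate are sufficient to drive it.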

Fitting of the Frame-Size Frequency Histograms of the Traces
A variety of distributions was tested for fitting the sample frame-size frequency histograms: Gamma, Inverse Gamma (or Pearson V), Loglogistic, Extreme Value, Inverse Gauss, Weibull, Exponential, and Lognormal. The most dominant ones were found to be the first three. Even though the Inverse Gauss density performed similarly to the Gamma distribution, it is not included in the analysis to follow, as the Gamma distribution is widely adopted in the literature. Finally, the Extreme Value distribution performed, overall, worse than the others.
For the purpose of fitting the selected distributions' densities to the sample frame-size sequence histogram, although various full histogram-based methods (e.g., [20]) have been tried in the literature, as well as maximum likelihood estimation (MLE), we followed the simple moments matching method. This method has the advantage of requiring only the sample mean frame size and variance and not full histogram information. Thus, taking into account that the sequence is stationary, and as a result the mean and the variance values are almost the same for all the sample windows, it is evident that only a part of the sequence is needed to calculate the corresponding density parameters. Furthermore, this method captures the sample mean video bit rate accurately, a property that is not ensured in the case of MLE or histogram-based models. However, in cases where none of the examined distributions gives a satisfactory fit (as in the case of the TKN traces), a histogram-based method can be applied.
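If $m$ is the mean and $v$ the variance of the sample sequence $s$, and $m_l$ the mean and $v_l$ the variance of the logarithm of the sample $s$, then the density functions and the corresponding parameters derived from the moments matching method are given by the following equations, for each distribution correspondingly, (1): Gamma, (2): Inverse Gamma, and (3): Loglogistic:

$$f(x) = \frac{1}{\mu\,\Gamma(p)} \left(\frac{x}{\mu}\right)^{p-1} e^{-x/\mu}, \qquad (1)$$

where $p = m^2/v$, $\mu = v/m$,

$$f(x) = \frac{1}{b\,\Gamma(a)} \left(\frac{b}{x}\right)^{a+1} e^{-b/x}, \qquad (2)$$

where $a = m^2/v + 2$ and $b = m\,(m^2/v + 1)$, and

$$f(x) = \frac{e^{z}}{\sigma\,x\,(1 + e^{z})^{2}}, \qquad (3)$$

where $z = (\ln(x) - m_l)/\sigma$, $m_l = E[\ln(s)]$, and $\sigma = \sqrt{3\,\mathrm{Var}[\ln(s)]}/\pi$.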
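The moments matching step can be sketched in a few lines of Python. The sketch assumes the standard moment relations for each family: mean p·μ and variance p·μ² for the Gamma density, mean b/(a−1) and variance b²/((a−1)²(a−2)) for the Inverse Gamma density, and a logistic distribution fitted to the logarithm of the frame sizes for the Loglogistic density. The function names and the numerical values are hypothetical.

```python
import math


def gamma_params(mean, var):
    """Shape and scale of a Gamma density with the given mean and variance."""
    return mean ** 2 / var, var / mean


def inverse_gamma_params(mean, var):
    """Shape a and scale b of an Inverse Gamma (Pearson V) density."""
    shape = mean ** 2 / var + 2.0
    scale = mean * (shape - 1.0)
    return shape, scale


def loglogistic_params(log_mean, log_var):
    """Location and scale of the logistic law followed by ln(frame size).

    A logistic distribution with scale sigma has variance (sigma*pi)**2 / 3.
    """
    return log_mean, math.sqrt(3.0 * log_var) / math.pi


# Hypothetical sample moments: mean 1000, variance 250000.
p, mu = gamma_params(1000.0, 250000.0)
a, b = inverse_gamma_params(1000.0, 250000.0)
loc, sigma = loglogistic_params(6.9, 0.09)
```

Because only two sample moments are needed per distribution, the same functions can be applied to any window of a stationary trace.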
Given the dominance of the above distributions, modelling analysis and evaluation will be presented for these three densities. The numerical results (densities' parameters) from the application of the above parameter-matching methods appear in Table 2. The modelling evaluation of the above methods has been performed from the point of view of queueing. As a consequence, we thoroughly examined fits of cumulative distributions. This was done as follows: we plotted the sample quantiles from the sample cumulative frequency histogram against the model quantiles from the cumulative density of the corresponding distribution. The Q-Q plot of this method refers to cumulative distributions (probabilities of not exceeding a threshold).
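A Q-Q comparison of this kind can be sketched as follows for the Loglogistic case, whose quantile function has a closed form. The sketch assumes the Loglogistic density is parameterized by the location and scale of the logistic law of ln(x), and it uses the plotting position (i + 0.5)/n; both choices, like the function names, are assumptions made for illustration.

```python
import math


def loglogistic_quantile(prob, loc, scale):
    """Value x such that P(X <= x) = prob when ln(X) is logistic(loc, scale)."""
    return math.exp(loc + scale * math.log(prob / (1.0 - prob)))


def qq_points(sample, loc, scale):
    """Pairs (model quantile, sample quantile) for a Q-Q plot."""
    ordered = sorted(sample)
    n = len(ordered)
    points = []
    for i, value in enumerate(ordered):
        prob = (i + 0.5) / n  # assumed plotting position, always inside (0, 1)
        points.append((loglogistic_quantile(prob, loc, scale), value))
    return points


# Hypothetical three-frame sample against a model with median 1000.
points = qq_points([1100, 900, 1000], math.log(1000.0), 0.1)
```

Points lying close to the diagonal indicate that the model reproduces the cumulative distribution of the sample, including the tail that matters for queueing.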
Figures 1(a)-1(c) and Figures 2(a)-2(c) present Q-Q plots for all traces of Case 1 and Case 2, respectively. The results suggest that, for fitting videoconference data, the coding algorithm used should be taken into consideration. There seems to be a relationship between the coding algorithms and the characteristics of the generated traffic. For instance, for H.261, in most cases, the dominant distribution is Gamma (1), as can be verified from the Q-Q plots depicted in Figures 1(a) and 1(b), and for H.263 and H.263+, the Loglogistic density (3) has a more "stable" performance than the other two (Q-Q plots shown in Figures 1(c) and 2(c)). The Inverse Gamma density (2) seems to be suitable for H.263 traffic (see Figure 1(c)), although it was outperformed by the Loglogistic density in some cases. However, as will be commented upon later, it did not provide a solution in all cases of constrained traffic.
We must note that in Case 2, where a constraint was imposed, the moments matching method for calculating the distribution's parameters did not always provide a good fit, and performed as shown in Figures 2(a)-2(c) (Inverse Gamma and Loglogistic are depicted; the Gamma density provided a similar fit). To provide an acceptable fit, a histogram-based method proposed for H.261-encoded traffic in [20], known as C-LVMAX, was used. This method relates the peak of the histogram's convolution to the location at which the Gamma density achieves its maximum and to the value of this maximum. The values of the shape parameter $p$ and the scale parameter $\mu$ of the Gamma density are derived from $p = (2\pi x_{\max}^2 f_{\max}^2 + 1)/2$ and $\mu = 1/(2\pi x_{\max} f_{\max}^2)$, where $f_{\max}$ is the unique maximum of the histogram's convolution density, attained at $x_{\max}$. Numerical values for this fit also appear in Table 2 (for Case 2 only). Figures 2(a)-2(c) show how the three distributions fit the empirical data using the method of moments (Inverse Gamma, Loglogistic) and the C-LVMAX method (Gamma C-LVMAX). The Inverse Gamma density could not be calculated for all the constrained traces (Case 2-Traces 5, 6, 8, 9) due to processing limitations: for large $a$, $b$ parameters the factor $1/(b\,\Gamma(a))$ in (2) is very small, near zero, and consequently its inverse quantity could not be calculated. (However, the values of the parameters of the Inverse Gamma density for Case 2-Traces 5, 6, 8, 9 are given in Table 2.)
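The following Python sketch illustrates one possible reading of the C-LVMAX idea: build a normalized histogram, convolve it with itself, locate the peak, and convert the peak location and height into Gamma parameters using shape = (2π·x_max²·f_max² + 1)/2 and scale = 1/(2π·x_max·f_max²). The binning scheme, the discrete convolution, and all function names are assumptions; the method as defined in [20] may differ in its details.

```python
import math


def histogram_density(sample, bin_width):
    """Normalized histogram; bin i covers [i*bin_width, (i+1)*bin_width)."""
    n_bins = int(max(sample) // bin_width) + 1
    counts = [0] * n_bins
    for x in sample:
        counts[int(x // bin_width)] += 1
    norm = len(sample) * bin_width
    return [c / norm for c in counts]


def self_convolution(density, bin_width):
    """Discrete approximation of the density convolved with itself."""
    out = [0.0] * (2 * len(density) - 1)
    for i, left in enumerate(density):
        if left == 0.0:
            continue
        for j, right in enumerate(density):
            out[i + j] += left * right * bin_width
    return out


def c_lvmax(x_max, f_max):
    """Gamma shape and scale from the convolution peak (x_max, f_max)."""
    k = 2.0 * math.pi * x_max * f_max ** 2
    return (k * x_max + 1.0) / 2.0, 1.0 / k


def fit_gamma_c_lvmax(sample, bin_width):
    """Histogram -> self-convolution -> peak -> Gamma parameters."""
    conv = self_convolution(histogram_density(sample, bin_width), bin_width)
    peak = max(range(len(conv)), key=conv.__getitem__)
    x_max = (peak + 1) * bin_width  # sum of the centres of the two source bins
    return c_lvmax(x_max, conv[peak])


# Hypothetical toy sample with non-negative frame sizes.
toy = [100, 200, 200, 300]
toy_conv = self_convolution(histogram_density(toy, 100), 100)
shape, scale = fit_gamma_c_lvmax(toy, 100)
```

Unlike moments matching, this route needs the full histogram, so the bin width becomes an extra tuning choice.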
Summing up the above analysis, it is evident that the Gamma density is better for H.261 unconstrained traffic, the Loglogistic for unconstrained H.263, H.263+ traffic, and the C-LVMAX method for all cases of constrained traffic (including H.264 traffic, as proposed in [32]).However, if a generic and simple model needed to be applied for all cases then the most dominant would be the Loglogistic density.

Calculation of the Autocorrelation Decay Rate of the Frame-Size Sequences
At this point, we may discuss the calculation of the autocorrelation decay rate of the frame-size sequence of the unconstrained traces (as noted in the previous sections, constrained traffic appeared to be uncorrelated and, as a result, the decay rate of its autocorrelation function can be set to zero). From Figures 3(a)-3(d), it is observed that the ACF graphs of unconstrained traffic exhibit a reduced decay rate beyond the initial lags. It is evident that unconstrained video sources have very high short-term correlation, a feature which cannot be ignored for traffic engineering purposes. This behaviour was also noted in earlier studies [4].
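For reference, the sample ACF of a frame-size sequence can be computed as in the Python sketch below. It uses the common biased estimator, normalized by the lag-0 sum of squares; the function name `sample_acf` and the choice of estimator are assumptions, since the text does not state which estimator was used.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    acf = []
    for lag in range(max_lag + 1):
        num = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        acf.append(num / denom)
    return acf


# Hypothetical alternating sequence: strongly negative lag-1 correlation.
acf = sample_acf([1, -1, 1, -1, 1, -1, 1, -1], max_lag=2)
```

Applied to a measured trace with max_lag set to 100, the returned list is the input to the decay-rate fit.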
To fit the sample ACF, we applied the model proposed in [20], which is based on a compound exponential fit. This model fits the autocorrelation function with a function equal to a weighted sum of two geometric terms:

$$\rho_k = w\,\lambda_1^k + (1 - w)\,\lambda_2^k,$$

where $\lambda_1$, $\lambda_2$ are the decay rates with the property $|\lambda_2| < |\lambda_1| < 1$. This method was tested with a least squares fit to the autocorrelation samples for the first 100 lags, since the autocorrelation decays exponentially up to a lag of 100 frames or so (short-term behavior) and then decays more slowly (long-term behavior). This match is more than enough for traffic engineering, as also noted in [37]. What is notable is that, using this model, the autocorrelation parameter $\rho$ is not chosen at lag 1, as in the DAR model. For each encoder (in Case 1), the numerical parameter values of the above fit appear in Table 2.
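A least squares fit of a weighted sum of two geometric terms can be sketched as below. The sketch assumes the form w·λ1^k + (1 − w)·λ2^k with positive decay rates and replaces whatever optimizer the authors used with a plain grid search over the two rates; for each pair, the weight has a closed-form least squares solution because the model is linear in w. The function name and the grid resolution are hypothetical.

```python
def fit_two_geometric(acf, grid_steps=99):
    """Fit acf[k] ~ w*l1**k + (1 - w)*l2**k for lags k >= 1.

    Returns (w, l1, l2) with 0 < l2 < l1 < 1 taken from a uniform grid.
    """
    lags = range(1, len(acf))  # acf[0] is lag 0 and equals 1 by definition
    grid = [i / (grid_steps + 1) for i in range(1, grid_steps + 1)]
    powers = {lam: [lam ** k for k in lags] for lam in grid}
    target = [acf[k] for k in lags]
    best = None
    for l1 in grid:
        for l2 in grid:
            if l2 >= l1:
                break
            diff = [p1 - p2 for p1, p2 in zip(powers[l1], powers[l2])]
            resid = [r - p2 for r, p2 in zip(target, powers[l2])]
            # Closed-form least squares weight for this pair of decay rates.
            w = sum(d * r for d, r in zip(diff, resid)) / sum(d * d for d in diff)
            err = sum((r - w * d) ** 2 for d, r in zip(diff, resid))
            if best is None or err < best[0]:
                best = (err, w, l1, l2)
    return best[1], best[2], best[3]


# Synthetic ACF over lags 0..100 built from known, made-up parameters.
synthetic = [0.6 * 0.98 ** k + 0.4 * 0.5 ** k for k in range(101)]
w, l1, l2 = fit_two_geometric(synthetic)
```

On real data the grid could be refined around the best pair, or the search replaced by a numerical optimizer, without changing the fitted form.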

Conclusions
This paper proposed modelling methods for two types of videoconference traffic: unconstrained and constrained. The analysis of extensive data gathered during experiments with popular videoconference terminals and different encoding schemes, as well as of traffic traces available in the literature, suggested that while the unconstrained traffic traces exhibited high short-term correlations, the constrained counterparts appeared to be mostly uncorrelated, to a degree not affecting queueing. We used the measured data to develop statistical traffic models for unconstrained and constrained traffic. These models were further validated with different videoconference contents (low motion and high motion, TKN library). Different statistical methods for fitting the empirical distribution (method of moments and C-LVMAX method) were examined.
For fitting the videoconference data, the coding algorithm used should be taken into consideration. There seems to be a relationship between the coding algorithms and the characteristics of the generated traffic. For instance, for H.261, in most cases, the dominant distribution is Gamma, and for H.263 and H.263+, Loglogistic has a more "stable" performance. Moreover, the Inverse Gamma density could not be calculated for all constrained traces due to processing limitations. This fact renders the Inverse Gamma density impractical as a generic model for H.263 traffic.
Regarding the unconstrained traces, a careful but simple generalization of the DAR model can simulate the measured videoconference data conservatively and steadily. For the constrained traces, the traffic can be captured by the C-LVMAX method via a random number generator, producing frames at a time interval equal to that of the sample. On the other hand, if a moments matching method needs to be applied, then the Loglogistic density is a direct solution. Another interesting observation is that the traffic trends remain invariant when a different network constraint is selected, as evident from the TKN traces. So, the proposed model for the constrained traffic can be applied without taking into account the specific network constraint.
It is evident that if a generic and simple model needs to be applied to all cases of videoconference traffic, then the most dominant would be the DAR model based on the fit of the Loglogistic density, with the decay rate assigned from the fit of the sample ACF over the first 100 lags for unconstrained traffic (although a 500-lag fit would lead to a more conservative queueing performance) and set to zero for constrained traffic.
Future work includes the integration of the proposed models in dynamic traffic policy schemes in real DiffServ IP environments. The study of semiconstrained traffic cases, although their "worst-case" fully constrained counterparts cover their traffic trends, is of particular interest, too.

F 1 :
Frame-size histograms versus moment �t and the respective �-� Plots for unconstrained traces.
T 1: Statistical quantities of the sample frame-size sequences.
T 2: Parameter values of the modeling components.