Many network monitoring applications and performance analysis tools are based on the study of an aggregate measure of network traffic, for example, the number of packets in transit (NPT). Simulation modeling and analysis of this type of performance indicator enable a theoretical investigation of the underlying complex system through different combinations of network setups, such as routing algorithms, network source loads, or network topologies. To detect a stationary increase of network source load, we propose a dynamic principal component analysis (PCA) method that first extracts data features and then detects a stationary load increase. The proposed detection schemes are based on either the major or the minor principal components of network traffic data. To demonstrate the applications of the proposed method, we first apply the schemes to synthetic data and then to network traffic data simulated from the packet switching network (PSN) model. The proposed detection schemes, based on dynamic PCA, show enhanced performance in detecting an increase of network load for the simulated network traffic data. These results show the usefulness of a new feature-extraction method, based on dynamic PCA, that creates additional feature variables for event detection in a univariate time series.
The dynamics of many complex systems such as computer networks, financial systems, transportation systems, or power systems are mathematically intractable due to their complexity ([
In studying network traffic performance, besides the analysis of aggregate network traffic, load estimation of link traffic is another useful measure. With this technique, link-traffic data are sampled and analyzed to make inferences about the global network from a subnetwork. Because the inference is based on a subdomain of the entire network system, it imposes accuracy requirements on the resulting network traffic estimates. Some sampling techniques for traffic-load estimation are proposed in [
Although an increase of network-source load leads to an increase in both the mean level and the volatility of network traffic, focusing on the volatility is more important than focusing on the mean level, because the fluctuations of network-packet traffic reflect the uncertainty of network performance. The traditional method of testing for an increase of data variance, one of the measures of network volatility, is by
This work is a theoretical investigation that focuses on the analysis of simulated data, obtained both from synthetic models and from a network simulator. The main contribution of this paper is the proposal of dynamic PCA coupled with a nonoverlapping moving-window technique for data analysis of complex network systems. The paper is organized as follows: in Section
We briefly describe the PSN model, developed in [
The PSN model connection topology is represented by a weighted directed multigraph
In the PSN model, time is discrete, and we observe the network state at the discrete times
A PSN-model setup is defined by the selection of a type of network connection topology, a type of ecf, a type of routing table and its update algorithm, a value of source load, the seeds of two pseudorandom number generators, and a final simulation time
The simulation experiments were conducted for the PSN model setup with a network connection topology that is isomorphic to
In the PSN model, for each family of network setups, which differ only in the value of the source load
Another very important “real-time” network performance indicator is the number of packets in transit (NPT) ([
The time variability of NPT data is their key characteristic ([
Extraction of additional time-dependent variables from time series data was originally accomplished by introducing dynamic PCA ([
In multivariate analysis, observations of the underlying multivariate random variable should ideally be collected independently. In the network-traffic-monitoring problem, however, NPT data are collected over time, which implies that the NPT data are correlated if the window length is chosen to be small. Also, in the data matrix presented in (
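A minimal sketch of the nonoverlapping-window construction, assuming the data matrix is formed by cutting the univariate series into consecutive blocks of width `w`, with one block per row and one within-window position per column (feature variable). The function name `window_matrix` and the decision to drop a trailing partial window are illustrative assumptions, not details taken from the text.

```python
from typing import List, Sequence


def window_matrix(series: Sequence[float], w: int) -> List[List[float]]:
    """Split a univariate series into nonoverlapping windows of width w.

    Row i holds series[i*w:(i+1)*w]; column j is the j-th feature
    variable (position j inside each window). A trailing partial
    window is dropped (an illustrative assumption).
    """
    if w <= 0:
        raise ValueError("window width must be positive")
    n_rows = len(series) // w
    return [list(series[i * w:(i + 1) * w]) for i in range(n_rows)]


if __name__ == "__main__":
    x = [float(t) for t in range(10)]
    for row in window_matrix(x, 3):
        print(row)
    # [0.0, 1.0, 2.0]
    # [3.0, 4.0, 5.0]
    # [6.0, 7.0, 8.0]
```

Choosing `w` trades off the number of feature variables against the number of observations, and for small `w` neighboring rows inherit the serial correlation of the underlying series.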
When the technique discussed above is applied to NPT data, we denote each simulated path of NPT training data by
To perform the feature extraction of NPT test data with a length of
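Once the training windows are arranged as a data matrix, the feature-extraction step can be sketched with NumPy: estimate the mean and covariance of the training windows, take the eigendecomposition of the covariance, and project centered windows onto the eigenvectors. Ordering eigenvalues from largest to smallest, so that the first score columns are the major PCs and the last ones the minor PCs, is the convention adopted in this sketch; the function names `fit_pca` and `pc_scores` are hypothetical.

```python
import numpy as np


def fit_pca(train: np.ndarray):
    """Fit PCA to a training matrix (rows = windows, columns = features).

    Returns the column means, the eigenvalues sorted in decreasing order,
    and the matching eigenvectors as columns.
    """
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)        # sample covariance of the features
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder: major PCs first
    return mean, eigvals[order], eigvecs[:, order]


def pc_scores(x: np.ndarray, mean: np.ndarray, eigvecs: np.ndarray) -> np.ndarray:
    """Project observations, centered with the training mean, onto the principal axes."""
    return (x - mean) @ eigvecs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = rng.standard_normal(5000)
    w = 5
    train = series[: (len(series) // w) * w].reshape(-1, w)  # nonoverlapping windows
    mean, eigvals, eigvecs = fit_pca(train)
    scores = pc_scores(train, mean, eigvecs)
    print(eigvals)                       # in decreasing order
    print(scores.var(axis=0, ddof=1))    # equals eigvals up to rounding
```

Applying `pc_scores` to test windows with the training `mean` and `eigvecs` gives the projections of new observations on which the detection schemes operate.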
If the variance-covariance structure of the extracted feature variables changes, in particular when the variability of some feature variables increases, the projections of new observations change significantly because the dominant feature variables change. The PC classifier, which has been used successfully in anomalous event detection of network traffic (e.g., [
The single-hypothesis detection scheme comprises two independent hypothesis tests: one uses only the major PCs, and the other uses only the minor PCs. The first detection scheme is based on the following null and alternative hypotheses:
While the detection scheme based on major PCs detects changes in the variance-covariance structure of multivariate data, the detection scheme based on minor PCs detects changes in its correlation structure. If the increase of network source load changes both the variance-covariance structure and the correlation structure of NPT data, a combined method using both major and minor PCs can be applied to increase the detection rates. This combined detection scheme is based on the following null and alternative hypotheses:
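As a hedged illustration of the three schemes, the sketch below assumes statistics in the style of the principal component classifier mentioned above, which may differ from the exact statistics used here: the major-PC statistic sums y_i^2 / λ_i over the first `q` PCs, the minor-PC statistic sums it over the last `r` PCs, thresholds are empirical upper quantiles on the training windows, and the combined scheme rejects when either statistic exceeds its threshold. All function names, `q`, `r`, and the significance levels are hypothetical, and independent Gaussian windows stand in for windowed NPT data.

```python
import numpy as np


def fit_pca(train):
    """Mean, eigenvalues (decreasing) and eigenvectors of the training windows."""
    mean = train.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(train, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # major PCs first, minor PCs last
    return mean, eigvals[order], eigvecs[:, order]


def pcc_statistics(x, model, q, r):
    """Major statistic: sum of y_i^2 / lambda_i over the first q PCs.
    Minor statistic: the same sum over the last r PCs."""
    mean, eigvals, eigvecs = model
    if not (1 <= q <= len(eigvals) and 1 <= r <= len(eigvals)):
        raise ValueError("need 1 <= q, r <= number of features")
    z = ((x - mean) @ eigvecs) ** 2 / eigvals
    return z[:, :q].sum(axis=1), z[:, -r:].sum(axis=1)


def calibrate(train, q, r, alpha_major, alpha_minor):
    """Fit PCA and set empirical upper-quantile thresholds on the training windows."""
    model = fit_pca(train)
    major, minor = pcc_statistics(train, model, q, r)
    return model, np.quantile(major, 1 - alpha_major), np.quantile(minor, 1 - alpha_minor)


def reject(x, model, c_major, c_minor, q, r, scheme):
    """Boolean rejection decision for each test window."""
    major, minor = pcc_statistics(x, model, q, r)
    if scheme == "major":
        return major > c_major
    if scheme == "minor":
        return minor > c_minor
    if scheme == "combined":
        return (major > c_major) | (minor > c_minor)
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w, q, r = 5, 2, 2
    train = rng.standard_normal((2000, w))
    normal = rng.standard_normal((2000, w))        # same distribution as training
    loaded = 2.0 * rng.standard_normal((2000, w))  # variance increased fourfold
    model, c1, c2 = calibrate(train, q, r, 0.025, 0.025)  # alpha = 0.05 split in half
    for scheme in ("major", "minor", "combined"):
        print(scheme,
              reject(normal, model, c1, c2, q, r, scheme).mean(),
              reject(loaded, model, c1, c2, q, r, scheme).mean())
```

Splitting the significance level between the two parts keeps the combined false-alarm rate approximately at or below the overall level by the union bound; using the full level for each part would allow it to reach twice that value.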
To evaluate detection performance, the rejection percentage of the test is used as the detection rate, which is given as follows:
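Read this way, the detection rate is the percentage of test cases in which the null hypothesis is rejected. A minimal sketch with a hypothetical function name: applied to decisions on normal-traffic test data it estimates the type I error rate, and applied to decisions on load-increased test data it estimates the power.

```python
from typing import Sequence


def detection_rate(rejections: Sequence[bool]) -> float:
    """Percentage of test cases whose null hypothesis was rejected."""
    if len(rejections) == 0:
        raise ValueError("at least one test decision is required")
    return 100.0 * sum(1 for r in rejections if r) / len(rejections)


if __name__ == "__main__":
    print(detection_rate([True, False, True, True]))  # 75.0
```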
The detection rate depends on the selection of
To demonstrate the application of dynamic PCA as a feature-extraction method, we first apply it to a set of synthetic univariate stationary time-series data. Using the test data, we attempt to detect an increase of data variance with the scheme using major PCs, the scheme using minor PCs, or the combined scheme. The following stationary AR
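The AR order and coefficients are not fixed at this point in the passage, so the sketch below uses a hypothetical AR(1) model x_t = φ x_{t-1} + e_t purely for illustration. Its stationary variance is σ² / (1 - φ²), so raising the innovation standard deviation σ yields a stationary series with a larger variance, which is the kind of change the detection schemes target. The parameter values, seeds, and burn-in length are assumptions.

```python
import random
import statistics
from typing import List


def simulate_ar1(n: int, phi: float, sigma: float, seed: int, burn_in: int = 500) -> List[float]:
    """Simulate x_t = phi * x_{t-1} + e_t with e_t ~ N(0, sigma^2).

    The first `burn_in` values are discarded so that the returned series
    is approximately in its stationary regime.
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("|phi| < 1 is required for stationarity")
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(burn_in + n):
        x = phi * x + rng.gauss(0.0, sigma)
        if t >= burn_in:
            out.append(x)
    return out


if __name__ == "__main__":
    phi = 0.5
    for sigma in (1.0, 1.2):  # baseline vs. increased innovation scale
        x = simulate_ar1(20000, phi, sigma, seed=1)
        # sample variance next to the theoretical stationary variance
        print(sigma, statistics.variance(x), sigma ** 2 / (1 - phi ** 2))
```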
In this experiment, the width of the nonoverlapping moving window is set to be
Detection performance for the scheme using major PCs, the scheme using minor PCs, and the combined scheme. The red line corresponds to the significance level.
Figures
The dynamic PCA method and its detection schemes are applied to NPT data associated with different source loads and routing algorithms to detect the load increase for each ecf. Because of the dimension-reduction property of PCA,
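Because PCA reduces dimension, only a few leading PCs need to be kept as major PCs. One common rule of thumb, shown here only as an illustrative assumption and not as the rule behind the reported results, keeps the smallest number q of PCs whose eigenvalues explain at least a target fraction of the total variance. The function name and the default target are hypothetical.

```python
from typing import Sequence


def n_major_components(eigvals: Sequence[float], target: float = 0.9) -> int:
    """Smallest q such that the q largest eigenvalues explain >= target of the total variance."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must lie in (0, 1]")
    vals = sorted(eigvals, reverse=True)
    total = sum(vals)
    running = 0.0
    for q, v in enumerate(vals, start=1):
        running += v
        if running / total >= target:
            return q
    return len(vals)  # guards against floating-point shortfall when target = 1.0


if __name__ == "__main__":
    print(n_major_components([5.0, 3.0, 1.0, 0.5, 0.5], target=0.85))  # 3
```

In practice the detection rate can also be compared across several values of q rather than fixing q with a single cutoff.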
Detection rate with different numbers of features used for detection under the
Detection rate (the estimate of
Detection rate (the estimate of
Detection rate (the estimate of
Detection rate (the estimate of
Figures
For the detection of the load increase, the modified dynamic PCA method is highly successful, achieving high power even for a small increase of source load, that is, for normal-high traffic with source load
The detection scheme using major PCs performs best in detecting a load increase in network traffic. The detection scheme based on minor PCs performs well for test data with an increased load, but it yields a larger type I error rate than the specified significance level when the test data come from normal traffic. The combined detection scheme performs better than the scheme with minor PCs: it not only detects an increase of network load successfully but also performs well in preventing false alarms for test data from normal traffic.
In this paper, we examined new schemes for detecting a network load increase, based on a modified dynamic PCA approach in which subsets of the extracted features act as a classifier to detect the load increase in a set of univariate NPT data. The initial testing used a set of simulated data from stationary AR
However, applying this linear method to data with high local-time variability remains difficult. Extending the method to a kernel-based method for NPT data may be promising: kernel-based detection methods could capture potential nonlinearity within the extended feature variables and thereby improve analysis and detection performance. The proposed detection methods, tested on offline simulation data, can also be applied to an online detection problem. Extending our current work to online load-increase detection would make it possible to detect normal-high network traffic instantaneously.
This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET: