Study of Stationary Load Increase of Computer-Network Traffic via Dynamic Principal-Component Analysis

Many network monitoring applications and performance analysis tools are based on the study of an aggregate measure of network traffic, for example, the number of packets in transit (NPT). The simulation modeling and analysis of this type of performance indicator enables a theoretical investigation of the underlying complex system through different combinations of network setups, such as routing algorithms, network source loads, or network topologies. To detect a stationary increase of network source load, we propose a dynamic principal component analysis (PCA) method, first to extract data features and then to detect a stationary load increase. The proposed detection schemes are based on either the major or the minor principal components of network traffic data. To demonstrate the applications of the proposed method, we first apply them to synthetic data and then to network traffic data simulated from the packet-switching network (PSN) model. The proposed detection schemes, based on dynamic PCA, show enhanced performance in detecting an increase of network load for the simulated network traffic data. These results demonstrate the usefulness of a new feature extraction method based on dynamic PCA that creates additional feature variables for event detection in a univariate time series.


Introduction
The dynamics of many complex systems such as computer networks, financial systems, transportation systems, or power systems are mathematically intractable due to their complexity ([1][2][3]). A better understanding of the states of these complex systems, and of how these states change, is achieved by analyzing the data coming from the underlying systems [4]. In network system performance analysis, the traffic data is measured over time, and statistical quality control techniques such as process control are often applied to detect whether thresholds are exceeded, based on the standard deviations of observed variables. Statistical process control is the application of statistical methods such as principal component analysis (PCA) to the monitoring and control of a process to ensure that it operates at its full potential to produce conforming products. Monitoring changes of traffic load is a practical issue for ensuring that network systems are not overloaded by users [5], in particular when the load increase is stationary. We define the stationary load increase as a state of network traffic that occurs before the phase transition. The phase transition is due to a large increase of network source load, after which the amount of network traffic trends upward. That is, the stationary load increase results in an increase of network traffic volatility but does not immediately lead to an onset of network congestion.
In studying network traffic performance, besides the analysis of aggregate network traffic, the load estimate of link traffic is another useful measure. When using this technique, the link-traffic data are sampled and analyzed to make inferences from a subnetwork to the global network. The inference problem based on the study of a subdomain of the entire network system leads to an accuracy requirement problem for network traffic estimates. Some sampling techniques for traffic-load estimation are proposed in [6] as a way to limit the measurement overhead and to meet the required accuracy. In [7], a packet-probing technique is described to detect the presence of a competing network load in a cluster environment; it distinguishes between the loads caused by network transmission and by computational operation. Our work is different from the link-traffic-load estimate. We focus instead on an aggregate measure of network traffic, the number of packets in transit (NPT), and illustrate the usefulness of the proposed statistical methods by applying them to data generated by a packet-switching network model. Studying the NPT performance indicator enables overall control and management of network traffic, while ignoring the detailed spatial packet-traffic dynamics in the network. Using this aggregate measure of network performance, we aim to detect a stationary increase of traffic load in the network. This means identifying a small increase in network load that leads to an increase of network traffic flow but does not immediately lead to an onset of congestion.
Although an increase of network-source load will increase both the mean level and the volatility of network traffic, focusing on the volatility is more important than focusing on the mean level because the fluctuations of network-packet traffic reflect the uncertainty of network performance. The traditional method of testing for an increase of data variance, one of the measures of network volatility, is the F-test. However, the construction of this test statistic is based on the normality assumption and ignores the time-dependent structure of the data. Moreover, the F-test has been shown to be extremely sensitive to nonnormality ([8, 9]). PCA is another technique for analyzing data variance. It transforms a set of variables into a set of uncorrelated principal components. Because the principal components are uncorrelated, using them leads to better identification of changes in the variance-covariance structure. Therefore, PCA has been broadly used for monitoring link traffic of a network to detect anomalous events (e.g., [10, 11]). In such applications, the extracted principal components of a set of test data predict an anomalous event. In this paper, we apply dynamic PCA as a feature extraction method and use the PC classifier in the dynamic framework to detect the change of fluctuations of the network traffic in a feature-extracted subspace. Our approach differs from the existing ones (e.g., [10, 11]) in that we analyze univariate time series data. We use a nonoverlapping moving window technique to extract a set of features from univariate network traffic data. The obtained features are treated as observations of a multidimensional feature variable. As a result, each coordinate of the multidimensional feature variable is spatially correlated, but less autocorrelated when the size of the moving window is large.
To detect the load increase, we first extract feature information from a set of NPT training data with a reference level of network-traffic load, and we then detect the load increase of network traffic in the extracted features of a set of test data, using the proposed detection schemes based on hypothesis testing.
The work is a theoretical investigation that focuses on the analysis of simulated data, both synthetic and from a network simulator. The main contribution of this paper is the proposal of dynamic PCA coupled with a nonoverlapping moving window technique for application to data analysis of complex network systems. The paper is organized as follows: in Section 2 we provide a brief description of the network simulator, its experimental setup, and the simulated NPT data. In Section 3 we present the methodologies proposed for analyzing NPT data. Section 4 provides a justification of the appropriateness of using the proposed method on a set of synthetic data and shows the results of our application to the simulated network traffic data. Section 5 reports our conclusions and outlines future work.

In the PSN model each node performs the functions of host and router and maintains one incoming and one outgoing queue which is of unlimited length and operates according to a first-in, first-out policy. At each node, independently of the other nodes, packets are created randomly with probability λ, called the source load. In the PSN model all messages are restricted to one packet carrying only the following information: time of creation, destination address, and number of hops taken. The PSN model connection topology is represented by a weighted directed multigraph L where each node corresponds to a vertex and each communication link is represented by a pair of parallel edges oriented in opposite directions. To each edge is assigned a cost of packet transmission. For a given PSN model setup, all edge costs are computed using the same type of edge cost function (ecf), which is either the ecf called ONE (ONE), QueueSize (QS), or QueueSizePlusOne (QSPO). The ecf ONE assigns a value of "one" to all edges in the lattice L. The ecf QS assigns to each edge in the lattice L a value equal to the length of the outgoing queue at the node from which the edge originates.
The ecf QSPO assigns a value that is the sum of a constant "one" plus the length of the outgoing queue at the node from which the edge originates. The edge costs assigned by the ecf ONE do not change during a simulation run; thus this results in static routing. Since the routing decisions made using the ecf QS or QSPO rely on the current state of the network simulation, this implies adaptive, or dynamic, routing. In the PSN model, each packet is transmitted via routers from its source to its destination according to the routing decisions made independently at each router and based on a least-cost criterion. During a simulation of the PSN model using dynamic routing, packets have the ability to avoid congested nodes; they do not have this ability when static routing is used instead. In the PSN model, time is discrete, and we observe the network state at the discrete times k = 0, 1, 2, . . . , T, where T is the final simulation time. At time k = 0, the setup of the PSN model is initialized with empty queues, and the routing tables are computed. The time-discrete, synchronous, and spatially distributed PSN model algorithm consists of a sequence of five operations advancing the simulation time from k to k + 1. These operations are: (1) update routing tables, (2) create and route packets, (3) process incoming queue, (4) evaluate network state, and (5) update simulation time. The detailed description of this algorithm is provided in [12, 13].
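To make the five-step loop concrete, the following is a minimal, illustrative Python sketch of such a discrete-time simulation loop. It is not the PSN model implementation of [12, 13]: the node set, the neighbor lists, and the random next-hop choice are placeholders for the model's lattice topologies, edge-cost functions, and routing tables, and the function name `simulate_psn` is ours.

```python
import random

def simulate_psn(nodes, neighbors, source_load, T, seed=0):
    """Illustrative skeleton of a PSN-like discrete-time loop.

    `neighbors[v]` lists the nodes reachable from v; routing here is a
    random-neighbor placeholder for the model's least-cost routing.
    """
    rng = random.Random(seed)
    queues = {v: [] for v in nodes}          # one outgoing FIFO queue per node
    npt = []                                 # number-of-packets-in-transit series

    for k in range(T):
        # (1) update routing tables -- omitted here; static routing assumed
        # (2) create and route packets: each node creates one with prob. lambda
        for v in nodes:
            if rng.random() < source_load:
                dest = rng.choice([u for u in nodes if u != v])
                queues[v].append(dest)
        # (3) process incoming queue: each node forwards its head-of-queue packet
        moves = []
        for v in nodes:
            if queues[v]:
                dest = queues[v].pop(0)
                nxt = rng.choice(neighbors[v])   # placeholder for least-cost hop
                if nxt != dest:
                    moves.append((nxt, dest))    # not delivered: still in transit
        for nxt, dest in moves:
            queues[nxt].append(dest)
        # (4) evaluate network state: record NPT at time k
        npt.append(sum(len(q) for q in queues.values()))
        # (5) simulation time advances via the loop variable k
    return npt
```

For example, running it on a small ring topology yields one NPT value per time step.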

Packet-Switching Network Model and Simulation Data
A PSN-model setup is defined by a selection of: a type of network connection topology, a type of ecf, a type of routing table and its update algorithm, a value of source load, seeds of two pseudorandom number generators, and a final simulation time T. The first pseudorandom number generator provides the sequence of numbers required for packet generation and routing. The second one is used for adding extra links to a regular network connection topology. The details of the PSN model setup are provided in [12]. In the PSN model, for each family of network setups, which differ only in the value of the source load λ, the values λ_sub-c for which packet traffic is congestion-free are called subcritical source loads, while the values λ_sup-c for which traffic is congested are called supercritical source loads. The critical source load λ_c is the largest subcritical source load. Thus, λ_c is a very important network performance indicator because it is the phase transition point from the free-flow to the congested state of a network. Details about how we estimate the critical source load are provided in [12]. For the PSN-model setups considered here, the estimated critical source load (CSL) values are, respectively, λ_c = 0.115 for L_p(16, ONE), λ_c = 0.120 for L_p(16, QS), and λ_c = 0.120 for L_p(16, QSPO).

Experimental Setups of PSN Model and Network
Another very important "real-time" network performance indicator is the number of packets in transit (NPT) ([12, 15, 16]). This indicator, N_v(ecf, λ, k), for a given PSN model with L_p(16, ecf, λ) setup and seed value v of the first pseudorandom number generator, is given by the total number of packets in the network at time k, that is, by the sum over all network nodes of the number of packets in each outgoing queue at time k. The NPT time series, that is, N_v(ecf, λ, k) for k = 0, . . . , T, is an important time-dependent, that is, dynamic, aggregate measure of network performance providing information on how many packets are in the network on their routes to their destinations at time k for a given PSN-model setup L_p(16, ecf, λ) and seed value v of the first pseudorandom number generator. In the analysis we work with the mean-shifted series N_v(ecf, λ, k) − N̄_v(ecf, λ), where N̄_v(ecf, λ) is its time average. The volatility of the NPT data increases with the increase of the source-load value for each ecf ONE, QS, and QSPO. However, from our empirical studies [17], the changes of volatilities for ecf QS and QSPO are difficult to distinguish. To detect an increase of network source load, for each type of ecf, the simulated NPT data is categorized into two groups, normal traffic and normal-high traffic. By normal traffic we mean traffic such that the NPT data has the same value of source load as the network-traffic training data, or a smaller one. Normal-high traffic means traffic such that the NPT data corresponds to a source-load value larger than the one of the network-traffic training data. To detect an increase of the network-source load, we choose the NPT data simulated using the model setup L_p(

Principal Component Extraction by Dynamic PCA.
Extraction of additional time-dependent variables from time series data was originally accomplished by introducing dynamic PCA ([18, 19]). The method considers p-dimensional observations x(k), taken at times k = 1, 2, . . . , n, where n is the number of observations and l + 1 ≤ k ≤ n. In the present work, p = 1. Using dynamic PCA, the input data matrix X to be analyzed is arranged by augmenting each observation with its l lagged values:

        | x(l+1)   x(l)     . . .   x(1)   |
   X =  | x(l+2)   x(l+1)   . . .   x(2)   |          (3)
        |   .        .                .    |
        | x(n)     x(n−1)   . . .   x(n−l) |

where l is the time lag that is used for capturing the dynamics of the time series. By eigenvalue analysis, dynamic PCA aims to determine a suitable time lag l for the purpose of modeling the stochastic process x(k). Instead of focusing on the analysis of the eigenvalues of the covariance matrix of the input data matrix (3), our goal is to apply the dynamic PCA method to extract additional feature variables from univariate NPT data. For this reason, we do not need to determine the optimal value of l. What we need is a reasonably large value of l, so that we can treat each window as a realization of the object of interest, that is, multivariate data. In this case, l is referred to as the length of the window. In this way, we turn the analysis of one-dimensional data into a problem that focuses on the analysis of multivariate data, by defining each element of the window as a time-dependent feature variable. The extension of feature variables makes multivariate methods applicable to one-dimensional time-series data.
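As a sketch of this construction (with our own function name and NumPy), the lagged input data matrix for a univariate series can be built as follows:

```python
import numpy as np

def lagged_matrix(x, l):
    """Arrange a univariate series x(1..n) into the dynamic-PCA data
    matrix: the row for time k contains [x(k), x(k-1), ..., x(k-l)]
    for l+1 <= k <= n, giving a matrix of shape (n-l, l+1).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # k is 1-based in the paper's notation; x[k-l-1:k] is x(k-l)..x(k)
    rows = [x[k - l - 1:k][::-1] for k in range(l + 1, n + 1)]
    return np.vstack(rows)
```

For example, `lagged_matrix([1, 2, 3, 4, 5, 6], 2)` has first row `[3, 2, 1]` and last row `[6, 5, 4]`.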
In multivariate analysis, ideally, observations of the underlying multivariate variable should be collected independently. In the network-traffic-monitoring problem, NPT data is collected over time. This implies that the NPT data is correlated if the length of the window is small. Also, in the data matrix presented in (3), the observations are highly serially correlated, which affects further analysis. In order to improve the performance of dynamic PCA, we propose applying a nonoverlapping moving window technique. This method decreases the correlation of each extracted time-dependent feature variable in the window when the width of the moving window is large.
When the above discussed technique is applied to the NPT data, we denote each of the simulated paths of NPT training data with n observations by N_v(k). Recall that, for each ecf = ONE, QS, and QSPO, respectively, N_v(ecf, λ, k) denotes the NPT data shifted by its time average N̄_v(ecf, λ), where λ is the source-load value of the NPT training data. In what follows, when no confusion arises, we will use for N_v(ecf, λ, k) the shorter notation N_v(k). Applying the nonoverlapping moving window technique to N_v(k), the input data matrix becomes

        | x(1)           x(2)           . . .   x(l)  |
   X =  | x(l+1)         x(l+2)         . . .   x(2l) |          (4)
        |   .              .                      .   |
        | x((m−1)l+1)    x((m−1)l+2)    . . .   x(ml) |

where n = ml and m is the total number of the moving windows of x(k), each of length l. The benefit of applying the nonoverlapping moving window data segmentation technique to the NPT data is that, for a large value of l, the sequence of data windows is only weakly autocorrelated and can be treated as approximately independent observations of the feature variables. To perform the feature extraction of NPT test data of length n = ml (where n, m, and l are defined as before), we organize each of the NPT test data series into a column vector, denoted by Y^s = [y^s(1), y^s(2), . . . , y^s(n)], where s represents each of the simulated paths in the test data set. For each ecf, Y^s refers to the NPT data shifted by its time average, that is, N_s(ecf, λ, k), for each ecf = ONE, QS, and QSPO, respectively, where λ is the source-load value of the NPT test data. We first partition Y^s into m windows, each of length l, that is, Y^s = [y^s(1), y^s(2), . . . , y^s(m)], where y^s(k*) = [y_1^s(k*), y_2^s(k*), . . . , y_l^s(k*)] is the k*-th window of Y^s, for k* = 1, 2, . . . , m. The objective of feature extraction by PCA is to project each nonoverlapping moving window of the network traffic test data y^s(k*) = [y_1^s(k*), y_2^s(k*), . . . , y_l^s(k*)] onto the normalized eigenvectors V_i, for 1 ≤ i ≤ l, of the matrix (4).
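Under the same notation, a minimal sketch of the nonoverlapping window segmentation, the PCA of the training windows, and the projection of test windows onto the eigenvectors V_i might look as follows (function names are ours; this is an illustration, not the authors' code):

```python
import numpy as np

def window_matrix(x, l):
    """Split a (mean-shifted) series of length n = m*l into m
    nonoverlapping windows of length l: one window per row, as in (4)."""
    x = np.asarray(x, dtype=float)
    m = len(x) // l
    return x[:m * l].reshape(m, l)

def fit_pca(train_windows):
    """PCA of the window matrix: eigenvalues/eigenvectors of its sample
    covariance matrix, sorted by decreasing eigenvalue."""
    cov = np.cov(train_windows, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def project(test_windows, eigvecs, l2):
    """Scores of each test window on the first l2 principal components."""
    return test_windows @ eigvecs[:, :l2]
```

A usage example: for a series of length 1000 and window length l = 40, `window_matrix` yields 25 windows, and `project(..., l2=20)` yields a 25 × 20 score matrix.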

Detection Schemes.
If the variance-covariance structure of the extracted feature variables changes, in particular, when the variability of some of the feature variables increases, the projections of new observations will change significantly as the dominant feature variables change. The PC classifier, which has been used successfully in anomalous event detection of network traffic (e.g., [10, 11]), can be used to detect such a change. In anomalous event detection, PCA was applied to multivariate network traffic data to detect anomalous events caused by a significant change of the variance-covariance structure of the network traffic data. Our work extends the PC classifier to the dynamic PCA framework and enables the application of this extended PCA to univariate time series data to detect an increase of network-source load. In our detection schemes, the PC classifier consists of two functions of the extracted PC scores of each NPT-test data series s, in the form of the PC classifier of [10]:

   f_1^s(k*) = Σ_{i=1}^{l_1} (y_i^s(k*))^2 / λ_i,      f_2^s(k*) = Σ_{i=l_2−r+1}^{l_2} (y_i^s(k*))^2 / λ_i,          (5)

where y_i^s(k*) is the score of the k*-th moving window of the NPT-test data on the i-th principal component, λ_i is the corresponding eigenvalue, and k* = 1, 2, . . . , m. Here m is the total number of windows of each NPT-test data series. The l_1 and r, respectively, are the numbers of major PCs and minor PCs selected; they are referred to as the feature dimension in the later discussion. l_2 is the total number of PCs retained from the feature extraction by PCA. The maximum number allowed for l_2 is equal to l; however, because data often contain noise, l_2 is usually assigned a smaller value than l. In this case, we treat the components corresponding to the smaller eigenvalues as noise components and ignore them in further analysis. In this paper, major PCs are the first few PCs among the retained PCs and minor PCs are the last few PCs among the retained PCs. When the increase of network load leads to a significant increase of both variance and covariance of the selected feature variables, this increase of network load is detectable by major PCs.
Because large values for minor PCs imply a violation of the correlation structure of the feature variables, the network-load increase is then detectable when the increase of load leads to a significant change of correlation structure of the feature variables [20].
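A sketch of the two statistics in (5), assuming the eigenvalue-normalized form of the PC classifier of [10] (the paper's exact normalization may differ; names are ours):

```python
import numpy as np

def pc_classifier_stats(scores, eigvals, l1, r, l2):
    """Major- and minor-PC statistics: sums of squared PC scores, each
    normalized by its eigenvalue.

    scores  : (m, >=l2) array of PC scores, one row per window
    eigvals : eigenvalues in decreasing order, at least l2 of them
    Returns f1 (major PCs: first l1) and f2 (minor PCs: last r of l2),
    one value per window.
    """
    z = scores[:, :l2] ** 2 / np.asarray(eigvals)[:l2]  # normalized squares
    f1 = z[:, :l1].sum(axis=1)                          # major-PC statistic
    f2 = z[:, l2 - r:l2].sum(axis=1)                    # minor-PC statistic
    return f1, f2
```

With unit scores and unit eigenvalues, both statistics simply count the selected components, which is a quick sanity check on the index ranges.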

Detection Scheme by Single Hypothesis.
The single-hypothesis detection scheme comprises two independent tests: one uses only major PCs and the other uses only minor PCs for detection. The first detection scheme tests the null hypothesis H_0^major, that the test data belong to normal traffic, against the alternative H_1^major, that the test data belong to normal-high traffic. The test statistic for this detection scheme is f_1^s(k*). If the hypothesis test rejects the null hypothesis H_0^major, then the test data is classified into the normal-high group; if it is accepted, the data is classified into the normal group. The second detection scheme tests the analogous null and alternative hypotheses H_0^minor and H_1^minor, with test statistic f_2^s(k*). Similarly, if the hypothesis test rejects H_0^minor, then the test data is classified into the normal-high group; otherwise it is classified into the normal group. In each of the detection schemes, the significance level of the hypothesis test has to be specified first. This specified significance level is then used to determine the critical values f_0^major and f_0^minor.
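A minimal sketch of the single-hypothesis decision rule, assuming (as done later for the NPT data) that the critical value is the empirical (1 − α) quantile of the statistic on training windows; the function name is ours:

```python
import numpy as np

def detect_single(train_stat, test_stat, alpha=0.05):
    """Single-hypothesis scheme: the critical value is the empirical
    (1 - alpha) quantile of the statistic over training windows; a test
    window whose statistic exceeds it is flagged as normal-high traffic.
    """
    threshold = np.quantile(train_stat, 1.0 - alpha)
    return np.asarray(test_stat) > threshold   # boolean flag per window
```

The same function applies to either f_1^s (major PCs) or f_2^s (minor PCs); the combined scheme of the next subsection flags a window if either call returns True.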

Detection Scheme by Multiple Hypothesis.
While the detection scheme based on major PCs detects a change of the variance and covariance structure of multivariate data, the detection scheme based on minor PCs detects a change of the correlation structure of multivariate data. If the increase of network source load leads to both changes, that is, of the variance-covariance structure and of the correlation structure of the NPT data, a combined method using both major PCs and minor PCs can be applied to increase the detection rates. This combined detection scheme tests the null hypothesis that the test data belong to normal traffic against the alternative that they belong to normal-high traffic, using both statistics: if either f_1^s(k*) or f_2^s(k*) is significant, then the test data is classified into the normal-high group; otherwise it is classified into the normal group. The constructions of f_1^s(k*) and f_2^s(k*) are based on major PCs and minor PCs, respectively. These two test statistics are statistically independent. The performance of the detection of the load increase may depend on the choice of detection scheme, as different schemes detect different types of change of the variance-covariance and correlation structures.

Detection Performance Measures.
In order to evaluate the performance of detection, the rejection percentage of the test is used as the detection rate:

   d = T_1 / T*,

where T_1 is the total number of rejections of the hypothesis test among a set of NPT-test data series and T* is the total number of NPT-test data series. In the major-PCs detection scheme, given that NPT-test data series s is from the normal traffic group, the detection rate is the misclassification rate, or false detection rate, denoted by d_1^major. When a series of test data s is from the normal-high traffic group, the detection rate is the probability of detecting normal-high traffic, denoted by d_2^major. The detection rate depends on the selection of l_1 and r, the numbers of major PCs and minor PCs used for data classification. A satisfactory detection rate may be obtained by investigating the relationship between the detection rate and the feature dimension l_1 or r.
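The detection rate d = T_1 / T* is straightforward to compute from the per-series rejection flags (the function name is ours):

```python
import numpy as np

def detection_rate(flags):
    """d = T1 / T*: fraction of test series whose hypothesis test
    rejected the null, i.e., that were classified as normal-high."""
    flags = np.asarray(flags, dtype=bool)
    return flags.sum() / flags.size
```

Whether this value is a false detection rate (d_1) or the power of detection (d_2) depends only on whether the flagged series came from normal or normal-high traffic.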

Synthetic Data.
In order to demonstrate the application of dynamic PCA as a feature extraction method, we first apply this method to a set of synthetic univariate stationary time-series data. Using the test data, we try to detect an increase of data variance by the scheme using major PCs, the scheme using minor PCs, or the combined scheme. The following stationary AR(1) model is used to generate the data:

   x_t = φ_1 x_{t−1} + ω_t,

where |φ_1| < 1 and ω_t is Gaussian white noise with mean zero and variance one. For the simulations, we choose three values: φ_1 = 0.6, φ_1 = −0.7, and φ_1 = 0.7. Because the theoretical variance of x_t is 1/(1 − φ_1^2) [21], the theoretical variance of x_t in the AR(1) model with φ_1 = 0.6 is equal to 1.5625, and the theoretical variance of x_t with φ_1 = 0.7 or −0.7 is 1.9608. Since changing φ_1 from 0.6 to 0.7 or −0.7 increases the variance of the AR(1) time series, the AR(1) model with φ_1 = 0.6 is selected as the simulation model of the normal type, and the AR(1) models with φ_1 = −0.7 and φ_1 = 0.7 are treated as the two models for simulating test data. We simulate two time series of the normal type, using φ_1 = 0.6. One is assigned to the training data set and the other becomes the test data of the normal type. In addition, two test time series are simulated, using φ_1 = −0.7 and φ_1 = 0.7, respectively. The lengths of all the simulated series are equal to 10,000.
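Such AR(1) series are easy to simulate; the following sketch (with our own function name and a burn-in period we add to reach stationarity) generates a series whose sample variance should be close to the theoretical 1/(1 − φ_1²):

```python
import numpy as np

def simulate_ar1(phi, n, burn_in=500, seed=0):
    """Simulate the stationary AR(1) model x_t = phi * x_{t-1} + w_t
    with unit-variance Gaussian noise; Var(x_t) = 1 / (1 - phi**2)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n + burn_in)
    x = np.empty(n + burn_in)
    x[0] = w[0]
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + w[t]
    return x[burn_in:]              # drop the burn-in transient

x = simulate_ar1(0.6, 10_000)
# sample variance should be near 1 / (1 - 0.36) = 1.5625
```

The same function with `phi=0.7` or `phi=-0.7` produces the two test series with theoretical variance 1.9608.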
In this experiment, the width of the nonoverlapping moving window is set to l = 40 (it is determined by the significant time lag of the autocorrelation function plots of the data). The detection results using the discussed simulated data are reported in Figure 1. In Figure 1, the increase of variance of the test data with φ_1 = 0.7 is detectable by major PCs (Figure 1(a)), but it is not detectable by minor PCs (Figure 1(b)). For the test data with φ_1 = −0.7, the marked change in the correlation matrix causes the increase of variance to be detectable by minor PCs (Figure 1(d)), but it is not detectable by major PCs (as shown in Figure 1(c)). The performance of using minor PCs shown in Figure 1(d) is not fully satisfactory for most of the retained feature dimensions, but the result is acceptable when r is 3. In this case, the detection rate d_1^minor is slightly higher than the predefined 5% type I error rate, and the values of d_2^minor are much larger than the predefined type I error rate. Figures 1(e) and 1(f) show that the increase of data variance is detectable for feature dimensions l_1 ≤ 5 and r ≤ 5, but the performance of the detection drops when l_1 ≥ 6 and r ≥ 6. The l_2 in (5) is set to 20 for the results shown in Figure 1, as the first 20 PCs explain about 86% of the total variation of the training data. In this simulation experiment, we have demonstrated that dynamic PCA and its detection schemes can successfully capture the increase of data variance in data simulated from an AR(1) model with various model parameters. In particular, the combined detection scheme promises increased precision of detection.
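The choice of the retained dimension l_2 from the cumulative explained variance (here, the first 20 PCs explaining about 86%) can be sketched as follows (the function name is ours):

```python
import numpy as np

def num_components(eigvals, target=0.95):
    """Smallest l2 such that the first l2 PCs explain at least `target`
    of the total variance; `eigvals` must be in decreasing order."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratios, target) + 1)
```

For example, for eigenvalues [5, 3, 1, 1], a 95% target requires all four components, while for [9, 1] a 90% target is met by the first component alone.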

Network Traffic Data.
The dynamic PCA method and its detection schemes are applied to NPT data associated with different source loads and routing algorithms to detect the load increase for each ecf. Because of the dimension-reduction property of PCA, l_2, corresponding to the dimension of the sub-feature space, is often far smaller than the total number of originally selected feature variables. The nonoverlapping moving window size l of the modified dynamic PCA is set to l = 100, with l_2 = 20, for all types of the ecfs. The first 20 PCs explain about 95% of the total variation of the training data of each type. The threshold values f_0^major and f_0^minor used for load-increase detection are determined by the 95th percentiles of the empirical cumulative distribution functions of f_1^s(k) and f_2^s(k), respectively, where 1 ≤ k ≤ 24 × 60. Figure 2 shows the results of the detection rate d for the single-hypothesis detection scheme, using either major PCs or minor PCs, for different numbers of selected major PCs or selected minor PCs.
Figures 2(a) and 2(b) display the results based on the single-hypothesis detection schemes using major PCs and minor PCs, for λ = 0.095 and 0.100, respectively. The detection rate d is calculated using a 5% type I error (i.e., a 5% significance level of the hypothesis test) for each hypothesis test of the PC scores. The feature dimension parameter is l_1 or r, depending on the choice of detection method, and varies from 1 to 10. In the case of source load λ = 0.095, the detection rate d is smaller than 5% for the detection schemes using a smaller number of major PCs and for the detection schemes using minor PCs, for all types of the ecfs, suggesting that the proposed methods successfully prevent high false-alarm rates when the network traffic source load is lower than the source load of the training data. For the test data with source load λ = 0.100, and for some predefined type I error rates, the single-hypothesis detection schemes falsely reject the null hypothesis. However, the calculated type I error rate d_1^major or d_1^minor is only slightly larger than the nominal type I error. The NPT training data and the test data were generated using the same network setup, and these NPT data have high local-time variability.
For the detection of the load increase, the modified dynamic PCA method is highly successful, with high power, even for a small increase of source load, that is, for normal-high traffic with source load λ = 0.105. Figure 2 shows high detection rates in the detection of network-load increase for all types of the ecfs when major PCs are used for detection. This successful detection indicates a major change in the variance-covariance structure when the network-traffic load goes from a normal level to a normal-high level. The detection scheme using major PCs performs best in detecting a load increase of network traffic. The detection scheme based on minor PCs performs well for the test data with an increase of the load, but it gives a larger type I error rate than the specified one when the test data come from normal traffic. The combined detection scheme performs better than the detection scheme with minor PCs, not only successfully detecting an increase of network load, but also performing well in preventing false alarms for the test data from normal traffic.

Conclusions and Future Work
In this paper, we examined new network-load-increase detection schemes based on a modified dynamic PCA approach, with parts of the extracted features acting as a classifier to detect the load increase of a set of univariate NPT data. The initial testing used a set of simulation data from stationary AR(1) models. The 95th percentile of the empirical cumulative distribution function of the extracted features was calculated as the threshold value for classification, and the feature variables of the test data were extended according to the number of feature variables of the training data. After the test data were projected onto the feature space obtained from the training data, the test statistics of the hypothesis tests were calculated and then compared to the threshold value to enable a decision about a load increase at each time k. The final decision of detection of load increase is based on the relative ratio of the number of successful detections to the total number of detections. This rate specifies the probability that the PC scores of the NPT-test data exceed the threshold value. The proposed detection schemes show enhanced performance for the detection of load increase, in particular the detection scheme that uses only the first PC. These detection schemes prevent false alarms when the test data show normal traffic because the method differentiates normal network traffic from normal-high network traffic.
However, the difficulty of applying this linear method to data with high local-time variability still needs a solution. Extending this method to a kernel-based method for NPT data may be promising. Kernel-based detection methods could improve the analysis and detection performance by accounting for potential nonlinearity within the extended feature variables. The proposed detection methods, tested on offline simulation data, can also be applied to an online detection problem. Extending our current work to an online load-increase detection problem would facilitate detecting normal-high network traffic instantaneously.