Chaotic Time Series Analysis

Chaotic dynamical systems are ubiquitous in nature and most of them does not have an explicit dynamical equation and can be only understood through the available time series. We here briefly review the basic concepts of time series and its analytic tools, such as dimension, Lyapunov exponent, Hilbert transform, and attractor reconstruction. Then we discuss its applications in a few fields such as the construction of differential equations, identification of synchronization and coupling direction, coherence resonance, and traffic data analysis in Internet.


Introduction
Chaotic dynamical systems are ubiquitous in nature such as the tornado, stock market, turbulence, and weather.Their functions are different in different situations.For example, in the case of tornado, the chaotic behavior is harmful to human beings and need to be avoided or controlled.But in the case of the activities in human brain, the chaotic behaviors are useful and necessary to sustain the normal functions of brain.Thus, it is an important task to understand chaos and let it serve human society better.
The chaos has been studied for a long time.It is usually believed that Poincaré is the first one who studied chaos.In the later 19th century, Poincaré studied the restricted threebody problem where one body is negligible small, compared to the other two.Poincaré found that the solution of this simple system is very complicated and cannot be given precisely.Then in 1963, Lorenz revealed the "butterfly effect" in studying the weather prediction and is thus recognized as the father of chaos.But the formal use of chaos is from the paper of "Period three implies chaos" by Li and Yorke in 1975.After that, chaos have been widely studied and a lot of important concepts has been introduced, such as the dimensions, Lyapunov exponents, Fourier transform and Hilbert transform, and attractor reconstruction.
The most striking feature of chaos is the unpredictability of its future.This feature is usually called as the "sensitivity dependence on initial conditions" or "butterfly effect."For example, if two initial conditions have a small difference δx, their difference after time t will be δxe λt with λ > 0, that is, exponential separation.Thus, a tiny difference or even a cutoff error will be blown up quickly and results in a big difference in the near future."Similar causes have similar effects" is invalid in chaotic systems except for short periods.A system is called chaotic system, provided that its maximum Lyapunov exponent is positive.
Mathematically, chaos can be produced by both discrete and continuous equations 1-6 .The discrete systems can be expressed as such as the Logistic map, Henon map, standard map, tent map, circle map, and Ikeda map.
The continuous systems can be expressed as differential equation with three or more degrees of freedom x t x 1 t , x 2 t , x 3 t , . . ., x m t .Typical chaotic flows are the Lorenz equation, R össler equation, Duffing's equation, Chua's circuit, and so forth.The discrete map and continuous flow are not unrelated but have a close relationship.The discrete map 1.1 can be considered as a projection of flow 1.2 on a specific Poincaré surface of section.Letting δt 1 and taking limit, 1.1 will be transformed into 1.2 .On the other hand, 1.2 can be transformed into 1.1 by integration, that is, x n 1 x n t T n t F x t dt with T n being the returning time to the Poincaré section.A common point between the discrete and continuous systems is that both are deterministic.That is, each initial condition uniquely determines the time evolution.Thus, the chaos is usually called "deterministic chaos" and can be considered as some kind of randomness produced by deterministic system.A necessary condition for 1.1 and 1.2 to show chaos is that the functions f and F must be nonlinear.
There are infinite unstable periodic orbits in chaos, which form invariant sets 1, 2 .An invariant set is the image of itself under time evolution.Once a trajectory is located in these invariant sets, it will stay there forever.Comparing with the dense chaotic orbits, the invariant sets are seldom and its measure is zero.In experimental situations or in numerical simulations, as there is always noise or cutoff error, an arbitrary trajectory will never stay at these invariant sets.The invariant sets are thus not observed in general.Given a long enough time, a trajectory will lose memory on its initial condition and eventually settle on a restricted geometry, called stable attractor.Once a trajectory enters the attractor, it will stay in it forever if there is no external perturbation.The trajectory will go through all the phase points there with its nature measure except the unstable periodic orbits.That is, the attractor is also an invariant set.This property is called ergodicity in chaotic attractor.
Mathematically, there are a lot of measures such as the Lebesgue measure, Hausdorff measure, counting measure, and natural measure.Suppose that the relative probability for a continuous variable x to appear at some location is described by the probability density P x .The measure μ D is the probability to find x in an area D, that is, μ D D P x dx.The average of a function f x is When the measure μ is an invariant measure and the average of function f x on μ equals the average on time, μ becomes an ergodic measure with This measure does not include those unstable periodic orbits in the attractor and is called as natural measure.To use this measure, one waits until all transients have been wiped off and looks for an invariant probability measure describing the distribution of typical orbits.Thus, in the natural measure, we say that the chaotic trajectories are ergodic.Based on this property, many quantities are more conveniently defined as averages over the natural measure in phase space.Because of the nonperiodicity of chaotic trajectory, the attractor is usually called as "strange attractor," in contrast to the attractors of fixed point and periodic orbits.Practically, we usually do not have the dynamical equations 1.1 and 1.2 , like the earthquake, brain and stock market, and so forth, thus the details of the system equations in the phase space and the asymptotic invariant set that determines what can be observed through experimental probes are unknown.What we can obtain is some time series from one or at best a few of the dynamical variables of the system, which poses a big challenge to characterize the chaotic systems.An interesting question is how one go from one or a few time series to the multivariate state or phase space which is required for chaotic motions to occur?
Fortunately, this problem has been intensively studied in the past decades and has now formed a subject called data mining or data analysis 7-11 .The basic assumption is that the measured time series comes from the attractor of the unknown system with ergodicity, which contains the information of the attractor.Then one can use the measured time series to figure out the properties of attractor such as its dimension, its dynamical skeleton, and its degree of sensitivity on initial conditions.The delay-coordinate embedding technique established by Takens 12 provides a practical solution to this task.
Suppose that an experiment is conducted and one time series s n s nΔt is measured, where Δt is the sampling interval.Such a time series can be, for instance, a voltage signal from a physical or biological experiment, or the concentration of a substance in a chemical reaction, or the amount of instantaneous traffic at a point/node in the Internet, and so on.In principle, the measured time series comes from an underlying dynamical system with 1.1 and 1.2 , with or without the influence of noise.Most commonly, the time series is a consequence of scalar measurements of some quantity which depends on the current state of the system, taken at a fixed sampling time: where, more often than not, the measurement function s is unknown as F, x is the mdimensional variable and η n is the measurement noise.Let us neglect the effect of noise at this level of presentation.Equation 1.5 becomes The total recording time is then T NΔt.In general, T should be sufficiently large so that the full dynamics are exhibited in the time series.This is what we have in hand.In the following we will show how to extract the information on the unknown attractor from 1.5 or 1.6 .

Basic Concepts and Analytic Tools of Chaotic Time Series
To characterize the attractor, some basic concepts have been introduced such as dimensions and Lyapunov exponents 1, 2 .These quantities are invariant under the evolution operator of the system and thus are independent of changes in the initial conditions of the orbit, and both are independent of the coordinate system in which the attractor is observed.Therefore, we can evaluate them from experimental data.The advantage of these quantities is that each one of them will turn a sequence of data into a single number, which is convenient for comparison between different time series.As an attractor is generally located in a finite area, its trajectory shows some kinds of oscillatory behavior, that is, recurrence.To show the oscillatory behavior from a given scalar time series, a convenient way is to transform it into a 2-dimensional vector by the Hilbert transform.To obtain the attractor of the corresponding time series, we have to reconstruct an auxiliary phase space by an embedding procedure, that is, the delay embedding technique.

Dimensions
Dimension quantifies the self-similarity of a geometrical object 7, 8 .For a homogeneous object, its dimension is a fixed number; while for a heterogeneous object, its different parts may have different dimensions and need to be characterized by the multifractal dimension.We here focus on the homogeneous objects.In this situation, a finite collection of points is zero dimensional, lines have dimension one, surfaces two and bulks three, and so forth.A chaotic attractor usually has a fractal dimension, that can be characterized by several ways.Three of them are convenient for numerical calculation, which are the box-counting dimension, information dimension, and correlation dimension.
For calculating the box-counting dimension D 0 , we divide the attractor into spheres/boxes of radius and ask how many boxes do we need to cover all the points in the data set.If we evaluate this number N as a function of as it becomes small, then we define the box-counting dimension by Take Henon map x n 1 y n 1 − 1.4x 2 n , y n 1 0.3x n as an example.Its attractor is shown in Figure 1 a , where the square lattice has length .We count the number of square lattice which contains at least one point of the trajectory and then obtain one point in Figure 1   Then we change the in Figure 1 a into /2 and do the same process to obtain another point in Figure 1 b .In this way, we get Figure 1 b in which the slope of the straight line is The shortage of box-counting dimension is that it treats all the boxes with points of the trajectory as the same, no matter how many points in each box.Information dimension overcomes this shortage and is defined in terms of the relative frequency of visitation of a typical trajectory by where

2.3
P i is the relative frequency with which a typical trajectory enters the ith box of the covering.
Comparing with the definition of D 0 , we see that the box-counting dimension is weight-free while the information dimension is weighted on the visiting frequency in the specific boxes.When P i is a constant, D 1 will become D 0 .When dimension is high, the calculation of both D 0 and D 1 will be time consuming and a convenient dimension for this case is the correlation dimension D 2 .Comparing with the box-counting dimension D 0 and the information dimension D 1 , the correlation dimension D 2 is much easier to be calculated and is defined by The sum in the numerator of 2.4 can be expressed as the following correlation sum 9 : where Θ • is the Heaviside function, Θ x 1 for x ≥ 0 and 0 otherwise, and |x i − x j | stands for the distance between points x i and x j .Thus the sum just counts the pairs x i , x j whose distance is smaller than .Then 2.4 becomes As 2.5 is very easy to be calculated, the correlation dimension D 2 has been widely used in time series analysis.
In practice, the sum in the equation of C also depends on the embedding dimension m as x t depends on the embedding space.The correlation dimension can be calculated in two steps.First one has to determine the correlation sum C , for the range of available and for several embedding dimensions m.Then we inspect C m, for the signatures of self-similarity.If these signatures are convincing enough, we can compute a value for the dimension.Grassberger and Procaccia demonstrated that if the embedding dimension and the number of data points are large enough and if the data are sufficiently noise-free, then the function ln C versus ln has a linear region, called the scaling region 13, 14 .The slope of the function in that linear region is D 2 .Due to such reasons, the correlation dimension D 2 usually is estimated by examining the slope of the linear portion of the plot of ln C versus ln for a series of increasing values of m.

Lyapunov Exponents
Lyapunov exponent is the most important quantity to chaotic systems as a positive maximal Lyapunov exponent is a strong signature of chaos.In the contrary, a zero maximal Lyapunov exponent denotes a limit cycle or a quasiperiodic orbit and a negative maximal Lyapunov exponent represents a fixed point.An m-dimensional system has m Lyapunov exponents with λ 1 , λ 2 , . . ., λ m in descending order.Consider a dynamical system described by 1.2 .Taking variation of both sides of 1.2 yields the following equation governing the evolution of the infinitesimal vector δx in the tangent space at x t : Solving for 2.7 gives δx t A t δx 0 , 2.8 where A t exp ∂F/∂x dt is a linear operator that evolves an infinitesimal vector at time 0 to time t.The mean exponential rate of divergence of the tangent vector is then given by The Lyapunov exponents are given as where e i is an m-dimensional basis vector.Obviously, each Lyapunov exponent is an average of the local divergence rates over the whole attractor.For chaotic system, values of λ i do not depend on the choice of the initial condition x 0 because of the ergodicity.
To check whether a time series is chaotic or not, one needs to calculate the λ 1 , that is, the maximal Lyapunov exponent λ max .This task is much easier than the calculation of all the λ i .The reason is that a chaotic trajectory will automatically go to its maximum expending direction.This property can be conveniently used to calculate the λ max .Similarly, a chaotic trajectory will automatically go to its maximum contracting direction if we let it do backward evolution with t → −t, which can be used to calculate the smallest Lyapunov exponent λ m .Numerically, one can calculate the λ max as follows.Choose two very close initial points and let their distance be d 0 1.After integrating the dynamical system for a small time interval τ, their distance will become d i .Considering that the attractor has a finite size, it is very easy for the trajectory to reach the boundary of attractor.Once it happens, the distance d i will not be exponentially growing.A convenient way to overcome this problem is to do renormalization, which makes the evolution begin at a small distance again.In detail, we choose a new point at the end of one trajectory and let their distance be d 0 .Doing the integration again, we can get another d i .In this way we have Figure 2 shows its schematic illustration.The approach shown in Figure 2 can be also applied to the case of discrete systems.Consider an m-dimensional map x i i 1, 2, . . ., m .Let {x n i } be its state at time n.We make a new state {x n i } by adding a small perturbation {dx i }.That is, Mathematical Problems in Engineering {x n i x n i dx i }.Let both {x n i } and {x n i } evolve for a time period to obtain their distance {dx n i }.Then the maximal Lyapunov exponent λ max can be given by where dx n m i 1 dx n i 2 .Practically, we cannot use the above approach to experimental data or time series as there may not always have a pair of points with distance d 0 at time t nτ.Thus we need to do some modification on it.The detailed process is as follows.Because of the recurrence of chaotic trajectory, one can always find two close points with distance d 0 t 0 at time t 0 .Then follow their trajectory to time t 1 and measure their distance d 1 t 1 .The evolution time is t 1 − t 0 .After that, we do renormalization to find a new beginning point.Considering that the candidates for choosing are not sufficient, we choose one from them by the following two rules: 1 the distance d 0 t 1 must be small; 2 to keep the direction of the maximal expending, the angle between d 0 t 1 and d 1 t 1 must be small.Figure 3 shows its schematic illustration.In this way we have

2.13
Experimental data are generally contaminated by noise.Its influence can be minimized by using an averaging statistics when computing Lyapunov exponents.The concrete process can be taken as follows.Choose a point x 0 and select all its neighbors with distance smaller than .The size of the neighborhood should be as small as possible, but large enough such that on average each reference point has at least a few neighbors.Compute the average over the distances of all neighbors to the reference part of the trajectory as a function of the relative time.Substitute this average into 2.11 or 2.13 to get the maximal Lyapunov exponent λ max .To calculate more Lyapunov exponents from data except the λ max , see 15, 16 for details.

Fourier Transform and Hilbert Transform
Fourier spectral analysis is traditionally the time series analysis method and very powerful in revealing the periodicity of time series.Its basic idea is that most signals, and all engineering signals, can be represented as a sum of sine waves.By Fourier transform, a time series is transformed into a frequency spectrum, that is, from real space to frequency domain or Fourier space.The Fourier transform establishes a one-to-one correspondence between the signal at certain times time domain and how certain frequencies contribute to the signal.Thus, instead of describing the statistical properties of a signal in real space one can ask about its properties in Fourier space.
The Fourier transform of a function s t is given by and that of a finite, discrete time series by s n e 2πikn/N .
A necessary condition for the Fourier analysis to be meaningful is that the time series should be piecewise stationary.However, a chaotic time series has a broad-band power spectra for which the Fourier spectrum gives no indication about the deterministic origin, let alone the fundamental invariants of the underlying dynamical system.For chaotic time series, a powerful approach is the Hilbert transform 10 .
Consider a time series x t .One performs a mathematical transform to obtain the corresponding imaginary part x t , yielding the following complex signal ψ t :

2.16
ψ t is called analytic signal and the value of x t is obtained from x t through a transform called Hilbert transform: where P.V. stands for the Cauchy principal value for the integral.The Hilbert transform is closely related to the Fourier transform and their relationship can be seen clearly as follows.Observe that if the real signal x t has a Fourier transform S ω , then the complex signal, ψ t , the spectrum of which is composed of positive frequencies of S ω only, is given by the inverse transform of S ω , where the integration goes only over the positive frequencies: S ω e iωt dω.

2.18
The factor of 2 is inserted so that the real part of the analytic signal is x t , not one half of that.The explicit form of ψ t can then be obtained in terms of the real signal x t .Because x t e −iωt dt, 2.19 the complex signal can be written as x t e −iω t−t dt dω.

2.20
The following mathematical identity 17 : where ˙ x t and ẋ t denote the derivatives of x t and x t to t, respectively.Note that the instantaneous frequency ω t is fundamentally different from the concept of frequency in the Fourier transform defined in the base of simple harmonic functions.Here ω t measures the rate of rotation in the complex plane of analytic signal and is very useful in detecting the phase synchronization.

Attractor Reconstruction
Reconstruction of phase space is so important that we can do nothing without doing it.For example, our just introduced dimensions and Lyapunov exponents are very important concepts to the description of chaotic attractor.However, their calculations depend on the trajectory in attractor.For a measured time series, it is just a scalar measurement from one or a few variables of the attractor but not a trajectory.Thus, we need firstly to figure out the trajectory from the given time series.How to do that is a big challenge.Fortunately, the delaycoordinate embedding technique laid by Takens 12 gives the mathematical foundation to solve this problem.He showed that if the dynamical system and the observed quantity are generic, then the delay-coordinate map from a d-dimensional smooth compact manifold M to R m , m > 2d, is a diffeomorphism on M. That is, under fairly general conditions, the underlying dynamical system can be faithfully reconstructed from time series in the sense that a one-to-one correspondence can be established between the reconstructed and the true but unknown dynamical systems.
Takens' delay-coordinate method goes as follows.From a measured time series s k s t 0 kΔt with Δt being the sampling interval, the following vector quantity of m components is constructed:

Embedding Dimension
To have a faithful representation of the true dynamical system, the embedding dimension should be large enough.Takens' theorem provides a lower bound for m.Let us figure out this bound.We note that in a space of dimension m one subspace of dimension d 1 and another subspace of dimension d 2 generically intersect in subspaces of dimension If this is negative, then there is no intersection of the two subspaces.Figure 4 shows examples.
In Figure 4 a we have d 1 d 2 1 and m 2, thus we obtain d I 0, which means that the intersection set consists of points, and the intersections are generic because small perturbations cannot remove them.In Figure 4 b we have d 1 d 2 1 and m 3, thus we obtain d I −1 < 0, which means that two one-dimensional curves do not intersect generally in a three-dimensional space.In Figure 4 c we have d 1 1, d 2 2, and m 3, thus we obtain d I 0 again, which means that the intersections are generic.To obtain a one-to-one correspondence between points on the invariant sets in the actual and reconstructed phase spaces, self-intersection must not occur.Otherwise, there will be two directions at the selfintersection, which will destroy the one-to-one correspondence.When we ask about the intersection of two subspaces of the same dimension d A , namely, the orbit with itself, then intersections fail to occur when 2d A − m < 0 or m > 2d A .

2.28
The question of unfolding a set of dimension d A from projections to lower dimensions concerns self-intersections of the set with itself as the projections are made.So the relevant criterion for unfolding a single object is m > 2d A .We wish m be an integer and let the lowest m which satisfies the unfolding condition m > 2d A be the embedding dimension m E .Thus, we have m E int 2d A 1 , where int • means taking integer.For example, the box counting dimension of the strange attractor for the Lorenz model is d A ≈ 2.06 which would lead us to anticipate m E 5 to unfold the Lorenz attractor.
In some situations, the needed embedding dimension is very large such as the EEG electroencephalogram data.As large embedding dimension requires long time series 10 5 ∼ 10 6 points and are thus computationally expensive, we hope to relax the condition m > 2d A , especially for calculating some statistical quantities such as Lyapunov exponents and fractal dimension.To reduce the cost of calculation, we may choose d A < m < 2d A , provided that the self-intersections is neglectable.Schroer et al. show that, provided that the dimension m of measurement space is large than the information dimension D 1 of the underlying dynamics, a prediction based on the reconstructed self-intersecting attractor is possible most of the time 18 .

Time Delay
The embedding theorem is silent on the choice of time delay to use in constructing mdimensional data vectors.Considering that the s t , s t τ , . . ., s t m − 1 τ in 2.26 are serving as independent variables in the reconstruction space, they should be independent from each other and thus the delay time τ needs to be chosen carefully.In this sense, the chosen of τ should satisfy three conditions.
1 It must be some multiple of the sampling time τ s , since we only have data at those times.
2 If τ is too short, the coordinates s t and s t τ will not be independent enough and the reconstructed attractor will be accumulated around the diagonal line.That is, we will not have seen any of the dynamics unfold in that time.
3 Finally, since chaotic systems are intrinsically unstable, if τ is too large, any connection between the measurements s t and s t τ is numerically tantamount to being random with respect to each other.
Therefore, the delay time τ should be large enough so that s t and s t τ are rather independent, but not so large that they are completely independent in a statistical sense.This is a difficult problem and can be solved by the mutual information whose first minimum is the good candidate of τ.

Mutual Information I τ
For a measured data s t , we create a histogram for the probability distribution of the data.Denote by p i the probability that the signal assumes a value inside the ith bin of the histogram, and let p ij τ be the probability that s t is in bin i and s t τ is in bin j.Then the mutual information for time delay τ reads

2.29
In the special case of τ 0 the joint probabilities p ij p i δ ij and the expression yields the Shannon entropy of the data distribution.Apart from this degenerate case, the value of the mutual information is independent of the particular choice of histogram, as long as it is fine enough.In the limit of large τ, s t and s t τ have nothing to do with each other and p ij thus factorizes to p i p j and the mutual information becomes zero.The mutual information I τ between measurements s n and time lagged measurements s n T is both easy to evaluate directly from the time series s n and easy to interpret.A very nice property of average mutual information is that it is invariant under smooth changes of coordinate system.
Fraser and Swinney suggest that one use the function I τ as a kind of nonlinear autocorrelation function to determine the optimal τ 19 .The first minimum of I τ marks the time lag value to use in time delay reconstruction of phase space where s t τ adds maximal information to the knowledge we have from s t , or, in other words, the redundancy is least.Let us take the Lorenz system ẋ σ y − x , ẏ γx − y − xz, ż −bz xy as an example.The time delay can be also chosen by other methods such as the autocorrelation function 7 .It is pointed out the delay spanned window m − 1 τ, rather than τ and m separately, is a crucial issue for the attractor reconstruction.The reason is that the window m − 1 τ determines the characteristics of the correlation integral whose linear part gives the correlation dimension D 2 .However, the first minimum of mutual information is not consistently successful in identifying the optimal window 20 .The optimal window length for successful embedding τ w should satisfy τ w ≥ τ p with τ p being the mean orbital period of the underlying chaotic system and is approximated from the oscillations of the time series 21 .

Modeling and Forecasting: Applications of Chaotic Time Series Analysis
The time series analysis has been widely applied in all the fields where the dynamical equations are unknown and only one or a few data are available.The most popular application is to calculate the dynamical parameters which partially reflect the properties of the attractor.For example, from a time series one can first use the delay-coordinate embedding technique to reconstruct the phase space and then calculate its dimensions and Lyapunov exponents and so forth.by the methods introduced in the previous section.As this part is relatively easy and straightforward, we will not focus on it in this section.We would like to introduce a few typical cases where the time series analysis is necessary and useful tool to tell more information on the system, not only limited to the attractor.

Constructing Dynamical Equations
The most important question of time series analysis may be that: Is it possible to figure out the ordinary differential equations ODEs from a scalar time series, which govern the behavior of the system?This problem has been well studied and the answer is yes.A lot of methods have been developed to solve this problem.We here introduce two of them, that is, the standard approach and synchronization approach

The Standard Approach
This approach assumes every ODE can be written in a standard form 22, 23 Reference 24 points that the above method is inefficient in many situations.In particular, universal models often contain a large number of coefficients and demonstrate divergent solutions.For solving this problem, 24 gives a modified approach as follows.Its main idea is to let the function f depend explicitly on time and takes the form where the linear terms cos ωt and sin ωt is chosen for simple and effective approximation.In general, using higher powers of cos ωt and sin ωt are also possible.For 3.3 , we still have one parameter to be determined, that is, the driving frequency ω or driving period T .The idea to find T is as follows 25 .First, a sufficiently big value of a polynomial order K is selected.Second, an initial estimate T T * of the period value is found this estimate can be derived as a location of a peak in the power spectrum of the observed series .At this value of T , one obtains the values of c l 1 ,l 2 ,...,l D , a l 1 ,l 2 ,...,l D , b l 1 ,l 2 ,...,l D , and an approximation error by the linear least squares method.Then, the trial value of T is varied through a certain range near the initial estimate T T * and approximation is performed for each of the trial values.The graph of T has a sharp and deep minimum which corresponds quite precisely to the "true" value of the driving period.

Synchronization Approach
Recently, Sorrentino and Ott proposed an adaptive strategy that, by using synchronization, is able to obtain a set of ODE that describes the unknown real system 26 .They assume that the ODEs governing the system dynamics are expressible or approximately expressible in terms of polynomials of an assigned degree.Then they extract the whole set of parameters of the unknown system from knowledge about the dynamical evolution of its state vector and its first derivative.The detailed steps are as follows 26 .
Consider a system ẋ F x with x x 1 , x 2 , . . ., x m T and F x f 1 x ,f 2 x , . . ., f m x T , where f i x is a degree two polynomial

3.4
For example, in the case of R össler system, m 3 and Then the task is how to make a ijk , b ij , c i evolve to the true coefficient a ijk , b ij , c i .To accomplish this goal, we envision coupling the true system to the model system 3.4 .It is then hoped that when the synchrony is achieved, the model coefficients will be a good approximation to the corresponding coefficients of the real system.The coupling from the true system to the model is designed as follows: The quantity H is in general an n ≤ m vector of n observable scalar quantities that are assumed to be known functions of the system state x t .Γ is an n × m constant coupling matrix.Here we assume H x x and Γ γI m , where γ is a scalar quantity and I m is the identity matrix of dimension m.To obtain the synchronized solution x t x t , we introduce a potential To make Ψ i 0, we let the parameters a ijk , b ij , and c i adaptively evolve according to the following gradient descent relations: β a , β b , β c > 0. Our hope is that a ijk , b ij , and c i will converge under this evolution to the true parameter values, a ijk , b ij , c i .We consider the first equation of 3.8 .Let f i jk denote f i x 1 , x 2 , . . ., x m evaluated at a ijk 0, we have f i x 1 , x 2 , . . ., x m a ijk x j x k f i jk .Substituting this into the right-hand side of the first equation of 3.8 , we obtain 3.9 Similarly letting f i j denote f i x 1 , x 2 , . . ., x m evaluated at b ij 0, the second equation of 3.8 gives Finally, we consider the third equation of 3.8 with f i denote f i x 1 , x 2 , . . ., x m evaluated at c i 0. Then

3.11
We here consider the case where β a,b,c are very large.For this situation the solutions a ijk , b ij , and c i rapidly converge to the minimum of the potentials, indicating we can set the Mathematical Problems in Engineering averages • • • v on the right-hand side of 3.9 -3.11 to zero.We further consider that the average • • • v is performed over a time scale v −1 , which is much larger than the time scale T s for variation in x t , in which case a ijk , b ij , and c i vary slowly compared to x t .Under these conditions, 3.8 and 3.9 -3.11 then yield 3.12 Equation 3.12 constitutes a system of M m 2 /2 3m/2 1 m linear equations for the M quantities a ijk , b ij , and c i .In practice, it is inconvenient to explicitly calculate the integrals for these quantities in terms of the form at every time step.Instead we will use the fact that I t satisfies the differential equation dI dt vI vG t 3.14 and obtain I t as a function of time by solving 3.14 .Thus the adaptive system for finding estimates of the quantities a ijk , b ij , and c i is 3.6 for x t and 3.12 for a ijk , b ij , and c i , where the various terms in 3.12 are of the form I t G t v obtained by integrating 3.14 .Sorrentino and Ott have applied this method to both the R össler and Lorenz systems and confirmed that it works well 26 .

Detecting the Phase Locking
Another important application of time series analysis is in biology and medical sciences.It is found that in living systems, synchronization is often essential in normal functioning, while abnormal synchronization can lead to severe disorders.For example, the EEG data from human brain shows that synchronization under normal conditions seems to be essential for the binding problem; whereas epilepsies are related to abnormally strong synchronization 27-29 .In principal, the synchronization relationship between two time sequences may be phase synchronization PS , generalized synchronization GS , Lag synchronization LS , and complete synchronization CS , and so forth, depending on the coupling strength 30 .
For two coupled identical systems, one may observe CS where there is an invariant synchronization manifold 31 .And for two coupled nonidentical systems, it may show PS and then LS when coupling is increasing 32 .At PS, their frequencies are locked whereas their amplitudes remain chaotic and uncorrelated 33-35 .And at LS, the corresponding variables become identical after the transformation of time shift.For two coupled different systems, we may observe GS when coupling is large enough 36-45 , where there is a functional relation between their states.
CS is very easy to be detected by directly checking the identity between the measured two time series.However, it is not so easy to detect the PS, LS and GS.To detect PS, we first need to calculate the phase φ 1,2 from the two time series by the Hilbert transform.If they satisfy the condition where n, are integers, the two time series are phase-locking or in PS.To check LS, we calculate a similarity function S: which is a time-averaged difference between the variables x 1 and x 2 taken with the time shift τ.Then we search for its minimum σ min τ S τ .If there exists a τ 0 with σ 0, we have LS, that is, x 2 t τ x 1 t .Figure 6 shows the similarity function S for two coupled nonidentical R össler systems
To check GS, there is a convenient method, that is, the auxiliary system approach 38 .The idea is as follows.Consider two coupled systems ẋ F x , ẏ G y, x .Then we construct an auxiliary response system y identical to y, link it to the driving system x in the same way as y is linked to x.That is, ẏ G y , x .Instead of checking the relationship between x and y, we check the relationship between y and y .If there is an identical synchronization between y and y , then we have a GS between x and y.Unfortunately, this method fails for the time series where the dynamical equations are unknown.An alternative approach to detect the GS is to estimate the maximal conditional Lyapunov exponent λ R max .There is GS if λ R max < 0. The conditional Lyapunov exponent is determined by the variational equation of the response system at δx 0 δ ẏ D y G y, x δy, 3.18 where D y G denotes the Jacobian matrix with respect to the y variable.For a dynamical system with no explicit driving and response parts, the GS can be determined by the transverse Lyapunov exponent which is from the variational equation on the invariant manifold x y.Its detail can be found in 41, 46 .For the case of nonstationary time series, we do not have a fixed relationship like the CS, PS, LS, and GS for all the time, that is, the synchronized relation may change with time such as in the EEG data with epileptic seizure.An approach based on the permutation entropy may be useful for this case 47, 48 .Let us briefly introduce the concept of entropy.In probability theory, entropy quantifies the uncertainty associated to a random process.Consider an experiment with outcome S {s 1 , s 2 , . . ., s n }.Assume that the probability of s i is p i with i p i 1.If s 1 has a probability very close to 1, then in most experiments the outcome would be s 1 thus the result is not very uncertain.One does not gain much information from performing the experiment.One can quantify the "surprise" of the outcome as information − ln probability .The entropy associated to the experiment is which is simply the expectation value of the information produced by the experiment.Entropy quantifies the information content, namely, the amount of randomness of a signal.For two measured scalar time series data x i nΔt , y i nΔt , n 1, 2, . . ., we divide the long time series into segments with fixed finite length τ.Then in each segment denoted as τ j j 1, 2, . . ., we use the technique of sliding window analysis to partition the time series data into short sequences of a given length m 3.Each shifting in the time series corresponds to a new short sequence.For example, x i nΔt , x i n 1 Δt , x i n 2 Δt and x i n 1 Δt , x i n 2 Δt , x i n 3 Δt are different sequences.For a time series with length N, the total short sequences in one segment will be N − m 1. Inside a short sequence, we distinguish it as a pattern by the natural position order.Hence we have four different patterns for m 3, such as Using p i i 1, . . ., 4 to denote the probability that one of the patterns appears in the time series of a segment, we can define the permutation entropy H τ j as Because the time series is always changing with time, H τ j changes with time period interval τ j and will be determined by the local topological structure of the time series.For two time series with GS, we cannot expect them to have the same permutation entropy H τ j in the corresponding time period τ j because they are not identity, but we would expect the corresponding H τ j have similar changing tendency.That is, we require the changing tendency of H τ j will be in step if there is GS between the two time series x i nΔt and y i nΔt .By this way we can figure out the degree of GS 47, 48 .

Inferring Coupling Direction
Sometimes, knowing the synchronized relationship between two time series is not enough, and we need to know more information about them such as which one is the driving system and which one is the response system.For example, in the prediction and control of epileptic seizure, we need to find out the focus.This problem has been well studied in the past decade and a number of approaches have been proposed, such as the crosscorrelation functions, cross-spectral analysis, Granger causality, the nearest neighbors, and phase dynamical modeling 49-53 .We here introduce typical three of them: the Granger causality, the nearest neighbors, and the phase dynamical modeling.

Granger Causality
Granger causality is based on the notion of predictability 49 .In general, in a coupled system that involves two interacting fields, Different values of d 1 and d 2 need to be tested in order to select those values that provide the most faithful results.In the framework of linear AR, we have In 3.22 the coefficients α i and β i are selected in order to minimize the mean square error

3.23
The null hypothesis that y n does not Granger cause x n is supported when β i 0, which reduces 3.23 to where the coefficients α i are selected in order to minimize the mean square error

3.25
When 2 xy appears to be less than the 2 x , it is assumed that process y influences process x.This model leads to the well-known alternative test statistics, the Granger-Sargent test 54

3.26
Thus, the influence of {y n } on {x n } is characterized by the value of the normalized prediction improvement S 2 xy and the reverse influence of {x n } on {y n }, S 2 yx is described by an equation similar to 3.26 in which x and y should be interchanged.

The Nearest Neighbor Approach
This method is based on the existence of GS and is a nonlinear prediction approach 55-57 .Since many features of real data such as EEG signals cannot be generated by linear models, it is generally argued that nonlinear measures are likely to give more information than the one obtained with conventional linear approaches.The main idea is that for a driver/response system with y G x , the response system y will follow the driver system x.Therefore, the nearest neighbors of x will have corresponding nearest neighbors of y but the inverse does not work.Based on this feature, the direction of coupling can be figured out.
Consider two time series {x n } and {y n }.Let us reconstruct delay vectors x n x n , . . ., x n− m−1 τ and y n y n , . . ., y n− m−1 τ , where n 1, . . ., N, m is the embedding dimension, and τ denotes the time lag.Let r n,j and s n,j , j 1, . . ., k, denote the time indices of the k nearest neighbors of x n and y n , respectively.Figure 7 shows the schematic illustration with k 1.For each x n , the square mean Euclidean distance to its k neighbors is defined as x n − x s n,j 2 .

3.28
If the point cloud {x n } has an average squared radius R X k n X if they are independent.Accordingly, we can define an interdependent measure S k X | Y as 28, 56, 57 is, if X depends more on Y than vice versa, we say that Y is more "active" than X.

Mathematical Problems in Engineering
This approach is good at the case of only two coupled systems.For the case of multiple coupled systems, we need to pay attention to not only the values of S k X | Y or S k Y | X but also their difference.Take a ring of three unidirectionally coupled oscillators as an example.We will obtain that both the values of S k X | Y and S k Y | X between the oscillators 1 and 2 are greater than zero, but it does not means that their coupling is bidirectional.That is, the indirect coupling should be also considered.

Phase Dynamical Modeling
This approach is presented by Rosenblum and Pikovsky 58 and widely applied in real situations 59, 60 .The main idea is to use a general property that a weak coupling affects the phases of interacting oscillators, whereas the amplitudes remain practically unchanged.In detail, consider a case of weak coupling where the dynamics can be reduced to those of two phases φ 1,2 :

3.31
where the small parameters 1,2 ω 1,2 characterize the strength of coupling and the random terms ξ 1,2 describe noisy perturbations that are always present in real-world systems.Functions q 1,2 , f 1,2 are 2π-periodic in all arguments.If the coupling is bidirectional, both 1 and 2 will be greater than zero.In the case of an unidirectional driving, say from system 1 to system 2, we have 1 0 and 2 > 0. System 3.31 describes the phase dynamics of weakly coupled noisy limit cycle oscillators, Josephson junctions, and phase locked loops, as well as phase dynamics of weakly coupled continuous-time chaotic systems 58 .
The coupling direction is represented by the ratio between 1 and 2 , thus the goal is to estimate the ratio from the measured time series.To do it, we first extract the phase φ 1,2 t k from data by Hilbert transform approach, where t k kδt, δt is the sampling interval, k 1, . . ., N. Then we compute for each time point the increments Δ 1,2 k φ 1,2 t k τ − φ 1,2 t k , τ is a time delay.These increments can be considered as generated by some unknown twodimensional noisy map Δ 1,2 k F 1,2 φ 1,2 k , φ 2,1 k η 1,2 k .Next, we fit in the least mean square sense the dependencies of Δ on φ 1 and φ 2 using a finite Fourier series as the probe function: 3.32 The sum in 3.32 is taken to the terms with |l| ≤ 3 for m 0, |m| ≤ 3 for l 0, and |m| |l| 1.
The results of fitting are used to quantify the cross-dependencies of phase dynamics of two systems by means of the coefficients c 1,2 defined as dφ 1 dφ 2 .

3.33
Finally, the directionality index is introduced as In the case of unidirectional coupling r ±1, while for nearly symmetrical coupling r is close to zero.

Coherence Resonance Induced by Noise
When two chaotic systems are coupled bidirectionally, there will be a synchronization when the coupling strength is strong enough.However, when there is noise in the system, the synchronization manifold will be destroyed from time to time, resulting in some kind of bursting.It is revealed that the time intervals of bursting will show some regularity, which can be enhanced by an optimal external noise and results in a phenomenon called coherence resonance 61, 62 .Coherence resonance is referred to the fact that noise can actually be utilized to improve the temporal regularity of the bursting time series, in the absence of an external periodic signal.where k is the coupling strength, ξ x,y,z t are independent Gaussian noise, and D quantifies the noise strength.When D 0, there is a synchronized manifold for k > k c ≈ 3.92 61 .But for D > 0, the difference between the two systems exhibits on-off intermittence.Take two time series from the variable y of the two coupled systems and let |Δy| denote their difference.Figure 8 shows the time series of |Δy| for three typical D. It is easy to see that the bursting interval, denoted by T in Figure 8 b , shows different behaviors in the three panels of Figure 8.
To show the regularity of time series |Δy|, Figure 9 shows the corresponding power spectra.From this figure we see that there are no peak in the spectra for both the small and large D, indicating a lack of temporal regularity in the bursting time series.A pronounced peak does exist at the intermediate noise level see Figure 9 b , indicating the existence of a strong time-periodic component in the time series.This apparent temporal regularity can be quantified by the characteristics of the peak at a nonzero frequency ω p in the spectrum.In particular, we utilize the quantity β s Hω p /Δω, where H is the height of the spectral peak, Δω is the half width of the peak 63 .By its definition, a high value of β s indicates a strong temporal regularity in the bursting time series.This phenomenon can be also described by another quantity β T T / Var T , where T and Var T are the average and variance of the interval T 64 .Both quantities β s and β T show a bell shape on the noise strength D, that is, a resonance on D, which has been confirmed by experiment 62 .

Correlation of Traffic Data in Internet
Undoubtedly, the internet has become a very important tool in our daily life 65-68 .The operations on the internet, such as browsing World Wide Web WWW pages, sending messages by email, transferring files by ftp, searching for information on a range of topics, and shopping, have benefited us a lot.Therefore, sustaining its normal and efficient functioning is a basic requirement.However, the communication in the internet does not always march/go freely.Similar to the traffic jam on the highway, the intermittent congestion in the internet has been observed 69 .Once it happens, the accumulated packets in Internet will increase linearly with time.Figure 10 shows three status from an Internet model 70 , where the top, middle and bottom curves denote the phases of congestion, busy/buffer and free, respectively.From Figure 10 it is easy to see that there exist erratic fluctuation, heterogeneity, and nonstationarity in the data.These features make the correlation difficult to be quantified.A conventional approach to measure the correlation in this situation is by the detrended fluctuation analysis DFA , which can reliably quantify scaling features in the fluctuations by filtering out polynomial trends.The DFA method is based on the idea that a correlated time series can be mapped to a self-similar process by integration 71-74 .Therefore, measuring the self-similar feature can indirectly tell us information about the correlation properties.The DFA method has been successfully applied to detect long-range correlations in highly complex heart beat time series 71 , stock index 72 , physiological signals 73 , and particle condensation 74 .
The DFA method is a modified root-mean-square rms analysis of a random walk and its algorithm can be worked out as the following steps.
1 Start with a time series s j , where j 1, . . ., N, and N is the length of the data, and integrate s j to obtain For scale-invariant time series with power-law correlations, there is a power-law relationship between the rms fluctuation function F m and the box size m: F m ∼ m α .

3.39
The scaling α represents the degree of the correlation in the signal; the signal is uncorrelated for α 0.5 and correlated for α > 0.5.Using the DFA method to the data in Figure 10, we find that the values of α for the three curves from top to bottom are 1.3, 0.7, and 0.5, respectively, indicating that the data of congestion phase are correlated while data of free phase are uncorrelated.
Except the above applications, there are many other fields of time series analysis such as chaos controlling, testing for nonlinearity with surrogate data, EEG data analysis and multifractals, and stock market analysis, which have gotten widely interesting from both physics and engineers 7-9 .

Conclusions
The chaos theory and its time series analysis have been well studied in the past decades.A lot of important results have been achieved.This paper is contributed to the brief summary of chaotic time series analysis.The concepts, such as dimension, Lyapunov exponents, Hilbert transform, and attractor reconstruction, have been discussed.Several applications of time series analysis have been explained in detail, such as constructing dynamical equations, detecting the phase locking, inferring coupling direction, and coherence resonance induced by noise and correlation of traffic data in Internet.These are only a few of the applications of time series analysis, and a lot of other applications are not included here, such as the analysis of transient chaotic time series.Even in these mentioned fields, new techniques and methods are continuously showing up.

Figure 1 :
Figure 1: a The attractor of Henon map; b the slope of the straight line represents the box-counting dimension D 0 .

Figure 2 :
Figure 2: Schematic illustration of how to numerically calculate the maximal Lyapunov exponent.

Figure 3 :
Figure 3: Schematic illustration of how to calculate the maximal Lyapunov exponent from time series or experimental data.
0 kΔt, τ is the delay time which is an integer multiple of Δt, and m is the embedding dimension.Clearly, what time lag τ to use and what dimension m to use are the central issues of this reconstruction based on delay-coordinate embedding.Let us discuss under what condition of τ and m, the reconstructed vector u t can represent the true trajectory of the unknown attractor.

Figure 4 :
Figure 4: Schematic illustration of intersection of the two subspaces.a d 1 d 2 1 and m 2; b d 1 d 2 1 and m 3; c d 1 1, d 2 2 and m 3.

Figure 5 a
shows how the mutual information I τ changes with τ when the parameters are taken as σ 16, γ 45.92, b 4. It is easy to see that the first minimum of I τ occurs at τ 10.Figures 5 b to 5 d represent the reconstructed attractors by τ 1, 10, 20, respectively.

Figure 5 :
Figure 5: a The mutual information I τ versus the time delay τ for Lorenz system; b to d represent the cases where the attractors are reconstructed by τ 1, 10, 20, respectively.
all the coefficients a ijk , b ij , and c i are zero, except a 313 1, b 12 −1, b 13 −1, b 21 1, b 22 0.165, b 33 −10 , and c 3 0.2.Although the system function F is unknown, it is appropriate to try to model the dynamics of the true system by ẋ F x with

Figure 6 :
Figure 6: The dependence of the similarity function S on the time lag for different coupling strengths.

Figure 7 :
Figure 7: Schematic illustration for the indices of the nearest neighbors in x and y state space, respectively, where the number 1, 3 denote the r n,j and 1 , 3 the s n,j .

2 , 3 . 27 and
the Y-conditional squared mean Euclidean distance is defined by replacing the nearest neighbors by the equal time partners of the closest neighbors of y n , by construction, we have 0 < S k X | Y ≤ 1. 3.30 Low values of S k X | Y indicate independence between X and Y, while high values indicate synchronization.This dependence becomes maximal when S k X | Y → 1.The opposite dependence S k X | Y is defined in complete analogy.It is in general not equal to S k Y | X .Both S k X | Y and S k Y | X may be of order 1.Thus, X can depend on Y, and at the same time Y can depend on X

2 37 4 1 Y m i 2 . 3 . 38 5
Divide the integrated profile y i into boxes of equal length m.In each box, we fit y i to get its local trend y fit by using a least-square fit.3The integrated profile y i is detrended by subtracting the local trend y fit in each box:Y m i ≡ y i − y fit i .3.For a given box size m, the rms fluctuation for the integrated and detrended signal is calculated Repeat this procedure for different box size m.
is a function of time and also of the coefficients a ijk , b ij , and c i .Also note that if x 1 t , x 2 t , . . ., x m t are chaotic, then the quantities f i x 1 t , x 2 t , . . ., x m t , i 1, 2, . . ., m, vary chaotically as well.It is easy to see that the potential satisfies Ψ i ≥ 0. Once Ψ i 0, we have synchronization and the coefficients are obtained.
Granger causality tests whether past values of one field X statistically help to predict the current values of the other field Y better than using past values of Y alone.Should past values of X contain information about current values of Y beyond that contained in the preceding Y sequence or any other variables contained in the information set , variability in the X field is said to "Granger causal" variability in the Y field.Similarly, we can test whether previous values of Y cause variability in the present values of X.Consider two time series {x n } and {y n }.If there is a causal relation between process y and process x, the prediction of signal {x n } can be improved by incorporation into the model the past of signal {y n }.As a result, we have a "joint"/bivariate autoregressive AR model and g are polynomials that can be determined from the current data.d 1 is the correlation length to the previous values, and d 2 describes "inertial" properties of the influence.If d 2 1, then the influence is instantaneous, otherwise it is nonlocal in time.