Symplectic Principal Component Analysis: A New Method for Time Series Analysis

Experimental data are often very complex, since the underlying dynamical system may be unknown and the data may be heavily corrupted by noise. It is a crucial task to analyze the data properly in order to extract maximal information about the underlying dynamical system. This paper presents a novel principal component analysis (PCA) method based on symplectic geometry, called symplectic PCA (SPCA), to study nonlinear time series. Being nonlinear, it differs from the traditional PCA method, which is based on linear singular value decomposition (SVD). It is thus expected to represent nonlinear, and especially chaotic, data better than PCA. Using chaotic Lorenz time series data, we show that this is indeed the case. Furthermore, we show that SPCA can conveniently reduce measurement noise.


Introduction
Data measured in experimental situations, especially in real environments, can be very complex, since the underlying dynamical system may be nonlinear and of unknown structure, and the data may be very noisy. It is challenging to analyze the measured data appropriately, especially when they are noisy. Since the discovery of chaotic phenomena, the interpretation of irregular dynamics of various systems as deterministic chaotic processes has become popular and is widely used in almost all fields of science and engineering. A number of important algorithms based on chaos theory have been employed to infer the system dynamics from the data or to reduce noise in the data [1-6]. The first step of these approaches is to reconstruct a phase space from the data so that the dynamic characteristics of the system can be properly studied [7]. This is achieved using Takens' embedding theorem [8], which states that, in the noise-free case, the system dynamics can be reconstructed from a one-dimensional signal, that is, a time series. However, actual systems are often noisy, sometimes so noisy that the reconstructed attractor of the nonlinear system can exhibit different features when different analysis techniques are used [9-12]. Therefore, appropriate analysis of the measured data is a critical task in science and engineering. In this work, we propose a novel nonlinear analysis method based on symplectic geometry and principal component analysis, called symplectic principal component analysis (SPCA).
Symplectic geometry is a kind of phase space geometry. Its nature is nonlinear, and it can describe the structure of a system, especially nonlinear structure, very well. It has been used to study various nonlinear dynamical systems [13-15] since Feng [16] proposed a symplectic algorithm for solving symplectic differential equations. However, from the viewpoint of data analysis, few studies have employed symplectic geometry theory to explore the dynamics of a system. Our previous works proposed estimating the embedding dimension from a time series based on symplectic geometry [17-20].

Method
Consider a dynamical system defined in phase space $\mathbb{R}^d$. A discretized trajectory at times $t = n t_s$, $n = 1, 2, \ldots$, may be described by maps of the form

$$x_{n+1} = f(x_n).$$

In SPCA, a fundamental step is to build the multidimensional structure (attractor) in symplectic geometry space. Here, in terms of Takens' embedding theorem, we first construct an attractor in phase space, that is, the trajectory matrix $X$, from a time series. Then, we describe the symplectic principal component analysis (SPCA) based on symplectic geometry theory and give its corresponding algorithm.

Attractor Reconstruction
Let the measured data (the observable of the system under study) $x_1, x_2, \ldots, x_n$ be recorded with sampling interval $t_s$, where $n$ is the number of samples. Takens' embedding theorem states that if the time series is indeed composed of scalar measurements of the state of a dynamical system, then, under certain genericity assumptions, a one-to-one image of the original set $\{x\}$ is given by the time-delay embedding, provided $d$ is large enough. That is, the time-delay embedding provides the map into

$$X_{m \times d} = \begin{bmatrix} x_1 & x_2 & \cdots & x_d \\ x_2 & x_3 & \cdots & x_{d+1} \\ \vdots & \vdots & & \vdots \\ x_m & x_{m+1} & \cdots & x_n \end{bmatrix},$$

where $d$ is the embedding dimension, $m = n - d + 1$ is the number of points in the $d$-dimensional reconstructed attractor, and $X_{m \times d}$ denotes the trajectory matrix of the dynamical system in phase space, that is, the attractor in phase space.
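The construction of the trajectory matrix can be sketched in a few lines. This is a minimal illustration assuming NumPy and a unit delay between successive coordinates; the function name `trajectory_matrix` is hypothetical and not taken from the paper.

```python
import numpy as np

def trajectory_matrix(x, d):
    """Build the m x d trajectory matrix with unit delay, where m = n - d + 1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 1 <= d <= n:
        raise ValueError("embedding dimension d must satisfy 1 <= d <= n")
    m = n - d + 1
    # Row i holds the window x[i], x[i+1], ..., x[i+d-1].
    return np.stack([x[i:i + d] for i in range(m)])

# Small example: n = 6 samples embedded with d = 3 gives m = 4 rows.
X_demo = trajectory_matrix([1, 2, 3, 4, 5, 6], d=3)
```

Each row is one reconstructed state, so the first row starts at $x_1$ and the last row ends at $x_n$.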

Symplectic Principal Component Analysis
SPCA is a PCA approach based on symplectic geometry. Its idea is to map the investigated complex system into symplectic space and elucidate the dominant features underlying the measured data. The first few larger components capture the main relationships between the variables in symplectic space. The remaining components consist of the less important components or the noise in the measured data. The geometry used in symplectic space is called symplectic geometry. Different from Euclidean geometry, symplectic geometry is an even-dimensional geometry with a special symplectic structure. It depends on a bilinear, antisymmetric, nonsingular cross product, the symplectic cross product:

$$[x, y] = \langle x, Jy \rangle,$$

where $x$ and $y$ are vectors of the even-dimensional symplectic space and $J$ is the matrix defining the symplectic structure.

The measurement of symplectic space is area scale. In symplectic space, the length of an arbitrary vector always equals zero and has no significance, and there is the concept of orthogonal cross-course. In symplectic geometry, the symplectic transform, also called the canonical transform, is nonlinear in essence; it has measure-preserving characteristics and can keep the natural properties of the original data unchanged. It is therefore suited to nonlinear dynamical systems.
The symplectic principal components are given by a symplectic similarity transform, in a way similar to SVD-based PCA. The corresponding eigenvalues can be obtained by the symplectic QR method. Here, we first construct the autocorrelation matrix $A_{d \times d}$ of the trajectory matrix $X_{m \times d}$. Then, the matrix $A$ can be transformed into a Hamilton matrix $M$ in symplectic space.

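Theorem 2.1. Any $d \times d$ matrix can be made into a Hamilton matrix $M$. Let the matrix be $A$; then

$$M = \begin{bmatrix} A & 0 \\ 0 & -A^{T} \end{bmatrix},$$

where $M$ is a Hamilton matrix.

Therefore, $H$ is also a unitary matrix, and hence $H$ is a symplectic unitary matrix.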
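The statement of Theorem 2.1 can be checked numerically. The sketch below assumes the block-diagonal construction $M = \begin{bmatrix} A & 0 \\ 0 & -A^{T} \end{bmatrix}$ and the standard characterization that $M$ is a Hamilton (Hamiltonian) matrix exactly when $JM$ is symmetric, with $J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}$. The function names are hypothetical, and NumPy is assumed.

```python
import numpy as np

def hamilton_matrix(A):
    """Embed a d x d matrix A into the 2d x 2d block matrix [[A, 0], [0, -A^T]]."""
    A = np.asarray(A, dtype=float)
    d = A.shape[0]
    Z = np.zeros((d, d))
    return np.block([[A, Z], [Z, -A.T]])

def is_hamiltonian(M, tol=1e-12):
    """Return True if J @ M is symmetric, where J = [[0, I], [-I, 0]]."""
    d = M.shape[0] // 2
    I, Z = np.eye(d), np.zeros((d, d))
    J = np.block([[Z, I], [-I, Z]])
    JM = J @ M
    return bool(np.allclose(JM, JM.T, atol=tol))

# An arbitrary (non-symmetric) 2 x 2 matrix still yields a Hamilton matrix.
A_demo = np.array([[2.0, 1.0], [0.5, 3.0]])
M_demo = hamilton_matrix(A_demo)
hamiltonian_ok = is_hamiltonian(M_demo)
```

The check does not depend on $A$ being symmetric, which is consistent with the theorem applying to any $d \times d$ matrix.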
For the Hamilton matrix $M$, its eigenvalues can be given by a symplectic similarity transform, and the primary $2d$-dimensional space can be transformed into $d$-dimensional space to resolve [17-19], as follows:

(ii) Construct a symplectic matrix $Q$ such that the transformed matrix is expressed in terms of an upper Hessenberg matrix $B$ ($b_{ij} = 0$, $i > j + 1$). The matrix $Q$ may be a symplectic Householder matrix $H$. If the matrix $M$ is a real symmetric matrix, $M$ can be considered as $N$. Then, referring to (2.13), one can get an upper Hessenberg matrix by means of the symplectic Householder matrix $H$.
(iii) Calculate the eigenvalues $\lambda(B) = \{\mu_1, \mu_2, \ldots, \mu_d\}$ by using the symplectic QR decomposition method; if $M$ is a real symmetric matrix, then the eigenvalues of $A$ are equal to those of $B$, that is, $\lambda(A) = \lambda(B)$.
(iv) These eigenvalues $\mu = \{\mu_1, \mu_2, \ldots, \mu_d\}$ are sorted in descending order, that is,

$$\mu_1 \geq \mu_2 \geq \cdots \geq \mu_d.$$

Thus, the calculation in $2d$-dimensional space is transformed into that in $d$-dimensional space.

The $\mu$ is the symplectic principal component spectrum of $A$ with the relevant symplectic orthonormal bases. In the so-called noise floor, the values of $\mu_i$, $i = k + 1, \ldots, d$, reflect the noise level in the data [18, 19]. The corresponding matrix $Q$ denotes the symplectic eigenvectors of $A$.
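To illustrate the reduction to upper Hessenberg form, the following sketch uses the ordinary (non-symplectic) Householder reduction of a real symmetric matrix as a stand-in for the symplectic Householder transform; it is not the authors' implementation. It relies only on the property stated above that, for a real symmetric matrix, the eigenvalues of $A$ equal those of $B$. The function name and the random test matrix are hypothetical, and NumPy is assumed.

```python
import numpy as np

def householder_hessenberg(A):
    """Reduce A to upper Hessenberg form B = Q^T A Q using Householder reflections."""
    B = np.array(A, dtype=float)
    d = B.shape[0]
    Q = np.eye(d)
    for j in range(d - 2):
        v = B[j + 1:, j].copy()
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue  # column already has zeros below the subdiagonal
        # Choose the reflection that maps v onto a multiple of the first unit vector.
        v[0] += np.copysign(norm_v, v[0])
        v /= np.linalg.norm(v)
        H = np.eye(d)
        H[j + 1:, j + 1:] -= 2.0 * np.outer(v, v)
        B = H @ B @ H  # H is symmetric and orthogonal, so this is a similarity transform
        Q = Q @ H
    return B, Q

# Demo on a real symmetric matrix A = X^T X built from random data (illustrative only).
rng = np.random.default_rng(0)
X_rand = rng.standard_normal((50, 5))
A_sym = X_rand.T @ X_rand
B_hess, Q_hess = householder_hessenberg(A_sym)

# Eigenvalues sorted in descending order, as in step (iv).
mu_A = np.sort(np.linalg.eigvalsh(A_sym))[::-1]
mu_B = np.sort(np.linalg.eigvals(B_hess).real)[::-1]
```

For a symmetric input, `B_hess` is tridiagonal up to rounding, and `mu_A` and `mu_B` agree to numerical precision.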

Proposed Algorithm
For measured data $x_1, x_2, \ldots, x_n$, our proposed algorithm consists of the following steps:
(1) Reconstruct the attractor $X_{m \times d}$ from the measured time series, where $d$ is the embedding dimension of the matrix $X$ and $m = n - d + 1$.
(2) Remove the mean values $X_{\mathrm{mean}}$ of each row of the matrix $X$.
(3) Build the real $d \times d$ symmetric matrix $A$, that is, $A = X^{T} X$. Here, $d$ should be larger than the dimension of the system in terms of Takens' embedding theorem.
(4) Calculate the symplectic principal components of the matrix $A$ by QR decomposition, and give the Householder transform matrix $Q$.
(5) Construct the corresponding principal eigenvalue matrix $W$ according to the number $k$ of the chosen symplectic principal components of the matrix $A$.
Generally, the second estimated data will be better than the first estimated data. Besides, it is necessary to note that, for a clean time series, step 8 is unnecessary.
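The overall flow of the algorithm can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' code: `numpy.linalg.eigh` replaces the symplectic QR method (acceptable here only because $A$ is real symmetric), the coefficients are taken as the projection of the centered trajectory matrix onto the first $k$ eigenvectors, the removed row means are added back after projection, and the filtered series is recovered by averaging over anti-diagonals. The function name `spca_like_filter` is hypothetical.

```python
import numpy as np

def spca_like_filter(x, d, k):
    """Project the delay-embedded series onto its k leading principal directions."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m = n - d + 1
    # Step 1: trajectory matrix (m x d) with unit delay.
    X = np.stack([x[i:i + d] for i in range(m)])
    # Step 2: remove the mean of each row, as stated in the algorithm.
    row_mean = X.mean(axis=1, keepdims=True)
    Xc = X - row_mean
    # Step 3: real symmetric d x d matrix.
    A = Xc.T @ Xc
    # Step 4 (stand-in): eigen-decomposition of the symmetric matrix A.
    mu, Q = np.linalg.eigh(A)
    order = np.argsort(mu)[::-1]  # descending eigenvalues
    mu, Q = mu[order], Q[:, order]
    # Step 5: keep the k leading eigenvectors.
    W = Q[:, :k]
    # Assumed projection and re-estimation of the trajectory matrix.
    S = W.T @ Xc.T                # coefficients, shape (k, m)
    Xs = (W @ S).T + row_mean     # re-estimated trajectory matrix, shape (m, d)
    # Assumed recovery of a series: average all entries that estimate the same sample.
    est = np.zeros(n)
    count = np.zeros(n)
    for j in range(d):
        est[j:j + m] += Xs[:, j]
        count[j:j + m] += 1
    return est / count, mu

# With k = d the projection is the identity, so a clean signal is recovered exactly.
x_demo = np.sin(0.1 * np.arange(200))
est_full, mu_demo = spca_like_filter(x_demo, d=8, k=8)
```

For noisy data, the function could be applied a second time to its own output, mirroring the repeated estimation described in the text.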

Numerical and Experimental Data
In order to investigate the feasibility of SPCA, this paper employs the chaotic Lorenz time series observed with additive noise, that is, $x(t) + e(t)$. Here, $e$ is white Gaussian measurement noise. The measurement noise $e$ is used because all real measurements are polluted by noise. For more details on noise notions, refer to the literature [23-26].
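A test signal of this kind can be generated as sketched below. The classical Lorenz parameters ($\sigma = 10$, $r = 28$, $b = 8/3$), the initial condition, the fourth-order Runge-Kutta integrator, and the function name are all assumptions made for illustration; the source does not state which values or integrator were used.

```python
import random

def lorenz_x_series(n, ts=0.01, sigma=10.0, r=28.0, b=8.0 / 3.0,
                    noise_std=0.0, seed=0, substeps=10):
    """Return n samples of the Lorenz x variable, optionally with Gaussian noise."""
    def f(s):
        x, y, z = s
        return (sigma * (y - x), x * (r - z) - y, x * y - b * z)

    def rk4(s, h):
        k1 = f(s)
        k2 = f(tuple(si + 0.5 * h * ki for si, ki in zip(s, k1)))
        k3 = f(tuple(si + 0.5 * h * ki for si, ki in zip(s, k2)))
        k4 = f(tuple(si + h * ki for si, ki in zip(s, k3)))
        return tuple(si + h / 6.0 * (a + 2.0 * c + 2.0 * e + g)
                     for si, a, c, e, g in zip(s, k1, k2, k3, k4))

    rng = random.Random(seed)
    state = (1.0, 1.0, 1.0)  # assumed initial condition
    h = ts / substeps        # integrate with a finer step than the sampling time
    out = []
    for _ in range(n):
        for _ in range(substeps):
            state = rk4(state, h)
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        out.append(state[0] + noise)  # observed signal x(t) + e(t)
    return out

clean = lorenz_x_series(500)
noisy = lorenz_x_series(500, noise_std=1.0, seed=1)
```

In practice, an initial transient would usually be discarded before analysis; that step is omitted here for brevity.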

Performance Evaluation
Like PCA, SPCA can not only represent the original data by capturing the relationships between the variables but also reduce the contribution of errors in the original data. Therefore, this paper evaluates the performance of SPCA from two viewpoints, namely, the representation of chaotic signals and noise reduction in chaotic signals.

Representation of Chaotic Signals
We first show that, for the clean chaotic time series, SPCA can perfectly reconstruct the original data in a high-dimensional space. We first embed the original time series into a phase space. Considering that the dimension of the Lorenz system is 3, $d$ of the matrix $A$ is chosen as 8 in our SPCA analysis. To quantify the difference between the original data and the SPCA-filtered data, we employ the root-mean-square error (RMSE) as a measure:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i - \hat{x}_i \right)^2},$$

where $x_i$ and $\hat{x}_i$ are the original data and the estimated data, respectively, and $N$ is the number of data points. When $k = d$, the RMSE values are lower than $10^{-14}$ (see Figure 1). In Figure 1, the original data are generated by (3.1) with noise $e = 0$. The estimated data are obtained by SPCA with $k = d$. The results show that the SPCA method is better than PCA. Since real systems are usually unknown, it is necessary to study the effect of sampling time, data length, and noise on the SPCA approach. From Figures 1 and 2, we can see that the sampling time and data length have little effect on the SPCA method in the noise-free case.
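The RMSE measure is the standard root-mean-square error and can be computed as in the following sketch; the function name `rmse` is chosen here for illustration.

```python
import math

def rmse(original, estimated):
    """Root-mean-square error between two equally long sequences."""
    if len(original) != len(estimated) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(original, estimated))
    return math.sqrt(squared / len(original))

rmse_identical = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
rmse_example = rmse([0.0, 0.0], [3.0, 4.0])  # sqrt((9 + 16) / 2)
```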
For analyzing noisy data, we use the percentage of principal components (PCs) to study the share of each PC in order to reduce noise. The percentage of PCs is defined by

$$P_i = \frac{\mu_i}{\sum_{j=1}^{d} \mu_j} \times 100\%,$$

where $d$ is the embedding dimension and $\mu_i$ is the $i$th principal component value. From Figure 3, we find that the largest symplectic principal component (SPC) of SPCA is a little larger than that of PCA. It accounts for almost the entire proportion of the symplectic principal components. This shows that it is feasible to use SPCA for the principal component analysis of time series.
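As a small worked example of the percentage measure, the sketch below normalizes each principal component value by the sum of all of them. The function name and the sample values are hypothetical.

```python
def pc_percentages(mu):
    """Share of each principal component value in the total, in percent."""
    total = sum(mu)
    if total <= 0:
        raise ValueError("the principal component values must have a positive sum")
    return [100.0 * value / total for value in mu]

shares = pc_percentages([6.0, 3.0, 1.0])
```

With the illustrative values 6, 3, and 1, the first component accounts for 60% of the total, the second for 30%, and the third for 10%.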
Next, we study the reduced space spanned by a few of the largest symplectic principal components (SPCs) to estimate the chaotic Lorenz time series (see Figure 4). In Figure 4, the data $x$ are given with a sampling time of 0.01 from the chaotic Lorenz system. The estimated data are calculated from the first three largest SPCs. The average error and standard deviation between the original data and the estimated data are $-6.55 \times 10^{-16}$ and $1.03 \times 10^{-2}$, respectively. The estimated data are very close to the original data not only in the time domain (see Figure 4(a)) but also in phase space (see Figure 4(b)). We further explore the effect of the sampling time for different numbers of PCs. Figure 5 shows how the RMSE values of SPCA and PCA change with the sampling time when the number of PCs is $k = 1$ and $k = 7$, respectively. We can see that the RMSE values of SPCA are smaller than those of PCA. The sampling time has less impact on SPCA than on PCA. In the case of $k = 7$, the data length also has less effect on SPCA than on PCA (see Figure 6).
Compared with PCA, the results of SPCA are better in Figures 4, 5, and 6. We can see that the SPCA method keeps the essential dynamical character of the primary time series generated by chaotic continuous systems. This indicates that SPCA can reflect the intrinsic nonlinear characteristics of the original time series. Moreover, SPCA can elucidate the dominant features underlying the observed data, which helps to retrieve dominant patterns from noisy data. To this end, we study the feasibility of the proposed algorithm for noise reduction by using the noisy chaotic Lorenz data.

Noise Reduction in Chaotic Signals
For the noisy Lorenz data $x$, the phase diagrams of the noisy and clean data are given in Figures 7(a) and 7(b); the noise is very strong. The first denoised data are obtained by the proposed SPCA algorithm (see Figures 7(c)-7(f)). Here, we first build an attractor $X$ with an embedding dimension of 8. Then, the transform matrix $W$ is constructed with $k = 1$. The first denoised data are generated by (2.18) and (2.19). In Figure 7(c), the first denoised data are compared with the noisy Lorenz data $x$ in the time domain. After the second noise reduction, the results are greatly improved in Figures 7(e) and 7(f), respectively. The curves of the second denoised data are better than those of the first denoised data, both in the time domain and in phase space, in contrast with Figures 7(c) and 7(d). Figure 7(g) shows the first denoised result given by the PCA technique. Following our algorithm, the first denoised data are processed again by PCA (see Figure 7(h)). Some of the noise is further reduced, but the PCA curve is not better than that of SPCA in Figure 7(e). The reason is that PCA is a linear method. When nonlinear structures have to be considered, it can be misleading, especially in the case of a large sampling time (see Figure 8). The PCA program code used comes from the TISEAN tools (http://www.mpipks-dresden.mpg.de/~tisean/).
Figure 8 shows the variation of the correlation dimension $D_2$ with the embedding dimension $d$ at a sampling time of 0.1 for the clean, noisy, and denoised Lorenz data. We can observe that, for the clean and SPCA-denoised data, the curves level off in the vicinity of 2. For the noisy data, the curve increases continually and has no plateau. For the PCA-denoised data, the curve also increases and tends to a plateau near 2. However, this plateau is smaller than that of SPCA, so PCA is less effective than the SPCA algorithm. This indicates that it is difficult for PCA to describe the nonlinear structure of a system, because the correlation dimension $D_2$ manifests nonlinear properties of chaotic systems. Here, the correlation dimension $D_2$ is estimated by the Grassberger-Procaccia algorithm [27, 28].
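The idea behind the correlation dimension estimate can be sketched with the basic correlation sum $C(r)$, the fraction of point pairs closer than $r$, whose slope in a log-log plot approximates $D_2$. This is a bare-bones illustration assuming NumPy, without the refinements a practical estimate would need (for example, excluding temporally close pairs or selecting the scaling region); it is not the code used to produce Figure 8. The function names and the test points are hypothetical.

```python
import numpy as np

def correlation_sum(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    Y = np.asarray(points, dtype=float)
    n_points = Y.shape[0]
    diff = Y[:, None, :] - Y[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(n_points, k=1)  # each pair counted once
    return np.count_nonzero(dist[upper] < r) / upper[0].size

def d2_slope(points, radii):
    """Least-squares slope of log C(r) versus log r over the given radii."""
    log_r = np.log(radii)
    # Radii must be large enough that every C(r) is positive.
    log_c = np.log([correlation_sum(points, r) for r in radii])
    slope, _intercept = np.polyfit(log_r, log_c, 1)
    return slope

# Sanity check on points spread evenly along a straight line (dimension 1).
t = np.linspace(0.0, 1.0, 400)
line_points = np.column_stack([t, 2.0 * t])
radii = np.sqrt(5.0) * np.array([0.02, 0.04, 0.08, 0.16])
slope_line = d2_slope(line_points, radii)
```

For the line, the fitted slope is close to 1, as expected for a one-dimensional set.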

Discussion and Conclusion
In this paper, we have proposed a novel PCA based on symplectic geometry, called SPCA. From the theoretical point of view, this method can reflect the nonlinear structure of nonlinear dynamical systems very well because it is intrinsically nonlinear. Using chaotic Lorenz data and calculating the RMSE, the percentage of principal components, the correlation dimension, and phase space diagrams, we have shown that the SPCA method can yield more reliable results than classic PCA for chaotic time series over a wider range of data lengths and sampling times, especially for short data lengths and undersampled sampling times. With regard to noise reduction, the SPCA algorithm is also more effective than PCA.
We wish to emphasize that SPCA has a phase delay property; that is, the second row of the SPCA-filtered data is closer to the original data. This is worth further investigation in the future.

Figure 1: RMSE versus sampling time curves for the SPCA and PCA.

Figure 3: The percentage of principal components for the SPCA and PCA.
The clean data are the noise-free chaotic Lorenz data $x$ (see (3.1)). The noisy data are the chaotic Lorenz data $x$ with Gaussian white noise of zero mean and unit variance (see (3.1)). The sampling time is 0.01. The time delay $L$ is 11 in Figure 7.

Figure 4: Chaotic signal reconstructed by the proposed SPCA algorithm with $k = 3$, where (a) shows the time series of the original Lorenz data $x$ without noise and the estimated data, and (b) shows phase diagrams with $L = 11$ for the original Lorenz data $x$ without noise and the estimated data. The sampling time is $T_s = 0.01$.

Figure 5: The RMSE values versus the sampling time for SPCA and PCA, where (a) the number of PCs is $k = 7$ and (b) $k = 1$.

Figure 6: The RMSE versus the data length for SPCA and PCA, where $k = 7$. The sampling time is 0.1.

Figure 7: The noise reduction analysis of the proposed SPCA algorithm and PCA for the noisy Lorenz time series, where $L = 11$.
Subsequently, Niu et al. used our method to evaluate sprinters' surface EMG signals [21]. Xie et al. [22] proposed a kind of symplectic geometry spectra based on our work. In this paper, we show that SPCA can represent chaotic time series well and reduce noise in chaotic data.
In use, $k$ can be chosen according to (2.16).
(6) Get the transformed coefficients $S = \{S_1, S_2, \ldots, S_m\}$.
For the noisy time series, the first estimation of the data is usually not good. Here, one can go back to step 6 and let $X_i = X_s$ in (2.18) to repeat steps 6 and 7. Then, the reestimated data $x_{s1}, x_{s2}, \ldots, x_{sm}$ can be given.