Functional Principal Components Analysis of Shanghai Stock Exchange 50 Index

Zhiliang Wang, Yalin Sun, and Peng Li

College of Mathematics and Informatics, North China University of Water Conservancy and Hydroelectric Power, Zhengzhou 450000, China (ncwu.edu.cn)

Research Article. Discrete Dynamics in Nature and Society (DDNS), Hindawi Publishing Corporation, vol. 2014, Article ID 365204. ISSN 1607-887X, 1026-0226. DOI: 10.1155/2014/365204.

Received 28 February 2014; Revised 23 May 2014; Accepted 4 July 2014; Published 22 July 2014

Academic Editor: Seenith Sivasundaram

Copyright © 2014 Zhiliang Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The main purpose of this paper is to explore the principal components of the Shanghai Stock Exchange 50 index (SSE50) by means of functional principal component analysis (FPCA). Functional data analysis (FDA) deals with random variables (or processes) whose realizations lie in a space of smooth functions. One of the most popular FDA techniques is FPCA, which was introduced for the statistical analysis of a set of financial time series from an exploratory point of view. FPCA is the functional analogue of principal component analysis, the well-known dimension reduction technique in multivariate statistics, which searches for linear transformations of a random vector with maximal variance. In this paper, we study the monthly return volatility of the SSE50. Using FPCA to reduce the dimension to a finite level, we extract the most significant components of the data and some relevant statistical features of such related datasets. The results show that it is reasonable to regard the samples as random functions. Compared with ordinary principal component analysis, FPCA can handle samples of different dimensions, and it is a convenient approach to extracting the main variance factors.

1. Introduction

In conventional data analysis, the data under study are usually either cross-sectional data or panel data. In practical research, however, we often encounter data with functional characteristics. Functional data are multivariate data with an ordering on the dimensions. The data deserve the label “functional” because they so clearly reflect the smooth curves that we assume generated them. A typical dataset of this sort combines time series and cross-sectional data, such as the time series of stock prices, and some datasets may even take the form of curves or images. Advances in data collection and storage have tremendously increased the presence of such functional data, whose graphical representations are curves, images, or shapes. The theoretical and practical developments in functional data analysis date mainly from the last four decades and owe much to the rapid development of computer recording and storage facilities. As a new area of statistics, functional data analysis extends existing methodologies and theories from the fields of data analysis, generalized linear models, multivariate data analysis, nonparametric statistics, and many others. Recently, there have been several impressive attempts to analyze functional datasets, such as Ramsay et al., who proposed some new concepts and methods in the field of FDA.

FPCA is the functional analogue of the well-known dimension reduction technique in the multivariate statistical analysis and is useful in determining the common factors or trends that are present in the dynamics of the underlying recovered functions.

The development of FPCA can be traced back to Karhunen and Loève, who independently developed a theory on the optimal series expansion of a continuous stochastic process. Motivated by a dataset of growth curves, Rao developed some preliminary ideas on FPCA and proposed statistical tests for the equality of average growth curves over a period of time. Much later, Dauxois et al. introduced a functional exposition of PCA with applications to statistical inference. Several other notable developments have arisen out of the systematic research of the functional data analysis group known as the Toulouse School of Functional Data Analysis.

In recent years, Hall and Hosseini-Nasab [12, 13] showed how the properties of functional principal component analysis can be elucidated through stochastic expansions and related results. Yao et al. proposed an FPCA procedure based on a conditional expectation method, which is aimed at estimating functional principal component scores for sparse longitudinal data. Hall and Vial investigated the properties of FPCA and gave some insights into methodology and convergence rates. Di et al. introduced multilevel FPCA, which is designed to extract the intra- and intersubject geometric components of multilevel functional data. Based on FPCA, Hyndman and Shang proposed graphical tools for visualizing functional data and detecting functional outliers.

Owing to these theoretical and practical developments, FPCA has been successfully applied to many practical problems, such as the analysis of cornea curvature in the human eye, the analysis of electronic commerce, the analysis of growth curves, the analysis of income density, the analysis of implied volatility surfaces in finance, the analysis of longitudinal primary biliary liver cirrhosis, and the analysis of spectroscopy data. Furthermore, Hyndman and Shahid Ullah proposed a smoothed and robust FPCA and used it to forecast age-specific mortality and fertility rates.

The objective of this paper is to study the monthly volatility of the returns of the Shanghai 50 index, which consists of 50 stocks. Treating the stock price series as random functions in a space spanned by a finite-dimensional functional basis, we explore the methods of functional data analysis in depth, especially functional principal component analysis.

In the area of finance, some notable papers applying functional data analysis include Ramsay and Ramsey, Muller and Ulrich, and Miao. However, few publications study the increasingly flourishing Chinese financial market. This paper aims to fill this gap both in theory and in application.

Our study can be described as an exploratory data approach:

Data collection → Data analysis → Conclusions.

This paper is organized as follows. In Section 2, we describe the functional principal component analysis (FPCA), which plays a significant role in the development of functional data analysis. It is also an essential ingredient of functional principal component regression (FPCR). Section 3 will illustrate the empirical study with the application of the theory in Section 2. Some further discussion and a conclusion are presented in Section 4.

2. Methodology

As mentioned before, functional principal component analysis (FPCA) is an important tool in the functional data analysis toolbox. The main idea of FPCA is the same as that of multivariate principal component analysis (PCA), except that its principal component weights, or harmonics, are functions of time. They carry the main features of the functional data object and can be interpreted separately.

The differences in notation between PCA and FPCA are summarized in Table 1.

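Table 1: The differences in notation between PCA and FPCA.

| | PCA | FPCA |
| --- | --- | --- |
| Variables | $X = [x_1, x_2, \ldots, x_n]$, $x_i = [x_{1i}, \ldots, x_{pi}]$, $i = 1, \ldots, n$ | $f(t) = [f_1(t), f_2(t), \ldots, f_n(t)]$, $t \in [x_1, x_p]$ |
| Data | Vectors in $\mathbb{R}^p$ | Curves in $L^2[x_1, x_p]$ |
| Covariance | Matrix $V = \mathrm{Cov}(X) \in \mathbb{R}^{p \times p}$ | Operator $V$ bounded between $x_1$ and $x_p$, $V\colon L^2[x_1, x_p] \to L^2[x_1, x_p]$ |
| Eigen structure | Vector $\Phi_k \in \mathbb{R}^p$, $V\Phi_k = \lambda_k \Phi_k$, for $1 \le k \le \min(n, p)$ | Function $\phi_k(t) \in L^2[x_1, x_p]$, $\int_{x_1}^{x_p} V\phi_k(t)\,dt = \lambda_k \phi_k(t)$, for $1 \le k \le n$ |
| Components | Random variables in $\mathbb{R}^p$ | Random variables in $L^2[x_1, x_p]$ |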

The basic assumption of FDA is that the data-generating process can be described by a smooth function. FPCA finds a set of orthogonal principal component functions by maximizing the variance along each component.

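The first functional principal component $\phi_1(t)$ is defined by

$$\phi_1(t) = \arg\max_{\|\phi\|^2 = 1} \frac{1}{N} \sum_{i=1}^{N} \left( \int_L \phi(t) f_i(t)\,dt \right)^2 \tag{1}$$

subject to

$$\|\phi_1\| = 1. \tag{2}$$

The $k$th functional principal component $\phi_k(t)$ can be found analogously, subject to the additional constraint

$$\int_L \phi_j(t)\phi_k(t)\,dt = 0, \quad j < k. \tag{3}$$

The sample covariance function of $f(x) = [f_1(x), f_2(x), \ldots, f_n(x)]$, $x \in [x_1, x_p]$, is given by

$$V(s, t) = \frac{1}{N} \sum_{i=1}^{N} f_i(s) f_i(t), \tag{4}$$

where each function $f_i(t)$ has usually been centered first.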
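As a rough illustration of (4), the sketch below centers a small set of curves sampled on a common grid and evaluates the sample covariance function at the grid points. The toy curves and the helper names `center_curves` and `sample_covariance` are assumptions made for this example; they are not taken from the paper.

```python
def center_curves(curves):
    """Subtract the pointwise mean curve from every sampled curve."""
    n = len(curves)
    p = len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(p)]
    centered = [[c[j] - mean[j] for j in range(p)] for c in curves]
    return mean, centered


def sample_covariance(centered):
    """V(s, t) = (1/N) * sum_i f_i(s) * f_i(t), evaluated on the grid."""
    n = len(centered)
    p = len(centered[0])
    return [
        [sum(f[s] * f[t] for f in centered) / n for t in range(p)]
        for s in range(p)
    ]


# Hypothetical toy data: three curves observed at three grid points.
curves = [
    [1.0, 2.0, 3.0],
    [2.0, 2.0, 2.0],
    [3.0, 2.0, 1.0],
]
mean, centered = center_curves(curves)
V = sample_covariance(centered)
```

The diagonal entries `V[s][s]` are the pointwise variances, and the matrix is symmetric by construction, which mirrors the symmetry of the covariance operator.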

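The covariance operator $V$ extends the concept of a sample covariance matrix to functional data; it is easy to show that $V$ is a positive, compact, symmetric linear operator. It follows that the principal component functions satisfy the eigenequation

$$(V\phi_k)(t) = \lambda_k \phi_k(t), \quad \lambda_1 \ge \lambda_2 \ge \cdots \ge 0. \tag{5}$$

The detailed calculation procedure is provided below.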

Step 1.

The data used in this paper are collected from public sources such as the WIND database.

Step 2.

The raw data may be dirty, so preprocessing is necessary: the collected data are cleaned and organized before analysis.

Step 3.

The data are next converted to functional form. In this step, the raw data for observation $i$ are used to define a function $f_i$ that can be evaluated at all values of $t$ over the interval $[x_1, x_p]$. To do this, a basis must be specified. A nonparametric method is used to estimate $f_i(t)$ for $t \in [x_1, x_p]$, $i = 1, \ldots, n$.

We then approximate each function by a linear combination of a finite number of basis functions $\varphi_k$:

$$f_i(t) \approx \sum_{k=1}^{K} \beta_{i,k}\varphi_k(t), \quad i = 1, \ldots, n. \tag{6}$$

Popular choices of basis include polynomial, Bernstein polynomial, Fourier, wavelet, and B-spline bases. The B-spline basis is our first choice because it fits the nonperiodic data in our study well.
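The expansion in (6) can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: it evaluates a cubic B-spline basis with the Cox-de Boor recursion and estimates the coefficients of one curve by ordinary least squares through the normal equations. The knot sequence, the sampling grid, the toy curve, and all function names are assumptions chosen for the example, and no roughness penalty is applied.

```python
def bspline_basis(j, degree, knots, x):
    """Cox-de Boor recursion: value at x of the j-th B-spline of the given degree."""
    if degree == 0:
        return 1.0 if knots[j] <= x < knots[j + 1] else 0.0
    value = 0.0
    left_span = knots[j + degree] - knots[j]
    if left_span > 0:
        value += (x - knots[j]) / left_span * bspline_basis(j, degree - 1, knots, x)
    right_span = knots[j + degree + 1] - knots[j + 1]
    if right_span > 0:
        value += ((knots[j + degree + 1] - x) / right_span
                  * bspline_basis(j + 1, degree - 1, knots, x))
    return value


def solve_linear_system(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_coefficients(times, values, degree, knots):
    """Least-squares estimate of the basis coefficients beta for one curve."""
    n_basis = len(knots) - degree - 1
    B = [[bspline_basis(k, degree, knots, t) for k in range(n_basis)] for t in times]
    gram = [[sum(row[a] * row[c] for row in B) for c in range(n_basis)]
            for a in range(n_basis)]
    rhs = [sum(row[a] * y for row, y in zip(B, values)) for a in range(n_basis)]
    return solve_linear_system(gram, rhs), B


# Hypothetical setup: cubic B-splines on [0, 1) with one interior knot at 0.5.
degree = 3
knots = [0.0] * 4 + [0.5] + [1.0] * 4
times = [i / 20 for i in range(20)]        # grid excludes the right endpoint
values = [t ** 3 - t for t in times]       # toy curve, not stock data
beta, B = fit_coefficients(times, values, degree, knots)
fitted = [sum(b * c for b, c in zip(row, beta)) for row in B]
max_error = max(abs(f - y) for f, y in zip(fitted, values))
```

Because the toy curve is itself a cubic polynomial, it lies in the span of the basis and the fit reproduces it up to rounding error. With noisy return data the residual would not vanish, and the number and placement of knots would control the smoothness of the fitted curve.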

Step 4.

The functions may also need to be registered or aligned so that important features become visible. This step separates vertical (amplitude) variation from horizontal (phase) variation. In our study, this step is not used because of the characteristics of our data.

Step 5.

Next, a variety of preliminary displays and summary statistics are produced. For example, the first and second derivative curves estimated from the data using the techniques discussed above are displayed, from which we can infer that some curves vary strongly while others vary much less.
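To make the derivative displays concrete, the following sketch approximates the first and second derivatives of a curve sampled on a uniform grid with central finite differences. This is a simple stand-in chosen for illustration: in the functional setting the derivatives would normally be obtained by differentiating the fitted basis expansion. The grid spacing, the toy curve, and the function names are assumptions.

```python
def first_derivative(values, dt):
    """Central-difference estimate of f'(t) at the interior grid points."""
    return [(values[j + 1] - values[j - 1]) / (2 * dt)
            for j in range(1, len(values) - 1)]


def second_derivative(values, dt):
    """Central-difference estimate of f''(t) at the interior grid points."""
    return [(values[j + 1] - 2 * values[j] + values[j - 1]) / dt ** 2
            for j in range(1, len(values) - 1)]


# Hypothetical smooth curve f(t) = t^2 on a uniform grid over [0, 1].
dt = 0.1
grid = [j * dt for j in range(11)]
curve = [t ** 2 for t in grid]
d1 = first_derivative(curve, dt)
d2 = second_derivative(curve, dt)
```

For this quadratic toy curve the central differences recover the slope 2t and the constant curvature 2 at the interior points, up to rounding error. On noisy data, finite differences amplify noise, which is one reason to differentiate a smoothed representation instead.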

Step 6.

Then exploratory analyses such as FPCA can be carried out.

The first principal component can be found by solving

$$V\phi_1(t) = \lambda_1 \phi_1(t), \quad \|\phi_1\| = 1. \tag{7}$$
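One common way to make (7) computable, sketched here under the assumption that the expansion (6) holds exactly and that the basis functions are linearly independent, is to rewrite the operator eigenproblem as a matrix eigenproblem for the basis coefficients. The symbols $\boldsymbol{\varphi}$, $C$, $W$, and $b$ are introduced for this derivation only and do not appear in the paper.

```latex
% Assumed notation: \boldsymbol{\varphi}(t) stacks the K basis functions,
% C is the N x K matrix with entries \beta_{i,k} (centered curves),
% W is the Gram matrix of the basis, b holds the basis coefficients of \phi_1.
\begin{align*}
f_i(t) &= \sum_{k=1}^{K} \beta_{i,k}\varphi_k(t)
  \;\Longrightarrow\;
  V(s,t) = \frac{1}{N}\,\boldsymbol{\varphi}(s)^{\top} C^{\top} C\,\boldsymbol{\varphi}(t), \\
\phi_1(t) &= \boldsymbol{\varphi}(t)^{\top} b, \qquad
  W = \int_L \boldsymbol{\varphi}(t)\,\boldsymbol{\varphi}(t)^{\top}\,dt, \\
(V\phi_1)(s) &= \int_L V(s,t)\,\phi_1(t)\,dt
  = \frac{1}{N}\,\boldsymbol{\varphi}(s)^{\top} C^{\top} C\, W b, \\
V\phi_1 = \lambda_1\phi_1
  &\;\Longleftrightarrow\;
  \frac{1}{N}\, C^{\top} C\, W b = \lambda_1 b, \qquad
  \|\phi_1\|^2 = b^{\top} W b = 1 .
\end{align*}
```

Substituting $u = W^{1/2} b$ turns this into the symmetric problem $\frac{1}{N} W^{1/2} C^{\top} C\, W^{1/2} u = \lambda_1 u$ with $u^{\top} u = 1$, which any standard symmetric eigensolver can handle.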

Step 7.

The $k$th functional principal component is a solution of

$$V\phi_k(t) = \lambda_k \phi_k(t), \quad \|\phi_k\| = 1 \tag{8}$$

subject to

$$\int_L \phi_j(t)\phi_k(t)\,dt = 0, \quad j < k. \tag{9}$$
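Steps 6 and 7 can be illustrated numerically by discretizing the covariance operator on a grid. The sketch below is one possible approach under stated assumptions, not the procedure used in the paper: it approximates the integral operator by the matrix with entries V(s, t) dt, finds the leading eigenpair by power iteration, and obtains later components by deflation, which enforces the orthogonality constraint (9). The toy curves, the grid spacing, and the function names are hypothetical.

```python
import math


def matvec(A, x):
    """Matrix-vector product for a dense list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def leading_eigenpair(A, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    v = [1.0 + 0.1 * i for i in range(len(A))]   # arbitrary nonzero start
    for _ in range(iters):
        w = matvec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(vi * wi for vi, wi in zip(v, matvec(A, v)))
    return lam, v


def fpca_on_grid(centered, dt, n_components):
    """Leading eigenpairs of the discretized covariance operator."""
    n, p = len(centered), len(centered[0])
    # A[s][t] = V(s, t) * dt approximates the integral operator on the grid.
    A = [[dt * sum(f[s] * f[t] for f in centered) / n for t in range(p)]
         for s in range(p)]
    pairs = []
    for _ in range(n_components):
        lam, v = leading_eigenpair(A)
        # Rescale so that the discrete integral of phi^2 equals 1.
        pairs.append((lam, [x / math.sqrt(dt) for x in v]))
        # Deflation: remove the component just found before the next search.
        A = [[A[s][t] - lam * v[s] * v[t] for t in range(p)] for s in range(p)]
    return pairs


# Hypothetical centered curves built from two orthogonal shapes on four grid points.
centered = [
    [1.5, 1.5, 1.5, 1.5],
    [-1.5, -1.5, -1.5, -1.5],
    [0.5, -0.5, 0.5, -0.5],
    [-0.5, 0.5, -0.5, 0.5],
]
dt = 0.25
pairs = fpca_on_grid(centered, dt, 2)
(lam1, phi1), (lam2, phi2) = pairs
inner = dt * sum(a * b for a, b in zip(phi1, phi2))
norm1 = dt * sum(a * a for a in phi1)
```

Power iteration is used here only to keep the example free of dependencies; in practice a library eigensolver for symmetric matrices would be the more usual choice.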

Step 8.

The cumulative percentage of explained variance is calculated, and finally some discussion and an economic interpretation of the functional principal components are provided.

3. Application

Figure 1 shows the monthly rate of return of the 50 stocks that constitute the SSE50 index.

Figure 1: The monthly rate of return of 50 stocks.

Little structure can be discerned in this form of plot, so further work is needed to study the data.

The datasets are then converted to functional form, that is, to functions that can be evaluated at all values of $t$ over some interval. The 50 functions are displayed in Figure 2, with the estimated mean function in bold. Some features of the data are too subtle to see in this type of plot.

Figure 2: The functional form of 50 stocks. Note: the black bold line is the mean function.

An impression is that some curves are high (with good investment return) and that some curves are low (with not so good investment return).

We therefore conclude that some of the variation from curve to curve can be explained at the level of certain derivatives. The fact that derivatives are of interest is further reason to think of the records as functions, rather than vectors of observations in discrete time.

Next, we examine the fit of the 50 curves obtained. Taking the stock with code 600019 as an example, the fitted result is quite good, as illustrated in Figure 3.

Figure 3: The functional form of the stock code 600019.

Figures 4 and 5 display the first and second derivative curves estimated from these data using the techniques discussed above. We can infer that some curves vary strongly, while others vary much less.

Figure 4: The first derivative curves.

Figure 5: The second derivative curves.

Now, in Figure 6, we illustrate the variance-covariance structure of return rate. The peak point at the middle of the diagonal represents the largest variance in October.

Figure 6: The variance-covariance structure of return rate.

Finally, the principal component functions are presented in Figures 7–11 as perturbations of the mean function, obtained by adding and subtracting a multiple of each principal component function.

Figure 7: The first principal component function.

Figure 8: The second principal component function.

Figure 9: The third principal component function.

Figure 10: The fourth principal component function.

Figure 11: The fifth principal component function.
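The perturbation displays described above amount to plotting the mean function together with the mean plus and minus a multiple of a principal component function. A minimal sketch follows; the numerical values, the multiple, and the function name `perturb_mean` are hypothetical and are not the values behind Figures 7–11.

```python
def perturb_mean(mean, phi, multiple):
    """Return the curves mean + c * phi and mean - c * phi on the grid."""
    upper = [m + multiple * p for m, p in zip(mean, phi)]
    lower = [m - multiple * p for m, p in zip(mean, phi)]
    return upper, lower


# Hypothetical mean return curve and principal component function on four grid points.
mean = [0.01, 0.02, 0.00, -0.01]
phi = [0.5, 0.5, -0.5, -0.5]
upper, lower = perturb_mean(mean, phi, multiple=0.02)
```

Where the upper curve lies above the mean, a positive score on that component raises returns over that period, which is how the sign and timing of each component's effect can be read from such a display.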

Table 2 reports the percentage of variance explained by each principal component, together with the cumulative total.

Table 2: Cumulative percentage of explained variance.

| PC | Percentage of explained variance |
| --- | --- |
| PC1 | 30.22% |
| PC2 | 22.07% |
| PC3 | 19.91% |
| PC4 | 11.02% |
| PC5 | 7.54% |
| Total | 90.76% |
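The cumulative percentages implied by Table 2 can be checked with a short calculation. The sketch below uses only the per-component shares reported in the table; the 80 percent threshold and the variable names are assumptions made for the example.

```python
from itertools import accumulate

# Per-component shares of explained variance, taken from Table 2 (in percent).
shares = {"PC1": 30.22, "PC2": 22.07, "PC3": 19.91, "PC4": 11.02, "PC5": 7.54}

# Running total after each component, rounded to two decimals.
cumulative = [round(x, 2) for x in accumulate(shares.values())]

# Hypothetical threshold: how many components are needed to explain 80 percent?
threshold = 80.0
n_needed = next(i + 1 for i, c in enumerate(cumulative) if c >= threshold)
```

The running total reaches the 90.76 percent reported in the last row of the table after five components, and the first four components are the smallest set that exceeds the assumed 80 percent threshold.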

The first principal component function (Figure 7), which accounts for 30.22 percent of the variation, has a clearly positive effect on the mean function between February and April 2011. During that period, the high-speed rail theme strongly stimulated the Chinese financial market. We can therefore reasonably interpret the first principal component as representing the speculation boom.

The second principal component function (Figure 8), which accounts for 22.07 percent of the variation, appears to pick up the influence of the tightened monetary policy adopted to control excessive price rises, especially in real estate.

As stock prices fell further and further, more and more investors came to believe that current prices were worth the risk, and the resulting buying pressure drove a brief price rebound. The third principal component (Figure 9), which accounts for 19.91 percent of the variation, is believed to represent this influence.

After the excessive drop in prices, a growing number of blue chips became undervalued, and their investment value promoted a price recovery. This effect is captured by the fourth principal component (Figure 10), which accounts for 11.02 percent of the variation.

The fifth principal component (Figure 11), which accounts for only 7.54 percent of the variation and has little effect on the mean function, is not discussed in this paper.

4. Conclusion and Further Discussion

FPCA attempts to find the dominant modes of variation around an overall trend function and is thus a key technique in functional data analysis. Modern data analysis has benefited, and will continue to benefit, greatly from the development of functional data analysis. In this paper, we illustrate functional principal components analysis through a study of the monthly return rates of the stocks constituting the Shanghai 50 index. Regarding the samples as random functions, we extracted the main variance factors over time as principal components, an approach with strong theoretical and practical value. A feature of the proposed approach that distinguishes it from established methods for spot volatility analysis is that it is geared towards the analysis of observations drawn from all realizations of the volatility process, rather than observations from a single realization. As described in Section 3, the first principal component function has a clearly positive effect on the mean function and can thus be summarized as representing the speculation boom. The second principal component function, on the other hand, has a pronounced negative effect on the mean function between May and July 2011 and appears to pick up the influence of the tightened monetary policy adopted to curb fast-rising prices.

In addition, the proposed FPCA method is easy to program and implement. Once the underlying functions or curves have been smoothed, the principal components are extracted easily. The fast computational speed of the method makes it feasible to apply in empirical studies with a large number of observations.

Besides the method proposed in this paper, several other methods, such as curve classification, nonparametric analysis, and functional depth analysis, can be used to analyze functional data. These methods will be considered in a future study. Moreover, to highlight the benefit of the functional approach and to compare the corresponding results, we will also treat the curves as high-dimensional standard vectors in future work.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] P. Hall, H. Müller, and J. Wang, “Properties of principal component methods for functional and longitudinal data analysis,” Annals of Statistics, vol. 34, no. 3, pp. 1493–1517, 2006. doi:10.1214/009053606000000272
[2] J. O. Ramsay and C. J. Dalzell, “Some tools for functional data analysis (with discussion),” Journal of the Royal Statistical Society B, vol. 53, no. 3, pp. 539–572, 1991.
[3] J. O. Ramsay and B. W. Silverman, Applied Functional Data Analysis: Methods and Case Studies, Springer, New York, NY, USA, 2002.
[4] J. O. Ramsay and B. W. Silverman, Functional Data Analysis, 2nd edition, Springer, New York, NY, USA, 2005.
[5] J. O. Ramsay, G. Hooker, and S. Graves, Functional Data Analysis with R and MATLAB, Springer, New York, NY, USA, 2009.
[6] H. L. Shang, A Survey of Functional Principal Component Analysis, Working Paper 06/11, Department of Econometrics and Business Statistics, Monash University, 2011.
[7] K. Karhunen, “Zur Spektraltheorie stochastischer Prozesse,” vol. 1946, no. 34, p. 7, 1946.
[8] M. Loève, “Fonctions aleatoires a decomposition orthogonal exponentielle,” La Revue Scientifique, vol. 84, pp. 159–162, 1946.
[9] C. R. Rao, “Some statistical methods for comparison of growth curves,” Biometrics, vol. 14, no. 1, pp. 1–17, 1958.
[10] J. Dauxois and A. Pousse, Les analyses factorielles en calcul des probabilities et en statistique: essai d'etude synthetique [Ph.D. thesis], l'Universite Paul-Sabatier de Toulouse, Toulouse, France, 1976.
[11] J. Dauxois, A. Pousse, and Y. Romain, “Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference,” Journal of Multivariate Analysis, vol. 12, no. 1, pp. 136–154, 1982. doi:10.1016/0047-259X(82)90088-4
[12] P. Hall and M. Hosseini-Nasab, “On properties of functional principal components analysis,” Journal of the Royal Statistical Society B: Statistical Methodology, vol. 68, no. 1, pp. 109–126, 2006. doi:10.1111/j.1467-9868.2005.00535.x
[13] P. Hall and M. Hosseini-Nasab, “Theory for high-order bounds in functional principal components analysis,” Mathematical Proceedings of the Cambridge Philosophical Society, vol. 146, no. 1, pp. 225–256, 2009. doi:10.1017/S0305004108001850
[14] F. Yao, H. Müller, and J. Wang, “Functional data analysis for sparse longitudinal data,” Journal of the American Statistical Association, vol. 100, no. 470, pp. 577–590, 2005. doi:10.1198/016214504000001745
[15] P. Hall and C. Vial, “Assessing the finite dimensionality of functional data,” Journal of the Royal Statistical Society B, vol. 68, no. 4, pp. 689–705, 2006. doi:10.1111/j.1467-9868.2006.00562.x
[16] C. Di, C. M. Crainiceanu, B. S. Caffo, and N. M. Punjabi, “Multilevel functional principal component analysis,” The Annals of Applied Statistics, vol. 3, no. 1, pp. 458–488, 2009. doi:10.1214/08-AOAS206
[17] R. J. Hyndman and H. L. Shang, “Rainbow plots, bagplots, and boxplots for functional data,” Journal of Computational and Graphical Statistics, vol. 19, no. 1, pp. 29–45, 2010. doi:10.1198/jcgs.2009.08158
[18] N. Locantore, J. S. Marron, D. G. Simpson, N. Tripoli, J. T. Zhang, and K. L. Cohen, “Robust principal component analysis for functional data,” Test, vol. 8, no. 1, pp. 1–73, 1999.
[19] S. Wang, W. Jank, and G. Shmueli, “Explaining and forecasting online auction prices and their dynamics using functional data analysis,” Journal of Business & Economic Statistics, vol. 26, no. 2, pp. 144–160, 2008. doi:10.1198/073500106000000477
[20] J. M. Chiou and P. L. Li, “Functional clustering and identifying substructures of longitudinal data,” Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 679–699, 2007. doi:10.1111/j.1467-9868.2007.00605.x
[21] A. Kneip and K. J. Utikal, “Inference for density families using functional principal component analysis,” Journal of the American Statistical Association, vol. 96, no. 454, pp. 519–532, 2001. doi:10.1198/016214501753168235
[22] F. Cont and J. Fonseca, “The dynamics of implied volatility surfaces,” Quantitative Finance, vol. 2, no. 1, pp. 45–60, 2002.
[23] F. Yao, H. G. Muller, and J. L. Wang, “Functional linear regression analysis for longitudinal data,” Annals of Statistics, vol. 33, no. 6, pp. 2873–2903, 2005. doi:10.1214/009053605000000660
[24] F. Yao and H. Müller, “Functional quadratic regression,” Biometrika, vol. 97, no. 1, pp. 49–64, 2010. doi:10.1093/biomet/asp069
[25] R. J. Hyndman and M. Shahid Ullah, “Robust forecasting of mortality and fertility rates: a functional data approach,” Computational Statistics and Data Analysis, vol. 51, no. 10, pp. 4942–4956, 2007. doi:10.1016/j.csda.2006.07.028
[26] J. O. Ramsay and J. B. Ramsey, “Functional data analysis of the dynamics of the monthly index of nondurable goods production,” Journal of Econometrics, vol. 107, no. 1-2, pp. 327–344, 2002. doi:10.1016/S0304-4076(01)00127-0
[27] H. G. Muller and M. S. Ulrich, “Functional data analysis for volatility,” Journal of Economics Literature Classification Codes: C14, C51, C52, G12, G17, 2011.
[28] H. Miao, “Potential applications of function data analysis in high-frequency financial research,” Journal of Business & Financial Affairs, vol. 2, no. 1, article e125, 2013. doi:10.4172/2167-0234.1000e125