Tensorial Kernel Principal Component Analysis for Action Recognition

We propose Tensorial Kernel Principal Component Analysis (TKPCA) for dimensionality reduction and feature extraction from tensor objects. It extends conventional Principal Component Analysis (PCA) in two respects: it works directly with multidimensional data (tensors) in their native state, and it generalizes an existing linear technique to a nonlinear version by applying the kernel trick. Our method aims to remedy the shortcomings of recently developed multilinear subspace learning (tensorial PCA) in modelling the nonlinear manifold of tensor objects, and it brings together the desirable properties of kernel methods and tensor decompositions for a significant performance gain when the data are multidimensional and nonlinear dependencies exist. Our approach begins by formulating TKPCA as an optimization problem. Then, we develop a kernel function based on the Grassmann manifold that can directly take tensorial representations as parameters instead of traditional vectorized representations. Furthermore, a TKPCA-based tensor object recognition scheme is proposed for action recognition. Experiments with real action datasets show that the proposed method is insensitive to both noise and occlusion and performs well compared with state-of-the-art algorithms.


Introduction
Recent years have witnessed a dramatic increase in the quantity of multidimensional data, which are so large and complex that they are difficult to process with traditional data processing applications. Hence, there is a growing need for feature extraction and dimensionality reduction techniques for analyzing multidimensional data.
Tensors provide a natural and efficient way to describe such multidimensional data. The entries of a tensor are addressed by more than two indices. The number of indices defines the order of the tensor, and each index defines one of the so-called "modes." In practice, many kinds of data can be represented as tensors. For example, second-order tensors include gray-level images in computer vision and pattern recognition [1][2][3][4] and multichannel EEG signals in biomedical engineering [5][6][7]. Third-order tensors include diffusion tensor imaging (DTI) in brain research [8], hyperspectral cubes in remote sensing [9], silhouette sequences in gait analysis [10], and gray-level video sequences in action recognition [11,12]. There are also multidimensional signals that form tensors of order higher than three, for example, in color video surveillance, social network analysis [13], and so forth. Figure 1 shows two examples of third-order tensors: a silhouette sequence and reconstructed fiber tracts of the human brain measured by DTI. The hypothesis behind DTI is that the bundles of fiber tracts make water diffuse asymmetrically, and DTI derives tract directional information from third-order tensors describing this anisotropy.
Principal Component Analysis (PCA) [14] is one of the most important techniques in the class of unsupervised learning algorithms. It linearly transforms a number of possibly correlated variables into uncorrelated features called principal components (PCs). The transformation is performed to find directions of maximal variation. Normally, only a few principal components account for most of the variation in the original dataset. However, PCA is not suitable for discovering nonlinear relationships among the original variables. To overcome this limitation, Schölkopf et al. [15] proposed Kernel Principal Component Analysis (KPCA), which performs PCA in a Reproducing Kernel Hilbert Space (RKHS) rather than in the input space. In principle, kernel methods nonlinearly map a set of training samples to a higher dimensional RKHS where conventional linear PCA is performed, with the resulting subspaces being nonlinear with regard to the original input space. In practice, the mapping is performed implicitly via the kernel trick [15], where an appropriately chosen kernel function is used to evaluate dot products of mapped input space vectors without having to carry out the mapping explicitly. As classical dimensionality reduction methods, PCA and KPCA have been widely used for extracting features from tensor objects. However, before tensorial data are fed to PCA or KPCA, the tensors typically have to be transformed into long vectors by concatenating the tensor entries. This presents several problems. Firstly, the integral structure of the tensor is broken up; therefore, the information correlated with surrounding entries could be lost. Secondly, the vectorized representation lies in a very high dimensional space, which raises the Curse of Dimensionality problem [16]. Thirdly, only sparse data are available in many application areas such as web document classification, face recognition, and disease classification based on gene expression profiling; consequently, the small sample size (SSS) problem [17,18] is inevitable there.
Mathematical Problems in Engineering
Due to the challenges above, interest has recently grown in multilinear subspace learning (hereinafter referred to as tensorial PCA), which reduces the dimensionality of multidimensional data directly from their tensorial representations. This line of work was initiated by Yang et al. [3], who proposed a two-dimensional PCA (2DPCA) algorithm. This algorithm solves for a linear transformation that projects an image to a low dimensional matrix while maximizing the variance measure. It works directly on image matrices, but there is only one linear transformation, in the 2-mode. Thus, the image data are projected in the 2-mode (the row mode) only, while the projection in the 1-mode (the column mode) is ignored. A more general algorithm named generalized PCA (GPCA) is introduced in [1], which takes into account the spatial correlation between neighboring image pixels and applies linear transforms to both the left and right sides of the input image matrices. However, it is formulated for matrices only. Later, multilinear PCA (MPCA) [19] generalized GPCA to tensors of any order, where the objective is to find a core tensor (see Section 2.2) that captures most of the variation in the original tensorial input. In [20], two robust MPCA (RMPCA) algorithms are proposed, where iterative algorithms are derived on the basis of Lagrange multipliers to deal with sample outliers and intrasample outliers. In [21], nonnegative MPCA (NMPCA) extends MPCA by constraining the projection matrices to be nonnegative. NMPCA preserves the nonnegativity of the original tensor samples, which is important when the underlying data have a physical or psychological interpretation. RMPCA and NMPCA can be considered extensions of MPCA. Although the above algorithms exploit tensorial structure for subspace learning, they are formulated for the multilinear projection of tensor to tensor only. There also exists a different projection scheme that projects tensor to vector. The tensor rank-one decomposition (TROD) algorithm introduced in [22] is one example. This algorithm looks for a second-order projection that projects an image to a low dimensional vector while minimizing a least squares (reconstruction) error measure. Nonetheless, the input data are not centered before learning, and the work is formulated only for matrices. Later, an uncorrelated MPCA (UMPCA) algorithm was proposed in [23] and adopted in [24], which extracts uncorrelated multilinear features through tensor-to-vector projection while capturing most of the variation in the original data input.
Although tensorial PCA methods achieve better performance than naive PCA, several shortcomings remain. Firstly, the nonconvex optimality criterion and the suboptimal iterative solver used by tensorial PCA offer no guarantee that the solution found is globally optimal. Secondly, the objective of most tensorial PCA algorithms is to find the most expressive core tensor for each input tensor. The resulting disadvantage is that more storage is required to represent the core tensor than the scalar representation used in PCA or KPCA. Last but most [...] Table 1 gives the connections and differences with other PCA techniques. To the best of our knowledge, this is the first study that addresses the TKPCA problem.
Our approach begins by formulating TKPCA as an optimization problem. Unlike in traditional PCA, the Covariance Matrix cannot be formed directly from tensorial data. We thus derive TKPCA in a support vector machine (SVM) fashion, which leads to a convex optimization problem and fits into the primal-dual framework [16]. The primal problem can be solved through its dual representation via the kernel trick, which is based on the Mercer theorem for positive definite kernels [25]. Subsequently, a kernel function with tensorial inputs (a tensorial kernel) can be plugged into the dual solution, which takes the nonlinear structure of the tensorial representation into account. Furthermore, we design a novel tensorial kernel based on the Grassmann manifold and prove its positive definiteness.
The benefits of the TKPCA can be summarized as follows.
(i) Tensor representation of multidimensional data reduces the SSS problem and the Curse of Dimensionality phenomenon, facilitating a precise classification performance even for low number of training samples and complex data structure.
(ii) Kernel method remedies the shortcoming of tensorial PCA in modelling the nonlinear manifold of tensor objects.
(iii) TKPCA is equivalent to performing a standard KPCA except that the parameters of the kernel function are in natural tensor representations; in general, KPCA achieves a higher compression rate (CR) than tensorial PCA. Therefore, TKPCA offers better CR performance.
(iv) TKPCA is insensitive to environmental variations and more robust to noise. This is because the Grassmann-based kernel function compares the similarity between subspaces that are low dimensional approximations of the original data, and the approximation can "fill in" the missing data. Moreover, the kernel is derived from the geodesic distance on the Grassmann manifold rather than the Euclidean distance. Therefore, TKPCA is expected to capture the topological structure underlying the tensor dataset.
(v) TKPCA is a convex optimization problem, which means that any local optimum must also be global. Therefore, TKPCA does not suffer from the local optima that affect tensorial PCA.
The main contributions of this paper include the following.
(i) A new TKPCA is introduced for nonlinear dimensionality reduction and feature extraction from tensor objects, by encoding the structured information embedded in the tensorial data into the kernel framework.
(ii) A novel tensorial kernel function is proposed based on the Grassmann kernel, which can directly measure the similarity between tensorial inputs. Furthermore, a proof of the strict positive definiteness of the proposed kernel function is given.
(iii) A recognition system is developed for action recognition by selecting the more discriminative features after TKPCA projection.
The rest of this paper is organized as follows. Section 2 introduces basic notations, kernel method concepts, and the notion of multilinear projection for dimensionality reduction. In Section 3, the TKPCA problem is formulated, and the algorithm is summarized and discussed in detail. Moreover, a TKPCA-based tensor object recognition scheme is proposed for action recognition. Section 4 reports experiments on action recognition and compares performance against state-of-the-art algorithms. We also assess the noise robustness and investigate the sensitivity to occlusion and misalignment. Finally, Section 5 summarizes the major findings of this work.

Background and Notation
This section first reviews the notations and some basic multilinear operations that are necessary for defining TKPCA. Then, a multilinear projection is introduced for dimensionality reduction and feature extraction from tensor objects. The last part provides the conceptual foundations of kernel methods.

Notations and Basic Multilinear Algebra.
Following the notation conventions in the multilinear algebra, pattern recognition, and adaptive learning literature [26][27][28][29], vectors are denoted by lowercase boldface letters, for example, $\mathbf{x}$; matrices by uppercase boldface, for example, $\mathbf{X}$; and tensors by calligraphic letters, for example, $\mathcal{X}$. Their elements are denoted with indices in parentheses. Indices are denoted by lowercase letters, spanning the range from 1 to the uppercase letter of the index, for example, $n = 1, 2, 3, \ldots, N$. In addressing part of a vector/matrix/tensor, ":" denotes the full range of the respective index, and $n_1 : n_2$ denotes indices ranging from $n_1$ to $n_2$. In this paper, only real-valued data are considered. Table 2 summarizes the important symbols used in this paper for quick reference. An $N$th-order tensor is denoted as $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, which is addressed by $N$ indices $i_n$, $n = 1, \ldots, N$, with each $i_n$ addressing the $n$-mode of $\mathcal{A}$.

Table 2: Important symbols.
$\mathbf{A}_{(n)}$: The mode-$n$ matricization of $\mathcal{A}$
$\mathbf{X}_{m(n)}$: The mode-$n$ matricization of the $m$th tensor sample
$\mathrm{vec}(\mathcal{A})$: The vectorization of $\mathcal{A}$
$\mathbf{U}^{(n)}$: The $n$th projection matrix
$\mathbf{U}^{(n)}_{\mathcal{X}}$: The orthonormal base matrix of $\mathbf{X}_{(n)}$
$\mathbf{K}$: The Gram or Kernel Matrix
$M$: The number of training samples
$N$: The order of a tensor object, the number of indices/modes
$P$: The $n$-mode dimensionality in the projected space or the number of dominant eigenvectors
$Q$: The number of most discriminative components of $\mathbf{y}$
$\phi(\cdot)$: The map from vector to RKHS
$\boldsymbol{\Phi}(\cdot)$: The map from tensor to HSF

[Figure 2: 1-mode (column mode) matricization of a third-order tensor; panel labels: 1-mode vectors, 1-mode matricization.]
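The $n$-mode vectors of $\mathcal{A}$ are defined as the $I_n$-dimensional vectors obtained from $\mathcal{A}$ by varying its index $i_n$ while keeping all the other indices fixed. The mode-$n$ matricization of $\mathcal{A}$ is denoted as $\mathbf{A}_{(n)} \in \mathbb{R}^{I_n \times (I_1 \times \cdots \times I_{n-1} \times I_{n+1} \times \cdots \times I_N)}$, where the column vectors of $\mathbf{A}_{(n)}$ are the $n$-mode vectors of $\mathcal{A}$. Figure 2 illustrates the 1-mode (column mode) matricization of a third-order tensor. The $n$-mode product of a tensor $\mathcal{A}$ by a matrix $\mathbf{U} \in \mathbb{R}^{J_n \times I_n}$, denoted by $\mathcal{A} \times_n \mathbf{U}$, is a tensor defined with entries:
$$(\mathcal{A} \times_n \mathbf{U})(i_1, \ldots, i_{n-1}, j_n, i_{n+1}, \ldots, i_N) = \sum_{i_n} \mathcal{A}(i_1, \ldots, i_N)\, \mathbf{U}(j_n, i_n). \tag{1}$$
The two most commonly used tensor decompositions are Tucker and CANDECOMP/PARAFAC (CP). Both can be regarded as higher-order generalizations of the matrix Singular Value Decomposition (SVD). Let $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ denote an $N$th-order tensor; then, the Tucker decomposition is defined as follows:
$$\mathcal{A} = \mathcal{S} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \cdots \times_N \mathbf{U}^{(N)}, \tag{2}$$
where $\mathcal{S} \in \mathbb{R}^{P_1 \times P_2 \times \cdots \times P_N}$, with $P_n < I_n$, denotes the core tensor and $\mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times P_n}$. When all $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$ are orthonormal and the core tensor is all-orthogonal, this model is called the High Order Singular Value Decomposition (HOSVD) [30]; see Figure 3. When all factor matrices have the same number of components and the core tensor is superdiagonal, the Tucker model simplifies to the CP model. In general, the CP model is considered to be a multilinear low rank approximation, while the Tucker model is regarded as a multilinear subspace approximation.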
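The matricization and the $n$-mode product can be illustrated with a short NumPy sketch. The function names `unfold` and `mode_product` are hypothetical helpers introduced here, modes are 0-indexed, and the column ordering of the unfolding is one of several conventions in use; the HOSVD at the end uses full-rank factors, so the reconstruction is exact.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """n-mode product A x_n U for a matrix U of shape (J_n, I_n)."""
    moved = np.moveaxis(tensor, mode, 0)                # shape (I_n, ...)
    result = np.tensordot(matrix, moved, axes=(1, 0))   # shape (J_n, ...)
    return np.moveaxis(result, 0, mode)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 6))
U = rng.standard_normal((3, 5))
B = mode_product(A, U, mode=1)
print(B.shape)  # (4, 3, 6)

# Full-rank HOSVD: factor matrices are the left singular vectors of each unfolding.
factors = [np.linalg.svd(unfold(A, n), full_matrices=False)[0] for n in range(A.ndim)]
core = A
for n, Un in enumerate(factors):
    core = mode_product(core, Un.T, n)       # S = A x_1 U1^T x_2 U2^T x_3 U3^T
recon = core
for n, Un in enumerate(factors):
    recon = mode_product(recon, Un, n)       # A = S x_1 U1 x_2 U2 x_3 U3
```

The identity $(\mathcal{A} \times_n \mathbf{U})_{(n)} = \mathbf{U}\mathbf{A}_{(n)}$ holds for this unfolding, which is a convenient sanity check for any implementation.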
The distance between tensors $\mathcal{A}$ and $\mathcal{B}$ can be measured by the Frobenius norm [31], $\mathrm{dist}(\mathcal{A}, \mathcal{B}) = \|\mathcal{A} - \mathcal{B}\|_F$. Although this is a tensor-based measure, it is equivalent to a distance measure on the corresponding vector representations. Let $\mathrm{vec}(\mathcal{A})$ be the vector representation (vectorization) of $\mathcal{A}$; then, $\mathrm{dist}(\mathcal{A}, \mathcal{B}) = \|\mathrm{vec}(\mathcal{A}) - \mathrm{vec}(\mathcal{B})\|_2$. This implies that the distance between two tensors equals the Euclidean distance between their vectorized representations.
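This equivalence is easy to check numerically. The following is a minimal sketch with a hypothetical helper `tensor_distance`; it only illustrates that the Frobenius distance ignores the tensor structure.

```python
import numpy as np

def tensor_distance(a, b):
    """Frobenius-norm distance between two equally shaped tensors."""
    return np.sqrt(np.sum((a - b) ** 2))

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((3, 4, 5))

# Euclidean distance between the vectorized tensors.
vec_distance = np.linalg.norm(A.ravel() - B.ravel())
print(np.isclose(tensor_distance(A, B), vec_distance))
```

Because the two values coincide, a kernel built only on this distance cannot exploit the multilinear structure, which motivates the subspace-based kernel developed later in the paper.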

Multilinear Principal Component Analysis.
An $N$th-order tensor $\mathcal{X}$ resides in the tensor (multilinear) space $\mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2} \otimes \cdots \otimes \mathbb{R}^{I_N}$, where $\mathbb{R}^{I_1}, \mathbb{R}^{I_2}, \ldots, \mathbb{R}^{I_N}$ are the $N$ vector (linear) spaces. For typical image and video tensor objects, although the corresponding tensor space is of high dimensionality, tensor objects typically are embedded in a lower dimensional tensor subspace (or manifold), in analogy to the (vectorized) face image embedding problem where vector image inputs reside in a low-dimensional subspace of the original input space [15]. Thus, it is possible to find a tensor subspace that captures most of the variation in the input tensor objects, and it can be used to extract features for recognition and classification applications. To achieve this objective, let us assume that a set of $M$ tensor objects $\{\mathcal{X}_m\}_{m=1}^{M}$ is available for training. Each tensor object $\mathcal{X}_m \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, where $I_n$ is the $n$-mode dimension of the tensor. The objective of Multilinear Principal Component Analysis of Tensors (MPCA) [19] is to find a multilinear transformation $\{\mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times P_n}, n = 1, \ldots, N\}$ that captures most of the variations observed in the original tensor objects, assuming that these variations are measured by the total tensor scatter $\sum_{m=1}^{M} \|\mathcal{X}_m - \bar{\mathcal{X}}\|_F^2$, where $\bar{\mathcal{X}}$ is the empirical mean. In other words, the MPCA objective is the determination of the $N$ projection matrices that maximize the total tensor scatter of the projected tensor objects.
However, there is no known optimal solution that allows for the simultaneous optimization of the $N$ projection matrices. Instead of global optimization, [19] proposes a suboptimal iterative solution.

Kernel Methods.
Kernel methods [25] have gained considerable popularity during the last few decades, providing attractive solutions to a variety of problems. The strategy adopted is to embed the data into a space where the patterns can be discovered as linear relations. This is done in a modular fashion: the first module performs a nonlinear mapping into an RKHS or feature space implicitly through a kernel function, and the second module is a specific learning algorithm in dual form designed to discover linear relations in the feature space. The basic assumption is that the obtained feature space reflects the nonlinear structure of the input data. Hence, the only information required is the similarity measure in the feature space, which means that the nonlinear mapping function does not have to be known explicitly. Instead, the similarity measure of two data points in the feature space, that is, an inner product, should be appropriately defined by a reproducing kernel formulated in the input space; this is called the kernel trick.
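The main ingredients of kernel methods are elucidated through kernel PCA. Consider a set of centered observations $\{\mathbf{x}_m\}_{m=1}^{M} \in \mathbb{R}^{I}$, independent and identically distributed (i.i.d.) according to the generator $p(\mathbf{x})$. PCA optimally chooses a subspace that captures most of the variance of the data. The first principal component is defined as $y_m = \mathbf{w}^T \mathbf{x}_m$, where the weight $\mathbf{w}$ can be estimated as the leading eigenvector of the sample covariance matrix $\mathbf{C} = (1/M) \sum_{m=1}^{M} \mathbf{x}_m \mathbf{x}_m^T$, satisfying $\lambda \mathbf{w} = \mathbf{C}\mathbf{w}$, which implies that $\mathbf{w}$ can also be expressed as a linear combination of the training samples, that is, $\mathbf{w} = \sum_{m=1}^{M} \alpha_m \mathbf{x}_m$. Thus, the dual representation of PCA is $M\lambda\boldsymbol{\alpha} = \mathbf{K}\boldsymbol{\alpha}$ with $\mathbf{K}(m, l) = \langle \mathbf{x}_m, \mathbf{x}_l \rangle$, referred to as the Kernel Matrix, which consists of the inner products between all pairs of training samples. After $\boldsymbol{\alpha}$ is estimated by diagonalizing $\mathbf{K}$, the first principal component of a test sample $\mathbf{x}_t$ is obtained by $y_t = \mathbf{w}^T \mathbf{x}_t = \sum_{m=1}^{M} \alpha_m \langle \mathbf{x}_m, \mathbf{x}_t \rangle$. Note that, for the dual representation of PCA, all information from the training samples is given by the Kernel Matrix $\mathbf{K}$. This matrix acts as an information bottleneck, as it carries all the information available to a kernel algorithm.
Let us consider an embedding or map $\phi : \mathbb{R}^{I} \to \mathcal{F}$, where $\mathcal{F}$ refers to a feature space that could have an arbitrarily large dimensionality. The pairwise inner products in feature space can be computed efficiently and directly from the original data items using a kernel function $k(\mathbf{x}_m, \mathbf{x}_l) = \langle \phi(\mathbf{x}_m), \phi(\mathbf{x}_l) \rangle$. Hence, the Kernel Matrix $\mathbf{K}$ can be computed without explicit knowledge of $\phi(\cdot)$. Finally, the first principal component of a test sample embedded into feature space, $\phi(\mathbf{x}_t)$, is computed by
$$y_t = \mathbf{w}^T \phi(\mathbf{x}_t) = \sum_{m=1}^{M} \alpha_m\, k(\mathbf{x}_m, \mathbf{x}_t). \tag{3}$$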
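The dual formulation above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the Gaussian kernel, the explicit centering of the Kernel Matrix in feature space, and the helper names `gaussian_gram`, `center_gram`, and `kernel_pca` are assumptions made for the example.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Kernel matrix k(x_m, x_l) = exp(-||x_m - x_l||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def center_gram(K):
    """Center the mapped samples in feature space (assumed preprocessing)."""
    M = K.shape[0]
    ones = np.full((M, M), 1.0 / M)
    return K - ones @ K - K @ ones + ones @ K @ ones

def kernel_pca(K, n_components):
    """Return dual coefficients and eigenvalues of the centered Kernel Matrix."""
    eigvals, eigvecs = np.linalg.eigh(center_gram(K))
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Rescale so that each feature-space direction w has unit norm.
    return eigvecs / np.sqrt(eigvals), eigvals

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
K = gaussian_gram(X, sigma=2.0)
alpha, eigvals = kernel_pca(K, n_components=2)
Y = center_gram(K) @ alpha      # projections of the training samples
print(Y.shape)                  # (30, 2)
```

Only the Kernel Matrix enters `kernel_pca`, so the same routine can be reused unchanged with any other positive definite kernel.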

Kernel Principal Component Analysis of Tensor Objects
In this section, we propose a novel unsupervised learning method, called Tensorial Kernel Principal Component Analysis (TKPCA), for nonlinear dimensionality reduction and feature extraction from tensor objects. Unlike in conventional PCA, there is no closed-form formula for the Covariance Matrix of tensorial data. Therefore, our approach begins by formulating TKPCA as an optimization problem. Then, we develop a kernel function that can directly take tensorial data as parameters rather than vectorial ones. Moreover, the detailed algorithm is summarized and discussed. A TKPCA-based tensor object recognition scheme is also proposed for action recognition.

TKPCA as an Optimization Problem.
As we have seen in Section 2.3, KPCA is classically derived by constructing the Covariance Matrix explicitly. However, in statistics and probability theory, the Covariance Matrix is a matrix of covariances between the elements of a random vector. This means that, before tensors can be fed into a Covariance Matrix, they first have to be transformed into vectors, which conflicts with the purpose of this paper. To resolve this difficulty, we derive TKPCA as an optimization problem. In this way, the explicit construction of the Covariance Matrix is bypassed. Note that there are a number of other ways to derive PCA [14], and the Generalized Covariance Matrix (GCM) [32] concept also provides an alternative solution from another perspective.
Given a set of centered tensorial observations $\{\mathcal{X}_m\}_{m=1}^{M} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, distributed according to the generator $p(\mathcal{X})$, and a nonlinear mapping $\boldsymbol{\Phi} : \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N} \to \mathcal{F}$, where $\mathcal{F}$ refers to a space of multilinear functions corresponding to infinite dimensional tensors, which could have an arbitrarily large dimensionality (see Section 3.2), the objective is to optimally choose a subspace that captures most of the variance of the tensorial samples. The starting point is to define the projection onto the weight $\mathbf{w}$ as
$$y_m = \mathbf{w}^T \boldsymbol{\Phi}(\mathcal{X}_m). \tag{4}$$
Recall that, while least squares support vector machine (LS-SVM) classifiers have a natural link with kernel Fisher discriminant analysis (minimizing the within-class scatter around targets +1 and −1), for TKPCA we can take the interpretation of a one-class modeling problem with zero target value around which one maximizes the variance. Let us now reformulate the TKPCA problem as follows:
$$\max_{\mathbf{w}} \sum_{m=1}^{M} \left[0 - \mathbf{w}^T \boldsymbol{\Phi}(\mathcal{X}_m)\right]^2, \tag{5}$$
where zero is considered as a single target value. For kernel Fisher discriminant analysis one aims at minimizing the within-class scatter around the targets, while for TKPCA one is interested in finding the direction(s) for which the variance is maximal. This interpretation leads to the following primal optimization problem:
$$\max_{\mathbf{w}, y} J(\mathbf{w}, y) = \gamma \frac{1}{2} \sum_{m=1}^{M} y_m^2 - \frac{1}{2} \mathbf{w}^T \mathbf{w} \quad \text{such that} \quad y_m = \mathbf{w}^T \boldsymbol{\Phi}(\mathcal{X}_m), \; m = 1, \ldots, M, \tag{6}$$
where $\gamma \in \mathbb{R}^+$. Equation (6) maximizes the empirical variance of $y_m$ around the value 0 while keeping the norm of the corresponding parameter $\mathbf{w}$ small through the regularization term $-(1/2)\mathbf{w}^T\mathbf{w}$. One can also include a bias term; see [33].
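The Lagrangian corresponding to (6) is
$$\mathcal{L}(\mathbf{w}, y; \boldsymbol{\alpha}) = \gamma \frac{1}{2} \sum_{m=1}^{M} y_m^2 - \frac{1}{2} \mathbf{w}^T \mathbf{w} - \sum_{m=1}^{M} \alpha_m \left(y_m - \mathbf{w}^T \boldsymbol{\Phi}(\mathcal{X}_m)\right), \tag{7}$$
with conditions for optimality given by
$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 0 \Rightarrow \mathbf{w} = \sum_{m=1}^{M} \alpha_m \boldsymbol{\Phi}(\mathcal{X}_m), \qquad \frac{\partial \mathcal{L}}{\partial y_m} = 0 \Rightarrow \alpha_m = \gamma y_m, \qquad \frac{\partial \mathcal{L}}{\partial \alpha_m} = 0 \Rightarrow y_m - \mathbf{w}^T \boldsymbol{\Phi}(\mathcal{X}_m) = 0. \tag{8}$$
By eliminating the primal variables $\mathbf{w}$ and $y$, one obtains $(1/\gamma)\alpha_m - \sum_{l=1}^{M} \alpha_l \boldsymbol{\Phi}(\mathcal{X}_l)^T \boldsymbol{\Phi}(\mathcal{X}_m) = 0$, for $m = 1, \ldots, M$. This is an eigenvalue decomposition that can be presented in matrix form as
$$\mathbf{K}\boldsymbol{\alpha} = \lambda \boldsymbol{\alpha}, \tag{9}$$
where $\lambda = 1/\gamma$, $\boldsymbol{\alpha} \equiv [\alpha_1, \ldots, \alpha_M]^T$, and $\mathbf{K}$ is the centered Kernel Matrix defined entry-wise by
$$\mathbf{K}(m, l) = \boldsymbol{\Phi}(\mathcal{X}_m)^T \boldsymbol{\Phi}(\mathcal{X}_l) = k(\mathcal{X}_m, \mathcal{X}_l), \tag{10}$$
where $\mathcal{X}_m, \mathcal{X}_l \in \{\mathcal{X}_m\}_{m=1}^{M}$. The optimal solution to the formulated problem is obtained by selecting the eigenvectors corresponding to the $P$ largest eigenvalues, where $P$ is a slight abuse of notation that nevertheless simplifies the description. For a test sample $\mathcal{X}_t$, the first projection becomes
$$y_t = \mathbf{w}^T \boldsymbol{\Phi}(\mathcal{X}_t) = \sum_{m=1}^{M} \alpha_m\, k(\mathcal{X}_m, \mathcal{X}_t), \tag{11}$$
where $\boldsymbol{\alpha} \equiv [\alpha_1, \ldots, \alpha_M]^T$ is the first eigenvector of the Kernel Matrix in (9). For computing the kernel functions in (10) and (11), we present a tensorial kernel function in the next section, which can directly take tensorial data as parameters rather than vectorial ones.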
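The elimination of the primal variables can be spelled out step by step. The derivation below is a sketch under assumed notation: $y_m$ denotes the projection of the $m$th mapped sample, $\alpha_m$ the Lagrange multipliers, $\gamma$ the regularization constant, and $k(\cdot, \cdot)$ the tensorial kernel; these symbols are chosen for illustration.

```latex
\begin{aligned}
\mathbf{w} &= \sum_{l=1}^{M} \alpha_l\, \boldsymbol{\Phi}(\mathcal{X}_l), \qquad
y_m = \frac{\alpha_m}{\gamma}, \qquad
y_m = \mathbf{w}^{T} \boldsymbol{\Phi}(\mathcal{X}_m) \\
\Longrightarrow \quad \frac{\alpha_m}{\gamma}
&= \sum_{l=1}^{M} \alpha_l\, \boldsymbol{\Phi}(\mathcal{X}_l)^{T} \boldsymbol{\Phi}(\mathcal{X}_m)
 = \sum_{l=1}^{M} k(\mathcal{X}_m, \mathcal{X}_l)\, \alpha_l,
 \qquad m = 1, \ldots, M \\
\Longrightarrow \quad \mathbf{K} \boldsymbol{\alpha} &= \lambda\, \boldsymbol{\alpha},
 \qquad \lambda = \frac{1}{\gamma}.
\end{aligned}
```

Since $\mathbf{w}^T\mathbf{w} = \boldsymbol{\alpha}^T \mathbf{K} \boldsymbol{\alpha} = \lambda \|\boldsymbol{\alpha}\|^2$, rescaling a unit-norm eigenvector by $1/\sqrt{\lambda}$ yields a weight of unit norm in feature space, which is the usual normalization in kernel PCA.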

RKHS Induced by Multilinear Functions.
A kernel should be constructed from the input space in such a way that the high-dimensional feature space implied by the kernel function reflects the underlying structure of the data in the original input space. Although a number of kernels have been designed for tensorial objects, few approaches exploit the underlying structure of the tensorial space. Recently, Signoretto et al. [34] generalized the RKHS to multilinear functions, which allows a reproducing kernel to exploit the algebraic geometry of the tensorial space. In principle, the idea of Signoretto et al. is to propose a tensorial kernel that can directly take tensorial data as parameters rather than vectorial ones. The tensorial kernel is then plugged into the primal-dual framework to learn the structural information embodied in the tensors.
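A bounded (continuous) multilinear function on RKHSs, denoted by $\psi : \mathcal{H}_1 \times \mathcal{H}_2 \times \cdots \times \mathcal{H}_N \to \mathbb{R}$, is said to be Hilbert-Schmidt if it satisfies some constraints. The ensemble of such well-behaved Hilbert-Schmidt functions, equipped with the inner product $\langle \psi, \psi' \rangle_{\mathrm{HSF}}$, forms a Hilbert space denoted by HSF, which is a space of multilinear functions corresponding to infinite dimensional tensors. Using boldface $\boldsymbol{\Phi}$ to denote the map from tensor objects to the multilinear function space, we have $\boldsymbol{\Phi} : \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N} \to \mathrm{HSF}$ and define $\boldsymbol{\Phi}$: [...] (12). According to the theory of [34], for $N$th-order tensors $\mathcal{X}_m$, $\mathcal{X}_l$, a kernel function exploiting the structural properties possessed by the given tensorial representations can be stated as a product kernel:
$$k(\mathcal{X}_m, \mathcal{X}_l) = \prod_{n=1}^{N} k^{(n)}\left(\mathbf{X}_{m(n)}, \mathbf{X}_{l(n)}\right), \tag{13}$$
where $k(\mathcal{X}_m, \mathcal{X}_l)$ denotes the tensorial kernel, $k^{(n)}(\mathbf{X}_{m(n)}, \mathbf{X}_{l(n)})$ is the $n$th factor kernel of the tensorial kernel, and $\mathbf{X}_{m(n)}$ and $\mathbf{X}_{l(n)}$ are the mode-$n$ matricizations of $\mathcal{X}_m$ and $\mathcal{X}_l$, respectively. Equation (13) implies that the similarity measure induced by the kernel function between two tensor objects can be represented as a product of factor kernels, each of which measures the similarity between the mode-$n$ matricizations of the two tensors.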
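The product structure of the tensorial kernel can be sketched as follows. The factor kernel used here is a plain Gaussian kernel on the matricizations, chosen only as a stand-in so that the example is self-contained; it is not the Grassmann-based factor kernel of this paper, and the helper names and the value of `sigma` are assumptions.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization (0-indexed modes)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def rbf_factor_kernel(A, B, sigma=10.0):
    """Stand-in factor kernel: Gaussian kernel on two matricizations."""
    return np.exp(-np.sum((A - B) ** 2) / (2.0 * sigma**2))

def tensorial_kernel(X, Y, factor_kernel=rbf_factor_kernel):
    """Product over all modes of factor kernels on the mode-n matricizations."""
    value = 1.0
    for mode in range(X.ndim):
        value *= factor_kernel(unfold(X, mode), unfold(Y, mode))
    return value

def gram_matrix(samples, kernel=tensorial_kernel):
    """Kernel Matrix over a list of equally shaped tensor samples."""
    M = len(samples)
    K = np.empty((M, M))
    for m in range(M):
        for l in range(m, M):
            K[m, l] = K[l, m] = kernel(samples[m], samples[l])
    return K

rng = np.random.default_rng(2)
samples = [rng.standard_normal((4, 5, 6)) for _ in range(5)]
K = gram_matrix(samples)
print(K.shape)  # (5, 5)
```

Because a product of positive definite kernels is again positive definite, any valid factor kernel can be passed as `factor_kernel` without changing the rest of the pipeline.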

Factor Kernel on Grassmann Manifold.
The factor kernel represents a similarity measure between two matrices obtained by the mode-$n$ matricization of two tensors. In [34], Signoretto et al. adopt the Chordal distance as the metric to measure this similarity, which leads to an ad hoc approach to obtaining the tensorial kernel. This inconsistency can cause complications and weak guarantees. In our approach, the factor kernel is built from the Grassmann kernel by a number of simple operations, resulting in a simpler and better-understood formulation. Note that our factor kernel differs from the result of Signoretto et al.
The linear subspaces of a fixed dimension form a non-Euclidean, curved Riemannian manifold known as the Grassmann manifold, which allows the subspaces to be represented as points on it. In TKPCA, such low dimensional subspaces are used to approximate the mode-$n$ matricizations of tensors. The benefits of using subspaces are twofold: (a) comparing two subspaces is cheaper than comparing two mode-$n$ matricizations directly when they are very large, for example, when a video has many frames, and (b) it is more robust to noise, since the subspace can "fill in" missing pictures.
Given an $I_n \times (I_1 \times \cdots \times I_{n-1} \times I_{n+1} \times \cdots \times I_N)$ mode-$n$ matricization $\mathbf{X}_{(n)}$ of rank $r$, we can represent it as a subspace (and hence as a point on a Grassmann manifold) through any orthogonalisation procedure such as the SVD. More specifically, let $\mathbf{X}_{(n)} = \mathbf{U}^{(n)}_{\mathcal{X}} \mathbf{D}^{(n)}_{\mathcal{X}} \mathbf{V}^{(n)T}_{\mathcal{X}}$, where the $I_n \times r$ orthonormal matrix $\mathbf{U}^{(n)}_{\mathcal{X}}$ represents an optimised subspace of order $r$ (in the mean square sense) for $\mathbf{X}_{(n)}$ and can be seen as a point on the Grassmann manifold $\mathcal{G}(r, I_n)$, which is the set of $r$-dimensional linear subspaces of $\mathbb{R}^{I_n}$. The Riemannian distance between two subspaces is the length of the shortest geodesic connecting the two points on the Grassmann manifold. Among the many different distances, only a few induce a positive definite kernel, and the Projection metric is one of them.
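The Projection metric can be understood by associating a point $\mathrm{span}(\mathbf{U}^{(n)}_{\mathcal{X}}) \in \mathcal{G}(r, I_n)$ with its projection matrix $\mathbf{U}^{(n)}_{\mathcal{X}} \mathbf{U}^{(n)T}_{\mathcal{X}}$ by an embedding:
$$\Pi : \mathcal{G}(r, I_n) \to \mathbb{R}^{I_n \times I_n}, \qquad \mathrm{span}(\mathbf{U}^{(n)}_{\mathcal{X}}) \mapsto \mathbf{U}^{(n)}_{\mathcal{X}} \mathbf{U}^{(n)T}_{\mathcal{X}}. \tag{14}$$
The image $\Pi(\mathcal{G}(r, I_n))$ is the set of rank-$r$ orthogonal projection matrices. This map is in fact an isometric embedding [35], and the projection metric is simply a Euclidean distance in $\mathbb{R}^{I_n \times I_n}$. The corresponding inner product of the space is
$$\mathrm{tr}\left[\left(\mathbf{U}^{(n)}_{\mathcal{X}_m} \mathbf{U}^{(n)T}_{\mathcal{X}_m}\right)\left(\mathbf{U}^{(n)}_{\mathcal{X}_l} \mathbf{U}^{(n)T}_{\mathcal{X}_l}\right)\right] = \left\|\mathbf{U}^{(n)T}_{\mathcal{X}_m} \mathbf{U}^{(n)}_{\mathcal{X}_l}\right\|_F^2, \tag{15}$$
and therefore, the projection kernel is a Grassmann kernel. Motivated by the classic Gaussian kernel $k(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{1}{2\sigma^2}\|\mathbf{x} - \mathbf{y}\|_2^2\right)$, we propose a novel factor kernel based on the projection kernel (15); by Theorem 1, it is provably positive definite, as required by Mercer's Theorem [36].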

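Theorem 1. Let the adjustable parameter $\sigma \in \mathbb{R}^+$; the function that exploits the metric on Grassmann manifolds,
$$k^{(n)}\left(\mathbf{X}_{m(n)}, \mathbf{X}_{l(n)}\right) = \exp\left(-\frac{1}{2\sigma^2} \left\|\mathbf{U}^{(n)}_{\mathcal{X}_m} \mathbf{U}^{(n)T}_{\mathcal{X}_m} - \mathbf{U}^{(n)}_{\mathcal{X}_l} \mathbf{U}^{(n)T}_{\mathcal{X}_l}\right\|_F^2\right), \tag{16}$$
is a positive definite kernel function.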
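Proof. We first verify that the Projection kernel (15) is a positive definite kernel function.
The positive definiteness of the Projection kernel follows from the properties of the Frobenius norm. For all $\mathbf{U}^{(n)}_{\mathcal{X}_m}, \mathbf{U}^{(n)}_{\mathcal{X}_l} \in \mathcal{G}$ and $c_m, c_l \in \mathbb{R}$, we have
$$\sum_{m,l} c_m c_l \left\|\mathbf{U}^{(n)T}_{\mathcal{X}_m} \mathbf{U}^{(n)}_{\mathcal{X}_l}\right\|_F^2 = \sum_{m,l} c_m c_l\, \mathrm{tr}\left(\mathbf{U}^{(n)}_{\mathcal{X}_m} \mathbf{U}^{(n)T}_{\mathcal{X}_m} \mathbf{U}^{(n)}_{\mathcal{X}_l} \mathbf{U}^{(n)T}_{\mathcal{X}_l}\right) = \left\|\sum_{m} c_m \mathbf{U}^{(n)}_{\mathcal{X}_m} \mathbf{U}^{(n)T}_{\mathcal{X}_m}\right\|_F^2 \geq 0. \tag{17}$$
Next, we use the Projection kernel as a foundation to build the more complex factor kernel.
The exponential function can be approximated arbitrarily closely by polynomials with positive coefficients and hence is a limit of kernels. Since the positive definiteness property is closed under taking pointwise limits,
$$k\left(\mathbf{X}_{m(n)}, \mathbf{X}_{l(n)}\right) = \exp\left(\frac{1}{\sigma^2} \left\|\mathbf{U}^{(n)T}_{\mathcal{X}_m} \mathbf{U}^{(n)}_{\mathcal{X}_l}\right\|_F^2\right) \tag{18}$$
is a positive definite kernel function for $\sigma \in \mathbb{R}^+$. Assuming kernel (18) corresponds to a feature map $\phi(\cdot)$, normalising this kernel corresponds to the feature map
$$\mathbf{X}_{(n)} \mapsto \frac{\phi(\mathbf{X}_{(n)})}{\|\phi(\mathbf{X}_{(n)})\|}. \tag{19}$$
Hence, we can express the normalised kernel $\tilde{k}$ in terms of $k(\mathbf{X}_{m(n)}, \mathbf{X}_{l(n)})$ as follows:
$$\tilde{k}\left(\mathbf{X}_{m(n)}, \mathbf{X}_{l(n)}\right) = \left\langle \frac{\phi(\mathbf{X}_{m(n)})}{\|\phi(\mathbf{X}_{m(n)})\|}, \frac{\phi(\mathbf{X}_{l(n)})}{\|\phi(\mathbf{X}_{l(n)})\|} \right\rangle = \frac{k\left(\mathbf{X}_{m(n)}, \mathbf{X}_{l(n)}\right)}{\sqrt{k\left(\mathbf{X}_{m(n)}, \mathbf{X}_{m(n)}\right)\, k\left(\mathbf{X}_{l(n)}, \mathbf{X}_{l(n)}\right)}}, \tag{20}$$
where $\tilde{k}$ is a valid kernel because it is derived from the feature map (19).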
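The construction used in the proof (projection kernel, exponential, normalisation) can be sketched numerically. This is an illustrative sketch under assumptions: the subspace is taken from the leading left singular vectors, the exponential is scaled by $1/\sigma^2$, and the helper names, `rank`, and `sigma` values are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization (0-indexed modes)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def subspace_basis(matrix, rank):
    """Leading left singular vectors: a point on the Grassmann manifold."""
    U, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :rank]

def projection_kernel(U1, U2):
    """Projection kernel ||U1^T U2||_F^2 between two orthonormal bases."""
    return np.linalg.norm(U1.T @ U2, "fro") ** 2

def grassmann_factor_kernel(A, B, rank, sigma):
    """Normalised exponential of the projection kernel (assumed 1/sigma^2 scaling)."""
    Ua, Ub = subspace_basis(A, rank), subspace_basis(B, rank)
    k_ab = np.exp(projection_kernel(Ua, Ub) / sigma**2)
    k_aa = np.exp(projection_kernel(Ua, Ua) / sigma**2)
    k_bb = np.exp(projection_kernel(Ub, Ub) / sigma**2)
    return k_ab / np.sqrt(k_aa * k_bb)

rng = np.random.default_rng(3)
tensors = [rng.standard_normal((6, 5, 4)) for _ in range(4)]
mats = [unfold(t, 0) for t in tensors]   # 6 x 20 matricizations
G = np.array([[grassmann_factor_kernel(a, b, rank=2, sigma=1.0) for b in mats]
              for a in mats])

# Closed form of the normalised kernel via the distance between projection matrices.
Pa = subspace_basis(mats[0], 2) @ subspace_basis(mats[0], 2).T
Pb = subspace_basis(mats[1], 2) @ subspace_basis(mats[1], 2).T
closed_form = np.exp(-np.linalg.norm(Pa - Pb, "fro") ** 2 / 2.0)
print(np.isclose(G[0, 1], closed_form))
```

With this scaling, the normalised kernel equals $\exp\left(-\frac{1}{2\sigma^2}\|\mathbf{U}_a\mathbf{U}_a^T - \mathbf{U}_b\mathbf{U}_b^T\|_F^2\right)$, because $\|\mathbf{U}_a\mathbf{U}_a^T - \mathbf{U}_b\mathbf{U}_b^T\|_F^2 = 2r - 2\|\mathbf{U}_a^T\mathbf{U}_b\|_F^2$; this mirrors the way the Gaussian kernel arises from normalising the exponential of an inner product.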
training samples per class is small, the situation considered in this paper, experimental analysis indicates that PCA outperforms LDA [41,42].

TKPCA-Based Tensor Object Recognition.
After projection by TKPCA, a new feature vector is obtained for each tensor object. The classification tasks on tensor objects are thus reduced to classification tasks in vector spaces. More precisely, for any query tensor object $\mathcal{X}$, the projection $\mathbf{y} \in \mathbb{R}^{P}$ onto the $P$ most dominant eigenvectors is obtained. Similarly, the gallery set containing data samples with labels is also represented by vectors. Then, any classification method can be employed to label the query. However, as we have seen above, TKPCA maximizes not only the within-class variation but also the between-class variation. This is because TKPCA works as an unsupervised technique that does not consider the class labels. To overcome this limitation, a feature selection strategy is proposed that selects eigenvectors for a more discriminative subspace. The strategy is based on the maximization of the following ratio [43]:
$$\Gamma_p = \frac{\sum_{c=1}^{C} N_c \left(\bar{\mathbf{y}}_c(p) - \bar{\mathbf{y}}(p)\right)^2}{\sum_{m=1}^{M} \left(\mathbf{y}_m(p) - \bar{\mathbf{y}}_{c_m}(p)\right)^2}, \tag{21}$$
where $C$ is the number of classes, $M$ is the number of samples in the gallery set, $N_c$ is the number of samples for class $c$, and $c_m$ is the class label for the $m$th gallery sample $\mathcal{X}_m$. $\mathbf{y}_m$ is the feature vector of $\mathcal{X}_m$ in the projected nonlinear subspace. The mean feature vector is $\bar{\mathbf{y}} = (1/M) \sum_{m} \mathbf{y}_m$, and the class mean feature vector is $\bar{\mathbf{y}}_c = (1/N_c) \sum_{m, c_m = c} \mathbf{y}_m$. For the eigenvector selection, only the first $Q$ most discriminative components of $\mathbf{y}_m$ are kept for classification, with $Q$ determined empirically or by cross-validation. Upon extraction of the proper set of features, a classifier such as a Nearest Neighbor Classifier, Bayesian Classifier, Neural Network, or Support Vector Machine can be applied to recognize the objects. Here, we use a Nearest Neighbor Classifier for classification. The distance between two arbitrary feature vectors is defined by
$$d(\mathbf{y}_1, \mathbf{y}_2) = \|\mathbf{y}_1 - \mathbf{y}_2\|_2, \tag{22}$$
where the norm denotes the Euclidean distance between the two feature vectors. Such a simple classifier is selected in order to study the performance contributed mainly by the TKPCA-based feature extraction algorithm, although better classifiers can be investigated.
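The selection and classification steps can be sketched as follows. This is a minimal illustration on synthetic feature vectors, assuming the ratio is evaluated independently for each component; the function names and the toy data are hypothetical.

```python
import numpy as np

def fisher_ratios(Y, labels):
    """Per-component ratio of between-class to within-class scatter."""
    overall_mean = Y.mean(axis=0)
    between = np.zeros(Y.shape[1])
    within = np.zeros(Y.shape[1])
    for c in np.unique(labels):
        Yc = Y[labels == c]
        class_mean = Yc.mean(axis=0)
        between += len(Yc) * (class_mean - overall_mean) ** 2
        within += ((Yc - class_mean) ** 2).sum(axis=0)
    return between / within

def select_components(Y, labels, Q):
    """Indices of the Q most discriminative components."""
    return np.argsort(fisher_ratios(Y, labels))[::-1][:Q]

def nearest_neighbor(query, gallery, gallery_labels):
    """Label of the gallery sample closest to the query in Euclidean distance."""
    distances = np.linalg.norm(gallery - query, axis=1)
    return gallery_labels[np.argmin(distances)]

rng = np.random.default_rng(4)
labels = np.repeat([0, 1], 10)
Y = rng.standard_normal((20, 4))
Y[:, 2] += 5.0 * labels                  # make component 2 discriminative

kept = select_components(Y, labels, Q=2)
gallery = Y[:, kept]
predicted = nearest_neighbor(gallery[3], gallery, labels)
print(kept[0], predicted)
```

In a real experiment the query would come from a held-out test set and be projected with the same eigenvectors before the selected components are extracted.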

Experiments
This section illustrates the efficacy of TKPCA in tensor object recognition by applying it to the emerging application of action recognition [44] and comparing its performance against state-of-the-art algorithms. We also assess the noise robustness of the proposed approach and investigate its sensitivity to occlusion and misalignment.
Action recognition is the process of labeling videos containing human motion with action labels. We test our method on two action datasets: the KTH human motion dataset [45] and the Ballet dataset [46]. The TKPCA-based tensor object recognition proposed in Section 3.4 treats each action video as a 3rd-order tensor sample, with the spatial row space, the column space, and the time space accounting for the 3 modes. The whole dataset is a 4th-order tensor, with the addition of the sample space.

KTH Dataset.
The KTH human motion dataset [45] contains six types of human actions (walking, jogging, running, boxing, hand waving, and hand clapping) performed several times by 25 subjects in four scenarios: outdoors, outdoors with scale variation, outdoors with different clothes, and indoors. See Figure 4 for sample frames. We first run an automatic preprocessing step to track and stabilise the video sequences so that all of the figures appear in the center of the field of view. All videos were resized to 32 × 32 × 32. In order to have a standard length of 32 frames per video, the middle 32 frames were used. As this paper focuses on kernelizing Principal Component Analysis for tensor objects, the conventional Kernel Principal Component Analysis (KPCA), using a Gaussian kernel on the vectorized tensor vec(X), should be compared with our new approach TKPCA first. However, for each action video, the dimensionality of the vectorized tensor is 32768 (32^3), which prevents KPCA from computing the principal components efficiently. Thus, a reduced-order model [47] is adopted in KPCA. The kernel parameter is optimized by cross-validation for both TKPCA and KPCA. The test samples are projected onto the feature subspace to obtain the discriminative features, as shown in Figure 5. Observe that TKPCA outperforms KPCA with respect to discriminative ability, and the six classes are well separated even in a two-dimensional space.
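As a small illustration of the temporal cropping and of the size of the vectorized representation, the following sketch picks the centred frames of a sequence and counts the entries of vec(X). The helper names are hypothetical, integer frame indices stand in for images, and spatial resizing is left out.

```python
def middle_frames(frames, length=32):
    """Return the `length` frames centred in the sequence."""
    if len(frames) < length:
        raise ValueError("sequence is shorter than the requested length")
    start = (len(frames) - length) // 2
    return frames[start:start + length]

def vectorized_size(shape):
    """Number of entries of vec(X) for a tensor of the given shape."""
    size = 1
    for extent in shape:
        size *= extent
    return size

clip = middle_frames(list(range(100)))  # frame indices stand in for images
print(clip[0], clip[-1], vectorized_size((32, 32, 32)))
```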
In order to test TKPCA's ability to capture the nonlinear structure of the input data, we conduct another experiment to compare TKPCA with multilinear PCA (MPCA) [19]. Figure 6 illustrates the confusion matrices of TKPCA and MPCA. The confusion matrix is a table layout that shows whether the recognizer confuses two classes: rows correspond to the ground truth, and columns correspond to the classification results. It can be seen that TKPCA achieves an average accuracy of 98%, while MPCA achieves 84%, and the confusion of TKPCA appears only among boxing, hand clapping, and hand waving, which is consistent with our intuition that these actions are easily confused. The superiority of TKPCA over MPCA indicates that the nonlinear structures of video volumes captured by the tensorial kernel significantly improve the discriminative performance.
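The row and column convention of the confusion matrix can be made concrete with a short sketch. The labels below are invented for illustration and are not the paper's results; taking the average accuracy as the mean of the diagonal is also an assumption about how the reported figure is computed.

```python
def confusion_matrix_percent(truth, predicted, classes):
    """Rows: ground truth. Columns: predicted class. Entries in percent."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(truth, predicted):
        counts[index[t]][index[p]] += 1
    percent = []
    for row in counts:
        total = sum(row)
        percent.append([100.0 * v / total if total else 0.0 for v in row])
    return percent

def average_accuracy(matrix):
    """Mean of the per-class recognition rates (the diagonal)."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)

classes = ["boxing", "clapping", "walking"]
truth = ["boxing"] * 4 + ["clapping"] * 2 + ["walking"] * 2
predicted = ["boxing", "boxing", "boxing", "clapping",
             "clapping", "clapping", "walking", "walking"]
matrix = confusion_matrix_percent(truth, predicted, classes)
print(matrix, average_accuracy(matrix))
```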
Next, the proposed TKPCA algorithm is compared against state-of-the-art action recognition algorithms. We compared TKPCA against spatial-temporal words (STW) [48] and the bag-of-words model in conjunction with multiple kernel learning (BoW-MKL) [49,50]. In STW, a video sequence is represented by a set of spatial-temporal words extracted from space-time interest points. The algorithm then utilises latent topic models, such as probabilistic latent semantic analysis [51], to learn the probability distributions of the spatial-temporal words. BoW-MKL exploits the global spatial-temporal distribution of interest points by extracting holistic features from clouds of interest points accumulated over multiple temporal scales. The extracted features are then fused using MKL. We also compared TKPCA against Tensor Canonical Correlation Analysis (TCCA) [11] and Discriminative Canonical Correlation Analysis (DCCA) [11]. TCCA is an extension of canonical correlation analysis (a principled tool to inspect linear relations between two sets of vectors) to tensor spaces and measures video-to-video similarity on tensors in a way similar to our method. DCCA implements a linear discriminant function that maximizes the canonical correlations of within-class sets and minimizes the canonical correlations of between-class sets. To facilitate comparison with prior work, we followed the leave-one-out (LOO) cross-validation protocol used in STW [48] and TCCA [11].
Looking at the results in Table 3, the first thing to note is that no algorithm is universally the best. In terms of top classification rates, STW, BoW-MKL, and our method are each best for two of the six actions. However, when our method is better, it is typically by a larger amount, and this is reflected in the higher overall average classification rate of 98% versus 95% for STW and 90% for BoW-MKL.
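A leave-one-out protocol can be sketched as follows. This is a generic illustration in which each individual sample is held out in turn and a 1-nearest-neighbour rule on toy scalar features plays the role of the classifier; the unit that is held out in the cited protocols (for example, one subject at a time) is not specified here, so this granularity is an assumption.

```python
def leave_one_out_accuracy(samples, labels, classify):
    """Hold out each sample once, train on the rest, return the hit rate."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(samples[i], train_x, train_y) == labels[i]:
            correct += 1
    return correct / len(samples)

def one_nn(query, train_x, train_y):
    """1-nearest-neighbour rule with squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    best = min(range(len(train_x)), key=lambda j: sq_dist(query, train_x[j]))
    return train_y[best]

# Toy one-dimensional features; the last sample lies closer to class "b".
samples = [[0.0], [0.2], [1.0], [1.2], [0.7]]
labels = ["a", "a", "b", "b", "a"]
loo_accuracy = leave_one_out_accuracy(samples, labels, one_nn)
print(loo_accuracy)
```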

Ballet Dataset.
The Ballet dataset contains 44 real video sequences of 8 actions collected from an instructional ballet DVD. The dataset consists of 8 complex motion patterns performed by three subjects. The actions include "left-to-right hand opening", "right-to-left hand opening", "standing hand opening", "leg swinging", "jumping", "turning", "hopping", and "standing still". Figure 7 shows samples. This dataset has a uniform background and fair illumination and therefore minimises the effect of variations in illumination and background. Yet, at the same time, it is very challenging due to the significant within-class variations in terms of speed, spatial and temporal scale, clothing, and movement. The available samples of each action were randomly split into training and testing sets (the numbers of actions in the training and testing sets were fairly even). The process of random splitting was repeated ten times, and the average classification accuracy was recorded. All video sequences were uniformly resized to 50 × 50 × 50. In order to have a standard length of 50 frames per video sequence, the middle 50 frames were used.
For the sake of comparison between tensor-based methods, the TKPCA algorithm is contrasted with Tensor as a point on a Product Manifold (TPM) [12] and Tensor Canonical Correlation Analysis (TCCA) [11]. TPM maps a video tensor to a point on a product manifold, and the geodesic distance on the product manifold is computed for tensor classification. Table 4 shows that the TKPCA algorithm obtains the highest accuracy and significantly outperforms the state-of-the-art tensor-based methods TPM and TCCA. The confusion matrix of the proposed TKPCA method is shown in Figure 8. Our performance on this dataset is not as good as on the KTH dataset, which might be because of the complexity of the actions in this dataset and the significant within-class variations.
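The repeated random-split evaluation can be sketched as below. The split is done per class so that both sets contain every action; the 50% training fraction, the fixed seed, and the `evaluate` callback (which would train on the first index list and return the accuracy on the second) are assumptions made for illustration.

```python
import random

def stratified_split(labels, rng, train_fraction=0.5):
    """Randomly split sample indices class by class into train and test."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for indices in by_class.values():
        indices = indices[:]
        rng.shuffle(indices)
        cut = max(1, int(len(indices) * train_fraction))
        train += indices[:cut]
        test += indices[cut:]
    return sorted(train), sorted(test)

def mean_accuracy_over_splits(labels, evaluate, repeats=10, seed=0):
    """Average the accuracy returned by `evaluate` over random splits."""
    rng = random.Random(seed)
    scores = [evaluate(*stratified_split(labels, rng)) for _ in range(repeats)]
    return sum(scores) / len(scores)

labels = ["jumping"] * 4 + ["turning"] * 6
train_idx, test_idx = stratified_split(labels, random.Random(0))
print(train_idx, test_idx)
```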

Sensitivity Analysis
4.3.1. Sensitivity to Noise. In addition to the above, to assess robustness to noise we add two types of noise to the clean Ballet dataset. We compare TKPCA, TPM, and TCCA in the cases of additive Gaussian noise and sparse noise spikes. In the case of additive Gaussian noise, the noise process is $\mathcal{N}(0, \alpha\sigma_t)$, added to the $t$th frame, where $\sigma_t$ is the standard deviation of that frame and $\alpha \in [0, 1]$. In the case of spike noise, we add values drawn from the normal process $\mathcal{N}(0, \alpha\sigma_t)$ to randomly chosen time points of each video sequence. The number of noisy time points is no more than 5% of the length of the time series, spread uniformly over the full timespan. Figure 9 depicts the recognition accuracies of TKPCA, TPM, and TCCA in the presence of different levels of noise. Figure 9 indicates that TKPCA outperforms the others, and its advantage becomes more obvious as the noise level grows. This could be because our underlying kernel function is built on the Grassmann kernel, which is more robust to noise since the subspace points on the Grassmann manifold can fill in the missing pictures.
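The two corruption schemes can be sketched in a few lines. The sketch reads the noise model as zero-mean Gaussian noise whose standard deviation is a level `alpha` times the standard deviation of the affected frame, and it represents a video as a list of frames, each a list of pixel rows. Both the reading of the model and the function names are assumptions.

```python
import math
import random

def frame_std(frame):
    """Standard deviation of all pixel values in one frame."""
    values = [v for row in frame for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def add_gaussian_noise(video, alpha, rng):
    """Add N(0, alpha * sigma_t) noise to every pixel of every frame t."""
    noisy = []
    for frame in video:
        sigma = alpha * frame_std(frame)
        noisy.append([[v + rng.gauss(0.0, sigma) for v in row] for row in frame])
    return noisy

def add_spike_noise(video, alpha, rng, max_fraction=0.05):
    """Corrupt at most `max_fraction` of the time points, chosen at random."""
    n_spikes = int(max_fraction * len(video))
    spiky = set(rng.sample(range(len(video)), n_spikes))
    noisy = []
    for t, frame in enumerate(video):
        if t in spiky:
            noisy.append(add_gaussian_noise([frame], alpha, rng)[0])
        else:
            noisy.append([row[:] for row in frame])
    return noisy

rng = random.Random(0)
video = [[[float(t), 1.0], [2.0, 3.0]] for t in range(40)]  # 40 tiny frames
spiked = add_spike_noise(video, 0.5, rng)
print(sum(a != b for a, b in zip(video, spiked)), "frames changed")
```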

Sensitivity to Occlusion.
An important aspect of the proposed approach relates to its sensitivity to occlusion. We assess the performance at various levels of occlusion in the Ballet dataset, from 1.56% up to 45%, by replacing a set of randomly located square blocks of size 4 × 4 in the query frames with blank blocks. The location of the occlusion is randomly chosen for each query frame and is unknown to the system. The training frames do not contain occlusions. Figure 10(a) shows the recognition rates of TKPCA, TPM, and TCCA. The proposed TKPCA method significantly outperforms the other two methods at almost all levels of occlusion. At 40 percent occlusion, the performance of TKPCA has dropped by roughly 20 percentage points. The proposed TKPCA method better captures the nonlinear intrinsic geometry and is hence more robust to missing parts.
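The occlusion procedure can be sketched as follows. To hit a target occlusion level exactly, the sketch draws non-overlapping blocks from a grid of aligned positions and fills them with zeros; the grid alignment, the zero fill value, and the 32 × 32 example frame are assumptions. On such a frame, a single 4 × 4 block covers 16/1024 ≈ 1.56% of the pixels.

```python
import random

def occlude(frame, fraction, rng, block=4, fill=0):
    """Blank out randomly chosen block x block squares covering roughly
    `fraction` of the frame; blocks sit on a grid so they never overlap."""
    rows, cols = len(frame), len(frame[0])
    cells = [(r, c)
             for r in range(0, rows - block + 1, block)
             for c in range(0, cols - block + 1, block)]
    n_blocks = min(round(fraction * rows * cols / block ** 2), len(cells))
    out = [row[:] for row in frame]
    for r, c in rng.sample(cells, n_blocks):
        for dr in range(block):
            for dc in range(block):
                out[r + dr][c + dc] = fill
    return out

rng = random.Random(0)
frame = [[1] * 32 for _ in range(32)]
light = occlude(frame, 0.0156, rng)   # one block
heavy = occlude(frame, 0.25, rng)     # sixteen blocks
print(sum(row.count(0) for row in light), sum(row.count(0) for row in heavy))
```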

Sensitivity to Misalignment.
Temporal and spatial misalignment could drastically deteriorate the performance of an action recogniser. In this part, we consider only spatial misalignment and contrast the sensitivity of the TKPCA algorithm with that of TPM and TCCA on the Ballet dataset. To this end, we introduced random displacements to the frames of the query videos and measured the accuracy for various amounts of displacement. Figure 10(a) shows the result. The horizontal axis here indicates the degree of misalignment. Figure 10(a) reveals that all studied algorithms are sensitive to misalignment: the larger the displacement, the lower the recognition accuracy. This is mainly because the tensorial representation of a video depends heavily on relationships that are fragile under misalignment.
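Spatial misalignment of this kind can be simulated with a simple frame shift. The sketch below draws an independent integer displacement for every frame and fills uncovered pixels with zeros; whether displacements are drawn per frame or per video, and how borders are filled, are assumptions.

```python
import random

def shift_frame(frame, dy, dx, fill=0):
    """Translate a frame by (dy, dx) pixels; uncovered pixels get `fill`."""
    rows, cols = len(frame), len(frame[0])
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dy, c + dx
            if 0 <= rr < rows and 0 <= cc < cols:
                out[rr][cc] = frame[r][c]
    return out

def misalign(video, max_shift, rng):
    """Shift every frame by a random displacement of at most `max_shift`."""
    shifted = []
    for frame in video:
        dy = rng.randint(-max_shift, max_shift)
        dx = rng.randint(-max_shift, max_shift)
        shifted.append(shift_frame(frame, dy, dx))
    return shifted

rng = random.Random(0)
print(shift_frame([[1, 2], [3, 4]], 0, 1))
```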

Conclusion
In this paper, we present a new TKPCA algorithm for dimensionality reduction and feature extraction from tensor objects, such as 2D/3D images and video sequences. TKPCA determines a subspace of lower dimensionality that captures most of the nonlinear variation present in the original tensorial representation. A novel tensorial kernel, which can directly measure the similarity between tensorial inputs, is also proposed based on the Grassmann kernel to capture the topological structure underlying the tensor dataset. Furthermore, a proof of the strict positive definiteness of the proposed kernel function is given. Experimental results show that TKPCA remedies the shortcoming of tensorial PCA in modelling the nonlinear manifold of tensor objects and alleviates the small sample size (SSS) and curse of dimensionality problems. Furthermore, it achieves a higher compression rate and is robust to both noise and occlusion. To the best of our knowledge, the problem of TKPCA has not been considered in the existing literature.
Finally, there are still some aspects of TKPCA that deserve further study. For example, TKPCA is essentially a batch optimization problem, with all training tensor data being available in advance. Such an assumption is unsuitable for large-scale datasets and thus ill-suited to real-time applications [47]. Therefore, for applications like video surveillance [52] or social network analysis [53], an online scheme of TKPCA is expected.

Figure 1: Two examples of 3rd-order tensors: (a) a gait silhouette sequence with the column, row, and time modes; (b) a diffusion tensor imaging (DTI) scan of fiber tracts in the human brain, which derives tract directional information from 3rd-order tensors that describe the anisotropic diffusion of water.

Figure 2: Visual illustration of the 1-mode unfolding of a third-order tensor.

Figure 5: Visualization of samples of the KTH action recognition dataset in the subspace of the first two dimensions: (a) represents TKPCA using a tensor kernel; (b) represents KPCA using a Gaussian kernel on the vectorization of tensors. Observe that the features obtained by TKPCA are more discriminative than the KPCA features.

Figure 6: Confusion matrix (in %) for the TKPCA (a) and MPCA (b) methods on the KTH action recognition dataset using LOO protocol.

Figure 7: Some example frames from an instructional ballet DVD.

Figure 9: Noise resilience analysis: (a) noisy (additive Gaussian) query and clean training set; (b) query and training are both noisy (additive Gaussian); (c) noisy (sparse spikes) query and clean training set; (d) query and training are both noisy (sparse spikes).

Table 1: Comparison between several PCA Algorithms.
1.1. TKPCA. Motivated by the above drawbacks, in this paper, we propose a Tensorial Kernel Principal Component Analysis (TKPCA) to extend the conventional PCA to its kernelized tensor counterpart. TKPCA aims to overcome the drawbacks in traditional PCA and MPCA and brings together the desirable properties of kernel methods and tensor decompositions (see Section 2.1) for significant performance gain when the data are multidimensional and nonlinear dependencies do exist. Table

Table 2: List of important notations.

Table 3: Recognition accuracy (in %) for the KTH action recognition dataset.

Table 4: Recognition accuracy (in %) along with its standard deviation for the Ballet dataset.