Incremental Tensor Principal Component Analysis for Handwritten Digit Recognition

To overcome the shortcomings of traditional dimensionality reduction algorithms, this paper proposes incremental tensor principal component analysis (ITPCA) based on the updated-SVD technique. The paper theoretically proves the relationship between PCA, 2DPCA, MPCA, and the graph embedding framework, and derives in detail the incremental learning procedures for adding a single sample and multiple samples. Experiments on handwritten digit recognition demonstrate that ITPCA achieves better recognition performance than vector-based principal component analysis (PCA), incremental principal component analysis (IPCA), and multilinear principal component analysis (MPCA). At the same time, ITPCA also has lower time and space complexity.


Introduction
Pattern recognition and computer vision require processing large amounts of multi-dimensional data, such as image and video data. To date, a large number of dimensionality reduction algorithms have been investigated. These algorithms project the whole data into a low-dimensional space and construct new features by analyzing the statistical relationships hidden in the data set. The new features often give good information or hints about the data's intrinsic structure. As a classical dimensionality reduction algorithm, principal component analysis has been widely applied in various applications.
Traditional dimensionality reduction algorithms generally transform each multi-dimensional datum into a vector by concatenating its rows, which is called vectorization. This operation greatly increases the computational cost of data analysis and seriously destroys the intrinsic tensor structure of high-order data. Consequently, tensor dimensionality reduction algorithms have been developed based on tensor algebra [1][2][3][4][5][6][7][8][9][10]. Reference [10] summarizes existing multilinear subspace learning algorithms for tensor data. Reference [11] generalizes principal component analysis to tensor space and presents multilinear principal component analysis (MPCA). Reference [12] proposes the graph embedding framework to unify all dimensionality reduction algorithms.
Furthermore, traditional dimensionality reduction algorithms generally employ off-line learning to deal with newly added samples, which aggravates the computational cost.
To address this problem, on-line learning algorithms have been proposed [13,14]. In particular, reference [15] develops incremental principal component analysis (IPCA) based on the updated-SVD technique. However, most on-line learning algorithms focus on vector-based methods, and only a limited number of works study incremental learning in tensor space [16][17][18].
To improve incremental learning in tensor space, this paper presents incremental tensor principal component analysis (ITPCA) based on the updated-SVD technique, combining tensor representation with incremental learning.

Mathematical Problems in Engineering
This paper theoretically proves the relationship between PCA, 2DPCA, MPCA, and the graph embedding framework, and derives in detail the incremental learning procedures for adding a single sample and multiple samples. Experiments on handwritten digit recognition demonstrate that ITPCA achieves better performance than vector-based incremental principal component analysis (IPCA) and multilinear principal component analysis (MPCA). At the same time, ITPCA also has lower time and space complexity than MPCA.

Tensor Principal Component Analysis
In this section, we employ tensor representation to express high-dimensional image data. Consequently, a high-dimensional image dataset can be expressed as a tensor dataset $X = \{X_1, \ldots, X_M\}$, where $X_i \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is an $N$-dimensional tensor and $M$ is the number of samples in the dataset. Based on this representation, the following definitions are introduced.
Definition 1. For tensor dataset $X$, the mean tensor is defined as follows:
$$\bar{X} = \frac{1}{M} \sum_{i=1}^{M} X_i.$$
Definition 2. The unfolding matrix of the mean tensor along the $n$th dimension is called the mode-$n$ mean matrix and is defined as follows:
$$\bar{X}^{(n)} = \frac{1}{M} \sum_{i=1}^{M} X_i^{(n)}.$$
Definition 3. For tensor dataset $X$, the total scatter tensor is defined as follows:
$$\Psi_X = \sum_{i=1}^{M} \left\| X_i - \bar{X} \right\|^2,$$
where $\|A\|$ is the norm of the tensor $A$.
Definition 4. For tensor dataset $X$, the mode-$n$ total scatter matrix is defined as follows:
$$C^{(n)} = \sum_{i=1}^{M} \left( X_i^{(n)} - \bar{X}^{(n)} \right) \left( X_i^{(n)} - \bar{X}^{(n)} \right)^T,$$
where $\bar{X}^{(n)}$ is the mode-$n$ mean matrix and $X_i^{(n)}$ is the mode-$n$ unfolding matrix of tensor $X_i$.
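The definitions above can be illustrated with a minimal NumPy sketch. It assumes the dataset is stored as one array whose first axis indexes the samples; the function names (`unfold`, `mean_tensor`, `total_scatter`, `mode_scatter`) and the random 16 × 16 data are hypothetical and only serve the illustration. The column ordering of the unfolding follows NumPy's memory layout, which may differ from other conventions, but the mode-n total scatter matrix does not depend on that ordering.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns the remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mean_tensor(samples):
    """Mean tensor of a dataset stored with shape (M, I_1, ..., I_N)."""
    return samples.mean(axis=0)

def total_scatter(samples):
    """Sum of squared norms of the deviations from the mean tensor."""
    centered = samples - mean_tensor(samples)
    return float(np.sum(centered ** 2))

def mode_scatter(samples, mode):
    """Mode-n total scatter matrix, of shape (I_n, I_n)."""
    centered = samples - mean_tensor(samples)
    return sum(unfold(c, mode) @ unfold(c, mode).T for c in centered)

# Hypothetical data: 20 random second-order tensors of size 16 x 16.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 16, 16))
C1 = mode_scatter(X, 0)  # mode-1 total scatter matrix
C2 = mode_scatter(X, 1)  # mode-2 total scatter matrix
```

A useful sanity check is that the trace of every mode-n total scatter matrix equals the total scatter of the dataset, since both sum the same squared entries.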
According to the above analysis, it is easy to derive the following theorems.
Theorem 5 (see [11]). For tensor data of order $N = 1$, that is, for first-order tensors, the objective function of MPCA is equal to that of PCA.
Proof. For a first-order tensor, $X_i \in \mathbb{R}^{I \times 1}$ is a vector, and (6) reduces to the objective function of PCA. So MPCA for first-order tensors is equal to vector-based PCA.
Theorem 6 (see [11]). For tensor data of order $N = 2$, that is, for second-order tensors, the objective function of MPCA is equal to that of 2DPCA.
Proof. For a second-order tensor, $X_i \in \mathbb{R}^{I_1 \times I_2}$ is a matrix, and two projective matrices $U^{(1)}$ and $U^{(2)}$ need to be solved. Then (5) becomes
$$\arg\max_{U^{(1)},\, U^{(2)}} \sum_{i=1}^{M} \left\| U^{(1)T} \left( X_i - \bar{X} \right) U^{(2)} \right\|^2.$$
The above equation is exactly the objective function of B2DPCA (bidirectional 2DPCA) [20][21][22]. Letting $U^{(2)} = I$, the projective matrix $U^{(1)}$ is solved. In this case, the objective function is
$$\arg\max_{U^{(1)}} \sum_{i=1}^{M} \left\| U^{(1)T} \left( X_i - \bar{X} \right) \right\|^2,$$
which is the objective function of row 2DPCA [23,24]. Similarly, letting $U^{(1)} = I$, the projective matrix $U^{(2)}$ is solved; the objective function is
$$\arg\max_{U^{(2)}} \sum_{i=1}^{M} \left\| \left( X_i - \bar{X} \right) U^{(2)} \right\|^2,$$
which is the objective function of column 2DPCA [23,24].
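Although vector-based PCA and 2DPCA can be regarded as special cases of MPCA, MPCA and 2DPCA employ different techniques to solve the projective matrices. 2DPCA carries out PCA on the row data and the column data, respectively, whereas MPCA employs an iterative solution to compute the $N$ projective matrices. Suppose that the projective matrices $\{U^{(1)}, \ldots, U^{(n-1)}, U^{(n+1)}, \ldots, U^{(N)}\}$ are known and $U^{(n)}$ is to be solved. Equation (6) can then be expressed through the matrix
$$C^{(n)} = \sum_{i=1}^{M} \left( X_i^{(n)} - \bar{X}^{(n)} \right) U^{(-n)} U^{(-n)T} \left( X_i^{(n)} - \bar{X}^{(n)} \right)^T,$$
where $U^{(-n)} = U^{(N)} \otimes \cdots \otimes U^{(n+1)} \otimes U^{(n-1)} \otimes \cdots \otimes U^{(1)}$. Because
$$(A \otimes B)(C \otimes D) = AC \otimes BD, \qquad (A \otimes B)^T = A^T \otimes B^T,$$
based on the Kronecker product we can get the following:
$$U^{(-n)} U^{(-n)T} = U^{(N)} U^{(N)T} \otimes \cdots \otimes U^{(n+1)} U^{(n+1)T} \otimes U^{(n-1)} U^{(n-1)T} \otimes \cdots \otimes U^{(1)} U^{(1)T}.$$
Since each $U^{(i)} \in \mathbb{R}^{I_i \times P_i}$ is an orthogonal matrix, if the dimensions of the projective matrices do not change in the iterative procedure (that is, $P_i = I_i$), then $U^{(i)} U^{(i)T} = I$ for $i = 1, \ldots, N$, $i \neq n$, and hence $U^{(-n)} U^{(-n)T} = I$. In that case
$$C^{(n)} = \sum_{i=1}^{M} \left( X_i^{(n)} - \bar{X}^{(n)} \right) \left( X_i^{(n)} - \bar{X}^{(n)} \right)^T.$$
The above equation is equal to B2DPCA. Because MPCA updates the projective matrices during the iterative procedure, it achieves better performance than 2DPCA.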
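The iterative solution described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`mode_product`, `mpca`), the truncated-identity initialization, the fixed iteration count `n_iter`, and the target ranks are all hypothetical choices. In each pass, every mode is solved in turn by projecting the centered samples with the other projective matrices and taking the leading eigenvectors of the resulting mode-n scatter matrix.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def mpca(samples, ranks, n_iter=3):
    """Alternating estimate of one projective matrix per mode (a sketch)."""
    centered = samples - samples.mean(axis=0)
    n_modes = centered.ndim - 1
    # Assumed initialization: the first P_n columns of the identity matrix.
    U = [np.eye(centered.shape[1 + n])[:, :ranks[n]] for n in range(n_modes)]
    for _ in range(n_iter):
        for n in range(n_modes):
            # Project every mode except n with the current matrices.
            proj = centered
            for k in range(n_modes):
                if k != n:
                    proj = mode_product(proj, U[k].T, 1 + k)
            scatter = sum(unfold(p, n) @ unfold(p, n).T for p in proj)
            # eigh returns ascending eigenvalues, so reverse the columns.
            _, vecs = np.linalg.eigh(scatter)
            U[n] = vecs[:, ::-1][:, :ranks[n]]
    return U

# Hypothetical data: 30 random 16 x 16 samples reduced to 4 x 4 features.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 16, 16))
U = mpca(X, ranks=(4, 4))
```

When the ranks equal the full mode sizes, the projection step changes nothing and each scatter matrix coincides with the plain mode-n total scatter matrix, which is the situation the text compares with B2DPCA.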
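Theorem 7. MPCA can be unified into the graph embedding framework [12].
Proof. Based on the basic knowledge of tensor algebra, we can get the following:
$$\sum_{i=1}^{M} \left\| X_i - \bar{X} \right\|^2 = \sum_{i=1}^{M} \left\| \operatorname{vec}(X_i) - \operatorname{vec}(\bar{X}) \right\|^2.$$
Letting $x_i = \operatorname{vec}(X_i)$ and $\bar{x} = \operatorname{vec}(\bar{X})$, we can get the following:
$$\sum_{i=1}^{M} \left\| x_i - \bar{x} \right\|^2 = \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} W_{ij} \left\| x_i - x_j \right\|^2,$$
where $W \in \mathbb{R}^{M \times M}$ is the similarity matrix; for any $i$, $j$, we have $W_{ij} = 1/M$. So (16) can be written as a weighted sum of pairwise distances, which is the form used by the graph embedding framework. So the theorem is proved.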
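The key step of this proof rests on a standard identity: the total scatter about the mean equals one half of the weighted sum of all pairwise squared distances when every weight is 1/M. The short check below verifies this numerically on hypothetical random tensors; the variable names and the data are assumptions made for the illustration.

```python
import numpy as np

# Hypothetical data: 12 random tensors of size 4 x 5.
rng = np.random.default_rng(2)
samples = rng.standard_normal((12, 4, 5))
M = samples.shape[0]

# Vectorize every tensor, x_i = vec(X_i).
x = samples.reshape(M, -1)
x_bar = x.mean(axis=0)

# Left-hand side: total scatter about the mean.
scatter = np.sum((x - x_bar) ** 2)

# Right-hand side: pairwise form with the constant similarity W_ij = 1/M.
W = np.full((M, M), 1.0 / M)
diffs = x[:, None, :] - x[None, :, :]
pairwise_sq = np.sum(diffs ** 2, axis=2)
graph_form = 0.5 * np.sum(W * pairwise_sq)
```

Both quantities agree up to floating-point error, which is what allows the scatter objective to be read as a graph with uniform edge weights.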

Incremental Tensor Principal Component Analysis
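Let the initial dataset consist of $K$ samples $X_1, \ldots, X_K$ with mean tensor $\bar{X}_{old}$. The covariance tensor of the initial samples is
$$\Psi_{old} = \sum_{i=1}^{K} \left\| X_i - \bar{X}_{old} \right\|^2.$$
The mode-$n$ covariance matrix of the initial samples is
$$C_{old}^{(n)} = \sum_{i=1}^{K} \left( X_i^{(n)} - \bar{X}_{old}^{(n)} \right) \left( X_i^{(n)} - \bar{X}_{old}^{(n)} \right)^T.$$
When a new sample $X_{new}$ is added, the mean tensor is
$$\bar{X} = \frac{K \bar{X}_{old} + X_{new}}{K + 1}.$$
The mode-$n$ covariance matrix is expressed as follows:
$$C^{(n)} = \sum_{i=1}^{K} \left( X_i^{(n)} - \bar{X}^{(n)} \right) \left( X_i^{(n)} - \bar{X}^{(n)} \right)^T + \left( X_{new}^{(n)} - \bar{X}^{(n)} \right) \left( X_{new}^{(n)} - \bar{X}^{(n)} \right)^T, \tag{23}$$
where the first item of (23) is
$$\sum_{i=1}^{K} \left( X_i^{(n)} - \bar{X}^{(n)} \right) \left( X_i^{(n)} - \bar{X}^{(n)} \right)^T = C_{old}^{(n)} + \frac{K}{(K+1)^2} \left( \bar{X}_{old}^{(n)} - X_{new}^{(n)} \right) \left( \bar{X}_{old}^{(n)} - X_{new}^{(n)} \right)^T. \tag{24}$$
The second item of (23) is
$$\left( X_{new}^{(n)} - \bar{X}^{(n)} \right) \left( X_{new}^{(n)} - \bar{X}^{(n)} \right)^T = \frac{K^2}{(K+1)^2} \left( \bar{X}_{old}^{(n)} - X_{new}^{(n)} \right) \left( \bar{X}_{old}^{(n)} - X_{new}^{(n)} \right)^T. \tag{25}$$
Consequently, the mode-$n$ covariance matrix is updated as follows:
$$C^{(n)} = C_{old}^{(n)} + \frac{K}{K+1} \left( \bar{X}_{old}^{(n)} - X_{new}^{(n)} \right) \left( \bar{X}_{old}^{(n)} - X_{new}^{(n)} \right)^T. \tag{26}$$
Therefore, when a new sample is added, the projective matrices are solved by the eigen decomposition of (26).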
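The single-sample update can be checked numerically. The sketch below assumes K old samples with a stored mean tensor and mode-n covariance matrix, and applies a rank-style correction with weight K/(K+1) built from the difference between the old mean and the new sample; it then compares the result with a full recomputation. The function names and the random data are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_scatter(samples, mode):
    """Batch mode-n covariance (scatter) matrix, used here as the reference."""
    centered = samples - samples.mean(axis=0)
    return sum(unfold(c, mode) @ unfold(c, mode).T for c in centered)

def add_one_sample(c_old, mean_old, k, x_new, mode):
    """Update the mode-n covariance matrix and the mean with one new sample."""
    d = unfold(mean_old - x_new, mode)
    c = c_old + (k / (k + 1.0)) * (d @ d.T)
    mean = (k * mean_old + x_new) / (k + 1.0)
    return c, mean

# Hypothetical data: 10 old samples and 1 new sample of size 8 x 6.
rng = np.random.default_rng(3)
data = rng.standard_normal((11, 8, 6))
old, x_new = data[:10], data[10]

c_upd, mean_upd = add_one_sample(
    mode_scatter(old, 0), old.mean(axis=0), 10, x_new, 0
)
c_ref = mode_scatter(data, 0)  # recomputed from all 11 samples
```

The updated matrix matches the batch result, while the update itself only touches the stored matrix, the stored mean, and the new sample.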

Incremental Learning
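Let $K$ be the number of original samples, with mean tensor $\bar{X}_{old}$ and mode-$n$ covariance matrix $C_{old}^{(n)}$, and let $T$ new samples $X_{K+1}, \ldots, X_{K+T}$ be added, with mean tensor $\bar{X}_{new}$ and mode-$n$ covariance matrix $C_{new}^{(n)}$. The mean tensor of all samples is $\bar{X} = (K \bar{X}_{old} + T \bar{X}_{new})/(K+T)$. Its mode-$n$ covariance matrix is
$$C^{(n)} = \sum_{i=1}^{K} \left( X_i^{(n)} - \bar{X}^{(n)} \right) \left( X_i^{(n)} - \bar{X}^{(n)} \right)^T + \sum_{i=K+1}^{K+T} \left( X_i^{(n)} - \bar{X}^{(n)} \right) \left( X_i^{(n)} - \bar{X}^{(n)} \right)^T. \tag{28}$$
The first item in (28) is written as follows:
$$\sum_{i=1}^{K} \left( X_i^{(n)} - \bar{X}^{(n)} \right) \left( X_i^{(n)} - \bar{X}^{(n)} \right)^T = C_{old}^{(n)} + K \left( \bar{X}_{old}^{(n)} - \bar{X}^{(n)} \right) \left( \bar{X}_{old}^{(n)} - \bar{X}^{(n)} \right)^T, \tag{29}$$
where
$$\bar{X}_{old}^{(n)} - \bar{X}^{(n)} = \frac{T}{K+T} \left( \bar{X}_{old}^{(n)} - \bar{X}_{new}^{(n)} \right). \tag{30}$$
Putting (30) into (29), (29) becomes
$$C_{old}^{(n)} + \frac{K T^2}{(K+T)^2} \left( \bar{X}_{old}^{(n)} - \bar{X}_{new}^{(n)} \right) \left( \bar{X}_{old}^{(n)} - \bar{X}_{new}^{(n)} \right)^T. \tag{31}$$
The second item in (28) is written as follows:
$$\sum_{i=K+1}^{K+T} \left( X_i^{(n)} - \bar{X}^{(n)} \right) \left( X_i^{(n)} - \bar{X}^{(n)} \right)^T = C_{new}^{(n)} + T \left( \bar{X}_{new}^{(n)} - \bar{X}^{(n)} \right) \left( \bar{X}_{new}^{(n)} - \bar{X}^{(n)} \right)^T, \tag{32}$$
where
$$\bar{X}_{new}^{(n)} - \bar{X}^{(n)} = \frac{K}{K+T} \left( \bar{X}_{new}^{(n)} - \bar{X}_{old}^{(n)} \right). \tag{33}$$
Then (32) becomes
$$C_{new}^{(n)} + \frac{K^2 T}{(K+T)^2} \left( \bar{X}_{old}^{(n)} - \bar{X}_{new}^{(n)} \right) \left( \bar{X}_{old}^{(n)} - \bar{X}_{new}^{(n)} \right)^T. \tag{34}$$
Putting (31) and (34) into (28), we get
$$C^{(n)} = C_{old}^{(n)} + C_{new}^{(n)} + \frac{K T}{K+T} \left( \bar{X}_{old}^{(n)} - \bar{X}_{new}^{(n)} \right) \left( \bar{X}_{old}^{(n)} - \bar{X}_{new}^{(n)} \right)^T. \tag{35}$$
It is worth noting that when new samples become available, there is no need to recompute the mode-$n$ covariance matrix of all training samples. We only have to compute the mode-$n$ covariance matrix of the newly added samples and the difference between the means of the original training samples and the newly added samples. However, as in traditional incremental PCA, the eigen decomposition of $C^{(n)}$ has to be repeated whenever new samples are added. The repeated eigen decomposition of $C^{(n)}$ causes a heavy computational cost, which is called "the eigen decomposition updating problem." For traditional vector-based incremental learning algorithms, the updated-SVD technique is proposed in [25] to replace the repeated eigen decomposition. This paper introduces the updated-SVD technique into the tensor-based incremental learning algorithm.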
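The multiple-sample update can be verified in the same spirit. The sketch below assumes K old samples and T new samples, each block summarized by its own mean tensor and mode-n covariance matrix, and merges them with a correction weighted by KT/(K+T) on the difference of the two means. The function names and the random data are hypothetical, and the batch recomputation is included only as a reference.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_scatter(samples, mode):
    """Batch mode-n covariance (scatter) matrix."""
    centered = samples - samples.mean(axis=0)
    return sum(unfold(c, mode) @ unfold(c, mode).T for c in centered)

def merge_scatter(c_old, mean_old, k, c_new, mean_new, t, mode):
    """Combine the statistics of k old samples and t new samples."""
    d = unfold(mean_old - mean_new, mode)
    c = c_old + c_new + (k * t / (k + t)) * (d @ d.T)
    mean = (k * mean_old + t * mean_new) / (k + t)
    return c, mean

# Hypothetical data: 14 old samples and 6 new samples of size 8 x 6.
rng = np.random.default_rng(5)
data = rng.standard_normal((20, 8, 6))
old, new = data[:14], data[14:]

c_merged, mean_merged = merge_scatter(
    mode_scatter(old, 1), old.mean(axis=0), 14,
    mode_scatter(new, 1), new.mean(axis=0), 6, 1,
)
c_ref = mode_scatter(data, 1)  # recomputed from all 20 samples
```

Only the new block and the two means are needed at update time, so the original samples do not have to be revisited.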
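For the $K$ original samples, the mode-$n$ covariance matrix is
$$C_{old}^{(n)} = \tilde{X}_{old}^{(n)} \tilde{X}_{old}^{(n)T},$$
where
$$\tilde{X}_{old}^{(n)} = \left[ X_1^{(n)} - \bar{X}_{old}^{(n)}, \ldots, X_K^{(n)} - \bar{X}_{old}^{(n)} \right].$$
According to the singular value decomposition $\tilde{X}_{old}^{(n)} = U \Sigma V^T$, we can get the following:
$$C_{old}^{(n)} = U \Sigma V^T V \Sigma^T U^T = U \Sigma \Sigma^T U^T.$$
So it is easy to derive that the eigenvectors of $C_{old}^{(n)}$ are the left singular vectors of $\tilde{X}_{old}^{(n)}$ and that the eigenvalues are the squares of the singular values of $\tilde{X}_{old}^{(n)}$. For the $T$ new samples, the mode-$n$ covariance matrix is
$$C_{new}^{(n)} = \tilde{X}_{new}^{(n)} \tilde{X}_{new}^{(n)T},$$
where
$$\tilde{X}_{new}^{(n)} = \left[ X_{K+1}^{(n)} - \bar{X}_{new}^{(n)}, \ldots, X_{K+T}^{(n)} - \bar{X}_{new}^{(n)} \right].$$
According to (35), the updated mode-$n$ covariance matrix is defined as follows:
$$C^{(n)} = C_{old}^{(n)} + C_{new}^{(n)} + \frac{K T}{K+T} \left( \bar{X}_{old}^{(n)} - \bar{X}_{new}^{(n)} \right) \left( \bar{X}_{old}^{(n)} - \bar{X}_{new}^{(n)} \right)^T = \tilde{X}^{(n)} \tilde{X}^{(n)T},$$
where
$$\tilde{X}^{(n)} = \left[ \tilde{X}_{old}^{(n)}, \; \tilde{X}_{new}^{(n)}, \; \sqrt{\frac{K T}{K+T}} \left( \bar{X}_{old}^{(n)} - \bar{X}_{new}^{(n)} \right) \right].$$
Therefore, the updated projective matrix $U^{(n)}$ consists of the eigenvectors corresponding to the largest $P_n$ eigenvalues of $C^{(n)}$. The main steps of incremental tensor principal component analysis are listed as follows. Input: the original samples and the newly added samples. Output: $N$ projective matrices.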
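Step 1. Compute and save the eigen decomposition $\operatorname{eig}(C_{old}^{(n)})$ for $n = 1, \ldots, N$, that is, the leading left singular vectors $U^{(n)}$ and singular values $\Sigma^{(n)}$ of the centered mode-$n$ data matrix of the original samples.
Step 2. For $n = 1, \ldots, N$, let $B^{(n)}$ denote the block of new columns, formed by the centered mode-$n$ unfoldings of the new samples and the mean difference $\bar{X}_{old}^{(n)} - \bar{X}_{new}^{(n)}$ scaled by $\sqrt{KT/(K+T)}$, where $K$ and $T$ are the numbers of original and new samples.
Process the QR decomposition of the following expression:
$$Q R = \left( I - U^{(n)} U^{(n)T} \right) B^{(n)}.$$
Process the SVD of the following matrix:
$$\hat{U} \hat{\Sigma} \hat{V}^T = \begin{bmatrix} \Sigma^{(n)} & U^{(n)T} B^{(n)} \\ 0 & R \end{bmatrix}.$$
Compute the following product:
$$\left[ U^{(n)}, \; Q \right] \hat{U}.$$
Then the updated projective matrix $U^{(n)}$ is given by the $P_n$ leading columns of $[U^{(n)}, Q] \hat{U}$, with updated singular values taken from $\hat{\Sigma}$.
Step 3. Repeat the above steps until the incremental learning is finished.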
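The QR and SVD steps listed above follow the general pattern of an SVD column update, which can be sketched as below. This is a generic illustration under stated assumptions rather than the authors' code: the function name `updated_svd`, its arguments, and the random test matrices are hypothetical. In the setting of this paper, `U` and `s` would come from the centered mode-n data matrix of the original samples, and `B` would hold the centered mode-n unfoldings of the new samples together with the scaled mean-difference block.

```python
import numpy as np

def updated_svd(U, s, B, rank):
    """Update left singular vectors and singular values after appending columns B.

    U (I x r) and s (r,) describe a matrix A that is approximately
    U @ diag(s) @ V.T; the right singular vectors are not needed here.
    """
    proj = U.T @ B                          # part of B inside span(U)
    Q, R = np.linalg.qr(B - U @ proj)       # part of B orthogonal to span(U)
    r = s.size
    core = np.zeros((r + R.shape[0], r + B.shape[1]))
    core[:r, :r] = np.diag(s)
    core[:r, r:] = proj
    core[r:, r:] = R
    Uk, sk, _ = np.linalg.svd(core, full_matrices=False)
    U_new = np.hstack([U, Q]) @ Uk          # rotate the enlarged basis
    return U_new[:, :rank], sk[:rank]

# Hypothetical check: A has exact rank 5, so the update is exact.
rng = np.random.default_rng(4)
A = rng.standard_normal((16, 5)) @ rng.standard_normal((5, 30))
B = rng.standard_normal((16, 4))
U_full, s_full, _ = np.linalg.svd(A, full_matrices=False)
U, s = U_full[:, :5], s_full[:5]

U_upd, s_upd = updated_svd(U, s, B, rank=9)
U_ref, s_ref, _ = np.linalg.svd(np.hstack([A, B]), full_matrices=False)
```

The decomposition is performed on a small core matrix whose size depends on the retained rank and on the number of new columns, not on the number of original samples, which is where the savings over a full recomputation come from. When the stored factors are truncated, the result is an approximation rather than an exact update.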
Vector-based PCA converts all data into vectors and constructs a data matrix $X \in \mathbb{R}^{D \times M}$, $D = I^N$, where each mode is taken to have the same size $I$. For vector-based PCA, the main computational cost contains three parts: the computation of the covariance matrix, the eigen decomposition of the covariance matrix, and the computation of low-dimensional features. The time complexity to compute the covariance matrix is $O(M D^2)$, the time complexity of the eigen decomposition is $O(D^3)$, and the time complexity to compute low-dimensional features is $O(M D^2 + D^3)$.
Letting the iterative number be 1, the time complexity of computing the mode-$n$ covariance matrix for MPCA is $O(M N I^{N+1})$, the time complexity of the eigen decomposition is $O(N I^3)$, and the time complexity to compute low-dimensional features is $O(M N I^{N+1})$, so the total time complexity is $O(M N I^{N+1} + N I^3)$. Considering the time complexity, MPCA is superior to PCA.
For ITPCA, it is assumed that $T$ incremental datasets are added; MPCA has to recompute the mode-$n$ covariance matrix and conduct the eigen decomposition for the initial dataset and the incremental datasets. The more training samples there are, the higher the time complexity is. If updated-SVD is used, we only need to compute the QR decomposition and the SVD. The time complexity of the QR decomposition is $O(N I^{N+1})$. The time complexity of the rank-$r$ decomposition of the matrix with the size of $(r + I) \times (r + I^{N-1})$ is

Experiments
In this section, handwritten digit recognition experiments on the USPS image dataset are conducted to evaluate the performance of incremental tensor principal component analysis. The USPS handwritten digit dataset has 9298 images of the digits zero to nine, as shown in Figure 1. The size of each image is 16 × 16. In this paper, we choose 1000 images and divide them into initial training samples, newly added samples, and test samples. Furthermore, the nearest neighbor classifier is employed to classify the low-dimensional features. The recognition results are compared with PCA [26], IPCA [15], and MPCA [11].
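The classification step can be illustrated with a minimal nearest neighbor sketch. It assumes that the low-dimensional features have already been flattened into row vectors and that Euclidean distance is used; the distance choice, the function name `nearest_neighbor_predict`, and the toy data are assumptions made for the illustration, since the experiments themselves were run in MATLAB.

```python
import numpy as np

def nearest_neighbor_predict(train_feats, train_labels, test_feats):
    """Assign each test feature the label of its closest training feature."""
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)   # squared Euclidean distances
    return train_labels[np.argmin(sq_dists, axis=1)]

# Toy features standing in for projected digit images.
train_feats = np.array([[0.0, 0.0], [10.0, 10.0]])
train_labels = np.array([3, 7])
test_feats = np.array([[1.0, 1.0], [9.0, 8.0]])
predicted = nearest_neighbor_predict(train_feats, train_labels, test_feats)
```

The recognition rate is then the fraction of test samples whose predicted label matches the true label.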
At first, we choose 70 samples per class from four classes as the initial training samples. First, 36 principal components (PCs) are preserved and fed into the nearest neighbor classifier to obtain the recognition results. The results are plotted in Figure 2. It can be seen that MPCA and ITPCA are better than PCA and IPCA for the initial learning; the probable reason is that MPCA and ITPCA employ tensor representation to preserve the structure information.
The recognition results at different learning stages are shown in Figures 3, 4, and 5. It can be seen that the recognition results of all four methods fluctuate violently when the number of low-dimensional features is small. However, as the number of features increases, the recognition performance becomes stable. Generally, MPCA and ITPCA are superior to PCA and IPCA. Although ITPCA and MPCA have comparable performance in the first two learning stages, ITPCA begins to surpass MPCA after the third learning stage. Figure 6 gives the best recognition rates of the different methods. The same conclusion can be drawn as from Figures 3, 4, and 5.
The time and space complexity of the different methods are shown in Figures 7 and 8, respectively. Taking the time complexity into account, it can be found that at the stage of initial learning, PCA has the lowest time complexity. As new samples are added, the time complexity of PCA and MPCA grows greatly, whereas the time complexity of IPCA and ITPCA becomes stable. The time complexity of ITPCA grows more slowly than that of MPCA. The reason is that ITPCA introduces incremental learning based on the updated-SVD technique and avoids decomposing the mode-$n$ covariance matrix of the original samples again. Considering the space complexity, ITPCA has the lowest space complexity among the four compared methods.

Conclusion
This paper presents incremental tensor principal component analysis based on the updated-SVD technique to take full advantage of the redundancy of the spatial structure information and of online learning. Furthermore, this paper proves that PCA and 2DPCA are special cases of MPCA and that all of them can be unified into the graph embedding framework. This paper also analyzes in detail incremental learning based on a single sample and on multiple samples. The experiments on handwritten digit recognition demonstrate that principal component analysis based on tensor representation is superior to principal component analysis based on vector representation. Although MPCA has better recognition performance than ITPCA at the stage of initial learning, the learning capability of ITPCA improves gradually and eventually exceeds that of MPCA. Moreover, even as new samples are added, the time and space complexity of ITPCA grow only slowly.

Figure 1: The samples in USPS dataset.
In each round of incremental learning, 70 samples per class from two further classes are added. So after three rounds, the training samples cover all ten class labels, with 70 samples in each class. The remaining samples of the original training samples are used as the testing dataset. All algorithms are implemented in MATLAB 2010 on an Intel(R) Core(TM) i5-3210M CPU @ 2.5 GHz with 4 GB RAM.

Figure 2: The recognition results for 36 PCs of the initial learning.

Figure 3: The recognition results of different methods of the first incremental learning.

Figure 4: The recognition results of different methods of the second incremental learning.

Figure 5: The recognition results of different methods of the third incremental learning.

Figure 6: The comparison of recognition performance of different methods.

Figure 7: The comparison of time complexity of different methods.

Figure 8: The comparison of space complexity of different methods.