Mathematical Problems in Engineering, Volume 2014, Article ID 429451. DOI: 10.1155/2014/429451. Hindawi Publishing Corporation.

EVD Dualdating Based Online Subspace Learning

Bo Jin (1), Zhongliang Jing (1), Haitao Zhao (2), Yan Liang (1)

(1) School of Aeronautics and Astronautics, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai 200240, China
(2) School of Information Science and Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China

Received 10 April 2014; Accepted 25 June 2014; Published 24 July 2014

Copyright © 2014 Bo Jin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Conventional incremental PCA methods usually discuss only the situation of adding samples. In this paper, we consider two further cases: deleting samples and simultaneously adding and deleting samples. To avoid the NP-hard problem of downdating SVD without right singular vectors and specific position information, we choose to use EVD instead of SVD, which most IPCA methods rely on. First, we propose an EVD updating and downdating algorithm, called EVD dualdating, which permits arbitrary adding and deleting operations simultaneously, by transforming the EVD of the covariance matrix into an SVD updating problem plus the EVD of a small autocorrelation matrix. A comprehensive analysis presents the essence, expansibility, and computational complexity of EVD dualdating. A mathematical theorem proves that if the whole data matrix satisfies the low-rank-plus-shift structure, EVD dualdating is an optimal rank-k estimator in the sequential environment. A selection method based on eigenvalues is presented to determine the optimal rank k of the subspace. Then, we propose three incremental/decremental PCA methods: EVDD-IPCA, EVDD-DPCA, and EVDD-IDPCA, which are adaptive to a varying mean. Finally, extensive comparative experiments demonstrate that EVDD-based methods outperform conventional incremental/decremental PCA methods in both efficiency and accuracy.

1. Introduction

Principal component analysis (PCA), also known as subspace learning or the Karhunen-Loève transform, has been an active topic in the machine learning and pattern recognition communities over the last several decades. As a well-known unsupervised linear dimension reduction and multivariate analysis method, PCA has been applied to biometric recognition, gene classification, latent semantic indexing, and visual tracking.

To obtain the optimal orthonormal basis, which gives PCA its minimal reconstruction error, batch-mode PCA can be computed in two ways: the eigenvalue decomposition (EVD) of the data covariance matrix and the singular value decomposition (SVD) of the data matrix. Both approaches have a high computational cost and a large storage demand for high-dimensional, large-scale datasets. In practical applications, not all observations are available before training; in online usage especially, samples arrive sequentially over time. In these situations, batch-mode PCA cannot meet real-time requirements, since it must recompute the EVD or SVD of the whole dataset every time.
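The equivalence of the two routes can be checked directly with NumPy (a toy illustration of ours, not part of the paper): the eigenvalues of the scatter matrix are the squared singular values of the data matrix, and the leading eigenvectors span the same subspace as the left singular vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))          # toy data: d = 20, n = 8 samples as columns

# Route 1: EVD of the scatter matrix A A^T, sorted into descending order
evals, evecs = np.linalg.eigh(A @ A.T)
evals, evecs = evals[::-1], evecs[:, ::-1]

# Route 2: SVD of the data matrix itself
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The two routes agree: eigenvalues are the squared singular values, and the
# leading eigenvectors span the same subspace as the left singular vectors
assert np.allclose(evals[:8], s ** 2)
assert np.allclose(evecs[:, :8] @ evecs[:, :8].T, U @ U.T)
```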

To address this issue, incremental learning has been investigated for more than two decades in both the applied mathematics and machine learning communities; its task is to update the learning results without reexecuting the whole process when new data points are added. Various effective incremental PCA (IPCA) methods have been proposed.

In an era of information explosion, fast-growing data are often adulterated with spurious, invalid, or expired samples. The presence of a few deviated samples can severely contaminate the learned model, such as the principal directions in PCA. Overdue instances, which can to some degree be regarded as outliers compared to unexpired instances, reduce the accuracy of the data model. Therefore, an intelligent learning system needs not only the ability to admit new instances but also the capability to eliminate aberrant samples. This is the aim of decremental learning. Compared with IPCA, decremental PCA (DPCA) has not received adequate attention in the literature; only a few methods have been proposed in the last ten years. Moreover, to the best of our knowledge, there is no simultaneous incremental decremental algorithm for subspace learning. Similar work exists only for the support vector machine (SVM): Cauwenberghs and Poggio propose an incremental decremental method for adding and/or deleting a single sample, and Karasuyama and Takeuchi extend it to the case of multiple instances.

Because the essence of PCA is SVD or EVD in mathematical form, the tasks of incremental and decremental PCA are equivalent to updating and downdating the SVD or EVD. Among existing methods, most IPCA approaches adopt similar strategies via updating the SVD. However, these SVD-based tactics may be impossible to implement for decremental PCA. Lorenzelli and Yao point out that SVD downdating is NP-hard without knowing the right singular vectors. Hall et al. argue that the right singular vectors of the remaining matrix cannot be computed without visiting elements of the right singular vectors of the original matrix. Besides, in many practical applications, such as subspace learning [12, 13], image reconstruction, face recognition, and visual tracking, only the left singular vectors are needed as the projection matrix, so the right singular vectors are usually not stored, to save memory. If the data matrix and the right singular vectors are not preserved, the position information of the deleted points in the queue may be unknowable in the decremental case, which makes the right singular vectors incomputable. The problem of incomplete position information does not arise in incremental PCA, because it is understood that new instances are appended to the tail of the queue.

Based on the demand for incremental decremental learning and the difficulty of decremental learning analyzed above, we introduce a novel online subspace method for simultaneous incremental and decremental learning. The contributions of this paper are as follows.

To avoid the problem of lacking right singular vectors in decremental learning, we utilize EVD instead of SVD and propose a dualdating algorithm for the eigenspace, that is, EVD dualdating, which can accept and delete samples at the same time. Our algorithm transforms the EVD updating and downdating of the covariance matrix into an SVD updating problem plus the EVD of a small autocorrelation matrix. To the best of our knowledge, it is the first attempt at simultaneous incremental decremental subspace learning; it has a simpler, unified mathematical form, which theoretically guarantees better performance than the conventional multiple-step implementation.

Several theoretical and computational analyses are presented to further explore the properties of EVD dualdating, including the essence and geometric explanation of EVD dualdating, extended forms of EVD dualdating for data revising and weighted updating, the computational complexity of EVD dualdating, a mathematical theorem demonstrating the optimality of EVD dualdating in the sequential mode when the data matrix satisfies the low-rank-plus-shift structure, and a selection method for the optimal rank $k$ based on eigenvalues.

It is proved that the change of mean caused by adding or deleting samples in varying-mean PCA can be transformed into adding and deleting several equivalent vectors in zero-mean PCA. Thus, three online PCA algorithms are derived based on EVD dualdating to cope with a changeable mean: incremental PCA (EVDD-IPCA), decremental PCA (EVDD-DPCA), and incremental decremental PCA (EVDD-IDPCA).

The remainder of this paper is organized as follows. Section 2 briefly reviews the updating and downdating methods of both SVD and EVD and incremental PCA. The proposed EVD dualdating algorithm and its analyses are presented in Section 3. In Section 4, EVD dualdating is applied to incremental decremental PCA with mean updating. Section 5 presents the experimental results and comparisons with other approaches. Section 6 concludes this paper. Proofs of lemmas and theorems are given in the Appendix.

2. Related Work

Over the past few decades, many efficient incremental PCA methods have been proposed. Generally, existing IPCA algorithms can be divided into three categories. The first category updates eigenvectors without any matrix decomposition; the typical method is the candid covariance-free IPCA (CCIPCA). The second category updates principal components via EVD updating; the subspace merging and splitting model developed by Hall et al. belongs to this category. With the help of the partitioned R-SVD and SVD updating, the third and most studied category recomputes singular values and singular vectors by sequentially updating the SVD.

Weng et al. propose an incremental PCA method that avoids computing the covariance matrix, the candid covariance-free IPCA (CCIPCA). The CCIPCA algorithm computes principal components sequentially and considers the complementary space of lower-order PCs when calculating higher-order PCs. Because the computation of the $(i+1)$th PC depends on the $i$th one, errors accumulate throughout the process. Besides, the sample-to-dimension ratio must be large enough to avoid problems of statistical estimation, a condition not satisfied in many situations.

Hall et al. develop a merging and splitting eigenspace model (MSES). This online subspace learning algorithm is based on EVD and solves a small eigenproblem on a new orthonormal basis. MSES can update or downdate the EVD by adding or subtracting the eigenspace of the added or deleted samples and is adaptive to changes of the data mean.

Except for these two approaches, other incremental PCA methods are based primarily on SVD. Levy and Lindenbaum propose the sequential Karhunen-Loeve (SKL) method based on the partitioned R-SVD algorithm, which simplifies the SVD of a large data matrix into the SVDs of several small ones via a sequential procedure. This sequentialized partitioned R-SVD algorithm is then used to extract PCs from a sequence of human face images, and a forgetting factor is employed to weaken the effect of old data. However, SKL does not take the data mean into consideration, so its results are not accurate enough for image sequences with a varying mean, such as a human face under changing illumination. Skočaj and Leonardis develop the weighted and robust incremental subspace learning (WR-ISL) algorithm, which can handle a changing mean and weighted data. However, WR-ISL does not consider the chunk updating mode; in other words, only one sample can be processed in each round. Mean updating for multiple samples is solved by Ross et al., who demonstrate that the covariance matrix of the combined data equals the sum of the covariance matrices of the old data and the new data plus an additional correction term when the mean is taken into account. Based on this, Ross et al. obtain an extended SKL algorithm with mean updating, apply it to visual tracking, and successfully locate a human face and a toy with different poses under varying background and illumination, both indoors and outdoors.

Zha and Simon propose a more general mathematical formulation for updating the SVD, namely, SVD updating. This algorithm, applied to LSI, is an efficient incremental method to recalculate the rank-$k$ SVD for updating documents, updating terms, and term weight corrections. Moreover, Zha and Simon prove that if the combined data matrix satisfies the low-rank-plus-shift structure, the result of the SVD updating algorithm applied to the new data and the optimal rank-$k$ approximation of the old data is still an optimal rank-$k$ estimate. Zhao et al. propose a chunk incremental PCA approach via the SVD updating algorithm, known as SVDU-IPCA. Compared to other incremental PCA methods, SVDU-IPCA computes the eigendecomposition of the autocorrelation matrix instead of the covariance matrix; the motivation is that the sample number is usually much smaller than the data dimension in practical applications, so the autocorrelation matrix is also smaller than the covariance matrix. Zhao et al. then derive a strategy to update the eigendecomposition of an autocorrelation matrix by SVD updating. However, the change of mean is not considered in SVDU-IPCA, so it is not suitable for situations with a changing mean. It also suffers from growing storage and computation demands, because the autocorrelation matrix dilates as new data arrive, and an additional process is needed to convert the resulting right singular vectors and the stored data into principal components. Huang et al. propose an improved SVDU-IPCA method that handles changing-mean data and decreases storage, where only a small package of condensed data is saved to calculate the left singular vectors.

Although a great deal of research has addressed incremental subspace learning, research on decremental learning remains inadequate in the literature. The merging and splitting eigenspace model developed by Hall et al. can downdate the EVD to recompute PCs when samples are deleted from the old data; meanwhile, they claim that SVD downdating cannot be achieved in closed form within their model. Brand proposes a fast modification model of the rank-$k$ singular value decomposition (MSVD). As an extension of the term-weight-correction form of SVD updating, MSVD can recompute the rank-$k$ SVD of the modified data matrix after updating, downdating, revising, and recentering. However, this method does not take the mean into consideration, so its results are inaccurate when the data mean is time-varying. Melenchón and Martínez develop a method for downdating, composing, and splitting SVD (DCSSVD) with a changeable mean. DCSSVD first downdates, composes, and splits the right singular vectors, then computes the mean and SVD of the remaining right singular vectors, and finally calculates the resulting SVD. However, this method suffers from a severe efficiency problem, since its core process is the SVD of a $d \times k$ matrix, whose computational complexity is $O(dk\min(d,k))$ and thus still depends on the data dimension. AIPCA, proposed by Wang et al., is a decremental version of the SVDU-IPCA algorithm that recomputes the eigendecomposition of the autocorrelation matrix by MSVD. Although AIPCA achieves decremental subspace learning, it inherits the disadvantages of SVDU-IPCA and MSVD, such as the inability to handle a changing mean, a large memory footprint to preserve the data matrix, and an additional process to convert its results to left singular vectors.

Besides accuracy and efficiency, the most severe problem faced by SVD-based decremental methods is that downdating is NP-hard without the position information of the deleted samples in the data matrix, which may be unobtainable in many practical applications.

3. EVD Dualdating

In Section 3.1, we first briefly review the SVD updating algorithm. Our EVD dualdating algorithm is proposed in Section 3.2. Then, the related analyses are reported in the subsequent subsections.

3.1. SVD Updating

Given a data matrix $A = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$, its SVD is $A = U\Sigma V^T$, where $U \in \mathbb{R}^{d\times d}$, $\Sigma \in \mathbb{R}^{d\times n}$, and $V \in \mathbb{R}^{n\times n}$. The best rank-$k$ approximation of $A$ is

(1) $\hat{A} = U_k \Sigma_k V_k^T$,

where $\Sigma_k$ is the diagonal matrix of the largest $k$ singular values and $U_k$ and $V_k$ are the first $k$ columns of $U$ and $V$, respectively. We call (1) the rank-$k$ singular value decomposition (rank-$k$ SVD) of $A$.
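As a quick numerical illustration (ours, not from the paper), the best rank-$k$ approximation in (1) can be formed by truncating NumPy's SVD; by the Eckart-Young theorem, its Frobenius error equals the energy of the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 12))
k = 5

# Truncate the full SVD to its k leading terms
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# Eckart-Young: the rank-k truncation is the best rank-k approximation,
# with Frobenius error equal to the norm of the discarded singular values
assert np.isclose(np.linalg.norm(A - A_hat, "fro"), np.sqrt(np.sum(s[k:] ** 2)))
```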

When new samples $B = [x_{n+1}, x_{n+2}, \ldots, x_{n+m}] \in \mathbb{R}^{d\times m}$ arrive, how can the rank-$k$ SVD of the new data matrix $A' = [A\ B]$ be computed using only $U_k$, $\Sigma_k$, $V_k$, and $B$?

To solve this problem, Zha and Simon  propose an efficient mathematical tool, namely, SVD updating. Its detailed procedure is described in Algorithm 1.

Algorithm 1: SVD updating.

Input: the rank-$k$ SVD of the old data $A$: $U_k$, $\Sigma_k$, $V_k$; the new data $B$.

Output: the rank-$k$ SVD of the total data $A' = [A\ B]$: $\check{U}_k$, $\check{\Sigma}_k$, $\check{V}_k$.

(1) Compute the QR decomposition $(I_d - U_k U_k^T)B = QR$;

(2) Compute the rank-$k$ SVD $\begin{bmatrix} \Sigma_k & U_k^T B \\ 0 & R \end{bmatrix} = \tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^T$;

(3) The rank-$k$ SVD of $[A\ B]$ is

$\hat{A'} = \big([U_k\ Q]\,\tilde{U}_k\big)\, \tilde{\Sigma}_k\, \Big(\begin{bmatrix} V_k & 0 \\ 0 & I_m \end{bmatrix}\tilde{V}_k\Big)^T$.
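As a sketch, Algorithm 1 can be implemented in a few lines of NumPy. The function name `svd_update` and the convention of passing singular values as a 1-D array are our own choices, not from the paper; the sanity check uses a matrix of exact rank $k$, for which the update is exact.

```python
import numpy as np

def svd_update(Uk, sk, Vk, B):
    """Rank-k SVD of [A B] from the rank-k SVD (Uk, diag(sk), Vk) of A."""
    k, m = len(sk), B.shape[1]
    # Step 1: QR of the component of B orthogonal to span(Uk)
    Q, R = np.linalg.qr(B - Uk @ (Uk.T @ B))
    # Step 2: SVD of the small (k+m) x (k+m) middle matrix
    H = np.block([[np.diag(sk), Uk.T @ B],
                  [np.zeros((m, k)), R]])
    Uh, sh, Vht = np.linalg.svd(H)
    # Step 3: rotate back onto the enlarged bases and truncate to rank k
    U_new = np.hstack([Uk, Q]) @ Uh[:, :k]
    n = Vk.shape[0]
    Vaug = np.block([[Vk, np.zeros((n, m))],
                     [np.zeros((m, k)), np.eye(m)]])
    V_new = Vaug @ Vht.T[:, :k]
    return U_new, sh[:k], V_new

# Sanity check: A is exactly rank k, so the updated rank-k SVD of [A B]
# must match the batch rank-k SVD of [A B]
rng = np.random.default_rng(0)
d, n, m, k = 30, 10, 4, 3
A = rng.standard_normal((d, k)) @ rng.standard_normal((k, n))
B = rng.standard_normal((d, m))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_new, s_new, V_new = svd_update(U[:, :k], s[:k], Vt[:k].T, B)
assert np.allclose(s_new, np.linalg.svd(np.hstack([A, B]), compute_uv=False)[:k])
```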

3.2. EVD Dualdating

In this section, a thorough discussion of the proposed dualdating algorithm for EVD is presented. Dualdating means updating and downdating together; in other words, we consider the situation of adding and deleting samples simultaneously.

Given a data matrix $A = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d\times n}$, its SVD is $A = U\Sigma V^T$. Let $S_A = AA^T \in \mathbb{R}^{d\times d}$ be the covariance matrix (in this section, we do not distinguish the covariance matrix from the scatter matrix, since they differ only by the coefficient $1/n$); then $S_A$ is a symmetric positive semidefinite matrix, and its EVD is $S_A = U\Lambda U^T$, where $\Lambda = \Sigma\Sigma^T$. The best rank-$k$ approximation of $S_A$ is

(2) $\hat{S}_A = U_k \Lambda_k U_k^T$,

where $U_k$ consists of the first $k$ columns of $U$ and $\Lambda_k$ is a diagonal matrix with the largest $k$ eigenvalues in $\Lambda$. For any matrix $A \in \mathbb{R}^{d\times n}$, we call (2) the rank-$k$ eigenvalue decomposition (rank-$k$ EVD) of $A$ or $S_A$.

Now suppose some old samples $D$ are to be deleted, where $D$ may consist of arbitrary $p$ ($p < n$) columns of $A$. Without loss of generality, let $D$ be the last $p$ columns: $D = [x_{n+1-p}, \ldots, x_n] \in \mathbb{R}^{d\times p}$. Meanwhile, new instances $B = [x_{n+1}, \ldots, x_{n+m}] \in \mathbb{R}^{d\times m}$ become available. We are interested in how to express the rank-$k$ EVD of the final data matrix $A' = [x_1, \ldots, x_{n-p}, x_{n+1}, \ldots, x_{n+m}] \in \mathbb{R}^{d\times(n-p+m)}$ as modifications to $U_k$, $\Lambda_k$ via $D$ and $B$.

3.2.1. Basic Procedure

The basic procedure of the proposed EVD dualdating algorithm is as follows. Let

(3) $\Phi = \begin{bmatrix} I_n & & \\ & I_m & \\ & & -I_p \end{bmatrix} \in \mathbb{R}^{(n+m+p)\times(n+m+p)}$.

Thus, the covariance matrix of $A'$ can be written as

(4) $A'A'^T = AA^T + BB^T - DD^T = [A\ B\ D]\,\Phi\,[A\ B\ D]^T$.

The basic idea of EVD dualdating is to transform the dualdating problem into an SVD updating problem plus an extra process with a small computational cost. First, consider the matrix $[A\ B\ D]$. Knowing $U_k$ and $\Sigma_k = \Lambda_k^{1/2}$, we assume that the right singular vectors in the rank-$k$ SVD of $A$ are $V_k$. Then the rank-$k$ SVD of $[A\ B\ D]$ can be calculated by the SVD updating algorithm:

(5) $\widehat{[A\ B\ D]} = \check{U}_k \check{\Sigma}_k \check{V}_k^T$,

where $\check{U}_k \in \mathbb{R}^{d\times k}$, $\check{\Sigma}_k \in \mathbb{R}^{k\times k}$, and $\check{V}_k \in \mathbb{R}^{(n+m+p)\times k}$.

Substituting (5) into (4), we have

(6) $A'A'^T = \check{U}_k \check{\Sigma}_k \check{V}_k^T\, \Phi\, \check{V}_k \check{\Sigma}_k^T \check{U}_k^T$.

Let

(7) $\Psi = \check{\Sigma}_k \check{V}_k^T\, \Phi\, \check{V}_k \check{\Sigma}_k^T \in \mathbb{R}^{k\times k}$.

Because $A'A'^T$ is a symmetric positive semidefinite matrix, $\Psi$ is also symmetric positive semidefinite. Usually $k \ll d$, so $\Psi$ is a small matrix compared to the covariance matrix of $A'$. The EVD of $\Psi$ is

(8) $\Psi = P\Lambda P^T$,

where $P \in \mathbb{R}^{k\times k}$ and $\Lambda \in \mathbb{R}^{k\times k}$ is the diagonal eigenvalue matrix.

Finally, the rank-$k$ EVD of $A'A'^T$ is

(9) $A'A'^T = (\check{U}_k P)\,\Lambda\,(\check{U}_k P)^T$.

Through (3) to (9), we have converted the dualdating problem of EVD into an SVD updating problem of adding $m+p$ samples plus the EVD of a small $k\times k$ matrix.
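The identity (4) underlying this step is easy to verify numerically (a toy check of ours): dualdating the covariance is equivalent to multiplying the concatenated matrix $[A\ B\ D]$ by the signed identity $\Phi$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_keep, p, m = 10, 6, 2, 3
C = rng.standard_normal((d, n_keep))   # columns that stay
D = rng.standard_normal((d, p))        # columns to delete
B = rng.standard_normal((d, m))        # columns to add

A = np.hstack([C, D])                  # old data with n = n_keep + p columns
A_new = np.hstack([C, B])              # final data after dualdating

# Phi = diag(I_n, I_m, -I_p), matching the column order of [A B D]
phi = np.concatenate([np.ones(n_keep + p + m), -np.ones(p)])
X = np.hstack([A, B, D])

assert np.allclose(A_new @ A_new.T, X @ (phi[:, None] * X.T))
assert np.allclose(A_new @ A_new.T, A @ A.T + B @ B.T - D @ D.T)
```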

3.2.2. Further Simplification

Although the basic procedure of our EVD dualdating algorithm has been given, one problem remains unsolved: the assumed right singular vectors $V_k$ are unobtainable. Here we address this problem by simplifying the computation of $\Psi$.

Consider the results of the SVD updating algorithm applied to the rank-$k$ SVD of $[A\ B\ D]$:

(10) $\check{U}_k = [U_k\ Q]\,\tilde{U}_k$, $\quad \check{\Sigma}_k = \tilde{\Sigma}_k$, $\quad \check{V}_k = \begin{bmatrix} V_k & & \\ & I_m & \\ & & I_p \end{bmatrix}\tilde{V}_k$,

where $U_k \in \mathbb{R}^{d\times k}$, $Q \in \mathbb{R}^{d\times(m+p)}$, $\tilde{U}_k \in \mathbb{R}^{(k+m+p)\times k}$, $\check{U}_k \in \mathbb{R}^{d\times k}$, $\tilde{\Sigma}_k \in \mathbb{R}^{k\times k}$, $V_k \in \mathbb{R}^{n\times k}$, $\tilde{V}_k \in \mathbb{R}^{(k+m+p)\times k}$, and $\check{V}_k \in \mathbb{R}^{(n+m+p)\times k}$.

Substituting (10) into (7), we have

(11) $\Psi = \tilde{\Sigma}_k \tilde{V}_k^T \begin{bmatrix} V_k^T V_k & & \\ & I_m & \\ & & -I_p \end{bmatrix} \tilde{V}_k \tilde{\Sigma}_k^T = \tilde{\Sigma}_k \tilde{V}_k^T \begin{bmatrix} I_k & & \\ & I_m & \\ & & -I_p \end{bmatrix} \tilde{V}_k \tilde{\Sigma}_k^T$.

From (11), it can be seen that the right singular vectors V k are actually not needed, and the computation of Ψ is simplified.

3.2.3. Algorithm

The detailed procedure of EVD dualdating has been presented above. To sum up, the pseudocode of our EVD dualdating algorithm is described in Algorithm 2. To achieve pure updating or downdating of EVD, one only needs to let $B$ or $D$ be the empty set. From the computational process, EVD dualdating degenerates to standard SVD updating in the pure updating mode.

Algorithm 2: EVD dualdating.

Input: the rank-$k$ EVD of the old data $A \in \mathbb{R}^{d\times n}$: $U_k$, $\Lambda_k$; the deleted data $D \in \mathbb{R}^{d\times p}$; the added data $B \in \mathbb{R}^{d\times m}$.

Output: the rank-$k$ EVD of the remaining data $A' \in \mathbb{R}^{d\times(n+m-p)}$: $U'_k$, $\Lambda'_k$.

(1) Let the equivalent adding data matrix be $B' = [B\ D]$.

(2) Compute the QR decomposition $(I_d - U_k U_k^T)B' = QR$.

(3) Set $\Sigma_k = \Lambda_k^{1/2}$; compute the rank-$k$ SVD $\begin{bmatrix} \Sigma_k & U_k^T B' \\ 0 & R \end{bmatrix} = \tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^T$.

(4) Let $\Phi = \begin{bmatrix} I_k & & \\ & I_m & \\ & & -I_p \end{bmatrix}$; calculate $\Psi = \tilde{\Sigma}_k \tilde{V}_k^T \Phi \tilde{V}_k \tilde{\Sigma}_k^T$.

(5) Compute the EVD of $\Psi$: $\Psi = P\Lambda P^T$.

(6) The rank-$k$ EVD of $A'$ is

$U'_k = \big([U_k\ Q]\,\tilde{U}_k\big) P$, $\qquad \Lambda'_k = \Lambda$.
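Algorithm 2 can be sketched in NumPy as follows. This is our own illustration; the function name `evd_dualdate` and the convention of passing eigenvalues as a 1-D array are assumptions, and the check uses data whose columns all lie in a common $k$-dimensional subspace, so that the rank-$k$ representation is exact and the dualdated result must match the batch EVD.

```python
import numpy as np

def evd_dualdate(Uk, Lk, B, D):
    """Sketch of Algorithm 2: update the rank-k EVD (Uk, Lk) of A A^T
    after adding columns B and deleting columns D. Lk is a 1-D array."""
    k = len(Lk)
    m, p = B.shape[1], D.shape[1]
    Bp = np.hstack([B, D])                      # equivalent adding matrix [B D]
    Q, R = np.linalg.qr(Bp - Uk @ (Uk.T @ Bp))  # QR of residual outside span(Uk)
    H = np.block([[np.diag(np.sqrt(Lk)), Uk.T @ Bp],
                  [np.zeros((m + p, k)), R]])
    Uh, sh, Vht = np.linalg.svd(H)
    Uhk, shk, Vhk = Uh[:, :k], sh[:k], Vht[:k].T     # rank-k SVD of H
    phi = np.concatenate([np.ones(k + m), -np.ones(p)])  # diag(I_k, I_m, -I_p)
    Psi = np.diag(shk) @ Vhk.T @ (phi[:, None] * Vhk) @ np.diag(shk)
    lam, P = np.linalg.eigh(Psi)
    lam, P = lam[::-1], P[:, ::-1]              # sort eigenvalues descending
    return np.hstack([Uk, Q]) @ Uhk @ P, lam

# Verification: all columns live in one k-dimensional subspace
rng = np.random.default_rng(0)
d, k = 25, 4
basis = rng.standard_normal((d, k))
C = basis @ rng.standard_normal((k, 8))   # columns that stay
D = basis @ rng.standard_normal((k, 3))   # columns to delete
B = basis @ rng.standard_normal((k, 5))   # columns to add
A = np.hstack([C, D])
evals, evecs = np.linalg.eigh(A @ A.T)
Uk, Lk = evecs[:, ::-1][:, :k], evals[::-1][:k]
U_new, L_new = evd_dualdate(Uk, Lk, B, D)
E = np.hstack([C, B])
# Dualdated eigenvalues agree with the batch EVD of the final data
assert np.allclose(L_new, np.linalg.eigvalsh(E @ E.T)[::-1][:k])
```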

3.3. Analysis of EVD Dualdating

In this section, we first analyze the mechanism of EVD dualdating for incremental and decremental learning. Second, some extended forms of EVD dualdating are given for particular uses. Third, the computational complexity of the proposed EVD dualdating algorithm is presented. Fourth, the optimality of EVD dualdating in sequential usage is demonstrated. Finally, we discuss how to determine the optimal rank $k$.

3.3.1. Mechanism of Incremental and Decremental Learning

For convenience, when analysing the essence of incremental and decremental learning based on EVD dualdating, we only consider the pure updating or downdating situation and denote the changed matrix as B R d × m in both situations.

According to the procedure of EVD dualdating, the two key decompositions are the SVD updating of the equivalent adding matrix $[A\ B]$ and the EVD of the small matrix $\Psi$. In the SVD updating algorithm, the core step is the rank-$k$ SVD of the following matrix:

(12) $H = \begin{bmatrix} \Sigma_k & U_k^T B \\ 0 & R \end{bmatrix} = \tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^T$,

where $\Sigma_k$ is the diagonal variance matrix of the original data $A$, $U_k^T B$ is the coefficient matrix obtained by projecting $B$ onto the subspace spanned by $U_k$, and $R$ is the upper triangular reconstruction error matrix of $B$. Thus, in $H$, the left $k$ columns represent the original data, and the remaining $m$ columns represent the added or deleted data.

Then, partition the rows of $\tilde{V}_k$ into two blocks:

(13) $\tilde{V}_k = \begin{bmatrix} \tilde{V}_{k1} \\ \tilde{V}_{k2} \end{bmatrix}$,

where $\tilde{V}_{k1}$ consists of the first $k$ rows of $\tilde{V}_k$ and $\tilde{V}_{k2}$ of the last $m$ rows. Thus, $\tilde{V}_{k1}$ and $\tilde{V}_{k2}$ stand for the old data and the changed data, respectively.

So the matrix $\Psi$ can be written as

(14) $\Psi = \tilde{\Sigma}_k \begin{bmatrix} \tilde{V}_{k1}^T & \tilde{V}_{k2}^T \end{bmatrix} \begin{bmatrix} I_k & \\ & \pm I_m \end{bmatrix} \begin{bmatrix} \tilde{V}_{k1} \\ \tilde{V}_{k2} \end{bmatrix} \tilde{\Sigma}_k^T = \tilde{\Sigma}_k \tilde{V}_{k1}^T \tilde{V}_{k1} \tilde{\Sigma}_k^T \pm \tilde{\Sigma}_k \tilde{V}_{k2}^T \tilde{V}_{k2} \tilde{\Sigma}_k^T$.

Now, let us view the situation geometrically, as shown in Figure 1. On the left is the column space of the data. The red arrows represent the orthogonal basis $U_k$ of the old subspace. The green arrow is the added or deleted sample set $B$, whose projection on $U_k$ and reconstruction error are the green dashed arrows. After QR decomposition, a new basis of the extended subspace is made up of the red and pink arrows. However, the projection of the data matrix on this basis is not completely diagonalized, so the SVD of the coefficient matrix $H$ on this basis is executed to obtain the diagonalizing matrix. The new orthogonal basis after adding $B$ is then represented by the blue arrows. The row space of $H$ is drawn on the right of the figure. The black and pink arrows compose a standard orthogonal basis, where the black ones correspond to the old samples and the green one to the changed samples. The blue arrow represents the orthogonal vectors $\tilde{V}_k\tilde{\Sigma}_k^T$ in the row space of $H$ after adding $B$ via SVD updating. Because EVD dualdating first adds the samples $B$ no matter whether the case is incremental or decremental, it needs to make an adjustment in the row space of $H$: in the case of deletion, the elements corresponding to the changed samples are sign-changed. As shown in Figure 1, the component marked by the cyan dashed arrow is reversed. From (14), $\Psi$ is in fact the sum or difference of the autocorrelation matrices of the old data $\tilde{V}_{k1}\tilde{\Sigma}_k^T$ and the changed data $\tilde{V}_{k2}\tilde{\Sigma}_k^T$. According to the relationship between the column and row spaces, the EVD of $\Psi$ yields the new rotation matrix $P$ in the data space. Finally, the resulting orthogonal basis $U'_k$ is given by the orange arrows.

Figure 1: Visualization of EVD dualdating.

To sum up, the aim of EVD dualdating is to obtain the projection matrix after a change of the sample set, and its essence is to transform the EVD of a varying covariance matrix in the data space into the EVD of a varying autocorrelation matrix in a dimension-reduced row space.

3.3.2. Extendibility of EVD Dualdating

From the derivation of EVD dualdating, it can be seen that almost no restriction is imposed on $B$, $D$, and $\Phi$. In the downdating mode, the procedure remains feasible even if $D$ is not composed of columns of $A$. Meanwhile, $\Phi$ can be selected as any matrix of matching dimension. The only condition to be satisfied is that $\Psi$ must be a positive semidefinite matrix. Viewed this way, our EVD dualdating algorithm has favorable extendibility.

The standard dualdating mode of EVD dualdating is adding and deleting samples synchronously. As mentioned before, when $B$ or $D$ is the empty set, EVD dualdating works in the pure incremental or decremental mode. When $m = p$, EVD dualdating can be seen as data revising. Another useful extension is forgetting updating, the so-called weighted updating, which is very important for online applications. In the learning procedure, prior instances should be assigned a low weight since they become antiquated as time goes on; without a proper weighting mechanism, the contribution of too many similar old samples can become so prominent that new instances seem meaningless. In , the forgetting factor is used to weaken the effect of old images of the tracked object. Via EVD dualdating, a concise but meticulous weighting formula can be acquired, in which the weight of an arbitrary sample can be modulated, similar to the way adopted in . The sole operation is modifying $\Phi$ as follows:

(15) $\Phi = \begin{bmatrix} w_1 I_k & \\ & W \end{bmatrix}$, $\quad W = \begin{bmatrix} w_2 & & \\ & \ddots & \\ & & w_{m+1} \end{bmatrix}$.

The equivalent data matrix of weighted updating is $[\sqrt{w_1}\,A\ \ B\sqrt{W}]$. The detailed expansions of EVD dualdating are listed in Table 1.

Table 1: Expansions of EVD dualdating.

Mode | Old data | New data | $\Phi$ | Miscellanea
Update | $A_{d\times n}$ | $[A_{d\times n}\ B_{d\times m}]$ | $\mathrm{diag}(I_k, I_m)$ |
Downdate | $[C_{d\times(n-p)}\ D_{d\times p}]$ | $C_{d\times(n-p)}$ | $\mathrm{diag}(I_k, -I_p)$ |
Dualdate | $[C_{d\times(n-p)}\ D_{d\times p}]$ | $[C_{d\times(n-p)}\ B_{d\times m}]$ | $\mathrm{diag}(I_k, I_m, -I_p)$ |
Revise | $[C_{d\times(n-p)}\ D_{d\times p}]$ | $[C_{d\times(n-p)}\ B_{d\times p}]$ | $\mathrm{diag}(I_k, I_p, -I_p)$ |
Weighted update | $A_{d\times n}$ | $[\sqrt{w_1}\,A_{d\times n}\ \ B_{d\times m}\sqrt{W}]$ | $\mathrm{diag}(w_1 I_k, W)$ | $W = \mathrm{diag}(w_2, \ldots, w_{m+1})$
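The weighted-update row can be sanity-checked numerically (our own illustration, with arbitrarily chosen weights): under $\Phi = \mathrm{diag}(w_1 I, W)$, each sample's outer product is scaled by its weight, which is equivalent to scaling the sample itself by the square root of its weight.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 8, 5, 3
A = rng.standard_normal((d, n))
B = rng.standard_normal((d, m))
w1 = 0.9                             # forgetting factor on the old data
w = np.array([1.0, 0.8, 0.5])        # per-sample weights w_2, ..., w_{m+1}

# Weighted covariance: w1 * A A^T + sum_j w_j b_j b_j^T
S = w1 * (A @ A.T) + (B * w) @ B.T

# Equivalent data matrix: scale each sample by the square root of its weight
Aeq = np.hstack([np.sqrt(w1) * A, B * np.sqrt(w)])
assert np.allclose(S, Aeq @ Aeq.T)
```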
3.3.3. Computation Complexity of EVD Dualdating

Before the analysis, we define some notation to simplify the presentation: $\mathrm{QR}(m,n)$ and $\mathrm{SVD}(m,n)$ stand for the QR and SVD decompositions of an $m\times n$ rectangular matrix, and $\mathrm{QR}(m)$, $\mathrm{SVD}(m)$, and $\mathrm{EVD}(m)$ stand for the QR, SVD, and EVD decompositions of an $m\times m$ square matrix. The computation of the proposed EVD dualdating algorithm is composed of four parts: the QR of $(I_d - U_kU_k^T)B'$, the SVD of $H$, the EVD of $\Psi$, and other multiplication operations, including calculating the reconstruction error, $\Psi$, and $U'_k$. Because the first $k$ columns of the basis are already orthonormal, the QR decomposition actually operates only on the last $m+p$ columns. The computational complexity is presented in Table 2, split into two parts: matrix decomposition and transformation cost.

Table 2: Computation complexity of EVD dualdating and other updating/downdating methods.

Method | Updating: decomposition | Updating: transformation | Downdating: decomposition | Downdating: transformation
MSES | $\mathrm{SVD}(d,m) + \mathrm{QR}(d,m) + \mathrm{EVD}(k+m)$ | $O(d(k+m)^2)$ | $\mathrm{SVD}(d,p) + \mathrm{EVD}(k)$ | $O(dk(k+m))$
DCSSVD | — | — | $\mathrm{QR}(n-p,k) + \mathrm{SVD}(d,k)$ | $O((d+n)k^2)$
MSVD | $\mathrm{QR}(d,m) + \mathrm{QR}(n+m,m) + \mathrm{SVD}(k+m)$ | $O((d+n+m)(k+m)^2)$ | $\mathrm{QR}(d,p) + \mathrm{QR}(n,p) + \mathrm{SVD}(k+p)$ | $O((d+n)(k+p)^2)$
AIPCA | $\mathrm{QR}(k+m,m) + \mathrm{QR}(n+k,m) + \mathrm{SVD}(k+m)$ | $O((d+k+m)(n+m)(k+m))$ | $\mathrm{QR}(k,p) + \mathrm{QR}(n,p) + \mathrm{SVD}(k+p)$ | $O(dn(k+p))$

Method | Dualdating: decomposition | Dualdating: transformation
EVDD | $\mathrm{QR}(d,m+p) + \mathrm{SVD}(k+m+p) + \mathrm{EVD}(k)$ | $O(dk(k+m+p))$

In the pure updating or downdating mode, our EVD dualdating algorithm performs one more matrix decomposition than other pure updating and downdating methods, which may make EVD dualdating slower. But taking the dimensions and the transformation cost into account, the efficiency of EVD dualdating is close to, or even better than, that of other methods. The main advantage of our algorithm shows in the dualdating mode: as the only method achieving simultaneous updating and downdating, EVD dualdating avoids many repeated processes and decreases the cumulative error. An experimental comparison of efficiency and accuracy between EVD dualdating and other incremental and decremental methods is presented in Section 5.

3.4. Justification of the Sequential Usage of $U_k$ and $\Lambda_k$

In many online applications, it is impossible to store the original data because of the limitations of the physical medium and considerations of efficiency. Described mathematically, this means that the original data matrix $A$ is unobtainable and is replaced by its best rank-$k$ approximation, which can be calculated from $U_k$ and $\Lambda_k$. It is therefore essential to demonstrate the effectiveness of EVD dualdating in a sequential process.

Zha and Simon prove that when the combined matrix satisfies the low-rank-plus-shift structure, SVD updating remains optimal when $A$ is replaced by its best rank-$k$ approximation. Here, a theoretical demonstration is given to illustrate that if the whole data matrix satisfies the low-rank-plus-shift structure, the result of EVD dualdating after any adding or deleting operations is also an optimal rank-$k$ estimation. First, we state Lemma 1 without proof.

Lemma 1.

Let $A \in \mathbb{R}^{d\times n}$, $d > n$, with EVD $S_A = AA^T = U_A \Lambda_A U_A^T = \sum_{i=1}^n \lambda_{Ai} u_{Ai} u_{Ai}^T$, where $u_{Ai}$ and $\lambda_{Ai}$ are the $i$th eigenvector and eigenvalue, respectively. Then for $p > k$, one has

(16) $\operatorname{best}_k(S_A) = \operatorname{best}_k\!\left(S_A - \sum_{i=p}^n \lambda_{Ai} u_{Ai} u_{Ai}^T\right)$.

The lemma above indicates that, for the rank-$k$ EVD, it is safe to cut off the minor eigenspaces without affecting optimality. With this, we discuss why, under the low-rank-plus-shift structure, when $A$ is replaced by $\operatorname{best}_k(A)$, the discarded information will also be discarded after EVD dualdating. The conclusion is summarized in Theorem 2, whose proof can be found in the Supplementary Material (available online at http://dx.doi.org/10.1155/2014/429451).
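Lemma 1 can also be checked numerically (a small illustration of ours): removing the trailing eigencomponents of $S_A$ leaves its best rank-$k$ approximation unchanged.

```python
import numpy as np

def best_k(S, k):
    """Best rank-k approximation of a symmetric PSD matrix via its EVD."""
    w, V = np.linalg.eigh(S)
    w, V = w[::-1], V[:, ::-1]
    return V[:, :k] @ np.diag(w[:k]) @ V[:, :k].T

rng = np.random.default_rng(0)
A = rng.standard_normal((12, 7))
S = A @ A.T                       # rank 7
w, V = np.linalg.eigh(S)
w, V = w[::-1], V[:, ::-1]

k, p = 3, 5                       # p > k: discard eigencomponents i = p, ..., n
S_cut = S - sum(w[i] * np.outer(V[:, i], V[:, i]) for i in range(p - 1, 7))

# Dropping minor eigenspaces does not change the best rank-k approximation
assert np.allclose(best_k(S, k), best_k(S_cut, k))
```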

Theorem 2.

Given a matrix A ∈ R d × n with its best- k approximation A ^ , the deleted data D ∈ R d × p from A , and the added data B ∈ R d × m , d > n > p . Let C = [ A D _ ] be the data remaining from A , F = [ C D B ] the full data, and E = [ C B ] the final data, where the underline denotes deletion. Let C ^ = [ best k ( A ) D _ ] be the matrix remaining after deleting the columns corresponding to D from A 's best- k approximation, and let E ^ = [ C ^ B ] be the final data from A ^ . Assume that F satisfies the low-rank-plus-shift structure; that is, (17) F T F = X F + σ 2 I n , σ > 0 , where X F is symmetric and positive semidefinite with rank ( X F ) = k ; then (18) best k ( S E ) = best k ( S ^ E ) .

3.5. Criterion for the Optimal Rank <inline-formula> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M325"> <mml:mrow> <mml:mi>k</mml:mi></mml:mrow> </mml:math></inline-formula> Selection

In the deduction above, the rank of the subspace is assumed to be a fixed number k ; however, the optimal dimension of the subspace depends on a priori information, which may be unknown in practical applications. Based on the fact that the bulk of the variability of a given dataset can be captured by the top few eigenvectors, we introduce an eigenvalue-based method to determine the best rank k of the subspace during the online learning procedure.

Supposing the truncation operation is not yet executed in steps 4 and 6 of Algorithm 2, the rank of the obtained U k and Λ k is k + m + p . First, we define the rate r i : (19) r i = ∑ j = 1 i λ j / ∑ j = 1 k + m + p λ j , which indicates the proportion of the first i eigenvalues among all eigenvalues. Then, the best dimension k can be selected as the minimum number i whose rate exceeds some threshold: (20) k = min { i ∣ r i ≥ T r } , where the threshold T r is a value in ( 0,1 ) . In the batch mode, the threshold depends only on the proportion of information to be preserved. For EVD dualdating, because the rank- k estimation and truncation are performed in every round, the threshold reflects the ratio of retained to new information. In a practical implementation, it can be chosen according to the chunk size of added and deleted samples.
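The criterion (19)–(20) can be sketched in a few lines of NumPy (the function name and the example eigenvalues are ours):

```python
import numpy as np

def select_rank(eigvals, T_r):
    """Smallest i (1-based) with r_i = (λ_1+...+λ_i) / (λ_1+...+λ_total) >= T_r."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # descending eigenvalues
    r = np.cumsum(lam) / lam.sum()                          # the rates r_i of (19)
    return int(np.searchsorted(r, T_r)) + 1                 # criterion (20)

# example: four eigenvalues, keep 90% of the spectrum energy
assert select_rank([4.0, 3.0, 2.0, 1.0], 0.90) == 3         # r = [0.4, 0.7, 0.9, 1.0]
```

In an online round, `eigvals` would be the k + m + p eigenvalues left before truncation, and the returned value is the new kept rank.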

4. Incremental Decremental PCA Based on EVD Dualdating

In the deduction of EVD dualdating in Section 3.2, the mean of the samples is not considered, but in practical applications, centralization is a necessary step to reduce environmental effects. In this section, we first provide a brief review of PCA. Then, we take the mean into account and propose three online subspace learning algorithms: EVDD-IPCA, EVDD-DPCA, and EVDD-IDPCA.

4.1. Principal Component Analysis

Principal component analysis (PCA) is one of the most popular multivariate analysis and dimension reduction methods. The goal of PCA is to find an orthonormal basis, the so-called principal components, which has the best reconstruction performance in the sense of minimum mean squared error (MMSE).

Given a data matrix A = { x 1 , x 2 , … , x n } ∈ R d × n , the covariance matrix of A is defined by S A = ( 1 / n ) ∑ i = 1 n ( x i - μ A ) ( x i - μ A ) T . Principal components (PCs) are the first k eigenvectors { u 1 , … , u k } corresponding to the largest k eigenvalues { λ 1 , … , λ k } . Let U = { u 1 , … , u k } , Λ = diag ( λ 1 , … , λ k ) ; then U and Λ can be obtained from the EVD of the covariance matrix, S A = U Λ U T . Another way of solving PCA is to compute the SVD of the centralized data matrix A - μ A 1 n = U Σ V T , where 1 n stands for a 1 × n all-ones row vector, each column of the left singular vector matrix U is a principal component, and Σ = ( n Λ ) 1 / 2 is the singular value matrix.
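The equivalence of the two routes can be illustrated with a small NumPy sketch (the sizes are illustrative): the eigenvectors of the covariance matrix match the left singular vectors of the centered data up to sign, and the singular values satisfy Σ² = nΛ.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 5, 40, 3
A = rng.standard_normal((d, n))
mu = A.mean(axis=1, keepdims=True)

# Route 1: EVD of the covariance matrix S_A
S = (A - mu) @ (A - mu).T / n
lam, U_evd = np.linalg.eigh(S)
lam, U_evd = lam[::-1], U_evd[:, ::-1]       # descending order

# Route 2: SVD of the centered data matrix
U_svd, sig, _ = np.linalg.svd(A - mu, full_matrices=False)

assert np.allclose(sig ** 2, n * lam)        # Σ² = nΛ
for i in range(k):                           # leading PCs agree up to sign
    assert np.isclose(abs(U_evd[:, i] @ U_svd[:, i]), 1.0)
```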

4.2. Incremental and Decremental PCA

When confronting a huge dataset of high dimension, batch-mode methods, whether EVD- or SVD-based, cost tremendous time and storage. Besides, an online learning system often faces the awkward circumstance that not all instances are available before training, or that some expired instances need to be deleted after training. Obviously, these problems exceed the ability of batch-mode PCA. Incremental and decremental PCA is a natural solution.

In this section, we consider EVD dualdating with a time-varying mean and deduce the incremental decremental PCA formulas based on EVD dualdating. As mentioned before, EVD dualdating degenerates into SVD updating without right singular vectors in the updating mode, so EVDD-IPCA is actually the same as the extended sequential KL algorithm. Nonetheless, we still present it in this paper for completeness. The interested reader can find more details in the reference paper .

The key idea of the EVDD-based incremental and decremental PCA algorithms is to centralize the original samples, the added samples, and the deleted samples separately and to utilize several mean-revising vectors to keep the covariance matrix equal to the original one. The methods of determining these mean-revising vectors are introduced in Lemmas 3, 4, and 5. For incremental or decremental PCA, there is only one mean-revising vector, denoted the equivalent added vector x a or the equivalent deleted vector x d , respectively, which is directly proportional to the difference between the original mean and the changed sample mean. For incremental decremental PCA, the situation is a little more complex. Because of the cross terms, there are three mean-revising vectors: two equivalent added vectors x a 1 , x a 2 and one equivalent deleted vector x d . Based on these lemmas, the proposed EVDD-IPCA, EVDD-DPCA, and EVDD-IDPCA algorithms are presented in Algorithms 3, 4, and 5.

<bold>Algorithm 3: </bold>EVDD-based incremental PCA (EVDD-IPCA).

Input: The sample number, mean, eigenvectors, and eigenvalues of the old data, n , μ A , U k , Λ k , and the sample number and mean of the new samples B , m , μ B .

Output: The sample number, mean, eigenvectors, and eigenvalues of the new data, n ′ , μ ′ , U k ′ , Λ k ′ .

(1) Update the sample number and mean, n ′ = n + m , μ ′ = ( n / ( n + m ) ) μ A + ( m / ( n + m ) ) μ B .

(2) Compute the extra added sample, x a = √ ( n m / ( n + m ) ) ( μ A - μ B ) .

(3) Equivalent added data matrix, B ^ = [ B - μ B 1 m , x a ] .

(4) Compute U k ′ , Λ k ′ via EVD dualdating with U k , Λ k and the added data B ^ .

<bold>Algorithm 4: </bold>EVDD-based decremental PCA (EVDD-DPCA).

Input: The sample number, mean, eigenvectors, and eigenvalues of the old data, n , μ A , U k , Λ k , and the sample number and mean of the deleted samples D , p , μ D .

Output: The sample number, mean, eigenvectors, and eigenvalues of the remaining data, n ′ , μ ′ , U k ′ , Λ k ′ .

(1) Update the sample number and mean, n ′ = n - p , μ ′ = ( n / ( n - p ) ) μ A - ( p / ( n - p ) ) μ D .

(2) Compute the extra deleted sample, x d = √ ( n p / ( n - p ) ) ( μ A - μ D ) .

(3) Equivalent deleted data matrix, D ^ = [ D - μ D 1 p , x d ] .

(4) Compute U k ′ , Λ k ′ via EVD dualdating with U k , Λ k and the deleted data D ^ .

<bold>Algorithm 5: </bold>EVDD-based incremental decremental PCA (EVDD-IDPCA).

Input: The sample number, mean, eigenvectors, and eigenvalues of the old data, n , μ A , U k , Λ k , the sample number and mean of the added samples B , m , μ B , and the sample number and mean of the deleted samples D , p , μ D .

Output: The sample number, mean, eigenvectors, and eigenvalues of the remaining data, n ′ , μ ′ , U k ′ , Λ k ′ .

(1) Update the sample number and mean, n ′ = n + m - p , μ ′ = ( n μ A + m μ B - p μ D ) / ( n + m - p ) .

(2) Compute the extra samples, x a 1 = √ n ( μ A - μ ′ ) , x a 2 = √ m ( μ B - μ ′ ) , x d = √ p ( μ D - μ ′ ) .

(3) Equivalent added data matrix, B ^ = [ B - μ B 1 m , x a 1 , x a 2 ] .

(4) Equivalent deleted data matrix, D ^ = [ D - μ D 1 p , x d ] .

(5) Compute U k ′ , Λ k ′ via EVD dualdating with U k , Λ k , the added data B ^ , and the deleted data D ^ .
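The steps of Algorithm 5 can be sketched in NumPy as follows. This is our own illustrative rendering, not the authors' implementation: it carries the scatter-matrix eigenpairs, forms the equivalent matrices B̂ and D̂ of Lemma 5, and performs the dualdating core as one QR of the joint residual plus one small symmetric EVD; all names are ours, and truncation back to rank k is left to the caller.

```python
import numpy as np

def evdd_idpca(n, mu, U, lam, B, D):
    """One round of simultaneous add/delete on scatter eigenpairs S ≈ U diag(lam) U^T."""
    m, p = B.shape[1], D.shape[1]
    mu_B, mu_D = B.mean(axis=1), D.mean(axis=1)

    # Step 1: update the sample count and mean
    n_new = n + m - p
    mu_new = (n * mu + m * mu_B - p * mu_D) / n_new

    # Step 2: mean-revising vectors of Lemma 5
    xa1 = np.sqrt(n) * (mu - mu_new)
    xa2 = np.sqrt(m) * (mu_B - mu_new)
    xd = np.sqrt(p) * (mu_D - mu_new)

    # Steps 3-4: equivalent added/deleted data matrices
    B_hat = np.column_stack([B - mu_B[:, None], xa1, xa2])
    D_hat = np.column_stack([D - mu_D[:, None], xd])

    # Step 5 (dualdating core): S' = U diag(lam) U^T + B̂ B̂^T - D̂ D̂^T
    C = np.column_stack([B_hat, D_hat])
    P = U.T @ C                        # in-subspace coordinates
    Q, R = np.linalg.qr(C - U @ P)     # orthonormal basis of the joint residual
    W = np.vstack([P, R])              # [B̂ D̂] expressed in the span of [U Q]
    k, q, nb = U.shape[1], Q.shape[1], B_hat.shape[1]
    M = np.zeros((k + q, k + q))
    M[:k, :k] = np.diag(lam)
    M += W[:, :nb] @ W[:, :nb].T       # add B̂ B̂^T
    M -= W[:, nb:] @ W[:, nb:].T       # subtract D̂ D̂^T
    w, V = np.linalg.eigh(M)           # small symmetric EVD
    order = np.argsort(w)[::-1]
    return n_new, mu_new, np.column_stack([U, Q]) @ V[:, order], w[order]
```

When the kept rank equals the data dimension (no truncation), the returned eigenvalues match a batch eigendecomposition of the scatter matrix of the final data, centered at the new mean, to numerical precision.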

4.2.1. Incremental PCA

Lemma 3.

Let A = { x 1 , x 2 , … , x n } , B = { x n + 1 , x n + 2 , … , x n + m } be two data matrices, and let their concatenation be E = [ A B ] . Denote the means and scatter matrices of A , B , and E as μ A , μ B , and μ E and S A , S B , and S E , respectively. Then the following holds: (21) S E = S A + S B + x a x a T , where x a = √ ( n m / ( n + m ) ) ( μ A - μ B ) .
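Taking x a = √ ( n m / ( n + m ) ) ( μ A - μ B ) , which is the coefficient consistent with the derivation in the appendix, the identity of Lemma 3 can be verified numerically with small random matrices of our choosing:

```python
import numpy as np

def scatter(X):
    """Scatter matrix: sum over columns of (x_i - mean)(x_i - mean)^T."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return Xc @ Xc.T

rng = np.random.default_rng(3)
d, n, m = 4, 7, 5
A = rng.standard_normal((d, n))
B = rng.standard_normal((d, m))
E = np.column_stack([A, B])

# equivalent added vector of Lemma 3
x_a = np.sqrt(n * m / (n + m)) * (A.mean(axis=1) - B.mean(axis=1))
assert np.allclose(scatter(E), scatter(A) + scatter(B) + np.outer(x_a, x_a))
```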

4.2.2. Decremental PCA

Lemma 4.

Let C = { x 1 , x 2 , … , x n - p } , D = { x n - p + 1 , x n - p + 2 , … , x n } be two data matrices, and let A = [ C D ] be their concatenation. Denote the means and scatter matrices of C , D , and A as μ C , μ D , and μ A and S C , S D , and S A , respectively. Then the following holds: (22) S C = S A - S D - x d x d T , where x d = √ ( n p / ( n - p ) ) ( μ A - μ D ) .

4.2.3. Incremental Decremental PCA

Lemma 5.

Let C = { x 1 , x 2 , … , x n - p } , D = { x n - p + 1 , x n - p + 2 , … , x n } , and B = { x n + 1 , x n + 2 , … , x n + m } be three data matrices, and let A = [ C D ] , E = [ C B ] . Denote the means and scatter matrices of A , D , B , and E as μ A , μ D , μ B , and μ E and S A , S D , S B , and S E , respectively. Then the following holds: (23) S E = S A + S B + x a 1 x a 1 T + x a 2 x a 2 T - S D - x d x d T , where x a 1 = √ n ( μ A - μ E ) , x a 2 = √ m ( μ B - μ E ) , and x d = √ p ( μ D - μ E ) .

Remark 6.

As an important approach to dimension reduction, PCA is utilized as a preprocessing step for many other machine learning methods and as a feature extraction method in other applications. Because these methods usually work in the PCA subspace, there is a great demand for simultaneous online incremental decremental subspace learning and data reconstruction. Artac et al.  propose a method to sequentially compute the coefficients of a sample in IPCA. Here, we introduce an incremental approach to update the projection coefficients of a data point after renewing the subspace via EVDD-IDPCA, without storing the original data. For any sample x i , assuming the eigenvector matrix is U k and the mean is μ when it is added into the dataset, the reconstruction of x i is x ^ i = U k c i + μ , where c i = U k T ( x i - μ ) is the vector of projection coefficients of x i on the basis U k . Then, at each round of EVDD-IDPCA, the projection coefficients of x i can be updated by (24) c i ′ = ( U ~ k ( 1 : k ) P ) T c i + U k ′ T ( μ - μ ′ ) , where U ~ k ( 1 : k ) is the first k rows of U ~ k . It is worth noticing that in (24) ( U ~ k ( 1 : k ) P ) T is a procedure variable in EVD dualdating, and U k ′ T ( μ - μ ′ ) only needs to be computed once for all samples, so the computational cost of updating c i is small, O ( k 2 ) , and the memory needed to store a data point is reduced from O ( d ) to O ( k ) .
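Formula (24) relies on procedure variables internal to EVD dualdating. An equivalent full-projection form, sketched below with names of our own choosing, makes the idea concrete: the updated coefficients are the projection of the stored reconstruction onto the renewed basis, c ′ = U ′ T ( U c + μ - μ ′ ) , where the k × k matrix U ′ T U and the offset U ′ T ( μ - μ ′ ) are computed once per round and reused for every sample.

```python
import numpy as np

def update_coefficients(Cmat, U_old, mu_old, U_new, mu_new):
    """Re-express stored coefficients (one per column of Cmat) in the renewed subspace.

    Equivalent to projecting each reconstruction x̂ = U_old c + mu_old onto the
    new basis; T and b are computed once and shared by all samples.
    """
    T = U_new.T @ U_old                  # k' x k rotation, once per round
    b = U_new.T @ (mu_old - mu_new)      # mean-shift offset, once per round
    return T @ Cmat + b[:, None]         # O(k^2) per sample afterwards

# sanity check with random orthonormal bases (illustrative sizes)
rng = np.random.default_rng(5)
d, k = 8, 3
U1, _ = np.linalg.qr(rng.standard_normal((d, k)))
U2, _ = np.linalg.qr(rng.standard_normal((d, k)))
mu1, mu2 = rng.standard_normal(d), rng.standard_normal(d)
c = rng.standard_normal((k, 10))
x_hat = U1 @ c + mu1[:, None]            # stored reconstructions
assert np.allclose(update_coefficients(c, U1, mu1, U2, mu2),
                   U2.T @ (x_hat - mu2[:, None]))
```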

5. Experiment

In this section, experiments on the proposed EVD-dualdating-based algorithms are presented, in comparison with other classic methods. Because incremental PCA has been discussed extensively in the earlier literature and the proposed EVDD-IPCA is equivalent to the extended sequential KL algorithm, we do not verify IPCA methods further in this paper. The interested reader can find performance analyses and comparisons in the related papers [12, 15, 16, 20]. In the following, decremental PCA and incremental decremental PCA experiments on real-world datasets are reported first; then, an adaptive rank selection experiment of EVD dualdating on an artificial dataset is presented. All experiments are performed in Matlab, on a computer with a dual-core 2.0 GHz CPU and 4 GB RAM.

5.1. DPCA Experiment: Performance Evaluation on Real-World Data

In order to verify the performance and efficiency of the proposed EVDD-DPCA and EVDD-IDPCA, four datasets are used, covering cases of both high dimension and huge size. The FERET  database is a standard dataset for facial recognition system evaluation, managed by DARPA and NIST. The AR  dataset is a popular face image dataset, whose images are shot under different facial expressions, illumination conditions, and partial occlusions due to sunglasses and a scarf. The Yale Face Database B (Yale B)  contains 5760 single-light-source images of 10 subjects, each seen under 576 viewing conditions (9 poses × 64 illumination conditions). Subsets of AR, FERET, and Yale B are employed in our simulation, which include 952, 720, and 4050 cropped and centralized face images, respectively. The fourth database is the Columbia Object Image Library (COIL-100) ; it includes 7200 color images of 100 objects, taken at pose intervals of 5 degrees, corresponding to 72 poses per object. The detailed information of the four datasets and our experimental settings is listed in Table 3.

Dataset and configuration for DPCA.

Data set Dimension Classes Deleted classes Samples/class Training Testing
FERET 92 × 112 120 40 6 4 2
AR 92 × 112 119 39 8 6 2
Yale B 25 × 30 90 30 45 30 15
COIL-100 25 × 25 100 40 72 42 30

To compare the performance of decremental learning, we test the proposed EVDD-DPCA algorithm against the batch-mode PCA, MSES , MSVD , DCSSVD , and AIPCA . First, the whole dataset is learned via the batch-mode PCA; then, assuming some classes are expired data, their samples are deleted chunk by chunk. In our experiment, the number of expired classes is 40 for FERET, 39 for AR, 30 for Yale B, and 40 for COIL-100, and the chunk size is 10. Every experiment is repeated 20 times to reduce the disturbance from the process scheduling of the operating system and from randomized grouping. The performance is mainly evaluated by efficiency, accuracy of the eigenspace, and face recognition performance:

execution time;

weighted angle between PCs of the batch-mode PCA and the DPCA methods: (25) θ i w = λ i | θ i | / ∑ j = 1 k λ j , where λ i is the i th eigenvalue of the batch mode and θ i is the angle between the i th PCs;

recognition rate.
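The weighted-angle metric (25) can be computed as below (our own NumPy sketch; the bases are assumed column-orthonormal and paired by index, and the absolute value of the inner product makes the angle invariant to sign flips of the eigenvectors):

```python
import numpy as np

def weighted_angles(U_batch, U_online, lam_batch):
    """θ_i^w = λ_i |θ_i| / Σ_j λ_j for index-paired principal components."""
    cosines = np.clip(np.abs(np.einsum('di,di->i', U_batch, U_online)), 0.0, 1.0)
    return lam_batch * np.arccos(cosines) / lam_batch.sum()

# identical bases give zero error; a flipped sign does not change the angle
U = np.eye(4)[:, :2]
lam = np.array([3.0, 1.0])
assert np.allclose(weighted_angles(U, U, lam), 0.0)
assert np.allclose(weighted_angles(U, -U, lam), 0.0)
```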

5.1.1. Computational Efficiency by the Subspace Dimension <inline-formula> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M504"> <mml:mi>k</mml:mi> <mml:mo mathvariant="bold">(</mml:mo> <mml:mn>10</mml:mn> <mml:mo mathvariant="bold">-</mml:mo> <mml:mn>200</mml:mn> <mml:mo mathvariant="bold">)</mml:mo></mml:math> </inline-formula>

Recalling the analysis of computational complexity in Section 3.3, the practical efficiency depends on the dimension of the small matrix to be decomposed and the cost of transformation. From Table 2, DCSSVD has a larger matrix decomposition complexity, QR ( n - p , k ) and SVD ( d , k ) ; AIPCA has a larger transformation complexity, O ( d n ( k + p ) ) . MSES, MSVD, and our EVDD have close complexities: MSES needs one additional SVD ( d , p ) to extract the eigenspace model of the deleted samples before subtracting; MSVD has two QR decompositions of the residual matrices in both the row and column spaces, QR ( d , p ) and QR ( n , p ) , as well as a larger transformation cost O ( ( d + n ) ( k + p ) 2 ) ; EVDD has one additional EVD ( k ) to transform updating into downdating. Therefore, when the data dimension d is high or the dataset size n is huge, that is, d , n ≫ k , p , DCSSVD and AIPCA are less efficient, while MSES, MSVD, and our EVDD achieve close, higher efficiencies. This conclusion is also demonstrated by Figures 2(a), 2(b), 2(c), and 2(d), which show the execution time by the number of kept PCs ( k : 10–200) of MSES, DCSSVD, MSVD, AIPCA, and EVDD-DPCA on FERET, AR, Yale B, and COIL-100. From these figures, we observe that the proposed EVDD-DPCA achieves better or comparable efficiency.

CPU time of DPCA methods on different datasets, k : 10–200, chunk = 10. (a) FERET. (b) AR. (c) Yale B. (d) COIL-100.

5.1.2. PC Estimation Quality Relative to Ground-Truth PCs

In order to evaluate accuracy, the angles between the resulting PCs of the DPCA methods and those of the batch-mode PCA can be adopted. We instead choose the angles weighted by the corresponding eigenvalues, which are more suitable for evaluation because they emphasize the importance of the leading PCs. Figures 3(a), 3(b), 3(c), and 3(d) show the weighted angles of the first 50 PCs of the DPCA methods on the four datasets, when the number of kept PCs is 100 and the chunk size is 10. Figures 4(a), 4(b), 4(c), and 4(d) show the weighted-angle errors of the first 50 PCs of the DPCA methods on the different datasets by the number of kept PCs ( k : 10–200), when the chunk size is 10.

The first 50 weighted angles of DPCA methods on different datasets, k = 100 , chunk = 10. (a) FERET. (b) AR. (c) Yale B. (d) COIL-100.

Error of the first 50 weighted angles of DPCA methods on different datasets, k : 10–200, chunk = 10. (a) FERET. (b) AR. (c) Yale B. (d) COIL-100.

From these figures, the proposed EVDD-DPCA algorithm achieves the best accuracy of eigenvector estimation. The accuracy of the principal directions depends on the estimation of the mean and on the cut-off error. An error in the mean causes a bias of the origin used for data centralizing, which in the worst case may make the directions of the resulting basis totally different. The cut-off error accumulates in the sequential process, so the more often truncation happens, the lower the accuracy of the final result. The method of updating the mean is the same in MSES and EVDD-DPCA, and its estimate equals the true mean. In DCSSVD, the new mean is updated via the right singular vectors V . However, V is cut off to the reduced dimension, so its estimate of the mean is not accurate. This inaccuracy does not affect its computation of singular vectors, because the mean correction term is stripped off from V and no data centralizing step is executed. So the errors of the singular vectors in EVDD-DPCA, MSES, and DCSSVD mainly come from the cut-off error. Before splitting, MSES calculates the EVD of the deleted data, whose result is cut off to the kept dimension; this step introduces additional cut-off error. DCSSVD directly manipulates the right singular vectors V to achieve downdating, so it actually ignores the information of the deleted samples reflected by the high-order PCs. In AIPCA and MSVD, the mean is not updated, so all the remaining samples are centralized with the old mean. Therefore, their results deviate far from the true PCs. In Figures 3(a), 3(b), 3(c), and 3(d), it can be seen that the weighted angle of the proposed EVDD-DPCA is much smaller than that of the other methods, because of the accurate estimate of the mean and the smaller cut-off error. MSES and DCSSVD have close performances, and AIPCA and MSVD have larger errors. The same conclusion can be drawn from Figures 4(a), 4(b), 4(c), and 4(d). The fluctuation at the beginning of these curves arises because the number of observed PCs increases from 10 to 50.

5.1.3. Results of Recognition with Minimum Distance Classifier

In the recognition experiment, the resulting PCs are used as the projection matrix to project the testing image into the subspace; then the minimum distance classifier (MDC) is utilized for recognition. The advantage of MDC in our online application is that only the mean of each class in the projection subspace needs to be saved. The distance between a sample x and a class Ω in MDC is defined by a Mahalanobis distance: (26) d ( x , Ω ) = [ ( x ~ - μ Ω ) T Λ k - 1 ( x ~ - μ Ω ) ] 1 / 2 , where x ~ is the projection of x in the subspace, μ Ω is the mean of class Ω in the subspace, and Λ k is the eigenvalue matrix estimated by EVD dualdating.
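A minimal sketch of this classifier (our own NumPy rendering of (26); we assume the projection x ~ = U k T ( x - μ ) , and all names are ours):

```python
import numpy as np

def mdc_classify(x, U_k, lam_k, mu, class_means):
    """Minimum distance classifier with the Mahalanobis distance of (26).

    class_means: k x C matrix whose columns are the class means in the subspace.
    Returns (index of the nearest class, distances to every class).
    """
    x_proj = U_k.T @ (x - mu)                    # x̃: projection into the subspace
    diff = class_means - x_proj[:, None]
    dists = np.sqrt(np.einsum('kc,k,kc->c', diff, 1.0 / lam_k, diff))
    return int(np.argmin(dists)), dists

# toy example: two classes in a 2-D subspace of R^3
U_k = np.eye(3)[:, :2]
lam_k = np.array([2.0, 0.5])
mu = np.zeros(3)
means = np.array([[0.0, 4.0], [0.0, 0.0]])       # class 0 at the origin, class 1 at (4, 0)
label, _ = mdc_classify(np.array([0.2, 0.1, 9.0]), U_k, lam_k, mu, means)
assert label == 0
```

Note that the large component along the third axis is discarded by the projection, so only the in-subspace coordinates influence the decision.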

Figures 5(a), 5(b), 5(c), and 5(d) present the recognition rates of the full-data PCA, the batch-mode PCA, and the DPCA methods. The results show that the full-data PCA has a lower recognition rate due to the existence of expired instances, while all DPCA methods have close recognition rates, nearly equal to that of the batch-mode PCA. Similar results were also obtained by Ozawa et al. . This phenomenon can be explained via random projection (RP) . According to the Johnson-Lindenstrauss lemma , an arbitrary set of N points in a high-dimensional Euclidean space can be mapped onto a subspace of dimension k ≥ O ( log N / ϵ 2 ) ( 0 < ϵ < 1 ) in which the distances between all pairs of points are approximately preserved. So as long as k is large enough, for an arbitrary k -dimensional random projection, the classification performance is mainly determined by MDC and the structure of the data space itself. In our experiments, the smallest such k is in the range [ 40,60 ] on FERET, AR, and Yale B, and about 100 on COIL-100.

Recognition rate of DPCA methods on different datasets, k : 10–200, chunk = 10. (a) FERET. (b) AR. (c) Yale B. (d) COIL-100.

5.2. IDPCA Experiment: Performance Evaluation on Real-World Data

To compare the performance of incremental decremental subspace learning methods, we test the proposed EVDD-IDPCA algorithm against the batch-mode PCA, MSES , MSVD , DCSSVD , and AIPCA . Because DCSSVD only accomplishes decremental PCA, we combine it with the extended SKL algorithm to achieve IDPCA. As a decremental version of SVDU-IPCA, AIPCA is combined with SVDU-IPCA to fulfill IDPCA in our experiment.

The datasets for IDPCA are the same as in the DPCA experiment, and the configuration is shown in Table 4. In our experiment, samples of the pretraining classes are learned by the batch-mode PCA; then, at every round, a chunk of samples of the expired classes is deleted, and meanwhile a chunk of samples of new classes is added. The chunk size is 10. The training/testing ratio is the same as in the DPCA experiment. Execution time, weighted angle, and recognition rate are used to evaluate the performance of the IDPCA methods.

Dataset and configuration for IDPCA.

Data set Dimension Classes Deleted Added Samples/class Training Testing
FERET 92 × 112 120 40 40 6 4 2
AR 92 × 112 119 39 39 8 6 2
Yale B 25 × 30 90 20 20 45 30 15
COIL-100 25 × 25 100 30 30 72 42 30
5.2.1. Computational Efficiency by the Subspace Dimension <inline-formula> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M541"> <mml:mi>k</mml:mi> <mml:mo mathvariant="bold">(</mml:mo> <mml:mn>10</mml:mn> <mml:mo mathvariant="bold">-</mml:mo> <mml:mn>200</mml:mn> <mml:mo mathvariant="bold">)</mml:mo></mml:math> </inline-formula>

Figures 6(a), 6(b), 6(c), and 6(d) present the runtime by the number of kept PCs ( k ) of the IDPCA methods. Different from the other IDPCA methods, which process incremental learning and decremental learning separately, our EVDD-IDPCA deals with deleted and added samples simultaneously and avoids the repeated execution of preprocessing, postprocessing, and some matrix decompositions. Therefore, as shown in Table 2, via the dualdating scheme, EVD dualdating has a more concise form with fewer matrix decompositions and a lower transformation cost. In our experiment, the proposed EVDD-IDPCA is accordingly much more efficient than the other methods, especially when the scale of the dataset is large.

CPU time of IDPCA methods on different datasets, k : 10–200, chunk = 10. (a) FERET. (b) AR. (c) Yale B. (d) COIL-100.

5.2.2. PC Estimation Quality Relative to Ground-Truth PCs

Figures 7(a), 7(b), 7(c), and 7(d) show the weighted angles of the first 50 PCs of the different IDPCA methods, when the number of kept PCs is 100 and the chunk size is 10. Figures 8(a), 8(b), 8(c), and 8(d) show the error norm of the weighted angles of the first 50 PCs of the IDPCA methods on the different datasets by the number of kept PCs ( k : 10–200), when the chunk size is 10. As the only real incremental decremental PCA method with an accurate mean estimation and a dualdating scheme, EVDD-IDPCA obtains principal eigenvectors with considerably better approximation than the other methods by avoiding redundant cut-off error. These figures show that the estimation of the leading PCs in EVDD-IDPCA is significantly superior to that of its opponents.

The first 50 weighted angles of IDPCA methods on different datasets, k = 100 , chunk = 10. (a) FERET. (b) AR. (c) Yale B. (d) COIL-100.

Error of the first 50 weighted angles of IDPCA methods on different datasets, k : 10–200, chunk = 10. (a) FERET. (b) AR. (c) Yale B. (d) COIL-100.

5.2.3. Results of Recognition with Minimum Distance Classifier

Figures 9(a), 9(b), 9(c), and 9(d) present the recognition rates of the full-data PCA, the batch-mode PCA, and the IDPCA methods. The results are similar to those of the DPCA experiments: the recognition rate of the full-data PCA is lower because of the existence of expired classes, and the recognition rates of the considered IDPCA methods are close, mainly depending on MDC and the structure of the data space, when k is large enough to satisfy the Johnson-Lindenstrauss lemma.

Recognition rate of IDPCA methods on different datasets, k : 10–200, chunk = 10. (a) FERET. (b) AR. (c) Yale B. (d) COIL-100.

Besides, one important advantage of the EVDD-based methods, not reflected by these DPCA and IDPCA experiments, is that the specific position information of the deleted and added samples is not needed, whereas it is necessary for DCSSVD, AIPCA, and MSVD.

5.3. Automatic Rank <inline-formula> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M549"> <mml:mrow> <mml:mi>k</mml:mi></mml:mrow> </mml:math></inline-formula> Selection and Weighted EVD Dualdating

In this experiment, the selection of the subspace dimension k without any a priori information is evaluated. An artificial dataset is used, which includes data points generated from the following model: (27) x = A c + n , where A ∈ R d × d , c ∈ R d × 1 is a coefficient vector, and n is a small noise vector sampled from a normal distribution N ( 0 , σ I d ) . In the simulation, the data dimension is d = 100 , and the number of generated samples is 10000. Samples are sequentially learned at different chunk sizes (5, 10, 20) by our EVD dualdating and weighted EVD dualdating algorithms. The weights are w 1 = 0.95 and W = I in weighted EVD dualdating. In every round, the number of kept PCs is determined by (20), and the thresholds of the preserved proportion are T r = 0.98 , 0.97 , 0.96 with respect to the chunk sizes 5, 10, and 20. Figure 10 shows the curves of the kept rank k during the online learning process, where the solid lines stand for weighted EVD dualdating and the dashed lines for EVD dualdating. From this figure, the kept ranks in all curves quickly rise from the chunk size to 50–60 at the beginning, which means new features have been added to the eigenspace. Then, the ranks of weighted EVD dualdating tend to a common stable value 53. It is worth noting that the red solid line with the smallest chunk size 5 has the fastest convergence, and the blue solid one with the largest chunk size 20 converges slowest. For normal EVD dualdating, because the influence of the leading PCs is not weakened, as the online learning progresses, it becomes unwelcoming to new features and later excludes minor PCs. Therefore, the kept ranks, reflected by the dashed lines, all have a quickly decreasing trend. For example, the blue dashed line (chunk = 20) ends with a rank less than 30 after all samples are learned.

Results of rank k selection of EVD dualdating and weighted EVD dualdating, chunk = 5, 10, 20.

6. Conclusion

This paper focuses on the problem of online incremental/decremental subspace learning and reports a novel dualdating algorithm of EVD, namely, EVD dualdating. Different from previous works, the proposed EVD dualdating algorithm can renew the EVD of a data matrix while adding and deleting samples simultaneously. With EVD dualdating, EVDD-IPCA, EVDD-DPCA, and EVDD-IDPCA are presented to handle the changing mean, where the variation of the mean is equivalent to adding and deleting several additional vectors in the case of zero-mean PCA. Plenty of comparative experiments on both real-world and artificial databases demonstrate that our EVD dualdating algorithm achieves significantly better approximation accuracy and computational efficiency than other state-of-the-art incremental and decremental PCA methods.

Appendices A. Proof of Lemma <xref ref-type="statement" rid="lem3">4</xref>

By definition, (A.1) μ C = ( n μ A - p μ D ) / ( n - p ) , μ A - μ C = ( p / ( n - p ) ) ( μ D - μ A ) , μ D - μ C = ( n / ( n - p ) ) ( μ D - μ A ) .

And the scatter matrix of C is (A.2) S C = ∑ i = 1 n ( x i - μ C ) ( x i - μ C ) T - ∑ i = n - p + 1 n ( x i - μ C ) ( x i - μ C ) T = S A + n ( μ A - μ C ) ( μ A - μ C ) T - S D - p ( μ D - μ C ) ( μ D - μ C ) T = S A - S D + ( n p 2 / ( n - p ) 2 ) ( μ D - μ A ) ( μ D - μ A ) T - ( n 2 p / ( n - p ) 2 ) ( μ D - μ A ) ( μ D - μ A ) T = S A - S D - ( n p / ( n - p ) ) ( μ D - μ A ) ( μ D - μ A ) T .

B. Proof of Lemma <xref ref-type="statement" rid="lem4">5</xref>

By definition, (B.1) μ E = ( n μ A + m μ B - p μ D ) / ( n + m - p ) .

And the scatter matrix of E is (B.2) S E = ∑ i = 1 n ( x i - μ E ) ( x i - μ E ) T + ∑ i = n + 1 n + m ( x i - μ E ) ( x i - μ E ) T - ∑ i = n - p + 1 n ( x i - μ E ) ( x i - μ E ) T = S A + n ( μ A - μ E ) ( μ A - μ E ) T + S B + m ( μ B - μ E ) ( μ B - μ E ) T - S D - p ( μ D - μ E ) ( μ D - μ E ) T .

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grants nos. 61175028 and 61375007), the Ph.D. Programs Foundation of the Ministry of Education of China (Grant no. 20090073110045), and the Shanghai Pujiang Program (Project no. 12PJ1402200).

[1] I. T. Jolliffe, Principal Component Analysis, 2nd ed., Springer, New York, NY, USA, 2002.
[2] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 1990.
[3] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
[4] K. Y. Yeung and W. L. Ruzzo, "Principal component analysis for clustering gene expression data," Bioinformatics, vol. 17, no. 9, pp. 763–774, 2001.
[5] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, "Indexing by latent semantic analysis," Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391–407, 1990.
[6] X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, and A. van den Hengel, "A survey of appearance models in visual object tracking," ACM Transactions on Intelligent Systems and Technology, vol. 4, no. 4, article 58, 2013.
[7] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[8] G. Cauwenberghs and T. Poggio, "Incremental and decremental support vector machine learning," in Advances in Neural Information Processing Systems, pp. 100–110, 2000.
[9] M. Karasuyama and I. Takeuchi, "Multiple incremental decremental learning of support vector machines," IEEE Transactions on Neural Networks, vol. 21, no. 7, pp. 1048–1059, 2010.
[10] F. Lorenzelli and K. Yao, "Systolic arrays for SVD downdating," in SVD and Signal Processing III: Algorithms, Architectures, and Applications, Elsevier Science, 1995.
[11] P. Hall, D. Marshall, and R. Martin, "Adding and subtracting eigenspaces with eigenvalue decomposition and singular value decomposition," Image and Vision Computing, vol. 20, no. 13-14, pp. 1009–1016, 2002.
[12] P. Hall, D. Marshall, and R. Martin, "Merging and splitting eigenspace models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 9, pp. 1042–1049, 2000.
[13] D. Skočaj and A. Leonardis, "Weighted and robust incremental method for subspace learning," in Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV '03), vol. 2, pp. 1494–1501, Nice, France, October 2003.
[14] A. Levy and M. Lindenbaum, "Sequential Karhunen-Loeve basis extraction and its application to images," IEEE Transactions on Image Processing, vol. 9, no. 8, pp. 1371–1374, 2000.
[15] D. Huang, Z. Yi, and X. Pu, "A new incremental PCA algorithm with application to visual learning and recognition," Neural Processing Letters, vol. 30, no. 3, pp. 171–185, 2009.
[16] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1–3, pp. 125–141, 2008.
[17] J. Weng, Y. Zhang, and W. Hwang, "Candid covariance-free incremental principal component analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 8, pp. 1034–1040, 2003.
[18] H. Zha and H. D. Simon, "On updating problems in latent semantic indexing," SIAM Journal on Scientific Computing, vol. 21, no. 2, pp. 782–791, 1999.
[19] A. Levy and M. Lindenbaum, "Sequential Karhunen-Loeve basis extraction and its application to images," in Proceedings of the International Conference on Image Processing (ICIP '98), vol. 2, pp. 456–460, Chicago, Ill, USA, October 1998.
[20] H. Zhao, P. C. Yuen, and J. T. Kwok, "A novel incremental principal component analysis and its application for face recognition," IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics, vol. 36, no. 4, pp. 873–886, 2006.
[21] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and Its Applications, vol. 415, no. 1, pp. 20–30, 2006.
[22] J. Melenchón and E. Martínez, "Efficiently downdating, composing and splitting singular value decompositions preserving the mean information," in Proceedings of the 3rd Iberian Conference on Pattern Recognition and Image Analysis, pp. 436–443, 2007.
[23] L. Wang, S. Chen, and C. Wu, "An accurate incremental principal component analysis method with capacity of update and downdate," in Proceedings of the International Conference on Computer Science & Information Technology, vol. 51, pp. 118–123, 2012.
[24] M. Artac, M. Jogan, and A. Leonardis, "Incremental PCA for on-line visual learning and recognition," in Proceedings of the 16th International Conference on Pattern Recognition, vol. 3, pp. 781–784, 2002.
[25] P. Jonathon Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1090–1104, 2000.
[26] A. M. Martinez and R. Benavente, "The AR face database," CVC Technical Report 24, 1998.
[27] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643–660, 2001.
[28] S. A. Nene, S. K. Nayar, and H. Murase, "Columbia object image library (COIL-100)," Tech. Rep. CUCS-006-96, 1996.
[29] S. Ozawa, S. Pang, and N. Kasabov, "Incremental learning of chunk data for online pattern classification systems," IEEE Transactions on Neural Networks, vol. 19, no. 6, pp. 1061–1074, 2008.
[30] N. Goel, G. Bebis, and A. Nefian, "Face recognition experiments with random projection," in 2nd Biometric Technology for Human Identification, vol. 5779 of Proceedings of SPIE, pp. 426–437, Orlando, Fla, USA, March 2005.
[31] S. Dasgupta and A. Gupta, "An elementary proof of a theorem of Johnson and Lindenstrauss," Random Structures & Algorithms, vol. 22, no. 1, pp. 60–65, 2003.