EVD Dualdating Based Online Subspace Learning

Conventional incremental PCA methods usually consider only the case of adding samples. In this paper, we consider two further cases: deleting samples, and simultaneously adding and deleting samples. To avoid the NP-hard problem of downdating the SVD without right singular vectors and specific position information, we use the EVD instead of the SVD, which is used by most IPCA methods. First, we propose an EVD updating and downdating algorithm, called EVD dualdating, which permits simultaneous arbitrary adding and deleting operations by transforming the EVD of the covariance matrix into an SVD updating problem plus an EVD of a small autocorrelation matrix. A comprehensive analysis is given of the essence, extensibility, and computational complexity of EVD dualdating. A theorem proves that if the whole data matrix satisfies the low-rank-plus-shift structure, EVD dualdating is an optimal rank-k estimator in the sequential setting. A selection method based on eigenvalues is presented to determine the optimal rank k of the subspace. Then, we propose three incremental/decremental PCA methods, EVDD-IPCA, EVDD-DPCA, and EVDD-IDPCA, which adapt to a varying mean. Finally, extensive comparative experiments demonstrate that EVDD-based methods outperform conventional incremental/decremental PCA methods in both efficiency and accuracy.


Introduction
Principal component analysis (PCA) [1], also known as subspace learning or the Karhunen-Loeve transform [2], has been an active topic in the machine learning and pattern recognition communities over the last several decades. As a well-known unsupervised linear dimension reduction and multivariate analysis method, PCA has been applied to biometric recognition [3], gene classification [4], latent semantic indexing [5], and visual tracking [6].
In order to obtain the optimal orthonormal basis, which endows PCA with the minimal reconstruction error, batch-mode PCA can be computed in two ways: the eigenvalue decomposition (EVD) of the data covariance matrix and the singular value decomposition (SVD) of the data matrix. Both approaches have a high computational cost and a large storage demand for a high-dimensional, large-scale dataset. Moreover, in practical applications, not all the observations are available before training. Especially in online usage, samples arrive sequentially over time. In these situations, batch-mode PCA cannot meet the demand for real-time processing, because it must recompute the EVD or SVD of the whole data every time.
To solve this issue, incremental learning has been investigated for more than two decades in both the applied mathematics and machine learning communities. Its task is to update the learning results without re-executing the whole process when new data points are added. Various effective incremental PCA (IPCA) methods have been proposed.
In a period of knowledge explosion, fast-growing information is usually adulterated with fake, invalid, or expired data. The presence of a few deviated samples might severely contaminate the learned model, such as the principal directions in PCA. Expired instances, which can to some degree be regarded as outliers relative to unexpired ones, could reduce the accuracy of the data model. Therefore, for an intelligent learning system, the ability to admit new instances is not enough; the capability to eliminate aberrant samples is also necessary. This is the aim of decremental learning. Compared with IPCA, decremental PCA (DPCA) has not received adequate attention in the literature, and only a few methods have been proposed in the last ten years. Besides, to the best of our knowledge, there is no incremental decremental algorithm for subspace learning. Similar works concern only the support vector machine (SVM) [7]. Cauwenberghs and Poggio [8] propose an incremental decremental method of adding and/or deleting a single sample, and Karasuyama and Takeuchi [9] extend it to the case of multiple instances.
Because the essence of PCA is, mathematically, the SVD or EVD, the task of incremental and decremental PCA is equivalent to updating and downdating the SVD or EVD. Among existing methods, most IPCA approaches adopt similar strategies based on updating the SVD. However, these SVD-based tactics may be impossible to implement for decremental PCA. Lorenzelli and Yao [10] point out that SVD downdating is NP-hard without knowing the right singular vectors. Hall et al. [11] argue that the right singular vectors of the remaining matrix cannot be computed without accessing elements of the right singular vectors of the original matrix. Besides, in many practical applications, such as subspace learning [12,13], image reconstruction [14], face recognition [15], and visual tracking [16], only left singular vectors are needed as the projection matrix, so right singular vectors are usually not stored, to save memory. If the data matrix and the right singular vectors are not preserved, the position of the deleted points in the queue may be unknowable in the decremental case, which makes the right singular vectors incomputable. The problem of incomplete position information does not arise in incremental PCA, because new instances are, by convention, appended to the tail of the queue.
Based on the demand for incremental decremental learning and the difficulty of decremental learning analyzed above, we introduce a novel online subspace method for simultaneous incremental decremental learning. The contributions of this paper are as follows.
(1) To avoid the problem of lacking right singular vectors in decremental learning, we utilize the EVD instead of the SVD and propose a dualdating algorithm for the eigenspace, that is, EVD dualdating, which can accept and delete samples at the same time. Our algorithm transforms the EVD updating and downdating of the covariance matrix into an SVD updating problem plus an EVD of a small autocorrelation matrix. To the best of our knowledge, it is the first attempt at simultaneous incremental decremental subspace learning, and it has a simpler and unified mathematical form, which theoretically guarantees a better performance than the conventional multiple-step implementation.
(2) Several theoretical and computational analyses are presented to further explore the properties of EVD dualdating, including its essence and geometric explanation, its extended forms for data revising and weighted updating, its computational complexity, a theorem which demonstrates the optimality of EVD dualdating in the sequential mode if the data matrix satisfies the low-rank-plus-shift structure, and a selection method for the optimal rank $k$ based on eigenvalues.
(3) It is proved that the change of mean caused by adding or deleting samples in the varying-mean PCA can be transformed into adding and deleting several equivalent vectors in the zero-mean PCA. Thus, three online PCA algorithms are derived from EVD dualdating to cope with a changing mean: incremental PCA (EVDD-IPCA), decremental PCA (EVDD-DPCA), and incremental decremental PCA (EVDD-IDPCA).
The remainder of this paper is organized as follows. Section 2 briefly reviews the updating and downdating methods for both SVD and EVD, as well as incremental PCA. The proposed EVD dualdating algorithm and its analyses are presented in Section 3. In Section 4, EVD dualdating is applied to incremental decremental PCA with mean updating. Section 5 presents the experimental results and comparisons with other approaches. Section 6 concludes this paper. Finally, proofs of lemmas and theorems are given in the Appendix.

Related Work
Over the past few decades, many efficient incremental PCA methods have been proposed. Generally, existing IPCA algorithms can be divided into three categories. The first category updates eigenvectors without any matrix decomposition; the typical method is the candid covariance-free IPCA (CCIPCA) [17]. The second category updates principal components via EVD updating; the subspace merging and splitting model developed by Hall et al. [12] belongs to this category. The third category, which is the most studied, recomputes singular values and singular vectors by sequentially updating the SVD, with the help of the partitioned R-SVD [14] and SVD updating [18].
Weng et al. [17] propose an incremental PCA method that does not compute the covariance matrix, the candid covariance-free IPCA (CCIPCA). The CCIPCA algorithm computes principal components sequentially and considers the complementary space of lower-order PCs when calculating higher-order PCs. Because the computation of the $(i+1)$th PC depends on the $i$th one, the error accumulates over the whole process. Besides, the sample-to-dimension ratio needs to be large enough to avoid problems of statistical estimation, a condition that is not satisfied in many situations.
Hall et al. [12] develop a merging and splitting eigenspace model (MSES). This is an online subspace learning algorithm based on the EVD, which solves a small eigenproblem on a new orthonormal basis. MSES is able to update or downdate the EVD by adding or subtracting the eigenspace of the added or deleted samples, and it adapts to changes in the data mean.
Apart from the two approaches above, other incremental PCA methods are based primarily on the SVD. Levy and Lindenbaum [19] propose the sequential Karhunen-Loeve (SKL) method based on the partitioned R-SVD algorithm, which reduces the SVD of a large data matrix to the SVD of several small ones via a sequential procedure. This sequential partitioned R-SVD algorithm is then utilized to extract PCs from a sequence of human face images. Besides, a forgetting factor is employed to weaken the effect of old data. However, SKL does not take the mean of the data into consideration, so the result is not accurate enough for an image sequence with a varying mean, such as a human face under changing illumination. Skočaj and Leonardis [13] develop the weighted and robust incremental subspace learning (WR-ISL) algorithm, which is able to deal with a changing mean and weighted data. However, WR-ISL does not consider the chunk updating mode; in other words, only one sample can be processed in each round. The mean updating for multiple samples is solved by Ross et al. [16], who demonstrate that, when the mean is taken into account, the covariance matrix of the combined data is equal to the sum of the covariance matrices of the old data, the new data, and an additional vector. Accordingly, Ross et al. obtain an extended SKL algorithm with mean updating, which is applied to visual tracking and successfully locates one human face and one toy with different poses under varying background and illumination, both indoors and outdoors.
Zha and Simon [18] propose a more general mathematical formula to update the SVD, namely, SVD updating. This algorithm, which is applied to latent semantic indexing (LSI), is an efficient incremental method to recalculate the rank-$k$ SVD for updating documents, updating terms, and term weight corrections. Moreover, Zha and Simon prove that if the combined data matrix satisfies the low-rank-plus-shift structure, the result of the SVD updating algorithm applied to the new data and the optimal rank-$k$ approximation of the old data is still an optimal rank-$k$ estimate. Zhao et al. [20] propose a chunk incremental PCA approach via the SVD updating algorithm, known as SVDU-IPCA. Compared with other incremental PCA methods, SVDU-IPCA computes the eigendecomposition of the autocorrelation matrix instead of the covariance matrix. The motivation is that the sample number is usually much smaller than the data dimension in practical applications, so the autocorrelation matrix is also smaller than the covariance matrix. Zhao et al. then find a strategy to update the eigendecomposition of an autocorrelation matrix by SVD updating. However, the change of mean is not considered in SVDU-IPCA, so it is not suitable for situations with a changing mean. Besides, it suffers from a growing demand for storage and computation, because the size of the autocorrelation matrix grows with the new data, and an additional process is needed to convert the resulting right singular vectors and the retained whole data into principal components. Huang et al. [15] propose an improved SVDU-IPCA method to handle data with a changing mean and to decrease the storage, where only a small package of concentrated data is saved to calculate the left singular vectors.
Although a great deal of research has been done on incremental subspace learning, research on decremental learning is still inadequate in the literature. The merging and splitting eigenspace model developed by Hall et al. [12] can downdate the EVD to recompute PCs when some samples are deleted from the old data. Meanwhile, they claim that it is impossible to achieve SVD downdating in a closed form with their model. Brand [21] proposes a fast modification model of the rank-$k$ singular value decomposition (MSVD). As an extension of the term weight corrections form of SVD updating, MSVD is able to recompute the rank-$k$ SVD of the modified data matrix after updating, downdating, revising, and recentering terms. However, this method does not take the mean into consideration, so its result is not accurate when the data mean is time-varying. Melenchón and Martínez [22] develop a method for downdating, composing, and splitting the SVD (DCSSVD) with a changing mean. DCSSVD accomplishes these tasks by first downdating, composing, and splitting the right singular vectors, then computing the mean and the SVD of the remaining right singular vectors, and finally calculating the resulting SVD. However, this method suffers from a severe efficiency problem, since its core process is the SVD of a matrix whose computational complexity still depends on the data dimension. AIPCA, proposed by Wang et al. [23], is a decremental version of the SVDU-IPCA algorithm which recomputes the eigendecomposition of the autocorrelation matrix by MSVD. Although AIPCA achieves decremental subspace learning, it inherits the disadvantages of SVDU-IPCA and MSVD, such as the inability to handle a changing mean, a large memory requirement for preserving the data matrix, and an additional process to convert its results into left singular vectors.
Besides accuracy and efficiency, the most severe problem faced by SVD-based decremental methods is that downdating is NP-hard without the position information of the deleted samples in the data matrix, which might not be obtainable in many practical applications.

EVD Dualdating
In Section 3.1, we first briefly review the SVD updating algorithm [18]. Our EVD dualdating algorithm is proposed in Section 3.2. Then, the related analyses are reported in the following sections.

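SVD Updating. Given a data matrix $A \in \mathbb{R}^{d \times n}$, its best rank-$k$ approximation is
$$\mathrm{best}_k(A) = U_k \Sigma_k V_k^T, \tag{1}$$
where $U_k$ and $V_k$ contain the first $k$ left and right singular vectors of $A$ and $\Sigma_k$ is the diagonal matrix of its $k$ largest singular values. When new columns are appended to $A$, the rank-$k$ SVD of the enlarged matrix has to be obtained from $U_k$, $\Sigma_k$, and $V_k$ without recomputing the whole decomposition. To solve this problem, Zha and Simon [18] propose an efficient mathematical tool, namely, SVD updating. Its detailed procedure is described in Algorithm 1.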
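The SVD updating step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 1: the function name `svd_update` and its arguments are hypothetical, and the sketch orthogonalizes the augmented matrix formed by the current left singular vectors and the new columns with a single QR call, which is mathematically equivalent to orthogonalizing only the part of the new columns that lies outside the current subspace.

```python
import numpy as np

def svd_update(U, s, Vt, B, rank):
    """Rank-`rank` SVD of [A_k, B], given A_k = U @ diag(s) @ Vt (hypothetical helper)."""
    k, n = Vt.shape
    m = B.shape[1]
    # Orthonormal basis Q for span([U, B]); R holds the coordinates of U and B in Q.
    Q, R = np.linalg.qr(np.hstack([U, B]))
    # Small core matrix K such that [U @ diag(s), B] = Q @ K (scale the first k columns of R).
    K = R * np.concatenate([s, np.ones(m)])
    Us, ss, Vts = np.linalg.svd(K, full_matrices=False)
    # Rotate the enlarged left basis by the left singular vectors of K.
    U_new = Q @ Us[:, :rank]
    # Right factor: blockdiag(V_k, I_m) rotated by the right singular vectors of K.
    V_block = np.zeros((n + m, k + m))
    V_block[:n, :k] = Vt.T
    V_block[n:, k:] = np.eye(m)
    Vt_new = (V_block @ Vts[:rank].T).T
    return U_new, ss[:rank], Vt_new
```

Only the small matrix `K`, whose size depends on the current rank and the number of new columns, is decomposed; the data dimension enters only through the QR factorization and the final matrix products.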

EVD Dualdating.
In this section, we present a thorough discussion of the proposed dualdating algorithm for EVD. Dualdating means updating and downdating together; in other words, we consider the situation of adding and deleting samples simultaneously.
Given a data matrix $A = [a_1, a_2, \ldots, a_n] \in \mathbb{R}^{d \times n}$, its SVD is $A = U \Sigma V^T$. Let $S_A = A A^T \in \mathbb{R}^{d \times d}$ be the covariance matrix (in this section, we do not distinguish between the covariance matrix and the scatter matrix, since they differ only by the coefficient $1/n$); then $S_A$ is a symmetric positive semidefinite matrix, and its EVD is $S_A = U \Lambda U^T$, where $\Lambda = \Sigma^2$. The best rank-$k$ approximation of $S_A$ is
$$\mathrm{best}_k(S_A) = U_k \Lambda_k U_k^T, \tag{2}$$
where $U_k$ is the first $k$ columns of $U$ and $\Lambda_k$ is a diagonal matrix with the largest $k$ eigenvalues in $\Lambda$. For any matrix $A \in \mathbb{R}^{d \times n}$, we call (2) the rank-$k$ eigenvalue decomposition (rank-$k$ EVD) of $A$ or $S_A$. Now some old samples $D$ are to be deleted, where $D$ can be composed of arbitrary $r$ ($r < n$) columns of $A$. Without loss of generality, let $D$ be the last $r$ columns: $D = [a_{n+1-r}, \ldots, a_n] \in \mathbb{R}^{d \times r}$. Meanwhile, new instances $B$ are available: $B = [a_{n+1}, \ldots, a_{n+m}] \in \mathbb{R}^{d \times m}$. We are interested in how to express the rank-$k$ EVD of the final data matrix $A^\star = [a_1, \ldots, a_{n-r}, a_{n+1}, \ldots, a_{n+m}] \in \mathbb{R}^{d \times (n-r+m)}$ as modifications to $U_k$, $\Lambda_k$ via $D$ and $B$.

Basic Procedure.
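The basic procedure of the proposed EVD dualdating algorithm is as follows. Let
$$\Phi = \begin{bmatrix} I_n & & \\ & -I_r & \\ & & I_m \end{bmatrix}. \tag{3}$$
Thus, the covariance matrix of $A^\star$ can be written as
$$S_{A^\star} = A^\star A^{\star T} = A A^T - D D^T + B B^T = [A\ D\ B]\,\Phi\,[A\ D\ B]^T. \tag{4}$$
The basic idea of EVD dualdating is to transform the dualdating problem into an SVD updating problem plus an extra process with a small computational complexity. First, consider the matrix $A$, and assume that the right singular vectors in the rank-$k$ SVD of $A$ are $V_k$, so that $A_k = U_k \Sigma_k V_k^T$ with $\Sigma_k = \Lambda_k^{1/2}$. Then, the rank-$k$ SVD of $[A_k\ D\ B]$ can be calculated by the SVD updating algorithm:
$$[A_k\ D\ B] \approx \bar{U}_k \tilde{\Sigma}_k \bar{V}_k^T, \tag{5}$$
where $\bar{U}_k \in \mathbb{R}^{d \times k}$ and $\bar{V}_k \in \mathbb{R}^{(n+r+m) \times k}$ have orthonormal columns and $\tilde{\Sigma}_k$ is diagonal. Substituting (5) into (4), we have
$$S_{A^\star} \approx \bar{U}_k \tilde{\Sigma}_k \bar{V}_k^T \Phi \bar{V}_k \tilde{\Sigma}_k \bar{U}_k^T. \tag{6}$$
Let
$$\Psi = \tilde{\Sigma}_k \bar{V}_k^T \Phi \bar{V}_k \tilde{\Sigma}_k. \tag{7}$$
Because $A^\star A^{\star T}$ is a symmetric positive semidefinite matrix, $\Psi$ is also symmetric positive semidefinite. Usually $k \ll d$, so $\Psi$ is a small matrix compared to the covariance matrix of $A^\star$. The EVD of $\Psi$ is
$$\Psi = P \Lambda_k^\star P^T, \tag{8}$$
where $P$ is orthogonal and $\Lambda_k^\star$ is diagonal. Hence,
$$S_{A^\star} \approx U_k^\star \Lambda_k^\star U_k^{\star T}, \qquad U_k^\star = \bar{U}_k P. \tag{9}$$
By (3) to (9), we have successfully converted the dualdating problem of EVD into an SVD updating problem of adding $r + m$ samples plus an EVD of a small $k \times k$ matrix.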

Further Simplification.
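Although the basic procedure of our EVD dualdating algorithm has been given, one problem remains unsolved: the assumed right singular vectors $V_k$ are unobtainable. Here we address this problem by simplifying the computation of $\Psi$.
Denote the rank-$k$ SVD of $[A_k\ D\ B]$ by $\bar{U}_k \tilde{\Sigma}_k \bar{V}_k^T$, and consider the results of the SVD updating algorithm:
$$\bar{U}_k = [U_k\ Q]\,\tilde{U}_k, \qquad \bar{V}_k = \begin{bmatrix} V_k & 0 \\ 0 & I_{r+m} \end{bmatrix} \tilde{V}_k, \tag{10}$$
where $QR = (I - U_k U_k^T)[D\ B]$ is the QR decomposition of the reconstruction error and $\tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^T$ is the rank-$k$ SVD of the small matrix $\begin{bmatrix} \Sigma_k & U_k^T [D\ B] \\ 0 & R \end{bmatrix}$. Substituting (10) into (7) and using $V_k^T V_k = I_k$, we have
$$\Psi = \tilde{\Sigma}_k \tilde{V}_k^T \begin{bmatrix} I_k & & \\ & -I_r & \\ & & I_m \end{bmatrix} \tilde{V}_k \tilde{\Sigma}_k. \tag{11}$$
From (11), it can be seen that the right singular vectors $V_k$ are actually not needed, and the computation of $\Psi$ is simplified.
Algorithm 2: EVD dualdating. Input: the rank-$k$ EVD of the old data $A \in \mathbb{R}^{d \times n}$, $U_k$, $\Lambda_k$; the deleted data $D \in \mathbb{R}^{d \times r}$; the added data $B \in \mathbb{R}^{d \times m}$. Output: the rank-$k$ EVD of the remaining data.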

Algorithm.
The detailed procedure of EVD dualdating has been presented above. To sum up, the pseudocode of our EVD dualdating algorithm is described in Algorithm 2. To achieve pure updating or pure downdating of the EVD, it suffices to let the deleted set or the added set be empty, respectively. From the computation process, it can be seen that EVD dualdating degenerates into the standard SVD updating in the updating mode.
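The procedure can be sketched in NumPy as follows. This is a hedged illustration rather than a transcription of Algorithm 2: the function name `evd_dualdate`, the argument names `D` (deleted columns) and `B` (added columns), and the sign vector `phi` are assumptions made for this example. The sketch also differs from the description above in one respect, which is labeled in the code: it keeps all components of the small SVD and truncates only once, after the EVD of the small matrix.

```python
import numpy as np

def evd_dualdate(U_k, lam_k, D, B, rank):
    """Rank-`rank` EVD of A A^T - D D^T + B B^T from the rank-k EVD (U_k, lam_k) of A A^T."""
    k, r, m = U_k.shape[1], D.shape[1], B.shape[1]
    # Step 1: treat both deleted and added columns as added (SVD updating).
    Q, R = np.linalg.qr(np.hstack([U_k, D, B]))
    sigma = np.sqrt(np.clip(lam_k, 0.0, None))        # singular values of A_k
    K = R * np.concatenate([sigma, np.ones(r + m)])   # [U_k diag(sigma), D, B] = Q @ K
    Us, ss, Vts = np.linalg.svd(K, full_matrices=False)
    # Step 2: flip the sign of the contribution of the deleted columns.
    phi = np.concatenate([np.ones(k), -np.ones(r), np.ones(m)])
    M = ss[:, None] * Vts                             # Sigma~ V~^T
    Psi = (M * phi) @ M.T                             # Sigma~ V~^T Phi V~ Sigma~
    # Step 3: EVD of the small matrix, then rotate the basis.
    lam, P = np.linalg.eigh(Psi)
    order = np.argsort(lam)[::-1][:rank]              # assumption: truncate only here
    return Q @ Us @ P[:, order], lam[order]
```

Passing an array with zero columns for `D` or `B` gives the pure updating or pure downdating mode. Neither the original data matrix nor its right singular vectors appear among the inputs.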

Analysis of EVD Dualdating.
In this section, we first analyze the mechanism of EVD dualdating for incremental and decremental learning. Second, some extended forms of EVD dualdating are given for particular uses. Third, the computational complexity of the proposed EVD dualdating algorithm is presented. Fourth, the optimality of EVD dualdating in sequential usage is demonstrated. Finally, we discuss how to determine the optimal rank $k$.

Mechanism of Incremental and Decremental Learning.
For convenience, when analysing the essence of incremental and decremental learning based on EVD dualdating, we only consider the pure updating or downdating situation and denote the changed matrix as $X \in \mathbb{R}^{d \times c}$ in both situations.
According to the procedure of EVD dualdating, the two key decompositions are the SVD updating with the equivalent added matrix and the EVD of the small matrix $\Psi$. In the SVD updating algorithm, the core step is the rank-$k$ SVD of the following matrix:
$$K = \begin{bmatrix} \Sigma_k & U_k^T X \\ 0 & R \end{bmatrix}, \tag{12}$$
where $\Sigma_k$ is the diagonal variance matrix of the original data $A$, $U_k^T X$ is the coefficient matrix obtained by projecting $X$ onto the subspace spanned by $U_k$, and $R$ is the upper triangular reconstruction error matrix of $X$. It follows that, in $K$, the left $k$ columns represent the original data, and the remaining $c$ columns represent the added or deleted data.
Then, divide the rows of $\tilde{V}_k$, the right singular vectors of $K$, into two partitions:
$$\tilde{V}_k = \begin{bmatrix} \tilde{V}_k^1 \\ \tilde{V}_k^2 \end{bmatrix}, \tag{13}$$
where $\tilde{V}_k^1$ are the first $k$ rows of $\tilde{V}_k$ and $\tilde{V}_k^2$ are the last $c$ rows of $\tilde{V}_k$. Thus, $\tilde{V}_k^1$ and $\tilde{V}_k^2$ stand for the old data and the changed data, respectively.
So the matrix $\Psi$ can be written as
$$\Psi = \tilde{\Sigma}_k \tilde{V}_k^{1T} \tilde{V}_k^1 \tilde{\Sigma}_k \pm \tilde{\Sigma}_k \tilde{V}_k^{2T} \tilde{V}_k^2 \tilde{\Sigma}_k, \tag{14}$$
with the plus sign for addition and the minus sign for deletion. Now, let us observe the situation from the geometric view shown in Figure 1. On the left is the column space of the data. The red arrows represent the orthogonal basis $U_k$ of the old subspace. The green arrow is the added or deleted samples $X$, whose projection on $U_k$ and reconstruction error are the green dashed arrows. After QR decomposition, a new basis of the extended subspace is made up of the red and pink arrows. However, the projection of the data matrix on this basis is not completely diagonalized. So, the SVD of the coefficient matrix $K$ on this basis is executed to obtain the diagonalizing matrix. Then, the new orthogonal basis after adding $X$ is represented by the blue arrows. At this time, the row space of $A$ is drawn on the right of the figure. The black and pink arrows compose a standard orthogonal basis, where the black ones are the elements corresponding to old samples, and the green one is the elements corresponding to changed samples. The blue arrow represents the orthogonal vectors $\tilde{V}_k \tilde{\Sigma}_k$ in the row space of $A$ after adding $X$ via SVD updating. Because EVD dualdating first adds the samples $X$ regardless of whether the case is incremental or decremental, an adjustment is needed in the row space of $A$. In the case of deletion, the elements corresponding to the changed samples are sign-changed. As shown in Figure 1, the component marked by the cyan dashed arrow is reversed. From (14), $\Psi$ is in fact the sum or difference of the autocorrelation matrices of the old data $\tilde{V}_k^1 \tilde{\Sigma}_k$ and the changed data $\tilde{V}_k^2 \tilde{\Sigma}_k$. According to the relationship between the column and row spaces, the EVD of $\Psi$ is utilized to acquire the new rotation matrix $P$ in the data space. Finally, the resulting orthogonal basis $U_k^\star$ is shown by the orange arrows.

Figure 1. Panels: column space of data (left); row space of data (right).

Mathematical Problems in Engineering
To sum up, the aim of EVD dualdating is to obtain the projection matrix resulting from the change in the sample set, and the essence of EVD dualdating is to transform the EVD of a varying covariance matrix in the data space into the EVD of a varying autocorrelation matrix in a dimension-reduced row space.

Extendibility of EVD Dualdating.
From the deduction of EVD dualdating, it can be seen that nearly no restriction is imposed on the deleted data $D$, the added data $B$, and $\Phi$. In the downdating mode, the procedure is still feasible even if the columns of $D$ are not columns of $A$.
Meanwhile, $\Phi$ can be selected as any matrix of matching dimension. The only condition that needs to be satisfied is that $\Psi$ must be a positive semidefinite matrix. In other words, our EVD dualdating algorithm has a favorable extendibility.
The standard dualdating mode of EVD dualdating is adding and deleting samples synchronously. As mentioned before, when the deleted set or the added set is empty, EVD dualdating works in the pure incremental or decremental mode. When the deleted and added sets have the same size, EVD dualdating can be seen as data revising. Another useful extension is forgetting updating, or the so-called weighted updating, which is very important for online applications. In the learning procedure, prior instances should be assigned a low weight since they become outdated as time goes on. Without a proper weighting mechanism, the contribution of many old, similar samples can become so prominent that new instances seem meaningless. In [16], a forgetting factor is used to weaken the effect of old images of the tracked object. Via EVD dualdating, a concise but fine-grained weighting formula can be acquired, in which the weight of an arbitrary sample can be modulated, similar to the approach adopted in [13]. The sole operation is modifying $\Phi$, and weighted updating then corresponds to an equivalent data matrix. The detailed extensions of EVD dualdating are listed in Table 1.

Computation Complexity of EVD Dualdating.
Before the analysis, we define some notation to simplify the presentation: QR($p$, $q$) and SVD($p$, $q$) stand for the QR and SVD decompositions of a $p \times q$ rectangular matrix, and QR($p$), SVD($p$), and EVD($p$) stand for the QR, SVD, and EVD decompositions of a $p \times p$ square matrix. The computation of the proposed EVD dualdating algorithm is composed of four parts: the QR decomposition in the SVD updating step, the SVD of the small coefficient matrix, the EVD of $\Psi$, and other multiplication operations, including calculating the reconstruction error, $\Psi$, and $U_k^\star$. Because the first $k$ columns of the matrix passed to the QR decomposition are already orthonormal, the decomposition actually operates only on the columns contributed by the deleted and added samples. The computational complexity is presented in Table 2, where the computation is divided into two parts: matrix decomposition and transformation cost.
In the pure updating or downdating mode, there are two matrix decompositions in our EVD dualdating algorithm, one more than in other pure updating and downdating methods. This may make EVD dualdating slower than other methods. However, taking the dimension and the transformation cost into account, the efficiency of EVD dualdating is close to, or even better than, that of other methods. The main advantage of our algorithm shows in the dualdating mode. As the only method achieving simultaneous updating and downdating, EVD dualdating can avoid many repeated processes and decrease the cumulative error. An experimental comparison of the efficiency and accuracy of EVD dualdating and other incremental and decremental methods is presented in Section 5.

Justification of the Sequential Usage of $U_k$ and $\Lambda_k$.
In many online applications, it is impossible to store the original data because of the limitations of the physical medium and considerations of efficiency. In mathematical terms, this means that the original data matrix $A$ is unobtainable and is replaced by its best rank-$k$ approximation, which is represented by $U_k$ and $\Lambda_k$. So, it is necessary to demonstrate the effectiveness of EVD dualdating in a sequential process.
Zha and Simon [18] prove that when the combined matrix satisfies the low-rank-plus-shift structure, SVD updating remains optimal when $A$ is replaced by its best rank-$k$ approximation. Here, a theoretical demonstration is given to show that if the whole data matrix satisfies the low-rank-plus-shift structure, the result of EVD dualdating after any adding or deleting operations is also an optimal rank-$k$ estimate. First, we state Lemma 1 without proof.
The lemma above indicates that, for the rank-$k$ EVD, it is safe to cut off the minor eigenspaces without affecting the optimality. With this, we show that under the low-rank-plus-shift structure, when $A$ is replaced by $\mathrm{best}_k(A)$, the information discarded would also be discarded after EVD dualdating. The conclusion is summarized in Theorem 2, whose proof can be found in the Supplementary Material (available online at http://dx.doi.org/10.1155/2014/429451).
Theorem 2. Given a matrix  ∈ R × , with its best- approximation Â, the deleted data  ∈ R × from , the added data  ∈ R × ,  >  > .Let  = [ ] be the remained data from ,  = [  ] the full data, and  = [ ] the final data, where the underline means deletion.Let Ĉ = [best  () ] be the remained matrix after deleting columns corresponding to  from 's best- approximation, and let Ê = [ Ĉ ] be the final data from Â. Assume that  satisfies the low-rank-plus-shift structure; that is, where   is symmetric and positive semidefinite with rank(  ) = ; then best  (  ) = best  ( Ŝ ) .
3.5. Criterion for the Optimal Rank $k$ Selection.
In the deduction above, the rank of the subspace is assumed to be a fixed number $k$; however, the optimal dimension of the subspace depends on a priori information, which is possibly unknown in practical applications. Based on the fact that the bulk of the variability of a given dataset can be captured by the top few eigenvectors, we introduce an eigenvalue-based method to determine the best rank $k$ of the subspace during the online learning procedure. Supposing the truncation operation is not yet executed in steps 4 and 6 of Algorithm 2, the rank of the obtained $U^\star$ and $\Lambda^\star$ is $k$ plus the numbers of deleted and added samples. First, we define the rate $\rho_i$:
$$\rho_i = \frac{\sum_{j=1}^{i} \lambda_j^\star}{\sum_{j} \lambda_j^\star},$$
where $\lambda_j^\star$ are the eigenvalues in $\Lambda^\star$ in descending order and the sum in the denominator runs over all of them, so that $\rho_i$ indicates the proportion of the first $i$ eigenvalues in all eigenvalues. Then, the best dimension $k$ can be selected as the minimum number $i$ for which the rate reaches some threshold:
$$k = \min \{\, i : \rho_i \ge T \,\},$$
where the threshold $T$ is a value in (0, 1). In the batch mode, the threshold only depends on the proportion of information to be preserved. For EVD dualdating, because the estimation of the rank $k$ and the truncation are performed in every round, the threshold is related to the ratio of saved and new information.
In practical implementation, it can be chosen according to the chunk size of the added and deleted samples.
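The eigenvalue-based rank selection can be sketched as follows. The function name `select_rank` and the convention that the rate must reach (rather than strictly exceed) the threshold are assumptions of this example.

```python
import numpy as np

def select_rank(eigvals, threshold):
    """Smallest i whose top-i eigenvalues hold at least `threshold` of the total.

    Assumes at least one eigenvalue is positive and 0 < threshold < 1.
    """
    # Sort in descending order; clip tiny negative values caused by rounding.
    lam = np.sort(np.clip(np.asarray(eigvals, dtype=float), 0.0, None))[::-1]
    rate = np.cumsum(lam) / lam.sum()       # proportion held by the first i eigenvalues
    # First index where the rate reaches the threshold, converted to a count.
    i = int(np.searchsorted(rate, threshold)) + 1
    return min(i, lam.size)
```

In an online setting this function would be called on the untruncated eigenvalues of each round, before the subspace is cut back to the selected rank.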

Incremental Decremental PCA Based on EVD Dualdating
In the deduction of EVD dualdating in Section 3.2, the mean of the samples is not considered, but in practical applications, centralization is a necessary process to reduce the effect of the environment. In this section, we first provide a brief review of PCA. Then, we take the mean into account and propose three online subspace learning algorithms: EVDD-IPCA, EVDD-DPCA, and EVDD-IDPCA.

Principal Component Analysis.
Principal component analysis (PCA) is one of the most popular multivariate analysis and dimension reduction methods. The goal of PCA is to find an orthonormal basis, the so-called principal components, which has the best reconstruction performance in the sense of minimum mean squared error (MMSE). Given a data matrix $A = \{a_1, a_2, \ldots, a_n\} \in \mathbb{R}^{d \times n}$ with mean $\mu_A$, the covariance matrix of $A$ is defined by $S_A = (1/n) \sum_{i=1}^{n} (a_i - \mu_A)(a_i - \mu_A)^T$. Principal components (PCs) are the first $k$ eigenvectors $\{u_1, \ldots, u_k\}$ corresponding to the largest $k$ eigenvalues $\{\lambda_1, \ldots, \lambda_k\}$. Let $U = \{u_1, \ldots, u_k\}$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$; then $U$ and $\Lambda$ can be obtained by the EVD of the covariance matrix, $S_A = U \Lambda U^T$. Another way of solving PCA is to compute the SVD of the centralized data matrix, $A - \mu_A 1_n = U \Sigma V^T$, where $1_n$ stands for a $1 \times n$ all-ones row vector, each column of the left singular vector matrix $U$ is a principal component, and $\Sigma = \sqrt{\Lambda}$ is the singular value matrix (up to the factor $\sqrt{n}$ that comes from the coefficient $1/n$ in $S_A$).
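The two batch-mode routes can be compared with a short NumPy sketch. The function names `pca_evd` and `pca_svd` are hypothetical; samples are stored as columns, and the covariance matrix carries the coefficient $1/n$, so the eigenvalues equal the squared singular values divided by $n$.

```python
import numpy as np

def pca_evd(X, k):
    """Top-k principal components via the EVD of the covariance matrix."""
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu                                  # centralize the columns
    S = Xc @ Xc.T / X.shape[1]                   # d x d covariance matrix
    lam, U = np.linalg.eigh(S)                   # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:k]
    return U[:, idx], lam[idx], mu

def pca_svd(X, k):
    """Top-k principal components via the SVD of the centralized data matrix."""
    mu = X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(X - mu, full_matrices=False)
    return U[:, :k], s[:k] ** 2 / X.shape[1], mu  # eigenvalue = s^2 / n
```

Both functions return the same subspace and eigenvalues up to rounding and the sign of each eigenvector. The EVD route decomposes a $d \times d$ matrix, whereas the SVD route works directly on the $d \times n$ data matrix.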

Incremental and Decremental PCA.
When confronting a huge dataset with a high dimension, both batch-mode methods, whether based on EVD or SVD, cost tremendous time and storage. Besides, an online learning system has to face the awkward circumstance that not all the instances are available before training, or that some expired instances need to be deleted after training. Obviously, these problems exceed the ability of batch-mode PCA. Incremental and decremental PCA is a natural solution.
In this section, we consider EVD dualdating with a time-varying mean and derive the incremental decremental PCA formulas based on EVD dualdating. As mentioned before, EVD dualdating degenerates into SVD updating without right singular vectors in the updating mode, so EVDD-IPCA is actually the same as the extended sequential KL algorithm. Nonetheless, we still present it in this paper for completeness. The interested reader can find more details in [16].
The key idea of the EVDD-based incremental and decremental PCA algorithms is to centralize the original samples, the added samples, and the deleted samples separately, and to utilize some mean-revising vectors to keep the covariance matrix equal to the original one. The methods of determining these mean-revising vectors are introduced in Lemmas 3, 4, and 5. For incremental or decremental PCA, there is only one mean-revising vector, called the equivalent added vector or the equivalent deleted vector, respectively, which is proportional to the difference between the original mean and the changed sample mean. For incremental decremental PCA, the situation is a little more complex. Because of the existence of cross terms, there are three mean-revising vectors: two equivalent added vectors and one equivalent deleted vector. Based on these lemmas, the proposed EVDD-IPCA, EVDD-DPCA, and EVDD-IDPCA algorithms are presented in Algorithms 3, 4, and 5.
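The role of a single mean-revising vector in the pure incremental and pure decremental cases can be checked numerically. The sketch below uses the standard scatter-matrix identities for merging and splitting sample sets; the scaling factors and the function names are assumptions of this example and are not quoted from Lemmas 3 to 5, and the three-vector case of simultaneous adding and deleting is not covered.

```python
import numpy as np

def scatter(X):
    """Scatter matrix of the columns of X about their own mean."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return Xc @ Xc.T

def equivalent_added_vector(mu_old, n, mu_new, m):
    """Extra column b with scatter([A, B]) = scatter(A) + scatter(B) + b b^T."""
    return np.sqrt(n * m / (n + m)) * (mu_new - mu_old)

def equivalent_deleted_vector(mu_old, n, mu_del, r):
    """Extra column d with scatter(A without D) = scatter(A) - scatter(D) - d d^T.

    Requires r < n, i.e. at least one sample remains.
    """
    return np.sqrt(n * r / (n - r)) * (mu_del - mu_old)
```

With these vectors, a change of mean turns into adding or deleting one extra column in a zero-mean problem, so the separately centralized blocks can be passed to a zero-mean updating routine.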

Remark 6.
As an important approach to dimension reduction, PCA is used as a preprocessing step for many other machine learning methods and as a feature extraction method in other applications. Because these methods usually work in the PCA subspace, there is a great demand for simultaneous online incremental decremental subspace learning and data reconstruction. Artac et al. [24] propose a method to sequentially compute the coefficients of a sample in IPCA. Here, we introduce an incremental approach that updates the projection coefficients of a data point after the subspace is renewed via EVDD-IDPCA, without storing the original data. For any sample, given the eigenvectors and the mean at the time it is added to the dataset, its reconstruction is the product of the eigenvector matrix and its projection coefficients, plus the mean, where the projection coefficients are the coordinates of the sample on the eigenvector basis. Then, at each round of EVDD-IDPCA, the projection coefficients of the sample can be updated by (24), in which the matrix applied to the old coefficients is built from the leading rows of $\tilde{U}$. It is worth noticing that in (24) this matrix is an intermediate variable of EVD dualdating, and the term involving the difference between the old and new means only needs to be computed once for all samples. The cost of updating the coefficients is therefore small, $O(k^2)$, while the memory needed to store a data point is reduced from the order of the data dimension to the order of the subspace dimension $k$.
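The idea of updating stored coefficients without the original sample can be sketched as a change of basis. Equation (24) is not reproduced here, so the code below is only the generic relation, under the assumption that old and new bases have orthonormal columns; `U_old`, `U_new`, `mu_old`, `mu_new`, `M`, and `b` are hypothetical names, and the authors' form may differ because it reuses an intermediate matrix of EVD dualdating.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 50, 4                                    # illustrative sizes

def random_basis(rng, d, k):
    """Random d x k matrix with orthonormal columns."""
    Q, _ = np.linalg.qr(rng.normal(size=(d, k)))
    return Q

U_old, mu_old = random_basis(rng, d, k), rng.normal(size=d)
U_new, mu_new = random_basis(rng, d, k), rng.normal(size=d)

x = rng.normal(size=d)
c_old = U_old.T @ (x - mu_old)                  # k numbers stored instead of x

# Quantities shared by every stored sample in this round.
M = U_new.T @ U_old                             # k x k matrix
b = U_new.T @ (mu_old - mu_new)                 # length-k mean-shift term

c_new = M @ c_old + b                           # O(k^2) work per sample

# Reference: project the old reconstruction onto the new subspace directly.
x_hat = U_old @ c_old + mu_old
c_ref = U_new.T @ (x_hat - mu_new)
```

Only `c_old` is kept per sample, so the storage per data point is k numbers, and the shared terms `M` and `b` are computed once per round.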

Experiment
In this section, experiments with the proposed algorithms based on EVD dualdating are presented and compared with other classic methods. Because incremental PCA has been discussed extensively in the earlier literature and the proposed EVDD-IPCA is equivalent to the extended sequential KL algorithm, we do not evaluate IPCA methods in this paper. The interested reader can find the performance analysis and comparison in related papers [12,15,16,20]. In the following, decremental PCA and incremental decremental PCA experiments on real-world datasets are reported first; then, an adaptive rank selection experiment with EVD dualdating on an artificial dataset is presented. All experiments are performed with Matlab on a computer with a dual-core 2.0 GHz CPU and 4 GB RAM.

DPCA Experiment: Performance Evaluation on Real-World Data.
In order to verify the performance and efficiency of the proposed EVDD-DPCA and EVDD-IDPCA, four datasets are used, covering both high-dimensional and large-scale cases. The FERET [25] database is a standard dataset for evaluating facial recognition systems, managed by DARPA and NIST. The AR [26] dataset is a popular face image dataset whose images are taken under different facial expressions, illumination conditions, and partial occlusions by sunglasses and scarves. The Yale Face Database B (Yale B) [27] contains 5760 single-light-source images of 10 subjects, each seen under 576 viewing conditions (9 poses × 64 illumination conditions). Subsets of AR, FERET, and Yale B are employed in our simulation, which include 952, 720, and 4050 cropped and centered face images, respectively. The fourth database is the Columbia Object Image Library (COIL-100) [28], which includes 7200 color images of 100 objects, taken at pose intervals of 5 degrees, corresponding to 72 poses per object. Detailed information on the four datasets and our experiment settings is listed in Table 3.
To evaluate decremental learning, we compare the proposed EVDD-DPCA algorithm with batch-mode PCA, MSES [12], MSVD [21], DCSSVD [22], and AIPCA [23]. First, the whole dataset is learned via batch-mode PCA; then, assuming that some classes are expired data, their samples are deleted chunk by chunk. In our experiment, the number of expired classes is 40 for FERET, 39 for AR, 30 for Yale B, and 40 for COIL-100, and the chunk size is 10. Every experiment is repeated 20 times to reduce the disturbance from operating system process scheduling and randomized grouping. The methods are evaluated mainly by efficiency, accuracy of the eigenspace, and face recognition performance: (i) execution time; (ii) weighted angle between the PCs of batch-mode PCA and those of the DPCA methods, in which the angle between each pair of corresponding PCs is weighted by the corresponding batch-mode eigenvalue; (iii) recognition rate.
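A weighted-angle metric of this kind can be sketched as follows. The exact formula is not reproduced in this text, so the code assumes one plausible form, the eigenvalue-weighted average of the angles between matching PCs; the function name `weighted_angle` and the normalization by the sum of eigenvalues are assumptions.

```python
import numpy as np

def weighted_angle(U_ref, U_est, eigvals):
    """Eigenvalue-weighted mean angle (radians) between matching columns."""
    U_ref = U_ref / np.linalg.norm(U_ref, axis=0)
    U_est = U_est / np.linalg.norm(U_est, axis=0)
    # Absolute value makes the angle invariant to the sign of each PC.
    cosines = np.abs(np.sum(U_ref * U_est, axis=0))
    angles = np.arccos(np.clip(cosines, 0.0, 1.0))
    w = np.asarray(eigvals, dtype=float)
    return float(np.sum(w * angles) / np.sum(w))

I = np.eye(3)
# Identical bases give a zero weighted angle.
same = weighted_angle(I[:, :2], I[:, :2], [3.0, 1.0])
# Second PC replaced by an orthogonal direction: only the weight-1 term errs.
swapped = weighted_angle(I[:, :2], I[:, [0, 2]], [3.0, 1.0])
```

Because the leading eigenvalue carries most of the weight, an error in a minor PC moves the score far less than the same error in the first PC, which is the motivation given for weighting.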

Computational Efficiency by the Subspace Dimension (10-200).
Recalling the analysis of computation complexity in Section 3.3, the practical computational efficiency depends on the dimension of the small matrix to be decomposed and on the cost of the transformation. From Table 2, DCSSVD has a larger matrix decomposition cost (one QR and one SVD), and AIPCA has a larger transformation cost. MSES, MSVD, and our EVDD have similar computation complexities: MSES needs one additional SVD to extract the eigenspace model of the deleted samples before subtracting; MSVD has two QR decompositions of the residual matrices, in the row and column spaces, as well as a larger transformation cost; EVDD has one additional EVD to transform updating into downdating. The exact expressions are listed in Table 2. Therefore, when the data dimension is high or the dataset is huge relative to the other parameters, DCSSVD and AIPCA are less efficient, while MSES, MSVD, and our EVDD achieve similar, higher efficiencies. This conclusion is also supported by Figures 2(a), 2(b), 2(c), and 2(d), which show the execution time against the number of kept PCs (10-200) for MSES, DCSSVD, MSVD, AIPCA, and EVDD-DPCA on FERET, AR, Yale B, and COIL-100. From these figures, we observe that our proposed EVDD-DPCA achieves better or comparable efficiency.

PC Estimation Equality to Ground-Truth PCs.
In order to evaluate the accuracy, the angles between the resulting PCs of the DPCA methods and those of batch-mode PCA can be adopted. We instead use angles weighted by the corresponding eigenvalues, which are more suitable for evaluation because they emphasize the importance of the leading PCs. From Figures 3 and 4, our proposed EVDD-DPCA algorithm achieves the most accurate eigenvector estimation. The accuracy of the principal directions depends on the estimate of the mean and on the cut-off error. An error in the mean biases the origin used for centering the data, which in the worst case may make the directions of the resulting basis totally different. The cut-off error accumulates over the sequential process, so the more often truncation happens, the less accurate the final result is. The mean is updated in the same way in MSES and EVDD-DPCA, and the estimate equals the true mean. In DCSSVD, the new mean is updated via the right singular vectors $V$. However, $V$ is truncated to the reduced dimension, so its estimate of the mean is not accurate. This inaccuracy does not affect its computation of the singular vectors, because the mean-correction term is stripped off and no data centering is executed. So the errors of the singular vectors in EVDD-DPCA, MSES, and DCSSVD mainly come from the cut-off error. Before splitting, MSES calculates the EVD of the deleted data, whose result is truncated to the kept dimension; this step introduces more cut-off error. DCSSVD directly manipulates the right singular vectors $V$ to achieve downdating, so it effectively ignores the information about the deleted samples carried by higher-order PCs. In AIPCA and MSVD, the mean is not updated, so all the remaining samples are centered with the old mean. Therefore, their results deviate far from the true PCs.

Results of Recognition with Minimum Distance Classifier.
In the recognition experiment, the resulting PCs are used as the projection matrix to project each testing image onto the subspace, and then a minimum distance classifier (MDC) is used for recognition. The advantage of MDC in our online application is that only the mean of each class in the projection subspace needs to be saved. The distance between a sample and a class $\Omega$ in MDC is defined as a Mahalanobis distance between the projection vector of the sample in the subspace and the mean of the class $\Omega$ in the subspace, scaled by the eigenvalue matrix estimated by EVD dualdating. Figures 5(a), 5(b), 5(c), and 5(d) present the recognition rates of full-data PCA, batch-mode PCA, and the DPCA methods. The results show that full-data PCA has a lower recognition rate because of the expired instances, and all DPCA methods have similar recognition rates, nearly equal to that of batch-mode PCA. Similar results are also obtained by Ozawa et al. [29]. This phenomenon can be explained via random projection (RP) [30]. According to the Johnson-Lindenstrauss lemma [31], an arbitrary set of $n$ points in a high-dimensional Euclidean space can be mapped onto a subspace of dimension $k \ge O(\log n / \epsilon^2)$ ($0 < \epsilon < 1$) in which the distances between all pairs of points are approximately preserved. So as long as $k$ is large enough, for an arbitrary $k$-dimensional random projection, the classification performance is mainly determined by MDC and the structure of the data space itself. In our experiments, the smallest such $k$ lies in the range [40, 60] on FERET, AR, and Yale B, and is about 100 on COIL-100.
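A minimum distance classifier of this kind can be sketched in a few lines. The distance equation is not reproduced in this text, so the code assumes the standard squared Mahalanobis form with a diagonal eigenvalue matrix; `mdc_predict`, the class labels, and the toy numbers are hypothetical.

```python
import numpy as np

def mdc_predict(z, class_means, eigvals):
    """Return the label whose subspace mean is nearest to z (Mahalanobis)."""
    inv = 1.0 / np.asarray(eigvals, dtype=float)   # inverse of a diagonal matrix
    best_label, best_dist = None, np.inf
    for label, mean in class_means.items():
        diff = z - mean
        dist = float(diff @ (inv * diff))          # squared Mahalanobis distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Only one mean per class is stored, already expressed in the subspace.
class_means = {"a": np.array([0.0, 0.0]), "b": np.array([4.0, 1.0])}
eigvals = np.array([16.0, 1.0])                    # variance along each PC
z = np.array([2.5, 0.0])                           # projected test sample
label = mdc_predict(z, class_means, eigvals)
```

In this toy case the sample is assigned to class "a" even though class "b" is closer in plain Euclidean distance, because deviations along the high-variance first PC are discounted.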

IDPCA Experiment: Performance Evaluation on Real-World Data.
To evaluate incremental decremental subspace learning, we compare the proposed EVDD-IDPCA algorithm with batch-mode PCA, MSES [12], MSVD [21], DCSSVD [22], and AIPCA [23]. Because DCSSVD only performs decremental PCA, we combine it with the extended SKL to achieve IDPCA. As AIPCA is a decremental version of SVDU-IPCA, it is paired with SVDU-IPCA to perform IDPCA in our experiment.
The datasets for IDPCA are the same as in the DPCA experiment, and the configuration is shown in Table 4. In our experiment, samples of the pretraining classes are learned by batch-mode PCA; then, at every round, a chunk of samples in expired classes is deleted while a chunk of samples in new classes is added. The chunk size is 10. The training/testing ratio is the same as in the DPCA experiment.
Running incremental and decremental learning separately repeats pre-processing, post-processing, and some matrix decompositions. Therefore, as shown in Table 2, via the dualdating scheme, EVD dualdating has a more concise form with fewer matrix decompositions and a lower transformation cost. So in our experiment, the proposed EVDD-IDPCA is much more efficient than the other methods, especially when the dataset is large.
As the only genuine incremental decremental PCA method with an accurate mean estimate and a dualdating scheme, EVDD-IDPCA obtains principal eigenvectors with a better approximation than the other methods by avoiding redundant cut-off error. Figures 7 and 8 show that the estimation of the leading PCs in EVDD-IDPCA is significantly superior to that of the competing methods. Besides, one important advantage of the EVDD-based methods, not reflected by these DPCA and IDPCA experiments, is that the specific position information of the deleted and added samples is not needed, whereas it is necessary for DCSSVD, AIPCA, and MSVD.

Automatic Rank 𝑘 Selection and Weighted EVD Dualdating.
In this experiment, the selection of the subspace dimension $k$ without any prior knowledge is evaluated. An artificial dataset is used, whose data points are generated from a model consisting of a matrix, a coefficient vector, and a small noise term sampled from a zero-mean normal distribution. In the simulation, the data dimension is 100 and the number of generated samples is 10000. The samples are then learned sequentially with different chunk sizes (5, 10, 20) by our EVD dualdating and weighted EVD dualdating algorithms. In weighted EVD dualdating, the first weight is set to 0.95. In every round, the number of kept PCs is determined by (20), and the thresholds of the preserved proportion are 0.98, 0.97, and 0.96 for chunk sizes 5, 10, and 20, respectively. Figure 10 shows how the kept rank $k$ evolves during online learning, where the solid lines stand for weighted EVD dualdating and the dashed lines for EVD dualdating. In this figure, the kept ranks of all curves quickly rise from the chunk size to 50-60 at the beginning, which means that new features have been added to the eigenspace. Then, the ranks of weighted EVD dualdating tend to a common stable value of 53. It is worth noting that the red solid line, with the smallest chunk size 5, converges fastest, and the blue solid line, with the largest chunk size 20, converges slowest. For normal EVD dualdating, because the influence of the leading PCs is not weakened, it becomes less receptive to new features as online learning progresses and later excludes minor PCs. Therefore, the kept ranks, shown by the dashed lines, all decrease quickly. For example, the blue dashed line (chunk = 20) ends with a rank below 30 after all samples are learned.
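An eigenvalue-based rank selection with a preserved-proportion threshold can be sketched as below. Equation (20) is not reproduced in this text, so the rule shown, keeping the smallest number of leading eigenvalues whose cumulative share reaches the threshold, is an assumed form; `select_rank` and the toy eigenvalues are hypothetical.

```python
import numpy as np

def select_rank(eigvals, threshold):
    """Smallest k whose leading eigenvalues hold at least `threshold` of the total."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # descending order
    ratio = np.cumsum(lam) / np.sum(lam)                    # preserved proportion
    k = int(np.searchsorted(ratio, threshold, side="left")) + 1
    return min(k, lam.size)                                 # never exceed the spectrum

eigvals = [5.0, 3.0, 1.0, 1.0]          # cumulative shares: 0.5, 0.8, 0.9, 1.0
k_loose = select_rank(eigvals, 0.75)
k_tight = select_rank(eigvals, 0.98)
```

A higher threshold keeps more PCs, which is consistent with using the largest threshold for the smallest chunk size in the setup described above.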

Conclusion
Mathematical Problems in Engineering

Different from previous works, the proposed EVD dualdating algorithm can renew the EVD of a data matrix while adding and deleting samples simultaneously. With EVD dualdating, EVDD-IPCA, EVDD-DPCA, and EVDD-IDPCA are presented to handle a changing mean, where the variation of the mean is equivalent to adding and deleting several additional vectors in the zero-mean PCA case. Extensive comparative experiments on both real-world and artificial datasets demonstrate that our EVD dualdating algorithm has significantly better approximation accuracy and computational efficiency than other state-of-the-art incremental and decremental PCA methods.

Figures 3(a), 3(b), 3(c), and 3(d) show the weighted angles of the first 50 PCs of the DPCA methods on the four datasets, when the number of kept PCs is 100 and the chunk size is 10. Figures 4(a), 4(b), 4(c), and 4(d) show the weighted angle errors of the first 50 PCs of the DPCA methods on the different datasets against the number of kept PCs (10-200), when the chunk size is 10.
In Figures 3(a), 3(b), 3(c), and 3(d), it can be seen that the weighted angle of our proposed EVDD-DPCA is much smaller than that of the other methods, because of the accurate estimate of the mean and the smaller cut-off error. MSES and DCSSVD perform similarly, and AIPCA and MSVD have larger errors. The same conclusion can be drawn from Figures 4(a), 4(b), 4(c), and 4(d). The fluctuation at the beginning of these curves arises because the number of observed PCs is increasing from 10 to 50.
PC Estimation Equality to Ground-Truth PCs.
Figures 7(a), 7(b), 7(c), and 7(d) show the weighted angles between the first 50 PCs of the different IDPCA methods, when the number of kept PCs is 10 and the chunk size is 10. Figures 8(a), 8(b), 8(c), and 8(d) show the error norm of the weighted angles between the first 50 PCs of the IDPCA methods on the different datasets against the number of kept PCs (10-200), when the chunk size is 10.
$\Sigma_k$ is the diagonal matrix with the largest $k$ singular values, and $U_k$ and $V_k$ are the first $k$ columns of $U$ and $V$, respectively. We call (1) the rank-$k$ singular value decomposition (rank-$k$ SVD) of the data matrix. When new samples arrive, how can the rank-$k$ SVD of the new data matrix, formed by appending the new samples to the old data, be computed using only $U_k$, $\Sigma_k$, $V_k$, and the new samples?
Input: the rank-$k$ SVD of the old data, $U_k$, $\Sigma_k$, $V_k$; the new data.
Output: the rank-$k$ SVD of the total data, $\check{U}_k$, $\check{\Sigma}_k$, $\check{V}_k$.

Table 3: Dataset and configuration for DPCA.

Table 4: Dataset and configuration for IDPCA.
Execution time, weighted angle, and recognition rate are used to evaluate the performance of IDPCA methods.

Computational Efficiency by the Subspace Dimension (10-200).
Figures 6(a), 6(b), 6(c), and 6(d) present the runtime of the IDPCA methods against the number of kept PCs. Different from other IDPCA methods, which process incremental learning and decremental learning separately, our EVDD-IDPCA deals with deleted and added samples simultaneously, and avoids the repeated execution of pre-processing, post-processing, and some matrix decompositions.