
As a powerful nonlinear feature extractor, kernel principal component analysis (KPCA) has been widely adopted in many machine learning applications. However, KPCA is usually performed in batch mode, which causes difficulties when handling massive or online datasets. To overcome this drawback, in this paper we propose a two-phase incremental KPCA (TP-IKPCA) algorithm that incorporates data into KPCA in an incremental fashion. In the first phase, an incremental algorithm is developed to explicitly express the data in the kernel space. In the second phase, we extend incremental principal component analysis (IPCA) to estimate the kernel principal components. Extensive experiments on both synthetic and real datasets show that the proposed TP-IKPCA produces principal components similar to those of conventional batch KPCA while being computationally faster than KPCA and several of its incremental variants. Our algorithm can therefore be applied to massive or online datasets where batch processing is infeasible.

As a conventional linear subspace analysis method, principal component analysis (PCA) can only produce linear subspace feature extractors [

Standard KPCA has several drawbacks that limit its practical application when handling big or online datasets.

To overcome these limitations, many promising methods have been proposed in the past few years. These methods fall into two classes. The first class comprises batch-based methods, which require that all training data be available for estimating the KPCs. Rosipal and Girolami proposed an EM algorithm to reduce the computational cost of KPCA [

The second class comprises incremental methods, which compute the KPCs incrementally and can therefore handle online data. Chin and Suter proposed an incremental version of KPCA [

Before continuing, we fix the mathematical notation. We use lowercase and uppercase letters (e.g.,

To address these limitations, we propose a two-phase incremental KPCA (TP-IKPCA), where the mapped data is represented in an

Overview of TP-IKPCA.

Here, we should clarify the relationship among some important quantities, including

The main contributions of our work are fourfold:

The rest of the paper is organized as follows. Section

In this section, we briefly outline the standard procedure of KPCA. As mentioned above, in KPCA, the input sample set

To obtain the eigenvectors in the kernel space, the covariance matrix is defined as

Combining (

Let

Considering

For a test sample

For the sake of simplicity, we assume that the mapped data

Of note, for KPCA, the kernel matrix
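Since the equations of this section were lost in conversion, a minimal NumPy sketch of the standard batch KPCA procedure outlined above may help: center the kernel matrix, take its leading eigenpairs, and project new samples. The Gaussian kernel and its parameter `gamma` are illustrative assumptions, not necessarily the paper's choice.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def batch_kpca(X, n_components, gamma=1.0):
    """Batch KPCA: center the Gram matrix and keep the leading eigenpairs."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # centered kernel matrix
    lam, alpha = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    lam = lam[::-1][:n_components]
    alpha = alpha[:, ::-1][:, :n_components]
    alpha = alpha / np.sqrt(lam)                 # unit-norm KPCs in feature space
    return K, Kc, lam, alpha

def kpca_project(X_train, K, alpha, X_test, gamma=1.0):
    """Project (centered) test samples onto the kernel principal components."""
    n, m = X_train.shape[0], X_test.shape[0]
    Kt = rbf_kernel(X_test, X_train, gamma)
    one_n = np.full((n, n), 1.0 / n)
    one_m = np.full((m, n), 1.0 / n)
    Ktc = Kt - one_m @ K - Kt @ one_n + one_m @ K @ one_n
    return Ktc @ alpha
```

Projecting the training set itself reproduces `Kc @ alpha`, and the squared norm of each projected training component equals the corresponding eigenvalue, which is a convenient sanity check.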

At present, there have been many incremental algorithms for PCA [

Let

If linear PCA is performed based on

Based on Theorem

Let

Based on Lemma

However, the orthogonalization process using Lemma

Let

(1) If

(2) If

Obviously, based on Lemma

Combining both Lemmas

An online algorithm for incrementally finding the orthonormal basis and the projection vectors.

Obviously, if we map all the samples of
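The paper's explicit construction is given by equations not preserved here; as a hedged illustration of the idea, the following sketch maintains an orthonormal basis of the mapped samples by an online Gram-Schmidt procedure carried out purely through kernel evaluations. The class name, the residual threshold `eps`, and the coefficient bookkeeping are our own assumptions.

```python
import numpy as np

class KernelGramSchmidt:
    """Online Gram-Schmidt in the kernel-induced feature space.

    Stores samples D and a coefficient matrix B such that the j-th
    orthonormal basis vector equals sum_i B[i, j] * phi(D[i]).
    """

    def __init__(self, kernel, eps=1e-8):
        self.kernel = kernel  # callable: kernel(A, B) -> Gram matrix
        self.eps = eps        # squared-residual threshold for enlarging the basis
        self.D = None
        self.B = None

    def coords(self, x):
        # Coordinates of phi(x) in the current orthonormal basis.
        if self.D is None:
            return np.zeros(0)
        kx = self.kernel(self.D, x[None, :]).ravel()
        return self.B.T @ kx

    def update(self, x):
        kxx = float(self.kernel(x[None, :], x[None, :]))
        if self.D is None:
            # The first sample starts the basis.
            self.D = x[None, :].copy()
            self.B = np.array([[1.0 / np.sqrt(kxx)]])
            return np.array([np.sqrt(kxx)])
        p = self.coords(x)
        res2 = kxx - p @ p              # squared norm of the residual
        if res2 > self.eps:
            r = np.sqrt(res2)
            # Normalized residual, expressed over the enlarged sample set.
            new_col = np.concatenate([-self.B @ p, [1.0]]) / r
            self.B = np.vstack([self.B, np.zeros((1, self.B.shape[1]))])
            self.B = np.hstack([self.B, new_col[:, None]])
            self.D = np.vstack([self.D, x[None, :]])
            return np.concatenate([p, [r]])
        return p
```

The returned coordinate vectors are the explicit finite-dimensional representation of the mapped data: inner products of these coordinates reproduce kernel values (up to `eps`), so any linear method, including incremental PCA, can then operate on them directly.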

In this section, we will outline the incremental learning method of KPCA based on the incremental version of PCA (IPCA) proposed by Hall et al. [

Given a sample set

Subsequently, we normalize

We compose

Broadly, the procedure of our incremental method is similar to IPCA presented by Hall et al. [

Once we determined the principal direction set
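For the second phase, a sketch of the style of incremental PCA update attributed to Hall et al. may be helpful. The formulas below follow our reading of that method (biased covariance convention, our own notation and tolerance), not necessarily the exact variant used in the paper.

```python
import numpy as np

def ipca_update(mean, U, lam, n, x, tol=1e-10):
    """One incremental PCA step in the style of Hall et al.

    mean : current sample mean, shape (d,)
    U    : current eigenvectors as columns, shape (d, k)
    lam  : current eigenvalues of the biased covariance, shape (k,)
    n    : number of samples seen so far
    x    : new sample, shape (d,)
    """
    dx = x - mean
    a = U.T @ dx                     # coefficients in the current eigenbasis
    r = dx - U @ a                   # residual orthogonal to the basis
    rho = np.linalg.norm(r)
    if rho > tol:                    # basis must grow by one direction
        U = np.hstack([U, (r / rho)[:, None]])
        a = np.concatenate([a, [rho]])
        lam = np.concatenate([lam, [0.0]])
    # Small eigenproblem for the updated covariance, expressed in the basis:
    # C' = n/(n+1) * C + n/(n+1)^2 * dx dx^T
    D = (n / (n + 1)) * np.diag(lam) + (n / (n + 1) ** 2) * np.outer(a, a)
    lam_new, R = np.linalg.eigh(D)
    order = np.argsort(lam_new)[::-1]          # sort descending
    lam_new, R = lam_new[order], R[:, order]
    U_new = U @ R
    mean_new = mean + dx / (n + 1)
    return mean_new, U_new, lam_new, n + 1
```

When no nonzero eigen-directions are discarded, repeatedly applying this update reproduces the batch eigenspace exactly, which is the property TP-IKPCA relies on in the explicit kernel coordinates produced by the first phase.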

Based on the analysis in Section

Flowchart of TP-IKPCA.

The dimensions

Suppose that given the current sample set

We evaluated TP-IKPCA and compared it with several typical KPCA-based approaches on synthetic and real datasets in terms of accuracy and time complexity. The comparison methods include

In this experiment, we use two-dimensional nonlinear synthesized data to evaluate the accuracy and memory efficiency of KPCA, IKPCA-RS, INKPCA, and our proposed TP-IKPCA. The data is generated as follows:

The contour lines of the first three PCs obtained by each method for

Synthesized data including 500 samples and the contours of the first three principal components drawn using a polynomial kernel. The first row is from KPCA, the second row is from TP-IKPCA, the third row is from IKPCA-RS, and the fourth row is from INKPCA. Data points are represented by red dots “

Figure

Evolution of the correlation coefficients between the first three PCs of three incremental algorithms and KPCA when increasing the number of training samples.

TP-IKPCA

IKPCA-RS

INKPCA

Table

Learning time (sec) for proposed and comparison methods.

| Method | Training stage | Testing stage |
|---|---|---|
| KPCA | 1.067 ± 0.03 | 0.203 ± 0.006 |
| IKPCA-RS | 0.201 | 0.007 |
| INKPCA | 0.145 | 0.201 ± 0.004 |
| TP-IKPCA | | |

Note: the training number

From Table

In what follows, we design two experiments to investigate the behavior of our algorithm when the number of training samples increases. Firstly, in Figure

Variation of the number of basis

Then, we compute the acceleration ratio, i.e., the ratio of the time KPCA takes to extract features from 100 test samples to the time TP-IKPCA takes. The resulting variation of the testing-speed acceleration ratio with respect to the number of training samples is shown in Figure

Changes of the ratio of the test speed with respect to the training sample numbers

In this section, we consider an image processing application where we process the MNIST database of handwritten digits (

Figure

Restoration results by TP-IKPCA, IKPCA-RS, INKPCA, and KPCA.

Figure

Evolution curves of the correlation coefficients between the first three PCs of (a) TP-IKPCA, (b) IKPCA-RS, (c) INKPCA, and KPCA as the number of training samples increases (x-axis).

TP-IKPCA

IKPCA-RS

INKPCA

We also display in Table

Experimental results of learning time on different digits (sec).

| | Training stage | | | | | Testing stage | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Method | “0” | “2” | “4” | “6” | “8” | “0” | “2” | “4” | “6” | “8” |
| KPCA | 1.75 ± 0.055 | 1.74 ± 0.018 | 1.74 ± 0.021 | 1.76 ± 0.036 | 1.74 ± 0.030 | 0.358 ± 0.022 | 0.351 ± 0.013 | 0.357 ± 0.012 | 0.352 ± 0.011 | 0.354 ± 0.011 |
| IKPCA-RS | 43.68 ± 0.049 | 41.32 ± 0.038 | 43.21 ± 0.068 | 44.75 ± 0.074 | 42.87 ± 0.059 | 0.221 ± 0.009 | | 0.208 ± 0.014 | 0.169 ± 0.017 | 0.214 ± 0.008 |
| INKPCA | 23.67 ± 0.032 | 22.13 ± 0.038 | 23.91 ± 0.041 | 23.10 ± 0.047 | 22.13 ± 0.035 | 0.360 ± 0.023 | 0.358 ± 0.014 | 0.353 ± 0.012 | 0.355 ± 0.012 | 0.352 ± 0.011 |
| TP-IKPCA | | | | | | | 0.251 ± 0.008 | | | |
| | 242 | 351 | 219 | 190 | 282 | 242 | 351 | 219 | 190 | 282 |

Firstly, we analyze the training results shown in Table

From the testing results shown in Table

In what follows, we gradually increase the number of training samples and summarize the training and testing time required by TP-IKPCA and standard KPCA. We find from extensive experiments that the computational superiority of TP-IKPCA over KPCA increases with the number of training samples. Taking the experiments on digit “0” as an example, we repeated this evaluation 20 times and recorded the resulting training and testing time (in seconds) required by TP-IKPCA and KPCA under different training sample size

Training and testing time (sec) on digit “0” when increasing the number of training samples from 500 to 5000.

| | 500 | 1000 | 1500 | 2000 | 3000 | 4000 | 5000 |
|---|---|---|---|---|---|---|---|
| | 242 | 366 | 453 | 514 | 605 | 670 | 735 |
| Training stage, KPCA | 1.75 ± 0.055 | 6.98 ± 0.066 | 18.72 ± 0.140 | 35.46 ± 0.185 | 79.97 ± 0.570 | 154.65 ± 0.687 | 255.01 ± 2.083 |
| Training stage, TP-IKPCA | 1.89 ± 0.019 | 5.34 ± 0.064 | 9.86 ± 0.085 | 14.92 ± 0.078 | 25.40 ± 0.015 | 37.46 ± 0.031 | 50.43 ± 0.411 |
| Training stage, ratio | | | | | | | |
| Testing stage, KPCA | 0.358 ± 0.022 | 0.650 ± 0.013 | 1.131 ± 0.018 | 1.600 ± 0.026 | 2.406 ± 0.070 | 3.657 ± 0.170 | 5.234 ± 0.293 |
| Testing stage, TP-IKPCA | 0.181 ± 0.012 | 0.266 ± 0.014 | 0.323 ± 0.017 | 0.371 ± 0.016 | 0.429 ± 0.429 | 0.465 ± 0.011 | 0.511 ± 0.010 |
| Testing stage, ratio | | | | | | | |

Based on Table

In this paper, we proposed a novel incremental feature extraction method, termed TP-IKPCA, which endows KPCA with the capability of handling dynamic or large-scale datasets. TP-IKPCA differs from existing incremental approaches in that it provides an explicit form of the mapped data and updates the KPCs in this explicit space. Specifically, TP-IKPCA is implemented in two phases. First, an incremental algorithm is given to explicitly represent the mapped samples in the kernel space. Second, we employ an existing incremental PCA method to capture the KPCs based on the explicit data in the projection space. The computational complexity of TP-IKPCA is closely related to the size of the basis

TP-IKPCA can be utilized in any application where KPCA is needed, especially when the training data is of large scale or can only be collected one sample at a time, in which cases conventional batch KPCA cannot be applied. The idea of this study can also be extended to other kernel-based methods, such as kernel Fisher discriminant analysis (KFDA), kernel independent component analysis (KICA), and so on.

Let

Combining (

Let

For the relationship of the eigenvalues and the corresponding eigenvector between

Let

Based on the above definitions, we have

Let

Based on Lemmas

Let

If

So, conclusion (1) in Lemma

Combined with (

Then, we can obtain (

In our manuscript, we used two datasets to support the findings of our study. One dataset is the synthetic toy data, which can be generated as follows:

The authors declare that they have no conflicts of interest.

This work was supported in part by National Natural Science Foundation of China (Grants nos. 61773244, 61373079, and 61572344), National Institutes of Health in USA (AG041721, MH107815, EB006733, EB008374, and EB009634), and Provincial Natural Science Foundation of Shanxi in China (2018JM4018).