Incremental Matrix-Based Subspace Method for Matrix-Based Feature Extraction

Matrix-based features can provide valid and interpretable information for matrix-based data such as images. Matrix-based kernel principal component analysis (MKPCA) is one way of extracting matrix-based features. The extracted matrix-based features are useful for both dimension reduction and spatial statistics analysis of an image. However, the efficiency of MKPCA is highly restricted by the dimension of the given matrix data and the size of the training set. In this paper, an incremental method to extract features of a matrix-based dataset is proposed. The method is methodologically consistent with MKPCA and improves efficiency by incrementally selecting the proper projection matrix of the MKPCA through rotation of the current subspace. The performance of the proposed method is evaluated in several experiments on both point and image datasets.


Introduction
Subspace analysis is helpful for networks in computer vision [1,2], data modeling problems [3], and social networks [4]. AlexNet [5] pioneered the use of principal component analysis (PCA), a basic subspace analysis method, to help complex networks improve their performance: it performs PCA on the set of RGB pixel values throughout the ImageNet training set to reduce overfitting on image data through data augmentation. PCANet [6] employs PCA to learn multistage filter banks in its architecture; the PCA makes the architecture extremely easy and efficient to design and learn. PCA-based analysis can be flexibly applied to designing the architecture of neural networks [7][8][9]. In complex networks, the training images, convolution-induced feature maps, and some channel-splicing features can be considered matrix-based data. Improving the matrix-based subspace method and studying the relationship between matrix-based PCA and the basic vector-based one may therefore provide a reference for networks. Regular PCA is a linear method in the sense that it only utilizes first-order and second-order statistics. Therefore, its modeling capability is limited when confronted with highly nonlinear data structures. To enhance this capability, vector-based kernel PCA (KPCA) [10,11] was proposed. It improves the modeling performance of PCA by nonlinearly mapping the data from the original space to a very high-dimensional feature space, the so-called reproducing kernel Hilbert space (RKHS). The nonlinear mapping enables an implicit characterization of high-order statistics and can be flexibly combined with other learning methods [12][13][14]. The key idea of kernel methods is to avoid explicit knowledge of the mapping function by evaluating the dot product in the feature space using a kernel function. Vector-based KPCA takes vectors as input; matrix data, such as a two-dimensional image, must be vectorized before being fed into kernel methods.
Such vectorization ruins the spatial structure of the pixels that defines the two-dimensional image. To bring back the matrix structure, matrix-based KPCA [15] was proposed by combining two-dimensional PCA (2DPCA) [16] with the kernel approach. Matrix-based KPCA generalizes vector-based KPCA and can provide richer representations. The advantage of matrix-based KPCA methods is that they enable us to study the spatial statistics in the matrix.
However, the Gram matrix of matrix-based KPCA is much bigger than that of vector-based KPCA, so the batch problem of matrix-based KPCA is much more serious. For matrix-based KPCA, the Gram matrix has to be available before the eigendecomposition can be conducted. When extra data are added, new rows and columns are required for a new Gram matrix, and the eigendecomposition has to be performed again on the enlarged matrix. In addition, the principal component vectors must be supported by all the matrices in the input dataset. This induces a high cost in storage resources and computing workload for applications with large datasets and input matrices of large size.
In the last decade, several strategies and approaches have been proposed to improve the batch nature of vector-based KPCA, such as methods that kernelize the generalized Hebbian algorithm [17,18], methods that compute the KPC incrementally [19][20][21], the greedy KPCA method [22,23], and the adaptive KPCA method [24]. In recent years, KPCA-based methods have remained vital [25][26][27], while methods that counteract the huge batch nature of matrix-based KPCA are rare. Matrix-based KPCA can be sped up using the idea proposed in improved KPCA [28]. However, this does not mean that the matrix-based KPCA computation is solved, since the size of the eigendecomposition still depends on the size of the dataset. Consequently, an approach that adapts to the batch nature of matrix data with efficient computation is required.
This paper proposes an incremental matrix-based KPCA (IMKPCA) method that approximates the traditional one with less computation time and memory usage when extracting kernel principal vectors to a given accuracy. The contributions of this paper are threefold: (1) The proposed method is implemented through the total scatter matrix, avoiding direct operations on the Gram matrix of matrix-based KPCA; this is what inspired the idea of IMKPCA. (2) The basis of our solution is incrementally adjusting the current eigenvector matrix to preserve the total scatter of the matrix dataset. This is accomplished by decomposing the added matrix data into a component parallel to the previous eigenvector matrix and a component orthogonal to it, whereas the standard method computes the scatter matrix directly.
(3) The proposed matrix-based feature can be used to study the spatial statistics in the matrix with acceptable computational complexity and memory usage, whereas the vector-based one cannot. The rest of this paper is organized as follows. The preliminaries of matrix-based KPCA are briefly introduced in Section 2. Then, an incremental matrix-based kernel subspace method is presented in Section 3, followed by the experiment results in Section 4. Finally, the conclusion is presented in Section 5.

Preliminaries
⟹ and ⇓ denote horizontal and vertical concatenation, respectively. The expressions of a matrix and a matrix dataset can be simplified by using ⟹ and ⇓ as in [15].
For a matrix X ≜ [⟹_{i=1}^{q} x_i] ∈ R^{p×q}, where p and q are the numbers of rows and columns, respectively, we define its kernelized form as ϕ(X) ≜ [⟹_{i=1}^{q} ϕ(x_i)] ∈ R^{f×q}, where x_i is the i-th column of X and ϕ is a nonlinear mapping from R^p to R^f. The dot product matrix K ∈ R^{q×q} of two kernelized matrices X and Y is defined as K ≜ ϕ(X)^⊤ϕ(Y), whose (i, j)-th entry is k(x_i, y_j) = ⟨ϕ(x_i), ϕ(y_j)⟩. We begin by obtaining a matrix dataset as the block matrix X_n ≜ [⟹_{i=1}^{n} x_i] ∈ R^{p×nq}, where x_i ∈ R^{p×q} is the i-th matrix datum in the dataset; the idea of matrix-based KPCA is to perform two-dimensional PCA [16] in the feature space.
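As a concrete illustration of the dot product matrix, the following NumPy sketch (our own, with hypothetical dimensions and the Gaussian kernel used later in Section 4) evaluates K between the columns of two p × q matrices:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=100.0):
    """Gaussian kernel k(x, y) = exp(-sigma^-1 * ||x - y||^2) on column vectors."""
    d = x - y
    return np.exp(-np.dot(d, d) / sigma)

def dot_product_matrix(X, Y, sigma=100.0):
    """Kernel dot product matrix K in R^{q x q} of two p x q matrices:
    K[i, j] = k(x_i, y_j), where x_i and y_j are the i-th / j-th columns."""
    K = np.empty((X.shape[1], Y.shape[1]))
    for i in range(X.shape[1]):
        for j in range(Y.shape[1]):
            K[i, j] = gaussian_kernel(X[:, i], Y[:, j], sigma)
    return K

# Example: a random 4 x 3 matrix against itself gives a 3 x 3 matrix
# that is symmetric with unit diagonal (k(x, x) = 1 for the Gaussian kernel).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
K = dot_product_matrix(X, X)
```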
Assuming that the dataset X_n is kernelized to ϕ(X_n) ∈ R^{f×nq} and that it is centered in the feature space (we shall return to this point later), its total scatter matrix is estimated by

Σ_n ≜ (1/n) Σ_{i=1}^{n} ϕ(x_i)ϕ(x_i)^⊤.

Given that the mapping function ϕ is implicit, the KPC cannot be computed by performing an eigendecomposition on Σ_n directly. The reason is that the cardinality of the nonlinear feature space is extremely large, which usually makes the matrix Σ_n rank deficient. Matrix-based KPCA circumvents this by following the standard method in [15]: in practice, an eigendecomposition of the Gram matrix K_n ≜ ϕ(X_n)^⊤ϕ(X_n) ∈ R^{nq×nq} is carried out to calculate the leading eigenvectors of Σ_n. Suppose that (λ_w, v_w) is an eigenpair of the matrix K_n; that is, v_w ∈ R^{nq×1} is a unit eigenvector with the corresponding eigenvalue λ_w as the w-th largest eigenvalue, or K_n v_w = λ_w v_w. Then the r most significant matrix-based KPC in the feature space take the matrix form

U_n ≜ ϕ(X_n) V_n Λ_n^{−1/2}, (4)

where U_w ∈ R^f is the w-th column of U_n, V_n ∈ R^{nq×r} is the matrix with columns v_w, and Λ_n ∈ R^{r×r} is the diagonal matrix whose diagonal elements are λ_w, w = 1, 2, ..., r. Then the total scatter matrix has the form Σ_n ≜ (1/n) U_n Λ_n U_n^⊤. For a given matrix Y ≜ [⟹_{i=1}^{q} y_i] ∈ R^{p×q} as test data, with kernelized form ϕ(Y) ∈ R^{f×q}, its w-th principal component vector corresponding to ϕ is computed as

p_w^Y = U_w^⊤ ϕ(Y) = λ_w^{−1/2} v_w^⊤ ϕ(X_n)^⊤ ϕ(Y). (5)

Equation (5) is a critical factor for the proposed IMKPCA because an explicit form of the function ϕ is not required: the principal component vectors of a given matrix datum onto the matrix-based KPC can be computed entirely through kernel functions. Based on equation (5), for one projection direction, IMKPCA outputs a principal component vector whose dimensionality equals the number of columns of the input matrix. In comparison, vector-based KPCA outputs a single principal component value for one projection direction [11].
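The batch computation that the following sections seek to avoid can be sketched in a few lines; this is our own small-scale, uncentered illustration of the Gram eigendecomposition and of equation (5), not the authors' implementation (dimensions, seed, and helper names are ours):

```python
import numpy as np

def gram_block(A, B, sigma=10.0):
    """Gaussian-kernel dot product matrix between all columns of A and B."""
    d2 = (np.sum(A * A, axis=0)[:, None] - 2 * A.T @ B
          + np.sum(B * B, axis=0)[None, :])
    return np.exp(-d2 / sigma)

def mkpca_features(data, Y, r=3, sigma=10.0):
    """Batch matrix-based KPCA features of a test matrix Y via equation (5).

    data: list of n matrices, each p x q; Y: a p x q test matrix.
    Returns an r x q matrix whose w-th row is the w-th principal
    component vector of Y (centering omitted for brevity)."""
    Xn = np.hstack(data)                       # p x nq block matrix
    Kn = gram_block(Xn, Xn, sigma)             # nq x nq Gram matrix
    lam, V = np.linalg.eigh(Kn)                # ascending eigenvalues
    lam, V = lam[::-1][:r], V[:, ::-1][:, :r]  # keep the r largest
    KY = gram_block(Xn, Y, sigma)              # nq x q kernel block
    return (V.T @ KY) / np.sqrt(lam)[:, None]  # r x q feature matrix

rng = np.random.default_rng(1)
data = [rng.standard_normal((5, 4)) for _ in range(6)]
F = mkpca_features(data, rng.standard_normal((5, 4)))
```

Note that the eigendecomposition acts on an nq × nq Gram matrix, which is exactly the batch bottleneck the incremental method addresses.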
In the next section, we show how to update the matrix-based KPC in the feature space based on the new data's projections.

Incremental Matrix-Based Kernel Principal Component Analysis
In this section, a recursive matrix-based KPC formulation is presented to improve the batch nature of matrix-based KPCA. We describe how the original matrix-based KPC can be updated incrementally. The detailed procedures and their superiority over the standard counterparts are shown in the following subsections.

Recursive Formulation for Matrix-Based KPC.
Assume that the current data are given as a block matrix X_n ≜ [⟹_{i=1}^{n} x_i] ∈ R^{p×nq} and that new data x_{n+1} ∈ R^{p×q} arrive for the update. We commence with the recursion of the total scatter matrix:

Σ_{n+1} = (n/(n+1)) Σ_n + (1/(n+1)) ϕ(x_{n+1})ϕ(x_{n+1})^⊤. (6)

Based on equation (4), ϕ(x_{n+1}) can be decomposed into a component parallel and a component orthogonal to U_n; i.e.,

ϕ(x_{n+1}) = P_{U_n}(ϕ(x_{n+1})) + P_{U_n^⊥}(ϕ(x_{n+1})), (7)

where P_{U_n}(ϕ(x_{n+1})) is the projection of ϕ(x_{n+1}) onto the matrix-based KPC U_n and P_{U_n^⊥}(ϕ(x_{n+1})) is the projection of ϕ(x_{n+1}) onto the orthogonal complement of U_n.
Based on the idea of KPCA [6,18,29], the principal components can be expanded approximately in terms of training samples in the feature space. This means that ϕ(x_{n+1}) can be reconstructed by the subspace spanned by U_n, the previous matrix-based KPC, while the orthogonal residual P_{U_n^⊥}(ϕ(x_{n+1})) can be modeled as noise. Specifically, we have

P_{U_n}(ϕ(x_{n+1})) = Σ_{w=1}^{r} U_w p_w^{x_{n+1}} = U_n P_{n+1}, (8)

where p_w^{x_{n+1}} can be calculated by replacing Y with x_{n+1} in equation (5) and P_{n+1} ∈ R^{r×q} stacks the vectors p_w^{x_{n+1}} as rows. Then the current matrix-based KPC can be obtained by rotating the previous matrix-based KPC so as to preserve the total scatter of ϕ(X_{n+1}) ≜ [⟹_{i=1}^{n+1} ϕ(x_i)] ∈ R^{f×(n+1)q} most faithfully.

Rotation of Matrix-Based KPC.
The key observation is that the effect of ϕ(x_{n+1}) on the previous matrix-based KPC can be represented by a rotation. Based on equations (7) and (8), we have

ϕ(x_{n+1}) ≈ U_n P_{n+1}. (9)

Substituting the previous matrix-based KPCA result Σ_n = (1/n) U_n Λ_n U_n^⊤ and equation (9) into equation (6), we get

Σ_{n+1} ≈ U_n [(1/(n+1)) (Λ_n + P_{n+1} P_{n+1}^⊤)] U_n^⊤. (10)

Denote the eigendecomposition of the matrix (1/(n+1))(Λ_n + P_{n+1} P_{n+1}^⊤) by W_1 D_1 W_1^⊤, where W_1 ∈ R^{r×r} is an orthonormal matrix and D_1 ∈ R^{r×r} is a diagonal matrix; equation (10) then becomes

Σ_{n+1} ≈ (U_n W_1) D_1 (U_n W_1)^⊤. (11)

Consequently, based on Σ_n = (1/n) U_n Λ_n U_n^⊤ and equation (11), the matrix-based KPCA system of the total scatter matrix in equation (6) is recursively given by

U_{n+1} = U_n W_1,  Λ_{n+1} = (n+1) D_1. (12)

In equation (12), W_1 represents the directional variation of the matrix-based KPC caused by ϕ(x_{n+1}), and D_1 represents the change in component ratios of the updated matrix-based KPC. From equation (12), the matrix-based KPC is rotated to preserve the total scatter of the given data as faithfully as possible, even when ϕ(x_{n+1}) is not fully contained in the span of the previous matrix-based KPC.
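The rotation update can be checked numerically in a finite-dimensional surrogate of the feature space, where the mapping is explicit. The sketch below is our own; it assumes the new (column) sample lies exactly in the span of U_n, so the approximation in equation (9) is exact and the recursive update reproduces the directly computed scatter:

```python
import numpy as np

rng = np.random.default_rng(2)
f, r, n = 6, 3, 40

# Previous eigensystem: Sigma_n = (1/n) U_n Lam_n U_n^T with orthonormal U_n.
U_n, _ = np.linalg.qr(rng.standard_normal((f, r)))
Lam_n = np.diag([9.0, 4.0, 1.0])
Sigma_n = U_n @ Lam_n @ U_n.T / n

# New sample in span(U_n): its orthogonal residual is zero.
p = rng.standard_normal((r, 1))          # projection coefficients U_n^T x
x = U_n @ p

# Rotation update (equations (10)-(12)): eigendecompose a small r x r matrix.
S = (Lam_n + p @ p.T) / (n + 1)          # (1/(n+1)) (Lam_n + p p^T)
D_1, W_1 = np.linalg.eigh(S)
U_next = U_n @ W_1                       # rotated principal directions
Lam_next = (n + 1) * np.diag(D_1)

# The recursively updated scatter equals the directly computed one.
Sigma_rec = U_next @ Lam_next @ U_next.T / (n + 1)
Sigma_dir = n / (n + 1) * Sigma_n + x @ x.T / (n + 1)
```

The point of the update is that only the r × r matrix S is eigendecomposed, independently of the dataset size.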

Recursive Formulation with Mean Updating.
For the sake of simplicity, the centering assumption in the feature space was adopted in the foregoing analysis. However, this assumption is often invalid: the mean matrix of the mapped data in the feature space changes whenever new data are added for updating. Discarding this assumption, the total scatter matrix becomes

Σ_{n+1} = (1/(n+1)) Σ_{i=1}^{n+1} (ϕ(x_i) − M_{n+1})(ϕ(x_i) − M_{n+1})^⊤, (13)

where M_n ∈ R^{f×q} and M_{n+1} ∈ R^{f×q} denote the mean matrices of the previous and the current mapped data, respectively. In the total scatter matrix Σ_{n+1}, the mapped matrices are centered with the current mean matrix M_{n+1} rather than the previous mean matrix M_n used in the total scatter matrix Σ_n. Hence, the recursion in equation (6) no longer applies. Fortunately, we have

M_{n+1} = (1/(n+1)) (n M_n + ϕ(x_{n+1})). (14)

Following the idea of the scatter matrix update in incremental learning [30], the recursive formulation with the mean update for the total scatter matrix of the matrix-based KPC takes the form

Σ_{n+1} = (n/(n+1)) Σ_n + (n/(n+1)^2) (ϕ(x_{n+1}) − M_n)(ϕ(x_{n+1}) − M_n)^⊤. (15)

The recursive matrix-based KPC formulation in equation (15) can be computed following the approach of the previous subsection, and the update formulas are obtained by imitating equations (7)–(12): the mean-centered mapped matrix ϕ(x_{n+1}) − M_n takes the place of ϕ(x_{n+1}) in equations (7) and (8), and the eigendecomposition in equation (10) is performed for the correspondingly centered projections, in which the w-th principal component vector of M_n can be calculated by equation (5). The recursive formulation in equation (12) thus becomes one with the mean update for the proposed matrix-based KPC.
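The mean-updated recursion can be verified against the batch scatter in a finite-dimensional surrogate of the feature space; the following sketch is our own illustration of equations (14) and (15):

```python
import numpy as np

rng = np.random.default_rng(3)
f, n = 5, 30
X = rng.standard_normal((f, n + 1))       # columns play the role of mapped samples

def batch_scatter(A):
    """Mean-centered total scatter (1/m) sum_i (a_i - M)(a_i - M)^T."""
    C = A - A.mean(axis=1, keepdims=True)
    return C @ C.T / A.shape[1]

M_n = X[:, :n].mean(axis=1, keepdims=True)
Sigma_n = batch_scatter(X[:, :n])
d = X[:, [n]] - M_n                        # new sample centered with the OLD mean

# Recursion (equation (15)) and mean update (equation (14)).
Sigma_rec = n / (n + 1) * Sigma_n + n / (n + 1) ** 2 * (d @ d.T)
M_next = (n * M_n + X[:, [n]]) / (n + 1)
```

The recursion reproduces the batch scatter of all n + 1 samples exactly, even though the new sample is centered with the old mean.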
In the subsequent experiment section, for a given matrix-based dataset, a portion of the matrix data is chosen as the original data X_n ≜ [⟹_{i=1}^{n} x_i] ∈ R^{p×nq}, and the remaining matrix data are used successively as new data x_{n+1} ∈ R^{p×q} to update the current principal components.

Experiments
Several experiments were conducted to examine four properties of the proposed method: (1) the effectiveness of approximating the KPC under stationary data; (2) the influence of the parameters on the proposed method on the MNIST and Fashion MNIST databases; (3) the superiority over several reference methods on some well-known databases, namely, Fashion MNIST [31], ORL [32], YaleA [33], Extended YaleB [34], PF01 [35], and COIL 100 [36]; and (4) the efficiency compared with the reference methods on fashion products from the Fashion MNIST database.
In order to measure the accuracy of the proposed method in approximating MKPCA and to assess the quality of the solution objectively, a distance measure based on the angles between kernel principal components is employed in this section:

d(v_w, v_w^*) = arccos(|⟨v_w, v_w^*⟩| / (‖v_w‖ ‖v_w^*‖)), (17)

where v_w^* is the ground truth of the w-th kernel principal component computed by the standard MKPCA method and v_w is the w-th kernel principal component extracted by the proposed method.
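A minimal sketch of this angle measure (our own, with the absolute value absorbing the sign ambiguity of eigenvectors):

```python
import numpy as np

def subspace_angle(v, v_star):
    """Angle in radians between an extracted component v and the ground
    truth v_star, insensitive to the sign ambiguity of eigenvectors."""
    c = abs(v @ v_star) / (np.linalg.norm(v) * np.linalg.norm(v_star))
    return np.arccos(np.clip(c, 0.0, 1.0))

# Identical (or sign-flipped) directions give angle 0; orthogonal ones give pi/2.
```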
To visualize the capacity of IMKPCA for describing the structure of given data, both a two-dimensional stationary set and a matrix-based dataset are needed, because a two-dimensional datum is a special matrix datum. In Sections 4.1 and 4.2, the experiments are mainly based on two generated two-dimensional datasets. In Sections 4.3 and 4.4, the experiments are based on matrix-based datasets. The Gaussian kernel function k(x, y) = exp(−σ^{−1}‖x − y‖^2) is used throughout this section. For two matrix data X ∈ R^{p×q} and Y ∈ R^{p×q}, the dot product matrix K ∈ R^{q×q} has entries [K]_{ij} = k(x_i, y_j), as mentioned in Section 2.

Stationary Data Approximation Accuracy.
The experiment is carried out on the 90 toy data points [10] to test the effectiveness of the proposed method in approximation accuracy. The data are generated in a stationary environment as follows: the x-values have a uniform distribution on [−1, 1], x_i ∼ U[−1, 1], and the y-values are generated from y_i = x_i^2 + ξ, i = 1, 2, ..., 90, where ξ is normal noise with standard deviation 0.2. We calculate the kernel principal components by the standard MKPCA [15], vector-based KPCA [6], and the proposed method, respectively. The projection values of the test data onto the extracted kernel principal components are given in Figures 1 and 2. Figure 1 contains lines of constant principal component value (contour lines).
Those contour lines indicate the structure of the data in the feature space. The red dots are the total sample, and the constant projection values onto the first five KPC computed by the three methods are shown. Figure 2 illustrates the results of the proposed method during the computation process. The green dots are the original data, the red dots are the appended data, and the blue contours depict the constant projection values onto the first three KPC. It can be seen visually that the KPC extracted by IMKPCA converge to the ground truth KPC, even when starting from an unsatisfactory initialization. Figures 1 and 2 show that, compared with MKPCA and vector-based KPCA, the proposed method computes the kernel principal components of the generated stationary data with comparable accuracy while allowing a wider range of original data choices.
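For reference, the toy data generation described above can be sketched in a few lines of NumPy (the random seed is our own choice):

```python
import numpy as np

# The 90-point toy set: x ~ U[-1, 1], y = x^2 + xi with xi ~ N(0, 0.2^2).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 90)
y = x ** 2 + rng.normal(0.0, 0.2, 90)
points = np.column_stack([x, y])          # 90 two-dimensional samples
```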

Influence of Parameters.
In this experiment, we investigate how the amount of original data affects the effectiveness of IMKPCA. The measure in equation (17) is used as the distance between the subspace computed by IMKPCA and the ground truth computed by MKPCA. Two matrix-based stationary datasets are used. One consists of the first 200 training images of "T-shirt/top" in Fashion MNIST, and the other contains the first 120 training images of digit "0" from the MNIST dataset. For each dataset, two results are shown, with n = 30 and n = 50 as the original data capacity, respectively. Equation (12) is used to update the subspace in the feature space, and the distance between the updated subspace and the standard one is recorded in radians. Figure 3 shows the result of each iteration under different n, in which the distances between the first three kernel principal components are computed as the distance between two subspaces (represented by "MKPC-1," "MKPC-2," and "MKPC-3," respectively).

Figures 3(a) and 3(b) show the results for the image data of "T-shirt/top," and Figures 3(c) and 3(d) show the results for the matrix data of digit "0." In each subfigure, the KPC extracted by IMKPCA converge to the ground truth as the iterative update in equation (12) proceeds. Figure 3 shows that setting n to a large value helps IMKPCA improve the accuracy of approximating the ground truth.

Study of the Spatial Statistics in the Matrix Data.
This experiment concerns the advantage of IMKPCA in studying the spatial statistics in matrix data. We apply the proposed method to digits from the MNIST dataset and extract the matrix-based principal components in both the column-lifted and row-lifted stages using the Gaussian kernel. The row-lifted stage means processing the feature matrix extracted by the proposed method in the column-lifted stage as input; the result of the row-lifted stage represents the final extracted spatial statistics of the given dataset. Each digit is a 28 × 28 matrix, while we keep only the first sixteen principal components, which are displayed as a 16 × 16 matrix. To compare the proposed method with vector-based KPCA, the input should be a vector, yet an image of a digit is represented as a matrix. Hence, the images are vectorized before being fed into vector-based KPCA, and the feature of each digit, a sixteen-dimensional vector, consists of the first sixteen principal component values. Figure 4 shows ten example images of digit "3" and their matrix-based principal components computed by the proposed IMKPCA. Figure 5 shows the result of vector-based KPCA for the vectorized digit "3," in which the vertical axis shows the projection values of the vectorized digit onto the principal components, and the horizontal axis is the index of the ten samples in Figure 4. The projection value of digit "3" onto the i-th principal component is represented by "VPCi." Figure 6 shows ten example images of digits "1" to "5," two samples for each digit, and their matrix-based principal components computed by the proposed method. Figures 4–6 show that the matrix-based features can provide more significant information than the vector-based ones: vector-based KPCA outputs a vector as a feature, while IMKPCA outputs a matrix.
Typically, the matrix-based feature of digit "3" presents as a distinctive white line in the matrix in Figure 4; digits "1" to "5" present as lines in Figure 6, while the vector-based features are numerical values in Figure 5. In particular, digit "4" presents with two lines, as it has two distinct line structures; digits "2" and "5" are similar in the column-lifted stage (the second row in Figure 6) but differ in the row-lifted stage (the third row in Figure 6). The reason is that they are two different digits, but the vertical flip of digit "5" is similar to that of digit "2." To demonstrate the advantage of the spatial statistics of matrix data quantitatively, two image tests, a digit recognition test and an image database classification test, are conducted. For the digit test, the first 100 training images and the first 500 test images of each digit in MNIST serve to test the capability of the proposed method in digit recognition. The experiment is repeated with different ratios of training data used as original data for computing the matrix-based KPC in the feature space. The features extracted in the column-lifted and row-lifted stages are used for digit recognition with the nearest-distance classifier. For the Gaussian kernel, σ = 100 in the column-lifted stage and σ = 1 in the row-lifted stage for the proposed method, and σ = 100 for vector-based KPCA, are chosen to retain the total scatter of the data in the feature space. Table 1 presents the average digit recognition rates and the variances (the values following ±) for the chosen dataset. The ratios r = 0.1, r = 0.2, r = 0.3, and r = 0.4 correspond to n = 10, n = 20, n = 30, and n = 40 as the capacity of the original data, respectively. In Table 1, the result of the vector-based method is unique because it does not need to choose original data for updating. Table 1 shows that IMKPCA has a prominent advantage over vector-based KPCA.
The reason is that vectorization destroys the spatial structure of the pixels that defines the digit images from the MNIST dataset, whereas IMKPCA brings that structure back. In particular, the results obtained with features from the row-lifted stage are better than those from the column-lifted one because of the extracted spatial statistics of the matrix data. Additionally, using more samples as original data in the training period increases the recognition performance. For the image database tests, the Fashion MNIST, ORL, YaleA, Extended YaleB, and COIL 100 databases are chosen to evaluate classification. Figure 7 shows some image samples from each database. The Fashion MNIST database consists of 60,000 training images and 10,000 test images of 10 fashion products, such as "T-shirt/top," "Trouser," and "Dress." In the experiment, ten images of each object are chosen for training, and the remaining images are used for testing. Each image from COIL 100 is resized to a 28 × 28 matrix owing to computer resource restrictions. Table 2 shows the average database recognition rates and variances under ten different training sample sequences for the given six databases, in which "NC" indicates "Noncomputable." As shown in Table 2, the proposed IMKPCA outperforms vector-based PCA, vector-based KPCA, and 2DPCA in recognition rate for the five databases. The proposed IMKPCA and MKPCA show similar performances on the test databases. The reason is that the proposed method is methodologically consistent with MKPCA and incrementally approximates the ground truth of MKPCA. In comparison, the proposed method is more efficient and more computable than MKPCA; the major reason is that the size of the eigendecomposition of MKPCA still depends on the size of the dataset, which puts the COIL 100 and Fashion MNIST databases beyond computation in Table 2.
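The nearest-distance classification used above can be sketched as follows; the tiny "feature matrices" here are made up for illustration and are not actual MNIST features:

```python
import numpy as np

def nearest_distance_classify(train_feats, train_labels, test_feat):
    """Nearest-distance classifier on extracted feature matrices: the test
    sample takes the label of the training feature with the smallest
    Frobenius distance."""
    dists = [np.linalg.norm(F - test_feat) for F in train_feats]
    return train_labels[int(np.argmin(dists))]

# Tiny illustration with two made-up 2 x 2 feature matrices.
train = [np.zeros((2, 2)), np.ones((2, 2))]
labels = ["0", "1"]
pred = nearest_distance_classify(train, labels, 0.9 * np.ones((2, 2)))
```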

Computational Efficiency Comparison.
This section compares the computation time and resource consumption of MKPCA, vector-based KPCA, and IMKPCA.
The Fashion MNIST database is employed to compare the feature extraction efficiency of the proposed method with that of the reference methods. For an exact and fair comparison, the operating environment is kept identical: the three methods are executed on a computer with 4 GB RAM and a 3.2 GHz CPU. The input dataset is extracted from the training samples of "T-shirt/top." MKPCA, vector-based KPCA, and IMKPCA are executed on datasets of different capacities. Since the resource consumption of MKPCA is large, we treat a computing time longer than 400 seconds, or a memory overflow with 4 GB RAM, as "Noncomputable." For the proposed method, the computing time in this example mainly depends on equation (12), in which W_1 and D_1 are computed with computational complexity O(r^3). The computing times of the standard MKPCA and vector-based KPCA depend on eigendecompositions with computational complexity O(n^2) and O(n^2 pq), respectively, where n is the number of pieces of training data and p and q are the row and column sizes of the given matrix data. Figure 8 shows the maximum, minimum, and mean values of the angles between the extracted kernel principal components and the ground truth of MKPCA for images of the 10 fashion products. It indicates that the result of the proposed method approximates the real projection matrix well for Fashion MNIST; in particular, the minimum and mean values for the ten fashion products are mostly smaller than 0.05. Figure 9(a) shows the computation time for the Fashion MNIST database under different capacities. It indicates that the vector-based method and the proposed method need much less computation time and resource consumption than MKPCA under the same computer system with Matlab 2014. Figure 9(b) shows the computation times of the proposed method under different ratios of original data to computed data.
For each curve, a percentage of the 1,000 fashion product images is used to perform the proposed IMKPCA method. Figures 8 and 9 show that the proposed method needs more time than the vector-based KPCA method but retains several advantages as a whole; for example, it preserves the spatial structure of the data in the matrix and provides more information than the vector-based method. The computing time of the proposed IMKPCA method increases nearly linearly, while that of the standard MKPCA increases sharply with the number of training samples.

Conclusion
In this paper, an incremental matrix-based kernel subspace method is proposed. The main innovation of the proposed method is the idea of rotating the matrix-based kernel principal components to most faithfully preserve the total scatter of the given data when new data are appended. The proposed method still adheres to the KPCA methodology, which makes it more reliable than most of the existing improvement algorithms. For the proposed method, the feature extractor is still associated with an eigendecomposition problem, but the eigensystem has a different dimension compared with that of the vector-based KPCA method. The experiment results show that feature extraction based on the proposed method is more efficient than the standard MKPCA and the vector-based KPCA. On the other hand, the proposed method obtains efficient feature extraction by relying on the linear relationship between the current principal components and the newly appended data. This means that the proposed method is efficient for processing stationary datasets. Future work will concentrate on how to adapt the principal components to nonstationary matrix datasets.
Data Availability

The data that support the findings of this study are available upon request from the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.