Incremental Discriminant Analysis in Tensor Space

To study incremental machine learning in tensor space, this paper proposes incremental tensor discriminant analysis. The algorithm employs tensor representation to carry out discriminant analysis and combines it with incremental learning to reduce the computational cost. The paper proves theoretically that the algorithm can be unified into the graph embedding framework and analyzes its time and space complexity in detail. Experiments on facial image detection show that the algorithm not only achieves sound performance compared with other algorithms but also noticeably reduces the computational cost.


Introduction
Nowadays, increasing amounts of data in industrial, economic, medical, and other application areas, such as signals, measurements, images, and videos, are becoming available owing to the development of computer technology. To uncover the hidden information in the data, which implicitly describes underlying processes or structures, advanced intelligent tools have been proposed. However, because of the stochastic nature of the processes and their measurement, such data is mostly collected with noise. Consequently, it is reasonable to seek robust and adaptive tools that can cope with this noise.
Computational intelligence techniques have been investigated to answer this need. These techniques are concerned with reproducing the abilities of the human brain. Machine learning techniques imitate the human learning procedure: they construct a learning model from example data and use it to make predictions and decisions. However, because of the noise in the data, it is important to construct an efficient learning model that helps sift useful information from the noise.
In this regard, many machine learning algorithms project high-dimensional data into a low-dimensional feature space so that the low-dimensional features are as separable as possible. Generally, they are classified into two categories: supervised learning and unsupervised learning. The essential difference between the two is whether class information is considered. Generally speaking, the recognition performance of supervised learning is superior to that of unsupervised learning. As a classical machine learning algorithm, linear discriminant analysis (LDA) [1,2] seeks optimal discriminative vectors that maximize the interclass scatter and minimize the intraclass scatter. A large number of research works have shown the advantage of LDA in various applications.
It is worth noting that traditional LDA is based on a vector model and requires all data to be vectorized before learning. High-dimensional image data, however, is structured data, and vectorization breaks the correlation between neighboring pixels. Furthermore, vectorization easily leads to the curse of dimensionality. As a result, machine learning algorithms based on tensor algebra have been investigated [3][4][5][6][7][8][9][10][11]. These algorithms treat a high-dimensional image as a high-order tensor and use tensor algebra to analyze it. Tensor representation not only helps preserve the structure of the high-dimensional image but also serves as an effective way to avoid the curse of dimensionality. To unify such algorithms, the graph embedding framework was proposed in [12]. Under this framework, two kinds of projective forms are summarized, called the vector-to-vector and tensor-to-tensor forms, respectively.

Computational Intelligence and Neuroscience
However, these algorithms have to be retrained on all samples whenever new samples are added, which results in a heavy computational cost. Consequently, incremental machine learning algorithms have been proposed [13][14][15][16][17]. Most of them, however, focus on vector-based learning, and only a limited number of works study incremental learning in tensor space [18][19][20]. To investigate incremental tensor learning, this paper develops incremental tensor discriminant analysis (ITDA), which employs supervised learning in tensor space and introduces incremental learning to handle online learning. Furthermore, this paper examines the relationship between the proposed method and the graph embedding framework and proves theoretically that the algorithm is a special case of the tensor-to-tensor form under that framework. This paper also analyzes the time and space complexity in detail. Finally, facial image detection experiments are conducted to evaluate the proposed method. The experimental results demonstrate the advantage of the method.

Tensor Discriminant Analysis
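For multidimensional image data $X = \{X_1, \ldots, X_M\}$, where $X_m \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, the corresponding class label is $l(X_m) \in [1, C]$, where $C$ is the number of classes. Let the number of samples in the $c$th class be $M_c$; then the following definitions are introduced, with $\|\cdot\|$ denoting the Frobenius norm.

Definition 1. The within-class scatter tensor is defined as
$$\Psi_w = \sum_{c=1}^{C} \sum_{l(X_m)=c} \left\| X_m - \bar{X}_c \right\|^2, \tag{1}$$
where $\bar{X}_c$ represents the mean tensor of the $c$th class.

Definition 2. The between-class scatter tensor is defined as
$$\Psi_b = \sum_{c=1}^{C} M_c \left\| \bar{X}_c - \bar{X} \right\|^2, \tag{2}$$
where $\bar{X}$ represents the total mean tensor.

Definition 3. The total scatter tensor is defined as
$$\Psi_t = \sum_{m=1}^{M} \left\| X_m - \bar{X} \right\|^2. \tag{3}$$
It is easy to derive that
$$\Psi_t = \Psi_w + \Psi_b. \tag{4}$$

Definition 4. The mode-$n$ within-class scatter matrix is defined as
$$S_w^{(n)} = \sum_{c=1}^{C} \sum_{l(X_m)=c} \left( X_m^{(n)} - \bar{X}_c^{(n)} \right)\left( X_m^{(n)} - \bar{X}_c^{(n)} \right)^T, \tag{5}$$
where $X_m^{(n)}$ is the mode-$n$ matrix of the $m$th sample and $\bar{X}_c^{(n)}$ is the mode-$n$ mean matrix of the $c$th class.

Definition 5. The mode-$n$ between-class scatter matrix is defined as
$$S_b^{(n)} = \sum_{c=1}^{C} M_c \left( \bar{X}_c^{(n)} - \bar{X}^{(n)} \right)\left( \bar{X}_c^{(n)} - \bar{X}^{(n)} \right)^T, \tag{6}$$
where $\bar{X}^{(n)}$ is the mode-$n$ total mean matrix.

Definition 6. The mode-$n$ total scatter matrix is defined as
$$S_t^{(n)} = \sum_{m=1}^{M} \left( X_m^{(n)} - \bar{X}^{(n)} \right)\left( X_m^{(n)} - \bar{X}^{(n)} \right)^T. \tag{7}$$

The basic idea of TDA is to seek projective matrices $U^{(1)}, \ldots, U^{(N)}$ that make the within-class scatter tensor smaller and the between-class scatter tensor larger. The objective function is
$$\{U^{(n)}\}_{n=1}^{N} = \arg\max \frac{\sum_{c=1}^{C} M_c \left\| \bar{Y}_c - \bar{Y} \right\|^2}{\sum_{c=1}^{C} \sum_{l(X_m)=c} \left\| Y_m - \bar{Y}_c \right\|^2}, \tag{8}$$
where $Y_m = X_m \times_1 U^{(1)T} \times_2 \cdots \times_N U^{(N)T}$ is the projected sample and $\bar{Y}_c$ and $\bar{Y}$ are the projected class mean and total mean.

In order to solve the above function, an iterative technique is adopted. It is assumed that the projective matrices $\{U^{(1)}, \ldots, U^{(n-1)}, U^{(n+1)}, \ldots, U^{(N)}\}$ are known; then $U^{(n)}$ is solved as follows:
$$U^{(n)} = \arg\max \frac{\operatorname{tr}\left( U^{(n)T} \left[ \sum_{c=1}^{C} M_c \left( \bar{X}_c^{(n)} - \bar{X}^{(n)} \right) U^{(-n)} U^{(-n)T} \left( \bar{X}_c^{(n)} - \bar{X}^{(n)} \right)^T \right] U^{(n)} \right)}{\operatorname{tr}\left( U^{(n)T} \left[ \sum_{c=1}^{C} \sum_{l(X_m)=c} \left( X_m^{(n)} - \bar{X}_c^{(n)} \right) U^{(-n)} U^{(-n)T} \left( X_m^{(n)} - \bar{X}_c^{(n)} \right)^T \right] U^{(n)} \right)}, \tag{9}$$
where $U^{(-n)} = U^{(N)} \otimes \cdots \otimes U^{(n+1)} \otimes U^{(n-1)} \otimes \cdots \otimes U^{(1)}$. Since $U^{(-n)} U^{(-n)T} = I$, the above equation can be rewritten as
$$U^{(n)} = \arg\max \frac{\operatorname{tr}\left( U^{(n)T} S_b^{(n)} U^{(n)} \right)}{\operatorname{tr}\left( U^{(n)T} S_w^{(n)} U^{(n)} \right)}. \tag{10}$$

Based on the basic concept of TDA and related matrix knowledge, the following theorems can be obtained.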

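Theorem 7. In tensor discriminant analysis, the mode-$n$ intraclass scatter matrix is generally nonsingular.

Proof. Define the matrix
$$Z^{(n)} = \left[ X_1^{(n)} - \bar{X}_{l(X_1)}^{(n)}, \ldots, X_M^{(n)} - \bar{X}_{l(X_M)}^{(n)} \right] \in \mathbb{R}^{I_n \times M \prod_{k \neq n} I_k},$$
where $M$ is the number of samples, $X_m^{(n)}$ is the mode-$n$ matrix of the $m$th sample, $l(X_m)$ expresses the class label of the $m$th sample, and $\bar{X}_c^{(n)}$ is the mode-$n$ mean matrix of the $c$th class. Then the mode-$n$ intraclass scatter matrix is represented as
$$S_w^{(n)} = Z^{(n)} Z^{(n)T}.$$
Generally speaking, $I_n \ll M \prod_{k \neq n} I_k$; then
$$\operatorname{rank}\left( S_w^{(n)} \right) = \min\left( I_n, M \prod_{k \neq n} I_k \right) = I_n,$$
so the $I_n \times I_n$ matrix $S_w^{(n)}$ has full rank.

Theorem 8. Equation (8) can be unified into the graph embedding framework.

Proof. Let $Y_m$ denote the projection of the $m$th sample and $\bar{Y}_c$, $\bar{Y}$ the projected class mean and total mean. Based on the basic concept of tensor algebra, the numerator of (8) can be rewritten as
$$\sum_{c=1}^{C} M_c \left\| \bar{Y}_c - \bar{Y} \right\|^2 = \sum_{c=1}^{C} M_c \left\| \operatorname{vec}(\bar{Y}_c) - \operatorname{vec}(\bar{Y}) \right\|^2.$$
Letting $y_m = \operatorname{vec}(Y_m)$ and $\operatorname{vec}(\bar{Y}) = (1/M) \sum_{m=1}^{M} y_m$, the above equation is written as
$$\sum_{c=1}^{C} M_c \left\| \operatorname{vec}(\bar{Y}_c) - \operatorname{vec}(\bar{Y}) \right\|^2 = \sum_{i=1}^{M} \sum_{j=1}^{M} y_i^T y_j W^b_{ij},$$
where
$$W^b_{ij} = \begin{cases} \dfrac{1}{M_c} - \dfrac{1}{M}, & l(X_i) = l(X_j) = c, \\[2mm] -\dfrac{1}{M}, & \text{otherwise}. \end{cases} \tag{16}$$
Within the low-dimensional feature space, it is desired to preserve the property demonstrated in (4), so the denominator of (8) is formulated as the total scatter minus the between-class scatter:
$$\sum_{c=1}^{C} \sum_{l(X_m)=c} \left\| Y_m - \bar{Y}_c \right\|^2 = \sum_{i=1}^{M} \sum_{j=1}^{M} y_i^T y_j W^t_{ij} - \sum_{i=1}^{M} \sum_{j=1}^{M} y_i^T y_j W^b_{ij}, \tag{18}$$
where
$$W^t_{ij} = \delta_{ij} - \frac{1}{M} \tag{19}$$
and $\delta_{ij}$ equals 1 if $i = j$ and 0 otherwise. Combining (19) with (16), (18) can be written as
$$\sum_{c=1}^{C} \sum_{l(X_m)=c} \left\| Y_m - \bar{Y}_c \right\|^2 = \sum_{i=1}^{M} \sum_{j=1}^{M} y_i^T y_j W^w_{ij},$$
where
$$W^w_{ij} = \begin{cases} \delta_{ij} - \dfrac{1}{M_c}, & l(X_i) = l(X_j) = c, \\[2mm] 0, & \text{otherwise}. \end{cases}$$
Consequently, (8) is expressed as
$$\{U^{(n)}\}_{n=1}^{N} = \arg\max \frac{\sum_{i=1}^{M} \sum_{j=1}^{M} y_i^T y_j W^b_{ij}}{\sum_{i=1}^{M} \sum_{j=1}^{M} y_i^T y_j W^w_{ij}}. \tag{22}$$
The form of (22) is consistent with the tensor-to-tensor form of the graph embedding framework. Therefore, (8) can be unified into the graph embedding framework.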
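As an illustration of the mode-wise scatter matrices in Definitions 4 to 6, the following is a minimal NumPy sketch, not the authors' implementation. The function names `unfold`, `mode_scatter`, and `solve_mode` are hypothetical, modes are indexed from 0, and `solve_mode` swaps in the common ratio-trace relaxation (leading eigenvectors of the inverse within-class scatter times the between-class scatter) as a stand-in for solving the trace-ratio problem for one mode.

```python
import numpy as np

def unfold(x, n):
    """Mode-n matrix of tensor x: axis n becomes the rows.
    The column order differs between conventions, but scatter matrices
    of the form A @ A.T do not depend on the column order."""
    return np.moveaxis(x, n, 0).reshape(x.shape[n], -1)

def mode_scatter(X, labels, n):
    """Mode-n within-class, between-class, and total scatter matrices.
    X has shape (M, I_1, ..., I_N); labels has shape (M,)."""
    labels = np.asarray(labels)
    I_n = X.shape[n + 1]
    total_mean = unfold(X.mean(axis=0), n)
    S_w = np.zeros((I_n, I_n))
    S_b = np.zeros((I_n, I_n))
    S_t = np.zeros((I_n, I_n))
    for c in np.unique(labels):
        Xc = X[labels == c]
        class_mean = unfold(Xc.mean(axis=0), n)
        for x in Xc:
            d = unfold(x, n) - class_mean
            S_w += d @ d.T
        d = class_mean - total_mean
        S_b += len(Xc) * (d @ d.T)
    for x in X:
        d = unfold(x, n) - total_mean
        S_t += d @ d.T
    return S_w, S_b, S_t

def solve_mode(S_w, S_b, k):
    """Assumed solver: k leading eigenvectors of inv(S_w) @ S_b."""
    vals, vecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(-vals.real)[:k]
    return vecs[:, order].real
```

Calling `mode_scatter` on random data is a quick way to check numerically that the total scatter matrix equals the sum of the within-class and between-class matrices for every mode, and that the within-class matrix is invertible when the number of unfolded columns is much larger than the mode size, as Theorem 7 argues.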

Incremental Learning Based on a Single Sample.
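In order to distinguish the variables that need to be updated during the incremental learning procedure, the paper uses the subscript old to mark variables before incremental learning. For example, $\bar{X}_{old}$ expresses the total mean tensor before new samples are added, and $M_{old}$, $M_{c,old}$, and $C_{old}$ denote the total number of samples, the number of samples in the $c$th class, and the number of classes before the update. As before, a superscript $(n)$ denotes the mode-$n$ matrix of a tensor.

When a single sample $X_{new}$ is added, its class label is $l_{new}$; then the mode-$n$ total mean matrix becomes
$$\bar{X}^{(n)} = \frac{M_{old} \bar{X}_{old}^{(n)} + X_{new}^{(n)}}{M_{old} + 1}. \tag{23}$$
If $l_{new} \notin [1, C_{old}]$, the new sample belongs to a new class. In this case, the total class number is $C = C_{old} + 1$ and the mode-$n$ interclass scatter matrix is updated:
$$S_b^{(n)} = \sum_{c=1}^{C} M_c \left( \bar{X}_c^{(n)} - \bar{X}^{(n)} \right)\left( \bar{X}_c^{(n)} - \bar{X}^{(n)} \right)^T, \tag{24}$$
where $M_c$ is the updated sample number of the $c$th class. The mode-$n$ intraclass scatter matrix is
$$S_w^{(n)} = S_{w,old}^{(n)} + \left( X_{new}^{(n)} - \bar{X}_{new}^{(n)} \right)\left( X_{new}^{(n)} - \bar{X}_{new}^{(n)} \right)^T, \tag{25}$$
where $\bar{X}_{new}^{(n)}$ is the mode-$n$ mean matrix of the class of the new sample. Because a single sample is added and it belongs to a new class, we get
$$\bar{X}_{new}^{(n)} = X_{new}^{(n)}. \tag{26}$$
Then (25) becomes
$$S_w^{(n)} = S_{w,old}^{(n)}. \tag{27}$$
Equation (27) demonstrates that the mode-$n$ intraclass scatter matrix does not change when a single sample of a new class is added.

When the class label of the new sample is $l_{new} = c \in [1, C_{old}]$, the sample does not belong to a new class. In this case, the total class number is $C = C_{old}$; then the mode-$n$ interclass scatter matrix is
$$S_b^{(n)} = \sum_{k=1}^{C} M_k \left( \bar{X}_k^{(n)} - \bar{X}^{(n)} \right)\left( \bar{X}_k^{(n)} - \bar{X}^{(n)} \right)^T, \tag{28}$$
with $M_c = M_{c,old} + 1$. The mode-$n$ intraclass scatter matrix is
$$S_w^{(n)} = \sum_{k \neq c} S_{w,k}^{(n)} + S_{w,c}^{(n)}, \tag{29}$$
where $S_{w,k}^{(n)}$ denotes the mode-$n$ scatter matrix of the $k$th class about its own mean. Because the new sample belongs to the $c$th class, the class mean of the $c$th class becomes
$$\bar{X}_c = \frac{M_{c,old} \bar{X}_{c,old} + X_{new}}{M_{c,old} + 1}. \tag{30}$$
Based on this, we get
$$S_{w,c}^{(n)} = S_{w,c,old}^{(n)} + \frac{M_{c,old}}{M_{c,old} + 1} \left( X_{new}^{(n)} - \bar{X}_{c,old}^{(n)} \right)\left( X_{new}^{(n)} - \bar{X}_{c,old}^{(n)} \right)^T. \tag{31}$$
So (29) is simplified to
$$S_w^{(n)} = S_{w,old}^{(n)} + \frac{M_{c,old}}{M_{c,old} + 1} \left( X_{new}^{(n)} - \bar{X}_{c,old}^{(n)} \right)\left( X_{new}^{(n)} - \bar{X}_{c,old}^{(n)} \right)^T. \tag{32}$$

Incremental Learning Based on Multiple Samples.
When $L$ new samples $\{X_{1,new}, \ldots, X_{L,new}\}$ are added, suppose first that $L_c$ of them belong to an existing class $c \in [1, C_{old}]$. The mean tensor of the $c$th class becomes
$$\bar{X}_c = \frac{M_{c,old} \bar{X}_{c,old} + L_c \bar{X}_{c,new}}{M_{c,old} + L_c}, \tag{33}$$
where $\bar{X}_{c,new}$ is the mean tensor of the new samples belonging to the $c$th class. The corresponding mode-$n$ mean matrix of the $c$th class is
$$\bar{X}_c^{(n)} = \frac{M_{c,old} \bar{X}_{c,old}^{(n)} + L_c \bar{X}_{c,new}^{(n)}}{M_{c,old} + L_c}. \tag{34}$$
Then the number of samples in the $c$th class is
$$M_c = M_{c,old} + L_c.$$
The total mean tensor is updated:
$$\bar{X} = \frac{M_{old} \bar{X}_{old} + L \bar{X}_{new}}{M_{old} + L},$$
where $\bar{X}_{new}$ is the mean tensor of all new samples. The interclass scatter tensor is updated:
$$\Psi_b = \sum_{c=1}^{C} M_c \left\| \bar{X}_c - \bar{X} \right\|^2.$$
The corresponding mode-$n$ interclass scatter matrix is
$$S_b^{(n)} = \sum_{c=1}^{C} M_c \left( \bar{X}_c^{(n)} - \bar{X}^{(n)} \right)\left( \bar{X}_c^{(n)} - \bar{X}^{(n)} \right)^T.$$
The mode-$n$ intraclass scatter matrix is
$$S_w^{(n)} = \sum_{c=1}^{C} S_{w,c}^{(n)}, \qquad S_{w,c}^{(n)} = \sum_{\text{old},\, l(X_m)=c} \left( X_m^{(n)} - \bar{X}_c^{(n)} \right)\left( X_m^{(n)} - \bar{X}_c^{(n)} \right)^T + \sum_{l(X_{j,new})=c} \left( X_{j,new}^{(n)} - \bar{X}_c^{(n)} \right)\left( X_{j,new}^{(n)} - \bar{X}_c^{(n)} \right)^T.$$
Write $D_c^{(n)} = \bar{X}_{c,new}^{(n)} - \bar{X}_{c,old}^{(n)}$. Substituting (34) into the sum over the old samples, we get
$$\sum_{\text{old},\, l(X_m)=c} \left( X_m^{(n)} - \bar{X}_c^{(n)} \right)\left( X_m^{(n)} - \bar{X}_c^{(n)} \right)^T = S_{w,c,old}^{(n)} + \frac{M_{c,old} L_c^2}{(M_{c,old} + L_c)^2} D_c^{(n)} D_c^{(n)T}.$$
Similarly, for the sum over the new samples we get
$$\sum_{l(X_{j,new})=c} \left( X_{j,new}^{(n)} - \bar{X}_c^{(n)} \right)\left( X_{j,new}^{(n)} - \bar{X}_c^{(n)} \right)^T = S_{w,c,new}^{(n)} + \frac{M_{c,old}^2 L_c}{(M_{c,old} + L_c)^2} D_c^{(n)} D_c^{(n)T},$$
where $S_{w,c,new}^{(n)}$ is the mode-$n$ scatter matrix of the new samples of the $c$th class about their own mean $\bar{X}_{c,new}^{(n)}$. Adding the two parts gives
$$S_{w,c}^{(n)} = S_{w,c,old}^{(n)} + S_{w,c,new}^{(n)} + \frac{M_{c,old} L_c}{M_{c,old} + L_c} D_c^{(n)} D_c^{(n)T}.$$
Without loss of generality, suppose that, among the new samples, $L_{C_{old}+1}$ samples belong to a new class with label $C_{old} + 1$; then the updated mode-$n$ interclass scatter matrix is
$$S_b^{(n)} = \sum_{c=1}^{C_{old}+1} M_c \left( \bar{X}_c^{(n)} - \bar{X}^{(n)} \right)\left( \bar{X}_c^{(n)} - \bar{X}^{(n)} \right)^T$$
and the mode-$n$ intraclass scatter matrix is
$$S_w^{(n)} = \sum_{c=1}^{C_{old}} S_{w,c}^{(n)} + \sum_{l(X_{j,new}) = C_{old}+1} \left( X_{j,new}^{(n)} - \bar{X}_{C_{old}+1}^{(n)} \right)\left( X_{j,new}^{(n)} - \bar{X}_{C_{old}+1}^{(n)} \right)^T.$$
It is not difficult to see that incremental learning based on a single sample is a special case of incremental learning based on multiple samples: setting $L_c = 1$ makes $S_{w,c,new}^{(n)}$ vanish and recovers (32).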
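The chunk update described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the class name `IncrementalModeScatter` and its methods are hypothetical, modes are indexed from 0, and the within-class update uses the standard identity that merging two groups adds their own scatters plus a rank term weighted by `M_c * L_c / (M_c + L_c)`. The between-class matrix is recomputed from the stored class means, so its cost depends on the number of classes rather than on the number of old samples.

```python
import numpy as np

def unfold(x, n):
    """Mode-n matrix of tensor x (axis n becomes the rows)."""
    return np.moveaxis(x, n, 0).reshape(x.shape[n], -1)

class IncrementalModeScatter:
    """Keeps class counts, class means, the total mean, and one
    within-class scatter matrix per mode; old samples are not stored."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        self.M = 0                               # samples seen so far
        self.mean = np.zeros(self.shape)         # total mean tensor
        self.counts = {}                         # label -> M_c
        self.means = {}                          # label -> class mean tensor
        self.S_w = [np.zeros((I, I)) for I in self.shape]

    def update(self, X_new, labels_new):
        """Absorb a chunk X_new of shape (L, I_1, ..., I_N)."""
        labels_new = np.asarray(labels_new)
        L = len(X_new)
        for c in np.unique(labels_new):
            Xc = X_new[labels_new == c]
            key = c.item()
            L_c, mean_new = len(Xc), Xc.mean(axis=0)
            M_c = self.counts.get(key, 0)        # 0 means a new class
            mean_old = self.means.get(key, mean_new)
            for n in range(len(self.shape)):
                # scatter of the new samples about their own mean
                for x in Xc:
                    d = unfold(x - mean_new, n)
                    self.S_w[n] += d @ d.T
                # shift term between old and new class means
                d = unfold(mean_new - mean_old, n)
                self.S_w[n] += (M_c * L_c / (M_c + L_c)) * (d @ d.T)
            self.means[key] = (M_c * mean_old + L_c * mean_new) / (M_c + L_c)
            self.counts[key] = M_c + L_c
        self.mean = (self.M * self.mean + L * X_new.mean(axis=0)) / (self.M + L)
        self.M += L

    def S_b(self, n):
        """Mode-n between-class scatter from the stored means."""
        out = np.zeros((self.shape[n], self.shape[n]))
        for key, M_c in self.counts.items():
            d = unfold(self.means[key] - self.mean, n)
            out += M_c * (d @ d.T)
        return out
```

A chunk of size one reproduces the single-sample case, and a chunk containing an unseen label adds a new class without touching the scatter contributed by the existing classes.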

The Complexity Analysis.
For tensor discriminant analysis, the main computational time is spent on computing the class means, the total mean, the inter- and intraclass scatter matrices, and the eigendecomposition. The computational cost of the inter- and intraclass scatter matrices depends on the number of training samples. If there are a large number of training samples, an increase in computational time cannot be avoided.
For incremental tensor discriminant analysis, the main computational time is spent on updating the inter- and intraclass scatter matrices and the class number.
Let $M$ be the number of training samples, $C$ the number of classes, $L$ the number of new samples, $N$ the tensor order, and $I$ the size of each mode (all modes are taken to have the same size for simplicity). For the eigendecomposition, the time complexity of both TDA and ITDA is $O(I^3)$. The main difference in time complexity lies in the computation of the inter- and intraclass scatter matrices. For TDA, the time complexity is $O(M I^{N+1})$, so it grows with the number of training samples. For ITDA, the time complexity is $O(C I^{N+1} + L I^{N+1})$, which depends on the class number and the number of new samples but not on the number of initial training samples. Consequently, ITDA helps reduce the time complexity.
Considering the space complexity, ITDA is also superior to TDA. With mode sizes $I_1, \ldots, I_N$, when new samples are added, TDA needs $M \prod_{n=1}^{N} I_n$ bytes to save all training samples, whereas ITDA needs only $L \prod_{n=1}^{N} I_n$ bytes to save the newly added samples, $\prod_{n=1}^{N} I_n$ bytes to save the total mean, $C \prod_{n=1}^{N} I_n$ bytes to save the class means, and $\sum_{n=1}^{N} I_n^2$ bytes to save the mode-$n$ scatter matrices. Hence ITDA saves space.
Comparing incremental learning based on a single sample with incremental learning based on multiple samples, the former has an advantage in space complexity because it deals with only one sample at a time.
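To make the space comparison concrete, the storage terms listed above can be evaluated directly. The sketch below is only an arithmetic illustration: the function names are hypothetical, it counts stored entries rather than literal bytes, and the chunk size of 100 new samples is an assumed value chosen for the example. The image size (19 × 19), the training set size (1495), and the class number (2) are taken from the experimental setup of this paper.

```python
from math import prod

def tda_storage(M, dims):
    """Entries TDA keeps: all M training samples."""
    return M * prod(dims)

def itda_storage(L, C, dims):
    """Entries ITDA keeps: L new samples, the total mean,
    C class means, and one I_n x I_n matrix per mode."""
    p = prod(dims)
    return L * p + p + C * p + sum(I * I for I in dims)

dims = (19, 19)                        # image size used in the experiments
print(tda_storage(1495, dims))         # 539695
print(itda_storage(100, 2, dims))      # 37905, with an assumed chunk of 100
```

Under these assumptions the incremental variant stores roughly one fourteenth of the entries, and the gap widens as the accumulated training set grows while the chunk size stays fixed.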

Experiments
In this section, a series of experiments is carried out to validate the performance of incremental tensor discriminant analysis (ITDA). The CBCL image data set is used to conduct facial image detection experiments. The dataset contains two classes of images, facial images and nonfacial images, as shown in Figure 1. It comprises 2988 images in total, of which 2429 are facial images and 559 are nonfacial images. Each image is of size 19 × 19. This paper divides the whole dataset into a training dataset with 1215 facial images and 280 nonfacial images and a testing dataset with 1214 facial images and 279 nonfacial images. Furthermore, the training dataset is divided into an initial training dataset and incremental datasets.
ITDA integrates tensor representation and incremental learning, so it is reasonable to expect that it improves the detection performance and reduces the time and space complexity. In this respect, ITDA is compared with LDA [21], ILDA [14], TPCA [22], ITPCA [23], and TDA [9]. LDA is the classical linear discriminant analysis. ILDA is the incremental version of LDA. TPCA, also called MPCA (multilinear principal component analysis), performs principal component analysis on tensor data. ITPCA is an incremental principal component analysis for tensor data. TDA also represents data as tensors and conducts multilinear discriminant analysis. In each round of incremental learning, one incremental dataset is added and low-dimensional features are then extracted on the testing dataset. The nearest neighbor classifier is employed to classify these low-dimensional features.
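The feature extraction and classification step described above can be sketched as follows. This is a minimal NumPy illustration, not the experimental code of this paper: `project` and `nearest_neighbor` are hypothetical names, the projection matrices are assumed to be given (one per mode, each of shape mode size by reduced size), and Euclidean distance on the flattened features is an assumption about the nearest neighbor metric.

```python
import numpy as np

def project(X, Us):
    """Tensor-to-tensor projection of a batch X of shape (M, I_1, ..., I_N).
    Mode k of every sample is multiplied by the transpose of Us[k]."""
    Y = X
    for k, U in enumerate(Us):
        # contract sample axis k (batch axis is 0) with the rows of U;
        # tensordot appends the reduced axis last, so move it back
        Y = np.moveaxis(np.tensordot(Y, U, axes=(k + 1, 0)), -1, k + 1)
    return Y

def nearest_neighbor(train_Y, train_labels, test_Y):
    """Label each test feature with the label of its closest training feature."""
    A = train_Y.reshape(len(train_Y), -1)
    B = test_Y.reshape(len(test_Y), -1)
    d2 = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(train_labels)[d2.argmin(axis=1)]
```

For second-order data such as images, `project(X, [U1, U2])[m]` equals `U1.T @ X[m] @ U2`, so each 19 × 19 image is reduced to a small matrix before the nearest neighbor search.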
The detection performance of the different algorithms under incremental learning is compared in Figures 2, 3, 4, and 5, respectively. LDA performs worst, and ILDA is better than LDA. However, the detection results of ILDA drop as the dimension of the low-dimensional features increases. TPCA and ITPCA have similar detection results, and both exceed LDA and ILDA. The probable reason is that TPCA and ITPCA represent data as tensors, which makes full use of the interior structure information to enhance the detection performance. TDA is superior to the above four algorithms. When the dimension of the low-dimensional features is low, TDA and ITDA have comparable detection rates, and ITDA begins to surpass TDA as the dimension increases. Figure 6 and Table 1 show the best detection results of the different algorithms. It can be seen that the detection performance of the algorithms improves as the number of incremental learning rounds increases and that ITDA always has the best performance. Consequently, increasing the number of incremental learning rounds helps improve the detection result. Moreover, as shown in Figures 7 and 8, the incremental learning algorithms ILDA, ITPCA, and ITDA noticeably reduce the time and space costs compared with nonincremental learning. Furthermore, since ITPCA and ITDA adopt tensor representation, they have lower time and space requirements than LDA.

Conclusions
In this paper, incremental tensor discriminant analysis (ITDA) is investigated. It adopts tensor representation to keep the structure information of high-dimensional images and introduces incremental learning to carry out online learning. This paper also proves theoretically the relationship between ITDA and the graph embedding framework. The facial detection experiments show that ITDA performs better than TDA and noticeably reduces the time and space complexity.