Bio-Inspired Structure Representation Based Cross-View Discriminative Subspace Learning via Simultaneous Local and Global Alignment

Recently, cross-view feature learning has become a hot topic in machine learning due to the wide applications of multiview data. Nevertheless, the distribution discrepancy between views means that instances from the same class but different views can lie farther apart than instances from the same view but different classes. To address this problem, in this paper we develop a novel cross-view discriminative feature subspace learning method inspired by layered visual perception in humans. First, the proposed method utilizes a separable low-rank self-representation model to disentangle the class and view structure layers. Second, a local alignment is constructed with two designed graphs to guide the subspace decomposition in a pairwise way. Finally, a global discriminative constraint on the distribution centers in each view is designed to further improve the alignment. Extensive cross-view classification experiments on several public datasets demonstrate that our proposed method is more effective than existing feature learning methods.


Introduction
Against the background of modern technology, many artificial intelligence methods are inspired by nature, such as machine learning [1][2][3][4], reinforcement learning [5], and artificial immune recognition [6]. Among them, machine learning can effectively deal with image recognition problems. However, some studies have indicated that the adaptability of traditional machine learning drops sharply when the learned images have a large distribution discrepancy, as in cross-view data [2]. This discrepancy means that the data variance across the view space is larger than the data variance across the class space, so the view, rather than the class, becomes the major factor affecting recognition. Therefore, in this paper we focus on cross-view subspace learning to deal with such distribution discrepancy problems.
In recent years, subspace learning (SL) has made great contributions to machine learning and has wide applications in computer vision, data mining, and so on [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21]. One of the most typical methods is principal component analysis (PCA) [22], which uses an orthogonal transformation to reduce dimensionality while preserving the unique information (principal components) of the data. However, PCA is an unsupervised dimensionality reduction method and disregards the discriminative information attached to the semantic component. Hence, a supervised dimensionality reduction method, linear discriminant analysis (LDA), which uses the semantic component, was proposed in [23]. LDA learns a supervised linear combination to adjust the spatial dispersion, but it suffers from overfitting when processing noisy data. To overcome corruption, rank minimization techniques have been spotlighted in recent years. Candès et al. enforced low-rank and sparse constraints to eliminate corrupted information in the data [24]. After that, low-rank representation (LRR) was proposed to restore clean data through dictionary representation in multiple subspaces [7]. In the last decade, the LRR model has achieved satisfactory results in various fields [8][9][10][11][12][13][14][15][16][25][26][27][28][29][30][31][32][33][34][35], such as domain adaptation [8], clustering [9], transfer learning [25], and low-rank texture structure [26]. However, Liu et al. pointed out that the dictionary of LRR may fail when the data are insufficient. To solve this problem, latent LRR (LatLRR) was proposed to enhance subspace learning with latent information [10]. However, LatLRR is still unsupervised. Inspired by LRR models and LDA, Li et al. unified a linear discriminant constraint and low-rank representation into subspace learning to enhance the learned low-dimensional features [11].
Afterwards, the low-rank embedding (LRE) proposed in [12] provides a robust embedding subspace learning framework that eliminates reconstruction errors by adopting an l_{2,1}-norm constraint on the projection residual. The latent low-rank and sparse embedding (LLRSE) developed in [13] builds on LRE and additionally introduces a reconstructed orthogonal matrix, which makes the projection space contain more unique features.
Recently, many algorithms dedicated to cross-view feature learning have been developed on top of the above methods [27][28][29][30][31]. The low-rank common subspace (LRCS) proposed in [27] finds a view-common subspace by LRR. Nevertheless, LRCS only considers the view label and ignores the discrimination of the class label. From the perspective of multiview data structure, a supervised subspace learning method, multiview discriminant analysis (MVDA), was proposed in [28], which uses the discriminative information in the different views. After that, a multiview manifold learning with locality alignment (MVML-LA) framework proposed in [29] provides a discriminative low-dimensional latent space. Most recently, robust cross-view learning (RCVL) was designed to learn a common view-invariant discriminative subspace by adopting a novel rank minimization technique [30]. However, RCVL ignores the global discriminative information.
Human visual perception works through the visual circuit, whose function is to understand the visual signal in a layerwise way: the brain uses a small feature extractor (each layer in the visual circuit) to obtain simple features from the complex real signal. Drawing inspiration from this layered processing in the human visual system, we represent cross-view data in two different structure layers, a class structure layer and a view structure layer, in order to construct a view-consistent feature learning model. We then design two novel discriminative alignment constraints, from simultaneous local and global viewpoints, which not only disentangle the class and view layers but also bridge the gap in cross-view data. Our contributions are as follows: (1) A dual low-rank representation model is set up to discover the two latent structures in cross-view data, the view structure and the class structure. These two distinguished structures are conducive to discovering potential features for the cross-view classification task.
(2) A local alignment constraint based on two designed local graphs is utilized to transfer the neighbouring relationships between each pair of samples into the learned subspace. This constraint makes the view and class structures separate effectively. (3) A global alignment constraint designed in our framework further cuts down the view discrepancy. The projected samples from the view and class subspaces compose the discriminative constraint in the global alignment by enforcing the mean distances between classes in different views. Figure 1 illustrates how to learn an aligned subspace in which samples have a large distance between classes and a small distance within each class, from both the class perspective and the view perspective. The structure of this paper is organized as follows. Related Works briefly reviews the baselines of our work. The third part presents the proposed model and its solution process. Experiments reports the comparison and parameter experiments. Finally, Conclusions summarizes this paper.

Related Works
Our method has a close connection with the two following methods: (1) low-rank representation and (2) linear discriminant analysis.

Low-Rank Representation.
LRR is robust to errors and can explore the underlying structure of data. Suppose that X = [X_1, X_2, ..., X_k] is a matrix of natural data from k classes. The model of LRR can be expressed as

min_{Z,E} ||Z||_* + λ||E||_1,  s.t.  X = XZ + E,  (1)

where Z is the low-rank linear combination coefficient matrix of the data X. The matrix E with the l_1-norm fits the corrupted information encountered in real life, and λ > 0 balances the level of corruption. Therefore, LRR, which simulates corrupted data within a representation framework, is a feasible technique for handling cross-view data.
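As a small numerical illustration of the representation framework (a minimal NumPy sketch, not the paper's implementation): for noise-free data with X itself as the dictionary, the LRR minimizer is known to have the closed form Z* = V_r V_r^T from the skinny SVD of X, and this Z* reconstructs X exactly while having rank equal to rank(X).

```python
import numpy as np

rng = np.random.default_rng(0)
# Two independent 2-D subspaces in R^20, 15 samples each (a toy stand-in
# for two classes of data; all sizes here are illustrative).
X = np.hstack([rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
               for _ in range(2)])

# For noise-free data with X as its own dictionary, the minimizer of
#   min_Z ||Z||_*  s.t.  X = XZ
# has the closed form Z* = V_r V_r^T, where X = U S V^T is the skinny SVD
# and r = rank(X).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10))
Z = Vt[:r].T @ Vt[:r]

print(np.allclose(X @ Z, X))     # Z reconstructs X exactly: True
print(np.linalg.matrix_rank(Z))  # rank of Z equals rank(X)
```

The block-diagonal pattern of this Z* on subspace-segmented data is what makes LRR useful for uncovering latent class/view structure.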

Linear Discriminant Analysis.
LDA is a familiar supervised method for dimensionality reduction, constrained by discriminative semantic information. The main principle of LDA is to find a discriminative subspace with the largest interclass variance and the smallest intraclass variance. Assume that n samples from m classes are (X, y) = {(x_1, y_1), ..., (x_n, y_n)}, where X represents the samples and y represents their labels. In addition, x̄ and x̄_i denote the center of all samples and of the samples belonging to the ith class, respectively. Hence, the between-class scatter S_b and within-class scatter S_w are as follows:

S_b = Σ_{i=1}^{m} n_i (x̄_i − x̄)(x̄_i − x̄)^T,   S_w = Σ_{i=1}^{m} Σ_{y_j = i} (x_j − x̄_i)(x_j − x̄_i)^T,  (2)

where n_i is the number of samples from the ith class. Therefore, by the Fisher discriminant criterion, the generalized Rayleigh quotient can be described as

max_w Tr(w^T S_b w) / Tr(w^T S_w w),  (3)

where w is a projection matrix and Tr(·) denotes the trace operator. Since the solution of equation (3) is relatively complicated, we transform it into the following trace-difference problem [36]:

max_w Tr(w^T (S_b − S_w) w).  (4)

LDA can retain as much information for recognition as possible while reducing dimensionality, and it can remove superfluous and dependent features that are a disadvantage to the classification task. Nevertheless, due to the distribution of cross-view data, the performance of LDA, which considers only class semantic information, is not outstanding.
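The Fisher criterion above can be sketched numerically as follows (a minimal NumPy illustration on toy two-class data; the data and variable names are assumptions for the demo, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy 2-class data in R^5 with well-separated means (illustrative only).
X1 = rng.normal(size=(5, 30)) + 3.0   # class 0, mean shifted to +3
X2 = rng.normal(size=(5, 30)) - 3.0   # class 1, mean shifted to -3
X = np.hstack([X1, X2])
labels = np.array([0] * 30 + [1] * 30)

mu = X.mean(axis=1, keepdims=True)
Sb = np.zeros((5, 5))
Sw = np.zeros((5, 5))
for c in (0, 1):
    Xc = X[:, labels == c]
    mu_c = Xc.mean(axis=1, keepdims=True)
    Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T   # between-class scatter
    Sw += (Xc - mu_c) @ (Xc - mu_c).T                 # within-class scatter

# Fisher direction: leading eigenvector of Sw^{-1} Sb.
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])

# Projected class means should be far apart relative to the spread.
proj = w @ X
gap = abs(proj[labels == 0].mean() - proj[labels == 1].mean())
spread = proj[labels == 0].std() + proj[labels == 1].std()
print(gap > spread)   # strong separation on this toy data
```

The ratio of gap to spread is invariant to the scale of w, which is why the Rayleigh quotient in equation (3) is well posed despite w being determined only up to scale.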

Our Proposed Method
This section contains four parts. The first specifies the symbols used in our algorithm. The second gives a detailed introduction to our framework. The third develops a numerical scheme to obtain the approximate solutions iteratively. The last discusses the computational complexity of the proposed algorithm.

Notations.
Assume that X_1 and X_2 (X = [X_1, X_2] ∈ R^{d×n}) are two matrices of different views from the same c classes, where n and d denote the number and dimensionality of all training samples, respectively. The class structure Z_c ∈ R^{n×n} and the view structure Z_v ∈ R^{n×n} are two linear combination matrices, which are included in the local graph framework to discover the view-invariant structure of cross-view data. P ∈ R^{d×p} is a basis transformation projection matrix, where p is the dimensionality of the projected data. E ∈ R^{d×n} is an error matrix designed to obtain a subspace robust to noise. In addition, V_1, V_2 and Ṽ_1, Ṽ_2 ∈ R^{n×c} are four constant coefficient matrices utilized for aligning the global information of cross-view data.

Objective Function.
To address cross-view discriminative analysis, we formulate our subspace learning model with simultaneous local and global alignments as follows:

min_{P, Z_c, Z_v, E}  D(Z_c, Z_v, E) + L(P, Z_c, Z_v) + G(P, Z_c, Z_v),  (5)

where D(Z_c, Z_v, E) is a low-rank framework for the two potential manifolds in cross-view data, L(P, Z_c, Z_v) enforces the view-specific discriminative local neighbor relationships among instances, and G(P, Z_c, Z_v) performs the discriminative clustering and separation of the global structure in the class manifold and the view manifold, respectively. In the following, these terms are described in detail.

Dual Low-Rank Representations.
Methods based on the low-rank model use only one linear combination matrix constrained by rank minimization. Nevertheless, such a unitary low-rank structure leads to the failure of the linear representation because of the differences between view distributions, which endow cross-view data from the same class with a large divergence. Therefore, the two structure matrices Z_c and Z_v are adopted to address this specific problem, in which between-view samples from the same class lie far apart while within-view samples from different classes lie close together. The first term is defined with dual low-rank representations to strip apart the class and view structures as follows:

D(Z_c, Z_v, E) = ||Z_c||_* + ||Z_v||_* + λ_1 ||E||_{2,1},  s.t.  X = X(Z_c + Z_v) + E,  (6)

where ||·||_* denotes the nuclear norm, a convenient convex surrogate for the rank minimization problem. Assuming that real-world data contain partial corruption, we adopt the l_{2,1}-norm so that the matrix E has the structured sparsity of the noisy data: the l_{2,1}-norm can effectively remove the corruption of specific samples while leaving the other, clean samples untouched. λ_1 > 0 balances the corruption.
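A small numerical sketch (illustrative NumPy, not the paper's code) of why the dual low-rank term by itself is not enough: the split Z = Z_c + Z_v is not unique, since different splits can reconstruct X equally well with the same nuclear-norm cost, which is exactly why the local and global alignment constraints below are needed to make the two structures identifiable.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 12))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = Vt.T @ Vt   # a matrix satisfying X = XZ (the closed-form LRR solution)

nuc = lambda M: np.linalg.norm(M, 'nuc')

# Two different splits Z = Zc + Zv reconstruct X equally well ...
splits = [(0.5 * Z, 0.5 * Z), (0.9 * Z, 0.1 * Z)]
for Zc, Zv in splits:
    assert np.allclose(X @ (Zc + Zv), X)

# ... and give the same nuclear-norm sum, so the low-rank term alone
# cannot tell class structure from view structure.
sums = [nuc(a) + nuc(b) for a, b in splits]
print(sums)
```

Nonnegative scalings of the same Z always tie on the nuclear-norm objective; only the discriminative alignment terms break this symmetry.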

Graph-Based Discriminative Local Alignment.
To introduce the local discriminative constraint, two graph-based constraints are constructed on each pair of synthetic samples with Z_c and Z_v from the class and view subspaces, respectively, which better cluster intraclass samples and scatter interclass ones. Let Y_{c,i} and Y_{v,i} denote the ith projected samples of cross-view data from the class space Y_c = P^T X Z_c and the view space Y_v = P^T X Z_v, and let l_i and l_j be the labels of samples x_i and x_j. The two graphs are defined by

W_c(i, j) = 1 if l_i = l_j and x_i ∈ N^c_{k_1}(x_j), and 0 otherwise,  (7)
W_v(i, j) = 1 if l_i ≠ l_j and x_i ∈ N^v_{k_2}(x_j), and 0 otherwise,  (8)

where x_i ∈ N^c_{k_1}(x_j) denotes that x_i belongs to the k_1 nearest same-class neighbors of x_j, and x_i ∈ N^v_{k_2}(x_j) means that x_i belongs to the k_2 nearest same-view neighbors of x_j. Hence, W_c captures the distance between two similar samples from the same class, while W_v captures the distance between two similar samples from different classes within the same view. With the help of the Fisher criterion, the pairwise local discriminative constraint L(Z_c, Z_v, P) can be written as

L(Z_c, Z_v, P) = Tr(Y_c L_c Y_c^T) − α Tr(Y_v L_v Y_v^T),  (9)

where L_c and L_v denote the graph Laplacians of W_c and W_v, and α is a balance parameter. Minimizing the trace difference in equation (9) weakens the impact of the view information and separates the class structure from the view structure.
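The graph machinery can be sketched as follows (a hedged NumPy illustration; the neighborhood rule is one plausible reading of the definitions above, not necessarily the authors' exact construction). The sketch also verifies the standard Laplacian identity Tr(Y L Y^T) = (1/2) Σ_{ij} W_{ij} ||y_i − y_j||^2, which is what lets pairwise distances be written as the trace form in equation (9).

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k1 = 12, 4, 3
X = rng.normal(size=(d, n))
labels = np.array([0] * 6 + [1] * 6)

# Same-class k1-nearest-neighbor graph W_c (symmetrized).
D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise sq. dists
Wc = np.zeros((n, n))
for j in range(n):
    same = np.where(labels == labels[j])[0]
    near = same[np.argsort(D2[same, j])][1:k1 + 1]       # skip x_j itself
    Wc[near, j] = Wc[j, near] = 1.0

Lc = np.diag(Wc.sum(axis=1)) - Wc                        # graph Laplacian
Y = rng.normal(size=(2, n))                              # stand-in projections

# Laplacian identity behind the local alignment trace term:
lhs = np.trace(Y @ Lc @ Y.T)
rhs = 0.5 * sum(Wc[i, j] * np.sum((Y[:, i] - Y[:, j]) ** 2)
                for i in range(n) for j in range(n))
print(np.isclose(lhs, rhs))   # True
```

The view graph W_v would be built the same way with the same-view, different-class rule and k_2 neighbors.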

Discriminative Global Alignment.
The discriminative projection of all pairwise samples through L(Z_c, Z_v, P) reduces the impact between views, but the differences between learned features from different classes are not yet significant enough. To further enhance the separation of the two manifolds, we design a global discriminative constraint for cross-view analysis as the third term:

G(P, Z_c, Z_v) = Tr(S_W(P^T X Z_c)) − Σ_{i=1}^{2} Tr(S_{Bi}(P^T X Z_v)),  (10)

where S_W(P^T X Z_c) is the within-class, between-view scatter matrix in the class manifold and S_{Bi}(P^T X Z_v) (i = 1, 2) is the within-view, between-class scatter matrix in the view manifold. These scatter matrices are formulated as

S_W(P^T X Z_c) = Σ_{j=1}^{c} (μ^1_j − μ^2_j)(μ^1_j − μ^2_j)^T,   S_{Bi}(P^T X Z_v) = Σ_{j=1}^{c} (μ^i_j − μ_i)(μ^i_j − μ_i)^T,  (11)

where μ_i (i = 1, 2) denotes the overall mean feature of the ith view and μ^i_j denotes the mean feature of the jth class in the ith view. In this way, the within-class view margin in the class structure is reduced, and the margin between classes of the same view in the view structure is magnified. The third term can be framed in matrix form as

G(P, Z_c, Z_v) = ||P^T X Z_c (V_1 − V_2)||_F^2 − Σ_{i=1}^{2} ||P^T X Z_v (V_i − Ṽ_i)||_F^2,  (12)

where V_i is the coefficient matrix producing the within-class mean features of the ith view and Ṽ_i (i = 1, 2) is the coefficient matrix producing the overall mean feature of the ith view. In addition, we add an orthogonal constraint P^T P = I to exclude trivial solutions. Finally, we rewrite equation (5) with all the terms as

min_{P, Z_c, Z_v, E}  ||Z_c||_* + ||Z_v||_* + λ_1 ||E||_{2,1} + L(Z_c, Z_v, P) + λ_2 G(P, Z_c, Z_v),
s.t.  X = X(Z_c + Z_v) + E,  P^T P = I.  (13)

Optimization Scheme.
To obtain a feasible solution for Z_c and Z_v, we adopt two auxiliary variables J_c and J_v. Then, equation (13) can be transformed into

min_{P, Z_c, Z_v, E, J_c, J_v}  ||J_c||_* + ||J_v||_* + λ_1 ||E||_{2,1} + L(Z_c, Z_v, P) + λ_2 G(P, Z_c, Z_v),
s.t.  X = X(Z_c + Z_v) + E,  Z_c = J_c,  Z_v = J_v,  P^T P = I.  (14)

For optimization problems with equality constraints, the Augmented Lagrangian method is an effective solution.
The Augmented Lagrangian form of equation (14) is

||J_c||_* + ||J_v||_* + λ_1 ||E||_{2,1} + L(Z_c, Z_v, P) + λ_2 G(P, Z_c, Z_v)
 + ⟨Y_1, X − X(Z_c + Z_v) − E⟩ + ⟨Y_2, Z_c − J_c⟩ + ⟨Y_3, Z_v − J_v⟩
 + (η/2)(||X − X(Z_c + Z_v) − E||_F^2 + ||Z_c − J_c||_F^2 + ||Z_v − J_v||_F^2),  (15)

where Y_1, Y_2, and Y_3 are the Lagrange multipliers and η > 0 is the penalty parameter. We use an alternating scheme to optimize all variables iteratively, and we denote by a subscript t the value of a variable at the tth iteration.
First, by ignoring all variables except P, equation (15) reduces to a subproblem in P alone, equation (16). Because P_t is an orthogonal matrix, we obtain the projection matrix P_t column by column; for the ith column of P_t, the objective function is rewritten as equation (17).

Setting the derivative of function (17) to zero yields the eigenvalue problem in equation (18). Therefore, P_{i,t} is the ith eigenvector of the matrix in equation (18), and P can be solved simply.
Update J_c: the J_c-subproblem, equation (19), is a nuclear norm minimization, which can be solved approximately by singular value thresholding [37].
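The singular value thresholding (SVT) operator can be sketched in a few lines (a minimal NumPy illustration of the standard operator, assuming the usual soft-thresholding of singular values; the test matrix is made up for the demo):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of
    tau * ||.||_* evaluated at M (soft-thresholds the singular values)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(4)
# Low-rank matrix plus small noise, standing in for the J_c update target.
M = (rng.normal(size=(8, 3)) @ rng.normal(size=(3, 8))
     + 0.01 * rng.normal(size=(8, 8)))
J = svt(M, tau=0.5)

# The singular values of the result are exactly max(s - tau, 0).
print(np.allclose(np.linalg.svd(J, compute_uv=False),
                  np.maximum(np.linalg.svd(M, compute_uv=False) - 0.5, 0.0)))
```

Small singular values (here, the noise directions) are set exactly to zero, which is how the update drives J_c toward low rank.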
Update J_v: equation (20) can be addressed in the same way as equation (19).
Update Z_c: we set the derivative of equation (15) with respect to Z_c to zero, which yields equation (21).
Equation (21) is a Sylvester equation, which can be solved readily by the method of [38]. Similarly, setting the derivative of equation (15) with respect to Z_v to zero yields equation (22).
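A Sylvester equation A Z + Z B = Q can be handled by an off-the-shelf solver; below is a toy instance using SciPy (the coefficient matrices are illustrative stand-ins, not the actual matrices of the Z_c update):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(5)
# Toy instance of A Z + Z B = Q, the form the Z_c / Z_v updates take.
# The diagonal shift keeps spec(A) + spec(B) away from zero, which
# guarantees a unique solution for this demo.
A = rng.normal(size=(5, 5)) + 5 * np.eye(5)
B = rng.normal(size=(5, 5)) + 5 * np.eye(5)
Q = rng.normal(size=(5, 5))

Z = solve_sylvester(A, B, Q)          # Bartels-Stewart under the hood
print(np.allclose(A @ Z + Z @ B, Q))  # True
```

The solve costs O(n^3) via Schur decompositions of A and B, which is the dominant per-iteration cost of the Z_c and Z_v steps.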
Equation (22) can be addressed in the same way as equation (21).
Update E: equation (23) is an l_{2,1}-norm minimization problem whose closed-form solution is given in [39]. The entire numerical iterative scheme for equation (14) is summarized in Algorithm 1, where the parameters ρ, θ, t_max, η, and η_max are set empirically. The matrices Z_c, Z_v, E, Y_1, Y_2, and Y_3 are initialized to 0, and the parameters α, λ_1, and λ_2 are tuned by experiments.
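The closed-form l_{2,1} step cited here can be sketched as column-wise shrinkage (a standard proximal form; the data and threshold below are illustrative assumptions, not the paper's values):

```python
import numpy as np

def prox_l21(Q, tau):
    """Column-wise shrinkage: argmin_E tau*||E||_{2,1} + 0.5*||E - Q||_F^2.
    Each column is scaled by max(0, 1 - tau/||q_i||), zeroing weak columns."""
    norms = np.linalg.norm(Q, axis=0)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return Q * scale

rng = np.random.default_rng(6)
Q = rng.normal(size=(6, 10))
Q[:, :3] *= 0.01                 # three near-zero "clean" columns
E = prox_l21(Q, tau=0.5)

# Resulting column norms are exactly max(||q_i|| - tau, 0): columns whose
# norm falls below tau are zeroed, modeling sample-specific corruption.
norms = np.linalg.norm(Q, axis=0)
print(np.allclose(np.linalg.norm(E, axis=0), np.maximum(norms - 0.5, 0.0)))
```

Zeroing whole columns is what gives E the "structured sparsity" described for equation (6): corruption is removed per sample rather than per entry.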

Complexity Analysis.
According to the above computational process and Algorithm 1, we discuss the computational complexity of the proposed algorithm. The dominant cost of Algorithm 1 lies in Steps 1-5: the eigendecomposition for equation (18), the SVD-based singular value thresholding for equations (19) and (20), and the Sylvester solves for equations (21) and (22).

Experiments
In this section, we evaluate the performance of the proposed method on classification tasks. First, we introduce four cross-view datasets (a face database, two object databases, and an image-text database) and the experimental setup. Second, we adopt several strong subspace learning algorithms for comparison with ours. The initializations of all unknown parameters are tuned to obtain the best experimental results; the parameter analysis is shown in Figure 2. In addition, each experiment is repeated 10 times and the average classification results are reported. For the image-text database, because the two features have inconsistent dimensionality, we use PCA to adjust the image dimension.

Experimental Datasets. CMU-PIE
The COIL-20 object dataset is composed of 20 objects viewed over a full horizontal 360-degree circle. There is a 5° interval between every two adjacent images, so each category has 72 samples. We divide the 72 images into two groups, G1 and G2. The COIL-100 object dataset is an extension of COIL-20; the only difference is that COIL-100 is composed of 100 objects viewed over the same horizontal 360-degree circle. Therefore, the setup of the COIL-100 dataset is similar to that of COIL-20.

Experimental Results and Analysis.
In the experiments, we do not use any information about the test set, including class and view information. We select several subspace learning methods for comparison, namely, PCA, LDA, locality preserving projections (LPP) [40], LatLRR, SRRS, and RCVL. After extracting features with each method, we uniformly choose KNN as the classifier to evaluate performance. In addition, we add 10 percent and 20 percent random noise to part of the datasets to demonstrate the adaptability of our subspace learning algorithm to different levels of corruption; some instances are shown in Figure 3.

Algorithm 1.
Input: data matrix X, parameters α, λ_1 and λ_2.
Initialize: ρ = 1.3, θ = 10^{-9}, t = 0, t_max = 200, η_0 = 0.1 and η_max = 10^{10}.
while not converged and t ≤ t_max do
 (1) Solve P_{t+1} by equation (18);
 (2) Solve J_{c,t+1} by equation (19);
 (3) Solve J_{v,t+1} by equation (20);
 (4) Solve Z_{c,t+1} by equation (21);
 (5) Solve Z_{v,t+1} by equation (22);
 (6) Solve E_{t+1} by equation (23);
 (7) Update the multipliers Y_1, Y_2, Y_3, set η ← min(ρη, η_max), and let t ← t + 1;
end while
For CMU-PIE, we randomly perform cross-view subspace learning on pairs of poses, with a total of 6 experimental groups: C1{P05,P09}, C2{P05,P27}, C3{P05,P29}, C4{P09,P27}, C5{P09,P29}, and C6{P27,P29}. Tables 1-3 show the classification results of all algorithms on the original data, 10% noisy data, and 20% noisy data, respectively. For the COIL-20 and COIL-100 object datasets, we select two sets of samples from G1 and G2 as the cross-view training set and the others as the test set, giving 4 experimental groups for each dataset: C1{V1,V3}, C2{V1,V4}, C3{V2,V4}, and C4{V2,V3}. Tables 4-6 display the classification results of all algorithms on the original, 10% noisy, and 20% noisy COIL-20 data. Figures 2, 4, and 5 show the results of the four groups on the original, 10% corrupted, and 20% corrupted COIL-100 data. For Wikipedia, we use the dimensionality-reduced image feature and text feature as two views, and Figure 6 displays the comparison results on the original data, 10% noisy data, and 20% noisy data. The experimental results show that our method consistently achieves higher classification accuracy than the other methods. On noisy data, the methods based on LRR are more robust than the others, because the low-rank representation framework can restore the raw information of corrupted data by learning the latent structure. Moreover, the methods designed for cross-view data outperform the remaining comparison methods. Our method projects data into a discriminative view-invariant subspace via the dual low-rank representation framework, so it learns from cross-view data better.

Performance Evaluations.
In this part, we examine which parameter values give our method its best performance, and then we show the convergence of our algorithm.
There are three tunable parameters α, λ_1, and λ_2 in our framework. We evaluate their effect on COIL-20 C1. α and λ_2 adjust the discriminative local alignment and the discriminative global alignment, respectively. From Figures 7(a) and 7(b), it can be seen that our method obtains the best result when α = 100 and λ_2 = 0.01. Furthermore, λ_1 constrains the corrupted data, and the classification result is optimal around λ_1 = 1.
Finally, we show the convergence analysis of our method on different datasets: the original and the 10% corrupted COIL-20 C1, and the CMU-PIE C1. The maximum of ||X − X(Z_{c,t+1} + Z_{v,t+1}) − E_{t+1}||_∞, ||J_{c,t+1} − Z_{c,t+1}||_∞, and ||J_{v,t+1} − Z_{v,t+1}||_∞ is used as the convergence criterion in each iteration. The variation of this maximum with the number of iterations is shown in Figure 8. The curves show that the proposed algorithm converges steadily and efficiently within about 20 iterations.

Conclusions
We proposed a subspace learning algorithm with discriminative constraints via low-rank representation to solve the cross-view recognition task. Our method learns a distribution-invariant subspace from cross-view data by modeling two substantial structures with dual low-rank constraints. We integrate local alignment and global alignment into the framework to eliminate the interference caused by the view manifold in the subspace. Meanwhile, we design a feasible iterative scheme to ensure that the model converges and obtains a high-quality solution. Extensive experiments on several public datasets show that the proposed method is robust and stable for cross-view classification tasks.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.