A Novel Graph Constructor for Semisupervised Discriminant Analysis: Combined Low-Rank and k-Nearest Neighbor Graph

Semisupervised Discriminant Analysis (SDA) is a semisupervised dimensionality reduction algorithm that can easily resolve the out-of-sample problem. Related works usually focus on exploiting the geometric relationships among data points, which are not obvious, to enhance the performance of SDA. Different from these works, we study the construction of the regularization graph, which is important in graph-based semisupervised learning methods. In this paper, we propose a novel graph for Semisupervised Discriminant Analysis, called the combined low-rank and k-nearest neighbor (LRKNN) graph. In our LRKNN graph, we map the data to the low-rank (LR) feature space and then adopt kNN to satisfy the algorithmic requirements of SDA. Since the low-rank representation can capture the global structure and the k-nearest neighbor algorithm can maximally preserve the local geometrical structure of the data, the LRKNN graph can significantly improve the performance of SDA. Extensive experiments on several real-world databases show that the proposed LRKNN graph is an efficient graph constructor that largely outperforms other commonly used baselines.


Introduction
In real-world data mining and pattern recognition applications, labeled data are often expensive or difficult to obtain, while unlabeled data are copious and readily available. How to improve learning performance using the copious unlabeled data has therefore attracted considerable attention [1,2]. Semisupervised dimensionality reduction can be applied directly to the whole dataset and does not require separate training and testing sets [3].
Inspired by semisupervised learning [4-6], Semisupervised Discriminant Analysis (SDA) was first proposed by Cai et al. [2]. It can easily resolve the out-of-sample problem [7]. In the SDA algorithm, the labeled samples are used to maximize the separability between different classes, while the unlabeled ones are used to estimate the intrinsic geometric structure of the data. Since then, many variants of semisupervised LDA have been proposed. Zhang and Yeung proposed SSDA [3] using a path-based similarity measure. In a similar way, SMDA [8] and UDA [9] perform LDA in a semisupervised setting with manifold regularization. The method in [6] utilizes unlabeled data to maximize an optimality criterion of LDA and applies the constrained concave-convex procedure to solve the resulting optimization problem.
Although these methods perform semisupervised LDA in different ways, they all need the geometric relationships among the whole dataset, obtained by constructing a regularization graph. This graph remarkably impacts the performance of these methods. However, little attention has been paid to graph construction methods. In this paper we therefore study the regularization graph construction problem of SDA [2]. Below we summarize our main contributions.
(i) Inspired by low-rank representation (LRR) [10] and the k-nearest neighbor algorithm, we construct a novel graph called the combined low-rank and k-nearest neighbor graph. LRR jointly obtains the representation of all the samples under a global low-rank constraint. Thus it is better at capturing the global data structures.
(ii) Since kNN is used to satisfy the algorithmic requirements of SDA, the affinity of the local geometrical structure can be maximally preserved when using the LRKNN graph.
(iii) Extensive experiments on real-world datasets show that our proposed LRKNN regularized graph can significantly boost the performance of Semisupervised Discriminant Analysis.
The rest of the paper is organized as follows. We briefly review the related work in Section 2 and give the preliminaries in Section 3. We then introduce the combined low-rank and k-nearest neighbor graph construction framework in Section 4. Section 5 reports the experimental results on real-world database tasks. In Section 6, we conclude the paper.

Related Work
This paper proposes a combined low-rank and k-nearest neighbor graph to boost the performance of Semisupervised Discriminant Analysis. Our work is related to both Semisupervised Discriminant Analysis improvement techniques and graph constructor design; we briefly discuss both of them. Cai et al. [2] proposed a semisupervised dimensionality reduction algorithm, SDA, which captures the local structure for data dimensionality reduction. Zhang and Yeung proposed SSDA [3], using a path-based similarity measure to capture the global manifold structure of the data. The works SMDA [8] and UDA [9] also perform semisupervised LDA with manifold regularization. Nie et al. [11] proposed an orthogonally constrained semisupervised orthogonal discriminant analysis method. Zhang et al. [1] utilized must-link and cannot-link constraints to capture the underlying structure of the dataset. Song et al. [5] utilized labeled data to discover the class structure and unlabeled data to capture the intrinsic local geometry. Li et al. [12] presented the Probabilistic Semisupervised Discriminant Analysis (PSDA) algorithm, which utilizes unlabeled samples to approximate the class structure instead of the local geometry. In the work [13], Dhamecha et al. presented an incremental Semisupervised Discriminant Analysis algorithm, which utilizes the unlabeled data to enable incremental learning. The work [14] developed a graph-based semisupervised learning method based on PSDA for dimensionality reduction.
Our work is also related to another line of research, graph constructor design. Many methods have been proposed for graph construction, among which the k-nearest neighbor based method and the ε-ball based method [15] are the two most popular for building the graph adjacency. On top of these two methods, various approaches such as the heat kernel [15] and the inverse Euclidean distance [16] are used to set the graph edge weights. However, all these methods rely on pairwise Euclidean distances, which are very sensitive to data noise. Moreover, since only the local pairwise relationships between data points are taken into account, the constructed graph cannot sufficiently reveal the clustering relationships among the samples. Yan et al. proposed an ℓ1-graph via sparse representation [10,17]. An ℓ1-graph over a dataset is derived by encoding each datum as a sparse representation of the remaining samples. In the work [18], Zhuang et al. proposed a novel method to construct an informative low-rank graph (LR-graph) for semisupervised learning. Gao et al. proposed a novel graph construction method via group sparsity [19]. Li and Fu [20] developed an approach to construct a graph based on low-rank coding with a b-matching constraint and proposed a novel supervised regularization based robust subspace (SRRS) approach via low-rank learning [21]. Zhao et al. proposed a novel approach to construct a sparse graph with a blockwise constraint for face representation, named SGB [22]. A sparse and low-rank graph-based discriminant analysis (SLGDA) was proposed, which combines both sparsity and low-rankness to maintain the global and local structures simultaneously [23]. In the work [24], Li and Fu incorporated the kNN constraint and the b-matching constraint into the low-rank representation model as the balanced (or unbalanced) graph. We focus on constructing a novel graph for SDA, capturing the data structure with LRR and then utilizing the kNN algorithm to satisfy the algorithmic requirements of SDA.
The work most closely related to ours is the low-rank kernel-based Semisupervised Discriminant Analysis [25], our previous research, in which the LRR is used as the kernel of KSDA [2]. In the current work, we propose a novel graph for Semisupervised Discriminant Analysis, called the combined low-rank and k-nearest neighbor (LRKNN) graph. In the LRKNN graph, kNN is adopted to satisfy the algorithmic requirements of SDA. Since the low-rank representation can capture the global structure and the k-nearest neighbor algorithm can maximally preserve the local geometrical structure of the data, the LRKNN graph captures not only the global structure but also the local information of the data, which can largely improve the performance of SDA.

Preliminary

Suppose we are given a set of samples, the first l of which are labeled and belong to c classes. The SDA method [2] seeks a projection vector a and encodes the prior assumption of consistency through a regularized term J(a). The objective function is as follows:

a^* = \arg\max_a \frac{a^T S_b a}{a^T S_t a + \alpha J(a)},  (1)

where S_b and S_t are the between-class and total scatter matrices, and S_w is defined as the within-class scatter matrix (S_t = S_b + S_w).

The parameter \alpha in (1) balances the model complexity and the empirical loss. The regularized term J(a) supplies us with the flexibility to incorporate prior knowledge into applications. We aim at constructing a graph that incorporates the manifold structure revealed by the available unlabeled samples.

Given a set of samples \{x_i\}_{i=1}^m, we can construct a graph G to represent the relationships between nearby samples via kNN, putting an edge between samples that are k-nearest neighbors of each other. The corresponding weight matrix S is defined as follows:

S_{ij} = 1 if x_i \in N_k(x_j) or x_j \in N_k(x_i), and S_{ij} = 0 otherwise,

where N_k(x_i) denotes the set of k-nearest neighbors of x_i. The J(a) term can then be defined as follows:

J(a) = \sum_{ij} \left(a^T x_i - a^T x_j\right)^2 S_{ij} = 2\, a^T X L X^T a,

where D is a diagonal matrix whose entries are the column (or row, since S is symmetric) sums of S, that is, D_{ii} = \sum_j S_{ij}, and the Laplacian matrix [10] is L = D - S. We thus obtain the objective function of SDA with the regularizer J(a):

a^* = \arg\max_a \frac{a^T S_b a}{a^T \left(S_t + \alpha X L X^T\right) a}.

By solving the corresponding generalized eigenvalue problem S_b a = \lambda (S_t + \alpha X L X^T) a for the largest eigenvalues, we obtain the projective vector a.
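As a concrete illustration of this step, the following NumPy sketch builds the Laplacian regularizer from a given symmetric weight matrix S and solves the resulting generalized eigenvalue problem. The function name, the explicit matrix inverse, and the small ridge term are our own simplifications for illustration, not the paper's implementation:

```python
import numpy as np

def sda_projection(X, y, S, alpha=0.1):
    """Sketch of regularized SDA: columns of X are samples, y holds class
    labels (-1 marks unlabeled), S is a symmetric kNN weight matrix.
    Solves S_b a = lambda (S_t + alpha * X L X^T) a for the projections."""
    labeled = y >= 0
    Xl, yl = X[:, labeled], y[labeled]
    mu = Xl.mean(axis=1, keepdims=True)
    St = (Xl - mu) @ (Xl - mu).T                  # total scatter (labeled)
    Sb = np.zeros_like(St)                        # between-class scatter
    for c in np.unique(yl):
        Xc = Xl[:, yl == c]
        mc = Xc.mean(axis=1, keepdims=True) - mu
        Sb += Xc.shape[1] * (mc @ mc.T)
    D = np.diag(S.sum(axis=1))                    # degree matrix
    L = D - S                                     # graph Laplacian
    M = St + alpha * (X @ L @ X.T)                # regularized denominator
    M += 1e-8 * np.eye(M.shape[0])                # tiny ridge for stability
    w, V = np.linalg.eig(np.linalg.inv(M) @ Sb)   # generalized eigenproblem
    order = np.argsort(-w.real)
    return V[:, order].real                       # projection vectors, best first
```

In practice a solver for symmetric generalized eigenproblems would be preferable to the explicit inverse; the inverse keeps the sketch short.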

Low-Rank Representation.
Yan and Wang proposed the low-rank representation and used it to construct the affinities of an undirected graph (here called the LR-graph) [10]. It jointly obtains the representation of all the samples under a global low-rank constraint and is thus better at capturing the global data structures [16]. Let X = [x_1, x_2, \ldots, x_n] be a set of samples, where each column is a sample that can be represented by a linear combination of the atoms of a dictionary A [26]:

X = AZ,

where Z = [z_1, z_2, \ldots, z_n] is the coefficient matrix, with each z_i being the representation coefficient of x_i. Here, we select the samples themselves, X, as the dictionary A. LRR seeks the lowest-rank solution by solving the following optimization problem [26]:

\min_Z \operatorname{rank}(Z) \quad \text{s.t.} \quad X = XZ.

The above optimization problem can be relaxed to the following convex optimization [27]:

\min_Z \|Z\|_* \quad \text{s.t.} \quad X = XZ.

Here, \|\cdot\|_* denotes the nuclear norm (or trace norm) [28] of a matrix, that is, the sum of its singular values. Considering the noise or corruption present in real-world applications, a more reasonable objective function is

\min_{Z,E} \|Z\|_* + \lambda \|E\|_{\ell} \quad \text{s.t.} \quad X = XZ + E,

where \|\cdot\|_{\ell} can be the \ell_{2,1}-norm or the \ell_1-norm. In this paper we choose the \ell_{2,1}-norm as the error-term measurement, which is defined as

\|E\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{d} (E_{ij})^2}.

The parameter \lambda is used to balance the effect of the low-rank term and the error term. The optimal solution Z^* can be obtained via the inexact augmented Lagrange multiplier (ALM) method [29,30].
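The inexact ALM scheme for this ℓ2,1-regularized LRR problem can be sketched as below; the penalty schedule (mu, rho) and the stopping tolerance are illustrative defaults of ours, not values taken from the paper:

```python
import numpy as np

def lrr_inexact_alm(X, lam=0.5, rho=1.1, mu=1e-2, mu_max=1e10,
                    tol=1e-6, max_iter=500):
    """Sketch of LRR: min ||Z||_* + lam * ||E||_{2,1}  s.t.  X = XZ + E,
    via inexact ALM with an auxiliary variable J for the nuclear-norm term."""
    d, n = X.shape
    Z = np.zeros((n, n)); E = np.zeros((d, n))
    Y1 = np.zeros((d, n)); Y2 = np.zeros((n, n))
    inv = np.linalg.inv(np.eye(n) + X.T @ X)      # fixed across iterations
    for _ in range(max_iter):
        # J-step: singular value thresholding with threshold 1/mu
        U, s, Vt = np.linalg.svd(Z + Y2 / mu, full_matrices=False)
        J = U @ np.diag(np.maximum(s - 1.0 / mu, 0)) @ Vt
        # Z-step: closed-form least squares
        Z = inv @ (X.T @ (X - E) + J + (X.T @ Y1 - Y2) / mu)
        # E-step: column-wise shrinkage for the 2,1-norm
        Q = X - X @ Z + Y1 / mu
        norms = np.linalg.norm(Q, axis=0)
        E = Q * (np.maximum(norms - lam / mu, 0) / np.maximum(norms, 1e-12))
        # dual-variable updates and penalty growth
        R1 = X - X @ Z - E; R2 = Z - J
        Y1 += mu * R1; Y2 += mu * R2
        mu = min(rho * mu, mu_max)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z, E
```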

k-Nearest Neighbor Algorithm.
The samples x_i and x_j are considered neighbors if x_i is among the k-nearest neighbors of x_j or x_j is among the k-nearest neighbors of x_i. There are different methods to assign the edge weights W; three of them are binary (0-1) weighting, inverse Euclidean distance weighting [16], and heat kernel weighting [15].
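A minimal sketch of the kNN adjacency rule and the three edge weightings follows; the mapping of the schemes onto the paper's KNNB/KNNE/KNNK labels used later in the experiments is our assumption:

```python
import numpy as np

def knn_adjacency(X, k=4):
    """Symmetric kNN adjacency: x_i and x_j are linked if either is among
    the k nearest neighbors of the other (columns of X are samples)."""
    n = X.shape[1]
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    np.fill_diagonal(d2, np.inf)                 # no self-loops
    S = np.zeros((n, n))
    for i in range(n):
        S[i, np.argsort(d2[i])[:k]] = 1.0
    return np.maximum(S, S.T)                    # "or" rule gives symmetry

def edge_weights(X, S, scheme="heat", sigma=1.0):
    """Three common weightings on the kNN edges (KNNB/KNNE/KNNK assumed)."""
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    if scheme == "binary":                       # KNNB (assumed)
        return S.copy()
    if scheme == "inv":                          # KNNE (assumed)
        return np.where(S > 0, 1.0 / np.sqrt(np.maximum(d2, 1e-12)), 0.0)
    if scheme == "heat":                         # KNNK (assumed)
        return np.where(S > 0, np.exp(-d2 / (2.0 * sigma ** 2)), 0.0)
    raise ValueError(scheme)
```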

Combined Low-Rank and k-Nearest Neighbor (LRKNN) Graph Constructor Algorithm.
How to find an appropriate subspace for classification is an important task, which we call dimensionality reduction. Dimensionality reduction aims at finding a labeling of the graph that is consistent with both the initial labeling and the data's geometric structure (edges and weights W).
Step 1. Map the labeled and unlabeled data X to the LR feature space by the LRR algorithm.
Step 2. Obtain the symmetric graph W by the k-nearest neighbor algorithm.
Step 3. Implement the SDA algorithm for dimensionality reduction.
Step 4. Execute the nearest neighbor approach for the final classification.
Algorithm 1: Procedure of SDA using the combined low-rank and k-nearest neighbor graph.
Previously proposed SDA methods always analyze the relationships in the data in a one-to-others fashion. For example, the most common k-nearest neighbor graph only encodes the edges, with edge weights equal to 1, while the ε-graph and the ℓ1-graph (SR-graph) determine the graph weights under an ℓ2-norm or ℓ1-norm constraint. The ℓ1-graph lacks global constraints, which greatly reduces performance when the data is grossly corrupted. To address this drawback, Liu et al. proposed the low-rank representation and used it to construct the affinities of an undirected LR-graph [26]. The LR-graph jointly obtains the representation of all the samples under a global low-rank constraint, and thus it is better at capturing the global data structures [31].
Since the LR-graph, ℓ1-graph, and ε-graph are asymmetric matrices, previous works often applied a graph symmetrization step, W = W + W^T, to satisfy the algorithmic requirements of SDA. Since the LRR is good at capturing the global data structures and the local geometrical structure can be maximally preserved by the k-nearest neighbor algorithm, we instead propose a novel solution that uses the k-nearest neighbor algorithm to satisfy the algorithmic requirements. The combined LRKNN method can thus improve the performance to a very large extent. Heat kernel weighting [15] is used here.
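The contrast between the two routes can be sketched as follows. Treating the columns of the LRR coefficient matrix Z (in absolute value) as the samples' low-rank features is our reading of "mapping the data to the LR feature space"; the paper does not spell out this mapping, so it should be taken as an assumption:

```python
import numpy as np

def symmetrize(W):
    """Previous works: force symmetry of an asymmetric affinity, W <- W + W^T."""
    return W + W.T

def lrknn_affinity(Z, k=4, sigma=1.0):
    """LRKNN alternative (a sketch): build a symmetric heat-kernel kNN graph
    directly on the low-rank features, so no post-hoc symmetrization is needed."""
    F = np.abs(Z)                          # LR feature space (assumed mapping)
    n = F.shape[1]
    d2 = ((F[:, :, None] - F[:, None, :]) ** 2).sum(axis=0)
    np.fill_diagonal(d2, np.inf)           # exclude self-edges
    S = np.zeros((n, n))
    for i in range(n):
        S[i, np.argsort(d2[i])[:k]] = 1.0
    S = np.maximum(S, S.T)                 # kNN "or" rule is symmetric by design
    return np.where(S > 0, np.exp(-d2 / (2.0 * sigma ** 2)), 0.0)
```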

SDA Using the Combined Low-Rank and k-Nearest Neighbor Graph.
Graph structure remarkably impacts the performance of these SDA-like methods. However, little attention has been paid to graph construction methods. In this paper we therefore present a novel combined low-rank and k-nearest neighbor graph algorithm, which largely improves the performance of SDA.
Firstly, map the labeled and unlabeled data to the LR-graph feature space. Secondly, obtain the symmetric graph by the k-nearest neighbor algorithm, where heat kernel weighting is used. By choosing an appropriate kernel parameter, we can increase the similarities among intraclass samples and the differences among interclass samples. Then implement the SDA algorithm for dimensionality reduction. Finally, execute the nearest neighbor method for the final classification in the derived low-dimensional feature subspace. The procedure is described in Algorithm 1.

Experiments and Analysis
To examine the performance of the LRKNN graph in the SDA algorithm, we conducted extensive experiments on several real-world datasets. In this section, we introduce the datasets we used and the experiments we performed; we then present the experimental results together with their analysis. The experiments are conducted on machines with Intel Core CPUs at 2.60 GHz and 8 GB of RAM.

Datasets.
We evaluate the proposed method on four real-world datasets, including three face databases and the USPS database. In these experiments, we normalize each sample to unit norm.
(i) ORL Database [10]. The ORL dataset contains 10 different images of each of 40 distinct subjects. The images were taken at different times, with varying lighting, facial expressions, and facial details. Each face image is manually cropped and resized to 32 × 32 pixels, with 256 grey levels per pixel.

(iv) USPS Database [33]. The USPS handwritten digit database is a popular subset containing 9298 handwritten digit images of 16 × 16 pixels in total. Here, we randomly select 300 examples for the experiments.

Comparative Algorithms.
In order to demonstrate how the SDA dimensionality reduction performance can be improved, we compare the following graph construction methods.

(i) SR-Graph [29]. The SR-graph takes the reconstruction coefficients of the sparse representation, obtained by solving \hat{a} = \arg\min_a \|a\|_1 s.t. y = Xa, and the graph weight is defined as W_{ij} = |\hat{a}_i(j)|.
(ii) LLE-Graph [34]. The LLE-graph reconstructs each sample from its neighboring points and minimizes the \ell_2 reconstruction error:

\min_W \sum_i \Big\| x_i - \sum_j W_{ij} x_j \Big\|^2 \quad \text{s.t.} \quad \sum_j W_{ij} = 1.

(iii) KNNK Graph [29]. We adopt the Euclidean distance as our similarity measure and use a Gaussian kernel to reweight the edges. The number of nearest neighbors is set to 4. Similarly, the number of nearest neighbors of the original SDA using the KNNB graph is also set to 4.
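The LLE-graph weights above admit a closed-form local solution per sample: solve a small sum-to-one constrained least squares over each sample's neighbors. The following sketch illustrates this; the regularization constant is our own choice, not a value from the cited work:

```python
import numpy as np

def lle_weights(X, k=3, reg=1e-3):
    """Sketch of LLE-graph weights: reconstruct each sample from its k nearest
    neighbors under the sum-to-one constraint (columns of X are samples)."""
    d, n = X.shape
    W = np.zeros((n, n))
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    np.fill_diagonal(d2, np.inf)            # exclude the sample itself
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        C = X[:, nbrs] - X[:, [i]]          # neighbors centered on x_i
        G = C.T @ C                         # local Gram matrix
        G += reg * np.trace(G) * np.eye(k)  # regularize near-singular cases
        w = np.linalg.solve(G + 1e-12 * np.eye(k), np.ones(k))
        W[i, nbrs] = w / w.sum()            # enforce sum_j W_ij = 1
    return W
```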

Experiment 1: Performances of SDA Using Different Regularized Graphs.
To examine the effectiveness of the proposed combined LRKNN graph for SDA, we conduct experiments on the four databases. In our experiments, we randomly select 30% of the samples from each class as the labeled samples and evaluate the performance with different numbers of selected features. The evaluations are conducted with 20 independent runs for each algorithm, and we average the results. First we utilize the different graph construction methods to obtain the J(a) term; then we implement the SDA algorithm for dimensionality reduction. Finally, the nearest neighbor approach is employed for the final classification in the derived low-dimensional feature subspace. For each database, the classification accuracy for the different graphs is shown in Figure 1. Table 1 shows the performance comparison of the different graph algorithms; the results are the best over all the numbers of selected features mentioned above, and the bold numbers represent the best results among the graph algorithms. From these results, we can observe the following: (i) In most cases, our proposed LRKNN graph consistently achieves the highest classification accuracy among the compared graphs, improving classification performance to a large extent, which suggests that the LRKNN graph is more informative and suitable for SDA.
(ii) In most conditions, the performance of an algorithm combined with kNN is superior to that of the separate algorithm (without kNN), which means that our proposed strategy of combining graph construction with the kNN algorithm is extremely effective, especially for the LRR algorithm.
(iii) Since the SR-graph (ℓ1-graph) lacks global constraints, its performance improvement is not obvious even when combined with the kNN algorithm.
(iv) In some cases (perhaps at sufficiently high dimensionality), traditional graph construction methods such as the kNN-graph and the LLE-graph may achieve good performance on some databases, but they are not as stable as our proposed algorithm. Table 2 shows the execution time of the eight methods mentioned, averaged over 20 independent runs with 10 selected features. We can see that although our algorithm is slower than the traditional kNN algorithms, its performance is much better than these baselines at an acceptable runtime.

Experiment 2: Parameter Settings.
We examine the effect of the heat kernel parameter in the LRKNN, SR-kNN, LLE-kNN, and KNNK graphs. We vary the graph parameters and examine the classification accuracy on the four databases. We again select 30% of the samples from each class to evaluate the classification performance. The evaluations are conducted with 20 independent runs, and the averaged results are adopted. We take the average over the 10 different numbers of selected features mentioned in Section 5.2 as the final result, shown in Figure 2. We can see that the classification accuracy is influenced by the kernel parameter.
We also evaluate the performance with different numbers of nearest neighbors for the LRKNN graph, namely, the value of k for the kNN algorithm. Here we conduct the experiments on the ORL database and the Extended Yale Face Database B. The procedure is the same as in the experiments above. We adopt the average results of the 20 different runs as the final result, shown in Figure 3. We can see that the classification accuracy is influenced by the choice of k.

Experiment 3: Performances of SDA with Different Label Rates.

For each database, we vary the percentage of labeled samples from 20% to 50%; the recognition accuracy is shown in Table 3, where the bold numbers represent the best results and the percentage after the database name is the label rate. From Table 3 we observe the following. In most cases, our proposed LRKNN graph consistently achieves the best results and is robust to label-percentage variations. It is worth noting that even at a very low label rate, our proposed method achieves high classification accuracy, while some of the compared algorithms are not as robust, especially when the label rate is low. Thus, our proposed method has much superiority over the traditional graph construction methods. These traditional methods may sometimes achieve good performance on some databases with a sufficiently high label rate, but they are not as stable as our proposed algorithm.
Since labeled data are very expensive and difficult to obtain, our proposed graph for the SDA algorithm is more robust and suitable for real-world data.

Experiment 4: Performance of LRKNN Graph with Different Weight Methods.
We evaluate the performance of the different weight methods mentioned in Section 5.2 for our LRKNN graph. We conduct 20 independent runs for each algorithm and average the results. The procedure is the same as in the experiments in Section 5.2.
For each database, we show the performance of the three kNN weight methods (KNNE, KNNB, and KNNK) for our LRKNN graph in Figure 4, from which we observe the following.
Overall, the KNNK-based LRKNN graph achieves the best results compared with the other two kNN weighting methods.

We also examine robustness to noise. As we can see, the results of our method are stable under Gaussian noise, "salt and pepper" noise, and multiplicative noise. Because of the robustness of the low-rank representation to noise, our LRKNN method is much more robust than the other graphs: as the different kinds of noise gradually increase, the performance of some methods falls considerably, while the performance of our method decreases only slightly.

Conclusions
In this paper, we propose a novel combined low-rank and k-nearest neighbor graph algorithm, which largely improves the performance of SDA. The LRR naturally captures the global structure of the data, and the k-nearest neighbor algorithm maximally preserves its local geometrical structure. Therefore, using the kNN algorithm to satisfy SDA's algorithmic requirements can largely improve the performance. Empirical studies on four real-world datasets show that our proposed LRKNN graph for Semisupervised Discriminant Analysis is more robust and suitable for real-world applications.