Semisupervised Discriminant Analysis (SDA) aims at dimensionality reduction with both limited labeled data and copious unlabeled data, but it may fail to discover the intrinsic geometric structure when the data manifold is highly nonlinear. The kernel trick is widely used to map the original nonlinearly separable problem into a higher-dimensional space where the classes become linearly separable. Inspired by low-rank representation (LRR), we propose a novel kernel SDA method called low-rank kernel-based SDA (LRKSDA), in which LRR serves as the kernel representation. Since LRR captures the global data structure and yields the lowest-rank representation in a parameter-free way, the low-rank kernel method is highly effective and robust for various kinds of data. Extensive experiments on public databases show that the proposed LRKSDA dimensionality reduction algorithm achieves better performance than other related kernel SDA methods.
For many real-world data mining and pattern recognition applications, labeled data are very expensive or difficult to obtain, while unlabeled data are copious and readily available. How to exploit both labeled and unlabeled data to improve performance has therefore become a significant problem [
Semisupervised Discriminant Analysis may fail to discover the intrinsic geometric structure when the data manifold is highly nonlinear [
Low-rank matrix decomposition and completion have recently become very popular since Yang et al. and Chen et al. proved that a robust estimate of an underlying subspace can be obtained by decomposing the observations into a low-rank matrix and a sparse error matrix [
The major problem of kernel methods is finding proper kernel parameters. These kernel methods usually use fixed global parameters to determine the kernel matrix, which makes them very sensitive to the parameter setting. In fact, the most suitable kernel parameters may vary greatly across different random distributions of the same data. Moreover, the kernel mapping of KSDA always analyzes the relationship of the data in a one-to-others mode, which emphasizes local information and lacks global constraints on the solution. These shortcomings limit the performance and efficiency of KSDA methods. To overcome the disadvantages of traditional kernel methods, inspired by LRR, we propose a novel kernel-based Semisupervised Discriminant Analysis called low-rank kernel-based SDA (LRKSDA), where the low-rank representation is used as the kernel. Compared with other kernels, the low-rank kernel jointly obtains the representation of all the samples under a global low-rank constraint [
The rest of the paper is organized as follows. We start with a brief overview of SDA in Section
Given a set of samples
The parameter
Given a set of samples
We can obtain the objective function of SDA with a regularizer term
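For reference, a commonly cited form of this regularized objective, following Cai et al.'s SDA formulation (the notation below is the standard one and may differ from this paper's own symbols), is:

```latex
\mathbf{a}^{*} \;=\; \arg\max_{\mathbf{a}}\;
\frac{\mathbf{a}^{\top} S_b \,\mathbf{a}}
     {\mathbf{a}^{\top}\!\bigl(S_t + \alpha\, X L X^{\top}\bigr)\,\mathbf{a}},
```

where $S_b$ and $S_t$ are the between-class and total scatter matrices computed from the labeled data, $L$ is the graph Laplacian built from both labeled and unlabeled samples, and $\alpha$ balances the supervised term against the manifold regularizer.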
Yan and Wang [
Let
Semisupervised Discriminant Analysis may fail to discover the intrinsic geometric structure when the data manifold is highly nonlinear. The kernel trick is a popular technique in machine learning which uses a kernel function to map samples to a high-dimensional space [
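Concretely, a kernel function evaluates inner products in the mapped space without ever computing the mapping $\phi$ explicitly; the Gaussian RBF kernel below is one standard choice, shown purely as an illustration:

```latex
k(\mathbf{x}_i, \mathbf{x}_j) = \bigl\langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \bigr\rangle,
\qquad \text{e.g.}\quad
k(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\Bigl(-\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}}{2\sigma^{2}}\Bigr).
```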
Let
Let
Kernel SDA (KSDA) [
The major problem of all these kernel methods is finding proper kernel parameters. They usually use fixed global parameters to determine the kernel matrix, which makes them very sensitive to the parameter setting. In fact, the most proper kernel parameters may vary greatly across different random distributions, even of the same data. Moreover, the traditional kernel mapping always analyzes the relationship of the data in a one-to-others mode, which emphasizes local information and lacks global constraints on the solution. These shortcomings limit the performance and efficiency of KSDA methods. To overcome them, inspired by low-rank representation, we propose a novel kernel-based Semisupervised Discriminant Analysis (LRKSDA) in which LRR is used as the kernel representation.
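For context, the standard LRR problem of Liu et al. seeks the lowest-rank coefficient matrix that reconstructs the data from itself; the noiseless version on the left has a parameter-free closed-form solution, while the robust variant on the right handles corrupted samples (this is the textbook formulation and may differ in notation from the paper's):

```latex
\min_{Z}\;\lVert Z \rVert_{*}
\;\;\text{s.t.}\;\; X = XZ,
\qquad\qquad
\min_{Z,E}\;\lVert Z \rVert_{*} + \lambda \lVert E \rVert_{2,1}
\;\;\text{s.t.}\;\; X = XZ + E,
```

where $\lVert \cdot \rVert_{*}$ is the nuclear norm and $E$ collects the sparse errors; the optimal $Z^{*}$ (often symmetrized as $(\lvert Z^{*}\rvert + \lvert Z^{*\top}\rvert)/2$) then serves as the kernel matrix.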
Let
Let
Since the low-rank representation jointly obtains the representation of all the samples under a global low-rank constraint, it captures the global data structure, and we can get the lowest-rank representation in a parameter-free way, which is very convenient and robust for various kinds of data. The low-rank kernel-based SDA algorithm can therefore improve the performance to a very large extent. The steps of LRKSDA are as follows.
Firstly, map the labeled and unlabeled data to the LR-graph kernel space. Secondly, execute the SDA algorithm for dimensionality reduction. Finally, apply the nearest neighbor method for the final classification in the derived low-dimensional feature subspace.
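As a rough illustration (not the paper's exact algorithm), the following Python sketch walks through these three steps. The helper `lrr_kernel` is a hypothetical name implementing the parameter-free closed-form solution of noiseless LRR, SDA is approximated here by plain LDA on the kernel features, and the toy data are random stand-ins:

```python
import numpy as np
from scipy.linalg import svd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

def lrr_kernel(X):
    """Noiseless LRR: min ||Z||_* s.t. X = XZ.
    Its closed-form minimizer is the shape interaction matrix V_r V_r^T,
    where V_r spans the row space of X (parameter-free). X is (d, n),
    columns are samples."""
    U, s, Vt = svd(X, full_matrices=False)
    r = np.sum(s > 1e-10 * s.max())            # numerical rank of X
    Vr = Vt[:r].T                              # (n, r)
    Z = Vr @ Vr.T                              # (n, n) lowest-rank representation
    return (np.abs(Z) + np.abs(Z.T)) / 2       # symmetrized affinity/kernel

# --- toy usage (random data stand in for a real database) ---
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 120))                 # 50-dim features, 120 samples
y = rng.integers(0, 3, size=120)               # 3 classes
labeled = rng.random(120) < 0.3                # ~30% labeled, as in the experiments

K = lrr_kernel(X)                              # step 1: low-rank kernel mapping
# Step 2: SDA is approximated by LDA on the labeled kernel rows; the paper's
# SDA additionally includes a graph-Laplacian regularizer over all samples.
proj = LinearDiscriminantAnalysis(n_components=2).fit(K[labeled], y[labeled])
feats = proj.transform(K)                      # low-dimensional features
# Step 3: nearest neighbor classification in the reduced subspace.
clf = KNeighborsClassifier(n_neighbors=1).fit(feats[labeled], y[labeled])
print("accuracy on unlabeled:", clf.score(feats[~labeled], y[~labeled]))
```

Because the noiseless LRR solution is a closed-form SVD product, this kernel construction needs no parameter tuning, which is the property the paper emphasizes.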
In this section, we conduct extensive experiments to examine the effectiveness of the low-rank kernel-based SDA algorithm. The simulation experiments are conducted in MATLAB 7.11.0 (R2010b) on a computer with an AMD Phenom(tm) II P960 1.79 GHz CPU and 2 GB RAM.
The proposed LRKSDA is tested on six real-world databases, including three face databases and three University of California Irvine (UCI) databases. In these experiments, each sample is normalized to unit norm.
In order to demonstrate how the semisupervised dimensionality reduction performance can be improved by low-rank kernel-based SDA, we compare it with the SDA, KSDA1, and KSDA2 algorithms. In all experiments, the number of nearest neighbors in the
The classification accuracy is influenced by the kernel parameters, so after comparison we choose proper kernel parameters
To examine the effectiveness of the proposed LRKSDA algorithm, we conduct experiments on the six public databases. In these experiments, we randomly select 30% of the samples from each class as labeled samples and evaluate the performance with different numbers of selected features. The evaluations are conducted over 20 independent runs for each algorithm, and the averaged results are reported as the final results. First, we use the different kernel methods to obtain the kernel mapping; then we apply the SDA algorithm for dimensionality reduction; finally, the nearest neighbor approach is employed for the final classification in the derived low-dimensional feature subspace. For each database, the classification accuracy of the different algorithms is shown in Figure
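A minimal sketch of this evaluation protocol, assuming `run_pipeline` is a hypothetical callable like the earlier pipeline sketch that returns accuracy on the unlabeled samples:

```python
import numpy as np

def evaluate(X, y, run_pipeline, label_rate=0.3, n_runs=20, seed=0):
    """Average accuracy over independent runs with per-class random labeling."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_runs):
        labeled = np.zeros(len(y), dtype=bool)
        for c in np.unique(y):                 # stratified: label_rate per class
            idx = np.flatnonzero(y == c)
            pick = rng.choice(idx, size=max(1, int(label_rate * len(idx))),
                              replace=False)
            labeled[pick] = True
        accs.append(run_pipeline(X, y, labeled))
    return float(np.mean(accs))
```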
Classification accuracy of different SDA algorithms on six databases.
Algorithm | Yale B | ORL | PIE | Musk | Seeds | SPECT Heart
---|---|---|---|---|---|---
LRKSDA | 0.825769 | 0.815 | 0.578243 | 0.836667 | 0.90625 | 0.778378 |
KSDA1 | 0.691392 | 0.693025 | 0.541478 | 0.756849 | 0.825 | 0.683333 |
KSDA2 | 0.723549 | 0.681576 | 0.534542 | 0.755128 | 0.709814 | 0.694154 |
SDA | 0.668397 | 0.687692 | 0.527715 | 0.757407 | 0.819122 | 0.69857 |
Classification accuracy of different SDA algorithms on the six databases: (a) Extended Yale Face Database B, (b) ORL database, (c) CMU PIE face database, (d) Musk (Version 2) Data Set, (e) Seeds Data Set, and (f) SPECT Heart Data Set.
In most cases, our proposed low-rank kernel-based SDA algorithm achieves the highest classification accuracy among the compared algorithms. LRKSDA achieves the best performance once the dimensionality exceeds a certain low value, and its classification accuracy is much higher than that of the other kernel SDA algorithms. It thus improves the classification performance to a large extent, which suggests that the low-rank kernel is more informative and better suited to the SDA algorithm.
Proper kernel parameters are the most important ingredient of these traditional algorithms, and since the kernel parameters of KSDA1 and KSDA2 are fixed global parameters, the two algorithms are very sensitive to different data and to different random distributions of the same data; their performance improvement is therefore not obvious. More seriously, because the labeled samples are selected randomly, the random distribution in each run may not suit the supposedly proper kernel parameters of KSDA1 and KSDA2. Moreover, the traditional kernel mapping always analyzes the relationship of the data in a one-to-others mode, which emphasizes local information and lacks global constraints on the solution; this can lead to poor performance in some cases. The low-rank representation, by contrast, is better at capturing the global data structure, and the lowest-rank representation is obtained in a parameter-free way, which is very convenient and robust for various kinds of data. Low-rank kernel-based SDA therefore separates the different classes very well compared with the other kernel SDA methods and improves the performance to a very large extent, which means that our proposed low-rank kernel method is extremely effective.
We evaluate the influence of the number of labeled samples in this part. The experiments are conducted with 20 independent runs for each algorithm, and the results are averaged. The procedure is the same as in the first experiment. For each database, we vary the percentage of labeled samples from 10% to 50%; the recognition accuracy is shown in Tables
Classification accuracy of different algorithms with varying percentages of labeled samples on the Yale B, ORL, and PIE databases.
Database | Algorithm | 10% | 20% | 30% | 40% | 50%
---|---|---|---|---|---|---
Yale B | LRKSDA | 0.711471 | 0.792667 | 0.825769 | 0.848636 | 0.877778 |
Yale B | KSDA1 | 0.316113 | 0.536868 | 0.691392 | 0.807183 | 0.856101 |
Yale B | KSDA2 | 0.33994 | 0.569484 | 0.723549 | 0.819317 | 0.877637 |
Yale B | SDA | 0.325919 | 0.560259 | 0.668397 | 0.815348 | 0.855167 |
ORL | LRKSDA | 0.615556 | 0.728125 | 0.815 | 0.873333 | 0.9 |
ORL | KSDA1 | 0.172412 | 0.448653 | 0.693167 | 0.868578 | 0.937414 |
ORL | KSDA2 | 0.172454 | 0.454899 | 0.681576 | 0.851755 | 0.930487 |
ORL | SDA | 0.173478 | 0.442731 | 0.687692 | 0.877294 | 0.948035 |
PIE | LRKSDA | 0.29 | 0.46 | 0.578243 | 0.701044 | 0.82734 |
PIE | KSDA1 | 0.195252 | 0.371027 | 0.541478 | 0.711166 | 0.827522 |
PIE | KSDA2 | 0.195345 | 0.37646 | 0.543015 | 0.712033 | 0.825519 |
PIE | SDA | 0.195707 | 0.387806 | 0.527715 | 0.725264 | 0.82658 |
Classification accuracy of different algorithms with varying percentages of labeled samples on the Musk, Seeds, and SPECT Heart databases.
Database | Algorithm | 10% | 20% | 30% | 40% | 50%
---|---|---|---|---|---|---
Musk | LRKSDA | 0.767778 | 0.827083 | 0.838095 | 0.838889 | 0.895125 |
Musk | KSDA1 | 0.356299 | 0.592578 | 0.756849 | 0.83253 | 0.883174 |
Musk | KSDA2 | 0.418741 | 0.607271 | 0.755128 | 0.817146 | 0.888439 |
Musk | SDA | 0.352444 | 0.611676 | 0.757407 | 0.840006 | 0.894315 |
Seeds | LRKSDA | 0.890323 | 0.893333 | 0.90625 | 0.90813 | 0.929608 |
Seeds | KSDA1 | 0.46676 | 0.654862 | 0.825 | 0.874946 | 0.914757 |
Seeds | KSDA2 | 0.410025 | 0.609322 | 0.709814 | 0.845034 | 0.890559 |
Seeds | SDA | 0.503879 | 0.725435 | 0.819122 | 0.872932 | 0.929595 |
SPECT Heart | LRKSDA | 0.778992 | 0.778378 | 0.776216 | 0.78038 | 0.826076 |
SPECT Heart | KSDA1 | 0.404513 | 0.622916 | 0.683333 | 0.786381 | 0.85786 |
SPECT Heart | KSDA2 | 0.398401 | 0.608538 | 0.702989 | 0.77983 | 0.869786 |
SPECT Heart | SDA | 0.364647 | 0.556669 | 0.69857 | 0.759696 | 0.818995 |
In most cases, our proposed low-rank kernel-based SDA algorithm achieves the best results and is robust to variations in the label percentage. Some of the compared algorithms are not as robust as our LRKSDA algorithm: their classification accuracy is very poor when the label rate is low. Thus, our proposed method has a clear advantage over the traditional KSDA and SDA algorithms. These traditional methods may achieve good performance on some databases when the label rate is high enough, but they are not as stable as our proposed algorithm. Since labeled data are very expensive and difficult to obtain, our proposed algorithm is more robust and better suited to real-world data.
As mentioned in the previous part, since the low-rank kernel method obtains the kernel matrix in a parameter-free way, it is robust for different kinds of data. Traditional kernels such as the Gaussian radial basis function kernel and the polynomial kernel, by contrast, cannot obtain a good representation of the original data set if the data structure does not fit the fixed kernel parameters they use. Therefore, the low-rank kernel method is much more stable across all the data sets we use. Furthermore, the low-rank representation jointly obtains the representation of all the samples under a global low-rank constraint, which captures the global data structure, so it remains robust to label percentage variations even when the label rate is low.
In this test we compare the performance of the different algorithms in noisy environments. The Extended Yale Face Database B and the Musk database are randomly selected for this experiment. Gaussian white noise, “salt and pepper” noise, and multiplicative noise are added to the data, respectively. The Gaussian white noise has mean 0 and variances ranging from 0 to 0.1. The “salt and pepper” noise is added to the images with noise densities from 0 to 0.1. And the multiplicative noise is added to the data
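A minimal sketch of how such corruptions can be generated with NumPy. It follows MATLAB's imnoise conventions for 'gaussian', 'salt & pepper', and 'speckle' noise; any detail of the noise model beyond what the text above states is an assumption:

```python
import numpy as np

def add_gaussian(X, var, rng):
    """Additive Gaussian white noise with mean 0 and the given variance."""
    return X + rng.normal(0.0, np.sqrt(var), size=X.shape)

def add_salt_pepper(X, density, rng, lo=0.0, hi=1.0):
    """Set a `density` fraction of entries to lo or hi with equal probability."""
    Y = X.copy()
    mask = rng.random(X.shape) < density
    Y[mask] = rng.choice([lo, hi], size=mask.sum())
    return Y

def add_multiplicative(X, var, rng):
    """Speckle-style noise X + n*X, with n uniform, mean 0, variance `var`."""
    half_width = np.sqrt(3.0 * var)       # Var of U(-a, a) is a^2 / 3
    n = rng.uniform(-half_width, half_width, size=X.shape)
    return X + n * X

rng = np.random.default_rng(0)
X = rng.random((32, 32))                  # toy "image"
noisy = add_salt_pepper(X, density=0.04, rng=rng)
```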
Classification accuracy of different algorithms on the Yale B database under varying noise levels (variance for Gaussian and multiplicative noise, density for “salt and pepper” noise).
Noise type | Algorithm | 0 | 0.02 | 0.04 | 0.06 | 0.08 | 0.1
---|---|---|---|---|---|---|---
Gaussian | LRKSDA | 0.825769 | 0.816429 | 0.814286 | 0.807857 | 0.808214 | 0.807143 |
Gaussian | KSDA1 | 0.691392 | 0.555408 | 0.565422 | 0.562556 | 0.574249 | 0.579816 |
Gaussian | KSDA2 | 0.723549 | 0.585366 | 0.597015 | 0.602456 | 0.590576 | 0.59866 |
Gaussian | SDA | 0.668397 | 0.543879 | 0.540266 | 0.542199 | 0.541947 | 0.543264 |
“Salt and pepper” | LRKSDA | 0.825769 | 0.794643 | 0.7675 | 0.711786 | 0.643929 | 0.599286 |
“Salt and pepper” | KSDA1 | 0.691392 | 0.56888 | 0.509246 | 0.474557 | 0.450803 | 0.436003 |
“Salt and pepper” | KSDA2 | 0.723549 | 0.59498 | 0.522505 | 0.478096 | 0.446308 | 0.43581 |
“Salt and pepper” | SDA | 0.668397 | 0.553305 | 0.498777 | 0.468533 | 0.452681 | 0.429647 |
Multiplicative | LRKSDA | 0.825769 | 0.825357 | 0.821429 | 0.82 | 0.814286 | 0.793929 |
Multiplicative | KSDA1 | 0.691392 | 0.631297 | 0.619849 | 0.594597 | 0.584588 | 0.576168 |
Multiplicative | KSDA2 | 0.723549 | 0.641188 | 0.622062 | 0.616446 | 0.594529 | 0.594516 |
Multiplicative | SDA | 0.668397 | 0.594035 | 0.588897 | 0.58513 | 0.582225 | 0.556328 |
Classification accuracy of different algorithms on the Musk database under varying noise levels (variance for Gaussian and multiplicative noise, density for “salt and pepper” noise).
Noise type | Algorithm | 0 | 0.02 | 0.04 | 0.06 | 0.08 | 0.1
---|---|---|---|---|---|---|---
Gaussian | LRKSDA | 0.838095 | 0.783333 | 0.810476 | 0.795238 | 0.789524 | 0.777619 |
Gaussian | KSDA1 | 0.756849 | 0.689112 | 0.705138 | 0.702206 | 0.699312 | 0.710083 |
Gaussian | KSDA2 | 0.755128 | 0.705054 | 0.695936 | 0.697523 | 0.695125 | 0.70306 |
Gaussian | SDA | 0.757407 | 0.713289 | 0.699202 | 0.714286 | 0.676558 | 0.681785 |
“Salt and pepper” | LRKSDA | 0.838095 | 0.785238 | 0.771429 | 0.772143 | 0.766429 | 0.761905 |
“Salt and pepper” | KSDA1 | 0.756849 | 0.683009 | 0.667079 | 0.656237 | 0.66388 | 0.653854 |
“Salt and pepper” | KSDA2 | 0.755128 | 0.705003 | 0.664427 | 0.658723 | 0.656934 | 0.652174 |
“Salt and pepper” | SDA | 0.757407 | 0.70503 | 0.697131 | 0.681818 | 0.678207 | 0.666734 |
Multiplicative | LRKSDA | 0.838095 | 0.832381 | 0.827143 | 0.809524 | 0.793333 | 0.784286 |
Multiplicative | KSDA1 | 0.756849 | 0.733777 | 0.723228 | 0.71144 | 0.716774 | 0.71115 |
Multiplicative | KSDA2 | 0.755128 | 0.737889 | 0.716812 | 0.710216 | 0.701506 | 0.68323 |
Multiplicative | SDA | 0.757407 | 0.749432 | 0.738486 | 0.726044 | 0.703764 | 0.68799 |
As we can see, our proposed low-rank kernel-based SDA algorithm always achieves the best results, which means that our method is stable under Gaussian, “salt and pepper”, and multiplicative noise. Owing to the robustness of the low-rank representation to noise, LRKSDA is much more robust than the other algorithms. As each kind of noise gradually increases, the performance of the traditional KSDA and SDA algorithms falls considerably, while the performance of our method drops only slightly.
Notice that the noise comes from a model other than the original data's subspaces. LRR solves the low-rank representation problem well; even when the data are corrupted by arbitrary errors, LRR can approximately recover the original data with theoretical guarantees. In other words, LRR is robust in an efficient way. Therefore, our method is much more robust than the other algorithms under the three kinds of noise mentioned above.
In this paper, we propose a novel low-rank kernel-based SDA (LRKSDA) algorithm, which largely improves the performance of KSDA and SDA. Since the low-rank representation is better at capturing the global data structure, the LRKSDA algorithm separates the different classes very well compared with other kernel SDA methods. Therefore, our proposed low-rank kernel method is extremely effective. Empirical studies on six real-world databases show that our proposed low-rank kernel-based SDA is robust and well suited to real-world applications.
Current affiliation for Baokai Zu is Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA.
The authors declare that they have no competing interests.
This work was supported by the National Natural Science Foundation of China (no. 51208168), Tianjin Natural Science Foundation (no. 13JCYBJC37700), Hebei Province Natural Science Foundation (no. E2016202341), Hebei Province Natural Science Foundation (no. F2013202254 and no. F2013202102), and Hebei Province Foundation for Returned Scholars (no. C2012003038).