Robust Semi-Supervised Manifold Learning Algorithm for Classification

In recent years, manifold learning methods have been widely used in data classification to tackle the curse of dimensionality, since they can discover the potential intrinsic low-dimensional structures of high-dimensional data. Given partially labeled data, semi-supervised manifold learning algorithms have been proposed to predict the labels of the unlabeled points, taking the label information into account. However, these semi-supervised manifold learning algorithms are not robust against noisy points, especially when the labeled data contain noise. In this paper, we propose a framework for robust semi-supervised manifold learning (RSSML) to address this problem. The noise levels of the labeled points are first estimated, and then a regularization term is constructed to reduce the impact of labeled points containing noise. A new robust semi-supervised optimization model is proposed by adding the regularization term to the traditional semi-supervised optimization model. Numerical experiments are given to show the improvement and efficiency of RSSML on noisy data sets.


Introduction
The problem of dimensionality reduction, that is, the transformation of high-dimensional data into meaningful low-dimensional features, has attracted much interest from researchers. Recently, there have been many research efforts on developing effective and efficient manifold learning algorithms that can discover the potential intrinsic low-dimensional structures of high-dimensional data. These algorithms include Isometric Mapping (ISOMAP) [1], Locally Linear Embedding (LLE) [2], Laplacian Eigenmaps (LE) [3], and Local Tangent Space Alignment (LTSA) [4].
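As an illustration only (not part of the paper), these classical algorithms are available in standard libraries; the following sketch embeds a noisy 3-D swiss roll into two dimensions with scikit-learn's LLE implementation. The data set and parameter values are arbitrary.

```python
# Illustrative sketch: unsupervised LLE on a synthetic swiss roll
# (assumes scikit-learn is installed; parameters are arbitrary).
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=500, noise=0.05, random_state=0)

# Embed the 3-D swiss roll into 2 dimensions using k = 12 neighbors.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
Y = lle.fit_transform(X)
print(Y.shape)  # (500, 2)
```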
The above classical manifold learning methods are all unsupervised learning algorithms; that is, they do not consider prior information. In many applications, we can obtain some prior information about the input data. For example, in a classification problem, the class labels of some of the data can be obtained. By incorporating prior information in the form of low-dimensional coordinates of certain sample points, the classical manifold learning methods can be extended to semi-supervised manifold learning methods [5], which yield low-dimensional coordinates that bear the same meaning as the prior information.
However, these unsupervised and semi-supervised methods may have limited efficiency on real-world data, due to large noise or distortions in the data. In practice, each dimensionality reduction method requires certain assumptions on the data manifold to guarantee its expected efficiency. For example, ISOMAP needs a convex embedding domain of the manifold or a relatively uniform data distribution for estimating geodesic distances. LTSA requires neighbor sets that can approximately recover the local tangent spaces. In LLE, the local geometric structure of the manifold should be well determined via local combinations of data neighborhoods.
There have been some efforts to improve the original algorithms. One line of work preprocesses the data set before applying the methods, without any modification of the algorithms. Smoothing the data set by weighted SVD (or, equivalently, weighted PCA) to reduce data noise before performing LTSA is suggested in [6]. In [7], the outliers are first detected by histogram analysis of the neighborhood distances of each point, and the locally smoothed values of the data are then computed using the linear errors-in-variables (EIV) model.
A fast outlier detection method for high-dimensional data sets is proposed in [8]; it also employs a local smoothing method and introduces a weighted global function to further reduce the undesirable effect of outliers and noise on the embedding results. These algorithms can also be improved by adaptively selected neighborhoods [9]. The other line of work adjusts details of the algorithms themselves. For example, multiple local weight vectors are used to solidify the structures determined by neighborhoods in [10]. In [11], the influence of noisy points on the reconstruction is greatly reduced by solving a new local optimization model. In [12], a robust DLPP version based on L1-norm maximization is proposed. In [13], short-circuit errors are reduced by automatically selecting the right number and positions of landmarks. A robust version of LTSA is proposed in [14] to further reduce the influence of noise on the embedding results by assigning different weights to clean and noisy data points in the local alignment errors. In [15], an out-of-sample extension framework for a global manifold learning algorithm (ISOMAP) is proposed that uses temporal information in out-of-sample points to make the embedding more robust to noise and artifacts.
Although the improved manifold learning algorithms are more robust against noise than the original algorithms, little work has been done on the semi-supervised algorithms [16]. In fact, the undesirable effect caused by noise is more complicated in the semi-supervised problem. Firstly, it is difficult to accurately explore the local geometric structures when the local neighborhoods contain noisy points. Secondly, the provided prior information may be inexact for noisy points, and the low-dimensional coordinates constructed from the inexact prior information may be far from the real on-manifold coordinates of the sample points. The first issue can be addressed by constructing noise-free neighbor sets [7-9] or by constructing robust local geometric structures of the noisy neighbor sets [10, 11, 17]; we do not pursue it further in this paper.
We focus on the second issue in this paper. We estimate the noise levels of the sample points, which reflect the confidence levels in the prior information, and then construct a new semi-supervised optimization model to reduce the undesirable effect of inexact prior information with low confidence levels. A framework for robust semi-supervised manifold learning (RSSML) is proposed by solving the new semi-supervised optimization model.
The rest of this paper is organized as follows. In Section 2, we give a brief review of semi-supervised manifold learning. In Section 3, we show how to extend the semi-supervised manifold learning algorithms so that they can handle inexact prior information for noisy points, and present the framework for robust semi-supervised manifold learning (RSSML). In Section 4, we give numerical experiments to show the effectiveness of RSSML.

A Brief Review of Semi-Supervised Manifold Learning
Our work is an extension of semi-supervised manifold learning (SSML). In this section, we give a brief review of SSML [5]. Assume that we are given a data set $X = \{x_1, \ldots, x_N\}$ (possibly with noise) sampled from a $d$-dimensional manifold. Without loss of generality, suppose that the prior information of the first $n$ points is known, and denote by $T_1 = [t_1^*, \ldots, t_n^*] \in \mathbb{R}^{d \times n}$ the low-dimensional coordinates constructed from the prior information. The goal of SSML is to calculate the unknown low-dimensional coordinates $T_2 = [t_{n+1}, \ldots, t_N] \in \mathbb{R}^{d \times (N-n)}$. SSML proceeds in the following steps.
Step 1 (determining neighborhoods). For each point $x_i$, determine its neighbor set $N_i$, for example, by searching its $k$-nearest neighbors.

Step 2 (extracting local geometry). The local geometry of the determined neighbor set can be extracted by solving the classical local optimization models [1-4]. Take LLE as an example; the local geometry is characterized by the linear combination coefficients $w_{ij}$, which can be computed by minimizing the least squares model
$$\min_{w_{ij}} \Big\| x_i - \sum_{x_j \in N_i} w_{ij} x_j \Big\|^2 \quad \text{s.t.} \quad \sum_{x_j \in N_i} w_{ij} = 1. \quad (1)$$

Step 3 (constructing the semi-supervised optimization model).
In the unsupervised manifold learning algorithms, the global low-dimensional coordinates are calculated by minimizing embedding cost functions that preserve the extracted local geometries. For example, in LLE, the low-dimensional coordinates $t_i$ ($i = 1, \ldots, N$) are computed by minimizing the embedding cost function
$$\min_{T} \sum_{i=1}^{N} \Big\| t_i - \sum_{j} w_{ij} t_j \Big\|^2. \quad (2)$$
Different from the embedding cost functions in the unsupervised manifold learning algorithms, a regularization term concerning the prior information is added so that the low-dimensional coordinates obey the prior information. In semi-supervised LLE, the low-dimensional coordinates are obtained by minimizing the semi-supervised optimization model
$$\min_{T} \sum_{i=1}^{N} \Big\| t_i - \sum_{j} w_{ij} t_j \Big\|^2 + \lambda \sum_{i=1}^{n} \| t_i - t_i^* \|^2, \quad (3)$$
where $\lambda$ is the regularization parameter that reflects the confidence level in the prior information.
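The semi-supervised LLE model above is quadratic in the coordinates, so setting its gradient to zero gives a linear system. The following NumPy sketch is our illustration (function and variable names are ours, not the paper's), assuming a precomputed LLE weight matrix $W$:

```python
# Minimal sketch of semi-supervised LLE: minimize
#   sum_i ||t_i - sum_j w_ij t_j||^2 + lam * sum_{labeled} ||t_i - t_i*||^2,
# whose stationarity condition is (M + lam*D) T = lam * D * Tbar,
# with M = (I - W)^T (I - W). Names here are illustrative.
import numpy as np

def semi_supervised_lle(W, prior, labeled_idx, lam):
    """W: (N, N) LLE weight matrix; prior: (n, d) known coordinates
    of the points in labeled_idx; lam: regularization parameter."""
    N = W.shape[0]
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    D = np.zeros((N, N))
    D[labeled_idx, labeled_idx] = 1.0        # indicator of labeled points
    Tbar = np.zeros((N, prior.shape[1]))
    Tbar[labeled_idx] = prior                # prior coordinates, zero elsewhere
    # Solve the stationarity condition for the full coordinate matrix T.
    T = np.linalg.solve(M + lam * D, lam * D @ Tbar)
    return T  # rows in labeled_idx stay close to the prior for large lam
```

For large $\lambda$ the labeled rows of the solution approach the prior coordinates, matching the limiting behavior discussed below.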
Semi-supervised manifold learning has been widely used in many real-life applications such as face recognition [18], remote sensing image classification [19], object tracking [5], and data visualization [18]. Figure 1 illustrates some of these applications. The available prior information and the computed embedding coordinates differ across applications. For example, in face recognition and remote sensing image classification, the known low-dimensional coordinates are constructed according to the labels of the training data points, and SSML aims to predict the labels of the remaining data points using the known low-dimensional coordinates (see Remark 2 in the next section for the way of constructing low-dimensional coordinates and predicting the labels). In object tracking, SSML aims to recover the real locations of the object, using the given locations of the object in certain frames. In data visualization, SSML projects the data points into a 2-dimensional or 3-dimensional embedding space to discover the hidden relations among the data points, using the given 2-dimensional or 3-dimensional coordinates of the training points.
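For the classification applications, one common way to turn labels into prior coordinates is to use class-indicator vectors; the sketch below is only an illustration of that idea (the paper's exact construction is given in its Remark 2 and may differ):

```python
# Illustrative only: map class labels to indicator-style prior
# coordinates and read predicted labels back from embedding rows.
# This is a common convention, not necessarily the paper's Remark 2.
import numpy as np

def labels_to_coords(labels, n_classes):
    # Map label c to the c-th standard basis vector in R^{n_classes}.
    return np.eye(n_classes)[labels]

def coords_to_labels(T):
    # Predict the class as the coordinate axis with the largest value.
    return np.argmax(T, axis=1)

coords = labels_to_coords(np.array([0, 2, 1]), n_classes=3)
print(coords_to_labels(coords))  # [0 2 1]
```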
Generally, semi-supervised manifold learning algorithms work well if the data sets are well sampled from a manifold.

Mathematical Problems in Engineering
In that situation, $\lambda \to \infty$, and the minimization of (3) forces the coordinates of the labeled points to coincide with the prior information. When the data sets contain noise, however, the effectiveness of SSML decreases significantly. This is because the noise level of each sample point may be different. For sample points containing large noise, the prior information may not be trustworthy, and the regularization parameter $\lambda$ should be small. For points containing only small noise, we are confident in the provided prior information, and $\lambda$ should be large. A fixed $\lambda$ cannot reflect the different confidence levels in the prior information of each point. It is therefore desirable to construct a semi-supervised optimization model that is robust against noise.

Robust Semi-Supervised Manifold Learning Algorithm
In this section, a framework for robust semi-supervised manifold learning algorithm (RSSML) will be proposed.
Since the prior information may be inexact for the noisy points, the major problem is how to deal with inexact prior information according to the different noise levels of the sample points. Note that the noise levels of the sample points are generally unknown, so it is desirable to measure them before proposing the robust semi-supervised optimization model.

3.1. Measuring the Noise Level. Recently, some work has been done on measuring the noise levels of sample points [20-23]. In this paper, we measure the noise levels by the outlier detection algorithm based on reconstruction weights (ODBRW), owing to its small computational cost, low parameter requirement, and high effectiveness [20]. The ODBRW method is applied only to the training points $x_1, \ldots, x_n$ and consists of the following steps.
Step 1 (constructing the edge point sets). First, search the $k$-nearest neighbors (KNN) of each $x_i$, $i = 1, \ldots, n$, and determine the neighbor set $N_i = \{x_{i_1}, \ldots, x_{i_k}\}$. Then select the edge points from $N_i$, which requires that the angles between any adjacent edges $\langle x_i, x_{i_j} \rangle$ and $\langle x, x_{i_j} \rangle$ be acute or right. If the angle between the adjacent edges $\langle x_i, x_{i_j} \rangle$ and $\langle x, x_{i_j} \rangle$ is obtuse, it means that $x_{i_j}$ separates $x_i$ and $x$. A point $x$ is said to be an edge point of $x_i$ if there is no other neighbor $x_{i_j}$ that separates $x_i$ and $x$. The determined edge point sets are very robust to the neighborhood size $k$. See an illustrative example of an edge point set in Figure 2. More explanations about edge point sets can be found in [24].
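The acute/obtuse test above can be sketched directly with dot products. The function below is our illustration (names are ours), assuming "separates" means the angle at the potential separator is obtuse; it reproduces the configuration of Figure 2.

```python
# Sketch of the edge-point test in Step 1, assuming neighbor s separates
# x and candidate c when the angle at s between (x - s) and (c - s) is
# obtuse, i.e. their dot product is negative.
import numpy as np

def edge_points(x, neighbors):
    """Return the indices of neighbors that are edge points of x."""
    edges = []
    for j, c in enumerate(neighbors):          # candidate edge point c
        separated = False
        for l, s in enumerate(neighbors):      # potential separator s
            if l != j and np.dot(x - s, c - s) < 0:
                separated = True               # angle at s is obtuse
                break
        if not separated:
            edges.append(j)
    return edges
```

On the Figure 2 configuration (x at the origin with neighbors at (1, 0), (0, 1), (0, 2), and (-1, 0)), this test marks the first, second, and fourth neighbors as edge points, since the neighbor at (0, 1) separates x from (0, 2).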
Step 2 (calculating reconstruction weights). The reconstruction weights $h_i$ of the edge point set $\mathrm{EP}(x_i)$ are computed by solving the constrained least squares problem
$$\min_{h_i} \| x_i - \mathrm{EP}(x_i)\, h_i \|^2 \quad \text{s.t.} \quad \mathbf{1}_{m_i}^T h_i = 1,$$
where $h_i, \mathbf{1}_{m_i} \in \mathbb{R}^{m_i}$, $m_i$ is the number of edge points of $x_i$, and $\mathbf{1}_{m_i}$ is a vector of all ones.
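Assuming the weights come from an LLE-style sum-to-one constrained least squares over the edge point set (the exact model of [20] may differ, e.g. in its regularization), a sketch:

```python
# Hedged sketch of Step 2: sum-to-one reconstruction weights of x over
# its edge point set E, solved via the local Gram matrix as in LLE.
# The regularization term is our addition to handle singular G.
import numpy as np

def reconstruction_weights(x, E, reg=1e-3):
    """x: (d,) point; E: (d, m) edge points stored as columns."""
    G = (E - x[:, None]).T @ (E - x[:, None])    # local Gram matrix
    G += reg * np.trace(G) * np.eye(G.shape[0])  # regularize if singular
    h = np.linalg.solve(G, np.ones(G.shape[0]))
    return h / h.sum()                           # enforce 1^T h = 1
```

For example, a point midway between two edge points receives weights close to one half for each.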
Step 3 (measuring the noise levels). Form an $n \times n$ matrix $H$ from the weight vectors $h_i$, $i = 1, \ldots, n$, placing the entries of $h_i$ in the positions indexed by $I_i$, the index set of $\mathrm{EP}(x_i)$, and zeros elsewhere. The noise level $s_i$ of $x_i$ can then be measured from $H$ as in [20].

3.2. Robust Semi-Supervised Optimization Model. It is shown in theory that the smaller $s_i$ is, the more likely it is that $x_i$ is an outlier [20]. A small $s_i$ means that the prior information on $x_i$ is not trustworthy; hence, we hope that the effect of this prior information on computing the embedding coordinates can be reduced. For a large $s_i$, the sample point $x_i$ tends to be a clean point, and we are more confident in its prior information.
Output. The low-dimensional coordinates $T$.
Step 2 (extracting local geometry). Extract the local geometry $\Phi$ by some local optimization model of manifold learning. For LLE, the matrix $\Phi$ is given by $\Phi_{ij} = \delta_{ij} - w_{ij} - w_{ji} + \sum_{k} w_{ki} w_{kj}$, with $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise.
Step 4 (embedding global coordinates). Compute the low-dimensional embedding coordinates $T$ by solving the linear system of equations (15).
Remark 1. Notice that many unsupervised learning methods can be extended to their semi-supervised versions by the proposed robust semi-supervised manifold learning (RSSML) algorithm. In this paper, we explore the local geometry by solving the least squares (LS) problem of LLE; the local geometry can also be explored by other local optimization methods such as RLLPE and LTSA. We call the resulting algorithms RSSML-LLE, RSSML-RLLPE, and RSSML-LTSA, respectively.

Experiment Results
To verify the effectiveness of the proposed RSSML algorithm on real-world data, we perform experiments on the CMU PIE data set [25], the Handwritten-Alpha data set [26], and HAND SHAPE [27]. For comparison, we apply the unsupervised methods LLE, RLLPE, and LTSA (see [2, 4, 11]), their semi-supervised versions, and the proposed robust semi-supervised versions on these data sets. In the three real-world examples, some noisy points are also added to test the robustness of RSSML to noisy data sets.
CMU PIE [25]. The original data set contains 11,560 samples of 68 individuals as 32 × 32 gray-scale images. In the experiment, we randomly selected 160 samples from each of 10 individuals (1,600 samples in total). Some samples are plotted in Figure 3.
Handwritten-Alpha [26]. The data set (HW-Alpha) is extracted from the "binaryalphadigs" data set. It consists of 936 images of 26 handwritten alphabets; each class has 36 images of size 20 × 16.
HAND SHAPE [27]. The original data set (Cambridge Hand Gesture Data) contains 9 gesture classes in 320 × 240 gray-scale images, defined by 3 primitive hand shapes and 3 primitive motions. In this experiment, the target task is to classify different hand shapes; therefore, the final data set is divided into five groups (see Figure 4). We randomly selected 350 points of each group to form the experiment set.
In the experiments, the parameters of these methods are set as follows. All of the manifold learning algorithms involve a neighborhood size parameter; the neighborhood size $k$ is selected from 8 to 36. For the unsupervised methods LLE, RLLPE, and LTSA, the intrinsic dimension $d \in \{10, 14, 18, 22, 26, 30\}$ is tried. The regularization parameter $\lambda \in \{5, 50, 100, \ldots, 500\}$ is tried in our robust semi-supervised versions of LLE, LTSA, and RLLPE. We only report the best results.
To perform classification, the data sets are first projected onto a low-dimensional space by the unsupervised methods, and then the Nearest Feature Line (NFL) classifier (see [28, 29]) is applied to the low-dimensional embedding results for recognition. For the semi-supervised methods, the labels of the unlabeled points are predicted directly.
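For reference, a compact sketch of an NFL-style classifier, our illustration of the idea in [28, 29]: a query is assigned to the class whose feature line, through a pair of same-class prototypes, passes closest to it. It assumes at least two prototypes per class; all names are ours.

```python
# Sketch of Nearest Feature Line (NFL) classification: for each class,
# project the query onto the line through every pair of same-class
# prototypes and keep the class with the smallest residual distance.
import numpy as np
from itertools import combinations

def nfl_predict(q, protos, labels):
    """q: (d,) query; protos: (m, d) prototypes; labels: (m,) class ids."""
    best_cls, best_dist = None, np.inf
    for cls in np.unique(labels):
        pts = protos[labels == cls]
        for yi, yj in combinations(pts, 2):
            u = yj - yi
            # Orthogonal projection of q onto the line yi + t*u.
            t = np.dot(q - yi, u) / np.dot(u, u)
            dist = np.linalg.norm(q - (yi + t * u))
            if dist < best_dist:
                best_cls, best_dist = cls, dist
    return best_cls
```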
In the first experiment, some noisy points are added to the original three data sets. We randomly select 10% of the sample points from each data set to generate the noisy points. For each selected image, we randomly chose 1/6 of the pixels of the original image and changed their values from $v$ to $255 - v$. Some of the noisy images are shown in Figure 3. Half of the sample points are randomly selected as training samples and the remainder are used for testing. Table 1 lists the classification rates of the manifold learning methods on the three data sets. It is clear that the unsupervised methods are sensitive to noise, especially LTSA: when the data sets contain noisy points, LTSA may fail to find a reasonable local tangent space at each data point. With SS-LLE, SS-RLLPE, and SS-LTSA, the classification accuracies are improved markedly. RSSML-LLE, RSSML-RLLPE, and RSSML-LTSA outperform the semi-supervised manifold learning approaches in the experiments, which shows that the robustness to noisy points is improved by the proposed robust semi-supervised model.
To better compare the effectiveness of the above manifold learning algorithms on noisy data sets, we generate noisy data with different densities of Gaussian and reverse noise. We randomly select 10% of the samples from the CMU PIE data set to generate noisy images in two ways. One way is to randomly choose 1/6, 1/8, or 1/12 of the pixels of each selected image and invert their values from $v$ to $255 - v$. The other way is to add Gaussian noise to the selected images with different variances: 0.02, 0.05, or 0.1. The experimental results of the unsupervised, semi-supervised, and robust semi-supervised versions of the LLE, RLLPE, and LTSA methods on the six experimental sets are shown in Table 2.
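The two corruption schemes above can be sketched as follows (function names, the seed handling, and the assumption of 8-bit images on a [0, 255] scale are ours):

```python
# Sketch of the two noise models on an 8-bit image stored as a float
# NumPy array: invert a random fraction of pixels (v -> 255 - v), or
# add Gaussian noise with a given variance on the [0, 1] scale.
import numpy as np

def invert_pixels(img, frac=1/6, seed=0):
    rng = np.random.default_rng(seed)
    noisy = img.copy()
    mask = rng.random(img.shape) < frac   # select ~frac of the pixels
    noisy[mask] = 255 - noisy[mask]       # reverse noise: v -> 255 - v
    return noisy

def add_gaussian(img, var=0.05, seed=0):
    rng = np.random.default_rng(seed)
    noisy = img / 255.0 + rng.normal(0.0, np.sqrt(var), img.shape)
    return np.clip(noisy, 0.0, 1.0) * 255.0
```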
As can be seen in Table 2, SS-LLE, SS-RLLPE, and SS-LTSA are more sensitive to noisy points than RSSML (-LLE, -RLLPE, and -LTSA). Under different noise levels, the RSSML methods achieve higher classification accuracies than the other algorithms, which further shows that RSSML can better handle the noisy points in the experiments. We notice that the classification rates of RSSML are also higher than those of the other methods on the original data set. This is because the original data set may be contaminated by noise in the sampling process. The proposed RSSML methods can also

Figure 1: Some real-life applications of semi-supervised manifold learning. (a) Face recognition, (b) remote sensing image classification, (c) object tracking, and (d) data visualization.

Figure 2: An illustrative example of the edge point set. The neighbor set of $x$ determined by KNN is $\{x_1, x_2, x_3, x_4\}$, and the edge point set is $\{x_1, x_2, x_4\}$. $x_3$ is not an edge point of $x$ since $x_2$ separates $x$ and $x_3$; that is, the angle between the adjacent edges $\langle x, x_2 \rangle$ and $\langle x_3, x_2 \rangle$ is obtuse. The edge point set remains $\{x_1, x_2, x_4\}$ for different neighborhood sizes $k$ ($k \geq 4$).

Figure 3: Some CMU PIE images (original CMU PIE images and noisy CMU PIE images).