Robust L-Isomap with a Novel Landmark Selection Method

Isomap is a widely used nonlinear method for dimensionality reduction. Landmark-Isomap (L-Isomap) has been proposed to improve the scalability of Isomap. In this paper, we focus on two important issues that were not taken into account in L-Isomap, landmark point selection and topological stability. At first, we present a novel landmark point selectionmethod. It first uses a greedy strategy to select somepoints as landmark candidates and then removes the candidate points that are neighbours of other candidates. The remaining candidate points are the landmark points. The selection method can promote the computation efficiency without sacrificing accuracy. For the topological stability, we define edge density for each edge in the neighbourhood graph. According to the geometrical characteristic of the short-circuit edges, we provide a method to eliminate the short-circuit edge without breaking the data integrity.The approach that integrates L-Isomap with these two improvements is referred to as Robust L-Isomap (RL-Isomap). The effective performance of RL-Isomap is confirmed through several numerical experiments.


Introduction
Real-world data such as voice signals, gene microarray, or hyperspectral imagery data usually has a high dimensionality, which makes them difficult to analyze.This is known as the "curse of dimensionality" [1].In order to analyze and process the high-dimensional real-world data adequately, data dimensionality reduction has been attracting significant interest.Generally, real-world data is found to lie on a lowdimensional manifold embedded in the high-dimensional observation space.Dimensionality reduction is the transformation of high-dimensional data into a meaningful representation of reduced dimensionality.It can remove redundant information from the original data and alleviate the "curse of dimensionality" problem.
In the last few decades, many dimensionality reduction algorithms have been proposed.Principal component analysis (PCA) [2] and multidimensional scaling (MDS) [3] are traditionally linear methods.However, these linear methods fail to deal with complex nonlinear data.Currently, a number of nonlinear dimensionality reduction methods are available, for example, isometric feature mapping (Isomap) [4], local linear embedding (LLE) [5], and local tangent space alignment (LTSA) [6].Among these methods, Isomap is representative of global manifold learning methods, which attempts to preserve global geometrical features of data set in the embedding space.It substitutes Euclidean distance in MDS for geodesic distance.Hindered by MDS and geodesic distance computation, Isomap would become very timeconsuming as the amount of the input data increases.To improve the scalability of Isomap, L-Isomap was proposed by Silva and Tenebaum [7] in which the time-consuming computation was performed on a subset of points referred to as landmark points.However, two main problems still exist with L-Isomap.
The first problem is how to select landmarks for L-Isomap.So far, several landmark selection methods have been proposed.In [7], landmarks are selected randomly from the given data.An interesting approach called Fast-Isomap based on integer optimization was proposed in [8].In [9], landmarks are chosen from an approximate central part of the given data.In this paper, we propose a novel method for landmark point selection.First, the method selects some points as candidates using a greedy strategy.After that, it obsoletes the candidate points that are neighbours of other candidate points.The remaining candidate points are determined as the landmark points.Our method is proven to be more efficient without sacrificing accuracy by experiments.
Another critical problem of L-Isomap algorithm is its topological instability [10], which is caused by short-circuit edges [11].In order to compute the geodesic distance, a neighbourhood graph is constructed by connecting every data point with its -nearest neighbours.The short-circuiting problem occurs when an oversized neighbourhood is chosen so that the neighbourhood distance is larger than the distance between the folds in the manifolds.Short-circuit edges severely damage the manifold structure.Thus even a single short-circuit edge may produce many errors when computing geodesic distance which in turn severely impairs the performance of L-Isomap.A simple way to avoid shortcircuit edges is to decrease the neighbourhood size.But a very small neighbourhood size will fragment the manifold into a large number of disconnected regions [10].Thus, choosing an appropriate neighbourhood size requires a priori information about the global geometry of the high-dimensional manifold.However, it is difficult to know the information beforehand [4,10].H. Choi and S. Choi [12] solve the short-circuiting problem by removing data points with extremely large total flows when computing the shortest path.In this paper, we define edge density for every edge in the neighbourhood graph using multivariate kernel density estimation (KDE) and propose a method to make L-Isomap robust to shortcircuit edge.Different from the former method which tries to delete abnormal points, our method aims at short-circuit edge itself.According to the geometric feature of the shortcircuit edge that the areas which short-circuit edges go through have very few data points, the edge which has extraordinarily low edge density is claimed as the shortcircuit edge and our method removes such edges from the neighbourhood graph.Numerical experiments on several data sets demonstrate the effectiveness of our method.
This paper is organized as follows.Section 2 briefly reviews the Isomap algorithm and L-Isomap algorithm.A novel landmark selection method is given in Section 3. Section 4 presents a method of eliminating short-circuit edges.Experiment results on several data sets are given in Section 5, in order to validate the useful performance of RL-Isomap.Finally, conclusions and future extensions are discussed in Section 6.

Isomap and L-Isomap
For the given original input data  ∈  × with  samples and  dimensions, Isomap attempts to embed those samples into a lower dimensional space  ∈  × , while preserving the geodesic distance between all the input points as faithfully as possible.Isomap first constructs a weighted undirected neighbourhood graph  = (, ), where each node V  ∈ , corresponding to the point   ∈ , is connected with its -nearest neighbours and each edge   ∈  is assigned weight   that represents the Euclidean distance between points   and   .Second, Isomap computes the shortest path between every two points in the graph to approximate the geodesic distance    using Dijkstra's or Floyd's shortest-path algorithm [13,14].The geodesic distance between all the data points forms the geodesic distance matrix   .Finally, Isomap applies MDS on matrix   to find the low-dimensional embedding.
So far, Isomap has been successfully applied in many different fields.Unfortunately, when the amount of input data, , is too large, Isomap may become too time-consuming in terms of the shortest-path construction (( 2 log )) and the MDS eigenvalue decomposition (( 3 )).In order to speed up these two computations, L-Isomap is proposed.L-Isomap randomly selects  points from , denoted as landmark points.Instead of computing the shortest path between all data points, L-Isomap only computes the shortest path from each data point to the landmark points.Then classical MDS is applied to the resulting  ×  geodesic distance matrix to find the low-dimensional embedding of the landmark points.The embeddings of the remaining points are obtained by a fixed linear transformation of their geodesic distance to the landmark points.This way the time complexities of the shortest path and the MDS computation are, respectively, reduced to ( log ) and ( 2 ).

Landmark Candidate.
A very important procedure for L-Isomap is to build the -nearest neighbourhood graph.However, many neighbourhoods are similar because they share common points.Thus, some neighbourhoods can be deleted to get a simpler neighbourhood graph.Based on this idea, we select the candidate landmark points by simplifying neighbourhood graph.Formally, let Ω = { 1 ,  2 , . . .,   } be the neighbourhood set, one for each point   ∈ , where ⋃  =1   = , and each set   ∈ Ω is assigned a nonnegative cost (  ).The goal is to find a set cover Once the cover Ω * is obtained, the corresponding landmark candidates are determined.The problem can be resolved by a greedy strategy.Let    be the number of uncovered points in   and the ratio of   is    =    /(  ) which counts the number of points covered by   per unit cost.The probability of including   in Ω * increases with the ratio    .The landmark candidate selection problem is an unweighted case, where (  ) = 1.A sketch of the landmark candidate selection method can be summarized as Algorithm 1.
Algorithm 1 can run in time ( log ) and it achieves an approximation ratio of (), where and  is the size of the largest set in Ω [15,16].We apply Algorithm 1 to get the neighbourhood subset   landmark candidate points to obtain the landmark points that are nonneighbouring to each other.Formally, let  = {  1 ,   2 , . . .,   V }, where    represents the set of points whose neighbourhood include    and  ∈ {1, 2, . . ., V}.The landmark point selection method can be summarized as Algorithm 2.
In order to test the effectiveness of our landmark selection method, we, respectively, run RL-Isomap and other three methods on Swiss roll data, where the amount of the input data varies from 500 to 7500.Tenebaum et al. [4] used residual variance to characterize how well the embedding result preserves the geometry features of the high-dimensional manifold.The smaller the residual variance value, the better the embedding result.As shown in Figure 1, EL-Isomap has the worst performance of the four methods in the experiment because it always tends to select landmarks which are situated around the circumcenter [9].The random scheme performs well but it is not stable enough because of its random strategy and unpredictability.Fast-Isomap has similar performance to RL-Isomap; however, the number of landmark points selected by RL-Isomap is much smaller than that of Fast-Isomap as shown in Figure 2(a).In this experiment, the landmark points selected by Fast-Isomap is averagely 48.55% more than RL-Isomap.As a result, RL-Isomap will be faster than Fast-Isomap when computing the low-dimensional embedding.As shown in Figure 2(b), RL-Isomap is averagely 33.56% faster than Fast-Isomap in the experiment.So RL-Isomap performs best in both speed and accuracy in this experiment.

Robust L-Isomap
4.1.Short-Circuit Edge Elimination.As pointed out in Section 1, L-Isomap faces topological instability problem because of the short-circuit edge.An oversized neighbourhood might result in short-circuit edges that destruct the manifold structure.As shown in Figure 3, the short-circuit edge, denoted by black solid line, directly links up two points which are supposed to be very far on the manifold.A simple way to avoid these short-circuit edges is to decrease the neighbourhood size; however too small neighbourhood will break the connectivity of the neighbourhood graph, as shown in Figure 4. Thus it is not an easy job to choose an appropriate neighbourhood size.The previous method tries to delete some outliers to mitigate the short-circuiting problem.But the method breaks the integrity of the data set.In this section, we present a novel method which directly deletes the shortcircuit edges in the neighbourhood graph.
As shown in Figure 3, the areas which short-circuit edges go through usually have very sparse data points.Based on this fact, we claim that the edge that goes through the area which has extremely low data density is a short-circuit edge.In order to quantify the data density of the area which the edges go through, we introduce edge density for each edge in the neighbourhood graph using KDE method.
KDE is a nonparametric tool for estimating the distribution of data [18].The multivariate KDE is a direct extension of the univariate estimator.For the -variate random sample  1 ,  2 , . . .,   having density , the -dimensional KDE is defined as follows: where  ∈ R  ,  is a symmetric positive  ×  matrix called the bandwidth matrix, and where || is the determinant of  and  is -dimensional kernel function that satisfies and ∫ is shorthand for ∫ ⋅ ⋅ ⋅ ∫ R  .However, KDE is linear method, such that it cannot be applied directly on complex nonlinear data.In mathematics, the manifold is a topological space that locally resembles Euclidean space near each point.More precisely, each point of a -dimensional manifold has a neighbourhood that is homeomorphic to the -dimensional Euclidean space [5].Therefore, the points within the same neighbourhood can be approximately seen to have linear structure.For the input data  ∈  × , L-Isomap uses -NN rule to build the neighbourhood graph  with edge set  = {  |  > ,   ∈   or   ∈   }, where   is the neighbourhood of point   and   denotes the edge connecting points   and   .Let F  = {  } ∪   and, according to the local linearity properties of the manifold, data points in F  have nearly linear structure.This way, we define the point density of the manifold at point   as where |F  | denotes the number of points included in F  .Given an edge   , the quartiles of   can be calculated as follows: After that, we have Then let We define edge density of each edge   in  as follows: = q max ( ĝ (  ; ) , ĝ (  ; )) .
(1) Define a neighbourhood graph by using -NN rule.
(2) Eliminate the edges whose   is below .
(5) Apply LMDS to find a low-dimensional embedding.
In this paper, we take the standard -variate normal density, () = (2) −/2 exp(−(1/2)  ) as the kernel function.The choice of bandwidth matrix  is intractable in KDE [19].In the univariate case, too big values of ℎ will make the estimate too smooth and may not uncover the structural features, whereas small values of bandwidth ℎ yield "wiggly" estimate and show spurious features [20].In the multivariate case, the choice of the bandwidth matrix  faces the same dilemma.Fortunately, our experiments show that the choice of  has little effect on the results because the edge density values of short-circuit edges are much lower than that of the normal edges.For simplicity, we take the unit matrix  as bandwidth matrix.
We compute the edge density for every edge in  by ( 9) and have  = {  |  > ,   ∈   or   ∈   }.As mentioned above, short-circuit edges have extremely low edge density.In such case, if   is less than a certain threshold, we believe that the corresponding edge   is a short-circuit edge and remove it from the neighbourhood graph.For example, edge density values for Swiss roll data are illustrated for two different sizes of neighbourhoods, where  = 10 and  = 15 in Figure 5.In Figure 5(c), there are a few short-circuit edges in the neighbourhood graph where  = 15.In this case, a few edges have extremely low edge density values shown in Figure 5(d).While, for the case of  = 10, the neighbourhood graph in Figure 5(a) is very healthy and the corresponding edge density values are well-distributed as shown in Figure 5(b).From Figure 5, it is clear that the edge density values of shortcircuit edges are much lower than that of normal edges.In order to determine the threshold, we sort all the elements in  in ascending order and use   which has the maximum increment in the first half of the sequence as our threshold.The threshold  can be obtained adaptively as follows.
(1) Arrange all elements in  in ascending order and the first half of the sequence is The algorithm that integrates L-Isomap with the landmark selection method and the short-circuit edge elimination method is called RL-Isomap.The main procedure of RL-Isomap is summarized as Algorithm 3.
The previous method proposed by H. Choi and S. Choi [12] must recompute the neighbourhood graph because a few points which have extremely high total flows have been eliminated from the original input data set.On the contrary, RL-Isomap directly eliminates the short-circuit edges from the neighbourhood graph and will not destroy the data's integrity.

Complexity of RL-Isomap. RL-Isomap algorithm runs in
where ( 2 ) is the complexity of constructing the neighbourhood graph of all input data.Then, we compute the edge density for the neighbourhood graph to eliminate the short-circuit edges and its time complexity is ( 2 ).Next term ( log ) is the complexity using Algorithm 1 to get the landmark candidate points.Algorithm 2, determining the landmark points, runs in (V 2 ) where V is the number of the candidate points.The fifth term ( log ) is the complexity of computing the geodesic distance using Dijkstra's algorithm.The last term ( 2 ) is the complexity of computing the embedding of the input data.

Numerical Experiments
In this section, we conduct several experiments on different data sets: (1) Swiss roll data; (2) Toroidal Helix data; (3) face image data, to test RL-Isomap.After that, we use RL-Isomap to analyze the nonlinear structure of the high-dimensional Internet traffic matrix.

Experiment on Swiss Roll Data.
In Figure 5(c), 1000 data points were used to build the neighbourhood graph where the neighbourhood size  = 15.In such case, there are several short-circuit edges appearing in the neighbourhood graph shown in Figure 5(c).First, we apply Robust Kernel Isomap [12] to the neighbourhood graph.After deleting two points which are considered as outliers by Robust Kernel Isomap, a new neighbourhood graph is shown in Figure 6(a) and there are still some short-circuit edges left.In such case, the embedding result by Robust Kernel Isomap is not correct shown in Figure 6(e).The distribution of edge density values for the neighbourhood in Figure 5(c) is shown in Figure 5(d) and it is clear that some values are extremely low.We then apply RL-Isomap to compute the threshold, marked by an orange line in Figure 5    The edge density values of the new neighbourhood distribute more evenly than that of the original neighbourhood graph (see Figure 8(b)).After that, we run Algorithm 1 on the new neighbourhood graph and get the landmark candidate points denoted by black solid point shown in Figure 7(e) and then Algorithm 2 is applied to the candidate points and the resulting landmark points, marked by red circles, are illustrated in Figure 7(f).Finally, RL-Isomap finds the correct two-dimensional embedding, shown in Figure 7(h), that faithfully preserves the geometric features of the original manifold.This experiment further validates the effectiveness of RL-Isomap.

Figure 1 :
Figure 1: Residual variance comparison of four methods.

Figure 5 :
Figure 5: Edge density of each edge in the neighbourhood graph for the case of Swiss roll data.The number of the data points is 1000.(a) The -nearest neighbourhood graph with  = 10.(b) The edge density values are well-distributed for the case of  = 10.(c) The -nearest neighbourhood graph with  = 15.There are some short-circuit edges in this case.(d) The orange line denotes the threshold  = 0.9858.Some edges have extremely low edge density values that are considered as short-circuit edges.
(d), where the threshold  = 0.9858 and the edges whose edge density values are below the threshold are deleted to get a new neighbourhood −0.8 −0.6 −0.4 −0

Figure 6 :
Figure 6: Comparison of RL-Isomap with Robust Kernel Isomap for the case of Swiss roll data.(a) Neighbourhood graph by Robust Kernel Isomap.(b) Neighbourhood graph after removing the edges whose edge density values are below the threshold by RL-Isomap.(c) The distribution of the edge density values of edges in the new neighbourhood graph.(d) The landmark candidate points and the landmark points.(e) Embedding result by Robust Kernel Isomap.(f) Embedding result by RL-Isomap.

Figure 7 :
Figure 7: Comparison of RL-Isomap with Robust Kernel Isomap for the case of Toroidal Helix data.(a) The Toroidal Helix data.(b) The -nearest neighbourhood graph with  = 25.(c) Neighbourhood graph constructed by Robust Kernel Isomap.(d) The new neighbourhood graph after removing the edges whose edge density values are below the threshold.(e) The landmark candidate points.(f) The landmark points.(g) Embedding result by Robust Kernel Isomap.(h) Embedding result by RL-Isomap.

5. 2 .
Experiment on Toroidal Helix Data.Toroidal Helix data is shown in Figure 7(a), with data number  = 1200.The neighbourhood graph is constructed with  = 25 and Figure 7(b) gives an overhead view of the neighbourhood graph and in such case there are a few short-circuit edges.First, we use Robust Kernel Isomap and 8 points are removed.After that, Robust Kernel Isomap reconstruct the neighbourhood graph, but there are still short-circuit edges existing, shown in Figure 7(c) and the corresponding embedding result in Figure 7(g) is not correct.Then we use (9) to compute edge density for each edge in the neighbourhood graph in Figure 7(b) and the values are illustrated in Figure 8(a) where the threshold  = 0.9313.The edges whose edge density is below the threshold are removed from the neighbourhood graph and then get a new neighbourhood graph without shortcircuit edge shown in Figure 7(d).

Figure 8 :Figure 9 :
Figure 8: The comparison of distribution of edge density value between the two neighbourhood graphs.

5. 3 .
Experiments on Face Data.In addition to the previous experiments on synthetic data set, we apply RL-Isomap on real-world data, face images.The input consists of 64-pixelby-64-pixel images of a face with different lighting directions and poses, represented by a sequence of 4096-dimensional vectors.Thus the data set actually is a 3-dimensional manifold embedded in the 4096-dimensional observational space.We apply L-Isomap and RL-Isomap, respectively, on the data to detect its intrinsic geometric structure, where  = 10.As shown in Figure9, representative faces are shown next to circled points in different parts of the space where each coordinate axis of the embedding result corresponds with one degree of freedom underlying the initial data: up-down pose and left-right pose.The grayscale of a 5 × 64 square bar below each representative face represents lighting direction.In Figure9(a), the embedding result of L-Isomap is not correct because some points representing left-pose are projected close to the points representing the right pose which is clearly caused by short-circuiting problem.In Figure9(b), RL-Isomap correctly finds the 3-dimensional embedding of the input data.5.4.Nonlinear Structure Analysis of Traffic Matrix by RL-Isomap.Internet traffic matrix has been a useful traffic data model to understand the Internet from the whole-network perspective, but Internet traffic matrix usually possesses highdimensional attributes[21].The experiments above have proved the effectiveness of RL-Isomap, so, in this part, we apply RL-Isomap to analyze the intrinsic nonlinear structures of the Internet traffic matrix.5.4.1.Traffic MatrixModelling.An Origin-Destination (OD) flow comprises all traffic originating from a given source and delivering to a given destination.Let  denote the number of all sources and destinations in a network.A traffic matrix is naturally represented by a three-dimensional, (2)If    = 0 for all  then stop: Ω * is the cover, where  = 1, 2, . . ., .Otherwise find a subscript   ∈ {1, 2, . . ., }, maximizing the ratio     /(   ) and proceed to Step (3).(3) Add    to Ω * , replace each   by   −    and return to Step (1).