Improving computational efficiency and extending representational capability are two of the most active topics in global manifold learning. In this paper, a new method called extensive landmark Isomap (EL-Isomap) is presented that addresses both topics simultaneously. On one hand, since it originates from landmark Isomap (L-Isomap), which is known for its high computational efficiency, EL-Isomap also achieves high computational efficiency by utilizing a small set of landmarks to embed all data points. On the other hand, EL-Isomap significantly extends the representational capability of L-Isomap and other global manifold learning approaches by utilizing only an available subset of the whole landmark set, instead of all landmarks, to embed each point. In particular, compared with other manifold learning approaches, the new method more successfully unwraps data manifolds with intrinsic low-dimensional concave topologies and essential loops, as shown by simulation results on a series of synthetic and real-world data sets. Moreover, the accuracy, robustness, and computational complexity of EL-Isomap are analyzed, and the relation between EL-Isomap and L-Isomap is also discussed theoretically.

Nonlinear dimensionality reduction (NLDR) is an attractive topic in many scientific fields [

Compared with the local approaches, the global approaches give more faithful global geometric representations and rest on more interpretable metric-preserving construction principles. Yet they mainly lose on two points [

Corresponding to the above two points, two topics have recently attracted increasing attention: improving the computational speed and extending the applicable range of the global approaches, such that both performances could be comparable to, or even exceed, those of the local approaches. Recently, both topics have been developed to a certain extent independently. The most typical work addressing the first topic is landmark Isomap (L-Isomap [

The main purpose of this paper is to present a new method that addresses both topics simultaneously. Similar to L-Isomap, the new method utilizes a specific landmark set to embed new input data; it can therefore be seen as an extension of L-Isomap and is called extensive L-Isomap, or EL-Isomap. By using a landmark subset as the reference to embed the whole data set, EL-Isomap attains an efficiency similar to that of L-Isomap. However, the distinctions between L-Isomap and EL-Isomap in motivation, algorithm, and theoretical foundation lead to significantly different performance in applications. The simulation results show that EL-Isomap considerably extends the range of manifolds on which the original global approaches (including L-Isomap) take effect. Two typical examples are data manifolds with loops and those whose intrinsic low-dimensional topology contains concave regions. The simultaneous improvement on both topics distinguishes EL-Isomap from other global approaches, as verified by simulations on a series of synthetic and real-world data sets.

In summary, the proposed method makes three main contributions to manifold learning. First, it essentially extends the applicable range of current manifold learning techniques and can be effectively applied to data lying on loopy manifolds, concavely structured manifolds, and other complex manifold configurations. The new method thus possesses the advantage enjoyed by many local manifold learning approaches. Second, by calculating and utilizing geodesic distances across the entire manifold under mathematical deductions, the new method is capable of preserving the global low-dimensional structure of the whole manifold. It thus inherits the advantage of the global manifold learning approaches, especially the geodesic-distance-based ones. Third, the proposed method guarantees a low computational complexity in implementation, comparable to the most efficient current manifold learning techniques. All these contributions have been theoretically evaluated or empirically substantiated through experiments.

This paper is organized as follows: Section

Since being presented in [

As mentioned in the first section, the initial motivation of L-Isomap is the first topic, that is, improving the computational efficiency of the global approaches. By approximating a large global computation through calculations on a much smaller set, L-Isomap significantly decreases the computational complexity of Isomap to almost linear in the size of the input data set, which makes L-Isomap comparable to the local approaches in this respect.

The main motivation of EL-Isomap shifts to the second topic, that is, enlarging the range of data manifolds on which the global approaches can perform effective manifold learning. Furthermore, the algorithm also inherits the high efficiency of L-Isomap.

In particular, the construction of EL-Isomap is heuristically motivated by the following facts. The information utilized by Isomap consists of the estimated geodesic distances between all data pairs, while, for L-Isomap, it consists of the estimated geodesic distances between all data points and the landmarks, a small subset of the original data set. Besides reducing computational complexity, this brings another advantage to L-Isomap: even when some of the geodesic distances (between nonlandmarks) are impossible, difficult, or unreliable to estimate (as the geodesic distance between A and B in Figure
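The landmark-to-point geodesic estimation discussed above can be sketched in code. This is a minimal illustration rather than the authors' implementation: it assumes a plain k-nearest-neighbor graph with Euclidean edge weights and runs Dijkstra's algorithm from the landmark nodes only, so the shortest-path cost scales with the number of landmarks instead of with all data pairs.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def geodesic_to_landmarks(X, landmark_idx, k=8):
    """Approximate geodesic distances from every data point to a set of
    landmarks: build a symmetric k-NN graph with Euclidean edge weights,
    then run Dijkstra only from the landmark nodes (m sources instead of
    n, which is the key computational saving of the landmark idea)."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    E = np.sqrt((diff ** 2).sum(-1))            # dense pairwise distances (toy scale)
    rows, cols, vals = [], [], []
    for i in range(n):
        nbrs = np.argsort(E[i])[1:k + 1]        # k nearest neighbors, excluding self
        rows += [i] * k
        cols += list(nbrs)
        vals += list(E[i, nbrs])
    G = csr_matrix((vals, (rows, cols)), shape=(n, n))
    G = G.maximum(G.T)                          # symmetrize: undirected graph
    return shortest_path(G, method="D", directed=False,
                         indices=np.asarray(landmark_idx))

# Points on a line: the graph distance from landmark 0 to point 9
# recovers the true geodesic (arc-length) distance.
X = np.arange(10, dtype=float).reshape(-1, 1)
D = geodesic_to_landmarks(X, [0], k=2)
```

On curved manifolds the same graph distances approximate the true geodesics, which is the standard Isomap-style estimate the text relies on.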

(a) is a

L-Isomap mainly contains two processes: calculating embeddings of the specified landmarks in the latent space and adopting them as the reference to embed any new input. Correspondingly, EL-Isomap also implements two processes. The first processes of both methods are very similar, but the second differ essentially: L-Isomap embeds a new input by utilizing all landmarks as reference points, while EL-Isomap does so by adopting only part of the landmarks as the available reference point set. We first list the main steps of L-Isomap and then present only the part of EL-Isomap that differs, as a comparison.
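The two processes can be sketched as follows. This is a hedged reconstruction based on the standard landmark-MDS formulation (classical MDS on the landmark geodesic distance matrix, followed by distance-based triangulation of a new point); the function names and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def landmark_mds(D_mm, d=2):
    """Process 1: classical MDS on the m x m landmark geodesic
    distance matrix, giving the landmark embeddings in d dimensions."""
    m = D_mm.shape[0]
    H = np.eye(m) - 1.0 / m                     # centering matrix
    B = -0.5 * H @ (D_mm ** 2) @ H              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:d]
    lam, V = w[top], V[:, top]
    L = V * np.sqrt(lam)                        # m x d landmark coordinates
    return L, lam, V

def triangulate(delta_sq, D_mm, lam, V):
    """Process 2: embed a new point from its squared geodesic distances
    delta_sq to the reference landmarks (distance-based triangulation)."""
    mean_sq = (D_mm ** 2).mean(axis=0)          # column means of squared distances
    Lpinv = (V / np.sqrt(lam)).T                # d x m pseudoinverse transpose
    return 0.5 * Lpinv @ (mean_sq - delta_sq)

# Landmarks at the corners of the unit square; embed a new interior point
# and check that its distances to the embedded landmarks are preserved.
P = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
D_mm = np.linalg.norm(P[:, None] - P[None, :], axis=2)
L, lam, V = landmark_mds(D_mm, d=2)
p = np.array([0.3, 0.7])
y = triangulate(((P - p) ** 2).sum(axis=1), D_mm, lam, V)
```

EL-Isomap's Step 3 would call the triangulation with only the available landmark subset (a sub-block of `D_mm` and of `delta_sq`) rather than the full set.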

The main difference between L-Isomap and EL-Isomap lies in Step 3, which in both methods aims at embedding a new input from the original space into the latent space. Step 3 of EL-Isomap is listed as follows, in contrast to that of L-Isomap.

Note that Step 4 of EL-Isomap actually follows the same setting as the L-Isomap method [

Actually, another difference between the two algorithms lies in the landmark selection strategies (Step 1), which derive from the distinct aims of L-Isomap and EL-Isomap. Since this section intends to give only a macroscopic comparison between the two methods, a detailed analysis of the difference between their Step 1 will be given in Section

Notice that the first

The reasonability of a global approach for manifold learning can be assessed by validating two capabilities: accuracy in the Euclidean case and stability in the non-Euclidean case. Specifically, the accuracy of a global approach means that when the geodesic distance matrix of the manifold data is exactly Euclidean, the approach is guaranteed to find the accurate global configuration of the corresponding low-dimensional data; the stability of a global approach means that when the estimated geodesic distances between high-dimensional data are perturbed so as to deviate from the Euclidean condition, the approach can still stably find approximately correct embeddings.
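The accuracy condition above hinges on whether a geodesic distance matrix is exactly Euclidean. By Schoenberg's classical criterion, a distance matrix is Euclidean if and only if its double-centered Gram matrix is positive semidefinite, which gives a simple numerical check (a sketch, with an illustrative tolerance):

```python
import numpy as np

def is_euclidean(D, tol=1e-8):
    """Schoenberg's criterion: a distance matrix D is Euclidean iff the
    double-centered matrix B = -1/2 * H D^2 H is positive semidefinite."""
    n = D.shape[0]
    H = np.eye(n) - 1.0 / n
    B = -0.5 * H @ (D ** 2) @ H
    return bool(np.linalg.eigvalsh(B).min() >= -tol)

# Distances among actual points pass the check; a matrix violating the
# triangle inequality (d(0,2) = 3 > 1 + 1) fails it.
P = np.random.default_rng(0).standard_normal((6, 3))
D_ok = np.linalg.norm(P[:, None] - P[None, :], axis=2)
D_bad = np.array([[0., 1., 3.], [1., 0., 1.], [3., 1., 0.]])
```

The stability question then concerns how far the smallest eigenvalue of B may dip below zero before the recovered embedding degrades.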

To analyze the reasonability of L-Isomap and EL-Isomap, the reasonability of the landmark embeddings calculated in Step 2 must be considered, due to its decisive influence on the final results of both methods. To this aim, the following two theorems are presented.

Suppose the geodesic distance matrix

Let

The above theorems conveniently yield the reasonability of L-Isomap. The related results are listed below for comparison with those of EL-Isomap.

Under the situation of Theorem

Under the situation of Theorem

Like L-Isomap, EL-Isomap also aims at consistently finding the embedding of any new input, that is, finding the components of its low-dimensional counterpart when projected onto the uniform coordinate system

Under the situation of Theorem

(i) Center

where

Denote the first

According to (

(ii) Calculate the projection of

where

(iii) According to the supplemental condition of the theorem, each

which is formula (

The proof is completed.

The stability of EL-Isomap can then be given by virtue of the above results.

Consider perturbations

(i) Since

(ii) According to (

(iii) Since

Based on the above theorems we can give theoretical analysis for both methods comparatively. Theorems

As mentioned above, EL-Isomap is closely related to L-Isomap and Isomap: L-Isomap is derived from Isomap with the main aim of reducing Isomap's computational complexity, while EL-Isomap originates from L-Isomap with the main purpose of extending the representational capability of L-Isomap and, further, of the global manifold learning approaches. Another consideration in designing EL-Isomap is to inherit the high efficiency of L-Isomap, so that EL-Isomap improves on both of the hot topics addressed by global approaches for manifold learning. To this aim, the computational complexities (time and space) of the three methods are analyzed as follows, with the aim of showing the efficiency of EL-Isomap through comparisons.

The Isomap algorithm costs time mainly on two steps: geodesic distance estimation, which estimates an approximate geodesic distance matrix

L-Isomap reduces both the time and space complexities of Isomap significantly. For geodesic distance estimation, instead of calculating

Like Isomap and L-Isomap, EL-Isomap's time complexity is also mainly determined by geodesic distance estimation and MDS eigenvalue calculation. In these steps, the geodesic distances between landmarks, and between nonlandmarks and the corresponding available landmarks (with size

Notice that the geodesic distance matrix EL-Isomap requires storing is also an

To sum up, both the time and space complexities of EL-Isomap are close to, or even lower than, those of L-Isomap, and both represent a considerable improvement over Isomap. That is to say, inheriting from L-Isomap, EL-Isomap also enjoys low computational complexity, which makes the method comparable with the local approaches for manifold learning in this respect.

Notice that as the size of the available landmark subset selected by EL-Isomap increases, the information utilized by EL-Isomap tends toward that used by L-Isomap. A question then arises: when the landmark subset utilized by EL-Isomap degenerates to the whole landmark set, what is the relationship between L-Isomap and EL-Isomap? The following theorem proves the equivalence of the two methods in this degenerate case.

In the case that the available landmark subset is selected as the whole landmark set, EL-Isomap is equivalent to L-Isomap.

In this degenerate case, it is easy to deduce that

which is similar to the result calculated by L-Isomap according to (

The proof is completed.
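The degenerate-case equivalence can also be confirmed numerically. The snippet below is a toy sketch under the standard landmark-MDS formulation rather than the paper's exact notation: triangulating each landmark from its own geodesic distance row, with the full landmark set available, recovers exactly its landmark-MDS coordinates.

```python
import numpy as np

# Landmark MDS on four coplanar landmarks (unit square corners).
P = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
D = np.linalg.norm(P[:, None] - P[None, :], axis=2)
m = len(P)
H = np.eye(m) - 1.0 / m                         # centering matrix
B = -0.5 * H @ (D ** 2) @ H                     # double-centered Gram matrix
w, V = np.linalg.eigh(B)
top = np.argsort(w)[::-1][:2]
lam, V = w[top], V[:, top]
L = V * np.sqrt(lam)                            # landmark embedding

# Triangulating landmark j from its own distance row, with the FULL
# landmark set as the available set, returns its MDS coordinates L[j].
mean_sq = (D ** 2).mean(axis=0)
Lpinv = (V / np.sqrt(lam)).T
Y = np.array([0.5 * Lpinv @ (mean_sq - D[j] ** 2) for j in range(m)])
```

The exact recovery follows because the top eigenvectors of B are orthogonal to the all-ones vector, so the centering terms cancel in the triangulation formula.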

So far, we have presented EL-Isomap through comparisons with L-Isomap (and Isomap) from various viewpoints. Yet, as mentioned above, the algorithm of the new method remains incomplete since the landmark selection strategies (Steps 1 and 3.1) have not yet been introduced. To complete the algorithm, the next section is devoted to constructing reasonable landmark selection strategies for EL-Isomap.

Theorems

Several strategies for landmark selection in Step 1 of L-Isomap have been presented. Two of the most typical are the random choice method [

As mentioned at the beginning of this section, another consideration for EL-Isomap in designing the landmark selection strategy is that the geodesic distances between these landmarks and other points should be estimated as faithfully as possible. When the data manifold is globally isometric to a convex region of the low-dimensional Euclidean space, the geodesic distance between any data pair can approximately meet this requirement [

A conflict then arises: the typicality principle tends to select landmarks scattered as widely as possible over the manifold, while the faithfulness principle inclines toward a landmark set assembled in the central part of the manifold. To reconcile the two principles, the following algorithm for landmark selection provides a tradeoff: first construct a subset in the approximate central part of the data manifold, and then choose the landmarks from this subset using the random choice method or the LASSO regression method. The details are given as follows.
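The two-stage tradeoff can be sketched as follows. The convexity-coefficient construction used in the actual algorithm is not reproduced here; as a stand-in assumption, the "central part" is approximated by the points with the smallest mean geodesic distance to all others, and the landmarks are then drawn from it by the random choice method.

```python
import numpy as np

def select_landmarks(D_geo, m, central_frac=0.5, seed=0):
    """Two-stage tradeoff selection (illustrative stand-in):
    1) faithfulness: keep the points whose mean geodesic distance to all
       others is smallest, approximating the manifold's central part;
    2) typicality: draw the m landmarks at random from that subset."""
    n = D_geo.shape[0]
    centrality = D_geo.mean(axis=1)             # small value = near the center
    n_central = max(m, int(central_frac * n))
    central = np.argsort(centrality)[:n_central]
    rng = np.random.default_rng(seed)
    return rng.choice(central, size=m, replace=False)

# On a 1D chain with D_geo[i, j] = |i - j|, the central half is the
# middle of the chain, so all selected landmarks fall in it.
n = 11
D_geo = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]).astype(float)
idx = select_landmarks(D_geo, m=3)
```

Swapping the random draw for a LASSO-based choice, as the text mentions, would change only the second stage.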

In the above algorithm, the function of the convexity coefficient

Notice that in Step 2 of the algorithm only

The main aim of Step 3.1 of the EL-Isomap algorithm is to specify the available landmarks, between which and the new input

(a) is a

In the algorithm, the reason why only

This section mainly aims at demonstrating the performance of EL-Isomap in extending representational capability, by comparison with Isomap, L-Isomap, LLE, and Laplacian eigenmap. In particular, two series of simulations are designed: one on data manifolds with intrinsic topologies of concave regions (the concave case) and the other on data manifolds with intrinsic loops (the loop case).

The first data set applied in this series of simulations is composed of

2D embeddings of the crossed S-curve data calculated by LLE, Laplacian eigenmap, Isomap, L-Isomap, and EL-Isomap, respectively. The dark-colored points shown in the top left figure are the landmarks adopted by L-Isomap and EL-Isomap.

From Figure

The second data set contains 3000 points generated from a manifold with an intrinsic 2D concave H-like region (as shown in Figure

2D embeddings of the H-form manifold data calculated by LLE, Laplacian eigenmap, Isomap, L-Isomap, and EL-Isomap, respectively. The dark-colored points shown in the top left figure are the landmarks adopted by L-Isomap and EL-Isomap.

It is shown clearly in the figure that LLE and Laplacian eigenmap both preserve the local topologies of neighborhood regions of the original data. However, the global configuration of the whole data set is not well revealed by either method: LLE distorts the left part of the manifold, and Laplacian eigenmap pinches the global frame toward the center of the manifold. The performances of Isomap and L-Isomap are very similar; both approximately keep the global structure of the whole manifold, except that the embeddings in the four corner parts spread out abnormally. EL-Isomap alleviates this problem to a certain extent. In particular, the perpendicular relations between each of three corner parts (all except the top right corner) and the central part can be observed in the embeddings calculated by EL-Isomap. This can be explained by the fact that fewer unfaithfully estimated geodesic distances are used by EL-Isomap than by Isomap and L-Isomap, which allows EL-Isomap to find reasonable embeddings of the data set more robustly.

The value preset for the desired number of available landmarks for each new input has a significant influence on the final performance of EL-Isomap. To exhibit this influence intuitively, the performance of EL-Isomap under different values of this parameter (from

2D embeddings of the H-form manifold data calculated by EL-Isomap as the numbers of available landmarks are set as

From Figure

From the above simulations, it can be observed that both L-Isomap and EL-Isomap extend the representational capability of the original Isomap to manifolds with concave topologies, and that EL-Isomap can be implemented effectively on a more extensive range of data manifolds. Two other cases in which L-Isomap and EL-Isomap are effective are the data manifolds shown in Figures

(a), (b), (c), and (d) demonstrate

This section mainly demonstrates the representational capability of the new method on manifolds with intrinsic loops. The first two manifolds applied are the cylinder and the sphere (as shown in Figure

Demonstrations of cylinder and sphere manifolds.

2D embeddings of the cylinder manifold data calculated by LLE, Laplacian eigenmap, Isomap, L-Isomap, and EL-Isomap, respectively. The square-marked points shown in the top left figure are the landmarks adopted by L-Isomap and EL-Isomap.

2D embeddings of the sphere manifold data calculated by LLE, Laplacian eigenmap, Isomap, L-Isomap, and EL-Isomap, respectively. The square-marked points shown in the top left figure are the landmarks adopted by L-Isomap and EL-Isomap.

From Figure

The performances of the five methods on the sphere data are similar to those on the cylinder data (as shown in Figure

A real-world data set intrinsically located on a manifold with loops is also adopted, which contains

2D embeddings of the pig image data calculated by LLE, Laplacian eigenmap, Isomap, L-Isomap, and EL-Isomap, respectively. The images shown above each embedding correspond to the circled points, in increasing order of their first coordinates. The dark-colored points shown in the embeddings obtained by L-Isomap and EL-Isomap are the landmarks adopted by the methods.

Figure

Why, then, is EL-Isomap successful on manifolds with loops? This can be explained as follows. EL-Isomap considers only two kinds of interpoint connections: connections between data pairs located in the approximately central part of the manifold, and connections between points in the noncentral part and some of their nearest points in the central part, which together form a connected component through the process of Step 3.1. These connections implicitly define a graph superimposed on the whole data set, which EL-Isomap intrinsically utilizes. A valuable character of this graph is that it implicitly breaks the essential loops existing in the underlying manifold, which leads to the effectiveness of EL-Isomap on manifolds with loops.

Based on its reasonability theory, the performance of EL-Isomap depends strongly on the selection of the landmark set (Step 1) and of the available landmark subset for the new input (Step 3.1). Essentially, the aim of the two steps is to select landmarks so as to make the geodesic distance matrix of the landmark set, and the geodesic distance vector between the available landmarks and the new input, comply with the Euclidean condition as closely as possible, that is, according to the "faithfulness principle" mentioned in Section

A direct way to approach this problem is to compute, from the geodesic distance matrix of the whole data set, the maximal submatrix that is approximately a Euclidean matrix. In fact, some methods have been presented to solve this problem theoretically [
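A naive greedy variant of this idea can be sketched for illustration. It is not one of the cited methods and makes no maximality claim; it simply grows an index subset while the corresponding distance submatrix stays approximately Euclidean (double-centered Gram matrix positive semidefinite):

```python
import numpy as np

def greedy_euclidean_subset(D, tol=1e-6):
    """Greedily grow an index subset whose distance submatrix stays
    approximately Euclidean. A single O(n) pass for illustration only;
    it does not guarantee the maximal such subset."""
    def near_euclidean(sub):
        k = len(sub)
        H = np.eye(k) - 1.0 / k
        B = -0.5 * H @ (D[np.ix_(sub, sub)] ** 2) @ H
        return np.linalg.eigvalsh(B).min() >= -tol
    subset = [0]
    for i in range(1, D.shape[0]):
        if near_euclidean(subset + [i]):
            subset.append(i)
    return subset

# Point 2 breaks the triangle inequality with points 0 and 1, so it is
# skipped; a genuinely Euclidean matrix keeps every index.
D_bad = np.array([[0., 1., 3.], [1., 0., 1.], [3., 1., 0.]])
P = np.array([[0., 0.], [1., 0.], [0., 2.], [3., 1.]])
D_ok = np.linalg.norm(P[:, None] - P[None, :], axis=2)
```

The cited methods pursue the same criterion more rigorously; this sketch only shows how the Euclidean condition can act as the selection test.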

Note that how to select a proper neighborhood size

In this paper, we have proposed a new manifold learning method called EL-Isomap. The proposed method makes two main contributions. On one hand, it possesses the advantage of the previous local approaches in computational efficiency. On the other hand, it inherits the advantage of the current global approaches in representational capability. In particular, originating from L-Isomap, EL-Isomap naturally has computational efficiency similar to that of L-Isomap, and, through reasonable strategies for landmark selection, EL-Isomap significantly extends the effective range of data manifolds. Another contribution of this work is the reasonability theory, that is, the accuracy and robustness analysis of EL-Isomap, which provides a theoretical foundation for the new method. The computational complexity of the new method and its relation to L-Isomap have also been analyzed. Through extensive experiments on synthetic and real-world data sets, EL-Isomap has been verified to outperform the state-of-the-art approaches along this line, especially for manifolds with complex shapes.

The authors declare that there is no conflict of interest regarding the publication of this paper.

This work was supported by National Natural Science Foundation of China (Grant nos. 61373114, 11131006, 91330204, and 11471006), National Basic Research Program of China (973) (Grant no. 2013CB329404), and Industrial Project of Shaanxi Province (Grant no. 2013K06-03).