ISRN Signal Processing, Volume 2013, Article ID 417492. Review Article: A Review of Subspace Segmentation: Problem, Nonlinear Approximations, and Applications to Motion Segmentation. Akram Aldroubi, Department of Mathematics, Vanderbilt University, Nashville, TN 37212, USA. Academic Editors: M. Nappi and J.-G. Wang. Received 4 November 2012; Accepted 20 December 2012. Copyright © 2013 Akram Aldroubi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The subspace segmentation problem is fundamental in many applications. The goal is to cluster data drawn from an unknown union of subspaces. In this paper we state the problem and describe its connection to other areas of mathematics and engineering. We then review the mathematical and algorithmic methods created to solve this problem and some of its particular cases. We also describe the problem of motion tracking in videos and its connection to the subspace segmentation problem and compare the various techniques for solving it.

1. Introduction

The subspace clustering problem is fundamental in many engineering and mathematics applications. It can be described as follows: let 𝒰 = ⋃_{i=1}^{M} S_i be the nonlinear set consisting of a union of M subspaces {S_i}_{i=1}^{M} of a Hilbert or a Banach space ℋ. Let W = {w_j}_{j=1}^{N} be a set of data points drawn from 𝒰. The subspace segmentation (or clustering) problem is then to determine 𝒰 (equivalently, determine S_i for i = 1, …, M) from the data W = {w_j}_{j=1}^{N}, that is, to

determine the number of subspaces M;

find an orthonormal basis for each subspace S_i, i = 1, …, M;

group the data points belonging to the same subspace into the same cluster.

The data W is often corrupted by noise; it may have outliers, or some of the data vectors w ∈ W may have missing entries. Therefore, any technique for solving the subspace segmentation problem above must be robust and stable for the aforementioned nonideal cases.

Depending on the application, the space ℋ can be finite or infinite dimensional. For example, the set of all two-dimensional images of a given face i, obtained under different illuminations and facial positions, can be modeled as a set of vectors belonging to a low dimensional subspace S_i living in a higher dimensional space ℋ = ℝ^D. For this case, a set of such images from M different faces is a union 𝒰 = ⋃_{i=1}^{M} S_i. Another application in which a union of subspaces provides a good model is the problem of motion tracking of rigid objects in videos. For this situation (further developed below), a 4-dimensional subspace is assigned to each moving object in a space ℋ = ℝ^{2F}, where F is the number of frames in the video. Examples where ℋ is infinite dimensional arise in sampling theory and in learning theory. For example, signals with finite rate of innovation are modeled by a union of subspaces that belongs to an infinite dimensional space such as L²(ℝ^D) [2, 3, 20, 21].

1.1. Known Number of Subspaces M and Dimensions d_i

In some subspace segmentation problems, the number M of subspaces or the dimensions of the subspaces {d_i}_{i=1}^{M} are known or can be estimated [1, 8, 22, 23]. In these cases, the subspace segmentation problem, for both the finite and infinite dimensional space cases, can be formulated as follows.

Let ℋ be a Hilbert space, W = {f_1, …, f_m} a finite set of vectors in ℋ, 𝒞 a family of closed subspaces of ℋ, and 𝔖 the set of all sequences of elements in 𝒞 of length M (i.e., 𝔖 = 𝔖(M) = {{S_1, …, S_M} : S_i ∈ 𝒞, 1 ≤ i ≤ M}). The subspace segmentation problem formulated as a minimization problem is as follows.

Problem 1 (optimization formulation of the subspace segmentation problem).

(1) Given a finite set W, a number p with 1 ≤ p ≤ ∞, and a fixed integer M ≥ 1, find the infimum of the expression (1) e(W, S) := Σ_{f∈W} min_{1≤j≤M} d^p(f, S_j), over S = {S_1, …, S_M} ∈ 𝔖, where d(x, y) := ‖x − y‖.

(2) Find a sequence of M subspaces S^o = {S_1^o, …, S_M^o} ∈ 𝔖 (if it exists) such that (2) e(W, S^o) = inf{e(W, S) : S ∈ 𝔖}.

An example in finite dimensions is when ℋ = ℝ^D and 𝒞 is the family of all subspaces of ℝ^D of dimension d no greater than r ≤ D. For this case, when M = 1 and p = 2, this is a well-known least squares problem that can be solved using the singular value decomposition technique. An example in infinite dimensions is when ℋ = L²(ℝ^D) and 𝒞 is a family of closed, shift-invariant subspaces of L²(ℝ^D) that are generated by at most r < ∞ generators. A typical shift-invariant space with one generator is, for example, the space of bandlimited functions, generated by the integer shifts of the generator function sinc(x) = sin(πx)/(πx). Other important shift-invariant spaces are the spline spaces generated by the B-spline functions β^n of degree n [25, 26]. In these cases the subspaces in 𝒞 are themselves infinite dimensional subspaces of L². Thus, even in the case where M = 1 and p = 2, this (least squares) problem is much more difficult than its finite dimensional counterpart. It should be noted that when M > 1 and for any 0 < p ≤ ∞, Problem 1 is neither linear nor convex [27, 28]. In the presence of outliers, it has been proven that the best value for p is p = 1 [27, 28], and a good choice for light-tailed noise is p = 2. There are more general versions of Problem 1; for example, the Hilbert space ℋ can be replaced by a Banach space, and the family 𝒞 can be replaced by the more general type of family 𝒞_1 × 𝒞_2 × ⋯ × 𝒞_M.
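The objective e(W, S) in (1) is easy to evaluate directly once orthonormal bases for the candidate subspaces are in hand, since d(f, S) = ‖f − P_S f‖. The following toy sketch (made-up data, p = 2, M = 2 coordinate lines in ℝ²) is only an illustration of the cost function, not of any particular minimization algorithm:

```python
import numpy as np

def dist(f, B):
    """d(f, S) = ||f - P_S f||, where the columns of B are an orthonormal basis of S."""
    return np.linalg.norm(f - B @ (B.T @ f))

def e(W, subspaces, p=2):
    """Objective of Problem 1: sum over data points of min_j d^p(f, S_j)."""
    return sum(min(dist(f, B) for B in subspaces) ** p for f in W.T)

# toy example: M = 2 coordinate lines in R^2
B1 = np.array([[1.0], [0.0]])      # x-axis
B2 = np.array([[0.0], [1.0]])      # y-axis
W = np.array([[1.0, 0.0, 3.0],
              [0.0, 2.0, 4.0]])    # columns are the data points
# (1,0) and (0,2) lie on the lines; (3,4) is at distance 3 from the y-axis
print(e(W, [B1, B2]))              # 0 + 0 + 3**2 = 9.0
```

Minimizing this quantity over all choices of {S_1, …, S_M} is the hard, nonconvex part of the problem.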

1.2. Applications and Connection to Other Areas

The subspace segmentation problem has connections to several active areas of research, including learning theory, compressed sampling, and signal processing in general [2, 3, 17, 21, 2932]. Moreover, it is relevant to several computer vision applications including motion tracking of rigid objects in videos and facial recognition [1, 4, 14, 3338].

1.2.1. Connection to Compressed Sampling

In compressed sampling, the goal is to find an unknown vector x ∈ ℝ^D from a small set of linear measurements {y_i = a_i^T x}_{i=1}^{m}, m ≪ D, where the a_i are known sampling vectors. Clearly, this problem has a solution only if some extra information is known about x and if the sampling vectors a_i are well chosen. In compressed sampling, the assumption is that, in a suitable basis or frame, the unknown vector x is s-sparse or nearly s-sparse (compressible), with s ≪ D [30, 39–44]. This means that in a suitable basis or frame the vector x has at most s nonzero components or, under the compressibility assumption, at most s large components. This sparsity (or compressibility) assumption implies that the vector x must belong, or be close, to a union of subspaces of dimension at most s. Thus, finding the sparse model for a class of signals can be obtained by solving the subspace segmentation problem in the special case where ℋ = ℝ^D, where 𝒞 is the class of subspaces of ℝ^D of dimension at most s, and where M is the binomial coefficient (D choose s) [45, 46].

1.2.2. Connection to Learning Theory and Data Mining

In many learning theory problems, a class of data may form a complex structure embedded in a high dimensional space ℝ^D. In the neighborhood of each data point, the structure may be modeled by a local tangent space, or a union of tangent spaces, whose dimensions are much smaller than the dimension of the ambient space ℝ^D. The global shape of the data model can then be obtained from the observed data points by solving Problem 1.

1.2.3. Connection to Signal Processing

In signal processing, signals are often modeled by an infinite dimensional shift-invariant subspace of L²(ℝ^d) [15, 54–61]. For example, the classical shift-invariant space is the space of bandlimited functions PW_{1/2}, also known as the Paley-Wiener space. This is the space generated by the function sinc(x) = sin(πx)/(πx) and its integer shifts. Multiresolution and wavelet spaces are also shift-invariant spaces that are often used in signal processing applications. Choosing the model for a class of signals can be cast in terms of finding the solution of Problem 1 from observed data. Unlike the compressed sampling or learning theory settings discussed earlier, in this situation the class 𝒞 consists of infinite dimensional subspaces of ℋ = L² and is therefore more difficult to deal with, even for a single shift-invariant subspace model (M = 1). The case in which a signal model is not a single subspace but a union of several such subspaces arises naturally, as in the case of signals with finite rate of innovation.

1.2.4. Application to Motion Tracking in Video

The problem of tracking rigid moving objects in a video can be formulated as a subspace segmentation problem [33, 35, 74–77]. Let x_j(p), y_j(p) be the Cartesian coordinates of a point p of a moving object O in frame j of a video. By concatenating all the coordinates of p into a single vector (3) w_p = (x_1(p), y_1(p), x_2(p), y_2(p), …, x_F(p), y_F(p))^t, we obtain the so-called trajectory vector of p, whose length is D = 2F, where F is the number of frames in the video. It can be shown that, for rigid bodies, the trajectories of any point of object O belong to a subspace S_O of ℝ^D of dimension no greater than 4. Thus, if W = {w_1, …, w_N} is a set of trajectory vectors from a video of M moving objects (the background being one such object), then the set W belongs to a union of M subspaces of dimension at most 4. Thus, solving the subspace segmentation problem in this situation consists of using the data W to find the subspaces and then grouping together the trajectory vectors w_i that belong to the same object according to the subspace they belong to. It can also be shown that human facial motion and other nonrigid motions can be approximated by linear subspaces.
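The rank bound above is easy to verify numerically under an affine camera model, where the trajectory matrix factors as a stack of per-frame 2×4 camera matrices times a 4×P structure matrix. The sketch below uses synthetic points and hypothetical random camera matrices purely as an illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
F, P = 10, 30                         # frames and tracked points (made-up sizes)
X = rng.standard_normal((3, P))       # 3-D points of one rigid object
S = np.vstack([X, np.ones((1, P))])   # 4 x P structure matrix (homogeneous coords)

# one hypothetical 2x4 affine camera matrix per frame; stacking the per-frame
# projections gives the 2F x P matrix whose columns are the trajectory vectors w_p
W = np.vstack([rng.standard_normal((2, 4)) @ S for _ in range(F)])

# all trajectories of a rigid object lie in a subspace of dimension at most 4
print(np.linalg.matrix_rank(W))       # <= 4
```

Because W = A·S with A of size 2F×4, every column of W lies in the (at most) 4-dimensional column space of A, which is the subspace S_O of the text.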

1.2.5. Application to Face Recognition

It has been shown that the set of all two-dimensional images of a given face i, obtained under different illuminations and facial positions, can be modeled as a set of vectors belonging to a low dimensional subspace S_i living in a higher dimensional space ℝ^D. A set of such images from M different faces is then a union ⋃_{i=1}^{M} S_i, where each subspace S_i is associated with a given face.

1.3. Dimensionality Reduction

Since the data W may live in a very high dimensional space ℝ^D, while 𝒰 = ⋃_{i=1}^{M} S_i may consist of spaces S_i with dimension d_i ≪ D and M ≪ D, the subspace clustering problem can be solved in a smaller dimensional space ℝ^{De}, where De is the effective dimension. Specifically, if Σ_{i=1}^{M} d_i ≪ D, then the data W can be projected onto a space of dimension De = Σ_{i=1}^{M} d_i, where the projection is not necessarily an orthogonal projection, but any “good” linear process that maps the data to another (low dimensional) space, for example, a random projection. As a result of projecting 𝒰 = ⋃_{i=1}^{M} S_i ⊂ ℝ^D and the data W, we get the set 𝒰̃ = ⋃_{i=1}^{M} S̃_i ⊂ ℝ^{De} and the data W̃ ⊂ ℝ^{De}. It is now possible to solve the subspace segmentation problem with the data W̃ ⊂ ℝ^{De} instead of W and use the segmentation in the low dimensional space to solve the original problem. This dimensionality reduction technique can be very effective and is often used in conjunction with the subspace segmentation problem [12, 83].
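A minimal numerical sketch of this reduction, with synthetic data and a Gaussian random projection (one common, but not the only, choice of “good” linear map); the point is that each projected subspace keeps its dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
D, De = 1000, 8                       # ambient dimension and effective dimension d1 + d2
B1, B2 = rng.standard_normal((D, 4)), rng.standard_normal((D, 4))
W = np.hstack([B1 @ rng.standard_normal((4, 50)),
               B2 @ rng.standard_normal((4, 50))])   # data from two 4-dim subspaces

P = rng.standard_normal((De, D)) / np.sqrt(De)       # random linear map R^D -> R^De
W_tilde = P @ W                                      # projected data in R^De

# the union-of-subspaces structure survives the projection (almost surely):
print(np.linalg.matrix_rank(W_tilde[:, :50]),
      np.linalg.matrix_rank(W_tilde[:, 50:]))        # 4 4
```

Clustering can then be carried out on the 8-dimensional W̃ rather than the 1000-dimensional W.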

2. Algebraic Methods for Finite Dimensional Noise Free Case

The general subspace segmentation problem described in Section 1 does not yet have a good approach for solving it. In the ideal case where no noise is present, there are several algebraic methods that can solve this problem, as will be described below. However, in realistic situations when noise, outliers, and corrupted data are present, there are no satisfactory algorithms for finding the solution, even in the finite dimensional case ℋ = ℝ^D. The difficulties are both theoretical and computational, as will be further described below.

In the ideal case, when ℋ = ℝ^D and the data W = {w_1, …, w_N} is drawn from a finite union of subspaces {S_i}_{i=1}^{M}, the general problem can be solved using algebraic methods. Obviously, there must be enough data points. In particular, it is necessary that for each subspace S_i there is a subset of data points of W that forms a basis for S_i. However, this is not sufficient. Consider, for example, the very simple case in which the data is drawn from a union of two subspaces S_1 ∪ S_2 of ℝ³ such that dim S_1 = dim S_2 = 1. If we are supplied with two points, one from each line S_1, S_2, we will not be able to decide whether the data is drawn from a single subspace S = S_1 + S_2 or from the union S_1 ∪ S_2. However, if we are supplied with enough points belonging to S_1 and enough belonging to S_2, the structure S_1 ∪ S_2 becomes apparent.

2.1. Reduced Row Echelon Form Method

One of the recent algebraic methods for solving the noise-free subspace segmentation problem under the independent subspace restriction is the reduced row echelon form (RREF) method. This method is a generalization of the method of Gear, who observed that, for four-dimensional subspaces, the reduced echelon form can be used to segment motions in videos. It turns out that in the noise-free case the reduced row echelon form method can solve the subspace segmentation problem in almost its most general form.

The RREF method is based on the familiar Gaussian elimination technique for solving linear systems of equations. However, for this method to work, certain assumptions on the data and the subspaces are needed. Specifically, there must be enough data to cover all the dimensions of the union of subspaces 𝒰 = ⋃_{i=1}^{M} S_i from which the data is drawn. Moreover, the subspaces {S_i}_{i=1}^{M} must be independent. To make these assumptions precise, we make the following definitions.

Definition 1 (generic data).

Let S be a linear subspace of ℝ^D with dimension d. A set of data W drawn from S is said to be generic if (i) |W| > d, and (ii) every d vectors from W form a basis for S.

Definition 2 (independent subspaces).

Subspaces {S_i ⊂ ℝ^D}_{i=1}^{n} are called independent if dim(S_1 + ⋯ + S_n) = dim(S_1) + ⋯ + dim(S_n).

Independent subspaces have the property that S_i ∩ S_j = {0} for i ≠ j. The converse, however, is false; for example, three subspaces S_1, S_2, S_3 in ℝ² with dim S_i = 1, i = 1, 2, 3, can never be independent. More generally, if {S_i ⊂ ℝ^D}_{i=1}^{n} are independent, then Σ_{i=1}^{n} dim(S_i) ≤ D and S_i ∩ S_j = {0} for i ≠ j.
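Definition 2 can be checked numerically by comparing ranks, since dim(S_1 + ⋯ + S_n) is the rank of the concatenated basis matrices. A small sketch covering both examples from the text:

```python
import numpy as np

def dim_sum(*bases):
    """Sum of the individual subspace dimensions."""
    return sum(np.linalg.matrix_rank(B) for B in bases)

def dim_of_sum(*bases):
    """Dimension of S_1 + ... + S_n = rank of the stacked bases."""
    return np.linalg.matrix_rank(np.hstack(bases))

# S1 = span{e1, e2} and S2 = span{e3} in R^3 are independent:
S1, S2 = np.eye(3)[:, :2], np.eye(3)[:, 2:]
print(dim_of_sum(S1, S2) == dim_sum(S1, S2))          # True

# three distinct lines in R^2 can never be independent (dims would sum to 3 > 2):
L1, L2, L3 = np.array([[1.], [0.]]), np.array([[0.], [1.]]), np.array([[1.], [1.]])
print(dim_of_sum(L1, L2, L3) == dim_sum(L1, L2, L3))  # False
```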

If we knew the subspaces S_i, it would be easy to partition the data W into the partition P(W) = {W_1, …, W_M} such that W_i ⊂ S_i. Conversely, if we knew a partition P(W) = {W_1, …, W_M} of the data W such that each set W_i comes from the same subspace S_i, then we would set S_i = span W_i and our subspace segmentation problem would be solved.

However, all we are given is the data W, and we do not know the partition P(W). Thus, solving the subspace segmentation problem amounts to finding the partition P(W) = {W_1, …, W_M} of W. To do this, we construct a matrix W = [w_1, …, w_N] whose columns are the data vectors w_i ∈ ℝ^D. The matrix W is a D × N matrix, where D may be large, while the rank of W is often much smaller (noise-free case). Using the three elementary row operations of Gaussian elimination, we transform W into its reduced row echelon form rref(W): (4) rref(W) = [R; 0], where R is an r × N matrix, r is the rank of W, and the zero block consists of the remaining D − r rows. By setting all nonzero coefficients in rref(W) to the value 1, we obtain the so-called binary reduced row echelon form of W, denoted by Brref(W). The binary reduced row echelon form of W has a structure that allows us to easily find the partition P(W) = {W_1, …, W_M} and thereby solve the subspace segmentation problem, as Theorem 3 below suggests.

Theorem 3.

Let {S_i}_{i=1}^{M} be a set of nontrivial independent subspaces of ℝ^D. Let W = [w_1, …, w_N] ∈ ℝ^{D×N} be a matrix whose columns are drawn from ⋃_{i=1}^{M} S_i. Assume the data is drawn from each subspace and that it is generic. Let Brref(W) be the binary reduced row echelon form of W. Then

the inner product ⟨e_i, b_j⟩ of a pivot column e_i and a nonpivot column b_j in Brref(W) is one if and only if the corresponding column vectors {w_i, w_j} in W belong to the same subspace S_l for some l = 1, …, M;

moreover, dim(S_l) = ‖b_j‖_1, where ‖b_j‖_1 is the l¹-norm of b_j;

finally, w_p ∈ S_l if and only if b_p = b_j or ⟨b_p, b_j⟩ = 1.

This theorem suggests a very simple yet effective approach to clustering the data points (Algorithm 1) and solves the subspace segmentation problem. This is done by finding a partition {W_1, …, W_M} of the data W into M clusters such that span W_l = S_l, l = 1, …, M. The clusters can be formed as follows: pick a nonpivot column b_j in Brref(W), and group together all columns b_p in Brref(W) such that ⟨b_j, b_p⟩ > 0. Repeat the process with a different nonpivot column until all columns are exhausted. This is detailed in Algorithm 1 below.

Algorithm 1: Subspace segmentation — row echelon form approach — no noise.

Require: D × N data matrix W.
1: Find rref(W) of W.
2: Find Brref(W) of W by setting all nonzero entries of rref(W) to 1.
3: for all j from 1 to N do
4:   Pick the jth column b_j of Brref(W).
5:   if b_j is a pivot column then
6:     continue
7:   end if
8:   for all i from 1 to j − 1 do
9:     if b_i is nonpivot and ⟨b_i, b_j⟩ > 0 then
10:      Place {b_i, b_j} in the same cluster C_i.
11:      break
12:    end if
13:  end for
14: end for
15: for all C_i do
16:   Pick any b ∈ C_i.
17:   Separate b into unit vectors u_{i1}, …, u_{id_i}. {These vectors form a basis for a subspace S_i of dimension d_i.}
18:   for all k from 1 to N do
19:     if b_k ∈ {u_{i1}, …, u_{id_i}} then
20:       Place b_k in the same cluster C_i. {This handles the pivot columns.}
21:     end if
22:   end for
23:   Place the corresponding columns of W into the same cluster W_i.
24: end for
25: Renumber the indices i of the S_i starting from 1.

Note that we do not need to know the number of subspaces M, nor do we need to know the dimensions of the subspaces S_i, for solving the subspace segmentation problem in this case. M and dim(S_i) are outputs of the algorithm. The only assumptions are that there are enough data points, that they are well distributed (they are generic), and that the subspaces S_i are independent.
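A compact sketch of Algorithm 1 in Python (helper names are ours, not from the paper). As the text stresses, this works only on noise-free data: a floating-point rref needs a tolerance and is numerically fragile, which is exactly why thresholded variants are needed in practice:

```python
import numpy as np

def rref_binary(W, tol=1e-10):
    """Reduced row echelon form of W, binarized; returns (Brref rows, pivot columns)."""
    R = W.astype(float).copy()
    rows, cols = R.shape
    pivots, r = [], 0
    for c in range(cols):
        if r >= rows:
            break
        i = np.argmax(np.abs(R[r:, c])) + r          # partial pivoting
        if abs(R[i, c]) < tol:
            continue                                  # no pivot in this column
        R[[r, i]] = R[[i, r]]
        R[r] /= R[r, c]
        for k in range(rows):
            if k != r:
                R[k] -= R[k, c] * R[r]
        pivots.append(c)
        r += 1
    return (np.abs(R[:r]) > tol).astype(int), pivots

def cluster_columns(W):
    """Cluster columns of W by subspace (noise-free, generic, independent subspaces)."""
    B, pivots = rref_binary(W)
    N = W.shape[1]
    nonpivots = [j for j in range(N) if j not in pivots]
    labels, next_label = -np.ones(N, dtype=int), 0
    for j in nonpivots:                               # group nonpivots sharing support
        for i in nonpivots:
            if i >= j:
                break
            if B[:, i] @ B[:, j] > 0:
                labels[j] = labels[i]
                break
        if labels[j] < 0:
            labels[j] = next_label
            next_label += 1
    for row, c in enumerate(pivots):                  # attach each pivot column via its row
        for j in nonpivots:
            if B[row, j] == 1:
                labels[c] = labels[j]
                break
    return labels

# two independent subspaces of R^5: span{e1,e2} and span{e3,e4}, generic columns
W = np.array([[1, 0, 1, 0, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 0, 1, 2],
              [0, 0, 0, 0, 0, 0]], float)
print(cluster_columns(W))      # first three columns together, last three together
```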

For noisy data, the reduced row echelon form method does not work, and a thresholding must be applied. However, the effect of the noise on the reduced echelon form method depends on the noise level and the relative positions of the subspaces. This dependence has been analyzed in .

2.2. The Generalized Principal Component Analysis (GPCA)

Another algebraic method for solving the subspace segmentation problem is the so-called generalized principal component analysis (GPCA) [12, 85]. Although the most general form of this method solves the subspace segmentation problem in its entire generality in finite dimensions, we will only describe the idea behind the GPCA method in the simplified case where the number of subspaces M is known and the subspaces are hyperplanes in ℝ^D, that is, their dimension is D − 1. For this case, each subspace S_i can be described by its normal vector b_i = (b_{i1}, …, b_{iD})^T, and every data point x ∈ S_i satisfies the linear equation b_i^T x = 0, where x = (x_1, …, x_D)^T. Thus, a data point x ∈ W drawn from the union of subspaces 𝒰 = ⋃_{i=1}^{M} S_i must satisfy the polynomial equation (5) ∏_{k=1}^{M} (b_k^T x) = 0. The product ∏_{k=1}^{M} (b_k^T x) is in fact a homogeneous polynomial (6) p(x) = Σ_α c_α x_1^{α_1} x_2^{α_2} ⋯ x_D^{α_D} of degree M, where α = (α_1, …, α_D), Σ_k α_k = M (α_k integers). Thus, if x ∈ 𝒰, it must satisfy the equation p(x) = 0. Hence, in order to solve the subspace segmentation problem for this case, we must

find the polynomial p by finding the values of its coefficients c_α. This is done by creating a system of N linear equations in the unknowns {c_α} by setting p(w_i) = 0, i = 1, …, N, for each data point w_i ∈ W. If the data is generic, then the solution of the system of equations determines the polynomial p;

once the polynomial p is determined, it must be factored into its product p(x) = ∏_{k=1}^{M} (b_k^T x). The vectors b_k can then be found by identification. The subspaces S_k in the union 𝒰 are thus determined.
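The two steps can be sketched numerically for the simplest case, M = 2 lines in ℝ². Instead of factoring p symbolically, the sketch recovers each normal from the gradient ∇p evaluated at a sample point, a device used in the GPCA literature (for x ∈ S_k, ∇p(x) is parallel to b_k). All data and helper names here are illustrative:

```python
import numpy as np
from itertools import combinations_with_replacement

def veronese(X, M):
    """Stack all degree-M monomials of each column of X (the Veronese map)."""
    D, N = X.shape
    idx = list(combinations_with_replacement(range(D), M))
    V = np.array([[np.prod(X[list(mono), j]) for j in range(N)] for mono in idx])
    return V, idx

def grad_p(c, idx, x):
    """Gradient of p(x) = sum_alpha c_alpha x^alpha at the point x."""
    g = np.zeros(len(x))
    for coef, mono in zip(c, idx):
        for k in set(mono):
            rest = list(mono)
            rest.remove(k)
            g[k] += coef * mono.count(k) * np.prod(x[rest])
    return g

# two lines (hyperplanes of R^2) with normals b1, b2 -- a made-up example
b1 = np.array([1.0, 0.0])
b2 = np.array([1.0, -1.0]) / np.sqrt(2)
t = np.arange(1.0, 11.0)
d1 = np.array([0.0, 1.0])                 # direction of line 1 (b1^T d1 = 0)
d2 = np.array([1.0, 1.0]) / np.sqrt(2)    # direction of line 2 (b2^T d2 = 0)
X = np.hstack([np.outer(d1, t), np.outer(d2, t)])   # 2 x 20 data matrix

# step 1: coefficients of p = null vector of the Veronese-embedded data
V, idx = veronese(X, 2)
c = np.linalg.svd(V.T)[2][-1]
# step 2: recover each normal (up to sign/scale) from grad p at a sample point
n1 = grad_p(c, idx, X[:, 0]);  n1 /= np.linalg.norm(n1)
n2 = grad_p(c, idx, X[:, 10]); n2 /= np.linalg.norm(n2)
```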

A modification of the GPCA method described above works for the general subspace segmentation problem in which neither the dimensions of the subspaces nor their number is known [12, 85]. However, as with the RREF method, this method cannot be applied directly when noise is present; modifications are needed in the presence of noise and outliers, as described in [12, 85].

3. Optimization Methods and Subspace Segmentation in the Presence of Noise

The algebraic methods discussed in the previous section do not work without modification when the data is corrupted by noise or outliers. Even with adjustments to handle noisy environments, the algebraic algorithms do not perform well when the noise is not small. Algorithms are rated according to their simplicity, computational speed, and their performance in nonideal situations. Thus, algebraic methods or their modifications may be the algorithms of choice if the noise is small and computational speed is the main requirement. However, when the noise is relatively large and accuracy is important, other methods are needed. In this section we discuss methods that are robust to noise and other inaccuracies in the data.

One of the methods for the subspace segmentation problem when noise is present is typified by Problem 1. Minimizing the functional described in Problem 1 amounts to finding the union of subspaces that is nearest to the data. However, some a priori knowledge of the number of subspaces M and the dimensions of the subspaces S_i may be necessary. If these quantities are unknown, the cost function can be modified to incorporate a cost that depends on the number of subspaces M and their dimensions. But before getting into algorithms for solving Problem 1, the existence of a minimizer is a theoretical question of interest. Thus, we start with some of the results pertaining to this issue.

3.1. Existence of a Minimizer to Problem 1

Given a family 𝒞 of closed subspaces of ℋ, a solution to Problem 1 may not exist even in the linear case M = 1. For example, assume that ℋ = ℝ² and 𝒞 is the set of all lines through the origin except the line x = 0. For this case, a minimizer may exist for certain distributions of data points but not for others. The existence of a solution here means that a minimizer exists for any distribution of any finite number of data points. We will describe the existence results when ℋ is a Hilbert space. The case when ℋ is not a Hilbert space is very difficult, and only partial results are known.

It turns out that the existence of a minimizing sequence of subspaces S^o = {S_1^o, …, S_M^o} that solves Problem 1 is equivalent to the existence of a solution to the same problem for M = 1.

Theorem 4.

Problem 1 has a minimizing set of subspaces for all finite sets of data and for any M ≥ 1 if and only if it has a minimizing subspace for all finite sets of data and for M = 1.

Therefore, the following definition is useful.

Definition 5.

A set 𝒞 of closed subspaces of a separable Hilbert space ℋ has the minimum subspace approximation property (MSAP) if for every finite subset W ⊂ ℋ there exists an element S ∈ 𝒞 that minimizes the expression (7) e(W, S) = Σ_{f∈W} d²(f, S), S ∈ 𝒞.

Using this terminology, Problem 1 has a minimizing sequence of subspaces if and only if 𝒞 satisfies the MSAP. If ℋ = ℝ^D and 𝒞 = {S ⊂ ℝ^D : dim S ≤ s}, then 𝒞 satisfies the MSAP. This fact is easy to prove directly and is in fact a consequence of the Eckart-Young theorem. Another situation is when ℋ = L²(ℝ^d) and 𝒞 is the set of all shift-invariant spaces of length at most n, that is, the spaces of the form S = span‾{φ_1(· − k), …, φ_n(· − k) : k ∈ ℤ^d} (the bar denoting the closed linear span). For this last case, a result in  implies that this family 𝒞 satisfies the MSAP.

In order to understand the general case, we identify each subspace S ∈ 𝒞 with the orthogonal projector Q = Q_S whose kernel is exactly S (i.e., Q = I − P_S, where P_S is the orthogonal projector onto S). Now we can think of 𝒞 as a set of projection operators and endow it with the induced weak operator topology. This setting allows us to give necessary and sufficient conditions for a class 𝒞 to have the MSAP for the case p = 2 in (1). Note that it is sufficient that 𝒞 be closed in order for 𝒞 to have the MSAP. However, this condition is too strong, as the following example shows: let ℋ = ℝ³ and consider the set 𝒞 = 𝒞_1 ∪ 𝒞_2, which is the union of the plane 𝒞_1 = span{e_1, e_2} and the set of lines 𝒞_2 = {span{v} : v = e_3 + c e_2 for some c ∈ ℝ}. Then 𝒞 (identified with a set of projectors as described earlier) is not closed (since Q_{span{e_2}} ∉ 𝒞). However, it is easy to show that this set satisfies the MSAP, since if the infimum in (1) is achieved by the missing line span{e_2}, it is also achieved by the plane 𝒞_1.

In finite dimensions, the weak operator and strong operator topologies coincide, and the characterization of the MSAP can be obtained in terms of the convex hull of the family 𝒞⁺ consisting of 𝒞 together with the positive semidefinite operators added to it. Recall that the convex hull co(A) of a set A is the smallest convex set containing A; that is, co(A) is the intersection of all convex sets containing A. For finite dimensions, the following theorem gives necessary and sufficient conditions for the MSAP, and hence necessary and sufficient conditions for the existence of a solution to Problem 1, when p = 2 in (1).

Theorem 6.

Suppose ℋ has dimension d < ∞. Then the following are equivalent:

𝒞 satisfies MSAP;

𝒞+ is closed;

co(𝒞⁺) = co(𝒞⁺‾).

The necessary and sufficient conditions in infinite dimensions for the existence of solutions when p = 2 can be found in , but they are much more complicated. No such results are known for the existence of solutions to Problem 1 when p ≠ 2.

3.2. Search Algorithms for Problem 1

Searching for a solution to Problem 1 is easier when M = 1, since the problem is then linear. Using an algorithm 𝒜_1 for solving this simpler problem, the more difficult problem with M > 1 can be solved by applying 𝒜_1 multiple times in an iterative algorithm, as follows.

Let 𝒫(W) be the set of all partitions of the data W; that is, P ∈ 𝒫(W) if P = {W_1, …, W_M} is such that W_i ∩ W_j = ∅ when i ≠ j and ⋃_{k=1}^{M} W_k = W.

Let P = {W_1, …, W_M} be a partition of the data W. For each i = 1, …, M, use algorithm 𝒜_1 to find the subspace S_i^o(P) ∈ 𝒞 that is nearest to W_i in the sense that it minimizes e(W_i, S) = Σ_{w∈W_i} d^p(w, S). We obtain a sequence of subspaces S = {S_1^o(P), …, S_M^o(P)}.

Construct a new partition Q(S) by reassigning each data point w ∈ W to its nearest subspace from {S_1^o, …, S_M^o} and grouping together the points assigned to the same subspace.

Iterate between the two steps as described in Algorithm 2.

Algorithm 2: Optimal solution S^o.

1: Pick any partition P ∈ 𝒫(W).
2: For each subset W_i in the partition P, find the subspace S_i^o(P) ∈ 𝒞 that minimizes the expression e(W_i, S) = Σ_{w∈W_i} d^p(w, S).
3: while Σ_{i=1}^{M} e(W_i, S_i^o(P)) > e(W, S^o(P)) do
4:   for all i from 1 to M do
5:     Update W_i = {w ∈ W : d(w, S_i^o(P)) ≤ d(w, S_k^o(P)), k = 1, …, M}
6:     Update S_i^o(P) = argmin_{S∈𝒞} e(W_i, S)
7:   end for
8:   Update P = {W_1, …, W_M}
9: end while
10: S^o = {S_1^o(P), …, S_M^o(P)}

It can be shown that this algorithm always converges. However, the convergence may be to a local minimum instead of the global one. For this reason, a good initial partition is important. This initial partition can be supplied, for example, by some modified version of the algebraic methods described in the previous section.

There are many iterative algorithms for finding a solution to the subspace segmentation problem or some of its special cases (see, e.g., [86, 87]). Most of them iterate between partitioning the data and finding the union of subspaces that is consistent with the partition. The general algorithm described above (Algorithm 2) solves the subspace segmentation problem by searching for the minimizer of Problem 1.

Note that the cost functions e(W_i, S_i^o(P)) and e(W, S^o(P)) in the while loop of Algorithm 2 are those defined by (1) in Problem 1; each e(W_i, S_i^o(P)) corresponds to the case M = 1.

Step 2 in Algorithm 2 is problem dependent. For example, in the situation where ℋ = ℝ^D and 𝒞 is the set of subspaces of dimension no greater than n, Step 2 can be solved using the singular value decomposition (SVD). A similar algorithm works in a much more general context, as described in .
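A minimal sketch of Algorithm 2 for ℋ = ℝ^D with p = 2 and 𝒞 the subspaces of dimension at most d, using the SVD (Eckart-Young) for the fitting step. The initial partition is passed in explicitly, reflecting the point above that convergence is only to a local minimum; the data and sizes are made up:

```python
import numpy as np

def fit_subspace(X, d):
    """Best-fit d-dimensional subspace for the columns of X, via the SVD."""
    U = np.linalg.svd(X, full_matrices=False)[0]
    return U[:, :d]                                   # orthonormal basis

def k_subspaces(W, M, d, init, n_iter=100):
    """Alternate between fitting subspaces and reassigning points (Algorithm 2, p=2)."""
    D, N = W.shape
    labels = init.copy()
    for _ in range(n_iter):
        bases = [fit_subspace(W[:, labels == i], d) for i in range(M)]
        dist = np.array([[np.linalg.norm(W[:, j] - B @ (B.T @ W[:, j]))
                          for B in bases] for j in range(N)])
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break                                     # partition stable: converged
        labels = new
    return labels

# noise-free demo: two orthogonal 2-dim subspaces of R^6, 30 points each
rng = np.random.default_rng(0)
B1, B2 = np.eye(6)[:, :2], np.eye(6)[:, 3:5]
W = np.hstack([B1 @ rng.standard_normal((2, 30)),
               B2 @ rng.standard_normal((2, 30))])
init = np.array([0] * 30 + [1] * 30)
init[0], init[30], init[31] = 1, 0, 0                 # corrupt a nearly correct start
labels = k_subspaces(W, M=2, d=2, init=init)
```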

4. Motion Segmentation

The problem of motion segmentation has been described in Section 1.2.4. This problem is a special case of subspace segmentation in which ℋ = ℝ^D and 𝒞 is the family of subspaces of dimension no greater than 4. Many algorithms have been developed to solve this problem, such as the methods based on sparsity [10, 88–90], the algebraic methods [1, 12, 91], the statistical methods [76, 92–95], and the iterative methods [22, 86]. The most successful methods, however, are all based on spectral clustering or some related method [22, 34, 36, 96, 97]. The main idea is that a similarity matrix Ξ is used to describe the “connection” between the points. Once this similarity matrix is obtained, a classical clustering technique (such as k-means) is applied to a projection of the similarity matrix Ξ onto a low dimensional space (here “projection” is used loosely and is not necessarily an orthogonal projection). These methods are often tested and compared against the state of the art on the Hopkins 155 Dataset, which serves as a benchmark database for evaluating motion segmentation algorithms. It contains two- and three-motion sequences. Corner features extracted and tracked across the frames are provided along with the dataset, as are the ground truth segmentations for comparison. Figure 1 shows two samples from the dataset with the extracted features.

Samples from the Hopkins 155 Dataset.

4.1. Nearness to Local Subspace Algorithm

Since most spectral clustering algorithms share the same overall structure, we describe the Nearness to Local Subspace (NLS) algorithm, the best performing of the spectral clustering type methods as applied to the Hopkins 155 Dataset. Other spectral clustering based algorithms are discussed in Section 4.2.

The NLS method works whenever the dimensions of the subspaces are equal and known. First, a local subspace is estimated for each data point (vector). Then the distances between the local subspaces and the points are computed, yielding a distance matrix. This is followed by the construction of a binary similarity matrix Ξ, obtained by applying a data-driven threshold to the distance matrix. Finally, the segmentation problem is converted to a one-dimensional data clustering problem.

The algorithm for subspace segmentation is given in Algorithm 3. It assumes that the subspaces have dimension d (for motion segmentation, d=4). The details of the various steps are as follows.

<bold>Algorithm 3: </bold>Subspace segmentation.

Require: The D×N data matrix W whose columns are drawn from subspaces of dimension d

Ensure: Clustering of the feature points.

1: Compute the SVD of  W as in (8).

2: Estimate the rank of W (denoted by r) if it is not known. For example, using (9) or any other appropriate choice.

3: Compute (Vr)t consisting of the first r rows of Vt.

4: Normalize the columns of (Vr)t.

5: Replace the data matrix W with (Vr)t.

6: Find the angle between the column vectors of W and represent it as a matrix.

{i.e., arccos (WtW).}

7: Sort the angles and find the closest neighbors of column vector.

8: for all Column vector xi of W  do

9:  Find the local subspace for the set consisting of xi and k neighbors (see (10)).

{Theoretically, k is at least d-1. We can use the least square approximation for the subspace

(see the section Local Subspace Estimation). Let Ai denote the matrix whose columns form

an orthonormal bases for the local subspace associated with xi.}

10:end for

11: for i=1 to N do

12:  for j=1 to N do

13:   define H=(dij) with dij = (||xj - AiAit xj||p + ||xi - AjAjt xi||p)/2

14:  end for

15: end for{Build the distance matrix}

16: Sort the entries of the N×N matrix H from smallest to largest into the vector h, normalize h so that its entries lie in [0,1], and set the threshold η to the value of the Tth entry of the sorted and normalized vector h, where T is such that ||χ[T,N2] - h||2 is minimized, and where χ[T,N2] is the characteristic function of the discrete set [T,N2].

17: Construct a similarity matrix Ξ by setting all entries of H less than threshold η to 1 and by setting all

other entries to 0. {Build the binary similarity matrix}

18: Normalize the rows of Ξ using l1-norm.

19: Perform SVD Ξt = UnΣn(Vn)t.

20: Cluster the columns of Σn(Vn)t using k-means. Σn(Vn)t is the projection onto the span of Un.

Dimensionality Reduction and Normalization. A dimensionality reduction step is typical of subspace clustering algorithms, including those based on spectral clustering. Let W be a D×N data matrix whose columns are drawn from a union of subspaces, where each subspace has dimension at most d. The data W is possibly perturbed by noise and may have other imperfections. One way to reduce the dimensionality of the problem is to use the SVD. Specifically, compute the SVD of W(8)W=UΣVt, where U=[u1u2⋯uD] is a D×D matrix, V=[v1v2⋯vN] is an N×N matrix, and Σ is a D×N diagonal matrix with diagonal entries σ1,…,σl, where l=min{D,N}.

If the rank of the data W is not known, one can use the model selection algorithm  to estimate its rank r by (9) r = argminr ( σr+1² / ∑i=1..r σi² + κr ), where σj is the jth singular value and κ is a suitable constant. Another possible model selection algorithm can be found in . UrΣr(Vr)t is the best rank-r approximation of W=UΣVt, where Ur is the matrix consisting of the first r columns of U and (Vr)t consists of the first r rows of Vt. In the case of motion segmentation, if there are m independent motions across the frames captured by a moving camera, the rank of W is between 2(m+1) and 4(m+1).
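The rank-estimation rule (9) can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation; the constant κ (`kappa`) is a tuning parameter chosen arbitrarily here.

```python
import numpy as np

def estimate_rank(W, kappa=1e-2):
    """Model selection rule (9):
    r = argmin_r  sigma_{r+1}^2 / (sigma_1^2 + ... + sigma_r^2) + kappa * r."""
    sigma = np.linalg.svd(W, compute_uv=False)
    # sigma[r] is sigma_{r+1} in the 1-based notation of (9)
    costs = [sigma[r] ** 2 / np.sum(sigma[:r] ** 2) + kappa * r
             for r in range(1, len(sigma))]
    return int(np.argmin(costs)) + 1

# Synthetic 10 x 10 matrix with exactly 4 dominant singular values.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((10, 10)))
V, _ = np.linalg.qr(rng.standard_normal((10, 10)))
s = np.array([10., 9., 8., 7., 1e-8, 1e-8, 1e-8, 1e-8, 1e-8, 1e-8])
W = U @ np.diag(s) @ V.T
print(estimate_rank(W))  # -> 4
```

The sharp drop after the fourth singular value makes the cost in (9) smallest at r = 4; the κr term penalizes overestimating the rank when the trailing singular values are only approximately zero.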

To reduce the dimensionality of the data, replace the data matrix W with the matrix (Vr)t that consists of the first r rows of   Vt. This step is justified by the following proposition in .

Proposition 7.

Let A and B be m×n and n×k matrices, and let C=AB. Assume J ⊂ {1,2,…,k}.

If bi ∈ span{bj : j∈J}, then ci ∈ span{cj : j∈J}.

If A is full rank and m ≥ n, then bi ∈ span{bj : j∈J} ⟺ ci ∈ span{cj : j∈J}.
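A quick numerical check of the first implication of Proposition 7 (a hypothetical numpy example; the matrices are arbitrary):

```python
import numpy as np

# Proposition 7, first part: a linear relation among the columns of B
# transfers to the corresponding columns of C = AB.
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))    # m x n with m >= n, full rank generically
B = rng.standard_normal((3, 4))
B[:, 3] = 2 * B[:, 0] - B[:, 1]    # force b_4 into span{b_1, b_2}
C = A @ B
# c_4 satisfies exactly the same linear relation, since C[:,3] = A @ B[:,3]
same_relation = np.allclose(C[:, 3], 2 * C[:, 0] - C[:, 1])
print(same_relation)  # True
```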

It should also be noted that this step reduces additive noise as well, especially in the case of light-tailed noise, for example, Gaussian noise. The number of subspaces corresponds to the number of moving objects. Dimensionality reduction corresponds to Steps 1, 2, and 3 in Algorithm 3.

Another type of data reduction is normalization. Specifically, the columns of (Vr)t are normalized to lie on the unit sphere 𝕊r-1. Projecting the data onto the unit sphere effectively reduces the dimensionality of the data by one. Moreover, the normalization gives the columns of the data matrix equal contributions to the description of the subspaces. Note that the normalization can be done using the lp norms of the columns of (Vr)t. This normalization procedure corresponds to Steps 4 and 5 in Algorithm 3.
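Steps 1–5 of Algorithm 3 (SVD, truncation to (Vr)t, and column normalization) might be sketched as below; `reduce_and_normalize` is a hypothetical helper name, and only numpy is assumed.

```python
import numpy as np

def reduce_and_normalize(W, r):
    """Steps 1-5 of Algorithm 3: replace the D x N data matrix W by the
    column-normalized matrix (V_r)^t, the first r rows of V^t from the SVD."""
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    X = Vt[:r]                               # r x N: first r rows of V^t
    return X / np.linalg.norm(X, axis=0)     # columns on the unit sphere S^{r-1}

rng = np.random.default_rng(1)
W = rng.standard_normal((30, 20))            # D = 30 ambient, N = 20 points
X = reduce_and_normalize(W, r=4)
print(X.shape)                               # (4, 20)
print(np.allclose(np.linalg.norm(X, axis=0), 1.0))  # True
```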

Local Subspace Estimation. The data points (i.e., each column vector of (Vr)t) that are close to each other are likely to belong to the same subspace. For this reason, a local subspace is estimated for each data point using its closest neighbors. This can be done by generating a distance matrix (aij)=(||xi-xj||p) and then sorting each column of the distance matrix to find the neighbors of each xi, which is the ith column of (Vr)t.

Once the distance matrix between the points is generated, one can find, for each point xi, a set of k+1 ≥ d points {xi, xi1, …, xik} consisting of xi and its k closest neighbors. Then a d-dimensional subspace that is nearest (in the least-squares sense) to the data {xi, xi1, …, xik} is generated. This is accomplished using the SVD (10) X=[xi, xi1, …, xik]=AΣBt. Let Ai denote the matrix of the first d columns of A associated with xi. Then, the column space C(Ai) is the d-dimensional subspace nearest to {xi, xi1, …, xik}. Local subspace estimation corresponds to Steps 6 to 10 in Algorithm 3.
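Local subspace estimation (Steps 6–10) can be sketched as follows. `local_subspace_bases` is a hypothetical name, and a brute-force Euclidean distance matrix, as in the paragraph above, stands in for the angle sorting of Algorithm 3.

```python
import numpy as np

def local_subspace_bases(X, d, k):
    """For each column x_i of X, find its k nearest neighbors and fit the
    nearest d-dimensional subspace by SVD, as in (10). Returns a list of
    matrices A_i with d orthonormal columns."""
    N = X.shape[1]
    # distance matrix (a_ij) = ||x_i - x_j||
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    bases = []
    for i in range(N):
        nbrs = np.argsort(dist[:, i])[:k + 1]   # x_i itself plus k neighbors
        A, _, _ = np.linalg.svd(X[:, nbrs], full_matrices=False)
        bases.append(A[:, :d])                  # first d left singular vectors
    return bases

# Points on a single line in R^3: every local basis must span that line.
rng = np.random.default_rng(2)
u = np.array([1.0, 0.0, 0.0])
X = np.outer(u, rng.uniform(1, 2, size=8))
bases = local_subspace_bases(X, d=1, k=2)
print(all(np.allclose(np.abs(A.ravel()), u) for A in bases))  # True
```

Note that k = 2 satisfies the requirement k ≥ d − 1 with room to spare; in practice k is taken somewhat larger than d − 1 for robustness to noise.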

Construction of Binary Similarity Matrix. So far, we have associated a local subspace Si to each point xi. Ideally, the points and only those points that belong to the same subspace as xi should have zero distance from Si. This suggests computing the distance of each point xj to the local subspace Si and forming a distance matrix H.

The distance matrix H is generated as H=(dij) with dij = (||xj - AiAit xj||p + ||xi - AjAjt xi||p)/2, where AiAit is the orthogonal projection onto the local subspace associated with xi. A convenient choice of p is 2. Note that as dij decreases, the probability of having xj on the same subspace as xi increases. Moreover, for p=2, ||xj - AiAit xj||2 is the Euclidean distance of xj to the subspace associated with xi.
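The distance matrix H might be computed as below (p = 2). This is a numpy sketch assuming each Ai has orthonormal columns, so that AiAit is the orthogonal projection onto the local subspace; the function name is hypothetical.

```python
import numpy as np

def distance_matrix(X, bases):
    """H = (d_ij), d_ij = ( ||x_j - A_i A_i^t x_j|| + ||x_i - A_j A_j^t x_i|| ) / 2,
    where A_i A_i^t projects onto the local subspace of x_i (p = 2)."""
    N = X.shape[1]
    R = np.empty((N, N))
    for i, A in enumerate(bases):
        # R[i, j] = distance of x_j to the local subspace of x_i
        R[i] = np.linalg.norm(X - A @ (A.T @ X), axis=0)
    return (R + R.T) / 2.0                      # symmetrize as in the definition

# Two lines in R^2: same-line points have zero mutual distance.
X = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 3.0]])
bases = [np.array([[1.0], [0.0]])] * 2 + [np.array([[0.0], [1.0]])] * 2
H = distance_matrix(X, bases)
print(np.isclose(H[0, 1], 0.0), H[0, 2] > 0)   # True True
```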

Since we are not in the ideal case, a point xj that belongs to the same subspace as xi may have nonzero distance to Si. However, this distance is likely to be small compared to the distance between xj and Sk when xj and xk do not belong to the same subspace. This suggests computing a threshold that distinguishes between these two cases and transforms the distance matrix into a binary matrix in which a one in the (i,j) entry means that xi and xj are likely to belong to the same subspace, whereas a zero in the (i,j) entry means that they are not.

To do this, we convert the distance matrix H=(dij)N×N into a binary similarity matrix Ξ=(sij). This is done by applying a data-driven thresholding as follows.

Create a vector h that contains the entries of the N×N matrix H sorted from smallest to largest. Scale h so that its smallest value is zero and its largest value is one.

Set the threshold η to the value of the Tth entry of the sorted vector h, where T is such that ||χ[T,N2] - h||2 is minimized, and where χ[T,N2] is the characteristic function of the discrete set [T,N2]. If the number of points in each subspace is approximately equal, then we would expect N/n points in each subspace, hence N2/n2 small entries (ideally zero entries) per subspace, that is, N2/n small entries in H overall. However, this may not be the case in general. For this reason, we compute the data-driven threshold η that distinguishes the small entries from the large ones. The data-driven threshold η is chosen according to the method described in .

Create a similarity matrix Ξ from H such that all entries of H less than the threshold η are set to 1 and the others are set to 0.
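The three steps above (sorting, data-driven threshold, binarization) can be sketched as follows; `binary_similarity` is a hypothetical name, and the brute-force minimization over T is for illustration only.

```python
import numpy as np

def binary_similarity(H):
    """Data-driven threshold: pick T minimizing || chi_[T, N^2] - h ||_2,
    then set entries of H below the threshold to 1 and the rest to 0."""
    h = np.sort(H.ravel())
    h = (h - h[0]) / (h[-1] - h[0])          # scale to [0, 1]
    M = h.size                               # M = N^2
    # chi_[T, N^2] is 0 before index T and 1 from T on
    costs = [np.sum(h[:T] ** 2) + np.sum((1 - h[T:]) ** 2) for T in range(M)]
    eta = h[int(np.argmin(costs))]
    Hs = (H - H.min()) / (H.max() - H.min()) # same [0, 1] scale as h
    return (Hs < eta).astype(int)

# Block-structured distances: two clusters of two points each.
H = np.array([[0.0, 0.1, 1.0, 0.9],
              [0.1, 0.0, 0.8, 1.0],
              [1.0, 0.8, 0.0, 0.1],
              [0.9, 1.0, 0.1, 0.0]])
print(binary_similarity(H))
```

On this toy H the minimizing T falls exactly at the gap between the small (within-cluster) and large (between-cluster) distances, so the output is the ideal block-diagonal pattern of ones.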

Segmentation. The last step is to use the similarity matrix Ξ to segment the data. To do this, we first normalize the rows of Ξ using the l1-norm, that is, Ξ~=D-1Ξ, where D is the diagonal matrix with entries dii=∑j=1Nsij. Ξ~ is related to the random walk Laplacian Lr by Ξ~=I-Lr . Other lp normalizations are possible for p≥1; however, because of the geometry of the l1 ball, l1-normalization brings outliers closer to the cluster clouds (the distances of outliers decrease monotonically as p decreases to 1). Since the SVD (which will be used next) is associated with l2 minimization, it is sensitive to outliers. Therefore, l1 normalization works best when the SVD is used.

Observe that the initial data segmentation problem has now been converted to the segmentation of n one-dimensional subspaces from the rows of Ξ~. This is because, in the ideal case, by the construction of Ξ~, if xi and xj are in the same subspace, then the ith and jth rows of Ξ~ are equal. Since there are n subspaces, there are n such one-dimensional subspaces.

Now, the problem is again a subspace segmentation problem, but this time the data matrix is Ξ~ with each row as a data point. Also, each subspace is one-dimensional and there are n subspaces. Therefore, we can apply the SVD again to obtain (11) Ξ~t=UnΣn(Vn)t. Using Proposition 7, it can be shown that Σn(Vn)t can replace Ξ~t, and we cluster the columns of Σn(Vn)t, which is the projection of Ξ~t onto the span of Un. Since the problem is now only the segmentation of subspaces of dimension 1, we can use any traditional clustering algorithm, such as k-means, to cluster the data points. The segmentation corresponds to Steps 18 to 20 in Algorithm 3.
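Steps 18–20 might look as follows. `segment` is a hypothetical name, and a few plain Lloyd iterations with farthest-point initialization stand in for a library k-means.

```python
import numpy as np

def segment(Xi, n, iters=20):
    """Steps 18-20 of Algorithm 3: l1-normalize the rows of the binary
    similarity matrix Xi, project via the SVD of Xi~^t, then cluster."""
    Xi_n = Xi / Xi.sum(axis=1, keepdims=True)   # Xi~ = D^{-1} Xi
    U, S, Vt = np.linalg.svd(Xi_n.T, full_matrices=False)
    Y = (np.diag(S[:n]) @ Vt[:n]).T             # columns of Sigma_n (V_n)^t as points
    # farthest-point initialization, then Lloyd iterations (stand-in for k-means)
    centers = [Y[0]]
    for _ in range(1, n):
        dists = np.min([np.linalg.norm(Y - c, axis=1) for c in centers], axis=0)
        centers.append(Y[int(np.argmax(dists))])
    centers = np.stack(centers)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(Y[:, None] - centers[None], axis=2), axis=1)
        centers = np.stack([Y[labels == c].mean(axis=0) for c in range(n)])
    return labels

# Ideal binary similarity matrix for two clusters of two points each.
Xi = np.array([[1., 1., 0., 0.],
               [1., 1., 0., 0.],
               [0., 0., 1., 1.],
               [0., 0., 1., 1.]])
labels = segment(Xi, n=2)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
```

In the ideal case, rows of Ξ~ from the same cluster are identical, so the projected points coincide within each cluster and any reasonable k-means variant recovers the segmentation.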

4.2. Other Spectral Clustering Methods

Other subspace clustering methods use essentially the same general algorithm as above; the main difference is the construction of the similarity matrix Ξ. For example, Yan and Pollefeys' method estimates a subspace Si for each point wi and then uses the chordal distance between the local subspaces to construct a similarity matrix Ξ. The algorithm of Elhamifar and Vidal [88, 89] computes a similarity matrix based on sparse representations of the data W. The sparse representations are found using the standard l1 minimization techniques of compressed sampling. We have tested these algorithms using l2 minimization and found that both cases produce essentially the same results. Thus, it is our conclusion that the spectral clustering performed on the similarity matrix Ξ is the main reason for the good performance of this and other related algorithms.

4.3. Comparison of Motion Segmentation Algorithms

Tables 1, 2, and 3 display some of the experimental results for the Hopkins 155 Dataset. Seven approaches are compared: (1) GPCA , (2) RANSAC , (3) local subspace affinity (LSA) , (4) multistage learning (MSL) , (5) agglomerative lossy compression (ALC) , (6) sparse subspace clustering (SSC) , and (7) NLS. An evaluation of those algorithms is presented in , with a minor error in the tabulated results for the articulated three-motion case of SSC-N. SSC-B and SSC-N correspond to Bernoulli and normal random projections, respectively . Table 1 displays the misclassification rates for the video sequences with two motions, Table 2 shows the misclassification rates for the sequences with three motions, and Table 3 presents the misclassification rates for all of the video sequences. It can be seen that the NLS algorithm outperforms all of the other algorithms.

Percentage of classification errors for sequences with two motions.

Checker (78) GPCA LSA RANSAC MSL ALC SSC-B SSC-N NLS
Average 6.09% 2.57% 6.52% 4.46% 1.55% 0.83% 1.12% 0.23%
Median 1.03% 0.27% 1.75% 0.00% 0.29% 0.00% 0.00% 0.00%

Traffic (31)

Average 1.41% 5.43% 2.55% 2.23% 1.59% 0.23% 0.02% 1.40%
Median 0.00% 1.48% 0.21% 0.00% 1.17% 0.00% 0.00% 0.00%

Articulated (11)

Average 2.88% 4.10% 7.25% 7.23% 10.70% 1.63% 0.62% 1.77%
Median 0.00% 1.22% 2.64% 0.00% 0.95% 0.00% 0.00% 0.88%

All (120 seq)

Average 4.59% 3.45% 5.56% 4.14% 2.40% 0.75% 0.82% 0.57%
Median 0.38% 0.59% 1.18% 0.00% 0.43% 0.00% 0.00% 0.00%

Percentage of classification errors for sequences with three motions.

Checker (26) GPCA LSA RANSAC MSL ALC SSC-B SSC-N NLS
Average 31.95% 5.80% 25.78% 10.38% 5.20% 4.49% 2.97% 0.87%

Traffic (7)

Average 19.83% 25.07% 12.83% 1.80% 7.75% 0.61% 0.58% 1.86%
Median 19.55% 23.79% 11.45% 0.00% 0.49% 0.00% 0.00% 1.53%

Articulated (2)

Average 16.85% 7.25% 21.38% 2.71% 21.08% 1.60% 1.60% 5.12%
Median 16.85% 7.25% 21.38% 2.71% 21.08% 1.60% 1.60% 5.12%

All (35 seq)

Average 28.66% 9.73% 22.94% 8.23% 6.69% 3.55% 2.45% 1.31%
Median 28.26% 2.33% 22.03% 1.76% 0.67% 0.25% 0.20% 0.45%

Percentage of classification errors for all sequences.

All (155 seq) GPCA LSA RANSAC MSL ALC SSC-B SSC-N NLS
Average 10.34% 4.94% 9.76% 5.03% 3.56% 1.45% 1.24% 0.76%
Median 2.54% 0.90% 3.21% 0.00% 0.50% 0.00% 0.00% 0.20%
Acknowledgment

This research is supported in part by NSF Grant DMS-110863.

Aldroubi A. Sekmen A. Nearness to local subspace algorithm for subspace and motion segmentation IEEE Signal Processing Letters 2012 19 10 704 707 6275471 10.1109/LSP.2012.2214211 Aldroubi A. Cabrelli C. Molter U. Optimal non-linear models for sparsity and sampling Journal of Fourier Analysis and Applications 2008 14 5-6 793 812 2-s2.0-57349099263 10.1007/s00041-008-9040-2 Ma Y. Yang A. Y. Derksen H. Fossum R. M. Estimation of subspace arrangements with applications in modeling and segmenting mixed data SIAM Review 2008 50 3 413 458 2-s2.0-50949096624 10.1137/060655523 Kriegel H.-P. Kroeger P. Zimek A. Subspace clustering WIREs Data Mining and Knowledge Discovery 2012 2 351 364 Nitzan S. Olevskii A. Revisiting Landau's density theorems for Paley-Wiener spaces Comptes Rendus Mathématique 2012 350 9-10 509 512 10.1016/j.crma.2012.05.003 Vidal R. Ma Y. Sastry S. Generalized principal component analysis IEEE Transactions on Pattern Analysis and Machine Intelligence 2005 27 12 1 15 Chen G. Little A. V. Maggioni M. Rosasco L. Cohen J. Zayed A. I. Some recent advances in multiscale geometric analysis of point clouds Wavelets and Multiscale Analysis 2011 Boston, Mass, USA Birkhäuser 199 225 10.1007/978-0-8176-8095-4_10 Vidal R. Subspace clustering IEEE Signal Processing Magazine 2011 28 3 52 68 Lyubarskii Y. Madych W. R. Interpolation of functions from generalized Paley-Wiener spaces Journal of Approximation Theory 2005 133 2 251 268 2-s2.0-14944365534 10.1016/j.jat.2004.12.014 Sugaya Y. Kanatani K. Improved multistage learning for multibody segmentation, in Proceedings of the 5th International Conference on Computer Vision Theory and Applications (VISAPP '10) 2010 199 206 Hu R. Fan L. Liu L. Co-segmentation of 3d shapes via subspace clustering Computer Graphics Forum 2012 31 1703 1713 Vidal R. Ma Y. Sastry S. Generalized principal component analysis (GPCA) IEEE Transactions on Pattern Analysis and Machine Intelligence 2005 27 1945 1959 Basri R. Jacobs D. W. 
Lambertian reflectance and linear subspaces IEEE Transactions on Pattern Analysis and Machine Intelligence 2003 25 2 218 233 2-s2.0-0037328517 10.1109/TPAMI.2003.1177153 Ho J. Yang M. H. Lim J. Lee K. C. Kriegman D. Clustering appearances of objects under varying illumination conditions Proceedings of the Computer Society Conference on Computer Vision and Pattern Recognition June 2003 I/11 I/18 2-s2.0-0042440805 Aldroubi A. Gröchenig K. Nonuniform sampling and reconstruction in shift-invariant spaces SIAM Review 2001 43 4 585 620 10.1137/S0036144501386986 Allard W. K. Chen G. Maggioni M. Multi-scale geometric methods for data sets II: geometric multi-resolution analysis Applied and Computational Harmonic Analysis 2012 32 3 435 462 10.1016/j.acha.2011.08.001 Soltanolkotabi M. Candés E. J. A geometric analysis of subspace clustering with outliers Annals of Statistics 2012 40 4 2195 2238 Sugaya Y. Kanatani K. Multi-stage optimization for multi-body motion segmentation Proceedings of IEICE Transactions on Information and Systems 2004 1935 1942 Szlam A. D. Maggioni M. Coifman R. R. Regularization on graphs with function-adapted diffusion processes Journal of Machine Learning Research 2008 9 1711 1739 2-s2.0-50949129496 Aldroubi A. Tessera R. On the existence of optimal unions of subspaces for data modeling and clustering Foundations of Computational Mathematics 2011 11 3 363 379 2-s2.0-79955688807 10.1007/s10208-011-9086-4 Petrik M. An analysis of laplacian methods for value function approximation in mdps Proceedings of the 20th International Joint Conference on Artificial Intelligence 2007 Morgan Kaufmann 2574 Aldroubi A. Sekmen A. Reduced row echelon form and non-linear approximation for subspace segmentation and high-dimensional data clustering submitted to Applied and Computational Harmonic Analysis Tron R. Vidal R. 
A benchmark for the comparison of 3-D motion segmentation algorithms Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07) June 2007 2-s2.0-34948881815 10.1109/CVPR.2007.382974 Eckart C. Young G. The approximation of one matrix by another of lower rank Psychometrika 1936 1 3 211 218 2-s2.0-0000802374 10.1007/BF02288367 Unser M. Aldroubi A. Eden M. B-spline signal processing. Part I. Theory IEEE Transactions on Signal Processing 1993 41 2 821 833 2-s2.0-0027541354 10.1109/78.193220 Unser M. Aldroubi A. Eden M. B-spline signal processing. Part II. Efficient design and applications IEEE Transactions on Signal Processing 1993 41 2 834 848 2-s2.0-0027542457 10.1109/78.193221 Chen G. Lerman G. Foundations of a multi-way spectral clustering framework for Hybrid Linear Modeling Foundations of Computational Mathematics 2009 9 5 517 558 2-s2.0-70349452039 10.1007/s10208-009-9043-7 Liu G. Lin Z. Yu Y. Robust subspace segmentation by low-rank representation Proceedings of 27th International Conference on Machine Learning (ICML '10) June 2010 663 670 2-s2.0-77956529193 De La Torre F. Black M. J. A framework for robust subspace learning International Journal of Computer Vision 2003 54 1–3 117 142 2-s2.0-0141742284 Candès E. Romberg J. Sparsity and incoherence in compressive sampling Inverse Problems 2007 23 3 969 985 10.1088/0266-5611/23/3/008 Haro G. Randall G. Sapiro G. Translated poisson mixture model for stratification learning International Journal of Computer Vision 2008 80 3 358 374 2-s2.0-52449104800 10.1007/s11263-008-0144-6 Eldar Y. C. Mishali M. Robust recovery of signals from a structured union of subspaces IEEE Transactions on Information Theory 2009 55 11 5302 5316 2-s2.0-70350743173 10.1109/TIT.2009.2030471 Kang S. Kwon K. H. Recovery of missing samples for oversampling in shift invariant spaces Journal of Mathematical Analysis and Applications 2012 391 1 139 146 10.1016/j.jmaa.2012.02.024 Yan J. Pollefeys M. 
A general framework for motion segmentation: independent, articulated, rigid, non-rigid, degenerate and nondegenerate Proceedings of the 9th European Conference on Computer Vision 2006 94 106 Rao S. R. Yang A. Y. Sastry S. S. Ma Y. Robust algebraic segmentation of mixed rigid-body and planar motions from two views International Journal of Computer Vision 2010 88 3 425 446 2-s2.0-77951256374 10.1007/s11263-009-0314-1 Lin T. Zha H. Riemannian manifold learning IEEE Transactions on Pattern Analysis and Machine Intelligence 2008 30 796 809 Bregler C. Hertzmann A. Biermann H. Recovering non-rigid 3D shape from image streams Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR '00) June 2000 690 696 2-s2.0-0033697408 Brand M. Morphable 3D models from video Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition December 2001 II456 II463 2-s2.0-0035693930 Candès E. J. The restricted isometry property and its implications for compressed sensing Comptes Rendus Mathématique 2008 346 9-10 589 592 10.1016/j.crma.2008.03.014 Candès E. J. Romberg J. Tao T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information IEEE Transactions on Information Theory 2006 52 2 489 509 2-s2.0-31744440684 10.1109/TIT.2005.862083 Foucart S. A note on guaranteed sparse recovery via 1-minimization Applied and Computational Harmonic Analysis 2010 29 1 97 103 2-s2.0-77955281521 10.1016/j.acha.2009.10.004 Boufounos P. Kutyniok G. Rauhut H. Sparse recovery from combined fusion frame measurements IEEE Transactions on Information Theory 2011 57 6 3864 3876 2-s2.0-79957632493 10.1109/TIT.2011.2143890 Aldroubi A. Chen X. Powell A. M. Perturbations of measurement matrices and dictionaries in compressed sensing Applied and Computational Harmonic Analysis 2012 33 2 282 291 10.1016/j.acha.2011.12.002 Sun Q. 
Recovery of sparsest signals via l(q)-minimization Applied and Computational Harmonic Analysis 2012 32 3 329 341 Aharon M. Elad M. Bruckstein A. K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation IEEE Transactions on Signal Processing 2006 54 11 4311 4322 2-s2.0-33750383209 10.1109/TSP.2006.881199 Aharon M. Elad M. Bruckstein A. M. On the uniqueness of overcomplete dictionaries, and a practical way to retrieve them Linear Algebra and Its Applications 2006 416 1 48 67 2-s2.0-33646712150 10.1016/j.laa.2005.06.035 Lerman G. Zhang T. Robust recovery of multiple subspaces by geometric lp minimization The Annals of Statistics 2011 39 5 2686 2715 10.1214/11-AOS914 Archambeau C. Delannay N. Verleysen M. Mixtures of robust probabilistic principal component analyzers Neurocomputing 2008 71 7-9 1274 1282 2-s2.0-40649094216 10.1016/j.neucom.2007.11.029 Lu Y. M. Do M. N. A theory for sampling signals from a union of subspaces IEEE Transactions on Signal Processing 2008 56 6 2334 2345 10.1109/TSP.2007.914346 Jones P. W. Maggioni M. Schul R. Manifold parametrizations by eigenfunctions of the Laplacian and heat kernels Proceedings of the National Academy of Sciences of the United States of America 2008 105 6 1803 1808 10.1073/pnas.0710175104 Wu Q. Guinney J. Maggioni M. Mukherjee S. Learning gradients: predictive models that infer geometry and statistical dependence Journal of Machine Learning Research 2010 11 2175 2198 2-s2.0-77956910720 Zhang J. Huang H. Wang J. Manifold learning for visualizing and analyzing high-dimensional data IEEE Intelligent Systems 2010 25 4 54 61 5401149 10.1109/MIS.2010.8 Lauer F. Schnorr C. Spectral clustering of linear subspaces for motion segmentation Proceedings of IEEE International Conference on Computer Vision 2009 Aldroubi A. Unser M. 
Sampling procedures in function spaces and asymptotic equivalence with Shannon's sampling theory Numerical Functional Analysis and Optimization 1994 15 1-2 1 21 10.1080/01630569408816545 Unser M. Sampling—50 years after Shannon Proceedings of the IEEE 2000 88 4 569 587 2-s2.0-0034171723 10.1109/5.843002 Aldroubi A. Non-uniform weighted average sampling and reconstruction in shift-invariant and wavelet spaces Applied and Computational Harmonic Analysis 2002 13 2 151 161 2-s2.0-0242563915 10.1016/S1063-5203(02)00503-1 Anastasio M. Cabrelli C. Sampling in a union of frame generated subspaces Sampling Theory in Signal and Image Processing 2009 8 3 261 286 2-s2.0-76849116987 Xian J. Sun W. Local sampling and reconstruction in shift-invariant spaces and their applications in spline subspaces Numerical Functional Analysis and Optimization 2010 31 3 366 386 2-s2.0-77951988304 10.1080/01630561003760128 Bhandari A. Zayed A. I. Shift-invariant and sampling spaces associated with the fractional Fourier transform domain IEEE Transactions on Signal Processing 2012 60 4 1627 1637 10.1109/TSP.2011.2177260 Kushnir D. Galun M. Brandt A. Fast multiscale clustering and manifold identification Pattern Recognition 2006 39 10 1876 1891 2-s2.0-33745422982 10.1016/j.patcog.2006.04.007 Ericsson S. Generalized sampling in shift invariant spaces with frames Acta Mathematica Sinica 2012 28 9 1823 1844 10.1007/s10114-012-1235-4 Maravić I. Vetterli M. Sampling and reconstruction of signals with finite rate of innovation in the presence of noise IEEE Transactions on Signal Processing 2005 53 8 2788 2805 10.1109/TSP.2005.850321 Hogan J. A. Frame-based nonuniform sampling in Paley-Wiener spaces Journal of Applied Functional Analysis 2007 2 4 361 400 Boche H. Mönich U. J. 
There exists no globally uniformly convergent reconstruction for the Paley-Wiener space PW1/π of bandlimited functions sampled at Nyquist rate IEEE Transactions on Signal Processing 2008 56 7 3170 3179 10.1109/TSP.2008.920490 Boche H. Mönich U. J. Unboundedness of thresholding and quantization for bandlimited signals Signal Processing 2012 92 12 2821 2829 10.1016/j.sigpro.2012.04.003 Rao S. Tron R. Vidal R. Ma Y. Motion segmentation in the presence of outlying, incomplete, or corrupted trajectories IEEE Transactions on Pattern Analysis and Machine Intelligence 2010 32 10 1832 1845 2-s2.0-77956034602 10.1109/TPAMI.2009.191 Bailey B. A. Multivariate polynomial interpolation and sampling in Paley-Wiener spaces Journal of Approximation Theory 2012 164 4 460 487 10.1016/j.jat.2011.12.004 Aldroubi A. Cabrelli C. Hardin D. Molter U. Optimal shift invariant spaces and their Parseval frame generators Applied and Computational Harmonic Analysis 2007 23 2 273 283 2-s2.0-34547603944 10.1016/j.acha.2007.05.001 Vetterli M. Marziliano P. Blu T. Sampling signals with finite rate of innovation IEEE Transactions on Signal Processing 2002 50 6 1417 1428 2-s2.0-0036612082 10.1109/TSP.2002.1003065 Dragotti P. L. Vetterli M. Blu T. Sampling moments and reconstructing signals of finite rate of innovation: shannon meets strang-fix IEEE Transactions on Signal Processing 2007 55 5 1741 1757 2-s2.0-34247887411 10.1109/TSP.2006.890907 Tan V. Y. F. Goyal V. K. Estimating signals with finite rate of innovation from noisy samples: a stochastic algorithm IEEE Transactions on Signal Processing 2008 56 10 5135 5146 2-s2.0-54749117487 10.1109/TSP.2008.928510 Bi N. Nashed M. Z. Sun Q. Reconstructing signals with finite rate of innovation from noisy samples Acta Applicandae Mathematicae 2009 107 1–3 339 372 2-s2.0-67650869601 10.1007/s10440-009-9474-9 Berent J. Dragotti P. L. Blu T. 
Sampling piecewise sinusoidal signals with finite rate of innovation methods IEEE Transactions on Signal Processing 2010 58 2 613 625 2-s2.0-74949112085 10.1109/TSP.2009.2031717 Kanade T. Morris D. D. Factorization methods for structure from motion The Royal Society of London. Philosophical Transactions A 1998 356 1740 1153 1173 10.1098/rsta.1998.0215 Costeira J. P. Kanade T. A multibody factorization method for independently moving objects International Journal of Computer Vision 1998 29 3 159 179 2-s2.0-0032154138 Torr P. H. S. Geometric motion segmentation and model selection The Royal Society of London. Philosophical Transactions A 1998 356 1740 1321 1340 10.1098/rsta.1998.0224 Goh A. Vidal R. Segmenting motions of different types by unsupervised manifold clustering 6 Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07) June 2007 2-s2.0-34948906262 10.1109/CVPR.2007.383235 Smale S. Zhou D. X. Shannon sampling II: connections to learning theory Applied and Computational Harmonic Analysis 2005 19 3 285 302 2-s2.0-27844555491 10.1016/j.acha.2005.03.001 Johnson W. Linderstrauss J. Extensions of lipshitz mapping into hilbert space Contemporary Mathematics 1984 26 189 206 Achlioptas D. Database-friendly random projections: Johnson-Lindenstrauss with binary coins Journal of Computer and System Sciences 2003 66 4 671 687 2-s2.0-0038166193 10.1016/S0022-0000(03)00025-4 Silva N. Costeira J. Subspace segmentation with outliers: a grassmannian approach to the maximum consensus subspace Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2008 Arriaga R. I. Vempala S. Algorithm theory of learning: robust concepts and random projection Proceedings of IEEE 40th Annual Conference on Foundations of Computer Science October 1999 616 623 2-s2.0-0033354520 Aldroubi A. Anastasio M. Cabrelli C. Molter U. 
A dimension reduction scheme for the computation of optimal unions of subspaces Sampling Theory in Signal and Image Processing 2011 10 1-2 135 150 Gear C. W. Multibody grouping from motion images International Journal of Computer Vision 1998 29 2 133 150 2-s2.0-0032132014 Vidal R. Tron R. Hartley R. Multiframe motion segmentation with missing data using PowerFactorization and GPCA International Journal of Computer Vision 2008 79 1 85 105 2-s2.0-42549164992 10.1007/s11263-007-0099-z Tseng P. Nearest q-flat to m points Journal of Optimization Theory and Applications 2000 105 1 249 252 2-s2.0-0034349278 Bradley P. S. Mangasarian O. L. k-plane clustering Journal of Global Optimization 2000 16 1 23 32 2-s2.0-0010474110 Elhamifar E. Vidal R. Sparse subspace clustering Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09) June 2009 2790 2797 2-s2.0-70450184118 10.1109/CVPRW.2009.5206547 Elhamifar E. Vidal R. Clustering disjoint subspaces via sparse representation Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '10) March 2010 1926 1929 2-s2.0-78049388384 10.1109/ICASSP.2010.5495317 Elhamifar E. Vidal R. Sparse subspace clustering: algorithm, theory, and applications http://arxiv.org/abs/1203.1005 Sekmen A. Susbpace Segmentation 2012 Vanderbilt Faugeras O. Torr P. Kanade T. Hollinghurst N. Lasenby J. Sabin M. Fitzgibbon A. Geometric motion segmentation andmodel selection—discussion Philosophical Transactions of the Royal Society A 1998 356 1338 1340 Gruber A. Weiss Y. Multibody factorization with uncertainty and missing data using the EM algorithm Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04) July 2004 I707 I714 2-s2.0-5044230648 Candillier L. Tellier I. Torre F. Bousquet O. SSC: statistical subspace clustering 5mes Journes d’Extraction et Gestion des Connaissances (EGC '05) 2005 Paris, France 177 182 Smale S. 
Zhou D.-X. Learning theory estimates via integral operators and their approximations Constructive Approximation 2007 26 2 153 172 10.1007/s00365-006-0659-y Chen G. Atev S. Lerman G. Kernel spectral curvature clustering (KSCC) Proceedings of IEEE 12th International Conference on Computer Vision Workshops (ICCV '09) October 2009 765 772 2-s2.0-77953199244 10.1109/ICCVW.2009.5457627 Zhang T. Szlam A. Wang Y. Lerman G. Randomized hybrid linear modeling by local best-fit flats Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10) June 2010 1927 1934 2-s2.0-77955989449 10.1109/CVPR.2010.5539866 Zappella L. Lladó X. Provenzi E. Salvi J. Enhanced Local Subspace Affinity for feature-based motion segmentation Pattern Recognition 2011 44 2 454 470 2-s2.0-77958006812 10.1016/j.patcog.2010.08.015 Fischler M. A. Bolles R. C. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography Communications of the ACM 1981 24 6 381 395 2-s2.0-0019574599 10.1145/358669.358692 Sarlós T. Improved approximation algorithms for large matrices via random projections Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06) October 2006 143 152 2-s2.0-35348901208 10.1109/FOCS.2006.37