A Complete Subspace Analysis of Linear Discriminant Analysis and Its Robust Implementation

Linear discriminant analysis has been widely studied in data mining and pattern recognition. However, when performing the eigen-decomposition on the matrix pair (within-class scatter matrix and between-class scatter matrix), one can find in some cases that degenerated eigenvalues exist, resulting in indistinguishability of information within the eigen-subspace corresponding to a degenerated eigenvalue. In order to address this problem, we revisit linear discriminant analysis in this paper and propose a stable and effective algorithm for linear discriminant analysis in terms of an optimization criterion. By discussing the properties of the optimization criterion, we find that the eigenvectors in some eigen-subspaces may be indistinguishable if a degenerated eigenvalue occurs. Inspired by the idea of the maximum margin criterion (MMC), we embed MMC into the eigen-subspace corresponding to the degenerated eigenvalue to exploit the discriminability of the eigenvectors in that eigen-subspace. Since the proposed algorithm can deal with the degenerated case of eigenvalues, it not only handles the small-sample-size problem but also enables us to select projection vectors from the null space of the between-class scatter matrix. Extensive experiments on several face image and microarray data sets are conducted to evaluate the proposed algorithm in terms of classification performance, and the experimental results show that our method has smaller standard deviations than other methods in most cases.


Introduction
Linear discriminant analysis (LDA) [1][2][3][4] plays an important role in data analysis and has been widely used in many fields such as data mining and pattern recognition [5]. The main aim of LDA is to find optimal projection vectors by simultaneously minimizing the within-class distance and maximizing the between-class distance in the projected space; the optimal projection vectors can be obtained by solving a generalized eigenvalue problem. In classical LDA, the within-class scatter matrix is generally required to be nonsingular. However, in many applications such as text classification and face recognition [6], the within-class scatter matrix is often singular since the dimension of the data is much bigger than the number of data points. This is known as the small-sample-size (SSS) problem.
In the past several decades, various variants of LDA [7][8][9][10] have been proposed to address high-dimensional data and the SSS problem. Most LDA-based methods can be divided into four categories according to which combination of the spaces of the within-class and between-class scatter matrices they use [11].
The first category considers the range space of the within-class scatter matrix and the range space of the between-class scatter matrix. A typical algorithm of this category is the Fisherface method [1], where PCA is first employed to reduce the feature dimension so that the within-class scatter matrix becomes full rank, and then standard LDA is performed. In the direct LDA method [12], the null space of the between-class scatter matrix is first removed, and then the projection vectors are obtained by minimizing the within-class scatter in the range space of the between-class scatter matrix. Li et al. [13] proposed an efficient and stable algorithm to extract the discriminant vectors by defining the maximum margin criterion (MMC). The main difference between Fisher's criterion and MMC is that the former maximizes the Fisher quotient while the latter maximizes the average distance.
The second category mainly exploits the null space of the within-class scatter matrix and the range space of the between-class scatter matrix. In terms of null space-based LDA, Chen et al. [14] proposed to maximize the between-class scatter in the null space of the within-class scatter matrix; their method is referred to as NLDA. In order to reduce the computational cost of calculating the null space of the within-class scatter matrix, several effective methods have been proposed. Instead of directly obtaining the null space of the within-class scatter matrix, Çevikalp et al. [15] first obtained the range space of the within-class scatter matrix and then defined the scatter matrix of common vectors. Based on this, the projection vectors were obtained from the scatter matrix they defined. They also adopted difference subspaces and the Gram-Schmidt orthogonalization procedure to obtain discriminative common vectors. Chu and Thye [16] adopted the QR factorization on several matrices to develop a new algorithm for the null space-based LDA method. Sharma and Paliwal [17] proposed an alternative null LDA method and discussed its fast implementation. Paliwal and Sharma [18] also developed a variant of pseudoinverse linear discriminant analysis, and this method yields better classification performance.
The third category consists of methods that make use of the null space of the within-class scatter matrix, the range space of the between-class scatter matrix, and the range space of the within-class scatter matrix. Sharma et al. [19] applied an improved RLDA to devise a feature selection method that extracts important genes. In order to address the problem of the regularization parameter in RLDA, Sharma and Paliwal [20] applied a deterministic method to estimate the parameter by maximizing a modified Fisher criterion.
The fourth category is made up of methods that explore all the spaces of the within-class scatter matrix and the between-class scatter matrix. Sharma and Paliwal [11] applied a two-stage technique to regularize both the between-class scatter and within-class scatter matrices to achieve the discriminant information.
In addition, there are other variants of LDA that do not belong to the four categories mentioned above. Uncorrelated local Fisher discriminant analysis, based on manifold learning, has been devised for ear recognition [21]. An exponential locality preserving projection (ELPP) has been presented, which introduces the matrix exponential to address the SSS problem. A double shrinking model [22] has been constructed for manifold learning and feature selection. Li et al. [23] analyzed linear discriminant analysis in the worst case and reduced this problem to a scalable semidefinite feasibility problem. Zollanvari and Dougherty [24] discussed an asymptotic generalization bound of linear discriminant analysis. Lu and Renals [25] used probabilistic linear discriminant analysis to model acoustic data.
In this paper, we revisit the optimization criterion for linear discriminant analysis. We find that a degenerated case exists for some generalized eigenvalues. In order to deal with the degeneration of eigenvalues, we develop a robust implementation of this criterion. To be specific, the null space of the total scatter matrix is first removed to remedy the singularity problem. Then the eigen-subspace corresponding to each specific eigenvalue is obtained. Finally, in each eigen-subspace, the discriminability of the eigenvectors is measured by the maximum margin criterion, and the projection vectors are achieved by optimizing this criterion. We also conduct extensive experiments to evaluate the proposed method on various well-known data sets, including face image and microarray data sets. Experimental results show that our method is more stable than other methods in most cases.

Related Works
Assume that there is a set of N d-dimensional data points, denoted by {x_1, ..., x_N}, where x_j ∈ R^d (j = 1, ..., N). When the labels of the data points are available, each data point belongs to exactly one of c object classes {C_1, ..., C_c}, and the number of samples in class C_i is n_i. Thus, N = ∑_{i=1}^c n_i is the total number of data points. In classical linear discriminant analysis, the between-class scatter matrix, the within-class scatter matrix, and the total scatter matrix are defined as follows:

S_b = ∑_{i=1}^c n_i (m_i − m)(m_i − m)^T,
S_w = ∑_{i=1}^c ∑_{x ∈ C_i} (x − m_i)(x − m_i)^T,    (1)
S_t = ∑_{j=1}^N (x_j − m)(x_j − m)^T = S_b + S_w,

where m_i is the centroid of the i-th class and m is the global centroid of the data set. The precursor matrices are defined so that S_b = H_b H_b^T, S_w = H_w H_w^T, and S_t = H_t H_t^T, with

H_b = [√n_1 (m_1 − m), ..., √n_c (m_c − m)],
H_w = [X_1 − m_1 e_1^T, ..., X_c − m_c e_c^T],    (2)
H_t = [x_1 − m, ..., x_N − m],

where e_i = (1, ..., 1)^T ∈ R^{n_i} and X_i is the data matrix that consists of the data points from class C_i. Classical LDA finds projection directions that make data points from different classes far from each other and data points from the same class close to each other. To be specific, LDA obtains the optimal projection vector by maximizing the following objective function:

J_1(w) = (w^T S_b w) / (w^T S_w w).    (3)

The optimal projection direction w can be achieved by solving the generalized eigenproblem S_b w = λ S_w w. In general, there are at most c − 1 eigenvectors corresponding to nonzero generalized eigenvalues since the rank of S_b is not bigger than c − 1. When S_w is singular, methods including PCA plus LDA [1], LDA/GSVD [7], and LDA/QR [26] can be used to deal with this problem.
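As a concrete illustration, the scatter matrices in (1) and the classical generalized eigenproblem can be sketched in a few lines of NumPy. The data, class sizes, and variable names below are illustrative, not taken from the paper; `scipy.linalg.eigh` solves S_b w = λ S_w w when S_w is nonsingular.

```python
import numpy as np
from scipy.linalg import eigh

# Toy data: N = 6 points in d = 3 dimensions, c = 2 classes (illustrative).
X = np.array([[1.0, 2.0, 0.0], [1.2, 1.8, 0.1], [0.9, 2.2, -0.1],
              [3.0, 0.5, 1.0], [3.1, 0.7, 0.9], [2.8, 0.4, 1.3]])
y = np.array([0, 0, 0, 1, 1, 1])

m = X.mean(axis=0)                       # global centroid
d = X.shape[1]
S_b = np.zeros((d, d))
S_w = np.zeros((d, d))
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)                 # class centroid
    S_b += len(Xc) * np.outer(mc - m, mc - m)
    S_w += (Xc - mc).T @ (Xc - mc)
S_t = S_b + S_w                          # total scatter, S_t = S_b + S_w

# Classical LDA: solve S_b w = lambda S_w w (here S_w is nonsingular).
evals, evecs = eigh(S_b, S_w)            # eigenvalues in ascending order
```

With two classes, at most c − 1 = 1 generalized eigenvalue is nonzero, matching the rank bound on S_b stated above.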

Optimization Criterion and Its Robust Implementation
In this section, we revisit an optimization criterion for linear discriminant analysis and analyze its properties in detail. Finally, we discuss its robust implementation. Note that if the matrix S_w is singular, the optimal value of (3) is positive infinity. Several variants of the model in (3) can be found in [27]. In fact, when the matrix S_w is nonsingular, it is not difficult to verify that these variants of (3) are equivalent [27]. For convenience, we adopt the following optimization criterion to give a stable and efficient algorithm for linear discriminant analysis:

min_w J_2(w) = (w^T S_w w) / (w^T S_t w).    (4)

The main reasons for adopting (4) are as follows. First, the objective function is bounded in the general case, which avoids the objective function taking infinity. Second, since the null space of S_w plays an important role in some cases, especially in the small-sample-size problem, the criterion (4) is also convenient for analyzing the null space of S_w. In fact, it is straightforward to verify that (4) and (3) are equivalent under some conditions. Most importantly, (4) can produce more generalized eigenvalues than (3) since the rank of S_t is not smaller than the rank of S_w. In addition, from the viewpoint of optimization, the objective function we optimize is usually bounded. Thus, (4) is preferable to (3) in some cases.
It is obvious that the optimal projection w of (4) can be achieved by solving the generalized eigenproblem S_w w = λ S_t w when the matrix S_t is nonsingular. Later we will see that the generalized eigenvalue λ takes values in the interval [0, 1]. Different from classical LDA, we extract the discriminant vectors composed of the first k eigenvectors of S_t^{-1} S_w corresponding to the k smallest eigenvalues if S_t is nonsingular. In such a case, we avoid the singularity problem of the matrix S_w. Before giving an explicit implementation of the optimization criterion (4), we start with the definitions of some subspaces [28].
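A minimal sketch of criterion (4), assuming S_t is nonsingular: the generalized eigenvalues of the pair (S_w, S_t) fall in [0, 1], and the discriminant directions are the eigenvectors with the smallest eigenvalues. The toy data and names below are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

# Toy data: 20 points in R^4, 4 classes of 5 samples each (illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = np.repeat(np.arange(4), 5)

m = X.mean(axis=0)
d = X.shape[1]
S_w = np.zeros((d, d))
for c in np.unique(y):
    Xc = X[y == c]
    S_w += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
S_t = (X - m).T @ (X - m)                # total scatter, nonsingular here

# Criterion (4): take the eigenvectors of (S_w, S_t) with the SMALLEST
# eigenvalues; all generalized eigenvalues lie in [0, 1] since
# 0 <= w^T S_w w <= w^T S_t w for every w.
lam, W = eigh(S_w, S_t)                  # ascending order
```

Because rank(S_b) ≤ c − 1 < d here, the null space of S_b is nontrivial and the largest eigenvalue equals 1, the degenerate value discussed later.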
Definition 1. Let A be an n × n positive semidefinite matrix and λ be an eigenvalue of A. The set of all eigenvectors of A corresponding to the eigenvalue λ, together with the zero vector, forms a subspace. This subspace is referred to as the eigen-subspace of A with λ.
Definition 2. The null space of the matrix A is the set of all eigenvectors of A with λ = 0.

Definition 3. The range space of the matrix A is the set of all eigenvectors of A corresponding to nonzero eigenvalues.
In the case of a positive semidefinite matrix, the number of repeated roots of the characteristic equation det(A − λI) = 0 determines the dimension of the eigen-subspace of A with λ. If the dimension of the eigen-subspace of A with λ is bigger than 1, the eigenvalue λ is degenerate since the number of repeated roots of the characteristic equation is bigger than 1. It is observed from (1) that the matrices S_b, S_w, and S_t are positive semidefinite. According to the above definitions, we can obtain the following four subspaces from S_w and S_b [20]: (a) The null space of S_w, denoted by null(S_w).
(b) The null space of S_b, denoted by null(S_b).
(c) The range space of S_w, denoted by span(S_w).
(d) The range space of S_b, denoted by span(S_b).
Based on these four subspaces, we can construct another four subspaces.
(e) Subspace A is defined as the intersection of span(S_w) and null(S_b).

(f) Subspace B is defined as the intersection of span(S_w) and span(S_b).

(g) Subspace C is defined as the intersection of null(S_w) and span(S_b).

(h) Subspace D is defined as the intersection of null(S_w) and null(S_b).
From Subspaces A, B, C, and D, we find that the objective function J_2(w) of (4) satisfies

J_2(w) = 1 if w ∈ A;  J_2(w) ∈ (0, 1) if w ∈ B;  J_2(w) = 0 if w ∈ C;  J_2(w) = 0/0 (indefinite) if w ∈ D.    (5)

From (5), one can see that if w is taken from Subspace A, B, or C, the objective function J_2(w) is bounded. If w belongs to Subspace D, J_2(w) is indefinite. It is of interest to note that the null space of S_t is the intersection of the null space of S_w and the null space of S_b. It has been proved that the null space of S_t does not contain any discriminant information [29]. Thus, Subspace D contains no discriminant information, which also shows that part of the null space of S_w contains no discriminant information. Therefore, Subspace D can be removed without losing any information, and this can be done by removing the null space of S_t. An effective way to remove the null space of S_t is to perform the singular value decomposition (SVD) [28] on H_t, written H_t = U_t Σ_t V_t^T, where U_t consists of the left singular vectors corresponding to the nonzero singular values of H_t. In such a case, we do not lose any information in the data, and we also remove the part of the null space of S_w that contains no discriminant information. Since we focus on (4), the range space of S_t must be considered. Once the null space of S_t is removed, three subspaces are relevant to (4): the null space of S_w, the range space of S_w, and the range space of S_t. We also give their relations with Subspaces A, B, and C. It is not difficult to verify that the intersection of the null space of S_w and the range space of S_t is exactly Subspace C, and that the intersection of the range space of S_w and the range space of S_t contains Subspaces A and B. This shows that we lose no discriminant information from Subspaces A, B, and C by solving (4). We therefore first remove the null space of S_t; that is, we consider the following optimization function in the range space of S_t:

min_{w̃} J_3(w̃) = (w̃^T S̃_w w̃) / (w̃^T S̃_t w̃),  with S̃_w = U_t^T S_w U_t and S̃_t = U_t^T S_t U_t,    (6)

where S̃_t in (6) is nonsingular since the null space of S_t has been removed. In such a case, we obtain the projection vectors composed of the r eigenvectors of S̃_t^{-1} S̃_w corresponding to its r eigenvalues. From (6), we can see that J_3(w̃) takes values in the interval [0, 1]. In fact, the value of J_3(w̃) indicates how effective a subspace is. According to the definition of the optimization criterion, we have the following conclusions: the subspace corresponding to J_3(w̃) = 0 is the most important; the subspace corresponding to J_3(w̃) ∈ (0, 1) is the second most important; the subspace corresponding to J_3(w̃) = 1 is the least important.
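The removal of the null space of S_t via the SVD of the precursor matrix H_t can be sketched as follows; the sizes and names are illustrative stand-ins for a small-sample-size setting where d is much larger than N.

```python
import numpy as np

# SSS setting: d = 50 features, N = 10 samples (illustrative sizes).
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 50))
m = X.mean(axis=0)
H_t = (X - m).T                          # d x N precursor of S_t = H_t H_t^T

# Keep only the left singular vectors with nonzero singular values:
# their span is the range space of S_t, the complement is null(S_t).
U, s, _ = np.linalg.svd(H_t, full_matrices=False)
r = int(np.sum(s > 1e-10))               # rank(S_t) = N - 1 here
U_t = U[:, :r]                           # basis of the range space of S_t

S_t_tilde = U_t.T @ (H_t @ H_t.T) @ U_t  # reduced total scatter, nonsingular
```

Working with the thin SVD of the d × N matrix H_t is far cheaper than an eigen-decomposition of the d × d matrix S_t when d is large.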
By solving the generalized eigenproblem S̃_w v = λ S̃_t v, we obtain r (= rank(S_t)) eigenvalues, which produce r eigenvectors. In some cases some of these r eigenvalues may be equal; in other words, several eigenvalues degenerate into the same value, which may affect the performance of some algorithms. Assume that these r eigenvalues consist of q (q ≤ r = rank(S_t)) distinct values λ_i (i = 1, ..., q) in increasing order with multiplicities r_i (i = 1, ..., q), where r_i denotes the algebraic multiplicity of the eigenvalue λ_i and ∑_{i=1}^q r_i = r. In some situations, it is useful to work with the set of all eigenvectors associated with a specific value λ_i. Let us define the following set:

E(λ_i) = {v : S̃_w v = λ_i S̃_t v}.    (7)

The dimension of E(λ_i) is in general equal to the algebraic multiplicity of λ_i since S̃_w and S̃_t are symmetric real matrices. The set E(λ_i) forms the eigen-subspace of the matrix pair (S̃_w, S̃_t) corresponding to the generalized eigenvalue λ_i. When the dimension of E(λ_i) is 1, it is not necessary to deal with this subspace further since it contains only one eigenvector. However, when the dimension of E(λ_i) is bigger than 1, it is impossible to determine which eigenvector in this eigen-subspace is the most important since all the eigenvectors correspond to the same eigenvalue. This case often occurs in the small-sample-size problem, where the dimension of the eigen-subspace E(λ = 0) is relatively high. In such a case, it is infeasible to determine which projection vector in E(λ = 0) is the most important if we only use (7). For some nonzero generalized eigenvalues of the matrix pair (S̃_w, S̃_t), the dimension of E(λ_i ≠ 0) may also be bigger than 1. For example, E(λ_i = 1) consists of eigenvectors taken from the null space of S̃_b = S̃_t − S̃_w. Generally speaking, the dimension of the null space of S̃_b is bigger than 1, which makes the dimension of E(λ = 1) bigger than 1. To measure discriminability within such a subspace, let Q_i denote an orthonormal basis of E(λ_i) and define Ŝ_b = Q_i^T S̃_b Q_i and Ŝ_w = Q_i^T S̃_w Q_i. When the dimension of E(λ_i) is 1, it is easy to prove that the optimal direction is u = ±1. When the dimension of E(λ_i) is bigger than 1, it is necessary to obtain the r_i eigenvectors of Ŝ_b − Ŝ_w corresponding to its r_i eigenvalues in decreasing order. These r_i eigenvectors form the matrix W_i (= [u_1, ..., u_{r_i}]). Thus, the discriminability of the eigenvectors in the eigen-subspace E(λ_i) can be measured by the eigenvalues of Ŝ_b − Ŝ_w. This tells us how to choose effective discriminant vectors in the eigen-subspace E(λ_i), which resolves the degenerated case of eigenvalues. In classical LDA, the discriminability of eigenvectors within an eigen-subspace is sometimes neglected.
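A sketch of the MMC-based ranking inside a degenerate eigen-subspace, under the assumption that an orthonormal basis V of the subspace is already available; the function name and the toy matrices below are illustrative, not from the paper.

```python
import numpy as np

def rank_subspace_by_mmc(V, S_b, S_w):
    """Order the directions of the subspace spanned by the orthonormal
    columns of V by the maximum margin criterion u^T V^T (S_b - S_w) V u,
    largest (most discriminative) first."""
    M = V.T @ (S_b - S_w) @ V            # projected MMC matrix, symmetric
    vals, vecs = np.linalg.eigh(M)       # ascending order
    order = np.argsort(vals)[::-1]       # decreasing MMC value
    return V @ vecs[:, order], vals[order]

# Toy degenerate case (illustrative): a 3-dim subspace of R^5.
rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5)); S_b = A @ A.T   # stand-in PSD scatter matrices
B = rng.normal(size=(5, 5)); S_w = B @ B.T
V, _ = np.linalg.qr(rng.normal(size=(5, 3)))
W, scores = rank_subspace_by_mmc(V, S_b, S_w)
```

Every column of W is still a vector of the original eigen-subspace; only the ordering within the subspace changes, which is exactly what the degenerate case requires.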
Note that, in the small-sample-size problem, the dimension of the eigen-subspace E(λ = 0) is relatively high. In such a case, we need to obtain this eigen-subspace. In fact, E(λ = 0) is the null space of S̃_w, and obtaining this null space may be time consuming when its dimension is high. Fortunately, several effective methods have been proposed to obtain the null space of S̃_w. Çevikalp et al. [15] proposed an effective algorithm that avoids computing the null space of S̃_w directly by finding the range space of S̃_w. Note that the dimension of the range space of S̃_w is equal to the rank of the matrix S̃_w. Based on the range space of S̃_w, we can obtain the common vectors of each class and construct the scatter matrix of the common vectors, as done in [15]. Finally, the projection vectors can be obtained by performing the eigen-decomposition of the scatter matrix of the common vectors.
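One inexpensive route to a basis of the null space, in the spirit of the range-space idea discussed above (a sketch, not the exact procedure of [15]): take the SVD of the precursor H_w and keep the left singular vectors beyond the rank. The sizes and names below are illustrative.

```python
import numpy as np

# Illustrative SSS sizes: 30-dim space, precursor with 8 columns.
rng = np.random.default_rng(3)
d, n = 30, 8
Hw = rng.normal(size=(d, n))             # precursor: S_w = Hw @ Hw.T

# Full SVD: the left singular vectors beyond rank(Hw) span null(S_w).
U, s, _ = np.linalg.svd(Hw, full_matrices=True)
r = int(np.sum(s > 1e-10))               # dim of the range space of S_w
N_w = U[:, r:]                           # orthonormal basis of null(S_w)
```

The cost is governed by the small number of columns n, not by the ambient dimension d, which is the point of working with the range space first.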
As a summary of the above discussion, we list the detailed steps for solving linear discriminant analysis in Algorithm 4.
Algorithm 4 (a stable and efficient algorithm for solving linear discriminant analysis).
Step 1. Construct H_b, H_w, and H_t, and compute the left singular matrix U_t of H_t by performing the SVD H_t = U_t Σ_t V_t^T, where U_t consists of the left singular vectors corresponding to the nonzero singular values of H_t; obtain H̃_w = (U_t Σ_t^{-1})^T H_w.
Step 2. Obtain the range space of H̃_w, denoted by U_w, whose column vectors are orthonormal: perform the SVD H̃_w = U_w Σ_w V_w^T and record the singular values σ_i in increasing order from the diagonal elements of Σ_w.

Step 3. Judge whether the null space of S̃_w = U_t^T S_w U_t exists: if R = I_{r×r} − U_w U_w^T is not a zero matrix, the null space exists and Step 4 is performed.

Step 4. Based on R, obtain the common vectors of each class, compute the scatter matrix of the common vectors, and perform the eigen-decomposition of this scatter matrix to obtain projection vectors, denoted by W_0.

Step 5. For each nonzero σ_i, do the following.

Step 5(a). Obtain the singular submatrix U_{w,i} by collecting the column vectors of U_w corresponding to the singular value σ_i; let V_i = U_t Σ_t^{-1} U_{w,i}; apply the QR decomposition to V_i to obtain the matrix Q_i whose column vectors are orthonormal.

Step 5(b). If σ_i has multiplicity r_i > 1, perform the eigen-decomposition of Q_i^T (S_b − S_w) Q_i and order the resulting directions by decreasing eigenvalue, as described in the previous section.
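Steps 1 and 2 can be sketched as an S_t-whitening followed by an SVD; under this whitening, the squared singular values of H̃_w are the generalized eigenvalues λ_i of the pair (S̃_w, S̃_t). The data and names below are illustrative.

```python
import numpy as np

# Illustrative SSS data: 12 samples, 40 features, 3 classes of 4.
rng = np.random.default_rng(4)
X = rng.normal(size=(12, 40))
y = np.repeat([0, 1, 2], 4)
m = X.mean(axis=0)
H_t = (X - m).T                          # d x N precursor of S_t
cols = []
for c in np.unique(y):
    Xc = X[y == c]
    cols.append((Xc - Xc.mean(axis=0)).T)
H_w = np.hstack(cols)                    # precursor of S_w

# Step 1: whiten by S_t. G = U_t Sigma_t^{-1} satisfies G^T S_t G = I.
U, s, _ = np.linalg.svd(H_t, full_matrices=False)
r = int(np.sum(s > 1e-10))               # rank(S_t)
G = U[:, :r] / s[:r]                     # columns scaled by 1/sigma
Hw_tilde = G.T @ H_w

# Step 2: the singular values sigma_i of Hw_tilde give lambda_i = sigma_i^2.
Uw, sw, _ = np.linalg.svd(Hw_tilde, full_matrices=False)
lam = np.sort(sw**2)                     # ascending generalized eigenvalues
```

Because 0 ≼ G^T S_w G ≼ G^T S_t G = I, all λ_i land in [0, 1], as claimed for criterion (6).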

Note that, in Step 2 of Algorithm 4, we only need to obtain the range space of H̃_w, that is, an orthonormal basis of span(H̃_w). There are several effective methods for obtaining this range space. For example, it can be achieved by finding the left singular vectors of H̃_w corresponding to the nonzero singular values. It is pointed out in [28] that computing the left singular vectors corresponding to the nonzero singular values is more efficient than finding the left singular vectors corresponding to all singular values, including zeros. In addition, one may resort to difference subspaces and the Gram-Schmidt orthogonalization procedure [15] to obtain the range space of H̃_w. Note that, in Step 3 of Algorithm 4, we use a criterion to judge whether the null space of S̃_w = U_t^T S_w U_t exists: if R = I_{r×r} − U_w U_w^T is not a zero matrix, the null space of S̃_w exists. In such a case, one may use the method proposed in [15] (Step 4 of Algorithm 4) to further deal with the null space of S̃_w. It is observed from Algorithm 4 that Step 5 must be performed regardless of the existence of the null space of S̃_w.

Experimental Results
Although LDA/GSVD makes use of three subspaces (Subspaces A, B, and C), the importance of the projection vectors in some eigen-subspaces is not effectively measured in some cases. In this paper we do not compare against discriminant methods outside the LDA family, since the main objective of this paper is to provide a stable and efficient algorithm for handling the degenerated eigenvalues of LDA. Note that we do not report the running time of the algorithms we test, since some methods only make use of part of the subspaces in (5). Generally speaking, the performance of each algorithm varies with the dimension of the features. For comparison, we search over all feature dimensions and report the best performance.
Figure 1 shows the error rate of each method with different numbers of training images per class on the ORL and Yale face databases. For clarity, Table 2 also reports the mean and standard deviation (in parentheses) of the error rates of each method. Note that the best performance in each line is highlighted in bold, and we show the results for 2, 4, 6, and 8 training images per class.
From Figure 1 and Table 2, one can see that the error rate of each algorithm decreases as the number of training samples per class increases in most cases. It is observed from Table 2 that the standard deviation of our method is smaller than that of the other methods in most cases. On the ORL face database, the error rate of our method decreases from 16.32% with 2 training samples per class to 1% with 9 training samples per class, while the error rates of DLDA, PCA+LDA, MMC, DCV, LDA/QR, and LDA/GSVD decrease from 36.73%, 29.82%, 18.04%, 16.43%, 21.62%, and 19.92% with 2 training samples per class to 1.75%, 1.25%, 1.625%, 1.125%, 2.75%, and 3% with 9 training samples per class, respectively. These results show that our method outperforms the other methods in most cases. On the Yale face database, although the DCV method gives the best result with 2 training samples per class, it also has the biggest standard deviation. It is also observed that our method is superior to the other methods in classification performance as the number of training samples increases.
Since the number of features extracted by the proposed method is not limited by the number of classes but only by the rank of S_t, we can project the samples onto a space whose dimension is greater than the number of classes. Figure 2 shows a plot of the error rate versus dimensionality. The numbers in parentheses denote the optimal dimension corresponding to the best classification performance. As can be seen from Figure 2, the error rate of the proposed method decreases as the number of training samples per class increases. It is also found from Figure 2 that the classification performance may improve when the dimension of the reduced space is bigger than the number of classes. On the Yale face database, the error rate of the proposed method first decreases and then rises as the dimension increases, which shows that choosing too many features yields overfitting in the classification task. On the ORL face database, the error rate of the proposed method first decreases drastically and then becomes flat when the number of training samples is bigger than 2.
It is found that the best performance of our method is achieved when the number of extracted features is much bigger than the number of classes. In short, these experimental results show that Subspace A, which is often neglected in classical LDA in (5), may play a role in face recognition in some cases. Now let us explain why our method achieves good classification performance. The DLDA and LDA/QR methods first remove the null space of S_b. However, removing the null space of S_b also discards part of the null space of S_w and may result in the loss of important information in the null space of S_w. The PCA+LDA method does not consider the null space of S_w; it has been proved that the null space of S_w plays an important role in the SSS problem [14]. The DCV method does not make use of Subspace B in (5), and this subspace may be helpful in obtaining discriminant vectors in the SSS problem. Although the LDA/GSVD method considers three subspaces, the discriminability within each eigen-subspace is not analyzed. In the MMC method, the discriminant vectors in Subspace B and Subspace C in (5) may have the same objective function value, which makes it difficult to determine which discriminant vector is the most important. In fact, Subspace A in (5) is often neglected in LDA-based methods in the previous literature. We give a strategy to measure the importance of each discriminant vector in all subspaces, including Subspace A, for the first time. As can be seen from Figure 2, Subspace A also plays a role in face recognition. As a result, the proposed method can achieve better classification performance than the other methods in the general case.
In the following experiments, we study the effect of image size on the classification performance on the two face databases. Since the number of face images in these databases is relatively small, the leave-one-out method is performed: one image is taken for testing and the remaining images are used for training. By reducing the image resolution of 112 × 92 pixels, we obtain images of 56 × 46 pixels, where each pixel value is the average of a 2 × 2 subimage of the original image. Similarly, we obtain images of 28 × 23 pixels. In these cases, the null space of the within-class scatter matrix exists. Table 3 shows the experimental results of each method at three resolutions on the two face databases.
As can be seen from Table 3, the error rate of each method does not always increase as the image resolution is reduced. On the ORL face database, the DCV method obtains the best classification result at the resolution of 112 × 92 pixels. As the image resolution is reduced, the performance of NLDA becomes worse since the dimension of the null space of S_w becomes smaller. On the ORL face database, the proposed method is better than LDA/GSVD and has a smaller standard deviation than the other methods in most cases. The main reason is that we consider the degenerated case of the eigenvalue. It is noted that our method achieves the best classification result when the resolution of the images is 56 × 46 pixels. On the Yale face database, the proposed method outperforms the other methods in terms of classification performance. It is also observed that the best recognition rate among all methods, 92.13%, is achieved by the proposed method when the images are 56 × 46 pixels on the Yale face database. From these experiments, we also notice that large images are not necessary to obtain good classification performance.

Applications to Microarray Data Sets.
In this set of experiments, we further validate the proposed method on microarray data sets. In order to evaluate the classification performance of the various LDA methods, we adopt tenfold cross validation on these data sets. In other words, we divide each data set into ten subsets of approximately equal size. Then we perform training and testing ten times, each time holding out one of the subsets for testing and training on the remaining nine. The classification performance is averaged over the ten runs. Table 4 shows the mean and the standard deviation of the error rate of each method.
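The tenfold protocol can be sketched as follows, with an illustrative nearest-centroid classifier standing in for the LDA pipeline (the paper's actual classifier and data are not reproduced here).

```python
import numpy as np

# Illustrative two-class data: 100 samples in R^6.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (50, 6)), rng.normal(3, 1, (50, 6))])
y = np.repeat([0, 1], 50)

# Shuffle once, then split the indices into ten folds.
idx = rng.permutation(len(y))
folds = np.array_split(idx, 10)
errs = []
for k in range(10):
    test = folds[k]
    train = np.concatenate([folds[j] for j in range(10) if j != k])
    # Train: class centroids; test: assign to the nearest centroid.
    cents = np.array([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])
    dists = ((X[test][:, None, :] - cents[None]) ** 2).sum(axis=-1)
    pred = np.argmin(dists, axis=1)
    errs.append(np.mean(pred != y[test]))
mean_err, std_err = float(np.mean(errs)), float(np.std(errs))
```

Reporting both the mean and the standard deviation over the ten held-out folds matches the format of Table 4.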
As can be seen from Table 4, the classification performance of the proposed method is consistently superior to that of the other methods on all the data sets we tested. It is found that our method is more stable than the other methods, since its standard deviation is smaller on all of the data sets we tested. It is noted that PCA+LDA performs poorly on the Leukemia and MLL data sets. This may come from the fact that the null space of the within-class scatter matrix is removed, although it plays an important role in obtaining discriminant feature vectors. It is also found that DLDA does not give satisfactory results on the Duke-Breast and Colon data sets, since DLDA may remove part of the null space of the within-class scatter matrix. One can see from Table 4 that the NLDA method achieves good classification accuracies on these data sets, since they are small-sample-size sets. One can also observe that the LDA/QR method does not perform well on some data sets. This may be explained by the fact that LDA/QR may remove part of the range space of S_w and part of the null space of S_w. It is found that LDA/GSVD is not better than our method although it considers three subspaces; this is possibly because LDA/GSVD does not measure the discriminability within each eigen-subspace. Because the discriminant vectors in Subspace B and Subspace C may correspond to the same objective function value in the MMC method, this may lead to degradation in MMC. Overall, the proposed method is very stable on these data sets because we handle the degenerated eigenvalues of the scatter matrices, especially in Subspace A, which is neglected in the previous literature.

Conclusions
In this paper, we revisit linear discriminant analysis based on an optimization criterion. Different from existing LDA-based algorithms, the new algorithm adopts the spirit of the maximum margin criterion (MMC) and applies MMC in an eigen-subspace when its eigenvalue is degenerate. The new implementation avoids the singularity problem in the SSS case and provides more than c − 1 discriminant vectors. We also conduct a series of comparative studies on face image and microarray data sets to evaluate the proposed method. Our experiments demonstrate that the classification performance achieved by our method is better than that of other LDA-based algorithms in most cases, and that the proposed method is an effective and stable linear discriminant method for dealing with high-dimensional data.

Figure 1: The error rates of each algorithm on the two face databases.

Figure 2: The error rate of the proposed method with the change of the number of features.
So it is necessary to use an additional strategy to determine the importance of the eigenvectors if the dimension of E(λ_i) is bigger than 1. For the subspace E(λ_i), we can obtain a matrix whose columns consist of the eigenvectors associated with the generalized eigenvalue λ_i, denoted by V_i. Obviously, the dimension of E(λ_i) is equal to the number of columns of V_i. Given this matrix, it is straightforward to obtain an orthonormal basis by performing the QR decomposition on V_i; the basis can be expressed in matrix form as Q_i. Note that the space spanned by the columns of Q_i is equal to the space spanned by the columns of V_i. Thus, in the space spanned by the columns of Q_i, we formulate the objective function based on the maximum margin criterion.

Table 1: Statistics of the data sets we use.

In such a case, we can see that the r (= r_1 + ⋯ + r_q) eigenvectors can be ordered in terms of their importance. By performing Algorithm 4, we can evaluate the projection vectors from Subspace A, which is often neglected in the previous literature. It is obvious that the above method can provide r discriminant vectors because the rank of S_t is r, which is much bigger than c − 1. As a result, this method may be helpful when the number of classes is relatively small. Note that we use the eigenvalue λ_i in (7), and it is not difficult to verify that λ_i = (σ_i)². If the singular value σ_i occurs only once in the diagonal elements of Σ_w, we do not need to perform Step 5(b) in real applications.

A number of images per class are randomly selected for training, and the remaining images in the data set are used to form the testing set. To reduce the variation of the accuracies due to randomness, the classification performance we report in the experiments is averaged over twenty runs. That is, twenty different training and testing sets are used to evaluate the classification performance. We compare the proposed method with several related LDA-based methods.

Table 2: Performance comparisons (%) of some methods on the face databases.

Table 3: Comparisons of misclassification rates (%) of several methods on the face databases.

Table 4: Error rates (%) of each method on the microarray data sets.