Tensor subspace analysis (TSA) and discriminant TSA (DTSA) are two effective two-sided projection methods for dimensionality reduction and feature extraction of face image matrices. However, they have two serious drawbacks. First, TSA and DTSA compute the left and right projection matrices iteratively, and two generalized eigenvalue problems must be solved at each iteration, which makes them impractical for high-dimensional image data. Second, since the left and right projection matrices are not usually orthonormal, the metric structure of the facial image space is not preserved. In this paper, we propose orthogonal TSA (OTSA) and orthogonal DTSA (ODTSA). In contrast to TSA and DTSA, each iteration of OTSA and ODTSA requires solving two trace ratio optimization problems, which can be handled by the inexpensive Newton-Lanczos method. Thus, OTSA and ODTSA have much lower computational cost than their nonorthogonal counterparts. Experimental results show that the proposed methods achieve much higher recognition accuracy at much lower training cost.
1. Introduction
Many applications in the field of information processing, such as data mining, information retrieval, machine learning, and pattern recognition, require dealing with high-dimensional data. Dimensionality reduction has been a key technique for manipulating high-dimensional data efficiently. In dimensionality reduction, the high-dimensional data are transformed into a low-dimensional subspace with limited loss of information.
Principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2] are two of the most well-known and widely used dimension reduction methods. PCA is an unsupervised method, which aims to find the projection directions that maximize the variance of the features in the low-dimensional subspace. It is also considered the best data representation method in the sense that the mean squared error between the original data and the data reconstructed from the PCA transform is minimal. LDA is a supervised method and is based on the following idea: the transformed data points of different classes should be as far from each other as possible, while the transformed data points of the same class should be as close to each other as possible. To achieve this goal, LDA seeks an optimal linear transformation by simultaneously minimizing the within-class distance and maximizing the between-class distance. The optimal transformation of LDA can be computed by solving a generalized eigenvalue problem involving scatter matrices. LDA has been applied successfully for decades in many important applications including pattern recognition [2–4], information retrieval [5], face recognition [6, 7], microarray data analysis [8, 9], and text classification [10]. One of the main drawbacks of LDA is that the scatter matrices are required to be nonsingular, which fails when the data dimension is larger than the number of data samples. This is known as the undersampled problem, also called the small sample size problem [2]. To make LDA applicable to undersampled problems, researchers have proposed several variants of LDA including PCA + LDA [7, 11], LDA/GSVD [12, 13], two-stage LDA [14], regularized LDA [15–18], orthogonal LDA [19, 20], null space LDA [21, 22], and uncorrelated LDA [20, 23].
As is well known, both PCA and LDA take into account only the global Euclidean structure of the original data. However, high-dimensional data in the real world often lie on or near a smooth low-dimensional manifold, so it is important to preserve the local structure. Locality preserving projection (LPP) [24] is a locality-preserving method that aims to preserve the intrinsic geometry of the original data. For recognition problems, LPP usually performs better than methods such as PCA and LDA that preserve only the global structure. Moreover, LPP is not sensitive to noise and outliers. In its original form, LPP is an unsupervised dimension reduction method. The supervised version of LPP (SLPP) [25] exploits the class label information of the training samples and thus attains higher classification accuracy than the unsupervised LPP. Other improvements to LPP include the discriminant locality preserving projection (DLPP) [26] and the orthogonal discriminant locality preserving projection method (ODLPP) [27].
When dealing with two-dimensional data such as images, the traditional approach is first to transform the image matrices into one-dimensional vectors and then apply the dimension reduction methods mentioned above to the vectorized image data. Vectorizing image matrices incurs high computational cost and loses the underlying spatial structure information of the images. To overcome the disadvantages of the vectorization approach, researchers have proposed 2D-PCA [28], 2D-LDA [29], 2D-LPP [30], and 2D-DLPP [31]. These methods operate directly on the matrix representation of image data. However, these two-dimensional methods employ only single-sided projection and thus still cannot fully preserve the intrinsic spatial structure information of the images.
In the last decade, some researchers have developed several second-order tensor methods for dimension reduction of image data. These methods aim to find two subspaces for two-sided projection. Ye [32] has proposed a generalized low-rank approximation method (GLRAM), which seeks the left and right projections by minimizing the reconstruction error, together with an iterative procedure. One of the main drawbacks of GLRAM is that an eigenvalue decomposition is required at each iteration step, so the computational cost is high. To overcome this disadvantage, Ren and Dai [33] have proposed to replace the projection vectors obtained from the eigenvalue decomposition by bilinear Lanczos vectors at each iteration step of GLRAM. Experimental results show that the approach based on bilinear Lanczos vectors is competitive with the conventional GLRAM in classification accuracy, while it has a much lower computational cost. We note that GLRAM is an unsupervised method and preserves only the global Euclidean structure of the image data. Tensor subspace analysis (TSA) [34] is another two-sided projection method for dimension reduction and feature extraction of image data. TSA preserves the local structure information of the original data but does not employ the discriminant information. Wang et al. [35] have proposed a discriminant TSA (DTSA) by combining TSA with the discriminant information. Like GLRAM, both TSA and DTSA use an iterative procedure to compute the optimal pair of projection matrices. At each iteration of TSA and DTSA, two generalized eigenvalue problems must be solved, which makes them impractical for dimension reduction and feature extraction of high-dimensional image data.
In this paper, we propose the orthogonal TSA (OTSA) and orthogonal DTSA (ODTSA) by constraining the left and right projection matrices to be orthogonal. Like TSA and DTSA, OTSA and ODTSA compute the left and right projection matrices iteratively. However, instead of solving two generalized eigenvalue problems as in TSA and DTSA, each iteration of OTSA and ODTSA requires solving two trace ratio optimization problems. Thus, OTSA and ODTSA have much lower computational cost than their nonorthogonal counterparts since the trace ratio optimization problem can be solved by the inexpensive Newton-Lanczos method. Two experiments on face recognition are conducted to evaluate the efficiency and effectiveness of the proposed OTSA and ODTSA. Experimental results show that the proposed methods achieve much higher recognition accuracy and have much lower training cost than TSA and DTSA.
The remainder of the paper is organized as follows. In Section 2, we briefly review TSA and DTSA. In Section 3, we first propose OTSA and ODTSA; we then give a brief review of the trace ratio optimization problem and outline Newton's method and the Newton-Lanczos method for solving it; finally, we present the algorithms for computing the left and right projection matrices of OTSA and ODTSA. Section 4 is devoted to numerical experiments. Some concluding remarks are provided in Section 5.
2. A Brief Review of TSA and DTSA
In this section, we give a brief review of TSA and DTSA, two recently proposed linear methods for dimension reduction and feature extraction in face recognition.
Given a set of N image data,
(1) \mathcal{X} = \{X_1, X_2, \ldots, X_N\},
where X_i \in \mathbb{R}^{L_1 \times L_2}. For simplicity of discussion, we assume that the given data set \mathcal{X} is partitioned into C classes as
(2) \mathcal{X} = \{X_1^{(1)}, X_2^{(1)}, \ldots, X_{N_1}^{(1)}, X_1^{(2)}, X_2^{(2)}, \ldots, X_{N_2}^{(2)}, \ldots, X_1^{(C)}, X_2^{(C)}, \ldots, X_{N_C}^{(C)}\},
where N_c is the number of samples in the cth class and N = \sum_{c=1}^{C} N_c.
Let W \in \mathbb{R}^{N \times N} denote the total within-class similarity matrix. Its entries are defined by
(3) W_{ij} = \begin{cases} \exp(-\|X_i - X_j\|/t), & \text{if } X_i, X_j \text{ are from the same class}, \\ 0, & \text{otherwise}, \end{cases}
where t is a positive parameter which can be determined empirically and \|\cdot\| denotes the Frobenius norm of a matrix, that is, \|A\| = \sqrt{\sum_i \sum_j A_{ij}^2}. Note that the total within-class similarity matrix W has a block diagonal form, where the cth block is the within-class similarity matrix W_c of the cth class and the size of the cth block equals the number N_c of samples in the cth class; that is,
(4) W = \mathrm{diag}(W_1, W_2, \ldots, W_C).
The between-class similarity matrix B is defined as follows:
(5) B_{ij} = \exp(-\|\bar{X}_i - \bar{X}_j\|/t), \quad i, j = 1, 2, \ldots, C,
where
(6) \bar{X}_i = \frac{1}{N_i} \sum_{k=1}^{N_i} X_k^{(i)}
is the mean of the samples in the ith class.
Define the diagonal matrix D = \mathrm{diag}(d_1, d_2, \ldots, d_N) with
(7) d_i = \sum_{j=1}^{N} W_{ij}.
Then L_W = D - W is called the within-class Laplacian matrix and is symmetric positive semidefinite. Similarly, the between-class Laplacian matrix is defined as L_B = E - B, where E is a diagonal matrix whose ith diagonal entry is the sum of the ith row of B.
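As a concrete illustration of the constructions above, the following minimal NumPy sketch assembles W, L_W, B, and L_B from labeled sample matrices. The helper name `laplacians` and the toy data in the usage are our own assumptions, not from the paper.

```python
import numpy as np

def laplacians(X, labels, t=1.0):
    """Build W, L_W = D - W (Eqs. (3), (4), (7)) and B, L_B = E - B (Eq. (5))
    from labeled image matrices.

    X      : array of shape (N, L1, L2), the sample matrices X_i
    labels : length-N array of class indices
    t      : positive width parameter of the Gaussian weights
    """
    N = len(labels)
    # within-class similarity: Gaussian weight for same-class pairs, 0 otherwise
    W = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if labels[i] == labels[j]:
                W[i, j] = np.exp(-np.linalg.norm(X[i] - X[j]) / t)
    D = np.diag(W.sum(axis=1))            # d_i = sum_j W_ij
    LW = D - W                            # within-class Laplacian

    # between-class similarity built from the class means (Eq. (6))
    classes = np.unique(labels)
    means = np.stack([X[labels == c].mean(axis=0) for c in classes])
    C = len(classes)
    B = np.exp(-np.array([[np.linalg.norm(means[i] - means[j])
                           for j in range(C)] for i in range(C)]) / t)
    E = np.diag(B.sum(axis=1))
    LB = E - B                            # between-class Laplacian
    return W, LW, B, LB
```

Note that when the samples are ordered class by class, W is block diagonal as in (4); the Laplacians L_W and L_B have zero row sums by construction.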
In two-sided projection methods such as TSA and DTSA for dimension reduction and feature extraction of matrix data, we aim to find two projection matrices U \in \mathbb{R}^{L_1 \times l_1} and V \in \mathbb{R}^{L_2 \times l_2} with l_1 \le L_1 and l_2 \le L_2 such that the low-dimensional data
(8) Y_i = U^T X_i V
are easier to distinguish.
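For instance, with hypothetical sizes L_1 = L_2 = 64 and l_1 = l_2 = 10 (illustrative values only, not from the paper), the two-sided projection (8) compresses each 64 × 64 image matrix to a 10 × 10 feature matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 64))                    # one image matrix X_i
U = np.linalg.qr(rng.standard_normal((64, 10)))[0]   # left projection (orthonormal columns)
V = np.linalg.qr(rng.standard_normal((64, 10)))[0]   # right projection
Y = U.T @ X @ V                                      # Eq. (8): low-dimensional feature
print(Y.shape)                                       # (10, 10)
```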
2.1. Tensor Subspace Analysis
In TSA, we seek to find the left and right transformation matrices U, V by solving the following optimization problem:
(9) \max_{U,V} \frac{\sum_{i=1}^{N} d_i \|Y_i\|^2}{\sum_{i=1}^{N} \sum_{j=1}^{N} W_{ij} \|Y_i - Y_j\|^2} = \max_{U,V} \frac{\sum_{i=1}^{N} d_i \|U^T X_i V\|^2}{\sum_{i=1}^{N} \sum_{j=1}^{N} W_{ij} \|U^T X_i V - U^T X_j V\|^2}.
The numerator of the objective function in (9) measures the global variance on the manifold in the low-dimensional subspace, while the denominator measures the nearness of samples from the same class. Therefore, by maximizing the objective function, samples from the same class are transformed into data points close to each other, and samples from different classes are transformed into data points far from each other.
Define
(10) P_U = \begin{bmatrix} U^T X_1 \\ U^T X_2 \\ \vdots \\ U^T X_N \end{bmatrix}, \qquad P_V = \begin{bmatrix} V^T X_1^T \\ V^T X_2^T \\ \vdots \\ V^T X_N^T \end{bmatrix}.
These two matrices are called the total left and right transformation matrices, respectively, in [35].
The optimization problem (9) can be equivalently rewritten as the following optimization problem:
(11) \max_{U,V} \frac{\mathrm{tr}(V^T P_U^T (D \otimes I_{l_1}) P_U V)}{\mathrm{tr}(V^T P_U^T (L_W \otimes I_{l_1}) P_U V)},
or
(12) \max_{U,V} \frac{\mathrm{tr}(U^T P_V^T (D \otimes I_{l_2}) P_V U)}{\mathrm{tr}(U^T P_V^T (L_W \otimes I_{l_2}) P_V U)}.
Here and in the following, I_l denotes the identity matrix of order l and \otimes denotes the Kronecker product of matrices.
Clearly, from the equivalence between the maximization problem (9) and the optimization problem (11) or (12), we obtain the following result, which leads to an iterative algorithm for computing the transformation matrices U and V.
Theorem 1.
Let U and V be the solution of the maximization problem (9). Then the following hold.
(1) For a given U, V consists of the l_2 eigenvectors of the generalized eigenvalue problem
(13) [P_U^T (D \otimes I_{l_1}) P_U] v = \lambda [P_U^T (L_W \otimes I_{l_1}) P_U] v
corresponding to the largest l_2 eigenvalues.
(2) For a given V, U consists of the l_1 eigenvectors of the generalized eigenvalue problem
(14) [P_V^T (D \otimes I_{l_2}) P_V] u = \lambda [P_V^T (L_W \otimes I_{l_2}) P_V] u
corresponding to the largest l_1 eigenvalues.
Based on Theorem 1, an iterative implementation of TSA is given in Algorithm 1; see also [34].
Algorithm 1: TSA.
Input: a set of N sample matrices \{X_i\}_{i=1}^{N} with class label information; l_1, l_2.
Output: left and right transformation matrices U and V.
(1) Initialize U with an identity matrix;
(2) Until convergence Do:
(2.1) Form the matrix M_D(U) = P_U^T (D \otimes I_{l_1}) P_U;
(2.2) Form the matrix M_L(U) = P_U^T (L_W \otimes I_{l_1}) P_U;
(2.3) Compute the l_2 eigenvectors \{v_i\}_{i=1}^{l_2} of the pencil (M_D(U), M_L(U)) corresponding to the largest l_2 eigenvalues;
(2.4) Set V = [v_1, v_2, \ldots, v_{l_2}];
(2.5) Form the matrix M_D(V) = P_V^T (D \otimes I_{l_2}) P_V;
(2.6) Form the matrix M_L(V) = P_V^T (L_W \otimes I_{l_2}) P_V;
(2.7) Compute the l_1 eigenvectors \{u_i\}_{i=1}^{l_1} of the pencil (M_D(V), M_L(V)) corresponding to the largest l_1 eigenvalues;
(2.8) Set U = [u_1, u_2, \ldots, u_{l_1}];
End Do
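One half-step of Algorithm 1 (updating V for a fixed U) can be sketched as follows. Here `scipy.linalg.eigh` solves the symmetric-definite generalized eigenproblem (13), and the small ridge added to the right-hand matrix is an implementation safeguard of ours to keep the pencil definite, not part of the paper.

```python
import numpy as np
from scipy.linalg import eigh

def tsa_update_V(X, d, LW, U, l2, ridge=1e-8):
    """Given U, solve the generalized eigenproblem (13) for V.

    X  : (N, L1, L2) sample matrices
    d  : (N,) diagonal of D
    LW : (N, N) within-class Laplacian
    U  : (L1, l1) current left transformation
    """
    # blocks U^T X_i of P_U, shape (N, l1, L2)
    PU = np.einsum('ab,nbc->nac', U.T, X)
    # M_D(U) = sum_i d_i (U^T X_i)^T (U^T X_i)
    MD = np.einsum('n,nac,nad->cd', d, PU, PU)
    # M_L(U) = sum_{i,j} (L_W)_{ij} (U^T X_i)^T (U^T X_j)
    ML = np.einsum('ij,iac,jad->cd', LW, PU, PU)
    ML = ML + ridge * np.eye(ML.shape[0])   # keep the pencil positive definite
    vals, vecs = eigh(MD, ML)               # eigenvalues in ascending order
    return vecs[:, -l2:]                    # eigenvectors for the largest l2
```

The step for U is symmetric, with the blocks V^T X_i^T of P_V in place of U^T X_i.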
2.2. Discriminant Tensor Subspace Analysis
In this subsection, we briefly review the second-order DTSA, which was proposed in [35] for face recognition. DTSA combines the advantages of tensor methods and manifold methods, thus preserving both the spatial structure information of the original image data and the local structure of the sample distribution. Moreover, by integrating the class label information into TSA, DTSA attains higher recognition accuracy for face recognition.
In DTSA, the optimization problem is described as follows:
(15) \max_{U,V} \frac{\sum_{i=1}^{C} \sum_{j=1}^{C} \|\bar{Y}_i - \bar{Y}_j\|^2 B_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{N} \|Y_i - Y_j\|^2 W_{ij}} = \max_{U,V} \frac{\sum_{i=1}^{C} \sum_{j=1}^{C} \|U^T \bar{X}_i V - U^T \bar{X}_j V\|^2 B_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{N} \|Y_i - Y_j\|^2 W_{ij}},
where \bar{X}_i is the mean of the samples in the ith class.
We note that the objective function in (15) has the same denominator as that in (9) but a different numerator. Since the numerator of the objective function in (15) is built from the class label information, DTSA performs better than TSA at transforming samples from different classes into data points far from each other.
Define the mean left and right transformation matrices Q_U, Q_V by
(16) Q_U = \begin{bmatrix} U^T \bar{X}_1 \\ U^T \bar{X}_2 \\ \vdots \\ U^T \bar{X}_C \end{bmatrix}, \qquad Q_V = \begin{bmatrix} V^T \bar{X}_1^T \\ V^T \bar{X}_2^T \\ \vdots \\ V^T \bar{X}_C^T \end{bmatrix}.
Then, similarly, the optimization problem (15) can be equivalently formulated as the optimization problem
(17) \max_{U,V} \frac{\mathrm{tr}(V^T Q_U^T (L_B \otimes I_{l_1}) Q_U V)}{\mathrm{tr}(V^T P_U^T (L_W \otimes I_{l_1}) P_U V)}
or the optimization problem
(18) \max_{U,V} \frac{\mathrm{tr}(U^T Q_V^T (L_B \otimes I_{l_2}) Q_V U)}{\mathrm{tr}(U^T P_V^T (L_W \otimes I_{l_2}) P_V U)},
where L_W is the within-class Laplacian matrix and L_B is the between-class Laplacian matrix.
Similarly, for the optimization problem (15), we have the following result.
Theorem 2.
Let U and V be the solution of the maximization problem (15). Then the following hold.
(1) For a given U, V consists of the l_2 eigenvectors of the generalized eigenvalue problem
(19) [Q_U^T (L_B \otimes I_{l_1}) Q_U] v = \lambda [P_U^T (L_W \otimes I_{l_1}) P_U] v
corresponding to the largest l_2 eigenvalues.
(2) For a given V, U consists of the l_1 eigenvectors of the generalized eigenvalue problem
(20) [Q_V^T (L_B \otimes I_{l_2}) Q_V] u = \lambda [P_V^T (L_W \otimes I_{l_2}) P_V] u
corresponding to the largest l_1 eigenvalues.
The algorithm proposed in [35] for implementing DTSA is described in Algorithm 2.
Algorithm 2: DTSA.
Input: a set of N sample matrices \{X_i\}_{i=1}^{N} with class label information; l_1, l_2.
Output: left and right transformation matrices U and V.
(1) Initialize U with an identity matrix;
(2) Until convergence Do:
(2.1) Form the matrix M_{L_B}(U) = Q_U^T (L_B \otimes I_{l_1}) Q_U;
(2.2) Form the matrix M_{L_W}(U) = P_U^T (L_W \otimes I_{l_1}) P_U;
(2.3) Compute the l_2 eigenvectors \{v_i\}_{i=1}^{l_2} of the pencil (M_{L_B}(U), M_{L_W}(U)) corresponding to the largest l_2 eigenvalues;
(2.4) Set V = [v_1, v_2, \ldots, v_{l_2}];
(2.5) Form the matrix M_{L_B}(V) = Q_V^T (L_B \otimes I_{l_2}) Q_V;
(2.6) Form the matrix M_{L_W}(V) = P_V^T (L_W \otimes I_{l_2}) P_V;
(2.7) Compute the l_1 eigenvectors \{u_i\}_{i=1}^{l_1} of the pencil (M_{L_B}(V), M_{L_W}(V)) corresponding to the largest l_1 eigenvalues;
(2.8) Set U = [u_1, u_2, \ldots, u_{l_1}];
End Do
3. Orthogonal TSA and DTSA
Although TSA and DTSA are two effective methods for dimension reduction and feature extraction of facial images, they still have two serious defects. First, as shown in the section above, the column vectors of the left and right transformation matrices U and V are eigenvectors of symmetric positive semidefinite pencils, so they are not usually orthonormal. Orthogonality of the columns of the projection matrices is a common requirement because orthogonal projection matrices preserve the metric structure of the facial image space; orthogonal methods thus have better locality preserving power and higher discriminating power than nonorthogonal methods. Second, at each iteration step of the TSA or DTSA algorithm, two generalized eigenvalue problems must be solved to update the left and right projection matrices. As a result, when computational efficiency is critical, the relatively high computational complexity of TSA and DTSA makes them unsuitable for real applications.
In this section, we propose the orthogonal TSA (OTSA) and the orthogonal DTSA (ODTSA) for dimension reduction and feature extraction of facial images.
In OTSA, we seek the orthogonal projection matrices U and V by solving the optimization problem
(21) \max_{U \in \mathbb{R}^{L_1 \times l_1},\, V \in \mathbb{R}^{L_2 \times l_2},\, U^T U = I_{l_1},\, V^T V = I_{l_2}} \frac{\sum_{i=1}^{N} d_i \|Y_i\|^2}{\sum_{i=1}^{N} \sum_{j=1}^{N} W_{ij} \|Y_i - Y_j\|^2},
while in ODTSA, the optimization problem to be solved is
(22) \max_{U \in \mathbb{R}^{L_1 \times l_1},\, V \in \mathbb{R}^{L_2 \times l_2},\, U^T U = I_{l_1},\, V^T V = I_{l_2}} \frac{\sum_{i=1}^{C} \sum_{j=1}^{C} \|\bar{Y}_i - \bar{Y}_j\|^2 B_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{N} \|Y_i - Y_j\|^2 W_{ij}}.
Clearly, for OTSA and ODTSA, we have the following theorems.
Theorem 3.
Let U and V be the solution of the maximization problem (21). Then the following hold.
(1) For a given U, V is the solution of the trace ratio optimization problem
(23) \max_{V \in \mathbb{R}^{L_2 \times l_2},\, V^T V = I_{l_2}} \frac{\mathrm{tr}(V^T P_U^T (D \otimes I_{l_1}) P_U V)}{\mathrm{tr}(V^T P_U^T (L_W \otimes I_{l_1}) P_U V)}.
(2) For a given V, U is the solution of the trace ratio optimization problem
(24) \max_{U \in \mathbb{R}^{L_1 \times l_1},\, U^T U = I_{l_1}} \frac{\mathrm{tr}(U^T P_V^T (D \otimes I_{l_2}) P_V U)}{\mathrm{tr}(U^T P_V^T (L_W \otimes I_{l_2}) P_V U)}.
Theorem 4.
Let U and V be the solution of the maximization problem (22). Then the following hold.
(1) For a given U, V is the solution of the trace ratio optimization problem
(25) \max_{V \in \mathbb{R}^{L_2 \times l_2},\, V^T V = I_{l_2}} \frac{\mathrm{tr}(V^T Q_U^T (L_B \otimes I_{l_1}) Q_U V)}{\mathrm{tr}(V^T P_U^T (L_W \otimes I_{l_1}) P_U V)}.
(2) For a given V, U is the solution of the trace ratio optimization problem
(26) \max_{U \in \mathbb{R}^{L_1 \times l_1},\, U^T U = I_{l_1}} \frac{\mathrm{tr}(U^T Q_V^T (L_B \otimes I_{l_2}) Q_V U)}{\mathrm{tr}(U^T P_V^T (L_W \otimes I_{l_2}) P_V U)}.
The only difference between OTSA and TSA, or between ODTSA and DTSA, is that U and V are constrained to be orthogonal in OTSA and ODTSA. However, the projection matrices of the orthogonal methods are computed quite differently from those of the nonorthogonal methods: in the nonorthogonal methods, U and V are formed from eigenvectors of generalized eigenvalue problems, while in the orthogonal methods they are solutions of trace ratio optimization problems.
3.1. Trace Ratio Optimization
In this subsection, we consider the following trace ratio optimization problem:
(27) \max_{V \in \mathbb{R}^{n \times l},\, V^T V = I} \frac{\mathrm{tr}(V^T A V)}{\mathrm{tr}(V^T B V)},
where A, B \in \mathbb{R}^{n \times n} are symmetric matrices.
For the trace ratio optimization problem (27), we have the following result, which is given in [36].
Theorem 5.
Let A, B be two symmetric matrices and assume that B is positive semidefinite with rank greater than n - l. Then the ratio in (27) admits a finite maximum value \rho^*.
Define the function f(\rho) as follows:
(28) f(\rho) = \max_{V \in \mathbb{R}^{n \times l},\, V^T V = I} \mathrm{tr}(V^T (A - \rho B) V).
We collect some important properties presented in [36] on the function f(ρ) in the following theorem. Some of them indicate the relation between the trace ratio optimization problem (27) and the function f(ρ).
Theorem 6.
Let A, B be two symmetric matrices and assume that B is positive semidefinite with rank greater than n - l. Then:
(1) f(\rho) is a nonincreasing function of \rho;
(2) f(\rho) = 0 if and only if \rho = \rho^*, where \rho^* is the finite maximum value of the ratio in (27);
(3) the derivative of f(\rho) is given by
(29) \frac{df(\rho)}{d\rho} = -\mathrm{tr}[V(\rho)^T B V(\rho)],
where
(30) V(\rho) = \arg\max_{V^T V = I} \mathrm{tr}(V^T (A - \rho B) V);
(4) the columns of the solution matrix V^* of the trace ratio optimization problem (27) consist of the l eigenvectors of the matrix A - \rho^* B corresponding to the largest l eigenvalues; that is,
(31) V^* \equiv \arg\max_{V \in \mathbb{R}^{n \times l},\, V^T V = I} \frac{\mathrm{tr}(V^T A V)}{\mathrm{tr}(V^T B V)} = \arg\max_{V^T V = I} \mathrm{tr}(V^T (A - \rho^* B) V).
Theorem 6 shows that, instead of solving the trace ratio optimization problem (27) directly, its solution V^* can be obtained in two steps:
(1) compute the solution \rho^* of the nonlinear equation f(\rho) = 0;
(2) compute the l eigenvectors of the matrix A - \rho^* B corresponding to the largest l eigenvalues.
Newton's method [37] is the most well-known and widely used method for solving a nonlinear equation. The iterative scheme of Newton's method for solving f(ρ)=0 takes the form
(32) \rho_{k+1} = \rho_k - \frac{f(\rho_k)}{f'(\rho_k)} = \rho_k - \frac{\mathrm{tr}(V(\rho_k)^T (A - \rho_k B) V(\rho_k))}{-\mathrm{tr}(V(\rho_k)^T B V(\rho_k))} = \frac{\mathrm{tr}(V(\rho_k)^T A V(\rho_k))}{\mathrm{tr}(V(\rho_k)^T B V(\rho_k))},
where V(\rho_k) \in \mathbb{R}^{n \times l} consists of the l eigenvectors of the matrix A - \rho_k B corresponding to the largest l eigenvalues.
We now outline the procedure of Newton's method for solving the trace ratio optimization problem (27) in Algorithm 3.
Algorithm 3: Newton's method for trace ratio optimization.
Input: A, B and a dimension l.
Output: V which solves the trace ratio optimization problem (27).
(1) Select an initial n \times l matrix V with orthonormal columns;
(2) Compute \rho = \mathrm{tr}(V^T A V)/\mathrm{tr}(V^T B V);
(3) Until convergence Do:
(3.1) Compute the l eigenvectors \{v_i\}_{i=1}^{l} of the matrix A - \rho B corresponding to the largest l eigenvalues;
(3.2) Set V = [v_1, v_2, \ldots, v_l];
(3.3) Compute \rho = \mathrm{tr}(V^T A V)/\mathrm{tr}(V^T B V);
End Do
We remark that since Newton's method typically converges quadratically, only a few iterations are required in Algorithm 3 to obtain a good approximation of V^*. The main cost at each iteration of Algorithm 3 is the computation of the l eigenvectors of a symmetric matrix corresponding to the largest l eigenvalues.
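Algorithm 3 can be sketched in NumPy as follows. The identity-block initialization and the relative stopping rule are implementation assumptions of ours; `numpy.linalg.eigh` returns eigenvalues in ascending order, so the last l eigenvectors are the ones needed.

```python
import numpy as np

def trace_ratio_newton(A, B, l, tol=1e-10, maxit=50):
    """Newton's method for max_{V^T V = I} tr(V^T A V) / tr(V^T B V).

    A, B : symmetric n x n matrices, B positive semidefinite with rank > n - l.
    Returns (V, rho) with V an n x l maximizer and rho the maximal ratio.
    """
    n = A.shape[0]
    V = np.eye(n, l)                                   # initial orthonormal V
    rho = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)
    for _ in range(maxit):
        # V(rho): eigenvectors of A - rho B for the largest l eigenvalues
        _, vecs = np.linalg.eigh(A - rho * B)
        V = vecs[:, -l:]
        rho_new = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)  # Eq. (32)
        if abs(rho_new - rho) <= tol * max(1.0, abs(rho)):
            rho = rho_new
            break
        rho = rho_new
    return V, rho
```

At convergence, f(\rho) = 0: the sum of the largest l eigenvalues of A - \rho B vanishes, per Theorem 6.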
3.2. Lanczos Vectors
In this subsection, we review the Lanczos procedure for generating the Lanczos vectors of a symmetric matrix and the Newton-Lanczos method for solving the trace ratio optimization problem (27).
Let A be a symmetric matrix and v an initial unit vector, and let \mathcal{K}_l(A, v) denote the Krylov subspace associated with A and v, defined as
(33) \mathcal{K}_l(A, v) = \mathrm{span}\{v, Av, A^2 v, \ldots, A^{l-1} v\}.
The Lanczos vectors v_1, v_2, \ldots, v_l, which form an orthonormal basis of the Krylov subspace \mathcal{K}_l(A, v), can be generated by the three-term recurrence
(34) \beta_{k+1} v_{k+1} = A v_k - \alpha_k v_k - \beta_k v_{k-1}
with \beta_1 v_0 = 0. The coefficients \alpha_k and \beta_{k+1} are computed so as to ensure that v_{k+1}^T v_k = 0 and \|v_{k+1}\|_2 = 1. The pseudocode of the Lanczos procedure for constructing the Lanczos vectors v_1, v_2, \ldots, v_l is outlined in Algorithm 4.
Algorithm 4: Lanczos procedure.
Input: A, v and a dimension l.
Output: Lanczos vectors v_1, v_2, \ldots, v_l.
(1) Set v_1 = v, \beta_1 = 0 and v_0 = 0;
(2) For k = 1 : l
(2.1) w_k = A v_k - \beta_k v_{k-1};
(2.2) \alpha_k = w_k^T v_k;
(2.3) w_k = w_k - \alpha_k v_k;
(2.4) \beta_{k+1} = \|w_k\|_2;
(2.5) v_{k+1} = w_k / \beta_{k+1};
End For
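A direct transcription of Algorithm 4 follows; the early exit on a negligible \beta (an invariant subspace has been found) is a numerical safeguard we add, not part of the pseudocode.

```python
import numpy as np

def lanczos(A, v, l):
    """Lanczos procedure: an orthonormal basis v_1, ..., v_l of K_l(A, v)."""
    n = A.shape[0]
    V = np.zeros((n, l))
    v_prev = np.zeros(n)
    beta = 0.0
    v_k = v / np.linalg.norm(v)            # v_1 = v (normalized)
    for k in range(l):
        V[:, k] = v_k
        w = A @ v_k - beta * v_prev        # step (2.1)
        alpha = w @ v_k                    # step (2.2)
        w = w - alpha * v_k                # step (2.3)
        beta = np.linalg.norm(w)           # step (2.4)
        if beta < 1e-14:                   # safeguard: invariant subspace reached
            break
        v_prev, v_k = v_k, w / beta        # step (2.5)
    return V
```

By construction, V^T A V is the tridiagonal matrix with diagonal \alpha_k and off-diagonal \beta_{k+1}.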
It is known [38] that Lanczos vectors are typically good approximations of the eigenvectors of a symmetric matrix corresponding to the largest eigenvalues. It is therefore reasonable to substitute, in Algorithm 3, the Lanczos vectors of the matrix A - \rho B for its l eigenvectors corresponding to the largest l eigenvalues, saving the expensive eigenvector computation. This substitution yields the Newton-Lanczos method for solving the trace ratio optimization problem (27), which is outlined in Algorithm 5; see also [36].
Algorithm 5: Newton-Lanczos method for trace ratio optimization.
Input: A, B and a dimension l.
Output: V which solves the trace ratio optimization problem (27).
(1) Select an initial n \times l matrix V with orthonormal columns;
(2) Compute \rho = \mathrm{tr}(V^T A V)/\mathrm{tr}(V^T B V);
(3) Until convergence Do:
(3.1) Compute the l Lanczos vectors \{v_i\}_{i=1}^{l} of A - \rho B by Algorithm 4;
(3.2) Set V = [v_1, v_2, \ldots, v_l];
(3.3) Compute \rho = \mathrm{tr}(V^T A V)/\mathrm{tr}(V^T B V);
End Do
3.3. OTSA and ODTSA
Similarly, from Theorems 3 and 4, we obtain two iterative procedures for computing the left and right transformation matrices U and V of OTSA and ODTSA. Algorithms 6 and 7 summarize the steps to compute U and V for OTSA and ODTSA, respectively.
Algorithm 6: OTSA.
Input: a set of N sample matrices \{X_i\}_{i=1}^{N} with class label information; l_1, l_2.
Output: left and right transformation matrices U and V.
(1) Initialize U with an identity matrix;
(2) Until convergence Do:
(2.1) Form the matrix M_D(U) = P_U^T (D \otimes I_{l_1}) P_U;
(2.2) Form the matrix M_L(U) = P_U^T (L_W \otimes I_{l_1}) P_U;
(2.3) Compute V by solving the trace ratio optimization problem (27) with A = M_D(U) and B = M_L(U);
(2.4) Form the matrix M_D(V) = P_V^T (D \otimes I_{l_2}) P_V;
(2.5) Form the matrix M_L(V) = P_V^T (L_W \otimes I_{l_2}) P_V;
(2.6) Compute U by solving the trace ratio optimization problem (27) with A = M_D(V) and B = M_L(V).
End Do
Algorithm 7: ODTSA.
Input: a set of N sample matrices \{X_i\}_{i=1}^{N} with class label information; l_1, l_2.
Output: left and right transformation matrices U and V.
(1) Initialize U with an identity matrix;
(2) Until convergence Do:
(2.1) Form the matrix M_{L_B}(U) = Q_U^T (L_B \otimes I_{l_1}) Q_U;
(2.2) Form the matrix M_{L_W}(U) = P_U^T (L_W \otimes I_{l_1}) P_U;
(2.3) Compute V by solving the trace ratio optimization problem (27) with A = M_{L_B}(U) and B = M_{L_W}(U);
(2.4) Form the matrix M_{L_B}(V) = Q_V^T (L_B \otimes I_{l_2}) Q_V;
(2.5) Form the matrix M_{L_W}(V) = P_V^T (L_W \otimes I_{l_2}) P_V;
(2.6) Compute U by solving the trace ratio optimization problem (27) with A = M_{L_B}(V) and B = M_{L_W}(V).
End Do
The trace ratio optimization problems in Algorithms 6 and 7 can be solved by Newton's method or by the Newton-Lanczos method. To distinguish the two cases, we write OTSA-N and ODTSA-N for the variants in which the trace ratio optimization problem is solved by Newton's method, and OTSA-NL and ODTSA-NL for the variants in which it is solved by the Newton-Lanczos method.
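Putting the pieces together, one possible realization of OTSA (Algorithm 6) with an inner Newton solver looks like the sketch below. The matrix assembly via `einsum`, the ridge regularization of the denominator matrices, and the fixed iteration counts are our implementation choices, not prescribed by the paper.

```python
import numpy as np

def trace_ratio(A, B, l, iters=10):
    """Inner solver for problem (27) by Newton's iteration (Algorithm 3)."""
    V = np.eye(A.shape[0], l)
    rho = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)
    for _ in range(iters):
        V = np.linalg.eigh(A - rho * B)[1][:, -l:]       # top-l eigenvectors
        rho = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)
    return V

def otsa(X, d, LW, l1, l2, outer_iters=5, ridge=1e-8):
    """OTSA (Algorithm 6): alternating trace-ratio updates of orthogonal U and V.

    X  : (N, L1, L2) sample matrices; d : (N,) diagonal of D; LW : (N, N) Laplacian.
    """
    N, L1, L2 = X.shape
    U = np.eye(L1, l1)
    for _ in range(outer_iters):
        PU = np.einsum('ab,nbc->nac', U.T, X)                      # blocks U^T X_i
        MD = np.einsum('n,nac,nad->cd', d, PU, PU)                 # M_D(U)
        ML = np.einsum('ij,iac,jad->cd', LW, PU, PU) + ridge * np.eye(L2)
        V = trace_ratio(MD, ML, l2)                                # step (2.3)
        PV = np.einsum('ab,ncb->nac', V.T, X)                      # blocks V^T X_i^T
        MD2 = np.einsum('n,nac,nad->cd', d, PV, PV)                # M_D(V)
        ML2 = np.einsum('ij,iac,jad->cd', LW, PV, PV) + ridge * np.eye(L1)
        U = trace_ratio(MD2, ML2, l1)                              # step (2.6)
    return U, V
```

Because each update takes eigenvectors of a symmetric matrix, the returned U and V have exactly orthonormal columns; ODTSA differs only in using the Q_U, Q_V and L_B matrices in the numerators.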
3.4. Computational Complexity Analysis
We now discuss the computational complexity of TSA, DTSA, OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL.
In each iteration of TSA, it costs about 2 l_1 L_1 L_2 N, 2 l_2 L_1 L_2 N, 2 l_1 L_2^2 N, 2 l_1 L_2^2 N, 2 l_2 L_1^2 N, and 2 l_2 L_1^2 N flops (floating-point operations) to compute P_U, P_V, M_D(U), M_L(U), M_D(V), and M_L(V), respectively. Moreover, it takes 66 L_2^3 flops to compute the eigenvectors of the pencil (M_D(U), M_L(U)) and 66 L_1^3 flops for (M_D(V), M_L(V)). So, the total cost of each iteration of TSA is about (2(l_1 + l_2) L_1 L_2 + 4 l_1 L_2^2 + 4 l_2 L_1^2) N + 66(L_1^3 + L_2^3) flops.
The main difference between DTSA and TSA is that the matrices M_D(U), M_D(V) in TSA are replaced by M_{L_B}(U), M_{L_B}(V) in DTSA. Computing Q_U, Q_V, M_{L_B}(U), and M_{L_B}(V) in each iteration of DTSA costs about 2 l_1 L_1 L_2 C, 2 l_2 L_1 L_2 C, 2 l_1 L_2^2 C, and 2 l_2 L_1^2 C flops. Thus, DTSA costs about (2(l_1 + l_2) L_1 L_2 + 2 l_1 L_2^2 + 2 l_2 L_1^2)(N + C) + 66(L_1^3 + L_2^3) flops per iteration. When C \ll N, the computational cost of DTSA is less than that of TSA.
It is known that solving the trace ratio optimization problem (27) costs about 9 T n^3 flops with Newton's method (Algorithm 3) and about 2 T l n^2 flops with the Newton-Lanczos method (Algorithm 5), where T is the number of Newton iteration steps. So, the total cost of each iteration of OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL is about (2(l_1 + l_2) L_1 L_2 + 4 l_1 L_2^2 + 4 l_2 L_1^2) N + 9T(L_1^3 + L_2^3), (2(l_1 + l_2) L_1 L_2 + 2 l_1 L_2^2 + 2 l_2 L_1^2)(N + C) + 9T(L_1^3 + L_2^3), (2(l_1 + l_2) L_1 L_2 + 4 l_1 L_2^2 + 4 l_2 L_1^2) N + 2T(l_1 L_1^2 + l_2 L_2^2), and (2(l_1 + l_2) L_1 L_2 + 2 l_1 L_2^2 + 2 l_2 L_1^2)(N + C) + 2T(l_1 L_1^2 + l_2 L_2^2) flops, respectively. In general, 2 or 3 iteration steps suffice for the convergence of the Newton iteration. Therefore, OTSA and ODTSA require less computation than TSA and DTSA.
The time complexity of TSA, DTSA, OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL is presented in Table 1. We note that the space complexity of all the methods is L_1 L_2 N.

Table 1: Time complexity of TSA, DTSA, OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL.

Method   | Time complexity
TSA      | (2(l_1+l_2)L_1L_2 + 4l_1L_2^2 + 4l_2L_1^2)N + 66(L_1^3 + L_2^3)
DTSA     | (2(l_1+l_2)L_1L_2 + 2l_1L_2^2 + 2l_2L_1^2)(N+C) + 66(L_1^3 + L_2^3)
OTSA-N   | (2(l_1+l_2)L_1L_2 + 4l_1L_2^2 + 4l_2L_1^2)N + 9T(L_1^3 + L_2^3)
ODTSA-N  | (2(l_1+l_2)L_1L_2 + 2l_1L_2^2 + 2l_2L_1^2)(N+C) + 9T(L_1^3 + L_2^3)
OTSA-NL  | (2(l_1+l_2)L_1L_2 + 4l_1L_2^2 + 4l_2L_1^2)N + 2T(l_1L_1^2 + l_2L_2^2)
ODTSA-NL | (2(l_1+l_2)L_1L_2 + 2l_1L_2^2 + 2l_2L_1^2)(N+C) + 2T(l_1L_1^2 + l_2L_2^2)
It is well known that Newton's method for a nonlinear equation typically converges quadratically, so it converges very fast for the nonlinear equation f(\rho) = 0, where f(\rho) is defined in (28). We have observed in our numerical experiments that 5 Newton iteration steps are enough for convergence. Therefore, the total computational cost of OTSA and ODTSA for obtaining the left and right transformation matrices U and V is much less than that of TSA and DTSA.
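To see the gap concretely, the per-iteration flop formulas above can be evaluated numerically. The problem sizes below (L_1 = L_2 = 100, l_1 = l_2 = 10, N = 200, C = 40, T = 3) are illustrative assumptions chosen only for this comparison, not the sizes used in the paper's experiments.

```python
# Per-iteration flop counts from the complexity analysis above,
# for hypothetical problem sizes (not taken from the paper's experiments).
L1 = L2 = 100          # image dimensions
l1 = l2 = 10           # reduced dimensions
N, C, T = 200, 40, 3   # samples, classes, inner Newton steps

common = (2 * (l1 + l2) * L1 * L2 + 4 * l1 * L2**2 + 4 * l2 * L1**2) * N
tsa     = common + 66 * (L1**3 + L2**3)                # generalized eigensolves
otsa_n  = common + 9 * T * (L1**3 + L2**3)             # Newton: full symmetric eigensolves
otsa_nl = common + 2 * T * (l1 * L1**2 + l2 * L2**2)   # Newton-Lanczos: l Lanczos vectors
print(f"TSA: {tsa:.3g}  OTSA-N: {otsa_n:.3g}  OTSA-NL: {otsa_nl:.3g} flops")
```

For these sizes the eigensolver term dominated by 66(L_1^3 + L_2^3) shrinks to the 2T(l_1 L_1^2 + l_2 L_2^2) Lanczos term, which is where the Newton-Lanczos savings come from.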
4. Experimental Results
In order to evaluate the performance of the proposed OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL algorithms, two well-known face image databases, ORL (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html) and Yale (http://www.cad.zju.edu.cn/home/dengcai/Data/data.html), are used in the experiments. We compare the recognition performance of OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL with that of TSA [34] and DTSA [35]. In the experiments, the nearest neighbor classifier is used to classify the projected features of the samples obtained by the different methods.
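The classification stage can be sketched as a plain 1-NN rule over the projected feature matrices Y_i = U^T X_i V, with the Frobenius norm as the distance; the function name and the toy data in the test are our own.

```python
import numpy as np

def nearest_neighbor_predict(train_Y, train_labels, test_Y):
    """1-NN classification of projected feature matrices,
    using the Frobenius norm as the distance between features."""
    preds = []
    for Yt in test_Y:
        dists = [np.linalg.norm(Yt - Ytr) for Ytr in train_Y]
        preds.append(train_labels[int(np.argmin(dists))])
    return np.array(preds)
```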
4.1. Experiment on the ORL Database of Face Images
The ORL database contains 400 images of 40 individuals. Each individual has 10 images, taken at different times, under different lighting conditions, with different facial expressions, and with different accessories (glasses/no glasses). The sample images of one individual from the ORL database are shown in Figure 1.
Sample images for one individual of the ORL database.
We randomly select i (i = 2, 3, …, 8) samples of each individual for training, and the remaining ones are used for testing. Based on the training set, the projection matrices are obtained by TSA, DTSA, OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL. Then all the testing samples are projected to generate the low-dimensional samples, which are recognized by the nearest neighbor classifier. We repeat this process 10 times and calculate the mean and standard deviation of the recognition rates.
In our experiments, the parameters l_1 and l_2 in all the methods are set to 10, and the parameter t is set to 1. The mean and standard deviation of the recognition accuracy over the 10 runs for the six algorithms are presented in Table 2. The training time for each method is presented in Table 3. The results show that for all methods, the recognition accuracy increases with the training sample size. Moreover, the orthogonal methods have higher recognition accuracy than their nonorthogonal versions, and the orthogonal methods based on the Newton-Lanczos approach require the least training time.
Table 2: Recognition accuracy (%) on the ORL database (mean ± std).

        | TSA        | DTSA       | OTSA-N     | OTSA-NL    | ODTSA-N    | ODTSA-NL
2 Train | 75.62±3.02 | 78.03±3.28 | 77.89±3.34 | 77.44±3.24 | 81.25±3.04 | 80.78±3.16
3 Train | 84.56±2.42 | 88.07±2.26 | 87.32±2.45 | 87.54±2.35 | 89.87±2.44 | 89.57±2.47
4 Train | 90.19±1.55 | 92.25±1.80 | 91.67±1.56 | 91.87±1.42 | 93.95±1.84 | 93.64±1.78
5 Train | 92.55±1.16 | 94.35±1.23 | 93.30±2.08 | 93.35±1.78 | 95.39±1.28 | 95.42±1.38
6 Train | 93.77±1.44 | 95.63±1.79 | 94.65±1.80 | 94.46±1.69 | 96.34±1.76 | 94.50±1.58
7 Train | 94.66±1.36 | 96.25±1.98 | 95.67±2.19 | 95.48±1.98 | 97.88±2.09 | 97.60±1.23
8 Train | 96.50±1.75 | 97.25±1.65 | 97.32±1.44 | 97.52±1.56 | 98.76±1.30 | 98.45±1.57
Table 3: Training time (seconds) on the ORL database.

        | TSA    | DTSA   | OTSA-N | OTSA-NL | ODTSA-N | ODTSA-NL
4 Train | 1.5440 | 1.6573 | 0.8912 | 0.2438  | 0.8766  | 0.2518
8 Train | 2.1647 | 2.2371 | 1.0576 | 0.3429  | 1.1592  | 0.3262
4.2. Experiment on the Yale Database
The Yale face database contains 165 gray-scale images of 15 individuals, each with 11 images. These facial images have variations in lighting conditions (left-light, center-light, right-light), facial expressions (normal, happy, sad, sleepy, surprised, and wink), and with/without glasses. The 11 sample images of one individual from the Yale database are shown in Figure 2.
Sample images for one individual of the Yale database.
As in the previous experiments, the parameters l_1 and l_2 are set to 10, and t is set to 1. The mean and standard deviation of the recognition accuracy over 10 runs for the Yale database are presented in Table 4. The training time of each method for the Yale database is presented in Table 5. Clearly, ODTSA-N and ODTSA-NL achieve higher recognition accuracy than TSA, DTSA, OTSA-N, and OTSA-NL on this database, and ODTSA-NL and OTSA-NL outperform TSA, DTSA, OTSA-N, and ODTSA-N in terms of training time.
Table 4: Recognition accuracy (%) on Yale database (mean ± std).

           TSA          DTSA         OTSA-N       OTSA-NL      ODTSA-N      ODTSA-NL
2 Train    41.78±5.04   45.41±5.97   43.41±5.97   43.51±6.10   47.59±5.36   47.23±5.87
3 Train    52.25±4.43   56.17±3.40   54.72±3.44   54.61±4.09   58.33±3.15   58.47±3.54
4 Train    59.14±3.90   63.10±4.00   62.10±4.04   62.76±4.20   66.45±4.32   66.33±4.15
5 Train    64.22±3.40   65.78±4.51   65.34±4.41   65.69±4.53   67.98±4.09   67.63±4.60
6 Train    69.00±4.54   70.13±5.59   70.52±5.33   70.46±5.84   72.88±5.67   72.59±5.48
7 Train    72.17±3.43   74.29±4.65   74.47±4.92   74.65±3.70   76.79±4.28   76.72±4.12
8 Train    74.78±4.38   76.89±5.62   76.56±5.13   76.83±6.27   79.43±5.05   79.26±5.42
Table 5: Training time (seconds) on Yale database.

           TSA      DTSA     OTSA-N   OTSA-NL   ODTSA-N   ODTSA-NL
4 Train    0.5497   0.5744   0.3468   0.0980    0.3362    0.0933
8 Train    0.7808   0.8059   0.4378   0.1350    0.5484    0.1311
5. Conclusion
In this paper, we have proposed orthogonal TSA (OTSA) and orthogonal DTSA (ODTSA) for face recognition by constraining the left and right projection matrices to be orthogonal. Like TSA and DTSA, OTSA and ODTSA compute the left and right projection matrices iteratively. However, instead of solving two generalized eigenvalue problems at each iteration as in TSA and DTSA, they solve two trace ratio optimization problems. Since the trace ratio optimization problem can be solved by the inexpensive Newton-Lanczos method, OTSA and ODTSA have much lower computational cost than their nonorthogonal counterparts. Experimental results show that the proposed methods achieve much higher recognition accuracy and require much less training time than TSA and DTSA.
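As a minimal illustration of the trace ratio iteration underlying this approach, the following sketch maximizes tr(V^T A V) / tr(V^T B V) over orthonormal V by the standard Newton-type fixed-point iteration. Here A and B are generic stand-ins for the scatter-like matrices formed at each step of OTSA/ODTSA, and a dense symmetric eigensolver replaces the Lanczos method used in the paper for large problems.

```python
import numpy as np

def trace_ratio(A, B, p, tol=1e-10, max_iter=100):
    """Maximize tr(V^T A V) / tr(V^T B V) subject to V^T V = I.

    A, B : symmetric (n, n) matrices, B positive definite
    p    : number of columns of V (reduced dimension)

    Newton iteration: given rho, take V as the top-p eigenvectors of
    A - rho*B, then update rho to the trace ratio attained at V.
    """
    rho = 0.0
    for _ in range(max_iter):
        # top-p eigenvectors of the shifted pencil A - rho*B
        _, U = np.linalg.eigh(A - rho * B)
        V = U[:, -p:]
        rho_new = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)
        converged = abs(rho_new - rho) < tol * max(1.0, abs(rho))
        rho = rho_new
        if converged:
            break
    return V, rho
```

The returned V is orthonormal by construction, and the iteration converges to the global maximizer of the trace ratio; replacing `np.linalg.eigh` with a Lanczos-based sparse eigensolver gives the inexpensive variant advocated in the paper.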
Conflict of Interests
The authors declare that there is no conflict of interests.
Acknowledgments
Yiqin Lin is supported by the National Natural Science Foundation of China under Grant 10801048, the Natural Science Foundation of Hunan Province under Grant 11JJ4009, the Scientific Research Foundation of Education Bureau of Hunan Province for Outstanding Young Scholars in University under Grant 10B038, the Science and Technology Planning Project of Hunan Province under Grant 2010JT4042, and the Chinese Postdoctoral Science Foundation under Grant 2012M511386. Liang Bao is supported by the National Natural Science Foundation of China under Grants 10926150 and 11101149 and the Fundamental Research Funds for the Central Universities.