Local graph based discriminant analysis (DA) algorithms have recently attracted increasing attention as a way to mitigate the limitations of global (graph) DA algorithms. However, little particular attention has been paid to the following important issues: whether the local construction is better than the global one for the intraclass and interclass graphs; which graph (intraclass or interclass) should be constructed locally or globally; and, further, how the two should be effectively combined for good discriminant performance. In this paper, pursuing our previous studies on graph construction and DA, we first address the issues above and then, by jointly utilizing both globality and locality, develop a Globally marginal and Locally compact Discriminant Analysis (GmLcDA) algorithm based on so-introduced global interclass and local intraclass graphs, and a Locally marginal and Globally compact Discriminant Analysis (LmGcDA) based on so-introduced local interclass and global intraclass graphs; the purpose of developing them is not to show how novel the algorithms are but to illustrate the theoretical analyses. Further, by comprehensively comparing the Locally marginal and Locally compact DA (LmLcDA), based on locality alone, the Globally marginal and Globally compact Discriminant Analysis (GmGcDA), based on globality alone, GmLcDA, and LmGcDA, we suggest that combining a locally constructed intraclass graph with a globally constructed interclass graph is more discriminant.
1. Introduction
Discriminant analysis (DA) techniques [1] are indispensable in many fields, including machine learning, pattern recognition, data compression, scientific visualization, and neural computation. Multiple discriminant analysis (MDA) [2–4] is one of the most popular global DA methods. However, because it constructs both the intraclass and interclass graphs globally, it generally fails to capture underlying local structures in the data, for example, the many low-dimensional local manifolds on which samples in the original input space may reside. To mitigate such limitations, plenty of local graph based DA algorithms have been proposed as powerful tools, typically including marginal Fisher analysis (MFA) [5] and its variants [6], locality sensitive discriminant analysis (LSDA) [7], LDE [8], and ANMM [9–15]. These algorithms construct both the intraclass and interclass graphs locally. However, is the local construction always better than the global one for the intraclass and interclass graphs? Subsequently, some globally maximizing and locally minimizing DR algorithms were proposed [16–18]. By contrast, no locally marginal and globally compact DA algorithm has been studied. Several issues need to be addressed: which graph (intraclass or interclass) should be constructed locally or globally, and how should the two be effectively combined for good discriminant performance? To date, to our knowledge, these issues have received little particular attention. So, pursuing our previous studies on graph construction and DA [19–21], in this paper we elaborately address the issues above.
Concretely, we first illustrate the meanings of local compactness, global compactness, local margin, and global margin in DA, as shown in Figure 1. Second, we formulate globally constructed intraclass and interclass graphs. Third, resorting to the relation between the scatter and the structure preservation property of graph based DA, we demonstrate by Propositions 1 and 3 and Corollaries 2 and 4 that the interclass graph should be constructed globally and the intraclass graph locally. Finally, by jointly utilizing both globality and locality, we develop two DA algorithms: the Globally marginal and Locally compact Discriminant Analysis (GmLcDA) algorithm, based on so-introduced global interclass and local intraclass graphs, and the Locally marginal and Globally compact Discriminant Analysis (LmGcDA), based on so-introduced local interclass and global intraclass graphs. It is worth pointing out that the purpose of developing both algorithms is not to show how novel they are but to illustrate the theoretical analyses. Further, we perform comparative experiments among GmLcDA, LmGcDA, LmLcDA (MFA and LSDA), and GmGcDA (MDA) on toy and real-world datasets. From these comparisons, we suggest that combining a locally constructed intraclass graph with a globally constructed interclass graph is more discriminant. (It should be pointed out that the two concepts of adjacency matrix and graph are used interchangeably throughout the paper, since a graph corresponds to an adjacency matrix.)
Figure 1: Illustration of local and global constructions for intraclass and interclass graphs, where Lc refers to local compactness, Gc to global compactness, Lm to local margin, and Gm to global margin.
The rest of this paper is organized as follows. In Section 2, graph construction and two typical DA algorithms, MDA and MFA, are briefly reviewed. In Section 3, we first clarify the meanings of compactness and margin in DA and introduce the global intraclass and interclass graphs, then heuristically demonstrate the issues raised above, develop GmLcDA and LmGcDA, and finally compare GmLcDA, LmGcDA, LmLcDA, and GmGcDA. In Section 4, the comparative experiments are performed. Finally, suggestions and remarks for future work are given in Section 5.
2. Related Works
2.1. Graph Construction
Let $X=\{x_1,\ldots,x_n\}$, $x_i\in\mathbb{R}^{D}$, denote a set of $n$ samples. Current graph constructions mainly include two types: $k$-nearest-neighbor and $\varepsilon$-neighborhood [22]. The adjacency matrix $S$ is constructed by weighting the edges of a graph with a similarity function, mainly in one of two ways: the heat kernel or the 0-1 weighting [22]. The graph construction in this work focuses on the latter, owing to its simplicity and generality:
(1) $S_{ij}=\begin{cases}1,&\text{if } x_i \text{ is among the } k \text{ nearest neighbors of } x_j \text{ or } x_j \text{ is among the } k \text{ nearest neighbors of } x_i,\\ 0,&\text{otherwise}.\end{cases}$
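As a quick illustration, the 0-1 $k$-nearest-neighbor rule of (1) can be sketched in a few lines of NumPy; the helper name `knn_adjacency` is ours, not from the paper:

```python
import numpy as np

def knn_adjacency(X, k):
    """0-1 adjacency of eq. (1): S_ij = 1 iff x_i is among the k nearest
    neighbors of x_j or x_j is among the k nearest neighbors of x_i."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # a sample is not its own neighbor
    S = np.zeros((n, n))
    for i in range(n):
        S[i, np.argsort(d2[i])[:k]] = 1                   # directed k-NN edges
    return np.maximum(S, S.T)                             # the "or" rule symmetrizes

# two well-separated pairs of points; with k = 1 each point links to its twin
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
S = knn_adjacency(X, k=1)
```

Note that the "or" symmetrization means a sample can end up with more than $k$ edges, which is the standard behavior of this construction.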
2.2. Typical DA Algorithms
From the viewpoint of graph embedding [5], MDA is deemed an instance of GmGcDA here. Given a dataset $X$ of $n$ samples belonging to $c$ classes, with labels $l(x_1),l(x_2),\ldots,l(x_n)$, $l(x_i)\in\{1,2,\ldots,c\}$, MDA seeks the projection directions that maximize the interclass margin and simultaneously minimize the intraclass scatter; it thus preserves the global structure of the data but fails to discover the local geometric structure of manifold data embedded in the ambient space.
In order to mitigate the limitations of global algorithms, there is increasing interest in graph embedding based DA algorithms. MFA is a typical one, induced from the graph embedding framework for dimensionality reduction [5]. According to this framework, MFA constructs a local intraclass graph with adjacency matrix $S$ to characterize the intraclass compactness and a local interclass graph with adjacency matrix $S^{p}$ to characterize the interclass separability:

(2a) $S_{ij}=\begin{cases}1,&\text{if } i\in N_{k_1}^{+}(j)\text{ or } j\in N_{k_1}^{+}(i),\\0,&\text{otherwise},\end{cases}$

(2b) $S_{ij}^{p}=\begin{cases}1,&\text{if }(i,j)\in P_{k_2}(c_i)\text{ or }(i,j)\in P_{k_2}(c_j),\\0,&\text{otherwise},\end{cases}$

where $N_{k_1}^{+}(i)$ denotes the index set of the $k_1$ nearest neighbors of sample $x_i$ within the same class and $P_{k_2}(c)$ is the set of the $k_2$ nearest sample pairs between class $c$ and the other classes.
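The two local graphs of (2a) and (2b) can be sketched as follows; `mfa_graphs` is a hypothetical helper, and selecting, for each class, the $k_2$ shortest cross-class pairs is one plausible reading of $P_{k_2}(c)$:

```python
import numpy as np

def mfa_graphs(X, y, k1, k2):
    """Sketch of MFA's two local graphs (eqs. (2a)-(2b)): an intraclass
    k1-NN graph S and an interclass graph Sp linking, per class, the k2
    shortest between-class sample pairs. Illustrative, not the authors' code."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    S = np.zeros((n, n))
    Sp = np.zeros((n, n))
    for i in range(n):
        same = np.where(y == y[i])[0]
        same = same[same != i]                       # same-class candidates
        nn = same[np.argsort(d2[i, same])[:k1]]
        S[i, nn] = S[nn, i] = 1                      # intraclass "or" symmetrization
    for c in np.unique(y):
        ins = np.where(y == c)[0]
        outs = np.where(y != c)[0]
        pairs = sorted((d2[i, j], i, j) for i in ins for j in outs)
        for _, i, j in pairs[:k2]:                   # k2 shortest cross-class pairs
            Sp[i, j] = Sp[j, i] = 1
    return S, Sp

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
S, Sp = mfa_graphs(X, y, k1=1, k2=1)
```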
Here, we refer to such DA algorithms, based on local intraclass and local interclass graphs, as Locally marginal and Locally compact DA (LmLcDA).
3. Analyzing and Addressing Issues
From the reviews above, we find that the motivation of most local graph based DA algorithms, despite their different formulations, is to mitigate the limitations of global algorithms. However, to date, there has been little particular analysis of whether the local construction is always better than the global one for the intraclass and interclass graphs, of which graph (intraclass or interclass) should be constructed locally or globally, or of how the two should be effectively combined for good discriminant performance. In this section, in order to further analyze and address these issues, we first illustrate the meanings of local compactness, global compactness, local margin, and global margin in DA, then formally introduce the globally constructed intraclass graph $S^{gc}$ and interclass graph $S^{gm}$, elaborately analyze and address the issues above, and finally develop Globally marginal and Locally compact Discriminant Analysis (GmLcDA) and Locally marginal and Globally compact Discriminant Analysis (LmGcDA).
3.1. Meanings of Compactness and Margin in DA
We first illustrate the meanings of local compactness, global compactness, local margin, and global margin in DA, all shown in Figure 1. Figure 1(a) shows the local compactness structures of $x_i$ and $x_j$, where $x_i$ is locally linked with the five dots within the same class and $x_j$ with the pentacles; for clearer display, these structures are enclosed by the two pink dashed ellipses. Meanwhile, Figure 1(b) shows the local margin structures of $x_p$ and $x_q$, where $x_p$ is locally linked with the four pentacles from the other class and $x_q$ with the dots; these structures are enclosed, respectively, by the blue and cyan dotted ellipses. By contrast, Figure 1(c) shows the global compactness structures of classes 1 and 2, while Figure 1(d) shows the global margin structure between the two classes. It should be noted that the gray dashed and dotted ellipses do not denote clusters but enclose all points linked within them.
3.2. Globally Constructed Intraclass and Interclass Graphs
The global intraclass graph Sgc is formulated as follows:
(3) $S_{ij}^{gc}=\begin{cases}1,&\text{if } x_i \text{ and } x_j \text{ belong to the same class},\\0,&\text{otherwise},\end{cases}$
and the global interclass graph Sgm as follows:
(4) $S_{ij}^{gm}=\begin{cases}1,&\text{if } x_i \text{ and } x_j \text{ belong to different classes},\\0,&\text{otherwise}.\end{cases}$
From the formulations of $S^{gc}$ in (3) and $S^{gm}$ in (4), it can be seen that the globally constructed intraclass and interclass graphs are parameter-free. In order to compare them with locally constructed graphs, their neighbor parameters may be viewed, respectively, as $\sum_{i=1}^{c}n_i^{2}$ and $n^{2}-\sum_{i=1}^{c}n_i^{2}$ for a dataset with $c$ classes and $n$ samples ($n_i$ samples in class $i$), where $\sum_{i=1}^{c}n_i^{2}$ denotes the maximum number of intraclass neighbor sample pairs.
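A minimal sketch of the parameter-free global graphs of (3) and (4), together with the pair counts just mentioned (the helper name is ours):

```python
import numpy as np

def global_graphs(y):
    """Parameter-free global graphs of eqs. (3)-(4): S^gc links every
    same-class pair, S^gm links every different-class pair."""
    same = (y[:, None] == y[None, :]).astype(float)
    return same, 1.0 - same                        # (S^gc, S^gm)

y = np.array([0, 0, 0, 1, 1])                      # c = 2, n = 5, n_1 = 3, n_2 = 2
Sgc, Sgm = global_graphs(y)
# "neighbor parameter" analogues: sum_i n_i^2 = 3^2 + 2^2 = 13 intraclass
# pairs (self-pairs included) and n^2 - sum_i n_i^2 = 25 - 13 = 12 interclass pairs
n_intra = int(Sgc.sum())
n_inter = int(Sgm.sum())
```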
3.3. Heuristic Demonstration of Issues
In this subsection, resorting to the relation between the scatter and the structure preservation property of graph based DA, we demonstrate by Propositions 1 and 3 and Corollaries 2 and 4 that the interclass graph should be constructed globally and the intraclass graph locally. The inequalities in the two propositions and corollaries quantify the scatter discrepancies between locally and globally constructed graphs in the input space. On the other hand, there is a geometric structure preservation hypothesis: the intraclass graph in DA preserves the compact structures of the input space in the embedded space, while the interclass graph preserves the margin structures. Under this hypothesis, the scatter inequalities heuristically demonstrate, to some extent, that the intraclass graph should be constructed locally and the interclass graph globally.
Proposition 1.
For a locally constructed intraclass graph with adjacency matrix $S^{t}$ and neighbor parameter $t$, the intraclass scatters corresponding to $S^{t_1}$ and $S^{t_2}$ satisfy

(5) $\sum_{i,j=1}^{n}\|x_i-x_j\|^{2}S_{ij}^{t_1}<\sum_{i,j=1}^{n}\|x_i-x_j\|^{2}S_{ij}^{t_2},\quad\text{if } t_1<t_2,$

where the parameter $t$ stands for the number of nearest neighbors of sample $x_i$ or $x_j$ within the same class, as $k_1$ defined in (2a), and $t_1$ and $t_2$ denote the values taken by $t$ in $S^{t_1}$ and $S^{t_2}$, respectively.
Proof.
According to the definition of the adjacency matrix $S^{t}$, each element $S_{ij}^{t}$ is 1 or 0. If $t_1<t_2$, the number of entries equal to 1 in $S^{t_1}$ is less than that in $S^{t_2}$, and $\|x_i-x_j\|^{2}>0$ when $i\neq j$. Hence, for $t_1<t_2$, the intraclass scatters satisfy $\sum_{i,j=1}^{n}\|x_i-x_j\|^{2}S_{ij}^{t_1}<\sum_{i,j=1}^{n}\|x_i-x_j\|^{2}S_{ij}^{t_2}$. The proof is completed.
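Proposition 1 can also be checked numerically on random data: the intraclass scatter of a $t$-NN graph grows monotonically with $t$. The code and helper name below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))          # one class of 30 samples, 5 features

def intraclass_scatter(X, t):
    """sum_{i,j} ||x_i - x_j||^2 S^t_ij for a t-NN intraclass graph."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)      # exclude self-neighbors when sorting
    S = np.zeros((n, n))
    for i in range(n):
        S[i, np.argsort(d2[i])[:t]] = 1
    S = np.maximum(S, S.T)            # symmetric "or" rule
    np.fill_diagonal(d2, 0.0)         # restore zeros before weighting
    return (d2 * S).sum()

# scatter is strictly increasing in t, as eq. (5) states
scatters = [intraclass_scatter(X, t) for t in (2, 5, 10, 29)]
```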
Corollary 2.
$\sum_{i,j=1}^{n}\|x_i-x_j\|^{2}S_{ij}^{k_1}\leq\sum_{i,j=1}^{n}\|x_i-x_j\|^{2}S_{ij}^{gc}.$
From Proposition 1 it can be seen that Corollary 2 holds, since the parameter $k_1\leq\sum_{i=1}^{c}n_i^{2}$.
From Proposition 1 and Corollary 2 it can be seen that the intraclass scatter of a locally constructed graph in the input space is no larger than that of a globally constructed graph. It is well known that the intraclass graph in DA algorithms aims to preserve the local compactness structures of intraclass samples from the input space into the embedded space, as shown in Figure 1(a), not Figure 1(c). According to the geometric structure preservation property of graph embedding DA algorithms, a small intraclass scatter in the input space is often also small in the embedded space (and, likewise, a large scatter remains large); hence the compactness of intraclass samples given by a locally constructed graph can often be preserved in the low-dimensional space. Thus the intraclass graph should usually be constructed locally rather than globally, which is consistent with the statement in [23] that, empirically, a small neighbor parameter tends to perform better.
Proposition 3.
For a locally constructed interclass graph with adjacency matrix $S^{p,g}$ and neighbor parameter $g$, the interclass scatters corresponding to $S^{p,g_1}$ and $S^{p,g_2}$ satisfy

(6) $\sum_{i,j=1}^{n}\|x_i-x_j\|^{2}S_{ij}^{p,g_2}>\sum_{i,j=1}^{n}\|x_i-x_j\|^{2}S_{ij}^{p,g_1},\quad\text{if } g_2>g_1,$

where the parameter $g$ stands for the number of nearest sample pairs from different classes, as $k_2$ defined in (2b), and $g_1$ and $g_2$ denote the values taken by $g$ in $S^{p,g_1}$ and $S^{p,g_2}$, respectively.
The proof is similar to that of Proposition 1 and is thus omitted.
Corollary 4.
$\sum_{i,j=1}^{n}\|x_i-x_j\|^{2}S_{ij}^{gm}\geq\sum_{i,j=1}^{n}\|x_i-x_j\|^{2}S_{ij}^{p,k_2}.$
From Proposition 3 it can be seen that Corollary 4 holds, since $(n^{2}-\sum_{i=1}^{c}n_i^{2})\geq k_2$.
The interclass graph in DA algorithms aims to effectively separate samples from different classes; thus the local construction of the interclass graph is questionable on several counts. (1) The samples from different classes should be separated as effectively as possible, as shown in Figure 1(d), not Figure 1(b); that is, the interclass scatter should be as large as possible, while the interclass scatter of a local graph in the input space is no larger than that of a global graph, as demonstrated by Proposition 3 and Corollary 4. Considering the structure preservation property of graph embedding DA algorithms, the local construction of the interclass graph therefore seems undesirable. (2) The exhaustive search for the neighbor parameter (such as $k_2$ in MFA) in a high-dimensional input space is unavoidable and very expensive for a good performance. Moreover, if the parameter is set unsuitably, the supervised information cannot be utilized effectively, and it is not easy to determine which parameter value is actually most suitable for a given task. By contrast, the global construction of the interclass graph needs no exhaustive search, since its neighbor parameter adapts to the dataset, and all available supervised information, as prior knowledge, can be fully utilized in maximizing the interclass margin. So the interclass graph should usually be constructed globally rather than locally for good discriminant performance.
From the discussions above, it can be expected that combining a local construction for the intraclass graph with a global one for the interclass graph is more discriminant. Next, in order to illustrate the analyses above, we develop the two DA algorithms GmLcDA and LmGcDA.
3.4. GmLcDA and LmGcDA
In this subsection, two DA algorithms are developed: the Globally marginal and Locally compact Discriminant Analysis (GmLcDA), which jointly utilizes the global interclass graph $S^{gm}$ in (4) and the local intraclass graph in (2a), and the Locally marginal and Globally compact Discriminant Analysis (LmGcDA), which jointly utilizes the local interclass graph $S^{p}$ in (2b) and the global intraclass graph $S^{gc}$ in (3).
In order to develop GmLcDA, the adjacency matrix S in (2a) is rewritten as Slc corresponding to the local intraclass graph:
(7) $S_{ij}^{lc}=\begin{cases}1,&\text{if } x_i\in N_{k_c}(x_j)\text{ or } x_j\in N_{k_c}(x_i),\\0,&\text{otherwise},\end{cases}$
where $N_{k_c}(x_i)$ denotes the set of the $k_c$ nearest neighbors of sample $x_i$ within the same class. In fact, $S^{lc}$ in (7) is consistent with $S$ in (2a); however, since GmLcDA involves only one neighbor parameter, for the local intraclass graph (the global interclass graph is parameter-free), the formulation with parameter $k_c$ in (7) is used rather than the one with $k_1$ in (2a), for clearer understanding.
Now compute the interclass scatter $B^{gm}$ and the intraclass scatter $A^{lc}$:

(8) $B^{gm}=\sum_{i,j}(y_i-y_j)^{2}S_{ij}^{gm}=\sum_{i,j}\|w^{T}x_i-w^{T}x_j\|^{2}S_{ij}^{gm}=2w^{T}\Big[\sum_{i}x_iD_{ii}^{gm}x_i^{T}-\sum_{i,j}x_iS_{ij}^{gm}x_j^{T}\Big]w=2w^{T}X(D^{gm}-S^{gm})X^{T}w=2w^{T}XL^{gm}X^{T}w,$

where $D^{gm}$ is a diagonal matrix whose entries are the column (or row, since $S^{gm}$ is symmetric) sums of $S^{gm}$, $D_{ii}^{gm}=\sum_{j}S_{ij}^{gm}$, and the corresponding Laplacian matrix $L^{gm}=D^{gm}-S^{gm}$ is symmetric and positive semidefinite. Likewise, $A^{lc}$ is obtained by replacing $S^{gm}$ in (8) with $S^{lc}$, and correspondingly we obtain $D^{lc}$ and $L^{lc}$.
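The chain of equalities in (8) can be verified numerically: for any symmetric adjacency $S$, the pairwise weighted sum equals the Laplacian quadratic form $2w^{T}XLX^{T}w$. This is an illustrative sketch with variable names of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
n, D = 10, 4
X = rng.normal(size=(D, n))           # samples are columns, as in eq. (8)
w = rng.normal(size=D)
S = rng.integers(0, 2, size=(n, n)).astype(float)
S = np.maximum(S, S.T)
np.fill_diagonal(S, 0.0)              # symmetric adjacency, no self-loops
Dg = np.diag(S.sum(axis=1))           # degree matrix D_ii = sum_j S_ij
L = Dg - S                            # graph Laplacian, symmetric PSD
y = w @ X                             # projections y_i = w^T x_i
pair_sum = sum((y[i] - y[j]) ** 2 * S[i, j] for i in range(n) for j in range(n))
quad = 2.0 * w @ X @ L @ X.T @ w      # right-hand side of eq. (8)
```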
Further, the objective criterion of GmLcDA can be formulated by jointly maximizing $B^{gm}$ and minimizing $A^{lc}$, so as to simultaneously preserve the local compactness structures of intraclass data and the margins between different-class data:
(9) $w_{GmLc}=\arg\max_{w}\dfrac{B^{gm}}{A^{lc}}=\arg\max_{w}\dfrac{w^{T}X(D^{gm}-S^{gm})X^{T}w}{w^{T}X(D^{lc}-S^{lc})X^{T}w}=\arg\max_{w}\dfrac{w^{T}XL^{gm}X^{T}w}{w^{T}XL^{lc}X^{T}w},$
which can be solved by the generalized eigendecomposition $XL^{gm}X^{T}w=\lambda XL^{lc}X^{T}w$ [24].
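A minimal NumPy sketch of solving (9) follows. The small ridge term `eps*I` on the denominator matrix, the helper name, and the use of the full within-class graph in the smoke test (i.e., $k_c$ at its maximum) are our simplifying assumptions, not part of the paper's formulation:

```python
import numpy as np

def gmlcda_directions(X, Sgm, Slc, dim, eps=1e-8):
    """Sketch of eq. (9): top generalized eigenvectors of
    X L^gm X^T w = lambda X L^lc X^T w, with X of shape D x n
    (samples as columns)."""
    Lgm = np.diag(Sgm.sum(axis=1)) - Sgm
    Llc = np.diag(Slc.sum(axis=1)) - Slc
    A = X @ Lgm @ X.T
    B = X @ Llc @ X.T + eps * np.eye(X.shape[0])  # ridge keeps B invertible
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(vals.real)[::-1]           # largest ratios first
    return vecs.real[:, order[:dim]]

# smoke test on two separated 2-D Gaussian blobs
rng = np.random.default_rng(2)
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(scale=0.3, size=(2, 40))
X[:, y == 1] += 3.0                               # shift class 1 away
Sgm = (y[:, None] != y[None, :]).astype(float)    # global interclass graph
Slc = (y[:, None] == y[None, :]).astype(float)    # full within-class graph
np.fill_diagonal(Slc, 0.0)
w = gmlcda_directions(X, Sgm, Slc, dim=1)
proj = (w.T @ X).ravel()                          # 1-D embedding
```

In the projected one-dimensional space the two classes should be linearly separated, mirroring the toy example of Section 4.1.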
Likewise, for LmGcDA, the local interclass adjacency matrix $S^{p}$ is rewritten as $S^{lm}$, corresponding to the local interclass graph:
(10) $S_{ij}^{lm}=\begin{cases}1,&\text{if }(i,j)\in P_{k_m}(c_i)\text{ or }(i,j)\in P_{k_m}(c_j),\\0,&\text{otherwise},\end{cases}$
where $P_{k_m}(c)$ is the set of the $k_m$ nearest sample pairs from different classes. Further, $S^{lm}$ in (10) and $S^{gc}$ in (3) are combined to form the objective criterion of LmGcDA:
(11) $w_{LmGc}=\arg\max_{w}\dfrac{B^{lm}}{A^{gc}}=\arg\max_{w}\dfrac{w^{T}X(D^{lm}-S^{lm})X^{T}w}{w^{T}X(D^{gc}-S^{gc})X^{T}w}=\arg\max_{w}\dfrac{w^{T}XL^{lm}X^{T}w}{w^{T}XL^{gc}X^{T}w},$
which can be solved by the generalized eigendecomposition $XL^{lm}X^{T}w=\lambda XL^{gc}X^{T}w$ [24].
3.5. Comparison among GmLcDA, LmGcDA, LmLcDA, and GmGcDA
GmLcDA jointly constructs a global interclass graph, as shown in Figure 1(d), and a local intraclass graph, as shown in Figure 1(a). Such a construction leads to a larger margin between classes and tighter compactness within each class, so it preserves the geometric structure and is consistent with Propositions 1 and 3 and Corollaries 2 and 4. As a result, it is more likely to yield good discriminant performance, as shown by the experiments in Section 4.
LmGcDA shows exactly the opposite effect: a locally constructed interclass graph (Figure 1(b)) combined with a globally constructed intraclass graph (Figure 1(c)) has difficulty preserving the geometric structure of the input samples effectively, which in fact leads to worse performance in real-world tasks, as shown by the experiments in Section 4.
By contrast, LmLcDA constructs both the intraclass and interclass graphs locally, as shown in Figures 1(a) and 1(b), respectively. It is well known that the parameter settings of local graph construction are relatively intractable, especially for the interclass graph; accordingly, such a construction needs more domain knowledge and experience for good discriminant performance.
Different from the three above, GmGcDA adopts only the global construction, for both the intraclass and interclass graphs. Such a construction is easy to accomplish and stable across different domains.
4. Experiments
In this section, to further illustrate and support the theoretical analyses above, we compare LmLcDA, GmLcDA, LmGcDA, and GmGcDA through experiments on toy and real-world datasets; the latter include UCI [25], face recognition, and object categorization datasets. The nearest neighbor classifier (1NN) [26] is applied after each DA algorithm to evaluate classification performance. Ridge regularization [27] is adopted for all compared DA algorithms; that is, each is derived from a regularized objective of the form $w^{*}=\arg\max_{w}(w^{T}Bw)/(w^{T}(A+\alpha I)w)$ with $\alpha=0.1$, where $B$ and $A$ correspond, respectively, to the interclass and intraclass scatters.
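The regularized objective above can be solved as an ordinary eigenproblem of $(A+\alpha I)^{-1}B$. A minimal sketch (function name ours), with a sanity check for the degenerate case $A=0$:

```python
import numpy as np

def regularized_da(B, A, alpha=0.1, dim=1):
    """Ridge-regularized DA objective used in the experiments:
    w* = argmax_w (w^T B w) / (w^T (A + alpha*I) w), with alpha = 0.1,
    solved as an eigenproblem of (A + alpha*I)^{-1} B."""
    M = np.linalg.solve(A + alpha * np.eye(A.shape[0]), B)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)[::-1]           # largest ratio first
    return vecs.real[:, order[:dim]]

# sanity check: with A = 0 the maximizer of (b^T w)^2 / (alpha ||w||^2)
# is w proportional to b
b = np.array([1.0, 2.0, 2.0])
w = regularized_da(np.outer(b, b), np.zeros((3, 3))).ravel()
cosine = abs(w @ b) / (np.linalg.norm(w) * np.linalg.norm(b))
```

The ridge term $\alpha I$ also sidesteps the singularity of the intraclass scatter matrix when the sample size is smaller than the dimension.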
4.1. A Toy Example
Here, the toy example illustrates that constructing both the intraclass and interclass graphs locally yields less discriminant projections, implying that the global construction is sometimes necessary and thus suggesting that a globally constructed interclass graph combined with a locally constructed intraclass graph is desirable for more discriminant performance.
For an LmLcDA method such as MFA, owing to the injection of locality into both the intraclass and interclass graphs, it may fail to obtain discriminant projections for tasks in which the global structure must be considered. Moreover, there is no guidance on selecting the neighbor parameter for the interclass graph; when the neighbor parameters for constructing the graphs are set inappropriately, the projection results may be even more unfavorable. For example, for the linearly separable problem shown in Figure 2(a), the DA algorithms that adopt at least one global graph, GmGcDA (MDA), GmLcDA with $k_c=3$, and LmGcDA with $k_m=5$, can effectively separate the two-class samples, as shown in Figures 2(c)–2(e). On the contrary, from Figure 2(b) we can observe that, in the one-dimensional subspace of MFA ($k_1$ is empirically set to 3, with different values of $k_2$), the two-class samples overlap to different extents. Moreover, no rule can be found for the selection of $k_2$; we can only observe that when $k_2=15$ the separation of the two-class samples is relatively satisfactory.
Figure 2: A toy example. (a) Input samples; (b) LmLcDA (MFA); (c) GmGcDA (MDA); (d) LmGcDA; (e) GmLcDA; (f) input samples with added points; (g) LmGcDA; (h) GmLcDA.
Further, when additional input points are added, as shown in Figure 2(f), both the margin between the two classes and the compactness within class 2 change. It then becomes more difficult to partition the two-class samples in the one-dimensional projection space of LmGcDA than in that of GmLcDA, as shown in Figures 2(g) and 2(h).
4.2. Real-World Datasets
On these real-world datasets, we compare GmLcDA, LmGcDA, two LmLcDA methods (MFA and LSDA), and GmGcDA (MDA). In order to evaluate the various algorithms fairly, their model parameters are searched over a large candidate range and the corresponding best results are reported. To address the singularity of MDA, the inverse of the matrix $S_w$ is replaced by its pseudoinverse [24].
4.2.1. UCI Datasets
We select 5 two-class UCI datasets, whose descriptions are shown in column 1 of Table 1; the classification results of 1NN on the original (unreduced) data are also reported.
Table 1: Accuracy% ± standard deviation (optimal reduced dimension in parentheses). The highest reduced dimension of MDA is the number of classes minus one, which here equals one.

Datasets (#samples, #attr.) | GmLcDA | LmGcDA | LmLcDA (MFA) | LmLcDA (LSDA) | GmGcDA (MDA) | 1NN
Crx (666, 6) | 70.63 ± 1.22 (2) | 66.42 ± 5.67 (6) | 69.76 ± 0.79 (6) | 68.71 ± 1.74 (6) | 59.58 ± 7.47 | 67.29 ± 1.33
Sonar (208, 60) | 85.15 ± 2.63 (42) | 62.68 ± 0.65 (10) | 83.88 ± 1.66 (24) | 85.63 ± 3.03 (34) | 69.61 ± 5.81 | 83.11 ± 3.08
Spectf (267, 44) | 77.97 ± 2.88 (6) | 67.52 ± 3.92 (10) | 76.47 ± 1.77 (24) | 75.94 ± 1.88 (4) | 68.72 ± 5.13 | 71.13 ± 3.90
Water (116, 38) | 95.44 ± 2.06 (2) | 92.54 ± 0.75 (4) | 94.91 ± 3.83 (8) | 92.11 ± 4.40 (2) | 90.88 ± 4.81 | 87.90 ± 3.55
Wdbc (569, 30) | 96.23 ± 1.28 (6) | 80.51 ± 2.58 (10) | 95.53 ± 0.86 (30) | 95.78 ± 0.79 (10) | 94.97 ± 1.38 | 91.69 ± 1.13
For each dataset, the samples are randomly divided into a training set and a testing set, each containing half of the samples. The random division is performed 30 times, and the average accuracies of each algorithm are tabulated in Table 1. The neighbor parameters $k_c$ for GmLcDA, $k_m$ for LmGcDA, $k_1$ for MFA, and $k$ for LSDA are searched from 2 to half of the minimum class size with an increment of 5. The parameter $k_2$ for MFA is searched from 20 to the number of training samples with an increment of 20. (Notice that the maximum of $k_2$ for MFA, i.e., the number of training samples $n$, is smaller than the so-called neighbor parameter of the global interclass graph for GmLcDA, $n^{2}-\sum_{i=1}^{c}n_i^{2}$.)
From the results shown in Table 1, we can observe the following.
GmLcDA produces the optimal accuracies on 4 of the 5 datasets (all except Sonar), clearly outperforming LmGcDA, MDA, MFA, LSDA, and 1NN on the unreduced data. On Sonar, the accuracy of GmLcDA is only 0.48% lower than that of LSDA. Moreover, the optimal reduced dimensions of GmLcDA (except on Sonar) are lower than those of MFA and LSDA, which clearly helps efficient testing.
LmGcDA is the worst on all datasets except Crx, on which it is better only than MDA. Such results clearly testify that locality for the interclass graph combined with globality for the intraclass graph is not a desirable alternative.
MFA and LSDA achieve similar results, since both adopt local intraclass and interclass graphs to preserve the local geometry of the data. However, the optimal reduced dimensions of MFA are higher than those of LSDA overall.
The performance of MDA is relatively inferior, especially on Crx and Spectf, where it is even worse than 1NN on the unreduced data. Moreover, the standard deviations of MDA on all datasets are larger than those of the other algorithms, which can be attributed to the fact that its reduced dimension is limited to one for two-class datasets.
GmLcDA, LmLcDA, and GmGcDA almost always outperform 1NN (except GmGcDA on Crx, Sonar, and Spectf) with lower reduced dimensions. By contrast, LmGcDA is worse than 1NN on all datasets.
4.2.2. Face Recognition
It is well known that face recognition is a very important task in pattern recognition and machine learning. In this subsection, two benchmark datasets, Yale and ORL, are used to evaluate the use of local and global graphs in DA algorithms for face image recognition.
Data Description. The Yale dataset contains 165 face images of 15 individuals (c = 15), with 11 images per individual captured under different facial expressions or configurations. Figure 3(a) shows the 11 sample images of one person in the Yale dataset. The ORL dataset consists of 400 face images of 40 distinct subjects (c = 40), with 10 images per subject. These images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). Figure 3(b) shows the 10 sample images of one person in the ORL dataset. The images in Yale and ORL are 112 × 92 and 100 × 100 pixels, respectively, with 256 grey levels per pixel.
Figure 3: Face sample images of one person. (a) Yale, (b) ORL.
Experimental Settings. Considering the very high dimensionality of the face images, PCA [28, 29] is first applied to obtain a low-dimensional subspace, keeping 99% of the energy; the various DA algorithms are then performed in this PCA subspace. In order to show the effects of the various algorithms on training sets of different sizes, each face dataset is partitioned into different gallery and probe sets, where Gm/Pn indicates that m images per person are randomly selected for training and the remaining n images are used for testing. For each dataset, 30 random Gm/Pn splits are generated, and the average of the 30 classification accuracies is reported. The neighbor parameters $k_c$ for GmLcDA and $k_1$ for MFA are searched from {2, 3, …, Gm−1}; $k_m$ for LmGcDA and $k_2$ for MFA from {5, 10, …, c*Gm} on Yale and from {20, 40, …, c*Gm} on ORL; and $k$ for LSDA from {5, 10, …, c*Gm/2}. The optimal results of the several algorithms on the two datasets are listed in Table 2.
Table 2: Accuracy% ± standard deviation (optimal reduced dimension in parentheses). The highest reduced dimension of MDA is the number of classes minus one, here 14 for Yale and 39 for ORL.

Dataset | Gm/Pn | GmLcDA | LmGcDA | LmLcDA (MFA) | LmLcDA (LSDA) | GmGcDA (MDA) | PCA
Yale | G3/P8 | 84.58 ± 2.89 (14) | 69.04 ± 6.18 (10) | 77.25 ± 3.73 (16) | 79.42 ± 5.08 (18) | 72.17 ± 3.89 | 64.25 ± 3.23
Yale | G4/P7 | 88.29 ± 4.09 (14) | 73.95 ± 6.52 (10) | 82.10 ± 4.18 (16) | 85.62 ± 4.27 (18) | 85.52 ± 5.25 | 67.43 ± 2.83
Yale | G5/P6 | 89.78 ± 3.13 (16) | 76.67 ± 6.23 (10) | 86.33 ± 3.14 (24) | 88.11 ± 3.52 (20) | 89.22 ± 2.67 | 68.22 ± 3.52
Yale | G6/P5 | 91.33 ± 2.76 (14) | 82.80 ± 4.70 (10) | 89.47 ± 2.22 (18) | 89.73 ± 3.50 (24) | 90.67 ± 2.81 | 70.13 ± 4.45
Yale | G7/P4 | 93.17 ± 3.55 (28) | 84.67 ± 5.06 (10) | 93.00 ± 3.41 (20) | 92.00 ± 3.67 (20) | 92.33 ± 4.10 | 73.00 ± 6.56
ORL | G3/P7 | 90.46 ± 2.36 (38) | 86.45 ± 2.60 (24) | 88.68 ± 1.95 (28) | 87.43 ± 2.03 (39) | 84.89 ± 2.47 | 88.29 ± 2.79
ORL | G4/P6 | 93.83 ± 1.16 (43) | 87.00 ± 2.29 (18) | 90.88 ± 2.49 (18) | 90.92 ± 1.19 (40) | 90.67 ± 1.42 | 91.83 ± 1.33
ORL | G5/P5 | 95.25 ± 1.30 (78) | 85.45 ± 3.72 (16) | 90.10 ± 3.33 (23) | 93.65 ± 2.01 (40) | 93.45 ± 1.46 | 94.65 ± 1.18
From the results of Table 2, we can observe the following.
With the increase of gallery samples, that is, from Gm = 3 to 7 on Yale (to 5 on ORL), the performance of every algorithm improves to a different degree.
For the two datasets under the different Gm/Pn divisions, GmLcDA always obtains better accuracies than the other algorithms, which shows its effectiveness and feasibility for the face recognition task. These relatively excellent results of GmLcDA may be ascribed to the joint injection of the global interclass margin and the local intraclass compactness.
MFA and LSDA produce some results worse than MDA, such as MFA on Yale with G4/P7 and G5/P6, LSDA on Yale with G6/P5, and MFA on ORL with G5/P5; moreover, MFA is relatively worse. A possible reason is that relying only on the local geometry of the data for both the intraclass and interclass graphs limits their performance, especially for MFA, whose $k_2$ marginal points may limit its generalization ability to a certain degree.
On Yale, the unsupervised PCA is worse than the other algorithms. However, it is worth noting that PCA produces better accuracies than MDA, LSDA, and MFA on ORL, which may be ascribed to the fact that, for ORL, the margin between different-class samples is larger in the PCA subspace with 99% energy. Besides, on Yale, the standard deviations of MDA and PCA are larger than those of the other algorithms.
4.2.3. Object Categorization
A classical problem in computer vision and pattern recognition is to classify a set of objects into a group of known categories. Here, we use the popular benchmark dataset Coil20, which consists of gray-scale images of 20 objects, with 72 images per object taken at pose intervals of 5°; each image is 32 × 32 pixels, as shown in Figure 4.
Figure 4: Sample images of the 20 objects in Coil20.
Experimental Settings. Similar to the face recognition experiment, the dataset is partitioned into different gallery and probe sets in 4 groups. For each group Gm/Pn, 20 random splits are generated, and the average of the 20 classification accuracies is reported. For the first three groups with relatively small gallery sets (G6/P66, G9/P63, and G18/P54), the neighbor parameters $k_c$ for GmLcDA and $k_1$ for MFA are searched from {2, 4, …, Gm−1}; $k_m$ for LmGcDA and $k_2$ for MFA from {20, 40, …, c*Gm}; and $k$ for LSDA from {10, 30, …, c*Gm−2}. For the relatively large gallery group G36/P36, $k_c$ and $k_1$ are searched from {2, 8, …, Gm−1}; $k_m$ and $k_2$ from {40, 120, …, c*Gm}; and $k$ from {10, 90, …, c*Gm−2}. The accuracies with respect to different reduced dimensions are displayed in Figure 5.
Figure 5: Accuracies with respect to the reduced dimensions on Coil20. (a) G6/P66; (b) G9/P63; (c) G18/P54; (d) G36/P36.
From Figure 5 we can clearly see the following.
Figures 5(a)–5(d) show that, with the increase of gallery samples, the performances of all algorithms improve to different extents. Likewise, the accuracies first increase as the reduced dimension grows; however, with a further increase of the reduced dimension, the performances (except for MDA, whose reduced dimension is at most c−1) gradually decrease. This effect is especially evident for the small gallery group with Gm = 6 and is undoubtedly an example of the "curse of dimensionality" [2].
For the four groups Gm/Pn, from the small G6 to the relatively large G36, GmLcDA outperforms MFA, LSDA, and MDA across the different reduced dimensions. Its superiority is most obvious when the reduced dimension is less than 10; for example, at a reduced dimension of 2, it is over 30% better than the worst performer, LSDA. Moreover, GmLcDA attains its best accuracy at a lower reduced dimension than the other algorithms: its accuracy reaches its best value when the reduced dimension is no more than 20 and remains stable afterward.
The performances of LSDA and MFA do not differ remarkably except for Gm = 18. Specifically, for the first two groups, MFA outperforms LSDA overall, as shown in Figures 5(a) and 5(b). In contrast, as the number of gallery samples increases, LSDA exceeds MFA in performance, as shown in Figures 5(c) and 5(d). We therefore conclude that LSDA is not well suited to small training sets. Moreover, the performance of LSDA is always the worst when the reduced dimension is relatively low.
When the reduced dimension is within the range (at most c−1) available to MDA, the accuracies of MDA are almost parallel to those of MFA and LSDA. However, since its reduced dimension is limited to a maximum of c−1, its overall performance is inferior to that of the other three algorithms.
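The c−1 ceiling on MDA's reduced dimension comes from the rank of the between-class scatter matrix: the c class-mean deviations from the global mean satisfy one linear constraint (their sample-size-weighted sum is zero), so S_b has rank at most c−1. A quick numerical check on synthetic data (the sizes merely echo Coil20's 20 classes and 32×32 pixels for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
c, n_per, d = 20, 30, 1024            # 20 classes, 32x32 = 1024 dimensions
X = rng.normal(size=(c * n_per, d))
y = np.repeat(np.arange(c), n_per)

mu = X.mean(axis=0)
# Between-class scatter: S_b = sum_i n_i (mu_i - mu)(mu_i - mu)^T
Sb = np.zeros((d, d))
for i in range(c):
    diff = X[y == i].mean(axis=0) - mu
    Sb += n_per * np.outer(diff, diff)

# rank(S_b) <= c - 1, so MDA yields at most c - 1 = 19 discriminant directions
print(np.linalg.matrix_rank(Sb))      # 19
```

Since MDA's projection directions are eigenvectors associated with nonzero eigenvalues of a problem involving S_b, no more than c−1 meaningful directions exist, regardless of the input dimension d.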
5. Conclusion and Future Work
In this paper, we elaborately address some important issues in graph-construction-based DA. To illustrate and support the theoretical analyses, by jointly utilizing both globality and locality, we develop the GmLcDA algorithm based on the global interclass and local intraclass graphs and the LmGcDA algorithm based on the local interclass and global intraclass graphs. Further, by comprehensively comparing LmLcDA (MFA, LSDA), GmLcDA, LmGcDA, and GmGcDA (MDA) on toy and real-world datasets, we suggest that the joint of a locally constructed intraclass graph and a globally constructed interclass graph is more discriminant.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is partially supported by NSFC (61170151, 61363051), the JS-QingLan Project, the Program of Higher-Level Talents of Inner Mongolia University (115118), the Postdoctoral Science Foundation Funded Project of Inner Mongolia University (30105-135113), and the General Financial Grant from the China Postdoctoral Science Foundation (2013M540217).
References
[1] K. Bunte, M. Biehl, and B. Hammer, "A general framework for dimensionality-reducing data visualization mapping," Neural Computation, vol. 24, no. 3, pp. 771–804, 2012.
[2] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, New York, NY, USA, 2001.
[3] J. H. Oh and N. Kwak, "Generalization of linear discriminant analysis using Lp-norm," Pattern Recognition Letters, vol. 34, no. 6, pp. 679–685, 2013.
[4] J. Liu, F. Zhao, and Y. Liu, "Learning kernel parameters for kernel Fisher discriminant analysis," Pattern Recognition Letters, vol. 34, no. 9, pp. 1026–1031, 2013.
[5] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: a general framework for dimensionality reduction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40–51, 2007.
[6] D. Xu, S. Yan, D. Tao, S. Lin, and H. Zhang, "Marginal Fisher analysis and its variants for human gait recognition and content-based image retrieval," IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2811–2821, 2007.
[7] D. Cai, X. He, K. Zhou, J. Han, and H. Bao, "Locality sensitive discriminant analysis," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 708–713, January 2007.
[8] H. Chen, H. Chang, and T. Liu, "Local discriminant embedding and its variants," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 2, pp. 846–853, June 2005.
[9] F. Wang and C. Zhang, "Feature extraction by maximizing the average neighborhood margin," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–8, Minneapolis, Minn, USA, June 2007.
[10] M. H. Rohban and H. R. Rabiee, "Supervised neighborhood graph construction for semi-supervised classification," Pattern Recognition, vol. 45, no. 4, pp. 1363–1372, 2012.
[11] D. Cai, X. He, and J. Han, "Semi-supervised discriminant analysis," in Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), pp. 1–7, Rio de Janeiro, Brazil, October 2007.
[12] M. M. Luqman, J. Ramel, J. Lladós, and T. Brouard, "Fuzzy multilevel graph embedding," Pattern Recognition, vol. 46, no. 2, pp. 551–565, 2013.
[13] Y. Song, F. Nie, C. Zhang, and S. Xiang, "A unified framework for semi-supervised dimensionality reduction," Pattern Recognition, vol. 41, no. 9, pp. 2789–2799, 2008.
[14] H. Bunke and K. Riesen, "Recent advances in graph-based pattern recognition with applications in document analysis," Pattern Recognition, vol. 44, no. 5, pp. 1057–1067, 2011.
[15] S. Park and S. Choi, "Max-margin embedding for multi-label learning," Pattern Recognition Letters, vol. 34, no. 3, pp. 292–298, 2013.
[16] J. Yang, D. Zhang, J.-Y. Yang, and B. Niu, "Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 650–664, 2007.
[17] J. Chen, Z. Ma, and Y. Liu, "Local coordinates alignment with global preservation for dimensionality reduction," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 106–117, 2013.
[18] H. Li, X. Wang, J. Tang, and C. Zhao, "Combining global and local matching of multiple features for precise item image retrieval," Multimedia Systems, vol. 19, no. 1, pp. 37–49, 2013.
[19] B. Yang and S. Chen, "Sample-dependent graph construction with application to dimensionality reduction," Neurocomputing, vol. 74, no. 1–3, pp. 301–314, 2010.
[20] B. Yang and S. Chen, "Disguised discrimination of locality-based unsupervised dimensionality reduction," International Journal of Pattern Recognition and Artificial Intelligence, vol. 24, no. 7, pp. 1011–1025, 2010.
[21] B. Yang, S. Chen, and X. Wu, "A structurally motivated framework for discriminant analysis," Pattern Analysis and Applications, vol. 14, no. 4, pp. 349–367, 2011.
[22] M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.
[23] X. Zhu, "Semi-supervised learning literature survey," Technical Report 1530, University of Wisconsin-Madison, 2008.
[24] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., The Johns Hopkins University Press, Baltimore, Md, USA, 1996.
[25] C. L. Blake and C. J. Merz, UCI Repository of Machine Learning Databases, Department of Information and Computer Sciences, University of California, Irvine, 1998.
[26] D. Mateos-García, J. García-Gutiérrez, and J. C. Riquelme-Santos, "On the evolutionary optimization of k-NN by label-dependent feature weighting," Pattern Recognition Letters, vol. 33, no. 16, pp. 2232–2238, 2012.
[27] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition," Pattern Recognition Letters, vol. 26, no. 2, pp. 181–191, 2005.
[28] S. Pyatykh, J. Hesser, and L. Zheng, "Image noise level estimation by principal component analysis," IEEE Transactions on Image Processing, vol. 22, no. 2, pp. 687–699, 2013.
[29] S. Liwicki, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "Euler principal component analysis," International Journal of Computer Vision, vol. 101, no. 3, pp. 498–518, 2013.