A Comparative Study: Globality versus Locality for Graph Construction in Discriminant Analysis

Local graph based discriminant analysis (DA) algorithms have recently attracted increasing attention as a way to mitigate the limitations of global (graph) DA algorithms. However, little attention has been paid to the following important issues: whether the local construction is better than the global one for intraclass and interclass graphs, which graph (intraclass or interclass) should be constructed locally or globally, and how the two should be effectively combined for good discriminant performance. In this paper, continuing our previous studies on graph construction and DA, we first address these issues. Then, by jointly utilizing both globality and locality, we develop a Globally marginal and Locally compact Discriminant Analysis (GmLcDA) algorithm based on global interclass and local intraclass graphs, and a Locally marginal and Globally compact Discriminant Analysis (LmGcDA) algorithm based on local interclass and global intraclass graphs. The purpose of these algorithms is not to show novelty but to illustrate the theoretical analyses. Further, by comprehensively comparing the Locally marginal and Locally compact DA (LmLcDA), based on locality alone, the Globally marginal and Globally compact DA (GmGcDA), based on globality alone, GmLcDA, and LmGcDA, we suggest that combining a locally constructed intraclass graph with a globally constructed interclass graph is more discriminant.


Introduction
Discriminant analysis (DA) techniques [1] are indispensable in many fields, including machine learning, pattern recognition, data compression, scientific visualization, and neural computation. Multiple discriminant analysis (MDA) [2][3][4] is one of the most popular global DA methods. However, because global methods construct both intraclass and interclass graphs globally, they generally fail to capture the underlying local structures in data, for example, the many low-dimensional local manifolds on which samples reside in the original input space. To mitigate such limitations, plenty of local graph based DA algorithms have been proposed as powerful tools, typically including marginal Fisher analysis (MFA) [5] and its variants [6], locality sensitive discriminant analysis (LSDA) [7], LDE [8], and ANMM [9][10][11][12][13][14][15]. These algorithms construct both intraclass and interclass graphs locally. However, is the local construction better than the global one for intraclass and interclass graphs? Subsequently, some globally maximizing and locally minimizing dimensionality reduction (DR) algorithms were proposed [16][17][18]. By contrast, no locally marginal and globally compact DA algorithm has been studied. Several issues therefore need to be addressed: which graph (intraclass or interclass) should be constructed locally or globally? And how should the two be effectively combined for good discriminant performance? To our knowledge, these issues have so far received little attention.

So, continuing our previous studies on graph construction and DA [19][20][21], in this paper we address these issues in detail. Concretely, first, we illustrate the meanings of local compactness, global compactness, local margin, and global margin in DA, as shown in Figure 1; second, we formulate globally constructed intraclass and interclass graphs; third, resorting to the relation between the scatter and the structure preservation property of DA based on graph construction, we demonstrate by Proposition 1 and Corollary 2 and by Proposition 3 and Corollary 4 that the interclass graph should be globally constructed and the intraclass graph should be locally constructed; finally, by jointly utilizing both globality and locality, we develop two DA algorithms: the Globally marginal and Locally compact Discriminant Analysis (GmLcDA) algorithm, based on global interclass and local intraclass graphs, and the Locally marginal and Globally compact Discriminant Analysis (LmGcDA) algorithm, based on local interclass and global intraclass graphs. It is worth pointing out that the purpose of developing both DA algorithms is not to show novelty but to illustrate the theoretical analyses.

Further, we perform comparative experiments among GmLcDA, LmGcDA, LmLcDA (MFA and LSDA), and GmGcDA (MDA) on toy and real-world datasets. From these comparisons, we suggest that combining a locally constructed intraclass graph with a globally constructed interclass graph is more discriminant. (Note that the two concepts of adjacency matrix and graph are used interchangeably throughout the paper, since a graph corresponds to an adjacency matrix.)

The rest of this paper is organized as follows. In Section 2, the graph construction and two typical DA algorithms, MDA and MFA, are briefly reviewed. In Section 3, we first indicate the meanings of compactness and margin in DA and introduce the global intraclass and interclass graphs, then heuristically analyze the issues raised above, further develop GmLcDA and LmGcDA, and finally compare GmLcDA, LmGcDA, LmLcDA, and GmGcDA. In Section 4, the comparative experiments are performed. Finally, suggestions and remarks for future work are given in Section 5.

Graph Construction.
Let $X = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^D$, denote a set of $n$ samples. Current graph constructions mainly include two types: $k$-nearest-neighbor and $\varepsilon$-neighborhood [22]. The adjacency matrix $S$ is constructed by weighting the edges of the graph with a similarity function, for which the two main choices are the heat kernel and 0-1 weighting [22]. This work focuses on the latter owing to its simplicity and generality:

$$S_{ij} = \begin{cases} 1, & \text{if } x_i \text{ and } x_j \text{ are connected by an edge}, \\ 0, & \text{otherwise}. \end{cases} \quad (1)$$

2.2. Typical DA Algorithms.
MDA is deemed an example of GmGcDA here from the viewpoint of graph embedding [5]. Consider a dataset of $n$ samples belonging to $C$ classes with the labels $l(x_1), l(x_2), \ldots, l(x_n)$, $l(x_i) \in \{1, 2, \ldots, C\}$. MDA seeks the projection directions that maximize the interclass margin and simultaneously minimize the intraclass compactness. It thus preserves the global structure in the data but fails to discover the local geometric structure of manifold data embedded in the ambient space.
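In order to mitigate the limitations of global algorithms, there is increasing interest in graph embedding based DA algorithms. MFA is a typical one, induced from the graph embedding framework for dimensionality reduction [5]. According to this framework, MFA constructs a local intraclass graph with the adjacency matrix $S$ to characterize the intraclass compactness and a local interclass graph with the adjacency matrix $S^p$ to characterize the interclass separability:

$$S_{ij} = \begin{cases} 1, & \text{if } i \in N^+_{k_1}(j) \text{ or } j \in N^+_{k_1}(i), \\ 0, & \text{otherwise}, \end{cases} \quad (2a)$$

$$S^p_{ij} = \begin{cases} 1, & \text{if } (i, j) \in P_{k_2}(l(x_i)) \text{ or } (i, j) \in P_{k_2}(l(x_j)), \\ 0, & \text{otherwise}, \end{cases} \quad (2b)$$

where $l(x_i)$ is the class label of $x_i$, $N^+_{k_1}(i)$ indicates the index set of the $k_1$ nearest neighbors of the sample $x_i$ in the same class, and $P_{k_2}(c)$ is a set of the $k_2$ nearest sample pairs from different classes.
Here, we refer to such DA algorithms, which are based on local intraclass and interclass graphs, as Locally marginal and Locally compact DA (LmLcDA).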
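The two local 0-1 graphs described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function names are hypothetical, samples are stored as rows of `X` (the transpose of the column convention used in the formulas), and ties between equally distant neighbors are broken arbitrarily by the sort.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X (n x D)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(d2, 0.0)  # clip tiny negative values from rounding

def local_intraclass_graph(X, y, k1):
    """0-1 adjacency: i ~ j if one is among the k1 same-class nearest neighbors of the other."""
    n = len(y)
    d2 = pairwise_sq_dists(X)
    S = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        nearest = same[np.argsort(d2[i, same])[:k1]]
        S[i, nearest] = 1.0
    return np.maximum(S, S.T)  # symmetrize ("or" rule)

def local_interclass_graph(X, y, k2):
    """0-1 adjacency: for each class c, link the k2 closest pairs (i, j) with y[i] == c != y[j]."""
    n = len(y)
    d2 = pairwise_sq_dists(X)
    S = np.zeros((n, n))
    for c in np.unique(y):
        rows = np.flatnonzero(y == c)
        cols = np.flatnonzero(y != c)
        block = d2[np.ix_(rows, cols)]
        order = np.argsort(block, axis=None)[:k2]  # k2 smallest entries of the flattened block
        r, q = np.unravel_index(order, block.shape)
        S[rows[r], cols[q]] = 1.0
    return np.maximum(S, S.T)
```

Both functions return symmetric matrices, so they can be passed directly to any Laplacian-based scatter computation.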

Analyzing and Addressing Issues
From the review above, we find that most local graph based DA algorithms, despite their different formulations, are motivated by mitigating the limitations of global algorithms. However, to date there has been little analysis of whether the local construction is always better than the global one for intraclass and interclass graphs, and of which graph (intraclass or interclass) should be constructed locally or globally. Further, how should the two be effectively combined for good discriminant performance? In this section, in order to analyze and address these issues, we first illustrate the meanings of local compactness, global compactness, local margin, and global margin in DA, then formally introduce the globally constructed intraclass graph $S^{gc}$ and interclass graph $S^{gm}$, analyze the issues raised above in detail, and finally develop Globally marginal and Locally compact Discriminant Analysis (GmLcDA) and Locally marginal and Globally compact Discriminant Analysis (LmGcDA).

Meanings of Compactness and Margin in DA.
We first illustrate the meanings of local compactness, global compactness, local margin, and global margin in DA, all of which are shown in Figure 1. Figure 1(a) shows the structures of local compactness of $x_i$ and $x_j$, where $x_i$ is locally linked with the five dots within the same class and $x_j$ with pentacles; for clearer display, these structures of local compactness are enclosed by the two pink dashed ellipses. Meanwhile, Figure 1(b) shows the structures of local margin of $x_i$ and $x_j$, where $x_i$ is locally linked with the four pentacles from the different class and $x_j$ with dots; these structures of local margin are enclosed by the blue and cyan dotted ellipses, respectively. By contrast, Figure 1(c) shows the structures of global compactness of classes 1 and 2, while Figure 1(d) shows the structures of global margin of both classes. Note that the gray dashed and dotted ellipses do not denote a cluster but all the points linked within them.

Globally Constructed Intraclass and Interclass Graphs.
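The global intraclass graph $S^{gc}$ is formulated as

$$S^{gc}_{ij} = \begin{cases} 1, & l(x_i) = l(x_j), \\ 0, & \text{otherwise}, \end{cases} \quad (3)$$

and the global interclass graph $S^{gm}$ as

$$S^{gm}_{ij} = \begin{cases} 1, & l(x_i) \ne l(x_j), \\ 0, & \text{otherwise}, \end{cases} \quad (4)$$

where $l(x_i)$ denotes the class label of the sample $x_i$. From the formulations of $S^{gc}$ in (3) and $S^{gm}$ in (4), it can be seen that globally constructed intraclass and interclass graphs are parameter-free. In order to compare them with locally constructed graphs, their neighbor parameters may be viewed as $\sum_{c=1}^{C} n_c^2$ and $n^2 - \sum_{c=1}^{C} n_c^2$, respectively, for a dataset with $C$ classes and $n$ samples ($n_c$ samples in class $c$), where $\sum_{c=1}^{C} n_c^2$ denotes the maximum number of intraclass neighbor sample pairs.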
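As a small illustration of why the global graphs need no neighbor parameter, the following sketch builds both of them from the labels alone and counts their nonzero entries. The function name is hypothetical, and the count assumes that self-pairs ($i = j$) are included in the intraclass graph; these pairs contribute nothing to any scatter because their distance is zero.

```python
import numpy as np

def global_graphs(y):
    """Global 0-1 graphs built from labels only: same-class pairs and different-class pairs."""
    y = np.asarray(y)
    same = y[:, None] == y[None, :]
    S_gc = same.astype(float)      # global intraclass graph
    S_gm = (~same).astype(float)   # global interclass graph
    return S_gc, S_gm

# Toy labels: n = 5 samples, class sizes n_c = (3, 2).
y = np.array([0, 0, 0, 1, 1])
S_gc, S_gm = global_graphs(y)
n_c = np.bincount(y)
intra_pairs = int(np.sum(n_c ** 2))        # sum_c n_c^2
inter_pairs = len(y) ** 2 - intra_pairs    # n^2 - sum_c n_c^2
```

The number of ones in `S_gc` and `S_gm` matches the two "neighbor parameters" quoted in the text, which is what makes the comparison with locally constructed graphs possible.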

Heuristic Demonstration of Issues.
In this subsection, resorting to the relation between the scatter and the structure preservation property of DA based on graph construction, we demonstrate by Proposition 1 and Corollary 2 and by Proposition 3 and Corollary 4 that the interclass graph should be globally constructed and the intraclass graph should be locally constructed. The inequalities in the two propositions and corollaries quantify the scatter discrepancies between locally and globally constructed graphs in the input space. On the other hand, there is a geometric structure preservation hypothesis: the intraclass graph in DA preserves the compact structures of the input space in the embedded space, while the interclass graph preserves the margin structures of the input space in the embedded space. Under this hypothesis, the scatter inequalities heuristically indicate, to some extent, that the intraclass graph should be locally constructed and the interclass graph should be globally constructed.

Proposition 1. For a locally constructed intraclass graph corresponding to the adjacency matrix $S^t$ with parameter $t$, the intraclass scatters corresponding to $S^{t_1}$ and $S^{t_2}$ satisfy

$$\sum_{i,j} \|x_i - x_j\|^2 S^{t_1}_{ij} \le \sum_{i,j} \|x_i - x_j\|^2 S^{t_2}_{ij}, \quad t_1 < t_2, \quad (5)$$

where the parameter $t$ stands for the number of nearest neighbors of the sample $x_i$ or $x_j$ in the same class, as $k_1$ defined in (2a), while the parameters $t_1$ and $t_2$ denote that $t$ in $S^t$ takes $t_1$ and $t_2$, respectively, for $S^{t_1}$ and $S^{t_2}$.

Proof. According to the definition of the adjacency matrix $S^t$, its element $S^t_{ij}$ takes the value 1 or 0. If $t_1 < t_2$, every element equal to 1 in $S^{t_1}$ is also equal to 1 in $S^{t_2}$, so $S^{t_1}$ has fewer nonzero elements than $S^{t_2}$; moreover, $\|x_i - x_j\|^2 > 0$ when $i \ne j$. Hence, for $t_1 < t_2$, the intraclass scatters satisfy (5).

Corollary 2. For a locally constructed intraclass graph $S^t$ and the globally constructed intraclass graph $S^{gc}$,

$$\sum_{i,j} \|x_i - x_j\|^2 S^{t}_{ij} \le \sum_{i,j} \|x_i - x_j\|^2 S^{gc}_{ij}.$$

From Proposition 1 it can be seen that Corollary 2 is clear, since the parameter $t_1 \le \sum_{c=1}^{C} n_c^2$. Proposition 1 and Corollary 2 show that the intraclass scatter corresponding to a locally constructed graph in the input space is not larger than that corresponding to the globally constructed graph. It is well known that the intraclass graph in DA algorithms aims to preserve the local compactness structures of intraclass samples from the input space in the embedded space, as shown in Figure 1(a) rather than Figure 1(c). So, according to the geometric structure preservation property of graph embedding DA algorithms, a small intraclass scatter in the input space often remains small in the embedding space; in other words, a large intraclass scatter in the input space often remains large in the embedding space. The compactness of intraclass samples corresponding to a locally constructed graph can therefore often be preserved in the low-dimensional space. Thus the intraclass graph should usually be constructed locally rather than globally, which is consistent with the statement in [23] that, empirically, a small neighbor parameter tends to perform better.

Proposition 3. For a locally constructed interclass graph corresponding to the adjacency matrix $S^{p,t}$ with parameter $t$, the interclass scatters corresponding to $S^{p,t_1}$ and $S^{p,t_2}$ satisfy

$$\sum_{i,j} \|x_i - x_j\|^2 S^{p,t_1}_{ij} \le \sum_{i,j} \|x_i - x_j\|^2 S^{p,t_2}_{ij}, \quad t_1 < t_2, \quad (6)$$

where the parameter $t$ stands for the number of nearest sample pairs from different classes, as $k_2$ defined in (2b), while the parameters $t_1$ and $t_2$ denote that $t$ in $S^{p,t}$ takes $t_1$ and $t_2$, respectively, for $S^{p,t_1}$ and $S^{p,t_2}$.

The proof is similar to that of Proposition 1 and is thus omitted.
Corollary 4. For a locally constructed interclass graph $S^{p,t}$ and the globally constructed interclass graph $S^{gm}$,

$$\sum_{i,j} \|x_i - x_j\|^2 S^{p,t}_{ij} \le \sum_{i,j} \|x_i - x_j\|^2 S^{gm}_{ij}.$$

From Proposition 3 it can be seen that Corollary 4 is clear, since the parameter $t_1 \le n^2 - \sum_{c=1}^{C} n_c^2$.

The interclass graph in DA algorithms aims to effectively separate samples from different classes; thus the local construction of the interclass graph is not quite reasonable in several respects. (1) Samples from different classes are expected to be separated as effectively as possible, as shown in Figure 1(d) rather than Figure 1(b); that is, the interclass scatter should be as large as possible, whereas the interclass scatter corresponding to a local graph in the input space is not larger than that corresponding to the global graph, as demonstrated in Proposition 3 and Corollary 4. Considering the structure preservation property of graph embedding DA algorithms, the local construction of the interclass graph seems undesirable. (2) For good performance, an exhaustive and very expensive search for the neighbor parameter (such as $k_2$ in MFA) in the high-dimensional input space is unavoidable. Moreover, if the parameter is unsuitably set, the supervised information cannot be fully utilized, and it is not easy to determine which parameter value is most suitable for a given task. By contrast, the global construction of the interclass graph needs no exhaustive search for the neighbor parameter, because the neighbor parameter adapts to different datasets. Furthermore, all available supervised information can be fully utilized as prior knowledge in maximizing the interclass margin. So, for good discriminant performance, the interclass graph should usually be constructed globally rather than locally.
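The scatter inequalities can be checked numerically on synthetic data. The sketch below does so for the intraclass case; it is only an illustration under stated assumptions: the data are random Gaussian samples, the helper names are hypothetical, and the local graph uses the symmetrized $t$-nearest-neighbor rule with samples stored as rows.

```python
import numpy as np

def sq_dists(X):
    """Squared Euclidean distances between rows of X, with an exactly zero diagonal."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    return d2

def knn_intraclass(X, y, t):
    """Symmetrized 0-1 graph linking each sample to its t nearest same-class neighbors."""
    n = len(y)
    d2 = sq_dists(X)
    S = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        S[i, same[np.argsort(d2[i, same], kind="stable")[:t]]] = 1.0
    return np.maximum(S, S.T)

def scatter(X, S):
    """Input-space scatter: sum over i, j of ||x_i - x_j||^2 * S_ij."""
    return float(np.sum(sq_dists(X) * S))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.repeat([0, 1], 20)  # two classes of 20 samples each

# Scatter of the local graph for increasing neighbor parameter t.
local = [scatter(X, knn_intraclass(X, y, t)) for t in (1, 3, 5, 10, 19)]
# Scatter of the global intraclass graph (all same-class pairs).
global_scatter = scatter(X, (y[:, None] == y[None, :]).astype(float))
```

The values in `local` are nondecreasing in $t$ because each graph is contained in the next, and the largest setting ($t = n_c - 1$) reproduces the global intraclass scatter.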
From the discussions above, it can be expected that combining a locally constructed intraclass graph with a globally constructed interclass graph is more discriminant. Next, in order to illustrate the analyses above, we develop two DA algorithms, GmLcDA and LmGcDA. In order to develop GmLcDA, the adjacency matrix $S$ in (2a) is rewritten as $S^{lc}$, corresponding to the local intraclass graph:

$$S^{lc}_{ij} = \begin{cases} 1, & \text{if } x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i), \\ 0, & \text{otherwise}, \end{cases} \quad (7)$$

where $N_k(x_i)$ denotes the set of the $k$ nearest neighbors of the sample $x_i$ within the same class. In fact, $S^{lc}$ in (7) is consistent with $S$ in (2a). However, since GmLcDA involves only one neighbor parameter, for the local intraclass graph (the global interclass graph is parameter-free), the formulation with parameter $k$ in (7) is used rather than the one with parameter $k_1$ in (2a) for clarity. Now compute the interclass scatter $J^{gm}$ and the intraclass scatter $J^{lc}$ for a projection direction $w$ and the data matrix $X = [x_1, \ldots, x_n]$:

$$J^{gm} = \sum_{i,j} \|w^T x_i - w^T x_j\|^2 S^{gm}_{ij} = 2 w^T X (D^{gm} - S^{gm}) X^T w = 2 w^T X L^{gm} X^T w, \quad (8)$$

where $D^{gm}$ is a diagonal matrix whose entries are the column (or row, since $S^{gm}$ is symmetric) sums of $S^{gm}$, $D^{gm}_{ii} = \sum_j S^{gm}_{ij}$, and the corresponding Laplacian matrix $L^{gm} = D^{gm} - S^{gm}$ is symmetric and positive semidefinite.
Likewise, $J^{lc}$ is obtained by replacing $S^{gm}$ in (8) with $S^{lc}$, and correspondingly we obtain $D^{lc}$ and $L^{lc}$.
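For completeness, the standard identity that turns a pairwise scatter into a Laplacian quadratic form can be written out as follows. This is a routine derivation for any symmetric adjacency matrix $S$; the shorthand $y_i = w^T x_i$ for the one-dimensional projection of the $i$-th sample is introduced here only for this derivation.

```latex
\begin{aligned}
\sum_{i,j} \| w^T x_i - w^T x_j \|^2 S_{ij}
  &= \sum_{i,j} \left( y_i^2 - 2 y_i y_j + y_j^2 \right) S_{ij}
     && (y_i = w^T x_i) \\
  &= 2 \sum_{i} y_i^2 \sum_{j} S_{ij} - 2 \sum_{i,j} y_i S_{ij} y_j
     && (S \text{ symmetric}) \\
  &= 2 \sum_{i} D_{ii} y_i^2 - 2 \sum_{i,j} y_i S_{ij} y_j
     && (D_{ii} = \textstyle\sum_j S_{ij}) \\
  &= 2 \, w^T X (D - S) X^T w = 2 \, w^T X L X^T w .
\end{aligned}
```

Since the left-hand side is a sum of nonnegative terms for every $w$, the identity also shows directly that $X L X^T$ is positive semidefinite.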
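Further, the objective criterion of GmLcDA can be formulated by jointly maximizing the interclass scatter $J^{gm}$ and minimizing the intraclass scatter $J^{lc}$, so as to simultaneously preserve the local compactness structures of intraclass data and the margins between data from different classes. Consider

$$w^* = \arg\max_w \frac{w^T X L^{gm} X^T w}{w^T X L^{lc} X^T w}, \quad (9)$$

which can be solved by the generalized eigen-decomposition $X L^{gm} X^T w = \lambda X L^{lc} X^T w$ [24], where $L^{gm}$ and $L^{lc}$ are the Laplacian matrices of the global interclass and local intraclass graphs. Likewise, for LmGcDA, the local interclass adjacency matrix $S^p$ is rewritten as $S^{lm}$, corresponding to the local interclass graph:

$$S^{lm}_{ij} = \begin{cases} 1, & \text{if } (i, j) \in P_m(l(x_i)) \text{ or } (i, j) \in P_m(l(x_j)), \\ 0, & \text{otherwise}, \end{cases} \quad (10)$$

where $P_m(c)$ is a set of the $m$ nearest sample pairs from different classes. Further, $S^{lm}$ in (10) and $S^{gc}$ in (3) are combined to form the objective criterion of LmGcDA as follows:

$$w^* = \arg\max_w \frac{w^T X L^{lm} X^T w}{w^T X L^{gc} X^T w}. \quad (11)$$

LmGcDA shows exactly the opposite effect. That is, it is difficult for a locally constructed interclass graph (shown in Figure 1(b)) and a globally constructed intraclass graph (shown in Figure 1(c)) to effectively preserve the geometric structure of the input samples, which in fact leads to worse performance in real-world tasks, as shown by the experiments in Section 4.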
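A criterion of this ratio form can be solved with NumPy alone by reducing the generalized eigenproblem to a symmetric one through a Cholesky factor. The sketch below is an illustration under assumptions, not the authors' code: function names are hypothetical, samples are rows of `X` (so the scatter matrices are `X.T @ L @ X`), and a ridge term `eta` is added to the intraclass scatter so that the Cholesky factorization exists.

```python
import numpy as np

def laplacian(S):
    """Graph Laplacian L = D - S for a symmetric adjacency matrix S."""
    return np.diag(S.sum(axis=1)) - S

def global_interclass_graph(y):
    return (y[:, None] != y[None, :]).astype(float)

def local_intraclass_graph(X, y, k):
    """Symmetrized 0-1 graph of the k nearest same-class neighbors."""
    n = len(y)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    S = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        S[i, same[np.argsort(d2[i, same])[:k]]] = 1.0
    return np.maximum(S, S.T)

def fit_graph_da(X, S_inter, S_intra, dim, eta=0.1):
    """Maximize w^T B w / w^T A w, with B = X^T L_inter X and A = X^T L_intra X + eta * I."""
    B = X.T @ laplacian(S_inter) @ X
    A = X.T @ laplacian(S_intra) @ X + eta * np.eye(X.shape[1])
    C = np.linalg.cholesky(A)            # A = C C^T
    C_inv = np.linalg.inv(C)
    M = C_inv @ B @ C_inv.T              # symmetric matrix with the same eigenvalues
    evals, V = np.linalg.eigh((M + M.T) / 2.0)
    top = np.argsort(evals)[::-1][:dim]  # largest generalized eigenvalues first
    W = C_inv.T @ V[:, top]              # map eigenvectors back: w = C^{-T} v
    return W, evals[top]

# Toy usage: two Gaussian classes separated along the first axis.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(30, 2)), rng.normal(size=(30, 2)) + [8.0, 0.0]])
y = np.repeat([0, 1], 30)
W, lam = fit_graph_da(X, global_interclass_graph(y), local_intraclass_graph(X, y, 3), dim=1)
z = (X @ W).ravel()  # one-dimensional embedding
```

Swapping in a local interclass graph and the global intraclass graph gives the LmGcDA variant with the same solver, which makes the four graph combinations directly comparable.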
By contrast, LmLcDA constructs both intraclass and interclass graphs locally, as shown in Figures 1(a) and 1(b), respectively. It is well known that the local parameter settings of graph construction are relatively intractable, especially for the interclass graph. Accordingly, for good discriminant performance, such graph construction requires more domain knowledge and experience.
Unlike the three above, the graph construction of GmGcDA adopts only the global way, for both intraclass and interclass graphs. Such a construction is easy to carry out and stable across different domains.

Experiments
In this section, to further illustrate and support the theoretical analyses above, we compare LmLcDA, GmLcDA, LmGcDA, and GmGcDA through experiments on toy and real-world datasets; the latter include UCI [25], face recognition, and object categorization datasets. The nearest neighbor classifier (1NN) [26] is applied after these DA algorithms to evaluate their classification performance. Ridge regularization [27] is adopted for all compared DA algorithms. That is, all the compared DA algorithms are derived from a regularized objective of the same form, $w^* = \arg\max_w (w^T B w)/(w^T (A + \eta I) w)$ with parameter $\eta = 0.1$, where $B$ and $A$ correspond to the interclass and intraclass scatters, respectively.

A Toy Example.
Here, the toy example illustrates that constructing both intraclass and interclass graphs locally yields less discriminant projections, implying that some global construction is necessary and thus suggesting that a globally constructed interclass graph together with a locally constructed intraclass graph is desirable for more discriminant performance.
LmLcDA algorithms such as MFA inject locality into both intraclass and interclass graphs, so they may fail to obtain discriminant projections for tasks in which the global structure needs to be considered. Moreover, there is no guidance on the selection of the neighbor parameter for the interclass graph. When the neighbor parameters for graph construction are set inappropriately, the projection results may be even less favorable. For example, for the linearly separable problem shown in Figure 2(a), the DA algorithms that adopt at least one global graph, namely GmGcDA (MDA), GmLcDA with parameter $k = 3$, and LmGcDA with $m = 5$, can effectively separate the two-class samples, as shown in Figures 2(c)-2(e). On the contrary, from Figure 2(b) we can observe that, in the reduced one-dimensional subspace of MFA ($k_1$ is empirically set to 3, with different values of $k_2$), the two-class samples overlap to different extents. Moreover, no rule can be found for the selection of $k_2$. We can only observe that when $k_2 = 15$ the separation of the two-class samples is relatively satisfactory.
Further, when input points are added, as shown in Figure 2(f), both the margin between the two classes and the compactness within Class 2 change. It is then more difficult to partition the two-class samples in the one-dimensional projection space of LmGcDA than in that of GmLcDA, as shown in Figures 2(g) and 2(h).

4.2. Real-World Datasets.
On these real-world datasets, we compare GmLcDA, LmGcDA, two LmLcDAs (MFA and LSDA), and GmGcDA (MDA). In order to evaluate the various algorithms fairly, their model parameters are searched over a large candidate range and the best results are reported. To address the singularity problem of MDA, the inverse of the within-class scatter matrix $S_w$ is replaced by the pseudoinverse [24].

UCI Datasets.
We select 5 two-class UCI datasets, whose descriptions are shown in column 1 of Table 1; the classification results of 1NN on the original data are also reported.
For each dataset, the samples are randomly divided into a training set and a testing set, each containing half of the samples. The random division is performed 30 times and the average accuracies of each algorithm are tabulated in Table 1. The neighbor parameters $k$ for GmLcDA, $m$ for LmGcDA, $k_1$ for MFA, and $k$ for LSDA are searched from 2 to half of the minimum number of samples per class in increments of 5. The parameter $k_2$ for MFA is searched from 20 to the total number of training samples in increments of 20. (Notice that the maximum of $k_2$ for MFA, that is, the number of training samples ($n$), is smaller than the so-called parameter of the global interclass graph for GmLcDA ($n^2 - \sum_{c=1}^{C} n_c^2$).)
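The evaluation protocol described above (repeated random half splits followed by 1NN in the projected space) can be sketched as follows. This is a minimal illustration with hypothetical names: `fit_projection` stands for any DA method that returns a projection matrix, the split is a plain (non-stratified) random permutation, which is an assumption since the text does not say whether the division is stratified, and samples are rows of `X`.

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """Label each test row with the label of its nearest training row (Euclidean distance)."""
    d2 = (np.sum(X_test ** 2, axis=1)[:, None]
          + np.sum(X_train ** 2, axis=1)[None, :]
          - 2.0 * X_test @ X_train.T)
    return y_train[np.argmin(d2, axis=1)]

def repeated_holdout(X, y, fit_projection, n_repeats=30, seed=0):
    """Mean and standard deviation of 1NN accuracy over random half/half splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    accs = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2:]
        W = fit_projection(X[train], y[train])   # learned on the training half only
        pred = one_nn_predict(X[train] @ W, y[train], X[test] @ W)
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs)), float(np.std(accs))

# Baseline usage: identity projection, i.e. 1NN on the unreduced data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(20, 2)), rng.normal(size=(20, 2)) + 100.0])
y = np.repeat([0, 1], 20)
mean_acc, std_acc = repeated_holdout(X, y, lambda X_tr, y_tr: np.eye(2))
```

Passing a different `fit_projection` for each algorithm keeps the splits identical across methods when the same seed is used, so accuracy differences reflect the graph construction rather than the partition.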
From the results shown in Table 1, we can observe the following.
(i) GmLcDA produces the best accuracies on 4 datasets, all except Sonar, and clearly outperforms LmGcDA, MDA, MFA, LSDA, and 1NN on the unreduced data. On Sonar, the accuracy of GmLcDA is only 0.0048 lower than that of LSDA. Moreover, the optimal reduced dimensions of GmLcDA (except on Sonar) are lower than those of MFA and LSDA, which clearly helps efficient testing.
(ii) LmGcDA is the worst on all datasets except Crx, on which it is better only than MDA. These results clearly show that locality for the interclass graph and globality for the intraclass graph are not viable alternatives.
(iii) MFA and LSDA achieve similar results, since both adopt local intraclass and interclass graphs to preserve the local geometry of the data. However, the optimal reduced dimensions of MFA are higher than those of LSDA overall.
(iv) The performance of MDA is relatively inferior, especially on Crx and Spectf, where it is even worse than 1NN on the unreduced data. Moreover, the standard deviations of MDA on all datasets seem larger than those of the other algorithms, which can be attributed to the fact that its reduced dimension is limited to one for two-class datasets.
(v) GmLcDA, LmLcDA, and GmGcDA almost all outperform 1NN on all datasets (except for GmGcDA on Crx, Sonar, and Spectf) with lower reduced dimensions. By contrast, LmGcDA is worse than 1NN on all datasets.

Experimental Settings. Considering the very high dimension of the face images, in our experiment, PCA [28,29] is firstly   2.

Face Recognition.
From the results of Table 2, we can observe the following.
(i) With the increase of gallery samples, that is, $G_m$ from 3 to 7 on Yale and from 3 to 5 on ORL, the performances of these algorithms increase to different degrees.
(ii) For the two datasets with different $G_m/P_n$ divisions, GmLcDA always obtains better accuracies than the other algorithms, which shows its effectiveness and feasibility for the face recognition task. These relatively excellent results of GmLcDA may be ascribed to the joint use of the global interclass margin and the local intraclass compactness.
(iii) MFA and LSDA produce some results worse than MDA, such as on Yale with $G_4/P_7$ and $G_5/P_6$ for MFA and $G_6/P_5$ for LSDA, and on ORL with $G_5/P_5$ for MFA. Moreover, MFA is relatively worse. A possible reason is that relying only on the local geometry of the data for both intraclass and interclass graphs limits their performance, especially for MFA, whose $k_2$ marginal points possibly limit its generalization ability to a certain degree.
(iv) On Yale, the unsupervised PCA is worse than the other algorithms. However, it is worth noting that PCA produces better accuracies than MDA, LSDA, and MFA on ORL. This may be because, for ORL, the margin between samples of different classes is larger in the PCA subspace with 99% energy. Besides, on Yale, the standard deviations of MDA and PCA are larger than those of the other algorithms.

Object Categorization.
A classical problem in computer vision and pattern recognition is to classify a set of objects into a group of known categories. Here, we use the popular benchmark dataset Coil20. The dataset consists of gray-scale images of 20 objects and 72 images for each object with pose

(i) Figures 5(a)-5(d) show that, with the increase of gallery samples, the performances of these algorithms improve to different extents. Likewise, with the gradual increase of the reduced dimension, their accuracies increase correspondingly. However, with a further increase of the reduced dimension, their performances (except for MDA, whose reduced dimension is at most $C - 1$) decrease gradually. This result is especially evident for small gallery sets with $G_m = 6$, which is an example of the "curse of dimensionality" [2].
(ii) For the four groups $G_m/P_n$, from the small $G_6$ to the relatively large $G_{36}$, GmLcDA outperforms MFA, LSDA, and MDA across the different reduced dimensions. In particular, when the reduced dimension is less than 10, its superiority is more obvious. For example, when the reduced dimension is 2, it is over 30% better than the worst algorithm, LSDA. Moreover, the reduced dimension at which GmLcDA reaches its best accuracy is lower than that of the other algorithms. We can observe that the accuracies of GmLcDA reach their best value when the reduced dimension is less than or equal to 20, and afterward they tend to remain stable.
(iii) The performances of LSDA and MFA do not differ remarkably except for $G_m = 18$. Specifically, for the first two groups, MFA is better than LSDA overall, as shown in Figures 5(a) and 5(b). In contrast, with the increase of the gallery sample number, LSDA exceeds MFA in performance, as shown in Figures 5(c) and 5(d). This suggests that LSDA is not well suited to small training sets. Moreover, we can clearly observe that the performances of LSDA are always the worst when the reduced dimension is relatively low.
(iv) When the reduced dimensions are less than the reduced range ($C - 1$) of MDA, the accuracies of MDA are almost parallel to those of MFA and LSDA. However, since its reduced dimension is limited to a maximum of $C - 1$, its performances are inferior to those of the other three algorithms overall.

Conclusion and Future Work
In this paper, we address in detail some important issues in DA based on graph construction. In order to illustrate and support the theoretical analyses, by jointly utilizing both globality and locality, we develop the GmLcDA algorithm, based on the global interclass and local intraclass graphs, and the LmGcDA algorithm, based on the local interclass and global intraclass graphs. Further, by comprehensively comparing LmLcDA (MFA, LSDA), GmLcDA, LmGcDA, and GmGcDA (MDA) on toy and real-world datasets, we suggest that combining a locally constructed intraclass graph with a globally constructed interclass graph is more discriminant.

Figure 1: Illustration of local and global constructions for intraclass and interclass graphs, where Lc refers to local compactness, Gc to global compactness, Lm to local margin, and Gm to global margin.

3.4. GmLcDA and LmGcDA.
In this subsection, two DA algorithms are developed: one is the Globally marginal and Locally compact Discriminant Analysis (GmLcDA), obtained by jointly utilizing the global interclass graph $S^{gm}$ in (4) and the local intraclass graph $S^{lc}$ in (2a), and the other is the Locally marginal and Globally compact Discriminant Analysis (LmGcDA), obtained by jointly utilizing the local interclass graph $S^p$ in (2b) and the global intraclass graph $S^{gc}$ in (3).
Recognition. It is well known that face recognition is a very important task in pattern recognition and machine learning. In this subsection, two benchmark datasets, Yale and ORL, are used to evaluate the use of local and global graphs in DA algorithms for face image recognition.

Data Description. The Yale dataset contains 165 face images of 15 individuals ($C = 15$), 11 images per individual, and these 11 images were taken under different facial expressions or configurations. Figure 3(a) shows the 11 sample face images of one person in the Yale dataset. The ORL dataset consists of 400 face images of 40 distinct subjects ($C = 40$), 10 images per subject. These images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). Figure 3(b) shows the 10 sample face images of one person in the ORL dataset. The size of each image in Yale and ORL is 112 × 92 and 100 × 100 pixels, with 256 grey levels per pixel.

Figure 5: Accuracies with respect to the reduced dimensions on Coil20.

Figures 5(a) and 5(b). In contrast, with the increase of the gallery sample number, LSDA exceeds MFA in performance, as shown in Figures 5(c) and 5(d).

Table 1: Accuracy (%) ± standard deviation (optimal reduced dimensions), where the highest reduced dimension of MDA is the number of classes minus one, which here equals one.

Table 2: Accuracy (%) ± standard deviation (optimal reduced dimensions), where the highest reduced dimension of MDA is the number of classes minus one.