Large Margin Graph Embedding-Based Discriminant Dimensionality Reduction

Discriminant graph embedding-based dimensionality reduction methods have attracted more and more attention over the past few decades. These methods construct an intrinsic graph and a penalty graph to preserve the intrinsic geometric structure of intraclass samples and to separate interclass samples. However, marginal samples cannot be accurately characterized by the penalty graph alone, since it treats every sample equally. In practice, these marginal samples often influence classification performance and need to be specially tackled. In this study, the near neighbors' hypothesis margin of marginal samples is further maximized, in addition to integrating the intrinsic graph and penalty graph, to separate interclass samples and improve discriminant ability. A novel discriminant dimensionality reduction method named LMGE-DDR is proposed. Experiments on several public datasets (ORL, Yale, UMIST, FERET, CMIU-PIE09, and AR) verify the effectiveness of the proposed LMGE-DDR: it performs better than the compared methods, and its standard deviation is smaller. This demonstrates the effectiveness of the introduced method.


Introduction
Dimensionality reduction (DR) plays an important role in many fields such as machine learning and pattern recognition [1][2][3][4]. It aims to resolve the curse of dimensionality by deriving relevant low-dimensional representations of high-dimensional datasets. Principal component analysis (PCA) and linear discriminant analysis (LDA) are the most representative methods [5,6]. PCA obtains a low-dimensional space by maximizing variance. LDA uses label information to project the feature space so as to distinguish categories, maximizing the interclass distance while minimizing the intraclass distance. However, LDA cannot capture the local structure of data. As is known, the local structure of high-dimensional data is very important for data representation.
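As a concrete illustration of the contrast just described, the following minimal sketch (not from the paper; it assumes scikit-learn and synthetic data) projects the same labeled dataset with unsupervised PCA and with supervised LDA:

```python
# Minimal sketch contrasting PCA (unsupervised, variance-driven) with
# LDA (supervised, separability-driven) on a toy labeled dataset.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two classes in 10-D that differ mainly along one direction.
X0 = rng.normal(size=(50, 10))
X1 = rng.normal(size=(50, 10))
X1[:, 0] += 3.0                       # class separation on feature 0
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

Z_pca = PCA(n_components=2).fit_transform(X)              # ignores labels
Z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)
print(Z_pca.shape, Z_lda.shape)       # (100, 2) (100, 1)
```

With two classes, LDA can produce at most C - 1 = 1 discriminant direction, whereas PCA is free to keep any number of variance directions regardless of the labels.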
A k-nearest-neighbor graph can better characterize the local structure of data [7]. Thus, over the past years, graph embedding-based dimensionality reduction methods have sprung up [7,8], such as LLE [9], Isomap [10,11], and Laplacian eigenmaps [12]. However, these manifold learning methods cannot directly process new samples because they do not yield any mapping function, which is known as the 'out-of-sample' problem [13]. Therefore, more effective methods have been presented to obtain an explicit projection mapping. Locality preserving projections (LPP), a well-known method, preserves the local structure of data in the low-dimensional space [2]. Owing to its simplicity and effectiveness, several variants have been proposed [14,15]. However, LPP is an unsupervised method that does not use label information, so it performs worse in classification [16]. Neighborhood preserving projection (NPP) preserves the local neighborhood information on the data manifold [17].
To further improve classification performance, discriminant graph embedding-based methods, which exploit label information, have gradually become a popular research topic; they aim to preserve the within-class geometrical structure while maximizing the between-class distances of different manifolds [18]. Thus, more and more discriminant graph embedding-based methods have recently been studied. Marginal Fisher analysis (MFA) constructs two adjacency graphs to maximize the separability between pairwise marginal data points [19]. Local discriminant embedding (LDE) [20] utilizes label information and proposes a nearest neighbor-based embedding. However, it suffers from the so-called small-sample-size (SSS) problem and cannot be directly applied to high-dimensional data [20]. Considering local intraclass attraction and interclass repulsion, discriminant neighborhood embedding (DNE) was proposed to make data points in the same class compact while widening the gaps between classes in a low-dimensional subspace [21]. However, DNE does not always set edges between a sample and its neighbors of different classes, which can reduce the interclass distance in the new space and deteriorate classification [22]. Thus, Ding et al. constructed double adjacency graphs linking each sample to its homogeneous and heterogeneous neighbors and introduced a more effective version of DNE termed DAG-DNE [22]. Inspired by DAG-DNE, a number of discriminant analysis-based methods have been proposed over the past few years [23][24][25][26][27][28][29][30][31][32][33].
Most dimensionality reduction methods can be unified in the graph-embedding framework [19]; these methods differ in how they construct the similarity graph and the penalty graph [34]. Graph-embedding-based methods are therefore sensitive to the weight matrix, yet they endow every sample (including marginal samples) with weights in the same way. However, as stated in [35], the marginal samples located on the class margin in the high-dimensional space should be treated so as to achieve a maximum between-class hypothesis margin and good classification performance, since they are more crucial to classification. Therefore, large hypothesis margins between the near neighbors of these marginal samples can improve the discriminating power of the embedded features, and these samples should be treated separately. In this study, in addition to constructing double adjacency graphs, the near neighbors' hypothesis margin of each marginal sample is maximized to improve the discriminant power. A novel large margin graph embedding-based discriminant dimensionality reduction method named LMGE-DDR is introduced. Experimental results on several public datasets confirm the effectiveness of the proposed LMGE-DDR.

Methods
Firstly, the common notations in this study are presented. The high-dimensional data are denoted as $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$, with $n$ samples in $d$ dimensions belonging to $C$ classes, with class labels $c_i \in \{1, 2, \ldots, C\}$. $y_i = P^T x_i$ denotes the sample $x_i$ transformed by the matrix $P = [p_1, p_2, \ldots, p_r] \in \mathbb{R}^{d \times r}$, where $d \gg r$ and each $p_i \in \mathbb{R}^d$ is a column vector. $N_k^+(x_i)$, $N_k^-(x_i)$, and $N_k(x_i)$ denote, respectively, the $k$ neighbors of $x_i$ with the same class, the $k$ neighbors with a different class, and the $k$ neighbors regardless of class.

DNE.
Discriminant neighborhood embedding (DNE) considers local intraclass attraction and interclass repulsion and learns an intrinsic graph $F^w$ and a penalty graph $F^b$:

$$F^w_{ij} = \begin{cases} 1, & \left(x_i \in N_k(x_j) \ \text{or} \ x_j \in N_k(x_i)\right) \ \text{and} \ c_i = c_j, \\ 0, & \text{otherwise}, \end{cases} \qquad F^b_{ij} = \begin{cases} 1, & \left(x_i \in N_k(x_j) \ \text{or} \ x_j \in N_k(x_i)\right) \ \text{and} \ c_i \neq c_j, \\ 0, & \text{otherwise}. \end{cases}$$

The objective function can be written as

$$\max_P \ \theta(P) = \sum_{i,j} \left\| P^T x_i - P^T x_j \right\|^2 F^b_{ij} - \sum_{i,j} \left\| P^T x_i - P^T x_j \right\|^2 F^w_{ij}, \quad (2)$$

where the difference of graph Laplacians is collected into $S = (D^b - F^b) - (D^w - F^w)$, with diagonal degree matrices $D^w_{ii} = \sum_j F^w_{ij}$ and $D^b_{ii} = \sum_j F^b_{ij}$. The constraint $P^T P = I$ preserves the local structure and reinforces the discriminant ability [36]. The objective in (2) can be rewritten in trace form as

$$\theta(P) = 2\,\mathrm{tr}\left(P^T X S X^T P\right), \quad (5)$$

so (2) is equivalent to $\max_P \ \mathrm{tr}(P^T X S X^T P)$ subject to $P^T P = I$. The projection matrix $P$ is found by solving the following eigenvector problem:

$$X S X^T p_i = \lambda_i p_i,$$

where $\lambda_i$ ($i = 1, \ldots, d$) are the eigenvalues and $p_i$ ($i = 1, \ldots, d$) the corresponding eigenvectors. Assuming $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$, $P = [p_1, p_2, \ldots, p_r]$. The details are presented in [21].
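The following sketch illustrates the DNE pipeline reconstructed above in NumPy; the function name and implementation details are ours, not from [21]:

```python
# Sketch of the DNE pipeline: build the intrinsic graph F^w and penalty
# graph F^b from k nearest neighbors, form S = (D^b - F^b) - (D^w - F^w),
# and take the top-r eigenvectors of X S X^T as the projection P.
import numpy as np

def dne_projection(X, labels, k=5, r=2):
    # X: (d, n) data matrix with samples as columns, as in the text.
    d, n = X.shape
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    np.fill_diagonal(dist, np.inf)
    knn = np.argsort(dist, axis=1)[:, :k]       # k neighbors per sample

    Fw = np.zeros((n, n)); Fb = np.zeros((n, n))
    for i in range(n):
        for j in knn[i]:
            if labels[i] == labels[j]:
                Fw[i, j] = Fw[j, i] = 1.0       # same-class neighbor edge
            else:
                Fb[i, j] = Fb[j, i] = 1.0       # different-class edge
    S = (np.diag(Fb.sum(1)) - Fb) - (np.diag(Fw.sum(1)) - Fw)

    # Symmetric eigenproblem; eigh returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(X @ S @ X.T)
    return vecs[:, ::-1][:, :r]                 # top-r eigenvectors
```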

DAG-DNE.
Double adjacency graph-based discriminant neighborhood embedding (DAG-DNE) constructs double adjacency graphs to give a more effective version of DNE. In DAG-DNE, $F^w$ and $F^b$ are defined as

$$F^w_{ij} = \begin{cases} 1, & x_i \in N^+_{k_1}(x_j) \ \text{or} \ x_j \in N^+_{k_1}(x_i), \\ 0, & \text{otherwise}, \end{cases} \qquad F^b_{ij} = \begin{cases} 1, & x_i \in N^-_{k_2}(x_j) \ \text{or} \ x_j \in N^-_{k_2}(x_i), \\ 0, & \text{otherwise}, \end{cases}$$

so that every sample is connected both to its $k_1$ nearest homogeneous neighbors and to its $k_2$ nearest heterogeneous neighbors. The projection matrix $P$ is then obtained by solving the same eigenvector problem as in DNE, $X S X^T p_i = \lambda_i p_i$.
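A sketch of the double adjacency graph construction, the only ingredient that changes relative to DNE, might look as follows (illustrative code; names are ours):

```python
# Sketch of the DAG-DNE double adjacency graphs: unlike DNE, every sample
# is guaranteed edges to its k1 nearest same-class neighbors (F^w) and
# its k2 nearest different-class neighbors (F^b).
import numpy as np

def dag_dne_graphs(X, labels, k1=3, k2=3):
    # X: (d, n) with samples as columns; labels: length-n array.
    labels = np.asarray(labels)
    d, n = X.shape
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    np.fill_diagonal(dist, np.inf)
    Fw = np.zeros((n, n)); Fb = np.zeros((n, n))
    for i in range(n):
        same = np.where(labels == labels[i])[0]
        diff = np.where(labels != labels[i])[0]
        for j in same[np.argsort(dist[i, same])][:k1]:   # N^+_{k1}(x_i)
            Fw[i, j] = Fw[j, i] = 1.0
        for j in diff[np.argsort(dist[i, diff])][:k2]:   # N^-_{k2}(x_i)
            Fb[i, j] = Fb[j, i] = 1.0
    return Fw, Fb
```

From these graphs, $S$ and the eigenproblem are formed exactly as in the DNE sketch above.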

Proposed Method
As shown above, the weights in the adjacency matrix are endowed in the same way for every sample, including the marginal samples; this cannot further enlarge the between-class hypothesis margin and deteriorates classification performance. In this study, the marginal sample is defined in Definition 1, and the hypothesis margin is studied as in [37][38][39].
Definition 1 (marginal samples). The marginal samples in this study are the ones located on the class margin. Figure 1 shows the $k_0$ near neighbors' graph and marks the marginal samples (i.e., {5, 6, 7, 8}).
Definition 2 (hypothesis margin). As shown in [37], the hypothesis margin of a sample $x$ can be defined as

$$H(x) = \left\| x - \mathrm{nearmiss}(x) \right\| - \left\| x - \mathrm{nearhit}(x) \right\|,$$

where $\mathrm{nearhit}(x)$ and $\mathrm{nearmiss}(x)$ denote the nearest neighbor of $x$ with the same class and with a different class, respectively, and $\| \cdot \|$ is the $L_2$ norm. The sample $x$ is correctly recognized by the 1NN (nearest neighbor) classifier when $H(x) > 0$, as illustrated in Figure 2.
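Definition 2 translates directly into code; the following illustrative helper (our naming) assumes the samples are stored as rows:

```python
# Direct transcription of Definition 2: H(x) > 0 means x would be
# correctly classified by the 1NN rule.
import numpy as np

def hypothesis_margin(x, X, labels, label_x):
    # X: (n, d) with samples as rows; labels: length-n array.
    labels = np.asarray(labels)
    d = np.linalg.norm(X - x, axis=1)
    d[d == 0] = np.inf                       # exclude x itself
    nearhit = d[labels == label_x].min()     # nearest same-class sample
    nearmiss = d[labels != label_x].min()    # nearest different-class sample
    return nearmiss - nearhit
```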
Definition 3 (heterogeneous near neighbors' hypothesis margin). A marginal sample $x_1$ is shown in Figure 3 to illustrate the heterogeneous near neighbors' hypothesis margin, which is defined as

$$H'(x) = \sum_{x_j \in N^-_{k_0}(x)} \left( \left\| x - x_j \right\| - \left\| x - \mathrm{nearhit}(x) \right\| \right). \quad (11)$$

As shown in (11), the heterogeneous samples stay separated and a large margin is achieved between the heterogeneous near neighbors when all the expressions in brackets are larger than zero, which means the sample can be correctly classified by the 1NN classifier.
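Under this reading of (11), the heterogeneous near neighbors' hypothesis margin can be sketched as follows (illustrative only):

```python
# Sketch of Definition 3: sum the margin terms over the k0 nearest
# different-class neighbors of a marginal sample x. Each bracketed
# term > 0 means that heterogeneous neighbor lies farther away than
# nearhit(x), i.e., x stays 1NN-separable from it.
import numpy as np

def hetero_margin(x, X, labels, label_x, k0=3):
    labels = np.asarray(labels)
    d = np.linalg.norm(X - x, axis=1)
    d[d == 0] = np.inf
    nearhit = d[labels == label_x].min()
    d_miss = np.sort(d[labels != label_x])[:k0]   # k0 nearest heterogeneous
    return float(np.sum(d_miss - nearhit))
```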

LMGE-DDR
[Figure 2: Illustration of the hypothesis margin.]

On the basis of DAG-DNE, the marginal samples in the high-dimensional space are additionally treated separately: the LMGE-DDR objective augments the DAG-DNE criterion $\Psi(P) - \Phi(P)$ with an $\alpha$-weighted term that maximizes the heterogeneous near neighbors' hypothesis margin (11) of the marginal samples in the projected space.
[Figure 3: a marginal sample $x_1$ and its near neighbors $x_2, \ldots, x_7$.]

Here, $\Psi(P)$ and $\Phi(P)$ are the same as in DAG-DNE, $MS$ denotes the set of marginal samples in the high-dimensional space, and $\alpha \in [0, 1]$ is a trade-off parameter.
This objective function is transformed into two parts: based on (5), the first part satisfies $\Psi(P) - \Phi(P) = 2\,\mathrm{tr}(P^T X S X^T P)$.

Analysis of LMGE-DDR
In this section, LMGE-DDR is analyzed to illustrate its effectiveness in preserving the geometrical and discriminant structures. Although LMGE-DDR is similar to DAG-DNE in constructing the adjacency graphs, for the marginal samples in the high-dimensional space LMGE-DDR maximizes the heterogeneous near neighbors' hypothesis margin, achieving a large between-class margin in the low-dimensional subspace and discriminating the local structure of neighbors, which improves the discriminant power compared with DAG-DNE. The performance of LMGE-DDR on a toy dataset is illustrated in Figure 4.
As shown in Figure 4(a), for the sample $x_1$, $\mathrm{nearhit}(x_1)$ is $x_2$ and $\mathrm{nearmiss}(x_1)$ is $x_3$.
Thus, based on (12), the hypothesis margin of $x_1$ is

$$H(x_1) = \left\| x_1 - x_3 \right\| - \left\| x_1 - x_2 \right\| < 0.$$

Based on Definition 2, the sample $x_1$ will be misclassified because its hypothesis margin is less than zero. The embedded results and hypothesis margins in one-dimensional space are illustrated in Figures 4(b)-4(e). The hypothesis margin of sample $x_1$ in the low-dimensional space is less than zero under MFA and DAG-DNE, whereas it is positive under MNMDP, DNE, and LMGE-DDR. Moreover, in LMGE-DDR the hypothesis margin of $x_1$ ($H(x_1) = 0.39$) is larger than that in DAG-DNE, which is useful for classification.
Overall, maximizing the heterogeneous near neighbors' hypothesis margin of marginal samples can further improve the discriminant power in low-dimensional space.

Experiments
In this section, LMGE-DDR is systematically compared with several popular methods, namely DAG-DNE, DNE, MNMDP, and MFA, to verify its effectiveness. Specifically, the performance of LMGE-DDR is illustrated on face recognition and two-dimensional visualization experiments. For each person, $l$ randomly selected images constitute the training data and the remaining images the testing data. The nearest neighbor parameters $k$, $k_1$, and $k_2$ used in constructing the adjacency graphs are set to $l - 1$ for all methods, as in [40]. PCA is first applied to reduce the dimensionality of the raw data.

Algorithm (LMGE-DDR). Input: a training set $\{(x_i, c_i)\}_{i=1}^{N}$, parameters $\alpha$, $k_1$, $k_2$, $k$, and the dimensionality of the discriminant subspace $r$. Output: the projection matrix $P$. (1) Construct the intraclass adjacency graph $F^w$ and the interclass adjacency graph $F^b$ as in DAG-DNE; (2) identify the marginal sample set $MS$ from the $k_0$ near neighbors' graph; (3) solve the resulting eigenvalue problem and take the top $r$ eigenvectors as $P$.
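The evaluation protocol just described can be sketched as follows; `train_projection` stands in for any of the compared methods (e.g., LMGE-DDR), and all names and the PCA dimension are our illustrative choices:

```python
# Sketch of the evaluation protocol: randomly pick l images per person
# for training, reduce with PCA first, learn a projection with any of
# the compared methods, and score with a 1NN classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def evaluate(X, y, l, r, train_projection, seed=0):
    # X: (n, d) images as rows; y: length-n labels.
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        train_idx.extend(rng.choice(idx, size=l, replace=False))
    train_idx = np.array(train_idx)
    test_idx = np.setdiff1d(np.arange(len(y)), train_idx)

    # PCA preprocessing, as in the text; 100 is an illustrative cap.
    pca = PCA(n_components=min(100, len(train_idx) - 1))
    Xtr = pca.fit_transform(X[train_idx])
    Xte = pca.transform(X[test_idx])

    P = train_projection(Xtr.T, y[train_idx], r)   # (d', r) projection
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(Xtr @ P, y[train_idx])
    return knn.score(Xte @ P, y[test_idx])
```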

2D Visualization.
The Wine dataset [41] is used for 2D visualization, as shown in Figure 5. From Figure 5, it can be seen clearly that the sample points in the low-dimensional space learned by LMGE-DDR are well separated compared with those of DAG-DNE.
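A sketch of how such a visualization can be produced; here the `dne_projection` sketch from the Methods section stands in for the learned projection, so the plot is illustrative rather than a reproduction of Figure 5:

```python
# Project the Wine data to r = 2 with a learned P and scatter by class.
# Reuses dne_projection from the DNE sketch above as a stand-in.
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)
P = dne_projection(X.T, y, k=5, r=2)    # stand-in for LMGE-DDR
Z = X @ P                                # embedded 2-D coordinates
for c in set(y):
    plt.scatter(Z[y == c, 0], Z[y == c, 1], label=f"class {c}")
plt.legend(); plt.title("Wine projected to 2-D"); plt.show()
```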

Parameter Analysis.

The sensitivity of the parameters $k_0$ and $\alpha$ in LMGE-DDR is analyzed on several face datasets, with $k$, $k_1$, and $k_2$ set to $l - 1$. Figure 6 presents the best recognition rates of LMGE-DDR for different values of $k_0$ and $\alpha$. The results in Figure 6 reveal that the recognition accuracy of LMGE-DDR fluctuates with these parameters. Overall, the best recognition accuracy is achieved when $\alpha$ and $k_0$ are larger. The reason is that a large $\alpha$ makes the marginal samples cluster tightly toward the class center, and the larger $k_0$ is, the more marginal samples there are.
That is to say, the heterogeneous near neighbors' margin of more marginal samples can be maximized, achieving a large between-class margin, which is favorable for classification. Thus, the values of $k_0$ and $\alpha$ used by LMGE-DDR on the different datasets are chosen by cross-validation in the face recognition experiments.
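The cross-validation selection of $k_0$ and $\alpha$ might be organized as a simple grid search; `lmge_ddr_score` is a hypothetical helper returning cross-validated accuracy for one setting, and the grids shown are our illustrative choices:

```python
# Grid search over (k0, alpha) by cross-validated accuracy.
# lmge_ddr_score is hypothetical: it should train LMGE-DDR with the
# given setting and return mean CV accuracy on (X, y).
import itertools

def select_params(X, y, k0_grid=(3, 5, 7, 9),
                  alpha_grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    best = (None, None, -1.0)
    for k0, alpha in itertools.product(k0_grid, alpha_grid):
        acc = lmge_ddr_score(X, y, k0=k0, alpha=alpha)   # hypothetical
        if acc > best[2]:
            best = (k0, alpha, acc)
    return best   # (best k0, best alpha, best accuracy)
```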

Experimental Results.
In this section, experiments on several public datasets, namely ORL, Yale, UMIST, FERET, CMIU-PIE09, and AR, are conducted to verify the effectiveness of the proposed LMGE-DDR; example images are shown in Figure 7. Each image is first aligned and cropped: to 32 × 32 for ORL and Yale, 40 × 50 for UMIST, 80 × 80 for FERET, 64 × 64 for CMIU-PIE09, and 50 × 40 for AR. Tables 1-6 report the best recognition results on the different datasets, and Figure 8 shows the recognition results across different dimensions.
As shown in Figure 8 and Tables 1-6, in most experiments LMGE-DDR performs better than the other compared methods.

Conclusions and Future Work
In this study, we propose a novel graph embedding-based dimensionality reduction approach named LMGE-DDR, which is based on the heterogeneous near neighbors' hypothesis margin. Different from other discriminant learning methods, which learn two kinds of adjacency graphs that treat every sample equally, for the marginal samples in the high-dimensional space we additionally maximize the heterogeneous near neighbors' hypothesis margin to achieve a large between-class margin, which is very crucial for classification. Experimental results illustrate the effectiveness of LMGE-DDR. In this paper, we also employed several evaluation methods to assess the proposed model. The results show that on several public datasets, namely ORL, Yale, UMIST, FERET, CMIU-PIE09, and AR, the proposed model outperforms the other benchmark models. However, the construction of the adjacency graphs and the selection of marginal samples can be influenced by noise, which is not completely avoided. In future work, the reliability of the neighborhood will be evaluated by introducing an adaptive adjacency factor as in [44].

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.