Dimension Reduction Using Samples' Inner Structure Based Graph for Face Recognition

Graph construction plays a vital role in improving the performance of graph-based dimension reduction (DR) algorithms. In this paper, we propose a novel graph construction method and call the resulting graph the samples' inner structure based graph (SISG). Instead of determining the k-nearest neighbors of each sample from the Euclidean distance between vectorized sample pairs, the new method determines the neighbors of each sample from newly defined sample similarities, which are based on the samples' inner structure information. The SISG not only reveals the inner structure information of the original sample matrix but also avoids predefining the parameter k used in the k-nearest neighbor method. To demonstrate the effectiveness of SISG, we apply it to an unsupervised DR algorithm, locality preserving projection (LPP). Experimental results on several benchmark face databases verify the feasibility and effectiveness of SISG.


Introduction
Dimensionality reduction (DR) [1][2][3][4] has been intensively used as an effective approach to analyzing high-dimensional data, especially face images. In particular, graph-based DR has recently received increasing attention in the fields of pattern recognition and machine learning. It has been stated that most existing DR methods [5][6][7][8][9][10][11] actually fall into the graph embedding framework [12]. In graph embedding algorithms, graph construction plays a vital role, because a graph is an effective tool for revealing the structure information hidden in the original data. It is therefore worthwhile to study graph construction [13][14][15][16][17] and to develop novel approaches that construct more reasonable graphs for graph-based DR methods. Jebara et al. [14] presented the so-called $b$-matching graph, which is an alternative to the traditional $k$-nearest neighbor graph. The authors in [15,17] focused on combining different graphs so that a better graph is given a heavier weight. However, we point out that the traditional graph construction method suffers from the following two issues.
(1) The $k$-nearest neighbors of each sample are based on the Euclidean distance between every two vectorized samples. The samples' inner structure information is therefore not taken into consideration by the traditional graph construction method, although such information can be utilized to construct better graphs for dimension reduction algorithms.
(2) The same neighbor parameter $k$ (or $\epsilon$) [18,19] has to be predefined for all samples before constructing graphs. This makes parameter selection difficult, and it is not reasonable to set the same parameter value for all samples.
Mathematical Problems in Engineering
To mitigate the shortcomings of the traditional graph construction method, in this paper we present a samples' inner structure based graph construction method, and we name the resulting graph the samples' inner structure based graph (SISG). In this new method we first use the column similarity to determine the nearest neighbors of each column of the sample matrices. Then we use the sample similarity, measured by the number of nearest neighbor columns between sample pairs, to determine the nearest neighbors of each sample. This strategy not only avoids the stiff criterion (predefining the same parameter $k$ for all samples) used in traditional graphs but also utilizes every sample's inner structure information. We summarize the favorable characteristics of SISG as follows.
(1) SISG preserves samples' intrinsic features by using the inner structure information of sample matrices to construct the graph.
(2) SISG uses the newly defined sample similarities to determine the neighbors of each sample. This strategy avoids predefining the neighbor parameter $k$ (or $\epsilon$) required by traditional graph construction methods.
(3) The edge weights of SISG are determined by the sample similarities between sample pairs. If the sample similarity between two samples is high, the edge weight between these two samples will be large. This means that the greater the sample similarity between two samples is, the more important the corresponding edge is in the graph.
(4) Both the weighted adjacency matrix and the adjacency matrix of SISG are generally asymmetric. This characteristic may be more reasonable for capturing the relationship among samples.
(5) The construction method of SISG is very general. It can be applied to many graph-based DR algorithms.
The rest of this paper is organized as follows. Section 2 briefly reviews traditional graph construction and locality preserving projection (LPP). Section 3 first presents SISG's construction method and then applies SISG to LPP. In Section 4 we perform a series of experiments to evaluate the feasibility and effectiveness of SISG. This is followed by the conclusions in Section 5.

Traditional Graph Construction.
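Let $A = \{A_1, \ldots, A_N\}$ be a set of $N$ sample matrices taken from an $(m \times n)$-dimensional image space. The original samples are then transformed into their vectorial forms, and we denote these vectors by $X = \{x_1, \ldots, x_N\}$, $x_i \in \mathbb{R}^{mn}$. The weighted graph can be denoted by $G = \{X, E, W\}$, where the vertices correspond to the vectors in the set $X$, $E$ is the set of edges, each of which connects one sample pair, and the matrix $W$ contains the weight values of the edges among sample pairs. The construction of the graph $G$ normally consists of two steps.
The first step is the construction of edges, which falls into two categories: $\epsilon$-neighborhood and $k$-nearest neighbor [12,19]. The second step is the calculation of the weight value for each edge. The weight value $W_{ij}$ can be calculated in the following two ways [12]:
(1) heat kernel:
$$W_{ij} = \begin{cases} \exp\left(-\|x_i - x_j\|^2 / t\right), & \text{if } x_i \text{ and } x_j \text{ are connected}, \\ 0, & \text{otherwise}, \end{cases} \qquad (1)$$
where $t$ is the width parameter in the heat kernel;
(2) simple minded:
$$W_{ij} = \begin{cases} 1, & \text{if } x_i \text{ and } x_j \text{ are connected}, \\ 0, & \text{otherwise}. \end{cases} \qquad (2)$$
LPP seeks a projection vector $w$ that minimizes the following objective function:
$$\min_{w} \sum_{i,j} \left(w^T x_i - w^T x_j\right)^2 W_{ij}. \qquad (3)$$
In (3) the weighted adjacency matrix $W$ can be computed by either (1) or (2). The matrix $D$ is a diagonal one with $D_{ii} = \sum_j W_{ij}$, and $L = D - W$. The constraint $w^T X D X^T w = 1$ is added so that the arbitrary scaling in the embedding can be removed. The minimization problem in (3) thus becomes
$$\min_{w^T X D X^T w = 1} w^T X L X^T w. \qquad (4)$$
LPP can be solved by the generalized eigenvector approach [20]:
$$X L X^T w = \lambda X D X^T w. \qquad (5)$$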
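The heat-kernel weighting and the generalized eigenvector problem of LPP can be illustrated with a short NumPy sketch. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names `heat_kernel_weights` and `lpp`, the toy data, and the small ridge term are hypothetical, and the data matrix stores one vectorized sample per column so that the code mirrors the matrix products used in LPP.

```python
import numpy as np

def heat_kernel_weights(X, adjacency, t=1.0):
    """W_ij = exp(-||x_i - x_j||^2 / t) on connected pairs, 0 elsewhere.

    X has shape (dim, N) with one vectorized sample per column.
    """
    sq = np.sum(X ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    return np.exp(-dist2 / t) * adjacency

def lpp(X, W, d, ridge=1e-8):
    """Return the d projection vectors with the smallest generalized eigenvalues
    of X L X^T w = lambda X D X^T w, where D_ii = sum_j W_ij and L = D - W."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    A = X @ L @ X.T
    # ASSUMPTION of this sketch: a tiny ridge keeps B positive definite.
    B = X @ D @ X.T + ridge * np.eye(X.shape[0])
    C = np.linalg.cholesky(B)                  # B = C C^T
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T                    # equivalent symmetric eigenproblem
    vals, U = np.linalg.eigh((M + M.T) / 2.0)  # eigenvalues in ascending order
    return C_inv.T @ U[:, :d], vals[:d]

# Toy data: 30 random 5-dimensional samples on a fully connected graph.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 30))
adjacency = 1.0 - np.eye(30)
W = heat_kernel_weights(X, adjacency, t=10.0)
P, eigvals = lpp(X, W, d=2)
Y = P.T @ X                                    # 2-dimensional embedding, shape (2, 30)
```

For face images the dimension usually exceeds the number of samples, so the matrix on the right-hand side of the eigenproblem becomes singular; a PCA step before LPP is a common remedy, which the ridge term here only approximates.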

Samples' Inner Structure Based Graph (SISG)
As discussed in Section 1, the traditional graph construction method suffers from two main issues: the stiff criterion problem and the loss of the samples' inner structure information. To overcome these limitations to some extent, in this section we first present a new approach to graph construction; graphs constructed by this approach are called samples' inner structure based graphs (SISG). We then incorporate SISG into LPP, which forms a new algorithm called SISG-LPP.

The Construction of SISG.
Given a set of $N$ sample matrices $A = \{A_1, \ldots, A_N\}$ taken from an $(m \times n)$-dimensional image space, we let $X = \{x_1, \ldots, x_N\}$, $x_i \in \mathbb{R}^{mn}$, denote the vector pattern of the image set $A$, and let $c$ ($c = 1, 2, \ldots, n$, where $n$ is the maximum column number of the sample matrix) be the column index of each sample (Algorithm 1). $A_i^c$ denotes the $c$th column of sample matrix $A_i$. SISG can be denoted by $\mathrm{SISG} = \{X, F, S, W^{\mathrm{SISG}}\}$, where $X$ corresponds to the vectors in the set $X$, $F$ is the adjacency matrix of SISG, which denotes the edge set between the sample pairs, $S$ is the sample similarity matrix of SISG, which denotes the sample similarities between sample pairs, and $W^{\mathrm{SISG}}$ is the weighted adjacency matrix, which denotes the weight values of the edges between sample pairs. There are two steps to build a SISG, as detailed below.
Step 1. For the $c$th columns of all the samples, we calculate the nearest neighbors of each column.
The meaning of (6) is as follows: for the $c$th column of the sample matrix $A_i$ ($A_i^c$), if the column similarity between this column and $A_j^c$ is greater than the mean of the column similarities between $A_i^c$ and the $c$th columns of all other samples, $A_j^c$ becomes a neighbor of $A_i^c$, and we place an edge between $A_i^c$ and $A_j^c$; that is, $F^c_{ij} = 1$. Figure 1 shows the 3 nearest column neighbors of $A_i^c$; the black rectangular boxes in the sample matrices represent the $c$th column of each sample.
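The mean-threshold rule of Step 1 can be sketched as follows. This is a hedged illustration rather than the authors' code: the definition of the column similarity is not reproduced in this section, so the sketch assumes a heat-kernel similarity between columns, and the names `column_neighbors`, `F_col`, and `t` are hypothetical.

```python
import numpy as np

def column_neighbors(A, t=1.0):
    """A has shape (N, m, n): N sample matrices with m rows and n columns.

    Returns F_col with shape (n, N, N); F_col[c, i, j] = 1 means that column c
    of sample j is a column neighbor of column c of sample i.
    """
    N, m, n = A.shape
    off_diag = ~np.eye(N, dtype=bool)
    F_col = np.zeros((n, N, N))
    for c in range(n):
        cols = A[:, :, c]                                  # shape (N, m)
        diff = cols[:, None, :] - cols[None, :, :]
        # ASSUMPTION: heat-kernel column similarity (hypothetical choice).
        sim = np.exp(-np.sum(diff ** 2, axis=2) / t)
        # Mean similarity between column c of sample i and column c of all other samples.
        row_mean = np.where(off_diag, sim, 0.0).sum(axis=1) / (N - 1)
        # Keep only the columns whose similarity exceeds that mean.
        F_col[c] = (sim > row_mean[:, None]) & off_diag
    return F_col

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4, 3))      # 6 toy "images" of size 4 x 3
F_col = column_neighbors(A, t=4.0)
```

Because each column is compared against its own mean, the number of column neighbors adapts to each sample and no neighbor parameter has to be fixed in advance.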
Step 2. Determine every sample's neighbors by the sample similarity between sample pairs, and calculate the weight value for each neighbor pair.
In this step, the original samples are transformed into their vector representations, and we denote these vectors by $X = \{x_1, \ldots, x_N\}$, $x_i \in \mathbb{R}^{mn}$. We let $S_{ij}$ denote the number of column neighbors of sample $A_j$ for $A_i$. So, $S_{ij} = \sum_{c=1}^{n} F^c_{ij}$ ($n$ is the maximum column number of a sample matrix) is the number of column neighbors of sample matrix $A_j$ for sample matrix $A_i$. Note that $S_{ij}$ is normally not equal to $S_{ji}$; this characteristic is discussed in Section 3.2. Consequently, the sample similarity of sample $x_j$ for $x_i$ is described by $S_{ij}$.
When deciding which samples can be the nearest neighbors of sample $x_i$, we only consider those samples whose sample similarity with sample $x_i$ is nonzero. This is because if there are no nearest neighbor columns between two sample matrices, these two sample matrices are not similar at all, and thus they cannot become neighbors. The weighted adjacency matrix $W^{\mathrm{SISG}}$ of the SISG is constructed according to (7), where $S_{i\cdot}$ is a vector with $S_{i\cdot} = [S_{ij}]$, $j = 1, \ldots, N-1$; that is, $S_{i\cdot}$ collects the sample similarities of all other samples for sample $x_i$. $\|\cdot\|_0$ represents the $\ell_0$-norm, which is the number of nonzero entries in a vector, so $\|S_{i\cdot}\|_0$ denotes the number of nonzero elements in the vector $S_{i\cdot}$. $\|\cdot\|_1$ represents the $\ell_1$-norm, which is the sum of the absolute values of all entries in a vector, so $\|S_{i\cdot}\|_1$ is the sum of all entries in the vector $S_{i\cdot}$.
The meaning of (7) is as follows: if the sample similarity between samples $x_i$ and $x_j$ is greater than the mean of the sample similarities between $x_i$ and all samples, then $x_j$ becomes a neighbor of $x_i$, and we put an edge between them; that is, $F_{ij} = 1$.
The weight value of the edge between $x_i$ and $x_j$ is the heat kernel multiplied by the sample similarity between samples $x_i$ and $x_j$. In this way, the greater the sample similarity between samples $x_i$ and $x_j$, the more important the corresponding edge in $W^{\mathrm{SISG}}$. The meaning of $W^{\mathrm{SISG}}_{ij}$ is as follows: the weight value between $x_i$ and $x_j$ is proportional to their sample similarity and decreases as their Euclidean distance grows.
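Step 2 can be sketched in NumPy as follows. The sketch is one possible reading of the description above, not a verified transcription of (7): it assumes that the neighbor threshold of sample $i$ is the mean of its nonzero similarities, $\|S_{i\cdot}\|_1 / \|S_{i\cdot}\|_0$, that the inequality is strict, and that the weight is $S_{ij}\exp(-\|x_i - x_j\|^2/t)$ on neighbor pairs. The names `sisg_from_column_neighbors`, `F_col`, and `t` are hypothetical.

```python
import numpy as np

def sisg_from_column_neighbors(F_col, X, t=1.0):
    """F_col: (n, N, N) column-neighbor indicators; X: (dim, N) vectorized samples.

    Returns the sample similarity S, the adjacency F, and the weights W_sisg.
    """
    S = F_col.sum(axis=0).astype(float)       # S_ij = sum over columns c of F_col[c, i, j]
    np.fill_diagonal(S, 0.0)
    nnz = np.count_nonzero(S, axis=1)         # ||S_i.||_0
    total = S.sum(axis=1)                     # ||S_i.||_1 (entries are nonnegative)
    # ASSUMED threshold: mean of the nonzero similarities in each row.
    mean_nz = np.divide(total, nnz, out=np.zeros_like(total), where=nnz > 0)
    F = (S > mean_nz[:, None]).astype(float)  # zero similarities never pass the threshold
    sq = np.sum(X ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    W_sisg = F * S * np.exp(-dist2 / t)       # heat kernel times sample similarity
    return S, F, W_sisg

# Hand-built column-neighbor indicators for N = 3 samples with n = 3 columns.
F_col = np.array([
    [[0, 1, 1], [1, 0, 1], [1, 0, 0]],
    [[0, 1, 0], [1, 0, 0], [1, 1, 0]],
    [[0, 1, 0], [1, 0, 0], [1, 0, 0]],
], dtype=float)
X = np.array([[0.0, 1.0, 2.0]])               # three 1-dimensional "vectorized" samples
S, F, W_sisg = sisg_from_column_neighbors(F_col, X, t=1.0)
# S = [[0, 3, 1], [3, 0, 1], [3, 1, 0]] and F = [[0, 1, 0], [1, 0, 0], [1, 0, 0]]
```

In this toy case sample 1 is a neighbor of sample 3 but not the other way around, so both `F` and `W_sisg` are asymmetric.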

Characteristic of SISG.
Both the weighted adjacency matrix $W^{\mathrm{SISG}}$ and the adjacency matrix $F$ are asymmetric in most situations, and the symmetric ones are only special cases. This characteristic is more reasonable for effectively capturing and fitting the relationship among samples.
For $A_i^c$, the mean of the column similarities between this column and the $c$th columns of all other samples is calculated, and the mean for $A_j^c$ is calculated in the same way.
(1) The following situations may arise when calculating the column similarity.
(1.1) $A_i^c$ and $A_j^c$ become column neighbors of each other.
(1.2) $A_i^c$ is not a column neighbor of $A_j^c$, and vice versa.
(1.3) $A_i^c$ is a column neighbor of $A_j^c$ while $A_j^c$ is not a column neighbor of $A_i^c$.
(2) The following situations may arise when calculating the sample similarity.
(2.1) $x_i$ and $x_j$ become neighbors of each other.
(2.2) $x_i$ is not a neighbor of $x_j$, and vice versa.
If (1.1) or (1.2) is met for all the columns of $A_i$ and $A_j$, then $S_{ij} = S_{ji}$ and hence $W^{\mathrm{SISG}}_{ij} = W^{\mathrm{SISG}}_{ji}$. This is because the edge weights of SISG depend on the sample similarity of each sample. In this case, both the weighted adjacency matrix $W^{\mathrm{SISG}}$ and the adjacency matrix $F$ are symmetric.
If the condition in (1.3) is met, $S_{ij}$ will not be equal to $S_{ji}$. Furthermore, if (2.1) or (2.2) can always be met for arbitrary $x_i$ and $x_j$ ($i, j = 1, \ldots, N$), the adjacency matrix $F$ will be symmetric, but $W^{\mathrm{SISG}}_{ij}$ will not be equal to $W^{\mathrm{SISG}}_{ji}$ and the weighted adjacency matrix $W^{\mathrm{SISG}}$ will be asymmetric.
Apart from the two cases discussed in the above two paragraphs, both the weighted adjacency matrix $W^{\mathrm{SISG}}$ and the adjacency matrix $F$ are normally asymmetric.
We should note that all the columns of $A_i$ ($A_i^c$, $c = 1, 2, \ldots, n$) and all the columns of $A_j$ must meet (1.1) or (1.2) at the same time, whereas (1.1) can never be met alone. Because the threshold is the mean of the column similarities, it is impossible for the column similarities between a column and all of the other columns to be greater than it. For similar reasons, (1.2), (2.1), and (2.2) can never be met alone either.
From the above discussion, we can see that both the weighted adjacency matrix $W^{\mathrm{SISG}}$ and the adjacency matrix $F$ are generally asymmetric; the symmetric ones are special cases that arise only rarely.
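A small numerical example may make the source of the asymmetry concrete. The similarity values below are invented for illustration only; the point is that even a symmetric similarity matrix yields an asymmetric neighbor relation, because each row is thresholded by its own mean.

```python
import numpy as np

# Hypothetical symmetric similarities among three columns (or samples).
sim = np.array([
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.1],
    [0.8, 0.1, 0.0],
])
N = sim.shape[0]
off_diag = ~np.eye(N, dtype=bool)

# Each row is compared against the mean of its own off-diagonal entries.
row_mean = sim.sum(axis=1) / (N - 1)            # approximately [0.85, 0.5, 0.45]
neighbors = (sim > row_mean[:, None]) & off_diag

# Item 0 is a neighbor of item 2 (0.8 > 0.45), but item 2 is not a
# neighbor of item 0 (0.8 < 0.85): this is situation (1.3).
is_symmetric = np.array_equal(neighbors, neighbors.T)
```

Here `neighbors[2, 0]` is true while `neighbors[0, 2]` is false, so `is_symmetric` evaluates to false.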

SISG-LPP.
The construction method of SISG is very general, so SISG can be used in many graph-based dimensionality reduction algorithms. In this subsection, we use SISG in a state-of-the-art unsupervised DR algorithm, locality preserving projection (LPP), to develop a new DR algorithm called SISG-LPP.
Similar to LPP, the goal of SISG-LPP is to preserve the local manifold structure of the high-dimensional space. Given a set of $N$ samples $X = \{x_1, \ldots, x_N\}$, $x_i \in \mathbb{R}^{mn}$, in the high-dimensional space, we try to find a transformation matrix that maps these $N$ points to a set of points $y_1, \ldots, y_N$ in a low-dimensional space. Assuming that the projection is $y_i = w^T x_i$, where $w$ is the projection vector, the objective function of SISG-LPP is
$$\min_{w} \sum_{i,j} \left(w^T x_i - w^T x_j\right)^2 W^{\mathrm{SISG}}_{ij} = \min_{w} w^T X L X^T w.$$
In the above, $W^{\mathrm{SISG}}$ is an asymmetric matrix, but $W = W^{\mathrm{SISG}} + (W^{\mathrm{SISG}})^T$ is a symmetric matrix. Let $D'$ and $D''$ denote diagonal matrices; the entries of $D'$ are the column sums of $W^{\mathrm{SISG}}$, and the entries of $D''$ are the column sums of $(W^{\mathrm{SISG}})^T$. $D = D' + D''$ is the diagonal matrix whose entries are the column sums of $W$. So, $L = D - W$ is the Laplacian matrix. Because $D$ provides a measure of the "importance" of the data points, we impose the following constraint:
$$w^T X D X^T w = 1.$$
Thus, the optimization objective of SISG-LPP is
$$\arg\min_{w^T X D X^T w = 1} w^T X L X^T w. \qquad (18)$$
The solutions of (18) can be obtained by solving the generalized eigenvalue decomposition problem [21]. That is to say, the projection vectors of (18) are the eigenvectors that correspond to the first $d$ smallest eigenvalues of $X L X^T w = \lambda X D X^T w$ [22].
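The symmetrization step and the resulting eigenproblem can be checked numerically. The following NumPy sketch is an illustration under stated assumptions, not the authors' implementation: the function name `sisg_lpp`, the random asymmetric weight matrix, and the small ridge term are hypothetical, and samples are stored as columns of `X`.

```python
import numpy as np

def sisg_lpp(X, W_sisg, d, ridge=1e-8):
    """Solve X L X^T w = lambda X D X^T w for an asymmetric weight matrix W_sisg."""
    W = W_sisg + W_sisg.T                      # symmetric matrix W
    D = np.diag(W.sum(axis=0))                 # D = D' + D'' (column sums of W)
    L = D - W                                  # Laplacian matrix
    A = X @ L @ X.T
    # ASSUMPTION of this sketch: a tiny ridge keeps B positive definite.
    B = X @ D @ X.T + ridge * np.eye(X.shape[0])
    C = np.linalg.cholesky(B)                  # B = C C^T
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    vals, U = np.linalg.eigh((M + M.T) / 2.0)  # eigenvalues in ascending order
    return C_inv.T @ U[:, :d], L

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 12))               # 12 toy samples of dimension 4
W_sisg = rng.random((12, 12))                  # hypothetical asymmetric weights
np.fill_diagonal(W_sisg, 0.0)
P, L = sisg_lpp(X, W_sisg, d=2)

# Check that the weighted sum of squared differences equals y^T L y.
y = P[:, 0] @ X
lhs = np.sum((y[:, None] - y[None, :]) ** 2 * W_sisg)
rhs = y @ L @ y
```

The two quantities `lhs` and `rhs` agree up to rounding, which is why the asymmetric weights can be handled with a standard symmetric eigen-solver.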

Experiments
To illustrate the construction process and the properties of SISG intuitively, we designed an experiment that shows how the structure of SISG changes during construction. The experiment was also designed to show the differences between SISG and the $k$-nearest neighbor graph.
To investigate the influence of parameters on the classification performance of learning algorithms, we designed an experiment that shows the sensitivity of LPP to the neighbor parameter $k$.
To test and evaluate the effectiveness of SISG and SISG-LPP, we conducted a series of face recognition experiments on three well-known databases.

Experiment for the Structure of SISG.
In this experiment, we demonstrate the structure changes of SISG during the construction process. By comparing the structure of the nearest neighbor graph with that of SISG, we illustrate the differences between them. This experiment was conducted on the well-known ORL database [23]. The ORL database contains 400 images from 40 different persons (ten for each person). All images are gray scale and the size of each image is 112 × 92 pixels. Figure 6(a) shows 10 sample face images for one person in ORL.
First, we design a dataset containing ten images selected from the ORL database. Six of them belong to the same person (in our dataset, images 2, 3, 4, 6, 7, and 9 belong to the same person) and the rest were selected at random. Then, we visualize the sample similarity matrix ($S$) of SISG, the adjacency matrix ($F$) of SISG, and the traditional $k$-nearest neighbor graph for the dataset, as shown in Figures 2, 3(a), and 3(b). Finally, we compare the adjacency matrix of SISG and the $k$-nearest neighbor graph.
In Figure 2, the numbers without parentheses display the sample similarity between sample pairs. Sample similarity is described by the number of column neighbors between sample pairs. The value in Row 2 and Column 6 of Figure 2 is 16, and the value in Row 6 and Column 2 is 19, which means, respectively, that 16 columns of Image $A_6$ become column neighbors of the corresponding columns of Image $A_2$ and that 19 columns of Image $A_2$ become column neighbors of the corresponding columns of Image $A_6$.
Comparing Figure 3(a) (the adjacency matrix of SISG) with Figure 3(b) (the $k$-nearest neighbor graph), we observe the following.
(1) The $k$-nearest neighbor graph is symmetric while the SISG is asymmetric. For example, we can observe from Figure 3(a) that Image $A_9$ is a neighbor of Image $A_5$ but Image $A_5$ is not a neighbor of Image $A_9$.
(2) SISG can more accurately reflect the relationship between samples. From Figure 3(a) we can see that images which became neighbors generally belong to the same person. For example, from the second row of Figure 3(a) we can see that Images $A_3$, $A_4$, $A_6$, $A_7$, and $A_9$ are neighbors of Image $A_2$, and from the third row of Figure 3(a) we can also see that Images $A_2$, $A_4$, and $A_7$ are neighbors of $A_3$. In our dataset, Images $A_2$, $A_3$, $A_4$, $A_6$, $A_7$, and $A_9$ belong to the same person, so this example shows that the SISG makes similar samples become neighbors.
(3) The $k$-nearest neighbor graph does not reflect the relationship between samples as successfully. We again take Image $A_2$ as an example; from the second row of Figure 3(b) we can observe that Images $A_3$, $A_6$,

Face Manifold Visualization.
In this experiment, we compare the visualization effect of SISG-LPP, LPP, PCA, and NPE. We randomly selected 4 people from the ORL database and 10 samples from each person. Then we mapped all these samples to a 2-dimensional subspace using these algorithms; the results are shown in Figure 4. From Figure 5 we can see that LPP is very sensitive to the neighbor parameter $k$. In contrast, SISG-LPP does not have the neighbor parameter $k$, so it is much less sensitive to parameter selection than LPP.

Face Recognition.
To evaluate the proposed SISG and the SISG-LPP algorithm, we compare the performance of SISG-LPP with that of LPP on three face databases. Here, we adopt the benchmark face databases ORL [23], YALE [24], and a subset of CMU PIE [25]. First, for each person we select $p$ ($p = 5, 6$) images from the ORL database and the YALE database, respectively. Four divisions are considered: $G_5/P_5$ and $G_6/P_4$ on the ORL database as well as $G_5/P_6$ and $G_6/P_5$ on the YALE database; 50 random splits are generated and the final results of the four divisions are obtained by taking the mean of the 50 classification accuracy values. The accuracy values versus the numbers of reduced dimensions are shown in Figure 7.
From Figures 7(a) and 7(b) we can see that SISG-LPP outperforms LPP for all divisions, and from Figure 7(b) we can observe that SISG-LPP significantly outperforms LPP when the number of reduced dimensions is relatively low. From Figures 7(c) and 7(d) we can see that SISG-LPP outperforms LPP in most situations.
The mean accuracy values of LPP and SISG-LPP on ORL, YALE, and the Illumination subset of the PIE database are listed in Tables 1-3, respectively.
From the results shown in Tables 1-3, one can observe the following.
(1) The overall accuracy values of all algorithms improve to varying degrees when the number of training samples is increased.
(2) From the results shown in Tables 1 and 2, one can see that the recognition accuracy of SISG-LPP is much higher than that of LPP when the number of training samples is relatively small. For instance, the accuracy of SISG-LPP on the division $G_3/P_7$ of the ORL database is 15.69% higher than that of LPP, while the accuracy of SISG-LPP on the division $G_8/P_2$ is only 1.33% higher than that of LPP.
(3) SISG-LPP outperforms LPP in all $G_p/P_q$ divisions on the ORL and YALE databases. SISG-LPP outperforms LPP for most of the $G_p/P_q$ divisions on the Illumination subset of the PIE database.

Conclusions
In this paper, we present a new graph construction method, and we name the graph constructed by this method the samples' inner structure based graph (SISG). Unlike the traditional graph construction method, SISG avoids predefining the neighbor parameter $k$ (or $\epsilon$). Moreover, SISG preserves the intrinsic features of samples well by using the samples' inner structure information to construct the graph. Both the weighted adjacency matrix and the adjacency matrix of SISG are generally asymmetric, which may be more reasonable for capturing the relationships among samples. To show that the construction method of SISG is very general, we incorporated it into a state-of-the-art DR algorithm, locality preserving projection (LPP), and thus developed a novel DR algorithm, SISG-LPP. Finally, several experiments were conducted on three well-known face databases. The experimental results verify the effectiveness and feasibility of the SISG and SISG-LPP algorithms.

Figure 1: The 3 nearest neighbors of the $c$th column of sample $A_i$.

Figure 2: The sample similarity matrix of SISG.

Figure 5: Influence of the parameter $k$ on the classification performance of LPP.
From Figure 4(a) we can see that SISG-LPP separates the 4-class samples more effectively in its 2-dimensional reduced subspace. In contrast, in the subspaces of LPP (Figure 4(b)) and NPE (Figure 4(c)), the samples are not very well separated, and more than half of the samples overlap. From Figure 4(d) we can see that in the subspace of PCA the samples are basically entangled together.

Figure 7: The recognition accuracy values of SISG-LPP and LPP versus the dimensions. (a) $G_5/P_5$ of the ORL database, (b) $G_6/P_4$ of the ORL database, (c) $G_5/P_6$ of the YALE database, and (d) $G_6/P_5$ of the YALE database.
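$\epsilon$-Neighborhood: $x_i$ and $x_j$ are connected if $\|x_i - x_j\| < \epsilon$, where $\|\cdot\|$ is the Euclidean distance in $\mathbb{R}^{mn}$ and $\epsilon$ is a local threshold parameter.
$k$-Nearest neighbor: $x_i$ and $x_j$ are connected if $x_i$ is one of the $k$-nearest neighbors of $x_j$ or $x_j$ is one of the $k$-nearest neighbors of $x_i$.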
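The two traditional edge-construction rules can be written down in a few lines of NumPy. This is a minimal sketch with hypothetical function names (`pairwise_sq_dists`, `epsilon_graph`, `knn_graph`) and a toy data set; samples are stored as columns of `X`.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between the columns of X (shape (dim, N))."""
    sq = np.sum(X ** 2, axis=0)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)

def epsilon_graph(X, eps):
    """Connect x_i and x_j if ||x_i - x_j|| < eps."""
    adj = pairwise_sq_dists(X) < eps ** 2
    np.fill_diagonal(adj, False)               # no self-loops
    return adj.astype(float)

def knn_graph(X, k):
    """Connect x_i and x_j if either one is among the k nearest neighbors of the other."""
    d2 = pairwise_sq_dists(X)
    np.fill_diagonal(d2, np.inf)               # a sample is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]        # k nearest neighbors of each sample
    N = X.shape[1]
    directed = np.zeros((N, N))
    directed[np.repeat(np.arange(N), k), idx.ravel()] = 1.0
    return np.maximum(directed, directed.T)    # the "or" rule makes the graph symmetric

# Four 1-dimensional samples located at 0, 1, 3, and 7.
X = np.array([[0.0, 1.0, 3.0, 7.0]])
knn_adj = knn_graph(X, k=1)
eps_adj = epsilon_graph(X, eps=2.5)
```

With `k=1` the isolated sample at 7 is still linked to its nearest neighbor at 3, whereas with `eps=2.5` it stays disconnected. Both rules use one global parameter for all samples, which is the stiff criterion discussed in the Introduction.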
4.3. Parameter Sensitivity of LPP. Since SISG does not have the neighbor parameter $k$, in this experiment we only investigate the sensitivity of LPP to the neighbor parameter $k$. During this experiment, the images selected from the face databases were partitioned into different sample collections. We use $G_p/P_q$ to indicate that, for each person in the face database, $p$ images were selected at random for training and the remaining $q$ images were employed for testing. We conducted this experiment on $G_6/P_4$ in the ORL database.
The ORL, YALE, and CMU PIE databases are used to conduct the experiments on face recognition. There are faces of 15 people in the YALE database, and each person has 11 face images of size 100 × 100. The CMU PIE database contains 41,368 images of 68 people; PIE stands for Pose, Illumination, and Expression. The size of the images in the PIE database is 217 × 178. In this research, we use the Illumination subset of the CMU PIE database and select 16 different images per person from it. Figures 6(a), 6(b), and 6(c) show part of the face images in ORL, YALE, and the Illumination subset of PIE, respectively. As described above, we use $G_p/P_q$ to indicate that $p$ images of each person are randomly selected as the training data and the remaining $q$ images are used for testing. For each $G_p/P_q$ division, 50 random splits are generated, and the final performance of the algorithm being tested is obtained by averaging the 50 classification accuracy values. The neighbor parameter $k$ for LPP is set to $p - 1$.
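The random-split protocol can be sketched as follows. This is only an illustration under stated assumptions: the classifier is not specified in this section, so a 1-nearest-neighbor classifier is assumed, the synthetic features stand in for the projected face images, and the names `split_gp_pq` and `nn_accuracy` are hypothetical.

```python
import numpy as np

def split_gp_pq(labels, p, rng):
    """Pick p random images per person for training; the rest are used for testing."""
    train_idx, test_idx = [], []
    for person in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == person))
        train_idx.extend(idx[:p])
        test_idx.extend(idx[p:])
    return np.array(train_idx), np.array(test_idx)

def nn_accuracy(train_X, train_y, test_X, test_y):
    """Accuracy of a 1-nearest-neighbor classifier (ASSUMED classifier)."""
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    pred = train_y[np.argmin(d2, axis=1)]
    return float(np.mean(pred == test_y))

# Synthetic stand-in data: 4 persons with 10 well-separated samples each.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 10)
features = 10.0 * labels[:, None] + 0.1 * rng.standard_normal((40, 5))

accuracies = []
for _ in range(50):                            # 50 random splits, 6 training / 4 test per person
    tr, te = split_gp_pq(labels, p=6, rng=rng)
    accuracies.append(nn_accuracy(features[tr], labels[tr], features[te], labels[te]))
mean_accuracy = float(np.mean(accuracies))
```

In an actual experiment the projection would be learned from the training images of each split and applied to both parts before classification.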

Table 1: Mean accuracy values of LPP and SISG-LPP on the ORL database. The numbers in parentheses are the corresponding feature dimensions with the best results after dimensionality reduction.