Subspace Learning via Local Probability Distribution for Hyperspectral Image Classification

The computational procedure for hyperspectral images (HSI) is extremely complex, not only due to the high dimensional information but also due to the highly correlated data structure. The need for effective processing and analysis of HSI has met many difficulties. Dimensionality reduction has proven to be a powerful tool for high dimensional data analysis. Local Fisher's linear discriminant analysis (LFDA) is an effective method for HSI processing. In this paper, a novel approach, called PD-LFDA, is proposed to overcome the weakness of LFDA. PD-LFDA emphasizes the probability distribution (PD) in LFDA, where the maximum distance is replaced with the local variance in the construction of the weight matrix and the class prior probability is applied to compute the affinity matrix. The proposed approach increases the discriminant ability of the transformed features in the low dimensional space. Experimental results on the Indian Pines 1992 data set indicate that the proposed approach significantly outperforms traditional alternatives.


Introduction
With the rapid technological advancement of remote sensing, the technology of high dimensional data analysis has been pushed forward. With the great demand for automatic processing of remote sensing data in very high dimensional space, a series of analytical methods and applicable toolkits have emerged one after another. Hyperspectral images (HSI) typically have hundreds or even thousands of electromagnetic spectral bands for each pixel, and these bands are often highly correlated. To make full use of the rich spectrum and to enable effective processing of HSI data, it is often critical to extract useful features, preventing the negative effects caused by redundant data. Dimensionality reduction is an efficient technique to eliminate the redundancy among data samples. Dimensionality reduction also eliminates the effects brought by uncorrelated features and simultaneously "selects" or "extracts" the features that are beneficial to precise classification. To be specific, the aim of dimensionality reduction is to decrease computational complexity and ameliorate statistical ill-conditioning by discarding redundant features that potentially deteriorate classification performance [1].
Nevertheless, how to suppress the redundancy and preserve the most valuable features still remains an open topic in the community of high dimensional data analysis.
The common drawback of nonlinear embedding methods is that these techniques are too computationally expensive for HSI data when the sample size becomes large. For instance, Isomap employs the geodesic distance to measure the distance between data samples rather than the Euclidean distance, that is, the classical straight-line distance. However, the theory of Isomap is established on the basis of the training samples and is excessively reliant on the assumption of a manifold-like distribution. Meanwhile, the mapping found by Isomap is implicit. For new data points, the geodesic distance has to be recomputed on the new training set to obtain the low dimensional embedding; there is no exact computational expression for new data points in Isomap. It is clear that such computation is explicitly complex and inapplicable to the large capacity of HSI data. For this reason, Isomap is impractical for the dimensionality reduction of HSI data. A similar drawback occurs in the construction of LLE [7]. Recent interest in discovering the intrinsic manifold of the data structure has become a trend, and the theory is still under development [9], yet some achievements have been gained and reported in many research articles [10].
In contrast, linear approaches are efficient in dealing with this issue [11,12]. PCA, an unsupervised approach, finds the global scatter as the best projected direction with the aim of minimizing the least square error of the reconstructed data points [13]. Due to its "unsupervised" nature, the learning procedure is often blind, and the projected direction found by PCA is usually not the optimal direction [14]. LDA is a supervised methodology that absorbs the advantage of purposeful learning [15]. Toward that goal, LDA seeks the direction that minimizes the classification error. However, the within-class scatter matrix of LDA is often singular when it is applied to a small number of samples [16]. Consequently, the optimal solution of LDA cannot be solved, and the projected direction fails to be achieved. These drawbacks limit the wide adoption of LDA [4]. To cope with this issue, derived discriminant analyses, which put additional constraints on the objective function [17], were proposed in several research papers [18][19][20], for example, Joint Global and Local Discriminant Analysis (JGLDA) [21]. The common scheme of these methods is that they are easy to compute and implement, and the mapping is explicit. They have shown efficiency in most cases despite their simple models.
In general, linear algorithms have more advantages in the dimensionality reduction of HSI data. As a matter of fact, conventional linear approaches, such as PCA, LDA, and LPP, make the assumption that the distributions of the data samples are Gaussian or mixed Gaussian. However, this assumption often fails [22] since the distribution of real HSI data tends to be multimodal instead of unimodal. To be specific, the distribution of HSI data is usually unknown [23], and a single Gaussian model or Gaussian mixture model cannot capture the distribution of all landmarks of the HSI data since the landmarks from different classes are multimodal [24]. In this case, the conventional methodologies work poorly. In view of this, some methods extend the idea of LDA and formulate extended LDA algorithms; for example, Sugiyama [25] proposed Local Fisher Linear Discriminant Analysis (LFDA) for multimodal clusters. LFDA incorporates the supervised nature of LDA with the local description of LPP, and the optimal projection is then obtained under the constraint of multimodal samples. Li et al. [1] applied LFDA with maximum likelihood estimation (MLE), support vector machine (SVM), and Gaussian mixture model (GMM) classifiers to HSI data. As reported in their paper, LFDA is superior not only in computational time but also in classification accuracy. In a word, LFDA is especially appropriate for the landmark classification of HSI data. Nevertheless, conventional LFDA ignores the distribution of the data samples in the construction procedure of the affinity matrix.
In LFDA, the computation of the affinity matrix is important. Note that there are clearly many different ways to define an affinity matrix, but the heat kernel derived from LPP has been shown to result in very effective locality preserving properties [26]. In this way, the local scaling of the data samples in the $k$-nearest neighborhood is utilized, where $k$ is a self-tuning predefined parameter. To simplify the calculation procedure of the parameters, [1,27,28] employ a fixed value of $k = 7$ for experiments. Note that such a calculation may ignore the distribution of the data samples in the construction procedure of the affinity matrix. Actually, simplifying the local distribution by the distance between a sample and its $k$th nearest neighbor may be unreasonable, and the results obtained using this simplification may contain errors.
Thus, in this paper, a novel approach is proposed to overcome the weakness of conventional LFDA. By adopting the local variance of the local patch instead of the farthest distance for the weight matrix, and the class prior probability for the affinity matrix, the weight matrix of the proposed algorithm takes into account both the distribution of the HSI data samples and the objective function of the HSI data after dimension reduction. This novel approach is called PD-LFDA because the probability distribution (PD) is used in the LFDA algorithm. To be specific, PD-LFDA incorporates two key points, namely:
(1) The class prior probability is applied to compute the affinity matrix.
(2) The distribution of local patch is represented by the "local variance" instead of "farthest distance" to construct the weight matrix.
The proposed approach essentially increases the discriminant ability of transformed features in low dimensional space.
The pattern found by PD-LFDA is expected to be more accurate, to coincide with the character of HSI data, and to be conducive to classifying HSI data. The rest of this paper is organized as follows. The most basic concepts of the conventional linear approaches related to our work are introduced in Section 2; precisely, Fisher's linear discriminant analysis (LDA) and locality preserving projection (LPP) as well as local Fisher discriminant analysis (LFDA) are presented. The proposed algorithm is developed and formalized in Section 3, which is the core of this paper. The experimental results with comparisons on a real HSI dataset are provided in Section 4. Finally, we conclude our work in Section 5.

Related Work
The purpose of linear approaches is to find an optimal projected direction where the information of the embedded features is preserved as much as possible. To formulate our problem, let $x_i$ be a $d$-dimensional feature vector in the original space and let $\{x_1, x_2, \ldots, x_n\}$ be the $n$ samples. For the case of supervised learning, let $y_i$ be the label of $x_i$, and then the label set of all samples can be represented by the notation $\{y_1, y_2, \ldots, y_n\}$. Suppose that there are $C$ classes in all, and the sample number of the $c$th class is $n_c$, which fulfils the condition $n = \sum_{c=1}^{C} n_c$; that is, the number of all samples is the total sum over each class. Let $x_i^{(c)}$ be the $i$th sample of the $c$th class. Then, the corresponding class mean becomes $\mu_c = (1/n_c)\sum_{i=1}^{n_c} x_i^{(c)}$, while the data center of all samples is denoted by $\mu = (1/n)\sum_{i=1}^{n} x_i$. Suppose that the data set $X$ in the $d$-dimensional hyperspace is distributed on a low $r$-dimensional subspace. A general problem of linear discriminant analysis is to find a transformation $T \in \mathbb{R}^{d \times r}$ that maps the $d$-dimensional data into low $r$-dimensional subspace data by $y_i = T^T x_i$ such that each $y_i$ represents $x_i$ without losing useful information. The transformation matrix $T$ is pursued by different methods with different objective functions, resulting in different algorithms.

Fisher's Linear Discriminant Analysis (LDA).
LDA introduces the within-scatter matrix $S_w$ and between-scatter matrix $S_b$ to describe the distribution of the data samples:
$$S_w = \sum_{c=1}^{C}\sum_{i=1}^{n_c}\left(x_i^{(c)} - \mu_c\right)\left(x_i^{(c)} - \mu_c\right)^T, \quad (1)$$
$$S_b = \sum_{c=1}^{C} n_c\left(\mu_c - \mu\right)\left(\mu_c - \mu\right)^T. \quad (2)$$
The Fisher criterion seeks a transformation $T$ that maximizes the between-class scatter while minimizing the within-class scatter. This can be achieved by optimizing the following objective function:
$$T_{\mathrm{LDA}} = \arg\max_T \frac{\det\left(T^T S_b T\right)}{\det\left(T^T S_w T\right)}. \quad (3)$$
It is implicitly assumed that $T^T S_w T$ is full rank. Under this assumption, the problem can be attributed to finding the generalized eigenvectors $\{\phi_1, \phi_2, \ldots, \phi_r\}$ by solving
$$S_b \phi = \lambda S_w \phi. \quad (4)$$
Finally, the solution is given by $T_{\mathrm{LDA}} = [\phi_1, \phi_2, \ldots, \phi_r]$, whose columns are associated with the first $r$ largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r$. Since the rank of the between-class scatter matrix $S_b$ is at most $C - 1$, there are at most $C - 1$ meaningful features in conventional LDA. To deal with the frequent singularity of $S_w$, a regularization procedure is essential in practice.
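To make the procedure concrete, the following Python sketch computes $S_w$ and $S_b$ and solves the generalized eigenproblem (4). The small ridge added to $S_w$ stands in for the regularization mentioned above; function and variable names are ours, not the paper's.

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, y, r):
    """Sketch of classical LDA. X: d x n data matrix; y: length-n labels."""
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mu_c = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mu_c) @ (Xc - mu_c).T                 # within-class scatter, eq. (1)
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T   # between-class scatter, eq. (2)
    Sw += 1e-6 * np.eye(d)                # ridge regularization for a singular Sw
    evals, evecs = eigh(Sb, Sw)           # generalized eigenproblem, eq. (4), ascending
    return evecs[:, ::-1][:, :r]          # top-r eigenvectors form T_LDA
```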

Locality Preserving Projection (LPP).
A drawback of LDA is that it does not consider the local structure among the data points [29], while the distribution of real HSI data is often multimodal. Locality preserving projection meets this requirement [30]. The goal of LPP is to preserve the local structure of neighboring points. Toward this goal, a graph is modeled explicitly to describe the relationships using the $k$-nearest neighborhood. Let $A$ denote the affinity matrix, where $A(i,j) \in [0,1]$ represents the similarity between points $x_i$ and $x_j$. The larger the value of $A(i,j)$, the closer the relationship between $x_i$ and $x_j$. A simple and effective way to define the affinity matrix $A$ is given by
$$A(i,j) = \begin{cases} \exp\left(-\dfrac{\|x_i - x_j\|^2}{t}\right), & x_j \in \mathrm{KNN}(x_i, k), \\ 0, & \text{otherwise}, \end{cases} \quad (5)$$
where $\|\cdot\|^2$ denotes the squared Euclidean ($\ell_2$) distance, $t$ is a tuning parameter, and $\mathrm{KNN}(x_i, k)$ represents the $k$ nearest neighborhoods of $x_i$ under parameter $k$.
The transformation matrix of LPP is obtained from the following criterion [31]:
$$T_{\mathrm{LPP}} = \arg\min_T \sum_{i,j}\left\|T^T x_i - T^T x_j\right\|^2 A(i,j) \quad \text{s.t. } T^T X D X^T T = I, \quad (6)$$
where $D = \mathrm{diag}(d_i)$ is a diagonal matrix whose entries are the column sums (equivalently the row sums, since $A$ is symmetric) of $A$; that is, $D_{i,i} = \sum_j A(i,j)$. Arbitrary scaling and degenerate solutions are ruled out by the constraint in (6).
The solution of the LPP problem can be gained by solving the eigenvector problem
$$X L X^T \phi = \lambda X D X^T \phi, \quad (7)$$
where $L \equiv D - A$ denotes the graph Laplacian matrix in the community of spectral analysis and can be viewed as the discrete version of the Laplace-Beltrami operator on a compact Riemannian manifold [29]. Finally, the transformation matrix is given by the eigenvectors associated with the $r$ smallest eigenvalues:
$$T_{\mathrm{LPP}} = [\phi_1, \phi_2, \ldots, \phi_r]. \quad (8)$$
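As a reference point, here is a minimal LPP sketch following (5)-(7): a $k$NN heat-kernel affinity, the graph Laplacian, and the generalized eigenproblem solved for the smallest eigenvalues. The symmetrization step and the small ridge on $XDX^T$ are our own numerical conveniences.

```python
import numpy as np
from scipy.linalg import eigh

def lpp_transform(X, r, k=7, t=1.0):
    """Sketch of LPP. X: d x n; k, t: neighborhood size and heat-kernel width."""
    d, n = X.shape
    sq = np.sum(X**2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X.T @ X, 0.0)
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist2[i])[1:k + 1]     # skip the point itself
        A[i, nbrs] = np.exp(-dist2[i, nbrs] / t) # heat kernel, eq. (5)
    A = np.maximum(A, A.T)                       # symmetrize the graph
    D = np.diag(A.sum(axis=1))
    L = D - A                                    # graph Laplacian
    M1, M2 = X @ L @ X.T, X @ D @ X.T
    evals, evecs = eigh(M1, M2 + 1e-6 * np.eye(d))  # eq. (7), ascending order
    return evecs[:, :r]                          # r smallest eigenvalues wanted
```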

Local Fisher Discriminant Analysis (LFDA).
Local Fisher discriminant analysis (LFDA) [32] measures the "weights" of two data points by the corresponding distance, and then the affinity matrix is calculated by these weights. Note that the "pairwise" representation of the within-scatter matrix and between-scatter matrix is very important for LFDA. Following simple algebraic steps, the within-scatter matrix (1) of LDA can be transformed into the following form:
$$S_w = \frac{1}{2}\sum_{i,j=1}^{n} P_w(i,j)(x_i - x_j)(x_i - x_j)^T, \quad (9)$$
where
$$P_w(i,j) = \begin{cases} 1/n_c, & y_i = y_j = c, \\ 0, & y_i \neq y_j. \end{cases} \quad (10)$$
Let $S_m$ be the total mixture matrix of LDA, and then we gain
$$S_m = S_w + S_b = \frac{1}{2}\sum_{i,j=1}^{n} P_m(i,j)(x_i - x_j)(x_i - x_j)^T, \quad (11)$$
where
$$P_m(i,j) = \frac{1}{n}. \quad (12)$$
LFDA is achieved by weighting the pairwise data points:
$$\tilde{P}_w(i,j) = \begin{cases} A_{ij}/n_c, & y_i = y_j = c, \\ 0, & y_i \neq y_j, \end{cases} \quad (13)$$
$$\tilde{P}_b(i,j) = \begin{cases} A_{ij}\left(1/n - 1/n_c\right), & y_i = y_j = c, \\ 1/n, & y_i \neq y_j, \end{cases} \quad (14)$$
where $\tilde{P}_w(i,j)$ and $\tilde{P}_b(i,j)$ denote the weight matrices of the pairwise points for the within-class samples and between-class samples, respectively, and $A$ indicates the affinity matrix. The construction of $A$ is critical for the classification accuracy; thereby, this construction needs to be elaborated further, which is done in the following section.
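The pairwise weights (13)-(14) translate directly into code. The sketch below assumes an affinity matrix `A` has already been computed; the function name is ours.

```python
import numpy as np

def lfda_weights(A, y):
    """Sketch of the pairwise LFDA weights (13)-(14). A: n x n affinity; y: labels."""
    n = len(y)
    Pw = np.zeros((n, n))
    Pb = np.full((n, n), 1.0 / n)        # different-class pairs get 1/n
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        n_c = len(idx)
        block = np.ix_(idx, idx)
        Pw[block] = A[block] / n_c                        # eq. (13)
        Pb[block] = A[block] * (1.0 / n - 1.0 / n_c)      # eq. (14), same-class pairs
    return Pw, Pb
```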

Proposed Scheme
The calculation of (13) and (14) is very important to the performance of LFDA. There are many methods to compute the affinity matrix $A$. The simplest one sets $A$ equal to a constant; that is,
$$A_{ij} = \alpha, \quad (15)$$
where $\alpha$ in the above equation is a real nonnegative number. Under this construction, however, (13) and (14) reduce to conventional Fisher's linear discriminant analysis.
Another construction adopts the heat kernel derived from LPP:
$$A_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{t}\right), \quad (16)$$
where $t$ is a tuning parameter. Yet the affinity is valued only by the distance between data points, and the computation is too simple to represent the locality of the data patches. A more adaptive version [26] of (16) is proposed as follows:
$$A_{ij} = \begin{cases} \exp\left(-\dfrac{\|x_i - x_j\|^2}{t}\right), & x_j \in \mathrm{KNN}(x_i, k), \\ 0, & \text{otherwise}. \end{cases} \quad (17)$$
Compared with the former computation, (17) works in conjunction with the $k$-nearest data points, which is computationally fast and light. Moreover, the property of the local patches can be characterized by (17). However, the affinities defined in (16) and (17) are globally computed; thus, they may be apt to overfit the training points and be sensitive to noise. Furthermore, the density of HSI data points may vary across different patches. Hence, a local scaling technique is proposed in LFDA to cope with this issue [29], where the more sophisticated computation is given by
$$A_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma_i \sigma_j}\right), \quad (18)$$
where $\sigma_i$ denotes the local scaling around the corresponding sample $x_i$ with the following definition:
$$\sigma_i = \left\|x_i - x_i^{(k)}\right\|^2, \quad (19)$$
where $x_i^{(k)}$ represents the $k$th nearest neighbor of $x_i$, $\|\cdot\|^2$ denotes the squared Euclidean distance, and $k$ is a self-tuning predefined parameter.
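A sketch of the local scaling affinity (18)-(19), keeping the paper's convention that $\sigma_i$ is the squared distance to the $k$th nearest neighbor; the epsilon guard is our addition.

```python
import numpy as np

def local_scaling_affinity(X, k=7):
    """Sketch of (18)-(19): sigma_i is the squared distance from x_i to its
    k-th nearest neighbor; A_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j))."""
    sq = np.sum(X**2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X.T @ X, 0.0)
    sigma = np.sort(dist2, axis=1)[:, k]   # column 0 is the point itself, eq. (19)
    return np.exp(-dist2 / (np.outer(sigma, sigma) + 1e-12))
```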
To simplify the calculation, many works consider a fixed value of $k$, and a recommended value of $k = 7$ is studied in [1,28]. Note that $\sigma_i$ is used to represent the distribution of the local data around sample $x_i$. However, the above works ignore the distribution around each individual sample. The diversity of adjacent HSI pixels is small; thus, the spectra of neighboring landmarks have great similarity. That is, pixels of HSI data that have resembling spectra tend to belong to the same landmark. This phenomenon indicates that the adjacency of local patches lies not only in the spectral space but also in the spatial space. For a local point, a calculation that makes use only of the distance to its $k$th nearest neighbor therefore does not fully capture the local distribution.
An evident example is illustrated in Figure 1, where two groups of points have different distributions. In group (a), most neighboring points are close to point $x_0$, while in group (b) most neighboring points are far from point $x_0$. However, the measurements of the two cases are the same according to (19). This can be seen in Figure 1, where the distances between point $x_0$ and its $k$th nearest neighbor ($k = 7$) are the same in both distributions, as shown in Figures 1(a) and 1(b): $d_1 = d_2$. This example indicates that simplifying the local distribution by the distance between the sample $x_i$ and its $k$th nearest neighbor is unreasonable. Actually, the results obtained using this simplification may contain errors.
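The effect can be reproduced with a toy computation. The two 1-D neighborhoods below are invented to mimic Figure 1: they share the same $k$th-neighbor distance (the $\sigma$ of (19)), yet their local spreads clearly differ.

```python
import numpy as np

# Two 1-D neighborhoods of x0 = 0 with the same farthest (k-th) neighbor distance,
# but different spreads; sigma from (19) cannot tell them apart, the local
# standard deviation can.
group_a = np.array([0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 1.0])  # most neighbors close to x0
group_b = np.array([0.7, 0.8, 0.8, 0.9, 0.9, 1.0, 1.0])  # most neighbors far from x0
for g in (group_a, group_b):
    d2 = g**2                      # squared distances to x0
    print(d2.max(), d2.std())      # same max (sigma), different std (local spread)
```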
Based on the discussion above, a novel approach, which is called PD-LFDA, is proposed to overcome the weakness of LFDA. To be specific, PD-LFDA incorporates two key points, namely:
(1) The class prior probability is applied to compute the affinity matrix.
(2) The distribution of local patch is represented by the "local variance" instead of the "farthest distance" to construct the weight matrix.
The proposed approach essentially increases the discriminant ability of the transformed features in the low dimensional space. The pattern found by PD-LFDA is expected to be more accurate, to coincide with the character of HSI data, and to be conducive to classifying HSI data.
In this way, a more sophisticated construction of the affinity matrix, which is derived from [29], is proposed as follows:
$$\tilde{A}_{ij} = P(c_i)^2 \exp\left(-\frac{\|x_i - x_j\|^2}{\rho_i \rho_j}\right)\left(1 + \exp\left(-\frac{\|x_i - x_j\|^2}{\rho_i \rho_j}\right)\right), \quad y_i = y_j = c_i, \quad (20)$$
where $P(c_i)$ stands for the class prior probability of class $c_i$ and $\rho_i$ indicates the local variance. Note that the denominator item of (13) is $1/n_c$, which would cancel out our prior effect if we used $P(c_i)$ instead of $P(c_i)^2$ (the construction of $P(c_i)$ will be given in (21)). Each part of this construction plays the same role as in the original formulation; for example, for the last item, on one hand, it plays the role of an intraclass discriminating weight and, on the other hand, the product of the first two items may numerically approach zero for some data points. For this case, the extra item $(1 + \exp(-\|x_i - x_j\|^2/(\rho_i \rho_j)))$ is added to the construction of the intraclass discriminating weight to prevent accuracy truncation. By doing so, our derivation can be viewed as an integration of the class prior probability, the local weight, and the discriminating weight. This construction is expected to preserve both the local neighborhood structure and the class information. Besides, this construction is expected to share the same advantages detailed in the original work.
It is clear that (20) contains two new factors compared with the LFDA method: (1) the class prior probability $P(c_i)$ and (2) the local variance $\rho_i$.
Suppose class $c_i$ to be class $c$; that is, $y_i = c$, so that the prior probability of class $c_i$ can be calculated by
$$P(c_i) = \frac{n_c}{n}, \quad (21)$$
where $n_c$ is the number of samples in class $c$, while $n$ denotes the total number of samples, $n = \sum_{c=1}^{C} n_c$. Please note that the item $(1 + \exp(-\|x_i - x_j\|^2/(\rho_i \rho_j)))$ in (20) is used to prevent the extra rounding error produced by the first two items. Assume that $x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(k)}$ are the $k$-nearest samples of $x_i$; then the squared distance between $x_i$ and $x_i^{(j)}$ is given by
$$d_i^{(j)} = \left\|x_i - x_i^{(j)}\right\|^2, \quad j = 1, 2, \ldots, k, \quad (24)$$
where $\|\cdot\|^2$ represents the squared Euclidean distance and $k$ is a predefined parameter whose recommended value is $k = 7$. The corresponding mean $m(x_i)$ can be defined as
$$m(x_i) = \frac{1}{k}\sum_{j=1}^{k} d_i^{(j)}. \quad (25)$$
The standard deviation can be calculated as
$$\rho_i = \sqrt{\frac{1}{k}\sum_{j=1}^{k}\left(d_i^{(j)} - m(x_i)\right)^2}. \quad (26)$$
Note that, in the above equation, the item $1/k$ becomes a constant that can be shifted outside the summation. Thus, an equivalent formula is given by
$$\rho_i = \frac{1}{\sqrt{k}}\sqrt{\sum_{j=1}^{k}\left(d_i^{(j)} - m(x_i)\right)^2}. \quad (27)$$
A similar procedure can be deduced for $\rho_j$. Comparing (19) with (27), it is noticeable that
$$\rho_i \le \sigma_i \quad (28)$$
holds. Compared with the former definitions, our definition has at least the following advantages.
(i) By incorporating the prior probability of each class $P(c_i)$ with the local technique, the proposed scheme is expected to benefit the classification accuracy.
(ii) The representation of the local patches in (26) is described by the local standard deviation $\rho_i$ rather than the absolute diversity in (19), which is more accurate in measuring the local variance of the data samples.
(iii) Compared with a global calculation, the proposed calculation is taken on local patches, which is efficient in avoiding overfitting.
(iv) The proposed local scaling technique meets the character of HSI data, which makes it more applicable for the processing of hyperspectral images in real applications.
Based on the affinity defined above, an extended affinity matrix can also be defined in a similar way. Our definition only provides a heuristic exploration for reference. The affinity can be further sparsified, for example, by introducing the idea of $k$-nearest neighborhoods [31].
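For reference, here is a minimal sketch of the proposed affinity as we read (20), (21), and (26); the exact placement of the prior factor and the zeroing of cross-class entries are our assumptions rather than the paper's verbatim formulation, and all names are hypothetical.

```python
import numpy as np

def pd_lfda_affinity(X, y, k=7):
    """Sketch of the PD-LFDA affinity. X: d x n data; y: length-n labels."""
    sq = np.sum(X**2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X.T @ X, 0.0)
    knn_d2 = np.sort(dist2, axis=1)[:, 1:k + 1]     # squared distances to the k NNs
    rho = knn_d2.std(axis=1) + 1e-12                # local standard deviation, eq. (26)
    prior = np.array([np.mean(y == c) for c in y])  # P(c_i) = n_c / n, eq. (21)
    E = np.exp(-dist2 / np.outer(rho, rho))
    A = prior[:, None] ** 2 * E * (1.0 + E)         # eq. (20) for same-class pairs
    A[y[:, None] != y[None, :]] = 0.0               # affinity only enters intraclass terms
    return A
```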
The optimal solution of the improved scheme can be achieved by maximizing the following criterion:
$$T_{\mathrm{PD\text{-}LFDA}} = \arg\max_T \frac{\det\left(T^T \hat{S}_b T\right)}{\det\left(T^T \hat{S}_w T\right)}. \quad (29)$$
It is evident that (29) has a similar form to (3). This finding enlightens us that the transformation $T$ can be simply achieved by solving the generalized eigenvalue decomposition of $\hat{S}_w^{-1} \hat{S}_b$. Moreover, let $B \in \mathbb{R}^{r \times r}$ be an $r$-dimensional invertible square matrix. It is clear that $T_{\mathrm{PD\text{-}LFDA}} B$ is also an optimal solution of (29). This property indicates that the optimal solution is not uniquely determined because of arbitrary arithmetic transformations of $T_{\mathrm{PD\text{-}LFDA}}$. Let $\tilde{\phi}_j$ be the eigenvector of $\hat{S}_w^{-1} \hat{S}_b$ corresponding to eigenvalue $\tilde{\lambda}_j$; that is, $\hat{S}_b \tilde{\phi}_j = \tilde{\lambda}_j \hat{S}_w \tilde{\phi}_j$. To cope with this issue, a rescaling procedure is adopted [25]. Each eigenvector $\{\tilde{\phi}_j\}_{j=1}^{d}$ is rescaled to satisfy the following constraint:
$$\tilde{\phi}_j^T \hat{S}_w \tilde{\phi}_j = 1. \quad (30)$$
Then, each eigenvector is weighted by the square root of its associated eigenvalue. The transformation matrix $T_{\mathrm{PD\text{-}LFDA}}$ of the proposed scheme is finally given by
$$T_{\mathrm{PD\text{-}LFDA}} = \left[\sqrt{\tilde{\lambda}_1}\,\tilde{\phi}_1, \sqrt{\tilde{\lambda}_2}\,\tilde{\phi}_2, \ldots, \sqrt{\tilde{\lambda}_r}\,\tilde{\phi}_r\right] \in \mathbb{R}^{d \times r}, \quad (31)$$
with descending order $\tilde{\lambda}_1 \ge \tilde{\lambda}_2 \ge \cdots \ge \tilde{\lambda}_r$.
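The eigendecomposition, rescaling (30), and eigenvalue weighting (31) can be sketched as follows; `scipy.linalg.eig` handles the generalized problem, and the clipping of tiny negative eigenvalues is our numerical safeguard.

```python
import numpy as np
from scipy.linalg import eig

def pd_lfda_embed(Sw_hat, Sb_hat, r):
    """Sketch of (29)-(31): generalized eigenvectors of (Sb_hat, Sw_hat),
    rescaled so that phi^T Sw_hat phi = 1, then weighted by sqrt(eigenvalue)."""
    evals, evecs = eig(Sb_hat, Sw_hat)
    evals, evecs = evals.real, evecs.real
    order = np.argsort(evals)[::-1][:r]            # descending eigenvalues
    cols = []
    for j in order:
        phi = evecs[:, j]
        phi = phi / np.sqrt(phi @ Sw_hat @ phi)    # rescaling constraint, eq. (30)
        cols.append(np.sqrt(max(evals[j], 0.0)) * phi)  # weighting, eq. (31)
    return np.stack(cols, axis=1)                  # d x r transformation matrix
```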
For a new testing point $x$, the projected point in the new feature space can be captured by $\hat{y} = T_{\mathrm{PD\text{-}LFDA}}^T x$; thus, it can be further analyzed in the transformed space.
According to the above analysis, we can design an algorithm, called the PD-LFDA Algorithm, to perform our proposed method. The detailed description of this algorithm can be found in the appendix (Algorithm 2). A summary of the calculation steps of the PD-LFDA Algorithm is presented in Algorithm 1.
The advantage of PD-LFDA is discussed as follows.
Firstly, to investigate the rank of the between-class scatter matrix $S_b$ of LDA, $S_b$ can be rewritten as
$$S_b = \sum_{c=1}^{C} n_c(\mu_c - \mu)(\mu_c - \mu)^T = \Phi\Phi^T, \quad \Phi = \left[\sqrt{n_1}(\mu_1 - \mu), \ldots, \sqrt{n_C}(\mu_C - \mu)\right]. \quad (32)$$
Since $\sum_{c=1}^{C} n_c(\mu_c - \mu) = 0$, the columns of $\Phi$ are linearly dependent. Thereby, it is easy to infer that the rank of the between-class scatter matrix $S_b$ is $C - 1$ at most; thus, there are up to $C - 1$ meaningful subfeatures that can be extracted. Thanks to the help of the affinity matrix $\tilde{A}$, when compared with conventional LDA, the reduced subspace of the proposed PD-LFDA can have any subdimension. On the other hand, the classical local Fisher's linear discriminant only weights the value of sample pairs in the same classes, while our method also takes into account the sample pairs in different classes. Hence, the proposed method is more flexible, and the results are more adaptive. The objective function of the proposed method is quite similar to that of conventional LDA; hereby, the optimal solution is almost the same as conventional LDA, which indicates that it is also simple to implement and easy to revise.

Input: HSI training samples $X \in \mathbb{R}^{d \times n}$, dimensionality to be embedded $r$, the parameter $k$ of $k$NN, and a test sample $x_t \in \mathbb{R}^d$.
Step 1. For each pair of samples from the same class, calculate the affinity $\tilde{A}_{i,j}$ by (20), where the local scaling factor $\rho_i$ is calculated via (26) or (27).
Step 2. Transform (13) and (14) globally and uniformly into an equivalent matrix formula, where the operator $\odot$ denotes the elementwise (dot) product between matrices; by integrating the number of each class $n_c$, the total number of training samples $n$, and the local scaling $\rho$, the matrices $\tilde{A}$, $W_1$, and $W_2$ can be calculated.
Step 3. Construct the within-scatter weight matrix $\tilde{P}_w$ and the between-scatter weight matrix $\tilde{P}_b$, and form $\hat{S}_w$ and $\hat{S}_b$.
Step 4. Solve the generalized eigenvalue decomposition of $\hat{S}_w^{-1}\hat{S}_b$ and rescale each eigenvector to satisfy (30).
Step 5. Form the transformation matrix $T$ by (31).
Step 6. For a testing sample $x_t \in \mathbb{R}^d$, the extracted feature is $y_t = T^T x_t \in \mathbb{R}^r$.
Output: Transformation matrix $T$ and the extracted feature $y_t$.
Algorithm 1: PD-LFDA Algorithm.
To further explore the relationship of LDA and PD-LFDA, we now rewrite the objective functions of LDA and PD-LFDA, respectively:
$$T_{\mathrm{LDA}} = \arg\max_T \operatorname{tr}\left(T^T S_b T\right) \quad \text{s.t. } T^T S_w T = I, \quad (34)$$
$$T_{\mathrm{PD\text{-}LFDA}} = \arg\max_T \operatorname{tr}\left(T^T \hat{S}_b T\right) \quad \text{s.t. } T^T \hat{S}_w T = I. \quad (35)$$
This implies that LDA tries to maximize the between-class scatter and simultaneously constrain the within-class scatter to a certain level. However, such a restriction is hard, and no relaxation is imposed. When the data is not unimodal, that is, multimodal or of unknown modality, LDA often fails. On the other hand, benefiting from the flexible design of the affinity matrix $\tilde{A}$, PD-LFDA gains more freedom in (35). That is, the separability of PD-LFDA will be more distinct, and more degrees of freedom remain than in conventional LDA; thus, our method is expected to be more robust and advantageous.
For large scale data sets, we discuss a scheme that can accelerate the computation of the within-scatter matrix $\hat{S}_w$. In our algorithm, owing to the fact that we have put a penalty on the affinity matrix for different-class samples in constructing the between-scatter matrix, an analogous accelerated procedure for $\hat{S}_b$ remains for further discussion.
The within-class scatter $\hat{S}_w$ can be reformulated as
$$\hat{S}_w = \frac{1}{2}\sum_{i,j=1}^{n} \tilde{P}_w(i,j)(x_i - x_j)(x_i - x_j)^T = X\left(\tilde{D}_w - \tilde{P}_w\right)X^T = X \tilde{L}_w X^T, \quad (36)$$
where $\tilde{D}_w$ is the diagonal matrix of the row sums of $\tilde{P}_w$. Here, $\tilde{P}_w$ is block diagonal if all samples $\{x_i\}_{i=1}^{n}$ are sorted according to their labels. This property implies that $\tilde{D}_w$ and $\tilde{L}_w$ are also block diagonal matrices. Hence, if we compute $\hat{S}_w$ through (36), the procedure will be much more efficient. Similarly, $\hat{S}_b$ can also be formulated as
$$\hat{S}_b = X \tilde{L}_b X^T. \quad (37)$$
Nevertheless, $\tilde{P}_b$ is dense and cannot be further simplified. However, the simplified computational procedure of $\tilde{P}_w$ saves part of the time in a way. In this paper, we adopt the above procedure to accelerate $\hat{S}_w$ and pursue $\hat{S}_b$ normally.
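A sketch of the classwise computation implied by (36): since $\tilde{P}_w$ is block diagonal once samples are grouped by label, $\hat{S}_w$ can be accumulated one class block at a time. Function and variable names are ours.

```python
import numpy as np

def sw_hat_fast(X, y, Pw):
    """Sketch of eq. (36): S_w_hat = X (D_w - P_w) X^T computed classwise,
    exploiting the block-diagonal structure of P_w."""
    d, n = X.shape
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        Xc, Pc = X[:, idx], Pw[np.ix_(idx, idx)]
        Lc = np.diag(Pc.sum(axis=1)) - Pc      # per-class graph Laplacian block
        Sw += Xc @ Lc @ Xc.T                   # accumulate the class contribution
    return Sw
```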
In addition to the locality structure, some papers show that another property, for example, marginal information, is also important and should be preserved in the reduced space. The theory of extended LDA and LPP algorithms has developed rapidly in recent years. Yan et al. [33] summarized these algorithms in a graph embedding framework and also proposed a marginal Fisher analysis (MFA) algorithm under this framework.
In MFA, the criterion is characterized by intraclass compactness and interclass marginal separability, which replace the within-class scatter and between-class scatter, respectively. The intraclass relationship is reflected by an intrinsic graph constructed from the $k_1$-nearest neighborhood sample points in the same class, while the interclass separability is mirrored by a penalty graph computed from marginal points of different classes. Following this idea, the intraclass compactness is given as follows:
$$\tilde{S}_c = \sum_{i,j:\, j \in N_{k_1}^{+}(i)} \left\|T^T x_i - T^T x_j\right\|^2 = 2\,T^T X(D - W)X^T T, \quad (38)$$
where
$$W(i,j) = \begin{cases} 1, & j \in N_{k_1}^{+}(i) \text{ or } i \in N_{k_1}^{+}(j), \\ 0, & \text{otherwise}. \end{cases} \quad (39)$$
Here, $N_{k_1}^{+}(i)$ represents the $k_1$-nearest neighborhood index set of $x_i$ from the same class, and $D$ is the row sum (or column sum) of $W$: $D(i,i) = \sum_j W(i,j)$. Interclass separability is indicated by a penalty graph whose term is expressed as follows:
$$\tilde{S}_p = \sum_{i,j:\,(i,j) \in P_{k_2}} \left\|T^T x_i - T^T x_j\right\|^2 = 2\,T^T X\left(D^p - W^p\right)X^T T. \quad (40)$$
Note that $\tilde{S}_c$ and $\tilde{S}_p$ correspond to the "within-scatter matrix" and "between-scatter matrix" of traditional LDA, respectively. The optimal solution of MFA can be achieved by solving the following minimization problem:
$$T^* = \arg\min_T \frac{T^T X(D - W)X^T T}{T^T X\left(D^p - W^p\right)X^T T}. \quad (43)$$
We know that (43) is also a generalized eigenvalue decomposition problem. Let $T_{\mathrm{PCA}}$ indicate the transformation matrix from the original space to a PCA subspace with certain energy remaining, and then the final projection of MFA is output as
$$T_{\mathrm{MFA}} = T_{\mathrm{PCA}} T^*. \quad (44)$$
As can be seen, MFA constructs two weighted matrices $W$ and $W^p$ according to the intraclass compactness and interclass separability. In LFDA and PD-LFDA, only one affinity is constructed. The difference lies in that the "weight" in LFDA and PD-LFDA lies in the range $[0,1]$ according to the level of difference, yet MFA distributes the same weight to all of its nearest neighborhoods. The optimal solutions of MFA, LFDA, and PD-LFDA can all be attributed to a generalized eigenvalue decomposition problem. Hence, the ideas of MFA, LFDA, and PD-LFDA are approximately similar under a certain interpretation. Relationships with other methodologies can be analyzed in an analogous way.
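For comparison, a simplified sketch of the two MFA graphs: the intrinsic graph links each sample to its $k_1$ same-class neighbors, and the penalty graph links it to its $k_2$ nearest different-class samples; the exact marginal-pair selection in [33] may differ from this reading.

```python
import numpy as np

def mfa_graphs(X, y, k1=5, k2=20):
    """Sketch of the MFA intrinsic graph W and penalty graph Wp, eqs. (38)-(40)."""
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2 * X.T @ X
    W, Wp = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero(y == y[i])
        diff = np.flatnonzero(y != y[i])
        for j in same[np.argsort(dist2[i, same])][1:k1 + 1]:  # skip i itself
            W[i, j] = W[j, i] = 1.0        # intraclass compactness edges
        for j in diff[np.argsort(dist2[i, diff])][:k2]:
            Wp[i, j] = Wp[j, i] = 1.0      # marginal cross-class edges
    return W, Wp
```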

Experimental Results
To illustrate the performance of PD-LFDA, experiments on a real hyperspectral remote sensing image data set, AVIRIS Indian Pines 1992, are conducted in this section. The AVIRIS Indian Pines 1992 data set was gathered by the National Aeronautics and Space Administration (NASA) with the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines test site in northwestern Indiana in June 1992. This data set consists of 145 × 145 pixels and 224 spectral reflectance bands ranging from 0.4 µm to 2.45 µm with a spatial resolution of 20 m. The Indian Pines scene is composed of two-thirds agriculture and one-third forest or other natural perennial vegetation. Some other landmarks, such as dual lane highways, a rail line, low density housing, and smaller roads, are also in this image. Since the scene was taken in June, some main crops, for example, soybeans and corn, were in their early growth stage with less than 5% coverage, while the no-till, min-till, and clean-till labels indicate the amount of previous crop residue remaining. The region map can be referred to in Figure 5(a). The 20 water absorption bands (i.e., bands [108-112], [154-167], and 224) were discarded.
In this section, the performance of different dimension reduction methods, that is, PCA, LPP, LFDA, LDA, JGLDA, and RP [34], is compared with the proposed PD-LFDA. Classification accuracy is reported via concrete classifiers. Generally, many dimension reduction research papers adopt the $k$-nearest neighborhood classifier (KNN) and the support vector machine (SVM) classifier to measure the performance of the extracted features after dimension reduction, where the overall accuracy and kappa coefficient are detailed in the reports. Hereby, in this paper, we also adopt the KNN classifier and SVM classifier for performance measurement. For KNN, we select the value of $k$ as 1, 5, and 9, so that three classifiers based on nearest neighborhoods are formed, which are called 1NN, 5NN, and 9NN. For SVM, we seek a hyperplane to separate classes in a kernel-induced space, where classes that are linearly nonseparable in the original feature space can be separated via the kernel trick. SVM, as a robust and successful classifier, has been widely used to evaluate the performance of multifarious methods in many areas. For simplicity and convenience, we use the LIBSVM package [35] for experiments. The accuracy of the dimension-reduced features is reported as the classification performance of the SVM classifier. In the following schedule, the feature subspace is first calculated from the training samples by the different dimensionality reduction algorithms. Table 1 gives numerical statistics of the training samples corresponding to each class. Then, each new sample is projected into the low subspace by the transformation matrix. Finally, all the new samples are classified by the SVM classifier.
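The evaluation protocol can be sketched as below; we use scikit-learn's `KNeighborsClassifier` and `SVC` as stand-ins for the KNN and LIBSVM classifiers named above, which is an implementation convenience rather than the paper's exact toolchain.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, cohen_kappa_score

def evaluate(T, X_train, y_train, X_test, y_test):
    """Sketch of the protocol: project with a learned T, then score with
    1/5/9-NN and RBF-SVM; kappa is reported for the SVM predictions."""
    Z_tr, Z_te = T.T @ X_train, T.T @ X_test    # project into the subspace
    results = {}
    for k in (1, 5, 9):
        clf = KNeighborsClassifier(n_neighbors=k).fit(Z_tr.T, y_train)
        results[f"{k}NN"] = accuracy_score(y_test, clf.predict(Z_te.T))
    svm = SVC(kernel="rbf").fit(Z_tr.T, y_train)
    pred = svm.predict(Z_te.T)
    results["RBF-SVM"] = accuracy_score(y_test, pred)
    results["kappa"] = cohen_kappa_score(y_test, pred)
    return results
```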
In this experiment, a total of 1,029 samples were selected for training, and the remaining samples were used for testing. Note that the labeled samples in the database are unbalanced, and the available samples of each category differ dramatically. The following strategy is imposed for sample division: a fixed number of 15 samples is randomly selected from each class to form the training set, and the absent samples are randomly selected from the remaining samples. Under this strategy, the training samples and testing samples are as listed in Table 1.
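Our reading of this division strategy, as a sketch: a fixed quota per class, topped up at random to the 1,029-sample budget. The exact top-up rule in the paper is ambiguous, so treat the details here as assumptions.

```python
import numpy as np

def split_train_test(y, per_class=15, total_train=1029, seed=0):
    """Sketch of the sampling strategy: 15 samples per class, then a random
    top-up to the total training budget; all remaining samples go to testing."""
    rng = np.random.default_rng(seed)
    train = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        take = min(per_class, len(idx))            # small classes give what they have
        train.extend(rng.choice(idx, size=take, replace=False))
    rest = np.setdiff1d(np.arange(len(y)), train)
    extra = rng.choice(rest, size=total_train - len(train), replace=False)
    train = np.concatenate([np.array(train), extra])
    test = np.setdiff1d(np.arange(len(y)), train)
    return train, test
```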
Figure 2 shows the overall accuracy of the different dimension reduction methods applied to the AVIRIS Indian 92AV3C data set. The neighborhood size of the KNN classifier is selected as 1, 5, and 9, respectively, which produces three classifiers, that is, 1NN, 5NN, and 9NN. Three different kernel functions are adopted for the SVM classifier, and the derived classifiers are also used in this experiment, that is, linear SVM, polynomial SVM, and RBF-SVM. It can be deduced from Figures 2(a)-2(c) that, when the embedding dimension is greater than 5, the proposed PD-LFDA performs the best, while JGLDA performs the worst. The results produced by RP are slightly better than JGLDA. PCA, LDA, LPP, and RP show similar classification results under the KNN classifier. That is, the proposed PD-LFDA outperforms the others. Meanwhile, compared with LFDA, the proposed PD-LFDA leads to 2% more improvement on average. Moreover, it can be observed from Figure 2(d) that the classification accuracy increases steadily as the embedding dimension increases. However, LDA demonstrates the highest overall accuracy when the number of reduced features is up to 9, while LFDA shows significant improvements when the number of reduced features is greater than 9. This phenomenon in Figure 2(d) indicates the instability of linear SVM. Nevertheless, the situation is reversed for polynomial SVM and RBF-SVM in Figures 2(e) and 2(f), wherein the proposed PD-LFDA wins a small improvement against LFDA and a significant improvement compared with the others. Encouraging results of the proposed PD-LFDA algorithm were achieved in all cases. Furthermore, Table 2 gives the detailed overall accuracy under different feature dimensions using the 3NN, 7NN, and RBF-SVM classifiers, which validates the feasibility of the proposed scheme.
Figure 3 displays the kappa coefficients obtained using the different dimension reduction algorithms under the KNN and SVM classifiers. The experimental setting of Figure 3 is the same as that of Figure 2. We can find from these results that JGLDA performs the worst in most cases except in Figure 3(e). The proposed PD-LFDA method outperforms the other methods and achieves the highest kappa value in most cases except when using the linear SVM as the classifier. In fact, none of the methods works steadily in the case of linear SVM (Figure 3(d)). Note that the situation improves with polynomial SVM, where the kappa value of the proposed PD-LFDA is significantly better than the others. All these achievements demonstrate the robustness of our contribution in PD-LFDA. Simultaneously, it is noticeable that LPP exhibits an average kappa level; the kappa value gained by LPP is neither seriously bad nor dramatically good. The kappa results produced by RP are approximately the same as those of LPP. A significant advantage of RP is its simple construction and computation, with accuracy close to LPP. More details are summarized in Table 3. It can be concluded that the kappa coefficient of the proposed algorithm is higher than those of the other approaches, which makes it more appropriate for the classification of HSI data.
The visual results of all methods are presented in Figures 5 and 6. In this experiment, all the available labeled samples are used for testing, while approximately 10% of the samples are used for training. The subspace dimension is fixed to 13 (this number is only used for reference; it can be changed). For each experiment, the dimension of the original feature space is reduced to the objective dimensionality; thereafter, the classified maps are induced by the 7NN classifier and the RBF-SVM classifier. The overall accuracy, kappa coefficient, and average accuracy are listed at the top of each classified map.

Figure 1 :
Figure 1: Different distributions of $x_0$ and the corresponding $k$-nearest neighborhoods ($k = 7$). (a) Most neighbors are close to point $x_0$. (b) Most neighbors are far from point $x_0$. The distances between point $x_0$ and its $k$th nearest neighbor are the same in both distributions, $d_1 = d_2$.

Figure 2 :
Figure 2: Overall accuracy by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database.

Figure 3 :
Figure 3: Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database.

Figure 5 :
Figure 5: Classified map generated by different dimension reduction methods, where the overall accuracy, kappa coefficient, and average classification accuracy are listed at the top of each map, respectively.
The class labels are converted to a pseudocolor image. The pseudocolor image of the hyperspectral image from the Indian 92AV3C database is shown in Figure 4(a). The available labeled image, which represents the ground truth, is illustrated in Figure 4(b), where the labels were made by human experts. The training samples are selected from the labeled image and represented as points in the image, as shown in Figure 4(c). Each label number (ID) corresponds to a class name, which is indexed in Table 1.

Table 1 :
Training set in AVIRIS Indian Pines 1992 database.