Hyperspectral Image Classification Using Kernel Fukunaga-Koontz Transform

This paper presents a novel approach to the hyperspectral imagery (HSI) classification problem using the Kernel Fukunaga-Koontz Transform (K-FKT). The kernel-based Fukunaga-Koontz Transform offers higher performance for classification problems due to its ability to handle nonlinear data distributions. K-FKT is realized in two stages: training and testing. In the training stage, unlike classical FKT, samples are relocated to a higher-dimensional kernel space to obtain a transformation from nonlinearly distributed data to linear form, which provides a more efficient solution to hyperspectral data classification. The second stage, testing, is accomplished by employing the Fukunaga-Koontz transformation operator to determine the classes of real-world hyperspectral images. In the experiments section, the improved performance of the K-FKT classification technique is demonstrated in comparison with other methods such as the classical FKT and three types of support vector machines (SVMs).


Introduction
In the last decade, hyperspectral remote sensing technology has become a popular research topic. Many articles have been published on hyperspectral images and spectral analysis, since the technology offers new insight into various application areas such as agriculture [1], medical diagnosis [2], illegal drug field detection [3], face recognition [4], and military target detection [5].
The idea behind remote sensing technology relies on the interaction between photons and surface materials. Hyperspectral images are captured by spectral sensors that are sensitive to a larger portion of the electromagnetic spectrum than traditional color cameras. While a digital color camera captures only 3 bands (red, green, and blue) in the 400 nm to 700 nm spectral range, a typical hyperspectral sensor captures more than 200 bands within the range of 400 nm to 2500 nm. This means that HSI offers 200 or more features per image pixel instead of 3 values. HSI thus contains diverse information from a wide range of wavelengths, a characteristic that yields more effective classification power for the application areas mentioned above.
Different types of materials can be represented by a set of bands, called a "spectral signature," which simplifies the separation of these materials.
There are also several challenges of HSI to be solved. For instance, water absorption and other environmental effects may cause some spectral bands to be noisy. These specific bands are called "noisy bands" and must be removed from the dataset. Another problem is the size of the data. Even for small scenes, hyperspectral images can be much larger than traditional grayscale and color images, which means that processing times are also longer. Detecting and removing redundant bands is therefore crucial to reduce the number of features and the total processing time. For this purpose, we refer to two papers [6, 7] to select the most informative bands of our dataset.
In the literature, various classification techniques have been proposed for the hyperspectral image classification problem, including neural networks, support vector machines, and Bayesian classifiers. In 2005, Benediktsson et al. [8] proposed a solution based on extended morphological models and neural networks. In 2007, Borges et al. [9] published a study based on discriminative class learning using a new Bayesian-based HSI segmentation method. In 2008, Alam et al. [10] proposed a Gaussian filter and postprocessing method for HSI target detection. Samiappan et al. [11] introduced an SVM-based HSI classification approach that uses the same dataset used in this paper.
In this study, we present the kernel-based Fukunaga-Koontz Transform, a novel solution to hyperspectral image classification. Classical FKT is a powerful method for two-pattern classification problems. However, when the data are more complicated, classical FKT cannot produce satisfactory results. In this case, kernel transformations help FKT increase the separability of the data. In order to evaluate the performance of the K-FKT algorithm on HSI classification, we select the AVIRIS hyperspectral dataset, a benchmark problem in this area. We have also used other HSI datasets in our earlier studies and obtained high-accuracy results, which are presented in [12].
The remainder of this paper is organized as follows. The following section gives information about the contents of the AVIRIS dataset. A detailed description of the Kernel Fukunaga-Koontz Transform is presented in Section 3, with training and testing stages. Classification results are given in Section 4, including the comparison with other methods. In the last section, we conclude our paper.

AVIRIS Dataset
This section includes detailed information about the AVIRIS hyperspectral image dataset called "Indian Pines" [13]. The dataset contains several different areas, mostly agricultural crop fields; the remaining parts contain forests, highways, a rail line, and some low-density housing. A convenient RGB-colored view of the image can be seen in Figure 1(a).
The Indian Pines dataset contains 16 different classes of crop fields. The ground truth data is shown in Figure 1(b). Table 1 shows the names of the classes and the total number of samples for each class.
Basically, our dataset is a 145 × 145 × 220 matrix, corresponding to 220 different bands of images of size 145 × 145. To obtain a more convenient form, this 3D matrix is transformed into a 2D matrix of size 21025 × 220, which corresponds to 21025 samples, each with 220 features.
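The cube-to-matrix conversion can be sketched with NumPy (the array below is a zero-filled stand-in for the real cube; variable names are illustrative):

```python
import numpy as np

# Synthetic stand-in for the 145 x 145 x 220 hyperspectral cube.
cube = np.zeros((145, 145, 220))

# Flatten the two spatial dimensions: each row becomes one pixel's
# 220-band spectrum, giving a 21025 x 220 sample matrix.
samples = cube.reshape(-1, cube.shape[2])

print(samples.shape)  # (21025, 220)
```

Each row of `samples` is then treated as one training or test signature in the classification stages that follow.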
Before the classification process, we removed regions that do not correspond to any class (dark blue areas in Figure 1(b)) from the dataset. Almost half of the samples do not belong to any of the 16 classes. Once we eliminate these unlabeled samples, only 10336 samples remain in the dataset.
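Removing unlabeled pixels amounts to boolean masking against the ground-truth map. In the sketch below the labels are randomly generated, so the kept count is illustrative rather than the paper's 10336; the convention that label 0 marks "no class" is an assumption of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ground-truth map: 0 marks pixels belonging to no class.
labels = rng.integers(0, 17, size=145 * 145)   # values 0..16
samples = rng.random((145 * 145, 220))         # flattened spectra

mask = labels != 0            # keep only labeled pixels
samples_kept = samples[mask]
labels_kept = labels[mask]
```

The same mask is applied to both the spectra and the labels so that rows stay aligned.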

Kernel Fukunaga-Koontz Transform
The traditional Fukunaga-Koontz Transform is a well-known statistical transformation method [14-17] for two-class classification problems. Basically, it operates by transforming the data into a new subspace where both classes share the same eigenvalues and eigenvectors. While a subset of these eigenvalues best represents the ROI class, the remaining eigenvalues represent the clutter class. The traditional FKT was proposed to solve linear classification problems; when the data are nonlinearly distributed, the classical approach is not the best solution. Like other linear classifiers such as linear discriminant analysis (LDA), independent component analysis (ICA), and support vector machines (SVM), classical FKT suffers from nonlinearly distributed data. Therefore, in this paper, we use an improved version of classical FKT that changes the data distribution with a kernel transformation in order to classify nonlinearly distributed data in a linear fashion. We will call it "K-FKT" in the rest of the paper. The K-FKT algorithm consists of two stages: training and testing.
The training sets $X$ and $Y$ are first normalized to avoid unexpected transformation results. They are then mapped into a higher-dimensional kernel space via the kernel transform. In simple terms, we assume that there is a mapping function $\Phi(\cdot)$ that maps all training samples to the kernel space.
In this manner, we obtain new training sets $\tilde{X}$ and $\tilde{Y}$, in which $\Phi(x_i)$ and $\Phi(y_i)$ denote the training samples in the kernel space. Equation (2) shows the mapping process; the tilde "∼" indicates that the corresponding variable has been transformed into the kernel space:

$$\tilde{X} = [\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_M)], \qquad \tilde{Y} = [\Phi(y_1), \Phi(y_2), \ldots, \Phi(y_M)]. \quad (2)$$

Unfortunately, such a mapping function is not available in many cases. Even if it were available, the complexity of this operation would be very high, since all training samples must be mapped to the higher-dimensional space separately. In order to overcome this problem, we may bypass the mapping function $\Phi(\cdot)$ and obtain the same results in a faster way by using an approach called the "kernel trick" [16, 18]. According to the kernel trick, a kernel function is employed instead of the mapping function. The following equation shows a generalized form of the kernel function:

$$k(x_i, x_j) = \Phi(x_i)^T \Phi(x_j), \quad (3)$$

where $k$ is the kernel function and $x_i$ and $x_j$ represent the $i$th and $j$th training samples, respectively. In this paper we examine two well-known kernel functions, the Gaussian and the polynomial kernel. The Gaussian kernel (4) relocates the samples in accordance with a Gaussian distribution and employs the parameter $\sigma$ to calibrate sensitivity:

$$k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right). \quad (4)$$

The polynomial kernel is shown in (5). This function uses the parameter $d$ to change the degree of the polynomial and calibrate the sensitivity:

$$k(x_i, x_j) = (x_i^T x_j + 1)^d. \quad (5)$$

In traditional FKT, computation of the covariance matrices of $X$ and $Y$ would be the next step. Applying the same operation to the matrices $\tilde{X}$ and $\tilde{Y}$ yields

$$\tilde{K}_0 = \tilde{X}^T \tilde{X}, \qquad \tilde{L}_0 = \tilde{Y}^T \tilde{Y}, \quad (6)$$

where $\tilde{K}_0$ and $\tilde{L}_0$ are the kernel matrices of the ROI and clutter classes, respectively. At this step, we are able to exploit these properties to realize the kernel trick. As shown in (7), one of the kernel functions may be employed to complete the kernel transformation without requiring the mapping operation [19]:

$$(\tilde{K}_0)_{ij} = k(x_i, x_j), \qquad (\tilde{L}_0)_{ij} = k(y_i, y_j). \quad (7)$$

After the kernel operations, the summation matrix of $\tilde{K}_0$ and $\tilde{L}_0$ is computed. Then it is decomposed into
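The two kernel functions can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation; the function names, the default σ = 1, and the offset-1 polynomial form are assumptions of this sketch:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gram matrix with entries exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def polynomial_kernel(X, Y, d=2):
    """Gram matrix with entries (x_i . y_j + 1)^d."""
    return (X @ Y.T + 1.0) ** d

# 5 toy "spectral signatures" with 4 bands each
X = np.random.default_rng(1).random((5, 4))
K = gaussian_kernel(X, X)   # symmetric, with ones on the diagonal
```

Here σ controls how quickly similarity decays with spectral distance, while d controls the curvature of the polynomial decision surface.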
eigenvalues and eigenvectors as follows:

$$\tilde{K}_0 + \tilde{L}_0 = V D V^T, \quad (8)$$

where $V$ and $D$ represent the eigenvector matrix and the eigenvalue matrix, respectively. The diagonal elements of $D$ are the eigenvalues of the summation matrix. Using $V$ and $D$, the transformation operator $P$ can be constructed:

$$P = V D^{-1/2}. \quad (9)$$

After multiplication by $P$, the matrices $\tilde{K}_0$ and $\tilde{L}_0$ are transformed into an eigenspace where the ROI and clutter classes share the same eigenvalues and eigenvectors:

$$\hat{K} = P^T \tilde{K}_0 P, \qquad \hat{L} = P^T \tilde{L}_0 P, \quad (10)$$

where $\hat{K}$ and $\hat{L}$ are the transformed matrices. Since they are transformed into the same eigenspace, their sum is equal to the identity matrix:

$$\hat{K} + \hat{L} = I. \quad (11)$$

Equation (11) implies that if $v_i$ is an eigenvector of $\hat{K}$ with corresponding eigenvalue $\lambda_i$, then $1 - \lambda_i$ is the eigenvalue of $\hat{L}$ with the same eigenvector $v_i$. This relation can be represented as

$$\hat{K} v_i = \lambda_i v_i \;\Longrightarrow\; \hat{L} v_i = (1 - \lambda_i) v_i. \quad (12)$$

The above equations state that the more information an eigenvector contains about the ROI class, the less information it has about the clutter class. This characteristic is inherited from the classical FKT algorithm.
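The whitening construction in (8)-(11) can be checked numerically. The sketch below uses small random positive semidefinite matrices as stand-ins for the kernel matrices of the two classes; the sizes and seeds are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((6, 10))
B = rng.random((6, 10))
K0 = A @ A.T   # stand-in PSD "ROI" kernel matrix
L0 = B @ B.T   # stand-in PSD "clutter" kernel matrix

# Eigendecompose the sum: K0 + L0 = V D V^T   (Eq. (8))
D_vals, V = np.linalg.eigh(K0 + L0)

# Transformation operator P = V D^{-1/2}      (Eq. (9))
P = V @ np.diag(D_vals ** -0.5)

# Transform both matrices into the shared eigenspace (Eq. (10))
K_hat = P.T @ K0 @ P
L_hat = P.T @ L0 @ P

# Their sum is the identity matrix            (Eq. (11))
print(np.allclose(K_hat + L_hat, np.eye(6)))  # True
```

The eigenvalue complement of (12) also follows: the eigenvalues of `L_hat` are exactly one minus the eigenvalues of `K_hat`.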

3.2. Testing Stage.
The testing stage starts with normalization of the test sample, as in the training stage. Similarly, the test sample should be mapped into the kernel space, but this is not applicable for the reasons explained in the training stage, so we use the kernel trick once more:

$$\tilde{K}(i) = k(x_i, z), \quad i = 1, \ldots, M. \quad (13)$$

Equation (13) shows the kernel transformation of the test sample $z$; $x_i$ represents the ROI training samples, and $\tilde{K}$ represents the kernel matrix of the corresponding test sample. The matrix $\tilde{K}$ is employed to calculate the feature vector $f$ in (16). The other factor required to calculate $f$ is the normalized $\tilde{K}_0$ matrix, obtained by

$$\bar{K}_0 = \frac{1}{M} \tilde{K}_0, \quad (14)$$

where $\bar{K}_0$ represents the normalized form. Once we have the matrices $\tilde{K}$ and $\bar{K}_0$, we are able to calculate the feature vector $f$ as

$$\bar{K}_0 v_i = \lambda_i v_i, \quad (15)$$

$$f(i) = \frac{1}{\sqrt{\lambda_i}} v_i^T \tilde{K}, \quad (16)$$

where $v_i$ and $\lambda_i$ denote the eigenvectors and eigenvalues of the $\bar{K}_0$ matrix, respectively. The final step is the multiplication of the feature vector $f$ by the transpose of the eigenvector matrix of $\hat{K}$:

$$r = \Phi^T f, \quad (17)$$

where $\Phi$ is the eigenvector matrix of $\hat{K}$ and $r$ represents the result vector of test sample $z$. The test sample is assigned to the ROI class if the $\ell_2$ norm of $r$ is large; otherwise it is assigned to the clutter class. In order to summarize the proposed method, the steps of our algorithm are listed below.
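A rough sketch of the testing projection follows. This is not the paper's implementation: the Gaussian kernel with σ = 1, the division of the ROI Gram matrix by the number of samples as the normalization, and the cutoff for near-null eigendirections are all assumptions of this sketch, and it stops at the norm of the feature vector rather than reproducing the final eigenvector multiplication:

```python
import numpy as np

def gaussian_kernel_vec(X, z, sigma=1.0):
    """k(x_i, z) for each ROI training sample x_i."""
    return np.exp(-((X - z) ** 2).sum(axis=1) / (2.0 * sigma ** 2))

rng = np.random.default_rng(3)
X_roi = rng.random((20, 8))   # toy ROI training samples
z = rng.random(8)             # toy test sample

K_tilde = gaussian_kernel_vec(X_roi, z)

# ROI Gram matrix and an assumed per-sample normalization
K0 = np.exp(-((X_roi[:, None] - X_roi[None, :]) ** 2).sum(-1) / 2.0)
K0_bar = K0 / len(X_roi)

lam, v = np.linalg.eigh(K0_bar)
keep = lam > 1e-10            # discard near-null eigendirections

# Feature vector: projections scaled by 1/sqrt(eigenvalue)
f = (v[:, keep].T @ K_tilde) / np.sqrt(lam[keep])

score = np.linalg.norm(f)     # large score -> ROI, small -> clutter
```

Thresholding `score` then gives the binary ROI/clutter decision.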

Classification Results
In this section, classification results are presented. For each case, we selected a class among the 16 classes and marked it as the ROI class. Since it is not feasible to present graphical results for all classes, we labeled some of the classes as in Figure 2.
In particular, we show the classified images for class number 8 (Hay-windrowed). ROC curves are presented for the rest of the labeled classes. Finally, we present Table 2, which includes the recall, precision, and accuracy results for all classes.
In the first experiment, the "Hay-windrowed" class (labeled as number 8 in the figure) is selected as the ROI class. ROC curves for the three methods are shown in Figure 3(a). The results indicate that the kernel transformation remarkably improves the classification results. Our method also shows higher accuracy than SVM at the same false acceptance rate (FAR) levels. Table 2 shows the classification results for all classes; precision, recall, and accuracy results are presented in each column. The results show that K-FKT offers promising classification capability for the hyperspectral image classification problem.
Experiments show that the overall accuracy for some specific classes, such as "Corn-notill" and "Soybean-clean," is not higher than 80%. To clarify this, we investigated the samples of these difficult classes and realized that some classes are not well separable: our correlation analyses show that they are highly correlated with each other, which is the main reason for the lower accuracy. In this study, we have used only the spectral features of the dataset; employing spatial features as well (e.g., neighbourhood information) may improve the results.
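The correlation analysis mentioned above can be illustrated as follows: two synthetic classes built from the same underlying spectral shape produce near-identical mean signatures and hence a Pearson correlation close to 1. All data here are synthetic stand-ins, not Indian Pines spectra:

```python
import numpy as np

rng = np.random.default_rng(4)
base = rng.random(220)                          # shared spectral shape

# Two classes that differ only by small per-sample perturbations
class_a = base + 0.01 * rng.random((50, 220))
class_b = base + 0.01 * rng.random((50, 220))

mean_a = class_a.mean(axis=0)                   # mean spectral signatures
mean_b = class_b.mean(axis=0)

# Pearson correlation of the two mean signatures
r = np.corrcoef(mean_a, mean_b)[0, 1]
print(r > 0.9)  # True: near-identical signatures, hard to separate
```

A correlation this high between class signatures suggests that spectral features alone cannot separate the classes well, motivating the use of spatial features.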
It is usually not easy to make a fair comparison of two studies because of unknown experimental parameters. However, given its experimental similarities, [11] can be considered a comparison paper for our study. Samiappan et al. classify the same dataset using SVMs with nonuniform feature selection, and their results reach 75% overall accuracy. Our results represent a remarkable improvement, exceeding this with 86% overall accuracy.

Conclusions
In this paper, we have presented a solution for hyperspectral image classification problems using a supervised classification method called the Kernel Fukunaga-Koontz Transform, an improved version of classical FKT. Since classical FKT performs poorly on nonlinearly distributed data, we map the HSI data into a higher-dimensional kernel space by a kernel transformation. In that kernel space, each region can be separated by classical FKT with higher performance. The experimental results verify that the Kernel Fukunaga-Koontz Transform has higher classification performance than classical FKT and linear, polynomial, and radial basis SVM methods. Under these considerations, we conclude that Kernel FKT is a robust classification method for hyperspectral image classification. Our next goal is to use different kinds of kernel functions and investigate their effects on the classification results; we have an ongoing study comparing the performance of different kernels.

3.1. Training Stage.
Since the Fukunaga-Koontz Transform is a binary classification method, the training dataset is divided into two main classes. The region of interest (ROI) class is the first class to be classified, and the clutter class contains all other classes in the dataset except the ROI. The algorithm is initiated by collecting an equal number ($M$) of training samples for the ROI and clutter classes, represented as $X$ and $Y$, respectively. Similarly, $x_i$ and $y_i$ are the training samples (or training signatures) of the ROI and clutter classes:

$$X = [x_1, x_2, \ldots, x_M], \qquad Y = [y_1, y_2, \ldots, y_M]. \quad (1)$$
(1) Training stage
(i) Select $M$ training samples for the ROI and clutter classes.
(ii) Map the training samples into the kernel space using the "kernel trick" approach.
(iii) Calculate the transformation operator $P$ using the eigenvalues and eigenvectors.
(iv) Transform the matrices $\tilde{K}_0$ and $\tilde{L}_0$ into the eigenspace via the $P$ operator.
(2) Testing stage
(i) Map the test sample $z$ into the kernel space to obtain the kernel matrix $\tilde{K}$.
(ii) Use $\tilde{K}$ and the normalized ROI matrix $\bar{K}_0$ to calculate the feature vector $f$.
(iii) Use $f$ and the eigenvector matrix of $\hat{K}$ to reach the result value.
(iv) Make the final decision by thresholding the result value.
For a better view of the classification, result images are shown in Figures 3(b), 3(c), and 3(d), which are the results of K-FKT, radial basis SVM, and classical FKT, respectively. As shown in the figures, while classical FKT cannot classify the area, K-FKT and SVM classify it with high accuracy. Figures 4(a), 4(b), 4(c), and 4(d) show the ROC curves for four other classes, labeled 5 (Grass-pasture), 6 (Grass-trees), 10 (Soybean-notill), and 13 (Wheat), respectively. According to the results, K-FKT presents higher accuracy than the other classification methods. The results also indicate that the ROC curve may vary between classes, since the classes have different distributions.

Table 1: Class names and number of samples.