Hyperspectral remote sensing is a rapidly developing integrated technology that is widely used in numerous areas. The rich spectral information in hyperspectral images can aid in the classification and recognition of ground objects. However, the high dimensionality of hyperspectral images causes information redundancy, so the dimensionality of hyperspectral data must be reduced. This paper proposes a hybrid feature selection strategy based on the simulated annealing genetic algorithm (SAGA) and the Choquet fuzzy integral (CFI). The band selection method is built on subspace decomposition and combines the simulated annealing algorithm with the genetic algorithm, choosing different cross-over and mutation probabilities as well as mutation individuals. The selected bands are then further refined by CFI. Experimental results show that the proposed method can achieve higher classification accuracy than traditional methods.
1. Introduction
Hyperspectral remote sensors provide measurements of the Earth's surface with very high spectral resolution, usually resulting in hundreds of channels. Unlike multispectral sensors, this high spectral resolution renders hyperspectral remote sensors very powerful in applications requiring the identification of subtle differences in ground covers (e.g., material quantification and target detection). On the other hand, the large-dimensional data spaces generated by these sensors introduce challenging methodological problems. In the context of supervised classification, the most important methodological issue raised by these sensors is the so-called curse of dimensionality (also known as the Hughes effect), which occurs when the number of features and the number of available training samples are unbalanced [1].
Meanwhile, hyperspectral remote sensing images have nonlinear properties, which originate from multiple scattering between photons and ground targets, from within-pixel spectral mixing, and from scene heterogeneity. In addition, given that the pixel size in most remote sensing systems is large enough to include different types of land cover, classification errors arise and produce unreliable classification results. In this case, traditional classifiers may fail completely.
In the remote sensing literature, numerous methods have been developed to solve the hyperspectral data classification problem. A successful approach to hyperspectral data classification is based on the support vector machine (SVM). SVM separates two classes by identifying the optimal hyperplane that maximizes the margin between the closest training samples and the separating hyperplane. Data samples located at the margin border are referred to as support vectors and are used to create the decision surface. The properties of SVM for both full-dimensional and reduced-dimensional data, as well as multiclass SVM strategies, have been investigated in [2]. Hyperspectral image classification using different kernel-based approaches was analyzed and compared in [3], where SVM was found to be more useful than other kernel-based methods. SVM classification performance was compared with other well-known neural approaches in [4], which showed that SVM provides simplicity, robustness, and increased classification accuracy compared with neural networks. In addition, improved SVM methods have also been successfully used in hyperspectral image classification: a contextual SVM using Hilbert space embedding showed significant improvement over other methods on several hyperspectral images in [5], and a semisupervised method addressing a domain adaptation problem based on multiple-kernel SVMs for the classification of hyperspectral data was presented in [6]. Thus, SVM is well suited for hyperspectral image classification. However, dimension reduction is not sufficiently considered in SVM.
Commonly used dimension reduction methods fall into two categories, namely, feature selection and feature extraction. Feature extraction maps the high-dimensional feature space to a low-dimensional space via a linear or nonlinear transformation; however, because every band of hyperspectral data has its own corresponding image, the original physical interpretation of the image cannot be retained. Thus, feature extraction approaches are unsuitable for the dimension reduction of hyperspectral images. Given that the spectral distance between adjacent bands of hyperspectral data is only 10 nm and that the correlation between them is extremely high [7], considerable redundancy exists, which should be largely reduced by feature selection (band selection) methods to improve classification efficiency and accuracy. A semisupervised feature-selection technique for hyperspectral image classification was developed in [8], and a method for unsupervised band selection that transforms the hyperspectral data into complex networks was presented in [9]. Therefore, a new dimension reduction method is proposed that combines the simulated annealing genetic algorithm (SAGA) with the Choquet fuzzy integral (CFI).
A population and temperature ladder-based genetic algorithm (GA), the so-called SAGA, was recently proposed to sample from a distribution defined on a space of finite binary sequences. Feature selection strategies for hyperspectral images based on GA and SVM were proposed in [10, 11], and a GA-based feature selection combined with local Fisher's discriminant analysis-based feature projection was performed for effective dimensionality reduction in [12]. The SAGA method, however, works by simulating a parallel population of samples at different temperatures. The population is updated via selection, mutation, cross-over, and exchange operations that are highly similar to those of GA. SAGA has the learning capability of GA as well as the fast-mixing capability of parallel tempering (simulated tempering). In most cases, only the classification accuracy is used as the fitness function, and the internal relations between bands and classes are not taken into account. Considering this problem, a correction method based on CFI is proposed. The CFI does not assume the independence of one element from another and, based on any fuzzy measure, is employed to perform the overall evaluation of an input pattern [13]. Moreover, the fuzzy measure defined on an attribute is used as the relative degree of importance of that attribute, so the connection weights can be interpreted as the fuzzy measure values, or degrees of importance, of the respective input variables. The band selection method of this paper, based on SAGA and CFI (SAGA-CFI), can not only improve classification accuracy but also effectively reduce the uncertainty of the information, further improving accuracy.
Since hundreds of bands in the hyperspectral imagery exist, the direct search space for SAGA and CFI on the original band space becomes extremely huge. An adaptive subspace decomposition (ASD) method for hyperspectral data dimensionality reduction was proposed in [14]. To avoid the impact of enormous data sets on traditional statistical classification techniques, the ASD scheme is used. Thus, the differences between global and local statistical characteristics have been fully considered, and the problem presented by a limited number of training samples is then alleviated.
In this paper, we use SAGA and CFI in every subspace to choose suitable bands based on ASD, which differs from the previous work [5, 6, 8–12] in three aspects. First, ASD rather than mutual information is employed to divide the bands into disjoint subspaces. Although mutual information may perform better than ASD, it is not chosen in this paper because mutual information is interconnected with information entropy and can be directly formulated from it; it is better to keep ASD and CFI independent. Furthermore, building on GA, SAGA employs a schedule of temperatures and approaches the global minimum as the temperatures change gradually. Last but not least, CFI is employed for the first time to further optimize the band selection. Thus, we reduce the search space and computational complexity while avoiding the selection of an excessive number of adjacent bands.
The remainder of this paper is organized as follows. Section 2 introduces subspace decomposition. Section 3 presents the proposed SAGA. In Section 4, a brief description of three related elements and fuzzy measure followed by CFI is given. Section 5 provides the SVM classification adopted in this paper. Section 6 describes the proposed method. Experiments and analysis are demonstrated in Section 7. Finally, Section 8 concludes the paper.
2. Subspace Decomposition
The main characteristics of hyperspectral remote sensing data are a large number of imaging channels (approximately 220 bands) and a narrow band spectrum. The spectrum of hyperspectral data is highly concentrated, rendering the overall and local characteristics quite different. We may lose some important local characteristics if we select the bands from the total space. From the overall perspective, the bands are notably characterized by groups, and we can divide all bands into several groups wherever a lower correlation exists between adjacent bands. Subspace decomposition not only reduces the dimension of the images but also significantly improves the efficiency of data processing. Division of data sources based on ASD and fusion classification based on consensus theory was proposed in [15], so the commonly used method continues to be ASD. According to the correlation matrix between the bands of the hyperspectral image, the full data space is adaptively decomposed into numerous subspaces with different dimensionalities. In each subspace, the bands are strongly correlated and the energy is more concentrated. Hence, the full data dimensionality can be logically reduced.
Since different bands have different correlations, the subspaces do not all have the same dimensionality. The goal is to match the features of each subspace with one or a few classes. For this purpose, the new method primarily depends on the correlation matrix R between different bands. The elements of the correlation matrix R are defined as
(1) R_{ij} = \frac{E[(x_i - \mu_i)(x_j - \mu_j)]}{\sqrt{E[(x_i - \mu_i)^2]\, E[(x_j - \mu_j)^2]}}.
The value R_{ij} of the matrix R ranges between 0 and 1; the closer R_{ij} is to 1, the more correlated the two bands are. \mu_i and \mu_j are the mean values of x_i and x_j, respectively, and E[\cdot] denotes mathematical expectation.
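As a concrete illustration, Eq. (1) can be evaluated for all band pairs at once when the image cube is stored as a (bands × pixels) array; the following Python/NumPy sketch is only illustrative (the array layout and function name are our own assumptions, not part of the original method):

```python
import numpy as np

def band_correlation_matrix(cube):
    """Correlation matrix R between bands, Eq. (1).

    cube: array of shape (n_bands, n_pixels), one flattened image per band
    returns: (n_bands, n_bands) matrix with R[i, j] the band-pair correlation
    """
    x = cube - cube.mean(axis=1, keepdims=True)   # subtract each band mean mu_i
    cov = x @ x.T / x.shape[1]                    # E[(x_i - mu_i)(x_j - mu_j)]
    std = np.sqrt(np.diag(cov))                   # sqrt(E[(x_i - mu_i)^2])
    return cov / np.outer(std, std)               # normalise as in Eq. (1)
```

For adjacent hyperspectral bands the entries are typically close to 1, as Table 1 later illustrates.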
3. Simulated Annealing Genetic Algorithm
The traditional selection, cross-over, and mutation operators, together with fitness-proportional selection in GA, allow a superior chromosome to maintain or strengthen its predominance in subsequent generations, so the convergent chromosome may not be the overall optimal one. SAGA combines the simulated annealing algorithm with GA and can thus perform the temperature-control function of simulated annealing by controlling the selection probability [16]. If we want to sample from a distribution defined on a space of finite binary sequences, we employ the following:
(2) \pi(X) \propto \exp\left[-\frac{H(X)}{\tau}\right],
where X is the d-dimensional binary vector X={β1,β2,…,βd} with βi∈{0,1}, τ is the scale parameter (a so-called temperature that can be any value of interest), and H(X) is the fitness function in terms of GA.
First, a sequence of distributions π1(x1),π2(x2),…,πN(xN) is constructed as follows:
(3) \pi_i(x_i) = \frac{1}{Z(t_i)} \exp\left[-\frac{H(x_i)}{t_i}\right],
where, for i = 1, 2, \ldots, N, Z(t_i) = \sum_{x_i} \exp[-H(x_i)/t_i]. The temperatures T = (t_1, t_2, \ldots, t_N) form a ladder with the order t_1 > t_2 > \cdots > t_{N-1} > t_N, and we always set t_N = \tau so that \pi_N(x) = \pi(x) is the target distribution from which we wish to sample. X = \{x_1, x_2, \ldots, x_N\} denotes a population of samples, where x_i = \{\beta_1^i, \beta_2^i, \ldots, \beta_d^i\} is a sample from \pi_i(x), called a chromosome or an individual in terms of GA, and N is the population size. In SAGA, the Boltzmann distribution of the population is expressed as
(4) \pi(X) = \prod_{i=1}^{N} \pi_i(x_i) = \frac{1}{Z(T)} \exp\left[-\sum_{i=1}^{N} \frac{H(x_i)}{t_i}\right],
where Z(T) = \prod_{i=1}^{N} Z(t_i). The population is updated by selection, cross-over, mutation, and exchange operators.
3.1. Selection
The probability of having the chromosome chosen first is
(5) p(x_i) = \frac{\exp[-H(x_i)/t]}{Z(X)},
and the probability of selecting the pair (x_i, x_j) is
(6) p[(x_i, x_j) \mid X] = \frac{\exp[-H(x_i)/t] + \exp[-H(x_j)/t]}{(N-1)\, Z(X)},
where Z(X)=∑i=1Nexp[-H(xi)/t].
3.2. Cross-Over
One chromosome pair, such as xi and xj (i≠j), is selected from the current population X through the roulette wheel. Two offspring, yi and yj, are generated according to a specific cross-over operator. A new population Y is proposed as Y={x1,x2,x3,…,yi,…,yj,…xN} and is accepted with probability min(1,pc) according to the Metropolis-Hastings rule that is expressed as follows:
(7) p_c = \exp\left\{-\frac{H(y_i) - H(x_i)}{t_i} - \frac{H(y_j) - H(x_j)}{t_j}\right\} \times \frac{P[(y_i, y_j) \mid Y]}{P[(x_i, x_j) \mid X]},
where P[(yi,yj)∣Y] denotes the selection probability of (yi,yj) from the population Y and P[(xi,xj)∣X] denotes the selection probability of (xi,xj) from the population X.
3.3. Mutation
We define the mutation operator as an additional move of the Metropolis-Hastings rule. One chromosome, such as xk, is uniformly chosen from the current population X. A new chromosome is generated by the addition of a random vector ek, such that
(8) y_k = x_k + e_k,
where ek is usually chosen to achieve moderate acceptance probability for the mutation operation. The new population Y={x1,x2,x3,…,yk,…xN} is accepted with the probability min(1,pm) according to the following Metropolis-Hastings rule:
(9) p_m = \exp\left\{-\frac{H(y_k) - H(x_k)}{t_k}\right\} \cdot \frac{T(X \mid Y)}{T(Y \mid X)}.
3.4. Exchange
A straightforward implementation of parallel tempering can outperform simulated annealing in several crucial respects, and parallel tempering can offer a powerful alternative to simulated annealing for combinatorial optimization problems [17]. Given the current population X and the attached temperature ladder T in {X, T} = {x_1, t_1, x_2, t_2, \ldots, x_m, t_m, x_{m+1}, t_{m+1}, \ldots, x_N, t_N}, we propose to obtain a new population Y by exchanging x_m and x_{m+1} without changing T; that is, {Y, T} = {x_1, t_1, x_2, t_2, \ldots, x_{m+1}, t_m, x_m, t_{m+1}, \ldots, x_N, t_N}. The new population is then accepted with probability min(1, p_e) according to the Metropolis-Hastings rule below:
(10) p_e = \exp\left\{[H(x_m) - H(x_{m+1})]\left(\frac{1}{t_m} - \frac{1}{t_{m+1}}\right)\right\}.
3.5. Fitness Function
Another key element of SAGA is the design of the fitness function. We use the classification accuracy obtained from the training feature subset as the fitness function. The purpose of the iterative repetition is to determine the optimal feature subset, that is, the subset that maximizes classification accuracy. The adopted classifier is SVM, which is described in Section 5.
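To make the update rules concrete, the following Python sketch implements the mutation move of (8)-(9) with a symmetric bit-flip proposal (so the proposal ratio T(X|Y)/T(Y|X) cancels) and the exchange move of (10). The selection and cross-over operators are omitted for brevity; all names and the flip probability are illustrative assumptions, not part of the original formulation:

```python
import math
import random

def saga_step(pop, temps, H, p_flip=0.05):
    """One schematic SAGA update: a mutation move, then a neighbour exchange.

    pop:   list of binary chromosomes (lists of 0/1), one per temperature
    temps: temperature ladder t_1 > ... > t_N
    H:     fitness (energy) function to minimise
    """
    # --- mutation, Eq. (9); the bit-flip proposal is symmetric, so the
    #     proposal ratio equals 1 and only the energy difference remains ---
    k = random.randrange(len(pop))
    y = [b ^ (random.random() < p_flip) for b in pop[k]]
    if random.random() < min(1.0, math.exp(-(H(y) - H(pop[k])) / temps[k])):
        pop[k] = y
    # --- exchange, Eq. (10), between two neighbouring temperatures ---
    m = random.randrange(len(pop) - 1)
    delta = (H(pop[m]) - H(pop[m + 1])) * (1 / temps[m] - 1 / temps[m + 1])
    if random.random() < min(1.0, math.exp(delta)):
        pop[m], pop[m + 1] = pop[m + 1], pop[m]
    return pop
```

In the band selection setting of this paper, H(x) would be, for instance, the negative classification accuracy of the band subset encoded by the binary chromosome x; the exchange move tends to migrate good chromosomes toward the colder temperatures.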
4. Choquet Fuzzy Integral
Based on subspace decomposition, the CFI method is used to further refine the selected bands. The definitions of the fuzzy measure and the Choquet integral are given in [18, 19].
Definition 1 (fuzzy measure (see [18])).
Denote by B the Borel field obtained from the domain X = {x_1, x_2, \ldots, x_n}. A fuzzy measure g: B \to [0, 1] on B must satisfy the following conditions:
(1) g(X) = 1 and g(\emptyset) = 0, where \emptyset is the empty set;
(2) given two subsets A, B \in B, if A \subset B, then g(A) \le g(B);
(3) if A_1 \subset A_2 \subset \cdots \subset A_n \subset \cdots, then \lim_{i \to \infty} g(A_i) = g(\lim_{i \to \infty} A_i).
According to the definition of the fuzzy measure, Sugeno introduced the g-fuzzy measure.
Definition 2 (g-fuzzy measure (see [19])).
For all sets A, B \subset X with A \cap B = \emptyset, there exists \lambda > -1 satisfying
(11) g(A \cup B) = g(A) + g(B) + \lambda g(A) g(B).
Obviously, when \lambda = 0, the g-fuzzy measure reduces to a probability measure.
Given a finite set X = {x_1, x_2, \ldots, x_n}, let g_i = g({x_i}) and call the mapping g: x_i \to g_i the fuzzy density function, also interpreted as the single-point importance. If A = {x_{i_1}, x_{i_2}, \ldots, x_{i_m}} \subseteq X, then from (11) the following formula can be deduced:
(12) g(A) = \frac{1}{\lambda}\left[\prod_{x_i \in A} (1 + \lambda g_i) - 1\right],
where \lambda > -1 and \lambda \neq 0. Because g(X) = 1, the value of \lambda is obtained by solving \lambda + 1 = \prod_{i=1}^{n} (1 + \lambda g_i). It can be proved that, for a fixed set {g_i} with 0 < g_i < 1, there exists one and only one \lambda \in (-1, \infty), \lambda \neq 0, satisfying this equation. Hence, once the fuzzy densities g_i (i = 1, 2, \ldots, n) are given, the g-fuzzy measure is uniquely determined.
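For illustration, the unique nonzero root can be bracketed and located by bisection, since the sign of \sum g_i - 1 tells on which side of zero the root lies; the following Python sketch is one possible implementation (the function name and iteration count are our own choices):

```python
def sugeno_lambda(densities, iters=200):
    """Unique nonzero root of 1 + lam = prod(1 + lam * g_i), lam > -1.

    densities: fuzzy densities g_i with 0 < g_i < 1 (at least two sources).
    Returns 0.0 in the additive case sum(g_i) == 1 (probability measure).
    """
    assert len(densities) >= 2, "need at least two sources"

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    s = sum(densities)
    if abs(s - 1.0) < 1e-12:
        return 0.0
    if s > 1.0:                     # root lies in (-1, 0): f(-1) > 0, f(0-) < 0
        lo, hi = -1.0, 0.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if f(mid) > 0.0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)
    hi = 1.0                        # root lies in (0, inf): f(0+) < 0
    while f(hi) < 0.0:              # grow the bracket until f changes sign
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

This addresses the computational concern raised below: for a handful of sources (three in this paper), the root-finding cost is negligible.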
In the theory of information fusion, the fuzzy density g_i serves as the importance, or contribution, of the source x_i. The group of sources A thus determines a unique g-fuzzy measure in the data fusion process. Based on the g-fuzzy measure, Choquet proposed a fuzzy integral method.
Definition 3 (Choquet integral (see [19])).
Given a function h: X \to [0, 1], its Choquet integral with respect to the fuzzy measure g is defined as
(13) C_\mu(h(x_1), h(x_2), \ldots, h(x_n)) = \sum_{i=1}^{n} \left[h(x_i) - h(x_{i-1})\right] g(A_i).
In this equation, the value h(x_i) can be interpreted as a credibility estimate of the source x_i for a specific target. The sources are assumed to be sorted so that h is nondecreasing, 0 = h(x_0) \le h(x_1) \le h(x_2) \le \cdots \le h(x_n) \le 1; the fuzzy measure g gives the importance, or contribution, of each information source with respect to the final decision or estimate; and A_i = \{x_i, x_{i+1}, \ldots, x_n\}.
According to (13), the CFI can be seen as a weighted sum of h(x_1), h(x_2), \ldots, h(x_n), where the weights depend on the rank of the {x_i} and the values h(x_i) determine that rank; the CFI is therefore a nonlinear function of h. When \lambda = 0, the g-fuzzy measure is a probability measure and the CFI reduces to a linear function of h. The CFI is used in data fusion by regarding h(x_i) \in [0, 1] as the result of a goal judgment and g as the degree of importance or contribution. The CFI is thus a nonlinear combination of the results of the information sources with their importances.
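The sort-and-accumulate form of Eq. (13) is straightforward to implement; the sketch below represents the fuzzy measure as a caller-supplied set function, which is an illustrative design choice rather than part of the original formulation:

```python
def choquet_integral(h, g_of):
    """Discrete Choquet integral, Eq. (13).

    h:    dict mapping each source to its credibility h(x_i) in [0, 1]
    g_of: function mapping a frozenset of sources to its fuzzy measure g(A)
    """
    order = sorted(h, key=h.get)        # sort so h(x_(1)) <= ... <= h(x_(n))
    total, prev = 0.0, 0.0
    for i, x in enumerate(order):
        a_i = frozenset(order[i:])      # A_i = {x_(i), ..., x_(n)}
        total += (h[x] - prev) * g_of(a_i)
        prev = h[x]
    return total
```

With an additive measure (the \lambda = 0 case above), the integral reduces to an ordinary weighted sum, which is a useful sanity check.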
Before computing the fuzzy integral, the value of \lambda must be determined. From (12), \lambda is the root of a high-order polynomial; when many sources are involved, computing \lambda imposes a considerable burden, which hinders online, real-time use of the algorithm.
5. SVM Classification Methods
Training data are required to train the SVM model; however, these data cannot always be separated without errors. The data points closest to the hyperplane are used to measure the margin, and SVM attempts to identify the hyperplane that maximizes the margin while minimizing a quantity proportional to the number of misclassification errors [20, 21]. SVM derives the optimal hyperplane as the solution of the following convex quadratic programming problem [22]:
(14) \min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i, \quad \text{s.t.} \;\; y_i (w^T \phi(x_i) + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \; i = 1, 2, \ldots, n,
where {(x_1, y_1), \ldots, (x_n, y_n)} is the labeled training set with x_i \in R^d and y_i \in \{-1, 1\}; w^* and b^* define a linear classifier in the feature space; C is a user-defined regularization parameter; and \xi_i is a positive slack variable that handles permitted errors.
The aforementioned optimization problem can be reformulated through a Lagrange function, where Lagrange multipliers can be found via dual optimization to generate a convex quadratic programming solution as follows [23–25]:
(15) \max \; Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j), \quad \text{s.t.} \;\; \sum_{i=1}^{n} \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C, \; i = 1, 2, \ldots, n,
where α=[α1,α2,…,αn] is the vector of Lagrange multipliers, while K(•,•) is a kernel function which is introduced as follows [26]:
(16)K(xi,xj)=ϕ(xi)·ϕ(xj).
Thus, the final result is a discrimination function F(x) conveniently expressed as a function of the data in the original (lower) dimensional feature space [27]:
(17) F(x) = \operatorname{sgn}\left[(w^*)^T \phi(x) + b^*\right] = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i^* y_i K(x_i, x) + b^*\right).
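For illustration, once a solver has produced the multipliers \alpha_i^* and the bias b^*, evaluating the decision function of Eq. (17) with the RBF kernel used later in this paper takes only a few lines of Python; the trained values are assumed given, and all names are illustrative:

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2), a common choice for Eq. (16)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(x, support_vectors, alphas, labels, b):
    """Sign of the kernel expansion in Eq. (17), given trained alpha_i*, y_i, b*."""
    s = sum(a * y * rbf_kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vectors)) + b
    return 1 if s >= 0 else -1
```

In practice the multipliers would come from a quadratic-programming solver for (15); this sketch only shows how the dual solution is turned into class labels.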
6. The Proposed Method
6.1. The Subspace Decomposition Method
In the beginning, adaptive subspace decomposition is used to divide the bands into seven subspaces according to (1). All R_{ij} values are computed, and a proper threshold T_b is set. Consecutive bands with |R_{ij}| \ge T_b are then placed in the same subspace. The number of subspaces and the number of bands in each subspace can be controlled dynamically by changing the threshold T_b.
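A simplified reading of this thresholding rule, which starts a new subspace whenever the correlation between adjacent bands falls below T_b, can be sketched as follows (the function name and the adjacent-band simplification are our own assumptions; the full ASD scheme of [14] may use richer criteria):

```python
def decompose_subspaces(R, tb=0.8):
    """Group consecutive bands into subspaces while |R[i][i+1]| >= tb.

    R:  band correlation matrix (list of lists or 2-D array)
    tb: correlation threshold T_b
    Returns a list of (first_band, last_band) index pairs (0-based).
    """
    n = len(R)
    subspaces, start = [], 0
    for i in range(n - 1):
        if abs(R[i][i + 1]) < tb:       # correlation break: close the subspace
            subspaces.append((start, i))
            start = i + 1
    subspaces.append((start, n - 1))
    return subspaces
```

Raising tb produces more, smaller subspaces; lowering it merges bands into fewer, larger ones, which matches the dynamic control described above.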
6.2. The Band Order Method in Subspace
SAGA is used to find the optimal bands in each subspace. We choose the common binary coding method as the genetic coding mode, and the number of SAGA iterations is 50. Generally, a subspace has many bands, and all suitable bands should be chosen. Meanwhile, if a subspace has only one band, that band must be chosen.
6.3. The Band Reorder Method in Subspace
After the bands are chosen by SAGA, they can be further optimized using the CFI method. CFI takes into account the entropy of information, the correlation coefficient, and the standard distance between the means.
6.3.1. Entropy of Information and Variance
According to Shannon's information theory, entropy measures information content in terms of uncertainty. The entropy of the hyperspectral components represents the information content of each component. Thus, the higher the entropy, the richer the information content, resulting in a more meaningful representation. The entropy or total information [28] is defined as
(18) H(X) = -\sum_{x} p_X(x) \ln p_X(x),
where pX(x) is the probability of pixel value x.
Variance represents the deviation of the gray-scale values of the pixels from the mean value. The mean value \mu_i and the variance \sigma_i^2 are computed as follows [29]:
(19) \mu_i = \frac{1}{M \times N} \sum_{x=1}^{M} \sum_{y=1}^{N} f_i(x, y),
(20) \sigma_i^2 = \frac{1}{M \times N} \sum_{x=1}^{M} \sum_{y=1}^{N} \left[f_i(x, y) - \mu_i\right]^2,
where i and i + 1 are the indices of two adjacent bands, M and N are the width and height of the image, and f_i(x, y) is the gray-scale value of pixel (x, y).
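Eqs. (18)-(20) can be computed directly from a band's pixel values; a minimal Python sketch, in which the flattened pixel-list representation is an assumption made for illustration:

```python
import math
from collections import Counter

def band_entropy(pixels):
    """Shannon entropy H = -sum p ln p over the gray-level histogram, Eq. (18)."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def band_mean_variance(pixels):
    """Mean and variance of the gray-scale values, Eqs. (19)-(20)."""
    n = len(pixels)
    mu = sum(pixels) / n
    var = sum((v - mu) ** 2 for v in pixels) / n
    return mu, var
```

A constant band carries zero entropy, while a band whose gray levels are uniformly spread attains the maximum ln of the number of distinct levels.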
6.3.2. Correlation Coefficients
In statistics, the correlation coefficient denotes the accuracy of a least square fitting to the original data. It is a normalized measure of the strength of the linear relationship between two variables. Correlation is employed in many types of applications, such as hyperspectral image processing where it is used to measure and to quantitatively compare the similarity between bands [30]. The two-dimensional normalized correlation function for image processing is shown below:
(21) CC = \frac{\sum_{x=1}^{M} \sum_{y=1}^{N} \left[f_i(x, y) - \mu_i\right]\left[f_{i+1}(x, y) - \mu_{i+1}\right]}{\sqrt{\sum_{x=1}^{M} \sum_{y=1}^{N} \left[f_i(x, y) - \mu_i\right]^2 \; \sum_{x=1}^{M} \sum_{y=1}^{N} \left[f_{i+1}(x, y) - \mu_{i+1}\right]^2}},
where CC is a real number between −1 and 1.
6.3.3. Standard Distance between the Means
The object classes must be analyzed to determine in which band they are easiest to distinguish [31], that is, by the statistical distance between object classes in that band. The standard distance between the means, d, is defined as
(22) d = \frac{|\mu_1' - \mu_2'|}{\sigma_1' + \sigma_2'},
where \mu_1' and \mu_2' are the spectral means of the corresponding regions of the two samples, and \sigma_1' and \sigma_2' are the standard deviations of those regions. Thus d reflects the separability of the two samples in each band.
Then, the procedure of the band reorder method using CFI is as follows.
According to (18), entropies of information in each subspace are computed and recorded as u1.
According to (21), the correlation coefficients in each subspace are computed and recorded as u2.
According to (22), standard distances between the means in each subspace are computed and recorded as u3.
Belief functions are constructed on the domain U = {u_1, u_2, u_3}. The relations between the index value of each factor and the band reordering are as follows. (1) The bigger the entropy, the richer the information. (2) The smaller the correlation coefficient, the more independent the band. (3) The bigger the standard distance between the means, the easier it is to distinguish the two samples. The belief functions of CFI are therefore as follows:
(23) h(u_1) = \frac{u_1 - u_{1\min}}{u_{1\max} - u_{1\min}}, \quad h(u_2) = \frac{u_{2\max} - u_2}{u_{2\max} - u_{2\min}}, \quad h(u_3) = \frac{u_3 - u_{3\min}}{u_{3\max} - u_{3\min}},
where 0 \le h(u_i) \le 1 for each factor. The values in (23) are reordered to generate (24):
(24) h'(u_1) = \min\{h(u_1), h(u_2), h(u_3)\}, \quad h'(u_2) = \operatorname{mid}\{h(u_1), h(u_2), h(u_3)\}, \quad h'(u_3) = \max\{h(u_1), h(u_2), h(u_3)\},
where h'(u_1), h'(u_2), and h'(u_3) are the minimum, median, and maximum of the three values, respectively.
Another important quantity to determine is the fuzzy measure g. The belief functions are arranged in ascending order, and the largest is of primary importance. In each subspace, let S = h'(u_1) + h'(u_2) + h'(u_3); then
(25) g(u_1) = \frac{h'(u_1)}{S}, \quad g(u_2) = \frac{h'(u_2)}{S}, \quad g(u_3) = \frac{h'(u_3)}{S}.
The formula of computing fuzzy integral value is as follows:
(26) C = \sum_{i=1}^{3} \left[h'(u_i) - h'(u_{i-1})\right] g(A_i),
where A_i = \{u_i, u_{i+1}, \ldots, u_3\} and h'(u_0) = 0.
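Eqs. (23)-(26) combine into a per-band score. The sketch below interprets g(A_i) in (26) additively, as the sum of the normalized densities of the sources remaining in A_i, which is one plausible reading of (25)-(26); the function name, the ranges dictionary, and this interpretation are our own assumptions:

```python
def band_index_value(u1, u2, u3, ranges):
    """CFI index value of one band, Eqs. (23)-(26).

    u1, u2, u3: entropy, correlation coefficient, and standard distance of the band
    ranges: per-factor (min, max) over the subspace, e.g.
            {"u1": (u1min, u1max), "u2": (u2min, u2max), "u3": (u3min, u3max)};
            each min must be strictly below the corresponding max
    """
    # belief functions, Eq. (23): high entropy/distance and low correlation
    # are all mapped toward 1
    h1 = (u1 - ranges["u1"][0]) / (ranges["u1"][1] - ranges["u1"][0])
    h2 = (ranges["u2"][1] - u2) / (ranges["u2"][1] - ranges["u2"][0])
    h3 = (u3 - ranges["u3"][0]) / (ranges["u3"][1] - ranges["u3"][0])
    hp = sorted([h1, h2, h3])            # h'(u_1) <= h'(u_2) <= h'(u_3), Eq. (24)
    s = sum(hp)
    g = [h / s for h in hp]              # normalised fuzzy densities, Eq. (25)
    # fuzzy integral value, Eq. (26), with g(A_i) taken as the summed density
    # of the sources still in A_i = {u_i, ..., u_3}
    prev, total = 0.0, 0.0
    for i in range(3):
        total += (hp[i] - prev) * sum(g[i:])
        prev = hp[i]
    return total
```

Bands are then ranked within each subspace by this score, larger values indicating more informative, less redundant, and more separable bands.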
6.4. Flowchart of the Proposed Method
The flowchart of this paper is illustrated in Figure 1.
Overall structure of the proposed method.
7. Experiments and Analysis
7.1. Hyperspectral Images
Experiments were conducted on a hyperspectral data set acquired over the Northwest Indiana Indian Pine test site 3 (a 2 × 2 mile portion of Northwest Tippecanoe County, Indiana) on June 12, 1992. The data comprise 145 × 145 pixels and 220 bands. The false color image, composed of bands 89, 5, and 120, is shown in Figure 2.
AVIRIS false color image.
7.2. Subspace Decomposition Experiment
The ASD scheme is used to obtain the correlation value between the bands. Table 1 gives the values of the parts of the correlation matrix R according to (1).
Correlation matrix R between bands 5 and 15.

Bands   5       6       7       8       9       10      11      12      13      14      15
5       1.0000  0.9793  0.9823  0.9813  0.9809  0.9800  0.9785  0.9769  0.9750  0.9710  0.9667
6       0.9793  1.0000  0.9889  0.9892  0.9886  0.9881  0.9869  0.9857  0.9835  0.9798  0.9755
7       0.9823  0.9889  1.0000  0.9920  0.9930  0.9926  0.9918  0.9911  0.9892  0.9856  0.9819
8       0.9813  0.9892  0.9920  1.0000  0.9938  0.9941  0.9935  0.9929  0.9911  0.9875  0.9836
9       0.9809  0.9886  0.9930  0.9938  1.0000  0.9950  0.9954  0.9949  0.9934  0.9897  0.9861
10      0.9800  0.9881  0.9926  0.9941  0.9950  1.0000  0.9959  0.9959  0.9944  0.9907  0.9872
11      0.9785  0.9869  0.9918  0.9935  0.9954  0.9959  1.0000  0.9966  0.9959  0.9926  0.9894
12      0.9769  0.9857  0.9911  0.9929  0.9949  0.9959  0.9966  1.0000  0.9968  0.9943  0.9915
13      0.9750  0.9835  0.9892  0.9911  0.9934  0.9944  0.9959  0.9968  1.0000  0.9965  0.9949
14      0.9710  0.9798  0.9856  0.9875  0.9897  0.9907  0.9926  0.9943  0.9965  1.0000  0.9977
15      0.9667  0.9755  0.9819  0.9836  0.9861  0.9872  0.9894  0.9915  0.9949  0.9977  1.0000
As presented in Table 1, the autocorrelation coefficient of each band is equal to 1, and the correlation value is very high. In this paper, the ASD method is performed using the correlation criterion of a given threshold, which is 0.8. The full data space is decomposed into seven subspaces. The dimensions of each subspace are shown in Table 2.
Dimensions and bands of each subspace.

Subspace         1      2      3    4      5      6      7
Band range       1–15   16–35  36   37–38  39–76  77–97  98–220
Number of bands  15     20     1    2      38     21     123
From the 220 spectral channels acquired by the AVIRIS sensor, 41 bands were discarded because they were affected by atmospheric problems. The discarded bands were as follows: 1–4, 78, 80–86, 103–110, 149–165, and 217–220. As a result, the new dimensions of each subspace are shown in Table 3.
New dimensions and bands of each subspace.

Subspace         1     2      3   4      5      6   7   8      9       10       11
Band range       5–15  16–35  36  37–38  39–76  77  79  87–97  98–102  111–148  166–216
Number of bands  11    20     1   2      38     1   1   11     5       38       51
7.3. SAGA in Each Subspace
The hyperspectral image is categorized into seven classes according to the ground reference data. The ratio of training to test samples is 1:3 because SVM is suitable for small sample sizes. SAGA is applied in each subspace, and the fitness is computed and illustrated in Figure 3. We select the optimal band(s) in each subspace. SAGA is unnecessary in subspaces 3, 6, and 7 because each of these subspaces contains only one band. The kernel function used is a radial basis function, and the two SVM parameters (i.e., C and \gamma) are selected by fivefold cross-validation during the training phase. The search range is [2^{-3}, 2^{10}] for C and [2^{-8}, 2^{2}] for \gamma.
(a) The fitness curve of subspace 1; (b) the fitness curve of subspace 2; (c) the fitness curve of subspace 4; (d) the fitness curve of subspace 5; (e) the fitness curve of subspace 8; (f) the fitness curve of subspace 9; (g) the fitness curve of subspace 10; and (h) the fitness curve of subspace 11.
7.4. Index Value of CFI
The entropy, correlation coefficient, and standard distance between the means of each band are computed. The CFI index values are then obtained and sorted in descending order in each subspace: the bigger the index value, the more important the band. T_c is a given threshold on the index value. Table 4 shows the index values of the bands when the threshold T_c is 0.940.
Band number and index value in each subspace.

Subspace   Band No.   Index value
1          11         0.9419
2          17         0.9732
           18         0.9706
           16         0.9604
           19         0.9502
3          36         1
4          37         0.9868
5          71         0.9667
           69         0.9658
           70         0.9652
           73         0.9642
           67         0.9638
           72         0.9636
           74         0.9636
6          77         1
7          79         1
8          88         0.9813
           89         0.9659
9          101        0.9608
10         119        0.9689
           120        0.9689
           121        0.9687
           122        0.9685
           118        0.9683
           123        0.9681
           124        0.9675
           125        0.9671
           126        0.9669
           127        0.9663
           128        0.9663
           129        0.9660
           130        0.9659
11         184        0.9659
SAGA determines which band or bands are selected in each subspace, but it cannot indicate which bands have higher priority than others. The CFI index values therefore further refine the selected bands, yielding a more effective optimization.
7.5. Computational Time Complexity
One issue that needs to be considered is that the proposed procedure may consume considerable time. Thus, we compare the time complexity of four methods, GA, SAGA, CFI, and SAGA-CFI, in this part. The time complexity of SAGA-CFI is O(n^2), the same as that of the other three methods. This means that the processing cost of SAGA-CFI is no higher than the others.
7.6. Classification Experiment
The hyperspectral image is also categorized into seven classes, while the ratio of training and test samples remains 1 : 3. The numbers of training samples and of test samples are shown in Table 5.
Numbers of training samples and of test samples.

Class                       1     2     3     4     5     6     7
Number of training samples  48    485   632   86    777   749   163
Number of test samples      150   1435  2192  235   2417  2469  516
In this work, we implement two similar classification methods for hyperspectral images to compare with the proposed algorithm. One method is based on SAGA and SVM classification (SAGA-SVM); the other is based on CFI and SVM classification (CFI-SVM). The method of this paper is based on SAGA, CFI, and SVM classification (SAGA-CFI-SVM). The error matrices of the three methods are presented in Tables 6, 7, and 8, while the total accuracy and Kappa value are shown in Table 9. The threshold T_c is 0.94 in all of the above methods. Table 10 shows how the total accuracy and Kappa value change with the threshold T_c.
Error matrix of SAGA-SVM.

Classified data   Reference data                                          Row total   UA
                  1       2       3       4       5       6       7
1                 137     0       35      3       51      44      0       270         0.5074
2                 0       1328    42      4       41      0       9       1424        0.9326
3                 1       0       2024    0       45      107     0       2177        0.9297
4                 4       22      38      222     0       0       13      299         0.7425
5                 5       28      0       0       2235    47      10      2325        0.9613
6                 0       36      53      3       0       2271    8       2371        0.9578
7                 3       21      0       3       45      0       476     548         0.8686
Column total      150     1435    2192    235     2417    2469    516     9414
PA                0.9133  0.9254  0.9234  0.9446  0.9247  0.9198  0.9225              0.9249
Error matrix of CFI-SVM (Tc = 0.94).

Classified data   Reference data                                          Row total   UA
                  1       2       3       4       5       6       7
1                 134     0       69      5       66      70      0       344         0.3895
2                 0       1276    55      9       75      0       15      1430        0.8923
3                 5       0       1937    0       75      135     0       2152        0.9001
4                 1       41      62      210     0       0       14      328         0.6402
5                 6       39      0       0       2114    63      20      2242        0.9429
6                 0       35      69      6       0       2201    15      2326        0.9463
7                 4       44      0       5       87      0       452     592         0.7635
Column total      150     1435    2192    235     2417    2469    516     9414
PA                0.8933  0.8892  0.8837  0.8936  0.8746  0.8915  0.8760              0.8841
Error matrix of SAGA-CFI-SVM (Tc = 0.94).

Classified data   Reference data                                          Row total   UA
                  1       2       3       4       5       6       7
1                 136     0       32      1       53      49      0       271         0.5018
2                 0       1323    43      2       35      0       4       1407        0.9403
3                 5       0       2058    0       50      95      0       2208        0.9321
4                 3       23      27      227     0       0       13      293         0.7747
5                 2       24      0       0       2239    29      13      2307        0.9705
6                 0       35      32      1       0       2296    9       2373        0.9676
7                 4       30      0       4       40      0       477     555         0.8595
Column total      150     1435    2192    235     2417    2469    516     9414
PA                0.9067  0.9220  0.9389  0.9660  0.9264  0.9299  0.9244              0.9301
Total accuracy and Kappa value (Tc=0.94).

Index           Total accuracy   Kappa value
SVM                     86.72%        0.8315
GA-SVM                  90.28%        0.8752
SAGA-SVM                92.49%        0.9030
CFI-SVM                 88.41%        0.8539
SAGA-CFI-SVM            93.01%        0.9114
Total accuracy and Kappa value with changing Tc.

                          Total accuracy                    Kappa value
Index             Tc=0.940  Tc=0.965  Tc=0.970    Tc=0.940  Tc=0.965  Tc=0.970
CFI-SVM             88.41%    89.12%    68.75%      0.8539    0.8654    0.6173
SAGA-CFI-SVM        93.01%    93.91%    72.52%      0.9114    0.9223    0.6355
In the error matrix, the producer's accuracy (PA) is defined as
(27) PA_i = x_{i,i} / x_{+i},
and the user's accuracy (UA) is defined as
(28) UA_i = x_{i,i} / x_{i+},
where x_{i,i} is the value on the major diagonal of the ith row of the error matrix, x_{i+} is the total of the ith row, and x_{+i} is the total of the ith column.
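As a check on these definitions, PA and UA can be computed directly from an error matrix. The sketch below (plain NumPy; not code from the paper) verifies the class-1 figures of the SAGA-CFI-SVM matrix in Table 8 (x_{1,1} = 136, column total x_{+1} = 150, row total x_{1+} = 271) on a two-class slice whose off-diagonal entries are chosen only to reproduce those totals.

```python
import numpy as np

def producer_user_accuracy(cm):
    """PA_i = x_{i,i} / x_{+i} (column total); UA_i = x_{i,i} / x_{i+} (row total).
    Convention as in the paper: rows = classified data, columns = reference data."""
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    return diag / cm.sum(axis=0), diag / cm.sum(axis=1)

# Two-class slice built from the class-1 totals of Table 8.
cm = [[136, 135],
      [ 14, 100]]
pa, ua = producer_user_accuracy(cm)
print(round(pa[0], 4), round(ua[0], 4))   # 0.9067 0.5018
```

The values match the PA and UA reported for class 1 in Table 8.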
To measure the agreement between the classification and the reference data, we compute the Kappa coefficient based on the following equation:
(29) Kappa = [N * Sum_{i=1}^{k} x_{i,i} - Sum_{i=1}^{k} (x_{i+} * x_{+i})] / [N^2 - Sum_{i=1}^{k} (x_{i+} * x_{+i})],
where N is the total number of test pixels and k is the number of classes.
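As a numerical check, the sketch below (plain NumPy; not code from the paper) applies the total-accuracy definition and equation (29) to the SAGA-CFI-SVM error matrix of Table 8, reproducing the 93.01% total accuracy and 0.9114 Kappa value reported in Table 9.

```python
import numpy as np

# SAGA-CFI-SVM error matrix from Table 8
# (rows = classified data, columns = reference data).
cm = np.array([
    [136,    0,   32,   1,   53,   49,   0],
    [  0, 1323,   43,   2,   35,    0,   4],
    [  5,    0, 2058,   0,   50,   95,   0],
    [  3,   23,   27, 227,    0,    0,  13],
    [  2,   24,    0,   0, 2239,   29,  13],
    [  0,   35,   32,   1,    0, 2296,   9],
    [  4,   30,    0,   4,   40,    0, 477],
], dtype=float)

N = cm.sum()                                       # 9414 test pixels
diag_sum = np.trace(cm)                            # sum of x_{i,i}
chance = (cm.sum(axis=1) * cm.sum(axis=0)).sum()   # sum of x_{i+} * x_{+i}

total_accuracy = diag_sum / N
kappa = (N * diag_sum - chance) / (N**2 - chance)
print(round(total_accuracy, 4), round(kappa, 4))   # 0.9301 0.9114
```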
The original reference image and the classification images produced by the compared methods are illustrated in Figure 4.
(a) The original reference image; (b) the SVM classification image; (c) the GA-SVM classification image; (d) the SAGA-SVM classification image; (e) the CFI-SVM (Tc=0.94) classification image; (f) the SAGA-CFI-SVM (Tc=0.94) classification image; (g) the CFI-SVM (Tc=0.965) classification image; and (h) the SAGA-CFI-SVM (Tc=0.965) classification image.
8. Conclusions
An innovative band selection algorithm, SAGA-CFI, has been developed and combined with the SVM classifier to classify hyperspectral remote sensing images. On the basis of subspace decomposition, SAGA is used in each subspace to lower the computational complexity and select suitable bands, and the CFI method is then adopted to further refine the selected bands and increase classification accuracy. SAGA-CFI-SVM has been implemented and compared with conventional algorithms, and the comparison results show that the proposed method is superior in terms of classification accuracy.
The classification of hyperspectral remote sensing images based on SAGA-CFI-SVM presented in this paper is far from complete and thus requires further research. One open problem is the further reduction of the computational complexity of SAGA and the acceleration of its search procedure. Another is the improvement of the kernel function to obtain significantly higher classification accuracy. Finally, a classification method based on a selective ensemble of support vector machines should be studied, as it may further improve the accuracy.
Acknowledgments
This study is supported by the National Natural Science Foundation of China (no. 61271386) and funded by the CRSRI Open Research Program (no. CKWV2013215/KY) and the Industrialization Project of Universities in Jiangsu Province (no. JH10-9).