Hyperspectral Band Selection Based on Adaptive Neighborhood Grouping and Local Structure Correlation

Band selection is a direct and effective dimensionality reduction method and one of the research hotspots in hyperspectral remote sensing. However, most existing methods ignore the ordering and correlation of the selected bands and construct band subsets solely from the desired number of cluster centers obtained by band ranking. To address this issue, this article proposes a band selection method based on adaptive neighborhood grouping and local structure correlation (ANG-LSC). An adaptive subspace method is adopted to partition the hyperspectral image cube into subcubes, which avoids obtaining highly correlated subsets. Then, the product of local density and a distance factor is used to rank the bands and select the desired number of cluster centers. Finally, through information entropy and correlation analysis of the bands in different clusters, the most representative band is selected from each cluster. To evaluate the effectiveness of the proposed method, comparative experiments with state-of-the-art methods are conducted on three public hyperspectral datasets. Experimental results demonstrate the superiority and robustness of ANG-LSC.


Introduction
Hyperspectral images (HSIs), as a rich source of spectral information, can accurately describe objects and are widely used in fields such as marine exploration, military target detection, forestry, and hydrology [1][2][3][4]. However, the high dimensionality of hyperspectral data and the scarcity of labeled samples make hyperspectral classification challenging. Redundant adjacent bands create storage and transmission burdens, increase computational complexity, and often degrade classifier performance. Hence, removing redundant and insignificant information within a reasonable time remains a challenging task.
Feature extraction and feature selection (also known as "band selection") are two of the most widely used dimensionality reduction strategies. Feature extraction finds an embedding and projects the original high-dimensional data into a lower-dimensional feature space [5]. However, this transformation changes the physical meaning of the original hyperspectral data, and some key information is lost. Feature selection [6] reduces the dimensionality of an HSI by finding the most representative bands to form a subset, i.e., a group of the most important bands is selected from all spectral bands to represent the entire spectrum. This preserves the physical meaning of the original spectral data and facilitates interpretation of the selected bands. Therefore, band selection is well suited to dimensionality reduction of hyperspectral data.
So far, a large number of band selection methods have been proposed. According to whether training samples are used, they can be roughly grouped into supervised [7], unsupervised [8], and semisupervised [9] methods. Supervised and semisupervised methods rely heavily on label information to identify relevant bands, but labeling samples is expensive, tedious, and time-consuming. In contrast, unsupervised methods select representative bands by exploring the intrinsic properties of the data without any prior label information. Therefore, unsupervised band selection has become a highly popular research topic.
In the past few years, many unsupervised band selection methods have been proposed. They can be divided into four categories: ranking-based [10], clustering-based [11], searching-based [12], and sparsity-based [13]. Because clustering fully considers the interaction between bands, clustering-based band selection can obtain accurate results and has attracted increasing attention. Although these methods obtain satisfactory results, they share two inherent shortcomings in the clustering process. On the one hand, most of them consider only the correlation between bands and ignore the information contained in the subset of selected bands. On the other hand, the bands of a hyperspectral image cube are arranged in order, and within a certain range, a band is strongly correlated with its adjacent bands and weakly correlated with more distant ones. Therefore, discontinuous bands from different wavelength ranges should not be grouped into the same cluster for band selection.
Motivated by the above observations, this paper proposes a band selection method based on adaptive neighborhood grouping and local structure correlation. Unlike the SNNC [14] and SNNCA [15] methods, we treat the bands of hyperspectral images as ordered. Moreover, without changing the original hyperspectral image data, the hyperspectral image cube is divided into several subcubes, and the informative bands in each group are obtained by ranking. The main contributions are as follows: (1) Based on the fact that adjacent bands are highly redundant, the ordered hyperspectral bands are partitioned into multiple subcubes by a clustering algorithm, which effectively avoids obtaining a highly correlated subset. (2) The product of local density and a distance factor is used to rank the bands and select the required number of cluster centers, which ensures low redundancy among the cluster centers. This makes better use of local distribution features and gives the obtained subset more discriminative bands. (3) Through information entropy and correlation analysis of the bands in different clusters, the most representative band is selected from each cluster. Instead of directly taking the cluster centers as the selected bands, the method considers combinations of bands in different clusters to avoid falling into a locally optimal solution.
The rest of this article is organized as follows. Section 2 briefly reviews related work. Section 3 details the proposed band selection method, including the principles of adaptive subspace partition and neighborhood grouping. Section 4 presents experimental results and discussion. Conclusions and future work are given in Section 5.

Related Work
Clustering-based methods group the original bands and select a representative band from each cluster to form the final band subset. They aim to minimize intracluster variance and maximize intercluster variance simultaneously. Most clustering-based methods are derived from k-means, affinity propagation (AP), or graph clustering. A few typical methods are briefly introduced below.
The k-means algorithm is a widely used clustering technique. It is initialized with a randomly selected set of bands and iteratively optimizes its objective function until the cluster centers converge. In [16], the authors proposed a statistical method for band clustering and band selection based on k-means clustering, using the interquartile range, geometric mean, median absolute deviation, median, correlation coefficient, covariance, and mode computed from the hyperspectral data. The authors in [17] proposed a spectral band selection method called representative band mining, in which disjoint information is adopted to measure the distance between two spectral bands. To tackle the inherent drawbacks of clustering-based band selection, Yuan et al. [18] proposed a dual-clustering band selection framework, which incorporates context information into the clustering process and considers the mutual influence of different bands. In [19], the authors proposed a fast clustering algorithm based on density peaks (FDPC). It computes the local density of each point and its distance to the nearest point of higher density, sorts the product of these two factors in descending order, and identifies cluster centers as points with anomalously large values. The authors in [20] adapted FDPC to hyperspectral band selection. First, the score of each band is calculated by weighting the normalized local density and the distance factor of each point instead of treating them equally. Then, an exponential learning rule is employed to adjust the cutoff threshold, which is fixed in FDPC, for different numbers of selected bands. The resulting method is called enhanced FDPC (E-FDPC). In [14], the authors proposed a hyperspectral band selection method based on shared nearest neighbor clustering (SNNC). They used the local density of each band to reflect local distribution characteristics and used information entropy as a weighting factor to automatically select the optimal band subset. Inspired by SNNC, [21] proposed the SNNCA method, which considers the interaction of bands in different clusters and obtains a band set with a large amount of information and low redundancy.
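To make the density-peaks idea behind FDPC [19] concrete, the following Python sketch computes, for each band vector, a local density ρ (here the number of other bands within a cutoff distance `d_c`, one common choice), the distance δ to the nearest band of higher density, and ranks bands by γ = ρ·δ. The function name `fdpc_rank`, the hard cutoff kernel, and the index-based tie-breaking are illustrative assumptions; this is neither the E-FDPC weighting of [20] nor the authors' implementation.

```python
import math

def fdpc_rank(points, d_c, k):
    """Return the indices of the k points with the largest gamma = rho * delta."""
    n = len(points)
    # Pairwise Euclidean distances (each point is, e.g., one band vector)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Local density: number of other points closer than the cutoff d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        # Points ranked above i: higher density, ties broken by smaller index
        higher = [dist[i][j] for j in range(n) if (rho[j], -j) > (rho[i], -i)]
        # The top-ranked point takes its largest distance to any other point
        delta.append(min(higher) if higher else max(dist[i]))
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(n), key=lambda i: gamma[i], reverse=True)[:k]
```

With two well-separated groups of bands, the two largest scores fall on one band from each group: within a group δ is small, while the densest band of each group is far from any denser band, which is why anomalously large γ values mark cluster centers.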
Because k-means clustering is sensitive to initial conditions, the exemplar-based AP clustering algorithm has been used to search for an appropriate set of exemplars as representative bands. [22] proposed a semisupervised band selection method based on AP, which introduces a normalized trivariable mutual information (NTMI) measure of band correlation for classification that accounts for both band redundancy and band synergy. The authors in [23] noted that similarity matrices constructed from previously proposed distance measures do not consider the spatial structure of the image, and they therefore used a structural image quality measure to establish the similarity between band images for band selection. [24] proposed an AP technique based on a discriminative feature metric. The spectral and spatial relationships between pixels are constructed through a new discriminative constraint; a discriminative feature measure (DFM) is proposed, the discriminative constraint is modeled as the criterion of a metric learning method that identifies an effective distance measure, and the representative band subset is then identified by AP clustering.
Graph clustering formulates band selection as a graph problem. The nodes of the graph represent HSI bands, and the edge connecting two nodes corresponds to the similarity between the two bands. Graph-based clustering constructs an affinity matrix from the band similarities and partitions the graph into subgraphs to find representative bands. In [25], the authors created a band adjacency graph in which nodes represent bands and edges carry similarity weights between bands. A series of random matrices is generated by alternately applying two operators to the affinity matrix, forming clusters of highly correlated bands. In [21], the authors proposed a multigraph determinantal point process (MDPP) model to capture the complete structure between different bands. MDPP employs multiple graphs to capture the intrinsic relationships between hyperspectral bands and provides an effective search strategy to select the best bands.
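A band adjacency graph of the kind described above can be sketched as a dense affinity matrix. In this sketch, the Gaussian edge weight exp(-‖x_i − x_j‖²/(2σ²)) and the row-normalized random-walk matrix are common, assumed choices; the specific similarity weights and the two alternating operators used in [25] are not reproduced here, and the function names are hypothetical.

```python
import math

def band_affinity_graph(bands, sigma):
    """Dense weighted adjacency matrix of a band graph.

    Nodes are bands; the weight between bands i and j is the Gaussian similarity
    exp(-||x_i - x_j||^2 / (2 * sigma^2)). Self-loops are set to 0.
    """
    L = len(bands)
    W = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            d2 = math.dist(bands[i], bands[j]) ** 2
            W[i][j] = W[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W

def transition_matrix(W):
    """Row-normalize W into a random-walk (row-stochastic) matrix."""
    P = []
    for row in W:
        s = sum(row)
        P.append([w / s for w in row] if s > 0 else list(row))
    return P
```

Similar bands receive weights close to 1 and dissimilar bands weights close to 0, so the random-walk matrix keeps most of its probability mass inside groups of highly correlated bands.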

Proposed Method
In this section, we introduce the proposed ANG-LSC method in detail. Specifically, because hyperspectral bands are ordered, an adaptive subspace strategy is used to partition the hyperspectral image cube so that bands with similar spectral characteristics are adaptively grouped into the same subcube. The bands are then ranked by the product of local density and a distance factor, and the required number of cluster centers is selected. Finally, through information entropy and correlation analysis of the bands in different clusters, the most representative band is selected from each cluster.
3.1. Adaptive Subspace Partition. Band selection faces two common problems. First, performing band selection directly on the entire hyperspectral image cube consumes considerable time and computational resources. Second, adjacent bands are more strongly correlated than nonadjacent bands. Therefore, it is necessary to coarsely partition the spectral bands before band selection.
Suppose that the hyperspectral dataset is $X = [x_1, x_2, \cdots, x_L] \in \mathbb{R}^{N \times L}$, where $N$ is the number of pixels and $L$ is the total number of bands, and $x_i = [x_{1i}, x_{2i}, \cdots, x_{Ni}]^T$ is the vector composed of the $i$th band. To reduce computation time, the hyperspectral image cube is first divided into equal-width subcubes; i.e., if the number of bands to be selected is $K$, the cube is split into $K$ subcubes of about $L/K$ bands each. Next, the Euclidean distance $D_{ij} = \lVert x_i - x_j \rVert_2$ between the $i$th band and the $j$th band is used to construct a similarity matrix.
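The initial equal-width split and the band-to-band Euclidean distance matrix can be sketched in Python as follows. The helper names are hypothetical; `X` is assumed to be stored as a list of N pixel rows with L band values each, and when L is not divisible by K the group widths are obtained by rounding, so they differ by at most one band (an assumption).

```python
import math

def equal_width_partition(L, K):
    """Split band indices 0..L-1 into K contiguous groups of (nearly) equal width."""
    bounds = [round(k * L / K) for k in range(K + 1)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(K)]

def band_distance_matrix(X):
    """Euclidean distances D[i][j] = ||x_i - x_j|| between band columns of X.

    X is a list of N pixel rows, each holding L band values.
    """
    L = len(X[0])
    cols = [[row[i] for row in X] for i in range(L)]
    return [[math.dist(cols[i], cols[j]) for j in range(L)] for i in range(L)]
```

For example, 10 bands and K = 5 give the groups [0, 1], [2, 3], [4, 5], [6, 7], [8, 9], which serve only as the starting point for the adaptive refinement of the dividing points.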
Figure 2: Flow chart of the band selection algorithm.

Algorithm 1: ANG-LSC for band selection.
Input: hyperspectral image $X = [x_1, x_2, \cdots, x_L] \in \mathbb{R}^{N \times L}$, the number of bands to be selected $K$, and the size of the Gaussian window $r$.
Output: the selected band subset.
1: Segment the original hyperspectral image cube into subcubes $P_i$ with the adaptive subspace partition method.
2: Calculate the information entropy $H_i$ of each band.
3: Convolve the information entropy matrix $E$ with the Gaussian kernel function; the output has the same shape as $E$.
4: Calculate the similarity matrix between pairs of bands according to Equation (9).
5: Calculate the local density $\rho_i$ from the Euclidean distance and the similarity matrix.
6: Obtain the distance factor $\delta_i$ of each band as the minimum distance between the band and the other bands of higher density.
7: Take the product of the three factors as the comprehensive weight $w_i$ of each band. Sort the weights in descending order and select the expected number of optimal bands as cluster centers to construct the desired band subset.
8: Calculate the local structural similarity index $S$ of each band in the subcube.
9: Define a new weight $W_i$ to reevaluate the quality of each band and, according to Equation (16), select the band with the largest weight from each cluster as the representative band.

In conventional clustering algorithms, the final clustering results are obtained from the distances between and within classes. Inspired by this idea, we further subdivide the coarse subcubes obtained above. Considering that two subcubes far apart have little correlation, only two adjacent subcubes ($P_i$ and $P_{i+1}$) are considered, and the dividing point $t$ between them is determined from $D_{\text{inter}}$ and $D_{\text{intra}}$, the interclass distance and the intraclass distance, respectively.
The maximum distance is selected as the interclass distance $D_{\text{inter}}$. The intraclass distance is the sum of $U_1$ of $P_i$ and $U_2$ of $P_{i+1}$, i.e., $D_{\text{intra}} = U_1 + U_2$. A new partition point is obtained through this criterion, and the initial point $t$ is then updated in the same way to obtain the final dividing point. As shown in Figure 1, the example divides a hyperspectral image cube of eight bands into four subcubes; within the dotted line, the dark area indicates that only these adjacent bands are considered when updating the current dividing point.
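Because the exact partition criterion and the definitions of $U_1$ and $U_2$ are not reproduced in this copy, the following sketch makes explicit assumptions: $D_{\text{inter}}$ is the largest distance between a band of $P_i$ and a band of $P_{i+1}$, $U_1$ and $U_2$ are the mean pairwise distances inside each subcube, and the dividing point maximizes $D_{\text{inter}} / D_{\text{intra}}$. It illustrates the local update pattern (each boundary is re-optimized using only its two neighboring subcubes, sweeping left to right), not the authors' exact equations; all names are hypothetical.

```python
from itertools import combinations

def mean_pairwise(D, idx):
    """Mean pairwise distance inside a subcube (0 for a single band)."""
    pairs = list(combinations(idx, 2))
    return sum(D[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0

def best_split(D, a, b):
    """Best dividing point t for bands a..b-1 (left: a..t-1, right: t..b-1)."""
    best_t, best_score = None, -1.0
    for t in range(a + 1, b):
        left, right = range(a, t), range(t, b)
        d_inter = max(D[i][j] for i in left for j in right)         # assumed D_inter
        d_intra = mean_pairwise(D, left) + mean_pairwise(D, right)  # assumed U1 + U2
        score = d_inter / (d_intra + 1e-12)
        if score > best_score:
            best_t, best_score = t, score
    return best_t

def refine_boundaries(D, bounds):
    """bounds = [0, t_1, ..., t_{K-1}, L]; update each interior dividing point
    using only its two adjacent subcubes, sweeping from left to right."""
    bounds = list(bounds)
    for k in range(1, len(bounds) - 1):
        bounds[k] = best_split(D, bounds[k - 1], bounds[k + 1])
    return bounds
```

For example, with six one-pixel bands valued 0, 0.1, 0.2, 5.0, 5.1, 5.2 and an initial split after the second band, the update moves the dividing point so that it falls between 0.2 and 5.0, where the two subcubes are internally tight and far apart.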

Adjacency Clustering and Local Structural Correlation.
After the hyperspectral bands are divided into subcubes, the subcubes are usually processed independently. When selecting the representative band of each subcube, traditional methods usually select the most correlated or the most informative band. However, a band chosen by only one of these criteria may not truly represent its subcube. To address this problem, we adopt an effective method, LSC, which simultaneously considers the information content and the correlation of the bands.
The adaptive subcube partition reduces the correlation between subcubes and effectively avoids selecting redundant bands. Next, representative bands are selected within the subcubes as follows.
First, information entropy is used to evaluate the amount of information in each band:
$$H_i = -\sum_{z \in \Omega} p(z) \log p(z),$$
where $\Omega$ is the gray-level space of a band and $p(z)$ denotes the probability of gray level $z$ in the band. The information entropy $H_i$ of each band can thus be calculated from its gray-level histogram. Next, a Gaussian filter is used to transform the input matrix $E$ into $E^*$, so that each transformed value is computed from the neighboring values of the matrix. The two-dimensional Gaussian distribution is
$$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}\right),$$
where $\sigma$ is the standard deviation and $(x_0, y_0)$ is the mean of the variables $x$ and $y$. Two-dimensional Gaussian filtering is performed on the input matrix with a Gaussian kernel of width $r$:
$$E^*(x, y) = \sum_{(u, v)} G(u, v)\, E(x + u, y + v),$$
where $E(x + u, y + v)$ is the value at a neighboring position, $E^*(x, y)$ is the filtered result, and $(u, v)$ ranges over the position coordinates in the $r \times r$ kernel window. The Gaussian kernel is a discrete approximation of the continuous Gaussian; the width of the sliding window and the standard deviation $\sigma$ are determined according to the characteristics of the Gaussian distribution and of the data. Then, the similarity between bands $x_i$ and $x_j$ is calculated from their shared nearest neighbors, where $\mathrm{SNN}(x_i, x_j)$ represents the number of elements shared by the $k$-nearest-neighbor sets of bands $x_i$ and $x_j$. The local density $\rho_i$ is calculated from the Euclidean distance and the similarity matrix.
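The entropy computation and the same-size Gaussian filtering described above can be sketched as follows. Assumptions: pixel values are already quantized to integer gray levels, entropy is measured in bits (log base 2), the discrete kernel is normalized to sum to 1, and positions outside the matrix are ignored with the remaining weights renormalized. How the per-band entropies are arranged into the matrix $E$ is not specified here, so the filter is written for a generic 2-D matrix; the function names are hypothetical.

```python
import math
from collections import Counter

def band_entropy(band):
    """Shannon entropy (bits) of a band's gray-level histogram.

    `band` is a flat list of pixel values already quantized to integer gray levels.
    """
    n = len(band)
    return -sum((c / n) * math.log2(c / n) for c in Counter(band).values())

def gaussian_kernel(r, sigma):
    """Normalized r x r discrete Gaussian kernel (r odd), centered at (0, 0)."""
    h = r // 2
    g = [[math.exp(-(u * u + v * v) / (2 * sigma * sigma)) for v in range(-h, h + 1)]
         for u in range(-h, h + 1)]
    total = sum(map(sum, g))
    return [[w / total for w in row] for row in g]

def gaussian_filter_same(E, r, sigma):
    """Filter matrix E with an r x r Gaussian kernel; the output has E's shape.

    Neighbors outside E are skipped and the remaining weights renormalized.
    """
    G, h = gaussian_kernel(r, sigma), r // 2
    rows, cols = len(E), len(E[0])
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            acc = wsum = 0.0
            for u in range(-h, h + 1):
                for v in range(-h, h + 1):
                    if 0 <= x + u < rows and 0 <= y + v < cols:
                        w = G[u + h][v + h]
                        acc += w * E[x + u][y + v]
                        wsum += w
            out[x][y] = acc / wsum
    return out
```

With r = 3, the size chosen in the experiments, each output value is a weighted average of a 3 × 3 neighborhood, which smooths isolated fluctuations while keeping the matrix shape unchanged.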
The distance factor $\delta_i$ of each band is obtained by calculating the minimum distance between the band and the other bands with higher local density. Then, the product of the three factors (the filtered information entropy, the local density $\rho_i$, and the distance factor $\delta_i$) is taken as the comprehensive weight $w_i$ of each band.
After the comprehensive weights of all bands are sorted in descending order, the desired number of top-ranked bands is selected as cluster centers to construct the desired band subset.
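The center-selection steps above (steps 5 to 7 of Algorithm 1) can be sketched as follows. Because the exact SNN similarity and local-density equations are not reproduced in this copy, the sketch uses the plain shared-neighbor count as the similarity and, hypothetically, defines $\rho_i$ as the sum of a band's SNN similarities to its $k$ nearest neighbors; the `entropy` argument stands for the (filtered) per-band entropy. Here `k` is the neighborhood size and `K` the number of centers; all function names are illustrative.

```python
def knn_sets(D, k):
    """Index sets of the k nearest neighbors of each band (excluding the band itself)."""
    n = len(D)
    return [set(sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])[:k])
            for i in range(n)]

def snn_similarity(D, k):
    """Shared-nearest-neighbor count between the k-NN sets of every pair of bands."""
    nn = knn_sets(D, k)
    n = len(D)
    return [[len(nn[i] & nn[j]) if i != j else 0 for j in range(n)] for i in range(n)]

def select_centers(D, entropy, k, K):
    """Rank bands by w_i = entropy_i * rho_i * delta_i and return the top K indices."""
    n = len(D)
    nn = knn_sets(D, k)
    S = snn_similarity(D, k)
    # HYPOTHETICAL local density: sum of SNN similarities to the k nearest neighbors
    rho = [sum(S[i][j] for j in nn[i]) for i in range(n)]
    delta = []
    for i in range(n):
        # Distance factor: minimum distance to any band ranked above i
        # (higher density; ties broken by smaller index)
        higher = [D[i][j] for j in range(n) if (rho[j], -j) > (rho[i], -i)]
        delta.append(min(higher) if higher else max(D[i]))
    w = [entropy[i] * rho[i] * delta[i] for i in range(n)]
    return sorted(range(n), key=lambda i: w[i], reverse=True)[:K]
```

Multiplying by the distance factor is what keeps the centers apart: a band that is dense but lies next to an even denser band receives a small $\delta_i$ and therefore a small weight.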
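Finally, the optimal band is selected by measuring the local structural similarity index and the information entropy as image-quality measures. The local structural similarity index (SSIM, abbreviated as $S$ below) is
$$S(x, y) = [l(x, y)]^{\alpha} [c(x, y)]^{\beta} [s(x, y)]^{\gamma},$$
where
$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3},$$
and $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$, and $\sigma_{xy}$ are the local means, standard deviations, and cross-covariance of images $x$ and $y$. If $\alpha = \beta = \gamma = 1$ and $C_3 = C_2/2$, then
$$S(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}.$$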
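The simplified SSIM formula above can be checked with a short sketch. For brevity it computes the statistics over a single window covering the whole band image (flattened to a list) rather than averaging over local sliding windows, and it uses the constants $C_1 = (0.01 \cdot 255)^2$ and $C_2 = (0.03 \cdot 255)^2$ that are common defaults for 8-bit images; both choices are assumptions about how the index is applied to band images, and `ssim_global` is a hypothetical name.

```python
def ssim_global(x, y, C1=(0.01 * 255) ** 2, C2=(0.03 * 255) ** 2):
    """Simplified SSIM (alpha = beta = gamma = 1, C3 = C2 / 2) over one window.

    x and y are equal-length lists of pixel values from two band images.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)                    # sigma_x^2
    vy = sum((b - my) ** 2 for b in y) / (n - 1)                    # sigma_y^2
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)  # sigma_xy
    num = (2 * mx * my + C1) * (2 * cxy + C2)
    den = (mx ** 2 + my ** 2 + C1) * (vx + vy + C2)
    return num / den
```

Identical bands give S = 1, while a band and its intensity-inverted copy have negative cross-covariance and hence a negative index, so S behaves as a correlation-like measure of structural redundancy between bands.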

Journal of Sensors
To find a representative band subset with a large amount of information and low redundancy, a new weight $W_i$ that combines information entropy and local structural similarity is defined to evaluate the quality of each band, and the band with the largest weight in each cluster is selected as the representative band.
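The conclusion of this paper notes that all bands are assigned to clusters according to the Euclidean distance matrix and the cluster centers, and the representative of each cluster is the band with the largest $W_i$. Since the exact formula for $W_i$ is not reproduced here, the sketch below takes the weights as an input; nearest-center assignment and the function names are assumptions for illustration.

```python
def assign_to_centers(D, centers):
    """Assign every band to its nearest cluster center using distance matrix D."""
    clusters = {c: [] for c in centers}
    for i in range(len(D)):
        nearest = min(centers, key=lambda c: D[i][c])
        clusters[nearest].append(i)
    return clusters

def pick_representatives(clusters, W):
    """Keep, from each cluster, the band with the largest weight W[i]."""
    return sorted(max(members, key=lambda i: W[i]) for members in clusters.values())
```

Note that the selected band need not be the cluster center itself: any member with a larger weight replaces it, which is how the method avoids committing directly to the centers.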
The algorithm flow chart of this paper is shown in Figure 2. For more details about the algorithm, the procedures are summarized in Algorithm 1.

Experiment
In this section, we conduct extensive experiments to evaluate the performance of the proposed algorithm in hyperspectral image classification. First, we introduce three common hyperspectral image datasets. Then, the experimental environment, evaluation criteria, and comparison algorithms are described. Finally, we analyze the experimental performance of the methods in detail, and the

Runtime Environment.
The experiments were run on a 10th-generation Intel Core i7-10750H six-core processor with a base frequency of 2.60 GHz and 16 GB of memory, using MATLAB R2016b.

Number of Selected Bands.
Since the optimal number of bands to select for the three public hyperspectral datasets is unknown, experiments are carried out with 5 to 50 selected bands (in steps of 5) to examine the impact of the number of selected bands on classification accuracy.

Classifier.
In this experiment, two common classifiers, KNN and SVM, are used. The number of neighbors of the KNN classifier is set to 5, and the SVM classifier uses the RBF kernel. Because both classifiers are supervised, 10% of the samples in each class are randomly selected as the training set, and the remaining 90% are used as the test set. To reduce the effect of random sampling, each experiment is repeated 10 times and the mean result is reported.
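The sampling protocol above (10% of each class for training, the rest for testing, averaged over 10 runs) can be sketched as follows. The function names, the use of Python's `random` module with per-run seeds, and the rule of keeping at least one training sample per class are assumptions; the classifier itself is left as a caller-supplied `evaluate` function.

```python
import random
from collections import defaultdict

def stratified_split(labels, frac=0.1, seed=0):
    """Randomly draw `frac` of the samples of every class for training; the rest are test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        idxs = idxs[:]
        rng.shuffle(idxs)
        n_train = max(1, round(frac * len(idxs)))  # at least one sample per class (assumption)
        train += idxs[:n_train]
        test += idxs[n_train:]
    return sorted(train), sorted(test)

def repeated_mean(evaluate, labels, runs=10):
    """Average the score returned by evaluate(train, test) over `runs` random splits."""
    scores = [evaluate(*stratified_split(labels, 0.1, seed)) for seed in range(runs)]
    return sum(scores) / len(scores)
```

Sampling per class rather than over the whole image keeps small classes represented in every training set, which matters for the AA measure.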

Accuracy Measures.
This paper employs overall accuracy (OA), average accuracy (AA), and the Kappa coefficient to measure hyperspectral image classification accuracy. Larger OA and Kappa values indicate better classification, and AA, the mean of the per-class accuracies, is a common indicator of how well small classes are classified.
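For reference, the three accuracy measures can be computed from a confusion matrix as in the following sketch (a standard formulation, not code from this paper): OA is the fraction of correctly classified samples, AA is the mean of the per-class accuracies, and Kappa compares OA with the agreement $p_e$ expected by chance from the row and column totals, $\kappa = (\mathrm{OA} - p_e)/(1 - p_e)$.

```python
def accuracy_measures(y_true, y_pred):
    """Return (OA, AA, Kappa) for two equal-length label sequences."""
    classes = sorted(set(y_true) | set(y_pred))
    pos = {c: k for k, c in enumerate(classes)}
    C = len(classes)
    M = [[0] * C for _ in range(C)]  # M[t][p]: samples of true class t predicted as p
    for t, p in zip(y_true, y_pred):
        M[pos[t]][pos[p]] += 1
    n = len(y_true)
    oa = sum(M[k][k] for k in range(C)) / n
    # AA: mean per-class accuracy over the classes present in y_true
    per_class = [M[k][k] / sum(M[k]) for k in range(C) if sum(M[k]) > 0]
    aa = sum(per_class) / len(per_class)
    # Chance agreement from row (true) and column (predicted) totals
    pe = sum(sum(M[k]) * sum(M[r][k] for r in range(C)) for k in range(C)) / (n * n)
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

Because every class contributes equally to AA regardless of its size, a method that misclassifies a small class can still score a high OA but a noticeably lower AA.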

Results.
In this section, the effectiveness and superiority of the proposed algorithm are analyzed on the three datasets from three aspects: Gaussian kernel size, number of selected bands, and classification performance.

Gaussian Kernel Size.
The kernel size is critical. If it is too large, finer features of the image may be smoothed out; if it is too small, noise cannot be effectively suppressed. Therefore, to select an appropriate kernel size, experiments were conducted on the three datasets with the kernel size set to 3, 5, 7, 9, 11, 13, and 15. The results for the three datasets are shown in Figures 3-5.
Figures 3-5 show that, as the window size increases, OA, AA, and Kappa on the Salinas dataset tend to stabilize, whereas on the other two datasets all three indicators decrease steadily. Therefore, a kernel size of 3 × 3 is chosen in this experiment.

Number of Bands.
To evaluate the proposed band selection method, experiments are performed on the three hyperspectral datasets with 5 to 50 selected bands. Table 1 lists the classification results of the SVM classifier on the three datasets.
Table 1 shows that OA, AA, and Kappa generally increase with the number of selected bands. Although there are temporary declines for some band numbers, the decreases are almost negligible, so the evaluation indexes can be considered to increase with the number of bands. The table also shows that the best results for the three datasets are achieved with 15-20 bands. For the Indian Pines dataset, 15 bands is a more appropriate choice. For the Pavia University and Salinas datasets, OA reaches 92.04% and 92.57%, respectively, with 20 bands. In addition, the experiments on the three hyperspectral datasets show that the proposed algorithm, guided by the information entropy distribution of each band, achieves stable classification performance on different datasets with multiple classifiers. The classification results are shown in Figures 6-8.
For the Indian Pines dataset, Figure 6(b) shows that the classification result is not ideal and that some pixels are misclassified. The information entropy distribution of the selected bands shows that the selected bands are too densely concentrated in the interval 106-142; such a dense, information-rich group easily affects the bands on both sides. Another reason is that the Indian Pines dataset contains many classes, and classes with few samples were not removed in the experiment. However, compared with previous band selection methods, the proposed method has certain advantages in OA, AA, and Kappa, especially when few bands are selected. For the Pavia University and Salinas datasets, comparing the ground-truth maps with the classification maps shows that the classification is very good, consistent with the values in Table 1. Figures 7 and 8 also show few misclassifications, which basically meets the classification requirements.

Classification Performance.
To demonstrate the effectiveness of the proposed method, it is compared with four state-of-the-art algorithms, using two classifiers (KNN and SVM) and three accuracy evaluation criteria. For the Indian Pines dataset, Figure 9 and Table 2 clearly show that the proposed method outperforms the other methods when fewer bands are selected, which is exactly the purpose of dimensionality reduction. With 5 bands, the OA of the proposed method reaches 73.32%, 4.44% higher than that of the state-of-the-art ASPS-MN method. When more than 25 bands are selected, the OA of ASPS-MN is higher than that of the proposed method for some band numbers, but Table 2 shows that the proposed method is more stable than the others on the SVM classifier, with standard deviations of OA and Kappa of ±0.11 and ±0.09, respectively. In addition, the accuracy of the proposed algorithm is higher than that of the other algorithms, and its advantage over MDSR is very obvious. In summary, the proposed method is more stable and accurate, and the classification map in Figure 10 confirms its effectiveness.
For the Pavia University dataset, Figure 11 and Table 2 show that the proposed algorithm performs well on both OA and Kappa, reaching 91.34% and 88.45%, respectively (the classification map is shown in Figure 12), which are improvements of 2.78% and 2.55% over ASPS-MN. Compared with the other four methods, the maximum improvements in OA and Kappa on the KNN classifier are 2.52% and 3.9%, respectively. When few bands are selected, some algorithms, especially MDSR and SOPSRL, show unstable accuracy, whereas all algorithms perform well when more than 25 bands are selected. The classification advantage of the proposed method is less obvious on the Indian Pines dataset but is clearly demonstrated on this dataset. The SVM classification curves also show a clearer advantage of the proposed algorithm over the other algorithms. For the other algorithms, accuracy increases steadily with the number of bands, as ASPS-MN illustrates best.
For the Salinas dataset, Figure 13 and Table 1 show that the OA and Kappa of the proposed method are significantly higher than those of the other methods. Table 1 shows that the Kappa coefficient can reach 92.58%, indicating that the proposed method achieves high classification accuracy and good results for each ground-object class. Furthermore, Figure 13 shows that although the accuracy of the proposed method is lower than that of other algorithms when few bands are selected, its advantage becomes very obvious when more than 20 bands are selected. In particular, with the KNN classifier, the OA of the proposed method rises continuously and stays ahead of the other algorithms. When few bands are selected, SOPSRL and ASPS-MN fluctuate greatly. In addition, the classification map (Figure 14) shows fewer misclassifications for the proposed method.
The experiments on the three hyperspectral datasets show that most comparison algorithms do not perform well on the Indian Pines dataset, whereas the proposed method performs well on it. In terms of classification performance on the other two datasets, the proposed method also shows clear advantages. To further illustrate the strengths and weaknesses of each band selection algorithm, band selection is carried out on the three datasets (taking 15 bands as an example), and the classification accuracy of each ground object obtained by the different algorithms is given in Figures 15-17. The band selection results of the different algorithms on the three datasets are shown in Table 3.
The following conclusions can be drawn from Figures 15-17. For the Indian Pines dataset, owing to the characteristics of the dataset itself, the classification accuracy of all current algorithms is not high. For the oats class in particular, ONR performs best, while the other algorithms cannot classify it at all. Compared with the other algorithms, the proposed algorithm is superior in classifying some ground objects (shown in yellow in the figure). For the Pavia University dataset, the proposed algorithm and the other algorithms all perform well, classifying shadows with 100% accuracy, and the proposed algorithm is slightly better than the others. For the Salinas dataset, the proposed algorithm does not outperform the other algorithms, but its classification performance is as good as that of the latest algorithms. The classification performance of MDSR is far inferior to that of the other algorithms. The above experiments demonstrate the effectiveness and superiority of the proposed algorithm.

Conclusion
Recently, many clustering-based band selection methods have been proposed, but most of them consider only the redundancy between bands and neglect the amount of information in the selected band subset. Furthermore, these algorithms do not treat the hyperspectral bands as ordered. We propose an unsupervised hyperspectral band selection method based on adaptive neighborhood grouping and local structure correlation. First, the method divides the hyperspectral cube into multiple subcubes through adaptive subspace partition and uses the product of local density and a distance factor to rank the bands and select the desired number of cluster centers. Then, according to the Euclidean distance matrix and the cluster centers, all bands are divided into several clusters. Finally, based on information entropy and correlation analysis, the most representative band is selected from each cluster. Unlike other clustering-based and ranking-based methods, this method uses clustering to divide the spectrally ordered hyperspectral cube into multiple subcubes as an overall framework, which effectively avoids selecting highly correlated subsets. The LSC method then considers combinations of bands in different clusters to avoid falling into a locally optimal solution. Extensive experimental results on three publicly available hyperspectral datasets show that this method is more robust and effective than the compared methods.
The quality of the cluster centers has an important impact on the performance of the method. In future work, we will explore a better cluster center selection mechanism to improve the robustness and effectiveness of the selected bands.

Conflicts of Interest
The authors declare that they have no conflicts of interest.