Spectral-Spatial Hyperspectral Image Semisupervised Classification by Fusing Maximum Noise Fraction and Adaptive Random Multigraphs

College of Geophysics, Chengdu University of Technology, Chengdu 610059, China; School of Electronic Information and Electrical Engineering, Chengdu University, Chengdu 610106, China; Key Laboratory of Pattern Recognition and Intelligent Information Processing of Sichuan, Chengdu University, Chengdu 610106, China; Geomathematics Key Laboratory of Sichuan Province, Chengdu University of Technology, Chengdu 610059, China; Digital Hu Line Research Institute, Chengdu University of Technology, Chengdu 610059, China; School of Intelligent Engineering, Sichuan Changjiang Vocational College, Chengdu 610106, China


Introduction
Owing to advances in remote sensing technology, hyperspectral images (HSIs) contain increasingly rich spectral and spatial information (SSI), which has led to their extensive use in domains such as forest inventory [1], urban area monitoring [2], road extraction [3], geological surveys [4], precision agriculture [5], environmental protection [6], military applications [7], hydrocarbon detection [8], oil reservoir exploration [9], and lake sediment analysis [10]. HSI classification is a crucial research topic underlying these applications. In contrast to Synthetic Aperture Radar (SAR) [11] or RGB images [12], HSI classification faces two main challenges: the high dimensionality of the data and the redundancy of the spectral information.
However, because the noise in hyperspectral data can easily mask subtle hyperspectral features, careful noise removal is required to extract useful information. This is problematic for principal component analysis (PCA), which maximizes the variance of the data projected onto a set of orthogonal vectors and therefore does not distinguish signal from noise. Unlike PCA, the maximum noise fraction (MNF) transform aims to maximize the signal-to-noise ratio (SNR) rather than the variance. MNF is therefore worth considering in HSI classification because it removes noise during dimensionality reduction.
HSI classification can be significantly improved by employing an appropriate object-based [19], spectral-spatial fusion-based [20], decision fusion-based [21], or deep learning-based method [22]. However, two challenges are unavoidable for most researchers. On the one hand, traditional deep learning-based approaches rely heavily on a large amount of labeled data to achieve competitive results, whereas HSI annotation is expensive and time-consuming because it requires expert knowledge and skills. On the other hand, many methods require a large number of parameter settings during the experiment, which calls for expert experience [23]. HSI classification has been widely studied, and various classification methods have been adopted, namely, support vector machine (SVM) [24], random forests [25], neural networks [26], low-rank representation [27], sparse representation [28], deep learning methods [29,30], and meta-learning methods [31]. Xu et al. [29] directly used random patches extracted from the HSI as convolution kernels, without any training, to improve the classification efficiency. Yin et al. [30] proposed a CapsNet-based alternative data-driven HSI classification model. The aforementioned results demonstrate that these methods can achieve ideal results when the training samples are sufficient.
To integrate the spatial information, researchers have proposed many spectral-spatial feature extraction (FE) techniques, such as the extinction profile (EP) [32] and local binary patterns (LBP) [33]. In contrast with the EP features, LBP features facilitate the mining of the HSI texture information, such as global contrast information and texture depth [34]. In this study, we adopted the LBP features as spatial features.
However, when the labeled samples in the training set are insufficient, the classification accuracy achieved by the traditional methods drops significantly [35] because of the so-called Hughes effect, or curse of dimensionality [36]. Moreover, many methods require a series of manual parameter settings. For instance, to extract spatial features, researchers must select an appropriate window size for the neighborhood. This selection is time-consuming [37] and requires expertise [38]. Since 2010, ensemble learning methods for HSI classification have received significant attention because they depend on only a limited number of training samples [39]. Many methods have been developed, such as support vector machines [40], boosting [41], segmentation-based methods [19], unsupervised methods [42], and semisupervised methods [43]. However, the graph-based semisupervised method is rarely considered in HSI classification [44,45].
Motivated by the aforementioned discussions, and by combining the SSI, we propose a new spectral-spatial HSI semisupervised classifier based on MNF and adaptive random multigraphs (SS-MNF-ARMG). The primary contributions of this study are as follows: (1) A novel spectral-spatial HSI semisupervised classification framework was developed. Because of its adaptive properties, the optimal parameters can be determined without artificial auxiliaries. (2) By introducing the MNF, the noise in the HSI can be removed more efficiently during dimensionality reduction. The SSI is then combined on the basis of the dimension-reduced HSI, which mitigates the curse of dimensionality. (3) In contrast with several studies, SS-MNF-ARMG can achieve competitive performance on three real HSI datasets while leveraging very few labeled samples, which is further improved by introducing RMG in a new mode.

MNF.
Let X be the HSI data, and let S and N be the signal and noise parts of X, respectively. The goal of MNF is to find a linear transformation matrix W that maximizes the SNR of the transformed data. Assuming that S and N are uncorrelated, X can be represented as [43] X = S + N.
The covariance matrix (CM) of X can then be written as

$$\Sigma_X = \Sigma_S + \Sigma_N,$$

where $\Sigma_S$ and $\Sigma_N$ are the CMs of S and N, respectively. The MNF transform can be expressed as

$$Y = W^{T} X,$$

where Y is the MNF result of X, W contains the eigenvectors associated with the L largest eigenvalues of $\Sigma_N^{-1}\Sigma_X$, and L is the number of MNF principal components. The SNR of each $y_i \in Y$ can then be described as

$$\mathrm{SNR}(y_i) = \frac{\mathrm{cov}\{w_i^{T} S\}}{\mathrm{cov}\{w_i^{T} N\}} = \frac{w_i^{T}\Sigma_S w_i}{w_i^{T}\Sigma_N w_i} = \frac{w_i^{T}\Sigma_X w_i}{w_i^{T}\Sigma_N w_i} - 1,$$

where cov{·} computes the variance and $w_i$ is the ith component of W. We can then obtain W by solving the following problem:

$$\max_{w_i}\ \frac{w_i^{T}\Sigma_X w_i}{w_i^{T}\Sigma_N w_i}.$$

LBP. The LBP code of a center pixel is computed as

$$\mathrm{LBP}_{P,r}(N_c) = \sum_{i=0}^{P-1} g(N_i - N_c)\,2^{i},$$

where P is the number of neighbors located on a circle of radius r, and $N_c$ and $N_i$ are the gray-level intensity values of the center pixel and the ith neighbor, respectively. The binary threshold function g(x) is described as

$$g(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

Taking the 10th band of the Indian Pines HSI as an example, the procedure of LBP is shown in Figure 1. As shown in Figure 1, for a given center pixel in a 3 × 3 window, binary labels ("0" or "1") are assigned to the adjacent pixels according to whether their gray value is at least as large as that of the center pixel. Starting at the top left corner, all binary labels are joined clockwise to produce an 8-bit binary number. The resulting binary number is called the LBP code. The LBP algorithm has significant rotation invariance and gray invariance [46], and it can be effectively applied to HSI classification [47].
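The 3 × 3 LBP procedure described above can be illustrated with a minimal sketch. The function name `lbp_3x3` is hypothetical, and two conventions are assumptions rather than details confirmed by the text: a neighbor receives "1" when its gray value is greater than or equal to that of the center pixel, and the first neighbor visited (top left) becomes the most significant bit.

```python
import numpy as np

def lbp_3x3(band):
    """Return the 8-bit LBP code of every interior pixel of a 2-D band."""
    band = np.asarray(band, dtype=float)
    h, w = band.shape
    center = band[1:-1, 1:-1]
    # Neighbor offsets (row, col), clockwise starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for dr, dc in offsets:
        # Shifted view of the band aligned with the interior pixels.
        neighbor = band[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        bit = (neighbor >= center).astype(np.uint8)  # threshold g(N_i - N_c)
        codes = (codes << 1) | bit  # append the bit; first neighbor ends up as MSB
    return codes

# Single 3 x 3 window with center value 6.
patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 6, 3]])
print(lbp_3x3(patch))  # clockwise bits 0,1,0,1,0,1,0,0 -> [[84]]
```

In practice the codes are usually summarized as a histogram over a local patch around each pixel; that step is omitted here to keep the sketch short.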

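RMG. Given the HSI dataset $X = X_{lab} \cup X_{ulab}$, where $X_{lab} = \{x_1, \ldots, x_l\}$ contains the l labeled samples and $X_{ulab} = \{x_{l+1}, \ldots, x_{l+u}\}$ contains the u unlabeled samples, a weighted graph can be obtained. The vertices of the graph consist of $X_{lab}$ and $X_{ulab}$. Weighted edges, defined by a matrix $W \in \mathbb{R}^{(l+u)\times(l+u)}$, represent the similarities between associated nodes. For a c-class classification problem, the labeling task can be defined as a quadratic optimization problem [43]:

$$\min_{F}\ \mathrm{tr}\left(F^{T} L F + (F - Y)^{T} C\,(F - Y)\right),$$

where tr(·) is the trace function and $F \in \mathbb{R}^{(l+u)\times c}$ is the prediction matrix to be learned. The indicator matrix $Y = (y_1, y_2, \ldots, y_l, 0, \ldots, 0)^{T} \in \mathbb{R}^{(l+u)\times c}$ stacks the label vectors $y_i$ corresponding to $x_i$, and 0 is the zero vector. In addition, $y_{ij}$ is given by

$$y_{ij} = \begin{cases} 1, & \text{if } x_i \text{ belongs to the } j\text{th class}, \\ 0, & \text{otherwise}. \end{cases}$$

Each $x_i$ is then assigned to the jth class if $F_{ij}$ is the largest entry in the ith row of F, which can be described as

$$\hat{y}_i = \arg\max_{j} F_{ij}.$$

Here, C is a diagonal matrix, and its element $c_i$ can be calculated as

$$c_i = \begin{cases} \alpha_l, & 1 \le i \le l, \\ \alpha_u, & l + 1 \le i \le l + u, \end{cases}$$

where $\alpha_l$ and $\alpha_u$ are two parameters. A popular choice for L is the graph Laplacian [48], which is defined as

$$L = D - W,$$

where W is the weight matrix of the graph, which can be formulated with a Gaussian kernel as

$$W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^{2}}{2\sigma^{2}}\right),$$

where σ is the kernel width parameter to be adjusted, and D is the diagonal matrix whose entries are the row sums of W.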
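The graph-based inference step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the objective is taken to have the common form tr(FᵀLF + (F − Y)ᵀC(F − Y)) with L = D − W and a Gaussian-kernel W, in which case setting the gradient to zero gives the closed-form solution F = (L + C)⁻¹CY. The function names and the default values of `sigma`, `alpha_l`, and `alpha_u` are hypothetical.

```python
import numpy as np

def gaussian_weights(X, sigma):
    """Dense Gaussian-kernel weight matrix between all pairs of samples."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def graph_inference(X, labels, n_classes, sigma=1.0, alpha_l=100.0, alpha_u=1e-6):
    """labels holds a class index for labeled samples and -1 for unlabeled ones."""
    labels = np.asarray(labels)
    n = X.shape[0]
    W = gaussian_weights(X, sigma)
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian L = D - W
    is_lab = labels >= 0
    Y = np.zeros((n, n_classes))
    Y[is_lab, labels[is_lab]] = 1.0         # one-hot rows for labeled samples only
    c = np.where(is_lab, alpha_l, alpha_u)  # diagonal of C
    # Zero gradient of the objective: (L + C) F = C Y.
    F = np.linalg.solve(L + np.diag(c), c[:, None] * Y)
    return F.argmax(axis=1), F

# Two well-separated clusters with one labeled sample each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = np.array([0, -1, -1, 1, -1, -1])
pred, F = graph_inference(X, labels, n_classes=2)
print(pred)
```

The dense solve costs O((l+u)³), which is one reason anchor-based approximations are attractive for full HSI scenes.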
However, it is difficult to discover the neighborhood structure inherent in the graph and to learn a proper compact representation automatically. To solve this problem, researchers have proposed the anchor graph algorithm (AGA) [49] and the multiview anchor graph algorithm [50]. The AGA extrapolates the Laplacian eigenvectors of the graph to eigenfunctions, allowing constant-time hashing of new data points. A hierarchical threshold learning method is then used so that each eigenfunction generates more than 1 bit, which improves the search accuracy. The label prediction function in the AGA can be expressed as

$$f(x_i) = \sum_{j=1}^{m} P_{ij}\, f(a_j),$$

where $P_{ij}$ is the data-adaptive weight, and each $a_j$ in $A = \{a_j\}_{j=1}^{m}$ is an anchor point. With this formula, the solution space of the unknown labels is reduced from a larger space to a smaller one. The centers of the K-means clusters are selected as anchor points because these centers have a powerful representation that covers the entire dataset. In this paper, we use Local Anchor Embedding (LAE) [49] to calculate the anchor points. Figure 2 shows the flowchart of RMG.
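The random multigraph idea (several graphs, each built from a random subset of features, combined by voting) can be sketched as follows. To stay short and self-contained, the sketch replaces the anchor graph with a small dense Gaussian graph and a closed-form solver; this substitution, the function names, and all default parameter values are assumptions for illustration only.

```python
import numpy as np

def infer_labels(X, labels, n_classes, sigma=1.0, alpha_l=100.0, alpha_u=1e-6):
    """Stand-in graph inference on a dense Gaussian graph (not an anchor graph)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    L = np.diag(W.sum(axis=1)) - W
    is_lab = labels >= 0
    Y = np.zeros((X.shape[0], n_classes))
    Y[is_lab, labels[is_lab]] = 1.0
    c = np.where(is_lab, alpha_l, alpha_u)
    F = np.linalg.solve(L + np.diag(c), c[:, None] * Y)
    return F.argmax(axis=1)

def random_multigraph_vote(X, labels, n_classes, k_g=5, k_f=2, seed=0):
    """Build k_g graphs from k_f randomly chosen features each, then vote."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n = X.shape[0]
    votes = np.zeros((n, n_classes), dtype=int)
    for _ in range(k_g):
        feats = rng.choice(X.shape[1], size=k_f, replace=False)
        pred = infer_labels(X[:, feats], labels, n_classes)
        votes[np.arange(n), pred] += 1      # one vote per sample per graph
    return votes.argmax(axis=1)             # majority vote (ties go to the lower class index)

# Toy data: 4 features, two clusters, one labeled sample per class.
base = np.arange(12).reshape(3, 4) * 0.01
X = np.vstack([base, base + 5.0])
labels = [0, -1, -1, 1, -1, -1]
print(random_multigraph_vote(X, labels, n_classes=2))
```

Because each graph sees only part of the feature vector, the individual predictions differ, and the vote averages out errors that a single graph would make.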

Proposed Method
The SS-MNF-ARMG framework is shown in Figure 3. Because of its adaptive properties, the optimal parameters can be determined without artificial auxiliaries. The framework comprises three main modules: (1) Preprocessing: by applying MNF to the HSI, the noise can be removed effectively during dimension reduction. This step alleviates the curse of dimensionality. (2) Through LBP, spatial vectors corresponding to different neighborhood regions are obtained. Each of these spatial vectors is stacked with the spectral vector, yielding a series of spectral-spatial stacked features.
(3) For classification and decision fusion, a set of necessary parameters for the RMG is established and the spectral-spatial feature information is fed into the RMG. Next, a set of classification results with different accuracies is obtained, from which the optimal classification result is derived through decision fusion. In addition, injecting randomness into the graphs in the RMG helps avoid overfitting caused by the limited training samples.
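As a rough illustration of module (1), the MNF transform can be computed by solving a generalized eigenproblem for the data and noise covariance matrices. The sketch below is not the authors' code: the function name `mnf` is hypothetical, and the noise covariance is estimated from differences between horizontally adjacent pixels, which is a common heuristic assumed here because the text does not specify a noise estimator.

```python
import numpy as np
from scipy.linalg import eigh

def mnf(cube, n_components):
    """Project a (rows, cols, bands) cube onto its n_components top MNF components."""
    cube = np.asarray(cube, dtype=float)
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)
    X = X - X.mean(axis=0)
    # Assumed noise estimate: differences of horizontally adjacent pixels.
    # The difference of two independent noise terms has twice the variance.
    noise = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands)
    sigma_n = np.cov(noise, rowvar=False) / 2.0
    sigma_x = np.cov(X, rowvar=False)
    # Generalized eigenproblem sigma_x w = lambda sigma_n w, i.e. the
    # eigenvectors of inv(sigma_n) @ sigma_x; eigh returns ascending eigenvalues.
    eigvals, eigvecs = eigh(sigma_x, sigma_n)
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]
    return (X @ W).reshape(rows, cols, n_components), eigvals[order]

# Synthetic cube: a vertical ramp shared by all bands plus unit white noise.
rng = np.random.default_rng(0)
ramp = np.linspace(0.0, 10.0, 20)[:, None, None]
cube = ramp * np.ones((1, 20, 6)) + rng.normal(size=(20, 20, 6))
Y, snr_plus_one = mnf(cube, n_components=3)
print(Y.shape, snr_plus_one)
```

Each returned eigenvalue equals the SNR of the corresponding component plus one, so keeping the largest eigenvalues keeps the components with the highest SNR.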
The proposed SS-MNF-ARMG algorithm is summarized in Algorithm 1.
Discrete Dynamics in Nature and Society

Experimental Datasets.
Three hyperspectral datasets were employed to evaluate the performance of the SS-MNF-ARMG.
Indian Pines: this scene was gathered by the AVIRIS sensor over the Indian Pines test site in northwestern Indiana and consists of 145 × 145 pixels and 224 spectral reflectance bands in the wavelength range 0.4 to 2.5 μm. This scene, which includes 16 ground truth classes, is two-thirds agriculture and one-third forest or other natural perennial vegetation. The number of bands was reduced to 200 by removing the 24 water absorption bands.
Pavia University: this scene was acquired by the ROSIS sensor during a flight campaign over Pavia, northern Italy. The Pavia University scene has 103 spectral bands. It is a 610 × 340 pixel image containing nine different ground objects with a geometric resolution of 1.3 meters.
Salinas: this scene was collected by the 224-band AVIRIS sensor over the agricultural area of Salinas Valley, California, and is characterized by high spatial resolution (3.7 m pixels). After discarding 20 water absorption bands, the size of this data image was 512 × 217, with 204 bands. Salinas ground truth contains 16 classes, including vegetables, bare soils, and vineyard fields.

Analysis of Experimental Parameters.
Several parameters can affect the classification accuracy, such as the number of spectral bands bandNum, the patch size W, the number of sampling points nr, the number of graphs kg, and the number of features kf. Owing to the adaptive properties of the proposed SS-MNF-ARMG, most optimal values of W, nr, kg, and kf can be determined without artificial auxiliaries.
Based on the existing result [

[Figure 2: Flowchart of RMG. Input data; randomly choose k_f features to construct graphs 1, 2, ..., k_g; graph-based inference to get label predictions F_1, F_2, ..., F_{k_g}; voting; predicted results.]

In this study, we set the number of bands from 3 to 35 to evaluate the impact on the three HSI datasets. We conducted the experiment five times, and the optimal experimental results are shown in Figure 4.
It can be observed that, on the Indian Pines and Salinas datasets, the overall classification accuracy (OA) grows steadily. In other words, increasing the number of spectral bands used for training improves the classification performance. On the Pavia University dataset in particular, there is a sharp increase when the number of bands exceeds 15. In general, the OA of the proposed method is above 78% for all three HSI datasets. On the Salinas dataset especially, the OA of the method is not less than 92% even with a small number of spectral bands, demonstrating the robustness of our method.

Comparison and Analysis.
The proposed SS-MNF-ARMG method was compared with several state-of-the-art spectral-spatial fusion methods, such as Pixon-based classifier [19], PCA-SPCA-2D-SSA [14], R-VCANet [20], RN-FSC [31], iCapsNet [30], RPNet [29], and MBFSDA [21]. A comparison of the above algorithms can be seen in Table 1.

Algorithm 1: The proposed SS-MNF-ARMG algorithm.
(1) for each h_i′ ∈ bandNum do
(2)   Obtain the dimension-reduced HSI D_i′ ∈ R^(x×y×h_i′) by using MNF;
(3)   Extract the spectral vector Spe_vec of D_i′;
(4)   for each w_j ∈ W do
(5)     Calculate the spatial vector Spa_vec_j by using LBP;
(6)     Obtain the spectral-spatial vector by stacking the Spe_vec and the Spa_vec_j;
(7)     Obtain the best OA_i,j (overall accuracy) by voting, and the corresponding confusion matrix ConfMat_i,j;
(8)   end for
(9)   Obtain the best OA′ from OA_i,j by decision fusion, and the corresponding confusion matrix ConfMat′;
To quantitatively compare the methods shown in Table 1, we used the average classification accuracy (AA), the overall classification accuracy (OA), and the kappa coefficient (kappa) to assess the classification performance. To demonstrate the superior performance of the SS-MNF-ARMG with a limited number of training samples, we randomly selected 10 samples from each class as training samples. Tables 2-4 show the ground truth classes and their respective training and testing numbers for the three HSI datasets. Tables 5-7 summarize the experimental results for the three HSI datasets, from which the following conclusions can be drawn:
(1) The results on the Indian Pines dataset show that almost all algorithms are effective. We can observe from Table 5 that the proposed SS-MNF-ARMG achieves an OA advantage of 3.78% to 28.6% over the other methods. In addition, for the classes that other methods do not recognize accurately, SS-MNF-ARMG can obtain better results, such as objects 1#, 2#, 3#, 5#, 6#, and 7# in Table 5.
(2) The results on the Pavia University dataset demonstrate that our method has advantages over some state-of-the-art methods. As a decision fusion-based algorithm, the proposed SS-MNF-ARMG surpasses MBFSDA by 3.87% in OA. In addition, for the classes that other methods do not recognize accurately, SS-MNF-ARMG can obtain better results, such as objects 2#, 3#, 6#, and 8# in Table 6.
(3) The results on the Salinas dataset show that all methods achieve close results in AA, but SS-MNF-ARMG has a competitive advantage in OA and kappa. From Table 7, we can observe that, compared with R-VCANet, SS-MNF-ARMG obtains a 0.1% improvement in OA, a 3.13% improvement in AA, and a 7.31% improvement in kappa. Furthermore, the proposed SS-MNF-ARMG surpasses iCapsNet by 11.01% in OA. In addition, for the classes that other methods do not recognize accurately, SS-MNF-ARMG can obtain better results, such as objects 3#, 7#, 8#, 9#, 15#, and 16# in Table 7.
(4) In general, the decision fusion-based methods outperform the segmentation-based methods and feature fusion-based methods. First and foremost, the LBP features improve the performance of the SS-MNF-ARMG, and the application of MNF reduces the HSI data dimension, controlling the noise in the HSI.

Table 2: The ground truth classes and their respective training and testing numbers of the Indian Pines dataset (10 classes).
No.   Name of class    Training   Testing
1     Corn-notill      10         1418
2     Corn-mintill     10         820
3     Grass-pasture    10         473
4     Grass-trees      10         720
5
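The three evaluation metrics used in this comparison (OA, AA, and kappa) can all be derived from a confusion matrix. The following minimal sketch uses the standard definitions; the function name `classification_scores` and the row/column convention (rows are true classes, columns are predicted classes) are assumptions made for illustration.

```python
import numpy as np

def classification_scores(conf_mat):
    """Return (OA, AA, kappa); conf_mat[i, j] counts class-i samples predicted as class j."""
    cm = np.asarray(conf_mat, dtype=float)
    total = cm.sum()
    oa = np.trace(cm) / total                      # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))     # mean of per-class accuracies
    # Agreement expected by chance, from the row and column marginals.
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

# Two-class example: 8 of 10 and 9 of 10 test samples classified correctly.
oa, aa, kappa = classification_scores([[8, 2], [1, 9]])
print(oa, aa, kappa)  # approximately 0.85, 0.85, 0.70
```

OA weights every test sample equally, whereas AA weights every class equally, so the two can diverge when the class sizes are as unbalanced as in Table 2.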
Our experiments were performed using MATLAB 2018b on a computer with an Intel Core(TM) i5-4300M 2.60 GHz CPU, 16 GB memory, and a 64-bit Windows 7 system. For the three real HSI datasets, the running time of our algorithm ranged from several minutes to several hours; notably, the other methods reviewed in this study ran faster than our method.

Conclusions and Further Research
In this study, we developed a novel decision fusion method for HSI data classification. In the proposed SS-MNF-ARMG, MNF and multiscale LBP were integrated to extract local spectral-spatial features (SSFs). On the one hand, MNF helped reduce the dimension of the HSI, remove the noise in the HSI, and extract the spectral features from the HSI data. On the other hand, multiscale LBP was applied to the MNF dimension-reduced images to derive spatial features at different scales. These spatial features were further fused with the MNF spectral features to form the SSFs. Compared with some state-of-the-art spectral-spatial classification methods, our experimental results have demonstrated that SS-MNF-ARMG can achieve higher classification accuracy with limited training samples.
This method is effective for distinguishing different land cover types. In addition, a set of optimal parameters for different hyperspectral data can be obtained automatically.
Although the SS-MNF-ARMG algorithm has provided promising results, its classification accuracy still varies across datasets. Further research could attempt to improve the generalization ability of our method. Owing to human ecological destruction or natural disasters (e.g., earthquakes), the vegetation in some areas has changed significantly. Monitoring vegetation restoration in these areas is of substantial importance. Therefore, we plan to research the application of HSI classification to vegetation restoration monitoring.

Data Availability
The data used to support the findings of this study are available at GIC (http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes).

Conflicts of Interest
The authors declare no conflicts of interest.