Improving Hyperspectral Image Classification Method for Fine Land Use Assessment Application Using Semisupervised Machine Learning

Studies of land use/cover can reveal patterns of change in population, economy, agricultural structure, policy, and traffic, and can better serve regional economic development and urban evolution. Fine land use/cover assessment using hyperspectral image classification is a growing focus in many fields. Semisupervised learning, which uses a large number of unlabeled samples together with a small number of labeled samples to improve classification and prediction accuracy, has become a new research direction. In this paper, we propose a method for improving fine land use/cover assessment based on semisupervised hyperspectral classification. Tests on the study area showed that the semisupervised classification method achieves high overall classification accuracy and an objective assessment of land use/cover.


Introduction
Remote sensing can quickly obtain surface information and, by transferring, processing, and analyzing the data, supports the understanding and study of the spatial distribution of surface characteristics. The advantage of high spectral resolution remote sensing is that it obtains images in many continuous spectral bands. It therefore achieves a fine description of ground targets and can identify features, and it is especially suitable for fine plant classification compared with conventional remote sensing methods [1][2][3][4][5].
Land use information is of great significance to land resource survey, planning, and dynamic monitoring. The analysis of land use/cover using hyperspectral remote sensing images has attracted more and more attention in recent years [6][7][8][9]. Currently, supervised classification and unsupervised classification are the two traditional classification methods for land use/cover. Supervised classification is based on the class probability density functions of samples in feature space. It generally achieves higher classification accuracy but needs many correctly labeled training samples. Pal reported the use of the extreme learning machine (ELM) algorithm for land use classification of hyperspectral images and achieved good results [10]. Bao et al. applied SVM and random feature selection (RFS) to explore the potential of a synergetic use of the two concepts in order to produce highly accurate results [11]. Stankevich et al. proposed a new supervised hyperspectral imagery classification that uses the imagery spectral bands as fuzzy data source attributes and the cumulative mutual information of the resulting fuzzy classification as the decision tree inducing criterion [12].
Unsupervised classification is a clustering method and can detect unknown classes in images. Its advantage is that it is simple and efficient. However, it cannot guarantee a true correspondence between the clustered feature classes and the surface feature classes [13,14].
Acquiring training data for land cover classification using hyperspectral data is time consuming and expensive, especially for hard-to-reach areas. Semisupervised learning for land use/cover classification of hyperspectral images using a small number of labeled samples has become a new research hotspot. Rajan et al. proposed that active learning is well suited for learning or adapting classifiers when there is substantial change in the spectral signatures between labeled and unlabeled data [15]. Jun and Ghosh proposed a semisupervised learning algorithm called Gaussian process expectation-maximization (GP-EM) for classification of land cover based on hyperspectral data analysis [16]. Munoz-Mari et al. proposed a semiautomatic procedure to generate land cover maps from remote sensing images [17]. Jun and Ghosh proposed a semisupervised spatially adaptive mixture model (SESSAMM) to identify land covers from hyperspectral images in the presence of previously unknown land-cover classes and spatial variation of spectral responses [18].
These research results have promoted the development of land use/cover assessment, but the methods still have some problems. On the one hand, the probability of misjudgment with these strategies is relatively high. On the other hand, these methods still cannot improve classification accuracy significantly, especially when there are many classes.
In this paper, a new fine land use/cover assessment method is proposed, using hyperspectral image classification based on a semisupervised learning model that combines Rényi entropy and multinomial logistic regression. A land use/cover assessment experiment is then carried out on real hyperspectral remote sensing image data. It shows that the method can improve classification accuracy and fine land use/cover assessment.

Modelling of Fine Land Use/Cover Assessment
The description model of hyperspectral image data is shown in Figure 1. The spectral curves of different tree species are shown in Figure 2. Furthermore, hyperspectral remote sensing can distinguish vegetation from green artificial paints, which the human eye cannot tell apart. Liu et al. used an ultraviolet, visible, and near-infrared spectrometer outside the laboratory to measure the reflectance spectra of various kinds of vegetation, such as Indus, Camphor, Broussonetia papyrifera, and Vine vegetable, and compared them with the reflection curves of some green paints. The results show that the features of the spectral curves can distinguish vegetation from green paint [19]. The differences between the green paint spectrum and the vegetation spectrum are shown in Figure 3.

The Method of Semisupervised Classification.
The core idea of this semisupervised classification method is as follows. First, a small number of selected sample data are fitted with the multinomial logistic regression algorithm. The fitted regression coefficients effectively describe the direct relationship between each selected sample pixel and its category.
The hyperspectral image is then classified using the fitted regression coefficients. Second, the entropy of the experimental area is calculated with the Rényi entropy method, proposed by Rényi in 1961 [20], and some unlabeled samples of maximum Rényi entropy are selected and added to the sample data. The multinomial logistic regression classification is iterated until the classification accuracy tends to be stable.

The Principle of Multinomial Logistic Regression Algorithm.
The multinomial logistic regression algorithm can estimate the fitted coefficients quickly and accurately [21,22]. The vector of regression coefficients $\omega = (\omega_1, \ldots, \omega_K)$ is obtained by applying the multinomial logistic regression algorithm to the labeled training sample set. Equation (1) gives the probability that pixel $x_i$ belongs to class $k$ under the multinomial logistic model:

$$p(y_i = k \mid x_i, \omega) = \frac{\exp\left(\omega_k^{T} h(x_i)\right)}{\sum_{j=1}^{K} \exp\left(\omega_j^{T} h(x_i)\right)}, \quad (1)$$

where $k = 1, \ldots, K$ and $h(x_i)$ can be represented as

$$h(x_i) = \left[h_1(x_i), \ldots, h_l(x_i)\right]^{T}. \quad (2)$$

The coefficient vector $\omega$ is estimated with the Bayesian maximum a posteriori criterion and the log-likelihood [23,24]:

$$\hat{\omega} = \arg\max_{\omega} \left\{ \ell(\omega) + \log p(\omega) \right\}, \quad (3)$$

where $\ell(\omega)$ is the log-likelihood of the labeled training set and $p(\omega)$ is the prior on $\omega$. Equation (3) yields the fitted coefficients of the multinomial logistic classification. The class $\mathrm{CLA}(x_{n+1})$ of a new pixel $x_{n+1}$ is then calculated as

$$\mathrm{CLA}(x_{n+1}) = \arg\max_{k} \; p(y_{n+1} = k \mid x_{n+1}, \hat{\omega}). \quad (4)$$

Calculating the predicted category $(\mathrm{CLA}_1, \ldots, \mathrm{CLA}_n)$ of every pixel of the hyperspectral remote sensing image with (4) completes one classification pass.
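To make the prediction step concrete, the following minimal sketch evaluates the multinomial logistic class probabilities for one pixel and picks the most probable class. It is an illustration only: the function names `mlr_posterior` and `classify` and the example coefficients are hypothetical and do not come from the paper, and the feature vector is assumed to be supplied directly.

```python
import math

def mlr_posterior(weights, features):
    """Class probabilities of one pixel under a multinomial logistic model.

    weights  -- one coefficient vector per class (hypothetical values below)
    features -- feature vector h(x) of the pixel
    """
    scores = [sum(w * f for w, f in zip(w_k, features)) for w_k in weights]
    top = max(scores)  # shift by the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(weights, features):
    """Index of the class with the largest predicted probability."""
    probs = mlr_posterior(weights, features)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical coefficients for 3 classes and 2 features (not fitted values).
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
probs = mlr_posterior(weights, [2.0, 0.5])
label = classify(weights, [2.0, 0.5])
```

Subtracting the largest score before exponentiating leaves the probabilities unchanged but avoids overflow when scores are large.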
An important issue is how to add new samples to the training set during semisupervised classification. This process is conducted automatically using the predictions of the current classifier. One option is to add the samples to which the multinomial logistic regression classifier assigns the highest probability. Although the category of such samples is accurate, they contain the least information. Adding them to the training set has the smallest positive effect on the next classification pass and cannot improve classification accuracy. On the contrary, it increases the computational burden of selecting training samples. How to select the unlabeled samples to add to the training set is therefore very important.

Selecting Unlabeled Samples Using Rényi Entropy.
Rényi entropy reflects the uniformity of the distribution of attribute values; it measures the degree of uncertainty and the amount of information. This paper uses the Rényi entropy of order 2 to describe the amount of information:

$$H_2(x_i) = -\log \sum_{k=1}^{K} p_k(x_i)^2, \quad (6)$$

where $p_k(x_i)$ is the predicted probability that pixel $x_i$ belongs to class $k$, because the object of study is hyperspectral image data. Equation (6) is normalized to prevent infinite values from appearing during the calculation.

A hyperspectral image data set contains a large amount of implicit information; comprehensive analysis of the statistical information in the data set yields regularities that link the data. A sample with larger entropy is one whose classification is uncertain for the current classifier, and it is therefore the most informative sample. The classifier is retrained on the new training set, augmented with the unlabeled samples of maximum Rényi entropy, and used for the next classification pass. This greatly improves the overall accuracy of hyperspectral image classification. Let $x = (x_1, \ldots, x_n) \in \mathbb{R}^{d \times n}$ represent the hyperspectral image data; the samples of maximum Rényi entropy are selected by

$$U(x) = \arg\max_{x_i \in x} H_2(x_i),$$

where $U(x)$ is the final choice of unlabeled set and $H_2(x_i)$ is the Rényi entropy calculated for each pixel of the image in the last classification pass. The Rényi entropy values are sorted, and the $u$ samples extracted from them are added to the existing training sample set; $u$ is the number of labels in the supplementary training set.
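As an illustration of this selection rule, the sketch below computes the order-2 Rényi entropy of each pixel's predicted class probabilities and returns the indices of the most uncertain pixels. The names `renyi2` and `select_most_uncertain` and the probability rows are hypothetical; the normalization mentioned in the text is not applied here, since its exact form is not assumed.

```python
import math

def renyi2(probs):
    """Order-2 Rényi entropy of a discrete probability distribution."""
    return -math.log(sum(p * p for p in probs))

def select_most_uncertain(prob_rows, unlabeled, u):
    """Indices of the u unlabeled pixels with the largest order-2 entropy."""
    ranked = sorted(unlabeled, key=lambda i: renyi2(prob_rows[i]), reverse=True)
    return ranked[:u]

# Made-up predicted class probabilities for four pixels and three classes.
prob_rows = [
    [0.98, 0.01, 0.01],      # confidently classified: low entropy
    [0.40, 0.35, 0.25],      # uncertain
    [0.70, 0.20, 0.10],
    [1 / 3, 1 / 3, 1 / 3],   # maximally uncertain: entropy log(3)
]
chosen = select_most_uncertain(prob_rows, unlabeled=[0, 1, 2, 3], u=2)
```

The confidently classified pixel (index 0) has the lowest entropy, which matches the observation that the highest-probability samples carry the least information.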
The algorithm proceeds as follows.
Input. The original hyperspectral image $x = (x_1, \ldots, x_n)$, the labeled training set $L$, and the set $U$ of remaining unlabeled pixels in the image.
Output. The classification result of the hyperspectral image data.

The Process of Algorithm.
The process is described as below.
Step 4. Use the regression coefficients to build the model used for predicting the classification.
Step 8. Return to Step 3 and iterate the semisupervised learning process for the specified number of iterations.
Step 9. Output the classification result of the hyperspectral image; the algorithm ends.
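The overall loop can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature map is assumed to be a bias term plus the raw band values, the prior is assumed Gaussian (an L2 penalty), the fit uses plain gradient descent, newly selected pixels are labeled with the current classifier's prediction, and all function names and the toy two-band pixels are hypothetical.

```python
import math

def softmax(scores):
    top = max(scores)  # shift by the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, pixel):
    feats = [1.0] + list(pixel)  # assumed feature map: bias + raw band values
    return softmax([sum(w * f for w, f in zip(w_k, feats)) for w_k in weights])

def fit_mlr(pixels, labels, n_classes, steps=300, lr=0.5, l2=1e-3):
    """Gradient-descent MAP fit with an assumed Gaussian prior (L2 penalty)."""
    dim = len(pixels[0]) + 1
    weights = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(steps):
        grads = [[l2 * w for w in w_k] for w_k in weights]
        for pixel, label in zip(pixels, labels):
            feats = [1.0] + list(pixel)
            probs = predict_proba(weights, pixel)
            for k in range(n_classes):
                err = (probs[k] - (1.0 if k == label else 0.0)) / len(pixels)
                for d in range(dim):
                    grads[k][d] += err * feats[d]
        for k in range(n_classes):
            for d in range(dim):
                weights[k][d] -= lr * grads[k][d]
    return weights

def renyi2(probs):
    return -math.log(sum(p * p for p in probs))

def semisupervised_classify(pixels, labeled, n_classes, rounds=2, u=2):
    """labeled maps pixel index -> class; returns one predicted class per pixel."""
    labeled = dict(labeled)  # do not modify the caller's dictionary
    for round_no in range(rounds + 1):
        idx = sorted(labeled)
        weights = fit_mlr([pixels[i] for i in idx],
                          [labeled[i] for i in idx], n_classes)
        probs = [predict_proba(weights, p) for p in pixels]
        classes = [max(range(n_classes), key=row.__getitem__) for row in probs]
        unlabeled = [i for i in range(len(pixels)) if i not in labeled]
        if round_no == rounds or not unlabeled:
            break
        # Add the u unlabeled pixels of maximum entropy to the training set,
        # labeled with the current classifier's prediction (an assumption).
        unlabeled.sort(key=lambda i: renyi2(probs[i]), reverse=True)
        for i in unlabeled[:u]:
            labeled[i] = classes[i]
    return classes

# Toy two-band "pixels": four near (0.2, 0.2) and four near (0.8, 0.8).
pixels = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.3, 0.3),
          (0.9, 0.8), (0.8, 0.9), (0.85, 0.75), (0.7, 0.7)]
result = semisupervised_classify(pixels, {0: 0, 4: 1}, n_classes=2)
```

A fixed number of rounds is used here for simplicity; stopping when the accuracy on held-out labeled pixels stabilizes would follow the text more closely.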

The Study Area and Validation Images
The typical test images and their parameters are presented in Table 1.

Experiment of Land Use/Cover Assessment
4.1. Data Preprocessing. The first step in data preprocessing is radiometric correction. Radiometric correction is the key step of quantitative analysis and is the premise of quantitative analysis, reflectance retrieval, and information extraction. The second step is dimension reduction of the hyperspectral image. Thanks to their high spectral resolution, hyperspectral images can distinguish subtle differences between surface features, but they also bring the curse of dimensionality, which seriously affects the accuracy and efficiency of hyperspectral image classification. Dimension reduction techniques use image feature extraction to express the characteristics of high-dimensional data effectively with low-dimensional data. This not only preserves the image information but also reduces the volume of data effectively. A common dimension reduction algorithm is principal component analysis [25].
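To illustrate the dimension reduction step, the sketch below extracts the first principal component of a set of pixel spectra and projects each pixel onto it. It is a minimal example under assumptions: power iteration is used as a simple stand-in for a full eigendecomposition, only one component is computed, and the function name and the three-band sample pixels are hypothetical.

```python
import math

def first_principal_component(pixels, iterations=100):
    """Return (direction, scores) of the first principal component.

    pixels -- list of equal-length band vectors, one per pixel
    """
    n, d = len(pixels), len(pixels[0])
    means = [sum(p[j] for p in pixels) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in pixels]
    # Sample covariance matrix between bands.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # starting direction
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # constant data: no variance to explain
            break
        v = [x / norm for x in w]
    # Project each centered pixel onto the component to get 1-D scores.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores

# Made-up pixels whose three bands are perfectly correlated.
pixels = [(1.0, 2.0, 0.5), (2.0, 4.0, 1.0), (3.0, 6.0, 1.5), (4.0, 8.0, 2.0)]
direction, scores = first_principal_component(pixels)
```

For real hyperspectral data with hundreds of bands, a library eigendecomposition or singular value decomposition would normally be used to obtain several components at once.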

Comparison for Different Classification Methods.
An important step in land use/cover research is to establish a scientific classification method for the study area. To illustrate the effectiveness of the proposed semisupervised classification method, several representative machine learning classification methods, both supervised and unsupervised, are compared with it. The Hyperion data set is used to analyze the performance of the proposed method in comparison with the other methods. The results of the experiment are shown in Figure 5 and Table 2, and the following observations can be made from the classification results of the various methods. The overall classification accuracy of the k-means method is the lowest, although its running time is the shortest. Figure 5 shows serious omission and misclassification errors, and the unsupervised classifier distinguished only five classes.
The overall classification accuracy of the supervised minimum distance classifier is better than that of the unsupervised k-means classifier. However, Figure 5(b) clearly shows poor quality: the buildings class suffers serious misclassification.
The overall classification accuracy of the artificial neural network classifier in the experiment is higher than that of the unsupervised method because the feature categories are simpler and the image size is smaller in the experiment. The disadvantages of the method include longer running time, inefficiency, and misclassification errors. Figure 5(c) shows that the method cannot distinguish the buildings class from wilderness. The road class has almost disappeared from the classification results.
The support vector machine classifier is a relatively high-quality method, with fast running speed, relatively high overall classification accuracy, and the best classification of large contiguous areas. However, it is weak at resolving small surface features and produces omission errors. For instance, the SVM method eliminates the small building and road features in Figure 5(d).
The semisupervised classification method again shows good performance in the experiment, as seen in Figure 5 and Table 2. The proposed semisupervised classifier can effectively improve the accuracy of classification and obtain better classification results, especially for small features such as roads and buildings.
The experimental results above show that the new semisupervised method obtains better classification results for hyperspectral image data and effectively improves the classification accuracy, which reaches 97.31%. The experiment also showed that the new semisupervised method is especially suitable for large and complex images.
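Table 2 reports overall accuracy and the Kappa coefficient for each method. As a reminder of how these two figures are derived from a confusion matrix, here is a minimal sketch; the function name is hypothetical and the two-class matrix is made-up illustration data, not the paper's results.

```python
def overall_accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix.

    confusion[i][j] -- number of samples of true class i predicted as class j
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Made-up confusion matrix for two classes (100 reference samples).
confusion = [[45, 5],
             [10, 40]]
accuracy, kappa = overall_accuracy_and_kappa(confusion)
```

Kappa discounts the agreement expected by chance, so it is lower than overall accuracy whenever the class marginals alone would already produce some correct labels.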

4.3. Making the Thematic Maps. The thematic maps not only provide an objective assessment of the classification result but can also be used to check the state of land use/cover in different periods.
The thematic maps of the various categories of surface features are shown in Figure 6.
Figure 6 shows that barren vegetated land occupies the largest area in the experimental zone and that buildings occupy the smallest area. Grasslands are highly concentrated overall. Croplands are concentrated around water bodies, where they are irrigated more effectively.

Conclusions and Future Work
In this study, a semisupervised classification method for hyperspectral remote sensing images is applied to improve fine land use/cover assessment. First, the paper sets out the basic theory of fine land use/cover assessment, namely, the differences between spectral characteristics. Second, the semisupervised classification method is described in detail. Third, the paper describes the study area and the parameters of the validation images. Finally, the fine land use/cover test of the study area is conducted using semisupervised hyperspectral image classification. The experimental results show that the method achieves high overall classification accuracy. The thematic maps of the classification results provide an objective assessment of land use/cover. Studies of land use/cover can reveal patterns of change in population, economy, agricultural structure, policy, and traffic, and can better serve regional economic development and urban evolution.
However, the proposed semisupervised method still has shortcomings, such as a long running time, and it needs more computing power and hardware. Future research will focus on optimizing the method, reducing the running time, and improving efficiency.

Figure 1: The description model of hyperspectral image data. (a) The model of the original hyperspectral image. (b) The spectral model of the original image. (c) The ground feature model of the original image.

Figure 2: The spectral curves of different tree species.

Step 5. Select the unclassified pixels in the image and predict their classification result $(\mathrm{CLA}_1, \ldots, \mathrm{CLA}_n)$.
Step 6. Calculate the Rényi entropy value of each unclassified pixel.
Step 7. Extract the $u$ new pixels of maximum Rényi entropy from the unlabeled sample set, denoted $U_{\text{new}}$, and use them to update the training set $L$: $L_{\text{new}} = L \cup U_{\text{new}}$.

Figure 4: The T-1 original image of the study area. (a) The original Hyperion hyperspectral image (bands 29, 20, and 11). (b) The training sites overlaid on the original image. (c) The spectral curves of the 6 major feature categories representing the land cover types of the study area. (d) The thematic map definition of the 6 classes.

Figure 5: The classification results of the study area using various classifier methods. (a) The classified image using the k-means classifier method. (b) The classified image using the minimum distance classifier method. (c) The classified image using the artificial neural network classifier method. (d) The classified image using the support vector machine classifier method. (e) The classified image using the semisupervised classifier proposed in this paper.
The classification results of the various classification methods are shown in Figure 5. Figure 5(a) shows the results of the unsupervised k-means classifier, Figure 5(b) those of the supervised minimum distance classifier, Figure 5(c) those of the supervised artificial neural network classifier, and Figure 5(d) those of the supervised support vector machine classifier. Figure 5(e) shows the results of the semisupervised classifier proposed in this paper.

Figure 6: The thematic maps of the various categories of surface features in the study area. (a) The thematic map of barren vegetated land. (b) The thematic map of croplands. (c) The thematic map of grasslands. (d) The thematic map of wilderness. (e) The thematic map of water. (f) The thematic map of buildings.

Table 1: Parameters of the original test images.

Table 2: Overall accuracy and Kappa coefficient based on comparison of the varying classification methods.