Land Resource Use Classification Using Deep Learning in Ecological Remote Sensing Images

To address the inability of traditional remote sensing image classification methods to integrate multiple deep learning features effectively, and their resulting poor classification performance, a land resource use classification method based on a convolutional neural network (CNN) for ecological remote sensing images is proposed. A seven-layer CNN is constructed, and the features output by the two fully connected layers of the trained, improved CNN are fused with the features of the fifth pooling layer after dimensionality reduction by principal component analysis (PCA), yielding an effective deep learning feature of land resource remote sensing images. The land resource remote sensing images are then classified with a support vector machine (SVM) classifier. Remote sensing images of the Pingshuo mining area in Shanxi Province are used to evaluate the proposed method. The results show that the edges of the recognized images are clear and that the classification accuracy, misclassification rate, and kappa coefficient are 0.9472, 0.0528, and 0.9435, respectively, indicating good overall performance and classification effect.


Introduction
A remote sensing image is a comprehensive image, obtained by sensors, that reflects various kinds of surface information. Research on target classification in large-area remote sensing images is an important way to obtain land cover information, and it provides basic support for applications in sea situation monitoring, urban planning, environmental supervision, rescue, disaster relief, and military reconnaissance; it is therefore significant from both the socioeconomic and the ecological perspective [1]. With the continuous development of remote sensing technology, remote sensing images are now characterized by hyperspectral content and high spatial resolution. The information obtained from the images is increasingly comprehensive, and its fields of application are expanding [2,3]. Remote sensing images of land resources involve large data volumes, complex information, and fast updates. Therefore, how to accurately extract useful land information from massive remote sensing image data by computer, so as to achieve efficient land use, is a key problem to be solved [4].
In remote sensing image target classification, a pattern recognition system lets the computer automatically assign the pixels of a remote sensing image to patterns that represent certain features, so as to obtain the classification information of the image [5]. In research on land resource use classification from remote sensing images, researchers at first relied mostly on visual interpretation and traditional pattern recognition methods. Visual interpretation is simple, but it is time-consuming and varies between interpreters, which leads to inaccurate classification [6]. Traditional classification methods include the minimum distance method, the maximum likelihood method, and others [7,8]. Reference [9] studies feature extraction from high-resolution remote sensing images for coastal land use planning. Through the analysis of space motion remote sensing image sequences, the characteristic parameters of the land environment and moving objects are obtained, but few ecological factors are considered, so the method does not generalize well. Reference [10] compares the classification results of remote sensing images of specific areas obtained by four methods: random forest, support vector machine, regression tree, and minimum distance. Reference [11] proposed a method based on time series of the normalized difference vegetation index (NDVI). Using a time series NDVI database to correct the classification results can significantly improve the classification accuracy of land cover products, but the method increases the amount of calculation and the cost. Reference [12] combined the object-oriented classification method with fuzzy classification and a CART (classification and regression tree) decision tree to classify the land information of the Dongjiang River Basin and obtained more accurate results than the maximum likelihood method and an unsupervised classification method.
Although the pattern recognition classification method overcomes some shortcomings of visual interpretation, it is not good at extracting spatial information and has poor flexibility.
With the development of remote sensing and computer technology, many new classification methods have emerged, mainly including the artificial neural network (ANN), the support vector machine (SVM), fuzzy theory, and expert systems [13,14]. Reference [15] proposed a remote sensing image classification method combining SVM and k-nearest neighbor. Using the class separability of SVM and the spatial and spectral characteristics of remote sensing data, it designs a distance formula that considers both vector brightness and direction as the measurement standard. The method classifies remote sensing images accurately, but its classification efficiency is low. Reference [16] uses a land segmentation method for remote sensing images based on the convolutional neural network to label different land cover types correctly. However, for remote sensing images with complex backgrounds, more and larger databases are needed for learning and training to complete the classification task well. To address the vulnerability of traditional remote sensing image classification methods to the loss of spatial features, reference [17] proposed an image semantic segmentation method based on a dense coordinate transformation network, which improves the accuracy of semantic segmentation of high-resolution remote sensing images but still depends to some extent on the training data set. Reference [18] proposed a feature integration network, including multiscale features and enhancement stages, for the classification of land remote sensing images; it applies two-dimensional dilated convolution with different sampling rates to each scale feature layer and achieves higher accuracy than ordinary deep learning methods, but its classification efficiency needs further improvement.
Improvements to most classification algorithms can raise the accuracy of land resource classification, but problems remain, such as an excessively large processing scale, complex calculation, and a tendency to fall into local minima. In particular, classification efficiency and speed struggle to meet the needs of current applications, and many problems of hyperspectral remote sensing images of land resources are not well solved [19]. Therefore, this study proposes a land resource use classification method using deep learning in ecological remote sensing images. The innovations of this study are summarized as follows:
(1) Three high-level features of remote sensing images are extracted using the convolutional neural network (CNN), and these deep image features are fused in series. The fused features cover more complete information and are more discriminative.
(2) To further improve the classification performance, the proposed method designs a remote sensing image classifier based on SVM, which combines the deep learning features with the SVM classifier to address poor classifier performance.

Study Area and Data
The visible shortwave infrared hyperspectral camera carried by the "Gaofen-5" (GF-5) satellite has a spectral resolution of 5-10 nm, a spatial resolution of 30 m, and a swath width of 60 km. The camera simultaneously obtains the spatial and spectral information of ground objects in 330 continuous spectral bands in the range of 400-2500 nm. The collected data consist of two parts: visible near infrared (VNIR) and shortwave infrared (SWIR). VNIR has 150 bands and SWIR has 180 bands, for a total of 330. The VNIR band range is about 0.39-1.03 μm with a spectral resolution of about 5 nm, and the SWIR band range is about 1.0-2.5 μm with a spectral resolution of about 10 nm.
The study area is located in the Pingshuo mining area, Shanxi Province, covering about 400 km² between 39°24′52″-39°37′15″ N and 112°16′29″-112°33′43″ E. This area is the largest open-pit coal mine in China, and its ecological environment has been damaged by years of mining. Therefore, it is of great significance to study the land cover types in this area. The data used in the proposed method are visible shortwave infrared hyperspectral data from the GF-5 satellite, four images in total. The corresponding high-resolution image of the "Gaofen-2" satellite covering the closest region (the spatial resolution of the fused image is 0.8 m) and the global 30 m land cover type map were obtained free of charge from Tsinghua University. First, atmospheric correction is carried out to remove the influence of the atmosphere on the image. Then, with reference to the thematic map of land cover types at 30 m spatial resolution, the land cover types are manually delineated on the high-resolution images using the Environment for Visualizing Images (ENVI) platform. Finally, the land cover type map is downsampled to 30 m resolution and used as the ground truth label of land cover in this area, as shown in Figure 1.

System Model.
The proposed method first designs a seven-layer CNN and then inputs high-resolution remote sensing images into it. The five convolution stages are referred to as the first, second, third, fourth, and fifth layers. The first, second, and fifth layers each contain a convolution layer and a pooling layer; the convolution layers are denoted Conv1, Conv2, and Conv5, and the pooling layers are denoted Pool1, Pool2, and Pool5. The third and fourth layers each have only one convolution layer, denoted Conv3 and Conv4. The sixth and seventh layers are fully connected layers, denoted FC6 and FC7, respectively. The overall architecture of the remote sensing land image classification method based on the seven-layer CNN structure is shown in Figure 2.
The training samples of remote sensing images are used to train the CNN. First, the remote sensing image training set is input into the constructed CNN to calculate the output value of each neuron of the CNN.
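The layer layout described above can be summarized as a small data structure. The following is a minimal sketch in Python; the tuple format, the constant names, and the helper `count_ops` are illustrative assumptions, and kernel sizes, strides, and feature-map counts are omitted because they are not given in the text.

```python
# Layer layout of the seven-layer CNN as described in the text.
# Each entry: (stage name, operations in that stage). Hyperparameters such
# as kernel size and stride are not specified in the text and are left out.
CNN_LAYERS = [
    ("layer1", ["Conv1", "Pool1"]),
    ("layer2", ["Conv2", "Pool2"]),
    ("layer3", ["Conv3"]),
    ("layer4", ["Conv4"]),
    ("layer5", ["Conv5", "Pool5"]),
    ("layer6", ["FC6"]),
    ("layer7", ["FC7"]),
]

# The three high-level features used by the method come from these outputs.
FEATURE_SOURCES = ["Pool5", "FC6", "FC7"]


def count_ops(layers, prefix):
    """Count operations whose name starts with the given prefix (hypothetical helper)."""
    return sum(op.startswith(prefix) for _, ops in layers for op in ops)
```

Counting the entries gives five convolution layers, three pooling layers, and two fully connected layers, which matches the description.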
Assuming that layer $l$ is a convolution layer, the $j$-th feature map $y_j^l$ of layer $l$ is calculated as follows:

$$y_j^l = \delta\left(\sum_{i=1}^{M^{l-1}} y_i^{l-1} * \kappa_{ij}^l + b_j^l\right), \tag{1}$$

where $*$ is the convolution operation, $y_i^{l-1}$ is the $i$-th feature map of layer $l-1$, $\kappa_{ij}^l$ is the convolution kernel that connects $y_i^{l-1}$ and $y_j^l$, $b_j^l$ is the offset of $y_j^l$, $\delta$ is the activation function, and $M^{l-1}$ is the number of feature maps of layer $l-1$.
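The convolution-layer calculation can be illustrated with a short pure-Python sketch. It computes one output feature map by convolving each input feature map with its own kernel, summing the results, adding the offset, and applying an activation. The function names are hypothetical, only the "valid" region is computed, and `math.tanh` is a placeholder activation rather than the one used in this study. The kernel is flipped, which is convolution in the strict sense; many CNN libraries compute cross-correlation instead.

```python
import math


def conv2d_valid(image, kernel):
    """2-D 'valid' convolution of one feature map with one kernel (kernel is flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            out[r][c] = sum(
                image[r + u][c + v] * kernel[kh - 1 - u][kw - 1 - v]
                for u in range(kh)
                for v in range(kw)
            )
    return out


def conv_feature_map(prev_maps, kernels, bias, activation=math.tanh):
    """One output map: activation(sum_i prev_maps[i] * kernels[i] + bias)."""
    partial = [conv2d_valid(y, k) for y, k in zip(prev_maps, kernels)]
    rows, cols = len(partial[0]), len(partial[0][0])
    return [
        [activation(sum(p[r][c] for p in partial) + bias) for c in range(cols)]
        for r in range(rows)
    ]
```

Here `prev_maps` plays the role of the feature maps of layer $l-1$, `kernels` holds one kernel per input map for a fixed output index $j$, and `bias` is the offset of that output map.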
Assuming that layer $l$ is a pooling layer, the $j$-th feature map $y_j^l$ of layer $l$ is calculated as follows:

$$y_j^l = \delta\left(\alpha_j^l f\left(y_j^{l-1}\right) + b_j^l\right), \tag{2}$$

where $\alpha_j^l$ is the pooling parameter of $y_j^l$, $y_j^{l-1}$ is the $j$-th feature map of layer $l-1$, $f$ is the pooling function, and $b_j^l$ is the offset of $y_j^l$.

Assuming that layer $l$ is a fully connected layer, the $j$-th feature map $y_j^l$ of layer $l$ is calculated as follows:

$$y_j^l = \delta\left(y^{l-1} + b_j^l\right), \tag{3}$$

where $y^{l-1}$ is the weighted result of all feature maps of layer $l-1$ and $b_j^l$ is the offset of $y_j^l$.

Second, the overall loss function of the CNN is calculated. Let $G_i$ ($i = 1, 2, \ldots, N \times m$) be any labeled sample in the remote sensing image training set. The label of $G_i$ is a one-of-$N$ label; that is, its classification label marks exactly one of the $N$ classes as the true class. For labeled sample $G_i$, if the probability with which the model identifies it as class $k$ ($k = 1, 2, \ldots, N$) is $p_k^i$, then the error $E_i$ of the sample is defined from these probabilities and its label. Based on the errors of all training samples, the loss function $\varphi_E$ of the model is calculated.
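As an illustration of how a per-sample error and an overall loss can be built from a one-of-N label and the predicted class probabilities, the sketch below assumes a cross-entropy error averaged over the training samples. This is a common choice and an assumption made here, not necessarily the definition used in this study. The function names are hypothetical, and class indices are 0-based.

```python
import math


def one_of_n_label(true_class, num_classes):
    """Label vector with 1 at the true class and 0 elsewhere (0-based classes)."""
    return [1.0 if k == true_class else 0.0 for k in range(num_classes)]


def sample_error(label, probs, eps=1e-12):
    """Cross-entropy error of one sample (an assumed form of E_i)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(label, probs))


def total_loss(labels, all_probs):
    """Mean of the per-sample errors (an assumed form of the loss)."""
    errors = [sample_error(t, p) for t, p in zip(labels, all_probs)]
    return sum(errors) / len(errors)
```

With this choice, only the probability assigned to the true class contributes to the error of a sample, and the error approaches 0 as that probability approaches 1.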

Computational Intelligence and Neuroscience
Finally, the gradient descent algorithm is used to minimize the loss function and update the parameters of the network. The purpose of training the CNN is to find the optimal parameters that minimize the loss function $\varphi_E$. The parameters of the CNN are $\kappa_{ij}^l$, $\alpha_j^l$, and $b_j^l$. Let $\psi$ denote these three parameters, that is, $\psi = (\kappa_{ij}^l, \alpha_j^l, b_j^l)$. After the CNN is trained on the remote sensing image training set, a set of parameters $\psi^*$ is obtained as follows:

$$\psi^* = \arg\min_{\psi} \varphi_E.$$

The gradient descent algorithm is used to update the parameter $\psi$ of the CNN and minimize the loss function $\varphi_E$:

$$\psi^{(i)} = \psi^{(i-1)} - \varepsilon \frac{\partial \varphi_E}{\partial \psi},$$

where $\varepsilon$ is the learning rate of the CNN, which determines the size of each adjustment step; $\psi^{(i)}$ is the parameter set after the $i$-th update; $\psi^{(i-1)}$ is the parameter set after update $i-1$; and $\partial \varphi_E / \partial \psi$ is the partial derivative of the loss function $\varphi_E$ with respect to the parameter $\psi$.
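The update rule can be tried on a toy problem. The sketch below applies plain gradient descent to a two-parameter quadratic loss whose minimizer is known. The loss, the function names, the learning rate, and the step count are all illustrative assumptions and have nothing to do with the actual CNN loss.

```python
def gradient_descent(grad, psi0, learning_rate, steps):
    """Repeat psi <- psi - learning_rate * grad(psi) and return the final parameters."""
    psi = list(psi0)
    for _ in range(steps):
        g = grad(psi)
        psi = [p - learning_rate * gi for p, gi in zip(psi, g)]
    return psi


def toy_grad(psi):
    """Gradient of the toy loss (psi_0 - 3)^2 + (psi_1 + 1)^2, minimised at (3, -1)."""
    return [2.0 * (psi[0] - 3.0), 2.0 * (psi[1] + 1.0)]


psi_star = gradient_descent(toy_grad, [0.0, 0.0], learning_rate=0.1, steps=200)
```

For this loss each step shrinks the distance to the minimizer by a constant factor of 0.8, so `psi_star` ends up very close to (3, -1). A larger learning rate takes bigger steps but can overshoot and diverge.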

Improved Activation Function TReLU.
In this study, a TReLU activation function is proposed that combines the advantages of the tanh function and the parametric ReLU (PReLU) function. The TReLU activation function retains the fast convergence of PReLU and alleviates gradient disappearance, and it uses the tanh function to introduce activation values on the negative half axis, whose soft saturation prevents "neuron death" and bias shift and makes the function more robust to noise [20,21]. The mathematical expression of the TReLU activation function is as follows:

$$f(x) = \begin{cases} x, & x > 0, \\ \tanh(\beta x), & x \le 0, \end{cases}$$

where $\beta$ is a variable parameter used to control the unsaturated region of the function. The function curve of TReLU is shown in Figure 3 (for $\beta = 1$). The initial value of $\beta$ is set to 1. As can be seen from Figure 3, the function is approximately linear near the origin and converges quickly [22,23]. Compared with the existing activation functions Sigmoid, ReLU, and PReLU, the proposed activation function has the following advantages:
(1) The Problem of Gradient Disappearance. When $x > 0$, the derivative of the function is always 1, so the TReLU function propagates the gradient without attenuation for $x > 0$, which alleviates the problem.
(2) Activation of Negative Values. The TReLU function retains some gradient in the unsaturated region of the negative half axis. When an activation value falls into the unsaturated region, it is still activated effectively and the characteristics of the image are retained. The size of the unsaturated region is controlled by the parameter $\beta$, so that negative-valued features are activated more effectively [24]. As training proceeds, $\beta$ is adjusted automatically, so more feature values on the negative axis are activated and more information is passed back to earlier layers, which alleviates gradient disappearance [25].
(3) Approximation to 0-Means Distribution. The TReLU function has activation values on the negative half axis, which keeps the mean of each layer's output approximately 0. This effectively alleviates the bias shift of the ReLU activation function, and the weights can be updated quickly, giving faster gradient descent.
(4) Robustness to Noise. The TReLU function is softly saturated on the negative half axis, where its output is confined to $(-1, 0]$. Soft saturation means that the function damps the change in the information passed to the next layer, which makes it robust to noise and reduces complexity.
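The properties listed above can be checked numerically with a small sketch. It assumes the piecewise form suggested by the description, namely the identity on the positive half axis and tanh(βx) on the negative half axis. The function names `trelu` and `trelu_grad` are hypothetical, and β is treated as a fixed argument here rather than as a trainable parameter.

```python
import math


def trelu(x, beta=1.0):
    """Assumed TReLU form: x for x > 0, tanh(beta * x) for x <= 0."""
    return x if x > 0 else math.tanh(beta * x)


def trelu_grad(x, beta=1.0):
    """Derivative of the assumed form with respect to x."""
    if x > 0:
        return 1.0
    t = math.tanh(beta * x)
    return beta * (1.0 - t * t)
```

Under this assumption the gradient is exactly 1 for positive inputs, stays positive but shrinks toward 0 for large negative inputs (soft saturation), and with β = 1 equals 1 at the origin, so the two branches join smoothly there.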

High-Level Feature Extraction.
The designed CNN is used to extract multiple deep features of remote sensing images. First, the whole data set, including all sample images in the training set and test set, is input into the trained CNN, and the features of the first five layers of all sample images are learned automatically by the CNN model. The convolution kernels of the first layer mainly extract low-level features of the image, such as edges, corners, and curves. The input of the second layer is the output of the first layer; its filters detect combinations of low-level features, such as semicircles and quadrilaterals, which correspond to the color, edge, contour, and other features of the image. The third layer captures image texture features. The fourth layer learns more distinctive features, which reflect the differences between classes. The fifth layer learns complete and discriminative key features, which correspond to classes of objects with significant differences in remote sensing images. The output of the fifth pooling layer of the CNN, which includes all the feature maps calculated by that layer, is thus obtained [26]. Then, using equation (3), the outputs $F_6$ and $F_7$ of the fully connected layers FC6 and FC7 are obtained, including all the feature maps calculated by FC6 and FC7. $F_6$ and $F_7$ are two different high-level features of remote sensing images.

Feature Dimensionality Reduction and Classification.
For the output of the fifth pooling layer of the CNN, principal component analysis (PCA) is used to reduce the dimension, and the reduced result is used as the third high-level feature of the remote sensing image. The PCA dimensionality reduction process is as follows:
(1) Matrix Deformation. The output of the fifth pooling layer is reshaped into a two-dimensional matrix $C$, each row of which is the feature vector of one remote sensing training sample.
(2) Zero Mean. The mean of each column of $C$ is subtracted from that column to obtain a new matrix $C_0$, so that every column of $C_0$ has an average value of 0.
After PCA is used to reduce the dimension of the deep features, the enhanced deep learning features are used to train an SVM model. In the multiclass SVM, $\omega'$ is the projection of the multiclass SVM model; $\tau$ is the penalty parameter, which is set to 0.01; $s_a$ is a nonnegative slack variable; and $z_a$ is the enhanced feature after PCA dimensionality reduction. The deep features of the CNN are thus further enhanced and then used to train the SVM classifier, and the trained SVM classifier is tested with the test set.
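The dimensionality reduction and serial fusion can be sketched in pure Python. The zero-mean step follows the description above. The remaining steps (covariance matrix, leading eigenvector, projection) are the textbook PCA procedure and are an assumption about the steps not listed here. Power iteration is used only to keep the sketch dependency-free, and it extracts a single principal component. All function names are hypothetical.

```python
import math


def zero_mean(C):
    """Subtract each column's mean so every column of the result averages to 0."""
    n, d = len(C), len(C[0])
    means = [sum(row[j] for row in C) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in C]


def covariance(C0):
    """Sample covariance matrix (d x d) of a zero-mean data matrix."""
    n, d = len(C0), len(C0[0])
    return [[sum(r[a] * r[b] for r in C0) / (n - 1) for b in range(d)] for a in range(d)]


def first_component(S, iters=500):
    """Leading unit eigenvector of S by power iteration (assumes S is not all zeros)."""
    d = len(S)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(S[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(C0, v):
    """Project every zero-mean sample onto direction v (one value per sample)."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in C0]


def fuse(*features):
    """Serial fusion: concatenate the feature vectors of one sample."""
    return [x for f in features for x in f]
```

For one sample, `fuse(f6, f7, reduced_pool5)` would give the concatenated vector that is passed to the classifier, where the three argument names stand for the FC6 output, the FC7 output, and the PCA-reduced Pool5 output. In practice several principal components would be kept, typically with a linear algebra library rather than power iteration.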

Experimental Environment.
This experiment is based on the TensorFlow framework. TensorFlow provides both low-level and high-level interfaces for network training, has a large and active community, and includes a powerful visualization suite (TensorBoard) that can track network topology and performance, making debugging easier and more convenient. The specific experimental environment is listed in Table 1. On the Ubuntu 16.04 operating system, dependent libraries such as Python and OpenCV are installed first, and then the Python environment and TensorFlow are set up. In addition, the graphics processing unit (GPU) mode is used. After the environment is configured, the network is built according to the designed network structure, including the convolution kernel size, stride, and number of feature maps of each layer.

Evaluating Indicator.
The evaluation indexes are the classification accuracy Acc, the misclassification rate, and the kappa coefficient. The classification accuracy and misclassification rate are calculated as follows:

$$\mathrm{Acc} = \frac{TP}{Num}, \qquad \mathrm{Error} = \frac{FP}{Num},$$

where $TP$ is the number of correctly classified images in the remote sensing image test set, $Num$ is the total number of images, and $FP$ is the number of incorrectly classified images.

Assuming that the actual numbers of samples in each class are $c_1, c_2, \ldots, c_N$ and the predicted numbers are $\eta_1, \eta_2, \ldots, \eta_N$, the kappa coefficient is defined as follows:

$$\mathrm{kappa} = \frac{\mathrm{Acc} - P_e}{1 - P_e}, \qquad P_e = \frac{c_1 \eta_1 + c_2 \eta_2 + \cdots + c_N \eta_N}{Num^2},$$

where Acc is the actual accuracy and $P_e$ is the theoretical accuracy. The higher the kappa coefficient, the better the overall classification accuracy of the method.
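The three indicators can be computed from lists of actual and predicted class labels. The sketch below uses the standard definition of the kappa coefficient, with the theoretical accuracy taken as the sum over classes of the actual count times the predicted count, divided by the squared number of samples. The function name and the 0-based class indices are illustrative assumptions.

```python
def classification_metrics(actual, predicted, num_classes):
    """Return (accuracy, misclassification rate, kappa) for 0-based class labels."""
    num = len(actual)
    tp = sum(a == p for a, p in zip(actual, predicted))      # correctly classified
    fp = num - tp                                            # incorrectly classified
    acc = tp / num
    err = fp / num
    c = [actual.count(k) for k in range(num_classes)]        # actual count per class
    eta = [predicted.count(k) for k in range(num_classes)]   # predicted count per class
    p_e = sum(ck * ek for ck, ek in zip(c, eta)) / (num * num)
    kappa = (acc - p_e) / (1.0 - p_e)  # undefined when p_e == 1 (a single class)
    return acc, err, kappa
```

For example, with actual labels `[0, 0, 1, 1]` and predictions `[0, 0, 1, 0]`, three of four samples are correct, the theoretical accuracy is (2·3 + 2·1)/16 = 0.5, and the kappa coefficient is (0.75 − 0.5)/(1 − 0.5) = 0.5.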

Model Training.
The training set and validation set after PCA dimensionality reduction are used to train the improved CNN model. Figure 4 shows how the training accuracy of the model and the value of the training loss function change as the number of iterations increases.
As can be seen from Figure 4, as the number of iterations increases, the training accuracy of the CNN model gradually stabilizes, reaching at most 93%, and the training loss gradually decreases and flattens, indicating that the model converges well.

Land Classification Result Map.
On the test set samples, the proposed method is used to extract five types of land resources. The proposed method extracts three high-level features from the remote sensing images and fuses them to generate the final classification map. The results are shown in Figure 5.
As can be seen from Figure 5, the five land resource use types are clearly identified. Residential areas are relatively scattered and irregular, yet their locations can be clearly seen in the identification result. The distribution of roads and cultivated land is very regular, and the overall recognition effect is good.

Comparison of Cultivated Land Classification Results.
To evaluate the performance of the proposed method in cultivated land recognition more intuitively, it is compared with the recognition results of the methods used in references [12,15,17]. In the experiment, the trained model is applied to Gaofen-5 imagery for recognition, and the results are shown in Figure 6.
As can be seen from Figure 6(a), cultivated land has regular shapes and clear edges in the image and accounts for a very large proportion of the whole image. Reference [12] adopts traditional fuzzy and decision tree classification, which can identify large areas of a land type, but its identification of small land parcels is poor, and misclassification is obvious. Reference [15] combines SVM and k-nearest neighbor for cultivated land recognition. Because this approach is not suited to processing complex remote sensing images, many pixels are missed at the edges and in light-colored cultivated land. In reference [17], a deep CNN model is used to identify cultivated land. The most important cultivated land locations are extracted accurately, but some pixels are misclassified or missed. The proposed method identifies cultivated land better, with clear contours, and outperforms the other comparison methods.

Comparison of Evaluation Indicator.
The performance of the four classification methods is analyzed quantitatively. The classification accuracy Acc, misclassification rate, and kappa coefficient are listed in Table 2.
It can be seen from Table 2 that the classification accuracy, misclassification rate, and kappa coefficient of the proposed method are 0.9472, 0.0528, and 0.9435, respectively, which are better than those of the other comparison methods. The proposed method adopts the seven-layer CNN structure, improves the activation function, and reduces the dimension by PCA, which improves the classification accuracy. Reference [17] proposed a dense coordinate transformation network for image recognition based on a deep CNN model, but it is not optimized in terms of dimensionality reduction or the activation function. Compared with these methods, the method of reference [12] is more traditional, so its classification effect is not ideal.
Figure 6 panels: (b) Reference [12]. (c) Reference [15]. (d) Reference [17]. (e) Proposed model.

Comparison of Training and Testing Time.
Classification efficiency is another important indicator of land resource use classification. The time consumption of the four methods on the training set and test set is shown in Figure 7.
As can be seen from Figure 7, the proposed method takes the longest time in the training phase, 1.95 s. The method used in reference [12] is relatively simple. The training stage of reference [15] includes only the training of the k-nearest neighbor model, whereas the training stage of reference [17] includes the training of the convolutional neural network and the process of feature extraction. The training stage of the proposed method includes not only the training of the convolutional neural network and the extraction of three deep features but also the fusion of those features, which explains its longer training time. In the test stage, because the calculation in reference [12] is simple, its test time is only 0.72 s. The methods used in references [15,17] are complex, and their test times exceed 1.2 s. After training, the proposed method performs well in the test stage; because PCA dimensionality reduction speeds up the calculation, its test time is about 0.95 s. Overall, the proposed method has the best overall performance and is practical for land resource use classification.

Conclusion
Using a deep learning model to segment and extract ecological remote sensing images yields high-precision land use classification information, which plays an important role in the rational development of land resources and the development of precision agriculture. Therefore, a land resource use classification method based on deep learning in ecological remote sensing images is proposed. The remote sensing image samples are input into the seven-layer CNN model. The activation function of the model is the TReLU function, and the three high-level image features are fused in series and then input into the SVM classifier to complete the classification of land resource remote sensing images. Remote sensing images of the Pingshuo mining area in Shanxi Province are used to evaluate the proposed method. The results show that the improved CNN model converges rapidly and that the image edges recognized by the proposed method are clear. The Acc, error, and kappa coefficient are 0.9472, 0.0528, and 0.9435, respectively, and the training and testing times are 1.8 s and 0.95 s, respectively. The overall performance is better than that of the other comparison methods.
Remote sensing images often contain complex geometric and semantic information. Future work needs to consider not only the semantic information contained in the image itself but also more complex factors such as occlusion, blur, and distortion. In addition, for data augmentation, subsequent work can consider using a GAN model to generate data with the same distribution as real remote sensing images, so as to meet the deep learning model's need for a large amount of training data.
Data Availability
The data included in this paper are available without any restriction.

Conflicts of Interest
The authors declare that they have no conflicts of interest.