Remote Sensing Image Land Classification Based on Deep Learning

To address the problem that high-resolution remote sensing images contain many features and that classification accuracy is low when a single feature description is used, a remote sensing image land classification model based on deep learning is proposed from the perspective of ecological resource utilization. Firstly, the remote sensing images obtained by the Gaofen-1 satellite, including multispectral data and panchromatic data, are preprocessed. Then, the color, texture, shape, and local features are extracted from the image data, and the feature-level image fusion method is used to associate these features and realize the fusion of remote sensing image features. Finally, the fused image features are input into the trained deep belief network (DBN) for processing, and the land type is obtained by the Softmax classifier. Experimental analysis of the proposed model on the Keras and TensorFlow platform shows that it can clearly classify all land types, and the overall accuracy, F1 value, and inference time of the classification results are 97.86%, 87.25%, and 128 ms, respectively, which, taken together, compare favorably with the other comparative models.


Introduction
Remote sensing technology observes ground objects by acquiring remote sensing images from different working platforms and then processing the remote sensing information to obtain dynamic information about the ground [1]. With the continuing development of remote sensing technology, remote sensing image data are applied in more and more fields. Remote sensing image classification is of great significance for obtaining image information and has a wide range of applications in national defense and security construction, urban planning, disaster monitoring, land use, landscape analysis, agricultural remote sensing, etc. [2,3].
Remote sensing image classification is a hot issue in this field. In the past, manual visual interpretation was mostly used. This not only consumes manpower and is inefficient, but also fails to improve accuracy [4,5]. With the development of computer technology, image classification using computers combined with appropriate algorithms has replaced manual classification and has become the mainstream. Commonly used methods include neural networks and genetic algorithms [6]. In recent years, remote sensing image acquisition technology has developed rapidly, and the acquired images have become more and more abundant, such as hyperspectral images and high-resolution images that contain richer feature information [7]. However, the rich feature information also brings certain difficulties to classification. How to reasonably use the rich feature information to achieve efficient and accurate remote sensing image land classification is an urgent problem to be solved [8,9]. At the same time, the overall planning and application of remote sensing image data is becoming more refined, and the monitored results and data need to be more comprehensive and accurate. Therefore, remote sensing image land classification faces a new challenge [10].
To support the refined, comprehensive application of remote sensing images, a remote sensing image land classification model based on deep learning is proposed from the perspective of ecological resource utilization. Compared with the traditional model, its innovations are summarized as follows: (1) Since classification based on a single feature description is not accurate enough, the proposed model extracts 9 features covering color, texture, local, and shape information. Feature fusion based on the feature-level image fusion method retains the effective discrimination information of the features to the greatest extent and improves the reliability of classification. (2) In order to improve the accuracy of remote sensing image land classification, the proposed model uses DBN to process the fused image features. Combining the results of forward unsupervised classification with the label data, the trained network is fine-tuned according to the law of error back propagation, thus shortening the time of land classification.

Related Research
Remote sensing image classification methods usually include supervised classification and unsupervised classification. However, the classification accuracy of traditional remote sensing image classification methods is low, and manual interpretation is easily affected by subjective factors, which reduces the credibility of classification [11]. Traditional methods for automated (nonmanual) classification usually include feature-space indicator kriging (FSIK), the traditional parametric maximum likelihood (ML) method, the widely used nonparametric support vector machine (SVM), and random forest (RF). Reference [12] proposed a stochastic simulation classification algorithm based on feature-space index simulation. It performs well in improving the accuracy of remote sensing image classification, but its performance on images with complex backgrounds needs to be improved. Reference [13] studied land use in the priority areas of national forest resources and, based on satellite images from three periods, used the ML method for supervised image classification, achieving high classification accuracy. However, the performance of purely spectral, per-pixel segmentation still needs to be improved. Reference [14] studied multisensor data fusion for land classification in a semiarid environment and proposed a multispectral image classification method based on wavelet transform, which achieves high classification accuracy. Reference [15] proposed a land remote sensing image classifier based on RF, which classifies land images in complex background environments in the case of multidata fusion. However, the parameter setting of the algorithm is complicated, which makes practical application difficult.
With the increase in the resolution of remote sensing images, the images contain complex and diverse information, but traditional methods cannot make good use of the various features. The growth of high-resolution remote sensing data and the development of computer technology make it possible to apply deep learning to remote sensing image classification [16]. Convolutional neural networks (CNN), DBN, and autoencoders (AE) are the main models of deep learning [17]. Reference [18] compared deep learning with the non-deep-learning SVM method, and the CNN-based classification method showed better classification performance. Reference [19] proposed a remote sensing image segmentation method based on deep learning. Shallow learning can output different classification results for a certain input, whereas deep learning can continue to learn from the shallow output to improve the accuracy of image classification. However, the performance of the algorithm depends too much on the training samples, and the transferability is poor. Reference [20] proposed a multiscale dense network (MSDN) for hyperspectral remote sensing image (HSI) classification. It makes full use of the different scale information in the network structure and combines the scale information of the entire network to achieve two-dimensional HSI feature extraction with different accuracy levels. However, it cannot balance the image classification performance under dominant and individual factors. Reference [21] proposed a remote sensing image segmentation method based on Fletcher-Reeves CNN for the situation in which few training samples are available in the actual image classification process. The anti-interference and convergence performance of that model is analyzed with respect to different training sample datasets, different batch numbers of training samples, and iteration time, but it adapts poorly to image classification with different rules and different scales.
The abovementioned CNN-based research has achieved good results in the field of remote sensing image classification, but it rarely addresses the use of ecological resources. Therefore, a remote sensing image land classification method based on deep learning from the perspective of ecological resource utilization is proposed, which improves the efficiency and accuracy of classification while taking into account the utilization of ecological environment resources. Table 1.

Data Collection and Preprocessing
The standard products of the Gaofen-1 satellite are mainly divided into two categories: level 1A and level 2A. Level 1A is a preprocessed, radiometrically corrected image product. It consists of level 0 data processed by data parsing, normalized radiometric correction, denoising, image stitching, and band registration, and it provides RPC files generated from satellite in-orbit data. Level 2A is the preliminary geometrically corrected image product, which is produced by applying geometric correction and map projection to the level 1A data.

Multispectral Data Preprocessing
(1) Radiometric calibration. The ENVI software is used to perform radiometric calibration on the GF-1 remote sensing image, and the gray value of the image is converted into the radiance at the entrance pupil of the sensor. For GF-1 remote sensing images acquired in different time periods, ENVI can automatically select the radiometric calibration coefficients released for the corresponding period.
(2) Atmospheric correction. Electromagnetic waves are absorbed and scattered by the atmosphere during transmission and are disturbed to varying degrees, so atmospheric correction of the remote sensing images is necessary. The Fast Line-of-sight Atmospheric Analysis of Hypercubes (FLAASH) module of ENVI 5.3 is used for atmospheric correction to reduce the influence of the atmosphere. The software module can automatically read the sensor height, the longitude and latitude of the center point, the sensor type, the pixel size, the imaging time, and other information. The average height of the imaging area, the atmospheric model, and the aerosol type are then set, and the surface reflectance image is finally obtained.
(3) Orthorectification. First, the RPB file that comes with the GF-1 image is used to perform "orthorectification" processing on the surface reflectance image based on the RPC model. The digital elevation model (DEM) used is a ZY-3 DEM with a spatial resolution of 8 m.
Secondly, the ZY-3 digital orthophoto map (DOM) with a spatial resolution of 2 m is taken as a reference. The automatic matching algorithm is used to perform image-to-image registration on the "orthorectification" result image, and 30 control points are manually collected to check the correction accuracy. For the GF-1 remote sensing image of 2018, the east-west error is 4.15 m and the north-south error is 1.25 m. For the GF-1 remote sensing image of 2020, the east-west error is 1.65 m and the north-south error is 2.05 m.

Panchromatic Data Preprocessing.
Since the research needs to fuse the panchromatic data of September 2020 into the multispectral data for classification, the panchromatic band of the GF-1 image of this period needs to be preprocessed.
Firstly, radiometric calibration and atmospheric correction are performed on the panchromatic image in ENVI. Secondly, the panchromatic image is orthorectified after radiometric calibration. The ZY-3 DEM is used again, but it is oversampled before correction so that its pixel size is 0.8 m, and the panchromatic image is then orthorectified based on the RPC model. Next, the automatic image registration tool in ENVI is used, with the ZY-3 DOM as the reference image. By automatically finding control points and establishing a polynomial registration equation, the orthorectified GF-1 panchromatic image is corrected to the geographic coordinate space of the DOM image. Finally, ENVI's fusion tool NNDiffuse Pan Sharpening is used to fuse the panchromatic and multispectral images to obtain multispectral remote sensing images with a high spatial resolution of 0.8 m.

Multifeature Fusion.
Image fusion improves image clarity and information content and can accurately, reliably, and comprehensively obtain target or scene information. Fusion is mainly divided into three levels: pixel level, feature level, and decision level. The most basic of the three is pixel-level image fusion. Through pixel-level fusion, more detailed information, such as edge and texture information, can be obtained, which is helpful for image analysis and processing. Hidden targets can also be revealed, which helps to recognize and extract hidden target points. With this method, more information in the original image can be preserved, and the content and details of the fused image also increase. This advantage is unique to pixel-level fusion [22]. However, pixel-level image fusion also has drawbacks, because the method operates on pixels. An image contains a large number of pixels, which leads to a long computation time, and the fusion result cannot be obtained quickly. If the registration is wrong, the targets and details of the fused image will be blurred, resulting in large errors [23]. The feature-level image fusion method extracts feature information from the original image. Based on the researcher's analysis of the objects of interest in the image, such as vehicles, pedestrians, and numbers, the relevant feature information that can fully express the target is extracted. Compared with using the original image, the accuracy of target recognition and extraction based on the fused feature information is significantly improved. Feature-level fusion yields compressed image information, which is then used for computer analysis and processing. Compared with the pixel-level fusion method, the memory and time consumption are reduced, and the fusion result can be obtained faster.
Decision-level image fusion is the highest-level fusion method and is based on cognition [24]. The method can be applied in a targeted manner according to the specific requirements of the problem and makes a decision based on the characteristics obtained at the feature level, certain criteria, and the probability that the target exists. Among the three levels, it requires the least computation, but it depends on the results of the previous level. Therefore, compared with the other two fusion methods, the image obtained is less clear, and practical implementation is also very difficult.
In summary, the feature-level image fusion algorithm is selected to associate the extracted multiple types of features. Feature-level fusion can complement the information of single features while eliminating redundant information. The image feature fusion process is shown in Figure 2. It retains the effective discrimination information of the features to the greatest extent and provides a basis for the classification decision of the classifier.

Classification Model
DBN is a kind of probabilistic generative model, which establishes the joint distribution between input data and label data through the learning process [25]. Structurally, the DBN model is composed of multiple layers of restricted Boltzmann machines (RBM) and a Softmax classifier on top. Correctly constructing a DBN model is the key to accurately and efficiently extracting land types from remote sensing images. Reasonably designing the framework of the DBN model, such as the number of RBM layers, can effectively improve the classification efficiency. Determining reasonable operating parameters for the DBN model, such as the learning rate, the number of forward unsupervised learning passes, and the number of hidden layer neurons, can greatly improve the accuracy of the classification results [26]. Considering the classification effect and training efficiency of the model, a DBN model with a network specification of 124-250-250-2 is constructed. The model structure is shown in Figure 3.
By setting up a control experiment and comparing the classification efficiency of the model, the number of RBM layers stacked in the DBN model is set to two. An RBM is a special generative neural network. A single RBM is a two-layer neural network composed of a visible layer and a hidden layer. The classification process of the DBN model is based on pixel-level classification. Therefore, the number of neurons in the visible layer is the same as the dimensionality of the multispectral data; that is, the visible layer is composed of 132 neurons. The essence of the forward learning process of the DBN model is feature extraction. When the RBM maps the characteristic information of the neurons in the visible layer, the neurons in the hidden layer of the RBM have the same activation probability. After several rounds of training, the characteristics of the visible layer neurons can be accurately expressed. Through experiments, the number of neurons in the hidden layer is determined to be 280. At this point, the RBM can be regarded as an autoencoder for extracting the characteristic information of the neurons in the visible layer. The energy function between the visible layer and the hidden layer is expressed as follows:
$$E(v, h) = -\sum_{i} b_{1i} v_i - \sum_{j} b_{2j} h_j - \sum_{i} \sum_{j} v_i \omega_{ij} h_j,$$
where $v_i$ and $h_j$ are the states of visible layer neuron $i$ and hidden layer neuron $j$, $\omega_{ij}$ is the weight connecting visible layer neuron $i$ and hidden layer neuron $j$, and $b_1$ and $b_2$ are the biases of the visible layer neurons and the hidden layer neurons, respectively. The joint probability distribution between neurons is calculated as follows:
$$P(v, h) = \frac{e^{-E(v, h)}}{Z}, \qquad Z = \sum_{v, h} e^{-E(v, h)}.$$
Assuming that the input value of the DBN model is $X$ and the output value of the hidden layer is $H$, the weights and biases connecting the hidden layer neurons and the output layer neurons are updated according to $\delta_k$ and $\varepsilon$, where $\delta_k$ is the difference between the actual output value of the DBN model and the true category of the input value, and $\varepsilon$ is the learning rate of the DBN model.
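The RBM described above can be illustrated with a short NumPy sketch. It evaluates the energy of one binary configuration and performs a single parameter update. The text does not state how the RBM layers are trained, so the use of one-step contrastive divergence (CD-1) here is an assumption, as are the function names `rbm_energy` and `cd1_step` and the toy layer sizes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_energy(v, h, W, b_vis, b_hid):
    """Energy of one configuration: -b_vis.v - b_hid.h - v^T W h."""
    return -(b_vis @ v) - (b_hid @ h) - (v @ W @ h)

def cd1_step(v0, W, b_vis, b_hid, lr, rng):
    """One CD-1 update (assumed training rule) on a batch v0 of shape (n, n_vis)."""
    p_h0 = sigmoid(v0 @ W + b_hid)                       # P(h = 1 | v0)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # sampled hidden states
    p_v1 = sigmoid(h0 @ W.T + b_vis)                     # reconstruction of the visible layer
    p_h1 = sigmoid(p_v1 @ W + b_hid)                     # hidden probabilities for the reconstruction
    n = v0.shape[0]
    W_new = W + lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis_new = b_vis + lr * (v0 - p_v1).mean(axis=0)
    b_hid_new = b_hid + lr * (p_h0 - p_h1).mean(axis=0)
    return W_new, b_vis_new, b_hid_new

# Toy sizes for illustration only (not the layer sizes reported in the text).
rng = np.random.default_rng(0)
n_vis, n_hid = 6, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)
batch = rng.integers(0, 2, size=(8, n_vis)).astype(float)
W, b_vis, b_hid = cd1_step(batch, W, b_vis, b_hid, lr=0.1, rng=rng)
```

Stacking two such layers, with the hidden activations of the first serving as the visible input of the second, would correspond to the layer-by-layer structure described in the text.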
The classification process of the DBN model consists of two stages: forward unsupervised "layer-by-layer initialization" learning and reverse supervised "fine-tuning" learning. The classification process is shown in Figure 4. The first stage of training is also called the pretraining process. The DBN model performs forward training through a layer-by-layer initialization learning method. The stacked RBM layers map and transfer the characteristic information of the input layer data in turn. The proposed model has a Softmax classifier on top of the top RBM. The Softmax classifier receives the output information of the top RBM as its input and outputs the classification result of the forward learning process by comparing the class probabilities. The Softmax classifier is constructed with a multinomial distribution as its model. It can be understood as a generalization of the logistic regression classifier to multiple classes and can be used for multiclassification problems. Its purpose is to convert the output information of the RBM into a probability distribution. The mathematical representation of the Softmax classifier is as follows:
$$P(k \mid y) = \frac{e^{y_k}}{\sum_{j} e^{y_j}},$$
where $y$ is the output vector of the RBM and $y_k$ is its $k$-th component. The second stage of training is also called the fine-tuning process.
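As a small illustration of how the Softmax step turns the top-layer outputs into class probabilities, the following NumPy sketch applies the standard softmax function row by row. Subtracting the row maximum is a common numerical-stability choice and is an assumption about implementation, not something stated in the text; the example scores are made up.

```python
import numpy as np

def softmax(y):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    y = np.asarray(y, dtype=float)
    shifted = y - y.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Made-up top-layer outputs for two pixels and three classes.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(scores)
labels = probs.argmax(axis=-1)  # predicted class index per pixel
```

The predicted land type for each pixel is then the class with the largest probability.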
Through the first stage of pretraining, the RBM network of each layer can only ensure that the weights of that layer give the optimal expression of the characteristic information of that layer; it cannot make the mapping of the input information of the entire DBN model optimal. This requires the back propagation (BP) algorithm, which combines the forward unsupervised classification results with the label data and, according to the law of error back propagation from top to bottom, fine-tunes the connection weights and biases between the neurons in each layer of the whole DBN model layer by layer. The whole classification process greatly suppresses the overfitting that easily appears in a single BP neural network and yields the parameter setting that minimizes the sum of squared errors of the DBN model.
The second is feature selection. An optimization method is used to select, from the combined features, the feature vector that gives the best classification result, such as feature fusion algorithms based on genetic algorithms, artificial neural networks, and fuzzy logic. The third is feature transformation, which uses mathematical methods to transform the image into a new form of expression, such as methods based on complex principal component analysis, canonical correlation analysis, and complex independent component analysis.

Algorithm
Since 9 features are selected in this article, many combinations are possible in feature fusion, and selecting the best feature combination takes many trial-and-error steps. Therefore, this article adopts a simple and classic serial method for feature fusion. Multiple features are combined in different arrangements into a new feature vector, the deep belief network classifies based on the new feature vector, and the optimal feature combination is finally determined. Specifically, the fusion algorithm composes a new feature vector by joining the feature vectors end to end and then uses the new feature vector for classification and recognition.
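The end-to-end serial fusion described above amounts to concatenating the individual feature vectors. A minimal sketch, assuming each feature is available as a one-dimensional array; the helper name `serial_fuse` and the example values are hypothetical.

```python
import numpy as np

def serial_fuse(feature_vectors):
    """Join 1-D feature vectors end to end into a single fused vector."""
    return np.concatenate([np.ravel(np.asarray(f, dtype=float)) for f in feature_vectors])

# Made-up feature values, for illustration only.
color = [0.2, 0.5, 0.1]
texture = [0.7, 0.3]
shape = [0.9]
fused = serial_fuse([color, texture, shape])
```

The length of the fused vector is the sum of the lengths of its parts, which is what fixes the size of the DBN input layer.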

Algorithm Flow.
The specific process of the proposed DBN classification model based on multifeature fusion is as follows:
(1) Nine features are extracted. Two texture features are extracted by the gray histogram and wavelet transform algorithms. Three color features are obtained through the color histogram and color moments. One shape feature is obtained by the moment invariant algorithm, and three local features are obtained by the census transform and scale-invariant feature transform (SIFT) algorithms. In total, 9 feature values are obtained.
(2) The 9 features are normalized, and the data are converted to [0, 1]. The selected normalization function is
$$X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}.$$
In the formula, $X$ and $X^{*}$ are the data before and after normalization, respectively, and $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of the data, respectively.
(3) The nine normalized feature vectors are serially fused to obtain new features of the image as the input of the DBN model. The computational complexity and classification accuracy are both considered to determine the final DBN network model.
(4) The test data are input into the DBN for testing using the same feature fusion method, and the Softmax method is used to complete the classification of remote sensing image land types.
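The normalization in step (2) maps each feature to [0, 1] using its minimum and maximum. A minimal NumPy sketch, assuming the features are stored as columns of a sample-by-feature matrix; the guard for constant columns is an added assumption to avoid division by zero.

```python
import numpy as np

def min_max_normalize(X, eps=1e-12):
    """Scale each column of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # Replace a zero range by 1 so that constant columns do not divide by zero.
    span = np.where(x_max - x_min > eps, x_max - x_min, 1.0)
    return (X - x_min) / span

# Made-up values: three samples, three features (the last one is constant).
X = np.array([[1.0, 10.0, 5.0],
              [3.0, 20.0, 5.0],
              [5.0, 30.0, 5.0]])
X_norm = min_max_normalize(X)
```

In practice the minimum and maximum would be computed on the training data and reused for the test data so that both are scaled consistently.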
The process of the remote sensing image classification model based on multifeature fusion and DBN is shown in Figure 5.
Firstly, the DBN model is trained with the training dataset, including feature extraction and fusion. Then, feature extraction is performed on the test dataset, including color, texture, shape, and local features. The fused features are fed into the trained DBN model to obtain the land type of the remote sensing image.

Experiment and Analysis
The experiment was run on an Ubuntu 16.04 system with two NVIDIA GeForce Titan X graphics cards, each with 12 GB of graphics memory. The DBN model is implemented with the open-source frameworks Keras and TensorFlow.
Indicators such as overall accuracy (OA), recall (Recall), precision (Precision), and Intersection over Union (IoU) are used to evaluate the classification performance of the proposed model. OA is the proportion of pixels that the classification model classifies correctly.
Recall is the proportion of positive samples that are correctly classified. Precision is the proportion of samples classified as positive that are actually positive. IoU measures the overlap between the real and the classified samples. Each indicator is calculated as follows:
$$\mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{IoU} = \frac{TP}{TP + FP + FN}.$$
In the formula, TP is the number of positive samples classified as positive, FP is the number of negative samples classified as positive, FN is the number of positive samples classified as negative, and TN is the number of negative samples classified as negative.
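The four indicators can be computed directly from the TP, FP, FN, and TN counts. A minimal sketch using the standard definitions of these metrics; the function name and the counts are made up for illustration, and nonzero denominators are assumed.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard OA, recall, precision, and IoU from confusion counts.

    Assumes every denominator is nonzero.
    """
    total = tp + fp + fn + tn
    return {
        "OA": (tp + tn) / total,
        "Recall": tp / (tp + fn),
        "Precision": tp / (tp + fp),
        "IoU": tp / (tp + fp + fn),
    }

# Made-up counts, not results from the experiments.
m = classification_metrics(tp=80, fp=10, fn=20, tn=90)
```

For multiclass land classification, these counts would typically be computed per class (one class against the rest) and then averaged.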

Number of Hidden Layer Neurons.
The number of neurons in the hidden layer is adjusted while the other experimental conditions are kept consistent, and the classification accuracy for 120, 160, 200, 240, 280, 320, and 360 neurons is compared. The results are shown in Figure 6. It can be seen from Figure 6 that the classification accuracy of the DBN model for remote sensing images fluctuates with the number of neurons in the hidden layer. When the number of hidden layer neurons reaches 280, the classification accuracy is the best, at 97.28%. When the number of neurons exceeds 280, the DBN model overfits, and the classification accuracy decreases as the number of hidden layer neurons increases. The number of neurons in the hidden layer of the RBM determines how accurately the DBN model describes the characteristics of the input data. If the number is too small, the characteristic information of the input data cannot be accurately expressed. Too many hidden layer neurons increase the training time and complexity of the entire pretraining process and can even cause overfitting.

Learning Rate.
The DBN model is trained with the stochastic gradient descent algorithm, and a learning rate is introduced to control the pace of the training process. The learning rate is adjusted while the other experimental conditions are kept consistent, and the classification accuracy for learning rates of 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, and 0.75 is compared. The result is shown in Figure 7. It can be seen from Figure 7 that the classification accuracy of the DBN model varies with the learning rate. When the learning rate is low, the classification accuracy of the DBN model increases as the learning rate increases. When the learning rate reaches 0.45, the classification accuracy of the DBN model is the best, at 98.65%. When the learning rate exceeds 0.45, the DBN model overfits, and the classification accuracy shows an overall downward trend. The learning rate determines the step length of the weights along the gradient direction. In the forward unsupervised learning process of the DBN model, the learning rate is mainly responsible for the weight update. In the reverse fine-tuning process, the learning rate adjusts the convergence rate of the entire model. It has a vital impact on the classification accuracy of the DBN model. In addition, a lower learning rate increases the credibility of the training results, but the training process takes a long time. A high learning rate causes the connection weights to change too much, which may cause the training of the DBN model to fail to converge and the output to be unstable.

Number of Forward Unsupervised Learning Iterations.
The number of forward unsupervised training iterations is adjusted, and the classification accuracy for 30, 60, 90, 120, 150, 180, and 210 iterations is compared. The result is shown in Figure 8.
It can be seen from Figure 8 that the classification accuracy of the DBN model fluctuates with the number of forward unsupervised learning iterations. When the number of forward learning iterations reaches 120, the classification accuracy of the DBN model is the best, at 97.61%. When the number exceeds 120, the DBN model overfits, and the classification accuracy gradually decreases. The forward training process of the DBN model is an unsupervised learning process. Each training iteration of the DBN model updates the weights and biases between neurons. Therefore, the number of forward unsupervised learning iterations determines the number of updates of each parameter of the DBN model. Using more forward unsupervised learning iterations during training helps the DBN model express the characteristic information of the input neurons more effectively. However, too many forward learning iterations not only reduce the training efficiency of the model but also cause the classification results of the DBN model to overfit.
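The three parameter studies (hidden layer neurons, learning rate, and number of forward unsupervised learning iterations) follow the same pattern: evaluate each candidate value under otherwise fixed conditions and keep the best one. A minimal sketch of that pattern, assuming a hypothetical `evaluate` callable that trains the model and returns its accuracy; `toy_accuracy` is a made-up placeholder, not measured data.

```python
def select_best(candidates, evaluate):
    """Score every candidate value and return the best one with all scores."""
    scores = {c: evaluate(c) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores

def toy_accuracy(n_hidden):
    """Placeholder scoring function; a real run would train and test the DBN."""
    return 1.0 - abs(n_hidden - 280) / 1000.0

best, scores = select_best([120, 160, 200, 240, 280, 320, 360], toy_accuracy)
```

Because each parameter is tuned separately here, interactions between parameters are not explored; a joint grid search would be the natural extension if compute allows.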

Qualitative Comparison of Classification Results.
Using the proposed model for remote sensing image processing, the land classification results of Jinfeng District are obtained, as shown in Figure 9. Various features are presented in the remote sensing images. Water bodies are blue, vegetation is dark green, cultivated land is light green, bare land is yellow, construction land is pink, and roads are orange.
It can be seen from Figure 9 that the land types obtained by the DBN model are very clearly presented in the remote sensing images, which provides good support for the monitoring and subsequent utilization of ecological resources. In order to demonstrate the classification performance of the proposed model, it is qualitatively compared with the methods of references [14, 19], and three local areas in Jinfeng District are selected for land classification. The result is shown in Figure 10.
It can be seen from Figure 10 that the classification performance of reference [14] is not satisfactory. Due to the irregular distribution of construction land and the complicated boundaries between foreground and background, it is difficult to extract features from such a complex remote sensing image for accurate target recognition. In reference [19], because the model is designed with a larger receptive field, the identification pays more attention to the overall information and correctly identifies only the construction land, so it is slightly insufficient in detail. The proposed model comprehensively considers and integrates the various characteristics of remote sensing images and uses the DBN model for classification. Therefore, the construction land area is clearly delineated in the recognition result. Although the edges are slightly jagged, the overall classification results are better than those of the other comparison models.

Quantitative Analysis of Classification Results.
In order to quantitatively analyze the performance of the proposed classification model, it is compared with references [14, 19]. The results of each evaluation index are shown in Table 2.
It can be seen from Table 2 that the visual impression and the data verification are relatively consistent. The OA of the proposed model is 97.86%, and the IoU is 95.09%, which are 2.34% and 2.55% higher than those of reference [19]. Reference [14] uses wavelet transform for image classification. The method is more traditional and difficult to apply to complex remote sensing images. Therefore, its IoU is 88.58%, and the classification effect is not ideal. The relationship between time consumption and accuracy has always been a focus of deep learning research. The number of network parameters and the GPU inference time reflect the overall time consumption of the network. In order to demonstrate the universality of the proposed model, the F1 index was used to measure the accuracy in the experiment, and the number of parameters (Params) and the inference speed on the graphics processing unit (GPU) were used to measure the time consumption. The results are shown in Table 3.
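The F1 index is not defined explicitly in the text. Assuming it is the usual harmonic mean of the precision and recall introduced with the other evaluation indicators, it would read:

```latex
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```

Under this definition, F1 is high only when precision and recall are both high, which is why it is often reported alongside OA when classes are imbalanced.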
It can be seen from Table 3 that the higher accuracy of reference [19] comes at the cost of time. Since it does not perform operations such as dimensionality reduction or preprocessing, the network has a very large number of parameters, 49.83 M. Its image classification time is 290 ms, which makes high efficiency difficult to achieve. The wavelet transform model of reference [14] has a simple network, so it has only 23.95 M parameters. Its inference time to complete the classification is only 115 ms, but its overall accuracy is not high. The proposed model adopts operations such as image preprocessing and feature fusion, which reduces the number of parameters to 36.02 M. While ensuring the accuracy of classification, the inference time is shortened to 128 ms. Although its time consumption is not the shortest, considering both speed and accuracy, the proposed model can meet actual needs.

Conclusion
Land cover type is a key and extensively studied topic in ecological environment observation. However, land form is affected by the season, and the recognition effect of most classification methods is not ideal. To this end, a remote sensing image land classification model based on deep learning from the perspective of ecological resource utilization is proposed. It combines the feature-level image fusion method and the DBN network model to process and analyze the remote sensing image data obtained by the Gaofen-1 satellite and to obtain land types efficiently and accurately. The proposed model is experimentally demonstrated on the Keras and TensorFlow platform. The results show that when the number of hidden layer neurons is set to 280, the learning rate is set to 0.45, and the number of forward unsupervised learning iterations is set to 120, the classification performance of the DBN model is the best. The proposed model can clearly classify the types of land. The OA, F1 value, and inference time were 97.86%, 87.25%, and 128 ms, respectively, which compare favorably with the other comparison models and provide technical support for deep learning in the field of remote sensing.
At present, deep learning is the mainstream way to analyze remote sensing images, but its poor interpretability has always hindered its development. Traditional remote sensing technology has developed over many years and has a sound theoretical foundation and rich practical applications. Even if its accuracy is not as good as that of deep learning methods, related exponential models and time series dynamic analysis models are still instructive. In the future, using traditional remote sensing technology to provide more effective feature input for deep learning networks could greatly improve the generalization ability and classification accuracy of deep learning.

Data Availability
The data used to support the findings of this study are included within the article.