Land use and land cover (LULC) mapping in urban areas is one of the core applications of remote sensing and plays an important role in modern urban planning and management. Deep learning has recently emerged as a powerful branch of machine learning. By mimicking the hierarchical structure of the human brain, deep learning gradually extracts features from lower to higher levels. The Deep Belief Network (DBN) is a widely investigated and deployed deep learning architecture; it combines the advantages of unsupervised and supervised learning and can achieve good classification performance. This study proposes a classification approach based on the DBN model for detailed urban mapping using polarimetric synthetic aperture radar (PolSAR) data. Through the DBN model, effective contextual mapping features can be automatically extracted from the PolSAR data to improve classification performance. Two-date high-resolution RADARSAT-2 PolSAR data over the Greater Toronto Area were used for evaluation. Comparisons with the support vector machine (SVM), conventional neural networks (NN), and stochastic expectation-maximization (SEM) were conducted to assess the potential of the DBN-based classification approach. Experimental results show that the DBN-based method outperforms the three other approaches and produces homogeneous mapping results with preserved shape details.
Urban land use and land cover (LULC) mapping is one of the core applications of remote sensing. Up-to-date LULC maps obtained by classifying remotely sensed data are essential to modern urban planning and management. Among remote sensing systems, synthetic aperture radar (SAR) has long been recognized as an effective tool for urban analysis because, in contrast to optical or infrared sensors, it is less influenced by solar illumination and weather conditions [
Nevertheless, most studies on urban mapping using SAR or PolSAR data are limited to identifying the urban extent or mapping only a few urban classes. Few studies have focused on detailed urban mapping using SAR data, mainly because of the complexity of the urban environment. The urban environment comprises various natural and man-made objects with many kinds of materials, different orientations, and various shapes and sizes, which complicates the interpretation of SAR images. Problems can also originate from the nature of polarimetric SAR imaging, such as inherent SAR speckle, or from geometric distortions such as shadow and layover [
Regarding methodology, urban land cover mapping approaches can generally be divided into pixel-based and object-based classification. Object-based methods, which directly exploit contextual information to improve mapping accuracy, have been increasingly employed in recent years [
From the perspective of data modeling, LULC classification methods can be grouped into parametric and nonparametric approaches. Parametric approaches, such as the minimum distance classifier, maximum likelihood classifier, and the expectation-maximization (EM) algorithm, often require proper assumptions of data distribution [
As an advanced machine learning approach, deep learning has been successfully applied in the field of image recognition and classification in recent years [
The present study proposes a detailed urban LULC mapping approach based on the popular deep learning architecture DBN. This study is one of the first attempts to apply a deep learning approach to detailed urban classification. Two-date high-resolution RADARSAT-2 PolSAR data over the Greater Toronto Area (GTA) have been used for evaluation.
The rest of this paper is organized as follows. Section
The proposed approach is based on the DBN model. This section briefly reviews the principle of the DBN model and describes the proposed method for land cover classification.
The DBN model was introduced by Hinton et al. in 2006 [
As the basic component of a DBN, the Restricted Boltzmann Machine (RBM) can be treated as an unsupervised energy-based generative model. An RBM consists of a layer of visible units
Schematic of an RBM with
Assuming binary-valued units, the RBM defines the energy of the joint configuration of visible and hidden units (
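As a reference, the standard energy and joint distribution of a binary RBM take the following form (common notation is assumed here: visible units $v_i$, hidden units $h_j$, biases $a_i$ and $b_j$, and weights $w_{ij}$):

```latex
E(\mathbf{v},\mathbf{h}) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j,
\qquad
P(\mathbf{v},\mathbf{h}) = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{Z},
```

where $Z$ is the partition function summing $e^{-E}$ over all configurations. Because an RBM has no visible-visible or hidden-hidden connections, the conditionals factorize, for example $P(h_j = 1 \mid \mathbf{v}) = \sigma\bigl(b_j + \sum_i w_{ij} v_i\bigr)$ with $\sigma$ the logistic sigmoid.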
The training process of the RBM can be described as follows. After the random initialization of the weights and biases, iterative training of the RBM on the training data is performed. Given the training data on the visible units
The DBN takes a layer-wise greedy learning strategy, in which RBMs are trained individually one after another and then stacked on top of each other. Once the first RBM has been trained, its parameters are fixed, and its hidden unit values are used as the visible unit values for the second RBM. This process is repeated until the last RBM. Since pretraining is unsupervised, no labels are needed. Unsupervised learning is believed to capture the essential distribution of the data and can therefore help the subsequent supervised learning when labels are provided. A batch-learning method is usually applied to accelerate the pretraining process; that is, the weights of the RBMs are updated after every minibatch [
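The layer-wise pretraining described above is commonly implemented with 1-step contrastive divergence (CD-1). The following NumPy sketch is a minimal illustration under that assumption, not the authors' implementation; for brevity it omits the momentum and weight decay terms listed in the parameter table:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.01, epochs=50, batch=100, seed=0):
    """Pretrain one binary RBM with 1-step contrastive divergence (CD-1)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a = np.zeros(n_visible)  # visible biases
    b = np.zeros(n_hidden)   # hidden biases
    for _ in range(epochs):
        rng.shuffle(data)                     # new minibatch order each epoch
        for s in range(0, len(data), batch):
            v0 = data[s:s + batch]
            h0 = sigmoid(v0 @ W + b)          # positive phase
            h_s = (rng.random(h0.shape) < h0).astype(float)
            v1 = sigmoid(h_s @ W.T + a)       # one Gibbs step (reconstruction)
            h1 = sigmoid(v1 @ W + b)
            W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)  # CD-1 update
            a += lr * (v0 - v1).mean(axis=0)
            b += lr * (h0 - h1).mean(axis=0)
    return W, a, b

def pretrain_dbn(data, layer_sizes):
    """Greedy layer-wise pretraining: each trained RBM's hidden
    activations become the next RBM's visible data."""
    params, x = [], data
    for n_hidden in layer_sizes:
        W, a, b = train_rbm(x.copy(), n_hidden)
        params.append((W, a, b))
        x = sigmoid(x @ W + b)
    return params
```

Because no labels appear anywhere in this loop, the whole stack can be pretrained on unlabeled pixels before fine-tuning.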
After the pretraining phase, a fine-tuning procedure is performed. A softmax output layer is placed on top of the last RBM as a multiclass classifier, with its size set to the total number of classes. To accomplish classification using the learned features, ordinary back-propagation is applied through the whole pretrained network to fine-tune the weights for enhanced discriminative ability. Because fine-tuning is supervised learning, the corresponding labels for the training data are needed. After training, the predicted class label of a test sample is obtained by forward propagation, in which the test data pass from the lowest-level visible layer through the RBM layers to the softmax output layer.
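The prediction step above reduces to a deterministic forward pass through the stacked RBM layers followed by the softmax classifier. A minimal sketch (function names are hypothetical; `W_out` and `b_out` stand for the fine-tuned output-layer weights):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=1, keepdims=True)

def predict(x, rbm_layers, W_out, b_out):
    """Forward propagation: input features pass through the stacked RBM
    layers (deterministic sigmoid activations) and the softmax output
    layer; the arg-max over class probabilities gives the label."""
    for W, b in rbm_layers:
        x = sigmoid(x @ W + b)
    return softmax(x @ W_out + b_out).argmax(axis=1)
```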
To better understand the structure of the DBN-based LULC classification, a flowchart is given in Figure
Flowchart of the proposed DBN-based classification approach.
For the training of the DBN, the Pauli vectors of the training samples are assigned to the visible layer of the first RBM as input training features. With the layer-by-layer pretraining strategy, the spatiotemporal dependencies are successively encoded in the hidden layers
For the prediction, the input features of the test samples are prepared in the same way as that of the training samples. The classification labels for the test samples can be obtained from the forward propagation of the test features through the trained network.
The study area is located in northern Greater Toronto Area (GTA), Ontario, Canada. The ten major LULC classes in the study area are as follows: high-density residential areas (HD), low-density residential areas (LD), industrial and commercial areas (Ind.), construction sites (Cons.), Water, Forest, Pasture, golf courses (Golf), and two types of crops (Crop1 and Crop2).
Two fine-beam full polarimetric SAR images were acquired by the RADARSAT-2 SAR sensor on June 19, 2008, and July 5, 2008. The center frequency is 5.4 GHz, that is, C-band. The June 19 data were obtained from the descending orbit, whereas the July 5 data were obtained from the ascending orbit, as shown in Figures
PolSAR images of northern Greater Toronto Area. (a) Pauli RGB image of RADARSAT-2 data on June 19, 2008. (b) Pauli RGB image of RADARSAT-2 data on July 5, 2008. (c) Training set. (d) Test set.
During preprocessing, the multitemporal raw data were first orthorectified using the satellite orbital parameters and a 30 m resolution DEM. They were then registered to a vector file from the National Topographic Database (NTDB). A multilook process was further applied to generate the PolSAR features, with a final spatial resolution of about 10 m.
In the classification scheme, 19 subclasses were defined within the abovementioned 10 major land cover classes according to their different scattering characteristics (e.g., man-made structures have varying scattering appearance due to their distinctive shapes and orientations). Approximately 1000 training pixels were assigned to each subclass, and 120,617 pixels evenly distributed over the classification area were randomly selected as test samples. The training and test samples are shown in Figures
The effective configurations of the DBN for detailed urban mapping were investigated. Comparisons with SVM, conventional neural networks (NN), and stochastic Expectation-Maximization (SEM) were conducted to assess the potential of our approach.
In this study, several experiments were conducted to evaluate the impact of different DBN configurations, including different network depths and numbers of hidden layer nodes. To evaluate its classification performance, the DBN-based approach was compared with three other land cover classification methods: SVM, traditional neural networks (NN), and stochastic expectation-maximization (SEM). For quantitative comparison, the overall accuracy (OA) and Kappa coefficient [
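Both evaluation metrics can be computed directly from a confusion matrix of raw sample counts; a short sketch (function names are ours):

```python
import numpy as np

def overall_accuracy(cm):
    """OA: fraction of correctly classified samples (trace / total)."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

def kappa(cm):
    """Cohen's kappa: observed agreement corrected for the chance
    agreement implied by the row/column marginals."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                                  # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    return (po - pe) / (1.0 - pe)
```

For example, a two-class matrix `[[50, 10], [5, 35]]` gives OA = 0.85 while kappa is lower (about 0.69), since kappa discounts agreement expected by chance.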
The performance of the DBN-based classification method is sensitive to the neighbor window size. As the window size increases, more spatial dependencies can be captured by the DBN, so better classification accuracy is expected with a larger window. Nevertheless, a larger neighbor window does not guarantee better performance: an overly large window can decrease accuracy because class boundary areas tend to be confused. In the following experiments, the neighbor window size is set to
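The neighborhood input implied above can be built by stacking each pixel's w x w window of channel values into a single feature vector for the DBN. A sketch under stated assumptions (reflect padding at image edges and a channel-last layout are our choices; e.g., C = 6 channels for two-date Pauli vectors):

```python
import numpy as np

def window_features(img, w):
    """Stack each pixel's w x w neighborhood of channel values into one
    input vector so the network can learn spatial context.
    img has shape (H, W, C); edges are padded by reflection."""
    r = w // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="reflect")
    H, Wd, C = img.shape
    feats = np.empty((H, Wd, w * w * C))
    for i in range(H):
        for j in range(Wd):
            feats[i, j] = padded[i:i + w, j:j + w].ravel()
    return feats.reshape(H * Wd, w * w * C)
```

Each output row then serves as one visible-layer vector; the feature length grows as w * w * C, which is why overly large windows also inflate the model size.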
Several parameters of the DBN are listed in Table
DBN parameter settings.
Pretraining stage |
---|---
Learning rate | 0.01
Number of epochs | 50
Size of minibatch | 100
Momentum | 0.5 for the first 5 epochs, 0.9 thereafter
Weight decay rate | 0.0002
Fine-tuning stage |
Learning rate | 0.1
Number of epochs | 20
We first examine how the DBN depth influences the classification performance. The number of hidden layers is one of the key factors in a deep learning strategy. On one hand, it has been proved that an additional RBM layer can yield improved modeling power [
To find a proper network depth, DBN models with an increasing number of RBM layers (i.e., from one to four) were compared. Each DBN model had a constant structure; that is, all RBM layers had the same number of hidden neurons. Comparisons were also conducted by varying the number of hidden neurons from 100 to 600 per layer. The results in Figure
Impact of network depth.
To demonstrate the effectiveness of the proposed LULC classification method, a comparison was conducted with three other land cover classification approaches (i.e., SVM, conventional NN, and SEM). The same Pauli features as in the DBN-based method were used for SVM and the conventional NN. The SEM method [
Comparison of different classification methods.
 | SVM | | NN | | SEM | | DBN |
---|---|---|---|---|---|---|---|---
 | PA | UA | PA | UA | PA | UA | PA | UA
Water | 0.8521 | 0.9169 | 0.7847 | 0.9560 | — | 0.9733 | 0.8697 | 0.9052
Golf | 0.8588 | 0.5364 | 0.8922 | 0.6048 | — | 0.8346 | 0.8118 | 0.7727
Pasture | 0.5776 | 0.8949 | 0.6095 | 0.9198 | — | 0.8499 | 0.8139 | 0.8987
Cons. | — | 0.6879 | 0.6383 | 0.6657 | 0.7239 | 0.7750 | 0.7265 | 0.7899
LD | — | 0.8509 | 0.5771 | 0.8175 | 0.3160 | 0.7697 | 0.6703 | 0.8884
Crop1 | 0.9020 | 0.7548 | 0.7991 | 0.8971 | — | 0.6497 | 0.8800 | 0.8804
Crop2 | 0.7965 | 0.8882 | 0.8615 | 0.7671 | 0.8306 | 0.8649 | — | 0.8469
Forest | 0.8703 | 0.9098 | 0.8908 | 0.9408 | — | 0.7076 | 0.9095 | 0.9489
HD | 0.7203 | 0.5830 | 0.7195 | 0.4570 | 0.6264 | 0.4898 | — | 0.5867
Ind. | 0.7593 | 0.7556 | 0.6817 | 0.7394 | 0.4135 | 0.5811 | — | 0.7632
OA | 0.7679 | | 0.7437 | | 0.7243 | | — |
Kappa | 0.7398 | | 0.7119 | | 0.6906 | | — |
Table
Confusion matrix (in percent) of the SVM method.
 | Water | Golf | Pasture | Cons. | LD | Crop1 | Crop2 | Forest | HD | Ind.
---|---|---|---|---|---|---|---|---|---|---
Water | — | 3.70 | 0.07 | 0.96 | 0.00 | 0.02 | 0.00 | 0.00 | 0.01 | 0.18
Golf | 11.85 | — | 26.41 | 5.24 | 1.36 | 3.27 | 2.51 | 2.46 | 0.05 | 0.48
Pasture | 0.02 | 6.55 | — | 0.00 | 0.09 | 0.75 | 0.72 | 1.07 | 0.02 | 0.00
Cons. | 2.92 | 2.00 | 0.34 | — | 0.00 | 0.03 | 10.87 | 0.00 | 0.47 | 0.20
LD | 0.00 | 0.17 | 0.31 | 0.03 | — | 1.65 | 0.24 | 3.03 | 4.57 | 3.04
Crop1 | 0.00 | 0.55 | 6.80 | 0.83 | 5.34 | — | 1.53 | 0.20 | 4.31 | 1.12
Crop2 | 0.00 | 0.58 | 3.67 | 15.52 | 0.11 | 0.16 | — | 0.68 | 0.06 | 0.02
Forest | 0.00 | 0.11 | 2.25 | 0.07 | 1.57 | 0.65 | 4.20 | — | 0.13 | 0.04
HD | 0.00 | 0.07 | 2.27 | 0.09 | 15.65 | 3.12 | 0.21 | 3.71 | — | 18.98
Ind. | 0.00 | 0.37 | 0.12 | 0.88 | 7.40 | 0.15 | 0.07 | 1.83 | 18.34 | —
Confusion matrix (in percent) of the NN method.
 | Water | Golf | Pasture | Cons. | LD | Crop1 | Crop2 | Forest | HD | Ind.
---|---|---|---|---|---|---|---|---|---|---
Water | — | 2.05 | 0.00 | 0.35 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Golf | 19.56 | — | 19.21 | 6.71 | 0.24 | 3.05 | 1.12 | 0.40 | 0.02 | 0.17
Pasture | 0.00 | 1.93 | — | 0.77 | 0.13 | 2.68 | 1.18 | 0.13 | 0.01 | 0.01
Cons. | 1.64 | 3.28 | 0.68 | — | 0.00 | 0.04 | 9.88 | 0.00 | 0.04 | 0.01
LD | 0.00 | 0.14 | 0.01 | 0.00 | — | 5.25 | 0.02 | 2.34 | 5.61 | 2.07
Crop1 | 0.00 | 0.30 | 2.37 | 0.38 | 1.74 | — | 0.64 | 0.14 | 0.33 | 0.53
Crop2 | 0.15 | 2.45 | 14.48 | 26.70 | 0.71 | 3.13 | — | 0.94 | 0.19 | 0.02
Forest | 0.00 | 0.47 | 1.96 | 0.18 | 2.06 | 0.64 | 0.98 | — | 0.16 | 0.09
HD | 0.00 | 0.06 | 0.33 | 0.37 | 31.99 | 5.13 | 0.03 | 6.21 | — | 28.95
Ind. | 0.17 | 0.10 | 0.03 | 0.71 | 5.41 | 0.18 | 0.00 | 0.77 | 21.69 | —
Confusion matrix (in percent) of the SEM method.
 | Water | Golf | Pasture | Cons. | LD | Crop1 | Crop2 | Forest | HD | Ind.
---|---|---|---|---|---|---|---|---|---|---
Water | — | 0.53 | 0.08 | 0.13 | 0.08 | 0.12 | 0.09 | 0.09 | 0.04 | 0.11
Golf | 3.10 | — | 4.36 | 4.41 | 0.54 | 0.03 | 0.38 | 0.28 | 0.53 | 1.01
Pasture | 0.08 | 5.40 | — | 1.15 | 2.11 | 1.32 | 4.82 | 0.75 | 0.69 | 0.07
Cons. | 0.06 | 1.01 | 0.60 | — | 0.21 | 0.00 | 6.12 | 0.03 | 0.78 | 0.35
LD | 0.00 | 0.00 | 0.01 | 0.03 | — | 0.16 | 0.03 | 0.36 | 6.54 | 3.18
Crop1 | 0.06 | 0.07 | 6.80 | 1.21 | 6.28 | — | 3.58 | 0.85 | 4.67 | 10.90
Crop2 | 0.04 | 0.35 | 2.55 | 20.16 | 0.94 | 0.22 | — | 1.21 | 0.73 | 0.43
Forest | 0.00 | 0.15 | 0.53 | 0.32 | 34.84 | 1.69 | 1.89 | — | 2.43 | 0.53
HD | 0.00 | 0.02 | 0.04 | 0.13 | 10.20 | 0.29 | 0.01 | 0.76 | — | 42.07
Ind. | 0.00 | 0.01 | 0.00 | 0.06 | 13.20 | 0.00 | 0.01 | 0.24 | 20.95 | —
Confusion matrix (in percent) of the DBN method.
 | Water | Golf | Pasture | Cons. | LD | Crop1 | Crop2 | Forest | HD | Ind.
---|---|---|---|---|---|---|---|---|---|---
Water | — | 5.72 | 0.02 | 0.21 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Golf | 11.92 | — | 5.93 | 3.42 | 0.16 | 0.66 | 0.29 | 0.40 | 0.01 | 0.38
Pasture | 0.11 | 10.55 | — | 0.31 | 0.03 | 2.28 | 0.55 | 0.17 | 0.09 | 0.00
Cons. | 0.99 | 1.26 | 0.26 | — | 0.00 | 0.01 | 6.16 | 0.00 | 0.31 | 0.05
LD | 0.00 | 0.09 | 0.27 | 0.03 | — | 2.22 | 0.23 | 1.95 | 3.35 | 1.46
Crop1 | 0.00 | 0.14 | 2.49 | 0.93 | 2.47 | — | 1.22 | 0.23 | 0.71 | 0.26
Crop2 | 0.00 | 0.66 | 7.20 | 21.05 | 0.23 | 0.63 | — | 1.70 | 0.12 | 0.00
Forest | 0.00 | 0.22 | 1.11 | 0.19 | 1.70 | 0.69 | 1.39 | — | 0.15 | 0.08
HD | 0.00 | 0.15 | 1.31 | 0.31 | 19.67 | 5.40 | 0.24 | 2.61 | — | 18.41
Ind. | 0.00 | 0.04 | 0.02 | 0.90 | 8.70 | 0.11 | 0.05 | 2.00 | 17.04 | —
SEM obtained the highest accuracies for most natural classes (Water, Golf, Pasture, Crop1, and Forest). However, it performed poorly on several man-made classes (LD, HD, and Ind.) and thus yielded the lowest overall classification accuracy, 72.43%.
Although SVM attained higher producer's accuracies for Cons., LD, and Crop1, its overall accuracy was still about 5% lower than that of DBN. The improved accuracy of the DBN method mainly originated from the significant increases for Pasture and Crop2. Tables
Compared with the conventional NN, DBN obtained higher classification accuracies for almost all land cover types, resulting in a notable OA increase of 7%. The reason behind the superiority of DBN over NN is that the unsupervised pretraining process assigns more appropriate initial weights to the network, whereas a traditional neural network simply starts from random initial weights. The DBN-based method combines the advantages of unsupervised and supervised learning and can thus better distill spatiotemporal regularities from the SAR data and improve classification performance.
The effects of different land cover classification methods are further illustrated in Figure
Zooming comparison of (a) Google Earth image and (b) PolSAR Pauli image and the classification results using (c) SVM, (d) NN, (e) SEM, and (f) DBN in a selected area.
Zooming comparison of (a) Google Earth image and (b) PolSAR Pauli image and the classification results using (c) SVM, (d) NN, (e) SEM, and (f) DBN in an Ind. area.
A detailed urban LULC classification method based on the DBN model for PolSAR data has been proposed, and the effects of different network configurations have been discussed. It was found that a DBN with two hidden layers is appropriate for this detailed LULC mapping application. The experimental results demonstrate that the proposed method provides homogeneous mapping results with preserved shape details and that it outperforms the other land cover classification approaches (i.e., SVM, NN, and SEM) in a complex urban environment. Our future work will focus on more deep learning models for SAR data to further improve the classification results.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work is mainly supported by the National Natural Science Foundation of China under Grants U1435219, 61125201, 61202126, 61202127, and 61402507.