Information regarding the current status of urban green space is crucial for urban land-use planning and management. This study proposes a remote sensing and data-driven solution for urban green space detection at regional scale using state-of-the-art metaheuristic and machine learning approaches. Remotely sensed data obtained from the Sentinel 2 satellite in the study area of Da Nang city (Vietnam) are used to construct and verify an intelligent model that hybridizes the Marine Predators Algorithm (MPA) and support vector machines (SVM). SVM is employed to construct a decision boundary that separates features characterizing statistical measurements of remote sensing data into two categories of “green space” and “nongreen space”. The MPA metaheuristic is used to optimize the SVM training phase by identifying an appropriate set of SVM hyperparameters, including the penalty coefficient and the kernel function parameter. Experimental results show that the proposed model, which processes information provided by all of the Sentinel 2 satellite’s spectral bands, delivers better performance than the models based on vegetation indices. With a classification accuracy rate of roughly 93%, an F1 score of 0.93, and an area under the receiver operating characteristic curve of 0.98, the newly developed model is a promising tool to assist local authorities in obtaining up-to-date information on urban green space and developing plans for sustainable urban land use.

In many regions around the globe, the fast pace of urbanization leads to various problems including traffic congestion, poor air quality, and noise pollution. As pointed out by Xian et al. [

Urban green space is generally defined as green infrastructure that contains vegetated spaces including urban parks, road, and workplace green space [

Therefore, up-to-date spatial information regarding the current status of urban green space is crucial for urban land-use planning and management. This information has become increasingly difficult to obtain via conventional landscape surveying approaches, since green spaces are constantly modified, fragmented, and dispersed by the fast pace of urbanization. Moreover, surveying tasks at a regional scale are daunting because of the time and labor required for field data acquisition, processing, and reporting. Thus, there is a pressing need for advanced methods to automate the green space surveying task.

Recently, medium-resolution imagery coupled with advanced machine learning methods has provided an effective solution for urban landscape survey [

Rafiee et al. [

Li et al. [

It is noted that besides SVM, deep neural networks (DNNs) have also been successfully applied in remote sensing–based land-use classification [

Based on literature review, there is an increasing trend of applying machine learning in remote sensing–based urban green space study. Since the problem of interest is challenging due to the involvement of multivariate and nonlinear data analysis, other advanced machine learning solutions need to be investigated to improve the urban green space detection accuracy. Moreover, the current literature also points out that individual machine learning methods are the commonly employed approach. Hybrid machine learning models that harness advantages of various computational intelligence techniques are rarely investigated to construct urban green space detection models. Specifically, previous studies have mainly relied on the individual machine learning approach [

SVM [

The task of determining hyperparameters of a machine learning model is known as model selection [

The employed metaheuristic approaches include symbiotic organisms search [

Marine Predators Algorithm (MPA), first introduced in [

Remote sensing data obtained from Sentinel 2 satellite in the study area of Da Nang city is used to train and verify the MPA-SVM hybrid model. In this work, the MPA optimized SVM model trained by remote sensing data with all of the Sentinel 2’s spectral bands is compared with the models that use commonly employed vegetation indices including normalized difference vegetation index (NDVI) [

The rest of the article is organized as follows: Section

As mentioned earlier, urban green spaces play a significant role in the urban living environment; they serve a variety of functions including climatic modification, aesthetics, recreation, and physical/mental health improvement. Nevertheless, due to the physical expansion of Da Nang city (Vietnam), certain areas of green spaces have been replaced by impervious surface such as buildings and roads. Therefore, the current status of urban green space in this city needs to be updated in a timely manner and this city has been selected as the study area of this research work.

Da Nang is a crucial coastal city located in Central Vietnam. Da Nang’s location is at 15°55′ to 16°14′ North and 107°18′ to 108°20′ East [

Da Nang urban area.

To survey the urban green space status of Da Nang city, remote sensing data in the form of spectral bands were collected from Sentinel 2 on July 16, 2020. These spectral bands (see Table

The Sentinel-2 spectral bands.

Band number | Description | Wavelength range (nm) | Resolution (m) |
---|---|---|---|
B1 | Coastal aerosol | 433–453 | 60 |
B2 | Blue | 458–523 | 10 |
B3 | Green | 543–578 | 10 |
B4 | Red | 650–680 | 10 |
B5 | Red-edge 1 | 698–713 | 20 |
B6 | Red-edge 2 | 733–748 | 20 |
B7 | Red-edge 3 | 773–793 | 20 |
B8 | Near infrared (NIR) | 785–900 | 10 |
B8a | Near infrared narrow (NIRn) | 855–875 | 20 |
B9 | Water vapour | 935–955 | 60 |
B10 | Shortwave infrared/Cirrus | 1360–1390 | 60 |
B11 | Shortwave infrared 1 (SWIR1) | 1565–1655 | 20 |
B12 | Shortwave infrared 2 (SWIR2) | 2100–2280 | 20 |

Gray-scale images of the Sentinel 2 spectral bands: (a) B1, (b) B2, (c) B3, (d) B4, (e) B5, (f) B6, (g) B7, (h) B8, (i) B8a, (j) B9, (k) B10, (l) B11, and (m) B12.

In remote sensing field, vegetation indices have been widely used to extract vegetation biophysical information from satellite image data [
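As an illustration of how a vegetation index is derived from spectral bands, the following minimal sketch computes NDVI and SAVI for a single pixel. It assumes the usual band assignment for Sentinel 2 (B8 as near infrared, B4 as red) and the commonly used soil adjustment factor L = 0.5; the reflectance values are hypothetical and are not taken from the study data.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor is the assumed L = 0.5."""
    return (nir - red) * (1.0 + soil_factor) / (nir + red + soil_factor)


# Hypothetical surface reflectances (0..1) for a vegetated and a paved pixel.
vegetated = {"B4": 0.05, "B8": 0.45}
paved = {"B4": 0.20, "B8": 0.25}

for name, pixel in (("vegetated", vegetated), ("paved", paved)):
    print(name, ndvi(pixel["B8"], pixel["B4"]), savi(pixel["B8"], pixel["B4"]))
```

Dense vegetation reflects strongly in the near infrared and absorbs red light, so the vegetated pixel receives a much larger index value than the paved one.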

Marine Predators Algorithm (MPA), first introduced in [

The searching process of MPA consists of three phases considering three scenarios: (i) high velocity ratio when a prey is moving faster than a predator, (ii) unit velocity ratio when the rates of movement of a prey and a predator are similar, and (iii) low velocity ratio when the rate of movement of a predator is higher than that of a prey. The searching operation of the MPA metaheuristic is demonstrated in the accompanying flowchart. The 1^{st} phase aims at search space exploration and is applied for the first one-third of the searching iteration number; the equation used to revise the prey position involves $R_B$, a vector of random numbers generated from a normal distribution which mimics the Brownian motion.

Flowchart of the MPA algorithm.

The 2^{nd} phase serves as an intermediate phase and occurs within the second one-third of the searching iteration number. The positions of the first half of the population members are updated using $R_L$, a vector of random numbers generated from the Lévy distribution which represents the Lévy movement.

The positions of the second half of the population members are updated as follows:

The last phase of the optimization process aims at exploitation of the search space. The population members’ positions are updated in the following equation:
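The three phases described above can be sketched in Python as follows. This is a simplified illustration, not the source code used in this study: the update rules follow the commonly cited formulation of MPA (step-size constant P = 0.5 and an adaptive factor CF), the Lévy steps are drawn with Mantegna's method, the FADs effect is omitted, and the elite solution is refreshed as soon as a better candidate is found. The function names `levy_step` and `mpa_minimize` are hypothetical.

```python
import math
import random


def levy_step(beta=1.5):
    """One Levy-distributed step via Mantegna's method (an assumed sampler)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0.0, sigma)
    v = random.gauss(0.0, 1.0)
    return 0.05 * u / abs(v) ** (1 / beta)


def mpa_minimize(objective, lower, upper, pop_size=10, max_iter=60, p=0.5, seed=0):
    """Minimize `objective` within box bounds using the three MPA phases."""
    random.seed(seed)
    dim = len(lower)

    def clip(x):
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    prey = [[random.uniform(lo, hi) for lo, hi in zip(lower, upper)]
            for _ in range(pop_size)]
    fitness = [objective(x) for x in prey]
    best = min(range(pop_size), key=fitness.__getitem__)
    elite, elite_fit = prey[best][:], fitness[best]

    for it in range(max_iter):
        cf = (1 - it / max_iter) ** (2 * it / max_iter)  # adaptive step-size factor
        for i in range(pop_size):
            new = []
            for d in range(dim):
                rb, rl, r = random.gauss(0.0, 1.0), levy_step(), random.random()
                if it < max_iter / 3:            # phase 1: Brownian exploration
                    step = rb * (elite[d] - rb * prey[i][d])
                    new.append(prey[i][d] + p * r * step)
                elif it < 2 * max_iter / 3:      # phase 2: intermediate phase
                    if i < pop_size // 2:        # first half: Levy movement
                        step = rl * (elite[d] - rl * prey[i][d])
                        new.append(prey[i][d] + p * r * step)
                    else:                        # second half: Brownian movement
                        step = rb * (rb * elite[d] - prey[i][d])
                        new.append(elite[d] + p * cf * step)
                else:                            # phase 3: Levy exploitation
                    step = rl * (rl * elite[d] - prey[i][d])
                    new.append(elite[d] + p * cf * step)
            new = clip(new)
            new_fit = objective(new)
            if new_fit < fitness[i]:             # keep a candidate only if it improves
                prey[i], fitness[i] = new, new_fit
                if new_fit < elite_fit:
                    elite, elite_fit = new[:], new_fit
    return elite, elite_fit


# Usage on a simple test function (sum of squares) with two variables.
best_x, best_f = mpa_minimize(lambda x: sum(v * v for v in x), [-5.0, -5.0], [5.0, 5.0])
print(best_x, best_f)
```

For hyperparameter tuning, the two decision variables would be the penalty coefficient and the kernel parameter, and `objective` would return a cross-validation error.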

In addition, to model behavior shift in marine predators according to the eddy formation or Fish Aggregating Devices (FADs) effects [

Introduced by Vapnik [_{U}, a hidden target function

To construct a SVM model, it is required to solve the following constrained optimization problem [^{n} and

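One advantage of the SVM method is that the explicit formula of the mapping function $\varphi(\cdot)$ is not required; the inner product of $\varphi(x_k)$ and $\varphi(x_l)$ is represented as a kernel function $K(x_k, x_l)$:

$$K(x_k, x_l) = \varphi(x_k)^{T} \varphi(x_l)$$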

For multivariate and nonlinear data classification problem, the radial basis kernel function (RBKF) is commonly utilized:
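A minimal sketch of a radial basis kernel is given here for illustration. It assumes the common parametrization $K(x_k, x_l) = \exp(-\lVert x_k - x_l \rVert^2 / (2\sigma^2))$, where $\sigma$ is the kernel function parameter; other software packages scale this parameter differently, so the form below should be read as an assumption.

```python
import math


def rbf_kernel(x_k, x_l, sigma=1.0):
    """Radial basis kernel exp(-||x_k - x_l||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_k, x_l))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Identical points give a kernel value of 1; distant points approach 0.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
print(rbf_kernel([0.0, 0.0], [3.0, 4.0], sigma=5.0))
```

A small $\sigma$ makes the kernel decay quickly with distance, which yields a more flexible decision boundary; a large $\sigma$ produces a smoother one.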

By solving a Lagrangian dual of the aforementioned constrained optimization problem and using a quadratic programming solver, the SVM model used for data classification can be expressed compactly as follows [

This section of the article is dedicated to describing the integrated model used for remote sensing–based urban green space detection. The core of the proposed model is a hybridization of the MPA metaheuristic and the SVM machine learning. These two methods work synergistically to analyze patterns hidden in a set of remotely sensed data collected for the study area of Da Nang urban center. In detail, SVM is used to construct a decision boundary that separates the input data space into two distinctive regions of “nongreen space” and “green space”.

To further enhance the performance of the SVM model, MPA is utilized to autonomously fine-tune the SVM training process by identifying a set of appropriate model hyperparameters. The optimized hyperparameters include the penalty coefficient and the RBKF parameter. In this study, the searching range of the penalty coefficient is [1, 100]; the searching range of the RBKF parameter is [0.1, 100].

These two hyperparameters strongly influence the learning and the predictive capability of the integrated urban green space detection model. A penalty coefficient that is too large or an RBKF parameter that is too small leads to overfitted models. On the other hand, a penalty coefficient that is too small and an RBKF parameter that is too large tend to produce underfitted models [

The overall model structure is presented in Figure

The proposed MPA optimized SVM for urban green space detection.

The average and standard deviation of gray intensity are given by [_{i,c} = 0,1,2, …, 255. NL = 256 represents the number of discrete color values.
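The feature extraction step can be sketched as follows. The sketch assumes that each sample is a small image patch per spectral band with integer gray levels in 0..255, and that the mean and the standard deviation are computed from the first-order histogram of the patch. The ordering of the output (13 means followed by 13 standard deviations) and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

NL = 256  # number of discrete gray levels, as stated in the text


def patch_statistics(patch):
    """Return (mean, std) of gray intensity from the patch's first-order histogram."""
    pixels = [value for row in patch for value in row]
    counts = Counter(pixels)
    n = len(pixels)
    mean = sum(level * counts[level] / n for level in range(NL))
    var = sum((level - mean) ** 2 * counts[level] / n for level in range(NL))
    return mean, math.sqrt(var)


def extract_features(band_patches):
    """Concatenate per-band means followed by per-band standard deviations."""
    stats = [patch_statistics(patch) for patch in band_patches]
    return [m for m, _ in stats] + [s for _, s in stats]


# Hypothetical 2 x 2 patch repeated for 13 bands gives a 26-element feature vector.
example_patch = [[0, 2], [2, 4]]
features = extract_features([example_patch] * 13)
print(len(features))
```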

To construct the integrated MPA-SVM model for urban green space detection, it is necessary to prepare a training dataset with assigned ground truth labels. This study has performed a sampling process to collect data in the nongreen space and green space areas within the study area (demonstrated in Figure

Demonstrations of the collected image samples using natural color composite: (a) nongreen space class and (b) green space class.

Demonstration of the extracted dataset.

Sample | X1 | X2 | X3 | X4 | X5 | … | X21 | X22 | X23 | X25 | X26 | Y |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 165.96 | 152.68 | 145.48 | 168.60 | 168.72 | 15.58 | 13.33 | 5.21 | 35.76 | 20.36 | 20.32 | 0 |
1 | 152.96 | 101.28 | 95.32 | 138.36 | 158.08 | 17.27 | 7.10 | 9.08 | 31.78 | 12.11 | 16.31 | 0 |
2 | 133.12 | 90.92 | 83.68 | 143.56 | 160.76 | 14.02 | 11.95 | 6.57 | 21.24 | 29.14 | 31.10 | 0 |
3 | 122.40 | 91.84 | 84.52 | 140.20 | 168.60 | 12.01 | 5.77 | 3.92 | 18.13 | 22.55 | 26.75 | 0 |
4 | 119.40 | 90.52 | 76.00 | 104.08 | 124.00 | 13.80 | 11.45 | 9.80 | 0.00 | 21.92 | 21.31 | 0 |
… | … | … | … | … | … | … | … | … | … | … | … | … |
996 | 23.08 | 24.68 | 25.76 | 26.20 | 67.60 | 16.22 | 6.67 | 11.00 | 21.87 | 10.96 | 7.93 | 1 |
997 | 25.16 | 25.56 | 31.16 | 25.64 | 69.84 | 16.18 | 9.55 | 9.90 | 18.37 | 6.81 | 3.75 | 1 |
998 | 25.24 | 30.76 | 30.16 | 35.68 | 72.16 | 8.07 | 5.14 | 8.14 | 18.49 | 7.19 | 5.39 | 1 |
999 | 16.80 | 19.52 | 21.00 | 18.12 | 63.16 | 14.17 | 16.83 | 5.39 | 18.13 | 12.39 | 6.39 | 1 |
1000 | 19.20 | 20.80 | 22.00 | 19.68 | 58.92 | 15.80 | 12.60 | 0.98 | 0.00 | 3.04 | 1.51 | 1 |

It is worth noting that the extracted dataset, which includes the input features characterizing statistical properties of the spectral bands and the corresponding class labels, has been randomly separated into a training dataset (70%) and a testing dataset (30%). The features are standardized with the Z-score transformation $X_Z = (X_D - M_X)/\mathrm{STD}_X$, where $X_Z$ and $X_D$ denote the normalized and the original features, respectively, and $M_X$ and $\mathrm{STD}_X$ denote the mean value and the standard deviation of the features, respectively.
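The data separation and the Z-score normalization can be sketched as follows. The sketch assumes that the mean and the standard deviation are estimated on the training rows and then reused for the testing rows; the function names and the toy data are hypothetical.

```python
import random
import statistics


def train_test_split(rows, train_ratio=0.7, seed=0):
    """Randomly separate rows into a training part and a testing part."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_zscore(train_rows):
    """Per-feature mean and standard deviation estimated on the training rows."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # guard constant features
    return means, stds


def apply_zscore(rows, means, stds):
    """Apply (value - mean) / std feature by feature."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy data: 10 samples with 2 features each.
data = [[float(i), 2.0 * i] for i in range(10)]
train, test = train_test_split(data)
means, stds = fit_zscore(train)
train_z = apply_zscore(train, means, stds)
test_z = apply_zscore(test, means, stds)
print(len(train_z), len(test_z))
```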

To optimize the SVM model used for urban green space detection, the objective function of the MPA metaheuristic has employed a 5-fold cross validation process and the indices of false negative rate (FNR) and false positive rate (FPR). This objective function (OF) is described as follows [_{k} and FPR_{k} denote FNR and FPR computed in the

The FNR and FPR indices are given by [
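For illustration, the two error rates and a cross-validation objective built from them can be sketched in Python. FNR and FPR follow their standard definitions, FN/(FN + TP) and FP/(FP + TN). The way the fold-wise values are aggregated (a plain average of FNR_k + FPR_k over the folds) is an assumption, and `fit_predict` is a hypothetical callback that trains a classifier with the given hyperparameters and returns predicted labels for the held-out fold.

```python
def fnr_fpr(y_true, y_pred):
    """False negative rate FN/(FN+TP) and false positive rate FP/(FP+TN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fnr = fn / (fn + tp) if fn + tp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fnr, fpr


def cv_objective(fit_predict, X, y, hyperparams, k=5):
    """Average of FNR_k + FPR_k over k folds (aggregation is an assumption)."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved folds
        X_tr = [x for i, x in enumerate(X) if i not in test_idx]
        y_tr = [t for i, t in enumerate(y) if i not in test_idx]
        X_te = [x for i, x in enumerate(X) if i in test_idx]
        y_te = [t for i, t in enumerate(y) if i in test_idx]
        y_hat = fit_predict(X_tr, y_tr, X_te, hyperparams)
        fnr, fpr = fnr_fpr(y_te, y_hat)
        total += fnr + fpr
    return total / k
```

A metaheuristic such as MPA would call `cv_objective` once per candidate pair of penalty coefficient and kernel parameter and keep the pair with the lowest value.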

In this study, the source code of the MPA metaheuristic is provided by Faramarzi et al. [

In this section, a set of performance measurement indices is used to express the model predictive accuracy. This set includes classification accuracy rate (CAR), precision, recall, negative predictive value (NPV), F1 score, and area under the receiver operating characteristic curve (AUC). The CAR is computed as $\mathrm{CAR} = (N_C / N_A) \times 100\%$, where $N_C$ and $N_A$ are the numbers of correctly predicted data and the total number of data, respectively.
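The threshold-based indices listed above can be computed from the four confusion matrix counts, as in the following minimal sketch. It uses the standard definitions of these metrics with "green space" as the positive class; the function name and the example counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """CAR (%), precision, recall, NPV, and F1 score from confusion matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "CAR": 100.0 * (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "NPV": npv,
        "F1": f1,
    }


# Hypothetical counts for a 200-sample testing set.
print(classification_metrics(tp=90, tn=80, fp=20, fn=10))
```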

Besides the MPA-SVM model which utilizes information provided by 13 spectral bands, this study has employed the MPA-SVM models using the aforementioned vegetation indices as benchmark models. The MPA-SVM employing all of the Sentinel 2’s bands is denoted as MPA-SVM-13B. The benchmark models that use the NDWI, NDVI, SAVI, and MTCI are denoted as MPA-SVM-NDWI, MPA-SVM-NDVI, MPA-SVM-SAVI, and MPA-SVM-MTCI, respectively. The MPA-SVM-13B utilizes the statistical information obtained from all of the 13 spectral bands (i.e., the mean and the standard deviation of each band). Meanwhile, the MPA-SVM-NDWI, MPA-SVM-NDVI, MPA-SVM-SAVI, and MPA-SVM-MTCI employ the statistical information of the vegetation indices of NDWI, NDVI, SAVI, and MTCI, respectively. Therefore, the feature extraction phase of the benchmark models is similar to that of the MPA-SVM-13B. This feature extraction phase also computes the two indices of mean and standard deviation of image patches. The model optimization processes of the constructed models are demonstrated in Figure

Optimization process of the MPA metaheuristic: (a) MPA-SVM-13B, (b) MPA-SVM-NDWI, (c) MPA-SVM-NDVI, (d) MPA-SVM-SAVI, and (e) MPA-SVM-MTCI.

MPA-based optimization results.

Hyperparameters | MPA-SVM-13B | MPA-SVM-NDWI | MPA-SVM-NDVI | MPA-SVM-SAVI | MPA-SVM-MTCI |
---|---|---|---|---|---|
Penalty coefficient | 79.782 | 3.938 | 89.886 | 9.605 | 32.495 |
RBKF parameter | 10.244 | 0.438 | 4.065 | 1.423 | 1.622 |

Cost function values optimized by MPA.

As stated earlier, the constructed dataset has been randomly divided into a training set (70%) and a testing set (30%). The first set is used for model training and the second set is reserved for model validation. Moreover, in order to reliably evaluate the model predictive performance, this study has repeated the model training and prediction processes 20 times. It is noted that the training and testing datasets are resampled in each run. The statistical measurements obtained from these repeated model construction and validation phases are used for model assessment. This repeated process aims at diminishing the variation caused by the randomness in data sampling. The model prediction outcomes are summarized in Table

Experimental result comparison.

Phase | Metrics | MPA-SVM-13B Mean | MPA-SVM-13B Std | MPA-SVM-NDWI Mean | MPA-SVM-NDWI Std | MPA-SVM-NDVI Mean | MPA-SVM-NDVI Std | MPA-SVM-SAVI Mean | MPA-SVM-SAVI Std | MPA-SVM-MTCI Mean | MPA-SVM-MTCI Std |
---|---|---|---|---|---|---|---|---|---|---|---|
Training | CAR (%) | 95.429 | 0.586 | 89.564 | 0.516 | 89.514 | 0.755 | 88.929 | 0.439 | 84.686 | 0.785 |
Training | Precision | 0.941 | 0.009 | 0.882 | 0.008 | 0.885 | 0.011 | 0.881 | 0.008 | 0.819 | 0.009 |
Training | Recall | 0.970 | 0.005 | 0.914 | 0.006 | 0.907 | 0.013 | 0.901 | 0.009 | 0.894 | 0.015 |
Training | NPV | 0.968 | 0.005 | 0.911 | 0.007 | 0.905 | 0.011 | 0.898 | 0.007 | 0.882 | 0.015 |
Training | F1 score | 0.955 | 0.006 | 0.897 | 0.006 | 0.896 | 0.008 | 0.891 | 0.005 | 0.855 | 0.008 |
Training | AUC | 0.991 | 0.002 | 0.940 | 0.007 | 0.943 | 0.006 | 0.936 | 0.006 | 0.916 | 0.006 |
Testing | CAR (%) | 93.100 | 1.411 | 89.400 | 1.356 | 89.300 | 1.729 | 88.983 | 1.062 | 83.850 | 1.698 |
Testing | Precision | 0.916 | 0.024 | 0.881 | 0.024 | 0.894 | 0.029 | 0.886 | 0.017 | 0.803 | 0.022 |
Testing | Recall | 0.947 | 0.018 | 0.911 | 0.017 | 0.895 | 0.020 | 0.893 | 0.023 | 0.888 | 0.030 |
Testing | NPV | 0.947 | 0.019 | 0.908 | 0.020 | 0.893 | 0.022 | 0.895 | 0.017 | 0.881 | 0.031 |
Testing | F1 score | 0.931 | 0.014 | 0.896 | 0.013 | 0.894 | 0.017 | 0.889 | 0.012 | 0.843 | 0.018 |
Testing | AUC | 0.979 | 0.006 | 0.926 | 0.014 | 0.946 | 0.013 | 0.937 | 0.012 | 0.910 | 0.016 |

In terms of AUC score, MPA-SVM-13B is the best model (AUC = 0.979), followed by MPA-SVM-NDVI (AUC = 0.946), MPA-SVM-SAVI (AUC = 0.937), MPA-SVM-NDWI (AUC = 0.926), and MPA-SVM-MTCI (AUC = 0.910). The AUC values of the employed models used for urban green space detection are demonstrated in Figure

ROCs of the prediction models: (a) MPA-SVM-13B, (b) MPA-SVM-NDWI, (c) MPA-SVM-NDVI, (d) MPA-SVM-SAVI, and (e) MPA-SVM-MTCI.

Result comparison in terms of CAR.

Result comparison in terms of precision, recall, NPV, F1 score, and AUC.

Box plot of CAR values obtained from the employed machine learning models.

Box plot of F1 score values obtained from the employed machine learning models.

Box plot of AUC values obtained from the employed machine learning models.

In addition, to confirm the superiority of the proposed MPA-SVM model that employs all of the Sentinel 2’s spectral bands, the Wilcoxon signed-rank test [
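A minimal sketch of a two-sided Wilcoxon signed-rank test is given here for illustration. It assumes that the results of two models are paired by run and uses the normal approximation without tie or continuity correction; the statistical software used in this study may apply a different variant. The function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test with a normal approximation.

    Returns (W, p), where W is the smaller of the positive and negative rank sums.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to tied absolute differences
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return w, min(p_value, 1.0)
```

With 20 paired runs, the smallest p-value this approximation can return arises when one model wins every run (W = 0).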

Wilcoxon signed rank test results with CAR index.

Model | MPA-SVM-13B | MPA-SVM-NDWI | MPA-SVM-NDVI | MPA-SVM-SAVI | MPA-SVM-MTCI |
---|---|---|---|---|---|
MPA-SVM-13B | 0.00000 | 0.00010 | 0.00013 | 0.00013 | 0.00009 |
MPA-SVM-NDWI | 0.00010 | 0.00000 | 0.76496 | 0.31380 | 0.00009 |
MPA-SVM-NDVI | 0.00013 | 0.76496 | 0.00000 | 0.49869 | 0.00009 |
MPA-SVM-SAVI | 0.00013 | 0.31380 | 0.49869 | 0.00000 | 0.00009 |
MPA-SVM-MTCI | 0.00009 | 0.00009 | 0.00009 | 0.00009 | 0.00000 |

Wilcoxon signed rank test results with F1 score index.

Model | MPA-SVM-13B | MPA-SVM-NDWI | MPA-SVM-NDVI | MPA-SVM-SAVI | MPA-SVM-MTCI |
---|---|---|---|---|---|
MPA-SVM-13B | 0.00000 | 0.00009 | 0.00010 | 0.00010 | 0.00009 |
MPA-SVM-NDWI | 0.00009 | 0.00000 | 0.70891 | 0.10843 | 0.00009 |
MPA-SVM-NDVI | 0.00010 | 0.70891 | 0.00000 | 0.31346 | 0.00009 |
MPA-SVM-SAVI | 0.00010 | 0.10843 | 0.31346 | 0.00000 | 0.00009 |
MPA-SVM-MTCI | 0.00009 | 0.00009 | 0.00009 | 0.00009 | 0.00000 |

Wilcoxon signed rank test results with AUC index.

Model | MPA-SVM-13B | MPA-SVM-NDWI | MPA-SVM-NDVI | MPA-SVM-SAVI | MPA-SVM-MTCI |
---|---|---|---|---|---|
MPA-SVM-13B | 0.00000 | 0.00009 | 0.00009 | 0.00009 | 0.00009 |
MPA-SVM-NDWI | 0.00009 | 0.00000 | 0.00132 | 0.02762 | 0.00642 |
MPA-SVM-NDVI | 0.00009 | 0.00132 | 0.00000 | 0.06195 | 0.00010 |
MPA-SVM-SAVI | 0.00009 | 0.02762 | 0.06195 | 0.00000 | 0.00010 |
MPA-SVM-MTCI | 0.00009 | 0.00642 | 0.00010 | 0.00010 | 0.00000 |

Urban green space detection map of the study area.

Urban green space plays a crucial role in improving the living quality of the urban environment and has a positive effect on citizens’ physical and mental health. Nevertheless, few studies have been dedicated to detecting, locating, and quantifying green space in the Da Nang urban center. This study is an attempt to fill this knowledge gap by developing a remote sensing and data-driven approach for urban green space detection applied in the study area. Remotely sensed data obtained from the Sentinel 2 satellite are used to train and validate a hybrid metaheuristic-machine learning approach, MPA-SVM. This hybrid method is employed to construct a decision boundary that separates the input space into two distinctive regions of green space and nongreen space.

The experimental results, supported by the Wilcoxon signed-rank test, show that the MPA-SVM model employing all of the spectral bands is superior to the models relying on individual vegetation indices. Good green space detection results with CAR = 93.100%, precision = 0.916, recall = 0.947, NPV = 0.947, F1 score = 0.931, and AUC = 0.979 demonstrate that the proposed method is well suited for the task at hand. Moreover, the MPA metaheuristic is confirmed to be a capable method for optimizing machine learning models. Accordingly, the green space map of the entire study area can be constructed by the proposed hybrid approach. The information provided by the newly developed model can help local authorities evaluate the status of green spaces in Da Nang city.

Although MPA-SVM has attained a good predictive performance in urban green space mapping in the study area, the proposed approach also has several limitations. The first limitation is that the MPA-SVM model has not been integrated with feature selection algorithms used for dimensionality reduction. In addition, although the RBKF is widely used for SVM-based pattern recognition, the effectiveness of other sophisticated kernel functions (e.g., hybrid kernel functions [

Investigating other state-of-the-art metaheuristic algorithms used for optimizing data-driven urban green space detection

Studying the effects of the maximum number of searching iterations and the number of population members on the performance of the SVM-based urban green space detection models

Employing other advanced texture descriptors to further improve the detection accuracy

Performing detection tasks at different time periods to inspect changes and trends in urban green space

Performing urban green space detection using high-resolution satellite images

Incorporating advanced feature selection algorithms and kernel functions into the current model structure

The dataset used to support the findings of this study has been deposited in the repository of GitHub (

The authors confirm that there are no conflicts of interest.

This research was funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant no. 105.99-2019.339.