Inversion of Regional Economic Trend from NPP-VIIRS Nighttime Light Data Based on Adaptive Clustering Algorithm

Nighttime lighting is closely related to social and economic development, and the inversion of socioeconomic parameters from nighttime light (NTL) remote sensing data is currently a research hot spot. In this paper, a calibration method based on an adaptive clustering algorithm is proposed for the Suomi National Polar-orbiting Partnership Visible Infrared Imaging Radiometer Suite (NPP-VIIRS) NTL data to remove background noise from the original imagery. The validity of the calibration method was evaluated by comparing the correlation between the corrected NTL data and regional economic data. The results indicate that the NTL data obtained by this calibration method correlate more strongly with regional GDP data than the data from the compared methods; the R² value and the root mean square error (RMSE) were 0.8531 and 133.18, respectively. On this basis, the total nighttime light (TNL)-gross domestic product (GDP) regression model obtained in this paper was used to invert the GDP of Liaoning Province from 2012 to 2016. Using the TNL-GDP regression model established in regions with high-quality economic data to check the fraudulent economic statistics of Liaoning Province, the results suggest that NTL data can be a reliable reference for reflecting regional economic development trends.


Introduction
Regional economic data are considered the main basis for understanding the socioeconomic development of a region. GDP statistics are particularly important when economic censuses are conducted. Because of various limitations of traditional statistical methods, the economic data obtained often lack accuracy and are coarse in the spatial dimension [1]. Remote sensing technology can provide more objective and efficient auxiliary information. Among the many remote sensing data sources, artificial light in NTL images is a unique expression of public and commercial lighting. Many studies have shown that NTL is closely related to regional economic development [2][3][4]. NPP-VIIRS NTL data have higher spatial resolution than the data released by the US Department of Defense Meteorological Satellite Program's Operational Linescan System (DMSP-OLS) in the 1970s, and they have solved the problem of image over-saturation [5][6][7].
The Luojia-1 satellite was launched in June 2018. It provides nighttime light data with higher spatial resolution, offering a new opportunity for nighttime light remote sensing [8,9]. However, because this study requires a time series, this paper chooses NPP-VIIRS and DMSP-OLS as data sources. The NPP-VIIRS imagery is a preliminary product. In addition to the NTL from human settlements, gas flares, fleets, auroras, combustion sources, and background noise can all be detected in the NPP-VIIRS NTL data. These confounding factors, which are irrelevant to socioeconomic activities, should be removed. In existing research, the calibration methods for NPP-VIIRS NTL data are the mask calibration method and the threshold calibration method. The mask calibration method generates a mask from all positive-value pixels in the DMSP-OLS imagery and multiplies the NPP-VIIRS imagery by the mask to derive denoised NTL imagery as NPP-VIIRS calibration data [1]. The threshold calibration method calibrates the NPP-VIIRS NTL data by using the fact that rivers and lakes within a city should, in theory, not be luminous. The minimum threshold is set to 0.30 × 10⁻⁹ W cm⁻² sr⁻¹ to identify the noise pixels in NPP-VIIRS NTL imagery [10]. The purpose of this study is to propose an image calibration method combining urban building area classification and an adaptive clustering algorithm. For comparison, we also used the uncalibrated original image, the commonly used DMSP mask method, and the threshold method, and then established regression models between the calibrated NTL data and urban economic data to demonstrate the validity of the proposed method.
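The two existing calibration methods described above can be illustrated with a minimal sketch. It assumes the NPP-VIIRS and DMSP-OLS images are already co-registered on the same grid and stored as nested lists, that radiance is expressed in units of 10⁻⁹ W cm⁻² sr⁻¹, and that pixels strictly below the threshold are treated as noise; the function names are hypothetical and are not taken from the cited studies.

```python
# Hypothetical helpers; images are nested lists (rows of pixel values).
NOISE_THRESHOLD = 0.30  # in units of 1e-9 W cm^-2 sr^-1, as quoted in the text


def mask_calibrate(viirs, dmsp):
    """Keep a VIIRS pixel only where the co-located DMSP-OLS pixel is positive."""
    return [
        [v if d > 0 else 0.0 for v, d in zip(v_row, d_row)]
        for v_row, d_row in zip(viirs, dmsp)
    ]


def threshold_calibrate(viirs, threshold=NOISE_THRESHOLD):
    """Set VIIRS pixels below the threshold to zero (assumed noise handling)."""
    return [[v if v >= threshold else 0.0 for v in row] for row in viirs]


# Illustrative values only, not real imagery.
viirs = [[0.1, 0.5], [2.0, 0.4]]
dmsp = [[1, 3], [5, 0]]
masked = mask_calibrate(viirs, dmsp)
thresholded = threshold_calibrate(viirs)
```

Both functions return a new image and leave the input untouched, so the same original image can be passed to each method for a side-by-side comparison.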

Study Area.
Our study area is Liaoning Province in northeastern China, with a total area of 148,000 km². Liaoning Province is the only coastal province in the northeast region that also lies on a national border. It is the gateway to the northeast region and the eastern part of the Inner Mongolia Autonomous Region. The province covers 14 prefecture-level cities, including Shenyang and Dalian. As one of the old industrial bases in Northeast China, Liaoning Province has experienced a sharp economic decline since 2013. Notably, in early 2017, Liaoning provincial officials admitted that the province's GDP figures from 2011 to 2014 had been inflated [11,12]. When GDP statistics are unreliable, NTL data, which are closely related to social and economic development, can to some extent provide an objective measure of regional economic development trends. This makes Liaoning Province an interesting study area.

Data.
Given the relatively large size of Liaoning Province and the higher spectral resolution of Landsat data compared with other high spatial resolution images, Landsat 8 Operational Land Imager (OLI) images from June to August 2016 were selected (https://earthexplorer.usgs.gov); they include nine spectral bands (six multi-spectral bands, two thermal bands, and one panchromatic band). The composite NPP-VIIRS NTL data and the DMSP-OLS stable light data used in this study can be obtained from the National Oceanic and Atmospheric Administration National Geophysical Data Center (http://ngdc.noaa.gov). Unlike the DMSP-OLS data, the NPP-VIIRS data have not been filtered to remove light detections associated with fires, gas flares, volcanoes, or aurora, and the background noise has not been subtracted [13]. The economic statistics are from the China Statistical Yearbook issued by the National Bureau of Statistics (http://www.stats.gov.cn).

Methodology
To address the background noise in NPP-VIIRS NTL data, a calibration method based on neural network classification and an adaptive clustering algorithm is proposed. After calibration, we used the TNL data and statistical GDP data to establish a linear regression model. The main process is shown in Figure 1. The first part creates the dataset of urban GDP data. The second part preprocesses the remote sensing images and then classifies them with the backpropagation (BP) neural network algorithm. The third part obtains the corrected NPP-VIIRS NTL data through the adaptive clustering algorithm and calculates their TNL value. Finally, the TNL-GDP regression model is established, and the corresponding accuracy evaluation and comparative analysis are carried out.

Creating a City Economic Statistics Set.
Model cities were selected mainly with reference to the general public budget revenues of the provinces and some prefecture-level cities in 2015 and 2016. Cities with higher economic quality in the five provinces on the eastern coast of China (Shandong, Jiangsu, Zhejiang, Fujian, and Guangdong) were preselected. Let the general public budget revenue of the ith prefecture-level city be D_i in 2015 and E_i in 2016. A city was retained if its general public budget revenue grew by 5% or more in 2016, that is, (E_i − D_i)/D_i ≥ 5%, which ensures that its economy was growing steadily. According to these conditions, 32 prefecture-level cities, including Nanjing, Hangzhou, and Suzhou, were selected.
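The selection rule can be written as a short filter. This is a minimal sketch that assumes the revenues are available as dictionaries keyed by city name; the function name and the sample figures are hypothetical and serve only to illustrate the 5% growth criterion.

```python
def select_model_cities(revenue_2015, revenue_2016, min_growth=0.05):
    """Return cities whose budget revenue grew by at least min_growth in 2016."""
    selected = []
    for city, d in revenue_2015.items():
        e = revenue_2016.get(city)
        if e is None or d <= 0:
            continue  # skip cities with missing or unusable figures
        if (e - d) / d >= min_growth:
            selected.append(city)
    return selected


# Illustrative figures only, not values from the statistical yearbook.
d_2015 = {"City A": 100.0, "City B": 200.0}
e_2016 = {"City A": 106.0, "City B": 204.0}
model_cities = select_model_cities(d_2015, e_2016)
```

With these sample figures only City A passes, since City B grew by 2%.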

Image Preprocessing and Classification.
The original NTL image used in this paper comes from the NPP-VIIRS monthly NTL data of 2016. The preprocessing steps are as follows: composite the monthly NTL data into an annual 2016 image, convert the annual image to the Lambert azimuthal equal-area projection and resample it, and remove negative values. Radiometric calibration was applied to the Landsat 8 OLI multi-spectral imagery to convert digital number (DN) values to top-of-atmosphere (TOA) reflectance values.
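Two of these preprocessing steps can be sketched in a few lines. The text does not state how the monthly images were composited, so a pixel-wise mean is assumed here, and negative values are assumed to be clipped to zero. The reflectance conversion follows the standard Landsat 8 rescaling, in which the multiplicative and additive factors and the sun elevation come from the scene metadata; the function and parameter names are hypothetical.

```python
import math


def annual_composite(monthly_images):
    """Average co-registered monthly images pixel by pixel, then clip negatives to zero.

    The pixel-wise mean is an assumption; the compositing rule is not given in the text.
    """
    n = len(monthly_images)
    rows, cols = len(monthly_images[0]), len(monthly_images[0][0])
    annual = []
    for r in range(rows):
        row = []
        for c in range(cols):
            mean = sum(img[r][c] for img in monthly_images) / n
            row.append(max(mean, 0.0))
        annual.append(row)
    return annual


def dn_to_toa_reflectance(dn, mult, add, sun_elevation_deg):
    """Convert a Landsat 8 OLI digital number to sun-angle-corrected TOA reflectance."""
    return (mult * dn + add) / math.sin(math.radians(sun_elevation_deg))


# Illustrative values only.
annual = annual_composite([[[1.0, -2.0]], [[3.0, -4.0]]])
rho = dn_to_toa_reflectance(10000, 2.0e-5, -0.1, 60.0)
```

Reprojection and resampling are left out because they depend on the GIS toolchain in use.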
The Fast Line-of-Sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) was adopted for atmospheric correction [14]. The BP neural network algorithm was used to classify the preprocessed Landsat 8 OLI image. The BP neural network is a generalization of the least mean square algorithm, which transforms the sample decision problem into a nonlinear optimization problem. It uses a gradient search technique to minimize the cost function, which is equal to the mean square error. The main idea of applying a neural network to supervised classification of remote sensing imagery is as follows: the features extracted from the imagery are used as the input signal of the network, and after the network is trained according to certain rules, the class can be read from the output layer. According to the calibration requirements in this paper, the urban built-up area should be effectively extracted, so the image was divided into two main categories: the building area and other areas (vegetation, bare land, water, and so on). First, based on image feature recognition, the type of each training sample was determined by visual interpretation in conjunction with the corresponding region in Google Earth.
Then, we fed the samples into a three-layer BP neural network with a single hidden layer to train a feed-forward neural network [15] and obtain the classification result. The logarithm was selected as the activation function, and the other parameters of the BP neural network were set as follows: initial weight σ = 0.9, weight adjustment speed η = 0.2, momentum factor α = 0.9, and global network error E = 0.1. To evaluate classification accuracy, this study used ground truth regions of interest (ROIs) and the confusion matrix. Table 1 shows the accuracy evaluation of the classification.
As can be seen from Table 1, the overall classification accuracy and kappa coefficient achieved by the neural network algorithm are above 0.8 for most cities.
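For reference, overall accuracy and the kappa coefficient can both be derived from a confusion matrix as in the following minimal sketch. The matrix layout (rows as reference classes, columns as predicted classes) and the sample counts are assumptions for illustration and are not taken from Table 1.

```python
def accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    # Chance agreement: product of row and column marginals for each class.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Illustrative two-class matrix (building area vs. other areas).
overall, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
```

Kappa is lower than overall accuracy here because it discounts the agreement that would be expected by chance.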

Adaptive Clustering Algorithm.
Because of the diversity and complexity of spectral information, the initial classification image commonly contains isolated points and holes, which cause it to lack spatial continuity [16]. Clustering is a common post-classification processing method. Its principle is to use mathematical morphology operators (dilation and erosion): the selected classes are first merged by a dilation operation, and an erosion operation is then performed with the transformation kernel [17]. As a classical clustering algorithm, the K-means clustering algorithm has good stability [18]. This study improves on this basis. Instead of using a fixed transformation kernel size, this paper proposes determining the kernel size adaptively according to the scale of the urban built-up area in the NPP-VIIRS NTL remote sensing image. It was assumed that when the urban built-up area is relatively concentrated and large in scale, the built-up areas extracted by classification are already clustered and lumped, so a smaller kernel size should be selected for clustering. The adaptive clustering size of the urban built-up area was calculated from the proportion of pixels above the average brightness value in the urban NTL remote sensing image.
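The two ingredients named in this paragraph, the dilation-then-erosion clustering step and the proportion of pixels brighter than the average, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the kernel is assumed to be square with an odd side length, the mean is taken over all pixels, the kernel size is passed in directly instead of being derived adaptively, and all function names are hypothetical.

```python
def _window_filter(mask, size, op):
    """Apply op (max for dilation, min for erosion) over a size x size window.

    Pixels outside the image are treated as 0.
    """
    half = size // 2
    rows, cols = len(mask), len(mask[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
                for rr in range(r - half, r + half + 1)
                for cc in range(c - half, c + half + 1)
            ]
            out_row.append(op(window))
        out.append(out_row)
    return out


def cluster_mask(mask, size):
    """Dilation followed by erosion with a square kernel of odd side length `size`."""
    half = size // 2
    rows, cols = len(mask), len(mask[0])
    # Pad with zeros so the dilated shape is not truncated at the image border.
    blank = [0] * (cols + 2 * half)
    padded = (
        [blank[:] for _ in range(half)]
        + [[0] * half + list(row) + [0] * half for row in mask]
        + [blank[:] for _ in range(half)]
    )
    closed = _window_filter(_window_filter(padded, size, max), size, min)
    return [row[half:half + cols] for row in closed[half:half + rows]]


def bright_fraction(ntl):
    """Proportion of pixels whose value exceeds the image mean (mean over all pixels assumed)."""
    pixels = [v for row in ntl for v in row]
    mean = sum(pixels) / len(pixels)
    return sum(1 for v in pixels if v > mean) / len(pixels)


# Illustrative mask with a one-pixel hole in a built-up block.
filled = cluster_mask([[1, 1, 1], [1, 0, 1], [1, 1, 1]], 3)
```

A larger kernel bridges wider gaps between built-up patches, which is why the kernel size matters for how much of the city core the final mask covers.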
First, we define the following parameters: the size of the NTL image of the kth city is m_k × n_k, and the range of the clustering scale CS is CS_min to CS_max. Then, the reference clustering scale of the kth city is [...] where λ_k denotes the adaptive normalized coefficient of the kth city. The final clustering scale is [...] where ⌈·⌉ and mod represent the round-up operation and the modulo operation, respectively. The adaptive threshold of the kth city (θ_k) is defined as [...] where L^k_{i,j} indicates the NTL value of the pixel in the ith row and the jth column of the kth city. The function I(x) is defined as [...]

Figure 1: The framework proposed for this study.

Mathematical Problems in Engineering
where θ_k is the average NTL value of the effective region of the kth city and is used as an indicator of the degree of development in the region. Next, the parameters φ_k and λ_k are defined as [...] According to the calculated clustering scale, the classification results were clustered, the clustered results were used as masks to extract the NTL images of the corresponding cities, and the TNL value was calculated. Figure 2(a) shows the original images of NPP-VIIRS NTL data. The NTL data extracted by using the neural network classification results are shown in Figure 2(b).
These images contain so many holes and fragmentary spots that a large amount of important luminous information may be lost. The images in Figure 2(c) are the NTL data extracted after adaptive clustering of the classification results. By comparison, the results of the calibration method proposed in this paper essentially cover the central extent of the cities and are sharper than those in Figure 2(b).
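The final extraction step, applying the clustered built-up mask to the NTL image and summing the retained pixels, can be sketched as follows. The sketch assumes the mask and the NTL image share the same grid; the function name is hypothetical.

```python
def total_nighttime_light(ntl, mask):
    """Sum the NTL values of pixels where the mask is nonzero."""
    return sum(
        value
        for ntl_row, mask_row in zip(ntl, mask)
        for value, keep in zip(ntl_row, mask_row)
        if keep
    )


# Illustrative values only.
tnl = total_nighttime_light([[1.0, 2.0], [3.0, 4.0]], [[1, 0], [0, 1]])
```

Holes left in the mask directly reduce the TNL value, which is why the clustered mask in Figure 2(c) retains more luminous information than the raw classification in Figure 2(b).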

Establishing the TNL-GDP Regression Model.
Methods such as linear regression models [19,20] and second-order regression models [21] can be applied to the TNL-GDP regression model. Among them, the linear regression model is easy to implement and has relatively high accuracy [13]. Therefore, this study used a linear regression model as follows: [...] where G refers to the statistical GDP data of an administrative unit, L denotes the TNL, measured as the sum of the values of all sample pixels used in the regression analysis, and ω indicates a coefficient.
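A least-squares fit of GDP against TNL can be sketched as below. The exact form of the regression equation is not recoverable from the text, so this sketch assumes a straight line with a slope ω and an intercept term; the intercept, the function name, and the sample data are assumptions for illustration.

```python
def fit_tnl_gdp(tnl, gdp):
    """Ordinary least squares fit of gdp = omega * tnl + intercept (intercept assumed)."""
    n = len(tnl)
    mean_l = sum(tnl) / n
    mean_g = sum(gdp) / n
    sxx = sum((l - mean_l) ** 2 for l in tnl)
    sxy = sum((l - mean_l) * (g - mean_g) for l, g in zip(tnl, gdp))
    omega = sxy / sxx
    intercept = mean_g - omega * mean_l
    return omega, intercept


# Illustrative data only, not values from the model cities.
omega, intercept = fit_tnl_gdp([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
predicted_gdp = omega * 4.0 + intercept
```

Once fitted on the model cities, the same coefficients can be applied to the TNL of any other city to obtain an inverted GDP estimate.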

Regression Result.
To further verify the calibration effect of combining neural network classification with the adaptive clustering algorithm, we compared the linear regression scatter plots obtained from the original NTL image and from the calibration methods commonly used in other papers. The R² value and the RMSE value were used as indicators of the calibration effect so that the evaluation is more objective. Among the regression models listed in Figure 3, the calibration method used in this paper yields an R² value of 0.8531 and an RMSE value of 133.18, both better than those of the other methods. This shows that the TNL-GDP regression model obtained by the calibration method of this study has a higher correlation and a better fit. Then, we selected seven cities with relatively accurate GDP data (Nanjing, Hangzhou, Suzhou, Fuzhou, Beijing, Shanghai, and Guangzhou) to verify how well the calibrated NTL data fit the GDP data in the time dimension. The results are shown in Table 2. It can be seen that R² is relatively large and the fit is good.
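The two evaluation indicators can be computed as in the following minimal sketch, which uses the usual definitions of the coefficient of determination and the root mean square error. The function name and sample values are hypothetical.

```python
import math


def r2_and_rmse(observed, predicted):
    """Return (R squared, RMSE) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Illustrative values only.
r2, rmse = r2_and_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

A higher R² together with a lower RMSE indicates a better fit, which is the sense in which the methods in Figure 3 are compared.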

Economic Inversion Results of Liaoning Province.
This study selected Liaoning Province as the object of economic inversion because in the first half of 2017, the relevant departments of Liaoning Province admitted that the economic data for 2011-2014 were not true and that the GDP figures were overstated. It is difficult to quantify the degree of inflation in the GDP data during 2012-2016 from the inversion results of the linear regression model alone. However, by drawing on the correlation between total urban NTL and economic development, we can understand the general trend of urban economic development at the macroscopic level.
According to economic statistics, the GDP of Liaoning Province increased steadily during 2012-2015, and the 2016 GDP declined in a cliff-like manner. The inversion results show that the GDP of Liaoning Province had been declining since 2013, with the most rapid decline in 2014, while in 2016 the trend rebounded. GDP growth is related to general public budget revenue. The general public budget refers to the fiscal revenue, with taxation as its main body, that is arranged to maintain the normal operation of state institutions [22]. The growth of GDP affects the growth of general public budget revenue, and an increase or decrease in the general public budget in turn influences economic growth [23,24]. To demonstrate the rationality of the inversion trend, Figure 4(b) lists the growth rates of the inverted GDP data and of the government's general public budget revenue during 2012-2016. Both maintained positive growth during 2012-2013, both showed negative growth during 2013-2015, and during 2015-2016 both growth rates turned positive.
After officials disclosed that the GDP data were fraudulent, corresponding adjustments and corrections were made to the 2016 GDP data. Therefore, compared with the 2015 statistics, there is a cliff-like decline. However, from the perspective of both total NTL and government revenue, the economic development of Liaoning Province showed a recovery trend in 2016. The results suggest that the TNL-GDP regression model based on the classification of urban buildings and the adaptive clustering algorithm proposed in this study correlates well with socioeconomic data and performs well in inverting the trend of urban economic development.
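The trend comparison between inverted GDP and budget revenue amounts to computing year-over-year growth rates and checking whether their signs agree. A minimal sketch follows; the function names and the numbers are hypothetical and do not reproduce the values in Figure 4(b).

```python
def growth_rates(series):
    """Year-over-year growth rate for a {year: value} series, keyed by (start, end) year."""
    years = sorted(series)
    return {
        (a, b): (series[b] - series[a]) / series[a]
        for a, b in zip(years, years[1:])
    }


def same_direction(rates_a, rates_b):
    """For each shared period, report whether both growth rates have the same sign."""
    return {
        period: (rates_a[period] > 0) == (rates_b[period] > 0)
        for period in rates_a
        if period in rates_b
    }


# Illustrative series only.
inverted_gdp = {2012: 100.0, 2013: 110.0, 2014: 99.0}
budget_revenue = {2012: 50.0, 2013: 52.0, 2014: 48.0}
agreement = same_direction(growth_rates(inverted_gdp), growth_rates(budget_revenue))
```

Comparing signs rather than magnitudes matches the claim in the text, which concerns the direction of the trend rather than the size of the change.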

Conclusion
In this paper, we propose an NPP-VIIRS NTL data calibration method that combines urban built-up area classification with an adaptive clustering algorithm, and we then construct a linear regression model using the calibrated NTL data and urban economic data. A comparative analysis of several common NTL calibration methods in 32 prefecture-level cities along the southeast coast of China shows that the calibration method proposed in this study significantly improves the correlation between total NTL and urban economic data. Finally, the TNL-GDP linear regression model derived from this study was used to invert the GDP of the 14 prefecture-level cities in Liaoning Province, China, during 2012-2016. These findings show that the model can be a powerful tool for inverting regional economic development trends, especially in regions where economic census data are difficult to access. Follow-up research can therefore focus on TNL-GDP inversion in such areas. In addition, this study can be applied to existing problems with GDP statistics, such as inaccurate local GDP data, by providing an effective method for verifying the accuracy and authenticity of local GDP data.

Data Availability
The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.