Deducing Leading Factors of Spatial Distribution of Carbon Reserves in Nanjing Metropolitan Area Based on Random Forest Model

Improving carbon reserves is considered an important way to alleviate global warming. However, little research has been carried out from the perspective of the metropolitan area, and the leading factors that influence the spatial distribution of carbon storage in the subregions of a metropolitan area have rarely been analyzed. In this study, the Nanjing metropolitan area (NMA) is taken as the research area, the InVEST model is used to calculate the spatial distribution of regional carbon reserves, and the evolution of the carbon reserves distribution over the past 20 years is analyzed. Then, based on the random forest (RF) model and taking the whole study area and its subareas as the research scope, a regression model of the selected impact factors and carbon reserves is established, and the leading factors of the spatial distribution of carbon reserves in NMA are obtained. The results show that the overall level of carbon reserves in the study area is declining. Through the application of the RF model, the leading factors of the spatial distribution of carbon reserves in NMA and its subareas are derived. The research indicates that applying the RF model in this analysis can help city planners and governments make plans and improve regional carbon storage more effectively.


Introduction
Global warming is an international challenge that all mankind needs to face in the twenty-first century [1]. To meet this challenge, China will strive to achieve peak carbon dioxide emissions by 2030 and carbon neutrality by 2060. Carbon reserves are widely regarded as an important indicator of ecosystem services [2]. Carbon stored in terrestrial ecosystems plays a very important role in the global carbon cycle, atmospheric carbon dioxide concentration, and global climate change. Terrestrial ecosystems capture greenhouse gases such as CO2 and CH4 through forests, grasslands, and other green infrastructure to regulate the regional climate and increase carbon reserves [3].
Since 1850, changes in land use and land cover (LULC) have caused the global terrestrial ecosystem to lose 145 Pg C [4]. Therefore, quantitative analysis and prediction of ecosystem carbon reserves based on LULC plays an important role in achieving carbon neutrality, improving the global climate, and fully protecting the ecosystem. Cities are the areas with the most transformation, the fastest rate of change, and the most frequent land use activities in the terrestrial ecosystem [5]. With the rapid development of industry, cities have become the main areas where fossil fuels are burned, and more than 80% of CO2 emissions come from cities [6,7]. Urban carbon reserves have become an important factor affecting the regional climate and the regional ecosystem.
At present, the methods used to study carbon reserves at home and abroad mainly include the biomass method [8,9], the bookkeeping method [10], and the inventory method of the Intergovernmental Panel on Climate Change (IPCC) [11,12]. In recent years, most scholars have begun to use the Integrated Valuation of Ecosystem Services and Trade-offs (InVEST) model, which runs fast, requires less data, and is practical, to study carbon reserves in watersheds [13,14], cities [15][16][17], special geographical areas [18], and other areas.
Those studies point out that research on carbon reserves has important reference value for the management of the terrestrial ecosystem carbon pool. With the deepening of China's urbanization process, the development of Chinese cities has shifted from focusing on each city's own urbanization to focusing on interregional cooperation and development among different cities [19,20].
There have been examples of establishing large metropolitan areas across municipal or even provincial administrative regions. At present, academic research on urban carbon reserves mainly focuses on the change of carbon reserves driven by LULC change [21][22][23][24]. Some studies analyzed the historical evolution of LULC in a region and used simulation models such as CA-Markov to predict future LULC change under multiple scenarios, so as to deduce the future trend of regional carbon reserves [25][26][27].
Those studies have achieved fruitful results on their respective research questions, but there is still a lack of work from the perspective of the metropolitan area, as well as of analysis of the leading factors that influence the spatial distribution of carbon reserves in subregions at all levels within the metropolitan area. The random forest (RF) model is an ensemble learning algorithm [28,29] that aggregates multiple classification trees, each of which depends on a random vector sampled independently. Among existing algorithms, the RF algorithm has good accuracy and runs effectively on large data sets [29,30]. It can be used to assess the importance of each feature in the classification [31]. When using the RF model, researchers do not need to worry about multicollinearity. At the same time, the RF model can capture nonlinear relationships and interactions between variables [32]. In addition, the RF model is not sensitive to outliers [29]. It has shown strong performance on classification and regression problems. Compared with other methods, the random forest is suitable for processing high-dimensional data, is not prone to overfitting, and performs well when processing the data [31,33,34]. At present, the RF model has been widely used in medicine [35,36], economics [37,38], remote sensing [39,40], and other fields, but it has not been widely used in research on carbon reserves.
To sum up, we select the Nanjing metropolitan area (NMA), the first national metropolitan area in China, as the research area. We calculate the spatial distribution of regional carbon reserves by using the InVEST model, and verify and analyze the evolution of the carbon reserves distribution over the past 20 years (2000-2020). The core of our work is to establish a regression model of each influencing factor and carbon reserves based on the RF model, and to obtain the leading factors of the spatial distribution of carbon reserves in NMA, in order to provide a basis for realizing carbon neutrality, improving the carbon storage capacity of the ecosystem, and reasonably formulating the land use strategy in NMA.

Study Area and Data Collection.
Nanjing Metropolitan Area (NMA) is the first planned national metropolitan area in China (Figure 1). NMA is in the core area of the urban belt along the Yangtze River in eastern China, spanning Jiangsu and Anhui provinces. NMA includes Nanjing City, Zhenjiang City, Yangzhou City, Huai'an City, Ma'anshan City, Chuzhou City, Wuhu City, Xuancheng City, Liyang City, and Jintan County in Changzhou City. It includes 33 municipal districts, 11 county-level cities, and 16 counties, with a total area of 66,000 square kilometers. Located in the alluvial plain formed under the humid monsoon climate of East China, NMA belongs to the humid area of the northern subtropical zone. The regional vegetation is mixed forest of deciduous and evergreen broad-leaved vegetation typical of the northern subtropical zone. The key service function of this area is urban ecology. However, with the unlimited expansion of cities and metropolitan areas, the ecological carrying capacity of NMA is seriously overloaded, its ecological functions are declining, pollution is serious, and the quality of human settlements is reduced [41][42][43].
NMA involves a number of administrative divisions belonging to different provinces. Its regional scale is large, and the economic volume, ecological environment, and other characteristics of its constituent cities differ greatly. Therefore, we not only explore the factors influencing carbon reserves change in NMA as a whole, but also explore how those factors differ among the subareas of NMA. Generally speaking, the planning of administrative regions at all levels in China is mostly based on the county level or above (the county is the administrative unit below the city, Xian or Qu in Chinese). Taking each county-level region in NMA as a unit and combining it with the development plan of NMA published by the National Development and Reform Commission of China, the research area is divided into three subareas: urban core area (UCA), urban planning area (UPA), and urban expansion area (UEA). Table 1 lists the county-level administrative regions included in the study area and subareas. All the data used in this study are projected and resampled in ArcGIS software before further processing. The data describing the spatial distribution characteristics of annual precipitation come from the National Earth System Science Data Center (https://www.geodata.cn) of China's national science and technology infrastructure. The elevation and slope images are extracted from GDEM·V2 of the Land Processing Distributed Activity Archive Center (LP DAAC) of NASA. Normalized Difference Vegetation Index (NDVI) data are calculated from 59 Landsat-8 remote sensing images taken in 2020. Potential evapotranspiration, average temperature, road network, railway network, population, and urban water system data are from the Data Center of Resources and Environmental Sciences, Chinese Academy of Sciences (https://www.resdc.cn/).

Study Design.
This study can be divided into the following three steps. First, taking Nanjing Metropolitan Area (NMA) as the research area, the spatial distribution of regional carbon reserves is calculated by using the Carbon module of the InVEST model, and the evolution of the carbon reserves distribution over the past 20 years (2000-2020) is analyzed. Second, we select the land use types and the driving factors of land use evolution as the influencing factors adopted in the research. Third, based on the RF model, regression models of the influencing factors and carbon reserves are established for the whole study area and each subarea to obtain the dominant factors of the spatial distribution of carbon reserves in NMA.

Calculation and Evolution Analysis of Carbon Reserves Based on InVEST Model.
This paper discusses the regional evolution trend of carbon reserves in NMA based on the InVEST model. We calculate the spatial and temporal distribution of carbon reserves in NMA from 2000 to 2020 using the Carbon module of the InVEST model. The carbon reserves calculated by the Carbon module include four carbon pools, i.e., the aboveground biomass carbon pool, underground biomass carbon pool, soil carbon pool, and dead organic matter carbon pool. The calculation formula of carbon reserves is as follows [15]:
$$C_{total} = C_{above} + C_{below} + C_{soil} + C_{dead}$$
where $C_{total}$ is the regional total carbon reserves, $C_{above}$ is the aboveground biomass carbon reserves, $C_{below}$ is the underground biomass carbon reserves, $C_{soil}$ is the soil carbon reserves, and $C_{dead}$ is the dead organic matter carbon reserves (t/km²). Carbon reserves are affected by factors such as location, altitude, hydrothermal conditions, and land use, which leads to differences in carbon sequestration characteristics in different regions, thus affecting terrestrial ecosystems [31,44]. The values of carbon density in the study area are quoted from the calculation results of the Chinese scholars Liu and Zhu [45], etc., as shown in Table 2.
Based on the above, combined with the LULC in 2000, 2010, and 2020, we can obtain the carbon reserves distribution of NMA in the corresponding years. Then, according to the principle of equal intervals, the distribution of carbon reserves in NMA obtained in the previous steps is divided into levels A, B, C, and D from low to high, and the proportion of the four levels in each year is calculated. Finally, the carbon reserves data of adjacent years are subtracted to get the change of carbon reserves every 10 years.
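The three steps described here (summing the four pools per land use class, equal-interval grading into levels A to D, and differencing two years) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the density numbers, and the tiny 2 × 2 grids are hypothetical placeholders and are not the Table 2 values or the actual InVEST implementation.

```python
# Hypothetical carbon densities (t/km^2) per LULC class for the four pools.
# Placeholder numbers for illustration only, NOT the values of Table 2.
DENSITY = {
    "cultivated": {"above": 5.0, "below": 1.0, "soil": 10.0, "dead": 0.5},
    "forest":     {"above": 20.0, "below": 5.0, "soil": 15.0, "dead": 2.0},
    "water":      {"above": 0.0, "below": 0.0, "soil": 2.0, "dead": 0.0},
    "artificial": {"above": 0.5, "below": 0.5, "soil": 4.0, "dead": 0.0},
}
LABELS = ("A", "B", "C", "D")  # levels from low to high


def total_density(lulc_class):
    """Sum the aboveground, underground, soil, and dead organic matter pools."""
    pools = DENSITY[lulc_class]
    return pools["above"] + pools["below"] + pools["soil"] + pools["dead"]


def carbon_grid(lulc_grid):
    """Map every LULC cell to its total carbon density."""
    return [[total_density(c) for c in row] for row in lulc_grid]


def equal_interval_levels(grid):
    """Grade each cell into A-D using equal-width intervals between min and max."""
    values = [v for row in grid for v in row]
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(LABELS)

    def level(v):
        if width == 0:
            return LABELS[0]
        # The maximum value falls on the upper edge, so clamp it into level D.
        return LABELS[min(int((v - lo) / width), len(LABELS) - 1)]

    return [[level(v) for v in row] for row in grid]


def level_shares(level_grid):
    """Proportion of cells in each level."""
    flat = [lab for row in level_grid for lab in row]
    return {lab: flat.count(lab) / len(flat) for lab in LABELS}


def difference(later, earlier):
    """Cell-by-cell change in carbon reserves between two years."""
    return [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(earlier, later)]


# Hypothetical LULC maps for two years: two cultivated cells become built-up.
lulc_2000 = [["forest", "cultivated"], ["cultivated", "water"]]
lulc_2020 = [["forest", "artificial"], ["artificial", "water"]]

carbon_2000 = carbon_grid(lulc_2000)
carbon_2020 = carbon_grid(lulc_2020)
shares_2020 = level_shares(equal_interval_levels(carbon_2020))
change = difference(carbon_2020, carbon_2000)
```

In this toy case the converted cells lose carbon, so `change` is negative there and the share of the lowest level grows, which mirrors the kind of shift reported for NMA without reproducing its numbers.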

Deducing of Leading Factors of Carbon Reserves in NMA Based on RF Model.
The distribution of regional carbon reserves is influenced by many factors and results from the interaction of the physical and chemical conditions of various types of land with natural and social factors [31,44]. The impact of land use change on regional carbon reserves is very prominent, and the driving factors of land use change can also affect the spatial and temporal changes of regional carbon reserves [46]. Therefore, in this study, the land use types and the commonly used driving factors of land use evolution are used to infer and analyze the driving factors of carbon reserves evolution. The driving factors selected in this study include DEM (elevation), SLOPE, TEM (temperature), PRE (precipitation), PET (potential evapotranspiration), RRO_ROAD (proximity to roads), RRO_RAIL (proximity to railways), POP (population), GDP (Gross Domestic Product), RRO_WATER (proximity to water bodies), FOREST COVERAGE, and LULC. The data sources of DEM, TEM, PRE, PET, GDP, and POP have been given above. SLOPE is calculated from DEM data in ArcGIS using the slope calculation tool in the surface analysis module, and FOREST COVERAGE is obtained by converting NDVI data of Nanjing for the whole year of 2020. RRO_WATER and RRO_ROAD are calculated by applying the Euclidean distance tool to the corresponding data. The impact factors are shown in Figure 2.
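To make the proximity factors concrete, the following brute-force Python sketch computes, for every raster cell, the straight-line distance to the nearest feature cell, which is the idea behind a Euclidean distance surface. It is not the ArcGIS tool itself; the function name, the 3 × 3 mask, and the 3 km cell size are assumptions chosen for illustration.

```python
import math


def euclidean_distance_grid(feature_mask, cell_size_km=1.0):
    """Distance (km) from each cell centre to the nearest feature cell.

    feature_mask is a list of rows; truthy cells mark the feature
    (for example a road or a water body). Brute force, so only
    suitable for small illustrative grids.
    """
    sources = [
        (r, c)
        for r, row in enumerate(feature_mask)
        for c, is_feature in enumerate(row)
        if is_feature
    ]
    if not sources:
        raise ValueError("feature_mask contains no feature cells")
    return [
        [
            cell_size_km * min(math.hypot(r - sr, c - sc) for sr, sc in sources)
            for c in range(len(row))
        ]
        for r, row in enumerate(feature_mask)
    ]


# Hypothetical mask with a single road cell in the top-left corner.
road_mask = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]
dist_to_road = euclidean_distance_grid(road_mask, cell_size_km=3.0)
```

Cells on the feature get a distance of zero, and values grow with the distance from it, so a larger value means weaker proximity to the road or water body.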
According to the scale of the study area and the actual situation of the required data, a sampling surface with a unit grid of 3 × 3 km was set up in NMA by using ArcGIS software, and the grid data, including the carbon reserves distribution and the various influencing factors, were sampled. On this basis, taking the carbon reserves of NMA in 2020 as the dependent variable and the land use types and driving factors as independent variables, the RF model is used to deduce and analyze the leading factors of the carbon reserves distribution.
The random forest algorithm is a combination classification algorithm proposed by Breiman [28], and it is a supervised learning algorithm that integrates multiple decision trees [28,35]. It is often used for classification and regression in machine learning. Classification and regression can be completed based on the data processing results. Its working principle is to build multiple decision trees at training time and to output the class chosen by the trees (classification) or the average of the trees' predictions (regression). The core of the RF model is the binary tree model finally generated by CART [28]. Error estimation and tree pruning are carried out by using the recapture technology, and the best model is selected as the final decision tree.
In the process of learning and training of RF, data are generally divided into a training set and a test set for modeling and simulation (prediction) [33]. Firstly, the training set is used for training and learning, and the optimal parameters are found. Secondly, the test set is used to evaluate the actual performance of the model learned from the training data set. The selection of training samples and test samples is the precondition of modeling a random forest. Figure 3 is a framework of the principle of the RF model in regression analysis.
In this study, the RF model is constructed by using the extracted related variables. According to the modeling method, the data (feature) sample set is divided into two parts: a training set (70% of the total) and a test set (30% of the total). The error and goodness-of-fit (R², coefficient of determination) between the predicted values and actual values of the test set are analyzed. Generally speaking, if the R² of the model is greater than 0.8, the model can be considered as meeting the requirements. After running the model, we can get the order of the feature importance of each influencing factor, so as to get the leading factors of the carbon reserves distribution.
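The workflow in this paragraph (70/30 split, RF regression, R² on the test set, ranking by feature importance) might look roughly as follows in Python with scikit-learn. The source does not say which software or which importance measure was used, so this is a sketch under assumptions: the data are synthetic, the feature names are hypothetical, and `feature_importances_` is scikit-learn's impurity-based importance, which may differ from the measure behind Figure 7.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the sampled grid cells (assumption: 600 cells,
# four hypothetical factors scaled to [0, 1]).
rng = np.random.default_rng(0)
feature_names = ["water_share", "slope", "veg_cover", "noise"]
X = rng.random((600, 4))
# Synthetic "carbon reserves": dominated by water share, unaffected by "noise".
y = -5.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0.0, 0.1, 600)

# 70% training set, 30% test set, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Goodness of fit on the held-out test set; the text uses R² > 0.8 as the bar.
r2 = r2_score(y_test, model.predict(X_test))
meets_requirement = r2 > 0.8

# Rank the factors from most to least important (importances sum to 1).
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Fixing `random_state` keeps the split and the forest reproducible; repeating the same fit on the rows of each subarea would give one ranking per area, which is how the whole-area and subarea results can be compared.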

Analysis on the Distribution and Evolution of Carbon Reserves in NMA.
Figure 4 shows the spatial distribution of carbon reserves in NMA based on the InVEST model. It can be seen from the figure that rivers, lakes, and urban built-up areas are carbon reserves depressions. Taken together, the overall carbon reserves of the study area decreased obviously, which reflects the trend of outward expansion centered on the built-up areas of the various administrative regions. Table 3 shows the statistics of carbon reserves evolution of each level in NMA. It can be found that the region with the lowest carbon reserves grade expanded significantly, from 9.52% in 2000 to 13.85% in 2020. The area proportions of the two middle carbon reserves levels both decreased, while the proportion of the region with the highest grade increased slightly, from 9.54% in 2000 to 10.11%. The distribution of regional carbon reserves change values can be seen from Figure 5. In combination with Figure 5 and Table 3, the change rate of carbon reserves was faster in the first decade (2000-2010) than in the second decade (2010-2020). The first decade corresponds to the so-called "golden decade" of China's urban development. During this period, China's major cities were in a period of rapid expansion, focusing on the development of urban scale [19,47]. The change in the second decade reflects that China has begun to pay attention to solving urban ecological problems and intends to control the speed of urban expansion [48]. The characteristics of urban development have changed from extensive expansion to refined urban details and infrastructure construction.
In summary, urban development has reduced regional carbon reserves, although planning in some regions has emphasized ecological restoration and has objectively improved their carbon reserves level. However, minor repairs in some regions cannot reverse the decline in urban carbon reserves. In addition, the pace of urbanization cannot be stopped, so it is impossible to promote regional carbon neutrality by limiting urban development. Therefore, it is urgent to analyze the metropolitan area and its subareas at all levels to obtain the leading factors influencing the spatial distribution of carbon reserves as support for planning, so that the spatial elements of regions at all levels can be reasonably planned and arranged to achieve the goal of carbon neutrality.
Figure 6 shows the comparison between the predicted and actual values of the simulation predictions, and the prediction accuracy is good overall. In summary, the overall performance and prediction ability of the model are good, which indicates that the RF model can be applied effectively in NMA. The importance ranking of factors obtained from the operation of the RF model is shown in Figure 7. It can be seen from the figure that the relative importance of the corresponding impact factors is not the same in each area, indicating that it is necessary to subdivide the metropolitan area and analyze its parts separately. The sixteen factors differ widely in importance. Therefore, the factors whose feature importance is less than 5% are not listed in this paper.
It can be seen from the figure that the importance of the factors, from large to small, is x16 > x02 > x14 > x11 in the whole area of NMA, x11 > x13 > x14 > x16 in the UCA, x16 > x14 > x11 > x13 in the UPA, and x16 ... in the UEA.
For NMA, LULC_Water bodies, slope, vegetation coverage, and LULC_Artificial surfaces are the main leading factors, of which LULC_Water bodies accounts for almost half (42.80%) and is the most important factor affecting the distribution of carbon reserves in NMA. NMA is located in the middle and lower reaches of the Yangtze River with abundant rainfall. There are a large number of lakes in the area, among which Xuanwu Lake and Gucheng Lake of Nanjing City, Gaoyou Lake of Yangzhou City, and Hongze Lake of Huai'an City have relatively large water surfaces, which directly affect the regional carbon reserves distribution. In urban areas, the vegetation coverage is better in the areas with larger slope, such as Mount Zijin, Mount Niushou, and Mount Lao, where the carbon sequestration is relatively higher.
For UCA, vegetation coverage, LULC_Cultivated lands, LULC_Artificial surfaces, and LULC_Water bodies are the leading factors, of which vegetation coverage has the largest effect (64.10%) and contributes more than half. The UCA has a high degree of urbanization and is the core functional area of NMA. Artificial surfaces account for the largest proportion, and human activities have the most profound impact on the surface cover. The most important source of carbon reserves in this area is vegetation. Compared with forests, the vegetation coverage derived from NDVI can better reflect the regional vegetation cover and has stronger explanatory power for the regional carbon reserves distribution.
For UPA, LULC_Water bodies, LULC_Artificial surfaces, vegetation coverage, and LULC_Cultivated lands are the leading factors that dominate the distribution of carbon reserves in the region, of which LULC_Water bodies, LULC_Artificial surfaces, and vegetation coverage have similar explanatory power (28.30%, 20.60%, and 20.30%), with a combined contribution exceeding half. A large number of lakes are distributed in the UPA, and the urban development in the region cannot be underestimated. At the same time, the forest coverage is better than that of UCA. Therefore, its contribution to the distribution of carbon reserves is higher than in UCA. For UEA, LULC_Water bodies, slope, and LULC_Artificial surfaces are the leading factors affecting the spatial distribution of regional carbon reserves, of which the contribution of LULC_Water bodies is more than half (54.40%). The urbanization rate of UEA is the lowest in NMA, and the area of water bodies in the region is relatively large, including part of Hongze Lake. In addition, a large number of mountains with good vegetation coverage directly affect the regional carbon sequestration and carbon reserves distribution.
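The 5% cutoff and the "combined contribution" statements above are simple arithmetic on the importance values, which the following Python sketch reproduces for the UPA. The three UPA percentages are those quoted in the text; the fourth entry and the helper name `leading_factors` are hypothetical and included only to show a factor being dropped by the threshold.

```python
# Importance values (percent) for the UPA as quoted in the text, plus one
# hypothetical minor factor to illustrate the 5% cutoff.
upa_importance = {
    "LULC_Water bodies": 28.30,
    "LULC_Artificial surfaces": 20.60,
    "vegetation coverage": 20.30,
    "minor factor (hypothetical)": 3.00,
}


def leading_factors(importance, threshold=5.0):
    """Keep factors at or above the threshold, sorted from large to small."""
    kept = {name: value for name, value in importance.items() if value >= threshold}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


ranked = leading_factors(upa_importance)

# Combined contribution of the three factors with similar explanatory power.
combined = round(sum(value for _, value in ranked[:3]), 2)
exceeds_half = combined > 50.0
```

Here `combined` comes to 69.2%, consistent with the statement that these three factors together account for more than half of the explained importance in the UPA.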

Limitations and Further Research.
This paper provides a new idea for the study of the factors influencing the spatial distribution of regional carbon reserves, and the index system adopted can provide a reference for the study of urban carbon reserves. This paper can help landscape and urban planners with planning, but it also has some limitations. First, the InVEST model can be used to evaluate ecosystem services at different scales, and the evaluation results can be displayed visually. However, the evaluation results of this model are greatly affected by the land use classification. For example, in this study, land use types are divided into six categories, and each land use type is assumed to be homogeneous internally, so the model lacks consideration of regional internal heterogeneity. In addition, the differences ... based on land use, factors other than land use may be overlooked in the analysis. Firstly, in future research, we can model different types of cities and evaluate them according to local conditions. Secondly, more cities can be taken as research objects, and the commonalities and differences between categories can be analyzed at the same time, thus proposing a more universal evaluation index system, identifying the problems common to cities according to the regression analysis results, and proposing a more universal planning strategy. Thirdly, we can include influencing factors other than land use in follow-up research, so as to carry out further analysis and discussion. At the same time, as a holistic and systematic project, carbon reserves research should be actively integrated into related disciplines to promote overall cooperation among different disciplines.

Conclusion
In this study, the Nanjing metropolitan area (NMA) was taken as the study area, the Carbon module of the InVEST model was used to calculate the spatial distribution of regional carbon reserves, and the evolution of the carbon reserves distribution over the past 20 years was analyzed. Then, based on the random forest (RF) model and taking the whole study area and subregions at all levels as the research scope, a regression model of each influencing factor and carbon reserves was established, and the leading factors of the spatial distribution of carbon reserves in NMA were derived. The results show that the overall carbon reserves in the study area are in a downward trend. The leading factors of the whole study area are LULC_Water bodies, slope, vegetation coverage, and LULC_Artificial surfaces, among which the influence of LULC_Water bodies is the most obvious. The leading factors of the urban core area are vegetation coverage, LULC_Cultivated lands, LULC_Artificial surfaces, and LULC_Water bodies, among which vegetation coverage contributes the most. The leading factors of the urban planning area include LULC_Water bodies, LULC_Artificial surfaces, vegetation coverage, and LULC_Cultivated lands, among which the explanatory power of LULC_Water bodies, LULC_Artificial surfaces, and vegetation coverage is similar, and their combined contribution is more than half. The leading factors of the urban expansion area include LULC_Water bodies, slope, and LULC_Artificial surfaces, among which LULC_Water bodies is the most important one.
Based on the results of the RF model, we can get the order of the feature importance of the factors, thus providing a basis for decision-making. According to the research results and the above analysis, we can propose planning directions for NMA and for each of its subareas, especially in the context of the increasingly prosperous carbon trading market. On the question of how to improve regional carbon reserves more efficiently in urban planning, the RF model has more room to play its role.
Data Availability
The dataset can be accessed upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.