Refining National Forest Cover Data Based on Fusion Optical Satellite Imageries in Indonesia

Precision mapping of tropical forest cover is critical for addressing the global climate crisis, for example through land-based carbon measurement and the identification of potential conservation areas. Over the past decade, access to open public forestry datasets has increased rapidly. However, the availability of finer-resolution forest cover data is still very limited. As a developing country with extensive rainforests, Indonesia has suffered multifaceted threats, particularly deforestation. Thus, precise forest cover data can help fulfill Indonesia’s nationally determined contribution to climate change. In this study, we mapped national forest cover for Indonesia using a new object-based image classification approach based on combined Planet-NICFI and Sentinel-2 optical imageries. Our results had relatively high accuracy compared with other studies, with the F score ranging from 0.67 to 0.99, and can capture fragmented forest at fine resolution (i.e., ∼5 m). In addition, we found that Planet-NICFI bands contributed more to predicting forest cover than Sentinel-2 imageries. Further analyses using the forest cover data should be performed to support national and global agendas, e.g., those related to the FOLU net sink in 2030 and the Global Biodiversity Framework.


Introduction
For the last few decades, ecosystem services have been a main issue in international nature conservation and rural development [1], and they remain a concern as the exploitation of natural resources, human-induced land use change, and global greenhouse gas emissions continue at a high rate [2]. Forests are not only affected by human activity but also play an important role in mitigating threats such as landslides, floods, and loss of biodiversity [3]. Tropical rainforests, in particular, are known for their richness and contribution to the earth's land-based ecosystem. Indonesia is one of the countries with relatively extensive forests, accounting for 39% of Southeast Asia's forest extent [4]. Depending on the altitude and regional climate, these forests range from lowland to mountainous forests. Each of these forest types contributes significantly to the ecosystem services that humans rely on, such as raw materials, reservoirs of biodiversity, soil protection, sources of timber, biomedicines, carbon sequestration, and climate and water regulation [5][6][7]. Indonesian tropical forests also play a critical role in the livelihood of local communities and the national economy [8].
However, deforestation has been one of the main issues in the climate and biodiversity crises. The negative environmental consequences of tropical deforestation are far-reaching and long-lasting [9]. The rapid deforestation rate has contributed to biodiversity losses due to habitat degradation and fragmentation, particularly in Indonesia [10]. The study by Margono et al. [11] suggests that forest loss in Indonesia was among the highest rates of primary forest loss in the tropics for the period 2002-2012, with annual primary forest cover loss in 2012 being the highest, totaling 0.84 Mha, more than the official forest loss report of Brazil (0.46 Mha). During the same period in other tropical rainforest countries, Mexico lost 0.28 Mha and Colombia lost 0.69 Mha of primary forest cover [8]. Indonesia, as a developing country, still struggles with infrastructure development, which puts the forest and all the ecosystem services it provides at risk [12]. Many policies drive investment in Indonesia to support economic growth in the form of infrastructure and land-based permits that directly threaten forest cover. In Kalimantan and Sumatra, the amount of foreign investment in infrastructure and extractive industries is five times greater than international funding for forest conservation schemes [13]. Barri et al. [14] found that 50% of total deforestation (5.72 million hectares) in 2013-2017 occurred in logging concessions, timber and oil palm plantations, and mining. Numerous other studies have also reported factors that contribute to deforestation in Indonesia, such as road development [15], agricultural expansion [16], wildfires [17], and illegal logging and encroachment [18].
Halting deforestation and retaining the intactness of the forest ecosystem is a prevalent challenge in climate change mitigation [19], and progress may be evaluated through reliable assessment of carbon storage based on accurate mapping of forest types. Furthermore, the spatially explicit mapping of forest cover is critical for carbon stock estimation [20], wildfire behavior simulation [21], and wildlife habitat modeling [22]. In this regard, precise and reliable forest cover mapping will support monitoring and can also be used as input for policymaking related to sustainable forest management.
With rising satellite availability and image resolutions, remote sensing data archives are continuously growing, enabling users to access and analyze enormous time-series datasets. Remote sensing has become popular as a valuable tool for monitoring land cover, and it also works well for forest cover identification. Many previous studies have shown that remote sensing data can predict forest and other land cover types with excellent accuracy [23][24][25][26][27]. In addition, combining two or more sensors can improve the model's performance in depicting forest cover data [28][29][30].
The methods for identifying forest cover in Indonesia grew rapidly from 1995 until recent years. According to [31], the map of Indonesia's current land cover and land use was created using visual interpretation based on medium-resolution imageries (i.e., Landsat). The accuracy of the forest cover classes is reported to be high (>90%), based on field verification and the operators' local knowledge. However, visual interpretation methods were relatively time-consuming, and the use of numerous interpreters over space and time compromises the consistency of the output map product [11]. Margono et al. [24] conducted a study on forest cover identification using a pixel-based method. Machine learning (ML) algorithms (e.g., random forest, support vector machine, and regression trees) typically produce better results than conventional classifiers since they do not require preconceptions regarding the distribution of the input data [32]. Machine learning is a subfield of artificial intelligence concerned with the development and investigation of systems that can learn from data. There are three approaches to machine learning: supervised learning, semisupervised learning, and unsupervised learning. A machine learning system could, for instance, be trained on images to learn to differentiate between forest and nonforest images. After learning, it can then be used to classify new images into forest and nonforest objects. The random forest algorithm is a classification method that uses multiple random subsets of data and features to produce multiple decision trees. A random forest classifier (RF) is a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [32].
Currently, the need for forest cover data with very high spatial resolution is increasing to support monitoring, reporting, and decision-making [33]. Nevertheless, currently available precise forest cover data are very limited, e.g., Global Forest Change (∼30 meters; [34]), PALSAR forest (25 meters; [35]), and Indonesia's primary forest cover (30 meters; [24]). By mapping the presence of forests in Indonesia, a consistent forest distribution and area can be obtained, which can then be used as a base map and as a reference for management and information in Indonesia. This is because the maps use inputs that are specific to conditions in Indonesia, so the resulting maps are more specific than globally processed forest maps. Forest mapping is essential because it can be used to support preservation programs, such as efforts to protect and preserve biodiversity. In fragmented landscapes, the area and distribution of forests influence how well biodiversity can be sustained. Fragmented and isolated forest sections vary greatly in ecology and composition and may not support the same level of biodiversity or ecosystem function as forests of the same size within large forest systems [36]. Mapping of the forest in Indonesia also plays an essential role in forest management. The spatial and temporal variation in primary forest loss documents the continuing appropriation of natural forests within Indonesia, including the increasing loss of primary forests in wetlands and in land uses meant to limit or prohibit clearing, with implications for accurate greenhouse gas emissions estimation.
In this research, a random forest classifier (RF) is used to classify forest cover using an object-based image classification approach. The primary objective of this study is to demonstrate the simplicity of the random forest ensemble method and its efficacy in image classification. The ultimate objective is to achieve the highest possible classification accuracy by combining high-quality image data acquired by modern sensors (Sentinel-2 and Planet) with a mathematically robust classifier, the random forest. Results from this study highlight the importance of spatially and temporally explicit data in bringing transparency to an important land use dynamic. Here we present a refined forest cover dataset at the national level in Indonesia with a spatial resolution of ∼5 meters based on spectral combinations from Sentinel-2A and Planet-NICFI imageries using the random forest algorithm. We also evaluated our data using reference points to assess model performance and compared the forest cover data with another forest cover dataset. In addition, we explored current forest cover dynamics in the 2017-2021 period to improve national forest monitoring in Indonesia.

Study Area.
This study was conducted in Indonesia (Figure 1), a tropical country that harbors various forest ecosystems (e.g., dryland forest, swamp forest, and mangrove forest) with a total area of about 189.1 million ha [11]. Local communities have significant connectivity with the forest ecosystems [37]; therefore, obtaining more precise forest cover data is crucial in Indonesia.

Training Data.
We used binary forest cover information (i.e., forest and nonforest) as the response variable. We conducted a visual interpretation of forest cover using very high-resolution satellite images (e.g., Planet-NICFI and ESRI World Basemap) in 45 selected plots (1 × 1 degree) to capture the various forest ecosystems that occur across Indonesia (Figure 1). We also compiled training data from the field surveys and secondary sources of the Ministry of Environment and Forestry (MoEF) in 2017 and 2021. Afterward, we collected 45,119 and 38,886 points for forest and nonforest information, respectively, using homogenous purposive random sampling as described in [38].

Data Preprocessing.
Sentinel-2 is a high-resolution, multispectral sensor developed by the European Space Agency (ESA) to support Copernicus Land Monitoring research (https://sentinels.copernicus.eu/web/sentinel/home). In collaboration with Planet, Norway's International Climate and Forests Initiative (NICFI) provides very high-resolution imagery to support tropical forest monitoring, cope with global climate change, retain biodiversity, and facilitate sustainable development (https://www.planet.com/nicfi/). This study used combined optical sensors of the harmonized Sentinel-2 multispectral instrument (MSI) Level-2A (spatial resolution: ∼10 to 20 meters, except B1 with 60 meters resolution) and Planet-NICFI (spatial resolution: 4.77 meters) based on surface reflectance data to identify forest cover in the study area.
Cloud cover is the most common obstacle in using optical remotely sensed data to retrieve land cover information, particularly in the tropical region [39]. We used the Sentinel-2 quality assurance cloud mask to eliminate cloud pixels from the spectral reflectance data. We then applied a median filter over a yearly time window for each time frame of analysis (i.e., 2017 and 2021) to obtain nearly cloud-free images [40].
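The cloud masking and yearly median compositing described above can be illustrated with a minimal, platform-independent sketch. It is not the Google Earth Engine workflow used in the study; the function name `median_composite`, the toy reflectance values, and the boolean cloud flags are all hypothetical and serve only to show the per-pixel logic.

```python
from statistics import median


def median_composite(observations, cloud_flags):
    """Per-pixel median of the cloud-free values in a time window.

    observations: list of acquisitions, each a list of reflectance values
                  (one value per pixel).
    cloud_flags:  same shape; True where the pixel is flagged as cloud.
    Returns one composite value per pixel, or None if the pixel was
    never observed cloud-free.
    """
    n_pixels = len(observations[0])
    composite = []
    for p in range(n_pixels):
        # Keep only the acquisitions in which this pixel is not cloudy.
        clear = [img[p] for img, flags in zip(observations, cloud_flags)
                 if not flags[p]]
        composite.append(median(clear) if clear else None)
    return composite


# Hypothetical example: three acquisitions of a four-pixel scene.
scenes = [
    [0.04, 0.30, 0.05, 0.90],
    [0.05, 0.85, 0.06, 0.88],
    [0.06, 0.32, 0.07, 0.92],
]
clouds = [
    [False, False, False, True],
    [False, True, False, True],
    [False, False, False, True],
]
composite = median_composite(scenes, clouds)
```

In this toy scene the last pixel is cloudy in every acquisition, so the composite has no value there, which is why yearly composites in persistently cloudy regions are only nearly cloud-free.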

Data Covariates.
To predict forest cover, we used 20 variables retrieved from Sentinel-2 Level-2A and Planet-NICFI imageries as the model predictors. The covariates consist of reflectance and spectral indices from both sensors. Table 1 shows the details of the predictors used in this study.
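As an illustration of how a spectral index is derived from band reflectance, the sketch below computes a generic normalized difference index. The specific indices used in this study are those listed in Table 1; the NDVI-style red and near-infrared pairing and the reflectance values here are assumptions chosen only for the example.

```python
def normalized_difference(band_a, band_b):
    """Generic normalized difference index: (a - b) / (a + b).

    Returns None when the denominator is zero (e.g., a no-data pixel).
    """
    total = band_a + band_b
    if total == 0:
        return None
    return (band_a - band_b) / total


# Hypothetical surface reflectance for a dense-canopy pixel:
# near infrared is high and red is low, so the index is close to 1.
nir, red = 0.45, 0.05
ndvi_like = normalized_difference(nir, red)
```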

Forest Cover Prediction.
In this study, we ran a random forest algorithm with different parameterizations of the number of trees (N): N = 50, N = 100, N = 500, and N = 1000 to produce forest and nonforest categories, following Condro et al. [40]. All preprocessing and forest classification were performed using Google Earth Engine. Google Earth Engine is a cloud-based geospatial analysis platform that delivers massive computing capabilities to address a variety of high-impact societal issues including deforestation, drought, disaster, disease, food security, water management, climate monitoring, and environmental protection [45]. In this study, we used the Google Earth Engine platform to generate good-quality satellite imageries and perform the machine learning classification to predict the forest cover maps across Indonesia. The preprocessing consists of geometric, radiometric, and spectral corrections. We used reflectance of Sentinel top of atmosphere (TOA) with 2-level data where the data were already georegistered with a root mean square error (RMSE) of less than 10 m, which represented the best quality imagery available for the collected data. Cloud masking using the Quality Assessment band (BQA) and per-pixel filter statistics (i.e., median) over the period of Sentinel-2 TOA reflectance imageries were performed within Google Earth Engine before further analysis. To identify the national commodity cover in Indonesia, i.e., oil palm, rubber, coffee, cacao, and rice paddies, we used machine learning classification through the random forest (RF) algorithm. A random forest classifier provides an ensemble model that effectively distinguishes spectrally similar agricultural land and forest cover by generating multiple trees from training data and its predictors [45][46][47]. Many studies have investigated the performance of the RF algorithm in identifying land cover from hyperspectral and multispectral imageries, as well as digital elevation model data [48][49][50].
The number of variables per split was defined as the square root of the number of features. This cloud computing platform is very useful for big-data analysis, particularly for the planetary scale of remotely sensed data [45].
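The three ingredients of a random forest mentioned above (bootstrap samples of the training data, a random subset of features considered at each split, and a vote across trees) can be sketched in a few lines of plain Python. This is not the Google Earth Engine classifier used in the study; all function names are hypothetical, and rounding the square root down is an assumption, since the text does not state how a non-integer square root is handled.

```python
import math
import random
from collections import Counter


def variables_per_split(n_features):
    # Square root of the number of features.
    # Assumption: round down, with a minimum of one feature.
    return max(1, math.isqrt(n_features))


def bootstrap_sample(rows, rng):
    # Draw len(rows) training rows with replacement (one sample per tree).
    return [rng.choice(rows) for _ in rows]


def candidate_features(n_features, rng):
    # Random subset of feature indices considered at one split.
    return rng.sample(range(n_features), variables_per_split(n_features))


def majority_vote(tree_predictions):
    # The forest's prediction is the most common class among the trees.
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)
m = variables_per_split(20)            # 20 predictors, as in this study
features = candidate_features(20, rng)
sample = bootstrap_sample(list(range(10)), rng)
vote = majority_vote(["forest", "nonforest", "forest", "forest", "nonforest"])
```

With 20 predictors, this rounding convention gives four candidate variables per split; a larger number of trees N only changes how many such votes are aggregated.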

Model Evaluation.
We used the confusion matrix approach to assess the model performance by comparing the forest cover model with the testing reference data [51]. This matrix was applied to calculate discrimination metrics, i.e., overall accuracy (OA) and F score [52]. In this study, the Kappa coefficient was not considered a reliable metric because previous studies have shown the flaws of this metric [53]. We performed k-fold cross-validation (k = 5) to partition the data (i.e., training and testing) for model evaluation [54]. Finally, we performed the model evaluation within different regions (i.e., Sumatra, Kalimantan, the Lesser Sunda, Sulawesi, Maluku, and Papua) to capture the variance of the accuracy.
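A minimal sketch of the evaluation metrics follows, assuming the standard definitions: OA is the share of correctly classified points, and the F score is the harmonic mean of precision and recall for the forest class. The labels are invented for illustration, and the fold assignment is a simple interleaving without shuffling, which is a simplification rather than the partitioning used in the study.

```python
def confusion_counts(reference, predicted, positive="forest"):
    """Return (tp, fp, fn, tn) for a binary forest/nonforest comparison."""
    tp = fp = fn = tn = 0
    for ref, pred in zip(reference, predicted):
        if pred == positive and ref == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif ref == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def overall_accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def f_score(tp, fp, fn):
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * tp / (2 * tp + fp + fn)


def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical reference and predicted labels for ten test points.
reference = ["forest"] * 4 + ["nonforest"] * 6
predicted = ["forest"] * 3 + ["nonforest"] * 6 + ["forest"]
tp, fp, fn, tn = confusion_counts(reference, predicted)
oa = overall_accuracy(tp, fp, fn, tn)
f1 = f_score(tp, fp, fn)
splits = list(k_fold_splits(10, k=5))
```

Because the F score ignores true negatives, it can differ from OA when one class dominates the reference data.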
In addition, we explored the variable importance of forest cover data based on the fused spectral features using the mean decrease in Gini (MDG). The MDG quantifies each variable's contribution to the nodes' homogeneity [55,56]. We also evaluated the spectral characteristics of forest cover areas through two different sensors (i.e., Sentinel-2 and Planet-NICFI).
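The quantity behind the MDG can be sketched as follows, assuming the usual definition of Gini impurity (one minus the sum of squared class proportions) and of the impurity decrease produced by a single split. A variable's MDG is then obtained by accumulating these decreases over all nodes that split on that variable and averaging across trees; the labels below are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity
    of its two child nodes."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Hypothetical node with four forest (F) and four nonforest (N) points.
parent = ["F"] * 4 + ["N"] * 4
weak_split = gini_decrease(parent, ["F", "F", "F", "N"], ["F", "N", "N", "N"])
pure_split = gini_decrease(parent, ["F"] * 4, ["N"] * 4)
```

A split that separates the two classes completely removes all of the parent's impurity, so variables that often produce such splits receive the highest importance.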

Model Evaluation.
This study found that random forest performs relatively well in estimating forest cover across Indonesia, with the OA and F score ranging from 0.69 to 0.99 and 0.67 to 0.99, respectively. The random forest algorithm with N = 1000 outperformed the other parameterizations, with the F score ranging from 0.89 to 0.99 and OA ranging from 0.92 to 0.99. The discrimination metrics obtained from the various parameterizations of the random forest algorithm are compared in Figure 2.
Our results showed that Planet-NICFI images contributed more (79.7%) to forest cover identification than Sentinel-2 (20.3%). The predictor with the highest relative contribution to the model was the red band of Planet-NICFI (53.8%). The green band of Planet-NICFI imageries had the [...] (Figure 3). Our study provided an exploratory analysis of spectral imageries to differentiate the forest and nonforest classes. Sentinel-2 captured object reflectance over a wider range of wavelengths (∼450 nm to 2200 nm) than the Planet-NICFI dataset (∼450 nm to 900 nm). This study found lower reflectance for forest than for nonforest in the blue to red channels for both sensors. On the other hand, we found higher reflectance for forest cover than for nonforest in the near infrared to shortwave infrared bands (Figure 4).
Our study indicated that the forest cover had a relatively similar spectral distribution to the nonforest category because other vegetated areas were classified as nonforest. Hence, we also tested some spectral indices from both sensors to characterize the forest cover within the study area. [...] (Figure 5). We found that Papua Province had the largest forest cover area in 2017 (∼22.9 million ha) and 2021 (∼20.7 million ha) compared with the other provinces. In recent years, most of the eastern Indonesia provinces still have relatively high forest cover. On the other hand, DKI Jakarta was the province with the least forest, covering only 0.4% of the total area. The Java region had the lowest forest cover, at about 20% of the total area. The forest extents for each province in 2017 and 2021 are shown in Table 2.

Discussion
The pixel quality of satellite imagery is crucial for land resource identification, particularly for forest cover prediction [57]. Cloud coverage is one of the obstacles most often found in optical satellite imageries, such as Sentinel-2 and Planet-NICFI [58]. Although Sentinel-2 had a higher spectral resolution than Planet-NICFI, we found more noise effects due to cloud cover in it. Clouds can obscure important information about the objects beneath the cloud-covered area [59]. Therefore, our findings indicate that the model covariates used to identify forest cover had relatively good pixel quality.
This study found that the OA for our data ranged from 0.69 to 0.99. By comparison, Margono et al. [24] conducted primary forest cover identification in Indonesia from 2000 to 2012, with OA ranging from 0.7 to 0.91. In addition, [29] integrated three different satellite sensors to produce a land cover dataset using Google Earth Engine, with OA ranging from 0.67 to 0.82. Moreover, [60] identified forest cover and obtained OA ranging from 0.73 to 0.77 for the pixel-based method and 0.8 to 0.84 for the visual interpretation method.

International Journal of Forestry Research
The difference in spatial resolution among satellite imageries also causes the captured forest cover to vary. By using images with very high spatial resolution (≤5 m), the captured forest cover data, as well as fragmentation information, become more precise. On the other hand, missed detection can lead to the false conclusion that a natural ecosystem is intact when in fact it has undergone a high level of disturbance, e.g., fragmentation [61]. Forest degradation and fragmentation can lead to loss of biodiversity due to the missing connectivity and to a reduction of water quality [62].
In Figure 6, we chose a case study in the Bogor botanical garden, one of the ex-situ conservation locations in Bogor city, to compare our analysis with other data that have a lower spatial resolution. Figure 6 shows that the CCI (Figure 6 (B), 300 m spatial resolution) and PALSAR (Figure 6 (D), 25 m spatial resolution) data could not capture the fragmented forest areas in the area. Meanwhile, the Global Forest Change (Figure 6 (C), 30 m spatial resolution) and the Dynamic World (Figure 6 (E), 10 m spatial resolution) data can capture the fragmented forest, but not completely. The comparison conducted by Boyle et al. [63] also showed that the Global Forest Change data only correctly detect 70.8% of forest fragments with an area >30 m, while very high-resolution images (IKONOS, with a spatial resolution of 6 m) can precisely detect 100% of forest fragments with an area of >6 m. In this study, we also found that Planet-NICFI made a more significant contribution than Sentinel-2 imageries, based on its variable importance, in depicting forest cover in the tropical region. Previous studies also found that Planet-NICFI data provided better outputs than other optical imageries in predicting forest cover [29,64]. Variable selection is one of the methods for solving multicollinearity, and it also has the benefit of being simple to perform and resulting in a sparse model [65]. However, findings from Chan [65] show that variable selection drops variables and reduces information gain, while the multicollinearity measures to optimize are subjective. This study used machine learning approaches to predict forest cover based on combined optical satellites to improve model performance. Feng et al. [66] found that deleting highly correlated variables had no effect on model performance or prediction accuracy, owing to machine learning's capacity to control model complexity by downplaying the significance of redundant variables.
Apart from the advantages of the data (e.g., precise spatial resolution), we also found some caveats within our dataset. The use of our data is limited in some instances for several reasons. Due to the large amount of input data, we aggregated various forest types in the tropical region of Indonesia, so the data cannot capture forest diversity and represent only general forest cover. The dynamics of forest cover changes can be seen more clearly with a higher temporal resolution, and forest cover data with a higher temporal resolution are much better suited to systematic forest cover change analysis [67]. Unfortunately, the data we present have a relatively low temporal resolution, i.e., annual, which makes the seasonal dynamics of forest cover impossible to capture.

Conclusions
Our findings provide useful information regarding the detailed spatial resolution of the forest cover dataset at the national level of Indonesia. The random forest algorithm performed excellently in capturing tropical forest cover based on optical satellite imageries, with an overall agreement between 92% and 99%. Our data can deliver better precision and detail in depicting forest patches within small-scale areas than other data. To achieve better sustainable forest management, stakeholders need a precise dataset on forest resources. This information can be useful for forest monitoring and planning, particularly for the national agenda related to the forest and other land uses as net carbon sinks in Indonesia. Further work should incorporate precision forest cover data into carbon dynamics, conservation management, and spatial planning. Future studies should also explore other prediction techniques (e.g., deep learning and neuromorphic computing).

Data Availability
The forest cover data that support the findings of this study are available in Zenodo at https://zenodo.org/record/7115068#.YzKnb7TP2Uk.

Conflicts of Interest
The authors declare that there are no conflicts of interest.