Impact of Tree Cover Loss on Carbon Emission: A Learning-Based Analysis

Describing the processes that lead to deforestation is essential for the development and implementation of forest policies. In this work, two learning models were developed in order to identify the better model for assessing deforestation causes and trends. We developed an autoregressive integrated moving average (ARIMA) model and a long short-term memory (LSTM) model independently in order to examine the trend between tree cover loss and carbon dioxide emission. The study uses twenty years of data for Pakistan on tree cover loss and carbon emission from the Global Forest Watch (GFW) platform, a well-known source of numerical forest data. The minimum mean absolute error (MAE) for the prediction of tree cover loss and carbon emission obtained with the ARIMA model is 0.89 and 0.95, respectively. The minimum MAE given by the LSTM model is 0.33 and 0.43, respectively. To the authors' knowledge, no previous study has examined the increase in carbon emission due to tree cover loss specifically in Pakistan. The results indicate that tree cover loss is one of the main causes of the increase in environmental pollution in terms of carbon emission.


Introduction
Global warming is a burning issue causing catastrophic changes and calamities around the world. The increasing occurrence of climate dissipation has been noticed with current global warming, which has had several biophysical impacts worldwide [1]. One of the main causes of global warming is carbon emission. Sources of carbon emission are both natural and human. Human sources include a variety of man-made actions such as tree logging, forest fires, burning of fossil fuels, cement production, natural gas production, and so on [2,3].
Hence, the degradation of forests, which play an essential role in maintaining balance in the ecosystem, directly adds to global warming. It results in rapid environmental degradation, not only leading to a scarcity of natural resources, a decline in quality of life, and long-term public health issues but also inflicting economic losses [4]. Therefore, controlling deforestation can significantly minimize carbon emissions and help improve the environment. Recent studies revealed that the deforestation rate increases with extreme drought and wet years. In another study [5], the authors examined carbon dioxide emission and climatic effects on major agricultural crop production in Pakistan. The results revealed that crops such as wheat, maize, sugarcane, cotton, and so on have a constructive association with carbon dioxide emissions. Combating deforestation is being evaluated as a cost-effective means of reducing greenhouse gas emissions [6].
Due to the negative impact of carbon emissions on the environment, there is a lot of ongoing research to find efficient methods that can predict carbon emissions and determine their causes [7][8][9][10]. Learning-based techniques have provided new approaches to prediction problems that represent interactions between variables in a deep and layered hierarchy. Machine learning (ML)-based techniques like support vector machines (SVMs) and random forest (RF), as well as deep learning (DL)-based algorithms like recurrent neural network (RNN) and LSTM, have attracted a lot of attention in recent years because of their applications in a variety of fields [11][12][13][14]. In time series forecasting, DL approaches are capable of identifying data structure and patterns, such as non-linearity and complexity [15,16]. LSTM has been extensively utilized in time series prediction [17][18][19][20][21]. Autoregressive integrated moving average (ARIMA) is another forecasting model [22] that predicts future values based on past values. ARIMA is the best model for one-step out-of-sample forecasting and is well suited to data with linear and short-term dependency (weekly or hourly) [23].
The rapid advancement in the Internet of Things (IoT) would be a future enhancement of this system. There are multiple privacy-related challenges in IoT architecture during communication that can be addressed using blockchain-enabled IoT architecture [24]. Moreover, there is a need for a decentralized auction-based resource allocation mechanism in edge computing-enabled IoT, which would help bring computing resources closer to the devices [25]. To improve data quality during data communication, a few studies have focused on the age of information (AOI) from the perspective of game theory [26]. Game theory is a useful tool for optimizing wireless networks by assisting the allocation of scarce wireless resources, e.g., bandwidth and channels.
The objective of this study is to develop an accessible methodological approach that allows for rapid evaluation of statistical relationships and trends in forest monitoring data using both ML and DL. In this study, two independent learning models were developed using the autoregressive integrated moving average (ARIMA) model and long short-term memory (LSTM). Moreover, this study also focused on understanding the correlation between tree cover loss and carbon emission by taking Pakistan as a case study. Since no such comprehensive effort was found in the open literature by the authors, it is a novel attempt in this direction. The remaining part of the paper is arranged as follows. The methodology is discussed in Section 2. In Section 3, results are discussed. Section 4 describes the conclusion and future work.

Methodology
2.1. Data Collection.
Two time series datasets, i.e., tree cover loss and carbon emission in Pakistan, were taken from the GFW platform [27]. Each dataset consists of the past 20 years' information, from 2001 to 2020, on carbon values added to the environment and on tree cover loss. It should be noted that a more organized dataset comprising other variables of consideration for this study is not available in the open literature. Figure 1 shows the proposed methodology of the framework. Following the collection of the dataset, the preprocessing stage is carried out to make the data stationary. The dataset is then divided into training and testing portions: the training portion is used to train the models, and the testing portion is used for evaluation. The MAE and MAPE error metrics are computed for the final analysis of the results.
Figure 2 shows the preprocessing framework. First, we deal with all missing values in the dataset. For time series forecasting with a statistical model, stationarity is a mandatory property. A series is called stationary if its statistical properties do not change with time. To verify this property in our dataset, we used the augmented Dickey-Fuller (ADF) test. After applying the ADF test, it was found that our dataset taken from the GFW platform [27] does not fulfill this condition.

Data Preprocessing.
In order to make our dataset stationary, we performed a series of transformations, such as power log transformation and differencing, before applying the ADF test again for verification. Figure 1 illustrates our adopted methodology in the form of a flowchart. After preprocessing, the dataset is divided into training and test subsets. The training set is used to train the prediction model and the test set is used to evaluate it. The split between the training and test data is kept at 70% and 30%, respectively.
For forecasting, the ARIMA and LSTM models are used to make predictions and are applied on both datasets.
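The preprocessing and split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the synthetic series are hypothetical, only a log transformation followed by first-order differencing is shown, and the split is assumed to be chronological (no shuffling), as is usual for time series.

```python
import math


def log_difference(series, order=1):
    """Apply a natural-log transform, then difference `order` times."""
    values = [math.log(v) for v in series]  # requires strictly positive values
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def chronological_split(series, train_percent=70):
    """Split without shuffling, so the test portion is strictly later in time."""
    cut = len(series) * train_percent // 100
    return series[:cut], series[cut:]


# Hypothetical annual values (NOT the GFW data): one value per year, 2001-2020.
annual_loss = [100.0 * 1.05 ** i for i in range(20)]

stationary = log_difference(annual_loss, order=1)
train, test = chronological_split(annual_loss, train_percent=70)
print(len(stationary), len(train), len(test))
```

Each round of differencing shortens the series by one point, which matters for a series of only 20 annual observations.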

ARIMA and LSTM Models.
The ARIMA model is a generalization of the simpler autoregressive moving average model that incorporates the concept of integration. The ARIMA model parameters are as follows: (i) p: lag order (number of previous values used). (ii) d: degree of differencing. (iii) q: order of the moving average. An ARIMA model is a time series forecasting model that combines the properties of two models, autoregression and moving average: the "autoregression" part regresses the series on its lagged (previous) values, while the "moving average" part uses lagged forecast errors. "Differencing" is applied to make the time series stationary (the differenced series is also known as the integrated stationary version of the time series).
The first step is to determine appropriate hyperparameters of ARIMA, p, d, and q, so that the behavior of the time series can be predicted accurately. Then, a model with these hyperparameters is fitted to the training data. Finally, the model-fitting residuals are analyzed to check whether the model assumptions are satisfied [6].
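To make the roles of differencing and autoregression concrete, the sketch below fits a deliberately simplified model, ARIMA(1, 1, 0), by ordinary least squares in plain Python. It is an illustration only: it omits the moving average term (q = 0), the function names are hypothetical, and it is not the estimation procedure used in this study, which would normally rely on a statistical library.

```python
def fit_ar1(diffs):
    """Least-squares estimate of c and phi in diff[t] = c + phi * diff[t-1]."""
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    if var_x == 0:
        # Constant differences: no slope can be estimated, fall back to the mean.
        return mean_y, 0.0
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = cov_xy / var_x
    return mean_y - phi * mean_x, phi


def forecast_arima_110(history, steps):
    """Forecast `steps` future values with an AR(1) model on first differences."""
    diffs = [b - a for a, b in zip(history, history[1:])]  # d = 1
    c, phi = fit_ar1(diffs)                                # p = 1
    level, last_diff, forecasts = history[-1], diffs[-1], []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        level = level + last_diff  # undo the differencing (the "integrated" step)
        forecasts.append(level)
    return forecasts


# Hypothetical series with a constant yearly increase of 2 (not the GFW data).
history = [2 * i + 1 for i in range(10)]
print(forecast_arima_110(history, steps=3))
```

For a perfectly linear series, the differences are constant, so the forecast simply continues the same yearly increase.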
This study also utilizes the LSTM model, which is a special type of RNN and is able to deal with long-term time dependencies [28]. There are many types of LSTM models, each suited to a specific type of time series forecasting problem. In a univariate LSTM, a single series of observations is required to learn from the past values. On the contrary, a multivariate LSTM makes use of two or more parallel time series to learn from the past observations. The basic architecture of the univariate LSTM is shown in Figure 3. A basic LSTM network consists of cells that store the data. These cells resemble a transport line that connects one module to another, conveying data from the past and gathering the present values [29]. For the LSTM, a three-layer architecture is proposed in this study with a dropout probability of 0.3 and zero non-trainable parameters. The lagged values of the time series are used to predict the future value, with 40,901 learnable parameters.
The LSTM is applied to both variables, carbon emission and tree cover loss, with the same layers and parameters, and the model for each dataset is trained for 50 epochs. Two separate univariate LSTM models are applied to the two time series variables, i.e., tree cover loss and carbon emission. The models are not only validated based on the difference between observed and predicted values, also known as residuals, but also exploited for future prediction of the association of tree cover loss with carbon emission for the next three years with an upper confidence level of 80% and a lower confidence level of 90%.
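The reported total of 40,901 learnable parameters can be checked with the standard LSTM parameter formula. The layer sizes are not stated in the text, so the configuration below is an assumption: one LSTM layer of 100 units on a single input feature, a dropout layer (which has no parameters), and a one-unit dense output layer. This is one configuration that reproduces the reported count, not necessarily the one used.

```python
def lstm_param_count(units, input_features):
    """Trainable parameters of one LSTM layer: 4 gates, each with input
    weights, recurrent weights, and a bias vector."""
    return 4 * units * (input_features + units + 1)


def dense_param_count(inputs, outputs):
    """Trainable parameters of a fully connected layer (weights plus biases)."""
    return (inputs + 1) * outputs


units = 100  # ASSUMPTION: the number of LSTM units is not given in the text
lstm_params = lstm_param_count(units, input_features=1)
dense_params = dense_param_count(units, outputs=1)
total = lstm_params + dense_params  # dropout contributes no parameters
print(lstm_params, dense_params, total)  # 40800 101 40901
```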
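A univariate model of this kind learns from lagged values of a single series, so the series first has to be turned into (input window, next value) pairs. The helper below is a hypothetical sketch of that step; the name `make_windows` and the window length are illustrative, since the number of lags used in the study is not stated.

```python
def make_windows(series, n_lags):
    """Turn a series into supervised pairs: each input holds `n_lags`
    consecutive values and the target is the value that follows them."""
    inputs, targets = [], []
    for start in range(len(series) - n_lags):
        inputs.append(series[start:start + n_lags])
        targets.append(series[start + n_lags])
    return inputs, targets


# Toy series for illustration only (not the GFW data).
inputs, targets = make_windows([1, 2, 3, 4, 5], n_lags=2)
print(inputs)   # [[1, 2], [2, 3], [3, 4]]
print(targets)  # [3, 4, 5]
```

With only 20 annual observations, the window length directly limits the number of training pairs, which is one reason the choice of lag matters for small datasets.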

Stationary Time Series.
Being stationary is an essential condition in time series analysis. Most time series models assume that each point is independent of the others. To check this property in our time series, the ADF test is applied. Moreover, to make the time series stationary, differencing and log transformation are performed. The result of the ADF test is shown in Table 1. It can be noted that the P value is less than 0.05. This clearly indicates that after applying log transformation and differencing, we have obtained a stationary time series suitable for further prediction analysis.
1, 1, 1). Similarly, the minimum MAE for tree cover loss is 1.4 and 1.2 with ARIMA (1, 2, 1). Values of p, d, and q are cross-checked using the auto ARIMA function in Python. Figure 4 depicts the train (actual) and test (predicted) data, showing that the predictions follow some trends and are at the right scale. In Figure 5, data for the past twenty years (2001-2020) are utilized to train the model for prediction of tree cover loss for the next five years. The results indicate an increase in tree cover loss in the coming years. Figures 6 and 7 show the carbon emission results using ARIMA. The model is trained with multiple orders of ARIMA. The minimum MAE of 1.20 and mean absolute percentage error (MAPE) of 1.24 are obtained for carbon emission forecasting with ARIMA orders of (1, 1, 1) and (1, 2, 1), respectively. The twenty-year data, from 2001 to 2020, are used for the model training. The same model is further exploited to forecast carbon emission over the next five years. The results indicate an increase in carbon emission for this coming period.

Carbon Emission with ARIMA.
The two results also exhibit that an increase in carbon emission has a significant and similarly directed impact on tree cover loss.

Tree Cover Loss with LSTM.
LSTM has also been applied on the same dataset to compare the results with the ARIMA model for tree cover loss. Figures 8 and 9 demonstrate the results for this study. Training and testing loss decay per epoch is shown in Figure 8, while Figure 9 illustrates the tree cover loss prediction for the next three years, i.e., 2023-2025.
Figures 10 and 11 show the results of LSTM-based carbon emission. The LSTM model is trained up to 50 epochs for both tree cover loss and carbon emission. It is observed from Figure 10 that the training and testing loss decreases after each epoch while using LSTM. The results indicate that a linear correlation exists between carbon emission and tree cover loss. Future predictions of the LSTM model also reflect an increasing pattern of carbon emission.

Carbon Emission with LSTM.
3.6. Comparison of ARIMA and LSTM.
The ARIMA model works by filtering high-frequency noise from data, detecting local patterns based on linear dependencies, and predicting future trends [30]. In addition, the ARIMA model converts tree cover loss and carbon emission features into special temporary variables before matching them and only considers the linear portion of the series [31]. The ARIMA model is simple and straightforward and only requires adjusting the values of p, d, and q. The ARIMA model, however, is unable to deal with the non-linear relationship between tree cover loss and carbon emissions. On the contrary, a neural network such as LSTM can deal with both linear and non-linear patterns [32]. LSTM is a type of RNN that is meant to learn temporal patterns, capture non-linear dependencies, and preserve relevant memory for a longer period of time, resulting in more accurate predictions [33].
The performance of the two approaches, ARIMA and LSTM, for tree cover loss and carbon emission prediction is investigated using the mean absolute error (MAE) and MAPE evaluation metrics. Results are summarized in

Computational Intelligence and Neuroscience
The results clearly show that LSTM has performed better than the ARIMA model in estimating and predicting tree cover loss and carbon emission for the analyzed data, hence ratifying that the LSTM architecture is more suitable for time series prediction than ARIMA. The results also reflect the strong reliance and linear relationship between tree cover loss and carbon emission.
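The error metrics behind this comparison, MAE and MAPE, can be computed as in the sketch below. The function names and the example numbers are illustrative and are not taken from the study's data; MAPE is expressed here as a percentage, which is an assumption about the convention used.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error relative to the observed value, in percent.
    Observed values must be non-zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


# Illustrative values only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
print(round(mae, 3), round(mape, 3))
```

MAE is expressed in the units of the series, whereas MAPE is scale-free, which makes it easier to compare errors between the tree cover loss and carbon emission series.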

Conclusion
In this work, a detailed analysis of tree cover loss and carbon emission data is carried out using ARIMA and LSTM techniques. Twenty-year data, from 2001 to 2020, are utilized to train and test the models and obtain predictions for the next 5 years. The relationship between the two environmental factors is also established. The results have shown that temporal variations in the trend component of both carbon emission and tree cover loss are remarkably associated with each other. It has been established that an increase in tree cover loss directly affects carbon emission in the atmosphere. Carbon emission could be one of the significant causes of tree cover loss and deforestation.
The performance of LSTM is found to be more robust in these prediction studies. Though very significant, this study is constrained by limited data availability. Both ARIMA and LSTM models showed the same trends. However, LSTM is a model that can learn long-term dependencies, and it can remember the information that is processed in the model for a very long time [22]. In terms of computational time, the ARIMA models consume more time when using the rolling forecast method, and it is unfeasible to train new models when the orders of p, d, and q increase [34]. LSTM models take significantly less time to train, and once trained, constant predictions can be obtained, while ARIMA models need to be retrained.
Future aspects of this study include a better understanding of carbon emission impact and control by considering more factors such as wood fuel, fire, and timber harvest. Inclusion of more factors and parameters can improve the overall prediction accuracy of the models while providing a broader understanding of the causes of carbon emission. If more variables that contribute to carbon emissions and deforestation are taken into account, the work described in this paper will be more sophisticated.

Data Availability
The data used to support the findings of this study are available publicly on Global Forest Watch.

Conflicts of Interest
The authors declare that they have no conflicts of interest.