Forecasting Primary Energy Requirements of Territories by Autoregressive Integrated Moving Average and Backpropagation Neural Network Models

Forecasting energy data, especially the primary energy requirement, is a key part of policy-making. For territories of different developing types, a knowledge-based and dependable forecasting model is an essential prerequisite for sound policy-making. In this paper, both autoregressive integrated moving average and backpropagation neural network models, which have been proved to be very efficient in forecasting, are applied to forecast the primary energy consumption of three different developing types of territories. It is shown that the average relative errors between the actual data and the simulated values range from 4.5% to 5.9% for the autoregressive integrated moving average model and from 0.04% to 0.47% for the backpropagation neural network. In particular, this research shows that the backpropagation neural network model gives a better prediction of the primary energy requirement when gross domestic product, population, and the particular values are used as predictors. Furthermore, we indicate that the single-input backpropagation neural network model can still work when the particular values have contributed most to the energy consumption.


Introduction
Since the emergence of the issue of applying mathematical models to predict energy consumption, relevant research has been carried out [1] and several related experiments have been implemented [2]. The recent study [3] has combined the nonlinear metabolic grey model (NMGM) and the autoregressive integrated moving average (ARIMA) model and has used the linear ARIMA to correct NMGM forecasting residuals, which has improved forecasting accuracy steadily and given useful policy recommendations. The time-series forecasting techniques based on the metabolic grey model, autoregressive integrated moving average model-grey model, and induced ordered weighted geometric averaging operator have been investigated in [4], which has found a way to provide reliable information and has indicated that the results from the time-series and econometric forecasting techniques are consistent. The above two studies have shown that the prediction of relevant models is effective and also indicate the research prospects and directions in this field. In addition, the single-linear, hybrid-linear, and nonlinear forecasting techniques based on grey theory have been presented in [5] to forecast energy demand in both China and India more accurately. In a novel study [6], researchers have found that the NMGM-ARIMA technique can significantly improve forecasting effectiveness and outperform other related forecasting models, which has been a guide to practical applications. In the field of primary energy consumption (PEC), many mathematical models have been successfully applied in prediction. For example, a new hybrid method (HAP) has been proposed for estimating energy demand in Turkey using particle swarm optimization (PSO) and ant colony optimization (ACO), and the research has found that different models have different advantages [7].
Two new models based on artificial bee colony (ABC) and particle swarm optimization (PSO) techniques have been proposed in [8] to estimate electrical energy demand in Turkey. In addition, the research methods in [9,10] also have made good predictions. Among these models, the ARIMA model [11] and backpropagation (BP) neural networks [12] are the two most commonly used predictive models.
The forecast of energy consumption, especially the PEC, has always been a significant reference point for a territory [7]. Specifically, on the one hand, it is always important for a territory to make future energy policies, to achieve better development, and to reduce greenhouse gas emissions [13]. In the recent study [14], particle swarm optimization (PSO) and artificial bee colony (ABC) techniques have been applied in estimating CO2 emissions in Turkey based on socioeconomic indicators. On the other hand, the PEC refers to the overall energy consumption within a geographic territory and represents the total supply of energy available to the territory, which supports all the requirements for energy transformation and final consumption in that territory. In general, the PEC of a territory includes both its indigenous energy sources and imported energy commodities consumed within the territory [15].
As for models, both economists and applied mathematicians have built a large number of models and accumulated experience in forecasting energy [16]. For instance, the macroeconomic analysis of energy [17] and the analytic network process in energy policy planning [18] are two econometrics-based methods which transform the forecasting problem into an economic one and are seemingly in accordance with the thought of "big data." Applied mathematicians, however, have developed a series of linear or nonlinear mathematical models to complete the prediction [19].
Previous studies show that the consumption of primary energy can be approximated by linear or nonlinear econometrics with economic and/or noneconomic indicators; among mathematical models [20], the ARIMA model [21] and the BP neural network [22] are commonly used for prediction. Therefore, in this paper, we adopt the ARIMA model and the BP neural network model to make predictions and draw conclusions by comparing the accuracy of the models.
This paper also aims to fill some gaps in previous research. Compared with the previous research concerning a single model [1], this paper adopts two common and effective mathematical models and then makes a comparison. In contrast to the research taking into account multiple models [2], this paper not only compares accuracy but also opens up a new way to use neural networks to reveal the possible time of economic transformation. The novelty and usefulness of this paper can be summed up as follows: (a) Available studies focus on only one territory or one mathematical model, which leads to a lack of comparison between different models and different types of territories; this study instead includes three cities of different developing types from the Greater Bay Area to make a comparison and obtain more reasonable conclusions. It is shown in this paper that both the ARIMA model and the BP neural network model can be applied in different developing types of territories. (b) As for the novelty, this paper concludes that the single-input BP neural network model can be used for judging whether the territory has transformed completely. (c) In terms of practicality, this paper finds that ARIMA can be applied to predict the PEC in territories other than manufacturing ones, and the BP neural network model can be applied to predict the PEC in different types of territories.

Models and Territories.
It is shown in [23] that both the ARIMA model and the BP neural network model are suitable for time-series data and can capture the related data well and predict future points in the series. In this paper, we consider three cities, Hong Kong, Shunde, and Zhaoqing, from the Greater Bay Area in China as representative territories. We choose these territories because they have varying development models; that is, they are cities dominated by three different industries. Hong Kong has been a service-dominated city since the 2000s or earlier. Shunde has been a manufacturing-dominated city since the beginning of the Chinese economic reform. Zhaoqing is famous for its agriculture, so it is regarded as an agriculture-dominated one. The ARIMA is a commonly used time-series model which captures the object's characteristics of self-similarity, periodicity, suddenness, and trends [23] and performs better in short-term forecasting. Therefore, it has been applied in the prediction of the stock price index, blood glucose concentrations [24], current blockchain technology [25], wind generation [26], and so on. The BP neural network is a commonly used time-series and nonlinear prediction model applied in the prediction of short-term wind power, indoor temperature, wind speed [27], and hydraulic press machines.
In short, we deliberately consider the agricultural territory, the manufacturing territory, and the service-dominated territory combined with effective ARIMA and BP neural network models.

Datasets.
Monthly data are adopted in this paper, so the time step of the data is one month. The four monthly data series adopted in this paper are described in detail below. The first one is the PEC, which means the total energy consumed by a geographic territory. Generally, it comprises the energy used for transformation, the energy for final consumption [28], the energy produced locally, and the imported energy sources consumed locally. Therefore, to calculate the PEC, taking Hong Kong as an example, we first sum up the renewable energy produced locally and the net imports of coal, oil products, and electricity, then subtract the net usage of energy storage, and finally adjust the result by the supply from stock. These data cover 2000 to 2017 for the agricultural territory, 2005 to 2017 for the manufacturing one, and 1979 to 2017 for the service-dominated one. The second and third ones are population and GDP. Population is used as a proxy for human demand, and GDP can be regarded as an indicator of economic growth. These two types of data are selected from 2000 to 2017 for the agricultural territory, from 2005 to 2017 for the manufacturing one, and from 1979 to 2017 for the service-dominated one. It should be noted that these data are obtained in the same way as the PEC data [29].
Last but not least, three additional forecasting parameters are employed in this paper: the value added from primary products, the manufacturing industry-added value, and the values of total services. These values are of great importance for cities of different developing types. The value added from primary products indicates the growth of products directly from the natural sector (including plantation, forestry, animal husbandry, and fisheries) in this liquidation cycle compared to the previous liquidation cycle. It is shown in [29] how the territory gains from agricultural products. In addition, the manufacturing industry-added value represents the final result of the production activities of the secondary industry in monetary form during the reporting period, which is the total result of all production activities of the production unit minus the value of the physical products and services consumed or transferred in the production process. Finally, the values of total services include the exports and imports of services, where the exports of services refer to the services that Hong Kong has sold to other entities and the imports of services mean the services purchased from other entities. The data are obtained from the Hong Kong Trade in Services Statistics published by the Census and Statistics Department of Hong Kong [30].

ARIMA Modeling.
Developed by Box and Jenkins, the ARIMA model is a linear regression model that is widely applied in forecasting with time-series data [31]. ARIMA(p, d, q), an autoregressive integrated moving average model where the parameters p, d, and q are nonnegative integers, consists of three parts: the autoregressive (AR) model, the differencing (I) model, and the moving average (MA) model. Specifically, p represents the order of the AR model, d is the degree of differencing, and q is the order of the MA model. To make a prediction through ARIMA(p, d, q), three steps must be completed: making the sequence stationary through d-order differencing, obtaining suitable p and q by calculating and comparing the autocorrelation coefficients with the partial autocorrelation ones, and carrying out the prediction with the selected model. Mathematically, ARIMA(p, d, q) [31] relates the actual value $X_t$ and the random error $\varepsilon_t$ at each time point through two sets of parameters, $\phi_i$ ($i = 1, 2, \ldots, p$) and $\theta_i$ ($i = 1, 2, \ldots, q$), where the integers p and q are the orders of the model. By extracting the common factor, the model can be expressed in terms of $\phi(L)$ and $\Theta(L)$, the p-order and q-order characteristic polynomials in the lag operator $L$. To conclude, in ARIMA(p, d, q), d is the order of integration, p is the autoregressive order, and q is the moving average order [32].
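As a reference for the description above, the ARIMA(p, d, q) model can be written out in one standard textbook form. The sketch below assumes the common sign convention (autoregressive terms subtracted, moving average terms added), writes the time index as $t$, and introduces the auxiliary symbol $W_t$ for the differenced series; these are notational assumptions, and [31] may use an equivalent but differently arranged form.

```latex
% W_t is the d-times differenced series (auxiliary symbol, not from the source).
\begin{align}
W_t &= (1 - L)^d X_t, \qquad L X_t = X_{t-1}, \\
W_t &= \sum_{i=1}^{p} \phi_i W_{t-i} + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}, \\
\Bigl(1 - \sum_{i=1}^{p} \phi_i L^i\Bigr) (1 - L)^d X_t &= \Bigl(1 + \sum_{i=1}^{q} \theta_i L^i\Bigr) \varepsilon_t, \\
\phi(L)\,(1 - L)^d X_t &= \Theta(L)\,\varepsilon_t, \\
\phi(L) &= 1 - \sum_{i=1}^{p} \phi_i L^i, \qquad \Theta(L) = 1 + \sum_{i=1}^{q} \theta_i L^i .
\end{align}
```

With d = 0 the model reduces to an ARMA(p, q) model on the original series, and with p = 0 and d = 1 it is a moving average model on the first differences.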

BP Neural Network Modeling.
Suppose that the network has R nodes [33] and the transfer function of each layer is of sigmoid type [34]. The following notations are used throughout this section: $\alpha_{1i}$ denotes the output of the i-th hidden-layer node; $\alpha_{2k}$ denotes the output of the k-th output-layer node; $\omega_{1ij}$ and $\omega_{2ki}$ represent the weight between node i and node j and the weight between node k and node i, respectively; $b_{1i}$ denotes the threshold for hidden-layer node i; and $b_{2k}$ denotes the threshold for output-layer node k [35]. The BP neural network procedure is described in detail below [36].
Firstly, the sample is propagated forward: it is passed from the input layer, through the hidden layer, and finally to the output layer. The input samples must be known, and the corresponding output is then obtained by these steps.
In the hidden layer, the output of the i-th neuron is obtained by applying the activation function $f_1$ to the weighted sum of the input values $p_j$ plus the threshold, where $s_1$ is an integer denoting the number of neurons in the hidden layer.
Mathematical Problems in Engineering
In the output layer, the output of the k-th neuron is obtained analogously with the activation function $f_2$, where $s_2$ is an integer denoting the number of neurons in the output layer.
Secondly, the error is propagated backward. Through the above forward propagation calculation, the actual output can be obtained. However, in general, the actual output differs from the expected one. When the two values are different or their error exceeds some specified value, corresponding learning corrections should be made to the network.
In particular, the error function E measures the difference between the actual output $\alpha_{2k}$ and $t_k$, the expected value of the k-th output.
In the output layer, the weight corrections $\Delta\omega_{2ki}$ from the i-th input to the k-th output are expressed through the input value $p_j$ and the term $\delta_{kj}$, which is in turn defined through the error $e_k$. For the hidden-layer weight correction, the correction $\Delta\omega_{ij}$ from the j-th input to the i-th output is obtained analogously, where η is the learning coefficient. The error of the output is propagated backward from the output layer, through the intermediate layer, and finally to the input layer, and each layer is corrected once. The threshold can be seen as one of the weights and is adjusted along with them.
Thirdly, the training is repeated cyclically. In order to improve the accuracy of the network and reduce the output error, it is necessary to carry out loop memory training for all the samples input to the network. In addition, the number of loops should not be too small so that the sample patterns can be effectively remembered by the network.
Last but not least, the end of learning is checked. The output error is checked against the required standard whenever a round of loop memory training is completed. If it meets the requirements, the process ends. Otherwise, the loop training is repeated until it meets the requirements [37].
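The four steps above (forward propagation, error backpropagation, repeated training, and the stopping check) can be sketched in a few lines of plain Python. This is a minimal illustration, not the MATLAB implementation used in the paper: it assumes one hidden layer with a log-sigmoid activation, a single linear output node, a squared-error function, and per-sample gradient descent. The function name `train_bp`, the toy data, and all parameter values are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Log-sigmoid activation assumed for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-x))


def train_bp(samples, targets, n_hidden=3, eta=0.1, max_epochs=5000,
             tol=1e-4, seed=0):
    """Train a one-hidden-layer network; return (predict, error_history)."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def forward(p):
        # Step 1: forward propagation, input -> hidden -> output.
        a1 = [sigmoid(sum(w1[i][j] * p[j] for j in range(n_in)) + b1[i])
              for i in range(n_hidden)]
        a2 = sum(w2[i] * a1[i] for i in range(n_hidden)) + b2
        return a1, a2

    history = []
    for _ in range(max_epochs):          # Step 3: repeated training loops
        total = 0.0
        for p, t in zip(samples, targets):
            a1, a2 = forward(p)
            e = t - a2                   # error between expected and actual output
            total += 0.5 * e * e         # squared-error function E
            # Step 2: error backpropagation.
            delta2 = e                   # linear output: derivative is 1
            delta1 = [delta2 * w2[i] * a1[i] * (1.0 - a1[i])
                      for i in range(n_hidden)]
            for i in range(n_hidden):
                w2[i] += eta * delta2 * a1[i]
                for j in range(n_in):
                    w1[i][j] += eta * delta1[i] * p[j]
                b1[i] += eta * delta1[i]  # thresholds are updated like weights
            b2 += eta * delta2
        history.append(total)
        if total < tol:                  # Step 4: stop once the error is small enough
            break

    return (lambda p: forward(p)[1]), history


# Toy data (hypothetical): one input scaled to [0, 1] and a linear target.
samples = [[x / 10.0] for x in range(11)]
targets = [0.2 + 0.6 * s[0] for s in samples]
predict, history = train_bp(samples, targets, n_hidden=3, eta=0.1)
print("first-epoch error:", history[0], "last-epoch error:", history[-1])
```

Inputs are scaled to [0, 1] in the toy data because the sigmoid saturates for large arguments; real series such as GDP or population would need a similar normalization before training.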

Results and Discussion
ARIMA Modeling.
Following the above methodology, the PEC data of the three territories of different developing types are used as the training dataset in the ARIMA model. In addition, unit root tests, which detect stochastic trends in a time series, are executed to check the stationarity of the series. A time series is stationary if a shift in time does not cause a change in the shape of the distribution [38]. To obtain a smooth sequence and the "d" parameter of each model, we conduct three different unit root tests from zero orders to three orders for each model and compare the three types of results. The unit root tests and results are shown in Tables 1-6.
In Tables 1-6, the augmented Dickey-Fuller (ADF) statistic is used to check whether a time series is stationary or not [37]. The test critical values are thresholds: the ADF statistic is compared with three critical values at three significance levels. It is observed from Tables 1-6 that the second-order difference is suitable for both the agricultural territory's model and the manufacturing territory's model, and the zero-order difference is suitable for that of the service-dominated territory. We then obtain, with the help of EViews, the correlogram, which shows the autocorrelation and the partial correlation, where the autocorrelation is the correlation of a signal with its delayed copy and the partial correlation means the degree of association between two random variables with the effect of the other variables removed [39]. Both the autocorrelation and the partial correlation are shown in Figures 1-3. In addition, AC and PAC in Figures 1-3 represent the autocorrelation coefficient and the partial correlation coefficient of the sequence, respectively. Q-Stat represents the output of the statistical test of whether any of a group of autocorrelations of a time series differs from zero, which obeys the chi-square distribution, and Prob represents the corresponding probability [40]. Representatively, Figure 3 shows that the smoothness of the correlation data is good without differencing. Furthermore, the autocorrelation function decays, and the partial autocorrelation function is truncated after the first lag. Therefore, we can determine the specific values of the coefficients p, d, and q based on these facts. The specific determination method is as follows.
The determination of the coefficients p, d, and q requires experience, but theoretical support is still very meaningful. According to the available literature such as [41], the determination of the values of p, d, and q can be summed up as follows: (a) Since the sequence requires a first-order difference to achieve smoothness, we determine that the value of d is 1. (b) Next, based on the fact that the autocorrelation images are truncated and the partial correlation images tail off, we conclude that the sequence is suitable for the AR model. Among them, since none of the partial correlation images is outside the confidence interval, we determine that the value of p is 0. (c) We also determine that the value of q is 1 because the autocorrelation images are truncated. (d) Finally, the best model is found by trying combinations of several values of p and q and simulating the prediction. To conclude, as shown in the correlogram, ARIMA(0, 1, 1) can be used for the prediction of the primary energy consumption of the service-dominated territory when both AC and PAC are considered. Similarly, the applicability of ARIMA(2, 0, 2) and ARIMA(2, 0, 5) for the agricultural territory and the manufacturing territory, respectively, can be guaranteed. Then, we construct three models, and the summaries of these three models are shown in Table 7. The stationary $R^2$ in Table 7 is a measure that compares the stationary part of the model with a simple mean model. A positive stationary $R^2$ means the model under consideration is better than the baseline one. In addition, the $R^2$ in Table 7 measures the goodness of fit. The closer the value of $R^2$ is to 1, the better the fit of the regression line to the observations [42].
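The autocorrelation (AC) and partial correlation (PAC) coefficients read from the correlograms can be reproduced with a short calculation. The sketch below is an illustration in plain Python rather than the EViews procedure used in the paper: it assumes the usual sample autocorrelation estimator, obtains the partial autocorrelations with the Durbin-Levinson recursion, and uses the approximate 95% bound 1.96/sqrt(n) as the confidence interval. The function names and the toy series are hypothetical.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing to a sequence."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def autocorrelation(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def partial_autocorrelation(acf, max_lag):
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson recursion)."""
    pacf = [1.0]
    prev = []  # AR coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def confidence_bound(n):
    """Approximate 95% bound for judging whether a coefficient differs from zero."""
    return 1.96 / n ** 0.5


# Toy series (hypothetical) with a clear trend.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
acf = autocorrelation(series, max_lag=2)
pacf = partial_autocorrelation(acf, max_lag=2)
bound = confidence_bound(len(series))
print("AC:", acf, "PAC:", pacf, "bound:", bound)
```

Lags whose coefficients fall inside plus or minus `bound` are treated as not significantly different from zero, which is how a cut-off after a given lag is usually read from a correlogram.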
It is noticed that the $R^2$ of ARIMA(2, 0, 2) is 0.941 and that of ARIMA(0, 1, 1) is 0.955, both of which are greater than 0.80. So the models are considered to fit the agricultural and service-dominated territories [37]. However, ARIMA is not suitable for the manufacturing territory because the $R^2$ of ARIMA(2, 0, 5) is 0.721, which is less than 0.80. Finally, the predicted primary energy consumption of the three territories is shown in Figure 4, where UCL and LCL denote the upper control line and the lower control line, respectively. They are defined in terms of μ, the average value of the output, and σ, the standard deviation of the output.
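The 0.80 criterion applied above can be illustrated with a small calculation. The sketch below assumes the ordinary coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares; the statistical package behind Table 7 may compute its $R^2$ and stationary $R^2$ somewhat differently. The function names and the `threshold` argument are hypothetical.

```python
def r_squared(actual, fitted):
    """Ordinary coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def is_acceptable_fit(r2, threshold=0.80):
    """Apply the 0.80 rule used in the text to judge a fitted model."""
    return r2 > threshold


# R^2 values reported in the text for the three ARIMA models.
reported = {"ARIMA(2, 0, 2)": 0.941, "ARIMA(2, 0, 5)": 0.721, "ARIMA(0, 1, 1)": 0.955}
verdicts = {name: is_acceptable_fit(value) for name, value in reported.items()}
print(verdicts)
```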

BP Neural Network Modeling.
A three-layer backpropagation neural network with three hidden-layer nodes is developed for predicting the three territories' primary energy consumption. According to Kolmogorov's theorem [43], any continuous function can be implemented with a three-layer network, where the input layer has m units, the hidden layer has 2m + 1 nodes, and the output layer has n units. So the number of hidden nodes is set to be three. The following results show that our models are feasible under this condition. Two activation functions, i.e., the pure linear function and the log-sigmoid transfer function, are considered in the hidden layer. With the help of MATLAB software, we establish two types of models: three-input and single-input. The single-input model means that only one among four predictors (population/primary GDPs/manufacturing industry-added value/values of total services) is adopted. The single-input model can be used to observe whether one variable has a significant impact on the PEC and can also be used to determine whether the prediction can be completed with only one variable. In contrast, the three-input model includes three predictors. The schematic diagram is presented in Figure 5.
After repeated debugging, we set the number of training iterations to 50000. It should be noted that the number of hidden-layer nodes can influence the network structure to some extent. On the one hand, if the number of hidden-layer nodes is too large, overfitting will occur. On the other hand, if the number of hidden-layer nodes is too small, the useful information obtained by the network from the original input may be scanty, which is not enough to discover the characteristics of the data, and the generalized nonlinear learning ability of the model may be weaker. Therefore, according to Kolmogorov's theorem [44], which for a three-layer network with m input units prescribes 2m + 1 hidden nodes, the number of hidden nodes is set to be 3. As mentioned above, we selected the relevant data from 1999 to 2014 as the training sample for the service-dominated territory, and similarly for the other territories. Different models' predictions are presented in Figures 6-9.
We can see that the predictions of the three-input BP neural network model are closer to the actual data than those of the three single-input models. Therefore, we consider the three-input model as the best BP neural network for prediction. Furthermore, we think some values have a significant impact on the PEC of both the agricultural and manufacturing territories, which means that the prediction of the PEC can be conducted with only one value. However, we see from Figure 8 that the single-input model is not suitable for the service-dominated territory. The regression of the three-input models is presented in Figure 10. As we can see in Figure 10, the regression line fits the observations very well, and the values of $R^2$ are very close to 1, which indicates that the three models for the three different developing types of cities perform well.

Comparison of Models.
We examine two forecasting models in three different developing types of cities in this paper. From Figures 4, 9, and 10, we draw a preliminary conclusion that the forecasting capability of the BP neural network models is better than that of the ARIMA models on the whole.
In order to verify the accuracy, we compare these two types of models by calculating the mean absolute percentage error (MAPE) and the root mean square error (RMSE). The mathematical definitions of the MAPE and RMSE are as follows:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (A_t - F_t)^2},$$

where $A_t$ is the actual value of the data, $F_t$ is the predicted value of the data, and n is the number of data points employed. The resulting accuracy is shown in Table 8.
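The two error measures can be computed directly from paired actual and predicted values. The sketch below assumes the standard definitions of MAPE (expressed in percent) and RMSE; the function names and the sample numbers are hypothetical and are not taken from Table 8.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n


def rmse(actual, forecast):
    """Root mean square error, in the same unit as the data."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


# Hypothetical values for illustration only.
actual = [100.0, 200.0]
forecast = [110.0, 190.0]
print("MAPE (%):", mape(actual, forecast))
print("RMSE:", rmse(actual, forecast))
```

MAPE is scale-free, which makes it convenient for comparing territories whose energy consumption differs by orders of magnitude, while RMSE keeps the unit of the data and penalizes large individual errors more heavily.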

Conclusion
In this paper, the autoregressive integrated moving average (ARIMA) model and the backpropagation (BP) neural network model are applied to forecast the primary energy requirement of three different developing types of territories, and a comparison of the accuracy is made. Three different models, ARIMA(2, 0, 2), ARIMA(2, 0, 5), and ARIMA(0, 1, 1), are constructed to make predictions. Their $R^2$ values are within the range from 0.721 to 0.955. We believe that the ARIMA model is not suitable for manufacturing territories since the $R^2$ of ARIMA(2, 0, 5) is 0.721, which is less than 0.80. Furthermore, we conclude that ARIMA can predict accurately without employing too much data.
As for the BP neural network model, the data concerning primary products, manufacturing industry-added value, and values of total services are employed as predictors in the model, and the most accurate prediction is made. In addition, we also use only one predictor to build three single-input models, and the corresponding results show not only that the accuracy of the single-input models is acceptable but also that the single-input models can demonstrate whether the territory has been totally supported by a certain industry. This conclusion is supported by economics. In the economic literature [45], experts have pointed out that Hong Kong has completely transformed into a service-dominated territory since 2009. And in the literature [46], Shunde was regarded as a manufacturing-dominated territory in 2005. The above results support the conclusions of this paper. As for limitations, we believe that if researchers aim to obtain more accurate predictions, they need to collect multiple types and a large amount of data, preferably monthly data, which will make the research work cumbersome. As for policy-makers, this paper presents a criterion to determine whether the territory is fully transformed. When the territory has completely transformed into a specific development model, the corresponding energy policy should be more inclined toward the dominant industry.
In the process of this research, we also find that some new technologies can be well applied in predicting energy. In particular, the following papers deserve further research in the future: the literature [8], in which swarm intelligence approaches have achieved excellent predictions and comparable advantages over artificial neural networks (ANNs), and the literature [14], in which swarm intelligence approaches have also been applied in predicting greenhouse gas emissions, which is very important for sustainable development.

Data Availability
Previously reported data (four monthly data series) were used to support this study, as addressed in Section 2.2. The prior studies (and datasets) are cited at relevant places as references [18,28,30].

Conflicts of Interest
The authors declare that they have no conflicts of interest.