A Comparative Analysis of Traditional SARIMA and Machine Learning Models for CPI Data Modelling in Pakistan



Introduction
The consumer price index (CPI) measures monthly changes in the prices of commodities purchased by consumers [1]. The Pakistan Bureau of Statistics (PBS) computes the CPI as a weighted average of the cost of a basket of goods and services that represents the aggregate expenditure of Pakistani consumers. The basket is chosen from the commodities and services most commonly purchased by consumers in the area [1]. CPI values are expressed as percentages of the base-year price level, with the base year set to 100. A CPI value below 100 indicates a general reduction in the prices of goods and services in the current year compared with the base year [2,3], whereas a value above 100 indicates a general increase in prices relative to the base year [2,3]. CPI values are used to compute the inflation rate (IR) [1,4].
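A basket-based index can be illustrated with a short sketch. It computes a fixed-weight index in which each item's price relative is weighted by its expenditure share, with the base period equal to 100. The sketch assumes a Laspeyres-type formula. PBS's exact weighting scheme and basket are not reproduced here, and the function name and the three-item basket are hypothetical.

```python
def fixed_weight_cpi(weights, base_prices, current_prices):
    """Fixed-weight (Laspeyres-type) price index with the base period = 100.

    weights: expenditure shares of the basket items (normalised here, so they
    need not sum to 1); prices are listed in the same item order.
    """
    if not (len(weights) == len(base_prices) == len(current_prices)):
        raise ValueError("weights and price lists must have equal length")
    total = sum(weights)
    relative = sum(
        (w / total) * (p_now / p_base)
        for w, p_base, p_now in zip(weights, base_prices, current_prices)
    )
    return 100.0 * relative


# Hypothetical basket: food, transport, medical services.
weights = [0.5, 0.3, 0.2]
base_prices = [100.0, 50.0, 200.0]
current_prices = [120.0, 50.0, 180.0]
print(round(fixed_weight_cpi(weights, base_prices, current_prices), 2))  # 108.0
```

An index of 108 means that the basket costs 8% more than in the base period. Because the value is above 100, it signals a general price increase.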
Inflation rate (IR) values are expressed as percentages [5]. A negative IR indicates a reduction in the cost of goods and services paid for by consumers, whereas a positive IR indicates an increase [1]. The aftermath of COVID-19 and the Russia-Ukraine war has plunged the world into a crisis of rising consumer prices [6]. The costs of goods and services are rising steeply everywhere, and Pakistan is no exception.
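The link between CPI and IR can be sketched as the percentage change in the index from one period to the next. This is a standard definition that we assume here, because the article does not state the exact formula. The function name and sample values are illustrative.

```python
def inflation_rate(cpi):
    """Period-on-period inflation rate (%) from a sequence of CPI values.

    IR_t = (CPI_t - CPI_{t-1}) / CPI_{t-1} * 100. The first period has no
    predecessor, so the output is one element shorter than the input.
    """
    return [
        (current - previous) / previous * 100.0
        for previous, current in zip(cpi[:-1], cpi[1:])
    ]


# Hypothetical annual CPI values: prices rise, then fall.
rates = inflation_rate([100.0, 110.0, 99.0])
print([round(r, 2) for r in rates])  # [10.0, -10.0]
```

The positive and then negative rates correspond to the rising and falling consumer costs described above.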
There has recently been considerable research on the prediction and forecasting of CPI using several competing models. Kharimah et al. [7] modelled and predicted the CPI of Indonesia's Lampung Province using different time series models, including autoregressive integrated moving average (ARIMA) models. They compared these models using the mean square error (MSE), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC), and found ARIMA(1, 1, 0) to be the best-fitting model for modelling and forecasting CPI in Lampung Province. Mia et al. [8] fitted both autoregressive moving average (ARMA) and ARIMA models to CPI data for Bangladesh and selected ARIMA(2, 2, 0) as the best model for forecasting Bangladesh's CPI [9]. Nyoni [10] used ARMA and ARIMA models to forecast Germany's CPI and established ARIMA(1, 1, 1) as the best fit for the dataset.
Mohamed [11] compared ARMA and ARIMA models, together with regression with ARIMA errors, for modelling and forecasting the CPI of Somaliland. Selecting models by AIC and BIC, the study identified ARIMA(0, 1, 3) as the most suitable model for forecasting Somaliland's CPI. Norbert et al. [12] applied a similar methodology to short-term CPI forecasting, and their results showed that ARIMA(4, 1, 6) is the more suitable model for short-term CPI forecasting in Rwanda. Molebatsi and Roboloko [13] proposed ARIMA(1, 1, 1) as a suitable model for Botswana's CPI data.
In macroeconomics, neural networks have not been thoroughly explored. Kuan and White [14] were among the first to introduce neural network forecasting to macroeconomics. Maasoumi et al. [15] implemented a backpropagation artificial neural network (ANN) to predict United States (US) macroeconomic variables such as CPI, money inflow, unemployment, gross domestic product (GDP), and wages. Aiken [16] proposed an ANN to forecast CPI in the US. Choudhary and Haider [17] demonstrated the forecasting power of ANN models for monthly IR in twenty-eight member countries of the Organization for Economic Cooperation and Development (OECD), implementing ANNs together with quasi-ANN approaches. Their results indicated that the average neural network models performed best in forty-five percent of the OECD member countries, whereas a basic first-order autoregressive model, AR(1), performed best in twenty-three percent. They proposed an arithmetic combination of an ensemble of multiple networks for further accuracy.
Moshiri and Cameron [18] compared backpropagation ANN (BPANN) models with basic econometric techniques for predicting IR. McAdam and McNelis [19] implemented both simple and complex neural network-based models to forecast IR in the USA, Japan, and Europe based on Phillips curve formulations. The complex models produced trimmed-mean predictions from numerous neural networks. They outperformed the simple models in both standard and bootstrap forecasting exercises for numerous price indices in Europe and elsewhere. Wang et al. [20] investigated semiparametric and nonlinear autoregressive models with exogenous variables based on neural networks for CPI modelling and forecasting. Several applications of ANNs to CPI analysis and forecasting have been reported [21,22]; nonetheless, CPI modelling and forecasting with ANNs warrants further exploration.
The CPI is a widely used measure of the cost of living [1]. It affects not only government monetary and fiscal policy, consumption, prices, wages, and social security but also the daily lives of people in a country [23,24]. As an indicator of development, it is essential to model and forecast fluctuations in the CPI. In the post-COVID-19 era, almost every country faces sharply rising consumer prices, and Pakistan is no exception. Most previous studies applied conventional modelling techniques to predict the CPI [8,9,23,25], and those that applied neural networks used single models [18,19]. In this study, we investigated machine learning approaches, namely the multilayer perceptron (MLP) and neural network autoregressive (NNAR) models, for forecasting CPI [16]. We compared them with the univariate seasonal ARIMA (SARIMA) model using the root mean square error (RMSE), mean square error (MSE), and mean absolute percentage error (MAPE).
The rest of the article is organized as follows. Section 2 describes the data source and modelling methods, Section 3 presents and discusses the modelling results, and Section 4 presents the conclusions.

Data.
The data used for modelling the yearly CPI span from 1960 to 2021 and were sourced from the official website of the Pakistan Bureau of Statistics (https://www.pbs.gov.pk/). Figure 1 shows the time series plot, whereas Figure 2 shows the autocorrelation function (ACF) and partial ACF (PACF) plots of the CPI data. The summary statistics of the data are presented in Table 1.
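The ACF plot in Figure 2 is built from sample autocorrelations. A minimal pure-Python sketch of the standard estimator follows. The function name is hypothetical; in R, the same values come from `acf()`.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag.

    r_k = sum_{t=1}^{n-k} (y_t - ybar)(y_{t+k} - ybar) / sum_{t=1}^{n} (y_t - ybar)^2
    """
    n = len(y)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(y) - 1")
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


# A trending series (like an undifferenced CPI) has a slowly decaying ACF.
trend = [float(t) for t in range(60)]
print([round(r, 2) for r in sample_acf(trend, 3)])  # [0.95, 0.9, 0.85]
```

Slowly decaying autocorrelations of this kind are the visual cue for nonstationarity that motivates differencing.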
A time series {Y_t} is said to follow the ARMA(p, q) model [26-29] if

$$Y_t = \mu + \sum_{i=1}^{p} \phi_i Y_{t-i} + \sum_{j=1}^{q} \theta_j e_{t-j} + e_t. \qquad (1)$$

Using the back-shift operator B, defined by B^k Y_t = Y_{t-k}, equation (1) can be written as

$$\phi(B)\, Y_t = \mu + \theta(B)\, e_t, \qquad (2)$$

where

$$\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p, \qquad \theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q.$$

Here p, q > 0, p is the order of the AR part, and q is the order of the MA part. μ is a constant that may or may not be zero, and e_t is the white noise term of the model, with mean 0 and variance σ² [31-33].

Applied Computational Intelligence and Soft Computing
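Equation (1) can be read as a recursion that builds Y_t from its own past values and past shocks. The sketch below generates a series from given coefficients and shocks. It is illustrative only and is not the estimation procedure used in the study. It treats pre-sample values as zero (an assumption made for simplicity) and follows the plus-sign convention for the MA terms in equation (1). The function name is hypothetical.

```python
def arma_recursion(mu, phi, theta, shocks):
    """Generate Y_t = mu + sum_i phi_i Y_{t-i} + sum_j theta_j e_{t-j} + e_t.

    phi = [phi_1, ..., phi_p], theta = [theta_1, ..., theta_q];
    values before t = 0 are taken as zero.
    """
    y = []
    for t, e_t in enumerate(shocks):
        ar = sum(c * y[t - i] for i, c in enumerate(phi, start=1) if t - i >= 0)
        ma = sum(c * shocks[t - j] for j, c in enumerate(theta, start=1) if t - j >= 0)
        y.append(mu + ar + ma + e_t)
    return y


# ARMA(1, 1) impulse response: a single unit shock at t = 0.
print(arma_recursion(0.0, [0.5], [0.4], [1.0, 0.0, 0.0]))  # [1.0, 0.9, 0.45]
```

The MA term affects only the first step after the shock, while the AR term keeps the response decaying geometrically.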
A nonstationary time series is first made stationary by differencing, which yields the ARIMA model. Denoting the number of differences by d, the ARIMA(p, d, q) model [26-29] is given by

$$\phi(B)\,(1 - B)^d\, Y_t = \mu + \theta(B)\, e_t.$$

When the ARIMA model has additional autoregressive and moving average components at seasonal lags, the seasonal ARIMA (SARIMA) model is obtained [31-33].
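The differencing operator (1 - B)^d of the ARIMA model, and its seasonal counterpart (1 - B^s)^D used by SARIMA, can be sketched as repeated lagged subtraction. The helper name is hypothetical; in R, `diff(x, lag, differences)` performs the same operation.

```python
def difference(y, times=1, lag=1):
    """Apply (1 - B^lag)^times to a sequence.

    lag=1 gives ordinary differencing (the d of ARIMA); lag=s gives seasonal
    differencing (the D of SARIMA). Each pass drops the first `lag` values.
    """
    out = list(y)
    for _ in range(times):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


# A quadratic trend becomes constant after two ordinary differences.
print(difference([t * t for t in range(6)], times=2))  # [2, 2, 2, 2]
# A period-4 pattern that rises by 1 each cycle leaves a constant after one seasonal difference.
print(difference([1, 2, 3, 4, 2, 3, 4, 5], lag=4))  # [1, 1, 1, 1]
```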
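Using the back-shift operator, the SARIMA(p, d, q) × (P, D, Q)_s model [26-29] is given by

$$\Phi(B^s)\,\phi(B)\,(1 - B)^d (1 - B^s)^D\, Y_t = \mu + \Theta(B^s)\,\theta(B)\, e_t,$$

where

$$\Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps}, \qquad \Theta(B^s) = 1 + \Theta_1 B^s + \cdots + \Theta_Q B^{Qs}.$$

Here P is the order of the seasonal autoregressive term, Q is the order of the seasonal moving average term, and D is the number of seasonal differences, based on a seasonal period of s. Figure 3 shows the flowchart of the SARIMA methodology. Writing W_t = (1 - B)^d (1 - B^s)^D Y_t for the differenced series, SARIMA(p, d, q) × (P, D, Q)_s [34,35] can therefore be written in the form

$$\Phi(B^s)\,\phi(B)\, W_t = \mu + \Theta(B^s)\,\theta(B)\, e_t.$$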
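The multiplicative SARIMA form combines a non-seasonal lag polynomial with a seasonal one. The sketch below expands the AR side of a SARIMA(2, d, q) × (1, D, Q)_12 model into a single polynomial in B. This structure matches one of the study's candidate models. The expansion shows the cross terms at lags 13 and 14 that the multiplicative structure implies. The helper names and the coefficient values are hypothetical.

```python
def lag_polynomial(coefs, step=1, sign=-1.0):
    """Coefficients of 1 + sign * (c_1 B^step + c_2 B^(2*step) + ...).

    sign=-1 builds an AR polynomial such as phi(B) or Phi(B^s);
    sign=+1 builds an MA polynomial such as theta(B) or Theta(B^s).
    Index k of the returned list holds the coefficient of B^k.
    """
    out = [0.0] * (len(coefs) * step + 1)
    out[0] = 1.0
    for k, c in enumerate(coefs, start=1):
        out[k * step] = sign * c
    return out


def multiply(a, b):
    """Product of two lag polynomials (a discrete convolution of coefficients)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


# Hypothetical coefficients: phi_1 = 0.5, phi_2 = 0.2, Phi_1 = 0.3, s = 12.
ar = multiply(lag_polynomial([0.5, 0.2]), lag_polynomial([0.3], step=12))
nonzero = {k: round(c, 4) for k, c in enumerate(ar) if c != 0.0}
print(nonzero)  # {0: 1.0, 1: -0.5, 2: -0.2, 12: -0.3, 13: 0.15, 14: 0.06}
```

The lag-13 and lag-14 terms are products of non-seasonal and seasonal coefficients. A SARIMA fit therefore constrains them rather than estimating them freely.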

Neural Network Autoregressive (NNAR) Model.
The neural network autoregressive (NNAR) model [36-38] applies neural networks, which are used in supervised classification, prediction, and nonlinear time series forecasting, to time series data. A simple feedforward neural network consists of neurons arranged, in order, in input, hidden, and output layers [36]. Each layer relays information to the next layer through weights that are learned by a training algorithm [37]. The NNAR model is a variant of the basic ANN designed specifically for time series data [38]: lagged values of the series are used as inputs, together with a fixed number of hidden neurons. The NNAR(p, k) model applies a feedforward neural network with one hidden layer of k hidden units to time series data with p lagged inputs [39]. Let x denote the vector of p lagged inputs and let f be the network function [38]; then

$$f(x) = c_0 + \sum_{j=1}^{k} w_j\, \varphi\left(a_j + b_j^{\top} x\right),$$

where c_0, a_j, and w_j are adjustable connecting weights, b_j is a p-dimensional weight vector, and φ is a bounded nonlinear sigmoidal function (e.g., the logistic or hyperbolic tangent activation function).
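The NNAR(p, k) function described above can be sketched directly. The sketch also shows the iterated scheme commonly used for multi-step forecasts, in which each one-step prediction is fed back as a lagged input. The weights below are hand-chosen placeholders; in practice they are learned from data. One common tool for this is `nnetar()` in the R `forecast` package, although the study does not confirm which tool it used. Function names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def nnar_output(x, c0, a, w, b, act=logistic):
    """f(x) = c0 + sum_j w_j * act(a_j + b_j . x), one hidden layer of k units.

    x: p lagged inputs (most recent first); a, w: length-k lists;
    b: k weight vectors, each of length p.
    """
    return c0 + sum(
        w_j * act(a_j + sum(b_ji * x_i for b_ji, x_i in zip(b_j, x)))
        for a_j, w_j, b_j in zip(a, w, b)
    )


def nnar_forecast(history, horizon, p, **params):
    """Iterated multi-step forecasts: each prediction becomes the newest lag."""
    if len(history) < p:
        raise ValueError("need at least p observations")
    series = list(history)
    forecasts = []
    for _ in range(horizon):
        x = series[::-1][:p]  # last p values, most recent first
        y_hat = nnar_output(x, **params)
        forecasts.append(y_hat)
        series.append(y_hat)
    return forecasts


# Hypothetical NNAR(2, 2) weights, for illustration only.
params = dict(c0=0.1, a=[0.0, -0.5], w=[0.8, 0.4], b=[[0.6, 0.2], [0.3, -0.1]])
print(nnar_forecast([0.2, 0.4, 0.5], horizon=3, p=2, **params))
```

Because each forecast feeds the next, errors can accumulate over long horizons. Multi-year projections should be read with that in mind.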
The structure of the NNAR model is illustrated in Figure 4, and the flowchart for NNAR modelling is illustrated in Figure 5 [37,38].

Multilayer Perceptron (MLP) Model.
Like the NNAR model, the multilayer perceptron (MLP) model uses artificial neurons to pass processed information from one layer to the next [40,41]. The hidden layers receive information from the input layer, transform it through weighted interconnections, and pass it on to the output layer, so that the network forms a feedforward system of distinct layers [19]. The MLP network function [40,41] is given by

$$x = g_w\left(c^{0} + \sum_{n} k^{0}_{1n}\, g\left(c_n + \sum_{j} k^{i}_{nj}\, d_j\right)\right),$$

where d_j are the network inputs and c denotes the biases of the network. The intermediate (hidden) layers use g as the activation function, and the output layer uses g_w. x is the output signal, k^{i}_{nj} are the weights of the intermediate layer, and k^{0}_{1n} denote the connections to the output neuron [15]. Figure 6 illustrates the structure of an MLP model, and Figure 7 shows the flowchart of the MLP modelling procedure [41].
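The MLP function described above can be sketched as a chain of fully connected layers, with g applied in the hidden layers and g_w at the output. This version accepts any number of hidden layers, because the study varies the network size. Using tanh for g and the identity for g_w is an assumption, and all names and weights are hypothetical. Training (e.g., by backpropagation) is not shown.

```python
import math


def dense(inputs, weights, biases, act):
    """One fully connected layer: act(c_n + sum_j k_nj * input_j) for each neuron n."""
    return [
        act(c_n + sum(k_nj * v for k_nj, v in zip(row, inputs)))
        for row, c_n in zip(weights, biases)
    ]


def mlp_output(d, hidden_layers, out_weights, out_bias,
               g=math.tanh, g_w=lambda z: z):
    """Feedforward pass through the MLP.

    d: input vector (e.g., lagged CPI values).
    hidden_layers: list of (weights, biases); weights[n][j] plays the role of
    k^i_{nj}, connecting input j of that layer to neuron n.
    out_weights[n] plays the role of k^0_{1n}; out_bias is the output bias.
    """
    signal = list(d)
    for weights, biases in hidden_layers:
        signal = dense(signal, weights, biases, g)
    return g_w(out_bias + sum(k * s for k, s in zip(out_weights, signal)))


# Hypothetical 2-input network with two hidden layers of 2 and 1 neurons.
layers = [
    ([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1]),
    ([[0.7, -0.4]], [0.05]),
]
print(mlp_output([1.0, 2.0], layers, out_weights=[1.5], out_bias=0.2))
```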
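These models were applied to the complete CPI data for Pakistan and compared using the MSE, RMSE, and MAE, defined as

$$\mathrm{MSE} = \frac{1}{K - M} \sum_{t=M+1}^{K} \left(Y_t - \hat{Y}_t\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad \mathrm{MAE} = \frac{1}{K - M} \sum_{t=M+1}^{K} \left\lvert Y_t - \hat{Y}_t \right\rvert,$$

where Y_1, ..., Y_M and Y_{M+1}, ..., Y_K form a partition of the data and \hat{Y}_t is the fitted or forecasted value. The model with the lowest MSE, RMSE, and MAE is chosen as the preferred model for our data [36-38]. R version 4.3.1 was used for all analyses.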
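The comparison measures can be computed as follows. The sketch assumes that the first M observations are used for fitting and the remaining ones for evaluation. It also returns MAPE, the third measure named elsewhere in the article, and it assumes nonzero actual values for MAPE (true for positive CPI levels). Model names and forecast numbers in the example are hypothetical.

```python
import math


def holdout_split(y, m):
    """Split Y_1..Y_K into a fitting part Y_1..Y_M and an evaluation part Y_{M+1}..Y_K."""
    return y[:m], y[m:]


def accuracy(actual, predicted):
    """MSE, RMSE, MAE and MAPE (%) of predictions against actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical evaluation-period values and forecasts from two models.
_, test = holdout_split([100.0, 104.0, 110.0, 118.0], m=2)
scores = {
    "model_A": accuracy(test, [111.0, 117.0]),
    "model_B": accuracy(test, [108.0, 121.0]),
}
best = min(scores, key=lambda name: scores[name]["RMSE"])
print(best)  # model_A
```

Selecting by RMSE alone is a simplification. When the measures disagree, the article's rule of preferring the model with the lowest values across all measures calls for inspecting each one.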

Results and Discussion
Figures 1 and 2 clearly show that the data are nonstationary. The series was differenced once and plotted to inspect the trend, the ACF, and the PACF. To check the stationarity of the data statistically, the augmented Dickey-Fuller (ADF) test [42] was applied at the 0.05 level of significance.
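The ADF statistic is the t-ratio on γ in a regression of the differenced series on a constant, its lagged level, and lagged differences. The sketch below computes that statistic with NumPy least squares for a constant-only specification. It compares the result with -2.86, the commonly tabulated asymptotic 5% critical value for this case. This is a simplified illustration with no automatic lag selection and no p value. A full implementation is `adfuller` in Python's statsmodels, whereas the study itself used R. The function name is hypothetical.

```python
import numpy as np

CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, constant but no trend


def adf_statistic(y, lags=0):
    """t-ratio of gamma in
    dy_t = alpha + gamma * y_{t-1} + sum_{i=1}^{lags} beta_i * dy_{t-i} + u_t.
    Reject the unit-root null when the statistic is below the critical value.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[lags:]
    columns = [np.ones(len(target)), y[lags:-1]]
    for i in range(1, lags + 1):
        columns.append(dy[lags - i:-i])
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(0)
noise = rng.standard_normal(300)   # stationary by construction
walk = np.cumsum(noise)            # unit root by construction
for name, series in [("noise", noise), ("walk", walk)]:
    stat = adf_statistic(series, lags=1)
    print(name, round(float(stat), 2), stat < CRITICAL_5PCT)
```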
H_0: the series has a unit root and is nonstationary
H_1: the series has no unit root and is stationary

The results show that H_0 is rejected in favor of H_1 (since the p value is less than the 0.05 significance level), and we conclude that the series is stationary. Because an extra significant lag remained, we applied a second difference to correct it. Figure 8 shows the time series plot of the second-differenced series, while Figure 9 shows its ACF and PACF plots [33]. Figure 10 shows the ACF and PACF of the competing models, and Figure 11 illustrates the Q-Q plot. Differencing thus improved the properties of the series. Different combinations of the SARIMA model were then fitted and their performance indicators checked. The candidate combinations were SARIMA(3, 1, 3)(0, 1, 0)_12, SARIMA(2, 1, 2)(1, 1, 0)_12, and SARIMA(2, 1, 2)(0, 1, 0)_12. The correlogram of the selected model for the full sample is given in Figure 9. To check the normality of the model's residuals, we applied the Shapiro-Wilk normality test at the 5% significance level to the following hypotheses [30]:

H_0: the residuals are normally distributed
H_1: the residuals are not normally distributed

The results in Table 2 suggest that the null hypothesis is not rejected (since the p value is greater than the 0.05 significance level). We therefore conclude that the residuals are normally distributed, which supports the adequacy of the fitted model. After all candidate SARIMA combinations had been checked, different iterations of the NNAR model were applied, and the best three were selected: 20, 30, and 40 iterations. The performance indicators were computed for all three. The MLP model was then applied to the CPI data with the number of hidden layers set to 5, 10, and 20, and the performance indicators were computed for all combinations. The results of all competing models are presented in Table 3.
Table 3 shows the performance indicators for all combinations of SARIMA, NNAR, and MLP. It is evident from Table 3 that the 20-hidden-layer MLP outperformed all other competing models, since it had the lowest RMSE, MSE, and MAPE. Figure 12 shows the fitted and original values, whereas Figure 13 shows the CPI values forecasted using the 20-hidden-layer MLP. Our findings contrast with the results of Qin et al. [43] and Hwang [39].
Figure 13 shows that the 20-hidden-layer MLP gives multi-step-ahead forecasts and indicates that the series may move in several directions within bounds [43]. Table 4 shows the annual forecasted CPI values from 2022 to 2031. The forecasts show a continuing upward trend to high values. We urge the Pakistani authorities to initiate policies to reduce these figures, as the trend is harmful to consumers and the general population. A similar result was obtained by Ansar and Asghar [44]. The computation of CPI takes into account basic commodities and services, such as transportation, medical services, goods, and food, that are purchased by all consumers. CPI is regarded as a proxy for inflation and is used as a major indicator of economic growth. CPI is an important variable used to measure the IR [39]. It represents the prices paid by consumers or households, and it also tracks changes in the purchasing power of money in a country [25]. Stock prices, exchange rates, and interest rates strongly affect inflation, while rising yields and falling bond prices are associated with unexpected inflation [24,43]. An increase in the interest rate negatively affects stock prices. Our results therefore reflect an increasing trend in CPI values in Pakistan that, if not checked, could severely damage Pakistan's economy.

Conclusion
This study examines CPI analysis, recognizing its importance in gauging economic stability and its consequences for households and individuals. To understand and forecast CPI dynamics in Pakistan, several time series forecasting models were employed, namely SARIMA, NNAR, and MLP. These models were used to model and analyze the historical CPI data from 1960 to 2021. In addition to constructing these models, diagnostic assessments were conducted to establish their suitability and reliability; these diagnostic steps were pivotal in establishing the adequacy and robustness of the selected model. Using the selected model, the study generated short-term forecasts of annual CPI values in Pakistan for 2022 to 2031. These projections revealed a discernible and persistent upward trajectory in the CPI over this period. This has considerable implications for purchasing power, as it implies a continuous rise in the cost of living. The situation could place financial strain on individuals and households, particularly if income levels and wages stagnate during this period. This study contributes to understanding the behaviour of CPI in the specific context of Pakistan and its potential macroeconomic and microeconomic repercussions. All stakeholders should carefully consider the implications of an escalating CPI when making financial, investment, and economic decisions, so as to mitigate the potential consequences for purchasing power, economic stability, and the welfare of the less privileged in Pakistan. A high CPI raises the cost of living, which in turn increases poverty levels [44]. The government should devise economic policies and initiatives that slow the upward trend of CPI, easing the burden on the poorest people in Pakistan and thereby reducing poverty.

Figure 1: Time series plot of CPI from the year 1960 to 2021.

Figure 11: Normal Q-Q plots of the candidate model for CPI.

Table 1: Summary statistics of the CPI of Pakistan from 1960 to 2021.

Table 3: Estimated candidate models for CPI (%) from 1960 to 2021.

Table 4: Forecasted CPI values for 2022 to 2031 using the 20-hidden-layer MLP.