Comparative Evaluation of the Multilayer Perceptron Approach with Conventional ARIMA in Modeling and Prediction of COVID-19 Daily Death Cases

COVID-19 continues to pose a dangerous global health threat, as cases grow rapidly and deaths increase day by day. This growth affects not only economic policy but also international policy around the world. In this paper, daily COVID-19 death cases in Pakistan, from February 25, 2020, to March 23, 2022, have been modeled using the long-established autoregressive-integrated moving average (ARIMA) model and the machine learning multilayer perceptron (MLP) model. The most suitable model is selected based on the root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE). The values of these key performance indicators (KPIs) showed that the MLP model outperformed the ARIMA model. The MLP model with 20 hidden layers, which emerged as the overall most apt model, was used to predict future daily COVID-19 deaths in Pakistan to enable policymakers and health professionals to put in place systematic measures to reduce death cases. We encourage the Government of Pakistan to intensify its vaccination campaign and encourage everyone to get vaccinated.


Introduction
From the beginning, the contagious coronavirus disease 2019 (COVID-19) was acknowledged as a crisis that has negatively affected almost all aspects of public and economic life. As the number of COVID-19 infections increases, the death rate among patients also rises, which has caused widespread disorder and mental distress across the globe. Predicting the behavior of contagious diseases is a major challenge for both policymakers and health professionals [1,2].
Jabardi et al. [3] utilized the autoregressive-integrated moving average (ARIMA) model to forecast the infection and death cases of COVID-19 in Iraq. They selected their model using the root mean square error (RMSE) criterion. Shareef et al. [4] used four different models for analyzing the drift of COVID-19 cases in Pakistan and found the ARIMA model to be the optimum forecasting model. Nesa et al. [5] utilized the ARIMA model for forecasting confirmed recovery and death cases of COVID-19 in Bangladesh. Banda [6] used the ARIMA model in predicting the cumulative confirmed cases of COVID-19. In their work, the appropriate model is selected based on the root mean square error (RMSE), mean square error (MSE), and mean absolute percentage error (MAPE).
Xu et al. [7] applied three machine learning models, namely, convolutional neural networks (CNNs), long short-term memory (LSTM), and CNN-LSTM, to forecast new cases of COVID-19 and found that the LSTM has high accuracy in predicting new COVID-19 cases. Naimoli [8] compared the heterogeneous autoregressive (HAR) model and the ARIMA model in finding the positive rates of COVID-19 in Italy and concluded that the HAR model outperformed the ARIMA model. Chyon et al. [9] used the ARIMA model and machine learning approaches to predict COVID-19-affected individuals.
Machine learning approaches to time series modeling and forecasting seem to perform better, with more accurate forecast values, than the traditional time series models [7][8][9]. Therefore, more machine learning time series approaches ought to be explored.
1.1. Literature Review. Predictive and statistical models have been used constantly for modeling diseases and other pandemics. The conventional models used in time series analysis are the ARIMA models proposed by Box and Jenkins for modeling and forecasting time series data.
Mohan et al. [10] put forward a hybrid ARIMA model to model and predict the daily confirmed and cumulative confirmed cases of COVID-19. The results showed that the modified ARIMA model outperformed the traditional ARIMA model in predicting the daily confirmed and cumulative confirmed cases. Argawu [11] applied the ARIMA model to predict new COVID-19 cases in Algeria, Egypt, Ethiopia, Morocco, and South Africa. Rachman [12] and Zhang et al. [13] conducted studies to compare and forecast the vaccination of COVID-19 using the ARIMA and LSTM models. Chen et al. [14] employed three time series models to predict confirmed cases of COVID-19 for different provinces in Canada. They found that the neural network outperformed the others in short-term forecasting. Ribeiro et al. [15] used ARIMA models, the Cubist model, random forest (RF), ridge regression (RIDGE), support vector regression (SVR), and stacking ensemble learning in predicting confirmed cumulative COVID-19 cases one, three, and six days ahead in ten Brazilian states. Warssamo and Sciences [16] developed the ARIMA model for analyzing verified recovered and death cases in Ethiopia, while Sahai et al. [17] utilized the ARIMA model for estimating and predicting the infected cases in the five countries with the highest number of COVID-19 cases at a particular time frame, namely, the United States (US), Brazil, India, Russia, and Spain. Biswas [18] and Zeroual et al. [19] conducted a comparative study on the new daily cases of COVID-19 using five deep learning models to predict the number of recovered and new cases.
Li et al. [20] reported different ARIMA models for different countries to forecast coronavirus incidence, and their model was selected based on the Akaike information criterion (AIC). Tan et al. [21] developed the seasonal autoregressive integrated moving average (SARIMA) model for the analysis of the trend of the third wave of COVID-19 in Malaysia. Their model selection was based on the RMSE, mean absolute percentage error (MAE), and Bayesian information criterion (BIC). Rajab et al. [22] suggested an approach to predict the spread of COVID-19 in the United Arab Emirates (UAE), Saudi Arabia, and Kuwait by utilizing the vector autoregressive (VAR) model. Rguibi et al. [23] employed the ARIMA and LSTM models to forecast and predict the time evolution of COVID-19 in Morocco.
The epidemiological approach to modeling the spread of infectious disease involves considering a larger number of modeling parameters that describe the spread of, and recovery from, the infection, additional compartments relating to age classification, and other related choices [24,25]. A data-driven approach to modeling COVID-19 has also emerged, in which statistical and machine learning models are used for forecasting cases, hospitalizations, deaths, and the effects of social distancing [26,27]. Among machine learning approaches, forecasting by using artificial and wavelet neural networks with meteorological conditions has been studied by Guo et al. [28]. Guo and He [29] predicted confirmed death cases together with global confirmed COVID-19 cases utilizing artificial intelligence. Guo et al. [30] explored the changes in air quality from the COVID-19 to the post-COVID-19 era in the Beijing-Tianjin-Tangshan region of China using the air quality index in machine learning, while He et al. [31] implemented artificial neural networks to predict monthly PM2.5 concentration in China's Liaocheng province.
It is clear from the above that there is no conclusive approach to modeling COVID-19 death cases using ARIMA and machine learning techniques. In this study, we modeled daily COVID-19 death cases in Pakistan using the classical ARIMA model and the machine learning multilayer perceptron (MLP) model [32][33][34][35]. The models are compared using key performance indicators (KPIs). The most appropriate model is selected to predict future cumulative COVID-19 deaths in Pakistan. Forecasting with the selected modeling technique will help authorities in Pakistan to observe the daily death trend due to COVID-19, thereby providing them with a valid tool for controlling the effects of the pandemic. This will, in the long run, help Pakistan authorities to put in place strategic prevention measures and mechanisms to curtail death cases in the country. It will also help the authorities concerned to ascertain the intensity of the pandemic in the future. Our proposed model can be compared with existing models in the literature to show predictive strength and accuracy.
The remainder of the article is organized as follows: in the next section, we present the data and methods, followed by the results and discussion. In the last section, we present the conclusions of the study.

Data.
The data consist of daily confirmed COVID-19 death cases from February 25, 2020, to March 23, 2022, which are available on the official website of the Pakistan Ministry of National Health Services, Regulation and Coordination (https://covid.gov.pk). The data were collected through a joint action between the Government of Pakistan, the Pakistan Ministry of National Health Services, Regulation and Coordination, and the World Health Organization. Table 1 shows the summary statistics of COVID-19 death cases in Pakistan.

Autoregressive-Integrated Moving Average (ARIMA) Model.
The ARIMA model, also known as the Box-Jenkins methodology [36], is among the best classical time series models used for short-term forecasting. The model, ARIMA (p, d, q), combines three components: the autoregressive (AR) part, of order p, which describes how the series depends on its own past lags; the moving average (MA) part, of order q, which describes the dependence on past error terms; and the integrated part, of order d, which is used when the series is not stationary. The methodology comprises four procedures, namely, model identification, estimation of parameters, diagnostic checking, and forecasting. The series is first checked by applying tests of stationarity, after which the model is identified based on the correlogram of the data. The parameters are then estimated, and the estimated models are examined through diagnostic checking; if the candidate model fulfills the criteria, the model is used for forecasting. Mathematically, if the series is nonseasonal, this model can be written as
$$\Phi_p(B)\,\Delta^d y_t = \theta_q(B)\,\varepsilon_t .$$
However, if the model includes seasonal components, then we can write this model in terms of the backshift operator B as
$$\Phi_p(B)\,\varphi_P(B^s)\,\Delta^d \Delta_s^D y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t ,$$
where $\Phi_p$ stands for the autoregressive part and $\theta_q$ stands for the moving average part, while $\Delta^d y_t$ denotes the differenced series and $\varepsilon_t$ is the error term. $\varphi_P$ is the seasonal autoregressive polynomial of order P, $\Theta_Q$ is the seasonal moving average polynomial of order Q, and s is the seasonal period. $\Delta^d \Delta_s^D y_t$ is the series after ordinary and seasonal differencing. Figure 1 shows the flowchart for this methodology.
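The role of the integrated part can be illustrated with a minimal Python sketch of the difference operator $\Delta^d = (1 - B)^d$. This is only an illustration under stated assumptions: the analyses in this study were performed in R, the function names `difference` and `undifference` are hypothetical, and the toy series is invented and is not the Pakistan data.

```python
def difference(series, d=1):
    """Apply the ordinary difference operator (1 - B) to the series d times."""
    out = list(series)
    for _ in range(d):
        # Each pass replaces y_t by y_t - y_{t-1}, shortening the series by one.
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


def undifference(diffed, first_value):
    """Invert one round of differencing, given the first original value."""
    out = [first_value]
    for step in diffed:
        out.append(out[-1] + step)
    return out


# Hypothetical toy series (illustrative only, not the study data).
toy = [3, 5, 9, 14, 20]
first_diff = difference(toy, d=1)    # [2, 4, 5, 6]
second_diff = difference(toy, d=2)   # [2, 1, 1]
restored = undifference(first_diff, toy[0])  # recovers the toy series
```

Forecasts produced on the differenced scale would be mapped back to the original scale with a step like `undifference`, which is why the order d matters when interpreting an ARIMA (p, d, q) forecast.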

Multilayer Perceptron (MLP) Model.
The multilayer perceptron (MLP) machine learning model [37][38][39] is acknowledged as one of the most flexible mathematical algorithms in terms of its potential applications as well as its precision in time series prediction and forecasting. The MLP model is particularly useful in approximating any type of continuous, nonlinear, differentiable, and bounded function. This has made it a universal approximator. Structurally, the MLP model comprises an input layer and an output layer together with one or more hidden layers. Artificial neurons are used to pass information from one layer to another. Hidden layers receive the information from the input layer and then map it through a nonlinear function to another space, depending on the study of interest. This interconnected information then enters the output layer, resulting in the network response. The structure of the network is a feed-forward information algorithm, with connecting layers being disjoint. Mathematically, the network of the MLP model is given by the following equation:
$$y = f_s\left(\sum_{k} w^{0}_{1k}\, f\left(\sum_{n} w^{i}_{kn}\, u_n + b_n\right)\right),$$
where $u_n$ are the network inputs, $b_n$ is the bias of the network, f is the activation function of the intermediate layers, and $f_s$ is the output layer activation function. y is the output signal, $w^{i}_{kn}$ is the weight of the intermediate layer, and $w^{0}_{1k}$ is the connection weight of the output neurons. Figure 2 represents the diagrammatic structure of the MLP model.
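The feed-forward computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this study (which was carried out in R). It assumes a single hidden layer, one bias term per hidden neuron, a tanh hidden activation, and an identity output activation; the function name `mlp_forward` and all weights are hypothetical.

```python
import math


def mlp_forward(u, w_hidden, b_hidden, w_out, f=math.tanh, f_s=lambda z: z):
    """Forward pass of a single-hidden-layer MLP with one output neuron.

    u        : list of network inputs u_n
    w_hidden : one list of input weights per hidden neuron (w^i_kn)
    b_hidden : one bias per hidden neuron (an assumption of this sketch)
    w_out    : weights connecting hidden neurons to the output (w^0_1k)
    f, f_s   : hidden-layer and output-layer activation functions
    """
    hidden = []
    for weights_k, bias_k in zip(w_hidden, b_hidden):
        # Weighted sum of the inputs plus bias, passed through f.
        z_k = sum(w * x for w, x in zip(weights_k, u)) + bias_k
        hidden.append(f(z_k))
    # Weighted sum of the hidden activations, passed through f_s.
    return f_s(sum(w * h for w, h in zip(w_out, hidden)))


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
u = [1.0, 2.0]
w_hidden = [[1.0, 1.0], [2.0, 0.0]]
b_hidden = [0.5, -1.0]
w_out = [2.0, -1.0]

y_nonlinear = mlp_forward(u, w_hidden, b_hidden, w_out)
y_linear = mlp_forward(u, w_hidden, b_hidden, w_out, f=lambda z: z)
```

With the identity activation the network reduces to a linear combination of the inputs, so the nonlinear hidden activation f is what allows the MLP to approximate nonlinear functions.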
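We used both models to predict the cumulative death cases in Pakistan and compared the models based on KPIs such as the mean square error (MSE), RMSE, and MAE. Mathematical expressions for the KPIs are given as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(Y_t - \hat{Y}_t\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|Y_t - \hat{Y}_t\right|,$$
where $Y_t$ is the observed value, $\hat{Y}_t$ is the corresponding model value, and n is the number of observations over which the error is evaluated; $Y_1, \ldots, Y_N$ and $Y_{N+1}, \ldots, Y_T$ are a partition of the data. The model with the smallest KPI values is selected as the most apt for the series and used for forecasting. All analyses were performed in R.
Figure 3 shows the visual features of the series. It can be deduced that the series is not stationary. The correlogram in Figure 4, which shows the autocorrelation function (ACF) and the partial autocorrelation function (PACF) plots, confirmed that the series is not stationary. We applied the augmented Dickey-Fuller test of stationarity at a 0.05 significance level to the following hypotheses: $H_0$: the series is not stationary, versus $H_1$: the series is stationary.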
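The three KPIs used for model comparison in this section (MSE, RMSE, and MAE) can be computed with a short Python sketch. It is given only as an illustration; the function names are hypothetical, the example values are invented and are not results from this study, and the original analyses were performed in R.

```python
import math


def mse(actual, predicted):
    """Mean square error between observed and model values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(e * e for e in errors) / len(errors)


def rmse(actual, predicted):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error between observed and model values."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Hypothetical observed and predicted values (illustrative only).
observed = [2.0, 4.0, 6.0]
predicted = [1.0, 4.0, 8.0]

kpis = {
    "MSE": mse(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
}
```

Because the RMSE is a monotone function of the MSE, the two always rank candidate models identically; the MAE can rank them differently because it penalizes large errors less heavily.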

Results and Discussion
The p value of the test was found to be 0.5385, which means we fail to reject $H_0$, confirming that the series is indeed not stationary. To make the series stationary, we applied a difference transformation and then determined the orders of the candidate models by making a correlogram of the transformed series. Figure 5 shows the correlogram of the transformed series. From the figure, it is easy to identify the different candidate models, and the best candidate model is selected according to the KPIs. The estimated candidate models are given in Table 2. From Table 2, we notice that the candidate model ARIMA (6, 1, 6) is the best fit since it has the smallest KPIs among the competing models. We used this ARIMA (6, 1, 6) model to predict future values of daily deaths due to COVID-19 in Pakistan. We also present the graph of the fitted values versus the original values of the series. Figure 6 shows the graph of the fitted versus original series, while Figure 7 shows the forecasted values. From Figure 7, it can be observed that by using the ARIMA (6, 1, 6) model, we get the 95% and 90% confidence interval values, with the dark blue showing the 95% confidence interval values and the light blue showing the 90% confidence interval values. It can be noticed that the fitted values of this model closely follow the original series of data, which indicates that this model is efficient, within a given confidence interval, at forecasting the daily death cases of COVID-19 in Pakistan. Our results contradict those obtained by Shareef et al. [4].
We then applied the machine learning MLP model to predict the death cases of COVID-19. To achieve this, we varied the number of hidden layers to find the optimum estimates. Table 3 shows the different candidate models of the MLP.
From Table 3, we found that the MLP model with 20 hidden layers outperforms the other candidate MLP models. It is interesting to note that as we increase the number of hidden or intermediate layers, the KPI decreases with optimum efficiency. However, increasing the hidden layers must be done with caution, as the model may not remain efficient beyond some fixed number of hidden layers. Figure 8 shows the fitted versus the original values, while Figure 9 shows the forecasted values for the MLP with 20 hidden layers. From the figures, we can observe that the MLP model gives us multiple horizon forecasts, as it indicates that the series can behave in many but limited directions. Furthermore, the residual plot indicates that the model fits the data very efficiently and can forecast future values efficiently. Additionally, it can also be noticed

Conclusion
The COVID-19 death cases in Pakistan have been analyzed using the classical time series ARIMA model and the machine learning MLP model. Different candidate models of both types were applied and compared using different KPIs. The KPIs used, which have been frequently used in numerous classical and machine learning time series modeling studies, pointed to the fact that the MLP model with 20 hidden layers outperforms all other competing models for modeling and prediction purposes. It must be noted that increasing the hidden layers should be done with caution, as the model may not remain efficient beyond some fixed number of hidden layers. The MLP model was then used to forecast COVID-19 confirmed deaths in Pakistan.
This will, in the long run, help authorities to put in place strategic prevention measures and mechanisms to curtail the death cases in the country. It will also help authorities to ascertain the intensity of the pandemic in the future. Although there is a strong campaign for vaccination, people should be encouraged to take vaccination seriously. It is the responsibility of the Government of Pakistan and the whole society to make the vaccination process successful.

Data Availability
Daily confirmed COVID-19 data from February 25, 2020, to March 23, 2022, provided by the Pakistan Ministry of National Health Services, Regulation and Coordination and the Government of Pakistan, were used for this study (https://covid.gov.pk).