COVID-19 Pandemic Forecasting Using CNN-LSTM: A Hybrid Approach

COVID-19 has sparked a worldwide pandemic, with the number of infected cases and deaths rising steadily. Along with recent advances in soft computing technology, researchers are now actively developing and enhancing different mathematical and machine-learning algorithms to forecast the future trend of this pandemic. If the trend of cases can be forecast accurately at a global level, the spread of the pandemic can be controlled. In this study, a hybrid CNN-LSTM model was developed on a time-series dataset to forecast the number of confirmed cases of COVID-19. The proposed model was evaluated and compared with 17 baseline models on test and forecast data. The primary finding of this research is that the proposed CNN-LSTM model outperformed them all, with the lowest average MAPE, RMSE, and RRMSE values on both test and forecast data. In conclusion, our experimental results show that, while standalone CNN and LSTM models provide acceptable and efficient forecasting performance for the confirmed COVID-19 cases time series, combining both models in the proposed CNN-LSTM encoder-decoder structure provides a significant boost in forecasting performance. Furthermore, we demonstrated that the proposed model produced satisfactory forecasting results even with a small amount of data.


Introduction
The year 2020 witnessed the global spread of the coronavirus disease (COVID-19) pandemic [1]. As of September 29, 2020, the virus had infected over 33.2 million people and had killed over 1 million in more than 216 countries [2]. COVID-19 was first discovered by Chinese authorities in Wuhan City, China, on January 7, 2020, as the cause of a new type of pneumonia [1]. It was then identified as a member of the zoonotic coronavirus family [3]. COVID-19, which is highly infectious, is quickly transmitted through close human-to-human contact. Therefore, to minimize the number of infected cases, many countries followed procedures that included quarantine, online schools and businesses, and bans on travel [4,5].
Given the severity of the disease, identifying the COVID-19 spread rate is vital for governments. By knowing the spread rate at a given time, governments can act accordingly by planning public health measures and forming policies and strategies to minimize the consequences of COVID-19 [6,7].
This can be achieved by performing COVID-19 tests on a large scale. However, as of April 23, 2020, no country was able to test more than 13.4% of its population [8]. Another way to identify the COVID-19 spread rate is to accurately predict the number of active cases at a specified time. However, COVID-19 cases are increasing exponentially, and the data are nonlinear and nonstationary. Therefore, predicting the epidemic's future is challenging. As a result, the situation demands an efficient model with the highest possible accuracy [9].
Unfortunately, statistical techniques generally make assumptions about the stationarity and linear correlation of historical data, whilst machine-learning methods seem incapable of detecting and capturing the nonlinear and complex behavior of COVID-19 time series. Hence, none of the previous studies could ensure the creation of an accurate and robust COVID-19 forecasting model, as they reported moderately poor outcomes [27] or their predictions did not align with real data [28,29]. In addition, according to [18], the current models have several defects, including being linear, nontemporal, and based on several assumptions. Moreover, they seem unable to deal with noisy and chaotic time series data.
On the other hand, deep learning techniques have lately been applied effectively to a range of difficult real-world prediction problems, including time-series forecasting [30][31][32]. Deep learning is often regarded as the most effective technique for dealing with the noisy and chaotic character of time series forecasting problems, since it produces more accurate forecasts. One of the most efficient and extensively utilized deep learning approaches is long short-term memory (LSTM). LSTM has been applied by several studies to forecast COVID-19 cases [14,17,18,21,25]. LSTM models can easily capture sequence pattern information, but they are tailored to deal with temporal correlations and only use the features specified in the training set. Another popular deep learning method is the convolutional neural network (CNN). CNN models are capable of filtering out noise in the input data and extracting more valuable knowledge for the final forecasting model. While vanilla CNNs are well suited to handling spatially autocorrelated data, they are seldom adapted to handle complicated and lengthy temporal dependencies [33]. As a result, a time-series model that takes advantage of both deep learning techniques, i.e., LSTM and CNN, could enhance forecast accuracy.
The primary goal of this study was to aid in the accurate forecasting of COVID-19 confirmed cases. We therefore proposed a hybrid CNN-LSTM forecasting model. Seventeen baseline predictive machine-learning models were also built in this study for comparison with our proposed model.
The key contributions of this study are as follows: (i) A hybrid CNN-LSTM model was proposed to combine the advantages of the CNN model, which is effective at filtering out noise in the input data, obtaining valuable information, and learning the internal representation of the time series, with the benefits of the LSTM model, which is effective at identifying and modeling short- and long-term temporal dependencies embedded in the data sequence. (ii) On test and forecast data, the proposed model was assessed and compared to 17 baseline models. The findings indicate that our proposed CNN-LSTM model beats the other 17 models in predicting new confirmed cases, with the lowest error value. In terms of RRMSE, the proposed hybrid model outperforms the standalone CNN-1D and LSTM models by 1.15 percent and 3 percent, respectively. This result demonstrates the efficacy of merging CNN-1D and LSTM models, even with a small amount of data.
The remaining sections of this manuscript are organized as follows: Section 2 reviews the related literature, Section 3 explains the materials and methods used in this research, Section 4 reports the main results and compares the forecasted trend to the actual trend, Section 5 reports the threats to validity, and Section 6 summarizes the study and suggests future work.

Related Works
Time-series prediction is a forecasting method that analyses historical data to capture the relationships and trends of a random variable. These are then applied to forecast the value of that random variable in the future [34]. This method is particularly useful if the underlying data-generating distribution or process is unknown or if there is no explanatory model capable of precisely linking the prediction variable with other explanatory variables. A great deal of research effort has gone into the construction and advancement of time series forecasting techniques over the last several decades. The following paragraphs summarize several studies that present models for forecasting COVID-19 cases.
Many researchers have employed standard forecasting methods with statistical modeling to predict the COVID-19 outbreak [9]. For example, Ceylan [16] used ARIMA techniques to forecast the pattern of COVID-19 prevalence in France, Spain, and Italy from 21/2/2020 to 15/4/2020, using data from the World Health Organization (WHO) website. Several ARIMA regressors were built using various ARIMA parameters.
They chose three different ARIMA regressors to predict the spread of COVID-19 for the three selected countries based on the lowest MAPE values. ARIMA (0, 2, 1) was found to be the best model for Italy with MAPE = 4.7520, ARIMA (1, 2, 0) for Spain with MAPE = 5.8486, and ARIMA (0, 2, 1) for France with MAPE = 5.6335. Another study performed by Roy et al. [19] also used ARIMA to forecast the epidemiologic trend in the prevalence and incidence of COVID-19 using an Indian dataset from 30/1/2020 to 26/4/2020. ARIMA (2, 2, 2) was found to be the most reliable model for predicting COVID-19 events, with RMSE = 95.322 and MAE = 50.109. However, because COVID-19 data are nonlinear and nonstationary, the ARIMA model is not optimal in such cases [35].
Journal of Control Science and Engineering
This also applies to other statistical approaches. Statistical analysis can be used to model known data. However, it struggles to capture the complexities of the analyzed data when dealing with extremely complex models [10]. As a result, statistical tools are deemed inadequate for analyzing pandemic unpredictability, and the generated models are difficult to generalize [9].
Alternatively, algorithms based on artificial intelligence (AI) learn from historical data to forecast future results. Machine-learning and deep-learning algorithms are two types of AI algorithms. Machine learning is a field focused on computer algorithms that learn and improve on their own. Machine-learning-based forecasting regressors adjust their parameters to match their forecasts to the actual data. Some related studies that used machine-learning algorithms to forecast the spread of COVID-19 are discussed in the following paragraphs.
Car et al. [10] implemented an MLP-ANN to forecast the number of COVID-19 deceased, recovered, and infected cases worldwide using a dataset taken from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) from 22/1/2020 to 12/3/2020. Their best models used the ReLU activation function and had four hidden layers, each with four neurons, with a coefficient of determination ($R^2$) of 0.98599 for infected patients, 0.99429 for deceased patients, and 0.97941 for recovered patients. Salgotra et al. [13] used genetic programming (GP) to model the possible effects of COVID-19 on confirmed and death cases in 15 of the world's most affected countries between January 2020 and May 2020. They found that GP performed very well, with RMSE and $R^2$ values close to 1. However, most studies that relied primarily on machine-learning models experienced underfitting or overfitting issues [36], were limited to retrospective analysis, or only projected short-term trends due to noisy time series data or a lack of training data and appropriate features [37][38][39].
To address the aforementioned problems, time series forecasting has lately incorporated deep learning algorithms [30][31][32], resulting in more accurate predictions. As one of the most successful deep learning methods, LSTM has been utilized to predict COVID-19 cases in many studies [14,17,18,21,25]. These studies revealed that LSTM models can easily capture sequence pattern information, but they are tailored to deal with temporal correlations and only use the features provided in the training set. The convolutional neural network (CNN) is another well-known deep learning method and has also been applied to forecasting COVID-19 cases [40][41][42]. Results from these studies showed that CNNs are excellent at filtering out noise in input data and extracting more beneficial features for the final forecasting model. Although standard CNNs are well suited to dealing with spatially autocorrelated data, they are seldom adapted to cope with complicated and lengthy temporal dependencies [33]. Consequently, a time-series forecasting model that employs both deep learning methods, namely, LSTM and CNN, may improve prediction accuracy. A study by [43] used a hybrid CNN-LSTM model to determine whether individuals had COVID-19 based on lung ultrasound. The hybrid method provided the highest levels of accuracy, recall, and AUC. To the best of our knowledge, however, the CNN-LSTM technique has not been tested on COVID-19 time series data. Therefore, in this study, we proposed using the hybrid CNN-LSTM model to predict the number of COVID-19 infected patients across the globe.

Materials and Methods
This section provides information on the study's materials and procedures.

Materials.
The data for this analysis came from the WHO COVID-19 dashboard [2]. It contains daily information on coronavirus cases in each country, such as the number of confirmed cases, deaths, cumulative confirmed cases, and cumulative deaths (identified by country name, country code, and WHO region), from the beginning of the COVID-19 infections (4/1/2020) to 24/9/2020. The dataset contained 62,510 records for 216 different countries and 265 days, totaling 31,798,308 new cases and 973,653 deaths at the time this analysis was conducted. Figures 1 and 2 show the spatial distribution of accumulated cases at three different time stamps. The global scope of COVID-19 confirmed cases and deaths is depicted in Figure 3.

Methods.
The proposed research method consists of three phases: preparing time series data, building the predictive models, and applying the predictive model. We used the time series "New COVID-19 cases." The first phase, "Preparing time series data," consists of three steps: convert the dataset into a time series, normalize the time series data, and split the time series data. The second phase, "Building the predictive model," consists of three steps: optimize the models, train the models, and evaluate the models. The models were optimized to find the best hyperparameters. The models were then trained using the best hyperparameters on a training set, whose time series starts on January 4, 2020, when the first case of COVID-19 was recorded, and ends on July 17, 2020. The trained models were then evaluated on a test set, whose time series starts on July 18, 2020 and ends on August 14, 2020. The forecast for the period between July 18, 2020 and August 14, 2020 was estimated and compared with the real values. The third phase, "Applying the predictive model," uses the COVID-19 data starting on August 15, 2020 and ending on September 18, 2020. Then, the forecast for the period between September 12, 2020 and September 18, 2020 was estimated and compared with the real values. The second and third phases were applied 18 times, once per predictive model (1 proposed model and 17 baseline models). Figure 4 depicts a high-level description of the entire procedure.

Phase 1: Preparing Time Series Data
Step 1.1: Convert dataset into time series. In this study, we used the "New_cases" (COVID-19 confirmed cases) dataset. To make the dataset a time series, we passed the date column as parse_dates when opening the CSV file. The loaded dataset was resampled and grouped by day using the resample() function with the argument "D". The total of 265 daily values was divided into weeks, which gives 37 weeks of data.
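The conversion described in Step 1.1 can be sketched as follows. This is a minimal illustration, not the study's code: the in-memory CSV, the column names `Date_reported` and `Country`, and the helper `to_weeks` are assumptions made for the example, and dropping the trailing days that do not fill a whole week is likewise an assumed convention.

```python
import io

import numpy as np
import pandas as pd

# Toy stand-in for the WHO CSV file; column names other than New_cases are assumed.
csv_text = """Date_reported,Country,New_cases
2020-01-04,A,1
2020-01-04,B,0
2020-01-05,A,3
2020-01-07,B,5
"""

# Parse the date column and use it as the index so the frame becomes a time series.
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Date_reported"],
    index_col="Date_reported",
)

# Group by calendar day ("D") and sum over all countries; days with no rows sum to 0.
daily = df["New_cases"].resample("D").sum()


def to_weeks(values):
    """Hypothetical helper: keep only whole weeks and reshape to (n_weeks, 7)."""
    values = np.asarray(values)
    n_weeks = len(values) // 7
    return values[: n_weeks * 7].reshape(n_weeks, 7)


# With 265 daily values, 37 whole weeks remain.
weekly_shape = to_weeks(np.arange(265)).shape
```

Summing per day collapses the per-country records into one global series, which matches the worldwide scope of the forecast.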
Step 1.2: Normalize time series data. The time-series data of COVID-19 confirmed cases were normalized to the range [0, 1] using min-max normalization. This normalization must be reversed after predicting the COVID-19 time series on the testing data, so that the predicted data are on the same scale as the original testing time-series data.
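Min-max normalization and its inverse can be illustrated with a short NumPy sketch. The function names and the toy values are hypothetical; the point is only that the minimum and maximum used for scaling must be kept so that predictions can be mapped back to case counts.

```python
import numpy as np


def minmax_fit(values):
    """Return the (min, max) pair needed to scale and later unscale the data."""
    values = np.asarray(values, dtype=float)
    return float(values.min()), float(values.max())


def minmax_transform(values, lo, hi):
    """Scale values linearly so that lo maps to 0 and hi maps to 1."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)


def minmax_inverse(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo


cases = np.array([100.0, 250.0, 400.0])  # toy daily case counts
lo, hi = minmax_fit(cases)
scaled = minmax_transform(cases, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
```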
Step 1.3: Split time series data. To prepare the time series for model development, the 37 weeks of normalized time series data were divided sequentially into training, test, and forecast sets using the NumPy split() function. Table 1 shows the details of the output of the splitting procedure, while Figure 5 visualizes them as a graph.
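A sequential split with `numpy.split` might look like the sketch below. The split points (28 and 32 weeks) are an inference from the date ranges stated in the Methods section (January 4 to July 17 is 196 days, July 18 to August 14 is 28 days, and August 15 to September 18 is 35 days), not values taken from Table 1, and the placeholder data stand in for the normalized series.

```python
import numpy as np

# Placeholder for the normalized series, arranged as 37 weeks of 7 days.
weeks = np.arange(37 * 7, dtype=float).reshape(37, 7)

# Assumed split points: weeks 0-27 train, 28-31 test, 32-36 forecast.
train, test, forecast = np.split(weeks, [28, 32])
```

Because the split is by position rather than at random, the temporal order of the weeks is preserved, which is what a forecasting evaluation requires.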

Phase 2: Building the Predictive Models
In this study, we proposed a CNN-LSTM encoder-decoder model. The proposed model was built together with 17 baseline models.
(1) Proposed Model. A hybrid deep learning architecture called CNN-LSTM was proposed to combine the ability of the CNN model to obtain valuable knowledge and learn the internal representation of the time series with the efficacy of the LSTM model at detecting and modeling short- and long-term temporal dependencies embedded in the data sequence.
To do this, our proposed model, CNN-LSTM, is composed of two primary components. The first component, a one-dimensional CNN, is composed of convolutional and pooling layers that execute complex mathematical procedures on the input data to create features, while the second component makes use of the generated features through LSTM and dense layers.
Convolutional and pooling layers [44] function as purpose-built data preprocessing layers, filtering the incoming data and extracting important information to be used as input to a fully connected network layer. The convolutional layers perform convolution between the raw input data and convolution kernels, resulting in the creation of new feature values. Because this method was initially designed to extract features from image datasets, the input data must be in organized matrix form [45]. The convolution kernel can be thought of as a narrow window containing coefficient values in matrix form. This window moves across the input matrix, executing convolution on each subregion it covers. These operations result in a convolved matrix that represents a feature value defined by the coefficient values and the filter size. By applying various convolution kernels to the input data, numerous convolved features may be produced, which are often more valuable than the original features of the input data, thus improving the performance of the model.
Following the convolutional layers, a nonlinear activation function (e.g., a rectified linear unit) is typically applied, followed by a pooling layer. This layer is a subsampling technique that removes certain values from the convolved features and creates a matrix with a smaller dimension. As in the convolutional layer, the pooling layer employs a small sliding window that takes the values of each patch of the convolved features as input and outputs a single new value determined by the pooling operation. For instance, max pooling and average pooling compute the maximum and the average value of each patch, respectively. Consequently, the pooling layer generates new matrices that may be thought of as summarized versions of the convolutional layer's convolved features. The pooling process may contribute to the system's robustness by ensuring that minor changes in the input do not affect the pooled output values.
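The sliding-window operations described above can be made concrete with a small one-dimensional example. This is an illustrative sketch only: the input values and the kernel are made up, and, as is customary in deep learning libraries, the "convolution" is computed without flipping the kernel (that is, as a cross-correlation).

```python
import numpy as np


def conv1d_valid(x, kernel):
    """Slide the kernel over x and take a dot product at each position."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])


def relu(x):
    """Rectified linear unit: negative values become zero."""
    return np.maximum(x, 0.0)


def max_pool1d(x, pool_size=2):
    """Keep the maximum of each non-overlapping patch; a trailing remainder is dropped."""
    n = len(x) // pool_size
    return x[: n * pool_size].reshape(n, pool_size).max(axis=1)


x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0])  # 7 toy input values
kernel = np.array([-1.0, 0.0, 1.0])                # illustrative kernel of size 3

features = relu(conv1d_valid(x, kernel))   # 5 convolved feature values
pooled = max_pool1d(features, pool_size=2)  # 2 pooled values
```

A kernel of size 3 over 7 inputs yields 5 feature values, and pooling with a window of 2 then summarizes them into 2 values, which shows how each stage shortens the sequence.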
LSTM [46] is a subclass of recurrent neural networks (RNNs) that can learn long-term dependencies through feedback connections. Traditional RNNs seek to resolve the issue associated with feedforward neural networks, referred to as "loss of memory," which results in low performance on sequence and time-series problems. These models make use of cyclic connections in their hidden layer to develop short-term memory and are able to extract knowledge from time series and sequence data. Nonetheless, RNNs are constrained by the well-known vanishing gradient problem, which prevents the model from learning long-range dependencies. LSTMs address this issue by keeping valuable information in memory cells and erasing irrelevant data, resulting in a performance that is generally superior to that of a conventional RNN.
Every LSTM unit has a memory cell and three primary gates: input, output, and forget. This structure enables the LSTM to maintain a regulated information flow by determining which information must be "forgotten" and which must be "remembered," thereby learning long-term dependencies. More precisely, the input gate $i_t$, in conjunction with a second gate $\tilde{c}_t$, regulates the volume of new knowledge saved in the memory state $c_t$ at time $t$. The forget gate $f_t$ determines whether knowledge from the past must be deleted or retained on the memory cell at time $t-1$, while the output gate $o_t$ determines which knowledge may be used for the memory cell's output. Equations (2)-(5) summarize the activities of an LSTM unit.
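In the standard textbook formulation, the gate and memory updates take the following form. This is given as a sketch under the assumption that the model follows the usual LSTM definition; the exact form and numbering of the original equations may differ.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right), \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right), \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right),
\end{aligned}
```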
where $x_t$ represents the input, $W_*$ and $U_*$ signify the weight matrices, $b_*$ signifies the bias term vectors, $\sigma$ signifies the sigmoid function, and the operator $\odot$ represents componentwise multiplication. Finally, the hidden state $h_t$, which corresponds to the memory cell's output, is computed by
$$h_t = o_t \odot \tanh(c_t).$$
In general, the CNN-LSTM method uses the CNN as an encoder to learn features from subsequences of input data that are fed into an LSTM as time steps. The LSTM functions as a decoder, identifying and modeling both short- and long-term temporal relationships inherent in the data stream. Figure 6 illustrates the structure of our proposed CNN-LSTM architecture, which combines CNN and LSTM to create a deep hybrid architecture.
A brief description of each layer is presented in the following:
Input layer: Receives an input of 7 days of COVID-19 confirmed cases.
First Conv1D layer: The first convolutional layer scans through the input sequence, acquires new information, and deals with noise in the input data, before projecting the findings onto feature maps.
Second Conv1D layer: The second layer repeats the process on the feature maps generated by the first, trying to enhance any noteworthy features. We utilized 64 feature maps per convolutional layer and a kernel size of 3 time steps to read the input sequences.
Max pooling layer: By removing specific values from the convolved features, the max pooling layer simplifies the feature maps and produces a matrix with a smaller dimension.
Dropout layer: This layer was added to the network to prevent the model from overfitting. Because dropout randomly subsamples the outputs of a layer, it has the effect of reducing the capacity of, or thinning, the network during training.
Flatten layer: Following the dropout layer, the distilled feature maps are flattened into a single long vector that can be used as input to the decoding process.
Repeat Vector layer: The internal representation of the input sequence is repeated several times, once for each time step in the output sequence. This sequence of vectors is presented to the LSTM decoder.
LSTM layer: The decoder is defined as an LSTM hidden layer with 200 units. Notably, the decoder outputs the whole sequence, with each of the 200 units supplying a value for each of the seven days, serving as the basis for predicting what happens on each day in the output sequence.
Fully connected layer: Before the final output layer, a fully connected layer is used to interpret each time step in the output sequence. Notably, the output layer makes a prediction for a single step in the output sequence. This means that the same layers are applied to every step in the output sequence; that is, the decoder processes every time step using the same fully connected layer and output layer. This was done by wrapping the interpretation and output layers in a TimeDistributed wrapper, which is applied to each time step from the decoder. This allows the LSTM decoder to work out the context needed for every step in the output sequence, while the wrapped dense layers interpret every time step individually, reusing the same weights.
Output layer: The number of new COVID-19 cases for the 8th day was predicted.
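The layer stack described above could be expressed in Keras roughly as follows. This is a hedged sketch, not the authors' implementation: the text does not state which library was used, and the pooling size, dropout rate, width of the fully connected layer, activations, optimizer, loss, and the seven-step output length are all assumptions chosen for illustration (the tuned values would come from Table 2). Only the 64 feature maps, the kernel size of 3, the 7-day input, and the 200 LSTM units are taken from the description.

```python
from tensorflow.keras import layers, models


def build_cnn_lstm(n_steps_in=7, n_features=1, n_steps_out=7,
                   pool_size=2, dropout_rate=0.2, dense_units=100):
    """Hypothetical CNN-LSTM encoder-decoder; default argument values are assumed."""
    model = models.Sequential([
        layers.Input(shape=(n_steps_in, n_features)),
        # Encoder: two Conv1D layers with 64 feature maps and a kernel of 3 steps.
        layers.Conv1D(64, 3, activation="relu"),
        layers.Conv1D(64, 3, activation="relu"),
        layers.MaxPooling1D(pool_size=pool_size),
        layers.Dropout(dropout_rate),
        layers.Flatten(),
        # Repeat the encoded vector once per output time step.
        layers.RepeatVector(n_steps_out),
        # Decoder: 200-unit LSTM that returns the whole output sequence.
        layers.LSTM(200, return_sequences=True),
        # The same dense layers are applied to every decoder time step.
        layers.TimeDistributed(layers.Dense(dense_units, activation="relu")),
        layers.TimeDistributed(layers.Dense(1)),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


model = build_cnn_lstm()
```

With a 7-step input, the two convolutions shorten the sequence to 5 and then 3 steps, pooling reduces it to 1, and the flattened 64-value vector is repeated for each decoder step, so the model output has shape (batch, 7, 1).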
(2) Baseline Models. Seventeen baseline predictive models were also built in this study for comparison with our proposed model: 2 deep learning models (CNN and LSTM), 2 statistical models (ARIMA and Fbprophet), 3 linear models (LR, Ridge, and Lasso), 5 ensemble models (AdaBoost Regressor, Random Forest Regressor, Gradient Boosting Regressor, Extra Trees Regressor, and Bagging Regressor), and 5 other machine-learning models (XGBoost Regressor, Gaussian Process Regressor, Support Vector Regressor, Decision Trees Regressor, and K-Nearest Neighbor Regressor). All models were built using Python (Jupyter Notebook) on the Anaconda platform. A brief description of each model is presented in the following subsections.
(i) Deep Learning Models.
CNN: LeCun et al. [47] pioneered the use of convolutional networks in their current form for zip code recognition. CNNs are typically composed of convolutional, pooling, and fully connected layers. A collection of feature maps, also known as activation maps, is generated in the convolutional layers. Each neuron in a feature map is linked to a corresponding subset of neurons in the input layer. The neurons of a feature map all share the same weights, substantially decreasing the number of parameters compared to a fully connected neural network. Pooling layers alternate with convolutional layers in the most popular CNN designs. The pooling layer lowers the spatial dimension of the feature maps in preparation for the subsequent computational stages, thus reducing the computational burden and helping to avoid overfitting. After an arbitrary number of preceding layers, fully connected layers aggregate the resultant feature maps and provide a classification measure at the end of the network.
LSTM: Hochreiter and Schmidhuber [46] proposed the LSTM model, which Graves and Schmidhuber [48] improved and popularized. The memory-based RNN cell is at the heart of the LSTM's basic structure. It is useful for storing and retrieving information from the past. It also facilitates the transmission of previous information to the next step. The LSTM selects previous data based on what it learns during training. Retaining useful information over time is the essential behavior of the LSTM network [49]. As data pass through the model, the cells in the LSTM evaluate them; information that conforms to the learned rules is retained, while information that does not is discarded. This mechanism addresses the long-sequence dependence problem in neural networks by enhancing the hidden layer structure.
(ii) Statistical Models.
ARIMA: Presented by Box and Jenkins, the ARIMA model is specified by three parameters (p, d, q), where p stands for the autoregression order (AR), d for the degree of differencing (I), and q for the moving average order (MA) [51]. These three parameters characterize the time series: p is in charge of storing and retrieving the process's past information, d is in charge of converting a nonstationary time series into a stationary one, and q is in charge of regulating the process's noise-related past information.
Fbprophet: Fbprophet is a Facebook-developed time series forecasting model that was created to solve business time series problems. While there are numerous approaches to forecasting market outcomes, many of them share common characteristics, such as seasonal effects [51]. It employs a decomposable time-series model [52] with 3 main components: trend, seasonality, and holidays. It is a regression model with interpretable parameters that can be fitted with default values, and it allows the user to choose the components that are pertinent to their forecasting problem and make the appropriate adjustments with ease [51]. Fbprophet models trends using two methods: a saturating growth model and a linear model. For growth forecasting, a model similar to the population growth model in natural ecosystems is used, in which nonlinear growth saturates at a carrying capacity [53]. If the saturation point is never reached, a piecewise model with a constant growth rate can be an alternative solution. Fbprophet employs Fourier series to provide a flexible model of periodic effects [52], whereas holidays must be accounted for using a predetermined list of past and future events. It is simple to incorporate holiday effects into the model because they are considered independent [51].
(iii) Linear Models.
Linear Regression: LR is a technique for modeling the relationship between a dependent variable and one or more independent variables using a linear approach.
Ridge: Ridge regression (RR) is a kind of linear regression that uses a sum-of-squares error function and a regularization method to manage the bias-variance trade-off, with the goal of revealing the linear patterns hidden in the data [54].
Lasso: Lasso [55] is a technique for shrinking and selecting variables in linear regression. It adds the sum of the $\ell_1$-norms of the regression coefficients as a penalty to the minimization of the sum of squared errors. This bias-variance trade-off often results in improved forecasts.
(iv) Ensemble Models.
XGBoost Regression: XGBoost is a gradient boosting machine introduced by Tianqi Chen in 2016 [56]. The algorithm's implementation has been optimized for efficiency in processing time and memory usage. XGBoost is usually very fast when compared to other gradient boosting implementations. For classification and regression predictive modeling problems, XGBoost dominates on structured or tabular datasets. As a supervised learning algorithm, XGBoost predicts a target using training data with several features. Although XGBoost follows the same process as gradient boosting, it builds its own type of tree. The initial prediction value distinguishes regular and extreme gradient boosting in regression. In gradient boosting, the initial prediction value is the average of the real values of the target to be predicted. In XGBoost, the initial prediction value can be set arbitrarily, with 0.5 being the most commonly used value.
AdaBoost Regressor: The AdaBoost Regressor converts weak regression learners into a strong regression learner, from which the prediction model is built [57].
Random Forest Regressor (RFR): The RFR method is similar to Breiman's pioneering Regression Tree Analysis [58]. RFR isolates the predictand (desired parameter) iteratively using a series of binary splits. Each of these splits corresponds to a value on an individual predictor grid that maximizes the disparities between the branches of the "tree." A split, together with its associated branches, is regarded as constituting a single decision tree. Each branch is constructed using a random selection of nodes representing individual predictors. Each predictor node has a large number of potential predictands, and it is at these nodes that a random choice is made to divide the branch further, thus adding two additional predictors. This is repeated until no further splits occur, resulting in terminal nodes, or "leaves." Typically, the RFR performs binary splits until a single predictor on a leaf is found.
Gradient Boosting Regressor: Friedman [59] developed the gradient boosting regressor (GBR) in 2000. The GBR combines a large number of weak learners to create a stronger learning algorithm. Each new learner learns from the errors of the earlier learners.
Extra Trees Regressor: Like the random forest, the extra trees regressor (ETR) is built using many decision trees [60]. In ETR, every decision tree is trained on the entire training set. To obtain the split values, the ETR model splits the decision tree nodes at random.
Bagging Regressor: Bagging is a kind of parallel ensemble model that is constructed using bootstrap sampling [61]. That is, given an initial dataset of m samples, a sample is drawn for the sampling set with replacement, so that it may be drawn again in a subsequent sampling round. Then, by iterating bootstrap sampling n times, n sampling sets are produced. During the sampling process, some of the samples in the original dataset may be selected many times, while others may never be chosen. A DT model is then trained on each of the n sampling sets as a weak learner, and the final prediction is reached through majority vote for a classification problem or averaging for a regression problem.
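The bootstrap-and-average idea can be sketched in a few lines of NumPy. To keep the example short, the weak learner here is a hypothetical stand-in that simply predicts the mean of its bootstrap sample rather than a decision tree; the function names and toy data are likewise assumptions.

```python
import numpy as np


def bootstrap_sets(n_samples, n_sets, rng):
    """Draw n_sets index arrays, each sampled with replacement from range(n_samples)."""
    return [rng.integers(0, n_samples, size=n_samples) for _ in range(n_sets)]


def fit_mean_learner(y_sample):
    """Stand-in weak learner (not a decision tree): predicts its sample mean."""
    mean = float(np.mean(y_sample))
    return lambda: mean


rng = np.random.default_rng(0)
y = np.array([10.0, 12.0, 9.0, 15.0, 11.0, 14.0])  # toy target values

# One bootstrap sampling set per weak learner.
index_sets = bootstrap_sets(len(y), n_sets=5, rng=rng)
learners = [fit_mean_learner(y[idx]) for idx in index_sets]

# Regression: the ensemble prediction is the average of the learners' predictions.
bagged_prediction = float(np.mean([predict() for predict in learners]))
```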
(v) Machine-Learning Models.
Gaussian Process Regressor: Gaussian process regression (GPR) is a very effective method. In addition to their simple structure and computationally tractable predictions, GPRs have the major advantage of being nonparametric and able to quantify the uncertainty of predicted values.
Support Vector Regressor (SVR): SVR is a regression algorithm that applies the principles of support vector machines (SVMs) to regression analysis [62]. SVR offers the freedom to define how much error is acceptable in a model and finds an appropriate line to fit the data. SVR's objective is to minimize the coefficients, in particular the $\ell_2$-norm of the coefficient vector. The error term is handled in the constraints, where the absolute error is required to be less than or equal to a specified margin, known as epsilon (the maximum error).
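For a linear model, the objective and constraint just described can be written as the following optimization problem. The symbols $w$ (coefficient vector), $b$ (intercept), $x_i$ and $y_i$ (the $i$-th input and target), and $n$ (number of samples) are notation introduced here for illustration, and the sketch shows the hard-margin form without slack variables.

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert_2^2
\quad \text{subject to} \quad
\left\lvert y_i - \left(w^\top x_i + b\right) \right\rvert \le \varepsilon,
\qquad i = 1, \dots, n.
```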
Decision Trees Regressor (DT): A DT splits the dataset at each node into two branches and recursively builds a tree-like model using the information gain (IG). The gain specifies the expected decrease in entropy associated with the chosen feature.
K-Nearest Neighbor Regressor: The K-nearest neighbor regressor (KNNR) is a nonparametric model that takes the K nearest neighbors of a sample and predicts the sample's value from the neighbors' response values (y).
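A minimal sketch of this prediction rule is shown below. It assumes a one-dimensional feature, absolute distance, and an unweighted mean of the neighbors' response values; these choices and the toy data are illustrative, since the text does not specify the distance or the aggregation.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict the response at `query` as the mean response of its k nearest neighbors."""
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy data (hypothetical): the outlier at x = 10 is ignored for queries near 2.
train_x = [1, 2, 3, 10]
train_y = [1.0, 2.0, 3.0, 10.0]
prediction = knn_predict(train_x, train_y, query=2, k=3)
print(prediction)
```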
Step 2.1: Optimize models. Since selecting the best and most accurate forecasting model for predicting the COVID-19 pandemic is a very complicated process, the 18 forecasting models constructed in this study were fine-tuned under various prediction criteria, using the Optuna framework to optimize the selected hyperparameters of each model. Hyperparameter optimization is the task of selecting the best set of hyperparameters for a learning algorithm. The output of any machine-learning algorithm is heavily influenced by its hyperparameters. It is one of the most time-consuming steps in the machine-learning training pipeline, but it is also one of the most important. Table 2 presents, for each forecasting model, the chosen hyperparameters, their ranges, and the best values found for forecasting confirmed COVID-19 cases.
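The general shape of a hyperparameter search can be sketched without any external library. The example below uses plain random search as a stand-in. It is not Optuna's API or its sampling algorithm, and the objective function, parameter name, and range are hypothetical.

```python
import random


def random_search(objective, search_space, n_trials=50, seed=0):
    """Sample each hyperparameter uniformly from its range and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in search_space.items()}
        score = objective(params)  # lower is better, e.g. a validation error
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical validation error with its minimum at learning_rate = 0.1."""
    return (params["learning_rate"] - 0.1) ** 2


best_params, best_score = random_search(toy_objective, {"learning_rate": (0.001, 1.0)})
print(best_params, best_score)
```

In practice the objective would train a model with the sampled hyperparameters and return its error on held-out data, which is why this step dominates the running time of the pipeline.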
Step 2.2: Train the models. The models were trained with the best hyperparameters on a training set, a time series that starts on January 4, 2020 and ends on July 17, 2020.
Step 2.3: Evaluate the models. The trained models were then evaluated on a test set, a time series that starts on July 18, 2020 and ends on August 14, 2020. From a forecasting perspective, the forecasting horizon is critical for an intelligent model's prediction accuracy. The forecasting horizon refers to the number of daily confirmed cases that a forecasting model considers when projecting the next daily confirmed case count. More precisely, when the forecasting horizon is equal to 9, the algorithm uses the confirmed cases from the previous nine days to estimate the number of cases on the tenth day. In this research, the forecasting horizon was set to seven days.
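The forecasting horizon described above corresponds to a sliding window over the series. A minimal sketch, assuming a plain list of daily counts and a hypothetical helper name `make_windows`:

```python
def make_windows(series, lookback=7):
    """Pair each run of `lookback` consecutive days with the value on the following day."""
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback])
    return inputs, targets


# Toy series (hypothetical daily counts): ten days give three 7-day windows.
daily_cases = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
inputs, targets = make_windows(daily_cases, lookback=7)
for window, target in zip(inputs, targets):
    print(window, "->", target)
```

With `lookback=7`, each model input holds the previous seven days and the target is the eighth day, matching the seven-day setting used in this research.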
To account for randomness, the training and prediction processes were run 10 times. The average forecast between July 18, 2020 and August 14, 2020 was computed and compared with the real values.
The performance of the models was then assessed on the test data using statistical error measures: MAPE, RMSE, and relative root mean square error (RRMSE).
MAPE: This metric is calculated as the average absolute percentage difference between the estimated and measured values.
RMSE: This metric is normally employed to evaluate the forecasting errors of different models. In absolute terms, the lower the RMSE value, the better a model's predictive capability. However, because the errors are squared, a few large errors lead to a much higher RMSE value.
RRMSE: This metric is computed by dividing the RMSE by the average value of the measured data. The accuracy of the model is deemed excellent when the RRMSE is less than 10 percent, good when the RRMSE is between 10 and 20 percent, fair when the RRMSE is between 20 and 30 percent, and poor when the RRMSE is greater than 30 percent [63].
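The three measures can be sketched in a few lines. The standard textbook definitions are assumed here, with MAPE and RRMSE expressed in percent; the study's own equations are not reproduced in this section, so the exact scaling is an assumption, as is the handling of the boundary values 20 and 30 in the rating function.

```python
from math import sqrt


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rrmse(actual, predicted):
    """RMSE divided by the mean of the measured data, in percent."""
    return 100.0 * rmse(actual, predicted) / (sum(actual) / len(actual))


def rrmse_rating(value):
    """Map an RRMSE percentage to the accuracy bands quoted from [63]."""
    if value < 10:
        return "excellent"
    if value <= 20:
        return "good"
    if value <= 30:
        return "fair"
    return "poor"


# Toy values (hypothetical), not data from the study.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(actual, predicted), rmse(actual, predicted), rrmse(actual, predicted))
print(rrmse_rating(rrmse(actual, predicted)))
```

Because RRMSE is normalised by the mean of the measured data, it can be compared across series of different magnitudes, which RMSE alone cannot.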
Since most of the models are stochastic in nature, we evaluated all 18 predictive models 10 times and reported the mean performance on the test time series.

Phase 3: Applying the Predictive Models
Step 3.1: Forecast COVID-19 new cases. This step requires COVID-19 data from August 15, 2020 to September 18, 2020. The new COVID-19 cases from August 15, 2020 until September 11, 2020 (4 weeks) were used as input to forecast the new cases for the subsequent week (September 12, 2020 to September 18, 2020).
Step 3.2: Evaluate the models. Then, the forecast for September 12, 2020 to September 18, 2020 was computed and compared with the real values.
The performance of the models was calculated based on MAPE, RMSE, and RRMSE.
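The date ranges used in Phases 2 and 3 can be checked with a short sketch. The dates come from the text; the helper and variable names are hypothetical.

```python
from datetime import date, timedelta


def date_range(start, end):
    """Return every calendar day from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


train_days = date_range(date(2020, 1, 4), date(2020, 7, 17))
test_days = date_range(date(2020, 7, 18), date(2020, 8, 14))
forecast_input_days = date_range(date(2020, 8, 15), date(2020, 9, 11))
forecast_target_days = date_range(date(2020, 9, 12), date(2020, 9, 18))

for name, days in [
    ("train", train_days),
    ("test", test_days),
    ("forecast input", forecast_input_days),
    ("forecast target", forecast_target_days),
]:
    print(f"{name}: {days[0]} to {days[-1]} ({len(days)} days)")
```

The test window and the forecast input window each cover four weeks, and the forecast target covers the single week that follows.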
To guide us in evaluating the results, the following research questions were constructed:
RQ1: How good is the performance of the proposed model in this study compared to the 17 baseline models in forecasting the new cases of the COVID-19 pandemic around the world based on the test data?
RQ2: How good is the performance of the proposed model in this study compared to the 17 baseline models in forecasting the new cases of the COVID-19 pandemic around the world based on the forecast data?
RQ3: How good is the performance of the proposed model in this study compared to the state-of-the-art?

Results and Discussion
The empirical results of this study are presented in this section to answer RQ1-RQ3.

RQ1 Answer: The Performance of the Proposed Model Compared to the Baseline Models on the Test Data.
The predicted and actual values between July 18, 2020 and August 14, 2020 were plotted in 5 different graphs according to the type of model. The values predicted by the proposed model were plotted in each graph to compare its trend with those of the baseline models. The figures show the comparison between actual and predicted data for each type of model, alongside the proposed model, in forecasting confirmed cases. From these figures, we observe that most of the models follow the trend of confirmed COVID-19 cases on the test data, except for the SVR model.
To evaluate the performance of each model quantitatively, the MAPE, RMSE, and RRMSE for the proposed model and the 17 baseline models in predicting the confirmed cases of COVID-19 were calculated using equations (7)-(9), respectively, based on the predicted and actual values between July 18, 2020 and August 14, 2020. The mean of each performance metric was then computed. An example of the computed RMSE for the proposed model compared to the CNN-1D and LSTM models is illustrated in Figure 12. Based on the mean values of MAPE, RMSE, and RRMSE shown in Table 3, the proposed model surpassed the 17 baseline models in forecasting the new confirmed cases with the minimum error values: MAPE = 0.19, RMSE = 13275.00, and RRMSE = 5.30. With an RRMSE value below 10%, the proposed model performed excellently in predicting the confirmed cases of COVID-19. The RRMSE values also show that, compared to the vanilla CNN-1D and LSTM models, combining both models in the CNN-LSTM structure improves performance, decreasing the RRMSE value by 1.04% and 1.9%, respectively.

RQ2 Answer: The Performance of the Proposed Approach Compared to the Baseline Models on the Forecast Data.
Similar to the processes conducted on the test data, the forecasted and actual values between September 12, 2020 and September 18, 2020 were plotted in 5 different graphs according to the type of model. The values forecasted by the proposed model were plotted in each graph to compare its trend with those of the baseline models. Figures 13-17 show the comparison between actual and forecasted data for each type of model, alongside the proposed model, in forecasting confirmed cases. From these figures, we again observe that most of the models follow the trend of confirmed COVID-19 cases on the forecast data, except for the SVR model.
To evaluate the performance of each model, the MAPE, RMSE, and RRMSE for the proposed model and the 17 baseline models in forecasting the confirmed cases of COVID-19 were calculated using equations (7)-(9), respectively, based on the forecasted and actual values between September 12, 2020 and September 18, 2020. The mean of each performance metric was then computed.
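An example of the computed RMSE for the proposed model compared to the standalone CNN-1D and LSTM models is illustrated in Figure 18. Based on the values of MAPE, RMSE, and RRMSE shown in Table 4, the proposed model again outperformed the 17 baseline models in forecasting the new confirmed cases with the minimum error values: MAPE = 0.43, RMSE = 8780.71, and RRMSE = 3.01. The RRMSE values show that the proposed hybrid model improves on the standalone CNN-1D and LSTM models, reducing the RRMSE value by 1.15% and 3%, respectively. This result shows the effectiveness of combining the CNN-1D and LSTM models even on a small amount of data.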

RQ3 Answer: The Performance of the Proposed Approach Compared to the State-of-the-Art Approach.
Table 5 compares the hybrid models developed in previous work with the one developed in the current study. As demonstrated in Table 5, our method performs better in terms of MAPE and RMSE when forecasting confirmed cases globally. Our approach achieves a lower MAPE and RMSE even though we examined 216 countries, whereas the earlier work examined only seven.

Threats to Construct Validity.
Threats to construct validity relate to the performance metrics used in our analysis. In this study, 3 evaluation metrics based on statistical measures of error were selected: RMSE, MAPE, and RRMSE. There are other measures, such as MAE, R², root mean squared relative error (RMSRE), and mean bias error (MBE), that can be used to evaluate time series forecasters. However, the 3 metrics selected in this study are widely used to evaluate time series forecasts of the spread of COVID-19.

Threats to Internal Validity.
These risks are primarily concerned with uncontrolled internal variables that may affect the results of the experiment. The key internal threat is possible faults in the implementation of our process. To reduce this risk, we used seven machine-learning techniques obtained from the scikit-learn library, one from the xgboost library, five from the ensemble library, one from the fbprophet library, two from the Keras library, and one from the Statsmodels library. The best hyperparameter values for each model were set automatically using the established Optuna framework.

Threats to External Validity.
External validity threats are related to the possibility of generalizing our findings. The experiments conducted in this study used the COVID-19 dataset from 4/1/2020 to 24/9/2020. The performance of the predictive models used in this study depends on how the dataset is split into train and test data. Different results can be generated by using different timelines of COVID-19 data.

Conclusions and Future Work
In this research, a novel hybrid forecasting model termed CNN-LSTM was presented for predicting the global number of COVID-19 infection cases. The proposed CNN-LSTM model was compared against 17 baseline models, including two deep-learning models, two statistical techniques, three linear models, five ensemble learning models, and five machine-learning models. Three performance measures were used to evaluate and compare forecasting performance: MAPE, RMSE, and RRMSE. The primary finding of this research is that, when compared to the 17 baseline time series forecasting models, our proposed CNN-LSTM model outperformed them all with the lowest average MAPE, RMSE, and RRMSE values on both test and forecast data. Finally, we note that although standalone CNN and LSTM models perform well and efficiently in forecasting the time series of confirmed COVID-19 cases, combining both models in the proposed CNN-LSTM encoder-decoder structure significantly improves forecasting performance. In addition, we demonstrated that the proposed model produced acceptable forecasting results even when only a limited quantity of data was available.
Ultimately, the proposed CNN-LSTM model takes an efficient step in dealing with noise in the input data and using the internal representation of the time series through convolutional layers. The LSTM and dense layers are then used to exploit the produced features to detect short- and long-term relationships in the time series and give a precise forecast. However, there is still room to improve the forecasting accuracy of the COVID-19 application. In the future, by adding more data and external factors to the COVID-19 datasets, such as changes in season, vaccination plans, and additional lockdowns, other resampling and restructuring forecasting methods will be used to further improve the accuracy of the COVID-19 forecasting system. In addition, an uncertainty management strategy should be developed in order to quantify uncertainty and provide users with more relevant information on the COVID-19 pandemic.

Table 1: Output of the splitting procedure.

Table 4, the proposed model again outperformed the 17 baseline models in forecasting the new confirmed cases with the minimum error value: MAPE = 0.43, RMSE = 8780.71, and RRMSE = 3.01. The RRMSE values show that the proposed hybrid model increases the performance of the standalone CNN-1D and LSTM models by reducing the RRMSE value by 1.15% and 3%, respectively.

Table 3: The average model performance evaluation for forecasting confirmed cases on test data.

Table 4: The average model performance evaluation for forecasting confirmed cases on forecast data. Bold values present the lowest error values of MAPE, RMSE, and RRMSE. These values show that the proposed approach outperforms the baseline models based on the forecast data.