Univariate Time Series Analysis of Short-Term Forecasting Horizons Using Artificial Neural Networks: The Case of Public Ambulance Emergency Preparedness

This study examined the applicability of artificial neural network models in modelling univariate time series ambulance demand for short-term forecasting horizons in Zimbabwe. Bulawayo City Councils’ ambulance services department was used as a case study. Two models, feed-forward neural network (FFNN) and seasonal autoregressive integrated moving average, (SARIMA) were developed using monthly historical data from 2010 to 2017 and compared against observed data for 2018. The mean absolute error (MAE), root mean square error (RMSE), and paired sample t-test were used as performance measures. Calculated performance measures for FFNN were MAE (94.0), RMSE (137.19), and the test statistic value p = 0:493(>0.05) whilst corresponding values for SARIMA were 105.71, 125.28, and p = 0:005(<0.05), respectively. Findings of this study suggest that the FFNN model is inclined to value estimation whilst the SARIMA model is directional with a linear pattern over time. Based on the performance measures, the parsimonious FFNN model was selected to predict short-term annual ambulance demand. Demand forecasts with FFNN for 2019 reflected the expected general trends in Bulawayo. The forecasts indicate high demand during the months of January, March, September, and December. Key ambulance logistic activities such as vehicle servicing, replenishment of essential equipment and drugs, staff training, leave days scheduling, and mock drills need to be planned for April, June, and July when low demand is anticipated. This deliberate planning strategy would avoid a dire situation whereby ambulances are available but without adequate staff, essential drugs, and equipment to respond to public emergency calls.


Introduction
Artificial neural networks are currently receiving a huge amount of interest and particularly used in areas of pattern recognition, classification, clustering, and forecasting applications [1]. Short-term forecasting remains an integral component in public ambulance emergency preparedness. [2] highlighted that quantitative decision processes are becoming increasingly important in providing public accountability for the resource decisions that have to be made and that any solution to such problems requires careful balancing of political, economic, and social objectives. [3] emphasised that predicting demand in emergency medical services (EMS) is crucial for saving people's lives. Predictions for ambulance demand can be done for the next few days in a week, a month, or a full calendar year based on time-ordered historical data for planning purposes. Such forecast will help in the mobilisation of both human and equipment resources. The Fire and Ambulance profession is about numbers if lives and property are to be saved. The question that arises to a public ambulance service provider is whether they are prepared for isolated incidences or even unexpected disasters such as fire, major road accidents, and disease outbreaks. [4] alluded that response time, which is the time taken to reach the patient after an emergency call has been received, is a critical component in EMS as it might mean the difference between life and death of a patient. This calls for robust and smart planning in ensuring that a skilled manpower and well-serviced equipment, including ambulances, are available to respond. The prediction of future demand using the historical time series by applying neural networks will play an integral role in the strategic and planning process.
According to [5], time series is a general analysis tool of great practical interest in many disciplines and it allows you to discover, with some margin of error, the future values of a series from its past values. Time series data can be generated from measurements of some physical, biological, economical, or environmental phenomena of interest. According to [6], time series analysis has three goals: forecasting, modelling, and characterisation. Although autoregressive integrated moving average (ARIMA) models have been used for time series analysis for decades, there has been a shift of focus to the growing use and application of artificial neural networks because of their flexibility and remarkable features [7]. Real-world problems in the modern day are characterised by nonlinear patterns and behaviours which demand the use of nonlinear techniques. Although various linear and nonlinear methods have been proposed, these have a huge drawback as they also require specific model assumptions [8]. Such models include the vector autoregressive models (VAR), autoregressive conditional heteroskedastic (ARCH), generalised autoregressive conditional heteroskedastic (GARCH), smooth transition autoregressive (STAR), and nonlinear autoregressive (NAR) models [7,9,10]. Hybrid forecasting models have also emerged over time which in some cases include hybridisation of both linear and nonlinear models [11].
A neural network is a nonlinear statistical model which could be a two-stage regression or classification model, typically represented by a network diagram and is good at modelling any complex function where the relationship between variables is unknown [12]. [13] defines an artificial neural network (ANN) as an information processing system that has been developed for the generalisation of mathematical models of human neural biology. According to [11], these nonlinear models overcome the limitation of linear models as they are able to capture the nonlinear pattern of data, thus improving their prediction performance. The ANN have been successfully applied and proven to be useful in time series modelling where the future values of a variable is determined using its past values [14].
Over the years, there has been a substantial use and comparison between neural networks and traditional time series forecasting techniques in different areas of applications, such as health [15], geophysics [16], geomechanics [17], stock markets [18], chemical engineering [19,20], electrical engineering [21], global logistics [22], construction engineering [23], financial business support [24], and in insurance [25]. [13] made a comparison between neural networks and traditional forecasting methods in inventory management. The results showed that forecasting with neural networks offers better performance in comparison with traditional methods. [26] applied the artificial neural network approach for predicting reverse osmosis desalination plant performance in the Gaza Strip, and the developed models were compared with other statistical models and it was found out that the prediction results of ANN were better than those of the conventional methods.
A few other researchers such as [27,28] have made an attempt at applying neural networks on univariate data and made comparisons with the traditional methods. [18], in their research study on the application of feed-forward neural networks (FFNN) for forecasting Indonesia exchange composite index, concluded that with less data observations (54 training data points, 13 model validation data points), FFNN was still able to produce accurate predictions, react to sensitive changes, and model the trend series well. Such comparisons have not been applied to a case of demand for public emergency ambulance services. In order to test the success of the implementation of the neural network approach and apply it for strategic public emergency ambulance planning, Bulawayo City was chosen as a case study since the organisation has well-structured and managed systems for public ambulance services. According to [13], many varieties of neural network techniques including multilayer feed-forward NN, recurrent NN, time-delay NN, and nonlinear autoregressive exogenous NN have been proposed, investigated, and successfully applied to time series prediction and causal prediction.
The Bulawayo ambulance service department continues to rely on the judgemental methods, and there is limited use of historical data for future predictions. The organisation continues to face problems, which are common to any ambulance service provider all over the world today. Despite the ever-increasing rural to urban migration which has seen the population of urban areas in Zimbabwe increasing, Bulawayo City included, heath service delivery remains low in terms of efficiency, effectiveness, and equality. This migration trend has seen the increase of unemployment in urban areas, sprouting of new residential areas, both formal and informal, and this has put pressure to the limited resources in terms of housing, health, and education among many other critical human social amenities [29][30][31].
It is expected that as the population increases, there is a subsequent increase in demand for public emergency services. This has not been the case with the demand trends for public emergency ambulance demand in Bulawayo as shown in Figure 1. One would suggest that if service delivery is poor, an affected individual might turn to other alternatives which might be more expensive or leave everything to fate. Despite reaching the highest annual public ambulance service demand total of 66630 in 1998, the trend has been decreasing up until 2008 (6595). This decrease might be a sign of limited capacity of the service rather than demand itself. A significant increase was experienced between 2009 and 2011. However, there was a steadily decreasing trend between 2012 and 2018 as shown in Figure 1.
The situation therefore calls for consolidated efforts across all sectors including research, in order to restore confidence among residents, reduce health risk, and prevent loss of lives. Planning into the future to restore a balanced fire and ambulance level of preparedness is vital in order to prevent the loss of lives in isolated incidences or even a disaster such as the cholera outbreaks that have been experienced in recent years [32] and fire or major road accidents. This study seeks to examine the applicability of artificial neural network models in modelling univariate time series ambulance demand for short-term forecasting horizons and to provide recommendations for strategic resource planning using empirical data.

Materials and Methods
In this section, sources of data, time series analysis models, and the model building processes are discussed in detail. Performance measures and selection criteria of the two models of interest, the FFNN and seasonal ARIMA models, will be discussed together with their mathematical formulation and application to time series analysis.
2.1. Model Input Data. Historical data on ambulance services for the Bulawayo City municipality, covering the period January 1991 to December 2018, were retrieved from the archives. For purposes of developing forecasting models, data for the period January 2010 to December 2018 was used because of the absence of cyclic and irregularity components in the time series data as compared to previous years. Each call for public ambulance emergency services is received and recorded at a control centre operated continuously everyday throughout the year. Monthly compilation is done and the information captured is stored in a database for billing and other management processes. The data splitting approach was adopted in order to compare the two suggested models. Data from January 2010 to December 2017 was used for model building and the data for the year 2018 was used for cross-validation and comparisons of the two modelling approaches. This translates to 96 observations for model building and 12 observations for model cross-validation. The flowchart of the methodology is presented in Figure 2.

Feed-Forward Neural
Networks. The feed-forward neural network architecture was trained by the "neuralnet" function of the R package, which is a network training function that updates weights and bias values during training. The network is called feed forward because information flows only from the input to the output and there are no recurrent or backward connections ( Figure 3). Each layer consists of neurons, and there is no connection between neurons that are in the same layer. A three-layer FFNN structure is shown in (Figure 3).
The input vector is represented by Y j denoted by Y j = fy 1 , y 2 , y 3 , ⋯, y n g; W jk ðj = 1, 2, 3, ⋯, n ; k = 1, 2, 3, ⋯, mÞ is   Journal of Applied Mathematics the connection weight vector of the j nodes of the input layer to the k nodes of the hidden layer; X k ðk = 1, 2, 3, ⋯, mÞ is the vector of k neurons in the hidden layer; W k ðk = 1, 2, 3, ⋯, mÞ is the connection weights of the k nodes of the hidden layer to the output layer; and Y is the unit output vector for the neural network with one output neuron. Θ k ðk = 1, 2, 3, ⋯, kÞ is the bias value of the hidden layer nodes and Θ is the bias value of the output layer.
The output of the hidden layer is determined by the formula The output of the output layer is determined by the formula where f is the activation function for the hidden and output layers.
The input data consisted of seven (7) inputs based on the lagged annual monthly demand for the years 2010 to 2016. The input and output data of the neural network is summarised in Table 1. Here, we assume that the demand for the next month is a function of the past values of the previous months recorded at the same time. The model equation is represented by the generalised Equation (3).
y t is the observed demand value for the current month, y t−12 is the demand value of the previous month, and so on, whilst y t+12 is the demand for the next month, and Θ t+12 is the associated bias. [33] implemented the same approach in his study of forecasting annual gross electricity demand.
The selection of the optimal number of inputs is based on trial and error as mentioned in [34].

Data Preprocessing.
When designing ANN, data is preprocessed first before model fitting. Preprocessing represents data coding, enrichment, and cleaning which involves accounting for noise and dealing with missing information. Preprocessing also involves the normalisation of data (scaling down) to an interval (0,1) in order to prevent saturation of hidden nodes before feeding into the neural network [1]. There are three basic normalisation techniques that can be used in data preprocessing, namely; min-max normalisation, Z-score normalisation, and the Sigmoid normalisation technique according to [14]. The input values were first normalised using the minimum-maximum criteria in order to optimise on the convergence rate of the neural network model. The formula for the minimum-maximum criteria can be generalised by: where y t is the monthly demand at time t, y t ′ the normalised value at time t, Min ðy t Þ and Max ðy t Þ the minimum and ma ′ ximum values, respectively, of variable y t over the range of data.

Neural Network Model Training and Testing Sets.
When the data has been processed, the network model building process begins. The processed data is separated into two sets, namely, model-building set and testing set. The first set, called the model-building set or training set, is used to develop the network model. The second set, called the testing or prediction set is used to evaluate the forecasting ability of the model. Higher percentages are usually assigned to the training set and lower percentages to the testing set [24]. In different neural network architectures in literature [1,8], 10%, 15%, 20%, and 30% have been implemented to time series data as the length of the testing set. Seventy-two (72) and twenty-four (24) Figure 3: Feed-forward neural network for time series forecasting.

Neural Network Model
Architecture. The architecture of the neural network is determined by the number of hidden and output layers, the number of neurons in all of the layers, training algorithm parameters, and the performance measure. There is no general rule for the selection of the appropriate number of hidden layers and the most popular approach in finding the optimal number of hidden layer is by trial and error [35]. The architecture of the neural network can be generalised by Equation (5).
where I represents the number of input nodes, H n the number of neurons in hidden layer n, and O the number of neurons in the output layer. An example is an ANN with seven (7) input nodes, one hidden layer with three (3) neurons, and one (1) output neuron can be represented as 7-(3)-1, respectively.

Neural Network Model
Training. Training of a NN is a process of determining weights of the neural network as well as the number of neurons in the different layers of the network. According to [20], this is an optimisation process whose aim is to find a set of optimum weights with which reasonable predictions can be made. Supervised training with resilient backpropagation as the training algorithm was used and demand calls of 2017 were set as the target values. Backpropagation, also called backward error propagation or backprop, is the most popular and widely used network learning algorithm [17,[36][37][38]. Backpropagation is a rule that generalises the gradient descent method as a way of changing the weights in the hidden layer of a FFNN. It gives the change Δw jk ðiÞ in the weight of the connection between neurons j and k at iteration i as where α is called the learning coefficient, Δw jk ði − 1Þ the weight change in the immediately preceding iteration, and μ the momentum coefficient [35]. The learning coefficient ensures a maximum decrease of the error function thus increasing the convergence speed. If it is too small, convergence will be extremely slow, and if too large, the error function will not converge. The momentum coefficient tends to aid convergence as it works as a lowpass filter by reducing rapid fluctuations. It applies smoothed averaging to the change in weights whilst also avoiding local minima [16].
The resilient backpropagation was used with the threshold value for the training data set at 0.01. The threshold or average error ensures that a network only converges when the error reaches below the chosen threshold value [11]. The learning rate factors were set at a minimum of 0.5 and maximum of 1.2. The momentum was not preset allowing the machine to adopt default values.
The selection of the initial number of hidden neurons was initially based on the logarithm of the training observations [39] and the calculated value (log 84) was two. These were increased during training of the neural network; however, a general rule is that there must not exceed two-thirds of the input neurons.
The logistic function and the linear function were implemented as the activation functions in the hidden and output layers, respectively. [8] in their statistical approach in selecting activation functions recommended the use of the logistic activation function in the hidden layers and the linear function in the output layer. The results obtained accurate short-term forecasting horizons. A single output neuron with a linear activation function was applied to the neural network.

Neural Network Model
Selection. The number of hidden layers and neurons in the selection process was systematically varied in order to obtain the most accurate models as proposed by [17]. It is important to note that neural networks without hidden units are equivalent to linear statistical forecasting methods [40]. Hidden units perform the mapping between the input and output variables as well as to provide the nonlinearity feature to neural networks, in addition to finding out patterns in the dataset [7]. The selection of the best three models among the neural networks was based on the performance measures, namely, mean absolute error (MAE) and the mean square error (MSE). The formulations of the MSE, RMSE, and MAE are given by where y t is the actual observation for the period t,ŷ t the forecast for the same period, and N the length of the test set. The MSE and the MAE are both measures of accuracy and the degree of spread of data points [7]. The MAE is a measurement of how close forecasts are to the actual data points, the average of the absolute errors [18]. Values predicted from the training sample and values in the test set were utilised to calculate the performance measures. Forecasts for the year 2018 were calculated by using the best weight values obtained in the selected model.

Multiplicative Seasonal Autoregressive Integrated Moving
Average (SARIMA) Model. Various linear models can be used for prediction including exponential smoothing models, generalised autoregressive conditional heteroscedasticity, and stochastic volatility models, which are predominantly used in predicting stock returns [11]. However, [41] applied the Box-Jenkins methodology using autoregressive integrated moving average models (ARIMA) on a univariate time series 5 Journal of Applied Mathematics in a study to forecast the second-hand car importation in Zambia, and the methodology was found to be superior to exponential smoothing models.
The Box-Jenkins methodology was adopted in this study.

ARIMA Model Building.
When developing an ARIMA model using the Box-Jenkins methodology, four stages have to be followed: identification, estimation, diagnosis, and forecasting. Components of time series data, namely, seasonality, trend, cyclicity, and irregularity, are key to model development. Seasonality refers to a consistent shape in the series that recurs with some periodic regularity. It can also be referred to as regular up and down fluctuation or shortterm variations due to seasonal factors. Monthly ambulance call data is likely to have a strong yearly component occurring at lags that are multiples of s = 12, because of strong relations of activities to the calendar year. This might lead to the adoption of autoregressive and moving average polynomials that identify with seasonal lags called multiplicative seasonal autoregressive integrated moving average models [42] given by ARIMA (p, q, d) x (P, D, Q)s. The model can be written in the general form where w t is the value at time t of white noise, Y t is the observed value at time t, ϕðBÞ = 1 − ϕ 1 B − ϕ 2 B 2 −⋯−ϕ p B p is the ordinary autoregressive component of order p, θðBÞ = 1 + θ 1 B + θ 2 B 2 +⋯+θ q B q is the ordinary moving average is the seasonal difference of order D at lag s, and Φ p ðB s Þ and Θ Q ðB s Þ are the seasonal autoregressive and moving average difference of orders P and Q at lag s, respectively.

Stationarity Assumption.
When applying ARIMA models to time series data, it is assumed that the data is stationary. The stationarity assumption requires that the mean, variance, and autocorrelation of the time series data does not change over time. Nonstationarity is common when components of time series data such as trend, seasonality, cyclicity, and irregularity are not accounted for in the model building process. When time series data is not stationary, differencing both the ordinary and seasonal components as presented in Equation (10) is performed to ensure that stationarity is achieved.

Model Selection and Diagnosis.
The Akaike Information Criterion (AIC) proposed by [43] for time series data was used to select the best model considering that the time series is stationary and that the model attains a minimum AIC value. Model diagnostic checks were implemented to check whether the residuals of the model resembled white noise. The plot of residuals against time was also applied to    Journal of Applied Mathematics check for constant mean and variance assumption. A normal probability plot and the histogram plot were applied to test for normality of the residuals. The ACF and the PACF of the residuals were plotted to check for independence of the residuals. To complement the visual approaches, the Ljung-Box Pierce statistic [42] was used to further ascertain independence of the residuals.

Comparison of the Neural Network and SARIMA Models.
After selecting the best model from each of the FFNN and SARIMA models, monthly forecasts of ambulance demand for 2018 were generated for the purpose of model crossvalidation. The predicted values and the actual values were used to calculate the performance of each model using the mean absolute error (MAE) and the root mean square error (RMSE) as the performance measures. MAE gives equal weighting to differences from actual values whilst the RMSE apportions a huge penalty in these differences and is more appropriate in identifying outliers [7]. This makes the MAE a more appropriate tool for testing goodness of fit as compared to the RMSE. The paired sample t-test (at 5% level of significance) was carried out to validate any significant differences between the actual values and predicted values of each of the two models using the Minitab statistical package. For a model to be of good fit, there should not be significant differences between the actual and predicted values (p > 0:05).
Using the selected superior model, forecasts for 2019 were carried out and inferential deductions made.

Results and Discussion
Results of ANN models are presented first followed by results of the SARIMA models, and finally the best model is selected and used for prediction.
3.1. Selecting the Best Neural Network Model. Several models of different architectures were systematically selected starting with two hidden units (log 84 = 2) in a hidden layer and gradually increasing them. The models were predominantly divided into two distinct sets, one with a single hidden layer and the other with two hidden layers, respectively. The selection of the number of inputs in the model was based on trial and error [34]. The mean square error (MSE) and mean absolute error (MAE) were used as the performance measures during training. Three models were selected and forecasts were generated for the year 2018 as a model cross-validation process. The RMSE and MAE were used as final performance measures for selecting a suitable model for the neural network and the results are summarised in Tables 2 and 3, respectively. The architecture of the FFNN (7-(4)-1) with seven input nodes, one hidden layer (4 neurons), and one output neuron was the best model with the lowest MAE of 94.0 and RMSE of 137:19. The neural network topology is presented in Figure 4.

3.2.
Multiplicative SARIMA Model Selection. Several models were considered by the algorithm in the R package but the ARIMAð0, 1, 1Þð0, 0, 2Þ 12 model was the best according to the Akaike Information Criteria with a value AIC = 1162:57 as presented in Table 4.   The histogram of the residuals showed that the data is normally distributed with mean zero. The normal probability Q-Q plot of residuals showed a linear trend to confirm that the residuals are normally distributed. When visualizing the straight line, more emphasis was given to the central values of the plot than on the extremes [44]. The plots of the ACF and PACF of residuals had none of the spikes being significantly different from zero. The Ljung-Box test statistic (at 5% level of significance) value of p = 0:6118 (>0.05) calls for the failure to reject the null hypothesis and hence, conclude that the residuals are not correlated. This validates the requirement that the residuals must be independent. The plot of the residuals against the order of data exhibited that the mean of the residuals oscillates closely to zero with a constant variance as expected.

Comparison of the Neural Network and SARIMA Models.
The actual monthly ambulance demand data for 2018 was compared with the forecasts using the two selected models as presented in Table 5.
It can be observed in Figure 5 that the pattern of the FFNN model is inclined to value estimation as compared to SARIMA which is directional as depicted by the linear pattern over time. The FFNN was able to detect the hidden patterns of the time series data. These findings concur with findings from a study carried out by [28] for stock price predictions.
In terms of the MAE, the FFNN is superior to the SAR-IMA model. When considering the RMSE, the SARIMA model is superior to the FFNN. However, it must be noted from literature [7] that the MAE gives equal weighting to differences from actual values whilst the root mean square apportions a huge penalty in these differences and is more appropriate in identifying outliers. This makes the MAE an appropriate tool for goodness of fit.
Paired sample t-tests at 5% level of significance for the actual values and predicted values were applied to both models. The calculated p value for FFNN was 0.493 (>0.05) ( Table 6), and we conclude that there is no significant difference between the forecast and actual values of public emergency ambulance demand. The calculated p value for the SARIMA model was 0.005 (<0.05) ( Table 7), and we conclude     January  1622  31  406  53  February  1494  28  374  54  March  1713  31  429  56  April  1368  30  342  46  May  1482  31  371  48  June  1318  30  330  44  July  1391  31  348  45  August  1526  31  382  50  September  1572  30  393  53  October  1541  31  386  50  November  1532  30  383  52  December  1638  31  410  53   8 Journal of Applied Mathematics that there is a significant difference between the forecast and actual values of public emergency ambulance demand. The results resonate with findings in the literature that propounded that neural network models are superior to ARIMA models in time series prediction.

Public Emergency Ambulance Demand Forecast for 2019.
The selected neural network model was used to forecast the public emergency ambulance demand for 2019 as shown in Figure 6. The demand is expected to peak in January, March, September, and December 2019. Lower demand is projected in April, June, and July. Even though models that forecast weekly, daily, and hourly demands were not developed at this level, such important quantitative measures can be derived from such forecasts and be fully utilised for strategic planning purposes to ensure that there is adequate preparedness to respond to ambulance calls and a summary is given in Table 8 and

Conclusion
Performance measures, MAE and the paired sample t-test, indicate that the FFNN models are superior to traditional SARIMA models in time series prediction of ambulance demand in the city of Bulawayo, over a short-term forecasting horizon. It was found that the FFNN model is more inclined to value estimation as compared to SARIMA model, which is directional as depicted by the linear pattern over time. Therefore, the FFNN model derives its model prediction accuracy from this unique characteristic. The FFNN model building process used 96 observations, where seventy-two (72) and twenty (24) observations were used as training and testing sets, respectively. With such small sample data, it can be concluded that the FFNN model is able to accomplish accurate predictions as it was able to detect the hidden patterns of the time series better than the SARIMA model. The SARIMA model required the assumption of stationarity to be satisfied before model building and also to be verified post model development. However, such assumptions are not required in the development of FFNN. Therefore, the FFNN is a parsimonious, simple model with no assumptions and requires fewer variables in model building but with greater explanatory power as compared to the SARIMA model. It was observed that using the architecture of one hidden layer produced more accurate results than those obtained from architectures with two hidden layers for short-term forecasting horizons. Therefore, the use of a single hidden layer is adequate in developing FFNN for short-term forecasting horizons. Reducing the number of input neurons did not improve the model accuracy. The researchers recommend that Bulawayo City Council should deliberately adopt and integrate such forecasting tools to assist them in their strategic resource planning activities.

Data Availability
All the data required for the publication has been included in the research article. These are in the form of tables in the research paper.