Comparison of Time Series Methods and Machine Learning Algorithms for Forecasting Taiwan Blood Services Foundation's Blood Supply

Purpose. The uncertainty in supply and the short shelf life of blood products have led to substantial outdating of collected donor blood. On the other hand, hospitals and blood centers experience severe blood shortages due to the very limited donor population. There is therefore a clear need to forecast the blood supply in order to minimize both outdating and shortage. This study aims to efficiently forecast the supply of blood components at blood centers. Methods. Two different types of forecasting techniques, time series methods and machine learning algorithms, are developed, and the best performing method for the given case study is determined. Among the time series methods, we consider the Autoregressive (AUTOREG), Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), Seasonal ARIMA, Seasonal Exponential Smoothing Method (ESM), and Holt-Winters models. Artificial neural network (ANN) and multiple regression are considered among the machine learning algorithms. Results. We leverage five years' worth of historical blood supply data from the Taiwan Blood Services Foundation (TBSF) to conduct our study. On comparing the different techniques, we found that time series forecasting methods yield better results than machine learning algorithms. More specifically, the lowest values of the error measures are observed for the seasonal ESM and ARIMA models. Conclusions. The models developed can act as a decision support system for administrators and pathologists at blood banks, blood donation centers, and hospitals to determine their inventory policy based on the estimated future blood supply. The forecasting models developed in this study can help healthcare managers to manage blood inventory control more efficiently, thus reducing blood shortage and blood wastage.


Introduction
Blood performs several important functions in the human body, such as transporting oxygen, carrying nutrients to our cells, and disposing of ammonia, carbon dioxide, and other waste products. Four of its most critical components are the red blood cells (RBC), white blood cells (WBC), plasma, and platelets [1]. The American Red Cross reported that over 35,000 RBC units, 10,000 plasma units, and 7,000 platelet units are required every day within the US [2]. Due to the short shelf life of blood components, hospitals and blood centers are faced with the challenge of maintaining appropriate inventory levels to avoid outdating and shortage.
Managing blood supply and demand is a core part of the healthcare supply chain system, as blood plays a crucial role in saving human lives. Blood supply forecasting is essential for making supply chain decisions, such as donor drive scheduling, vehicle routing policies, and inventory management, at blood centers and hospitals. Accurate forecasts of the timing and amount of future blood requests are considered key inputs to donor recruitment decision making and inventory control. It is important to gather data for several years to forecast monthly demand and to recognize seasonality in demand [3-6]. Lestari et al. [7] indicated that forecasting can capture the trend observed in the data and predict future demand for blood components.

Literature Review
Several studies have leveraged time series forecasting techniques for predicting the blood demand at hospitals and blood centers. For instance, Pereira [8] investigated and evaluated the autoregressive integrated moving average (ARIMA) model and the Holt-Winters exponential smoothing model to predict monthly demand for red blood cell transfusions at a tertiary care hospital. While these methods focused on time series forecasting, Bosnes et al. [9] used a statistical regression technique to forecast blood donor arrivals at the blood bank of Oslo and found that the most important factors among 18 explanatory variables were donor age, time from making an appointment to arriving at the drive, contact methods used, number of prior donations, and donor no-show rate. Fortsch and Khapalova [10] introduced numerous practical methods to predict future demand for blood. Several forecasting models, including the naïve, exponential smoothing, moving average, and time series decomposition models, were tested using daily demand data from a blood center for January 2006 to December 2012. They also compared the performance of these methods with an autoregressive moving average (ARMA) model. The results revealed that the ARMA forecasting model performed better for eight out of nine time series model settings. Similarly, Khaldi et al. [11] explored the capabilities of machine learning algorithms such as the artificial neural network (ANN) model to predict future demand for blood.

Materials and Methods
As discussed earlier, the study aims to develop effective forecasting methods to predict the supply of RBCs using two different techniques: time series forecasting methods and machine learning algorithms.

Time Series Forecasting.
This section discusses the seven time series forecasting methods used in this study.

Autoregressive (AUTOREG) Model [12,13]
The AUTOREG procedure estimates and forecasts linear regression models for time series data when the errors are autocorrelated. The autoregressive model regresses the value of the series at time t ($Y_t$) on the values during the preceding periods t − 1, t − 2, ..., t − p. The mathematical formula is expressed as follows:

$$Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \cdots + \alpha_p Y_{t-p} + \varepsilon_t,$$

where $\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_p$ are the linear regression coefficients, $Y_t$ is the forecasted value at time t, and $\varepsilon_t$ is the random error variable, which is generally assumed to have a normal distribution with mean 0 and variance $\sigma^2$ (i.e., normal $(0, \sigma^2)$).

Autoregressive Moving Average (ARMA) Models [12-14]
The ARMA model is one of the basic tools in time series modeling. Suppose the time series $Y_1, Y_2, \ldots, Y_t$ is a stationary stochastic process; the expression ARMA (p, q) then represents the model with autoregressive order p and moving-average order q. This model is a combination of the AR (p) and MA (q) models, where AR (p) is written as

$$Y_t = a + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t$$

and MA (q) is written as

$$Y_t = b + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}.$$

As in the AUTOREG model, $Y_t$ is the observation value at time t. The ARMA (p, q) process is generally written as follows:

$$Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j},$$

where a, b, and c are constants; $\varepsilon_t$ is the random error variable, which is generally assumed to have a normal distribution with mean 0 and variance $\sigma^2$; $\phi_1, \phi_2, \ldots, \phi_p$ are the autoregressive coefficients to be estimated; and $\theta_1, \theta_2, \ldots, \theta_q$ are the moving-average coefficients to be estimated.
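To make the ARMA (p, q) recursion concrete, the following is a minimal sketch of a one-step-ahead forecast with already-estimated coefficients. It is an illustration only, not the procedure used in this study: the function name `arma_one_step` and the sample numbers are hypothetical, and the sketch assumes the plus-sign convention for the moving-average terms (some software uses the opposite sign). Setting `theta` to an empty list reduces it to a pure AR (p) forecast.

```python
from typing import Sequence


def arma_one_step(
    history: Sequence[float],
    residuals: Sequence[float],
    c: float,
    phi: Sequence[float],
    theta: Sequence[float],
) -> float:
    """One-step-ahead ARMA(p, q) forecast with known coefficients.

    history   -- observed values Y_1..Y_t, oldest first
    residuals -- past errors eps_1..eps_t, oldest first
    c         -- constant term
    phi       -- autoregressive coefficients phi_1..phi_p
    theta     -- moving-average coefficients theta_1..theta_q
                 (assumed plus-sign convention)
    """
    p, q = len(phi), len(theta)
    if len(history) < p or len(residuals) < q:
        raise ValueError("need at least p observations and q residuals")
    # phi_i multiplies the i-th most recent observation.
    ar_part = sum(phi[i] * history[-(i + 1)] for i in range(p))
    # theta_j multiplies the j-th most recent error.
    ma_part = sum(theta[j] * residuals[-(j + 1)] for j in range(q))
    # The future error has mean zero, so it drops out of the forecast.
    return c + ar_part + ma_part


if __name__ == "__main__":
    # Hypothetical ARMA(2, 1) example; the numbers are not from the case study.
    print(arma_one_step([10.0, 12.0], [0.5, -1.0], 2.0, [0.5, 0.25], [0.4]))
```

In practice the coefficients would be estimated by statistical software; the sketch only shows how a forecast follows from them.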

Autoregressive Integrated Moving Average (ARIMA) Model [12-14]
The ARIMA (autoregressive integrated moving average) approach was made popular by Box-Jenkins models [11]. The ARIMA procedure predicts a future response value in a time series as a linear combination of its own past values, past errors, and current and past values of other time series (predictor time series).
For time series with nonstationary behavior, the above ARMA (p, q) model can be extended and written using differencing, which is defined as

$$\nabla^d Y_t = (1 - B)^d Y_t,$$

where d is the order of differencing and B is the backward shift operator, which shifts the data back one period (i.e., $B Y_t = Y_{t-1}$).

Seasonal ARIMA Model [12,13,15,16]
The seasonal ARIMA model is written with the general expression ARIMA $(p, d, q)(P, D, Q)_s$. The symbol p is the order of the nonseasonal autoregressive component, d is the order of the differencing, q is the order of the nonseasonal moving-average process, P is the order of the seasonal autoregressive part, D is the order of the seasonal differencing, Q is the order of the seasonal moving-average process, and s is the duration of the seasonal cycle.
Let $Y_t$ be a dependent time series $\{Y_t : 1 \le t \le n\}$ at time t; the mathematical formula for the seasonal ARIMA model is then expressed as follows:

$$(1 - B)^d (1 - B^s)^D Y_t = \mu + \frac{\theta(B)\,\Theta(B^s)}{\phi(B)\,\Phi(B^s)}\,\varepsilon_t,$$

where $\mu$ is the constant mean, $B^s$ is the seasonal backward shift operator, $\varepsilon_t$ is the random error, $\phi(B)$ is the nonseasonal autoregressive component, $\Phi(B^s)$ is the seasonal autoregressive component, $\theta(B)$ is the nonseasonal moving-average component, and $\Theta(B^s)$ is the seasonal moving-average component.
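The role of the backward shift operator in the (seasonal) ARIMA model can be illustrated with a short sketch that applies nonseasonal and seasonal differencing to a series. The function names are hypothetical, and the weekly cycle (s = 7) and the sample values are assumptions chosen for illustration; they are not taken from the case study data.

```python
from typing import List, Sequence


def difference(series: Sequence[float], lag: int = 1) -> List[float]:
    """Apply (1 - B^lag): subtract the value observed `lag` periods earlier."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def seasonal_arima_difference(
    series: Sequence[float], d: int = 0, D: int = 0, s: int = 1
) -> List[float]:
    """Apply (1 - B)^d (1 - B^s)^D to the series."""
    out = list(series)
    for _ in range(D):  # seasonal differencing, D times
        out = difference(out, lag=s)
    for _ in range(d):  # nonseasonal differencing, d times
        out = difference(out, lag=1)
    return out


if __name__ == "__main__":
    # Hypothetical daily series: linear trend plus a repeating weekly pattern.
    weekly_pattern = [5, 3, 3, 4, 4, 1, 0]
    series = [t + weekly_pattern[t % 7] for t in range(21)]
    # One seasonal difference removes the weekly pattern (constant remains).
    print(seasonal_arima_difference(series, d=0, D=1, s=7))
    # Adding one nonseasonal difference removes the trend as well.
    print(seasonal_arima_difference(series, d=1, D=1, s=7))
```

Each difference shortens the series (by s observations for a seasonal difference and by one for a nonseasonal difference), which is why differencing orders are usually kept small.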

Seasonal Exponential Smoothing Model [12,13,15,16]
In the seasonal exponential smoothing method (ESM), the forecast value at time t + k ($Y_{t+k}$) is given by

$$Y_{t+k} = L_t + S_{t+k-p}.$$

The smoothing equations are as follows:

$$L_t = \alpha (X_t - S_{t-p}) + (1 - \alpha) L_{t-1},$$
$$S_t = \gamma (X_t - L_t) + (1 - \gamma) S_{t-p},$$

where $X_t$ is the given observation at time t, $\alpha$ and $\gamma$ are the level and seasonal smoothing parameters, respectively, $L_t$ is the estimated level component at time t, $S_t$ is the estimated seasonal component at time t, and p is the number of periods after which the seasonal cycle repeats itself.

Multiplicative Holt-Winters Model [12,13,15,16]
The Holt-Winters model, also known as triple exponential smoothing, applies three types of exponential smoothing to the time series: level, trend, and seasonality. The model equation for the Holt-Winters method can be either additive or multiplicative. In this section, we present the multiplicative Holt-Winters model, whereas Section 3.1.7 presents the additive model. For a time series with a trend and a seasonal component, the multiplicative Holt-Winters technique has the forecast at time t + k ($Y_{t+k}$) given by the following equation:

$$Y_{t+k} = (L_t + k T_t)\, SI_{t+k-p}.$$
The smoothing equations are as follows:

$$L_t = \alpha \frac{X_t}{SI_{t-p}} + (1 - \alpha)(L_{t-1} + T_{t-1}),$$
$$T_t = \beta (L_t - L_{t-1}) + (1 - \beta) T_{t-1},$$
$$SI_t = \gamma \frac{X_t}{L_t} + (1 - \gamma) SI_{t-p},$$

where $X_t$ is the given observation at time t; $\alpha$, $\beta$, and $\gamma$ are the level, trend, and seasonal smoothing constants, respectively; $L_t$ is the estimated level at time t; $T_t$ is the estimated trend at time t; $SI_t$ is the seasonality index at time t; and p is the number of periods after which the seasonal cycle repeats itself.

Additive Holt-Winters Model [12,13,15,16]
In this section, we present the additive Holt-Winters model. For the additive model, the forecasted supply estimate for time t + k is given by the following equation:

$$Y_{t+k} = L_t + k T_t + S_{t+k-p}.$$
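The estimates of the level, trend, and seasonal factors for the additive model are given by the following equations:

$$L_t = \alpha (X_t - S_{t-p}) + (1 - \alpha)(L_{t-1} + T_{t-1}),$$
$$T_t = \beta (L_t - L_{t-1}) + (1 - \beta) T_{t-1},$$
$$S_t = \gamma (X_t - L_t) + (1 - \gamma) S_{t-p},$$

where $S_t$ is the additive seasonal factor at time t and the remaining symbols are as defined for the multiplicative model.

Machine Learning Algorithms.
Machine learning studies algorithms that analyze a set of data, learn from the insights gathered, and make predictions on data [17]. For blood supply forecasting, we leverage two of the most widely used machine learning techniques, the artificial neural network and regression.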

Artificial Neural Networks (ANN).
An ANN is a machine learning method adapted from biological neural networks. The network consists of several nodes that are distributed across numerous layers, and each layer is connected to its previous and subsequent layers within the network [17]. These interconnected nodes process the information that they receive from the nodes of the previous layer and transfer the result to the next layer based on the sigmoid function. ANNs are particularly useful for modeling complex relationships in high-dimensional data or where the relationship between the input and output variables is not easy to understand [17].
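The layer-to-layer information flow described above can be sketched as a forward pass through a small fully connected network with sigmoid activations. This is only an illustration under stated assumptions: the layer sizes, weights, and function names are hypothetical, training (weight estimation) is omitted, and the network architecture used in this study is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

# A layer is (weights, biases); weights[j] holds the incoming weights of node j.
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def sigmoid(z: float) -> float:
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs: Sequence[float], layer: Layer) -> List[float]:
    """Each node takes a weighted sum of the previous layer, then a sigmoid."""
    weights, biases = layer
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def network_forward(inputs: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Pass the inputs through every layer in order and return the outputs."""
    activations = list(inputs)
    for layer in layers:
        activations = layer_forward(activations, layer)
    return activations


if __name__ == "__main__":
    # Hypothetical 2-input network: one hidden layer (2 nodes), one output node.
    hidden: Layer = ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0])
    output: Layer = ([[1.0, -1.0]], [0.0])
    print(network_forward([1.0, 1.0], [hidden, output]))
```

Because a sigmoid output lies between 0 and 1, a supply forecast produced this way would have to be rescaled to the range of the data (or the output node given a linear activation); that choice is left open in this sketch.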

Multiple Regression.
Multiple regression addresses another class of machine learning problem: predicting a continuous value of a variable rather than a class, as in a classification problem [17]. Linear regression with ordinary least squares is one of the classic machine learning algorithms in this domain. The mathematical formula for the regression model is represented as follows:

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n + \varepsilon, \tag{15}$$

where Y is the response variable, $X_i$ (i = 1, ..., n) is an independent variable, $\beta_0$ is the intercept, $\beta_i$ is the slope coefficient of $X_i$ (both $\beta_0$ and $\beta_i$ are unknown coefficients to be estimated by the model), and $\varepsilon$ is the error variable.
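As a hedged illustration of how the coefficients in the regression model can be estimated by ordinary least squares, the sketch below solves the normal equations with plain Gaussian elimination and also computes the coefficient of determination. All function names and the sample data are hypothetical; the predictors used in this study are not reproduced here, and statistical software would normally be used instead of hand-written linear algebra.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [beta_0, beta_1, ..., beta_n] minimizing the squared error."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta: Sequence[float], row: Sequence[float]) -> float:
    """Evaluate beta_0 + beta_1 X_1 + ... + beta_n X_n for one observation."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def r_squared(actual: Sequence[float], fitted: Sequence[float]) -> float:
    """Share of the variance of `actual` explained by `fitted`."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R^2 is undefined")
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical data generated from Y = 2 + 3*X1 - X2 without noise.
    rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    y = [2.0, 5.0, 1.0, 4.0, 7.0]
    beta = fit_ols(rows, y)
    print(beta, r_squared(y, [predict(beta, r) for r in rows]))
```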

Evaluation of the Different Methods.
We use four measures of forecast error to evaluate model performance and the accuracy of the methods: the mean absolute error (MAE), mean squared error (MSE), bias (BIAS), and mean absolute percentage error (MAPE) [12,15,18]. Assume that $X_1, X_2, \ldots, X_n$ are the actual data and $F_1, F_2, \ldots, F_n$ are the forecasted data; the n forecast errors $e_1, e_2, \ldots, e_n$ are then given by $e_1 = F_1 - X_1$, $e_2 = F_2 - X_2$, ..., $e_n = F_n - X_n$.
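The four error measures can be sketched in a few lines using the error definition above (forecast minus actual). The formulas below are the commonly used textbook definitions and are an assumption here, as are the function names and the sample numbers: MAE is the mean of the absolute errors, MSE the mean of the squared errors, BIAS the mean of the signed errors, and MAPE the mean absolute error relative to the actual value, expressed as a percentage.

```python
from typing import List, Sequence


def forecast_errors(actual: Sequence[float], forecast: Sequence[float]) -> List[float]:
    """e_i = F_i - X_i for each period i."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    return [f - x for x, f in zip(actual, forecast)]


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    e = forecast_errors(actual, forecast)
    return sum(abs(v) for v in e) / len(e)


def mse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean squared error."""
    e = forecast_errors(actual, forecast)
    return sum(v * v for v in e) / len(e)


def bias(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean signed error; positive values indicate over-forecasting."""
    e = forecast_errors(actual, forecast)
    return sum(e) / len(e)


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if any(x == 0 for x in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    e = forecast_errors(actual, forecast)
    return 100.0 * sum(abs(v / x) for v, x in zip(e, actual)) / len(e)


if __name__ == "__main__":
    # Hypothetical actual and forecasted supply, not from the case study.
    actual = [100.0, 200.0, 400.0]
    forecast = [110.0, 190.0, 400.0]
    print(mae(actual, forecast), mse(actual, forecast),
          bias(actual, forecast), mape(actual, forecast))
```

Note that BIAS can be close to zero even when individual errors are large, because positive and negative errors cancel; this is one reason to report it alongside MAE, MSE, and MAPE.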

Data Collection.
The historical supply data for the five years from 2013 to 2017 are first gathered from the health records.
The summary statistics are given in Table 1.
From Table 1, it is observed that the average blood supply for each day of the week is steady from year to year. Also, we can see that the Monday supply is very high, the Thursday and Friday supplies are quite high, the Tuesday and Wednesday supplies are moderate, and the Saturday and Sunday supplies are significantly lower.

Time Series Forecasting Results.
After running the seven time series models discussed in Section 3.1 and obtaining the forecasts, we evaluate them using the error measures given in Section 3.3; the results are presented in Table 2. The Seasonal ARIMA Model, Seasonal Exponential Smoothing Method, and Multiplicative Holt-Winters Model yield the smallest error measures. Hence, we conclude that, among the time series methods, these three models are the best at forecasting the blood supply for the case study data under consideration.
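To illustrate how one of these best-performing models produces its forecasts, the following is a minimal sketch of the multiplicative Holt-Winters recursions in their standard textbook form (level, trend, and seasonality index updated with smoothing constants alpha, beta, and gamma). It is not the software procedure used to obtain Table 2. The function name, the initialization from the first two seasonal cycles, the smoothing constants, and the sample series are all assumptions made for illustration.

```python
from typing import List, Sequence


def holt_winters_multiplicative(
    x: Sequence[float],
    p: int,
    alpha: float,
    beta: float,
    gamma: float,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` periods ahead with multiplicative Holt-Winters.

    x -- observations, oldest first (at least two full seasonal cycles)
    p -- number of periods in one seasonal cycle
    """
    if p < 1 or len(x) < 2 * p:
        raise ValueError("need at least two full seasonal cycles of data")
    # Assumed initialization: level and trend from the first two cycle means,
    # seasonality indices from the first cycle relative to its mean.
    first_mean = sum(x[:p]) / p
    second_mean = sum(x[p:2 * p]) / p
    level = first_mean
    trend = (second_mean - first_mean) / p
    season = [x[i] / first_mean for i in range(p)]

    for t in range(p, len(x)):
        prev_level = level
        si_old = season[t % p]  # seasonality index from one cycle earlier
        level = alpha * x[t] / si_old + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % p] = gamma * x[t] / level + (1 - gamma) * si_old

    n = len(x)
    # The forecast for step k reuses the latest index of that seasonal position.
    return [
        (level + k * trend) * season[(n - 1 + k) % p]
        for k in range(1, horizon + 1)
    ]


if __name__ == "__main__":
    # Hypothetical series with a 3-period cycle and no trend.
    data = [10.0, 20.0, 30.0] * 3
    print(holt_winters_multiplicative(data, p=3, alpha=0.3, beta=0.1,
                                      gamma=0.2, horizon=3))
```

For daily blood supply data with a day-of-week pattern, p = 7 would be a natural choice, although the seasonal period used in this study is not stated here. The smoothing constants would normally be chosen by minimizing an error measure such as the MSE on the historical data.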

Machine Learning Algorithm Results.
The performance of the machine learning algorithms is compared in Table 3. For this particular dataset, the results show that regression is the better predictor of the blood supply; nevertheless, the explanatory power of the regression results is quite low (R² = 63.71%). Therefore, regression is used to predict the supply for the first week of January 2018, as shown in Table 4. A summary of the results obtained under the time series methods and regression is given in Table 4.
The results show that no single method predicts the supply accurately; hence, we recommend using the average value of the forecasts obtained under these four methods for estimating the future supply [15, 19-21].
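The recommended combination, an equal-weight average of the forecasts from the selected methods, can be sketched as follows. The function name, the method labels, and all numbers are hypothetical placeholders and are not the values reported in Table 4.

```python
from typing import Dict, List, Sequence


def average_forecasts(forecasts_by_method: Dict[str, Sequence[float]]) -> List[float]:
    """Equal-weight average of several forecasts, period by period."""
    if not forecasts_by_method:
        raise ValueError("at least one forecast is required")
    series = list(forecasts_by_method.values())
    horizon = len(series[0])
    if any(len(s) != horizon for s in series):
        raise ValueError("all methods must forecast the same number of periods")
    return [sum(s[t] for s in series) / len(series) for t in range(horizon)]


if __name__ == "__main__":
    # Hypothetical two-day forecasts from four methods.
    combined = average_forecasts({
        "seasonal_arima": [100.0, 200.0],
        "seasonal_esm": [110.0, 190.0],
        "holt_winters_multiplicative": [90.0, 210.0],
        "regression": [100.0, 200.0],
    })
    print(combined)
```

An equal-weight average is the simplest combination scheme; weighting each method by its historical accuracy is a possible refinement that is not explored here.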

Discussion
This study focusses on predicting the supply of red blood cells for the Taiwan Blood Services Foundation (TBSF) [22], a nongovernmental and nonprofit organization. So far, more than seven million citizens have donated blood in Taiwan through this foundation (which accounts for over 25% of the total population of Taiwan) [23]. Currently, blood centers at TBSF do not have a proper blood forecasting system, and some blood centers face blood shortage problems as a result of the lack of accurate forecasting of blood supply. This paper focusses on developing a blood supply forecasting decision support tool for TBSF using time series and machine learning algorithms. Accurate forecasting models will enable TBSF to make good blood supply chain management planning decisions, such as when to collect blood from donors, how many units to collect, how to assign the workforce for collecting blood in donor drives, and how to plan the blood component testing process. Once the future supply has been accurately forecast using the methods discussed in this study, inventory models can be developed to make decisions on the number of units to order and the time between orders.
There are some limitations to the forecasting methods. The accuracy of forecasting can be affected by various factors. If unknown variables cause some of the fluctuations in the data, forecasting becomes more difficult unless known explanatory variables account for the variations. Blood supply forecasting is vital for blood supply chain decisions, and the forecasts have to be updated as more reliable information becomes available. Hence, after appropriate forecasting methods are selected, it is important to continuously monitor the forecast accuracy.

Data Availability
The data used to support the findings of this study have not been made available because they are confidential to the case study blood center and hospitals.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Journal of Healthcare Engineering