A Bayesian Network-Based Probabilistic Framework for Drought Forecasting and Outlook

1 Department of Civil and Environmental Engineering, Hanyang University, Seoul 04763, Republic of Korea
2 Department of Agricultural Engineering, University of Engineering and Technology, Peshawar 25120, Pakistan
3 Department of Civil Engineering, Chonbuk National University, Jeonju 54896, Republic of Korea
4 Department of Civil and Environmental Engineering, Hanyang University, Ansan 15588, Republic of Korea


Introduction
Drought is a natural disaster caused by a lack of precipitation or available water. Droughts are destructive, causing significant damage to both natural environments and human lives [1]. Several studies examining hydro-climate projections reported that drought frequency and severity are expected to increase in the future due to climate change [2,3]. Several researchers have reported the occurrence probability of a mega drought, which is defined as a severe drought lasting one decade or longer, in the southwestern United States, South America, and Southern Africa [4,5].
Recent severe droughts have occurred in different parts of the world, such as the African drought in 2011 and the California drought in 2015. South Korea experiences drought approximately every two years [6]. The most recent severe drought in South Korea occurred in the Han River basin in 2015 due to significantly below-normal rainfall since 2014. In South Korea, more than 50% of the annual rainfall is concentrated during the flood season from June to early September. This summer-intensive weather pattern contributes to increased vulnerability to drought [7]. Because of this vulnerability, water managers operate multipurpose reservoirs conservatively, even during the flood season, so as to secure sufficient water for the dry season. Therefore, prediction of drought plays an important role in water resource management in South Korea.
Unlike other natural disasters, drought onset is difficult to detect because drought develops slowly and affects diffuse local communities. Nevertheless, well-timed mitigation measures combined with appropriate monitoring and forecasting can reduce drought damage. Various studies have been conducted to predict the occurrence and conditions of future droughts using stochastic and/or statistical methods such as Markov chains, stochastic time series models, artificial neural networks, and hybrid models [1,8–13]. However, drought forecasting is often accompanied by high uncertainty due to the prediction uncertainty of hydro-meteorological variables. Therefore, drought prediction with uncertainty estimation is needed to provide reliable forecast information to water managers [14]. For example, a drought probability forecast method was developed by combining historical temperatures and precipitation with the seasonal forecast from the United States National Oceanic and Atmospheric Administration (NOAA) Climate Prediction Center (CPC) [15]. Historical climate records were applied to nonparametric autoregressive models to produce drought ensemble forecasts, with residuals resampled to constrain the forecast bounds [14]. Currently, ensemble forecast models are widely used in the practice of probabilistic drought forecasting.
This study proposed a probabilistic model for forecasting drought based on Bayesian networks, which can be applied to complex systems to explicitly represent the uncertainties of variables [16–18]. Bayesian networks have been applied in various academic fields, such as medical sciences, economics, industrial engineering, sociology, and environmental engineering, in order to make decisions and predictions [16,19]. In hydrology and water resources, few studies have attempted to utilize Bayesian network models for risk assessment [20–22]; to the best of our knowledge, only one study employed Bayesian networks to calculate the conditional probability of copula-based forecasting [23]. In this study, the Bayesian network and its inference algorithm were applied as the main tool to forecast drought considering the persistence of a drought index. Since Bayesian networks express uncertainties through probability distributions [24], the present study also suggested a drought outlook framework based on the meteorological drought forecasting results.

Study Area and Available Data.
To forecast the probability of drought occurrence, we secured two data sources: observed past precipitation and predicted future precipitation. The observed past precipitation values from 1973 to 2014 were acquired from the Korea Meteorological Administration. Daily precipitation data from 16 weather stations, the locations of which are shown in Figure 1, were used to compile monthly precipitation.
Predicted future precipitation data were provided by the Asia-Pacific Economic Cooperation Climate Center (APCC). Since 2007, the APCC has maintained a data bank for seasonal forecasting products (e.g., precipitation, temperature at 850 hPa, and geopotential height at 500 hPa) using a multimodel ensemble (MME) method. The APCC MME method includes a single probabilistic method and four deterministic methods: simple composite method (SCM), super ensemble method (SEM), synthetic super ensemble (SSE), and stepwise pattern projection- (SPP-) based MME [25]. The present study adopted the SCM, the simplest and most widely used method. The SCM gives equal weight to each single model when constructing a multimodel prediction [25,26]. The predicted precipitation (termed APCC MME precipitation), gridded at a spatial resolution of 2.5° latitude by 2.5° longitude for the study area (latitude: 35.0–37.5° north, longitude: 127.5–130.0° east), was extracted from the APCC data service system web portal (http://cis.apcc21.org/). The prediction products of the APCC MME have been applied to predict the Asian summer monsoon, extreme floods, and drought [27–30]. In previous drought prediction studies, statistically downscaled APCC MME predictions were used to estimate the SPI (Standardized Precipitation Index) and SPEI (Standardized Precipitation Evapotranspiration Index) [28,30]. Because the present study focused on developing a drought forecasting method, statistical downscaling was not adopted. Instead, the APCC MME precipitation data were converted into site-based data for the 16 stations by applying the predicted anomaly to the monthly historical mean precipitation values.
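The conversion of gridded anomalies into site-based forecasts can be illustrated with a minimal sketch. It assumes an additive anomaly in millimetres added to each station's historical mean for the target calendar month; the source does not state whether the anomaly is additive or a ratio, and the function name, the second station label, and all numbers below are hypothetical.

```python
def site_forecast(monthly_mean_mm, anomaly_mm):
    """Apply a predicted anomaly to a station's historical monthly mean.

    Assumption: the anomaly is additive (mm); precipitation is floored at zero.
    """
    return max(0.0, monthly_mean_mm + anomaly_mm)


# Hypothetical historical March means (mm) for two stations.
march_means = {"Wonju": 50.0, "station_B": 8.0}

# Hypothetical one-month lead anomaly (mm) for the grid cell covering both stations.
predicted_anomaly_mm = -12.5

site_forecasts = {
    station: site_forecast(mean_mm, predicted_anomaly_mm)
    for station, mean_mm in march_means.items()
}
```

The resulting site-based monthly precipitation would then be fed into the SPI calculation in the same way as the observed record.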

Bayesian Network-Based Drought Forecasting (BNDF) Model.
Bayesian networks have been applied to many domains including forecasting, estimation, classification, recognition, and inference [31,32]. A Bayesian network-based stochastic predictive model was employed in this study to quantify the drought forecasting uncertainty. The network is a type of directed acyclic graph that represents the dependencies among variables. It consists of a set of nodes representing random variables and a set of directed arcs. Each arc connects a pair of nodes, and its direction is shown by an arrow representing the causal relationship between the nodes [33,34]. The arc starts from a causal or preceding event at the parent node and leads to an outcome event at the child node. The relationship between nodes is defined as a conditional probability based on prior information or statistically observed correlations [19].
Advances in Meteorology

The proposed Bayesian network-based drought forecasting (BNDF) model is a probabilistic drought prediction system based on the drought forecast model in [23], which applied historical runoff in Bayesian networks. We proposed that three types of variables, from past, current, and future drought conditions, could be used for predicting drought. Various stochastic forecasting methods have been developed based on past and current drought-related variables [1,8,35], while, in other forecasting studies, precipitation products of climate models were used to estimate the future drought condition [14,36]. However, precipitation derived from climate models has high uncertainty, and drought prediction with precipitation alone can show low skill scores [37]. Therefore, we developed the BNDF model, which uses both historical and predicted drought conditions within a Bayesian network; its conceptual framework is shown in Figure 2. The structure of the model is composed of four nodes: three parent nodes ($D_{t-1}$, $D_t$, and $M_{t+1}$) and one child node ($F_{t+1}$), as shown in Figure 3. Each node (parent and child) was described by the monthly drought condition defined using the probability distribution of SPI [38], and the predicted drought information was derived from the APCC MME forecasted precipitation.
The proposed model assumed that each node was a continuous variable following a Gaussian distribution, as given in

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right], \tag{1}$$

where $x$ is precipitation data, $\mu$ is the location parameter, and $\sigma$ is the scale parameter. A Kolmogorov-Smirnov (K-S) test was performed to determine the goodness of fit and to justify the application of the Gaussian distribution. The maximum deviation between the empirical distribution and the Gaussian distribution was calculated [39], as given in

$$D_{\max} = \max_{x} \left| F(x) - F_{0}(x) \right|, \tag{2}$$

where $D_{\max}$ is the test statistic of the K-S test, $F(x)$ is the empirical distribution of the observed data, and $F_{0}(x)$ is the Gaussian distribution. If $D_{\max}$ is larger than the critical value at the chosen significance level (5% in this study, $D_{0.05} = 0.2243$), then the null hypothesis is rejected.
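A minimal sketch of the K-S check follows, using only the Python standard library. The sample is synthetic (a stand-in for the SPI values of one calendar month at one station), the function name is hypothetical, and the Gaussian parameters are fitted from the same sample; only the critical value 0.2243 is taken from the text.

```python
import random
from statistics import NormalDist, fmean, pstdev


def ks_statistic(values, dist):
    """Maximum gap between the empirical CDF of `values` and `dist.cdf`."""
    xs = sorted(values)
    n = len(xs)
    d_max = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d_max = max(d_max, abs(i / n - f0), abs((i - 1) / n - f0))
    return d_max


# Synthetic stand-in for the SPI values of one calendar month (assumption).
rng = random.Random(42)
spi_sample = [rng.gauss(0.0, 1.0) for _ in range(36)]

# Gaussian fitted to the sample (mean and population standard deviation).
fitted = NormalDist(fmean(spi_sample), pstdev(spi_sample))

D_CRITICAL = 0.2243  # 5% critical value quoted in the text
d_max = ks_statistic(spi_sample, fitted)
reject_gaussian = d_max > D_CRITICAL
```

When `reject_gaussian` is false, the Gaussian assumption is retained for that month and station.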
The test results showed that the Gaussian distribution could be applied to the majority of the months at the selected 16 stations. The Gaussian probability density function (PDF) of each node was estimated using the three-month SPI, which was calculated from the monthly precipitation. The Gaussian PDF of month $t$ (current drought condition, node $D_t$) was constructed using the SPI values from 1973 to the current year in month $t$. The SPI PDF of month $t-1$ (past drought condition, $D_{t-1}$) was obtained by the same method. Similarly, $M_{t+1}$ (future drought condition) was estimated with the SPI of month $t+1$, calculated from 1973 to the current year using the one-month lead APCC MME precipitation. In this study, the distribution parameters were estimated by the maximum likelihood method, as given in

$$\ln L(\theta) = \sum_{i=1}^{n} \ln f(x_{i} \mid \theta), \tag{3}$$

where $\ln L$ is the log likelihood function, $x_i$ are the SPI values of each node, and $\theta$ represents the Gaussian distribution parameters (mean and standard deviation). The child node, the posterior probability of our forecast result, was given in

$$P\left(F_{t+1} \mid D_{t-1}, D_{t}, M_{t+1}\right), \tag{4}$$

where $D_t$ is the PDF of SPI in the current month, $D_{t-1}$ is the PDF of SPI in the previous month, $M_{t+1}$ is the PDF of SPI in the next month with APCC MME predicted precipitation, and $F_{t+1}$ is our forecasted result of the SPI PDF.
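The maximum likelihood step for a single Gaussian node can be sketched as follows. For a Gaussian, the maximizer of the log likelihood has a closed form (sample mean and population standard deviation); the SPI values and function names below are hypothetical and serve only to illustrate the calculation.

```python
import math
from statistics import fmean


def gaussian_log_likelihood(values, mu, sigma):
    """Log likelihood of a Gaussian N(mu, sigma^2) for the given values."""
    n = len(values)
    squared_error = sum((x - mu) ** 2 for x in values)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - squared_error / (2.0 * sigma ** 2))


def gaussian_mle(values):
    """Closed-form maximum likelihood estimates (mean, standard deviation)."""
    mu = fmean(values)
    sigma = math.sqrt(fmean([(x - mu) ** 2 for x in values]))
    return mu, sigma


# Hypothetical SPI values for one node (e.g., one calendar month across years).
spi_values = [-1.2, -0.4, 0.3, 0.9, -0.1, 1.4, -0.8, 0.2]

mu_hat, sigma_hat = gaussian_mle(spi_values)
best_log_likelihood = gaussian_log_likelihood(spi_values, mu_hat, sigma_hat)
```

Any other parameter pair gives a log likelihood no larger than `best_log_likelihood`, which is a quick way to sanity-check the fitted values.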
Inference algorithms such as variable elimination, likelihood weighting, rejection sampling, and Gibbs sampling are generally used to compute conditional probabilities in Bayesian networks. In this study, the posterior probability was calculated using likelihood weighting [40,41], an approximate inference algorithm for Bayesian networks [17,33]. Likelihood weighting is a simple method that can be applied to both discrete and continuous nodes [42]. It fixes the values of the evidence variables and samples only the nonevidence variables [33]. To infer the conditional probability $P(X = x \mid E = e)$, where $E$ denotes the set of observed nodes of the network and $X$ denotes the nodes not contained in $E$, $P(X = x, E = e)$ and $P(E = e)$ were estimated using the posterior probability equation. A path probability distribution $S(x, e)$ and a weighting distribution $w(x, e)$ were used to generate the relative approximation of the probabilities $P(X = x, E = e)$ and $P(E = e)$, as given in

$$S(x, e) = \prod_{i=1}^{l} P\left(x_{i} \mid \mathrm{parents}(X_{i})\right), \tag{5}$$

$$w(x, e) = \prod_{j=1}^{m} P\left(e_{j} \mid \mathrm{parents}(E_{j})\right). \tag{6}$$

The weighted probability of $P(X = x, E = e)$ was calculated by multiplying (5) and (6), and $P(E = e)$ was estimated by the weighting distribution [43]. Therefore, the conditional probability can be estimated from the ratio of $P(X = x, E = e)$ to $P(E = e)$.
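Likelihood weighting can be illustrated on a small discrete network. This is a toy sketch, not the BNDF model: the three binary nodes, their chain structure, and all probabilities are hypothetical, whereas the nodes in this study are continuous Gaussian variables. The mechanics are the same: evidence nodes are fixed and contribute to the weight, and all other nodes are sampled from their conditional distributions.

```python
import random

# Toy chain past -> current -> next of binary "drought" flags (hypothetical).
# Each entry: (node name, parent names, P(node is True | parent values)).
NETWORK = [
    ("past", (), {(): 0.4}),
    ("current", ("past",), {(True,): 0.7, (False,): 0.2}),
    ("next", ("current",), {(True,): 0.6, (False,): 0.1}),
]


def weighted_sample(evidence, rng):
    """Draw one sample in topological order and return it with its weight."""
    sample, weight = {}, 1.0
    for name, parents, cpt in NETWORK:
        p_true = cpt[tuple(sample[p] for p in parents)]
        if name in evidence:
            # Evidence node: fix its value and multiply in its likelihood.
            sample[name] = evidence[name]
            weight *= p_true if evidence[name] else 1.0 - p_true
        else:
            # Nonevidence node: sample from its conditional distribution.
            sample[name] = rng.random() < p_true
    return sample, weight


def likelihood_weighting(query, evidence, n_samples=20000, seed=0):
    """Estimate P(query is True | evidence) as a ratio of summed weights."""
    rng = random.Random(seed)
    weight_query = weight_total = 0.0
    for _ in range(n_samples):
        sample, weight = weighted_sample(evidence, rng)
        weight_total += weight
        if sample[query]:
            weight_query += weight
    return weight_query / weight_total


# Exact values from the toy tables: 0.28 / 0.40 = 0.7 and 0.7*0.6 + 0.3*0.1 = 0.45.
p_past_given_current = likelihood_weighting("past", {"current": True})
p_next_given_past = likelihood_weighting("next", {"past": True})
```

The numerator and denominator of the returned ratio play the roles of the weighted estimates of the joint probability and the evidence probability, respectively.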
A one-month lead forecasting example is shown in Figure 4, which demonstrates the application of the BNDF model for the Wonju station in March 2008. The observed SPI values in January and February from 1973 to 2008 were used to construct nodes $D_{t-1}$ and $D_t$. Similarly, node $M_{t+1}$ was constructed using observed SPI values from 1973 to 2007 and the APCC MME precipitation for March 2008. The prediction of the proposed model, based on the posterior probability, was then calculated using the likelihood weighting method.

Drought Probability Forecasting.
The results of this study demonstrate a probabilistic drought forecasting method that conveys prediction uncertainty. The proposed BNDF model was described in Section 2.2. Specifically, Figure 4 shows the probabilistic forecast result in the form of the PDF of SPI ($F_{\mathrm{Mar}} \sim N(-1.067, 0.9872)$) for Wonju station in March 2008. The predicted PDF of SPI also followed a Gaussian distribution, and the confidence intervals (CIs) estimated from the quantiles of the predicted PDF are given in Table 1.
Table 1 shows the forecast results of the predicted PDF, namely, the parameters of the predicted Gaussian distribution, the 95% CIs, the observed SPI, and the drought occurrence probability. The observed drought conditions (observed SPI below zero) are in italic font in Table 1; in almost all of these cases, the drought occurrence probability (the area under the predicted PDF between minus infinity and zero) is over 0.5. The probabilistic forecast results were estimated using the BNDF model from March 2008 to December 2012 for the study area. Figure 5 shows SPI probabilistic forecast examples with 50% and 95% CIs and the time series of observed SPI data, which are represented by dotted lines. The observed drought condition (SPI below zero) is also indicated in the figure by the color bar. The figures show that the 95% CIs contained almost all of the observations and the 50% CIs included a sufficient number of observations. In Figure 5, most of the study area experienced drought for two years from 2008 to 2009, and the predicted results showed a similar trend in the drought condition.
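The confidence intervals and the drought occurrence probability described above follow directly from the predicted Gaussian. The sketch below uses the Wonju March 2008 parameters quoted in the text and assumes that the second parameter (0.9872) is a standard deviation, which the source does not state explicitly; the function names are hypothetical.

```python
from statistics import NormalDist

# Predicted SPI distribution; treating 0.9872 as a standard deviation is an assumption.
forecast = NormalDist(mu=-1.067, sigma=0.9872)


def central_interval(dist, level):
    """Central interval of `dist` covering the given probability level."""
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


def coverage(intervals, observations):
    """Fraction of observations that fall inside their forecast interval."""
    hits = sum(lo <= obs <= hi for (lo, hi), obs in zip(intervals, observations))
    return hits / len(observations)


ci95 = central_interval(forecast, 0.95)
ci50 = central_interval(forecast, 0.50)

# Drought occurrence probability: area under the PDF to the left of SPI = 0.
p_drought = forecast.cdf(0.0)
```

Applying `coverage` to a series of monthly intervals and observed SPI values gives a simple numerical counterpart to the visual check in Figure 5.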
In addition, the Ranked Probability Score (RPS) [44] was used to measure the overall performance of the probabilistic forecasts [45]. The RPS evaluates how well the forecast probability distribution matches the observed outcome and is expressed as

$$\mathrm{RPS} = \sum_{m=1}^{M} \left(\mathrm{CDF}_{\mathrm{fc},m} - \mathrm{CDF}_{\mathrm{obs},m}\right)^{2}, \tag{7}$$

where $\mathrm{CDF}_{\mathrm{fc},m}$ is the cumulative distribution function (CDF) of the forecasts (the probabilistic drought forecast result) and $\mathrm{CDF}_{\mathrm{obs},m}$ is the CDF of the observations, both evaluated for category $m$ of $M$ categories. A lower RPS value indicates a smaller forecast probability error, and a perfect forecast would result in an RPS value of zero [46]. The average RPS values from September 2008 to December 2012 for each station are shown in Table 2; the average RPS was between 1.3 and 1.7. Compared with the results reported in [47], where the drought outlook was performed using the ensemble technique, the overall average RPS (1.6) of the BNDF model indicated a statistically more accurate forecast.

3.2. Drought Outlook.
This section demonstrates the drought outlook method and its corresponding results for probabilistic drought forecasting. A drought outlook provides the expected drought status, indicates whether drought conditions are improving or deteriorating, and compares them with the current drought condition [12]. Unlike traditional weather forecasts, which consist of weather maps that predict exactly how much rain might fall or the daily maximum temperature of an area, outlooks offer users forecasts of future weather conditions relative to what is normal for the region (NOAA, https://www.climate.gov/). The NOAA CPC provides seasonal and monthly drought outlook information. Climate outlook information is widely provided by meteorological administrations in countries such as the USA, Australia, New Zealand, and South Korea.

Figure 6 illustrates the framework of the drought outlook proposed in this study, and its corresponding procedure is outlined below. First, the current drought condition (mild, moderate, severe, or extreme drought, or no drought) was determined using the SPI value. Then the drought occurrence probability was estimated from the PDF of the drought forecast produced by the BNDF model. The forecasted drought occurrence was determined from the cumulative probability that the SPI takes a value less than or equal to zero; we supposed that drought begins when the SPI falls below zero [38]. If the CDF of the drought forecast at SPI = 0 (the area under the PDF to the left of zero) is 0.5 or higher, then drought is expected to occur. This process is expressed using the CDF of the drought forecast in

$$\mathrm{CDF}_{\text{drought forecasting}}(\mathrm{SPI} = 0) < 0.5: \ \text{no expected drought}, \tag{8a}$$

$$\mathrm{CDF}_{\text{drought forecasting}}(\mathrm{SPI} = 0) \geq 0.5: \ \text{expected drought}. \tag{8b}$$
Based on (8a) and (8b), when the CDF of the drought forecast at SPI = 0 is lower than 0.5, we conclude that drought will not occur; if it is 0.5 or higher, we expect drought to occur. When drought is expected, the future drought condition can be estimated by comparing the probabilities of each drought condition, where the drought condition is classified as mild (D1), moderate (D2), severe (D3), or extreme drought (D4), as listed in Table 3.
The drought occurrence probability of each condition was calculated as the area under the PDF curve bounded by the SPI values in Table 3. The drought condition with the highest probability is adopted as the future drought condition. The drought outlook was decided by comparing the current drought condition with its forecasted value, as shown in the last step of Figure 6. If the future drought condition is more severe than the current condition, this forms the basis for expecting drought to develop or intensify. In Figure 7, the CDF of the drought forecast was estimated at 0.84 (the area under the curve between minus infinity and zero, indicated by the shaded area in Figure 7), which indicates that drought will occur in the next month. In addition, the probability of mild drought (D1) is the highest, at 0.38, compared with D2, D3, and D4. To decide the drought outlook, we compared the current drought condition (calculated from the observed SPI) with the expected drought condition (forecasted as D1). Finally, the drought outlook for the next month was determined. If the current drought condition is D2, then the drought outlook is forecast as "drought remains but is weak." Figure 8 shows drought outlook examples for August and September 2012; South Korea experienced drought from May to August 2012.
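The outlook procedure can be sketched end to end as follows. Table 3 is not reproduced in this text, so the class bounds below (D1: -1.0 to 0, D2: -1.5 to -1.0, D3: -2.0 to -1.5, D4: below -2.0) are an assumption based on the commonly used SPI classification; the forecast parameters, the function names, and the outlook labels other than the idea of a weakening drought are hypothetical.

```python
from statistics import NormalDist

SEVERITY = {"none": 0, "D1": 1, "D2": 2, "D3": 3, "D4": 4}


def classify_spi(spi):
    """Map an SPI value to a drought class (assumed class bounds)."""
    if spi >= 0.0:
        return "none"
    if spi > -1.0:
        return "D1"
    if spi > -1.5:
        return "D2"
    if spi > -2.0:
        return "D3"
    return "D4"


def class_probabilities(forecast):
    """Area under the forecast PDF within each drought class."""
    c0, c10, c15, c20 = (forecast.cdf(t) for t in (0.0, -1.0, -1.5, -2.0))
    return {"D1": c0 - c10, "D2": c10 - c15, "D3": c15 - c20, "D4": c20}


def drought_outlook(current_spi, forecast):
    """Return (current class, expected class, P(SPI <= 0), outlook label)."""
    p_drought = forecast.cdf(0.0)
    current = classify_spi(current_spi)
    if p_drought < 0.5:  # rule (8a)
        expected = "none"
    else:  # rule (8b): pick the most probable drought class
        probs = class_probabilities(forecast)
        expected = max(probs, key=probs.get)
    if expected == "none":
        label = "no drought expected"
    elif SEVERITY[current] < SEVERITY[expected]:
        label = "drought develops or intensifies"
    elif SEVERITY[current] == SEVERITY[expected]:
        label = "drought persists"
    else:
        label = "drought remains but weakens"
    return current, expected, p_drought, label


# Hypothetical case: current SPI in class D2, forecast centred in the mild range.
example = drought_outlook(-1.2, NormalDist(mu=-0.9, sigma=0.9))
```

With these hypothetical parameters the drought probability is about 0.84 and D1 is the most probable class, which resembles the Figure 7 example, although the actual parameters behind that figure are not given in the text.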

Conclusions
In this study, we proposed a new stochastic drought forecasting method and a drought outlook framework using probabilistic drought forecasting results. The proposed BNDF model estimated near-future drought with forecast uncertainty via Bayesian networks. Bayesian networks are useful tools that can be applied to complex systems with a large number of variables [48] and are efficient under certain circumstances [49]. In addition, the model structure is easy to understand because it is based on nodes and arrows. A benefit of the model is its probabilistic presentation, which allows uncertainty to be assessed explicitly. The predictions of the BNDF model consisted of the Gaussian distribution of SPI, with the forecasting uncertainty expressed through the corresponding CIs. The significant agreement between the observed and forecasted data indicated that the BNDF model produced reliable results. Moreover, this study suggested a drought outlook framework using probabilistic drought forecasting. The drought outlook predicted the changes in drought status in the coming months, which can make forecast information understandable to the public.

Probabilistic drought forecasting offers flexibility in identifying and responding to future drought risk. In the current study, the simple BNDF model considered only the past and forecasted SPI for meteorological drought forecasting; however, the applicability of Bayesian networks can be extended to forecast other types of drought (e.g., agricultural and hydrological) by incorporating other hydroclimatological variables.

Figure 1: Weather stations whose rainfall data are used in this study.

Figure 2: Schematic diagram of the Bayesian network-based drought forecasting.

Figure 4: An example of the Bayesian network-based drought forecasting model for Wonju station (*one-month lead predicted precipitation of APCC MME was applied to calculate March SPI).

Figure 5: Results of drought forecasting (2008/03–2012/12) with confidence intervals (drought states, where the SPI value is below zero, are indicated by the background bar color).
Figure 6: Drought outlook framework using the BNDF results (decision branches: current drought < expected drought; current drought = expected drought; current drought > expected drought).

Figure 8: Map of the drought outlook in the study area.

Table 1: Results of drought forecasting at Wonju station from September 2008 to December 2009 (*observed drought conditions, where the observed SPI is below zero, are in italic font in the table).

Table 2: Average RPS values for 16 stations from September 2008 to December 2012.
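The RPS values summarized in Table 2 could be computed along the following lines. This is a sketch under stated assumptions: the forecast categories are taken to be the drought classes bounded at SPI = -2.0, -1.5, -1.0, and 0 plus a final "no drought" category, which may differ from the categories actually used in the study, and the function names are hypothetical.

```python
from statistics import NormalDist, fmean

# Assumed upper bounds of the drought categories D4, D3, D2, and D1.
THRESHOLDS = (-2.0, -1.5, -1.0, 0.0)


def ranked_probability_score(forecast, observed_spi, thresholds=THRESHOLDS):
    """Sum of squared differences between forecast and observed CDFs."""
    score = 0.0
    for t in thresholds:
        cdf_fc = forecast.cdf(t)
        # The observed CDF is a step function: 1 once the category contains the observation.
        cdf_obs = 1.0 if observed_spi <= t else 0.0
        score += (cdf_fc - cdf_obs) ** 2
    # The final "no drought" category contributes (1 - 1)^2 = 0 and is omitted.
    return score


def average_rps(forecasts, observations):
    """Mean RPS over paired monthly forecasts and observed SPI values."""
    return fmean(
        ranked_probability_score(fc, obs)
        for fc, obs in zip(forecasts, observations)
    )


# A wide forecast scored against a wet month, and a sharp forecast that is exactly right.
rps_wide = ranked_probability_score(NormalDist(0.0, 1.0), 0.5)
rps_sharp = ranked_probability_score(NormalDist(-1.2, 1e-6), -1.2)
```

A sharp forecast centred on the observed class scores essentially zero, while wider or misplaced forecasts are penalized in proportion to how much probability falls in the wrong categories.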