Prediction of Drought Severity Using Model-Based Clustering

Drought is a common climatic extreme that frequently spreads across large spatial and time scales. It affects the living standards of people around the globe more than any other climate extreme. Therefore, the present study proposes a new technique, known as model-based clustering of categorical drought state sequences (MBCCDSS), for monthly prediction of drought severity, so that decision-makers can be informed in time to plan reliable actions that minimize the negative impacts of drought. The proposed technique is based on the expectation-maximization (EM) algorithm for finite mixtures with first-order Markov model components. The approach is validated on six meteorological stations in the northern area of Pakistan. The study outcomes provide a basis for exploring and framing further assessments to mitigate drought impacts at the selected stations.


Introduction
Drought is a multifaceted and recurring event characterized by insufficient precipitation, which has significant effects on hydrological systems, agriculture, and society [1,2]. Drought lasts for a long time and brings extreme meteorological consequences, damaging crop yields and other plant reproduction [3]. In recent decades, drought has dramatically impacted the environment and economies worldwide [4,5]. Determining the onset and termination times of a drought remains problematic for drought management. The effects of drought accumulate slowly over time and may linger for an extended period. Although these effects are not immediately visible, they become severe without proper action and persist long after the drought has ended [6-8].
According to drought occurrences and their characteristics, the well-known drought categories are meteorological, agricultural, hydrological, and socioeconomic [9,10]. Among these, meteorological drought is a climatic event associated with a decrease in precipitation. In contrast, all other drought categories have more extensive human and social features [8,11]. Moreover, meteorological drought can lead to the other three types of drought. Because of the intricacy and severity of drought, it is challenging to recognize and evaluate drought characteristics. Therefore, in recent decades, many drought indices have been developed to assess and monitor drought events. Reliable, high-quality drought knowledge is essential for mitigation policies and preparedness in disaster-stricken regions globally. Knowledge about drought occurrence is crucial for early warning that lessens the adverse effects. Several drought indices are available in the literature and have been used by decision-makers to mitigate the negative impacts of drought.
There are several commonly known drought indices. For example, Palmer [12] proposed the Palmer Drought Severity Index (PDSI), which incorporates soil moisture, precipitation, and temperature in a water balance model. Gibbs and Maher [13] introduced the Decile Index (DI), Shafer and Dezman [14] proposed the Surface Water Supply Index (SWSI), and McKee et al. [15] introduced the Standardized Precipitation Index (SPI) as a meteorological index. Although the indices differ subtly, the present analysis uses the SPI [15], which has frequently been used for drought monitoring and has been endorsed by the World Meteorological Organization [16,17]. It produces a consistent interpretation across various regimes and climates. Furthermore, it has favorable characteristics for forecasting and risk analysis with probabilistic approaches [18-20].
Moreover, multiple techniques have been developed in various studies to evaluate and predict drought occurrences [21-24]. However, drought is a complicated dynamic process; therefore, much more fundamental work is needed to clarify the critical issues and to improve both the monitoring and prediction of droughts. Hence, it is important to treat drought as a predictable dynamic system, which helps to reduce its critical effects [5,23,25,26]. Therefore, the current study proposes a new technique, known as model-based clustering of categorical drought state sequences (MBCCDSS), which groups the categorical drought state sequences to predict drought severity at the selected stations. The MBCCDSS may inform decision-makers accurately and in time to plan reliable actions that mitigate negative drought impacts.

Standardized Precipitation Index (SPI).
The SPI is commonly used for computing and recording drought occurrences [15]. It can be calculated for different time scales based on monthly precipitation data. It provides a spatially reliable interpretation across several climates [20,27,28] (Guttman 1998). Furthermore, the SPI is widely used across geographical and temporal settings. Its simplicity of calculation and availability make it the most familiar index worldwide. Usually, SPI-1 and SPI-3 are used for meteorological drought, and SPI-6 and SPI-9 for agricultural drought. Moreover, hydrological drought is usually assessed by SPI-12 and SPI-24 [29,30]. The present study considers SPI-1 for quantifying drought occurrences, using data from January 1971 to December 2017.
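To illustrate what "standardizing" a precipitation series means, the following minimal sketch accumulates precipitation over k months and maps each value to a standard-normal quantile. It is not the SPI procedure of [15]: the SPI fits a parametric distribution (commonly the gamma) to the accumulated precipitation, whereas this sketch swaps in an empirical, rank-based probability (plotting position rank/(n + 1)) so that it runs with the Python standard library alone. The function names `rolling_sum` and `empirical_standardized_index` and the toy data are hypothetical.

```python
from statistics import NormalDist


def rolling_sum(precip, k):
    """k-month accumulated precipitation (k = 1 leaves the series unchanged)."""
    return [sum(precip[i - k + 1:i + 1]) for i in range(k - 1, len(precip))]


def empirical_standardized_index(values):
    """Map each value to a standard-normal quantile through its rank.

    Assumption: the cumulative probability is estimated by rank / (n + 1)
    instead of a fitted gamma distribution; tied values share an average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # 1-based average rank of the tie group
        for t in range(pos, end + 1):
            ranks[order[t]] = avg_rank
        pos = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


# Toy monthly precipitation totals (made-up numbers, not station data).
toy_precip = [10.0, 30.0, 20.0]
index_1 = empirical_standardized_index(rolling_sum(toy_precip, 1))
```

Negative index values indicate drier-than-usual months and positive values wetter-than-usual months; the thresholds that turn such values into drought states follow [44] and are not reproduced here.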

Model-Based Clustering of Categorical Drought State Sequences (MBCCDSS).
The primary aim of clustering is to group data so that objects within a group share similar information, while objects in different groups differ from one another. It is prevalent in statistics and computer science due to its great variety of applications.
There are numerous clustering techniques in the literature. Among them are various hierarchical clustering algorithms [31,32] and the well-known k-means [33] and k-medoids [34] clustering algorithms. Model-based clustering groups the objects of the data under the assumption that each object in a cluster can be regarded as a sample from some probability distribution [35,36]. When there are several data groups, several distributions are required, and finite mixture models are needed [37]. Model-based clustering performs very well at separating objects into distinct groups [38]. Many challenging applications can be addressed by this technique, including mass spectrometry data [38,39], text classification [40], and social networks [41]. Some work on model-based clustering has been done for time series [42] and regression time series [43]. Many applications can be handled more reliably by clustering categorical sequences [39,41-43]; however, in drought analysis, this approach has not yet received much attention. In drought classification, the analysis of categorical sequences is important for obtaining consistent results. Therefore, the present study proposes MBCCDSS, which considers the transition pattern of the drought states and provides a basis for using model-based clustering to obtain more reliable results about drought occurrences.
The MBCCDSS is based on finite mixture modeling. A finite mixture can be written as
$$f(x \mid \Theta) = \sum_{k=1}^{K} \alpha_k f_k(x \mid \theta_k), \qquad (1)$$
where $K$ is the total number of component distributions $f_k(\cdot \mid \theta_k)$ with corresponding parameter vectors $\theta_k$, and $\alpha_1, \alpha_2, \ldots, \alpha_K$ are the mixing proportions, subject to $\alpha_k > 0$ and $\sum_{k=1}^{K} \alpha_k = 1$, with $\Theta = (\alpha_1, \ldots, \alpha_{K-1}, \theta_1^T, \ldots, \theta_K^T)^T$ denoting the entire parameter vector that has to be estimated. The MBCCDSS models each data group by first-order Markov model components and is applied to sequences of drought states. These sequences reflect the transition behavior of drought states at the application site. The drought states (extremely dry (ED), severely dry (SD), normal dry (ND), median dry (MD), median wet (MW), severely wet (SW), and extremely wet (EW)) are classified according to [44]. Now, let $X_i = (X_{i1}, X_{i2}, \ldots, X_{iS_i})^T$ denote the $i$-th categorical drought state sequence of length $S_i$ following the first-order Markov model with $p$ unique states. Then, we can write
$$P(X_i = x_i) = P(X_{i1} = x_{i1}) \prod_{s=2}^{S_i} P(X_{is} = x_{is} \mid X_{i,s-1} = x_{i,s-1}),$$
where $x_{is}$, $s = 1, 2, \ldots, S_i$, takes values in $\{1, 2, \ldots, p\}$ and is the drought state observed in the $s$-th position of $x_i$. To fix the notation, we denote the initial state probability as $\beta_j = P(X_{i1} = j)$ and the transition probability as
$$\gamma_{jj'} = P(X_{is} = j' \mid X_{i,s-1} = j).$$

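Now it can be written as
$$P(X_i = x_i) = \prod_{j=1}^{p} \beta_j^{I(x_{i1} = j)} \prod_{j=1}^{p} \prod_{j'=1}^{p} \gamma_{jj'}^{y_{ijj'}},$$
where $I(\cdot)$ is the indicator function, $\gamma_{jj'}$ is the probability of a transition from state $j$ to state $j'$, and $y_{ijj'}$ is the number of transitions from state $j$ to state $j'$ within the $i$-th sequence. We assume that each categorical sequence originates from one of the $K$ components. The number of components $K$ is chosen by minimizing the Bayesian information criterion (BIC) [45]. Using this notation, with the $p \times p$ matrix $Y_i$ having elements $y_{ijj'}$, the finite mixture model of equation (1) with first-order Markov model components can be written as
$$f(x_{i1}, Y_i \mid \Theta) = \sum_{k=1}^{K} \alpha_k \prod_{j=1}^{p} \beta_{kj}^{I(x_{i1} = j)} \prod_{j=1}^{p} \prod_{j'=1}^{p} \gamma_{kjj'}^{y_{ijj'}}, \qquad (2)$$
where $\beta_{kj}$ and $\gamma_{kjj'}$ are the initial state and transition probabilities of the $k$-th component, and $\Theta$ collects the mixing proportions and all component parameters. The information of the $i$-th sequence is summarized by the first state observed and the transition frequency matrix, i.e., the pair $(x_{i1}, Y_i)$, which is a minimal sufficient statistic for estimating the parameters of the model in equation (2). The parameters are estimated by the expectation-maximization (EM) algorithm [46]. The EM algorithm consists of two steps: expectation (E step) and maximization (M step). In the E step, the algorithm finds the conditional expectation of the complete-data log-likelihood function given the observed data; in the M step, this conditional expectation is maximized with respect to $\Theta$. In the E step, the posterior probabilities are calculated at the $l$-th iteration as
$$z_{ik}^{(l)} = \frac{\alpha_k^{(l-1)} \prod_{j=1}^{p} \left(\beta_{kj}^{(l-1)}\right)^{I(x_{i1} = j)} \prod_{j=1}^{p} \prod_{j'=1}^{p} \left(\gamma_{kjj'}^{(l-1)}\right)^{y_{ijj'}}}{\sum_{k'=1}^{K} \alpha_{k'}^{(l-1)} \prod_{j=1}^{p} \left(\beta_{k'j}^{(l-1)}\right)^{I(x_{i1} = j)} \prod_{j=1}^{p} \prod_{j'=1}^{p} \left(\gamma_{k'jj'}^{(l-1)}\right)^{y_{ijj'}}},$$
and the M step updates the parameter estimates by the following equations, where $n$ is the number of sequences:
$$\alpha_k^{(l)} = \frac{1}{n} \sum_{i=1}^{n} z_{ik}^{(l)}, \qquad \beta_{kj}^{(l)} = \frac{\sum_{i=1}^{n} z_{ik}^{(l)} I(x_{i1} = j)}{\sum_{i=1}^{n} z_{ik}^{(l)}}, \qquad \gamma_{kjj'}^{(l)} = \frac{\sum_{i=1}^{n} z_{ik}^{(l)} y_{ijj'}}{\sum_{i=1}^{n} z_{ik}^{(l)} \sum_{j''=1}^{p} y_{ijj''}}.$$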
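The estimation procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: states are coded 0, ..., p-1 rather than 1, ..., p, the posteriors are initialized at random, a fixed number of iterations replaces a convergence check, rows of the transition matrix with no observed transitions are set to uniform, and probabilities are floored at 1e-300 before taking logs. The names `transition_counts` and `em_markov_mixture` and the toy sequences are hypothetical.

```python
import math
import random


def transition_counts(seq, p):
    """Return the p x p matrix Y with Y[j][j2] = number of j -> j2 transitions."""
    Y = [[0] * p for _ in range(p)]
    for a, b in zip(seq, seq[1:]):
        Y[a][b] += 1
    return Y


def _normalize(weights):
    total = sum(weights)
    if total == 0:  # assumption: fall back to a uniform distribution
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def _safe_log(v):
    return math.log(max(v, 1e-300))  # floor avoids log(0)


def em_markov_mixture(seqs, p, K, n_iter=100, seed=0):
    """EM for a K-component mixture of first-order Markov chains."""
    rng = random.Random(seed)
    n = len(seqs)
    first = [s[0] for s in seqs]                  # x_{i1}
    Ys = [transition_counts(s, p) for s in seqs]  # Y_i
    # Random soft initialization of the posterior probabilities z_{ik}.
    z = [_normalize([rng.random() + 0.1 for _ in range(K)]) for _ in range(n)]
    for _ in range(n_iter):
        # M step: weighted relative frequencies.
        alpha = [sum(z[i][k] for i in range(n)) / n for k in range(K)]
        beta = [_normalize([sum(z[i][k] for i in range(n) if first[i] == j)
                            for j in range(p)]) for k in range(K)]
        gamma = [[_normalize([sum(z[i][k] * Ys[i][j][j2] for i in range(n))
                              for j2 in range(p)]) for j in range(p)]
                 for k in range(K)]
        # E step: posterior component memberships, computed in log space.
        for i in range(n):
            logw = []
            for k in range(K):
                lw = _safe_log(alpha[k]) + _safe_log(beta[k][first[i]])
                for j in range(p):
                    for j2 in range(p):
                        if Ys[i][j][j2]:
                            lw += Ys[i][j][j2] * _safe_log(gamma[k][j][j2])
                logw.append(lw)
            m = max(logw)
            z[i] = _normalize([math.exp(lw - m) for lw in logw])
    return alpha, beta, gamma, z


# Toy data: three sequences that stay in state 0, three that alternate 0 and 1.
toy_sequences = [[0, 0, 0, 0, 0, 0]] * 3 + [[0, 1, 0, 1, 0, 1]] * 3
alpha, beta, gamma, z = em_markov_mixture(toy_sequences, p=2, K=2)
```

Each sequence enters the algorithm only through its first state and its transition count matrix, which is the sufficient-statistic property noted in the text. In practice the algorithm would be run from several starting points and for several values of K, with the BIC used to choose among the fits.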

Prediction of Drought States.
Using the set of transition probability matrices $\Gamma_1, \Gamma_2, \ldots, \Gamma_K$ and a probability distribution $\pi_1, \pi_2, \ldots, \pi_K$ over the mixture components, the $L$-step transition probability matrix can be found by
$$\Gamma^{(L)} = \sum_{k=1}^{K} \pi_k \Gamma_k^{L},$$
where $\Gamma_k^{L}$ is the matrix $\Gamma_k$ raised to the power $L$. For instance, $\Gamma_k^{4} = \Gamma_k \Gamma_k \Gamma_k \Gamma_k$. The choice of the distribution $\pi_1, \pi_2, \ldots, \pi_K$ depends on the specific application. In the current setting, the estimated mixing proportions (i.e., $\alpha_1, \alpha_2, \ldots, \alpha_K$) and the estimated posterior probabilities associated with a specific sequence (i.e., $z_{i1}, z_{i2}, \ldots, z_{iK}$) are used to calculate the probability distribution of future drought state occurrences.
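As a small worked illustration of the L-step prediction, the sketch below mixes the L-th powers of the component transition matrices with a weight vector and reads off the row of the current state. The two 2 x 2 matrices and the equal weights are made-up toy values (the study uses seven drought states), states are coded from 0, and the helper names are hypothetical.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, L):
    """Raise a square matrix to a non-negative integer power L."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(L):
        result = mat_mul(result, A)
    return result


def l_step_matrix(weights, gammas, L):
    """Return sum_k weights[k] * (gammas[k] raised to the power L)."""
    n = len(gammas[0])
    mixed = [[0.0] * n for _ in range(n)]
    for w, G in zip(weights, gammas):
        GL = mat_pow(G, L)
        for i in range(n):
            for j in range(n):
                mixed[i][j] += w * GL[i][j]
    return mixed


def predict_distribution(weights, gammas, current_state, L=1):
    """Probability of each state L steps ahead, given the current state."""
    return l_step_matrix(weights, gammas, L)[current_state]


# Toy transition matrices for two components (illustrative values only).
toy_gammas = [
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.2, 0.8], [0.3, 0.7]],
]
toy_weights = [0.5, 0.5]  # could be mixing proportions or sequence posteriors

one_step = predict_distribution(toy_weights, toy_gammas, current_state=0, L=1)
two_step = predict_distribution(toy_weights, toy_gammas, current_state=0, L=2)
# one_step is approximately [0.55, 0.45]; two_step is approximately [0.57, 0.43]
```

Passing the posterior probabilities of one sequence as the weights tailors the forecast to that sequence, whereas passing the mixing proportions gives a forecast averaged over the components.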

Application
The proposed technique is validated on six meteorological stations in the northern area of Pakistan (Figure 1). The region was selected for its structural importance and significant climatological characteristics [47]. Atmospheric conditions in the selected region have significant effects on other parts of the country. Moreover, several changes have been observed in the country due to fluctuating seasonal weather patterns in various regions. The highest temperatures have been observed over large parts of the country, and these parts have been highly affected by global warming [48,49]. Global warming affects not only Pakistan but the whole world. Its impact can be observed in temperature and water, causing high temperatures and water deficiency. Future climate changes can be problematic, as they substantially affect rural livelihoods and coping capacity. Furthermore, drought occurrences can damage several vital sectors of the country; for example, they can negatively affect the economy, agriculture, and natural resources. Therefore, it is important to understand drought occurrences more promptly by developing comprehensive and efficient techniques. To this end, the present study proposes a new technique that improves the ability to monitor drought occurrences in the selected area. These findings may enhance the capabilities of drought monitoring and mitigation policies.

Results.
Insufficient precipitation and departures from the expected precipitation pattern cause drought events. The summary statistics of precipitation for the selected stations are presented in Table 1. The monthly occurrence of precipitation at the Chilas station is presented in Figure 2, and the precipitation occurrence over the selected period for Gilgit is presented in Figure 3. These two stations are shown as examples; precipitation occurrence at the other selected stations can be presented in the same way. The theoretical versus empirical histograms of SPI-1 for the selected stations are presented in Figure 4. The histograms reveal discrepancies among stations, which may arise from the natural behavior of the data. In the recent past, many researchers have worked on modeling such discrepancies in the data. Moreover, new standardization procedures based on nonparametric functions and mixture distribution functions have been proposed [50], but handling the discrepancy is still under investigation. Furthermore, the temporal behavior of SPI-1 at the various stations is shown in Figure 5. The selected stations show broadly similar behavior over the region for a specific drought index [44]. However, varying distributions can be observed in selected stations (     associated disciplines [51] and has more significant candidacy features for standardization.
Furthermore, the fact that different probability distributions are selected for different stations motivates finite mixture modeling. Therefore, MBCCDSS is proposed for the prediction of the categorical drought states using a mixture of first-order Markov models. The use of Markov models reflects the dynamics of drought occurrences.
The MBCCDSS assumes first-order Markov models in this analysis; however, higher-order Markov models can be included [38]. Furthermore, the MBCCDSS uses the categorical values corresponding to each drought state. These categorical values are specified for the drought states classified according to Niaz et al. [44]. Moreover, it assumes that each categorical sequence of the selected states originates from one of the K components. The mixture model order K is chosen by minimizing the BIC [45]. Based on the BIC values, the mixture model with two components (K = 2) is selected for the analysis. The performance of the model is evaluated with and without initial state probabilities. It can be observed from Table 3 that, in the first case, the maximized log-likelihood (LogL) value is −3467.271, while in the second case, it is −3477.991. As expected, including the initial state probabilities yields a higher LogL value, as the model fits the data slightly better. The difference is rather marginal; based on the BIC, the model with initial state probabilities (BIC = 7061.757) is preferred over the model without initial state probabilities (BIC = 7065.28). The BIC is used here for model selection because it compares favorably with competing criteria in finite mixture modeling.
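The model comparison above can be retraced with a short sketch. It assumes the usual form BIC = -2 LogL + M log(n), in which smaller values are better, M is the number of free parameters, and n is the sample size; neither M nor n is stated in the text, so the sketch only recovers the penalty term implied by the reported LogL and BIC values. The function names are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'smaller is better' form: -2 logL + M log(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def implied_penalty(bic_value, log_likelihood):
    """Penalty term M log(n) implied by a reported BIC and log-likelihood."""
    return bic_value + 2.0 * log_likelihood


# Values reported in the text (Table 3).
reported = {
    "with initial state probabilities": {"logL": -3467.271, "BIC": 7061.757},
    "without initial state probabilities": {"logL": -3477.991, "BIC": 7065.28},
}

# The preferred model is the one with the smallest BIC.
preferred = min(reported, key=lambda name: reported[name]["BIC"])

# Penalty paid by each model for its parameters.
penalties = {name: implied_penalty(v["BIC"], v["logL"])
             for name, v in reported.items()}
```

The model with initial state probabilities carries the larger penalty because it has more free parameters, but its gain in log-likelihood more than offsets it, which is why its BIC is lower.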
Moreover, the mixing proportions and the posterior probabilities associated with a specific sequence are used to calculate the probability distribution of future drought states. The categorical sequences of the drought states are calculated from SPI-1 and used in MBCCDSS to predict drought severity (i.e., ED, SD, MD, ND, MW, SW, and EW) at the selected stations. Furthermore, the outcomes of the MBCCDSS provide clearer and more accurate information about drought occurrences and can be used to support mitigation strategies. Moreover, the probabilities obtained from MBCCDSS may be used to compare various drought indices, obtain more precise results about drought occurrences for the various drought states, study drought propagation, and calculate thresholds for different drought intensities in the selected region. In MBCCDSS, the initial state and transition probabilities are considered constant; that is, the MBCCDSS assumes that the observations are time-homogeneous. However, these probabilities can be modeled as functions of time. Including temporal characteristics would improve the efficiency of MBCCDSS for drought monitoring. Furthermore, the results obtained in this study hold for the existing conditions of the application site; extrapolations based on the present analysis may be unsuitable under future climate conditions.

Conclusion
Drought is a slowly emerging hazard, and determining its occurrence remains an open problem. The consequences of drought accumulate gradually and may last for a long period. Drought affects people's lives more directly than any other natural hazard and has harmful consequences for the society and economy of the country. Therefore, it is necessary to treat drought occurrences as a predictable dynamic system with memory, which helps to minimize the critical effects. A new technique, known as MBCCDSS, is proposed for the monthly prediction of drought severity using model-based clustering. The MBCCDSS employs an EM algorithm for finite mixtures with first-order Markov model components and provides future probabilities for each of the drought states at the selected stations. Moreover, the outcomes of the study may inform decision-makers accurately and in time to plan reliable policies that mitigate the adverse effects of drought.
The results obtained from sequence-1 show that the most likely state to visit in the next month (i.e., January 2018) is ND; the probability associated with this prediction is slightly higher than 0.624. In the other sequences, ND also prevails over the other states. This shows that policymakers should plan according to this drought state (category) to mitigate its negative impacts.

Mathematical Problems in Engineering

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.