Prediction for Various Drought Classes Using Spatiotemporal Categorical Sequences

Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan
Department of Mathematics, College of Sciences and Arts (Muhyil), King Khalid University, Muhyil 61421, Saudi Arabia
Department of Mathematics and Computer, College of Sciences, Ibb University, Ibb 70270, Yemen
National Engineering Research Center of Geographic Information System, School of Geography and Information Engineering, China University of Geosciences (Wuhan), Wuhan 430074, China
State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430079, China
Faculty of Health Studies, University of Bradford, BD7 1DP, Bradford, UK


Introduction
Drought is more volatile than other natural disasters, and traditional assessment and forecasting procedures often fail to predict it. Its barely perceptible onset and multifaceted impacts call for new assessment methodologies [1][2][3][4][5]. Over recent decades, drought has distressed the environment and economic sectors worldwide more prominently than other natural hazards [6][7][8]. Moreover, determining the onset and end of a drought remains challenging for drought management. The effects of drought accumulate slowly over time and may linger for a long period [8][9][10]. Drought can be characterized by a precipitation deficiency that has substantial impacts on agriculture, hydrological systems, and people's living standards [11,12]. Without appropriate measures, these effects grow in severity and persist for the long term, even after the drought has ended [9]. Advancements in drought assessment and monitoring procedures can lead to better drought preparedness and reduce society's vulnerability to drought and its ensuing impacts [8,10,13]. Therefore, it is essential to find more suitable techniques and procedures to predict drought occurrences more promptly. An improved method can support planning for early warning systems, drought mitigation policies, and water resource management, and can reduce the severe effects of drought. Furthermore, the occurrences and characteristics of drought have prompted discussion of various methodologies and techniques. Based on these occurrences and characteristics, authors generally categorize drought into four groups: meteorological, hydrological, agricultural, and socioeconomic [14]. Chang [15] and Eltahir [16] defined meteorological drought as a shortage of precipitation over a region for some period of time.
Several studies have used precipitation data to analyze meteorological droughts [17,18]. Streamflow data have frequently been used to analyze hydrological drought [19][20][21]. Agricultural drought is usually caused by a reduction in soil moisture, which can in turn be driven by meteorological and hydrological droughts. Socioeconomic drought is linked to a shortfall in water resource systems, in which the water supply is unable to meet water demand.
In the past few decades, numerous drought indices have been proposed to assess drought occurrences [22][23][24][25][26]. Drought indices are frequently used to characterize drought and are based on various parameters that describe its spatial and temporal extent. Obtaining accurate and precise information about drought occurrences from drought indices is crucial for an early warning policy, and consistent, reliable drought information plays a crucial part in preparing drought monitoring and mitigation policies. Numerous drought indices, each with strengths and weaknesses, exist in the literature and are used by decision-makers who build action plans for drought early warning systems and mitigation policies. For example, Palmer [27] developed the Palmer Drought Severity Index (PDSI). The PDSI works especially well for subhumid and semiarid regions and provides weekly information on abnormal evapotranspiration deficit for various regions. Information obtained from the PDSI can be helpful for crop management in a region, and the moisture conditions of the region can be assessed. Gommes and Petrassi [28] proposed the National Rainfall Index (NRI). The NRI was used to provide a synthetic overview of rainfall in sub-Saharan African countries, where it served to identify rainfall patterns in various regions. The Surface-Water Supply Index (SWSI) was introduced by Shafer and Dezman [24]. The computation of the SWSI is based on two major sources of irrigation water supply, namely, spring-summer streamflow runoff and reservoir carryover. Both sources are analyzed together to determine the total surface water supply available in a season. Van Rooy [29] developed the Rainfall Anomaly Index (RAI).
The RAI helps to identify geographical anomalies in rainfall patterns across regions. Weghorst [26] introduced the Reclamation Drought Index. Palmer [22] introduced the Crop Moisture Index (CMI). Bhalme and Mooley [23] developed the Bhalme and Mooley Drought Index (BMDI). The BMDI uses precipitation data and provides both negative and positive values to measure drought intensities. McKee et al. [25] developed the Standardized Precipitation Index (SPI). The SPI is computed from a long-term time series of precipitation in a climatic area. Its key characteristic is that it can be computed for different time scales and used to compare different climatic areas. Therefore, the SPI is used extensively for evaluating and recording drought characteristics [30][31][32][33][34][35]. The drought indices mentioned above have frequently been used for drought monitoring in different studies, despite discrepancies among them, to obtain consistent interpretations across several regimes and climates. This study uses the SPI, which is often employed to assess and monitor meteorological drought and is recommended by the World Meteorological Organization [36].
Furthermore, many clustering techniques have been considered in the literature [37][38][39][40][41]. Clustering techniques group data so that observations with similar characteristics fall within the same cluster, while dissimilar observations fall in different clusters. Owing to their variety of applications, clustering techniques are widely used in machine learning, particularly in statistics and computer science [41][42][43][44][45]. Among these techniques, model-based clustering assumes that each cluster of data can be regarded as arising from a probability distribution [46,47]. Because different groups of data may follow different distributions, finite mixture models are preferred [48]. Model-based clustering performs well on spectrometry data, text classification, social networks, and the grouping of distinct objects. It has also been used for time series [49] and regression time series analyses [50]. Several studies of model-based clustering are available in the literature; however, it has not yet received much attention in drought analysis. Therefore, this study develops a new technique, Model-Based Clustering for Spatio-Temporal Categorical Sequences (MBCSTCS), to predict drought occurrences more precisely from spatiotemporal categorical sequences. The performance of the proposed technique is assessed using six meteorological stations in the northern area of Pakistan.

Standardized Precipitation Index (SPI).
The SPI can be computed from a long-term record of precipitation observed as a time sequence in a climatic area. Its key feature is that it can be computed for various time scales, and it is widely used to calculate and record drought occurrences [34,35,51,52]. Analyses at different time scales provide different information. For example, the moisture conditions in different seasons can be assessed using the SPI at a three-month time scale, whereas the SPI at a twelve-month time scale provides information on water deficiency. Furthermore, as a probabilistic measure, the SPI is well suited to forecasting and risk analysis [31,35,53]. The SPI has frequently been used for drought monitoring in several contexts, for example, spatiotemporal analysis, forecasting, frequency analysis, and climatic studies [33,35,51,52]. Because only precipitation is used to determine the climatic condition of a particular area, the SPI offers spatially consistent interpretations across various climates [32,34,35]. Therefore, it can be advantageous in areas where the other parameters required to calculate other indices are unavailable, and it is of considerable relevance across various environmental and temporal circumstances [54].
This study focuses on a new methodology developed to monitor drought more precisely and comprehensively in a specific area. The SPI at various time scales (1, 3, 6, 9, 12, and 24 months) is used for the current analysis.
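The idea of computing the SPI at several time scales can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the study standardizes with a fitted probability distribution, whereas the sketch below swaps in an empirical CDF (the plotting position rank/(n + 1)) so that it needs only the standard library. The function names `rolling_sums` and `empirical_spi` and the precipitation values are hypothetical.

```python
import bisect
from statistics import NormalDist


def rolling_sums(precip, scale):
    """Accumulate precipitation over `scale` consecutive months."""
    return [sum(precip[i - scale + 1 : i + 1]) for i in range(scale - 1, len(precip))]


def empirical_spi(precip, scale):
    """SPI-like index: accumulated totals mapped to standard normal scores.

    Assumption: the empirical plotting position rank / (n + 1) stands in for
    the CDF of a fitted distribution; it always lies strictly inside (0, 1).
    """
    totals = rolling_sums(precip, scale)
    n = len(totals)
    sorted_totals = sorted(totals)
    std_normal = NormalDist()
    spi = []
    for value in totals:
        # Number of totals <= value; tied totals share the same rank.
        rank = bisect.bisect_right(sorted_totals, value)
        spi.append(std_normal.inv_cdf(rank / (n + 1)))
    return spi


# Made-up monthly precipitation (mm), for illustration only.
monthly_precip = [10.0, 0.0, 5.0, 20.0, 15.0, 30.0, 2.0, 8.0]
for scale in (1, 3, 6):
    values = empirical_spi(monthly_precip, scale)
    print(scale, len(values))  # a series of n months yields n - scale + 1 values
```

Note that the number of index values shrinks as the time scale grows, which is why sequences built from different time scales have different lengths.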

Model-Based Clustering for Spatio-Temporal Categorical Sequences (MBCSTCS).
Model-based clustering has been used for time series [49] and regression time series analyses [50]. Various studies of model-based clustering are available in the literature, and the technique is important in many applications; however, it has not yet received much attention in drought analysis. Furthermore, in drought classification, categorical sequences are required to obtain reliable results for drought characterization. From this perspective, this study proposes the MBCSTCS to analyze categorical drought sequences for various time scales and stations. By grouping categorical sequences, the MBCSTCS provides more informative results than the traditional approaches used for prediction. The MBCSTCS reflects the transition behavior of drought classes across various time scales and stations. The drought classes (states) considered for the region are Extremely Dry (ED), Severely Dry (SD), Normal Dry (ND), Median Dry (MD), Median Wet (MW), Severely Wet (SW), and Extremely Wet (EW) [55].
Moreover, the first-order Markov model has a sound rationale in statistical modeling. The MBCSTCS uses first-order Markov model components as the functional form for each data group. In the MBCSTCS, the data groups consist of sequences of drought states. For example, let the observation $X = (x_1, \ldots, x_m)^T$ denote an ordered sequence in which each element $x_j$ takes a categorical value that corresponds to a drought state and is coded by a natural number. It is assumed that the number of unique drought states equals $p$, i.e., $x_j \in \{1, 2, \ldots, p\}$ for $j = 1, 2, \ldots, m$. Using the joint probability expression, it can be written as

$$\Pr(X) = \Pr(x_1)\,\Pr(x_2 \mid x_1)\cdots\Pr(x_m \mid x_1, \ldots, x_{m-1}).$$

In this format, the first-order Markov model provides a convenient way to describe the transitions between states. The probability of moving to a drought state in the next step depends only on the present state and not on the drought states observed in the past. The joint probability under the first-order Markov model is given in the following equation:

$$\Pr(X) = \Pr(x_1)\prod_{j=2}^{m}\Pr(x_j \mid x_{j-1}). \tag{1}$$

To simplify the notation, we use $\beta$ to denote the initial state probability and $\gamma$ to denote the transition probability. For example, $\beta_{x_1}$ is the probability that the initial state is $x_1$, and the transition probability from $x_{j-1}$ to $x_j$ is $\gamma_{x_{j-1}x_j}$. With this notation, we can write

$$\Pr(X) = \beta_{x_1}\prod_{j=2}^{m}\gamma_{x_{j-1}x_j}.$$

Since there are $p$ states in the Markov model, the initial state probabilities can be represented as $\beta = (\beta_1, \ldots, \beta_p)^T$ and the transition matrix as $\Gamma = (\gamma_{jr})_{p \times p}$.
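As a small illustration of the first-order Markov factorization described above, the following Python sketch evaluates the probability of a state sequence from an initial state distribution and a transition matrix. The function name `sequence_log_prob` and the two-state parameter values are hypothetical, and states are coded 0, ..., p − 1 here rather than 1, ..., p for indexing convenience.

```python
import math


def sequence_log_prob(seq, beta, gamma):
    """Log-probability of a state sequence under a first-order Markov chain.

    beta[j]     : probability that the first state is j
    gamma[j][r] : probability of moving from state j to state r
    All probabilities used by the sequence must be strictly positive.
    """
    log_p = math.log(beta[seq[0]])
    for prev, curr in zip(seq, seq[1:]):
        log_p += math.log(gamma[prev][curr])
    return log_p


# Hypothetical two-state chain (p = 2), for illustration only.
beta = [0.5, 0.5]
gamma = [[0.9, 0.1],
         [0.2, 0.8]]
seq = [0, 0, 1, 1]

# 0.5 * 0.9 * 0.1 * 0.8, i.e. approximately 0.036
print(math.exp(sequence_log_prob(seq, beta, gamma)))
```

Working on the log scale avoids numerical underflow for long sequences, such as monthly records spanning several decades.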
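Moreover, for a specific component $k$ of the finite mixture model, $\beta_{x_1}$ and $\gamma_{x_{j-1}x_j}$ are replaced by $\beta_{kx_1}$ and $\gamma_{kx_{j-1}x_j}$, and the model can be written as follows:

$$f(X) = \sum_{k=1}^{K}\alpha_k\,\beta_{kx_1}\prod_{j=2}^{m}\gamma_{kx_{j-1}x_j}, \tag{2}$$

where $\alpha_k$ is the mixing proportion of the $k$th component and $K$ is the number of components. For $n$ observed sequences, with $x_{it}$ denoting the $t$th element of the $i$th sequence, the log-likelihood of equation (2) can be expressed as follows:

$$\log L = \sum_{i=1}^{n}\log\left\{\sum_{k=1}^{K}\alpha_k\prod_{j=1}^{p}\beta_{kj}^{I(x_{i1}=j)}\prod_{j=1}^{p}\prod_{r=1}^{p}\gamma_{kjr}^{\sum_{t=2}^{m_i}I(x_{i,t-1}=j,\;x_{it}=r)}\right\}. \tag{3}$$

In equation (3), $I(\cdot)$ is the indicator function and $m_i$ is the length of the $i$th categorical sequence. The expectation-maximization (EM) algorithm is employed to estimate the parameters [56].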
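To make the mixture likelihood and the posterior probabilities used by the EM algorithm more concrete, the following Python sketch computes the log-likelihood of a set of sequences and the posterior probability that each sequence belongs to each component (the E-step). It is an illustration under stated assumptions, not the implementation used in this study: the function names and all parameter values are hypothetical, states are coded from 0, and the M-step (parameter updates) is omitted.

```python
import math


def component_log_prob(seq, beta_k, gamma_k):
    """Log-probability of one sequence under the k-th Markov component."""
    log_p = math.log(beta_k[seq[0]])
    for prev, curr in zip(seq, seq[1:]):
        log_p += math.log(gamma_k[prev][curr])
    return log_p


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def e_step(sequences, alpha, beta, gamma):
    """Return the mixture log-likelihood and the posterior matrix z[i][k].

    alpha[k] : mixing proportion of component k
    beta[k]  : initial state probabilities of component k
    gamma[k] : transition matrix of component k
    Sequences may have different lengths; all probabilities must be positive.
    """
    log_lik = 0.0
    posteriors = []
    for seq in sequences:
        joint = [math.log(alpha[k]) + component_log_prob(seq, beta[k], gamma[k])
                 for k in range(len(alpha))]
        total = log_sum_exp(joint)
        log_lik += total
        posteriors.append([math.exp(v - total) for v in joint])
    return log_lik, posteriors


# Hypothetical two-component, two-state example with sequences of unequal length.
alpha = [0.5, 0.5]
beta = [[0.5, 0.5], [0.5, 0.5]]
gamma = [[[0.9, 0.1], [0.1, 0.9]],   # component 0: states tend to persist
         [[0.1, 0.9], [0.9, 0.1]]]   # component 1: states tend to alternate
sequences = [[0, 0, 0, 0], [0, 1, 0, 1, 0]]

log_lik, z = e_step(sequences, alpha, beta, gamma)
print(log_lik)
print(z)  # the first sequence favors component 0, the second component 1
```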

Prediction of Future Drought Occurrences for Spatial-Temporal Categorical Sequences.
Given the set of transition probability matrices $\Gamma_1, \Gamma_2, \ldots, \Gamma_K$ and a probability distribution $\pi_1, \pi_2, \ldots, \pi_K$ associated with the mixture components, the $M$-step transition probability matrix can be obtained as

$$\sum_{k=1}^{K}\pi_k\,\Gamma_k^{M},$$

where $\Gamma_k^{M}$ denotes the matrix $\Gamma_k$ raised to the power $M$. The choice of an appropriate distribution $\pi_1, \pi_2, \ldots, \pi_K$ depends on the application. However, the estimated vector of mixing proportions $(\alpha_1, \alpha_2, \ldots, \alpha_K)$ and the estimated vector of posterior probabilities $(z_{i1}, z_{i2}, \ldots, z_{iK})$ associated with a particular sequence can have a significant influence on the computed probability distribution for future drought occurrences.
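The weighted combination of powered transition matrices described above can be sketched in a few lines of Python. Here "M-step" means M transitions ahead, not the maximization step of the EM algorithm. The helper names (`mat_mul`, `mat_power`, `m_step_transition`) and the numerical values are hypothetical; the weights play the role of the component distribution, for example the posterior probabilities of one sequence.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_power(matrix, steps):
    """Raise a square matrix to a non-negative integer power."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(steps):
        result = mat_mul(result, matrix)
    return result


def m_step_transition(gammas, weights, steps):
    """Weighted sum over components of each transition matrix raised to `steps`."""
    size = len(gammas[0])
    mixed = [[0.0] * size for _ in range(size)]
    for gamma_k, pi_k in zip(gammas, weights):
        powered = mat_power(gamma_k, steps)
        for j in range(size):
            for r in range(size):
                mixed[j][r] += pi_k * powered[j][r]
    return mixed


# Hypothetical two-component, two-state example.
gammas = [[[0.9, 0.1], [0.1, 0.9]],
          [[0.5, 0.5], [0.5, 0.5]]]
weights = [0.25, 0.75]            # e.g. posterior probabilities of one sequence
three_step = m_step_transition(gammas, weights, 3)

last_state = 0                    # last observed state of the sequence
print(three_step[last_state])     # distribution over states three steps ahead
```

The row of the resulting matrix that corresponds to the last observed state gives the predicted distribution over states after the chosen number of steps.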

Application
The region was chosen for its structural importance and for climatic characteristics that affect other parts of the country. The outcomes of the study are obtained from six selected stations in the northern area of Pakistan (Figure 1), with time-series data from January 1971 to December 2017, using the SPI at various time scales. The selected stations are important both for the selected region and for other regions of the country. For example, the reservoir system and the agriculture sector are closely tied to the selected region; therefore, climatic variation in the region is significant for other parts of the country [57,58]. Furthermore, fluctuations of weather patterns in other regions of the country also affect the socioeconomic and environmental sectors. Most parts of the country have been facing very high temperatures and are strongly influenced by global warming [58,59]. Extreme climate events, including high temperatures, rainstorms, and droughts, are frequently associated with global warming. Climate warming significantly affects the planet and usually causes high temperatures and water deficiency. These issues are associated with drought occurrences, which damage the environment, natural resources, and people's lives distinctly more than any other natural hazard. Furthermore, drought produces complex consequences for society and the economic sectors of the country. Therefore, it is vital to recognize drought occurrences more promptly by developing comprehensive and efficient frameworks and techniques. In this regard, a new technique is applied to the selected stations to expand the capability of detecting drought occurrences and to improve drought evaluation and assessment.

Results.
The findings of this study are obtained from long time series collected at six climatological stations in the northern area of Pakistan. The selected stations show homogeneous results for a given index when it is calculated at a single time scale [55]. However, across time scales, the observations of the indices may vary. The inconsistency in these observations and the differing generating processes of the drought states motivate the development of a new method (i.e., MBCSTCS). To analyze the spatiotemporal behavior of the drought states, the MBCSTCS treats the various time scales for a particular station as sequences of differing lengths with differing data-generating processes. That is, the observations of the SPI at scale-1 (SPI-1) for the Astore station are taken as sequence-1, the observations of the SPI at scale-3 (SPI-3) for the Astore station are taken as sequence-2, and so on up to the last scale (SPI-24). Sequences are assigned in the same way for the other stations and time scales. The observations of each sequence are assumed to come from one of the mixture components selected for the data. A drought state is assigned to every calculated SPI value, and these states are treated as categorical observations in the computations of this study.
Moreover, Niaz et al. [55] proposed a technique for monthly forecasting of drought intensities using model-based clustering of categorical drought state sequences. That study was performed on various stations at a single time scale. In this study, by contrast, the various time scales are considered jointly for the monthly prediction of drought severity in a region. The outcomes of the current analysis are more appropriate, especially for the selected stations, and help policymakers make better policies related to various kinds of drought, including meteorological, hydrological, and socioeconomic. The current analysis is performed using the R package ClickClust [45], which handles observations arising from several probability distributions (K components). The package is based on finite mixtures with Markov model components and is used to obtain the outcomes related to a specific sequence. The appropriate order K of the mixture model (i.e., the number of components) is identified by minimizing the Bayesian information criterion (BIC) [60]. For a specific sequence, the estimated vector of mixing proportions and the associated estimated vector of posterior probabilities are used to calculate the probability distribution of future transitions from the last state of the sequence. Climatological statistics for the data from the various stations are provided in Figure 2. The R package propagate is used to fit the probability distributions; among the distributions it considers, the appropriate distribution is chosen on the basis of the BIC values. This selection criterion helps to find the best fit for each time scale and station in the analysis.
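The selection of the mixture order by minimizing the BIC can be sketched as follows. This is an illustration under stated assumptions, not the ClickClust implementation: it takes the sample size to be the number of sequences, counts the free parameters of a K-component mixture of first-order Markov chains with p states as (K − 1) + K(p − 1) + Kp(p − 1), and uses made-up log-likelihood values. The function names are hypothetical.

```python
import math


def n_free_parameters(n_components, n_states):
    """Free parameters of a mixture of first-order Markov chains.

    (K - 1) mixing proportions, K * (p - 1) initial state probabilities,
    and K * p * (p - 1) transition probabilities.
    """
    k, p = n_components, n_states
    return (k - 1) + k * (p - 1) + k * p * (p - 1)


def bic(log_lik, n_components, n_states, n_sequences):
    """BIC = -2 * log-likelihood + (number of parameters) * log(sample size)."""
    penalty = n_free_parameters(n_components, n_states) * math.log(n_sequences)
    return -2.0 * log_lik + penalty


def select_order(log_liks, n_states, n_sequences):
    """Return the number of components K with the smallest BIC.

    log_liks maps each candidate K to its maximized log-likelihood.
    """
    return min(log_liks, key=lambda k: bic(log_liks[k], k, n_states, n_sequences))


# Made-up maximized log-likelihoods for K = 1, 2, 3; seven drought states and
# 36 sequences (six stations times six time scales).
candidate_log_liks = {1: -500.0, 2: -495.0, 3: -493.0}
print(select_order(candidate_log_liks, n_states=7, n_sequences=36))
```

With seven states each additional component adds 49 parameters, so a larger K is selected only when it improves the log-likelihood substantially.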
The BIC values of the selected probability distributions that best fit the various time scales and stations are given in Table 1. For example, at the Astore station for scale-1, the BIC value (−1036.5) of the three-parameter (3P) Weibull distribution is the minimum among the candidate distributions. Therefore, the 3P Weibull distribution is considered the best-fitting distribution for the Astore station at scale-1. At the Astore station for scale-3, the Gamma distribution is selected with the minimum BIC value (−1279.1). For scale-6 and scale-9, the Gamma distribution is also the best fit at the Astore station, with minimum BIC values −892.8 [...] (1, 3, 6, 9, 12, and 24). After standardization with the selected probability distribution, the next step is the classification of the SPI into various drought states (Table 2). Figure 3 presents the temporal behavior of the SPI at scale-1 for various stations; the behavior of the SPI at the other selected time scales can be presented in the same way.
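The classification step can be sketched as a simple threshold rule that turns SPI values into a categorical sequence. The thresholds of Table 2 are not reproduced in this section, so the sketch below assumes the conventional SPI class boundaries (±1.0, ±1.5, ±2.0); the boundaries, the function names, and the SPI values are assumptions made for illustration only.

```python
# Drought states ordered from driest to wettest.
STATES = ["ED", "SD", "MD", "ND", "MW", "SW", "EW"]


def classify_spi(value):
    """Assign a drought state to one SPI value.

    Assumption: conventional SPI class boundaries are used here; the
    boundaries of Table 2 may differ.
    """
    if value >= 2.0:
        return "EW"
    if value >= 1.5:
        return "SW"
    if value >= 1.0:
        return "MW"
    if value > -1.0:
        return "ND"
    if value > -1.5:
        return "MD"
    if value > -2.0:
        return "SD"
    return "ED"


def to_state_codes(spi_values):
    """Map SPI values to integer codes 1..7 (1 = ED, ..., 7 = EW)."""
    return [STATES.index(classify_spi(v)) + 1 for v in spi_values]


# Made-up SPI values for twelve months.
spi_series = [0.3, -0.4, -1.2, -1.7, -0.9, 0.1, 1.1, 2.2, 0.8, -2.1, 0.0, 1.6]
print([classify_spi(v) for v in spi_series])
print(to_state_codes(spi_series))
```

The integer codes form a categorical sequence of the kind that a Markov model takes as input.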
After computing the SPI at various time scales, we first categorized the SPI values by magnitude. The behavior of the drought classes for the SPI at a one-month time scale at the selected stations is provided in Table 3, where the months of 2017 are taken as an example. The drought classes for other years and time scales are calculated in the same way. These observed drought classes are then used to find the probability distribution associated with the three-step transition from the last state of each sequence. The posterior vector related to these sequences specifies the parameter values (briefly described in Section 2.2). The results show that the most likely state to visit in three steps is ND; that is, the probability associated with ND is higher than that of the other selected states in all sequences (Table 4). For example, for the Astore station, the probability of ND occurring is 0.6668 in sequence-1, which is higher than for the other selected states. In sequence-2, the probability of ND occurring after three months is 0.6729. The probabilities of ND in sequence-3, sequence-4, sequence-5, and sequence-6 are 0.6611, 0.6221, 0.6450, and 0.6729, respectively. Policymakers should therefore plan accordingly for ND. Other information can be read from the various sequences for the different time scales. However, ND prevails at all time scales in the selected region, so policymakers should work to mitigate the negative impacts of this specific drought state (ND).

Discussion.
The time series data were collected from six meteorological stations in the northern area of Pakistan. The SPI drought index is used for the analysis at various time scales for the selected stations. Reliable and efficient outcomes of such an analysis provide strong indications of drought occurrences that can significantly help early warning systems [31,53,58,59,61]. Therefore, a new method, MBCSTCS, is developed for drought monitoring and mitigation policies that explicitly account for spatiotemporal information. The proposed technique uses the long-run behavior of drought states (categorical sequences) from various time scales and stations in the selected region. If the time scale changes, the lengths of the categorical sequences change as well. For this reason, past studies have not examined various time scales jointly, owing to the inconsistency in sequence lengths and in the processes that generate the observations at different stations. The current technique resolves these issues effectively. Furthermore, the outcomes of the present technique meet the current objective and provide more substantial results for the selected drought states across time scales and stations. MBCSTCS uses state selection procedures through finite mixture modeling and model-based clustering. Niaz et al. [55] developed a model-based clustering technique that predicts probabilities for various drought classes. They computed categorical sequences of drought states (classes) for selected drought classes and predicted their future probabilities. That study used a single time scale at various stations. In this study, by contrast, the various time scales are considered jointly for the monthly prediction of drought severity at the selected stations. It is therefore a novel method for predicting drought severity using spatiotemporal categorical sequences.
MBCSTCS is applied to six meteorological stations in the northern area of Pakistan. It is found that MBCSTCS provides timely information for long-term spatiotemporal categorical sequences. The results of the present analysis are more suitable, especially for the selected region, and help policymakers make better policies related to various kinds of drought, including meteorological, hydrological, and socioeconomic. The MBCSTCS may help in planning early warning systems, water resource management, and drought mitigation policies to reduce the severe effects of drought.

Conclusions
The outcomes of MBCSTCS provide the future probabilities of each drought state at the various stations and time scales. The results show that the most likely state to visit is ND; that is, the probability associated with ND is higher than that of the other selected states in all sequences. For instance, in sequence-1, the probability of ND is 0.6668, which is higher than for the other selected states. In sequence-2, the probability of ND after three months is 0.6729. ND also prevails in the other sequences; in sequence-6, ND again has the highest probability (0.6729) of occurring in the future. Therefore, policymakers should work to reduce the negative impacts of this drought state (ND). In conclusion, this study suggests a more appropriate technique that emphasizes evaluating drought occurrences more promptly. The MBCSTCS helps policymakers make better policies related to various kinds of drought, including meteorological, hydrological, and socioeconomic. Furthermore, the analysis provides a basis for bringing more attention to early warning systems. The outcomes of the current analysis apply only to the present circumstances of the application site; changes in the conditions at the selected stations would alter the outcomes, so extrapolation should be made with caution. Furthermore, the study could be extended to examine drought propagation and to compute thresholds for different drought severities in the region. Moreover, other drought indices can be incorporated to predict drought occurrences effectively.

Data Availability
The data used for the preparation of the manuscript are available from the corresponding author and can be provided upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

[Table header: Sequence | ED | SD | MD | ND | MW | SW | EW]