Rainfall modeling is important for prediction and forecasting in agriculture, weather derivatives, hydrology, and risk and disaster preparedness. Normally, two models are used to describe rainfall as a chain-dependent process, one representing the occurrence and the other the intensity of rainfall. Such models help in understanding the physical features and dynamics of the rainfall process. However, rainfall data are zero-inflated and exhibit overdispersion, which such models typically underestimate. In this study, we model the two processes simultaneously as a compound Poisson process: rainfall events are modeled as a Poisson process, while the intensity of each rainfall event is Gamma distributed. We reduce overdispersion by introducing a dispersion parameter into the model, implemented through Tweedie distributions. Rainfall data simulated from the model resemble the actual rainfall data in terms of seasonal variation, mean, variance, and magnitude. The model also provides mechanisms for describing small but important properties of the rainfall process. The model can be used to forecast and predict rainfall amounts and occurrences, which is important for weather derivatives, agriculture, hydrology, and the prediction of droughts and floods.
Climate variables, in particular rainfall occurrence and intensity, have a major impact on the human and physical environment. Knowledge of the frequency of occurrence and the intensity of rainfall events is essential for the planning, design, and management of water resource systems [
Rainfall modeling is also important in the pricing of weather derivatives, which are financial instruments used to manage the risk associated with adverse or unexpected weather conditions.
Furthermore, as climate change greatly affects the environment, there is an urgent need to predict rainfall variability for future periods under different climate change scenarios, in order to provide the information needed for high-quality climate-related impact studies [
However, modeling precipitation poses several challenges. The first is accurate measurement: rainfall data consist of sequences of values that are either zero or positive (the intensity), depending on the depth accumulated over discrete intervals, and factors such as wind can affect collection accuracy. Second, rainfall is localized, unlike temperature, which is highly correlated across regions; therefore, when pricing weather derivatives, the holder of a rainfall-based derivative may suffer geographical basis risk. The final challenge is the choice of a proper probability distribution to describe precipitation data. The statistical properties of precipitation are far more complex, and a more sophisticated distribution is required [
Rainfall has been modeled as a chain-dependent process in which a two-state Markov chain represents rainfall occurrence and rainfall intensity is modeled by fitting a suitable distribution, such as the Gamma distribution [
Wilks [
In the study of Leobacher and Ngare [
Another approach to modeling rainfall is based on Poisson cluster models; the two most widely recognized cluster-based models in the stochastic modeling of rainfall are the Neyman-Scott Rectangular Pulses model and the Bartlett-Lewis Rectangular Pulses model. These models represent rainfall sequences in time and rainfall fields in space, combining the occurrence and depth processes. The difficulty with Poisson cluster models, as observed by Onof et al. [
Carmona and Diko [
In this study, the rainfall process is described by a single model in which the occurrence and intensity of rainfall are modeled simultaneously. A Poisson process models the daily occurrence of rainfall, while the intensity is modeled by a Gamma distribution for the magnitude of the jumps of the Poisson process. The result is a compound Poisson process, namely a Poisson-Gamma model. The contribution of this study is twofold: a Poisson-Gamma model that describes rainfall occurrence and intensity at once, and a suitable model for zero-inflated data that reduces overdispersion.
This paper is structured as follows. In Section
Rainfall comprises discrete and continuous components: if it does not rain, the amount of rainfall is exactly zero (the discrete component), whereas if it rains, the amount is continuous. In most research works [
Our aim in this research is to model the occurrence and intensity of rainfall simultaneously in one model. We model the rainfall process using a Poisson-Gamma probability distribution, which is flexible enough to represent the exact zeros and the amount of rainfall together.
Rainfall is modeled as a compound Poisson process, a Lévy process, here with Gamma-distributed jumps. This choice is motivated by the sudden changes in rainfall amount from zero to a large positive value following each rainfall event, which are modeled as pure jumps of the compound Poisson process.
We assume that rainfall arrives in the form of storms following a Poisson process, and that at each arrival time the current intensity increases by a random, Gamma-distributed amount. The jumps of the driving process represent the arrival of storm events, each generating a jump of random size. Each storm comprises cells that also arrive according to another Poisson process.
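The storm-arrival description above can be sketched as a simulation. This is a minimal illustration, not the authors' code: we assume a constant daily event rate `lam` and Gamma jumps with parameters `shape` and `scale` (hypothetical names and values), draw the number of events per day with Knuth's multiplication method, and sum the Gamma jumps.

```python
import math
import random


def simulate_daily_rainfall(n_days, lam, shape, scale, seed=None):
    """Simulate daily totals (mm) from a compound Poisson-Gamma model.

    lam:   assumed Poisson rate of rainfall events per day
    shape: assumed Gamma shape of each event's intensity
    scale: assumed Gamma scale (mm) of each event's intensity
    """
    rng = random.Random(seed)
    limit = math.exp(-lam)
    totals = []
    for _ in range(n_days):
        # Number of events N ~ Poisson(lam) via Knuth's multiplication method.
        n_events, prod = 0, rng.random()
        while prod > limit:
            n_events += 1
            prod *= rng.random()
        # Daily total = sum of N independent Gamma jumps (exactly 0.0 when N = 0).
        totals.append(sum((rng.gammavariate(shape, scale) for _ in range(n_events)), 0.0))
    return totals
```

For example, `simulate_daily_rainfall(20000, lam=0.5, shape=0.8, scale=8.0, seed=1)` should give a fraction of dry days close to e^{-0.5} ≈ 0.61 and a mean close to 0.5 × 0.8 × 8.0 = 3.2 mm, up to Monte Carlo error. The exact zeros arise naturally from days with no events.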
Poisson cluster processes provide an appropriate tool, since rainfall data indicate the presence of clusters of rainfall cells. As observed by Onof et al. [
Lord [
Let
The compound Poisson process (
The moment generating function
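As a worked illustration, assume (our notation, which may differ from the paper's) that the number of rainfall events in a day is N ~ Poisson(λ) and that the event intensities X_1, X_2, … are independent Gamma(α, θ) variables with shape α and scale θ, independent of N, so that the daily total is Y = X_1 + … + X_N, with Y = 0 when N = 0. Conditioning on N gives the standard compound Poisson moment generating function, and differentiating its logarithm gives the mean and variance:

```latex
M_Y(s) = \mathbb{E}\!\left[e^{sY}\right]
       = \mathbb{E}\!\left[M_X(s)^{N}\right]
       = \exp\{\lambda\,(M_X(s) - 1)\}
       = \exp\{\lambda\,[(1 - \theta s)^{-\alpha} - 1]\}, \qquad s < 1/\theta,

K_Y(s) = \log M_Y(s) = \lambda\,[(1 - \theta s)^{-\alpha} - 1],

\mathbb{E}[Y] = K_Y'(0) = \lambda\alpha\theta, \qquad
\operatorname{Var}[Y] = K_Y''(0) = \lambda\alpha(\alpha + 1)\theta^{2}, \qquad
P(Y = 0) = P(N = 0) = e^{-\lambda}.
```

The point mass e^{-λ} at zero is what allows a single model to represent both dry days and rainfall amounts.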
If we observe the occurrence of rainfall for
If no rainfall occurs on a particular day, then
Therefore, the process has a point mass at zero.
The probability density function of
Let
We can express the probability density function
Consider a random sample of size
We observe that
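For a sample of daily totals, the density and likelihood can be evaluated numerically by truncating the mixture series f(y) = Σ_{n≥1} e^{-λ} λ^n / n! · y^{nα−1} e^{−y/θ} / (Γ(nα) θ^{nα}) for y > 0. The sketch below assumes the same hypothetical parameterization as a compound Poisson(λ) sum of Gamma(α, θ) jumps, written `lam`, `shape`, `scale` in code: each positive observation contributes the logarithm of this Poisson-weighted sum of Gamma densities (computed with a log-sum-exp), and each dry day contributes log P(Y = 0) = −λ. The truncation point `n_max` is an assumption and must exceed the number of events plausibly needed to produce the largest observation.

```python
import math


def log_density_positive(y, lam, shape, scale, n_max=200):
    """log f_Y(y) for y > 0: Poisson(lam)-weighted mixture of Gamma(n*shape, scale) densities."""
    log_terms = []
    for n in range(1, n_max + 1):
        a = n * shape  # shape of a sum of n independent Gamma(shape, scale) jumps
        log_terms.append(
            -lam + n * math.log(lam) - math.lgamma(n + 1)      # log P(N = n)
            + (a - 1) * math.log(y) - y / scale                 # log Gamma(a, scale) density
            - math.lgamma(a) - a * math.log(scale)
        )
    m = max(log_terms)  # log-sum-exp avoids overflow and underflow
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def log_likelihood(sample, lam, shape, scale, n_max=200):
    """Log-likelihood of i.i.d. daily totals; each dry day contributes log P(Y = 0) = -lam."""
    return sum(-lam if y == 0 else log_density_positive(y, lam, shape, scale, n_max)
               for y in sample)
```

As a sanity check, the positive part of the density integrates to 1 − e^{−λ}, so together with the point mass at zero the total probability is one.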
A probability density function of the form
If we let
The relationship
The family of exponential dispersion models, whose variance functions are of the form
Examples are as follows: for
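For reference, the standard members of the power-variance family are listed below, writing the index as p, the mean as μ, and the dispersion as φ (assumed notation). The compound Poisson-Gamma case used in this paper corresponds to 1 < p < 2; no Tweedie distribution exists for 0 < p < 1.

```latex
\operatorname{Var}(Y) = \phi\, V(\mu) = \phi\, \mu^{p}, \qquad
\begin{array}{ll}
p = 0: & \text{normal} \\
p = 1: & \text{Poisson (with } \phi = 1\text{)} \\
1 < p < 2: & \text{compound Poisson-Gamma} \\
p = 2: & \text{Gamma} \\
p = 3: & \text{inverse Gaussian}
\end{array}
```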
From
For
The cumulant generating function of a Tweedie distribution for
From (
For
By comparing the cumulant generating functions in Lemma
The requirement that the Gamma shape parameter
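Comparing the cumulant generating functions leads to the well-known correspondence between the Poisson-Gamma parameters (λ, α, θ) and the Tweedie parameters (μ, φ, p): p = (α + 2)/(α + 1), μ = λαθ, φ = λ^{1−p}(αθ)^{2−p}/(2 − p), and conversely λ = μ^{2−p}/(φ(2 − p)), α = (2 − p)/(p − 1), θ = φ(p − 1)μ^{p−1}. Since α > 0 forces 1 < p < 2, this is the range used throughout. The functions below are an illustrative sketch of that mapping; the function names are ours.

```python
import math


def tweedie_from_poisson_gamma(lam, shape, scale):
    """Map Poisson rate lam and Gamma(shape, scale) jumps to Tweedie (mu, phi, p)."""
    p = (shape + 2.0) / (shape + 1.0)          # 1 < p < 2 whenever shape > 0
    mu = lam * shape * scale                    # mean daily rainfall
    phi = lam ** (1.0 - p) * (shape * scale) ** (2.0 - p) / (2.0 - p)
    return mu, phi, p


def poisson_gamma_from_tweedie(mu, phi, p):
    """Inverse map; only defined for 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("the Poisson-Gamma representation needs 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # expected number of events per day
    shape = (2.0 - p) / (p - 1.0)               # Gamma shape of each event
    scale = phi * (p - 1.0) * mu ** (p - 1.0)   # Gamma scale of each event
    return lam, shape, scale
```

A useful consistency check is that the Tweedie variance φμ^p equals the compound Poisson variance λα(α + 1)θ².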
Based on the Tweedie distribution, the probability of receiving no rainfall at all is
This follows by directly substituting the values of
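Written out with the correspondence between the two parameterizations (assumed notation: λ for the Poisson rate; μ, φ, p for the Tweedie mean, dispersion, and index), the substitution reads:

```latex
P(Y = 0) = e^{-\lambda}
         = \exp\!\left\{-\frac{\mu^{2-p}}{\phi\,(2 - p)}\right\},
\qquad
P(Y > 0) = 1 - \exp\!\left\{-\frac{\mu^{2-p}}{\phi\,(2 - p)}\right\}.
```

With a seasonal mean μ_t in place of μ, this gives a day-by-day probability of no rainfall.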
The function
We approximate the function
The log maximum approximation of
It can be observed that
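A sketch of this series evaluation, following the method of Dunn and Smyth (our assumption about the exact form used), is given below. With α = (2 − p)/(p − 1) taken as the positive Gamma shape, the density for y > 0 is f(y) = (W/y) exp{(1/φ)(yμ^{1−p}/(1 − p) − μ^{2−p}/(2 − p))}, where W = Σ_{j≥1} W_j and W_j = y^{jα} / (φ^{j(1+α)} (2 − p)^j (p − 1)^{jα} j! Γ(jα)). The terms are summed in log space, starting from the approximate index of the largest term, j_max ≈ y^{2−p}/(φ(2 − p)), and walking outwards until the terms are negligible. The function name and stopping threshold are our own choices.

```python
import math


def tweedie_log_density(y, mu, phi, p, rel_cut=37.0):
    """Log-density of a Tweedie(mu, phi, p) variable, 1 < p < 2, by series evaluation.

    For y > 0: f(y) = W / y * exp((y * mu**(1-p) / (1-p) - mu**(2-p) / (2-p)) / phi),
    with W = sum_{j>=1} W_j. The walk stops once a term falls more than rel_cut
    log-units (about 1e-16) below the starting term.
    """
    if not 1.0 < p < 2.0:
        raise ValueError("the series form requires 1 < p < 2")
    if y == 0:
        return -mu ** (2.0 - p) / (phi * (2.0 - p))    # log P(Y = 0)
    a = (2.0 - p) / (p - 1.0)                          # Gamma shape of each jump (> 0)

    def log_w(j):
        # log W_j for the j-th term of the series
        return (j * a * math.log(y) - j * (1.0 + a) * math.log(phi)
                - j * math.log(2.0 - p) - j * a * math.log(p - 1.0)
                - math.lgamma(j + 1.0) - math.lgamma(j * a))

    j_max = max(1, round(y ** (2.0 - p) / (phi * (2.0 - p))))  # approximate largest term
    ref = log_w(j_max)
    logs = [ref]
    j = j_max + 1
    while True:                                        # walk upwards from j_max
        t = log_w(j)
        logs.append(t)
        if t < ref - rel_cut:
            break
        j += 1
    j = j_max - 1
    while j >= 1:                                      # walk downwards from j_max
        t = log_w(j)
        logs.append(t)
        if t < ref - rel_cut:
            break
        j -= 1
    m = max(logs)
    log_w_sum = m + math.log(sum(math.exp(t - m) for t in logs))
    theta_part = (y * mu ** (1.0 - p) / (1.0 - p) - mu ** (2.0 - p) / (2.0 - p)) / phi
    return log_w_sum - math.log(y) + theta_part
```

Because W_j is, up to factors that do not depend on j, the n = j term of the Poisson-Gamma mixture, this agrees with summing the mixture directly; the advantage is that only the terms near j_max need to be evaluated.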
Generalized linear models extend the standard linear regression models to incorporate nonnormal response distributions and possibly nonlinear functions of the mean. The advantage of GLMs is that the fitting process maximizes the likelihood for the choice of the distribution for a random variable
The exponential dispersion models are the response distributions for the generalized linear models. Tweedie distributions are members of the exponential dispersion models upon which the generalized linear models are based. Consequently fitting a Tweedie distribution follows the framework of fitting a generalized linear model.
In case of a canonical link function, the sufficient statistics for
For
But
Given that
Let
Hence
Clearly a GLM only requires the first two moments of the response
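Because only the mean and the variance function are needed, the regression coefficients can be fitted by iteratively reweighted least squares (IRLS). The sketch below is a pure-Python illustration under stated assumptions: a log link (a common choice for Tweedie GLMs, which may differ from the link used in the paper), a known index p, and a design matrix whose first column is the intercept. The dispersion φ cancels from the IRLS updates, so it is not needed at this stage.

```python
import math


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_tweedie_glm(X, y, p, n_iter=25):
    """IRLS for a Tweedie GLM with log link, using only E[Y] = mu and Var[Y] = phi * mu**p."""
    k = len(X[0])
    beta = [math.log(max(sum(y) / len(y), 1e-8))] + [0.0] * (k - 1)  # start at overall mean
    for _ in range(n_iter):
        XtWX = [[0.0] * k for _ in range(k)]
        XtWz = [0.0] * k
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = math.exp(eta)
            w = mu ** (2.0 - p)          # (dmu/deta)^2 / V(mu) with dmu/deta = mu, V = mu^p
            z = eta + (yi - mu) / mu     # working response
            for r in range(k):
                XtWz[r] += w * xi[r] * z
                for c in range(k):
                    XtWX[r][c] += w * xi[r] * xi[c]
        beta = solve(XtWX, XtWz)         # weighted least-squares update
    return beta
```

For a seasonal model, each row of `X` could be `[1, sin(2πt/365), cos(2πt/365)]` for day t; this form of the cyclic covariates is an assumption made for illustration.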
Under the standard regularity conditions, for large
From the log-likelihood, the asymptotic covariance matrix of the estimator is the inverse of the information matrix
So
Therefore
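A common way to write these quantities for a Tweedie GLM is sketched below, under the assumption of a log link and a Pearson-type estimate of the dispersion (the paper's own link and estimator may differ). Here k is the number of regression coefficients, and the standard errors are the square roots of the diagonal of the estimated covariance matrix.

```latex
\widehat{\operatorname{Cov}}(\hat{\boldsymbol\beta})
  = \hat{\phi}\,\bigl(X^{\top} W X\bigr)^{-1},
\qquad
W = \operatorname{diag}\!\left(\frac{(\partial\mu_i/\partial\eta_i)^{2}}{V(\mu_i)}\right)
  = \operatorname{diag}\!\left(\hat{\mu}_i^{\,2-p}\right) \text{ for } \log\mu_i = \eta_i,

\hat{\phi} = \frac{1}{n - k}\sum_{i=1}^{n}\frac{(y_i - \hat{\mu}_i)^{2}}{\hat{\mu}_i^{\,p}}.
```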
To compute
However estimating
Given the estimated values of
Daily rainfall data for Balaka district in Malawi covering the period 1995–2015 are used. The data were obtained from the Meteorological Surveys of Malawi. Figure
Daily rainfall amount for Balaka district.
In summary, the minimum daily value is 0 mm, indicating days with no rainfall, whereas the maximum is 123.7 mm. The mean daily rainfall over the whole period is 3.167 mm.
We investigated the relationship between the variance and the mean of the data by plotting the
Variance mean relationship.
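Under the Tweedie variance function Var(Y) = φμ^p, the logarithm of the variance is linear in the logarithm of the mean with slope p, so a variance-mean plot also gives a rough estimate of the index. A minimal sketch, assuming the observations have been split into groups with roughly constant mean (for example by calendar month; the grouping is our assumption), regresses log variance on log mean by least squares:

```python
import math


def power_from_mean_variance(groups):
    """Fit log(sample variance) = log(phi) + p * log(sample mean) across groups.

    Returns (p_hat, phi_hat). Groups with a zero mean or zero variance are skipped.
    """
    xs, ys = [], []
    for g in groups:
        m = sum(g) / len(g)
        v = sum((x - m) ** 2 for x in g) / (len(g) - 1)
        if m > 0 and v > 0:
            xs.append(math.log(m))
            ys.append(math.log(v))
    if len(xs) < 2:
        raise ValueError("need at least two groups with positive mean and variance")
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, math.exp(y_bar - slope * x_bar)   # intercept gives log(phi)
```

A slope between 1 and 2 is consistent with the compound Poisson-Gamma range of the Tweedie family; this moment-based value is only a rough guide, and a likelihood-based estimate of p is preferable for fitting.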
To model the daily rainfall data we use
The canonical link function is given by
In the first place we estimate
Profile likelihood.
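A profile likelihood for p can be sketched as follows, under simplifying assumptions that are ours rather than the paper's: an intercept-only model (so the maximum likelihood estimate of μ is the sample mean for every p), the density evaluated through its Poisson-Gamma mixture form, and φ maximized numerically by golden-section search over log φ in [−5, 5], which assumes the likelihood is unimodal in log φ. The value of p with the largest profile log-likelihood is selected from a user-supplied grid.

```python
import math


def tweedie_logpdf(y, mu, phi, p):
    """Log-density of Tweedie(mu, phi, p), 1 < p < 2, via its Poisson-Gamma mixture form."""
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))       # Poisson rate of events
    if y == 0:
        return -lam                                   # log P(Y = 0)
    a = (2.0 - p) / (p - 1.0)                         # Gamma shape per event
    s = phi * (p - 1.0) * mu ** (p - 1.0)             # Gamma scale per event
    n_hi = int(3 * max(lam, y / (a * s))) + 60        # generous truncation of the series
    logs = [-lam + n * math.log(lam) - math.lgamma(n + 1)
            + (n * a - 1) * math.log(y) - y / s - math.lgamma(n * a) - n * a * math.log(s)
            for n in range(1, n_hi + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(t - m) for t in logs))


def profile_loglik(y, p, log_phi_range=(-5.0, 5.0), n_iter=25):
    """Maximise the log-likelihood over phi for fixed p; mu is fixed at the sample mean."""
    mu = sum(y) / len(y)

    def ll(log_phi):
        return sum(tweedie_logpdf(v, mu, math.exp(log_phi), p) for v in y)

    lo, hi = log_phi_range
    g = (math.sqrt(5.0) - 1.0) / 2.0                  # golden-section ratio
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = ll(c), ll(d)
    for _ in range(n_iter):
        if fc > fd:                                   # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = ll(c)
        else:                                         # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = ll(d)
    return max(fc, fd)


def select_p(y, grid):
    """Return the grid value of p with the largest profile log-likelihood, plus the curve."""
    curve = [(p, profile_loglik(y, p)) for p in grid]
    return max(curve, key=lambda pair: pair[1])[0], curve
```

With seasonal covariates, μ would instead be refitted for each p (for example by IRLS) before maximizing over φ; the intercept-only version keeps the sketch short.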
From the results obtained after fitting the model, both the cyclic
Estimated parameter values.
Parameter | Estimate | Std. error | t value | Pr(>\|t\|)
---|---|---|---|---|
| | | | |
| | | | <2 |
| | | | <2 |
| | - | - | - |
With
The predicted
Actual versus predicted mean.
Let the maximum likelihood estimate of
Goodness of fit is assessed using the deviance, which is defined as
In terms of Tweedie distributions with
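For 1 < p < 2, the Tweedie unit deviance is d(y, μ) = 2[y^{2−p}/((1 − p)(2 − p)) − yμ^{1−p}/(1 − p) + μ^{2−p}/(2 − p)], which reduces to 2μ^{2−p}/(2 − p) when y = 0. The residual deviance sums d over the fitted means, and the null deviance sums it with every mean set to the sample mean. A small sketch (function names are ours):

```python
def tweedie_unit_deviance(y, mu, p):
    """Unit deviance d(y, mu) of a Tweedie model with 1 < p < 2 (y >= 0, mu > 0)."""
    if not 1.0 < p < 2.0:
        raise ValueError("this form assumes 1 < p < 2")
    # For y = 0 the first two terms vanish, leaving 2 * mu**(2-p) / (2-p).
    return 2.0 * (y ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
                  - y * mu ** (1.0 - p) / (1.0 - p)
                  + mu ** (2.0 - p) / (2.0 - p))


def deviance(ys, mus, p):
    """Total deviance: sum of unit deviances over observations and fitted means."""
    return sum(tweedie_unit_deviance(y, m, p) for y, m in zip(ys, mus))


def null_deviance(ys, p):
    """Deviance of the intercept-only (null) model, whose fitted mean is the sample mean."""
    y_bar = sum(ys) / len(ys)
    return deviance(ys, [y_bar] * len(ys), p)


def proportion_explained(residual_dev, null_dev):
    """Share of the null deviance removed by the fitted model."""
    return 1.0 - residual_dev / null_dev
```

The unit deviance is zero when y = μ and positive otherwise, so a smaller residual deviance indicates fitted means closer to the observations.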
After fitting the model, the residual deviance (43144) is lower than the null deviance (62955), a reduction of about 31.5%, which indicates that the fitted model explains the data better than the null (intercept-only) model.
Model diagnostics are based on residual analysis. The fitted model is difficult to assess with standard residuals, especially on days with no rainfall at all, because these days produce spurious results and distracting patterns, as also observed by [
Residuals of the model.
We therefore assess the model using randomized quantile residuals, which remove the patterns caused by discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.
The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.
Mathematically, let
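A sketch of randomized quantile residuals for the Poisson-Gamma model follows, assuming a common Gamma shape across days and day-specific Poisson rates and Gamma scales (for instance, derived from fitted means μ_t via the Tweedie correspondence). For a dry day, the cumulative probability u is drawn uniformly from (0, P(Y = 0)); for a wet day, u = F(y); the residual is Φ^{−1}(u). The regularized incomplete gamma function is computed by its power series, which is adequate for the moderate arguments that arise with daily rainfall but is not a general-purpose implementation.

```python
import math
import random
from statistics import NormalDist


def reg_lower_gamma(a, x):
    """Regularised lower incomplete gamma P(a, x) by its power series (moderate x only)."""
    if x <= 0:
        return 0.0
    term = total = 1.0 / a
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= x / (a + k)
        total += term
    return min(1.0, math.exp(-x + a * math.log(x) - math.lgamma(a)) * total)


def poisson_gamma_cdf(y, lam, shape, scale):
    """P(Y <= y) for a compound Poisson(lam)-Gamma(shape, scale) daily total."""
    cdf = math.exp(-lam)                               # point mass at zero
    if y <= 0:
        return cdf
    log_pn, n = -lam, 0
    while True:
        n += 1
        log_pn += math.log(lam) - math.log(n)          # log P(N = n)
        weight = math.exp(log_pn)
        cdf += weight * reg_lower_gamma(n * shape, y / scale)
        if n > lam and weight < 1e-17:                 # remaining Poisson mass is negligible
            break
    return min(cdf, 1.0)


def quantile_residuals(ys, lams, shape, scales, seed=None):
    """Randomised quantile residuals: Phi^{-1}(u), with u randomised on dry days."""
    rng = random.Random(seed)
    nd = NormalDist()
    out = []
    for y, lam, scale in zip(ys, lams, scales):
        if y == 0:
            u = rng.uniform(0.0, math.exp(-lam))       # spread over the point mass at zero
        else:
            u = poisson_gamma_cdf(y, lam, shape, scale)
        u = min(max(u, 1e-12), 1.0 - 1e-12)            # keep inv_cdf finite
        out.append(nd.inv_cdf(u))
    return out
```

If the model is correct, the residuals are approximately independent standard normal, which is what a normal Q-Q plot of them checks.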
Figure
Q-Q plot of the quantile residuals.
We simulated from the model to test whether it produces data with characteristics similar to the observed rainfall. The simulation covered two years: the last year of the data (2015) and one future year (2016) as a prediction. The simulated 2015 series was then compared with the observed 2015 data, as shown in Figure
Simulated rainfall and observed rainfall.
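Simulation from a fitted seasonal model can be sketched as below. The coefficient vector `beta`, the dispersion `phi`, and the index `p` are inputs that must come from the fitted model; we assume the mean takes the sinusoidal form log μ_t = β₀ + β₁ sin(2πt/365) + β₂ cos(2πt/365), and each day's mean is converted to a Poisson rate and Gamma scale via the Tweedie correspondence.

```python
import math
import random


def simulate_seasonal_rainfall(beta, phi, p, n_days, period=365.0, seed=None):
    """Simulate daily rainfall from a Tweedie model with a sinusoidal seasonal mean.

    Assumed mean model: log mu_t = b0 + b1 * sin(2*pi*t/period) + b2 * cos(2*pi*t/period).
    Each day's (mu_t, phi, p) is converted to a Poisson rate and a Gamma jump scale.
    """
    if not 1.0 < p < 2.0:
        raise ValueError("the Poisson-Gamma representation needs 1 < p < 2")
    rng = random.Random(seed)
    shape = (2.0 - p) / (p - 1.0)                   # Gamma shape of each event
    rain = []
    for t in range(n_days):
        w = 2.0 * math.pi * t / period
        mu = math.exp(beta[0] + beta[1] * math.sin(w) + beta[2] * math.cos(w))
        lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # expected number of events on day t
        scale = phi * (p - 1.0) * mu ** (p - 1.0)   # Gamma scale of each event on day t
        # Knuth's method for N ~ Poisson(lam); adequate for the small daily rates of rainfall.
        n_events, prod, limit = 0, rng.random(), math.exp(-lam)
        while prod > limit:
            n_events += 1
            prod *= rng.random()
        rain.append(sum((rng.gammavariate(shape, scale) for _ in range(n_events)), 0.0))
    return rain
```

A two-year run corresponds to `n_days=730`; averaged over many years, the simulated daily mean should track the average of μ_t over the seasonal cycle.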
Summary statistics of the simulated and actual data are shown in Table
Data statistics.
Data | Min | 1st Qu. | Median | Mean | 3rd Qu. | Max
---|---|---|---|---|---|---|
Predicted data | | | | | | |
Actual data | | | | | | |
Actual data | | | | | | |
The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure
Probability of rainfall occurrence.
However, the model performed poorly in predicting the probability of rainfall occurrence, which it underestimated. Using a truncated Fourier series instead of a single sinusoid may improve this estimate.
It performed better, however, in predicting the probability of no rainfall on days with little or no rainfall, as indicated in Figure
It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model as suggested by [
In addition, the model accommodates exact zeros in the data and simultaneously predicts the probability of no rainfall.
A daily stochastic rainfall model was developed based on a compound Poisson process, in which the number of rainfall events follows a Poisson distribution and the intensity of each event, independent of the number of events, follows a Gamma distribution. Unlike much previous research on precipitation modeling, in which separate models are developed for occurrence and intensity, the proposed model describes both processes simultaneously. It also models the exact zeros (days with no rainfall) directly, which is not the case with those other models. The model is therefore a useful tool for studying the impact of weather on a variety of systems, including ecosystems, risk assessment, drought prediction, and weather derivatives, since it can be used to simulate synthetic rainfall data. It also provides mechanisms for understanding fine-scale structure such as the number and mean of rainfall events, the mean daily rainfall, and the probability of rainfall occurrence, which is relevant to agricultural activities, disaster preparedness, and water cycle systems.
The model can readily be used to forecast future events. For weather derivatives, the weather index can be derived by simulating a sample path and summing the daily precipitation over the relevant accumulation period. Rather than modeling a weather index directly, which is not flexible enough to forecast future events, this model can be used to price weather derivatives.
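The index construction described above can be sketched with Monte Carlo: simulate many sample paths over the accumulation period, sum the daily totals on each path, and average any payoff of interest. The daily means `mu_days`, the values of `phi` and `p`, and the put-style payoff with `strike` and `tick` are hypothetical illustrations; discounting and any market price of risk are deliberately left out.

```python
import math
import random


def simulate_index(mu_days, phi, p, n_paths, seed=None):
    """Monte Carlo draws of a cumulative rainfall index: the sum of daily totals over the period."""
    rng = random.Random(seed)
    shape = (2.0 - p) / (p - 1.0)                    # Gamma shape of each event
    params = [(mu ** (2.0 - p) / (phi * (2.0 - p)),  # Poisson rate for the day
               phi * (p - 1.0) * mu ** (p - 1.0))    # Gamma scale for the day
              for mu in mu_days]
    indices = []
    for _ in range(n_paths):
        total = 0.0
        for lam, scale in params:
            n, prod, limit = 0, rng.random(), math.exp(-lam)
            while prod > limit:                      # Knuth's method for N ~ Poisson(lam)
                n += 1
                prod *= rng.random()
            total += sum((rng.gammavariate(shape, scale) for _ in range(n)), 0.0)
        indices.append(total)
    return indices


def expected_put_payoff(indices, strike, tick=1.0):
    """Average payoff tick * max(strike - index, 0) of a hypothetical drought put (undiscounted)."""
    return sum(tick * max(strike - x, 0.0) for x in indices) / len(indices)
```

The Monte Carlo mean of the index should approach the sum of the daily means, which provides a quick check before any payoff is evaluated.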
Rainfall data are generally zero-inflated: the amount of rainfall received on a day is zero with positive probability and continuously distributed otherwise. This makes it difficult to transform the data to normality with power transforms or to model them directly with a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate; expressing it as a Tweedie distribution makes parameter estimation easier. In addition, Tweedie distributions belong to the exponential family of distributions on which generalized linear models are based, so an existing framework is available for fitting the model and for diagnostic testing.
The model allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other model [
The authors declare that there are no conflicts of interest regarding the publication of this paper.
The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.