Assessing the Performance of the Discrete Generalised Pareto Distribution in Modelling Non-Life Insurance Claims

In this paper, non-life insurance claims are modelled using the three-parameter discrete generalised Pareto distribution. Data from the National Insurance Commission of Ghana on reported and settled claims are considered for the period 2012-2016. The maximum likelihood principle is adopted in fitting the discrete generalised Pareto distribution to the yearly and aggregated data. The estimation involves two steps. Firstly, the $\mu$ and $(\mu+1)$ frequency method of \citet{Prieto2014} is modified to suit the characteristics of the data under study. Secondly, a bootstrap algorithm is implemented to obtain the standard errors of the estimators of the parameters of the discrete generalised Pareto distribution. The performance of the discrete generalised Pareto distribution is compared with that of the negative binomial distribution in modelling the non-life insurance claims data, using the Akaike and Bayesian information criteria. The results show that the discrete generalised Pareto distribution provides a better fit to the non-life claims data.
Keywords: Non-life insurance claims, discrete generalised Pareto distribution, negative binomial distribution, maximum likelihood estimation, information criteria.


Introduction
Non-life or general insurance provides protection against financial loss arising from risks to interests other than life, such as buildings, vehicles, machinery and equipment. In return for periodic payments or a one-off advance payment of a predetermined amount, called the premium, non-life policies provide coverage against the occurrence of insured probabilistic events for individuals, private organisations, and public institutions. The payments effected in response to occurrences of such events are termed insurance claims (Wuthrich, 2019). The non-life insurance claims process is characterised by two features: claims frequency or count, and claims severity or size. As noted by Renshaw (1994) and Özgürel (2005), the underlying expectations of claims frequency and severity, quantified as a product, are foremost considerations in the computation of pure or risk premiums. The main objective of this paper is to illustrate that the discrete generalised Pareto (DGP) distribution can be employed to model the counts of non-life insurance claims collated by an insurance regulatory authority from a licensed class of insurers. In the absence of suitable actuarial models, non-life insurers encounter difficulties in conducting evidence-based assessments of the risks insured, often resulting in the mis-computation of premiums and an inability to settle claims when due. In response, developing probability models that describe claims frequencies offers a distributional framework for evaluating risks, to facilitate premium setting and liquidity reserving by non-life insurance service providers.
A random variable X that follows the discrete generalised Pareto distribution with shape (α), scale (λ), and location (µ) parameters is denoted by X ∼ DGP(α, λ, µ). This parametric model is obtained by discretising the continuous generalised Pareto distribution introduced by Pickands III (1975), and it is particularly noted for its tail modelling properties. The discrete generalised Pareto distribution assumes varying forms when one or two of its parameters are omitted. For instance, if the location parameter µ = 0, the discrete generalised Pareto distribution reduces to the discrete Lomax distribution, DLo(α, λ).
A fair amount of research has demonstrated the application of probabilistic models to the study of non-life insurance claims. Among the relevant literature, selected studies have principally explored the subject with reference to standard probability distributions, from the perspective of randomised spatial effects (Gschlößl and Czado, 2007), collective risk simulation (Pacáková, 2011), and the incorporation of covariates (Renshaw, 1994). In this paper, we contribute to this literature by providing a method of fitting the discrete generalised Pareto distribution to non-life claims data from the National Insurance Commission of Ghana. Further, to illustrate the performance of the discrete generalised Pareto distribution, we compare it to the negative binomial distribution, similar to the work of Prieto et al. (2014). The results show that the discrete generalised Pareto distribution is appropriate for modelling the count frequencies of non-life insurance claims.
The study contributes to the field of claims modelling in three ways. First, it proposes a count-based data scheme for the analysis of non-life insurance claim frequencies, to enhance the precision of statistical models. Second, a modification is made to the µ and µ + 1 frequency method of Prieto et al. (2014), to obtain initial estimators of the parameters of the discrete generalised Pareto distribution in claims data characterised by varying discrete observational intervals. Third, the algorithm implemented for estimating the parameters of the discrete generalised Pareto distribution offers a resource to researchers for performance analysis in future statistical and actuarial work. The rest of the paper is organised as follows. In Section 2, the methodology is presented, including the maximum likelihood estimation of the parameters of the distributions and the model selection criteria. Section 3 presents the data and the arrangements needed to put the data into a form suitable for the model fitting. Lastly, in Section 4, we present concluding remarks.

Methodology
This section presents the systematic approach followed to model the reported and settled claims datasets. Specifically, the section entails the description of the probability distributions, parameter estimation and model selection criteria.
The models are fitted by maximum likelihood estimation (MLE). Consider a random sample drawn from a population whose probability distribution is known except for its parameter $\boldsymbol{\theta} \in \mathbb{R}^d$. The maximum likelihood principle suggests that a candidate distribution should be selected according to the probability (or likelihood) with which it can produce the given sample. The value of $\boldsymbol{\theta}$ for the selected distribution is the maximum likelihood estimate of the unknown $\boldsymbol{\theta}$.
Suppose $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$ is an independent random sample of size $n$ from a distribution that depends on one or more unknown parameters $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_d)'$. Let $f(x_i; \boldsymbol{\theta})$ be the probability density (or mass) function of $x_i$, with $\boldsymbol{\theta}$ restricted to a given parameter space $\Omega \subseteq \mathbb{R}^d$. The likelihood function of the sample is given by
\[ L(\boldsymbol{\theta}; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \boldsymbol{\theta}). \]
The maximum likelihood estimator, $\hat{\boldsymbol{\theta}}$, of $\boldsymbol{\theta}$ is the solution to the equation
\[ \frac{\partial L(\boldsymbol{\theta}; \mathbf{x})}{\partial \boldsymbol{\theta}} = \mathbf{0}. \]
Since $L(\boldsymbol{\theta}; \mathbf{x})$ is a product that often involves exponentials, it is usually more convenient to maximise $\ln L(\boldsymbol{\theta}; \mathbf{x})$. Because the logarithm is a strictly increasing function, the maximiser of $L(\boldsymbol{\theta}; \mathbf{x})$ also maximises $\ln L(\boldsymbol{\theta}; \mathbf{x})$.
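As a minimal illustration of the principle (not part of the paper's analysis), the sketch below evaluates a Poisson log-likelihood on a grid for a small hypothetical sample and compares the grid maximiser with the sample mean, which is the known closed-form estimator for that model. The sample values and the name `poisson_loglik` are illustrative assumptions.

```python
import math

def poisson_loglik(lam, xs):
    """Log-likelihood of an i.i.d. Poisson(lam) sample xs."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

# Hypothetical count sample (illustrative only, not the NIC data).
sample = [0, 2, 3, 3, 5, 7, 8]

# Candidate parameter values 0.01, 0.02, ..., 20.00.
grid = [i / 100 for i in range(1, 2001)]

# The grid point with the largest log-likelihood approximates the MLE.
lam_hat = max(grid, key=lambda lam: poisson_loglik(lam, sample))

sample_mean = sum(sample) / len(sample)
print(lam_hat, sample_mean)
```

For models without a closed-form estimator, such as the two distributions considered in this paper, the grid is replaced by a numerical optimiser applied to the same log-likelihood.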

Negative Binomial Distribution
The negative binomial is a discrete probability distribution that characterises the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified number of failures (denoted r) occurs. Suppose a sequence of Bernoulli trials is observed. By definition, each trial yields one of two possible outcomes, success or failure, with respective probabilities p and 1 − p. The trials are independent and p remains constant across trials. If X represents the number of successes observed before the r-th failure, then X follows a negative binomial distribution with probability mass function
\[ f(k; r, p) = P(X = k) = \frac{\Gamma(k + r)}{k!\,\Gamma(r)}\, p^{k} (1 - p)^{r}, \qquad k = 0, 1, 2, \ldots \tag{1} \]
The geometric distribution is the special case r = 1, in which the sequence of trials stops at the first failure. The negative binomial distribution admits alternative parametrisations, which differ in three respects: the starting point of the support (x = 0 or x = r); the definition of p (the probability of success or of failure); and the interpretation of r (the number of successes or of failures) (DeGroot and Schervish, 2012).
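Given N independent and identically distributed claims count observations, $(k_1, \ldots, k_N)$, the likelihood function can be expressed as
\[ L(r, p) = \prod_{i=1}^{N} f(k_i; r, p). \tag{2} \]
Substituting (1) into (2) and taking the logarithm results in the log-likelihood function
\[ \ell(r, p) = \sum_{i=1}^{N} \ln \Gamma(k_i + r) - \sum_{i=1}^{N} \ln(k_i!) - N \ln \Gamma(r) + \sum_{i=1}^{N} k_i \ln p + N r \ln(1 - p). \tag{3} \]
To maximise (3), the partial derivatives with respect to p and r are set to zero:
\[ \frac{\partial \ell(r, p)}{\partial p} = \frac{1}{p} \sum_{i=1}^{N} k_i - \frac{N r}{1 - p} = 0, \tag{4} \]
\[ \frac{\partial \ell(r, p)}{\partial r} = \sum_{i=1}^{N} \psi(k_i + r) - N \psi(r) + N \ln(1 - p) = 0. \tag{5} \]
Here $\psi(k) = \Gamma'(k)/\Gamma(k)$ is the digamma function. Solving for p in equation (4) produces
\[ \hat{p} = \frac{\sum_{i=1}^{N} k_i}{N r + \sum_{i=1}^{N} k_i}. \tag{6} \]
Finally, substituting $\hat{p}$ in (5) yields
\[ \sum_{i=1}^{N} \psi(k_i + r) - N \psi(r) + N \ln\!\left( \frac{r}{r + \sum_{i=1}^{N} k_i / N} \right) = 0. \tag{7} \]
The form of (7) shows that a closed-form solution for r cannot be obtained analytically. Numerical methods are therefore used to obtain estimators of r and p. For example, in R (R Core Team, 2017), the function fitdistr in the MASS package provides a routine for estimating the parameters of the negative binomial distribution.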
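The numerical step can be sketched as follows. This is an illustrative stand-in for a packaged routine, not the authors' code. It assumes the parametrisation in which the probability mass function is proportional to $p^{k}(1-p)^{r}$, uses the closed-form maximiser of p for fixed r, and then maximises the resulting profile log-likelihood over r by golden-section search. The function names, the search interval, and the data are hypothetical.

```python
import math

def nb_loglik(r, p, ks):
    """Log-likelihood for the pmf Gamma(k+r) / (k! Gamma(r)) * p**k * (1-p)**r."""
    n = len(ks)
    return (sum(math.lgamma(k + r) - math.lgamma(k + 1) for k in ks)
            - n * math.lgamma(r)
            + sum(ks) * math.log(p)
            + n * r * math.log(1.0 - p))

def p_given_r(r, ks):
    """Closed-form maximiser of the log-likelihood in p for a fixed r."""
    total = sum(ks)
    return total / (len(ks) * r + total)

def fit_nb(ks, lo=1e-3, hi=1e3, tol=1e-10):
    """Maximise the profile log-likelihood over log(r) by golden-section search.

    Assumes the profile is unimodal on [lo, hi], which is typical for
    overdispersed counts but is not checked here.
    """
    def profile(log_r):
        r = math.exp(log_r)
        return nb_loglik(r, p_given_r(r, ks), ks)

    a, b = math.log(lo), math.log(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if profile(c) > profile(d):
            b = d          # maximum lies in [a, d]
        else:
            a = c          # maximum lies in [c, b]
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    r_hat = math.exp((a + b) / 2.0)
    return r_hat, p_given_r(r_hat, ks)

# Hypothetical overdispersed counts (not the NIC data).
counts = [0, 0, 1, 2, 3, 3, 5, 7, 8, 12, 19, 19, 25, 40, 77]
r_hat, p_hat = fit_nb(counts)
fitted_mean = r_hat * p_hat / (1.0 - p_hat)
print(r_hat, p_hat, fitted_mean)
```

Under this parametrisation the fitted mean $\hat{r}\hat{p}/(1-\hat{p})$ equals the sample mean for any r, which provides a quick sanity check on an implementation.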

Discrete Generalised Pareto Distribution
The discrete generalised Pareto distribution arises from a discretisation of the continuous generalised Pareto distribution. To provide a basis for the discussion, the Pareto Type-I and generalised Pareto distributions are given in (8) and (9), respectively. The generalised Pareto distribution is noted for its ability to model the tails of distribution functions (see e.g. Davison and Smith, 1990; Beirlant et al., 2004). Upon discretisation, the resulting discrete generalised Pareto distribution inherits these properties in forms adapted to the discrete probability space. From the generalised Pareto distribution (9), the probability mass function of the discrete generalised Pareto distribution can be formally deduced. First, consider the cumulative distribution function of the discrete generalised Pareto distribution in (10), where α, λ, µ > 0 and F(x) = 0 if x < µ. Krishna and Pundir (2009) addressed the discretisation of a continuous model by observing unit groupings on the failure time axis. The authors reasoned that, for a continuous failure time X with survival function S(x) = P[X > x] and times grouped into unit intervals, dX = ⌊X⌋, the discrete observed variable dX has the probability mass function
\[ P(dX = x) = S(x) - S(x + 1), \qquad x = 0, 1, 2, \ldots \tag{11} \]
Next, consider the standard generalisation of the survival function from Xekalaki (1983) in (12). Employing equations (10) and (12), the survival function of the discrete generalised Pareto distribution is given by (13). Finally, evaluating equations (11) and (13) together yields the probability mass function of the discrete generalised Pareto distribution, (14).
Suppose $x_1, \ldots, x_n$ is a sample of size n from a discrete generalised Pareto distribution. The parameters α and λ are estimated on the assumption that µ is known, since $\hat{\mu} = x_{\min} \le x_i$ for all i. Adopting the µ and (µ + 1) frequencies method of Prieto et al. (2014), the initial values $(\alpha_0, \lambda_0, \mu_0)$ can be obtained and used as seed estimators in the subsequent maximum likelihood operation.
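The discretisation step can be summarised in a short worked derivation. This is a sketch that assumes the survival-function form of the generalised Pareto distribution used by Prieto et al. (2014); the paper's own equations (10) to (14) may be written differently.

```latex
% Assumed continuous survival function (shape \alpha, scale \lambda, location \mu)
S(x) = \left[1 + \lambda (x - \mu)\right]^{-\alpha}, \qquad x \ge \mu .

% Discretising on unit intervals, P(X = x) = S(x) - S(x + 1), gives
P(X = x) = \left[1 + \lambda (x - \mu)\right]^{-\alpha}
         - \left[1 + \lambda (x - \mu + 1)\right]^{-\alpha},
\qquad x = \mu, \mu + 1, \mu + 2, \ldots

% The probabilities telescope, so they sum to one:
\sum_{x = \mu}^{\infty} P(X = x) = S(\mu) - \lim_{x \to \infty} S(x) = 1 - 0 = 1 .
```

Setting µ = 0 in this form gives a two-parameter probability mass function in α and λ, in line with the discrete Lomax special case mentioned in the Introduction.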
Thus, the relative frequencies of x = µ and x = µ + 1, denoted by $\hat{f}_{\mu}$ and $\hat{f}_{\mu+1}$ respectively, are calculated from the sample data. The initial values of α and λ are then determined by substituting x = µ and x = µ + 1 into the discrete generalised Pareto probability mass function in (14) and equating the resulting expressions to $\hat{f}_{\mu}$ and $\hat{f}_{\mu+1}$. However, the µ and (µ + 1) frequencies method assumes that the count data are observed in increasing steps of 1. Real-life data, such as those presented in Section 3.1, may exhibit intervals other than 1 between successive observed counts. Applying the method strictly to such data frequently yields no observation equal to µ + 1, and hence $\hat{f}_{\mu+1} = 0$.
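The frequency-matching idea can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes the probability mass function $P(X = x) = [1 + \lambda(x - \mu)]^{-\alpha} - [1 + \lambda(x - \mu + 1)]^{-\alpha}$, matches it to the relative frequencies at the minimum µ and at a second support point µ + ε (ε = 1 corresponds to the µ and (µ + 1) method), eliminates α analytically, and solves for λ by bisection. The function names and the search grid are hypothetical. With ε > 1 the matching equation may have more than one solution, so the sketch returns every root it brackets.

```python
import math

def dgp_pmf(x, alpha, lam, mu):
    """Assumed DGP pmf: S(x) - S(x + 1), with S(x) = (1 + lam * (x - mu)) ** -alpha."""
    return (1.0 + lam * (x - mu)) ** -alpha - (1.0 + lam * (x - mu + 1)) ** -alpha

def alpha_from_lambda(lam, f_mu):
    """Solve pmf(mu) = f_mu for alpha at a given lam: 1 - (1 + lam) ** -alpha = f_mu."""
    return -math.log(1.0 - f_mu) / math.log1p(lam)

def matching_roots(f_mu, f_next, eps=1, grid_points=241):
    """Return all (alpha0, lam0) pairs matching the pmf at mu and mu + eps.

    lam is scanned on a log-spaced grid between 1e-6 and 1e6; each sign change
    of the matching equation is refined by bisection on the log scale.
    """
    def gap(log_lam):
        lam = math.exp(log_lam)
        return dgp_pmf(eps, alpha_from_lambda(lam, f_mu), lam, 0) - f_next

    lo, hi = math.log(1e-6), math.log(1e6)
    grid = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        if gap(a) * gap(b) < 0:          # sign change: a root lies in (a, b)
            for _ in range(100):         # bisection
                mid = (a + b) / 2.0
                if gap(a) * gap(mid) <= 0:
                    b = mid
                else:
                    a = mid
            lam0 = math.exp((a + b) / 2.0)
            roots.append((alpha_from_lambda(lam0, f_mu), lam0))
    return roots

# Hypothetical frequencies generated from alpha = 2, lam = 0.5, with the second
# support point three units above the minimum (eps = 3).
f_mu = dgp_pmf(0, 2.0, 0.5, 0)
f_next = dgp_pmf(3, 2.0, 0.5, 0)
roots = matching_roots(f_mu, f_next, eps=3)
for alpha0, lam0 in roots:
    print(alpha0, lam0)
```

An empty result signals that the two observed frequencies cannot be reproduced by the assumed form within the search range. When several roots are returned, the choice of seed values is a judgment call; comparing the likelihood at each candidate is one option.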
Proceeding with the computations using a zero relative frequency would result in a loss of essential frequency information in the dataset. We therefore modify the method into the µ and µ + ε frequency method, where ε > 0 and µ + ε is the smallest observation larger than the minimum, µ. The estimators of α and λ are obtained by solving the resulting expressions, (15) and (16), simultaneously. Expression (17) results after α is eliminated from equations (15) and (16), and (18) is obtained by appropriate substitutions into (17).
Next, the maximum likelihood estimation method is employed to obtain estimators of the parameters of the discrete generalised Pareto distribution. The log-likelihood function is constructed as
\[ \ell(\alpha, \lambda) = \sum_{i=1}^{n} \ln f(x_i), \tag{19} \]
where $f(x_i)$ refers to the probability mass function specified in (14). Partial derivatives of (19) are taken with respect to α and λ and set to zero to obtain the normal equations.
To proceed with the estimation of the parameters of the discrete generalised Pareto distribution, a function was written in R to perform the following algorithm:
A1. Specify the log-likelihood function (19) based on the discrete generalised Pareto probability mass function (14). The function is set to return the negated log-likelihood value, since the R optim function is a minimiser. In effect, minimising the negated log-likelihood function, starting from the initial estimates, is equivalent to maximising the log-likelihood function.
A3. Extract the estimated parameters, α and λ, from the output generated in A2 and compute the standard errors of the estimators using the bootstrap resampling of Efron and Tibshirani (1993).
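The overall flow of the algorithm (negated log-likelihood, numerical minimisation, bootstrap standard errors) can be sketched as below. This is not the authors' R function. It assumes the probability mass function $[1 + \lambda(x - \mu)]^{-\alpha} - [1 + \lambda(x - \mu + 1)]^{-\alpha}$, replaces R's optim with a simple compass search written for this illustration, uses placeholder starting values of 1 instead of the frequency-based seeds, and re-estimates µ as the minimum of each resample. All names and data are hypothetical.

```python
import math
import random

def dgp_neg_loglik(log_params, xs, mu):
    """Negated log-likelihood under the assumed pmf, in log(alpha), log(lam)."""
    if any(abs(v) > 20.0 for v in log_params):   # keep the search in a safe box
        return math.inf
    alpha, lam = math.exp(log_params[0]), math.exp(log_params[1])
    total = 0.0
    for x in xs:
        pmf = (1.0 + lam * (x - mu)) ** -alpha - (1.0 + lam * (x - mu + 1)) ** -alpha
        if pmf <= 0.0:                           # floating-point underflow guard
            return math.inf
        total -= math.log(pmf)
    return total

def compass_minimise(fn, start, step=0.5, tol=1e-6, max_iter=500):
    """Derivative-free compass search; an illustrative stand-in for optim."""
    best, best_val = list(start), fn(start)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                val = fn(trial)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2.0                          # shrink the pattern and retry
    return best, best_val

def fit_dgp(xs, alpha0=1.0, lam0=1.0):
    """Return (alpha_hat, lam_hat, mu_hat, minimised negated log-likelihood)."""
    mu = min(xs)
    start = [math.log(alpha0), math.log(lam0)]
    best, nll = compass_minimise(lambda p: dgp_neg_loglik(p, xs, mu), start)
    return math.exp(best[0]), math.exp(best[1]), mu, nll

def bootstrap_se(xs, n_boot=40, seed=1):
    """Nonparametric bootstrap standard errors of alpha_hat and lam_hat."""
    rng = random.Random(seed)
    fits = [fit_dgp(rng.choices(xs, k=len(xs)))[:2] for _ in range(n_boot)]
    ses = []
    for j in range(2):
        vals = [f[j] for f in fits]
        mean = sum(vals) / n_boot
        ses.append(math.sqrt(sum((v - mean) ** 2 for v in vals) / (n_boot - 1)))
    return tuple(ses)

# Hypothetical claim counts with non-unit gaps (not the NIC data).
counts = [0, 0, 0, 0, 0, 0, 2, 3, 3, 5, 7, 8, 12, 19, 19, 25, 40, 77, 150]
alpha_hat, lam_hat, mu_hat, nll = fit_dgp(counts)
se_alpha, se_lam = bootstrap_se(counts)
print(alpha_hat, lam_hat, mu_hat, nll, se_alpha, se_lam)
```

Working on the log scale keeps α and λ positive without explicit constraints. A production version would use a tested optimiser and a larger number of bootstrap replicates.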

Model Selection Criteria
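The Akaike (1974) and Bayesian (Schwarz, 1978) information criteria, usually denoted AIC and BIC respectively, form the basis for selecting the suitable model. The AIC and BIC are stated as follows:
\[ \mathrm{AIC} = -2 \ln \hat{L} + 2k \]
and
\[ \mathrm{BIC} = -2 \ln \hat{L} + k \ln(n), \]
where $\hat{L}$ is the maximised value of the likelihood function, k is the number of estimated parameters, and n is the number of observations. In comparison with AIC, BIC addresses the issue of overfitting with the factor ln(n) in place of 2, thereby placing a higher penalty on model complexity (Dziak et al., 2019); the BIC penalty exceeds the AIC penalty whenever n ≥ 8, since ln(8) > 2. In statistical decision making, the candidate model with the minimum AIC and/or BIC value is selected.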
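A minimal sketch of the comparison is given below. The log-likelihood values, parameter counts, and model labels are hypothetical and chosen only to show the mechanics; they are not results from this study.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * loglik

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * loglik

# Hypothetical maximised log-likelihoods and parameter counts for two models
# fitted to the same n = 29 observations.
candidates = {"model A": (-120.4, 2), "model B": (-112.9, 3)}
n_obs = 29

scores = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(scores, best_by_aic, best_by_bic)
```

Both criteria must be computed on the same dataset for the comparison to be meaningful, since the log-likelihood scales with the number of observations.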

Data
The study employs secondary data on non-life claims from the National Insurance Commission (NIC) of Ghana for the five-year period, 2012 to 2016. The historical data covers insurance claims of twenty-nine (29) non-life service providers. For each fiscal year, the dataset indicates total number of claims administered under each of the five (5) business classes of non-life insurance in Ghana. The classes are: Fire, Burglary, and Property Damage; Accident; Marine and Aviation; Motor; and General Liability. The claims data is organised into three (3) categories: Incurred But Not Reported (IBNR), Reported But Not Settled (RBNS), and Settled But Outstanding (SEBO) each bearing the standard actuarial definitions.
However, since IBNR is necessarily an estimate, the study focuses on RBNS and SEBO, hereinafter referred to as reported and settled claims respectively. Overall, the data consist of 3,878,355 non-life insurance policies, generating 39,563 reported claims, of which 5,210 were settled within the period. Figure 1 provides an overview of the annual aggregates for policy subscriptions, reported claims, and settled claims. Although policy subscriptions decreased from 2014 onwards, the numbers of reported and settled claims increased over the period. This observation points to potential liquidity challenges for non-life insurers if the trend persists. Following Prieto et al. (2014), the dataset is organised so as to enable the fitting of the discrete distributions to the frequency of occurrence of reported and settled claims. Tables 1 and 2 present descriptive statistics on the reported and settled claims datasets respectively.
For each year of the period considered, the skewness measures the extent of asymmetry and shows that the distribution of claim counts is positively skewed. Also, among the reported claims, the fiscal year 2016 recorded some unusually large values, culminating in its large kurtosis value. Similar results can be found in 2013 for the settled claims data. In addition, Tables 3 and 4 record the individual observations of reported and settled claim counts with their corresponding frequencies. For instance, in 2012, there were 2 records of a reported claims count of 19, and 6 records of cases where no claims were settled.
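The count-and-frequency layout described above can be produced from raw per-insurer counts as in the following sketch. The values are hypothetical and are not taken from Tables 3 and 4; they are chosen so that 29 insurers are represented, the count 19 occurs twice, and zero occurs six times.

```python
from collections import Counter

# Hypothetical claim counts for 29 insurers in one year (not the NIC data).
claim_counts = [0, 0, 0, 0, 0, 0, 2, 2, 3, 4, 5, 5, 7, 8, 11, 12, 15, 19, 19,
                24, 31, 40, 52, 77, 96, 150, 210, 388, 905]

# Each row pairs a distinct claim count with the number of insurers recording it.
frequency_table = sorted(Counter(claim_counts).items())
for count, freq in frequency_table:
    print(count, freq)

# Distinct observed counts, and the gap between the two smallest of them.
support = [count for count, _ in frequency_table]
gap_above_minimum = support[1] - support[0]
print(sum(freq for _, freq in frequency_table), gap_above_minimum)
```

In this hypothetical sample no insurer records exactly one claim, so the nearest observed count above the minimum is two units away. This is the situation that motivates replacing µ + 1 with the smallest observation above µ.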
It should be noted that the respective columns of count frequencies for both reported and settled claims sum to 29. Thus, each of them tallies with the total number of non-life insurers from whom records are gathered by the National Insurance Commission. Lastly, non-settlement of reported claims may result from, among other reasons, questions over the eligibility of the reported interest, the proximate cause of the insured event, and non-compliance with the coverage provisions of the insurance policy.

Model Fitting and Discussion of Results
This section presents the outcomes of applying the model fitting methods discussed in Section 2 to the claims data from the preceding section. The parameter estimates are obtained by the maximum likelihood method. The maximum likelihood estimation of the negative binomial and discrete generalised Pareto parameters is performed in R. The negative binomial parameters are estimated using the mle function and its standard arguments in the fitdistrplus package. However, to the best of the authors' knowledge, no statistical package exists for estimating the parameters of the discrete generalised Pareto distribution in R. Therefore, the authors wrote an R function for estimating the parameters of the discrete generalised Pareto distribution, using the algorithm A1-A3, and it is available upon request. In addition, the selection criteria for model comparison are presented for the individual years and for the aggregated claims data for the five-year period. Table 5 shows the parameters of the negative binomial distribution estimated using an alternative parametrisation given by X ∼ NegativeBinomial(r, m/(m + r)), where $\hat{r}$ and $\hat{m}$ represent the dispersion and mean parameters respectively. The standard errors are placed in parentheses.
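The link between the two parametrisations can be made explicit. The relations below are a sketch that assumes p is the success probability in a probability mass function proportional to $p^{k}(1-p)^{r}$, with m denoting the mean and r the dispersion parameter.

```latex
p = \frac{m}{m + r}
\quad\Longrightarrow\quad
\mathrm{E}[X] = \frac{r p}{1 - p} = m,
\qquad
\mathrm{Var}[X] = \frac{r p}{(1 - p)^{2}} = m + \frac{m^{2}}{r}.
```

Under these relations the variance exceeds the mean by $m^{2}/r$, so smaller values of r correspond to stronger overdispersion in the claim counts.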

Parameter Estimation for Yearly Data
Table 6 presents the estimates $\hat{\mu}$, $\hat{\alpha}$ and $\hat{\lambda}$ of the location, shape and scale parameters of the discrete generalised Pareto distribution respectively. The bootstrap standard errors are placed in parentheses.
For the reported claim counts, Table 7 shows that the discrete generalised Pareto model yields smaller AIC and BIC values than the negative binomial model. The observation is consistent across all the fiscal years under consideration. In addition, for the settled claim counts, the discrete generalised Pareto distribution is preferred, as it exhibits smaller AIC and BIC values throughout the period, as shown in Table 8. Therefore, the discrete generalised Pareto distribution is recommended, as it provides a better fit to both classes of the non-life insurance claims data.

Parameter Estimation for Aggregated Data
This section presents the results of fitting the negative binomial and discrete generalised Pareto distributions to the aggregated five-year count data on reported and settled claims. The results of the parameter estimation for the negative binomial and discrete generalised Pareto distributions are presented in Tables 9 and 10 respectively. To compare the two distributions, Table 11 shows the AIC and BIC values for their fits. The discrete generalised Pareto model produces smaller AIC and BIC values for the aggregate reported claim counts. Also, for the aggregate settled claim counts, smaller AIC and BIC values are produced by the discrete generalised Pareto distribution. Therefore, in alignment with the conclusion of the year-based modelling, the discrete generalised Pareto distribution is recommended, as it provides a better fit to both the yearly and the aggregated non-life insurance claims data.

Conclusion
The study demonstrates that non-life insurance claims frequency can be described by the three-parameter discrete generalised Pareto distribution. Relative to the negative binomial, the discrete generalised Pareto distribution provides a better fit to the distribution of non-life claims data for both reported and settled counts, as is evident for both the yearly and the aggregated data. Additionally, profiling non-life insurance datasets by the number of claims and the corresponding frequency of counts was shown to provide a data structure for efficient statistical modelling. Further, in evaluating the initial estimators $(\alpha_0, \lambda_0, \mu_0)$ for determining the discrete generalised Pareto parameters, the constant unit increment in the µ and (µ + 1) frequency method of Prieto et al. (2014) was modified to µ and µ + ε, ε > 0, where µ + ε is the smallest observation greater than µ (the minimum value). This µ and µ + ε frequency method extends the application of the µ and (µ + 1) frequency method to practical count data exhibiting varying observational intervals.