Shrinkage Methods for Estimating the Shape Parameter of the Generalized Pareto Distribution

. The generalized Pareto distribution is one of the most important distributions in statistics of extremes as it has wide applications in ﬁ elds such as ﬁ nance, insurance, and hydrology. This study proposes two new methods for estimating the shape parameter of the generalized Pareto distribution (GPD). The proposed methods use the shrinkage principle to adapt the existing empirical Bayesian with data-based prior and the likelihood moment method to obtain two estimators. The performance of the proposed estimators is compared with the existing estimators (i.e., maximum likelihood, likelihood moment estimators, etc.) for the shape parameter of the generalized Pareto distribution in a simulation study. The results show that the proposed estimators perform better for small to moderate number of exceedances in estimating shape parameter of the light-tailed distributions and competitive when estimating heavy-tailed distributions. The proposed estimators are illustrated with practical datasets from climate and insurance studies.


Introduction
The generalized Pareto distribution (GPD) is one of the fundamental distributions used in extreme value theory.As reported in [1,2], the GPD is the limiting distribution of excesses over a large threshold regardless of the underlying distribution of data, F The significance of the GPD in extreme value theory cannot be overemphasized because of its extensive applicability: hydroelectric dam management [3], flood levels of river, insurance and finance [4,5], waiting time problems [6], ecology [7], and climatology [8], among others.
Let Y 1 , Y 2 , ⋯, Y n be a random sample from Y with unknown underlying distribution function, F An exceedance occurs over a deterministic threshold, u, if the value of Y is greater than u, i.e., Y = Y j Y i > u, j = 1, ⋯, n u ; i = 1, ⋯, n, where n u is the number of observations exceeding u Also, the excess over a threshold is defined as Y − u with Y > u The excess distribution over threshold is given by where y 0 is the finite or infinite right endpoint of the distribution F. From [1,2], the distribution of Y is well approximated by the GPD with the survival function defined as If the location μ μ ∈ ℝ , shape (ε ε ∈ ℝ ), and scale (σ σ > 0 ) parameters are included (see, e.g., [9]), the resulting three-parameter generalized Pareto distribution is given by The shape parameter, ε, is usually referred to as the tail index or the extreme value index [3,10,11].The mean and the variance of a GPD random variable, Y, are given by respectively.Here, the expected value of Y exists when ε < 1 and the variance when ε < 1/2 The shape (tail index) parameter shows the tail heaviness of the underlying distribution and can be used to classify the distribution into three classes.These are short-tailed when ε < 0, medium-tailed when ε = 0, and heavy-tailed when ε > 0. In special cases, ε = 0 and ε = 1, the GPD reduces to exponential and uniform distributions, respectively.
Several estimation techniques have been proposed for estimating the parameters of the GPD such as maximum likelihood (ML), probability weighted moments, and elemental percentile.For large sample sizes, [12] showed that the ML estimator (MLE) is asymptotically the best.However, the MLE is known to pose computational difficulties and convergence problems [13,14].In addition, it does not exist when ε > 1 [15] and provides nonfeasible estimates if ε < −0 5 [13,16].
To address these problems of the MLE, [15] proposed the likelihood moment estimators (LME).This estimator shows high efficiency for small sample sizes.However, [17] reports that it performs poorly if ε ≥ 2 and tends to overestimate the shape parameter for larger values of the scale parameter.Also, the LME again is computationally intensive and slow.Hence, [18,19] proposed an empirical Bayesian method to solve the computational intensiveness of the LME.The resulting estimators of the tail index from this method showed better performance for ε ∈ −0 5,1 Despite this advantage, the authors use all available data in the computation of the estimates of the parameters of the GPD.However, it is known that for GPD-based models, the inclusion of small to moderate observations introduces bias [11,20].
In this paper, we propose two estimators of the tail index of the generalized Pareto distribution using the idea of the shrinkage principle.The estimators are a combination of the likelihood-based approach and that of the empirical Bayesian methods.The performance of the proposed method is evaluated in conjunction with the existing estimators in a simulation.
The rest of the paper is organized as follows.Review of the existing estimators and proposed two new estimators are in Section 2. The performances of the estimators are examined via an extensive simulation study in Section 3. A practical application of these estimators on two datasets is demonstrated.Finally, Section 5 provides concluding remarks.

Estimation of the Parameters of the Generalized Pareto Distribution
Assuming F is a GPD with scale and shape parameters σ and ε, respectively, the following methods can be used to obtain the estimators of the parameters, maximum likelihood [13], likelihood moment [15], profile likelihood with empirical Bayesian [18], and improved profile likelihood with empirical Bayesian [19] methods.In the next subsections, we present a brief description of each of these methods.

Maximum Likelihood Estimator (MLE).
The maximum likelihood principle involves finding the set of parameters that maximize the likelihood function L Y, θ evaluated on the sample from the distribution, F, of the random variable, Y: where f is the density function associated with F.
Many studies have shown that the estimators based on the maximum likelihood estimator are usually better in the case of large samples (see, e.g., [11,12,21,22]), among others.In addition, its asymptotic properties including normality are easily found compared with other estimation methods.
In the case of statistics of extremes, and in particular the GPD, [11,13,21], among others, show that for small sample sizes, the maximum likelihood performs poorly.Also, the numerical algorithm may fail to converge to the maxima.[21] suggests reparametrization of (ε, σ) to (ε, θ) of which θ is of the form ε/σ and σ is estimated as σ = ε/θ, with Hence, With this reparametrization, the MLE becomes asymptotically efficient.However, the computational issues persist such as nonconvergence and, thus, the search for solutions including the likelihood moment estimation technique.

Likelihood Moment
Estimator.The likelihood moment estimator (LME) was proposed by [15] to solve the computational complexities, efficiency, and convergence problems of the MLE.LME is reported to be efficient and robust, yet it is computationally intensive.The LME is estimated with the log-likelihood and the moment estimators as It should be noted that for any constant r satisfying 1 + rε > 0, the E 1 − θY r = 1 + rε −1 of which the sample version is equivalent to Using some conditions on ε and r for the existence of the MLE estimator of the tail index, [15] obtains Here, p = rn / ∑ n i=1 log 1 − θY i , r < 1 and θ = ε/σ [15] depicts that when r = ε, the asymptotic variances of LME and MLE are the same.Even though the choice of threshold is an important issue in the estimation of the GPD, it was not considered by the authors.Also, [23] states that the LME is less sensitive than other estimators to the choice of threshold.In general, the LME estimator performs poorly if ε ≥ 2 and tends to overestimate the shape parameter for large values of the scale parameter [17].

The Profile Likelihood with Empirical Bayesian Method (PB)
. This estimation method was proposed by [18] to improve the MLE and also solve the computational difficulties of the LME method.It borrows ideas from empirical Bayesian method where ε and σ are reparameterized as ε and θ, of which θ is of the form ε/σ The authors defined their estimator as where l θ = n log θ/ε + ε − 1 is the profile log-likelihood function and π • is a data-driven "prior" density function for θ The integral in ( 14) is computationally difficult for most priors; hence, a simplified numerical version was proposed as where with Y ⋆ = Y n/4 +0 5 as the first quartile of the sample data, m = 20 + n , j = 1, ⋯, m and w θ j = 1/∑ m t=1 e l θ t −l θ j The shape and scale parameters are obtained as However, this method is reported to be sensitive to the shape of the prior distribution but performs well when −1 ≤ ε ≤ 0 5 Also, it is reported to have very poor performance for extremely heavy-tailed distributions and efficient for only small sample sizes [19].
Since this estimation method is in the peaks-over threshold framework, the selection of a suitable threshold is critical as it leads to bias-variance trade-offs.However, the authors did not consider this in their simulation study.
In this case, a GPD with scale σ ⋆ and shape ε ⋆ was chosen as the prior distribution for Here, ε ⋆ is chosen as negative to ensure that Y −1 n is positive and finite.
Similar to the estimator in Section 2.3, the modified estimators of ε and σ are obtained as Here, with the computation of θ * j as However, this method is reported to be sensitive to the estimation of the scale parameter and has poor performance for light-tailed distributions [9,24].In addition, this estimator is not valid for ε < 0 5 [22].Furthermore, the authors did not consider the effect of the number of exceedances, k, on their proposed estimator.

Empirical Review of Some Existing Estimation Methods.
In [18,19], it has been demonstrated numerically that their estimators of the tail index are less biased and efficient compared with existing estimators such as the maximum likelihood and likelihood moments.However, a closer look at the codes implemented for the simulation study reveals that the number of exceedances (k) is taken as the sample size (i.e., the whole sample is utilised in the estimation of the parameters).This is in contrast to the concepts in extreme value theory relating to the peaks-over threshold (POT) where emphasis is on the tails of the distributions.Therefore, in such a simulation study, estimators should be assessed on their sensitivity to the choice of top order statistics, i.e., number of exceedances.In view of the above, we present a simulation study to assess the performance of the proposals in [18,19] with that of the MLE and the LME of tail index as a function of the number of top order statistics or exceedances.We generate samples of size n = 200 and 1000 from the GPD with parameters, ε = −0 25, σ = 1 and ε = 0 25, σ = 1 .Figure 1 shows the behaviour of the estimators arising from these estimators of the tail index of the underlying distributions of the generated datasets.

Journal of Applied Mathematics
In the case of the estimation of ε = −0 25, LME tends to overestimate (show large deviations) the tail index when k is small.Also, PB and IPB methods underestimate the value of the tail index for small values of k and are only closer to the true value as k ⟶ n.In addition, for positive values of the shape parameter, the LME gives very good estimates, i.e., closer to the true value.However, the PB and IPB estimators give good estimates of ε only if k ⟶ n.Therefore, the estimators, PB [18] and IPB [19], do not perform well when the number of exceedances is small compared to the sample size.
Even though the [18,19] estimators (i.e., PB and IPB) show attractive properties in the estimation of the extreme value index, this small simulation study gives an indication of their sensitivities to the choice of k Therefore, in this study, we aim to provide estimators of the tail index that takes into account the advantages of the [18,19] estimators but are less sensitive to the choice of the number of exceedances, k.
2.6.The Proposed Method.This estimation technique seeks to improve the methods proposed by [15] and that of [18,19].The proposed method uses the idea of shrinkage estimation which relies on a weighted combination of these methods.
For convenience, we reparametrize σ and ε of the GPD as θ and ε, where θ = εσ −1 as done by [21].This implies that the shape and scale parameter are estimated, respectively, by ε and σ = ε θ −1 by maximizing the log-likelihood for the sample For ν ∈ 0, 1 , the first and second proposed shrinkage estimators of θ are given by respectively.The value of ν plays an important role in the shrinkage estimator: ν ⟶ 0 implies that the LME dominates whereas with ν ⟶ 1, the empirical Bayesian methods (PB and IPB) dominate.Also, when ν ⟶ 0 5, equal proportion of both estimators is included in the proposed estimators of the tail index. Therefore, For optimal weight, we minimise the MSE θ s j , θ with respect to ν: The asymptotic properties of θ q (i.e., θ PB and θ IPB ) remain an open problem, and hence, we resort to the use of simulation to find an approximate value for ν.An extensive simulation was done to choose the value of ν, and in practice, the findings show that for light-tailed situations, ν ∈ 0 4,0 7 provides an appropriate or suitable estimates whereas for heavy-tailed cases, ν ∈ 0,0 2 gives an optimal estimates.We provide a few examples as shown in Figure 2. The rest are available on request from the authors.

Simulation Study
In this section, a simulation study is conducted to compare the performance of the tail index estimation methods discussed in the previous section.Specifically, the two proposed estimators are compared with the MLE [13], LME [15], PB [18], and IPB [19].We present the simulation design and a discussion of the results in the next subsubsections.
3.1.Simulation Design.Samples of different sizes are generated from the GPD with various values of the shape and scale parameters.The four existing estimators, MLE, LME, PB, and IPB, and the two proposed estimators, Proposed 1 and Proposed 2, are used to estimate the values of the parameters of the GPD.Two error metrics are used for comparison of the estimators, i.e., mean square error (MSE) and bias.
Algorithm 1 outlines the procedure for comparing the performance of estimators.
We remark that, from Step 2, the estimation ε leads to concurrent estimates of σ However, our interest is the tail index (i.e., shape) parameter only.

Simulation Results and Discussion
. The simulation is implemented in the statistical package R.The codes for IPB and PB are obtained from [18,22], respectively.Also, the codes for LME can be found in the R package POT.Journal of Applied Mathematics The results of the simulation on the estimation of the shape parameter, ε, are presented in Figures 3 and 4 for light and heavy-tailed distributions, respectively.
First, the results show that the proposed methods generally have smaller MSE for small values of k (i.e., k less than 40% of the sample size).As k ⟶ n, PB and Step 1. Generate a sample of size n n = 50,200,500 from a GPD with known shape ε and scale (σ) parameters such that both light and heavy-tailed distributions are realized.
Step 2. For each number of top order statistic k k = 5, ⋯, n − 1 or equivalently threshold value u, compute estimates of ε i,k where i ∈ {MLE, LME, PB, IPB, Proposed 1 and Proposed 2}.
Step 3. In order to compute the bias and mean square error for each estimator, repeat Steps 1 and 2 a large number of times, R R ≥ 1000 to obtain the estimates, IPB estimators appear to be the best as their MSEs approach 0. This is in conformity with what have been reported in [18,19] and the results in Section 2.5.Also, the MLE estimator gives better performance than all other estimation methods when ε = −1 at n = 50 In general, the LME overestimates all values of ε < 0 and, hence, performs poorly.
In addition, Figure 4 shows the MSEs of the six estimators of ε ε ≥ 0 The results indicate that the proposed methods are competitive with LME as they have smaller MSEs across most of the values of k Closely following the sample path of both proposed methods and LME is the MLE: it has better performance in some cases.Lastly, the MSEs of the estimators converge to 0 as k ⟶ n, and this shows a sign of empirical consistency.
Therefore, we conclude that the proposed estimators are competitive with the existing ones, and at worse, it has MSEs close to that of likelihood-based method (LME).

Practical Application
In order to demonstrate the use of the proposed methods, we consider the estimation of the tail indexes of the underlying distributions of two datasets.First, the monthly mean    The P values of the Shapiro-Wilk test for normality reject the null hypothesis of a normal distribution for both series, thereby making it appropriate for extreme value analysis.
From the plots in Figure 5, no obvious trend can be seen in the temperature data.However, the Danish fire insurance claim had few outliers.The temperature series is dense in the range of 25.5 to 27 while the Danish fire claims are denser from one million to 2 million Danish Krone.The histogram and the exponential quantile-quantile plots of the two datasets indicate that the fire insurance claims have a long tail (heavy tail) while the temperature series has a shorter tail.
In addition, the mean excess plots show that the temperature data is light-tailed and, hence, indicates a likely negative value of the tail index, ε, whereas the Danish shows more of heavy-tailed.4.2.Estimation of the Tail Index.Figure 6 gives the sample paths of the estimators of the tail index of the underlying distribution of the Danish and temperature datasets.The sample paths of the estimators show that there exist large variations at smaller values of k and smoother for larger values of k The former can be explained by the small number of observations involved in the estimation of the tail index and, hence, the large variations associated with the sample paths.On the other hand, the latter can be explained by the large number of observations involved in the estimation of the tail index and, thus, the less variations associated with the sample paths.
In addition, the proposed estimators are in between the sample paths of the likelihood-based and empirical Bayesian-based estimators as required.

Conclusion
The importance of the generalized Pareto distribution cannot be overemphasized as it has wide applications in finance, hydrology, and economics, among others.In this paper, we proposed two estimators of the parameters of the generalized Pareto distribution.The estimators are based on the existing likelihood moment and the empirical Bayesianbased estimators using the idea of shrinkage.The performance of the proposed estimators was assessed using a large-scale simulation study.The results show that the likelihood moment estimator is not appropriate for negative values of the shape parameter.Also, the empirical Bayesian-based estimators use all the available data and hence defeat the threshold characteristics of the GPD.In addition, it performs well for only in the case where the number of exceedances approaches the sample size.The proposed estimators have competitive MSE for small to moderate number of exceedances.This is appropriate as the inclusion of smaller observations (i.e., large number of exceedances) introduces large bias.Moreover, the proposed estimators are at worst close to the performance of the likelihood-based estimator.The estimators were illustrated using two real datasets from the climate sector representing light-tailed distributions and the insurance industry representing heavy-tailed distributions.The asymptotic properties of the extreme value index estimators in [18,19] remain an open problem.Consequently, since our proposed shrinkage estimators of the extreme value index depend on these estimators, their asymptotic properties also remain an open problem.Therefore, these asymptotic properties, estimation of the scale parameter, and other problems such as estimation of high quantiles and exceedance probabilities are the subject of future research.

Data Availability
Data were obtained from the Climate Research Unit and the EVIR package in R. The temperature data used to support the findings of this study have been deposited in a git-hub repository (https://github.com/wilheminapels2/Shrinkage/blob/main/Temperature.csv).

Figure 5 :
Figure 5: Plot of the monthly mean temperature of Ghana.

Figure 6 :
Figure 6: (a) Danish fire insurance (b) mean temperature of Ghana.

Table 1 :
Descriptive statistics of the monthly mean temperature of Ghana.Table1shows the main descriptive measures for both monthly mean temperature of Ghana with 1417 monthly observations and the fire insurance claims in Denmark with 2167 daily series.Whereas the fire claim data is heavy-tailed, the distribution of the temperature data shows a mild positive skewness.