Dual Divergence Estimators of the Tail Index

The main purpose of the present paper is to propose new estimators of the tail index using $\varphi$-divergences and the duality technique. These estimators are explored with respect to robustness through the influence function approach. The empirical performance of the proposed estimators is illustrated by simulation.


Introduction
In extreme value statistics, emphasis lies on the modelling of rare events, mostly events with a low frequency but a high impact. Common practice is to characterize the size and frequency of such extreme events mainly by the extreme value index $\gamma$, and here the main problem is to estimate the unknown parameter $\gamma$. Since only the upper tail of the distribution is involved, it is reasonable to construct estimators of $\gamma$ based on the top extreme values of a sample $X_1, \ldots, X_n$. The most commonly used estimator of this kind is that proposed by Hill [1]. We mention that the most prominent estimators of this real-valued parameter $\gamma$ are maximum likelihood estimators of specific parametric models which are fitted to excesses over large thresholds (see [2]). Indeed, alternatives to the Hill estimator are discussed by Smith [2]. One of his conclusions (see, e.g., [2, pp. 1181-1182]) is that in general the Hill estimator compares favourably with its competitors. In general, these maximum likelihood estimators often prove to be highly efficient, though nonrobust against deviations of the actual distribution from the assumed parametric model. This is, for instance, the case in the presence of outliers or suspicious data, where the performance of the maximum likelihood estimators and the quality of the corresponding estimates of the tail index are often seriously affected.

Extreme Value Statistics and φ-Divergence Setting
A widely used family of divergences is the so-called "power divergences", introduced by Cressie and Read [14] (see also Liese and Vajda [15, Chapter 2]; the paper of Rényi [16] is also to be mentioned here), which are defined through the class of convex real-valued functions, for $\beta \in \mathbb{R} \setminus \{0, 1\}$:

$$\varphi_\beta(x) := \frac{x^\beta - \beta x + \beta - 1}{\beta(\beta - 1)}.$$

We have $\varphi_0(x) := -\log x + x - 1$ and $\varphi_1(x) := x \log x - x + 1$. For all $\beta \in \mathbb{R}$, we define $\varphi_\beta(0) := \lim_{x \downarrow 0} \varphi_\beta(x)$. So, the KL-divergence is associated to $\varphi_1$, the KL$_m$ to $\varphi_0$, the $\chi^2$ to $\varphi_2$, the $\chi^2_m$ to $\varphi_{-1}$, and the Hellinger distance to $\varphi_{1/2}$. In the monograph by Liese and Vajda [15] the reader may find detailed ingredients of the modeling theory as well as surveys of [...]

This result was elegantly proven in Keziou [19], Liese and Vajda [20], and Broniatowski and Keziou [17]. Broniatowski and Keziou [18] called it the dual form of a divergence, due to its connection with convex analysis. Furthermore, the supremum in the display (2.4) is unique and attained at $\theta_0$, independently of the value of $\theta$.

Let $X_1, \ldots, X_n$ be an independent, identically distributed (i.i.d.) sample from an unknown distribution function (d.f.) $P_{\theta_0}$. Naturally, a class of estimators of $\theta_0$, called "dual $\varphi$-divergence estimators" (D$\varphi$DEs), is defined by maximizing the empirical counterpart of (2.4), in which $P_{\theta_0}$ is replaced by the empirical measure $P_n$ of the sample.

In what follows, we describe the procedure used to obtain the D$\varphi$DE for the tail index. Let $X_1, \ldots, X_n$ be an i.i.d. sample from an unknown d.f. $P$. Since it is well known that a distribution is in the domain of attraction of a Fréchet distribution if and only if the distribution has a regularly varying tail, we can assume that $1 - P$ is regularly varying at $\infty$ with exponent $-1/\gamma$, where $\gamma > 0$ is called the tail index of the distribution $P$:

$$\lim_{x \to \infty} \frac{1 - P(\lambda x)}{1 - P(x)} = \lambda^{-1/\gamma}, \quad \lambda > 0,$$

or equivalently,

$$1 - P(x) = x^{-1/\gamma}\, \ell(x),$$

where $\ell(x)$ is slowly varying at $\infty$, namely,

$$\lim_{x \to \infty} \frac{\ell(\lambda x)}{\ell(x)} = 1 \quad \text{for all } \lambda > 0.$$

In the Pareto-type case, for the conditional distribution of the relative excesses over a high threshold $u$, it is easily seen that ultimately

$$\mathbb{P}\left(\frac{X}{u} > x \,\Big|\, X > u\right) = \frac{1 - P(ux)}{1 - P(u)} \to x^{-1/\gamma} \quad \text{as } u \to \infty, \quad x \ge 1. \tag{2.13}$$
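As a small numerical companion to the power-divergence generators $\varphi_\beta$ introduced at the start of this section, the following minimal sketch evaluates $\varphi_\beta(x)$ for $x > 0$, treating $\beta = 0$ and $\beta = 1$ through their limiting forms. The function name `phi` is an illustrative choice and does not come from the paper.

```python
import math

def phi(beta: float, x: float) -> float:
    """Power-divergence generator phi_beta(x) for x > 0 (name `phi` is hypothetical)."""
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    if beta == 0.0:
        return -math.log(x) + x - 1.0        # generator of the modified KL divergence (KL_m)
    if beta == 1.0:
        return x * math.log(x) - x + 1.0     # generator of the KL divergence
    return (x ** beta - beta * x + beta - 1.0) / (beta * (beta - 1.0))

# Every generator vanishes at x = 1; beta = 2 gives (x - 1)^2 / 2
# and beta = 1/2 gives 2 * (sqrt(x) - 1)^2.
for b in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(b, phi(b, 1.0), phi(b, 3.0))
```

The branch on `beta` mirrors the case distinction in the text; for $\beta$ close to (but not equal to) 0 or 1 the general formula suffers from cancellation, so a production version would likely switch to a series expansion there.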
A popular choice for the threshold $u$ in threshold-based methods is $X_{n-k:n}$, the $(k+1)$th largest observation of the sample, with $k = k_n = \lfloor n^\alpha \rfloor$ for some $\alpha \in (0, 1)$. Here and elsewhere, $\lfloor u \rfloor$ denotes the largest integer $\le u$. The quantile function pertaining to $P$ is defined, for $u \in (0, 1)$, by

$$Q(u) := \inf\{x : P(x) \ge u\}.$$

The empirical quantile function is given, for each $n \ge 1$ and $u \in (0, 1)$, by

$$Q_n(u) := X_{i:n} \quad \text{whenever } \frac{i-1}{n} < u \le \frac{i}{n}, \quad i = 1, \ldots, n.$$

The threshold $X_{n-k:n}$ is easily seen to be equal to $Q_n(1 - k/n)$. The idea of constructing the D$\varphi$DE for the tail index is to assume the above Pareto approximation to hold exactly as a model for the conditional distribution of the relative excesses $X_{n-j+1:n}/X_{n-k:n}$, $j = 1, \ldots, k$. That is, we can fit a Pareto model to the relative excesses. In this framework, the estimation of $\gamma$ can be handled through dual divergence techniques, which provide a wide range of estimators, including the Hill estimator; they can all be compared with respect to robustness properties. Consider the Pareto density

$$\frac{1}{\gamma}\, x^{-1/\gamma - 1}, \quad x > 1, \quad \gamma > 0. \tag{2.17}$$

Specializing (2.4) to this setting, an elementary calculation gives, for $\beta \in \mathbb{R} \setminus \{0, 1\}$,

$$[\ldots] \tag{2.18}$$

Using this last equality, one finds [...]

ISRN Probability and Statistics
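We now consider an interesting particular case of the previous setup. For $\beta = 0$, the criterion to be maximized reduces, up to a term that does not depend on the argument of the maximization, to the Pareto log-likelihood of the relative excesses, which leads to the famous Hill estimator [1], given by

$$\hat{\gamma}^{H}_{n} := \frac{1}{k} \sum_{j=1}^{k} \log^{+} X_{n-j+1:n} - \log^{+} X_{n-k:n},$$

independently of $\gamma$, where $\log^{+} x := \log \max(x, 1)$. Mason [27] showed consistency of the Hill estimator if $k = k_n$ is a sequence of positive integers satisfying

$$1 \le k_n < n, \quad k_n \to \infty, \quad \frac{k_n}{n} \to 0 \quad \text{as } n \to \infty. \tag{2.22}$$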
Further investigations concerning the asymptotic distribution of the Hill estimator have been made by Hall
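The Hill estimator and the threshold choice $k = \lfloor n^\alpha \rfloor$ described in this section can be sketched in a few lines. This is a minimal illustration under the conventions stated above ($X_{n-k:n}$ as threshold, $\log^{+}$ in place of $\log$); the function names `hill_estimator` and `k_from_alpha` are hypothetical.

```python
import math

def log_plus(x: float) -> float:
    """log^+ x = log(max(x, 1))."""
    return math.log(max(x, 1.0))

def hill_estimator(sample, k: int) -> float:
    """Hill estimator built on the k largest observations of the sample."""
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    xs = sorted(sample)                 # X_{1:n} <= ... <= X_{n:n}
    threshold = xs[n - k - 1]           # X_{n-k:n} in 0-based indexing
    top = xs[n - k:]                    # X_{n-k+1:n}, ..., X_{n:n}
    return sum(log_plus(x) for x in top) / k - log_plus(threshold)

def k_from_alpha(n: int, alpha: float) -> int:
    """Number of upper order statistics k = floor(n^alpha), alpha in (0, 1)."""
    return math.floor(n ** alpha)

# Tiny worked example: threshold 2, top values 4 and 8.
print(hill_estimator([1.0, 2.0, 4.0, 8.0], 2))   # (log 2 + log 4) / 2 = 1.5 * log 2
print(k_from_alpha(200, 0.5))
```

Note that `n ** alpha` is computed in floating point, so for values of $n^\alpha$ that are exact integers the floor may occasionally fall one below the intended value; an integer root routine would avoid this.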

Influence Function
In this section we study the robustness properties of the proposed estimators theoretically. In particular, we derive their influence functions from which the asymptotic variance follows.
The following definition is needed for the statement of our forthcoming result. Recall that the influence function of a functional $T$ at a distribution $P$ describes the effect on the estimate $T$ of an infinitesimal contamination of $P$ at the point $x$ and is given by

$$\mathrm{IF}(x; T, P) := \lim_{\varepsilon \downarrow 0} \frac{T(P_{\varepsilon x}) - T(P)}{\varepsilon}, \tag{3.1}$$

where $P_{\varepsilon x} := (1 - \varepsilon) P + \varepsilon \delta_x$, $\delta_x$ is the Dirac measure putting all its mass at $x$, and $0 \le \varepsilon \le 1$. In the following, we derive the influence function for the functional form of the newly proposed estimator in an analogous way as for the classical M-estimators [33]. General results on influence functions of the dual $\varphi$-divergence estimators can be found in Toma and Broniatowski [34]. We will use the following notation:

$$S := [\ldots] \tag{3.3}$$

where $\mathbb{1}_A$ stands for the indicator function of the event $A$. Our results are summarized in the following proposition; its proof is given in the appendix.
Proposition 3.1. The influence function of the functional $T_\alpha(P_n)$ corresponding to an estimator $\hat{\gamma}_\varphi(\gamma)$ is given by [...]

To illustrate the behavior of the obtained influence functions we restrict ourselves to the strict Pareto case; simulation results for other heavy-tailed distributions are presented in the next section. Figure 1 plots the influence functions of our estimators for the Pareto distribution. Observe that, for the Hellinger distance ($\beta = 0.5$) and the $\chi^2$-divergence ($\beta = 2$), the influence for the tail index $\gamma$ becomes negligible for large outliers. The influence functions are bounded, making the associated functionals robust, in contrast with the Hill estimator ($\beta = 0$) and the other divergences.
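The unboundedness of the influence function in the Hill case ($\beta = 0$) can be checked numerically with a finite-sample analogue of the influence function, the sensitivity curve $(n+1)\{T(y_1, \ldots, y_n, x) - T(y_1, \ldots, y_n)\}$. The sketch below is only an illustration under the strict Pareto model on $(1, \infty)$, where the maximum likelihood estimate of $\gamma$ is the mean of the log-observations (the form the Hill estimator takes on relative excesses); the function names are hypothetical, and the sensitivity curve is a standard robustness tool swapped in here, not the formula of Proposition 3.1.

```python
import math

def mean_log(ys) -> float:
    """Maximum likelihood estimate of gamma for strict Pareto data on (1, inf)."""
    return sum(math.log(y) for y in ys) / len(ys)

def sensitivity_curve(estimator, ys, x: float) -> float:
    """Finite-sample analogue of the influence function at the point x."""
    n = len(ys)
    return (n + 1) * (estimator(list(ys) + [x]) - estimator(ys))

ys = [1.5, 2.0, 3.0, 10.0]
for x in (10.0, 1e3, 1e6):
    # For the mean of logs the curve equals log(x) - mean_log(ys): it grows without bound in x.
    print(x, sensitivity_curve(mean_log, ys, x))
```

For this estimator the curve has the closed form $\log x - \hat{\gamma}$, which grows like $\log x$, in line with the non-robustness of the Hill estimator noted above.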

Simulation
In order to illustrate the robustness of the proposed statistical method, its finite sample behavior is investigated on both contaminated and uncontaminated data. For the tail index $\gamma$, a comparison is made between the well-known Hill [1] estimator and the newly proposed estimator.
We consider 1000 simulated samples without contamination, each containing $n = 200$ observations, from the two Pareto-type distributions. When the data are uncontaminated, although most robust estimators are known to be less efficient at the true model than maximum likelihood estimators [1], we notice that the $\gamma$ estimates seem to be fairly stable for intermediate values of $k$, making the influence of the choice of $k$ less troublesome. Even with respect to mean squared error (MSE), the newly proposed estimator does not seem to lose too much accuracy. A close look at Figures 2 and 3 shows that there is a slight tendency to overestimation. The newly proposed D$\varphi$DEs perform comparably to the Hill [1] estimator.
However, a slight contamination (5%) is sufficient to make the D$\varphi$DE associated to the $\chi^2$-divergence ($\beta = 2$) more appealing in terms of low MSE; see Figures 4, 5, 6, and 7. Furthermore, when contamination increases, the D$\chi^2$DE performs remarkably better.
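The kind of Monte Carlo comparison described in this section can be sketched as follows for the Hill estimator alone. Everything specific here is an assumption made for illustration: the Fréchet d.f. $\exp(-x^{-1/\gamma})$ with $\gamma = 0.5$, the choice $k = 20$, the reduced number of replications, and in particular the contamination scheme (a fraction of the observations multiplied by a fixed factor), which is hypothetical and not taken from the paper.

```python
import math
import random

def hill(sample, k: int) -> float:
    """Hill estimator for a positive sample, using the k largest observations."""
    xs = sorted(sample)
    n = len(xs)
    return sum(math.log(x / xs[n - k - 1]) for x in xs[n - k:]) / k

def frechet_sample(n: int, gamma: float, rng: random.Random):
    """Inverse-transform draws from the Frechet d.f. exp(-x^(-1/gamma)), x > 0."""
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:                      # guard against log(0)
            out.append((-math.log(u)) ** (-gamma))
    return out

def contaminate(sample, eps: float, factor: float, rng: random.Random):
    """Hypothetical scheme: multiply a fraction eps of the points by `factor`."""
    out = list(sample)
    m = int(round(eps * len(out)))
    for i in rng.sample(range(len(out)), m):
        out[i] *= factor
    return out

def mse_of_hill(gamma, n, k, reps, eps=0.0, factor=100.0, seed=0) -> float:
    """Monte Carlo mean squared error of the Hill estimator of gamma."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = frechet_sample(n, gamma, rng)
        if eps > 0.0:
            xs = contaminate(xs, eps, factor, rng)
        total += (hill(xs, k) - gamma) ** 2
    return total / reps

print("clean       :", mse_of_hill(0.5, 200, 20, 200))
print("contaminated:", mse_of_hill(0.5, 200, 20, 200, eps=0.05))
```

Repeating the loop over a range of $k$ values and over several estimators would give MSE curves of the type summarized in the figures; the sketch fixes a single $k$ to stay short.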
Overall, the simulation results in this section provide supporting evidence of the adequacy of the D$\varphi$DE associated with the $\chi^2$-divergence for observations drawn from Fréchet and Burr distributions. Moreover, the sensitivity of this estimator to the choice of $k$ is low.

We define the estimator as the value $\hat{\gamma}_\varphi(\gamma)$ which maximizes, independently of $k/n$, the following estimating equation: [...] or, equivalently, as the solution, in $\gamma$, of the following equation: [...] In this view, the estimator may be written in the form of a functional $T_\alpha(P_n)$, given by [...] We continue by rewriting (A.4) for the contaminated distribution as given in the definition of the influence function in (3.1), that is, [...] (A.9). The proof of Proposition 3.1 is therefore completed.
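The definition of the estimator as the maximizer of a dual criterion can be illustrated numerically for the strict Pareto model. The sketch below is a reconstruction under stated assumptions, not the paper's displayed formula: it starts from the generic dual representation of Broniatowski and Keziou, in which the criterion is the empirical mean of $h(\theta, \alpha, y) = \int \varphi'(p_\theta/p_\alpha)\, dP_\theta - [(p_\theta/p_\alpha)\varphi'(p_\theta/p_\alpha) - \varphi(p_\theta/p_\alpha)](y)$, and evaluates it in closed form for Pareto densities with $\varphi = \varphi_\beta$, $\beta \notin \{0, 1\}$. The symbols `theta` (escort parameter), `alpha` (argument of the maximization), the function names, and the grid search are all illustrative choices.

```python
import math
import random

def dual_criterion(alpha: float, theta: float, beta: float, ys) -> float:
    """Empirical dual criterion for the strict Pareto model (assumed form, beta not in {0, 1})."""
    denom = beta * alpha + (1.0 - beta) * theta
    if denom <= 0.0:
        return -math.inf                       # the integral term diverges here
    # Integral of p_theta^beta * p_alpha^(1 - beta) over (1, inf).
    integral = theta ** (1.0 - beta) * alpha ** beta / denom
    # (p_theta / p_alpha)(y) = (alpha / theta) * y^(1/alpha - 1/theta).
    expo = beta * (1.0 / alpha - 1.0 / theta)
    scale = (alpha / theta) ** beta
    mean_term = sum(scale * y ** expo for y in ys) / len(ys)
    return (integral - 1.0) / (beta - 1.0) - (mean_term - 1.0) / beta

def dphi_de(ys, theta: float, beta: float, grid) -> float:
    """Grid-search maximizer of the dual criterion over candidate values of gamma."""
    return max(grid, key=lambda a: dual_criterion(a, theta, beta, ys))

rng = random.Random(1)
gamma0 = 0.5
ys = [(1.0 - rng.random()) ** (-gamma0) for _ in range(2000)]   # strict Pareto draws, y >= 1
grid = [0.30 + 0.005 * i for i in range(241)]                   # candidates in [0.30, 1.50]
print(dphi_de(ys, theta=0.5, beta=2.0, grid=grid))
```

In practice the escort parameter would be a pilot estimate rather than the true value used here, and the relative excesses over $X_{n-k:n}$ would play the role of `ys`; a grid search is used only because it is transparent, and a one-dimensional optimizer would serve equally well.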

Figure 1: Influence functions for the Pareto distribution.