The main purpose of the present paper is to propose new estimators of the tail index using ϕ-divergences and the duality technique. The robustness of these estimators is explored through the influence function approach. Their empirical performance is illustrated by simulation.

1. Introduction

In extreme value statistics, emphasis lies on the modelling of rare events, mostly events with a low frequency but a high impact. Common practice is to characterize the size and frequency of such extreme events mainly by the extreme value index γ, so the main problem is to estimate the unknown parameter γ. Since only the upper tail of the distribution is involved, it is reasonable to construct estimators of γ based on the top extreme values of a sample $X_1,\dots,X_n$. The most commonly used estimator of this kind is the one proposed by Hill [1]. The most prominent estimators of this real-valued parameter γ are maximum likelihood estimators of specific parametric models fitted to excesses over large thresholds (see [2]). Alternatives to the Hill estimator are discussed by Smith [2]; one of his conclusions (see, e.g., [2], pp. 1181-1182) is that, in general, the Hill estimator compares favourably with its competitors. These maximum likelihood estimators often prove to be highly efficient, though nonrobust against deviations of the actual distribution from the assumed parametric model. This is, for instance, the case in the presence of outliers or suspicious data, where the performance of the maximum likelihood estimators and the quality of the corresponding estimates of the tail index are often seriously affected. Maximum likelihood estimation is known to be very sensitive to deviations from the theoretical distribution, in particular for the class of heavy-tailed distributions, where it may fail to provide a reasonable parameter estimate; see Alexander [3] among others.

Robustness is an important issue in extreme value theory; see for instance Dell'Aquila and Embrechts [4]. As shown in Brazauskas and Serfling [5], small errors in the estimation of the tail index can already produce large errors in the estimation of quantities based on the tail index γ. Hence, to overcome the lack of robustness to outliers of estimators of this kind, some robust methods for extreme values have been discussed in the recent literature. The interested reader may refer to Brazauskas and Serfling [6] for robust estimation in the context of strict Pareto distributions. Dupuis and Field [7] derived robust estimation methods for observations following a generalized extreme value distribution, and Peng and Welsh [8] and Juárez and Schucany [9] did so for the generalized Pareto distribution, light or heavy tailed. Vandewalle et al. [10] considered a robust estimation method based on the minimization of the integrated squared error criterion using an incomplete density mixture model; Kim and Lee [11] used the minimum density power divergence approach of Basu et al. [12] to estimate the tail index in the dependent case; more recently, Hubert et al. [13] proposed a method to detect outliers that can influence the Hill [1] estimator.

In this paper, we propose a new robust tail index estimation procedure, based on ϕ-divergences and the duality technique, for the semi-parametric setting of Pareto-type (or heavy-tailed) distributions. Here, the strict Pareto distribution is assumed to hold only asymptotically, that is, for excess distributions over high enough threshold values. The proposed method extends the maximum likelihood procedure: it will be seen that the latter corresponds to the particular choice of the $KL_m$-divergence, which leads to the Hill estimator [1].

The remainder of this paper is organized as follows. Section 2 is devoted to preliminary results on ϕ-divergences and to the introduction of our estimator. Section 3 presents our new results on the influence function. In Section 4, we investigate the finite-sample performance of the newly proposed estimators. To avoid interrupting the flow of the presentation, all technical arguments are deferred to the Appendix.

2. Extreme Value Statistics and <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M11"><mml:mrow><mml:mi>ϕ</mml:mi></mml:mrow></mml:math></inline-formula>-Divergence Setting

A widely used family of divergences is that of the so-called "power divergences", introduced by Cressie and Read [14] (see also Liese and Vajda [15, Chapter 2] and the paper by Rényi [16]), which are defined through the following class of convex real-valued functions, for β in ℝ∖{0,1}:
(2.1) $$x \in \mathbb{R}_+^* \mapsto \phi_\beta(x) := \frac{x^\beta - \beta x + \beta - 1}{\beta(\beta - 1)}.$$
We have $\phi_0(x) := -\log x + x - 1$ and $\phi_1(x) := x\log x - x + 1$. (For all β∈ℝ, we define $\phi_\beta(0) := \lim_{x\downarrow 0}\phi_\beta(x)$.) So, the KL-divergence is associated with $\phi_1$, the $KL_m$ with $\phi_0$, the $\chi^2$ with $\phi_2$, the $\chi_m^2$ with $\phi_{-1}$, and the Hellinger distance with $\phi_{1/2}$. In the monograph by Liese and Vajda [15] the reader may find detailed ingredients of the modeling theory as well as surveys of the commonly used divergences. We recall some basic definitions for the reader's convenience. Unless otherwise specified, we will assume that the function ϕ is strictly convex, of class $\mathcal{C}^2$, and such that, for fixed $\tilde\theta$,
(2.2) $$\int \left| \phi'\left( \frac{d\mathbb{P}_{\tilde\theta}(x)}{d\mathbb{P}_{\theta}(x)} \right) \right| d\mathbb{P}_{\tilde\theta}(x) < \infty, \quad \forall \theta \in \Theta.$$
According to Broniatowski and Keziou [17], if the function ϕ(·) satisfies the following condition:
(2.3) there exists $0 < \delta < 1$ such that, for all $c$ in $[1-\delta, 1+\delta]$, we can find numbers $c_1, c_2, c_3$ such that $\phi(cx) \le c_1\phi(x) + c_2|x| + c_3$ for all real $x$,
then the assumption (2.2) is satisfied whenever $D_\phi(\theta,\tilde\theta) < \infty$, where $D_\phi(\theta,\tilde\theta)$ stands for the ϕ-divergence between $\mathbb{P}_\theta$ and $\mathbb{P}_{\tilde\theta}$; we refer the reader to Broniatowski and Keziou [18, Lemma 3.2]. The real convex functions $\phi_\beta(\cdot)$ in (2.1), associated with the class of power divergences, all satisfy the condition (2.2), including all standard divergences. Under assumption (2.2), using the Fenchel duality technique, the divergence $D_\phi(\tilde\theta,\theta_0)$ can be represented as the result of an optimization procedure,
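(2.4) $$D_\phi(\tilde\theta,\theta_0) := \int \phi\left( \frac{d\mathbb{P}_{\tilde\theta}(x)}{d\mathbb{P}_{\theta_0}(x)} \right) d\mathbb{P}_{\theta_0}(x) = \sup_{\theta\in\mathcal{U}} \int h(\tilde\theta,\theta,x)\, d\mathbb{P}_{\theta_0}(x),$$
where $h(\tilde\theta,\theta,\cdot) : x \mapsto h(\tilde\theta,\theta,x)$ and
(2.5) $$h(\tilde\theta,\theta,x) := \int \phi'\left( \frac{d\mathbb{P}_{\tilde\theta}}{d\mathbb{P}_{\theta}} \right) d\mathbb{P}_{\tilde\theta} - \left[ \frac{d\mathbb{P}_{\tilde\theta}(x)}{d\mathbb{P}_{\theta}(x)}\, \phi'\left( \frac{d\mathbb{P}_{\tilde\theta}(x)}{d\mathbb{P}_{\theta}(x)} \right) - \phi\left( \frac{d\mathbb{P}_{\tilde\theta}(x)}{d\mathbb{P}_{\theta}(x)} \right) \right].$$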
This result was elegantly proven in Keziou [19], Liese and Vajda [20], and Broniatowski and Keziou [17]. Broniatowski and Keziou [18] called it the dual form of a divergence, due to its connection with convex analysis. Furthermore, the supremum in (2.4) is unique and is attained at $\theta = \theta_0$, independently of the value of $\tilde\theta$. Let $X_1,\dots,X_n$ be an independent, identically distributed (i.i.d.) sample from an unknown distribution function (d.f.) $\mathbb{P}_{\theta_0}$. Naturally, a class of estimators of $\theta_0$, called "dual ϕ-divergence estimators" (DϕDE's), is defined by
(2.6) $$\hat\theta_\phi(\tilde\theta) := \arg\sup_{\theta\in\Theta} \frac{1}{n}\sum_{i=1}^{n} h(\tilde\theta,\theta,X_i), \quad \tilde\theta \in \Theta,$$
where $h(\tilde\theta,\theta,\cdot)$ is the function defined in (2.5). The class of estimators $\hat\theta_\phi(\tilde\theta)$ satisfies
(2.7) $$\frac{1}{n}\sum_{i=1}^{n} \frac{\partial}{\partial\theta} h(\tilde\theta,\hat\theta_\phi(\tilde\theta),X_i) = 0.$$
Formula (2.6) defines a family of M-estimators indexed by the function ϕ specifying the divergence and by some instrumental value of the parameter $\tilde\theta$. Applications of the dual representation of ϕ-divergences have been considered by many authors; we cite, among others, Keziou and Leoni-Aubin [21] for semiparametric two-sample density ratio models, Bouzebda and Cherfi [22] for bootstrapped ϕ-divergence estimates, Cherfi [23] for the extension of dual ϕ-divergence estimators to right-censored data, and Bouzebda and Keziou [24, 25] for estimation and tests in copula models, together with the references therein. The performance of dual ϕ-divergence estimators for normal models is studied in Cherfi [26].

In what follows, we describe the procedure used to obtain the DϕDE for the tail index. Let $X_1,\dots,X_n$ be an independent, identically distributed (i.i.d.) sample from an unknown distribution function (d.f.) ℙ. Since it is well known that a distribution is in the domain of attraction of a Fréchet distribution if and only if it has a regularly varying tail, we can assume that 1-ℙ is regularly varying at ∞ with exponent -1/γ, where γ is called the tail index of the distribution ℙ:
(2.8) $$\lim_{x\to\infty} \frac{1-\mathbb{P}(ux)}{1-\mathbb{P}(x)} = u^{-1/\gamma}, \quad \text{for every } u > 0,$$
or equivalently,
(2.9) $$1-\mathbb{P}(x) = x^{-1/\gamma}\,\ell(x),$$
where ℓ(x) is slowly varying at ∞, namely,
(2.10) $$\lim_{x\to\infty} \frac{\ell(ux)}{\ell(x)} = 1, \quad \text{for every } u > 0.$$
In the Pareto-type case, the conditional distribution $\mathbb{P}_{Y_u}$ of the relative excesses
(2.11) $$Y_u = \left( \frac{X}{u} \,\Big|\, X > u \right)$$
over a threshold u satisfies
(2.12) $$1-\mathbb{P}_{Y_u}(y) = P\left( \frac{X}{u} > y \,\Big|\, X > u \right) = y^{-1/\gamma}\,\frac{\ell(uy)}{\ell(u)}, \quad y \ge 1,$$
and it is easily seen that ultimately
(2.13) $$\mathbb{P}_{Y_u}(y) \to 1 - y^{-1/\gamma} \quad \text{as } u \to \infty.$$
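The convergence in (2.13) can be checked numerically. The following is a minimal sketch, assuming the Burr survival function used later in Section 4 (with ξ=10, τ=2, λ=1, so that γ=1/2) as the Pareto-type example; the function names `burr_survival`, `excess_survival`, and `max_gap` are hypothetical and chosen only for illustration.

```python
import numpy as np

def burr_survival(x, xi=10.0, tau=2.0, lam=1.0):
    """Burr survival function 1 - P(x); here gamma = 1 / (lam * tau) = 0.5."""
    return (xi / (xi + x ** tau)) ** lam

def excess_survival(y, u, survival):
    """P(X / u > y | X > u) for y >= 1, i.e. the left-hand side of (2.12)."""
    return survival(u * y) / survival(u)

def max_gap(u, gamma=0.5, y=(1.5, 2.0, 5.0)):
    """Largest distance, over a few values of y, between the excess survival
    function at threshold u and the Pareto limit y**(-1/gamma) from (2.13)."""
    y = np.asarray(y, dtype=float)
    pareto_limit = y ** (-1.0 / gamma)
    return float(np.max(np.abs(excess_survival(y, u, burr_survival) - pareto_limit)))

# The gap shrinks as the threshold u grows.
for u in (1.0, 10.0, 100.0):
    print(f"u = {u:6.1f}   max gap = {max_gap(u):.5f}")
```

The gap is far from zero at low thresholds, which is one way to see why the Pareto model is only fitted to excesses over a high threshold.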
A popular choice for the threshold u in threshold-based methods is $X_{n-k:n}$, the (k+1)th largest observation of the sample, with k=kn=⌊nα⌋ for some α∈(0,1). Here and elsewhere, ⌊u⌋ denotes the largest integer ≤u. The quantile function pertaining to ℙ is defined, for u∈(0,1), by
(2.14) $$\mathbb{Q}(u) = \inf\{x : \mathbb{P}(x) \ge u\}.$$
The empirical quantile function is given, for each n≥1 and u∈(0,1), by
(2.15) $$\mathbb{Q}_n(u) = \inf\{x : \mathbb{P}_n(x) \ge u\}.$$
The threshold $X_{n-k:n}$ is easily seen to be equal to $\mathbb{Q}_n(1-(k/n))$. The idea of constructing the DϕDE for the tail index is to assume the above Pareto approximation to hold exactly as a model for the conditional distribution of the relative excesses
(2.16) $$Y_{jk} = \frac{X_{n-j+1:n}}{u}, \quad j = 1,\dots,k, \quad \text{above a high threshold } u = X_{n-k:n}.$$
That is, we can fit a Pareto model to the relative excesses. In this framework, the estimation of γ can be handled through dual divergence techniques, which provide a wide range of estimators, including the Hill estimator; all of them can be compared with respect to their robustness properties. Consider the Pareto density
(2.17) $$\left\{ p_\gamma(x) := \frac{x^{-1/\gamma - 1}}{\gamma} : x > 1;\ \gamma \in \mathbb{R}_+^* \right\}.$$
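Specializing (2.4) to this setting, an elementary calculation gives, for β in ℝ∖{0,1},
(2.18) $$\frac{1}{\beta-1}\int \left( \frac{d\mathbb{P}_{\tilde\gamma}(x)}{d\mathbb{P}_{\gamma}(x)} \right)^{\beta-1} d\mathbb{P}_{\tilde\gamma}(x) = \left( \frac{\gamma}{\tilde\gamma} \right)^{\beta-1} \left( \frac{\gamma}{\gamma\beta(\beta-1) - \tilde\gamma(\beta-1)^2} \right).$$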
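For the reader's convenience, the elementary calculation behind (2.18) can be sketched as follows. The convergence condition $\gamma\beta - \tilde\gamma(\beta-1) > 0$ is an assumption made here; it is not stated explicitly in the text.

```latex
% Likelihood ratio of two Pareto densities (2.17), for x > 1:
\frac{p_{\tilde\gamma}(x)}{p_{\gamma}(x)}
  = \frac{\gamma}{\tilde\gamma}\, x^{-(1/\tilde\gamma - 1/\gamma)} .

% Hence, assuming \gamma\beta - \tilde\gamma(\beta-1) > 0,
\int_1^\infty \left(\frac{p_{\tilde\gamma}(x)}{p_{\gamma}(x)}\right)^{\beta-1}
      p_{\tilde\gamma}(x)\,dx
  = \left(\frac{\gamma}{\tilde\gamma}\right)^{\beta-1} \frac{1}{\tilde\gamma}
    \int_1^\infty x^{-\frac{\beta\gamma-(\beta-1)\tilde\gamma}{\gamma\tilde\gamma}-1}\,dx
  = \left(\frac{\gamma}{\tilde\gamma}\right)^{\beta-1}
    \frac{\gamma}{\gamma\beta - \tilde\gamma(\beta-1)} .

% Dividing by (\beta - 1) gives the right-hand side of (2.18):
\frac{1}{\beta-1}\left(\frac{\gamma}{\tilde\gamma}\right)^{\beta-1}
    \frac{\gamma}{\gamma\beta - \tilde\gamma(\beta-1)}
  = \left(\frac{\gamma}{\tilde\gamma}\right)^{\beta-1}
    \frac{\gamma}{\gamma\beta(\beta-1) - \tilde\gamma(\beta-1)^2} .
```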
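Using this last equality, one finds
(2.19) $$\hat{D}_\beta(\tilde\gamma,\tilde\gamma_0) = \sup_{\gamma} \left\{ \left( \frac{\gamma}{\tilde\gamma} \right)^{\beta-1} \left( \frac{\gamma}{\gamma\beta(\beta-1) - \tilde\gamma(\beta-1)^2} \right) - \frac{1}{\beta k}\sum_{j=1}^{k} \left( \frac{\gamma}{\tilde\gamma} \right)^{\beta} Y_{jk}^{-\beta(1/\tilde\gamma - 1/\gamma)} - \frac{1}{\beta(\beta-1)} \right\}.$$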
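A minimal numerical sketch of this estimator is given below. It assumes that the criterion is the empirical mean of $h(\tilde\gamma,\gamma,Y_{jk})$, with h obtained by inserting $\phi_\beta$ from (2.1) and the Pareto density (2.17) into (2.5) and using (2.18). The names `h_beta` and `dual_estimate`, the search bounds, the instrumental value $\tilde\gamma = 0.4$, and the simulated strict Pareto sample are all illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def h_beta(gamma_tilde, gamma, y, beta):
    """h(gamma_tilde, gamma, y) of (2.5) for phi_beta and Pareto densities (2.17).

    Valid for beta not in {0, 1} and gamma*beta - gamma_tilde*(beta - 1) > 0,
    the convergence condition of the integral in (2.18).
    """
    ratio = gamma / gamma_tilde
    denom = gamma * beta * (beta - 1.0) - gamma_tilde * (beta - 1.0) ** 2
    integral_term = ratio ** (beta - 1.0) * gamma / denom  # right side of (2.18)
    data_term = ratio ** beta * y ** (-beta * (1.0 / gamma_tilde - 1.0 / gamma)) / beta
    return integral_term - data_term - 1.0 / (beta * (beta - 1.0))

def dual_estimate(y, beta, gamma_tilde, bounds=(0.1, 3.0)):
    """Maximize the empirical mean of h over gamma within `bounds`.

    The bounds are an assumption: they must lie inside the region where
    (2.18) converges (for beta = 2 this means gamma > gamma_tilde / 2).
    """
    y = np.asarray(y, dtype=float)
    objective = lambda g: -np.mean(h_beta(gamma_tilde, g, y, beta))
    result = minimize_scalar(objective, bounds=bounds, method="bounded")
    return float(result.x)

# Illustration on a simulated strict Pareto sample with gamma = 0.5.
rng = np.random.default_rng(0)
y = (1.0 - rng.random(20000)) ** (-0.5)   # inverse-transform sampling, y > 1
print(dual_estimate(y, beta=0.5, gamma_tilde=0.4))
```

A bounded one-dimensional search is used here only for simplicity; solving the estimating equation (2.7) with a root finder would be an equally reasonable choice.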
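We now consider an interesting particular case of the previous setup: for β=0, one obtains
(2.20) $$\hat{D}_{KL_m}(\tilde\gamma,\tilde\gamma_0) := \sup_{\gamma} \left\{ -\frac{1}{k}\sum_{j=1}^{k} \left\{ \log\left( \frac{\gamma}{\tilde\gamma} \right) - \left( \frac{1}{\tilde\gamma} - \frac{1}{\gamma} \right) \log(Y_{jk}) \right\} \right\},$$
which leads to the famous Hill estimator [1], given by
(2.21) $$\hat\gamma_k = \frac{1}{k}\sum_{j=1}^{k} \log^+(Y_{jk}) := \frac{1}{k}\sum_{j=1}^{k} \log^+(X_{n-j+1:n}) - \log^+(X_{n-k:n}),$$
independently of $\tilde\gamma$, where $\log^+ x = \log(\max(x,1))$. Mason [27] showed that the Hill estimator is consistent if $k = k_n$ is a sequence of positive integers satisfying
(2.22) $$1 \le k_n \le n-1, \quad k_n \to \infty, \quad n^{-1}k_n \to 0 \quad \text{as } n \to \infty.$$
Further investigations concerning the asymptotic distribution of the Hill estimator have been made by Hall [28], Csörgő and Mason [29], Haeusler and Teugels [30], Beirlant and Teugels [31], and Bouzebda [32]. The corresponding limit results are established under certain additional regularity conditions on ℙ and on $k = k_n$ satisfying (2.22).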
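The link between the $KL_m$ criterion (2.20) and the Hill estimator (2.21) can be illustrated numerically. The sketch below assumes positive observations and computes the relative excesses (2.16), the Hill estimator in closed form, and a numerical maximizer of (2.20) for an arbitrary instrumental value; the function names, the search bounds, and the simulated sample are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def relative_excesses(x, k):
    """Relative excesses Y_jk = X_{n-j+1:n} / X_{n-k:n}, j = 1..k, as in (2.16)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    threshold = xs[n - k - 1]          # X_{n-k:n} in 0-based indexing
    return xs[n - k:] / threshold

def hill_estimator(x, k):
    """Hill estimator (2.21): mean of log+ of the relative excesses."""
    y = relative_excesses(x, k)
    return float(np.mean(np.log(np.maximum(y, 1.0))))

def klm_estimate(y, gamma_tilde, bounds=(1e-3, 10.0)):
    """Numerical maximizer over gamma of the KL_m criterion in (2.20)."""
    log_y = np.log(np.asarray(y, dtype=float))
    def negative_criterion(g):
        return np.mean(np.log(g / gamma_tilde) - (1.0 / gamma_tilde - 1.0 / g) * log_y)
    return float(minimize_scalar(negative_criterion, bounds=bounds, method="bounded").x)

rng = np.random.default_rng(0)
x = (1.0 - rng.random(5000)) ** (-0.5)     # strict Pareto sample, gamma = 0.5
k = 500
y = relative_excesses(x, k)
print(hill_estimator(x, k), klm_estimate(y, 0.3), klm_estimate(y, 2.0))
```

The two numerical maximizers agree with the closed-form Hill value up to the optimizer's tolerance, whatever instrumental value is used, which reflects the statement that (2.21) does not depend on $\tilde\gamma$.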

3. Influence Function

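In this section, we study the robustness properties of the proposed estimators theoretically. In particular, we derive their influence functions, from which the asymptotic variance follows. The following definition is needed for the statement of our forthcoming result. Recall that the influence function of a functional T at a distribution ℙ describes the effect on the estimate T of an infinitesimal contamination of ℙ at the point x and is given by
(3.1) $$\mathrm{IF}(x;T,\mathbb{P}) = \lim_{\epsilon\to 0} \frac{T(\mathbb{P}_{\epsilon,x}) - T(\mathbb{P})}{\epsilon},$$
where
(3.2) $$\mathbb{P}_{\epsilon,x} = (1-\epsilon)\mathbb{P} + \epsilon\delta_x,$$
$\delta_x$ is the Dirac measure putting all its mass at x, and 0≤ϵ≤1. In the following, we derive the influence function for the functional form of the newly proposed estimator in the same way as for classical M-estimators [33]. General results on influence functions of the dual ϕ-divergence estimators can be found in Toma and Broniatowski [34]. We will use the following notation:
(3.3) $$\begin{aligned} S &:= -\int_{\mathbb{P}^{-1}(1-\alpha)}^{\infty} \frac{\partial^2}{\partial\gamma^2} h(\tilde\gamma, T_\alpha(\mathbb{P}), \mathbb{P}^{-1}(1-\alpha), z)\, d\mathbb{P}(z), \\ \psi_\alpha(\tilde\gamma,\gamma,x) &:= \int_{\mathbb{P}^{-1}(1-\alpha)}^{\infty} \frac{\partial}{\partial t}\left[ \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}), t, z) \right]\bigg|_{t=\mathbb{P}^{-1}(1-\alpha)} d\mathbb{P}(z) \left( \frac{\mathbf{1}\{x \ge \mathbb{P}^{-1}(1-\alpha)\} - \alpha}{p(\mathbb{P}^{-1}(1-\alpha))} \right) \\ &\quad + \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}), \mathbb{P}^{-1}(1-\alpha), x)\, \mathbf{1}\{x \ge \mathbb{P}^{-1}(1-\alpha)\} \\ &\quad - \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}), \mathbb{P}^{-1}(1-\alpha), \mathbb{P}^{-1}(1-\alpha)) \left( \mathbf{1}\{x \ge \mathbb{P}^{-1}(1-\alpha)\} - \alpha \right), \end{aligned}$$
where $\mathbf{1}_A$ stands for the indicator function of the event A. Our results are summarized in the following proposition; its proof is given in the Appendix.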

Proposition 3.1.

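The influence function of the functional $T_\alpha(\mathbb{P}_n)$ corresponding to an estimator $\hat\gamma_\phi(\tilde\gamma)$ is given by
(3.4) $$\mathrm{IF}(x;T_\alpha(\mathbb{P}),\mathbb{P}) = S^{-1}\psi_\alpha(\tilde\gamma,\gamma,x).$$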

To illustrate the behavior of the obtained influence functions, we restrict ourselves to the strict Pareto case; simulation results for other heavy-tailed distributions are presented in the next section. Figure 1 plots the influence functions of our estimators for the Pareto distribution. Observe that, for the Hellinger distance (β=0.5) and the $\chi^2$-divergence (β=2), the influence on the tail index γ becomes negligible for large outliers. These influence functions are bounded, making the associated functionals robust, in contrast with the Hill estimator (β=0) and the other divergences.

Figure 1: Influence functions for the Pareto distribution.
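The qualitative behavior described above can be explored with a small numerical sketch. It is a simplification made under stated assumptions: the model is the strict Pareto density (2.17) with the threshold fixed at 1, so the threshold-related terms of (3.3) are not included and the influence function reduces to the classical M-estimator form $\psi/S$ [33]. The instrumental value $\tilde\gamma = 0.4$ and all function names are hypothetical, and the curves need not coincide with those of Figure 1.

```python
import numpy as np

def psi(x, gamma, gamma_tilde, beta):
    """Derivative in gamma of h(gamma_tilde, gamma, x) for phi_beta and the
    Pareto density (2.17); beta not in {0, 1}."""
    r = gamma / gamma_tilde
    d = beta * r - beta + 1.0       # must be positive for (2.18) to converge
    integral_part = beta * r ** (beta - 1.0) * (r - 1.0) / (gamma_tilde * d ** 2)
    t = r ** beta * x ** (-beta * (1.0 / gamma_tilde - 1.0 / gamma))
    return integral_part + t * (np.log(x) - gamma) / gamma ** 2

def expected_h(gamma, gamma0, gamma_tilde, beta):
    """Closed form of E[h(gamma_tilde, gamma, X)] when X is Pareto with index gamma0."""
    r = gamma / gamma_tilde
    d = beta * r - beta + 1.0
    g = 1.0 + beta * gamma0 * (1.0 / gamma_tilde - 1.0 / gamma)
    return (r ** beta / ((beta - 1.0) * d)
            - r ** beta / (beta * g)
            - 1.0 / (beta * (beta - 1.0)))

def influence(x, gamma0, gamma_tilde, beta, step=1e-4):
    """psi / S, with S = -(second derivative of expected_h at gamma0),
    approximated by a central finite difference."""
    m = lambda g: expected_h(g, gamma0, gamma_tilde, beta)
    s = -(m(gamma0 + step) - 2.0 * m(gamma0) + m(gamma0 - step)) / step ** 2
    return psi(np.asarray(x, dtype=float), gamma0, gamma_tilde, beta) / s

def influence_hill(x, gamma0):
    """Limit case beta = 0 (Hill / maximum likelihood): log(x) - gamma0."""
    return np.log(np.asarray(x, dtype=float)) - gamma0

x = np.array([1.0, 1e1, 1e2, 1e4, 1e8, 1e12])
print("Hill      :", influence_hill(x, 0.5))
print("beta = 0.5:", influence(x, 0.5, 0.4, 0.5))
print("beta = 2  :", influence(x, 0.5, 0.4, 2.0))
```

In this simplified setting, boundedness for β=0.5 and β=2 relies on the choice $\tilde\gamma < \gamma$; with $\tilde\gamma = \gamma$ the power of x in `psi` disappears and the logarithmic growth of the Hill case returns.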

4. Simulation

In order to illustrate the robustness of the proposed statistical method, its finite-sample behavior is investigated on both contaminated and uncontaminated data. For the tail index γ, a comparison is made between the well-known Hill [1] estimator and the newly proposed estimator.

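We consider 1000 simulated samples without contamination, each containing n=200 observations, from the following two Pareto-type distributions.

The Fréchet distribution, with distribution function $\exp(-x^{-1/\gamma})$.

The Burr distribution, with survival function
(4.1) $$\left( \frac{\xi}{\xi + x^{\tau}} \right)^{\lambda}, \quad \text{such that } \gamma = \frac{1}{\lambda\tau}.$$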

In the simulations, we have chosen γ=1/2; for the Burr distribution, ξ=10, τ=2, and λ=1. The means of the DϕDE's (left) and the corresponding empirical mean squared errors (right) are plotted as functions of k. The horizontal line indicates the true value γ=1/2.
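The uncontaminated part of this design can be sketched as follows for the Hill estimator. The sketch assumes inverse-transform sampling from the two distributions above, with (4.1) read as the Burr survival function, and computes the Monte Carlo mean and mean squared error of the Hill estimator as functions of k. The function names, the seed, and the range of k are hypothetical; the contamination scheme is not specified in the text and is therefore not simulated.

```python
import numpy as np

def sample_frechet(rng, n, gamma=0.5):
    """Frechet sample: solve exp(-x**(-1/gamma)) = U for x."""
    u = rng.random(n)
    return (-np.log(u)) ** (-gamma)

def sample_burr(rng, n, xi=10.0, tau=2.0, lam=1.0):
    """Burr sample: solve (xi / (xi + x**tau))**lam = U for x."""
    u = 1.0 - rng.random(n)                     # U in (0, 1]
    return (xi * (u ** (-1.0 / lam) - 1.0)) ** (1.0 / tau)

def hill_path(x, k_max):
    """Hill estimates for k = 1, ..., k_max (requires k_max <= n - 1)."""
    logs = np.sort(np.log(x))[::-1]             # descending log order statistics
    k = np.arange(1, k_max + 1)
    return np.cumsum(logs[:k_max]) / k - logs[1:k_max + 1]

def monte_carlo(sampler, gamma=0.5, n=200, reps=1000, k_max=100, seed=0):
    """Mean and empirical MSE of the Hill estimator as functions of k."""
    rng = np.random.default_rng(seed)
    paths = np.array([hill_path(sampler(rng, n), k_max) for _ in range(reps)])
    return paths.mean(axis=0), ((paths - gamma) ** 2).mean(axis=0)

for name, sampler in (("Frechet", sample_frechet), ("Burr", sample_burr)):
    mean_k, mse_k = monte_carlo(sampler)
    print(name, "k=20: mean", round(mean_k[19], 3), "MSE", round(mse_k[19], 4))
```

Replacing `hill_path` by a dual ϕ-divergence estimate computed on the relative excesses for each k would give the corresponding curves for the DϕDE's.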

When the data are uncontaminated, although most robust estimators are known to be less efficient at the true model than maximum likelihood estimators [1], we notice that the γ estimates are fairly stable for intermediate values of k, which makes the choice of k less troublesome. Even with respect to mean squared error, the newly proposed estimators do not seem to lose much accuracy, although a close look at Figures 2 and 3 shows a slight tendency to overestimation. Overall, the newly proposed DϕDE's perform about as well as the Hill [1] estimator.

Figure 2: Mean γ estimates (left) and corresponding empirical mean squared errors (right) as a function of k for the Fréchet distribution.

Figure 3: Mean γ estimates (left) and corresponding empirical mean squared errors (right) as a function of k for the Burr distribution.

However, a slight contamination (5%) is sufficient to make the DϕDE associated with the $\chi^2$-divergence (β=2) more appealing in terms of MSE; see Figures 4, 5, 6, and 7. Furthermore, as the contamination increases, the $D_{\chi^2}$DE performs remarkably better.

Figure 4: Mean γ estimates (left) and corresponding empirical mean squared errors (right) as a function of k for the Fréchet distribution, with ϵ=0.05.

Figure 5: Mean γ estimates (left) and corresponding empirical mean squared errors (right) as a function of k for the Burr distribution, with ϵ=0.05.

Figure 6: Mean γ estimates (left) and corresponding empirical mean squared errors (right) as a function of k for the Fréchet distribution, with ϵ=0.2.

Figure 7: Mean γ estimates (left) and corresponding empirical mean squared errors (right) as a function of k for the Burr distribution, with ϵ=0.2.

Overall, the simulation results in this section provide supporting evidence for the adequacy of the DϕDE associated with the $\chi^2$-divergence for observations drawn from Fréchet and Burr distributions. Moreover, the sensitivity of this estimator to the choice of k is low.

Appendix

This section is devoted to the proof of our result.

Proof of Proposition <xref ref-type="statement" rid="prop3.1">3.1</xref>.

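For convenience, we recall the definition of the empirical measure $\mathbb{P}_n$ associated with the random variables $X_i$, $i=1,\dots,n$, which is given by
(A.1) $$\mathbb{P}_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{X_{n-i+1:n}}.$$
We define the estimator as the value $\hat\gamma_\phi(\tilde\gamma)$ which maximizes, independently of (k/n), the following expression:
(A.2) $$\int_{\mathbb{Q}_n(1-k/n)}^{\infty} h\left( \tilde\gamma, \gamma, \mathbb{Q}_n\left( 1-\frac{k}{n} \right), x \right) d\mathbb{P}_n(x),$$
or, equivalently, as the solution, in γ, of the following equation:
(A.3) $$\int_{\mathbb{Q}_n(1-k/n)}^{\infty} \frac{\partial}{\partial\gamma} h\left( \tilde\gamma, \gamma, \mathbb{Q}_n\left( 1-\frac{k}{n} \right), x \right) d\mathbb{P}_n(x) = 0.$$
In this view, the estimator may be written in the form of a functional $T_\alpha(\mathbb{P}_n)$, where $T_\alpha(\mathbb{P})$ is defined as the solution of
(A.4) $$\int_{\mathbb{P}^{-1}(1-\alpha)}^{\infty} \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}), \mathbb{P}^{-1}(1-\alpha), x)\, d\mathbb{P}(x) = 0.$$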
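We continue by rewriting (A.4) for the contaminated distribution appearing in the definition (3.1) of the influence function, that is,
(A.5) $$(1-\epsilon)\int_{\mathbb{P}_{\epsilon,x}^{-1}(1-\alpha)}^{\infty} \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}_{\epsilon,x}), \mathbb{P}_{\epsilon,x}^{-1}(1-\alpha), z)\, d\mathbb{P}(z) + \epsilon\, \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}_{\epsilon,x}), \mathbb{P}_{\epsilon,x}^{-1}(1-\alpha), x)\, \mathbf{1}\{x \ge \mathbb{P}_{\epsilon,x}^{-1}(1-\alpha)\} = 0.$$
Observe that
(A.6) $$\frac{\partial}{\partial\epsilon}\left[ \epsilon\, \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}_{\epsilon,x}), \mathbb{P}_{\epsilon,x}^{-1}(1-\alpha), x)\, \mathbf{1}\{x \ge \mathbb{P}_{\epsilon,x}^{-1}(1-\alpha)\} \right]\bigg|_{\epsilon=0} = \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}), \mathbb{P}^{-1}(1-\alpha), x)\, \mathbf{1}\{x \ge \mathbb{P}^{-1}(1-\alpha)\},$$
(A.7) $$\frac{\partial}{\partial\epsilon}\left[ (1-\epsilon)\int_{\mathbb{P}_{\epsilon,x}^{-1}(1-\alpha)}^{\infty} \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}_{\epsilon,x}), \mathbb{P}_{\epsilon,x}^{-1}(1-\alpha), z)\, d\mathbb{P}(z) \right]\bigg|_{\epsilon=0} = \frac{\partial}{\partial\epsilon}\left[ \int_{\mathbb{P}_{\epsilon,x}^{-1}(1-\alpha)}^{\infty} \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}_{\epsilon,x}), \mathbb{P}_{\epsilon,x}^{-1}(1-\alpha), z)\, d\mathbb{P}(z) \right]\bigg|_{\epsilon=0} - \int_{\mathbb{P}^{-1}(1-\alpha)}^{\infty} \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}), \mathbb{P}^{-1}(1-\alpha), z)\, d\mathbb{P}(z).$$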
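Keeping in mind the definition of $T_\alpha$ as in (A.4), the last term in (A.7) vanishes. We next evaluate the first term on the right side of (A.7). By Leibniz's integral rule, we readily infer that
(A.8) $$\frac{\partial}{\partial\epsilon}\left[ \int_{\mathbb{P}_{\epsilon,x}^{-1}(1-\alpha)}^{\infty} \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}_{\epsilon,x}), \mathbb{P}_{\epsilon,x}^{-1}(1-\alpha), z)\, d\mathbb{P}(z) \right]\bigg|_{\epsilon=0} = \int_{\mathbb{P}^{-1}(1-\alpha)}^{\infty} \frac{\partial}{\partial\epsilon}\left[ \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}_{\epsilon,x}), \mathbb{P}_{\epsilon,x}^{-1}(1-\alpha), z) \right]\bigg|_{\epsilon=0} d\mathbb{P}(z) - \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}), \mathbb{P}^{-1}(1-\alpha), \mathbb{P}^{-1}(1-\alpha)) \left( \mathbf{1}\{x \ge \mathbb{P}^{-1}(1-\alpha)\} - \alpha \right).$$
In a similar way, we can therefore write
(A.9) $$\int_{\mathbb{P}^{-1}(1-\alpha)}^{\infty} \frac{\partial}{\partial\epsilon}\left[ \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}_{\epsilon,x}), \mathbb{P}_{\epsilon,x}^{-1}(1-\alpha), z) \right]\bigg|_{\epsilon=0} d\mathbb{P}(z) = \left[ \int_{\mathbb{P}^{-1}(1-\alpha)}^{\infty} \frac{\partial^2}{\partial\gamma^2} h(\tilde\gamma, T_\alpha(\mathbb{P}), \mathbb{P}^{-1}(1-\alpha), z)\, d\mathbb{P}(z) \right] \frac{\partial}{\partial\epsilon} T_\alpha(\mathbb{P}_{\epsilon,x})\bigg|_{\epsilon=0} + \int_{\mathbb{P}^{-1}(1-\alpha)}^{\infty} \frac{\partial}{\partial t}\left[ \frac{\partial}{\partial\gamma} h(\tilde\gamma, T_\alpha(\mathbb{P}), t, z) \right]\bigg|_{t=\mathbb{P}^{-1}(1-\alpha)} d\mathbb{P}(z) \left( \frac{\mathbf{1}\{x \ge \mathbb{P}^{-1}(1-\alpha)\} - \alpha}{p(\mathbb{P}^{-1}(1-\alpha))} \right).$$
Differentiating (A.5) with respect to ϵ at ϵ=0 and combining (A.6)-(A.9) with the notation (3.3) yields $-S\,\frac{\partial}{\partial\epsilon} T_\alpha(\mathbb{P}_{\epsilon,x})\big|_{\epsilon=0} + \psi_\alpha(\tilde\gamma,\gamma,x) = 0$, which is (3.4). The proof of Proposition 3.1 is therefore completed.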

Acknowledgments

The authors would like to thank the four editors for their helpful comments on the paper.

References

[1] B. M. Hill, "A simple general approach to inference about the tail of a distribution."
[2] R. L. Smith, "Estimating tails of probability distributions."
[3] C. Alexander, M. Frenkel, U. Hommel, and M. Rudolf, "Assessment of operational risk capital."
[4] R. Dell'Aquila and P. Embrechts, "Extremes and robustness: a contradiction?"
[5] V. Brazauskas and R. Serfling, "Robust estimation of tail parameters for two-parameter Pareto and exponential models via generalized quantile statistics."
[6] V. Brazauskas and R. Serfling, "Robust and efficient estimation of the tail index of a single-parameter Pareto distribution."
[7] D. J. Dupuis and C. A. Field, "Robust estimation of extremes."
[8] L. Peng and A. H. Welsh, "Robust estimation of the generalized Pareto distribution."
[9] S. F. Juárez and W. R. Schucany, "Robust and efficient estimation for the generalized Pareto distribution."
[10] B. Vandewalle, J. Beirlant, A. Christmann, and M. Hubert, "A robust estimator for the tail index of Pareto-type distributions."
[11] M. Kim and S. Lee, "Estimation of a tail index based on minimum density power divergence."
[12] A. Basu, I. R. Harris, N. L. Hjort, and M. C. Jones, "Robust and efficient estimation by minimising a density power divergence."
[13] M. Hubert, G. Dierckx, and D. Vanpaemel, "Detecting influential data points for the Hill estimator in Pareto-type distributions," Computational Statistics & Data Analysis. In press. doi:10.1016/j.csda.2012.07.011
[14] N. Cressie and T. R. C. Read, "Multinomial goodness-of-fit tests."
[15] F. Liese and I. Vajda.
[16] A. Rényi, "On measures of entropy and information," in Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 547-561, University of California Press, Berkeley, Calif, USA, 1961.
[17] M. Broniatowski and A. Keziou, "Parametric estimation and tests through divergences and the duality technique."
[18] M. Broniatowski and A. Keziou, "Minimization of φ-divergences on sets of signed measures."
[19] A. Keziou, "Dual representation of φ-divergences and applications."
[20] F. Liese and I. Vajda, "On divergences and informations in statistics and information theory."
[21] A. Keziou and S. Leoni-Aubin, "On empirical likelihood for semiparametric two-sample density ratio models."
[22] S. Bouzebda and M. Cherfi, "General bootstrap for dual φ-divergence estimates."
[23] M. Cherfi, "Dual divergences estimation for censored survival data."
[24] S. Bouzebda and A. Keziou, "New estimates and tests of independence in semiparametric copula models."
[25] S. Bouzebda and A. Keziou, "A new test procedure of independence in copula models via χ2-divergence."
[26] M. Cherfi, "Dual ϕ-divergences estimation in normal models," Computation. In press. http://arxiv.org/abs/1108.2999
[27] D. M. Mason, "Laws of large numbers for sums of extreme values."
[28] P. Hall, "On some simple estimates of an exponent of regular variation."
[29] S. Csörgő and D. M. Mason, "Central limit theorems for sums of extreme values."
[30] E. Haeusler and J. L. Teugels, "On asymptotic normality of Hill's estimator for the exponent of regular variation."
[31] J. Beirlant and J. L. Teugels, "Asymptotics of Hill's estimator."
[32] S. Bouzebda, "Bootstrap de l'estimateur de Hill: théorèmes limites."
[33] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel.
[34] A. Toma and M. Broniatowski, "Dual divergence estimators and tests: robustness results."