General bootstrap for dual phi-divergence estimates

A general notion of bootstrapped $\phi$-divergence estimates constructed by exchangeably weighting the sample is introduced. Asymptotic properties of these generalized bootstrapped $\phi$-divergence estimates are obtained by means of empirical process theory and are applied to construct bootstrap confidence sets with asymptotically correct coverage probability. Some practical problems are discussed, including, in particular, the choice of the escort parameter, and several examples of divergences are investigated. Simulation results are provided to illustrate the finite sample performance of the proposed estimators.


Introduction
φ-divergence modeling has proved to be a flexible tool and has provided a powerful statistical modeling framework in a variety of applied and theoretical contexts; refer to [1-4] and the references therein. For good recent sources of references to the research literature in this area, along with statistical applications, consult [2, 5]. Unfortunately, in general, the limiting distribution of the estimators, or of their functionals, based on φ-divergences depends crucially on the unknown distribution, which is a serious problem in practice. To circumvent this matter, we propose, in this work, a general bootstrap of φ-divergence-based estimators and study some of its properties by means of sophisticated empirical process techniques. A major application of an estimator is the calculation of confidence intervals. By far the most favored confidence interval is the standard confidence interval based on a normal or a Student's t-distribution. Such standard intervals are useful tools, but they are based on an approximation that can be quite inaccurate in practice. Bootstrap procedures are an attractive alternative. One way to look at them is as procedures for handling data when one is not willing to make assumptions about the parameters of the populations from which one sampled. The most that one is willing to assume is that the data are a reasonable representation of the population from which they come. One then resamples from the data and draws inferences about the corresponding population and its parameters. The resulting confidence intervals have received the most theoretical study of any topic in bootstrap analysis.
Our main findings, which are analogous to those of Cheng and Huang [6], are summarized as follows. The φ-divergence estimator α̂_φ(θ) and the bootstrap φ-divergence estimator α̂*_φ(θ) are obtained by optimizing the objective function h(θ, α) based on the independent and identically distributed (i.i.d.) observations X_1, ..., X_n and the bootstrap sample X*_1, ..., X*_n, respectively,

$$\widehat{\alpha}_\phi(\theta) := \arg\sup_{\alpha\in\Theta}\frac{1}{n}\sum_{i=1}^{n}h(\theta,\alpha)(X_i), \qquad \widehat{\alpha}^{*}_\phi(\theta) := \arg\sup_{\alpha\in\Theta}\frac{1}{n}\sum_{i=1}^{n}h(\theta,\alpha)(X^{*}_i),$$

where X*_1, ..., X*_n are independent draws with replacement from the original sample. We mention that α̂*_φ(θ) can alternatively be expressed as

$$\widehat{\alpha}^{*}_\phi(\theta) = \arg\sup_{\alpha\in\Theta}\frac{1}{n}\sum_{i=1}^{n}W_{ni}\,h(\theta,\alpha)(X_i), \tag{1.3}$$

where the bootstrap weights are given by

(W_{n1}, ..., W_{nn}) ∼ Multinomial(n; n^{-1}, ..., n^{-1}).
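To make the weighted criterion in (1.3) concrete, here is a minimal Python sketch of the weighted bootstrap optimization. The quadratic objective standing in for h(θ, α) is our own simplification (it is the normal log-density up to constants), not the paper's dual criterion, and `weighted_criterion` and `argmax_alpha` are hypothetical helper names.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
n = 200
x = rng.normal(loc=1.0, scale=1.0, size=n)   # data from N(theta_0 = 1, 1)

# Stand-in objective: maximize (1/n) * sum_i W_ni * h(theta, alpha)(X_i).
# Here h is replaced by the normal log-density (an assumption of this sketch).
def weighted_criterion(alpha, weights):
    return np.mean(weights * (-0.5 * (x - alpha) ** 2))

def argmax_alpha(weights):
    res = minimize_scalar(lambda a: -weighted_criterion(a, weights),
                          bounds=(-5.0, 5.0), method="bounded")
    return res.x

# Original-sample estimate: all weights equal to one (this is P_n).
alpha_hat = argmax_alpha(np.ones(n))

# Efron bootstrap: W_n ~ Multinomial(n; 1/n, ..., 1/n), as in (1.3).
W = rng.multinomial(n, np.ones(n) / n)
alpha_star = argmax_alpha(W)
```

With all weights equal to one the program returns the plain estimate; redrawing `W` and re-optimizing yields the bootstrap distribution of the estimator.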
In this paper, we consider the more general exchangeable bootstrap weighting scheme that includes Efron's bootstrap [7, 8]. The general resampling scheme was first proposed in [9] and extensively studied by Bickel and Freedman [10], who suggested the name "weighted bootstrap"; for example, the Bayesian bootstrap, when (W_{n1}, ..., W_{nn}) = (D_{n1}, ..., D_{nn}) is equal in distribution to the vector of n spacings of n − 1 ordered uniform (0, 1) random variables, that is,

(D_{n1}, ..., D_{nn}) ∼ Dirichlet(n; 1, ..., 1).   (1.4)

The interested reader may refer to [11]. The case

(D_{n1}, ..., D_{nn}) ∼ Dirichlet(n; 4, ..., 4)   (1.5)

was considered in [12, Remark 2.3] and [13, Remark 5]. The Bickel and Freedman result concerning the empirical process has subsequently been generalized for empirical processes based on observations in R^d, d > 1, as well as in very general sample spaces and for various set- and function-indexed random objects (see, e.g., [14-18]). In this framework, [19] developed similar results for a variety of other statistical functions. This line of research was continued in the work of [20, 21]. There is a huge literature on the application of the bootstrap methodology to nonparametric kernel density and regression estimation, among other statistical procedures, and it is not the purpose of this paper to survey this extensive literature. This being said, it is worth mentioning that the bootstrap as per Efron's original formulation (see [7]) presents some drawbacks: some observations may be used more than once while others are not sampled at all. To overcome this difficulty, a more general formulation of the bootstrap has been devised: the weighted (or smooth) bootstrap, which has also been shown to be computationally more efficient in several applications. We may refer to [22-24].
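A minimal sketch of drawing the Bayesian bootstrap weights of (1.4) via uniform spacings; the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10

# n spacings of n - 1 ordered Uniform(0, 1) variables form a Dirichlet(1,...,1)
# vector; rescaling by n gives exchangeable weights that sum to n.
u = np.sort(rng.uniform(size=n - 1))
spacings = np.diff(np.concatenate(([0.0], u, [1.0])))
W = n * spacings
```

The same vector can also be drawn directly with `rng.dirichlet(np.ones(n))`; the spacings construction mirrors the definition in the text.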
Holmes and Reinert [25] provided new proofs for many known results about the convergence in law of the bootstrap distribution to the true distribution of smooth statistics, employing techniques based on Stein's method for empirical processes. Note that other variations of Efron's bootstrap are studied in [26] using the term "generalized bootstrap." The practical usefulness of the more general scheme is well documented in the literature. For a survey of further results on the weighted bootstrap, the reader is referred to [27]. The remainder of this paper is organized as follows. In the forthcoming section we recall the estimation procedure based on φ-divergences. The bootstrap of φ-divergence estimators is introduced in detail, and its asymptotic properties are given, in Section 3. In Section 4, we provide some examples explaining the computation of the φ-divergence estimators. In Section 5, we illustrate how to apply our results in the context of right censoring. Section 6 provides simulation results in order to illustrate the performance of the proposed estimators. To avoid interrupting the flow of the presentation, all mathematical developments are relegated to the appendix.

Dual Divergence-Based Estimates
The class of dual divergence estimators has recently been introduced by Keziou [28] and Broniatowski and Keziou [1]. Recall that the φ-divergence between a bounded signed measure Q and a probability measure P on D, when Q is absolutely continuous with respect to P, is defined by

$$D_\phi(Q, P) := \int_{D} \phi\!\left(\frac{dQ}{dP}(x)\right) dP(x),$$

where φ(·) is a convex function from (−∞, ∞) to [0, ∞] with φ(1) = 0. We will consider only φ-divergences for which the function φ(·) is strictly convex and whose domain, dom φ := {x ∈ R : φ(x) < ∞}, is an interval. The Kullback-Leibler (KL), modified Kullback-Leibler (KL_m), χ², modified χ² (χ²_m), and Hellinger divergences are examples of φ-divergences; they are obtained, respectively, for

φ(x) = x log x − x + 1, φ(x) = −log x + x − 1, φ(x) = (x − 1)²/2, φ(x) = (x − 1)²/(2x), and φ(x) = 2(√x − 1)².

Journal of Probability and Statistics
The squared Le Cam distance (sometimes called the Vincze-Le Cam distance) and the L¹-error are obtained, respectively, for

φ(x) = (x − 1)²/(2(x + 1)) and φ(x) = |x − 1|.

We extend the definition of these divergences to the whole space of all bounded signed measures via the extension of the corresponding φ(·) functions to the whole real line R as follows: when φ(·) is not well defined on R₋, or is well defined but not convex on R, we set φ(x) = +∞ for all x < 0. Notice that, for the χ²-divergence, the corresponding φ(·) function is defined and strictly convex on the whole of R. All the above examples are particular cases of the so-called "power divergences," introduced by Cressie and Read [29] (see also [4, Chapter 2]; Rényi's paper [30] is also to be mentioned here), which are defined through the class of convex real-valued functions: for γ in R \ {0, 1},

$$x \in \mathbb{R}_{+}^{*} \longmapsto \phi_\gamma(x) := \frac{x^{\gamma} - \gamma x + \gamma - 1}{\gamma(\gamma - 1)}.$$

For all γ ∈ R, we define φ_γ(0) := lim_{x↓0} φ_γ(x). So the KL-divergence is associated with φ_1, KL_m with φ_0, χ² with φ_2, χ²_m with φ_{−1}, and the Hellinger distance with φ_{1/2}. In the monograph [4], the reader may find detailed ingredients of the modeling theory as well as surveys of the commonly used divergences.
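As a quick sanity check of the power-divergence family, the following sketch implements the Cressie-Read generator φ_γ(x) = (x^γ − γx + γ − 1)/(γ(γ − 1)) together with its limiting cases γ = 0 (KL_m) and γ = 1 (KL); the function name is ours.

```python
import numpy as np

def phi_gamma(x, gamma):
    """Cressie-Read power-divergence generator phi_gamma, for x > 0."""
    x = np.asarray(x, dtype=float)
    if gamma == 0:                     # KL_m generator: -log x + x - 1
        return -np.log(x) + x - 1.0
    if gamma == 1:                     # KL generator: x log x - x + 1
        return x * np.log(x) - x + 1.0
    return (x ** gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))
```

For instance, φ_2 reduces to the χ² generator (x − 1)²/2 and φ_{1/2} to the Hellinger generator 2(√x − 1)², and every member vanishes at x = 1.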
Let {P_θ : θ ∈ Θ} be some identifiable parametric model, with Θ a compact subset of R^d. Consider the problem of estimating the unknown true value θ_0 of the parameter on the basis of an i.i.d. sample X_1, ..., X_n. We will assume that the observed data are from the probability space (X, A, P_{θ_0}). Let φ(·) be a strictly convex function of class C² such that

$$\int \left|\phi'\!\left(\frac{dP_\theta(x)}{dP_\alpha(x)}\right)\right| dP_\theta(x) < \infty, \quad \forall \alpha \in \Theta. \tag{2.5}$$
As mentioned in [1], if the function φ(·) satisfies the following condition: there exists 0 < δ < 1 such that, for all c in [1 − δ, 1 + δ], we can find numbers c₁, c₂, c₃ such that

φ(cx) ≤ c₁ φ(x) + c₂ |x| + c₃ for all real x,   (2.6)

then assumption (2.5) is satisfied whenever D_φ(θ, α) < ∞, where D_φ(θ, α) stands for the φ-divergence between P_θ and P_α; refer to [31, Lemma 3.2]. Also, the real convex functions φ_γ in (2.4), associated with the class of power divergences, all satisfy condition (2.5), including all standard divergences. Under assumption (2.5), using the Fenchel duality technique, the divergence D_φ(θ, θ_0) can be represented as resulting from an optimization procedure; this result was elegantly proved in [1, 3, 28]. Broniatowski and Keziou [31] called it the dual form of a divergence, due to its connection with convex analysis. According to [3], under the strict convexity and differentiability of the function φ(·), it holds that

$$\phi(t) \ge \phi(s) + \phi'(s)(t - s), \tag{2.7}$$

where the equality holds only for s = t. Let θ and θ_0 be fixed, put t = dP_θ(x)/dP_{θ_0}(x) and s = dP_θ(x)/dP_α(x) in (2.7), and then integrate with respect to P_{θ_0}, to obtain

$$D_\phi(\theta, \theta_0) = \sup_{\alpha\in\Theta} P_{\theta_0} h(\theta, \alpha), \tag{2.8}$$

with

$$h(\theta,\alpha): x \mapsto \int \phi'\!\left(\frac{dP_\theta}{dP_\alpha}\right) dP_\theta - \left[\frac{dP_\theta(x)}{dP_\alpha(x)}\,\phi'\!\left(\frac{dP_\theta(x)}{dP_\alpha(x)}\right) - \phi\!\left(\frac{dP_\theta(x)}{dP_\alpha(x)}\right)\right]. \tag{2.9}$$

Furthermore, the supremum in display (2.8) is unique and reached at α = θ_0, independently of the value of θ. Naturally, a class of estimators of θ_0, called "dual φ-divergence estimators" (DφDEs), is defined by

$$\widehat{\alpha}_\phi(\theta) := \arg\sup_{\alpha\in\Theta} P_n h(\theta, \alpha), \tag{2.10}$$

where h(θ, α) is the function defined in (2.9) and, for a measurable function f(·), P_n f := n^{-1} Σ_{i=1}^n f(X_i). The class of estimators α̂_φ(θ) satisfies

$$\frac{\partial}{\partial\alpha} P_n h(\theta, \widehat{\alpha}_\phi(\theta)) = 0. \tag{2.12}$$
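To illustrate the dual criterion numerically, the sketch below works out P_n h(θ, α) for the KL divergence on the normal location model N(·, 1). The closed form used in the code is our own derivation for this sketch (for φ(t) = t log t − t + 1 one gets ∫ φ'(dP_θ/dP_α) dP_θ = (θ − α)²/2 and t φ'(t) − φ(t) = t − 1); it is not quoted from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
theta0 = 0.0
x = rng.normal(theta0, 1.0, size=500)

# KL dual criterion on N(., 1) (our own closed form, see lead-in):
#   h(theta, alpha)(x) = (theta - alpha)**2 / 2 - (p_theta(x)/p_alpha(x) - 1).
def Pn_h(theta, alpha):
    ratio = np.exp((theta - alpha) * x - (theta ** 2 - alpha ** 2) / 2.0)
    return (theta - alpha) ** 2 / 2.0 - (ratio.mean() - 1.0)

def dphi_estimate(theta):
    res = minimize_scalar(lambda a: -Pn_h(theta, a), bounds=(-3.0, 3.0),
                          method="bounded")
    return res.x

alpha_hat = dphi_estimate(theta=0.5)   # escort value 0.5, estimates theta_0 = 0
```

The maximizer estimates θ_0 = 0 regardless of the escort value θ, in line with the uniqueness of the supremum in (2.8).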

Asymptotic Properties
In this section, we establish the consistency of bootstrapping under general conditions in the framework of dual divergence estimation. Define, for a measurable function f(·),

$$P^{*}_n f := \frac{1}{n}\sum_{i=1}^{n} W_{ni} f(X_i), \tag{3.1}$$

where the W_{ni}'s are the bootstrap weights defined on the probability space (W, Ω, P_W). In view of (2.10), the bootstrap estimator can be rewritten as

$$\widehat{\alpha}^{*}_\phi(\theta) := \arg\sup_{\alpha\in\Theta} P^{*}_n h(\theta, \alpha). \tag{3.2}$$

The definition of α̂*_φ(θ) in (3.2) implies that

$$\frac{\partial}{\partial\alpha} P^{*}_n h(\theta, \widehat{\alpha}^{*}_\phi(\theta)) = 0. \tag{3.3}$$

The bootstrap weights W_{ni} are assumed to belong to the class of exchangeable bootstrap weights introduced in [23]. In the sequel, the transpose of a vector x will be denoted by x^⊤. We will assume the following conditions.

W.1 The vector W_n = (W_{n1}, ..., W_{nn})^⊤ is exchangeable for all n = 1, 2, ...; that is, for any permutation π = (π_1, ..., π_n) of (1, ..., n), the joint distribution of π(W_n) = (W_{nπ_1}, ..., W_{nπ_n})^⊤ is the same as that of W_n.
W.2 W_{ni} ≥ 0 for all n, i, and Σ_{i=1}^n W_{ni} = n for all n.
In Efron's nonparametric bootstrap, the bootstrap sample is drawn from the nonparametric estimate of the true distribution, that is, the empirical distribution. Thus, it is easy to show that W_n ∼ Multinomial(n; n^{-1}, ..., n^{-1}) and that conditions (W.1)-(W.5) are satisfied. In general, conditions (W.3)-(W.5) are easily satisfied under some moment conditions on W_{ni}; see [23, Lemma 3.1]. In addition to Efron's nonparametric bootstrap, the sampling schemes that satisfy conditions (W.1)-(W.5) include the Bayesian bootstrap, the multiplier bootstrap, the double bootstrap, and the urn bootstrap. This list is sufficiently long to indicate that conditions (W.1)-(W.5) are not unduly restrictive. Notice that the value of c in (W.5) is independent of n and depends on the resampling method; for example, c = 1 for the nonparametric bootstrap and the Bayesian bootstrap, and c = √2 for the double bootstrap. A more precise discussion of this general formulation of the bootstrap can be found in [23, 34, 35]. There exist two sources of randomness for the bootstrapped quantity α̂*_φ(θ): the first comes from the observed data and the second is due to the resampling done by the bootstrap, that is, the random W_{ni}'s. Therefore, in order to rigorously state our main theoretical results for the general bootstrap of φ-divergence estimates, we need to specify the relevant probability spaces and define stochastic orders with respect to the relevant probability measures. Following [6, 36], we will view X_i as the ith coordinate projection from the canonical probability space (X^∞, A^∞, P^∞_{θ_0}) onto the ith copy of X.
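Conditions (W.1)-(W.2) are easy to check numerically for the schemes just listed; the sketch below draws multinomial, Dirichlet, and normalized exponential-multiplier weights (the normalization of the multiplier weights is our choice for this illustration) and verifies nonnegativity and the sum-to-n constraint.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 50

def multinomial_weights():
    # Efron's nonparametric bootstrap: Multinomial(n; 1/n, ..., 1/n).
    return rng.multinomial(n, np.ones(n) / n).astype(float)

def dirichlet_weights():
    # Bayesian bootstrap: n * Dirichlet(1, ..., 1).
    return n * rng.dirichlet(np.ones(n))

def multiplier_weights():
    # Multiplier bootstrap with exponential multipliers, normalized so the
    # weights sum to n (our normalization choice for this sketch).
    e = rng.exponential(size=n)
    return n * e / e.sum()

draws = {f.__name__: f() for f in
         (multinomial_weights, dirichlet_weights, multiplier_weights)}
```

Exchangeability (W.1) holds by construction in all three cases, since the components of each draw are permutation-symmetric.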
For the joint randomness involved, the product probability space is defined as

(X^∞, A^∞, P^∞_{θ_0}) × (W, Ω, P_W).

Throughout the paper, we assume that the bootstrap weights W_{ni} are independent of the data X_i. Given a real-valued function Δ_n defined on the above product probability space, for example, α̂*_φ(θ), we say that Δ_n is of order o^o_{P_W}(1) in P_{θ_0}-probability if, for any ε, η > 0, as n → ∞,

P_{θ_0}{ P^o_W( |Δ_n| > ε ) > η } → 0,

and that Δ_n is of order O^o_{P_W}(1) in P_{θ_0}-probability if, for any η > 0, there exists a 0 < M < ∞ such that, as n → ∞,

P_{θ_0}{ P^o_W( |Δ_n| ≥ M ) > η } → 0,

where the superscript "o" denotes the outer probability; see [34] for more details on outer probability measures. For more details on stochastic orders, the interested reader may refer to [6], in particular Lemma 3 of the cited reference.
To establish the consistency of α̂*_φ(θ), the following conditions are assumed in our analysis.

A.1 One has sup_{α ∈ Θ \ N(θ_0)} P_{θ_0} h(θ, α) < P_{θ_0} h(θ, θ_0), for any open set N(θ_0) ⊂ Θ containing θ_0.
A.2 One has sup_{α∈Θ} | P*_n h(θ, α) − P_{θ_0} h(θ, α) | → 0 in probability. The following theorem gives the consistency of the bootstrapped estimate α̂*_φ(θ).

Theorem 3.1. Assume that conditions (A.1) and (A.2) hold. Then α̂*_φ(θ) is a consistent estimate of θ_0; that is, α̂*_φ(θ) − θ_0 = o^o_{P_W}(1) in P_{θ_0}-probability.
The proof of Theorem 3.1 is postponed until the appendix. We need the following definitions; refer to [34, 37], among others. If F is a class of functions for which

‖P_n − P‖_F := sup_{f∈F} |P_n f − P f| → 0 almost surely,

then we say that F is a P-Glivenko-Cantelli class of functions. If F is a class of functions for which

G_n := √n (P_n − P) ⇝ G in ℓ^∞(F),

where G is a mean-zero P-Brownian bridge process with uniformly continuous sample paths with respect to the semimetric ρ_P(f, g), defined by

ρ_P²(f, g) := Var_P( f(X) − g(X) ),

then we say that F is a P-Donsker class of functions. Here

ℓ^∞(F) := { v : F → R : ‖v‖_F := sup_{f∈F} |v(f)| < ∞ },

and G is a P-Brownian bridge process on F if it is a mean-zero Gaussian process with covariance function

E( G(f) G(g) ) = P(fg) − P(f) P(g).

Remark 3.2. (i) Condition (A.1) is the "well-separated" condition; compactness of the parameter space Θ and continuity of the divergence imply that the optimum is well separated, provided the parametric model is identifiable; see [37, Theorem 5.7].
(ii) Condition (A.2) holds if the class {h(θ, α) : α ∈ Θ} is shown to be P-Glivenko-Cantelli, by applying [34, Lemma 3.6.16] and [6, Lemma A.1].
For any fixed δ_n > 0, define the classes of functions H_n and Ḣ_n as

H_n := { h(θ, α) : ‖α − θ_0‖ ≤ δ_n },  Ḣ_n := { ∂h(θ, α)/∂α : ‖α − θ_0‖ ≤ δ_n }.   (3.19)
We will say that a class of functions H ∈ M(P_{θ_0}) if H possesses enough measurability for randomization with i.i.d. multipliers to be possible, that is, if P_n can be randomized, in other words, if we can replace (δ_{X_i} − P_{θ_0}) by (W_{ni} − 1) δ_{X_i}. To state our result concerning asymptotic normality, we will assume the following additional conditions.

A.3 The matrices

S := − P_{θ_0} ( ∂²h(θ, θ_0)/∂α² ) and V := P_{θ_0} [ (∂h(θ, θ_0)/∂α) (∂h(θ, θ_0)/∂α)^⊤ ]

are nonsingular.
A.4 The class H_n ∈ M(P_{θ_0}) ∩ L₂(P_{θ_0}) and is P-Donsker.

A.5 The class Ḣ_n ∈ M(P_{θ_0}) ∩ L₂(P_{θ_0}) and is P-Donsker.

Conditions (A.4) and (A.5) ensure that the "sizes" of the function classes H_n and Ḣ_n are reasonable, so that the bootstrapped empirical processes indexed, respectively, by H_n and Ḣ_n have a limiting process conditional on the original observations; we refer, for instance, to [23, Theorem 2.2]. The main result to be proved here may now be stated precisely as follows.

Theorem 3.3. Assume that α̂_φ(θ) and α̂*_φ(θ) fulfil (2.12) and (3.3), respectively. In addition, suppose that

α̂_φ(θ) − θ_0 = o_{P_{θ_0}}(1) and α̂*_φ(θ) − θ_0 = o^o_{P_W}(1) in P_{θ_0}-probability.

Assume that conditions (A.3)-(A.5) and (W.1)-(W.5) hold. Then one has
where "≤" is taken componentwise and c is given in (W.5), whose value depends on the sampling scheme used, and where S and V are given in condition (A.3). Thus, one has

$$\sup_{x\in\mathbb{R}^d}\left| P_W\!\left(\sqrt{n}\,(\widehat{\alpha}^{*}_\phi(\theta) - \widehat{\alpha}_\phi(\theta)) \le x\right) - P\!\left(N(0, c^2 S^{-1} V S^{-1}) \le x\right)\right| \to 0 \tag{3.27}$$

in P_{θ_0}-probability. The proof of Theorem 3.3 is captured in the forthcoming appendix. Following [6], we will illustrate how to apply our results to construct confidence sets. A lower ε-th quantile of the bootstrap distribution is defined to be any q*_{εn} ∈ R^d fulfilling

q*_{εn} := inf{ x : P_W( √n (α̂*_φ(θ) − α̂_φ(θ)) ≤ x ) ≥ ε },

where x is an infimum over the given set only if there does not exist an x₁ < x in R^d such that P_W( √n (α̂*_φ(θ) − α̂_φ(θ)) ≤ x₁ ) ≥ ε. Keeping in mind the assumed regularity conditions on the criterion function, that is, h(θ, α) in the present framework, we can, without loss of generality, suppose that

P_W( √n (α̂*_φ(θ) − α̂_φ(θ)) ≤ q*_{εn} ) = ε.   (3.30)
Making use of the distribution consistency result given in (3.27), we can approximate the ε-th quantile of the distribution of √n (α̂_φ(θ) − θ_0) by q*_{εn}; the percentile-type bootstrap confidence set is then defined accordingly. In a similar manner, the ε-th quantile of √n (α̂_φ(θ) − θ_0) can be approximated by q̃*_{εn}, where q̃*_{εn} is the ε-th quantile of the hybrid quantity √n (α̂*_φ(θ) − α̂_φ(θ)).
Thus, the hybrid-type bootstrap confidence set is defined analogously. Note that q*_{εn} and q̃*_{εn} are not unique, owing to the fact that θ is a vector. According to the quantile convergence theorem, that is, [37, Lemma 21.1], the bootstrap quantiles converge almost surely. When applying the quantile convergence theorem, we use the almost sure representation, that is, [37, Theorem 2.19], and argue along subsequences. An application of Slutsky's theorem then completes the argument, and we further have the asymptotic coverage statements below.
The above arguments prove the consistency of the hybrid-type bootstrap confidence set, that is, 3.42 , and can also be applied to the percentile-type bootstrap confidence set, that is, 3.41 .
For an in-depth study and more rigorous proof, we may refer to 37, Lemma 23.3 . The above discussion may be summarized as follows.

Corollary 3.5. Under the conditions of Theorem 3.3, one has, as n → ∞, asymptotically correct coverage of the confidence sets (3.41) and (3.42). It is well known that the above bootstrap confidence sets can be obtained easily through routine bootstrap sampling.
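In practice, the two confidence sets are computed from bootstrap replicates. The sketch below builds percentile-type and hybrid-type intervals for a scalar parameter; the weighted-mean estimator is a stand-in for the DφDE (an assumption of this sketch), and the helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(3)
n, B = 200, 1000
x = rng.normal(loc=1.0, size=n)

# Stand-in estimator: a weighted mean replaces the DphiDE for this sketch.
def estimate(weights):
    return np.average(x, weights=weights)

alpha_hat = estimate(np.ones(n))

# Bayesian-bootstrap replicates (Dirichlet(1, ..., 1) weights).
reps = np.array([estimate(rng.dirichlet(np.ones(n))) for _ in range(B)])

level = 0.95
lo_q, hi_q = np.quantile(reps - alpha_hat, [(1 - level) / 2, (1 + level) / 2])

# Hybrid-type interval: invert the bootstrap law of (alpha* - alpha_hat).
hybrid_ci = (alpha_hat - hi_q, alpha_hat - lo_q)
# Percentile-type interval: quantiles of the bootstrap replicates themselves.
percentile_ci = tuple(np.quantile(reps, [(1 - level) / 2, (1 + level) / 2]))
```

The two intervals have the same width and differ only in how the bootstrap quantiles are recentered around the point estimate.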
Remark 3.6. Notice that the choice of the weights depends on the problem at hand: accuracy of the estimation of the entire distribution of the statistic, accuracy of a confidence interval, accuracy in a large-deviation sense, or accuracy for a finite sample size; we may refer to [38] and the references therein for more details.

On the Choice of the Escort Parameter
The choice of the escort parameter θ = θ_0 yields an estimator with the same limit properties as the MLE. The DφDE α̂_φ(θ_0), in this case, has a variance which indeed coincides with that of the MLE; see, for instance, [28, Theorem 2.2 (1)(b)]. This result is of some relevance, since it leaves the choice of the divergence open while keeping good asymptotic properties. For data generated from the distribution N(0, 1), Figure 1 shows that the global maximum of the empirical criterion P_n h(θ̂_n, α) is zero, independently of the value of the escort parameter θ̂_n (the sample mean X̄_n = n^{-1} Σ_{i=1}^n X_i in Figure 1(a), and the median in Figure 1(b)), for all the considered divergences. This is in agreement with the result of [39, Theorem 6], where it is shown that all differentiable divergences produce the same estimator of the parameter on any regular exponential family, in particular the normal models, namely the MLE, provided that condition (2.6) holds and D_φ(θ, α) < ∞.
Unlike the case of data without contamination, the choice of the escort parameter is crucial to the estimation method in the presence of outliers. We plot in Figure 2 the empirical criterion P_n h(θ̂_n, α), where the data are generated from a contaminated distribution with contamination proportion ε = 0.1, θ_0 = 0, and δ_x standing for the Dirac measure at x. Under contamination, when we take the empirical "mean," θ̂_n = X̄_n, as the value of the escort parameter θ, Figure 2(a) shows how the global maximum of the empirical criterion P_n h(θ̂_n, α) shifts from zero to the contamination point. In Figure 2(b), the choice of the "median" as the escort parameter value leads to the position of the global maximum remaining close to α = 0 for the Hellinger (γ = 0.5), χ² (γ = 2), and KL (γ = 1) divergences, while the criterion associated with the KL_m-divergence (γ = 0, whose maximizer is the MLE) is still affected by the presence of outliers.
In practice, the consequence is that if the data are subject to contamination the escort parameter should be chosen as a robust estimator of θ 0 , say θ n . For more details about the performances of dual φ-divergence estimators for normal density models, we refer to 40 .
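The effect described above can be reproduced in a few lines. The sketch below contaminates a N(0, 1) sample and compares the mean and the median as escort values; the contamination point 10 and the closed-form KL dual criterion for the N(·, 1) model are our own choices for illustration, not values taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
n, eps = 500, 0.1
clean = rng.normal(0.0, 1.0, size=n)
outlier_mask = rng.uniform(size=n) < eps
x = np.where(outlier_mask, 10.0, clean)   # 10.0 is a hypothetical outlier value

# KL dual criterion h on the N(., 1) model (our own closed-form derivation):
#   h(theta, alpha)(x) = (theta - alpha)**2 / 2 - (p_theta(x)/p_alpha(x) - 1).
def Pn_h(theta, alpha):
    ratio = np.exp((theta - alpha) * x - (theta ** 2 - alpha ** 2) / 2.0)
    return (theta - alpha) ** 2 / 2.0 - (ratio.mean() - 1.0)

def dphi_estimate(theta):
    return minimize_scalar(lambda a: -Pn_h(theta, a),
                           bounds=(-5.0, 15.0), method="bounded").x

est_mean_escort = dphi_estimate(np.mean(x))      # escort = contaminated mean
est_median_escort = dphi_estimate(np.median(x))  # escort = robust median
```

The qualitative comparison mirrors Figure 2: the robust escort keeps the criterion's maximizer near the true value, while the contaminated mean pulls it away; the exact numbers depend on the simulated sample.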

Examples
Keeping in mind definitions (2.8) and (2.9), in what follows, for easy reference and completeness, we give some usual examples of divergences, discussed in [41, 42], together with the associated estimates; we may refer also to [43] for more examples and details.
(i) Our first example is the Kullback-Leibler divergence:

$$D_{KL}(\theta, \theta_0) = \int \log\!\left(\frac{dP_\theta}{dP_{\theta_0}}\right) dP_\theta. \tag{4.1}$$

The estimate of D_{KL}(θ, θ_0) is given by

$$\sup_{\alpha\in\Theta} P_n h(\theta, \alpha), \quad\text{with}\quad h(\theta,\alpha): x \mapsto \int \log\!\left(\frac{dP_\theta}{dP_\alpha}\right) dP_\theta - \frac{dP_\theta(x)}{dP_\alpha(x)} + 1,$$

and the estimate of the parameter θ_0, with escort parameter θ, is defined as follows:

$$\widehat{\alpha}_{KL}(\theta) := \arg\sup_{\alpha\in\Theta} P_n h(\theta, \alpha).$$

[Figure 1: Criterion for the normal location model.]

(ii) The second one is the χ²-divergence:

$$D_{\chi^2}(\theta, \theta_0) = \frac{1}{2}\int \left(\frac{dP_\theta}{dP_{\theta_0}} - 1\right)^{2} dP_{\theta_0}. \tag{4.4}$$

The estimate of D_{χ²}(θ, θ_0) is given by

$$\sup_{\alpha\in\Theta} P_n h(\theta, \alpha), \quad\text{with}\quad h(\theta,\alpha): x \mapsto \int \left(\frac{dP_\theta}{dP_\alpha} - 1\right) dP_\theta - \frac{1}{2}\left[\left(\frac{dP_\theta(x)}{dP_\alpha(x)}\right)^{2} - 1\right],$$

and the estimate of the parameter θ_0, with escort parameter θ, is defined by

$$\widehat{\alpha}_{\chi^2}(\theta) := \arg\sup_{\alpha\in\Theta} P_n h(\theta, \alpha).$$

(iii) Another example is the Hellinger divergence:

$$D_{H}(\theta, \theta_0) = 2\int \left(\sqrt{\frac{dP_\theta}{dP_{\theta_0}}} - 1\right)^{2} dP_{\theta_0}. \tag{4.7}$$

The estimate of D_H(θ, θ_0) is given by

$$\sup_{\alpha\in\Theta} P_n h(\theta, \alpha), \quad\text{with}\quad h(\theta,\alpha): x \mapsto 2\int \left(1 - \sqrt{\frac{dP_\alpha}{dP_\theta}}\right) dP_\theta - 2\left[\sqrt{\frac{dP_\theta(x)}{dP_\alpha(x)}} - 1\right],$$

and the estimate of the parameter θ_0, with escort parameter θ, is defined by

$$\widehat{\alpha}_{H}(\theta) := \arg\sup_{\alpha\in\Theta} P_n h(\theta, \alpha).$$

(iv) All the above examples are particular cases of the so-called "power divergences," which are defined through the class of convex real-valued functions: for γ in R \ {0, 1},

$$x \in \mathbb{R}_{+}^{*} \longmapsto \phi_\gamma(x) := \frac{x^{\gamma} - \gamma x + \gamma - 1}{\gamma(\gamma - 1)}. \tag{4.10}$$

The estimate of D_γ(θ, θ_0) is given by

$$\sup_{\alpha\in\Theta} P_n h(\theta, \alpha), \quad\text{with } h(\theta,\alpha) \text{ as in (2.9) associated with } \phi_\gamma, \tag{4.11}$$

and the parameter estimate is defined by

$$\widehat{\alpha}_{\gamma}(\theta) := \arg\sup_{\alpha\in\Theta} P_n h(\theta, \alpha). \tag{4.12}$$
Remark 4.1. The computation of the estimate α̂_φ(θ) requires the calculation of the integral in formula (2.9). This integral can be explicitly calculated for the most standard parametric models. Below, we give closed-form expressions for the Normal, log-Normal, Exponential, Gamma, Weibull, and Pareto density models. Hence, the computation of α̂_φ(θ) can be performed by any standard nonlinear optimization code. Unfortunately, an explicit formula for α̂_φ(θ) generally cannot be derived, which is also the case for the ML method. In practical problems, to obtain the estimate α̂_φ(θ), one can use the Newton-Raphson algorithm, taking the escort parameter θ as initial point. This algorithm is a powerful technique for solving equations numerically and performs well here, since the objective functions α ∈ Θ ↦ P_{θ_0} h(θ, α) are concave and the estimated parameter is unique for the functions α ∈ Θ ↦ P_n h(θ, α); for instance, refer to [1, Remark 3.5].

Example of Normal Density
Consider the case of power divergences and the Normal model

{ N(θ, σ²) : (θ, σ²) ∈ Θ = R × R₊* }.

Simple calculus gives, for γ in R \ {0, 1},

$$\int \left(\frac{dP_\theta}{dP_\alpha}\right)^{\gamma} dP_\alpha(x) = \exp\!\left(\frac{\gamma(\gamma - 1)(\theta - \alpha)^{2}}{2\sigma^{2}}\right). \tag{4.15}$$

This yields the expression of h(θ, α), and hence of the estimate, by substitution into (2.9).

Example of Log-Normal Density
Consider the case of power divergences and the log-Normal model

p_θ(x) = (1/(x σ √(2π))) exp( −(log x − θ)²/(2σ²) ), x > 0, (θ, σ²) ∈ R × R₊*.

Simple calculus gives, for γ in R \ {0, 1},

$$\int \left(\frac{dP_\theta}{dP_\alpha}\right)^{\gamma} dP_\alpha(x) = \exp\!\left(\frac{\gamma(\gamma - 1)(\theta - \alpha)^{2}}{2\sigma^{2}}\right). \tag{4.21}$$

This yields the expression of h(θ, α), and hence of the estimate, by substitution into (2.9).

Example of Exponential Density
Consider the case of power divergences and the Exponential model

p_θ(x) = θ e^{−θx}, x ≥ 0, θ ∈ R₊*.

We have, for γ in R \ {0, 1},

$$\int \left(\frac{dP_\theta}{dP_\alpha}\right)^{\gamma} dP_\alpha(x) = \frac{\theta^{\gamma}\,\alpha^{1-\gamma}}{\gamma\theta + (1-\gamma)\alpha}, \qquad \gamma\theta + (1-\gamma)\alpha > 0. \tag{4.24}$$
Then, using this last equality, one finds the dual criterion h(θ, α), and hence the estimate, by substitution into (2.9).
In a more general case, we may consider the Gamma density combined with the power divergence, where Γ(·) denotes the Gamma function. Simple calculus again gives a closed-form expression of the integral in (2.9) for γ in R \ {0, 1}.

Example of Weibull Density
Consider the case of power divergences and the Weibull density model, under the assumption that k ∈ R₊* is known and θ is the parameter of interest to be estimated; recall that the density is

p_θ(x) = (k/θ) (x/θ)^{k−1} exp( −(x/θ)^k ), x ≥ 0.

Routine algebra gives a closed-form expression of the integral in (2.9) for γ in R \ {0, 1}.


Example of the Pareto Density
Consider the case of power divergences and the Pareto density

p_θ(x) = θ x^{−θ−1}, x ≥ 1, θ ∈ R₊*.

Simple calculus gives, for γ in R \ {0, 1},

$$\int \left(\frac{dP_\theta}{dP_\alpha}\right)^{\gamma} dP_\alpha(x) = \frac{\theta^{\gamma}\,\alpha^{1-\gamma}}{\gamma\theta + (1-\gamma)\alpha}, \qquad \gamma\theta + (1-\gamma)\alpha > 0. \tag{4.34}$$
As before, using this last equality, one finds the corresponding estimates; the case γ = 0 is handled similarly by a limiting argument.

Remark 4.2. The choice of the divergence, that is, of the statistical criterion, depends crucially on the problem at hand. For example, the χ²-divergence is more appropriate than the other divergences in nonstandard problems (e.g., boundary estimation problems). The idea is to include the parameter domain Θ in an enlarged space, say Θ_e, in order to render the boundary value an interior point of the new parameter space Θ_e. Indeed, the Kullback-Leibler, modified Kullback-Leibler, modified χ², and Hellinger divergences are infinite when dQ/dP takes negative values on a nonnegligible (with respect to P) subset of the support of P, since the corresponding φ(·) is infinite on (−∞, 0), when θ belongs to Θ_e \ Θ. This problem does not arise for the χ²-divergence; in fact, the corresponding φ(·) is finite on the whole of R; for more details refer to [41, 42, 44], and consult also [1, 45] for related matters. It is well known that, when the underlying model is misspecified or when the data are contaminated, the maximum likelihood and other classical parametric methods may be severely affected and lead to very poor results. Therefore, robust methods, which automatically circumvent contamination effects and model misspecification, can be used to provide a compromise between efficient classical parametric methods and the semiparametric approach, provided they are reasonably efficient at the model; this problem has been investigated in [46, 47]. In [41, 42], simulation results show that the choice of the χ²-divergence has good properties in terms of efficiency-robustness. We mention that some progress has been made on automatic data-based selection of the tuning parameter α > 0 appearing in formula (1) of [47]; the interested reader is referred to [48, 49]. It is mentioned in [50], where semiparametric minimum distance estimators are considered, that the MLE or inversion-type estimators involve solving a nonlinear equation which depends on some initial value.
A second difficulty is that the objective function is not convex in θ in general, which gives rise to multiple roots. Thus, in general, "good" consistent initial estimates are necessary, and the DφDE should serve that purpose.

Random Right Censoring
Let T_1, ..., T_n be i.i.d. survival times with continuous survival function 1 − F_{θ_0}(·) = 1 − P_{θ_0}(T ≤ ·), and let C_1, ..., C_n be independent censoring times with d.f. G(·). In the censoring setup, we observe only the pairs Y_i = min(T_i, C_i) and δ_i = 1{T_i ≤ C_i}, where 1{·} is the indicator function of the event {·}, which indicates whether an observation has been censored or not. Let (Y_1, δ_1), ..., (Y_n, δ_n) denote the observed data points, and let

t_1 < t_2 < ··· < t_k   (5.1)

be the k distinct death times. Now define the death set and the risk set as follows: for j = 1, ..., k,

D_j := { i : y_i = t_j, δ_i = 1 },  R_j := { i : y_i ≥ t_j }.   (5.2)

The Kaplan-Meier estimator of the survival function may then be written as

$$1 - \widehat{F}_n(t) = \prod_{j : t_j \le t}\left(1 - \frac{|D_j|}{|R_j|}\right). \tag{5.3}$$
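The death sets, risk sets, and the resulting product-limit estimate are straightforward to compute; the sketch below is a minimal implementation (the function name is ours).

```python
import numpy as np

def kaplan_meier(y, delta):
    """Kaplan-Meier survival estimate via death sets D_j and risk sets R_j."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=int)
    death_times = np.unique(y[delta == 1])          # t_1 < ... < t_k
    surv = 1.0
    out = []
    for t in death_times:
        d = np.sum((y == t) & (delta == 1))         # |D_j|
        r = np.sum(y >= t)                          # |R_j|
        surv *= 1.0 - d / r
        out.append((t, surv))
    return out

# Without censoring (delta all 1) the estimate reduces to the empirical
# survival function, as this small example shows.
est = kaplan_meier([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1])
```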
One may define a general exchangeable weighted bootstrap scheme for the Kaplan-Meier estimator, 1 − F*_n(t), and related functionals as follows (cf. [38, page 1598]).
In a similar way, we define a more appropriate representation, which will be used in the sequel, as follows:

5.9
Here, Y_{1:n} ≤ ··· ≤ Y_{n:n} are the ordered Y-values and δ_{i:n} denotes the concomitant associated with Y_{i:n}. Hence, we may write

$$P^{*}_n := \sum_{j=1}^{n} \pi_{jn}\, \delta_{Y_{j:n}}. \tag{5.10}$$
For the right censoring situation, the bootstrap DφDEs are defined by replacing P_n in (2.10) by P*_n, that is,

$$\widehat{\alpha}^{*}_\phi(\theta) := \arg\sup_{\alpha\in\Theta} P^{*}_n h(\theta, \alpha). \tag{5.11}$$
The corresponding estimating equation for the unknown parameter is then given by

$$\frac{\partial}{\partial\alpha} P^{*}_n h(\theta, \alpha) = 0.$$

Formula (5.11) defines a family of M-estimators for censored data. In the case of the power divergences family (2.4), the criterion follows from (4.11) with, for 1 ≤ j ≤ n,

$$\omega_{jn} = \frac{\delta_{j:n}}{n - j + 1} \prod_{i=1}^{j-1}\left(\frac{n - i}{n - i + 1}\right)^{\delta_{i:n}}. \tag{5.16}$$
Consider the lifetime distribution to be the one-parameter exponential exp(θ), with density θ e^{−θx}, x ≥ 0. Following [52], the Kaplan-Meier integral ∫ h(θ, α) dP̂_n may be written as a weighted sum over the ordered observations with the weights ω_{jn}.

For γ = 0, observe that this divergence leads to the AMLE, independently of the value of θ.
For more details about dual φ-divergence estimators under right censoring, we refer to [54]; we leave this study open for future research. We mention that the bootstrapped estimators, in this framework, are obtained by replacing the weights ω_{jn} by π_{jn} in the preceding formulas.

Simulations
In this section, a series of experiments was conducted in order to examine the performance of the proposed random weighted bootstrap procedure for the DφDEs defined in (3.2). We provide numerical illustrations regarding the mean squared error (MSE) and the coverage probabilities. The computing program codes were implemented in R.
The values of γ are chosen to be −1, 0, 0.5, 1, 2, which correspond, as indicated above, to the well-known standard divergences: the χ²_m-divergence, KL_m, the Hellinger distance, KL, and the χ²-divergence, respectively. The sample sizes considered in our simulations are 25, 50, 75, 100, 150, and 200, and the estimates, the DφDEs α̂_φ(θ), are obtained from 500 independent runs. The value of the escort parameter θ is taken to be the MLE, which, under the model, is a consistent estimate of θ_0, and the limit distribution of the DφDE α̂_φ(θ_0) in this case has a variance which indeed coincides with that of the MLE; for more details on this subject, we refer to [28, Theorem 2.2 (1)(b)], as mentioned in Section 3.1. The bootstrap weights are chosen to be

(W_{n1}, ..., W_{nn}) ∼ Dirichlet(n; 1, ..., 1).   (6.1)

In Figure 3, we plot the densities of the different estimates; it shows that the proposed estimators perform reasonably well. Tables 1 and 2 report the MSE and the empirical coverage probabilities of the estimators. Notice that, for the empirical coverage probabilities, as in any other inferential context, the greater the sample size, the better. From the results reported in these tables, we find that, for large values of the sample size n, the empirical coverage probabilities are all close to the nominal level. One can see that the DφDE with γ = 2 has the best empirical coverage probability, which is near the assigned nominal level.
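For readers who prefer a runnable outline, the Monte Carlo loop has the following shape in Python (the paper's code is in R). The weighted-mean estimator is a stand-in for the DφDE, and the run counts are reduced for speed, so this is a sketch of the experiment's structure, not a reproduction of the reported tables.

```python
import numpy as np

rng = np.random.default_rng(11)

def simulate_mse_coverage(n, runs=100, level=0.95, B=200, theta0=1.0):
    """Skeleton of the Monte Carlo experiment: MSE and bootstrap-CI coverage
    for a location estimate (weighted mean as a stand-in for the DphiDE)."""
    errs, covered = [], 0
    for _ in range(runs):
        x = rng.normal(theta0, 1.0, size=n)
        est = x.mean()
        errs.append((est - theta0) ** 2)
        # Bayesian-bootstrap replicates with Dirichlet(1, ..., 1) weights (6.1).
        reps = np.array([np.average(x, weights=rng.dirichlet(np.ones(n)))
                         for _ in range(B)])
        lo, hi = np.quantile(reps, [(1 - level) / 2, (1 + level) / 2])
        covered += (lo <= theta0 <= hi)
    return np.mean(errs), covered / runs

mse, coverage = simulate_mse_coverage(n=50)
```

Increasing `runs`, `B`, and the grid of sample sizes recovers the layout of the tables described above.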

Right Censoring Case
This subsection presents some simulations for the right censoring case discussed in Section 5. A sample is generated from exp(1) and an exponential censoring scheme is used; the censoring distribution is taken to be exp(1/9), so that the proportion of censoring is 10%. To study the robustness properties of our estimators, 20% of the observations are contaminated by exp(5). The DφDEs α̂_φ(θ) are calculated for samples of sizes 25, 50, 100, and 150, and the whole procedure is repeated 500 times. We can see from Table 9 that the DφDEs perform well under the model in terms of MSE and are an attractive alternative to the AMLE. Table 10 shows the variation in coverage of nominal 95% asymptotic confidence intervals according to the sample size. Clearly, there is under-coverage: the DφDEs have poor coverage probabilities due to the censoring effect. However, for small and moderate sample sizes, the DφDEs associated with γ = 2 outperform the AMLE.
Under contamination, the performance of our estimators decreases considerably. Such findings are evidence of the need for more adequate procedures for right-censored data (Tables 11 and 12).
Remark 6.1. In order to extract methodological recommendations for the use of an appropriate divergence, it would be interesting to conduct extensive Monte Carlo experiments for several divergences, or to investigate theoretically the problem of the choice of the divergence leading to an estimate that is "optimal," in some sense, in terms of efficiency and robustness; this would go well beyond the scope of the present paper. Another challenging task is how to choose the bootstrap weights for a given divergence in order to obtain, for example, an efficient estimator.

Appendix

Consequently, we have the inequality (A.3). According to [23, Theorem 2.2], under condition (A.4) we have G_1 = O^o_{P_W}(1) in P_{θ_0}-probability. In view of the CLT, we have G_2 = O_{P_{θ_0}}(1). By applying a Taylor series expansion, with ᾱ between α̂*_φ(θ) and θ_0, condition (A.5) and [23, Theorem 2.2] give that the right term in (A.4) is of order O^o_{P_W}(‖α̂*_φ(θ) − θ_0‖) in P_{θ_0}-probability. By the fact that α̂*_φ(θ) is assumed to be consistent, we have G_3 = o^o_{P_W}(1) in P_{θ_0}-probability. An analogous argument yields that G_4 is of order O_{P_{θ_0}}(‖α̂*_φ(θ) − θ_0‖); by the consistency of α̂*_φ(θ), we have G_4 = o^o_{P_W}(1) in P_{θ_0}-probability. Finally, G_5 = 0 by (3.3). In summary, (A.3) can be rewritten as (A.6) in P_{θ_0}-probability. On the other hand, by a Taylor series expansion, we can write (A.7). It is straightforward to combine (A.7) with (A.6) to infer (A.14). By combining (3.25) and (A.14), we readily obtain the desired conclusion (3.27).