Journal of Probability and Statistics, Hindawi Publishing Corporation, 2014. Article ID 913621, doi:10.1155/2014/913621

Research Article

An Analysis of a Heuristic Procedure to Evaluate Tail (in)dependence

Marta Ferreira (1) and Sérgio Silva (2)

(1) Department of Mathematics and Applications, Center of Mathematics, Minho University, Campus de Gualtar, 4710-057 Braga, Portugal
(2) Department of Mathematics and Applications, Minho University, Campus de Gualtar, 4710-057 Braga, Portugal

Received 29 January 2014; Accepted 2 July 2014; Published 21 July 2014

Academic Editor: Ricardas Zitikis

Copyright © 2014 Marta Ferreira and Sérgio Silva. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Measuring tail dependence is an important issue in many applied sciences in order to quantify the risk of simultaneous extreme events. A usual measure is the tail dependence coefficient. The characteristics of extreme events behave quite differently depending on whether we are in the class of asymptotic dependence or in the class of asymptotic independence. The literature has emphasized the asymptotically dependent class, but wrongly inferring tail dependence results in the overestimation of extreme value dependence and consequently of the risk. In this paper we analyze this issue through simulation, based on a heuristic procedure.

1. Introduction

The degree of association between concurrent rainfall extremes at different locations may lead to a better understanding of extreme rainfall events, a very important matter due to their severe impacts on the economy and the environment. Globalization and an absence of market regulation have increased the dependence among financial asset returns and thus the risk of simultaneous crashes. Pearson's correlation is not an appropriate measure of dependence whenever extreme realizations are important: it gives the same weight to extreme values as to all other observations, and the dependence characteristics of extreme realizations may differ from those of the rest of the sample. For more details see, for example, Embrechts et al. [1]. The most used measure of tail dependence is the so-called tail dependence coefficient (TDC), a concept introduced by Sibuya [2], which is defined as follows:
(1) $\lambda=\lim_{t\to 0}P(F_2(X_2)>1-t\mid F_1(X_1)>1-t)$,
where F1 and F2 are the distribution functions (d.f.'s) of the random variables (r.v.'s) X1 and X2, respectively, which are assumed continuous. Observe that the TDC can also be formulated through the copula function introduced by Sklar [3]. A copula function C is a d.f. whose marginals are standard uniform; that is, if C is the copula function of (X1,X2), having joint d.f. F, then
(2) $F(x_1,x_2)=P(F_1(X_1)\le F_1(x_1),\,F_2(X_2)\le F_2(x_2))=C(F_1(x_1),F_2(x_2))$,
and thus
(3) $\lambda=2-\lim_{t\to 0}\frac{1-C(1-t,1-t)}{t}$.
If 0 < λ ≤ 1, the r.v.'s X1 and X2 are said to be tail dependent, with the degree of dependence measured through λ (λ = 1 means total dependence in the tail). The case λ = 0 corresponds to asymptotic independence in the tail. However, as noticed in Ledford and Tawn [4, 5], a residual tail dependence may occur, captured through the convergence rate of P(F1(X1) > 1-t, F2(X2) > 1-t) towards zero, as t → 0. More precisely, consider
(4) $P(F_1(X_1)>1-t,\,F_2(X_2)>1-t)=t^{1/\eta}L(t)$,
where L is a slowly varying function at 0; that is, L(tx)/L(t) → 1, as t → 0, for all x > 0.
The parameter η ∈ (0,1], known as the Ledford and Tawn coefficient, measures the residual tail dependence, and the function L measures the relative strength of dependence given a particular value of η. Observe that η = 1 with L(t) converging to some positive constant corresponds to tail dependence (λ > 0), whilst η < 1 means tail independence. If η = 1/2, we have (almost) perfect independence (perfect if L(t) ≡ 1), and for 0 < η < 1/2 or 1/2 < η < 1 we have, respectively, negative association (i.e., P(F1(X1) > 1-t, F2(X2) > 1-t) < P(F1(X1) > 1-t) P(F2(X2) > 1-t)) or positive association (i.e., P(F1(X1) > 1-t, F2(X2) > 1-t) > P(F1(X1) > 1-t) P(F2(X2) > 1-t)).

Relation (4) also means that the function q(t) = P(F1(X1) > 1-t, F2(X2) > 1-t) is regularly varying (of first order) with index 1/η. Draisma et al. [6] considered a refinement of this relation under a second order regular variation condition. More precisely, it is assumed that the limit
(5) $\lim_{t\to 0}\frac{1}{q_1(t)}\left(\frac{P(F_1(X_1)>1-tx,\,F_2(X_2)>1-ty)}{q(t)}-c(x,y)\right)=c_1(x,y)$
exists for all x, y ≥ 0 with x + y > 0, where q1(t) → 0, as t → 0, is a regularly varying function of index τ ≥ 0, and c1 is a nonconstant function that is not a multiple of c. It is also assumed that the convergence is uniform on {(x,y) ∈ [0,∞)²: x² + y² = 1}, that l = lim_{t→0} q(t)/t exists, and, without loss of generality, that c(1,1) = 1. In addition, the function c is homogeneous of order 1/η; that is, c(tx,ty) = t^{1/η} c(x,y).

Now observe that
(6) $P(F_1(X_1)>1-t,\,F_2(X_2)>1-t)=P(W_1>1/t,\,W_2>1/t)$,
with Wj = (1 - Fj(Xj))^{-1}, j = 1, 2, and hence we can write
(7) $P(W_1>t,\,W_2>t)=t^{-1/\eta}L(t^{-1})$.
Therefore, η corresponds to the tail index of
(8) $T=\min(W_1,W_2)$,
and thus it can be estimated as such. The second order regular variation condition in (5) allows deriving the asymptotics of the estimator (Draisma et al. [6, Theorem 2.1]). This will be addressed in Section 2.

An alternative measure for the residual tail dependence was introduced in Coles et al. [7]. By considering
(9) $P(F_1(X_1)>1-t,\,F_2(X_2)>1-t)=\left[P(F_1(X_1)>1-t)\,P(F_2(X_2)>1-t)\right]^{1/(2\eta)}$,
and applying logarithms to both sides, we derive
(10) $\frac{1}{2\eta}=\frac{\log P(F_1(X_1)>1-t,\,F_2(X_2)>1-t)}{\log P(F_1(X_1)>1-t)+\log P(F_2(X_2)>1-t)}$,

or
(11) $\bar{\chi}=2\eta-1=\frac{2\log t}{\log \bar{C}(1-t,1-t)}-1$,
with χ¯ ∈ [-1,1] and C¯ corresponding to the survival copula; that is,
(12) $\bar{F}(x_1,x_2)=P(F_1(X_1)>F_1(x_1),\,F_2(X_2)>F_2(x_2))=\bar{C}(F_1(x_1),F_2(x_2))$.
Observe that χ¯ < 1 means tail independence (λ = 0), and if χ¯ = 1 we have tail dependence (λ > 0). We also have positive and negative association whenever χ¯ > 0 and χ¯ < 0, respectively, with χ¯ = 0 corresponding to (almost) exact independence. Estimators for λ and χ¯ based on expressions (3) and (11), respectively, will also be presented in Section 2.
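To fix ideas, relation (11) can be checked directly on copulas whose survival diagonal is known in closed form. The following sketch (the two toy copulas are illustrative choices of ours, not from the paper) evaluates χ¯ at a small level t for the independence copula (χ¯ = 0) and the comonotone copula (χ¯ = 1):

```python
import math

def chi_bar(t, survival_diag):
    """chi-bar at level t, cf. (11): 2*log(t)/log(Cbar(1-t,1-t)) - 1."""
    return 2.0 * math.log(t) / math.log(survival_diag(t)) - 1.0

# Survival-copula diagonals Cbar(1-t, 1-t) for two benchmark cases:
indep = lambda t: t * t  # independence: P(U > 1-t, V > 1-t) = t^2
comon = lambda t: t      # comonotonicity: P(U > 1-t, U > 1-t) = t

print(chi_bar(1e-4, indep))  # ≈ 0: (almost) exact independence
print(chi_bar(1e-4, comon))  # = 1: tail dependence
```

The value is constant in t for these two cases, which is why χ¯ cleanly separates the (in)dependence regimes.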

The behavior of events within the class of asymptotic dependence is quite different from the one detected in the class of asymptotic independence. Both forms allow dependence between moderately large values of each variable, but only when the variables exhibit tail dependence can the very largest values of each variable occur together. If we wrongly infer tail dependence, we will overestimate the dependence in the high values and consequently the risk. This overestimation is related to the degree of residual dependence, which is measured through η or χ¯. Therefore, it is important to assess whether a data set presents tail dependence or independence and to quantify the degree of dependence for the appropriate dependence class. This can be done through the estimation of λ and of η (or χ¯), together with tests for tail independence. These topics can be found in many references, such as Huang [8], Joe [9], Coles et al. [7], Frahm et al. [10], and Schmidt and Stadtmüller [11] for the TDC estimation, Ledford and Tawn [4, 5] and Peng [12] concerning the estimation of η, and Coles et al. [7] for the χ¯ estimation. Tail independence tests can be seen in, for example, Poon et al. [13] and Draisma et al. [6].

Most nonparametric estimation of extremal parameters requires choosing the number k of upper order statistics to be used. A paradigmatic example is the univariate tail index estimation of regularly varying distributions (for a survey, see Beirlant et al. [14] and references therein). A similar problem exists for tail dependence estimation. In practice, we have to deal with a trade-off between variance and bias, since small values of k correspond to larger variance whilst large values of k increase the bias of the estimators. Figure 1 illustrates this issue. Observe that the true value (horizontal line) can be inferred from a kind of first stability region within the sample path of the estimators. In order to overcome this problem, Frahm et al. [10] developed a heuristic procedure, where k is estimated through a simple plateau-finding algorithm after smoothing the latter plot by some box kernel. They proposed some values for the bandwidth, but no study was carried out to evaluate possible choices. In this paper we address this issue through a simulation study, by applying the heuristic procedure to nonparametric estimators of the TDC. In addition, we analyze the performance of the procedure when applied to the estimation of η and χ¯, as well as within the context of the referred tail independence tests (Section 4). An illustration with financial data is presented in Section 5. We end with some final remarks (Section 6).

Figure 1: Sample path of estimator λ^LOG (a) and estimator λ^SEC (b), plotted against k/n, 1 ≤ k < n, considering n = 1000 realizations of a bivariate Student t. The horizontal lines correspond to the true values.

2. Inference on the Extremal (in)dependence

Consider (X1(1),X2(1)), ..., (X1(n),X2(n)) independent and identically distributed (i.i.d.) copies of the random pair (X1,X2). From (3), it is possible to deduce the estimator:
(13) $\hat\lambda_{SEC}\equiv\hat\lambda_{SEC}(k)=2-\frac{1-\hat{C}(1-k/n,\,1-k/n)}{k/n},\quad 1\le k<n$.
By using log(1-t) ~ -t, as t → 0, the following estimator can be derived:
(14) $\hat\lambda_{LOG}\equiv\hat\lambda_{LOG}(k)=2-\frac{\log \hat{C}(1-k/n,\,1-k/n)}{\log(1-k/n)},\quad 1\le k<n$,
where C^ denotes the empirical copula, given by
(15) $\hat{C}\Big(1-\frac{k}{n},\,1-\frac{k}{n}\Big)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\{\hat{F}_1(X_1^{(i)})\le 1-k/n,\,\hat{F}_2(X_2^{(i)})\le 1-k/n\},\quad 1\le k<n$,
with 1 denoting the indicator function and F^j, j = 1, 2, the marginal empirical d.f.'s of X1 and X2, respectively. For more accurate estimates, one considers
(16) $\hat{F}_j(u)=\frac{1}{n+1}\sum_{i=1}^{n}\mathbb{1}\{X_j^{(i)}\le u\},\quad j=1,2$.
See Beirlant et al. [16, Section 9.4.1] for more details. Note that both estimators depend on the parameter k, the number of upper order statistics involved in the estimation. The choice of k is the major difficulty within these estimators because of the compromise between variance and bias explained in the introduction. To ensure properties such as asymptotic normality and consistency, it is necessary to assume that k ≡ kn is an intermediate sequence; that is,
(17) $k\to\infty,\quad k/n\to 0,\quad \text{as }n\to\infty$

(see Huang [8] and Schmidt and Stadtmüller [11]).
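A direct implementation of (13)-(16) is straightforward. The sketch below (our own rendering, not the authors' code) builds the empirical margins of (16) from ranks and evaluates both TDC estimators; ties are ignored for simplicity:

```python
import numpy as np

def _uniform_ranks(x):
    """F-hat_j evaluated at the sample points, with the n+1 scaling of (16)."""
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n (ties ignored here)
    return ranks / (n + 1)

def empirical_copula_diag(x1, x2, k):
    """C-hat(1-k/n, 1-k/n) of (15)."""
    n = len(x1)
    u1, u2 = _uniform_ranks(x1), _uniform_ranks(x2)
    return np.mean((u1 <= 1 - k / n) & (u2 <= 1 - k / n))

def tdc_sec(x1, x2, k):
    """lambda-hat_SEC(k), cf. (13)."""
    n = len(x1)
    return 2 - (1 - empirical_copula_diag(x1, x2, k)) / (k / n)

def tdc_log(x1, x2, k):
    """lambda-hat_LOG(k), cf. (14)."""
    n = len(x1)
    return 2 - np.log(empirical_copula_diag(x1, x2, k)) / np.log(1 - k / n)
```

For perfectly comonotone data both estimators return values near 1, and for independent data values near 0, for moderate choices of k.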

We have already seen that, by (7), the coefficient η corresponds to the tail index of the r.v. T defined in (8). Tail index estimation has been extensively studied in the literature, and a survey on this topic can be seen in, for example, Beirlant et al. [14]. The most used estimator for positive tail indexes is the Hill estimator [17]. More precisely, considering in (8) the respective empirical counterparts, we have
(18) $T_i^{(n)}=\min(\hat{W}_{1,i},\hat{W}_{2,i}),\quad i=1,\ldots,n$,
with W^{j,i} = (1 - F^j(X_{j,i}))^{-1} and F^j given in (16), j = 1, 2, i = 1, ..., n. Thus, considering the order statistics $T_{n:n}^{(n)}\ge T_{n:n-1}^{(n)}\ge\cdots\ge T_{n:n-k}^{(n)}$, the Hill estimator of the coefficient η is given by
(19) $\hat\eta\equiv\hat\eta(k)=\frac{1}{k}\sum_{i=1}^{k}\log\frac{T_{n:n-i+1}^{(n)}}{T_{n:n-k}^{(n)}},\quad 1\le k<n$.
Observe that η^ is also a function of the parameter k, under the same conditions described above, and thus suffers from the same problem involving bias and variance.
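Given (18)-(19), η^ reduces to a Hill estimate computed on the minima of the rank-transformed margins. A minimal sketch (ours, with the n+1 scaling of (16); ties ignored):

```python
import numpy as np

def eta_hill(x1, x2, k):
    """Hill estimator (19) of eta, applied to T_i = min(W-hat_1i, W-hat_2i)
    as in (18), with empirical margins scaled by n+1 as in (16)."""
    n = len(x1)
    r1 = np.argsort(np.argsort(x1)) + 1.0  # ranks 1..n
    r2 = np.argsort(np.argsort(x2)) + 1.0
    w1 = 1.0 / (1.0 - r1 / (n + 1))        # W-hat_{1,i} = (1 - F-hat_1)^{-1}
    w2 = 1.0 / (1.0 - r2 / (n + 1))
    t = np.sort(np.minimum(w1, w2))[::-1]  # descending: t[0] = T_{n:n}
    return float(np.mean(np.log(t[:k] / t[k])))
```

For comonotone data T is essentially standard Pareto, so η^(k) should be close to 1; under exact independence it should approach 1/2.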

Observe that, from the first equality in (11), we can derive the estimator
(20) $\tilde{\bar\chi}=2\hat\eta-1$,
with η^ given in (19). From the second equality in (11), the following estimator is obtained:
(21) $\hat{\bar\chi}\equiv\hat{\bar\chi}(k)=\frac{2\log(k/n)}{\log \hat{\bar{C}}(1-k/n,\,1-k/n)}-1,\quad 1\le k<n$,
where C¯^ denotes the empirical survival copula,
(22) $\hat{\bar{C}}\Big(1-\frac{k}{n},\,1-\frac{k}{n}\Big)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\{\hat{F}_1(X_1^{(i)})>1-k/n,\,\hat{F}_2(X_2^{(i)})>1-k/n\},\quad 1\le k<n$,
with F^j, j = 1, 2, given in (16). Once again, we have dependence on the parameter k.
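The estimator (21)-(22) is equally direct to implement with the same rank-based margins; a sketch of our own:

```python
import numpy as np

def chi_bar_hat(x1, x2, k):
    """chi-bar-hat(k) of (21), via the empirical survival copula (22)."""
    n = len(x1)
    r1 = np.argsort(np.argsort(x1)) + 1.0  # ranks 1..n (ties ignored)
    r2 = np.argsort(np.argsort(x2)) + 1.0
    u1, u2 = r1 / (n + 1), r2 / (n + 1)    # F-hat_j of (16) at the sample
    cbar = np.mean((u1 > 1 - k / n) & (u2 > 1 - k / n))
    return 2.0 * np.log(k / n) / np.log(cbar) - 1.0
```

For comonotone data the empirical survival copula on the diagonal equals k/n, so the estimate is exactly 1, the tail dependent boundary of the χ¯ scale.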

In most cases, the TDC estimators do not behave well under asymptotic independence, that is, whenever λ = 0 (see, e.g., Frahm et al. [10] and Ferreira [18]). A possible way to deal with this problem is to consider preliminary tests for tail independence. Poon et al. [13] suggest testing H0: η = 1 versus H1: η < 1, that is, dependence versus independence, based on the estimator η^ in (19). Considering k ≡ kn an intermediate sequence and under some quite general additional conditions, $\sqrt{k}(\hat\eta-\eta)$ is approximately N(0, η²), as n → ∞. Thus, we reject H0 in favor of H1, at the significance level α, if
(23) $\hat\eta+z_{1-\alpha}\frac{\hat\eta}{\sqrt{k}}<1$,
where $z_{1-\alpha}$ denotes the (1-α)-quantile of N(0,1).

An analogous test was developed in Draisma et al. [6], based on relation (5). More precisely, assume that (5) holds for a function c with first order partial derivatives $c_x=\partial c(x,y)/\partial x$ and $c_y=\partial c(x,y)/\partial y$, and consider k ≡ kn an intermediate sequence such that $\sqrt{k}\,q_1(q^{-1}(k/n))\to 0$, as n → ∞. Then $\sqrt{k}(\hat\eta-\eta)$ is asymptotically normal with null mean value and variance
(24) $\sigma^2=\eta^2(1-l)\left(1-2l\,c_x(1,1)\,c_y(1,1)\right)$.
Consider
(25) $\hat{l}=\frac{k}{n}T_{n:n-k}^{(n)},\qquad \hat{c}_x(1,1)=\frac{\hat{k}^{5/4}}{n}\left(T_{n:n-k}^{(n,\hat{k}^{-1/4})}-T_{n:n-k}^{(n)}\right)$,
with $\hat{k}=k/\hat{l}$ and $T_{n:i}^{(n,u)}$, i = 1, ..., n, the order statistics of
(26) $T_i^{(n,u)}=\min(\hat{W}_{1,i}(1+u),\,\hat{W}_{2,i}),\quad i=1,\ldots,n$.
Defining $\hat{c}_y(1,1)$ similarly, if (5) holds under the above mentioned conditions, then $\hat{l}\stackrel{P}{\to}l$, where $\stackrel{P}{\to}$ denotes convergence in probability. Moreover, if η = 1, then
(27) $\hat{c}_x(1,1)\stackrel{P}{\to}c_x(1,1),\quad \hat{c}_y(1,1)\stackrel{P}{\to}c_y(1,1),\quad \hat\sigma\stackrel{P}{\to}\sigma$,
where
(28) $\hat\sigma^2=\hat\eta^2(1-\hat{l})\left(1-2\hat{l}\,\hat{c}_x(1,1)\,\hat{c}_y(1,1)\right)$,
with η^ corresponding to the Hill estimator of η, given in (19). Therefore, for the same test hypotheses, we reject H0 if
(29) $\hat\eta+z_{1-\alpha}\frac{\hat\eta}{\sqrt{k}}\sqrt{(1-\hat{l})\left(1-2\hat{l}\,\hat{c}_x(1,1)\,\hat{c}_y(1,1)\right)}<1$.

Observe that the variance in test (29) includes a correction factor when compared with the one in (23). This renders its value slightly smaller, making the test more accurate under tail independence, as shall be seen in the simulations afterwards.

3. The Heuristic Procedure

In this section we describe the "plateau-finding" heuristic procedure presented in Frahm et al. [10]. Stability of the sample path of the graph (k, λ^(k)), 1 ≤ k < n, for high thresholds (small values of k) is expected, since the diagonal section of the copula should be smooth in a neighborhood of 1, with approximately constant first derivative. However, in order to decrease variance, k cannot be too small (see Figure 1). The algorithm proposed in Frahm et al. [10] aims to identify the plateau, that is, the stability region induced by this homogeneity. More precisely, first we smooth the graph (k, λ^(k)) by a box kernel with bandwidth w = bn ∈ ℕ, taking means of 2w+1 successive points of λ^(i), i = 1, ..., n. Then, in the smoothed moving average values λ¯^(1), ..., λ¯^(n-2w), plateaus of length $m=\lfloor\sqrt{n-2w}\rfloor$ are defined as $p_k=(\bar{\hat\lambda}(k),\ldots,\bar{\hat\lambda}(k+m-1))$, k = 1, ..., n-2w-m+1. The algorithm stops at the first plateau fulfilling the criterion
(30) $\sum_{i=k+1}^{k+m-1}\left|\bar{\hat\lambda}(i)-\bar{\hat\lambda}(k)\right|\le 2\sigma$,
with σ corresponding to the standard deviation of λ¯^(1), ..., λ¯^(n-2w), and the TDC estimate corresponding to the mean of that plateau,
(31) $\hat\lambda=\frac{1}{m}\sum_{i=1}^{m}\bar{\hat\lambda}(k+i-1)$.
If no plateau fulfills the stopping condition, the TDC is estimated as zero.
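The steps above can be condensed into a short routine. The following is a sketch of our own rendering (not the authors' code), taking a precomputed sample path λ^(1), ..., λ^(n-1); the `max(..., 1)` guard for tiny bandwidths is our addition:

```python
import numpy as np

def plateau_estimate(path, b=0.005):
    """Plateau-finding heuristic of Frahm et al. applied to a sample
    path of estimates (e.g. lambda-hat(k), k = 1, ..., n-1)."""
    n = len(path) + 1
    w = max(int(b * n), 1)                 # box-kernel bandwidth w = bn
    # moving averages of 2w+1 successive points of the path
    sm = np.convolve(path, np.ones(2 * w + 1) / (2 * w + 1), mode="valid")
    m = int(np.sqrt(len(sm)))              # plateau length ~ sqrt(n - 2w)
    sigma = np.std(sm)                     # spread of the smoothed path
    for k in range(len(sm) - m + 1):
        # stopping criterion (30): plateau flat enough relative to sigma
        if np.sum(np.abs(sm[k + 1:k + m] - sm[k])) <= 2 * sigma:
            return float(np.mean(sm[k:k + m]))  # plateau mean, cf. (31)
    return 0.0  # no admissible plateau: estimate as zero
```

Applied to a path that is already flat, the routine returns that level immediately; on a path with no stability region it falls through to the zero estimate.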

Observe that, if the diagonal section of the copula follows a power law, the homogeneity of λ^LOG still holds for larger k, and larger bandwidths may be chosen in order to reduce the variance.

4. Simulations

We simulate 1000 independent random samples of sizes n = 250, 1000, 2500 from the following models:

bivariate Normal with ρ=0.5 and ρ=0.85 (λ=0; η=0.75,0.925, resp.);

bivariate Student t with ρ=0.5, ν=1.5 and ρ=0, ν=2 (λ=0.4406,0.2254, resp.; η=1);

logistic with dependence parameter r=1/1.56 (λ=0.4406; η=1) Ledford and Tawn [4, 5];

asymmetric logistic with dependence parameter r=1/2.78 and asymmetry parameters t1=0.5 and t2=0.9 (λ=0.4406; η=1) Ledford and Tawn [4, 5];

Morgenstern with dependence parameter r=0.75 (λ=0; η=0.5) Ledford and Tawn [4, 5].
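As a sanity check on the target values above: for the logistic model the copula diagonal is C(u,u) = u^{2^r}, so the limit in (3) can be evaluated numerically and compared with the closed form λ = 2 - 2^r (a sketch of ours; the value t = 1e-8 is an arbitrary small level):

```python
def tdc_from_diag(diag, t=1e-8):
    """Numerical version of (3): lambda ~ 2 - (1 - C(1-t,1-t))/t for small t."""
    return 2 - (1 - diag(1 - t)) / t

r = 1 / 1.56                              # dependence parameter of the logistic model
logistic_diag = lambda u: u ** (2 ** r)   # copula diagonal C(u,u) = u^{2^r}

print(tdc_from_diag(logistic_diag))  # ≈ 0.4406, the value used in the simulations
print(2 - 2 ** r)                    # closed form lambda = 2 - 2^r, same value
```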

We apply the algorithm described in Section 3 to the tail dependence coefficients estimated by λ^SEC, λ^LOG, η^, and χ¯^, defined, respectively, in (13), (14), (19), and (21), as well as to the tail independence tests (23) and (29). In the sequel we denote (23) as test 1 and (29) as test 2. The variances within test 1 and test 2, respectively σ^1² = η^²/k and σ^2² = σ^²/k with σ^² given in (28), are estimated by applying the algorithm to the plots (k, σ^i²(k)), i = 1, 2, but picking the plateau at the same location as the one given by the respective coefficient estimation. In all cases we consider the values b = 0.0025, 0.005, 0.01, 0.015. The boundary cases of a bivariate Normal with ρ=0.85 (a tail independent model but with η = 0.925) and a bivariate Student t with ρ=0 and ν=2 (a tail dependent model with a very low TDC of 0.2254) are included in the simulations in order to assess the robustness of the method.

The absolute bias and the root mean squared error obtained in the simulation results of λ^SEC, λ^LOG, η^, and χ¯^ are reported in Tables 1, 2, 3, and 4, respectively, and also plotted in Figures 2, 3, 4, and 5.

Table 1: Absolute bias and root mean squared error (rmse) of estimator λ^SEC.

n = 250 n = 1000 n = 2500
abs. bias (rmse) abs. bias (rmse) abs. bias (rmse)
b = 0.0025
Normal (ρ=0.5) 0.1920 (0.2187) 0.1573 (0.1709) 0.1363 (0.1447)
Normal (ρ=0.85) 0.5347 (0.5444) 0.4860 (0.4915) 0.4572 (0.4610)
Student t (ρ=0.5) 0.0098 (0.1262) 0.0022 (0.0875) 0.0048 (0.0666)
Student t (ρ=0) 0.0057 (0.1091) 0.0016 (0.0773) 0.0037 (0.0601)
Logistic 0.4439 (0.4596) 0.0069 (0.0905) 0.0008 (0.0646)
A. Logistic 0.0007 (0.1254) 0.0057 (0.0921) 0.0001 (0.0701)
Morgenstern 0.0542 (0.0770) 0.0302 (0.0421) 0.0213 (0.0287)
b = 0.005
Normal (ρ=0.5) 0.2055 (0.2298) 0.1650 (0.1772) 0.1471 (0.1541)
Normal (ρ=0.85) 0.5418 (0.5510) 0.4895 (0.4946) 0.4655 (0.4688)
Student t (ρ=0.5) 0.0057 (0.1203) 0.0015 (0.0837) 0.0027 (0.0616)
Student t (ρ=0) 0.0049 (0.1096) 0.0032 (0.0735) 0.0047 (0.0549)
Logistic 0.0067 (0.1153) 0.0076 (0.0878) 0.0010 (0.0596)
A. Logistic 0.0043 (0.1220) 0.0045 (0.0888) 0.0021 (0.0646)
Morgenstern 0.0614 (0.0841) 0.0334 (0.0448) 0.0256 (0.0322)
b = 0.01
Normal (ρ=0.5) 0.2148 (0.2364) 0.1808 (0.1904) 0.1650 (0.1701)
Normal (ρ=0.85) 0.5461 (0.5552) 0.4997 (0.5038) 0.4828 (0.4852)
Student t (ρ=0.5) 0.0043 (0.1158) 0.0006 (0.0750) 0.0001 (0.0542)
Student t (ρ=0) 0.0089 (0.1057) 0.0062 (0.0660) 0.0061 (0.0476)
Logistic 0.0081 (0.1097) 0.0101 (0.0803) 0.0044 (0.0518)
A. Logistic 0.0054 (0.1170) 0.0004 (0.0798) 0.0056 (0.0566)
Morgenstern 0.0683 (0.0898) 0.0414 (0.0513) 0.0338 (0.0391)
b = 0.015
Normal (ρ=0.5) 0.2234 (0.2428) 0.1939 (0.2017) 0.1810 (0.1849)
Normal (ρ=0.85) 0.5515 (0.5598) 0.5101 (0.5134) 0.4989 (0.5006)
Student t (ρ=0.5) 0.0031 (0.1117) 0.0023 (0.0682) 0.0018 (0.0480)
Student t (ρ=0) 0.0118 (0.1019) 0.0089 (0.0608) 0.0078 (0.0426)
Logistic 0.4500 (0.4621) 0.0124 (0.0738) 0.0076 (0.0463)
A. Logistic 0.0072 (0.1121) 0.0034 (0.0730) 0.0088 (0.0510)
Morgenstern 0.0749 (0.0952) 0.0495 (0.0580) 0.0420 (0.0462)

Table 2: Absolute bias and root mean squared error (rmse) of estimator λ^LOG.

n = 250 n = 1000 n = 2500
bias (rmse) bias (rmse) bias (rmse)
b = 0.0025
Normal (ρ=0.5) 0.1842 (0.2179) 0.1511 (0.1676) 0.1287 (0.1383)
Normal (ρ=0.85) 0.5257 (0.5363) 0.4863 (0.4919) 0.4617 (0.4655)
Student t (ρ=0.5) 0.0214 (0.1256) 0.0092 (0.0857) 0.0096 (0.0665)
Student t (ρ=0) 0.2896 (0.3111) 0.2725 (0.2835) 0.0060 (0.0608)
Logistic 0.0061 (0.1142) 0.0037 (0.0860) 0.0022 (0.0594)
A. Logistic 0.0085 (0.1198) 0.0076 (0.0830) 0.0022 (0.0630)
Morgenstern 0.0240 (0.0634) 0.0126 (0.0325) 0.0091 (0.0215)
b = 0.005
Normal (ρ=0.5) 0.1930 (0.2235) 0.1557 (0.1704) 0.1363 (0.1444)
Normal (ρ=0.85) 0.5340 (0.5441) 0.4925 (0.4977) 0.4679 (0.4713)
Student t (ρ=0.5) 0.0225 (0.1231) 0.0095 (0.0829) 0.0091 (0.0630)
Student t (ρ=0) 0.0259 (0.1152) 0.0127 (0.0758) 0.0065 (0.0560)
Logistic 0.0046 (0.1108) 0.0041 (0.0819) 0.0017 (0.0544)
A. Logistic 0.0071 (0.1160) 0.0076 (0.0795) 0.0020 (0.0587)
Morgenstern 0.0279 (0.0684) 0.0140 (0.0338) 0.0111 (0.0215)
b = 0.01
Normal (ρ=0.5) 0.1996 (0.2278) 0.1694 (0.1767) 0.1499 (0.1559)
Normal (ρ=0.85) 0.5400 (0.5497) 0.5008 (0.5052) 0.4784 (0.4809)
Student t (ρ=0.5) 0.0227 (0.1188) 0.0104 (0.0764) 0.0087 (0.0559)
Student t (ρ=0) 0.0239 (0.1121) 0.0130 (0.0689) 0.0088 (0.0492)
Logistic 0.0043 (0.1079) 0.0023 (0.0753) 0.0014 (0.0487)
A. Logistic 0.0093 (0.1108) 0.0064 (0.0745) 0.0013 (0.0529)
Morgenstern 0.0313 (0.0712) 0.0175 (0.0362) 0.0147 (0.0251)
b = 0.015
Normal (ρ=0.5) 0.2038 (0.2299) 0.1730 (0.1828) 0.1623 (0.1669)
Normal (ρ=0.85) 0.5453 (0.5543) 0.5067 (0.5104) 0.4904 (0.4923)
Student t (ρ=0.5) 0.0228 (0.1156) 0.0114 (0.0708) 0.0094 (0.0501)
Student t (ρ=0) 0.2823 (0.3015) 0.0273 (0.2800) 0.0111 (0.0445)
Logistic 0.0055 (0.1044) 0.0015 (0.0697) 0.0014 (0.0444)
A. Logistic 0.0085 (0.1073) 0.0061 (0.0700) 0.0008 (0.0489)
Morgenstern 0.0346 (0.0733) 0.0210 (0.0380) 0.0183 (0.0270)

Table 3: Absolute bias and root mean squared error (rmse) of estimator η^.

n = 250 n = 1000 n = 2500
bias (rmse) bias (rmse) bias (rmse)
b = 0.0025
Normal (ρ=0.5) 0.0901 (0.1513) 0.0579 (0.1109) 0.0457 (0.0887)
Normal (ρ=0.85) 0.1581 (0.1905) 0.0915 (0.1244) 0.0691 (0.0980)
Student t (ρ=0.5) 0.1671 (0.2045) 0.0892 (0.1286) 0.0675 (0.1035)
Student t (ρ=0) 0.2575 (0.3038) 0.1523 (0.2001) 0.1044 (0.1511)
Logistic 0.1776 (0.2132) 0.0991 (0.1383) 0.0729 (0.1062)
A. Logistic 0.1817 (0.2174) 0.1096 (0.1467) 0.0763 (0.1092)
Morgenstern 0.0217 (0.0950) 0.0166 (0.0674) 0.0159 (0.0520)
b = 0.005
Normal (ρ=0.5) 0.0852 (0.1456) 0.0582 (0.1071) 0.0456 (0.0835)
Normal (ρ=0.85) 0.1454 (0.1808) 0.0885 (0.1203) 0.0691 (0.0944)
Student t (ρ=0.5) 0.1552 (0.1923) 0.0876 (0.1256) 0.0661 (0.1003)
Student t (ρ=0) 0.2614 (0.3066) 0.1570 (0.2010) 0.1080 (0.1504)
Logistic 0.1700 (0.2062) 0.0981 (0.1358) 0.0721 (0.1027)
A. Logistic 0.1722 (0.2038) 0.1070 (0.1423) 0.0755 (0.1062)
Morgenstern 0.0219 (0.0908) 0.0172 (0.0660) 0.0168 (0.0506)
b = 0.01
Normal (ρ=0.5) 0.0847 (0.1425) 0.0584 (0.1011) 0.0462 (0.0794)
Normal (ρ=0.85) 0.1403 (0.1751) 0.0868 (0.1132) 0.0680 (0.0890)
Student t (ρ=0.5) 0.1541 (0.1903) 0.0866 (0.1205) 0.0631 (0.0940)
Student t (ρ=0) 0.2658 (0.3085) 0.1616 (0.2006) 0.1159 (0.1496)
Logistic 0.1677 (0.2023) 0.0977 (0.1314) 0.0720 (0.0985)
A. Logistic 0.1692 (0.2038) 0.1074 (0.1397) 0.0742 (0.1018)
Morgenstern 0.0231 (0.0887) 0.0187 (0.0630) 0.0181 (0.0484)
b = 0.015
Normal (ρ=0.5) 0.0843 (0.1389) 0.0587 (0.0987) 0.0465 (0.0751)
Normal (ρ=0.85) 0.1369 (0.1705) 0.0870 (0.1101) 0.0678 (0.0849)
Student t (ρ=0.5) 0.1529 (0.1869) 0.0867 (0.1180) 0.0592 (0.0874)
Student t (ρ=0) 0.2704 (0.3110) 0.1693 (0.2027) 0.1254 (0.1522)
Logistic 0.1667 (0.2000) 0.0980 (0.1284) 0.0711 (0.0943)
A. Logistic 0.1671 (0.1998) 0.1081 (0.1382) 0.0718 (0.0963)
Morgenstern 0.0231 (0.0861) 0.0203 (0.0609) 0.0197 (0.0469)

Table 4: Absolute bias and root mean squared error (rmse) of estimator χ¯^.

n = 250 n = 1000 n = 2500
bias (rmse) bias (rmse) bias (rmse)
b = 0.0025
Normal (ρ=0.5) 0.0543 (0.2429) 0.0202 (0.2869) 0.0446 (0.3047)
Normal (ρ=0.85) 0.1001 (0.2068) 0.1273 (0.2118) 0.1277 (0.2082)
Student t (ρ=0.5) 0.3361 (0.3582) 0.3191 (0.3273) 0.3005 (0.3046)
Student t (ρ=0) 0.5158 (0.5353) 0.5552 (0.5649) 0.5394 (0.5443)
Logistic 0.3180 (0.3391) 0.3042 (0.3130) 0.2952 (0.2992)
A. Logistic 0.3235 (0.3416) 0.3107 (0.3183) 0.2934 (0.2975)
Morgenstern 0.5033 (0.1346) 0.3719 (0.1690) 0.3120 (0.2101)
b = 0.005
Normal (ρ=0.5) 0.0254 (0.2648) 0.0333 (0.2984) 0.0539 (0.3125)
Normal (ρ=0.85) 0.1134 (0.2157) 0.1310 (0.2147) 0.1325 (0.2122)
Student t (ρ=0.5) 0.3604 (0.3812) 0.3269 (0.3344) 0.3112 (0.3147)
Student t (ρ=0) 0.5543 (0.5725) 0.5689 (0.5777) 0.5523 (0.5566)
Logistic 0.3345 (0.3530) 0.3122 (0.3203) 0.3078 (0.3112)
A. Logistic 0.3395 (0.3560) 0.3188 (0.3257) 0.3063 (0.3098)
Morgenstern 0.4635 (0.1448) 0.3482 (0.1883) 0.3159 (0.3220)
b = 0.01
Normal (ρ=0.5) 0.0084 (0.2778) 0.0491 (0.3115) 0.0645 (0.3211)
Normal (ρ=0.85) 0.1224 (0.2216) 0.1370 (0.2191) 0.1401 (0.2186)
Student t (ρ=0.5) 0.3743 (0.3932) 0.3427 (0.3488) 0.3304 (0.3332)
Student t (ρ=0) 0.5787 (0.5957) 0.5913 (0.5986) 0.5739 (0.5772)
Logistic 0.3458 (0.3627) 0.3299 (0.3365) 0.3271 (0.3297)
A. Logistic 0.3499 (0.3652) 0.3362 (0.3419) 0.3257 (0.3285)
Morgenstern 0.4384 (0.1543) 0.3141 (0.2156) 0.2555 (0.2594)
b = 0.015
Normal (ρ=0.5) 0.0031 (0.2862) 0.0579 (0.3185) 0.0733 (0.3285)
Normal (ρ=0.85) 0.1272 (0.2244) 0.1419 (0.2229) 0.1460 (0.2237)
Student t (ρ=0.5) 0.3847 (0.4025) 0.3562 (0.3613) 0.3481 (0.3503)
Student t (ρ=0) 0.5996 (0.6124) 0.6078 (0.6140) 0.5957 (0.5982)
Logistic 0.3558 (0.3715) 0.3451 (0.3505) 0.3433 (0.3453)
A. Logistic 0.3589 (0.3730) 0.3508 (0.3557) 0.3419 (0.3442)
Morgenstern 0.4209 (0.1616) 0.2949 (0.2300) 0.2399 (0.2721)

Figure 2: Absolute bias of estimator λ^LOG (a) and estimator λ^SEC (b). The four values plotted in each line correspond to b=0.0025, 0.005, 0.01, 0.015, respectively.

Figure 3: Root mean squared error (rmse) of estimator λ^LOG (a) and estimator λ^SEC (b). The four values plotted in each line correspond to b=0.0025, 0.005, 0.01, 0.015, respectively.

Figure 4: Absolute bias of estimator η^ (a) and estimator χ¯^ (b). The four values plotted in each line correspond to b=0.0025, 0.005, 0.01, 0.015, respectively.

Figure 5: Root mean squared error (rmse) of estimator η^ (a) and estimator χ¯^ (b). The four values plotted in each line correspond to b=0.0025, 0.005, 0.01, 0.015, respectively.

Observe in Figures 2 and 3 that estimators λ^LOG and λ^SEC behave quite similarly, although the former seems slightly better. The largest bias occurring for the smallest sample size is around 0.1, but for the largest one it is close to zero, which indicates a good performance. The exception is the Normal model, in particular the boundary case ρ=0.85. In the Normal model with ρ=0.5, the largest bias is about 0.2. For small samples it is preferable to choose bandwidths b=0.005 or b=0.01. In all the other simulation results presented here, there are no significant differences between the considered bandwidths.

Concerning estimators η^ and χ¯^, the first is clearly better (Figures 4 and 5). It is also robust in the boundary cases of Student t (ρ=0, ν=2) and Normal (ρ=0.85), for large sample sizes. Observe that the bias and root mean squared error results are very close to those obtained in Draisma et al. [6], where k was chosen, through an intensive simulation study, in a range where the overall performance seemed best. Estimator χ¯^ only slightly outperforms η^ in the Normal model for n=250. The proportion of samples in which tail dependence (η=1) is rejected at the 5% significance level is plotted in Figure 6. The heuristic procedure has an overall good performance in both tests for large sample sizes. We can see that, under tail independence, test 2 outperforms test 1, as expected (see Section 2), whereas in the tail dependent case test 1 is slightly better. However, the tests do not seem robust, given the results in the above mentioned boundary cases, particularly the Normal one.

Figure 6: Proportion of samples in which η=1 is rejected by a 5% test, for test 1 (a) and test 2 (b). The horizontal solid lines correspond to 95% confidence (black) and 5% significance (grey). The four values plotted in each line correspond to b=0.0025, 0.005, 0.01, 0.015, respectively.

5. An Application: Dependence of Large Losses within Stock Markets

We consider five years of negative daily log-returns (from 1996 to 2000) of Intel (INTC), Microsoft (MSFT), and General Electric (GE) stocks, which amounts to a sample size n=1262. These data were analyzed in McNeil et al. [19, Chapter 5]. We aim to quantify the degree of contagion risk of large losses within (INTC, MSFT), (INTC, GE), and (MSFT, GE); that is, to investigate whether these pairs present tail dependence or independence and to quantify the respective degree of extremal dependence. As a preliminary step, we analyze the scatter plots in Figure 7. Observe that the largest values of one variable correspond to moderately large values of the same sign for the other variable, suggesting that the variables are asymptotically independent, but not perfectly so. Table 5 reports the estimates of η^, σ^1, σ^2, χ¯^, λ^SEC, and λ^LOG. The results correspond to b=0.005 and are very close to the ones obtained with the other bandwidths (b=0.0025, 0.01, 0.015), which are thus omitted. Both tests reject dependence in (INTC, MSFT) and (INTC, GE). Observe the small values provided by the TDC estimators. In the case (MSFT, GE), test 2 rejects the dependence condition and test 1 only barely fails to reject it. The values of λ^SEC and λ^LOG are also small, indicating that tail independence may be the more plausible conclusion. Therefore, we find that the contagion risk of large losses is residual, particularly in the case (INTC, GE).

Table 5: Estimates of η^, σ^1, σ^2, χ¯^, λ^SEC, and λ^LOG, for (INTC, MSFT), (INTC, GE), and (MSFT, GE), with b=0.005.

η^ σ^1 σ^2 χ¯^ λ^SEC λ^LOG
(INTC, MSFT) 0.7321 0.0224 0.0149 0.5741 0.2629 0.2489
(INTC, GE) 0.5549 0.0065 0.0042 0.3040 0.0551 0.0372
(MSFT, GE) 0.7300 0.0321 0.0241 0.3808 0.1762 0.1613

Figure 7: Left to right: scatter plots of (INTC, MSFT), (INTC, GE), and (MSFT, GE), respectively.

6. Final Remarks

In this paper we address the tail dependence inference problem, since it is important to distinguish the type of tail dependence in order to correctly evaluate the risk of simultaneous extreme events. Most nonparametric estimators have to deal with the choice of the number k of order statistics used in producing an estimate. This is not an easy task, since it requires a trade-off between variance and bias (small values of k cause large variance and large values of k increase the bias). An optimal choice of k leading to the smallest mean squared error is difficult to derive and, in practice, is frequently made through intensive simulation studies (see, e.g., Draisma et al. [6]). This is also a very common problem in the estimation of the tail index, a parameter of major importance within extreme value theory (see, e.g., Beirlant et al. [14] and references therein). Since the nonparametric estimators yield a characteristic plateau when the estimates are plotted against successive values of k, Frahm et al. [10] introduced a simple plateau-finding algorithm, applied after smoothing the plot by some box kernel, in order to find the optimal threshold k. Here we have applied this heuristic procedure to estimators of the TDC in (1), as well as to estimators of tail independence such as the Ledford and Tawn coefficient η in (4) and the coefficient χ¯ in (11), for several box kernel bandwidths. We have also analyzed this methodology in two tests for tail independence, given in (23) and (29). We conclude that the procedure has an overall good performance, especially for large samples. Some care must be taken with the tests, as they might not be robust, in particular for boundary cases within the Normal model. We call attention to the very good performance of the η estimation. Recall that it is based on a tail index estimator (the Hill estimator), which may indicate that this procedure can also work well for tail index estimation.
Since this very simple heuristic procedure has revealed some potential, we intend to develop it further and compare it with other heuristic methods, for instance, the graphical method in de Sousa and Michailidis [20] and bootstrap methods (see, e.g., Peng and Qi [21], Gomes and Oliveira [22], and references therein). This will be addressed in future work.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors are very grateful to the referees for their significant suggestions and corrections. Marta Ferreira was financed by FEDER Funds through “Programa Operacional Factores de Competitividade, COMPETE,” and by Portuguese Funds through FCT, “Fundação para a Ciência e a Tecnologia,” within the Project PEst-OE/MAT/UI0013/2014.

[1] Embrechts P., McNeil A. J., Straumann D. "Correlation and dependence in risk management: properties and pitfalls." In: Dempster M. A. H. (ed.), Risk Management: Value at Risk and Beyond, Cambridge University Press, Cambridge, UK, 2002, pp. 176-223.
[2] Sibuya M. "Bivariate extreme statistics, I." Annals of the Institute of Statistical Mathematics, 1960, 11, pp. 195-210.
[3] Sklar A. "Fonctions de répartition à n dimensions et leurs marges." Publications de l'Institut de Statistique de l'Université de Paris, 1959, 8, pp. 229-231.
[4] Ledford A. W., Tawn J. A. "Statistics for near independence in multivariate extreme values." Biometrika, 1996, 83(1), pp. 169-187.
[5] Ledford A. W., Tawn J. A. "Modelling dependence within joint tail regions." Journal of the Royal Statistical Society B, 1997, 59(2), pp. 475-499.
[6] Draisma G., Drees H., Ferreira A., de Haan L. "Bivariate tail estimation: dependence in asymptotic independence." Bernoulli, 2004, 10(2), pp. 251-280.
[7] Coles S., Heffernan J., Tawn J. "Dependence measures for extreme value analysis." Extremes, 1999, 2, pp. 339-366.
[8] Huang X. Statistics of Bivariate Extreme Values. Ph.D. thesis, Tinbergen Institute Research, Erasmus University, Rotterdam, The Netherlands, 1992.
[9] Joe H. Multivariate Models and Dependence Concepts. Monographs on Statistics and Applied Probability 73, Chapman and Hall, London, UK, 1997.
[10] Frahm G., Junker M., Schmidt R. "Estimating the tail-dependence coefficient: properties and pitfalls." Insurance: Mathematics & Economics, 2005, 37(1), pp. 80-100.
[11] Schmidt R., Stadtmüller U. "Non-parametric estimation of tail dependence." Scandinavian Journal of Statistics, 2006, 33(2), pp. 307-335.
[12] Peng L. "Estimation of the coefficient of tail dependence in bivariate extremes." Statistics & Probability Letters, 1999, 43(4), pp. 399-409.
[13] Poon S.-H., Rockinger M., Tawn J. "Extreme value dependence in financial markets: diagnostics, models, and financial implications." Review of Financial Studies, 2004, 17(2), pp. 581-610.
[14] Beirlant J., Caeiro F., Gomes M. I. "An overview and open research topics in statistics of univariate extremes." RevStat, 2012, 10(1), pp. 1-31.
[15] Joe H., Smith R. L., Weissman I. "Bivariate threshold methods for extremes." Journal of the Royal Statistical Society B, 1992, 54(1), pp. 171-183.
[16] Beirlant J., Goegebeur Y., Segers J., Teugels J. Statistics of Extremes: Theory and Applications. John Wiley & Sons, New York, NY, USA, 2004.
[17] Hill B. M. "A simple general approach to inference about the tail of a distribution." Annals of Statistics, 1975, 3(5), pp. 1163-1174.
[18] Ferreira M. "Nonparametric estimation of the tail-dependence coefficient." RevStat Statistical Journal, 2013, 11(1), pp. 1-16.
[19] McNeil A. J., Frey R., Embrechts P. Quantitative Risk Management. Princeton Series in Finance, Princeton University Press, Princeton, NJ, USA, 2005.
[20] de Sousa B., Michailidis G. "A diagnostic plot for estimating the tail index of a distribution." Journal of Computational and Graphical Statistics, 2004, 13(4), pp. 974-995.
[21] Peng L., Qi Y. "Bootstrap approximation of tail dependence function." Journal of Multivariate Analysis, 2008, 99(8), pp. 1807-1824.
[22] Gomes M. I., Oliveira O. "The bootstrap methodology in statistics of extremes—choice of the optimal sample fraction." Extremes, 2001, 4(4), pp. 331-358.