Detection of Heterogeneous Structures on the Gaussian Copula Model Using Projective Power Entropy

We discuss a parameter estimation problem for a Gaussian copula model under misspecification. Conventional estimators such as the maximum likelihood estimator (MLE) do not work well if the model is misspecified. We propose the estimator that minimizes the projective power entropy, which we call the γ-estimator, where γ denotes the power index. A feasible form of the projective power entropy is given that suits the Gaussian copula model. It is shown that the γ-estimator is robust against outliers. In addition, the γ-estimator can appropriately detect a heterogeneous structure of the underlying distribution, even if the underlying distribution consists of several different copula components while a single Gaussian copula is used as the statistical model. We explore this ability of the γ-estimator to detect local structures in comparison with the MLE. We also propose a fixed point algorithm to obtain the γ-estimator. The usefulness of the proposed methodology is demonstrated in numerical experiments.


Introduction
Applications of copula models have been increasing in number in recent years, with a variety of applications in finance, risk management [1], and multivariate time series analysis [2]. With copula models, the specification of the marginal distributions is parameterized separately from the dependence structure of the joint distribution, which gives a convenient way of constructing flexible and more general multivariate distributions. As far as we know, only a few works tackle the identification and statistical estimation of mixtures of copula models, and most of them rely on MCMC algorithms. In this paper we focus on a misspecified Gaussian copula model; in other words, the sample follows a distribution mixed from different sources, but the statistical model we fit is just a single Gaussian copula. It is very hard to construct multivariate copulas for three or more random variables [3], the Gaussian copula being an exception. So we start with the Gaussian copula model, but later, in Section 5, we will show that our method is closely related to the t-copula. As an example of misspecification, we consider an underlying distribution of the form
ε c_G(u; Σ₁) + (1 − ε) c_G(u; Σ₂),   (1)
where ε is a mixing proportion and c_G(u; Σ) denotes the probability density function of the Gaussian copula with correlation matrix parameter Σ. We see that the MLE for Σ almost surely converges to εΣ₁ + (1 − ε)Σ₂ under the assumption (1), which means that the MLE fails to detect the structure of the underlying distribution.
We make use of the γ-estimator [4,5], which can be obtained via minimization of the projective power entropy. Here γ denotes the power index, and as γ → 0 the γ-estimator reduces to the MLE, so the γ-estimator can be regarded as an extension of the MLE. In [5], the robustness of the γ-estimator was investigated in a general setting of parametric models. In [6], the minimum density power divergence estimator, which also uses a power of the density, was proposed for the covariance matrix of multivariate time series, and its robustness was shown. Our research shows that even if a single Gaussian copula model is incorrectly fitted to data from the mixture distribution (1), the γ-estimator can detect both Σ₁ and Σ₂ separately if Σ₁ and Σ₂ are "distinct" enough and ε is close to 0.5.
The γ-estimation for the Gaussian copula model relies on the projective power cross entropy between the underlying distribution and the Gaussian copula model c_G(u; Σ). The projective power cross entropy, as a function of Σ, has one local minimum or several local minima depending on the underlying distribution. We show that if Σ₁ and Σ₂ are "distinct" enough and ε is near 0.5, then the projective power cross entropy between the underlying mixture distribution (1) and the Gaussian copula c_G(u; Σ) has two local minimizers near Σ₁ and Σ₂, respectively, so we propose to use these local minimizers to detect Σ₁ and Σ₂.
This paper is organized as follows. The γ-estimator and the MLE for the Gaussian copula model, together with a fixed point algorithm to obtain them, are given in Section 2. Section 3 states the relationship between the projective power entropy and the γ-estimator. We introduce an appropriate measure for the Gaussian copula model, since the projective power entropy is defined with respect to some carrier measure. Section 4 reveals the ability of the γ-estimator to detect heterogeneous structures. Section 5 elucidates the relationship between maximum entropy distributions and the γ-estimation. The robustness of the γ-estimator is discussed based on its influence function in Section 6. A simulation study is given in Section 7, and discussions are given in the last section. The proofs of all the theoretical results are provided in the appendix.

Estimation of the Gaussian Copula Model
In Section 2.1, the γ-estimator for the Gaussian copula model is discussed, and in the following subsection the MLE for the Gaussian copula model is given. The last subsection lays out a fixed point algorithm to obtain the γ-estimator and the MLE.

The γ-Estimator for the Gaussian Copula Model.
The density function of the Gaussian copula is given by
c_G(u; Σ) = det(Σ)^{−1/2} exp{−(1/2) x_G(u)^⊤ (Σ^{−1} − I_d) x_G(u)},
where x_G(u) = (Φ^{−1}(u₁), . . . , Φ^{−1}(u_d))^⊤, Φ denotes the cumulative distribution function of the standard normal distribution, Σ is a correlation matrix, and I_d is the identity matrix of size d. Let V(Σ) be the d(d − 1)/2-dimensional vector which consists of the column-wise stacked lower off-diagonal elements of Σ; the set of all d × d correlation matrices is the parameter space of the Gaussian copula model. Let u^(1), . . . , u^(n) be a random sample from a copula with probability density function c(u), while c_G(u; Σ) is our statistical model. The loss function associated with the projective power entropy introduced in Section 3.1 is, up to a constant,
L_γ(Σ) = −(1/γ) det(Σ)^{−γ/(2(1+γ))} (1/n) ∑_{i=1}^n exp{−(γ/2) x^(i)⊤ Σ^{−1} x^(i)},   (4)
where x^(i) = x_G(u^(i)) for i = 1, . . . , n. The γ-estimator is proposed as the set of local minimizers of L_γ(Σ) and is interpreted as follows. If L_γ(Σ) has one local minimum, the underlying distribution is estimated by c_G(u; Σ̂) using the minimizer Σ̂. If L_γ(Σ) has ℓ local minima (ℓ ≥ 2), the underlying distribution is estimated by a mixture of ℓ Gaussian copulas, each copula's parameter being estimated by the corresponding local minimizer.
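As a concrete illustration, the copula density and one plausible x-scale form of the loss can be coded directly. The `gamma_loss` below is an assumed reconstruction, L_γ(Σ) ∝ −(1/γ) det(Σ)^{−γ/(2(1+γ))} n^{−1} ∑ᵢ exp{−(γ/2) x^(i)⊤ Σ^{−1} x^(i)}, consistent with the Gaussian equivalence discussed in Section 3.2 but possibly differing from the paper's display (4) by a Σ-free constant.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_pdf(u, R):
    """Density c_G(u; R) of the Gaussian copula with correlation matrix R."""
    x = norm.ppf(u)                        # x_G(u) = (Phi^{-1}(u_1), ..., Phi^{-1}(u_d))
    d = len(x)
    quad = x @ (np.linalg.inv(R) - np.eye(d)) @ x
    return np.linalg.det(R) ** -0.5 * np.exp(-0.5 * quad)

def gamma_loss(X, R, gamma):
    """Empirical gamma-loss on x-scale data X (n x d), up to a Sigma-free
    constant; assumed form, see the lead-in above."""
    quad = np.einsum('ij,jk,ik->i', X, np.linalg.inv(R), X)   # x_i^T R^{-1} x_i
    scale = np.linalg.det(R) ** (-gamma / (2.0 * (1.0 + gamma)))
    return -(1.0 / gamma) * scale * np.mean(np.exp(-0.5 * gamma * quad))

p = gaussian_copula_pdf(np.array([0.3, 0.8]), np.eye(2))  # equals 1: at R = I the copula is uniform
rng = np.random.default_rng(0)
X = rng.standard_normal((50000, 2))                       # x-scale sample for independent uniforms
loss_at_identity = gamma_loss(X, np.eye(2), 0.5)          # ≈ -(1/0.5)*(1+0.5)^{-d/2} = -4/3 for d = 2
```

At R = I_d the population value of this loss is −(1/γ)(1 + γ)^{−d/2}, which the Monte Carlo average above reproduces.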

The MLE for the Gaussian Copula Model.
We consider the MLE for the Gaussian copula model in the same setting as in Section 2.1. The log-likelihood multiplied by −1/n is, up to a constant,
L₀(Σ) = (1/2) log det(Σ) + (1/(2n)) ∑_{i=1}^n x^(i)⊤ (Σ^{−1} − I_d) x^(i).
It is easy to see that L₀(Σ) and L_γ(Σ) satisfy lim_{γ→0} L_γ(Σ) = L₀(Σ) up to a constant, so the MLE will be deemed to be the 0-estimator in terms of the γ-estimator; in general, the γ-estimator can be regarded as an extension of the MLE. It is well known that the MLE does not work well under model misspecification. For example, in the case of (1) the MLE for the Gaussian copula model almost surely converges to εΣ₁ + (1 − ε)Σ₂, so we can detect neither Σ₁ nor Σ₂. If ε = 0.5 and
Σ₁ = ( 1  0.9 ; 0.9  1 ),  Σ₂ = ( 1  −0.9 ; −0.9  1 ),
then εΣ₁ + (1 − ε)Σ₂ is equal to the identity matrix, which carries no information in this situation. We cannot use the MLE in the case of misspecification.
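A quick numerical check of this failure mode, with Σ₁, Σ₂ having off-diagonal entries ±0.9 and ε = 0.5 (a sketch): pooling equal numbers of draws from the two copulas and computing the sample correlation of x_G(u) gives nearly the identity matrix.

```python
import numpy as np

# Sampling u ~ c_G(u; R) amounts to u = Phi(x) with x ~ N(0, R), so on the
# x-scale the 50/50 mixture of the two copulas is a 50/50 Gaussian mixture.
rng = np.random.default_rng(0)
n = 20000
R1 = np.array([[1.0, 0.9], [0.9, 1.0]])
R2 = np.array([[1.0, -0.9], [-0.9, 1.0]])
z = rng.standard_normal((n, 2))
pick = rng.random(n) < 0.5
X = np.where(pick[:, None], z @ np.linalg.cholesky(R1).T, z @ np.linalg.cholesky(R2).T)
pooled = np.corrcoef(X, rowvar=False)   # off-diagonal close to 0.5*0.9 + 0.5*(-0.9) = 0
```

The pooled correlation says nothing about either component, exactly as the limit εΣ₁ + (1 − ε)Σ₂ predicts.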

A Fixed Point
Algorithm to Obtain the γ-Estimator for the Gaussian Copula Model. We give a fixed point algorithm to obtain the γ-estimator for the Gaussian copula model using the Lagrange multiplier method. The appendix provides the details of the derivation of the algorithm. The same algorithm yields the MLE simply by setting γ = 0.
(1) Choose an initial correlation matrix Σ₀ and set t = 0.
(2) Given Σ_t, calculate Σ_{t+1} by the update formula (8), where ⊙ denotes the Hadamard product. Here Diag(A) for a square matrix A denotes the column vector which consists of the diagonal elements of A, and diag(a) for a vector a denotes the diagonal matrix whose diagonal elements are the components of a.
(3) For a sufficiently small given number δ > 0, repeat step (2) while ‖Σ_{t+1} − Σ_t‖ > δ, where ‖A‖ for a square matrix A denotes the matrix norm defined by √tr(A^⊤A).
If we consider the estimation problem for Gaussian distributions with mean 0, the update formula for an iterative algorithm to obtain the γ-estimator of the covariance matrix Σ is given by (12); see [5] for details. If we consider the optimization problem with objective function L_γ(Σ) without the constraint that the diagonal elements of Σ are 1, the same iteration (12) is obtained, so the second term on the right-hand side of (8) appears because of the constraint. We make a remark on the algorithm to obtain the MLE, that is, the γ-estimator with γ = 0: even in the simple case d = 2, the solution of the MLE takes a rather complicated form. In [1], an approximate MLE for the Gaussian copula model is presented, because solving the constrained optimization problem to obtain the exact MLE in high dimensions is time-consuming. The approximate MLE is given by (13), where diag(A) is the diagonal matrix whose diagonal elements are equal to those of A. We can likewise consider an iterative algorithm for an approximate γ-estimator by combining (12) and (13); its update formula is (14), yielding an estimator Σ̂*. As the sample size n tends to infinity, Σ̂ and Σ̂* converge to the same correlation matrix; for finite n, however, they differ in general, and Σ̂ is preferred to Σ̂* in terms of accuracy.
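A minimal sketch of the approximate iteration in the spirit of (12)-(13): iterate an unconstrained covariance update with weights exp{−(γ/2) x^⊤ Σ^{−1} x} (the weight form is an assumption consistent with Section 2.1), then rescale to unit diagonal. This mimics Σ̂*, not the exact constrained algorithm.

```python
import numpy as np

def gamma_corr_estimate(X, gamma, n_iter=200, tol=1e-8):
    """Approximate gamma-estimator of a correlation matrix: unconstrained
    covariance fixed point iteration, then rescaling to unit diagonal."""
    S = np.cov(X, rowvar=False, bias=True)        # initial value: sample covariance
    for _ in range(n_iter):
        quad = np.einsum('ij,jk,ik->i', X, np.linalg.inv(S), X)
        w = np.exp(-0.5 * gamma * quad)           # downweights points far from the current fit
        S_new = (1.0 + gamma) * (X.T * w) @ X / w.sum()
        done = np.linalg.norm(S_new - S) < tol
        S = S_new
        if done:
            break
    d_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
    return d_inv @ S @ d_inv                      # rescale to a correlation matrix

rng = np.random.default_rng(1)
R_true = np.array([[1.0, 0.6], [0.6, 1.0]])
X = rng.standard_normal((20000, 2)) @ np.linalg.cholesky(R_true).T
R_hat = gamma_corr_estimate(X, gamma=0.5)         # off-diagonal close to 0.6
```

On clean data the iteration is consistent (Σ = Σ₀ solves the population fixed point equation, since the (1 + γ) factor exactly undoes the shrinkage induced by the weights), while for γ > 0 the exponential weights bound the influence of outliers.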

Projective Power Entropy and the γ-Estimator
In Section 3.1 the projective power entropy and the γ-estimator are given. In the next subsection we discuss an appropriate measure for the Gaussian copula model.

Projective Power Entropy and the γ-Estimator.
The projective power entropy of f(x) with index γ and carrier measure μ is denoted by H_γ(f | μ). If μ is the Lebesgue measure, denoted by Λ, and f(x) is a probability density function, then the γ → 0 limit of H_γ(f | Λ) recovers, up to a constant, the Boltzmann-Shannon entropy. The projective power cross entropy between g(x) and f(x) with index γ and measure μ is denoted by C_γ(g, f | μ), and the projective power divergence is D_γ(g, f | μ) = C_γ(g, f | μ) − H_γ(g | μ). It satisfies D_γ(g, f | μ) ≥ 0, with D_γ(g, f | μ) = 0 if and only if f(x) = g(x), so D_γ(g, f | μ) can be seen as a kind of distance between g and f.
Let x^(1), . . . , x^(n) be a random sample from a probability density function g(x) and f(x; θ) a statistical model. Since we want to find the distribution closest to g(x) within the model f(x; θ), we seek the minimizer of D_γ(g, f(·; θ) | μ), which is equal to the minimizer of C_γ(g, f(·; θ) | μ). If μ has Radon-Nikodym derivative m, then C_γ(g, f(·; θ) | μ) can be written as an expectation under g and hence can be estimated empirically by its sample analogue L_γ(θ | μ), which is called the loss function; note that the expectation of the loss function recovers C_γ(g, f(·; θ) | μ). The original γ-estimator θ̂ is defined as the minimizer of the loss function L_γ(θ | μ). Note, however, that we are not necessarily seeking the global minimizer; rather, we allow the loss function to be multimodal and refer to the set of its local minimizers as the γ-estimator. See [4,5] for details of the γ-estimator.
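For concreteness, one commonly used parameterization of these quantities (following [4,5]; the constants here may differ from the authors' display by factors free of f) is

```latex
H_\gamma(f \mid \mu) = -\frac{1}{\gamma}\Big(\int f^{1+\gamma}\,d\mu\Big)^{\frac{1}{1+\gamma}},
\qquad
C_\gamma(g, f \mid \mu) = -\frac{1}{\gamma}\,
\frac{\int g\, f^{\gamma}\,d\mu}{\big(\int f^{1+\gamma}\,d\mu\big)^{\frac{\gamma}{1+\gamma}}},
\qquad
D_\gamma(g, f \mid \mu) = C_\gamma(g, f \mid \mu) - H_\gamma(g \mid \mu).
```

With these forms, H_γ(f | μ) = C_γ(f, f | μ); by Hölder's inequality, D_γ(g, f | μ) ≥ 0 for γ > 0 with equality if and only if f ∝ g (hence f = g for probability densities — the scale invariance in f is the sense in which the entropy is "projective"); and for a probability density f, lim_{γ→0} (H_γ(f | Λ) + 1/γ) = −∫ f log f dΛ, the Boltzmann-Shannon entropy up to an additive constant.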

Choice of the Carrier Measure.
In calculating the γ-estimator, the carrier measure μ can be chosen by the user.
Here we propose, for Gaussian copula models, the use of a measure, denoted by ν_G, whose Radon-Nikodym derivative with respect to the Lebesgue measure is {∏_{j=1}^d φ(Φ^{−1}(u_j))}^γ, where φ denotes the standard normal density. From now on we refer to this choice as ν_G and explain its rationale by virtue of invariance. We assume that x = (x₁, . . . , x_d)^⊤ ∼ f(x; Σ), where f(x; Σ) denotes the probability density function of the d-dimensional Gaussian distribution with mean 0 and correlation matrix Σ. It is noteworthy that the projective power cross entropy between g(x) and f(x; Σ) based on x is not always equal to the projective power cross entropy between c(u) and c_G(u; Σ) based on u, so the γ-estimator based on x does not in general coincide with the γ-estimator based on u.
It is natural for us to require the equivalence of the two γ-estimators, and therefore we employ the measure ν_G(u). It is striking that the projective power cross entropy between c(u) and c_G(u; Σ) calculated under the measure ν_G is equal to the projective power cross entropy between g(x) and f(x; Σ) calculated under the Lebesgue measure Λ; this common quantity is displayed in (26). The two γ-estimators therefore coincide. Note that the loss function associated with the cross entropy (26) becomes (4). The argument above extends to a general statement. For a given one-to-one transformation y(x) : x → y, let x(y) denote the inverse function and J(y → x) the Jacobian of the transformation x(y). Nonnegative functions a(x) and b(x) satisfy the corresponding equality of integrals if and only if the Radon-Nikodym derivative of ν is equal to J(y → x)^{−γ}. When a(x) and b(x) are probability density functions, considering the projective power cross entropy on x under the Lebesgue measure is therefore equivalent to considering the projective power cross entropy on y under the measure having J(y → x)^{−γ} as its Radon-Nikodym derivative.
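The "if" direction of this statement can be verified in one line. Under u = y(x), densities transform as ã(u) = a(x(u)) J(y → x), and likewise for b, so if dν/du = J(y → x)^{−γ}, then

```latex
\int \tilde a(u)\,\tilde b(u)^{\gamma}\,d\nu(u)
= \int a(x(u))\,b(x(u))^{\gamma}\,J^{1+\gamma}\,J^{-\gamma}\,du
= \int a(x(u))\,b(x(u))^{\gamma}\,J\,du
= \int a(x)\,b(x)^{\gamma}\,dx .
```

Specializing to the coordinatewise map u_j = Φ(x_j) gives J(y → x) = ∏_j φ(Φ^{−1}(u_j))^{−1}, and hence dν_G/du = {∏_j φ(Φ^{−1}(u_j))}^γ.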

Properties of the γ-Estimator
The γ-estimator for the Gaussian copula model under infinite sample size is equal to the set of local minimizers of C_γ(c, c_G(·; Σ) | ν_G). In this section we leave aside the empirical loss function L_γ(Σ) for the moment and investigate the properties of the γ-estimator (at infinity) through C_γ(c, c_G(·; Σ) | ν_G). First we consider the case where there is no misspecification, that is, c(u) = c_G(u; Σ₀). In this case the γ-estimator is equal to {Σ₀}, which implies Fisher consistency. As for asymptotic properties, the γ-estimator enjoys consistency and asymptotic normality.
Next we consider the misspecified case where the true data-generating process is given by (1). We see that C_γ(c, c_G(·; Σ) | ν_G) is proportional to a weighted mean of the two projective power cross entropies C_γ(c_G(·; Σ₁), c_G(·; Σ) | ν_G) and C_γ(c_G(·; Σ₂), c_G(·; Σ) | ν_G). Each component is a unimodal function of Σ, bounded above by 0, with one local minimum at Σ₁ and Σ₂, respectively. So we expect that C_γ(c, c_G(·; Σ) | ν_G) has two local minima, with local minimizers near Σ₁ and Σ₂, respectively, if Σ₁ and Σ₂ are sufficiently "distinct." It is hard to formulate this phenomenon mathematically, so we show through simple examples and a graph that it occurs. To obtain numerical solutions, we use the expected (or population) version of the algorithm in Section 2.3.
As these examples show, C_γ(c, c_G(·; Σ) | ν_G) can have several local minima, depending on the underlying distribution. Owing to this property we can detect heterogeneous structures of the underlying distribution under misspecification.
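In the bivariate case the phenomenon can be probed numerically by scanning the empirical loss over the off-diagonal parameter ρ. The loss below is an assumed x-scale reconstruction (up to a Σ-free constant, consistent with Section 3.2); how many local minimizers appear depends on the underlying distribution, the dimension, and γ.

```python
import numpy as np

def gamma_loss(X, R, gamma):
    """Assumed x-scale gamma-loss, up to a Sigma-free constant."""
    quad = np.einsum('ij,jk,ik->i', X, np.linalg.inv(R), X)
    scale = np.linalg.det(R) ** (-gamma / (2.0 * (1.0 + gamma)))
    return -(1.0 / gamma) * scale * np.mean(np.exp(-0.5 * gamma * quad))

def local_minimizers(X, gamma, grid):
    """Interior grid points where the loss is below both neighbours."""
    vals = [gamma_loss(X, np.array([[1.0, r], [r, 1.0]]), gamma) for r in grid]
    return [grid[i] for i in range(1, len(grid) - 1)
            if vals[i] < vals[i - 1] and vals[i] < vals[i + 1]]

# Equal mixture of two Gaussian copulas with off-diagonal +/-0.9, on the x-scale.
rng = np.random.default_rng(2)
n = 4000
z = rng.standard_normal((n, 2))
pick = rng.random(n) < 0.5
L1 = np.linalg.cholesky(np.array([[1.0, 0.9], [0.9, 1.0]]))
L2 = np.linalg.cholesky(np.array([[1.0, -0.9], [-0.9, 1.0]]))
X = np.where(pick[:, None], z @ L1.T, z @ L2.T)
mins = local_minimizers(X, gamma=0.5, grid=np.linspace(-0.95, 0.95, 39))
```

Each element of `mins` is a candidate component, to be refined by the fixed point algorithm of Section 2.3 started from the corresponding R(ρ).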

Maximum Entropy Distribution
So far we have considered the γ-estimation of the Gaussian copula model. In this section we show that the choice of copula model can be characterized in terms of maximum entropy distributions. In this regard, the most closely related work is [2], which addresses the MLE for the meta-t-distribution. A t-copula is deduced from a multivariate t-distribution, while the meta-t-distribution is constructed by linking a t-copula to univariate t-distributions as its marginal distributions. In our framework, the work of [2] can be interpreted as maximum likelihood estimation of t-copulas with the marginals estimated simultaneously. Indeed, the γ-estimation of Gaussian copulas and the maximum likelihood estimation of t-copulas look very similar and share a common idea.
In [4], it is analyzed what the maximum projective power entropy distributions are under a given (population) mean vector and covariance matrix. The answer depends on the power index γ. When γ = 0, the Gaussian distribution emerges as the maximum projective power entropy distribution; if γ < 0, the t-distribution comes up. We show that a similar result holds for copulas. Let T_ν be the cumulative distribution function of the t-distribution with ν degrees of freedom, and let x_{t,ν}(u) = (T_ν^{−1}(u₁), . . . , T_ν^{−1}(u_d))^⊤. We suppose that γ = −2/(ν + d), and let ν_{t,ν} denote the measure on [0, 1]^d induced in the same manner as ν_G, with T_ν in place of Φ. Let C(Σ) be the set of probability density functions c(u) on [0, 1]^d which satisfy a fixed second-moment condition on x_{t,ν}(u). Let f_t(x; ν, Σ) denote the probability density function of the t-distribution with ν degrees of freedom and correlation matrix Σ, and let c_t(u; ν, Σ) denote the probability density function of its copula (the t-copula). If γ → 0, then ν → ∞, T_ν → Φ, and ν_{t,ν} → Λ. We see that the t-copula is characterized as the maximum projective power entropy distribution on [0, 1]^d within C(Σ); moreover, it has limiting equivalence (letting γ → 0) with the Gaussian copula, which is tagged with the maximum Boltzmann-Shannon entropy distribution. We call these maximum projective power entropy copulas the γ-copulas. Let us consider the relationship between the γ-copula and the γ-estimation. Our method is built on the pair of the Gaussian copula (0-copula) and the γ-estimator; on the other hand, [2] works with the pair of the t-copula model (γ < 0) and the MLE (0-estimator). We see a sort of duality between the two choices of pairs.
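For reference (standard formulas, not the authors' display): with x = x_{t,ν}(u) and f_ν the univariate t density with ν degrees of freedom, the t-copula density is the joint t density divided by the product of its t marginals,

```latex
c_t(u;\nu,\Sigma) = \frac{f_t(x;\nu,\Sigma)}{\prod_{j=1}^{d} f_\nu(x_j)},
\qquad
f_t(x;\nu,\Sigma) = \frac{\Gamma\!\big(\tfrac{\nu+d}{2}\big)}
{\Gamma\!\big(\tfrac{\nu}{2}\big)\,(\nu\pi)^{d/2}\,|\Sigma|^{1/2}}
\Big(1+\tfrac{x^{\top}\Sigma^{-1}x}{\nu}\Big)^{-\frac{\nu+d}{2}} .
```

As ν → ∞ (i.e., γ = −2/(ν + d) → 0), f_t tends to the Gaussian density and c_t to the Gaussian copula, which is the limiting equivalence mentioned above.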

Robustness
In this section we examine the robustness of the γ-estimator for the Gaussian copula model through its influence function.

ISRN Probability and Statistics
The influence function measures the asymptotic bias caused by contamination at a point x. Boundedness of the influence function means that the influence of an outlier is bounded, hence robustness. The influence function of the γ-estimator is given in Section 6.1; we show that it is bounded when γ > 0. In the next subsection, a brief simulation is performed.
6.1. Influence Function. The γ-estimator for the Gaussian copula model can be regarded as a functional Σ(G) of a distribution G, defined as a local minimizer of the corresponding population loss. Let s_γ(x; Σ) be the associated estimating function. Then the influence function IF(x; Σ, G) of the γ-estimator is expressed in terms of s_γ(x; Σ) and ṡ_γ(x; Σ) = (∂/∂V(Σ)) s_γ(x; Σ); see [7] for details. The boundedness of the influence function is equivalent to the boundedness of s_γ(x; Σ). The following theorem gives a bound for s_γ(x; Σ).
Theorem 5. When γ = 0, that is, for the MLE, the influence function is not bounded; when γ < 0 it is likewise not bounded. When γ > 0, the influence function is bounded and an explicit bound can be given, in which ⊗ denotes the Kronecker product and ‖h‖ for a d-dimensional vector h denotes the Euclidean norm defined by √(h^⊤h).
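The mechanism behind Theorem 5 can be illustrated in one dimension: score-type terms grow polynomially in x, while the factor f(x; Σ)^γ decays like exp(−γx²/2) for γ > 0, so their product stays bounded; at γ = 0 the damping disappears. This is a schematic proxy (an assumption for illustration), not the exact bound of the theorem.

```python
import numpy as np

def weighted_score(t, gamma):
    """Score-type magnitude t^2 damped by the density power exp(-gamma t^2 / 2);
    a one-dimensional schematic proxy, see the lead-in."""
    return t**2 * np.exp(-0.5 * gamma * t**2)

t = np.linspace(0.0, 50.0, 5001)
m_bounded = weighted_score(t, 0.5).max()   # ≈ 4/e ≈ 1.4715, attained near t = 2
m_mle = weighted_score(t, 0.0).max()       # = 2500.0: grows with the grid, unbounded
```

For γ > 0 the maximum of t² exp(−γt²/2) is (2/γ)e^{−1}, attained at t = √(2/γ), while for γ = 0 the term is simply t² and diverges.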

6.2. Simulation. This subsection describes the results of
Monte Carlo simulations carried out to examine the robustness of the γ-estimator for the Gaussian copula model. We generate 500 pseudorandom samples of size 500 from the contaminated distribution 0.9 c_G(u; Σ) + 0.1 c_G(u; I₁₀), where c_G(u; I₁₀) is the independence copula and Σ is a specified correlation matrix.
For each sample, we calculate the γ-estimator Σ̂_GE for the Gaussian copula model with γ = 0.5 and the MLE Σ̂_MLE for the Gaussian copula model. We use the norm ‖Σ̂ − Σ‖ as the accuracy measure. Table 1 shows the root mean squared error (RMSE) of the norm for the γ-estimator and the MLE. The norm for the γ-estimator is smaller than that for the MLE, so the γ-estimator is more robust than the MLE.

Simulation Study
The ability of the γ-estimator to detect heterogeneous structures is investigated through a series of simulations. A comparison of the γ-estimator with the MLE for the mixture Gaussian copula (1) is also given.

Simulation Setup.
We conducted two kinds of simulation.

Simulation 1. Suppose that the underlying distribution is the mixture (1), with the parameter values specified in (48). The γ-estimator for the Gaussian copula model with γ = 0.7 is investigated. The initial values of Σ used in calculating the γ-estimator are R(±0.1), R(±0.3), . . . , R(±0.9), where R(ρ) is the correlation matrix whose (i, j) component (i < j) is equal to ρ^{j−i}. If the γ-estimator has two components Σ̂₁ and Σ̂₂ such that Σ̂₁ is the one closer to Σ₁, then Σ̂₁ is thought of as an estimator of Σ₁ and denoted by Σ̂_{1,GE}; similarly Σ̂₂ for Σ₂, denoted by Σ̂_{2,GE}. We also adopt the MLE for the mixture Gaussian copula model (1). Although Σ₁ and Σ₂ are correlation matrices, we tentatively view them as covariance matrices and use the EM algorithm to obtain an approximate MLE. The obtained estimators Σ̂₁ and Σ̂₂ are not necessarily correlation matrices, so they are rescaled by diag(Σ̂_k) into correlation matrices, denoted by Σ̂_{k,MLE} for k = 1, 2. The initial value of (ε, Σ₁, Σ₂) used in calculating the MLE is set to (0.5, R(0.5), R(−0.5)).
A data set of size n (n = 500, 1000) was generated from (1), and the norms ‖Σ̂_{1,GE} − Σ₁‖, ‖Σ̂_{2,GE} − Σ₂‖, ‖Σ̂_{1,MLE} − Σ₁‖, and ‖Σ̂_{2,MLE} − Σ₂‖ were calculated. This was repeated 500 times, and the RMSE of each norm was computed from the 500 values obtained. The results are shown in Table 3.

Simulation 2. Suppose that the underlying distribution is a mixture with component weights ε₁ = ε₂ = 0.45, where Σ₁ and Σ₂ are the same as in Simulation 1. The other settings are also the same as in Simulation 1. The results are shown in Table 5.

Results
Result of Simulation 1. Table 2 shows the rate at which the γ-estimator detects the two correlation matrices. For n = 500 nearly 80 percent of the runs were successful, and for n = 1000 detection worked almost perfectly. From Table 3, the MLE performed better than the γ-estimator; this is natural, however, because in this simulation the MLE is used under no misspecification.
Result of Simulation 2. Table 4 shows the rate at which the γ-estimator detects the two correlation matrices. Compared to Simulation 1, the detection rate at n = 500 is worse, while at n = 1000 the result is almost the same as in Table 2. From Table 5, we find that the MLE underperforms considerably and the γ-estimator is much better.

Discussion
We have considered an estimation problem for a misspecified Gaussian copula model. The simulation study shows that our methodology works well under misspecification. Although we did not consider how to determine the value of γ, this problem was considered in [9] for independent component analysis; it may be possible to adapt their method to our problem, which we leave for future work. We chose the carrier measure ν_G in terms of invariance; the resulting γ-estimator coincides with the γ-estimator obtained by taking the normal distribution as the statistical model, so the choice also seems natural. If we instead use the Lebesgue measure in calculating the γ-estimator for the Gaussian copula model, the projective power entropy cannot be computed for all values of γ and Σ.
Another issue is to what extent the methodology works for time series data. Because the basic premise of this paper is that we have data as quantiles, our method would fit, for example, the modeling of an unconditional loss distribution [1, page 28]. Such a case is of particular interest when the time horizon over which losses are measured is relatively large. When working on conditional modeling, our method should be regarded as a tool for post-hoc analysis. As a typical case, we may want to apply our mixture copula approach to multivariate log-return series which have been appropriately standardized and declustered by a multivariate GARCH model fitted to them. See [2] for more details.

A. Derivation for the Algorithm
We derive the estimating equation for Σ. Since Σ is symmetric and positive definite, there exists a matrix B of size d which satisfies Σ = BB^⊤. The jth diagonal element of Σ is expressed as e_j^⊤ B B^⊤ e_j, where e_j is the d-dimensional column vector whose jth element is 1 and whose other elements are 0. Since the diagonal elements of Σ are equal to 1, the Lagrange function becomes (A.1), where λ = (λ₁, . . . , λ_d)^⊤ is the Lagrange multiplier. We differentiate (A.1) with respect to Σ^{−1} using the technique in [10]. The differential of ∑_{j=1}^d λ_j e_j^⊤ B B^⊤ e_j, in the sense of [10, Sections 5.3 and 5.16], is (A.2), where diag(λ) is the diagonal matrix whose diagonal elements are λ₁, . . . , λ_d. From Table 2 in [10, Chapter 9] we have (A.3). Setting the derivative of (A.1) to zero gives (A.4). Multiplying (A.4) on the left, it becomes (A.5). From the constraint on the diagonal elements of Σ, we have (A.6). In general, for any square matrices A and B of size d and any d-dimensional column vector x, we have (A.7).