One of the most computationally convenient nonredundant ways to describe the dependence between two variables is by describing the corresponding copula. In many applications, a special class of copulas—known as FGM copulas—turned out to be most successful in describing the dependence between quantities. The main result of this paper is that these copulas are the fastest to compute, and this explains their empirical success. As an auxiliary result, we also show that a similar explanation can be given in terms of fuzzy logic.
1. Introduction
What Is a Copula? A Brief Reminder. In many practical situations, we know the distribution of each of the two random variables X and Y, and we now need to also describe their joint distribution.
The distribution of each of the random variables can be described by the corresponding cumulative distribution functions $F_X(x) \stackrel{\text{def}}{=} \mathrm{Prob}(X \le x)$ and $F_Y(y) \stackrel{\text{def}}{=} \mathrm{Prob}(Y \le y)$.
Similarly, to describe their joint distribution, we can use the corresponding 2D cumulative distribution function (cdf)
$$F_{XY}(x,y) \stackrel{\text{def}}{=} \mathrm{Prob}(X \le x \wedge Y \le y). \tag{1}$$
In principle, we can thus try to determine the values FXY(x,y) corresponding to all possible pairs (x,y). However, from the practical viewpoint, this is redundant; indeed
the 2D cdf FXY(x,y) also contains information about the 1D cdfs FX(x) and FY(y), as FX(x)=FXY(x,+∞) and FY(y)=FXY(+∞,y),
so if we determine all the values FXY(x,y), we will also be determining the values FX(x) and FY(y), but
we consider the cases when the 1D cdf values are already known, so soliciting them again is unnecessary.
It is therefore desirable to describe the dependence between X and Y in a nonredundant way, so that
from this description, we will not be able to extract the known 1D cdfs, but
from this information and from the 1D cdfs, we will be able to extract the 2D cdf.
Such a nonredundant description is indeed known: it is a copula C(u,v), a function from [0,1]×[0,1] to [0,1] for which, for all real numbers x and y, we have
$$F_{XY}(x,y) = C\bigl(F_X(x), F_Y(y)\bigr); \tag{2}$$
see, for example, [1–5].
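As a minimal sketch of formula (2), the following Python code builds a joint cdf from two marginal cdfs and a copula. The exponential marginals, their rates, and the helper names `exp_cdf` and `joint_cdf` are illustrative assumptions that do not come from the paper; the independence copula C(u,v)=u·v is used only because it is the simplest example.

```python
import math

def exp_cdf(rate):
    """Return the cdf of an exponential distribution (an assumed, illustrative marginal)."""
    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0
    return cdf

def independence_copula(u, v):
    """C(u, v) = u * v, the copula of two independent variables."""
    return u * v

def joint_cdf(copula, f_x, f_y):
    """Formula (2): F_XY(x, y) = C(F_X(x), F_Y(y))."""
    def cdf(x, y):
        return copula(f_x(x), f_y(y))
    return cdf

f_x = exp_cdf(1.0)   # assumed marginal cdf of X
f_y = exp_cdf(2.0)   # assumed marginal cdf of Y
f_xy = joint_cdf(independence_copula, f_x, f_y)

# Sending y to +infinity recovers the marginal of X: F_X(x) = F_XY(x, +inf).
print(f_xy(0.5, math.inf), f_x(0.5))
```

The copula is passed in as an ordinary function, so the same `joint_cdf` helper can be reused with any other copula without touching the marginals.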
Properties of Copulas. Not every function C(u,v) is a copula for an appropriate 2D distribution. For a function to be a copula, it has to satisfy some properties. In this paper, we will use the following properties, which can be easily derived from the definition of the copula:
$$C(0,v) = C(u,0) = 0; \qquad C(1,v) = v; \qquad C(u,1) = u. \tag{3}$$
FGM Copulas and Their Success. There exist many different copulas. Interestingly, in many practical applications, the following Farlie–Gumbel–Morgenstern (FGM) copula turns out to be very successful:
$$C(u,v) = u\cdot v + \theta\cdot u\cdot(1-u)\cdot v\cdot(1-v) \tag{4}$$
for θ∈[-1,1]. The original papers are [6–8]; see, for example, [9, 10] and references therein for the latest results.
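The FGM formula (4) and the boundary properties (3) can be checked numerically. The sketch below is illustrative only: the function names, the grid, and the tolerance are assumptions, and a check on finitely many points is a sanity test rather than a proof.

```python
def fgm_copula(u, v, theta):
    """FGM copula (4): u*v + theta*u*(1-u)*v*(1-v), with theta in [-1, 1]."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    return u * v + theta * u * (1.0 - u) * v * (1.0 - v)

def satisfies_boundary_conditions(copula, grid, tol=1e-12):
    """Check properties (3) on a finite grid of points in [0, 1]."""
    return all(
        abs(copula(0.0, t)) <= tol          # C(0, v) = 0
        and abs(copula(t, 0.0)) <= tol      # C(u, 0) = 0
        and abs(copula(1.0, t) - t) <= tol  # C(1, v) = v
        and abs(copula(t, 1.0) - t) <= tol  # C(u, 1) = u
        for t in grid
    )

grid = [i / 10 for i in range(11)]
for theta in (-1.0, 0.0, 0.5, 1.0):
    ok = satisfies_boundary_conditions(lambda u, v: fgm_copula(u, v, theta), grid)
    print(theta, ok)
```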
Why? To the best of our knowledge, until now, there was no convincing explanation of why FGM copulas are so empirically successful. In this paper, we provide such an explanation.
2. Materials and Methods
2.1. Explanation Based on Computational Complexity: Main Result
Statistical Data Processing Is Computing. Statistical data processing involves a large amount of computing. With the ever increasing amount of data, processing all this data requires more and more computation time—often to the extent that we exceed the capabilities of our computers.
From this viewpoint, it is desirable to select techniques which are as computationally efficient as possible. With respect to copulas, this means that we should select copulas C(u,v) whose values are the easiest (thus, the fastest) to compute.
Which Functions Are the Fastest to Compute? In computers, the only operations that hardware supports exactly are addition, subtraction, and multiplication. Everything else, from division to special functions such as exp(x), sin(x), and so on, is approximated by a sequence of these elementary hardware-supported operations. The more accuracy we need, the more elementary operations we need, and, thus, the longer the corresponding computations will be.
Therefore, the fastest-to-compute functions are functions that can be exactly represented as a sequence of elementary operations: in this case, the number of elementary operations remains the same no matter what accuracy we desire in our computations. In other words, we are looking for functions which can be obtained from constants and original quantities x1,…,xn by applying addition, subtraction, and multiplication. One can easily see that such functions are polynomials; indeed
every polynomial is a sum of monomials, and each monomial is a product of a constant and variables, so each polynomial is indeed a superposition of additions and multiplications;
and vice versa, each constant and each variable are polynomials, and the sum, the difference, and the product of two polynomials are also polynomials; thus, by induction, we can prove that every superposition of addition, subtraction, and multiplication is a polynomial.
Not all polynomials are equally easy or equally difficult to compute. Out of the three elementary operations, the most time-consuming operation is multiplication. Thus, the fewer the multiplications are, the faster the computation of the corresponding function is.
With one multiplication, performed in parallel, we can compute linear functions $a_0+\sum_{i=1}^{n} c_i\cdot x_i$ and also products $x_i\cdot x_j$ of two variables.
By applying a second multiplication to the results of the first one, we can compute 3rd degree polynomials, or products of 4 variables, and so on.
In general, the higher the degree, the more time is needed to compute the corresponding polynomial.
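To make the operation count concrete, the following sketch tallies the additions/subtractions and multiplications used in one evaluation of a degree-4 polynomial of the FGM type. The `Counted` wrapper and the factored form `fgm_factored` are hypothetical helpers introduced only for this illustration. The sketch counts operations sequentially, which is a simplification of the parallel model described above.

```python
class Counted:
    """A float wrapper that tallies additions/subtractions and multiplications."""
    counts = {"add_sub": 0, "mul": 0}

    def __init__(self, value):
        self.value = float(value)

    @staticmethod
    def _val(other):
        return other.value if isinstance(other, Counted) else float(other)

    def __add__(self, other):
        Counted.counts["add_sub"] += 1
        return Counted(self.value + self._val(other))

    __radd__ = __add__  # addition is commutative

    def __sub__(self, other):
        Counted.counts["add_sub"] += 1
        return Counted(self.value - self._val(other))

    def __rsub__(self, other):
        Counted.counts["add_sub"] += 1
        return Counted(self._val(other) - self.value)

    def __mul__(self, other):
        Counted.counts["mul"] += 1
        return Counted(self.value * self._val(other))

    __rmul__ = __mul__  # multiplication is commutative


def fgm_factored(u, v, theta):
    """u*v*(1 + theta*(1-u)*(1-v)), algebraically equal to u*v + theta*u*(1-u)*v*(1-v)."""
    return u * v * (1 + theta * (1 - u) * (1 - v))


result = fgm_factored(Counted(0.3), Counted(0.7), 0.5)
print(result.value, Counted.counts)
```

The tally is fixed by the formula and does not depend on the numerical accuracy required, which is the property that singles out polynomials in the argument above.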
Resulting Idea. From the viewpoint of selecting fastest-to-compute copulas, we should select polynomial copulas and, among them, copulas of the smallest possible degree.
Let us describe the results of such a selection.
Proposition 1.
Every polynomial copula has the form
$$C(u,v) = u\cdot v + \theta(u,v)\cdot u\cdot(1-u)\cdot v\cdot(1-v) \tag{5}$$
for some polynomial θ(u,v).
Comments
For the reader’s convenience, the proof is placed in a separate section, Section 2.2.
As a consequence of this proposition, we get the following results.
Corollary 2.
The only polynomial copula of degree at most 3 is C(u,v)=u·v.
Comment. This copula is actually of 2nd degree; it corresponds to the case of two independent variables. Thus, to describe dependence, we need to consider polynomials of higher degree.
Corollary 3.
The only polynomial copulas of 4th degree are FGM copulas.
Comments
This result explains the empirical success of the FGM copulas: among copulas describing true dependence, they are the easiest to compute.
Since the FGM copulas are symmetric C(u,v)=C(v,u), asymmetric dependence requires higher-degree polynomial copulas.
An alternative explanation of the FGM formulas, based on fuzzy logic, is given in Section 2.3.
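Both corollaries follow from Proposition 1 by counting degrees. A short sketch of the argument, in the notation of formula (5):

```latex
\begin{align*}
C(u,v) - u\cdot v &= \theta(u,v)\cdot u\cdot(1-u)\cdot v\cdot(1-v),\\
\deg\bigl(C(u,v) - u\cdot v\bigr) &= \deg\theta + 4 \quad \text{whenever } \theta \not\equiv 0 .
\end{align*}
```

If C has degree at most 3, then so does C(u,v)-u·v, which is impossible for a nonzero θ; hence θ is identically 0 and C(u,v)=u·v. If C has degree 4, then θ has degree 0, so θ is a constant and C has the FGM form (4). The admissible range of this constant is not determined by the properties (3) alone; it comes from the remaining requirements in the definition of a copula.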
2.2. Explanation Based on Computational Complexity: Proof of the Main Result
(1) The first condition on the copula, the condition that C(0,v)=0 for all v, means that if u=0, then C(u,v)=0.
An arbitrary polynomial C(u,v) can be represented as
$$C(u,v) = C_0(v) + u\cdot C_1(u,v), \tag{6}$$
where C0(v) is the sum of all the monomials that do not contain u and C1(u,v) is the result of dividing all u-containing monomials by u.
For u=0, the condition C(0,v)=0 means that C0(v)=0 for all v. Thus, C(u,v)=u·C1(u,v) for some polynomial C1(u,v).
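(2) The condition C(u,0)=u·C1(u,0)=0 for all u≠0 implies that C1(u,0)=0 for all u (a polynomial in u that vanishes for all u≠0 vanishes identically) and, thus, by the same argument as in part 1, that C1(u,v)=v·C2(u,v) for some polynomial C2(u,v). Therefore,
$$C(u,v) = u\cdot C_1(u,v) = u\cdot v\cdot C_2(u,v). \tag{7}$$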
(3) The condition C(1,v)=v takes the form v·C2(1,v)=v, so C2(1,v)=1, and so the polynomial $f(u,v) \stackrel{\text{def}}{=} C_2(u,v)-1$ is equal to 0 when u=1, that is, when 1-u=0.
(4) Similarly to part 1 of this proof, this implies that
$$C_2(u,v) - 1 = (1-u)\cdot C_3(u,v) \tag{8}$$
for some polynomial C3(u,v). Similarly, the condition C(u,1)=u implies that C3(u,v)=(1-v)·C4(u,v) for some polynomial C4(u,v). Thus,
$$C_2(u,v) - 1 = (1-u)\cdot C_3(u,v) = (1-u)\cdot(1-v)\cdot C_4(u,v), \tag{9}$$
and hence
$$C_2(u,v) = 1 + (1-u)\cdot(1-v)\cdot C_4(u,v), \qquad C(u,v) = u\cdot v\cdot C_2(u,v) = u\cdot v\cdot\bigl(1 + (1-u)\cdot(1-v)\cdot C_4(u,v)\bigr). \tag{10}$$
This is the desired formula, with θ(u,v)=C4(u,v).
The proposition is proven.
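The four parts of the proof can be traced mechanically on a concrete polynomial. The sketch below stores a polynomial in u and v as a dictionary of coefficients and recovers θ(u,v) by the same sequence of divisions. The representation and all function names are assumptions made for this illustration, and exact `Fraction` arithmetic is used so that the divisibility checks are not affected by rounding.

```python
from fractions import Fraction

# A polynomial in u and v is stored as {(i, j): coefficient} for coefficient * u**i * v**j.

def divide_by_variable(poly, axis):
    """Divide by u (axis=0) or v (axis=1); every monomial must contain that variable."""
    quotient = {}
    for exps, coeff in poly.items():
        if exps[axis] == 0:
            raise ValueError("polynomial does not vanish when the variable is 0")
        reduced = list(exps)
        reduced[axis] -= 1
        quotient[tuple(reduced)] = coeff
    return quotient

def divide_by_one_minus_variable(poly, axis):
    """Divide by (1 - u) (axis=0) or (1 - v) (axis=1) via prefix sums of coefficients."""
    rows = {}
    for exps, coeff in poly.items():
        rows.setdefault(exps[1 - axis], {})[exps[axis]] = coeff
    quotient = {}
    for other, row in rows.items():
        running = Fraction(0)
        top = max(row)
        for k in range(top + 1):
            running += row.get(k, 0)
            if k < top and running != 0:
                key = (k, other) if axis == 0 else (other, k)
                quotient[key] = running
        if running != 0:  # the row's value at 1 must be 0 for exact divisibility
            raise ValueError("polynomial does not vanish when the variable is 1")
    return quotient

def extract_theta(copula_poly):
    """Follow parts (1)-(4) of the proof and return theta(u, v) as a coefficient dict."""
    c1 = divide_by_variable(copula_poly, 0)      # part (1): C = u * C1
    c2 = divide_by_variable(c1, 1)               # part (2): C1 = v * C2
    f = dict(c2)                                 # part (3): f = C2 - 1
    f[(0, 0)] = f.get((0, 0), 0) - 1
    f = {exps: c for exps, c in f.items() if c != 0}
    c3 = divide_by_one_minus_variable(f, 0)      # part (4): f = (1 - u) * C3
    return divide_by_one_minus_variable(c3, 1)   # C3 = (1 - v) * theta

# FGM copula with theta = 1/2, expanded: (3/2)uv - (1/2)uv^2 - (1/2)u^2 v + (1/2)u^2 v^2.
half = Fraction(1, 2)
fgm_half = {(1, 1): 3 * half, (1, 2): -half, (2, 1): -half, (2, 2): half}
print(extract_theta(fgm_half))
```

A polynomial that violates one of the conditions (3) makes one of the divisions fail with a `ValueError`, mirroring the point in the proof where the corresponding condition is used.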
2.3. Explanation Based on Fuzzy Logic
What Is Fuzzy Logic? A Brief Reminder. An alternative explanation comes from fuzzy logic, where numbers from the interval [0,1] describe the expert’s degree of confidence in a statement. Fuzzy logic was invented by Zadeh [11]; for the state-of-the-art, see, for example, [12–15].
In fuzzy logic, once we know the expert’s degree of confidence a in a statement A, his/her degree of confidence in its negation ¬A is estimated as 1-a.
Similarly, if we know the expert’s degree of confidence a in a statement A and we know the expert’s degree of confidence b in a statement B, then the expert’s degree of confidence in a conjunction A∧B is estimated as f∧(a,b) for an appropriate function f∧(a,b); this function is known as an “and”-operation or a t-norm. One of the most widely used “and”-operations is the algebraic product f∧(a,b)=a·b, which corresponds to the situation when A and B are statistically independent and we take probability as the degree of confidence. This is the “and”-operation that we will use in this section.
Similarly, to estimate the expert’s degree of confidence in a statement A∨B, we apply an appropriate “or”-operation f∨(a,b) (also called t-conorm) to the corresponding degrees a and b. One of the most widely used “or”-operations is f∨(a,b)=min(a+b,1). This is the “or”-operation that we will use in this section.
Copula as a Particular Case of an “and”-Operation. A copula can also be viewed as an “and”-operation: it transforms the probabilities FX(x)=Prob(X≤x) and FY(y)=Prob(Y≤y) of the events X≤x and Y≤y into the probability FXY(x,y)=Prob(X≤x∧Y≤y) that the first event occurs and the second event occurs. How can we go from the original “crisp” “and”-operation to a new “fuzzy” one?
Towards a Fuzzy Explanation of the FGM Copula. For each of the two statements A and B, we want to cover both possibilities:
that the corresponding statement is absolutely true,
that the corresponding statement is “fuzzy,” that is, to some extent true and to some extent false.
In other words, fuzzy means that there is some degree of belief that A is true and that its negation is true.
Thus, we can say that the statement A∧B is true if
either A and B are absolutely true,
or A and B are both “fuzzy,” that is, true to some extent and false to some extent.
The degree to which A is true is a. Thus, the degree to which the negation ¬A is true is 1-a. Therefore, the degree to which the statement A and its negation are both true is a·(1-a). This is the degree to which the statement A is fuzzy.
Similarly, the degree to which B is fuzzy is equal to b·(1-b). Thus, the degree to which both A and B are fuzzy is equal to the product a·(1-a)·b·(1-b).
If we denote the degree to which this both-fuzzy case contributes to “and” by θ, then the contribution of this case to the overall trueness of the conjunction A∧B is θ·a·(1-a)·b·(1-b).
The degree to which both A and B are true can be estimated as a·b. Thus, if we use min(a+b,1) as the “or”-operation, then the resulting overall degree has the desired form
$$a\cdot b + \theta\cdot a\cdot(1-a)\cdot b\cdot(1-b) \tag{11}$$
(at least while this sum does not exceed 1; for the FGM copulas, it does not exceed 1).
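The construction above can be written out step by step with the three fuzzy operations used in this section. This is a minimal sketch: the function names are illustrative, and it assumes 0≤θ≤1 so that θ can itself be read as a degree.

```python
def f_not(a):
    """Negation: degree of confidence in not-A."""
    return 1.0 - a

def f_and(a, b):
    """'And'-operation used in this section: the algebraic product."""
    return a * b

def f_or(a, b):
    """'Or'-operation used in this section: min(a + b, 1)."""
    return min(a + b, 1.0)

def fuzziness(a):
    """Degree to which a statement and its negation are both true: a * (1 - a)."""
    return f_and(a, f_not(a))

def fuzzy_conjunction(a, b, theta):
    """'A and B are both true' or 'A and B are both fuzzy', the latter weighted by theta."""
    both_true = f_and(a, b)
    both_fuzzy = theta * f_and(fuzziness(a), fuzziness(b))
    return f_or(both_true, both_fuzzy)

# Compare with formula (11) at a few sample points (theta = 0.5 is an arbitrary choice).
theta = 0.5
for a, b in [(0.2, 0.9), (0.5, 0.5), (1.0, 0.3)]:
    direct = a * b + theta * a * (1 - a) * b * (1 - b)
    print(a, b, fuzzy_conjunction(a, b, theta), direct)
```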
Therefore, we indeed have an alternative—fuzzy-logic-based—explanation of the FGM copula.
Comment. For general aspects of the relation between fuzzy logic and copulas, see, for example, [16–18].
3. Discussion and Conclusion
Problem: Reminder. In many practical applications, correlation is used to describe dependence between random variables. However, correlation only captures possible linear dependence between random variables. To describe a general—possibly nonlinear—dependence, we need to use, for example, the copula techniques.
There exist many different families of copulas. It turns out that, in many applications, the actual dependence between random variables is best described by copulas from a special family of FGM copulas. Up to now, there have been no convincing explanations for this empirical observation.
Our Results. In this paper, we provide two possible theoretical explanations for this empirical phenomenon. First, we show that the FGM copulas are the easiest to compute—this is one possible explanation for their empirical success. Second, we show that these copulas naturally appear when we use fuzzy logic to formalize our imprecise understanding of how to describe the dependence between random variables.
Discussion. The fact that these two explanations lead to the same class of empirically successful copulas makes us confident that this is indeed the best possible class.
Our results will also, hopefully, make practitioners and researchers more confident that FGM copulas are indeed the best and, thus, encourage them to use these copulas even more.
Remaining Open Problems. An interesting open problem is related to the fact that the FGM family of copulas is a 1-parametric family. This family may be the most accurate approximator among all 1-parametric families, but the general dependence can be more complex than this. Thus, to get an even more accurate description of the dependence between several variables, it is desirable to use families with two or more parameters. Which 2-, 3-, …-parametric families should we use?
Can we use computational complexity-related ideas to come up with appropriate multidimensional families of copulas? Our arguments imply that all elements of such families should be polynomials of higher order, but what exactly are the formulas that we should use? Can we use fuzzy logic to transform our informal understanding of this problem into precise formulas for such families? Or do we need new methods for that? This would be interesting to investigate. A good start would be to first analyze this problem empirically: Which 2-parametric families of copula are empirically the best?
Disclosure
A preliminary version of this paper was posted to the University of Texas at El Paso Technical Report UTEP-CS-17-24.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the Center of Excellence in Econometrics, Chiang Mai University, Thailand. It was also supported in part by the National Science Foundation Grants HRD-0734825 and HRD-1242122 (Cyber-ShARE Center of Excellence) and DUE-0926721 and by an award “UTEP and Prudential Actuarial Science Academy and Pipeline Initiative” from Prudential Foundation. The authors are greatly thankful to all the participants of the 2017 International Conference of the Thailand Econometric Society TES’2017, especially to Zheng Wei for valuable discussions and to the anonymous referees for important suggestions.
References
[1] P. Jaworski, F. Durante, W. K. Härdle, and T. Rychlik, Springer Verlag, Berlin, Heidelberg, New York, NY, USA, 2010. Zbl 1194.62077.
[2] V. Kreinovich, H. T. Nguyen, S. Sriboonchitta, and O. Kosheleva, “Why copulas have been successful in many practical applications: a theoretical explanation based on computational efficiency,” in Integrated Uncertainty in Knowledge Modeling and Decision Making, Proceedings of The Fourth International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making (IUKM’15), V.-N. Huynh, M. Inuiguchi, and T. Denoeux, Eds., vol. 9376 of Springer Lecture Notes in Artificial Intelligence, pp. 112–125, Nha Trang, Vietnam, 2015.
[3] J.-F. Mai and M. Scherer, World Scientific, Singapore, 2017.
[4] A. J. McNeil, R. Frey, and P. Embrechts, Princeton University Press, Princeton, New Jersey, USA, 2015.
[5] R. B. Nelsen, Springer Verlag, Berlin, Heidelberg, New York, NY, USA, 2007.
[6] D. J. G. Farlie, “The performance of some correlation coefficients for a general bivariate distribution,” vol. 47, pp. 307–323, 1960. doi:10.1093/biomet/47.3-4.307. MR0119312.
[7] E. J. Gumbel, “Bivariate exponential distributions,” vol. 55, pp. 698–707, 1960. doi:10.1080/01621459.1960.10483368, doi:10.2307/2281591. MR0116403. Zbl 0099.14501.
[8] D. Morgenstern, “Einfache Beispiele zweidimensionaler Verteilungen,” vol. 8, pp. 234–235, 1956. MR0081575.
[9] V. Kreinovich, S. Sriboonchitta, and V. N. Huynh, Springer Verlag, Cham, Switzerland, 2017.
[10] Z. Wei, D. Kim, T. Wang, and T. Teetranont, “A multivariate generalized FGM copulas and its application to multiple regression,” V. Kreinovich, S. Sriboonchitta, and V. N. Huynh, Eds., vol. 692 of Studies in Computational Intelligence, pp. 363–380, Springer Verlag, Cham, Switzerland, 2017. doi:10.1007/978-3-319-50742-2_22.
[11] L. A. Zadeh, “Fuzzy sets,” vol. 8, no. 3, pp. 338–353, 1965. doi:10.1016/S0019-9958(65)90241-X. Zbl 0139.24606.
[12] R. Belohlavek, J. W. Dauben, and G. J. Klir, Oxford University Press, New York, NY, USA, 2017.
[13] G. Klir and B. Yuan, Prentice Hall, Upper Saddle River, New Jersey, NJ, USA, 1995. MR1329731.
[14] J. M. Mendel, Springer, Cham, Switzerland, 2017.
[15] H. T. Nguyen and E. A. Walker, 3rd edition, Chapman & Hall, Boca Raton, Fla, USA, 2006. MR2180829.
[16] P. Hájek and R. Mesiar, “On copulas, quasicopulas and fuzzy logic,” vol. 12, no. 12, pp. 1239–1243, 2008. doi:10.1007/s00500-008-0286-z. Zbl 1152.03018.
[17] C. Sun, Z. Bie, M. Xie, and J. Jiang, “Fuzzy copula model for wind speed correlation and its application in wind curtailment evaluation,” vol. 93, pp. 68–76, 2016. doi:10.1016/j.renene.2016.02.049.
[18] R. R. Yager, “Modeling holistic fuzzy implication using co-copulas,” vol. 5, no. 3, pp. 207–226, 2006. doi:10.1007/s10700-006-0011-2. MR2250943.