Why Are FGM Copulas Successful: A Simple Explanation

One of the most computationally convenient non-redundant ways to describe the dependence between two variables is by describing the corresponding copula. In many applications, a special class of copulas – known as FGM copulas – turned out to be most successful in describing the dependence between quantities. The main result of this paper is that these copulas are the fastest-to-compute, and this explains their empirical success. As an auxiliary result, we also show that a similar explanation can be given in terms of fuzzy logic.


Introduction
What Is a Copula? A Brief Reminder. In many practical situations, we know the distribution of each of the two random variables $X$ and $Y$, and we now need to also describe their joint distribution.
The distribution of each of the random variables can be described by the corresponding cumulative distribution functions (cdfs) $F_X(x) = \mathrm{Prob}(X \le x)$ and $F_Y(y) = \mathrm{Prob}(Y \le y)$. The joint distribution can be similarly described by the 2D cdf $F_{XY}(x, y) = \mathrm{Prob}(X \le x \wedge Y \le y)$. In principle, we can thus try to determine the values $F_{XY}(x, y)$ corresponding to all possible pairs $(x, y)$. However, from the practical viewpoint, this is redundant; indeed, (i) the 2D cdf $F_{XY}(x, y)$ also contains information about the 1D cdfs $F_X(x)$ and $F_Y(y)$, as $F_X(x) = F_{XY}(x, +\infty)$ and $F_Y(y) = F_{XY}(+\infty, y)$, (ii) so if we determine all the values $F_{XY}(x, y)$, we will also be determining the values $F_X(x)$ and $F_Y(y)$, but (iii) we consider the cases when the 1D cdf values are already known, so soliciting them again is unnecessary.
It is therefore desirable to describe the dependence between $X$ and $Y$ in a nonredundant way, so that (i) from this description, we will not be able to extract the known 1D cdfs, but (ii) from this information and from the 1D cdfs, we will be able to extract the 2D cdf.
Such a nonredundant description is indeed known: it is a copula $C(u, v)$, a function from $[0, 1] \times [0, 1]$ to $[0, 1]$ for which, for all real numbers $x$ and $y$, we have $F_{XY}(x, y) = C(F_X(x), F_Y(y))$; see, for example, [1][2][3][4][5].
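As a minimal illustration of this relation, the following Python sketch builds a joint cdf from two marginal cdfs and a copula. The exponential marginals, the independence copula $C(u, v) = u \cdot v$, and all function names are assumptions chosen only for illustration; they are not taken from the text.

```python
import math


def exp_cdf(rate):
    """Return the cdf of an exponential distribution with the given rate."""
    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0
    return cdf


def independence_copula(u, v):
    """Copula of two independent variables: C(u, v) = u * v."""
    return u * v


def joint_cdf(copula, cdf_x, cdf_y):
    """Combine marginal cdfs with a copula: F_XY(x, y) = C(F_X(x), F_Y(y))."""
    def cdf(x, y):
        return copula(cdf_x(x), cdf_y(y))
    return cdf


F_X = exp_cdf(1.0)   # hypothetical marginal cdf of X
F_Y = exp_cdf(2.0)   # hypothetical marginal cdf of Y
F_XY = joint_cdf(independence_copula, F_X, F_Y)

# Sending one argument to +infinity recovers the other marginal,
# which is the redundancy of the 2D cdf discussed in the text.
print(F_XY(1.0, math.inf), F_X(1.0))
```

Swapping `independence_copula` for any other copula changes the dependence structure while leaving both marginals untouched.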
Properties of Copulas. Not every function $C(u, v)$ is a copula for an appropriate 2D distribution. For a function to be a copula, it has to satisfy some properties. In this paper, we will use properties that can be easily derived from the definition of the copula, including the condition that $C(0, v) = 0$ for all $v$.
Advances in Fuzzy Systems
Why? To the best of our knowledge, until now, there was no convincing explanation of why FGM copulas are so empirically successful. In this paper, we provide such an explanation.
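For concreteness, the FGM (Farlie-Gumbel-Morgenstern) family is usually written in the literature as $C(u, v) = u \cdot v + \theta \cdot u \cdot v \cdot (1 - u) \cdot (1 - v)$ with $-1 \le \theta \le 1$. The Python sketch below assumes this standard form (the parameter name `theta` and the helper names are illustrative choices) and checks numerically, on a grid, that it satisfies the usual boundary conditions and assigns non-negative probability mass to every grid cell.

```python
def fgm_copula(u, v, theta):
    """FGM copula in its usual form; theta is assumed to lie in [-1, 1]."""
    return u * v + theta * u * v * (1.0 - u) * (1.0 - v)


def satisfies_boundary_conditions(copula, steps=20, tol=1e-12):
    """Check C(0,t) = C(t,0) = 0 and C(t,1) = C(1,t) = t on a uniform grid."""
    for i in range(steps + 1):
        t = i / steps
        if abs(copula(0.0, t)) > tol or abs(copula(t, 0.0)) > tol:
            return False
        if abs(copula(t, 1.0) - t) > tol or abs(copula(1.0, t) - t) > tol:
            return False
    return True


def is_two_increasing(copula, steps=20, tol=1e-12):
    """Check that every grid cell receives non-negative probability mass."""
    for i in range(steps):
        for j in range(steps):
            u0, u1 = i / steps, (i + 1) / steps
            v0, v1 = j / steps, (j + 1) / steps
            mass = (copula(u1, v1) - copula(u1, v0)
                    - copula(u0, v1) + copula(u0, v0))
            if mass < -tol:
                return False
    return True


for theta in (-1.0, 0.0, 0.5, 1.0):
    c = lambda u, v, th=theta: fgm_copula(u, v, th)
    print(theta, satisfies_boundary_conditions(c), is_two_increasing(c))
```

A grid check of this kind is only a sanity test, not a proof; it does, however, flag values such as `theta = 2.0`, for which some cells receive negative mass.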

Explanation Based on Computational Complexity: Main Result
Statistical Data Processing Is Computing. Statistical data processing involves a large amount of computing. With the ever-increasing amount of data, processing all this data requires more and more computation time, often to the extent that we exceed the capabilities of our computers.
From this viewpoint, it is desirable to select techniques which are as computationally efficient as possible. With respect to copulas, this means that we should select copulas $C(u, v)$ whose values are the easiest (thus, the fastest) to compute.
Which Functions Are the Fastest to Compute? In a computer, the only operations that are directly supported by hardware and performed exactly are addition, subtraction, and multiplication. Everything else, from division to special functions such as $\exp(x)$, $\sin(x)$, and so on, is approximated by a sequence of elementary hardware-supported operations. The more accuracy we need, the more elementary operations we need, and, thus, the longer the corresponding computations will be.
Therefore, the fastest-to-compute functions are functions that can be exactly represented as a sequence of elementary operations: in this case, the number of elementary operations remains the same no matter what accuracy we desire in our computations. In other words, we are looking for functions which can be obtained from constants and original quantities $x_1, \ldots, x_n$ by applying addition, subtraction, and multiplication. One can easily see that such functions are exactly the polynomials; indeed, (i) every polynomial is a sum of monomials, and each monomial is a product of a constant and variables, so each polynomial is indeed a superposition of additions and multiplications; (ii) vice versa, each constant and each variable are polynomials, and the sum, the difference, and the product of two polynomials are also polynomials; thus, by induction, we can prove that every superposition of addition, subtraction, and multiplication is a polynomial.
Not all polynomials are equally easy or equally difficult to compute. Out of the three elementary operations, the most time-consuming operation is multiplication. Thus, the fewer multiplications there are, the faster the computation of the corresponding function.
(i) With one multiplication, performed in parallel, we can compute linear functions $a_0 + \sum_{i=1}^{n} a_i \cdot x_i$ and also products $x_i \cdot x_j$ of two variables.
(ii) By applying a second multiplication to the results of the first one, we can compute 3rd degree polynomials, or products of 4 variables, and so on.
In general, the higher the degree, the more time is needed to compute the corresponding polynomial.
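The point that a polynomial needs a fixed number of exact elementary operations, whatever accuracy is requested, can be made concrete with a short sketch. It evaluates the standard FGM form $u \cdot v + \theta \cdot u \cdot v \cdot (1 - u) \cdot (1 - v)$ (assumed here from the literature; the name `fgm_exact` and the sample inputs are illustrative) using only addition, subtraction, and multiplication on exact rational numbers.

```python
from fractions import Fraction


def fgm_exact(u, v, theta):
    """Evaluate u*v + theta*u*v*(1-u)*(1-v) using only +, - and *."""
    uv = u * v                      # multiplication 1
    w = (1 - u) * (1 - v)           # multiplication 2 (independent of the first)
    return uv + theta * (uv * w)    # multiplications 3 and 4


u, v, theta = Fraction(1, 2), Fraction(1, 3), Fraction(1)
print(fgm_exact(u, v, theta))       # exact rational result, no rounding
```

With rational inputs the result is itself an exact rational number, obtained with four multiplications; no iterative approximation, and hence no accuracy-dependent cost, is involved.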
Resulting Idea. From the viewpoint of selecting fastest-to-compute copulas, we should select polynomial copulas and, among them, copulas of the smallest possible degree.
Let us describe the results of such a selection.

Proposition 1. Every polynomial copula has the form
$$C(u, v) = u \cdot v + u \cdot v \cdot (1 - u) \cdot (1 - v) \cdot Q(u, v)$$
for some polynomial $Q(u, v)$.

Comments
(i) For the reader's convenience, the proof is placed in a separate proof section.
(ii) As a consequence of this proposition, we get the following results.

Corollary 2. The only polynomial copula of 3rd degree is $C(u, v) = u \cdot v$.
Comment. This copula is actually of 2nd degree; it corresponds to the case of two independent variables. Thus, to describe dependence, we need to consider polynomials of higher degree.
Corollary 3. The only polynomial copulas of 4th degree are FGM copulas.

Comments
(i) This result explains the empirical success of the FGM copulas: among copulas describing true dependence, they are the easiest to compute.
(iii) An alternative explanation of the FGM formulas, based on fuzzy logic, is given in the next subsection.
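The degree bookkeeping behind Corollaries 2 and 3 can be checked mechanically. The sketch below assumes that the form in Proposition 1 is $C(u, v) = u \cdot v + u \cdot v \cdot (1 - u) \cdot (1 - v) \cdot Q(u, v)$ and stores a polynomial in $(u, v)$ as a dictionary from exponent pairs to coefficients; the helper names and the sample choices of $Q$ are illustrative. It only computes total degrees and does not test whether a given $Q$ yields a valid copula.

```python
from collections import defaultdict
from fractions import Fraction

# A polynomial in (u, v) is stored as {(i, j): coeff}, meaning coeff * u**i * v**j.


def poly_add(p, q):
    """Sum of two polynomials, with zero coefficients dropped."""
    out = defaultdict(Fraction)
    for poly in (p, q):
        for key, c in poly.items():
            out[key] += c
    return {k: c for k, c in out.items() if c != 0}


def poly_mul(p, q):
    """Product of two polynomials, with zero coefficients dropped."""
    out = defaultdict(Fraction)
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            out[(i1 + i2, j1 + j2)] += c1 * c2
    return {k: c for k, c in out.items() if c != 0}


def total_degree(p):
    """Largest i + j over all monomials (0 for the zero polynomial)."""
    return max((i + j for (i, j) in p), default=0)


U = {(1, 0): Fraction(1)}
V = {(0, 1): Fraction(1)}
ONE_MINUS_U = {(0, 0): Fraction(1), (1, 0): Fraction(-1)}
ONE_MINUS_V = {(0, 0): Fraction(1), (0, 1): Fraction(-1)}


def copula_from_q(q):
    """Build u*v + u*v*(1-u)*(1-v)*Q(u, v) for a given polynomial Q."""
    uv = poly_mul(U, V)
    bump = poly_mul(poly_mul(uv, ONE_MINUS_U), ONE_MINUS_V)
    return poly_add(uv, poly_mul(bump, q))


q_zero = {}
q_const = {(0, 0): Fraction(1, 2)}
q_linear = {(0, 0): Fraction(1, 2), (1, 0): Fraction(1, 4)}

for name, q in [("Q = 0", q_zero), ("Q constant", q_const), ("Q linear", q_linear)]:
    print(name, "-> total degree", total_degree(copula_from_q(q)))
```

Under the assumed form, a zero $Q$ gives degree 2, any nonzero constant $Q$ gives degree 4, and a non-constant $Q$ pushes the degree above 4, which matches the pattern stated in the two corollaries.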

Explanation Based on Computational Complexity: Proof of the Main Result. (1)
The first condition on the copula, the condition that $C(0, v) = 0$ for all $v$, means that if $u = 0$, then $C(u, v) = 0$. An arbitrary polynomial $C(u, v)$ can be represented as $C(u, v) = P_0(v) + u \cdot P_1(u, v)$, where $P_0(v)$ is the sum of all the monomials that do not contain $u$ and $P_1(u, v)$ is the result of dividing all $u$-containing monomials by $u$.
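The remaining boundary conditions can be handled in the same way. The following is a sketch of one way to carry the argument through, not necessarily the authors' own wording. It writes $P_0$ and $P_1$ for the two polynomials in the decomposition above, assumes the four standard boundary conditions $C(0, v) = 0$, $C(u, 0) = 0$, $C(u, 1) = u$, and $C(1, v) = v$, and uses $P_2$, $R$, and $Q$ as illustrative names for the intermediate polynomials:

```latex
\begin{align*}
C(0, v) = 0 &\;\Rightarrow\; P_0(v) = 0, \text{ so } C(u, v) = u \cdot P_1(u, v); \\
C(u, 0) = 0 &\;\Rightarrow\; P_1(u, 0) = 0, \text{ so } C(u, v) = u \cdot v \cdot P_2(u, v); \\
C(u, 1) = u &\;\Rightarrow\; P_2(u, 1) = 1, \text{ so } P_2(u, v) = 1 + (1 - v) \cdot R(u, v); \\
C(1, v) = v &\;\Rightarrow\; R(1, v) = 0, \text{ so } R(u, v) = (1 - u) \cdot Q(u, v).
\end{align*}
```

Each step uses the fact that a polynomial vanishing identically at $v = 0$ (or at $v = 1$, or at $u = 1$) is divisible by $v$ (respectively by $1 - v$, $1 - u$). Combining the four lines gives $C(u, v) = u \cdot v + u \cdot v \cdot (1 - u) \cdot (1 - v) \cdot Q(u, v)$.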
The proposition is proven.

Explanation Based on Fuzzy Logic
What Is Fuzzy Logic? A Brief Reminder. An alternative explanation comes from fuzzy logic, where numbers from the interval [0, 1] describe the expert's degree of confidence in a statement. Fuzzy logic was invented by Zadeh [11]; for the state-of-the-art, see, for example, [12][13][14][15].
In fuzzy logic, once we know the expert's degree of confidence $a$ in a statement $A$, his/her degree of confidence in its negation $\neg A$ is estimated as $1 - a$.
Similarly, if we know the expert's degree of confidence $a$ in a statement $A$ and we know the expert's degree of confidence $b$ in a statement $B$, then the expert's degree of confidence in a conjunction $A \wedge B$ is estimated as $f_\wedge(a, b)$ for an appropriate function $f_\wedge(a, b)$; this function is known as an "and"-operation or a t-norm. One of the most widely used "and"-operations is the algebraic product $f_\wedge(a, b) = a \cdot b$, which corresponds to the situation when $A$ and $B$ are statistically independent and we take probability as the degree of confidence. This is the "and"-operation that we will use in this section.
Similarly, to estimate the expert's degree of confidence in a statement $A \vee B$, we apply an appropriate "or"-operation $f_\vee(a, b)$ (also called a t-conorm) to the corresponding degrees $a$ and $b$. One of the most widely used "or"-operations is $f_\vee(a, b) = \min(a + b, 1)$. This is the "or"-operation that we will use in this section.
Copula as a Particular Case of an "and"-Operation. A copula can also be viewed as an "and"-operation: it transforms the probabilities $F_X(x) = \mathrm{Prob}(X \le x)$ and $F_Y(y) = \mathrm{Prob}(Y \le y)$ of the events $X \le x$ and $Y \le y$ into the probability $F_{XY}(x, y) = \mathrm{Prob}(X \le x \wedge Y \le y)$ that the first event occurs and the second event occurs. How can we go from the original "crisp" "and"-operation to a new "fuzzy" one?
Towards a Fuzzy Explanation of the FGM Copula. For each of the two statements $A$ and $B$, we want to cover both possibilities: (i) that the corresponding statement is absolutely true, (ii) that the corresponding statement is "fuzzy," that is, to some extent true and to some extent false.
In other words, fuzzy means that there is some degree of belief that $A$ is true and some degree of belief that its negation is true. Thus, we can say that the statement $A \wedge B$ is true if (i) either $A$ and $B$ are absolutely true, (ii) or $A$ and $B$ are both "fuzzy," that is, true to some extent and false to some extent.
The degree to which $A$ is true is $a$. Thus, the degree to which the negation $\neg A$ is true is $1 - a$. Therefore, the degree to which the statement $A$ and its negation are both true is $a \cdot (1 - a)$. This is the degree to which the statement $A$ is fuzzy.
Similarly, the degree to which $B$ is fuzzy is equal to $b \cdot (1 - b)$. Thus, the degree to which both $A$ and $B$ are fuzzy is equal to the product $a \cdot (1 - a) \cdot b \cdot (1 - b)$.
If we denote the degree to which this both-fuzzy case contributes to "and" by $\theta$, then the contribution of this case to the overall trueness of the conjunction is $\theta \cdot a \cdot (1 - a) \cdot b \cdot (1 - b)$. The degree to which both $A$ and $B$ are true can be estimated as $a \cdot b$. Thus, if we use the bounded sum (the sum of the two degrees, capped at 1) as the "or"-operation, then the resulting overall degree has the desired form $a \cdot b + \theta \cdot a \cdot (1 - a) \cdot b \cdot (1 - b)$ (at least while this sum does not exceed 1, and for the FGM copulas, it does not exceed 1). Therefore, we indeed have an alternative, fuzzy-logic-based explanation of the FGM copula.
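The construction above can be traced step by step in code. The following Python sketch implements the three operations used in this section (negation as one minus the degree, the algebraic product as "and", the bounded sum as "or") and combines them as described. The function names and the name `weight` for the contribution of the both-fuzzy case are illustrative assumptions, and the comparison target is the standard FGM form from the literature.

```python
def f_not(a):
    """Negation: degree of confidence in 'not A'."""
    return 1.0 - a


def f_and(a, b):
    """Algebraic-product 'and'-operation."""
    return a * b


def f_or(a, b):
    """Bounded-sum 'or'-operation."""
    return min(a + b, 1.0)


def fuzziness(a):
    """Degree to which a statement and its negation are both true."""
    return f_and(a, f_not(a))


def fuzzy_conjunction(a, b, weight):
    """'A and B' holds if both are true or, with the given weight, both are fuzzy."""
    both_true = f_and(a, b)
    both_fuzzy = weight * f_and(fuzziness(a), fuzziness(b))
    return f_or(both_true, both_fuzzy)


def fgm_copula(u, v, theta):
    """Standard FGM form, used here only as the comparison target."""
    return u * v + theta * u * v * (1.0 - u) * (1.0 - v)


print(fuzzy_conjunction(0.3, 0.6, 0.5), fgm_copula(0.3, 0.6, 0.5))
```

For weights between 0 and 1 the cap at 1 in the bounded sum is never active, so the two functions agree up to floating-point rounding.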

Discussion and Conclusion
Problem: Reminder. In many practical applications, correlation is used to describe dependence between random variables. However, correlation only captures possible linear dependence between random variables. To describe a general, possibly nonlinear, dependence, we need to use, for example, the copula techniques.
There exist many different families of copulas. It turns out that, in many applications, the actual dependence between random variables is best described by copulas from a special family of FGM copulas. Up to now, there have been no convincing explanations for this empirical observation.
Our Results. In this paper, we provide two possible theoretical explanations for this empirical phenomenon. First, we show that the FGM copulas are the easiest to compute; this is one possible explanation for their empirical success. Second, we show that these copulas naturally appear when we use fuzzy logic to formalize our imprecise understanding of how to describe the dependence between random variables.
Discussion. The fact that these two explanations lead to the same class of empirically successful copulas makes us confident that this is indeed the best possible class.
Our results will also, hopefully, make practitioners and researchers more confident that FGM copulas are indeed the best and, thus, encourage them to use these copulas even more.

Remaining Open Problems. An interesting open problem is related to the fact that the FGM family of copulas is a 1-parametric family. This family may be the most accurate approximator among all 1-parametric families, but the general dependence can be more complex than this. Thus, to get an even more accurate description of the dependence between several variables, it is desirable to use 2- and more-parametric families. Which 2-, 3-, ..., parametric families should we use?
Can we use computational complexity-related ideas to come up with appropriate multidimensional families of copulas? Our arguments imply that all elements of such families should be polynomials of higher order, but what exactly are the formulas that we should use? Can we use fuzzy logic to transform our informal understanding of this problem into precise formulas for such families? Or do we need new methods for that? This would be interesting to investigate. A good start would be to first analyze this problem empirically: Which 2-parametric families of copulas are empirically the best?