Towards an (Even More) Natural Probabilistic Interpretation of Fuzzy Transforms (and of Fuzzy Modeling)

In many practical applications, it turns out to be useful to use the notion of a fuzzy transform: once we have functions $A_1(x) \ge 0, \ldots, A_n(x) \ge 0$ with $\sum_{i=1}^{n} A_i(x) = 1$, we can represent each function $f(x)$ by the coefficients $F_i = \dfrac{\int f(x) \cdot A_i(x)\,dx}{\int A_i(x)\,dx}$. Once we know the coefficients $F_i$, we can (approximately) reconstruct the original function $f(x)$ as $\sum_{i=1}^{n} F_i \cdot A_i(x)$. The original motivation for this transformation came from fuzzy modeling, but the transformation itself is purely mathematical. Thus, the empirical successes of this transformation suggest that it can also be interpreted in terms of more traditional (non-fuzzy) mathematics. Such an interpretation is presented in this paper. Specifically, we show that the 2002 probabilistic interpretation of fuzzy modeling by Sánchez et al. can be modified into a natural probabilistic explanation of fuzzy transform formulas.


Introduction: Fuzzy Transform and the Need for Its Probabilistic Interpretation
Fuzzy transform: a definition. The notion of a fuzzy transform (F-transform, for short) has turned out to be very useful in many application areas such as image compression, solving differential equations under initial uncertainty, etc.; see, e.g., [12,13] and references therein. Generally speaking, the F-transform of a function $f$ is a vector whose components are weighted local mean values of $f$. The first step in the definition of the F-transform of $f : X \to \mathbb{R}$ is the selection of a fuzzy partition of the universal set $X$ (e.g., a bounded interval $[a, b] \subset \mathbb{R}$) by a finite set of basic functions $A_1(x) \ge 0, \ldots, A_n(x) \ge 0$, which are continuous and satisfy the condition
$$\sum_{i=1}^{n} A_i(x) = 1 \quad \text{for all } x \in X.$$
Basic functions are called membership functions of the respective fuzzy sets or, alternatively, granules, information pieces, etc. Their choice reflects the type of uncertainty related to the knowledge of $x$.
Once the basic functions are selected, we define the F-transform of a continuous function $f : X \to \mathbb{R}$ as a vector $(F_1, \ldots, F_n)$, where
$$F_i = \frac{\int f(x) \cdot A_i(x)\,dx}{\int A_i(x)\,dx}. \qquad (1)$$
The F-transform satisfies several properties described in [12,13], formulated in terms of the length $h_i$ of the support of $A_i$. The F-transform is used in applications as a "skeleton model" of $f$. This model provides a compressed image if $f$ is an image [3], the values of a trend if $f$ is a time series [14], a numeric model if $f$ is used in numeric computations (integration, differentiation) [15], etc.
Once we know the F-transform components $F_i$, we can (approximately) reconstruct the original function $f$ as
$$\hat f(x) = \sum_{i=1}^{n} F_i \cdot A_i(x). \qquad (2)$$
In [12], formula (2) is called the F-transform inversion formula. Formula (2) represents a continuous function that approximates $f$. Under certain reasonable conditions, the sequence of functions represented by (2) uniformly converges to $f$ (see [12] for more details).
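Example. Let us give an example of the F-transform of $x^2$ on the domain $[0, 1]$ with respect to $A_1, \ldots, A_5$. For simplicity, we assume that the basic functions $A_1, \ldots, A_5$ are of triangular shape and constitute a uniform fuzzy partition of $[0, 1]$. With nodes $x_i = (i - 1)/4$, their analytical representation is as follows:
$$A_i(x) = \max\bigl(0,\; 1 - 4 \cdot |x - x_i|\bigr), \quad x \in [0, 1], \quad i = 1, \ldots, 5.$$
By (1), the values of the components $F_1, \ldots, F_5$ of the F-transform are:
$$F_1 = \tfrac{1}{96} \approx 0.0104, \quad F_2 = \tfrac{7}{96} \approx 0.0729, \quad F_3 = \tfrac{25}{96} \approx 0.2604, \quad F_4 = \tfrac{55}{96} \approx 0.5729, \quad F_5 = \tfrac{27}{32} \approx 0.8438.$$
Figure 1 provides a graphical representation of the basic functions $A_1, \ldots, A_5$, of the function $f(x) = x^2$, of its F-transform components $F_1, \ldots, F_5$, and of the inverse F-transform $\hat f(x)$ of $x^2$.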
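The example can also be checked numerically. The following is a minimal sketch, assuming the standard uniform triangular partition of $[0, 1]$ with nodes 0, 0.25, 0.5, 0.75, 1 and a simple midpoint rule for the two integrals in formula (1) (the ratio of $\int f \cdot A_i$ to $\int A_i$). The helper names `triangular_partition`, `integrate`, `f_transform`, and `inverse_f_transform` are illustrative and are not taken from [12,13].

```python
def triangular_partition(n, a=0.0, b=1.0):
    """Return n triangular basic functions on [a, b] (assumed uniform partition)."""
    h = (b - a) / (n - 1)                      # distance between neighbouring nodes
    nodes = [a + i * h for i in range(n)]

    def make(center):
        # Peak 1 at the node, linear decrease to 0 at the neighbouring nodes.
        return lambda x: max(0.0, 1.0 - abs(x - center) / h) if a <= x <= b else 0.0

    return [make(c) for c in nodes]


def integrate(g, a, b, steps=4000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    dx = (b - a) / steps
    return sum(g(a + (k + 0.5) * dx) for k in range(steps)) * dx


def f_transform(f, basis, a=0.0, b=1.0):
    """Components F_i: weighted local means of f with weights A_i."""
    return [integrate(lambda x: f(x) * A(x), a, b) / integrate(A, a, b)
            for A in basis]


def inverse_f_transform(F, basis):
    """Inversion formula: x -> sum of F_i * A_i(x)."""
    return lambda x: sum(Fi * A(x) for Fi, A in zip(F, basis))


basis = triangular_partition(5)
F = f_transform(lambda x: x * x, basis)
f_hat = inverse_f_transform(F, basis)

print([round(v, 4) for v in F])   # the five components of the F-transform of x^2
print(f_hat(0.5), 0.5 ** 2)       # reconstructed value versus exact value
```

Under these assumptions, each interior component equals $x_i^2 + h^2/6$ with $h = 0.25$, which illustrates that $F_i$ is a weighted local mean of $f$ around the node rather than the point value $f(x_i)$.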
F-transform: original motivation. The original motivation for the F-transform came from fuzzy modeling [12,13]. For example, in the situation corresponding to the inverse F-transform, we have $n$ rules "if $x$ is $A_i$, then $y = F_i$" ($i = 1, \ldots, n$). These rules are Takagi-Sugeno (TSK) rules with singleton (constant) right-hand sides. For TSK rules, the value corresponding to a given input $x$ is $\dfrac{\sum_{i=1}^{n} A_i(x) \cdot F_i}{\sum_{i=1}^{n} A_i(x)}$; since $\sum_{i=1}^{n} A_i(x) = 1$, we get the above formula (2).
The purpose was to show that this type of modeling can be as useful in applications as more traditional techniques such as the Fourier transform and the wavelet transform. Moreover, the F-transform has a potential advantage over Fourier and wavelet transforms: while the Fourier transform uses a single type of basic functions ($\exp(i \cdot \omega \cdot x)$) and the wavelet transform uses a single "mother wavelet" that determines all the basic functions, the F-transform can use several different basic functions $A_i$. An additional advantage is that, in contrast to the purely mathematical basic functions used in Fourier and wavelet transforms, the basic functions $A_i$ in a fuzzy partition usually come from natural language terms like "low" or "high". (For a detailed description of fuzzy modeling, see, e.g., [5,10].)
Just like any other tool of applied mathematics, the F-transform is not a panacea. It is more successful in some problems and less successful in others. It is therefore desirable to combine the F-transform with other mathematical tools, so as to combine the relative advantages of different techniques. For such a combination, it is desirable to come up with a purely mathematical (non-fuzzy) interpretation of this transform.
In particular, since most mathematical data processing tools are based on probability and statistics, it is desirable to come up with a probabilistic interpretation for F-transform.
The known probabilistic interpretation of fuzzy modeling leads to a probabilistic interpretation of F-transform. We have mentioned that the F-transform was originally designed as a particular case of fuzzy modeling. A seminal paper [17] provided a reasonable probabilistic model for a particular case of fuzzy modeling. Specifically, this paper shows that if we use piece-wise constant probability density functions for describing the output, then we get a particular case of a fuzzy model: the case when we use product for "and" and sum for "or". Since the F-transform corresponds to exactly this type of fuzzy modeling, we thus get a probabilistic model for the F-transform as well.
What we do in this paper. In this paper, we show that a modification of the probabilistic interpretation from [17] enables us to justify the formulas of the F-transform without making any additional assumptions about the probability distributions. In mathematical terms, this modification consists of using the Bayes formula and making assumptions about prior distributions (a natural way to describe prior knowledge in statistics) instead of making assumptions about the actual distributions.
Thus, we get an even more natural probabilistic interpretation of the F-transform. Specifically:
• the paper [17] shows, in effect, that there exists a reasonable probabilistic interpretation of the F-transform formulas;
• however, in principle, this interpretation leaves open the possibility that other equally reasonable assumptions about the probability distributions can lead to different formulas;
• in our modified interpretation, we show that the basic probabilistic setting uniquely determines the F-transform formulas, without the need to make any assumptions about the probability distributions.
We also show that a similar modification can be applied to the probabilistic interpretation of general fuzzy modeling formulas.
Comment. From the mathematical viewpoint, the resulting formulas are very similar to the formulas from [17] (with the exception of the Bayes formula step). However, in our opinion, this mathematically minor modification leads to a major change in interpretation: now, to probabilistic researchers, the F-transform is
• not just a possible model, corresponding to one of the possible reasonable choices of probability distributions,
• but the model uniquely emerging from the natural probabilistic setting.
A similar conclusion can be made about the probabilistic interpretation of more general fuzzy models.
In other words, our minor modification uncovers an even deeper fundamental meaning of the probabilistic interpretation originally proposed in [17].

A Natural Practical Problem that Leads to F-transform
Physical setting: general discussion. Let us assume that we have a physical process that is characterized by two quantities $x$ and $z$, and we know that these quantities are related by a functional dependence $z = f(x)$.
In the ideal situation of complete knowledge,
• we know the exact value of $x$, and
• we have the exact description of the function $f$.
In this case, we can get the corresponding exact value z = f (x) of the second quantity.
In practice, we know the value $x$ with uncertainty, i.e., several different values of $x$ are consistent with our knowledge. We must therefore provide a reasonable estimate for $z$. Finding such an estimate is the first problem with which we will be dealing. In this first problem, we assume that the function $f$ is known exactly.
If this function has to be determined empirically, then we must transform the empirical (often partial) knowledge about $f$ into a reasonable estimate for this function. This is the second problem with which we will be dealing in this section.

First problem: estimating the value f (x) for an imprecisely known x.
If we only know one piece of information $X_i$ about $x$, what is a reasonable estimate for the value $z = f(x)$?
Second problem: estimating the function z = f(x) based on partial information about the dependence between x and z. Assume that for every information piece $X_i$, $1 \le i \le n$, we have the corresponding measured value $F_i$ of $z$. Since we know only $n$ numerical characteristics $F_i$ of the unknown function $f$, we cannot exactly reconstruct this function. Instead, we need to provide a good estimate for each value $f(x)$ of this function.

A Natural Probabilistic Problem that Leads to the Probabilistic Interpretation of F-transform
Uncertainty in x: a general probabilistic description. Assume that we have a model of the estimation procedure that enables us, given the actual value $x$, to compute the probability $P(X_i \mid x) \ge 0$ of this procedure resulting in $X_i$ under the condition that the actual (unknown) value of the estimated quantity is $x$.
To simplify formulas, we denote
$$A_i(x) = P(X_i \mid x). \qquad (3)$$
Since for every $x$, we must have exactly one of the $n$ possible outcomes, the probabilities $P(X_1 \mid x), \ldots, P(X_n \mid x)$ of different estimation results must add up to one, i.e., we must have
$$\sum_{i=1}^{n} P(X_i \mid x) = 1.$$
In the above simplified notation, this formula takes the form
$$\sum_{i=1}^{n} A_i(x) = 1.$$
First problem: estimating the value f(x) for an imprecisely known x. Let us consider the first problem. In practice, we do not know the exact value of the quantity $x$. Instead, we only have one of the information pieces $X_1, \ldots, X_n$. Under the assumption that we know $X_i$, what is a reasonable estimate for $z = f(x)$?
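In terms of probability theory, we would like to find the conditional expected value $E[f(x) \mid X_i]$. By definition, this expected value is equal to
$$E[f(x) \mid X_i] = \int f(x) \cdot P(x \mid X_i)\,dx. \qquad (6)$$
Thus, to compute this expected value, we must know the probabilities $P(x \mid X_i)$. Instead, we know the probabilities $P(X_i \mid x)$.
In general, the problem of reconstructing the probabilities $P(H_x \mid X_i)$ of different hypotheses $H_x$ from the probabilities $P(X_i \mid H_x)$ is solved by the Bayes formula
$$P(H_x \mid X_i) = \frac{P(X_i \mid H_x) \cdot P(H_x)}{\int P(X_i \mid H_y) \cdot P(H_y)\,dy}, \qquad (7)$$
in which $P(H_x)$ is a prior probability of the hypothesis $H_x$ (strictly speaking, $P(H_x \mid X_i)$ and $P(H_x)$ are probability densities).
In our case, different hypotheses $H_x$ correspond to different possible values $x$ of the quantity of interest. Thus, (7) takes the form
$$P(x \mid X_i) = \frac{P(X_i \mid x) \cdot P(x)}{\int P(X_i \mid y) \cdot P(y)\,dy}. \qquad (8)$$
Since there is no a priori reason to prefer one value of $x$ to another, it is reasonable to assume that all the values $x$ are equally probable, i.e., that all prior values $P(x)$ are equal to each other: $P(x) = P_0$.
Substituting $P(x) = P_0$ into formula (8) and dividing both the numerator and the denominator by the common factor $P_0$, we get the expression
$$P(x \mid X_i) = \frac{P(X_i \mid x)}{\int P(X_i \mid y)\,dy}.$$
Substituting this expression into formula (6) (and renaming the variable in the denominator), we get
$$E[f(x) \mid X_i] = \frac{\int f(x) \cdot P(X_i \mid x)\,dx}{\int P(X_i \mid x)\,dx}.$$
In terms of the simplified notation (3), we thus get
$$E[f(x) \mid X_i] = \frac{\int f(x) \cdot A_i(x)\,dx}{\int A_i(x)\,dx},$$
i.e., exactly the formula (1) corresponding to the F-transform.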
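The Bayes-based derivation can be illustrated by a small simulation. The sketch below is only an illustration under stated assumptions: $x$ is drawn from a uniform prior on $[0, 1]$, the information piece $X_i$ is drawn with probability $A_i(x)$ for five triangular functions with nodes 0, 0.25, ..., 1, and $f(x) = x^2$. The empirical conditional means of $f(x)$ given $X_i$ should then approach the ratios $\int f \cdot A_i \,/ \int A_i$. All names (`tri`, `basis`, `conditional_means`) are hypothetical.

```python
import random


def tri(center, half_width):
    """Triangular function with peak 1 at `center`; plays the role of P(X_i | x)."""
    return lambda x: max(0.0, 1.0 - abs(x - center) / half_width)


h = 0.25
basis = [tri(k * h, h) for k in range(5)]   # assumed A_i(x); they sum to 1 on [0, 1]
f = lambda x: x * x                         # assumed test function

rng = random.Random(0)                      # fixed seed for reproducibility
sums = [0.0] * 5
counts = [0] * 5
for _ in range(100_000):
    x = rng.random()                        # uniform prior P(x) = P_0 on [0, 1]
    weights = [A(x) for A in basis]         # P(X_1 | x), ..., P(X_5 | x)
    i = rng.choices(range(5), weights=weights)[0]   # observed information piece
    sums[i] += f(x)
    counts[i] += 1

# Empirical estimates of the conditional expectations E[f(x) | X_i]
conditional_means = [s / c for s, c in zip(sums, counts)]
print([round(m, 3) for m in conditional_means])
```

For this choice of $f$ and $A_i$, the exact ratios are $1/96$, $7/96$, $25/96$, $55/96$, and $27/32$, so the simulated means can be compared against these values.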
Second problem: estimating the function z = f(x) based on partial information about the dependence between x and z. In some practical situations, we do not know the exact expression for the function $f(x)$. Instead, we must estimate $f(x)$ from the empirical data, i.e., from the previous results of simultaneously measuring $x$ and $z$.
In each such measurement, the only information that we get about x is one of the values X 1 , . . ., X n .For each case when the information about x is X i , we have one or several values z.
Ideally, we should have a large number of values z corresponding to each x-measurement result X i .Based on these values z, we should then be able to reconstruct the conditional distribution of z under the condition of X i .Based on these conditional distributions, we should be able to reconstruct the values f (x) for all x.
In practice, however, we have only a few values $z$ corresponding to each $x$-measurement result $X_i$. In this case, at best, instead of the entire conditional probability distribution, we can only reconstruct a single parameter: the conditional mean $F_i = E[z \mid X_i]$. Since we only know $n$ characteristics $F_i$ of the unknown function $f(x)$, we cannot exactly reconstruct this function. Instead, we need to provide a good estimate for each value $f(x)$ of this function.
Similarly to the first problem, we take the mean as a reasonable estimate. Thus, in the above practical setting, the problem of estimating the function $f(x)$ takes the following form:
• for every $i$, we know the conditional mean $F_i = E[z \mid X_i]$;
• based on these conditional means, for every $x$, we want to estimate the mean value $\hat f(x) = E[z \mid x]$.
For this problem, the formula of full probability leads to the following result:
$$E[z \mid x] = \sum_{i=1}^{n} P(X_i \mid x) \cdot F_i. \qquad (10)$$
By using the notations $\hat f(x)$ for $E[z \mid x]$ and $A_i(x)$ for $P(X_i \mid x)$, we can transform formula (10) into the form
$$\hat f(x) = \sum_{i=1}^{n} A_i(x) \cdot F_i,$$
i.e., exactly the F-transform inversion formula (2).
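The estimate obtained from the formula of full probability, i.e., the sum of $A_i(x) \cdot F_i$, can be explored numerically. The following sketch assumes triangular $A_i$ on a uniform grid over $[0, 1]$ and the test function $f(x) = x^2$, and reports the largest deviation between the estimate and $f$ for three partition sizes. The names `partition`, `conditional_means`, and `estimate` are illustrative only.

```python
def partition(n):
    """n triangular functions A_i on [0, 1] with uniformly spaced nodes (assumed)."""
    h = 1.0 / (n - 1)
    return [(lambda x, c=k * h: max(0.0, 1.0 - abs(x - c) / h)) for k in range(n)]


def conditional_means(f, basis, steps=4096):
    """F_i as the ratio of midpoint sums of f * A_i and of A_i."""
    dx = 1.0 / steps
    xs = [(k + 0.5) * dx for k in range(steps)]
    return [sum(f(x) * A(x) for x in xs) / sum(A(x) for x in xs) for A in basis]


def estimate(F, basis, x):
    """Estimate of f(x): conditional means F_i weighted by A_i(x) = P(X_i | x)."""
    return sum(A(x) * Fi for A, Fi in zip(basis, F))


f = lambda x: x * x
grid = [k / 200 for k in range(201)]
max_errors = []
for n in (5, 9, 17):
    basis = partition(n)
    F = conditional_means(f, basis)
    max_errors.append(max(abs(estimate(F, basis, x) - f(x)) for x in grid))

print(max_errors)   # largest deviation for n = 5, 9, 17
```

In this particular setup the deviation shrinks as $n$ grows; the largest deviation occurs at the right endpoint, where only a half-triangle contributes to the local mean.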

Conclusion.
The above (minor) modification of the probability model from [17] uniquely determines both basic formulas (1) and (2) related to the F-transform.
Relation with the random set interpretation of fuzzy sets. It is worth mentioning that the probabilistic interpretation from [17] is related to the random set interpretation of fuzzy sets (see, e.g., [6]).
In this interpretation, the meaning of an imprecise (fuzzy) term like "small" is based on the following idea.The fact that the term is imprecise means that for the same value x, some people will say that this value is small, while other people will say that this value is not small.To take this imprecision into account, we can store, for each person, a set of all the values that this person considers small.
Since there is no prior reason to prefer the opinion of one of these persons, we consider their opinions equally reasonable. We can then take the ratio $\mu_{\rm small}(x)$ of people who consider $x$ to be small as a reasonable measure of smallness. (This is actually one of the standard ways to construct a membership function corresponding to a certain term.) We can describe this ratio in probabilistic terms if we assume that all the persons are equally probable. In these terms, the value $\mu_{\rm small}(x)$ can be interpreted as the probability $P({\rm small} \mid x)$ that a randomly selected person would consider $x$ to be small. This interpretation of the membership function $A_i(x)$ as the conditional probability $P(X_i \mid x)$ is exactly what we used in our probabilistic interpretation of the F-transform.
Terminological comment. For completeness, let us explain why the above interpretation is called the random set interpretation.
For crisp (well-defined) properties, each property can be described by the set of all the values that satisfy this property.
For each imprecise property like "small", instead of a single set describing all the values that satisfy this property, we have several sets describing the opinions of several persons.We consider the opinions of all these persons to be equally valid, so each of N persons has the exact same probability 1/N of being correct.In this case, we have different sets, each occurring with probability 1/N .
In mathematical terms, we can describe this situation by saying that we have a probability distribution on the class of all possible sets. In probability theory, such a distribution is called a random set, similarly to the fact that a probability distribution on the class of all possible numbers is called a random number.
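The random set construction can be made concrete with a toy computation. In the sketch below, which is purely illustrative, each of $N = 1000$ hypothetical persons is assumed to describe "small" by a crisp set of the form $\{x : x \le t\}$ with a personal threshold $t$; the membership degree is then the fraction of persons whose set contains $x$. The threshold model and the names `thresholds` and `mu_small` are assumptions introduced for this example.

```python
import random

rng = random.Random(1)   # fixed seed for reproducibility

# Assumed model: person k considers x "small" exactly when x <= thresholds[k].
# Each person therefore contributes one crisp set, and all N sets are equally probable.
thresholds = [rng.uniform(0.0, 10.0) for _ in range(1000)]


def mu_small(x):
    """Fraction of persons whose crisp set 'small' contains x, i.e. P(small | x)."""
    return sum(1 for t in thresholds if x <= t) / len(thresholds)


for x in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(x, mu_small(x))
```

With this threshold model, the resulting membership function is non-increasing in $x$, equals 1 below every threshold, and equals 0 above every threshold.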

A Similar Modification of a Probabilistic Interpretation Is Possible For Mamdani-Style Fuzzy Modeling (and Fuzzy Control)
From F-transform to fuzzy modeling. Let us show that the above modification of the probabilistic interpretation from [17] can be extended from the F-transform to a more general case of Mamdani-type fuzzy modeling and fuzzy control.
Comment. In this section, we concentrate on Mamdani's approach since the F-transform can be viewed as a particular case of this approach, and since for Mamdani's approach, a probabilistic interpretation is possible [17]. Please note that while Mamdani's approach was historically the first, at present, there are many different approaches to fuzzy modeling and fuzzy control; we mention some of them in this chapter, but there are many others; see, e.g., [7,8,9]. How to best interpret these other approaches in probabilistic terms, and whether such an interpretation is at all possible, is an interesting open question. For example, an interesting question is how to interpret type-2 approaches to fuzzy modeling and fuzzy control; see, e.g., [1,2,4,20]; maybe via interval-valued probabilities?
Mamdani's approach to fuzzy modeling and fuzzy control: a brief reminder. In Mamdani's approach, we start with rules like "if $x$ is small, then $u$ should be medium", and then use membership functions for "small" and "medium" to transform these rules into an exact control strategy.
In general, we have rules "if $x$ has a property $A_i$, then $u$ has the property $B_i$" ($i = 1, \ldots, n$), with known membership functions $A_i(x)$ and $B_i(u)$ for the corresponding properties. Mamdani's methodology is based on saying that for each input $x$, the value $u$ is a reasonable value of control if and only if one of the above $n$ rules is applicable, i.e.,
• either the first rule is applicable, i.e., $x$ satisfies the property $A_1$ and $u$ satisfies the property $B_1$,
• or the second rule is applicable, i.e., $x$ satisfies the property $A_2$ and $u$ satisfies the property $B_2$,
• . . .
• or the $n$-th rule is applicable, i.e., $x$ satisfies the property $A_n$ and $u$ satisfies the property $B_n$.
Once we select functions $f_\&(a, b)$ and $f_\vee(a, b)$ to represent "and" and "or" (these functions are called a t-norm and a t-conorm), we can describe the degree of our belief $\mu_x(u)$ that $u$ is reasonable (for a given input $x$) as
$$\mu_x(u) = f_\vee\bigl(f_\&(A_1(x), B_1(u)), \ldots, f_\&(A_n(x), B_n(u))\bigr).$$
In particular, if we select $f_\&(a, b) = a \cdot b$ and $f_\vee(a, b) = a + b$ (and if the added values do not go beyond 1), we get
$$\mu_x(u) = \sum_{i=1}^{n} A_i(x) \cdot B_i(u). \qquad (13)$$
Once we know this membership function, we can find the appropriate value of $u$ by using the so-called centroid defuzzification:
$$\bar u(x) = \frac{\int u \cdot \mu_x(u)\,du}{\int \mu_x(u)\,du}. \qquad (14)$$
A natural probabilistic analog of Mamdani's approach to fuzzy modeling. In [17], it was shown that in a probabilistic setting, we get formulas similar to the Mamdani rules corresponding to product and sum if we assume a uniform distribution on the outputs. Let us show that by using the Bayes formula, we can avoid this additional assumption and thus make the resulting probabilistic analog of Mamdani's fuzzy modeling even more natural.
Similarly to the above probabilistic interpretation of the F-transform, let us assume that we have possible pieces of information $X_1, \ldots, X_n$ about the quantity $x$, and that for each piece of information, we also know the corresponding probability $P(X_i \mid x)$, which we will denote by $A_i(x)$.
Similarly, let us assume that we have possible pieces of information U 1 , . . ., U m about u, and we know the corresponding probabilities P (U i | u) which we will denote by B i (u).
We know that u depends on x, but we do not know the exact dependence.Instead, for each information X i about x, we know the corresponding information U j about the corresponding u.
Since we did not select any specific order for the information pieces $U_i$, we can denote the piece corresponding to $X_1$ by $U_1$, the piece corresponding to $X_2$ by $U_2$, etc. Under this selection, the available information simply means that if $x$ is described by the piece of information $X_i$, then the corresponding $u$ is described by the piece of information $U_i$.
Our objective is, given these rules and given a new value x, to find a good estimate for the appropriate u.
Due to the formula of full probability, the conditional probability density $P(u \mid x)$ of $u$ under the condition $x$ has the form
$$P(u \mid x) = \sum_{i=1}^{n} P(X_i \mid x) \cdot P(u \mid U_i). \qquad (15)$$
We know the probabilities $P(X_i \mid x) = A_i(x)$. The probability densities $P(u \mid U_i)$ can be determined by using the Bayes theorem, similarly to the F-transform case, as
$$P(u \mid U_i) = \frac{P(U_i \mid u)}{\int P(U_i \mid y)\,dy},$$
i.e., in terms of the values $B_i(u)$, as
$$P(u \mid U_i) = \frac{B_i(u)}{\int B_i(y)\,dy}. \qquad (17)$$
Substituting formula (17) and the expression (3) into formula (15) (and changing the multiplication order), we get the formula
$$P(u \mid x) = \sum_{i=1}^{n} A_i(x) \cdot \frac{B_i(u)}{\int B_i(y)\,dy}. \qquad (18)$$
Once we know these probabilities, we can produce the mean $\bar u$ as a reasonable estimate for $u$:
$$\bar u(x) = \frac{\int u \cdot P(u \mid x)\,du}{\int P(u \mid x)\,du}. \qquad (19)$$
These are exactly the formulas derived in [17] from the additional assumption of a piece-wise constant output distribution. Thus, our (minor) modification of [17] indeed uniquely determines the corresponding probabilistic analog of Mamdani's formulas.
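To see how the two estimates relate, one can evaluate both numerically. The sketch below is a hedged illustration: it takes the Mamdani membership function to be the sum of products $A_i(x) \cdot B_i(u)$ followed by centroid defuzzification, and the probabilistic density to be the same sum with each $B_i(u)$ divided by $\int B_i(y)\,dy$, as described above. The triangular shapes, the interval $[0, 10]$ for $u$, and the names `tri`, `integrate`, `mamdani`, and `probabilistic` are assumptions made for this example.

```python
def integrate(g, lo, hi, steps=4000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(g(lo + (k + 0.5) * dx) for k in range(steps)) * dx


def tri(center, half_width):
    """Triangular function with peak 1 at `center`."""
    return lambda t: max(0.0, 1.0 - abs(t - center) / half_width)


def mamdani(x, A, B, lo, hi):
    """Centroid of mu_x(u) = sum_i A_i(x) * B_i(u) (product for 'and', sum for 'or')."""
    mu = lambda u: sum(a(x) * b(u) for a, b in zip(A, B))
    return integrate(lambda u: u * mu(u), lo, hi) / integrate(mu, lo, hi)


def probabilistic(x, A, B, lo, hi):
    """Mean of P(u | x) = sum_i A_i(x) * B_i(u) / integral of B_i."""
    norms = [integrate(b, lo, hi) for b in B]
    p = lambda u: sum(a(x) * b(u) / s for a, b, s in zip(A, B, norms))
    return integrate(lambda u: u * p(u), lo, hi) / integrate(p, lo, hi)


A = [tri(0.0, 1.0), tri(1.0, 1.0)]            # assumed input partition of [0, 1]
B_unequal = [tri(2.0, 1.0), tri(6.0, 3.0)]    # output sets with different widths
B_equal = [tri(2.0, 1.0), tri(6.0, 1.0)]      # output sets with equal widths

print(mamdani(0.5, A, B_unequal, 0.0, 10.0), probabilistic(0.5, A, B_unequal, 0.0, 10.0))
print(mamdani(0.5, A, B_equal, 0.0, 10.0), probabilistic(0.5, A, B_equal, 0.0, 10.0))
```

In this toy setup the two estimates differ when the output sets have different areas (the wider set dominates the Mamdani centroid) and agree when all areas are equal.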
In the Mamdani-type setting, fuzzy and probabilistic formulas are, in general, different. It is worth mentioning that
• while in the F-transform, the probabilistic and fuzzy derivations lead to exactly the same formulas,
• in the general fuzzy modeling case, as mentioned in [17], the formulas are somewhat different: while formula (19) is exactly the same as (14), with $P(u \mid x)$ instead of $\mu_x(u)$, formula (18) differs slightly from Mamdani's formula (13) by the integral in the denominator.
Cases when fuzzy and probabilistic formulas coincide. For the F-transform (and, more generally, in all the cases when the value $\int B_i(y)\,dy$ is the same for all $i$), this additional denominator simply divides all the values $P(u \mid x)$ by a constant. This constant appears both in the numerator and in the denominator of formula (19) and thus does not affect the resulting value $\bar u(x)$.
Another case when the fuzzy and probabilistic formulas coincide is the case of the Takagi-Sugeno (TSK) approach; see, e.g., [7]. This equivalence is, in effect, proven in [17]. In the TSK approach, rules have the type "if $x$ is $A_i$, then $u = f_i(x)$" for known functions $f_i(x)$. In the probabilistic setting, we assume that under the piece of information $U_i$, we must take $u = f_i(x)$. Thus, for a given input $x$, we select $f_i(x)$ with probability $P(X_i \mid x) = A_i(x)$, where $\sum_{i=1}^{n} A_i(x) = 1$. The resulting mean $\bar u(x)$ is thus equal to
$$\bar u(x) = \sum_{i=1}^{n} A_i(x) \cdot f_i(x);$$
this is exactly the TSK formula.
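The TSK case admits a very short illustration. In the sketch below, the output is the mean of the values $f_i(x)$ when $f_i(x)$ is selected with probability $A_i(x)$; the division by the total weight is kept only so that the function also works when the weights do not sum exactly to 1. The two-rule partition and the consequent functions are assumptions chosen for this example, and the name `tsk` is illustrative.

```python
def tsk(x, A, consequents):
    """Mean output when f_i(x) is selected with probability A_i(x)."""
    weights = [a(x) for a in A]
    total = sum(weights)          # equals 1 for a fuzzy partition
    return sum(w * g(x) for w, g in zip(weights, consequents)) / total


# Assumed two-rule example on [0, 1]: A_1(x) = 1 - x, A_2(x) = x.
A = [lambda x: 1.0 - x, lambda x: x]
consequents = [lambda x: 0.0, lambda x: 2.0 * x]     # hypothetical f_1, f_2

print(tsk(0.5, A, consequents))

# With constant consequents F_i, the same function is the F-transform inversion formula.
constants = [lambda x: 1.0, lambda x: 3.0]
print(tsk(0.25, A, constants))
```

The second call shows that constant right-hand sides reduce the TSK mean to a weighted combination of the constants, which is the inverse F-transform case discussed earlier.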
Comparison between fuzzy and probabilistic modeling. For Mamdani-type situations in which fuzzy and probabilistic formulas are different, the corresponding probabilistic and fuzzy rules are compared, in detail, in [17].
Let us add three more situations to this comparison, situations that are naturally related to our modified derivation.
Case when probabilistic control is better. When the values $\int B_i(y)\,dy$ are different, probabilistic control and fuzzy control lead, in general, to different values $\bar u$. We will show, on an example originally proposed by R. Yager, that in this case, the result of the probabilistic control is closer to common sense than the result of Mamdani's control.
Indeed, let us consider the situation in which we have two rules:
• the first rule is a more general rule saying that if $x$ is small, then $u$ should be small;
• the second rule is a very specific rule, saying that if $x$ is very close to 0.11, then $u$ should be very close to 0.15.
Intuitively, if we have a value $x$ for which a very specific rule is applicable, e.g., the value $x = 0.11$, then this specific rule should have priority over the general rule. However, since the width of the membership function $B_2(u)$ is small, the corresponding term in (13) will practically not affect the resulting estimate (14). In contrast, in the probabilistic control, the effect of $B_2(u)$ is normalized by, crudely speaking, the total width of the corresponding function $B_2(u)$. Thus, even the most specific rules will have, as desired, a significant influence on the result (19).
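The effect described in this example can be reproduced with concrete numbers. The sketch below is an illustration under explicit assumptions that are not part of the original example: at $x = 0.11$ both rules are taken to fire with degree 0.5, "small" for $u$ is modeled as $B_1(u) = 1 - u$ on $[0, 1]$, and "very close to 0.15" as a triangle of half-width 0.01. The Mamdani estimate is the centroid of the sum of products $A_i(x) \cdot B_i(u)$, and the probabilistic estimate uses the same sum with each $B_i$ divided by its integral.

```python
def integrate(g, lo, hi, steps=10000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(g(lo + (k + 0.5) * dx) for k in range(steps)) * dx


B1 = lambda u: max(0.0, 1.0 - u)                        # assumed "u is small"
B2 = lambda u: max(0.0, 1.0 - abs(u - 0.15) / 0.01)     # assumed "u is very close to 0.15"
B = [B1, B2]
degrees = [0.5, 0.5]        # assumed values of A_1(0.11) and A_2(0.11)

# Mamdani: centroid of the sum of products A_i(x) * B_i(u)
mu = lambda u: sum(a * b(u) for a, b in zip(degrees, B))
u_mamdani = integrate(lambda u: u * mu(u), 0.0, 1.0) / integrate(mu, 0.0, 1.0)

# Probabilistic: each B_i is first normalized by its integral
norms = [integrate(b, 0.0, 1.0) for b in B]
p = lambda u: sum(a * b(u) / s for a, b, s in zip(degrees, B, norms))
u_prob = integrate(lambda u: u * p(u), 0.0, 1.0) / integrate(p, 0.0, 1.0)

u_general_only = integrate(lambda u: u * B1(u), 0.0, 1.0) / norms[0]

print(u_general_only, u_mamdani, u_prob)
```

With these assumed shapes, the Mamdani result stays very close to the centroid of the general rule alone, while the probabilistic result moves substantially toward 0.15.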
Comment. It should be mentioned that the problem with specific rules occurs only in Mamdani's approach to fuzzy control. In the alternative logical approach, this problem does not appear; see, e.g., [11].
Another case when probabilistic control is better. The probabilistic interpretation enables us to naturally consider more general situations in which the rules are themselves probabilistic, i.e., when, for each $i$ and $j$, we know the conditional probability $P(U_i \mid X_j)$ that if $x$ has the property $X_j$, then $u$ has the property $U_i$.
In other words, instead of the original rules "if x has the property X i , then u has the property U i ", we now have rules "if x has the property X j , then u has the property U i with probability P (U i | X j )".
Indeed, in this case, due to the formula of full probability, the conditional probability density $P(u \mid x)$ of $u$ under the condition $x$ has the form
$$P(u \mid x) = \sum_{i=1}^{m} \sum_{j=1}^{n} P(X_j \mid x) \cdot P(U_i \mid X_j) \cdot P(u \mid U_i). \qquad (20)$$
Here, we know the original probabilities $P(U_i \mid X_j)$ and the probabilities $P(X_j \mid x) = A_j(x)$. The probability densities $P(u \mid U_i)$ can be determined by using the Bayes theorem, by the expression (17). Substituting formula (17) and the expression $P(X_j \mid x) = A_j(x)$ into formula (20) (and changing the multiplication order), we get the formula
$$P(u \mid x) = \sum_{i=1}^{m} \sum_{j=1}^{n} A_j(x) \cdot P(U_i \mid X_j) \cdot \frac{B_i(u)}{\int B_i(y)\,dy}.$$
Once we know these probabilities, we can produce the mean $\bar u$ by using formula (19).
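The case of probabilistic rules can be sketched in the same numerical style. The code below assumes that the density of $u$ given $x$ is the double sum over $i$ and $j$ of $A_j(x) \cdot P(U_i \mid X_j) \cdot B_i(u) / \int B_i(y)\,dy$, as described above, and computes its mean. The rule probabilities, the triangular shapes, and the names `mean_output` and `rule_prob` are hypothetical.

```python
def integrate(g, lo, hi, steps=4000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(g(lo + (k + 0.5) * dx) for k in range(steps)) * dx


def tri(center, half_width):
    """Triangular function with peak 1 at `center`."""
    return lambda t: max(0.0, 1.0 - abs(t - center) / half_width)


def mean_output(x, A, B, rule_prob, lo, hi):
    """Mean of u for rules 'if X_j then U_i with probability rule_prob[j][i]'."""
    norms = [integrate(b, lo, hi) for b in B]

    def density(u):
        return sum(A[j](x) * rule_prob[j][i] * B[i](u) / norms[i]
                   for i in range(len(B)) for j in range(len(A)))

    return integrate(lambda u: u * density(u), lo, hi) / integrate(density, lo, hi)


A = [tri(0.0, 1.0), tri(1.0, 1.0)]        # assumed A_j(x) = P(X_j | x) on [0, 1]
B = [tri(2.0, 1.0), tri(6.0, 3.0)]        # assumed B_i(u) = P(U_i | u) on [0, 10]
rule_prob = [[0.9, 0.1],                  # hypothetical P(U_1 | X_1), P(U_2 | X_1)
             [0.2, 0.8]]                  # hypothetical P(U_1 | X_2), P(U_2 | X_2)
deterministic = [[1.0, 0.0], [0.0, 1.0]]  # recovers the rules "if X_i then U_i"

print(mean_output(0.5, A, B, rule_prob, 0.0, 10.0))
print(mean_output(0.5, A, B, deterministic, 0.0, 10.0))
```

Setting the rule probabilities to the identity matrix recovers the deterministic rules "if $X_i$ then $U_i$", which gives a quick consistency check against the simpler case.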
In some cases, fuzzy control is better. We have shown that in some situations, probabilistic control is better than the original Mamdani's fuzzy control. However, in other situations, fuzzy control is better. Let us give two examples.
Case when Mamdani's formulas are better. The above probabilistic formulas only work for the case when $\sum_{i=1}^{n} A_i(x) = 1$, i.e., in probabilistic terms, when the properties $A_i$ are mutually exclusive. In practice, we may have non-exclusive properties, in which case we may have $\sum_{i=1}^{n} A_i(x) \ne 1$. It is not clear how to handle this situation within the probabilistic approach. However, such situations are not a problem if we apply fuzzy control: its formulas are applicable whether or not the requirement $\sum_{i=1}^{n} A_i(x) = 1$ is satisfied.
Other cases when Mamdani's formulas are better. The probabilistic interpretation is only possible when we use multiplication and addition as the "and" and "or" operations $f_\&$ and $f_\vee$.
Fuzzy control does not necessarily have to use these operations; it can use different t-norms and t-conorms. It is an empirical fact that in many control situations, the use of a t-norm different from the product and of a t-conorm different from the sum leads to a much better quality control, e.g., a more stable or a smoother one.
In [19], we have formulated the problem of selecting the t-norm and the t-conorm as a precise optimization problem, and for several objective functions like smoothness or stability, we gave explicit analytical solutions to these optimization problems: specifically, we described the selection that leads to the optimal values of smoothness or stability. In many of these cases, the optimal selection is indeed different from the probabilistic case of product and sum. Thus, fuzzy control methodology indeed leads to a better quality control.

Conclusion
The fuzzy transform (F-transform) techniques have lately been shown to be very successful in various applications, including applications where, until recently, only more traditional tools like the Fourier transform or the wavelet transform have been applied. In many other applications, however, the traditional tools have a clear advantage. It is therefore desirable to combine the F-transform with the more traditional tools, so as to combine the relative advantages of both techniques. To make this combination easier, it is desirable to interpret the F-transform in traditional mathematical terms.
In this paper, we describe a modification of a probabilistic interpretation described in [17].In this modification, the corresponding probabilistic model uniquely leads to the formulas of the F-transform.A similar modification is described in a more general situation of fuzzy modeling.