Detection of Fuzzy Association Rules by Fuzzy Transforms

We present a new method, based on fuzzy transforms, for detecting coarse-grained association rules in datasets. The fuzzy association rules are represented as linguistic expressions, and we introduce a pre-processing phase to determine the optimal fuzzy partition of the domains of the quantitative attributes. To extract the fuzzy association rules we use the AprioriGen algorithm and a confidence index calculated via the inverse fuzzy transform. The method is applied to datasets of the 2001 census database of the district of Naples (Italy); the results show that the extracted fuzzy association rules provide a correct coarse-grained view of the data association rule set.


Introduction
Fuzzy association rule extraction [1] is a fundamental process in data mining, with applications in many areas such as classification and information retrieval. Many techniques have been presented for extracting fuzzy association rules from datasets and databases; some authors use soft computing approaches such as evolutionary methods [14, 21-26] and clustering algorithms [24, 27, 28] to create fuzzy partitions of the data attribute domains. In many practical cases the user needs neither a detailed fuzzy partition of the attribute domains nor a fine exploration of the fuzzy association rules between attributes in the datasets. Instead, the purpose is often to acquire a more immediate, coarse-grained knowledge of hidden relations in the data, by creating a coarse-grained fuzzy partition of each attribute domain and by estimating fuzzy association rules with evaluative linguistic expressions.
Here we propose a new approach for detecting coarse-grained fuzzy association rules in datasets, based on fuzzy transforms (for short, F-transforms), which are already used for image analysis [18, 29-34], data analysis [31, 35], and forecasting [34, 36]. In particular, in [31] a way of extracting fuzzy association rules in a coarse-grained manner by using F-transforms is proposed. In this paper we follow this approach; our framework is composed of a pre-processing phase (necessary in order to obtain the optimal cardinality of the fuzzy partitions of the data attribute domains) and of two successive processes for extracting the fuzzy association rules. Let us consider a dataset represented by Table 1.
The dataset consists of objects (instances) $o_1, \dots, o_m$ described by the attributes $X_1, \dots, X_r$. A value $p_{ji}$ belonging to the domain of the attribute $X_i$ is called an item and a set of items is called an itemset. An association rule is a directional dependence between sets of attributes in the dataset. It is written as $S \to T$, where $S$ and $T$ are sets of attributes. The implication means that if all the items in $S$ exist in an object, then all the items in $T$ are also in the object with a high probability [2].

We define the context $w_i = [a_i, b_i]$ of the given attribute $X_i$ by setting $a_i = \min\{p_{1i}, \dots, p_{mi}\}$ and $b_i = \max\{p_{1i}, \dots, p_{mi}\}$. Thus we can consider a fuzzy partition $\{A_{i1}, A_{i2}, \dots, A_{in(i)}\}$ of $n(i)$ fuzzy sets for each context $w_i$. A fuzzy association rule $S \to T$ between two disjoint sets of attributes $S = \{X_1, \dots, X_k\}$ and $T = \{X_l, \dots, X_z\}$ can be generally expressed in the form (1). In [31] the F-transform approach is used to detect fuzzy association rules from a dataset in a coarse-grained view and to construct a framework of fuzzy association rules between two disjoint sets of attributes $S = \{X_1, \dots, X_k\}$ and $T = \{X_z\}$, in the form

$$(X_1 \text{ is } A_{1h(1)}) \text{ and } \dots \text{ and } (X_k \text{ is } A_{kh(k)}) \overset{F}{\sim} (\text{mean } X_z \text{ is } \aleph), \tag{2}$$
where (in accordance with [18, 29]) $A_{ih(i)}$ is the $h(i)$th basic function of the uniform partition of the $i$th context, associated with the node $x_{h(i)}$. Each clause in the antecedent has the linguistic meaning "$X_i$ is approximately $LV(A_{ih(i)})$", with $LV(A_{ih(i)})$ being the evaluative expression assigned to the fuzzy set $A_{ih(i)}$, and $\aleph$ is a pure evaluative linguistic expression that characterizes the component of the F-transform corresponding to the fuzzy sets $A_{1h(1)}, \dots, A_{kh(k)}$.
The term "mean" in the consequent derives from the fact that the component is obtained as a mean of the values of the attribute $X_z$ weighted over the basic functions $A_{1h(1)}, \dots, A_{kh(k)}$. The symbol $\overset{F}{\sim}$ represents an association between attributes obtained with the F-transforms. In other words, a fuzzy association rule expressed in the form (2) provides a synthetic valuation of the association between attributes, with the associations expressed linguistically according to the model of [18, 29] and the fuzzy sets $A_{1h(1)}, \dots, A_{kh(k)}$ in the antecedent being the basic functions of the uniform fuzzy partition of the corresponding context.
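Following the definitions and notations of [32], let $n \ge 2$ and let $x_1, x_2, \dots, x_n$ be points of a specific context $[a, b]$, called nodes, such that $a = x_1 < x_2 < \dots < x_n = b$. The fuzzy sets $A_1, \dots, A_n : [a, b] \to [0, 1]$, called basic functions, form a fuzzy partition of $[a, b]$ if
(A) $A_i(x_i) = 1$ for every $i = 1, 2, \dots, n$;
(B) $A_i(x) = 0$ if $x \notin (x_{i-1}, x_{i+1})$, where we set $x_0 = a$ and $x_{n+1} = b$;
(C) $A_i(x)$ is a continuous function on $[a, b]$;
(D) $A_i(x)$ strictly increases on $[x_{i-1}, x_i]$ for $i = 2, \dots, n$ and strictly decreases on $[x_i, x_{i+1}]$ for $i = 1, \dots, n-1$;
(E) $\sum_{i=1}^{n} A_i(x) = 1$ for every $x \in [a, b]$.
We say that the fuzzy sets $\{A_1(x), \dots, A_n(x)\}$ form a uniform fuzzy partition if, in addition,
(F) $n \ge 3$ and $x_i = a + h \cdot (i - 1)$, where $h = (b - a)/(n - 1)$ and $i = 1, 2, \dots, n$ (equidistance of the nodes);
(G) $A_i(x_i - x) = A_i(x_i + x)$ for every $x \in [0, h]$ and $i = 2, \dots, n-1$;
(H) $A_{i+1}(x) = A_i(x - h)$ for every $x \in [x_i, x_{i+1}]$ and $i = 1, \dots, n-1$.

An example of basic functions is given by triangular fuzzy sets. For example, if $n = 4$, $a = x_1 = 10$, and $b = x_4 = 25$, we obtain $h = (25 - 10)/3 = 5$. Table 2 shows the nodes that characterize each basic function.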
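If we define basic functions of triangular form, for example, we have for $k = 2, \dots, n - 1$:

$$A_1(x) = \begin{cases} 1 - \dfrac{x - x_1}{h} & \text{if } x \in [x_1, x_2], \\ 0 & \text{otherwise,} \end{cases} \qquad A_k(x) = \begin{cases} \dfrac{x - x_{k-1}}{h} & \text{if } x \in [x_{k-1}, x_k], \\ 1 - \dfrac{x - x_k}{h} & \text{if } x \in [x_k, x_{k+1}], \\ 0 & \text{otherwise,} \end{cases} \qquad A_n(x) = \begin{cases} \dfrac{x - x_{n-1}}{h} & \text{if } x \in [x_{n-1}, x_n], \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

Figure 1 shows the four basic functions forming the fuzzy partition of the context $[10, 25]$ given in (3) for $n = 4$.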
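The construction of a uniform triangular partition can be illustrated with a minimal Python sketch. The function names `uniform_nodes` and `triangular` are hypothetical and chosen only for this illustration; indices are 0-based, whereas the text numbers the basic functions from 1.

```python
def uniform_nodes(a, b, n):
    """Equidistant nodes on [a, b] with step h = (b - a) / (n - 1)."""
    if n < 3:
        raise ValueError("a uniform fuzzy partition needs n >= 3")
    h = (b - a) / (n - 1)
    return [a + h * i for i in range(n)], h


def triangular(x, k, nodes, h):
    """Membership of x (assumed to lie in [a, b]) in the k-th triangular
    basic function, k = 0, ..., n-1 (0-based, an assumption of this sketch)."""
    return max(0.0, 1.0 - abs(x - nodes[k]) / h)


# Example from the text: n = 4 on the context [10, 25]
nodes, h = uniform_nodes(10, 25, 4)
memberships = [triangular(12.0, k, nodes, h) for k in range(4)]
```

For any point of the context the memberships sum to 1, and at most two adjacent basic functions are non-zero at the same point.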
The method given in [31] can be very useful when we need to extract fuzzy association rules from a dataset in an approximate way: for each attribute a coarse-grained uniform fuzzy partition of its context is created, and the evaluative linguistic expression in the consequent represents a weighted mean of the values of the attribute $X_z$. Nevertheless, as pointed out in [35], this approach does not take into account that the data must be sufficiently dense with respect to the chosen fuzzy partition; otherwise the F-transforms cannot be used. To avoid choosing a fuzzy partition of the contexts that is either too fine or too coarse, it is necessary to define a pre-processing phase which determines the optimal fuzzy partition with respect to the density of the data. Here we propose a technique based on F-transforms to detect the framework of strong fuzzy association rules of the form (2) in a dataset. In our method we determine the best uniform fuzzy partition of each context, formed by triangular fuzzy sets as in Figure 1. For each context $w_i$, $i = 1, \dots, r$, of the dataset in Table 1, an initial uniform fuzzy partition of $n_i$ triangular fuzzy sets (3) is established.
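To check that the data are sufficiently dense with respect to the chosen partition, we must verify that every combination of basic functions covers a minimal number of data points; this is done in the pre-processing phase described in the next section. We test our method on datasets extracted from the 2001 census database of ISTAT (Istituto Nazionale di Statistica). These databases contain information on population, buildings, housing, families, and employment for each census zone of Naples. The reliability of the extracted fuzzy association rules is also discussed.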

Pre-Processing and Extraction Fuzzy Association Rule Phases
We use a pre-processing phase that determines the optimal value of $n$, starting from an initial cardinality of the fuzzy partitions. This phase ensures that the data are sufficiently dense with respect to the chosen fuzzy partitions, that is, that every combination of basic functions covers a minimal number of data points $(x_{i1}, x_{i2}, \dots, x_{ir})$. For each combination $(A_{1h(1)}, A_{2h(2)}, \dots, A_{rh(r)})$ we calculate the value $\rho_{h(1)h(2)\cdots h(r)}$, which is the number of data points $(x_{i1}, x_{i2}, \dots, x_{ir})$ for which

$$A_{1h(1)}(x_{i1}) \cdot A_{2h(2)}(x_{i2}) \cdots A_{rh(r)}(x_{ir}) > 0.$$

The user can fix a minimal density $\rho_\varepsilon$ of data, in the sense that

$$\rho_{h(1)h(2)\cdots h(r)} \ge \rho_\varepsilon$$

for every combination of basic functions. Fulfilling this constraint determines a value of the parameter $n$ that is consistent with the distribution of the data over the uniform fuzzy partitions. The threshold $\rho_\varepsilon$ allows us to control how fine the uniform fuzzy partition set of the attributes may be. If $\rho_\varepsilon = 0$, we obtain the more coarse-grained fuzzy partition set in accordance with this constraint. As examples, in Figures 2 and 3 we show the case of two attributes, $X_1$ and $X_2$. Each object of the dataset is represented as a point in the Cartesian graph of $X_1$ and $X_2$, and in both examples we consider $\rho_\varepsilon = 5$.
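The density check and the search for the cardinality $n$ can be sketched in Python as follows. This is only an illustration under stated assumptions: triangular basic functions on uniform partitions, the same $n$ for every attribute, every attribute taking at least two distinct values, and the decrement-by-one strategy (stopping below $n = 3$) of the pre-processing phase. The names `active_indices`, `min_density`, and `choose_cardinality` are hypothetical.

```python
from collections import Counter
from itertools import product


def active_indices(x, lo, h, n):
    """0-based indices k of the triangular basic functions (nodes lo + k*h)
    whose membership at x is strictly positive."""
    return [k for k in range(n) if 1.0 - abs(x - (lo + k * h)) / h > 0.0]


def min_density(data, n):
    """Smallest number of data points covered by any combination of basic
    functions, one per attribute, for partitions of cardinality n."""
    r = len(data[0])
    lows = [min(row[i] for row in data) for i in range(r)]
    steps = [(max(row[i] for row in data) - lows[i]) / (n - 1) for i in range(r)]
    counts = Counter()
    for row in data:
        per_attr = [active_indices(row[i], lows[i], steps[i], n) for i in range(r)]
        counts.update(product(*per_attr))  # combinations this point activates
    if len(counts) < n ** r:               # some combination covers no point
        return 0
    return min(counts.values())


def choose_cardinality(data, n0, rho_eps):
    """Decrease n from n0 until the density constraint holds.
    Returns None when no n >= 3 works (inconsistent dataset)."""
    n = n0
    while n >= 3:
        if min_density(data, n) >= rho_eps:
            return n
        n -= 1
    return None


# Toy dataset: a 9 x 9 grid of points with two attributes
grid = [(x, y) for x in range(9) for y in range(9)]
best_n = choose_cardinality(grid, n0=5, rho_eps=5)
```

Counting only the combinations that each point activates avoids evaluating all $n^r$ products for every object; combinations that never appear in the counter have density zero.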
In Figure 2 the partition for $n = 8$ is too fine; in fact $\rho_{h(1)h(2)} = 4 < \rho_\varepsilon = 5$ for the combination of triangular fuzzy sets $(A_{11}, A_{28})$. In Figure 3 we have a more coarse-grained fuzzy partition for $n = 7$, which should be optimal. A finer partition set does not satisfy the constraint of minimum density of data, and a coarser partition set would be under-sampled with respect to the dataset dimension. In this pre-processing phase we start with a value $n_0$ of the parameter $n$. If the partition set is too fine, we decrement the value of $n$ by 1 until we determine the optimal partition. If $n \le 2$, the dataset is too coarse grained for the extraction of fuzzy rules via F-transforms and the process is stopped (inconsistent dataset); otherwise we use the optimal partition in the successive step of fuzzy rule extraction. Figure 4 gives the schema of the pre-processing phase.

Following [31], in the extraction process we establish fuzzy association rules of the form (2) by calculating the support index as the percentage of objects in the dataset for which the antecedent of the fuzzy association rule (2) is not null:

$$\mathrm{sup} = \frac{\left|\{\, o_j : A_{1h(1)}(p_{j1}) \cdots A_{kh(k)}(p_{jk}) > 0 \,\}\right|}{m},$$

where $m$ is the dimension of the dataset. Then we calculate the confidence of each rule to evaluate the precision of a potential fuzzy association rule. Normally the confidence index is given by the ratio of the number of objects in the dataset for which both the antecedent and the consequent of the fuzzy association rule (2) are not null to the number of objects for which the antecedent is not null. Here we use instead the confidence index (7) proposed in [31], which is defined in terms of $H^F_n(o_i)$, the value of the inverse F-transform applied to the $i$th data object, and of $F_{h(1)h(2)\cdots h(k)}$, the F-transform component associated with the fuzzy sets $(A_{1h(1)}, \dots, A_{kh(k)})$.

The inverse F-transform has already been used in [31, 35] to model the dependency of the attribute $X_z$ on the predictors $X_1, \dots, X_k$ as $X_z = H(X_1, \dots, X_k)$, where $H$ is a function estimated via a suitable fuzzy partition of the independent attribute domains. The formula (7) provides an estimate of the grade of precision of a potential fuzzy association rule. If the index is equal to 1, then the error in the approximation of $X_z$ obtained with the inverse fuzzy transform $H^F_n(o_i)$ is null.

In our framework we use two sub-processes for extracting the fuzzy association rules. In the first sub-process we use the AprioriGen algorithm to extract the candidate fuzzy association rules with maximal dimension and support greater than or equal to a threshold $\mathrm{sup}_\varepsilon$. In the second sub-process, the corresponding inverse fuzzy transform and the confidence index (7) are calculated for each candidate fuzzy association rule. The fuzzy association rule is extracted if the index (7) is greater than or equal to a threshold $\mathrm{con}_\varepsilon$; in this case we determine the evaluative linguistic expression $\aleph$ from the component $F_{h(1)h(2)\cdots h(k)}$ of the F-transform corresponding to the fuzzy sets $(A_{1h(1)}, A_{2h(2)}, \dots, A_{kh(k)})$ (this assignment is described in detail in Section 4). In Figure 5 we show the processes used for extracting fuzzy association rules in the form (2).
Summarizing, in the first step the AprioriGen algorithm is applied to determine the set of potential fuzzy association rules; in the next step the direct F-transform, the inverse F-transform, and the index "con" are calculated for each potential fuzzy association rule. If $\mathrm{con} < \mathrm{con}_\varepsilon$ the potential fuzzy association rule is discarded; otherwise it is extracted and inserted as a strong fuzzy association rule in the final set of rules.
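As an illustration of the support index used in the first step, the following Python sketch counts the objects whose antecedent is not null. It assumes triangular basic functions on uniform partitions described by a lower bound and a step per attribute; the names `membership` and `support`, and the dictionary encoding of the antecedent, are hypothetical choices for this example.

```python
def membership(x, k, lo, h):
    """k-th (0-based) triangular basic function with node lo + k*h."""
    return max(0.0, 1.0 - abs(x - (lo + k * h)) / h)


def support(rows, antecedent, lows, steps):
    """Fraction of objects for which every clause of the antecedent has a
    strictly positive membership. `antecedent` maps an attribute index i
    to the index h(i) of the chosen basic function (both 0-based)."""
    hits = sum(
        1
        for row in rows
        if all(membership(row[i], k, lows[i], steps[i]) > 0.0
               for i, k in antecedent.items())
    )
    return hits / len(rows)


# Toy data: two attributes, partitions with n = 3 (steps 2.0 and 4.0)
rows = [(0, 0), (1, 4), (2, 8), (3, 2), (4, 6)]
sup_both = support(rows, {0: 1, 1: 1}, lows=[0, 0], steps=[2.0, 4.0])
sup_first = support(rows, {0: 1}, lows=[0, 0], steps=[2.0, 4.0])
```

A candidate antecedent would be kept when its support is at least the threshold $\mathrm{sup}_\varepsilon$.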

Discrete F-Transforms in Several Variables
We first deal with functions of one variable, in the discrete case only; that is, we assume that a function $f$ takes known values at the points of a set $P = \{p_1, \dots, p_m\}$ of the interval $[a, b]$, $a, b \ge 0$. If $P$ is sufficiently dense with respect to the fixed partition $\{A_1, A_2, \dots, A_n\}$, that is, for each $i = 1, \dots, n$ there exists an index $j \in \{1, \dots, m\}$ such that $A_i(p_j) > 0$, we can define the $n$-tuple $[F_1, F_2, \dots, F_n]$ as the discrete F-transform of $f$ with respect to the basic functions $\{A_1, A_2, \dots, A_n\}$, where each $F_i$ is given by

$$F_i = \frac{\sum_{j=1}^{m} f(p_j)\, A_i(p_j)}{\sum_{j=1}^{m} A_i(p_j)}$$

for $i = 1, \dots, n$. We also define the inverse F-transform of $f$ with respect to the basic functions $\{A_1, A_2, \dots, A_n\}$ by setting

$$f^F_n(p_j) = \sum_{i=1}^{n} F_i\, A_i(p_j)$$

for every $j \in \{1, \dots, m\}$. An approximation theorem for the inverse F-transform is given in [32, Theorem 5].
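The one-dimensional discrete direct and inverse F-transforms can be sketched in Python as follows, assuming a uniform partition of triangular basic functions. The function names are hypothetical, and indices are 0-based.

```python
def triangular(x, k, a, h):
    """k-th (0-based) triangular basic function with node a + k*h."""
    return max(0.0, 1.0 - abs(x - (a + k * h)) / h)


def f_transform(points, values, a, b, n):
    """Discrete direct F-transform: weighted means of the values of f,
    one component per basic function."""
    h = (b - a) / (n - 1)
    components = []
    for k in range(n):
        weights = [triangular(p, k, a, h) for p in points]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("points are not sufficiently dense for this partition")
        components.append(sum(w * v for w, v in zip(weights, values)) / total)
    return components


def inverse_f_transform(p, components, a, b):
    """Discrete inverse F-transform evaluated at the point p."""
    n = len(components)
    h = (b - a) / (n - 1)
    return sum(comp * triangular(p, k, a, h) for k, comp in enumerate(components))


# f(p) = p sampled at five points of [0, 4], partition with n = 3
points = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [0.0, 1.0, 2.0, 3.0, 4.0]
F = f_transform(points, values, 0.0, 4.0, 3)
approx = [inverse_f_transform(p, F, 0.0, 4.0) for p in points]
```

The `ValueError` corresponds to the density requirement: a component is undefined when no data point has positive membership in its basic function.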

Fuzzy Association Rules Extraction Process
We use the multi-dimensional direct and inverse F-transforms for extracting fuzzy association rules of the form (2) from the dataset represented as in Table 1, where $(A_{i1}, \dots, A_{in})$, $i = 1, \dots, k$, is the uniform fuzzy partition, with the basic functions (3), of the context $w_i = [a_i, b_i]$ of the given attribute $X_i$, obtained by setting $a_i = \min\{p_{1i}, \dots, p_{mi}\}$ and $b_i = \max\{p_{1i}, \dots, p_{mi}\}$. For computational simplicity, we assume that the cardinality of the fuzzy partition is the same for all the contexts $w_i$, that is, $\mathrm{card}\{A_{i1}, \dots, A_{in}\} = n$ for every $i$. Let $F_{h(1)h(2)\cdots h(k)}$ be the component of the multi-dimensional direct F-transform defined by (11) corresponding to the combination $(A_{1h(1)}, \dots, A_{kh(k)})$. If $p_{jz}$ is the given (expected) value, then the formula (11) reduces to the following one:

$$F_{h(1)h(2)\cdots h(k)} = \frac{\sum_{j=1}^{m} p_{jz}\, A_{1h(1)}(p_{j1}) \cdots A_{kh(k)}(p_{jk})}{\sum_{j=1}^{m} A_{1h(1)}(p_{j1}) \cdots A_{kh(k)}(p_{jk})}.$$

In other words, $F_{h(1)h(2)\cdots h(k)}$ is a mean of the values of the attribute $X_z$ weighted over $(A_{1h(1)}, \dots, A_{kh(k)})$.

Following [18, 29], $\aleph$ is given by the combination of one of the following linguistic hedges: Ex (extremely), Si (significantly), Ve (very), empty hedge, ML (more or less), Ro (roughly), QR (quite roughly), VR (very roughly), with one of the following expressions: Sm (small), Me (medium), Bi (big). Each linguistic hedge is modelled with a continuous function $\nu_{abc}$ defined by means of three parameters $a$, $b$, $c$, with $0 \le a < b < c \le 1$ (see [18, 29]). The fuzzy sets of each combination of a linguistic hedge with one of the expressions "small", "medium", "big" are defined by

$$\mathrm{Int}_{\langle\text{hedge}\rangle\ \text{small}}(x) = \nu_{abc}(LH(x)), \quad \mathrm{Int}_{\langle\text{hedge}\rangle\ \text{medium}}(x) = \nu_{abc}(MH(x)), \quad \mathrm{Int}_{\langle\text{hedge}\rangle\ \text{big}}(x) = \nu_{abc}(RH(x)), \tag{17}$$

where "Int" stands for the intensity of the linguistic expression and $LH$, $MH$, $RH$ are three linear functions. The extension of a linguistic expression is obtained via a simple linear transformation (18) defined on the context $w_z = [a_z, b_z]$ of the attribute $X_z$. The linguistic expression $\aleph$ represents a weighted mean of the values of the attribute $X_z$, in which the weights are given by the membership values of the basic functions. In Figure 6 we show the three linear functions $LH$, $MH$, $RH$.
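The weighted mean that defines a multi-dimensional F-transform component can be sketched in Python as below. The sketch assumes triangular basic functions on uniform partitions and uses the product of the memberships of the antecedent clauses as the weight of each object; the names `membership` and `component` and the toy data are hypothetical.

```python
from math import prod


def membership(x, k, lo, h):
    """k-th (0-based) triangular basic function with node lo + k*h."""
    return max(0.0, 1.0 - abs(x - (lo + k * h)) / h)


def component(rows, z, antecedent, lows, steps):
    """Mean of the values of attribute z, weighted by the product of the
    memberships of the antecedent clauses. `antecedent` maps an attribute
    index i to the index h(i) of its basic function (both 0-based)."""
    num = den = 0.0
    for row in rows:
        w = prod(membership(row[i], k, lows[i], steps[i])
                 for i, k in antecedent.items())
        num += w * row[z]
        den += w
    if den == 0.0:
        raise ValueError("no object activates this combination of basic functions")
    return num / den


# Toy data: attributes 0 and 1 are predictors, attribute 2 plays the role of X_z
rows = [(0, 0, 10.0), (1, 4, 20.0), (2, 8, 30.0), (3, 2, 40.0), (4, 6, 50.0)]
F_11 = component(rows, z=2, antecedent={0: 1, 1: 1}, lows=[0, 0], steps=[2.0, 4.0])
```

In this toy case only the second and fourth objects have non-zero weight (0.5 and 0.25), so the component is $(0.5 \cdot 20 + 0.25 \cdot 40)/0.75$.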
For example, in Figure 7 (resp., Figure 8) we show the fuzzy sets "Ro small", "Ro medium", and "Ro big" (resp., "Ex small", "Ex medium", and "Ex big") determined by setting $a = 0.49$, $b = 0.5$, $c = 0.51$ (resp., $a = 0.03$, $b = 0.5$, $c = 0.96$). The experts can assign to the linguistic expressions specific labels which are representative of their reasoning, or can use the same label for several fuzzy sets. For example, they can associate the label "perfectly on the average" with the fuzzy set "Ex medium". In accordance with [18, 29], we define a partial ordering "≤" on the set of the linguistic hedges as

Ex exp ≤ Si exp ≤ Ve exp ≤ (empty hedge) exp ≤ ML exp ≤ Ro exp ≤ QR exp ≤ VR exp,

where exp is one of the following expressions: "small", "medium", or "big". If we obtain the same membership degree for two or more linguistic expressions, we assign to $\aleph$ the linguistic expression of the "lowest fuzzy set", which is the sharpest evaluative expression with respect to the partial ordering "≤". For example, if $x = 1$, we assign the linguistic expression "Ex big" to $\aleph$, because "Ex big" is the lowest among all the fuzzy sets "⟨hedge⟩ big" (like "Ex big" and "Ro big") assuming the value 1.

In order to use the formulae (18), the dataset has to be sufficiently dense with respect to the chosen fuzzy partitions of basic functions. We set the optimal value of the parameter $n$ using the pre-processing phase schematized in Figure 4, in which we check that the number of data points $p_j = (p_{j1}, p_{j2}, \dots, p_{jr})$ such that $A_{1h(1)}(p_{j1}) \cdot A_{2h(2)}(p_{j2}) \cdots A_{rh(r)}(p_{jr}) > 0$ is at least $\rho_\varepsilon$ for each combination $(A_{1h(1)}, A_{2h(2)}, \dots, A_{rh(r)})$, where $\rho_\varepsilon$ is a prefixed threshold; otherwise the cardinality $n$ of each fuzzy partition is decremented and the process is iterated. In this way we enforce the sufficient density of the data points with respect to the set of fuzzy partitions, and the fuzzy partition created is the finest one, not exceeding the initial cardinality, that satisfies the density constraint. We set the initial value of the cardinality of the fuzzy partitions to $n_0$. The pseudocode of the algorithm of the pre-processing phase is reported below.
(1) Set $n := n_0$
(2) Set the minimal data point density $\rho_\varepsilon$
(3) While $n \ge 3$
(4) Create the uniform fuzzy partition of $n$ triangular fuzzy sets (3) for each context
(5) Calculate $\rho_{h(1)h(2)\cdots h(r)}$ for each combination $(A_{1h(1)}, \dots, A_{rh(r)})$
(6) If $\rho_{h(1)h(2)\cdots h(r)} \ge \rho_\varepsilon$ for every combination then return $n$
(7) $n := n - 1$
(8) End while
(9) Stop (inconsistent dataset)

The successive extraction process of the fuzzy association rules is composed of two sub-processes, schematized in Figure 5. In the first sub-process we use the AprioriGen algorithm for selecting the candidate fuzzy association rules and for choosing the antecedents with maximal dimension and support greater than or equal to the threshold $\mathrm{sup}_\varepsilon$. The AprioriGen algorithm is composed of two steps: the join step and the pruning step. In the join step a candidate itemset formed by $k$ attributes is generated by merging two $(k-1)$-itemsets having the same first $(k-2)$ attributes. In the pruning step all the candidates having a $(k-1)$-subset that is not large (frequent) are deleted. In the second sub-process the fuzzy association rule is extracted if the grade of confidence (7) is greater than or equal to the threshold $\mathrm{con}_\varepsilon$; the confidence is calculated via the inverse F-transform [31, 35] given by the following formula (similar to (12)):

$$H^F_n(o_j) = \sum_{l(1)=1}^{n} \cdots \sum_{l(k)=1}^{n} F_{l(1)l(2)\cdots l(k)}\, A_{1l(1)}(p_{j1}) \cdots A_{kl(k)}(p_{jk}),$$

obtained by considering all the multi-dimensional F-transform components $F_{l(1)l(2)\cdots l(k)}$ of the combinations of basic functions $(A_{1l(1)}, \dots, A_{kl(k)})$, $l(i) = 1, \dots, n$. If the confidence index (7) is greater than or equal to the threshold $\mathrm{con}_\varepsilon$, the F-transform component corresponding to the basic functions $(A_{1l(1)}, \dots, A_{kl(k)})$ in the antecedent of the potential fuzzy association rule is used for determining the linguistic expression $\aleph$ in the consequent of the final fuzzy association rule. If we obtain the same membership degree for two or more linguistic expressions, we assign to $\aleph$ the linguistic expression of the lowest fuzzy set according to the above partial ordering "≤". The related pseudocode is reported below.
(…)
(8) Next
(9) Calculate the confidence index (7) and call it "con"
(10) If $\mathrm{con} \ge \mathrm{con}_\varepsilon$ then use the F-transform component $F_{l(1)l(2)\cdots l(k)}$ to assign $\aleph$ and insert the fuzzy association rule in the fuzzy association rule set
(11) End if
(12) Next
(13) Return the fuzzy association rule set

In Section 5 we present some results obtained from datasets of the 2001 ISTAT (Istituto Nazionale di Statistica) census concerning the municipalities of the district of Naples (Italy).
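The join and pruning steps of AprioriGen described above can be sketched in Python as follows. This is a generic candidate generator in the usual Apriori style, not the authors' implementation: itemsets are assumed to be sorted tuples of comparable items, and the function name `apriori_gen` is hypothetical. In the fuzzy setting an item would be a pair (attribute, basic function), and one would presumably also discard candidates containing two fuzzy sets of the same attribute; that filter is omitted here.

```python
from itertools import combinations


def apriori_gen(frequent_prev):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets.

    frequent_prev: iterable of sorted tuples, all of the same length k-1.
    Returns the set of candidate k-itemsets (sorted tuples)."""
    frequent = set(frequent_prev)
    ordered = sorted(frequent)
    candidates = set()
    for i, s in enumerate(ordered):
        for t in ordered[i + 1:]:
            # Join step: merge itemsets sharing the same first k-2 items.
            if s[:-1] != t[:-1]:
                continue
            cand = s + (t[-1],)
            # Pruning step: every (k-1)-subset must itself be frequent.
            if all(sub in frequent for sub in combinations(cand, len(cand) - 1)):
                candidates.add(cand)
    return candidates


# Five frequent 3-itemsets over the items 1..5
L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
C4 = apriori_gen(L3)
```

In this example the join step produces (1, 2, 3, 4) and (1, 3, 4, 5); the second is pruned because its subset (1, 4, 5) is not frequent.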

A Simulation Result
We consider a first dataset extracted from the last ISTAT census database of the municipalities of the district of Naples (Italy), containing information about employed residents and families. We use the following notation: $X_1$ stands for the census code, $X_2$ for the percentage of non-employed residents, $X_3$ for the percentage of managers and professionals, $X_4$ for the percentage of employed women, $X_5$ for the percentage of employed graduates, $X_6$ for the percentage of owner-occupied residential houses, $X_7$ for the percentage of families owning two or more houses, and finally $X_8$ for the percentage of families with more than two children. In the pre-processing phase we set $\rho_\varepsilon = 5$, obtaining $n = 5$ as the optimal cardinality of the partition of each attribute. Each fuzzy partition is uniform and constructed by using five triangular fuzzy sets of the form (3). We set both $\mathrm{sup}_\varepsilon$ and $\mathrm{con}_\varepsilon$ to 0.1. The domain expert has suggested the values reported in Table 3 for the parameters $a$, $b$, $c$.
After the extraction process we obtain four fuzzy association rules (cf. Table 4). To verify the reliability of these results, in Table 5 we report the values (as percentages) of the confidence index given by (7), obtained for each basic function of the attribute $X_z$, $z = 2, 4, 6, 8$, in the consequent of the extracted fuzzy association rules. The linguistic expression in the consequent of the fuzzy association rules in Table 4 can be roughly interpreted as a mean of the fuzzy sets $A_{z1}, \dots, A_{z5}$ weighted by the value of the confidence index. In Table 6 we report the percentages of the inclusion areas of each basic function in the fuzzy set associated with the linguistic expression in the consequent, with respect to the area of the basic function. Each inclusion area is the area of the intersection between the basic function and the fuzzy set.
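The percentage of inclusion can be approximated numerically, as in the following Python sketch. It assumes that the intersection of two fuzzy sets is their pointwise minimum (the text does not state which intersection operator is used) and integrates with the midpoint rule; the function name `inclusion_percentage` and the example membership functions are hypothetical.

```python
def inclusion_percentage(basic, target, lo, hi, steps=10000):
    """100 * area(min(basic, target)) / area(basic) on [lo, hi],
    approximated with the midpoint rule. `basic` and `target` are
    membership functions mapping a number to a value in [0, 1]."""
    dx = (hi - lo) / steps
    inter = area = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        b = basic(x)
        inter += min(b, target(x)) * dx
        area += b * dx
    return 100.0 * inter / area


def triangle(x):
    """Triangular basic function with node 5 and half-width 5."""
    return max(0.0, 1.0 - abs(x - 5.0) / 5.0)


def left_half(x):
    """Crisp set covering the left half of the context [0, 10]."""
    return 1.0 if x <= 5.0 else 0.0


pct = inclusion_percentage(triangle, left_half, 0.0, 10.0)
```

Here the crisp set covers exactly the left half of the symmetric triangle, so the result is approximately 50.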
From the comparison of Tables 5 and 6, the confidence index is approximately equal to the corresponding percentage of inclusion. In Figure 9 (resp., Figure 10) we show graphically the inclusion areas for the association rule R2 (resp., R3). The fuzzy set associated with the linguistic expression in the consequent is shown in orange. We can therefore state that the fuzzy association rules in Table 4 can be interpreted as coarse-grained fuzzy association rules in which the linguistic expression in the consequent approximates a mean of the finer fuzzy sets given by the basic functions (3) for the attribute $X_z$. This approximation clearly depends on the values of $a$, $b$, $c$.
The next dataset consists of attributes describing characteristics of residential buildings and houses. The notation for the attributes is the following: $X_1$ stands for the census code, $X_2$ for the percentage of residential buildings constructed during the last 5 years, $X_3$ for the percentage of residential buildings with maintenance during the last 5 years, $X_4$ for the mean year of last maintenance, $X_5$ for the mean number of residential houses, $X_6$ for the mean surface of residential houses, and $X_7$ for the percentage of residential houses with central heating. In the pre-processing phase we set $\rho_\varepsilon = 5$, obtaining $n = 6$ as the optimal cardinality of the partition of each attribute. In Figure 11 we show the six basic functions of the form (3) which give the uniform fuzzy partition of each context. After the extraction process we obtain three fuzzy association rules (cf. Table 7). Then we calculate the confidence index (7) for all the antecedents of the fuzzy association rules in Table 7. In Table 8 we report, as percentages, the values of the confidence index obtained for each basic function of the attribute $X_z$ in the consequent of the extracted fuzzy association rules.
In Table 9 we report the inclusion areas of each basic function with the fuzzy set associated to the linguistic expression in the consequent with respect to the area of the basic function.
By comparing Tables 8 and 9, we can state that for this dataset too the linguistic expression in the fuzzy association rules can be roughly interpreted as a mean of the fuzzy sets $A_{z1}, \dots, A_{z6}$ weighted by the value of the confidence index. In Figure 12 (resp., Figure 13) we show graphically the inclusion areas for the association rule R1 (resp., R2). The fuzzy set associated with the linguistic expression in the consequent is shown in orange.
The results confirm that the F-transforms can be used to extract fuzzy association rules of the form (2) from datasets in a coarse-grained view. The comparison of the results suggests that the linguistic evaluation used for the attribute $X_z$ in the consequent, calculated using the inverse F-transform, can be estimated as a weighted mean of the finer fuzzy sets given by the basic functions (3), where the weights are given by the confidence index (7).

Conclusions
We propose the use of multi-dimensional F-transforms to extract fuzzy association rules from datasets in a coarse-grained form. Our approach always checks that the set of assigned points is sufficiently dense with respect to the basic functions of the partition, and we use the support and confidence indexes for selecting and analyzing fuzzy association rules.
This method can be used in data mining processes in which a fine exploration of fuzzy association rules between attributes in the datasets is not necessary. In future work the authors intend to explore the performance of this method on very large datasets and to compare the results with the ones obtained by using other well-known existing methods, such as clustering-based and evolutionary-based ones.

Figure 3: Example of correct uniform fuzzy partition set.

Figure 4: Schema of the pre-processing phase.

Figure 5: Extraction process of fuzzy association rules.

Figure 9: Inclusion areas for the fuzzy association rule R2.

Figure 10: Inclusion areas for the fuzzy association rule R3.

Figure 11: Basic functions used for the residential buildings dataset.

Figure 12: Inclusion areas for the fuzzy association rule R1.

Figure 13: Inclusion areas for the fuzzy association rule R2.

Table 1: Schema of a dataset.

Table 3: Parameters for fuzzy sets associated to the linguistic expressions.

Table 4: Fuzzy rules extracted from the resident dataset.

Table 5: Percentages of the confidence index for the fuzzy association rules.

Table 6: Percentages of inclusion obtained for the fuzzy association rules.

Table 7: Fuzzy association rules extracted from the residential buildings dataset.

Table 8: Percentages of the confidence index for the fuzzy association rules.

Table 9: Percentages of inclusion obtained for the fuzzy association rules.