Fuzzy Lattice Reasoning for Pattern Classification Using a New Positive Valuation Function

This paper describes an enhancement of the fuzzy lattice reasoning (FLR) classifier for pattern classification based on a positive valuation function. FLR was recently introduced as a lattice data domain extension of the fuzzy ARTMAP neural classifier, based on a lattice inclusion measure function. In this work, we improve the performance of the FLR classifier by defining a new nonlinear positive valuation function. As a consequence, the modified algorithm achieves better classification results. The effectiveness of the modified FLR is demonstrated by examples on several well-known pattern recognition benchmarks.


Introduction
Much attention has been paid lately to applications of lattice theory [1] in different fields, including neural networks [2]. Artificial neural networks whose computation is based on lattice algebra have become known as morphological neural networks [3,4]. Lattices are popular in mathematical morphology, including image processing applications [5,6]. Moreover, algebraic lattices have been used for modeling associative memories [7]. In [8], the storage capacity limitation of associative memories [9,10] has been eliminated by proposing one-way and bidirectional lattice associative memories. Furthermore, lattices are used implicitly in some neural networks such as fuzzy ART and min-max [11,12], as explained in [2,13]. A practical advantage of lattice theory is the ability to model both uncertain information and disparate types of lattice-ordered data [14].
The term fuzzy lattice was proposed by Nanda in 1989 on the basis of the concept of a fuzzy partial-order relation [15]. Several authors have employed the notion of a "fuzzy lattice" in mathematics, emphasizing algebraic properties of lattice ideals [16,17]. Furthermore, the notion of a fuzzy concept lattice has been studied in [18-20]. Sussner and Esmi [21] have introduced the morphological perceptron with a fusion of fuzzy lattices for competitive learning.
Fuzzy lattices have also been used in clustering and classification algorithms. More specifically, independently of the development of morphological neural networks, Petridis and Kaburlasos [13] found inspiration in lattice theory and versions of the ART model and devised another successful approach to lattice-based computational intelligence. They proposed a fundamentally new and inherently hierarchical approach to neurocomputing named fuzzy lattice neurocomputing (FLN) [14]. Moreover, the fuzzy lattice reasoning (FLR) classifier was introduced for inducing descriptive, decision-making knowledge (rules) in a mathematical data domain including the space $\mathbb{R}^N$, and it has been successfully applied to a variety of problems such as ambient ozone estimation [22] and air quality assessment [23]. Decision making in FLR is based on an inclusion measure function; in turn, the definition of an inclusion measure is based on a positive valuation function.
The original FLR model employs a linear positive valuation function to define an inclusion measure. Liu et al. [24] proposed a nonlinear valuation function (arctan) for computing the inclusion measure function and successfully applied it to several benchmark data sets.
In this work, we apply the FLR algorithm to solve pattern classification problems without feature extraction and improve its performance based on a new nonlinear positive valuation function. As a consequence, the modified algorithm achieves better classification results. The effectiveness of the modified FLR is demonstrated by examples on several well-known benchmarks.
The layout of this paper is as follows. In Section 2, the mathematical background of fuzzy lattices is reviewed. Section 3 explains the modified fuzzy lattice reasoning classifier model. Section 4 provides experimental results that demonstrate the performance of the modified FLR. Finally, Section 5 summarizes the results of this work.

Mathematical Background
A lattice (L, ≤) is a partially ordered set (or, simply, poset) such that any two of its elements a, b ∈ L have a greatest lower bound a ∧ b = inf{a, b} and a least upper bound a ∨ b = sup{a, b}. The lattice operations ∧ and ∨ are also called meet and join, respectively. A lattice (L, ≤) is called complete when each of its subsets has a least upper bound and a greatest lower bound in L [1]. A nonvoid complete lattice has a least element and a greatest element denoted by O and I, respectively. The inverse ≥ of an order relation ≤ is itself an order relation. The order ≥ is called the dual order of ≤.
The lattice operations meet and join of a product lattice $(L, \le) = (L_1, \le_1) \times \cdots \times (L_N, \le_N)$ are defined componentwise as follows:
$$a \wedge b = (a_1 \wedge b_1, \ldots, a_N \wedge b_N), \qquad a \vee b = (a_1 \vee b_1, \ldots, a_N \vee b_N).$$
A valuation on a crisp lattice L is a real-valued function $v : L \to \mathbb{R}$ that satisfies $v(a) + v(b) = v(a \wedge b) + v(a \vee b)$ for all a, b ∈ L. A valuation is called positive if and only if a < b implies v(a) < v(b). We remark that the purpose of a positive valuation function v is to assign real values to lattice elements so that they can be handled quantitatively. Choosing a suitable valuation function is problem dependent. Decision making by the FLR is based on an inclusion measure function; therefore, a proper positive valuation function might improve performance.
Definition 1. An inclusion measure σ in a complete lattice L with least element O and greatest element I is a mapping $\sigma : L \times L \to [0, 1]$ that satisfies the following conditions [2]:
(1) $\sigma(a, O) = 0$, for $a \ne O$,
(2) $\sigma(a, a) = 1$, for all a ∈ L,
(3) $a \wedge b < a \Rightarrow \sigma(a, b) < 1$,
(4) $a \le b \Rightarrow \sigma(c, a) \le \sigma(c, b)$, for all a, b, c ∈ L (consistency property).
Thus, an inclusion measure σ(a, b) indicates the degree to which one element (e.g., a fuzzy set) is contained in another one.

Theorem 2. A positive valuation function $v : L \to \mathbb{R}$ with $v(O) = 0$ is a sufficient condition for two inclusion measures [2]:
$$k(a, b) = \frac{v(b)}{v(a \vee b)}, \qquad s(a, b) = \frac{v(a \wedge b)}{v(a)}.$$
In our experiments, the data have been normalized in lattice $L = [0, 1]^N$, that is, the unit N-dimensional hypercube, where N is the dimension of the input data. Furthermore, we propose the following nonlinear positive valuation function:
$$v(x) = x \ln(x + \gamma),$$
where γ is called the location parameter. Without loss of generality, let $a = (a_1, \ldots$
Furthermore, the proposed valuation function is strictly increasing; thus, for any a, b ∈ L, a < b implies v(a) < v(b). The aforementioned valuation function operates in a more flexible manner than other valuations proposed in the literature. First, the performance of FLR can be optimized by selecting different values of the location parameter. Second, if the first factor x is held constant, the function reduces to v(x) = ln(x + γ) in the space R, which might be a proper valuation function in some special applications. One may object that this function does not satisfy the condition v(O) = 0; in this case, it can be shifted as follows [14]:
$$v_0(x) = v(x) - v(O). \qquad (5)$$
A fuzzy lattice is a pair (L, μ) where (L, ≤) is a crisp lattice and (L × L, μ) is a fuzzy set with membership function $\mu : L \times L \to [0, 1]$ such that $\mu(a, b) = 1 \Leftrightarrow a \le b$. Note that given a lattice L for which a positive valuation function $v : L \to \mathbb{R}$ with v(O) = 0 can be defined, both (L, k) and (L, s) are fuzzy lattices [2].
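The monotonicity claim can be checked directly. The short derivation below is only a sketch: it assumes the one-dimensional form $v(x) = x\ln(x+\gamma)$ plotted in Figure 1(b), the domain $x \in [0,1]$, and a location parameter $\gamma \ge 1$ (the range explored in the experiments).

```latex
\begin{align*}
v(x)  &= x \ln(x + \gamma), \qquad x \in [0,1],\ \gamma \ge 1, \\
v'(x) &= \ln(x + \gamma) + \frac{x}{x + \gamma}. \\
\intertext{For $0 < x \le 1$ we have $x + \gamma > 1$, so both terms are positive:}
v'(x) &> 0 \quad \text{for all } x \in (0,1], \\
v(0)  &= 0 \cdot \ln \gamma = 0 .
\end{align*}
```

Under these assumptions v is strictly increasing on [0, 1] and already satisfies v(O) = 0 for O = 0, so no shift is needed for this form.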
Consider the set $\mathbb{R}$ of real numbers. It turns out that $(\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}, \le)$ under the inequality relation ≤ between $a, b \in \overline{\mathbb{R}}$ is a complete lattice with the least element −∞ and the greatest element +∞ [25].
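An isomorphic function ϕ from poset (P, ≤) to poset (Q, ≤) is a bijective map $\phi : P \to Q$ such that $x \le y \Leftrightarrow \phi(x) \le \phi(y)$. Given a positive valuation function $v : L \to \mathbb{R}$ and an isomorphic function θ from (L, ≥) to (L, ≤), a positive valuation on intervals is obtained as $v([a, b]) = v(\theta(a)) + v(b)$. As a consequence, the degree of inclusion of an interval in another one in lattice $(\tau_O(L), \le)$ is computed as follows [25]:
$$k([a, b], [c, d]) = \frac{v([c, d])}{v([a, b] \vee [c, d])} = \frac{v(\theta(c)) + v(d)}{v(\theta(a \wedge c)) + v(b \vee d)}.$$
For N-dimensional intervals $A = [a_1, b_1] \times \cdots \times [a_N, b_N]$ and $B = [c_1, d_1] \times \cdots \times [c_N, d_N]$, the following inclusion measure between two intervals A and B is defined:
$$k(A, B) = \frac{\sum_{i=1}^{N} \left[ v(\theta(c_i)) + v(d_i) \right]}{\sum_{i=1}^{N} \left[ v(\theta(a_i \wedge c_i)) + v(b_i \vee d_i) \right]}.$$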
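The inclusion measure between hyperboxes can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes data in $[0, 1]^N$, the one-dimensional valuation v(x) = x ln(x + γ), the isomorphic function θ(x) = 1 − x, and the interval valuation v([a, b]) = v(θ(a)) + v(b) summed over dimensions. All function names are hypothetical.

```python
import math

def valuation(x, gamma=1.0):
    """Assumed one-dimensional valuation v(x) = x * ln(x + gamma), x in [0, 1]."""
    return x * math.log(x + gamma)

def theta(x):
    """Isomorphic function on [0, 1]: theta(x) = 1 - x."""
    return 1.0 - x

def box_valuation(box, gamma=1.0):
    """Assumed valuation of a hyperbox (lower, upper): sum_i v(theta(a_i)) + v(b_i)."""
    lower, upper = box
    return sum(valuation(theta(a), gamma) + valuation(b, gamma)
               for a, b in zip(lower, upper))

def join(box_a, box_b):
    """Lattice join of two hyperboxes: the smallest hyperbox containing both."""
    lower = tuple(min(a, c) for a, c in zip(box_a[0], box_b[0]))
    upper = tuple(max(b, d) for b, d in zip(box_a[1], box_b[1]))
    return lower, upper

def inclusion_k(box_a, box_b, gamma=1.0):
    """Degree of inclusion of box_a in box_b: k(A, B) = v(B) / v(A v B)."""
    return box_valuation(box_b, gamma) / box_valuation(join(box_a, box_b), gamma)

# Usage: a small box A nested inside a larger box B.
A = ((0.2, 0.2), (0.4, 0.4))
B = ((0.1, 0.1), (0.5, 0.5))
print(inclusion_k(A, B))  # A is contained in B, so the join equals B
print(inclusion_k(B, A))  # B is only partly contained in A
```

When A lies inside B the join equals B and the measure is exactly 1; in the opposite direction the value drops below 1, which reflects the asymmetry of an inclusion measure.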

FLR Model
This section presents a classifier for extracting rules from the input data based on fuzzy lattices. One important property of FLR is its ability to deal with disparate types of data, including real vectors, fuzzy sets, symbols, graphs, images, waves, and any combination of the aforementioned data. Furthermore, FLR can handle both complete and noncomplete lattices, and it can cope with both points and intervals. Moreover, stable learning is carried out both incrementally and fast, in a single pass through the training data. In some applications, we might encounter "missing" or "do not care" data. In this case, FLR can manage "missing" and "do not care" data by replacing them with the least element O and the greatest element I, respectively. For example, if the constituent lattice is ([0, 1], ≤), then we can replace "missing" and "do not care" data by the intervals O = [1, 0] and I = [0, 1], respectively [13,14,22].
It should be mentioned that an input datum to the FLR classifier (model) is represented as $(a_i, C_k)$, where $C_k$ is the class label of the datum $a_i$, and it can be interpreted as a rule "if $a_i$ then $C_k$." We remark that a single real number a ∈ R corresponds to the trivial interval [a, a]. Learning and generalization in FLR is based on the computation of hyperboxes in space $\mathbb{R}^N$; that is, a rule induced by FLR corresponds to an N-dimensional hyperbox.
Suppose a knowledge base $KB = \{(a_1, C_1), \ldots, (a_c, C_c)\}$ is given. KB can be empty at first. Decision making in FLR is based on an inclusion measure. During the learning phase, when an input datum $(a_0, C_0)$ is presented to the network, the degree of inclusion of the input in each rule stored in KB is calculated as $k(a_0, a_1), \ldots, k(a_0, a_c)$, respectively. The FLR classifier chooses the rule $a_J$ with $J = \arg\max_{i \in \{1, \ldots, c\}} k(a_0, a_i)$ as the winner. If the winner rule $a_J$ and the input datum $a_0$ have the same class label and the size of $a_0 \vee a_J$, denoted by $Z(a_0 \vee a_J)$, does not exceed a user-defined threshold, then the winner rule is updated, that is, replaced by $a_0 \vee a_J$. Note that the size of an interval [a, b] is defined as $Z([a, b]) = v(b) - v(a)$. Otherwise, the winner is reset and this process is repeated with the remaining rules; if no more rules are left, then the input datum $(a_0, C_0)$ becomes a new member of KB. The training algorithm is described in Algorithm 1.
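The training loop described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' C++ implementation: data lie in $[0, 1]^N$, the valuation is v(x) = x ln(x + γ), θ(x) = 1 − x, a hyperbox is valued as the sum of v(θ(a_i)) + v(b_i), and its size is taken as the sum of v(b_i) − v(a_i). The `classify` helper (assign the label of the rule with the largest inclusion degree) and all names are hypothetical.

```python
import math

def v(x, gamma):
    """Assumed valuation v(x) = x * ln(x + gamma) on [0, 1]."""
    return x * math.log(x + gamma)

def box_value(box, gamma):
    """Assumed hyperbox valuation: sum_i v(1 - a_i) + v(b_i)."""
    return sum(v(1.0 - a, gamma) + v(b, gamma) for a, b in zip(*box))

def join(box_a, box_b):
    """Smallest hyperbox containing both arguments."""
    return (tuple(map(min, box_a[0], box_b[0])),
            tuple(map(max, box_a[1], box_b[1])))

def size(box, gamma):
    """Assumed hyperbox size Z: sum_i v(b_i) - v(a_i)."""
    return sum(v(b, gamma) - v(a, gamma) for a, b in zip(*box))

def inclusion(box_a, box_b, gamma):
    """k(A, B) = v(B) / v(A v B)."""
    return box_value(box_b, gamma) / box_value(join(box_a, box_b), gamma)

def train_flr(data, rho, gamma=1.0):
    """Single pass over (point, label) pairs; returns a list of [box, label] rules."""
    rules = []
    for point, label in data:
        x = (tuple(point), tuple(point))       # trivial interval [a, a]
        candidates = list(range(len(rules)))   # every rule starts as "set"
        while candidates:
            # Competition: the winner has the largest inclusion degree.
            J = max(candidates, key=lambda i: inclusion(x, rules[i][0], gamma))
            merged = join(x, rules[J][0])
            # Assimilation condition: same class and size within the threshold.
            if rules[J][1] == label and size(merged, gamma) <= rho:
                rules[J][0] = merged
                break
            candidates.remove(J)               # "reset" the winner and retry
        else:
            rules.append([x, label])           # no rule could assimilate the input
    return rules

def classify(rules, point, gamma=1.0):
    """Hypothetical decision step: label of the rule that best includes the point."""
    x = (tuple(point), tuple(point))
    return max(rules, key=lambda r: inclusion(x, r[0], gamma))[1]

# Usage on four toy points (illustrative values only).
toy = [((0.1, 0.1), "A"), ((0.2, 0.15), "A"), ((0.9, 0.8), "B"), ((0.85, 0.9), "B")]
rules = train_flr(toy, rho=0.5)
print(len(rules), classify(rules, (0.12, 0.12)))
```

With a generous threshold the two points of each class collapse into one hyperbox per class; with a threshold of zero every point remains its own rule.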
Note that ρ is the threshold size, which specifies the maximum size of a hyperbox to be learned. The decision boundaries that can be formed by FLR endowed with the logarithmic valuation function are illustrated in the following example.
Example 3. The Simpson benchmark is a two-dimensional data set consisting of 24 points, which is used for testing the performance of a clustering algorithm [26]. This is a perceptual grouping problem in vision, which deals with the detection of the right partition of an image into subsets [27]. We have divided the data into three classes.
Figure 2 shows the decision surfaces of the FLR endowed with the proposed logarithmic valuation function, without any misclassified data. Here, due to lack of space, we have classified the data four times for location parameter γ = 1 and four different values of the vigilance parameter ρ. Note that the size of a hyperbox is bounded by the vigilance parameter ρ; more specifically, since a rule is enlarged only while its size does not exceed ρ, larger values of ρ permit larger hyperboxes and hence result in fewer hyperboxes.
As mentioned above, one of the properties of FLR is its capacity for knowledge representation. Indeed, FLR is capable of extracting implicit features underlying the data and representing them as rules. Each rule is represented as "if ($a_1$ AND ⋯ AND $a_M$) then $C_i$", where $a_j$, j = 1, . . ., M, are attributes, each one corresponding to an interval, and $C_i$, i = 1, . . ., c, are class labels. Table 1 shows three induced rules corresponding to Figure 2(d).

Benchmark Dataset Description.
In this section, we evaluate the classification performance of the modified FLR in a series of experiments on six well-known benchmarks.
4.1.1. Object Recognition. We evaluate the classification performance of the FLR model using images of the Columbia image database [28]. The Columbia object image library (COIL-100) is a database of color images of 100 objects. We selected the 10 objects shown in Figure 3. The objects were placed on a motorized turntable against a black background. The turntable was rotated through 360° to vary the object pose with respect to a fixed color camera. Images of the objects were taken at pose intervals of 5°, which corresponds to 72 poses per object. The images were size normalized. There are 720 instances of dimension 128 × 128 divided into 10 separate classes, 72 for each class. Six randomly selected instances per object were sufficient as the training set, and the remaining patterns were used as the testing set. The aim is the correct classification of the testing data into their corresponding classes.

Image Segmentation. The image segmentation data set was donated by the Vision Group, University of Massachusetts, and is included in the Machine Learning Repository of the University of California, Irvine [29]. The data set consists of data relating numerous analyses of the colors in subdivided images to the type of surface in the image. Each image was divided into small subsections, each of which constitutes one data point. Each data point is composed of 18 different attributes, including one that identifies the type of surface shown: brick face, foliage, grass, sky, window, concrete, or dirt. This data set consists of 210 samples for training and 2100 samples for testing. The goal is to distinguish between seven different classes.

Pen-Based Recognition of Handwritten Digits.
The pen-based recognition of handwritten digits data set was taken from the UCI repository of machine learning databases [29]. It was created by collecting 250 digit samples from 44 writers. The data have 16 continuous attributes distributed in 10 separate classes. A training set is given explicitly. For a faster simulation, we have reduced the training set by randomly selecting six instances per class. The distribution of the digits 0 to 9 in the data set is shown in Figure 4.

Letter Recognition.
The letter recognition benchmark was taken from the UCI repository of machine learning databases [29]. The data set consists of 20,000 unique letter images generated by randomly distorting pixel images of the 26 uppercase letters from 20 different commercial fonts. The parent fonts represented a full range of character types including script, italic, serif, and gothic. The features of each of the 20,000 characters were summarized in terms of 16 primitive numerical attributes. A training set is not given explicitly. We have divided all the data into a training set consisting of 10 percent of the patterns, selected randomly, and a testing set consisting of the remaining patterns. Examples of the character images are presented in Figure 5.

Algorithm 1: FLR training.
S0. The first input $(a_0, C_0)$ is memorized. At any instant, there are c known classes $C_1, \ldots, C_c$ stored in memory; initially c = 0.
S1. Present the next input $(a_0, C_0)$, taken in turn from the m training data, to the initially "set" family of rules.
S2. If no rules are "set", then store the input $(a_0, C_0)$, set c = c + 1, and go to S1. Else compute $k(a_0, a_i)$, i = 1, . . ., c, for the "set" rules.
S3. Competition among the "set" rules: the winner is the rule $(a_J, C_J)$ such that $J = \arg\max_i k(a_0, a_i)$, i = 1, . . ., c.
S4. The assimilation condition: both $Z(a_0 \vee a_J) \le \rho$ and $C_0 = C_J$.
S5. If the assimilation condition is satisfied, then replace $a_J$ by $a_0 \vee a_J$. Else "reset" the winner $(a_J, C_J)$ and go to S2.

Semeion Hand Recognition.
The Semeion hand recognition benchmark was taken from the UCI repository of machine learning databases [29]. This data set consists of 1593 handwritten digits from around 80 persons, which were scanned and stretched into a rectangular box of 16 × 16 pixels in a gray scale of 256 values. Then each pixel of each image was scaled into a boolean value using a fixed threshold. Each person wrote all the digits from 0 to 9 on paper, twice. The writers were asked to write each digit the first time in the normal way and the second time in a fast way. We have used 10 percent of the data as the training set.
4.1.6. Optical Recognition of Handwritten Digits. The optical recognition of handwritten digits benchmark was taken from the UCI repository of machine learning databases [29]. In this data set, 32 × 32 bitmaps of handwritten digits from a total of 43 people are divided into nonoverlapping blocks of 4 × 4, and the number of on pixels is counted in each block. This generates an input matrix of 8 × 8 where each element is an integer in the range of 0 to 16. This reduces dimensionality and gives invariance to small distortions. Training and testing sets are given explicitly, containing 3823 and 1797 64-dimensional samples, respectively. For a faster simulation, we have employed 10 percent of the training set for actual training.
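The block-counting preprocessing described for the optical digits data can be sketched as follows. This is an illustrative reconstruction of the described step, not code from the data set authors; the function name `block_counts` and the list-of-lists bitmap representation are assumptions.

```python
def block_counts(bitmap, block=4):
    """Count the 'on' pixels in each nonoverlapping block x block tile.

    bitmap is a square list of rows containing 0/1 values; a 32 x 32 input
    with block=4 yields an 8 x 8 matrix of integers in the range 0..16.
    """
    n = len(bitmap)
    return [
        [
            sum(bitmap[r][c]
                for r in range(i, i + block)
                for c in range(j, j + block))
            for j in range(0, n, block)
        ]
        for i in range(0, n, block)
    ]

# Usage: a 32 x 32 bitmap whose left half is on.
bitmap = [[1 if c < 16 else 0 for c in range(32)] for _ in range(32)]
features = block_counts(bitmap)
flat = [value for row in features for value in row]  # 64-dimensional sample
print(len(features), len(features[0]), len(flat))
```

Flattening the 8 × 8 matrix row by row gives the 64-dimensional sample mentioned in the text.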
Table 2 summarizes the characteristics of the selected benchmark data sets.

Experiments and Results
In order to provide a meaningful comparison, all the algorithms have been implemented in the same environment using the C++ object-oriented programming language, with the same partitioning of data sets for training and testing, the same order of input patterns, and a full range of parameters. We have employed the isomorphic function θ(x) = 1 − x. Furthermore, all the N-dimensional data have been normalized into the space $[0, 1]^N$ by the function $x_{norm} = (x - x_{min})/(x_{max} - x_{min})$, where $x_{min}$ and $x_{max}$ stand for the least and the greatest attribute values, respectively, in a data dimension. In this work, the FLR algorithm endowed with the linear valuation function x, the nonlinear valuation arctan, and the nonlinear logarithmic valuation function is denoted, respectively, by $FLR_x$, $FLR_a$, and $FLR_l$. To compare the learning capability, Table 3 shows the experimental results of $FLR_l$ together with the ones produced by $FLR_x$ and $FLR_a$, the SOM [30], fuzzy ART [11], and GRNN [31].
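The per-dimension normalization can be sketched as below. The formula is the one given in the text; the handling of a constant dimension (where the maximum equals the minimum) is an assumption added here to avoid division by zero, and the function name is hypothetical.

```python
def min_max_normalize(rows):
    """Scale each column of a list of equal-length rows into [0, 1].

    Uses x_norm = (x - x_min) / (x_max - x_min) per data dimension.
    Assumption: a constant column is mapped to 0.0.
    """
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

# Usage on three two-dimensional samples (illustrative values only).
print(min_max_normalize([[1, 10], [2, 20], [3, 30]]))
```

In practice the minimum and maximum would be taken per attribute over the available data, so that every sample falls inside the unit hypercube on which the valuation function is defined.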
In all our experiments, in order to achieve the best performance, we have run GRNN for different values of the variance parameter between 0 and 0.5 in steps of 0.001. For fuzzy ART, we have set the choice parameter to 0.01, and the values of the vigilance and learning parameters have been varied between 0 and 1 in steps of 0.01. Computational experiments for the SOM algorithm have been carried out using M × M (M = 1, . . ., 10) grids of units and 100 epochs. Since the results produced by SOM depend on the initialization of the weights, we have chosen the weights that yielded the best results on the testing set over 10 random initializations. The only parameter of the FLR algorithm that should be tuned is the threshold size parameter ρ. In our simulations, ρ was varied from 0.01 up to N, where N is the dimension of the input data, in steps of 0.01, except for the object recognition data set, where ρ was varied in steps of 50 due to the high-dimensional input data. For $FLR_l$ only, the location parameter γ should be tuned too. We have varied γ between 1 and 50 in steps of 1.
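The parameter sweep for $FLR_l$ can be sketched as a plain grid search. Only the parameter ranges come from the text; the `evaluate` callback (which would train a classifier and return its accuracy) and the rule of keeping the first best-scoring pair are assumptions made for illustration.

```python
def rho_grid(n_dims, step=0.01):
    """Threshold sizes 0.01, 0.02, ..., N, built from integers to avoid float drift."""
    count = round(n_dims / step)
    return [round(k * step, 10) for k in range(1, count + 1)]

def gamma_grid(low=1, high=50):
    """Location parameters 1, 2, ..., 50."""
    return list(range(low, high + 1))

def grid_search(evaluate, n_dims):
    """Return (best_score, rho, gamma) over the full grid.

    evaluate(rho, gamma) is a hypothetical callback returning an accuracy.
    """
    best = None
    for gamma in gamma_grid():
        for rho in rho_grid(n_dims):
            score = evaluate(rho, gamma)
            if best is None or score > best[0]:
                best = (score, rho, gamma)
    return best

# Usage with a toy scoring function that peaks at rho = 0.5, gamma = 3.
best = grid_search(lambda rho, gamma: -abs(rho - 0.5) - abs(gamma - 3), n_dims=2)
print(best[1], best[2])
```

For a two-dimensional data set this grid has 200 × 50 = 10,000 points, which is small enough to enumerate exhaustively.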
Table 3 reports the classification accuracy and ranking of the different methods for each benchmark. In other words, each table cell, which belongs to a specific learning algorithm and data set, contains the percentage of correct classifications of that model on the corresponding data set. The number in brackets in each table cell shows the ranking of each method on that data set. The best results are shown in bold face.
As can be seen in Table 3, $FLR_l$ has obtained acceptable results in comparison with the other methods, and it ranks first in five cases. Table 4 shows the average classification accuracy over all data sets for each learning algorithm, along with the resulting ranks. In other words, the average of each column of the previous table is calculated first, and then the corresponding ranking is shown within brackets. As can be seen, $FLR_l$ has achieved the best ranking among all the methods.
In Table 5, the comparison is made according to the sum of the ranks in Table 3 for each column. Although this quantity is less precise for reporting results in some cases, it is common in nonparametric statistics. As can be seen in this table, $FLR_l$, $FLR_x$, and $FLR_a$ rank first, second, and third, respectively. It should be pointed out that although there is no universal learning algorithm that achieves the best results on all benchmarks, the results obtained by $FLR_l$ confirm that our proposed model is an efficient classifier compared with established classifiers from the literature.
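The two summaries used above (average accuracy per method and sum of per-benchmark ranks) can be computed as in the following sketch. The method names "A", "B", "C" and all accuracy values are made-up placeholders, not results from this paper, and ties are assumed not to occur.

```python
def rank_row(accuracies):
    """Rank methods on one benchmark: 1 for the highest accuracy (no ties assumed)."""
    ordered = sorted(accuracies, key=accuracies.get, reverse=True)
    return {method: position + 1 for position, method in enumerate(ordered)}

def summarize(table):
    """Return (mean accuracy per method, sum of ranks per method).

    table maps a benchmark name to a dict {method: accuracy in percent}.
    """
    methods = list(next(iter(table.values())))
    mean_acc = {m: 0.0 for m in methods}
    rank_sum = {m: 0 for m in methods}
    for row in table.values():
        ranks = rank_row(row)
        for m in methods:
            mean_acc[m] += row[m] / len(table)
            rank_sum[m] += ranks[m]
    return mean_acc, rank_sum

# Usage with placeholder numbers (illustrative only).
toy = {
    "set1": {"A": 90.0, "B": 80.0, "C": 70.0},
    "set2": {"A": 60.0, "B": 75.0, "C": 50.0},
}
mean_acc, rank_sum = summarize(toy)
print(mean_acc, rank_sum)
```

In the toy table, methods A and B tie on rank sum while B has the higher mean accuracy, which illustrates why the two summaries can order methods differently.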

Conclusion
In this work, we introduced an improvement of the fuzzy lattice reasoning (FLR) classifier using a new nonlinear positive valuation function. We have investigated the performance of the new FLR model on several well-known classification problems. Experimental results demonstrated that our proposed method outperformed established classification models in terms of classification accuracy on the testing data.

Figure 1: (a) The positive valuation function v(x) = ln(x + γ); (b) v(x) = x ln(x + γ) for γ = 1.
A lattice (L, ≤) is totally ordered if and only if for any a, b ∈ L, either a ≥ b or a < b. The lattice $([0, 1]^N, \le)$ under the inequality relation is not a totally ordered lattice. A fuzzy lattice is a pair (L, μ) where (L, ≤) is a crisp lattice and (L × L, μ) is a fuzzy set with membership function $\mu : L \times L \to [0, 1]$ such that $\mu(a, b) = 1 \Leftrightarrow a \le b$. For a lattice (L, ≤), we define the set of (closed) intervals as τ(L) = {[a, b] | a, b ∈ L and a ≤ b}. We remark that (τ(L), ≤) is a lattice with the ordering relation, lattice join, and lattice meet defined as follows [25]:
$$[a, b] \le [c, d] \Leftrightarrow c \le a \text{ and } b \le d, \qquad [a, b] \vee [c, d] = [a \wedge c, b \vee d], \qquad [a, b] \wedge [c, d] = [a \vee c, b \wedge d] \ \text{ if } a \vee c \le b \wedge d.$$

Figure 2: Decision boundaries generated by the modified FLR with four different threshold parameters.

Figure 3: Ten objects used to train the networks.

Figure 5: Examples of the character images.

Table 1: Three induced rules generated by the modified FLR.

Table 2: Characteristics of the six data sets used.

Table 3: Recognition results along with relative ranking by different methods over six benchmarks.

Table 4: Average classification accuracy over all data sets.

Table 5: Sum of the rankings in Table 3 for each column.