Granular Computing Classification Algorithms Based on Distance Measures between Granules from the View of Set

Granular computing classification algorithms are proposed based on distance measures between two granules from the view of set. Firstly, granules are represented in the forms of hyperdiamond, hypersphere, hypercube, and hyperbox. Secondly, the distance measure between two granules is defined from the view of set, and a union operator between two granules is formed to obtain a granule set containing granules of different granularity. Thirdly, a threshold of granularity determines whether two granules are united and is used to form the granular computing classification algorithms based on distance measures (DGrC). Benchmark datasets from the UCI Machine Learning Repository are used to verify the performance of DGrC, and experimental results show that DGrC improves testing accuracy.


Introduction
Granular computing (GrC) is a computing method based on the partition of the problem space and is widely used in pattern recognition, information systems, and other fields. Zadeh identified three fundamental concepts of the human cognition process, namely, granulation, organization, and causation [1, 2]. Granulation is a process that decomposes a universe into parts. Conversely, organization is a process that integrates parts into a universe by introducing operations between two granules. Causation involves the association of causes and effects. Information granules based on sets, fuzzy sets or relations, and fuzzy relations are computed in [3]. In general, a fuzzy inclusion measure is induced by a granule and a union granule; for example, the positive valuation functions of granules are used to form the fuzzy inclusion measure [4-6]. But there are some problems; for example, the fuzzy inclusion measure between two atomic granules is zero no matter how far apart the two atomic granules are. These studies enable us to map the complexities of the world around us into simple theories. GrC based on an algebraic system is a computing paradigm that regards a set of objects as a granule, and the union operator and the meet operator are the two keys of GrC. The union operator and the meet operator are related to the shapes of granules. There are granules with different shapes, such as hypersphere granules, hypercube granules, hyperdiamond granules, and hyperbox granules.
The present work uses a distance measure between granules of the same shape from the view of set. A granule is represented as a vector, and the distance between granules is defined by the centers of the granules and their granularities, such as half the diagonal length of a hyperdiamond, the radius of a hypersphere, half the side length of a hypercube, and the diagonal length of a hyperbox. On this basis, the granular computing classification algorithms based on distance measures (DGrC) are proposed.
The rest of this paper is organized as follows. The granular computing classification algorithm based on distance measures is described in Section 2. Section 3 demonstrates the comparative experimental results on two-class and multiclass problems. Section 4 summarizes the contribution of our work and presents future work plans.

Granular Computing Classification Algorithm Based on Distance Measure
For a dataset TS = {(x_i, y_i) | i = 1, 2, ..., N} in n-dimensional space, we construct granular computing classification algorithms (GrC) in terms of the following steps. Firstly, each single point in TS is represented as an atomic granule, which is indivisible. Secondly, the distance between two granules is proposed from the view of set. Thirdly, the distance and the granularity jointly determine the union process. Finally, the granule set is obtained and used to predict the class of an unknown datum.

Representation of Granule and Granularity.
In reality, the shapes of granules are irregular, the distance between two granules is not easily measured, and the union granule and the meet granule are related to the shapes of the granules. In order to study granular computing, granules are represented with regular shapes, such as hyperdiamond, hypersphere, hypercube, and hyperbox (diamond, sphere, cube, and box in 2-dimensional space). These four granule shapes are represented as follows.
(1) A hyperdiamond granule is represented as a vector including the hyperdiamond's center and half of its diagonal length.
(2) A hypersphere granule is represented as a vector including the center and the radius of the hypersphere.
(3) A hypercube granule is represented as a vector including the center and half of the side length of the hypercube.
(4) A hyperbox granule is represented as a vector induced by the beginning point and the end point.
Granularity is the size of a granule, such as half the diagonal length of a hyperdiamond granule, the radius of a hypersphere granule, half the side length of a hypercube granule, and the maximal diagonal of a hyperbox granule. The granularity of a granule G is denoted ρ(G).
For a hyperdiamond granule, ρ(G) = r, where r is half the diagonal length of the hyperdiamond granule. For a hypersphere granule, ρ(G) = r, where r is the radius of the hypersphere. For a hypercube granule, ρ(G) = r, where r is half the side length of the hypercube. The granularity of a hyperbox granule is defined as the distance between the beginning point and the end point: for a hyperbox granule G = (x, y), ρ(G) = ‖x − y‖. In Figure 1, G₁ = (0.1, 0.2, 0.5) is a hyperdiamond granule in 2-dimensional space, whose center is (0.1, 0.2) and whose granularity is 0.5.
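The four representations and their granularities can be sketched as follows (a minimal illustration; the tuple layouts and the helper name `granularity` are our own assumptions, not notation from the paper):

```python
import math

def granularity(shape, g):
    """Return the granularity (size) of a granule g of the given shape.

    Representations assumed here:
      hyperdiamond: (center, r)  r = half of the diagonal length
      hypersphere:  (center, r)  r = radius
      hypercube:    (center, r)  r = half of the side length
      hyperbox:     (x, y)       x = beginning point, y = end point
    """
    if shape in ("hyperdiamond", "hypersphere", "hypercube"):
        center, r = g
        return r
    if shape == "hyperbox":
        x, y = g
        # granularity = distance between the beginning point and the end point
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    raise ValueError("unknown shape: %s" % shape)

# The granule G1 = (0.1, 0.2, 0.5) of Figure 1: center (0.1, 0.2), granularity 0.5
print(granularity("hyperdiamond", ((0.1, 0.2), 0.5)))  # → 0.5
```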

Distance Measure between Granules.
The distance between two granules refers to the minimal distance between two points that belong to the different granules.
Here x = x₁ ∨ x₂ and y = y₁ ∧ y₂, where ∨ and ∧ are componentwise operators between two vectors. According to the distance between two granules mentioned above, the distance d between two granules can be an arbitrary real number. There is a margin between two granules when d > 0, the two granules share a single point when d = 0, and there is an overlap between two granules when d < 0. When d > 0, a greater d means a greater margin between the two granules, and when d < 0, a greater d means a smaller overlap. Figure 2 shows the distance between two granules for the cases d < 0, d = 0, and d > 0.
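For hypersphere granules, the set-view distance (minimal distance between points of the two granules) reduces to the distance between centers minus the two radii. A minimal sketch under that assumption, with the helper name `sphere_distance` being our own:

```python
import math

def sphere_distance(g1, g2):
    """Set-view distance between two hypersphere granules g = (center, radius):
    the minimal distance between points of the two spheres,
    i.e. ||c1 - c2|| - r1 - r2 (negative when the spheres overlap)."""
    (c1, r1), (c2, r2) = g1, g2
    center_dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))
    return center_dist - r1 - r2

# d > 0: margin, d = 0: the granules share a single point, d < 0: overlap
print(sphere_distance(((0, 0), 1.0), ((3, 0), 1.0)))  # → 1.0
print(sphere_distance(((0, 0), 1.0), ((2, 0), 1.0)))  # → 0.0
print(sphere_distance(((0, 0), 1.0), ((1, 0), 1.0)))  # → -1.0
```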

Operators between Granules.
Any point is regarded as an atomic granule, which is indivisible; the union process is the key to obtaining granules larger than the atomic granules. Likewise, the whole space is a granule with the maximal granularity; the decomposition process is the key to dividing larger granules into smaller granules.
For two hyperbox granules G₁ = (x₁, y₁) and G₂ = (x₂, y₂), the union hyperbox granule is G = G₁ ∨ G₂ = (x₁ ∨ x₂, y₁ ∧ y₂). We explain the union process between granules in Figure 3. The first algorithm is the training algorithm and the second algorithm is the testing algorithm. For a training set TS, the training granular computing classification algorithms proceed by the following steps. Firstly, the samples are used to form the atomic granules. Secondly, the threshold of granularity is introduced to conditionally unite the atomic granules by the aforementioned union operator, and the granule set GS is composed of all the union granules. Thirdly, if all atomic granules are included in the granules of GS, the union process is terminated; otherwise, the second step is repeated. The training algorithm is described as follows.
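Interpreting the union of two hyperbox granules as the smallest hyperbox enclosing both operands (one natural reading of the ∨/∧ operators above; the function name is our own), the operation can be sketched as:

```python
def union_hyperbox(g1, g2):
    """Union of two hyperbox granules g = (x, y) with beginning point x and
    end point y: the smallest hyperbox containing both operands
    (componentwise min of beginning points, componentwise max of end points)."""
    (x1, y1), (x2, y2) = g1, g2
    x = tuple(min(a, b) for a, b in zip(x1, x2))
    y = tuple(max(a, b) for a, b in zip(y1, y2))
    return (x, y)

g = union_hyperbox(((0.0, 0.0), (0.4, 0.3)), ((0.2, 0.1), (0.9, 0.8)))
print(g)  # → ((0.0, 0.0), (0.9, 0.8))
```

Note that the union granule covers not only the two operands but also the gap between them, which is why the union must be controlled by the threshold of granularity.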
Suppose that the atomic granules with the same class label induced by TS are G₁, G₂, G₃, G₄, and G₅. The training algorithm can be described as the tree structure in Figure 4; the leaves denote the atomic granules, and the root denotes GS, which includes its child nodes UG₂ and UG₃: UG₁ is induced by the union operation on child nodes G₁ and G₂, UG₂ is the union granule of UG₁ and G₃, and UG₃ is the union granule of G₄ and G₅. The whole process of obtaining GS is a bottom-up process.
The threshold ρ, which is the cut of granularity induced by formulas (1a)-(1d) for the different shapes of granules, is selected in descending order. A larger ρ means the granule set induced by Algorithm 1 includes larger granules; conversely, a smaller ρ means it includes smaller granules. For the same training set, a smaller ρ means the induced granule set includes more granules than a larger ρ does.
The purpose of the training algorithm is to obtain the granule set and the corresponding class labels, which are used to predict the class label of an unknown datum.

Algorithm 1: Training algorithm.
Input: training set TS, threshold ρ of granularity, the class number c
Output: granule set GS, the class labels lab
(S1) initialize the granule set GS = ⌀, lab = ⌀
(S2) k = 1
(S3) select the samples with class k, and form set TSk
(S31) initialize the granule set GSt = ⌀
(S32) i = 1
(S33) for the i-th sample in TSk, form the corresponding atomic granule
(S34) j = 1
(S35) by formulas (2), (4), (6), and (8), compute the distance between the atomic granule and the j-th granule in GSt
(S36) j = j + 1
(S37) find the minimal distance
(S38) form the union granules by formulas (10)

The testing data, including multiple data and their class labels, form the testing set, which is used to verify the performance of the granular computing algorithms. If the predicted class labels of the testing data are the same as the real class labels, the testing data are classified correctly; otherwise, the testing data are misclassified. The classification accuracy is one of the performance measures of the granular classification algorithms. The testing algorithm is described as Algorithm 2.
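The training loop can be sketched in Python for hypersphere granules (a minimal, illustrative reading of Algorithm 1, not the paper's implementation: the enclosing-sphere union rule, the condition that the merged granularity stay under the threshold, and all helper names are our own assumptions):

```python
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def sphere_distance(g1, g2):
    # set-view distance between hypersphere granules g = (center, radius)
    (c1, r1), (c2, r2) = g1, g2
    return dist(c1, c2) - r1 - r2

def union_sphere(g1, g2):
    # smallest hypersphere enclosing both operands (our assumed union rule)
    (c1, r1), (c2, r2) = g1, g2
    d = dist(c1, c2)
    if d + r2 <= r1:  # g2 already inside g1
        return g1
    if d + r1 <= r2:  # g1 already inside g2
        return g2
    r = (d + r1 + r2) / 2.0
    t = (r - r1) / d
    c = tuple(a + t * (b - a) for a, b in zip(c1, c2))
    return (c, r)

def train(samples, labels, threshold):
    """Sketch of Algorithm 1: conditionally unite atomic granules per class."""
    GS, lab = [], []
    for cls in sorted(set(labels)):
        granules = []
        for x, y in zip(samples, labels):
            if y != cls:
                continue
            atom = (tuple(x), 0.0)  # atomic granule: zero granularity
            if granules:
                j = min(range(len(granules)),
                        key=lambda k: sphere_distance(atom, granules[k]))
                merged = union_sphere(atom, granules[j])
                # unite only if the merged granularity stays under the threshold
                if merged[1] <= threshold:
                    granules[j] = merged
                    continue
            granules.append(atom)
        GS.extend(granules)
        lab.extend([cls] * len(granules))
    return GS, lab

# two nearby class-0 points merge into one granule; the class-1 point stays atomic
GS, lab = train([(0, 0), (0.1, 0), (5, 5)], [0, 0, 1], threshold=0.5)
print(len(GS), lab)  # → 2 [0, 1]
```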

Experiments
We evaluated the effectiveness of DGrC on both two-class and multiclass problems using an Intel PIV PC with a 2.8 GHz CPU and 2 GB memory, running Microsoft Windows XP Professional and Matlab 7.0. We mainly analyze and discuss DGrC with different granule shapes in terms of training accuracy (Tr (%)), testing accuracy (generalization ability, Ts (%)), training time (Tr (s)), and testing time (Ts (s)).

Two-Class Problems.
The spiral classification problem is difficult to classify and is used to evaluate the performance of classifiers. The training data are generated by the method proposed in [7]. The training set and the testing set in reference [8] are used to evaluate the performance of GrC.
The threshold of granularity ρ decreases from 0.2 to 0 with step 0.001; the maximal testing accuracy is the selection indicator of the optimization algorithms. Performances of GrC with the four granule shapes are listed in Table 1. The training data and their granules are shown in Figure 5, in which the single points are the atomic granules. From the table, we see that GrC with hypersphere granules achieves the best performance because of the minimal size of GS, including 88 granules when ρ = 0.094; GrC with hypercube granules is poor because of the maximal size of GS, including 99 granules when ρ = 0.079; and GrC with hyperdiamond granules reaches the best testing accuracy first. The training time and testing time are related to the size of the granule set GS, so granular computing classification algorithms with the minimal size of granule set are preferred under the same conditions for the maximal testing accuracy.

Algorithm 2: Testing algorithm.
Input: inputs of an unknown datum x, granule set GS, the class labels lab
Output: class label of x
(S1) x is represented as a granule
(S2) for i = 1 : |GS|
(S3) compute the distance between the granule of x and the i-th granule in GS
(S4) find the minimal distance
(S5) assign the class label corresponding to the minimal distance as the label of x
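Algorithm 2 can be sketched as a nearest-granule classifier (a minimal illustration for hypersphere granules; the granule set, labels, and helper names below are hypothetical examples, not results from the paper):

```python
import math

def sphere_distance(g1, g2):
    # set-view distance between hypersphere granules g = (center, radius)
    (c1, r1), (c2, r2) = g1, g2
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2))) - r1 - r2

def predict(x, GS, lab):
    """Sketch of Algorithm 2: represent x as an atomic granule and return the
    class label of the granule in GS at minimal distance."""
    gx = (tuple(x), 0.0)  # unknown datum as an atomic granule
    i = min(range(len(GS)), key=lambda k: sphere_distance(gx, GS[k]))
    return lab[i]

GS = [((0.0, 0.0), 0.5), ((5.0, 5.0), 0.5)]  # toy granule set
lab = ["A", "B"]
print(predict((0.2, 0.1), GS, lab))  # → A
print(predict((4.9, 5.2), GS, lab))  # → B
```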

Multiclass Problems.
For multiclass problems, the datasets listed in Table 2 are selected from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) to test DGrC. They are the wall-following robot navigation data (sensor2, sensor4, and sensor24), which are divided into training data and testing data at random; optical recognition of handwritten digits (optdigits), including training data and testing data; pen-based recognition of handwritten digits (pendigits), including training data and testing data; letter recognition (letter), which is divided into training data and testing data; and shuttle, including training data and testing data. These datasets are used to verify the performance of DGrC in terms of size, Tr (%), Ts (%), Tr (s), and Ts (s) (see Table 3). For the selected datasets, the optimal testing accuracies obtained by KNN algorithms are 98.0769% (sensor2), 90.8691% (sensor4), 83.0220% (sensor24), 97.997% (optdigits), 97.799% (pendigits), 94.765% (letter), and 99.883% (shuttle). We selected the optimal parameters that maximized the testing accuracy. DGrC with the four granule shapes is performed in the same environment, and the performance is listed in Table 3.

Conclusions
Granular computing classification algorithms with different granule shapes are proposed based on distance measures in this paper. Firstly, a training datum is represented as an atomic granule. Secondly, the distance measure between granules is formed based on the centers and granularities of the granules. Thirdly, the training process is constructed based jointly on the union operator and the threshold of granularity. Finally, the proposed granular computing classification algorithms are demonstrated on datasets selected from the references. DGrC is affected by the order of the training data, as are other granular computing algorithms. For future work, we will focus on the adaptive selection of the threshold of granularity and apply granular computing to image segmentation.