Granular computing classification algorithms are proposed based on distance measures between two granules from the viewpoint of sets. Firstly, granules are represented in the forms of hyperdiamond, hypersphere, hypercube, and hyperbox. Secondly, the distance measure between two granules is defined from the viewpoint of sets, and the union operator between two granules is formed to obtain a granule set that includes granules with different granularities. Thirdly, the threshold of granularity determines whether two granules are united and is used to form the granular computing classification algorithms based on distance measures (DGrC). Benchmark datasets from the UCI Machine Learning Repository are used to verify the performance of DGrC, and experimental results show that DGrC improved the testing accuracies.
Granular computing (GrC) is a computing method based on the partition of the problem space and is widely used in pattern recognition, information systems, and so forth. Zadeh identified three fundamental concepts of the human cognition process, namely, granulation, organization, and causation [
GrC based on an algebraic system is a computing framework that regards a set of objects as a granule, and the union operator and the meet operator are its two key operations. Both operators depend on the shapes of the granules. Granules can take different shapes, such as hypersphere granules, hypercube granules, hyperdiamond granules, and hyperbox granules.
The present work uses a distance measure between granules of the same shape from the viewpoint of sets. A granule is represented as a vector, and the distance between granules is defined by the centers of the granules and their granularities, such as the half length of the hyperdiamond diagonal, the radius of the hypersphere, the half length of the hypercube side, and the length of the hyperbox diagonal. Granular computing classification algorithms based on distance measures (DGrC) are proposed.
The rest of this paper is presented as follows. Granular computing classification algorithm based on distance measure is described in Section
For the dataset
In reality, the shapes of granules are irregular, the distance between two granules is not easily measured, and the union granule and the meet granule depend on the shapes of the granules. In order to study granular computing, granules are represented as regular shapes, such as hyperdiamond, hypersphere, hypercube, and hyperbox, in particular diamond, sphere, cube, and box in 2-dimensional space. These four shapes are represented as follows. A hyperdiamond granule is represented as a vector including the hyperdiamond's center and half of its diagonal length. A hypersphere granule is represented as a vector including the center and the radius of the hypersphere. A hypercube granule is represented as a vector including the center and half of the side length of the hypercube. A hyperbox granule is represented as a vector formed from its beginning point and its end point.
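The vector representations described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and treating an atomic granule as a granule of size zero is an assumption.

```python
import numpy as np

def ball_granule(center, size):
    """Hyperdiamond, hypersphere, or hypercube granule: (center..., size).

    `size` is the half diagonal, the radius, or the half side length,
    depending on the shape.
    """
    return np.append(np.asarray(center, dtype=float), float(size))

def box_granule(begin, end):
    """Hyperbox granule: (beginning point..., end point...)."""
    return np.concatenate([np.asarray(begin, dtype=float),
                           np.asarray(end, dtype=float)])

def atomic_ball(point):
    """A single point as a center-size granule (assumed size 0)."""
    return ball_granule(point, 0.0)

def atomic_box(point):
    """A single point as a hyperbox whose begin and end points coincide."""
    return box_granule(point, point)

g = ball_granule([0.2, 0.5], 0.1)          # 2-D granule with size 0.1
b = box_granule([0.1, 0.4], [0.3, 0.6])    # 2-D box granule
```

With this layout, a center-size granule in n-dimensional space is a vector of length n + 1, and a hyperbox granule is a vector of length 2n.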
Granularity is the size of a granule, such as half of the diagonal length of a hyperdiamond granule, the radius of a hypersphere granule, half of the side length of a hypercube granule, and the maximal diagonal of a hyperbox. The granularity of granule
For hyperdiamond granule
For hypersphere granule
For hypercube granule
The granularity of hyperbox granules
In Figure
Granules with different shapes in
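As a rough illustration of the granularity definitions given in words above, the following Python sketch reads the granularity from the vector representation. The function names are hypothetical, and measuring the hyperbox diagonal with the Euclidean norm is an assumption.

```python
import numpy as np

def ball_granularity(granule):
    """Granularity of a hyperdiamond, hypersphere, or hypercube granule.

    The granule is (center..., size), so the granularity is the last entry.
    """
    return float(np.asarray(granule, dtype=float)[-1])

def box_granularity(granule):
    """Granularity of a hyperbox granule (begin..., end...).

    Assumption: the diagonal length is measured with the Euclidean norm.
    """
    g = np.asarray(granule, dtype=float)
    n = g.size // 2
    begin, end = g[:n], g[n:]
    return float(np.linalg.norm(end - begin))
```

For example, a 2-dimensional box granule with beginning point (0, 0) and end point (3, 4) has granularity 5 under this assumption, while an atomic granule has granularity 0.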
The distance between granules refers to the minimal distance between two points which belong to different granules.
For two hyperdiamond granules
For two hypersphere granules
For two hypercube granules
For two hyperbox granules
According to the definitions above, the distance between two granules can be any real number. There is a margin between two granules when
Distances between two granules in
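One way to realize a distance defined by centers and granularities is sketched below in Python for the three center-size shapes. It is an assumption, not the paper's stated formula: the center distance is measured in the norm whose unit ball has the granule's shape (1-norm for hyperdiamond, 2-norm for hypersphere, infinity-norm for hypercube), and the two granularities are subtracted. The names `NORM_ORDER` and `granule_distance` are hypothetical.

```python
import numpy as np

# Assumed norm for each shape: its unit ball has that shape.
NORM_ORDER = {"hyperdiamond": 1, "hypersphere": 2, "hypercube": np.inf}

def granule_distance(g1, g2, shape):
    """Signed distance between two granules given as (center..., size).

    Positive: there is a margin between the granules.
    Zero: the granules touch.
    Negative: the granules overlap.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    c1, r1 = g1[:-1], g1[-1]
    c2, r2 = g2[:-1], g2[-1]
    center_gap = np.linalg.norm(c1 - c2, ord=NORM_ORDER[shape])
    return float(center_gap - r1 - r2)
```

Under this sketch the value is the minimal distance between points of the two granules when it is positive, and it becomes negative when the granules overlap, which matches the remark that the distance can be any real number.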
Any point is regarded as an atomic granule, which is indivisible; the union process is the key to obtaining granules larger than atomic granules. Likewise, the whole space is a granule with the maximal granularity; the decomposition process is the key to dividing larger granules into smaller granules.
For two hyperdiamond granules
For two hypersphere granules
For two hypercube granules
For two hyperbox granules
We explain the union process between granules in Figure
Unions between two granules. The union granules are represented as the red lines.
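The union process can be illustrated with a short Python sketch for two of the shapes. It assumes that the union granule is the smallest granule of the same shape that contains both operands, which is an interpretation rather than the paper's stated formulas; the function names are hypothetical.

```python
import numpy as np

def union_hypersphere(g1, g2):
    """Smallest hypersphere containing two hyperspheres (center..., radius)."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    c1, r1 = g1[:-1], g1[-1]
    c2, r2 = g2[:-1], g2[-1]
    d = np.linalg.norm(c2 - c1)
    if d + r2 <= r1:          # g2 lies inside g1
        return g1.copy()
    if d + r1 <= r2:          # g1 lies inside g2
        return g2.copy()
    r = (d + r1 + r2) / 2.0   # radius of the enclosing hypersphere
    c = c1 + (c2 - c1) * (r - r1) / d
    return np.append(c, r)

def union_hyperbox(g1, g2):
    """Smallest hyperbox containing two hyperboxes (begin..., end...)."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    n = g1.size // 2
    begin = np.minimum(g1[:n], g2[:n])
    end = np.maximum(g1[n:], g2[n:])
    return np.concatenate([begin, end])
```

For instance, uniting the unit hyperspheres centered at (0, 0) and (4, 0) gives a hypersphere centered at (2, 0) with radius 3, so the union granule has a larger granularity than either operand.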
The granular computing classification algorithms consist of two algorithms: the first is the training algorithm and the second is the testing algorithm.
For training set
Suppose that the atomic granules with the same class labels induced by
The training process of TS including 5 samples.
The threshold
Input: Training set TS, threshold
Output: Granule set GS, the class label lab
(S1) initialize the granule set GS =
(S2)
(S3) select the samples with class
(S31) initialize the granule set GSt =
(S32)
(S33) for the
(S34)
(S35) by formulas (
(S36)
(S37) find the minimal distance
(S38) form the union granules by formulas ( formulas ( the granule
(S39) remove
(S4) GS = GS
(S5) if
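The following Python sketch gives one possible reading of the training steps for hypersphere granules. Several points are assumptions rather than the paper's stated procedure: atomic granules have radius 0, the distance is the Euclidean center distance minus the radii, the union is the smallest enclosing hypersphere, and a union is accepted only when its granularity does not exceed the threshold. The names `distance`, `union`, and `train` are hypothetical.

```python
import numpy as np

def distance(g1, g2):
    """Assumed distance: Euclidean center distance minus both radii."""
    return np.linalg.norm(g1[:-1] - g2[:-1]) - g1[-1] - g2[-1]

def union(g1, g2):
    """Assumed union: smallest hypersphere containing both granules."""
    c1, r1, c2, r2 = g1[:-1], g1[-1], g2[:-1], g2[-1]
    d = np.linalg.norm(c2 - c1)
    if d + r2 <= r1:
        return g1.copy()
    if d + r1 <= r2:
        return g2.copy()
    r = (d + r1 + r2) / 2.0
    return np.append(c1 + (c2 - c1) * (r - r1) / d, r)

def train(X, y, threshold):
    """Return (GS, lab): the granule set and the class label of each granule."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    GS, lab = [], []
    for label in np.unique(y):              # handle one class at a time
        GSt = []                            # granules of the current class
        for x in X[y == label]:
            atom = np.append(x, 0.0)        # atomic granule for this sample
            if GSt:
                k = int(np.argmin([distance(atom, g) for g in GSt]))
                merged = union(GSt[k], atom)
                if merged[-1] <= threshold: # union is small enough: keep it
                    GSt[k] = merged
                    continue
            GSt.append(atom)                # otherwise start a new granule
        GS.extend(GSt)
        lab.extend([label] * len(GSt))
    return GS, lab
```

A smaller threshold yields more granules of smaller granularity, and, because each sample is united with the granules formed so far, the resulting granule set depends on the order of the training samples.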
The purpose of the training algorithm is to obtain the granule set and the corresponding class labels lab, which are used to predict the class label of an unknown datum. The testing data, consisting of multiple data and their class labels, form the testing set, which is used to verify the performance of the granular computing algorithms. If the predicted class label of a testing datum is the same as its real class label, the datum is correctly classified. Otherwise, it is misclassified. The classification accuracy is one of the performance measures of granular classification algorithms. The testing algorithm is described as Algorithm
Input: inputs of unknown datum class label lab
Output: class label of
(S1)
(S2) for
(S3) compute the distance
(S4) find the minimal distance
(S5) find the corresponding class label of the label of
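The testing steps can be sketched in Python as a nearest-granule rule, here for hypersphere granules. The distance used (Euclidean distance to the center minus the radius) and the function names `predict` and `accuracy` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def predict(x, GS, lab):
    """Assign x the label of the granule at minimal distance.

    GS is a list of hypersphere granules (center..., radius) and lab holds
    the class label of each granule.
    """
    x = np.asarray(x, dtype=float)
    dists = [np.linalg.norm(x - g[:-1]) - g[-1] for g in GS]
    return lab[int(np.argmin(dists))]

def accuracy(X, y, GS, lab):
    """Percentage of data whose predicted label equals the real label."""
    preds = np.array([predict(x, GS, lab) for x in X])
    return 100.0 * float(np.mean(preds == np.asarray(y)))

# Two hand-made granules with hypothetical labels "a" and "b".
GS = [np.array([0.0, 0.0, 1.0]), np.array([5.0, 5.0, 1.0])]
lab = ["a", "b"]
label = predict([0.5, 0.2], GS, lab)
```

The same `accuracy` function applied to the training set and to the testing set corresponds to the Tr (%) and Ts (%) measures, respectively.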
We evaluated the effectiveness of DGrC on both two-class and multiclass problems on an Intel PIV PC with a 2.8 GHz CPU and 2 GB of memory, running Microsoft Windows XP Professional and Matlab 7.0. We mainly analyze and discuss DGrCs with different shape granules in terms of training accuracy (Tr (%)), testing accuracy (generalization ability) (Ts (%)), training time (Tr (s)), and testing time (Ts (s)).
Spiral classification is a difficult classification problem and is used to evaluate the performance of classifiers. The training data are generated by the method proposed in [
The threshold
Performance of GrC with different shape granules.
Shapes | Threshold | Size | Tr (%) | Ts (%) | Tr (s) | Ts (s) |
---|---|---|---|---|---|---|
Hyperdiamond | 0.1 | 97 | 100 | 100 | 0.35938 | 0.015625 |
Hypersphere | 0.094 | 88 | 100 | 100 | 0.3125 | 0.015625 |
Hypercube | 0.079 | 99 | 100 | 100 | 1.4063 | 0.03125 |
Hyperbox | 0.095 | 97 | 100 | 100 | 5.2656 | 0.0326 |
Spiral classification problem and GS (a) hypersphere granules, (b) hypercube granules, (c) hyperdiamond granules, and (d) hyperbox granules.
For multiclass problems, datasets listed in Table
Multiclass problems.
Data sets | Inputs | Outputs | Training size | Testing size |
---|---|---|---|---|
Sensor2 | 2 | 4 | 3636 | 1820 |
Sensor4 | 4 | 4 | 3638 | 1818 |
Sensor24 | 24 | 4 | 3636 | 1820 |
Optdigits | 64 | 10 | 3823 | 1797 |
Pendigits | 16 | 10 | 7494 | 3498 |
Letter | 16 | 26 | 13333 | 6667 |
Shuttle | 9 | 7 | 43500 | 14500 |
Performance of DGrC on multiclass problems.
Data sets | Shapes | Threshold | Size | Tr (%) | Ts (%) | Tr (s) | Ts (s) |
---|---|---|---|---|---|---|---|
Sensor2 | Hyperdiamond | 0.005 | 948 | 99.67 | 98.3516 | 1.1875 | 0.2031 |
Sensor2 | Hypersphere | 0.0115 | 338 | 99.3674 | | 1.2031 | 0.1406 |
Sensor2 | Hypercube | 0.01 | 365 | 99.2299 | 98.1319 | 3.8125 | 0.0938 |
Sensor2 | Hyperbox | 0.004 | 1560 | 99.945 | 98.297 | 1.2031 | 0.71875 |
Sensor4 | Hyperdiamond | 0.0195 | 974 | 99.5052 | | 1.1563 | 0.2344 |
Sensor4 | Hypersphere | 0.0085 | 1387 | 99.9175 | 91.7492 | 0.9063 | 0.5938 |
Sensor4 | Hypercube | 0.0255 | 466 | 98.3233 | 90.4290 | 2.1563 | 0.1563 |
Sensor4 | Hyperbox | 0.00765 | 906 | 97.306 | 91.474 | 1.5469 | 0.8125 |
Sensor24 | Hyperdiamond | 0.4450 | 2154 | 99.6425 | | 9.2188 | 2.7969 |
Sensor24 | Hypersphere | 0.4450 | 1700 | 98.3773 | 83.2418 | 4.7031 | 2.4063 |
Sensor24 | Hypercube | 0.1750 | 2276 | 99.3674 | 75.6593 | 10.0625 | 3.7969 |
Sensor24 | Hyperbox | 0.265 | 2711 | 99.67 | 83.132 | 7.5469 | 6.4531 |
Optdigits | Hyperdiamond | 1.99 | 3685 | 99.9738 | 97.4958 | 18.5625 | 4.813 |
Optdigits | Hypersphere | 2 | 1005 | 100 | | 2.1094 | 2.4063 |
Optdigits | Hypercube | 0.15 | 3823 | 100 | 96.3272 | 6.5781 | 12.2344 |
Optdigits | Hyperbox | 2.1 | 2028 | 99.9738 | 98.0523 | 25.0625 | 7.1875 |
Pendigits | Hyperdiamond | 0.62 | 2334 | 99.97856 | 97.5129 | 5.2500 | 2.3281 |
Pendigits | Hypersphere | 0.28 | 4041 | 100 | 97.9131 | 5.0000 | 7.5000 |
Pendigits | Hypercube | 0.25 | 2074 | 99.9733 | 97.5129 | 8.8594 | 4.2344 |
Pendigits | Hyperbox | 0.64 | 5801 | 99.9466 | | 2.7031 | 9.6563 |
Letter | Hyperdiamond | 0.13 | 11993 | 100 | 94.6603 | 7.0156 | 18.2188 |
Letter | Hypersphere | 0.065 | 12685 | 100 | 94.7953 | 3.1719 | 28.0469 |
Letter | Hypercube | 0.08 | 7350 | 100 | 90.2955 | 9.2344 | 19.9063 |
Letter | Hyperbox | 0.5 | 10427 | 98.77 | 94.5853 | 6.9844 | 36.8438 |
Shuttle | Hyperdiamond | 0.0028 | 3052 | 99.9931 | | 47 | 10.255 |
Shuttle | Hypersphere | 0.0015 | 3348 | 100 | | 36.3125 | 10.8594 |
Shuttle | Hypercube | 0.00006 | 5895 | 99.9977 | 99.9379 | 58.7969 | 31.1250 |
Shuttle | Hyperbox | 0.0025 | 2920 | 99.9885 | 99.9379 | 26.2688 | 28.7813 |
For the selected datasets, the optimal testing accuracies obtained by KNN algorithms are 98.0769% (sensor2), 90.8691% (sensor4), 83.0220% (sensor24), 97.997% (optdigits), 97.799% (pendigits), 94.765% (letter), and 99.883% (shuttle). We selected the parameters that maximized the testing accuracy. DGrCs with the 4 shapes are run in the same environment, and their performance is listed in Table
Granular computing classification algorithms with granules of different shapes are proposed in this paper based on distance measures. Firstly, a training datum is represented as an atomic granule. Secondly, the distance measure between granules is formed based on the centers and granularities of the granules. Thirdly, the training process is constructed based on the union operator together with the threshold of granularity. Finally, the proposed granular computing classification algorithms are demonstrated on datasets selected from the references. Like other granular computing algorithms, DGrC is affected by the order of the training data. In future work, we will focus on the adaptive selection of the granularity threshold and apply granular computing to image segmentation.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported in part by the Natural Science Foundation of China (Grant no. 61170202) and Natural Science Foundation of Henan (nos. 132300410421, 132300410422).