Tsallis entropy is a q-parameter extension of Shannon entropy. By extremizing the Tsallis entropy within the framework of fuzzy c-means clustering (FCM), a membership function similar to the statistical mechanical distribution function is obtained. The Tsallis entropy-based DA-FCM algorithm was developed by combining this formulation with the deterministic annealing (DA) method. One challenge of this method is determining an appropriate initial annealing temperature and q value for a given data distribution. This is complicated by the fact that the membership function changes its shape both when the temperature is decreased and when q is increased. We examine the quantitative relationship between the temperature and q, and the results show that, in order to change u_ik^q equally, inverse changes must be made to the temperature and q. Accordingly, in this paper, we propose and investigate two combinatorial methods that couple q-incrementation with temperature reduction in Tsallis entropy-based FCM. In the proposed methods, q is defined as a function of the temperature. Experiments on Fisher's iris dataset confirm that the proposed methods determine an appropriate q value in many cases.
1. Introduction
Statistical mechanics investigates the macroscopic properties of a physical system consisting of multiple elements. In recent years, a popular area of research has been the application of statistical mechanical models or tools to information science.
There exists a strong relationship between the membership functions of fuzzy c-means clustering (FCM) [1] obtained by the maximum entropy or entropy regularization methods [2, 3] and the statistical mechanical distribution functions. Specifically, when regularized or maximized with a Shannon-like entropy, FCM yields a membership function similar to the Boltzmann (or Gibbs) distribution function [2, 4], and when regularized or maximized with a fuzzy-like entropy [5], it yields a membership function similar to the Fermi-Dirac distribution function [6]. These membership functions are well suited to annealing methods, because they contain a parameter corresponding to a system temperature. The advantage of entropy maximization methods is that fuzzy clustering can be interpreted and analyzed from both the statistical physical and the information-processing points of view.
Tsallis [7] achieved a nonextensive extension of Boltzmann-Gibbs statistics by postulating a generalized form of entropy with a generalization parameter q which, in the limit as q goes to 1, approaches the Shannon entropy. Tsallis entropy is applicable to numerous fields, including physics, chemistry, bioscience, networks, and computer science, and it has proved to be useful [8–10]. For example, Tsallis entropy can be applied to attribute selection in network intrusion detection [11]. It can also be utilized as an optimization function for threshold-based image segmentation [12]. In [13, 14], Menard et al. discussed fuzzy clustering in the framework of nonextensive thermostatistics. By taking the possibilistic constraint into account, the possibilistic membership function was derived, and its properties were considered from various viewpoints.
On the other hand, based on the Tsallis entropy, another form of entropy (or a measure of fuzziness) for a membership function can be defined. A form of the membership function can then be derived by extremizing (maximizing) this entropy within the framework of FCM [15]. In comparison with the conventional entropy maximization methods [2, 3], this method yields superior results [15].
Deterministic annealing (DA) [4] is a deterministic variant of simulated annealing, and it can be applied to clustering [16]. By applying DA to FCM using Tsallis entropy, a DA-FCM algorithm using Tsallis entropy has been developed [15]. As for another application example of DA, in [17], the q-parameterized DA expectation maximization algorithm is proposed.
One important characteristic of the membership function of this method is that the centers of the clusters are given as weighted means of the data, with weights equal to the membership function raised to the power of q (u_ik^q). We also note that u_ik^q changes its shape in a similar way when the system temperature is decreased (annealing) and when q is increased. However, it remains unknown how an appropriate q value and initial annealing temperature Thigh should be determined according to the data distribution.
The purpose of the present study is to overcome the above problem, which involves quantitative analyses of the relationships between the temperature and q, and to develop q-incrementation algorithms by integrating q and the temperature.
The analyses show that the temperature and q affect u_ik^q almost inversely. Based on these results, we developed two q-incrementation algorithms for Tsallis entropy-based FCM, in which q is defined as a function of the temperature. These algorithms are compared with the conventional Tsallis entropy-based DA-FCM method.
In the first algorithm, q is increased so as to keep the shape of u_ik^q similar to that produced by the conventional T-reduction method. In the second algorithm, q is defined as the inverse of a decreasing pseudo-temperature.
Experiments are performed using Fisher's iris dataset [18], and it was confirmed that, in many cases, an appropriate q value is determined automatically from the temperature. Furthermore, the proposed methods improve the accuracy of classification and are superior to the conventional method.
However, it was also found that the number of computation iterations depends on Thigh, and sometimes it becomes greater than that of the conventional method; this suggests that Thigh should be optimized to some extent.
2. Entropy Maximization Method
Let X = {x_1, …, x_n} (x_k = (x_{k1}, …, x_{kp}) ∈ R^p) be a data set in p-dimensional real space, which is to be divided into c clusters. In addition, let V = {v_1, …, v_c} (v_i = (v_{i1}, …, v_{ip})) be the centers of the clusters, and let u_ik ∈ [0,1] (i = 1, …, c; k = 1, …, n) be the membership functions. Furthermore, let

    J = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^m d_{ik}, \qquad d_{ik} = \| x_k - v_i \|^2, \quad m \in \mathbb{R}, \; m > 1,   (1)

be the FCM objective function that is to be minimized.
2.1. Entropy Maximization for FCM
The Tsallis entropy is defined as

    S_q = -\frac{1}{q-1} \left( \sum_{i=1}^{W} p_i^q - 1 \right), \qquad \sum_{i=1}^{W} p_i = 1, \quad q \in \mathbb{R},   (2)

where W is the total number of microscopic possibilities of the system.
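Because the q → 1 limit of (2) recovers the Shannon entropy, the definition can be checked numerically. The following sketch (the function names are ours, not the paper's) compares S_q for a small probability vector against the Shannon value as q approaches 1:

```python
import math

def tsallis_entropy(p, q):
    """Tsallis entropy of Eq. (2): S_q = (1 - sum_i p_i^q) / (q - 1)."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

def shannon_entropy(p):
    """Shannon entropy in nats: -sum_i p_i ln p_i."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.5, 0.3, 0.2]
for q in (2.0, 1.1, 1.001):
    print(q, tsallis_entropy(p, q))
print("Shannon:", shannon_entropy(p))  # S_q approaches this value as q -> 1
```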
Based on (2), the entropy (or a measure of the fuzziness) of a membership function is defined as follows:

    S = -\frac{1}{q-1} \left( \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^q - 1 \right).   (3)
The objective function can be written as

    J = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^q d_{ik}.   (4)
Under the normalization constraint

    \sum_{i=1}^{c} u_{ik} = 1 \quad \forall k,   (5)

the variation of the Tsallis entropy functional is given by

    \delta \left[ S - \sum_{k=1}^{n} \alpha_k \left( \sum_{i=1}^{c} u_{ik} - 1 \right) - \beta \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^q d_{ik} \right],   (6)

where α_k and β are the Lagrange multipliers, and α_k must be determined so as to satisfy (5).
By extremizing (6) with respect to u_ik, the stationary condition yields the following membership function:

    u_{ik} = \frac{\left[ 1 - \beta (1-q) d_{ik} \right]^{1/(1-q)}}{Z},   (7)

where

    Z = \sum_{j=1}^{c} \left[ 1 - \beta (1-q) d_{jk} \right]^{1/(1-q)}.   (8)
In the same way, the center of each cluster is given by

    v_i = \frac{\sum_{k=1}^{n} u_{ik}^q x_k}{\sum_{k=1}^{n} u_{ik}^q}.   (9)
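Equations (7)-(9) translate directly into an update pair. The following NumPy sketch (function and variable names are our own; the paper specifies no implementation) computes the memberships and the weighted centers; the clamp to zero matters only for q < 1, since for q > 1 the bracket in (7) is always positive:

```python
import numpy as np

def memberships(X, V, q, T):
    """Tsallis membership u_ik of Eqs. (7)-(8), with beta = 1/T.
    X: (n, p) data, V: (c, p) centers; returns U of shape (c, n)."""
    beta = 1.0 / T
    # d_ik = ||x_k - v_i||^2, shape (c, n)
    d = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    num = np.maximum(1.0 - beta * (1.0 - q) * d, 0.0) ** (1.0 / (1.0 - q))
    return num / num.sum(axis=0, keepdims=True)   # normalize over clusters (Eq. 8)

def centers(X, U, q):
    """Cluster centers of Eq. (9): means of X weighted by u_ik^q."""
    W = U ** q                                    # (c, n) weights
    return (W @ X) / W.sum(axis=1, keepdims=True)
```

For two well-separated 1-D groups, one update step already places the centers near the group means while each column of U sums to one.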
J and S satisfy

    S - \beta J = \sum_{k=1}^{n} \frac{Z^{1-q} - 1}{1-q},   (10)

which leads to

    \frac{\partial S}{\partial J} = \beta.   (11)

By analogy with statistical mechanics, this relationship makes it possible to regard J as the internal energy and β^{-1} as an artificial system temperature T [19].
3. Dependencies of u_ik^q on Temperature and q
In (9), u_ik^q acts as a weight on each x_k and thereby determines v_i. In this paper, for simplicity, v_i is set to 0, which makes each term in the denominator of (7) take the same form as its numerator. In Figures 1(a) and 1(b), the numerator of u_ik^q is plotted as a function of x_k, parameterized by T and q, respectively. In these figures, in order to examine the shape of u^q(x) (the subscript ik is omitted) as a function of the distance between the center of the cluster and the data points, x_k is treated as a continuous variable x.
Figure 1: Plots of u^q(x) = {1 - β(1-q) x^2}^{q/(1-q)}, parameterized by (a) T (with q = 2.0) and (b) q (with T = 1.0).
The extent of u_ik^q becomes narrower both with increasing q and with decreasing temperature. This suggests q-incrementation clustering as an alternative to annealing (T-reduction).
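This narrowing can be seen numerically by locating the half-width, that is, the x at which u^q(x) falls to 0.5. A small sketch (the helper name and bisection bounds are our assumptions):

```python
def half_width(T, q, hi=100.0, tol=1e-10):
    """x at which u^q(x) = {1 - (1/T)(1-q) x^2}^(q/(1-q)) drops to 0.5.
    For q > 1 the function decreases monotonically from 1, so bisection works."""
    def u_q(x):
        return (1.0 - (1.0 - q) * x * x / T) ** (q / (1.0 - q))
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (hi + lo)
        if u_q(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return lo

print(half_width(1.0, 2.0))   # baseline, ~0.644
print(half_width(1.0, 6.0))   # narrower: larger q
print(half_width(0.2, 2.0))   # narrower: lower T
```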
4. Quantitative Relationship between Temperature and q
As stated in the previous section, T and q inversely affect the extent of u_ik^q, which narrows in a similar way with increasing q or decreasing T. Accordingly, in order to examine the quantitative relationship between T and q, we change them independently, as follows.
First, we define

    u'_q(x, T, q) = \left[ 1 - \frac{(1-q) x^2}{T} \right]^{q/(1-q)}.   (12)
Then, u'_q(x, T, q) is calculated with T and q fixed to constants T_0 and q_0. Next, by decreasing T, we determine the q values that minimize the sum of squares of the residuals of these two functions:

    \sum_{k=0}^{k_{\max}} \left[ u'_q(k \Delta x, T_0, q_0) - u'_q(k \Delta x, T, q) \right]^2.   (13)
In these calculations, the parameters are set as follows: Thigh (= T_0) is set to 2.0; the domain of x is 0 ≤ x ≤ 100; and the number of sampling points for the sum of residuals is 10001 (k_max and Δx in (13) are set to 10000 and 0.01, respectively).
For q (= q_0) values of 1.01, 2.0, 6.0, and 10.0, and for T decreasing from Thigh, the q value that minimizes the sum of squares of the residuals (denoted q_min) is shown in Figure 2(a).
Figure 2: Plots of q_min as a function of T, parameterized by (a) q (with Thigh = 2.0) and (b) Thigh (with q = 2.0).
Figure 2(b), on the other hand, shows the results for cases in which q is set to 2.0 and T is lowered from Thigh = 2.0, 20.0, 100.0, and 200.0.
The approximate curves in Figures 2(a) and 2(b) are obtained by fitting the data to the formula

    q_min = a T^{-b},   (14)

where a and b are fitting parameters. Optimal values of these parameters are summarized in Tables 1 and 2. It was found that b is nearly equal to 1.0, suggesting that q is inversely proportional to T. In addition, it can be seen that although b hardly changes, a increases with increasing Thigh. Accordingly, by using the relationship between T and q_min shown in Tables 1 and 2, q-incrementation clustering is possible.
Table 1: Parameters of the approximate curves (Thigh = 2.0).

    q       a       b
    1.01    2.71    1.126
    2.0     4.67    1.066
    6.0     12.65   1.023
    10.0    20.64   1.014
Table 2: Parameters of the approximate curves (q = 2.0).

    Thigh    a        b
    2.0      4.67     1.066
    20.0     54.33    1.066
    100.0    302.06   1.066
    200.0    632.35   1.066
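The fit of (14) reduces to linear least squares in log-log coordinates, since ln q_min = ln a − b ln T. The sketch below demonstrates the procedure on synthetic (T, q_min) pairs generated from the Table 1 values a = 4.67, b = 1.066; this is a self-consistency check of the fitting step, not the paper's raw data:

```python
import math

# Hypothetical (T, q_min) samples generated from q_min = 4.67 * T^-1.066;
# in the paper these points come from minimizing the residual (13).
a_true, b_true = 4.67, 1.066
data = [(T, a_true * T ** (-b_true)) for T in (2.0, 1.0, 0.5, 0.2, 0.1)]

# Linear least squares in log-log space: ln(q_min) = ln(a) - b * ln(T)
xs = [math.log(T) for T, _ in data]
ys = [math.log(qm) for _, qm in data]
n = len(data)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
a_fit, b_fit = math.exp(intercept), -slope
print(a_fit, b_fit)  # recovers a = 4.67, b = 1.066 on exact power-law data
```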
5. q-Incrementation FCM Algorithm
In this section, we develop a q-incrementation FCM, based on Tsallis entropy, that increases q instead of annealing the temperature. We begin with the parameters a and b in (14) for q = 2.0 and Thigh = 2.0. In this case, q is derived from T by

    q = 4.67 T^{-1.066}.   (15)
The temperature is held at Thigh during the clustering.
The q-incrementation FCM algorithm using Tsallis entropy maximization is presented as follows:
(1) Set the number of clusters c, the highest temperature Thigh, the temperature reduction parameter, the thresholds δ1 and δ2 for the convergence tests, and the q-incrementation parameter.

(2) Generate initial clusters at random positions and assign each data point to the nearest cluster. Set the current temperature to Thigh.

(3) Calculate the membership function u_ik using (7).

(4) Calculate the cluster centers v_i using (9).

(5) Compare the current centers with the centers v_i' of the previous iteration. If the convergence condition max_{1≤i≤c} ||v_i − v_i'|| < δ1 is satisfied, go to Step (6); otherwise, return to Step (3).

(6) Compare the current centers with the centers v_i'' obtained in the previous iteration of this step. If the convergence condition max_{1≤i≤c} ||v_i − v_i''|| < δ2 is satisfied, stop; otherwise, update q using (15) and return to Step (3).
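The algorithm above can be sketched in NumPy as follows. The paper gives no implementation, so the names, the random center initialization, and the geometric stand-in for the cooling schedule are our assumptions; the explicit nearest-cluster assignment of Step (2) is subsumed by the first membership computation. The system temperature stays at Thigh while q is raised via (15) from a decreasing pseudo-temperature:

```python
import numpy as np

def tsallis_fcm(X, c, T_high=2.0, rate=0.8, q0=2.0,
                delta1=0.01, delta2=0.01, max_iter=200, rng=None):
    """Sketch of the q-incrementation Tsallis FCM of Section 5.
    beta = 1/T_high is held fixed; q grows from the pseudo-temperature Tp
    through Eq. (15), q = 4.67 * Tp**-1.066."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    V = X[rng.choice(n, c, replace=False)].astype(float)   # Step (2)
    Tp, q = T_high, q0
    beta = 1.0 / T_high
    for _ in range(max_iter):
        V_outer = V.copy()
        for _ in range(max_iter):                          # Steps (3)-(5)
            V_prev = V.copy()
            d = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
            U = (1.0 - beta * (1.0 - q) * d) ** (1.0 / (1.0 - q))   # Eq. (7)
            U /= U.sum(axis=0, keepdims=True)
            W = U ** q
            V = (W @ X) / W.sum(axis=1, keepdims=True)     # Eq. (9)
            if np.abs(V - V_prev).max() < delta1:
                break
        if np.abs(V - V_outer).max() < delta2:             # Step (6)
            break
        Tp *= rate                                         # stand-in cooling
        q = 4.67 * Tp ** -1.066                            # Eq. (15)
    return U, V
```

On data such as the iris features, `tsallis_fcm(X, c=3)` would return the final memberships and centers; the thresholds mirror the δ1 = δ2 = 0.01 used in the experiments below.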
6. Experiment 1
In Experiment 1, the classification results of the conventional T-reduction method and the q-incrementation method of the previous section are compared to examine whether they give similar results.
In the experiment, we used Fisher’s iris dataset [18], consisting of 150 four-dimensional vectors of iris flowers. The dataset contains three clusters of flowers: versicolor, virginica, and setosa. Each cluster consists of 50 vectors.
The parameters were set as follows:
c = 3, Thigh = 2.0, and δ1 = δ2 = 0.01; the initial value of q was set to 2.0 in the q-incrementation method, and q = 2.0 was used in the conventional annealing method. For the cooling schedule of the annealing method, we used very fast annealing (VFA) [15, 20] with c_d = 2.0.
6.1. Classification Results of the T-Reduction and q-Incrementation Methods
The maximum, minimum, and average numbers of misclassified data points of the conventional T-reduction and the q-incrementation methods for 1000 trials are summarized in Table 3.
Table 3: Maximum, minimum, and average numbers of misclassified data points.

    Method             Maximum   Minimum   Average
    T-reduction        17        16        16.001
    q-incrementation   12        10        11.202
The maximum, minimum, and average numbers of computation iterations required for the conventional T-reduction and the q-incrementation methods for 1000 trials are summarized in Table 4.
Table 4: Maximum, minimum, and average numbers of computation iterations required.

    Method             Maximum   Minimum   Average
    T-reduction        40        9         21.598
    q-incrementation   34        9         18.954
Tables 3 and 4 show that the q-incrementation method misclassifies fewer data points and requires fewer computation iterations than the T-reduction method, even though q is increased precisely so as to minimize the sum of squares of the residuals of u'_q(x, T = const., q) and u'_q(x, T, q = const.). This suggests that a significant difference remains between the shapes of u_ik^q under T-reduction and under q-incrementation.
In Figure 3, changes in u′q(x,T,q) as a function of T are plotted. As shown in this figure, q=2.23 and 54.36 were obtained from T=2.0 and 0.1, respectively, by using (15).
Figure 3: Plot of u'_q(x, T, q), parameterized by T and q; q is determined from T by (15).
By comparing the plots of u′q(x,2.0,2.0) and u′q(x,2.0,2.23) and those of u′q(x,0.1,2.0) and u′q(x,0.1,54.36), it can be seen that when T=2.0, both u′q’s have a similar shape. However, when T=0.1 and x is large, u′q(x,0.1,54.36) has a steeper slope than does u′q(x,0.1,2.0); this results in a lack of agreement of the clustering results.
7. Modified q-Incrementation FCM Algorithm
In the previous section, it was confirmed that q-incrementation clustering is a viable alternative for FCM using Tsallis entropy maximization.
In this section, we consider a very simple and general algorithm in which T is fixed at Thigh and q is defined as the inverse of a pseudo-temperature T' that is decreased according to the DA method. That is, q is given by

    q = \frac{T_{high} + e}{T'},   (16)

where e is a small positive constant (e is added in order to prevent q from reaching 1.0 when T' = Thigh; this must be avoided because, in the limit q → 1, the Tsallis entropy reduces to the Shannon entropy). Steps (1), (2), and (6) of the algorithm presented in Section 5 are changed as follows:
(1) Set the number of clusters c, the highest temperature Thigh, the temperature reduction parameter, the thresholds δ1 and δ2 for the convergence tests, and the initial q value.

(2) Generate initial clusters at random positions and assign each data point to the nearest cluster. Set the current pseudo-temperature T' to Thigh.

(6) Compare the current centers with the centers v_i'' obtained in the previous iteration. If the convergence condition max_{1≤i≤c} ||v_i − v_i''|| < δ2 is satisfied, stop; otherwise, decrease T', update q using (16), and return to Step (3).
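A minimal sketch of the modified q-update (16), assuming a geometric decrease of T' as a stand-in for the VFA schedule used in the paper:

```python
# T is pinned at T_high; q = (T_high + e) / T' grows as the
# pseudo-temperature T' decreases (Eq. 16).
T_high, e, rate = 2.0, 0.01, 0.5

T_prime = T_high
qs = []
for step in range(6):
    q = (T_high + e) / T_prime   # Eq. (16)
    qs.append(round(q, 3))
    T_prime *= rate              # stand-in for the VFA cooling schedule

print(qs)  # starts near 1.0 (q = 1.005) and grows as T' shrinks
```

The first value, (Thigh + e)/Thigh ≈ 1.0, is exactly the broad early-stage clustering noted in Section 8.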
8. Experiment 2
In Experiment 2, we compare the classification results of the T-reduction method and those of the modified q-incrementation method (hereafter the proposed method) that was presented in Section 7.
Fisher’s iris dataset was clustered using the same parameters as were used in Experiment 1, with the exception that q was changed from 1.1 to 10.0 for the conventional method with Thigh=2.0, and Thigh was changed from 0.01 to 10.0 for the proposed method. VFA was used as the cooling schedule for both methods, and e in (16) was set to 0.01.
8.1. Classification Results of the T-Reduction and Modified q-Incrementation Methods
The maximum, minimum, and average numbers of misclassified data points and required number of computation iterations for 1000 trials each of the conventional T-reduction and the proposed method are summarized in Tables 5 and 6, respectively. In both cases, Thigh was set to 2.0.
Table 5: Maximum, minimum, and average numbers of misclassified data points of the T-reduction method parameterized by q (Thigh = 2.0).

    Method        q      Maximum   Minimum   Average
    T-reduction   1.1    69        17        25.116
                  2.0    17        16        16.002
                  3.0    17        14        14.949
                  5.0    16        13        13.021
                  8.0    14        11        12.017
                  10.0   14        11        11.063
    Proposed      2.0    13        11        11.576
Table 6: Maximum, minimum, and average numbers of computation iterations required for the T-reduction method parameterized by q (Thigh = 2.0).

    Method        q      Maximum   Minimum   Average
    T-reduction   1.1    41        10        18.510
                  2.0    38        9         21.268
                  3.0    41        9         23.930
                  5.0    40        11        25.547
                  8.0    39        10        24.801
                  10.0   36        10        24.678
    Proposed      2.0    48        8         18.865
Tables 5 and 6 show that the conventional method misclassifies fewer data points as q increases. The number of computation iterations required increases with increasing q, reaching a maximum at q=5.0.
The numbers of misclassified data points and the number of computation iterations required for the proposed method with Thigh=2.0 are close to the minimums of those numbers for the conventional method with q=10.0 and q=1.1, respectively.
In the conventional method, as q increases, the number of misclassified data points decreases; this occurs because u_ik^q is narrowly distributed when q is large, so clustering is done locally and optimally. On the other hand, the number of required computation iterations tends to decrease with decreasing q, because when q is small, clustering is done broadly and efficiently.
In the proposed method, q is initially given as q = (Thigh + e)/Thigh, which is nearly equal to 1.0. Thus, at an early stage of annealing, clustering is automatically done broadly. This is why the modified method does not require as many iterations.
In summary, the conventional method has an inconsistency, in that the q value that minimizes the number of misclassified data points increases the number of computation iterations that are required. However, by setting Thigh to be the same as the value used in the conventional method, the proposed method is better able than the conventional method to balance the number of misclassified data points with the number of computation iterations that are required.
8.2. Properties of the Modified q-Incrementation Method
In this subsection, we examine why the proposed method improves the clustering.
Tables 7 and 8 summarize, for Thigh changing from 10.0 to 0.01, the maximum, minimum, and average numbers of misclassified data points and the number of computation iterations required for 1000 trials of the proposed method.
Table 7: Maximum, minimum, and average numbers of misclassified data points with the modified q-incrementation method parameterized by Thigh.

    Thigh    Maximum   Minimum   Average
    10.0     49        12        12.037
    8.0      12        12        12.000
    5.0      13        12        12.995
    2.0      13        11        11.576
    1.0      12        11        11.000
    0.1      12        12        12.000
    0.01     12        12        12.000
Table 8: Maximum, minimum, and average numbers of computation iterations required by the modified q-incrementation method parameterized by Thigh.

    Thigh    Maximum   Minimum   Average
    10.0     70        4         50.494
    8.0      80        20        39.937
    5.0      42        13        21.638
    2.0      48        8         18.865
    1.0      39        17        22.262
    0.1      41        19        27.170
    0.01     39        21        27.906
In Table 8, the number of computation iterations required by the proposed method decreases with decreasing Thigh until Thigh = 2.0, after which it begins to increase, suggesting that the iteration count has a minimum with respect to Thigh. The reasons for this are considered to be as follows. When Thigh is as high as 10.0, q remains too small to change the shape of u_ik^q, and this increases the required number of computation iterations. Conversely, when Thigh = 0.01, the width of u_ik^q is very narrow, and the centers of the clusters take a long time to converge. For these reasons, we assume the iteration count has at least one minimum with respect to Thigh.
As stated in the previous subsection, the proposed method can limit the number of misclassified data points to as few as 11 to 13 points. Thus, we conclude that Thigh does not significantly affect the number of misclassifications.
In summary, our experiments confirmed that when q increases as the inverse of the decreasing pseudo-temperature, the proposed method works at least as well as the conventional method. However, the number of computation iterations required by the proposed method apparently depends on the value of Thigh, and it is not yet known how to determine an appropriate value of Thigh for a given dataset.
9. Conclusion
We formulated the membership function of Tsallis entropy-based FCM by maximizing the Tsallis entropy functional. In this formulation, the q-parameter of the Tsallis entropy strongly affects the accuracy of the clustering.
In order to determine an appropriate value of q for a given data distribution, it is first necessary to quantitatively examine the effect of q on the extent of u_ik^q. We determined that, in order to minimize the squared residual of u_ik^q between T-reduction and q-incrementation, q must be increased as the inverse of the temperature.
Based on this relationship, we proposed two q-incrementation methods and combined them with the DA method. In the first method, q is increased according to the approximation function that minimizes the squared residual of u_ik^q. The experimental results show that, compared to the conventional annealing method, this method reduces both the number of misclassifications and the number of required computation iterations.
In the second method, q is simply defined as the inverse of the decreasing pseudo-temperature. The experimental results reveal that, in most cases, this method determines an appropriate q value, has improved accuracy, and is superior to the conventional method.
However, the results also confirm that the number of computation iterations depends on Thigh, and, in some cases, it can become greater than that of the conventional method. This should be avoidable by using the value of Thigh that minimizes the number of iterations.
In the future, first we intend to estimate the validity of our approximation method used in Sections 3 and 4 accurately. We then intend to explore ways to increase efficiency by performing convergence and computation time tests using various formulas of q and T. Furthermore, we intend to develop a better schedule for annealing and a method for optimizing Thigh.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The present study was supported by JSPS KAKENHI Grant no. 25330297.
References

[1] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, NY, USA, 1981.
[2] R.-P. Li and M. Mukaidono, "A maximum entropy approach to fuzzy clustering," in Proceedings of the 4th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE/IFES '95), pp. 2227-2232, 1995.
[3] S. Miyamoto and M. Mukaidono, "Fuzzy c-means as a regularization and maximum entropy approach," in Proceedings of the 7th International Fuzzy Systems Association World Congress (IFSA '97), vol. 2, pp. 86-92, Prague, Czech Republic, June 1997.
[4] K. Rose, E. Gurewitz, and G. Fox, "A deterministic annealing approach to clustering," Pattern Recognition Letters, vol. 11, no. 9, pp. 589-594, 1990.
[5] A. De Luca and S. Termini, "A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory," Information and Control, vol. 20, no. 4, pp. 301-312, 1972.
[6] M. Yasuda, T. Furuhashi, M. Matsuzaki, and S. Okuma, "Fuzzy clustering using deterministic annealing method and its statistical mechanical characteristics," in Proceedings of the 10th IEEE International Conference on Fuzzy Systems, vol. 3, pp. 797-800, December 2001.
[7] C. Tsallis, "Possible generalization of Boltzmann-Gibbs statistics," Journal of Statistical Physics, vol. 52, no. 1-2, pp. 479-487, 1988.
[8] S. Abe and Y. Okamoto, Eds., Nonextensive Statistical Mechanics and Its Applications, Springer, 2001.
[9] M. Gell-Mann and C. Tsallis, Eds., Nonextensive Entropy: Interdisciplinary Applications, Oxford University Press, New York, NY, USA, 2004.
[10] C. Tsallis, Introduction to Nonextensive Statistical Mechanics, Springer, 2009.
[11] C. F. L. Lima, F. M. Assis, and C. P. de Souza, "A comparative study of use of Shannon, Rényi and Tsallis entropy for attribute selecting in network intrusion detection," in Proceedings of the IEEE International Workshop on Measurements and Networking, pp. 77-82, Anacapri, Italy, October 2011.
[12] W. Wei, X. Lin, and G. Zhang, "Fast image segmentation based on two-dimensional minimum Tsallis-cross entropy," in Proceedings of the 2nd International Conference on Image Analysis and Signal Processing (IASP '10), pp. 332-335, April 2010.
[13] M. Menard, V. Courboulay, and P.-A. Dardignac, "Possibilistic and probabilistic fuzzy clustering: unification within the framework of the non-extensive thermostatistics," Pattern Recognition, vol. 36, no. 6, pp. 1325-1342, 2003.
[14] M. Menard, P. Dardignac, and C. C. Chibelushi, "Non-extensive thermostatistics and extreme physical information for fuzzy clustering," 2004.
[15] M. Yasuda, "Entropy maximization and very fast deterministic annealing approach to fuzzy C-means clustering," in Proceedings of the 5th Joint International Conference on Soft Computing and 11th International Symposium on Intelligent Systems, SU-B1-3, pp. 1515-1520, 2010.
[16] F. Rossi and N. Villa-Vialaneix, "Optimizing an organized modularity measure for topographic graph clustering: a deterministic annealing approach," Neurocomputing, vol. 73, no. 7-9, pp. 1142-1163, 2010.
[17] W. Guo and S. Cui, "A q-parameterized deterministic annealing EM algorithm based on nonextensive statistical mechanics," IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3069-3080, 2008.
[18] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, no. 2, pp. 179-188, 1936.
[19] L. E. Reichl, A Modern Course in Statistical Physics, John Wiley & Sons, New York, NY, USA, 1998.
[20] B. E. Rosen, "Function optimization based on advanced simulated annealing," in Proceedings of the IEEE Workshop on Physics and Computation (PhysComp '92), pp. 289-293, October 1992.