Deterministic Annealing Approach to Fuzzy C-Means Clustering Based on Entropy Maximization

This paper deals with a fuzzy clustering method that combines the deterministic annealing (DA) approach with an entropy, specifically the Shannon entropy and the Tsallis entropy. By maximizing the Shannon entropy, the fuzzy entropy, or the Tsallis entropy within the framework of the fuzzy c-means (FCM) method, membership functions similar to the statistical mechanical distribution functions are obtained. We examine the characteristics of these entropy-based membership functions from the statistical mechanical point of view. Both the Shannon- and Tsallis-entropy-based FCMs are then formulated as DA clustering using the very fast annealing (VFA) method as a cooling schedule. Experimental results indicate that the Tsallis-entropy-based FCM is stable under very fast deterministic annealing and is suitable for this annealing process.


Introduction
Statistical mechanics investigates the macroscopic properties of a physical system consisting of several elements. Recently, research that applies statistical mechanical models or tools to information science has become popular.
The deterministic annealing (DA) method [1] is a deterministic variant of the simulated annealing (SA) method [2]. DA recasts the minimization of the cost function as the minimization of a temperature-dependent free energy and tracks the minimum of this free energy while the temperature is decreased, so that the cost function is optimized deterministically at each temperature. Hence, DA is more efficient than SA, but it does not guarantee a globally optimal solution.
There is a strong relationship between the membership functions of fuzzy c-means (FCM) clustering [3] obtained with the maximum entropy or entropy regularization methods [4,5] and the statistical mechanical distribution functions. That is, FCM regularized with the Shannon entropy gives a membership function similar to the Boltzmann (or Gibbs) distribution function [1,4], and FCM regularized with the fuzzy entropy [6] gives a membership function similar to the Fermi-Dirac distribution function [7]. These membership functions are suitable for annealing methods because they contain a parameter corresponding to the system temperature.
Tsallis [8] achieved a nonextensive extension of the Boltzmann-Gibbs statistics. Tsallis postulated a generalized form of entropy with a generalization parameter q, which reaches the Shannon entropy in the limit q → 1. Later, Ménard et al. [9] derived a membership function by regularizing FCM with the Tsallis entropy.
In this study, membership functions that take the familiar form of the statistical mechanical distribution functions are derived by maximizing the Shannon and fuzzy entropies within the framework of FCM. Similarly, the Tsallis-entropy-based FCM membership function is derived [10,11] by maximizing the Tsallis entropy. Then, the free energies for these membership functions are formulated and examined from the statistical mechanical viewpoint.
On the other hand, there are some representative cooling schedules of the temperature for SA; for example, schedules that are inversely proportional to a logarithmic function or to an exponential function are widely adopted. Rosen [12] proposed a more effective method for SA known as very fast annealing (VFA).

Advances in Fuzzy Systems
However, the applicability of VFA to DA is not yet known. In order to achieve good clustering by DA, a reliable annealing process is desirable. Therefore, by introducing VFA to DA, we formulate the Shannon- and Tsallis-entropy-based FCMs as very fast DA clustering and examine their reliability.
Experiments are performed on numerical data and the iris data [13], and the obtained results indicate that Tsallis-entropy-based FCM clustering is suitable for very fast DA clustering because of the shape of its membership function.

Entropy Maximization Method
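Let $X = \{x_1, \ldots, x_n\}$ ($x_k = (x_k^1, \ldots, x_k^p) \in R^p$) be a data set in the p-dimensional real space, which should be divided into c clusters. In addition, let $V = \{v_1, \ldots, v_c\}$ ($v_i = (v_i^1, \ldots, v_i^p)$) be the centers of clusters, and let $u_{ik} \in [0, 1]$ (i = 1, ..., c; k = 1, ..., n) be the membership functions. Furthermore, let

$$J = \sum_{k=1}^{n}\sum_{i=1}^{c} u_{ik}^{m} d_{ik} \tag{1}$$

be the objective function of FCM, where $d_{ik} = \|x_k - v_i\|^2$.

2.1. Shannon Entropy Maximization of FCM. First, we introduce the Shannon entropy into the FCM clustering. The Shannon entropy is given by

$$S = -\sum_{k=1}^{n}\sum_{i=1}^{c} u_{ik}\log u_{ik}. \tag{2}$$

Under the normalization constraint of

$$\sum_{i=1}^{c} u_{ik} = 1 \quad \forall k \tag{3}$$

and setting m to 1, the Shannon entropy functional is given by

$$\delta S - \sum_{k=1}^{n}\alpha_k\,\delta\Bigl(\sum_{i=1}^{c} u_{ik} - 1\Bigr) - \beta\sum_{k=1}^{n}\sum_{i=1}^{c}\delta\bigl(u_{ik} d_{ik}\bigr), \tag{4}$$

where $\alpha_k$ and $\beta$ are the Lagrange multipliers and $\alpha_k$ must be determined so as to satisfy (3). The stationary condition for (4) leads to the following membership function

$$u_{ik} = \frac{e^{-\beta d_{ik}}}{\sum_{j=1}^{c} e^{-\beta d_{jk}}} \tag{5}$$

and the cluster centers

$$v_i = \frac{\sum_{k=1}^{n} u_{ik} x_k}{\sum_{k=1}^{n} u_{ik}}. \tag{6}$$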
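As a rough illustration of the Shannon-entropy-based update, the following Python sketch assumes that the membership function (5) takes the Boltzmann-like form $u_{ik} = e^{-\beta d_{ik}} / \sum_j e^{-\beta d_{jk}}$ with squared Euclidean distances $d_{ik}$, and that the centers (6) are membership-weighted means. The function names and the sample data are illustrative and are not taken from the paper.

```python
import math

def shannon_memberships(points, centers, beta):
    """Return u[i][k] = exp(-beta*d_ik) / sum_j exp(-beta*d_jk) (assumed form of (5))."""
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        d = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
        d_min = min(d)  # shifting the exponents does not change the ratio
        w = [math.exp(-beta * (di - d_min)) for di in d]
        z = sum(w)
        for i in range(len(centers)):
            u[i][k] = w[i] / z
    return u

def update_centers(points, u):
    """Return membership-weighted means (assumed form of (6))."""
    dim = len(points[0])
    centers = []
    for row in u:
        total = sum(row)
        centers.append(tuple(
            sum(w * x[j] for w, x in zip(row, points)) / total
            for j in range(dim)))
    return centers

# Illustrative data: two small groups of points and two initial centers.
points = [(0.0, 0.0), (0.2, 0.1), (4.0, 4.0), (4.1, 3.9)]
centers = [(0.5, 0.5), (3.5, 3.5)]
u = shannon_memberships(points, centers, beta=1.0)
new_centers = update_centers(points, u)
```

With a small β (high temperature) the memberships in this sketch approach 1/c for every point, and with a large β they approach a hard assignment, which is the behavior the annealing process relies on.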

Fuzzy Entropy Maximization of FCM.
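We then introduce the fuzzy entropy into the FCM clustering. The fuzzy entropy is given by

$$\hat{S} = -\sum_{k=1}^{n}\sum_{i=1}^{c}\bigl\{\hat{u}_{ik}\log\hat{u}_{ik} + (1-\hat{u}_{ik})\log(1-\hat{u}_{ik})\bigr\}. \tag{7}$$

The fuzzy entropy functional is given by

$$\delta\hat{S} - \sum_{k=1}^{n}\alpha_k\,\delta\Bigl(\sum_{i=1}^{c}\hat{u}_{ik} - 1\Bigr) - \beta\sum_{k=1}^{n}\sum_{i=1}^{c}\delta\bigl(\hat{u}_{ik} d_{ik}\bigr), \tag{8}$$

where $\alpha_k$ and $\beta$ are the Lagrange multipliers [14]. The stationary condition for (8) leads to the following membership function:

$$\hat{u}_{ik} = \frac{1}{e^{\alpha_k + \beta d_{ik}} + 1} \tag{9}$$

and the cluster centers

$$\hat{v}_i = \frac{\sum_{k=1}^{n}\hat{u}_{ik} x_k}{\sum_{k=1}^{n}\hat{u}_{ik}}. \tag{10}$$

In (9), $\beta$ defines the extent of the distribution [7]. Equation (9) is formally normalized as

$$\frac{\hat{u}_{ik}}{\sum_{j=1}^{c}\hat{u}_{jk}}. \tag{11}$$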
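The Fermi-Dirac-like membership function can be sketched in Python as follows. The sketch assumes the form $\hat{u}_{ik} = 1/(e^{\alpha_k + \beta d_{ik}} + 1)$ for (9) and a formal normalization that divides by the sum over clusters. The value $\alpha_k = -2$ is the one quoted in the caption of Figure 2; the distances and function names are hypothetical.

```python
import math

def fermi_dirac_memberships(dists, alpha_k, beta):
    """Unnormalized memberships of one data point, assumed form of (9)."""
    return [1.0 / (math.exp(alpha_k + beta * d) + 1.0) for d in dists]

def normalize(values):
    """Formal normalization: divide each membership by the sum over clusters."""
    total = sum(values)
    return [v / total for v in values]

# Squared distances from one data point to three cluster centers (illustrative).
dists = [0.5, 2.0, 8.0]
raw = fermi_dirac_memberships(dists, alpha_k=-2.0, beta=1.0)
normalized = normalize(raw)
```

Each unnormalized value lies strictly between 0 and 1 and saturates toward 1 for small distances, which corresponds to the flat peak of this membership function.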

Tsallis Entropy Maximization of FCM.
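Let $\tilde{v}_i$ and $\tilde{u}_{ik}$ be the centers of clusters and the membership functions, respectively. The Tsallis entropy is defined as

$$\tilde{S} = -\frac{1}{q-1}\Bigl(\sum_{k=1}^{n}\sum_{i=1}^{c}\tilde{u}_{ik}^{q} - 1\Bigr), \tag{12}$$

where $q \in R$ is any real number. The objective function is rewritten as

$$\tilde{U} = \sum_{k=1}^{n}\sum_{i=1}^{c}\tilde{u}_{ik}^{q}\,\tilde{d}_{ik}, \tag{13}$$

where $\tilde{d}_{ik} = \|x_k - \tilde{v}_i\|^2$. Accordingly, the Tsallis entropy functional is given by

$$\delta\tilde{S} - \sum_{k=1}^{n}\alpha_k\,\delta\Bigl(\sum_{i=1}^{c}\tilde{u}_{ik} - 1\Bigr) - \beta\sum_{k=1}^{n}\sum_{i=1}^{c}\delta\bigl(\tilde{u}_{ik}^{q}\,\tilde{d}_{ik}\bigr). \tag{14}$$

The stationary condition for (14) yields the following membership function:

$$\tilde{u}_{ik} = \frac{\{1 - \beta(1-q)\tilde{d}_{ik}\}^{1/(1-q)}}{\tilde{Z}}, \tag{15}$$

where

$$\tilde{Z} = \sum_{j=1}^{c}\{1 - \beta(1-q)\tilde{d}_{jk}\}^{1/(1-q)}. \tag{16}$$

In this case, the cluster centers are given by

$$\tilde{v}_i = \frac{\sum_{k=1}^{n}\tilde{u}_{ik}^{q} x_k}{\sum_{k=1}^{n}\tilde{u}_{ik}^{q}}. \tag{17}$$

In the limit of q → 1, the Tsallis entropy recovers the Shannon entropy [8] and $\tilde{u}_{ik}$ approaches $u_{ik}$ in (5).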
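The limit q → 1 can be checked numerically. The sketch below assumes the Tsallis membership function $\tilde{u}_{ik} \propto \{1 - \beta(1-q) d_{ik}\}^{1/(1-q)}$ and the Boltzmann-like Shannon form $u_{ik} \propto e^{-\beta d_{ik}}$, both normalized over the clusters. It is restricted to q > 1 (an assumption made here so that the base of the power stays positive), and the distances are illustrative.

```python
import math

def tsallis_memberships(dists, beta, q):
    """Memberships of one data point; assumes q > 1 so the base is >= 1."""
    if q <= 1.0:
        raise ValueError("this sketch only covers q > 1")
    w = [(1.0 + beta * (q - 1.0) * d) ** (-1.0 / (q - 1.0)) for d in dists]
    z = sum(w)
    return [wi / z for wi in w]

def shannon_memberships(dists, beta):
    """Boltzmann-like memberships of one data point."""
    w = [math.exp(-beta * d) for d in dists]
    z = sum(w)
    return [wi / z for wi in w]

dists = [0.5, 2.0, 8.0]          # squared distances to three centers
shannon = shannon_memberships(dists, beta=1.0)
near_limit = tsallis_memberships(dists, beta=1.0, q=1.0 + 1e-6)
tsallis = tsallis_memberships(dists, beta=1.0, q=2.0)
```

For q close to 1 the two membership vectors agree closely, whereas for q = 2 the most distant cluster keeps a noticeably larger membership than in the Shannon case. This is the gentler base slope discussed later in the paper.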

Statistical Mechanical Interpretation of Entropy-Based FCM
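3.1. Shannon-Entropy-Based FCM Statistics. In the Shannon-entropy-based FCM, the sum of the states (the partition function) for the grand canonical ensemble of fuzzy clustering can be written as

$$Z = \prod_{k=1}^{n}\sum_{i=1}^{c} e^{-\beta d_{ik}}. \tag{18}$$

By substituting (18) into $F = -(1/\beta)\log Z$ [15], the free energy becomes

$$F = -\frac{1}{\beta}\sum_{k=1}^{n}\log\sum_{i=1}^{c} e^{-\beta d_{ik}}. \tag{19}$$

Stable thermal equilibrium requires a minimization of the free energy. By formulating deterministic annealing as a minimization of the free energy, $\partial F/\partial v_i = 0$ yields

$$v_i = \frac{\sum_{k=1}^{n} u_{ik} x_k}{\sum_{k=1}^{n} u_{ik}}. \tag{20}$$

This cluster center is the same as that in (6).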
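The statistical mechanical reading of the Shannon-entropy-based FCM can be verified numerically. The sketch below assumes the partition function $Z = \prod_k \sum_i e^{-\beta d_{ik}}$, the free energy $F = -(1/\beta)\log Z$, the internal energy $U = \sum_k\sum_i u_{ik} d_{ik}$, and the Shannon entropy $S = -\sum_k\sum_i u_{ik}\log u_{ik}$, and checks that $F = U - TS$ with $T = 1/\beta$. The distance matrix is hypothetical.

```python
import math

def shannon_free_energy(dist_rows, beta):
    """F = -(1/beta) * sum_k log sum_i exp(-beta * d_ik); dist_rows[k][i] = d_ik."""
    log_z = sum(math.log(sum(math.exp(-beta * d) for d in row))
                for row in dist_rows)
    return -log_z / beta

def energy_and_entropy(dist_rows, beta):
    """Return (U, S) computed from the Boltzmann-like memberships."""
    energy = 0.0
    entropy = 0.0
    for row in dist_rows:
        w = [math.exp(-beta * d) for d in row]
        z = sum(w)
        for wi, d in zip(w, row):
            u = wi / z
            energy += u * d
            entropy -= u * math.log(u)
    return energy, entropy

# Squared distances of two data points to three centers (illustrative).
dist_rows = [[0.5, 2.0, 8.0], [3.0, 0.1, 4.0]]
beta = 0.7
F = shannon_free_energy(dist_rows, beta)
U, S = energy_and_entropy(dist_rows, beta)
```

In this sketch the difference between `F` and `U - S / beta` is at the level of floating-point rounding, so minimizing the free energy at a fixed temperature trades a low clustering cost against a high entropy.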

Fuzzy-Entropy-Based FCM Statistics.
In the fuzzy-entropy-based FCM, by analogy with statistical mechanics, the grand partition function for the grand canonical ensemble of fuzzy clustering can be written as

$$\Xi = \prod_{k=1}^{n}\prod_{i=1}^{c}\bigl(1 + e^{-\alpha_k - \beta d_{ik}}\bigr) \tag{21}$$

because data can belong to any cluster. Substituting (21) into the corresponding free-energy expression [15] gives the free energy of the fuzzy-entropy-based FCM. It should be noted that $J_{m=1} - T\hat{S}$, the Legendre transform of the fuzzy entropy, gives the same form for the free energy.
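In the Tsallis-entropy-based FCM, the relation between $\tilde{U}$ and $\tilde{S}$ leads to

$$\frac{\partial\tilde{S}}{\partial\tilde{U}} = \beta. \tag{24}$$

Equation (24) makes it possible to regard $\beta^{-1}$ as an artificial system temperature T [15]. Then, the free energy can be defined as $\tilde{F} = \tilde{U} - T\tilde{S}$, and $\tilde{U}$ can be derived from $\tilde{F}$ as $\tilde{U} = \partial(\beta\tilde{F})/\partial\beta$.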

Dependency of Shapes of Membership Functions on Temperature.
By reducing the temperature according to the annealing schedule, the deterministic annealing method achieves thermal equilibrium, which minimizes the free energy. At absolute zero, the particle system settles down to the ground state, that is, the state of minimum energy.
Figure 1 shows the forms of the entropy functions $S$, $\hat{S}$, and $\tilde{S}$. Figure 2 shows the forms of the membership functions $u_{ik}$, $\hat{u}_{ik}$, and $\tilde{u}_{ik}$.
In the deterministic annealing method, the cluster distribution that minimizes the free energy is searched for at the given temperature. At high temperature, the membership functions are widely distributed and the clusters to which a data point belongs are fuzzy. In the case of $\tilde{u}_{ik}$ with q = 2, the width of the distribution is roughly proportional to $\beta^{-0.5}$. In the limit of low temperature, on the other hand, fuzzy clustering reaches hard clustering. The relationship F = U − TS suggests that a higher temperature leads to a state of larger entropy, that is, a chaotic state. This increase of the entropy results from the broadening of the membership function.
In Figure 2, it can be seen that $\hat{u}_{ik}$ has a flat peak, whereas both $u_{ik}$ and $\tilde{u}_{ik}$ have Gaussian forms. It can also be seen that $\tilde{u}_{ik}$ has a more gentle base slope than $u_{ik}$.
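The dependence of the width on β can be illustrated with a small Python sketch. It assumes the unnormalized Tsallis weight $\{1 + \beta(q-1)x^2\}^{-1/(q-1)}$ for a one-dimensional displacement x from the cluster center, and it measures the width as the half-width at half maximum; both choices are assumptions made for illustration only.

```python
def tsallis_weight(x, beta, q=2.0):
    """Unnormalized Tsallis weight at displacement x (squared distance x*x)."""
    return (1.0 + beta * (q - 1.0) * x * x) ** (-1.0 / (q - 1.0))

def half_width(beta, q=2.0, upper=1e6):
    """Find x where the weight drops to half of its peak value, by bisection."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tsallis_weight(mid, beta, q) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

width_hot = half_width(beta=0.01)   # high temperature
width_cold = half_width(beta=1.0)   # low temperature
```

For q = 2 the weight is $1/(1 + \beta x^2)$, so the half-width equals $\beta^{-0.5}$ exactly: increasing β by a factor of 100 narrows the distribution by a factor of 10.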

Cooling Schedule
4.2.1. Representative Annealing Methods. In SA, the temperature decreases according to a cooling schedule. The representative cooling schedules [16] for SA are
(i) proportional to an exponential function, where $T_{high}$ is a sufficiently high initial temperature, r is a parameter which defines the temperature reduction speed, and t is the number of iterations,
(ii) inversely linear function,
(iii) inversely proportional to a logarithmic function,
(iv) inversely proportional to an exponential function

$$T = \frac{T_{high}}{e^{r^{-1}t}}. \tag{31}$$

Very Fast Annealing. Rosen proposed another schedule that is inversely proportional to an exponential function, known as very fast annealing (VFA).
In VFA, T is given by

$$T = T_{high}\,e^{-m t^{1/D}}, \tag{32}$$

where m is a temperature reduction parameter and D is the dimension of the state space. Equations (31) and (32) are compared in Figure 3. It is observed that VFA decreases the temperature extremely rapidly in the initial stage.
In Section 6, we apply VFA as a cooling schedule of entropy-based FCM clustering using DA.
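The two exponential-type schedules can be compared with a short Python sketch. It assumes that the inversely exponential schedule (31) has the form $T = T_{high}/e^{t/r}$ and that VFA (32) has the form $T = T_{high}\,e^{-m t^{1/D}}$. The parameter values below (r = 10, m = 1, D = 2) are chosen only for illustration.

```python
import math

def inverse_exponential(t, t_high, r):
    """Assumed form of (31): T = T_high / exp(t / r)."""
    return t_high * math.exp(-t / r)

def very_fast_annealing(t, t_high, m, state_dim):
    """Assumed form of (32): T = T_high * exp(-m * t**(1/D))."""
    return t_high * math.exp(-m * t ** (1.0 / state_dim))

t_high = 2.0
early_exp = inverse_exponential(1, t_high, r=10.0)
early_vfa = very_fast_annealing(1, t_high, m=1.0, state_dim=2)
late_exp = inverse_exponential(400, t_high, r=10.0)
late_vfa = very_fast_annealing(400, t_high, m=1.0, state_dim=2)
```

With these illustrative parameters, VFA is well below the inversely exponential schedule after the first iteration but stays above it at t = 400. This reflects the observation that VFA concentrates its temperature reduction in the early annealing stage.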

Fuzzy C-Means as Clustering Algorithm Using Very Fast Annealing DA
The very fast deterministic annealing algorithm for the Tsallis-entropy-based FCM is given as follows.
(1) Set the number of clusters c, the highest temperature $T_{high}$, the temperature reduction rate m, and the thresholds of the convergence tests $\delta_1$ and $\delta_2$;
(2) generate initial clusters at random positions and assign each data point to the nearest cluster. Set the current temperature T to $T_{high}$;
(3) calculate $\tilde{u}_{ik}$ by (15);
(4) calculate the cluster centers by (17);
(5) compare the current centers $\tilde{v}_i$ with those obtained at the previous iteration $\tilde{v}'_i$. If the convergence condition $\max_{1\le i\le c}\|\tilde{v}_i - \tilde{v}'_i\| < \delta_1$ is satisfied, then go to (6); otherwise go back to (3);
(6) apply the convergence test with the threshold $\delta_2$. If it is satisfied, stop; otherwise decrease the temperature with (32) and go back to (3).
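The procedure above can be sketched in Python as follows. This is a minimal sketch under several assumptions rather than the implementation used in the experiments: the membership function is taken as $\tilde{u}_{ik} \propto \{1 + \beta(q-1)d_{ik}\}^{-1/(q-1)}$ with q > 1, the centers are means weighted by $\tilde{u}_{ik}^q$, the VFA schedule is $T = T_{high}\,e^{-m t^{1/D}}$, the initial centers are sampled from the data, and the outer test with $\delta_2$ compares the centers with those obtained at the previous temperature. All function names and the iteration limits are hypothetical.

```python
import math
import random

def sq_dist(x, v):
    return sum((a - b) ** 2 for a, b in zip(x, v))

def tsallis_memberships(points, centers, beta, q):
    """u[i][k] proportional to (1 + beta*(q-1)*d_ik)^(-1/(q-1)); assumes q > 1."""
    u = [[(1.0 + beta * (q - 1.0) * sq_dist(x, v)) ** (-1.0 / (q - 1.0))
          for x in points] for v in centers]
    for k in range(len(points)):
        z = sum(row[k] for row in u)
        for row in u:
            row[k] /= z
    return u

def tsallis_centers(points, u, q):
    """Centers as means weighted by u_ik^q."""
    dim = len(points[0])
    centers = []
    for row in u:
        w = [value ** q for value in row]
        total = sum(w)
        centers.append(tuple(
            sum(wk * x[j] for wk, x in zip(w, points)) / total
            for j in range(dim)))
    return centers

def max_shift(a, b):
    return max(math.sqrt(sq_dist(p, r)) for p, r in zip(a, b))

def very_fast_da(points, c, t_high, m, q, delta1, delta2,
                 state_dim=2, max_temps=100, max_inner=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, c)      # step (2), simplified: sample from data
    temp = t_high
    prev_temp_centers = None
    for step in range(1, max_temps + 1):
        beta = 1.0 / temp
        for _ in range(max_inner):       # steps (3) to (5)
            u = tsallis_memberships(points, centers, beta, q)
            new_centers = tsallis_centers(points, u, q)
            shift = max_shift(new_centers, centers)
            centers = new_centers
            if shift < delta1:
                break
        # Step (6), assumed form: compare with centers at the previous temperature.
        if (prev_temp_centers is not None
                and max_shift(centers, prev_temp_centers) < delta2):
            break
        prev_temp_centers = centers
        temp = t_high * math.exp(-m * step ** (1.0 / state_dim))  # VFA cooling
    return centers, temp

points = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, final_temp = very_fast_da(points, c=2, t_high=2.0, m=1.0, q=1.5,
                                   delta1=0.1, delta2=0.01)
```

The parameter values in the usage lines mirror those quoted for the iris experiment (T high = 2, δ1 = 0.1, δ2 = 0.01, q = 1.5), but the six data points are synthetic. The inner and outer iteration limits are added only so that the sketch always terminates.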

Experiment 1.
In experiment 1, we generated five randomly placed clusters composed of 2,000 data points, shown in Figure 4. We set c to 10, $\delta_1$ to 50, and $\delta_2$ to 2 (measured on the scale of Figure 4). We also set $T_{high} = 1.0 \times 10^{6}$. First, we applied the inversely exponential scheduling method to the Tsallis-entropy-based FCM clustering. The cooling schedule is illustrated in Figure 5. The changes of β are parameterized by the temperature reduction rate r, which ranges from 1 to 1000.
At the higher levels of T (Figure 5 (A)), clusters are created near the center of gravity of the data because β is comparatively small and the membership function extends over the whole data area and is extremely uniform. As T is lowered from Figure 5 (B) to (C), the width of the membership functions becomes narrower; that is, the Tsallis entropy decreases, and the associations become less fuzzy. Finally, the desired result is obtained.
In the case of r = 10 or r = 1 (Figure 5 (E) or (F)), it is observed that $\tilde{u}_{ik}$ and $\tilde{v}_i$ converge more rapidly. In the case of $T_{high} = 1.0 \times 10^{6}$ ($\beta = 1.0 \times 10^{-6}$), however, the initial distribution of $\tilde{u}_{ik}$ becomes too wide and the algorithm does not converge with r = 100 and r = 10 (indicated by "not converged" in Figure 5). Thus, it is important to set the values of $T_{high}$ and r properly.
To examine the effectiveness of VFA as a cooling schedule of DA, we performed numerical experiments on the Shannon- and Tsallis-entropy-based FCM clustering.
The shifts of cluster centers with decreasing temperature are illustrated in Figures 7 and 8.
Initially, clusters are located randomly. At the higher levels of T, clusters move close to the center of gravity of the data because β is comparatively small and the membership function extends over the whole data area and is extremely uniform.
As T is lowered, the width of the membership functions becomes narrower and the associations of data become less fuzzy. In this process, in the Shannon-entropy-based FCM clustering, the clusters move to their nearest local data distribution centers. However, in the Tsallis-entropy-based FCM clustering, clusters can move a long distance to optimal positions because of the gentle base slopes of their membership functions.
Figures 9 and 10 illustrate the three-dimensional plots of $u_{ik}$ and $\tilde{u}_{ik}$ in the progress of very fast DA clustering.
At the higher temperature, the roughness of $\tilde{u}_{ik}$ is smaller than that of $u_{ik}$. After that, the shapes of both membership functions do not change greatly, because VFA reduces the temperature sharply only at the early annealing stage.
Consequently, because the membership function of the Tsallis-entropy-based FCM has a gentle slope in the region far from the origin, clusters can move long distances to optimal positions stably and the temperature can be reduced rapidly. This feature makes it possible to use VFA as a cooling schedule of DA for the Tsallis-entropy-based FCM. On the other hand, the final cluster positions obtained by the Shannon-entropy-based FCM tend to depend on their initial positions.

The Shannon- and Tsallis-entropy-based FCMs with DA are examined. VFA is used as a cooling schedule of DA. We set the parameters as follows: c = 3, $T_{high}$ = 2, $\delta_1$ = 0.1, $\delta_2$ = 0.01, and q = 1.5.
The minimum, maximum, and average numbers of misclassified data over 100 trials are summarized in Table 1. The Shannon-entropy-based FCM gives slightly better results than the Tsallis-entropy-based FCM. However, the Tsallis-entropy-based FCM gives its best results when the temperature reduction rate is m = 1.0 or 2.0, whereas the best results for the Shannon-entropy-based FCM are obtained only when m = 2.0. Furthermore, the variances of the Tsallis-entropy-based FCM are smaller than those of the Shannon-entropy-based FCM. These features indicate that a wide range of m values is applicable to the Tsallis-entropy-based FCM.
Figure 6 shows the reduction of the objective values of the Tsallis- and Shannon-entropy-based FCMs as the temperature is decreased by VFA. The Shannon-entropy-based FCM does not converge properly when T = 0.023 for m = 2.0 and T = 0.029 for m = 3.0. That is, with larger m values, the Shannon-entropy-based FCM becomes unstable.

Conclusion
By maximizing the Tsallis entropy, the membership function of the Tsallis-entropy-based FCM is formulated. In the region far from the origin, it has a more gentle base slope than the membership functions of the Shannon- and fuzzy-entropy-based FCMs. This feature allows clusters to move long distances and allows the temperature to be reduced rapidly in the Tsallis-entropy-based FCM.
Next, the deterministic annealing (DA) method using very fast annealing (VFA) as its cooling schedule is applied to the Tsallis-entropy-based FCM. VFA decreases the temperature extremely rapidly in the initial stage, and experimental results showed that the Tsallis-entropy-based FCM is suitable for DA combined with VFA.
Future work includes convergence and computation-time tests of the Tsallis-entropy-based FCM using very fast deterministic annealing under various conditions (temperatures and parameters, especially the q value). It also includes experiments on and examinations of its applications.

Figure 2: The plots of the membership functions $u_{ik}$, $\hat{u}_{ik}$, and $\tilde{u}_{ik}$ at (a) high and (b) low temperature (n = 1, c = 2, q = 1.5, $\alpha_k = -2$).

Figure 5: The inversely exponential cooling schedule of DA. The temperature decreases from $T_{high}$. (The inverse of temperature β increases from $1.0 \times 10^{-5}$ or $1.0 \times 10^{-6}$ to $1.0 \times 10^{-3}$.) The curves are parameterized by the temperature reduction rate r.

3.3. The Tsallis-Entropy-Based FCM Statistics. On the other hand, $\tilde{U}$ and $\tilde{S}$ satisfy

Figure 6: Reduction of the objective values for the iris data as the temperature is decreased by VFA. The curves are parameterized by the temperature reduction rate m.

Figure 7: The shifts of cluster centers of Shannon-entropy-based clustering as the temperature is decreased by VFA (m = 1.0, 2.0, 3.0).

Table 1: Comparison of minimum, maximum, and average values of misclassified iris data (100 trials).